How to manipulate a text / CSV file using awk/gawk? How to insert/add a column between columns, remove columns, or to update a particular column? Let us discuss in this article.
Consider a CSV file with the following contents:
$ cat file Unix,10,A Linux,30,B Solaris,40,C Fedora,20,D Ubuntu,50,E1. To insert a new column (say serial number) before the 1st column
$ awk -F, '{$1=++i FS $1;}1' OFS=, file 1,Unix,10,A 2,Linux,30,B 3,Solaris,40,C 4,Fedora,20,D 5,Ubuntu,50,E$1=++i FS $1 => Space is used to concatenate columns in awk. This expression concatenates a new field(++i) with the 1st field along with the delimiter(FS), and assigns it back to the 1st field($1). FS contains the file delimiter.
2. To insert a new column after the last column
$ awk -F, '{$(NF+1)=++i;}1' OFS=, file Unix,10,A,1 Linux,30,B,2 Solaris,40,C,3 Fedora,20,D,4 Ubuntu,50,E,5$NF indicates the value of last column. Hence,by assigning something to $(NF+1), a new field is inserted at the end automatically.
3. Add 2 columns after the last column:
$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file Unix,10,A,1,X Linux,30,B,2,X Solaris,40,C,3,X Fedora,20,D,4,X Ubuntu,50,E,5,XThe explanation gives for the above 2 examples holds good here.
4. To insert a column before the 2nd last column
$ awk -F, '{$(NF-1)=++i FS $(NF-1);}1' OFS=, file Unix,1,10,A Linux,2,30,B Solaris,3,40,C Fedora,4,20,D Ubuntu,5,50,ENF-1 points to the 2nd last column. Hence, by concatenating the serial number in the beginning of NF-1 ends up in inserting a column before the 2nd last.
5. Update 2nd column by adding 10 to the variable:
$ awk -F, '{$2+=10;}1' OFS=, file Unix,20,A Linux,40,B Solaris,50,C Fedora,30,D Ubuntu,60,E$2 is incremented by 10.
6.Convert a specific column(1st column) to uppercase in the CSV file:
$ awk -F, '{$1=toupper($1)}1' OFS=, file UNIX,10,A LINUX,30,B SOLARIS,40,C FEDORA,20,D UBUNTU,50,EUsing the toupper function of the awk, the 1st column is converted from lowercase to uppercase.
7. Extract only first 3 characters of a specific column(1st column):
$ awk -F, '{$1=substr($1,0,3)}1' OFS=, file Uni,10,A Lin,30,B Sol,40,C Fed,20,D Ubu,50,EUsing the substr function of awk, a substring of only the first few characters can be retrieved.
8.Empty the value in the 2nd column:
$ awk -F, '{$2="";}1' OFS=, file Unix,,A Linux,,B Solaris,,C Fedora,,D Ubuntu,,ESet the variable of 2nd column($2) to blank(""). Now, when the line is printed, $2 will be blank.
9. Remove/Delete the 2nd column from the CSV file:
$ awk -F, '{for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=2 file Unix,A Linux,B Solaris,C Fedora,D Ubuntu,EBy just emptying a particular column, the column stays as is with empty value. To remove a column, all the subsequent columns from that position, needs to be advanced one position ahead. The for loop loops on all the fields. Using the ternary operator, every column is concatenated to the variable "f" provided it is not 2nd column using the FS as delimiter. At the end, the variable "f" is printed which contains the updated record. The column to be removed is passed through the awk variable "x" and hence just be setting the appropriate number in x, any specific column can be removed.
10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:
$ awk -F, '{$2=$2":"$x;for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=3 file Unix,10:A Linux,30:B Solaris,40:C Fedora,20:D Ubuntu,50:EAlmost same as last example expcept that first the 3rd column($3) is concatenated with 2nd column($2) and then removed.
how to print different values in each row in a new column using
ReplyDelete$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file
You can pass the output of the awk to sed and get it like this:
Deleteawk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file | sed 's/,/\n/g'
I have a csv file in which first column is given as date and time. eg: '30/04/2014 00:00' where 00:00 is common in all rows. I need to delete 00:00 alone in every fields.
ReplyDeleteYou can use sed for this, Refer examples here:
Deletehttp://www.theunixschool.com/2014/08/sed-examples-remove-delete-chars-from-line-file.html
I used below statement to display only 20 chars in file.But its not working as expected.
ReplyDeleteawk -F, '{$3=substr($3,0,20)}3' OFS=, SFDC_OP_TEST1.csv > SFDC_OP_TMP.csv
Please correct if i m wrong.
After substring command, you should do "print $3". Also remove the 3 outside.
DeleteI have a file like this
ReplyDeleteajith bangalore 32 50
ram mysore 30 43
manu indore 40 32
shyam bhopal 32 35
vijay indore 32 45
And I want to replace the city of "manu" to delhi. No other lines should be changed.
Is that possible using awk ?
awk '$1=="manu"{$2="delhi";}1' file
DeleteLet me know how could i get only columns whose datatype is String(non-numeric)??
ReplyDeleteI have a file file like this
ReplyDelete40000008,10,
40100017,22,0.0
40100075,212,20
40100153,20,22
40100182,1212,50
40100202,1313,0.0
40100229,133,55
40100231,3131,77
I need to generate a new file with respect to last column. If last column is zero or null then new column should be added with B and if not with A.
40000008,10,31,B
40100017,22,0.0,B
40100075,212,20,A
40100153,20,22,A
40100182,1212,50,A
40100202,1313,0.0,B
40100229,133,55,A
40100231,3131,77,A
Please help to get the command.
awk -F, '{$(NF+1)=int($NF)>0?"A":"B"}1' OFS=, file
DeleteHow can i merge two file into a different file werein their column one is the same but different values in column two.
ReplyDeleteEx:
file1:
Server OS
Testserver1 Solaris
Prod1 HPUX
File2:
Server Status
Testserver1 Running
Prod1 Disconnected
expected Output file3:
OS Status
Solaris Running
HXUP Disconnected
$ cat a1
DeleteTestserver1 Solaris
Prod1 HPUX
guru@laptop:~$
guru@laptop:~$ cat a2
Testserver1 Running
Prod1 Disconnected
guru@laptop:~$
guru@laptop:~$ awk 'NR==FNR{a[$1]=$2;next}{print a[$1],$2}' a1 a2
Solaris Running
HPUX Disconnected
guru@laptop:~$
Thanks Guru
DeleteHi Guru
ReplyDeleteI have CSV like:
1,1234566,asdasd,asdasda
and need to encrypt $2 using AES 128
please help.
Here is a sample input
ReplyDeletePiyush | 12345 | 5 | 5 | 4
James | 007 | 0 | 0 | 7
Knuth | 31415 | 100 | 100 | 100
For which the output is
Piyush | 12345 | 5 | 5 | 4 | 14
James | 007 | 0 | 0 | 7 | 7
Knuth | 31415 | 100 | 100 | 100 | 300
max | | 100 | 100 | 100 | 300
min | | 0 | 0 | 4 | 7
mean | | 35.00 | 35.00 | 37.00 | 107.00
sd | | 46.01 | 46.01 | 44.56 | 136.50
How do iadd extra record(row ) after the last line of input file
I hope you mean how to add the last column to every line in the file:
Deleteawk -F"|" '{$(NF+1)=$3+$4+$5;}1' OFS="|" file
Hi. Can you help? I need to get the number of characters-1 (letters only) of the fourth column and add them to the second column. I´ve tried several options without luck. Thanks!
ReplyDelete1 11782884 T -C
1 17034455 C -GCGCGCGT
1 27100940 G +C
Expected
1 11782885 T -C
1 17034463 C -GCGCGCGT
1 27100941 G +C
awk '{$2=$2+length($4)-1}1' file
DeleteGuru, you are awesome solution provider. I too have one to ask your help. I have a file which is pipe delimited. I would like to replace a value for a particular line and column which will always vary. for ex:
ReplyDeletecat file.txt
a|b|c|d|e
a|c|x|y|z|u|b
c|e|y|b
now I would like to find, replace and save, let's say line 2 pipe delimited value "y"(at 4) as "k" so my file will look like:
cat file.txt
a|b|c|d|e
a|c|x|k|z|u|b
c|e|y|b
this is just an example. my actual file has more than 200 pipes and it's really difficult for me to count and change values. I was thinking if I can read 3 inputs for my script, let's say:
$ ./changeValue 2 4 k
will update the same file without creating a new file.
Really appreciate your help.
awk -v line=2 -v col=4 -v val="k" -F"|" 'NR==line{$col=val;}1' OFS="|" file
DeleteHi, here are some simpler solutions to your examples
ReplyDelete9. Remove/Delete the 2nd column from the CSV file:
awk -F, '{print $1 "," $3}' file
10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:
awk -F, '{print $1 "," $2 ":" $3}' file
Have a nice day.
Thanks for your comment. If the file were to have 100 columns, you cant do this, meaning it is not a generic solution.
DeleteI have a file named product_sale With Below format
ReplyDeleteproduct_name,jan,feb,march
cakes,20,40,60
pizzas,10,4,6
cokes,10,20,40
I want to add extra column which is average value of columns from jan to march. Like below format.
product_name,jan,feb,march,Average_sale
cakes,20,40,60,40
pizzas,11,4,6,7
cokes,10,10,40,20
How to generate this ?
awk -F, '{$5=round(($2+$3+$4)/3);}1' OFS=, file
DeleteHi can all these commands be applied to pipe separated dat files also?
ReplyDeleteI have a day file with below data 1|2|3|4 I need to remove the first column and two new columns to it in the start like a|b|2|3|4
ReplyDeleteecho "1|2|3|4" | awk -F"|" '{$1="a|b";}1' OFS="|"
DeleteHi...
ReplyDeleteI have a csv file with total 42 columns
and from that file i need to make a new csv file that contains only column 23-31 and 42th (only 10 columns) for every lines i have...
Can you help me how to generate it? thank...
You can do this using cut command itself:
Deletecut -d, -f23-31,42 file
HI Guru,
ReplyDeleteThanks for the wonderful explanation. My situation is a bit tricky.. I have a CSV file in which one field has http responses. So it includes comma, double quotes and many other special chacters, it isalso is not limited to one line. The issue im facing is how do i escape the special characters are new line and print the exact same thing as CSV.
Refer to following for detailed question : http://stackoverflow.com/questions/33965840/how-to-escape-comma-double-quotes-and-new-line-in-csv-using-shell-bash-script?noredirect=1#comment55689411_33965840
Hi Guru,
ReplyDeleteI have a csv file and need to include in column 3 with " " example "192.168.1.10,192.168.1.11"
cat file.csv
server1;redhat;192.168.1.10,192.168.1.11;up
server2;centos;192.168.1.20,192.168.1.21;down
and I want the output write in file like this:
server1;redhat;"192.168.1.10,192.168.1.11";up
server2;centos;"192.168.1.20,192.168.1.21";down
Really appreciate your help.
awk -F";" '{$3=q$3q;}1' q='"' OFS=";" file.csv
DeleteThanks Guru Prasad, I forgot to inform you that column 2 also need to add with symbol !!. For example !!redhat!! and !!centos!!.
ReplyDeleteResult will be like this:
cat file.csv
server1;!!redhat!!;"192.168.1.10,192.168.1.11";up
server2;!!centos!!;"192.168.1.20,192.168.1.21";down
awk -F";" '{$2=e$2e;$3=q$3q;}1' q='"' e='!!' OFS=";" file.csv
DeleteThank you very much Guru Prasad. It's work now. Again,
DeleteHow about from file.csv
cat file.csv
server1;!!redhat!!;"192.168.1.10,192.168.1.11";up
server2;!!centos!!;"192.168.1.20,192.168.1.21";down
in column 2 which is !!redhat!! and !!centos!! required remove !!?
Example:
server1;redhat;"192.168.1.10,192.168.1.11";up
server2;centos;"192.168.1.20,192.168.1.21";down
Really appreciate your help.
Hi Guru,
ReplyDeleteI have a .dat file (dellimiter = space)with thousand columns. I want to
1. duplicate column 1
2.insert 4 columns of 0 after 3rd column (after duplicating column 1)
Can you help me? I could not figure out from your examples :(
Thank you very much!
Hello
ReplyDeletecan we add a colum or columns not a serie of numbers but containing characters or something else thank you
Hi Guru,
ReplyDeletei have:
time,nooffiles1,nooffiles2,
HR00:,1,2
HR01:,3,4
i want to become like this
time,nooffiles1,nooffiles2,total
HR00:,1,2,3
HR01:,3,4,7
how to generate this?
Thank you very much
awk 'NR==1{$(NF+1)="Total"}NR!=1{$(NF+1)=$2+$3;}1' FS=, OFS=, file
DeleteHi Guru,
ReplyDeleteI have a file with "|" separator. we need to update a column value based on the 1st column and will print in output file note that file does not have fixed columns.
Eg: 101|a|b|c
101|q|a|f
102|w|r|p|f
102|e|s|d|g
103|l|d|v|h|b|x
103|z|x|c|b|n|m
need to update some column value based on the first column value.
Really appreciate your help and Thank You
awk -F"|" '$1==101{$2="b"}1' OFS="|"
DeleteHi Guru,
ReplyDeleteI'd to update the second column in a file based on the the hour and minute in the first column. The delimiter is ','.
For example, the original file contains the following
2017-07-02 aa:bb:00,0
2017-07-02 cc:dd:00,0
The updated file is
echo 2017-07-02 aa:bb:00,`expr 1000 '*' aa '+' 20 '*' bb`
echo 2017-07-02 cc:dd:00,`expr 1000 '*' cc '+' 20 '*' dd`
Hi
ReplyDeleteI have a text file with text like
[YTS.AM]/www.YTS.AM.jpg
I would like to insert text to make a command like...
rm "[1080p] [YTS.AM]/www.YTS.AM.jpg"
please help
How can we insert two columns before first column
ReplyDeleteExample: file.txt has
1
2
3
4
5
Need to have
Value 100 1
Value 102 2
Value 103 3
Value 104 4
Value 105 5
is it possible to update or replace a specific cell in a excel ?
ReplyDelete