The UNIX School: awk - 10 examples to insert / remove / update fields of a CSV file

Monday, November 26, 2012

awk - 10 examples to insert / remove / update fields of a CSV file

How to manipulate a text / CSV file using awk/gawk? How to insert/add a column between columns, remove columns, or to update a particular column? Let us discuss in this article.

Consider a CSV file with the following contents:

$ cat file
Unix,10,A
Linux,30,B
Solaris,40,C
Fedora,20,D
Ubuntu,50,E

1. To insert a new column (say serial number) before the 1st column

$ awk -F, '{$1=++i FS $1;}1' OFS=, file
1,Unix,10,A
2,Linux,30,B
3,Solaris,40,C
4,Fedora,20,D
5,Ubuntu,50,E

$1=++i FS $1 => Space is used to concatenate columns in awk. This expression concatenates a new field(++i) with the 1st field along with the delimiter(FS), and assigns it back to the 1st field($1). FS contains the file delimiter.

2. To insert a new column after the last column

$ awk -F, '{$(NF+1)=++i;}1' OFS=, file
Unix,10,A,1
Linux,30,B,2
Solaris,40,C,3
Fedora,20,D,4
Ubuntu,50,E,5

$NF indicates the value of last column. Hence,by assigning something to $(NF+1), a new field is inserted at the end automatically.

3. Add 2 columns after the last column:

$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file
Unix,10,A,1,X
Linux,30,B,2,X
Solaris,40,C,3,X
Fedora,20,D,4,X
Ubuntu,50,E,5,X

The explanation gives for the above 2 examples holds good here.

4. To insert a column before the 2nd last column

$ awk -F, '{$(NF-1)=++i FS $(NF-1);}1' OFS=, file
Unix,1,10,A
Linux,2,30,B
Solaris,3,40,C
Fedora,4,20,D
Ubuntu,5,50,E

NF-1 points to the 2nd last column. Hence, by concatenating the serial number in the beginning of NF-1 ends up in inserting a column before the 2nd last.

5. Update 2nd column by adding 10 to the variable:

$ awk -F, '{$2+=10;}1' OFS=, file
Unix,20,A
Linux,40,B
Solaris,50,C
Fedora,30,D
Ubuntu,60,E

$2 is incremented by 10.

6.Convert a specific column(1st column) to uppercase in the CSV file:

$ awk -F, '{$1=toupper($1)}1' OFS=, file
UNIX,10,A
LINUX,30,B
SOLARIS,40,C
FEDORA,20,D
UBUNTU,50,E

Using the toupper function of the awk, the 1st column is converted from lowercase to uppercase.

7. Extract only first 3 characters of a specific column(1st column):

$ awk -F, '{$1=substr($1,0,3)}1' OFS=, file
Uni,10,A
Lin,30,B
Sol,40,C
Fed,20,D
Ubu,50,E

Using the substr function of awk, a substring of only the first few characters can be retrieved.

8.Empty the value in the 2nd column:

$ awk -F, '{$2="";}1' OFS=, file
Unix,,A
Linux,,B
Solaris,,C
Fedora,,D
Ubuntu,,E

Set the variable of 2nd column($2) to blank(""). Now, when the line is printed, $2 will be blank.

9. Remove/Delete the 2nd column from the CSV file:

$ awk -F, '{for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=2 file
Unix,A
Linux,B
Solaris,C
Fedora,D
Ubuntu,E

By just emptying a particular column, the column stays as is with empty value. To remove a column, all the subsequent columns from that position, needs to be advanced one position ahead. The for loop loops on all the fields. Using the ternary operator, every column is concatenated to the variable "f" provided it is not 2nd column using the FS as delimiter. At the end, the variable "f" is printed which contains the updated record. The column to be removed is passed through the awk variable "x" and hence just be setting the appropriate number in x, any specific column can be removed.

10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:

$ awk -F, '{$2=$2":"$x;for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=3 file
Unix,10:A
Linux,30:B
Solaris,40:C
Fedora,20:D
Ubuntu,50:E

Almost same as last example expcept that first the 3rd column($3) is concatenated with 2nd column($2) and then removed.

46 comments:

ASHISH SINGHALJuly 23, 2013 at 11:21 PM
how to print different values in each row in a new column using

$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file
ReplyDelete
Replies
UnknownApril 30, 2014 at 4:54 PM
I have a csv file in which first column is given as date and time. eg: '30/04/2014 00:00' where 00:00 is common in all rows. I need to delete 00:00 alone in every fields.
ReplyDelete
Replies
Jog FallsJune 4, 2014 at 8:21 PM
I used below statement to display only 20 chars in file.But its not working as expected.
awk -F, '{$3=substr($3,0,20)}3' OFS=, SFDC_OP_TEST1.csv > SFDC_OP_TMP.csv

Please correct if i m wrong.
ReplyDelete
Replies
AjithOctober 9, 2014 at 7:03 PM
I have a file like this
ajith bangalore 32 50
ram mysore 30 43
manu indore 40 32
shyam bhopal 32 35
vijay indore 32 45

And I want to replace the city of "manu" to delhi. No other lines should be changed.
Is that possible using awk ?
ReplyDelete
Replies
UnknownDecember 8, 2014 at 10:03 PM
Let me know how could i get only columns whose datatype is String(non-numeric)??
ReplyDelete
Replies
UnknownDecember 18, 2014 at 2:57 PM
I have a file file like this
40000008,10,
40100017,22,0.0
40100075,212,20
40100153,20,22
40100182,1212,50
40100202,1313,0.0
40100229,133,55
40100231,3131,77
I need to generate a new file with respect to last column. If last column is zero or null then new column should be added with B and if not with A.

40000008,10,31,B
40100017,22,0.0,B
40100075,212,20,A
40100153,20,22,A
40100182,1212,50,A
40100202,1313,0.0,B
40100229,133,55,A
40100231,3131,77,A
Please help to get the command.
ReplyDelete
Replies
UnknownJanuary 19, 2015 at 5:05 PM
How can i merge two file into a different file werein their column one is the same but different values in column two.

Ex:

file1:
Server OS
Testserver1 Solaris
Prod1 HPUX

File2:
Server Status
Testserver1 Running
Prod1 Disconnected

expected Output file3:
OS Status
Solaris Running
HXUP Disconnected

ReplyDelete
Replies
UnknownFebruary 19, 2015 at 12:22 PM
Hi Guru

I have CSV like:

1,1234566,asdasd,asdasda

and need to encrypt $2 using AES 128

please help.
ReplyDelete
Replies
UnknownFebruary 28, 2015 at 11:57 PM
Here is a sample input

Piyush | 12345 | 5 | 5 | 4
James | 007 | 0 | 0 | 7
Knuth | 31415 | 100 | 100 | 100

For which the output is

Piyush | 12345 | 5 | 5 | 4 | 14
James | 007 | 0 | 0 | 7 | 7
Knuth | 31415 | 100 | 100 | 100 | 300
max | | 100 | 100 | 100 | 300
min | | 0 | 0 | 4 | 7
mean | | 35.00 | 35.00 | 37.00 | 107.00
sd | | 46.01 | 46.01 | 44.56 | 136.50

How do iadd extra record(row ) after the last line of input file
ReplyDelete
Replies
jorgeMarch 7, 2015 at 8:14 PM
Hi. Can you help? I need to get the number of characters-1 (letters only) of the fourth column and add them to the second column. I´ve tried several options without luck. Thanks!
1 11782884 T -C
1 17034455 C -GCGCGCGT
1 27100940 G +C

Expected
1 11782885 T -C
1 17034463 C -GCGCGCGT
1 27100941 G +C
ReplyDelete
Replies
UnknownMarch 25, 2015 at 7:31 AM
Guru, you are awesome solution provider. I too have one to ask your help. I have a file which is pipe delimited. I would like to replace a value for a particular line and column which will always vary. for ex:

cat file.txt
a|b|c|d|e
a|c|x|y|z|u|b
c|e|y|b

now I would like to find, replace and save, let's say line 2 pipe delimited value "y"(at 4) as "k" so my file will look like:

cat file.txt
a|b|c|d|e
a|c|x|k|z|u|b
c|e|y|b

this is just an example. my actual file has more than 200 pipes and it's really difficult for me to count and change values. I was thinking if I can read 3 inputs for my script, let's say:
$ ./changeValue 2 4 k
will update the same file without creating a new file.

Really appreciate your help.
ReplyDelete
Replies
champostApril 10, 2015 at 7:43 AM
Hi, here are some simpler solutions to your examples

9. Remove/Delete the 2nd column from the CSV file:
awk -F, '{print $1 "," $3}' file

10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:
awk -F, '{print $1 "," $2 ":" $3}' file

Have a nice day.
ReplyDelete
Replies
UnknownJune 24, 2015 at 11:53 AM
I have a file named product_sale With Below format

product_name,jan,feb,march
cakes,20,40,60
pizzas,10,4,6
cokes,10,20,40

I want to add extra column which is average value of columns from jan to march. Like below format.

product_name,jan,feb,march,Average_sale
cakes,20,40,60,40
pizzas,11,4,6,7
cokes,10,10,40,20

How to generate this ?
ReplyDelete
Replies
UnknownJuly 7, 2015 at 6:27 PM
Hi can all these commands be applied to pipe separated dat files also?
ReplyDelete
Replies
UnknownJuly 7, 2015 at 6:29 PM
I have a day file with below data 1|2|3|4 I need to remove the first column and two new columns to it in the start like a|b|2|3|4
ReplyDelete
Replies
jpSeptember 29, 2015 at 8:26 PM
Hi...
I have a csv file with total 42 columns
and from that file i need to make a new csv file that contains only column 23-31 and 42th (only 10 columns) for every lines i have...

Can you help me how to generate it? thank...
ReplyDelete
Replies
MannyNovember 28, 2015 at 7:35 AM
HI Guru,

Thanks for the wonderful explanation. My situation is a bit tricky.. I have a CSV file in which one field has http responses. So it includes comma, double quotes and many other special chacters, it isalso is not limited to one line. The issue im facing is how do i escape the special characters are new line and print the exact same thing as CSV.

Refer to following for detailed question : http://stackoverflow.com/questions/33965840/how-to-escape-comma-double-quotes-and-new-line-in-csv-using-shell-bash-script?noredirect=1#comment55689411_33965840
ReplyDelete
Replies
UnknownJanuary 2, 2016 at 4:20 AM
Hi Guru,

I have a csv file and need to include in column 3 with " " example "192.168.1.10,192.168.1.11"

cat file.csv
server1;redhat;192.168.1.10,192.168.1.11;up
server2;centos;192.168.1.20,192.168.1.21;down

and I want the output write in file like this:
server1;redhat;"192.168.1.10,192.168.1.11";up
server2;centos;"192.168.1.20,192.168.1.21";down

Really appreciate your help.
ReplyDelete
Replies
UnknownJanuary 6, 2016 at 10:20 AM
Thanks Guru Prasad, I forgot to inform you that column 2 also need to add with symbol !!. For example !!redhat!! and !!centos!!.

Result will be like this:
cat file.csv
server1;!!redhat!!;"192.168.1.10,192.168.1.11";up
server2;!!centos!!;"192.168.1.20,192.168.1.21";down

ReplyDelete
Replies
UnknownJanuary 18, 2016 at 3:40 PM
Hi Guru,
I have a .dat file (dellimiter = space)with thousand columns. I want to
1. duplicate column 1
2.insert 4 columns of 0 after 3rd column (after duplicating column 1)
Can you help me? I could not figure out from your examples :(
Thank you very much!
ReplyDelete
Replies
abeslamAugust 24, 2016 at 4:08 AM
Hello

can we add a colum or columns not a serie of numbers but containing characters or something else thank you
ReplyDelete
Replies
UnknownFebruary 22, 2017 at 9:30 PM
Hi Guru,

i have:

time,nooffiles1,nooffiles2,
HR00:,1,2
HR01:,3,4

i want to become like this

time,nooffiles1,nooffiles2,total
HR00:,1,2,3
HR01:,3,4,7

how to generate this?
Thank you very much
ReplyDelete
Replies
DavidApril 21, 2017 at 5:19 PM
Hi Guru,
I have a file with "|" separator. we need to update a column value based on the 1st column and will print in output file note that file does not have fixed columns.
Eg: 101|a|b|c
101|q|a|f
102|w|r|p|f
102|e|s|d|g
103|l|d|v|h|b|x
103|z|x|c|b|n|m
need to update some column value based on the first column value.
Really appreciate your help and Thank You
ReplyDelete
Replies
UnknownJuly 3, 2017 at 8:57 AM
Hi Guru,

I'd to update the second column in a file based on the the hour and minute in the first column. The delimiter is ','.

For example, the original file contains the following

2017-07-02 aa:bb:00,0
2017-07-02 cc:dd:00,0

The updated file is

echo 2017-07-02 aa:bb:00,`expr 1000 '*' aa '+' 20 '*' bb`
echo 2017-07-02 cc:dd:00,`expr 1000 '*' cc '+' 20 '*' dd`
ReplyDelete
Replies
UnknownSeptember 3, 2018 at 10:26 PM
Hi

I have a text file with text like
[YTS.AM]/www.YTS.AM.jpg

I would like to insert text to make a command like...
rm "[1080p] [YTS.AM]/www.YTS.AM.jpg"

please help
ReplyDelete
Replies
vasanthNovember 23, 2018 at 5:39 PM
How can we insert two columns before first column
Example: file.txt has
1
2
3
4
5

Need to have
Value 100 1
Value 102 2
Value 103 3
Value 104 4
Value 105 5
ReplyDelete
Replies
beautiful quotes...July 8, 2019 at 7:18 PM
is it possible to update or replace a specific cell in a excel ?

ReplyDelete
Replies

Add comment

Pages

Monday, November 26, 2012

awk - 10 examples to insert / remove / update fields of a CSV file

46 comments: