Monday, November 26, 2012

awk - 10 examples to insert / remove / update fields of a CSV file



How to manipulate a text / CSV file using awk/gawk? How to insert/add a column between columns, remove columns, or to update a particular column? Let us discuss in this article.

Consider a CSV file with the following contents:
$ cat file
Unix,10,A
Linux,30,B
Solaris,40,C
Fedora,20,D
Ubuntu,50,E
1. To insert a new column (say serial number) before the 1st column
$ awk -F, '{$1=++i FS $1;}1' OFS=, file
1,Unix,10,A
2,Linux,30,B
3,Solaris,40,C
4,Fedora,20,D
5,Ubuntu,50,E
$1=++i FS $1 => Space is used to concatenate columns in awk. This expression concatenates a new field(++i) with the 1st field along with the delimiter(FS), and assigns it back to the 1st field($1). FS contains the file delimiter.

2. To insert a new column after the last column
$ awk -F, '{$(NF+1)=++i;}1' OFS=, file
Unix,10,A,1
Linux,30,B,2
Solaris,40,C,3
Fedora,20,D,4
Ubuntu,50,E,5
$NF indicates the value of last column. Hence,by assigning something to $(NF+1), a new field is inserted at the end automatically.

3. Add 2 columns after the last column:
$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file
Unix,10,A,1,X
Linux,30,B,2,X
Solaris,40,C,3,X
Fedora,20,D,4,X
Ubuntu,50,E,5,X
The explanation gives for the above 2 examples holds good here.

4. To insert a column before the 2nd last column
$ awk -F, '{$(NF-1)=++i FS $(NF-1);}1' OFS=, file
Unix,1,10,A
Linux,2,30,B
Solaris,3,40,C
Fedora,4,20,D
Ubuntu,5,50,E
NF-1 points to the 2nd last column. Hence, by concatenating the serial number in the beginning of NF-1 ends up in inserting a column before the 2nd last.

5. Update 2nd column by adding 10 to the variable:
$ awk -F, '{$2+=10;}1' OFS=, file
Unix,20,A
Linux,40,B
Solaris,50,C
Fedora,30,D
Ubuntu,60,E
   $2 is incremented by 10.

6.Convert a specific column(1st column) to uppercase in the CSV file:
$ awk -F, '{$1=toupper($1)}1' OFS=, file
UNIX,10,A
LINUX,30,B
SOLARIS,40,C
FEDORA,20,D
UBUNTU,50,E
Using the toupper function of the awk, the 1st column is converted from lowercase to uppercase.

7. Extract only first 3 characters of a specific column(1st column):
$ awk -F, '{$1=substr($1,0,3)}1' OFS=, file
Uni,10,A
Lin,30,B
Sol,40,C
Fed,20,D
Ubu,50,E
Using the substr function of awk, a substring of only the first few characters can be retrieved.

8.Empty the value in the 2nd column:
$ awk -F, '{$2="";}1' OFS=, file
Unix,,A
Linux,,B
Solaris,,C
Fedora,,D
Ubuntu,,E
Set the variable of 2nd column($2) to blank(""). Now, when the line is printed, $2 will be blank.

9. Remove/Delete the 2nd column from the CSV file:
$ awk -F, '{for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=2 file
Unix,A
Linux,B
Solaris,C
Fedora,D
Ubuntu,E
By just emptying a particular column, the column stays as is with empty value. To remove a column, all the subsequent columns from that position, needs to be advanced one position ahead. The for loop loops on all the fields. Using the ternary operator, every column is concatenated to the variable  "f" provided it is not 2nd column using the FS as delimiter. At the end, the variable "f" is printed which contains the updated record. The column to be removed is passed through the awk variable "x" and hence just be setting the appropriate number in x, any specific column can be removed.

10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:
$ awk -F, '{$2=$2":"$x;for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=3 file
Unix,10:A
Linux,30:B
Solaris,40:C
Fedora,20:D
Ubuntu,50:E
Almost same as last example expcept that first the 3rd column($3) is concatenated with 2nd column($2) and then removed.

46 comments:

  1. how to print different values in each row in a new column using

    $ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file

    ReplyDelete
    Replies
    1. You can pass the output of the awk to sed and get it like this:

      awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file | sed 's/,/\n/g'

      Delete
  2. I have a csv file in which first column is given as date and time. eg: '30/04/2014 00:00' where 00:00 is common in all rows. I need to delete 00:00 alone in every fields.

    ReplyDelete
    Replies
    1. You can use sed for this, Refer examples here:
      http://www.theunixschool.com/2014/08/sed-examples-remove-delete-chars-from-line-file.html

      Delete
  3. I used below statement to display only 20 chars in file.But its not working as expected.
    awk -F, '{$3=substr($3,0,20)}3' OFS=, SFDC_OP_TEST1.csv > SFDC_OP_TMP.csv

    Please correct if i m wrong.

    ReplyDelete
    Replies
    1. After substring command, you should do "print $3". Also remove the 3 outside.

      Delete
  4. I have a file like this
    ajith bangalore 32 50
    ram mysore 30 43
    manu indore 40 32
    shyam bhopal 32 35
    vijay indore 32 45

    And I want to replace the city of "manu" to delhi. No other lines should be changed.
    Is that possible using awk ?

    ReplyDelete
  5. Let me know how could i get only columns whose datatype is String(non-numeric)??

    ReplyDelete
  6. I have a file file like this
    40000008,10,
    40100017,22,0.0
    40100075,212,20
    40100153,20,22
    40100182,1212,50
    40100202,1313,0.0
    40100229,133,55
    40100231,3131,77
    I need to generate a new file with respect to last column. If last column is zero or null then new column should be added with B and if not with A.

    40000008,10,31,B
    40100017,22,0.0,B
    40100075,212,20,A
    40100153,20,22,A
    40100182,1212,50,A
    40100202,1313,0.0,B
    40100229,133,55,A
    40100231,3131,77,A
    Please help to get the command.

    ReplyDelete
    Replies
    1. awk -F, '{$(NF+1)=int($NF)>0?"A":"B"}1' OFS=, file

      Delete
  7. How can i merge two file into a different file werein their column one is the same but different values in column two.

    Ex:

    file1:
    Server OS
    Testserver1 Solaris
    Prod1 HPUX

    File2:
    Server Status
    Testserver1 Running
    Prod1 Disconnected

    expected Output file3:
    OS Status
    Solaris Running
    HXUP Disconnected

    ReplyDelete
    Replies
    1. $ cat a1
      Testserver1 Solaris
      Prod1 HPUX
      guru@laptop:~$
      guru@laptop:~$ cat a2
      Testserver1 Running
      Prod1 Disconnected
      guru@laptop:~$
      guru@laptop:~$ awk 'NR==FNR{a[$1]=$2;next}{print a[$1],$2}' a1 a2
      Solaris Running
      HPUX Disconnected
      guru@laptop:~$

      Delete
  8. Hi Guru

    I have CSV like:

    1,1234566,asdasd,asdasda

    and need to encrypt $2 using AES 128

    please help.

    ReplyDelete
  9. Here is a sample input

    Piyush | 12345 | 5 | 5 | 4
    James | 007 | 0 | 0 | 7
    Knuth | 31415 | 100 | 100 | 100


    For which the output is

    Piyush | 12345 | 5 | 5 | 4 | 14
    James | 007 | 0 | 0 | 7 | 7
    Knuth | 31415 | 100 | 100 | 100 | 300
    max | | 100 | 100 | 100 | 300
    min | | 0 | 0 | 4 | 7
    mean | | 35.00 | 35.00 | 37.00 | 107.00
    sd | | 46.01 | 46.01 | 44.56 | 136.50

    How do iadd extra record(row ) after the last line of input file

    ReplyDelete
    Replies
    1. I hope you mean how to add the last column to every line in the file:

      awk -F"|" '{$(NF+1)=$3+$4+$5;}1' OFS="|" file

      Delete
  10. Hi. Can you help? I need to get the number of characters-1 (letters only) of the fourth column and add them to the second column. I´ve tried several options without luck. Thanks!
    1 11782884 T -C
    1 17034455 C -GCGCGCGT
    1 27100940 G +C

    Expected
    1 11782885 T -C
    1 17034463 C -GCGCGCGT
    1 27100941 G +C

    ReplyDelete
  11. Guru, you are awesome solution provider. I too have one to ask your help. I have a file which is pipe delimited. I would like to replace a value for a particular line and column which will always vary. for ex:

    cat file.txt
    a|b|c|d|e
    a|c|x|y|z|u|b
    c|e|y|b

    now I would like to find, replace and save, let's say line 2 pipe delimited value "y"(at 4) as "k" so my file will look like:

    cat file.txt
    a|b|c|d|e
    a|c|x|k|z|u|b
    c|e|y|b

    this is just an example. my actual file has more than 200 pipes and it's really difficult for me to count and change values. I was thinking if I can read 3 inputs for my script, let's say:
    $ ./changeValue 2 4 k
    will update the same file without creating a new file.

    Really appreciate your help.

    ReplyDelete
    Replies
    1. awk -v line=2 -v col=4 -v val="k" -F"|" 'NR==line{$col=val;}1' OFS="|" file

      Delete
  12. Hi, here are some simpler solutions to your examples

    9. Remove/Delete the 2nd column from the CSV file:
    awk -F, '{print $1 "," $3}' file

    10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:
    awk -F, '{print $1 "," $2 ":" $3}' file

    Have a nice day.

    ReplyDelete
    Replies
    1. Thanks for your comment. If the file were to have 100 columns, you cant do this, meaning it is not a generic solution.

      Delete
  13. I have a file named product_sale With Below format

    product_name,jan,feb,march
    cakes,20,40,60
    pizzas,10,4,6
    cokes,10,20,40

    I want to add extra column which is average value of columns from jan to march. Like below format.

    product_name,jan,feb,march,Average_sale
    cakes,20,40,60,40
    pizzas,11,4,6,7
    cokes,10,10,40,20

    How to generate this ?

    ReplyDelete
    Replies
    1. awk -F, '{$5=round(($2+$3+$4)/3);}1' OFS=, file

      Delete
  14. Hi can all these commands be applied to pipe separated dat files also?

    ReplyDelete
  15. I have a day file with below data 1|2|3|4 I need to remove the first column and two new columns to it in the start like a|b|2|3|4

    ReplyDelete
    Replies
    1. echo "1|2|3|4" | awk -F"|" '{$1="a|b";}1' OFS="|"

      Delete
  16. Hi...
    I have a csv file with total 42 columns
    and from that file i need to make a new csv file that contains only column 23-31 and 42th (only 10 columns) for every lines i have...

    Can you help me how to generate it? thank...

    ReplyDelete
    Replies
    1. You can do this using cut command itself:

      cut -d, -f23-31,42 file

      Delete
  17. HI Guru,

    Thanks for the wonderful explanation. My situation is a bit tricky.. I have a CSV file in which one field has http responses. So it includes comma, double quotes and many other special chacters, it isalso is not limited to one line. The issue im facing is how do i escape the special characters are new line and print the exact same thing as CSV.

    Refer to following for detailed question : http://stackoverflow.com/questions/33965840/how-to-escape-comma-double-quotes-and-new-line-in-csv-using-shell-bash-script?noredirect=1#comment55689411_33965840

    ReplyDelete
  18. Hi Guru,

    I have a csv file and need to include in column 3 with " " example "192.168.1.10,192.168.1.11"

    cat file.csv
    server1;redhat;192.168.1.10,192.168.1.11;up
    server2;centos;192.168.1.20,192.168.1.21;down

    and I want the output write in file like this:
    server1;redhat;"192.168.1.10,192.168.1.11";up
    server2;centos;"192.168.1.20,192.168.1.21";down

    Really appreciate your help.

    ReplyDelete
  19. Thanks Guru Prasad, I forgot to inform you that column 2 also need to add with symbol !!. For example !!redhat!! and !!centos!!.

    Result will be like this:
    cat file.csv
    server1;!!redhat!!;"192.168.1.10,192.168.1.11";up
    server2;!!centos!!;"192.168.1.20,192.168.1.21";down

    ReplyDelete
    Replies
    1. awk -F";" '{$2=e$2e;$3=q$3q;}1' q='"' e='!!' OFS=";" file.csv

      Delete
    2. Thank you very much Guru Prasad. It's work now. Again,
      How about from file.csv
      cat file.csv
      server1;!!redhat!!;"192.168.1.10,192.168.1.11";up
      server2;!!centos!!;"192.168.1.20,192.168.1.21";down

      in column 2 which is !!redhat!! and !!centos!! required remove !!?

      Example:
      server1;redhat;"192.168.1.10,192.168.1.11";up
      server2;centos;"192.168.1.20,192.168.1.21";down

      Really appreciate your help.

      Delete
  20. Hi Guru,
    I have a .dat file (dellimiter = space)with thousand columns. I want to
    1. duplicate column 1
    2.insert 4 columns of 0 after 3rd column (after duplicating column 1)
    Can you help me? I could not figure out from your examples :(
    Thank you very much!

    ReplyDelete
  21. Hello

    can we add a colum or columns not a serie of numbers but containing characters or something else thank you

    ReplyDelete
  22. Hi Guru,

    i have:

    time,nooffiles1,nooffiles2,
    HR00:,1,2
    HR01:,3,4

    i want to become like this

    time,nooffiles1,nooffiles2,total
    HR00:,1,2,3
    HR01:,3,4,7

    how to generate this?
    Thank you very much

    ReplyDelete
    Replies
    1. awk 'NR==1{$(NF+1)="Total"}NR!=1{$(NF+1)=$2+$3;}1' FS=, OFS=, file

      Delete
  23. Hi Guru,
    I have a file with "|" separator. we need to update a column value based on the 1st column and will print in output file note that file does not have fixed columns.
    Eg: 101|a|b|c
    101|q|a|f
    102|w|r|p|f
    102|e|s|d|g
    103|l|d|v|h|b|x
    103|z|x|c|b|n|m
    need to update some column value based on the first column value.
    Really appreciate your help and Thank You

    ReplyDelete
  24. Hi Guru,

    I'd to update the second column in a file based on the the hour and minute in the first column. The delimiter is ','.

    For example, the original file contains the following

    2017-07-02 aa:bb:00,0
    2017-07-02 cc:dd:00,0

    The updated file is

    echo 2017-07-02 aa:bb:00,`expr 1000 '*' aa '+' 20 '*' bb`
    echo 2017-07-02 cc:dd:00,`expr 1000 '*' cc '+' 20 '*' dd`

    ReplyDelete
  25. Hi

    I have a text file with text like
    [YTS.AM]/www.YTS.AM.jpg

    I would like to insert text to make a command like...
    rm "[1080p] [YTS.AM]/www.YTS.AM.jpg"

    please help

    ReplyDelete
  26. How can we insert two columns before first column
    Example: file.txt has
    1
    2
    3
    4
    5

    Need to have
    Value 100 1
    Value 102 2
    Value 103 3
    Value 104 4
    Value 105 5

    ReplyDelete
  27. is it possible to update or replace a specific cell in a excel ?

    ReplyDelete