Monday, November 26, 2012

awk - 10 examples to insert / remove / update fields of a CSV file



How to manipulate a text / CSV file using awk/gawk? How to insert/add a column between columns, remove columns, or to update a particular column? Let us discuss in this article.

Consider a CSV file with the following contents:
$ cat file
Unix,10,A
Linux,30,B
Solaris,40,C
Fedora,20,D
Ubuntu,50,E
1. To insert a new column (say serial number) before the 1st column
$ awk -F, '{$1=++i FS $1;}1' OFS=, file
1,Unix,10,A
2,Linux,30,B
3,Solaris,40,C
4,Fedora,20,D
5,Ubuntu,50,E
$1=++i FS $1 => Space is used to concatenate columns in awk. This expression concatenates a new field(++i) with the 1st field along with the delimiter(FS), and assigns it back to the 1st field($1). FS contains the file delimiter.

2. To insert a new column after the last column
$ awk -F, '{$(NF+1)=++i;}1' OFS=, file
Unix,10,A,1
Linux,30,B,2
Solaris,40,C,3
Fedora,20,D,4
Ubuntu,50,E,5
$NF indicates the value of last column. Hence,by assigning something to $(NF+1), a new field is inserted at the end automatically.

3. Add 2 columns after the last column:
$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file
Unix,10,A,1,X
Linux,30,B,2,X
Solaris,40,C,3,X
Fedora,20,D,4,X
Ubuntu,50,E,5,X
The explanation gives for the above 2 examples holds good here.

4. To insert a column before the 2nd last column
$ awk -F, '{$(NF-1)=++i FS $(NF-1);}1' OFS=, file
Unix,1,10,A
Linux,2,30,B
Solaris,3,40,C
Fedora,4,20,D
Ubuntu,5,50,E
NF-1 points to the 2nd last column. Hence, by concatenating the serial number in the beginning of NF-1 ends up in inserting a column before the 2nd last.

5. Update 2nd column by adding 10 to the variable:
$ awk -F, '{$2+=10;}1' OFS=, file
Unix,20,A
Linux,40,B
Solaris,50,C
Fedora,30,D
Ubuntu,60,E
   $2 is incremented by 10.

6.Convert a specific column(1st column) to uppercase in the CSV file:
$ awk -F, '{$1=toupper($1)}1' OFS=, file
UNIX,10,A
LINUX,30,B
SOLARIS,40,C
FEDORA,20,D
UBUNTU,50,E
Using the toupper function of the awk, the 1st column is converted from lowercase to uppercase.

7. Extract only first 3 characters of a specific column(1st column):
$ awk -F, '{$1=substr($1,0,3)}1' OFS=, file
Uni,10,A
Lin,30,B
Sol,40,C
Fed,20,D
Ubu,50,E
Using the substr function of awk, a substring of only the first few characters can be retrieved.

8.Empty the value in the 2nd column:
$ awk -F, '{$2="";}1' OFS=, file
Unix,,A
Linux,,B
Solaris,,C
Fedora,,D
Ubuntu,,E
Set the variable of 2nd column($2) to blank(""). Now, when the line is printed, $2 will be blank.

9. Remove/Delete the 2nd column from the CSV file:
$ awk -F, '{for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=2 file
Unix,A
Linux,B
Solaris,C
Fedora,D
Ubuntu,E
By just emptying a particular column, the column stays as is with empty value. To remove a column, all the subsequent columns from that position, needs to be advanced one position ahead. The for loop loops on all the fields. Using the ternary operator, every column is concatenated to the variable  "f" provided it is not 2nd column using the FS as delimiter. At the end, the variable "f" is printed which contains the updated record. The column to be removed is passed through the awk variable "x" and hence just be setting the appropriate number in x, any specific column can be removed.

10. Join 3rd column with 2nd colmn using ':' and remove the 3rd column:
$ awk -F, '{$2=$2":"$x;for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=3 file
Unix,10:A
Linux,30:B
Solaris,40:C
Fedora,20:D
Ubuntu,50:E
Almost same as last example expcept that first the 3rd column($3) is concatenated with 2nd column($2) and then removed.
Related Posts Plugin for WordPress, Blogger...

13 comments:

  1. Can you explain what the "1" does at the end of script?awk -F, '{$1=++i FS $1;} 1' OFS=, file

    ReplyDelete
    Replies
    1. 1 means True which will result in every line getting printed.

      Delete
  2. how to print different values in each row in a new column using

    $ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file

    ReplyDelete
    Replies
    1. You can pass the output of the awk to sed and get it like this:

      awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file | sed 's/,/\n/g'

      Delete
  3. I have a csv file in which first column is given as date and time. eg: '30/04/2014 00:00' where 00:00 is common in all rows. I need to delete 00:00 alone in every fields.

    ReplyDelete
    Replies
    1. You can use sed for this, Refer examples here:
      http://www.theunixschool.com/2014/08/sed-examples-remove-delete-chars-from-line-file.html

      Delete
  4. I used below statement to display only 20 chars in file.But its not working as expected.
    awk -F, '{$3=substr($3,0,20)}3' OFS=, SFDC_OP_TEST1.csv > SFDC_OP_TMP.csv

    Please correct if i m wrong.

    ReplyDelete
    Replies
    1. After substring command, you should do "print $3". Also remove the 3 outside.

      Delete
  5. I have a file like this
    ajith bangalore 32 50
    ram mysore 30 43
    manu indore 40 32
    shyam bhopal 32 35
    vijay indore 32 45

    And I want to replace the city of "manu" to delhi. No other lines should be changed.
    Is that possible using awk ?

    ReplyDelete
  6. Let me know how could i get only columns whose datatype is String(non-numeric)??

    ReplyDelete
  7. I have a file file like this
    40000008,10,
    40100017,22,0.0
    40100075,212,20
    40100153,20,22
    40100182,1212,50
    40100202,1313,0.0
    40100229,133,55
    40100231,3131,77
    I need to generate a new file with respect to last column. If last column is zero or null then new column should be added with B and if not with A.

    40000008,10,31,B
    40100017,22,0.0,B
    40100075,212,20,A
    40100153,20,22,A
    40100182,1212,50,A
    40100202,1313,0.0,B
    40100229,133,55,A
    40100231,3131,77,A
    Please help to get the command.

    ReplyDelete
    Replies
    1. awk -F, '{$(NF+1)=int($NF)>0?"A":"B"}1' OFS=, file

      Delete