Monday, March 19, 2012

Join every 2 lines in a file



  In this article, we will see the different ways in which we can join every two lines in a file, using a comma as the delimiter.
    Assume a file with the contents shown below. The file holds stats for countries. The only issue is that each stat is not on the same line as its country name; it is on the next line instead.
$ cat file
USA
442
India
249
UK
50
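1. The paste command can read standard input. Each "-" consumes one line, so two "-" consume two lines per output row, and "-d," joins them using a comma. The option is placed before the "-" operands, since not every paste implementation accepts options after them.
$ paste -d, - - < file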
USA,442
India,249
UK,50
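The number of "-" operands decides how many lines go into each output row, so the same command extends beyond pairs. A minimal sketch, where join_pairs and join_triples are hypothetical helper names used only for illustration:

```shell
# join_pairs: hypothetical helper; reads stdin and joins every two lines with a comma.
# The option comes before the "-" operands, the order POSIX specifies.
join_pairs() {
    paste -d, - -
}

# join_triples: hypothetical helper; three "-" operands consume three lines per row.
join_triples() {
    paste -d, - - -
}

printf 'USA\n442\nIndia\n249\nUK\n50\n' | join_pairs
# USA,442
# India,249
# UK,50
```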
2. The traditional way: paste with the "-s" option, which joins all the lines of a file into one. "-d" in paste can take a list of delimiters, which it cycles through. The delimiters specified here are a comma and a newline character. This means the first and second lines are joined with a comma, the second and third with a newline character, and then the pattern repeats.
$ paste -s -d",\n" file
USA,442
India,249
UK,50
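Because the delimiter list is cycled, lengthening it changes how many lines end up on each row. A small sketch, assuming we want three lines per row; join_triples_serial is a hypothetical name:

```shell
# join_triples_serial: hypothetical helper; reads stdin ("-") in serial mode.
# The delimiter list ',,\n' is cycled: comma, comma, newline, comma, comma, ...
join_triples_serial() {
    paste -s -d ',,\n' -
}

printf 'a\nb\nc\nd\ne\nf\n' | join_triples_serial
# a,b,c
# d,e,f
```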
3. The sed way. 'N' appends the next line to the pattern space, separated by a newline. We then replace that newline with a comma.
$ sed 'N;s/\n/,/' file
USA,442
India,249
UK,50
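If the file may have an odd number of lines, the last line has no partner. GNU sed still prints it, but POSIX lets 'N' quit without printing when there is no next line, so some sed implementations may drop it. A hedged sketch of the common workaround, running 'N' on every line except the last ('$!N'); join_pairs_sed is a hypothetical name:

```shell
# join_pairs_sed: hypothetical helper; reads stdin and joins every two lines.
# '$!N' appends the next line only when the current line is not the last one,
# so an unpaired final line is printed unchanged.
join_pairs_sed() {
    sed '$!N;s/\n/,/'
}

printf 'USA\n442\nIndia\n' | join_pairs_sed
# USA,442
# India
```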
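4. Perl with the "-p" option reads the file line by line and prints each line by default. All we do here is replace the newline character with a comma if the line number ($.) is odd. "-n" is not needed alongside "-p", since "-p" takes precedence.
$ perl -pe 'if($.%2){s/\n/,/;}' file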
USA,442
India,249
UK,50
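5. In the article on different ways to display file contents, we saw one way using xargs. The "-L" argument tells xargs how many input lines to pass to each command run; since no command is given, xargs runs echo, so every 2 lines come out joined by a space. Without "-L", all lines are joined. awk then rebuilds each line with the output field separator (OFS) set to a comma; the assignment $1=$1 is what forces the rebuild.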
$ xargs -L 2 < file | awk '$1=$1' OFS=,
USA,442
India,249
UK,50
6. The awk method. Print each odd line using printf (no newline) followed by a comma, and do a normal print for each even line (print adds a newline by default).
$ awk 'NR%2{printf "%s,",$0;next}{print;}' file
USA,442
India,249
UK,50
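The awk approach generalizes to any group size and any delimiter. The following is only a sketch under stated assumptions: join_lines is a hypothetical helper, its two arguments (lines per group, delimiter) are our own convention, and the delimiter is passed with "-v", so awk interprets backslash escapes in it.

```shell
# join_lines: hypothetical helper. Usage: join_lines N DELIM < input
# Joins every N input lines into one output line, separated by DELIM.
join_lines() {
    awk -v n="$1" -v d="$2" '
        # First line of a group: close the previous group, then start a new one.
        (NR - 1) % n == 0 { if (NR > 1) printf "\n"; printf "%s", $0; next }
        # Any other line of the group: prefix it with the delimiter.
        { printf "%s%s", d, $0 }
        # Terminate the last group, even when it is incomplete.
        END { if (NR) printf "\n" }
    '
}

printf 'USA\n442\nIndia\n249\nUK\n50\n' | join_lines 2 ','
# USA,442
# India,249
# UK,50
```

Since the delimiter is written before each joined line rather than after it, an unpaired last line is printed without a trailing delimiter. Calling join_lines 2 ' ' joins the pairs with a space instead of a comma.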

2 comments:

  1. If I don't want a comma (,) between USA and 442, and likewise for the others, what do I need to do?

  2. Change the ',' to whatever you like, including a space.

    Even shorter: `awk 'ORS= NR%2 ? ",":"\n" '` -- redefines the Output Record Separator (ORS) to "," for odd lines and to a newline for even lines.
