In one of our earlier articles, we discussed joining all lines in a file and joining every 2 lines in a file. In this article, we will see how to join lines based on a pattern, that is, how to join lines on encountering a pattern, using awk or gawk.
Let us assume a file with the following contents. The file contains lines with START in between. We have to join all the lines following each occurrence of the pattern START.
$ cat file
START
Unix
Linux
START
Solaris
Aix
SCO
1. Join the lines following the pattern START without any delimiter.
$ awk '/START/{if (NR!=1)print "";next}{printf $0}END{print "";}' file
UnixLinux
SolarisAixSCO
Basically, what we are trying to do is: accumulate the lines following a START and print them on encountering the next START. /START/ searches for lines containing the pattern START, and the commands within its {} run only on those lines. For every START line except the first one (NR!=1), print "" outputs a newline, which terminates the previous joined line. Without this condition, the output would begin with a blank line, because the very first line of the file is a START.
The next command skips the remaining rules for the START lines, so the second set of braces {} applies only to the lines not containing START. That part simply prints each line without a terminating newline character (printf), and as a result all the lines after a START end up on the same line. The END block prints a final newline; without it, the shell prompt would appear at the end of the last line of output.
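The one-liner in example 1 can be spread over several lines with comments, which makes its two rule blocks easier to follow. This is a sketch rather than the article's exact command: it wraps the program in a shell function (the name join_after_start is made up for illustration) and uses printf "%s", $0 instead of printf $0, so that a line containing a % sign is printed literally instead of being read as a format string.

```shell
#!/bin/sh
# Sketch: example 1 written out with comments. Reads the file named in the
# arguments, or standard input if no argument is given.
join_after_start() {
  awk '
    /START/ {
      # A START line ends the previous group: emit a newline to finish it,
      # except on line 1, where no group has been printed yet.
      if (NR != 1) print ""
      next            # skip the second block for START lines
    }
    {
      # Print the line with no trailing newline. "%s" keeps a literal "%"
      # in the data from being treated as a printf format directive.
      printf "%s", $0
    }
    END {
      print ""        # terminate the last group
    }
  ' "$@"
}

# Demo with the sample data from the article.
printf 'START\nUnix\nLinux\nSTART\nSolaris\nAix\nSCO\n' | join_after_start
```

Called as join_after_start file, it behaves like the one-liner on the sample file.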
2. Join the lines following the pattern START with space as delimiter.
$ awk '/START/{if (NR!=1)print "";next}{printf "%s ",$0}END{print "";}' file
Unix Linux
Solaris Aix SCO
This is the same as the earlier one, except that it uses the format specifier %s to add a space after each line, the space being the delimiter in this case.
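Example 2 leaves a trailing space at the end of each joined line, because every line is printed with a space after it. A sketch that prints the delimiter before each line except the first one in a group avoids that, and takes the delimiter as a parameter. The function name join_with_delim and the awk variables d and sep are illustrative choices, not part of the article.

```shell
#!/bin/sh
# Sketch: join the lines after each START with a configurable delimiter
# and no trailing delimiter. Usage: join_with_delim DELIM [file...]
join_with_delim() {
  delim=$1
  shift
  awk -v d="$delim" '
    /START/ {
      if (NR != 1) print ""   # finish the previous group
      sep = ""                # no delimiter before the first line of a group
      next
    }
    {
      printf "%s%s", sep, $0
      sep = d                 # later lines in the group are preceded by d
    }
    END { print "" }
  ' "$@"
}

printf 'START\nUnix\nLinux\nSTART\nSolaris\nAix\nSCO\n' | join_with_delim ' '
```

Because awk -v processes escape sequences, passing '\t' as the delimiter gives tab-separated output.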
3. Join the lines following the pattern START with comma as delimiter.
$ awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}' file
Unix,Linux
Solaris,Aix,SCO
Here, we build a complete line in the variable x and print x whenever a new START pattern is encountered. The expression x=(!x)?$0:x","$0 uses the ternary (conditional) operator, as in C or Perl: if x is empty, assign the current line ($0) to x; otherwise, append a comma and the current line to x. As a result, x holds the lines following the START pattern joined with commas. In the END block, x is printed once more, because there is no further START line to trigger printing of the last group.
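One caveat with the (!x) test: awk treats a value that looks like the number 0 as false, so a data line consisting of just 0 would be mistaken for an empty accumulator and silently discarded. A sketch of example 3 that counts the lines of the current group in n instead of testing x (join_commas, flush and n are illustrative names):

```shell
#!/bin/sh
# Sketch: example 3 with an explicit line counter instead of (!x), so a
# data line of "0" is kept. Reads the named files or standard input.
join_commas() {
  awk '
    function flush() { if (n > 0) print x }   # print the pending group, if any
    /START/ { flush(); x = ""; n = 0; next }  # start a new, empty group
    { x = (n == 0) ? $0 : x "," $0; n++ }     # first line, or comma + line
    END { flush() }                           # the last group has no closing START
  ' "$@"
}

printf 'START\nUnix\nLinux\nSTART\nSolaris\nAix\nSCO\n' | join_commas
```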
4. Join the lines following the pattern START with comma as delimiter, including the pattern-matching line itself.
$ awk '/START/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' file
START,Unix,Linux
START,Solaris,Aix,SCO
The difference here is the missing next statement. Because next is not there, the commands present in the second set of curly braces are applicable for the START line as well, and hence it also gets concatenated.
5. Join the lines following the pattern START with comma as delimiter, and also print the pattern-matching line, but on a line of its own rather than joined with the others.
$ awk '/START/{if (x)print x;print;x="";next}{x=(!x)?$0:x","$0;}END{print x;}' file
START
Unix,Linux
START
Solaris,Aix,SCO
Here, instead of adding START to the variable x, the START line is printed on its own. As a result, the START line comes out separately, and the remaining lines get joined.
I'm a mainframe and SAS programmer trying to learn UNIX Shell Scripts:
I cannot get the following command to work properly (it does not concatenate the records after the START delimiter; rather, it shows the last record before the START and blends the last record with leftovers from the first record). Please help; thanks:
awk '/START/{if (NR!=1)print "";next}{printf $0}END{print "";}' file
Al Diovanni
Looks like your file contains ^M (carriage return) characters, i.e. DOS/Windows line endings. Run the dos2unix command on your file before running the awk command.
Thanks for your help: I got it to work. I could not install dos2unix (using yum) because I don't yet know how to get access to my Fedora Linux root directory; however, someone gave me this dos2unix-equivalent command, which fixed my DOS text input file:
tr -d '\r' < awk_merge_join_input_file > awk_merge_join_input_file_new
So now when I do:
awk '/START/{if (NR!=1)print "";next}{printf $0}END{print "";}' file
I get the records sandwiched between the START pattern delimiters properly concatenated onto one line.
Thanks!
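Instead of converting the file first, the carriage returns can also be removed inside awk. Each "\r" printed without a newline sends the cursor back to the start of the line, so the next record overwrites the previous one on screen, which matches the "blended" output described above. A sketch of example 1 that strips a trailing \r from every line before the other rules run (join_dos is an illustrative name):

```shell
#!/bin/sh
# Sketch: example 1 for files with DOS (CRLF) line endings.
# sub(/\r$/, "") removes a trailing carriage return (^M) from $0 first.
join_dos() {
  awk '
    { sub(/\r$/, "") }
    /START/ { if (NR != 1) print ""; next }
    { printf "%s", $0 }
    END { print "" }
  ' "$@"
}

printf 'START\r\nUnix\r\nLinux\r\nSTART\r\nSolaris\r\nAix\r\nSCO\r\n' | join_dos
```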
Hi Experts, I am trying to achieve the results below. Please help:
For input:
START 1
UNIX
Linux
START 2
Solaris
Aix
SCO
Output should be:
START 1~UNIX
START 1~Linux
START 2~Solaris
START 2~Aix
START 3~SCO
awk '/START/{x=$0;next}{print x"~"$0;}' file
Thanks a lot Guru!
This command works fine in all situations except when the input is like this:
START 1
START 2
Unix
Linux
In this case, the output is:
START 2~Unix
START 2~Linux
but the expected output is:
START 1
START 2~Unix
START 3~Linux
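The one-liner in the answer stores the latest START line in x and prints it in front of every other line; a START line is never printed on its own, which is why START 1 disappears when two START lines are adjacent. A sketch that also prints a header by itself when no data line follows it, assuming that is the intent. It does not reproduce the renumbered "START 3" in the expected output, and the names prefix_lines, hdr and pending are illustrative:

```shell
#!/bin/sh
# Sketch: prefix each data line with the latest START header, and print a
# header alone when no data line follows it before the next header or EOF.
prefix_lines() {
  awk '
    /START/ {
      if (pending) print hdr   # previous header had no data lines
      hdr = $0; pending = 1
      next
    }
    { print hdr "~" $0; pending = 0 }
    END { if (pending) print hdr }   # trailing header with no data lines
  ' "$@"
}

printf 'START 1\nSTART 2\nUnix\nLinux\n' | prefix_lines
```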
Hi, I have a problem that is similar to this. I have this:
>@1M1U7:00212:00595
_F_48_30.5625
CAATGGGAAATCTTAGGCACTTCTTCCGGCGAATTTCGCGCCATTTCT
>@1M1U7:00241:00593
_F_48_30.3958333333
CAATGGGAAATCTTAGGCACTTCTTCCGGCGAATTTCGCGCCATTTCT
and I want to get this:
>@1M1U7:00212:00595_F_48_30.5625
CAATGGGAAATCTTAGGCACTTCTTCCGGCGAATTTCGCGCCATTTCT
>@1M1U7:00241:00593_F_48_30.3958333333
CAATGGGAAATCTTAGGCACTTCTTCCGGCGAATTTCGCGCCATTTCT
awk '/^>/{a=$0;getline x;$0=a x;}1' file
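The one-liner saves the header in a, reads the following line into x with getline, rebuilds $0 as the two joined, and the trailing 1 prints every line. One weakness of getline: if a header happens to be the last line of the file, getline x fails and x keeps the value it had for the previous header, so the wrong suffix gets appended. A sketch that avoids getline and uses a flag to remember that the previous line was a header (join_headers, hdr and want are illustrative names):

```shell
#!/bin/sh
# Sketch: join each ">" header line with the line right after it,
# without getline. "want" is set while a header waits for its partner.
join_headers() {
  awk '
    /^>/ && !want { hdr = $0; want = 1; next }  # remember header, wait for next line
    want { print hdr $0; want = 0; next }       # append the line that follows it
    { print }                                   # sequence lines pass through
    END { if (want) print hdr }                 # header on the last line: print as-is
  ' "$@"
}

printf '>@1M1U7:00212:00595\n_F_48_30.5625\nCAATGG\n' | join_headers
```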