The UNIX School: Different ways to split the file contents

In one of our earlier articles, we saw different ways to join all lines in a file. In this article, we will see different ways to split the contents of a file, that is, to break a single delimited line into multiple lines. Some refer to this as converting the columns of a file into rows.

Let us assume a file, say "a", with the following contents:

$ cat a
a,b,c,d

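1. Using sed, we can substitute every comma (,) with a newline character. This converts all the columns into rows. Note that \n in the replacement is understood by GNU sed; some other sed implementations do not interpret it.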

$ sed 's/,/\n/g' a
a
b
c
d
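The \n in the replacement depends on the sed implementation. A minimal sketch of a more portable form, assuming a POSIX sed, escapes a literal newline with a backslash instead. The function name split_commas is a hypothetical helper introduced only for this illustration.

```shell
# Hypothetical helper: split comma-separated fields onto separate lines.
# The replacement is a backslash followed by a real newline, which POSIX sed accepts.
split_commas() {
  sed 's/,/\
/g' "$@"
}

# Works with a file argument or with standard input.
printf 'a,b,c,d\n' | split_commas
```

The continuation line must start in the first column, otherwise the leading spaces would become part of the replacement text.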

2. The same can be achieved using the tr command. One thing to notice is that tr does not take a file name as an argument; it reads only from standard input, so the file is supplied through input redirection.

$ tr ',' '\n' < a
a
b
c
d
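Since tr works only on standard input, it also fits naturally at the end of a pipeline. A small sketch, where the printf command merely stands in for any command that produces comma-separated output:

```shell
# tr reads from the pipe; no file name is involved.
printf 'a,b,c,d\n' | tr ',' '\n'
```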

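3. Using awk. By default, awk reads a file record by record, and a record is a line terminated by the newline character. Here we tell awk to treat "," as the record terminator instead, by setting the special variable RS (input record separator). Hence every component between commas is treated as a record. The assignment $1=$1 makes awk rebuild the record from its fields, which removes the trailing newline from the last record, and since the result for this data is always a non-empty string, the condition is true and the record gets printed.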

$ awk '$1=$1' RS=, a
a
b
c
d
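One caveat with using $1=$1 as the pattern: a record that is empty or evaluates to zero makes the condition false, so it is silently skipped. A hedged sketch of a variant that avoids this, using made-up sample input containing a 0, rebuilds the record inside an action and prints unconditionally:

```shell
# With '$1=$1' as the pattern, the record "0" would be treated as false and not printed.
# Here the assignment happens in an action, and the always-true pattern 1 prints every record.
printf 'a,0,c\n' | awk '{ $1 = $1 } 1' RS=,
```

This prints a, 0 and c on separate lines.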

4. Another method using awk, which is more straightforward. Here we use the gsub function to replace all the occurrences of comma with a newline. awk has one more function named sub. The difference between sub and gsub is that sub replaces only the first occurrence, whereas gsub replaces all the occurrences where the regular expression matches. The trailing 1 is an always-true pattern that prints the modified line.

$ awk '{gsub(",","\n");}1' a
a
b
c
d
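To make the difference between sub and gsub concrete, here is the same command with sub in place of gsub. Only the first comma is replaced, so the rest of the line stays intact:

```shell
# sub replaces only the first match; the remaining commas are left alone.
printf 'a,b,c,d\n' | awk '{ sub(",", "\n") } 1'
# a
# b,c,d
```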

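5. Using perl. This is similar to the first awk solution. $/ is a Perl special variable holding the input record separator, which by default is the newline; we set it to ",". $\ is the output record separator, set here to a newline so that print ends each record with one. chop removes the last character of each record: the trailing comma, or the newline in the case of the last record.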

$ perl -ne 'BEGIN{$/=",";$\="\n"}chop;print;' a
a
b
c
d

6. The last solution also uses perl and is similar to the sed solution above: every comma in the line is substituted with a newline, and the line is then printed.

$ perl -ne 's/,/\n/g;print;' a
a
b
c
d

Happy splitting!!!

Tuesday, February 28, 2012
