In one of our earlier articles on awk, we saw how easily awk can parse a file and extract data from it. The shell also has features for handling text files: files whose fields are separated by white space, or CSV files in which the fields are separated by a comma. In this article, let us see how to read and parse a file field by field, or column by column, and extract data from it using the while loop of the shell. This article also explains the usage of the shell while loop.
Let us consider a file with sample contents as shown below. The file contains 3 fields: OS, the company and a random value.
$ cat file
Solaris Sun 25
Linux RedHat 30

1. The while command, in addition to running a loop until its condition becomes false, can also read a file and parse it line by line. We discussed this once in the 15 different ways to display the file contents. A simple example demonstrating it is shown below:
$ cat test.sh
#!/bin/bash
while read line
do
    echo "Line is : $line"
done < file

On running the above script:
$ ./test.sh
Line is : Solaris Sun 25
Line is : Linux RedHat 30

The script does the following: it reads the file named "file" (through the input redirection <). Every line read is stored in the variable line. Inside the while loop, the variable is printed, which contains the entire line. The read command returns false when there are no more lines to read, at which point the while loop exits.
Note: The read statement reads until it finds a newline character. This is why the while loop is able to parse the file line by line.
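One caveat with this loop: if the last line of the file does not end with a newline character, read returns false for it even though the text has already been placed in the variable, so the loop body never sees that line. A minimal sketch of a common workaround is shown below. The function name print_lines and the sample input are only assumptions for illustration; IFS= and -r are added so that leading blanks and backslashes are kept as they are.

```shell
#!/bin/bash
# Sketch: print every line of standard input, including a final line
# that has no trailing newline. print_lines is a hypothetical name.
print_lines() {
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    # read returns non-zero at end of input, but $line still holds any
    # unterminated text, so the -n test lets the loop process it.
    while IFS= read -r line || [ -n "$line" ]
    do
        echo "Line is : $line"
    done
}

# The second line deliberately has no trailing newline.
printf 'Solaris Sun 25\nLinux RedHat 30' | print_lines
```

With this input, both lines are printed. Without the extra -n test, only the Solaris line would be.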
2. Read every line into individual fields using the while loop:
$ cat test.sh
#!/bin/bash
while read f1 f2 f3
do
    echo "OS is : $f1"
    echo "Company is: $f2"
    echo "Value is : $f3"
done < file

On running the above script:
$ ./test.sh
OS is : Solaris
Company is: Sun
Value is : 25
OS is : Linux
Company is: RedHat
Value is : 30

The read command can take multiple variable names, as shown here in the form of f1, f2 and f3. The first field gets read into f1, the second into f2, and so on.
How does the shell do it? How does it manage to split the fields properly and assign them to the variables? It is because of the special shell variable IFS, which stands for Internal Field Separator. IFS in the shell plays the same role as FS, the field separator, in awk. By default, IFS contains the white space characters space, tab and newline, and any run of them counts as a single separator. Hence, the shell is able to parse the fields appropriately.
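To see the default IFS at work, here is a small sketch in which the fields are separated by several spaces and a tab instead of a single space. The function name show_fields and the sample line are made up for this illustration.

```shell
#!/bin/bash
# Sketch: with the default IFS, any run of spaces or tabs is treated
# as one separator. show_fields is a hypothetical helper name.
show_fields() {
    while read -r f1 f2 f3
    do
        # Brackets make the exact content of each variable visible.
        echo "[$f1] [$f2] [$f3]"
    done
}

# Four spaces between the first two fields, a tab before the third.
printf 'Solaris    Sun\t25\n' | show_fields
```

The three fields still land cleanly in f1, f2 and f3, and the line printed is [Solaris] [Sun] [25].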
3. What if there are fewer variables than fields? That is, what if the read command is given fewer variables than there are fields in the file?
$ cat test.sh
#!/bin/bash
while read f1 f2
do
    echo "OS is : $f1"
    echo "Remaining is: $f2"
done < file

On running the above script:
$ ./test.sh
OS is : Solaris
Remaining is: Sun 25
OS is : Linux
Remaining is: RedHat 30

What happened? The last variable in the read command consumes all the remaining fields. Similarly, try and see what happens in the reverse case, when there are more variables in the read command than fields present in the file.
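For the reverse case, a minimal sketch is given below: the read command has four variables while each line has only three fields. The function name show_extra and the label "Fourth" are assumptions made for this example.

```shell
#!/bin/bash
# Sketch: more variables than fields. show_extra is a hypothetical name.
show_extra() {
    while read -r f1 f2 f3 f4
    do
        echo "OS is : $f1"
        echo "Company is: $f2"
        echo "Value is : $f3"
        # No fourth field exists, so f4 is set to an empty string.
        echo "Fourth is : [$f4]"
    done
}

printf 'Solaris Sun 25\n' | show_extra
```

The extra variable is simply left empty, which shows up as "Fourth is : []" in the output.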
4. Change the delimiter of a file from a single space to a colon using the while loop:
#!/bin/bash
while read f1 f2 f3
do
    echo "$f1:$f2:$f3"
done < file > file1

This script also shows how to redirect the output of a while loop to a file. Here, the file is read in the while loop, and the variables read are printed using the echo statement with a colon between them.
Also, note this: "< file > file1". The while reads from file and writes to file1. The < indicates reading, > indicates writing to the output.
The output file contents will look as below:
$ cat file1
Solaris:Sun:25
Linux:RedHat:30
5. Read every line from a CSV file into individual fields using the while loop. The fields of a CSV file are separated by a comma:
Let us assume the sample file to contain data as below:
$ cat file.csv
Solaris,Sun,25
Linux,RedHat,30

The script:
$ cat test.sh
#!/bin/bash
IFS=","
while read f1 f2 f3
do
    echo "OS is : $f1"
    echo "Company is : $f2"
    echo "Value is : $f3"
done < file.csv

On running the above script:
$ ./test.sh
OS is : Solaris
Company is : Sun
Value is : 25
OS is : Linux
Company is : RedHat
Value is : 30

You got it correct!! Since IFS is what tells the read command how to split the fields, setting IFS to "," makes read split the line wherever a comma is encountered. In this way, we get the individual fields in the corresponding variables.
6. Do not change the IFS permanently. In the above example, once the IFS is changed, it remains so till the end of the script. This may lead to a problem since the code below might have been written with the default behavior of IFS in mind. It is always considered a good practice to set the IFS to the old value once the purpose is solved.
$ cat test.sh
#!/bin/bash
OLDIFS=$IFS
IFS=","
while read f1 f2 f3
do
    echo "OS is : $f1"
    echo "Company is : $f2"
    echo "Value is : $f3"
done < file.csv
IFS=$OLDIFS

OR
#!/bin/bash
while IFS="," read f1 f2 f3
do
    echo "OS is : $f1"
    echo "Company is : $f2"
    echo "Value is : $f3"
done < file.csv

In the first method, the value of IFS is saved in the variable OLDIFS, and once the operation is done, IFS is restored to its old value.
In the second method, IFS is set as a prefix to the read command in the while condition. Because of this, the new value applies only to that read command; the IFS of the shell itself is never changed, so the code after the loop still sees the old value. You can write some echo statements before and after the while loop printing the IFS value to confirm this.
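The check suggested above could be sketched as follows. Since the default IFS consists of invisible characters, this sketch compares the value saved before the loop with the value after it instead of printing them. The variable names before and after are assumptions, and the CSV data is supplied inline through a here-document so that the example runs on its own.

```shell
#!/bin/bash
# Sketch: confirm that prefixing read with IFS="," leaves the shell's
# own IFS untouched. "before" and "after" are hypothetical names.
before=$IFS

while IFS="," read -r f1 f2 f3
do
    echo "OS is : $f1"
done <<'EOF'
Solaris,Sun,25
Linux,RedHat,30
EOF

after=$IFS

if [ "$before" = "$after" ]
then
    echo "IFS is unchanged"
else
    echo "IFS was modified"
fi
```

Replacing the prefix form with a plain IFS="," line before the loop makes the same check report that IFS was modified.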
7. Read a file with dates using the while loop and extract the date components year, month and date.
Assume the file contents shown below. Each date has its components separated by a slash:
$ cat file
2012/05/21
2012/05/22

The script to parse the dates is the same as the earlier ones, except that IFS is set to a slash:
$ cat test.sh
#!/bin/bash
OLDIFS=$IFS
IFS="/"
while read f1 f2 f3
do
    echo "Year is : $f1"
    echo "Month is : $f2"
    echo "Date is : $f3"
done < file
IFS=$OLDIFS

On running the above script:
$ ./test.sh
Year is : 2012
Month is : 05
Date is : 21
Year is : 2012
Month is : 05
Date is : 22

This is now very easy to understand. Once IFS is set to a slash, we get the year, month and date components in f1, f2 and f3 respectively. Inside the while loop, we can use the variables for any purpose.
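As one example of putting the variables to use, the sketch below rewrites each date from YYYY/MM/DD into DD-MM-YYYY. The function name reformat_dates and the variable names year, month and day are chosen only for this illustration, and the target format is an assumption, not something the earlier script requires.

```shell
#!/bin/bash
# Sketch: reorder YYYY/MM/DD dates as DD-MM-YYYY.
# reformat_dates is a hypothetical helper name.
reformat_dates() {
    # IFS is set only for the read command, so nothing needs restoring.
    while IFS="/" read -r year month day
    do
        echo "$day-$month-$year"
    done
}

printf '2012/05/21\n2012/05/22\n' | reformat_dates
```

For the sample dates, this prints 21-05-2012 and 22-05-2012.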
8. Reading a file with multiple delimiters in the shell:
Assuming the sample file contents as shown below. The difference here is the 1st and 2nd fields are separated by colon, whereas the 2nd and 3rd are separated by a slash.
$ cat file
Solaris:Sun/25
Linux:RedHat/30

The script to parse the above file:
$ cat test.sh
#!/bin/bash
OLDIFS=$IFS
IFS=":/"
while read f1 f2 f3
do
    echo "OS is : $f1"
    echo "Company is: $f2"
    echo "Value is : $f3"
done < file
IFS=$OLDIFS

On running the script:
$ ./test.sh
OS is : Solaris
Company is: Sun
Value is : 25
OS is : Linux
Company is: RedHat
Value is : 30

Check out the IFS setting. As shown, IFS can hold more than one character. On setting it to a colon and a slash, read splits the line whenever it encounters either a colon or a slash. So the text up to the colon goes into the 1st variable, the text between the colon and the slash goes into the 2nd, and the rest of the line goes into the last one.
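One point worth noting here: unlike white space, every colon or slash counts as a separator of its own, so two of them side by side produce an empty field. A small sketch, assuming a made-up second line in which the company name is missing (split_fields is a hypothetical name):

```shell
#!/bin/bash
# Sketch: adjacent non-whitespace separators yield an empty field.
# split_fields is a hypothetical helper name.
split_fields() {
    while IFS=":/" read -r f1 f2 f3
    do
        # Brackets make an empty field visible in the output.
        echo "[$f1] [$f2] [$f3]"
    done
}

# The second line has nothing between the colon and the slash.
printf 'Solaris:Sun/25\nLinux:/30\n' | split_fields
```

The first line prints [Solaris] [Sun] [25], while the second prints [Linux] [] [30], with the value still landing in the third variable.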
Very useful
How to store csv values in an array?
It is a very useful article for understanding how to read data in a shell script. Very good work.
very nice and very helpful article.
Great..
Really well explained article - thanks
If the last line does not contain a "new line" character, then it will be omitted.
What if the contents of the file are not static, and differ depending on which program is going to execute the script? I want to concatenate the second column of the file, starting from the second row:
Email|prog
Email|first@pp.com
Email|second@pp.com
I want the output as:
first@pp.com;second@pp.com;
The count of email addresses may vary. Please help.
$ awk -F"|" 'NR!=1{print $2}' file | paste -sd";"
first@pp.com;second@pp.com
Nice and useful
Some powerful things right there!