In one of our earlier articles in the awk series, we saw the basic usage of awk or gawk. In this article, we will see how to search for a pattern in a file using awk, either in the entire line or in a specific column.
Let us consider a CSV file with the following contents. The data in the file is a kind of expense report. Let us see how to use awk to filter data from the file.
$ cat file
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600
1. To print only the records containing Rent:
$ awk '$0 ~ /Rent/{print}' file
Rent,900
The ~ symbol is the operator used for pattern matching. The / / symbols delimit the pattern. The above command means: if the line ($0) matches (~) the pattern Rent, print the line. The print statement by default prints the entire line. This simulates the grep command using awk.
2. By default, awk matches a pattern against the entire line, and hence $0 can be left off as shown below:
$ awk '/Rent/{print}' file
Rent,900
3. Since awk prints the line by default when the condition is true, the print statement can also be left off:
$ awk '/Rent/' file
Rent,900
In this example, whenever the line contains Rent, the condition becomes true and the line gets printed.
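As a quick check of the three forms shown so far, the sketch below recreates the sample file and runs each form, plus grep for comparison. The variable names are only for illustration.

```shell
# Recreate the sample expense file from the article.
printf '%s\n' 'Medicine,200' 'Grocery,500' 'Rent,900' 'Grocery,800' 'Medicine,600' > file

# The three awk forms, from most explicit to most compact.
explicit=$(awk '$0 ~ /Rent/{print}' file)
no_dollar0=$(awk '/Rent/{print}' file)
no_action=$(awk '/Rent/' file)

# grep, for comparison with the awk versions.
with_grep=$(grep 'Rent' file)

# Print all four results, one per line.
printf '%s\n' "$explicit" "$no_dollar0" "$no_action" "$with_grep"
```

All four commands select the same single record from this file.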
4. In the above examples, the pattern is matched against the entire line, although the pattern we are looking for occurs only in the first column. This might lead to incorrect results if the file contains the word Rent in other places. To match a pattern only in the first column ($1):
$ awk -F, '$1 ~ /Rent/' file
Rent,900
The -F option in awk is used to specify the field delimiter. It is needed here since we are working with specific columns, which can be split out only when the delimiter is known.
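To see why matching on the whole line can give incorrect results, here is a small sketch. The file notes.csv and its third "note" column are made up for this illustration and are not part of the sample file above.

```shell
# Hypothetical file: a third column holds a free-text note.
printf '%s\n' \
  'Rent,900,monthly' \
  'Grocery,500,bought before Rent was due' \
  'Medicine,200,pharmacy' > notes.csv

# Whole-line match: also picks up the Grocery record, because its note mentions Rent.
whole_line=$(awk '/Rent/' notes.csv)

# First-column match: only the record whose first field contains Rent.
first_col=$(awk -F, '$1 ~ /Rent/' notes.csv)

printf 'whole line:\n%s\n' "$whole_line"
printf 'first column:\n%s\n' "$first_col"
```

The whole-line version returns two records here, while the first-column version returns only the Rent record.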
5. The above pattern will also match if the first column contains "Rents". To match exactly the word "Rent" in the first column:
$ awk -F, '$1=="Rent"' file
Rent,900
6. To print only the 2nd column for all "Medicine" records:
$ awk -F, '$1 == "Medicine"{print $2}' file
200
600
7. To match the patterns "Rent" or "Medicine" in the file:
$ awk '/Rent|Medicine/' file
Medicine,200
Rent,900
Medicine,600
8. Similarly, to match the above pattern only in the first column:
$ awk -F, '$1 ~ /Rent|Medicine/' file
Medicine,200
Rent,900
Medicine,600
9. What if the first column contains the word "Medicines"? The above example will match it as well. To match exactly Rent or Medicine only:
$ awk -F, '$1 ~ /^Rent$|^Medicine$/' file
Medicine,200
Rent,900
Medicine,600
The ^ symbol anchors the match to the beginning of the string being tested, and $ anchors it to the end. Here that string is the first column, not the whole line. ^Rent$ therefore matches only when the first column is exactly Rent, and the same holds for Medicine.
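The effect of the anchors is easier to see with data that actually contains the longer words. The sketch below uses a made-up file, plurals.csv, and also shows a grouped form of the same pattern, ^(Rent|Medicine)$, which should behave the same way as the two separately anchored alternatives.

```shell
# Hypothetical file with plural variants of the category names.
printf '%s\n' 'Rent,900' 'Rents,100' 'Medicine,200' 'Medicines,50' > plurals.csv

# Unanchored: matches all four records, including Rents and Medicines.
loose=$(awk -F, '$1 ~ /Rent|Medicine/' plurals.csv)

# Anchored alternatives, as in the article.
anchored=$(awk -F, '$1 ~ /^Rent$|^Medicine$/' plurals.csv)

# Same idea with one pair of anchors around a group.
grouped=$(awk -F, '$1 ~ /^(Rent|Medicine)$/' plurals.csv)

printf 'loose:\n%s\n' "$loose"
printf 'anchored:\n%s\n' "$anchored"
printf 'grouped:\n%s\n' "$grouped"
```

Both anchored forms keep only the Rent and Medicine records and drop the plural ones.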
10. To print the lines which do not contain the pattern Medicine:
$ awk '!/Medicine/' file
Grocery,500
Rent,900
Grocery,800
The ! is used to negate the pattern search.
11. To negate the pattern match on the first column alone:
$ awk -F, '$1 !~ /Medicine/' file
Grocery,500
Rent,900
Grocery,800
12. To print all records whose amount is greater than 500:
$ awk -F, '$2>500' file
Rent,900
Grocery,800
Medicine,600
13. To print the Medicine record only if it is the 1st record:
$ awk 'NR==1 && /Medicine/' file
Medicine,200
This is how the logical AND (&&) condition is used in awk. The record is printed only if it is the first record (NR==1) and it is a Medicine record.
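Another common use of NR together with && is skipping a header line before applying a numeric filter. The sketch below assumes a variant of the sample file with an Item,Amount header added. The header and the file name with_header.csv are made up for this illustration.

```shell
# Hypothetical variant of the sample file with a header line added.
printf '%s\n' 'Item,Amount' 'Medicine,200' 'Grocery,500' 'Rent,900' 'Grocery,800' 'Medicine,600' > with_header.csv

# NR>1 skips the header; $2>500 then filters the remaining records.
over_500=$(awk -F, 'NR>1 && $2>500' with_header.csv)

# NR==1 on its own selects just the header line.
header=$(awk 'NR==1' with_header.csv)

printf '%s\n' "$header" "$over_500"
```

Without the NR>1 guard, the second field of the header is not a number, so awk compares it as a string and the header may slip into the output.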
14. To print all those Medicine records whose amount is greater than 500:
$ awk -F, '/Medicine/ && $2>500' file
Medicine,600
15. To print all the Medicine records and also those records whose amount is greater than 600:
$ awk -F, '/Medicine/ || $2>600' file
Medicine,200
Rent,900
Grocery,800
Medicine,600
This is how the logical OR (||) condition is used in awk.
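The &&, || and ! operators can also be combined in one condition, with parentheses to make the grouping explicit. The two conditions below are only illustrations built on the sample file, not examples from the list above.

```shell
# Recreate the sample expense file from the article.
printf '%s\n' 'Medicine,200' 'Grocery,500' 'Rent,900' 'Grocery,800' 'Medicine,600' > file

# All Medicine records, plus Grocery records above 500.
mixed=$(awk -F, '$1 == "Medicine" || ($1 == "Grocery" && $2 > 500)' file)

# Negating a whole OR condition: neither Medicine nor above 600.
negated=$(awk -F, '!($1 == "Medicine" || $2 > 600)' file)

printf 'mixed:\n%s\n' "$mixed"
printf 'negated:\n%s\n' "$negated"
```

With the sample data, the first condition selects Medicine,200, Grocery,800 and Medicine,600, and the second leaves only Grocery,500.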
Thanks, that helped me! Are regex matching against fields and complex boolean patterns allowed in POSIX awk?
In the example above for the expenses:
> Medicine,300
Grocery,800
Rent,900
When I try to grep Rent using following commands, the behavior is different:
> awk -F"," '$1 ~ /Rent/' expenses
o/p - Rent,900
> awk -F"," '$1 ~ /^Rent$/' expenses
No o/p
> awk '/^Rent$/' expenses
No o/p
> awk '/Rent/' expenses
o/p - Rent,900
Thank you,
Can I use conditions with match expressions inside awk?
# awk '/thank you/' thankfulnote.txt
The patterns are very useful for me, thank you!
#
Thanks for sharing this amazing post with us. I have read your article and found very important information.