The UNIX School: grep vs awk : 10 examples of pattern search

Tuesday, September 18, 2012

grep vs awk : 10 examples of pattern search

"grep vs awk" is not meant as a head-to-head comparison, because the two are not really comparable: awk is a full programming language, whereas grep is a filtering tool. However, many of the things that are done using grep together with a few more commands can be done with a single awk command. In this article, we will see the awk alternatives for frequently used grep commands.

Let us consider a file with the following contents:

$ cat file
Unix
Linux
Solaris
AIX
Ubuntu
Unix

1. To search for the pattern Linux in file:

$ grep Linux file
Linux

This is the simplest grep statement. grep followed by a pattern prints all the lines in the file that match the pattern.

$ awk '/Linux/' file
Linux

Using awk, the same thing can be done by placing the pattern within slashes.

2. To do a case-insensitive search for 'Linux':

$ grep -i linux file
Linux

-i option in grep does a case-insensitive search.

$ awk '/linux/' IGNORECASE=1 file
Linux

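IGNORECASE is a special built-in variable available only in GNU awk (gawk). When it is set to a non-zero value, regular expression matching becomes case-insensitive. Here it is assigned on the command line before the file name, so it takes effect before the file is read. Other awk implementations treat IGNORECASE as an ordinary variable, and the search stays case-sensitive.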
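
Since IGNORECASE is specific to gawk, a portable alternative is to lower-case the line before matching. The following is a minimal sketch, assuming the sample file from the top of the article is recreated in a scratch directory; the variable name ci_match is only for illustration.

```shell
# Recreate the sample file from the article in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file

# tolower() returns a lower-cased copy of the line for the comparison;
# $0 itself is untouched, so the line is printed in its original case.
ci_match=$(awk 'tolower($0) ~ /linux/' file)
echo "$ci_match"
```

tolower() is part of standard awk, so this form should behave the same in gawk, mawk and other implementations.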

3. To count total number of lines containing the pattern 'Unix':

$ grep -c Unix file
2

The -c option of grep prints the number of lines that contain the pattern. Keep in mind that this is a line count: if the pattern is present twice in a line, it is still counted once.

$ awk '/Unix/{x++;}END{print x}' file
2

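A variable x is incremented every time a line containing Unix is encountered. Once the end of the file is reached (END), the count is printed. If no line matches, x is never set and awk prints an empty line, whereas grep -c prints 0; printing x+0 instead gives 0 in that case.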
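
Two variations on the counting idea, as a rough sketch: one that prints 0 when nothing matches, and one that counts occurrences rather than lines. The pattern HPUX and the two-line input are made up for the illustration.

```shell
# Recreate the sample file from the article in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file

# x+0 forces a numeric value, so a pattern with no matches prints 0.
no_match=$(awk '/HPUX/{x++} END{print x+0}' file)

# gsub() returns the number of substitutions made. Replacing each match
# with "&" (the matched text itself) leaves the line unchanged, so the
# return value is simply the number of occurrences on that line.
occurrences=$(printf 'Unix Unix\nLinux\n' |
    awk '{n += gsub(/Unix/, "&")} END{print n+0}')

echo "$no_match $occurrences"
```

The second command counts 2 for a line that contains Unix twice, where grep -c would count 1.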

4. To get the list of filenames containing the pattern 'AIX':

$ grep -l AIX file*
file

-l option of grep does not print the pattern. It just prints the filename containing the pattern. This example also shows grep can search in multiple files.

$ awk '/AIX/{print FILENAME;nextfile}' file*
file

awk has a special variable FILENAME which contains the name of the file currently being processed. The FILENAME is printed when the pattern is first encountered. nextfile then abandons the current file and moves on to the next one. Without nextfile, if the pattern is present twice in a file, the file name also gets printed twice. nextfile is supported by gawk and most current awk implementations, but some older ones lack it.
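
For an awk without nextfile, one possible workaround is a per-file flag that is reset on the first line of each file. This is only a sketch: file2 and file3 are hypothetical extra files created for the demonstration, and unlike nextfile this version still reads each file to the end.

```shell
# Recreate the sample file plus two hypothetical extra files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file
printf '%s\n' AIX HPUX AIX > file2      # pattern present twice
printf '%s\n' Linux Ubuntu > file3      # pattern absent

# FNR restarts at 1 for every input file, so the flag is cleared there.
# The file name is printed only for the first match in each file.
names=$(awk 'FNR==1{seen=0} /AIX/ && !seen{print FILENAME; seen=1}' file*)
echo "$names"
```

Here file2 is listed once even though it contains AIX twice, and file3 is not listed at all.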

5. To print the line number along with the pattern matching line:

$ grep -n Unix file
1:Unix
6:Unix

-n option of grep is used to print the line number along with the pattern matching line.

$ awk '/Unix/{print NR":"$0}' file
1:Unix
6:Unix

awk has a special built-in variable NR which contains the number of the line being processed. NR keeps counting across all input files; the related variable FNR restarts at 1 for each file.
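
When grep -n is given several files, it prefixes each line with the file name and numbers lines per file. A sketch of an awk counterpart, using FILENAME and FNR, could look like this; file2 is a hypothetical second file created for the example.

```shell
# Recreate the sample file plus a hypothetical second file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file
printf '%s\n' Solaris Unix > file2

# FNR is the line number within the current file, whereas NR would keep
# counting across both files (the match in file2 would be numbered 8).
numbered=$(awk '/Unix/{print FILENAME":"FNR":"$0}' file file2)
echo "$numbered"
```

The result has the same file:line:text layout that grep -n Unix file file2 produces.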

6. To search for multiple patterns 'Linux' & 'Solaris' in the file:

$ grep -E 'Linux|Solaris' file
Linux
Solaris

The -E option enables extended regular expressions, in which the pipe (|) separates alternative patterns.

$ awk '/Linux|Solaris/' file
Linux
Solaris

No special option is needed for the awk command. awk regular expressions are extended regular expressions, so alternation with the pipe works by default.
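
Because awk patterns are ordinary expressions, separate regular expressions can also be combined with || and &&. The sketch below shows both; the U and n patterns are arbitrary choices made to get a visible result from the sample file.

```shell
# Recreate the sample file from the article in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file

# Logical OR of two patterns: same result as /Linux|Solaris/.
either=$(awk '/Linux/ || /Solaris/' file)

# Logical AND: lines that contain both a capital U and a lower-case n.
both=$(awk '/U/ && /n/' file)

echo "$either"
echo "$both"
```

The && form is the more interesting one, since a single grep invocation has no direct option for "both patterns on the same line".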

7. To do a negative search for a pattern 'Linux':

$ grep -v Linux file
Unix
Solaris
AIX
Ubuntu
Unix

The -v option of grep gives the inverse result, i.e., it prints all lines not containing the search pattern.

$ awk '!/Linux/' file
Unix
Solaris
AIX
Ubuntu
Unix

By placing an exclamation mark before the pattern, all the lines not containing the pattern are printed.

8. To print a line next to the pattern match and also the line containing the pattern 'Linux':

$ grep -A1 Linux file
Linux
Solaris

-A1 prints one line after each line containing the pattern. The line containing the pattern also gets printed, since that is the default grep behavior.

$ awk '/Linux/{print;getline;print}' file
Linux
Solaris

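'print;getline;print' => The first print prints the current line, which contains the pattern. getline reads the next line of input into $0. The second print prints that line. In this way, the line next to the pattern can also be printed. Note that if the matching line is the last line of the file, getline has nothing to read, $0 is left unchanged and the matching line is printed twice; with several input files, getline instead consumes the first line of the next file.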
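
An alternative that avoids getline is a countdown variable: a match sets the counter, and every line is printed while the counter is non-zero. The sketch below uses the idiom awk '/Linux/{c=2} c&&c--'; the variable n and the shell variable names are assumptions made for the illustration.

```shell
# Recreate the sample file from the article in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file

# c=2 means "print this line and the one after it". The pattern c&&c--
# is true while c is non-zero and decrements it on each printed line.
after1=$(awk '/Linux/{c=2} c&&c--' file)

# The same idea with the number of trailing lines passed in as n.
after2=$(awk -v n=2 '/Linux/{c=n+1} c&&c--' file)

# A match on the last line: the counter version prints it only once,
# where the getline version would print it twice.
eof_case=$(awk '/Unix/{c=2} c&&c--' file)

echo "$after1"; echo "$after2"; echo "$eof_case"
```

Unlike grep -A, this does not print a "--" separator between groups of lines.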

9. To print a line before the pattern match and also the line containing the pattern 'Solaris':

$ grep -B1 Solaris file
Linux
Solaris

-B1 prints one line before each line containing the pattern, along with the matching line itself.

$ awk '/Solaris/{print x;print;next}{x=$0;}' file
Linux
Solaris

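Every line that does not match is stored in the variable x (for matching lines, next skips the assignment). Hence, when the pattern matches, x contains the previous line, and printing x followed by $0 prints the previous line and the current line. If the match is on the very first line, x is still empty and a blank line is printed.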
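
A slightly more defensive variant, sketched below, remembers every line (matching or not) and skips the previous-line print when the match is on line 1. The variable name prev is an assumption for the example.

```shell
# Recreate the sample file from the article in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file

# prev is updated for every line, so it always holds the line before the
# current one. NR>1 avoids printing an empty line for a first-line match.
before=$(awk '/Solaris/{if (NR>1) print prev; print} {prev=$0}' file)

# Same program with a pattern that matches the first and the last line.
first_line=$(awk '/Unix/{if (NR>1) print prev; print} {prev=$0}' file)

echo "$before"; echo "$first_line"
```

This still differs from grep -B1 in small ways: there is no "--" separator between groups, and two consecutive matching lines would cause the first of them to be printed twice.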

10. To print the previous, the pattern matching line and next line:

$ grep -C1 Solaris file
Linux
Solaris
AIX

-C1 prints one line above and one line below each line containing the pattern, along with the matching line itself.

$ awk '/Solaris/{print x;print;getline;print;next}{x=$0;}' file
Linux
Solaris
AIX

This is just a combination of the solutions given for the previous two examples.
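
The same result can be sketched without getline by combining a remembered previous line with a countdown for the following line. The names prev and c are assumptions for the illustration, and overlapping matches are not handled as carefully as grep -C handles them.

```shell
# Recreate the sample file from the article in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Unix Linux Solaris AIX Ubuntu Unix > file

# Rule 1: on a match, print the previous line and arm the counter.
# Rule 2: print while the counter is non-zero (the match plus one line).
# Rule 3: remember the current line for the next iteration.
context=$(awk '
/Solaris/ { if (NR > 1) print prev; c = 2 }
c && c--  { print }
          { prev = $0 }
' file)
echo "$context"
```

Changing c = 2 to a larger value prints more trailing lines; printing more leading lines would need a small buffer of previous lines instead of the single prev variable.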

11 comments:

Unknown, September 19, 2012 at 3:15 AM
Do you realize that the sample file you list at the top does not actually contain the word "Linux", therefore many of the examples are incorrect. Otherwise, good article.
Unknown, September 20, 2012 at 8:51 AM
I love this blog!
vinay, October 17, 2012 at 2:07 AM
Very Good Blog. Thank You !
Ed Morton, April 13, 2013 at 10:04 PM
Many of the examples used in this page are incorrect, do not follow the advice given here. For example the correct awk answer for "To print a line next to the pattern match and also the line containing the pattern 'Linux':" is NOT "awk '/Linux/{print;getline;print}' file", instead it's "awk '/Linux/{c=2} c&&c--' file". The former will cause you all sorts of headaches in MANY situations including if you try to enhance it in future to print N subsequent lines instead of 1, or if you want to run it on multiple input files since the "getline" will jump over the first line of the next file if the pattern matched is the last line of the current file.
Unknown, May 26, 2013 at 12:25 AM
Please help with this Unix query :

I want to search for a pattern "abc" in a folder /home/rahul and want to copy all the files found to the new place /home/rahul2

dhaval.shukla.pm, October 7, 2014 at 12:11 AM
Thanks for sharing info regarding awk and grep
vivek, March 9, 2016 at 3:51 PM
This is what I was looking for. Thanks Guru
Unknown, October 27, 2018 at 9:33 AM
Hi
A question that has been a source of cranial discomfort :)
A CSV file of data containing 2 columns, the first being a LUN ID and the second a VG name:
33213600507680C80078912000000000888DA04214503IBMfcp,mxfarcvg
33213600507680C80078912000000999999DA04214503IBMfcp,mxfarcvg01
33213600507680C80078912000000000888DB04214503IBMfcp,mxamurexvg
33213600507680C80078912000000000888E304214503IBMfcp,mxbarcvg
33213600507680C80078912000000000888E204214503IBMfcp,mxrmurexvg
33213600507680C80078912000000999999E304214503IBMfcp,mxfmxrvg
33213600507680C80078912000000999999E204214503IBMfcp,mxfmxrexvg

To extract the LUN ID for all names starting with "mxr", I currently :
cat lun.csv | awk -F, '{print $2}' | grep ^mxr
Then for loop the result to get each LUN ID, laborious but works.

This works when not using a script variable:
awk -F, '$2 ~ /^mxr/{print $1" "$2}' lun.csv (which I then put into an array)

but using a script variable fails miserably and I can't figure out why.
VGNAME = script variable for the name I am extracting & "-v" defines it as an awk variable for the awk command:
awk -F, -v VGN=${VGNAME} '$2 ~ /^VGN/{print $1" "$2}' lun.csv (fails)
however:
awk -F, -v VGN=${VGNAME} '$2 ~ /VGN/{print $1" "$2}' lun.csv works but then matches 'containing' instead of 'starts with' resulting in incorrect data being extracted.

I have tried escaping and various other tricks to no avail.

Any ideas would be appreciated!

I also need an equivalent awk search example for "grep -w" if you would be so kind.
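
One possible answer to the question above, sketched under assumptions: inside /.../ the text VGN is a literal string rather than the awk variable, so the regular expression has to be built from a string instead. The shortened lun.csv contents, the LUN ids and the sample words below are made up for the illustration, and the second command is only an approximation of grep -w.

```shell
# Hypothetical, shortened stand-in for lun.csv (LUN id, VG name).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'LUN1,mxfarcvg' 'LUN2,mxrmurexvg' 'LUN3,amxrvg' > lun.csv
VGNAME=mxr

# "^" VGN concatenates the anchor with the value of the variable, and the
# resulting string is used as a dynamic regular expression.
starts_with=$(awk -F, -v VGN="$VGNAME" '$2 ~ ("^" VGN) {print $1" "$2}' lun.csv)

# Rough grep -w equivalent: the word must be preceded and followed by a
# non-word character (or the start or end of the line).
whole_word=$(printf '%s\n' 'Unix' 'Unixes' 'my Unix box' 'notUnix' |
    awk '/(^|[^A-Za-z0-9_])Unix([^A-Za-z0-9_]|$)/')

echo "$starts_with"; echo "$whole_word"
```

If the variable can contain regular expression metacharacters, a plain string comparison such as index($2, VGN) == 1 may be a safer way to test for "starts with".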