The UNIX School: grep vs awk

In this article, we will see more awk alternatives for frequently used grep commands. This is a continuation of grep vs awk - Part 1. Let us consider the sample file shown below (note the empty line after Unix):

$ cat file
Unix

Linux
Uniix
Solaris
AIX
ArchLinux
Ubuntu

1. Search for a pattern present in a variable

$ x="Linux"
$ grep "$x" file
Linux
ArchLinux

The variable x contains the search pattern. With grep, the shell variable can be used directly. Since grep matches the pattern anywhere in the line, ArchLinux is printed as well.

$ awk -v var="$x" '$0 ~ var' file
Linux
ArchLinux

In awk, a shell variable cannot be used directly inside the single-quoted program. It needs to be passed to awk from the shell using the -v option. See the article on passing variables to awk for more.
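
Both commands above treat the variable as a regular expression. When the search text may contain characters such as a dot, a literal match can be safer. The following is a minimal sketch, assuming a throwaway sample file created with mktemp; the file contents and the variable names regex_out and fixed_out are made up for illustration. It uses the standard awk function index(), which plays a role similar to grep -F.

```shell
# Hypothetical sample data, written to a temporary file for this sketch only.
tmp=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'Linux' > "$tmp"

x='a.c'   # the dot is a regex metacharacter

# Regex match: "." matches any character, so both a.c and abc are selected.
regex_out=$(awk -v var="$x" '$0 ~ var' "$tmp")

# Literal match: index() returns the position of var in $0, or 0 if absent.
fixed_out=$(awk -v var="$x" 'index($0, var)' "$tmp")

printf 'regex:\n%s\nfixed:\n%s\n' "$regex_out" "$fixed_out"
rm -f "$tmp"
```

Note that -v also interprets backslash escape sequences in the value, so a pattern containing backslashes may need extra care.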

2. Search for lines beginning with a specific pattern:

$ grep '^S' file
Solaris

The ^ symbol anchors the match to the beginning of the line. In this example, we search for lines beginning with S.

$ awk '/^S/' file
Solaris

3. Search for lines ending with a specific pattern:

$ grep 'x$' file
Unix
Linux
Uniix
ArchLinux

The $ symbol anchors the match to the end of the line, so only lines ending with x are printed.

$ awk '/x$/' file
Unix
Linux
Uniix
ArchLinux

4. Search for an exact word in a file:

$ grep -w Linux file
Linux

The -w option makes grep match the pattern only as a whole word. The line ArchLinux is not returned because 'Linux' is only part of that word.

$ awk '{for(i=1;i<=NF;i++)if($i=="Linux"){print;break}}' file
Linux

In awk, we loop over the fields of each line and compare every field against the word. The break stops at the first matching field, so a line containing the word more than once is printed only once.
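
The field comparison is not a full equivalent of grep -w: awk splits fields on whitespace, so a word followed by punctuation, as in "Linux," is a different field value. One possible alternative is a regex with explicit word boundaries. This is a sketch only, assuming ASCII word characters (letters, digits, underscore) and a made-up sample file; the names field_out and word_out are illustrative.

```shell
# Hypothetical sample data for this sketch only.
tmp=$(mktemp)
printf '%s\n' 'Linux' 'ArchLinux' 'Linux, Unix' 'Linux-based' > "$tmp"

# Field comparison: only a whitespace-delimited field equal to "Linux" matches.
field_out=$(awk '{for(i=1;i<=NF;i++) if($i=="Linux"){print; break}}' "$tmp")

# Regex with explicit boundaries: "Linux" must be preceded and followed
# by a non-word character or by the start/end of the line.
word_out=$(awk '/(^|[^A-Za-z0-9_])Linux([^A-Za-z0-9_]|$)/' "$tmp")

printf 'fields:\n%s\nregex:\n%s\n' "$field_out" "$word_out"
rm -f "$tmp"
```

Here the field loop selects only the first line, while the regex also selects "Linux, Unix" and "Linux-based", which is what grep -w Linux would select too.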

5. Get the count of lines matching a pattern:

$ grep -c ix file
2

The -c option of grep gives the count of lines matching the pattern. The "ix" pattern is matched in 2 lines.

$ awk '/ix/{x++}END{print x}' file
2

In awk, whenever the pattern "ix" is matched, a variable x is incremented. At the end of the file processing, x contains the count of lines matching the pattern.
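
One difference shows up when no line matches: grep -c prints 0, while the awk command prints an empty line because x was never set. A small sketch of one way to handle this, assuming a made-up two-line sample file and the illustrative variable names plain and forced:

```shell
# Hypothetical sample data for this sketch only.
tmp=$(mktemp)
printf '%s\n' 'Unix' 'Linux' > "$tmp"

# No line contains "zz": x is never set, so "print x" prints an empty line.
plain=$(awk '/zz/{x++}END{print x}' "$tmp")

# x+0 forces a numeric context, so the unset variable prints as 0.
forced=$(awk '/zz/{x++}END{print x+0}' "$tmp")

printf 'plain=[%s] forced=[%s]\n' "$plain" "$forced"   # plain=[] forced=[0]
rm -f "$tmp"
```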

6. Search for a pattern which exactly matches the whole line:

$ grep -x Linux file
Linux

grep has the -x option, which selects a line only when the pattern matches the whole line.

$ awk '/^Linux$/' file
Linux

In awk, we use the ^ (beginning of line) and $ (end of line) metacharacters together to match the whole line.

7. Search for non-empty lines or lines containing at least one character:

$ grep . file
Unix
Linux
Uniix
Solaris
AIX
ArchLinux
Ubuntu

The . matches any single character. Since an empty line has no characters, it does not match.

$ awk NF file
Unix
Linux
Uniix
Solaris
AIX
ArchLinux
Ubuntu

NF holds the number of fields in the current line. Since an empty line has NF equal to 0, which awk treats as false, it is not printed.
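
The two commands differ on lines that contain only spaces or tabs: grep . matches them, but their NF is 0, so awk NF skips them. The sketch below illustrates this, assuming a made-up sample file with one whitespace-only line; the variable names are illustrative. Testing the line length is one way to get closer to the grep behaviour.

```shell
# Hypothetical sample data: the third line contains three spaces.
tmp=$(mktemp)
printf '%s\n' 'Unix' '' '   ' 'AIX' > "$tmp"

grep_count=$(grep -c . "$tmp")                               # 3
nf_count=$(awk 'NF{n++}END{print n+0}' "$tmp")               # 2
len_count=$(awk 'length($0) > 0{n++}END{print n+0}' "$tmp")  # 3

printf 'grep=%s NF=%s length=%s\n' "$grep_count" "$nf_count" "$len_count"
rm -f "$tmp"
```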

8. Extract part of the line instead of the entire line:

$ grep -o "..$" file
ix
ux
ix
is
IX
ux
tu

By default, grep prints the entire line that matches the pattern. With the -o option, only the matched part of the line is printed. The pattern .. matches any 2 characters, so ..$ extracts the last 2 characters of each line.

$ awk 'NF{print substr($0,length-1);}' file
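ix
ux
ix
is
IX
ux
tu

awk uses the substring function substr to do the extraction. Here, length-1 is the position of the second-to-last character, so substr returns the last 2 characters, and the NF condition skips the empty line.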
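
When the part to extract is described by a regex rather than a fixed position, the standard awk function match() can be combined with substr. The following is a sketch under the assumption of a small made-up sample file; the variable name out is illustrative. match() sets RSTART and RLENGTH to the position and length of the first match in the line.

```shell
# Hypothetical sample data for this sketch only.
tmp=$(mktemp)
printf '%s\n' 'Unix' '' 'Solaris' 'AIX' > "$tmp"

# match() returns 0 for the empty line, so nothing is printed for it.
out=$(awk 'match($0, /..$/){print substr($0, RSTART, RLENGTH)}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Unlike grep -o, which prints every non-overlapping match on a line, this prints only the first match per line.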

Let us consider another sample file with 2 columns in it:

$ cat file
Unix 2
Linux 3
Uniix 4
Solaris 5

9. Extract the 2nd column from the lines containing a specific pattern:

$ grep Uni file | cut -d" " -f2
2
4

grep extracts the lines containing the pattern, and cut extracts only the 2nd column.

$ awk '/Uni/{print $2}' file
2
4

In awk, searching for a pattern and printing a particular column are both done in a single command.
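
The two approaches can give different results when columns are separated by more than one space: cut -d" " treats every single space as a delimiter, while awk by default splits on runs of whitespace. A minimal sketch, assuming a made-up sample file whose first line has two spaces between the columns; the variable names are illustrative.

```shell
# Hypothetical sample data: the first line has two spaces between columns.
tmp=$(mktemp)
printf '%s\n' 'Unix  2' 'Uniix 4' > "$tmp"

# cut sees an empty 2nd field on the first line (between the two spaces).
cut_out=$(grep Uni "$tmp" | cut -d" " -f2)

# awk collapses the run of spaces, so $2 is the number on both lines.
awk_out=$(awk '/Uni/{print $2}' "$tmp")

printf 'cut:\n%s\nawk:\n%s\n' "$cut_out" "$awk_out"
rm -f "$tmp"
```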

10. Extract the first 3 characters from lines containing a specific pattern:

$ grep x file | cut -c 1-3
Uni
Lin
Uni

grep extracts the lines containing the pattern 'x', and cut extracts the first three characters of each.

$ awk '/x/{print substr($0,1,3)}' file
Uni
Lin
Uni

To learn more about awk, you can refer to the awk related articles.

Wednesday, January 11, 2017

grep vs awk - Part 2
