Tuesday, March 15, 2011

comm - The beautiful comparison



comm is one of the most underused commands in UNIX. I have often seen programmers resort to writing a shell script for something comm could have done easily. Let us see how to use the comm command.

Say you have a file, file1, containing the list of files present in one version, and a second file, file2, containing the list of files received in the next version:


$ cat file1
f1.c
f2.c
f3.c
f4.c
f5.c
$ cat file2
f1.c
f3.c
f4.c
f6.c
f7.c
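To follow along, the two sample files can be created with `printf`. This is a minimal sketch: it writes file1 and file2 into the current directory, overwriting any files with those names, so run it in a scratch directory. Both lists are written already in sorted order, which comm requires.

```shell
# Create the two sample listings used throughout this article.
# Warning: overwrites file1 and file2 in the current directory.
printf '%s\n' f1.c f2.c f3.c f4.c f5.c > file1   # files in the old version
printf '%s\n' f1.c f3.c f4.c f6.c f7.c > file2   # files in the new version
```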
1. When you run comm on these two files, this is what you get:
$ comm file1 file2
                f1.c
f2.c
                f3.c
                f4.c
f5.c
        f6.c
        f7.c
The output is split into three columns: column 1 lists the files unique to file1, column 2 lists the files unique to file2, and column 3 lists the files common to both. comm also provides the options -1, -2 and -3, each of which suppresses the corresponding column, so you can filter the output down to just the part you need.
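The columns are not fixed-width: a column 2 line is prefixed with one tab character and a column 3 line with two. Piping the output through `sed -n l`, which prints each line unambiguously (a tab shows as `\t`, the end of the line as `$`), makes this visible. A sketch; it recreates the sample files so it runs on its own, overwriting file1 and file2 in the current directory.

```shell
# Recreate the sample files (overwrites file1 and file2 here).
printf '%s\n' f1.c f2.c f3.c f4.c f5.c > file1
printf '%s\n' f1.c f3.c f4.c f6.c f7.c > file2

# Show the tabs that separate the columns.
comm file1 file2 | sed -n l
# Expected:
# \t\tf1.c$
# f2.c$
# \t\tf3.c$
# \t\tf4.c$
# f5.c$
# \tf6.c$
# \tf7.c$
```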

2. Now say you want to list only the files that were present in the older version but not in the newer one:
$ comm -23 file1 file2
f2.c
f5.c
The -23 option suppresses columns 2 and 3 of the comm output, leaving only column 1: the files unique to file1.
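For comparison, this is roughly the kind of hand-written loop that comm -23 replaces. A minimal sketch; the function name `only_in_first` is hypothetical. For each line of the first file it runs `grep -qxF` (quiet, whole-line, fixed-string match) against the second file. Unlike comm it does not need sorted input, but it rescans the second file once per line, whereas comm makes a single pass over both sorted files.

```shell
# Recreate the sample files (overwrites file1 and file2 here).
printf '%s\n' f1.c f2.c f3.c f4.c f5.c > file1
printf '%s\n' f1.c f3.c f4.c f6.c f7.c > file2

# Hypothetical helper: print the lines of $1 that do not appear in $2.
# `--` stops grep from treating a line that starts with '-' as an option.
only_in_first() {
    while IFS= read -r line; do
        grep -qxF -- "$line" "$2" || printf '%s\n' "$line"
    done < "$1"
}

only_in_first file1 file2   # prints f2.c and f5.c, like comm -23 file1 file2
```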

3. Similarly, to list the files that were not in the old version but have been added in the new version:
$ comm -13 file1 file2
f6.c
f7.c
Following the same logic, -13 suppresses columns 1 and 3, leaving only column 2: the files unique to file2.

4. Finally, to list the files that have been retained, that is, the files common to both versions:
$ comm -12 file1 file2
f1.c
f3.c
f4.c
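The three filters combine naturally into a small report of what changed between two versions. A sketch; the function name `version_report` and the labels `removed:`, `added:` and `retained:` are illustrative choices, not comm features. It recreates the sample files, overwriting file1 and file2 in the current directory.

```shell
# Recreate the sample files (overwrites file1 and file2 here).
printf '%s\n' f1.c f2.c f3.c f4.c f5.c > file1
printf '%s\n' f1.c f3.c f4.c f6.c f7.c > file2

# Hypothetical helper: label each file by how it changed from $1 to $2.
version_report() {
    comm -23 "$1" "$2" | sed 's/^/removed:  /'   # only in the old version
    comm -13 "$1" "$2" | sed 's/^/added:    /'   # only in the new version
    comm -12 "$1" "$2" | sed 's/^/retained: /'   # in both versions
}

version_report file1 file2
```

To count rather than list, pipe any single filter into `wc -l`; for example, `comm -12 file1 file2 | wc -l` reports 3 retained files.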
Note: comm requires both input files to be sorted; on unsorted input its output is not reliable. Sort the files first if they are not already in order.
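When the listings are not already sorted, sort them before comparing. A minimal sketch, assuming two unsorted listings with the hypothetical names old.txt and new.txt. Running both `sort` and `comm` with `LC_ALL=C` makes them use the same byte-wise ordering, so the two commands agree on what "sorted" means. In bash, ksh or zsh, process substitution avoids the intermediate files: `comm -12 <(sort old.txt) <(sort new.txt)`.

```shell
# Hypothetical unsorted inputs (overwrites these files if present).
printf '%s\n' f5.c f1.c f4.c f2.c f3.c > old.txt
printf '%s\n' f7.c f3.c f1.c f6.c f4.c > new.txt

# Sort under a fixed locale, then compare under the same locale.
LC_ALL=C sort old.txt > old.sorted
LC_ALL=C sort new.txt > new.sorted
LC_ALL=C comm -12 old.sorted new.sorted   # files common to both: f1.c f3.c f4.c
```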

In a future article, we will see how to identify which of the retained files have had their content changed.
