Tuesday, May 31, 2011

cut - cut files with a delimiter



   cut is one of the most frequently used commands for parsing files. It is useful for extracting columns from files, with or without a delimiter. In this article, we will see how to use the cut command on files that have a delimiter.

Let us consider a sample file, say file1, with a comma as delimiter as shown below:
$ cat file1
Rakesh,Father,35,Manager
Niti,Mother,30,Group Lead
Shlok,Son,5,Student 
The first column is the name, the second the relationship, the third the age, and the last the profession.
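If you would like to try the examples yourself, the sample file can be recreated with printf. This is only a sketch; it assumes you are in a scratch directory where overwriting a file named file1 is harmless.

```shell
# Recreate the comma-delimited sample file used in this article.
# Assumption: the current directory is a scratch area; file1 is overwritten.
printf '%s\n' \
  'Rakesh,Father,35,Manager' \
  'Niti,Mother,30,Group Lead' \
  'Shlok,Son,5,Student' > file1

# Show what was written.
cat file1
```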


  The cut command has 2 main options for working on files with delimiters:

 -f - To indicate which field(s) to cut.
 -d - To indicate the delimiter on which the cut command splits each line into fields.

Let us now try out this command with a few examples:

1. To get the list of names alone from the file, which is the first column:
$ cut -d, -f 1 file1
Rakesh
Niti
Shlok
   The option "-d" followed by a comma tells cut to split each line on the comma. "-f" followed by 1 retrieves field 1 from file1, and hence we get the names alone.

2. To get the relationship alone, i.e., the 2nd field:
$ cut -d, -f 2 file1
Father
Mother
Son

3. To get 2 fields, say Name and Age:
$ cut -d, -f 1,3 file1
Rakesh,35
Niti,30
Shlok,5
   Giving 1,3 retrieves the first and third fields, which are the name and age respectively.
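One point worth knowing about field lists: cut always prints the selected fields in the order they appear in the file, not in the order they are listed after "-f". The sketch below recreates the sample file (assuming a scratch directory) and shows that 1,3 and 3,1 give the same result.

```shell
# Recreate the comma-delimited sample so the snippet runs on its own.
printf '%s\n' \
  'Rakesh,Father,35,Manager' \
  'Niti,Mother,30,Group Lead' \
  'Shlok,Son,5,Student' > file1

# Both commands print name,age: the order given to -f is ignored.
cut -d, -f 1,3 file1
cut -d, -f 3,1 file1
# Each command prints:
# Rakesh,35
# Niti,30
# Shlok,5
```

If the columns must actually be swapped, a tool such as awk is the usual choice, since cut cannot reorder fields.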

4. To get the name, relationship and age, excluding the profession, i.e., the 1st to 3rd fields:
$ cut -d, -f 1-3 file1
Rakesh,Father,35
Niti,Mother,30
Shlok,Son,5
   The value 1-3 means from the first field till the third field. Whenever we need a range of fields, we join the first and last field numbers with a '-'.

 The same result can also be retrieved in another way:
$ cut -d, -f -3 file1
Rakesh,Father,35
Niti,Mother,30
Shlok,Son,5
This is the shorter of the two ways to retrieve a range that starts at the first field. The option "-3" means from the beginning, i.e., the first field, till the third field, and hence we get fields 1, 2 and 3.

5. To retrieve all the fields except the name field, i.e., from field 2 to field 4:
$ cut -d, -f 2- file1
Father,35,Manager
Mother,30,Group Lead
Son,5,Student
  Similar to the last result, "2-" means from the second field till the end, which here is the 4th field. When the beginning of a range is not specified, it defaults to 1; similarly, when the end of a range is not given, it defaults to the last field. The same result could have been achieved using "2-4" as well.
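Single fields and ranges can also be mixed in one "-f" list. As a small sketch on the same sample file (recreated here, assuming a scratch directory), the following keeps the name and then everything from the age onwards, dropping only the relationship:

```shell
# Recreate the comma-delimited sample so the snippet runs on its own.
printf '%s\n' \
  'Rakesh,Father,35,Manager' \
  'Niti,Mother,30,Group Lead' \
  'Shlok,Son,5,Student' > file1

# Field 1, plus the open-ended range from field 3 to the last field.
cut -d, -f 1,3- file1
# Prints:
# Rakesh,35,Manager
# Niti,30,Group Lead
# Shlok,5,Student
```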

Let us consider the same input file with a space as the delimiter:
$ cat file1
Rakesh Father 35 Manager
Niti Mother 30 GL
Shlok Son 5 Student
   The same options and commands used above hold good, except for the delimiter specified. When a comma is the delimiter, we can give it directly after the -d option. A space, however, must be quoted as shown below, since the shell would otherwise treat it as an argument separator. In fact, we can always quote the delimiter to be on the safe side.

6. To retrieve the first field from a space delimited file:
$ cut -d" " -f 1 file1
Rakesh
Niti
Shlok
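The quoting style is a matter of taste: single or double quotes, with or without a space after "-d", all hand the same one-character delimiter to cut. The sketch below (sample file recreated, assuming a scratch directory) also shows a caveat: every single space counts as a separator, so two spaces in a row produce an empty field. Squeezing repeated spaces with tr first is one common workaround, not the only one.

```shell
# Recreate the space-delimited sample so the snippet runs on its own.
printf '%s\n' \
  'Rakesh Father 35 Manager' \
  'Niti Mother 30 GL' \
  'Shlok Son 5 Student' > file1

# Three equivalent ways to pass a space as the delimiter.
cut -d" " -f 1 file1
cut -d ' ' -f 1 file1
cut -d' ' -f1 file1

# Two spaces between "a" and "b": field 2 is empty, so a blank line is printed.
printf 'a  b\n' | cut -d' ' -f 2

# Squeezing runs of spaces into one first makes field 2 "b".
printf 'a  b\n' | tr -s ' ' | cut -d' ' -f 2
```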

Let us consider the same file separated by tabs:
$ cat file1
Rakesh  Father  35      Manager
Niti    Mother  30      GL
Shlok   Son     5       Student
 To confirm that the file is indeed tab separated, use the "-t" option with the cat command:
$ cat -t file1
Rakesh^IFather^I35^IManager
Niti^IMother^I30^IGL
Shlok^ISon^I5^IStudent
   The ^I indicates a tab.

7. To retrieve the first field from this tab separated file. How do we specify the tab with the "-d" option?
$ cut -f 1 file1
Rakesh
Niti
Shlok
  Surprised!! The default delimiter of the cut command is the tab, and hence when a file is tab separated, we need not specify the "-d" option at all. The "-f" option can be used directly to retrieve the fields.
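If you still prefer to spell the tab out, for example to make a script self-explanatory, one portable way is to capture a literal tab from printf and pass it to "-d". This is a sketch; the variable name tab is arbitrary, and the sample file is recreated assuming a scratch directory.

```shell
# Recreate the tab-separated sample; printf expands \t into a real tab.
printf 'Rakesh\tFather\t35\tManager\nNiti\tMother\t30\tGL\nShlok\tSon\t5\tStudent\n' > file1

# Hold a literal tab character in a variable.
tab=$(printf '\t')

# Same output as "cut -f 1 file1", with the delimiter made explicit.
cut -d "$tab" -f 1 file1
# Prints:
# Rakesh
# Niti
# Shlok
```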

Happy Cutting!!!

3 comments:

  1. Good examples that elaborate the use of cut. Keep on posting such things using other commands.

  2. When we have no value between tabs how do we get 4th column

    1. Same way. cut -f 4 file --since no value, it will simply print empty line