The split command in Unix splits a large file into smaller files. The split can be based on various criteria: the number of lines, the number of output files, the byte count, and so on. In an earlier article, we discussed how to split a file into multiple files using awk. In this article, we will do it using the split command itself.
Let us consider a file with the following content:
$ cat file
Unix
Linux
AIX
Solaris
HPUX
Ubuntu
Cygwin
1. Split a file:
$ split file
By default, the split command splits the file into output files of 1000 lines each. The output file generated in this case is:
$ ls x*
xaa
Since the input file contains fewer than 1000 lines, all the contents are put into a single output file, "xaa". By default, the output file names consist of the prefix "x" followed by the suffix "aa", "ab", "ac" and so on.
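To see the 1000-line default produce more than one output file, a larger input helps. The following is a minimal sketch, assuming seq and mktemp are available; the file name numbers.txt and the scratch directory are made up for illustration.

```shell
# Work in a scratch directory so the x* files do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > numbers.txt

# No options: 1000 lines per output file, default prefix "x".
split numbers.txt

# Line count of each piece: xaa and xab hold 1000 lines each,
# xac holds the remaining 500.
wc -l x*
```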
2. Split file into multiple files with 3 lines each:
$ split -l 3 file
The option -l specifies the number of lines per output file. Since the input file contains 7 lines, the output files contain 3, 3 and 1 lines respectively. The output files generated are:
$ ls x*
xaa xab xac
$ cat xaa
Unix
Linux
AIX
The file "xab" contains the 4th through 6th lines, and the file "xac" contains the last line.
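The pieces can be checked and put back together with standard tools. This is a small sketch rather than part of the original walkthrough; it recreates the sample file in a scratch directory, and the name rejoined is made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

split -l 3 file

# Report how many lines each piece received.
for f in x*; do
    echo "$f: $(wc -l < "$f") lines"
done

# The shell expands x* in sorted order (xaa, xab, xac),
# so cat restores the original line order.
cat x* > rejoined
cmp file rejoined && echo "rejoined file is identical to the original"
```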
3. Split file into multiple files with a user defined prefix:
$ split -l 3 file F
$ ls F*
Faa Fab Fac
The prefix, if provided, is the last argument of the split command. Since the prefix provided is "F", the files created are "Faa", "Fab", and so on.
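Because the prefix is just the leading part of each output file name, it can also carry a directory path, which keeps the pieces out of the current directory. A minimal sketch, where the directory name parts is an assumption chosen for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

# Hypothetical target directory; it must exist before split runs.
mkdir parts

# The pieces are written as parts/Faa, parts/Fab and parts/Fac.
split -l 3 file parts/F
ls parts
```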
4. Split file into multiple files with a single character suffix:
$ split -l 3 -a 1 file F
$ ls F*
Fa Fb Fc
In the earlier examples, the suffixes generated are "aa", "ab" and so on. A two-character suffix makes sense when a large number of output files is to be created. For our example, a single character suffix suffices. The option -a of split controls the length of the suffix. With a suffix length of 1, the files created are "Fa", "Fb", and so on.
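A short suffix also limits how many output files can be named: a 1-character alphabetic suffix only offers "a" through "z", that is, 26 names. The sketch below assumes GNU split, which stops with an error once the suffixes run out (the exact message may differ on other implementations); the prefix P is made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# 30 one-line pieces would be needed, but -a 1 only allows 26 names.
# "-" tells split to read its input from stdin.
if seq 1 30 | split -l 1 -a 1 - P; then
    echo "split succeeded"
else
    echo "split failed: a 1-character suffix cannot name 30 pieces"
fi

# Count the pieces that were written before split gave up.
ls P* | wc -l
```

When the number of pieces is not known in advance, a longer suffix (or leaving -a at its default) is the safer choice.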
5. Split file into multiple files with a numeric suffix:
$ split -l 3 -d file F
$ ls F*
F00 F01 F02
The option -d of split enables a numeric suffix. With this, the files generated will be "F00", "F01", "F02", and so on. To get a single digit numeric suffix:
$ split -l 3 -a 1 -d file F
$ ls F*
F0 F1 F2
Setting the option -a to 1 gives a single digit numeric suffix.
6. Split file into multiple files with 10 bytes per OUTPUT file:
$ split -b 10 -a 1 -d file F
The -b option of split divides the file on the basis of byte count. The byte count includes the newline character at the end of each line.
$ ls F*
F0 F1 F2 F3 F4
$ cat F0
Unix
Linux
$ cat F1
AIX
Solar
The file F0 contains 10 characters: 5 characters of the first line ("Unix" plus the newline) and 5 characters of the second line ("Linux"). The newline character of the 2nd line moves to the 2nd output file.
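The byte arithmetic can be confirmed with wc -c and od. This is a sketch that recreates the 42-byte sample file in a scratch directory, assuming a split that supports -d as used in the example above.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

split -b 10 -a 1 -d file F

# Size of each piece in bytes: four full 10-byte pieces (F0 to F3),
# then the 2 leftover bytes in F4 (42 = 4 * 10 + 2).
for f in F?; do
    echo "$f: $(wc -c < "$f") bytes"
done

# od -c makes newline characters visible: F1 starts with the "\n"
# that terminated "Linux" in the original file.
od -c F1
```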
7. Split file with Kilobytes or Megabytes of data per OUTPUT file:
$ split -b 1k file
This splits the file with 1 KB (1024 bytes) of data per OUTPUT file. Similarly, to split the file with 1 MB of data per OUTPUT file:
$ split -b 1m file
Note: The commands below use the option -n, which is not available in all Unix flavors.
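Whether -n is supported can be checked by simply trying it on a small file. This is a rough sketch, assuming that an unsupported option makes split exit with a non-zero status (the usual behaviour for unknown options); the names sample and probe_ are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\nb\n' > sample

# Try the option on a throwaway file and look only at the exit status.
if split -n 2 sample probe_ 2>/dev/null; then
    echo "this split supports -n"
else
    echo "this split does not support -n"
fi

# Clean up whatever the probe may have written.
rm -f probe_*
```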
8. Split a file into 2 files of equal length:
$ split -n 2 -a 1 -d file F
At times, the requirement is to split a file into 2 files of equal size, unlike the earlier cases where the split is based on the number of lines or bytes per output file. The -n option of split does this. With "-n 2", the file is split equally into 2 files as shown below:
$ ls F*
F0 F1
$ cat F0
Unix
Linux
AIX
Solari
$ cat F1
s
HPUX
Ubuntu
Cygwin
Note: -n divides the file into equal parts on the basis of the byte count of the file. As shown above, since the file has 42 characters, it is divided into 2 files of 21 characters each.
9. Split file into 2 files with complete lines of output:
$ split -n l/2 -a1 -d file F
The option "-n l/2" splits the file into 2 files without breaking any line. Hence, the file F0 contains the complete 4th line "Solaris", and the rest goes to the 2nd file.
$ ls F*
F0 F1
$ cat F0
Unix
Linux
AIX
Solaris
10. split command to display only a section of the file:
$ split -n 1/4 file
Unix
Linux
The option "-n 1/4" does not create any output files. It simply writes one section of the file to stdout: 4 indicates that the file is to be divided into 4 equal parts or sections, and 1/4 indicates that the 1st of the 4 sections is to be written to stdout. In other words, it displays the 1st part in the terminal. Similarly, to display the 2nd of the 4 parts:
$ split -n 2/4 file
AIX
Solar
Note: As seen above, the output does not contain complete lines. The split is done purely on the basis of equal byte count.
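Since the sections are cut purely by byte count, printing all 4 of them one after another should reproduce the original file. The sketch below assumes a split with -n support (for example GNU split); the exact boundary between sections may differ slightly between versions, so only the combined result is checked.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

# Show each of the 4 sections in turn, separated by a marker line.
for k in 1 2 3 4; do
    echo "--- section $k/4 ---"
    split -n "$k/4" file
    echo
done

# Without the markers, the sections concatenate back to the original.
for k in 1 2 3 4; do
    split -n "$k/4" file
done | cmp - file && echo "sections 1/4 to 4/4 add up to the original file"
```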
Split file with complete lines:
$ split -n l/1/4 file
Unix
Linux
$ split -n l/2/4 file
AIX
Solaris
With the l prefix, each section ends at the end of a line.
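With the l prefix, every line of the file lands in exactly one section, so the per-section line counts should add up to the line count of the file. A minimal sketch, again assuming a split with -n support; which line goes to which section may vary between versions, so only the total is relied upon.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' Unix Linux AIX Solaris HPUX Ubuntu Cygwin > file

total=0
for k in 1 2 3 4; do
    # Count the complete lines written for section k of 4.
    n=$(split -n "l/$k/4" file | wc -l)
    echo "section $k: $n complete lines"
    total=$((total + n))
done

echo "total: $total lines, file has $(wc -l < file) lines"
```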