Wednesday, March 7, 2012

Disk Usage and Free Space: Housekeeping

  File systems and directories should be kept clean and tidy, or they can fill up at any time. Say you get a mail from your system admin saying the filesystem is full and asking you to clean up your files. Or say your TL comes and stands next to you, asking you to free up some space in your account. How do we do it? In this article, we will look at these housekeeping tasks.

1. How do you find out the file system capacity or free space?

The df command lists all the mounted filesystems in the system along with their size and usage. In fact, df -h gives more readable output:
$ df -h
Filesystem   Size  Used   Avail Use%  Mounted on
/dev/dev1    97G   6.0G   86G   7%    /
/dev/dev2    97G   96G     1G   99%   /home

 The Used column shows the space in use, and the Avail column shows the space still available. Here, /home is 99% full. The "-h" option prints the sizes in human-readable units such as K, M or G.
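
If you want to spot the nearly full filesystems without reading the whole table, the Use% column can be filtered with awk. The sketch below is only one way to do it: full_filesystems is a name I made up, the 90% default is an arbitrary assumption, and it expects the one-line-per-filesystem layout printed by df -P (mount points containing spaces would be cut short).

```shell
# Print "mountpoint use%" for every filesystem at or above a threshold.
# Reads `df -P` style output on stdin; the threshold defaults to 90 (an assumption).
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "99%" -> "99"
            if (use + 0 >= t) print $6, $5
        }'
}
```

It would be run like this:
$ df -P | full_filesystems 90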

2. How do you find out which user consumes the most space?
    In my case, all the users' home directories are under /home. So, I go into the /home directory and run the command below:
$ du -s * | sort -nr | head -5
1959288 user3
906328  user2
726000  user1
560800  user4
   The du command reports disk usage, in kilobytes by default on Linux (other *nix flavors may report 512-byte blocks unless -k is given). When du is run on a directory, it reports the usage of that directory and of every sub-directory under it. The "-s" option prints only a summary total for each argument, which is what we need in this case. We need the summary of every directory's usage under /home, hence the "*". Sorting the output numerically in reverse order (sort -nr) puts the biggest disk users at the top, and 'head -5' keeps the top 5 users.
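
The same pipeline can be wrapped in a small function so that it works from any directory. This is just a sketch: top_dirs is a hypothetical name, -k is added so that the numbers are kilobytes on other *nix flavors too, and the "*/" pattern restricts the listing to directories (hidden ones are skipped).

```shell
# Show the N biggest sub-directories of a directory, sizes in kilobytes.
# Usage: top_dirs [directory] [count]   (defaults: current directory, 5)
top_dirs() {
    dir="${1:-.}"
    count="${2:-5}"
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$dir" && du -sk -- */ 2>/dev/null | sort -nr | head -n "$count" )
}
```

For the case discussed here, it would be run like this:
$ top_dirs /home 5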

  To get the disk usage in a more understandable format, use:
$ du -sh *
4.1G    user3
2.1G    user1
100M    user2
   As seen above, the '-h' option prints the sizes in units of K, M and G. Note that this output cannot be sorted correctly with 'sort -n', since 100M would rank above 4.1G.
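
If your sort is the GNU one, its -h option understands these unit suffixes, so the readable output can still be ranked. A minimal sketch, assuming GNU sort is available (sort_human is a made-up name):

```shell
# Sort `du -h` style lines, biggest first.
# Relies on the -h (human-numeric) option of GNU sort.
sort_human() {
    sort -hr
}
```

It would be used like this:
$ du -sh * | sort_human | head -n 5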

3. Say, you are asked to clean up your account. How will you find the biggest files in your account?
$ find . -type f  -exec ls -l '{}' \; | awk '{print $5, $NF}' | sort -nr | head -5

  What the above command does:

  1. It finds all the regular files and does a long listing of each (find & ls -l).
  2. Only the file size and the file name are kept ($5, $NF).
  3. The result is sorted by file size, biggest files first (sort -nr).
  4. The 5 biggest files are displayed (head -5).

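Note: Do not run the above command in a directory tree containing a huge number of files. It runs ls once for every file and will take a long time to finish. Also, in the awk command, $5 is the file size in the Linux 'ls -l' output. It might be a different column in other *nix flavors. And since $NF is only the last whitespace-separated field, a file name containing spaces is cut down to its last word.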
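
On Linux, GNU find can print the size and the path itself through its -printf action, which avoids running ls for every file and keeps file names with spaces intact. The sketch below assumes GNU find (-printf is not available in every *nix flavor), biggest_files is a hypothetical name, and file names containing newlines are still not handled.

```shell
# List the N biggest regular files under a directory as "bytes<TAB>path".
# Usage: biggest_files [directory] [count]   (defaults: current directory, 5)
biggest_files() {
    dir="${1:-.}"
    count="${2:-5}"
    # %s = size in bytes, %p = path exactly as find sees it (spaces preserved)
    find "$dir" -type f -printf '%s\t%p\n' | sort -nr | head -n "$count"
}
```

It would be run like this:
$ biggest_files . 5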

3a. There could be cases where the user is only interested in big files above a particular size, say above 100MB:
$ find . -type f -size +100M -exec ls -l '{}' \; | awk '{print $5, $NF}' | sort -nr | head -5
  In the above find command, the -size switch selects files on the basis of their size. '+100M' indicates files bigger (+) than 100MB.

3b. Similarly, to find files of size between 100MB and 200MB:
$ find . -type f -size +100M -size -200M -exec ls -l '{}' \; | awk '{print $5, $NF}' | sort -nr | head -5
    +100M indicates files bigger than 100MB, and -200M indicates files smaller than 200MB. In other words, we get files whose size is between 100MB and 200MB.

   The notations used with the -size switch of the find command are:
     greater than 50KB:   +50k   (small k)
     greater than 50MB:   +50M   (big M)
     greater than 5GB:    +5G    (big G)
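
One thing to watch with these suffixes: as far as I know, GNU find rounds each file's size up to a whole unit before comparing, so -size -200M matches files only up to 199MB, and -size -1M matches only empty files. When the exact boundary matters, the c suffix (bytes) avoids the rounding. A minimal sketch, with files_between as a made-up name and the limits given in bytes:

```shell
# List regular files strictly bigger than MIN bytes and strictly smaller than MAX bytes.
# Usage: files_between <directory> <min-bytes> <max-bytes>
files_between() {
    dir="$1"
    min_bytes="$2"
    max_bytes="$3"
    # The c suffix makes find compare exact byte counts, with no rounding.
    find "$dir" -type f -size +"${min_bytes}c" -size -"${max_bytes}c"
}
```

For the 100MB to 200MB case, it would be run like this:
$ files_between . 104857600 209715200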

4. For this requirement, we can have a handy script to find the big files. I call this script fbig:
$ cat fbig
#!/bin/sh
# fbig: show the 5 biggest files under the current directory,
# optionally limited to a size range understood by find -size.

if [ "$1" = "-h" ]; then
  echo "Usage: fbig [<from-size> [<to-size>]]"
  echo "Exmpl: fbig         #top 5 files consuming space"
  echo "Exmpl: fbig 100M    #top 5 files bigger than 100M"
  echo "Exmpl: fbig 10M 25M #top 5 files between 10M and 25M"
  exit 0
fi

if [ $# -eq 0 ]; then
   find . -type f -exec ls -l '{}' \; | awk '{print $5, $NF}' | sort -nr | head -5
elif [ $# -eq 1 ]; then
   find . -type f -size "+$1" -exec ls -l '{}' \; | awk '{print $5, $NF}' | sort -nr | head -5
elif [ $# -eq 2 ]; then
   find . -type f -size "+$1" -size "-$2" -exec ls -l '{}' \; | awk '{print $5, $NF}' | sort -nr | head -5
fi


This is how the script is run:
$ ./fbig     # The 5 biggest files. Might take a lot of time depending on where you run it from.

$ ./fbig 100M   # The 5 biggest files larger than 100MB.

$ ./fbig 100M  200M  # The 5 biggest files of size between 100MB and 200MB.
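
The script hands its arguments straight to find, so a typo such as 100X only shows up as a find error. A small check like the one below could be run on each argument first. This is only a sketch: valid_size is a hypothetical helper, not part of fbig, and it accepts only the k, M and G suffixes listed above.

```shell
# Succeed only for sizes made of digits followed by one of k, M or G (e.g. 50k, 100M, 5G).
valid_size() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[kMG]$'
}
```

Inside the script, it could be used like this:
valid_size "$1" || { echo "fbig: bad size: $1" >&2; exit 1; }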

Happy Housekeeping!!!

1 comment:

  1. Please note that file names with spaces, although correctly passed from find to ls and on to awk, end up corrupted in the final output. Only the last fragment of the name of the biggest files is reported in these cases. I tried several quoting variants without success.