Thursday, March 29, 2012

10 good shell scripting practices



   Anybody working in UNIX can do a decent level of shell scripting. Depending on your expertise, the commands you use, the way you look at a problem, and the way you arrive at a solution could all be different. For people in the early stages of shell scripting, it is good to follow certain practices that help in learning the art faster, cleaner and better. We discussed the 5 important things to follow to become good at shell scripting in one of our earlier articles. Along the same lines, the following are 10 good practices:

1. Learn less, do more: You want to do some scripting in UNIX, be it in shell, Perl or Python. So, you pick up a book and start studying. Some people study the entire thing first and only then start practising. This might be a good method for some, but I do not believe in it much. Instead, study just the basics, the very minimum with which you can start on something. Once done with the basics, start writing simple programs. Gradually, build up your requirement. Go back and develop your program further. If you get stuck due to lack of knowledge, go back to the book. Read what you need. Come back and continue developing. Build the requirement further. Read a little more if stuck. Carry on. I prefer this to reading everything first because, however much we read, unless we start practising we cannot correlate a lot of things, and the reading does not add much value either. This learn-practise-stuck method helps me a lot.

2. Try at the command prompt: Sometimes we get an error in our script. We fix the error, run the script, and it errors again. This fix-and-error cycle goes on for some time. At times, shell script errors are a little misleading, in the sense that the actual location of the error could be somewhere else. First, we need to zero in on the line or the command that is causing the problem. This can easily be done by putting debug statements before and after the suspect statement. Once the statement is found, try to execute the same command at the command prompt after preparing the requisite inputs. Once it works properly at the command prompt, you can easily figure out why it is not working in the script: it could be due to some incorrect input, a mismatch in environment variables, a binary being picked up from a different location, etc. This makes debugging a lot easier because it's easier to fix an issue at the prompt than in a script, where the command is surrounded by many other things.
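
A minimal sketch of the debug-statement idea, assuming a POSIX shell. The function name count_lines and the choice of wc as the suspect command are only illustrations, not taken from any real script:

```shell
# Hypothetical helper: counts the lines in the file given as $1.
count_lines() {
    echo "DEBUG: about to run wc on '$1'" >&2    # marker before the suspect command
    set -x                                       # trace each command to stderr from here
    n=$(wc -l < "$1")
    rc=$?                                        # exit status of the suspect command
    set +x                                       # stop tracing
    echo "DEBUG: wc finished, status=$rc" >&2    # marker after the suspect command
    printf '%s\n' "$((n))"                       # arithmetic expansion drops any padding
}
```

The markers and the trace go to stderr, so the normal output of the script is not disturbed. Once the failing command shows up in the trace, it can be copied and tried at the prompt with the same input.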

3. Keep big files in mind: An issue comes to you. You provide a solution that addresses just the issue in hand without looking at the big picture or, in other words, the big files. Say, for example, we want the first line of a file, the header line:
sed -n '1p' file
   This will, of course, give the first line you are looking for. What if the file being worked upon is a huge file with millions of records? The above sed command, even though it prints only the first line, ends up reading the entire file, which can cause performance problems with big files.

The solution:
sed -n '1p;1q' file
  This command simply prints the first line and quits.
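
Two other common ways of stopping after the first line, shown as a small sketch. The function names are made up for illustration:

```shell
# Hypothetical helpers: each prints the header (first line) of the file in $1.

first_line_sed() {
    sed 1q "$1"        # without -n, sed prints line 1 and then quits at once
}

first_line_head() {
    head -n 1 "$1"     # head stops reading after the requested number of lines
}
```

Neither of these reads the rest of the file, so the size of the file should not matter much.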

4. Always try different ways: There is more than one way to do anything in scripting. You get a requirement, and you provide a solution in a particular way. The next time you come across a requirement of almost the same type, do not do it the same way as you did earlier. Try doing it some other way. Some other day, try something newer still. The more options you become aware of, the better your grip on things, and the more varied your thinking will be.
Ex:
if [ $? -eq 0 ]
then
   echo "Success"
fi
Another way:
[ $? -eq 0 ] && echo "Success"
  Now you know why we have so many articles in this blog with the title "Different ways of ....": different ways of deleting the Ctrl-M character, different ways of doing arithmetic in UNIX, etc. All these articles are meant to help the blog's subscribers keep their options open and not stick to one particular way of doing things.
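
Yet another way, given here only as a sketch: test the command itself instead of checking $? afterwards. The function names report and report_short are hypothetical:

```shell
# Hypothetical helpers: run the command given as arguments and report success.

report() {
    # "if" tests the exit status of the command directly, no $? needed
    if "$@" >/dev/null 2>&1; then
        echo "Success"
    fi
}

report_short() {
    # the same check written as an AND list
    "$@" >/dev/null 2>&1 && echo "Success"
}
```

For example, "report true" prints Success, while "report false" prints nothing.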

5. Do it. Do it faster: Scripting is done to save time, to improve our productivity, to make things faster. But don't we take a lot of time to write and test a script? Say we want to write a script. We open a file, write stuff, save the file. Run the script. Get errors. Open the file again. Correct the errors. Save it. Run it. Open the file again. Correct more errors. And this process goes on. In one of our earlier articles, Shell script to do shell scripting faster, we saw how we can considerably reduce the time to write and test a shell script on the fly (as we write), without coming back to the command prompt. Use methods like these to write or design your scripts faster. I always use this script, and I can definitely say that it has saved me a lot of time.

6. Use internal commands a lot: Wherever possible, go for internal commands rather than external commands. In one of our earlier articles, Internal vs External commands, we saw the differences between the two. An internal command runs within the shell itself, whereas every call to an external command needs a new process. Depending on the size of the input being processed, and especially inside loops, internal commands can save you a lot in performance. You do not always get to choose internal over external, but in some scenarios one can definitely take the right option.
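
As a small illustration, here is one case where the shell's own parameter expansion can stand in for two external commands. The path is a made-up example:

```shell
path="/var/log/app/server.log"    # hypothetical path, for illustration only

# External commands: each call starts a new process
name_ext=$(basename "$path")
dir_ext=$(dirname "$path")

# Parameter expansion: handled by the shell itself, no new process
name_int=${path##*/}              # remove everything up to the last "/"
dir_int=${path%/*}                # remove the last "/" and what follows it

echo "$name_ext $name_int"
echo "$dir_ext $dir_int"
```

The two forms agree for an ordinary path like this one. They differ in edge cases such as a path without any slash or with a trailing slash, so this is a swap to make only when the input is known.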

7. Useless use of cat (UUC): This is one of the things we witness frequently in forums. The term "useless use of cat" refers to using the cat command when there is actually no need for it. In fact, many users are pretty much used to UUC. UUC makes your program ugly and hurts performance, since it adds an extra process and a pipe, even though you will still get the expected result.

UUC Example:
$ cat /etc/passwd | grep guru
Right way:
$ grep guru /etc/passwd
   As shown above, cat was not needed at all. Many users are so used to starting a command with "cat" that they are not able to give it up :). Never use cat if there is no need for it.
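
The same applies to commands that read only from standard input, such as tr. A sketch, with a hypothetical function name, using input redirection in place of cat:

```shell
# Hypothetical helper: prints the contents of the file in $1 in upper case.
upper() {
    # UUC form would be:  cat "$1" | tr '[:lower:]' '[:upper:]'
    tr '[:lower:]' '[:upper:]' < "$1"    # the shell opens the file for tr directly
}
```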

8. Read error messages: One common mistake is this: when we type a command and it results in an error, most of us just glance at the error without actually reading the message. Most of the time, the error message itself contains the solution needed. More importantly, say we are working to fix an error. After a fix, we run again only to get what looks like the same error. Fix again, and error again. This goes on for some time. In between, it might so happen that the original error has actually gone away and a new error is appearing, which we have overlooked. And we keep wondering why the fix is not working. So, always read the error messages very carefully.

9. Do not make big commands: You are trying to filter out a particular component of a big output. While doing this, we might end up using a lot of commands in sequence, each piping its output to the next one. Though we might get the expected end result, it does not look good and, more importantly, it is pretty tough for other people to understand. Having said this, there are situations where one cannot avoid it. Still, one should try to avoid piping a lot of commands together. The following is a genuine case where piping multiple commands can be avoided.

Ex: To retrieve the username for the user-id 502.

Not good:
$ grep 502 /etc/passwd | cut -d: -f1
OR
$ grep 502 /etc/passwd | awk -F":" '{print $1}'
Good:
$ awk -F":" '$3==502{print $1}' /etc/passwd

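   As shown above, the requirement can be met with a single awk command. It is also more precise: awk compares only the third field (the user-id), whereas grep matches 502 anywhere in the line.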
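
To see the difference without touching the real /etc/passwd, here is a sketch that works on a small sample file. The entries and the function name user_by_uid are made up for illustration:

```shell
# Build a small passwd-style sample file (hypothetical entries).
sample=$(mktemp)
cat > "$sample" <<'EOF'
guru:x:502:502::/home/guru:/bin/bash
blogger:x:1502:600::/home/blogger:/bin/bash
deepak:x:503:502::/home/deepak:/bin/sh
EOF

# Hypothetical helper: prints the username whose user-id (3rd field) equals $1.
user_by_uid() {
    awk -F: -v uid="$1" '$3 == uid { print $1 }' "$2"
}

grep 502 "$sample" | cut -d: -f1    # prints guru, blogger and deepak
user_by_uid 502 "$sample"           # prints only guru
```

The grep version also picks up user-id 1502 and the group-id 502, which is why the field comparison in awk is the safer choice here.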

10. Always use comments to give a brief: A script is written. After a week or two, you open the script and go over it. If there are no comments in it, you take a little time to understand it in spite of being the author, which is a little waste of time. Now imagine somebody else opening it and trying to understand it: more waste of time. Scripts are written to save time, so we should not end up wasting time trying to understand things whose sole purpose is to save time. Always make it a habit to put comments in your script and make it more readable. The comments need not be very detailed, just enough for a person reading the script to understand it. It always helps.
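
A brief header and a few short comments are usually enough. The following is only a sketch of the idea, and the function count_files is a made-up example:

```shell
# Purpose : Print the number of regular files directly under a directory.
# Usage   : count_files [DIR]      (DIR defaults to the current directory)
count_files() {
    dir=${1:-.}                     # fall back to "." when no argument is given
    count=0
    for f in "$dir"/*; do
        # -f skips sub-directories and the unexpanded pattern of an empty directory
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```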

4 comments:

  1. Great article. I am definitely an "over catter", and this article will make me remember to do that less. Where would you recommend professional training for data manipulation techniques?

    1. Thanks for leaving your comment. I am from India and do not know where you are from. But wherever it is, you will usually find training on shell scripting or Perl scripting, not specifically on data manipulation techniques as such.

  2. Suggest me a good book to start learning shell scripting I have basic idea of unix command ....will prefer Indian writer

  3. Hey Guru, Hope you are doing well, you taught me shell scripting in 2007. It's great to refer this page of yours, very knowledgeable indeed. Keep going..Neha.
