Tuesday, August 28, 2012

5 examples to extract substring in bash / ksh93 shell



A substring is simply a part of a string. In an earlier article, we discussed various ways to extract a substring from every line in a file. In this article, we will see how to extract a substring using only the bash / ksh93 shell itself, and the options it offers. Since this substring extraction is built into the shell (it is a parameter expansion, not an external command), no new process is started, which makes it very good from a performance perspective.

 The syntax of the sub-string extraction is:
${variable:offset:length}
where variable is the variable containing the string,
          offset specifies the position from where to start extracting the string,
          length specifies the number of characters to be extracted, starting at the offset.
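
As a minimal sketch of the syntax above, the expansion can be wrapped in a small bash function. The name substr is a hypothetical helper chosen for illustration, not a shell built-in, and the sketch assumes bash and non-negative offsets:

```shell
#!/usr/bin/env bash
# substr is a hypothetical helper name, not something the shell provides.
# Usage: substr STRING OFFSET [LENGTH]
# Assumes OFFSET (and LENGTH, if given) are non-negative integers.
substr() {
    local s=$1 offset=$2
    if [ $# -ge 3 ]; then
        # offset and length given: ${variable:offset:length}
        printf '%s\n' "${s:$offset:$3}"
    else
        # no length: everything from offset to the end of the string
        printf '%s\n' "${s:$offset}"
    fi
}

substr "Solaris" 0 3   # prints "Sol"
substr "Solaris" 3     # prints "aris"
```

The function only forwards its arguments to the expansion, so it behaves exactly like writing ${variable:offset:length} directly.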

 Let us consider a variable containing a string "Solaris"
$ x="Solaris"
1. To extract the first 3 characters :
$ echo ${x:0:3}
Sol
In the offset, 0 indicates the 1st character, 1 indicates 2nd character and so on. From the 0th position, 3 characters are extracted, as a result of which we get the first 3 characters.
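
To make the 0-based offsets concrete, the following sketch (assuming bash, since it uses the C-style for loop and +=) lists every offset of the string next to the character found there. The variable name listing is only for illustration:

```shell
#!/usr/bin/env bash
x="Solaris"
listing=""
# Walk over every valid offset, from 0 to length-1,
# and extract exactly one character at each offset.
for (( i = 0; i < ${#x}; i++ )); do
    listing+="$i=${x:i:1} "
done
echo "$listing"   # 0=S 1=o 2=l 3=a 4=r 5=i 6=s
```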

2. To extract from the 4th character onwards:
$ echo ${x:3}
aris
Offset 3 indicates the 4th character, and since no length is specified, the entire string from the 4th character onwards is extracted.

3. The whole string is extracted if the offset is 0 and no length is given:
$ echo ${x:0}
Solaris
 Given the above examples, this is self-explanatory.

4. To extract a single character, say the 4th character alone:
$ echo ${x:3:1}
a
The cut command is used a lot when people want to extract a specific character, but bash gives you a better option that is built into the shell.
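
For comparison, here is a small sketch of both approaches side by side (assuming bash). Note that cut counts characters from 1, while the shell offset counts from 0. The variable names with_cut and with_expansion are only for illustration:

```shell
#!/usr/bin/env bash
x="Solaris"

# External command: a pipeline and the cut process are started; positions are 1-based.
with_cut=$(printf '%s\n' "$x" | cut -c4)

# Shell expansion: no extra process; offsets are 0-based.
with_expansion=${x:3:1}

echo "$with_cut $with_expansion"   # a a
```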

5. To extract the last 3 characters from a string :
$ echo ${x:-3}
Solaris
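What happened? Unlike SQL, where a negative offset retrieves characters from the end, bash does not do so here. The shell reads ${x:-3} as the "use a default value" expansion ${variable:-default}, and since x is set, its whole value is printed.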
$ echo ${x:(-3)}
ris
    Simply put the negative number in parentheses, so that the shell no longer sees the ":-" operator. That's it. However, this will work only in bash, not in ksh93.
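
The difference between the two forms can be seen in this short bash sketch. The variable y is introduced here only for illustration and is deliberately left unset:

```shell
#!/usr/bin/env bash
x="Solaris"
unset y

echo "${x:-3}"     # ":-" is the default-value operator; x is set, so: Solaris
echo "${y:-3}"     # y is unset, so the default is used: 3
echo "${x:(-3)}"   # parentheses make it an offset: ris
echo "${x: -3}"    # in bash, a space before the minus has the same effect: ris
```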
$ echo ${x:${#x}-3}
ris
This one works in both bash and ksh93. Check out the offset here: ${#x} gives the length of the string, so ${#x}-3 is the offset of the 3rd-last character, and from there we retrieve the last 3 characters.
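
The length-based offset can also be wrapped in a small function. This is a sketch assuming bash (it uses local); last_n is a hypothetical helper name, and clamping the offset to 0 when more characters are requested than the string holds is a design choice of this sketch, not something from the examples above:

```shell
#!/usr/bin/env bash
# last_n is a hypothetical helper name.
# Usage: last_n STRING N   -> prints the last N characters of STRING
last_n() {
    local s=$1 n=$2
    local start=$(( ${#s} - n ))
    # If N exceeds the string length the computed offset would be negative;
    # clamp it to 0 so that the whole string is returned instead.
    if [ "$start" -lt 0 ]; then
        start=0
    fi
    printf '%s\n' "${s:$start}"
}

last_n "Solaris" 3   # prints "ris"
last_n "Sol" 5       # prints "Sol"
```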
