Tuesday, February 8, 2011

sed - Replace or substitute file contents



In one of our earlier articles, we saw how to insert or append a line to an existing file using sed. In this article, we will see how to substitute or otherwise manipulate the contents of a file using sed.

Let us consider a sample file, sample1.txt, as shown below:
apple
orange
banana
pappaya


1. To add something to the beginning of every line in a file, say the word 'Fruit: ':
$ sed 's/^/Fruit: /' sample1.txt
Fruit: apple
Fruit: orange
Fruit: banana
Fruit: pappaya
  The character 's' stands for substitution. What follows 's' is the character, word or regular expression to replace, followed by the character, word or regular expression to replace it with. '/' separates the substitution command 's', the content to replace and the content to replace it with. The '^' character matches the beginning of a line, and hence the phrase 'Fruit: ' gets added at the beginning of every line.
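
The separator does not have to be '/'. As a small sketch (the path below is a made-up input, not part of sample1.txt), another character such as '|' can separate the parts of the 's' command, which helps when the text itself contains slashes:

```shell
# Hypothetical input line: a path containing slashes.
path_line='/usr/local/bin'

# Using '|' as the separator avoids escaping every '/' in the pattern.
with_pipe=$(printf '%s\n' "$path_line" | sed 's|^/usr|/opt|')

# The same substitution with '/' as the separator needs backslash escapes.
with_slash=$(printf '%s\n' "$path_line" | sed 's/^\/usr/\/opt/')

printf '%s\n%s\n' "$with_pipe" "$with_slash"
```

Both commands print /opt/local/bin; the first form is simply easier to read.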

2. Similarly, to add something to the end of every line:
$ sed 's/$/ Fruit/' sample1.txt
apple Fruit
orange Fruit
banana Fruit
pappaya Fruit
  The character '$' denotes the end of the line. Hence this means: replace the end of the line with ' Fruit', which effectively adds the word 'Fruit' to the end of every line.

3. To replace or substitute a particular character, say to replace 'a' with 'A':
$ sed 's/a/A/' sample1.txt
Apple
orAnge
bAnana
pAppaya
   Please note that in every line only the first occurrence of 'a' is replaced, not all of them. The example shown here replaces a single character, but the same can easily be done for a word as well.
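
As a quick illustration of the word case, here is a sketch that feeds the same four fruit names to sed through a pipe; the replacement word 'mango' is made up for the example:

```shell
# Replace the word 'banana' with the hypothetical word 'mango' in the sample data.
words=$(printf '%s\n' apple orange banana pappaya | sed 's/banana/mango/')
printf '%s\n' "$words"

# Without the 'g' flag, only the first 'banana' on a line is replaced.
first_only=$(printf '%s\n' 'banana banana' | sed 's/banana/mango/')
printf '%s\n' "$first_only"
```

The second command prints 'mango banana', which shows that the first-occurrence rule applies to words just as it does to single characters.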

4. To replace or substitute all occurrences of 'a' with 'A':
$ sed 's/a/A/g' sample1.txt
Apple
orAnge
bAnAnA
pAppAyA

5. Replacing the first occurrence or all occurrences is fine. What if we want to replace only the second or third occurrence, or in other words the nth occurrence?

  To replace only the 2nd occurrence of a character:
$ sed 's/a/A/2' sample1.txt
apple
orange
banAna
pappAya
  Note that the 'a' in apple has not changed, and neither has the one in orange, since there is no 2nd occurrence of 'a' in those lines. However, the 2nd 'a' has been replaced in banana and pappaya.
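
The same flag works for any n. A minimal sketch with n = 3, using the sample data piped in directly rather than read from sample1.txt:

```shell
# Replace only the 3rd 'a' on each line; lines with fewer than three stay unchanged.
third=$(printf '%s\n' apple orange banana pappaya | sed 's/a/A/3')
printf '%s\n' "$third"
```

Only banana and pappaya contain three occurrences of 'a', so they become bananA and pappayA while apple and orange are printed as they are.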

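6. Now, say to replace all occurrences from the 2nd occurrence onwards (combining a number with 'g' works this way in GNU sed; POSIX leaves the combination unspecified):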
$ sed 's/a/A/2g' sample1.txt
apple
orange
banAnA
pappAyA
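
Since the number-plus-'g' combination is not guaranteed on every sed, here is one possible workaround, offered as a sketch rather than the only way: park the first 'a' as a newline (which cannot otherwise occur inside a line), replace the remaining ones, then restore the parked character.

```shell
# Step 1: turn the first 'a' into a newline placeholder (backslash + real newline).
# Step 2: replace every remaining 'a' with 'A'.
# Step 3: turn the placeholder back into 'a'.
from_second=$(printf '%s\n' apple orange banana pappaya | sed 's/a/\
/
s/a/A/g
s/\n/a/')
printf '%s\n' "$from_second"
```

This should print the same four lines as the '2g' command above.
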
7. Say you want to replace 'a' only in a specific line, say the 3rd line, not in the entire file:
$ sed '3s/a/A/g' sample1.txt
apple
orange
bAnAnA
pappaya
  '3s' denotes that the substitution is to be done only on the 3rd line.

8. To replace or substitute 'a' on a range of lines, say from 1st to 3rd line:
$ sed '1,3s/a/A/g' sample1.txt
Apple
orAnge
bAnAnA
pappaya
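
Line numbers are not the only kind of address. As a short sketch on the same sample data, '$' as an address stands for the last line, and a /regex/ address selects lines by their content:

```shell
# '$' as an address means the last line: substitute from line 2 to the end.
to_end=$(printf '%s\n' apple orange banana pappaya | sed '2,$s/a/A/g')
printf '%s\n' "$to_end"

# A /regex/ address selects lines by content: only lines containing 'an'.
by_text=$(printf '%s\n' apple orange banana pappaya | sed '/an/s/a/A/g')
printf '%s\n' "$by_text"
```

The first command leaves only apple untouched; the second changes only orange and banana, the two lines that contain 'an'.
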
9. To replace the entire line with new text built from the old line. For example, to replace 'apple' with 'apple is a Fruit':
$ sed 's/.*/& is a Fruit/' sample1.txt
apple is a Fruit
orange is a Fruit
banana is a Fruit
pappaya is a Fruit
   The '&' symbol denotes the entire text matched by the pattern. In this case, since '.*' matches the entire line, '&' contains the entire line. This type of matching is really useful when you have a file containing a list of file names and you want to, say, rename them, as we have shown in one of our earlier articles: Rename group of files
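
A rough sketch of that renaming idea follows. The file names and the '.bak' suffix are made up for illustration, and the command only prints the mv commands; it does not run them:

```shell
# Hypothetical file names; in practice they would come from a list file.
cmds=$(printf '%s\n' report.txt notes.txt | sed 's/.*/mv & &.bak/')
printf '%s\n' "$cmds"
```

Each '&' is replaced by the whole line, so the output is 'mv report.txt report.txt.bak' and 'mv notes.txt notes.txt.bak'. Names containing spaces would need quoting before such output is handed to a shell.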

10. Using sed, we can also do multiple substitutions. For example, say to replace all 'a' with 'A', and all 'p' with 'P':
$ sed 's/a/A/g; s/p/P/g' sample1.txt
APPle
orAnge
bAnAnA
PAPPAyA
OR This can also be done as:
$ sed -e 's/a/A/g' -e 's/p/P/g' sample1.txt
APPle
orAnge
bAnAnA
PAPPAyA
  Each '-e' option supplies one sed command, so it is used when you have more than one substitution to be done.
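
Whichever form is used, the substitutions are applied to each line in the order they are written, and a later one sees the result of the earlier one. A small sketch with two substitutions chosen to interact:

```shell
# 'a' becomes 'p' first, so the second command then upper-cases those too.
fwd=$(printf '%s\n' pappaya | sed 's/a/p/g; s/p/P/g')

# Reversed order: the original 'p's are upper-cased before any 'a' turns into 'p'.
rev=$(printf '%s\n' pappaya | sed 's/p/P/g; s/a/p/g')

printf '%s\n%s\n' "$fwd" "$rev"
```

The first command prints PPPPPyP and the second prints PpPPpyp, so the order matters whenever one substitution can produce text that another one matches.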

OR the multiple substitutions can also be spread over several lines, as shown below:
$ sed -e 's/a/A/g' \
> -e 's/p/P/g' sample1.txt
APPle
orAnge
bAnAnA
PAPPAyA
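
All the commands above print their result on the screen and leave sample1.txt itself untouched. To keep the result, one portable approach (a sketch; the scratch directory and the name sample1.tmp are assumptions made for the example) is to write to a temporary file and then move it over the original:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' apple orange banana pappaya > sample1.txt

# Write the substituted text to a temporary file, then replace the original.
# Redirecting straight back into sample1.txt would empty it before sed reads it.
sed 's/a/A/g' sample1.txt > sample1.tmp && mv sample1.tmp sample1.txt

cat sample1.txt
```

GNU sed can also edit the file directly with the '-i' option, and BSD sed offers '-i' too but expects a backup suffix as its argument, so the temporary-file form is the safer choice when the script may run on different systems.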

9 comments:

  1. awesome and simple, congrats
    exactly what I was looking for.

  2. I want to insert a pipe { | } after the first character of every line in a file. How do I do it?

  3. How do I skip certain lines? Say I want to do the above substitution but not for lines that have a comma (,) in them. Or not for those where the substitution would occur after a !

    Replies
    1. To skip lines containing a comma, you can do it like this (it prints only the non-empty lines that have no comma; '-r' is a GNU sed option):
      sed -nr '/^[^,]+$/p' file

  4. Hi,

    I have a small question regarding lines.

    How does Linux decide where a line ends?
