Sed: simple pattern address usage



Posted on 06 Dec 2010
Tagged with: [ pattern ] [ regex ] [ sed ]

Most people I know use sed for quick, simple replacement of a keyword in files, for instance changing ports and tags inside configuration files during deployment to production servers. This sometimes results in clumsy scripts whose only job is to make sure that sed changes a keyword on line 4, but not on line 40. Most of them have no idea that you can actually limit the range of lines on which sed operates. Let’s explore…

Sed’s general form is something like this:

[address[,address]][!]command [args]

As you can see, it LOOKS simple enough, but it really isn’t. Let’s take a look at a simple sed-action:

echo "Hello world" | sed -e 's/world/people/'

This will output “Hello people” since it changes the text “world” into “people”. If we stream a whole file into sed instead of a single line, it replaces the first “world” on every line of the file. To replace every “world” on a line that contains more than one, add the “g” flag (g meaning global here).

cat largefile.txt | sed -e 's/world/people/g'
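A quick way to see the difference for yourself. This is just a sketch; the sample text is made up for the illustration:

```shell
# Made-up sample line with two occurrences of "world".
line='world, meet world'

# Without the g flag only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/world/people/')

# With the g flag every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed -e 's/world/people/g')

printf '%s\n' "$first_only"    # people, meet world
printf '%s\n' "$all_matches"   # people, meet people
```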

Meet sed’s addressing

This is all fine and dandy, until you don’t want to replace all world-tags in a file, but just the tags in the first, say, 10 lines. It’s very easy to accomplish this in sed:

cat largefile.txt | sed -e '1,10 s/world/people/g'

This looks the same, only we have added an address range to sed. In this case, it will change world into people on lines 1 to 10 only. Sometimes we want to do the opposite: change all lines EXCEPT 1 to 10. Still easy enough:

cat largefile.txt | sed -e '1,10 ! s/world/people/g'
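To keep the output readable, here is the same idea scaled down to a made-up four-line input, with the range 1,2 standing in for 1,10:

```shell
# Made-up four-line input; every line contains "world".
input=$(printf 'world 1\nworld 2\nworld 3\nworld 4\n')

# Substitute only on lines 1 and 2.
in_range=$(printf '%s\n' "$input" | sed -e '1,2 s/world/people/')

# The ! inverts the address: substitute on every line EXCEPT 1 and 2.
outside=$(printf '%s\n' "$input" | sed -e '1,2 ! s/world/people/')

printf '%s\n' "$in_range"
# people 1
# people 2
# world 3
# world 4

printf '%s\n' "$outside"
# world 1
# world 2
# people 3
# people 4
```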

Now for the fun part

It is not only possible to add line numbers for addressing, but you can also add a regex if you like (or any mix).

cat largefile.txt | sed -e '/^start/,/^end/ s/world/people/g'

This will change world to people on every line from a line that starts with “start” up to and including the next line that starts with “end”. The match is case-sensitive, and once a range has ended, a later line that starts with “start” opens a new one. So let’s see which lines will be changed in the following file:

hello world
    
this is a world readable test file
let's start shall we?

start
the world is not enough. But the world should be.
end

start with a better world
ending with a world of hate is not an option

Did you catch them all? Here’s the answer:

hello world

this is a world readable test file
let's start shall we?

start
the people is not enough. But the people should be.
end

start with a better people
ending with a people of hate is not an option
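If you are ever unsure which lines an address range selects, you can ask sed to show you: -n suppresses the normal output and the p command prints only the addressed lines. The sketch below uses a shortened copy of the test file, and also tries a mixed address (a line number as start, a regex as end):

```shell
# Shortened copy of the test file from this post.
sample=$(cat <<'EOF'
hello world
let's start shall we?
start
the world is not enough
end
start with a better world
ending with a world of hate
EOF
)

# Print only the lines selected by the regex range.
selected=$(printf '%s\n' "$sample" | sed -n -e '/^start/,/^end/p')

# Mixed addressing: from line 2 up to the first line that starts with "end".
mixed=$(printf '%s\n' "$sample" | sed -n -e '2,/^end/p')

printf '%s\n' "$selected"
# start
# the world is not enough
# end
# start with a better world
# ending with a world of hate

printf '%s\n' "$mixed"
# let's start shall we?
# start
# the world is not enough
# end
```

Note that “ending with…” also starts with “end”, which is why it closes the second range.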

An advanced example

Let’s see how some more advanced usages work. These examples use application.ini, which is a zend framework configuration file, but basically any INI file will do. Let’s “remove” the [testing] section by commenting out the data. Note that we don’t know (nor care) where (or if) this [testing] section is present or how long it is.

cat application.ini |
sed -e '/^\[testing]/,/^\[/ { /^\[testing]/b ; /^\[/b; s/\(.*\)/; **DELETED** \1/ }'
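To see what comes out, here is a sketch that runs this kind of script on a tiny INI file. The section names and keys are invented for the illustration, and the substitution used is s/\(.*\)/; **DELETED** \1/. I have spread the sed commands over several lines, which should behave the same as the one-liner but does not depend on how a particular sed treats a ; right after the b command (an assumption on my side that some implementations are stricter about this):

```shell
# Invented INI content, standing in for application.ini.
ini=$(cat <<'EOF'
[production]
debug = 0
[testing]
debug = 1
db.host = localhost
[development]
debug = 1
EOF
)

# Comment out everything between [testing] and the next section header,
# leaving both header lines untouched.
result=$(printf '%s\n' "$ini" | sed -e '/^\[testing]/,/^\[/ {
  /^\[testing]/b
  /^\[/b
  s/\(.*\)/; **DELETED** \1/
}')

printf '%s\n' "$result"
# [production]
# debug = 0
# [testing]
# ; **DELETED** debug = 1
# ; **DELETED** db.host = localhost
# [development]
# debug = 1
```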

So, what does this actually do then? Let’s take a look at the address range:

The address-range:

/^\[testing]/,/^\[/

This address range starts at a line that starts with [testing] and ends at the next line that starts with [. When dealing with INI files, this basically means you match a complete [testing] section, INCLUDING the [testing] line and the header line of the next section. If [testing] happens to be the last section, the range simply runs to the end of the file. It looks a bit messy, but it is the same /RE1/,/RE2/ syntax as before.

The { } braces make it possible to group multiple sed commands, much like a block in PHP. In this case, we have 3 commands, separated by a ; (sounds very familiar, doesn’t it?)

/^\[testing]/b

This command tells sed to “branch” to the end of the script: the remaining commands are skipped, the line is printed as-is, and sed continues with the next line. In effect, it’s a check that no other commands will be run when we match the initial [testing] line. It would be easier if sed had a way to exclude the boundary lines from an address range, but alas. So we are forced to check the start and end matches ourselves.

/^\[/b

As said, we checked the start, so we have to check the end as well. This command skips the line that starts the next section, so neither boundary line gets changed by sed.
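The b command is easier to grasp in isolation. In this small sketch (the input is made up), lines that start with # branch to the end of the script and so never reach the substitution:

```shell
# Made-up input: one comment line and one normal line.
input=$(printf '# hello world\nhello world\n')

# Lines starting with # branch past the s command and are printed untouched.
result=$(printf '%s\n' "$input" | sed -e '/^#/b' -e 's/world/people/')

printf '%s\n' "$result"
# # hello world
# hello people
```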

s/\(.*\)/; **DELETED** \1/

Once we arrive here, we can safely change the line. This command does a substitution (the ‘s’) that matches .*, which is regex-speak for the whole line. The \( \) are needed to capture this whole line into the \1 reference we use inside the substitution.

We replace the line with “; **DELETED** ” followed by \1. This effectively puts the “; **DELETED** ” string in front of the line, and since ; starts a comment in an INI file, the line is commented out.
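Here is that substitution on its own, applied to a single made-up line. As a side note, sed’s & (which stands for the whole matched text) gives the same result without a capture group; that variant is my own addition, not part of the script above:

```shell
# Made-up INI line.
line='debug = 1'

# Capture the whole line in \1 and put the marker in front of it.
with_group=$(printf '%s\n' "$line" | sed -e 's/\(.*\)/; **DELETED** \1/')

# Same effect using & for the whole match.
with_amp=$(printf '%s\n' "$line" | sed -e 's/.*/; **DELETED** &/')

printf '%s\n' "$with_group"   # ; **DELETED** debug = 1
printf '%s\n' "$with_amp"     # ; **DELETED** debug = 1
```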

That’s it.. The whole INI-section called [testing] is commented out in one go and you know a bit more about sed’s address usage.

Enjoy..
