Search and Replace

February 6, 2007

Two great tools for search-and-replace are tr and sed.

tr – Translate (or Delete)

tr can translate a single character into another. For example, “tr 'a' 'b'” will convert all instances of “a” into “b”. Although it works on single characters, it also understands blocks, so “tr '[A-Z]' '[a-z]'” will convert uppercase to lowercase: ‘A’ becomes ‘a’, ‘B’ becomes ‘b’, and so on.

The GNU version of tr (which comes with most Linux distributions) has some handy keywords, too: “tr [:lower:] [:upper:]” is a more readable opposite of the above [A-Z] convention.

You can also specify your own set of characters, so if you want to convert ‘l’ to ’1′, ‘o’ to ’0′, and ‘e’ to ’3′, then this will do the job:

$  echo welcome | tr 'loe' '103'
w31c0m3

You could extend it to a more complete “l33t sp33k”:

$ echo abcdefghijlkmnopqrstuvwxyz | tr 'aeilost' '@3!1057'
@bcd3fgh!j1kmn0pqr57uvwxyz

tr can also just delete – this deletes the letter ‘l’ :

$ echo hello and welcome | tr -d 'l'
heo and wecome

sed – Stream Editor

UNIX uses a few metaphors, one being a water metaphor, which we use with pipes (|), redirects (<, >), and a few other places. Sed gets its name from what it does… much like tr, you stream data into it, and slightly modified data comes out the other end.

sed isn’t limited to single-character operations; it can cope with whole phrases, as well as regular expressions. I’ll keep it simple(ish) for now, I plan to do a more complete post on sed and another on regular expressions soon, though. For today, I’ll stick to the sed s/from/to/count syntax.

With the s/from/to/count syntax, sed will convert “from” to “to”, as many times (per line of text) as you specify. The special “/g” converts every instance.

I like to get stuck in with a few examples, so here goes:

$ cat text.txt
Fedora Core is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Fedora Core is definitely
the best Linux distribution for home users.
Fedora Core is certainly my favourite distribution.
$ sed s/"Fedora Core"/"Ubuntu"/g text.txt
Ubuntu is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Ubuntu is definitely
the best Linux distribution for home users.
Ubuntu is certainly my favourite distribution.
$

The syntax there was sed s/from/to/count, so it replaces “Fedora Core” with “Ubuntu” in this example. If we specified “/1” at the end, it would only convert the first instance on each line. Similarly, “/2” would convert the first two instances. “/g” is probably the most-used, it converts everything (the “g” stands for “global”).

sed can emulate tr‘s tr -d functionality by having the “to” part being an empty string; here we refer to “Fedora Core” simply as “Fedora” (note the leading space: it’s ” Core”, not “Core”) :

$ sed s/" Core"//g text.txt
Fedora is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Fedora is the
best Linux distribution for home users.
Fedora is certainly my favourite distribution.

Notice also that we can cat stuff into sed, as “cat text.txt | sed s/src/dest/g“, or we can pass the file directly to sed, like this: sed s/src/dest/g text.txt. The same applies to most *nix commands.

To get in to the rest, we’ll need to get into regexp (Regular Expressions, the stuff like “the * brown * jumped over the * dog” which result in $1 = “quick”, $2 = “fox”, $3 = “lazy”). That’s for another day, though.


Follow

Get every new post delivered to your Inbox.