Harnessing the flexibility of Regular Expressions with Grep

There are two sides to grep. Like any command, there’s syntax to learn, and I covered the
beginnings of that in the grep tool tip. I’ll
come back to the syntax later, because there is a lot of it.

However, the more powerful side is grep’s use of regular expressions. Again, there’s not room
here to provide a complete rundown, but what follows should be enough to cover about 90% of everyday usage. Once I’ve got a library
of grep-related stuff, I’ll post an entry with links to them all, with some covering text.

This or That

Without being totally case-insensitive (which is what -i does),
we can search for “Hello” or “hello” by listing the alternative
characters in square brackets:

$ grep '[Hh]ello' *.txt
test1.txt:Hello. This is  test file.
test3.txt:hello
test3.txt:Hello
test3.txt:Why, hello there!
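
To see the difference between the square brackets and -i without needing any test files, here is a small sketch that feeds grep from printf. The three sample words are made up for illustration; they are not the contents of the test files above.

```shell
# Hypothetical sample input: three spellings of the same word.
words=$(printf '%s\n' Hello hello HELLO)

# The bracket expression accepts "H" or "h" in the first position only.
bracket_matches=$(printf '%s\n' "$words" | grep '[Hh]ello')

# -i ignores case in every position, so the all-caps line matches too.
nocase_matches=$(printf '%s\n' "$words" | grep -i 'hello')

printf 'with brackets:\n%s\n' "$bracket_matches"
printf 'with -i:\n%s\n' "$nocase_matches"
```

The pattern is in single quotes so that the shell passes the square brackets to grep untouched instead of trying to expand them as a filename pattern.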

If we’re not bothered what the third letter is, then we can say “grep '[Hh]e.lo' *.txt”, because the dot (“.”) will match any single character.

If we don’t care what the third and fourth letters are, so long as it’s “he..o”, then we say exactly that: “grep 'he..o'” will match “hello”, “hecko” and “heolo”; anything will do, so long as it is “he” + 2 characters + “o”.
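
A quick way to check how many characters each dot stands for is to run the patterns over a short word list. This is only a sketch: the word list is invented for the purpose, and the variable names are my own.

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' Hello hewlo hecko heolo helo)

# One dot: any single character in third position, then a literal "lo".
one_dot=$(printf '%s\n' "$words" | grep '[Hh]e.lo')

# Two dots: any two characters between "he" and "o".
two_dots=$(printf '%s\n' "$words" | grep 'he..o')

printf 'one dot:\n%s\n' "$one_dot"
printf 'two dots:\n%s\n' "$two_dots"
```

Note that “helo” matches neither pattern: each dot has to consume exactly one character, so a four-letter word is too short for both.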

If we want to find anything like that, other than “hello”, we can do that, too:

$ grep 'he[^l]lo' *.txt
test2.txt:heclo
test3.txt:hewlo

Notice how it doesn’t pick up any of the “hello” variations? The “[^l]” matches any single character except “l”, so nothing with “llo” straight after the “he” gets through.
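
The same check can be run without the test files. A minimal sketch, again with a made-up word list:

```shell
# Hypothetical word list: two near-misses and two real "hello"s.
words=$(printf '%s\n' heclo hewlo hello Hello)

# [^l] means "any one character that is not l".
not_l=$(printf '%s\n' "$words" | grep 'he[^l]lo')

printf '%s\n' "$not_l"
```

Only “heclo” and “hewlo” come back. “hello” fails because its third letter is an “l”, and “Hello” fails because the pattern starts with a lowercase “h”.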

How many?!

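We can specify how many times a character can repeat, too. The repeat symbol goes straight after the thing it applies to, which can be a single character or an expression in [square brackets]. In plain grep only “*” works like this out of the box; “?” and “+” need the -E option (GNU grep also accepts them written as “\?” and “\+”).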

  • “?” means “it might be there” (zero or one)
  • “+” means “it’s there, but there might be loads of them” (one or more)
  • “*” means “lots (or none) might be there” (zero or more)

So, we can match “he”, followed by as many “l”s as you like (even none), followed by an “o” with “grep 'he[l]*o' *.txt” (the brackets are optional around a single character, so 'hel*o' means the same thing):

$ grep 'he[l]*o' *.txt
test2.txt:helo
test3.txt:hello
test3.txt:Why, hello there!
test3.txt:hellllo
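
To compare all three repeat symbols side by side, here is a sketch that runs each one over the same invented word list. It assumes a grep that supports the -E option for “?” and “+”.

```shell
# Hypothetical word list with zero, one, two and four "l"s.
words=$(printf '%s\n' heo helo hello hellllo)

# "?": zero or one "l" (needs -E).
zero_or_one=$(printf '%s\n' "$words" | grep -E 'hel?o')

# "+": one or more "l"s (needs -E).
one_or_more=$(printf '%s\n' "$words" | grep -E 'hel+o')

# "*": zero or more "l"s (works in plain grep).
zero_or_more=$(printf '%s\n' "$words" | grep 'hel*o')

printf '?:\n%s\n' "$zero_or_one"
printf '+:\n%s\n' "$one_or_more"
printf '*:\n%s\n' "$zero_or_more"
```

“?” keeps “heo” and “helo”, “+” keeps everything except “heo”, and “*” keeps all four.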
