http://etext.lib.virginia.edu/services/helpsheets/unix/regex.html has a good introduction to Regular Expressions – grep, sed, and friends.
It includes a brief discussion on Backreferences (aka “the stuff that * matched”)
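To give a flavour of what a backreference does, here is a minimal sketch. The file name words.txt and its contents are made up for illustration, and I'm assuming mktemp is available. In grep's basic syntax, \(.\) remembers whatever single character it matched and \1 asks for that same character again, so the pattern finds lines containing a doubled letter.

```shell
# Build a throwaway word list (hypothetical data, just for this demo).
tmpdir=$(mktemp -d)
printf '%s\n' 'hello' 'helo' 'book' 'bark' > "$tmpdir/words.txt"

# \(.\) captures any one character; \1 must then be that same character.
doubled=$(grep '\(.\)\1' "$tmpdir/words.txt")
printf '%s\n' "$doubled"

rm -r "$tmpdir"
```

Only “hello” and “book” should come back, because they are the only lines with the same character twice in a row.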
There are two sides to grep – like any command, there’s the learning of syntax, the beginning
of which I covered in the grep tool tip. I’ll
come back to the syntax later, because there is a lot of it.
However, the more powerful side is grep’s use of regular expressions. Again, there’s not room
here to provide a complete rundown, but what follows should be enough to cover 90% of usage. Once I’ve got a library
of grep-related stuff, I’ll post an entry with links to them all, with some covering text.
Without going fully case-insensitive (which is what -i does),
we can search for “Hello” or “hello” by listing the alternative
characters in square brackets:
$ grep [Hh]ello *.txt
test1.txt:Hello. This is test file.
test3.txt:hello
test3.txt:Hello
test3.txt:Why, hello there!
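If you want to try the square-bracket trick without touching your own files, something like the following sketch should do. The file demo.txt and its contents are invented for the demo, and mktemp is assumed to be available. I've quoted the pattern so that the shell does not try to treat [Hh]ello as a filename wildcard before grep ever sees it.

```shell
# Throwaway file with a mix of capitalisations (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'Hello. This is test file.' 'hello' 'HELLO' 'Why, hello there!' > "$tmpdir/demo.txt"

# [Hh] matches exactly one character, which must be either "H" or "h".
bracket_matches=$(grep '[Hh]ello' "$tmpdir/demo.txt")
printf '%s\n' "$bracket_matches"

rm -r "$tmpdir"
```

The all-capitals “HELLO” line is left out, which is the difference between this and -i.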
If we’re not bothered what the third letter is, then we can say “grep [Hh]e.lo *.txt”, because the dot (“.”) will match any single character.
If we don’t care what the third and fourth letters are, so long as it’s “he..o”, then we say exactly that: “grep he..o” will match “hello”, “hecko” and “heolo”; anything at all, so long as it is “he” + any 2 characters + “o”.
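Here is a small sketch of the dot in action, using a made-up word list (words.txt is hypothetical, and mktemp is assumed to be available). It runs one pattern with a single dot and one with two dots, so you can compare what each lets through.

```shell
# Throwaway word list (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'hello' 'hecko' 'heolo' 'helo' 'help' > "$tmpdir/words.txt"

# One dot: "he", then any one character, then "lo".
one_dot=$(grep 'he.lo' "$tmpdir/words.txt")
# Two dots: "he", then any two characters, then "o".
two_dots=$(grep 'he..o' "$tmpdir/words.txt")

printf 'he.lo matched:\n%s\n' "$one_dot"
printf 'he..o matched:\n%s\n' "$two_dots"

rm -r "$tmpdir"
```

Each dot stands for exactly one character, so “helo” is too short for either pattern, and “hecko” only gets through the two-dot version.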
If we want to find anything like that, other than “hello”, we can do that, too:
$ grep he[^l]lo *.txt
test2.txt:heclo
test3.txt:hewlo
Notice how it doesn’t pick up any of the “Hello” variations which have a “llo” in them?
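The “anything but” form can be checked with a quick sketch like this one. The word list is invented (words.txt is a hypothetical name) and mktemp is assumed to be available. A “^” as the first thing inside the square brackets flips the meaning to “any one character except these”.

```shell
# Throwaway word list (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'hello' 'heclo' 'hewlo' 'Hello' > "$tmpdir/words.txt"

# [^l] matches any single character that is NOT "l".
negated=$(grep 'he[^l]lo' "$tmpdir/words.txt")
printf '%s\n' "$negated"

rm -r "$tmpdir"
```

Neither “hello” nor “Hello” is reported, because in both the third letter is an “l”.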
We can specify how many times a character can repeat, too. The “*” applies to whatever comes immediately before it, whether that is a single character or an expression in [square brackets]:
So, we can match “he”, followed by as many “l”s as you like (even none), followed by an “o” with “grep he[l]*o *.txt” (plain “hel*o” means the same thing):
$ grep he[l]*o *.txt
test2.txt:helo
test3.txt:hello
test3.txt:Why, hello there!
test3.txt:hellllo
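A rough sketch to try the “*” out on its own, again with an invented word list (words.txt is hypothetical, mktemp is assumed to be available). It runs the pattern with and without the square brackets around the “l”, to show that the two spellings pick out the same lines. The patterns are quoted here because an unquoted “*” is something the shell itself likes to expand.

```shell
# Throwaway word list (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'heo' 'helo' 'hello' 'hellllo' 'hexo' > "$tmpdir/words.txt"

# "*" means "zero or more of the thing just before it".
with_brackets=$(grep 'he[l]*o' "$tmpdir/words.txt")
without_brackets=$(grep 'hel*o' "$tmpdir/words.txt")

printf '%s\n' "$with_brackets"

rm -r "$tmpdir"
```

“heo” matches because zero “l”s is allowed; “hexo” does not, because an “x” is not an “l”.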
A powerful and useful tool in the shell scripter’s arsenal is grep. If you’ve not come across it before, it’s similar to the “find” tool that DOS had; it finds strings in files. The name comes from the ed editor command “g/re/p”, which means “globally search for a regular expression and print”; a “regular expression” is a pattern describing the text to match, and a plain string is the simplest kind of pattern.
Example:
$ grep foo myfile.txt
and Steve said, "foo! that's crazy"
$
That searches for “foo” in the file called “myfile.txt”. It gets any line (yes, the whole line) which contains the search text.
But you can do other stuff, with “switches”. For example “-i” means “insensitive to case”:
$ grep -i foo myfile.txt
"Foo" is a word, associated with "Bar".
and Steve said, "foo! that's crazy"
This time, grep finds that the word “foo” is actually mentioned twice in “myfile.txt”; once as “Foo” and once as “foo”.
The “-i” flag is a pretty common one, then, because a case-insensitive search is often what we really want.
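To see the difference side by side, here is a minimal sketch that runs the same search with and without “-i”. The file name myfile.txt matches the example, but here it is a throwaway copy created in a temporary directory (mktemp is assumed to be available), and the third line is one I made up so that there is something which never matches.

```shell
# Throwaway copy of the example file, plus one line that never matches.
tmpdir=$(mktemp -d)
printf '%s\n' '"Foo" is a word, associated with "Bar".' \
              'and Steve said, "foo! that'"'"'s crazy"' \
              'nothing to see here' > "$tmpdir/myfile.txt"

# Case-sensitive: only the lowercase "foo" line.
sensitive=$(grep foo "$tmpdir/myfile.txt")
# Case-insensitive: "Foo" and "foo" both count.
insensitive=$(grep -i foo "$tmpdir/myfile.txt")

printf 'without -i:\n%s\n' "$sensitive"
printf 'with -i:\n%s\n' "$insensitive"

rm -r "$tmpdir"
```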
Here’s a good one, though: Under Linux, a special file /proc/bus/usb/devices lists your USB devices. That’s good, but yuck, it’s a mess of (too much) detailed information:
T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.15-27-server uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:07.2
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms
T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=ff(vend.) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
S: SerialNumber=0090D00D0B25
C:* #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=00 Prot=00 Driver=usbfs
How do I just get what I need from the file? One switch to grep, which I don’t use as much as I should, is “-A”, for “After”. (Note that it’s a capital “A”).
After the Vendor ID and Product ID, /proc/bus/usb/devices includes the name of the device, so I can find out what I’ve got installed with a Vendor ID of 06b9 quite easily:
$ grep -A 2 06b9 /proc/bus/usb/devices
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
Or what have I got from Alcatel?
$ grep -i -A1 Alcatel /proc/bus/usb/devices
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
I can also ask: Who made my Speed Touch modem, or what’s its ID? “-B” displays lines before the line that matches:
$ grep -B 2 Speed /proc/bus/usb/devices
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
$
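If your machine has no /proc/bus/usb/devices to play with, the sketch below fakes a few lines of it in a temporary file (devices.txt is a hypothetical name, and mktemp is assumed to be available) and then asks for one line of context on each side of the same match. “-A” and “-B” are not part of the POSIX grep specification, so this assumes a grep that supports them, such as GNU or BSD grep.

```shell
# A few lines copied from the listing, saved to a throwaway file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/devices.txt" <<'EOF'
D: Ver= 1.10 Cls=ff(vend.) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
S: SerialNumber=0090D00D0B25
EOF

# -A 1: the matching line plus one line After it.
after=$(grep -A 1 Manufacturer "$tmpdir/devices.txt")
# -B 1: the matching line plus one line Before it.
before=$(grep -B 1 Manufacturer "$tmpdir/devices.txt")

printf 'with -A 1:\n%s\n' "$after"
printf 'with -B 1:\n%s\n' "$before"

rm -r "$tmpdir"
```

The number after the switch is just how many extra lines you want, so it is easy to widen the window until the bit you need comes into view.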
There’s a lot you can do with grep; I’ve only really covered the first line from “man grep”.