Looping in the shell (for and while)

March 7, 2007

In programming, the two most common types of loop are “for” and “while” loops. We can do both (and “repeat” loops, too, because that’s just a special case of the “while” loop) with the *nix shell. I’ve got some more detail in the tutorial.

for loops

Some languages have a “foreach” command; if you are used to such a language, then treat the shell’s for command as equivalent to foreach. If not, then don’t worry about it; just watch the examples. It’s about as simple as it can be.

$ for artist in Queen "Elvis Costello" Metallica
> do
> echo "I like $artist"
> done
I like Queen
I like Elvis Costello
I like Metallica
$

The for loop will simply go through whatever text it gets passed, and do the same stuff with each item.

Note that for “Elvis Costello” to be marked as one artist, not “Elvis” and “Costello”, I had to put quotes around the words. However, if these were files, then the following would suffice:

$ ls -l
total 0
-rw-r--r-- 1 steve steve 0 2007-03-07 00:45 Elvis Costello
-rw-r--r-- 1 steve steve 0 2007-03-07 00:45 Metallica
-rw-r--r-- 1 steve steve 0 2007-03-07 00:45 Queen
$ for artist in *
> do
> echo "I like $artist"
> done
I like Elvis Costello
I like Metallica
I like Queen
$

This time around, the shell expanded the “*” into the list of filenames before the loop started, and it sorts that list, so they are processed in alphabetical order.
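
One thing to watch when the loop does more than echo: the variable should stay inside double quotes whenever it is handed to another command, or a name like “Elvis Costello” gets split in two again. A rough sketch, assuming mktemp is available to make a scratch directory (the directory and the counter are only there for the demonstration):

```shell
# Scratch directory holding the three files from the example above.
dir=$(mktemp -d)
touch "$dir/Elvis Costello" "$dir/Metallica" "$dir/Queen"

count=0
for artist in "$dir"/*
do
  # Quoted, "$artist" reaches basename as one argument, space and all.
  name=$(basename "$artist")
  echo "I like $name"
  count=$((count + 1))
done

echo "$count files"
rm -rf "$dir"
```

Without the quotes around $artist, basename would be given “Elvis” and “Costello” as two separate arguments.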

Many languages can do better than “foreach”, though: “for i=1 to 99 step 3”, for example, to step through 1,4,7,10 .. 91, 94, 97.

We can do this with a while loop.

while loops

While loops are not quite as simple as for loops; they have some kind of condition to match; when the condition does not match, the loop will exit.

The example above, stepping through from 1 to 100 in increments of 3 (1, 4, 7, 10, … 91, 94, 97), can easily enough be done with a while loop:

$ i=1
$ while [ "$i" -lt "100" ]
> do
>   echo $i
>   i=`expr $i + 3`
> done
1
4
7
...
91
94
97

The “i=`expr $i + 3`” means “increment ‘i’ by 3” (“i = i + 3” in most other languages).

The “-lt” means “is less than” (“-le” means “is less than or equal to”); see “man test”, or http://steve-parker.org/sh/test.shtml and http://steve-parker.org/sh/quickref.shtml
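
For what it’s worth, POSIX-style shells (bash, dash, ksh and friends) can also do the sum themselves with $(( )), which saves starting an expr process on every pass. A minimal sketch of the same loop; the function name count_by_three is made up for this example:

```shell
# count_by_three is a hypothetical name, used only for illustration.
# Prints 1, 4, 7 ... 97, one number per line.
count_by_three() {
  i=1
  while [ "$i" -lt 100 ]
  do
    echo "$i"
    i=$((i + 3))    # same effect as i=`expr $i + 3`
  done
}

count_by_three
```

A very old Bourne shell will not understand $(( )), in which case expr is still the way to go.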


Tool Tip: “ls”

February 26, 2007

Yeah yeah, we know ls already.

But how much of ls’s functionality do you actually use? There are so many switches to ls, that when Sun added extended attributes (does anyone use that?) they found that there were no letters left, so they had to use “-@” !

So, here are a couple of handy ls options, in no particular order; either for interactive or scripting use. I’m assuming GNU ls; Solaris ls supports most GNU-style features, but the “nice-to-have” features, like ls -h aren’t in historical UNIX ls implementations. I’ll split these into two categories: Sort ‘em and Show ‘em. What are your favourites?

Sort ‘em

When sorting, I tend to use the “-l (long listing)” and “-r (reverse order)” switches:

Sort ‘em by Size:

ls -lSr

Sort ‘em by Date:

ls -ltr
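
In a script, the date sort is a quick way to pick out the newest or oldest file. A small sketch, assuming simple filenames with no newlines in them; the scratch directory and the two file names are invented for the demonstration:

```shell
dir=$(mktemp -d)
# Give the two files known, different modification times.
touch -t 200701010000 "$dir/old.txt"
touch -t 200702010000 "$dir/new.txt"

newest=$(ls -t "$dir" | head -n 1)    # -t sorts newest first
oldest=$(ls -tr "$dir" | head -n 1)   # -r reverses that, so oldest first
echo "newest: $newest, oldest: $oldest"
rm -rf "$dir"
```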

Show ‘em

There are a number of ways to show different attributes of the files you are listing; “-l” is probably the obvious example. However, there are a few more:

Show ‘em in columns

ls -C

Useful when ls has dropped to one name per line (as it does when its output is piped to another command) and you want the columns back.

Show ‘em one by one

ls -1

That’s the number 1 (one) there, not the letter l (ell). Forces one-file-per-line. Particularly useful for dealing with strange filenames with whitespace in them.

Show ‘em as they are

ls -F

To append symbols (“*” for executables, “/” for directories, etc) to the filename to show further information about them.

Show ‘em so I can read it

ls -lh

Human-readable filesizes, so “12567166” is shown as “12M”, and “21418” is “21K”. This is handy for people, but of course, if you’re writing a script which wants to know file sizes, you’re better off without this (21Mb is bigger than 22Kb, after all!)
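
If a script only needs the size of one file, one option is to skip ls altogether and count the bytes with wc. A minimal sketch; size_in_bytes is a made-up helper name, and the temporary file is only there so the example has something to measure:

```shell
# size_in_bytes is a hypothetical helper name.
size_in_bytes() {
  # wc -c counts bytes; reading from stdin stops it printing the file name.
  # tr strips the padding that some wc implementations add.
  wc -c < "$1" | tr -d ' '
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
size=$(size_in_bytes "$tmp")
echo "$size"        # five letters plus the newline
rm -f "$tmp"
```

The result is a plain number of bytes, so it can go straight into a numeric test such as [ "$size" -gt 1000 ].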

Show ‘em with numbers

ls -n

This is equivalent to ls -l, except that UID and GID are not looked up, so:

$ ls -l foo.txt
-rw-r--r-- 1 steve steve 46210 2006-11-25 00:33 foo.txt
$ ls -n foo.txt
-rw-r--r-- 1 1000 1000 46210 2006-11-25 00:33 foo.txt

This can be useful in a number of ways; particularly if your NIS (or other) naming service is down, or if you’ve imported a filesystem from another system.

What’s your favourite?

What are your most-used switches for the trusty old ls tool?


Harnessing the flexibility of Regular Expressions with Grep

February 14, 2007

There are two sides to grep – like any command, there’s the learning of syntax, the beginning
of which I covered in the grep tool tip. I’ll
come back to the syntax later, because there is a lot of it.

However, the more powerful side is grep’s use of regular expressions. Again, there’s not room
here to provide a complete rundown, but it should be enough to cover 90% of usage. Once I’ve got a library
of grep-related stuff, I’ll post an entry with links to them all, with some covering text.

This or That

Without being totally case-insensitive (which -i does),
we can search for “Hello” or “hello” by specifying the optional
characters in square brackets:

$ grep [Hh]ello *.txt
test1.txt:Hello. This is  test file.
test3.txt:hello
test3.txt:Hello
test3.txt:Why, hello there!

If we’re not bothered what the third letter is, then we can say “grep [Hh]e.o *.txt“, because the dot (“.”) will match any single character.

If we don’t care what the third and fourth letters are, so long as it’s “he..o”, then we say exactly that: “grep he..o” will match “hello”, “hecko”, “heolo”, so long as it is “he” + 2 characters + “o”.

If we want to find anything like that, other than “hello”, we can do that, too:

$ grep he[^l]lo *.txt
test2.txt:heclo
test3.txt:hewlo

Notice how it doesn’t pick up any of the “Hello” variations which have a “llo” in them?

How many?!

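We can specify how many times a character can repeat, too. The repeat symbol applies to whatever comes immediately before it, whether that is a single character or a set in [square brackets]. Note that “?” and “+” are extended regular expressions, so they need “grep -E” (or egrep); “*” works in plain grep: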

  • “?” means “it might be there”
  • “+” means “it’s there, but there might be loads of them”
  • “*” means “lots (or none) might be there”

So, we can match “he”, followed by as many “l”s as you like (even none), followed by an “o” with “grep he[l]*o *.txt“:

$ grep he[l]*o *.txt
test2.txt:helo
test3.txt:hello
test3.txt:Why, hello there!
test3.txt:hellllo
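
To see the three repeat symbols side by side, here is a small sketch run against a made-up word list rather than the test files. In grep, “?” and “+” are extended-regex operators, so it uses grep -E throughout; -x makes the pattern match the whole line:

```shell
# Sample words, one per line; the list is invented for this example.
words='heo
helo
hello
hellllo'

# "l?" : zero or one "l"                    -> heo, helo
zero_or_one=$(printf '%s\n' "$words" | grep -E -x 'hel?o')
# "l+" : one or more "l"                    -> helo, hello, hellllo
one_or_more=$(printf '%s\n' "$words" | grep -E -x 'hel+o')
# "l*" : any number of "l", including none  -> all four words
any_number=$(printf '%s\n' "$words" | grep -E -x 'hel*o')

echo "$zero_or_one"
```

The patterns are in single quotes so that the shell leaves the “?” and “*” alone and passes them through to grep.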

Backgrounding

February 14, 2007

The great thing about Unix, and the Bourne shell, when it was introduced back in the day, was multitasking. It’s such an overused buzzword these days, but at the time, it was really a new thing. Even if you’ve only got one connection into a machine, you can get it doing as much as you want.

The shell command to “do this in the background, then give me a new prompt to provide the next command” is the ampersand (“&”):

$  # I need to trawl the filesystem for files called "*dodgy*"
    #  (should have installed slocate,
    #  http://packages.debian.org/stable/source/slocate,
    # but it's too late for that)
$ find / -name "*dodgy*" -print
    (wait for a very very very long time)
/foo/bar/thisfileisnotdodgy.txt
/bar/foo/thisisadodgyfile.txt
$

Well, that’s a good hour of my life wasted.

Chuck it into a script, and run it in the background. If you want the outcome, direct it to a file:

$ cd /tmp
$ cat myfindscript.sh
find / -name "*dodgy*"
$ chmod u+rx myfindscript.sh
$ ./myfindscript.sh > /tmp/mysuspectfiles.txt &
[4402]
$
$ # wow, didja see that? It'll take ages, but I've got 
   # control back. "4402" is the Process ID (PID), 
   # so I can run "ps -fp 4402" to check on its
   # progress, but it's happening, in the background.

You don’t get a lot of job control here; the “ps” mentioned above is about your lot, but you can spawn a child process and let it run, whilst you get on with the stuff you need.

This is known as “backgrounding” a task; if you know it will take a long time, just background it. Of course, if the next thing you need to do is to read the entire file, then you won’t get away with it, you’ll have to wait for it to finish. However, you could background it and then “tail -f /tmp/mysuspectfiles.txt” to check on the status.
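
Inside a script, the shell keeps the PID of the last backgrounded command in $!, and the wait builtin pauses until that job is done. A minimal sketch of the pattern; the sleep-and-echo is just a stand-in for a slow command like the find above, and the temporary file name comes from mktemp:

```shell
out=$(mktemp)

# Stand-in for a slow command such as the find above.
( sleep 1; echo "finished" ) > "$out" &
pid=$!                 # $! holds the PID of the most recent background job

echo "started background job $pid"
# ... get on with other work here ...

wait "$pid"            # block until that job has finished
status=$?              # wait hands back the job's exit status
result=$(cat "$out")
echo "job exited with status $status and wrote: $result"
rm -f "$out"
```

This way the script only stops and waits at the point where it really needs the output.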


Search and Replace

February 6, 2007

Two great tools for search-and-replace are tr and sed.

tr – Translate (or Delete)

tr can translate a single character into another. For example, “tr 'a' 'b'” will convert all instances of “a” into “b”. Although it works on single characters, it also understands blocks, so “tr '[A-Z]' '[a-z]'” will convert uppercase to lowercase: ‘A’ becomes ‘a’, ‘B’ becomes ‘b’, and so on.

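The GNU version of tr (which comes with most Linux distributions) has some handy keywords, too: “tr '[:lower:]' '[:upper:]'” is a more readable opposite of the above [A-Z] convention. Put the keywords in quotes, so that the shell does not try to treat the square brackets as a filename pattern.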

You can also specify your own set of characters, so if you want to convert ‘l’ to ‘1’, ‘o’ to ‘0’, and ‘e’ to ‘3’, then this will do the job:

$  echo welcome | tr 'loe' '103'
w31c0m3

You could extend it to a more complete “l33t sp33k”:

$ echo abcdefghijklmnopqrstuvwxyz | tr 'aeilost' '@3!1057'
@bcd3fgh!jk1mn0pqr57uvwxyz

tr can also just delete – this deletes the letter ‘l’ :

$ echo hello and welcome | tr -d 'l'
heo and wecome
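
The character-class keywords work for deleting as well as translating. A short sketch wrapping both in functions; the names to_upper and strip_digits are invented for the example:

```shell
# Both function names are hypothetical, chosen for this example.

# Convert whatever arrives on stdin to uppercase.
to_upper() {
  tr '[:lower:]' '[:upper:]'
}

# Delete every digit from stdin.
strip_digits() {
  tr -d '[:digit:]'
}

echo "hello and welcome" | to_upper
echo "w31c0m3" | strip_digits
```

Because tr only reads standard input, the functions slot into a pipeline just like tr itself.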

sed – Stream Editor

UNIX uses a few metaphors, one being a water metaphor, which we use with pipes (|), redirects (<, >), and a few other places. Sed gets its name from what it does… much like tr, you stream data into it, and slightly modified data comes out the other end.

sed isn’t limited to single-character operations; it can cope with whole phrases, as well as regular expressions. I’ll keep it simple(ish) for now, I plan to do a more complete post on sed and another on regular expressions soon, though. For today, I’ll stick to the sed s/from/to/count syntax.

With the s/from/to/count syntax, sed will convert “from” to “to”. With no count, only the first match on each line is converted; a number picks out which match on the line to convert. The special “/g” converts every instance.

I like to get stuck in with a few examples, so here goes:

$ cat text.txt
Fedora Core is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Fedora Core is definitely
the best Linux distribution for home users.
Fedora Core is certainly my favourite distribution.
$ sed s/"Fedora Core"/"Ubuntu"/g text.txt
Ubuntu is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Ubuntu is definitely
the best Linux distribution for home users.
Ubuntu is certainly my favourite distribution.
$

The syntax there was sed s/from/to/count, so it replaces “Fedora Core” with “Ubuntu” in this example. If we specified “/1” at the end, it would only convert the first instance on each line. Similarly, “/2” would convert only the second instance on each line, leaving the first alone. “/g” is probably the most-used; it converts everything (the “g” stands for “global”).
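
It is easy to check for yourself what the count does. In this little sketch (the sample line is made up), a number N replaces only the Nth match on the line, and “g” replaces them all:

```shell
line="one one one"

first=$(echo "$line" | sed 's/one/two/')     # no count: first match only
second=$(echo "$line" | sed 's/one/two/2')   # 2: the second match only
all=$(echo "$line" | sed 's/one/two/g')      # g: every match

printf '%s\n' "$first" "$second" "$all"
```

Putting the whole expression in single quotes keeps the shell from touching any spaces or special characters inside it.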

sed can emulate tr’s tr -d functionality by having the “to” part being an empty string; here we refer to “Fedora Core” simply as “Fedora” (note the leading space: it’s “ Core”, not “Core”):

$ sed s/" Core"//g text.txt
Fedora is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Fedora is definitely
the best Linux distribution for home users.
Fedora is certainly my favourite distribution.

Notice also that we can cat stuff into sed, as “cat text.txt | sed s/src/dest/g”, or we can pass the file directly to sed, like this: sed s/src/dest/g text.txt. The same applies to most *nix commands.

To get in to the rest, we’ll need to get into regexp (Regular Expressions, the stuff like “the * brown * jumped over the * dog” which result in $1 = “quick”, $2 = “fox”, $3 = “lazy”). That’s for another day, though.


Tool Tip: “find”

January 19, 2007

find is a very powerful command. After the last post about grep, in which I mentioned that DOS has a command called “find” which is a simplistic version of grep, I now feel obliged to tell all about the real (that is, the *nix) find command.

Find works on the basis of find (from somewhere) (something); the “from somewhere” is often “.” (here, the current directory), or “/” (root, to search the entire filesystem). It’s not terribly interesting. What is interesting, is the “something” bit. You can specify a file name, for example:

$ find / -name "foo.txt"
/home/steve/misc/foo.txt
$ 

That wasn’t very exciting, and it will take a long time to complete, too. Systems with “slocate” installed could just say locate foo.txt and get the answer back in a fraction of a second (by looking it up in a database), without trawling through the whole hard disk (or, indeed, all attached disks). So that’s not what’s exciting about find. What is exciting about find, is what else it can do, instead of just “-name foo.txt”.

Don’t get me wrong; the “-name” switch is useful. More useful with wildcards: find . -name "*.txt" will find all text files.

You can restrict the search to one filesystem with the “-mount” (aka “-xdev”) flag.

If you want to find files newer than /var/log/messages, you can use find . -newer /var/log/messages

If you want to find files over 10Kb, then find . -size +10k will do the job. To get a full listing, find . -size +10k -ls.

Want to know what files I own? find . -user steve

How about listing all files over 10Kb with their owner and permissions?

$ find . -size +10k -printf "%M %u %f\n"
-rwxr-xr-x steve foo.txt
-rw------- steve bar.doc
-rwxr-xr-x steve fubar.iso
-rwxr-xr-x steve fee.txt
-rw------- steve jg.tar
$

Here, the “%M” shows the permissions (-rwxr-xr-x), “%u” shows the username (“steve”), and “%f” shows the filename. The “\n” puts a “newline” character after each matching file, so that we get one file per line.

There is much more to find than this; I’ve not really covered the actions (other than printf) at all in this article, just a quick glimpse of how find can search for files based on just about any criteria you can think of. Search terms can be combined, so find . -size +10k -name "*.txt" will only find text files over 10Kb, and so on.
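
Here is a sketch of combined search terms that can be run safely in a scratch directory. It assumes a find that accepts the “k” suffix for -size (GNU and BSD find do); the three file names are made up for the demonstration:

```shell
dir=$(mktemp -d)
# Two small files and one 20Kb text file (invented names).
echo "small" > "$dir/small.txt"
echo "small" > "$dir/notes.doc"
dd if=/dev/zero of="$dir/big.txt" bs=1024 count=20 2>/dev/null

# Both conditions must hold: the name ends in .txt AND the size is over 10k.
found=$(find "$dir" -type f -name "*.txt" -size +10k)
echo "$found"
rm -rf "$dir"
```

Only big.txt should be listed: small.txt fails the size test, and notes.doc fails the name test.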


Tool Tip: “grep”

January 17, 2007

A powerful and useful tool in the shell scripter’s arsenal is grep. If you’ve not come across it before, it’s similar to the “find” tool that DOS had; it finds strings in files. Grep takes its name from the ed editor command g/re/p, “globally search for a regular expression and print”; a “regular expression” is a string, or something more than just a string.

Example:
$ grep foo myfile.txt
and Steve said, "foo! that's crazy"
$

That searches for “foo” in the file called “myfile.txt”. It gets any line (yes, the whole line) which contains the search text.

But you can do other stuff, with “switches”. For example “-i” means “insensitive to case”:
$ grep -i foo myfile.txt
"Foo" is a word, associated with "Bar".
and Steve said, "foo! that's crazy"

This time, grep finds that the word “foo” is actually mentioned twice in “myfile.txt”; once as “Foo” and once as “foo”.

The “-i” flag is a pretty common one, then, because it’s often what we really want it to find.

Here’s a good one, though: Under Linux, a special file /proc/bus/usb/devices lists your USB devices. That’s good, but yuck, it’s a mess of (too much) detailed information:

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 2.06
S:  Manufacturer=Linux 2.6.15-27-server uhci_hcd
S:  Product=UHCI Host Controller
S:  SerialNumber=0000:00:07.2
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms

T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=ff(vend.) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=06b9 ProdID=4061 Rev= 0.00
S:  Manufacturer=ALCATEL
S:  Product=Speed Touch USB
S:  SerialNumber=0090D00D0B25
C:* #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=00 Prot=00 Driver=usbfs

How do I just get what I need from the file? One switch to grep, which I don’t use as much as I should, is “-A”, for “After”. (Note that it’s a capital “A”).

After the Vendor ID and Product ID, /proc/bus/usb/devices includes the name of the device, so I can find out what I’ve got installed with a Vendor ID of 06b9 quite easily:

$ grep -A 2 06b9 /proc/bus/usb/devices
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB

Or what have I got from Alcatel?
$ grep -i -A1 Alcatel /proc/bus/usb/devices
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB

I can also ask: Who made my Speed Touch modem, or what’s its ID? “-B” displays lines before the line that matches:

$ grep -B 2 Speed /proc/bus/usb/devices
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
$
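
On versions of grep that have “-A” and “-B” (GNU and BSD grep do), there is also “-C”, for context on both sides of the match. A sketch using a cut-down copy of the listing above held in a variable, so that it runs on machines without /proc/bus/usb/devices:

```shell
# A cut-down sample in the same layout as the listing above.
sample='P:  Vendor=0000 ProdID=0000 Rev= 2.06
S:  Manufacturer=Linux
S:  Product=UHCI Host Controller
P:  Vendor=06b9 ProdID=4061 Rev= 0.00
S:  Manufacturer=ALCATEL
S:  Product=Speed Touch USB'

# -C 1 shows one line before and one line after each matching line.
context=$(printf '%s\n' "$sample" | grep -C 1 ALCATEL)
echo "$context"
```

That gives the Vendor, Manufacturer and Product lines for the Alcatel device in one go.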

There’s a lot you can do with grep; I’ve only really covered the first line from “man grep”.

