Shell Pitfalls

July 30, 2007

Greg Wooledge has an excellent list of Bash Pitfalls, with good explanations as to why they are wrong, and what the correct syntax should be.

Shell Pipes by Example

July 22, 2007

Pipes, piping, pipelines… whatever you call them, are very powerful – in fact, they are one of the core tenets of the philosophy behind UNIX (and therefore Linux). They are also, really, very simple, once you understand them. The way to understand them is by playing with them, but if you don’t know what they do, you don’t know where to start… Catch-22!

So, here are some simple examples of how the pipe works.

Let’s see the code

$ grep steve /etc/passwd | cut -d: -f 6

What did this do? There are two UNIX commands there: grep and cut. The command “grep steve /etc/passwd” finds all lines in the file /etc/passwd which contain the text “steve” anywhere in the line. In my case, this has one result:
steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash
The second command, “cut -d: -f6” cuts the line by the delimiter (-d) of a colon (“:“), and gets field (-f) number 6. This is, in the /etc/passwd file, the home directory of the user.
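To try the same pipeline without touching the real /etc/passwd, here is a small sketch that runs it against a throwaway file. The function name home_of and the extra root line are made up for this illustration; the steve line is the one shown above.

```shell
# Create a temporary file in /etc/passwd format.
passwd_file=$(mktemp)
cat > "$passwd_file" <<'EOF'
root:x:0:0:root:/root:/bin/bash
steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash
EOF

# home_of NAME FILE: print field 6 of every line containing NAME.
# (home_of is a hypothetical helper, not a standard command.)
home_of() {
  grep "$1" "$2" | cut -d: -f6
}

steve_home=$(home_of steve "$passwd_file")
echo "$steve_home"

rm -f "$passwd_file"
```

Note that grep matches “steve” anywhere in the line, so a user whose home directory or full name happens to contain that text would be listed too.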

So what? Show me some more

This is the main point of this article; once you’ve seen a few examples, it normally all becomes clear.


$ find . -type f -ls | cut -c14- | sort -n -k 5
rw-r--r--   1 steve    steve       28 Jul 22 01:41 ./hello.txt
rwxr-xr-x   1 steve    steve     6500 Jul 22 01:41 ./a/filefrag
rwxr-xr-x   1 steve    steve     8828 Jul 22 01:42 ./c/hostname
rwxr-xr-x   1 steve    steve    30848 Jul 22 01:42 ./c/ping
rwxr-xr-x   1 steve    steve    77652 Jul 22 01:42 ./b/find
rwxr-xr-x   1 steve    steve    77844 Jul 22 01:41 ./large
rwxr-xr-x   1 steve    steve    93944 Jul 22 01:41 ./a/cpio
rwxr-xr-x   1 steve    steve    96228 Jul 22 01:42 ./b/grep

What I did here was three commands: “find . -type f -ls” finds regular files, and lists them in an “ls”-style format: permissions, owner, size, etc.
“cut -c14-” keeps everything from character 14 onwards; the first 13 characters, which it discards, mess up the formatting on this website (!), and aren’t very interesting.
“sort -n -k 5” does a numeric (-n) sort, on field 5 (-k 5), which is the size of the file.
So this gives me a list of the files in this directory (and subdirectories), ordered by file size. That’s much more useful than “ls -lS”, which restricts itself to the current directory and ignores subdirectories.
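The sorting step can be tried on its own, without needing the same files on disk. This sketch feeds three lines copied from the listing above (deliberately out of order) into the same sort command; the variable names are only for illustration.

```shell
# Three lines in the same layout as the listing above; size is field 5.
listing='rwxr-xr-x   1 steve    steve    30848 Jul 22 01:42 ./c/ping
rw-r--r--   1 steve    steve       28 Jul 22 01:41 ./hello.txt
rwxr-xr-x   1 steve    steve     6500 Jul 22 01:41 ./a/filefrag'

# Numeric sort on field 5 puts the smallest file first.
sorted=$(printf '%s\n' "$listing" | sort -n -k 5)
smallest=$(printf '%s\n' "$sorted" | head -n 1)
largest=$(printf '%s\n' "$sorted" | tail -n 1)

echo "Smallest: $smallest"
echo "Largest:  $largest"
```

Adding “-r” to the sort options would reverse the order, putting the largest file first.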

(As an aside, I have to admit that I only concocted this by trying to think of an example; it actually seems really useful, and worth making into an alias… I must do a post about “alias” some time!)

So how does it work?

This seems pretty straightforward: get lines containing “steve” from the input file (“grep steve /etc/passwd“), and get the sixth field (where fields are marked by colons) (“cut -d: -f6“). You can read the full command from left to right, and see what happens, in that order.

How does it really work?

EG1 Explained

There are some gotchas when you start to look at the plumbing. Because we’re using the analogy of a pipe (think of water flowing through a pipe), it is tempting to think that grep runs to completion and then hands its results to cut. It doesn’t: the shell starts both commands, and they run at the same time. The order in which they are started is up to the shell, so don’t rely on it. If you have (for example) a syntax error in your cut command, grep is still started; it simply finds that nobody is reading its output.
What actually happens is this:

  1. A “pipe” is set up – a special entity which takes input at one end and passes it, in order, to its output at the other end.
  2. grep is started, with its output set to be the “pipe”.
  3. cut is started, with its input set to be the “pipe”.
  4. As grep generates output, it is passed through the pipe to the waiting cut command, which does its own simple task of splitting the fields by colons, and selecting the 6th field as output.
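One way to see that both ends of a pipe are running at once is to pair a command that never finishes with one that stops early. This is only a sketch, using yes and head rather than the commands from the example: yes prints “y” forever, yet the pipeline ends as soon as head has read three lines.

```shell
# yes never stops by itself. head exits after three lines, the pipe
# is closed, and yes is terminated when it next tries to write to it.
first_three=$(yes | head -n 3)
echo "$first_three"
```

If the first command had to finish before the second one started, this pipeline would never return.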

EG2 Explained

For EG2, two pipes are set up. “find” writes to the first (leftmost) pipe; “cut” reads from the first pipe and writes to the second (rightmost) pipe; “sort” reads from the second pipe. All three commands run at the same time.
So, the output of “find” is piped into “cut”, which strips off the first 13 characters of each line of the “find” output. This is then passed to “sort”, which sorts on field 5 (of what it receives as input), so the output of the entire pipeline is a numerically sorted list of files, ordered by size.

Redirection – Simple Stuff

May 30, 2007

Nobody deals with the really low-level stuff any more; I learned it from UNIX Gurus in the 90s. I was really lucky to have met some real experts, and was stupid not to have better understood the opportunity to pick their brains.

Write to a file

$ echo foo > file

Append to a file

$ echo foo >> file

Read from a file (1)

$ cat < file

Read from a file (2)

$ cat file

Read lines from a file

$ while read f
> do
>   echo LINE: $f
> done < file
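The operators above can be strung together into one small round trip. This is a sketch only; it uses a temporary file from mktemp rather than a file called “file”, and the counter is an addition for illustration.

```shell
file=$(mktemp)

echo foo > "$file"      # write: the file now holds one line
echo bar >> "$file"     # append: the file now holds two lines

# Read the file back a line at a time, numbering each line.
count=0
while read -r f
do
  count=$((count + 1))
  echo "LINE $count: $f"
done < "$file"

rm -f "$file"
```

Using “>” a second time instead of “>>” would have replaced “foo” with “bar”, leaving a one-line file.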

Pipes Primer

May 8, 2007

The previous post dealt with pipes, though the example may not have been the best for those who are not accustomed to the concept.

There are a few concepts to be understood – mainly, that of two (or more) processes operating together, how they put their data out, and how they get their data in. UNIX deals with multiple processes, all running (conceptually, at least) at the same time, on different CPUs, each with a standard input (stdin), and standard output (stdout). Pipes connect one process’s stdout to another’s stdin.

What do we want to pipe? Let’s say we’ve got a small 80×25 terminal screen, and lots of files. The ls command will spew out tons of data, faster than we can read it. There’s a handy utility called “more“, which will show a screen-worth of text, then prompt “more”. When you hit the space bar, it will scroll down a screen. You can hit ENTER to scroll one line.

I’m sure that you’ve worked this out already, but here is how we combine these two commands:

$ ls | more
<the first screenful of files is shown>

What happens here is that the shell starts both “ls” and “more”, connected by a pipe. The output of “ls” is piped to the input of “more”, so it can read the data.

Most such tools can also work another way, too:

$ more myfile.txt
<the first screenful of "myfile.txt" is shown>

That is to say, “more” opens “myfile.txt” itself and reads it in place of standard input (stdin).
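The three ways of handing data to a tool can be compared side by side. This sketch uses wc -l (count lines) instead of more, because its output is easy to check in a script; the temporary file and variable names are made up for the example.

```shell
file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$file"

from_pipe=$(cat "$file" | wc -l)     # data arrives through a pipe
from_redirect=$(wc -l < "$file")     # the shell attaches the file to stdin
from_argument=$(wc -l "$file")       # wc opens the file itself

echo "pipe:     $from_pipe"
echo "redirect: $from_redirect"
echo "argument: $from_argument"

rm -f "$file"
```

Only the last form prints the file name after the count, because only then does wc know which file it is reading; in the first two it just sees anonymous data on stdin.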

Regular Expressions

April 18, 2007

A linked article has a good introduction to Regular Expressions – grep, sed, and friends.

It includes a brief discussion on Backreferences (aka “the stuff that * matched”)

Tool Tip: “Read” – it does what it says!

April 14, 2007

read is a very useful tool; it might seem too simple to bother mentioning, but there are at least three different ways to use it. (Okay, two, and the third isn’t really anything special about read, just a nifty thing that the shell itself provides)…

1. Read the whole line

Let’s start with an interactive script:

$ cat
echo "I'm a parrot!"
while read a
do
    echo "A is $a"
done
$ ./
I'm a parrot!
hello
A is hello
one two three
A is one two three
piglet eeyore pooh owl
A is piglet eeyore pooh owl

Yes, you’ll need to hit CTRL-D to exit this loop, it’s just a simple example.

So far, so stupid. But wait; what if I wanted to get that “one” “two” “three” and use them differently?

2. Read the words

$ cat
echo "I'm a parrot!"
while read a b c
do
        echo "A is $a"
        echo "B is $b"
        echo "C is $c"
done
$ ./
I'm a parrot!
hello
A is hello
B is
C is
one two three
A is one
B is two
C is three
piglet eeyore pooh owl
A is piglet
B is eeyore
C is pooh owl

So, just by naming some variables, we can pick what we get. And – did you see that last one? We don’t lose anything, either… We asked for three variables (a, b, c) and got four values (piglet eeyore pooh owl), but nothing was lost; the last variable received the rest of the line, just as a single-variable read receives the whole line.
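The same trick works with separators other than spaces, by setting IFS for the read. As a sketch (the variable names are my own choice, and the sample line is the /etc/passwd entry from the pipes post), this splits a colon-separated line into its seven fields:

```shell
line='steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash'

# IFS=: applies only to this one read. The last variable (shell)
# would collect any extra fields, just as "c" did above.
IFS=: read -r user pass uid gid gecos home shell <<EOF
$line
EOF

echo "User $user has home $home and shell $shell"
```

This gets at the home directory without calling cut at all, which can be handy inside a loop.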

This is actually pretty handy stuff; you’d have to do a bit of messing about with pointers to get the same effect in C, for example.

3. Read from a file

We can do all this from a file, too. This isn’t special to read, but it’s often used in this way. See that “while – do – done” loop? The shell treats it as a single compound command, and we can direct whatever we want to its input (everything is a file, remember, so the keyboard, a text file, a device driver, whatever you want, it’s all just a file).

We do this with the “<” operator. Just add “< filename.txt” after the “done” end of the loop:

$ cat
echo "I'm a parrot!"
while read a b c
do
        echo "A is $a"
        echo "B is $b"
        echo "C is $c"
done  < myfile.txt
$ cat myfile.txt
1 2 3
4
5 6
7
8 9 10 11 12 13
14
15 16 17
$  ./
I'm a parrot!
A is 1
B is 2
C is 3
A is 4
B is
C is
A is 5
B is 6
C is
A is 7
B is
C is
A is 8
B is 9
C is 10 11 12 13
A is 14
B is
C is
A is 15
B is 16
C is 17

So we can process tons of data, wherever it comes from.

4. I only mentioned 3 uses

We could make the script a bit more useful, by allowing the user to specify the file, instead of hard-coding it to “myfile.txt“:

$ cat
echo "I'm a parrot!"
while read a b c
do
        echo "A is $a"
        echo "B is $b"
        echo "C is $c"
done < "$1"
$ cat someotherfile.txt
123
1 2 3
one two three four
$ ./ someotherfile.txt
I'm a parrot!
A is 123
B is
C is
A is 1
B is 2
C is 3
A is one
B is two
C is three four
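Because the file name now comes from the command line, it may be missing or wrong. One possible guard, sketched here as a function so that it can be tried in one go (the name parrot and the usage message are inventions for this example, not part of the script above):

```shell
# parrot FILE: the loop from above, refusing to run unless it is
# given a readable file.
parrot() {
  if [ ! -r "$1" ]; then
    echo "Usage: parrot FILE" >&2
    return 1
  fi
  while read -r a b c
  do
    echo "A is $a"
    echo "B is $b"
    echo "C is $c"
  done < "$1"
}

data=$(mktemp)
printf '123\n1 2 3\none two three four\n' > "$data"
output=$(parrot "$data")
echo "$output"
rm -f "$data"
```

Quoting “$1” also means that a file name containing spaces is passed to the redirection in one piece.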

Update 14 April

Updated to fix the “done < filename.txt” from the example code of the last two examples.

More maths stuff – bc in detail

April 8, 2007

There’s a great post about bc at – I think that I’ve already covered most of the same ground, but it’s got lots of great examples.

Calculating Averages

March 26, 2007

The Simple Maths post seems to be the most popular article in the so-far short life of this blog.

It’s also something that I have received a few emails about recently, so I feel like posting a bit more on the subject.

I think that the code can speak for itself… We implement a loop, which calls the builtin read function. (I’m not sure the “-p” flag, to provide a prompt, is universal. It does work with the Bash builtin. If it doesn’t work on your *nix, it’s really only for show, so you can live without it.)

Because read works on standard input (aka “stdin”), it will work interactively from the keyboard, or direct from a file (one number per line).

We use two methods of doing maths in the shell:

  • expr, because it’s a simple and easily-read way to do simple maths: n=`expr $n + 1`

  • bc, because it is more powerful. Do have a play with bc interactively, it can do a lot... see below.

So, we can write a fairly simple script (read down, it's only actually 11 lines of code without the comments), which is actually quite versatile - it can do running averages, it can be interactive or run from cron, called from another script, even used as a function.

So, here's the code. It should be fairly self-explanatory, but do have a look at the interactive bc sample session below, to see what we are doing with bc. Also, do play with bc (some Linux distros have dropped it from the default install recently, so you'll have to yast -i bc, or equivalent)

The Script - Calculate Averages

# Calculate mean (average) of integer data

# Initialise the variables
n=0     # n being the number of (valid) data provided
sum=0   # sum being the running total of all data

# Note that by using ^D (aka "EOF") to quit, this
# script will work just as well interactively, as
# when provided with a file containing the data.
while read -p "Enter a number (^D to quit): " x
do
  # expr is useful for simple maths. The result goes into a
  # temporary variable, so that bad input cannot wipe out sum.
  # expr exits with 0 or 1 when the sum worked (1 only means
  # that the result was zero); anything else means that it
  # was non-numeric input.
  if new=`expr "$sum" + "$x" 2>/dev/null` || [ "$?" -eq 1 ]; then
    # Okay, it was valid input.
    sum=$new
    n=`expr $n + 1`
    # We can provide a "running average" here;
    # I'll comment it out for now.
    # echo "Running Average:"
    # echo "scale=2;$sum/$n" | bc
    # echo
  fi
done

# Okay, we've done the loop.
# Present the data.
echo "Overall Average:"
# bc is more useful than expr for
# more involved maths, though its
# syntax, particularly in a script,
# is possibly less obvious.
# Using bc interactively is easier
# than using it in a shell script
echo "scale=2;$sum/$n" | bc

Interactive bc

The left-hand column is the bc session: user input starts at the margin, and bc's replies are indented. The right-hand column is commentary:

$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
ibase=2              I'll be entering base 2 (binary)
01001001             So, I enter 1001001 (73)
  73                 And it replies with the answer in base 10
ibase=10             Does this set the input base back to 10?
10                   Let's input "10", it should reply "10"
  2                  No, we entered "10" in base 2, which is 2!
ibase=1010           So, 10 in binary is 1010 (8+2)
10                   We say 10
  10                 And bc says 10. Good, we're back to normal
11                   And the same for 11
  11                 Good, it works. Now for some maths..
1 + 2                (tricky stuff!)
  3                  Yes, that's good, 1+2=3
23 + 34 + 45 + 56    We're not limited to x+y
  158                So we can build up our sums
10/3                 10/3 = 3 and a third, right?
  3                  Not to 0 decimal places.
scale=2              Okay, let's have 2 decimal places
10/3                 Now ask again
  3.33               That's better
scale=5              Or to 5 points?
10/3                 Ask again...
  3.33333            And it works!
scale=1              One point:
10/3                 And ask again
  3.3                As we expected.
scale=0              So, scale=0 means 0 places
10/3                 Should say 3
  3                  Yes, we're back to where we started.

Back to the Script

That made a nice break. Now we'll go back to the script... it's only actually 11 lines long:

n=0
sum=0
while read x
do
  if new=`expr "$sum" + "$x" 2>/dev/null` || [ "$?" -eq 1 ]; then
    sum=$new
    n=`expr $n + 1`
  fi
done
echo "Overall Average:"
echo "scale=2;$sum/$n" | bc

And as I said, we can use it interactively, or with a file of data:

$ cat data.txt

Because, under *nix, EVERYTHING IS A FILE, even the keyboard!
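The earlier remark that the script could even be used as a function might be sketched like this. To keep the sketch free of external tools, it swaps expr and bc for the shell's built-in $(( )) arithmetic, which is a different technique from the script above. It assumes non-negative whole numbers without leading zeros, one per line; the function name average and the sample data are invented for illustration.

```shell
# average: read whole numbers from stdin, print their mean to 2 places.
average() {
  n=0
  sum=0
  while read -r x
  do
    case "$x" in
      ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
    esac
    sum=$((sum + x))
    n=$((n + 1))
  done
  [ "$n" -gt 0 ] || return 1     # no valid data: avoid dividing by zero
  # Work in hundredths so integer arithmetic can show two decimal places.
  hundredths=$((sum * 100 / n))
  printf '%d.%02d\n' $((hundredths / 100)) $((hundredths % 100))
}

mean=$(printf '10\n20\noops\n25\n' | average)
echo "Mean: $mean"
```

Since the function reads stdin, “average < data.txt” and typing numbers at the keyboard both work, just as with the script. Like bc with scale=2, it truncates rather than rounds.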

Links to bash and sed FAQs

March 19, 2007

A few more interesting / useful links …

17 Bash Pitfalls and (from the same site) BashFaq

Also the SED FAQ at SourceForge, which I hadn’t come across before today (the FAQ that is, not SourceForge!): (to go with which I plugged previously)

Variables – When to use a ‘dollar’ symbol

March 19, 2007

If you’ve used any other language, then you are probably quite familiar with how variables work, and how they are referred to. If not, then the following examples should suffice to cover the two most common options:

PHP (amongst others)

$foo="hello, world";
echo $foo;

This sets the “foo” variable to be “hello, world”, and then echoes it out. Notice how “foo” is always preceded by a dollar ($) symbol. That denotes it as a variable. Whether we’re setting or reading its contents, it’s always “$foo”.

C (and others again)

#include <stdio.h>

int main() {
  int foo;
  int bar;

  foo = 2;
  bar = foo * 5;
  printf("The answer is %d\n", bar);
  return 0;
}

This will provide the useful fact that 2*5 = 10. However, no “$” dollar symbols are used at all. That’s the “other” option… the compiler knows the rules, depending on the chosen language.

However, the shell falls part way in between these two examples.

If you want to quote the value of a variable, then you need the dollar. If you want to set it, then drop the dollar:

Shell Script

foo=Hello
echo $foo World

This will say “Hello World”. But did you see what happened with the dollars? To set a variable, no dollar. To read it, use the dollar.

This is particularly confusing to many users of the shell. I can’t even provide a good reason as to why it should work this way, whether historically or pragmatically.

Similarly, when using read to set a variable’s contents, do not use the dollar:

echo "What do you want to tell the world?"
read msg
echo $msg World

This will tell your message to the world… if you say “Hello”, then it will say “Hello World”. If you say “Goodbye Cruel”, then it will say “Goodbye Cruel World”.

Notice the dollars… whilst strange, it is at least consistent. You use the dollar symbol to quote the content of the variable; otherwise, leave it out.
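Both cases can be put in one short script. This sketch feeds the message in from a here-document instead of the keyboard, so that it runs unattended; the greeting variable is an addition for the example.

```shell
foo=Hello            # setting: no dollar
echo "$foo World"    # reading: dollar

# read sets msg, so no dollar; using its value needs the dollar.
read -r msg <<EOF
Goodbye Cruel
EOF
greeting="$msg World"
echo "$greeting"
```

Writing “$foo=Hello” or “read $msg” would make the shell substitute the variable's current value first, which is almost never what was meant.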

For more in-depth stuff about variables, check out


