cut

January 31, 2007

The cut utility is a great example of the UNIX philosophy of “do one thing, and do it well.” cut just cuts lines of text, based on a (single-character) delimiter.

There are two basic forms in which cut is generally used:

1. Grab These Columns (cut -c)

One form of cut gets certain characters, or columns of characters, out of a file. This is done with the cut -c command. So we can get the 5th character of the string:

$ echo "Hello, World" | cut -c 5
o

Or we can get the first 5 characters:

$ echo "Hello, World" | cut -c -5
Hello

Or from character #5 onwards:

$ echo "Hello, World" | cut -c 5-
o, World

Or even just characters 2-5:

$ echo "Hello, World" | cut -c 2-5
ello

Or maybe just select characters 5, 8 and 9:

$ echo "Hello, World" | cut -c 5,8,9
oWo

2. Delimited (cut -d)

The other method is to use a delimiter. For example, the /etc/passwd file looks something like this, with the fields delimited by the “:” (colon) symbol:

root:x:0:0:root:/root:/bin/bash
steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash

In this example, field 1 is “root” (or “steve”), field 2 is “x”, and so on…. the last field (7) is “/bin/bash” for both accounts, in this case.

So we just have to specify the delimiter (:) and the field number(s). Field 1 is the account name, 6 is the home directory, field 3 is the UserID…

$ cut -d: -f1 /etc/passwd
root
steve
$ cut -d: -f6 /etc/passwd
/root
/home/steve
$ grep "^root:" /etc/passwd | cut -d: -f3
0

(In the last example, we got *just* the root account, with a grep which searches for a line starting with “root:”, which could only be the root account, and not (for example) “Fred Troot” which would also match a search for “root”)
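Several fields can be pulled out in one go by giving -f a comma-separated list. Here is a small sketch, run against a throwaway copy of the two sample lines rather than the real /etc/passwd. The temporary file (made with mktemp, which is assumed to be available) and the variable names are only there for illustration:

```shell
#!/bin/sh
# Work on a sample file so the result does not depend on the local /etc/passwd.
sample=`mktemp`
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash
EOF

# Fields 1 and 6 (account name and home directory); cut keeps the delimiter
# between the fields it prints.
name_and_home=`cut -d: -f1,6 "$sample"`
echo "$name_and_home"

# Combine with grep to pick one account, then take its shell (field 7).
steve_shell=`grep "^steve:" "$sample" | cut -d: -f7`
echo "$steve_shell"

rm -f "$sample"
```

This prints “root:/root” and “steve:/home/steve”, followed by “/bin/bash”.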

cut is one of those really simple, but really useful utilities. And because it’s very simple, it’s nice and quick (which matters a lot if you’re looping through a few hundred times).


Redirection

January 29, 2007

A lot of the power of the *nix shell is in redirection. The model is called “streams”, and we even have “pipes” for these “streams”. It’s not the best metaphor ever, but it’s good enough, I suppose. There are 3 streams associated with a process: the standard input (stdin), standard output (stdout) and standard error output (stderr). As you might expect, you get user input on stdin, output to stdout, and send errors to stderr.

stdout : Redirecting Output

Let’s start with stdout, the output stream, since that is the most common:

$ ls > /tmp/foo.txt
$

This will run the “ls” (list files) command, but instead of showing you the results, it will write to the file you specify (/tmp/foo.txt, in this example). If there is already a /tmp/foo.txt file, then it gets replaced.

$ ls >> /tmp/foo.txt
$

By using >> instead of the single >, we can append to the file instead of overwriting it.

If we want to see the output like we normally would, but log it to a file as well, we can use the “tee” utility:

$ ls | tee /tmp/foo.txt
[ the usual ls output shown here ]
$ ls | tee -a /tmp/foo.txt
[ the usual ls output shown here ]
$

The first command will write to /tmp/foo.txt; the second (with “-a“) will append to /tmp/foo.txt.
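To see the difference between overwriting and appending without touching a real file, here is a minimal sketch using a temporary file. The file comes from mktemp and the variable names are made up for the example:

```shell
#!/bin/sh
log=`mktemp`

echo "first"  > "$log"      # creates (or overwrites) the file
echo "second" > "$log"      # overwrites again: "first" is gone
echo "third"  >> "$log"     # appends to what is already there

# tee -a appends as well, and also passes the text on to its own stdout.
shown=`echo "fourth" | tee -a "$log"`

lines=`wc -l < "$log"`
lines=`expr $lines + 0`     # strip the padding that some versions of wc add
contents=`cat "$log"`

echo "$contents"
echo "tee passed on: $shown; the file now has $lines lines"
rm -f "$log"
```

The file ends up holding “second”, “third” and “fourth”: three lines, because the second > threw the first line away.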

We have used a different redirector here; “|” (pipe) instead of “>” and “>>” . This is because we’re not funnelling the output to a different place (/tmp/foo.txt), but passing it to a program (tee) which does some more funky redirection.

Another common use of the pipe (“|“) is to go to the more (or less) utility. If a command would produce loads of output, faster than you can read it, then more will pause after the screen is full, and prompt you with a “— more” prompt (hence the name). SPACE will show the next screen; RETURN will show the next line. less is just like more, but it can also go backwards (PgUp and PgDn):

$ ls | more
file1.txt
file2.txt
file3.txt
--- more  (PRESS SPACE)
file4.txt
file5.txt
file6.txt
--- more (PRESS SPACE)
file7.txt
$ 

stderr : Redirecting Errors

Well-written programs will send their errors to a different “device” than they send their normal output to. Both stdout and stderr are usually the same place (your terminal), but they can be treated separately:

$ ls
foo.txt
$ ls fo.txt
ls: fo.txt: No such file or directory
$ ls fo.txt > output.txt 2>errors.txt
$ ls
foo.txt    output.txt   errors.txt
$ cat output.txt
$ cat errors.txt
ls: fo.txt: No such file or directory
$

What happened there? Well, we asked for “ls fo.txt“, but fo.txt doesn’t exist (foo.txt does). So we see an error from ls. If we direct stdout to “output.txt” and stderr to “errors.txt”, then we can see the difference. What ls actually did, was that it wrote *nothing* to stdout and the error message was sent to stderr. (stderr has file descriptor #2, so we say “2>” to direct stderr.) So when we “cat output.txt“, we get nothing (there was no output), but when we “cat errors.txt“, we see the error.
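The same experiment can be scripted. In this sketch, emit is a made-up stand-in for a well-behaved program, and the temporary files are only for illustration. It also shows “2>&1”, which is not used above: it sends stderr to wherever stdout is currently going, so both streams land in one file.

```shell
#!/bin/sh
# A stand-in for a real program: one line to stdout, one line to stderr.
emit() {
    echo "normal output"
    echo "error message" >&2
}

out=`mktemp`
err=`mktemp`
both=`mktemp`

emit > "$out" 2> "$err"     # separate files, as in the ls example
emit > "$both" 2>&1         # stderr follows stdout into the same file

out_text=`cat "$out"`
err_text=`cat "$err"`
both_lines=`wc -l < "$both"`
both_lines=`expr $both_lines + 0`

echo "stdout file: $out_text"
echo "stderr file: $err_text"
echo "combined file has $both_lines lines"
rm -f "$out" "$err" "$both"
```

Note that the order matters for the combined form: the “>” has to come before the “2>&1”.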

stdin: Redirecting Input

This is most commonly done by system utilities, but many shell scripts use the functionality.
The simple way is to use the “<” redirector:

$ mycommand < myfile.txt

This will take input from “myfile.txt” instead of from the keyboard.

The second way is probably more common. The more example above had more redirecting its input, via the pipe (“|“). We can create entire pipelines this way:

$ find . -print | grep foo | more

This will attach the stdout of the find command to the stdin of the grep command, and attach the stdout of grep to the stdin of more. Got that?! What the shell does is create the two pipes, and then start find, grep and more with their stdin and stdout already connected to the right ends. All three run at the same time; each one just reads from its stdin and writes to its stdout, without knowing (or caring) what is on the other end.
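Both ways of feeding stdin can be tried side by side. This is a rough sketch with a made-up list of file names in a temporary file; it counts the lines containing “foo” once with “<” and once with a pipeline:

```shell
#!/bin/sh
list=`mktemp`
printf 'foo.txt\nbar.txt\nfood.doc\n' > "$list"

# The "<" form: grep reads the file on its stdin.
from_file=`grep -c foo < "$list"`

# The pipe form: cat's stdout is grep's stdin, and grep's stdout is wc's stdin.
from_pipe=`cat "$list" | grep foo | wc -l`
from_pipe=`expr $from_pipe + 0`     # strip any padding from wc

echo "via <: $from_file matching lines"
echo "via |: $from_pipe matching lines"
rm -f "$list"
```

Both report 2, because grep neither knows nor cares whether its stdin is a file or another program.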

There’s actually more to it than that, but that’s got the basics of stdin, stdout and stderr covered briefly.


Simple Maths in the Unix Shell

January 29, 2007

One thing that often confuses users new to the Unix / Linux shell is how to do (even very simple) maths. In most languages, x = x + 1 (or even x++) does exactly what you would expect. The traditional Unix shell is different, however. It doesn’t have any built-in mathematical operators for variables. It can do comparisons, but maths isn’t supported, not even simple addition.

Following the Unix tradition (“do one thing, and do it well”) to the extreme, because the expr and bc utilities can do maths, there is absolutely no need for sh to re-invent the wheel.

Yes, I agree. This is frustrating. If I’ve got one gripe against shell programming, then this is it.

Addition and Subtraction

So how do we cope? There are basically two ways, depending on whether we choose expr or bc:

#!/bin/sh

echo "Give me a number: "
read x
echo "Give me another number: "
read y

######  Here's where we have the two options:
# The expr method:
exprans=`expr $x + $y`

# The bc method:
bcans=`echo $x + $y | bc`
###### Did you see the difference?


echo "According to expr, $x + $y = $exprans"
echo "According to bc,   $x + $y = $bcans"

As you can see, the language is slightly different for the two commands; expr parses an expression passed to it as arguments: expr something function something whereas bc takes the expression as its input (stdin): echo something function something | bc. Also, for expr, you must put spaces around the operator, so that the numbers and the operator arrive as separate arguments: “expr 1+2” doesn’t work (it just prints “1+2” back). “expr 1 + 2” works.
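For trying this out without typing numbers in each time, here is a non-interactive sketch with fixed values. The variable names are just for illustration, and the bc part is skipped if bc doesn’t happen to be installed:

```shell
#!/bin/sh
x=7
y=5

# expr: each number and the operator is a separate argument.
sum=`expr $x + $y`
difference=`expr $x - $y`
echo "expr: $x + $y = $sum, $x - $y = $difference"

# Without the spaces, expr sees a single argument and simply echoes it back.
nospace=`expr 7+5`
echo "expr 7+5 prints: $nospace"

# bc: the whole expression arrives on stdin, so spacing is not an issue.
if command -v bc >/dev/null 2>&1; then
    bcsum=`echo "$x+$y" | bc`
    echo "bc:   $x + $y = $bcsum"
fi
```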

Multiplication

Multiplication is a little awkward, too; the * asterisk, which traditionally denotes multiplication, is a special character to the shell; it expands to the names of the files in the current directory, so we have to escape it with a backslash. “*” becomes “\*”:

#!/bin/sh

echo "Give me a number: "
read x
echo "Give me another number: "
read y

######  Here's where we have the two options:
# The expr method:
exprans=`expr $x \* $y`

# The bc method:
bcans=`echo $x \* $y | bc`
###### Did you see the difference?


echo "According to expr, $x * $y = $exprans"
echo "According to bc,   $x * $y = $bcans"

The other thing to note here is the backtick (`). This grabs the output of the command it surrounds, and passes it back to the caller. Take a command like

x=`expr 1 + 2`

If you type expr 1 + 2 on its own at the command line, you’d get “3” back:

steve@nixshell$ expr 1 + 2
3
steve@nixshell$ 

If you enclose it with backticks, then the variable $x becomes set to the output of the command. Therefore,

x=`expr 1 + 2`

is equivalent to (but of course more flexible than):

x=3

One last thing about assigning values to variables: Whitespace MATTERS. Don’t put spaces around the = sign. “x = 3” won’t work. “x=3” works.
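Here is a short sketch pulling those points together: the escaped asterisk, the backticks, and the spaces around “=”. The variable names are made up, and the last part assumes there is no program called x_demo_var on your PATH:

```shell
#!/bin/sh
# Escaped asterisk: expr receives a literal "*" rather than a list of files.
product=`expr 6 \* 7`
echo "6 * 7 = $product"

# Quoting the asterisk works too; either way the shell leaves it alone.
quoted=`expr 6 '*' 7`
echo "quoted form gives $quoted"

# With spaces, this is not an assignment at all: the shell tries to run a
# command called x_demo_var with the arguments "=" and "3".
unset x_demo_var
if x_demo_var = 3 2>/dev/null; then
    worked=yes
else
    worked=no
fi
echo "did 'x_demo_var = 3' work? $worked (value: '$x_demo_var')"
```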

Update: 17 Feb 2007 : Division, and Base Conversion

As noted by Constantin, the “scale=x” setting can be useful for defining precision. By default bc works with a scale of 0, so division throws away everything after the decimal point: “echo 5121 / 1024 | bc” claims that the answer is “5”, which isn’t quite true (5120/1024=5). echo "scale = 5 ; 5121 / 1024" | bc produces an answer to 5 decimal places (5.00097).

Another important note I would like to add, is that bc is great at converting between bases:

Convert Decimal to Hexadecimal

steve@nixshell$ bc
obase=16
12345
3039

This tells us that 12345 is represented, in Hex (Base 16) as “0x3039”.
Similarly, we can convert back to decimal (well, we can use bc to convert any base to any other base)…

steve@nixshell$ bc
ibase=16
3039
12345

Or we can convert from Binary:

steve@nixshell$ bc
ibase=2
01010110
86
steve@nixshell$

… which tells us that 01010110 (Binary) is 86 in decimal. We can get that in hex, like this:

steve@nixshell$ bc
obase=16
ibase=2
01010110
56
steve@nixshell$

Which tells bc that the input base is 2 (Binary), and the output base is 16 (Hex). So, 01010110 (base 2) = 56 (hex) = 86 (decimal).

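Note that the order does matter a lot; “ibase=2; obase=16” is interpreted differently from “obase=16; ibase=2”. Once ibase=2 has been set, everything after it is read as binary, including the “16” in “obase=16”, so the output base does not end up as 16.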
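In a script you would not normally run bc interactively; the same settings can be passed on one line, separated by semicolons. This is a sketch of the conversions above in that form, assuming bc is installed (it is skipped otherwise). The variable names are only for illustration:

```shell
#!/bin/sh
if command -v bc >/dev/null 2>&1; then
    hex=`echo "obase=16; 12345" | bc`
    dec=`echo "ibase=16; 3039" | bc`
    bin_to_dec=`echo "ibase=2; 01010110" | bc`
    bin_to_hex=`echo "obase=16; ibase=2; 01010110" | bc`
    scaled=`echo "scale=5; 5121 / 1024" | bc`

    echo "12345 in hex:        $hex"
    echo "3039 (hex) in dec:   $dec"
    echo "01010110 in decimal: $bin_to_dec"
    echo "01010110 in hex:     $bin_to_hex"
    echo "5121 / 1024:         $scaled"
else
    echo "bc is not installed"
fi
```

One thing to watch: when ibase is 16, bc expects the hex digits A to F in upper case.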

I hope that this article will help some people out with some of the more frustrating aspects of shell programming. Please, let me know what you think.


File Permissions

January 25, 2007

The Unix file permissions model doesn’t seem to get explained very clearly, very often. It’s really quite simple, though some of the more advanced stuff isn’t so widely known. The key commands are ls -l and chmod. chmod has two ways of working; we’ll deal with the easy one first.

When you look at a file, there are lots of fields. (I’m using Linux with an ext3 filesystem for these examples, but it’s the same across the board for Unix and Linux, and just about any filesystem.)

$ ls -l myfile.txt
-rw-r--r-- 1 steve users 4 2007-01-25 20:37 myfile.txt
$

So what does it all mean? Going through the fields in order, it’s:

-rw-r--r--    1    steve   users  4    2007-01-25 20:37 myfile.txt
permission  links  owner   group  size  last-modified   filename

We’re dealing with the permission stuff here, but I’ll quickly run through the others. “Links” tells you how many “hard links” there are to the file. That’s probably for another post, but if you type “ln myfile.txt yourfile.txt“, then the link count will go up from 1 to 2. “owner” tells you what user owns the file, and “group” tells you what group is associated with the file. “size” is pretty obvious; it’s in bytes, (this file’s 4 bytes are “f”, “o”, “o” and a newline character). “last-modified” tells you when the file was last changed (not necessarily when it was created), and finally, the filename.

For our purposes, the important stuff is the permission, owner and group. That’s “-rw-r--r--”, “steve” and “users” in this example.

Looking at the “-rw-r--r--”, it seems almost random. Once you know the structure, it’s very informative. There are 10 characters, or fields, grouped with the first character by itself, then three sets of three, like so:

File Type  Owner  Group  Other
-          rw-    r--    r--

The initial “-” for File Type, tells you what kind of file it is. In this case, “-” means it’s a regular file. “d” indicates a directory, “c” means a character-special device, and “b” means a block-special device. Run “ls -l /dev” to see some “c” and “b” files. They’re device files, the way into the kernel’s device drivers; a character-device (eg, /dev/lp0, the printer) is accessed with characters; you tend to chuck text at it. A block-device (eg, /dev/hda1, the hard disk) is accessed in blocks, not single characters. We’re not kernel developers, so we don’t need to worry about that too much.

The Meat Of It

The main part of the -rw-r--r-- information is the three sets of three characters: “rw-”, “r--” and “r--” in this example. Of the block, the first character is either “r” to indicate that you can Read the file, or “-” to indicate that you can’t read it. The second is “w” if you can Write to the file, and “-” if you can’t. The third is usually “x” if you can eXecute (run) the file, or “-” if you can’t. (The third can also be “t” or “s”; we’ll come to that in a minute.)

So in this case, the file’s owner (“steve”) can read and write, but not execute (rw-). Members of the group (“users”) can read the file, but can’t change it or run it (r--). Anybody who’s not “steve” and not in the “users” group can do the same in this example (r--).

Common Uses

Common sets of permissions are:

600  -rw-------  I can read and write it, but nobody else can. (Private files)
640  -rw-r-----  I can read and write it, my group can only read it. Others can do nothing. (Semi-shared files)
755  -rwxr-xr-x  I can read, write, execute; everyone else can read and execute it, but not change it. (Shared programs)
644  -rw-r--r--  I can read and write it, the rest of the world can read it. (Shared files)

What’s that left-hand column I threw in there? That’s the other way of thinking about permissions. If “r” is 4, “w” is 2, and “x” is 1, then “rwx” is 4+2+1=7, “r--” is 4, “rw-” is 4+2=6, and so on. Each of the three digits covers one set: owner, group, other. It’s a kind of shorthand.
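The arithmetic is simple enough to script. This is a rough sketch of a function (perm_to_octal is a made-up name, not a standard tool) which takes the nine permission characters, without the leading file-type character, and adds up 4, 2 and 1 for each set of three. It only understands r, w, x and -, so the “t” and “s” cases are not handled:

```shell
#!/bin/sh
# Convert nine permission characters (e.g. rw-r--r--) to three octal digits.
perm_to_octal() {
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        # Peel off the next set of three characters.
        triple=`printf '%s\n' "$perms" | cut -c 1-3`
        perms=`printf '%s\n' "$perms" | cut -c 4-`
        digit=0
        case $triple in r??) digit=`expr $digit + 4` ;; esac
        case $triple in ?w?) digit=`expr $digit + 2` ;; esac
        case $triple in ??x) digit=`expr $digit + 1` ;; esac
        result="$result$digit"
    done
    echo "$result"
}

shared_file=`perm_to_octal rw-r--r--`
shared_prog=`perm_to_octal rwxr-xr-x`
private_file=`perm_to_octal rw-------`
echo "rw-r--r-- = $shared_file"
echo "rwxr-xr-x = $shared_prog"
echo "rw------- = $private_file"
```

This gives 644, 755 and 600, matching the table above.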

chmod

We set permissions with the chmod command. The first set of three is “u” for “User”, the second is “g” for “Group”, and the last is “o” for “Other”. There’s also “a” for “All”. So “chmod g+rwx” means “add rwx to the second block”, while “chmod a-x” means “take off the x flags for everybody”.

This is easiest to show with examples:

$ ls -l myfile.txt
-rw-r--r-- 1 steve steve 4 2007-01-25 20:39 myfile.txt

#                                           Allow me to eXecute the file: User + eXecute = u+x:
$ chmod u+x myfile.txt
$ ls -l myfile.txt
-rwxr--r-- 1 steve steve 4 2007-01-25 20:39 myfile.txt

#                                           Don't let Others Read the file: Others - Read = o-r:
$ chmod o-r myfile.txt
$ ls -l myfile.txt
-rwxr----- 1 steve steve 4 2007-01-25 20:39 myfile.txt

#                                           Don't let the Group Read the file: Group - Read = g-r:
$ chmod g-r myfile.txt
$ ls -l myfile.txt
-rwx------ 1 steve steve 4 2007-01-25 20:39 myfile.txt

#                                           Be specific with numbers: 600 = -rw-------
$ chmod 600 myfile.txt
$ ls -l myfile.txt
-rw------- 1 steve steve 4 2007-01-25 20:39 myfile.txt

#                                           Be specific with numbers: 755 = rwxr-xr-x
$ chmod 755 myfile.txt
$ ls -l myfile.txt
-rwxr-xr-x 1 steve steve 4 2007-01-25 20:39 myfile.txt
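If you want to experiment without risking a real file, something like this sketch works on a temporary file (created with mktemp; the variable names are just for illustration). It also shows that several symbolic changes can be given at once, separated by commas. Only the first ten characters of the ls -l output are kept, using cut:

```shell
#!/bin/sh
f=`mktemp`

# Numeric form: owner read+write, group read, others nothing.
chmod 640 "$f"
numeric=`ls -l "$f" | cut -c 1-10`
echo "after chmod 640:       $numeric"

# Symbolic form, two changes in one go: owner gains x, others gain r.
chmod u+x,o+r "$f"
symbolic=`ls -l "$f" | cut -c 1-10`
echo "after chmod u+x,o+r:   $symbolic"

# Take the x flag off for everybody.
chmod a-x "$f"
no_exec=`ls -l "$f" | cut -c 1-10`
echo "after chmod a-x:       $no_exec"

rm -f "$f"
```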

Copy a filesystem

January 23, 2007

When you need to copy or move an entire filesystem, the first, most obvious, answers do not quite do the job. cp needs lots of flags to get the permissions, ownership, access times and so on to match. tar takes care of most of that, but it still doesn’t deal with special files (the most common being character or block devices, such as those you’d find in /dev). So there’s a little magic incantation I’ve memorised. It does the job, not 99% of the job, it just does the job.

If you have two volumes, let’s call them /data and /backup, and you want to copy everything from /data into /backup, then you’d do this:

# cd /data
# find . -print | cpio -pudvm /backup

And off it goes. Your inode attributes, special files, everything, just as it was in the /data filesystem. If there are other filesystems mounted under /data – eg., /data/customer1 and /data/customer2, then they’ll be backed up by this method. If that’s not what you want, then add -mount to the find command.
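Before pointing this at a real filesystem, it can be tried on a couple of scratch directories. This is a minimal sketch, assuming cpio is installed (it is skipped otherwise). The directories come from mktemp -d, and the -v switch is left off to keep it quiet:

```shell
#!/bin/sh
src=`mktemp -d`
dst=`mktemp -d`
mkdir "$src/customer1"
echo "foo" > "$src/customer1/myfile.txt"

if command -v cpio >/dev/null 2>&1; then
    # Same incantation as above, run from inside the source directory so
    # that the path names find prints are relative.
    ( cd "$src" && find . -print | cpio -pudm "$dst" 2>/dev/null )
    copied=`cat "$dst/customer1/myfile.txt"`
    echo "the copied file contains: $copied"
else
    echo "cpio is not installed"
fi

rm -rf "$src" "$dst"
```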

I used it recently when a hard disk was dying, and it contained the /home filesystem; just get a new disk, and off you go:

# mkfs.ext3 /dev/hdc1
[snip]
# mount /dev/hdc1 /newhomes
# cd /home
# find . -print | cpio -pudvm /newhomes

And then just unmount the old one, edit the /etc/fstab to point to the new one, and remount it. A little bit of sed is easier to show than the editing… I’d use vi in real life:

# mount | grep home
/dev/hda1 on /home type ext3 (rw)
# umount /home
# grep home /etc/fstab
/dev/hda1    /home     ext3  defaults    0   2
# sed -i 's/hda1/hdc1/' /etc/fstab
# grep home /etc/fstab
/dev/hdc1    /home     ext3  defaults    0   2
# mount /home
# mount | grep home
/dev/hdc1 on /home type ext3 (rw)
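Since /etc/fstab is not a file you want to get wrong, it can be worth previewing the substitution first. Without the -i switch, sed leaves the file alone and prints the result to stdout. This sketch uses a temporary stand-in for /etc/fstab, containing just the one line from the example:

```shell
#!/bin/sh
fstab_copy=`mktemp`
printf '/dev/hda1    /home     ext3  defaults    0   2\n' > "$fstab_copy"

# No -i here: the edited text goes to stdout and the file is untouched.
preview=`sed 's/hda1/hdc1/' "$fstab_copy"`
unchanged=`cat "$fstab_copy"`

echo "preview: $preview"
echo "on disk: $unchanged"
rm -f "$fstab_copy"
```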

The cpio switches are (with descriptions from a Solaris 9 man page):
-p (pass) Reads a list of file path names from the standard input and conditionally copies those files into the destination directory tree.
-d Creates directories as needed.
-m Retains previous file modification time. This option is ineffective on directories that are being copied.
-u Copies unconditionally. Normally, an older file will not replace a newer file with the same name.
-v Verbose. Prints a list of file and extended attribute names. When used with the -t option, the table of contents looks like the output of an ls -l command (see ls(1) ).

I don’t know why, but I find the order pudvm is stuck in my head, it just seems like a nice little word in its own right. And it’s been really helpful on a number of occasions.


Tool Tip: “find”

January 19, 2007

find is a very powerful command. After the last post about grep, in which I mentioned that DOS has a command called “find” which is a simplistic version of grep, I now feel obliged to tell all about the real (that is, the *nix) find command.

find works on the basis of find (from somewhere) (something); the “from somewhere” is often “.” (here, the current directory), or “/” (root, to search the entire filesystem). It’s not terribly interesting. What is interesting is the “something” bit. You can specify a file name, for example:

$ find / -name "foo.txt"
/home/steve/misc/foo.txt
$ 

That wasn’t very exciting, and it will take a long time to complete, too. Systems with “slocate” installed could just say locate foo.txt and get the answer back in a fraction of a second (by looking it up in a database), without trawling through the whole hard disk (or, indeed, all attached disks). So that’s not what’s exciting about find. What is exciting about find, is what else it can do, instead of just “-name foo.txt”.

Don’t get me wrong; the “-name” switch is useful. More useful with wildcards: find . -name "*.txt" will find all text files.

You can restrict the search to one filesystem with the “-mount” (aka “-xdev”) flag.

If you want to find files newer than /var/log/messages, you can use find . -newer /var/log/messages

If you want to find files over 10Kb, then find . -size +10k will do the job. To get a full listing, find . -size +10k -ls.

Want to know what files I own? find . -user steve

How about listing all files over 10Kb with their owner and permissions?

$ find . -size +10k -printf "%M %u %f\n"
-rwxr-xr-x steve foo.txt
-rw------- steve bar.doc
-rwxr-xr-x steve fubar.iso
-rwxr-xr-x steve fee.txt
-rw------- steve jg.tar
$

Here, the “%M” shows the permissions (-rwxr-xr-x), “%u” shows the username (“steve”), and “%f” shows the filename. The “\n” puts a “newline” character after each matching file, so that we get one file per line.

There is much more to find than this; I’ve not really covered the actions (other than printf) at all in this article, just a quick glimpse of how find can search for files based on just about any criteria you can think of. Search terms can be combined, so find . -size +10k -name "*.txt" will only find text files over 10Kb, and so on.
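Combining search terms is easy to try in a scratch directory. In this sketch the file names and sizes are made up, the 20Kb files are created with dd, and the “k” suffix to -size is assumed to be supported (it is in GNU and BSD find):

```shell
#!/bin/sh
dir=`mktemp -d`
echo "tiny" > "$dir/small.txt"
dd if=/dev/zero of="$dir/big.txt" bs=1024 count=20 2>/dev/null
dd if=/dev/zero of="$dir/big.iso" bs=1024 count=20 2>/dev/null

# Both conditions must hold: over 10Kb AND a name ending in .txt.
big_text=`cd "$dir" && find . -size +10k -name "*.txt"`

# One condition only: every .txt file, whatever its size.
all_text=`cd "$dir" && find . -name "*.txt" | sort`

echo "text files over 10Kb:"
echo "$big_text"
echo "all text files:"
echo "$all_text"
rm -rf "$dir"
```

Only ./big.txt passes both tests; small.txt is too small and big.iso has the wrong name.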


Tool Tip: “grep”

January 17, 2007

A powerful and useful tool in the shell scripter’s arsenal is grep. If you’ve not come across it before, it’s similar to the “find” tool that DOS had; it finds strings in files. Grep takes its name from the ed editor command g/re/p (“globally search for a regular expression and print”); a “regular expression” is a pattern, which can be a plain string or something more powerful than just a string.

Example:
$ grep foo myfile.txt
and Steve said, "foo! that's crazy"
$

That searches for “foo” in the file called “myfile.txt”. It gets any line (yes, the whole line) which contains the search text.

But you can do other stuff, with “switches”. For example “-i” means “insensitive to case”:
$ grep -i foo myfile.txt
"Foo" is a word, associated with "Bar".
and Steve said, "foo! that's crazy"

This time, grep finds that the word “foo” is actually mentioned twice in “myfile.txt”; once as “Foo” and once as “foo”.

The “-i” flag is a pretty common one, then, because it’s often what we really want it to find.

Here’s a good one, though: Under Linux, a special file /proc/bus/usb/devices lists your USB devices. That’s good, but yuck, it’s a mess of (too much) detailed information:

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 2.06
S:  Manufacturer=Linux 2.6.15-27-server uhci_hcd
S:  Product=UHCI Host Controller
S:  SerialNumber=0000:00:07.2
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms

T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=ff(vend.) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=06b9 ProdID=4061 Rev= 0.00
S:  Manufacturer=ALCATEL
S:  Product=Speed Touch USB
S:  SerialNumber=0090D00D0B25
C:* #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=00 Prot=00 Driver=usbfs

How do I just get what I need from the file? One switch to grep, which I don’t use as much as I should, is “-A”, for “After”. (Note that it’s a capital “A”).

After the Vendor ID and Product ID, /proc/bus/usb/devices includes the name of the device, so I can find out what I’ve got installed with a Vendor ID of 06b9 quite easily:

$ grep -A 2 06b9 /proc/bus/usb/devices
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB

Or what have I got from Alcatel?
$ grep -i -A1 Alcatel /proc/bus/usb/devices
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB

I can also ask: Who made my Speed Touch modem, or what’s its ID? “-B” displays lines before the line that matches:

$ grep -B 2 Speed /proc/bus/usb/devices
P: Vendor=06b9 ProdID=4061 Rev= 0.00
S: Manufacturer=ALCATEL
S: Product=Speed Touch USB
$

There’s a lot you can do with grep; I’ve only really covered the first line from “man grep”.
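/proc/bus/usb/devices is not available on every system, so here is a sketch that tries the same switches on a cut-down copy of the listing in a temporary file. It assumes a grep which supports -A and -B (GNU and BSD grep both do), and it finishes by handing the matching line to cut to pull out just the product name:

```shell
#!/bin/sh
devices=`mktemp`
cat > "$devices" <<'EOF'
P:  Vendor=0000 ProdID=0000 Rev= 2.06
S:  Manufacturer=Linux 2.6.15-27-server uhci_hcd
S:  Product=UHCI Host Controller
P:  Vendor=06b9 ProdID=4061 Rev= 0.00
S:  Manufacturer=ALCATEL
S:  Product=Speed Touch USB
S:  SerialNumber=0090D00D0B25
EOF

# The matching line plus the two lines after it.
after=`grep -A 2 06b9 "$devices"`

# The matching line plus the two lines before it.
before=`grep -B 2 Speed "$devices"`

# Case-insensitive match, one line of context, then keep only the product name.
product=`grep -i -A1 alcatel "$devices" | grep Product | cut -d= -f2`

echo "$after"
echo "product: $product"
rm -f "$devices"
```

Both the -A and the -B search end up showing the same three lines, just approached from opposite ends.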


Hello world!

January 17, 2007

Okay, here we go.

As a side-project to my *nix shell programming tutorial, I thought it might be an idea to start a blog, with regular postings of general hints and tips.

Not the overview stuff that the tutorial provides, but more “cheat codes” type stuff; crib-sheets to get you through the day.

Here’s one for starters: Sed 1-Liners. There’s an awful lot you can do with sed if you have the secret sauce. Without this file, I’d be stuck with sed s/foo/bar/g, and that’d be it. The file linked above contains many very useful tricks.