Links to bash and sed FAQs

March 19, 2007

A few more interesting / useful links …

17 Bash Pitfalls and (from the same site) BashFaq

Also the SED FAQ at SourceForge, which I hadn’t come across before today (the FAQ that is, not SourceForge!): http://sed.sourceforge.net/sedfaq.html (to go with http://sed.sourceforge.net/sed1line.txt which I plugged previously)


Variables – When to use a ‘dollar’ symbol

March 19, 2007

If you’ve used any other language, then you are probably quite familiar with how variables work, and how they are referred to. If not, then the following examples should suffice to cover the two most common options:

PHP (amongst others)

$foo="hello, world";
echo $foo;

This sets the “foo” variable to “hello, world”, and then echoes it out. Notice how “foo” is always preceded by a dollar ($) symbol: that marks it as a variable. Whether we’re setting or reading its contents, it’s always “$foo”.

C (and others again)

#include <stdio.h>

int main() {
  int foo;
  int bar;

  foo = 2;
  bar = foo * 5;
  printf("The answer is %d\n", bar);
  return 0;
}

This prints the useful fact that 2*5 = 10. However, no dollar ($) symbols are used at all. That’s the “other” option: the compiler knows from the language’s rules which names are variables.

However, the shell falls part way in between these two examples.

If you want to use the value of a variable, then you need the dollar. If you want to set it, then drop the dollar:

Shell Script

#!/bin/sh
foo=Hello
echo $foo World

This will say “Hello World”. But did you see what happened with the dollars? To set a variable, no dollar. To read it, use the dollar.

This is particularly confusing to many users of the shell. I can’t even provide a good reason as to why it should work this way, whether historically or pragmatically.

Similarly, when setting a variable with the read command, do not use the dollar:

#!/bin/sh
echo "What do you want to tell the world?"
read msg
echo $msg World

This will tell your message to the world… if you say “Hello”, then it will say “Hello World”. If you say “Goodbye Cruel”, then it will say “Goodbye Cruel World”.

Notice the dollars… whilst strange, it is at least consistent. You use the dollar symbol to quote the content of the variable; otherwise, leave it out.
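Putting those rules together in one place, here is a small sketch (the tell_world function name is just for illustration). It also shows two habits worth picking up early: no spaces either side of “=” when assigning, and double quotes around “$msg” when reading it, so that any spacing in what was typed is kept intact.

```shell
#!/bin/sh
# Setting a variable: no dollar, and no spaces around "="
foo=Hello
# Reading its contents: use the dollar
echo "$foo World"

# tell_world is a hypothetical name, used only for this illustration
tell_world() {
  read msg              # "read" sets msg, so again no dollar
  echo "$msg World"     # reading msg back needs the dollar
}

echo "Goodbye Cruel" | tell_world
```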

For more in-depth stuff about variables, check out


Bashish

March 13, 2007

Just a quick post, to plug a rather thorough article about changing the appearance of the bash prompt:

http://systhread.net/texts/200703bashish.php


Timestamps for Log Files

March 11, 2007

There are two common occasions when you might want to get a timestamp:

  • If you want to create a logfile called “myapp_log.11.Mar.2007”
  • If you want to write to a logfile with “myapp: 11 Mar 2007 22:14:44: Something Happened”

Either way, you want to get the current date, in the format you prefer – for example, it’s easier if a filename doesn’t include spaces.

For the purposes of this article, though for no particular reason, I am assuming that the current time is 10:14:44 PM on Sunday the 11th March 2007.

The tool to use is, naturally enough, called “date”. It has a bucket-load of switches, but first, we’ll deal with how to use them. For the full list, see the man page (“man date”), though I’ll cover some of the more generally useful ones below.

Setting the Date/Time

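The first thing to note is that date has two aspects: it can set the system clock, or it can report the current date and time. Setting the clock (which needs root privileges) looks like this: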

# date 031122142007.44

will set the clock to 03 11 22 14 2007 44 – that is, 03=March, 11=11th day, 22 = 10pm, 14 = 14 minutes past the hour, 2007 = year 2007, 44 = 44 seconds past the minute.

Heck, I don’t even know why I bothered to spell it out, it’s obvious. Of course the year should come between the minutes and the seconds (ahem).

Getting the Date/Time

The more often used feature of the date command is to report the current system date/time, and that is what we shall focus on here. It doesn’t follow the usual convention of “-” switches: instead, you give it a format string starting with “+”, in which each “%” code is replaced by part of the current date or time.

H = Hours, M = Minutes, S = Seconds, so:

$ date +%H:%M:%S
22:14:44

Which means that you can name a logfile like this:

#!/bin/sh
LOGFILE=/tmp/log_`date +%H%M%S`.log
echo Starting work > $LOGFILE
do_stuff >> $LOGFILE
do_more_stuff >> $LOGFILE
echo Finished >> $LOGFILE

This will create a logfile called /tmp/log_221444.log

You can also put useful information, such as a timestamp for each step, into the logfile:

#!/bin/sh
LOGFILE=/tmp/log_`date +%H%M%S`.log
echo "`date +%H:%M:%S`: Starting work" > $LOGFILE
do_stuff >> $LOGFILE
echo "`date +%H:%M:%S`: Done do_stuff" >> $LOGFILE
do_more_stuff >> $LOGFILE
echo "`date +%H:%M:%S`: Done do_more_stuff" >> $LOGFILE
echo Finished >> $LOGFILE

This will produce a logfile along the lines of:

$ cat /tmp/log_221444.log
22:14:44: Starting work
do_stuff : Doing stuff, takes a short while
22:14:53: Done do_stuff
do_more_stuff : Doing more stuff, this is quite time consuming.
22:18:35: Done do_more_stuff
Finished
$
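Going back to the two cases at the start of this post, each format can be produced with a single date format string. A sketch follows; myapp, LOGFILE and the log function are hypothetical names, and %b (the abbreviated month name) follows your locale, so it gives “Mar” in an English or C locale. It uses $( ) rather than backticks; both are standard, but $( ) is easier to read and to nest.

```shell
#!/bin/sh
# Logfile named after today's date, e.g. /tmp/myapp_log.11.Mar.2007
LOGFILE=/tmp/myapp_log.$(date +%d.%b.%Y)

# log is a hypothetical helper: one timestamped line per message, e.g.
# "myapp: 11 Mar 2007 22:14:44: Something Happened"
log() {
  echo "myapp: $(date '+%d %b %Y %H:%M:%S'): $*" >> "$LOGFILE"
}

log "Starting work"
log "Something Happened"
```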

Counting the Seconds

UNIX treats 1st Jan 1970 as a “special” date, the start of the system clock; GNU date will tell you how many seconds have elapsed since midnight (UTC) on 1st Jan 1970:

$ date +%s
1173651284

Whilst this number is not very useful in itself, it makes it easy to work out how many seconds have elapsed between two events:

$ cat list.sh
#!/bin/sh
start=`date +%s`
ls -R "$1" > /dev/null 2>&1
end=`date +%s`

diff=`expr $end - $start`
echo "Started at $start : Ended at $end"
echo "Elapsed time = $diff seconds"
$ ./list.sh /usr/share
Started at 1173651284 : Ended at 1173651290
Elapsed time = 6 seconds
$
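The same measurement can be wrapped up in a function. This is a sketch rather than a drop-in tool: time_it is a hypothetical name, it relies on date +%s (which GNU date supports, but some older UNIX date commands don’t), and it uses the shell’s own $(( )) arithmetic in place of expr.

```shell
#!/bin/sh
# time_it (hypothetical name): run a command, discard its output,
# and print how many whole seconds it took.
time_it() {
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))        # shell arithmetic, like `expr $end - $start`
}

echo "Elapsed time = $(time_it sleep 2) seconds"
```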

For more useful switches, see the man page, but here are a few handy ones:

$ date "+%a %b %d" # (day and month names in the local language)
Sun Mar 11
$ date +%D         # (US-style mm/dd/yy)
03/11/07
$ date +%F         # (ISO 8601 yyyy-mm-dd, which sorts nicely)
2007-03-11
$ date +%j         # (day of the year, 001-366)
070
$ date +%u         # (day of the week, 1 = Monday ... 7 = Sunday)
7
$
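Going the other way, GNU date can turn a seconds count back into a date with -d "@seconds". That option is a GNU feature, so this sketch assumes GNU date; adding -u reports the time in UTC, so the result doesn’t depend on your time zone. epoch_to_date is just an illustrative name.

```shell
#!/bin/sh
# Assumes GNU date: -d "@N" means "N seconds after 1st Jan 1970 (UTC)"
epoch_to_date() {
  date -u -d "@$1" '+%F %H:%M:%S'
}

epoch_to_date 1173651284      # prints: 2007-03-11 22:14:44
```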

Looping in the shell (for and while)

March 7, 2007

In programming, the two most common types of loop are “for” and “while” loops. We can do both (and “repeat” loops, too, because that’s just a special case of the “while” loop) with the *nix shell. I’ve got some more detail in the tutorial.

for loops

Some languages have a “foreach” command; if you are used to such a language, then treat the shell’s for command as equivalent to foreach. If not, then don’t worry about it, just watch the examples, it’s about as simple as it can be.

$ for artist in Queen "Elvis Costello" Metallica
> do
> echo "I like $artist"
> done
I like Queen
I like Elvis Costello
I like Metallica
$

The for loop will simply go through whatever text it gets passed, and do the same stuff with each item.

Note that for “Elvis Costello” to be marked as one artist, not “Elvis” and “Costello”, I had to put quotes around the words. However, if these were files, then the following would suffice:

$ ls -l
total 0
-rw-r--r-- 1 steve steve 0 2007-03-07 00:45 Elvis Costello
-rw-r--r-- 1 steve steve 0 2007-03-07 00:45 Metallica
-rw-r--r-- 1 steve steve 0 2007-03-07 00:45 Queen
$ for artist in *
> do
> echo "I like $artist"
> done
I like Elvis Costello
I like Metallica
I like Queen
$

This time around, the shell expands the * wildcard itself: the filenames come back in alphabetical order, and each filename (spaces and all) is treated as a single item, so no extra quotes are needed.
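To loop over the files in some other directory, put the wildcard after the directory name. A sketch, with like_all as a hypothetical helper: it quotes "$artist" so a name like “Elvis Costello” stays in one piece, and it skips the case where the directory is empty, because the shell then leaves the unmatched “*” in place as a literal word.

```shell
#!/bin/sh
# like_all (hypothetical): say "I like ..." for every file in directory $1
like_all() {
  for artist in "$1"/*
  do
    # With no matching files, the pattern itself comes through unexpanded
    [ -e "$artist" ] || continue
    echo "I like $(basename "$artist")"
  done
}

like_all .
```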

Many languages can do better than “foreach”, though: “for i=1 to 99 step 3”, for example, to step through 1, 4, 7, 10 .. 91, 94, 97.

We can do this with a while loop.

while loops

While loops are not quite as simple as for loops: they test a condition before each pass, and as soon as the condition is no longer true, the loop exits.

The example above, stepping through from 1 to 100 in increments of 3 (1, 4, 7, 10, … 91, 94, 97), can easily enough be done with a while loop:

$ i=1
$ while [ "$i" -lt "100" ]
> do
>   echo $i
>   i=`expr $i + 3`
> done
1
4
7
...
91
94
97

The “i=`expr $i + 3`” means “increment ‘i’ by 3” (“i = i + 3” in most other languages).

The “-lt” means “is less than” (“-le” means “is less than or equal to”; see “man test”, or http://steve-parker.org/sh/test.shtml and http://steve-parker.org/sh/quickref.shtml).
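The same loop can be wrapped in a reusable function. In this sketch, step_through is a hypothetical name, and it uses the shell’s built-in $(( )) arithmetic (POSIX, and supported by bash and dash) in place of expr, which saves starting a new process on every pass.

```shell
#!/bin/sh
# step_through START LIMIT STEP: print START, START+STEP, ... while below LIMIT
step_through() {
  i=$1
  while [ "$i" -lt "$2" ]
  do
    echo "$i"
    i=$((i + $3))      # same as: i=`expr $i + $3`
  done
}

step_through 1 100 3     # 1 4 7 ... 91 94 97, one per line
```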


Tool Tip: “ls”

February 26, 2007

Yeah yeah, we know ls already.

But how much of ls‘s functionality do you actually use? There are so many switches to ls, that when Sun added extended attributes (does anyone use that?) they found that there were no letters left, so they had to use “-@” !

So, here are a couple of handy ls options, in no particular order, for either interactive or scripting use. I’m assuming GNU ls; Solaris ls supports most of them, but some “nice-to-have” features, like ls -h, aren’t in historical UNIX ls implementations. I’ll split these into two categories: Sort ‘em and Show ‘em. What are your favourites?

Sort ‘em

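When sorting, I tend to use the “-l” (long listing) and “-r” (reverse order) switches. Without -r, the biggest (or newest) files come first; with it, they end up at the bottom of the list, right above the prompt: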

Sort ‘em by Size:

ls -lSr

Sort ‘em by Date:

ls -ltr
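The date sort is handy in scripts too: without -r, ls -t puts the newest file first. A sketch, where newest is a hypothetical helper; -1 forces one name per line, and head keeps the first. Filenames containing newlines would break it, so treat it as a convenience rather than a bulletproof tool.

```shell
#!/bin/sh
# newest (hypothetical): name of the most recently modified entry in directory $1
newest() {
  ls -1t "$1" | head -n 1
}

newest /tmp
```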

Show ‘em

There are a number of ways to show different attributes of the files you are listing; “-l” is probably the obvious example. However, there are a few more:

Show ‘em in columns

ls -C

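Useful when the output is going to a pipe or a file, where ls otherwise lists one name per line; -C forces the multi-column layout you see on a terminal.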

Show ‘em one by one

ls -1

That’s the number 1 (one) there, not the letter l (ell). Forces one-file-per-line. Particularly useful for dealing with strange filenames with whitespace in them.

Show ‘em as they are

ls -F

Appends a symbol to each filename to show what kind of file it is: “*” for executables, “/” for directories, “@” for symbolic links, and so on.

Show ‘em so I can read it

ls -lh

Human-readable file sizes, so “12567166” is shown as “12M”, and “21418” as “21K”. This is handy for people, but if you’re writing a script which wants to know file sizes, you’re better off without it: compare “21M” with “22K” as text, or strip the suffixes and compare 21 with 22, and “22K” comes out bigger, even though 21M is far bigger than 22K.

Show ‘em with numbers

ls -n

This is equivalent to ls -l, except that the UID and GID are shown as numbers instead of being looked up as user and group names:

$ ls -l foo.txt
-rw-r--r-- 1 steve steve 46210 2006-11-25 00:33 foo.txt
$ ls -n foo.txt
-rw-r--r-- 1 1000 1000 46210 2006-11-25 00:33 foo.txt

This can be useful in a number of ways; particularly if your NIS (or other) naming service is down, or if you’ve imported a filesystem from another system.
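Combining -n with the advice about -h above gives a simple way for a script to get a file’s size in bytes: in a long listing the size is the fifth field, and -n skips the user and group name lookups. This is a sketch; file_size is a hypothetical helper, and it assumes the name given is a regular file.

```shell
#!/bin/sh
# file_size (hypothetical): print the size of file $1 in bytes
file_size() {
  ls -ln "$1" | awk '{ print $5 }'
}

f=$(mktemp)
printf 'hello' > "$f"
file_size "$f"        # prints: 5
```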

What’s your favourite?

What are your most-used switches for the trusty old ls tool?


Harnessing the flexibility of Regular Expressions with Grep

February 14, 2007

There are two sides to grep. Like any command, there’s the learning of syntax, the beginning of which I covered in the grep tool tip. I’ll come back to the syntax later, because there is a lot of it.

However, the more powerful side is grep’s use of regular expressions. Again, there’s not room here to provide a complete rundown, but it should be enough to cover 90% of usage. Once I’ve got a library of grep-related stuff, I’ll post an entry with links to them all, with some covering text.

This or That

Without being totally case-insensitive (which -i does), we can search for “Hello” or “hello” by listing the alternative characters in square brackets. Quote the pattern, so that the shell doesn’t try to treat the brackets as a filename wildcard:

$ grep '[Hh]ello' *.txt
test1.txt:Hello. This is  test file.
test3.txt:hello
test3.txt:Hello
test3.txt:Why, hello there!

If we’re not bothered what the third letter is, then we can say “grep '[Hh]e.lo' *.txt”, because the dot (“.”) will match any single character.

If we don’t care what the third and fourth letters are, so long as it’s “he..o”, then we say exactly that: “grep 'he..o'” will match “hello”, “hecko” and “heolo”, so long as it is “he” + any two characters + “o”.

If we want to find anything like that, other than “hello”, we can do that, too:

$ grep 'he[^l]lo' *.txt
test2.txt:heclo
test3.txt:hewlo

Notice how it doesn’t pick up any of the “Hello” variations which have a “llo” in them?

How many?!

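We can specify how many times a character can repeat, too. The repetition symbol applies to whatever comes immediately before it: a single character, or a set in [square brackets]:

  • “?” means “it might be there” (zero or one)
  • “+” means “it’s there, but there might be loads of them” (one or more)
  • “*” means “lots (or none) might be there” (zero or more)

Plain grep treats “?” and “+” as ordinary characters; use grep -E (or egrep) to get their special meanings.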

So, we can match “he”, followed by as many “l”s as you like (even none), followed by an “o” with “grep 'he[l]*o' *.txt”:

$ grep 'he[l]*o' *.txt
test2.txt:helo
test3.txt:hello
test3.txt:Why, hello there!
test3.txt:hellllo
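To try the repetition symbols without a set of test files, you can pipe some words straight into grep. In this sketch the word list is made up for illustration; note the quotes around each pattern, and -E wherever “+” or “?” is used.

```shell
#!/bin/sh
# words (hypothetical): a few sample words, one per line
words() {
  printf '%s\n' heo helo hello hellllo Hello
}

words | grep 'he[l]*o'      # zero or more l: heo helo hello hellllo
words | grep -E 'hel+o'     # one or more l:  helo hello hellllo
words | grep -E 'hel?o'     # zero or one l:  heo helo
words | grep '[Hh]e.lo'     # any third letter: hello Hello
```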

Backgrounding

February 14, 2007

The great thing about Unix, and the Bourne shell, when it was introduced back in the day, was multitasking. It’s such an overused buzzword these days, but at the time, it was really a new thing. If you’ve only got one connection into a machine, you can still get it doing as much as you want.

The shell command to “do this in the background, then give me a new prompt to provide the next command” is the ampersand (“&”):

$  # I need to trawl the filesystem for files called "*dodgy*"
    #  (should have installed slocate
    #  (http://packages.debian.org/stable/source/slocate), 
    # but it's too late for that)
$ find / -name "*dodgy*" -print
    (wait for a very very very long time)
/foo/bar/thisfileisnotdodgy.txt
/bar/foo/thisisadodgyfile.txt
$

Well, that’s a good hour of my life wasted.

Chuck it into a script, and run it in the background. If you want the outcome, direct it to a file:

$ cd /tmp
$ cat myfindscript.sh
find / -name "*dodgy*"
$ chmod u+rx myfindscript.sh
$ ./myfindscript.sh > /tmp/mysuspectfiles.txt &
[4402]
$
$ # wow, didja see that? It'll take ages, but I've got 
   # control back. "4402" is the Process ID (PID), 
   # so I can run "ps -fp 4402" to check on its
   # progress, but it's happening, in the background.

You don’t get a lot of job control here; the “ps” mentioned above is about your lot, but you can spawn a child process and let it run, whilst you get on with the stuff you need.

This is known as “backgrounding” a task; if you know it will take a long time, just background it. Of course, if the next thing you need to do is to read the entire file, then you won’t get away with it, you’ll have to wait for it to finish. However, you could background it and then “tail -f /tmp/mysuspectfiles.txt” to check on the status.
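A couple of standard shell features make this more convenient in scripts: “$!” holds the PID of the most recent background job, so you don’t have to copy it by hand, and “wait” pauses until a background job finishes. In this sketch, sleep stands in for the long-running find, and slow_job is a hypothetical name.

```shell
#!/bin/sh
# slow_job (hypothetical): stands in for a long-running command like find
slow_job() {
  sleep 2
  echo "all done"
}

out=$(mktemp)
slow_job > "$out" &
pid=$!                            # PID of the job we just backgrounded
echo "Started job $pid"
ps -p "$pid" > /dev/null && echo "Job $pid is still running"
wait "$pid"                       # blocks until the job has finished
cat "$out"                        # prints: all done
```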


Search and Replace

February 6, 2007

Two great tools for search-and-replace are tr and sed.

tr – Translate (or Delete)

tr can translate a single character into another. For example, “tr 'a' 'b'” will convert all instances of “a” into “b”. Although it works on single characters, it also understands ranges, so “tr '[A-Z]' '[a-z]'” will convert uppercase to lowercase: ‘A’ becomes ‘a’, ‘B’ becomes ‘b’, and so on.

The GNU version of tr (which comes with most Linux distributions) has some handy keywords, too: “tr '[:lower:]' '[:upper:]'” is a more readable way of doing the opposite of the [A-Z] example above, converting lowercase to uppercase.

You can also specify your own set of characters, so if you want to convert ‘l’ to ‘1’, ‘o’ to ‘0’, and ‘e’ to ‘3’, then this will do the job:

$  echo welcome | tr 'loe' '103'
w31c0m3

You could extend it to a more complete “l33t sp33k”:

$ echo abcdefghijlkmnopqrstuvwxyz | tr 'aeilost' '@3!1057'
@bcd3fgh!j1kmn0pqr57uvwxyz

tr can also just delete – this deletes the letter ‘l’ :

$ echo hello and welcome | tr -d 'l'
heo and wecome
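Character classes combine nicely with -d. A sketch, with to_upper and strip_digits as hypothetical helper names; the classes are quoted so that the shell can’t mistake the square brackets for a filename wildcard.

```shell
#!/bin/sh
# to_upper (hypothetical): lowercase letters become uppercase
to_upper() {
  tr '[:lower:]' '[:upper:]'
}
# strip_digits (hypothetical): delete every digit
strip_digits() {
  tr -d '[:digit:]'
}

echo "hello and welcome" | to_upper      # prints: HELLO AND WELCOME
echo "R2-D2 and C-3PO" | strip_digits    # prints: R-D and C-PO
```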

sed – Stream Editor

UNIX uses a few metaphors, one being a water metaphor, which we use with pipes (|), redirects (<, >), and a few other places. Sed gets its name from what it does… much like tr, you stream data into it, and slightly modified data comes out the other end.

sed isn’t limited to single-character operations; it can cope with whole phrases, as well as regular expressions. I’ll keep it simple(ish) for now; I plan to do a more complete post on sed, and another on regular expressions, soon. For today, I’ll stick to the sed s/from/to/count syntax.

With the s/from/to/count syntax, sed will convert “from” to “to”. The number at the end says which occurrence on each line to convert (1 for the first, 2 for the second, and so on); the special “g” converts every instance.

I like to get stuck in with a few examples, so here goes:

$ cat text.txt
Fedora Core is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Fedora Core is definitely
the best Linux distribution for home users.
Fedora Core is certainly my favourite distribution.
$ sed 's/Fedora Core/Ubuntu/g' text.txt
Ubuntu is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Ubuntu is definitely
the best Linux distribution for home users.
Ubuntu is certainly my favourite distribution.
$

The syntax there was sed s/from/to/count, so it replaces “Fedora Core” with “Ubuntu” in this example. If we specified “1” at the end instead, it would only convert the first instance on each line (the same as leaving the number off altogether); “2” would convert only the second instance. “g” is probably the most-used: it converts everything (the “g” stands for “global”).
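The difference between a number and “g” is easiest to see on a single line with several matches. A small sketch (the sample line is made up for illustration):

```shell
#!/bin/sh
line="Fedora Core, Fedora Core, Fedora Core"

echo "$line" | sed 's/Fedora Core/Ubuntu/'    # Ubuntu, Fedora Core, Fedora Core
echo "$line" | sed 's/Fedora Core/Ubuntu/2'   # Fedora Core, Ubuntu, Fedora Core
echo "$line" | sed 's/Fedora Core/Ubuntu/g'   # Ubuntu, Ubuntu, Ubuntu
```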

sed can emulate tr’s tr -d functionality by having the “to” part be an empty string; here we refer to “Fedora Core” simply as “Fedora” (note the leading space: it’s “ Core”, not “Core”):

$ sed 's/ Core//g' text.txt
Fedora is my favourite distribution.
It's got just the right level of ease-of-use
along with regular updates, whilst remaining
a stable, supportable Operating System. In fact,
I'd go so far as to say that Fedora is definitely
the best Linux distribution for home users.
Fedora is certainly my favourite distribution.

Notice also that we can cat stuff into sed, as “cat text.txt | sed s/src/dest/g”, or we can pass the file directly to sed, like this: sed s/src/dest/g text.txt. The same applies to most *nix commands.

To go further, we’ll need to get into regexps (Regular Expressions, the stuff like “the * brown * jumped over the * dog” which results in $1 = “quick”, $2 = “fox”, $3 = “lazy”). That’s for another day, though.


cut

January 31, 2007

The cut utility is a great example of the UNIX philosophy of “do one thing, and do it well.” cut just cuts sections out of each line of text, either by character position or by splitting on a (single-character) delimiter.

There are two basic forms in which cut is generally used:

1. Grab These Columns (cut -c)

One form of cut gets certain characters, or columns of characters, out of a file. This is done with the cut -c command. So we can get the 5th character of the string:

$ echo "Hello, World" | cut -c 5
o

Or we can get the first 5 characters:

$ echo "Hello, World" | cut -c -5
Hello

Or from character #5 onwards:

$ echo "Hello, World" | cut -c 5-
o, World

Or even just characters 2-5:

$ echo "Hello, World" | cut -c 2-5
ello

Or maybe just select characters 5, 8 and 9:

$ echo "Hello, World" | cut -c 5,8,9
oWo

2. Delimited (cut -d)

The other method is to use a delimiter. For example, the /etc/passwd file looks something like this, with the fields delimited by the “:” (colon) symbol:

root:x:0:0:root:/root:/bin/bash
steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash

In this example, field 1 is “root” (or “steve”), field 2 is “x”, and so on…. the last field (7) is “/bin/bash” for both accounts, in this case.

So we just have to specify the delimiter (:) and the field number(s). Field 1 is the account name, 6 is the home directory, field 3 is the UserID…

$ cut -d: -f1 /etc/passwd
root
steve
$ cut -d: -f6 /etc/passwd
/root
/home/steve
$ grep "^root:" /etc/passwd | cut -d: -f3
0

(In the last example, we got *just* the root account, with a grep which searches for a line starting with “root:”, which could only be the root account, and not (for example) “Fred Troot” which would also match a search for “root”)

cut is one of those really simple, but really useful utilities. And because it’s very simple, it’s nice and quick (which matters a lot if you’re looping through a few hundred times).
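cut can also pull out several fields at once, and its output can be fed into a second cut. To avoid depending on your own /etc/passwd, this sketch uses the two sample lines from above (sample_passwd is a hypothetical helper). When several fields are selected, cut joins them with the same delimiter; and a line without the delimiter is passed through whole, which is why root’s name survives the second cut.

```shell
#!/bin/sh
# sample_passwd (hypothetical): the two example lines from above
sample_passwd() {
  printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
    'steve:x:1000:1000:Steve Parker,,,:/home/steve:/bin/bash'
}

sample_passwd | cut -d: -f1,6                # root:/root and steve:/home/steve
sample_passwd | cut -d: -f5 | cut -d, -f1    # root and Steve Parker
```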

