IFS – Internal Field Separator

It seems like an esoteric concept, but it’s actually very useful.

If your input file is “1 apple steve@example.com”, then your script could say:

while read qty product customer
do
  echo "${customer} wants ${qty} ${product}(s)"
done

The read command will split the line into the three variables, because the fields are separated from each other by spaces.

However, critical data is often presented in spreadsheet format. If you save these as CSV files, it will come out like this:

1,apple,steve@example.com

This contains no spaces, so the above code will not be able to split it. It will take the whole line as one item, assign it to the first variable (the quantity, $qty), and leave the other two fields blank.
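To see the difference for yourself, here is a small sketch that feeds both kinds of line to the same read. The helper name show_fields is made up for this illustration, and it assumes IFS is still at its default value.

```sh
#!/bin/sh
# show_fields is a hypothetical helper, not part of the original script.
# It reads one line from stdin and prints each variable inside brackets,
# so that empty values are easy to spot.
show_fields() {
  read qty product customer
  echo "qty=[$qty] product=[$product] customer=[$customer]"
}

# Space-separated input: three separate fields.
echo "1 apple steve@example.com" | show_fields

# Comma-separated input: everything lands in qty, the rest stay empty.
echo "1,apple,steve@example.com" | show_fields
```

The second call should show the whole line inside the qty brackets and nothing in the other two.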

The way around this is to tell the shell that “,” (the comma itself) separates fields; it becomes the “internal field separator”, or IFS.

By default, the IFS variable is set to space/tab/newline, which isn’t easy to type back in at the shell, so it’s best to save the original IFS to another variable, so you can put it back again after you’ve messed around with it. I tend to use “oIFS=$IFS” to save the current value into “oIFS”.
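As a rough sketch of why saving is easier than rebuilding: the save and restore is one assignment each way, whereas recreating space/tab/newline by hand needs a workaround, because command substitution strips trailing newlines. The variable name default_ifs and the sentinel "x" are my own choices for this example, and it assumes the shell started with the usual default IFS.

```sh
#!/bin/sh
# Save the current IFS, change it, then put it back.
oIFS=$IFS
IFS=","
# ... comma-separated work would go here ...
IFS=$oIFS

# Rebuilding the default by hand is fiddlier: $(...) drops trailing
# newlines, so a sentinel "x" is appended and then trimmed off again.
default_ifs=$(printf ' \t\nx')
default_ifs=${default_ifs%x}

if [ "$IFS" = "$default_ifs" ]; then
  echo "IFS is back to space/tab/newline"
else
  echo "IFS differs from the usual default"
fi
```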

Also, when the IFS variable is set to something other than the default, it can really mess with other code.
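Here is a minimal illustration of the kind of trouble meant here. The helper count_args and the sample value are invented for the example; the point is only that the same unquoted variable turns into a different number of arguments once IFS is a comma.

```sh
#!/bin/sh
# count_args is a hypothetical helper that reports how many
# arguments it was called with.
count_args() {
  echo "$#"
}

pattern="apple,pear"

# Default IFS: the unquoted value contains no space, so it stays one word.
count_args $pattern      # prints 1

oIFS=$IFS
IFS=","
# Comma IFS: the same unquoted value is now split into two words.
count_args $pattern      # prints 2
IFS=$oIFS

# Quoting the expansion avoids the splitting whatever IFS is set to.
count_args "$pattern"    # prints 1
```

Any command that receives $pattern unquoted would see the same change in its argument list.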

Here’s a script I wrote today to parse a CSV file:

#!/bin/sh
oIFS=$IFS    # Always keep the original IFS!
IFS=","      # Now set it to what we want the "read" loop to use
while read qty product customer
do
  IFS=$oIFS  # Back to the original while we work on this line
  # process the information
  IFS=","    # Put it back to the comma, for the loop to go around again
done < myfile.txt
IFS=$oIFS    # Restore the original once the loop has finished

It really is that easy, and it’s very versatile. You do have to be careful to keep a copy of the original (I always use the name oIFS, but whatever suits you), and to put it back as soon as possible, because the shell invisibly uses IFS every time it splits an unquoted variable or command substitution into words, and that changes the arguments handed to grep, cut, you name it. It’s surprising how many things within the “while read” loop actually did depend on IFS being the default value.

13 Responses to IFS – Internal Field Separator

  1. mo6 says:

    Great stuff! Finally it’s dead easy to create a CSV parser in BASH. Thanks.

  2. unixshell says:

    You’d have thought so, wouldn’t you?

    Watch out for quotation marks in the CSV… some applications use them, some don’t. Some require them.
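To make that warning concrete, here is a sketch of what a plain comma split does to a quoted field that contains a comma of its own. The sample line and the helper name split_line are made up; this is the naive approach from the article, not a real CSV parser.

```sh
#!/bin/sh
# Made-up sample: the product field is quoted and contains a comma.
line='2,"apples, green",steve@example.com'

# split_line is a hypothetical helper doing the naive comma split.
split_line() {
  printf '%s\n' "$1" | {
    IFS=, read qty product customer
    echo "qty=[$qty] product=[$product] customer=[$customer]"
  }
}

split_line "$line"
```

The quotation marks mean nothing to read, so the product is cut at the inner comma and the leftover text ends up in customer.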

  3. jonny says:

    Thanks for the awesome article!
    also, if you want to use only the line breaks as a separator you can use
    IFS=$'\n'

    I think the IFS gets reset to its default whenever you start a new shell. Subshells in parentheses inherit your change anyway, but a freshly started shell may ignore IFS even if you export it, so set it inside the script that needs it.

    if you want to peek at the value in your IFS, you might try:
    printf '\n' "$IFS" | cat -vt
    everything between the angle brackets is the value of your IFS in secret code.

    looking at the return of
    set | grep IFS
    my IFS is set to
    IFS=$' \t\n'
    When I open a shell.

  4. jonny says:

    looks like I lost my angle brackets and percent-sign-s on that last comment, so the line:
    printf '\n' "$IFS" | cat -vt
    should read:
    printf '>%s>\n' "$IFS" | cat -vt

  5. Pai says:

    The article was really informative, thanks to the author.

    Jonny, thanks for your comment regarding the line break separator, it was of great help.

    Keep up the good work.

    cheers
    pai

  6. kane says:

    When I do set | grep IFS, I get:

    $ set | grep IFS
    HZ=' '
    IFS='

    Can someone tell me whether this IFS in ksh is set correctly by default?

  7. unixshell says:

    Hello Kane,

    Using ksh on Solaris, I ran this:

    $ set|grep IFS|od -c
    0000000   I   F   S   =   '      \t  \n
    0000010
    $

    This shows that IFS consists of a space (“ ”), a tab (“\t”) and a newline (“\n”).

    This is the correct setting for IFS.

    “od” stands for “octal dump”, and “od -c” displays each byte as a character. The “0000000” and “0000010” are the offsets (in octal, base 8); the relevant stuff is the “I F S = ' \t \n” part.

    Hope this helps.

    Maybe I should write a post about od some time… it’s a strange and arcane command, but it does sometimes turn out to be very useful.
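A variation that avoids grep altogether (grep works line by line, so it cuts the value off at the newline) is to print IFS straight into od. This is only a sketch; the helper name dump_ifs is invented here, and the exact column spacing of the od output may differ between systems.

```sh
#!/bin/sh
# dump_ifs is a hypothetical helper: it prints the bytes of IFS through
# "od -c", which shows tab and newline as \t and \n.
dump_ifs() {
  printf '%s' "$IFS" | od -c
}

dump_ifs
```

With the default IFS, the first output line should contain a blank (the space), then \t and \n.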

  8. linuxball says:

    A somewhat simpler version is below. It does not require saving and restoring IFS, because it does not set IFS globally, only locally for the “read” command that follows.

    Background:

    VAR1=value [VAR2=value ...] command

    The line above sets and exports the environment variable(s) VAR1, VAR2, … ONLY for the environment of “command”. It will not change the setting for the current shell.
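A quick way to convince yourself of this is the sketch below. GREETING is a made-up variable name used only for the demonstration, and the child command is simply another sh.

```sh
#!/bin/sh
# GREETING is a hypothetical variable, used only for this demonstration.
GREETING="hello"

# The assignment prefix applies to the environment of this one command.
GREETING="goodbye" sh -c 'echo "child sees: $GREETING"'

# The current shell still has its original value.
echo "parent sees: $GREETING"
```

The child should report goodbye while the parent still reports hello.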

    Example:

    #!/bin/sh
    while IFS=, read qty product customer
    do
      # process the information
      echo "$customer wants $qty $product(s)"
    done < myfile.txt

  9. unixshell says:

    True. And perfectly valid.

    I prefer the explicit mangling and unmangling of IFS, though this is also valid for all shells I am aware of.

  10. tsolox says:

    I prefer explicitly setting IFS and restoring its original value over that trick. The trick only assigns the new value to IFS for the “while” line itself, not within the body of the while loop, which got me a little confused. Besides, I need the IFS inside the while loop, so it’s better that I explicitly control the IFS value.
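The scoping described in this comment can be checked with a short sketch. The helper name parse_csv, the tags variable and the sample data are all invented for the example; it assumes IFS is at its default before the loop starts.

```sh
#!/bin/sh
# parse_csv is a hypothetical helper that reads CSV lines from stdin.
parse_csv() {
  while IFS=, read qty product customer
  do
    # In the loop body IFS is whatever it was before the loop, so this
    # unquoted expansion is NOT split on the comma.
    tags="red,green"
    set -- $tags
    echo "$customer wants $qty $product(s); tags split into $# word(s)"
  done
}

parse_csv <<'EOF'
1,apple,steve@example.com
3,pear,bob@example.com
EOF
```

Each line is split on commas by read, yet tags stays a single word inside the body, which shows that the comma applies to the read command only.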

  11. [...] IFS – Internal Field Separator, September 2007, 10 comments, 5 [...]

  12. Cokes says:

    Good Work ! Thanks.

  13. [...] shell is the Debian Almquist shell (dash), to keep things as POSIX-compliant as possible. Incidentally, “IFS” is an extremely useful environment variable. The whole thing can be wonderfully [...]
