It seems like an esoteric concept, but it’s actually very useful.
If your input file is “1 apple steve@example.com”, then your script could say:
while read qty product customer
do
echo "${customer} wants ${qty} ${product}(s)"
done
The read command will read in the three variables, because they’re spaced out from each other.
However, critical data is often presented in spreadsheet format. If you save a spreadsheet as a CSV file, each row will come out like this:
1,apple,steve@example.com
This contains no spaces, and the above code will not be able to understand it. It will take the whole line as one item, assign it to the first variable (the quantity, $qty), and leave the other two fields blank.
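To see this for yourself, here is a minimal sketch. The show_fields helper is a name made up for this illustration; it feeds one line to "read" with the default IFS and prints what landed in each variable.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): split one line
# with the default IFS and report what ended up in each variable.
show_fields() {
    printf '%s\n' "$1" | {
        # -r stops "read" from interpreting backslashes in the data
        read -r qty product customer
        echo "qty=[$qty] product=[$product] customer=[$customer]"
    }
}

show_fields "1 apple steve@example.com"   # three fields, split on spaces
show_fields "1,apple,steve@example.com"   # whole line lands in qty
```

The second call should show the entire line inside qty=[...] and nothing in the other two.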
The way around this is to tell the shell that “,” (the comma itself) separates fields; it’s the “internal field separator”, or IFS.
By default, the IFS variable is set to space/tab/newline, which isn’t easy to type in the shell, so it’s best to save the original IFS to another variable, so you can put it back again after you’ve messed around with it. I tend to use “oIFS=$IFS” to save the current value into “oIFS”.
Also, when the IFS variable is set to something other than the default, it can really mess with other code.
Here’s a script I wrote today to parse a CSV file:
#!/bin/sh
oIFS=$IFS # Always keep the original IFS!
IFS="," # Now set it to what we want the "read" loop to use
while read qty product customer
do
IFS=$oIFS
# process the information
IFS="," # Put it back to the comma, for the loop to go around again
done < myfile.txt
It really is that easy, and it’s very versatile. You do have to be careful to keep a copy of the original (I always use the name oIFS, but whatever suits you), and to put it back as soon as possible, because so many things invisibly depend on the IFS: every unquoted variable or command substitution you hand to grep, cut, you name it, is split according to it. It’s surprising how many things within the “while read” loop actually did depend on the IFS being the default value.
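As a rough illustration of that invisible dependence, the sketch below passes the same unquoted variable to a command under both IFS settings. The count_args helper and the file names are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it was given.
count_args() { echo "$#"; }

files="a.txt b.txt c.txt"

oIFS=$IFS                              # keep the original IFS
count_default=$(count_args $files)     # unquoted on purpose: split on spaces
IFS=","
count_comma=$(count_args $files)       # no commas in $files, so no splitting
IFS=$oIFS                              # put it back straight away

echo "default IFS: $count_default argument(s)"
echo "comma IFS:   $count_comma argument(s)"
```

With the comma in place, the three file names arrive as one single argument, which is exactly the sort of thing that quietly breaks code inside the loop.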
June 20, 2008 at 2:42 pm |
Great stuff! Finally it’s dead easy to create a CSV parser in Bash. Thanks.
June 20, 2008 at 4:35 pm |
You’d have thought so, wouldn’t you?
Watch out for quotation marks in the CSV… some applications use them, some don’t. Some require them.
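Here is a small sketch of what goes wrong, assuming a made-up row in which the product field is quoted and contains a comma of its own. The plain IFS approach knows nothing about quoting, so it splits inside the quotes.

```shell
#!/bin/sh
# Hypothetical sample row: the second field is quoted and contains a comma.
line='2,"apple, green",steve@example.com'

oIFS=$IFS      # keep the original IFS
IFS=","
read -r qty product customer <<EOF
$line
EOF
IFS=$oIFS      # restore it before doing anything else

echo "qty=[$qty]"
echo "product=[$product]"
echo "customer=[$customer]"
```

The product comes back as only the first half of the quoted field, quotation mark included, and the rest spills over into customer. For files like that, a proper CSV parser is the safer choice.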
June 24, 2008 at 7:35 pm |
Thanks for the awesome article!
also, if you want to use only the line breaks as a separator you can use
IFS=$'\n'
I think the IFS gets set in your bashrc aka the IFS will get reset when you make a new shell… so if you want sub shells to see your change to IFS remember to use export.
if you want to peek at the value in your IFS, you might try:
printf '\n' "$IFS" | cat -vt
everything between the <> is the value of your IFS in secret code.
looking at the return of
set | grep IFS
my IFS is set to
IFS=$' \t\n'
When I open a shell.
June 24, 2008 at 7:41 pm |
Looks like I lost my angle brackets and percent sign “s” on that last comment, making the line:
printf '\n' "$IFS" | cat -vt
It should read:
printf '<%s>\n' "$IFS" | cat -vt
January 19, 2009 at 7:21 am |
The article was really informative, thanks to the author.
Jonny, thanks for your comment regarding the line break separator; it was of great help.
Keep up the good work.
cheers
pai
January 22, 2009 at 9:06 am |
When I do set | grep IFS:
$ set | grep IFS
HZ=' '
IFS='
'
Can someone tell me whether this IFS setting in KSH is correct by default?
January 23, 2009 at 2:11 am |
Hello Kane,
Using ksh on Solaris, I ran this:
$ set|grep IFS|od -c
0000000 I F S = ' \t \n
0000010
$
This shows that IFS consists of a space (” “), a tab (“\t”) and a newline (“\n”).
This is the correct setting for IFS.
“od” stands for “octal dump”, and “od -c” displays each byte as a character. The “0000000” and “0000010” are the offsets (in octal, base 8); the relevant stuff is the “I F S = ' \t \n” part.
Hope this helps.
Maybe I should write a post about od some time… it’s a strange and arcane command, but it does sometimes turn out to be very useful.
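In the meantime, here is a quick sketch of pointing od at the variable itself rather than at the output of “set”, plus a check that doesn’t need od at all. The variable names expected and ifs_state are just ones I picked for the example.

```shell
#!/bin/sh
# Dump the bytes of the current IFS directly; no "set | grep" needed.
printf '%s' "$IFS" | od -c

# The same check without od: compare against space, tab, newline.
# The trailing "x" protects the newline, which command substitution
# would otherwise strip; it is removed again on the next line.
expected=$(printf ' \t\nx')
expected=${expected%x}

if [ "$IFS" = "$expected" ]; then
    ifs_state="default"
else
    ifs_state="modified"
fi
echo "IFS is $ifs_state"
```

The exact spacing of the od output differs a little between systems, but the three characters should be recognisable either way.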
May 15, 2009 at 3:15 pm |
A somewhat simpler version is below. It does not require saving and restoring IFS, because it does not set IFS globally, but only locally for the “read” command that follows.
Background:
VAR1=value [VAR2=value ...] command
The above line sets and exports the environment variable(s) VAR1, VAR2, … ONLY for the environment of “command”. It will not change the setting for the current shell.
Example:
#!/bin/sh
while IFS=, read qty product customer
do
# process the information
echo "$customer wants $qty $product(s)"
done < myfile.txt
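A self-contained sketch of the same idea follows, with a here-document standing in for myfile.txt. The two sample rows and the before and count variables are assumptions added for the demonstration. It also checks that the shell’s own IFS is untouched once the loop has finished.

```shell
#!/bin/sh
before=$IFS    # remember the IFS we started with, for comparison only
count=0

# IFS=, applies only to each "read"; the loop body sees the normal IFS.
while IFS=, read -r qty product customer
do
    count=$((count + 1))
    echo "$customer wants $qty $product(s)"
done <<EOF
1,apple,steve@example.com
3,pear,alice@example.com
EOF

if [ "$IFS" = "$before" ]; then
    echo "IFS unchanged after $count rows"
fi
```

This works because “read” is an ordinary built-in, so the assignment in front of it lasts only for that one command.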
May 16, 2009 at 12:56 am |
True. And perfectly valid.
I prefer the explicit mangling and unmangling of IFS, though this is also valid for all shells I am aware of.