Thursday, April 22, 2010

New Features In Bash Version 4.x - Part 4

In this final installment of our series, we will look at perhaps the most significant area of change in bash version 4.x: arrays.

Arrays In bash

Arrays were introduced in version 2 of bash about fifteen years ago.  Since then, they have been on the fringes of shell programming.  I, in fact, have never seen a shell script "in the wild" that used them.  None of the scripts on LinuxCommand.org, for example, use arrays.

Why is this?  Arrays are widely used in other programming languages.  I see two reasons for this lack of popularity.  First, arrays are not a traditional shell feature.  The Bourne shell, which bash was designed to emulate and replace, offers no array support at all.  Second, arrays in bash are limited to a single dimension, an unusual limitation given that virtually every other programming language supports multi-dimensional arrays.

Some Background

I devote a chapter in my book to bash arrays, but briefly: bash supports single-dimension array variables.  An array behaves like a column of numbers in a spreadsheet.  A single array variable contains multiple values called elements.  Each element is accessed via an address called an index or subscript.  All versions of bash starting with version 2 support integer indexes.  For example, to create a five-element array in bash called numbers containing the strings "zero" through "four", we would do this:

bshotts@twin7:~$ numbers=(zero one two three four)

After creating the array, we can access individual elements by specifying the array element's index:

bshotts@twin7:~$ echo ${numbers[2]}
two

The braces are required to prevent the shell from misinterpreting the brackets as wildcard characters used by pathname expansion.
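To see what the braces change, here is a small sketch (the variable names with_braces and without_braces are only for illustration).  Without braces, the shell expands $numbers to the first element of the array and treats [2] as ordinary text that follows it:

```shell
#!/bin/bash

numbers=(zero one two three four)

# With braces: the subscript is part of the parameter expansion.
with_braces="${numbers[2]}"

# Without braces: $numbers is element 0, followed by the literal text [2].
without_braces="$numbers[2]"

echo "$with_braces"
echo "$without_braces"
```

The expansions are quoted here so that pathname expansion does not come into play; unquoted, the leftover [2] could also be matched against file names in the current directory.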

Arrays are not very useful on the command line but are very useful in programming because they work well with loops.  Here's an example using the array we just created:

#!/bin/bash

# array-1: print the contents of an array

numbers=(zero one two three four)

for i in {0..4}; do
    echo ${numbers[$i]}
done

When executed, the script prints each element in the numbers array:

bshotts@twin7:~$ array-1
zero
one
two
three
four
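
The loop in array-1 hard-codes the index range 0 through 4.  As a sketch of an alternative that adapts to the size of the array, the following loops over the elements directly and then over the indexes:

```shell
#!/bin/bash

numbers=(zero one two three four)

# Loop over the elements themselves; no index range is needed.
for n in "${numbers[@]}"; do
    echo "$n"
done

# Loop over the indexes; ${#numbers[@]} is the element count.
echo "${#numbers[@]} elements:"
for i in "${!numbers[@]}"; do
    echo "$i: ${numbers[$i]}"
done
```

Either form keeps working if elements are later added to or removed from the array.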

mapfile Command

bash version 4 added the mapfile command.  This command copies a file line-by-line into an array.  It is basically a substitute for the following code:

while IFS= read -r line; do
    array[i]="$line"
    i=$((i + 1))
done < file

With mapfile, you can use the following in place of the code above:

mapfile array < file

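mapfile handles the case of a missing newline at the end of the file and creates empty array elements when it encounters a blank line in the file.  Unlike the read loop, it keeps the trailing newline on each line unless the -t option is given.  It also supports ranges within the file and the array: the -s and -n options skip and limit the lines that are read, and the -O option sets the array index at which storing begins.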
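
As a rough sketch of those options, the following builds a temporary sample file (its contents and the array names all_lines, middle and padded are made up for this example) and reads it three ways:

```shell
#!/bin/bash

# Build a small sample file: five lines, the third one blank.
sample=$(mktemp)
printf '%s\n' alpha beta "" gamma delta > "$sample"

# -t removes the trailing newline from each line as it is stored.
mapfile -t all_lines < "$sample"

# -s 1 skips the first line; -n 2 stops after reading two lines.
mapfile -t -s 1 -n 2 middle < "$sample"

# -O 2 starts storing at index 2 and leaves existing elements alone.
padded=(x y)
mapfile -t -n 1 -O 2 padded < "$sample"

rm -f "$sample"

echo "all_lines has ${#all_lines[@]} elements"
echo "middle: [${middle[0]}] [${middle[1]}]"
echo "padded: ${padded[*]}"
```

The blank line in the file becomes an empty element, which is why middle holds "beta" followed by an empty string.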

Associative Arrays

By far, the most significant new feature in bash 4.x is the addition of associative arrays.  Associative arrays use strings rather than integers as array indexes.  This capability allows interesting new approaches to managing data.  For example, we could create an array called colors and use color names as indexes:

declare -A colors
colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"

Associative array elements are accessed in much the same way as integer indexed arrays:

echo ${colors["blue"]}
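
The sketch below, which is only an illustration, shows why an associative array has to be declared before use and one way to test whether a key exists.  The array name plain and the key purple are invented for the example:

```shell
#!/bin/bash

# Without declare -A, bash creates an indexed array and evaluates each
# subscript arithmetically; the unset names red and blue both evaluate to 0.
plain[red]="#ff0000"
plain[blue]="#0000ff"
echo "plain has ${#plain[@]} element(s)"

# With declare -A, the subscripts are kept as strings.
declare -A colors=([red]="#ff0000" [green]="#00ff00" [blue]="#0000ff")
echo "colors has ${#colors[@]} element(s)"

# ${colors[key]+set} expands to "set" only if the key exists.
for name in blue purple; do
    if [[ -n ${colors[$name]+set} ]]; then
        echo "$name is ${colors[$name]}"
    else
        echo "$name is not defined"
    fi
done
```

In the undeclared case the second assignment silently overwrites the first, so forgetting declare -A tends to show up as missing data rather than as an error message.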

In the script that follows, we will look at several programming techniques that can be employed in conjunction with associative arrays.  This script, called array-2, when given the name of a directory, prints a listing of the files in the directory along with the names of each file's owner and group owner.  At the end of the listing, the script prints a tally of the number of files belonging to each owner and group.  Here we see the results (truncated for brevity) when the script is given the directory /usr/bin:

bshotts@twin7:~$ array-2 /usr/bin
/usr/bin/2to3-2.6                        root       root      
/usr/bin/2to3                            root       root      
/usr/bin/a2p                             root       root      
/usr/bin/abrowser                        root       root      
/usr/bin/aconnect                        root       root      
/usr/bin/acpi_fakekey                    root       root      
/usr/bin/acpi_listen                     root       root      
/usr/bin/add-apt-repository              root       root
.
.
.
/usr/bin/zipgrep                         root       root      
/usr/bin/zipinfo                         root       root      
/usr/bin/zipnote                         root       root      
/usr/bin/zip                             root       root      
/usr/bin/zipsplit                        root       root      
/usr/bin/zjsdecode                       root       root      
/usr/bin/zsoelim                         root       root      

File owners:
daemon    :     1 file(s)
root      :  1394 file(s)

File group owners:
crontab   :     1 file(s)
daemon    :     1 file(s)
lpadmin   :     1 file(s)
mail      :     4 file(s)
mlocate   :     1 file(s)
root      :  1380 file(s)
shadow    :     2 file(s)
ssh       :     1 file(s)
tty       :     2 file(s)
utmp      :     2 file(s)

Here is a listing of the script:

     1    #!/bin/bash
     2    
     3    # array-2: Use arrays to tally file owners
     4    
     5    declare -A files file_group file_owner groups owners
     6    
     7    if [[ ! -d "$1" ]]; then
     8        echo "Usage: array-2 dir" >&2
     9        exit 1
    10    fi
    11    
    12    for i in "$1"/*; do
    13        owner=$(stat -c %U "$i")
    14        group=$(stat -c %G "$i")
    15        files["$i"]="$i"
    16        file_owner["$i"]=$owner
    17        file_group["$i"]=$group
    18        ((++owners[$owner]))
    19        ((++groups[$group]))
    20    done
    21    
    22    # List the collected files
    23    { for i in "${files[@]}"; do
    24        printf "%-40s %-10s %-10s\n" \
    25            "$i" ${file_owner["$i"]} ${file_group["$i"]}
    26    done } | sort
    27    echo
    28    
    29    # List owners
    30    echo "File owners:"
    31    { for i in "${!owners[@]}"; do
    32        printf "%-10s: %5d file(s)\n" "$i" ${owners["$i"]}
    33    done } | sort
    34    echo
    35    
    36    # List groups
    37    echo "File group owners:"
    38    { for i in "${!groups[@]}"; do
    39        printf "%-10s: %5d file(s)\n" "$i" ${groups["$i"]}
    40    done } | sort

Line 5: Unlike integer indexed arrays, which are created by merely referencing them, associative arrays must be created with the declare command using the new -A option.  In this script we create five arrays as follows:
  1. files contains the names of the files in the directory, indexed by file name
  2. file_group contains the group owner of each file, indexed by file name
  3. file_owner contains the owner of each file, indexed by file name
  4. groups contains the number of files belonging to the indexed group
  5. owners contains the number of files belonging to the indexed owner

Lines 7-10: Checks to see that a valid directory name was passed as a positional parameter.  If not, a usage message is displayed and the script exits with an exit status of 1.

Lines 12-20:  Loop through the files in the directory.  Using the stat command, lines 13 and 14 extract the names of the file owner and group owner and assign the values to their respective arrays (lines 16, 17) using the name of the file as the array index.  Likewise the file name itself is assigned to the files array (line 15).

Lines 18-19:  The running totals of files belonging to the file owner and to the group owner are each incremented by one.
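
The counting idiom can be tried in isolation.  In this sketch the list of owner names is hypothetical sample data standing in for the output of stat, and the array name tally is likewise made up:

```shell
#!/bin/bash

declare -A tally

# Hypothetical sample data standing in for the output of stat.
for owner in root daemon root bin root daemon; do
    # A missing element counts as 0, so the first ++ yields 1.
    ((++tally[$owner]))
done

{ for owner in "${!tally[@]}"; do
    printf "%-10s: %5d file(s)\n" "$owner" "${tally[$owner]}"
done; } | sort
```

No initialization step is needed for each new owner, because an unset element is treated as zero inside the arithmetic expression.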

Lines 22-27:  The list of files is output.  This is done using the "${array[@]}" parameter expansion, which expands into the entire list of array elements with each element treated as a separate word.  This allows for the possibility that a file name may contain embedded spaces.  Also note that the entire loop is enclosed in braces, thus forming a group command.  This permits the entire output of the loop to be piped into the sort command.  This is necessary because the expansion of the array elements is not sorted.
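
A small sketch of both points follows.  The file names are invented, and count_args is a hypothetical helper that simply reports how many words it received:

```shell
#!/bin/bash

names=("b file.txt" "c.txt" "a file.txt")

# Helper (hypothetical): report how many words it received.
count_args() { echo $#; }

quoted=$(count_args "${names[@]}")    # one word per element
unquoted=$(count_args ${names[@]})    # elements split at the spaces

echo "quoted: $quoted, unquoted: $unquoted"

# The group command's combined output feeds a single sort.
sorted=$( { for n in "${names[@]}"; do
    printf '%s\n' "$n"
done; } | sort )
echo "$sorted"
```

With the quotes, the three names arrive as three words; without them, the two names containing a space are each split in two.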

Lines 29-40:  These two loops are similar to the file list loop except that they use the "${!array[@]}" expansion which expands into the list of array indexes rather than the list of array elements.

Further Reading

The Linux Command Line
  • Chapter 36 (Arrays)
  • Chapter 37 (Group commands)
Other bash 4.x references:
A Wikipedia article on associative arrays:
The Complete HTML Color Chart:
Other installments in this series: 1 2 3 4

7 comments:

  1. I use arrays all of the time in batch processing for one of the largest financial corporations in the world. I really don't know what I would do without them. :)

  2. Arrays of two or more dimensions are fictions in most programming languages. Arrays are 1D, with "rows" or "columns" assumed uniform in size, with simple offset calculations being done, usually at compile time. Remember Fortran, young people? So a 1D array is merely a minor challenge. Bash is a bit different. Maybe not better, but different.

  3. If you want to write real programs, you should consider real programming languages. Especially if you feel the need for arrays, it's time to consider Perl, Tcl or Python. The semantics are close enough.

    Shells should not be abused as programming languages. And the way bash development is going is all wrong.
    We don't need a sluggish and memory-hungry programming language as the base shell in all Linux distributions. The shell's purpose is to be a slim glue interface between apps.

    I think the bash developers should finally rename it to GNUSH or something. The name bash becomes inappropriate at this time. I very strongly believe they don't add these new features because they were widely asked for. This is just their way of doing embrace & extend, to make Linux programmers dependent on yet another destandardized/non-unix GNU tool.

  4. I agree with @mario,

    The shell's purpose is to be a slim glue interface between apps.

  5. I kind of agree, too, but there is no general consensus on whether this shell is overloaded and what's too much. Too many people have too many different ideas about that. Well, I think David Korn took the right middle way, but that's also just my personal opinion.

    Anyways, Bash is what it is. Probably we/they should spend some time to modernize the internal code a bit instead of implementing redundant features - if you look at Bash's source, it feels a bit dusty. But to say Bash is not a Bash anymore and should be renamed is a bit of overrating the situation.

    I also agree that a UNIX shell is not a general purpose programming language (it is - however - a programming language, to some extent), so it's wise to add features that support this. If you ever hacked scripts that work on Sinix as well as Linux or Solaris, you know that the standard shell level is ugly. But the main focus should always be to support the user (interactive or not) interfacing with the system.

  6. @mario raises a good point.. The answer to an ever-expanding bash feature set is the "sh" shell (or dash depending on your distro).
    This is designed to be a minimal/POSIX-compliant shell, that most system scripts run under.

    Personally, I'm a fan of the recent bash functionality..

  7. Use any language or shell you want, or write your own. Just don't bash the BASH man ;-)
