Arrays In bash
Arrays were introduced in version 2 of bash about fifteen years ago. Since then, they have remained on the fringes of shell programming. In fact, I have never seen a shell script "in the wild" that used them. None of the scripts on LinuxCommand.org, for example, use arrays.
Why is this? Arrays are widely used in other programming languages. I see two reasons for this lack of popularity. First, arrays are not a traditional shell feature. The Bourne shell, which bash was designed to emulate and replace, offers no array support at all. Second, arrays in bash are limited to a single dimension, an unusual limitation given that virtually every other programming language supports multi-dimensional arrays.
Some Background
I devote a chapter in my book to bash arrays, but briefly, bash supports single-dimension array variables. Arrays behave like a column of numbers in a spreadsheet. A single array variable contains multiple values called elements. Each element is accessed via an address called an index or subscript. All versions of bash starting with version 2 support integer indexes. For example, to create a five-element array in bash called numbers containing the strings "zero" through "four", we would do this:
bshotts@twin7:~$ numbers=(zero one two three four)
After creating the array, we can access individual elements by specifying the array element's index:
bshotts@twin7:~$ echo ${numbers[2]}
two
The braces are required to prevent the shell from misinterpreting the brackets as wildcard characters used by pathname expansion.
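To see what the braces prevent, here is a small sketch comparing the two forms. Without braces, bash expands $numbers on its own (which yields element 0) and treats [2] as ordinary text. Left unquoted, that text would then be subject to pathname expansion. The variable names without_braces and with_braces are just for illustration.

```shell
#!/bin/bash
# Compare an array element reference with and without braces.
numbers=(zero one two three four)

# Without braces, $numbers expands to element 0 and "[2]" stays literal.
# The double quotes here suppress pathname expansion of the result.
without_braces="$numbers[2]"

# With braces, the whole expression is a single array element reference.
with_braces="${numbers[2]}"

echo "without braces: $without_braces"   # prints: without braces: zero[2]
echo "with braces: $with_braces"         # prints: with braces: two
```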
Arrays are not very useful on the command line but are very useful in programming because they work well with loops. Here's an example using the array we just created:
#!/bin/bash
# array-1: print the contents of an array
numbers=(zero one two three four)
for i in {0..4}; do
    echo "${numbers[$i]}"
done
When executed, the script prints each element in the numbers array:
bshotts@twin7:~$ array-1
zero
one
two
three
four
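The array-1 script hard-codes the range {0..4}. Because brace expansion happens before variables are expanded, {0..$n} will not work; a sketch of one alternative uses ${#numbers[@]}, which expands to the number of elements, with an arithmetic for loop. This assumes the array has no gaps in its indexes.

```shell
#!/bin/bash
# Variant of array-1 that does not hard-code the number of elements.
numbers=(zero one two three four)

# ${#numbers[@]} expands to the number of elements (5 here).
count=${#numbers[@]}

# An arithmetic for loop can use a variable as its upper bound.
# Assumes indexes run 0..count-1 with no gaps.
for ((i = 0; i < count; i++)); do
    echo "${numbers[$i]}"
done
```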
mapfile Command
bash version 4 added the mapfile command. This command copies a file line-by-line into an array. It is basically a substitute for the following code:
i=0
while IFS= read -r line; do
    array[i]="$line"
    i=$((i + 1))
done < file
With mapfile, you can use the following in place of the code above (the -t option strips the trailing newline from each line, just as read does):
mapfile -t array < file
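mapfile handles the case of a missing newline at the end of the file and creates empty array elements when it encounters blank lines in the file. It also supports ranges within the file and the array: -s skips a given number of leading lines, -n copies at most a given number of lines, and -O sets the array index at which assignment begins.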
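Here is a sketch exercising those options on a small sample file. The file is hypothetical: it is created with mktemp just for this demonstration, and it contains a blank line and no final newline.

```shell
#!/bin/bash
# Demonstrate mapfile (bash 4+) options on a throwaway sample file.
sample=$(mktemp)
printf 'line one\n\nline three\nline four' > "$sample"  # blank line, no final newline

# -t strips the trailing newline from each element.
mapfile -t all < "$sample"
echo "all has ${#all[@]} elements"       # 4, including an empty one

# -s 2 skips the first two lines; -n 1 copies at most one line.
mapfile -t -s 2 -n 1 middle < "$sample"
echo "middle[0] = ${middle[0]}"

# -O 10 starts assigning at index 10 instead of 0.
mapfile -t -O 10 offset < "$sample"
echo "offset indexes: ${!offset[*]}"

rm -f "$sample"
```

Note that the last line, which lacks a newline, still becomes an element, and the blank line becomes an empty element.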
Associative Arrays
By far, the most significant new feature in bash 4.x is the addition of associative arrays. Associative arrays use strings rather than integers as array indexes. This capability allows interesting new approaches to managing data. For example, we could create an array called colors and use color names as indexes:
declare -A colors
colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"
Associative array elements are accessed in much the same way as integer indexed arrays:
echo ${colors["blue"]}
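An associative array can also be filled in a single declare statement using [index]=value pairs. This sketch builds the same colors array that way, looks up one element, adds another, and counts them. The added white entry is only an illustration.

```shell
#!/bin/bash
# Associative arrays (bash 4+) must be declared with -A before use.
declare -A colors=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

# Look up a single element by its string index.
echo "blue is ${colors[blue]}"

# Add an element after creation, just as with an integer-indexed array.
colors[white]="#ffffff"

# ${#colors[@]} gives the number of elements (4 after the addition).
echo "colors defined: ${#colors[@]}"
```

The declare -A matters. Without it, bash treats colors as an integer-indexed array and evaluates "red" as an arithmetic expression. Since no variable named red is set, that evaluates to 0, so every assignment would land in element 0.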
In the script that follows, we will look at several programming techniques that can be employed in conjunction with associative arrays. This script, called array-2, when given the name of a directory, prints a listing of the files in the directory along with the names of each file's owner and group owner. At the end of the listing, the script prints a tally of the number of files belonging to each owner and group. Here we see the results (truncated for brevity) when the script is given the directory /usr/bin:
bshotts@twin7:~$ array-2 /usr/bin
/usr/bin/2to3-2.6 root root
/usr/bin/2to3 root root
/usr/bin/a2p root root
/usr/bin/abrowser root root
/usr/bin/aconnect root root
/usr/bin/acpi_fakekey root root
/usr/bin/acpi_listen root root
/usr/bin/add-apt-repository root root
.
.
.
/usr/bin/zipgrep root root
/usr/bin/zipinfo root root
/usr/bin/zipnote root root
/usr/bin/zip root root
/usr/bin/zipsplit root root
/usr/bin/zjsdecode root root
/usr/bin/zsoelim root root
File owners:
daemon : 1 file(s)
root : 1394 file(s)
File group owners:
crontab : 1 file(s)
daemon : 1 file(s)
lpadmin : 1 file(s)
mail : 4 file(s)
mlocate : 1 file(s)
root : 1380 file(s)
shadow : 2 file(s)
ssh : 1 file(s)
tty : 2 file(s)
utmp : 2 file(s)
Here is a listing of the script:
 1 #!/bin/bash
 2
 3 # array-2: Use arrays to tally file owners
 4
 5 declare -A files file_group file_owner groups owners
 6
 7 if [[ ! -d "$1" ]]; then
 8     echo "Usage: array-2 dir" >&2
 9     exit 1
10 fi
11
12 for i in "$1"/*; do
13     owner=$(stat -c %U "$i")
14     group=$(stat -c %G "$i")
15     files["$i"]="$i"
16     file_owner["$i"]=$owner
17     file_group["$i"]=$group
18     ((++owners[$owner]))
19     ((++groups[$group]))
20 done
21
22 # List the collected files
23 { for i in "${files[@]}"; do
24     printf "%-40s %-10s %-10s\n" \
25         "$i" ${file_owner["$i"]} ${file_group["$i"]}
26 done } | sort
27 echo
28
29 # List owners
30 echo "File owners:"
31 { for i in "${!owners[@]}"; do
32     printf "%-10s: %5d file(s)\n" "$i" ${owners["$i"]}
33 done } | sort
34 echo
35
36 # List groups
37 echo "File group owners:"
38 { for i in "${!groups[@]}"; do
39     printf "%-10s: %5d file(s)\n" "$i" ${groups["$i"]}
40 done } | sort
Line 5: Unlike integer-indexed arrays, which are created simply by assigning to them, associative arrays must be created with the declare command using the new -A option. In this script we create five arrays as follows:
- files contains the names of the files in the directory, indexed by file name
- file_group contains the group owner of each file, indexed by file name
- file_owner contains the owner of each file, indexed by file name
- groups contains the number of files belonging to the indexed group
- owners contains the number of files belonging to the indexed owner
Lines 7-10: Check that a valid directory name was passed as the first positional parameter. If not, a usage message is displayed and the script exits with an exit status of 1.
Lines 12-20: Loop through the files in the directory. Using the stat command, lines 13 and 14 extract the names of the file owner and group owner, and lines 16 and 17 assign these values to their respective arrays using the file name as the array index. Likewise, the file name itself is assigned to the files array (line 15).
Lines 18-19: The counts of files belonging to the file's owner and group owner are each incremented by one. No initialization is needed, because an unset array element evaluates to zero in an arithmetic expression.
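The counting idiom in lines 18-19 can be tried on its own. This sketch applies it to a hypothetical hard-coded list of owner names instead of stat output, then prints the tally in the same format array-2 uses.

```shell
#!/bin/bash
# Tally occurrences of each name using an associative array as a counter.
declare -A owners

# Hypothetical owner names standing in for stat output.
for owner in root root daemon root mail; do
    # An unset element is treated as 0, so the first ++ yields 1.
    ((++owners[$owner]))
done

# Print the tally sorted by name, as array-2 does.
for name in "${!owners[@]}"; do
    printf "%-10s: %5d file(s)\n" "$name" "${owners[$name]}"
done | sort
```

Pre-increment (++x) is a sensible choice here. The (( )) command returns a nonzero exit status when its expression evaluates to 0. A post-increment from an unset element would evaluate to 0 and make the command "fail", which matters under set -e.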
Lines 22-27: The list of files is output. This is done using the "${array[@]}" parameter expansion, which expands into the entire list of array elements with each element treated as a separate word. This allows for the possibility that a file name may contain embedded spaces. Also note that the entire loop is enclosed in braces, thus forming a group command. This permits the entire output of the loop to be piped into the sort command, which is necessary because the elements of an associative array are not expanded in any particular order.
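A small sketch shows why the quotes matter and how the group command feeds sort. The file names are hypothetical, and one of them contains a space.

```shell
#!/bin/bash
# Hypothetical file names; one contains an embedded space.
declare -A files=([b.txt]="b.txt" ["my notes.txt"]="my notes.txt" [a.txt]="a.txt")

# Quoted "${files[@]}" yields one word per element.
quoted=0
for f in "${files[@]}"; do
    ((++quoted))
done

# Unquoted ${files[@]} is word-split, so "my notes.txt" becomes two words.
unquoted=0
for f in ${files[@]}; do
    ((++unquoted))
done
echo "quoted: $quoted words, unquoted: $unquoted words"

# The group command lets the loop's whole output be piped through sort.
sorted=$( { for f in "${files[@]}"; do
    echo "$f"
done } | sort )
echo "$sorted"
```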
Lines 29-40: These two loops are similar to the file list loop except that they use the "${!array[@]}" expansion which expands into the list of array indexes rather than the list of array elements.
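The difference between the two expansions can be seen side by side. In this sketch (with a hypothetical tally of group names), "${!groups[@]}" supplies the indexes and "${groups[@]}" supplies the values.

```shell
#!/bin/bash
# Hypothetical tally: group name -> number of files.
declare -A groups=([root]=3 [mail]=2 [tty]=1)

# "${!groups[@]}" expands to the indexes (the group names).
keys=$(printf '%s\n' "${!groups[@]}" | sort)

# "${groups[@]}" expands to the values (the counts).
total=0
for count in "${groups[@]}"; do
    ((total += count))
done

printf 'groups:\n%s\n' "$keys"
echo "total files: $total"
```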
Further Reading
The Linux Command Line
- Chapter 36 (Arrays)
- Chapter 37 (Group commands)
A Wikipedia article on associative arrays:
The Complete HTML Color Chart:
Other installments in this series: 1 2 3 4
I use arrays all of the time in batch processing for one of the largest financial corporations in the world. I really don't know what I would do without them. :)
2 or more dimensional arrays are fictions in most programming languages. Arrays are 1D, with "rows" or "columns" assumed uniform in size, with simple offset calculations being done, usually at compile time. Remember Fortran, young people? So a 1D array is merely a minor challenge. Bash is a bit different. Maybe not better, but different.
If you want to write real programs, one should consider real programming languages. Especially if you feel the need for arrays, it's time to consider Perl, Tcl or Python. The semantics are close enough.
Shells should not be abused as programming languages. And the way bash development is going is all wrong.
We don't need a sluggish and memory-hungry programming language as the base shell in all Linux distributions. The shell's purpose is to be a slim glue interface between apps.
I think the bash developers should finally rename it to GNUSH or something. The name bash becomes inappropriate at this point. I very strongly believe they don't add these new features because they were widely asked for. This is just their way of doing embrace & extend, to make Linux programmers dependent on yet another destandardized/non-Unix GNU tool.
I agree with @mario,
The shell's purpose is to be a slim glue interface between apps.
I kind of agree, too, but there is no general point of view about whether this shell is overloaded and what's too much. Too many people have too many different ideas about that. Well, I think David Korn went the right way in the middle, but that's also just my personal opinion.
Anyways, Bash is what it is. Probably we/they should spend some time modernizing the internal code a bit instead of implementing redundant features; if you look at Bash's source, it feels a bit dusty. But to say Bash is not a Bash anymore and should be renamed is a bit of overrating the situation.
I also agree that a UNIX shell is not a general purpose programming language (it is, however, a programming language, to some extent), so it's wise to add features that support this. If you ever hacked scripts that work on Sinix as well as Linux or Solaris, you know that the standard shell level is ugly. But the main focus should always be to support the user (interactive or not) interfacing with the system.
@mario raises a good point.. The answer to an ever-expanding bash feature set is the "sh" shell (or dash depending on your distro).
This is designed to be a minimal/POSIX-compliant shell, that most system scripts run under.
Personally, I'm a fan of the recent bash functionality..
Use any language or shell you want, or write your own. Just don't bash the BASH man ;-)