Friday, April 30, 2010

The Bugs In Ubuntu 10.04


Now that Ubuntu 10.04 ("Lucid Lynx") has been released, I can spend some time talking about my experience testing it.

I was really hoping that 10.04, being an LTS (Long Term Support) release, would have focused on supreme reliability and stability.  A sort of "9.10 without the bugs."  Unfortunately this was not the case.  10.04 introduces a host of new features and technologies, some of which are still rather "green."

In the comments to follow, I want to vent some of my frustrations over the quality of the 10.04 release.  I don't want to disparage the work of the people in the Ubuntu community nor the staff at Canonical.  I'm sure they worked very hard getting this version out.  Many of the problems are rooted in the upstream projects from which Ubuntu is derived.

A Pet Peeve

If you go to the bug tracking site for Ubuntu, the first thing you see is a long list of open "Critical" bugs.  Looking at the list you notice that some of these bugs are very old.  Discounting the symbolic bug number 1, the "We don't have as much market share as Microsoft" bug, you see that some of these open, critical bugs are years old.

Now, I used to administer a bug database (albeit a much smaller one than the one at Ubuntu), and to my eye this just looks terrible.  It leaves a bad impression.  If the bug is really "Critical," then it should get addressed either by marking it no longer relevant due to its age, or it should get fixed.

Overt Cosmetic Problems

While a lot of attention was given to the look of Ubuntu 10.04, serious cosmetic problems appear on many machines.  Neither of my test systems could take full advantage of the visual "improvements" during boot up.  There are a lot of distracting flashes and, on my laptop, it displays the image you see at the top of this post.

Not very impressive.

O.K. so maybe I have weird hardware or something, but how do you explain this:  click on the help icon on the top panel and wait for Ubuntu Help Center to come up.  Select "Advanced Topics" from the topics list then "Terminal Commands References [sic]" and look at the result:


Nice.

Connectivity Issues

I'm sure that the folks at Canonical are very interested in corporate adoption of their product.  This means, of course, that it has got to play well with Windows file shares.  Unfortunately, here too, 10.04 falls down.  Throughout the beta test phase there were numerous problems with gvfs and gnome-keyring.  As it stands now, many of these problems have been worked out, but as of today, you still cannot mount a password protected Windows share if you store the password.  It seems to work at first, but if you reboot your machine and try it again you get something like this:


X Problems Galore

While I encountered some important problems with my desktop test system, they were nothing compared to the problems I have with my laptop.

A few words about my laptop.  Yeah it's old.  It's an IBM ThinkPad T41 circa 2004.  I bought it from EmperorLinux pre-installed with Fedora Core 1.  Since then, I have run numerous Linux distributions on it without problems.  In fact, the T41 was a favorite of Linux users since it ran Linux so easily.  That is, until 10.04.

Various things will just crash the X server, such as using the NoScript Firefox extension, or the remote desktop viewer.  Suspend and resume don't work (the display never comes back).  You can work around these problems if you add the kernel parameter nomodeset to grub, but then all of your videos in totem look like this:


Not exactly what I had in mind.
 
A Release Too Far?

I am still looking forward to upgrading to 10.04.  I'm hopeful that, in time, the issues I have with 10.04 will be addressed.  But what about in the meantime, with thousands of possibly new users trying out the live CD only to find issues like the ones I found?  That's not good.

Admittedly, I got spoiled by 8.04.  It has served me very well for the last two years.  It got me through the production of my book without crashing once and frankly I'm willing to wait a few seconds more for my system to boot and willing to give up a lot of CPU-sucking, memory-hogging eye candy to have a system that stays up and can do the basic things right.

Further Reading
[UPDATE] I'm not the only one with concerns about 10.04:

Tuesday, April 27, 2010

The Financial Physics of Free Software

In the Internet age, does software have value?  Of course software is valuable in the sense that it provides service and is useful, but does software have monetary value?

If one looks at the law of supply and demand, the fact that software, like all other forms of digital content, can be endlessly reproduced and distributed at virtually no cost negates its value because software distributed this way lacks scarcity.  Digital content is simply not a scarce resource.  This hasn't stopped people from trying to impose artificial scarcity on digital data through the use of digital restrictions management (DRM) and draconian imaginary property laws but these approaches have had only limited success.  This is not surprising as attempting to create an artificial shortage goes against the physical nature of the Internet and of computers themselves.

The Proprietary Model

If you have ever checked out my resume, you know that I spent the greater portion of my career in the proprietary software world and was, at one time, a big supporter of proprietary software.  I was fortunate to have spent all of my years in the software industry working for small companies where one could wander the halls and learn every aspect of the business.  In addition to being a technical manager, I had a number of marketing and sales assignments as well.

Software development in the proprietary world is speculative.  Typically, a product manager or marketing director is given the assignment of coming up with the "next big thing," a product that can be sold to many customers at a profit.  The reason that it has to be big is that proprietary software development is fantastically expensive.  The product manager will present ideas to management and get approval for some personnel and a budget based on the product manager's forecasts for delivery dates and sales targets.  After approval, the software development process begins.  In some companies this process is very formal, including requirements specifications, design reviews, test plans, etc.  At the end of the process, the software product goes to market.  This involves a number of significant expenses including marketing, advertising, trade shows, etc.

It is important to remember that proprietary software companies don't actually sell software.  They sell licenses.  It is through this mechanism that they attempt to create a scarcity that gives their product value.

Proprietary software only has value once it is written.  You will sometimes see product announcements appear for non-existent yet-to-be-developed products.  Such products are derisively known as "vaporware" in the industry because proprietary software does not have value until it is written and actually available in short supply.

The Free Software Model

To members of the proprietary software community, the notion of free software appears insane.  This is because they think that free software means that they have to go through all of the steps and expense of the process above and then not collect any revenue on the back-end.  There are a number of problems with this assumption.

The development process for free software is fundamentally different.  First off, it is not speculative.  Developers of free software typically have an interest in actually using the program they want to write.  It also means that free software developers are usually subject matter experts for their chosen program.

Free software is much less expensive to produce than proprietary software.  The development process is much less formal than closed proprietary processes owing to the fact that development is done in the open.  This allows a more natural and organic method of solving problems and fixing bugs and, unlike proprietary development, the development tools and shared software components are free.  Free software also does not incur the engineering overhead of implementing "copy protection," user registration systems, and tiered product versions that are used to establish upgrade paths for proprietary products.

Finally, free software products don't have the marketing and sales expenses of proprietary software.

Making Money

While proprietary software appears to make a lot of money now, is it sustainable?  Will the Internet and its ability to perform infinite duplication and distribution drain the value from software?  Only time will tell, but I'm betting that the Internet will emerge victorious.  We can already see the signs of this victory with the rise of "cloud computing," which is eliminating the need for software altogether.  But cloud computing raises a number of issues including privacy and security, as well as freedom.

There has been a lot of discussion of how to make money with free software.  Most of the ideas put forth involve charging for services.  After all, Red Hat, a very successful software company, makes its money that way, but I want to suggest another possibility.

As we saw, proprietary software only has value after it is written and is available for license sales.  The free software model assumes from the start that once a program is written it no longer has value because it is not scarce.  In contrast to proprietary software, free software only has value before it is written.  The absence of a desired software program is the ultimate scarcity.  There exists an opportunity to exploit this fact.  It's not really a new idea by any means.  This is how the custom software business works.  Clients want something and pay big money to get something written.  What I envision is a business that somehow aligns many clients with developers so that the cost of development can be spread out among many clients.

What will such a business look like?  That's an exercise I will leave to my more entrepreneurial readers.

Further Reading

Thursday, April 22, 2010

New Features In Bash Version 4.x - Part 4

In this final installment of our series, we will look at perhaps the most significant area of change in bash version 4.x: arrays.

Arrays In bash

Arrays were introduced in version 2 of bash about fifteen years ago.  Since then, they have been on the fringes of shell programming.  I, in fact, have never seen a shell script "in the wild" that used them.  None of the scripts on LinuxCommand.org, for example, use arrays.

Why is this?  Arrays are widely used in other programming languages.  I see two reasons for this lack of popularity.  First, arrays are not a traditional shell feature.  The Bourne shell, which bash was designed to emulate and replace, offers no array support at all.  Second, arrays in bash are limited to a single dimension, an unusual limitation given that virtually every other programming language supports multi-dimensional arrays.

Some Background

I devote a chapter in my book to bash arrays but briefly, bash supports single dimension array variables.  Arrays behave like a column of numbers in a spreadsheet.  A single array variable contains multiple values called elements.  Each element is accessed via an address called an index or subscript.  All versions of bash starting with version 2 support integer indexes.  For example, to create a five element array in bash called numbers containing the strings "zero" through "four", we would do this:

bshotts@twin7:~$ numbers=(zero one two three four)

After creating the array, we can access individual elements by specifying the array element's index:

bshotts@twin7:~$ echo ${numbers[2]}
two

The braces are required to prevent the shell from misinterpreting the brackets as wildcard characters used by pathname expansion.

Arrays are not very useful on the command line but are very useful in programming because they work well with loops.  Here's an example using the array we just created:

#!/bin/bash

# array-1: print the contents of an array

numbers=(zero one two three four)

for i in {0..4}; do
    echo ${numbers[$i]}
done

When executed, the script prints each element in the numbers array:

bshotts@twin7:~$ array-1
zero
one
two
three
four
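
The loop in array-1 hard-codes the index range 0 through 4.  As a small variation (a sketch of my own, not part of the original script), we can let the array supply its own length and elements, so the loop keeps working if elements are added later.  The function name print_numbers is made up for this illustration:

```shell
#!/bin/bash

# array-1 variation: let the array drive the loop

numbers=(zero one two three four)

print_numbers() {
    local n
    # "${numbers[@]}" expands to every element as a separate word
    for n in "${numbers[@]}"; do
        echo "$n"
    done
}

# ${#numbers[@]} expands to the number of elements in the array
echo "The array holds ${#numbers[@]} elements:"
print_numbers
```

Quoting the expansion matters here: without the double quotes, an element containing spaces would be split into several words.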

mapfile Command

bash version 4 added the mapfile command.  This command copies a file line-by-line into an array.  It is basically a substitute for the following code:

while read line; do
    array[i]="$line"
    i=$((i + 1))
done < file

With mapfile, you can use the following in place of the code above:

mapfile array < file

mapfile handles the case of a missing newline at the end of the file and creates empty array elements when it encounters a blank line in the file.  It also supports ranges within the file and the array.
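
To make the range support a little more concrete, here is a minimal sketch using three mapfile options: -t (strip the trailing newline from each line), -s (skip a number of lines at the start of the file) and -n (read at most a given number of lines).  The sample file and the array names lines and middle are assumptions made up for the demonstration.  Note that without -t, each array element keeps its trailing newline.

```shell
#!/bin/bash

# mapfile-demo: read a file into an array, with and without a range

# Build a small sample file (made-up data for this demo)
sample=$(mktemp)
printf '%s\n' alpha beta "" gamma delta > "$sample"

# -t strips the trailing newline from each line
mapfile -t lines < "$sample"

# -s 1 skips the first line, -n 2 reads at most two lines
mapfile -t -s 1 -n 2 middle < "$sample"

rm -f "$sample"

# The blank third line of the file becomes an empty element
echo "lines has ${#lines[@]} elements; element 2 is '${lines[2]}'"
echo "middle has ${#middle[@]} elements; element 0 is '${middle[0]}'"
```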

Associative Arrays

By far, the most significant new feature in bash 4.x is the addition of associative arrays.  Associative arrays use strings rather than integers as array indexes.  This capability allows interesting new approaches to managing data.  For example, we could create an array called colors and use color names as indexes:

colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"

Associative array elements are accessed in much the same way as integer indexed arrays:

echo ${colors["blue"]}
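
One detail worth stressing before moving on: an associative array has to be declared with declare -A before anything is assigned to it.  Without the declaration, bash treats colors as an ordinary integer indexed array.  Here is a short, self-contained sketch of the colors example (the script name colors-demo is just a label I chose):

```shell
#!/bin/bash

# colors-demo: declare and use an associative array

declare -A colors    # required; otherwise colors would be an indexed array

colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"

echo "blue is ${colors[blue]}"
echo "The array has ${#colors[@]} entries"
```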

In the script that follows, we will look at several programming techniques that can be employed in conjunction with associative arrays.  This script, called array-2, when given the name of a directory, prints a listing of the files in the directory along with the names of the file's owner and group owner.  At the end of the listing, the script prints a tally of the number of files belonging to each owner and group.  Here we see the results (truncated for brevity) when the script is given the directory /usr/bin:

bshotts@twin7:~$ array-2 /usr/bin
/usr/bin/2to3-2.6                        root       root      
/usr/bin/2to3                            root       root      
/usr/bin/a2p                             root       root      
/usr/bin/abrowser                        root       root      
/usr/bin/aconnect                        root       root      
/usr/bin/acpi_fakekey                    root       root      
/usr/bin/acpi_listen                     root       root      
/usr/bin/add-apt-repository              root       root
.
.
.
/usr/bin/zipgrep                         root       root      
/usr/bin/zipinfo                         root       root      
/usr/bin/zipnote                         root       root      
/usr/bin/zip                             root       root      
/usr/bin/zipsplit                        root       root      
/usr/bin/zjsdecode                       root       root      
/usr/bin/zsoelim                         root       root      

File owners:
daemon    :     1 file(s)
root      :  1394 file(s)

File group owners:
crontab   :     1 file(s)
daemon    :     1 file(s)
lpadmin   :     1 file(s)
mail      :     4 file(s)
mlocate   :     1 file(s)
root      :  1380 file(s)
shadow    :     2 file(s)
ssh       :     1 file(s)
tty       :     2 file(s)
utmp      :     2 file(s)

Here is a listing of the script:

     1    #!/bin/bash
     2    
     3    # array-2: Use arrays to tally file owners
     4    
     5    declare -A files file_group file_owner groups owners
     6    
     7    if [[ ! -d "$1" ]]; then
     8        echo "Usage: array-2 dir" >&2
     9        exit 1
    10    fi
    11    
    12    for i in "$1"/*; do
    13        owner=$(stat -c %U "$i")
    14        group=$(stat -c %G "$i")
    15        files["$i"]="$i"
    16        file_owner["$i"]=$owner
    17        file_group["$i"]=$group
    18        ((++owners[$owner]))
    19        ((++groups[$group]))
    20    done
    21    
    22    # List the collected files
    23    { for i in "${files[@]}"; do
    24        printf "%-40s %-10s %-10s\n" \
    25            "$i" ${file_owner["$i"]} ${file_group["$i"]}
    26    done } | sort
    27    echo
    28    
    29    # List owners
    30    echo "File owners:"
    31    { for i in "${!owners[@]}"; do
    32        printf "%-10s: %5d file(s)\n" "$i" ${owners["$i"]}
    33    done } | sort
    34    echo
    35    
    36    # List groups
    37    echo "File group owners:"
    38    { for i in "${!groups[@]}"; do
    39        printf "%-10s: %5d file(s)\n" "$i" ${groups["$i"]}
    40    done } | sort

Line 5: Unlike integer indexed arrays, which are created by merely referencing them, associative arrays must be created with the declare command using the new -A option.  In this script we create five arrays as follows:
  1. files contains the names of the files in the directory, indexed by file name
  2. file_group contains the group owner of each file, indexed by file name
  3. file_owner contains the owner of each file, indexed by file name
  4. groups contains the number of files belonging to the indexed group
  5. owners contains the number of files belonging to the indexed owner

Lines 7-10: Checks to see that a valid directory name was passed as a positional parameter.  If not, a usage message is displayed and the script exits with an exit status of 1.

Lines 12-20:  Loop through the files in the directory.  Using the stat command, lines 13 and 14 extract the names of the file owner and group owner and assign the values to their respective arrays (lines 16, 17) using the name of the file as the array index.  Likewise the file name itself is assigned to the files array (line 15).

Lines 18-19:  The total number of files belonging to the file owner and group owner are incremented by one.

Lines 22-27:  The list of files is output.  This is done using the "${array[@]}" parameter expansion which expands into the entire list of array elements with each element treated as a separate word.  This allows for the possibility that a file name may contain embedded spaces.  Also note that the entire loop is enclosed in braces thus forming a group command.  This permits the entire output of the loop to be piped into the sort command.  This is necessary because the expansion of the array elements is not sorted.

Lines 29-40:  These two loops are similar to the file list loop except that they use the "${!array[@]}" expansion which expands into the list of array indexes rather than the list of array elements.

Further Reading

The Linux Command Line
  • Chapter 36 (Arrays)
  • Chapter 37 (Group commands)
Other bash 4.x references:
A Wikipedia article on associative arrays:
The Complete HTML Color Chart:
Other installments in this series: 1 2 3 4

Ubuntu 10.04 RC Has Been Released

For those of you following along with my Getting Ready For Ubuntu 10.04 series, the Release Candidate has just come out.  LWN has the release announcement.

Wednesday, April 21, 2010

Thursday, April 15, 2010

stat

I was going to write the next installment in my New Features In Bash Version 4.x series today, but in thinking about the examples I want to use, I thought I should talk about the stat command first.

We're all familiar with ls.  It's the first command that most people learn.  Using ls you can get a lot of information about a file:

bshotts@twin7:~$ ls -l .bashrc
-rw-r--r-- 1 bshotts bshotts 3800 2010-03-25 13:18 .bashrc

Very handy.  But there is one problem with ls; its output is not very script friendly.  Commands like cut cannot easily separate the fields (awk can, but we're not talking about that yet).  Wouldn't it be great if there were a command that let you get file information in a more flexible way?

Fortunately there is such a command.  It's called stat.  The name "stat" derives from the word status.  The stat command shows the status of a file or file system.  In its basic form, it works like this:

bshotts@twin7:~$ stat .bashrc
  File: `.bashrc'
  Size: 3800          Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d    Inode: 524890      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ bshotts)   Gid: ( 1000/ bshotts)
Access: 2010-04-15 08:46:22.292601436 -0400
Modify: 2010-03-25 13:18:09.621972000 -0400
Change: 2010-03-27 08:41:31.024116233 -0400

As we can see, when given the name of a file (more than one may be specified), stat displays everything the system knows about the file short of examining its contents.  We see the file name, its size including the number of blocks it's using and the size of the blocks used on the device.  The attribute information includes the owner and group IDs, and the permission attributes in both symbolic and octal format.  Finally we see the access (when the file was last read), modify (when the file was last written), and change (when the file attributes were last changed) times for the file.

Using the -f option, we can examine file systems as well:

bshotts@twin7:~$ stat -f /
  File: "/"
    ID: 9e38fe0b56e0096d Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 18429754   Free: 10441154   Available: 9504962
Inodes: Total: 4685824    Free: 4401092

Clearly stat delivers the goods when it comes to file information, but what about that output format?  I can't think of anything worse to deal with from a script writer's point-of-view (actually I can, but let's not go there!).

Here's where the beauty of stat starts to shine through.  The output is completely customizable.  stat supports printf-like format specifiers.  Here is an example extracting just the name, size, and octal file permissions:

bshotts@twin7:~$ stat -c "%n  %s  %a" .bashrc
.bashrc  3800  644

The -c option provides basic formatting capabilities, while the --printf option can do even more by interpreting backslash escape sequences:

bshotts@twin7:~$ stat --printf="%n\t%s\t%a\n" .bashrc
.bashrc    3800    644

Using this format, we can produce tab-delimited output, perfect for processing by the cut command.  Each of the fields in the stat output is available for formatting.  See the stat man page for the complete list.
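
Here is a minimal sketch of that idea: stat produces tab-delimited fields and cut pulls out just the second one.  The function name file_size and the temporary file are assumptions for the demonstration, and the script assumes GNU stat (the stat found on BSD and Mac OS X uses different options).

```shell
#!/bin/bash

# stat-cut: pull one field out of tab-delimited stat output
# Assumes GNU stat (coreutils)

file_size() {
    # Fields: 1 = name, 2 = size in bytes, 3 = octal permissions
    stat --printf="%n\t%s\t%a\n" "$1" | cut -f 2
}

# Create a 5 byte file to examine
tmp=$(mktemp)
printf 'hello' > "$tmp"
echo "Size: $(file_size "$tmp") bytes"
rm -f "$tmp"
```

Since cut uses the tab character as its default delimiter, no -d option is needed.  A file name that itself contains a tab would throw off the field count, so a real script might put the name last.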

Further Reading
  • The stat man page
The Linux Command Line:
  • Chapter 10 (file attributes and permissions)
  • Chapter 21 (cut command)
  • Chapter 22 (printf command)
A Wikipedia article on the stat() Unix system call from which the stat command is derived:

Tuesday, April 13, 2010

New Features In Bash Version 4.x - Part 3

In this installment, we are going to look at a couple of commands that have been updated in bash 4.x.

read Enhancements

The read command has gotten several small improvements.  There is one however that I thought was a real standout.  You can now provide a default value that will be accepted if the user presses the Enter key.  The new -i option is followed by a string containing the default text.  Note that for this option to work, the -e option (which enables the readline library) must also be specified.  Here is an example script:

#!/bin/bash

# read4: demo new read command feature

read -e -p "What is your user name? " -i "$USER"
echo "You answered: '$REPLY'"

When the script is executed, the user is prompted to answer, but a default value is supplied which may be edited if desired:

bshotts@twin7:~$ read4
What is your user name? bshotts
You answered: 'bshotts'

case Improvements

The case compound command has been made more flexible.  As you may recall, case performs a multiple choice test on a string.  In versions of bash prior to 4.x, case allowed only one action to be performed on a successful match.  After a successful match, the command would terminate.  Here we see a script that tests a character:

#!/bin/bash

# case4-1: test a character

read -n 1 -p "Type a character > "
echo
case $REPLY in
    [[:upper:]])    echo "'$REPLY' is upper case." ;;
    [[:lower:]])    echo "'$REPLY' is lower case." ;;
    [[:alpha:]])    echo "'$REPLY' is alphabetic." ;;
    [[:digit:]])    echo "'$REPLY' is a digit." ;;
    [[:graph:]])    echo "'$REPLY' is a visible character." ;;
    [[:punct:]])    echo "'$REPLY' is a punctuation symbol." ;;
    [[:space:]])    echo "'$REPLY' is a whitespace character." ;;
    [[:xdigit:]])   echo "'$REPLY' is a hexadecimal digit." ;;
esac

Running this script produces this:

bshotts@twin7:~$ case4-1
Type a character > a
'a' is lower case.

The script works for the most part, but fails if a character matches more than one of the POSIX character classes.  For example, the character "a" is both lower case and alphabetic, as well as a hexadecimal digit.  In bash prior to 4.x there was no way for case to match more than one test.  In bash 4.x however, we can do this:

#!/bin/bash

# case4-2: test a character

read -n 1 -p "Type a character > "
echo
case $REPLY in
    [[:upper:]])    echo "'$REPLY' is upper case." ;;&
    [[:lower:]])    echo "'$REPLY' is lower case." ;;&
    [[:alpha:]])    echo "'$REPLY' is alphabetic." ;;&
    [[:digit:]])    echo "'$REPLY' is a digit." ;;&
    [[:graph:]])    echo "'$REPLY' is a visible character." ;;&
    [[:punct:]])    echo "'$REPLY' is a punctuation symbol." ;;&
    [[:space:]])    echo "'$REPLY' is a whitespace character." ;;&
    [[:xdigit:]])   echo "'$REPLY' is a hexadecimal digit." ;;&
esac

Now when we run the script, we get this:

bshotts@twin7:~$ case4-2
Type a character > a
'a' is lower case.
'a' is alphabetic.
'a' is a visible character.
'a' is a hexadecimal digit.

The addition of the ";;&" syntax allows case to continue on to the next test rather than simply terminating.  There is also a ";&" syntax which permits case to continue on to the next action regardless of the outcome of the next test.
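
The ";&" form is easier to understand with a small example.  In this sketch (the script name case4-3 and the function describe are my own inventions), a digit matches the first pattern and then falls through into the second action even though a digit is not alphabetic:

```shell
#!/bin/bash

# case4-3: demonstrate the ";&" fall-through terminator (bash 4.x)

describe() {
    case $1 in
        # ";&" runs the next action without testing its pattern
        [[:digit:]])    echo "'$1' is a digit." ;&
        [[:alpha:]])    echo "'$1' reached the second action." ;;
        *)              echo "'$1' matched nothing above." ;;
    esac
}

describe 5    # runs the first two actions
describe a    # runs only the second action
```

The ";;" after the second action stops the fall-through, so the final catch-all pattern is never reached for either of these inputs.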

Further Reading

The bash man page:
  • The Compound Commands subsection of the SHELL GRAMMAR section.
  • The SHELL BUILTIN COMMANDS section.
The Bash Hackers Wiki:
The Bash Reference Manual
Other installments in this series: 1 2 3 4

Thursday, April 8, 2010

Linux News: Ubuntu 10.04 Beta 2 Has Been Released

The second beta has just come out.  For those of you following along with my Getting Ready For Ubuntu 10.04 series, you can get the latest version here:

http://www.ubuntu.com/testing/lucid/beta2

New Features In Bash Version 4.x - Part 2

Now that we have covered some of the minor improvements found in bash 4.x, we will begin looking at the more significant new features focusing on changes in the way bash 4.x handles expansions.

Zero-Padded Brace Expansion

As you may recall, bash supports an interesting expansion called brace expansion.  With it, you can rapidly create sequences.  This is often useful for creating large numbers of file names or directories in a hurry.  Here is an example similar to one in my book:

bshotts@twin7: ~$ mkdir -p foo/{2007..2010}-{1..12}
bshotts@twin7: ~$ ls foo
2007-1   2007-4  2008-1   2008-4  2009-1   2009-4  2010-1   2010-4
2007-10  2007-5  2008-10  2008-5  2009-10  2009-5  2010-10  2010-5
2007-11  2007-6  2008-11  2008-6  2009-11  2009-6  2010-11  2010-6
2007-12  2007-7  2008-12  2008-7  2009-12  2009-7  2010-12  2010-7
2007-2   2007-8  2008-2   2008-8  2009-2   2009-8  2010-2   2010-8
2007-3   2007-9  2008-3   2008-9  2009-3   2009-9  2010-3   2010-9

This command creates a series of directories for the years 2007-2010 and the months 1-12.  You'll notice however that the list of directories does not sort very well.  This is because the month portion of the directory name lacks a leading zero for the months 1-9.  To create this directory series with correct names, we would have to do this:

bshotts@twin7:~$ rm -r foo
bshotts@twin7:~$ mkdir -p foo/{2007..2010}-0{1..9} foo/{2007..2010}-{10..12}
bshotts@twin7:~$ ls foo
2007-01  2007-07  2008-01  2008-07  2009-01  2009-07  2010-01  2010-07
2007-02  2007-08  2008-02  2008-08  2009-02  2009-08  2010-02  2010-08
2007-03  2007-09  2008-03  2008-09  2009-03  2009-09  2010-03  2010-09
2007-04  2007-10  2008-04  2008-10  2009-04  2009-10  2010-04  2010-10
2007-05  2007-11  2008-05  2008-11  2009-05  2009-11  2010-05  2010-11
2007-06  2007-12  2008-06  2008-12  2009-06  2009-12  2010-06  2010-12

That's what we want, but we had to basically double the size of our command to do it.

bash version 4.x now allows you to prefix zeros to the values being expanded to get zero-padding when the expansion is performed.  For example:

No leading zeros:

bshotts@twin7:~$ echo {1..20}
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

One leading zero:

bshotts@twin7:~$ echo {01..20}
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20

Two leading zeros:

bshotts@twin7:~$ echo {001..20}
001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020

...and so on.

With this new feature, our directory creating command can be reduced to this:

bshotts@twin7:~$ rm -r foo
bshotts@twin7:~$ mkdir -p foo/{2007..2010}-{01..12}
bshotts@twin7:~$ ls foo
2007-01  2007-07  2008-01  2008-07  2009-01  2009-07  2010-01  2010-07
2007-02  2007-08  2008-02  2008-08  2009-02  2009-08  2010-02  2010-08
2007-03  2007-09  2008-03  2008-09  2009-03  2009-09  2010-03  2010-09
2007-04  2007-10  2008-04  2008-10  2009-04  2009-10  2010-04  2010-10
2007-05  2007-11  2008-05  2008-11  2009-05  2009-11  2010-05  2010-11
2007-06  2007-12  2008-06  2008-12  2009-06  2009-12  2010-06  2010-12
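
If a script also has to run on a system that still has bash 3.x, the zero-padded form of brace expansion is not available.  One possible fallback, sketched here under that assumption, is to let printf do the padding.  The function name pad_months is hypothetical:

```shell
#!/bin/bash

# pad-months: zero-pad month numbers without bash 4.x brace padding

pad_months() {
    local year=$1 m
    for m in {1..12}; do
        # %02d pads the month to two digits with a leading zero
        printf '%s-%02d\n' "$year" "$m"
    done
}

pad_months 2010
```

The names it prints could then be handed to mkdir -p, one year at a time.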

Case Conversion

One of the big themes in bash 4.x is upper/lower-case conversion of strings.  bash adds four new parameter expansions and two new options to the declare command to support it.

So what is case conversion good for?  Aside from the obvious aesthetic value, it has an important role in programming.  Let's consider the case of a database look-up.  Imagine that a user has entered a string into a data input field that we want to look up in a database.  It's possible the user will enter the value in all upper-case letters or lower-case letters or a combination of both.  We certainly don't want to populate our database with every possible permutation of upper and lower case spellings.  What to do?

A common approach to this problem is to normalize the user's input.  That is, convert it into a standardized form before we attempt the database look-up.  We can do this by converting all of the characters in the user's input to either lower or upper-case and ensure that the database entries are normalized the same way.

The declare command in bash 4.x can be used to normalize strings to either upper or lower-case.  Using declare, we can force a variable to always contain the desired format no matter what is assigned to it:

#!/bin/bash

# ul-declare: demonstrate case conversion via declare

declare -u upper
declare -l lower

if [[ $1 ]]; then
        upper="$1"
        lower="$1"
        echo $upper
        echo $lower
fi

In the above script, we use declare to create two variables, upper and lower.  We assign the value of the first command line argument (positional parameter 1) to each of the variables and then display them on the screen:

bshotts@twin7:~$ ul-declare aBc
ABC
abc

As we can see, the command line argument ("aBc") has been normalized.

bash version 4.x also includes four new parameter expansions that perform upper/lower-case conversion:

Format           Result
${parameter,,}   Expand the value of parameter into all lower-case.
${parameter,}    Expand the value of parameter changing only the first character to lower-case.
${parameter^^}   Expand the value of parameter into all upper-case letters.
${parameter^}    Expand the value of parameter changing only the first character to upper-case (capitalization).

Here is a script that demonstrates these expansions:

#!/bin/bash

# ul-param - demonstrate case conversion via parameter expansion

if [[ $1 ]]; then
        echo ${1,,}
        echo ${1,}
        echo ${1^^}
        echo ${1^}
fi

Here is the script in action:

bshotts@twin7:~$ ul-param aBc
abc
aBc
ABC
ABc

Again, we process the first command line argument and output the four variations supported by the new parameter expansions.  While this script uses the first positional parameter, parameter may be any shell parameter, such as a variable, a positional parameter, or an array element.
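
To tie this back to the database look-up problem, here is a rough sketch that normalizes input with the ${parameter,,} expansion before using it as a key.  An associative array (another bash 4.x feature) stands in for the database.  The array capitals, its contents, and the function lookup are all made up for illustration:

```shell
#!/bin/bash

# ul-lookup: normalize user input before a table look-up

declare -A capitals    # made-up data, keyed by lower-case name
capitals["france"]="Paris"
capitals["japan"]="Tokyo"

lookup() {
    local key=${1,,}    # fold the input to all lower-case
    # Print the stored value, or "unknown" if the key is absent
    echo "${capitals[$key]:-unknown}"
}

lookup FRANCE
lookup jApAn
lookup Atlantis
```

Because the keys are stored in lower-case and the input is folded the same way, every spelling of a name finds the same entry.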

Further Reading

The Linux Command Line
  • Chapter 8 (covers expansions)
The Bash Reference Manual
The Bash Hackers Wiki
Other installments in this series: 1 2 3 4

Tuesday, April 6, 2010

New Features In Bash Version 4.x - Part 1

As I mention in the introduction to The Linux Command Line, the command line is a long-lasting skill.  It's quite possible that a script that you wrote ten years ago still works perfectly well today.  But even so, every few years the GNU Project releases a new version of bash.  While I was writing the book, version 3.2 was the predominant version found in Linux distributions.  In February of 2009, however, a new major version of bash (4.0) appeared and it began to show up in distributions last fall.  Today the current version of bash is 4.1 and it, too, is beginning to show up in distributions such as Ubuntu 10.04 and (I presume, since I haven't checked yet) Fedora 13.

So what's new in bash?  A bunch of things, though most of them tend to be rather small.  In this series we will look at features that, I feel, are of the most use to ordinary shell users starting with a couple of the small ones.

Finding Your Version

How do you know if you are using the latest, greatest bash?  By issuing a command of course:

me@linuxbox: ~$ echo $BASH_VERSION
4.1.2(1)-release

bash maintains a shell variable called BASH_VERSION that always contains the version number of the shell in use.  The example above is from my Ubuntu 10.04 test machine and it dutifully reveals that we are running bash version 4.1.2.
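A script that depends on the new features can check the version for itself.  The sketch below relies on BASH_VERSINFO, a companion array to BASH_VERSION whose first element holds the major version number.  The function name have_bash4 is made up for this example.

```shell
#!/bin/bash

# require-bash4: hypothetical guard for scripts that need bash 4.x features

# have_bash4 is a made-up function name used only for illustration.
# BASH_VERSINFO is a read-only array; element 0 holds the major version.
have_bash4() {
    (( BASH_VERSINFO[0] >= 4 ))
}

if have_bash4; then
    echo "bash $BASH_VERSION: 4.x features available"
else
    echo "bash $BASH_VERSION is too old for the 4.x features" >&2
fi
```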

Better Help

The help command, which is used to display documentation for the shell's builtin commands, got some much needed attention in the new version of bash.  The command has some new options and the help text itself has been reformatted and improved.  For example, here is the result of the help cd command in bash 3.2:

bshotts@twin2:~$ help cd
cd: cd [-L|-P] [dir]
    Change the current directory to DIR.  The variable $HOME is the
    default DIR.  The variable CDPATH defines the search path for
    the directory containing DIR.  Alternative directory names in CDPATH
    are separated by a colon (:).  A null directory name is the same as
    the current directory, i.e. `.'.  If DIR begins with a slash (/),
    then CDPATH is not used.  If the directory is not found, and the
    shell option `cdable_vars' is set, then try the word as a variable
    name.  If that variable has a value, then cd to the value of that
    variable.  The -P option says to use the physical directory structure
    instead of following symbolic links; the -L option forces symbolic links
    to be followed.

The same command in bash 4.1:

bshotts@twin7:~$ help cd
cd: cd [-L|-P] [dir]
    Change the shell working directory.
  
    Change the current directory to DIR.  The default DIR is the value of the
    HOME shell variable.
  
    The variable CDPATH defines the search path for the directory containing
    DIR.  Alternative directory names in CDPATH are separated by a colon (:).
    A null directory name is the same as the current directory.  If DIR begins
    with a slash (/), then CDPATH is not used.
  
    If the directory is not found, and the shell option `cdable_vars' is set,
    the word is assumed to be  a variable name.  If that variable has a value,
    its value is used for DIR.
  
    Options:
        -L    force symbolic links to be followed
        -P    use the physical directory structure without following symbolic
        links
  
    The default is to follow symbolic links, as if `-L' were specified.
  
    Exit Status:
    Returns 0 if the directory is changed; non-zero otherwise.

As you can see, the output is more "man page-like" than the previous version, as well as better written.  help also includes two new options: -d, which displays a short description of the command, and -m, which displays the help text in full man page format.
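A quick way to try the two options (the output is omitted here since the exact wording varies between bash releases):

```shell
#!/bin/bash

# help-demo: try the help options added in bash 4.x

help -d cd        # short, one-line description of cd
help -m cd        # the same help text laid out like a man page
```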

New Redirections

It is now possible to combine both standard output and standard error from a command and append it to a file using this form:

command &>> file

Likewise, it is possible to pipe the combined output streams using this:

command1 |& command2

where the standard output and standard error of command1 are piped into the standard input of command2.  This form may be used in place of the traditional form:

command1 2>&1 | command2
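Here is a small sketch that exercises both forms.  The function noisy is made up for the demonstration and simply writes one line to each stream; the log file is a temporary file created with mktemp.

```shell
#!/bin/bash

# redir-demo: sketch of the new &>> and |& redirections (bash 4.x)

# noisy is a made-up function that writes one line to each output stream
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

logfile=$(mktemp)        # temporary file used only for this demonstration

noisy &>> "$logfile"     # append both streams to the file
noisy &>> "$logfile"     # a second run appends rather than overwrites
wc -l < "$logfile"       # the file now holds four lines

noisy |& wc -l           # both streams reach the pipe, so wc counts two lines

rm -f "$logfile"
```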

Further Reading

Here are some change summaries for the 4.x version of bash:
Other installments in this series: 1 2 3 4

Thursday, April 1, 2010

Script: average

A few weeks ago, I was cruising the Ubuntu forums and came across a question from a poster who wanted to find the average of a series of floating-point numbers.  The numbers were extracted from some other command and were output in a column.  He wanted a command line incantation that would take the column of numbers and return the average.  Several people answered this query with clever one-line solutions.  However, I thought that this problem would be a good task for a script.  Using a script, one could have a solution that was a little more robust and general purpose.  I wrote the following script, presented here with line numbers:


     1    #!/bin/bash
     2    
     3    # average - calculate the average of a series of numbers
     4    
     5    # handle cmd line option
     6    if [[ $1 ]]; then
     7        case $1 in
     8            -s|--scale)    scale=$2 ;;
     9            *)             echo "usage: average [-s scale]" >&2
    10                           exit 1 ;;
    11        esac
    12    fi
    13    
    14    # construct instruction stream for bc
    15    c=0
    16    {    echo "t = 0; scale = 2"
    17        [[ $scale ]] && echo "scale = $scale"
    18        while read value; do
    19    
    20            # only process valid numbers
    21            if [[ $value =~ ^[-+]?[0-9]*\.?[0-9]+$ ]]; then
    22                echo "t += $value"
    23                ((++c))
    24            fi
    25        done
    26    
    27        # make sure we don't divide by zero
    28        ((c)) && echo "t / $c"
    29    } | bc

This script takes a series of numbers from standard input and prints their average.  It is invoked as follows:

average -s scale < file_of_numbers

where scale is an integer containing the desired number of decimal places in the result and file_of_numbers is a file containing the series of numbers we desire to average.  If scale is not specified, then the default value of 2 is used.

To demonstrate the script, we will calculate the average size of the programs in the /usr/bin directory:

me@linuxbox:~$ stat --format "%s" /usr/bin/* | average
81766.66

The basic idea behind this script is that it uses the bc arbitrary precision calculator program to figure out the average.  We need to use something like bc, because arithmetic expansion in the shell can only handle integer math.
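The difference is easy to see side by side.  In this sketch the function names int_divide and bc_divide are made up for illustration, and bc is assumed to be installed:

```shell
#!/bin/bash

# int-vs-bc: sketch contrasting shell arithmetic with bc (assumes bc is installed)

# int_divide and bc_divide are made-up names used only for illustration
int_divide() {
    echo $(( $1 / $2 ))                 # arithmetic expansion: integers only
}

bc_divide() {
    echo "scale = 2; $1 / $2" | bc      # bc keeps two decimal places
}

int_divide 7 2      # the fraction is discarded
bc_divide 7 2       # the fraction is kept
```

The first call prints 3 and the second prints 3.50.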

To perform our calculation, we need to construct a series of instructions and pipe them into bc.  This task comprises the bulk of our script.  In order to do something that complicated, we employ a shell feature known as a group command.  Starting with line 16 and ending with line 29 we capture all of the standard output and consolidate it into a single stream.  That is, all of the standard output produced by the commands on lines 16-29 is treated as though it is a single command and piped into bc on line 29.
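For readers who have not met group commands before, here is a tiny sketch of the idea, separate from the average script.  The function name count_report_lines and the text it echoes are made up for illustration.

```shell
#!/bin/bash

# group-demo: sketch of piping a group command as a single stream

# count_report_lines is a made-up function name used only for illustration
count_report_lines() {
    {   echo "report start"
        for i in 1 2 3; do
            echo "item $i"
        done
        echo "report end"
    } | wc -l
}

count_report_lines
```

Everything between the braces, including the output of the loop, arrives at wc as one stream, so it counts five lines.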

We'll look at our group command piece by piece.  As you know, an average is calculated by adding up a series of numbers and dividing the sum by the number of entries.  In our case, the number of entries is stored in the variable c and the sum is stored (within bc) in the variable t.  We start our group command (line 16) by passing some initial values to bc.  We set the initial value of the bc variable t to zero and the value of scale to our default value of two (the default scale of bc is zero).

On line 17, we evaluate the scale variable to see if the command line option was used and if so, pass that new value to bc.

Next, we start a while loop that reads entries from our standard input.  Each iteration of the loop causes the next entry in the series to be assigned to the variable value.

Lines 20-24 are interesting.  Here we test to see if the string contained in value is actually a valid floating point number.  To do this, we employ a regular expression that will only match if the number is properly formatted.  The regular expression says that, to match, value may start with an optional plus or minus sign, followed by zero or more numerals, followed by an optional decimal point, and must end with one or more numerals.  If value passes this test, an instruction is inserted into the stream telling bc to add value to t (line 22) and we increment c (line 23); otherwise value is ignored.
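To see what the regular expression accepts and rejects, we can try it in isolation.  The wrapper function is_number and the sample values are made up for this sketch; the expression itself is the one from line 21.

```shell
#!/bin/bash

# regex-demo: try the number-matching regular expression on sample input

# is_number is a made-up helper name; the regex is copied from the script
is_number() {
    [[ $1 =~ ^[-+]?[0-9]*\.?[0-9]+$ ]]
}

for candidate in 42 -3.14 +.5 7. abc 1e5; do
    if is_number "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: rejected"
    fi
done
```

The first three samples are accepted.  "7." is rejected because the expression must end with at least one numeral, and "abc" and "1e5" are rejected because letters are never allowed, so numbers in exponent notation would be silently skipped by the script.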

After all of the numbers have been read from standard input, it's time to perform the calculation.  First, we test to see that we actually processed some numbers.  If we did not, then c would equal zero and the resulting calculation would cause a "division by zero" error.  So we test the value of c (line 28) and insert the final instruction for bc only if it is not equal to zero.

This script would make a good starting point for a series of statistical programs.  The most significant design weakness of the script as written is that it fails to check that the value supplied to the scale option is really an integer.  That's an improvement I will leave to my faithful readers...
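For anyone who wants a head start, one possible approach (certainly not the only one) is a small test like the following.  The function name valid_scale and the sample values are made up for this sketch.

```shell
#!/bin/bash

# check-scale: hypothetical sketch of validating the scale option

# valid_scale is a made-up helper; it succeeds only for a string of digits
valid_scale() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Try a few sample values (chosen only for illustration)
for candidate in 4 10 -1 2.5 abc ""; do
    if valid_scale "$candidate"; then
        echo "'$candidate': ok"
    else
        echo "'$candidate': scale must be a non-negative integer" >&2
    fi
done
```

In the average script, a test like this could be applied to $2 in the -s|--scale branch of the case command, printing the usage message and exiting if it fails.  Note that the empty string is rejected too, which covers the case where -s is given without a value.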

Further Reading

The following man pages:
  • bc
  • bash (the "Compound Commands" section, covers group commands and the [[]] and (()) compound commands)
The Linux Command Line
  • Chapter 20 (regular expressions)
  • Chapter 28 (if command, [[]] and (()) compound commands and && and || control operators)
  • Chapter 29 (the read command)
  • Chapter 30 (while loops)
  • Chapter 33 (positional parameters)
  • Chapter 35 (arithmetic expressions and expansion, bc program)
  • Chapter 37 (group commands)