Creating A Package List
In order to get a grip on the extra software I have installed on my desktop, I started out just writing a list of everything I saw in the desktop menus that did not appear on my 10.04 test systems. This is all the obvious stuff like Thunderbird, Gimp, Gthumb, etc., but what about the stuff that's not on the menu? I know I have installed many command line programs too. To get a complete list of the software installed on the system, we'll have to employ some command line magic:
me@twin7$ dpkg --list | awk '$1 == "ii" {print $2}' > ~/package_list.old.txt
This creates a list of all of the installed packages on the system and stores it in a file. We'll use this file to compare the package set with that of the new OS installation.
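Once the new system is installed, the comparison could look something like the sketch below. It assumes the same dpkg/awk pipeline has been run on the new system to produce a second list; the file name ~/package_list.new.txt and the helper name list_missing are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# list_missing (hypothetical helper) - print package names that appear
# in OLD_LIST but not in NEW_LIST. Each file holds one name per line.
list_missing() {
    old_list=$1
    new_list=$2
    # comm needs both inputs sorted the same way, so sort the new list
    # into a temporary file and feed the sorted old list on stdin ("-").
    sorted_new=$(mktemp) || return 1
    sort -u "$new_list" > "$sorted_new"
    # -23 hides column 2 (only in NEW_LIST) and column 3 (in both),
    # leaving column 1: names found only in OLD_LIST.
    sort -u "$old_list" | comm -23 - "$sorted_new"
    rm -f "$sorted_new"
}
```

On the new system you would run dpkg --list | awk '$1 == "ii" {print $2}' > ~/package_list.new.txt and then list_missing ~/package_list.old.txt ~/package_list.new.txt. Swapping the two arguments lists the packages that are new in the fresh installation instead.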
Making A Backup
The most important task before we install the new OS is backing up the important data on the system so we can restore it after the upgrade. For me, the files I need to preserve are located in /etc (the system's configuration files; I don't restore these, but keep them for reference), /usr/local (locally installed software and administration scripts), and /home (the files belonging to the users). If you are running a web server on your system, you will probably need to back up portions of the /var directory as well.
There are many ways to perform backups. My systems normally back up every night to a local file server on my network, but for this exercise we'll use an external USB hard drive. We'll look at two popular methods: rsync and tar.
The choice of method depends on your needs and on how your external hard drive is formatted. The key feature afforded by both methods is that they preserve the attributes (permissions, ownerships, modification times, etc.) of the files being backed up. Both also let us exclude files from the backup, which matters because there are a few things we don't want to copy.
The rsync program copies files from one place to another. The source or destination may be a network drive, but for our purposes we will use a local (though external) volume. The great advantage of rsync is that once an initial copy is performed, subsequent updates can be made very rapidly because rsync only copies the files that have changed since the previous copy. The disadvantage of rsync is that the destination volume must have a Unix-like file system, since rsync relies on that file system to store the file attributes.
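To watch this incremental behavior, rsync's -i (--itemize-changes) option prints one line for each item it actually updates and stays silent about files that are already current. The tiny wrapper below (rsync_changes is a hypothetical name) is only a sketch for experimenting on scratch directories, not part of the backup procedure.

```shell
#!/bin/bash
# rsync_changes (hypothetical helper) - mirror SRC into DEST and report
# only the items rsync actually transferred or updated.
# -a preserves file attributes; -i prints an itemized line per changed item.
rsync_changes() {
    rsync -a --itemize-changes "$1" "$2"
}
```

Run it twice against the same pair of directories: the first run lists every file, while a second run made after editing a single file lists only that file.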
Here we have a script that will perform the backup using rsync. It assumes that we have an ext3 formatted file system on a volume named BigDisk and that the volume has a backup directory:
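#!/bin/bash
# usb_backup - Backup system to external disk drive using rsync
# Directories to back up; $SOURCE is left unquoted below so that it
# splits into three separate arguments.
SOURCE="/etc /usr/local /home"
EXT3_DESTINATION=/media/BigDisk/backup
# Only run if the external drive is mounted and has a backup directory.
if [[ -d $EXT3_DESTINATION ]]; then
    sudo rsync -av \
        --delete \
        --exclude '/home/*/.gvfs' \
        $SOURCE "$EXT3_DESTINATION"
else
    echo "usb_backup: $EXT3_DESTINATION not found; is the drive mounted?" >&2
    exit 1
fi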
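The script first checks that the destination directory exists and then runs rsync. The -a (archive) option recurses into directories and preserves file attributes, and -v lists the files as they are copied. The --delete option removes files on the destination that no longer exist on the source, so the destination is kept as an exact mirror of the source. We also exclude the .gvfs directory found in each user's home directory. It is a mount point for GNOME's virtual file system, and even root is denied access to it, so trying to copy it only produces errors. This script can be used as a routine backup procedure. Once the initial backup is performed, later backups will be very fast since rsync identifies and copies only files that have changed between backups.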
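Because --delete removes files from the backup volume, it can be reassuring to preview a run first. rsync's -n (--dry-run) option reports what would be copied and deleted without changing anything. Here is a minimal sketch; preview_backup is a hypothetical name, and its options simply mirror the script above.

```shell
#!/bin/bash
# preview_backup (hypothetical helper) - report what the usb_backup
# rsync command would copy and delete, without modifying anything.
# Usage: preview_backup DESTINATION SOURCE...
preview_backup() {
    dest=$1
    shift
    # Same options as the backup script, plus --dry-run.
    # Output lines beginning with "deleting" are files --delete would remove.
    rsync -av --dry-run \
        --delete \
        --exclude '/home/*/.gvfs' \
        "$@" "$dest"
}
```

For example: preview_backup /media/BigDisk/backup /etc /usr/local /home. Since sudo cannot run a shell function, a preview that includes root-only files is easiest done by adding --dry-run to the sudo rsync line of the script itself.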
Our second approach uses the tar program. tar (short for tape archive) is a traditional Unix tool used for backups. While its original use was for writing files on magnetic tape, it can also write ordinary files. tar works by recording all of the source files into a single archive file called a tar file. Within the tar file all of the source file attributes are recorded along with the file contents. Since tar does not rely on the native file system of the backup device to store the source file attributes, it can use any Linux-supported file system to store the archive. This makes tar the logical choice if you are using an off-the-shelf USB hard drive formatted as NTFS. However, tar has a significant disadvantage compared to rsync. It is extremely cumbersome to restore single files from an archive if the archive is large.
Since tar writes its archives as though it were writing to magnetic tape, the archives are sequential access. This means that to find something in an archive, tar must read through the entire archive, starting from the beginning. By contrast, on a direct access medium such as a hard disk, the system can rapidly locate and retrieve a file directly. It's like the difference between a DVD and a VHS tape. With a DVD you can jump immediately to a scene, whereas with a VHS tape you have to wind through the tape until you reach the desired spot.
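Restoring a single file is still possible, just slow. When GNU tar creates an archive it removes the leading "/" from member names (it prints a notice saying so), so a file backed up from /home/me/notes.txt is stored as home/me/notes.txt. The sketch below extracts one member; restore_one is a hypothetical helper name.

```shell
#!/bin/bash
# restore_one (hypothetical helper) - extract a single member from a
# gzip-compressed tar archive into TARGET, leaving everything else alone.
# Usage: restore_one ARCHIVE MEMBER TARGET
restore_one() {
    archive=$1 member=$2 target=$3
    mkdir -p "$target" || return 1
    # Naming the member limits what is extracted, but tar still has to
    # decompress and read the archive from the start to find it.
    tar -xzf "$archive" -C "$target" "$member"
}
```

To find the stored name first, list the archive with tar -tzf /media/BigDisk_NTFS/backup/home.tgz | grep letter.odt, then run restore_one /media/BigDisk_NTFS/backup/home.tgz home/me/Documents/letter.odt ~/restored (the user "me" and the file letter.odt are made up for the example). Both commands read through the whole archive, which is exactly the cost described above.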
Another disadvantage compared to rsync is that each time you perform a backup, you have to copy every file again. This is not a problem for a one-time backup like the one we are performing here, but it would be very time-consuming as a routine procedure.
By the way, don't attempt a tar-based backup on a VFAT (MS-DOS) formatted drive. VFAT has a maximum file size of 4GB, and unless you have a very small set of home directories, your archive will exceed the limit.
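One way to guard against this is to check the destination's file system type before writing anything. This is a sketch under the assumption that GNU coreutils stat is available; its -f -c %T format reports FAT file systems as "msdos". The helper name refuse_fat is hypothetical.

```shell
#!/bin/bash
# refuse_fat (hypothetical helper) - fail with a message if DIR is on a
# FAT (VFAT/MS-DOS) file system, whose 4GB file size limit a .tgz of
# /home can easily exceed. Assumes GNU stat, which names FAT "msdos".
refuse_fat() {
    fstype=$(stat -f -c %T "$1") || return 1
    if [ "$fstype" = "msdos" ]; then
        echo "$1 is on a FAT file system; choose an NTFS or ext3 volume" >&2
        return 1
    fi
    return 0
}
```

In the tar script below, a line such as refuse_fat "$NTFS_DESTINATION" || exit 1 placed just before the loop would stop the backup before any archive is written.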
Here is our tar backup script:
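#!/bin/bash
# usb_backup_ntfs - Backup system to external disk drive using tar
SOURCE="/etc /usr/local /home"
NTFS_DESTINATION=/media/BigDisk_NTFS/backup
# Only run if the external drive is mounted and has a backup directory.
if [[ -d $NTFS_DESTINATION ]]; then
    for i in $SOURCE; do
        # Derive the archive name by deleting every slash: /usr/local -> usrlocal
        fn=${i//\/}
        # -c create, -z gzip, -v list files; -f names the output file.
        sudo tar -czv \
            --exclude '/home/*/.gvfs' \
            -f "$NTFS_DESTINATION/$fn.tgz" "$i"
    done
else
    echo "usb_backup_ntfs: $NTFS_DESTINATION not found; is the drive mounted?" >&2
    exit 1
fi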
This script assumes a destination volume named BigDisk_NTFS containing a directory named backup. While we have implied that the volume is formatted as NTFS, this script will work on any Linux-compatible file system that allows large files. The script creates one tar file for each of the source directories. It constructs the destination file names by removing the slashes from the source directory names and appending the extension ".tgz" to the end. Our invocation of tar includes the z option, which compresses the entire archive with gzip as it is written. This slows things down a little, but saves some space on the backup device.
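The naming trick in the loop is bash's pattern substitution, ${parameter//pattern/replacement}: with the pattern \/ (an escaped slash) and no replacement, every slash is deleted. A quick sketch to check the resulting names; archive_name is a hypothetical helper that reproduces the script's rule.

```shell
#!/bin/bash
# archive_name (hypothetical helper) - apply the usb_backup_ntfs naming
# rule: delete every slash from the path, then append ".tgz".
archive_name() {
    local fn=${1//\/}
    echo "$fn.tgz"
}

for i in /etc /usr/local /home; do
    echo "$i -> $(archive_name "$i")"
done
# Prints:
# /etc -> etc.tgz
# /usr/local -> usrlocal.tgz
# /home -> home.tgz
```

One consequence of this rule is that distinct paths can collapse to the same name (/usr/local and a hypothetical /usrlocal would both become usrlocal.tgz), so it suits a short, hand-picked SOURCE list like this one.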
Other Details To Check
Since one of the goals of our new installation is to use new versions of our favorite apps, starting from their native default configurations, we won't be restoring many of the configuration files from our existing system. This means that we need to manually record a variety of configuration settings. This information is good to have written down anyway. Record (or export to a file) the following:
- Email Configuration
- Bookmarks
- Address Books
- Passwords
- Names Of Firefox Extensions
- Others As Needed
Ready, Set, Go!
That about does it. Once our backups are made and our settings are recorded, the next thing to do is insert the install CD and reboot. I'll see you on the other side!
Further Reading
The following chapters in The Linux Command Line:
- Chapter 16 - Storage Media (covers formatting external drives)
- Chapter 19 - Archiving And Backup (covers rsync, tar, gzip)
- rsync
- tar
Other installments in this series: 1 2 3 4 4a 5
"dpkg --get-selections > packages.list" is a simpler way to get a list of what's been added/removed. Then, on the new system, run "sudo dpkg --set-selections < packages.list && sudo apt-get dselect-upgrade" to apply it.