Wednesday, May 16, 2012

GRUB load/boot issues.

All that you wanted to know about GRUB

Boot your system using a "live CD" or "live DVD".
Open a shell window and become root: sudo su
For clarity, let’s discuss things using the shell variables $partition and $device. An example might be: partition=/dev/sda6 ; device=/dev/sda
You need to know which partition holds the Linux system you want to boot. If you remember this, define $partition and $device accordingly, and skip to the next step. If you need to figure it out,
  • get a list of disk devices: ls /dev/sd? /dev/hd?
  • look at each such device: cfdisk $device or fdisk -l $device
    Look at the partition sizes and partition labels to find the partition that you want to boot. Define $partition and $device accordingly.
Create a mountpoint: install -d /mnt/radicula
Mount the partition containing your Linux: mount $partition /mnt/radicula
Reinstall grub: grub-install --root-directory=/mnt/radicula $device
Beware: You want to install grub on the device (e.g. /dev/sda). If you install it on the partition (e.g. /dev/sda6), the grub-install program won’t complain, but the results won’t be what you wanted.
That’s probably enough to get you going. If you want to give it a try, shut down the live CD system, eject the CD, and reboot in the normal way from your favorite device (/dev/sda in the example).
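The steps so far can be rehearsed as a small script. This is only a sketch under stated assumptions: the helper names (device_of, run, reinstall_grub) and the DRY_RUN variable are made up for illustration, and the trick of stripping trailing digits to get the device only works for sdX/hdX-style names. By default it prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the reinstall steps above. By default it only PRINTS the
# commands; set DRY_RUN=0 (as root, on the live CD) to really run them.
# All helper names and the DRY_RUN variable are hypothetical.

# Guess the whole-disk device by stripping trailing digits from the
# partition name, e.g. /dev/sda6 -> /dev/sda. Assumes sdX/hdX-style
# names; it gives the wrong answer for names such as /dev/nvme0n1p2.
device_of() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reinstall_grub() {
    partition=$1
    device=$(device_of "$partition")
    mnt=/mnt/radicula
    run install -d "$mnt"
    run mount "$partition" "$mnt"
    run grub-install --root-directory="$mnt" "$device"
}

# Uses the example partition from the text unless one is given.
reinstall_grub "${1:-/dev/sda6}"
```

Read the printed commands carefully before setting DRY_RUN=0; in particular, check that the last line names the device and not the partition.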
If you want to improve your chances, you can do a little more work before rebooting.
If the Live CD system has a /boot directory, move it out of the way: mv /boot /xxxboot
Put the target system’s boot directory in its place: ln -s /mnt/radicula/boot /
Back up the existing grub control file, namely grub.cfg (for Grub Version 2) and/or menu.lst (for Grub Version 1). If both exist, back up both of them. cd /boot/grub ; cp grub.cfg grub.cfg#1 ; cp menu.lst menu.lst#1
Update the grub control file: update-grub.
Note that in Grub Version 1, update-grub writes the file menu.lst, whereas in Grub Version 2, it invokes grub-mkconfig to write the file grub.cfg.
Now you really should be ready to shut down the Live CD system, remove the CD, and reboot in the normal way.
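The "back up whichever control file exists" step can be sketched as a small helper. The function name backup_grub_configs is hypothetical, and the rehearsal below runs on a scratch directory rather than the real /boot/grub.

```shell
#!/bin/sh
# Back up whichever grub control files exist in a grub directory.
# backup_grub_configs is a hypothetical helper; the "#1" suffix follows
# the naming used in the steps above.
backup_grub_configs() {
    dir=$1
    for f in grub.cfg menu.lst; do
        if [ -f "$dir/$f" ]; then
            cp -p "$dir/$f" "$dir/${f}#1" && echo "backed up $f"
        fi
    done
}

# Rehearsal on a scratch directory standing in for /boot/grub.
# Only grub.cfg exists here, so only grub.cfg gets backed up.
scratch=$(mktemp -d)
echo "placeholder contents" > "$scratch/grub.cfg"
backup_grub_configs "$scratch"
```

On the live CD you would call it as backup_grub_configs /boot/grub, after /boot has been pointed at the target system as described above.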

1.2  Follow-Up

The procedures in section 1.1 were meant to get the system functioning again as quickly as possible. Now that the system is up and running and the time pressure is off, we can do some housekeeping:
  1. Optional: You may want to make sure your copy of the software is not corrupted: apt-get install --reinstall grub # (optional)
  2. You should make a backup of the MBR as described in section 3.1.
  3. Highly recommended: Rebuild the grub configuration file: update-grub
  4. Install the latest and greatest grub in the MBR: grub-install --recheck /dev/hda (substitute your own boot device, e.g. /dev/sda)
In ideal situations, the work described in this section doesn’t accomplish much, because it duplicates the work done in section 1.1. However, consider the situation where the Live CD you used to restore the MBR is using a different version of grub. Maybe one system is out of date, or maybe one of them just exercised the option to use a different version. This is your chance to install the grub version that your system thinks should be installed. If you don’t do this, you risk having some ugly problems later.

2  Scenarios and Alternatives

There are several scenarios that can lead to an MBR being overwritten or otherwise rendered unsatisfactory. Examples include:
  • On a dual-boot system, every time you install (or reinstall) Windows, it will almost certainly overwrite your MBR. See section 2.1.
  • A failed upgrade can leave grub in a bad state. In particular, if the system was using Grub Version 1 before the upgrade and wants to use Grub Version 2 afterwards, sometimes things get confused. I’ve seen it happen.
  • Viruses and other malicious software are fond of overwriting the MBR.
  • et cetera.

2.1  Dual Boot

Suppose you have a dual boot system, i.e. one that sometimes boots Linux and sometimes boots Windows. Every time you install (or reinstall) Windows, it installs its own boot loader into the MBR. This is a problem, because the MS boot loader will not load anything except the MS operating system ... in contrast to grub, which will happily allow you to boot almost anything: Linux, memtest86, various MS products, et cetera.
Some folks recommend installing MS before installing Linux, so that the Linux installation process will set up the MBR for you. This is fine as far as it goes, but it is not always possible. For instance, sometimes it is necessary to reinstall or upgrade the MS stuff, days or months or years after Linux was installed.
The grub-reinstall procedure described in this document takes only a few minutes, so feel free to install MS after Linux if you find it necessary or convenient to do so. MS will trash the MBR, but you can restore it using the techniques described here.

3  Backing Up and Restoring the MBR

3.1  Backup

It never hurts to make a backup of the MBR.
dd if=/dev/sda of=host1-sda.mbr count=1
If you have two or more Linux systems, use system "1" to store the backups pertaining to system "2" and vice versa. If you have only one system, store the backups on floppy ... and don’t forget where you put the floppy. (It does no good to store the backup on the same drive as the MBR you are backing up.)
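A backup is only useful if it is intact, so it may be worth a quick sanity check after making it. The sketch below assumes a classic 512-byte MBR, which ends in the two boot-signature bytes 55 aa. The helper name check_mbr_backup is made up, and the demo runs on a scratch file, not a real backup.

```shell
#!/bin/sh
# Check that an MBR backup looks plausible: exactly 512 bytes, ending
# in the boot signature 55 aa. check_mbr_backup is a hypothetical helper.
check_mbr_backup() {
    file=$1
    size=$(wc -c < "$file" | tr -d ' ')
    sig=$(tail -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    [ "$size" -eq 512 ] && [ "$sig" = "55aa" ]
}

# Rehearsal on a scratch file standing in for host1-sda.mbr:
# 510 zero bytes followed by the bytes 0x55 0xAA (octal 125 252).
scratch=$(mktemp -d)
{ dd if=/dev/zero bs=1 count=510 2>/dev/null; printf '\125\252'; } > "$scratch/fake.mbr"

if check_mbr_backup "$scratch/fake.mbr"; then
    echo "looks like an MBR"
else
    echo "does NOT look like an MBR"
fi
```

Passing this check does not prove the boot code is good; it only catches truncated or obviously wrong files.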

3.2  Restore

Keep in mind that sector zero contains both the stage-0 boot code and the primary partition table. Therefore, before restoring the boot sector, you have to make a decision:
  • In the scenario where something trashes sector 0 including the partition table, then you want to restore the whole thing. This can rescue you from what would otherwise be a very bad situation.
    dd if=host1-sda.mbr of=/dev/sda count=1
  • In the scenario where the partition table is not trashed, and has possibly changed since you backed up the MBR, you want to restore the boot code without disturbing the current partition table. You need to splice the backed-up boot code onto the current partition table before writing anything to sector 0. The procedure is:
    Keep a copy, just to be safe: dd if=/dev/sda of=damaged.mbr count=1
    Grab the good boot code from backup: dd if=host1-sda.mbr bs=1 count=444 > new.mbr
    Tack on the current partition table: dd if=/dev/sda bs=1 skip=444 count=68 >> new.mbr
    Write to disk: dd if=new.mbr of=/dev/sda count=1
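Before writing anything to a real sector 0, you may want to rehearse the splice on scratch files. The sketch below uses made-up stand-in files (one filled with the letter B for the backup, one filled with C for the current sector), so it is easy to see which bytes ended up where. It touches no disk device.

```shell
#!/bin/sh
# Rehearse the boot-code / partition-table splice on scratch files.
# The file names and fill bytes are invented for this demo.
scratch=$(mktemp -d)

# Stand-ins: the "backup" is 512 bytes of 'B', the "current" sector
# is 512 bytes of 'C'.
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'B' > "$scratch/backup.mbr"
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'C' > "$scratch/current.mbr"

# Same splice as in the procedure above: bytes 0..443 come from the
# backup, bytes 444..511 come from the current sector.
dd if="$scratch/backup.mbr"  bs=1 count=444         2>/dev/null >  "$scratch/new.mbr"
dd if="$scratch/current.mbr" bs=1 skip=444 count=68 2>/dev/null >> "$scratch/new.mbr"

# The result should again be exactly one sector long.
wc -c < "$scratch/new.mbr"
```

If the size is not 512, something went wrong with the byte counts, and the result should not be written to disk.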

4  Details

Some discussion of the MBR and the basic boot process can be found in reference 1.

4.1  Live CDs

  • Ubuntu: The Ubuntu Live CD that you used to install Ubuntu also serves as a nice Live CD, suitable for many purposes including the grub reinstallation process described here. So be sure to keep that CD handy. If you need to download a new copy, see reference 2.
  • Debian: The usual Debian install disk is not, alas, a fully-featured live CD. A rundown of the various Debian live CDs can be found in reference 3.
  • Slackware: RIP (reference 4) is a Slackware live CD, suitable for tasks such as grub reinstallation.

4.2  Superuser Privileges

We now discuss the step sudo su
For good reasons, when you fire up a typical live CD, you are logged in as an ordinary user, not the superuser.
You can exert superuser privileges on a command-by-command basis by prefixing each command with "sudo" ... but since every command we are about to do requires superuser privileges, it is easier to just become superuser once and for all by saying sudo su

4.3  Mountpoint

We now discuss the step install -d /mnt/radicula
Note that “radicula” is Latin for “rootlet” i.e. “little root”.
The name of the mountpoint doesn’t matter. Reasonable choices might include /tmp/root or /mnt/sda6. It’s just some directory. Any available directory can be used as a mountpoint.

4.4  Mounting Your Linux Partition

We now discuss the step mount /dev/sda6 /mnt/radicula
Not much to say, really. If you want the operating system to treat your partition as a collection of files and directories (as opposed to a bucket of bits) you need to mount it.

4.5  Grub Installation

We now discuss the step grub-install --root-directory=/mnt/radicula /dev/sda
The --root-directory=/mnt/radicula option tells grub where to look for the grub directory during the installation process. The grub directory is /mnt/radicula/boot/grub on typical distributions such as Ubuntu and Debian, but may be /mnt/radicula/grub on some *bsd setups.
The grub-install program uses the grub directory in several ways during the installation process. Among other things, it goes there to read the file. It also goes there to write the core.img file. A new core.img file gets written each time you run grub-install.
Keep in mind that the Unix file system is essentially a graph (in the sense of graph theory) with edges and nodes. The edges are the paths, i.e. directory names and file names. The nodes do not have names. The nodes are where the data is stored. So: the inode of interest will be reached by the path "/mnt/radicula" during the installation process. Grub assumes this inode will be reached by the simple path "/" later, when the system on /dev/sda6 is actually booting and running.
The idea that the same inode could be reached by one path now and a different path later makes perfect sense if you think about it the right way. The grub-install program understands the distinction between the two, which is what makes it possible to reinstall grub using the easy procedure described in this document.
This distinction is, alas, not well documented. You could read the grub manpage all day and not learn anything about this distinction. The grub-install --help message says
  --root-directory=DIR    install GRUB images under the directory DIR
                          instead of the root directory
which seems somewhere between incomprehensible and self-contradictory. Is DIR the root directory (as suggested by the equation root-directory=DIR) ... or is DIR used "instead of the root directory" (as stated in the explanatory message)? Gaaack.
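If you are not sure which layout the target system uses, a quick check of the mounted partition will tell you. This sketch tries only the two locations mentioned above; the helper name find_grub_dir is hypothetical, and the demo runs on a scratch tree standing in for /mnt/radicula.

```shell
#!/bin/sh
# Find the grub directory under a mounted root, trying the two layouts
# mentioned above. find_grub_dir is a hypothetical helper.
find_grub_dir() {
    mnt=$1
    for d in "$mnt/boot/grub" "$mnt/grub"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    echo "no grub directory found under $mnt" >&2
    return 1
}

# Rehearsal on a scratch tree standing in for /mnt/radicula.
scratch=$(mktemp -d)
mkdir -p "$scratch/boot/grub"
find_grub_dir "$scratch"
```

On the live CD you would call it as find_grub_dir /mnt/radicula after mounting the partition. If it finds nothing, you have probably mounted the wrong partition.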

5  Using Grub Commands Directly

I hope you never need to know this. Usually the procedures described in section 1.1 make this unnecessary.
Imagine a scenario where grub is installed in the MBR correctly, but the grub configuration files are messed up, so all you get is the grub> prompt (rather than a menu of kernels that can be booted). Further imagine that you can’t fix it using the methods described in section 1.1.
You may be able to recover using the following procedure:
  • At the grub> prompt, type root (hd0,<tab>
    This will give you a listing of all the partitions on the hd0 device, along with their UUID, filesystem type, and modification date.
    If hd0 turns out to be not the device you want, try hd1 and so on.
  • Pick the partition you want, say #2, and issue the complete command: root (hd0,2)
  • At the grub> prompt, type linux /boot/vml<tab>
    This will give you a listing of all the filenames in the boot directory that start with “vml”. (If your kernel isn’t named vmlinuz-something, adapt these instructions accordingly.)
  • Pick the kernel you want, and issue the complete command, e.g.: linux /boot/vmlinuz- root=/dev/hde3
    Note that you generally have to add the root=... option to the linux command line.
    Beware that the way grub numbers disk devices {hd0, hd1, hd2, etc.} may be different from the way linux does it {sda, sdb, sdc, etc.} ... and the difference is not systematic. I have one system where hd0 corresponds to /dev/hde. This is commonly an annoyance on systems that have a mixture of SATA and PATA devices.
    The numbering of partitions is also different, but the difference is systematic: Grub Version 1 numbers them starting from 0, while linux numbers them starting from 1, so grub partition (...,2) corresponds to linux partition /dev/...3 and so on. (Grub Version 2 numbers partitions starting from 1, so there the two numbers agree.)
  • At the grub> prompt, type initrd /boot/init<tab>
    This will give you a listing of all the initrd files. Pick the one that corresponds to your kernel, and issue the complete command: initrd /boot/initrd.img- or whatever.
  • Issue the boot command. The kernel should boot.
  • If the kernel panics because it could not mount the root fs, it means you guessed wrong about the root=... command-line argument. Maybe it is /dev/hda3 or /dev/sda3 or /dev/sde3. However ...
  • Remember that the kernel needs to know the root device twice, once when it is reading the initrd (initial ramdisk), and once again when it is starting the system for real. I have seen situations where the device is named differently in the two cases, in which case any device name you pick is going to be wrong in one context or the other, and the system will not boot correctly.
    The only way to handle this case is to refer to the disk by its UUID, using a construction of the form root=UUID=4240ce68-802b-4a41-8345-543fad0ec20f
    That is an obnoxious amount of typing, but with any luck you only have to do it once.
    Grub will tell you the UUID; see the first item in this list.
  • Once the system is booted, clean up the mess using the methods described in section 1.2.
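The partition-number shift in the list above is easy to get wrong at a bare grub> prompt, so here is a sketch of the arithmetic. It covers only the zero-based numbering of Grub Version 1. The disk name has to be supplied by hand, because the mapping from hd0, hd1, ... to Linux disk names is not systematic. The helper name grub1_to_linux is made up.

```shell
#!/bin/sh
# Translate a Grub Version 1 partition reference such as "(hd0,2)" into
# a Linux partition name, given the Linux disk name explicitly.
# grub1_to_linux is a hypothetical helper; it applies only the
# zero-based-to-one-based shift and does not guess the disk.
grub1_to_linux() {
    disk=$1
    # Pull the partition number out of "(hdN,M)".
    n=$(printf '%s\n' "$2" | sed 's/^(hd[0-9]*,\([0-9]*\))$/\1/')
    echo "${disk}$((n + 1))"
}

# Example from the text: hd0 happens to be /dev/hde on that machine.
grub1_to_linux /dev/hde "(hd0,2)"
```

The result is what you would try first in the root=... argument, falling back to root=UUID=... if the device name turns out to differ between the initrd and the running system.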