[UFO Chicago] Yay LNX-BBC

Jesse Becker jesse_becker@yahoo.com
Thu, 27 Mar 2003 21:29:03 -0800 (PST)


I forgot to mention this at the meeting tonight, but I
wanted to thank Nate, et al, for the LNX-BBC disks.  I used
one this past week to fix a particularly annoying
workstation that boots off a two-disk IDE RAID-1 array. 
There are no other disks in the system.

The hardware is a dual-Xeon workstation, with an on-board
Promise 20267 ASIC (this is the same as a FastTrak 100 PCI
card).  Promise has limited support for Linux systems, and
has deigned to provide a binary-only module that runs on
very, very specific versions of Red Hat's *stock* kernel. 
It is also *ONLY* available as a module, and cannot be
built into the kernel.  Furthermore, it is built outside of
the kernel proper; you cannot do anything from 'make
(x|menu)config'.  When everything is configured correctly,
the system works quite well, and the disk mirror runs very
quickly (i.e. no performance loss from standard disks), and
can rebuild in the background.

The problem is when the system isn't configured as it
should be.  It's worth noting that the boxes came
pre-configured from Microway--I don't set up systems to do
dumb things like this...

Last week, this box was rebooted or shut off because its
owner went away on vacation.  At some point, the kernel
package was removed, playing havoc with everything in
/lib/modules.

The mirror appears as a SCSI device, /dev/sda in this case,
and lilo is configured to work off of it.  Since the
FastTrak module is required to actually see the drive in
the first place, a ramdisk is required.  The only modules
included on the ramdisk are FastTrak.o, scsi_mod, and
sd_mod.  The filesystems are all ext3...which was built as a
module...and all the modules were removed, along with the
rest of the kernel RPM.
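
A minimal sketch of how this kind of gap could be caught ahead
of time, assuming the initrd has already been unpacked or
loop-mounted somewhere readable.  The function name, the mount
point and the list of required modules are all made up for
illustration; the ".o" suffix matches the 2.4-era module files
mentioned above.

```python
import os


def missing_modules(initrd_root, required):
    """Return the names in `required` that have no <name>.o under initrd_root."""
    present = set()
    for _dirpath, _dirnames, filenames in os.walk(initrd_root):
        for filename in filenames:
            if filename.endswith(".o"):
                # Strip the ".o" suffix so "ext3.o" is recorded as "ext3".
                present.add(filename[:-2])
    return sorted(name for name in required if name not in present)


# Hypothetical usage: /mnt/initrd is an assumed mount point, and the
# module list is only what this particular box happened to need.
# missing_modules("/mnt/initrd", ["FastTrak", "scsi_mod", "sd_mod", "ext3"])
```

The check only looks at file names, so it says nothing about
whether a module matches the kernel that will load it.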

So, on boot, the system comes up, freaks out, and fails to
mount /usr, /var, and /tmp (/home is on nfs) because the
ext3.o module is gone.  There are other kernels present
(2.4.19 with xfs patches), but those modules won't work,
even if forced.

"No problem," I think, "I'll just grab the kernel RPM off
the 'net, and install it."  Whoops, no eepro100 module
either.  "[censored]. Fine, I'll use these handy RH7.3 CDs I
have."  Whoops, no ide-cd module either.

"[CENSORED] with a [CENSORED] and [CENSORED] until you
[CENSORED].  Grrr... then I'll [CENSORED] [CENSORED]
[CENSORED] some more."

I don't have the modules I need, and I can't get them
either.  Furthermore, I can *NOT* boot off of a rescue
CD/disk (including the LNX BBC disk at this point), to fix
this problem, because nothing out there actually has the
FastTrak module.  Even if they did, it's unlikely that they
use the specific version supported by the module.

Essentially, I can boot the system in a half usable state
to screw with the hard drive, but not have network access,
or I can boot off the BBC disk, have network access, but
not touch the hard disk.  (It's worth noting that I could
have broken the mirror, accessed a single disk at a time,
then reenabled the mirror, but that was a last, and
time-consuming, resort.)

Enter the humble floppy disk.  Oh, did I mention that
ide-floppy.o was also built as a module, and I thus couldn't
access it either when booting from the RAID?

I did have enough forethought to make a boot disk before I
completely trashed the system.

The BBC, however, could read the floppy disk, instantly
found a network driver (the e100.o module instead of
eepro100), and could manage the CDROM.  It just couldn't
actually read the hard disk.  It also doesn't have a
compiler, and building the FastTrak modules requires one.

What I was able to do, however, was this:

1)  Mount floppy disk
2)  Copy initrd.img off floppy, and copy it to a different
machine.
3)  Mount initrd.img file as a loopback device, and add the
ext3.o and eepro100.o modules to it.
4)  Copy initrd.img back to the BBC machine, and thus onto
the floppy disk.
5)  Reboot, and discover that I also need the scsi_mod and
sd_mod modules.  It's difficult to find unresolved symbols
for a module on a system running a different kernel version.
If there's a way to do this, I'd like to know.
6)  Repeat steps 1-4, but with the correct modules.
7)  Reboot into single user mode, mount the partitions, and
bring up the network device.
8)  Grab the kernel RPM off the net (faster than off the
CD), and install it.
9)  Fix, and install, LILO
10)  Sacrifice a goat, remove the floppy, and reboot.
11)  Repeat steps 6-9, but this time remembering to mount
/boot--since it too is a separate partition that I'd
neglected--and use the correct kernels
12)  Sacrifice a goat and a chicken, and reboot.
13)  Success.  I made a few minor tweaks, and did a few
more test reboots, but it all worked at this point.
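
On the question in step 5: one possible approach, sketched
below, is to work from the target kernel's modules.dep file
(the dependency list that depmod writes) rather than from the
running kernel.  This is only a sketch under assumptions: it
presumes a copy of that file is available on another machine,
and that each entry looks like "module: dep dep ..." with an
optional backslash continuation.  The function names and the
sample paths in the comments are hypothetical.

```python
def parse_modules_dep(text):
    """Parse modules.dep-style text into {module_path: [dependency_paths]}."""
    deps = {}
    logical = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Continuation: keep accumulating the same logical entry.
            logical += line[:-1] + " "
            continue
        logical += line
        if ":" in logical:
            target, _, rest = logical.partition(":")
            deps[target.strip()] = rest.split()
        logical = ""
    return deps


def closure(deps, wanted):
    """Return the wanted modules plus all their dependencies, dependencies first."""
    ordered = []
    seen = set()

    def visit(path):
        if path in seen:
            return
        seen.add(path)
        for dep in deps.get(path, []):
            visit(dep)
        ordered.append(path)

    for path in wanted:
        visit(path)
    return ordered


# Hypothetical usage with made-up paths:
# deps = parse_modules_dep(open("modules.dep").read())
# closure(deps, ["/lib/modules/VERSION/kernel/drivers/scsi/sd_mod.o"])
```

The returned order puts dependencies before the modules that
need them, which is also the order they would have to be
loaded in from the ramdisk.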

Soooo... If you read this far, congratulations.  You win a
prize to be chosen and paid for by General Zod
(http://www.i-mockery.com/GeneralZod/whoiszod.asp).

I do want to thank the LNX-BBC folks for a nice CD
distribution, and especially for being able to run
'trivial-net-config' (or is it '-setup'?  I forget), and
having all the defaults work for DHCP.

Lessons learned:

* Promise makes decent hardware, but crappy software. 
Driver support is pathetic.  There is a mostly-closed
source package that can, in theory, be used with other
kernels; I've had mixed luck with it.

* Don't let Microway configure your Linux box for you.

--Jesse
