work week off to a bad start already

Jul 07, 2003 04:55

Ugh.

I've worked the last 7 hours instead of sleeping. I started getting _LOTS_ of pages from Nagios for unrelated systems I monitor. [Plug: Nagios is great. I'm monitoring 111 hosts and 207 services and I've barely scratched the surface of what it can do.] I tried to figure out what was happening by connecting with SSH, but things were really confusing: some systems were down and others were just unreachable from any non-adjacent hosts, network issues that suggested the firewall had been reconfigured or was failing in some _interesting_ way.

Rode into work and talked to the coworkers who were already there.

Apparently there was a problem with two out of four power conditioners in the data center at work (ironic, eh?), some time Sunday evening.

There were various hardware failures, including the primary Cisco PIX 515 (which now fails to communicate over the serial console), and that caused the problem with the secondary that was supposed to take over.

We tried the primary alone, the secondary alone, primary then secondary, secondary then primary. Nothing helped.

I contacted Cisco and told them the serial numbers involved, but they had those numbers listed for companies in New York and Michigan. Apparently both units had been refurbished, and Cisco hadn't bothered to change the serial numbers or update their own database to reflect the fact that the units had been sent back out to a different customer. We had much difficulty finding our contract number because we've already had these PIX replaced before, don't happen to have the serial numbers of the old units handy, and aren't associated with the new units. Gah!

Once we got them to agree that they needed to provide us with support even if they had fucked up their own database, I talked to a support engineer who started off by asking me to configure the secondary PIX to allow her to SSH into it. I explained that I didn't want to do that and couldn't anyway, because the PIX were not the external firewall and I would have to get someone else to change the external firewall to NAT her traffic to this firewall....

She asked me to run 'show tech', which produced about 300 lines of output, and to cut and paste them into an email and send them to her. I asked her (the Cisco support engineer) if she knew how to send the output to the network (like the tftp dump of the config file), but she didn't know how to do that. I asked if she could tell me how to turn off the paginating, but she didn't know how (I later found I could say 'set pager 0'). So I set the font size to unreadable, scaled the XTerm up to a huge size, and cut and pasted the output.... Yeachk.

She didn't have any useful suggestions.

I tried power cycling the switches (more Cisco hardware) adjacent to the PIX and this fixed the problem.

I'd think clustered firewalls would send gratuitous ARP messages. Oh well.
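
For anyone wondering what I mean by that: a gratuitous ARP is just a broadcast ARP frame where the sender and target IP are the same, so neighbors relearn which MAC owns that address after a failover. Here's a rough sketch in Python of what such a frame looks like on the wire. The function name, MAC, and IP are all made up for illustration (192.0.2.1 is a documentation address), it only builds the bytes rather than sending them, and I'm not claiming this is what the PIX failover code does or should do.

```python
import struct


def build_gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    Hypothetical helper for illustration only; it does not send anything.
    """
    if len(mac) != 6 or len(ip) != 4:
        raise ValueError("mac must be 6 bytes and ip must be 4 bytes")

    broadcast = b"\xff" * 6
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, oper=1 (request)
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Gratuitous part: sender IP and target IP are the same address;
    # the target MAC is left as zeros in a request.
    arp_body = mac + ip + b"\x00" * 6 + ip
    return eth_header + arp_header + arp_body


# Made-up MAC and documentation-range IP, purely as an example.
frame = build_gratuitous_arp(bytes.fromhex("020000000001"), bytes([192, 0, 2, 1]))
print(len(frame), frame.hex())
```

Actually putting this on the wire would need a raw socket and root, which I've left out on purpose.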