The Narcoleptic Computer and the Ghost of Jack Daniel

Apr 20, 2006 06:22

Ever since I built it, it's been a battle to keep my computer powered up. I'm referring to the current incarnation of Metal, the dual-proc Opteron I wrote about a few months back. I say "current incarnation," because this is not the first computer I built in this case. The first incarnation of Metal was a shiny dual-proc Athlon MP 2600 I had built a few years ago. It met an untimely death at the hands of Mr. Jack Daniel himself: I fell asleep at my desk, and knocked the remainder of my Old Fashioned over. It landed on top of the computer and it poured into a fan vent on top of the case, with predictable results.

(And yes, that Old Fashioned was of the heretical soda-filled kind.)



After the Old Fashioned incident, I reinstated older computers, retiring Metal's carcass for a while, and offloading the damaged motherboard and CPUs to a friend. I lived with some combination of my old Pentium II 300, my aging Mac and, for a time, chrispee42's old dual Pentium II 450. Anyway....

Well, last fall, I decided to rebuild Metal. From day 1, something wasn't right about it.

After the Old Fashioned incident, I cleaned out the case really well. The new Metal had almost entirely new hardware: New power supply, new motherboard, new CPUs, new RAM, new DVD-RW drive, new hard drives... The only "old" things were my Plextor CD-RW drive, an 80GB HD, and the shiny case.

From the first moment I powered it up, it was a rough ride.

I plugged in the machine, and threw the master power switch on the back. The machine powered up instantly--it didn't wait for me to push the "on" button on the front. See, these ATX power supplies have a master on/off switch on the back that determines whether the machine can power up at all. When that's set to "on," it's still supposed to stay off until you push the on button on the front. Otherwise, the PC's supposed to be in a powered off state, with a handful of things powered in a standby mode. So, this was definitely odd. I figured the BIOS must've been set up oddly (e.g. had "wake on LAN" or other such things enabled), so I shrugged it off.

That is, until it shut off nearly as quickly as it turned on.

I had numerous problems getting the machine to run at first. The power supply I bought didn't have an ATX 8X connector. The 8X connector provides supplemental +12V lines to power two CPUs. It had a 4X connector, though. That was my first problem. I rigged up an 8X connector from spare parts and got over that hurdle.

But the machine would mysteriously shut down and re-power itself during some of this debugging. I spent plenty of time reseating things, checking heat sinks, etc. I eventually got it stable, and got the machine booted. The machine stayed up for a few months until one day, *POOF*, it shut down with no warning. And it wouldn't power up after that.

I spent the better part of a weekend trying to get it to come back after that. It wouldn't power up and stay powered up. And when it did, I had trouble getting it to boot. With all the random self-powering/depowering, it eventually wouldn't even get past the power-on self-test (POST).

I went on a debugging rampage. I dropped down to 1 CPU and 1GB RAM. I swapped out RAM sticks until I found a pair that allowed me to POST. The machine was still randomly powering up and down though. So, I replaced the power supply with a higher-spec unit, an Antec Neo HE550. (I'm quite pleased with it, BTW. Much quieter and cooler.) Didn't fix it. I replaced the motherboard. Didn't fix it.

I finally, just through random tweaking, got the machine to come back up, cut down to 1 CPU and 1GB RAM, with the other CPU and 1GB RAM set aside. And, for a couple more months, it stayed up. Until a few weeks ago.

At that time, I started running some rather large-footprint computations on here (hey, I bought this system for a reason, ok?) so I decided to try to get the second CPU and RAM working. I eventually determined that the other 1GB RAM was actually bad. The other CPU, though, works like a champ. After some mysterious episodes of self-power-cycling like a crack-addled monkey, I unscrewed and rescrewed the motherboard a couple times, suspecting a mobo-to-case short, and eventually it settled down, and all was good, right?

Well, not quite. On Monday, our fair city experienced rolling blackouts as temperatures soared over 100 and Texans cranked up the A/C. (Yes, over 100 on April 17th.) My computer shut off after running out the UPS. It didn't come back afterwards. It'd power up, stay up for 3-5 seconds, and power down. Sometimes, the power switch stopped responding altogether. A couple times, I managed to get it to stay powered up, but then it shut down after a few minutes. Just *poof* without warning. WTF? Is this thing possessed? I started referring to this as the Ghost of Jack Daniel.

At this point, I still suspected a mobo-to-case short. The random shutoffs seemed related to physical interactions with the case: e.g., resting my foot on it triggered one shutdown, and sitting down in my chair coincided with another shutdown. Tuesday night, I decided to debug it. I was millimeters away from buying a new case.

It was then that I noticed, after a random shutdown that left it catatonic (it did not respond to the power button at all), that lights were still on in the case. The LAN light was on. The CPU-inserted light was on. The machine was in the powered-off standby mode I mentioned above. If it had been a short, the power supply would have gone into self-protect mode. Those lights wouldn't be on. Hmmm....

And then it hit me. All this time, (and it should be plain as day in the narrative above), it was the power button itself that was the problem. When I spilled that Old Fashioned in the computer, some of it must've either gotten on the connector or on the switch itself, causing *it* to occasionally short itself out.

See, on an ATX system, no matter what the computer itself is doing, if you hold the power switch in for a few seconds, it forces the machine to shut off. Ordinarily, a tap just asks the OS to shut down or go to a sleep/hibernate mode. The idea is, you could tap the power switch and your machine would shut down gracefully, or you could hold and force it off, which is useful if your machine froze. I had the tap-to-power-off mode disabled. Nothing disables the ungraceful force-it-off mode, though. So, what was happening was that when the switch decided to get in a partially-shorted-out mood, it'd cause the machine to power itself up and down and up and down. Wiggle things right, and it'd stay powered. They wiggle back and *BAM*: Random power-offs. It really was the Ghost of Jack Daniel, hiding somewhere in the power switch.
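For the curious, here's a rough toy model of that power-button logic. It's a sketch, not how any real motherboard firmware is written: the `Machine` class, the `press` and `replay` names, and the 4-second threshold are all my own assumptions (all I can really vouch for is "a few seconds"). Each switch closure is treated as a single event, which is a simplification.

```python
from dataclasses import dataclass

# Assumed hold time for the forced power-off; "a few seconds" is all
# the description above commits to.
FORCE_OFF_HOLD_S = 4.0


@dataclass
class Machine:
    """Toy model of a PC reacting to its momentary-contact power switch."""
    powered: bool = False
    tap_to_shutdown: bool = False  # the tap-to-power-off mode was disabled

    def press(self, held_s: float) -> str:
        """Apply one switch closure that lasts held_s seconds."""
        if not self.powered:
            # Any closure while in standby powers the machine up.
            self.powered = True
            return "power on"
        if held_s >= FORCE_OFF_HOLD_S:
            # Long hold: ungraceful forced power-off; nothing disables this.
            self.powered = False
            return "forced off"
        if self.tap_to_shutdown:
            # Short tap: ask the OS to shut down gracefully.
            self.powered = False
            return "graceful shutdown"
        return "ignored"


def replay(machine: Machine, closures: list[float]) -> list[str]:
    """Feed a sequence of closure durations (in seconds) to the machine."""
    return [machine.press(held_s) for held_s in closures]


# A sticky switch closing on its own for arbitrary durations
# (hypothetical numbers, not measurements).
events = replay(Machine(), [0.1, 6.0, 0.2, 0.3, 5.0])
print(events)
```

Feed it a switch that shorts for random lengths of time and you get the same power-up, power-down, power-up dance, even with the tap behavior turned off.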

I unhooked the power switch, and just moved the connection for the reset switch over to the power switch position. Both are momentary contact switches, and I never use reset (and only rarely hit the power switch), so that seemed like a good move. And all was good, until last night.

I was surfing away as a nifty thunderstorm blew in last night. Some lightning hits nearby, and *POOF*! The system shuts down. Everything else in the room was fine. I was like, "WTF? This is on a friggin' UPS!"

Thinking it a fluke, I power it back up. A couple minutes later, more lightning and *POOF*! Again, "WTF?!"

I go look at the UPS, a shiny APC SmartUPS 700. At $320 a pop, not exactly a Wal-Mart UPS. I power it off and back on, and it fails self-test. The battery's toast. Apparently, when I ran out the UPS during the rolling blackout, I took the battery out with it. Dang nab it. So that's today's chore: Go get a new battery for this UPS, and perhaps a beefier UPS for this machine. It seems I'm running too close to its load limits. (At idle, I use about two-thirds of its rated capacity. Something about having 2 CPUs and 4 HDs.)

*le sigh*

So, can I please, please, please have a computer that doesn't shut down randomly? Is that too much to ask?

Hopefully, by the weekend, I will.

geeky, computer
