Odd outage

Apr 10, 2008 21:13

One of the other admins for cepheid.org did a bit of maintenance Wednesday, which resulted in a minor outage for some web services. That's been fixed. Details below the cut.


One of the fun details of changing admins is that configuration information sometimes doesn't get immediately passed on. In this case, the root password for MySQL was known only to a select few. Or one person, specifically, who was unavailable.

So when a user requested access to MySQL for a new website, the only thing to do was to clobber the old password and make a new one. One of the other admins did this, and in the process also cleaned up some permission problems that had been plaguing the database for a while.

Cleaning up those permissions had an unexpected effect, though. It made a few databases, including the one that drives the Cepheid.org webmail, inaccessible to their respective users. With the database not answering, webmail was effectively crippled, though anyone accessing mail by SSHing to cepheid.org and running a mail client there (such as pine) was not affected. Similarly, anyone using a POP or IMAP client (such as Eudora, Outlook, Thunderbird, etc.) should not have been affected.

I received notification Wednesday afternoon about the problem and postponed real work to deal with it. (I made up the time by working long after quitting time.) Once I knew about it, the problem was fixed fairly quickly.

TODO:
Set up an automatic monitoring system so that if, for example, webmail breaks, I'm notified immediately.
Set up an alternate webmail using a different database for redundancy.
Make sure people know to contact me.
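
For the first TODO item, here is a rough sketch of what a webmail check could look like, in Python. None of this is in place yet; the URL and the marker string are placeholders I made up for illustration, not the real webmail address or page text. The idea is that a broken database may still give back a normal-looking page, so the check looks for text that should only appear on a working login page rather than trusting the HTTP status alone.

```python
import urllib.error
import urllib.request

# Hypothetical values: the real webmail URL and page text are assumptions.
WEBMAIL_URL = "https://cepheid.org/webmail/"
LOGIN_MARKER = "Login"


def page_looks_healthy(status, body, marker=LOGIN_MARKER):
    """Return True only for a 200 response whose body contains the marker."""
    return status == 200 and marker in body


def check_webmail(url=WEBMAIL_URL, timeout=10):
    """Fetch the webmail page and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return page_looks_healthy(resp.status, body), f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    ok, detail = check_webmail()
    if not ok:
        # Placeholder for a real notification (mail, pager, etc.).
        print(f"ALERT: webmail check failed ({detail})")
```

Something like this could run from cron every few minutes, with the print swapped for whatever actually gets my attention. It would need to run somewhere that doesn't depend on the same database, or it fails along with the thing it's watching.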

ETA: The date on this is way wrong. I didn't think that LiveJournal would set a non-NOW date for communities. Weird.