A few days ago our hosting service needed to reboot our servers so they could be relocated. It had been a very long time since they had last been rebooted. The master DNS server wasn't configured to start DNS on boot, so DNS stayed down after the reboot. Today, the slave DNS server, having been unable to refresh from the master, gave up and stopped serving the domain.
We have multiple availability monitors, and they had problems as well:
- One was set up with an outdated email address for a cell phone.
- One person got an email but was leaving the house and assumed someone else would handle it.
- One email was auto-filed into a log mailbox, so no one saw it.
All of these issues have been identified and fixed. This was our first real outage in almost two years... not too bad.
The outage affected only the SD servers; data delivery was not affected.
We're volunteers here, but that's no excuse; we should have done better.