I got a notification from my primary web server that it was having some issues.  When I logged in and looked at the logs and the system behavior, I knew something was amiss.  I decided to restore the system from my first-level backups, which are provided by the datacenter itself.  That restore simply hung; the datacenter said they had a technical issue with their backup server.  So I had them kill the restore and rebuilt the server with a fresh operating system install.  I then used my control panel’s automated restore to pull in the large daily backup I have it make.  Grabbing that very large file from the remote backup server took a while, and once the restore process finally began, I knew I had a few hours to wait while it chewed through everything.

Hours later, I had a working server, or so I thought.  Only about half of the sites came back online.  Things appeared to be fine, but those sites simply refused to load, and they all redirected back to my main site here.  Some deep digging revealed that the backup of the control panel settings itself had some kind of issue.  By that point it was 5am, and I had to get some sleep after being up all night.

Fast forward to 4:30pm Eastern: I am now reloading each account from the daily, individual account backups the system makes twice a day in two different locations.  I wound up manually reconfiguring the server settings, and I am restoring each account, one by one.  It’s slow work, but it’s what is required if you want to keep as many backups as you can without adding too much in costs.  As I type this, I have about 10 sites back online and am going one by one through the rest.  I should have them all online in the next 5 hours.
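For the curious, that one-by-one restore is conceptually just a loop over the individual account archives.  Below is a minimal sketch of the idea in Python.  The backup directory, the archive naming, and the `restore_account_backup` command are all hypothetical placeholders; every control panel has its own restore tooling, so treat this as an illustration rather than the exact procedure I ran.

```python
#!/usr/bin/env python3
"""Minimal sketch: restore hosting accounts one at a time from per-account
backup archives.  Paths and the restore command are hypothetical placeholders;
substitute whatever your control panel actually provides."""

import subprocess
import sys
from pathlib import Path

# Hypothetical location of the twice-daily, per-account backup archives.
BACKUP_DIR = Path("/backups/daily/accounts")


def restore_account(archive: Path) -> bool:
    """Run the (hypothetical) control panel restore tool for one archive."""
    result = subprocess.run(
        ["restore_account_backup", str(archive)],  # placeholder command
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"FAILED: {archive.name}\n{result.stderr}", file=sys.stderr)
        return False
    print(f"restored: {archive.name}")
    return True


def main() -> None:
    archives = sorted(BACKUP_DIR.glob("*.tar.gz"))
    failures = [a for a in archives if not restore_account(a)]
    print(f"{len(archives) - len(failures)} of {len(archives)} accounts restored")
    if failures:
        print("re-run these by hand:", ", ".join(a.name for a in failures))


if __name__ == "__main__":
    main()
```

The point of doing it this way is that each account is its own self-contained archive, so one bad backup only affects one site instead of the whole server.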

Once I get this done I intend to make some changes:

  1. The primary DNS will reside in Los Angeles instead of Chicago.  The primary web server will continue to be housed in Chicago for balanced, nationwide performance.
  2. The core control panel backups will no longer be included in the daily disaster recovery backups, due to probable corruption.
  3. Weekly testing of all phases of backup recovery will continue.
  4. The same changes will be made to all client servers, with increased testing for the next 14 days as a precaution.

The individual account backups serve two purposes: they give clients my 14-day backup archive, and they also serve as the “oh crap” final backup archive.  Today those individual backups meant the difference between recovering my hosting service and losing it entirely.  While I do not guarantee I can save your data in terms of hosting, it would be stupid of me NOT to do this in case I hit a major issue like I did today.

I can say with confidence this was not malware or a hacker; the control panel system somehow got itself corrupted and the failure cascaded from there.  Once I figured out where the issue came from, I knew what changes to make to prevent that type of failure from recurring.  Even the big folks have unexpected issues that take them offline, and this is one time something hit me that I didn’t plan for.  Luckily, my own disaster recovery setup prevented this issue from taking my hosting business out.  Just a reminder: ETC Maryland has had this multi-layer, malware- and ransomware-resistant architecture since before 2018.  My hosting was launched with this type of architecture from day 1.


*UPDATE* 9:53 pm, August 6, 2025.  All sites using the aforementioned single-account disaster recovery backups have been restored.  It took a ton of time, but this is why I have a multi-layer backup architecture.  My clients just happen to benefit from it if they have a problem while making changes.

If you are not sure if your website data is sufficiently secure, contact us for a free evaluation.
