Moving servers -or- How not to spend Labor Day weekend
Posted by Bill at 9:14am on September 11th, 2007
Our hosting company, Total Choice Hosting, does a great job for us. We’ve been with them since Cameron and I started the company in 2003. We started with a single small hosting account, then moved to a resellers account, a larger resellers account, and in May 2006 decided we should have our own dedicated server. It worked really well for us and the TCH tech staff kept our system software up to date and were very quick to fix our problems. While TCH managed the systems software, the machine was physically in a co-location facility in Phoenix.
Last spring, TCH said they were building their own operations center and would be moving all the servers from the co-lo facilities into their NOC. TCH felt they were not getting the quickest of responses from some of the co-los when there were hardware failures and wanted to do a better job for their clients (like us!).
Our turn came up for this Labor Day weekend. The week before, we spent a lot of time making sure we were ready, and calling and emailing our clients to make sure they were prepared. The move would be invisible to most of our clients since the techs would copy all the accounts over a few days before the move. The shift from the old server to the new one was effectively instantaneous because the techs changed the local DNS servers to point to the new server, so the outside world did not have to change anything. (Later we changed the official nameservers to point directly to the new server.) However, several of our clients have quite active online businesses, so we had to take extra care to copy databases at the last minute and, in one case, I actually took one website offline during the changeover.
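For anyone facing a similar move, a small script can confirm that each site's hostname already resolves to the new server once the DNS change is made. What follows is only a sketch, not what the TCH techs actually ran: the function name `find_stale_hosts` and its `resolve` parameter are made up for illustration, and any hostnames and IP address you pass in are your own.

```python
import socket


def find_stale_hosts(hostnames, new_ip, resolve=socket.gethostbyname):
    """Return {hostname: what we saw} for hosts not yet pointing at new_ip.

    `resolve` is a hypothetical hook so the lookup can be swapped out
    (for example in a test); by default it is a normal DNS lookup.
    """
    stale = {}
    for host in hostnames:
        try:
            ip = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            stale[host] = f"lookup failed: {exc}"
            continue
        if ip != new_ip:
            stale[host] = ip  # still resolving to some other address
    return stale
```

Calling `find_stale_hosts(["www.example.com"], "192.0.2.10")` returns an empty dictionary when every listed site resolves to the new address. Note that this only looks at the first IPv4 address the local resolver returns, so results can differ depending on which DNS servers the checking machine uses.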
The move started Friday night around 9 p.m. I watched things for a while but it was pretty boring so I went to bed. It’s hard to get excited watching packets go across the network. Saturday morning I found out that all but one website had already been moved, so I started testing the sites. They were looking good, so I brought the one offline website back online and went out to play.
About then I found out we had a few things that didn’t go as planned. There were a couple of databases that had some minor corruption, which was easy to fix. The new server had different firewall rules and that caused one feature on a website to break. The techs opened the correct outbound port to fix that. The server also has stricter rules on what websites can do, which caused problems for a couple of websites. I fixed those problems. Some files were changed late Friday on the old server and had to be manually moved to the new one. There were a couple of email glitches that were fixed. And a few nightly scripts were lonely for the old server and blew up, but with the appropriate introductions, they seem to be running happily on the new server.
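The firewall surprise is the kind of thing that can be checked ahead of time by trying, from the new server, to open a connection to each outside service a website depends on. Here is a rough sketch of such a check. The function name `can_connect` and its `connect` parameter are hypothetical, and the host and port to test are whatever your own sites need; the post does not say which port was blocked.

```python
import socket


def can_connect(host, port, timeout=5.0, connect=socket.create_connection):
    """Return True if an outbound TCP connection to host:port succeeds.

    `connect` is a hypothetical hook so the network call can be replaced
    in a test; by default it is socket.create_connection.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
    conn.close()
    return True
```

A result of `False` does not prove the firewall is at fault, since the remote service could simply be down, but it does point at which connections to ask the techs about.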
By the Tuesday afternoon after Labor Day, we had solved all the problems we knew about. It was time to send out an email to our clients asking if they had any other problems. And apparently there weren’t other problems because all we got were some “thank you” emails.
So how did it go? Overall, pretty well in spite of the hiccups. The vast majority of clients had no problems during the transition, and for the ones that did, the problems were rapidly fixed. In retrospect, we were too cocky about how well prepared we were. The team held a “lessons learned” meeting so we will be better prepared if we have another server move. We have documented various items we will specifically check and will have more of the staff available to do that testing. I am confident it will go better next time.
Actually, the rest of the company will do a better job next time. I plan to be on vacation.






