One of the unique traits of our computer consulting company, ASSI, is that we respond to emergency calls for new clients. Many IT companies will not walk blindly into a business that is not on retainer and has experienced a critical failure. We often walk into situations where a critical service has failed and the organization needs to be back up and running ASAP. One such call came recently from an airport, and it is a good example of what we find when walking into a system-down situation.
We received the call that the airport’s systems had failed and neither the failover nor the backup systems were working as intended. There was no email available to their president, management, pilots, technical staff, sales staff, support staff, ticketing staff, and so on. There was no access to contacts on cell phones, since they were stored on offsite servers. There were no reminders for appointments or conference calls and no vendor communication. To make matters worse, they were in the middle of rolling out new services, so the increased communication expected from travel agents across the world could not get through.
The airport had their own IT support staff, and their day-to-day operations were everything you would hope to see in an organization that relies on communications to operate. There were multiple internet service providers for redundancy, and there had been significant investments made into the IT infrastructure. The IT staff provided full details of service account names and passwords, so everything was very well documented.
So how could a company that had made the proper investments in IT hardware come to a screeching halt? Lack of talent? No, the staff were experienced in enterprise networks. Lack of budget? No, as stated, there were significant investments made in the IT infrastructure and redundancy had been built into the network. Lack of supervision? Not at all, plenty of management reviewing the plans. The answer was that they had never put a second set of eyes on their technical recovery plans. They were so entrenched in the day-to-day operations that they had never stopped to think whether their plans would work if any one part failed to go as intended.
Two ways to test your IT Disaster Recovery Plan
There are two ways to test your disaster recovery scenario. The first is to hire an independent consultant to review your IT Disaster Recovery Plan. Even if you outsource your IT support, an independent review gives you the best idea of whether your plans translate well to someone who is unfamiliar with the day-to-day operations. In the event of a disaster, you never know who in your support team may be affected and therefore unavailable to answer the little questions that may not be documented.
Back in 1999 my co-worker Louis had little plastic bugs taped all over his monitor. When I asked why he had them all there, he said it was to remind him of all the “bugs” that we know are in a system but choose to ignore because we have a workaround. By keeping that in mind, Louis made sure that these items were either addressed or documented. If you or any member of your staff couldn’t come to work, could another person get around the “bugs” that you have accepted as a part of the day-to-day procedures? These issues must be addressed and repaired so that they do not become pests to someone who is not familiar with them. Having a fresh set of eyes on your disaster recovery plan can help to uncover the deficiencies that would affect your recovery times.
The second way to test your disaster recovery scenario is to simulate a complete disaster. While costlier up front, knowing that you can recover and how long the process takes is invaluable when a real disaster strikes.
Choose a random day and tell your IT staff that you were hit with ransomware and that all of the company’s data is gone, both onsite and in the cloud. Do they have another recent backup of all your company’s data somewhere? Even better and more thorough, have your IT staff locked out of the IT offices and server room. Tell them to act as if the building sustained substantial damage and ask them to estimate how long it will take to restore services. Where will they get the equipment to restore the data that was stored offsite? Nothing prepares a company for disaster better than a mock disaster scenario.
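As a rough illustration of how the estimates from a mock disaster could be tallied, here is a minimal Python sketch. Everything in it is hypothetical: the step names, the hour figures, the owners, and the 24-hour target are invented for the example and are not taken from the airport incident. It also assumes the recovery steps run one after another, which may not hold in your environment.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    """One step of a recovery plan, as estimated during a mock disaster."""
    name: str
    estimated_hours: float
    owner: str = ""  # empty string means nobody is assigned to this step


def review_plan(steps, target_hours):
    """Total the estimates and flag steps that have no owner.

    Assumes steps are performed sequentially, so the total is a simple sum.
    """
    total = sum(step.estimated_hours for step in steps)
    unowned = [step.name for step in steps if not step.owner]
    return {
        "total_hours": total,
        "within_target": total <= target_hours,
        "unowned_steps": unowned,
    }


# Hypothetical drill results; replace with your own staff's estimates.
steps = [
    RecoveryStep("Source replacement server hardware", 8, "IT lead"),
    RecoveryStep("Retrieve offsite backup media", 4),
    RecoveryStep("Restore email data", 16, "Mail admin"),
]

report = review_plan(steps, target_hours=24)
print(report)
```

A tally like this is only as good as the estimates behind it, but writing them down makes two gaps visible right away: a total that exceeds what the business can tolerate, and steps that nobody owns.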
Restoration of Services
The state of the infrastructure was interesting because the airport staff were aware that some failover projects for critical systems existed but were incomplete. These projects were labeled as “would be nice to have but not needed for day-to-day operations.” This happens all too often: the IT staff is busy supporting your organization, and critical projects are put on the back burner.
How can you expect your staff to know about new technologies unless you send them off for training and build a test lab environment mimicking your live infrastructure? It is rare that companies directly hire a team of people for specific projects, and finding temporary staff for these high-level tasks is risky. High-level projects like these benefit most from a team of experts. Not doing the job right the first time can cause your organization to come to a standstill with no estimated restoration time. That lack of control is a feeling that you should never have to experience.
For the airport issue, ASSI staff reconfigured some of their onsite legacy equipment, which restored communications immediately. We then worked with their staff to repair the failed hardware and to restore their data from the failed backup system. Had we not reconfigured the legacy equipment, the restoration of email communication would not have been immediate; had they followed their documented plan, it would have taken 36 hours.
If you would like help reviewing your infrastructure for potential weaknesses, call Advanced Systems Solutions Inc. and we will help you to ensure that you can respond to any disaster that may occur.
Disclaimer: The above information is not intended as technical advice. Additional facts or future developments may affect subjects contained herein. Seek the advice of an IT Professional before acting or relying upon any information in this communiqué.