Empirical Research in DRP and BCP of IT Firms

Abstract

Disaster recovery planning (DRP) and the business continuity plan (BCP) have become crucial as threats from natural and man-made disasters increase. Such disasters can wipe out a company’s intellectual and soft assets and set it back by several years if its entire source code and transaction records are lost. Implementing a DRP helps an organisation quickly restore its networks and retrieve its software applications, customer account details and transaction records, so that losses are minimised and normal operations resume in the shortest possible time.

The report researches a framework for implementing DRP and BCP among the top ten Indian IT companies. In addition to presenting implementations of networks and architecture, it presents the steps to be followed for DRP and BCP implementation. It also presents the results of a study in which a survey instrument was used to gather responses from experts in the IT industry. The survey instrument examines three key areas of DRP: the cost of downtime, the perceived importance of DRP, and the current state of DRP within the organisation. The instrument contained three sets of questions, each addressing a different aspect of DRP.

The findings are that DRP systems are designed to protect at least 95% of assets and that organisations have a positive attitude towards providing budgets for DRP. All the surveyed organisations use in-house training programmes, keep their systems updated, and test and maintain them at regular intervals. The report is expected to help organisations and students who want to implement DRP or research this area further.

List of Acronyms and terms used in the Report.

Acronym Description
BCMP Business Continuity Management Plan
BCP Business Continuity Plan
BIA Business Impact Analysis
CMT Crisis Management Team
CRM Customer Relationship Management
Crore 1 crore = 10 million
DAT Damage Assessment Team
DRP Disaster Recovery Planning
ERP Enterprise Resource Planning
ISDN Integrated Services Digital Network
IT Information Technology
ITeS Information Technology Enabled Services (includes call centres)
Lakh 1 lakh = 100,000 (10 lakhs = 1 million)
LAN Local Area Network
Mbps/ Kbps Megabits per second/ Kilobits per second
ODC Offshore Development Centre
RTO Recovery Time Objective
SBU Strategic Business Units or profit centres in industries
TCS Tata Consultancy Services
VPN Virtual Private Networks

Introduction

Businesses worldwide operate under conditions that change with the political, economic, and natural environment. Business houses are constantly under threat from disasters such as earthquakes, terrorist attacks, fires, riots, power outages, and stock market crashes. Because of such risks, intellectual assets, such as classified documents and source code, and physical assets, such as infrastructure and hardware, may be compromised. In such a situation, a plan must be in place that allows businesses to recover their intellectual and physical assets and resume operations at the earliest.

It is also essential to assure clients and business partners, who have invested time and resources, that in case of a disaster their investments can be recovered within an acceptable time frame. To handle such situations, it is important to have an IT disaster recovery plan in place to counter the effects of disasters. Having a Disaster Recovery Plan (DRP) and a Business Continuity Plan (BCP) becomes especially important when a business expands into overseas markets.

While DRP and BCP will not prevent disasters or the loss of lives and property, they help reduce the losses caused by delays in restarting operations and, above all, help recover valuable company data and information. This paper discusses the important elements of a DRP and BCP for a company with global operations.

Problem Definition

Recent events such as the 9/11 attacks, Hurricane Katrina and the tsunami in South East Asia show that disasters, both natural and man-made, can strike with very little warning and destroy the infrastructure built at a location: buildings, whole towns and cities, cabling and any IT systems housed there. Benton (2007) defined disaster recovery as “the process, policies and procedures of restoring operations critical to the resumption of business, including regaining access to data (records, hardware, software, etc.), communications (incoming, outgoing, toll-free, fax, etc.), workspace, and other business processes after a natural or human-induced disaster”.

While disaster recovery would also involve reconstructing buildings, relocating people, building roads, restoring power and communications and many other activities, this paper is limited to discussing the disaster recovery plan for the IT systems of a company (Meade, 1993).

In the current environment, the threat from terrorists and also from nature places IT systems at a high risk. Since many companies have very strict rules regarding retrieval and storage of sensitive information, data tends to get centralized. If a disaster strikes the central server room where the data is stored, then all the company’s soft assets would be lost forever. Information about customers, business strategies and records, marketing and trading information and other details would become irrecoverable.

In such a scenario, a strategic plan that protects all computer-based operations necessary for the company’s day-to-day survival is imperative. A company that loses sensitive data loses not only its soft assets but also the confidence of its customers, and may well go bankrupt. With the increasing use of IT systems and dependence on business-critical information, protecting irreplaceable data has become a top business need. Because many companies regard their IT systems as critical infrastructure, regular backups are crucial so that, even after a disaster strikes, the company can resume operations within a short period of time.

Many large companies allocate up to 4 percent of their IT budget to disaster recovery systems. It is estimated that 43 percent of companies that lost data and could not replace it went bankrupt, 51 percent had to shut down within two years, and only six percent survived in the long run (Swartz, 2004).

DRP and BCP are therefore required to ensure that a company can recover quickly after a disaster, that customer confidence is retained, and that the business can continue.

Aims and Objectives of the Study

The present study aims to identify the practical problems in preparing a DRP and to suggest a framework for effective DRP and BCP that combines different risk identification and management strategies into a single integrated strategy, with which organisations can respond effectively to change in the form of opportunities, risks and regulations. DRP practices in the top ten Indian IT companies have been researched, and the views of experts in these companies are presented.

Research Questions

The following research questions are proposed:

  • Assess the framework requirements for implementing DRP and BCP activities in organisations that have one or more development centres
  • Ascertain the architecture used by different companies that have successfully implemented DRP and BCP plans
  • Assess the extent of importance given to DRP and BCP in different IT companies, their impact exposure to risks, budget and types of applications that are covered in the DRP and BCP plans

Rationale of Research

A DRP is intended to provide a framework within which companies can take decisions promptly during a business disruption. The objectives of this plan are (Kaye, 2006):

  • To identify major business risks.
  • To proactively minimize the risks to an acceptable level by taking appropriate preventive and/or alternative measures.
  • To effectively manage the consequences of business interruption caused by any event through contingency plans.
  • To effectively manage the process of returning to normal operations in a planned and efficient manner.

The scope of the corporate business continuity management plan document must include plans for restoring:

  • SBUs (Strategic Business Units) and all the Projects being executed by the SBUs
  • Shared services
  • Information Systems at all locations of the company

Need and Significance of the Study

The purpose of this study is to develop an effective DRP framework and BCP for IT organisations that provide products and services. Today’s business environment is characterized by brisk and unpredictable change. Some of those changes bring opportunities for business, while others bring challenges and threats to the organisation. In either case, a business has to be responsive and resilient, making good use of opportunities while mitigating risks.

Organisational infrastructure must be designed and planned for the continuity of business in the event of a disaster. The contemporary definition holds that a disaster is the situation created by a major event rather than the event itself; specifically, the socio-economic, developmental and political consequences of the event form the key defining aspect of a disaster. However, a number of definitions of ‘disaster’ focus on the actual hazard or event and its effect in terms of loss of life and damage to property.

In 1961, Fritz, for instance, defined disasters as “events that are concentrated in time and space, in which a society, or a relatively self-sufficient subdivision of a society, undergoes severe danger and incurs such losses to its members and physical appurtenances that the social structure is disrupted and the fulfilment of all or some of the essential functions of the society is prevented” (Fritz 1961, p. 202). In 1992, the United Nations recognised that for an event to be a disaster, it must overwhelm the response capability of a community (Coppola, 2007).

Disaster recovery planning is a recurring process, which has the goal of maintaining the availability of information or service, even in the event of a disaster. In the words of an IT expert from Infosys, “disaster occurs when one has an inability to perform his critical business functions within an acceptable period of time and it introduces two important issues: one, what is critical, and two, what is an acceptable period of time” (Russell, 2007). This varies from company to company.

Fluctuating business conditions are like a double-edged sword: any inappropriate response could cause a company to lose ground to its competitors. For example, if the online banking system of TSB Lloyds fails and operations cannot be restored within a day, the bank will lose the majority of its customers to competitor banks. As a manager at TCS, India, said, “there is a need to balance between the cost of protecting business against every conceivable eventuality and the risk of not protecting at all. The means of achieving this balance is through identifying disasters that are most likely to occur and plan for business continuity in each scenario”. This gives rise to the need for an effective framework for disaster recovery planning that can balance all the key issues involved.

Most people are overwhelmed when faced with the daunting scope of disaster planning. The subject is so large, the stakes so high, and the time needed to deal with preparatory issues so great that many firms find it easier to do nothing. The importance of disaster recovery planning for organisations is well documented, but it is unclear whether the majority of the business community is aware of it. As a result, there is limited information on the acceptance of recovery planning outside the academic world and even less within the business community.

The Indian IT Industry

The Indian IT/ITeS sector earned revenues of INR 1,98,477 crore in 2006, an increase of 31 percent over 2005. The industry is expected to grow to USD 100 billion by 2011. This growth is driven by increased outsourcing from US, European and pan-Asian countries, and the majority of companies offer software development services. Many US software product manufacturers have set up offshore development centres in India (IDC, May 2, 2007).

Table 1.1. Indian IT/ ITeS growth statistics (IDC, May 2, 2007).

(INR Crore) | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | CAGR 2006-11
Domestic IT Market | 61761 | 75050 | 88011 | 101342 | 116177 | 132133 | 16.40%
Domestic ITeS Market | 6650 | 11970 | 17955 | 24239 | 31511 | 36238 | 40.40%
Domestic IT/ ITeS Market | 68411 | 87020 | 105966 | 125581 | 147688 | 168370 | 19.70%
IT/ ITeS Exports Revenue | 130067 | 159889 | 190475 | 221844 | 254422 | 289857 | 17.40%
IT/ ITeS Industry Size | 198477 | 246909 | 296441 | 347425 | 402110 | 458228 | 18.20%
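
The CAGR column in Table 1.1 can be checked from the 2006 and 2011 figures alone. The sketch below assumes the standard compound annual growth rate formula, (end / start)^(1 / years) - 1, over the five yearly steps from 2006 to 2011. The function name `cagr` and the dictionary `table_1_1` are illustrative names, not part of the IDC data.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# 2006 and 2011 figures in INR crore, copied from Table 1.1.
table_1_1 = {
    "Domestic IT Market": (61761, 132133),
    "Domestic ITeS Market": (6650, 36238),
    "Domestic IT/ ITeS Market": (68411, 168370),
    "IT/ ITeS Exports Revenue": (130067, 289857),
    "IT/ ITeS Industry Size": (198477, 458228),
}

for segment, (value_2006, value_2011) in table_1_1.items():
    # 2006 -> 2011 spans five compounding periods.
    print(f"{segment}: {cagr(value_2006, value_2011, 5):.1%}")
```

Under this formula the computed rates agree with the CAGR column of the table to one decimal place.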

Top 10 Indian IT Companies

Among the hundreds of Indian IT companies, the survey and study have concentrated on the top 10. The following table gives brief information about these companies, listed by ranking.

Table 1.2. Statistics on top 10 Indian IT Companies (Rediff, July 17, 2008).

Company Name | Activities | Revenues | Comments
Tata Consultancy Services | IT outsourcing vendor, software development for foreign and Indian companies | $5.7 billion (March 2008 ending) |
Wipro Technologies | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO | $5 billion (March 2008 ending) |
Infosys | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO | $4.3 billion (March 2008 ending) |
Satyam Computer Services | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO, ISP services | $2 billion (March 2008 ending) |
HCL Technologies | Hardware including desktops, PC, laptops, hand held, servers, software development, IT BPO | $4.9 billion (March 2008 ending) |
Tech Mahindra | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO, CAD/ CAM services | $810 million (March 2008 ending) |
Patni Computer Systems | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO, CAD/ CAM services | $710 million (March 2008 ending) |
i-flex Solutions | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO | $600 million (March 2008 ending) |
MphasiS | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO | $538 million (March 2008 ending) |
L&T Infotech | IT outsourcing vendor, software development for foreign and Indian companies, IT BPO | $500 million (March 2008 ending) |

Main Research Findings

Based on the research and analysis presented in Chapter 5, the main findings are:

  • DRP is expected to protect at least 95% of the intellectual assets of a company.
  • Organisations suffer losses ranging from USD 0.01 million per hour to USD 45 million per hour if their systems are hit by a disaster and they do not have a DRP implemented.
  • The impact of disaster-related downtime varies from minor to critical, depending on the nature of the affected business.
  • Organisations may use multiple applications that have to be backed up under the DRP, which increases its complexity.
  • All the surveyed organisations have some form of DRP in place or in an advanced state of implementation.
  • Organisations do not consider the cost of protecting the business against conceivable eventualities to be higher than the risk of not protecting it at all.
  • With respect to the cost of downtime, organisations do not think that they can recover easily after a disaster without an effective DRP in place.
  • While some organisations implemented an organisation-wide DRP a few years ago, others took up the implementation on a project basis. All companies made efforts to keep the DRP updated, either whenever changes occurred or at specific intervals.
  • The time taken to implement a DRP can vary from a week to about 6 months.
  • All the surveyed organisations prefer to use their in-house team to develop, test and maintain the DRP.
  • All the organisations surveyed regularly carry out ongoing maintenance of their DRP implementation.
  • All organisations surveyed have periodic testing of their DRP systems.

Expected Outcome of the Study

The study is expected to help IT companies to assess the state of their DRP and BCP plans and would help them to create feasible plans that would protect them in the case of a disaster.

How this report is organised

The report is structured into various chapters and each chapter provides in depth discussion of different aspects related to the research.

Chapter 2. Literature Review: The chapter provides a discussion of DRP and BCP, different levels of threats that a company should prepare for. A discussion of what data to back up and risk assessment has also been provided.

Chapter 3. Framework for the DRP plan: The chapter presents frameworks for DRP implementations that have been done in different IT organisations and discusses the various steps involved in DRP implementation. Information about the implementation has been obtained through site visits to IT companies and through emails.

Chapter 4. Framework for the BCP Plan: The chapter presents frameworks for BCP implementations that have been done in different IT organisations and discusses the various steps involved in BCP implementation. Information about the implementation has been obtained through site visits to IT companies and through emails.

Chapter 5. Research Analysis and Findings: The chapter discusses results of the survey instruments that have been used to obtain information about the DRP and BCP implementations in various organisations.

Chapter 6. Conclusions and Recommendations: The chapter draws conclusions and makes recommendations for DRP and BCP implementations.

References: The section provides a list of references used in preparing the report. The range of references used includes books, peer-reviewed journals, reliable websites and other sources.

Literature Review

This section provides a literature review of important concepts related to risk management and threat levels that have to be considered for the DRP activities.

Understanding the importance of DRP and BCP

When Hurricane Katrina struck the Gulf Coast in August 2005, it damaged 90,000 square miles, an area the size of Oregon. Fully 750,000 people were left homeless in New Orleans alone, and Mississippi’s coastal area had 110,000 more displaced people. The storm caused the largest migration of doctors since World War II and closed insurance offices and financial services companies along with most other businesses in the disaster area.

Communities were inundated by water, causing local government employees to flee. Fifty-four New Orleans Police Department (NOPD) employees were ultimately fired for dereliction of duty for leaving their posts during the storm, and 247 were “AWOL” one week after the storm. Eighty percent of NOPD’s 1,700 employees were rendered homeless by the storm, and 700 NOPD members and their families (along with 200 fire department members and their families, sheriff deputies, emergency medical services staff, and essential government workers) lived on a cruise ship rented by the Federal Emergency Management Agency (FEMA) for six months after the storm.

FEMA usually provides trailers as disaster housing, but in New Orleans there was no place to put a trailer that had the necessary water and sewer hook-ups. While the damage to physical infrastructure was massive, buildings can be rebuilt. The loss of data, financial records, transaction details, and bank account and credit information was irreplaceable, and many people who had healthy bank accounts were rendered bankrupt within a few hours of nature’s fury (Edwards, 2006).

Potter (3 April 2003) reports that the 9/11 attacks on the twin towers of the World Trade Center in New York, in addition to taking thousands of lives, destroyed buildings that hosted many banking and financial institutions. Valuable information about records and transactions, money transfers, share and stock trading, debt instruments and more was lost. The loss happened simply because the companies had not fully implemented DRP activities.

The author reports that although almost all companies back up their critical IT systems and data, more than a quarter of them still do not have a disaster recovery plan in place. Half of those that do have plans fail to test them. Also, 15% of companies do not take their backups off-site. This is despite the fact that 92% of businesses now consider disaster recovery planning an important driver of their IT expenditure.

About 58% of businesses surveyed would suffer significant business disruption if their IT systems were not available for a day, the highest figure recorded since the surveys began. This rises to 70% for large companies. Some 68% of companies polled believe that business continuity in a disaster situation is a very important driver of their information security expenditure, and a further 24% say it is important. Only 2% say it is not very important.

The author further points out that 28% of companies do not have a disaster recovery plan in place, almost half of the disaster recovery plans have not been tested in the last year, and 10% of companies with a disaster recovery plan do not store backups off-site. When companies suffered a systems failure or data corruption incident, 31% had no contingency plan in place and a further 10% found their contingency plan to be ineffective.

The year 2000 problem has raised the business continuity consciousness level of business. Where year 2000 risks have been mitigated, contingency plans have been developed just in case. Businesses have been forced to assess their resilience in the face of the threat that the millennium bug will cause their systems to crash. Leveraging this learning, some IT professionals have taken the opportunity to extend contingency plans to cover not just year 2000 issues, but broader disruptions, thereby making the most of the year 2000 problem and seizing a great opportunity. In the past, disaster recovery was provided for production applications as a matter of practice.

Whatever was needed for recovery was provided at the time an application was moved into production status. If these needs were not met, the application did not go into production until they were. It was that simple. And since the mainframe that ran the applications was tightly managed, rules were easy to enforce. Then came computers on every desktop and client-server computing, all under distributed management. The new generation of manager was under pressure to deliver, and often knew very little about the disciplines of the data centre, specifically production turnover, change management and capacity planning. In fact, these disciplines were often seen as impediments to fast action.

As the mission critical applications moved to distributed servers under distributed management, disaster recovery plans were inconsistently developed and tested. At the same time, business continuity planning became essential as technology became indispensable for the conduct of business. So disaster recovery planning conceptually broadened to encompass business continuity planning. But to this observer, both concepts are, more often than not, dealt with after the fact.

We’re inspecting disaster recovery and business continuity into existence rather than engineering them in. And there is not enough managerial support for the work that has to go into the proper approach; the people responsible for these issues are fighting for attention and budget. The lack of serious attention paid to disaster recovery has, in some cases, been enough to put mission critical applications at severe risk. The most appropriate moment to incorporate recovery and contingency is at the time of development and implementation (Facer, 2001).

The following figure shows the relation between the amount of money spent on DRP and the probability of the systems being affected.

The importance of DRP and of protecting IT systems has increased as more and more information systems interact directly with the general public and demand for guaranteed continuous operation has grown. IT spending on DRP is in many cases made grudgingly: IT managers are forced to watch the bottom line to ensure that projects do not fall into the red, since the expense brings no visible ‘return’ on investment. Such attitudes are often myopic; it is only after a disaster strikes that managers realise the importance of a DRP, and by then it is too late. Another problem is that after a DRP is implemented, periodic maintenance, updating and testing are often neglected. Vital links in the plan are then found to be missing, and an incomplete DRP is as bad as having no plan at all (Gilchrist, 2001).

Understanding Risk Analysis and Management

Risk and threat analysis forms a very important part of the DRP and BCP, and an assessment of risk is crucial to a business. Risk grows from threats, and any unforeseen event can be a threat. Some types of risk are not covered by DRP and BCP; these include business risks such as failed products, increased competition, changes in technology and customer preferences, changes in government policies and so on. Risk is associated with the uncertainty of financial loss, the variations between actual and expected results, or the probability that a loss has occurred or will occur.

A risk assessment analysis is a rational and orderly approach, and a comprehensive solution, to problem identification and probability determination. It is also a method for estimating the expected loss from the occurrence of some adverse event. The key word here is estimating, because risk analysis will never be an exact science and we are discussing probabilities. Nevertheless, the answer to most, if not all, questions regarding one’s security exposures can be determined by a detailed risk-assessment analysis.

Risk analysis provides management with information on which to base decisions. Is it always best to prevent the occurrence of a situation? Is it always possible? Is it sufficient simply to recognize that an adverse potential exists and for now do nothing but be aware of the hazard? The eventual goal of risk analysis is to strike an economic balance between the impact of risk on the enterprise and the cost of implementing prevention and protective measures.
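
The economic balance described above can be illustrated with a small calculation. The sketch below is not taken from the cited literature: it assumes a common simplification in which the expected yearly loss from a threat is its frequency multiplied by the downtime per occurrence and the loss per hour of downtime, and a safeguard is worthwhile when the loss it avoids exceeds its yearly cost. The class `Threat`, both function names and every figure in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    occurrences_per_year: float  # hypothetical estimate of how often it strikes
    outage_hours: float          # hypothetical downtime per occurrence


def expected_annual_loss(threat: Threat, hourly_loss_usd: float) -> float:
    """Expected yearly loss: frequency x downtime per event x loss per hour."""
    return threat.occurrences_per_year * threat.outage_hours * hourly_loss_usd


def safeguard_is_justified(
    threat: Threat,
    hourly_loss_usd: float,
    annual_safeguard_cost_usd: float,
    loss_reduction: float,
) -> bool:
    """True when the loss avoided per year exceeds the yearly safeguard cost.

    loss_reduction is the assumed fraction (0 to 1) of the expected loss
    that the safeguard removes.
    """
    if not 0.0 <= loss_reduction <= 1.0:
        raise ValueError("loss_reduction must be between 0 and 1")
    avoided = expected_annual_loss(threat, hourly_loss_usd) * loss_reduction
    return avoided > annual_safeguard_cost_usd


# Illustrative numbers only: two LAN failures a year, four hours each,
# at an assumed loss of USD 10,000 per hour of downtime.
lan_failure = Threat("LAN failure", occurrences_per_year=2, outage_hours=4)
hourly_loss = 10_000

print(expected_annual_loss(lan_failure, hourly_loss))
print(safeguard_is_justified(lan_failure, hourly_loss, 50_000, 0.9))
print(safeguard_is_justified(lan_failure, hourly_loss, 100_000, 0.9))
```

Because the inputs are estimates, the result is a guide for management rather than an exact answer. In practice an analyst would repeat the calculation over a range of plausible frequencies and loss rates.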

A properly performed risk analysis has many benefits, a few of which are:
  • The analysis will show the current security posture (profile) of the organization.
  • It will highlight areas where greater (or lesser) security is needed.
  • It will help to assemble some of the facts needed for the development and justification of cost effective countermeasures (safeguards).
  • It will serve to increase security awareness by assessing, and then reporting, the strengths and weaknesses of security to all organizational levels from management to operations.

Risk analysis is not a task to be accomplished once and for all; it must be performed periodically if one is to stay abreast of changes in mission, facilities, and equipment. Also, since security measures designed at the inception of a system generally prove to be more effective than those superimposed later, risk analysis should have a place in the design or building phase of every new facility. Unfortunately, this is seldom the case.

The one major resource required for a risk analysis is trained manpower. For this reason, the first analysis will be the most expensive. Subsequent analyses can be based in part on previous work history; the time required to do a survey will decrease to some extent as experience and empirical knowledge are gained. The time allowed to accomplish the risk analysis should be compatible with its objectives.

Large facilities with complex, multi-shift operations and many files of data will require more time than single-shift, limited production locations. If meaningful results are to be expected, management must be willing to commit the resources necessary for accomplishing this undertaking. It is best to delay or even abandon the project unless and until the necessary resources are made available to complete it properly.

Estimating Threat Levels

An organization may face four levels of disaster, from Level 1 to Level 4, and both the effects and the disaster recovery plan differ by level. Level 1 is the least severe, while Level 4 is regarded as a catastrophe.

Disasters can be classified into (Preston, 1999):

  • Level 1 Disaster: Causes minor outage. An example of Level 1 disaster is modem failure. Some or all business processes at a location might experience minor damage, but processes will continue to run with reduced efficiency. Full processing capability of mission critical business processes and related infrastructure and people can be restored within an hour. Recovery at an alternate site may not be required (Preston, 1999).
  • Level 2 Disaster: Causes moderate outage. An example of Level 2 disaster is LAN failure. Some or all business processes at a location might experience moderate damage. Processes may or may not continue since the equipment is below the minimum capacity to run. Full processing capability of mission critical business processes and related infrastructure and people may be restored within 4 hours. An alternate recovery site may not be required for continuing business but alternate equipment or communication links may be required (Preston, 1999).
  • Level 3 Disaster: Causes a severe outage. An example of a Level 3 disaster is a riot. Infrastructure ceases to function. Full processing capability of all business processes from that location and related infrastructure may be restored within 1-2 days. Use of an alternate recovery site will be required (Preston, 1999).
  • Level 4 Disaster: Is a catastrophe, such as earthquake, war, or a major terrorist attack. This type of disaster results in major disruption of services. Full processing capability cannot be achieved for a substantial period of time. Recovery will require use of alternate recovery site (Preston, 1999).

The following table gives details of these threat levels.

Table 2.1. Threat Level Analysis (Preston, 1999).

Type Of Disaster Description
Minor Outage (Level 1) Some or all business processes at a location experience minor damage / outage but processes will continue on a degraded basis. Full processing capability of mission critical business processes and related infrastructure and people can be restored within 1 hourby getting the necessary infrastructure, people and data operational. Recovery at alternate site is determined not to be required. It is assumed that the usual office premises & people are available to the business. e.g.
  1. A link between two locations is temporarily unavailable
  2. Modem fails.
  3. Sparks in electrical connections force temporary shutdown of servers / routers in that area. Operations resumed as soon as electrical connections are repaired
  4. Virus and hacking attacks or due to improper behaviour of employees
Moderate Outage
(Level 2)
Some or all business processes at a location experience moderate damage / outage. Processes may or may not continue on a degraded basis. Full processing capability of mission critical business processes and related infrastructure and people may be restored within 4 hours. An alternate site may not be requiredfor continuing business but alternate equipment or route (in case of communication links) may be required depending on the criticality of the business process and infrastructure. It is assumed that the usual office premises and people are available to the business. e.g.
  1. Power surge damages equipment
  2. Link Failure (that can be recovered within 4 Hours)
  3. LAN Failure
Disaster (Level 3) A centre has experienced severe disaster. There is a total shut down of infrastructure. Full processing capability of all business processes from that location and related infrastructure and people may be restored within 1-2 days. Use of an alternate recovery site will be required. It is assumed that premises and equipment are inaccessible, but people can congregate elsewhere if required. e.g.
  1. Flood / Rain / Snow makes office premises at one of the offices inaccessible.
  2. Riots / Arson at a location near one of the offices renders the office premises inaccessible.
  3. Extended power cut.
Catastrophe (Level 4) A centre has experienced a major disaster that will likely result in a major disruption of services. Full processing capability cannot be achieved for a substantial period of time. Recovery will require use of an alternate processing site as well as offsite offices for employees over an extended period of time. e.g.
  1. War
  2. Earthquake
  3. Terrorist Attacks / Bombing
  4. Extended Communal Riots etc.

Table 1. Four Levels of Threats (Preston, 1999).
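
The restoration windows in the table can be read as a simple lookup. The following Python sketch is illustrative only: it assumes that the expected restoration time alone decides the level (the table also weighs the extent of damage and the availability of premises and people), and it reads "1-2 days" as an upper bound of 48 hours. The names `ThreatLevel` and `classify_outage` are hypothetical and do not come from Preston (1999).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatLevel:
    level: int
    name: str
    alternate_site_required: bool


# Upper bounds on expected restoration time, in hours, taken from the table:
# 1 hour (Level 1), 4 hours (Level 2), 1-2 days (Level 3, assumed to mean 48 hours).
LEVELS = [
    (1.0, ThreatLevel(1, "Minor Outage", False)),
    (4.0, ThreatLevel(2, "Moderate Outage", False)),
    (48.0, ThreatLevel(3, "Disaster", True)),
]
# Anything that cannot be restored within the Level 3 window.
CATASTROPHE = ThreatLevel(4, "Catastrophe", True)


def classify_outage(expected_restore_hours: float) -> ThreatLevel:
    """Return the threat level whose restoration window covers the estimate."""
    if expected_restore_hours < 0:
        raise ValueError("restoration time cannot be negative")
    for upper_bound_hours, threat_level in LEVELS:
        if expected_restore_hours <= upper_bound_hours:
            return threat_level
    return CATASTROPHE


if __name__ == "__main__":
    for hours in (0.5, 3, 30, 500):
        print(hours, classify_outage(hours))
```

In practice the level would be assigned by the crisis team after a damage assessment, so a lookup like this could at most give a first estimate.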

A disaster may impact an organization in the following ways (Gilchrist, 2001):

  • The organization may not be able to operate from the affected site.
  • The organization may lose critical resources, such as systems, documents, and people.
  • The organization may not be able to interact and provide services to business partners, clients, brokers, vendors, and other related financial institutions.
  • In addition to incurring financial losses, disasters may impact the credibility of the company. In extreme cases, the company may lose many of the clients.

What to Back Up

The question of what to back up is best answered by asking ‘what are the company’s soft assets?’ An IT company may regard the source code of its applications and its database structure as very crucial. For example, a company such as Microsoft would consider the source code of Windows XP, MS Office and other software applications as critical and would want to ensure that the code can be recovered at any point of time.

A banking company would consider the financial records of its customers, its own receivables and credit/debit records as very important. Banks regard account details, credit card payment and receipt details, information about mortgages and loans, and Forex accounts as critical and would be interested in backing up such records. A large investment and share trading company, or a bank that deals in futures, would consider its stock portfolio as very important.

Government defence bodies would consider details of their troop deployment, state of munitions and aircraft, and status and position of different missile systems as crucial to the protection of their country and would want this information to be safe and recoverable at any point of time. The data to be backed up therefore depends on what the organization feels is crucial and important, and varies from one organization to another (Toigo, 2005).

Another issue is the question of data formats and the type of backup. An organization typically stores information in encrypted form, as binary code, or as documents such as MS Word, XLS, PDF and image files, and these formats have to be saved according to the organization’s needs. Many organizations, to preserve the integrity of their data systems, encrypt data using 128-bit or 256-bit encryption. At any point during recovery, the encryption key should be available to authorized personnel with the required level of clearance (Toigo, 2005).

Different techniques are used for backing up data, and these include the incremental backup system, which writes only data that has changed since the last backup. Considering that banks and large organizations have data sizes in the range of terabytes, a daily full backup of this volume of data would require massive resources, take excessive time and slow the system down. To get over this problem, an incremental backup is taken, which ensures that only data that has changed since the last backup is written to the backup area. Also, since backup slows down the system, companies run the data backup process as a day-end process, late at night when very few users are logged in (Hiatt, 2007).
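
The idea of incremental backup can be illustrated with a minimal Python sketch. It assumes that a file counts as "changed" when its modification time is later than the time of the previous backup. Production backup tools typically also track deletions, checksums and open files, none of which is handled here. The function name `incremental_backup` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, target: Path, last_backup_time: float) -> list[Path]:
    """Copy files under source that were modified after last_backup_time into target.

    last_backup_time is a POSIX timestamp (seconds since the epoch).
    Returns the paths, relative to source, of the files that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime <= last_backup_time:
            continue  # unchanged since the last backup, so it is skipped
        relative = path.relative_to(source)
        destination = target / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)  # copy2 also preserves file timestamps
        copied.append(relative)
    return copied
```

A day-end job could call this function with the timestamp recorded at the end of the previous run, so that only the day's changes are written to the backup area.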

It is worth remembering this statement: “When it comes to back up, members of organization are paranoid. While some feel that every little bit of email or document that they have created (which would be probably be deleted by the recipient) has to be backed up, others tend to develop paranoia that their documents or writing would be available for everyone to see and they would not want to share it with others. The management has to step in at a certain stage and frame a policy on what is worth backing and what is best left on the PC of a warehouse assistant clerk” (Kaye, 2006).

Research Methodology

The term methodology refers to the approach taken for the research process, from the theoretical framework and hypothesis to the gathering and analysis of data. The term method refers to the various means by which data can be collected and analysed. The methodological assumption is concerned with the process of the research, from the theoretical underpinning to the collection and analysis of the data (Silverman, 2001).

Qualitative and Quantitative Methods

Data that is collected can be designated into two basic categories, quantitative and qualitative. This also formulates what type of research a study will be conducting: quantitative or qualitative. Denzin (2000) described quantitative research as “the research which gathers data that is measurable in some way and which is usually analysed statistically”. This type of data is mainly concerned with how much there is of something, how fast things are done, and so on.

The data collected in this instance is always in the form of numbers. In order to obtain quantitative data, one should have a specific framework about what has to be researched, what should be known, types of inputs that are admissible and so on. Such an approach can help in designing the questionnaire, making observations and so on. Denzin also defined qualitative research as “the research that gathers data that provides a detailed description of whatever is being researched”. Both types of research have their supporters and detractors: some claim that quantitative research is much more scientific, while others argue that qualitative research is required to examine a specific issue in depth.

Researchers who support quantitative research argue that numerical data can be statistically analysed, and that in this way it can be established whether it is valid and reliable and whether it can be generalized. Numerical data can also be compared with other studies that use the same measures and scales. With qualitative research this is not so easily achieved, as no specific method or scale of measurement is kept. This is the main disadvantage of qualitative research: its findings cannot be generalized to larger populations with a large degree of certainty and validity.

This happens because the findings are not tested and evaluated statistically to establish whether they are due to chance or whether they are statistically significant, and to what extent. Another advantage of quantitative over qualitative research is that qualitative research is descriptive and often subjective, as it depends on the researcher’s perspective or on how the researcher registers certain behaviours.

Another researcher conducting the same study may interpret the same qualitative data in a completely different way. Quantitative research does not show this disadvantage, as all the data is in the form of numbers and can therefore be interpreted in only one way, that given by the objective value of each number. However, qualitative research has many advantages of its own, which quantitative research does not offer. It is usually through this type of research that a rich, in-depth insight into an individual or a group can be given, by being far more detailed and by recognizing the uniqueness of each individual. This type of research recognizes the importance of the subjective feelings of those who are studied.

Research method used in this report

A combination of qualitative and quantitative analysis has been used for the research. The research objectives were pursued mainly through the study of practical aspects involved in disaster recovery planning and through a survey via e-mail questionnaire. To understand industry practices in planning for business continuity, it is necessary to visit organisations, study the various steps involved in developing a DRP, and identify the key problems in the development process. Research was performed by visiting the top 10 Indian IT companies and interviewing experts who plan and implement these practices in their organisations. A combination of telephone interviews and a survey instrument was used to obtain the required information.

The survey questions can be categorised under three main areas of DRP and BCP: cost of downtime, perceived importance of DRP and state of DRP within organisation. However, the overall industry response on these aspects will be studied with the help of survey via e-mail questionnaire.

Literature data will be accessed through University of Liverpool Library and Information Service using a selection of tertiary and secondary information sources such as the library OPAC, commercial bibliographic databases and Internet search engines and directories. Sources accessed and retrieved will be used to write the literature review.

In addition, a lot of information about the IT architecture was obtained through interviews and through emails with the IT experts in different companies.

Questionnaire Design

In the light of the above discussion, to understand an effective DRP framework for SMEs, a survey instrument was designed to gather the industry response to the following sets of questions. The questions help in studying three key areas of DRP: cost of downtime, perceived importance of DRP and current state of DRP within the organisation, with respect to the DRP process discussed above. The instrument had three sets of questions that asked for responses on different aspects of the DRP. The questions are given below:

Set A – Cost of downtime

  1. How many projects does the organisation undertake per annum, and how many of these can you afford to lose without significant impact on the company?
  2. What is the cost of lost productivity and lost revenue for every one hour of downtime?
  3. How will the collaborative business processes with partners, suppliers and customers be affected by unexpected downtime?
  4. What is the average yearly revenue of the company?
  5. Do you depend on one or more critical IT applications such as ERP or CRM?

Set B – Perceived importance of DRP

  1. Have you planned for disaster recovery? If not, are you looking to develop a DRP in the near future?
  2. Do you think the cost of protecting the business against conceivable eventualities is higher than the risk of not protecting it at all?
  3. With respect to cost of downtime, do you think the organisation can recover easily after a disaster without an effective DRP in place?

Set C – Current state of DRP

  1. If the organisation practises DRP, when was the plan prepared and when was it last updated?
  2. How long did it take to develop the DRP, and what was the major hurdle you faced in the process?
  3. Have you taken help from an outside consultant in developing the DRP?
  4. Does your organisation practise ongoing maintenance of the DRP?
  5. When was the DRP last tested?

Study method

The survey was administered to IT experts from different Indian IT companies, and interviews were also conducted to get the industry response to the above sets of questions. The reason for choosing this option is that the response rate for questionnaires sent by post or email is highly uncertain, which may hamper the continuity of the dissertation. Taking this into account, and with the help of personal contacts and support from friends working in various organisations, the possibility of getting appointments for interviews, either in person or by telephone, was much higher. Thus, interviews were used as the basis of industry data collection.

Data Analysis

All the data collected would be categorised as per three key areas of DRP- cost of downtime, perceived importance of DRP, current state of DRP in company. Then the interrelation between them will be determined by analysing the following issues:

  • What is the proportion of downtime cost in total company revenue?
  • What importance is given to DRP when this proportion is lower, and when it is higher?
  • How does the state of DRP change in relation to this proportion and to the perceived importance of DRP?

The analysed data in the light of above issues forms the basis for the development of framework for effective DRP by relating all the key areas of DRP in an efficient practical way.
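
The first issue, the proportion of downtime cost in total revenue, can be worked through with a small calculation. The Python sketch below is an assumption about how the survey answers might be combined: it multiplies the hourly downtime cost (Set A, question 2) by an assumed number of downtime hours per year and divides by the yearly revenue (Set A, question 4). The function name and the figures in the usage example are hypothetical and are not survey results.

```python
def downtime_cost_proportion(cost_per_hour: float,
                             downtime_hours_per_year: float,
                             yearly_revenue: float) -> float:
    """Return yearly downtime cost as a fraction of yearly revenue."""
    if yearly_revenue <= 0:
        raise ValueError("yearly revenue must be positive")
    if cost_per_hour < 0 or downtime_hours_per_year < 0:
        raise ValueError("cost and downtime hours cannot be negative")
    return (cost_per_hour * downtime_hours_per_year) / yearly_revenue


if __name__ == "__main__":
    # Illustrative figures only: 50,000 per hour, 8 hours of downtime a year,
    # 20,000,000 yearly revenue.
    proportion = downtime_cost_proportion(50_000, 8, 20_000_000)
    print(f"Downtime cost is {proportion:.1%} of yearly revenue")
```

Companies could then be grouped by this proportion and compared on their answers to Set B and Set C.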

Framework for DRP

This chapter provides a framework for constructing DRP network for IT and other companies that operate across multiple locations. Information for this section has been obtained by field visits and with extensive literature review and observation of actual implementation plans in different companies. The chapter forms one of the important features of the report and would help in practical implementation.

Information is the key to survival for organizations. Information could be stored either electronically or as hard copies. Disaster Recovery Plan (DRP) is a set of procedures designed to restore information systems. A DRP mostly deals with technological issues and also recommends infrastructure that should be implemented to prevent damages when a disaster occurs. A disaster can make the business processes totally or partially unavailable.

Business Continuity Plan (BCP) focuses on sustaining the business processes of a company during and after a disaster and this plan is a continuation of the DRP and cannot be implemented in isolation. A BCP lists the actions to be taken, the resources to be used, and the procedures to be followed before, during, and after a disaster. An IT disaster recovery plan is implemented for an organization in this section (Facer, 2001).

The DRP team within a company is responsible for performing the business impact analysis (a process of classifying information systems resources based on criticality) and for the development and maintenance of a DRP. Tasks that need to be covered are included in the BCP document. The DRP team should also keep the BCP document up to date. This responsibility includes periodic reviews of the document, both scheduled (time driven) and unscheduled (event driven).

The DRP defines a Recovery Time Objective (RTO) that specifies a time frame for recovering critical business processes. The DRP meets the needs of critical business processes in the event of disruption extending beyond the time frame. Recovery capability is defined for each Strategic Business Unit (SBU), including all projects being executed under the SBU, and for each shared service, location and Offshore Development Centre. In the event of any moderate / minor disaster, the recovery capability should ensure that the business processes work seamlessly without affecting any other dependent critical business processes. For example, if the main power grid is disrupted, there must be standby facilities like generators to ensure that power is available (Facer, 2001).

Hypothetical Company Description

In this chapter, a DRP plan would be implemented for an IT company called ABC Ltd. The plan is based on literature review and actual implementations done at different IT companies and while each company may have its own modalities and priorities, the common elements of DRP are discussed. The following illustration shows how the company is organized.

The above figure shows how the different assets and nodes of ABC Ltd. are organized. The company has its headquarters at New York and a number of units and branches in areas such as Washington, Rochester, Syracuse and others. The company also has a number of offshore development centres, and these are identified as ABC Europe, ABC Japan, ABC Australia, etc. In addition, the company has a number of clients, and these are identified as Client 1, Client 2.

Defining the Organization Chart for DRP

Before implementing a DRP, it is essential that an organization chart be created that would identify key employees who would be members of the DRP team. The following figure illustrates the organization chart of ABC Ltd.

Protecting Intellectual assets with the DRP

In a business relationship, a client invests in internal resources such as personnel and funds to set up infrastructure. In addition, clients may provide a company with resources in the form of confidential information, raw source code, initial drawings and machinery. A company serving its clients has similarly invested funds and other resources in the business engagement. These investments represent assets. Companies must take preventive actions, such as setting up a dedicated security team or formulating policies that help reduce damage when disasters occur.

IT Team Security Structure

The IT Security Team of a company is responsible for implementing and maintaining the corporate security policy at all ODC locations and other support units. A dedicated Security Officer should be assigned to all the units. In addition, the company needs to conduct security awareness programmes for all ODCs. The following figure shows a typical IT Security Team structure.

This figure shows the structure of the IT Security Team of a company, ABC Ltd. The figure shows the various SBUs and their locations. It also lists the responsibilities of the IT security team of the SBU and the centre.

An important point to note is that these teams are expected only to ensure that systems are started and data recovery procedures are initiated. They are not expected to act as application experts for all the running projects. In the event of a disaster, it is the individual project teams that would configure, set up and install their code and applications.

The DRP Network Diagram

The DRP would need to cover all these units and assets. To allow quick back up and DRP procedures for the company, the following network diagram is proposed.

In the diagram, the connectivity is allowed through a primary ISDN Back Up Line and a Dial Up Line. A separate ISDN line for backup is required since the backup process consumes extra bandwidth and may slow down regular business processes.

Based on corporate security policy, all the locations with a direct Internet access/connection should be secured by deploying firewalls. A dedicated team of professionals, certified in various technologies, can centrally manage the firewalls. A change management procedure is also needed so that any desired change can be incorporated in the existing set-up at short notice. If backup hardware exists when a disaster occurs, it can be used in the disaster recovery plan to restore services. Gateways can be protected by installing Checkpoint Firewall Modules in the organization network.

This enterprise-wide implementation is managed using a central management console. At each location a De-Militarized Zone (DMZ) must be created to protect important servers. It is also necessary to ensure that the policies installed on the Checkpoint Firewall Modules are based on the corporate network security policies. Precautions must be taken against Internet hacking and vulnerabilities. Vulnerabilities are holes or weak points in the network. The following figure shows a sample firewall installation for a location (Preston, 1999).

The Firewall would ensure that unauthorized users would not be able to enter the network when back up processes are running or when a DRP plan is being implemented during a disaster.

Developing the DRP involves the following steps (Preston, 1999).

  • Risk Assessment
  • Business Impact Analysis
  • Strategy Selection and Implementation
  • Testing
  • Maintenance

The next sections provide details of these steps.

Risk Assessment

In this phase, risks to the business processes are identified, existing mitigation measures are assessed, and further mitigation measures are recommended wherever necessary. The activities in this phase help DRP administrators to determine the extent of the potential threat and the risk associated with the IT infrastructure and IT applications of the company. A threat is any circumstance or event that can potentially cause harm to the business. The risk assessment phase includes the following (Hiatt, 2007):

  • Inventory: Documents the various business processes, hardware, software, communication links, documents, and associated people using standard templates developed by the risk assessment team.
  • Threat analysis: Identifies various threats to the business processes. It also identifies the probability of a threat being executed and the potential impact a threat will have on the business in the event of its execution. This is done using a standard template developed by the risk assessment team. The risk assessment team identifies a list of over 35 possible threats to any asset. Based on this list each location is assessed for the probability of each threat being executed and the potential impact on the business processes.
  • Vulnerability analysis: Scans critical servers and hardware devices owned by the company periodically for identifying vulnerabilities and taking corrective actions based on the audit reports. These reports should be studied for their completeness and adequacy. In addition, while arriving at the probability of a threat being executed, the existing vulnerabilities of each location must be analysed.
  • Business Risk Assessment: Includes a detailed assessment of the practices followed by the business units with respect to risk management. The risk assessment team should conduct detailed interviews using standard questionnaires with senior representatives of the business units to understand the risk management practices of the individual business units.
  • Single Point of Failure (SPOF) Analysis: Identifies the most vulnerable business process. A SPOF is the weakest link in a business process. Each SBU must identify the SPOF at their locations.
  • Risk Matrix: Analyses the identified risk, derived by qualitative analysis of various threats and vulnerabilities to business processes through threats and vulnerabilities analysis, business risk assessment and SPOF analysis. The risk areas are classified as Very High Risk Areas, High Risk Areas, Medium Risk Areas, and Low Risk Areas. You can also recommend mitigation measures for each risk area identified.
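
The risk matrix step can be illustrated with a short Python sketch. The source describes the classification as the result of qualitative analysis. The numeric scoring below is a swapped-in simplification: it assumes that probability and impact are each rated on a scale of 1 to 4 and that their product is mapped to the four risk areas. The scale, the thresholds and the function name `risk_area` are all assumptions made for illustration.

```python
def risk_area(probability: int, impact: int) -> str:
    """Map a threat's probability and impact ratings (1 to 4) to a risk area.

    The score is probability * impact, which ranges from 1 to 16.
    The thresholds are assumed for illustration only.
    """
    for rating in (probability, impact):
        if rating not in range(1, 5):
            raise ValueError("ratings must be between 1 and 4")
    score = probability * impact
    if score >= 12:
        return "Very High Risk Area"
    if score >= 8:
        return "High Risk Area"
    if score >= 4:
        return "Medium Risk Area"
    return "Low Risk Area"


if __name__ == "__main__":
    # Hypothetical ratings for a few threats at one location.
    threats = {"Extended power cut": (3, 3), "Earthquake": (1, 4), "Link failure": (4, 2)}
    for name, (probability, impact) in threats.items():
        print(f"{name}: {risk_area(probability, impact)}")
```

Each location would be rated against the full list of threats, and mitigation measures recommended starting from the highest risk areas.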

The following figure illustrates the risk analysis for the company.

A number of templates have to be used at this stage to gather information about a project. These would provide micro information at a project level or at a client level. Some templates that need to be used include (Ambs, 2000):

  • Template for DRP Resource Requirements: This template is used to gather data for resources that are required to prepare a DRP.
  • Template For Project: This template is used to gather data about a project and helps to create a DRP at a project level.
  • Template For Project Team Details: This template is used to gather details of the project team members. The data is used to identify key members who may need to be moved to an alternate recovery site in case of a disaster.
  • Template For Client Team Details: This template is used to gather data about the client team details. Members identified here can be contacted in case of a disaster.
  • Template For Resource Requirement at Project Locations: This template is used to gather details of resources required at the alternate recovery site.
  • Template For Project DR Alternate Site: This template is used to gather data for an alternate recovery site.
  • Template At DR Location For People And Resources: This template is useful to gather data about people and other resources required at the alternate site.
  • Template For Min Required Resources At Alternate Site: This template is used to gather data about the minimum resources required at the alternate recovery site. Details of software and hardware that would be required need to be listed.
  • Template For Project Recovery Plan: This template is used to gather data for project recovery.

A sample template is shown below:

Project Disaster Recovery Plan – Project DR Procedures

Backup and Recovery Procedures
Indicate backup procedures and other details for each software resource (e.g. database, code under development etc.) and paper-based resource (e.g. hard copy of contract signed with customer etc.)

Backup Procedures
  • Frequency of Backup: Weekly
  • Location of Stored Data: CA
  • File Naming Convention: 8.3
  • Description:
  • Responsibility of taking Backup: Jane Doe

Recovery Testing Procedures
Indicate how frequently backed up data will be tested for recovery, what the sampling methodology will be, who will test for recovery, who will approve test results etc.
  • Frequency of Recovery Testing: Monthly
  • Sampling Method for Recovery Testing: Random
  • Description:
  • Responsibility: John Doe

Recovery Procedures
Describe the procedures that will be used to recover the resource in the event of a disaster, as a detailed step-by-step procedure to get the application/function up and running.
  • Description: Install Oracle and import all data.
  • Responsibility: Mike

Table 3.1. Sample Template for Risk Assessment (Ambs, 2000).
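
If a template like the one above is kept in structured form, simple checks become possible, for example whether recovery testing is overdue. The Python sketch below is one possible representation and not part of the template in Ambs (2000). It assumes that "Monthly" means every 30 days, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecoveryTestingPlan:
    """Recovery-testing fields of the sample template."""
    frequency_days: int   # "Monthly" is read here as every 30 days (assumption)
    sampling_method: str  # e.g. "Random"
    responsible: str      # e.g. "John Doe"


def recovery_test_overdue(plan: RecoveryTestingPlan, last_tested: date, today: date) -> bool:
    """Return True when more than frequency_days have passed since the last test."""
    return today - last_tested > timedelta(days=plan.frequency_days)


if __name__ == "__main__":
    plan = RecoveryTestingPlan(frequency_days=30, sampling_method="Random", responsible="John Doe")
    print(recovery_test_overdue(plan, last_tested=date(2024, 1, 1), today=date(2024, 2, 15)))
```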

Business Impact Analysis

The overall objective in this phase of the project is to gain an understanding of the business processes and to lay the framework of a business continuity plan for the business units. A Business Impact Analysis (BIA) must be performed with the objective of (Benton, 2007):

  • Evaluating the risk to the business due to systems and/or process failures.
  • Identifying critical business processes and the associated computing applications.
  • Estimating the impact of disruption.
  • Defining the recovery time objectives for critical business processes.

The following figure illustrates the methodology used for BIA.

This figure shows the business impact analysis approach. BIA is performed by interviewing business process owners using detailed questionnaires / templates. The primary areas on which the interviews should focus are (Benton, 2007):

  • Identification of critical business processes and critical resources and applications associated with critical business processes.
  • Interfaces between various business processes.
  • Identification of outage impacts of business function unavailability and maximum allowable downtimes.
  • Prioritisation of recovery processes through recovery time objectives.

The resultant BIA documented for each business process describes the following:

  • The outage impact for the business process.
  • The criticality of each business process based on the outage impact. The business processes are classified into four levels of criticality – Mission Critical, High Criticality, Medium Criticality, and Low Criticality Business Process.
  • The minimum human resource required to sustain the business process during a disaster.
  • Criticality of locations from where the business processes are executed.
  • Criticality of the IT infrastructure that supports the business processes.
  • Existing recovery times for the business processes in terms of hardware acquisition time and software installation time.
  • Recovery time objectives for the business processes depending on the criticality of the business process.
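
The BIA output lends itself to two simple checks: the order in which processes should be recovered, and whether the existing recovery time already meets the RTO. The Python sketch below assumes that the existing recovery time is the sum of hardware acquisition time and software installation time (that is, the two happen one after the other), and that processes are recovered by criticality first and RTO second. The class, the function names and the sample figures are hypothetical.

```python
from dataclasses import dataclass

# The four criticality levels named in the BIA, most critical first.
CRITICALITY_ORDER = {
    "Mission Critical": 0,
    "High Criticality": 1,
    "Medium Criticality": 2,
    "Low Criticality": 3,
}


@dataclass
class BusinessProcess:
    name: str
    criticality: str
    rto_hours: float
    hardware_acquisition_hours: float
    software_installation_hours: float

    @property
    def existing_recovery_hours(self) -> float:
        # Assumes acquisition and installation happen sequentially.
        return self.hardware_acquisition_hours + self.software_installation_hours


def recovery_order(processes: list[BusinessProcess]) -> list[str]:
    """Names of processes sorted by criticality, then by shortest RTO."""
    ranked = sorted(processes, key=lambda p: (CRITICALITY_ORDER[p.criticality], p.rto_hours))
    return [p.name for p in ranked]


def rto_gaps(processes: list[BusinessProcess]) -> list[str]:
    """Names of processes whose existing recovery time exceeds their RTO."""
    return [p.name for p in processes if p.existing_recovery_hours > p.rto_hours]


if __name__ == "__main__":
    sample = [
        BusinessProcess("Payroll", "Medium Criticality", 48, 24, 8),
        BusinessProcess("Client build server", "Mission Critical", 4, 6, 2),
        BusinessProcess("E-mail", "High Criticality", 8, 2, 1),
    ]
    print(recovery_order(sample))
    print(rto_gaps(sample))
```

Processes reported by `rto_gaps` are the ones for which a mitigation strategy would be needed to close the gap between current capability and the RTO.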

Strategy Selection and Implementation

Based on the risks identified in the risk analysis phase and the RTO defined in the BIA phase, strategies are identified to adequately mitigate the risks and satisfy the RTO. The strategies identified for each business process and associated resource include (Margaret, 2007):

  • Infrastructure Strategy: Includes hardware, software, and networking redundancy.
  • Alternate Site Strategy: Defines the alternate site from where the business process will be recovered in case of disaster.
  • Equipment Strategies: Ensure availability of necessary equipment at the alternate site.
  • People Strategies: Ensure availability of critical personnel at the alternate site. For example, specialized software such as databases and operating systems needs skilled people who know what needs to be done to get the applications running quickly.
  • Other Strategies: Handle insurance, service level agreements, and annual maintenance contracts to transfer risks that cannot be mitigated directly.

In order to tackle the operational contingencies for a large organization, the BCMP outlines the BCP concept of operations. The concept of operations is based on the risk mitigation strategies identified by the BCMP and approved by the corporate centre.

DRP – BCP Structure

Based on the size, geographical spread, and complexity of the organization structure, the DRP is divided into individual BCPs for each SBU, shared service, and location. The location BCP covers the infrastructure and support functions for the location, whereas the business unit BCP covers the Software Development Life Cycle (SDLC) for all projects executed from the SBU site. The shared services BCP includes the continuity plan for support services, such as finance, accounts, and human resources. Depending on the type and extent of the BCP event, the relevant BCP is invoked. The following illustration gives the BCMP structure for a company (Pfleeger, 2002).

Crisis Management Team Structure

Each BCP identifies a Crisis Management Team (CMT) that will take charge of respective operations in the event of a disaster. The composition of the various Crisis Management Teams is depicted in the following figure (Swartz, 2004).

Process Flow to identify disaster and activate DRP

Communication lines should be established that follow guidelines for reporting and managing disasters. The process flow diagram shown in the following figure describes the various stages of reporting a disaster.

The CMT may decide to activate some BCP procedures even before the DAT reports back to the CMT with the Damage Assessment Report. This ensures that, in case of a severe disaster, business processes with a low recovery time objective are activated immediately without awaiting a detailed assessment of the extent of damage.

DRP Invoking Procedures

DRP activation depends on the level of disaster. The BCP documents the following procedures during a disaster (Preston, 1999):

  1. Procedures for invoking relevant BCPs
  2. Procedures for communication of disaster. This includes procedures for:
    1. First notification of disaster and further escalation to CMT
    2. Notification of disaster to SBU heads
    3. Notification of disaster to employees
    4. Notification of disaster to customers
    5. Notification of disaster to media / media management
  3. Procedures for emergency evacuation, including roles and responsibilities of the various personnel involved in evacuation
  4. Recovery procedures for various infrastructure items and IT applications

Project Specific Disaster Recovery Plan

Each project should prepare a DRP before the start of the project, using predefined templates. Each Project Disaster Recovery Plan identifies an alternate site from where the project will be executed in case the primary location is inaccessible, based on the requirements of the project and the availability of infrastructure at the alternate site. This information is available from the various templates that are used in the risk assessment (Toigo, 2005).

  • The Plan should identify critical project team members who will be shifted to the designated alternate location in case of such an incident. Where an employee may need to travel to onsite locations during a disaster, travel and other necessary documents are kept ready.
  • Data backup for all Projects should be stored at a predetermined location.
  • In case of a disaster where the primary site becomes inaccessible, each SBU from that location communicates requirements to the CMT to shift project team members.
  • CMT facilitates transportation of key employees to alternate locations through the Administration department.
Notification Procedures

A structure for notifying affected parties of a disaster should be in place. This structure is also called a call tree. A call tree to notify the occurrence of a disaster is shown in the following figure.

The figure shows the structure used to notify affected parties about disasters. The emergency procedures for a Project DRP are:

  • Control will be transferred to on-site – if required.
  • If recovery is required from alternate location, acquire resources / infrastructure from CML.
  • Initiate process of recovering processes, data, and applications as per the RTO or identified priority.
  • Make arrangements for transportation of people (as identified in Project DRP)
  • Resume operations at alternate location.
  • Confirm all Mission Critical services are restored
  • Use call tree to notify affected parties that services have been restored from alternate location.
  • Take control back to off-shore
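
The step of recovering processes, data, and applications "as per the RTO or identified priority" can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the service names, RTO values and the tie-breaking rule are hypothetical and are not taken from any surveyed organisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str                      # hypothetical service name
    rto_hours: float               # recovery time objective, in hours
    mission_critical: bool = False


def recovery_order(services):
    """Return services sorted so that the shortest RTO is recovered first.

    Assumption: mission-critical services win ties between equal RTOs.
    """
    return sorted(services, key=lambda s: (s.rto_hours, not s.mission_critical))


def unrestored_critical(services, restored_names):
    """List mission-critical services that have not yet been restored."""
    return [s.name for s in services
            if s.mission_critical and s.name not in restored_names]


# Illustrative inventory; every name and RTO value here is an assumption.
SERVICES = [
    Service("build-server", rto_hours=24),
    Service("source-repository", rto_hours=4, mission_critical=True),
    Service("client-vpn-gateway", rto_hours=2, mission_critical=True),
    Service("intranet-wiki", rto_hours=72),
]
```

In this sketch, `recovery_order(SERVICES)` gives the sequence in which recovery would be started, and `unrestored_critical` supports the later step of confirming that all mission-critical services are restored before the call tree is used to announce that services are available again.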

Testing

Testing helps to evaluate the ability of recovery staff to implement the plan quickly and effectively. Each element of the BCP and DRP should be tested to confirm the accuracy of individual recovery procedures and the overall effectiveness of the plan. Plan testing is designed to determine (Pfleeger, 2002):

  • Whether the recovery teams are ready to cope with a disruption
  • Whether recovery inventories stored off-site are adequate to support recovery operations
  • Whether the business continuity plan has been properly maintained
Test Plan

Before conducting the test, a detailed test plan should be developed. The test plan includes (Pfleeger, 2002):

  • Scope of the Test – Defines the boundaries of the test. For example it lists the location, area, projects, components, and data.
  • Test objectives.
  • Test Scenario – This includes
  • Type of Test – For example Structured Walkthrough Test, Component Test or Full Function Test
  • Test Schedule
  • Description of the Test Scenario
  • Success Criteria For the Test – including the method used to evaluate the test results.
  • Test Participants
  • Sequence of Activities

In addition, maintenance procedures should be implemented for the DRP. To prevent Level 1 incidents caused by virus and hacking attacks or by improper behaviour of employees, a security policy should also be implemented. The policy would specify rules of conduct while working and rules for email, data storage, and personal storage devices such as iPods, MP3 players, mobiles with cameras and others.

Maintenance

The DRP must be maintained in a ready state that accurately reflects system requirements, procedures, and policies. IT systems undergo frequent changes because of changing business needs, technology upgrades, or new internal or external policies. It is important to review and update the BCP regularly to ensure new information is documented and contingency measures are revised if required. The DRP team is responsible for maintaining the BCP. The plan defines two types of maintenance, scheduled and unscheduled, which are briefly discussed below:

Scheduled Maintenance

Scheduled maintenance is essentially time driven and occurs as a result of a scheduled review of the BCP. The frequency and type of reviews that need to be performed to maintain a business continuity plan include:

Quarterly Reviews

Because people-related elements of a business continuity plan become outdated quickly, quarterly reviews of these portions of the plan are important. People-related elements include:

  • Recovery Team Contacts
  • Critical Personnel
  • Vendor Contacts
  • Employee Lists
  • Emergency Phone Numbers
Semi Annual Reviews

Strategy-related elements of a business continuity plan are subject to changes in business and technology. These elements should be reviewed on a semi-annual basis. Strategy-related elements include:

  • The Strategy Outline
  • Interim Strategies
  • Prevention and Mitigation
  • Resources Requirements
Annual Reviews

The complete BCP should be reviewed at least annually. The Business Continuity Management Team should meet with the management to discuss the BCP and obtain formal written approval for the same.
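
The scheduled review cycle described above (quarterly, semi-annual and annual) could be tracked with a small script. The following Python sketch is an illustration only: the element labels, the function names and the simplification of clamping the day of the month to 28 are assumptions made here, not part of any plan that was examined.

```python
from datetime import date

# Review intervals in months, following the schedule described above.
REVIEW_INTERVAL_MONTHS = {
    "people-related": 3,     # quarterly reviews
    "strategy-related": 6,   # semi-annual reviews
    "complete BCP": 12,      # annual review
}


def add_months(d, months):
    """Return the date `months` later, clamping the day to 28 for simplicity."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    return date(year, month + 1, min(d.day, 28))


def reviews_due(last_reviewed, today):
    """Return the plan elements whose next scheduled review date has arrived."""
    due = []
    for element, months in REVIEW_INTERVAL_MONTHS.items():
        if add_months(last_reviewed[element], months) <= today:
            due.append(element)
    return due
```

For example, if all three elements were last reviewed on 15 January, then on 1 May of the same year only the people-related elements would be reported as due. Unscheduled maintenance is event-driven and is deliberately left out of this sketch.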

Unscheduled Maintenance

Unscheduled maintenance is event-driven. The Business Continuity Management Team must be made aware of all business-related events that may affect the business continuity plan. Items which may cause unscheduled maintenance to the plan include:

  • Changes in operating system environments (upgrades, new operating systems).
  • Changes in the network design
  • Changes in off-site storage facilities
  • Acquisition of, or merger with, another company
  • Sale of existing business
  • Re-engineering of a critical business process
  • Launch of new products
  • Transfer of business functions between existing sites
  • Implementation of new business functions
  • Discontinuance of an existing business function
  • Consolidation of work functions
  • Outsourcing of work functions
  • Migration to new technical platforms
  • Migration to new systems applications
  • Migration to new systems hardware
  • Change in critical third-party vendors/suppliers
  • Changes in telecommunications devices/systems (voice or data) or structure/equipment. These may include EPABX and new telephone systems.
  • Transfer, promotion, or resignation of individuals on the emergency notification list or CMT/DAT/Recovery Team members.

Summary

The chapter has discussed in detail the framework of DRP for an IT company that may operate through multiple locations. A specific organisation chart and steps to be followed for DRP implementation have been presented. To protect its intellectual assets, a company first needs to define its IT team security structure, carry out the risk assessment and perform the business impact analysis.

The next important step is to select the strategy for DRP implementation and form the crisis management team structure. The company then creates the process flow to identify disasters and activate the DRP, the DRP invoking procedure, the project specific disaster recovery plan and the notification procedures. Once the DRP is in place, it is important to create a testing plan and a maintenance plan so that the DRP is in a state of readiness. This chapter is expected to serve as a guideline for organisations and managers who want to create a DRP for their organisation.

Framework for BCP

This chapter provides a framework for constructing a DRP network for IT and other companies that operate across multiple locations. Information for this section has been obtained through field visits, an extensive literature review and observation of actual implementation plans in different companies. The chapter forms one of the important features of the report and should help in practical implementation.

A Business Continuity Plan (BCP) ensures that after a disaster has occurred and the DRP is running, the business is able to recover and start functioning. The BCP is not implemented in isolation; it works along with the DRP and comes into effect after the DRP is implemented. Many clients insist that IT vendors have DRP and BCP implemented before they award them business and work orders. This is done to ensure that, in the event of a disaster, the client does not suffer huge losses when large amounts of crucial data and source code are destroyed. Though the vendor also suffers losses, the loss to clients is much greater, since they would be using software code costing a few million dollars to run their own enterprises worth billions of dollars.

In this chapter, three scenarios are discussed: Scenario 1, where a natural disaster such as a hurricane or earthquake has occurred; Scenario 2, where the secure lines and VPN network have been compromised; and Scenario 3, where hacking or a very dangerous virus attack has occurred. These three scenarios are Level 4 threats, where the business has been severely and critically compromised and there is no chance of the company resuming normal operations immediately. Detailed information on these scenarios was obtained by visiting clients and through email and telephone interviews.

The BCP is intended to provide a framework within which companies can take decisions promptly during a business disruption. The objectives of this plan are (Broder, 2002):

  • To identify major business risks.
  • To proactively minimize the risks to an acceptable level by taking appropriate preventive and/or alternative measures.
  • To effectively manage the consequences of business interruption caused by any event through contingency plans.
  • To effectively manage the process of returning to normal operations in a planned and efficient manner.

The scope of the corporate business continuity management plan document must include plans for restoring:

  • SBUs and all the Projects being executed by the SBUs
  • Shared services
  • Information Systems at all locations of the company

Scenario 1 – Natural Disaster, Earthquake, Hurricane

Natural disasters can occur at any point of time and while hurricanes give some amount of warning time, earthquakes can occur instantly and catch IT teams unawares. When such disasters strike, the whole infrastructure such as buildings, servers, computers, network wiring and others may be completely devastated. A disaster is defined as an event that causes interruption of business operations for an uncertain period of time. In this case, an IT company called ABC Ltd. has been considered.

About the Company ABC Ltd

The company has several Strategic Business Units (SBUs) or profit centres that are spread all over the world. The company provides services to various overseas clients through support units, such as offshore development cells (ODCs), onsite personnel, and sales offices (Botha, 2004). Figure 1 shows the organization structure of a company, ABC Ltd.

The figure shows the interconnectivity between different units of the company. The central hub of the company is in New York. It is a B2B central server and serves as the communication gateway and database for all business-related protocols, processes, storage area networks and others. The central hub also serves as a gateway for the different clients that the company provides services for; these are identified as Client 1, Client 2, etc. The network is connected to a number of strategic business units in regions such as Europe, Japan, Australia, the UK, Germany, etc. These centres are identified as ABC Europe, ABC Japan, etc. The clients are serviced through a network with different strategic business units (Edwards, 2006).

BCP Scenario

In this scenario, we assume that a major hurricane has struck, along with an earthquake, in the regions in which ABC Ltd. is situated. The natural disasters have taken out all the fibre optic cables and other infrastructure. The intellectual property of the company, its database containing records of transactions, software applications, customer financial records, etc., is stored in the IT systems. If the IT systems are not recovered in time, all business would cease, people would not be able to use credit cards, personal identification authentication systems would be lost and there would be utter chaos (Edwards, 2006).

For the purposes of this paper, it is assumed that a Level 4 disaster has struck the centres.

BCP Solution

The Business Continuity Management Program (BCMP) within a company is responsible for performing the business impact analysis (a process of classifying information systems resources based on criticality) and for the development and maintenance of a BCP. Tasks that need to be covered are included in the BCP document. The BCMP should also keep the BCP document up to date. This responsibility includes periodic reviews of the document, both scheduled (time-driven) and unscheduled (event-driven).

The BCP defines a Recovery Time Objective (RTO) that specifies a time frame for recovering critical business processes. The BCP meets the needs of critical business processes in the event of a disruption extending beyond this time frame. Recovery capability is defined for each Strategic Business Unit (SBU), including all Projects being executed under the SBU, and for each shared service, location and Offshore Development Centre. In the event of any moderate or minor disaster, the recovery capability should ensure that the business processes work seamlessly without affecting any other dependent critical business processes. For example, if the main power grid is disrupted, there must be standby facilities such as generators to ensure that power is available (Edwards, 2006).

The Proposed Solution

The following network is proposed for the BCP solution.

A redundant connectivity network has been proposed between different nodes in the network. According to the plan, a number of mirror cache sites have been proposed. These would take updates from different servers and, while transferring the information in the network, would also store data in storage area networks. A 2 Mbps primary line with dedicated fibre optic cabling is proposed for the connection between the central server and the mirror caches.

In addition, there would be an ISDN backup line connecting the systems, operated at 512 kbps. Further connections would have a T1 dial-up connection at 28-156 kbps. The update between the servers would be done at 12:00 GMT and at 24:00 GMT. In this manner, even if a disaster were to take out a whole continent or even the central server, there is sufficient redundancy to restart the network at reduced speeds. The data would already be stored in storage area networks, from which it can be physically retrieved and restored.
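
The fallback from the primary line to the slower backup lines can be sketched in Python as follows. This is a hedged illustration rather than the configuration of any real router: the link names are hypothetical, the dial-up speed is taken as the lower bound of the stated range, and how link health is measured is left outside the sketch.

```python
# Link speeds in kbps as proposed in the text; the dial-up figure uses the
# lower bound of the stated 28-156 kbps range. Link names are hypothetical.
LINKS = [
    ("primary-fibre", 2000),   # 2 Mbps primary line
    ("isdn-backup", 512),      # ISDN backup line
    ("dialup", 28),            # last-resort dial-up connection
]


def select_link(link_up):
    """Return (name, kbps) of the first healthy link in preference order.

    `link_up` maps a link name to True/False. Returns None if every link
    is down.
    """
    for name, kbps in LINKS:
        if link_up.get(name, False):
            return name, kbps
    return None


def transfer_hours(megabytes, kbps):
    """Rough transfer time in hours, ignoring protocol overhead.

    Uses decimal units: 1 MB = 8000 kilobits.
    """
    return megabytes * 8000 / kbps / 3600
```

Under these assumptions, an update of 900 MB would take about one hour over the 2 Mbps primary line and roughly four hours over the 512 kbps backup line, which is what "reduced speeds" means in practice for the twice-daily update.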

Scenario 2 – Secure Lines/ VPN Network Compromised

In this case, the same company as covered in the previous section has been considered. In this scenario, we assume that the site-to-site VPN connectivity from one of the centres, ABC Ltd. New York, has been compromised by fire. The fire has engulfed the IT systems in the centre, which would be cut off from the rest of the network (Crothers, 2003). The following figure shows the existing site-to-site VPN connectivity to ABC Ltd. NY.

BCP Solution

The following solution has been proposed to get the operations started once the fire has been doused and the network is ready for recovery.

The network diagram shown above is designed to provide redundancy between ABC Ltd. NY and the client location. An optional ISDN backup line is proposed along with firewalls. An internal backbone router has been proposed, with a switch that would be connected to the client's internal network. The proposed firewall is either Checkpoint 2000 or an IPsec-compliant firewall with 3DES encryption. The system data is stored in storage area networks, which allows for quick restoration in the event of a fire. The fibre optic networking would be designed for Class IV fires and suitably hardened (Lavell, 2004).

Scenario 3 – Hacking and Virus Exploit

A network security administrator has seen in the system log a few attempts by unauthorised users to log in to the system. The system administrator has terminated the login attempts manually a few times, but there are fears that the hackers will ultimately break into the servers and compromise the system. The plan is to build a honey pot to trap the intruder and to harden the system by using firewalls and proxy servers. This will help not only in trapping the hackers but also in allowing the network to be recovered in case the hackers damage it before the intrusion is detected. The BCP will also stop data from flowing out to the hackers (Botha, 2004).

BCP Solution

An Intrusion Detection System (IDS) will be installed to detect unauthorised access attempts by hackers. The system will serve as an alarm system; the main intention of the IDS is to provide a warning that illegal activity is happening or has happened some time back (Crothers, 2003).

A firewall is used to protect the internal network and create a demilitarised zone, which will prevent the corporate servers from being accessible to the public. There will be three intrusion detection sensors that monitor the network traffic for signs of attack or malicious activity. The solid lines in the figure are the actual network connections. The dotted lines represent the secure communications that are used to pass detection information from the network and host-based intrusion detection sensors to the master detection console (Crothers, 2003).
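
The kind of warning an intrusion detection sensor raises for repeated unauthorised login attempts can be sketched in a few lines of Python. This is not the logic of any particular IDS product; the record layout, the threshold of three failures and the ten-minute window are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_sources(events, max_failures=3, window=timedelta(minutes=10)):
    """Return the source addresses with more than `max_failures` failed
    logins inside any sliding `window`.

    `events` is an iterable of (timestamp, source, succeeded) tuples sorted
    by timestamp; this record layout is hypothetical.
    """
    recent = defaultdict(deque)   # source -> timestamps of recent failures
    flagged = set()
    for ts, source, succeeded in events:
        if succeeded:
            continue
        q = recent[source]
        q.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) > max_failures:
            flagged.add(source)
    return flagged
```

A rule of this kind would have alerted the administrator in the scenario automatically, instead of relying on manual inspection of the system log. In the proposed design, each sensor would pass such alerts to the master detection console over the secure channel.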

Summary

The chapter has examined the BCP concept and discussed the implementation for three scenarios. BCP is used along with DRP to ensure that after a disaster has occurred and the DRP is implemented, the organisation is able to continue its operations. Three scenarios with different types of threats have been examined: hurricane and earthquake disasters; compromised secure lines and VPN networks; and a network severely compromised by hacking or virus exploits. The three scenarios have demonstrated actual implementations of BCP along with network and infrastructure details.

Analysis and Results (Empirical Chapters)

This chapter presents details of the field research conducted by the student during interactions with different Indian IT companies. Gartner estimates that only 35 percent of small and medium businesses across the world have a comprehensive disaster recovery plan in place (IBM corporate publication, 2006). The effectiveness of DRP depends on all three major stages of DRP: information gathering, plan development and testing, and ongoing maintenance.

If the information gathered is not appropriate, it may lead to an ineffective DRP. Even if the information is gathered accurately for the planning phase, the effectiveness of DRP may decline if the plan is not tested properly to find its flaws. Similarly, the effectiveness of DRP may decline as the business environment changes over time if the organisation fails in the ongoing maintenance phase of DRP. To improve the effectiveness of DRP, organisations should be able to perceive the importance of DRP with respect to the cost of downtime.

Presentation of Research Findings

The survey instrument had three sets of open-ended questions. While some could be answered quantitatively by giving values and numbers, others were answered qualitatively, and the respondent was asked to describe the response. The questions in each of the three sets are presented along with the responses, and the answers are analysed accordingly.

Questions Set A – Cost of downtime

Cost of downtime is a very critical area for IT businesses since all the back-end operations run on a 24×7 basis and, for the duration of the downtime, all activities cease. Once the critical business processes are identified, availability requirements should be determined for each process. Documenting the cost of not meeting availability requirements helps in determining the value of investment used in improving availability (Vision solutions publications, 2006). Most organisations define ‘availability’ somewhere along a continuum between multiple hours of downtime with significant data loss and real-time 24/7 uptime with zero data loss.

The definition depends on organisational needs, data and application requirements and organisational structure. However, the goal is to prevent inevitable system downtime from affecting business uptime. Unplanned downtime may hit at any time and for any number of reasons. How much downtime costs the business is the obvious question for an organisation to answer, and the answer reveals the importance of DRP. Unexpected disasters have both direct and indirect consequences, both short-term and far-reaching. The following table shows sample costs:

Direct costs | Indirect costs
Lost wages | Loss of employees
Lost transaction revenue | Lost business opportunities
Lost inventory | Decrease in stock value
Remedial labour costs | Loss of customer goodwill
Marketing costs | Brand image
Bank fees | Driving business to competitor
Legal penalties | Bad publicity

The questions and their analysis are presented as below:

Q1. How many projects does the organisation undertake per annum, and how many of these can you afford to lose without significant impact on the company?

Analysis

The question was designed to estimate the number of projects out of the total number of projects that can be lost to disasters without the loss having a significant impact on the organisation. The number of projects per annum that the organisations undertook varied from 150 to 500 and the percentage of projects that could be lost without significant impact was on an average about 5%. Since this was a descriptive answer, the respondents first insisted that they were not ready to accept any loss of accounts but since accounts are lost in the normal business process, the figure of 5% of the projects is taken.

This means that DRP would essentially be protecting 95% of the new projects, and any losses to these projects would mean a high revenue loss. While 5% represents the risk exposure that organisations can handle, 95% represents the risk coverage that DRP provides.

Organisations in some cases tend to have DRP only for certain projects that they consider vital, and the DRP process is in some cases initiated only when the project reaches a delivery stage. Prior to reaching this stage, data and source code of the project may be placed in applications such as Visual Source Safe (VSS). These applications protect against small local outages such as a computer crashing or data getting corrupted, and when such minor incidents happen, earlier versions of the source code can be extracted from VSS. But this is not DRP, since these events are not disasters that can take out a whole region.

When a disaster occurs, even the computer with VSS gets destroyed. So it is important that incremental backup procedures be initiated for all projects. It is also agreed that backing up vast amounts of data becomes more expensive. Typically, when organisations do not see any short-term benefit or prospects in a project, there is a tendency to give the project low priority and shared resources, and consequently the data backup plan is either ignored or carried out erratically. One never knows whether such projects will become significant in the future, so it is recommended that all work done is placed in the DRP backup procedures (Wallace, 2004).

This establishes the importance of the role of DRP and BCP in an organisation. DRP is therefore expected to protect at least 95% of the intellectual assets of a company.

Q2. What is the cost of lost productivity and lost revenue for every one hour of downtime?

Analysis

The responses to this question varied from 0.01 million USD/hour to 45 million USD/hour. This means that organisations, depending on their business, may incur losses of this magnitude. The value of business losses is not always dependent on the revenues of a company. Back-office operating companies that process cheques and online transaction payments, call centres and so on are often low-cost companies, which is why business has come to them in the first place. They may pay employee wages as low as USD 10 per hour, yet they may process transactions worth millions of dollars. Disasters spare neither of these values, and both have to be considered while estimating the hourly cost of downtime.

The cost of lost productivity is of severe concern to managers across all industries, and more so in the IT industry. While in manufacturing industries it is possible to run third shifts and weekend working to make up for the production loss, in the IT industry this is not always possible. For example, online shoppers or customers who pay through credit cards may abandon their purchase, and there is no guarantee that they will return to buy the product; they may very well buy a competing product. In addition, there are back-end processes such as data transfer, ATM transactions, supply chain transfer details and so on that run continuously in ERP and CRM applications. A break in connectivity of one hour means that transactions done during this hour are lost, and the value of these transactions often runs into millions of dollars. In such cases, organisations would have to manually revert the transactions, reconcile the accounts and then initiate the data upload again, and these actions consume resources that have to be diverted from other important functions (Toigo, 2005).

Organisations, particularly IT firms, work with very tight manpower to increase productivity and lower operating costs, so the cost of diverting manpower is high in many cases. Since project teams work on tight deadlines and are almost fully engaged, it is difficult to divert existing and engaged resources and ask them to take up DRP recovery activities (Wallace, 2004).

So it can be concluded that organisations suffer losses ranging from 0.01 million USD/hour to 45 million USD/hour if their installations are hit by disasters and they do not have DRP implemented.
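
The way lost productivity and lost transaction value combine into an hourly downtime figure can be shown with a short worked example in Python. The formula is a simplified sketch that considers only two of the direct costs discussed above, and all the input numbers are hypothetical; they are not survey responses.

```python
def hourly_downtime_cost(employees_idle, wage_per_hour,
                         transactions_per_hour, value_per_transaction):
    """Estimate the cost of one hour of downtime in USD.

    Simplifying assumption: the cost is the wages of idle employees plus
    the value of the transactions that cannot be processed.
    """
    lost_productivity = employees_idle * wage_per_hour
    lost_transactions = transactions_per_hour * value_per_transaction
    return lost_productivity + lost_transactions


def outage_cost(cost_per_hour, hours):
    """Total cost of an outage that lasts `hours` hours."""
    return cost_per_hour * hours


# Hypothetical back-office operation: 500 employees at USD 10 per hour,
# processing 40,000 transactions per hour worth USD 50 each.
example = hourly_downtime_cost(500, 10, 40_000, 50)
```

With these illustrative inputs the wage component is only USD 5,000 per hour, while the transaction component is USD 2,000,000 per hour, which shows why a low-cost operator can still face an hourly downtime cost in the millions.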

Q3. How will the collaborative business processes with partners, suppliers and customers be affected by unexpected downtime?

Analysis

The answers for this question varied from minor to critical and this depends on the nature of business relation between the organisation and the partners and suppliers. Organisations that have live hosting of back end processing and call centres rated the impact of unexpected downtime as critical while organisations that have periodical drops of software application builds that are deployed in client locations rated this aspect as minor. It must be noted that in the latter case, the company has already implemented DRP and is able to retrieve the source codes at any given point of time.

The nature of collaborative business processes with partners, suppliers and customers needs to be understood first. Suppliers and vendors of products and goods that are linked through the supply chain to the mother nodes, and that are located in areas unaffected by a disaster, would only suffer loss of communication and connectivity, while their physical assets and goods are not harmed in any way. So when the link and connectivity are broken, at most the details of work orders, payments, shipments and other such records are lost, and these can be reconciled and recovered with some effort.

The case of partners and vendors that perform back office functions such as handling claims, processing payments, extending credit, managing customer banking accounts and credit cards and so on is much more complicated. These organisations process tremendous amounts of information and may process more than a million transactions each day. Any disruption in the service would lead to a blackout and there are chances that transactions in process and waiting for authentication may get corrupted or lost. In such cases, the losses for business and customers would be massive and total.

Scheduled and planned downtime that is taken periodically, on the other hand, is different. In this type of downtime, services are not available to customers, so transactions are stopped for the duration, there is no input of data, and sufficient notice of the downtime is given to the people concerned. Also, since maintenance teams and other stakeholders are at hand, they take proper steps to ensure that data is backed up and nothing is lost. At most, scheduled downtime causes a certain amount of inconvenience, and while there are lost opportunity costs in the form of customers who go elsewhere, the losses are not extensive and total.

So the impact of downtime due to a disaster varies from minor to critical, and this depends on the nature of the affected business.

Q4. What is the average yearly revenue of the company?

Analysis

The surveyed organisations had annual revenues ranging up to more than 2 billion USD. This type of revenue represents a substantial amount in the IT industry and shows that the survey has considered all types of companies with different values for revenues. Revenues as reported are as given by the responder for the questionnaire.

Q5. Do you depend on one or more critical IT applications such as ERP or CRM?

Analysis

The answer to this question is Yes. The types of applications that the companies depend on include ERP, CRM, PeopleSoft, Business Objects and others.

IT companies use a number of third-party software products on which they build their software applications, which are then sold in the market. Standard applications used for database management include Oracle; programming languages and tools include Microsoft Visual Suite, Java and C++; and ERP applications include SAP, BAAN, PeopleSoft and so on. The list may be very long. Software application developers such as Infosys would use the applications specified by their customers in the work orders. So the overall number of application types running at any point of time would be large.

This again means that the data backup and indexing methods would also have to scale and would be very complex. Another problem is that organisations may be using different versions of the same application, and there would be different source code that runs on these different versions. For example, a company may be using Oracle 7 in one project and Oracle 10g in another. So it is necessary to preserve the environment and versions of these builds separately.

So the conclusion is that organisations use multiple applications that have to be backed up in the DRP and this increases the complexity of the DRP.

Questions Set B – Perceived importance of DRP

This set of questions examines the importance the organisations give to DRP, their perception of the costs and returns and how willing organisations are to spend on DRP.

Q 1) Have you planned for disaster recovery? If not are you looking for developing DRP in the coming future?

Analysis

The answer to this question was an emphatic Yes from all the organisations that were surveyed. While some organisations already had an advanced DRP in place, others were in the process of implementing the same.

Post 9/11, organisations across the world have realised that disasters, both natural and man-made, can hit them at any point of time and from anywhere. The past few years have also seen an increase in natural disasters, such as the earthquakes that hit China, the hurricanes that ravaged the US, and tornadoes and tsunamis. In many cases, these events were not forecast, and even when they were, the gap between the notice and the actual striking of the disaster was so small that the response time for the DRP to be implemented became very short. So a DRP has to be always in place and in a state of constant readiness, and it must be possible to launch it with very little notice.

Organisations have moved beyond the wait and watch state where they would watch and see how other organisations that have implemented DRP are able to recover their systems to actual implementations. With the rise in uncertainty of the environment in global operations, with increased acts of terror and also increased fury and unpredictable behaviour of the natural forces, implementing effective DRP has become a need rather than a luxury or an accessory (Toigo, 2005).

The conclusion is that all the surveyed organisations have some form of DRP in place or in an advanced state of implementation.

Q 2) Do you think the cost of protecting the business against conceivable eventualities is higher than the risk of not protecting it at all?

Analysis

The answer to this question was again an emphatic No. None of the organisations thought that the cost of protecting the business against conceivable eventualities is higher than the risk of not protecting it at all.

In today’s fiercely competitive environment, organisations are under intense pressure to maintain healthy bottom lines, and there is an increasing tendency to hive off or close down units that the organisation feels are not profitable or not justified in maintaining. A DRP is like a fire extinguisher in the office or in the car. Money has to be paid for the device, and it would be used only in the eventuality of a disaster.

If it is never used because a fire never occurs, so much the better; but just because no fire has occurred in the past few years, it would be foolish to question the investment made in the fire extinguisher and attempt to sell it off, as a fire can occur at any instant. To give another example, a DRP is like accident insurance. People may pay the required premiums for decades without ever making a claim because they were fortunate never to have been involved in an accident. But this does not mean that they should be assured of their invincibility and stop paying the premiums, as accidents and mishaps can occur at any point of time (Wallace, 2005).

IT organisations are particularly notorious for placing employees on the bench when there aren’t enough projects to go around or when there is less demand for people with specific skill sets. There is also an inclination to reduce employee strength in departments that have less business, and the network security and DRP teams tend to appear in this list since these are support functions that do not directly generate revenue. It is therefore interesting to note that organisations do not regard the costs of DRP as greater than the cost of the assets they protect (Toigo, 2005).

So the conclusion is that organisations do not think that the cost of protecting the business against conceivable eventualities is higher than the risk of not protecting it at all.

Q 3) With respect to the cost of downtime, do you think the organisation can recover easily after a disaster without an effective DRP in place?

Analysis

The answer to this question was no. All the organisations surveyed felt that with respect to the cost of downtime, the organisation could not recover easily without an effective DRP in place. Some of the sample answers were “Without effective Disaster Recovery plan in place the organisation cannot recover easily, with out protecting the data it is impossible to recover it , and it is highly risk” and “Its very, very hard to recover after a disaster with out a Disaster Recovery Plan. I feel it is as same as re-developing the system completely from scratch”.

Large projects with team sizes of 100 or more typically take up to 2 years to fully develop, test and deliver the final build of a software application. This involves hundreds of man hours and interactions with specialists from other teams who may be involved in troubleshooting certain aspects of the program code. Employee turnover in the IT industry is quite high, of the order of 20%, and between the start and the end of a project many employees leave and new members come in.

If a disaster takes out a location and no DRP has been implemented, then all the creative efforts of the members are lost forever and the organisation is moved two years back in time. While it may be possible to restart development from scratch, this process would probably not replicate the original effort and would result in a massive drain on resources that the company cannot afford. Besides, there would also be a tremendous loss of goodwill, reputation and customer confidence if it was found that the company did not have a DRP in place.

So the conclusion is that, with respect to the cost of downtime, organisations do not think that they can recover easily after a disaster without an effective DRP in place.

Questions Set C – Current state of DRP

The section has researched the current state of DRP implementations in the surveyed organisations.

Q 1) If the organisation practises DRP, when was it prepared and when was it last updated?

Analysis

The answers to this varied: while some organisations had prepared an organisation-wide DRP a decade back, others were taking up the activity on a project basis. The DRP is updated as and when there is a change in the requirements.

Organisations also updated the DRP with new procedures or when crisis management teams changed, and this is a very crucial aspect of DRP. It is important to ensure that all links and nodes in the DRP are updated and working at any point of time, as a weak or missing link can cause the DRP to fail, and all the effort and expense put into creating and implementing the DRP would be wasted.

So the conclusion is that while some organisations had implemented an organisation-wide DRP a few years back, others took up the implementation on a project basis. All companies made efforts to keep the DRP updated as and when any changes occurred or at specific intervals.

Q 2) How long did it take to develop the DRP and what was the major hurdle you faced in the process?

Analysis

The answer to this question varied from a week to more than 6 months and this indicates the complexity of the tasks and work involved in preparing the DRP.

Smaller organisations, or firms that use the same platforms, databases and programming languages, often find that they can reduce the effort required by creating a framework and then quickly replicating it across all other projects. This saves not only time but also manpower and other resources. On the other hand, firms that have complex and multiple systems, with huge volumes of data running on multiple platforms and databases, often take a relatively longer time to set up their DRP.

So the conclusion is that the time taken to implement a DRP can vary from a week to about 6 months.

Q 3) Have you taken help from an outside consultant in developing the DRP?

Analysis

The answer to this was No: all organisations have used in-house resources in developing their DRP systems.

Considering the critical nature of DRP, the volume of confidential data that has to be handled and the urgency of response, none of the surveyed organisations outsourced the work and no external consultant was hired to set up the system.

So the conclusion is that organisations prefer to use their in-house team to develop, test and maintain the DRP.

Q 4) Does your organisation practise ongoing maintenance of the DRP?

Analysis

All the organisations surveyed answered Yes, indicating that they practise ongoing maintenance activities for DRP.

A DRP has to be regularly maintained to ensure that data backups, call and notification procedures and contact details of crisis management teams are updated, and that DRP notification procedures can be invoked with the required urgency, as and when required.

So the conclusion is that all the organisations surveyed regularly practised ongoing maintenance of their DRP implementation.

Q 5) When was the DRP last tested?

Analysis

All the organisations have a specific time interval at which the DRP is tested. The testing is followed up with an analysis to find potential loopholes, shortcomings and defects, which are then set right.

So the conclusion is that all organisations surveyed have periodic testing of their DRP systems.
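A fixed testing interval of this kind can be tracked with a very small check. The sketch below is only illustrative: the 90-day interval, the dates and the function names are assumptions, since the survey did not record the actual intervals used by the organisations.

```python
from datetime import date, timedelta


def next_test_due(last_tested, interval_days):
    """Return the date on which the next DRP test falls due."""
    return last_tested + timedelta(days=interval_days)


def is_test_overdue(last_tested, interval_days, today):
    """Return True if more than interval_days have passed since the last test."""
    return today > next_test_due(last_tested, interval_days)


# Hypothetical values: last test on 15 January 2008, 90-day test interval.
last_tested = date(2008, 1, 15)
interval_days = 90
today = date(2008, 5, 1)

due = next_test_due(last_tested, interval_days)
overdue = is_test_overdue(last_tested, interval_days, today)
print(f"Next test due: {due.isoformat()}, overdue: {overdue}")
```

A check of this sort could be run as part of the regular maintenance activity so that a missed test is flagged rather than discovered after a disaster.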

Conclusions and Recommendations

The report has carried out an extensive examination of DRP and BCP with an emphasis on creating a framework for practical implementation. DRP and BCP implementations would offer protection for the intellectual property of an organisation and help an organisation to quickly recover the soft property in case a disaster strikes. There are four levels of threat: while level 1 refers to a minor outage, a level 4 threat is a major disaster caused by hurricanes and earthquakes.

The report has discussed in detail the framework of DRP for an IT company that may operate through multiple locations. A specific organisation chart and steps to be followed for DRP implementation have been presented. To protect its intellectual assets with a DRP, a company first needs to have an IT team security structure defined, carry out the risk assessment and perform the business impact analysis.

The next important step is to select the strategy for DRP implementation, form the crisis management team structure, and create the process flow to identify disasters and activate the DRP, along with the DRP invoking procedure. The project-specific disaster recovery plan and the notification procedures are then created. Once the DRP is in place, it is important to create a testing plan and a maintenance plan so that the DRP is in a state of readiness. This chapter is expected to serve as a guideline for organisations and managers who would want to create a DRP for their organisation.

The report has examined the BCP concept and discussed the implementation for three scenarios. BCP is used along with DRP to ensure that after a disaster has occurred and the DRP is implemented, the organisation is able to continue its operations. The three scenarios, each with a different type of threat, are: hurricane and tornado disasters; secure lines and VPN networks being compromised; and a network being severely compromised by hacking or virus exploits. The three scenarios have demonstrated actual implementations of BCP along with network and infrastructure details.

As a part of the research activity, 10 IT companies were surveyed through visits, and experts in these organisations were requested to complete a questionnaire. To understand an effective DRP framework for SMEs, a survey instrument was designed to gather the industry response in three key areas of DRP: cost of downtime, perceived importance of DRP and the current state of DRP within the organisation. The instrument had three sets of questions that asked for responses on different aspects of the DRP. The responses to the instrument have been analysed and conclusions have been obtained for each of the questions from the sets.

Based on the research findings, DRP is expected to protect at least 95% of the intellectual assets of a company. Organisations suffer losses ranging from 0.01 million USD per hour to 45 million USD per hour if their systems are hit by disasters and they do not have a DRP implemented. The impact of downtime caused by a disaster varies from minor to critical, depending on the nature of the affected business.
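To give a feel for what the hourly loss range means over a full outage, the following sketch multiplies the hourly loss by the outage duration. The linear model, the eight-hour outage and the function name downtime_loss are assumptions made for illustration; only the two hourly figures come from the survey.

```python
def downtime_loss(hourly_loss_musd, hours_down):
    """Estimate total loss in million USD, assuming loss accrues linearly."""
    if hourly_loss_musd < 0 or hours_down < 0:
        raise ValueError("hourly loss and outage duration must be non-negative")
    return hourly_loss_musd * hours_down


# Hourly loss range reported by the surveyed organisations (million USD per hour).
LOW_HOURLY_LOSS = 0.01
HIGH_HOURLY_LOSS = 45.0

# Assumed outage duration for illustration only.
outage_hours = 8

low_total = downtime_loss(LOW_HOURLY_LOSS, outage_hours)
high_total = downtime_loss(HIGH_HOURLY_LOSS, outage_hours)
print(f"Estimated loss over {outage_hours} hours: "
      f"{low_total:.2f} to {high_total:.2f} million USD")
```

Under these assumptions an eight-hour outage would cost between 0.08 and 360 million USD, which shows how quickly the upper end of the range escalates with even a single working day of downtime.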

Organisations may use multiple applications that have to be backed up in the DRP, and this increases the complexity of the DRP. All the surveyed organisations have some form of DRP in place or in an advanced state of implementation. Organisations do not think that the cost of protecting the business against conceivable eventualities is higher than the risk of not protecting it at all. With respect to the cost of downtime, organisations do not think that they can recover easily after a disaster without an effective DRP in place. While some organisations had implemented an organisation-wide DRP a few years back, others took up the implementation on a project basis.

All companies made efforts to keep the DRP updated as and when any changes occurred or at specific intervals. The time taken to implement a DRP can vary from a week to about 6 months. All the surveyed organisations prefer to use their in-house team to develop, test and maintain the DRP. All the organisations surveyed regularly practised ongoing maintenance of their DRP implementation. All organisations surveyed have periodic testing of their DRP systems.

Recommendations for DRP implementations

DRP and BCP must be implemented by organisations with intellectual assets and soft assets that could be wiped out in case of a disaster. The framework suggests that multiple locations should be used for periodic incremental backup. With mirror servers at different locations, if one location is compromised, the DRP can be invoked and the data recovered from another location. It is important to develop in-house expertise in developing, maintaining and testing DRP implementations. The type and complexity of the system would depend on the nature of the business that the organisation performs.
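The recommendation on multiple backup locations can be sketched in a few lines. This is a simplified in-memory model, not an actual backup tool: the class BackupSite, the site names, the file names and the integer timestamps are all hypothetical. It shows an incremental backup copying only the files changed since the previous backup to every reachable mirror, and recovery falling back to the next location when one is lost.

```python
from dataclasses import dataclass, field


@dataclass
class BackupSite:
    """A mirror location holding backed-up files (hypothetical model)."""
    name: str
    available: bool = True
    files: dict = field(default_factory=dict)  # path -> (modified_time, content)


def incremental_backup(source, sites, last_backup_time):
    """Copy only files modified after last_backup_time to each available site."""
    changed = {
        path: (modified, content)
        for path, (modified, content) in source.items()
        if modified > last_backup_time
    }
    for site in sites:
        if site.available:
            site.files.update(changed)
    return sorted(changed)


def recover(sites):
    """Return the name and files of the first site that is still available."""
    for site in sites:
        if site.available:
            return site.name, dict(site.files)
    raise RuntimeError("no backup site available")


# Hypothetical source data: only the ledger changed after the last backup (time 200).
source = {"src/app.py": (100, "v1"), "db/ledger.csv": (250, "rows")}
sites = [BackupSite("primary-mirror"), BackupSite("remote-mirror")]

copied = incremental_backup(source, sites, last_backup_time=200)

# Simulate a disaster taking out the first location, then recover.
sites[0].available = False
recovered_from, recovered_files = recover(sites)
print(f"Copied {copied}; recovered from {recovered_from}")
```

Because every available mirror receives the same increment, the loss of any single location does not affect the data that can be recovered, which is the property the framework relies on.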

Limitations of Research

While the framework, network architecture and implementation steps for DRP were obtained from interviews and surveys, an actual DRP exercise after a disaster has struck a location has not been performed. Such an exercise is important to gauge the effectiveness and speed with which DRP and BCP can be effected.

Recommendations for further research

To provide focussed research, it is recommended that a very detailed study of actual DRP and BCP implementations be conducted for different industry sectors. It is also recommended that a live analysis be carried out to gauge the response speed and effectiveness when an actual disaster has struck.

References

Ann, G. 2001, ‘A Framework for Disaster Recovery Planner’, Disaster Recovery Planning- Process and Options white papers, Comprehensive solutions, Brookfield, USA.

Ambs Ken. 2000. Optimising restoration capacity in the AT&T network. Interfaces Journal. Volume 30. Issue 1. pp: 26-40.

Amble, B. 2004, ‘SMEs booming in UK PLC’, Management-Issues. Web.

Benton, Dick. 2007. Disaster Recovery: A Pragmatist’s Viewpoint. Disaster Recovery Journal.

Botha Jacques. Rossouw Von Solms. 2004. A cyclic approach to business continuity planning. Journal of Information Management & Computer Security. Volume 12. Issue 4. pp 38-51.

Broder James F. 2002. Risk Analysis and the Security Survey, 2nd edition. Broder. Boston, MA: Elsevier Science. ISBN: 0750670894.

Brunetto Guy. 2006. Disaster recovery: How will your company survive? Journal of Strategic Finance. Volume 82. Issue 9. pp: 57-62.

‘Beyond disaster recovery: becoming a resilient business’, 2007, IBM Global services, USA.

‘Consulting Methodologies- Disaster Recovery Planning’, 2003, Info Tech Research. Web.

Crothers Tim, 2003. Implementing Intrusion Detection Systems. Wiley Publishing Inc. ISBN 8126503688.

Edwards Frances L. 2006. Businesses Prepare Their Employees for Disaster Recovery. Journal of Public Manager. Volume. 35, Issue. 4; pp. 7-13.

Denzin, Norman K. & Lincoln, Yvonna S. (Eds.) 2000. Handbook of Qualitative Research. Thousand Oaks, CA: Sage.

Facer Dave. 2001. Rethinking: Business continuity. Journal of Risk Management. Volume 46. Issue 10. pp: 17-21.

‘Finding the right disaster recovery balance’, 2006, Computer Weekly. Web.

Fitzgerald Kevin J. 1995. Establishing an effective continuity strategy. Journal of Information Management & Computer Security. Volume 3. Issue 3. pp: 105- 138.

Gilchrist Bruce. 2001. Coping with Catastrophe: Implications to Information Systems Design. Journal of the American Society for Information Science. pp: 271-278.

Greg, S. 2007, ‘Disaster Recovery Planning for SMBs’, Computer weekly. Web.

Gottfried, I.S., 1989, ‘When disaster strikes’, Journal of Information Systems Management, pp. 86-9.

Hiatt Charlotte J. 2007. A Primer for Disaster Recovery Planning in an IT Environment, 2nd Edition. ISBN-10: 1878289810.

IDC. 2007. Indian IT Industry Growth Statistics. Web.

Lavell Joan L. 2004. Business continuity plans: An overview. Journal of Investment Compliance. Volume 5. Issue: 2. pp: 75-86.

‘Survey shows holes in UK’s corporate disaster recovery plans’, 2008, Business Management Zone News. Web.

Kakoli, B. and Peter, P.M., 1999, ‘A framework for integrated risk management in information technology’, Management Decisions, Vol. 37, No. 5, pp. 437-444.

Kaye David, Graham Julia. 2006. A Risk Management Approach to Business Continuity: Aligning Business Continuity with Corporate Governance. Rothstein Associates Inc. ISBN 1-931332-36-3.

Liz, G. 2007, ‘Disaster Recovery’, accounting today journal, Vol. 21, No. 7.

Margaret Pember. 2007. Information disaster planning: An integral component of corporate risk management. ARMA Records Management Quarterly. Volume 30. Issue 2. pp: 31-39.

Margulies Stuart. 2006. Preparation for the DRP test: (Degrees of reading power), 2nd Edition. Educational Design publications. ISBN-13: 978-0876942857.

Meade Peter. 1993. Taking the risk out of disaster recovery services. Journal of Risk Management. Volume 40. Issue 2. pp: 20-26.

Mick Savage. 2002. Business continuity planning. Journal of Work Study. Volume 51. Issue 5. pp. 95-123.

Moore Pat. 1995. Critical elements of a disaster recovery and business/service continuity plan. Journal of Facilities. Volume 13. Issue 9/10. pp: 195-236.

Potter Chris. 2003. New survey raises serious concerns about the effectiveness of disaster recovery plans. M2 Presswire.

Preston W. Curtis. 1999. UNIX Backup and Recovery. O’Reilly Media, Inc. ISBN-10: 1565926420.

Presswire. 2008. Price Waterhouse Coopers. New survey raises serious concerns about the effectiveness of disaster recovery plans. M2 Presswire. pp: 2-3.

Pfleeger Charles P. December 12, 2002. Security in Computing, 3rd Edition. Prentice Hall PTR. ISBN-13: 978-0130355485.

Rediff. July 17, 2008. Top 10 software companies in India. Web.

‘The Sunday Times 100 Best Companies to Work for’, 2008, TIMESONLINE. Web.

Silverman David. 2001. Interpreting Qualitative Data: Methods for Analysing Talk, Text and Interaction, Second edition. Sage Publications. ISBN 0761968652.

Swartz Nikki. 2004. Survey Assesses the State of Information Security Worldwide. Information Management Journal. Volume 38. Issue 1. pp: 16-20.

Toigo, J.W., 1992, Disaster Recovery Planning: Managing Risk and Catastrophe in Information Systems, Yourdan Press Computing Services, Prentice-Hall, Englewood Cliffs.

Toigo Jon William. 2002. Disaster Recovery Planning: Preparing for the Unthinkable, 3rd edition. Prentice Hall PTR.

Toigo Jon William. 2005. Disaster Recovery Planning: For Computers and Communication Resources. Wiley Publications. ISBN-10: 0471121754.

Varghese and Mathew, 2002, ‘Disaster Recovery’, Course Technology publication, pp. 11-15.

Wallace Michael, Webber Lawrence. 2004. The Disaster Recovery Handbook: A Step-by-Step Plan to Ensure Business Continuity and Protect Vital Operations, Facilities, and Assets. AMACOM.

Winkworth, G., 2007, ‘Disaster Recovery : A review of the literature’. Web.