Explain high availability, disaster recovery, and cloud disaster recovery.

HIGH AVAILABILITY AND FAULT TOLERANCE

An effective IT infrastructure must keep functioning even in the rare event of a network outage, device failure, or power loss. When something fails, one or more of three major availability techniques come into play: high availability, fault tolerance, and disaster recovery. Although each of these infrastructure design approaches contributes to the availability of your key applications and data, they do not serve the same goal. Running a High Availability infrastructure does not mean you can skip setting up a disaster recovery site; doing so risks disaster.

HIGH AVAILABILITY

A High Availability (HA) system is designed to be up and running 99.99 percent of the time, or as close to it as feasible. Typically, this entails building a failover system capable of handling the same workloads as the original system. In a virtualized environment, HA works by creating a pool of virtual machines and related resources inside a cluster. When one of the hosts or virtual machines fails, the affected virtual machines are restarted on another host in the cluster. In physical infrastructure, HA is achieved by designing the system with no single point of failure; in other words, redundant components are required for all key power, cooling, computing, network, and storage infrastructure.
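To put an availability target such as 99.99 percent in perspective, the short Python sketch below converts a target into the downtime it permits per year, and estimates how redundant components raise availability. The function names are illustrative only, and the redundancy calculation assumes that components fail independently, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year permitted by an availability target in percent."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant components where any single survivor keeps service up.

    Assumes independent failures: the service is down only if every component is down.
    """
    probability_all_down = 1.0
    for a in availabilities:
        probability_all_down *= 1 - a
    return 1 - probability_all_down


for target in (99.9, 99.99, 99.999):
    print(f"{target}% availability -> {allowed_downtime_minutes(target):.2f} min/year")

print(f"two 99% servers in parallel -> {parallel_availability(0.99, 0.99):.4f}")
```

With these formulas, 99.99 percent availability allows about 52.56 minutes of downtime per year, and two redundant components that are each 99 percent available give roughly 99.99 percent together, which is why HA designs remove single points of failure rather than relying on one very reliable component.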

One example of a basic HA approach is hosting two identical web servers behind a load balancer that distributes traffic between them, with a second load balancer on standby. If one of the servers fails, the load balancer routes traffic to the other.
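The routing behaviour described above can be sketched as a round-robin balancer that skips servers marked unhealthy. This is a minimal illustration, not any particular product: the `Server` and `LoadBalancer` classes are hypothetical, the `healthy` flag stands in for real periodic health checks, and the standby load balancer is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True  # in a real setup, updated by periodic health checks


class LoadBalancer:
    """Round-robin balancer that routes around servers failing their health check."""

    def __init__(self, servers: list[Server]) -> None:
        self.servers = servers
        self._next = 0

    def route(self) -> str:
        # Try each server at most once, starting where the last request left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.healthy:
                return server.name
        raise RuntimeError("no healthy servers available")


web1, web2 = Server("web1"), Server("web2")
lb = LoadBalancer([web1, web2])
normal = [lb.route() for _ in range(4)]
print(normal)  # ['web1', 'web2', 'web1', 'web2']

web1.healthy = False  # simulate web1 crashing
degraded = [lb.route() for _ in range(3)]
print(degraded)  # ['web2', 'web2', 'web2']
```

Clients never see the failure directly: requests keep succeeding as long as at least one server is healthy, which is the core idea of HA at this scale.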

FAULT TOLERANCE

Fault tolerance is the ability of a system (computer, network, cloud cluster, etc.) to continue working without interruption when one or more of its components fail. The goal of a fault-tolerant design is to reduce interruptions caused by a single point of failure while ensuring the high availability and business continuity of mission-critical applications or systems.

Fault-tolerant systems employ backup components that automatically take the place of failing components, ensuring that there is no interruption in service. These are some examples:

Hardware systems backed up by identical or equivalent hardware. A server, for example, can be made fault-tolerant by operating an identical server in parallel, with all processes mirrored to the backup server.

Software systems that are backed up by other instances of software. A database containing customer information, for example, can be continually copied to another system. If the primary database fails, operations can be immediately diverted to the backup database.

Power sources that have been made fault-tolerant through the use of alternate sources. Many firms, for example, have backup generators that can take over if the main power supply fails.

Similarly, any system or component that has a single point of failure can be made fault-tolerant through the use of redundancy.
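The database example above can be sketched as a mirrored store: every write goes to all live replicas, and reads automatically fall back to the backup when the primary is down, so no committed data is lost. The `Replica` and `MirroredStore` classes are hypothetical in-memory stand-ins, and the sketch ignores real-world concerns such as resynchronizing a replica that comes back after missing writes.

```python
class Replica:
    """In-memory stand-in for one database server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.up = True


class MirroredStore:
    """Writes are mirrored to every live replica; reads use the first live one."""

    def __init__(self, *replicas: Replica) -> None:
        self.replicas = list(replicas)

    def write(self, key: str, value: str) -> None:
        live = [r for r in self.replicas if r.up]
        if not live:
            raise ConnectionError("all replicas are down")
        for replica in live:  # synchronous mirroring: every copy is updated
            replica.data[key] = value

    def read(self, key: str) -> str:
        for replica in self.replicas:  # primary first, then backups in order
            if replica.up:
                return replica.data[key]
        raise ConnectionError("all replicas are down")


primary, backup = Replica("primary"), Replica("backup")
store = MirroredStore(primary, backup)
store.write("customer:42", "Alice")

primary.up = False  # the primary database fails
print(store.read("customer:42"))  # Alice, now served by the backup
```

Because the backup already holds every write, the switch is immediate and invisible to the caller, which is what distinguishes fault tolerance from a restore-based recovery.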

DISASTER RECOVERY

If your systems are set up with High Availability (HA) or Fault Tolerance (FT), it may seem that you do not need a disaster recovery (DR) architecture. After all, why set up a separate DR site if your servers already achieve 99.999 percent or greater availability?

DR goes beyond FT and HA: it is a comprehensive strategy for recovering essential business systems and normal operations after a catastrophe such as a major natural disaster (storm, flood, tornado, earthquake, etc.), a cyberattack, or any other cause of significant downtime. HA is frequently a key component of DR, which can also include an entirely separate physical site with a 1:1 replacement for every vital infrastructure component, or at least as many as are needed to restore the most important business services.

DR is planned around a Recovery Time Objective (RTO) and a Recovery Point Objective (RPO). The RTO is how long it may take to restore critical systems; the RPO is the point in time before the disaster to which data is restored (for example, you probably do not need to restore backup data from five years ago to get back to work after a catastrophe).
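The recovery time and recovery point targets (commonly called RTO and RPO) can be checked against a backup plan with simple arithmetic. This minimal sketch assumes periodic backups, where the worst case is a disaster striking just before the next backup runs; the function name and the figures in the example are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, restore_duration: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict[str, bool]:
    """Compare a periodic-backup plan against recovery point and time objectives."""
    # Worst case: the disaster hits just before the next backup, so up to one
    # full backup interval of changes is lost.
    worst_case_data_loss = backup_interval
    return {
        "rpo_met": worst_case_data_loss <= rpo,
        "rto_met": restore_duration <= rto,
    }


result = meets_objectives(
    backup_interval=timedelta(hours=4),
    restore_duration=timedelta(hours=6),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=8),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

Here the six-hour restore fits the eight-hour RTO, but backing up only every four hours cannot meet a one-hour RPO; tightening the RPO means backing up or replicating more often.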

A disaster recovery platform copies your chosen systems and data to a second cluster. When downtime is detected, failover is triggered and network traffic is rerouted to that secondary site. DR is typically used to replace an entire data center, whether physical or virtual, whereas HA usually deals with failures of a single component, such as a CPU or a single server, rather than the complete failure of all IT infrastructure that a disaster would cause.
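The detect-and-reroute step can be sketched as a site-level failover controller that switches to the DR site only after several consecutive missed heartbeats, so that a single dropped check does not move an entire data center. The `FailoverController` class, the site names, and the threshold of three are hypothetical; in practice the switch would update DNS records or network routes.

```python
class FailoverController:
    """Reroutes traffic to the secondary site after repeated missed heartbeats."""

    def __init__(self, primary: str, secondary: str, max_missed: int = 3) -> None:
        self.primary = primary
        self.secondary = secondary
        self.max_missed = max_missed
        self.missed = 0
        self.active = primary

    def on_heartbeat_check(self, primary_responded: bool) -> str:
        if primary_responded:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed and self.active == self.primary:
                # Placeholder for the real action, e.g. updating DNS or routes.
                self.active = self.secondary
        return self.active


ctl = FailoverController("dc-main", "dr-site", max_missed=3)
history = [ctl.on_heartbeat_check(ok) for ok in (True, False, False, False, False)]
print(history)  # ['dc-main', 'dc-main', 'dc-main', 'dr-site', 'dr-site']
```

The controller deliberately does not fail back on its own: returning to the primary site after a disaster is usually a planned, manual step once data has been resynchronized.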

CLOUD DISASTER RECOVERY (CLOUD DR)

Cloud disaster recovery is a backup and restoration method that involves storing and maintaining copies of electronic records in a cloud computing environment as a security measure. The purpose of cloud DR is to give an organization a way to recover data and/or fail over in the event of a man-made or natural disaster. Cloud disaster recovery usually offers the same services as an on-premises or company-managed off-premises disaster recovery plan (DRP) facility, but on a more cost-effective, efficient, provider-managed platform. A cloud DR vendor provisions user accounts and storage space, and regularly backs up the selected computers through client software installed on each system. Users can add, modify, and delete systems and storage space without having to worry about the back-end infrastructure.
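Backup client software commonly avoids re-uploading unchanged data on every run. The sketch below shows one way such a client could decide what to send, by comparing SHA-256 digests against the previous run's manifest. This is an assumption about how a typical agent might work, not a description of any specific vendor's client; the function name and file paths are hypothetical, and deleted files are not handled.

```python
import hashlib


def plan_incremental_backup(
    files: dict[str, bytes], last_manifest: dict[str, str]
) -> tuple[list[str], dict[str, str]]:
    """Return (paths to upload, new manifest) for one backup run.

    The manifest maps each path to the SHA-256 digest of its contents; only files
    whose digest differs from the previous run's manifest need to be uploaded.
    """
    manifest = {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}
    to_upload = sorted(
        path for path, digest in manifest.items() if last_manifest.get(path) != digest
    )
    return to_upload, manifest


# First run: nothing has been backed up yet, so every file is uploaded.
files = {"/srv/app.cfg": b"port=8080", "/srv/orders.db": b"order 1"}
first, manifest = plan_incremental_backup(files, {})
print(first)  # ['/srv/app.cfg', '/srv/orders.db']

# Second run: only the database changed.
files["/srv/orders.db"] = b"order 1\norder 2"
second, manifest = plan_incremental_backup(files, manifest)
print(second)  # ['/srv/orders.db']
```

Sending only changed data keeps the regular updates small, which matters because every upload to a cloud DR provider travels over the organization's Internet link.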

A cloud-based disaster recovery solution lets users scale the cloud DR system from one protected system to many. Storage and client software licenses are generally billed to the customer monthly. Most cloud DR services include backup and recovery for essential servers that run enterprise-level software such as MS-SQL, Oracle, and others.

Although the concept of cloud-based disaster recovery, and some of its products and services, are still in their early stages, some businesses, particularly small and medium-sized businesses (SMBs), are discovering and beginning to use cloud services for DR. Because usage-based cloud pricing suits DR, where the secondary infrastructure sits idle most of the time, it can be an appealing choice for organizations that are short on IT resources. Hosting disaster recovery sites in the cloud eliminates the need for secondary data center space, IT equipment, and the IT staff to run them, resulting in considerable cost savings and giving smaller businesses access to disaster recovery options that were previously available only to larger corporations.
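The cost argument above comes down to paying for idle capacity. The sketch below compares a dedicated secondary site, billed the same every month, with a usage-based cloud DR setup that bills storage continuously but compute only while the recovery environment runs. The pricing model is a simplification and every figure is a hypothetical placeholder, not a quote from any provider.

```python
def dedicated_site_cost(months: int, monthly_fixed: float) -> float:
    """An owned secondary site costs the same every month, disaster or not."""
    return months * monthly_fixed


def cloud_dr_cost(months: int, monthly_storage: float,
                  failover_hours: float, compute_per_hour: float) -> float:
    """Usage-based DR: replicated storage is billed every month, but compute
    is billed only for the hours the recovery environment actually runs."""
    return months * monthly_storage + failover_hours * compute_per_hour


# Hypothetical prices for illustration only; not taken from any provider.
dedicated = dedicated_site_cost(months=12, monthly_fixed=8000)
cloud = cloud_dr_cost(months=12, monthly_storage=600,
                      failover_hours=48, compute_per_hour=25)
print(dedicated, cloud)  # 96000 8400
```

The gap comes almost entirely from not paying for standby compute during the months when no disaster occurs; the comparison would shift if failovers or DR tests ran for many hours.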

Disaster recovery in the cloud is not a flawless solution, and its flaws and limitations must be thoroughly understood before a company invests in it. Security is frequently at the top of the list of concerns:

- Is data securely transferred and stored in the cloud?

- How are users authenticated?

- Are passwords the only option or does the cloud provider offer some type of two-factor authentication?

- Does the cloud provider meet regulatory requirements?

Furthermore, because clouds are accessed over the Internet, bandwidth requirements must be properly understood. It is risky to plan only for the bandwidth needed to move data into the cloud without considering how to keep that data accessible during a disaster: do you have the bandwidth and network capacity to redirect all users to the cloud?
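The question of redirecting all users can be answered with a rough capacity check: multiply the number of users by their expected bandwidth and compare the result with the link, leaving some headroom. The per-user figure, link size, and 80 percent utilization ceiling in this sketch are hypothetical planning assumptions.

```python
def can_redirect_users(users: int, mbps_per_user: float, link_mbps: float,
                       max_utilization: float = 0.8) -> bool:
    """True if redirected users fit on the link without exceeding max_utilization."""
    required_mbps = users * mbps_per_user
    return required_mbps <= link_mbps * max_utilization


# Hypothetical figures: 2 Mbps per remote user over a 1 Gbps (1000 Mbps) link.
print(can_redirect_users(300, 2, 1000))  # True: 600 Mbps needed, 800 Mbps budget
print(can_redirect_users(500, 2, 1000))  # False: 1000 Mbps needed, 800 Mbps budget
```

A link sized only for nightly replication may fail this check badly once every user depends on it during a disaster.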

Other key considerations are the reliability and availability of the cloud provider and its ability to serve your users while a disaster is in progress. If you plan to restore from the cloud to on-premises infrastructure, how long will that restore take? Choosing a cloud service provider or managed service provider (MSP) that can deliver service within the agreed terms is essential, and a wrong decision can even get you fired.
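A first estimate of how long a restore from the cloud will take is data volume divided by usable bandwidth. In this sketch the `efficiency` factor is an assumed fraction of nominal bandwidth actually achieved (to allow for protocol overhead and competing traffic), and the 10 TB and 1 Gbps figures are hypothetical.

```python
def restore_hours(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimated hours to pull data_tb terabytes (decimal) over a link of link_mbps.

    efficiency is an assumed fraction of the nominal bandwidth actually achieved.
    """
    bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / effective_bits_per_second / 3600


print(f"{restore_hours(10, 1000, efficiency=1.0):.1f} h")  # 22.2 h, ideal case
print(f"{restore_hours(10, 1000):.1f} h")  # 31.7 h at 70% efficiency
```

Even in the ideal case, restoring 10 TB over a 1 Gbps link takes most of a day, so the result should be compared with the RTO before relying on an Internet-based restore.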

Notes for BSc CSIT