
IT Recovery in AWS

Contents

1 Overview
2 Why Use AWS Availability Zones?
3 Difference Between High Availability (HA) and Disaster Recovery (DR)
4 Architecture Options
5 Frequently Asked Questions

Overview

This document aims to help IT managers understand how Amazon Web Services (AWS) can help them achieve their disaster recovery (DR) goals and comply with IS-12 policy. Its goal is to simplify the selection of an AWS architecture based on the application’s recovery requirements – Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

Why Use AWS Availability Zones?

The IS-12 policy mandates that every in-scope IT resource, application, or service have a recovery plan and a backup plan, and that these plans be tested regularly. AWS Availability Zones provide an effective means of ensuring resiliency and reliability in DR scenarios.

This recommendation is guided by the following assumptions regarding AWS Availability Zones:

AWS Availability Zones function as separate, geographically dispersed data centers.
According to IS-12, backups must be stored in an “off-site” or geographically dispersed location. AWS snapshots are considered to be backups because the data is stored in S3.
A live transactional/operational copy is not considered to be a backup.

If an application is highly available (HA) across different Availability Zones, it is considered DR-ready. However, the IS-12 policy still requires you to test the HA setup annually.


Difference Between High Availability (HA) and Disaster Recovery (DR)

High Availability (HA) and Disaster Recovery (DR) are both vital for keeping IT services running smoothly, but they have different goals:

High Availability (HA): Ensures that services stay up and running with minimal downtime by using multiple instances across different locations or availability zones. It focuses on preventing interruptions.
Disaster Recovery (DR): Involves plans and backups to restore data and services after a major issue. DR aims to recover quickly and minimize data loss.

In short, HA is about avoiding downtime, while DR is about recovering from disasters.

Architecture Options

The following design recommendations are based on the recovery level classification.
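
The three options in this section can be read as a lookup from recovery level to design. The sketch below is illustrative only: the names ArchitectureOption and option_for_level are hypothetical, and it assumes the five IS-12 recovery levels are numbered 1 through 5. The cost of the cold-standby option is left as None because this document does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchitectureOption:
    design: str
    rto: str
    cost: Optional[str]  # None where this document does not state a cost


# Values below are copied from the three options described in this section.
ACTIVE_MULTI_AZ = ArchitectureOption(
    design="two or more active instances across availability zones",
    rto="within 15 minutes",
    cost="High",
)
WARM_STANDBY = ArchitectureOption(
    design="primary instance plus warm standby",
    rto="within 6 hours",
    cost="Medium",
)
COLD_STANDBY = ArchitectureOption(
    design="primary instance plus cold standby",
    rto="24 hours or longer",
    cost=None,
)


def option_for_level(recovery_level: int) -> ArchitectureOption:
    """Map a recovery level (assumed to be numbered 1-5) to a design option."""
    if recovery_level not in range(1, 6):
        raise ValueError("recovery level must be between 1 and 5")
    if recovery_level == 5:
        return ACTIVE_MULTI_AZ
    if recovery_level == 4:
        return WARM_STANDBY
    return COLD_STANDBY  # Recovery Level 3 or lower


print(option_for_level(4).design)
```

In the RPO column, every option is the same: the recovery point is determined by the frequency of backups or snapshots, so it is not part of the lookup.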

Recovery within 15 minutes (Recovery Level 5)

Technical Design:

Application configuration: Two or more application instances with equal capacity and configuration operate simultaneously across multiple data centers (availability zones).
Data synchronization: These applications must exchange data with one another across different data centers in near real-time.
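
As a rough illustration, the application-configuration requirement above can be checked against an inventory of instances. This is a minimal sketch under stated assumptions: the Instance record and level5_layout_problems function are hypothetical, the instance type is used as a stand-in for "equal capacity and configuration", and the zone names are examples. It does not query AWS and does not check data synchronization.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    name: str
    availability_zone: str  # example value: "us-west-2a"
    instance_type: str      # assumption: stand-in for capacity and configuration
    running: bool


def level5_layout_problems(instances: List[Instance]) -> List[str]:
    """Return the reasons a layout falls short of the Recovery Level 5 design."""
    problems = []
    active = [i for i in instances if i.running]
    if len(active) < 2:
        problems.append("fewer than two running instances")
    if len({i.availability_zone for i in active}) < 2:
        problems.append("running instances do not span multiple availability zones")
    if len({i.instance_type for i in active}) > 1:
        problems.append("running instances differ in capacity")
    return problems


layout = [
    Instance("app-1", "us-west-2a", "m5.large", running=True),
    Instance("app-2", "us-west-2b", "m5.large", running=True),
]
print(level5_layout_problems(layout))
```

An empty list means the layout satisfies the checks in this sketch; any other result names what is missing.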

Data Backup/Restore:

The frequency of data backups or snapshots must align with the recovery point objectives for each application.
The data backup should be in a geographically dispersed location. Using AWS Snapshot can fulfill this requirement.
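
To make "align with the recovery point objectives" concrete: if snapshots are taken at a fixed interval, a failure just before the next snapshot loses roughly one full interval of data. The sketch below encodes that reasoning. The function names are hypothetical, and it deliberately ignores the time a snapshot takes to complete, so treat it as a lower bound on exposure rather than a guarantee.

```python
from datetime import timedelta


def worst_case_data_loss(snapshot_interval: timedelta) -> timedelta:
    """Data written since the last snapshot is lost; at worst, one full interval."""
    return snapshot_interval


def meets_rpo(snapshot_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss from this schedule stays within the RPO."""
    return worst_case_data_loss(snapshot_interval) <= rpo


# Example: hourly snapshots against a 4-hour RPO, then daily snapshots.
print(meets_rpo(timedelta(hours=1), timedelta(hours=4)))
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))
```

The same check applies to all three architecture options, since each one ties the recovery point to the backup or snapshot frequency.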

Outcomes:

Recovery Time Objective – within 15 minutes
Recovery Point Objective – Frequency of backups or snapshots

Cost

High

Recovery within 6 hours (Recovery Level 4)

Technical Design:

Application configuration: A primary application instance is operational, while a second (DR) instance is configured to match its capacity. The DR instance’s operating system is running, but the application itself remains shut down until it is needed (warm standby).
Data synchronization: The frequency of data replication/backup from the primary data center to the secondary data center aligns with the recovery point objective.

Data Backup/Restore Features:

The frequency of data backups or snapshots must align with the recovery point objectives for each application.
The data backup should be in a geographically dispersed location. Using AWS Snapshot can fulfill this requirement.

Outcomes:

Recovery Time Objective – within 6 hours
Recovery Point Objective – Frequency of backups or snapshots

Cost

Medium

Recovery in 24 hours or more (Recovery Level 3 or lower)

Technical Design:

Application configuration: The primary application instance is operational, and the DR instance can be restored using replication technology (cold standby).
Data synchronization: Data replication and backup frequency from the primary data center to the secondary data center aligns with the recovery point objective.

Data Backup/Restore Features:

The frequency of data backups or snapshots must align with the recovery point objectives for each application.
The data backup should be in a geographically dispersed location. Using AWS Snapshot can fulfill this requirement.

Outcomes:

Recovery Time Objective – 24 hours or longer
Recovery Point Objective – Frequency of backups or snapshots

Cost

Frequently Asked Questions

What is a Recovery Level?

Under the IS-12 policy, IT resources and applications are categorized into five recovery levels.

What are Availability Zones?

An Availability Zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center. All AZs in an AWS Region are interconnected over fully redundant, dedicated metro fiber that provides high-throughput, low-latency networking between AZs. All traffic between AZs is encrypted. The network performance is sufficient to accomplish synchronous replication between AZs. AZs make it easy to partition applications for high availability. An application partitioned across AZs is better isolated and protected from issues such as power outages, lightning strikes, tornadoes, and earthquakes. Each AZ is physically separated from every other AZ by a meaningful distance of many kilometers, although all are within 100 km (60 miles) of each other.

What are AWS Regions?

An AWS Region is a physical location in the world where AWS clusters data centers. Each group of logical data centers is called an Availability Zone. Each AWS Region consists of a minimum of three isolated and physically separate AZs within a geographic area. According to AWS, this multiple-AZ design offers advantages over cloud providers that define a region as a single data center. Each AZ has independent power, cooling, and physical security and is connected via redundant, ultra-low-latency networks. AWS customers focused on high availability can design their applications to run in multiple AZs to achieve even greater fault tolerance. According to AWS, its infrastructure Regions meet the highest levels of security, compliance, and data protection.

(See image next on page.)

What if my application currently does not meet the Recovery Level requirement?

Choose the design that best aligns with the current recovery capability and reevaluate the application architecture to modify it as necessary.

What’s an active-active application?

Applications in different data centers function as one.

What’s a clustered application?

Multiple instances of the same application collaborate as a single, unified application, sharing data in real time. This configuration is synonymous with an “active-active” setup.

What’s the difference between warm and cold standby?

Warm standby: The application is pre-configured to operate and contains updated data, although not in real-time. The OS resource hosting the application is active, yet the application can be shut down to save resources.

Cold standby: The application or hardware is unconfigured or only minimally configured.

Do highly available applications (IT resources) comply with the IS-12 policy, or are they DR-ready?

These applications may be DR-ready and can minimize recovery time and effort during a service interruption. However, to comply with IS-12, they will also require documentation: a recovery plan, a test plan, testing, and backup testing. Please note that data replication may not equate to backup. Under IS-12, a backup must be stored in a different geographically dispersed location and needs to be tested.

Note: AWS replication can function as a backup because the data is stored in an AWS-managed location (S3).
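
The documentation and testing items listed in this answer can be tracked as a simple checklist. The sketch below is a hypothetical illustration: the DrEvidence fields and missing_items function are invented names, and the 365-day window is an assumption applied to both recovery testing and backup testing. Check the IS-12 policy itself for the required frequencies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DrEvidence:
    has_recovery_plan: bool
    has_test_plan: bool
    last_recovery_test: Optional[date]  # None if never tested
    last_backup_test: Optional[date]    # None if never tested
    backup_is_offsite: bool             # stored in a geographically dispersed location


def missing_items(
    evidence: DrEvidence,
    today: date,
    max_age: timedelta = timedelta(days=365),  # assumption: tests expire after a year
) -> List[str]:
    """List what an application still needs before it can claim compliance."""
    missing = []
    if not evidence.has_recovery_plan:
        missing.append("recovery plan")
    if not evidence.has_test_plan:
        missing.append("test plan")
    if evidence.last_recovery_test is None or today - evidence.last_recovery_test > max_age:
        missing.append("current recovery test")
    if evidence.last_backup_test is None or today - evidence.last_backup_test > max_age:
        missing.append("current backup test")
    if not evidence.backup_is_offsite:
        missing.append("geographically dispersed backup")
    return missing


example = DrEvidence(True, False, date(2023, 3, 1), None, True)
print(missing_items(example, today=date(2024, 6, 1)))
```

An application that is highly available but returns a non-empty list here would be DR-ready in the technical sense while still falling short of the documentation and testing requirements.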

What is S3?

S3 (Simple Storage Service) is a scalable object storage service provided by Amazon Web Services (AWS) that allows users to store and retrieve any amount of data, at any time, from anywhere on the web. It’s commonly used for backup, archiving, and hosting data like images, videos, and website content.