Availability

Availability concepts

High Availability - High availability provides redundancy and fault tolerance. The goal is to ensure the system is always available, even in the event of a failure. A system is highly available when it can withstand the failure of an individual or multiple components (e.g., hard disks, servers, network links).
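The effect of redundancy on availability can be illustrated with a little arithmetic. The following is a minimal sketch, assuming components fail independently of each other (an assumption that is not stated in these notes and often does not hold in practice); the function names are made up for illustration.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any single one suffices.

    Assumes independent failures, which is a simplification.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    print(f"one 99% server:       {parallel_availability(0.99, 1):.4%}")
    print(f"two redundant 99%:    {parallel_availability(0.99, 2):.4%}")
    print(f"two 99% in a chain:   {serial_availability(0.99, 0.99):.4%}")
```

Under the independence assumption, two redundant 99% components give about 99.99%, while chaining two such components without redundancy lowers availability to about 98.01%.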

Backup

Backup is critical to protecting data and ensuring business continuity, yet it can be a challenge to implement well. The pace at which data is generated is growing exponentially, while the density and durability of local disk are not improving at the same rate. Enterprise backup has become its own industry.

Data is generated on an arbitrarily large number of endpoints: laptops, desktops, servers, virtual machines, and now mobile devices. This means the problem of backup is distributed in nature. Current backup software, by contrast, is very centralized: the general model is to collect data from many devices and store it in a single place. Sometimes a copy of that stored data is also sent to tape. The centralized approach has the potential to overwhelm the backup target during recovery from a disaster and result in broken recovery SLAs.

Enterprise backup scenarios used to look like this: If you wanted high-performance data access, your backup had to live on disk. If you wanted cost-effective archival storage, your backup had to live on tape. If you wanted to archive off site, you had to physically deliver your archival tapes to another location. Recovery from local disk was fine, unless you needed something from a tape, and it might have taken a while if that tape wasn’t on site. The cloud has changed things. Backup software can write to the cloud without any changes to the backup software itself.

Disaster Recovery

Disaster recovery (DR) is about preparing for and recovering from a disaster. A disaster is any event that has a negative impact on a company’s business continuity or finances—including hardware or software failure, network outage, power outage, physical damage to a building like fire or flooding, human error, or some other significant event.

To minimize the impact of a disaster, companies invest time and resources to plan and prepare, train employees, and document and update DR processes. The amount of investment for DR planning for a particular system can vary dramatically depending on the cost of a potential outage. Companies that have traditional physical environments typically must duplicate their infrastructure to ensure the availability of spare capacity in the event of a disaster. The infrastructure needs to be procured, installed, and maintained so that it is ready to support the anticipated capacity requirements. During normal operations, the infrastructure typically is under-utilized or over-provisioned.

AWS Storage DR Options

Amazon S3

  • Objects redundantly stored on multiple devices across multiple facilities within a Region, designed to provide a durability of 99.999999999%

  • Data protection with versioning, MFA, bucket policies, and IAM

  • Cross-region replication enables automatic, asynchronous copying of objects across buckets in different AWS Regions
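To put the durability figure in perspective, here is a rough sketch of the arithmetic. It assumes the 99.999999999% figure is read as an average per-object durability over one year (an interpretation not spelled out in the list above), and the function names are illustrative only.

```python
from fractions import Fraction

# 99.999999999% ("eleven nines") as an exact fraction to avoid float rounding.
DESIGN_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int,
                           durability: Fraction = DESIGN_DURABILITY) -> Fraction:
    """Average number of objects expected to be lost per year.

    Treats the durability figure as a per-object, per-year average (assumption).
    """
    if object_count < 0:
        raise ValueError("object_count must be non-negative")
    return object_count * (1 - durability)


def years_per_expected_loss(object_count: int,
                            durability: Fraction = DESIGN_DURABILITY) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count, durability)


if __name__ == "__main__":
    stored = 10_000_000
    print(f"expected losses per year: {float(expected_annual_losses(stored))}")
    print(f"years per expected loss:  {float(years_per_expected_loss(stored))}")
```

With 10,000,000 stored objects, this reading works out to an average of one lost object every 10,000 years. It is a design target expressed as an average, not a guarantee for any particular object.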

Amazon S3 Glacier

  • Designed for the same durability as Amazon S3

  • An inventory of all archives in each of your vaults is maintained for disaster recovery or occasional reconciliation purposes

Amazon EBS

  • Create point-in-time volume snapshots

  • Copy snapshots across Regions and accounts

  • Snapshots are stored in Amazon S3, taking advantage of Amazon S3’s durability and availability

  • Volumes are replicated across multiple servers in an Availability Zone

AWS Snowball

Using Snowball helps eliminate challenges that can be encountered with large-scale data transfers, such as:

  • High network costs

  • Long transfer times

  • Security concerns

Snowball devices can also help retrieve large amounts of data (more than 10 TB) much more quickly than a high-speed internet connection.
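Whether shipping a device beats the network for a given data set can be estimated with rough arithmetic. The sketch below is only that: the 80% link utilization default and the `device_turnaround_days` parameter are assumptions the reader must supply, not published Snowball figures.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(data_terabytes: float,
                          link_mbps: float,
                          utilization: float = 0.8) -> float:
    """Days needed to move the data over a network link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second).
    `utilization` is the usable share of the link, an assumed value.
    """
    if data_terabytes < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid transfer parameters")
    total_bits = data_terabytes * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


def device_is_faster(data_terabytes: float,
                     link_mbps: float,
                     device_turnaround_days: float,
                     utilization: float = 0.8) -> bool:
    """True when the assumed end-to-end device turnaround beats the network."""
    return network_transfer_days(data_terabytes, link_mbps, utilization) > device_turnaround_days


if __name__ == "__main__":
    days = network_transfer_days(10, 100)
    print(f"10 TB over 100 Mbps at 80% utilization: {days:.1f} days")
```

Under these assumptions, 10 TB over a 100 Mbps link takes roughly 11.6 days, which is why physical transfer becomes attractive as data volumes grow or links get slower.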

Amazon EFS

Amazon EFS File Sync can be used to sync files from on-premises or in-cloud file systems to Amazon EFS at speeds of up to 5x faster than standard Linux copy tools.

AWS Networking DR Options

Amazon Route 53

  • Distribute traffic and load across multiple Regions using a variety of routing policies (for example, latency-based or weighted)

  • Automatic failover to another Region in the event of a failure
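The failover idea can be sketched in a few lines. This is a conceptual illustration only, not the Route 53 API or its internal algorithm: the `Endpoint` type, the `priority` field, and the `is_healthy` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    """A hypothetical regional endpoint; lower priority values are preferred."""
    name: str
    region: str
    priority: int


def pick_endpoint(endpoints: Sequence[Endpoint],
                  is_healthy: Callable[[Endpoint], bool]) -> Optional[Endpoint]:
    """Return the most-preferred endpoint that passes its health check.

    Returns None when every endpoint is unhealthy.
    """
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if is_healthy(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    endpoints = [
        Endpoint("primary", "us-east-1", priority=0),
        Endpoint("standby", "us-west-2", priority=1),
    ]
    # Simulate a failure of the primary Region.
    chosen = pick_endpoint(endpoints, lambda e: e.name != "primary")
    print(chosen)
```

In a real deployment the health checks and record selection are handled by the DNS service; the sketch only shows why a healthy standby in another Region is needed for failover to have somewhere to go.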

Elastic Load Balancing

  • Load balancing across multiple EC2 instances in different Availability Zones

  • Health checks and automatic failover if an instance fails.

Amazon VPC

  • Extend your existing network topology to the cloud, for example to recover enterprise applications that normally run on the on-premises network

  • Host infrastructure in multiple Availability Zones for greater availability

AWS Direct Connect

  • Fast and consistent replication and backup of your large on-premises environment to the cloud

  • Each connection has two endpoints for greater fault tolerance

AWS Database DR Options

Amazon RDS

  • Snapshot data and save it in a separate Region

  • Can share a manual snapshot with up to 20 other AWS accounts

  • Combine Read Replicas with multi-AZ deployments (dependent on database engine)

  • Read Replicas can be promoted to become primary database instances in the event of primary database instance failure

  • Automatic backups available

Amazon DynamoDB

  • Back up full tables to other Regions or to Amazon S3 within seconds

  • Point-in-time recovery enables you to continuously back up tables for up to 35 days

  • Initiate backups with a single click in the console or a single API call

  • Build multi-Region, multi-master tables with global tables

  • Global tables are replicated across Regions
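The 35-day point-in-time recovery window lends itself to a simple check before a restore is attempted. The sketch below is an illustration under stated assumptions: it models only the window length from the list above and ignores service-side details such as when point-in-time recovery was enabled on the table. The function name is made up.

```python
from datetime import datetime, timedelta, timezone

# Window length taken from the notes above.
PITR_WINDOW = timedelta(days=35)


def is_restorable(restore_to: datetime,
                  now: datetime,
                  window: timedelta = PITR_WINDOW) -> bool:
    """True when `restore_to` falls inside the recovery window ending at `now`."""
    return now - window <= restore_to <= now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days_back in (1, 34, 36):
        target = now - timedelta(days=days_back)
        print(f"{days_back:>2} days back restorable: {is_restorable(target, now)}")
```

A check like this is useful in runbooks, where the required restore point should be compared against the window before recovery time is spent on a request that cannot succeed.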

AWS Deployment Automation DR Options

AWS CloudFormation

  • Model your entire infrastructure in a text file, allowing for fast and consistent redeployment of failed/lost infrastructure

  • No need to perform manual actions or write custom scripts

  • Rolls back changes automatically in event of error
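As an illustration of infrastructure modeled in a text file, the sketch below builds a very small template and renders it as JSON. The logical ID `BackupBucket` and the choice of a single versioned S3 bucket are assumptions made for the example; a real DR template would describe far more of the stack.

```python
import json


def build_template(bucket_logical_id: str = "BackupBucket") -> dict:
    """Return a minimal CloudFormation template describing one versioned S3 bucket.

    The logical ID is a hypothetical name chosen for this example.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal example: one S3 bucket with versioning enabled",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            },
        },
    }


def render(template: dict) -> str:
    """Serialize the template to the text form that would be kept in version control."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template()))
```

Because the rendered file is plain text, it can be versioned, reviewed, and reused to recreate the same resources in another Region after a failure.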

AWS Elastic Beanstalk

  • Quickly redeploy your entire stack in a few clicks

  • Roll back to a previous version of your application if your updated version fails

AWS OpsWorks

  • Automatic host replacement

  • Combine it with AWS CloudFormation in the recovery phase

  • Provision a new stack from the stored configuration that supports the defined RTO