High Availability - High availability provides redundancy and fault tolerance. The goal is to ensure that the system remains available even in the event of a failure. A system is highly available when it can withstand the failure of one or more individual components (e.g., hard disks, servers, network links).
Backup is critical to protect data and ensure business continuity. At the same time, backup can be challenging to implement well. The pace at which data is generated is growing exponentially, while the density and durability of local disk are not growing at the same rate. Enterprise backup has become an industry of its own.
Data is generated on an arbitrarily large number of endpoints—laptops, desktops, servers, virtual machines, and now mobile devices. This means the problem of backup is distributed in nature. Current backup software is very centralized: the general model is to collect data from many devices and store it in a single place. Sometimes a copy of that stored data is also sent to tape. The centralized approach has the potential to overwhelm the backup target during recovery from a disaster and result in broken recovery SLAs.
Enterprise backup scenarios used to look like this: If you wanted high-performance data access, your backup had to live on disk. If you wanted cost-effective archival storage, your backup had to live on tape. If you wanted to archive off site, you had to physically deliver your archival tapes to another location. Recovery from local disk was fine, unless you needed something from a tape, and it might have been a while if that tape wasn't on site. The cloud has changed things. Backup software can now write to the cloud without any changes to the backup software itself.
Disaster recovery (DR) is about preparing for and recovering from a disaster. A disaster is any event that has a negative impact on a company’s business continuity or finances—including hardware or software failure, network outage, power outage, physical damage to a building like fire or flooding, human error, or some other significant event.
To minimize the impact of a disaster, companies invest time and resources to plan and prepare, train employees, and document and update DR processes. The amount of investment for DR planning for a particular system can vary dramatically depending on the cost of a potential outage. Companies that have traditional physical environments typically must duplicate their infrastructure to ensure the availability of spare capacity in the event of a disaster. The infrastructure needs to be procured, installed, and maintained so that it is ready to support the anticipated capacity requirements. During normal operations, the infrastructure typically is under-utilized or over-provisioned.
Amazon S3
Objects are redundantly stored on multiple devices across multiple facilities within a Region, designed to provide a durability of 99.999999999%
Data protection with versioning, MFA, bucket policies, and IAM
Cross-region replication enables automatic, asynchronous copying of objects across buckets in different AWS Regions
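Cross-region replication is configured per source bucket. A minimal sketch of the configuration document that S3's PutBucketReplication API accepts, built as a plain Python dict (the role and bucket ARNs and the rule ID are hypothetical):

```python
def build_replication_config(role_arn, dest_bucket_arn):
    """Replication configuration in the shape S3's
    PutBucketReplication API accepts (V1 rule schema)."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-all",  # hypothetical rule name
                "Prefix": "",           # empty prefix = all objects
                "Status": "Enabled",
                "Destination": {"Bucket": dest_bucket_arn},
            }
        ],
    }

# Hypothetical ARNs for illustration:
config = build_replication_config(
    "arn:aws:iam::123456789012:role/s3-replication",
    "arn:aws:s3:::dr-backup-bucket",
)
```

The resulting dict could then be applied via the AWS CLI or an SDK; versioning must be enabled on both buckets for replication to work.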
Amazon S3 Glacier
Designed for the same durability as Amazon S3
An inventory of all archives in each of your vaults is maintained for disaster recovery or occasional reconciliation purposes
Amazon EBS
Create point-in-time volume snapshots
Copy snapshots across Regions and accounts
Snapshots are stored in Amazon S3, taking advantage of Amazon S3’s durability and availability
Volumes are replicated across multiple servers in an Availability Zone
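Because snapshots are point-in-time copies, a common pattern is to take them on a schedule and prune the old ones. A small sketch of an age-based rotation policy (the policy itself is an illustration, not an EBS feature; snapshot IDs are hypothetical):

```python
from datetime import datetime, timedelta

def snapshots_to_delete(snapshots, now, keep_days=7):
    """Given a mapping of snapshot ID -> creation time, return the
    IDs that fall outside a simple age-based retention window."""
    cutoff = now - timedelta(days=keep_days)
    return sorted(sid for sid, created in snapshots.items()
                  if created < cutoff)

now = datetime(2024, 6, 10)
old = snapshots_to_delete(
    {"snap-aaa": datetime(2024, 6, 9),   # 1 day old  -> keep
     "snap-bbb": datetime(2024, 6, 1)},  # 9 days old -> delete
    now)
# old == ["snap-bbb"]
```

The surviving snapshots can then be copied to a second Region (EC2's CopySnapshot API) as the off-site leg of the backup.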
AWS Snowball
Using Snowball helps eliminate challenges that can be encountered with large-scale data transfers, such as:
High network costs
Long transfer times
Snowball devices can help retrieve large datasets (>10 TB) much more quickly than transferring them over high-speed internet.
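The claim above is easy to check with back-of-the-envelope arithmetic. A small sketch that estimates wire-transfer time (the 80% utilization factor is an assumption to account for protocol overhead):

```python
def transfer_days(terabytes, megabits_per_second, utilization=0.8):
    """Days needed to push data over a network link at the given
    sustained rate, discounted by a utilization factor."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * utilization)
    return seconds / 86400

# 10 TB over a dedicated 100 Mbps link:
days = transfer_days(10, 100)  # roughly 11.6 days
```

At that rate, shipping a physical device round trip is typically faster, before even counting the cost of saturating the link for days.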
Amazon EFS File Sync can be used to sync files from on-premises or in-cloud file systems to Amazon EFS at speeds of up to 5x faster than standard Linux copy tools.
Amazon Route 53
Distribute traffic / load across multiple Regions using a variety of routing policies
Automatic failover to another Region in the event of a failure
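Failover routing is configured as a PRIMARY/SECONDARY pair of record sets, where the primary carries a health check. A sketch of the change batch in the shape Route 53's ChangeResourceRecordSets API expects (domain, IPs, and health check ID are hypothetical):

```python
def failover_record(name, ip, role, health_check_id=None):
    """One record of a PRIMARY/SECONDARY failover pair."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "TTL": 60,         # short TTL so failover takes effect quickly
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return record

change_batch = {"Changes": [
    {"Action": "UPSERT", "ResourceRecordSet": failover_record(
        "app.example.com.", "203.0.113.10", "PRIMARY", "hc-primary")},
    {"Action": "UPSERT", "ResourceRecordSet": failover_record(
        "app.example.com.", "198.51.100.20", "SECONDARY")},
]}
```

When the primary's health check fails, Route 53 starts answering queries with the secondary record, which can point at a standby stack in another Region.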
Elastic Load Balancing
Load balancing across multiple EC2 instances in different Availability Zones
Health checks and automatic failover if an instance fails.
Extend your existing network topology to the cloud (e.g., recover enterprise applications that run on the on-premises network)
Host infrastructure in multiple Availability Zones for greater availability
Fast and consistent replication / backup of your large on-premises environment to the cloud
Each connection has two endpoints for greater fault tolerance
Amazon RDS
Snapshot data and save it in a separate Region
Can share a manual snapshot with up to 20 other AWS accounts
Combine Read Replicas with multi-AZ deployments (dependent on database engine)
Read Replicas can be promoted to become primary database instances in the event of a primary database instance failure
Automatic backups available
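Sharing a manual snapshot is a single attribute change on the snapshot. A sketch of the parameters for RDS's ModifyDBSnapshotAttribute API, including the 20-account limit noted above (the snapshot ID and account numbers are hypothetical):

```python
def share_snapshot_params(snapshot_id, account_ids):
    """Parameters to share a manual RDS snapshot with other
    AWS accounts via ModifyDBSnapshotAttribute."""
    if len(account_ids) > 20:
        raise ValueError(
            "a manual snapshot can be shared with at most 20 accounts")
    return {
        "DBSnapshotIdentifier": snapshot_id,
        "AttributeName": "restore",   # grants restore permission
        "ValuesToAdd": account_ids,
    }

params = share_snapshot_params("dr-snapshot-2024", ["123456789012"])
```

The receiving account can then restore its own DB instance from the shared snapshot, which is a simple way to keep a recovery copy outside the primary account's blast radius.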
Amazon DynamoDB
Back up full tables to other Regions or to Amazon S3 within seconds
Point-in-time recovery enables you to continuously back up tables for up to 35 days
Initiate backups with a single click in the console or a single API call
Build multi-region, multi-master tables with global tables
Global tables are replicated across Regions
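The 35-day point-in-time recovery window above bounds how far back a restore can reach. A simplified sketch of that calculation (in practice the earliest restorable time also depends on when PITR was enabled on the table, which this ignores):

```python
from datetime import datetime, timedelta

PITR_WINDOW_DAYS = 35  # DynamoDB's point-in-time recovery window

def earliest_restorable_time(now):
    """Oldest point in time a continuously backed-up table
    can be restored to, under the simplifying assumption that
    PITR has been enabled for longer than the full window."""
    return now - timedelta(days=PITR_WINDOW_DAYS)

assert earliest_restorable_time(datetime(2024, 3, 10)) == datetime(2024, 2, 4)
```

Anything older than this window must come from an on-demand backup, which is retained until explicitly deleted.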
AWS CloudFormation
Model your entire infrastructure in a text file, allowing for fast and consistent redeployment of failed/lost infrastructure
No need to perform manual actions or write custom scripts
Rolls back changes automatically in event of error
Quickly redeploy your entire stack in a few clicks
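The "text file" is a JSON or YAML template. A minimal sketch built as a Python dict and serialized to JSON (the logical ID and description are hypothetical; a real DR template would declare the full stack):

```python
import json

# Minimal template: one S3 bucket, name left for CloudFormation
# to generate so the template can deploy into any Region/account.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal DR stack sketch",
    "Resources": {
        "BackupBucket": {               # hypothetical logical ID
            "Type": "AWS::S3::Bucket",
        }
    },
}

template_body = json.dumps(template, indent=2)
```

Because the template is plain text, it can be version-controlled alongside application code and redeployed unchanged in a recovery Region.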
AWS Elastic Beanstalk
Roll back to a previous version of your application if your updated version fails
Automatic host replacement
Combine it with AWS CloudFormation in the recovery phase
Provision a new stack from the stored configuration that supports the defined RTO