Auto Scaling

  • Launches or terminates instances based on specified conditions

  • Automatically registers new instances with specified load balancers

  • Can launch across Availability Zones

  • Can leverage On-Demand, Reserved, and Spot Instances

Auto Scaling Types

Scheduled Scaling

  • Scale based on schedule; scale your application ahead of known load changes

  • Example: Turning off your dev and test instances at night
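
A scheduled action like this can be modeled as a simple rule mapping the time of day to a desired capacity. A minimal Python sketch — the 8 PM to 6 AM off-hours window and the daytime capacity are assumptions, not part of the source:

```python
from datetime import time

# Hypothetical off-hours window for dev/test instances (assumption).
OFF_START = time(20, 0)   # 8 PM
OFF_END = time(6, 0)      # 6 AM

def scheduled_capacity(t, daytime_capacity=2):
    """Return the desired capacity for a dev/test group at time t.

    Overnight the group is scaled to zero, mirroring the
    'turn off dev and test instances at night' example.
    """
    in_off_window = t >= OFF_START or t < OFF_END  # window crosses midnight
    return 0 if in_off_window else daytime_capacity
```

In AWS itself this would be two scheduled actions on the Auto Scaling group: one setting desired capacity to 0 in the evening, and one restoring it in the morning.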

Dynamic Scaling

  • Excellent for general scaling

  • Allows your scaling to respond to unanticipated changes in traffic

  • Example: Scaling based on a CPU utilization CloudWatch alarm
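
The CPU-alarm example can be sketched as a threshold check over recent metric datapoints. This is a local illustration of the decision logic, not the CloudWatch API; the thresholds and datapoint counts are assumptions:

```python
def scaling_decision(cpu_datapoints, high=70.0, low=30.0, periods=3):
    """Decide a scaling action from recent average-CPU datapoints.

    Mimics a CloudWatch alarm that fires only when the metric breaches
    its threshold for `periods` consecutive datapoints.
    """
    recent = cpu_datapoints[-periods:]
    if len(recent) < periods:
        return "no-action"          # not enough data to evaluate
    if all(p > high for p in recent):
        return "scale-out"          # sustained high CPU: add capacity
    if all(p < low for p in recent):
        return "scale-in"           # sustained low CPU: remove capacity
    return "no-action"
```

Requiring several consecutive breaching datapoints is what keeps a single noisy sample from triggering a scaling action.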

Predictive Scaling

  • Easiest to use

  • Scales based on machine learning algorithms

  • Example: You want to eliminate manual monitoring and adjustment of Auto Scaling

Amazon Auto Scaling Example

Step 1 - ELB triggers Amazon CloudWatch

In this example, the load balancer was configured to report latency to Amazon CloudWatch, which has an alarm set up to trigger when latency gets poor enough to warrant adding more instances.

Step 2 - CloudWatch triggers scaling policy

When the CloudWatch alarm goes off, it triggers a scaling policy set up in the Auto Scaling group for those instances.

Step 3 - Auto Scaling scales out and registers instance with load balancer

Finally, once the scaling action is triggered, Auto Scaling launches a third instance into the group, based on the configurations specified in the Auto Scaling group, and registers that instance with the load balancer so that it will receive traffic appropriately.
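
The three steps above can be sketched end to end in plain Python, with the alarm, scaling policy, and load balancer registration stubbed out as simple data structures. All names here are illustrative, not AWS API calls:

```python
def handle_latency_report(latency_ms, group, threshold_ms=500):
    """Walk the three steps: metric -> alarm -> policy -> register with LB.

    `group` is a dict with 'instances' (list), 'max' (int), and the load
    balancer's target list under 'lb_targets'.
    """
    # Step 1: the load balancer reports latency; the alarm evaluates it.
    if latency_ms <= threshold_ms:
        return group  # alarm did not fire

    # Step 2: the alarm triggers the group's scale-out policy (+1 instance).
    if len(group["instances"]) >= group["max"]:
        return group  # already at maximum capacity

    # Step 3: Auto Scaling launches an instance and registers it with the LB.
    new_instance = "i-{:04d}".format(len(group["instances"]) + 1)  # illustrative ID
    group["instances"].append(new_instance)
    group["lb_targets"].append(new_instance)
    return group
```

With two instances and a 900 ms latency report, the group grows to three instances and the new instance appears in the load balancer's target list, matching the walkthrough.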

Auto Scaling groups

  • Minimum capacity - The lowest number of instances the group can have. Once the group reaches this number, Auto Scaling ignores CloudWatch alarms that request further scale-in.

  • Maximum capacity - The highest number of instances the group can have. Once the group reaches this number, Auto Scaling ignores CloudWatch alarms that request further scale-out.

  • Desired capacity - The number of instances your group starts with when you create it. As CloudWatch alarms fire and request scaling, Auto Scaling adjusts the desired capacity to the number of instances it needs to scale in or out to.
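
The relationship between the three settings is a clamp: whatever capacity a scaling policy requests, the group's desired capacity never leaves the [min, max] range. A minimal sketch:

```python
def set_desired_capacity(requested, minimum, maximum):
    """Clamp a requested desired capacity to the group's min/max bounds."""
    return max(minimum, min(maximum, requested))
```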

For example, you start your group with the following settings: Min: 2, Max: 10, Desired: 5.

Auto Scaling launches 5 instances.

A few minutes later, a low CPU utilization alarm goes off, and CloudWatch requests Auto Scaling to scale this group in by 1 instance. Auto Scaling changes the group’s desired capacity to 4 and terminates 1 instance, chosen by your termination policy. Your new desired capacity is 4. If you later change it back to 5, Auto Scaling launches a new instance to match the desired capacity.
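
The worked example above can be replayed as a short simulation; the clamp helper is illustrative:

```python
def clamp(requested, minimum, maximum):
    """Keep desired capacity inside the group's bounds."""
    return max(minimum, min(maximum, requested))

# Group starts with Min: 2, Max: 10, Desired: 5.
minimum, maximum, desired = 2, 10, 5
running = desired  # Auto Scaling launches 5 instances

# A low-CPU alarm requests a scale-in of 1 instance.
desired = clamp(desired - 1, minimum, maximum)   # desired is now 4
running = desired                                 # 1 instance terminated

# Later, an operator sets the desired capacity back to 5.
desired = clamp(5, minimum, maximum)              # desired is now 5
launched = desired - running                      # 1 new instance launched
```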

We recommend starting with minimum and desired capacities of 2 instances (1 per Availability Zone).

Auto Scaling best practices

  • Avoid thrashing (rapidly launching and terminating instances)

  • Scale out early, scale in slowly

  • Set the min and max capacity parameters carefully

  • Use lifecycle hooks (perform custom actions as Auto Scaling launches or terminates instances)

  • Stateful applications require additional configuration automation for instances launched into Auto Scaling groups
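
The "scale out early, scale in slowly" practice is often implemented with asymmetric cooldowns: a short wait before another scale-out, a long wait before a scale-in. A sketch of that timing logic, with assumed cooldown values:

```python
def can_scale(action, now, last_scale_out, last_scale_in,
              out_cooldown=60, in_cooldown=600):
    """Return True if the requested action is outside its cooldown window.

    Scale-out uses a short cooldown (react quickly to load); scale-in uses
    a long one (avoid thrashing by terminating instances too aggressively).
    Times are in seconds.
    """
    if action == "scale-out":
        return now - last_scale_out >= out_cooldown
    if action == "scale-in":
        return now - last_scale_in >= in_cooldown
    raise ValueError("unknown action: " + action)
```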

Amazon RDS Scaling

Horizontal: Read Replicas

  • For read-heavy workloads

  • Offloads reporting

  • Replication is asynchronous

  • Available for Amazon Aurora, MySQL, MariaDB, and PostgreSQL
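
Read replicas only help if the application routes reads to them. A minimal routing sketch — round-robin over replica endpoints, with writes pinned to the primary (the endpoint names are hypothetical):

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary, spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads if no replicas exist.
        self._reads = itertools.cycle(replicas or [primary])

    def endpoint_for(self, operation):
        if operation == "write":
            return self.primary          # replication is asynchronous,
        return next(self._reads)         # so replica reads may be slightly stale
```

Because replication is asynchronous, replicas can serve slightly stale data, which is why they suit reporting and other read-heavy work rather than transactional writes.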

Horizontal: Sharding

  • For write-heavy workloads

  • Splits data into large chunks (shards)

  • Can give you higher performance and better operating efficiency in many circumstances
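
Sharding assigns each record to a shard, typically by hashing its key, so writes spread across several database instances. A minimal sketch (the shard names and key scheme are illustrative):

```python
import hashlib

def shard_for(key, shards):
    """Pick a shard for a key via a stable hash.

    hashlib is used instead of the built-in hash() so the key-to-shard
    mapping stays consistent across processes and restarts.
    """
    digest = hashlib.sha256(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

The trade-off: queries that span shards (joins, aggregates) become the application's problem, which is why sharding suits write-heavy workloads with naturally partitionable keys.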

Vertical: Push-Button Scaling

  • Scale your Amazon RDS instances up or down with the RDS APIs or a few clicks in the console, often with minimal downtime

  • Scale from micro to 8xlarge and everything in between

  • Scale storage up with zero downtime

  • Scale throughput (when using provisioned IOPS RDS storage)

Amazon DynamoDB Scaling

Auto Scaling

  • Automatically adjusts read and write throughput capacity in response to dynamically changing request volumes with zero downtime

  • Default for all new tables

  • Just set your desired throughput utilization target and minimum and maximum capacity limits

  • Continuously monitors actual throughput consumption using Amazon CloudWatch

  • No additional cost to use

  • Available in all AWS Regions

  • Best for general scaling needs for most applications with relatively predictable scaling needs
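
DynamoDB auto scaling adjusts provisioned capacity toward a target utilization: roughly, new capacity ≈ consumed throughput ÷ target utilization, clamped to the configured limits. A sketch of that arithmetic (the numbers are illustrative):

```python
import math

def target_capacity(consumed, target_utilization, minimum, maximum):
    """Provisioned capacity needed to bring utilization to the target.

    E.g. consuming 140 units at a 70% target suggests provisioning 200,
    so actual consumption sits at the desired fraction of capacity.
    """
    needed = math.ceil(consumed / target_utilization)
    return max(minimum, min(maximum, needed))
```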

On-Demand

  • Flexible billing option, capable of serving thousands of requests per second without capacity planning

  • Uses pay-per-request pricing instead of a provisioned pricing model

  • DynamoDB adapts rapidly to accommodate new peaks in level of traffic

  • Best for spiky, unpredictable workloads