Auto Scaling

Definition: Ability to automatically adjust compute resources based on actual demand, scaling up or down according to defined metrics.

— Source: NERVICO, Product Development Consultancy

What is Auto Scaling

Auto Scaling is the ability to automatically adjust the amount of compute resources allocated to an application based on actual demand. When traffic increases, Auto Scaling adds instances or containers to absorb the load. When demand decreases, it reduces resources to avoid paying for unused capacity. On AWS, Auto Scaling applies to EC2 instances, ECS tasks, DynamoDB tables, Aurora clusters, and other services, typically driven by metrics such as CPU usage, memory, requests per second, or custom metrics.

How It Works

Auto Scaling operates through policies that define when and how to scale. A target tracking scaling policy holds a metric near a specified value, for example average CPU usage at 60%. When the metric rises above the target, Auto Scaling launches new instances; when it falls below, it gradually terminates them. Step scaling policies define different adjustments depending on how far the metric deviates from its threshold. Scheduled scaling handles predictable patterns by changing capacity at set times. On EC2, all of this runs within an Auto Scaling group, which defines the minimum and maximum number of instances, the launch template or legacy launch configuration (AMI, instance type, security groups), and the subnets where instances are deployed.
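
The target tracking idea can be illustrated with a small proportional model. This is a simplified sketch, not the algorithm AWS actually runs: it assumes the metric scales inversely with instance count and ignores cooldowns, instance warm-up, and the more conservative behavior applied when scaling in. The function name and parameters are hypothetical.

```python
import math


def desired_capacity(current_capacity, metric_value, target_value, min_size, max_size):
    """Estimate the instance count that would bring the metric back to target.

    Simplified proportional model (an assumption for illustration):
    if 3 instances run at 90% CPU and the target is 60%, the same load
    spread over 3 * 90 / 60 = 4.5 instances would sit at the target,
    so we round up to 5.
    """
    if current_capacity < 1:
        raise ValueError("current_capacity must be at least 1")
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")

    # Round up so the metric lands at or below the target, not above it.
    raw = math.ceil(current_capacity * metric_value / target_value)

    # The group's limits always win over the computed value.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    print(desired_capacity(3, 90, 60, min_size=1, max_size=10))   # scale out
    print(desired_capacity(4, 30, 60, min_size=2, max_size=10))   # scale in
    print(desired_capacity(10, 95, 60, min_size=1, max_size=12))  # capped at max
```

The clamp at the end reflects the role of the group's minimum and maximum: however far the metric deviates, capacity never leaves that range.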

Why It Matters

Manually sizing infrastructure forces a choice between overprovisioning (paying for idle resources) and underprovisioning (risking downtime under load). Auto Scaling largely removes this dilemma by adjusting capacity in near real time. For applications with variable traffic, such as e-commerce platforms with seasonal peaks or APIs with concentrated business-hours usage, Auto Scaling can reduce costs by 30% to 60% compared to fixed infrastructure, while maintaining availability during demand spikes.

Practical Example

A hotel booking platform experiences 5x higher traffic on Monday mornings when companies book business travel. The team configures Auto Scaling with a minimum of 3 EC2 instances, a maximum of 15, and a CPU target of 65%. On Mondays at 8:00 AM, traffic increases and Auto Scaling launches additional instances within 2 minutes. By 12:00 PM, when traffic stabilizes, the extra instances are gradually terminated. Weekly costs decrease by 45% compared to maintaining 15 instances permanently active.
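
As a rough sketch, the configuration in this example could be expressed as request parameters for the AWS SDK for Python (boto3). The group name and policy name below are hypothetical; the limits (3 and 15) and the 65% CPU target come from the example. The code only builds the parameter dictionaries, so it runs without AWS credentials.

```python
def booking_platform_scaling_config(group_name="booking-web-asg"):
    """Build Auto Scaling parameters for the hotel booking example.

    group_name and the policy name are placeholders chosen for
    illustration; they do not come from the original example.
    """
    # Capacity limits for the Auto Scaling group.
    group_limits = {
        "AutoScalingGroupName": group_name,
        "MinSize": 3,
        "MaxSize": 15,
    }

    # Target tracking policy that keeps average CPU across the group near 65%.
    cpu_policy = {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-target-65",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": 65.0,
        },
    }
    return group_limits, cpu_policy


if __name__ == "__main__":
    limits, policy = booking_platform_scaling_config()
    print(limits)
    print(policy)
```

Assuming an existing group and configured credentials, these dictionaries could be passed to the boto3 Auto Scaling client as `client.update_auto_scaling_group(**limits)` and `client.put_scaling_policy(**policy)`.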
