What is Auto-Scaling?

Auto-Scaling is a concept within cloud computing that can mean one of two things:

Auto-Scaling is a process of virtualized resource automation in which a cloud service provider scales the resources of a client’s hosting environment to meet the demand placed on that environment. In common terms, the provider automatically scales server resources up or down depending on how much incoming traffic the hosted website is receiving. During peak traffic hours, the environment’s resources are scaled up to meet demand; when traffic drops off, resources are scaled back down so the client doesn’t pay for capacity that is not being used.

Auto-Scaling also refers to a process in which the provider scales a hosting environment by provisioning and deploying additional cloud instances alongside the original environment. In this type of auto-scaling, the resources of the original environment are not only scaled to meet incoming demand; wholly new cloud servers and instances are deployed as well. This type of auto-scaling also relies on load balancing to distribute traffic across the instances, for example with round-robin routing. As above, the resources and servers are virtualized.
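To make the two meanings concrete, here is a minimal Python sketch. It assumes a simple proportional rule (grow or shrink capacity in proportion to how far utilization is from a target) and a toy round-robin router. The names desired_units and InstancePool, the "web-N" instance names, and the 60% target are hypothetical choices for illustration, not any provider's actual algorithm or API.

```python
def desired_units(current_units, utilization_pct, target_pct=60,
                  min_units=1, max_units=20):
    """Return how many units are needed to bring utilization near the target."""
    # Integer ceiling of current_units * utilization_pct / target_pct
    needed = -((-current_units * utilization_pct) // target_pct)
    # Never scale below the floor or above the ceiling (both assumed limits)
    return max(min_units, min(max_units, needed))


class InstancePool:
    """Toy pool of virtual instances behind a round-robin load balancer."""

    def __init__(self, count):
        self.instances = [f"web-{i}" for i in range(1, count + 1)]
        self._next = 0  # index of the instance that receives the next request

    def resize(self, count):
        """Deploy or retire instances so that exactly `count` remain."""
        self.instances = [f"web-{i}" for i in range(1, count + 1)]
        self._next %= count

    def route(self):
        """Hand the next request to the next instance in rotation."""
        instance = self.instances[self._next]
        self._next = (self._next + 1) % len(self.instances)
        return instance


pool = InstancePool(2)
quiet = [pool.route() for _ in range(4)]   # traffic alternates between 2 instances

pool.resize(desired_units(2, 95))          # peak: 95% utilization on 2 instances
busy = [pool.route() for _ in range(4)]    # traffic now spreads over 4 instances

pool.resize(desired_units(4, 15))          # traffic backs off: scale down to 1
idle = pool.route()
```

In this sketch the first meaning of auto-scaling corresponds to desired_units alone (more or fewer resource units for one environment), while the second corresponds to resize plus route (new instances deployed and traffic rotated across them).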

Traditional Scaling Models

Within the traditional web hosting model, scaling was conducted in one of two ways:

Purchasing More Than Needed. The first, and most traditional, avenue for scaling was to purchase more servers/resources than the application would ever need. The basic idea is simple: provisioning more servers than the application needs ensures demand can always be met, so downtime should never occur as a result of insufficient environment resources. The downside is cost: overshooting to protect against demand spikes means paying for servers and resources your application may never use.

Averaging Capacity. The second traditional form of scaling was to purchase enough servers/resources to meet the average demand of your application. This was accomplished by averaging the application’s load over a given period of time and purchasing the amount of resources/servers needed to meet that mean demand. While this option isn’t as expensive as overshooting, clients run the risk of delivering poor performance to their customers and possibly experiencing widespread downtime whenever demand exceeds the average.
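The trade-off between these two fixed-capacity models can be sketched with a little arithmetic. The hourly demand figures below are made up purely for illustration, and the evaluate helper is a hypothetical name; the point is only to show idle capacity in one model and unmet demand in the other.

```python
# Hypothetical number of servers the application needs in each hour
hourly_demand = [2, 2, 3, 5, 9, 12, 8, 4]


def evaluate(provisioned, demand):
    """Summarize a fixed fleet of `provisioned` servers against hourly demand."""
    return {
        # Fixed fleets are paid for every hour, used or not
        "server_hours_paid": provisioned * len(demand),
        # Capacity that sat unused
        "idle_server_hours": sum(max(provisioned - d, 0) for d in demand),
        # Demand that could not be served (slow responses or downtime)
        "unmet_server_hours": sum(max(d - provisioned, 0) for d in demand),
    }


# Model 1: purchase for the peak so demand is always met
peak_plan = evaluate(max(hourly_demand), hourly_demand)

# Model 2: purchase for the mean demand, rounded up to a whole server
average_servers = -(-sum(hourly_demand) // len(hourly_demand))
average_plan = evaluate(average_servers, hourly_demand)

print(peak_plan)
print(average_plan)
```

With these sample numbers, the peak plan pays for 96 server-hours and leaves 51 idle, while the average plan pays for 48 but falls 11 server-hours short during the busiest hours.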

A Third Option: Hybrid Auto-Scaling

The third option, auto-scaling through the dual process described at the start of this article, combines the strengths of both older models: capacity is there when demand peaks, as with overshooting, but without paying for resources that go unused or risking application downtime when demand eclipses the average.
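As a rough illustration of that claim, the sketch below lets capacity follow demand hour by hour. It is an idealized model under stated assumptions: the demand figures are invented, the one-server headroom is an arbitrary choice, and scaling is assumed to react within the hour, which real systems only approximate. The function name autoscaled_capacity is hypothetical.

```python
# Hypothetical number of servers the application needs in each hour
hourly_demand = [2, 2, 3, 5, 9, 12, 8, 4]

HEADROOM = 1  # assumed spare server kept on top of current demand


def autoscaled_capacity(demand, headroom=HEADROOM, min_servers=1):
    """Capacity per hour when the environment tracks demand (idealized)."""
    return [max(min_servers, d + headroom) for d in demand]


capacity = autoscaled_capacity(hourly_demand)

# Server-hours paid when capacity follows demand
paid_autoscaled = sum(capacity)

# Server-hours paid by a fixed fleet sized for the peak, for comparison
paid_fixed_peak = max(hourly_demand) * len(hourly_demand)

# Demand left unserved by the auto-scaled environment
unmet = sum(max(d - c, 0) for d, c in zip(hourly_demand, capacity))

print(paid_autoscaled, paid_fixed_peak, unmet)
```

Under these assumptions the auto-scaled environment pays for 53 server-hours against 96 for a fleet sized to the peak, with no unmet demand. A scaler that reacts more slowly than demand changes would show some shortfall during sharp spikes.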

For the client, auto-scaling means paying a more exact amount for the services actually used while also being more confident the hosted environment won’t fail under high traffic load.

For the provider, auto-scaling means being able to allocate a larger pool of resources to a larger number of clients without having to worry about maximum load on a single environment slowing down server performance.

Auto-scaling is a win-win for all involved.

Rick Donato

Rick Donato is a Network Automation Architect/Evangelist and the founder of Packet Coders.
