Web Server Scaling

What is Application Scalability?

Application scalability is the ability of an application to grow over time, efficiently handling more and more requests per minute (RPM). It’s not a simple switch we can turn on or off; it’s a long-term process that touches almost every item in our stack, on both the hardware and software sides of the system.

Types of Scaling

Scaling falls into two broad categories: Scaling Up and Scaling Out.

  • Scaling Up: This involves adding more resources to your existing servers, e.g. RAM, disk space, processors, etc. It is useful in certain scenarios, but becomes expensive beyond a certain point, and you will find it is better to resort to Scaling Out.
  • Scaling Out: In this process, more machines or additional server instances/nodes are added. This is also called clustering, because all the servers are supposed to work together in unison (as a group or cluster) while remaining transparent to the client.

Scaling out works well if your application is stateless, i.e. your application logic does not depend on existing server state to process a request, e.g. a RESTful API back end.
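To make this concrete, here is a minimal sketch of a stateless request handler (all names are hypothetical). Because the response is computed purely from the incoming request, never from state stored on a particular server, any node in the cluster can serve any request, which is what makes scaling out straightforward.

```python
def handle_request(request: dict) -> dict:
    """Compute a response purely from the incoming request."""
    user_id = request["user_id"]
    item = request["item"]
    # No session lookup, no server-local state: the same input always
    # yields the same output on every node in the cluster.
    return {"user_id": user_id, "message": f"fetched {item}"}

# Two "nodes" running the same code give identical answers:
node_a = handle_request({"user_id": 42, "item": "orders"})
node_b = handle_request({"user_id": 42, "item": "orders"})
```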

High Availability != Scalability

Yes! Just because a system is highly available (by having multiple server nodes to fail over to) does not mean it is scalable as well. High availability just means that if the current processing node crashes, the request is passed on, or failed over, to a different node in the cluster so that it can continue from where it left off – that’s pretty much it! Scalability is the ability to improve specific characteristics of the system (e.g. number of users, throughput, performance) by increasing the available resources (RAM, processors, etc.). Even if the failed request is passed on to another node, you cannot guarantee that the application will behave correctly in that scenario (read on to understand why).

Load Balance Your Scaled Out Cluster

Let’s assume that we have scaled up to our maximum capacity and have now scaled out our system into a cluster of multiple nodes. The next step is to put a Load Balancer in front of our clustered infrastructure so that we can distribute the load among our cluster members. Load balancing is not covered in detail here, since I do not have much insight beyond the basics 🙂
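A basic distribution strategy is round robin: each request goes to the next node in turn. Here is a minimal sketch of that idea (node names are hypothetical; a real load balancer such as NGINX or HAProxy would do this at the network level).

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin load balancer over a fixed set of nodes."""

    def __init__(self, nodes):
        # itertools.cycle repeats the node list endlessly.
        self._cycle = itertools.cycle(nodes)

    def pick_node(self) -> str:
        # Each call returns the next node in turn, spreading
        # requests evenly across the cluster.
        return next(self._cycle)


lb = RoundRobinBalancer(["node-1", "node-2", "node-3"])
picks = [lb.pick_node() for _ in range(6)]
# picks -> ["node-1", "node-2", "node-3", "node-1", "node-2", "node-3"]
```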

Hello, Sticky Sessions!


Sticky Session configuration can be done at the load balancer level to ensure that requests from a specific client/end user are always forwarded to the same instance/application server node, i.e. server affinity is maintained. Thus, we alleviate the problem of the required state not being present. But there is a catch here – what if that node crashes? The state will be destroyed, and the user will be forwarded to an instance that has none of the existing state on which the server-side request processing depends.
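One common way to implement server affinity is to hash a client identifier (such as the session id) onto the list of nodes. A minimal sketch, with hypothetical node names:

```python
import hashlib

NODES = ["node-1", "node-2", "node-3"]  # hypothetical cluster members


def pick_sticky_node(session_id: str, nodes=NODES) -> str:
    """Map a session id deterministically onto one node."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    # The same session id always hashes to the same node,
    # so the client keeps landing on the same instance.
    return nodes[int(digest, 16) % len(nodes)]


first = pick_sticky_node("session-abc")
second = pick_sticky_node("session-abc")
# first == second: server affinity is maintained.
# But if that node crashes and is removed from NODES, the client is
# rehashed onto a surviving node that holds none of its state --
# the catch described above.
```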


Enter Replicated Clustering

In order to resolve the above problem, we can configure our application server’s clustering mechanism to support replication for our stateful components. By doing this, we can ensure that our HTTP session data (and other stateful objects) is present on all the server instances, so the end user’s request can be forwarded to any server node. Even if a server instance crashes or is unavailable, any other node in the cluster can handle the request. Now our cluster is no ordinary cluster – it’s a replicated cluster.


External Store for Stateful Components

Replicating state across every node in the cluster can be avoided altogether by storing session data and stateful objects in a separate tier. We can do so using an RDBMS. Again, most application servers have built-in support for this.
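A minimal sketch of the external-store idea, using SQLite as a stand-in for a production RDBMS: sessions are written to and read from the database tier, so the application nodes themselves hold no session state.

```python
import json
import sqlite3

# In-memory SQLite stands in for a shared, external database tier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT)")


def save_session(session_id: str, data: dict) -> None:
    """Persist session state to the external store."""
    conn.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?)",
        (session_id, json.dumps(data)),
    )


def load_session(session_id: str):
    """Fetch session state; any node can call this."""
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# One node saves on request 1; a different node loads on request 2:
save_session("s1", {"user": "alice", "cart": ["book"]})
restored = load_session("s1")
```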


Notice that we have moved the storage from an in-memory tier to a persistent tier. At the end of the day, however, we might end up facing scalability issues because of the database. I am not saying this will happen for sure, but depending on our application, the DB might get overloaded and latency might creep in. For example, in a failover scenario, think about recreating the entire user session state from the DB for use within another cluster instance – this can take time and affect the end-user experience during peak loads.

Final Frontier: Caching

It is the final frontier – at least in my opinion, since it moves us back to the in-memory approach. We can use Redis, Memcached or any other distributed caching server to get a better outcome in terms of scalability.
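The core pattern a distributed cache provides is set-with-expiry and get. Here is a tiny in-process sketch of that interface, standing in for Redis or Memcached (same get/set-with-TTL idea, minus the networking and clustering that make the real servers shared across nodes).

```python
import time


class SimpleCache:
    """Toy key-value cache with per-entry expiry (TTL)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds: float = 60.0) -> None:
        # Record the value along with its absolute expiry time.
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            # Expired entries are evicted lazily on read.
            del self._store[key]
            return None
        return value


cache = SimpleCache()
cache.set("session:s1", {"user": "alice"}, ttl_seconds=30)
hit = cache.get("session:s1")        # -> {"user": "alice"}
miss = cache.get("session:unknown")  # -> None
```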


Toufiq Mahmud