Applying Laws of Scalability to Technology and People
As businesses grow with a larger customer base and more employees, they face challenges in scaling their systems to meet customer demand and in maintaining rapid product development with bigger teams. Businesses aim to scale their systems linearly with additional computing and human resources. However, architectures such as a monolith or a big ball of mud make linear scaling onerous. Similarly, teams become less efficient and turn into silos as they grow. A general solution to scaling business or technical problems is divide and conquer: partition the problem into multiple sub-problems. A number of factors affect the scalability of software architectures and organizations, such as the interactions among system components or the communication between teams. For example, the coordination, communication and data/knowledge coherence among system components and teams become disproportionately expensive as they grow. The fields of software systems and business management have developed a number of laws and principles that can be used to evaluate the constraints and trade-offs related to these scalability challenges. Following is a list of a few laws from the technology and business domains for scaling software architectures and business organizations:
Amdahl’s Law
Amdahl’s law, named after Gene Amdahl, is used to predict the speedup in a task’s execution time when it is scaled to run on multiple processors. It states that the maximum speedup is limited by the serial fraction of the task’s execution, which becomes a source of resource contention:
Speedup (P, N) = 1 / [ (1 - P) + P / N ]
Where P is the fraction of the task that can run in parallel on N processors. As N becomes large, P / N approaches 0, so the speedup is bounded by 1 / (1 - P). The serial fraction (1 - P) becomes a source of contention due to data coherence, state synchronization, memory access, I/O or other shared resources.
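The following Python sketch shows how quickly the speedup flattens toward the 1 / (1 - P) ceiling. It assumes a hypothetical task whose parallel fraction P is 0.95. The function names are illustrative and do not come from any library.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's law for parallel fraction p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as n grows without limit: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    p = 0.95  # assumed: 95% of the task can run in parallel
    for n in (1, 2, 8, 64, 1024):
        print(f"N={n:5d}  speedup={amdahl_speedup(p, n):6.2f}")
    print(f"limit as N grows: {amdahl_limit(p):.2f}")
```

With P = 0.95, even 1,024 processors yield a speedup of less than 20. The 5% serial fraction caps it at 1 / 0.05 = 20.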
Amdahl’s law can also be described in terms of throughput using:
N / [ 1 + α (N - 1) ]
Where α is the serial fraction between 0 and 1. In parallel computing, a class of problems known as embarrassingly parallel workloads has little or no dependency among the parallel tasks. Their value of α is therefore zero or close to zero, because they require little or no inter-task communication.
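Substituting α = 1 - P into the speedup formula gives N / [1 + α (N - 1)], so the two forms describe the same law. The sketch below checks this numerically. It also shows that an embarrassingly parallel workload with α = 0 scales linearly. The helper names are hypothetical.

```python
def amdahl_throughput(alpha: float, n: int) -> float:
    """Relative throughput N / [1 + alpha (N - 1)], where alpha is the serial fraction."""
    return n / (1.0 + alpha * (n - 1))


def amdahl_speedup(p: float, n: int) -> float:
    """Speedup form 1 / [(1 - P) + P / N], repeated so this block is self-contained."""
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    n = 16
    # Same law, two forms: the serial fraction alpha equals 1 - P.
    for alpha in (0.0, 0.05, 0.25, 1.0):
        print(alpha, amdahl_throughput(alpha, n), amdahl_speedup(1.0 - alpha, n))
    # alpha = 0 (embarrassingly parallel): throughput equals N.
    # alpha = 1 (fully serial): throughput stays at 1 regardless of N.
```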
Amdahl’s law can also be used to scale teams as an organization grows. Teams can be organized as small, cross-functional groups that parallelize feature work for different product lines or business domains. However, the maximum speedup is still limited by the serial fraction of the work. The serial work can include build and deployment pipelines, reviewing and merging changes, communication and coordination between teams, and dependencies on deliverables from other teams. In his book The Mythical Man-Month, Fred Brooks described how adding people to a highly divisible task can reduce the overall task duration. Other tasks are not so easily divisible: while it takes one woman nine months to make one baby, “nine women can’t make a baby in one month”.
Brooks’s Law
Brooks’s law, coined by Fred Brooks, states that adding manpower to a late software project makes it later because of the time new members need to ramp up. As the size of the team increases, the ramp-up time for new employees also increases, because the communication overhead among team members grows quadratically:
Number of communication channels = N x (N - 1) / 2
Organizations can build small teams, such as two-pizza or single-threaded teams, so that the communication channels within each team do not explode. The cross-functional nature of these teams also requires less communication with other teams and fewer dependencies on them. Brooks’s law applies equally to technology when designing distributed services or components. Each service should be designed as a loosely coupled module around a business domain to minimize communication with other services, and services should communicate only through well-designed interfaces.
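The quadratic growth of communication channels can be illustrated with a short sketch. It compares one large team with the same people split into small teams. It relies on a simplifying, hypothetical assumption: teams coordinate only through one liaison per team. Real organizations rarely communicate this cleanly.

```python
def channels(n: int) -> int:
    """Pairwise communication channels among n people: n (n - 1) / 2."""
    return n * (n - 1) // 2


def channels_with_small_teams(n: int, team_size: int) -> int:
    """Channels when n people are split into teams of at most team_size.

    Hypothetical assumption: teams talk to each other only through one
    liaison per team, so cross-team channels are channels(num_teams).
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    num_teams = -(-n // team_size)  # ceiling division
    full, remainder = divmod(n, team_size)
    within = full * channels(team_size) + channels(remainder)
    return within + channels(num_teams)


if __name__ == "__main__":
    for n in (5, 10, 50, 100):
        print(n, channels(n), channels_with_small_teams(n, team_size=8))
```

For 100 people, a single team has 4,950 possible channels. Under this assumption, teams of 8 with one liaison each have 420.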
Universal Scalability Law
The Universal Scalability Law is used for capacity planning and was derived from Amdahl’s law by Dr. Neil Gunther. It describes relative capacity in terms of concurrency, contention and coherency:
C(N) = N / [ 1 + α (N - 1) + β N (N - 1) ]
Where C(N) is the relative capacity and α is the serial fraction between 0 and 1 due to resource contention. β is the coherency penalty, i.e. the delay for keeping data coherent or consistent. Because the coherency term β N (N - 1) grows quadratically in N, it becomes more expensive as N increases. For example, using a consensus algorithm such as Paxos to reach state consistency among a large set of servers is impractical, because it requires additional communication between all the servers. Instead, large-scale distributed storage services generally use sharding/partitioning and gossip protocols along with a leader-based consensus algorithm to minimize peer-to-peer communication.
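Setting the derivative of C(N) to zero shows that capacity peaks at N* = sqrt((1 - α) / β) when β > 0, and then declines. The sketch below uses hypothetical values α = 0.03 and β = 0.0005. These values are chosen only for illustration and are not measured from any system.

```python
import math


def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) = N / [1 + alpha (N - 1) + beta N (N - 1)]."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha: float, beta: float) -> float:
    """N at which C(N) peaks when beta > 0: sqrt((1 - alpha) / beta).

    Beyond this point, adding more nodes (or people) reduces total capacity.
    """
    return math.sqrt((1.0 - alpha) / beta)


if __name__ == "__main__":
    alpha, beta = 0.03, 0.0005  # hypothetical contention and coherency coefficients
    n_star = usl_peak(alpha, beta)
    print(f"peak at N ~ {n_star:.1f}")
    for n in (1, 8, 32, round(n_star), 100, 200):
        print(n, round(usl_capacity(n, alpha, beta), 2))
```

With these coefficients the peak is near N = 44. At N = 100 and N = 200 the relative capacity is lower than at the peak. This retrograde behavior distinguishes the Universal Scalability Law from Amdahl’s law. With β = 0, the formula reduces to the throughput form of Amdahl’s law.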
The Universal Scalability Law can be applied to scaling teams in the same way as Amdahl’s law. Here α models the serial work or dependencies between teams, and β models the communication needed for a consistent understanding among team members. The cost of β can be minimized by building small, cross-functional teams that can make progress independently. You can also apply this model to any decision-making process: keep the number of stakeholders or decision makers small so that they can easily reach agreement without grinding to a halt.
Gossip protocols also apply to people. They can be used along with a writing culture, lunch & learns and osmotic communication to spread knowledge and lessons learned from one team to other teams.
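The following simulation shows why gossip scales better than all-to-all messaging. It is a simplified push-gossip model that assumes every informed node contacts one uniformly random peer per round, with no failures. These assumptions and the function name are illustrative. In this model, the number of rounds needed to reach every node grows roughly logarithmically with N, while each node sends only one message per round.

```python
import random


def gossip_rounds(n: int, seed: int = 0) -> int:
    """Simulate push gossip among n nodes and return the rounds until all know.

    Assumed model: in each round, every informed node tells one peer chosen
    uniformly at random (possibly itself or an already informed peer).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the news
    rounds = 0
    while len(informed) < n:
        newly = {rng.randrange(n) for _ in informed}
        informed |= newly
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, gossip_rounds(n, seed=42))
```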
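Little’s Law
Little’s law states that, in a stable system, the long-term average number of items in the system equals the average arrival rate of items multiplied by the average time an item spends in the system:
L = λ W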
Where L is the average number of items within the system or queue, λ is the average arrival rate of items and W is the average time an item spends in the system. Little’s law and queuing theory can be used for capacity planning of computing servers. They can also help minimize both the number of items waiting in the queue (L) and their waiting time (W).
Little’s law can be applied to predict task completion in an agile process. Here L represents the work in progress (WIP) for a sprint. λ represents the arrival and departure rate, i.e. the throughput or capacity for tasks. W represents the lead time, i.e. the average amount of time a task spends in the system:
WIP = Throughput x Lead-Time
Lead-Time = WIP / Throughput
You can use this relationship to reduce the work in progress or lead time and to improve the throughput of task completion. Little’s law shows that, for a given throughput, keeping work in progress or inventory small shortens the lead time, so each item is finished sooner. You will also be better able to respond to unpredictable delays if you keep a buffer in your capacity and avoid 100% utilization.
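A minimal sketch of the WIP and lead-time relationship follows. It assumes a hypothetical team that completes 5 tasks per week.

```python
def lead_time(wip: float, throughput: float) -> float:
    """Little's law rearranged: Lead-Time = WIP / Throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


def work_in_progress(throughput: float, lead: float) -> float:
    """Little's law: WIP = Throughput x Lead-Time."""
    return throughput * lead


if __name__ == "__main__":
    throughput = 5.0  # hypothetical: tasks completed per week
    for in_progress in (30, 15, 5):
        weeks = lead_time(in_progress, throughput)
        print(f"WIP={in_progress:2d} tasks -> lead time {weeks:.1f} weeks")
```

Cutting WIP from 30 to 5 tasks at the same throughput reduces the average lead time from 6 weeks to 1 week.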
Kingman’s Formula
Kingman’s formula approximates the mean waiting time in a single-server queue with general arrival and service time distributions (a G/G/1 queue):
E(W) ≈ (ρ / (1 - ρ)) x ((ca² + cs²) / 2) x τ
Where τ is the mean service time, μ = 1/τ is the service rate, λ is the mean arrival rate, ρ = λ/μ is the utilization, ca is the coefficient of variation for arrivals and cs is the coefficient of variation for service times. Kingman’s formula shows that the queue size grows without bound as utilization approaches 100%, and that queues get longer as the variability of work increases. These insights can be applied to both technical and business processes. They help you build systems with more predictable processing times, shorter wait times E(W) and higher throughput λ.
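The sketch below evaluates Kingman’s formula for a hypothetical single-server process with a mean service time of 1 time unit. It compares low variability (ca = cs = 0.5) with high variability (ca = cs = 1.5). The parameter values are assumptions chosen for illustration.

```python
def kingman_wait(arrival_rate: float, service_time: float, ca: float, cs: float) -> float:
    """Approximate mean queueing delay E(W) for a single-server (G/G/1) queue.

    Kingman's formula: (rho / (1 - rho)) * ((ca^2 + cs^2) / 2) * tau.
    """
    rho = arrival_rate * service_time  # utilization: lambda / mu with mu = 1 / tau
    if not 0 <= rho < 1:
        raise ValueError("utilization must be below 100% for a stable queue")
    return (rho / (1.0 - rho)) * ((ca ** 2 + cs ** 2) / 2.0) * service_time


if __name__ == "__main__":
    tau = 1.0  # assumed mean service time of 1 time unit
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        low_var = kingman_wait(rho / tau, tau, ca=0.5, cs=0.5)
        high_var = kingman_wait(rho / tau, tau, ca=1.5, cs=1.5)
        print(f"utilization={rho:.2f}  wait(low var)={low_var:7.2f}  wait(high var)={high_var:7.2f}")
```

The wait more than doubles from 90% to 95% utilization and grows elevenfold from 90% to 99%. With ca = cs = 1 (Poisson arrivals and exponential service times), the formula matches the exact M/M/1 waiting time.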
Note: See Erlang analysis for serving requests in a system without a queue where new requests are blocked or rejected if there is not sufficient capacity in the system.
Gustafson’s Law
Gustafson’s law refines Amdahl’s law with the keen observation that parallel computing enables solving larger problems, such as computations on very large data sets, in a fixed amount of time. It is defined as:
S = s + p x N
S = s + (1 - s) x N
S = N + (1 - N) x s
where S is the theoretical speed up with parallelism, N is the number of processors, s is the serial fraction and p is the parallel part such that s + p = 1.
Gustafson’s law shows that the limitations imposed by the sequential fraction of a program may be countered by increasing the total amount of computation. This allows solving bigger technical and business problems with greater computing and human resources.
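The following sketch compares Amdahl’s fixed-size speedup with Gustafson’s scaled speedup. It assumes the same hypothetical serial fraction s = 0.05 for both. Strictly, Gustafson’s s is measured on the parallel system running the scaled-up workload, so using one value for both is a simplification for illustration.

```python
def gustafson_speedup(s: float, n: int) -> float:
    """Scaled speedup S = s + (1 - s) x N, where s is the serial fraction."""
    return s + (1.0 - s) * n


def amdahl_speedup_serial(s: float, n: int) -> float:
    """Fixed-size (Amdahl) speedup written in terms of the serial fraction s."""
    return 1.0 / (s + (1.0 - s) / n)


if __name__ == "__main__":
    s = 0.05  # assumed serial fraction
    for n in (1, 8, 64, 1024):
        print(f"N={n:5d}  Amdahl={amdahl_speedup_serial(s, n):7.2f}  "
              f"Gustafson={gustafson_speedup(s, n):8.2f}")
```

At N = 64, Amdahl’s law predicts a speedup of about 15.4. Gustafson’s law predicts 60.85 for a problem that grows with the number of processors.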
Conway’s Law
Conway’s law states that an organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure. This means that the architecture of a system is derived from the team structures of an organization. However, you can also use the architecture to derive the team structures. This allows you to build teams along architectural boundaries so that each team is small, cross-functional and cohesive. A study by the Harvard Business School found that large, often co-located teams tended to produce more tightly coupled, monolithic codebases, whereas small, distributed teams produced more modular codebases. These lessons can be applied to scaling teams and architecture. Teams and system modules can then be built around organizational boundaries and independent concerns to promote autonomy and reduce tight coupling.
Pareto Principle
The Pareto principle states that for many outcomes, roughly 80% of consequences come from 20% of causes. This principle shows up in numerous technical and business problems: 20% of the code contains 80% of the errors; customers use 20% of the functionality 80% of the time; 80% of optimization improvements come from 20% of the effort; and so on. It can also be used to identify hotspots or critical paths when scaling, as some microservices or teams may receive disproportionate demand. However, while scaling computing resources is relatively easy, scaling a team beyond an organizational boundary is hard. You will have to apply other management tools such as prioritization, planning, metrics, automation and better communication to manage critical work.
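A Pareto-style hotspot analysis can be sketched as follows. Sort components by load, then take the smallest set that covers 80% of the total. The service names and request rates below are hypothetical.

```python
from typing import Dict, List


def pareto_hotspots(load: Dict[str, float], share: float = 0.8) -> List[str]:
    """Return the smallest set of keys (largest first) whose combined load
    reaches `share` of the total, e.g. services handling 80% of requests."""
    total = sum(load.values())
    hotspots: List[str] = []
    covered = 0.0
    for key, value in sorted(load.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        hotspots.append(key)
        covered += value
    return hotspots


if __name__ == "__main__":
    # Hypothetical requests per second handled by each microservice.
    rps = {"checkout": 4200, "search": 3100, "catalog": 900, "reviews": 400,
           "profile": 250, "wishlist": 100, "recommendations": 50}
    print(pareto_hotspots(rps))
```

Here 2 of the 7 services, checkout and search, handle just over 80% of the requests. That suggests where scaling effort is likely to pay off first.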
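Metcalfe’s and Reed’s Laws
Metcalfe’s law states that the value of a network is proportional to the square of the number of its connected users, because the number of possible pairwise connections grows quadratically with its size:
Number of possible pair connections = N x (N - 1) / 2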
Reed’s Law expanded this law and observed that the utility of large networks can scale exponentially with the size of the network.
Number of possible subgroups of a network = 2^N − N − 1
This law explains the popularity of social networking services through viral communication. These laws can be applied to model information flow between teams or message exchange between services. The goal is to avoid peer-to-peer communication within an extremely large group of people or set of nodes. A common alternative is to use a gossip protocol, or to designate a leader for each partition. That leader communicates with the other leaders and then disseminates information within its group.
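The gap between quadratic and exponential growth is easy to see numerically. The sketch below counts pairwise connections (N x (N - 1) / 2) and subgroups of two or more members (2^N - N - 1) for a few network sizes. The function names are illustrative.

```python
def metcalfe_pairs(n: int) -> int:
    """Possible pairwise connections: N (N - 1) / 2 (Metcalfe)."""
    return n * (n - 1) // 2


def reed_subgroups(n: int) -> int:
    """Possible subgroups with two or more members: 2^N - N - 1 (Reed).

    This counts all subsets except the empty set and the N single-member sets.
    """
    return 2 ** n - n - 1


if __name__ == "__main__":
    for n in (2, 5, 10, 20, 40):
        print(f"N={n:2d}  pairs={metcalfe_pairs(n):4d}  subgroups={reed_subgroups(n)}")
```

At N = 20 there are 190 possible pairs but 1,048,555 possible subgroups. This is why peer-to-peer coordination across every possible group quickly becomes impractical.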
Dunbar’s Number
Dunbar’s number is a suggested cognitive limit to the number of people with whom one can maintain stable social relationships. Its commonly used value is 150, and it can be used to limit the number of direct communication connections within an organization.
The above laws show how you can partition a tightly coupled architecture and large teams into a modular architecture and small, autonomous teams. For example, Amdahl’s law and the Universal Scalability Law demonstrate that you have to account for the cost of serial work, coordination and communication between partitions as you parallelize a problem, because these costs become bottlenecks as you scale. Brooks’s and Metcalfe’s laws indicate that you need to manage the number of communication paths among modules or teams, as these paths can grow quadratically and stifle your growth. Little’s law and Kingman’s formula establish that you need to reduce inventory or work in progress and avoid 100% utilization in order to provide reliable throughput. Conway’s law shows how architecture and team structures can be aligned for maximum autonomy and productivity. This lets you accomplish more work with small, cross-functional teams that own independent product lines and build a modular architecture, reducing dependencies on other teams and subsystems. The Pareto principle can be used to identify small changes to the architecture or teams that result in higher scalability and productivity. Dunbar’s number applies only to people, but it can be used to limit dependencies on external teams, as the human mind has a finite capacity to maintain external relationships. However, before applying these laws, you should have clear goals and collect proper metrics and KPIs so that you can measure the baseline and the improvements. You should also be cautious about applying these laws prematurely for scalability, as doing so may make things worse. Finally, when solving scalability and performance problems, it is vital to focus on global optimization that scales the entire organization or system, rather than local optimization that focuses strictly on one part of the system.