Kubernetes: piloting the cybernetic dreamboat

The name Kubernetes originates from Greek, meaning helmsman or pilot, and is the root of governor and cybernetic.

The term “cybernetics” conjures up images of Cold War control rooms, creepy physiological experiments and old-timey transistors. Yet, as hinted by the hat-tip concealed in the naming of Google’s container orchestration platform, the discipline is the source of key concepts underlying modern DevOps. What can we gain from paying more attention to this strand in the history of our craft?

“Cybernetics” was coined from the Greek word κυβερνήτης in the 40s by one of the pioneers of high-speed digital computing, Norbert Wiener, to describe “the entire field of control and communication theory, whether in the machine or in the animal” (Cybernetics, 1948). Wiener’s central message was twofold:

The twin pillars of DevOps, (1) minimisation of operational burden and (2) “agile” responsiveness to customer demand, are the modern incarnation of this twofold cybernetic dream. Let us look at some ways in which Kubernetes allows us to fulfil these objectives, and see what we can learn about them by placing them in a cybernetic context.

As usual, if you enjoy this post, please clap 👏 , share, and tweet 🐦! It is all about feedback after all…

Minimisation of operational burden

Kubernetes consists of a control loop and several controllers which, on each iteration of the loop, issue probes and respond to stimuli such as incoming requests in order to bring the cluster closer to a predefined state. The intention is that the cluster should continue to serve users no matter how its environment is altered, with minimal intervention from operations teams. In cybernetics, the ability of a machine to maintain constant function while suffering external disruption is denoted with a term borrowed from biology: “homeostasis”. The Kubernetes landing page describes the platform using a similar biological metaphor: “self-healing”.

At the core of Kubernetes’ self-healing mechanism are Replication Controllers. On each iteration of the control loop, these:

Check for failed containers and restart them
Check for failed nodes and reschedule any containers that were running on them
Kill any containers that don’t respond to health checks
Advertise containers if and when they are ready
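
The checks above can be read as a single reconciliation step that compares observed state with desired state. The Python sketch below is purely illustrative: the `Pod` record, its fields and the action names are hypothetical, and in a real cluster this work is split across several components rather than one function.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical snapshot of one pod as seen on a single loop iteration."""
    name: str
    node_healthy: bool = True   # is the node hosting this pod still alive?
    running: bool = True        # has the container crashed?
    responsive: bool = True     # does it answer its health check?
    ready: bool = True          # is it ready to receive traffic?


def reconcile(pods):
    """Return (action, pod name) pairs that nudge the cluster toward its desired state."""
    actions = []
    for pod in pods:
        if not pod.node_healthy:
            actions.append(("reschedule", pod.name))
        elif not pod.running:
            actions.append(("restart", pod.name))
        elif not pod.responsive:
            actions.append(("kill", pod.name))
        elif pod.ready:
            actions.append(("advertise", pod.name))
        # A healthy pod that is not yet ready is simply left alone.
    return actions


pods = [
    Pod("web-1"),
    Pod("web-2", running=False),
    Pod("web-3", node_healthy=False),
    Pod("web-4", responsive=False),
    Pod("web-5", ready=False),
]
actions = reconcile(pods)
```

Running the same function on every iteration, regardless of what went wrong in between, is what gives the loop its homeostatic character.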

Other types of controllers look after nodes, routes, services and volumes, and behave in more or less complex ways.

The example of autoscaling

One controller with relatively complex behaviour is the Horizontal Pod Autoscaler, in charge of increasing or decreasing the number of pods running on the cluster (via Deployment controllers, which carry out the actual scaling).

Digging into the literature on control theory helps us to better understand the trade-offs of the algorithm used by Kubernetes’ HPA and throws up some interesting alternatives. The “PID” (proportional-integral-derivative) controller, first developed by Nicolas Minorsky for the steering of ships, is of particular interest.

A naive autoscaler

A naive autoscaling algorithm might operate as follows:

Every 30 seconds, check the average CPU utilisation of pods (or some other metric, which could be user-defined)
If utilisation is below the target range, decrease the number of pods by a fixed amount
If utilisation is above the target range, increase the number of pods by a fixed amount
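
The three steps above might be sketched in Python as follows. The thresholds, the step size and the function name are assumptions chosen for illustration; the caller is expected to invoke the function once per 30-second tick.

```python
def naive_autoscale(replicas, utilisation, low=0.4, high=0.7, step=1, min_replicas=1):
    """Fixed-step autoscaler: the size of the change ignores how far off target we are."""
    if utilisation > high:
        return replicas + step
    if utilisation < low:
        return max(min_replicas, replicas - step)
    return replicas  # inside the target range: do nothing


# A large spike and a small one produce exactly the same response.
after_large_spike = naive_autoscale(5, utilisation=0.99)
after_small_spike = naive_autoscale(5, utilisation=0.71)
```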

There are some big problems with such a simple algorithm:

There is no relation between the degree of deviation from the target and the size of the scale event. A large increase in demand could only be satisfied gradually by a succession of small scale-up events, by which time the demand may well have dissipated.
If scaling events take more than 30 seconds to complete, or have an effect on the target metric (for example, if starting a pod is unusually CPU-intensive), the system will tend to overshoot, causing “thrashing”.

The second of these problems can be addressed by introducing “cooldown” periods after an auto scale event — typically a few minutes. Cooldown periods ensure that the system has time to regain equilibrium before a new event is triggered. With cooldown periods in place, something like the simple algorithm described here will provide an acceptable level of control for many applications. The AWS “simple” EC2 autoscaler works in this way.
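
A cooldown can be sketched as a small amount of state wrapped around the fixed-step rule. This is a minimal illustration rather than a description of how any particular cloud provider implements it; the class name, the 180-second default and the thresholds are all assumptions.

```python
class CooldownAutoscaler:
    """Fixed-step autoscaler that refuses to act again until a cooldown has elapsed."""

    def __init__(self, cooldown_seconds=180, low=0.4, high=0.7, step=1):
        self.cooldown_seconds = cooldown_seconds
        self.low = low
        self.high = high
        self.step = step
        self.last_scaled_at = None  # timestamp of the most recent scale event

    def decide(self, now, replicas, utilisation):
        """Return the replica count to aim for at time `now` (in seconds)."""
        cooling_down = (
            self.last_scaled_at is not None
            and now - self.last_scaled_at < self.cooldown_seconds
        )
        if cooling_down:
            return replicas  # let the system settle before acting again

        if utilisation > self.high:
            new_replicas = replicas + self.step
        elif utilisation < self.low:
            new_replicas = max(1, replicas - self.step)
        else:
            new_replicas = replicas

        if new_replicas != replicas:
            self.last_scaled_at = now
        return new_replicas


scaler = CooldownAutoscaler(cooldown_seconds=180)
first = scaler.decide(now=0, replicas=5, utilisation=0.9)      # scales up
second = scaler.decide(now=30, replicas=6, utilisation=0.9)    # suppressed by cooldown
third = scaler.decide(now=180, replicas=6, utilisation=0.9)    # cooldown over
```

The longer the cooldown, the less likely the system is to thrash, but the slower it is to follow a genuine change in demand.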

A proportional autoscaler (Kubernetes’ HPA)

Alternatively, an autoscaler might alter the size of the scale event depending on the degree of deviation from the target. AWS provides a “step” autoscaler, which allows the user to define several scaling increments matched to different deviation thresholds. The Horizontal Pod Autoscaler takes a different approach, requiring much less configuration: the size of the autoscale event is proportional to the size of the deviation.

desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue )]

Proportional autoscaling is not necessarily going to provide a better level of control than a well-configured step autoscaler, but it is more likely to work ‘out of the box’.
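
As a worked example, the formula above translates almost directly into Python. The function and argument names below are chosen for illustration, and the sketch deliberately leaves out the refinements that the real HPA applies.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric):
    """Scale the replica count by the ratio of observed to target metric, rounding up."""
    return math.ceil(current_replicas * (current_metric / desired_metric))


# 4 pods at 90% CPU against a 60% target: the deviation is 1.5x, so scale to 6.
scale_up = desired_replicas(4, current_metric=90, desired_metric=60)

# 4 pods at 30% CPU against a 60% target: half the load, so scale down to 2.
scale_down = desired_replicas(4, current_metric=30, desired_metric=60)
```

Unlike the fixed-step rule, a deviation twice as large produces a scale event twice as large, with no thresholds to configure.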

The main issue with proportional autoscaling is a phenomenon known as “proportional droop”. Imagine that demand on the cluster is constantly increasing. There is no way our autoscaler will be able to actually meet the increasing demand; it will always fall short by the product of the lag of the scaling event and the rate of the demand increase.
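
The size of the droop can be illustrated with a toy simulation. The model is a deliberate simplification and every name in it is an assumption: demand is measured in "pods' worth" of load and grows linearly, the autoscaler always requests exactly the capacity that current demand calls for, and that capacity only becomes available `lag` ticks later.

```python
def simulate_droop(rate, lag, steps, start=10.0):
    """Return the capacity shortfall at each tick under linearly increasing demand."""
    demand_history = []
    shortfalls = []
    for t in range(steps):
        demand = start + rate * t
        demand_history.append(demand)
        # Capacity available now was requested `lag` ticks ago,
        # when demand was lower.
        decided_at = max(0, t - lag)
        capacity = demand_history[decided_at]
        shortfalls.append(demand - capacity)
    return shortfalls


shortfalls = simulate_droop(rate=2.0, lag=3, steps=10)
steady_state_shortfall = shortfalls[-1]
```

Once the simulation has run for longer than the lag, the shortfall settles at `rate * lag` and never closes, however long the autoscaler keeps running.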

Note that Kubernetes’ algorithm is not a pure proportional autoscaler but incorporates several tweaks to improve performance. For example, it has a “tolerance” setting which prevents autoscaling events below a certain size; some pods, like those still initialising, are ignored for the purposes of the algorithm; a limit is applied to the size of scale-up events; and the algorithm is run several times within a window in order to reach a final figure (the Kubernetes documentation on the Horizontal Pod Autoscaler gives more details). In addition, because Kubernetes allows you to configure custom metrics, it is possible to elicit more complicated responses by feeding the autoscaler metrics which already have a layer of processing embedded within them. You can even write your own custom controller, perhaps integrating one of the refinements below:

A proportional-integral autoscaler

Proportional droop can be remedied in a less ad-hoc fashion by the addition of an integral term to the autoscaling algorithm.

Our formula would become something like:

desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / desiredMetricValue + sumOfMetricErrorSinceLastScalingEvent )]

That last term is the integral term, calculated by repeatedly adding up (say, every 10 seconds) the amount by which the cluster is off its target. Consider again the example of constantly increasing demand. The longer that demand has been rising, the longer the proportional term has been failing to keep up with it, and so the greater the sum of the error becomes. The integral term thus neutralises the proportional droop given enough time.
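
One possible reading of the formula above is sketched below in Python. Several details are assumptions that the formula leaves open: the error is normalised by the target so that it is on the same scale as the ratio, an integral gain `ki` weights the accumulated error, and the sum is reset after each scaling decision to match "since last scaling event".

```python
import math


class PIAutoscaler:
    """Proportional autoscaler with an added integral (accumulated error) term."""

    def __init__(self, target, ki=0.5):
        self.target = target
        self.ki = ki            # integral gain (assumed; would need tuning)
        self.error_sum = 0.0    # accumulated relative error since the last scaling event

    def observe(self, metric):
        """Call on every sampling tick (say, every 10 seconds)."""
        self.error_sum += (metric - self.target) / self.target

    def desired_replicas(self, current_replicas, metric):
        """Combine the proportional ratio with the weighted accumulated error."""
        ratio = metric / self.target
        desired = math.ceil(current_replicas * (ratio + self.ki * self.error_sum))
        self.error_sum = 0.0    # start accumulating afresh after a scaling event
        return max(1, desired)


scaler = PIAutoscaler(target=60, ki=0.5)
for _ in range(3):
    scaler.observe(90)          # three ticks spent 50% over target
replicas = scaler.desired_replicas(4, metric=90)
```

A purely proportional rule would ask for 6 replicas here; the accumulated error pushes the request to 9, which is how the integral term makes up ground that the proportional term keeps losing.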

A controller with an integral term exhibits more complex behaviour than a proportional controller and requires correspondingly more work to optimise. For example, if demand stops increasing, a proportional controller will never overshoot, whereas a proportional-integral controller may continue to increase because of the accumulated error. Nonetheless, proportional-integral controllers tuned with the right constants can offer exceedingly accurate control. A proportional-integral controller is in use at Facebook; you can read about its tuning.

A proportional-integral-derivative autoscaler

Because proportional-integral control still needs a certain amount of time for the integral term to accumulate, it is not always responsive enough. For applications that need extremely fast responses, a derivative term can be added. This term modifies the scaling event magnitude by an amount proportional to the rate at which the target metric is changing. It thus incorporates a predictive component into the autoscaling algorithm. If demand is increasing more rapidly than usual, the scaling event will be correspondingly larger. Because controllers incorporating a derivative term are predictive, they have a much greater sensitivity to noise: a very steep fluctuation, even if it is very short lived, could trigger a large scaling event.
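
The sketch below shows a textbook PID controller in Python, operating on an error signal (observed metric minus target) rather than on replica counts directly. The gains and the class name are illustrative assumptions, and how the output is translated into a number of pods is left to the caller.

```python
class PIDController:
    """Textbook PID controller: output = kp*error + ki*sum(error) + kd*d(error)/dt."""

    def __init__(self, kp, ki, kd):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.integral = 0.0
        self.prev_error = None  # no derivative is available on the first sample

    def update(self, error, dt=1.0):
        """Feed in one error sample and return the control output."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# A single short-lived spike in an otherwise steady signal.
pid = PIDController(kp=1.0, ki=0.0, kd=2.0)
outputs = [pid.update(error) for error in [0.0, 0.0, 0.5, 0.0]]
```

With these gains the one-sample spike produces an output three times larger than the proportional term alone would give, followed by an equally abrupt swing in the opposite direction. This is the noise sensitivity described above.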

You may have noticed that the trade-off between sensitivity to noise and lag is a recurrent one. Extending the cooldown period reduces the effect of noise, but increases lag. The Kubernetes HPA algorithm imposes a limit on the scaling factor which reduces the risk of inappropriately large actions due to noise, but is not well-adapted to users who want a faster response. The addition of a derivative term involves a similar trade-off.

A proportional-integral-derivative autoscaler is reportedly in use at Strava.

Other forms of control

PID control is an old and relatively simple form of control, but there is a lot more to explore. Predictive autoscalers which make use of machine learning are likely to become more important in the near future. AI-powered predictive autoscaling is in use at Netflix (Scryer) and now available on AWS. To get a sense of the range of options, cast your eyes over this list from The Art of Capacity Planning: Scaling Web Resources in the Cloud:

queuing theory, fuzzy logic, neural networks, reinforcement learning, support vector machines, wavelet transform, regression splines, pattern matching, Kalman filters, sliding window, proportional thresholding, second-order regression, histograms, time-series models, the secant method, voting system, and look-ahead control

Responsiveness to customer demand

DevOps is not just automation; rather, it is a culture that is developed within an organisation as a whole when it places emphasis on reducing lag and eliminating noise in the feedback loop between developers and customers. DevOps in this sense overlaps with what we mean by a “culture of learning”, as described by YLD’s CEO, Nuno Job. The necessity of the culture of learning derives from the fact that customer requirements are no more static than the typical load on a cluster. They are, in the cybernetic view, momentary samples of an oscillatory control loop whose operating principles (target and algorithm) can only be uncovered gradually through long term empirical research. That is why customer requirements can only be effectively tracked and predicted if infrastructure is flexible and reliable enough to repeatedly submit experiments to the market with minimal overhead. Helping clients get to that point is a large part of what we do at YLD.

Many organisations have such keen divisions between business strategy, marketing, design, operations and engineering capabilities that convincing them that good engineering and responsiveness to customer demand are the same thing is a real challenge. The key insight of Wiener and the cybernetic movement was precisely this: the feedback loops controlling computer infrastructure and those governing biological and social processes, including businesses and their customers, are inextricable. It is not just that systems need to become autonomous if a business is going to have the stability and flexibility to learn from and adapt to a changing environment; that autonomy actually is the capacity of systems to learn from and adapt to their environment. Autoscalers, for example, must be tuned to handle the patterns of load peculiar to the services they are responsible for scaling. Great automation is both an enabler and a consequence of the culture of learning.

Not long after the publication of Wiener’s book, his ideas were adopted within the field of management consultancy by, among others, the British consultant Stafford Beer. Beer developed a model of how an organisation could function as what he called a “viable system” (read: adaptive, homeostatic system): at the core of the model is a set of feedback loops monitoring and correcting mistakes at every level of the business. Beer’s focus is on the internal dynamics of organisations. The feedback loop between the customer and the business receives relatively little emphasis in his model. The capacity for customer monitoring unleashed by the Internet was not yet a reality. Today, the cybernetic vision of Wiener in which fully autonomous systems adapt and grow, fertilised by the data of billions of interactions, is upon us.

There is of course another and darker aspect of the cybernetic identity of customer and computer, one which concerned Wiener greatly:

Long before Nagasaki and the public awareness of the atomic bomb, it had occurred to me that we were here in the presence of another social potentiality of unheard-of importance for good and for evil.

If users wish to reap the full benefit of autonomous systems, they need to submit to them on the terms of those systems, yielding their data for consumption in a form that the systems can understand. They need to be willing to become part of a control loop — to become automatons themselves! The genealogy of cybernetics makes this danger clear. Cybernetics was forged in war. The lens cybernetics places on human behaviour is the lens of the gunner, implacable and detached. Here is Wiener, as quoted in The Ontology of the Enemy: Norbert Wiener and the Cybernetic Vision, reflecting on the realisation that enabled him to construct a gun able to track the apparently random motions of pilots:

We realized that the “randomness” or irregularity of an airplane’s path is introduced by the pilot; that in attempting to force his dynamic craft to execute a useful manoeuver, such as straight-line flight or 180 degree turn, the pilot behaves like a servo-mechanism, attempting to overcome the intrinsic lag due to the dynamics of his plane as a physical system, in response to a stimulus which increases in intensity with the degree to which he has failed to accomplish his task.

This “realisation”, in which the feedback loop from organism-machine to machine-organism was closed for the first time — in order to shoot down enemy planes — is the founding moment of cybernetics. The danger of objectifying the customer and, worse, turning him or her into a target, is one reason that a purely metrics-driven approach to business operations and product design will never be enough. Conversion, clickthrough and ‘likes’ tell a human story from a machine’s perspective. There must be a place for empathy and imagination in the control loop.

What happened to cybernetics?

The extent to which cybernetics penetrated intellectual culture internationally through the 50s, 60s and 70s can hardly be overstated. The discipline was developed and disseminated through a series of conferences (the “Macy conferences”) and nurtured at dedicated institutions like the Biological Computer Laboratory at the University of Illinois. It ended up exerting a profound influence on figures as diverse as the French psychoanalyst Jacques Lacan and the Chilean president Salvador Allende, who, in a high point for the discipline, hired Stafford Beer to organise the Chilean economy. Traces of this craze are still with us in the prevalence of the word “cyber” to denote anything to do with computers, although the word means not “computation”, but “control”. By and large, however, the discipline has faded into obscurity. So what happened?

While the interdisciplinary nature of cybernetics was undoubtedly part of the reason for its initial popularity, it became an ingredient in its downfall, as its sub-disciplines fractured and splintered or evaporated into generalities. Technological advance fuelled this process. Funding for cybernetic research groups started to flag significantly in the 70s, around the time that ARPANET and the first personal computers were being developed. Suddenly, much of what cyberneticians had been discussing in theoretical terms was becoming a reality, and so attention passed from theory back to engineering. What happened to cybernetics, then? Cybernetics has not gone away. It has disappeared into the technologies that we use and the practices we follow on a day-to-day basis, whether we realise it or not.
