Reviewing the Eight Fallacies of Distributed Computing – InfoQ.com

In a recent article on the Ably Blog, Alex Diaconu reviewed the thirty-year-old "eight fallacies of distributed computing" and provided a number of hints on how to handle them.

The eight fallacies are a set of false assumptions about distributed computing that can lead to failures in software development. The assumptions are: the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology doesn't change; there is one administrator; transport cost is zero; the network is homogeneous.

The fallacies can be seen as architectural requirements you have to account for when designing distributed systems. InfoQ has taken the chance to talk with Diaconu to learn more about how Ably engineers deal with the fallacies.

InfoQ: Almost thirty years since the fallacies of distributed computing were initially suggested, they are still highly relevant. What's their role at Ably?

Diaconu: All of the fallacies are pointers to distributed system design pitfalls, and they are all still relevant today. They don't all have the same impact, though; some are more easily accommodated than others, while others have a pervasive effect on how we structure our systems at Ably.

InfoQ: Do you think the evolution of distributed systems in the last thirty years has revealed any additional fallacies that should be taken into account?

Diaconu: I believe the most significant transformation over the last 30 years is the maturity of our understanding of how to deal with them. That's not to say that the answers are any easier, but they are better understood. We know which approaches are good, which are bad, and the limits of any given approach. There is now well-established scientific theory and engineering practice around these problem spaces. Computer science students are taught the problems and what the state of the art is.

Of course, it's important to acknowledge that the fallacies are manifestations of enduring technical challenges; they shouldn't be thought of as easily avoided pitfalls. I suppose you could say that there is, in fact, a new fallacy: "avoiding the fallacies of distributed computing is easy."

InfoQ: Some of the fallacies have meanwhile become commonplace; for example, the idea that the Cloud is not secure is now widely accepted. Still, there may be some subtlety to them that makes dealing with them not so trivial.

Diaconu: As previously mentioned, the challenges of distributed systems, and the broad science around the techniques and mechanisms used to build them, are now well researched. The thing you learn when addressing these challenges in the real world, however, is that academic understanding only gets you so far.

Building distributed systems involves engineering pragmatism and trade-offs, and the best solutions are the ones you discover by experience and experiment.

As an example, the "network is reliable" fallacy is the most basic thing you have to address. The known solutions involve protocols with retries, consensus-formation protocols, or redundancy for fault tolerance, depending on the particular failure mode of concern.
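To make the retry approach concrete, here is a minimal sketch in TypeScript of a retry loop with exponential backoff and jitter; the withRetries helper and its parameters are illustrative assumptions, not Ably's actual implementation or anything from the original article.

    // Sketch: retrying an unreliable network call with exponential
    // backoff and full jitter, to avoid synchronized retry storms.
    async function withRetries<T>(
      operation: () => Promise<T>,
      maxAttempts = 5,
      baseDelayMs = 100,
    ): Promise<T> {
      let lastError: unknown;
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
          return await operation();
        } catch (err) {
          lastError = err;
          // Wait a random interval up to baseDelayMs * 2^attempt.
          const delayMs = Math.random() * baseDelayMs * 2 ** attempt;
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
      throw lastError;
    }

    // Usage: wrap any call that may fail transiently.
    // const data = await withRetries(() => fetch("https://example.com/api"));

Note that a retry loop like this only addresses transient failures; it does nothing for partitions that outlast the retry window, which is where redundancy and consensus come in.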

However, the engineering reality is that multiple kinds of failures can, and will, occur at the same time. The ideal solution then depends on the statistical distribution of failures, on analysis of error budgets, and on the specific service impact of particular errors.
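As a hedged illustration of the error-budget idea (the 99.95% target below is an arbitrary example, not an Ably figure), converting an availability objective into a monthly budget is simple arithmetic:

    // Sketch: turning an availability target (SLO) into an error budget.
    const sloTarget = 0.9995;             // example target: 99.95% availability
    const minutesPerMonth = 30 * 24 * 60; // 43,200 minutes in a 30-day month
    const errorBudgetMinutes = minutesPerMonth * (1 - sloTarget);
    console.log(errorBudgetMinutes);      // ≈ 21.6 minutes of tolerable downtime

The point of the budget is that it lets you compare failure modes quantitatively: a failure mode expected to consume most of the budget deserves a more elaborate mitigation than one that barely registers.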

The recovery mechanisms can themselves fail due to system unreliability, and the probability of those failures might influence the solution. And of course, you have the dangers of complexity: solutions that are theoretically sound but complex might be far harder to manage or understand during an incident than simpler mechanisms that are theoretically less complete.
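One example of the "simple but theoretically incomplete" end of that spectrum is a basic circuit breaker: easy to reason about during an incident, but cruder than statistical or consensus-based alternatives. The sketch below is illustrative only and not a description of Ably's internals.

    // Sketch: a deliberately simple circuit breaker. After maxFailures
    // consecutive failures it stops calling the dependency for coolDownMs,
    // trading precision for predictability during incidents.
    class CircuitBreaker {
      private failures = 0;
      private openUntil = 0;

      constructor(
        private readonly maxFailures = 3,
        private readonly coolDownMs = 10_000,
      ) {}

      async call<T>(operation: () => Promise<T>): Promise<T> {
        if (Date.now() < this.openUntil) {
          throw new Error("circuit open: dependency presumed unhealthy");
        }
        try {
          const result = await operation();
          this.failures = 0; // any success resets the failure count
          return result;
        } catch (err) {
          if (++this.failures >= this.maxFailures) {
            this.openUntil = Date.now() + this.coolDownMs;
            this.failures = 0;
          }
          throw err;
        }
      }
    }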

InfoQ: If we look at microservices, which have become quite popular in the last few years, they seem to be at odds with the "transport cost is zero" fallacy. In fact, the smaller each microservice is, the larger their overall count, and the greater the ensuing transport cost. How do you explain this?

Diaconu: Maybe another fallacy is "microservices make it easier to reason about your system". Sometimes breaking things down into components with a smaller surface area makes them easier to reason about. However, sometimes creating those boundaries adds complexity; it can certainly add failure modes, and it can create new things whose behavior also needs to be reasoned about.

Much like the previous answer, the actual design choices, and when and where you deploy the known theoretical solutions, come down to engineering judgment and experience. At Ably, we operate a system with multiple roles that scale, interoperate and discover one another independently. However, splitting functionality out into a distinct role is something we rarely do, and only when there is a particular driver for that to happen. For example, if we want some specific functionality to scale independently of other functionality, that justifies the creation of an independent role, even if it brings additional complexity.

Diaconu's article not only helps you understand where the fallacies originate, but also provides useful hints on current techniques and approaches to address them, so do not miss it if you are interested in the subject.
