Lambda Architecture in 2020

As I start to think about some of the upcoming projects we’ll be working on over the next year and how we might go about building them, I wanted to consider where lambda architecture fits in our toolbox for building data services.

Having led a project a few years ago that tried to follow lambda architecture, I’ve got a decent feel for its strengths and weaknesses. In my opinion, lambda architecture rests on two key assumptions:

1. Streaming processes are in some way imperfect (e.g. lossy, possibly slow for all but simple processing, unable to handle late or replayed data, unable to do complex aggregations, etc.).
2. Users are happy to see missing or approximate data for a while, until the batch process completes.
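To make the two assumptions concrete, here is a minimal Python sketch of the three lambda layers (batch, speed and serving) for a simple page-count query. The event log, the cutoff timestamps and the dropped event are all hypothetical, chosen only to show how the merged answer stays approximate until the next batch run catches up.

```python
from collections import Counter

# Hypothetical durable event log of (key, timestamp) pairs, read by the batch layer.
EVENTS = [("page_a", 1), ("page_a", 5), ("page_b", 7),
          ("page_a", 12), ("page_a", 14), ("page_b", 15)]

def batch_view(events, cutoff):
    """Batch layer: exact counts recomputed from the full log, up to `cutoff`."""
    return Counter(key for key, ts in events if ts < cutoff)

def speed_view(delivered, cutoff):
    """Speed layer: counts only the events it actually received after `cutoff`.
    If the stream dropped an event, this view is silently short (assumption 1)."""
    return Counter(key for key, ts in delivered if ts >= cutoff)

def serve(key, batch, speed):
    """Serving layer: merge the batch view with the real-time view."""
    return batch[key] + speed[key]

# Hypothetical scenario: the last batch run used cutoff=10 and the stream lost ("page_a", 14).
delivered = [e for e in EVENTS if e != ("page_a", 14)]
before = serve("page_a", batch_view(EVENTS, 10), speed_view(delivered, 10))

# Once the next batch run covers everything, the speed view is reset and the answer is exact.
after = serve("page_a", batch_view(EVENTS, 20), speed_view(delivered, 20))
print(before, after)  # 3 4
```

Users querying between the two batch runs see 3 rather than 4, which is exactly the window of "missing or approximate" data that assumption 2 asks them to tolerate.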

A problem for many use cases is that users do not accept assumption 2, or do not understand why it should be the case, and so start to lose trust in the whole system.

I also don’t think assumption 1 holds up as well now that we have frameworks like Beam and Flink, which are a lot better than what we had 3-5 years ago.
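Handling late data is a big part of that improvement. The following is a toy, framework-free Python model of the event-time windowing idea that Beam and Flink provide, not either framework's API: events are counted into tumbling windows by their own timestamp, a watermark tracks progress, and a window keeps accepting late events until the watermark passes its end plus an allowed lateness. The window size, lateness and input stream are made-up values.

```python
from collections import defaultdict

WINDOW = 10           # tumbling window size in event-time units (assumed value)
ALLOWED_LATENESS = 5  # extra time a window stays open after the watermark passes its end (assumed)

def window_start(ts):
    """Start of the tumbling window that event time `ts` falls into."""
    return ts - ts % WINDOW

def process(stream):
    """Count events per event-time window, folding late events into their window.

    `stream` is a list of (ts, kind) pairs where kind is "event" or "watermark".
    Returns (counts per window start, list of event timestamps that arrived too late).
    """
    counts = defaultdict(int)
    watermark = float("-inf")
    dropped = []
    for ts, kind in stream:
        if kind == "watermark":
            watermark = max(watermark, ts)
            continue
        start = window_start(ts)
        # The window is closed once the watermark reaches its end plus the allowed lateness.
        if start + WINDOW + ALLOWED_LATENESS <= watermark:
            dropped.append(ts)
        else:
            counts[start] += 1
    return dict(counts), dropped

# Hypothetical out-of-order stream with interleaved watermarks.
STREAM = [(3, "event"), (8, "event"), (12, "watermark"), (7, "event"),
          (15, "event"), (20, "watermark"), (4, "event"), (18, "event")]

counts, dropped = process(STREAM)
print(counts, dropped)  # {0: 3, 10: 2} [4]
```

The event at 7 arrives after the watermark has passed its window but is still counted, while the event at 4 arrives too late and is reported rather than silently lost. Because the output depends only on the input (watermarks included), replaying the same log gives the same counts, which is much of why a single streaming pipeline can often stand in for a separate batch layer.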

The exception might be if you really need (imperfect) data very, very fast. Even then, you would probably be better off simplifying your requirements than taking on the complexity of this architecture.

I think lambda architecture is now a product of its (really very brief) time and shouldn’t be seriously considered for building data services today. That said, the problems it addresses and its approach to solving them are still interesting to learn from, so the book may still be worth a read if you are interested in the area.

Cover image from Unsplash.