Count the joins

I enjoyed this post from Nicole Radziwill, PhD on LinkedIn:

How fragile are your pipelines? Start with this simple metric: COUNT THE JOINS. Every time you have to join, you’re making multiple assumptions about the underlying raw data, the biggest one being: you’re assuming it’s not going to change.

Not only does that increase the fragility, joins are costly. Particularly on modern data warehouses like Snowflake and BigQuery.

Nicole then goes on to say:

Without a modelled “clean data layer” that can stay invariant even when software engineers change the systems upon which they’re based, your organization will struggle with data integrity.

This is what I argue for a lot, and it is what data contracts were originally designed to solve: they provide a well-defined interface to that data.
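If you want to try the metric on your own pipelines, here is a minimal sketch, assuming each pipeline step is available as a plain SQL string. The helper names (`strip_comments`, `count_joins`) and the example tables are made up for illustration, and the regex approach is deliberately naive: it ignores string literals and does not parse the SQL.

```python
import re

# Matches the JOIN keyword as a whole word, case-insensitively.
# "LEFT JOIN" or "INNER JOIN" each count once; "joined_at" does not match.
JOIN_PATTERN = re.compile(r"\bJOIN\b", re.IGNORECASE)


def strip_comments(sql: str) -> str:
    """Remove /* */ block comments and -- line comments (hypothetical helper).

    Naive on purpose: comment markers inside string literals are not handled.
    """
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def count_joins(sql: str) -> int:
    """Count JOIN keywords in a SQL string, ignoring comments (hypothetical helper)."""
    return len(JOIN_PATTERN.findall(strip_comments(sql)))


# Illustrative query; the tables and columns are invented for this example.
query = """
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON c.id = o.customer_id  -- join to customers
LEFT JOIN products p ON p.id = o.product_id
"""

print(count_joins(query))
```

Running something like this over every query in a repository gives a rough per-pipeline join count to track over time. For anything beyond a quick estimate, a real SQL parser would be a safer choice than a regex.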



    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.