
Count the joins


I enjoyed this post from Nicole Radziwill, PhD on LinkedIn:

How fragile are your pipelines? Start with this simple metric: COUNT THE JOINS. Every time you have to join, you’re making multiple assumptions about the underlying raw data, the biggest one being: you’re assuming it’s not going to change.

Not only does that increase the fragility, joins are costly. Particularly on modern data warehouses like Snowflake and BigQuery.
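As a rough illustration of the metric, here is a minimal Python sketch that counts `JOIN` keywords in a SQL string. The function name `count_joins` and the example query are made up for this post, and the regex approach is only a heuristic: it ignores comments and single-quoted strings but does not parse SQL properly, so treat the number as a starting point rather than an exact measure.

```python
import re

# Matches JOIN as a whole word, so column names like "joined_at" are not counted.
JOIN_PATTERN = re.compile(r"\bjoin\b", re.IGNORECASE)


def count_joins(sql: str) -> int:
    """Return a rough count of JOIN keywords in a SQL string (heuristic only)."""
    # Drop "--" line comments and "/* ... */" block comments so that
    # commented-out joins are not counted.
    without_comments = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL)
    # Blank out single-quoted string literals so the word "join" inside a
    # literal is not counted.
    without_strings = re.sub(r"'(?:[^']|'')*'", "''", without_comments)
    return len(JOIN_PATTERN.findall(without_strings))


# Hypothetical query, for illustration only.
example_query = """
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON c.id = o.customer_id
LEFT JOIN products p ON p.id = o.product_id
-- JOIN legacy_table l ON l.id = o.legacy_id
"""

print(count_joins(example_query))
```

Running something like this across the queries in a pipeline gives a quick, if crude, view of where the most assumptions about upstream data are concentrated.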

Nicole then goes on to say:

Without a modelled “clean data layer” that can stay invariant even when software engineers change the systems upon which they’re based, your organization will struggle with data integrity.

This is what I argue for a lot, and it is what data contracts were originally designed to solve, by providing a well-defined interface to that data.
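To make "a well-defined interface" a little more concrete, here is a minimal Python sketch of the idea. Everything in it is hypothetical: the `ContractField` class, the `ORDERS_CONTRACT` definition, and the `violations` helper are invented for illustration and are not the API of any particular data contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractField:
    """One field in a (hypothetical) data contract."""
    name: str
    type: type
    required: bool = True


# Hypothetical contract for a hypothetical "orders" dataset.
ORDERS_CONTRACT = (
    ContractField("order_id", str),
    ContractField("customer_id", str),
    ContractField("amount", float),
    ContractField("coupon_code", str, required=False),
)


def violations(record: dict, contract: tuple) -> list:
    """Return a list of human-readable problems; empty means the record conforms."""
    problems = []
    for field in contract:
        value = record.get(field.name)
        if value is None:
            # Missing (or null) values only matter for required fields.
            if field.required:
                problems.append(f"missing required field: {field.name}")
            continue
        if not isinstance(value, field.type):
            problems.append(
                f"{field.name}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


good = {"order_id": "o1", "customer_id": "c1", "amount": 9.5}
bad = {"order_id": "o1", "amount": "9.5"}

print(violations(good, ORDERS_CONTRACT))
print(violations(bad, ORDERS_CONTRACT))
```

The point of the sketch is that consumers depend on the contract rather than on the source system's internal tables, so an upstream change either still satisfies the contract or fails loudly at the boundary.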


    Andrew Jones
    Author