
Count the joins


I enjoyed this post from Nicole Radziwill, PhD on LinkedIn:

How fragile are your pipelines? Start with this simple metric: COUNT THE JOINS. Every time you have to join, you’re making multiple assumptions about the underlying raw data, the biggest one being: you’re assuming it’s not going to change.

Not only does that increase the fragility, joins are costly. Particularly on modern data warehouses like Snowflake and BigQuery.
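As a rough illustration of the metric, here is a minimal Python sketch that counts JOIN keywords in a SQL string. The function name `count_joins` and the example query are my own assumptions, not something from Nicole's post, and the regex is a heuristic: it ignores SQL comments but would still count the word "join" inside a string literal, so treat the result as an approximation rather than a parse.

```python
import re

# Matches the JOIN keyword as a whole word, case-insensitively.
JOIN_PATTERN = re.compile(r"\bjoin\b", re.IGNORECASE)

# Matches "-- line comments" and "/* block comments */".
COMMENT_PATTERN = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def count_joins(sql: str) -> int:
    """Return the number of JOIN keywords in a SQL string (heuristic)."""
    # Remove comments first so commented-out joins are not counted.
    without_comments = COMMENT_PATTERN.sub(" ", sql)
    return len(JOIN_PATTERN.findall(without_comments))


# Hypothetical query, used only to demonstrate the function.
example_query = """
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON c.id = o.customer_id
LEFT JOIN products p ON p.id = o.product_id  -- join to catalogue
"""

join_count = count_joins(example_query)
print(join_count)
```

Running something like this over every query in a pipeline gives a crude per-model number you can track over time. A proper SQL parser would be more accurate if you need to rely on the figure.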

Nicole then goes on to say:

Without a modelled “clean data layer” that can stay invariant even when software engineers change the systems upon which they’re based, your organization will struggle with data integrity.

That is what I argue for a lot, and it's the problem data contracts were originally designed to solve, by providing a well-defined interface to that data.
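To make "a well-defined interface" a little more concrete, here is a minimal sketch in Python, assuming a contract is just a list of expected fields and types. The names `FieldSpec`, `ORDERS_CONTRACT` and `violations` are hypothetical, as is the orders dataset; real data contracts usually cover much more (ownership, semantics, SLAs), so read this as one possible shape rather than the definitive one.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FieldSpec:
    """One field promised by the contract (hypothetical structure)."""
    name: str
    type: type
    required: bool = True


# Hypothetical contract for an "orders" dataset.
ORDERS_CONTRACT = (
    FieldSpec("order_id", str),
    FieldSpec("customer_id", str),
    FieldSpec("total", float),
    FieldSpec("coupon_code", str, required=False),
)


def violations(record: dict, contract: Sequence[FieldSpec]) -> List[str]:
    """Return a list of ways the record breaks the contract (empty if none)."""
    problems = []
    for field in contract:
        value = record.get(field.name)
        if value is None:
            # Missing or null: only a problem when the field is required.
            if field.required:
                problems.append(f"missing required field: {field.name}")
            continue
        if not isinstance(value, field.type):
            problems.append(
                f"wrong type for {field.name}: expected {field.type.__name__}"
            )
    return problems


good = {"order_id": "o1", "customer_id": "c1", "total": 9.5}
bad = {"order_id": "o1", "total": "9.5"}

print(violations(good, ORDERS_CONTRACT))
print(violations(bad, ORDERS_CONTRACT))
```

The point of the sketch is that consumers depend on the contract, not on the upstream tables, so a change in the source system shows up as a contract violation at the boundary instead of a silently broken join further down the pipeline.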


    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue. Guaranteed, with data contracts.