
Data Quality

2024


Count the joins

·1 min

I enjoyed this post from Nicole Radziwill, PhD, on LinkedIn:

How fragile are your pipelines? Start with this simple metric: COUNT THE JOINS. Every time you have to join, you’re making multiple assumptions about the underlying raw data, the biggest one being: you’re assuming it’s not going to change.
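If your pipeline's transformations live as SQL strings, the metric can be computed mechanically. The sketch below is one possible way to do that, not the method from the post. It assumes plain SQL text, and `count_joins` and `pipeline_join_count` are hypothetical helpers. It uses a regex heuristic rather than a real SQL parser, skipping comments and single-quoted string literals so that a JOIN mentioned inside them isn't counted.

```python
import re
from typing import Iterable

# Hypothetical helper (regex heuristic, not a SQL parser).
# One alternation scanned left to right: comments and string literals are
# consumed whole, so a "join" inside them can never match on its own.
_TOKEN = re.compile(
    r"'(?:[^']|'')*'"   # single-quoted string literal ('' escapes a quote)
    r"|--[^\n]*"        # line comment
    r"|/\*.*?\*/"       # block comment (DOTALL lets it span lines)
    r"|\bjoin\b",       # the JOIN keyword itself
    re.IGNORECASE | re.DOTALL,
)


def count_joins(sql: str) -> int:
    """Heuristically count JOIN keywords in a single SQL query."""
    return sum(1 for m in _TOKEN.finditer(sql) if m.group(0).lower() == "join")


def pipeline_join_count(queries: Iterable[str]) -> int:
    """Total JOIN count across every query in a (hypothetical) pipeline."""
    return sum(count_joins(q) for q in queries)


query = """
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id  -- a join in a comment: join
LEFT JOIN regions r ON c.region_id = r.id
WHERE c.note <> 'do not join'
"""
print(count_joins(query))  # 2
```

Tracking the total per pipeline over time gives a rough signal of how many assumptions about upstream raw data you are stacking up. A parser-based tool would be more robust for dialect-specific syntax.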

Software engineering and dependencies

·1 min

If you’re a software engineer and an upstream dependency is unreliable, you speak to the team that owns that dependency.

Data quality can only be improved at the source

·1 min

Data quality can only be improved at the source.

If the source isn’t capturing the data at the required accuracy, there’s nothing you can do downstream to increase that accuracy.

Trust starts at the source

·1 min

As I wrote yesterday, many data professionals don’t trust the data they are building on. And many users of data and data applications don’t trust the data they’re given.

Do you trust your data?

·1 min

At most of my recent talks, I’ve asked the audience, made up of data professionals, a simple question: do you trust your data?

