Data Quality

2024

What happens when a data contract is breached?

28 January 2024·2 mins

I wrote earlier this month that data contracts shouldn’t focus on enforcement.

By which I meant that the outcome you’re optimising for isn’t enforcing rules on someone’s data, but using data contracts to facilitate a better-quality dataset that others can build on with confidence.

What do you want from your data?

27 January 2024·1 min

What do you want from your data?

Do you want it to be fast changing?

It’s demotivating to work with poor data

25 January 2024·1 min

You try your best to work around the poor quality data you’re given.

Only to deliver a poor outcome to your users.

Count the joins

23 January 2024·1 min

I enjoyed this post from Nicole Radziwill, PhD, on LinkedIn:

How fragile are your pipelines? Start with this simple metric: COUNT THE JOINS. Every time you have to join, you’re making multiple assumptions about the underlying raw data, the biggest one being: you’re assuming it’s not going to change.
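As a rough illustration of the metric, the sketch below counts JOIN keywords in a SQL string. This is a naive approach, assuming a case-insensitive whole-word regex match is good enough for a quick fragility signal. It does not parse SQL, so it will also count "join" inside comments or string literals. The function name `count_joins` is hypothetical.

```python
import re

# Whole-word, case-insensitive match: catches JOIN, LEFT JOIN, inner join, etc.,
# but not identifiers such as join_date or joined.
JOIN_PATTERN = re.compile(r"\bjoin\b", re.IGNORECASE)


def count_joins(sql: str) -> int:
    """Naively count JOIN keywords in a SQL query (hypothetical helper).

    This is a regex heuristic, not a SQL parser: a "join" inside a
    comment or string literal will be counted too.
    """
    return len(JOIN_PATTERN.findall(sql))


query = """
SELECT o.id, c.name, p.title
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
"""

print(count_joins(query))  # 2
```

Summing this across every query in a pipeline gives a crude count of the assumptions it makes about upstream data that might change.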

Software engineering and dependencies

22 January 2024·1 min

If you’re a software engineer, and an upstream dependency is unreliable, then you would speak to the team who owns that dependency.

If you want to improve the quality of the data, you need to speak to the producer of the data

19 January 2024·1 min

If you want to improve the quality of the data

Then you’ll need to speak to the producer of the data.

Treat the cause of data quality problems, not the symptoms

14 January 2024·1 min

Staging layers, medallion architectures, data testing, assigning data stewards, gatekeeping application changes until reviewed by a data team.

How is data seen in your organisation?

13 January 2024·2 mins

In response to my LinkedIn post on how every data transform is technical debt, Tim Hiebenthal commented:

I totally agree with your statements, but I have doubts about the feasibility of implementing it.

Data quality can only be improved at the source

12 January 2024·1 min

Data quality can only be improved at the source.

If the source of the data isn’t capturing the data at the required accuracy, there’s nothing you can do later to increase the accuracy.
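A small sketch of why lost accuracy can’t be recovered downstream, assuming a hypothetical source system that stores only the date of an event and drops the time. Two different real-world events collapse into the same stored value, so no later transform can tell them apart. The function `capture_at_day_precision` is illustrative only.

```python
from datetime import datetime


def capture_at_day_precision(ts: datetime) -> datetime:
    """Hypothetical source system that only records the date, not the time."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Two distinct events on the same day.
morning = datetime(2024, 1, 12, 9, 15)
evening = datetime(2024, 1, 12, 18, 40)

stored_morning = capture_at_day_precision(morning)
stored_evening = capture_at_day_precision(evening)

# Once captured, the two events are indistinguishable downstream.
print(stored_morning == stored_evening)  # True
```

Any downstream fix (imputing a time, say) would be a guess, not a recovery of the original accuracy.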

Trust starts at the source

3 January 2024·1 min

As I wrote yesterday, many data professionals don’t trust the data they are building on. And many users of data and data applications don’t trust the data they’re being provided.
