
What does it mean to take responsibility?

Taking responsibility is more than getting a name down in a spreadsheet

Hey! Hope you had a good week :)

Today I write about what it actually means to take responsibility for data.

There are also links to articles on DuckDB transpilation to reduce warehouse costs, ontologies on Snowflake, and observability becoming a bottleneck.

Finally, a reminder: early bird pricing for my only in-person data contracts course this year ends at the end of the month. Do join me in Antwerp!


What does it mean to take responsibility?

What does it mean when we say data producers must take responsibility for their data?

More importantly, do they know what it means?

And do they know why it’s important?

It needs to be explicitly stated what their responsibility is.

For example, we know that datasets changing without effective change management are one of the most common causes of downstream data application breakages, with one report suggesting 37% of data incidents are caused by schema changes.

So, we might say data producers are responsible for the change management of their data, and must follow a certain process when making a breaking change to the data and/or the schema.

That could involve:

  • Producing RFCs or other documentation for review
  • Backfilling data from the previous version of the schema to the new one
  • Populating both the current and the new version of the schema for a period of time, allowing for a migration with no downtime for consumers

Importantly, it’s up to the data producers to decide exactly how to do these things - it’s part of their responsibilities to do so.

For example, a small change on a non-critical dataset may only need a migration timeline of days/weeks. A larger change to a critical dataset may have a timeline of months.

The data producers need to consider all the trade-offs in that decision and how it impacts their project and their consumers.

That’s what it means to take responsibility.


Lower your warehouse costs via DuckDB transpilation by Max Halford

The idea of moving some workloads off the data warehouse and onto DuckDB is an attractive one, but not one that many people have implemented in practice.

Ontology on Snowflake: Part 1 - Overview and Data Model by Tianxia Jia

Interesting series on building an ontology on Snowflake, most of which is applicable to other data warehouses.

Why Your Observability Platform Has Become A Bottleneck by Andi Mann

Systems change fast and those changes generate a lot of data. That’s more true now than ever. Observability platforms are one place where this may cause a new bottleneck.


Being punny 😅

I finally did it! Bought a new pair of shoes with memory foam insoles. No more forgetting why I walked into the kitchen


Thanks! If you’d like to support my work…

Thanks for reading this week's newsletter — always appreciated!

If you’d like to support my work consider buying my book, Driving Data Quality with Data Contracts, or if you have it already please leave a review on Amazon.

🆕 I’ll be running my in-person workshop, Implementing a Data Mesh with Data Contracts, in June in Belgium. It will likely be my only in-person workshop this year. Do join us!

Enjoy your weekend.

Andrew


Want great, practical advice on implementing data mesh, data products and data contracts?

In my weekly newsletter I share with you an original post and links to what's new and cool in the world of data mesh, data products, and data contracts.

I also include a little pun, because why not? 😅



    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.