
Keeping data contracts simple


Happy Friday!

In this week's newsletter I write about keeping data contracts simple for your users, even as you add more capabilities.

Also links to articles on implementing data contracts, data platforms as a product, and watermarks when streaming.


Keeping data contracts simple

Data contracts can power a number of platform capabilities, including change management, data quality checks, data retention, access policies, and more.

But that doesn’t mean they have to become complex.

From the user's point of view, the data contract should continue to be simple to use even as you add more users and more capabilities.

For example, we started our data contracts platform with change management, and the data contract managed changes to the tables in the data warehouse.

One of the first features we added was backups, and because our implementation was effectively free (in $) we just rolled it out to all data contract users by default.

The data contract didn’t change.

You could customise your backups if you wanted to, e.g. change the retention period or disable them, but you didn’t need to, and no one did.
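As a rough illustration of how a default-on feature can leave existing contracts untouched, here is a minimal Python sketch. It is not our actual implementation: the `DataContract` and `BackupConfig` classes, their field names, and the 30-day default are all hypothetical, chosen only to show the idea of an optional setting with a platform-supplied default.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BackupConfig:
    # Hypothetical platform defaults; the real values are not given in this post.
    enabled: bool = True
    retention_days: int = 30


@dataclass(frozen=True)
class DataContract:
    name: str
    owner: str
    columns: dict  # column name -> column type
    # Optional: a contract that never mentions backups gets the default.
    backup: BackupConfig = field(default_factory=BackupConfig)


# A contract written before backups existed still works unchanged.
orders = DataContract(
    name="orders",
    owner="checkout-team",
    columns={"order_id": "STRING", "total": "NUMERIC"},
)

# Customising is a one-field change for the few users who want it.
click_events = DataContract(
    name="click_events",
    owner="web-team",
    columns={"event_id": "STRING"},
    backup=BackupConfig(retention_days=7),
)
```

The design choice is that the new capability lives behind a default, so the user-facing definition only grows for people who opt in to customising it.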

A more involved feature we added was data retention, which required people to categorise their data. We had included the categorisation in the data contract from day 1, knowing we would be doing this, so no migration was needed. It does mean a little more for the data contract user to complete.

We then used the same categorisation for access management, so again a new feature without changing the data contract definition itself or adding any more complexity.
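To make the reuse of one piece of metadata concrete, here is a separate, simplified Python sketch in which a single categorisation per column drives both retention and access. Everything in it is an assumption for illustration: the category names, the retention periods, the roles, and the rule that a table is kept only as long as its most restrictive column allows.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Category(Enum):
    # Hypothetical categories; use whatever your organisation defines.
    PUBLIC = "public"
    INTERNAL = "internal"
    PERSONAL = "personal"


@dataclass(frozen=True)
class Column:
    name: str
    category: Category


@dataclass(frozen=True)
class DataContract:
    name: str
    columns: Tuple[Column, ...]


# Platform-side policy, invisible to the contract author (values are made up).
RETENTION_DAYS = {Category.PUBLIC: None, Category.INTERNAL: 730, Category.PERSONAL: 90}
ALLOWED_ROLES = {
    Category.PUBLIC: {"analyst", "engineer", "privacy_officer"},
    Category.INTERNAL: {"analyst", "engineer", "privacy_officer"},
    Category.PERSONAL: {"privacy_officer"},
}


def retention_days(contract: DataContract) -> Optional[int]:
    """Shortest retention across all columns; None means keep indefinitely."""
    limits = [RETENTION_DAYS[c.category] for c in contract.columns]
    finite = [days for days in limits if days is not None]
    return min(finite) if finite else None


def readable_columns(contract: DataContract, role: str) -> List[str]:
    """Columns the given role may read, derived from the same categories."""
    return [c.name for c in contract.columns if role in ALLOWED_ROLES[c.category]]


customers = DataContract(
    name="customers",
    columns=(
        Column("customer_id", Category.INTERNAL),
        Column("email", Category.PERSONAL),
        Column("country", Category.PUBLIC),
    ),
)
```

With this contract, `retention_days(customers)` returns `90` and `readable_columns(customers, "analyst")` returns `["customer_id", "country"]`. Both answers come from the same `category` field, so adding access management did not require a new field in the contract.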

The platform itself is getting more complex, so you have to manage that, but that’s a fairly standard engineering problem.

The data contract your users see should remain simple, only containing the metadata you need to automate the creation and management of data.


Proposing and Implementing Data Contracts with Your Team by Hoyt Emerson

Great write-up of implementing data contracts, including some of the challenges encountered and technical details on implementing with GCP.

If I’m a Data Platform PM, What IS My Product Exactly? by Anna Bergevin

What is a data platform, who is it for, and what use cases do you solve for? All covered here.

Flink Watermarks - WTF?

If you’re new to streaming, or even experienced with streaming but still can’t quite get your head around watermarks, late data, etc., then this is a nice interactive explanation. The core concepts also apply to other streaming engines.


Being punny 😅

Do you know there is a Spanish programming language? It’s called Si++


Thanks! If you’d like to support my work…

Thanks for reading this week's newsletter, always appreciated!

If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or if you have it already, please leave a review on Amazon.

🆕 Also check out my self-paced course on Implementing Data Contracts.

Enjoy your weekend.

Andrew




Author: Andrew Jones
I build data platforms that reduce risk and drive revenue.