A minimal data contract implementation
A data contracts implementation doesn’t have to be complicated.
There’s no reason you couldn’t create a minimal implementation in an hour.
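As a rough sketch of how small a first version could be, assume a contract is nothing more than an owner, a version, and a list of named, typed fields, with a check that a record matches them. Everything here (`FieldSpec`, `DataContract`, the `orders` example and its fields) is hypothetical and only illustrates the idea; it is not a reference implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field in the contract (hypothetical structure)."""
    name: str
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """A minimal contract: who owns the data and what shape it has."""
    name: str
    version: str
    owner: str
    fields: tuple

    def validate(self, record: dict) -> list:
        """Return a list of human-readable violations (empty if none)."""
        errors = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    errors.append(f"missing required field: {spec.name}")
                continue
            if not isinstance(value, spec.dtype):
                errors.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return errors


# Hypothetical example contract for an "orders" dataset.
orders_contract = DataContract(
    name="orders",
    version="1.0.0",
    owner="checkout-team",
    fields=(
        FieldSpec("order_id", str),
        FieldSpec("amount", float),
        FieldSpec("coupon_code", str, required=False),
    ),
)

good = {"order_id": "a1", "amount": 9.5}
bad = {"amount": "9.5"}

print(orders_contract.validate(good))
print(orders_contract.validate(bad))
```

A producer could run a check like this in their tests or at publish time, so a breaking change is caught before it reaches consumers.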
It’s all very well saying, as I did yesterday, that data producers need to do more to provide us with the better quality data we need to solve our problems.
For a data consumer to know if the data is applicable to their use case, they need to know what to expect from the data.
I wrote a few weeks ago about the difference between active vs passive data publishing, and how the active publishing of data leads to better outcomes for your organisation.
“You broke our data, so your PRs now need our signoff.”
This is a common reaction from data teams who are feeling the impact of upstream data changes causing breakages in their pipelines.
Data contracts underpin data products.
With data contracts we are explicitly saying data should be treated as a product by those teams who produce the data. That data is then provided through a stable interface.
Passive data publishing is the norm in most organisations.
That’s where we’re using patterns like ELT or CDC to extract copies of the upstream database and replicate them in a data warehouse/lake. The data producer isn’t doing anything to facilitate this - they are passive.
As mentioned yesterday, data contracts are best stored with the code that generates the data.
Because data contracts are stored with the code that creates the data, and maintained by those who own the data, they become the source of truth for everything about that data.
Data contracts should be stored as close as possible to the place the data is generated.
How, exactly, should you version data contracts?
The default answer is often to use SemVer.
SemVer is a standard from software engineering, widely used to version libraries and releases of software applications.
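To make the SemVer option concrete, here is one possible way to map schema changes onto version bumps. The rules are an assumption for illustration (removing or retyping a field is breaking, adding a field is not), and `classify_change` and `bump` are hypothetical helpers, not part of any library.

```python
def classify_change(old: dict, new: dict) -> str:
    """Compare two schemas ({field_name: type_name}) and name the bump.

    Assumed convention: removed or retyped fields are breaking ("major"),
    added fields are backwards compatible ("minor"), and anything else
    (for example, a documentation-only edit) is a "patch".
    """
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    retyped = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "patch"


def bump(version: str, change: str) -> str:
    """Apply a major/minor/patch bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1 = {"order_id": "string", "amount": "float"}
v2 = {"order_id": "string", "amount": "float", "currency": "string"}
v3 = {"order_id": "string", "amount": "string", "currency": "string"}

print(bump("1.2.3", classify_change(v1, v2)))  # field added
print(bump("1.3.0", classify_change(v2, v3)))  # field retyped
```

Whether these rules fit depends on how your consumers read the data, so treat the mapping as a starting point to agree on rather than a fixed standard.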