Automated schema migrations
When a schema changes in a data contract, consider how you can automate both deploying that change and migrating to it, without data loss or extra effort from consumers.
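As a rough illustration of what automating a migration could look like, here is a minimal sketch. It assumes a contract schema can be represented as a plain mapping of column name to SQL type; the function name `plan_migration` and the example table are hypothetical, not part of any specific tool. It only automates additive changes and refuses anything that could lose data.

```python
def plan_migration(table, old_schema, new_schema):
    """Return ALTER TABLE statements that move `table` from old_schema to new_schema.

    Schemas are hypothetical dicts of column name -> SQL type.
    Only additive changes are automated; anything that could lose data raises.
    """
    removed = [name for name in old_schema if name not in new_schema]
    if removed:
        # Dropping a column destroys data, so leave it for a human to review.
        raise ValueError(f"columns removed, manual review needed: {removed}")

    statements = []
    for name, col_type in new_schema.items():
        if name not in old_schema:
            # A new column is safe to add without touching existing rows.
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {col_type}")
        elif old_schema[name] != col_type:
            # A type change may truncate or reject existing values.
            raise ValueError(f"type change on {name!r}, manual review needed")
    return statements


old = {"id": "INTEGER", "email": "TEXT"}
new = {"id": "INTEGER", "email": "TEXT", "country": "TEXT"}
for statement in plan_migration("customers", old, new):
    print(statement)
```

Refusing removals and type changes is a deliberate choice in this sketch: those are the changes most likely to break consumers, so they are surfaced rather than applied automatically.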
What is the problem you’re trying to solve with data contracts?
Is it:
And how does solving that problem help the business achieve its goals?
The problem with using data usually isn’t finding it; data catalogs can surface it.
I was giving a talk on data contracts a few weeks ago, and one audience member seemed particularly engaged, nodding along throughout.
Many data engineering teams spend a lot of their time struggling to deal with upstream data. That includes:
Following on from yesterday’s note on the API Mandate and data contracts, I want to be clear: you don’t need to mandate data contracts to get adoption.
In 2002, Jeff Bezos issued the now-infamous API Mandate, which stated the following:
- All teams will henceforth expose their data and functionality through service interfaces.
- Teams must communicate with each other through these interfaces.
- There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
- It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter.
- All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
- Anyone who doesn’t do this will be fired.
- Thank you; have a nice day!
Now, at the time, creating APIs was not as easy as it is now, so building these interfaces would take time.
Every workflow and process you have has a cost, including the following:
Each of these slows down delivery:
I wrote yesterday about how it’s best to validate as early as possible.
But there are times that might not be possible, maybe because:
It’s always best to do your data validation as early as possible, as that:
So with data contracts I encourage data producers to validate their data before they publish it to a downstream service.
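To make "validate before you publish" concrete, here is a minimal sketch. It assumes the contract can be reduced to a mapping of field name to expected Python type; `CONTRACT`, `validate`, and `publish_if_valid` are hypothetical names, and `publish` stands in for whatever client sends data to the downstream service.

```python
# Hypothetical contract: field name -> expected Python type.
CONTRACT = {"order_id": int, "amount": float, "currency": str}


def validate(record, contract):
    """Return a list of human-readable contract violations (empty if valid)."""
    errors = []
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


def publish_if_valid(record, contract, publish):
    """Call publish(record) only if the record satisfies the contract."""
    errors = validate(record, contract)
    if errors:
        # Fail at the producer, before bad data reaches any consumer.
        raise ValueError("; ".join(errors))
    publish(record)


sent = []  # stand-in for a real downstream service
publish_if_valid({"order_id": 1, "amount": 9.99, "currency": "GBP"}, CONTRACT, sent.append)
```

Raising at the producer keeps the failure close to the code that caused it, rather than leaving downstream teams to discover it later.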