Data Contracts

2024

Automated schema migrations

14 October 2024·1 min

When you change a schema in a data contract, consider how to automate both the deployment of that change and the migration to it, without data loss or extra effort from consumers.

Start with the problem

10 October 2024·1 min

What is the problem you’re trying to solve with data contracts?

Is it to:

Improve the governance and control of your data?
Reduce the number of preventable data incidents?
Move towards a more decentralised operational model?

And how does solving that problem help the business achieve its goals?

The data contract captures contextual information

9 October 2024·1 min

The problem with using data isn’t usually finding it - data catalogs can surface that data.

Data contracts cannot assign ownership

8 October 2024·1 min

I was giving a talk on data contracts a few weeks ago, and one audience member seemed particularly engaged, nodding along throughout.

A business case that improves data quality

4 October 2024·2 mins

Many data engineering teams spend a lot of their time struggling to deal with upstream data. That includes:

You don't need a mandate

3 October 2024·1 min

Following on from yesterday’s note on the API Mandate and data contracts, I want to be clear that you don’t need to mandate data contracts to get adoption.

Data contracts and the API mandate

2 October 2024·2 mins

In 2002 Jeff Bezos issued the now infamous API Mandate which stated the following:

All teams will henceforth expose their data and functionality through service interfaces.
Teams must communicate with each other through these interfaces.
There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter.
All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
Anyone who doesn’t do this will be fired.
Thank you; have a nice day!

Now, at the time, creating APIs was not as easy as it is now, so building these interfaces would take time.

Workflows must have a benefit

30 September 2024·1 min

Every workflow and process you have has a cost, including the following:

Managing your infrastructure as code as part of your development workflow
Creating project and solution design documents and getting wide feedback as part of your project workflow
Running automated tests with CI/CD as part of your deployment workflow

Each of these slows down delivery:

Validating data contracts downstream

27 September 2024·1 min

I wrote yesterday how it’s best to validate as early as possible.

But there are times that might not be possible, maybe because:

Validate as early as possible

26 September 2024·1 min

It’s always best to do your data validation as early as possible, as that:

Allows issues to be acted on more quickly
Alerts the team with the most context of the data/issue
Reduces the potential of having the incorrect data in multiple systems

So with data contracts I encourage data producers to validate their data before they publish it to a downstream service.
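As a rough illustration of producer-side validation, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `CONTRACT` mapping, the field names, and the `validate` and `publish_valid` helpers are hypothetical and not taken from any particular data contract tool. It only checks that required fields are present and have the expected type; a real contract would usually cover much more.

```python
from typing import Any, Callable

# Hypothetical contract: field name -> expected Python type.
CONTRACT: dict[str, type] = {
    "order_id": str,
    "amount": float,
    "currency": str,
}


def validate(record: dict[str, Any], contract: dict[str, type] = CONTRACT) -> list[str]:
    """Return a list of contract violations for one record (empty if valid)."""
    errors = []
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


def publish_valid(
    records: list[dict[str, Any]],
    publish: Callable[[dict[str, Any]], None],
) -> list[tuple[dict[str, Any], list[str]]]:
    """Publish only records that pass validation; return the rejected ones."""
    rejected = []
    for record in records:
        errors = validate(record)
        if errors:
            # Keep the bad record with the producer, who has the most context.
            rejected.append((record, errors))
        else:
            publish(record)
    return rejected
```

Here `publish` stands in for whatever sends a record downstream (a queue client, an API call, a table write). The rejected records stay with the producing team, which can alert on them or fix them before anything incorrect reaches another system.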
