
Data Contracts



2024


Automated schema migrations

·1 min

When you change the schema in a data contract, think about how you can automate both the deployment of that change and the migration to it, without data loss or extra effort from consumers.
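One possible first step towards that automation is to compare two versions of a schema and decide whether the change can be rolled out without human review. The sketch below assumes a contract's schema can be reduced to a mapping of field names to type names. The function names `plan_migration` and `is_safe_to_automate` are hypothetical, and the rule that only additive changes are safe is a simplifying assumption.

```python
def plan_migration(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify the differences between two schema versions (field name -> type name)."""
    plan: dict[str, list[str]] = {"add": [], "drop": [], "retype": []}
    for name, new_type in new.items():
        if name not in old:
            plan["add"].append(name)       # new field: existing data is untouched
        elif old[name] != new_type:
            plan["retype"].append(name)    # type change: may need a backfill or cast
    plan["drop"] = [name for name in old if name not in new]  # removed fields lose data
    return plan


def is_safe_to_automate(plan: dict[str, list[str]]) -> bool:
    """Assumption: only purely additive changes are applied without review."""
    return not plan["drop"] and not plan["retype"]


v1 = {"id": "string", "amount": "float"}
v2 = {"id": "string", "amount": "float", "currency": "string"}
print(plan_migration(v1, v2))
print(is_safe_to_automate(plan_migration(v1, v2)))
```

Anything that drops or retypes a field would fall outside this check and need a planned migration path, such as publishing both versions side by side until consumers have moved.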

Start with the problem

·1 min

What is the problem you’re trying to solve with data contracts?

Is it:

  • Improve the governance and control of your data?
  • Reduce the number of preventable data incidents?
  • Move towards a more decentralised operational model?

And how does solving that problem help the business achieve its goals?

Data contracts cannot assign ownership

·1 min

I was giving a talk on data contracts a few weeks ago, and one audience member seemed particularly engaged, nodding along throughout.

Data contracts and the API mandate

·2 mins

In 2002, Jeff Bezos issued the now-infamous API Mandate, which stated the following:

  1. All teams will henceforth expose their data and functionality through service interfaces.
  2. Teams must communicate with each other through these interfaces.
  3. There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
  4. It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter.
  5. All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
  6. Anyone who doesn’t do this will be fired.
  7. Thank you; have a nice day!

Now, at the time, creating APIs was not as easy as it is now, so building these interfaces would take time.

Workflows must have a benefit

·1 min

Every workflow and process you have has a cost, including the following:

  • Managing your infrastructure as code as part of your development workflow
  • Creating project and solution design documents and getting wide feedback as part of your project workflow
  • Running automated tests with CI/CD as part of your deployment workflow

Each of these slows down delivery:

Validate as early as possible

·1 min

It’s always best to validate your data as early as possible, because doing so:

  • Allows issues to be acted on more quickly
  • Alerts the team with the most context on the data and the issue
  • Reduces the risk of incorrect data spreading to multiple systems

So with data contracts I encourage data producers to validate their data before they publish it to a downstream service.
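As a rough illustration of validating at the producer, the sketch below checks each record against a contract before it is handed to whatever publishes it. Everything here is hypothetical: the `Field` class, the `ORDER_CONTRACT` example, and the `publish` callable stand in for a real contract format and a real publishing client.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Field:
    name: str
    type: type
    required: bool = True


# Hypothetical contract for an illustrative "orders" dataset.
ORDER_CONTRACT = [
    Field("order_id", str),
    Field("amount", float),
    Field("coupon", str, required=False),
]


def validate(record: dict[str, Any], contract: list[Field]) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field in contract:
        value = record.get(field.name)
        if value is None:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        if not isinstance(value, field.type):
            errors.append(
                f"{field.name}: expected {field.type.__name__}, got {type(value).__name__}"
            )
    return errors


def publish_if_valid(
    record: dict[str, Any],
    contract: list[Field],
    publish: Callable[[dict[str, Any]], None],
) -> bool:
    """Publish only conforming records, so bad data never leaves the producer."""
    errors = validate(record, contract)
    if errors:
        # In practice this is where the producing team would be alerted.
        print(f"rejected {record}: {errors}")
        return False
    publish(record)
    return True
```

Because the check runs in the producer's own code, a rejected record surfaces to the team with the most context, and nothing downstream ever sees it.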

