2024
Data contracts set the expectations for the data.
These include:
- How to access the data
- How the data will be structured
- How often the data will be published
- Who to contact about the data (the owner)
- What will happen when the data needs to evolve
Without expectations, users make assumptions that are more optimistic than reality.
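To make those expectations concrete, here is a minimal sketch of how a contract could be written down as a record. It is an illustration only, not a standard format: the `DataContract` class, its field names and the `missing_expectations` helper are all hypothetical, chosen to mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract record; the field names are illustrative only."""

    name: str
    access: str               # how to access the data
    schema: dict[str, str]    # how the data is structured (column -> type)
    publish_frequency: str    # how often the data is published
    owner: str                # who to contact about the data
    evolution_policy: str     # what happens when the data needs to evolve


def missing_expectations(contract: DataContract) -> list[str]:
    """Return the names of any expectations that were left empty."""
    return [key for key, value in vars(contract).items() if not value]


orders = DataContract(
    name="orders",
    access="warehouse table analytics.orders",
    schema={"order_id": "string", "amount": "decimal"},
    publish_frequency="daily",
    owner="orders-team@example.com",
    evolution_policy="breaking changes ship as a new contract version",
)
```

A contract with an empty owner or schema is exactly the kind of gap that users end up filling with their own assumptions, so a check like `missing_expectations(orders)` could run before the data is published.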
I mentioned yesterday that one way to reduce the number of data transformations (and their cost) is to challenge the assumption that we need to bring all data together centrally before it can be useful.
Most of the problems we’re solving in our organisations are not unique to us:
- We need a way to store data
- We need a way to discover data
- We need a way to transform data
- We need a way to build and present dashboards
- We need a way to train and deploy machine learning models
And so on.
2023
Your contracts-backed data platform should give your users a production-ready way to generate, manage and consume data, and make that data available through a reliable interface.
APIs and data contracts have a lot in common, and APIs were part of the inspiration behind data contracts when I was coming up with the idea a few years back. They both provide the interface (see my post from a couple of weeks ago on the importance of interfaces), they both set expectations for the user (the structure, semantics, SLOs, and so on), and they both allow for integrations with other services and tools.
In yesterday’s note I wrote about the problem with defaults. One response to my personal data example could be “why don’t we just infer it?”.
Let’s say 20 years ago you ran your code in an environment you configured simply as python. The obvious default would have been Python 2. But today the obvious default is Python 3. If you deployed that same code with the same configuration, what Python environment would you expect? What would you expect that to be in 20 years’ time?
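The drift in that default can be sketched in a few lines. This is a toy illustration, not how any real deployment tool resolves versions: the `DEFAULT_BY_YEAR` table and the `resolve_runtime` function are hypothetical, and the years are stand-ins for "20 years ago" and "today".

```python
# Hypothetical table of what a bare "python" would have meant in a given year.
DEFAULT_BY_YEAR = {2004: "python2", 2024: "python3"}


def resolve_runtime(configured: str, year: int) -> str:
    """Resolve a configured runtime name to a concrete one.

    A bare "python" falls back to whatever the default was that year,
    so the same configuration gives different answers over time.
    An explicit value is returned unchanged.
    """
    if configured == "python":
        return DEFAULT_BY_YEAR[year]
    return configured


implicit_then = resolve_runtime("python", 2004)
implicit_now = resolve_runtime("python", 2024)
explicit_then = resolve_runtime("python3", 2004)
explicit_now = resolve_runtime("python3", 2024)
```

Only the explicit configuration resolves to the same runtime in both years; the implicit one changes underneath you without the configuration changing at all.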
How do you like your data?
Do you want it to be agile? So it can change at any moment, depending on the needs or wants of those producing the data? If a team decides it wants to model an object differently, with different IDs, they can do so. They are moving fast and breaking things.
It’s that time of year again when teams everywhere are considering a code freeze for the holiday season. Should we have one? How long for? What will we do while our code is frozen? Will we have lots to merge in January, and how will we manage that?
I wrote yesterday about how by promoting autonomy we can empower the users of our data platform to take on more ownership and responsibility. But if you took that message to the extreme, you wouldn’t build a data platform at all! You’d just leave them to it, to make all the decisions on how they should generate, process and consume data.