Data Platform

2024

Data contracts set the expectations

8 January 2024·1 min

Data contracts set the expectations for the data.

These include:

How to access the data
How the data will be structured
How often the data will be published
Who to contact about the data (the owner)
What will happen when the data needs to evolve

Without expectations, users make assumptions that are more optimistic than reality.
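
As a rough sketch, the expectations listed above could be written down in a small structure, so that nothing is left for users to assume. The class name, field names and example values here are all hypothetical, chosen for illustration rather than taken from any data contract specification.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataContract:
    """Hypothetical container for the expectations a data contract sets."""

    name: str
    access: str                # how to access the data
    schema: dict[str, str]     # how the data is structured (column -> type)
    publish_frequency: str     # how often the data is published
    owner: str                 # who to contact about the data
    evolution_policy: str      # what happens when the data needs to evolve

    def missing_expectations(self) -> list[str]:
        """Return the names of any expectations that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; the owner has been left blank on purpose.
orders = DataContract(
    name="orders",
    access="warehouse table analytics.orders",
    schema={"order_id": "string", "amount": "decimal"},
    publish_frequency="hourly",
    owner="",
    evolution_policy="breaking changes ship as a new version",
)

print(orders.missing_expectations())  # ['owner']
```

A check like `missing_expectations` is one way to catch a contract that leaves an expectation unstated before anyone starts depending on the data.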

Moving less data, with federated queries

5 January 2024·2 mins

I mentioned yesterday that one way to reduce the number of data transformations (and their cost) is to challenge the assumption that we need to bring all data together centrally before it can be useful.

We are not unique

1 January 2024·1 min

Most of the problems we’re solving in our organisations are not unique to us:

We need a way to store data
We need a way to discover data
We need a way to transform data
We need a way to build and present dashboards
We need a way to train and deploy machine learning models

And so on.

2023

A production-ready data platform

24 December 2023·1 min

Your contracts-backed data platform should give your users a production-ready way to generate, manage and consume data, and make that data available through a reliable interface.

APIs and data contracts have a lot in common, and APIs were part of the inspiration behind data contracts when I was coming up with the idea a few years back. They both provide the interface (see my post from a couple of weeks ago on the importance of interfaces), they both set expectations for the user (the structure, semantics, SLOs, and so on), and they both allow for integrations with other services, tools, etc.

Inferring rather than defining

18 December 2023·1 min

In yesterday’s note I wrote about the problem with defaults. One response to my personal data example could be “why don’t we just infer it?”.

The problem with defaults

17 December 2023·2 mins

Let’s say 20 years ago you ran your code in an environment you configured simply as python. The obvious default would have been Python 2. But today the obvious default is Python 3. If you deployed that same code with the same configuration, what Python environment would you expect? What would you expect that to be in 20 years’ time?
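
One way to avoid relying on a default that drifts over time is to refuse to guess. The sketch below assumes a hypothetical configuration with a `runtime` key written as `name==version`; the function name and the format are illustrative, not taken from any real tool.

```python
def resolve_runtime(config: dict[str, str]) -> str:
    """Return the pinned runtime, raising instead of falling back to a default."""
    runtime = config.get("runtime", "")
    # Split "python==3.12" into ("python", "==", "3.12").
    name, _, version = runtime.partition("==")
    if not name or not version:
        raise ValueError(
            f"runtime {runtime!r} has no explicit version; "
            "write something like 'python==3.12'"
        )
    return f"{name} {version}"


print(resolve_runtime({"runtime": "python==3.12"}))  # python 3.12

try:
    resolve_runtime({"runtime": "python"})
except ValueError as error:
    print(f"rejected: {error}")
```

With the version stated explicitly, the same configuration means the same thing whenever it is deployed.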

Agility vs stability

16 December 2023·1 min

How do you like your data?

Do you want it to be agile? So it can change at any moment, depending on the needs or wants of those producing the data? If a team decides it wants to model an object differently, with different IDs, they can do so. They are moving fast and breaking things.

The holiday code freeze

15 December 2023·1 min

It’s that time of year again where teams everywhere are considering a code freeze for the holiday season. Should we have one? How long for? What will we do while our code is frozen? Will we have lots to merge in January, and how will we manage that?

Reducing cognitive load

4 December 2023·2 mins

I wrote yesterday about how, by promoting autonomy, we can empower the users of our data platform to take on more ownership and responsibility. But if you took that message to the extreme, you wouldn’t build a data platform at all! You’d just leave them to it, to make all the decisions on how they should generate, process and consume data.
