Investing in decentralisation
Hey! Thanks for opening another edition of my newsletter. This week I talk about the investment you need to make when decentralising.
There’s also links to posts on applying software engineering to data, Cloudflare’s support for Iceberg, and Ben Thompson’s interview with Google Cloud Platform CEO Thomas Kurian.
Finally, in case you missed it, don’t forget to check out my post on Confluent’s blog on Shifting Left: How Data Contracts Underpin People, Processes, and Technology.
Enjoy!
Investing in decentralisation
How many times has your data team gone from centralised, to decentralised, and back to centralised again?

Often the motivation to become more decentralised comes from the realisation that your data team is a bottleneck: it cannot handle the amount of work it needs to do, and is therefore slowing the organisation down.
By being decentralised you’re allowing local teams to take on more autonomy, make decisions more quickly, and use their domain expertise to enable faster and better access to data.
But if not managed well, this move to a more decentralised model can lead to a loss of control over the data, duplicated effort, and inconsistent results across different parts of the business.
So, the data team starts to move back to a more centralised model.
And so on.
Moving from one to the other every 18 months doesn’t solve any of the problems you had.
If you want to move to a decentralised model, you need to invest in it, designing your tools and processes to support the model you want to work with.
For example, you should invest in a self-serve data platform that promotes the autonomy of data producers everywhere to create and manage their own data and provide it to their consumers.
It should be easy for them to spin up the infrastructure they need to do this, with minimal friction.
You should also invest in data governance tooling that can manage data across the organisation, for example implementing data retention policies and access controls, without relying on manual compliance.
This is what data mesh calls federated computational governance. The word computational implies automated, and so you automate this governance through the data platform.
That’s going to be difficult to do without a standard way to describe the data that is used consistently across the organisation, which is where the data contract comes in, as the way to collect these descriptions in a machine-readable format through which we can automate these data governance tasks.
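As a minimal sketch of that idea, a data contract could be a small machine-readable document the platform reads to enforce governance automatically. The fields and checks below are hypothetical illustrations, not any specific data contract standard:

```python
import json
from datetime import date, timedelta

# A hypothetical data contract. Field names are illustrative only.
contract = json.loads("""
{
  "dataset": "orders",
  "owner": "checkout-team",
  "classification": "internal",
  "retention_days": 365,
  "allowed_roles": ["analyst", "data-engineer"]
}
""")

def retention_cutoff(contract: dict, today: date) -> date:
    """Earliest date the platform should retain data for this dataset."""
    return today - timedelta(days=contract["retention_days"])

def can_read(contract: dict, role: str) -> bool:
    """Access control decision driven by the contract, not manual review."""
    return role in contract["allowed_roles"]

# The platform would run checks like these on every dataset, every day,
# with no human in the loop.
print(retention_cutoff(contract, date(2025, 4, 18)))
print(can_read(contract, "analyst"))    # True
print(can_read(contract, "marketing"))  # False
```

Because the contract is structured data rather than a wiki page, the same document can drive retention jobs, access grants, and catalogue metadata consistently across every domain team.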
Then there’s the cultural side: how you invest in alignment with this model. That’s largely about how you communicate with others in the organisation, but it could also include activities such as training.
It’s these investments that make decentralisation a success.
My upcoming workshop is all about how you do this in practice, including building the self-service data platform, automating data governance, and changing your culture. Join me in Belgium by signing up here!
Interesting links
How we built a robust ecosystem for dataset development by Lavanya Aprameya at Duolingo
As the subheading states, “working with data is just another flavor of working with code”. Some great ideas here on how they are bringing software engineering best practices to data.
See also my previous post on what we would need to do to make data pipelines as dependable as software.
The 2025 State of Analytics Engineering Report by dbt Labs
This year’s dbt Labs state of analytics report is out.
Data quality and trust in data are still the most critical challenges for data teams to solve, with over 56% of respondents citing poor data quality as a challenge. Last year it was 57%, the year before it was 41%.
Maybe we need a different approach, such as addressing data quality at the source…
R2 Data Catalog: Managed Apache Iceberg tables with zero egress fees by Cloudflare
Cloudflare have launched an Apache Iceberg data catalog for R2, currently in beta.
One of R2’s big selling points is its disruptive zero egress fees, whereas other cloud providers try to lock you in by making it prohibitively expensive to get your data back out.
However, that doesn’t mean Cloudflare are always cheap: they are proposing $9 per million catalog API requests, whereas AWS Glue charges $1 per million.
Still, I use Cloudflare for some hobby projects (including this one!) and I like how they take a different approach to other cloud providers.
An Interview with Google Cloud Platform CEO Thomas Kurian About Building an Enterprise Culture by Ben Thompson
I linked previously to Ben’s interview with Snowflake CEO Sridhar Ramaswamy. This is another good one, and it’s interesting to see how bullish Ben is on the future of Google Cloud and its AI offerings.
Being punny 😅
My plan for the long weekend is to go with my partner to get a new pair of glasses. After that, we’ll see.
For those of you who have a long weekend for Easter, enjoy the time off!
Upcoming workshops
- Implementing a Data Mesh with Data Contracts - Antwerp, Belgium - June 5
- Alongside the inaugural Data Mesh Live conference, where I’ll also be speaking.
- ​Sign up here​
Thanks! If you’d like to support my work…
Thanks for reading this week’s newsletter — always appreciated!
If you’d like to support my work consider buying my book, Driving Data Quality with Data Contracts, or if you have it already please leave a review on Amazon.
Enjoy your weekend.
Andrew