
3 common data assumptions worth challenging

We don't necessarily need to be doing things as we've always done

Hello 👋

This week I propose some common data assumptions that I think are worth challenging.

There are also links to articles on data platforms not being a destination, OKRs for data platform prioritisation, and query federation.

Also, early bird pricing on my only in-person course this year ends at the end of the month. Do join me in Antwerp!


3 common data assumptions worth challenging

I like to challenge assumptions, because so often we do things how they have always been done, or how we’ve seen them done at other companies (e.g. copying FAANG), without taking a moment to think about the best solution for the problem we currently have.

Here are 3 common data assumptions I believe are worth challenging.

1. No one else cares about data quality

People do care about the quality of data when it aligns with their incentives.

There are various ways to create that alignment. For example, if you depend on quality data from an upstream producer, you can call that out as a dependency and have it tracked in whatever way inter-team dependencies are usually tracked in your organisation (e.g. OKRs).

2. You have to bring all data centrally before you can work on it

Often the first thing data people want to do when faced with a new data use case is bring the data from the source system into the data warehouse. But that’s not a small amount of work, and it has to be resourced first, before any value is delivered for that use case.

What if we can deliver that value without moving all the data first? Think how much time that would save!

I’m not saying “warehousing is dead”, but I want to question the assumption that warehousing is the only way data can be made available to the business.

For example, if we have data in a source system that could be joined with data in our CRM (e.g. Salesforce, HubSpot) to help the sales team better understand their customers, the assumed solution is to bring both those datasets into a data warehouse and build a report for the sales team.

An alternative might be to bring a thin slice of data from the source system into the CRM, which gives the sales team the data they need directly in the tool they already use.
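To make the "thin slice" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the column names, the CRM field names in FIELD_MAP, and the push callback are hypothetical, and no real CRM API is called. The point is only that the sync carries the handful of fields the sales team needs, not the whole source record.

```python
from typing import Callable, Iterable

# Hypothetical mapping from source-system columns to CRM custom fields.
# Only the fields the sales team needs are listed: this is the thin slice.
FIELD_MAP = {
    "plan": "Plan__c",
    "monthly_active_users": "Monthly_Active_Users__c",
}


def build_thin_slice(source_rows: Iterable[dict], key: str = "customer_id") -> dict[str, dict]:
    """Reduce full source records to the mapped fields, keyed by customer."""
    updates = {}
    for row in source_rows:
        updates[row[key]] = {
            crm_field: row[source_col]
            for source_col, crm_field in FIELD_MAP.items()
            if row.get(source_col) is not None  # skip missing or empty values
        }
    return updates


def sync_to_crm(updates: dict[str, dict], push: Callable[[str, dict], None]) -> int:
    """Send each non-empty update through a caller-supplied push function."""
    sent = 0
    for customer_id, fields in updates.items():
        if fields:
            push(customer_id, fields)
            sent += 1
    return sent


# Example source records (made up). Columns outside FIELD_MAP never leave the source.
source_rows = [
    {"customer_id": "c-1", "plan": "pro", "monthly_active_users": 42, "internal_notes": "n/a"},
    {"customer_id": "c-2", "plan": None, "monthly_active_users": None, "internal_notes": "n/a"},
]

pushed = []  # stands in for the CRM; a real push would call your CRM's client
updates = build_thin_slice(source_rows)
sent_count = sync_to_crm(updates, lambda cid, fields: pushed.append((cid, fields)))
```

Passing the push function in as a parameter keeps the sketch independent of any particular CRM, and makes it easy to test before wiring it up to a real one.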

3. Your problems are unique to you, and therefore you need a unique solution

Many of the problems we need to solve are common—only the data is different.

Therefore the solutions we deploy can often be built on mature, commoditised tools.

For example, I worked somewhere where someone had read about Airbnb’s feature definition DSL and decided we needed something similar, so they built it and moved all our features to it.

But we were nowhere near the scale or complexity of Airbnb, and this decision added loads of complexity and technical debt that slowed our ML teams down for years.

We could have just used dbt or similar.

With all of these assumptions, don’t challenge them just to be contrarian. Do so because you are open-minded and always learning.


The Data Platform Was Never a Place by Bjørn Broum

A true platform is a consistent set of capabilities — store, compute, observe, govern — tied together by shared contracts, controls, and semantics, regardless of where data physically resides.

Great post on reframing your data platform from a destination to capabilities.

Leveraging OKRs to Drive Platform Prioritization by Anna Bergevin

An excellent deep dive on why and how to align data platform work to OKRs to ensure you’re working on the right thing for the business, without becoming responsible for deciding what that is. Includes a system for making this work in practice.

Jack of all trades: query federation in modern OLAP databases by Nicoleta Lazar

Another alternative to moving data around is to federate. Here’s a deep dive into how Trino and StarRocks handle federation and the differences between them.


Being punny 😅

I’ve started a new business making yachts in my attic. Sails are going through the roof.


Thanks! If you’d like to support my work…

Thanks for reading this week’s newsletter, always appreciated!

If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or if you have it already, please leave a review on Amazon.

🆕 I’ll be running my in-person workshop, Implementing a Data Mesh with Data Contracts, in June in Belgium. It will likely be my only in-person workshop this year. Do join us!

Enjoy your weekend.

Andrew




    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.