
Data contracts anti-pattern #1: Contracts as documentation

A data contract without impact is just documentation.

Hello 👋 Hope you’ve had a good week!

I’ve seen a lot of people succeed with data contracts, but I’ve also seen a lot of people struggle. So, over the next four weeks I’m going to write up four common anti-patterns.

Interestingly, each of them looks like progress but fails to solve the underlying problems.

I’ll kick this series off with the most common anti-pattern: Contracts as documentation.

There are also links to posts on building governance into the platform, making an impact as a platform PM, and analytical identifiers.


Contracts as documentation

This is where the data contract is written once, often in a document or spreadsheet and usually by the data team rather than the data producer, and then never revisited.

It is correct at the time of writing but quickly drifts from reality as the schema evolves in production. The named owner is an individual who has since left the company, or a team that no longer exists following a reorg. The SLO definitions are based on observations over the last few weeks, not a commitment made by the data owners.
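Both kinds of drift are easy to spot once something actually reads the contract. Here is a minimal Python sketch of such a check, assuming the contract has been loaded into a simple structure holding an owner and a column-to-type mapping, and that the live production schema and the set of currently active teams can be fetched from somewhere. `Contract`, `detect_drift` and the example values are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Hypothetical, minimal data contract: an owner plus column name -> type."""
    dataset: str
    owner: str
    columns: dict[str, str]


def detect_drift(contract: Contract, live_columns: dict[str, str],
                 active_teams: set[str]) -> list[str]:
    """Return the ways in which the contract no longer matches production."""
    problems = []
    # Columns the contract promises: are they still there, with the same type?
    for name, col_type in contract.columns.items():
        if name not in live_columns:
            problems.append(f"column '{name}' is in the contract but not in production")
        elif live_columns[name] != col_type:
            problems.append(
                f"column '{name}' is {live_columns[name]} in production, contract says {col_type}"
            )
    # Columns that appeared in production without ever being added to the contract.
    for name in sorted(live_columns.keys() - contract.columns.keys()):
        problems.append(f"column '{name}' is in production but not in the contract")
    # An owner who has left, or a team dissolved in a reorg.
    if contract.owner not in active_teams:
        problems.append(f"owner '{contract.owner}' is not an active team")
    return problems


# Example usage with made-up values
contract = Contract(
    dataset="orders",
    owner="checkout-team",
    columns={"order_id": "string", "amount": "decimal", "created_at": "timestamp"},
)
live = {"order_id": "string", "amount": "float", "currency": "string", "created_at": "timestamp"}
for problem in detect_drift(contract, live, active_teams={"payments-team"}):
    print(problem)
```

A documentation-only contract never runs anything like this, which is why the drift goes unnoticed.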

This happens because the data contract isn’t doing anything. There’s no platform tooling or automation using the contract to monitor or manage the data. The “owner” of the contract has no incentive to maintain it because nothing depends on it being accurate.

While it starts as a useful reference document, it gradually becomes a liability. It’s actually worse than no contract at all, because it gives false assurance to consumers who trust it.

Flow diagram showing why data contracts fail: "Contract written once" leads to four problems — "Schema and context drift", "No tooling", "No incentives", "No consequences" — resulting in "Worse than no contract at all..."

To prevent yourself from implementing this anti-pattern, ensure every field in the contract does something through platform tooling and automation. That could be CI checks preventing breaking schema changes, or SLOs and/or quality rules that fire alerts when they fail, or a data catalog/context layer that is populated from the contract.
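As an illustration, here is a minimal Python sketch of two of these: a CI check that rejects breaking schema changes, and a freshness SLO check that raises an alert. It assumes the contract’s schema is available as a column-to-type mapping and its freshness SLO as a maximum delay. The function names, the rule that only removed columns and changed types count as breaking, and the print-based alert are all assumptions for the sketch; a real platform would plug into its own CI runner and alerting.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract schema shape: column name -> type.
Columns = dict[str, str]


def breaking_changes(current: Columns, proposed: Columns) -> list[str]:
    """List changes in `proposed` that would break consumers of `current`.

    Removing a column or changing its type is breaking; adding a new
    column is treated as backwards compatible in this sketch.
    """
    problems = []
    for name, col_type in current.items():
        if name not in proposed:
            problems.append(f"removed column '{name}'")
        elif proposed[name] != col_type:
            problems.append(
                f"changed type of '{name}' from {col_type} to {proposed[name]}"
            )
    return problems


def ci_exit_code(current: Columns, proposed: Columns) -> int:
    """Print any breaking changes and return an exit code (non-zero fails the build)."""
    problems = breaking_changes(current, proposed)
    for problem in problems:
        print(f"BREAKING: {problem}")
    return 1 if problems else 0


def freshness_breached(last_updated: datetime, max_delay: timedelta, now: datetime) -> bool:
    """True if the data is older than the freshness SLO the owner committed to."""
    return now - last_updated > max_delay


# Example usage with made-up values
current = {"order_id": "string", "amount": "decimal"}
proposed = {"order_id": "string", "amount": "float", "currency": "string"}
print(ci_exit_code(current, proposed))  # prints the BREAKING line, then 1

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=3)
if freshness_breached(last_updated, max_delay=timedelta(hours=2), now=now):
    # Stand-in for a real alerting hook (pager, chat message, ticket).
    print("ALERT: orders breached its 2h freshness SLO")
```

Returning a non-zero exit code is what causes most CI systems to fail a build, and that failure is the kind of consequence that gives the contract owner a reason to keep it accurate.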

It’s the impact a data contract has when it is missing or incorrect that keeps it accurate and up to date. Otherwise it’s just documentation.


Why the Best Governance Is Built Into the Platform, Not Bolted On by Bjørn Broum

Great post on embedding governance into the data platform.

How to make an Impact as a Platform Product Manager by Alex Craciun

While it’s specifically about developer platforms, most of the advice also applies to anyone building data platforms.

Integration by Design: Analytical Identifiers by Patrik Lager

Good argument that we should be more deliberate about the identifiers we use.


Being punny 😅

A man tried to sell me a coffin today. I told him that’s the last thing I need.



Thanks! If you’d like to support my work…

Thanks for reading this week’s newsletter. It’s always appreciated!

If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or if you have it already, please leave a review on Amazon.

Enjoy your weekend.

Andrew


Want great, practical advice on implementing data mesh, data products and data contracts?

In my weekly newsletter I share with you an original post and links to what's new and cool in the world of data mesh, data products, and data contracts.

I also include a little pun, because why not? 😅



    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.