Data contracts are a simple concept
Hello 👋
In this week’s newsletter I write about the simple concept of data contracts, and their power.
There are also links to articles on the 7-table fallacy and measurement engineering.
Data contracts are a simple concept
At a virtual book signing earlier this week at ODSC East, I was asked:
For teams new to the concept, how would you explain data contracts in simple terms?
My answer:
A data contract is simply a human- and machine-readable document that describes the data.
That’s it!
They really are a simple concept, based on the idea that with a bit more context, we can do so much more.
For example, a simple data contract with just a schema lets us create and manage tables, as shown below:
```yaml
name: Customer
description: A customer of our e-commerce website.
version: 1
fields:
  id:
    type: string
    description: The unique identifier for the customer.
    required: true
  name:
    type: string
    description: The name of the customer.
    required: true
  email:
    type: string
    description: The email address of the customer.
    required: true
  language:
    type: string
    description: The language preference of the customer.
```
All we need to do is convert that into something an infrastructure-as-code tool can understand, and we now have a table under change management, driven by the data contract.
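As a minimal sketch of that conversion, the Python below assumes the contract has already been loaded from YAML into a dict (for example with PyYAML's `yaml.safe_load`) and targets BigQuery's JSON schema format, whose `name`, `type`, `mode` and `description` keys can be handed to an IaC tool such as Terraform's `google_bigquery_table` resource. The type mapping and the `contract_to_bigquery_schema` function are illustrative assumptions, not part of any data contract standard.

```python
import json

# Hypothetical: the Customer contract above, already parsed from YAML
# into a plain dict (e.g. with PyYAML's yaml.safe_load).
CUSTOMER_CONTRACT = {
    "name": "Customer",
    "description": "A customer of our e-commerce website.",
    "version": 1,
    "fields": {
        "id": {"type": "string", "required": True,
               "description": "The unique identifier for the customer."},
        "name": {"type": "string", "required": True,
                 "description": "The name of the customer."},
        "email": {"type": "string", "required": True,
                  "description": "The email address of the customer."},
        "language": {"type": "string",
                     "description": "The language preference of the customer."},
    },
}

# Assumed mapping from contract types to BigQuery column types; extend as needed.
TYPE_MAP = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "timestamp": "TIMESTAMP",
}


def contract_to_bigquery_schema(contract: dict) -> list:
    """Translate contract fields into BigQuery JSON schema entries."""
    schema = []
    for field_name, spec in contract["fields"].items():
        schema.append({
            "name": field_name,
            "type": TYPE_MAP[spec["type"]],
            # Fields without `required: true` become nullable columns.
            "mode": "REQUIRED" if spec.get("required", False) else "NULLABLE",
            "description": spec.get("description", ""),
        })
    return schema


if __name__ == "__main__":
    # The JSON printed here could be supplied to an IaC tool, e.g. as the
    # `schema` argument of Terraform's google_bigquery_table resource.
    print(json.dumps(contract_to_bigquery_schema(CUSTOMER_CONTRACT), indent=2))
```

Keeping the type mapping in one place means the generated schema only changes when the contract does, which is what puts the table under change management.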

Add a bit more context to the data contract, such as SLOs and/or data quality rules, and we can implement observability:
```yaml
name: Customer
description: A customer of our e-commerce website.
owner: [email protected]
version: 1
slos:
  timeliness: 1hr
fields:
  id:
    type: string
    description: The unique identifier for the customer.
    required: true
  name:
    type: string
    description: The name of the customer.
    required: true
  email:
    type: string
    description: The email address of the customer.
    pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    required: true
  language:
    type: string
    description: The language preference of the customer.
    enum: [en, fr, es]
```
Again, the implementation is simple: convert the data contract into something Great Expectations, Soda, or a similar tool can understand.
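To make the rules concrete without reproducing Great Expectations' or Soda's own APIs, here is a hand-rolled Python sketch that applies the same checks directly: required fields, the email pattern, the language enum, and the 1-hour timeliness SLO. The function names and the record format are hypothetical; in practice you would generate the equivalent checks in your tool of choice.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical: the field rules and SLO from the contract above, as parsed dicts.
FIELDS = {
    "id": {"type": "string", "required": True},
    "name": {"type": "string", "required": True},
    "email": {"type": "string", "required": True,
              "pattern": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"},
    "language": {"type": "string", "enum": ["en", "fr", "es"]},
}
TIMELINESS_SLO = timedelta(hours=1)  # slos.timeliness: 1hr


def check_record(record: dict, fields: dict) -> list:
    """Return the rule violations for one record; an empty list means it passes."""
    errors = []
    for field_name, spec in fields.items():
        value = record.get(field_name)
        if value is None:
            if spec.get("required", False):
                errors.append(f"{field_name}: missing required value")
            continue  # optional and absent, so there is nothing else to check
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field_name}: expected a string")
            continue
        if "pattern" in spec and not re.fullmatch(spec["pattern"], value):
            errors.append(f"{field_name}: does not match pattern")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field_name}: {value!r} not in {spec['enum']}")
    return errors


def is_timely(last_updated: datetime, now: datetime,
              slo: timedelta = TIMELINESS_SLO) -> bool:
    """True if the data was refreshed within the timeliness SLO."""
    return now - last_updated <= slo


if __name__ == "__main__":
    good = {"id": "c1", "name": "Ada", "email": "ada@example.com", "language": "en"}
    bad = {"id": "c2", "email": "not-an-email", "language": "de"}
    print(check_record(good, FIELDS))  # → []
    print(check_record(bad, FIELDS))   # violations for name, email and language
    now = datetime.now(timezone.utc)
    print(is_timely(now - timedelta(minutes=30), now))  # → True
```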

Add a bit more context, such as an anonymisation strategy, and you can create an anonymisation service that anonymises data to restrict access and/or once it has passed its retention period:
```yaml
name: Customer
description: A customer of our e-commerce website.
version: 1
fields:
  id:
    type: string
    description: The unique identifier for the customer.
    required: true
  name:
    type: string
    description: The name of the customer.
    required: true
    anonymisation_strategy: hex
  email:
    type: string
    description: The email address of the customer.
    required: true
    anonymisation_strategy: email
  language:
    type: string
    description: The language preference of the customer.
```
You then use this contract in a small tool that anonymises the data using the features of your data warehouse.
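A minimal sketch of such a tool, in Python, generates a single SQL UPDATE statement from the contract. It assumes BigQuery (TO_HEX, SHA256 and REGEXP_EXTRACT are BigQuery functions), and the meanings given to the hex and email strategies are assumptions, since the contract only names them: hex replaces the value with a hex-encoded SHA-256 hash, and email hashes the address but keeps its domain. The `anonymisation_sql` function is hypothetical.

```python
# Hypothetical strategy -> SQL expression templates, using BigQuery functions
# (TO_HEX, SHA256, REGEXP_EXTRACT). Other warehouses will need different SQL.
# Note that an unsalted hash is pseudonymisation rather than true anonymisation.
STRATEGIES = {
    # "hex": replace the value with its hex-encoded SHA-256 hash.
    "hex": "TO_HEX(SHA256({col}))",
    # "email": hash the full address but keep the domain after the '@'.
    "email": "CONCAT(TO_HEX(SHA256({col})), '@', REGEXP_EXTRACT({col}, r'@(.+)$'))",
}

# Hypothetical: the anonymisation contract above, parsed into a dict.
CUSTOMER_CONTRACT = {
    "name": "Customer",
    "version": 1,
    "fields": {
        "id": {"type": "string", "required": True},
        "name": {"type": "string", "required": True, "anonymisation_strategy": "hex"},
        "email": {"type": "string", "required": True, "anonymisation_strategy": "email"},
        "language": {"type": "string"},
    },
}


def anonymisation_sql(table: str, contract: dict, where: str = "TRUE") -> str:
    """Build one UPDATE covering every field that declares an anonymisation_strategy."""
    assignments = []
    for field_name, spec in contract["fields"].items():
        strategy = spec.get("anonymisation_strategy")
        if strategy is None:
            continue  # no strategy declared, so the column is left untouched
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown anonymisation_strategy {strategy!r} on {field_name}")
        assignments.append(f"{field_name} = {STRATEGIES[strategy].format(col=field_name)}")
    if not assignments:
        return ""
    # BigQuery requires a WHERE clause on UPDATE; pass a retention predicate to
    # limit which rows are anonymised, or keep the default to cover them all.
    return f"UPDATE {table}\nSET {', '.join(assignments)}\nWHERE {where}"


if __name__ == "__main__":
    print(anonymisation_sql("customer", CUSTOMER_CONTRACT))
```

Passing a retention predicate such as `created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)`, where `created_at` is a hypothetical column, would restrict the update to rows that have passed their retention period.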

The data contract remains simple, both as a concept and as a document, and yet its potential for automating the difficult parts of data creation and management is limitless.
That’s the power of the data contract.
Interesting links
The 7-Table Fallacy: Why Text-to-SQL Isn’t Enterprise AI by Timothy W. Cook
A really good read arguing that good column names alone won’t let an LLM work it out, and that semantic layers aren’t the answer either. We need context that lives with the data.
Measurement Engineering: The Part of Data Science That Will Thrive in AI by Eric Weber
It’s not just about showing the data; it’s about understanding what the numbers mean and what they can and cannot support.
It’s probably not a new job title, though. It’s just what we should be doing.
Being punny 😅
I’ve just won the ‘World’s most secretive person’ award. I can’t tell you how much it means to me.
Thanks! If you’d like to support my work…
Thanks for reading this week’s newsletter. It’s always appreciated!
If you’d like to support my work consider buying my book, Driving Data Quality with Data Contracts, or if you have it already please leave a review on Amazon.
🆕 I’ll be running my in-person workshop, Implementing a Data Mesh with Data Contracts, in June in Belgium. It will likely be my only in-person workshop this year. Do join us!
Enjoy your weekend.
Andrew