
Creating platforms with data contracts


The initial idea for data contracts was to create an interface through which reliable and well-structured data could be made available to consumers. Like an API, but for data.

To create an interface we first need a description of the data - the metadata - that contains enough detail to provision the interface in our system of choice. For example, we need a schema with fields and their types, which allows us to automate the creation and management of a table in the data warehouse.
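As a rough sketch of that idea, the snippet below renders a CREATE TABLE statement from a contract's schema. The `Field` class, the `TYPE_MAP` type names and the `orders` example are all assumptions made for illustration, not the format of any particular data contract tool or warehouse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str              # logical type used in the contract, e.g. "string"
    required: bool = True


# Hypothetical mapping from contract types to warehouse column types.
TYPE_MAP = {"string": "STRING", "integer": "INT64", "timestamp": "TIMESTAMP"}


def create_table_sql(table: str, fields: list[Field]) -> str:
    """Render a CREATE TABLE statement from the schema in a data contract."""
    columns = []
    for field in fields:
        nullability = " NOT NULL" if field.required else ""
        columns.append(f"  {field.name} {TYPE_MAP[field.type]}{nullability}")
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(columns) + "\n)"


# Example schema, invented for this sketch.
orders = [
    Field("order_id", "string"),
    Field("amount", "integer"),
    Field("shipped_at", "timestamp", required=False),
]
ddl = create_table_sql("orders", orders)
print(ddl)
```

A real implementation would also need to handle schema changes on an existing table, which this sketch leaves out.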

Then I realised, if we can automate the creation and management of an interface from this metadata, what else could we automate if we had sufficient metadata?

Could we automate our backups? Yes we could. The name and schema are all we need to provide a sensible default for backups, and we can allow the backups to be optionally configured if the data owner wants to change the retention policy.
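One way to picture "sensible default, optionally configured" is a small merge of owner overrides over platform defaults. The default values, the `backup` key and the setting names here are hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical platform-wide defaults; the values are illustrative only.
DEFAULT_BACKUP = {"enabled": True, "frequency": "daily", "retention_days": 30}


def backup_config(contract: dict) -> dict:
    """Merge the data owner's optional overrides over the platform defaults."""
    overrides = contract.get("backup", {})
    unknown = set(overrides) - set(DEFAULT_BACKUP)
    if unknown:
        # Reject typos such as "retention_day" instead of silently ignoring them.
        raise ValueError(f"unknown backup settings: {sorted(unknown)}")
    return {"table": contract["name"], **DEFAULT_BACKUP, **overrides}


# A contract with no backup section gets the defaults.
plain = backup_config({"name": "orders"})

# A data owner who wants a longer retention period overrides only that value.
custom = backup_config({"name": "payments", "backup": {"retention_days": 365}})
```

The data owner writes nothing to get backups, and one line to change the retention policy.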

Could we automate our data quality checks? Yes we could. We can add those checks to the data contract and run them on the data owner’s behalf. If the contract also captures the owner of the data and a Slack channel, we can send the alerts there.
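A minimal sketch of how that might work, assuming the contract lists its checks next to an owner and a Slack channel. The check types (`unique`, `not_null`), the contract keys and the alert shape are invented for illustration, and the code only builds the alert payloads rather than calling Slack.

```python
def run_checks(contract: dict, rows: list[dict]) -> list[str]:
    """Run the checks declared in the contract and return failure messages."""
    failures = []
    for check in contract["checks"]:
        values = [row.get(check["field"]) for row in rows]
        if check["type"] == "not_null":
            ok = all(value is not None for value in values)
        elif check["type"] == "unique":
            ok = len(values) == len(set(values))
        else:
            raise ValueError(f"unsupported check type: {check['type']}")
        if not ok:
            failures.append(f"{check['type']} failed on {contract['name']}.{check['field']}")
    return failures


def build_alerts(contract: dict, failures: list[str]) -> list[dict]:
    """Address each failure to the owner's channel, as recorded in the contract."""
    return [
        {"channel": contract["slack_channel"], "text": f"{contract['owner']}: {message}"}
        for message in failures
    ]


# Example contract and data, invented for this sketch.
contract = {
    "name": "orders",
    "owner": "orders-team",
    "slack_channel": "#orders-alerts",
    "checks": [
        {"field": "order_id", "type": "unique"},
        {"field": "email", "type": "not_null"},
    ],
}
rows = [
    {"order_id": "a1", "email": "x@example.com"},
    {"order_id": "a1", "email": None},
]
failures = run_checks(contract, rows)
alerts = build_alerts(contract, failures)
```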

Could we automate the population of our data catalog? Yes we could. We could have the documentation stored in the data contract and push all data contracts to the central data catalog, providing an easy-to-use interface for discovering data. As an added benefit, because the documentation is alongside the schema, it’s much more likely to stay up to date as the schema evolves.
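To illustrate, a catalog entry could be derived directly from the contract, so the documentation and the schema come from the same place. The contract keys and the entry layout below are assumptions; a real catalog would have its own ingestion format.

```python
def catalog_entry(contract: dict) -> dict:
    """Build the record we would push to the data catalog from one contract."""
    return {
        "name": contract["name"],
        "owner": contract["owner"],
        "description": contract.get("description", ""),
        "fields": [
            {
                "name": field["name"],
                "type": field["type"],
                "description": field.get("description", ""),
            }
            for field in contract["fields"]
        ],
    }


def undocumented_fields(contract: dict) -> list[str]:
    """List fields with no description, so gaps can be flagged to the owner."""
    return [f["name"] for f in contract["fields"] if not f.get("description")]


# Example contract, invented for this sketch.
contract = {
    "name": "orders",
    "owner": "orders-team",
    "description": "One row per customer order.",
    "fields": [
        {"name": "order_id", "type": "string", "description": "Unique order identifier."},
        {"name": "amount", "type": "integer"},
    ],
}
entry = catalog_entry(contract)
missing = undocumented_fields(contract)
```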

Could we automate our data deletion and anonymisation policies? Yes we could. We’d need to categorise our data sufficiently so we know whether a field contains personally identifiable information. We’d also need the data owner to tell us the anonymisation policy for that particular field. Then we could easily write a service to automate this.
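Here is a rough sketch of such a service's core, assuming each field in the contract carries a `pii` flag and an `anonymisation` policy chosen by the data owner. Those key names and the three strategies (`hash`, `nullify`, `redact`) are hypothetical.

```python
import hashlib


def _hash(value) -> str:
    # Plain SHA-256 for illustration only; a real service would likely need a
    # salted or keyed hash, since unsalted hashes of emails can be guessed.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Hypothetical anonymisation strategies, keyed by the policy name in the contract.
STRATEGIES = {
    "hash": _hash,
    "nullify": lambda value: None,
    "redact": lambda value: "REDACTED",
}


def anonymise(contract: dict, row: dict) -> dict:
    """Return a copy of the row with each PII field anonymised per its policy."""
    result = dict(row)
    for field in contract["fields"]:
        name = field["name"]
        if field.get("pii") and result.get(name) is not None:
            result[name] = STRATEGIES[field["anonymisation"]](result[name])
    return result


# Example contract and row, invented for this sketch.
contract = {
    "name": "orders",
    "fields": [
        {"name": "order_id"},
        {"name": "email", "pii": True, "anonymisation": "hash"},
        {"name": "phone", "pii": True, "anonymisation": "nullify"},
    ],
}
row = {"order_id": "a1", "email": "x@example.com", "phone": "555-0100"}
anonymised = anonymise(contract, row)
```

The service never needs to know what an email or phone number is. It only reads the category and policy from the contract.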

Could we automate access controls? Yes we could. We have that categorisation of data already. Users should be able to access data of a certain category based on their roles.
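Sketched in code, that could be as simple as a lookup from role to permitted categories. The role names, the category names and the `category` key on each field are assumptions made for this example.

```python
# Hypothetical mapping of roles to the data categories they may read.
ROLE_CATEGORIES = {
    "analyst": {"public", "internal"},
    "support": {"public", "internal", "pii"},
}


def readable_fields(contract: dict, role: str) -> list[str]:
    """Return the fields a role may read, based on each field's category."""
    allowed = ROLE_CATEGORIES.get(role, set())  # unknown roles get no access
    return [f["name"] for f in contract["fields"] if f["category"] in allowed]


# Example contract, invented for this sketch.
contract = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "category": "public"},
        {"name": "amount", "category": "internal"},
        {"name": "email", "category": "pii"},
    ],
}
analyst_view = readable_fields(contract, "analyst")
support_view = readable_fields(contract, "support")
```

The output could then drive whatever grant mechanism the warehouse provides, such as column-level permissions or views.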

In fact, every platform capability we’ve added to the data platform since we adopted data contracts has been built on data contracts.

This is great for us. As a data platform team we are delivering features that are immediately adopted by all users and applied to all data managed with data contracts.

It’s great for our governance efforts. We know all our policies are applied to all our data, and done so automatically without the need for human intervention.

It’s also great for our users. The data owners have self-serve access to the data platform and the capabilities it provides, and they know as long as they describe their data accurately the data will be backed up, monitored, and managed in accordance with our policies.

The data owners don’t need to become experts in data retention to manually delete or anonymise the right data - they just describe the data and the tools take care of the rest.

Data contracts are a simple idea. You’re just describing your data in a standardised human- and machine-readable format.

But they’re so powerful.

Powerful enough to build an entire data platform around.


Data Contracts - AI’s new best friend

Part of the inspiration for today’s post came from a great discussion I had on the Talking AI podcast recently.

Aside from the platform chat, we also covered:

  • An introduction to data contracts and their purpose
  • Highlighting the common reliability issues that data contracts can prevent
  • Linking data contracts to data mesh and other modern data architectures
  • Exploring how data contracts help maintain governance and compliance
  • Demonstrating how data contracts ensure quality for AI-driven applications
  • Discussing platform thinking and its role in scaling data contract implementations
  • Offering practical advice on how to begin using data contracts in your organisation

That’s a lot of ground in 30 mins! But it turned out great.

Check it out here or subscribe to the podcast in your preferred player: Apple | Spotify | YouTube


Is a monetised dataset a “Data Product”?

Good discussion on how to consider the value of a data product.

Five ways to make data users happy

I’ve watched countless companies throw millions at data infrastructure, only to end up with a mess that would confuse even the most seasoned data users. People don’t know what’s available, how to access it, how to get it where they want it to be, or how to collaborate.

There’s a vastly better approach — one that focuses on people and use cases, not technology.

💯


Being punny 😅

I saw an Apple Store get robbed over the weekend. Police had to question me as an iWitness.




Thanks! If you’d like to support my work…

Thanks for reading this week’s newsletter - always appreciated!

If you’d like to support my work consider buying my book, Driving Data Quality with Data Contracts, or if you have it already please leave a review on Amazon.

Enjoy your weekend,

Andrew



    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.