What makes data contracts more than a schema?
People new to data contracts often ask, “How is that different to a schema?”
And it’s a fair question, as schemas also describe data structures in a human- and machine-readable format.
But within a data contract we’re not just describing the structure. We’re defining the business context too. That context could include:
- Semantics
- Policies
- Access
- SLOs
And so on - whatever is most useful in the context of your data.
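As a minimal sketch of the idea above, here is what a contract that carries both a schema and its business context might look like in Python. Every name in it (`DataContract`, `ContractField`, `retention_days`, `allowed_roles`, `freshness_slo_minutes`, and the `orders` example) is hypothetical and chosen only for illustration. It is not taken from any particular data contract specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContractField:
    """One column: the part a plain schema would already describe."""
    name: str
    type: str
    description: str = ""   # semantics: what the value means to the business
    pii: bool = False        # policy: whether the value is personal data


@dataclass
class DataContract:
    """A schema plus the business context around it (hypothetical shape)."""
    name: str
    owner: str
    fields: List[ContractField]
    retention_days: Optional[int] = None                     # policy
    allowed_roles: List[str] = field(default_factory=list)   # access
    freshness_slo_minutes: Optional[int] = None              # SLO


orders = DataContract(
    name="orders",
    owner="checkout-team",
    fields=[
        ContractField("order_id", "STRING", "Unique ID of the order"),
        ContractField("customer_email", "STRING", "Email used at checkout", pii=True),
        ContractField("created_at", "TIMESTAMP", "When the order was placed (UTC)"),
    ],
    retention_days=365,
    allowed_roles=["analyst", "finance"],
    freshness_slo_minutes=60,
)

# The schema alone would stop at names and types; the contract also says
# which fields are personal data, who may read them, and how fresh they are.
pii_fields = [f.name for f in orders.fields if f.pii]
```

Only `name` and `type` on each field would appear in a plain schema. Everything else is the extra context a consumer needs in order to trust and use the data.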
Each of those sets expectations on the data, and it’s those expectations that build trust in the dataset and allow consumers to use that data with confidence.
Defining these in the data contract is useful for two reasons.
The first is that they act as documentation, which helps the consumer use that data effectively. And because the documentation is located where the data is being generated, it’s more likely to be kept up to date as that data evolves.
The second is that we can use this metadata to build tooling that makes use of the data contract in various places in the data platform.
Sometimes that will just be using the schema to create and manage an interface, such as a table in a data warehouse. Other times it will use the business context to automate required actions on the data, for example implementing data retention policies or applying the correct access controls to the right users.
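To make that concrete, here is a rough sketch of tooling driven by a contract. It assumes the contract has been loaded into a plain dictionary, and the keys (`fields`, `retention`, `access`) and function names are hypothetical. The SQL it produces is generic and illustrative only, as the exact syntax depends on your data warehouse.

```python
# A hypothetical contract, as it might look after loading from YAML or JSON.
contract = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "type": "STRING"},
        {"name": "customer_email", "type": "STRING"},
        {"name": "created_at", "type": "TIMESTAMP"},
    ],
    "retention": {"days": 365, "timestamp_field": "created_at"},
    "access": {"roles": ["analyst", "finance"]},
}


def create_table_sql(contract):
    """Use only the schema part of the contract to define the table."""
    columns = ", ".join(f"{f['name']} {f['type']}" for f in contract["fields"])
    return f"CREATE TABLE {contract['name']} ({columns})"


def retention_sql(contract):
    """Use the retention policy to build a statement that removes old rows."""
    table = contract["name"]
    column = contract["retention"]["timestamp_field"]
    days = contract["retention"]["days"]
    return (
        f"DELETE FROM {table} "
        f"WHERE {column} < CURRENT_TIMESTAMP - INTERVAL '{days}' DAY"
    )


def grant_sql(contract):
    """Use the access section to build one read grant per allowed role."""
    return [
        f"GRANT SELECT ON {contract['name']} TO {role}"
        for role in contract["access"]["roles"]
    ]


statements = [create_table_sql(contract), retention_sql(contract), *grant_sql(contract)]
for statement in statements:
    print(statement)
```

The first function needs nothing more than a schema. The other two only work because the contract also records the retention policy and the access rules.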
So, a schema is part of a data contract. But a data contract is much more than that.