Delivering data products with data contracts

Hey friends!

I don’t know about you, but January really has felt like this for me!

Today we have a post on delivering data products with data contracts, plus links to a nice article on the importance of semantic integrity and capturing it in a data contract, and another on the revolution of providing data platforms with golden paths.


Delivering data products with data contracts

Many organisations are adopting the concept of data products, where data is delivered as a product for users to consume, and that product comes with enough context for the user to decide whether it meets their requirements. These products are then provided through a self-serve, marketplace-like interface so users can discover them.

When done well, this greatly increases the data available to users throughout the organisation, freeing data from its silos so it can be quickly and easily used to deliver value.

Data contracts help you deliver these data products.

Data products and data contracts

For a data product to be easy to consume and use through a self-serve interface, it needs to be described well with appropriate, up-to-date business context. This context is provided through a data contract.

A data contract is a human- and potentially machine-readable structured document that describes the data. It holds metadata (data about the data) describing the data product. That metadata can be used to populate a data marketplace, and can also be used by tools that help manage the data, such as data governance tooling.

Diagram showing the data contract describing the data product, which in this case is a table.

The data contract can hold whatever metadata you need about your data products, but will typically include:

  • The schema
  • A description of the data product
  • An owner, and how to contact them
  • What purpose(s) it can be used for, and what it shouldn’t be used for
  • Policies and rules concerning the use of the data

Many of these are business terms. They cannot be inferred from the data store and must be entered and maintained by the data product owner. That’s why the data product owner must also own the data contract associated with the data product and be responsible for keeping it up to date.
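To make that list concrete, here is a minimal sketch of a data contract as a Python dataclass. The class names, field names, and the example "orders" product are all assumptions made for illustration, not a standard contract format. The small helper shows why ownership matters: only the data product owner can fill in the business fields it checks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Field:
    """One column in the schema (hypothetical structure)."""
    name: str
    type: str
    description: str = ""


@dataclass
class DataContract:
    """Illustrative contract holding the metadata listed above."""
    name: str
    description: str
    owner: str
    owner_contact: str
    schema: list[Field]
    permitted_purposes: list[str] = field(default_factory=list)
    prohibited_purposes: list[str] = field(default_factory=list)
    policies: list[str] = field(default_factory=list)

    def missing_business_context(self) -> list[str]:
        """Return the business fields the owner has not yet filled in."""
        required = {
            "description": self.description,
            "owner": self.owner,
            "owner_contact": self.owner_contact,
            "permitted_purposes": self.permitted_purposes,
        }
        return [key for key, value in required.items() if not value]


# Example contract; every value here is made up.
orders = DataContract(
    name="orders",
    description="One row per customer order.",
    owner="Sales domain team",
    owner_contact="sales-data@example.com",
    schema=[
        Field("order_id", "string", "Unique order identifier"),
        Field("amount", "decimal", "Order total"),
    ],
    permitted_purposes=["revenue reporting"],
    prohibited_purposes=["individual customer profiling"],
    policies=["retain for 7 years"],
)

print(orders.missing_business_context())  # []
```

In practice a contract like this would more likely live in a file such as YAML or JSON so that other tools can read it; the dataclass is only a convenient way to show the shape.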

Discovering data products

To be used, data products must be discoverable. This can be done by providing a data marketplace-like interface, populated from the data contract.

The user can use the terms defined in the data contracts, and presented through the marketplace, to decide if the data product is right for them. If it is, they can request access through the marketplace and start consuming the data to deliver the business outcome they need.

Diagram showing the data marketplace being populated by the data contracts. The users are requesting access through the data marketplace

This should be a self-serve interaction as much as possible, reducing the barriers to data use for anyone across the organisation.
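As a rough illustration of that flow, the sketch below builds marketplace listings from contracts, lets a user search them by business terms, and records an access request. The contracts are plain dicts, and the function names and the rule of auto-approving only permitted purposes are assumptions of mine rather than features of any particular marketplace product.

```python
# Hypothetical contracts, reduced to plain dicts for this sketch.
contracts = [
    {
        "name": "orders",
        "description": "One row per customer order.",
        "owner": "Sales",
        "purposes": ["revenue reporting"],
    },
    {
        "name": "web_sessions",
        "description": "Website visits grouped into sessions.",
        "owner": "Marketing",
        "purposes": ["campaign analysis"],
    },
]


def build_listings(contracts):
    """Copy only the fields a marketplace user needs to judge a product."""
    keys = ("name", "description", "owner", "purposes")
    return [{key: contract[key] for key in keys} for contract in contracts]


def search(listings, term):
    """Return names of products whose name, description, or purposes mention the term."""
    term = term.lower()
    return [
        listing["name"]
        for listing in listings
        if term in listing["name"].lower()
        or term in listing["description"].lower()
        or any(term in purpose.lower() for purpose in listing["purposes"])
    ]


def request_access(listings, user, product_name, purpose):
    """Record an access request; auto-approve only purposes the contract permits."""
    listing = next((item for item in listings if item["name"] == product_name), None)
    if listing is None:
        raise KeyError(f"no data product named {product_name!r}")
    status = "approved" if purpose in listing["purposes"] else "needs owner review"
    return {"user": user, "product": product_name, "purpose": purpose, "status": status}


listings = build_listings(contracts)
print(search(listings, "revenue"))  # ['orders']
print(request_access(listings, "ana", "orders", "revenue reporting")["status"])  # approved
```

The point of the sketch is that nothing in the marketplace is typed in by hand: every listing comes from a contract, so it stays as current as the contract does.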

Requesting new data products

This concept of a data product marketplace can be extended further to facilitate the interaction between the users and the data product owners when the required data product doesn’t exist.

In this case, the user can request a new data product through the marketplace. They provide the usage model they want and the requirements they have. Again, this is largely in business terms, and may include:

  • The data they need, and why
  • The value it unlocks for the business
  • The performance, refresh rate, SLOs, etc. that are needed

There will likely be some discussion needed between the user and the domain team that owns the data product. For example, the user may want the data to be updated within 5 minutes, but achieving that might take too much time or effort from the data product owner, so they may compromise on hourly updates.
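The request and the compromise described above could be captured along these lines. This is a minimal sketch: the `ProductRequest` fields and the `negotiate_refresh` rule (agree on the requested interval if the owner can meet it, otherwise the fastest interval the owner can support) are illustrative assumptions, and a real negotiation would of course be a conversation rather than a function call.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProductRequest:
    """A user's request for a new data product, in business terms (hypothetical)."""
    requested_by: str
    data_needed: str
    business_value: str
    refresh_interval: timedelta


def negotiate_refresh(requested: timedelta, fastest_feasible: timedelta) -> timedelta:
    """Pick the requested interval if feasible, else the fastest the owner can support."""
    return max(requested, fastest_feasible)


request = ProductRequest(
    requested_by="ana",
    data_needed="Orders joined with customer country",
    business_value="Regional revenue dashboard",
    refresh_interval=timedelta(minutes=5),
)

# The owner can only support hourly updates, so that becomes the agreed interval.
agreed = negotiate_refresh(request.refresh_interval, fastest_feasible=timedelta(hours=1))
print(agreed)  # 1:00:00
```

Whatever is agreed here is exactly the kind of detail that should later be written into the data contract.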

Designing and composing data products

As part of this discovery phase, the data product owner and the requesting user should discuss the data model.

An important principle of data products is that they are composable. Data products are built as consistent and interoperable components that can be easily combined into new data products, potentially creating value far beyond the initial investment for a single data product.

Diagram showing composite data products being created from composable data products.

To design consistent and interoperable data products you need to spend some time designing the data model for each data product you build.

This design work is done by a Data Architect, who will work with the data product owner and the requesting user to design the data product, ensuring it meets their requirements while also being consistent with existing data products and across domains.

As part of that design process the Data Architect would use an enterprise data model to determine which data products already exist that can be used as a foundation, pull those entities from the model, and start composing the structure of this data product.
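A small sketch of what composing from existing products might involve is shown below. It assumes each product's schema is a simple mapping of field name to type, and it treats "interoperable" narrowly as "a shared field name always has the same type". Both the schemas and that rule are simplifications for illustration; a real enterprise data model carries far more than this.

```python
# Hypothetical schemas for two existing data products.
customers = {"customer_id": "string", "country": "string"}
orders = {"order_id": "string", "customer_id": "string", "amount": "decimal"}


def conflicting_fields(*schemas):
    """Return field names that are declared with different types across schemas."""
    seen = {}
    conflicts = set()
    for schema in schemas:
        for name, type_ in schema.items():
            # setdefault records the first type seen and returns it on later visits.
            if seen.setdefault(name, type_) != type_:
                conflicts.add(name)
    return sorted(conflicts)


def compose(*schemas):
    """Merge schemas into one composite schema, refusing inconsistent fields."""
    conflicts = conflicting_fields(*schemas)
    if conflicts:
        raise ValueError(f"inconsistent field types: {conflicts}")
    merged = {}
    for schema in schemas:
        merged.update(schema)
    return merged


print(compose(customers, orders))
# {'customer_id': 'string', 'country': 'string', 'order_id': 'string', 'amount': 'decimal'}
```

A check like `conflicting_fields` is the sort of thing a Data Architect would otherwise have to catch by eye, which is part of why consistent modelling up front pays off when products are later combined.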

The role of the Data Architect is important here. It is they who ensure these data products are well designed, make use of existing patterns and structures, and are consistent and interoperable with other data products across domains in the enterprise.

The Data Architect can iterate quickly and easily on this design and modelling phase using tools such as ER/Studio.

Delivering new data products

Once there is agreement between the user and the data product owner, the data product can be delivered, with all agreements captured and codified in the corresponding data contract as live documentation and context for current and future users.

The asset that delivers the data product can then be built from the logical data model that forms the data contract, in the same way as any other data product asset. This ensures consistency, interoperability, and data governance are well understood and accepted before delivery.

Once it is built and access has been provided, this data product becomes just like any other, ready to be used by anyone in the organisation who needs that data.
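One way to picture building the asset from the contract is generating a table definition from the contract's schema. The sketch below is only an assumption about how that might look: the function name, the schema shape, and the generic SQL type names are all hypothetical, and a real platform would likely do much more (access controls, retention, monitoring).

```python
def create_table_sql(table_name, schema):
    """Render a CREATE TABLE statement from a {column: sql_type} mapping."""
    for identifier in (table_name, *schema):
        # Guard against names that are not plain identifiers.
        if not identifier.isidentifier():
            raise ValueError(f"invalid identifier: {identifier!r}")
    columns = ",\n".join(f"    {column} {sql_type}" for column, sql_type in schema.items())
    return f"CREATE TABLE {table_name} (\n{columns}\n);"


# Hypothetical schema taken from a contract for an "orders" data product.
orders_schema = {"order_id": "VARCHAR", "amount": "DECIMAL(12, 2)"}

print(create_table_sql("orders", orders_schema))
```

Because every asset is generated the same way from its contract, each new data product starts out consistent with the others rather than relying on someone remembering the conventions.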

The requesting user can start consuming this new data product and delivering value.

The modern way to deliver data

Data products are the modern way to deliver data to your organisation. They make more of your data available, accessible, and usable, reducing the barriers to data use.

They do this while also increasing the governance of your data. By being deliberate about how you design and model the data product, and aligning with your enterprise data model, your standard governance policies are well understood and applied instantly and equally to all data products as soon as they are created.

Data marketplaces provide the discoverability for data products, surfacing them to users across the organisation with the right context, as well as facilitating the interactions between data product owners and users of data. Unlike traditional data catalogs, this marketplace isn’t simply a list of technical assets indexed from a data warehouse, but is populated with consistently described data products, managed and delivered decentrally by the domain teams who own that data.

Data contracts help deliver data products by capturing this context in a standard format, allowing data products to be dependable and interoperable components that can be easily composed into new data products. They ensure uniformity of data governance across datasets, with consistency of content, terms, and usage policies.

With data contracts we can break down the data silos in organisations where data is inconsistent and incompatible, allowing well-governed data to be more accessible, more cohesive, and more dependable than ever before.

This post is sponsored by ER/Studio.


The Hidden Danger in Data Contracts: Silent Changes and the Complexity of Meaning by Diogo Santos

Semantic integrity is important, though it can be difficult to detect automatically. There are some good suggestions here on how to do that (custom CI, a culture of communicating changes, etc.).

In the example given (changing the definition of an “active” customer), that’s something that should be well communicated. It’s probably a request coming from the exec or similar, and eventually it comes down to the data producers to make that change, but there should be some rollout plan for that.

That’s probably also true for many of these silent changes.
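As a sketch of what a custom CI check for this could look like, the code below compares two versions of a contract and flags business definitions that changed without a version bump. The contract shape (a `version` string plus a `definitions` mapping) and the example definitions of an "active" customer are assumptions of mine, not taken from the article. Note the limit: this only catches changes that are recorded in the contract, so a change made purely in pipeline code would still slip through.

```python
def semantic_changes(old, new):
    """Return names of terms whose definition text differs between two contract versions."""
    shared = old["definitions"].keys() & new["definitions"].keys()
    return sorted(
        name for name in shared if old["definitions"][name] != new["definitions"][name]
    )


def check_contract_change(old, new):
    """Return CI failure messages for definitions changed without a version bump."""
    changed = semantic_changes(old, new)
    if changed and old["version"] == new["version"]:
        return [f"definition of '{name}' changed without a version bump" for name in changed]
    return []


# Hypothetical before and after versions of the same contract.
old_contract = {
    "version": "1.0.0",
    "definitions": {"active_customer": "Purchased in the last 30 days"},
}
new_contract = {
    "version": "1.0.0",
    "definitions": {"active_customer": "Logged in within the last 90 days"},
}

print(check_contract_change(old_contract, new_contract))
# ["definition of 'active_customer' changed without a version bump"]
```

A failing check like this does not replace the rollout plan, but it does force the conversation to happen before the change ships.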

It would be interesting to have some numbers on what the most common data issues are, either on average across organisations or, probably easier, within a single organisation (for example by tracking incidents). You can then focus on using data contracts to solve those.

The Golden Path Revolution by Robert Sahlin

I strongly agree with the premise of this article, despite it seemingly using ChatGPT for much of the content/padding.

But the premise of providing self-serve capabilities through the platform is spot-on, and the examples are good.

See also my article on a contract-based data platform.


Being punny 😅

I had to drive 50 miles through ice and snow to get a computer part I needed. It was a hard drive.



Thanks! If you’d like to support my work…

Thanks for reading this week's newsletter - always appreciated!

If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or if you have it already, please leave a review on Amazon.

Enjoy your weekend.

Andrew



Andrew Jones
Author
I build data platforms that reduce risk and drive revenue.