
Data contracts and dbt


I was asked on LinkedIn how I might approach data contracts with dbt.

I have thought a little about how to apply data contracts to dbt targets, though not in depth, and I haven’t done this in practice yet.

One of my guiding principles with data contracts is to define them as close as possible to the code that creates the data. So for dbt, I wouldn’t want my data engineers duplicating the schemas and other metadata from dbt into a different system (i.e. our existing, code-based data contracts). That adds too much friction, and the two copies would fall out of sync.

Instead, I would extend the dbt model YAML properties to include everything needed to define the data contract, making use of what is already there and adding what is missing.
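A minimal sketch of what that could look like, assuming the contract-specific fields sit under dbt's `meta` property in a hypothetical `data_contract` key. The key name and the `owner`, `retention_days` and `pii` fields are illustrative assumptions, not an established dbt convention. The dict mirrors a model's YAML properties once parsed, and the function reuses the name, description and column types dbt already holds, taking only the missing pieces from `meta`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Parsed form of a hypothetical dbt model properties file (e.g. models/schema.yml).
# `name`, `description`, `columns`, `data_type` and `meta` are standard dbt properties;
# everything under meta["data_contract"] is an assumed extension for illustration.
MODEL_PROPERTIES = {
    "models": [
        {
            "name": "orders",
            "description": "One row per customer order.",
            "meta": {"data_contract": {"owner": "checkout-team", "retention_days": 365}},
            "columns": [
                {"name": "order_id", "data_type": "string", "description": "Primary key."},
                {
                    "name": "customer_email",
                    "data_type": "string",
                    "meta": {"data_contract": {"pii": True}},
                },
            ],
        }
    ]
}


@dataclass
class FieldSpec:
    name: str
    data_type: str
    description: str = ""
    pii: bool = False


@dataclass
class DataContract:
    name: str
    description: str
    owner: str
    retention_days: Optional[int] = None
    fields: List[FieldSpec] = field(default_factory=list)


def contract_from_model(model: dict) -> DataContract:
    """Build a contract from one dbt model entry, reusing what dbt already defines."""
    extra = model.get("meta", {}).get("data_contract", {})
    if "owner" not in extra:
        # Assumed rule: every contract needs an owner.
        raise ValueError(f"model {model['name']!r} has no data contract owner")
    fields = [
        FieldSpec(
            name=col["name"],
            data_type=col.get("data_type", "unknown"),
            description=col.get("description", ""),
            pii=col.get("meta", {}).get("data_contract", {}).get("pii", False),
        )
        for col in model.get("columns", [])
    ]
    return DataContract(
        name=model["name"],
        description=model.get("description", ""),
        owner=extra["owner"],
        retention_days=extra.get("retention_days"),
        fields=fields,
    )


contracts = [contract_from_model(m) for m in MODEL_PROPERTIES["models"]]
```

Keeping the extra fields under `meta` means the file stays valid dbt YAML, so data engineers edit one file and nothing is duplicated.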

Then, behind the scenes, the dbt-defined contracts would integrate with the existing contract-based data platform to provide the same functionality and use the same services (e.g. data retention, access controls, backups, integration with the data catalog, and so on).
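To illustrate the idea, here is a rough sketch in which every contract, whatever its source, passes through the same set of platform services. The service functions, the contract field names and the returned action strings are all hypothetical stand-ins; a real platform would call its own retention, access control and catalog systems.

```python
from typing import Callable, Dict, List

# A contract in the platform's common form. The field names are assumptions.
Contract = Dict[str, object]


def retention_policy(contract: Contract) -> str:
    # Stand-in for configuring data retention from the contract.
    return f"expire {contract['name']} after {contract['retention_days']} days"


def access_controls(contract: Contract) -> str:
    # Stand-in for restricting access to columns marked as personal data.
    pii = [f["name"] for f in contract["fields"] if f.get("pii")]
    return f"restrict {contract['name']} columns: {', '.join(pii) or 'none'}"


def catalog_entry(contract: Contract) -> str:
    # Stand-in for publishing the contract to the data catalog.
    return f"register {contract['name']} (owner: {contract['owner']}) in catalog"


# The same services run for every contract, so dbt users get no special path.
PLATFORM_SERVICES: List[Callable[[Contract], str]] = [
    retention_policy,
    access_controls,
    catalog_entry,
]


def apply_contract(contract: Contract) -> List[str]:
    """Run one contract through every platform service, in order."""
    return [service(contract) for service in PLATFORM_SERVICES]


# A contract as it might look after being derived from dbt YAML properties.
dbt_contract = {
    "name": "orders",
    "owner": "checkout-team",
    "retention_days": 365,
    "fields": [
        {"name": "order_id"},
        {"name": "customer_email", "pii": True},
    ],
}

actions = apply_contract(dbt_contract)
```

A contract defined in code would go through `apply_contract` in exactly the same way, which is what keeps the platform standardised.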

That way data engineers working with dbt define data contracts in the most natural way for them, but the platform and the data products built on them remain standardised.

[Diagram: dbt and code-based data contracts on top of the contract-driven data platform]


    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.