Data contracts and dbt
I was asked on LinkedIn how I might approach data contracts with dbt.
I have given some thought to how data contracts could apply to dbt targets, though not a great deal, and I haven’t done this in practice yet.
One of my guiding principles with data contracts is that they should be defined as close as possible to the code that creates the data. So for dbt, I wouldn’t want my data engineers to duplicate the schemas and other metadata they already maintain in dbt by entering them into a different system (i.e. our existing, code-based data contracts). That adds too much friction, and the two definitions would soon fall out of sync.
Instead, I would extend the YAML properties of the dbt models to include everything needed to define the data contract, making use of what is already there (such as column names and descriptions) and adding what is not currently present.
Then, behind the scenes, those dbt-defined contracts would integrate with the existing contract-based data platform to provide the same functionality and make use of the same services (e.g. data retention, access controls, backups, integration with the data catalog, and so on).
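As a rough sketch of what that translation could look like (I haven't built this, so treat it as an illustration only): the dict below stands in for a dbt model entry after its YAML properties file has been parsed. The `name`, `description`, `columns`, `data_type` and `meta` keys are existing dbt properties, but every key under `meta` (`owner`, `retention_days`, `backup`, `pii`) is a hypothetical extension, and `DataContract`, `ContractField` and `contract_from_dbt_model` are invented names rather than part of dbt or any real platform.

```python
from dataclasses import dataclass, field

# Stand-in for one parsed entry from a dbt model properties YAML file.
# All keys under `meta` are assumptions made up for this sketch.
DBT_MODEL = {
    "name": "orders",
    "description": "One row per customer order.",
    "meta": {"owner": "checkout-team", "retention_days": 365, "backup": True},
    "columns": [
        {"name": "order_id", "data_type": "string",
         "description": "Unique order identifier."},
        {"name": "customer_email", "data_type": "string",
         "meta": {"pii": True}},
    ],
}


@dataclass
class ContractField:
    """One column of the contract, as the (hypothetical) platform sees it."""
    name: str
    data_type: str
    description: str = ""
    pii: bool = False


@dataclass
class DataContract:
    """Platform-side contract built from a dbt model definition."""
    name: str
    owner: str
    retention_days: int
    backup: bool
    fields: list[ContractField] = field(default_factory=list)


def contract_from_dbt_model(model: dict) -> DataContract:
    """Translate a parsed dbt model entry into a platform contract."""
    meta = model.get("meta", {})
    # Fail early if the contract-specific metadata has not been filled in.
    missing = [key for key in ("owner", "retention_days") if key not in meta]
    if missing:
        raise ValueError(
            f"model {model['name']!r} is missing contract metadata: {missing}"
        )
    fields = [
        ContractField(
            name=col["name"],
            data_type=col["data_type"],
            description=col.get("description", ""),
            pii=col.get("meta", {}).get("pii", False),
        )
        for col in model.get("columns", [])
    ]
    return DataContract(
        name=model["name"],
        owner=meta["owner"],
        retention_days=meta["retention_days"],
        backup=meta.get("backup", False),
        fields=fields,
    )


contract = contract_from_dbt_model(DBT_MODEL)
```

The point of the design is that the dbt file stays the single source of truth: the platform only ever sees the derived `DataContract`, so services such as retention or access control would not need to know the data came from dbt.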
That way, data engineers working with dbt define data contracts in the way that is most natural for them, while the platform and the data products built on it remain standardised.