If data is part of engineering, do we need data contracts?
David Jayatillake published an interesting post recently titled “We don’t need data contracts”. The title was enough to ensure it caught plenty of attention… but the subtitle more accurately describes what David is arguing for: “We need data to be part of product engineering”.
I agree!
If data applications are supporting key business processes and driving ML models that power product features, then they should be built in the same way, and with the same discipline, that product engineering teams use for their services.
And of course, data should be owned by the team who produces it.
My goal with data contracts was always to facilitate a move to this model, without having to change the organisation structure first.
That’s why my book talks mostly about that move, and much less about the technology.
Even with the perfect org structure, there is still a need for an interface to access the data.
Often that would be a table in a data warehouse, with historical data, because the people consuming this data are often using tools like dbt or SQL-based analytics tools like Looker.
And that’s the interface that can be driven by a data contract.
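To make that a little more concrete, here is a minimal sketch of what "an interface driven by a data contract" could look like. It is not taken from the book or from any particular tool: the `Field` and `DataContract` classes, the `analytics.orders` table, the `checkout-team` owner and the column names are all hypothetical, chosen only for illustration. The idea is that the producing team describes the data once, and both the warehouse table and a basic record check are derived from that description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """One column in the contract (hypothetical structure)."""
    name: str
    sql_type: str
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """A toy data contract: who owns the data and what shape it has."""
    table: str
    owner: str
    fields: Tuple[Field, ...]

    def create_table_sql(self) -> str:
        # Derive the warehouse table definition from the contract,
        # so the table consumers query always matches what was agreed.
        columns = ",\n".join(
            f"  {f.name} {f.sql_type}{' NOT NULL' if f.required else ''}"
            for f in self.fields
        )
        return f"CREATE TABLE {self.table} (\n{columns}\n);"

    def validate(self, record: Dict[str, object]) -> List[str]:
        # Return a list of problems; an empty list means the record conforms.
        errors = []
        known = {f.name for f in self.fields}
        for f in self.fields:
            if f.required and record.get(f.name) is None:
                errors.append(f"missing required field: {f.name}")
        for key in record:
            if key not in known:
                errors.append(f"unexpected field: {key}")
        return errors


# Hypothetical contract, owned by the team that produces the data.
orders_contract = DataContract(
    table="analytics.orders",
    owner="checkout-team",
    fields=(
        Field("order_id", "STRING"),
        Field("amount", "NUMERIC"),
        Field("coupon_code", "STRING", required=False),
    ),
)

print(orders_contract.create_table_sql())
print(orders_contract.validate({"order_id": "o1", "extra": 1}))
```

A real contract would carry more than this (descriptions, versioning, SLAs and so on), but even this small version shows the point: the table that dbt or Looker users query is generated from something the producing team owns, rather than being defined separately downstream.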