Active vs passive data publishing
Passive data publishing is the norm in most organisations.
That’s where we’re using patterns like ELT (extract, load, transform) or CDC (change data capture) to extract copies of the upstream database and replicate them in a data warehouse or data lake. The data producer isn’t doing anything to facilitate this - they are passive.
Because they’re not doing anything to facilitate this, they can’t assume responsibility for this data. Nor can they be held accountable when changes they make to their database affect downstream processes.
They’re doing what they should be doing - making changes to their database to enable them to deliver product features.
They are passively involved in the flow of data - not active.
Active data publishing is where data producers are explicitly providing data to their consumers.
There’s an abstraction away from the database, and the data is provided through an interface. There’s a data contract that documents what the data is and what use cases it is designed for.
They see this production of data as part of their job. They understand the value that is provided to other parts of the business through this data and actively engage with those who consume the data to ensure they have what they need to create that value.
Because data producers are actively producing data, they do assume responsibility for that data, and they are accountable for any changes they make to the data contract that impact downstream users.
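To make the idea of an interface backed by a data contract more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names DataContract, Field, publish and breaking_changes, and the example "orders" dataset, are hypothetical and do not come from any particular data contract tool or specification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One documented field in the contract (hypothetical structure)."""
    name: str
    type: str
    description: str = ""


@dataclass(frozen=True)
class DataContract:
    """Documents what the data is, who owns it and what it is designed for."""
    name: str
    version: str
    owner: str
    intended_use: str
    fields: tuple[Field, ...]


def publish(contract: DataContract, row: dict) -> dict:
    """Map an internal database row to the contracted shape.

    Only contracted fields are exposed, so internal columns can change
    without consumers noticing. A missing contracted field is an error.
    """
    missing = [f.name for f in contract.fields if f.name not in row]
    if missing:
        raise ValueError(f"row is missing contracted fields: {missing}")
    return {f.name: row[f.name] for f in contract.fields}


def breaking_changes(old: DataContract, new: DataContract) -> list[str]:
    """List changes in `new` that could break consumers relying on `old`."""
    new_fields = {f.name: f for f in new.fields}
    problems = []
    for f in old.fields:
        if f.name not in new_fields:
            problems.append(f"field removed: {f.name}")
        elif new_fields[f.name].type != f.type:
            problems.append(
                f"type changed: {f.name} ({f.type} -> {new_fields[f.name].type})"
            )
    return problems


# Hypothetical example: the producing team proposes renaming a field.
orders_v1 = DataContract(
    name="orders",
    version="1.0.0",
    owner="checkout-team",
    intended_use="Revenue reporting",
    fields=(Field("order_id", "string"), Field("total_pence", "integer")),
)
orders_v2 = DataContract(
    name="orders",
    version="2.0.0",
    owner="checkout-team",
    intended_use="Revenue reporting",
    fields=(Field("order_id", "string"), Field("total", "decimal")),
)

published = publish(
    orders_v1, {"order_id": "o-1", "total_pence": 1250, "internal_flag": True}
)
changes = breaking_changes(orders_v1, orders_v2)
```

In this sketch the internal_flag column never reaches consumers, and the proposed rename is reported as a removed field before it ships. That gives the producer a concrete point at which to talk to downstream users, rather than those users discovering the change after their pipelines break.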
Active data publishing leads to better data quality and better results for the organisation.
Data contracts enable active data publishing.