What does it mean to take responsibility?
What does it mean when we say data producers must take responsibility for their data?
More importantly, do they know what it means?
And do they know why it’s important?
It needs to be explicitly stated what their responsibility is.
For example, we might say data producers are responsible for the change management of their data, and must follow a certain process when making a breaking change to the data and/or the schema.
That could involve:
- Producing RFCs or other documentation for review
- Backfilling data from the previous version of the schema to the new
- Populating both the current and the new version of the schema for a period of time, allowing for a migration with no downtime for consumers
It’s up to the data producers exactly how to do these things. For example, a small change on a non-critical dataset may only need a migration timeline of days/weeks. A larger change to a critical dataset may have a timeline of months.
The data producers need to consider all the trade-offs in that decision, how it impacts their project, and how it impacts their consumers.
That’s what it means to take responsibility.