CDC creates tight coupling
Many data platforms start with a change data capture (CDC) service to extract data from an organisation's transactional databases - the source of truth for their most valuable data.
The idea is that once you bring all that data into your data warehouse, you can build whatever you need on top of it.
However, everything you build this way is tightly coupled to the upstream transactional database and its internal data model, which will lead to problems as that database evolves.
This is a quote from Shopify’s article on their CDC setup:
One of the unfortunate aspects of CDC is the tight coupling of the external event consumers to the internal data model of the source. Breaking changes and data migrations can be a normal part of application evolution. By allowing external consumers to couple on these fields, the impact of breaking changes spreads beyond the single codebase into multiple consumer areas.
We’ve been living with these problems for years, treating the symptoms by deploying services such as data cataloguing, lineage, and automated testing tools.
While these tools still have a role to play, they shouldn’t be there solely to catch and alert on upstream data issues, as if those issues could never be addressed at the source.
Instead, change your upstream processes to publish data as consumable events that others can confidently build on.
Treat data as a first-class output of your service.
Guarantee it with a data contract.
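As a rough illustration of what that could look like, here is a minimal Python sketch. It assumes a hypothetical order service whose internal row has the columns `id`, `cust_ref`, `amount` and `created`; the event name, the contract fields and the `validate` helper are all invented for this example and are not taken from any particular tool. The point is only that the service owns the mapping from its internal model to a public event, and checks that event against the contract before publishing.

```python
from datetime import datetime, timezone

# Hypothetical contract: the fields and types the service promises to consumers.
ORDER_PLACED_CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "total_pence": int,
    "placed_at": str,  # ISO 8601 timestamp
}


def validate(event: dict, contract: dict) -> None:
    """Raise ValueError if the event's fields or types differ from the contract."""
    missing = contract.keys() - event.keys()
    extra = event.keys() - contract.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in contract.items():
        if not isinstance(event[field], expected):
            raise ValueError(f"{field}: expected {expected.__name__}")


def to_order_placed(row: dict) -> dict:
    """Map the internal row (free to change) to the public event (contracted)."""
    event = {
        "order_id": str(row["id"]),
        "customer_id": str(row["cust_ref"]),
        "total_pence": int(row["amount"]),
        "placed_at": row["created"].isoformat(),
    }
    validate(event, ORDER_PLACED_CONTRACT)
    return event


# Example: an internal row as the service might read it from its own database.
row = {
    "id": 42,
    "cust_ref": "c-7",
    "amount": 1999,
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
event = to_order_placed(row)
```

If the team later renames `cust_ref` in the database, only `to_order_placed` has to change; consumers keep reading `customer_id` from the event and never see the internal schema.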