Data contracts on streaming data

Data contracts are typically used to apply change management to tables in a data warehouse.

But the concepts behind data contracts can be applied to any interface where data is made available. For example, we use them for streaming data in Google Pub/Sub, and they could equally be applied to streaming data in Kafka or any other streaming platform.

These streams are configured in much the same way as a table.

First, we apply a schema to the stream that is derived from the data contract. For Pub/Sub and Kafka that likely means converting the data contract to an Avro or Protobuf schema, which should be straightforward.
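To show how simple that conversion can be, here is a minimal sketch that turns a data contract into an Avro record schema. The contract format (a `name`, a `version` and a list of `fields` with `name`, `type` and `required`), the type mapping, and the `contract_to_avro` helper are all hypothetical assumptions for illustration, not a standard data contract format or library API.

```python
import json

# Hypothetical, minimal data contract. The structure and the logical type
# names ("string", "number", "timestamp", ...) are assumptions for this sketch.
CONTRACT = {
    "name": "OrderPlaced",
    "version": "1.0.0",
    "fields": [
        {"name": "order_id", "type": "string", "required": True},
        {"name": "amount", "type": "number", "required": True},
        {"name": "placed_at", "type": "timestamp", "required": True},
        {"name": "coupon_code", "type": "string", "required": False},
    ],
}

# Assumed mapping from the contract's logical types to Avro types.
AVRO_TYPES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
    "timestamp": {"type": "long", "logicalType": "timestamp-millis"},
}


def contract_to_avro(contract: dict) -> dict:
    """Convert the (hypothetical) contract into an Avro record schema."""
    fields = []
    for field in contract["fields"]:
        avro_type = AVRO_TYPES[field["type"]]
        if field["required"]:
            fields.append({"name": field["name"], "type": avro_type})
        else:
            # Optional fields become a union with "null" listed first,
            # so that a default of null is valid for the field.
            fields.append(
                {"name": field["name"], "type": ["null", avro_type], "default": None}
            )
    return {"type": "record", "name": contract["name"], "fields": fields}


avro_schema = contract_to_avro(CONTRACT)
print(json.dumps(avro_schema, indent=2))
```

The JSON printed here could then be registered as the topic's schema, for example as an Avro schema in Pub/Sub or in a Kafka schema registry, so that messages which do not match the contract are rejected.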

Then we apply change management to that contract, so the schema can only be changed if the change is compatible (non-breaking). If it is not, we prevent the change from happening until a new major version of the contract is created.
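That change-management rule can be sketched as a small compatibility gate. This assumes the same hypothetical contract format (fields with `name`, `type` and `required`) and semantic version strings such as `1.2.0`. The rule set is a simplified assumption: adding an optional field is non-breaking, while removing a field, changing its type, making it required, or adding a new required field is treated as breaking. `breaking_changes` and `check_change` are hypothetical helpers, not part of any library.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List the changes from `old` to `new` that this sketch treats as breaking."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    problems = []
    for name, old_f in old_fields.items():
        if name not in new_fields:
            problems.append(f"field '{name}' was removed")
            continue
        new_f = new_fields[name]
        if new_f["type"] != old_f["type"]:
            problems.append(
                f"field '{name}' changed type from {old_f['type']} to {new_f['type']}"
            )
        if new_f["required"] and not old_f["required"]:
            problems.append(f"field '{name}' became required")
    for name, new_f in new_fields.items():
        # New fields are only safe if consumers and producers can ignore them.
        if name not in old_fields and new_f["required"]:
            problems.append(f"required field '{name}' was added")
    return problems


def major(version: str) -> int:
    """Major component of a semantic version string, e.g. '2.1.0' -> 2."""
    return int(version.split(".")[0])


def check_change(old: dict, new: dict) -> None:
    """Reject breaking changes unless the major version has been bumped."""
    problems = breaking_changes(old, new)
    if problems and major(new["version"]) <= major(old["version"]):
        raise ValueError(
            "breaking change requires a new major version: " + "; ".join(problems)
        )


V1 = {"name": "OrderPlaced", "version": "1.0.0",
      "fields": [{"name": "order_id", "type": "string", "required": True}]}
# Adding an optional field is compatible, so a minor version bump is enough.
V1_1 = {"name": "OrderPlaced", "version": "1.1.0",
        "fields": V1["fields"] + [{"name": "coupon_code", "type": "string", "required": False}]}
# Changing a field's type is breaking, so it needs a new major version.
V2 = {"name": "OrderPlaced", "version": "2.0.0",
      "fields": [{"name": "order_id", "type": "integer", "required": True}]}

check_change(V1, V1_1)  # allowed: non-breaking
check_change(V1, V2)    # allowed: breaking, but the major version was bumped
try:
    check_change(V1, {**V2, "version": "1.2.0"})  # blocked: breaking, same major
except ValueError as err:
    print(err)
```

Running a check like this in CI, before a contract change is merged, is one way to block breaking changes until a new major version is created. In practice a schema registry's own compatibility settings may cover some of these rules.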

And that’s really it for a minimal data contract implementation.

It’s no more difficult than implementing data contracts on a data warehouse.