
Validating data contracts downstream

I wrote yesterday about how it’s best to validate data as early as possible.

But there are times when that might not be possible, maybe because:

  • You have hundreds or thousands of data producers, and it would take a long time to migrate
  • You cannot make changes to the upstream application

In those cases you may need to implement data quality checks a bit further downstream, though still as far left as possible.

One architecture to consider is described below.

Here we have the source system write to an output stream (it doesn’t have to be a stream; it could be a database, a data lake, or whatever), and from there a validator performs the validations against the data contract.

If the data is valid it gets written to another stream, which is the one consumers will consume from.

If the data is invalid we write to a dead letter queue and send an alert to the source system owner.

It’s important the source system owner gets those alerts and still has responsibility for ensuring their data matches their data contract. We could make that easier by providing runbooks and tools to help them recover from data issues.
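Here’s a minimal sketch of what such a validator could look like, assuming Kafka topics and a data contract expressed as a JSON Schema. The topic names, contract fields, and alerting function are all hypothetical, just to illustrate the routing logic:

    import json
    from kafka import KafkaConsumer, KafkaProducer
    from jsonschema import validate, ValidationError

    # Hypothetical topic names; adjust to your environment.
    SOURCE_TOPIC = "orders.raw"
    VALID_TOPIC = "orders.validated"
    DEAD_LETTER_TOPIC = "orders.dead-letter"

    # An example data contract, expressed here as a JSON Schema.
    ORDER_CONTRACT = {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number", "minimum": 0},
        },
        "required": ["order_id", "amount"],
    }

    consumer = KafkaConsumer(
        SOURCE_TOPIC,
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def alert_source_owner(record, error):
        # Placeholder: notify the producing team (email, Slack, pager),
        # ideally with a link to a runbook for this class of failure.
        print(f"ALERT to source owner: {error}: {record}")

    for message in consumer:
        record = message.value
        try:
            validate(instance=record, schema=ORDER_CONTRACT)
            producer.send(VALID_TOPIC, record)        # consumers read from here
        except ValidationError as err:
            producer.send(DEAD_LETTER_TOPIC, record)  # quarantine invalid data
            alert_source_owner(record, err.message)

Note that valid and invalid data end up on different topics: consumers only ever see data that meets the contract, while the dead letter queue preserves the failed records so the source system owner can inspect and replay them.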

With this architecture we’re still getting many of the benefits of validating early while reducing the work needed upstream.
