Validate as early as possible

It’s best to validate data as early as possible, because doing so:

  • Lets issues be acted on sooner
  • Alerts the team with the most context on the data and the issue
  • Reduces the chance of incorrect data spreading to multiple systems

So, with data contracts, I encourage data producers to validate their data before publishing it to a downstream service.

We can make that easier for them by providing easy-to-use libraries that perform the validation.

And we can make that easier for us by using existing open source tooling.

For example, we can convert the data contracts to JSON Schema, which has good support for data validation rules, and then use one of the many open-source libraries to perform the validation in whatever programming language our data producers use.
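As a rough illustration, here is a minimal Python sketch of the conversion step. The contract format, the `contract_to_json_schema` function and the type mapping are all hypothetical; real data contract formats carry far more metadata (owners, descriptions, SLAs), and a real converter would handle more types and constraints.

```python
import json

# Hypothetical, simplified data contract: field name -> type and required flag.
CONTRACT = {
    "name": "customer_signed_up",
    "fields": {
        "customer_id": {"type": "string", "required": True},
        "email": {"type": "string", "required": True},
        "age": {"type": "integer", "required": False},
    },
}

# Assumed mapping from the contract's type names to JSON Schema "type" keywords.
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
}


def contract_to_json_schema(contract: dict) -> dict:
    """Convert the simplified contract above into a JSON Schema document."""
    properties = {
        name: {"type": TYPE_MAP[spec["type"]]}
        for name, spec in contract["fields"].items()
    }
    required = [
        name for name, spec in contract["fields"].items() if spec.get("required")
    ]
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": contract["name"],
        "type": "object",
        "properties": properties,
        "required": required,
        # Reject fields the contract doesn't declare (a design choice, not a requirement).
        "additionalProperties": False,
    }


if __name__ == "__main__":
    print(json.dumps(contract_to_json_schema(CONTRACT), indent=2))
```

The generated schema can then be handed to an off-the-shelf validator, for example Python's `jsonschema` package via `jsonschema.validate(instance, schema)`, or an equivalent library in the producer's own language.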

Of course, this does require a small code change in the application, and the data producer then needs to handle validation errors in some way (for example, reporting them to Sentry or routing the bad records to a dead letter queue).
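A minimal sketch of what such a producer-side helper might look like, assuming the JSON Schema has already been generated from the contract. `Publisher`, `find_errors` and the in-memory "queues" are hypothetical; `find_errors` is a deliberately simplified stand-in that only checks `required`, `type` and `additionalProperties`, whereas a real library such as Python's `jsonschema` covers the full specification.

```python
from dataclasses import dataclass, field

# Example JSON Schema, assumed to have been generated from a data contract.
SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["customer_id", "email"],
    "additionalProperties": False,
}

# JSON Schema type names -> Python types accepted for them.
_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def find_errors(record: dict, schema: dict) -> list[str]:
    """Simplified stand-in for a JSON Schema validator (not the full spec)."""
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field '{name}'")
    properties = schema.get("properties", {})
    for name, value in record.items():
        if name not in properties:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field '{name}'")
            continue
        json_type = properties[name]["type"]
        ok = isinstance(value, _PY_TYPES[json_type])
        # bool is a subclass of int in Python; JSON Schema treats them separately.
        if isinstance(value, bool) and json_type != "boolean":
            ok = False
        if not ok:
            errors.append(f"field '{name}' should be of type {json_type}")
    return errors


@dataclass
class Publisher:
    """Hypothetical helper: validate first, publish only valid records."""
    schema: dict
    published: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def publish(self, record: dict) -> bool:
        errors = find_errors(record, self.schema)
        if errors:
            # Send the bad record and its errors to a dead letter queue instead of
            # publishing it; alerting via an error tracker would also fit here.
            self.dead_letters.append({"record": record, "errors": errors})
            return False
        self.published.append(record)  # stand-in for the real downstream publish
        return True
```

Because the check happens in the producer's own code, a bad record never reaches downstream systems, and the error lands with the team that has the most context to fix it.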

Tomorrow I’ll write a suggestion for times where that may not be possible.

Andrew Jones
Author
I build data platforms that reduce risk and drive revenue.