Validate as early as possible
It’s always best to do your data validation as early as possible as that:
- Allows them to be acted on quicker
- Alerts the team with the most context of the data/issue
- Reduces the potential of having the incorrect data in multiple systems
So with data contracts I encourage data producers to validate their data before they publish their data to a downstream service.
We can make that easier for them by providing easy to use libraries that perform the validation.
And we can make that easier for us by using existing open source tooling.
For example, we can convert the data contracts to JSON Schema, which has good support for data validation rules, and then use one of the many open source libraries to perform the validation in whatever programming language your data producers use.
Of course, this does require a small code change in the application, and then the data producer needs to handle that error in some way (Sentry, dead letter queues, etc).
Tomorrow I’ll write a suggestion for times where that may not be possible.