Past performance is no guarantee of future performance

If I want to use a dataset to build a critical data application, I need to make a decision on whether it meets my requirements.

I could do that using the past performance of that dataset. For example, if I can see that it has been updated hourly for the last 30 days, I could assume it will always be updated hourly, and build that assumption into my application design.

However, past performance is no guarantee of future performance.

The dataset may have been updated hourly over the last 30 days, but if the data producers aren’t committing to that level of service (no SLOs, no on-call, and so on), then at some point that performance will degrade.

And when it does, it can have a significant impact on my data application and my users, because I had assumed the performance was better than it really was.

That’s why you need the performance to be explicitly defined in a data contract, and for the data producer to accept responsibility for that performance.

Only then can you build on the dataset with confidence.
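
As a rough illustration of what "explicitly defined" could mean in practice, here is a minimal sketch of a freshness expectation written down as data and checked in code. Everything in it is an assumption made for the example: the `FreshnessContract` class, its field names, the `orders` dataset, and the `orders-team` owner are hypothetical and not taken from any particular data contract standard or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessContract:
    """Hypothetical contract terms; the field names are illustrative only."""
    dataset: str
    owner: str                # producing team accountable for the commitment
    max_staleness: timedelta  # committed maximum time since the last update


def is_within_contract(contract: FreshnessContract,
                       last_updated: datetime,
                       now: datetime) -> bool:
    """Return True if the latest update falls inside the committed window."""
    return now - last_updated <= contract.max_staleness


# Assumed example values: an hourly update commitment owned by the producer.
contract = FreshnessContract(
    dataset="orders",
    owner="orders-team",
    max_staleness=timedelta(hours=1),
)

# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_within_contract(contract, now - timedelta(minutes=45), now)
stale = is_within_contract(contract, now - timedelta(hours=3), now)
print(fresh, stale)
```

The point of the sketch is that the hourly expectation is a stated commitment with a named owner, rather than something I inferred from 30 days of observed behaviour.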