Inferring rather than defining

In yesterday’s note I wrote about the problem with defaults. One response to my personal data example could be “why don’t we just infer it?”.

There are plenty of tools that try to do just that. For example, Google Cloud Sensitive Data Protection can profile all your data in BigQuery and determine whether or not it is personal data.

But these tools are never going to be perfect. And if you’re using a data contract to define your data, why not categorise it too? The result is better, and by asking data producers to do it you are making it clear to them that it is important to the organisation and is part of their responsibilities.
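
As a rough sketch of what categorising in the contract could look like: the field names, the `classification` attribute and the `unclassified_fields` check below are all hypothetical, made up for illustration rather than taken from any particular data contract specification or tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Classification(Enum):
    # Hypothetical categories; a real organisation may need more.
    PERSONAL = "personal"
    NON_PERSONAL = "non_personal"


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    # None means the data producer has not categorised the field yet.
    classification: Optional[Classification] = None


def unclassified_fields(fields: List[Field]) -> List[str]:
    """Return the names of fields that have no classification set."""
    return [f.name for f in fields if f.classification is None]


# A hypothetical contract, written by the data producer.
contract = [
    Field("email", "string", Classification.PERSONAL),
    Field("order_total", "decimal", Classification.NON_PERSONAL),
    Field("delivery_notes", "string"),  # not categorised yet
]

missing = unclassified_fields(contract)
if missing:
    print("Fields needing a classification:", ", ".join(missing))
```

The point of the sketch is that nothing is inferred or defaulted: a field without a category is flagged back to the producer instead of being guessed at.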

The trade-off is that you’re asking them to do a little more, which increases their cognitive load. But data protection is so important that it’s probably worth it, especially as they are unlikely to be creating new data products every day.

P.S. Sorry for the broken link the other day! This is the correct link for the data contracts webinar. It’s today at 6pm GMT, so do come and join us! And thanks everyone who emailed me about the broken link.