
Inferring rather than defining


In yesterday’s note I wrote about the problem with defaults. One response to my personal data example could be “why don’t we just infer it?”.

There are plenty of tools that try to do just that. For example, Google Cloud Sensitive Data Protection can profile all your data in BigQuery and determine whether it is personal data or not.

But these tools are never going to be perfect. And if you’re using a data contract to define your data, why not categorise it too? The result is better, and by asking data producers to do that you make it clear to them that it is important to the organisation and is part of their responsibilities.

The trade-off is that you’re asking them to do a little more, which increases their cognitive load. But data protection is so important it’s probably worth it, especially as they are probably not creating new data products every day.
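To make the idea of categorising data inside the contract a bit more concrete, here is a minimal sketch in Python. It is not any particular data contract format or tool: the `Category`, `Field` and `DataContract` names, and the `customers` example, are all hypothetical and only there for illustration. The one design choice worth noting is that `category` has no default, so the producer has to state it for every field.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Hypothetical categories; a real organisation would define its own."""
    PERSONAL = "personal"
    NON_PERSONAL = "non_personal"


@dataclass(frozen=True)
class Field:
    name: str
    data_type: str
    # No default on purpose: the data producer must categorise each field.
    category: Category


@dataclass(frozen=True)
class DataContract:
    name: str
    owner: str
    fields: tuple[Field, ...]

    def personal_fields(self) -> list[str]:
        """Names of the fields the producer has marked as personal data."""
        return [f.name for f in self.fields if f.category is Category.PERSONAL]


# An illustrative contract, written by the (hypothetical) producing team.
customers = DataContract(
    name="customers",
    owner="customer-team",
    fields=(
        Field("id", "string", Category.NON_PERSONAL),
        Field("email", "string", Category.PERSONAL),
        Field("signup_date", "date", Category.NON_PERSONAL),
    ),
)

print(customers.personal_fields())
```

Leaving the category off a field fails immediately with a `TypeError`, which is the point: the extra effort for the producer is one value per field, and nothing is left for a tool to guess.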

P.S. Sorry for the broken link the other day! This is the correct link for the data contracts webinar. It’s today at 6pm GMT, so do come and join us! And thanks to everyone who emailed me about the broken link.




Andrew Jones
Author
I build data platforms that reduce risk and drive revenue.