The problem with defaults

Let’s say that 20 years ago you ran your code in an environment you configured simply as “python”. The obvious default then would have been Python 2. Today the obvious default is Python 3. If you deployed that same code with the same configuration today, which Python environment would you expect? And what would you expect it to be in 20 years’ time?

What about defining whether a field or a dataset contains personal data? What would be the best default? Most data is not personal data, so would you default to the most common answer, no, and save people typing it each time? But then it’s not obvious whether someone has actually made a decision or not. They may not even realise a decision needs to be made, because they are unaware the setting exists. That could easily lead to personal data not being marked as such, increasing the risk of a privacy incident.
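One way to avoid that ambiguity is to give the setting no default at all, so a definition cannot be created without an explicit answer. This is a minimal sketch in Python, assuming a hypothetical `FieldDefinition` type and `PersonalData` enum; a real data catalogue will have its own schema and values.

```python
from dataclasses import dataclass
from enum import Enum


class PersonalData(Enum):
    """Hypothetical classification values; a real catalogue may use different ones."""
    YES = "yes"
    NO = "no"


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical field definition used only to illustrate the idea."""
    name: str
    # Deliberately no default: every field must record an explicit decision.
    contains_personal_data: PersonalData


# Explicit decisions work as expected.
email = FieldDefinition("email", PersonalData.YES)
order_total = FieldDefinition("order_total", PersonalData.NO)

# Omitting the decision fails loudly instead of silently meaning "no".
try:
    FieldDefinition("postcode")  # type: ignore[call-arg]
except TypeError as err:
    print(f"Rejected: {err}")
```

Because the argument has no default, forgetting it raises a `TypeError` when the definition is created, rather than quietly recording “no”. The cost is typing one extra value per field.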

You’re making a trade-off. You’re trying to make things easier for the person configuring that environment or categorising that data, but by making the choice ambiguous you’re increasing the risk of something going wrong.

And in these examples, how often are people configuring that environment or creating new data products? Probably not that often. So how much time are you really saving? Probably very little, and almost certainly not enough to justify the extra risk.