
Building defensively


I talk a lot about data quality, and how it can be improved.

And that’s because, generally, it’s garbage!

And I strongly believe that with a little bit of discipline we can do a lot better.

And that will allow us to provide much greater value to the business at lower cost.

But the quality of your data will never be perfect. Things will still break!

For example, you could have what you think is complete validation coverage of a form on your website. But someone will still find a way to input something unexpected that, somewhere, will break a data pipeline or a dashboard.

So, we should be thinking about how we can build defensively.

And we should never completely trust our inputs.

An extremely good example is Voyager 2, the space probe launched back in 1977. A few months ago, a wrong command (in other words, a bad input) was sent to it that tilted its antenna two degrees away from Earth, so the connection was lost.

However, the probe is programmed to reset its orientation multiple times each year to keep its antenna pointing at Earth. So, without any further intervention (which, of course, can be difficult when you’re 12.39 billion miles from Earth!), the problem will resolve itself and communication will be restored.

Now, clearly we don’t all need to go to those kinds of lengths in our data pipelines! But it is a great (and fun!) example of building defensively.

And that code was written nearly 50 years ago!

In theory, we’ve come a long way since then. Our technology is better, and we’re building on the lessons learned by so many people.

And yet, many pipelines are not built defensively. So with one bad input, everything stops working until someone manually fixes it by deploying new code or SQL (probably yet another COALESCE…) or changing some data.
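To make that concrete, here is a minimal Python sketch of one common defensive pattern: validate each record and set the bad ones aside in a quarantine list, along with the reason, rather than letting a single exception halt the whole run. The field names (`order_id`, `amount`), the validation rules and the `process_batch` function are all hypothetical. In a real pipeline the quarantine would more likely be a table or a dead-letter queue that someone gets alerted about.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class BatchResult:
    """Outcome of one run: rows that loaded and rows that were set aside."""
    loaded: list[dict[str, Any]] = field(default_factory=list)
    quarantined: list[tuple[dict[str, Any], str]] = field(default_factory=list)


def validate(row: dict[str, Any]) -> str | None:
    """Return a reason if the row is unusable, otherwise None.

    These rules are hypothetical examples; real rules depend on your data.
    """
    if not row.get("order_id"):
        return "missing order_id"
    amount = row.get("amount")
    try:
        value = float(amount)
    except (TypeError, ValueError):
        return f"amount is not numeric: {amount!r}"
    if value < 0:
        return "negative amount"
    return None


def process_batch(rows: list[dict[str, Any]]) -> BatchResult:
    """Load good rows and quarantine bad ones instead of failing the whole run."""
    result = BatchResult()
    for row in rows:
        reason = validate(row)
        if reason is None:
            # The row passed validation, so this conversion is safe here.
            result.loaded.append({**row, "amount": float(row["amount"])})
        else:
            # Keep the bad row and why it failed, so it can be inspected later.
            result.quarantined.append((row, reason))
    return result


if __name__ == "__main__":
    batch = [
        {"order_id": "A1", "amount": "19.99"},
        {"order_id": "", "amount": "5"},      # missing key
        {"order_id": "A3", "amount": "ten"},  # unexpected input
    ]
    result = process_batch(batch)
    print(len(result.loaded), len(result.quarantined))  # 1 2
```

The good rows still load, and the bad ones are kept for inspection instead of being silently dropped or blocking everything else.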

So, next time you’re working on an important data pipeline, consider how you can make it more defensive, so it can handle unexpected data and self-heal as best it can.
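Self-healing can take many forms. One simple approach, loosely similar in spirit to Voyager’s periodic reset, is for every run to re-read a trailing window of source data and upsert it by key. Once a bad record is corrected upstream, the next run overwrites the wrong value without anyone deploying a fix. This is a minimal Python sketch under stated assumptions: the `run` function, the `read_source` callable, the seven-day window and the record fields are all hypothetical, and a plain dict stands in for the target table.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Any, Callable

# Hypothetical reader: returns source rows whose "updated" date is in [start, end].
SourceReader = Callable[[date, date], list[dict[str, Any]]]


def run(target: dict[str, dict[str, Any]], read_source: SourceReader,
        today: date, lookback_days: int = 7) -> None:
    """Re-read a trailing window from the source and upsert each row by key.

    Because rows are overwritten by key, a value that was wrong on an earlier
    run is replaced once the source is corrected, and running the job twice
    with the same source gives the same result as running it once.
    """
    start = today - timedelta(days=lookback_days)
    for row in read_source(start, today):
        target[row["order_id"]] = row  # insert a new key or overwrite an old one


if __name__ == "__main__":
    source = [{"order_id": "A1", "amount": -500.0, "updated": date(2024, 1, 2)}]

    def read_source(start: date, end: date) -> list[dict[str, Any]]:
        return [r for r in source if start <= r["updated"] <= end]

    warehouse: dict[str, dict[str, Any]] = {}
    run(warehouse, read_source, today=date(2024, 1, 3))  # loads the bad value

    # Someone fixes the record upstream; no pipeline change is needed.
    source[0] = {"order_id": "A1", "amount": 50.0, "updated": date(2024, 1, 4)}
    run(warehouse, read_source, today=date(2024, 1, 5))
    print(warehouse["A1"]["amount"])  # 50.0
```

Because the upsert is idempotent, re-running the job is always safe, which also makes retries after a failure much less risky. The trade-off is reprocessing the same window on every run, and corrections older than the window would still need a manual backfill.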

It’s not rocket science 🙂