Limit data divergence

Data is created by various systems and applications owned by various teams. Without attention, that naturally leads to divergent data, as each team decides how best to create and model the data it owns.

But that lack of consistency is one of the things that makes data engineering expensive and difficult. It becomes the data engineer's job to bring that divergent data back together again so it can be useful for others. Some of that data will also be missing key dimensions or be less accurate than the rest, reducing the value of the dataset as a whole.

So anything we can do to limit data divergence will reduce that cost, improving the time to insights and the value we can get from our data.

For example, say we need to track usage of features within our project, which will be used to determine where to invest (or not), how to price them, and so on. If we allow each feature to be tracked differently, it's going to take some effort to bring it all back together later.

We therefore might consider providing a single data contract specification that ensures data is only collected when it matches that structure.
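As a rough illustration, a contract like that could be as simple as a list of required fields and their types, plus a check that runs before an event is accepted. This is a minimal sketch only: the field names (`feature_name`, `user_id`, `action`, `occurred_at`) and the `contract_violations` function are hypothetical, not part of any particular data contract tool.

```python
from datetime import datetime

# Hypothetical contract for feature usage events.
# The field names and types here are illustrative assumptions.
FEATURE_USAGE_CONTRACT = {
    "feature_name": str,
    "user_id": str,
    "action": str,
    "occurred_at": str,  # expected to be an ISO 8601 timestamp
}


def contract_violations(event: dict) -> list[str]:
    """Return a list of reasons the event does not match the contract."""
    problems = []

    # Every field in the contract must be present with the right type.
    for name, expected_type in FEATURE_USAGE_CONTRACT.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )

    # Fields outside the contract are rejected, so features cannot drift apart.
    for name in sorted(event.keys() - FEATURE_USAGE_CONTRACT.keys()):
        problems.append(f"unexpected field: {name}")

    # The timestamp must be parseable, not just any string.
    occurred_at = event.get("occurred_at")
    if isinstance(occurred_at, str):
        try:
            datetime.fromisoformat(occurred_at)
        except ValueError:
            problems.append("occurred_at is not an ISO 8601 timestamp")

    return problems
```

An event would only be collected when `contract_violations(event)` returns an empty list. Anything else is rejected at source with a clear reason, instead of being cleaned up downstream.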

We could go further still by providing libraries that make it as easy as possible for software engineers to collect the tracking in a consistent way.
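Here is a sketch of what such a library might look like. Everything in it is an assumption made for illustration: the `FeatureTracker` and `FeatureUsageEvent` names, the event fields, and the `sink` callable that stands in for wherever events are actually sent (a queue, an HTTP endpoint, and so on).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class FeatureUsageEvent:
    """The one shape every feature usage event takes (hypothetical fields)."""

    feature_name: str
    user_id: str
    action: str
    occurred_at: str = field(default_factory=_utc_now)


class FeatureTracker:
    """Small helper that teams use instead of hand-building events."""

    def __init__(self, feature_name: str, sink: Callable[[str], None] = print):
        if not feature_name:
            raise ValueError("feature_name must not be empty")
        self._feature_name = feature_name
        # `sink` is a stand-in for the real transport; it receives one JSON string.
        self._sink = sink

    def track(self, user_id: str, action: str) -> FeatureUsageEvent:
        if not user_id or not action:
            raise ValueError("user_id and action must not be empty")
        # The library fills in the feature name and timestamp itself,
        # so callers cannot get them wrong or name them differently.
        event = FeatureUsageEvent(self._feature_name, user_id, action)
        self._sink(json.dumps(asdict(event)))
        return event
```

With something like this, tracking a feature is a single call such as `FeatureTracker("export").track(user_id="u1", action="clicked")`. Because the library owns the structure, every feature emits events with the same fields without each team having to think about it.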

Spending effort here is much more valuable than spending the effort later to try to clean it up. We're improving the data at source, ensuring the quality is what we need it to be, so we can quickly and confidently provide the analytics the business needs to make those critical decisions.


    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue. Guaranteed, with data contracts.