CI is too late in the SDLC to identify data changes
Hey 👋 This week I write about how CI is still too late in the SDLC to prevent major data model changes.
There are also links to articles on IaC, what’s still true in data engineering, and Nike’s changes to their streaming architecture.
CI is too late in the SDLC to identify data changes
I once worked on a data team that attempted to review PRs from the software engineering teams to identify if they were going to impact any data processes.
There were a number of reasons why that didn’t work and was soon abandoned, including:
- It slowed down software engineering too much.
- The reviewers didn’t have enough context to catch issues.
- Even if an issue was caught, by that time it was too late to make large changes to the data models, so the data team still had to work around the changes ASAP to prevent pipeline breakages.
A common use of data contracts is to automate these checks, solving the first two of those problems.
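To make that concrete, here is a minimal sketch of what such an automated check might look like. It is not how any particular data contract tool works: the `ContractField` class, the `orders` contract, and the rules for what counts as a breaking change are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractField:
    """One field in a (hypothetical) data contract."""
    name: str
    type: str
    required: bool = True


def breaking_changes(contract, proposed):
    """Describe every change in `proposed` that would break consumers of `contract`."""
    proposed_by_name = {f.name: f for f in proposed}
    problems = []
    for field in contract:
        new = proposed_by_name.get(field.name)
        if new is None:
            problems.append(f"field '{field.name}' was removed")
        elif new.type != field.type:
            problems.append(
                f"field '{field.name}' changed type from {field.type} to {new.type}"
            )
        elif field.required and not new.required:
            problems.append(f"field '{field.name}' is no longer required")
    # Fields that only exist in `proposed` are treated as non-breaking additions.
    return problems


def run_check(contract, proposed):
    """Print any problems and return an exit code a CI job could pass to sys.exit()."""
    problems = breaking_changes(contract, proposed)
    for problem in problems:
        print(f"BREAKING: {problem}")
    return 1 if problems else 0


# Hypothetical contract agreed with the data team.
ORDERS_CONTRACT = [
    ContractField("order_id", "string"),
    ContractField("amount", "decimal"),
    ContractField("created_at", "timestamp"),
]

# Hypothetical schema as it would look after the PR under review.
PROPOSED_SCHEMA = [
    ContractField("order_id", "string"),
    ContractField("amount", "float"),                  # type changed
    ContractField("notes", "string", required=False),  # new optional field
    # created_at has been dropped
]

exit_code = run_check(ORDERS_CONTRACT, PROPOSED_SCHEMA)
```

A check like this needs no human reviewer and runs in seconds, which is why it deals with the speed and context problems. Notice, though, that it can only run once the change has already been written.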
But only using data contracts for CI still leaves the third problem.
The reason why CI is too late for major changes to the data models is that it is towards the end of the Software Development Life Cycle (SDLC), by which time a lot of effort and cost has been spent.

This is why when we talk about shift left, we need to be thinking much earlier in the SDLC, using data contracts to define requirements and capture agreements before the implementation and testing phases have begun.
In some ways this is more difficult, as you need to get yourself involved in the earlier phases of major projects, and that requires relationship building and other people skills, whereas CI checks are just a matter of writing and deploying some code.
But CI checks alone will never move you away from reacting to changes and towards proactively working with the business to manage the quality and dependability of data.
Interesting links
Infrastructure as Code for Data Engineer by Erfan Hesami
If you’re not too familiar with IaC then this is a good introduction.
5 Things in Data Engineering That Still Hold True After 10 Years by Ben Rogojan (a.k.a. SeattleDataGuy)
Some things never change - and for good reason.
Beyond the Data Abyss. How We Learned to Fall in Love with our Streaming Data Again by Scott Haines (Nike)
Nice writeup of using schemas (protobuf) and a schema registry to reduce API change times from months to days.
Being punny 😅
We all know about Alan Turing who cracked the Enigma code. But very few people know about his sister Kate, who provided drinks and snacks for him and his team.
Thanks! If you’d like to support my work…
Thanks for reading this week’s newsletter, always appreciated!
If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or if you have it already, please leave a review on Amazon.
Enjoy your weekend.
Andrew