A feature of architecture
Happy New Year! If you had a break, I hope it was a good one :)
Today I’m writing about features of architecture, and the impact they have on processes and ways of working.
Also links to articles on orchestration success in decentralised architectures, data modelling for private markets, and unifying batch and streaming.
A feature of architecture
The processes we have and the way we work together are often a feature of our architecture, for better and for worse.
For example, if our data architecture makes it as easy as possible to write data to the data lake, with no validation, no quality rules, and so on, then we'll get lots of data written, but it will be poor-quality data that is difficult and expensive to use.
Which is why many data lake architectures become data swamps.
The same is true of change data capture (CDC). If our data architecture makes it easy to extract data from operational databases without any involvement from the owners of those services, then we will have data that is tightly coupled to the database schema and no sense of ownership from the service owners.
Which is why data teams spend so much time transforming data, working around upstream data changes, and asking “who owns this data?”.
But other architectures are available.
We see this with service-to-service communication, where architecturally we use interfaces (APIs) and never access a service's database directly. Initially, service owners don't know what APIs they need to provide, so they talk to the consumers and together they agree what APIs are needed, what the structure and quality need to be, and what will be provided. They then build and own these APIs, which other teams can build upon with ease.
Here, the features of the architecture encourage collaboration between teams whose services depend on each other, ensuring that useful interfaces and data are provided at the right quality and assigning clear ownership and responsibilities.
There’s no reason why data architectures can’t be designed in a similar way, providing the right features to encourage better outcomes.
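To make that a little more concrete, here's a minimal sketch of one such feature: a write path that only accepts data matching an agreed contract with a named owner. Everything in it (DataContract, write, the orders contract and its checkout-team owner) is hypothetical and illustrative, not the API of any particular platform or tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical contract: agreed between producer and consumers, with a named owner.
@dataclass
class DataContract:
    name: str
    owner: str  # the team accountable for this data
    schema: dict[str, type]  # required field -> expected Python type
    rules: list[Callable[[dict[str, Any]], bool]] = field(default_factory=list)

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return a list of problems with the record (empty if it is valid)."""
        problems = []
        for name, expected in self.schema.items():
            if name not in record:
                problems.append(f"missing field '{name}'")
            elif not isinstance(record[name], expected):
                problems.append(f"field '{name}' should be {expected.__name__}")
        # Only run quality rules once the structure is known to be valid.
        if not problems:
            for rule in self.rules:
                if not rule(record):
                    problems.append(f"failed rule '{rule.__name__}'")
        return problems


class ContractViolation(Exception):
    pass


def write(contract: DataContract, records: list[dict[str, Any]], sink: list) -> None:
    """Append records to the sink only if every record satisfies the contract."""
    for i, record in enumerate(records):
        problems = contract.violations(record)
        if problems:
            # Reject the whole batch and point the producer at the owner.
            raise ContractViolation(
                f"{contract.name} record {i}: {'; '.join(problems)} (owner: {contract.owner})"
            )
    sink.extend(records)


def amount_is_positive(record: dict[str, Any]) -> bool:
    return record["amount"] > 0


# Hypothetical example contract.
orders = DataContract(
    name="orders",
    owner="checkout-team",
    schema={"order_id": str, "amount": float},
    rules=[amount_is_positive],
)

if __name__ == "__main__":
    lake: list = []  # stands in for the data lake
    write(orders, [{"order_id": "o-1", "amount": 12.5}], lake)
    try:
        write(orders, [{"order_id": "o-2", "amount": -3.0}], lake)
    except ContractViolation as err:
        print(err)
    print(len(lake), "record(s) written")
```

The design choice here is that validation happens before anything is written, so bad data never reaches the lake, and every rejection names the owning team. That makes the easy path the collaborative one, much like agreeing an API between services.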
Interesting links
Orchestrating Success by Oscar Ligthart and Rodrigo Loredo at Vinted
We had successfully decentralized ownership, but we had accidentally introduced fragility in the hand‑offs between teams.
That’s a common problem! Vinted’s solution involved creating a DAG Generator powered by an Asset Registry, which is interesting…
Data Modeling for Private Markets: A Field Guide by Michael New at Arcesium
A nice introduction to, and discussion of, data modelling. I particularly liked this point:
You’re not optimizing for elegance, you’re protecting against entropy.
Fluss × Iceberg (Part 1): Why Your Lakehouse Isn’t a Streamhouse Yet by Mehul Batra and Luo Yuxia
Unifying batch and streaming has been a challenge for decades. Apache Fluss could be the solution.
Being punny 😅
I played a U2-themed board game called "Bonopoly". It's like Monopoly, but where the streets have no name.
Thanks! If you’d like to support my work…
Thanks for reading this week's newsletter, always appreciated!
If you'd like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or if you have it already, please leave a review on Amazon.
Also check out my self-paced course on Implementing Data Contracts.
Enjoy your weekend.
Andrew