From data replication to data publication
Hey,
Well done on making it through February! Next week is one of my favourite days of the year: pancake day. This is my go-to recipe.
Many data engineering teams focus on replicating data from one place to another.
For example, using change data capture (CDC) to replicate a Postgres database into their data warehouse.
But why? Is replication the goal?
We’re not in the business of creating backups. We’re in the business of realising value from data.
And replication isn’t needed for that. Nor is it particularly useful.
It’s not useful to have a copy of the database’s internal models in the data warehouse. If it were, we wouldn’t spend so much time, effort, and money “cleaning” it.
To realise value from data, we don’t need data replication.
We need data publication.
We need data to be published from the upstream services to the data warehouse. Data that is designed for our use cases. Data that is immediately useful.
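To make the difference concrete, here is a minimal, hypothetical sketch. The internal `orders` row, its column names (`st`, `amt_minor`, `upd_ts`, …), the status codes and the `OrderUpdated` record are all invented for illustration; they are not from any real system. Replication would copy the internal row as-is, whereas publication means the service owners translate it into a record designed for consumers.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical row from a service's internal `orders` table, shaped for the
# service's own needs. CDC replication would copy rows like this verbatim.
internal_order_row = {
    "id": 1042,
    "cust_fk": 77,
    "st": 3,               # internal status code
    "amt_minor": 2599,     # amount in minor units (e.g. pence)
    "ccy": "GBP",
    "upd_ts": 1740700800,  # last update, epoch seconds
    "_legacy_flag": None,  # internal detail consumers shouldn't depend on
}

# Hypothetical mapping from internal codes to terms the business uses.
STATUS_NAMES = {1: "pending", 2: "paid", 3: "shipped", 4: "cancelled"}


@dataclass(frozen=True)
class OrderUpdated:
    """Hypothetical consumer-facing record, designed for downstream use cases."""
    order_id: int
    customer_id: int
    status: str
    amount: Decimal
    currency: str
    updated_at: str  # ISO 8601 timestamp in UTC


def to_published_record(row: dict) -> OrderUpdated:
    """Translate the internal model into the published interface.

    Only the service owners know what `st` or `amt_minor` really mean,
    so in this sketch the translation sits with them rather than in a
    downstream warehouse cleaning job.
    """
    return OrderUpdated(
        order_id=row["id"],
        customer_id=row["cust_fk"],
        status=STATUS_NAMES[row["st"]],
        amount=Decimal(row["amt_minor"]) / 100,  # exact decimal, not float
        currency=row["ccy"],
        updated_at=datetime.fromtimestamp(row["upd_ts"], tz=timezone.utc).isoformat(),
    )


if __name__ == "__main__":
    print(asdict(to_published_record(internal_order_row)))
```

In this sketch consumers see business terms and units, and internal columns such as `_legacy_flag` never leave the service, so they can change without breaking anyone downstream as long as the mapping is kept up to date.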
The only people able to make changes to the upstream services to enable them to publish data are the service owners.
Therefore, it must be their responsibility to publish data for other parts of the business to consume.
That doesn’t mean they’re on their own.
We can make this as easy as possible by investing in the data platform and providing enabling capabilities through data contracts.
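As one illustration of such an enabling capability, the sketch below shows a platform helper that only publishes records that honour a data contract. It is a minimal, hypothetical example using only the standard library: `ORDER_CONTRACT`, `contract_violations` and `publish` are invented names, and in practice teams often express contracts in a schema language such as JSON Schema, Avro or Protobuf rather than a hand-rolled dict.

```python
from typing import Any

# Hypothetical data contract: the fields a producer promises to publish,
# their types, and whether consumers can rely on them being present.
ORDER_CONTRACT: dict[str, dict[str, Any]] = {
    "order_id": {"type": int, "required": True},
    "customer_id": {"type": int, "required": True},
    "status": {"type": str, "required": True,
               "allowed": {"pending", "paid", "shipped", "cancelled"}},
    "currency": {"type": str, "required": True},
    "discount_code": {"type": str, "required": False},
}


def contract_violations(record: dict[str, Any],
                        contract: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable violations; an empty list means the record conforms."""
    problems = []
    for name, rule in contract.items():
        if record.get(name) is None:
            if rule["required"]:
                problems.append(f"missing required field '{name}'")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            problems.append(
                f"'{name}' should be {rule['type'].__name__}, got {type(value).__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"'{name}' has unexpected value {value!r}")
    # Fields outside the contract are rejected so internal details don't leak out.
    for name in record:
        if name not in contract:
            problems.append(f"'{name}' is not in the contract")
    return problems


def publish(record: dict[str, Any], contract: dict[str, dict[str, Any]], sink: list) -> None:
    """Hypothetical platform helper: append to `sink` only if the record honours the contract."""
    problems = contract_violations(record, contract)
    if problems:
        raise ValueError("; ".join(problems))
    sink.append(record)
```

The design choice here is that the check runs on the producer's side, before anything reaches the warehouse, so a breaking change surfaces as an error for the service owners rather than as a data quality issue for consumers.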
We also need to get the business onside, which is going to take some time and require some excellent communication skills.
But if the value we are creating from data is important enough to the business, then there’s no reason we can’t make the case that, to deliver it more quickly, more effectively, and more cheaply, we need to change how we generate data.
Moving away from data replication.
Moving to data publication.

Interesting links
10 Tips for Turning Around a Platform Team by John Cutler
Great advice if you’re in a (data) platform team struggling with endless requests and seemingly no time to move the platform forward.
Many of the tips would also help if you’re on a data team struggling with firefighting and ad-hoc requests.
Building Outside-In API Product Feedback Loops Through API Consumer Solidarity by Kin Lane
While this is about APIs, you can swap that for data and all the advice remains relevant.
Engineer to CEO in 3 years: These key lessons got me there by Alex Pettigrew
I don’t particularly want to become a CEO, but I do want to be a better leader. Some good advice on leadership here.
SQL Noir - A Detective SQL Game by Hristo Bogoev
Fun game!
Being punny
I’m having real trouble putting together a hide and seek team. It’s just too hard to find good players.
Upcoming workshops
- Data Quality: Prevention is Better Than the Cure - Virtual Meetup - March 12, 16:00 GMT / 12:00 ET / 09:00 PT
  - I’ll be speaking about data quality and data contracts at the online Data Vault User Group meetup.
  - This will probably be the last time I’ll give this particular talk, which has been well received over the last 18 months or so.
  - Sign up for free here
- Implementing a Data Mesh with Data Contracts - Antwerp, Belgium - June 5
  - Alongside the inaugural Data Mesh Live conference, where I’ll also be speaking.
  - Early Bird pricing available until April 30
  - Sign up here
Thanks! If you’d like to support my work…
If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or, if you already have it, please leave a review on Amazon.
Enjoy your weekend!
Andrew