The cost of handoffs in data

Every handoff has a cost.

As data engineers we see this most often in the cost of handing off data. We move it from one system to another and pay for it three times: in the compute needed to move it, in the duplicated storage, and in building and maintaining the pipelines that do the moving.

Given the high cost, we should expect high value in return. But I don’t believe that’s always the case.

That’s why I’m trying to challenge the assumption that data has to be moved before value can be realised from it. Is there a way we can provide value from the data where it already is? Can we federate a query to that source system as and when needed, rather than bringing it all into a central data warehouse? Does the source system have reporting functionality that meets 90% of our needs, and if so, does the remaining 10% justify this cost?
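To make the "query it where it already is" idea concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: a SQLite file stands in for the source system, an in-memory SQLite database stands in for the warehouse, and the `orders` table and its values are made up. SQLite's `ATTACH DATABASE` is used here as a small stand-in for a real federated query engine, not as a recommendation of any particular tool.

```python
import os
import sqlite3
import tempfile

# Hypothetical source system: a SQLite file standing in for an operational database.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.db")

source = sqlite3.connect(source_path)
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (15.5,), (4.5,)])
source.commit()
source.close()

# Hypothetical "warehouse": it holds no copy of the orders table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("ATTACH DATABASE ? AS src", (source_path,))

# Query the data where it already lives; nothing is copied, no pipeline is built.
total = warehouse.execute("SELECT SUM(amount) FROM src.orders").fetchone()[0]

# List the tables stored in the warehouse itself (there should be none).
local_tables = warehouse.execute(
    "SELECT name FROM main.sqlite_master WHERE type = 'table'"
).fetchall()

print(total, local_tables)
warehouse.close()
```

The answer comes back without a copy of the data ever landing in the warehouse. Whether that trade is worth it in a real system depends on things this sketch ignores, such as the load the query puts on the source and how often the question is asked.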

Try asking yourself and your stakeholders those questions next time you’re asked to move data.

Tomorrow I’ll be writing about handoffs in a different context: between people.