Data teams start with accessibility
A new data team, perhaps created as a business evolves from a startup to a scaleup, often focuses on accessibility.
They need to get the data out of systems and into a data warehouse so they can do some basic reporting.
They deploy an ELT or CDC tool to copy the data from the upstream database into a data warehouse.
That makes the data accessible, in that it can now be queried by a business intelligence tool, and the team can start creating reports.
However, it has a number of problems:
- The terms used in the upstream database may mean something to that service, but not to the business (customer vs company vs organisation vs id…), so the data needs refining
- Only data that has been refined by data engineering can be used. The rest of it is potentially valuable for other use cases but remains inaccessible
- Building on top of the DB is unstable, leading to data incidents. Users may then start to lose trust in the data
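The refining problem above can be illustrated with a minimal sketch. The column names and mappings here are hypothetical, standing in for whatever service-specific terms an upstream database might use: the raw rows use the service's vocabulary, and a mapping translates them into agreed business terms, while unmapped columns stay inaccessible to downstream users.

```python
# Hypothetical mapping from upstream (service-specific) column names
# to agreed business terms. Anything not listed here is not exposed.
RAW_TO_BUSINESS = {
    "id": "customer_id",
    "organisation": "company_name",
}

def refine(raw_row: dict) -> dict:
    """Rename upstream columns to business terms, dropping unmapped ones."""
    return {
        RAW_TO_BUSINESS[key]: value
        for key, value in raw_row.items()
        if key in RAW_TO_BUSINESS
    }

row = {"id": 42, "organisation": "Acme Ltd", "internal_flag": True}
print(refine(row))  # internal_flag is dropped: unrefined data stays hidden
```

The dropped `internal_flag` column is the point: data that no one has refined may still be valuable, but under this model it simply never reaches the consumers.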
That instability makes it impossible to build critical data applications upon the data, for example those that power key business processes or data-driven product features (including AI-based features).
As the business's requirements shift from accessibility to stability, so should the data team's approach.