Data contracts at Miro
Miro recently wrote up their adoption of data contracts in Data Products Reliability: The Power of Metadata.
This describes their most recent iteration of data contracts, building on their initial approach, which had already been successful:
Through dedicated efforts over multiple quarters, we achieved remarkable improvements, reducing our key pipeline’s downtime from 50% to nearly 1% today. This transformation underscores the crucial role of defining and validating data contracts, which have proven indispensable in fostering trust in our data assets.
However, they did have some challenges:
A data contract, which serves as an agreement on expectations between the producer and the consumer, must be easily accessible to the target audience. Traditionally, these contracts were defined within the engineering-owned Airflow repository through technical task-aware files, making them overly technical for many analytics data consumers to engage with effectively.
To solve those challenges, Miro has introduced two new capabilities, both of which build on their collection and use of metadata:
Data Integrity Checks: Tools such as dbt tests, data quality frameworks, and event validations now play a pivotal role. These tools deliver meticulous metadata results to a centralized platform, enhancing our ability to ensure data integrity across the board.
Metadata Management Platform: We have adopted a robust metadata management platform (DataHub Cloud by Acryl Data) that offers comprehensive insights into all assets within our ecosystem. From the initial events that capture user activity to the final Looker dashboards that make insights consumable for the business, this platform provides detailed lineage and quality information critical for maintaining data reliability.
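To make the idea a little more concrete, here is a minimal sketch of how a contract's expectations might be written in plain language, checked against data, and turned into metadata results that a central platform could collect. This is my own illustration, not Miro's implementation or DataHub's API: the `Expectation` and `CheckResult` classes, the `run_contract` function, the `user_signups` dataset, and the column names and values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Expectation:
    """One expectation on a single column, with a description consumers can read (hypothetical)."""
    column: str
    description: str
    check: Callable[[Any], bool]


@dataclass
class CheckResult:
    """The metadata a check might report to a central platform (hypothetical shape)."""
    dataset: str
    column: str
    description: str
    passed: bool
    failed_rows: int


def run_contract(dataset: str, expectations: list[Expectation], rows: list[dict]) -> list[CheckResult]:
    """Evaluate every expectation against every row and return one result per expectation."""
    results = []
    for exp in expectations:
        # Count the rows whose value for this column does not satisfy the check.
        failed = sum(1 for row in rows if not exp.check(row.get(exp.column)))
        results.append(CheckResult(dataset, exp.column, exp.description, failed == 0, failed))
    return results


# Hypothetical contract for an illustrative dataset.
expectations = [
    Expectation("user_id", "user_id is always present", lambda v: v is not None),
    Expectation("plan", "plan is one of free, team, enterprise",
                lambda v: v in {"free", "team", "enterprise"}),
]

# Illustrative rows: one is missing a user_id, one has an unexpected plan.
rows = [
    {"user_id": 1, "plan": "free"},
    {"user_id": None, "plan": "team"},
    {"user_id": 3, "plan": "trial"},
]

results = run_contract("user_signups", expectations, rows)
for r in results:
    status = "PASS" if r.passed else "FAIL"
    print(f"{status} {r.dataset}.{r.column}: {r.description} ({r.failed_rows} failing rows)")
```

The point of keeping a human-readable description next to each check is that the same result can be shown to an analyst in a catalog and acted on by an engineer in a pipeline.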
The rest of the post explains the benefits of data products and the changes they made to their development processes as part of the move to a data product approach.
I particularly like this from their conclusion:
By granting greater autonomy to data consumers in defining their priorities, we are shifting decision-making power from technical data engineers to those who directly interact with and benefit from the data. This transition necessitates making our processes and approaches more accessible to non-technical users, ensuring that our data ecosystem continues to evolve in alignment with the needs of all stakeholders. As we move forward, our focus will remain on creating a more inclusive and user-friendly environment, enabling everyone to contribute to and benefit from reliable data products.
This is a great outcome, and really shows the business benefits of moving to data products, implemented with data contracts.