
Stop writing data governance policies

Automate them instead

Hello 👋 I’m writing to you from Data Mesh Live in Antwerp, where I gave a full-day workshop and a well-received talk on implementing a self-service data platform. I also have lots to digest from some excellent talks, and I made many great new connections!

This week’s newsletter comes from that talk and is about scaling data governance by moving away from documents and towards automation.

There are also links to articles on data product managers, context graph architecture, and autonomous web scraping.

Enjoy!


Stop writing data governance policies (Automate them instead)

In many organisations, data governance is implemented by a data governance authority (maybe a team, maybe an individual such as a Chief Data Officer (CDO)) that publishes data standards and policies which data owners must follow.

You could argue this works OK when a central data team owns much of the data. That team can take the time to understand these policies and the reasons behind them, and invest in tooling to help manage the data.

However, in federated architectures such as data mesh, where data ownership is moved out of the data team to people across the organisation, this approach fails quickly.

These data owners have neither the time nor the wish to read all these documents, become experts in data management, or invest in their own tooling. It’s too much!

Illustration: the Data Governance Team writes standards and policies that leave data owners confused.

On the other side, the data governance authority has less visibility into how their policies are implemented, and little confidence they are being followed consistently across the organisation, even if people say they are.

To scale data governance, we need to move away from written standards and policies that humans must implement, and instead define them in a way that can be codified and then automated through the data platform.
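As a minimal sketch of what "codified" could mean, the Python below turns a written rule ("every restricted field must declare an anonymisation strategy") into a check the platform might run against every data contract. The rule itself, the function names, the hypothetical `phone` field, and representing contracts as plain dicts that loosely mirror ODCS (with `schema` as a list of objects) are all assumptions for illustration.

```python
from typing import Any

# Hypothetical rule: every field classified "restricted" must declare an
# anonymizationStrategy custom property. Contracts are plain dicts that
# loosely mirror the ODCS YAML structure (an assumption for this sketch).
Contract = dict[str, Any]


def custom_property(field: dict[str, Any], key: str) -> Any:
    """Return the value of a customProperties entry, or None if absent."""
    for entry in field.get("customProperties", []):
        if entry.get("property") == key:
            return entry.get("value")
    return None


def check_restricted_fields_have_strategy(contract: Contract) -> list[str]:
    """Return a list of violations; an empty list means the contract passes."""
    violations = []
    for obj in contract.get("schema", []):
        for field in obj.get("properties", []):
            if field.get("classification") != "restricted":
                continue
            if custom_property(field, "anonymizationStrategy") is None:
                violations.append(
                    f"{contract['name']}.{field['name']}: restricted field "
                    "has no anonymizationStrategy"
                )
    return violations


contract = {
    "name": "Customer",
    "schema": [
        {
            "name": "customer",
            "properties": [
                {"name": "name", "classification": "restricted",
                 "customProperties": [{"property": "anonymizationStrategy", "value": "hex"}]},
                # Hypothetical field with no strategy, to show a violation.
                {"name": "phone", "classification": "restricted"},
            ],
        }
    ],
}

print(check_restricted_fields_have_strategy(contract))
# ['Customer.phone: restricted field has no anonymizationStrategy']
```

Because the check returns violations rather than raising, the same function could back both a CI gate on contract changes and a periodic report for the governance authority.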

Diagram: the Data Governance Team defines the policies, the Data Platform automates them, and data owners own their data.

The automation takes its direction from the policies, then uses data contracts to find the data across the organisation and to gain the context it needs to apply the data governance policies correctly to that data.
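A rough sketch of that discovery step follows. It assumes the platform can list every published contract (here a hard-coded Python list standing in for a hypothetical contract catalogue) and selects fields by their `classification`. The `Target` type, the `find_targets` function and the `Orders` contract are illustrative only, not part of ODCS or any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Target:
    """One field that a governance policy should be applied to."""
    contract: str
    table: str
    field: str
    strategy: Optional[str]


def find_targets(contracts: list[dict[str, Any]], classification: str) -> list[Target]:
    """Walk every contract and collect fields with the given classification,
    along with the anonymisation strategy the data owner declared (if any)."""
    targets = []
    for contract in contracts:
        for table in contract.get("schema", []):
            for field in table.get("properties", []):
                if field.get("classification") != classification:
                    continue
                strategy = next(
                    (p["value"] for p in field.get("customProperties", [])
                     if p.get("property") == "anonymizationStrategy"),
                    None,
                )
                targets.append(Target(contract["name"], table["name"], field["name"], strategy))
    return targets


# Hypothetical catalogue of two contracts owned by different teams.
catalogue = [
    {"name": "Customer", "schema": [{"name": "customer", "properties": [
        {"name": "name", "classification": "restricted",
         "customProperties": [{"property": "anonymizationStrategy", "value": "hex"}]},
        {"name": "country", "classification": "public"},
    ]}]},
    {"name": "Orders", "schema": [{"name": "orders", "properties": [
        {"name": "delivery_address", "classification": "restricted"},
    ]}]},
]

for target in find_targets(catalogue, "restricted"):
    print(target)
```

The governance automation only needs the contracts, not knowledge of each team's pipelines, which is what lets it scale across data owners.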

For example, you may have a policy that customer data is deleted/anonymised after 2 years of inactivity. That is something that can be easily codified in your platform tooling.
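A minimal sketch of that retention rule in Python. It assumes each customer record carries a `last_active` timestamp and treats "2 years" as 730 days; the field name, the `Customer` type and that reading of "2 years" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical encoding of "anonymise customer data after 2 years of inactivity".
# 730 days is an assumption; a real policy might use calendar years instead.
INACTIVITY_LIMIT = timedelta(days=730)


@dataclass
class Customer:
    customer_id: str
    last_active: datetime  # timezone-aware UTC timestamp (assumed field)


def due_for_anonymisation(customers: list[Customer], now: datetime) -> list[str]:
    """Return IDs of customers whose last activity is older than the limit."""
    cutoff = now - INACTIVITY_LIMIT
    return [c.customer_id for c in customers if c.last_active < cutoff]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
customers = [
    Customer("c1", datetime(2025, 1, 15, tzinfo=timezone.utc)),  # recently active
    Customer("c2", datetime(2022, 3, 1, tzinfo=timezone.utc)),   # inactive for over 2 years
]
print(due_for_anonymisation(customers, now))  # ['c2']
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the policy deterministic and easy to test.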

Diagram: data contracts pass context to an anonymisation service, which applies the policies and writes anonymised data to the data warehouse.

The data contract then provides the context needed to apply that policy to the data, such as the data's classification and its anonymisation strategy, as shown below using the Open Data Contract Standard (ODCS).

version: 1.0.0
name: Customer
schema:
  # In ODCS v3, schema is a list of objects, each with its own properties
  - name: customer
    properties:
      - name: name
        logicalType: string
        description: The name of the customer.
        classification: restricted
        customProperties:
          - property: anonymizationStrategy
            value: hex
      - name: email
        logicalType: string
        description: The email address of the customer.
        classification: restricted
        customProperties:
          - property: anonymizationStrategy
            value: email
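To show how an anonymisation service might use that context, here is a small Python sketch that reads each field's `anonymizationStrategy` and transforms a record accordingly. ODCS does not define what `hex` and `email` mean (they are custom properties), so the sketch assumes `hex` replaces the value with a SHA-256 hex digest and `email` hashes the local part while keeping the domain. The contract's fields are inlined as a Python list rather than parsed from YAML, to avoid a third-party dependency.

```python
import hashlib
from typing import Any, Callable


def hex_strategy(value: str) -> str:
    """Assumed meaning of 'hex': replace the value with its SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def email_strategy(value: str) -> str:
    """Assumed meaning of 'email': hash the local part, keep the domain."""
    local, _, domain = value.partition("@")
    return f"{hex_strategy(local)[:12]}@{domain}"


STRATEGIES: dict[str, Callable[[str], str]] = {"hex": hex_strategy, "email": email_strategy}

# The Customer contract's fields, inlined as plain dicts.
CUSTOMER_FIELDS = [
    {"name": "name", "classification": "restricted",
     "customProperties": [{"property": "anonymizationStrategy", "value": "hex"}]},
    {"name": "email", "classification": "restricted",
     "customProperties": [{"property": "anonymizationStrategy", "value": "email"}]},
]


def anonymise(record: dict[str, Any], fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of record with each field transformed by its declared strategy."""
    result = dict(record)
    for field in fields:
        strategy = next(
            (p["value"] for p in field.get("customProperties", [])
             if p["property"] == "anonymizationStrategy"),
            None,
        )
        if strategy is None or field["name"] not in result:
            continue
        # An unknown strategy raises KeyError instead of passing data through.
        result[field["name"]] = STRATEGIES[strategy](result[field["name"]])
    return result


row = {"name": "Ada Lovelace", "email": "ada@example.com", "customer_id": 42}
print(anonymise(row, CUSTOMER_FIELDS))
```

Failing loudly on an unrecognised strategy seems the safer default for a governance control: a typo in a contract stops the job rather than leaving restricted data untouched.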

With this platform approach to data governance, the data owner can focus their time and energy on publishing data products that enable others to use their data, and the data governance authority can be confident their standards and policies are being correctly applied across the organisation.


There are not enough Data Product Managers - And that’s not the problem by Anna Bergevin, Gaëlle Seret and Juha Korpela

A great post on the emerging role of Data Product Managers and what exactly they should be doing.

Context Graphs Are a Convergence — And Convergence Needs Architecture by Kurt Cagle (on LinkedIn)

Interesting article on the architecture required for context graphs.

Autonomous web scraping with Claude Code by Max Halford

Nice, practical example of having an LLM automatically update a web scraping script as web pages change over time.


Being punny 😅

Too many authors to quote? No problem et al.




Thanks! If you’d like to support my work…

Thanks for reading this week’s newsletter. It’s always appreciated!

If you’d like to support my work, consider buying my book, Driving Data Quality with Data Contracts, or, if you already have it, please leave a review on Amazon.

Enjoy your weekend.

Andrew


Want great, practical advice on implementing data mesh, data products and data contracts?

In my weekly newsletter I share with you an original post and links to what's new and cool in the world of data mesh, data products, and data contracts.

I also include a little pun, because why not? 😅



    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.