
Data contracts are simple, but powerful


Data contracts are a simple idea: you describe your data in a structured format.

For example, here is a YAML file I use in my training.

dataset: customers
owner: [email protected]
description: All active customers of our product.
version: 1

columns:
  - name: id
    description: Unique ID for each customer
    data_type: VARCHAR
    checks:
      - type: no_missing_values
      - type: no_duplicate_values
  - name: size
    description: The customer's t-shirt size
    data_type: VARCHAR
    checks:
      - type: invalid_count
        valid_values: ['S', 'M', 'L']
        must_be_less_than: 1
  - name: created
    description: The timestamp at which the customer object was created
    data_type: TIMESTAMP
  - name: distance
    description: The distance the customer is from our shop
    data_type: INTEGER

But this simple description unlocks so much power, allowing us to:

  • Create a stable interface for the data, much like an API
  • Automate the running and reporting of data quality checks
  • Improve communication between those who create data and those who consume it

And more.
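
To make the second point concrete, here is a minimal sketch of what automating the checks could look like. It is not the tooling I use in my training, just an illustration. The `CONTRACT` dict is a hand-copied subset of the YAML above (in practice you would load the file with a YAML parser), `run_checks` is a hypothetical helper, and the meaning given to each check type is my assumption based on its name.

```python
# Hypothetical in-memory copy of part of the contract above; in practice you
# would load the YAML file with a YAML parser instead.
CONTRACT = {
    "dataset": "customers",
    "columns": [
        {
            "name": "id",
            "checks": [
                {"type": "no_missing_values"},
                {"type": "no_duplicate_values"},
            ],
        },
        {
            "name": "size",
            "checks": [
                {
                    "type": "invalid_count",
                    "valid_values": ["S", "M", "L"],
                    "must_be_less_than": 1,
                }
            ],
        },
    ],
}


def run_checks(contract, rows):
    """Return a list of human-readable failures (empty means all checks passed)."""
    failures = []
    for column in contract["columns"]:
        name = column["name"]
        values = [row.get(name) for row in rows]
        for check in column.get("checks", []):
            kind = check["type"]
            if kind == "no_missing_values":
                # Assumption: a missing value is represented as None.
                missing = sum(1 for v in values if v is None)
                if missing:
                    failures.append(f"{name}: {missing} missing value(s)")
            elif kind == "no_duplicate_values":
                present = [v for v in values if v is not None]
                duplicates = len(present) - len(set(present))
                if duplicates:
                    failures.append(f"{name}: {duplicates} duplicate value(s)")
            elif kind == "invalid_count":
                # Assumption: anything outside valid_values counts as invalid,
                # and the count must stay below must_be_less_than.
                valid = set(check["valid_values"])
                invalid = sum(1 for v in values if v not in valid)
                if invalid >= check["must_be_less_than"]:
                    failures.append(f"{name}: {invalid} invalid value(s)")
            else:
                failures.append(f"{name}: unknown check type {kind!r}")
    return failures


rows = [
    {"id": "c1", "size": "M"},
    {"id": "c2", "size": "XL"},  # not in valid_values
    {"id": "c2", "size": "S"},   # duplicate id
]
for failure in run_checks(CONTRACT, rows):
    print(failure)
```

Because the checks live in the contract rather than in the code, adding a new check to a column is a change to the YAML file, not to the pipeline that runs it.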

Data contracts are simple, but powerful.



    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue.