Data contracts are simple, but powerful

Data contracts are a simple idea: you describe your data in a structured format.

For example, here is a YAML file I use in my training.

dataset: customers
owner: [email protected]
description: All active customers of our product.
version: 1

columns:
  - name: id
    description: Unique ID for each customer
    data_type: VARCHAR
    checks:
      - type: no_missing_values
      - type: no_duplicate_values
  - name: size
    description: The customer's t-shirt size
    data_type: VARCHAR
    checks:
      - type: invalid_count
        valid_values: ['S', 'M', 'L']
        must_be_less_than: 1
  - name: created
    description: The timestamp at which the customer object was created
    data_type: TIMESTAMP
  - name: distance
    description: The distance the customer is from our shop
    data_type: INTEGER
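
Because the contract is structured data, a program can read it as easily as a person can. As a rough sketch of what that makes possible, the Python below turns the column names and types into a `CREATE TABLE` statement. The `COLUMNS` list is a hand-written mirror of the YAML above, and `create_table_sql` is a hypothetical helper invented for this illustration, not part of any particular data contract tool.

```python
# Hypothetical mirror of the column names and types from the contract above.
COLUMNS = [
    {"name": "id", "data_type": "VARCHAR"},
    {"name": "size", "data_type": "VARCHAR"},
    {"name": "created", "data_type": "TIMESTAMP"},
    {"name": "distance", "data_type": "INTEGER"},
]


def create_table_sql(dataset, columns):
    """Build a CREATE TABLE statement from the contract's columns (illustrative helper)."""
    column_lines = ",\n".join(f"  {c['name']} {c['data_type']}" for c in columns)
    return f"CREATE TABLE {dataset} (\n{column_lines}\n);"


print(create_table_sql("customers", COLUMNS))
```

The table definition is derived from the contract rather than written by hand, so the two cannot drift apart.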

But this unlocks so much power, allowing us to:

  • Create a stable interface for the data, much like an API
  • Automate the running and reporting of data quality checks
  • Improve communication between those who create data and those who consume it

And more.
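
To make the automation point concrete, here is a minimal sketch of a check runner in Python. It assumes the contract has already been loaded into a dictionary (`CONTRACT` mirrors two columns of the YAML above) and that each row is a plain dictionary. The functions `run_check` and `validate` are hypothetical, and the way each check type is interpreted here is an assumption; a real data contract tool may define these checks differently.

```python
# Hypothetical, hand-written mirror of two columns from the contract above.
CONTRACT = {
    "dataset": "customers",
    "columns": [
        {
            "name": "id",
            "checks": [
                {"type": "no_missing_values"},
                {"type": "no_duplicate_values"},
            ],
        },
        {
            "name": "size",
            "checks": [
                {
                    "type": "invalid_count",
                    "valid_values": ["S", "M", "L"],
                    "must_be_less_than": 1,
                }
            ],
        },
    ],
}


def run_check(check, values):
    """Return True if one column's values pass the given check."""
    kind = check["type"]
    if kind == "no_missing_values":
        return all(v is not None for v in values)
    if kind == "no_duplicate_values":
        present = [v for v in values if v is not None]
        return len(present) == len(set(present))
    if kind == "invalid_count":
        # Assumption: missing values are not counted as invalid here.
        valid = set(check["valid_values"])
        invalid = sum(1 for v in values if v is not None and v not in valid)
        return invalid < check["must_be_less_than"]
    raise ValueError(f"Unknown check type: {kind}")


def validate(contract, rows):
    """Run every check in the contract and return a list of failure messages."""
    failures = []
    for column in contract["columns"]:
        values = [row.get(column["name"]) for row in rows]
        for check in column.get("checks", []):
            if not run_check(check, values):
                failures.append(
                    f"{contract['dataset']}.{column['name']}: {check['type']} failed"
                )
    return failures


rows = [
    {"id": "c1", "size": "M"},
    {"id": "c2", "size": "XL"},  # 'XL' is not in valid_values
    {"id": "c2", "size": "S"},   # duplicate id
]
print(validate(CONTRACT, rows))
```

The checks live in the contract and the runner only interprets them, so adding a check to a column means editing the contract, not the code.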

Data contracts are simple, but powerful.

    Andrew Jones
    Author
    I build data platforms that reduce risk and drive revenue. Guaranteed, with data contracts.