Data contracts are simple, but powerful
Data contracts are a simple idea: you describe your data in a structured format.
For example, here is a YAML file I use in my training.
dataset: customers
owner: [email protected]
description: All active customers of our product.
version: 1
columns:
  - name: id
    description: Unique ID for each customer
    data_type: VARCHAR
    checks:
      - type: no_missing_values
      - type: no_duplicate_values
  - name: size
    description: The customer's t-shirt size
    data_type: VARCHAR
    checks:
      - type: invalid_count
        valid_values: ['S', 'M', 'L']
        must_be_less_than: 1
  - name: created
    description: The timestamp at which the customer object was created
    data_type: TIMESTAMP
  - name: distance
    description: The distance the customer is from our shop
    data_type: INTEGER
But this unlocks so much power, allowing us to:
- Create a stable interface for the data, much like an API
- Automate the running and reporting of data quality checks
- Improve communication between those who create data and those who consume it
And more.
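To make the second point concrete, here is a minimal Python sketch of how the checks in a contract like the one above could be run automatically. It is an illustration under assumptions, not how any particular data contract tool works: the contract is copied by hand into a dict rather than parsed from YAML, the sample rows are made up, the function names `run_check` and `run_contract` are hypothetical, and the meaning given to each check type is my own reading of its name.

```python
from collections import Counter

# Hypothetical in-memory copy of part of the contract above.
# In practice this would be loaded from the YAML file.
CONTRACT = {
    "dataset": "customers",
    "columns": [
        {
            "name": "id",
            "checks": [
                {"type": "no_missing_values"},
                {"type": "no_duplicate_values"},
            ],
        },
        {
            "name": "size",
            "checks": [
                {
                    "type": "invalid_count",
                    "valid_values": ["S", "M", "L"],
                    "must_be_less_than": 1,
                }
            ],
        },
    ],
}


def run_check(column, check, rows):
    """Return True if the check passes for the given column (assumed semantics)."""
    values = [row.get(column) for row in rows]
    kind = check["type"]
    if kind == "no_missing_values":
        return all(v is not None for v in values)
    if kind == "no_duplicate_values":
        counts = Counter(v for v in values if v is not None)
        return all(n == 1 for n in counts.values())
    if kind == "invalid_count":
        valid = set(check["valid_values"])
        invalid = sum(1 for v in values if v is not None and v not in valid)
        return invalid < check["must_be_less_than"]
    raise ValueError(f"Unsupported check type: {kind}")


def run_contract(contract, rows):
    """Run every check in the contract and return (column, check type, passed) tuples."""
    results = []
    for column in contract["columns"]:
        for check in column.get("checks", []):
            passed = run_check(column["name"], check, rows)
            results.append((column["name"], check["type"], passed))
    return results


# Made-up sample data: one duplicate id and one invalid size.
rows = [
    {"id": "c1", "size": "M"},
    {"id": "c2", "size": "XL"},
    {"id": "c2", "size": "S"},
]

report = run_contract(CONTRACT, rows)
for column, check_type, passed in report:
    print(f"{column}.{check_type}: {'PASS' if passed else 'FAIL'}")
```

Because the checks live in the contract rather than in the code, adding a new column or rule means editing the contract only; the runner stays the same.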
Data contracts are simple, but powerful.