Data Engineering

2024


Deliberately model your data

·1 min

If you’re creating data, you’re modelling data.

You’re making a decision on how that data is presented to your users.

Computation is the most expensive part of your data stack

·1 min

Computation is the most expensive part of your data stack.

So, if you want to become more cost-effective, you need to reduce the amount of computation it takes to produce value for the business.

Count the joins

·1 min

I enjoyed this post from Nicole Radziwill, PhD on LinkedIn:

How fragile are your pipelines? Start with this simple metric: COUNT THE JOINS. Every time you have to join, you’re making multiple assumptions about the underlying raw data, the biggest one being: you’re assuming it’s not going to change.
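As a rough illustration of that metric, here is a minimal sketch that counts JOIN keywords in a SQL string. The `count_joins` helper, the table names, and the example query are all hypothetical and are not from Nicole's post. It is a naive keyword count rather than a real SQL parser, so it will miscount if the word "join" appears inside a string literal or a quoted identifier.

```python
import re

# Hypothetical helper for illustration only: a naive keyword count, not a SQL parser.
JOIN_PATTERN = re.compile(r"\bjoin\b", re.IGNORECASE)


def count_joins(sql: str) -> int:
    """Return the number of JOIN keywords in a SQL string, ignoring comments."""
    # Remove /* block */ comments, then -- line comments,
    # so that commented-out joins are not counted.
    without_block = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    without_comments = re.sub(r"--[^\n]*", " ", without_block)
    return len(JOIN_PATTERN.findall(without_comments))


# Assumed example query; the table and column names are made up.
example_sql = """
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON c.id = o.customer_id
LEFT JOIN products p ON p.id = o.product_id
-- JOIN archived_orders a ON a.id = o.id
"""

print(count_joins(example_sql))  # 2
```

Running something like this over each query in a pipeline gives a quick, if crude, view of which steps lean on the most assumptions about upstream data.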

