The right level of abstraction

When building a (data) platform you end up thinking a lot about the abstractions you are providing, and the trade-offs they cause.

On one hand, you want to abstract away some details of the underlying platform to reduce the cognitive load of your users.

On the other hand, abstract away too much and your users will not have the knowledge to manage their services with autonomy. Instead, they will need to ask you about everything, and you become a bottleneck.

The solution we landed on when implementing data contracts at GoCardless was to provide an abstraction that made it easy to define a data contract, which then configured the resources needed to provide an interface for the data and the tooling needed to manage the data (our contract-based data platform).

Whilst at the same time, ensuring there were as few layers as possible below that abstraction before you were using the standard Google Cloud resources. So the data contract directly configured the BigQuery table, the Pub/Sub topic, etc, with the only layer in between being our existing infrastructure as code platform.

I don’t get it right all the time, but I’m always searching for the right level of abstraction.