
Blogs

2019


You build it, you run it - you manage your data

·2 mins

Many tech companies follow the “you build it, you run it” mantra, and for good reason. You want to empower your teams to build high-quality services and solutions, and that means letting them feel the pain when things aren’t quite as good as they should be.

2018


Transparency as a tool

·2 mins

The revelation by British, Dutch, and American spies of systematic Russian espionage was astonishing in many ways. One notable point was how Western governments were using transparency as a tool to counter the actions of the Russian GRU: by making the evidence public, they aimed to sway global public opinion against the Russian government.

If your data is worth storing, it's worth structuring

·3 mins

When some people talk about a Data Lake (or Hadoop, or even just Big Data), they go on to say that we can store all our data, unstructured, forever, and still analyse it at any time (maybe even in real time!).

Brewing Beer

·5 mins

Right now I’m drinking my 5th homebrewed beer. It tastes fantastic!

Three brown glass bottles with black caps, each labeled with a tag that has a drawing of a wheat stalk and the text "Wheat from the Chaff," are arranged behind a glass mug filled with a frothy, light-colored beverage.

* * *

2017


Berlin

·2 mins

I recently spent a couple of nights in Berlin. Here are some notes and photos from my trip.

Go development environment with Docker

·1 min

The benefits of using Docker for your development environments are well known by now: consistency across machines, the ability to easily install databases locally, and a setup where everything is self-contained.

Apache Spark Partitioning

·2 mins

A partition is a chunk of your data that Spark processes as a unit. When loading from HDFS, Spark derives its partitions from your input files; for files smaller than an HDFS block, that often means one partition per file. This is normally fine, but there are times when you want to change the number of partitions, and Spark has a few functions to do that. The most commonly used are repartition and coalesce.
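The difference between the two can be made concrete with a toy model. The sketch below is plain Python, not Spark: it represents a dataset as a list of partitions (each a list of records) and assumes simplified semantics. Here coalesce only merges whole neighbouring partitions and never increases the count, while repartition redistributes every record round-robin, the way a full shuffle would. Spark's real implementations differ in detail (coalesce also considers data locality, for example), so the function names mirror Spark's but the code is hypothetical.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy coalesce: merge neighbouring partitions down to n, with no shuffle."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n >= len(partitions):
        # Without a shuffle, coalesce can only reduce the partition count
        return [list(p) for p in partitions]
    merged: List[List[T]] = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        # Contiguous source partitions map to the same target partition,
        # so whole partitions are glued together rather than split up
        merged[idx * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy repartition: send every record round-robin to n new partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    shuffled: List[List[T]] = [[] for _ in range(n)]
    records = (record for part in partitions for record in part)
    for k, record in enumerate(records):
        # Any record may land in any partition, which models a full shuffle
        shuffled[k % n].append(record)
    return shuffled


if __name__ == "__main__":
    parts = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]]
    print(coalesce(parts, 2))     # [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10]]
    print(repartition(parts, 5))  # [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]
```

In this model coalesce is cheap because it only removes partition boundaries, but it keeps any skew from the original layout. Repartition may move every record, which is why it corresponds to a full shuffle; in exchange it can increase the partition count and even out uneven partitions.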