Apache Spark Partitioning

25 May 2017·2 mins

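A partition is a chunk of your data that Spark processes as a single unit of work. When loading from HDFS, Spark by default creates roughly one partition per input split, which usually means one per HDFS block, so a file smaller than a block ends up as a single partition. This is normally fine, but there are times when you want to change the number of partitions, and Spark has a few functions to do that. The most commonly used are repartition and coalesce: repartition performs a full shuffle and can either increase or decrease the partition count, while coalesce avoids a shuffle by default and is meant for reducing it.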
