Apache Spark Partitioning
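A partition is a chunk of your data that Spark processes as a single task. When loading from HDFS, Spark derives the initial partitions from the input files and their blocks: with the RDD API each file yields at least one partition, and a large file yields roughly one per HDFS block. This is normally fine, but there are times when you want to change the number of partitions, for example when many small files produce lots of tiny partitions or when a few large partitions leave executors idle. Spark has a few functions to do that. The most commonly used are repartition and coalesce: repartition performs a full shuffle and can increase or decrease the partition count, while coalesce merges existing partitions without a full shuffle and so can only reduce it.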
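To make the difference between the two concrete, here is a toy model in plain Python. It does not use Spark at all: the `coalesce` and `repartition` functions below are hypothetical stand-ins written for illustration, and the simple round-robin placement is an assumption, not Spark's actual shuffle logic. In PySpark the real calls are `df.repartition(n)` and `df.coalesce(n)`, and `df.rdd.getNumPartitions()` reports the current count.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of coalesce: merge whole partitions, never split one."""
    if n >= len(partitions):
        # Without a shuffle the partition count cannot grow.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each existing partition is appended intact to one target.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of repartition: every record is redistributed (full shuffle)."""
    shuffled: List[Partition] = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            # Round-robin placement is a simplification for illustration.
            shuffled[position % n].append(record)
            position += 1
    return shuffled


# Six uneven partitions, as if loaded from six files of different sizes.
parts = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10], [11, 12], []]

fewer = coalesce(parts, 2)        # merges neighbours, sizes stay uneven
balanced = repartition(parts, 4)  # moves every record, sizes even out

print([len(p) for p in fewer])     # [9, 3]
print([len(p) for p in balanced])  # [3, 3, 3, 3]
```

The model suggests the usual trade-off: coalesce is cheaper because records stay with their original partition, but it inherits any skew, while repartition pays for moving every record and gets evenly sized partitions in return.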