On Spark, Hive, and Small Files: An In-Depth Look at Spark Partitioning Strategies
Author: Zachary Ennenga
Airbnb’s new office building, 650 Townsend
Background
At Airbnb, our offline data processing ecosystem contains many mission-critical, time-sensitive jobs — it is essential for us to maximize the stability and efficiency of our data pipeline infrastructure.
So when, a few months back, we encountered a recurring issue that caused significant outages of our data…