Criteo & Spark: Under the hood of Spark performance, or why query compilation matters

In this post, I discuss how to write efficient Spark code and use toy examples to demonstrate common pitfalls. I show that Spark SQL (Datasets) should generally be preferred to the Spark Core API (RDDs), and that making the right choice can speed up your big data jobs by a factor of 2 to 10, a difference that matters.
