Paris Scala Group Event May 2019, No more struggles with Apache Spark workloads in production. Apache Spark Primary data structures (RDD, DataSet, Dataframe) Pragmatic explanation - executors, cores, containers, stage, job, a task in Spark. Parallel read from JDBC: Challenges and best practices. Bulk Load API vs JDBC write An optimization strategy for Joins: SortMergeJoin vs BroadcastHashJoin Avoid unnecessary shuffle Alternative to spark default sort Why dropDuplicates() doesn’t result consistency, What is alternative Optimize Spark stage generation plan Predicate pushdown with partitioning and bucketing Why not to use Scala Concurrent ‘Future’ explicitly!