
Highlights and Challenges from Running Spark on Mesos in Production by Morri Feldman

Spark Summit East Talk

Published in: Data & Analytics

Highlights and Challenges from Running Spark on Mesos in Production by Morri Feldman

  1. The Road Less Traveled: Highlights and Challenges from Running Spark on Mesos in Production. Morri Feldman, morri@appsflyer.com
  2. The Plan: 1. Attribution & Overall Architecture 2. Retention 3. Data Infrastructure – Spark on Mesos
  3. The Flow: Media sources → User Device → AppsFlyer Servers → redirected to the Store. Enables: Cost Per Install (CPI), Cost Per In-app Action (CPA), Revenue Share, Network Optimization, Retargeting
  4. Retention: cohort grid of install day vs. activity days 1–12
  5. Retention Scale: > 30 Million Installs / Day, > 5 Billion Sessions / Day
  6. Retention Dimensions
  7–9. Retention V1 (MVP): Two dimensions (App-Id and Media-Source). Cascalog: Datalog / logic programming over Cascading / Hadoop
  10. Retention – Spark SQL / Parquet. S3 data v1: Hadoop sequence files of <Kafka offset, JSON message> key/value pairs, gzip compressed, ~1.8 TB / day. S3 data v2: Parquet files (schema on write); retain the fields required for retention and apply some business logic while converting; generates “tables” for installs and sessions. Retention v2: “SELECT … JOIN ON ...”, with 18 dimensions vs. 2 in the original report
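To make the “SELECT … JOIN ON …” style of the v2 retention report concrete, here is a minimal Spark (Scala) sketch of joining the installs and sessions Parquet “tables”. The column names (appsflyer_id, install_date, session_date, app_id, media_source) and the S3 paths are hypothetical placeholders, not taken from the talk, and the modern SparkSession API is used rather than the Spark 1.x SQLContext of that era.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RetentionJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("retention-v2-sketch").getOrCreate()

    // Hypothetical Parquet "tables" produced by the v2 conversion job (placeholder paths).
    val installs = spark.read.parquet("s3a://bucket/installs/")
    val sessions = spark.read.parquet("s3a://bucket/sessions/")

    // Join each session back to its install and express the result as
    // cohort day, activity day and the report dimensions.
    val retention = installs.as("i")
      .join(sessions.as("s"), col("i.appsflyer_id") === col("s.appsflyer_id"))
      .select(
        col("i.install_date").as("cohort_day"),
        col("s.session_date").as("activity_day"),
        col("i.app_id"),
        col("i.media_source") // ... plus the remaining dimensions
      )

    retention.show(10)
    spark.stop()
  }
}
```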
  11. Retention Calculation Phases: 1. Daily aggregation: cohort_day, activity_day, <dimensions>, retained count. 2. Pivot: cohort_day, <dimensions>, Day0, Day1, Day2 … After aggregation and pivot: ~1 billion rows
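A rough DataFrame sketch of the two phases above, assuming hypothetical column names and paths. The built-in pivot used here only arrived in Spark 1.6, so the original job may well have pivoted differently; treat this as illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RetentionPhasesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("retention-phases-sketch").getOrCreate()

    // Hypothetical input: one row per retained user-day, with date-typed
    // cohort_day / activity_day columns and the report dimensions.
    val events = spark.read.parquet("s3a://bucket/retention-input/") // placeholder path

    // Phase 1: daily aggregation -> cohort_day, activity_day, <dimensions>, retained count.
    val daily = events
      .groupBy("cohort_day", "activity_day", "app_id", "media_source")
      .agg(countDistinct("appsflyer_id").as("retained"))

    // Phase 2: pivot the day offset into Day0, Day1, Day2, ... columns.
    val pivoted = daily
      .withColumn("day_n", datediff(col("activity_day"), col("cohort_day")))
      .groupBy("cohort_day", "app_id", "media_source")
      .pivot("day_n", 0 to 30) // explicit values keep the pivot to a single pass
      .sum("retained")

    pivoted.show(10)
    spark.stop()
  }
}
```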
  12. Data Warehouse v3: Parquet files, schema on read. Retain almost all fields from the original JSON and do not apply any business logic; business logic is applied at read time through a shared library
  13. Spark and Spark Streaming: ETL for Druid SQL
  14. Why? All data is on S3, so no need for HDFS. Spark & Mesos have a long history. Some interest in moving our attribution services to Mesos. Began using Spark with the EC2 “standalone” cluster scripts (no VPC); easy to set up. Culture of trying out promising technologies
  15. Mesos Creature Comforts: Nice UI; job outputs / sandbox are easy to find; driver and slave logs are accessible
  16. Mesos Creature Comforts: Fault tolerant; masters store data in ZooKeeper and can fail over smoothly. Nodes join and leave the cluster automatically at bootup / shutdown
  17. Job Scheduling – Chronos? https://aphyr.com/posts/326-jepsen-chronos
  18. Specific Lessons / Challenges using Spark, Mesos & S3 – or – What Went Wrong with Spark / Mesos & S3 and How We Fixed It. Spark / Mesos in production for nearly 1 year
  19. S3 is not HDFS: S3n gives tons of timeouts and DNS errors at 5pm daily. Can compensate for the timeouts by setting spark.task.maxFailures to 20. Use S3a from Hadoop 2.7 (S3a in 2.6 generates millions of partitions – HADOOP-11584). https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
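A small configuration sketch of the workaround described above; the bucket path is a placeholder and the explicit fs.s3a.impl setting is only needed on older Hadoop builds.

```scala
import org.apache.spark.sql.SparkSession

object S3aConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-conf-sketch")
      // Tolerate transient S3 timeouts / DNS errors by allowing more task retries.
      .config("spark.task.maxFailures", "20")
      // Route s3a:// URIs through the Hadoop S3A filesystem implementation.
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .getOrCreate()

    // Read with an s3a:// URI instead of s3n://.
    val df = spark.read.parquet("s3a://bucket/path/") // placeholder path
    println(df.count())
    spark.stop()
  }
}
```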
  20. S3 is not HDFS, part 2: Use a direct output committer. Spark writes files to a staging area and renames them at the end of the job; rename on S3 is an expensive operation (~tens of minutes for thousands of files). Direct output committers write to the final output location (safe because S3 writes are atomic, so writes always succeed). Disadvantages: incompatible with speculative execution; poor recovery from failures during write operations. https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
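For orientation, a Spark 1.x-style sketch of enabling a direct Parquet committer of the kind described above. The class shown shipped with Spark 1.4/1.5, moved packages in 1.6, and was removed in Spark 2.0, so the exact wiring depends on the release; the output path is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DirectCommitterSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("direct-committer-sketch")
      // Write Parquet output straight to its final S3 location instead of
      // writing to _temporary and renaming (rename is slow on S3).
      .set("spark.sql.parquet.output.committer.class",
           "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
      // Direct committers are unsafe with speculative execution, so keep it off.
      .set("spark.speculation", "false")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    sc.parallelize(1 to 1000).toDF("n").write.parquet("s3a://bucket/output/") // placeholder
    sc.stop()
  }
}
```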
  21. Avoid .0 releases if possible. Worst example: Spark 1.4.0 randomly loses data, especially on jobs with many output partitions. Fixed by SPARK-8406. https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
  22. Coarse-Grained or Fine-Grained? TL;DR – use coarse-grained: not perfect, but stable
  23. Coarse-Grained – Disadvantages: spark.cores.max (not dynamic)
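For context, a minimal coarse-grained configuration sketch showing where the static spark.cores.max cap comes in; the ZooKeeper hosts, core and memory figures are placeholders, not values from the talk.

```scala
import org.apache.spark.sql.SparkSession

object CoarseGrainedConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coarse-grained-sketch")
      .master("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos") // placeholder ZK ensemble
      // Coarse-grained mode: Spark holds on to whole Mesos executors for the
      // lifetime of the job instead of launching one Mesos task per Spark task.
      .config("spark.mesos.coarse", "true")
      // Without this cap a coarse-grained job grabs every core the cluster offers,
      // and the value stays fixed for the whole job (not dynamic).
      .config("spark.cores.max", "64")
      .config("spark.executor.memory", "16g")
      .getOrCreate()

    println(spark.range(1000000).count())
    spark.stop()
  }
}
```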
  24. Coarse-Grained with Dynamic Allocation
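A sketch of what coarse-grained mode with dynamic allocation might look like. The executor bounds are placeholders, and on Mesos this additionally assumes the external shuffle service is running on every agent so shuffle files outlive released executors.

```scala
import org.apache.spark.sql.SparkSession

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-allocation-sketch")
      .config("spark.mesos.coarse", "true")
      // Required for dynamic allocation: executors can be released because
      // an external shuffle service keeps serving their shuffle output.
      .config("spark.shuffle.service.enabled", "true")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")   // placeholder
      .config("spark.dynamicAllocation.maxExecutors", "50")  // placeholder
      .getOrCreate()

    println(spark.range(1000000).count())
    spark.stop()
  }
}
```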
  25. Tuning Jobs in Coarse-Grained
  26. Tuning Jobs in Coarse-Grained: set executor memory to ~the entire memory of a machine (200 GB for an r3.8xlarge with 32 CPUs); spark.task.cpus then effectively becomes the knob for Spark memory per task when tasks OOM
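A sketch of that tuning knob. The 200 GB / 32 CPU figures mirror the slide, but the spark.task.cpus value is an illustrative choice, not the one used in production.

```scala
import org.apache.spark.sql.SparkSession

object MemoryPerTaskSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-per-task-sketch")
      .config("spark.mesos.coarse", "true")
      // One big executor per machine, taking roughly all of its memory.
      .config("spark.executor.memory", "200g")
      // With 32 cores per executor, spark.task.cpus = 4 allows at most 8 tasks
      // to run concurrently, so each task gets roughly 200 GB / 8 = 25 GB
      // instead of 200 GB / 32 ≈ 6 GB. Raising it is one way out of task OOMs.
      .config("spark.task.cpus", "4")
      .getOrCreate()

    println(spark.range(1000000).count())
    spark.stop()
  }
}
```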
  27. Tuning Jobs in Coarse-Grained: use more shuffle partitions when tasks OOM
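A sketch of raising the shuffle partition count for Spark SQL jobs; the value 2000 is a placeholder, and the toy aggregation only exists to trigger a shuffle.

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-sketch")
      // Default is 200; raising it makes each shuffle partition smaller,
      // which keeps individual tasks within their memory budget.
      .config("spark.sql.shuffle.partitions", "2000") // placeholder value
      .getOrCreate()

    val df = spark.range(0, 100000000).selectExpr("id % 1000 as k", "id as v")
    // The aggregation below shuffles by k into spark.sql.shuffle.partitions partitions.
    df.groupBy("k").count().show(10)
    spark.stop()
  }
}
```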
  28. Spark on Mesos Future Improvements: Increased stability; dynamic allocation; Tungsten. Mesos maintenance primitives (experimental in 0.25.0): gracefully reduce the size of the cluster by marking nodes that will soon be killed; inverse offers – preemption, more dynamic scheduling
  29. How We Generated Duplicate Data, or: S3 is Still Not HDFS
  30. S3 is Still Not HDFS: S3 is Eventually Consistent
  31. We are Hiring! https://www.appsflyer.com/jobs/
