8. Two Dimensions (App-Id and Media-Source)
Cascalog
Datalog / logic programming over Cascading / Hadoop
Retention V1 (MVP)
11. S3 Data v1 – Hadoop sequence files:
<Key, Value> = <Kafka offset, JSON message>
Gzip compressed, ~1.8 TB / day
S3 Data v2 – Parquet files (schema on write)
Retain fields required for retention, apply some business logic while converting
Generates “tables” for installs and sessions
Retention v2 – “SELECT … JOIN ON ...”
18 dimensions vs. 2 in the original report
Retention – Spark SQL / Parquet
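The “SELECT … JOIN ON” retention computation can be sketched in plain Python; the field names (user_id, day) and the day-N retention definition below are illustrative assumptions, not the real schema:

```python
# Hedged sketch of the retention join over the installs and sessions
# "tables". Field names and records are made up for illustration.

installs = [
    {"user_id": "u1", "app_id": "a", "media_source": "organic", "day": 0},
    {"user_id": "u2", "app_id": "a", "media_source": "fb",      "day": 0},
]
sessions = [
    {"user_id": "u1", "day": 1},  # u1 came back on day 1 -> retained
]

def day_n_retention(installs, sessions, n):
    """Fraction of installed users with a session on day n
    (the join key here is user_id)."""
    active = {s["user_id"] for s in sessions if s["day"] == n}
    retained = sum(1 for i in installs if i["user_id"] in active)
    return retained / len(installs)

print(day_n_retention(installs, sessions, 1))  # 0.5
```

In the actual pipeline the same join runs as Spark SQL over the Parquet installs and sessions tables, grouped by the report's dimensions.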
13. Data Warehouse v3
Parquet files – schema on read
Retain almost all fields from the original JSON
Do not apply any business logic
Business logic applied when reading, through use of a shared library
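The schema-on-read idea can be sketched as follows; the function name and the example business rule (defaulting a missing media_source to "organic") are illustrative assumptions, not the real shared library:

```python
# Hedged sketch of "business logic applied when reading": the stored row
# keeps (almost) all original JSON fields; a shared-library reader applies
# the rules at read time, so the warehouse stays free of business logic.

import json

def read_event(raw_json):
    """Parse a raw stored record, then apply read-time business logic
    (the defaulting rule below is a made-up example)."""
    event = json.loads(raw_json)
    event.setdefault("media_source", "organic")
    return event

print(read_event('{"app_id": "a", "user_id": "u1"}')["media_source"])  # organic
```

Changing a business rule then only requires updating the shared library, not rewriting the stored data.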
15. Why?
All data on S3 – no need for HDFS
Spark & Mesos have a long history together
Some interest in moving our attribution services to Mesos
Began using Spark with the EC2 “standalone” cluster scripts (no VPC)
Easy to set up
Culture of trying out promising technologies
17. Mesos Creature Comforts
Fault tolerant – masters store state in ZooKeeper and can fail over smoothly
Nodes join and leave the cluster automatically at boot-up / shutdown
19. Specific Lessons / Challenges Using Spark, Mesos & S3
-or-
What Went Wrong with Spark / Mesos & S3 and How We Fixed It
Spark / Mesos in production for nearly 1 year
20. S3 is not HDFS
S3n gives tons of timeouts and DNS errors @ 5 pm daily
Can compensate for the timeouts by setting spark.task.maxFailures to 20
Use S3a from Hadoop 2.7
(S3a in 2.6 generates millions of partitions – HADOOP-11584)
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
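The two mitigations above can be sketched as spark-defaults.conf entries; the keys are standard Spark/Hadoop configuration, and the values are the ones from the slide:

```properties
# Retry tasks that hit transient S3 timeouts (slide's value: 20)
spark.task.maxFailures  20

# Use the S3a filesystem from Hadoop 2.7+ instead of S3n
spark.hadoop.fs.s3a.impl  org.apache.hadoop.fs.s3a.S3AFileSystem
```

Jobs then read and write with s3a:// URLs rather than s3n://.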
21. S3 is not HDFS, part 2
Use a direct output committer
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
Spark writes files to a staging area and renames them at the end of the job
Rename on S3 is an expensive operation (~tens of minutes for thousands of files)
Direct output committers write to the final output location
(Safe because S3 writes are atomic, so they always succeed)
Disadvantages – incompatible with speculative execution; poor recovery from failures during write operations
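A hedged sketch of how this looked on Spark 1.x (the DirectParquetOutputCommitter class was removed in Spark 2.0); disabling speculation follows from the incompatibility noted above:

```properties
# Write Parquet output straight to its final S3 location (Spark 1.x)
spark.sql.parquet.output.committer.class  org.apache.spark.sql.parquet.DirectParquetOutputCommitter

# Direct committers are incompatible with speculative execution
spark.speculation  false
```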
22. Avoid .0 releases if possible
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
Worst example: Spark 1.4.0 randomly loses data, especially on jobs with many output partitions
Fixed by SPARK-8406
27. Tuning Jobs in Coarse-Grained Mode
Set executor memory to ~ the entire memory of a machine (200 GB for r3.8xlarge)
spark.task.cpus then effectively sets the Spark memory per task
OOM!!
200 GB / 32 CPUs
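The memory arithmetic above can be sketched as a back-of-the-envelope calculation; the helper name is ours, and this is a simplification of how Spark actually manages executor memory:

```python
# Hedged sketch: in coarse-grained mode one executor takes the whole
# machine, so each concurrent task shares the executor's memory.
# Numbers from the slides: r3.8xlarge ~ 200 GB, 32 cores.

def memory_per_task_gb(executor_memory_gb, executor_cores, task_cpus=1):
    """Rough memory available per task: executor memory divided by the
    number of concurrently running tasks (cores / spark.task.cpus)."""
    concurrent_tasks = executor_cores // task_cpus
    return executor_memory_gb / concurrent_tasks

# Default spark.task.cpus=1: 32 tasks share 200 GB -> ~6.25 GB each (OOM risk)
print(memory_per_task_gb(200, 32))     # 6.25
# Raising spark.task.cpus lowers concurrency, giving each task more memory
print(memory_per_task_gb(200, 32, 4))  # 25.0
```

This is why spark.task.cpus in this setup behaves more like a per-task memory knob than a CPU setting.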
28. Tuning Jobs in Coarse-Grained Mode
More Shuffle Partitions
OOM!!
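The connection between shuffle partitions and OOM can be sketched with illustrative numbers (the shuffle size and partition counts below are made up, not from the slides):

```python
# Hedged sketch: each reduce task holds roughly one shuffle partition,
# so more partitions (spark.sql.shuffle.partitions) means smaller,
# safer partitions per task.

def partition_size_gb(shuffle_data_gb, num_partitions):
    """Approximate data handled by each reduce task."""
    return shuffle_data_gb / num_partitions

# A 1 TB shuffle over 200 partitions -> ~5 GB per task: easy to OOM
print(partition_size_gb(1000, 200))   # 5.0
# Ten times more partitions shrinks each task's share accordingly
print(partition_size_gb(1000, 2000))  # 0.5
```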
29. Spark on Mesos: Future Improvements
Increased stability
Dynamic allocation
Tungsten
Mesos maintenance primitives, experimental in 0.25.0
Gracefully reduce the size of the cluster by marking nodes that will soon be killed
Inverse offers – preemption, more dynamic scheduling