Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Accelerating Spark Workloads in a Mesos Environment with Alluxio

169 views

Published on

MesosCon EU (Prague) 2017
Gene Pang, Software Engineer, Alluxio, Inc.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Accelerating Spark Workloads in a Mesos Environment with Alluxio

  1. 1. 1 Accelerating Spark Workloads in a Mesos Environment with Alluxio Gene Pang, Software Engineer, Alluxio, Inc. * ©2017 Alluxio, Inc. All Rights Reserved
  2. 2. About Me Gene Pang Software Engineer @ Alluxio, Inc. Alluxio Open Source PMC Member Ph.D. from AMPLab @ UC Berkeley Worked at Google before UC Berkeley Twitter: @unityxx Github: @gpang ©2017 Alluxio, Inc. All Rights Reserved 2
  3. 3. Outline Alluxio Overview Alluxio + Spark + Mesos Use Cases Using Spark with Alluxio on Mesos Deployment with Mesos Demo 1 2 3 4 5 ©2017 Alluxio, Inc. All Rights Reserved 3
  4. 4. Data Ecosystem Yesterday 4* ©2017 Alluxio, Inc. All Rights Reserved • One Compute Framework • Single Storage System • Co-located
  5. 5. Data Ecosystem Today 5* ©2017 Alluxio, Inc. All Rights Reserved … • Many Compute Frameworks • Multiple Storage Systems • Most not co-located …
  6. 6. Data Ecosystem Issues 6* ©2017 Alluxio, Inc. All Rights Reserved • Each application manage multiple data sources • Add/Removing data sources require application changes • Storage optimizations requires application change • Lower performance due to lack of locality … …
  7. 7. Data Ecosystem with Alluxio 7* ©2017 Alluxio, Inc. All Rights Reserved • Apps only talk to Alluxio • Simple Add/Remove • No App Changes • Memory Performance … …
  8. 8. Next Gen Analytics with Alluxio 8* ©2017 Alluxio, Inc. All Rights Reserved ✓ Big Data/IoT ✓ AI/ML ✓ Deep Learning ✓ Cloud Migration ✓ Multi Platform ✓ Autonomous … … Native File System Hadoop Compatible File System Native Key-Value Interface Fuse Compatible File System HDFS Interface Amazon S3 Interface Swift Interface GlusterFS Interface Apps, Data & Storage at Memory Speed
  9. 9. Enabling Next Gen Analytics Unify your Data 9 1 Architecture Flexibility2 Improved I/O Performance3 * ©2017 Alluxio, Inc. All Rights Reserved
  10. 10. Fastest Growing Big Data Open Source Project 10* ©2017 Alluxio, Inc. All Rights Reserved • Fastest Growing open- source project in the big data ecosystem • Running world’s largest production clusters • 600+ Contributors from 100+ organizations
  11. 11. Outline Alluxio Overview Alluxio + Spark + Mesos Use Cases Using Spark with Alluxio on Mesos Deployment with Mesos Demo 1 2 3 4 5 ©2017 Alluxio, Inc. All Rights Reserved 11
  12. 12. Big Data Case Study – Challenge – Gain end to end view of business with large volume of data for $5B Travel Site Queries were slow / not interactive, resulting in operational inefficiency SPARK HDFS Solution – With Alluxio, 300x improvement in performance Impact – Increased revenue from immediate response to user behavior Use case: http://bit.ly/2pDJdrq CEPH HDFS CEPH FLINK SPARK FLINK ©2017 Alluxio, Inc. All Rights Reserved 12 MESOS
  13. 13. Machine Learning Case Study – 136/12/17 ©2017 Alluxio, Inc. All Rights Reserved Challenge – Disparate Data both on-prem and Cloud. Heterogeneous types of data. Scaling of Exabyte size data. Slow due to disk based approach. SPARK HDFS SPARK MINIO Solution – Using Alluxio to prevent I/O bottlenecks Impact – Orders of magnitude higher performance than before. http://bit.ly/2p18ds3 MESOS
  14. 14. Outline Alluxio Overview Alluxio + Spark + Mesos Use Cases Using Spark with Alluxio on Mesos Deployment with Mesos Demo 1 2 3 4 5 ©2017 Alluxio, Inc. All Rights Reserved 14
  15. 15. Sharing Data via Memory Storage Engine & Execution Engine Same Process • Two copies of data in memory – double the memory used • Inter-process Sharing Slowed Down by Network / Disk I/O ©2017 Alluxio, Inc. All Rights Reserved 15 Mesos Spark Compute Spark Storage block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 Spark Compute Spark Storage block 1 block 3
  16. 16. Sharing Data via Memory Storage Engine & Execution Engine Different process • Half the memory used • Inter-process Sharing Happens at Memory Speed Spark Compute Spark Storage HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio block 1 block 3 block 4 Spark Compute Spark Storage ©2017 Alluxio, Inc. All Rights Reserved 16 Mesos
  17. 17. Data Resilience During Crash Spark Compute Spark Storage block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 Storage Engine & Execution Engine Same Process ©2017 Alluxio, Inc. All Rights Reserved 17 Mesos
  18. 18. Data Resilience During Crash CRASH Spark Storage block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 4 block 2 • Process Crash Requires Network and/or Disk I/O to Re-read Data Storage Engine & Execution Engine Same Process ©2017 Alluxio, Inc. All Rights Reserved 18 Mesos
  19. 19. Data Resilience During Crash CRASH HDFS / Amazon S3 block 1 block 3 block 2 block 4 Storage Engine & Execution Engine Same Process • Process Crash Requires Network and/or Disk I/O to Re-read Data ©2017 Alluxio, Inc. All Rights Reserved 19 Mesos
  20. 20. Data Resilience During Crash Spark Compute Spark Storage HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio block 1 block 3 block 4 Storage Engine & Execution Engine Different process ©2017 Alluxio, Inc. All Rights Reserved 20 Mesos
  21. 21. Data Resilience During Crash Process Crash - Data is Re-read at Memory SpeedHDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio block 1 block 3 block 4 CRASH Storage Engine & Execution Engine Different process ©2017 Alluxio, Inc. All Rights Reserved 21 Mesos
  22. 22. Alluxio Architecture ©2017 Alluxio, Inc. All Rights Reserved 22 Application AlluxioClient Alluxio Master Alluxio Worker Alluxio Worker … Storage Storage …
  23. 23. Alluxio Client ©2017 Alluxio, Inc. All Rights Reserved 23 Applications interact with Alluxio via the Alluxio client ● Native Alluxio Filesystem Client • Alluxio specific operations like [un]pin, [un]mount, [un]set TTL ● HDFS-Compatible Filesystem Client • No code change necessary ● S3 API
  24. 24. Alluxio Master ©2017 Alluxio, Inc. All Rights Reserved 24 Master is responsible for managing metadata ● Filesystem namespace metadata ● Blocks / workers metadata Primary master writes journal for durable operations ● Secondary masters replay journal entries
  25. 25. Alluxio Worker ©2017 Alluxio, Inc. All Rights Reserved 25 Worker is responsible for managing block data Worker stores block data on various storage media ● HDD, SSD, Memory Reads and writes data to underlying storage systems
  26. 26. Outline Alluxio Overview Alluxio + Spark + Mesos Use Cases Using Spark with Alluxio on Mesos Deployment with Mesos Demo 1 2 3 4 5 ©2017 Alluxio, Inc. All Rights Reserved 26
  27. 27. Alluxio on DC/OS ©2017 Alluxio, Inc. All Rights Reserved 27
  28. 28. Alluxio on DC/OS ©2017 Alluxio, Inc. All Rights Reserved 28 Alluxio brings A unified view of data across disparate storage systems High performance & predictable SLA for analytics workloads DC/OS makes provisioning infrastructure easy Automates provisioning, management & elastic scaling Benefits include: Faster analytics with Spark and other frameworks Process data from hybrid cloud storage systems (HDFS, S3, etc)
  29. 29. Outline Alluxio Overview Alluxio + Spark + Mesos Use Cases Using Spark with Alluxio on Mesos Deployment with Mesos Demo 1 2 3 4 5 ©2017 Alluxio, Inc. All Rights Reserved 29
  30. 30. Demo Environment Spark Alluxio ©2017 Alluxio, Inc. All Rights Reserved 30 SPARK MESOS
  31. 31. Demo Setup Alluxio 1.5.0 DC/OS 1.9.4 Spark 2.0.2 Amazon EC2 (m3.xlarge) ©2017 Alluxio, Inc. All Rights Reserved 31
  32. 32. Results ©2017 Alluxio, Inc. All Rights Reserved 32 8x  improvement  
  33. 33. Conclusion Easy to use Alluxio with Spark in a Mesos environment Predictable and improved performance Easily connect to various storage systems ©2017 Alluxio, Inc. All Rights Reserved 33
  34. 34. Thank you! Gene Pang Software Engineer gene@alluxio.com 34 Twitter.com/alluxio Linkedin.com/alluxio Website www.alluxio.com E-mail info@alluxio.com @ Social Media * ©2017 Alluxio, Inc. All Rights Reserved

×