Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon): An Open Source Memory Speed Virtual Distributed Storage - Gene Pang, Software Engineer, Alluxio

Alluxio, formerly Tachyon, is a memory speed virtual distributed storage system. The Alluxio open source community is one of the fastest growing open source communities in big data history with more than 300 developers from over 100 organizations around the world. In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace. Alluxio now supports a wide range of under storage systems, including Amazon S3, Google Cloud Storage, Gluster, Ceph, HDFS, NFS, and OpenStack Swift. This year, our goal is to make Alluxio accessible to an even wider set of users, through our focus on security, new language bindings, and further increased stability.
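
The unified namespace mentioned above can be pictured as a mount table that maps paths in one virtual file system tree onto several under storage systems. The following is a minimal sketch of that idea in Python, not Alluxio's actual API: the `MountTable` class, its `mount` and `resolve` methods, and the host and bucket names in the usage note are all hypothetical and exist only for illustration.

```python
from typing import Dict, Optional


class MountTable:
    """Toy unified namespace: maps virtual path prefixes to under-storage URIs.

    Hypothetical illustration only; this is not Alluxio's API.
    """

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, virtual_prefix: str, storage_uri: str) -> None:
        # Normalize so that "/s3/" and "/s3" name the same mount point.
        prefix = virtual_prefix.rstrip("/") or "/"
        self._mounts[prefix] = storage_uri.rstrip("/")

    def resolve(self, virtual_path: str) -> str:
        # Choose the longest mount prefix that covers the requested path.
        best: Optional[str] = None
        for prefix in self._mounts:
            covers = (
                prefix == "/"
                or virtual_path == prefix
                or virtual_path.startswith(prefix + "/")
            )
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {virtual_path}")
        # Keep the part of the path below the mount point.
        rest = virtual_path if best == "/" else virtual_path[len(best):]
        return self._mounts[best] + rest
```

In this sketch, an application would always ask for a virtual path such as `/s3/logs/a.txt`; adding a new storage system is a new `mount` call (for example `table.mount("/s3", "s3://example-bucket/data")`) rather than a change to the application.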

  1. Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage System. Gene Pang @ Alluxio, Inc. July 9, 2016 @ Big Data Day LA
  2. About Me • Software Engineer @ Alluxio, Inc. • One of the core maintainers of Alluxio Open Source Project • Ph.D. @ AMPLab, UC Berkeley • Worked at Google before UC Berkeley • Twitter: @unityxx
  3. About Alluxio, Inc. • Founded by creators and top committers of Alluxio open source project (formerly named Tachyon) • Series A by Andreessen Horowitz • http://www.alluxio.com • We are hiring!
  4. What I’ll be Covering • Brief overview of Alluxio • Motivation for Alluxio • Alluxio Use Cases
  5. What Is Alluxio?
  6. Alluxio: Open Source Memory Speed Virtual Distributed Storage System
  7. What Does That Mean? • Open Source. One of the fastest growing project communities • Memory Speed. Memory-centric architecture designed for memory I/O • Virtual. Unified Namespace abstracts storage from applications • Distributed. Designed to scale out with commodity hardware
  8. Alluxio Ecosystem
  9. Alluxio Benefits: any application can access any data from any storage at memory speed • Flexibility. Unified namespace enables new workloads across storage systems • Agility. Quickly adapt to frameworks and storage systems of your choice • Performance. Architecture supports fast, memory-speed access to data • Cost. Grow storage and compute resources independently
  10. Alluxio is Open Source
  11. The Beginnings • Started at UC Berkeley AMPLab, Summer 2012 – The same lab that produced Apache Mesos and Apache Spark • Open sourced as Tachyon, April 2013 – Apache License 2.0 – Renamed to Alluxio in February 2016 – Latest Release: Version 1.1.1 (July 2016)
  12. Contributor Growth • Over 250 Contributors • 3x growth over the last year
  13. Alluxio Open Source Community: Over 3x increase from 1 year ago!
  14. Contributors and Users
  15. Alluxio is Memory Speed
  16. Why Use Memory for Storage?
  17. Why Memory? Performance Trend • RAM throughput increasing exponentially • Disk throughput increasing slowly • Memory-locality key to interactive response times
  18. Why Memory? Price Trend • DRAM is becoming inexpensive (source: jcmit.com)
  19. What if memory capacity is still not enough?
  20. Alluxio Manages Tiered Storage: MEM, SSD, HDD, ordered from faster to higher capacity
  21. Configurable Storage Tiers: MEM only; MEM + HDD; SSD only
  22. Pluggable Tier Management Policies: evict stale data to slower tier; promote hot data to faster tier
  23. Alluxio is a Virtual Distributed Storage System
  24. The Big Data Ecosystem Today
  25. This is Problematic
  26. What are the Problems? • Costly Ecosystem Integrations • Costly ETL and Data Duplication • Data Silos • Long Cycle from Data to Value
  27. Alluxio Unifies Access to Data
  28. How to use Alluxio?
  29. Alluxio Common Use Cases • Accelerate access to remote storage • Share data across jobs/applications at memory speed • Transparently manage data across different storage systems
  30. Accelerating Access to Remote Storage
  31. Remote I/O to Data: Spark and Amazon S3. Every data operation requires data transfer, sometimes over the WAN (high latency, network throughput)
  32. Local I/O with Alluxio: Spark and Alluxio (low latency, memory throughput); Alluxio and Amazon S3 (high latency, network throughput). Keeping data in Alluxio accelerates data access
  33. Sharing Data at Memory Speed
  34. Sharing Data Slowly: Spark, MapReduce, Flink; Amazon S3; network I/O, disk I/O. I/O slows down sharing
  35. Sharing Data at Memory Speed with Alluxio: Spark, MapReduce, Flink; Alluxio; Amazon S3. Share data via memory
  36. Managing Data Across Different Storage Systems
  37. Simple World: Application 1; HDFS
  38. Adding a Storage System: Application 1; HDFS, Amazon S3
  39. Adding a Storage System: Application 1; Google GCS, HDFS, Amazon S3
  40. Adding an Application: Application 1, Application 2; Google GCS, HDFS, Amazon S3
  41. Adding an Application: Application 1, Application 2, Application 3; Google GCS, HDFS, Amazon S3 (complex, inflexible)
  42. With Alluxio: Application 1; Alluxio; HDFS
  43. New Storage Systems and Applications: Application 1, Application 2, Application 3; Alluxio; Google GCS, HDFS, Amazon S3. Flexible, simple: no application changes, new mount point
  44. Alluxio in the Wild!
  45. Use Case
  46. at [logo] • Framework: Spark SQL • Under Storage: Baidu’s File System • Storage Media: MEM + HDD • 200+ nodes deployment • 2PB+ managed space
  47. Use Case
  48. at [logo] • Framework: Spark • Storage Media: MEM • Improvement from Hours to Seconds
  49. Use Case
  50. at [logo] • Framework: Spark Streaming + Flink Streaming + Spark + Flink • Under Storage: Multiple HDFS clusters • Storage Media: MEM + HDD • 200+ nodes deployment • Alluxio enables previously impossible jobs to finish • 300x Performance Improvement during peak load
  51. [no text]
  52. To Get More Information • Alluxio Project: www.alluxio.org • Alluxio, Inc: www.alluxio.com • Development: www.github.com/Alluxio/alluxio • Meet Friends: www.meetup.com/Alluxio • Email: gene@alluxio.com
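
The tiered storage slides (20 to 22) describe two policies: evict stale data to a slower tier, and promote hot data to a faster tier. A minimal sketch of those two policies in Python follows. It is an illustration under stated assumptions, not Alluxio's implementation: the `TieredStore` class and its methods are hypothetical, capacity is counted in blocks rather than bytes, staleness is approximated with least-recently-used order, and a block evicted from the slowest tier is simply dropped.

```python
from collections import OrderedDict
from typing import List, Optional


class TieredStore:
    """Toy tiered block store; tier 0 is the fastest (e.g. MEM).

    Hypothetical illustration only; this is not Alluxio's API.
    """

    def __init__(self, capacities: List[int]) -> None:
        # One LRU-ordered dict per tier; capacity is a number of blocks.
        self._capacities = capacities
        self._tiers = [OrderedDict() for _ in capacities]

    def put(self, block_id: str, data: bytes) -> None:
        # New or rewritten blocks always enter the fastest tier.
        for blocks in self._tiers:
            blocks.pop(block_id, None)
        self._insert(block_id, data, 0)

    def get(self, block_id: str) -> Optional[bytes]:
        for blocks in self._tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                # Promote hot data to the fastest tier on access.
                self._insert(block_id, data, 0)
                return data
        return None

    def tier_of(self, block_id: str) -> Optional[int]:
        # Report where a block lives without counting as an access.
        for level, blocks in enumerate(self._tiers):
            if block_id in blocks:
                return level
        return None

    def _insert(self, block_id: str, data: bytes, tier: int) -> None:
        if tier >= len(self._tiers):
            return  # Simplification: blocks pushed past the slowest tier are dropped.
        blocks = self._tiers[tier]
        blocks[block_id] = data
        if len(blocks) > self._capacities[tier]:
            # Evict the least recently used block to the next slower tier.
            stale_id, stale_data = blocks.popitem(last=False)
            self._insert(stale_id, stale_data, tier + 1)
```

With `TieredStore([2, 2])`, writing three blocks pushes the oldest one into the second tier, and reading it back moves it to the first tier again while another block takes its place below.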
