Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Alluxio: Unify Data at Memory Speed; 2016-11-18

381 views

Published on

Haoyuan Li's Alluxio Presentation at End of AMPLab Celebration Workshop.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Alluxio: Unify Data at Memory Speed; 2016-11-18

  1. 1. UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li @ AMPLab End of Project Celebration November 18th, 2016
  2. 2. HISTORY 2 Trex
 12-13
  3. 3. HISTORY 2 Trex
 12-13 Tachyon
 13-15
  4. 4. HISTORY 2 Trex
 12-13 Tachyon
 13-15 Alluxio
 15-
  5. 5. FASTEST-GROWING BIG DATA PROJECT 3
  6. 6. FASTEST-GROWING BIG DATA PROJECT 3 • Fastest growing open-source project in the big data ecosystem • 400+ contributors from 100+ organizations • Running world’s largest production clusters • Welcome to join the community!
  7. 7. CURRENT STATUS 4 Haoyuan Li, CEO Alluxio (formerly Tachyon) Co-creator, Joined AMPLab Ph.D. Program 2011 FOUNDER INVESTOR TEAM From AMD, Dell, Google, Palantir, Uber, Yahoo; Experts in Distributed Systems MSs and PhDs in CS from CMU,, Stanford, UC Berkeley Top 10 Committers of the Alluxio Open Source Project We are Hiring! COMPANY Founded 2015
  8. 8. BIG DATA ECOSYSTEM YESTERDAY 5 … …
  9. 9. BIG DATA ECOSYSTEM TODAY 5 … …
  10. 10. 5 … … BIG DATA ECOSYSTEM ISSUES
  11. 11. BIG DATA ECOSYSTEM WITH ALLUXIO 5 … … FUSE Compatible File System Hadoop Compatible File System Native Key-Value Interface Native File System GlusterFS InterfaceAmazon S3 Interface Swift InterfaceHDFS Interface
  12. 12. BIG DATA ECOSYSTEM WITH ALLUXIO 5 … … FUSE Compatible File System Hadoop Compatible File System Native Key-Value Interface Native File System Enabling Application to Access Data from any Storage System at Memory-speed GlusterFS InterfaceAmazon S3 Interface Swift InterfaceHDFS Interface
  13. 13. WHY ALLUXIO 6 Co-located compute and data with memory-speed access to data Virtualized across different storage systems under a unified namespace Scale-out architecture File system API, software only
  14. 14. ALLUXIO BENEFITS 7 Unification New workflows across any data in any storage system Orders of magnitude improvement in run time Choice in compute and storage – grow each independently, buy only what is needed Performance Flexibility
  15. 15. TRUSTED BY THE WORLD LEADING COMPANIES 8
  16. 16. ALLUXIO USE CASES 9 Accelerating I/O to and from remote storage Managing data across disparate storage systems Sharing data across workloads at memory speed
  17. 17. ACCELERATE I/O TO/FROM REMOTE STORAGE 10 Baidu’s PMs and analysts run 
 interactive queries to gain insights 
 into their products and business • 200+ nodes deployment • 2+ petabytes of storage • Mix of memory + HDD ALLUXIO Baidu File System
  18. 18. ACCELERATE I/O TO/FROM REMOTE STORAGE 10 The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds. - Baidu RESULTS • Data queries are now 30x faster with Alluxio • Alluxio cluster runs stably, providing over 50TB of RAM space • By using Alluxio, batch queries usually lasting over 15 minutes were transformed into an interactive query taking less than 30 seconds Baidu’s PMs and analysts run 
 interactive queries to gain insights 
 into their products and business • 200+ nodes deployment • 2+ petabytes of storage • Mix of memory + HDD ALLUXIO Baidu File System
  19. 19. SHARE DATA ACROSS JOBS @ MEMORY SPEED 11 Barclays uses query and machine learning to train models for risk management • 6 node deployment • 1TB of storage • Memory only ALLUXIO Relational Database
  20. 20. SHARE DATA ACROSS JOBS @ MEMORY SPEED 11 Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity. - Barclays RESULTS • Barclays workflow iteration time decreased from hours to seconds • Alluxio enabled workflows that were impossible before • By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds Barclays uses query and machine learning to train models for risk management • 6 node deployment • 1TB of storage • Memory only ALLUXIO Relational Database
  21. 21. MANAGE DATA ACROSS STORAGE SYSTEMS 12 • 200+ nodes deployment • 6 billion logs (4.5 TB) daily • Mix of Memory + HDD ALLUXIO Qunar uses real-time machine learning for their website ads.
  22. 22. MANAGE DATA ACROSS STORAGE SYSTEMS 12 We’ve been running Alluxio in production for over 9 months, Alluxio’s unified namespace enable different applications and frameworks to easily interact with data from different storage systems - Qunar RESULTS • Data sharing among Spark Streaming, Spark batch and Flink jobs provide efficient data sharing • Improved the performance of their system with 15x – 300x speedups • Tiered storage feature manages storage resources including memory, SSD and disk • 200+ nodes deployment • 6 billion logs (4.5 TB) daily • Mix of Memory + HDD ALLUXIO Qunar uses real-time machine learning for their website ads.
  23. 23. ALLUXIO, INC PRODUCT OFFERINGS 13 Capability/Value Technology Validation Alluxio Open Source Open Source Alluxio Community Edition (ACE) Accelerate Adoption Alluxio Manager Open Source Alluxio Enterprise Edition (AEE) Enterprise Deployment • Kerberos Authentication • Data Replication • Support Alluxio Manager Open Source
  24. 24. GOING INTO THE FUTURE 14
  25. 25. Congrats to the AMPLab!
 Thank you! Contact: haoyuan@alluxio.com or info@alluxio.com 15

×