Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017

513 views

Published on

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017

  1. 1. UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li, CEO @ Alluxio Inc. VAULT Conference 2017 March 2017
  2. 2. HISTORY • Started at UC Berkeley AMPLab In Summer 2012 • Originally named as Tachyon • Rebranded to Alluxio in early 2016 • Open Sourced in 2013 • Apache License 2.0 • Latest Stable Release: Alluxio 1.4.0 • Alluxio 1.5.0 Planned For Q2, 2017 2
  3. 3. © 2017 Alluxio Confidential BIG DATA ECOSYSTEM YESTERDAY 3
  4. 4. © 2017 Alluxio Confidential BIG DATA ECOSYSTEM TODAY … … 3
  5. 5. © 2017 Alluxio Confidential … … BIG DATA ECOSYSTEM ISSUES 3
  6. 6. © 2017 Alluxio Confidential BIG DATA ECOSYSTEM WITH ALLUXIO … … FUSE Compatible File System Hadoop Compatible File System Native Key-Value Interface Native File System GlusterFS InterfaceAmazon S3 Interface Swift InterfaceHDFS Interface 3
  7. 7. © 2017 Alluxio Confidential BIG DATA ECOSYSTEM WITH ALLUXIO … … FUSE Compatible File System Hadoop Compatible File System Native Key-Value Interface Native File System Enabling Application to Access Data from any Storage System at Memory-speed GlusterFS InterfaceAmazon S3 Interface Swift InterfaceHDFS Interface 3
  8. 8. © 2017 Alluxio Confidential 4
  9. 9. © 2017 Alluxio Confidential 5
  10. 10. © 2017 Alluxio Confidential FASTEST-GROWING BIG DATA PROJECT 6
  11. 11. © 2017 Alluxio Confidential FASTEST-GROWING BIG DATA PROJECT • Formerly named Tachyon, born in the AMPLab • 500+ contributors from 100+ organizations • Running world’s largest production clusters 6
  12. 12. © 2017 Alluxio Confidential WHY ALLUXIO 7 Co-located compute and data with memory-speed access to data Virtualized across different storage systems under a unified namespace Scale-out architecture File system API, software only
  13. 13. © 2017 Alluxio Confidential ALLUXIO BENEFITS Unification New workflows across any data in any storage system Orders of magnitude improvement in run time Choice in compute and storage – grow each independently, buy only what is needed Performance Flexibility 8
  14. 14. © 2017 Alluxio Confidential ALLUXIO DEPLOYMENTS 9
  15. 15. © 2017 Alluxio Confidential ALLUXIO USE CASES On-Demand Analytics &
 Accelerating I/O to and from remote storage Managing data across disparate storage systems Sharing data across workloads at memory speed 10
  16. 16. © 2017 Alluxio Confidential MANAGE DATA ACROSS STORAGE SYSTEMS “We’ve been running in production for over 9 months, Alluxio’s enabled different applications & frameworks to easily interact with data from different storage systems RESULTS • Data sharing among Spark Streaming, Spark batch and Flink jobs provide efficient data sharing • Improved the performance of their system with 15x – 300x speedups • Tiered storage feature manages storage resources including memory, SSD and disk Qunar uses real-time machine learning for their website ads • 200+ nodes deployment • 6 billion logs (4.5 TB) daily • Mix of Memory + HDD ALLUXIO 11
  17. 17. © 2017 Alluxio Confidential ON-DEMAND ANALYTICS &
 ACCELERATE I/O TO/FROM REMOTE STORAGE “The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds. RESULTS • Data queries are now 30x faster with Alluxio • Alluxio cluster runs stably, providing over 50TB of RAM space • By using Alluxio, batch queries usually lasting over 15 minutes were transformed into an interactive query taking less than 30 seconds PMs run interactive queries to gain insights into their products & business • 200+ nodes deployment • 2+ petabytes of storage • Mix of memory + HDD ALLUXIO Baidu File System 12
  18. 18. © 2017 Alluxio Confidential SHARE DATA ACROSS JOBS @ MEMORY SPEED “Thanks to Alluxio, we now have the raw data immediately available at every iteration & can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity. RESULTS • Barclays workflow iteration time decreased from hours to seconds • Alluxio enabled workflows that were impossible before • By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds Barclays uses query & machine learning to train models for risk management • 6 node deployment • 1TB of storage • Memory only ALLUXIO 13 ALLUXIO Relational Database: Teradata
  19. 19. © 2017 Alluxio Confidential Thank you! Contact: {haoyuan}@alluxio.com or info@alluxio.com 14

×