
Alluxio Presentation at Strata San Jose 2016



  1. Alluxio (formerly Tachyon): Unified Namespace and Tiered Storage
     Calvin Jia, Jiri Simsa
  2. One of the Things to Watch at Strata
     TechCrunch article: “… An interesting item that made the top terms list is “alluxio,” which is the recently renamed Tachyon project. Alluxio is a virtual distributed storage system, and it has a memory-centric architecture that enables data sharing across clusters at memory speed. …”
  3. Who Are We?
     • Calvin Jia: SWE @ Alluxio, Inc.; #1 Alluxio contributor; Twitter: @JiaCalvin
     • Jiri Simsa: SWE @ Alluxio, Inc.; CMU Ph.D. & Google; Twitter: @jsimsa
  4. Alluxio Inc.
     • Founded by Alluxio creators and top committers
     • Formerly Tachyon Nexus, Inc.
     • $7.5 million Series A by Andreessen Horowitz
     • Committed to the Alluxio Open Source Project
     • Company Website: http://www.alluxio.com
  5. Outline
     • Alluxio Introduction
     • Tiered Storage
     • Unified Namespace
  6. ALLUXIO: Open Source Memory Speed Virtual Distributed Storage
  7. What the name means
     • Memory Speed: Memory-centric architecture designed for memory I/O
     • Virtual: Abstracts persistent storage from applications
     • Distributed: Designed to scale with nothing but commodity hardware
     • Open Source: One of the fastest growing project communities
  8. Contributor Growth
     • Over 200 Contributors (3x growth over the last year)
  9. Organizations
     • Over 50 Organizations
  10. Alluxio Ecosystem
  11. Memory is Getting Faster
  12. Memory is Getting Cheaper
  13. Simple Examples
     • Data sharing between frameworks
     • Data resilience during application crashes
     • Consolidate memory usage and alleviate GC issues
  14. Data Sharing Between Frameworks
     [Diagram labels: Spark Job with Spark Memory (block 1, block 3); Hadoop MR Job on YARN; HDFS / Amazon S3 (blocks 1 to 4); storage engine & execution engine in the same process]
     • Inter-process sharing slowed down by network and/or disk I/O
  15. Data Sharing Between Frameworks
     [Diagram labels: Spark Job with Spark Memory; Hadoop MR Job on YARN; Alluxio In-Memory (block 1, block 3, block 4); HDFS / Amazon S3 and HDFS disk (blocks 1 to 4); storage engine & execution engine in the same process]
     • Inter-process sharing can happen at memory speed
  16. Data Resilience during Crashes
     [Diagram labels: Spark Task; Spark Memory block manager (block 1, block 3); HDFS / Amazon S3 (blocks 1 to 4); storage engine & execution engine in the same process]
     • Process crash requires network and/or disk I/O to re-read the data
  17. Data Resilience during Crashes
     [Diagram labels: Crash; Spark Memory block manager (block 1, block 3); HDFS / Amazon S3 (blocks 1 to 4); storage engine & execution engine in the same process]
     • Process crash requires network and/or disk I/O to re-read the data
  18. Data Resilience during Crashes
     [Diagram labels: Crash; HDFS / Amazon S3 (blocks 1 to 4); storage engine & execution engine in the same process]
     • Process crash requires network and/or disk I/O to re-read the data
  19. Data Resilience during Crashes
     [Diagram labels: Spark Task; Spark Memory block manager; Alluxio In-Memory (block 1, block 3, block 4); HDFS disk (blocks 1 to 4); storage engine & execution engine in the same process]
     • Process crash only needs memory I/O to re-read the data
  20. Data Resilience during Crashes
     [Diagram labels: Crash; Alluxio In-Memory (block 1, block 3, block 4); HDFS disk (blocks 1 to 4); storage engine & execution engine in the same process]
     • Process crash only needs memory I/O to re-read the data
  21. Consolidating Memory
     [Diagram labels: Spark Job1 with Spark Memory (block 1, block 3); Spark Job2 with Spark Memory (block 3, block 1); HDFS / Amazon S3 (blocks 1 to 4); storage engine & execution engine in the same process]
     • Data duplicated at memory-level
  22. Consolidating Memory
     [Diagram labels: Spark Job1 with Spark mem; Spark Job2 with Spark mem; Alluxio In-Memory (block 1, block 3, block 4); HDFS / Amazon S3 and HDFS disk (blocks 1 to 4); storage engine & execution engine in the same process]
     • Data not duplicated at memory-level
  23. Case Study: Barclays
     “Making the Impossible Possible with Tachyon: Accelerate Spark Jobs from Hours to Seconds”
     • Application: SparkSQL + Spark RDDs
     • Alluxio Storage Layer: MEM
     • Backend Storage: None
     • Result: Speeding up Spark jobs from hours to seconds
  24. Common Questions
     • Memory speed sharing among distributed applications: HDFS interface compatible
     • GC overhead introduced by in-memory caching: Off-Heap Memory Management
     • Data set could be larger than available memory: Tiered storage
  25. Outline
     • Alluxio Introduction
     • Tiered Storage
     • Unified Namespace
  26. Motivation
     • Memory resources are still constrained
     • Alluxio data management logic is not limited to memory
     • Storage resources available on compute clusters
  27. Tiered Storage
     [Diagram labels: MEM, SSD, HDD]
  28. Tiered Storage
     • Extends Alluxio with support for SSD and/or HDD storage
     • Different tiers have different characteristics
       – Keep hot data in fast but limited storage
       – Keep warm data in slower but abundant storage
     • Workers manage their own storage
     • Data allocation and eviction are driven by application access
  29. Tiered Storage Architecture
     • Machine Type 1: Compute Client; Alluxio Master; Memory, SSD, HDD
     • Machine Type 2: Compute Client; Alluxio Worker; Memory, SSD, HDD
  30. Tiered Storage Architecture
     • Machine Type 2
       – Compute Client: Alluxio Client
       – Alluxio Worker: Tiered Block Store, Evictor, Allocator
       – Memory, SSD, HDD
  31. Automatic Data Migration
     • Data can be evicted to lower layers if it is “cooling down”
     • Data can be promoted to upper layers if it is “warming up”
     [Diagram labels: Evict stale data to lower tier; Promote hot data to upper tier]
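The migration behaviour on the slide above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Alluxio's implementation: the `TieredStore` class, its methods, the block-count capacities, and the choice of least-recently-used eviction are all hypothetical and chosen for illustration. Blocks are written to the top tier, the stalest block is pushed one tier down when a tier overflows, and a read promotes the block back to the top tier.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered block store (hypothetical, not Alluxio's API).

    Tier 0 is the fastest tier (e.g. MEM); the last tier is the slowest (e.g. HDD).
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.names = [name for name, _ in capacities]
        self.limits = [limit for _, limit in capacities]
        # One OrderedDict per tier; insertion order doubles as recency order.
        self.tiers = [OrderedDict() for _ in capacities]

    def _insert(self, level, block_id, data):
        """Place a block in a tier; on overflow, evict the stalest block downward."""
        if level >= len(self.tiers):
            return  # Fell off the lowest tier: the block leaves the store.
        tier = self.tiers[level]
        tier[block_id] = data
        tier.move_to_end(block_id)
        if len(tier) > self.limits[level]:
            stale_id, stale_data = tier.popitem(last=False)
            self._insert(level + 1, stale_id, stale_data)

    def write(self, block_id, data):
        """Write a block into the top tier, replacing any older copy."""
        for tier in self.tiers:
            tier.pop(block_id, None)
        self._insert(0, block_id, data)

    def read(self, block_id):
        """Return the block's data and promote it to the top tier, or None if absent."""
        for tier in self.tiers:
            if block_id in tier:
                data = tier.pop(block_id)
                self._insert(0, block_id, data)
                return data
        return None

    def location(self, block_id):
        """Name of the tier currently holding the block, or None."""
        for name, tier in zip(self.names, self.tiers):
            if block_id in tier:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ("block1", "block2", "block3"):
    store.write(block, b"payload")
print(store.location("block1"))  # block1 cooled down and was evicted to SSD
store.read("block1")
print(store.location("block1"))  # block1 warmed up and was promoted to MEM
```

Counting capacity in blocks rather than bytes keeps the sketch short; a real worker would track byte quotas per storage directory.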
  32. Pluggable Policies
     • Policies can be customized to suit workloads
     • Defaults provided for general scenarios
     • Advanced users can optimize with additional knowledge
       – For example: Optimize for iterations
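To make the "optimize for iterations" remark concrete, here is a minimal simulation, assuming a policy is just a function that picks which cached block to evict. The function names (`lru_policy`, `mru_policy`, `count_hits`) are hypothetical and are not Alluxio's evictor interface. The workload is an iterative job that scans four blocks repeatedly with room for only three; for such a cyclic scan, least-recently-used eviction always throws out the block needed next, while a most-recently-used policy keeps part of the working set cached.

```python
def lru_policy(cached, last_used):
    """Evict the block whose most recent access is the oldest."""
    return min(cached, key=lambda block: last_used[block])


def mru_policy(cached, last_used):
    """Evict the block that was accessed most recently."""
    return max(cached, key=lambda block: last_used[block])


def count_hits(accesses, capacity, policy):
    """Replay an access sequence against a cache of `capacity` blocks."""
    cached = set()
    last_used = {}
    hits = 0
    for step, block in enumerate(accesses):
        if block in cached:
            hits += 1
        else:
            if len(cached) >= capacity:
                cached.remove(policy(cached, last_used))
            cached.add(block)
        last_used[block] = step
    return hits


# Three iterations over four blocks, with space for three blocks.
scan = ["block1", "block2", "block3", "block4"] * 3
print("LRU hits:", count_hits(scan, 3, lru_policy))  # 0
print("MRU hits:", count_hits(scan, 3, mru_policy))  # 6
```

The policy is passed in as a plain function, so swapping the default for a workload-specific one does not touch the cache logic.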
  33. Case Study: Baidu
     “Baidu Queries Data 30 Times Faster with Alluxio”
     • Application: Spark
     • Alluxio Storage: MEM + HDD
     • Backend Storage: Baidu’s File System
     • 200+ node deployment, 2PB+ managed space
     • Result: Speeding up data querying by 30x
  34. Outline
     • About Alluxio
     • Tiered Storage
     • Unified Namespace
  35. Big Data Ecosystem
  36. Big Data Ecosystem
  37. Big Data Ecosystem
  38. Motivation
     • At large organizations, data spans many storage systems (object storage, network / distributed file systems, DBs)
     • Application logic needs to integrate with different types of storage systems
     • Data needs to be moved around to work around application limitations
     • In-house storage layers are built to address limitations of legacy storage systems
  39. Transparent Naming
     • Applications can transparently and efficiently interact with remote storage through Alluxio.
     • Applications do not need to use different APIs for interacting with different storage systems.
     [Diagram labels: Alluxio, alluxio://host:port/ with data/users/{alice, bob} and data/reports/sales; Storage System, s3n://bucket/directory with data/users/{alice, bob} and data/reports/sales]
  40. Single Namespace
     • Applications can read and write different storage systems.
     • Decouples data location from application
     [Diagram labels: Alluxio, alluxio://host:port/ with data/users/{alice, bob} and data/reports/sales; Storage System A, hdfs://host:port/ with users/{alice, bob}; Storage System B, s3n://bucket/directory with reports/sales]
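One way to picture the single namespace is as a mount table that maps Alluxio path prefixes to locations in the underlying storage systems. The sketch below is an illustration only: the `Namespace` class and its `mount` and `resolve` methods are hypothetical names, not Alluxio's client API, and the mount points are an assumed reading of the diagram on the slide above (the `host:port` and `bucket/directory` placeholders are taken from the slide as-is).

```python
class Namespace:
    """Toy mount table mapping Alluxio paths to under-storage URIs (hypothetical)."""

    def __init__(self):
        self.mounts = {}  # Alluxio path prefix -> under-storage URI

    def mount(self, alluxio_path, ufs_uri):
        """Attach an under-storage location at a path in the Alluxio namespace."""
        prefix = alluxio_path.rstrip("/") or "/"
        self.mounts[prefix] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate an Alluxio path using the longest matching mount prefix."""
        best = None
        for prefix in self.mounts:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        rest = path[len(best):].lstrip("/")
        base = self.mounts[best]
        return f"{base}/{rest}" if rest else base


ns = Namespace()
ns.mount("/data/users", "hdfs://host:port/users")
ns.mount("/data/reports", "s3n://bucket/directory/reports")

# The application only ever sees Alluxio paths.
print(ns.resolve("/data/users/alice"))    # hdfs://host:port/users/alice
print(ns.resolve("/data/reports/sales"))  # s3n://bucket/directory/reports/sales
```

Because the application addresses `/data/...` only, a dataset could be moved from one storage system to another by changing a mount entry rather than application code, which is the decoupling the slide describes.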
  41. Architecture
     [Diagram labels: Alluxio Interface; ALLUXIO; UFS Interface; HDFS adapter, S3 adapter, Swift adapter; HDFS, S3, Swift, …]
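The adapter layering in this diagram can be sketched as one abstract interface with a concrete adapter per storage system, selected by URI scheme. Everything below is an assumption made for illustration: `UnderStorage`, `InMemoryAdapter`, `ADAPTERS`, and `adapter_for` are hypothetical names, and the adapters keep bytes in a dictionary instead of talking to a real HDFS, S3, or Swift service.

```python
from abc import ABC, abstractmethod


class UnderStorage(ABC):
    """Hypothetical common interface that every storage adapter implements."""

    @abstractmethod
    def write(self, uri, data):
        """Store bytes at the given URI."""

    @abstractmethod
    def read(self, uri):
        """Return the bytes stored at the given URI."""


class InMemoryAdapter(UnderStorage):
    """Stand-in for a real adapter; keeps objects in a dict for demonstration."""

    def __init__(self, label):
        self.label = label
        self.objects = {}

    def write(self, uri, data):
        self.objects[uri] = data

    def read(self, uri):
        return self.objects[uri]


# One adapter instance per URI scheme (the schemes here are illustrative).
ADAPTERS = {
    "hdfs": InMemoryAdapter("HDFS adapter"),
    "s3n": InMemoryAdapter("S3 adapter"),
    "swift": InMemoryAdapter("Swift adapter"),
}


def adapter_for(uri):
    """Pick the adapter registered for the URI's scheme."""
    scheme = uri.split("://", 1)[0]
    if scheme not in ADAPTERS:
        raise ValueError(f"no adapter registered for scheme {scheme!r}")
    return ADAPTERS[scheme]


uri = "s3n://bucket/directory/reports/sales"
adapter_for(uri).write(uri, b"q1,q2,q3")
print(adapter_for(uri).label, adapter_for(uri).read(uri))
```

Supporting another storage system in this model means registering one more adapter; callers of `adapter_for` stay unchanged.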
  42. Alluxio Benefits
     • Enable new workloads across storage systems
     • Work with the framework of your choice
     • Scale storage and compute independently
  43. Resources
     • Alluxio Project: http://www.alluxio.org
     • Development: https://github.com/Alluxio/alluxio
     • Meet Friends: http://www.meetup.com/Alluxio
     • Alluxio Inc: http://www.alluxio.com
     • Contact us: info@alluxio.com
