What’s new in Alluxio 2: from seamless operations to structured data management

Alluxio Online Community Office Hours
Jan 28, 2020

Speakers:
Bin Fan, Alluxio
Calvin Jia, Alluxio

The Alluxio 2.0 release was the biggest update since the birth of the project as “Tachyon” at UC Berkeley’s AMPLab. Drawing on feedback from our open source community and enterprise users, Alluxio 2.0 expands the system in three major directions: improving operability, adding more advanced data management, and re-architecting the system to scale to 1 billion+ files. The system is now cloud native on AWS and Google Cloud, and allows users to deploy natively with Kubernetes (K8s). The new advanced data management features enable data migration and replication across different storage systems.

In this office hour, we introduce what’s new in the Alluxio 2 release and dive deeper into each major direction in which the system has expanded.

In this Office Hour, we will go over:
- Introduction and motivation of focus areas of Alluxio 2
- Overview of cloud native deployment methods
- New data management features
- System scalability improvements

Published in: Software

What’s new in Alluxio 2: from seamless operations to structured data management

  1. What’s new in Alluxio 2. Bin Fan & Calvin Jia, Founding Engineers, Alluxio
  2. Alluxio 2 Directions: Seamless Operations, Advanced Data Management, Hyper-scale Architecture
  3. Seamless Operations
  4. Cloud Native on AWS: AMI, CFT, EMR. Enable one-click deployment of Alluxio on AWS: Alluxio AMI in the Marketplace; Alluxio CloudFormation Template for cluster deployment; AWS EMR with Alluxio via bootstrap script. Diagram labels: Cluster; Presto; Hive; Metadata & Data cache; compute-driven; continuous sync.
  5. Cloud Native on Google Cloud: Dataproc. Enable one-click deployment of Alluxio on Google Cloud: Google Dataproc with Alluxio (init action integration available). Diagram labels: Google Dataproc Cluster; Presto; Hive; Metadata & Data cache; compute-driven; continuous sync.
  6. Native Deployment with Kubernetes. Diagram labels: Kubernetes Cluster; Host Machine; Alluxio Master; Alluxio Worker; Journal Volume; Application.
  7. Self-Managed Quorum (available in 2.0.0). No major external dependencies. Diagram: an Alluxio Master and Standby Masters backed by distributed storage (e.g., HDFS) and a distributed quorum (ZooKeeper), compared with an Alluxio Master and Standby Masters coordinating through RAFT.
  8. Hyper-scale architecture
  9. Scaling to 1 Billion+ Files. Challenge: 1 file's metadata takes 1 KB of on-heap storage; 1 billion files would take 1 TB of heap space, and GC becomes a big problem. Solution: add a new tier with embedded RocksDB to store the inode tree; keep an in-memory cache of frequently used inodes. Scale to one billion files and beyond, with performance comparable to the previous on-heap implementation.
  10. Scaling to 1 Billion+ Files (available in 2.0.0). Alluxio Master, on heap: Inode Cache, Mount Table, Locks. Alluxio Master, local disk, RocksDB (embedded): Inode Table, Edge Table, Block Table, Block to Worker Table, Worker to Block Table. Inode Table maps Inode ID to Metadata (binary): 12392 → 010101101101; 12393 → 110110110100; … Edge Table maps Edge (ID, name) to Inode ID: (12392, foo) → 12393; …
  11. Efficient cluster communication with gRPC (available in 2.0.0). Diagram: Alluxio Client, Alluxio Master, and Alluxio Workers connected by Thrift (metadata) and Netty (IO), compared with the same components connected by gRPC (metadata + IO).
  12. Advanced Data Management
  13. Deeper Integration with Presto. New Alluxio Catalog Service: provides the abstraction of structured data; attaching a Hive Metastore is like mounting a file system; understands and serves the schema of files or objects. New Alluxio Data Transformation Service: transform CSV → Parquet; compact many files → fewer files. Presto Alluxio Connector: based off the Hive Connector. Now available as Developer Preview.
  14. Policy Driven Data Management (available in 2.0.0). Decouple the logical file system namespace from physical storage systems. Example policy (Alluxio Policy Engine, in the Alluxio Master): move files older than 90 days from HDFS to S3. Apps access the same path regardless of where the actual data is stored.
  15. Replicated Asynchronous Writes (available in 2.0.0). Fast and reliable writes to Alluxio, with data persisted in the background. Diagram labels: Application; Alluxio Client; Alluxio Worker (RAM / SSD / HDD), two shown; Under Store; network-speed write of data.
  16. Questions?
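
The heap arithmetic and the two-tier metadata layout from slides 9 and 10 can be sketched roughly as follows. This is only an illustration under stated assumptions, not Alluxio's implementation: plain Python dicts stand in for the embedded RocksDB tables, and `TieredInodeStore`, `heap_bytes`, and every method name are hypothetical.

```python
from collections import OrderedDict

BYTES_PER_INODE = 1_000        # slide 9: about 1 KB of heap per file
NUM_FILES = 1_000_000_000      # one billion files


def heap_bytes(num_files, bytes_per_inode=BYTES_PER_INODE):
    """Heap footprint if every inode were kept in memory."""
    return num_files * bytes_per_inode


class TieredInodeStore:
    """Hypothetical sketch: dicts play the role of the RocksDB tables."""

    ROOT_ID = 0

    def __init__(self, cache_size):
        self.inode_table = {self.ROOT_ID: b""}  # inode id -> metadata (off-heap tier)
        self.edge_table = {}                     # (parent id, name) -> child inode id
        self.cache = OrderedDict()               # bounded LRU cache (on-heap tier)
        self.cache_size = cache_size
        self.backing_reads = 0                   # reads that missed the cache
        self._next_id = 1

    def create(self, parent_id, name, metadata):
        """Add an inode under parent_id and record the edge to it."""
        inode_id = self._next_id
        self._next_id += 1
        self.inode_table[inode_id] = metadata
        self.edge_table[(parent_id, name)] = inode_id
        return inode_id

    def get_inode(self, inode_id):
        """Serve from the cache when possible, else read the backing table."""
        if inode_id in self.cache:
            self.cache.move_to_end(inode_id)     # mark as most recently used
            return self.cache[inode_id]
        self.backing_reads += 1
        metadata = self.inode_table[inode_id]
        self.cache[inode_id] = metadata
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)       # evict the least recently used
        return metadata

    def resolve(self, path):
        """Walk the edge table one path component at a time."""
        inode_id = self.ROOT_ID
        for name in filter(None, path.split("/")):
            inode_id = self.edge_table[(inode_id, name)]
        return inode_id
```

With these assumptions, `heap_bytes(NUM_FILES)` gives 10**12 bytes, the 1 TB figure quoted on slide 9. Keeping only a bounded cache on heap is what lets the inode count grow past what the heap alone could hold.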
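
The example policy on slide 14 (move files older than 90 days from HDFS to S3) might look roughly like this in miniature. It is a sketch, not the Alluxio Policy Engine: `FileEntry` and `apply_age_policy` are hypothetical names, and measuring age by last-modified time is an assumption the slide does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    logical_path: str    # path that applications use; it never changes
    store: str           # physical storage system currently holding the data
    modified: datetime   # assumption: age is measured from last modification


def apply_age_policy(files, now, max_age=timedelta(days=90),
                     source="HDFS", target="S3"):
    """Reassign files older than max_age from source to target.

    Returns the logical paths that were moved. Only the physical store
    changes; the logical path stays the same.
    """
    moved = []
    for entry in files:
        if entry.store == source and now - entry.modified > max_age:
            entry.store = target
            moved.append(entry.logical_path)
    return moved
```

Because only `store` is updated, an application that opens a file by its logical path does not need to know that the data has moved, which is the decoupling the slide describes.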
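
The write path on slide 15 (replicate to workers first, persist to the under store in the background) can be illustrated with a small sketch. It is not Alluxio's client code: `AsyncPersistWriter` is a hypothetical name, dicts stand in for worker storage and the under store, and the default of two replicas is an assumption.

```python
import queue
import threading


class AsyncPersistWriter:
    """Hypothetical sketch of replicated writes with background persistence."""

    def __init__(self, workers, under_store):
        self.workers = workers            # list of dicts standing in for workers
        self.under_store = under_store    # dict standing in for the under store
        self._pending = queue.Queue()     # writes waiting to be persisted
        self._thread = threading.Thread(target=self._persist_loop, daemon=True)
        self._thread.start()

    def write(self, path, data, replicas=2):
        """Copy data to `replicas` workers, then return without waiting
        for the under store."""
        for worker in self.workers[:replicas]:
            worker[path] = data
        self._pending.put((path, data))

    def _persist_loop(self):
        """Background thread: drain pending writes into the under store."""
        while True:
            path, data = self._pending.get()
            self.under_store[path] = data
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the under store."""
        self._pending.join()
```

The caller of `write` pays only for the copies to the workers; the slower under store write happens off the critical path, and the extra replica guards the data until that write completes.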
