SlideShare a Scribd company logo
1 of 19
Download to read offline
Getting Started with
Alluxio Open Source
1
2019/01/28 Office Hour
Follow us | @alluxio
Download Alluxio | www.alluxio.org
Questions? | http://alluxio.org/slack
About Me
• Bin Fan
• PhD CS@CMU
• Founding Engineer@Alluxio
2
Email: binfan@alluxio.com
Github: apc999
Twitter: @binfan
Company
Overview
• Founded Feb. 2015 – Haoyuan Li
• PhD research at UC Berkeley AMPLab
• Initially Tachyon Nexus
• Venture Backed
• Andreessen Horowitz, Seven Seas etc.
• Open Source Business Model
• Tachyon Open Sourced in Dec. 2012
• Open source v1.0 released Feb. 2016
• Commercial product released Oct. 2016
• Office in San Mateo, CA
• Team: Google, Palantir, Vmware, AMD, Cisco…
Data Access Layer
Data Access Layer: Alluxio
Security Standard APIsHigh Performance
Compatibility Decoupling
Transparent
Migration
4
The Data Access Layer
5
• Abstraction layer between applications and storage systems
• Present a stable storage interface to applications, including
semantics, security, and performance
• Eliminate weakness of data silos instead of data silos
themselves
• Enable transparent migration of underlying storage systems
• Enable application API to storage API translation in a single
layer
Alluxio
6
• Our implementation of the data access layer – a virtual
distributed file system
• Open source project with over 900 contributors from 100s of
organizations worldwide
• Deployed in many top internet and financial companies
100+ Known Production Deployments
AND MORE!
11/16/18 7
Install Alluxio
8
- Install Alluxio using brew on Local MacOS
- brew install alluxio
- Install Alluxio using docker on Local Linux
- http://www.alluxio.org/docs/1.8/en/deploy/Running-Alluxio-On-Docker.ht
ml
Alluxio Architecture
Alluxio
Master
Zookeeper
Standby
Master
Alluxio
Worker
Alluxio
Worker Under Store
RAM / SSD / HDD
RAM / SSD / HDD
Control Path
Data Path
9
Read Data not Cached in Alluxio + Caching
10
RAM / SSD / HDD
Application
Alluxio
Client
Alluxio
WorkerUnder Store 12
3
4
4
Read Cached Data in Alluxio
Alluxio
Worker
RAM / SSD / HDD
Application
Alluxio
Client
11
1
2
3
Write data only to Alluxio
Alluxio
Worker
RAM / SSD / HDD
Application
Alluxio
Client
12
1
2
3
Write to Alluxio and Under Store Synchronously
RAM / SSD / HDD
Application
Alluxio
Client
Alluxio
Worker
Under Store
13
12
2
3
A Common File System Abstraction
14
• Common interface across apps
• HDFS-compatible interface:
change hdfs://foo/bar to
alluxio://foo/bar
• Other interfaces: Native Alluxio Java
FS, POSIX and S3.
• Cloud storage becomes “hidden”
to apps
• Less vendor lock-in!
Compute Zone
Standalone or managed with Mesos or Yarn
Storage in Different Availability Zone
Either on-prem or cloud
TensorflowPrestoMR
HDFS API POSIX API
Data Path: Improved I/O Performance
15
• A New Tier Above Cloud Storage for Compute
• Distributed buffer cache
• Restore locality to compute
• Read:
• Cache-hit read: served by Alluxio workers (local worker preferred)
• Cache-miss read: served by cloud storage, then cache to Alluxio worker
• Write:
• Burst buffer, then async propagate to S3 (Alluxio 2.0)
• Challenges:
• Locality: expose location information to applications; serve local apps
through ramdisk (rather than network)
Data Path: Async Persist to S3 (Alluxio 2.0)
16
RAM / SSD / HDD
Application
Alluxio
Client
Alluxio
Master
Alluxio
Worker
Under Store
• Async Writes
• Step1: App writes to Alluxio
• Step2: Alluxio writes to UFS
• Benefits
• Apps writes in Alluxio speed
• Data gets persisted
• Challenges
• File rename/delete before
persist: 2PC
• Fault-tolerance: journal async
requests
Metadata Path: Familiar Semantics
17
• Listing / renaming on object store can be expensive
• Common operations for batch or SQL analytics
• Overwriting Put is eventually consistent
• Alluxio loads and manages metadata in master
• Apps can continue assuming HDFS-like semantics and performance
implication
• Challenges
• Data modification bypassing Alluxio: when and how to re-sync
• Slow lists in object store: batch operations
• Too many objects: off-heap metadata (Alluxio 2.0)
- Alluxio: A New Data Access Layer
- Between compute and storage
- Transparent to bigdata analytics (HDFS-compatible, POSIX)
- Improve data and metadata performance on cloud storage
- Architecture and Data Flow
- Master, Worker, Under Storage
- Cache-{hit, miss} reads, Sync/Async writes
- More Use Cases:
https://www.alluxio.org/community/powered-by-alluxio
- Send your mailing address to mel@alluxio.com to get the
Tshirt!
Summary
18
zhuanlan.zhihu.com/alluxio
www.alluxio.com
info@alluxio.com
twitter.com/alluxio
linkedIn.com/alluxio
Thank you
binfan@alluxio.com

More Related Content

What's hot

What's hot (20)

Choosing a dev ops paas platform svccd presentation v2 for slideshare
Choosing a dev ops paas platform svccd presentation v2 for slideshareChoosing a dev ops paas platform svccd presentation v2 for slideshare
Choosing a dev ops paas platform svccd presentation v2 for slideshare
 
Empowering Publishers - Hosting Provider Selection Process - May-15-2013
Empowering Publishers - Hosting Provider Selection Process - May-15-2013Empowering Publishers - Hosting Provider Selection Process - May-15-2013
Empowering Publishers - Hosting Provider Selection Process - May-15-2013
 
Got Shadow IT? How to Win-Win with a Private Cloud.
Got Shadow IT? How to Win-Win with a Private Cloud.Got Shadow IT? How to Win-Win with a Private Cloud.
Got Shadow IT? How to Win-Win with a Private Cloud.
 
The Pivotal Engineering Dojo: Earning Your Black Belt in Cloud Foundry Engine...
The Pivotal Engineering Dojo: Earning Your Black Belt in Cloud Foundry Engine...The Pivotal Engineering Dojo: Earning Your Black Belt in Cloud Foundry Engine...
The Pivotal Engineering Dojo: Earning Your Black Belt in Cloud Foundry Engine...
 
SoftNAS Cloud for AWS
SoftNAS Cloud for AWSSoftNAS Cloud for AWS
SoftNAS Cloud for AWS
 
Managing vSphere Across Multiple Regions and Multiple vCenters
Managing vSphere Across Multiple Regions and Multiple vCenters Managing vSphere Across Multiple Regions and Multiple vCenters
Managing vSphere Across Multiple Regions and Multiple vCenters
 
The twelve factor app
The twelve factor appThe twelve factor app
The twelve factor app
 
70-533 -- Course Introduction
70-533 -- Course Introduction70-533 -- Course Introduction
70-533 -- Course Introduction
 
Java in the Cloud : PaaS Platforms in Comparison
Java in the Cloud : PaaS Platforms in Comparison Java in the Cloud : PaaS Platforms in Comparison
Java in the Cloud : PaaS Platforms in Comparison
 
SYMANTEC Backup Exec 2014 - infographic
SYMANTEC Backup Exec 2014 - infographicSYMANTEC Backup Exec 2014 - infographic
SYMANTEC Backup Exec 2014 - infographic
 
INFOGRAPHIC: #BackupExec 2014 - Backup Anything. Restore Anywhere.
INFOGRAPHIC: #BackupExec 2014 - Backup Anything. Restore Anywhere.INFOGRAPHIC: #BackupExec 2014 - Backup Anything. Restore Anywhere.
INFOGRAPHIC: #BackupExec 2014 - Backup Anything. Restore Anywhere.
 
Getting Started with Apache CloudStack
Getting Started with Apache CloudStackGetting Started with Apache CloudStack
Getting Started with Apache CloudStack
 
Roll your own FOSS cloud hosting
Roll your own FOSS cloud hostingRoll your own FOSS cloud hosting
Roll your own FOSS cloud hosting
 
Jelastic Enterprise
Jelastic EnterpriseJelastic Enterprise
Jelastic Enterprise
 
Enterprise grade-deployment-2019
Enterprise grade-deployment-2019Enterprise grade-deployment-2019
Enterprise grade-deployment-2019
 
Drupal Solutions Comparison For Multiple Sites With Related Content - Acquia ...
Drupal Solutions Comparison For Multiple Sites With Related Content - Acquia ...Drupal Solutions Comparison For Multiple Sites With Related Content - Acquia ...
Drupal Solutions Comparison For Multiple Sites With Related Content - Acquia ...
 
Cassandra summit 2015 - Simplifying Streaming Analytics
Cassandra summit 2015 - Simplifying Streaming AnalyticsCassandra summit 2015 - Simplifying Streaming Analytics
Cassandra summit 2015 - Simplifying Streaming Analytics
 
Mulesoft cloudhub
Mulesoft cloudhubMulesoft cloudhub
Mulesoft cloudhub
 
Meta cloud architecture for the mobile agile enterprise
Meta cloud architecture for the mobile agile enterpriseMeta cloud architecture for the mobile agile enterprise
Meta cloud architecture for the mobile agile enterprise
 
Dcpl cloud computing amazon fail
Dcpl cloud computing amazon failDcpl cloud computing amazon fail
Dcpl cloud computing amazon fail
 

Similar to Alluxio Community Office Hour: Getting Started with Alluxio Open Source

Similar to Alluxio Community Office Hour: Getting Started with Alluxio Open Source (20)

Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any Cloud
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Running Machine Learning Workloads with Tensorflow, Alluxio and AWS S3
Running Machine Learning Workloads with Tensorflow, Alluxio and AWS S3Running Machine Learning Workloads with Tensorflow, Alluxio and AWS S3
Running Machine Learning Workloads with Tensorflow, Alluxio and AWS S3
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and Cloud
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
 
Speeding up I/O for Machine Learning ft Apple Case Study using TensorFlow, N...
Speeding up I/O for Machine Learning  ft Apple Case Study using TensorFlow, N...Speeding up I/O for Machine Learning  ft Apple Case Study using TensorFlow, N...
Speeding up I/O for Machine Learning ft Apple Case Study using TensorFlow, N...
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed StorageAlluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Cloud for agile_sw_projects-final
Cloud for agile_sw_projects-finalCloud for agile_sw_projects-final
Cloud for agile_sw_projects-final
 
Accelerating Spark with Kubernetes
Accelerating Spark with KubernetesAccelerating Spark with Kubernetes
Accelerating Spark with Kubernetes
 
Building Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, AlluxioBuilding Fast SQL Analytics on Anything with Presto, Alluxio
Building Fast SQL Analytics on Anything with Presto, Alluxio
 
Accelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAccelerate Cloud Training with Alluxio
Accelerate Cloud Training with Alluxio
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
 

More from Alluxio, Inc.

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 

Alluxio Community Office Hour: Getting Started with Alluxio Open Source

  • 1. Getting Started with Alluxio Open Source 1 2019/01/28 Office Hour Follow us | @alluxio Download Alluxio | www.alluxio.org Questions? | http://alluxio.org/slack
  • 2. About Me • Bin Fan • PhD CS@CMU • Founding Engineer@Alluxio 2 Email: binfan@alluxio.com Github: apc999 Twitter: @binfan
  • 3. Company Overview • Founded Feb. 2015 – Haoyuan Li • PhD research at UC Berkeley AMPLab • Initially Tachyon Nexus • Venture Backed • Andreessen Horowitz, Seven Seas etc. • Open Source Business Model • Tachyon Open Sourced in Dec. 2012 • Open source v1.0 released Feb. 2016 • Commercial product released Oct. 2016 • Office in San Mateo, CA • Team: Google, Palantir, Vmware, AMD, Cisco…
  • 4. Data Access Layer Data Access Layer: Alluxio Security Standard APIsHigh Performance Compatibility Decoupling Transparent Migration 4
  • 5. The Data Access Layer 5 • Abstraction layer between applications and storage systems • Present a stable storage interface to applications, including semantics, security, and performance • Eliminate weakness of data silos instead of data silos themselves • Enable transparent migration of underlying storage systems • Enable application API to storage API translation in a single layer
  • 6. Alluxio 6 • Our implementation of the data access layer – a virtual distributed file system • Open source project with over 900 contributors from 100s of organizations worldwide • Deployed in many top internet and financial companies
  • 7. 100+ Known Production Deployments AND MORE! 11/16/18 7
  • 8. Install Alluxio 8 - Install Alluxio using brew on Local MacOS - brew install alluxio - Install Alluxio using docker on Local Linux - http://www.alluxio.org/docs/1.8/en/deploy/Running-Alluxio-On-Docker.ht ml
  • 9. Alluxio Architecture Alluxio Master Zookeeper Standby Master Alluxio Worker Alluxio Worker Under Store RAM / SSD / HDD RAM / SSD / HDD Control Path Data Path 9
  • 10. Read Data not Cached in Alluxio + Caching 10 RAM / SSD / HDD Application Alluxio Client Alluxio WorkerUnder Store 12 3 4 4
  • 11. Read Cached Data in Alluxio Alluxio Worker RAM / SSD / HDD Application Alluxio Client 11 1 2 3
  • 12. Write data only to Alluxio Alluxio Worker RAM / SSD / HDD Application Alluxio Client 12 1 2 3
  • 13. Write to Alluxio and Under Store Synchronously RAM / SSD / HDD Application Alluxio Client Alluxio Worker Under Store 13 12 2 3
  • 14. A Common File System Abstraction 14 • Common interface across apps • HDFS-compatible interface: change hdfs://foo/bar to alluxio://foo/bar • Other interfaces: Native Alluxio Java FS, POSIX and S3. • Cloud storage becomes “hidden” to apps • Less vendor lock-in! Compute Zone Standalone or managed with Mesos or Yarn Storage in Different Availability Zone Either on-prem or cloud TensorflowPrestoMR HDFS API POSIX API
  • 15. Data Path: Improved I/O Performance 15 • A New Tier Above Cloud Storage for Compute • Distributed buffer cache • Restore locality to compute • Read: • Cache-hit read: served by Alluxio workers (local worker preferred) • Cache-miss read: served by cloud storage, then cache to Alluxio worker • Write: • Burst buffer, then async propagate to S3 (Alluxio 2.0) • Challenges: • Locality: expose location information to applications; serve local apps through ramdisk (rather than network)
  • 16. Data Path: Async Persist to S3 (Alluxio 2.0) 16 RAM / SSD / HDD Application Alluxio Client Alluxio Master Alluxio Worker Under Store • Async Writes • Step1: App writes to Alluxio • Step2: Alluxio writes to UFS • Benefits • Apps writes in Alluxio speed • Data gets persisted • Challenges • File rename/delete before persist: 2PC • Fault-tolerance: journal async requests
  • 17. Metadata Path: Familiar Semantics 17 • Listing / renaming on object store can be expensive • Common operations for batch or SQL analytics • Overwriting Put is eventually consistent • Alluxio loads and manages metadata in master • Apps can continue assuming HDFS-like semantics and performance implication • Challenges • Data modification bypassing Alluxio: when and how to re-sync • Slow lists in object store: batch operations • Too many objects: off-heap metadata (Alluxio 2.0)
  • 18. - Alluxio: A New Data Access Layer - Between compute and storage - Transparent to bigdata analytics (HDFS-compatible, POSIX) - Improve data and metadata performance on cloud storage - Architecture and Data Flow - Master, Worker, Under Storage - Cache-{hit, miss} reads, Sync/Async writes - More Use Cases: https://www.alluxio.org/community/powered-by-alluxio - Send your mailing address to mel@alluxio.com to get the Tshirt! Summary 18