SlideShare a Scribd company logo
1 of 12
Download to read offline
Open Source Memory-Speed Virtual Distributed Storage
August 2016 @ Strata+Hadoop World Beijing
Haoyuan Li
BIG DATA ECOSYSTEM YESTERDAY
2
BIG DATA ECOSYSTEM TODAY = COMPLEXITY
…
…
3
FUSE Compatible File System
Hadoop Compatible File System
 Native Key-Value Interface
Native File System
Enabling any application to
access data from any storage
system at memory-speed
BIG DATA ECOSYSTEM TODAY WITH ALLUXIO

…
…
4
HARDWARE TRENDS
RAM throughput
increasing
exponentially

Disk throughput
increasing slowly
Memory prices
halving every 18
months

Memory Locality is
the Key!
5
Non-persistent
data-storage
software
WHAT IS ALLUXIO
Memory-Speed
 Virtual
 Distributed
 Storage
Scale out
architecture
Virtualized across different
storage types under a unified
namespace
Memory-speed
access to data
6
BENEFITS
Performance
High
performance
data access
Unification
New workflows
across existing
data and access
data in any
storage system
Flexibility
Work with the
compute and
storage
frameworks of
your choice
Cost
Grow compute
and storage
independently
7
CASE STUDY
Baidu File System
The performance was amazing. With
Spark SQL alone, it took 100-150 seconds
to finish a query; using Alluxio, where data
may hit local or remote Alluxio nodes, it
took 10-15 seconds.

- Shaoshan Liu, Baidu
RESULTS
•  Data queries are now 30x faster with Alluxio
•  1000-worker Alluxio cluster run stably,
providing over 50TB of RAM space
•  By using Alluxio, batch queries usually lasting
over 15 minutes were transformed into an
interactive query taking less than 30 seconds
Accelerate Interact with Remote
Storage
•  200+ nodes deployment
•  2+ petabytes of storage
•  Mix of memory + HDD
•  1000-worker cluster
8
CASE STUDY
Thanks to Alluxio, we now have the raw
data immediately available at every
iteration and we can skip the costs of
loading in terms of time waiting, network
traffic, and RDBMS activity.

- Henry Powell, Barclays
RESULTS
•  Barclays workflow iteration time decreased
from hours to seconds
•  Alluxio enabled workflows that were
impossible before
•  By keeping data only in memory, the I/O cost of
loading and storing in Alluxio is now on the
order of seconds
Relational Database
Share Data Across Jobs at Memory-
Speed
•  Reducing time from hours to
seconds
•  Solving issues of Teradata as
under storage
9
CASE STUDY
We’ve been running Alluxio in production
for over 9 months, resulting in 15x
speedup on average, and 300x speedup at
peak service times.

- Xueyan Li, Qunar
RESULTS
•  Alluxio’s unified namespace enables different
applications and frameworks to easily interact
with their data from different storage systems
•  Improved the performance of their system with
15x – 300x speedups
•  Tiered storage feature manages various
storage resources including memory, SSD and
disk
Transparently Manage Data Across
Different Storage Systems
•  6 billion logs (4.5 TB) daily
•  Mix of Memory + HDD
10
OPEN SOURCE ALLUXIO
11
Contact us anytime at info@alluxio.com. 
Follow us on Twitter @Alluxio.
Check out www.alluxio.com and www.alluxio.org.

Thank you!
12

More Related Content

What's hot

What's hot (20)

Alluxio (formerly Tachyon): The Journey thus far and the Road Ahead
Alluxio (formerly Tachyon): The Journey thus far and the Road AheadAlluxio (formerly Tachyon): The Journey thus far and the Road Ahead
Alluxio (formerly Tachyon): The Journey thus far and the Road Ahead
 
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
 
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016
 
Alluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for Dask
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
 
The Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersThe Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand Clusters
 
Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com'...
Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com'...Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com'...
Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com'...
 
Tachyon workshop 2015-07-19
Tachyon workshop 2015-07-19Tachyon workshop 2015-07-19
Tachyon workshop 2015-07-19
 
Hybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxioHybrid collaborative tiered storage with alluxio
Hybrid collaborative tiered storage with alluxio
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Accelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAccelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with Alluxio
 

Viewers also liked

232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현
NAVER D2
 

Viewers also liked (6)

232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현
 
Play node conference
Play node conferencePlay node conference
Play node conference
 
NODE.JS 글로벌 기업 적용 사례 그리고, real-time 어플리케이션 개발하기
NODE.JS 글로벌 기업 적용 사례  그리고, real-time 어플리케이션 개발하기NODE.JS 글로벌 기업 적용 사례  그리고, real-time 어플리케이션 개발하기
NODE.JS 글로벌 기업 적용 사례 그리고, real-time 어플리케이션 개발하기
 
Node.js in Flitto
Node.js in FlittoNode.js in Flitto
Node.js in Flitto
 
시간당 수백만 요청을 처리하는 node.js 서버 운영기 - Playnode 2015
시간당 수백만 요청을 처리하는 node.js 서버 운영기 - Playnode 2015시간당 수백만 요청을 처리하는 node.js 서버 운영기 - Playnode 2015
시간당 수백만 요청을 처리하는 node.js 서버 운영기 - Playnode 2015
 
Java/Spring과 Node.js의공존
Java/Spring과 Node.js의공존Java/Spring과 Node.js의공존
Java/Spring과 Node.js의공존
 

Similar to Alluxio Keynote at Strata+Hadoop World Beijing 2016

Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Community
 
Hitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-family
Hitachi Vantara
 

Similar to Alluxio Keynote at Strata+Hadoop World Beijing 2016 (20)

Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
 
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud World
 
#MFSummit2016 Operate: The race for space
#MFSummit2016 Operate: The race for space#MFSummit2016 Operate: The race for space
#MFSummit2016 Operate: The race for space
 
Workload Centric Scale-Out Storage for Next Generation Datacenter
Workload Centric Scale-Out Storage for Next Generation DatacenterWorkload Centric Scale-Out Storage for Next Generation Datacenter
Workload Centric Scale-Out Storage for Next Generation Datacenter
 
Webinar: NAS vs Object - Can NAS Make a Comeback?
Webinar: NAS vs Object - Can NAS Make a Comeback?Webinar: NAS vs Object - Can NAS Make a Comeback?
Webinar: NAS vs Object - Can NAS Make a Comeback?
 
SoftLayer Storage Services Overview
SoftLayer Storage Services OverviewSoftLayer Storage Services Overview
SoftLayer Storage Services Overview
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Hitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-family
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Alluxio: Unify Data at Memory Speed
Alluxio: Unify Data at Memory SpeedAlluxio: Unify Data at Memory Speed
Alluxio: Unify Data at Memory Speed
 
Open Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and CloudOpen Source Data Orchestration for AI, Big Data, and Cloud
Open Source Data Orchestration for AI, Big Data, and Cloud
 

More from Alluxio, Inc.

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 

Alluxio Keynote at Strata+Hadoop World Beijing 2016

  • 1. Open Source Memory-Speed Virtual Distributed Storage August 2016 @ Strata+Hadoop World Beijing Haoyuan Li
  • 2. BIG DATA ECOSYSTEM YESTERDAY 2
  • 3. BIG DATA ECOSYSTEM TODAY = COMPLEXITY … … 3
  • 4. FUSE Compatible File System Hadoop Compatible File System Native Key-Value Interface Native File System Enabling any application to access data from any storage system at memory-speed BIG DATA ECOSYSTEM TODAY WITH ALLUXIO … … 4
  • 5. HARDWARE TRENDS RAM throughput increasing exponentially Disk throughput increasing slowly Memory prices halving every 18 months Memory Locality is the Key! 5
  • 6. Non-persistent data-storage software WHAT IS ALLUXIO Memory-Speed Virtual Distributed Storage Scale out architecture Virtualized across different storage types under a unified namespace Memory-speed access to data 6
  • 7. BENEFITS Performance High performance data access Unification New workflows across existing data and access data in any storage system Flexibility Work with the compute and storage frameworks of your choice Cost Grow compute and storage independently 7
  • 8. CASE STUDY Baidu File System The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds. - Shaoshan Liu, Baidu RESULTS •  Data queries are now 30x faster with Alluxio •  1000-worker Alluxio cluster run stably, providing over 50TB of RAM space •  By using Alluxio, batch queries usually lasting over 15 minutes were transformed into an interactive query taking less than 30 seconds Accelerate Interact with Remote Storage •  200+ nodes deployment •  2+ petabytes of storage •  Mix of memory + HDD •  1000-worker cluster 8
  • 9. CASE STUDY Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity. - Henry Powell, Barclays RESULTS •  Barclays workflow iteration time decreased from hours to seconds •  Alluxio enabled workflows that were impossible before •  By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds Relational Database Share Data Across Jobs at Memory- Speed •  Reducing time from hours to seconds •  Solving issues of Teradata as under storage 9
  • 10. CASE STUDY We’ve been running Alluxio in production for over 9 months, resulting in 15x speedup on average, and 300x speedup at peak service times. - Xueyan Li, Qunar RESULTS •  Alluxio’s unified namespace enables different applications and frameworks to easily interact with their data from different storage systems •  Improved the performance of their system with 15x – 300x speedups •  Tiered storage feature manages various storage resources including memory, SSD and disk Transparently Manage Data Across Different Storage Systems •  6 billion logs (4.5 TB) daily •  Mix of Memory + HDD 10
  • 12. Contact us anytime at info@alluxio.com. Follow us on Twitter @Alluxio. Check out www.alluxio.com and www.alluxio.org. Thank you! 12