Haoyuan Li, Tachyon Nexus

haoyuan@tachyonnexus.com

September 30, 2015 @ Strata and Hadoop World NYC 2015
Tachyon: An Open Source Memory-Centric Distributed Storage System
Outline
•  Open Source
•  Introduction to Tachyon
•  New Features
•  Getting Involved
History
•  Started at UC Berkeley AMPLab
–  From summer 2012
–  Same lab produced Apache Spark and Apache Mesos
•  Open sourced
–  April 2013
–  Apache License 2.0
–  Latest Release: Version 0.7.1 (August 2015)
•  Deployed at > 100 companies
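Contributors Growth
•  v0.1 (Dec '12): 1 contributor
•  v0.2 (Apr '13): 3 contributors
•  v0.3 (Oct '13): 15 contributors
•  v0.4 (Feb '14): 30 contributors
•  v0.5 (Jul '14): 46 contributors
•  v0.6 (Mar '15): 70 contributors
•  v0.7 (Jul '15): 111 contributors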
Contributors Growth
•  > 150 contributors (3x increase since the last Strata NYC)
•  > 50 organizations
•  One of the fastest-growing big data open source projects
Thanks to Contributors and Users!
One Tachyon Production Deployment Example
•  Baidu (Dominant Search Engine in China, ~ 50 Billion USD Market Cap)
•  Framework: SparkSQL
•  Under Storage: Baidu’s File System
•  Storage Media: MEM + HDD
•  100+ nodes deployment
•  1PB+ managed space
•  30x Performance Improvement
Tachyon is an Open Source Memory-centric Distributed Storage System
Why Tachyon?
Performance Trend: Memory is Fast
•  RAM throughput increasing exponentially
•  Disk throughput increasing slowly
Memory locality is key to interactive response times
Price Trend: Memory is Cheaper
[Figure: memory price trend; source: jcmit.com]
Realized by many…
Is the Problem Solved?
Missing a Solution for the Storage Layer
A Use Case Example with Spark
•  Fast, in-memory data processing framework
–  Keep one in-memory copy inside JVM
–  Track lineage of operations used to derive data
–  Upon failure, use lineage to recompute data
[Figure: Lineage Tracking across map, filter, map, join, and reduce operations]
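The lineage idea in the bullets above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Spark's or Tachyon's implementation: the `Dataset` and `LineageCache` classes, their fields, and the sample pipeline are all hypothetical names introduced for illustration. The sketch keeps one in-memory copy of each derived dataset and, when a copy is lost, replays the recorded operations instead of reading a replica.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical dataset record that remembers how it was derived."""
    name: str
    parents: List["Dataset"] = field(default_factory=list)
    # Function that rebuilds this dataset from its parents' values.
    compute: Optional[Callable[..., list]] = None
    # Only set for inputs that live in reliable storage.
    source: Optional[list] = None


class LineageCache:
    """Keeps a single in-memory copy and recomputes lost data from lineage."""

    def __init__(self) -> None:
        self.memory: Dict[str, list] = {}
        self.recomputations = 0

    def get(self, ds: Dataset) -> list:
        if ds.name in self.memory:
            return self.memory[ds.name]
        if ds.source is not None:
            value = list(ds.source)
        else:
            # Walk the lineage: materialize the parents, then re-run the operation.
            inputs = [self.get(parent) for parent in ds.parents]
            value = ds.compute(*inputs)
            self.recomputations += 1
        self.memory[ds.name] = value
        return value

    def lose(self, name: str) -> None:
        """Simulate losing the cached copy, for example after a crash."""
        self.memory.pop(name, None)


# A small pipeline: raw -> map -> filter -> reduce.
raw = Dataset("raw", source=[1, 2, 3, 4, 5, 6])
doubled = Dataset("doubled", [raw], lambda xs: [x * 2 for x in xs])
large = Dataset("large", [doubled], lambda xs: [x for x in xs if x > 4])
total = Dataset("total", [large], lambda xs: [sum(xs)])

cache = LineageCache()
first = cache.get(total)
cache.lose("large")
cache.lose("total")
second = cache.get(total)  # rebuilt from the surviving "doubled" copy
print(first, second, cache.recomputations)  # [36] [36] 5
```

Only the lost datasets are recomputed: "doubled" is still in memory, so the second call re-runs just the filter and the reduce.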
Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
[Figure: Spark Job1 and Spark Job2 each keep blocks 1 and 3 in their own Spark memory block manager and share data through HDFS / Amazon S3 (blocks 1 to 4); storage engine and execution engine run in the same process (slow writes)]
[Figure: a Spark job (Spark memory block manager holding blocks 1 and 3) and a Hadoop MR job on YARN share data through HDFS / Amazon S3 (blocks 1 to 4); storage engine and execution engine run in the same process (slow writes)]
Issue 1 resolved with Tachyon
Memory-speed data sharing among jobs in different frameworks
[Figure: a Spark job and a Hadoop MR job on YARN share blocks 1, 3, and 4 held in memory by Tachyon, backed by HDFS / Amazon S3 on disk (blocks 1 to 4); diagram labels: "execution engine & storage engine same process", "fast writes"]
Issue 2
Cache loss when process crashes
[Figure: a Spark task caches blocks 1 and 3 in the Spark memory block manager, backed by HDFS / Amazon S3 (blocks 1 to 4); execution engine and storage engine run in the same process]
[Figure: the process crashes and the blocks cached in the Spark memory block manager are lost; only the copies in HDFS / Amazon S3 remain]
Issue 2 resolved with Tachyon
Keep in-memory data safe, even when a job crashes.
[Figure: a Spark task (Spark memory block manager) works on blocks 1, 3, and 4 held in memory by Tachyon, backed by HDFS / Amazon S3 on disk (blocks 1 to 4)]
[Figure: the Spark task crashes; blocks 1, 3, and 4 remain in Tachyon memory]
Issue 3
In-memory Data Duplication & Java Garbage Collection
[Figure: Spark Job1 and Spark Job2 each hold their own copies of blocks 1 and 3 in their Spark memory block managers, backed by HDFS / Amazon S3 (blocks 1 to 4); execution engine and storage engine run in the same process (duplication & GC)]
Issue 3 resolved with Tachyon
No in-memory data duplication, much less GC
[Figure: Spark Job1 and Spark Job2 share a single in-memory copy of blocks 1, 3, and 4 in Tachyon, backed by HDFS / Amazon S3 on disk (blocks 1 to 4); no duplication & GC]
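The difference between caching blocks inside each job and keeping them in a store outside the job processes (Issues 2 and 3) can be illustrated with a small simulation. This is only a sketch: `JobLocalCache` and `SharedBlockStore` are hypothetical classes invented for this example, plain dictionaries stand in for process memory and for the under storage, and nothing here is Tachyon's client API.

```python
from typing import Dict


class JobLocalCache:
    """Blocks cached inside a job's own process (storage and execution coupled)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def read(self, block_id: int, under_storage: Dict[int, bytes]) -> bytes:
        if block_id not in self.blocks:
            self.blocks[block_id] = under_storage[block_id]
        return self.blocks[block_id]

    def crash(self) -> None:
        # The cache lives in the job's process, so it disappears with it.
        self.blocks.clear()


class SharedBlockStore:
    """Stand-in for an in-memory store that runs outside any job process."""

    def __init__(self, under_storage: Dict[int, bytes]) -> None:
        self.under_storage = under_storage
        self.blocks: Dict[int, bytes] = {}
        self.under_storage_reads = 0

    def read(self, block_id: int) -> bytes:
        if block_id not in self.blocks:
            self.blocks[block_id] = self.under_storage[block_id]
            self.under_storage_reads += 1
        return self.blocks[block_id]


under_storage = {i: f"block-{i}".encode() for i in (1, 2, 3, 4)}

# Per-job caching: two jobs each hold their own copy of blocks 1 and 3.
job1, job2 = JobLocalCache(), JobLocalCache()
for job in (job1, job2):
    for block_id in (1, 3):
        job.read(block_id, under_storage)
copies_per_job_caching = len(job1.blocks) + len(job2.blocks)
job1.crash()
cached_after_crash = len(job1.blocks)

# Shared store: both jobs read through one store that outlives them.
store = SharedBlockStore(under_storage)
for _job_name in ("job1", "job2"):
    for block_id in (1, 3):
        store.read(block_id)
copies_shared = len(store.blocks)

print(copies_per_job_caching, cached_after_crash, copies_shared,
      store.under_storage_reads)  # 4 0 2 2
```

With per-job caching there are four cached copies and a crash empties the crashed job's cache. With the shared store there are two copies, each fetched from the under storage once, and a job crash has nothing to clear.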
Previously Mentioned
•  A memory-centric storage architecture
•  Push lineage down to storage layer
[Figure: Tachyon Memory-Centric Architecture]
[Figure: Lineage in Tachyon]
1) Ecosystem:
Enable new workloads on any storage;
Work with the framework of your choice.
2) Tachyon running in production environments, both in the cloud and on premises.
Use Case: Baidu
•  Framework: SparkSQL
•  Under Storage: Baidu’s File System
•  Storage Media: MEM + HDD
•  100+ nodes deployment
•  1PB+ managed space
•  30x Performance Improvement
Use Case: a SaaS Company
•  Framework: Impala
•  Under Storage: S3
•  Storage Media: MEM + SSD
•  15x Performance Improvement
Use Case: an Oil Company
•  Framework: Spark
•  Under Storage: GlusterFS
•  Storage Media: MEM only
•  Analyzing data in traditional storage
Use Case: a SaaS Company
•  Framework: Spark
•  Under Storage: S3
•  Storage Media: SSD only
•  Elastic Tachyon deployment
What if data size exceeds memory capacity?
3) Tiered Storage: Tachyon Manages More Than DRAM
[Figure: storage tiers MEM, SSD, and HDD; upper tiers are faster, lower tiers have higher capacity]
Configurable Storage Tiers
•  MEM only
•  MEM + HDD
•  SSD only
4) Pluggable Data Management Policy
•  Evict stale data to lower tier
•  Promote hot data to upper tier
•  Pin data in memory
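A minimal sketch of how tiered storage with a pluggable policy might behave is shown here. Everything in it is an assumption made for illustration: the `TieredStore` class, the `lru_policy` function, and the capacities counted in blocks are hypothetical and are not Tachyon's API or configuration. A full tier evicts a victim chosen by the policy to the next lower tier, a read of a block in a lower tier promotes it to the top tier, and pinned blocks are never chosen as victims.

```python
from collections import OrderedDict
from typing import Dict, Optional, Set


def lru_policy(blocks: "OrderedDict[str, bytes]", pinned: Set[str]) -> Optional[str]:
    """Pick the least recently used block that is not pinned."""
    for block_id in blocks:  # an OrderedDict iterates oldest entry first
        if block_id not in pinned:
            return block_id
    return None


class TieredStore:
    """Toy tiered store; capacities are block counts, fastest tier listed first."""

    def __init__(self, capacities: Dict[str, int], policy=lru_policy) -> None:
        self.names = list(capacities)
        self.capacities = capacities
        self.tiers = {name: OrderedDict() for name in self.names}
        self.pinned: Set[str] = set()
        self.policy = policy  # swap in another function to change behavior

    def _insert(self, level: int, block_id: str, data: bytes) -> None:
        if level >= len(self.names):
            raise RuntimeError("all tiers are full")
        name = self.names[level]
        tier = self.tiers[name]
        if len(tier) >= self.capacities[name]:
            victim = self.policy(tier, self.pinned)
            if victim is None:
                # Everything here is pinned, so the new block goes one tier down.
                self._insert(level + 1, block_id, data)
                return
            # Evict to the lower tier first; drop the upper copy only on success.
            self._insert(level + 1, victim, tier[victim])
            del tier[victim]
        tier[block_id] = data

    def write(self, block_id: str, data: bytes) -> None:
        self._insert(0, block_id, data)

    def read(self, block_id: str) -> bytes:
        for level, name in enumerate(self.names):
            tier = self.tiers[name]
            if block_id not in tier:
                continue
            if level == 0:
                tier.move_to_end(block_id)  # mark as recently used
                return tier[block_id]
            data = tier.pop(block_id)
            try:
                self._insert(0, block_id, data)  # promote hot data
            except RuntimeError:
                tier[block_id] = data  # could not promote; leave it in place
            return data
        raise KeyError(block_id)

    def pin(self, block_id: str) -> None:
        self.pinned.add(block_id)

    def tier_of(self, block_id: str) -> Optional[str]:
        for name in self.names:
            if block_id in self.tiers[name]:
                return name
        return None


store = TieredStore({"MEM": 2, "SSD": 2, "HDD": 4})
store.write("a", b"A")
store.write("b", b"B")
store.pin("a")
store.write("c", b"C")  # MEM is full: "b" is evicted to SSD, "a" stays pinned
store.write("d", b"D")  # "c" is evicted to SSD
result = store.read("b")  # "b" is promoted back to MEM, "d" moves down
print({k: store.tier_of(k) for k in "abcd"})
# {'a': 'MEM', 'b': 'MEM', 'c': 'SSD', 'd': 'SSD'}
```

Passing a different function as `policy` changes which block is evicted without touching the store itself, which is the sense in which the policy is pluggable in this sketch.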
5) Transparent Naming
6) Unified Namespace
More Features
•  7) Remote Write Support
•  8) Easy deployment with Mesos and YARN
•  9) Initial Security Support
•  10) One Command Cluster Deployment
•  11) Metrics Reporting for Clients, Workers, and Master
12) More Under Storage Support
Reported Tachyon Usage
Memory-Centric Distributed Storage
Welcome to try, contact, and collaborate!
JIRA New Contributor Tasks
Tachyon Nexus
•  Team consists of Tachyon creators and top contributors
•  Series A ($7.5 million) from Andreessen Horowitz
•  Committed to Tachyon Open Source
Strata NYC 2015
•  Visit us at booth #P18.
•  Check out other Tachyon-related talks:
–  First-ever scalable, distributed deep learning architecture using Spark and Tachyon
•  Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc.)
•  2:05pm–2:45pm Thursday, 10/01/2015
–  Faster time to insight using Spark, Tachyon, and Zeppelin
•  Nirmal Ranganathan (Rackspace Hosting)
•  2:05pm–2:45pm Thursday, 10/01/2015
•  Try Tachyon: http://tachyon-project.org
•  Develop Tachyon: https://github.com/amplab/tachyon
•  Meet Friends: http://www.meetup.com/Tachyon
•  Get News: http://goo.gl/mwB2sX
•  Tachyon Nexus: http://www.tachyonnexus.com
•  Contact us: haoyuan@tachyonnexus.com

Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Tachyon: An Open Source Memory-Centric Distributed Storage System

  • 1. Haoyuan Li, Tachyon Nexus
 haoyuan@tachyonnexus.com
 September 30, 2015 @ Strata and Hadoop World NYC 2015 An Open Source Memory-Centric Distributed Storage System
  • 2. Outline •  Open Source •  Introduction to Tachyon •  New Features •  Getting Involved
  • 3. Outline •  Open Source •  Introduction to Tachyon •  New Features •  Getting Involved
  • 4. History •  Started at UC Berkeley AMPLab –  From summer 2012 –  Same lab produced Apache Spark and Apache Mesos •  Open sourced –  April 2013 –  Apache License 2.0 –  Latest Release: Version 0.7.1 (August 2015) •  Deployed at > 100 companies
  • 5. Contributors Growth (contributors per release) •  v0.1 (Dec ’12): 1 •  v0.2 (Apr ’13): 3 •  v0.3 (Oct ’13): 15 •  v0.4 (Feb ’14): 30 •  v0.5 (Jul ’14): 46 •  v0.6 (Mar ’15): 70 •  v0.7 (Jul ’15): 111
  • 6. Contributors Growth •  > 150 Contributors (3x increase since the last Strata NYC) •  > 50 Organizations
  • 7. Contributors Growth •  One of the Fastest Growing Big Data Open Source Projects
  • 8. Thanks to Contributors and Users!
  • 9. One Tachyon Production Deployment Example •  Baidu (Dominant Search Engine in China, ~ 50 Billion USD Market Cap) •  Framework: SparkSQL •  Under Storage: Baidu’s File System •  Storage Media: MEM + HDD •  100+ node deployment •  1PB+ managed space •  30x Performance Improvement
  • 10. Outline •  Open Source •  Introduction to Tachyon •  New Features •  Getting Involved
  • 11. Tachyon is an Open Source Memory-centric Distributed Storage System
  • 13. Performance Trend: Memory is Fast •  RAM throughput increasing exponentially •  Disk throughput increasing slowly •  Memory-locality key to interactive response times
  • 14. Price Trend: Memory is Cheaper (source: jcmit.com)
  • 17. Missing a Solution for the Storage Layer
  • 18. A Use Case Example with - •  Fast, in-memory data processing framework – Keep one in-memory copy inside JVM – Track lineage of operations used to derive data – Upon failure, use lineage to recompute data [Diagram: Lineage Tracking across map, filter, map, join, reduce]
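The lineage idea on slide 18 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Spark's or Tachyon's implementation: the `Dataset` class and the `source`, `map_`, and `filter_` helpers are hypothetical names invented for this example. Each dataset keeps a single in-memory copy plus the operation that derived it, so a lost copy can be recomputed from its parents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Hypothetical dataset that remembers how it was derived (its lineage)."""
    name: str
    parents: List["Dataset"]
    # Rebuilds this dataset's rows from the rows of its parents.
    compute: Callable[[List[list]], list]
    cache: Optional[list] = None  # the single in-memory copy

    def get(self) -> list:
        # Serve from memory when possible; otherwise replay the lineage.
        if self.cache is None:
            inputs = [parent.get() for parent in self.parents]
            self.cache = self.compute(inputs)
        return self.cache

    def lose(self) -> None:
        # Simulate a failure that wipes the in-memory copy.
        self.cache = None


def source(name, rows):
    # A source has no parents; it can always re-read its input rows.
    return Dataset(name, [], lambda _inputs: list(rows))


def map_(parent, fn):
    return Dataset(f"map({parent.name})", [parent],
                   lambda inputs: [fn(x) for x in inputs[0]])


def filter_(parent, pred):
    return Dataset(f"filter({parent.name})", [parent],
                   lambda inputs: [x for x in inputs[0] if pred(x)])


numbers = source("numbers", range(10))
squares = map_(numbers, lambda x: x * x)
evens = filter_(squares, lambda x: x % 2 == 0)

before = evens.get()   # computed once and cached
evens.lose()           # failure: cached results disappear
squares.lose()
after = evens.get()    # recomputed by replaying map and filter
```

Recomputation trades extra CPU time for not keeping replicas of every intermediate result; the issues on the following slides arise because that single copy lives inside the job's own process.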
  • 19. Issue 1: Data sharing is the bottleneck in the analytics pipeline (slow writes to disk). [Diagram: Spark Job1 and Spark Job2, each with a Spark memory block manager holding block 1 and block 3; HDFS / Amazon S3 holding blocks 1-4; label: storage engine & execution engine same process (slow writes)]
  • 20. Issue 1: Data sharing is the bottleneck in the analytics pipeline (slow writes to disk). [Diagram: a Spark Job with a Spark memory block manager holding block 1 and block 3; a Hadoop MR Job on YARN; HDFS / Amazon S3 holding blocks 1-4; label: storage engine & execution engine same process (slow writes)]
  • 21. Issue 1 resolved with Tachyon: Memory-speed data sharing among jobs in different frameworks. [Diagram: a Spark Job (Spark mem) and a Hadoop MR Job on YARN; Tachyon holding block 1, block 3, and block 4 in memory; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process (fast writes)]
  • 22. Issue 2: Cache loss when process crashes. [Diagram: a Spark Task with a Spark memory block manager holding block 1 and block 3; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process]
  • 23. Issue 2: Cache loss when process crashes. [Diagram: crash; Spark memory block manager holding block 1 and block 3; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process]
  • 24. Issue 2: Cache loss when process crashes. [Diagram: crash; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process]
  • 25. Issue 2 resolved with Tachyon: Keep in-memory data safe, even when a job crashes. [Diagram: a Spark Task with a Spark memory block manager; Tachyon holding block 1, block 3, and block 4 in memory; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process]
  • 26. Issue 2 resolved with Tachyon: Keep in-memory data safe, even when a job crashes. [Diagram: crash; Tachyon holding block 1, block 3, and block 4 in memory; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process]
  • 27. Issue 3: In-memory Data Duplication & Java Garbage Collection. [Diagram: Spark Job1 and Spark Job2, each with a Spark memory block manager holding its own copy of block 1 and block 3; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process (duplication & GC)]
  • 28. Issue 3 resolved with Tachyon: No in-memory data duplication, much less GC. [Diagram: Spark Job1 and Spark Job2 (Spark mem); Tachyon holding block 1, block 3, and block 4 in memory; HDFS / Amazon S3 holding blocks 1-4; label: execution engine & storage engine same process (no duplication & GC)]
  • 29. Previously Mentioned •  A memory-centric storage architecture •  Push lineage down to storage layer
  • 33. Outline •  Open Source •  Introduction to Tachyon •  New Features •  Getting Involved
  • 34. 1) Eco-system: Enable new workloads in any storage; work with the framework of your choice
  • 35. 2) Tachyon running in production environments, both in the cloud and on premises
  • 36. Use Case: Baidu •  Framework: SparkSQL •  Under Storage: Baidu’s File System •  Storage Media: MEM + HDD •  100+ node deployment •  1PB+ managed space •  30x Performance Improvement
  • 37. Use Case: a SaaS Company •  Framework: Impala •  Under Storage: S3 •  Storage Media: MEM + SSD •  15x Performance Improvement
  • 38. Use Case: an Oil Company •  Framework: Spark •  Under Storage: GlusterFS •  Storage Media: MEM only •  Analyzing data in traditional storage
  • 39. Use Case: a SaaS Company •  Framework: Spark •  Under Storage: S3 •  Storage Media: SSD only •  Elastic Tachyon deployment
  • 40. What if data size exceeds memory capacity?
  • 41. 3) Tiered Storage: Tachyon Manages More Than DRAM. [Diagram: tiers MEM, SSD, HDD; faster toward MEM, higher capacity toward HDD]
  • 42. Configurable Storage Tiers •  MEM only •  MEM + HDD •  SSD only
  • 43. 4) Pluggable Data Management Policy •  Evict stale data to lower tier •  Promote hot data to upper tier
  • 44. Pin Data in Memory
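The eviction, promotion, and pinning behaviour named on slides 43 and 44 can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Tachyon's API or its actual policy: the `TieredStore` class, its method names, the least-recently-used eviction rule, and the per-tier capacities counted in blocks are all hypothetical choices made for this example.

```python
from collections import OrderedDict


class TieredStore:
    """Toy multi-tier block store: LRU eviction, promotion on read, pinning."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), fastest tier first.
        self.names = [name for name, _ in tiers]
        self.capacity = [cap for _, cap in tiers]
        # One OrderedDict per tier; its order doubles as recency order.
        self.blocks = [OrderedDict() for _ in tiers]
        self.pinned = set()

    def tier_of(self, block_id):
        for level, tier in enumerate(self.blocks):
            if block_id in tier:
                return self.names[level]
        return None

    def pin(self, block_id):
        # Pinned blocks are never evicted from the top tier.
        self.pinned.add(block_id)

    def _insert(self, level, block_id, data):
        tier = self.blocks[level]
        tier[block_id] = data
        tier.move_to_end(block_id)  # mark as most recently used
        while len(tier) > self.capacity[level]:
            # Least recently used block that is allowed to leave this tier.
            victim = next(
                (b for b in tier if not (level == 0 and b in self.pinned)),
                None,
            )
            if victim is None:
                raise MemoryError("top tier is full of pinned blocks")
            victim_data = tier.pop(victim)
            if level + 1 < len(self.blocks):
                # Evict stale data to the next lower tier.
                self._insert(level + 1, victim, victim_data)
            # Past the last tier the block is dropped from this toy store.

    def write(self, block_id, data):
        for tier in self.blocks:
            tier.pop(block_id, None)  # drop any older copy
        self._insert(0, block_id, data)

    def read(self, block_id):
        for tier in self.blocks:
            if block_id in tier:
                data = tier.pop(block_id)
                self._insert(0, block_id, data)  # promote hot data
                return data
        raise KeyError(block_id)


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 4)])
for block_id in ("a", "b", "c"):
    store.write(block_id, f"data-{block_id}")
tier_after_eviction = store.tier_of("a")   # "a" was least recently used
store.read("a")                            # reading "a" promotes it again
tier_after_promotion = store.tier_of("a")
store.pin("c")
store.write("d", "data-d")                 # "c" is pinned, so "a" is evicted
```

Making the policy pluggable would amount to swapping the victim-selection step in `_insert` for another rule; LRU is used here only because it is short to express with an `OrderedDict`.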
  • 47. More Features •  7) Remote Write Support •  8) Easy deployment with Mesos and YARN •  9) Initial Security Support •  10) One Command Cluster Deployment •  11) Metrics Reporting for Clients, Workers, and Master
  • 48. 12) More Under Storage Support
  • 50. Outline •  Open Source •  Introduction to Tachyon •  New Features •  Getting Involved
  • 51. Memory-Centric Distributed Storage •  Welcome to try, contact, and collaborate! •  JIRA New Contributor Tasks
  • 52. •  Team consists of Tachyon creators, top contributors •  Series A ($7.5 million) from Andreessen Horowitz •  Committed to Tachyon Open Source
  • 54. Strata NYC 2015 •  Visit us at our booth #P18. •  Check out other Tachyon-related talks. –  First-ever scalable, distributed deep learning architecture using Spark and Tachyon •  Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc.) •  2:05pm–2:45pm Thursday, 10/01/2015 –  Faster time to insight using Spark, Tachyon, and Zeppelin •  Nirmal Ranganathan (Rackspace Hosting) •  2:05pm–2:45pm Thursday, 10/01/2015
  • 55. •  Try Tachyon: http://tachyon-project.org •  Develop Tachyon: https://github.com/amplab/tachyon •  Meet Friends: http://www.meetup.com/Tachyon •  Get News: http://goo.gl/mwB2sX •  Tachyon Nexus: http://www.tachyonnexus.com •  Contact us: haoyuan@tachyonnexus.com