Tachyon: memory-centric, fault-tolerant
storage for cluster frameworks
presented by Viet-Trung Tran
Memory is King
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
Memory-locality key to interactive response time
Memory as cache
• Improve READ
• Cannot help much with write
• Replication for fault tolerance
• Network bandwidth and latency are much worse than those of memory
• Write throughput is limited by disk I/O
• Requires at least one copy on disk
• Inter-job data sharing cost dominates pipeline end-to-end latency
• 34% of jobs produce output as large as their input (Cloudera survey)
Different jobs share data
[Figure: two Spark tasks, each with its own Spark memory and block manager holding blocks 1 and 3, share data through HDFS / Amazon S3 (blocks 1 to 4). Labels: storage engine and execution engine in the same process; slow writes to disk.]
Different frameworks share data
[Figure: a Spark task (Spark memory, block manager holding blocks 1 and 3) and Hadoop MR on YARN share data through HDFS / Amazon S3 (blocks 1 to 4). Labels: storage engine and execution engine in the same process; slow writes to disk.]
Tachyon: reliable data sharing at memory speed
within and across frameworks/jobs
[Figure: Tachyon sits between computation frameworks (Spark, MapReduce, Spark SQL, H2O, GraphX, Impala, ...) and underlying storage systems (HDFS, S3, GlusterFS, OrangeFS, NFS, Ceph, ...).]
Challenges
How to achieve reliable data sharing without replication?
Target workload properties
• Immutable data
• Deterministic jobs
• Locality based scheduling
• All data vs working set
• Program size vs data size
System architecture
Consists of two layers
• Lineage
• Deliver high throughput I/O
• Capture sequence of jobs/tasks that create output
• Persistence
• Asynchronous checkpoints
Facts
• One data copy in memory
• Recomputation for fault-tolerance
Memory-Centric Storage Architecture
Master Node
• Similar to HDFS and GFS
• Passive standby model
• BUT also contains a workflow manager
• Track lineage information
• Compute checkpoint order
• Interact with cluster resource manager to allocate resources for recomputations
Lineage
More complex lineage
Lineage metadata
• Binary program
• Configuration
• Input Files List
• Output Files List
• Dependency Type
• Narrow (filter, map)
• Wide (shuffle, join)
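The metadata fields listed above could be held in a small record type. What follows is a minimal Python sketch under stated assumptions, not Tachyon's actual data structure: the names `LineageRecord` and `DependencyType`, the field types, and the example paths are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class DependencyType(Enum):
    NARROW = "narrow"  # e.g. filter, map
    WIDE = "wide"      # e.g. shuffle, join


@dataclass
class LineageRecord:
    """Hypothetical lineage entry: what is needed to re-run the job that wrote the outputs."""
    binary_program: str            # identifier of the program to re-run
    configuration: Dict[str, str]  # job configuration
    input_files: Tuple[str, ...]   # files the job read
    output_files: Tuple[str, ...]  # files the job wrote
    dependency_type: DependencyType


# Example values are made up for illustration only.
example = LineageRecord(
    binary_program="wordcount.jar",
    configuration={"reducers": "4"},
    input_files=("/in/logs",),
    output_files=("/out/counts",),
    dependency_type=DependencyType.WIDE,
)
```

Keeping the input and output lists in the record is what lets the master work out which job to re-run when an output file is lost.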
Fault-recovery by recomputations
• Challenge
• Bounding the recomputation cost for a long-running storage system
• Asynchronous checkpointing
• Allocate resources for recomputations
• Make sure recomputation tasks get enough resources
• Do not impact system performance (task priorities)
• Assumption
• Input files are immutable
• Job executions are deterministic
• Client side caching to mitigate read hotspots
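Recovery by recomputation can be pictured as a walk back through the lineage: to recreate a lost file, first recreate any of its inputs that are also lost, then re-run the job that produced it. The sketch below is a simplified illustration that relies on the two assumptions above (immutable inputs, deterministic jobs) and on the lineage being acyclic. The function name `recompute`, the `producers` mapping, and the `run_job` callback are hypothetical, not part of Tachyon's API.

```python
from typing import Callable, Dict, List, Set


def recompute(target: str,
              producers: Dict[str, List[str]],
              available: Set[str],
              run_job: Callable[[str], None]) -> List[str]:
    """Recreate `target`, recreating missing ancestors first.

    producers maps each derived file to the input files of the job that wrote it.
    available is the set of files currently readable; it is updated in place.
    Returns the files that were recomputed, in execution order.
    """
    recomputed: List[str] = []

    def visit(path: str) -> None:
        if path in available:
            return  # nothing to do, the file can be read as-is
        if path not in producers:
            raise KeyError(f"no lineage recorded for lost file {path!r}")
        for parent in producers[path]:
            visit(parent)  # lost inputs must exist before the job re-runs
        run_job(path)      # deterministic job: same inputs give the same output
        available.add(path)
        recomputed.append(path)

    visit(target)
    return recomputed
```

For example, with `producers = {"B": ["A"], "C": ["B"]}` and only `A` available, asking for `C` re-runs the job for `B` and then the job for `C`. The depth of this walk is what checkpointing has to bound.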
Asynchronous checkpointing
• Goals
• Bounded recomputation time
• Checkpointing hot files
• Avoid checkpointing temp files
• Edge algorithm
• Modeling relationships of files with a DAG
• Vertices are files
• Edge from A to B if B is generated by a job that read A
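The file DAG described above can be built directly from the input and output lists of each job. This is a small illustrative sketch; the function name `build_file_dag` and the representation (a dict from each file to the set of files derived from it) are assumptions made here, not taken from the slides.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_file_dag(jobs: Iterable[Tuple[List[str], List[str]]]) -> Dict[str, Set[str]]:
    """Return a mapping file -> files generated by a job that read it.

    Each job is given as (input_files, output_files).
    """
    children: Dict[str, Set[str]] = {}
    for inputs, outputs in jobs:
        # Every file is a vertex, even if nothing has been derived from it yet.
        for name in list(inputs) + list(outputs):
            children.setdefault(name, set())
        # Edge from A to B if B is generated by a job that read A.
        for src in inputs:
            children[src].update(outputs)
    return children
```

Files whose set of children is empty are the leaves of the DAG.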
Edge algorithm
• Checkpoint leaves
• Checkpointing hot files
• Most files are accessed fewer than 3 times (Yahoo survey of big data workloads)
• Thus, files accessed more than twice get checkpointed
• Dealing with large datasets
• 96% of active jobs fit in the cluster memory
• Synchronously write datasets above a defined threshold to disk
• Most checkpointed files in memory can be evicted from memory
to make room
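The first two rules above (checkpoint leaves, checkpoint hot files) can be sketched as a selection over the file DAG. This is a deliberately simplified illustration, not the full Edge algorithm: it ignores checkpoint ordering and the large-dataset rule. The function name `choose_checkpoints`, its parameters, and the default threshold of 2 (files accessed more than twice count as hot) are assumptions for this sketch.

```python
from typing import Dict, List, Set


def choose_checkpoints(children: Dict[str, Set[str]],
                       access_counts: Dict[str, int],
                       checkpointed: Set[str],
                       hot_threshold: int = 2) -> List[str]:
    """Pick files to checkpoint next: DAG leaves plus hot files.

    children maps each file to the files derived from it.
    access_counts maps a file to how many times it has been read.
    Files already in `checkpointed` are skipped.
    """
    leaves = {f for f, kids in children.items() if not kids}
    hot = {f for f, n in access_counts.items() if n > hot_threshold}
    return sorted((leaves | hot) - checkpointed)
```

With the chain A -> B -> C, where A has been read three times, the sketch selects A (hot) and C (leaf) and leaves the intermediate file B alone, which matches the goal of not checkpointing temporary files.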
Resource allocation
• Depends on the scheduling policy of the running cluster
• Requirements
• Priority compatibility
• Resource sharing
• Avoid cascading recomputation
• Choose the best recomputation order
• Most common policies
• priority based
• weighted fair sharing
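As a reminder of what weighted fair sharing means in its generic form, each job receives a share of the resource proportional to its weight. The sketch below shows only that arithmetic; the slides do not state how weights are assigned to recomputation jobs, so the function name `weighted_fair_shares` and the example weights are hypothetical.

```python
from typing import Dict


def weighted_fair_shares(capacity: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split `capacity` among jobs in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {job: capacity * w / total for job, w in weights.items()}


# Hypothetical example: 100 slots, a regular job with weight 3 and a
# recomputation job with weight 1.
shares = weighted_fair_shares(100, {"job": 3, "recompute": 1})
```

Here the regular job gets 75 slots and the recomputation job 25, so recomputation makes progress without starving running jobs.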
Priority based scheduler
Fair sharing based scheduler
Evaluation
• 110x faster than MemHDFS for writes
• 4x faster in realistic jobs
• 3.8x faster in case of failure
• Recovers from master failure within 1 second
• Reduces replication-induced network traffic by up to 50%
• Recomputation impact is less than 1.6%

More Related Content

What's hot

HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiHBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
Michael Stack
 
2013 year of real-time hadoop
2013 year of real-time hadoop2013 year of real-time hadoop
2013 year of real-time hadoop
Geoff Hendrey
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
Norberto Leite
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
Databricks
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at Xiaomi
Michael Stack
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
Reynold Xin
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
oysteing
 
January 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka PresentationJanuary 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka Presentation
Yahoo Developer Network
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
MongoDB
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestion
Vinod Nayal
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
Vinoth Chandar
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
oysteing
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
Modern Data Stack France
 
Efficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in HadoopEfficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in Hadoop
DataWorks Summit
 
Pnuts Review
Pnuts ReviewPnuts Review
Pnuts Review
Ruchika Mehresh
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
MongoDB
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
Bill Graham
 

What's hot (20)

HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at HuaweiHBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
HBaseConAsia2018 Track3-4: HBase and OpenTSDB practice at Huawei
 
2013 year of real-time hadoop
2013 year of real-time hadoop2013 year of real-time hadoop
2013 year of real-time hadoop
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
HBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at XiaomiHBaseConAsia2018 Track1-3: HBase at Xiaomi
HBaseConAsia2018 Track1-3: HBase at Xiaomi
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
 
January 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka PresentationJanuary 2011 HUG: Kafka Presentation
January 2011 HUG: Kafka Presentation
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestion
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
 
Efficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in HadoopEfficient processing of large and complex XML documents in Hadoop
Efficient processing of large and complex XML documents in Hadoop
 
Pnuts Review
Pnuts ReviewPnuts Review
Pnuts Review
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 

Viewers also liked

The Rules - SGS
The Rules - SGSThe Rules - SGS
The Rules - SGS
Tania Kasongo
 
Balanceo de una ecuación química
Balanceo de una ecuación químicaBalanceo de una ecuación química
Balanceo de una ecuación química
dopamina mexico
 
Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...
Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...
Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...
Paul Brown
 
Interactive big data analytics
Interactive big data analyticsInteractive big data analytics
Interactive big data analytics
Viet-Trung TRAN
 
Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...
Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...
Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...
Dave McClure
 
Social media strategies for libraries poster
Social media strategies for libraries posterSocial media strategies for libraries poster
Social media strategies for libraries poster
Nataly Blas
 
Jobs consultant
Jobs consultantJobs consultant
Jobs consultant
Tenforce
 
How to increase traffic to your WordPress website.
How to increase traffic to your WordPress website. How to increase traffic to your WordPress website.
How to increase traffic to your WordPress website.
Liquis Design
 
Charitable Giving and Happiness
Charitable Giving and HappinessCharitable Giving and Happiness
Charitable Giving and Happiness
Faircom New York
 
Latin Dansları
Latin DanslarıLatin Dansları
Latin Dansları
Busrawien28
 
teaching methods
teaching methods teaching methods
teaching methods
estefycoronel
 
xoxooo tkmmm
xoxooo tkmmmxoxooo tkmmm
xoxooo tkmmm
ceny2
 
Practica 2 quimica organica -espol
Practica 2  quimica organica -espolPractica 2  quimica organica -espol
Practica 2 quimica organica -espol
Lissy Rodriguez
 
The State of Facilities at Eastern Region Institutions JUNE16
The State of Facilities at Eastern Region Institutions JUNE16The State of Facilities at Eastern Region Institutions JUNE16
The State of Facilities at Eastern Region Institutions JUNE16
Sightlines
 
William Gross Sues Pimco for Hundreds of Millions
William Gross Sues Pimco for Hundreds of MillionsWilliam Gross Sues Pimco for Hundreds of Millions
William Gross Sues Pimco for Hundreds of Millions
Tric Park
 
Moving to the Right Side of Safety
Moving to the Right Side of SafetyMoving to the Right Side of Safety
Moving to the Right Side of Safety
SAMTRAC International
 
Jvm mbeans jmxtran
Jvm mbeans jmxtranJvm mbeans jmxtran
Jvm mbeans jmxtran
adm_exoplatform
 
God Is Forgiving
God Is ForgivingGod Is Forgiving
God Is Forgiving
William Harris
 
Torque
TorqueTorque
Torque
caitlinforan
 
Guia De Estudio Digestivo
Guia De Estudio DigestivoGuia De Estudio Digestivo
Guia De Estudio Digestivo
Luciana Yohai
 

Viewers also liked (20)

The Rules - SGS
The Rules - SGSThe Rules - SGS
The Rules - SGS
 
Balanceo de una ecuación química
Balanceo de una ecuación químicaBalanceo de una ecuación química
Balanceo de una ecuación química
 
Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...
Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...
Challenging our Notions of Learning: Understanding How Web 2.0 Technology Wor...
 
Interactive big data analytics
Interactive big data analyticsInteractive big data analytics
Interactive big data analytics
 
Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...
Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...
Ultimate Platform Hotness Smackdown (Twitter, Facebook, iPhone, Native Web / ...
 
Social media strategies for libraries poster
Social media strategies for libraries posterSocial media strategies for libraries poster
Social media strategies for libraries poster
 
Jobs consultant
Jobs consultantJobs consultant
Jobs consultant
 
How to increase traffic to your WordPress website.
How to increase traffic to your WordPress website. How to increase traffic to your WordPress website.
How to increase traffic to your WordPress website.
 
Charitable Giving and Happiness
Charitable Giving and HappinessCharitable Giving and Happiness
Charitable Giving and Happiness
 
Latin Dansları
Latin DanslarıLatin Dansları
Latin Dansları
 
teaching methods
teaching methods teaching methods
teaching methods
 
xoxooo tkmmm
xoxooo tkmmmxoxooo tkmmm
xoxooo tkmmm
 
Practica 2 quimica organica -espol
Practica 2  quimica organica -espolPractica 2  quimica organica -espol
Practica 2 quimica organica -espol
 
The State of Facilities at Eastern Region Institutions JUNE16
The State of Facilities at Eastern Region Institutions JUNE16The State of Facilities at Eastern Region Institutions JUNE16
The State of Facilities at Eastern Region Institutions JUNE16
 
William Gross Sues Pimco for Hundreds of Millions
William Gross Sues Pimco for Hundreds of MillionsWilliam Gross Sues Pimco for Hundreds of Millions
William Gross Sues Pimco for Hundreds of Millions
 
Moving to the Right Side of Safety
Moving to the Right Side of SafetyMoving to the Right Side of Safety
Moving to the Right Side of Safety
 
Jvm mbeans jmxtran
Jvm mbeans jmxtranJvm mbeans jmxtran
Jvm mbeans jmxtran
 
God Is Forgiving
God Is ForgivingGod Is Forgiving
God Is Forgiving
 
Torque
TorqueTorque
Torque
 
Guia De Estudio Digestivo
Guia De Estudio DigestivoGuia De Estudio Digestivo
Guia De Estudio Digestivo
 

Similar to Tachyon memory centric, fault tolerance storage for cluster framworks

Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
RahulBhole12
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
elliando dias
 
Oracle Architecture software overview ppts
Oracle Architecture software overview pptsOracle Architecture software overview ppts
Oracle Architecture software overview ppts
ssuserf272701
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
Geek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring TempdbGeek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring Tempdb
IDERA Software
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
vijayapraba1
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
Ruben Badaró
 
Investigate TempDB Like Sherlock Holmes
Investigate TempDB Like Sherlock HolmesInvestigate TempDB Like Sherlock Holmes
Investigate TempDB Like Sherlock Holmes
Richard Douglas
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectures
hypertable
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
DataStax
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
Perforce
 
Flashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesFlashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drives
Pratik Bhat
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
Kognitio
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
Joe Alex
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Andrii Vozniuk
 
Tuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadTuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy Workload
Marius Adrian Popa
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
Bitta_man
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
dilip kumar
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the Wild
Tim Vaillancourt
 

Similar to Tachyon memory centric, fault tolerance storage for cluster framworks (20)

Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
Oracle Architecture software overview ppts
Oracle Architecture software overview pptsOracle Architecture software overview ppts
Oracle Architecture software overview ppts
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Geek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring TempdbGeek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring Tempdb
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Investigate TempDB Like Sherlock Holmes
Investigate TempDB Like Sherlock HolmesInvestigate TempDB Like Sherlock Holmes
Investigate TempDB Like Sherlock Holmes
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectures
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
 
Flashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesFlashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drives
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
Tuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadTuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy Workload
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the Wild
 

More from Viet-Trung TRAN

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Viet-Trung TRAN
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
Viet-Trung TRAN
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
Viet-Trung TRAN
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
Viet-Trung TRAN
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Viet-Trung TRAN
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
Viet-Trung TRAN
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
Viet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
Large-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on SparkLarge-Scale Geographically Weighted Regression on Spark
Large-Scale Geographically Weighted Regression on Spark
Viet-Trung TRAN
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
Viet-Trung TRAN
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
Viet-Trung TRAN
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
Viet-Trung TRAN
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
Viet-Trung TRAN
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Viet-Trung TRAN
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
Viet-Trung TRAN
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
Viet-Trung TRAN
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
Viet-Trung TRAN
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
Viet-Trung TRAN
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
Viet-Trung TRAN
 

More from Viet-Trung TRAN (20)

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook

Tachyon: memory-centric, fault-tolerant storage for cluster frameworks

  • 1. Tachyon: memory-centric, fault-tolerant storage for cluster frameworks, presented by Viet-Trung Tran
  • 2. Memory is King • RAM throughput increasing exponentially • Disk throughput increasing slowly • Memory locality is key to interactive response time
  • 3. Memory as cache • Improves reads • Cannot help much with writes • Replication for fault tolerance • Network bandwidth and latency are much worse than those of memory • Write throughput is limited by disk I/O • At least one copy is required on disk • Inter-job data sharing cost dominates pipeline end-to-end latency • 34% of jobs produce output as large as their input (Cloudera survey)
  • 4. Different jobs share data • Slow writes to disk • [Diagram: two Spark tasks, each with its own in-memory block manager holding blocks 1 and 3, share data through HDFS / Amazon S3 (blocks 1 to 4); storage engine and execution engine run in the same process (slow writes)]
  • 5. Different frameworks share data • Slow writes to disk • [Diagram: a Spark task with its in-memory block manager (blocks 1 and 3) and Hadoop MR on YARN share data through HDFS / Amazon S3 (blocks 1 to 4); storage engine and execution engine run in the same process (slow writes)]
  • 6. Tachyon: reliable data sharing at memory speed within and across frameworks/jobs • [Diagram: Tachyon sits between compute frameworks (Spark, MapReduce, Spark SQL, H2O, GraphX, Impala, ...) and storage systems (HDFS, S3, GlusterFS, OrangeFS, NFS, Ceph, ...)]
  • 7. Challenges • How to achieve reliable data sharing without replication?
  • 8. Target workload properties • Immutable data • Deterministic jobs • Locality-based scheduling • All data vs. working set • Program size vs. data size
  • 9. System architecture • Consists of two layers • Lineage • Delivers high-throughput I/O • Captures the sequence of jobs/tasks that create an output • Persistence • Asynchronous checkpoints • Facts • One data copy in memory • Recomputation for fault tolerance
  • 11.
  • 12. Master Node • Similar to HDFS and GFS • Passive standby model • BUT also contains a workflow manager • Tracks lineage information • Computes checkpoint order • Interacts with the cluster resource manager to allocate resources for recomputations
  • 15. Lineage metadata • Binary program • Configuration • Input files list • Output files list • Dependency type • Narrow (filter, map) • Wide (shuffle, join)
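The lineage metadata fields listed on slide 15 could be modeled as a small record type. The sketch below is illustrative only: the names `LineageRecord` and `DependencyType`, the field types, and the sample values are assumptions made for this example and are not taken from Tachyon's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class DependencyType(Enum):
    """Dependency kinds named on the slide."""
    NARROW = "narrow"  # e.g. filter, map
    WIDE = "wide"      # e.g. shuffle, join


@dataclass
class LineageRecord:
    """Hypothetical container for the metadata needed to re-run a job."""
    binary_program: str            # the program that produced the output
    configuration: Dict[str, str]  # settings the program was run with
    input_files: List[str]         # files the job read
    output_files: List[str]        # files the job wrote
    dependency_type: DependencyType


# Sample values are made up for illustration.
record = LineageRecord(
    binary_program="wordcount.jar",
    configuration={"reducers": "4"},
    input_files=["/data/pages"],
    output_files=["/data/counts"],
    dependency_type=DependencyType.WIDE,
)
```

Keeping the program, its configuration, and its input list together is what makes a lost output reproducible, given the deck's assumptions of immutable inputs and deterministic jobs.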
  • 16. Fault recovery by recomputation • Challenge • Bounding the recomputation cost for a long-running storage system • Asynchronous checkpointing • Allocating resources for recomputations • Make sure recomputation tasks get enough resources • Do not impact system performance (task priorities) • Assumptions • Input files are immutable • Job executions are deterministic • Client-side caching to mitigate read hotspots
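A minimal sketch of how recomputation from lineage might be planned, assuming lineage is kept as a map from each generated file to the files its job read. The function name `recompute_plan` and the parameters `inputs_of` and `available` are hypothetical. The walk stops at any file that is still in memory or already checkpointed, which is how checkpoints bound the recomputation cost.

```python
from typing import Dict, List, Set


def recompute_plan(target: str,
                   inputs_of: Dict[str, List[str]],
                   available: Set[str]) -> List[str]:
    """Return the lost files to regenerate, parents before children.

    inputs_of maps a generated file to the files its job read.
    available holds files that are still in memory or checkpointed.
    """
    plan: List[str] = []
    visited: Set[str] = set()

    def visit(name: str) -> None:
        if name in available or name in visited:
            return  # nothing to do: file exists or is already planned
        if name not in inputs_of:
            raise KeyError(f"no lineage recorded for lost file {name!r}")
        visited.add(name)
        for parent in inputs_of[name]:
            visit(parent)      # regenerate missing inputs first
        plan.append(name)      # then the file itself

    visit(target)
    return plan


# Made-up lineage: B is derived from A, C from B, D from B and C.
lineage = {"B": ["A"], "C": ["B"], "D": ["B", "C"]}
full_plan = recompute_plan("D", lineage, available={"A"})
short_plan = recompute_plan("D", lineage, available={"A", "B"})
```

With only `A` available the whole chain has to be rebuilt, while a checkpoint of `B` shortens the plan to `C` and `D`.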
  • 17. Asynchronous checkpointing • Goals • Bounded recomputation time • Checkpoint hot files • Avoid checkpointing temporary files • Edge algorithm • Models the relationships between files as a DAG • Vertices are files • There is an edge from A to B if B is generated by a job that reads A
  • 18. Edge algorithm • Checkpoint leaves • Checkpoint hot files • Most files are accessed fewer than 3 times (Yahoo survey of big data workloads) • Thus, files accessed more than twice get checkpointed • Dealing with large datasets • 96% of active job sizes fit in the cluster memory • Synchronously write datasets above a defined threshold to disk • Most checkpointed files in memory can be evicted to make room
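The selection rule described on slides 17 and 18 can be sketched as follows. This is a simplified illustration rather than Tachyon's implementation: the function name, the `children_of` representation of the DAG, and the alphabetical ordering of the result are assumptions. Only the two rules themselves (checkpoint leaves, and checkpoint files accessed more than twice) come from the slides.

```python
from typing import Dict, List, Set

# "Accessed more than twice" is the hot-file rule stated on the slide.
HOT_ACCESS_THRESHOLD = 2


def checkpoint_candidates(children_of: Dict[str, List[str]],
                          access_counts: Dict[str, int],
                          checkpointed: Set[str]) -> List[str]:
    """Pick files to checkpoint: DAG leaves plus hot files.

    children_of maps file A to the files generated by jobs that read A,
    i.e. the edges A -> B of the lineage DAG.
    """
    files = set(children_of)
    for children in children_of.values():
        files.update(children)

    # A leaf is a file that no job has read to generate another file.
    leaves = {f for f in files if not children_of.get(f)}
    hot = {f for f in files
           if access_counts.get(f, 0) > HOT_ACCESS_THRESHOLD}

    # Ordering is arbitrary here (alphabetical) to keep output stable.
    return sorted((leaves | hot) - checkpointed)


# Made-up DAG: A -> B, B -> C, B -> D.
dag = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
candidates = checkpoint_candidates(
    dag, access_counts={"A": 3, "B": 1}, checkpointed={"D"})
```

Here `A` qualifies as a hot file and `C` as a leaf, while `D` is skipped because it is already checkpointed and `B` is neither hot nor a leaf.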
  • 19. Resource allocation • Depends on the scheduling policy of the running cluster • Requirements • Priority compatibility • Resource sharing • Avoid cascading recomputation • Choose the best recomputation order • Most common policies • Priority-based • Weighted fair sharing
  • 21. Fair sharing based scheduler
  • 22. Evaluation • 110x faster than MemHDFS • 4x faster in realistic jobs • 3.8x faster in case of failure • Recovers from master failure within 1 second • Reduces replication-caused network traffic by up to 50% • Recomputation impact is less than 1.6%