Using Spark with Tachyon by Gene Pang
1. Using Spark with Tachyon: An Open Source Memory-Centric Distributed Storage System
Gene Pang, Tachyon Nexus
gene@tachyonnexus.com
October 29, 2015 @ Spark Summit Europe
2. Who Am I?
• Gene Pang
• PhD from UC Berkeley AMPLab
• Software Engineer at Tachyon Nexus
3. • Team consists of Tachyon creators, top contributors
• Series A ($7.5 million) from Andreessen Horowitz
• Committed to Tachyon Open Source Project
• www.tachyonnexus.com
4.
5. Outline
• Introduction to Tachyon
• Using Spark with Tachyon
• New Tachyon Features
• Getting Involved
7. History of Tachyon
• Started at UC Berkeley AMPLab
– From Summer 2012
– Same lab produced Apache Spark and Apache Mesos
• Open sourced in April 2013
– Apache License 2.0
– Latest Release: Version 0.8.0 (October 2015)
• Deployed at more than 100 companies
36. Issue 3 resolved with Tachyon
No in-memory data duplication, much less GC
[Diagram: Spark Job1 and Spark Job2, each with its own Spark memory, sit above a Tachyon in-memory layer holding data blocks; HDFS / Amazon S3 holds blocks 1 to 4 on disk. Label: "storage engine & execution engine same process".]
37. Tachyon Use Case: Baidu
• Framework: SparkSQL
• Under Storage: Baidu’s File System
• Tachyon Storage Media: MEM + HDD
• 100+ Tachyon nodes
• 1PB+ Tachyon managed storage
• 30x Performance Improvement
38. Tachyon Use Case: An Oil Company
• Framework: Spark
• Under Storage: GlusterFS
• Tachyon Storage Media: MEM only
• Analyzing data in traditional storage
39. Tachyon Use Case: A SaaS Company
• Framework: Spark
• Under Storage: S3
• Tachyon Storage Media: SSD only
• Elastic Tachyon deployment
40. Outline
• Introduction to Tachyon
• Using Spark with Tachyon
• New Tachyon Features
• Getting Involved
44. 2. Tiered Storage
Configurable storage tiers
• MEM only
• MEM + HDD
• SSD only
45. 3. Pluggable Data Management Policy
• Evict stale data to lower tier
• Promote hot data to upper tier
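The evict and promote behavior above can be illustrated with a toy two-tier store. This is a minimal sketch, not Tachyon's implementation: the `TwoTierStore` class, its method names, and the choice of least-recently-used (LRU) as the policy are all assumptions made for illustration. The slide only says the policy is pluggable.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy block store with a small upper tier (e.g. MEM) over a lower tier (e.g. HDD).

    Hypothetical illustration only; LRU is an assumed policy.
    """

    def __init__(self, upper_capacity):
        self.upper_capacity = upper_capacity
        self.upper = OrderedDict()  # ordered from least to most recently used
        self.lower = {}

    def _evict_if_needed(self):
        # Evict stale (least recently used) blocks to the lower tier
        # until the upper tier fits within its capacity.
        while len(self.upper) > self.upper_capacity:
            block_id, data = self.upper.popitem(last=False)
            self.lower[block_id] = data

    def write(self, block_id, data):
        # New or rewritten blocks always land in the upper tier.
        self.lower.pop(block_id, None)
        self.upper[block_id] = data
        self.upper.move_to_end(block_id)
        self._evict_if_needed()

    def read(self, block_id):
        if block_id in self.upper:
            self.upper.move_to_end(block_id)  # mark as recently used
            return self.upper[block_id]
        # Promote hot data: a block read from the lower tier moves up.
        data = self.lower.pop(block_id)  # raises KeyError for unknown blocks
        self.upper[block_id] = data
        self._evict_if_needed()
        return data

    def tier_of(self, block_id):
        if block_id in self.upper:
            return "upper"
        if block_id in self.lower:
            return "lower"
        return None


store = TwoTierStore(upper_capacity=2)
for name in ("block1", "block2", "block3"):
    store.write(name, name.encode())
store.read("block1")  # block1 was evicted earlier; reading it promotes it again
```

Swapping `_evict_if_needed` for a different selection rule is where a pluggable policy would hook in under this sketch.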
46. 4. Transparent Naming
[Diagram: Tachyon at tachyon://host:port/ and the under storage system (HDFS, S3, …) at s3n://bucket/directory/ show the same tree: Data (Reports, Sales) and Users (Alice, Bob).]
• Persisted Tachyon files are mapped to under storage
• Tachyon paths are preserved in under storage
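The path mapping described above can be sketched as a small pure function. This is an illustration under assumptions: `under_storage_path` is a hypothetical helper, not a Tachyon API, and the example URIs are the placeholders from the slide.

```python
def under_storage_path(tachyon_uri, tachyon_root, under_root):
    """Map a Tachyon URI to the matching under storage URI.

    Hypothetical helper: the relative path below the Tachyon root is
    kept unchanged and appended to the under storage root.
    """
    if not tachyon_uri.startswith(tachyon_root):
        raise ValueError("URI is not under the Tachyon root: " + tachyon_uri)
    relative = tachyon_uri[len(tachyon_root):].lstrip("/")
    return under_root.rstrip("/") + "/" + relative


mapped = under_storage_path(
    "tachyon://host:port/Data/Reports",
    tachyon_root="tachyon://host:port/",
    under_root="s3n://bucket/directory/",
)
print(mapped)  # s3n://bucket/directory/Data/Reports
```

Because the relative path is preserved, a file persisted from Tachyon can be found in the under storage system at a predictable location.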
47. 5. Unified Namespace
[Diagram: the Tachyon namespace tachyon://host:port/ shows Data, Users (Alice, Bob), Reports, and Sales. Storage System A, hdfs://host:port/, holds Users (Alice, Bob); Storage System B, s3n://bucket/directory/, holds Reports and Sales.]
• Unified namespace for multiple storage systems
• Share data across storage systems
• On-the-fly mounting/unmounting
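One way to picture a unified namespace is a mount table that resolves each Tachyon path to the storage system it lives in. The sketch below is hypothetical: `MountTable`, its methods, and the choice of `/Users` and `/Data` as mount points are assumptions for illustration and do not reflect Tachyon's actual API or the exact layout in the diagram.

```python
class MountTable:
    """Toy mount table mapping Tachyon directories to under storage URIs."""

    def __init__(self):
        self._mounts = {}  # normalized Tachyon directory -> under storage URI

    @staticmethod
    def _norm(path):
        # "/Users/", "Users" and "/Users" all normalize to "/Users".
        return "/" + path.strip("/")

    def mount(self, tachyon_dir, under_uri):
        self._mounts[self._norm(tachyon_dir)] = under_uri.rstrip("/")

    def unmount(self, tachyon_dir):
        del self._mounts[self._norm(tachyon_dir)]

    def resolve(self, tachyon_path):
        path = self._norm(tachyon_path)
        best = None
        for mount_point in self._mounts:
            prefix = mount_point.rstrip("/") + "/"
            if path == mount_point or path.startswith(prefix):
                # The longest matching mount point wins.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError("no mount covers " + path)
        suffix = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return base + "/" + suffix if suffix else base


table = MountTable()
table.mount("/Users", "hdfs://host:port/Users")  # Storage System A
table.mount("/Data", "s3n://bucket/directory")   # Storage System B
print(table.resolve("/Users/Alice"))   # hdfs://host:port/Users/Alice
print(table.resolve("/Data/Reports"))  # s3n://bucket/directory/Reports
table.unmount("/Data")                 # on-the-fly unmounting
```

Clients see a single tree of paths, while each subtree can be backed by a different storage system and attached or detached at runtime.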
48. Additional Features
• Remote Write Support
• Easy deployment with Mesos and YARN
• Initial Security Support
• One Command Cluster Deployment
• Metrics for Clients/Workers/Master
49. Outline
• Introduction to Tachyon
• Using Spark with Tachyon
• New Tachyon Features
• Getting Involved
50. Welcome users and collaborators!
Memory-Centric Distributed Storage System