SlideShare a Scribd company logo
1
Spark, Tachyon & Mesos
Internals
Data Days Texas
Jan 2015
2
• xPatterns architecture
• Demos (data pipeline, warehouse, notebook ML, real-
time apis, ops, anti-fraud GUI)
• BDAS++ (jaws-spark-sql-rest, spark-job-rest)
• Spark internals (count, shuffle, kryo)
• Tachyon internals (apis)
• Mesos internals (schedulers, framework starvation)
Agenda
Let’s Build Something
4
• Jaws, http spark sql rest service
• http://github.com/Atigeo/http-spark-sql-server
 Backward compatible with Shark and Spark 0.x stack
• Spark Job Server
 multiple Spark contexts in same JVM, job submission in Java + Scala
https://github.com/Atigeo/spark-job-rest
• Mesos framework starvation bug
 https://github.com/Atigeo/mesos_starvation
• Tachyon patch (https://github.com/amplab/tachyon/pull/482)
BDAS++
5
• persist(OFF_HEAP) temporary storage
• RDD.persist() OFF_HEAP > MEMORY_SER_AND_DISK
• count() when ser/de is expensive, gc cost
• Shuffle Spark vs Hadoop, file consolidation, spillage, compression
• Storage vs memory fraction
• Kryo
• Do not set spark.executor.uri
Spark Internals
6
• partial in-memory file storage bug
• HadoopRDD vs TachyonRDD
• journal file on hdfs -> backup of local master disk
• hdfs api
• RawTable in Shark
• native API: getInStream(CACHE|NO_CACHE) -> local workers
• do not evict blocks when streaming to Tachyon/hdfs
• Tachyon > Spark JVM Cache for long running jobs
• kryo/defaultCodec/sequenceFile format to minimize memory footprint
• Parquet is better! Schema and similar performance
• ramDisk vs SSD
Tachyon Internals
7
• … vs YARN
• Fine-grained vs coarse-grained Spark contexts
• *SchedulerBackend and *ExecutorBackend
• Hadoop-mesos plugin
• Master failover (zk)
• Framework starvation patch
Mesos Internals
8
Q & A
© 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this
presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided
after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related Content

What's hot

Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
airisData
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
Spark Summit
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Spark Summit
 
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Spark Summit
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Spark Summit
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
Elasticsearch
 
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Databricks
 
Automated Metadata Management in Data Lake – A CI/CD Driven Approach
Automated Metadata Management in Data Lake – A CI/CD Driven ApproachAutomated Metadata Management in Data Lake – A CI/CD Driven Approach
Automated Metadata Management in Data Lake – A CI/CD Driven Approach
Databricks
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Spark Summit
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Qubole
 
Introduction and HDInsight best practices
Introduction and HDInsight best practicesIntroduction and HDInsight best practices
Introduction and HDInsight best practices
Ashish Thapliyal
 
Quark Virtualization Engine for Analytics
Quark Virtualization Engine for Analytics Quark Virtualization Engine for Analytics
Quark Virtualization Engine for Analytics
DataWorks Summit/Hadoop Summit
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
In-Memory Computing Summit
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Spark Summit
 
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Databricks
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Databricks
 

What's hot (20)

Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
 
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
 
Automated Metadata Management in Data Lake – A CI/CD Driven Approach
Automated Metadata Management in Data Lake – A CI/CD Driven ApproachAutomated Metadata Management in Data Lake – A CI/CD Driven Approach
Automated Metadata Management in Data Lake – A CI/CD Driven Approach
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
 
Introduction and HDInsight best practices
Introduction and HDInsight best practicesIntroduction and HDInsight best practices
Introduction and HDInsight best practices
 
Quark Virtualization Engine for Analytics
Quark Virtualization Engine for Analytics Quark Virtualization Engine for Analytics
Quark Virtualization Engine for Analytics
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
 
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
Efficiently Triaging CI Pipelines with Apache Spark: Mixing 52 Billion Events...
 

Viewers also liked

Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014
Claudiu Barbura
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
Claudiu Barbura
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
Claudiu Barbura
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
Claudiu Barbura
 
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...
Steve Kramer
 
Glaice poster official civic
Glaice poster official civicGlaice poster official civic
Glaice poster official civic
ND PHARMA BIOTECH
 
Up opening 2014.10.19 verson1
Up opening 2014.10.19  verson1Up opening 2014.10.19  verson1
Up opening 2014.10.19 verson1
Will Lu
 
Mapa mental siomara (1)
Mapa mental siomara (1)Mapa mental siomara (1)
Mapa mental siomara (1)
siomadeluq
 
Service jam taipei 2014 note
Service jam taipei 2014 noteService jam taipei 2014 note
Service jam taipei 2014 noteWill Lu
 
Траекторія ліцеїста
Траекторія ліцеїстаТраекторія ліцеїста
Траекторія ліцеїста
Марина Семенюк
 
Microsoft Azure Batch
Microsoft Azure BatchMicrosoft Azure Batch
Microsoft Azure Batch
Khalid Salama
 
девіантна поведінка
девіантна поведінкадевіантна поведінка
девіантна поведінка
lelipusik
 
презентация ''бібліотека в житті школи'
презентация ''бібліотека в житті школи'презентация ''бібліотека в житті школи'
презентация ''бібліотека в житті школи'tsurkan
 
стаття погрібняк о.с.
стаття погрібняк о.с.стаття погрібняк о.с.
стаття погрібняк о.с.
lelipusik
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
Khalid Salama
 
Cloud Foundry Introduction for CF Meetup Tokyo March 2016
Cloud Foundry Introduction for CF Meetup Tokyo March 2016Cloud Foundry Introduction for CF Meetup Tokyo March 2016
Cloud Foundry Introduction for CF Meetup Tokyo March 2016
Tomohiro Ichimura
 
Empowering DevOps with Cloud Foundry
Empowering DevOps with Cloud FoundryEmpowering DevOps with Cloud Foundry
Empowering DevOps with Cloud Foundry
VMware Tanzu
 
Enterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureEnterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft Azure
Khalid Salama
 

Viewers also liked (18)

Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
 
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,...
 
Glaice poster official civic
Glaice poster official civicGlaice poster official civic
Glaice poster official civic
 
Up opening 2014.10.19 verson1
Up opening 2014.10.19  verson1Up opening 2014.10.19  verson1
Up opening 2014.10.19 verson1
 
Mapa mental siomara (1)
Mapa mental siomara (1)Mapa mental siomara (1)
Mapa mental siomara (1)
 
Service jam taipei 2014 note
Service jam taipei 2014 noteService jam taipei 2014 note
Service jam taipei 2014 note
 
Траекторія ліцеїста
Траекторія ліцеїстаТраекторія ліцеїста
Траекторія ліцеїста
 
Microsoft Azure Batch
Microsoft Azure BatchMicrosoft Azure Batch
Microsoft Azure Batch
 
девіантна поведінка
девіантна поведінкадевіантна поведінка
девіантна поведінка
 
презентация ''бібліотека в житті школи'
презентация ''бібліотека в житті школи'презентация ''бібліотека в житті школи'
презентация ''бібліотека в житті школи'
 
стаття погрібняк о.с.
стаття погрібняк о.с.стаття погрібняк о.с.
стаття погрібняк о.с.
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Cloud Foundry Introduction for CF Meetup Tokyo March 2016
Cloud Foundry Introduction for CF Meetup Tokyo March 2016Cloud Foundry Introduction for CF Meetup Tokyo March 2016
Cloud Foundry Introduction for CF Meetup Tokyo March 2016
 
Empowering DevOps with Cloud Foundry
Empowering DevOps with Cloud FoundryEmpowering DevOps with Cloud Foundry
Empowering DevOps with Cloud Foundry
 
Enterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureEnterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft Azure
 

Similar to Spark, Tachyon and Mesos internals

Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
Timothy Spann
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
Abhishek Gupta
 
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]
オラクルエンジニア通信
 
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
Iulia Emanuela Iancuta
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
In-Memory Computing Summit
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]
【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]
【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]
オラクルエンジニア通信
 
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
オラクルエンジニア通信
 
Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16
Comsysto Reply GmbH
 
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]
オラクルエンジニア通信
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Databricks
 
Terraform - Taming Modern Clouds
Terraform  - Taming Modern CloudsTerraform  - Taming Modern Clouds
Terraform - Taming Modern Clouds
Nic Jackson
 
Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010
Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010
Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010
Arun Gupta
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
Databricks
 
Five essential new enhancements in azure HDnsight
Five essential new enhancements in azure HDnsightFive essential new enhancements in azure HDnsight
Five essential new enhancements in azure HDnsight
Ashish Thapliyal
 
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics PlatformHow We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
DataStax Academy
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Timothy Spann
 
【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]
オラクルエンジニア通信
 

Similar to Spark, Tachyon and Mesos internals (20)

Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2020年1月版]
 
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]
【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]
【旧版】Oracle Cloud Infrastructure:サービス概要のご紹介 [2020年2月版]
 
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介
 
Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16
 
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2020年8月版]
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
 
Terraform - Taming Modern Clouds
Terraform  - Taming Modern CloudsTerraform  - Taming Modern Clouds
Terraform - Taming Modern Clouds
 
Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010
Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010
Running your Java EE 6 applications in the Cloud @ Silicon Valley Code Camp 2010
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
 
Five essential new enhancements in azure HDnsight
Five essential new enhancements in azure HDnsightFive essential new enhancements in azure HDnsight
Five essential new enhancements in azure HDnsight
 
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics PlatformHow We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
 
【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]
【旧版】Oracle Autonomous Database:サービス概要のご紹介 [2020年8月版]
 

Recently uploaded

Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 

Recently uploaded (20)

Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 

Spark, Tachyon and Mesos internals

  • 1. 1 Spark, Tachyon & Mesos Internals Data Days Texas Jan 2015
  • 2. 2 • xPatterns architecture • Demos (data pipeline, warehouse, notebook ML, real- time apis, ops, anti-fraud GUI) • BDAS++ (jaws-spark-sql-rest, spark-job-rest) • Spark internals (count, shuffle, kryo) • Tachyon internals (apis) • Mesos internals (schedulers, framework starvation) Agenda
  • 4. 4 • Jaws, http spark sql rest service • http://github.com/Atigeo/http-spark-sql-server  Backward compatible with Shark and Spark 0.x stack • Spark Job Server  multiple Spark contexts in same JVM, job submission in Java + Scala https://github.com/Atigeo/spark-job-rest • Mesos framework starvation bug  https://github.com/Atigeo/mesos_starvation • Tachyon patch (https://github.com/amplab/tachyon/pull/482) BDAS++
  • 5. 5 • persist(OFF_HEAP) temporary storage • RDD.persist() OFF_HEAP > MEMORY_SER_AND_DISK • count() when ser/de is expensive, gc cost • Shuffle Spark vs Hadoop, file consolidation, spillage, compression • Storage vs memory fraction • Kryo • Do not set spark.executor.uri Spark Internals
  • 6. 6 • partial in-memory file storage bug • HadoopRDD vs TachyonRDD • journal file on hdfs -> backup of local master disk • hdfs api • RawTable in Shark • native API: getInStream(CACHE|NO_CACHE) -> local workers • do not evict blocks when streaming to Tachyon/hdfs • Tachyon > Spark JVM Cache for long running jobs • kryo/defaultCodec/sequenceFile format to minimize memory footprint • Parquet is better! Schema and similar performance • ramDisk vs SSD Tachyon Internals
  • 7. 7 • … vs YARN • Fine-grained vs coarse-grained Spark contexts • *SchedulerBackend and *ExecutorBackend • Hadoop-mesos plugin • Master failover (zk) • Framework starvation patch Mesos Internals
  • 9. © 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.