SlideShare a Scribd company logo
Tachyon in xPatterns 
Tachyon Meetup SF 
1 
Oct 2014
Agenda 
• xPatterns architecture 
• BDAS++ 
• Demos 
• Tachyon internals (apis) 
• Lessons learned 
2
3
BDAS++ 
• Tachyon patch (https://github.com/amplab/tachyon/pull/482) 
• Jaws, xPatterns http spark sql server http://github.com/Atigeo/http-spark-sql- 
4 
server 
 Backward compatible with Shark and Spark 0.x stack 
• Spark Job Server 
 multiple Spark contexts in same JVM, job submission in Java + Scala 
https://github.com/Atigeo/spark-job-rest 
• Mesos framework starvation bug 
 submitted patch… detailed Tech Blog link soon at http://xpatterns.com 
• *SchedulerBackend update, 0.9.0 patches (shuffle spill, Mesos fine-grained)
Demos … 
5
Lessons learned 
• partial in-memory file storage bug 
• journal file on hdfs -> backup of local master disk 
• hdfs api 
• RawTable in Shark 
• persist(OFF_HEAP) temporary storage 
• RDD.persist() OFF_HEAP > MEMORY_SER_AND_DISK 
• native API: getInStream(CACHE|NO_CACHE) -> local workers 
• do not evict blocks when streaming to Tachyon/hdfs 
• Tachyon > Spark JVM Cache for long running jobs 
• kryo/defaultCodec/sequenceFile format to minimize memory footprint 
• 25million emails/month 2TB, 3-45 nodes, 120-170GB of RAM for Tachyon 
6
Q & A 
7
© 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this 
presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided 
after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related Content

What's hot

Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
Knoldus Inc.
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
Yousun Jeong
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
Legacy Typesafe (now Lightbend)
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
DataWorks Summit/Hadoop Summit
 
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo OliveiraUsing Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Spark Summit
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B Files
ScyllaDB
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
DataStax
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
Databricks
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
DataWorks Summit/Hadoop Summit
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Databricks
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark Summit
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
DataStax Academy
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuck
Data Con LA
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
Spark Summit
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
Spark Summit
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Databricks
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Helena Edelson
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
confluent
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Helena Edelson
 

What's hot (20)

Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo OliveiraUsing Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
Using Apache Spark in the Cloud—A Devops Perspective with Telmo Oliveira
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B Files
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuck
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 

Similar to Tachyon meetup San Francisco Oct 2014

Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
Timothy Spann
 
Caching in applications still matters
Caching in applications still mattersCaching in applications still matters
Caching in applications still matters
Anthony Dahanne
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
In-Memory Computing Summit
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
 
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
David Buck
 
Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16
Comsysto Reply GmbH
 
Terraform - Taming Modern Clouds
Terraform  - Taming Modern CloudsTerraform  - Taming Modern Clouds
Terraform - Taming Modern Clouds
Nic Jackson
 
Share point 2013 distributed cache
Share point 2013 distributed cacheShare point 2013 distributed cache
Share point 2013 distributed cache
Michael Nokhamzon
 
Using Snap Clone with Enterprise Manager 12c
Using Snap Clone with Enterprise Manager 12cUsing Snap Clone with Enterprise Manager 12c
Using Snap Clone with Enterprise Manager 12c
Pete Sharman
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Yahoo!デベロッパーネットワーク
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Timothy Spann
 
Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014
Shay Hassidim
 
Advanced caching techniques with ehcache, big memory, terracotta, and coldfusion
Advanced caching techniques with ehcache, big memory, terracotta, and coldfusionAdvanced caching techniques with ehcache, big memory, terracotta, and coldfusion
Advanced caching techniques with ehcache, big memory, terracotta, and coldfusion
ColdFusionConference
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Chris Fregly
 
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouHTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
David Delabassee
 
HDInsight for Architects
HDInsight for ArchitectsHDInsight for Architects
HDInsight for Architects
Ashish Thapliyal
 
HotSpotコトハジメ
HotSpotコトハジメHotSpotコトハジメ
HotSpotコトハジメ
Yasumasa Suenaga
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
Abhishek Gupta
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018
CodeOps Technologies LLP
 

Similar to Tachyon meetup San Francisco Oct 2014 (20)

Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
 
Caching in applications still matters
Caching in applications still mattersCaching in applications still matters
Caching in applications still matters
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
Full Speed Ahead! (Ahead-of-Time Compilation for Java SE) [JavaOne 2017 CON3738]
 
Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16Caching and JCache with Greg Luck 18.02.16
Caching and JCache with Greg Luck 18.02.16
 
Terraform - Taming Modern Clouds
Terraform  - Taming Modern CloudsTerraform  - Taming Modern Clouds
Terraform - Taming Modern Clouds
 
Share point 2013 distributed cache
Share point 2013 distributed cacheShare point 2013 distributed cache
Share point 2013 distributed cache
 
Using Snap Clone with Enterprise Manager 12c
Using Snap Clone with Enterprise Manager 12cUsing Snap Clone with Enterprise Manager 12c
Using Snap Clone with Enterprise Manager 12c
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
 
Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014Xap memory xtend-tutorial-2014
Xap memory xtend-tutorial-2014
 
Advanced caching techniques with ehcache, big memory, terracotta, and coldfusion
Advanced caching techniques with ehcache, big memory, terracotta, and coldfusionAdvanced caching techniques with ehcache, big memory, terracotta, and coldfusion
Advanced caching techniques with ehcache, big memory, terracotta, and coldfusion
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on G...
 
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouHTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
 
HDInsight for Architects
HDInsight for ArchitectsHDInsight for Architects
HDInsight for Architects
 
HotSpotコトハジメ
HotSpotコトハジメHotSpotコトハジメ
HotSpotコトハジメ
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018
 

Recently uploaded

Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
Madan Karki
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
RamonNovais6
 
Design and optimization of ion propulsion drone
Design and optimization of ion propulsion droneDesign and optimization of ion propulsion drone
Design and optimization of ion propulsion drone
bjmsejournal
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
architagupta876
 
An improved modulation technique suitable for a three level flying capacitor ...
An improved modulation technique suitable for a three level flying capacitor ...An improved modulation technique suitable for a three level flying capacitor ...
An improved modulation technique suitable for a three level flying capacitor ...
IJECEIAES
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
gowrishankartb2005
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
LAXMAREDDY22
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
bijceesjournal
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 

Recently uploaded (20)

Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 
Design and optimization of ion propulsion drone
Design and optimization of ion propulsion droneDesign and optimization of ion propulsion drone
Design and optimization of ion propulsion drone
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
 
An improved modulation technique suitable for a three level flying capacitor ...
An improved modulation technique suitable for a three level flying capacitor ...An improved modulation technique suitable for a three level flying capacitor ...
An improved modulation technique suitable for a three level flying capacitor ...
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 

Tachyon meetup San Francisco Oct 2014

  • 1. Tachyon in xPatterns Tachyon Meetup SF 1 Oct 2014
  • 2. Agenda • xPatterns architecture • BDAS++ • Demos • Tachyon internals (apis) • Lessons learned 2
  • 3. 3
  • 4. BDAS++ • Tachyon patch (https://github.com/amplab/tachyon/pull/482) • Jaws, xPatterns http spark sql server http://github.com/Atigeo/http-spark-sql- 4 server  Backward compatible with Shark and Spark 0.x stack • Spark Job Server  multiple Spark contexts in same JVM, job submission in Java + Scala https://github.com/Atigeo/spark-job-rest • Mesos framework starvation bug  submitted patch… detailed Tech Blog link soon at http://xpatterns.com • *SchedulerBackend update, 0.9.0 patches (shuffle spill, Mesos fine-grained)
  • 6. Lessons learned • partial in-memory file storage bug • journal file on hdfs -> backup of local master disk • hdfs api • RawTable in Shark • persist(OFF_HEAP) temporary storage • RDD.persist() OFF_HEAP > MEMORY_SER_AND_DISK • native API: getInStream(CACHE|NO_CACHE) -> local workers • do not evict blocks when streaming to Tachyon/hdfs • Tachyon > Spark JVM Cache for long running jobs • kryo/defaultCodec/sequenceFile format to minimize memory footprint • 25million emails/month 2TB, 3-45 nodes, 120-170GB of RAM for Tachyon 6
  • 7. Q & A 7
  • 8. © 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Editor's Notes

  1. The physical architecture diagram for our largest customer deployment, demonstrating the enterprise-grade attributes of the platform: scalability, high availability, performance, resilience, manageability while providing means for geo-failover (warehouse), geo-replication (real-time DB), data and system monitoring, instrumentation, backup & restore. Cassandra rings are DC-replicated across EC2 east and west coast regions, data between geo-replicas synchronized in real time through an ipsec tunnel (VPC-to-VPC). Geo-replicated apis behind an AWS Route 53 DNS service (latency based resource records sets) and ELBs ensures users requests are served from the closest geographical location. Failure to an entire region (happened to us during a big conference!) does not affect our availability and SLAs. User facing dashboards are served from Cassandra (real-time store), with data being exported from a data warehouse (Shark/Hive) build on top a Mesos-managed Spark/Hadoop cluster. Export jobs are instrumented and provide a throttling mechanism to control throughput. Export jobs run on the east-coast only, data is synchronized in real time with the west coast ring. Generated apis are automatically instrumented (Graphite) and monitored (Nagios).
  2. Referral Provider Network: one of the 6 applications that we built for our healthcare customer using the xPatterns APIs and tools on the new beyond Hadoop infrastructure: ELT Pipeline, Export to NoSQL API. The dashboard for the RPN application was built using D3.js and angular against the generic api published by the export tool. The application allows for building a graph of downstream and upstream referred and referring providers, grouped by specialty and with computed aggregates like patient counts, claim counts and total charged amounts. RPN is used for both fraud detection and for aiding a clinic buying decision, by following the busiest graph paths. The dataset behind the app consists of 8 billion medical records, from which we extracted 1.7 million providers (Shark warehouse) and built 53 million relationships in the graph (persisted in Cassandra)