SlideShare a Scribd company logo
Distributed Systems from
Scratch - Part 1
Motivation and Introduction to Apache Mesos
https://github.com/phatak-dev/distributedsystems
● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Idea
● Motivation
● Architecture of existing big data system
● What we want to build?
● Introduction to Apache Mesos
● Distributed Shell
● Function API
● Custom executor
Idea
“What it takes to build a
distributed processing system
like Spark?”
Motivation
● First version of Spark only had 1600 lines of Scala code
● Had all basic pieces of RDD and ability to run
distributed system using Mesos
● Recreating the same code with step by step
understanding
● Ample of time in hand
Distributed systems from 30000ft
Distributed Storage(HDFS/S3)
Distributed Cluster management
(YARN/Mesos)
Distributed Processing Systems
(Spark/MapReduce)
Data Applications
Standardization of frameworks
● Building a distributed processing system is like building
a web framework
● Already we have excellent underneath frameworks like
YARN,Mesos for cluster management and HDFS for
distributed storage
● We can build on these frameworks rather than trying to
do everything from scratch
● Most of third generation systems like Spark, Flink do the
same
Conventional wisdom
● To build distributed system you need to read complex
papers
● Understand the details of how distribution is done using
different protocols
● Need to care about complexities of concurrency ,
locking etc
● Need to do everything from scratch
Modern wisdom
● Read spark code to understand how to build a
distributed processing system
● Use Apache Mesos and YARN to tedious cluster
resource management
● Use AKKA to do distributed concurrency
● Use excellent proven frameworks rather inventing your
own
Why this talk in Spark meetup?
YARN/Mesos
Applications Experience sharing
Introduction sessions
Anatomy Sessions
Spark on YARN
Spark
Runtime
Data abstraction( RDD/ Dataframe)
API’s
Top down
approach
Top down approach
● We started discussing Spark API’s about using
introductory sessions like Spark batch, Spark streaming
● Once we understood the basic API’s, we have
discussed different abstraction layers like RDD,
Dataframe in our anatomy sessions
● We have also talked about spark runtime like data
sources in one of our anatomy session
● Last meetup we discussed cluster management in
session Spark on YARN
Bottom up approach
● Start at the cluster management layer using mesos and
YARN
● Build
○ Runtime
○ Abstractions
○ API’s
● Build application using our own abstractions and
runtime
● Use all we learnt in our top down approach
Design
● Heavily influenced by the way Apache Spark is built
● Lot of code and design comes from Spark code
● No dependency on the spark itself
● Only implements very basic distributed processing
pieces
● Make it work on Apache mesos and Apache YARN
● Process oriented not data oriented
Spark at it’s birth - 2010
● Only 1600 lines of Scala code
● Used Apache Mesos for cluster management
● Used Mesos messaging API for concurrency
management (no AKKA)
● Used scala functions as processing abstraction rather
than DAG
● No optimizations
Steps to get there
● Learn Apache Mesos
● Implement a simple hello world on Mesos
● Implement simple function oriented API on mesos
● Support third party libraries
● Support shuffle
● Support aggregations and counters
● Implement similar functionality on YARN
Apache Mesos
● Apache mesos is an open source cluster manager
● It "provides efficient resource isolation and sharing
across distributed applications, or frameworks
● Built at UC Berkeley
● YARN ideas are inspired by Mesos
● Written in C++
● Uses linux cgroups (aka Docker) for resource isolation
Why Mesos?
● Abstracts out the managing resources from processing
application
● Handles cluster setup and management
● With help of zookeeper, can provide master fault
tolerance
● Modular and simple API
● Supports different distributed processing systems on the
same cluster
● Provides API’s in multiple languages like C++,Java
Architecture of Mesos
Mesos Master
Mesos slave Mesos slave Mesos slave
Hadoop
Scheduler
Spark Scheduler
Hadoop
Executor
Spark
Executor
Custom
Framework
Custom
executor
Frameworks
Architecture of Mesos
● Mesos master - Single master node of the mesos
cluster. Entry point to any mesos application.
● Mesos slaves - Each machine in cluster runs mesos
slave which is responsible for running tasks
● Framework - Distributed Application build using Apache
Mesos API
○ Scheduler - Entrypoint to framework. Responsible
for launching tasks
○ Executor - Runs actual tasks on mesos slaves
Starting mesos
● Starting master
bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos
● Starting slave
bin/mesos-slave.sh --master=127.0.0.1:5050
● Accessing UI
http://127.0.0.1:5050
● http://blog.madhukaraphatak.com/mesos-single-node-
setup-ubuntu/
Hello world on Mesos
● Run a simple shell command in each mesos slave
● We create our own framework which is capable of
running shell commands
● Our framework should these three following
components
○ Client
○ Scheduler
○ Executor
Client
● Code that submits the tasks to the framework
● Task is an abstraction used by mesos to indicate any
piece of work which takes some resources.
● It’s similar to driver program in Spark
● It create an instance of the framework and submits to
mesos driver
● Mesos uses protocol buffer for serialization
● Example code
DistributedShell.scala
Scheduler
● Every framework in the apache mesos, should extend
the scheduler interface
● Scheduler is the entry point for our custom framework
● It’s similar to Sparkcontext
● We need to override
○ resourceoffers
● It acts like Application master from the YARN
Offers
● Each resource in the mesos is offered as the offer
● Whenever there is resource (disk,memory and cpu)
mesos offers it to all the frameworks running on it
● A framework can accept the offer and use it for running
it’s own tasks
● Once execution is done, it can release that resource so
that mesos can offer to other framework
● Quite different than the YARN model
Executor
● Once a framework receives the offer, it has to specify
the executor which actually run a piece of code on work
nodes
● Executor sets up environment to run each task given by
client
● Scheduler uses this executor to run each task
● In our distributed shell example, we use the default
executor provided by the mesos
Task
● Task is an abstraction used by mesos to indicate any
piece of work which takes some resources.
● It’s basic unit of computation of processing on mesos
● It has
○ Id
○ Offer (resources)
○ Executor
○ Slave Id - machine on which it’s has to run
Scala Scheduler example
Running hello world
● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar -
Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.
mesos.helloworld.DistributedShell "/bin/echo hello"
● Mesos needs the it’s library *.so files in the classpath to
connect to the mesos cluster
● Once execution is done, we can look at the all tasks ran
for a given framework from mesos UI
● Let’s look the ones for our distributed shell application
Custom executor
● In last example, we ran shell commands
● What if we want to run some custom code which is of
the type of Java/Scala?
● We need to define our own executor which setups the
environment to run the code rather than using the built
in command executor
● Executors are the way mesos supports the ability
different language frameworks on same cluster
Defining function task API
● We are going to define an abstraction of tasks which
wraps a simple scala function
● This allows to run any given pure scala function on large
cluster
● This is the spark started to support distributed
processing for it’s rdd in the initial implementation
● This task will extend the serializable which allows us to
serialize the function over network
● Example : Task.scala
Task scheduler
● Similar to earlier scheduler but uses custom executor
rather default one
● Creates the TaskInfo object which contains
○ Offer
○ Executor
○ Serialized function as data
● getExecutorInfo uses custom script to launch our own
TaskExecutor
● TaskScheduler.scala
Task executor
● Task executor is our custom executor which is capable
of running our function tasks
● It creates an instance of mesos executor and overrides
launchTask
● It deserializes the task from the task info object which
was sent by the task scheduler
● Once it deserializes the object, it runs that function in
that machine
● Example : TaskExecutor.scala
CustomTasks
● Once we everything in place, we can run any scala
function in the distributed manner now.
● We can create different kind of scala functions and wrap
inside our function task abstraction
● In our client, we create multiple tasks and submit to the
task scheduler
● Observe that the API also supports the closures
● Example : CustomTasks.scala
Running custom executor
● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar -
Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.
mesos.customexecutor.CustomTasks localhost:5050
/home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resou
rces/run-executor.sh
● We are passing the script which has the environment to launch our custom
executor
● In our example, we are using local file system. You can use the hdfs for the
same
References
● http://blog.madhukaraphatak.com/mesos-single-node-
setup-ubuntu/
● http://blog.madhukaraphatak.com/mesos-helloworld-
scala/
● http://blog.madhukaraphatak.com/custom-mesos-
executor-scala/

More Related Content

What's hot

Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
datamantra
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
datamantra
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
datamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
datamantra
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
datamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
datamantra
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
datamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
datamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
datamantra
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
datamantra
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
datamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
datamantra
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
datamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
datamantra
 
Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
datamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
datamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
datamantra
 

What's hot (20)

Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 

Viewers also liked

Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
Alex Payne
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
Paco Nathan
 
Introduction to mesos
Introduction to mesosIntroduction to mesos
Introduction to mesos
Murali Iyengar
 
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Aysylu Greenberg
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
Tudor Lapusan
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
Yiguang Hu
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
Krishna-Kumar
 
IoT 공통 보안가이드
IoT 공통 보안가이드IoT 공통 보안가이드
IoT 공통 보안가이드
봉조 김
 
(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회
봉조 김
 
4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서
봉조 김
 
2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료
봉조 김
 
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 24.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
봉조 김
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
Taposh Roy
 
Ranking the Web with Spark
Ranking the Web with SparkRanking the Web with Spark
Ranking the Web with Spark
Sylvain Zimmer
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channels
Freeman Zhang
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
datamantra
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
jeykottalam
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
datamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
datamantra
 

Viewers also liked (20)

Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
 
Introduction to mesos
Introduction to mesosIntroduction to mesos
Introduction to mesos
 
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
 
IoT 공통 보안가이드
IoT 공통 보안가이드IoT 공통 보안가이드
IoT 공통 보안가이드
 
(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회
 
4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서
 
2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료
 
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 24.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
 
Ranking the Web with Spark
Ranking the Web with SparkRanking the Web with Spark
Ranking the Web with Spark
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channels
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
 

Similar to Building Distributed Systems from Scratch - Part 1

Apache spark - Installation
Apache spark - InstallationApache spark - Installation
Apache spark - Installation
Martin Zapletal
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource Managers
Anant Corporation
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Olalekan Fuad Elesin
 
Internals
InternalsInternals
Internals
Sandeep Purohit
 
internals
internalsinternals
internals
Sandeep Purohit
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark Internals
Knoldus Inc.
 
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Anant Corporation
 
Containerization - The DevOps Revolution
Containerization - The DevOps RevolutionContainerization - The DevOps Revolution
Containerization - The DevOps Revolution
Yulian Slobodyan
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
ShidrokhGoudarzi1
 
Docker, Mesos, Spark
Docker, Mesos, Spark Docker, Mesos, Spark
Docker, Mesos, Spark
Qiang Wang
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
Synergetics Learning and Cloud Consulting
 
Modern web technologies
Modern web technologiesModern web technologies
Modern web technologies
Simeon Prusiyski
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
Legacy Typesafe (now Lightbend)
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Ahmet Bulut
 
CLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptx
CLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptxCLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptx
CLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptx
bhuvankumar3877
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
datamantra
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
Jyotasana Bharti
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
Mostafa
 

Similar to Building Distributed Systems from Scratch - Part 1 (20)

Apache spark - Installation
Apache spark - InstallationApache spark - Installation
Apache spark - Installation
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource Managers
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Internals
InternalsInternals
Internals
 
internals
internalsinternals
internals
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark Internals
 
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
 
Containerization - The DevOps Revolution
Containerization - The DevOps RevolutionContainerization - The DevOps Revolution
Containerization - The DevOps Revolution
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Docker, Mesos, Spark
Docker, Mesos, Spark Docker, Mesos, Spark
Docker, Mesos, Spark
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Modern web technologies
Modern web technologiesModern web technologies
Modern web technologies
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
CLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptx
CLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptxCLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptx
CLOUD_COMPUTING_MODULE5_RK_BIG_DATA.pptx
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 

More from datamantra

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
datamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
datamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
datamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
datamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
datamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
datamantra
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
datamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
datamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
datamantra
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
datamantra
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
datamantra
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
datamantra
 

More from datamantra (13)

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
 

Recently uploaded

BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
erynsouthern
 
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
kinni singh$A17
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
PyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache ArrowPyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache Arrow
Uwe Korn
 
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
Grant McAlister
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
kuldeepsharmaks8120
 
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
norina2645
 
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
weiwchu
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
bhupeshkumar0889
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
kittycrispy617
 
VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...
VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...
VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...
Ak47
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
kuldeepsharmaks8120
 
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdfWhy_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Alexander Teggin
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
satpalsheravatmumbai
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Samuel Jackson
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
BaraDaniel1
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 

Recently uploaded (20)

BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
 
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
PyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache ArrowPyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache Arrow
 
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
 
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
Mumbai Girls Call Mumbai 🛵🚡9910780858 💃 Choose Best And Top Girl Service And ...
 
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
 
VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...
VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...
VVIP Girls Call Noida 9873940964 Provide Best And Top Girl Service And No1 in...
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdfWhy_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 

Building Distributed Systems from Scratch - Part 1

  • 1. Distributed Systems from Scratch - Part 1 Motivation and Introduction to Apache Mesos https://github.com/phatak-dev/distributedsystems
  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Idea ● Motivation ● Architecture of existing big data system ● What we want to build? ● Introduction to Apache Mesos ● Distributed Shell ● Function API ● Custom executor
  • 4. Idea “What it takes to build a distributed processing system like Spark?”
  • 5. Motivation ● First version of Spark only had 1600 lines of Scala code ● Had all basic pieces of RDD and ability to run distributed system using Mesos ● Recreating the same code with step by step understanding ● Ample of time in hand
  • 6. Distributed systems from 30000ft Distributed Storage(HDFS/S3) Distributed Cluster management (YARN/Mesos) Distributed Processing Systems (Spark/MapReduce) Data Applications
  • 7. Standardization of frameworks ● Building a distributed processing system is like building a web framework ● Already we have excellent underneath frameworks like YARN,Mesos for cluster management and HDFS for distributed storage ● We can build on these frameworks rather than trying to do everything from scratch ● Most of third generation systems like Spark, Flink do the same
  • 8. Conventional wisdom ● To build distributed system you need to read complex papers ● Understand the details of how distribution is done using different protocols ● Need to care about complexities of concurrency , locking etc ● Need to do everything from scratch
  • 9. Modern wisdom ● Read spark code to understand how to build a distributed processing system ● Use Apache Mesos and YARN to tedious cluster resource management ● Use AKKA to do distributed concurrency ● Use excellent proven frameworks rather inventing your own
  • 10. Why this talk in Spark meetup? YARN/Mesos Applications Experience sharing Introduction sessions Anatomy Sessions Spark on YARN Spark Runtime Data abstraction( RDD/ Dataframe) API’s Top down approach
  • 11. Top down approach ● We started discussing Spark API’s about using introductory sessions like Spark batch, Spark streaming ● Once we understood the basic API’s, we have discussed different abstraction layers like RDD, Dataframe in our anatomy sessions ● We have also talked about spark runtime like data sources in one of our anatomy session ● Last meetup we discussed cluster management in session Spark on YARN
  • 12. Bottom up approach ● Start at the cluster management layer using mesos and YARN ● Build ○ Runtime ○ Abstractions ○ API’s ● Build application using our own abstractions and runtime ● Use all we learnt in our top down approach
  • 13. Design ● Heavily influenced by the way Apache Spark is built ● Lot of code and design comes from Spark code ● No dependency on the spark itself ● Only implements very basic distributed processing pieces ● Make it work on Apache mesos and Apache YARN ● Process oriented not data oriented
  • 14. Spark at it’s birth - 2010 ● Only 1600 lines of Scala code ● Used Apache Mesos for cluster management ● Used Mesos messaging API for concurrency management (no AKKA) ● Used scala functions as processing abstraction rather than DAG ● No optimizations
  • 15. Steps to get there ● Learn Apache Mesos ● Implement a simple hello world on Mesos ● Implement simple function oriented API on mesos ● Support third party libraries ● Support shuffle ● Support aggregations and counters ● Implement similar functionality on YARN
  • 16. Apache Mesos ● Apache mesos is an open source cluster manager ● It "provides efficient resource isolation and sharing across distributed applications, or frameworks ● Built at UC Berkeley ● YARN ideas are inspired by Mesos ● Written in C++ ● Uses linux cgroups (aka Docker) for resource isolation
  • 17. Why Mesos? ● Abstracts out the managing resources from processing application ● Handles cluster setup and management ● With help of zookeeper, can provide master fault tolerance ● Modular and simple API ● Supports different distributed processing systems on the same cluster ● Provides API’s in multiple languages like C++,Java
  • 18. Architecture of Mesos Mesos Master Mesos slave Mesos slave Mesos slave Hadoop Scheduler Spark Scheduler Hadoop Executor Spark Executor Custom Framework Custom executor Frameworks
  • 19. Architecture of Mesos ● Mesos master - Single master node of the mesos cluster. Entry point to any mesos application. ● Mesos slaves - Each machine in cluster runs mesos slave which is responsible for running tasks ● Framework - Distributed Application build using Apache Mesos API ○ Scheduler - Entrypoint to framework. Responsible for launching tasks ○ Executor - Runs actual tasks on mesos slaves
  • 20. Starting mesos ● Starting master bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos ● Starting slave bin/mesos-slave.sh --master=127.0.0.1:5050 ● Accessing UI http://127.0.0.1:5050 ● http://blog.madhukaraphatak.com/mesos-single-node- setup-ubuntu/
  • 21. Hello world on Mesos ● Run a simple shell command in each mesos slave ● We create our own framework which is capable of running shell commands ● Our framework should these three following components ○ Client ○ Scheduler ○ Executor
  • 22. Client ● Code that submits the tasks to the framework ● Task is an abstraction used by mesos to indicate any piece of work which takes some resources. ● It’s similar to driver program in Spark ● It create an instance of the framework and submits to mesos driver ● Mesos uses protocol buffer for serialization ● Example code DistributedShell.scala
  • 23. Scheduler ● Every framework in the apache mesos, should extend the scheduler interface ● Scheduler is the entry point for our custom framework ● It’s similar to Sparkcontext ● We need to override ○ resourceoffers ● It acts like Application master from the YARN
  • 24. Offers ● Each resource in the mesos is offered as the offer ● Whenever there is resource (disk,memory and cpu) mesos offers it to all the frameworks running on it ● A framework can accept the offer and use it for running it’s own tasks ● Once execution is done, it can release that resource so that mesos can offer to other framework ● Quite different than the YARN model
  • 25. Executor ● Once a framework receives the offer, it has to specify the executor which actually run a piece of code on work nodes ● Executor sets up environment to run each task given by client ● Scheduler uses this executor to run each task ● In our distributed shell example, we use the default executor provided by the mesos
  • 26. Task ● Task is an abstraction used by mesos to indicate any piece of work which takes some resources. ● It’s basic unit of computation of processing on mesos ● It has ○ Id ○ Offer (resources) ○ Executor ○ Slave Id - machine on which it’s has to run
  • 28. Running hello world ● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar - Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak. mesos.helloworld.DistributedShell "/bin/echo hello" ● Mesos needs the it’s library *.so files in the classpath to connect to the mesos cluster ● Once execution is done, we can look at the all tasks ran for a given framework from mesos UI ● Let’s look the ones for our distributed shell application
  • 29. Custom executor ● In last example, we ran shell commands ● What if we want to run some custom code which is of the type of Java/Scala? ● We need to define our own executor which setups the environment to run the code rather than using the built in command executor ● Executors are the way mesos supports the ability different language frameworks on same cluster
  • 30. Defining function task API ● We are going to define an abstraction of tasks which wraps a simple scala function ● This allows to run any given pure scala function on large cluster ● This is the spark started to support distributed processing for it’s rdd in the initial implementation ● This task will extend the serializable which allows us to serialize the function over network ● Example : Task.scala
  • 31. Task scheduler ● Similar to earlier scheduler but uses custom executor rather default one ● Creates the TaskInfo object which contains ○ Offer ○ Executor ○ Serialized function as data ● getExecutorInfo uses custom script to launch our own TaskExecutor ● TaskScheduler.scala
  • 32. Task executor ● Task executor is our custom executor which is capable of running our function tasks ● It creates an instance of mesos executor and overrides launchTask ● It deserializes the task from the task info object which was sent by the task scheduler ● Once it deserializes the object, it runs that function in that machine ● Example : TaskExecutor.scala
  • 33. CustomTasks ● Once we everything in place, we can run any scala function in the distributed manner now. ● We can create different kind of scala functions and wrap inside our function task abstraction ● In our client, we create multiple tasks and submit to the task scheduler ● Observe that the API also supports the closures ● Example : CustomTasks.scala
  • 34. Running custom executor ● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar - Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak. mesos.customexecutor.CustomTasks localhost:5050 /home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resou rces/run-executor.sh ● We are passing the script which has the environment to launch our custom executor ● In our example, we are using local file system. You can use the hdfs for the same