Distributed Systems from Scratch - Part 1
Motivation and Introduction to Apache Mesos
https://github.com/phatak-dev/distributedsystems
● Madhukara Phatak
● Big data consultant and trainer at datamantra.io
● Consults in Hadoop, Spark and Scala
● www.madhukaraphatak.com
Agenda
● Idea
● Motivation
● Architecture of existing big data systems
● What do we want to build?
● Introduction to Apache Mesos
● Distributed Shell
● Function API
● Custom executor
Idea
“What does it take to build a distributed processing system like Spark?”
Motivation
● First version of Spark only had 1600 lines of Scala code
● It had all the basic pieces of RDD and the ability to run as a distributed system using Mesos
● Recreate the same code with step-by-step understanding
● Ample time on hand
Distributed systems from 30,000 ft
[Figure: layered stack, top to bottom]
● Data Applications
● Distributed Processing Systems (Spark/MapReduce)
● Distributed Cluster Management (YARN/Mesos)
● Distributed Storage (HDFS/S3)
Standardization of frameworks
● Building a distributed processing system is like building a web framework
● We already have excellent underlying frameworks, such as YARN and Mesos for cluster management and HDFS for distributed storage
● We can build on these frameworks rather than trying to do everything from scratch
● Most third-generation systems, such as Spark and Flink, do the same
Conventional wisdom
● To build a distributed system, you need to read complex papers
● You need to understand the details of how distribution is done using different protocols
● You need to deal with the complexities of concurrency, locking, etc.
● You need to do everything from scratch
Modern wisdom
● Read the Spark code to understand how to build a distributed processing system
● Use Apache Mesos and YARN for the tedious work of cluster resource management
● Use Akka for distributed concurrency
● Use excellent, proven frameworks rather than inventing your own
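Why this talk in a Spark meetup?
[Figure: the Spark stack covered by past meetups, top to bottom, with an arrow labeled "Top down approach"]
● Applications (experience sharing)
● APIs (introduction sessions)
● Data abstraction: RDD/DataFrame (anatomy sessions)
● Spark runtime
● YARN/Mesos (Spark on YARN)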
Top down approach
● We started by discussing Spark APIs in introductory sessions such as Spark batch and Spark streaming
● Once we understood the basic APIs, we discussed different abstraction layers, such as RDD and DataFrame, in our anatomy sessions
● We also talked about the Spark runtime, such as data sources, in one of our anatomy sessions
● In the last meetup, we discussed cluster management in the Spark on YARN session
Bottom up approach
● Start at the cluster management layer using Mesos and YARN
● Build
○ Runtime
○ Abstractions
○ APIs
● Build applications using our own abstractions and runtime
● Use everything we learned in our top-down approach
Design
● Heavily influenced by the way Apache Spark is built
● A lot of the code and design comes from the Spark code
● No dependency on Spark itself
● Only implements the very basic distributed processing pieces
● Make it work on Apache Mesos and Apache YARN
● Process-oriented, not data-oriented
Spark at its birth - 2010
● Only 1600 lines of Scala code
● Used Apache Mesos for cluster management
● Used the Mesos messaging API for concurrency management (no Akka)
● Used Scala functions as the processing abstraction rather than a DAG
● No optimizations
Steps to get there
● Learn Apache Mesos
● Implement a simple hello world on Mesos
● Implement a simple function-oriented API on Mesos
● Support third party libraries
● Support shuffle
● Support aggregations and counters
● Implement similar functionality on YARN
Apache Mesos
● Apache Mesos is an open source cluster manager
● It "provides efficient resource isolation and sharing across distributed applications, or frameworks"
● Built at UC Berkeley
● YARN's ideas are inspired by Mesos
● Written in C++
● Uses Linux cgroups (the kernel feature Docker also builds on) for resource isolation
Why Mesos?
● Separates resource management from the processing application
● Handles cluster setup and management
● With the help of ZooKeeper, it can provide master fault tolerance
● Modular and simple API
● Supports different distributed processing systems on the same cluster
● Provides APIs in multiple languages, such as C++ and Java
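Architecture of Mesos
[Figure: a Mesos Master coordinating three Mesos slaves. Frameworks (Hadoop, Spark and a custom framework) each register a scheduler (Hadoop Scheduler, Spark Scheduler, Custom Framework) with the master and run their executors (Hadoop Executor, Spark Executor, Custom executor) on the slaves.]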
Architecture of Mesos
● Mesos master - The master node of the Mesos cluster and the entry point for any Mesos application
● Mesos slaves - Each machine in the cluster runs a Mesos slave, which is responsible for running tasks
● Framework - A distributed application built using the Apache Mesos API
○ Scheduler - Entry point to the framework; responsible for launching tasks
○ Executor - Runs the actual tasks on Mesos slaves
Starting Mesos
● Starting the master
bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos
● Starting a slave
bin/mesos-slave.sh --master=127.0.0.1:5050
● Accessing the UI
http://127.0.0.1:5050
● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
Hello world on Mesos
● Run a simple shell command on each Mesos slave
● We create our own framework, which is capable of running shell commands
● Our framework needs the following three components
○ Client
○ Scheduler
○ Executor
Client
● Code that submits tasks to the framework
● A task is an abstraction used by Mesos to represent any piece of work that consumes some resources
● It's similar to the driver program in Spark
● It creates an instance of the framework and submits it to the Mesos driver
● Mesos uses protocol buffers for serialization
● Example code: DistributedShell.scala
Scheduler
● Every framework in Apache Mesos should implement the Scheduler interface
● The scheduler is the entry point for our custom framework
● It's similar to SparkContext
● We need to override
○ resourceOffers
● It acts like the Application Master in YARN
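The resourceOffers callback can be illustrated without a cluster. This is a minimal, dependency-free sketch, assuming hypothetical stand-in types (Offer, TaskSpec, ToyScheduler, ShellScheduler) rather than the real org.apache.mesos classes, which use protobuf messages, have more callbacks and launch tasks through a SchedulerDriver. It models a distributed-shell scheduler that turns each batch of offers into tasks, running the command once per slave.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a Mesos offer (not the real protobuf).
final case class Offer(id: String, slaveId: String, cpus: Double, memMb: Double)

// A task as listed on the Task slide: id, resources, what to run, and the target slave.
final case class TaskSpec(id: String, cpus: Double, memMb: Double, command: String, slaveId: String)

// Toy scheduler contract: the cluster manager calls resourceOffers when resources are free.
trait ToyScheduler {
  def resourceOffers(offers: Seq[Offer]): Seq[TaskSpec]
}

// Distributed shell: run `command` once on every slave, using a fixed slice of each offer.
final class ShellScheduler(command: String, cpusPerTask: Double = 1.0, memPerTask: Double = 128.0)
    extends ToyScheduler {
  private var launched = 0
  private val usedSlaves = mutable.Set.empty[String]

  override def resourceOffers(offers: Seq[Offer]): Seq[TaskSpec] =
    offers.flatMap { offer =>
      val fits = offer.cpus >= cpusPerTask && offer.memMb >= memPerTask
      if (fits && !usedSlaves.contains(offer.slaveId)) {
        usedSlaves += offer.slaveId
        launched += 1
        Some(TaskSpec(s"task-$launched", cpusPerTask, memPerTask, command, offer.slaveId))
      } else {
        None // a real framework would decline this offer so it can go to others
      }
    }
}

object ShellSchedulerDemo {
  def main(args: Array[String]): Unit = {
    val scheduler = new ShellScheduler("/bin/echo hello")
    val offers = Seq(Offer("o1", "slave-1", 2, 1024), Offer("o2", "slave-2", 0.5, 512))
    scheduler.resourceOffers(offers).foreach(println)
  }
}
```

In the demo, offer o2 is too small for the 1-CPU slice, so only slave-1 gets a task. Tracking usedSlaves is one simple way to get "run once on each slave"; a real scheduler would also use its statusUpdate callback to learn when tasks finish.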
Offers
● Each resource in Mesos is made available as an offer
● Whenever resources (disk, memory and CPU) are available, Mesos offers them to the frameworks running on it
● A framework can accept an offer and use it to run its own tasks
● Once execution is done, it releases those resources so that Mesos can offer them to other frameworks
● Quite different from the YARN model
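The offer lifecycle (offer, accept, release) can be modeled in a few lines. This is a toy sketch, assuming hypothetical Resources and ToyMaster types with a single resource pool per slave; the real Mesos master delegates this to an allocator and sends offers over the network. The contrast with YARN is the direction of the flow: here the master pushes resources to frameworks, whereas a YARN Application Master requests containers from the ResourceManager.

```scala
import scala.collection.mutable

// Hypothetical resource vector; real offers also carry disk, ports, etc.
final case class Resources(cpus: Double, memMb: Double) {
  def fits(r: Resources): Boolean = r.cpus <= cpus && r.memMb <= memMb
  def -(r: Resources): Resources = Resources(cpus - r.cpus, memMb - r.memMb)
  def +(r: Resources): Resources = Resources(cpus + r.cpus, memMb + r.memMb)
}

// Toy master that tracks unallocated resources per slave (not the Mesos allocator).
final class ToyMaster(slaves: Map[String, Resources]) {
  private val free = mutable.Map(slaves.toSeq: _*)

  // Offer whatever is currently free on a slave.
  def offer(slaveId: String): Resources = free(slaveId)

  // A framework accepts part of an offer; anything it does not use stays available.
  // Returning false models a request that does not fit, i.e. the offer is effectively declined.
  def accept(slaveId: String, used: Resources): Boolean =
    if (free(slaveId).fits(used)) {
      free(slaveId) = free(slaveId) - used
      true
    } else false

  // When the framework's tasks finish, the resources return and can be offered to others.
  def release(slaveId: String, used: Resources): Unit =
    free(slaveId) = free(slaveId) + used
}
```

For example, after one framework accepts 3 CPUs of a 4-CPU slave, the next offer for that slave contains only the remaining 1 CPU until release is called.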
Executor
● Once a framework receives an offer, it has to specify the executor that actually runs a piece of code on the worker nodes
● The executor sets up the environment to run each task given by the client
● The scheduler uses this executor to run each task
● In our distributed shell example, we use the default executor provided by Mesos
Task
● A task is an abstraction used by Mesos to represent any piece of work that consumes some resources
● It's the basic unit of computation on Mesos
● It has
○ Id
○ Offer (resources)
○ Executor
○ Slave Id - the machine on which it has to run
Scala Scheduler example
Running hello world
● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.helloworld.DistributedShell "/bin/echo hello"
● Mesos needs its native library (*.so) files on java.library.path to connect to the Mesos cluster
● Once execution is done, we can look at all the tasks that ran for a given framework in the Mesos UI
● Let's look at the ones for our distributed shell application
Custom executor
● In the last example, we ran shell commands
● What if we want to run custom Java/Scala code?
● We need to define our own executor, which sets up the environment to run the code, rather than using the built-in command executor
● Executors are how Mesos supports frameworks written in different languages on the same cluster
Defining function task API
● We are going to define a task abstraction that wraps a simple Scala function
● This allows us to run any pure Scala function on a large cluster
● This is how Spark initially supported distributed processing for its RDDs
● This task extends Serializable, which allows us to send the function over the network
● Example: Task.scala
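A minimal sketch of such a function task, assuming plain Java serialization and, as in Scala 2.12 and later, serializable function literals. FunctionTask and TaskSerde are hypothetical names, not necessarily those used in Task.scala. The round trip through bytes is what happens when a task travels from the scheduler to a slave.

```scala
import java.io._

// Hypothetical function task: wraps a Scala function so it can be shipped to another machine.
// Extending Serializable lets Java serialization write the task (and its function) to bytes.
final class FunctionTask[T](body: () => T) extends Serializable {
  def run(): T = body()
}

object TaskSerde {
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  def deserialize[T](data: Array[Byte]): T = {
    // Resolve classes through this object's class loader, falling back to the default lookup.
    val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, TaskSerde.getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

On the client, TaskSerde.serialize(new FunctionTask(() => (1 to 10).sum)) produces the bytes that go into the task; on the slave, deserializing them and calling run() returns 55.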
Task scheduler
● Similar to the earlier scheduler, but uses a custom executor rather than the default one
● Creates the TaskInfo object, which contains
○ Offer
○ Executor
○ Serialized function as data
● getExecutorInfo uses a custom script to launch our own TaskExecutor
● Example: TaskScheduler.scala
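How a scheduler might package a function into a task can be sketched as follows. This assumes hypothetical ExecutorSpec and TaskInfoSpec case classes standing in for Mesos' ExecutorInfo and TaskInfo protobufs (in the real API the bytes would go into TaskInfo's data field), offers simplified to (slaveId, cpus) pairs, and one CPU per task; it is not the code in TaskScheduler.scala. Every task points at the same custom executor launch script, and the serialized function rides along as the task's data.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical stand-ins for Mesos' ExecutorInfo and TaskInfo (not the real protobufs).
final case class ExecutorSpec(id: String, launchScript: String)
final case class TaskInfoSpec(taskId: String, slaveId: String, cpus: Double,
                              executor: ExecutorSpec, data: Array[Byte])

final class ToyTaskScheduler(executorScript: String, tasks: Seq[() => Any]) {
  private val pending = mutable.Queue(tasks: _*)
  private var launched = 0

  // Plays the role of getExecutorInfo: every task runs on our custom executor,
  // which the slave starts by running executorScript.
  private def executorInfo: ExecutorSpec = ExecutorSpec("function-task-executor", executorScript)

  private def serialize(f: () => Any): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(f) finally out.close()
    buffer.toByteArray
  }

  // Launch at most one pending task per offer (slaveId, cpus) that has at least one CPU.
  def resourceOffers(offers: Seq[(String, Double)]): Seq[TaskInfoSpec] =
    offers.filter(_._2 >= 1.0).flatMap { case (slaveId, _) =>
      if (pending.isEmpty) None
      else {
        launched += 1
        Some(TaskInfoSpec(s"task-$launched", slaveId, 1.0, executorInfo, serialize(pending.dequeue())))
      }
    }
}
```

Offers with less than one CPU are skipped, and once the queue is empty, further offers produce no tasks.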
Task executor
● The task executor is our custom executor, which is capable of running our function tasks
● It implements the Mesos Executor interface and overrides launchTask
● It deserializes the task from the TaskInfo object sent by the task scheduler
● Once it deserializes the object, it runs that function on that machine
● Example: TaskExecutor.scala
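The executor side is the mirror image: decode the bytes and run the function. This sketch assumes a hypothetical ToyTaskExecutor whose launchTask takes the task id and data bytes directly and returns an outcome, with the payload being a Java-serialized () => Any. A real Mesos executor instead receives a TaskInfo in its launchTask callback and reports TASK_FINISHED or TASK_FAILED through ExecutorDriver.sendStatusUpdate.

```scala
import java.io.{ByteArrayInputStream, ObjectInputStream, ObjectStreamClass}

// Hypothetical task outcome, loosely mirroring Mesos' TASK_FINISHED / TASK_FAILED states.
sealed trait TaskOutcome
final case class Finished(result: Any) extends TaskOutcome
final case class Failed(reason: String) extends TaskOutcome

object ToyTaskExecutor {
  private def deserialize(data: Array[Byte]): Any = {
    // Resolve classes through this object's class loader, falling back to the default lookup.
    val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, ToyTaskExecutor.getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject() finally in.close()
  }

  // Analogue of launchTask: rebuild the function sent by the scheduler and run it on this machine.
  def launchTask(taskId: String, data: Array[Byte]): TaskOutcome =
    try {
      val body = deserialize(data).asInstanceOf[() => Any]
      Finished(body())
    } catch {
      case e: Exception => Failed(s"$taskId failed: $e")
    }
}
```

Turning exceptions into a failed outcome keeps one bad task (corrupt bytes or a throwing function) from taking down the executor process.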
CustomTasks
● Once we have everything in place, we can run any Scala function in a distributed manner
● We can create different kinds of Scala functions and wrap them inside our function task abstraction
● In our client, we create multiple tasks and submit them to the task scheduler
● Observe that the API also supports closures
● Example: CustomTasks.scala
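Closure support follows from serialization: whatever a function captures is serialized along with it. A minimal sketch, assuming Java serialization and serializable Scala function literals (Scala 2.12 and later); ClosureDemo and its helpers are hypothetical. It also shows the usual pitfall: a closure that captures a non-Serializable value fails on the client before it is ever sent.

```scala
import java.io._

object ClosureDemo {
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, ClosureDemo.getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // The returned closure captures `words`, a value that exists only on the client.
  def countNonEmpty(words: Seq[String]): () => Int = () => words.count(_.nonEmpty)

  // This closure captures a java.lang.Object, which is not Serializable.
  def capturesLock(): () => Int = {
    val lock = new Object
    () => lock.hashCode
  }
}
```

Spark runs into the same limitation, which is why "Task not serializable" is a familiar Spark error.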
Running custom executor
● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.customexecutor.CustomTasks localhost:5050 /home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resources/run-executor.sh
● We pass the script that sets up the environment to launch our custom executor
● In our example, we use the local file system. You can use HDFS for the same
References
● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
● http://blog.madhukaraphatak.com/mesos-helloworld-scala/
● http://blog.madhukaraphatak.com/custom-mesos-executor-scala/

More Related Content

What's hot

Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
datamantra
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
datamantra
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
datamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
datamantra
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
datamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
datamantra
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
datamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
datamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
datamantra
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
datamantra
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
datamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
datamantra
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
datamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
datamantra
 
Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
datamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
datamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
datamantra
 

What's hot (20)

Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 

Viewers also liked

Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
Alex Payne
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
Paco Nathan
 
Introduction to mesos
Introduction to mesosIntroduction to mesos
Introduction to mesos
Murali Iyengar
 
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Aysylu Greenberg
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
Tudor Lapusan
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
Yiguang Hu
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
Krishna-Kumar
 
IoT 공통 보안가이드
IoT 공통 보안가이드IoT 공통 보안가이드
IoT 공통 보안가이드
봉조 김
 
(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회
봉조 김
 
4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서
봉조 김
 
2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료
봉조 김
 
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 24.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
봉조 김
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
Taposh Roy
 
Ranking the Web with Spark
Ranking the Web with SparkRanking the Web with Spark
Ranking the Web with Spark
Sylvain Zimmer
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channels
Freeman Zhang
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
datamantra
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
jeykottalam
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
datamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
datamantra
 

Viewers also liked (20)

Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
 
Introduction to mesos
Introduction to mesosIntroduction to mesos
Introduction to mesos
 
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)Building A Distributed Build System at Google Scale (StrangeLoop 2016)
Building A Distributed Build System at Google Scale (StrangeLoop 2016)
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
 
IoT 공통 보안가이드
IoT 공통 보안가이드IoT 공통 보안가이드
IoT 공통 보안가이드
 
(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회(2016 08-02) 멘토스성과발표간담회
(2016 08-02) 멘토스성과발표간담회
 
4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서4.16세월호참사 특별조사위원회 중간점검보고서
4.16세월호참사 특별조사위원회 중간점검보고서
 
2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료2015개정교육과정질의 응답자료
2015개정교육과정질의 응답자료
 
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 24.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
4.16세월호참사 특별조사위원회 제3차 청문회 자료집 3차 청문회 자료집(최종) 2
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
 
Ranking the Web with Spark
Ranking the Web with SparkRanking the Web with Spark
Ranking the Web with Spark
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channels
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
 

Similar to Building Distributed Systems from Scratch - Part 1

Apache spark - Installation
Apache spark - InstallationApache spark - Installation
Apache spark - Installation
Martin Zapletal
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource Managers
Anant Corporation
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Olalekan Fuad Elesin
 
Internals
InternalsInternals
Internals
Sandeep Purohit
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark Internals
Knoldus Inc.
 
internals
internalsinternals
internals
Sandeep Purohit
 
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Anant Corporation
 
Containerization - The DevOps Revolution
Containerization - The DevOps RevolutionContainerization - The DevOps Revolution
Containerization - The DevOps Revolution
Yulian Slobodyan
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
ShidrokhGoudarzi1
 
Docker, Mesos, Spark
Docker, Mesos, Spark Docker, Mesos, Spark
Docker, Mesos, Spark
Qiang Wang
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
Synergetics Learning and Cloud Consulting
 
Modern web technologies
Modern web technologiesModern web technologies
Modern web technologies
Simeon Prusiyski
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
Legacy Typesafe (now Lightbend)
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
Ahmet Bulut
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
datamantra
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
Jyotasana Bharti
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
Mostafa
 
Spark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingSpark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computing
Demi Ben-Ari
 

Similar to Building Distributed Systems from Scratch - Part 1 (20)

Apache spark - Installation
Apache spark - InstallationApache spark - Installation
Apache spark - Installation
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource Managers
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Internals
InternalsInternals
Internals
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark Internals
 
internals
internalsinternals
internals
 
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
Apache Cassandra Lunch #41: Cassandra on Kubernetes - Docker/Kubernetes/Helm ...
 
Containerization - The DevOps Revolution
Containerization - The DevOps RevolutionContainerization - The DevOps Revolution
Containerization - The DevOps Revolution
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Docker, Mesos, Spark
Docker, Mesos, Spark Docker, Mesos, Spark
Docker, Mesos, Spark
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Modern web technologies
Modern web technologiesModern web technologies
Modern web technologies
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)Apache spark architecture (Big Data and Analytics)
Apache spark architecture (Big Data and Analytics)
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Spark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computingSpark 101 - First steps to distributed computing
Spark 101 - First steps to distributed computing
 

More from datamantra

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
datamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
datamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
datamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
datamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
datamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
datamantra
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
datamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
datamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
datamantra
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
datamantra
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
datamantra
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
datamantra
 

More from datamantra (13)

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
 

Building Distributed Systems from Scratch - Part 1

  • 1. Distributed Systems from Scratch - Part 1 Motivation and Introduction to Apache Mesos https://github.com/phatak-dev/distributedsystems
  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Idea ● Motivation ● Architecture of existing big data systems ● What do we want to build? ● Introduction to Apache Mesos ● Distributed Shell ● Function API ● Custom executor
  • 4. Idea “What does it take to build a distributed processing system like Spark?”
  • 5. Motivation ● The first version of Spark had only 1600 lines of Scala code ● It had all the basic pieces of RDD and the ability to run as a distributed system on Mesos ● Recreating the same code with step-by-step understanding ● Plenty of time on hand
  • 6. Distributed systems from 30,000 ft (layered stack, from storage up to applications): Distributed Storage (HDFS/S3) ● Distributed Cluster Management (YARN/Mesos) ● Distributed Processing Systems (Spark/MapReduce) ● Data Applications
  • 7. Standardization of frameworks ● Building a distributed processing system is like building a web framework ● We already have excellent underlying frameworks like YARN and Mesos for cluster management and HDFS for distributed storage ● We can build on these frameworks rather than trying to do everything from scratch ● Most third-generation systems, like Spark and Flink, do the same
  • 8. Conventional wisdom ● To build a distributed system, you need to read complex papers ● Understand the details of how distribution is done using different protocols ● Care about the complexities of concurrency, locking, etc. ● Do everything from scratch
  • 9. Modern wisdom ● Read the Spark code to understand how to build a distributed processing system ● Use Apache Mesos and YARN for tedious cluster resource management ● Use Akka for distributed concurrency ● Use excellent, proven frameworks rather than inventing your own
  • 10. Why this talk in a Spark meetup? (Diagram: the Spark stack, covered top down in our earlier sessions. Applications: experience sharing. APIs: introduction sessions. Data abstraction (RDD/Dataframe) and Spark Runtime: anatomy sessions. YARN/Mesos: Spark on YARN.) Top-down approach
  • 11. Top-down approach ● We started by discussing Spark APIs in introductory sessions such as Spark batch and Spark streaming ● Once we understood the basic APIs, we discussed different abstraction layers like RDD and Dataframe in our anatomy sessions ● We also talked about the Spark runtime, such as data sources, in one of our anatomy sessions ● At the last meetup, we discussed cluster management in the Spark on YARN session
  • 12. Bottom-up approach ● Start at the cluster management layer using Mesos and YARN ● Build ○ Runtime ○ Abstractions ○ APIs ● Build applications using our own abstractions and runtime ● Use everything we learned in our top-down approach
  • 13. Design ● Heavily influenced by the way Apache Spark is built ● A lot of the code and design comes from the Spark code ● No dependency on Spark itself ● Implements only the very basic distributed processing pieces ● Make it work on Apache Mesos and Apache YARN ● Process-oriented, not data-oriented
  • 14. Spark at its birth - 2010 ● Only 1600 lines of Scala code ● Used Apache Mesos for cluster management ● Used the Mesos messaging API for concurrency management (no Akka) ● Used Scala functions as the processing abstraction rather than a DAG ● No optimizations
  • 15. Steps to get there ● Learn Apache Mesos ● Implement a simple hello world on Mesos ● Implement a simple function-oriented API on Mesos ● Support third-party libraries ● Support shuffle ● Support aggregations and counters ● Implement similar functionality on YARN
  • 16. Apache Mesos ● Apache Mesos is an open source cluster manager ● It "provides efficient resource isolation and sharing across distributed applications, or frameworks" ● Built at UC Berkeley ● YARN ideas are inspired by Mesos ● Written in C++ ● Uses Linux cgroups (the kernel feature that Docker also builds on) for resource isolation
  • 17. Why Mesos? ● Separates resource management from the processing application ● Handles cluster setup and management ● With the help of ZooKeeper, can provide master fault tolerance ● Modular and simple API ● Supports different distributed processing systems on the same cluster ● Provides APIs in multiple languages such as C++ and Java
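  • 18. Architecture of Mesos (Diagram: framework schedulers, here a Hadoop Scheduler, a Spark Scheduler and a Custom Framework, talk to the Mesos Master; the master manages several Mesos slaves, and each slave runs the frameworks' executors: Hadoop Executor, Spark Executor and Custom executor.)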
  • 19. Architecture of Mesos ● Mesos master - The single master node of the Mesos cluster and the entry point for any Mesos application ● Mesos slaves - Each machine in the cluster runs a Mesos slave, which is responsible for running tasks ● Framework - A distributed application built using the Apache Mesos API ○ Scheduler - Entry point to the framework; responsible for launching tasks ○ Executor - Runs the actual tasks on Mesos slaves
  • 20. Starting Mesos ● Starting the master: bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos ● Starting a slave: bin/mesos-slave.sh --master=127.0.0.1:5050 ● Accessing the UI: http://127.0.0.1:5050 ● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
  • 21. Hello world on Mesos ● Run a simple shell command on each Mesos slave ● We create our own framework, which is capable of running shell commands ● Our framework needs the following three components ○ Client ○ Scheduler ○ Executor
  • 22. Client ● Code that submits tasks to the framework ● A task is an abstraction used by Mesos to indicate any piece of work that takes some resources ● It is similar to the driver program in Spark ● It creates an instance of the framework and submits it to the Mesos driver ● Mesos uses protocol buffers for serialization ● Example code: DistributedShell.scala
  • 23. Scheduler ● Every framework in Apache Mesos must extend the Scheduler interface ● The scheduler is the entry point for our custom framework ● It is similar to SparkContext ● We need to override ○ resourceOffers ● It acts like the ApplicationMaster in YARN
  • 24. Offers ● Mesos advertises available resources to frameworks as offers ● Whenever there are free resources (disk, memory and CPU), Mesos offers them to the frameworks running on it ● A framework can accept an offer and use it to run its own tasks ● Once execution is done, it can release those resources so that Mesos can offer them to other frameworks ● Quite different from the YARN model, where an application requests containers from the ResourceManager
  • 25. Executor ● Once a framework receives an offer, it has to specify the executor that actually runs a piece of code on the worker nodes ● The executor sets up the environment to run each task given by the client ● The scheduler uses this executor to run each task ● In our distributed shell example, we use the default executor provided by Mesos
  • 26. Task ● A task is an abstraction used by Mesos to indicate any piece of work that takes some resources ● It is the basic unit of computation on Mesos ● It has ○ Id ○ Offer (resources) ○ Executor ○ Slave Id - the machine on which it has to run
  • 28. Running hello world ● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.helloworld.DistributedShell "/bin/echo hello" ● Mesos needs its native library (*.so) files on the Java library path (java.library.path) to connect to the Mesos cluster ● Once execution is done, we can look at all the tasks run for a given framework in the Mesos UI ● Let's look at the ones for our distributed shell application
  • 29. Custom executor ● In the last example, we ran shell commands ● What if we want to run custom Java/Scala code? ● We need to define our own executor, which sets up the environment to run the code, rather than using the built-in command executor ● Executors are how Mesos supports frameworks written in different languages on the same cluster
  • 30. Defining a function task API ● We are going to define a task abstraction that wraps a simple Scala function ● This allows us to run any pure Scala function on a large cluster ● This is how Spark initially supported distributed processing for its RDDs ● The task extends Serializable, which allows us to serialize the function over the network ● Example: Task.scala
  • 31. Task scheduler ● Similar to the earlier scheduler, but uses a custom executor rather than the default one ● Creates the TaskInfo object, which contains ○ Offer ○ Executor ○ Serialized function as data ● getExecutorInfo uses a custom script to launch our own TaskExecutor ● Example: TaskScheduler.scala
  • 32. Task executor ● The task executor is our custom executor, which is capable of running our function tasks ● It creates a Mesos Executor instance that overrides launchTask ● It deserializes the task from the TaskInfo object sent by the task scheduler ● Once the object is deserialized, it runs the function on that machine ● Example: TaskExecutor.scala
  • 33. CustomTasks ● Once we have everything in place, we can run any Scala function in a distributed manner ● We can create different kinds of Scala functions and wrap them in our function task abstraction ● In our client, we create multiple tasks and submit them to the task scheduler ● Observe that the API also supports closures ● Example: CustomTasks.scala
  • 34. Running the custom executor ● java -cp target/scala-2.11/DistrubutedSystemFromScratch-assembly-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.customexecutor.CustomTasks localhost:5050 /home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resources/run-executor.sh ● We pass the script that sets up the environment to launch our custom executor ● In our example, we use the local file system; HDFS can be used instead
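Slides 21 to 23 describe a framework whose scheduler reacts to offers through `resourceOffers`. The sketch below imitates that callback flow for the distributed shell in plain Scala, without the Mesos library. `Offer`, `CommandTask`, `SchedulerLike` and `ShellScheduler` are hypothetical names; the real Mesos `Scheduler.resourceOffers` also receives a `SchedulerDriver` and launches tasks through it instead of returning them.

```scala
// Hypothetical stand-ins for Mesos' Offer and TaskInfo protobufs; not the real API.
final case class Offer(id: String, slaveId: String, cpus: Double, memMb: Double)
final case class CommandTask(taskId: String, slaveId: String, command: String)

// Same shape as the Mesos callback: offers come in, the framework answers with
// the tasks it wants to launch. (The real callback also gets a SchedulerDriver
// and returns Unit, launching tasks through the driver.)
trait SchedulerLike {
  def resourceOffers(offers: Seq[Offer]): Seq[CommandTask]
}

// Distributed-shell style scheduler: runs the same shell command once on every
// slave it receives an offer from. Offers from slaves that already ran the
// command produce no task.
final class ShellScheduler(command: String) extends SchedulerLike {
  private var launchedOn = Set.empty[String]
  private var taskCount = 0

  def resourceOffers(offers: Seq[Offer]): Seq[CommandTask] =
    offers.flatMap { offer =>
      if (launchedOn.contains(offer.slaveId)) Nil
      else {
        launchedOn += offer.slaveId
        taskCount += 1
        List(CommandTask(s"task-$taskCount", offer.slaveId, command))
      }
    }
}
```

Tracking the slaves that already ran the command is one simple way to get "one command per slave"; a real scheduler would also decline the offers it does not use.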
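Slide 24's offer model means the framework, not Mesos, decides what to do with each offer. A minimal sketch of one possible policy, assuming fixed per-task CPU and memory needs and greedy packing in arrival order; `Offer`, `TaskNeeds`, `OfferDecision` and `OfferMatcher` are hypothetical and not part of the Mesos API.

```scala
import scala.collection.mutable

// Hypothetical model of a resource offer; the real Mesos Offer is a protobuf
// message carrying a list of named resources.
final case class Offer(id: String, slaveId: String, cpus: Double, memMb: Double)

// Resources one task needs (illustrative values chosen by the framework;
// both are assumed to be positive).
final case class TaskNeeds(cpus: Double, memMb: Double)

// Which offers were used (offer id -> number of tasks) and which were declined.
final case class OfferDecision(accepted: Map[String, Int], declined: Seq[String])

object OfferMatcher {
  // How many tasks of the given size fit into one offer.
  def tasksThatFit(offer: Offer, needs: TaskNeeds): Int =
    math.min((offer.cpus / needs.cpus).toInt, (offer.memMb / needs.memMb).toInt)

  // Greedily packs `pending` tasks into the offers in arrival order. Offers that
  // end up unused are declined, so the master can offer those resources to
  // other frameworks.
  def decide(offers: Seq[Offer], needs: TaskNeeds, pending: Int): OfferDecision = {
    var remaining = pending
    val accepted = mutable.LinkedHashMap.empty[String, Int]
    val declined = mutable.ListBuffer.empty[String]
    for (offer <- offers) {
      val n = math.min(remaining, tasksThatFit(offer, needs))
      if (n > 0) {
        accepted(offer.id) = n
        remaining -= n
      } else declined += offer.id
    }
    OfferDecision(accepted.toMap, declined.toList)
  }
}
```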
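Slide 25 notes that the distributed shell relies on the default executor provided by Mesos, which essentially runs the task's command line on the slave. A rough local imitation using `scala.sys.process`; `TaskState`, `CommandResult` and `LocalCommandExecutor` are hypothetical names. On a real slave the command also runs inside a sandbox with resource isolation and reports its status back to the scheduler.

```scala
import scala.sys.process._

// Hypothetical, simplified terminal states (real Mesos has more, e.g. TASK_LOST).
sealed trait TaskState
case object TaskFinished extends TaskState
case object TaskFailed extends TaskState

final case class CommandResult(state: TaskState, stdout: String)

object LocalCommandExecutor {
  // Runs the task's command line through `sh -c`, roughly what a command
  // executor does on a slave, and maps the exit code to a task state.
  // No sandbox, no resource isolation and no status updates are modeled here.
  def runCommandTask(commandLine: String): CommandResult = {
    val out = new StringBuilder
    val logger = ProcessLogger(
      (line: String) => { out.append(line).append('\n'); () }, // collect stdout
      (_: String) => ()                                        // ignore stderr
    )
    val exitCode = Seq("sh", "-c", commandLine).!(logger)
    CommandResult(if (exitCode == 0) TaskFinished else TaskFailed, out.toString)
  }
}
```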
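Slide 26 lists what a task carries. A small, hypothetical data model of those four fields, with one rule that follows from the offer model: a task runs on the slave of the offer it came from and cannot claim more than that offer holds. `Resources`, `Offer`, `ExecutorRef`, `Task` and `TaskBuilder` are illustrative names, not Mesos types.

```scala
// Hypothetical model of the fields a task carries (slide 26). The real Mesos
// TaskInfo is a protobuf message with more fields (name, data, labels, ...).
final case class Resources(cpus: Double, memMb: Double) {
  def fitsIn(available: Resources): Boolean =
    cpus <= available.cpus && memMb <= available.memMb
}
final case class Offer(id: String, slaveId: String, resources: Resources)
final case class ExecutorRef(executorId: String, command: String)
final case class Task(id: String, resources: Resources, executor: ExecutorRef, slaveId: String)

object TaskBuilder {
  // A task is bound to the slave of the offer it is carved out of, and it may
  // claim at most the resources that offer contains.
  def fromOffer(id: String, offer: Offer, wanted: Resources, executor: ExecutorRef): Either[String, Task] =
    if (wanted.fitsIn(offer.resources)) Right(Task(id, wanted, executor, offer.slaveId))
    else Left(s"task $id needs $wanted but offer ${offer.id} only has ${offer.resources}")
}
```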
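Slide 30 relies on the task being `Serializable` so that a Scala function can travel over the network. A minimal sketch of that idea using plain Java serialization; `FunctionTask`, `TaskSerde` and `ExampleTasks` are hypothetical names, not the code in Task.scala.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// A task that wraps a plain Scala function. The wrapper is Serializable, and
// Scala lambdas are serializable as long as everything they capture is, so the
// whole task can be turned into bytes. (Hypothetical class, not Task.scala.)
final class FunctionTask[T](val body: () => T) extends Serializable {
  def run(): T = body()
}

object TaskSerde {
  // Scheduler side: task object -> bytes.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Executor side: bytes -> task object. The task's classes must be on the
  // executor's classpath for this to succeed.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}

object ExampleTasks {
  // Created inside a method so the lambda captures only the Int `n`.
  def square(n: Int): FunctionTask[Int] = new FunctionTask(() => n * n)
}
```

Only bytes cross the wire, so the executor JVM must be able to load the same classes; that is one reason the launch script passed to the custom executor matters.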
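Slide 31's TaskInfo bundles an offer, an executor and the serialized function. The sketch below models that packing step; `Offer`, `ExecutorInfo`, `TaskInfo` and `TaskScheduling` are hypothetical stand-ins for the Mesos protobuf types, and the executor script path is simply a parameter.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-ins for Mesos' Offer, ExecutorInfo and TaskInfo protobufs.
final case class Offer(id: String, slaveId: String)
final case class ExecutorInfo(executorId: String, command: String)
final case class TaskInfo(taskId: String, slaveId: String, executor: ExecutorInfo, data: Array[Byte])

object TaskScheduling {
  // Plays the role of getExecutorInfo: instead of the built-in command executor,
  // the task names a launcher script that starts our own executor.
  def getExecutorInfo(executorScript: String): ExecutorInfo =
    ExecutorInfo("function-task-executor", executorScript)

  // Java-serializes a function so it can travel as TaskInfo.data.
  def serialize(task: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(task) finally out.close()
    bytes.toByteArray
  }

  // Pairs each offer with one pending task: the offer decides the slave, the
  // executor decides how the bytes get run, the data carries the function.
  // Tasks left over when offers run out would wait for later offers (not modeled).
  def buildTaskInfos(offers: Seq[Offer], tasks: Seq[() => Any], executorScript: String): Seq[TaskInfo] = {
    val executor = getExecutorInfo(executorScript)
    offers.zip(tasks).zipWithIndex.map { case ((offer, task), i) =>
      TaskInfo(s"task-$i", offer.slaveId, executor, serialize(task))
    }
  }

  // Hypothetical example task, built in a method so it captures only `n`.
  def constantTask(n: Int): () => Int = () => n
}
```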
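Slide 32's executor deserializes the function from the task data and runs it. A hypothetical sketch of that `launchTask` logic, including the status updates an executor reports; `RecordingStatusSink` and `TaskExecutorSketch` are illustrative names, while the state strings mirror the Mesos task states `TASK_RUNNING`, `TASK_FINISHED` and `TASK_FAILED`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ListBuffer
import scala.util.Try

// Hypothetical stand-in for the executor driver: it only records the status
// updates a real executor would send back towards the scheduler.
final class RecordingStatusSink {
  val updates: ListBuffer[(String, String)] = ListBuffer.empty
  def sendStatusUpdate(taskId: String, state: String): Unit = updates += ((taskId, state))
}

object TaskExecutorSketch {
  // Used only to produce task bytes for the example; the scheduler does this in practice.
  def serialize(task: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(task) finally out.close()
    bytes.toByteArray
  }

  // Counterpart of launchTask: report RUNNING, turn the task data back into a
  // function, run it, then report FINISHED or FAILED. Real executors usually do
  // this on a separate thread; here it runs synchronously for clarity.
  def launchTask(sink: RecordingStatusSink, taskId: String, data: Array[Byte]): Try[Any] = {
    sink.sendStatusUpdate(taskId, "TASK_RUNNING")
    val result = Try {
      val in = new ObjectInputStream(new ByteArrayInputStream(data))
      val task = try in.readObject().asInstanceOf[() => Any] finally in.close()
      task()
    }
    sink.sendStatusUpdate(taskId, if (result.isSuccess) "TASK_FINISHED" else "TASK_FAILED")
    result
  }

  // Hypothetical example tasks, created in methods so they capture only arguments.
  def greet(name: String): () => String = () => s"hello from $name"
  def failing(): () => Int = () => throw new IllegalStateException("boom")
}
```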
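Slide 33 notes that the API also supports closures. The sketch below shows why that works: a closure's captured values are serialized together with the function, so they arrive intact on the executor side. `CustomTasksSketch` is a hypothetical name, and each task is shipped through an in-memory serialization round trip instead of a real cluster.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object CustomTasksSketch {
  // Ships a function the way the scheduler/executor pair would: bytes out, bytes in.
  def roundTrip[T](task: () => T): () => T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(task) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[() => T] finally in.close()
  }

  // Several different tasks. The last two are closures: they capture `numbers`
  // (and `offset`) from this method, and those values are serialized along with
  // the function.
  def buildTasks(numbers: Vector[Int], offset: Int): Seq[() => Any] = Seq(
    () => "a constant task",
    () => numbers.sum,
    () => numbers.map(_ + offset)
  )

  // Stand-in for "submit to the task scheduler": every task is shipped, then run.
  def runAll(tasks: Seq[() => Any]): Seq[Any] = tasks.map(task => roundTrip(task)())
}
```

Capturing only serializable values matters: a closure that refers to a field of a non-serializable enclosing object captures that whole object and fails with `NotSerializableException`. Spark addresses this with its `ClosureCleaner`.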