COLLEGE OF COMPUTING, GEORGIA INSTITUTE OF TECHNOLOGY
Workshop 8/Systems Workshop 3:
Worker Task Execution
In this module of the class, you are going to implement the required code to
execute the map and reduce tasks on the worker. Use the code created in the
previous workshop as a base for the implementation.
1 EXPECTED OUTCOME
The student is going to:
• Deploy MapReduce applications running in Kubernetes to Azure
• Develop an HTTP user interface for submitting input files and mapper and reducer
functions to our system.
• Create working mapper and reducer functions that execute user-submitted Python code
inside the worker nodes.
• Design and implement the interfaces and functionalities for the execution of the map
and reduce phases in the workers.
2 ASSUMPTIONS
This workshop assumes that the student has successfully completed all the previous
workshops in this module, and that the assumptions of those workshops still hold.
3 DOWNLOAD RELEVANT ENVIRONMENT TOOLS
Install the Azure CLI.
4 SPECIFICATION
Your MapReduce implementation should be able to:
• Deploy your MapReduce cluster to Azure Kubernetes Service (AKS).
• Execute the map or reduce phase in any worker.
• After executing the map phase, sort the mapper result in place and store it in the
corresponding location.
• Store the required information in the master to be able to fetch the required <key,value>
pairs to execute the reduce phase in the corresponding worker.
• Store the final results in Azure Blob storage. You should be able to use this data as the
input for a pipelined map/reduce computation.
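One way to picture the bookkeeping that the master needs for the reduce phase is sketched below. It keeps, for each reduce partition, the list of places where map tasks left data for that partition. The names IntermediateLocation, IntermediateIndex, record, and locations_for are hypothetical and not part of the workshop code, and the worker and path strings are placeholders whose real form depends on your storage design.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Where one map task left the data destined for one reduce partition.
// Both fields are illustrative placeholders.
struct IntermediateLocation {
    std::string worker;  // e.g. address of the worker holding the data
    std::string path;    // e.g. local file path or blob name
};

// Hypothetical master-side table: reduce partition -> intermediate locations.
class IntermediateIndex {
public:
    // Called when a map task reports where it stored output for `partition`.
    void record(std::size_t partition, IntermediateLocation location) {
        by_partition_[partition].push_back(std::move(location));
    }

    // Everything a reducer for `partition` has to fetch (empty if unknown).
    std::vector<IntermediateLocation> locations_for(std::size_t partition) const {
        auto it = by_partition_.find(partition);
        if (it == by_partition_.end()) {
            return {};
        }
        return it->second;
    }

private:
    std::map<std::size_t, std::vector<IntermediateLocation>> by_partition_;
};
```

When a reduce task for a given partition is scheduled, the master could hand the worker the result of locations_for for that partition so it knows what to fetch.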
5 IMPLEMENTATION
5.1 DEPLOYING RESOURCES
Up to now your Kubernetes cluster has been running locally; you now need to deploy it to
Azure Kubernetes Service. Create an AKS instance and use kubectl to deploy to it. Please
consult the AKS Docs.
Your container images are currently available only locally. Push them to Azure using
Azure Container Registry, and configure your Kubernetes deployment to use the correct
images. Please consult the ACR Docs.
You can configure access to both your local cluster running in KIND and your Azure
deployment via the kubeconfig. Please use these docs to learn more.
5.1.1 USEFUL LINKS
• Kubernetes Walkthrough
• Configure Kubernetes KIND Cluster in Azure
5.2 USER INTERFACE FOR MAPREDUCE
Creating a good user interface for software is an important aspect of developing a successful
product. In this section, you are going to develop an HTTP interface to our MapReduce service.
This interface should be on the master node and it should allow you to POST a job to
our service. It is up to you to specify how this interface works. Think about what kind of
information you are going to need to POST and how you might use HTTP to transfer that
information.
We recommend checking out the HTTP support in the C++ netlib library to accomplish this goal.
If you did not set up Kubernetes readiness probes in the first week, we recommend that you
do so now. It will be a simple addition to this interface.
5.2.1 USEFUL LINKS
• C++ netlib library
5.3 PYTHON CODE WRAPPER
The map and reduce functions are going to be implemented in Python. The Python script
receives each input value through standard input and writes the key-value pairs to
standard output. Your worker functions need to be able to feed the inputs to the Python
script's stdin, start the execution of the code, and capture the output of the script.
5.3.1 OPTION A
As in the first workshop of this course, the Python script reads its input values from
standard input and writes its key-value pairs to standard output. Your code needs to start
the script, feed it the inputs, and save the results it prints. To accomplish this you are
going to use four functions: pipe, fork, dup2, and execl. Using these functions you are
going to implement a bidirectional pipe to communicate with the Python code.
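A minimal sketch of such a bidirectional pipe follows, assuming a Linux worker. The helper name run_filter and its single-argument signature are hypothetical. Besides the four functions named above, it uses poll and a non-blocking write end (additions made for this sketch) so that the parent keeps draining the child's output while it is still feeding input. That is one way, not the only one, to avoid the deadlock in which both processes block on full pipes.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `program arg`, writes `input` to the child's
// stdin, and returns everything the child printed on stdout.
// execl does not search PATH, so `program` must be a full path.
std::string run_filter(const std::string& program, const std::string& arg,
                       const std::string& input) {
    int to_child[2];    // parent writes to [1], child reads from [0]
    int from_child[2];  // child writes to [1], parent reads from [0]
    if (pipe(to_child) == -1) throw std::runtime_error("pipe failed");
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        throw std::runtime_error("pipe failed");
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        throw std::runtime_error("fork failed");
    }
    if (pid == 0) {
        // Child: make the pipe ends its stdin/stdout, then exec the program.
        if (dup2(to_child[0], STDIN_FILENO) == -1 ||
            dup2(from_child[1], STDOUT_FILENO) == -1) _exit(126);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execl(program.c_str(), program.c_str(), arg.c_str(),
              static_cast<char*>(nullptr));
        _exit(127);  // reached only if execl failed
    }
    // Parent: close the child's ends so EOF can propagate in both directions.
    close(to_child[0]);
    close(from_child[1]);
    int wfd = to_child[1];
    int rfd = from_child[0];
    signal(SIGPIPE, SIG_IGN);  // process-wide: a dead child yields EPIPE
    fcntl(wfd, F_SETFL, fcntl(wfd, F_GETFL) | O_NONBLOCK);

    std::string output;
    std::size_t written = 0;
    while (rfd != -1) {
        if (wfd != -1 && written == input.size()) {
            close(wfd);  // all input sent: the child now sees EOF on stdin
            wfd = -1;
        }
        pollfd fds[2] = {{rfd, POLLIN, 0}, {wfd, POLLOUT, 0}};
        if (poll(fds, 2, -1) == -1) {
            if (errno == EINTR) continue;
            break;
        }
        if (wfd != -1 && fds[1].revents != 0) {
            ssize_t n = write(wfd, input.data() + written,
                              input.size() - written);
            if (n > 0) {
                written += static_cast<std::size_t>(n);
            } else if (errno != EAGAIN && errno != EINTR) {
                close(wfd);  // e.g. EPIPE: the child stopped reading
                wfd = -1;
            }
        }
        if (fds[0].revents != 0) {
            char buf[4096];
            ssize_t n = read(rfd, buf, sizeof buf);
            if (n > 0) {
                output.append(buf, static_cast<std::size_t>(n));
            } else if (n == 0 || errno != EINTR) {
                close(rfd);  // EOF or a hard error ends the loop
                rfd = -1;
            }
        }
    }
    if (wfd != -1) close(wfd);
    if (rfd != -1) close(rfd);

    int status = 0;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR) {}
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        throw std::runtime_error("child did not exit cleanly");
    return output;
}
```

For a mapper this might be called as run_filter("/usr/bin/python3", "mapper.py", split_contents), where the interpreter path, the script name, and split_contents are placeholders. Writing all of the input before reading any output would be shorter, but it can hang as soon as the script produces more output than the pipe buffer holds.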
5.3.2 SUGGESTIONS
• Discuss with the other students the corner and error cases that can arise when
using the four suggested functions. How do we avoid deadlock scenarios, and how do
we handle these situations?
5.3.3 OPTION B
Another possibility for implementing the Python function call is described in Extending
Python with C: use the functions declared in Python.h to call the Python function
directly from C++.
5.3.4 SUGGESTIONS
• Discuss with the other students the benefits and drawbacks of option A and option B,
and potentially suggest other options. You are free to choose a different way to run
the mapper and reducer, but be sure to analyze the pros and cons of your solution.
5.3.5 USEFUL LINKS
• Calling Python Functions from C
• execl(3) - Linux man page
• pipe(2) - Linux man page
• fork(2) - Linux man page
• Piping for input/output
• Creating pipes in C
• Popen
5.4 SAVE INTERMEDIATE RESULT
Using the API created in the previous workshop, save the output created by the map phase into
the intermediate storage. There should be R outputs created, where R is the number of reduce
tasks. The structure of this intermediate storage is going to depend on the specification file
presented as a deliverable for the previous workshop.
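The partitioning step can be sketched as follows. This is one possible approach, not a required one: it assigns each key to one of R buckets with a 64-bit FNV-1a hash and sorts each bucket by key. The names KeyValue, fnv1a, partition_of, and partition_and_sort are hypothetical. A fixed hash is used instead of std::hash because every worker must send a given key to the same bucket, and std::hash is only guaranteed to be consistent within a single program execution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// 64-bit FNV-1a: gives the same value for the same key on every worker.
std::uint64_t fnv1a(const std::string& key) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Index of the reduce partition (0 .. num_reducers - 1) that owns `key`.
std::size_t partition_of(const std::string& key, std::size_t num_reducers) {
    assert(num_reducers > 0);
    return static_cast<std::size_t>(fnv1a(key) % num_reducers);
}

// Splits mapper output into num_reducers buckets, each sorted by key.
std::vector<std::vector<KeyValue>> partition_and_sort(
    const std::vector<KeyValue>& pairs, std::size_t num_reducers) {
    std::vector<std::vector<KeyValue>> buckets(num_reducers);
    for (const KeyValue& kv : pairs) {
        buckets[partition_of(kv.first, num_reducers)].push_back(kv);
    }
    for (auto& bucket : buckets) {
        // stable_sort keeps values of equal keys in their emitted order.
        std::stable_sort(bucket.begin(), bucket.end(),
                         [](const KeyValue& a, const KeyValue& b) {
                             return a.first < b.first;
                         });
    }
    return buckets;
}
```

Each bucket would then be written to the intermediate storage through your own API, in whatever layout your specification file defines.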
5.4.1 SUGGESTIONS
• Discuss with other students ways to store the intermediate results. Should they be in
blob storage or in the local storage of the workers? Should there be M*R output files in
total, or only R output files written with atomic append operations? If you are using
local files, are you copying them with Linux commands like scp, or are you fetching
them over RPC connections to the workers?
5.5 SAVE FINAL RESULT
Using the API created in the previous workshop, save the output generated from the reduce
phase into the final location.
5.5.1 SUGGESTIONS
• Your framework should be able to use the final output as an input to a pipeline of
MapReduce executions.
6 DELIVERABLES
• The Git repository that contains all the required code, and the commit ID.
• A demo in which you:
– Deploy your system to Azure Kubernetes.
– Configure your kubectl CLI to point to the Azure cluster.
– Demonstrate your ability to scale your worker and master nodes via the kubectl CLI.
– Submit a job via an HTTP request to your cluster.
– Show the output of your MapReduce job.
7 USEFUL REFERENCES
• MapReduce paper