GROOVY DISTRIBUTED TASK MANAGEMENT SYSTEM
COSC 880
Towson University
Department of Computer and Information Sciences
Advisor: Dr. Josh Dehlinger
Emanuel Rivera
https://github.com/mannyrivera2010/barium
Motivation/Purpose
• Motivation: find a way to process growing data sets and graph data in less time by distributing the work
• Purpose: create a framework that divides a job into many tasks so that the computational power of many machines can be used
Terminology
• Owner/Worker Framework (also known as Barium): an asynchronous, distributed task management framework that gives developers a generic way to execute code on multiple machines
• Queues/Publish-Subscribe: the communication models the framework uses to communicate between the owner and worker nodes
• BariumUI: a front end, built with the AngularJS web application framework, that consumes Barium's RESTful API
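The two communication models differ in who receives a message. As a minimal sketch, assuming in-memory stand-ins rather than Barium's real broker (the slides list Hazelcast but do not show how Barium wires up its queues and topics), the difference can be shown in plain Java: an item put on a queue is taken by exactly one consumer, while a message published to a topic reaches every subscriber. `TaskQueue`, `Topic`, and `MessagingDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Point-to-point queue (hypothetical): each item is taken by exactly one consumer, FIFO. */
class TaskQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

    void put(T item) {
        queue.add(item); // unbounded, so add() never rejects
    }

    /** Returns the next item, or null if the queue is currently empty. */
    T poll() {
        return queue.poll();
    }
}

/** Publish-subscribe topic (hypothetical): each message reaches every subscriber. */
class Topic<T> {
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(T message) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}

class MessagingDemo {
    /** Two workers share one queue of three tasks; returns what each worker received. */
    static List<List<String>> queueDemo() {
        TaskQueue<String> queue = new TaskQueue<>();
        for (String task : List.of("t1", "t2", "t3")) {
            queue.put(task);
        }
        List<String> worker1 = new ArrayList<>();
        List<String> worker2 = new ArrayList<>();
        boolean firstWorkersTurn = true;
        String next;
        // Workers take turns; every task lands in exactly one worker's list.
        while ((next = queue.poll()) != null) {
            (firstWorkersTurn ? worker1 : worker2).add(next);
            firstWorkersTurn = !firstWorkersTurn;
        }
        return List.of(worker1, worker2);
    }

    /** Two subscribers on one topic; a single published result reaches both. */
    static List<List<String>> topicDemo() {
        Topic<String> topic = new Topic<>();
        List<String> subscriber1 = new ArrayList<>();
        List<String> subscriber2 = new ArrayList<>();
        topic.subscribe(subscriber1::add);
        topic.subscribe(subscriber2::add);
        topic.publish("result-1");
        return List.of(subscriber1, subscriber2);
    }
}
```

Here `queueDemo()` leaves `t1` and `t3` with the first worker and `t2` with the second, while `topicDemo()` delivers `result-1` to both subscribers. That split fits the framework's description: a task should be executed once, whereas results go out on a topic to whoever listens.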
Terminology (Cont.)
• Owner: responsible for generating tasks, putting them into the queue so that workers can execute them in a distributed way, and monitoring them
• Worker: responsible for taking a task from the queue, executing it, and publishing all results to the pub/sub topic that the owner listens to
• Task: a unit of work that is executed on a worker
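To make the three roles concrete, here is a minimal single-JVM sketch, assuming threads stand in for worker machines, a `LinkedBlockingQueue` stands in for the distributed FIFO queue, and a callback stands in for the pub/sub topic. The owner also keeps a tracking table of task status, as labeled in the System diagram. The class and method names (`Task`, `Owner`, `Worker`, `submit`, `onResult`) and the QUEUED/DONE statuses are hypothetical; this is not Barium's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** A unit of work executed on a worker (hypothetical shape). */
final class Task {
    final String id;
    final String input;

    Task(String id, String input) {
        this.id = id;
        this.input = input;
    }
}

/** Owner: generates tasks, queues them, and tracks their status from published results. */
final class Owner {
    final BlockingQueue<Task> fifo = new LinkedBlockingQueue<>();    // owner -> workers
    final Map<String, String> tracking = new ConcurrentHashMap<>(); // task id -> QUEUED/DONE
    final Map<String, String> results = new ConcurrentHashMap<>();  // task id -> output

    /** Generates one task per input, records it as QUEUED, and enqueues it. */
    void submit(List<String> inputs) {
        for (int i = 0; i < inputs.size(); i++) {
            Task task = new Task("task-" + i, inputs.get(i));
            tracking.put(task.id, "QUEUED");
            fifo.add(task);
        }
    }

    /** The owner's listener on the results topic; called from worker threads. */
    void onResult(String taskId, String output) {
        results.put(taskId, output);
        tracking.put(taskId, "DONE");
    }
}

/** Worker: takes tasks from the queue, executes them, and publishes each result. */
final class Worker implements Runnable {
    private final BlockingQueue<Task> fifo;
    private final Function<String, String> work;
    private final BiConsumer<String, String> publish;

    Worker(BlockingQueue<Task> fifo, Function<String, String> work,
           BiConsumer<String, String> publish) {
        this.fifo = fifo;
        this.work = work;
        this.publish = publish;
    }

    @Override
    public void run() {
        Task task;
        // Exit once the queue is drained; a long-running worker would block on take().
        while ((task = fifo.poll()) != null) {
            publish.accept(task.id, work.apply(task.input));
        }
    }
}

final class OwnerWorkerDemo {
    /** Runs every input on workerCount threads and returns the owner's final state. */
    static Owner run(List<String> inputs, int workerCount, Function<String, String> work) {
        Owner owner = new Owner();
        owner.submit(inputs); // all tasks are queued before any worker starts
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread thread = new Thread(new Worker(owner.fifo, work, owner::onResult));
            threads.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return owner;
    }
}
```

For example, `OwnerWorkerDemo.run(List.of("a", "b", "c"), 2, String::toUpperCase)` returns an owner whose `results` map `task-0`, `task-1`, and `task-2` to `A`, `B`, and `C`, with every entry in the tracking table marked DONE. In a real deployment the queue and topic would live in a broker shared across machines rather than in one JVM.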
System
[Figure: system architecture showing Owner1 with a Tracking Table, a FIFO Queue and Pub/Sub channels linking Owner1 to Worker1 and Worker2, and WS and REST clients]
Technologies
• Backend technologies
  • Netty
  • Hazelcast
  • Groovy/Java
  • Gradle
  • Node.js
• Frontend technologies
  • AngularJS
  • Grunt
Demonstration
• Processing many PubMed Central® (PMC) XML files and converting them into a single line-delimited JSON file for analytics
• PMC is a free full-text archive, in XML format, of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM)
• Demo
• The “Presentation Demonstration Procedure” slide at the end of the presentation has more information
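A line-delimited JSON file holds one complete JSON object per line, so analysis tools can stream it record by record. A minimal sketch, assuming each converted article is a flat map of string fields (real PMC records would be richer) and using only the JDK instead of a JSON library, shows the property that matters: any newline inside a value must be escaped so that each record stays on exactly one line. `JsonLines` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;

final class JsonLines {
    /** Escapes a string for use inside a JSON string literal (RFC 8259). */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the four-hex-digit escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Serializes one flat record as a single-line JSON object. */
    static String toLine(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Joins records into JSON Lines text: one object per line, each ending in a newline. */
    static String toJsonLines(List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> record : records) {
            sb.append(toLine(record)).append('\n');
        }
        return sb.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the field order in each line stable. In practice a JSON library (for example Groovy's `groovy.json.JsonOutput`) would handle escaping and nested values.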
Improvements
• Owner/Worker Framework frontend (BariumUI)
• Owner/Worker Framework backend (Barium)
• Better software testing
• Major code refactor
• Fully featured website
• Multi-broker support
Lessons Learned
• Many different technologies
• Software design
• Distributed computing
• Gained the skill of using different resources to solve computing problems
Discussion/Thank You
Presentation Demonstration Procedure
This is the procedure used in the presentation's demonstration of the Owner/Worker framework. The goal of the demo was to convert the PubMed Central Open Access Subset XML files into a single line-delimited JSON file in a scalable way, using the processing power of many machines. Once all of the PMC XML has been converted to JSON, existing tools can be used to analyze and mine the data, making it possible to ask questions that extract value from it. The dataset contains all of the articles in the PMC Open Access Subset, which can be downloaded from PubMed Central's public FTP server.
Procedure
• Visit http://www.ncbi.nlm.nih.gov/pmc/tools/ftp/ and download the dataset archive files.
• Uncompress the archives and put the files on the file server powered by Node.js.
• Start Owner and Worker nodes on each machine.
• Owner: reads the directory from the web server and queues a task for each folder in the top-level directory, so there is one task per folder.
• Worker: gets files from the web server, converts the XML to JSON, and sends the results to the owner, which writes them to a flat file of JSON lines for analysis.
• After the job has finished, analyze the file with a tool.
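The owner and worker steps of this procedure can be sketched as two small functions. This is a minimal sketch under stated assumptions: the dataset is read from a local directory instead of over HTTP from the Node.js file server, the worker extracts only a few named elements (for example `article-title`) rather than converting the whole XML tree, and `PmcJobSketch`, `tasksForTopLevelFolders`, and `extractFields` are hypothetical names. In the full flow, each extracted map would be serialized as one JSON line and sent to the owner, which appends it to the flat file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PmcJobSketch {
    /** Owner side: one task id per folder directly under the dataset root, sorted by name. */
    static List<String> tasksForTopLevelFolders(Path root) {
        try (Stream<Path> entries = Files.list(root)) {
            return entries.filter(Files::isDirectory)
                          .map(path -> path.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Worker side: parses one article's XML and returns the trimmed text of the first
     * element with each requested tag name. Tags that are absent are left out.
     */
    static Map<String, String> extractFields(String xml, List<String> tags) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not load the external DTD named in the DOCTYPE (Xerces feature used by the JDK parser).
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new LinkedHashMap<>();
            for (String tag : tags) {
                NodeList nodes = doc.getElementsByTagName(tag);
                if (nodes.getLength() > 0) {
                    fields.put(tag, nodes.item(0).getTextContent().trim());
                }
            }
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse article XML", e);
        }
    }
}
```

Disabling the external DTD matters because PMC article files typically start with a DOCTYPE that references a JATS DTD; without that feature flag, the parser would try to resolve and load the DTD file, which usually fails or requires network access.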