Increase computational power
with distributed processing
Neil Stein 03 Nov 2012
A Discussion Example
Getting the data and ordering it as needed
Familiar with grep and sort?

—  “grep” extracts all the matching lines
—  “sort” sorts all the lines
grep "some_record_parameters" hl7_transfer.data-file | sort
[2012/02/25/ 9:15] records sent to healthcare-1
[2012/02/28/ 6:15] records sent to healthcare-2
[2012/03/12/ 10:30] records sent to healthcare-3
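For reference, the same filter-then-sort step fits in a few lines of Python. This is a minimal sketch: it treats the pattern as a fixed string (like `grep -F`) and sorts by code point (like `LC_ALL=C sort` for ASCII text). The pattern "records sent" and the sample lines are illustrative only. Everything happens in one process on one machine, which is exactly what stops scaling on the next slide.

```python
from typing import Iterable, List


def grep_sort(lines: Iterable[str], pattern: str) -> List[str]:
    """Keep the lines that contain `pattern`, then return them sorted.

    Mirrors `grep -F pattern file | LC_ALL=C sort` for a small, in-memory input.
    """
    matches = [line.rstrip("\n") for line in lines if pattern in line]
    return sorted(matches)


if __name__ == "__main__":
    # Illustrative sample only; the pattern "records sent" is an assumption
    sample = [
        "[2012/03/12/ 10:30] records sent to healthcare-3\n",
        "[2012/02/25/ 9:15] records sent to healthcare-1\n",
        "unrelated line\n",
        "[2012/02/28/ 6:15] records sent to healthcare-2\n",
    ]
    for line in grep_sort(sample, "records sent"):
        print(line)
```

Run as a script, it prints the three matching lines in date order (healthcare-1, healthcare-2, healthcare-3) and drops the unrelated line.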
A Discussion Example (continued)
—  As the amount of data grows, the process requires more and more resources

—  What if hl7_transfer.data-file is 500 GB or bigger?
—  What if there are hundreds or thousands of data files?
—  What if there are multiple types of data files?
grep "provider 1" hl7_transfer.data-file | sort

—  Ignoring the process for a moment, how do we write all the data to
disk in the first place?

Need to rethink the process
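A rough, back-of-the-envelope estimate shows why. Assuming a single disk reads about 100 MB/s sequentially (an illustrative figure, not a measurement), just scanning 500 GB once takes over 80 minutes before any sorting starts; spreading the same bytes over 50 machines that each read their own share cuts the scan to under two minutes. The sketch deliberately ignores network transfer, scheduling overhead and the sort itself, so it gives a lower bound rather than a prediction.

```python
def scan_seconds(data_gb: float, mb_per_second: float, machines: int = 1) -> float:
    """Time to read `data_gb` once when it is split evenly across `machines` disks.

    Assumes decimal units (1 GB = 1000 MB), perfectly parallel reads and no
    network or coordination overhead, so the result is a lower bound.
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    total_mb = data_gb * 1000
    return total_mb / (mb_per_second * machines)


if __name__ == "__main__":
    # 100 MB/s per disk is an assumed, illustrative throughput
    single = scan_seconds(500, 100)       # one machine reads all 500 GB
    cluster = scan_seconds(500, 100, 50)  # fifty machines, 10 GB each
    print(f"1 machine:   {single:.0f} s (~{single / 60:.0f} min)")
    print(f"50 machines: {cluster:.0f} s")
```

This prints 5000 s (about 83 minutes) for one machine and 100 s for fifty.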
Distributed File-System – “the cloud”
—  Files can be stored across many machines
—  Files can be replicated across many machines
—  Files can be kept in a hybrid-cloud model
—  Share the file-system transparently
—  You simply see the usual file structure
—  Opportunity to leverage private and public cloud environments
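Underneath, a distributed file system such as HDFS stores a large file as fixed-size blocks and keeps several copies of each block on different machines (HDFS defaults to 3 replicas, with 64 MB blocks in Hadoop 1.x and 128 MB in later releases). The sketch below is a toy model of that idea only: the node names are hypothetical and it uses simple round-robin placement, whereas real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List


def place_blocks(file_size: int, block_size: int, nodes: List[str],
                 replication: int = 3) -> Dict[int, List[str]]:
    """Map each block index of a file to the distinct nodes holding its replicas.

    Toy round-robin placement (not HDFS's rack-aware policy): block i goes to
    nodes i, i+1, ..., i+replication-1, wrapping around. With replication >= 2,
    losing any single node still leaves a copy of every block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    MB, GB = 1024 ** 2, 1024 ** 3
    # Hypothetical four-node cluster
    layout = place_blocks(500 * GB, 128 * MB, ["node1", "node2", "node3", "node4"])
    print(len(layout), "blocks")
    print(layout[0], layout[1])
```

For a 500 GB file with 128 MB blocks this yields 4000 blocks, with block 0 on `node1`, `node2`, `node3` and block 1 on `node2`, `node3`, `node4`.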
Map-Reduce – the cloud
—  A way of processing large amounts of data across many machines
—  The data must be split into chunks that can be processed independently (Map)
—  The partial results are recombined after processing (Reduce)
—  Data flows through a fixed sequence of simple stages: map, shuffle/sort, reduce
—  Breaks a large task down into smaller, manageable tasks
—  Increases the available computational power, since each chunk can be processed on a different machine
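The three phases can be modelled in a single process to make the data flow concrete. In this minimal sketch (not Hadoop's API), `mapper` turns one input record into zero or more key/value pairs, the shuffle step groups the values by key, and `reducer` combines each key's values. On a cluster each phase runs in parallel on many machines. The `count_destinations` mapper and its log lines are hypothetical and simply count records per destination.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    # Map + shuffle: run the mapper on every record and group emitted values by key
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one call per key, in sorted key order (a Hadoop reducer also sees keys sorted)
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def count_destinations(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit (destination, 1) for each "records sent to <dest>" line
    marker = "records sent to "
    if marker in line:
        yield line.split(marker, 1)[1].strip(), 1


if __name__ == "__main__":
    log = [
        "[2012/02/25/ 9:15] records sent to healthcare-1",
        "[2012/02/28/ 6:15] records sent to healthcare-2",
        "[2012/03/12/ 10:30] records sent to healthcare-1",
    ]
    print(map_reduce(log, count_destinations, lambda key, values: sum(values)))
```

This prints `{'healthcare-1': 2, 'healthcare-2': 1}`. Because the mapper looks at one record at a time and the reducer at one key at a time, either phase can be split across machines without changing the result.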
A look at Hadoop
What is Hadoop
—  A Map-Reduce framework
—  Designed to run applications on clusters of
local and remote systems

—  HDFS
—  The file system of Hadoop (Hadoop Distributed File System)
—  Spreads and replicates files across the local and remote machines of a cluster
Putting the pieces together
First, we need some code
Map

Reduce
Map

Hadoop streams the input records to the script on STDIN
Write each output record on its own line (Hadoop separates records with newlines)
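The deck's own map script is not reproduced here, so the following is a hypothetical Python mapper written for Hadoop Streaming. It assumes log lines shaped like the earlier `records sent to healthcare-N` example, keeps only the lines containing that phrase (the "grep" part), and writes one `destination<TAB>1` record per line. Hadoop Streaming treats the text before the first tab as the key.

```python
#!/usr/bin/env python3
"""mapper.py: hypothetical Hadoop Streaming mapper.

Reads raw log lines on STDIN and writes "<destination><TAB>1" to STDOUT for
each line that contains PATTERN, one record per line.
"""
import sys
from typing import Iterable, Iterator

PATTERN = "records sent to"  # assumed filter phrase; stands in for the grep pattern


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        if PATTERN not in line:
            continue  # the "grep" step: drop non-matching lines
        destination = line.split(PATTERN, 1)[1].strip()
        # Text before the first tab is the key Hadoop sorts and groups on
        yield f"{destination}\t1"


if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        print(record)  # print() adds the newline that separates records
```

Keeping the parsing in `map_lines` rather than in the `__main__` block makes the logic easy to test without Hadoop.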
Reduce

Hadoop streams the sorted map output back to the script on STDIN
Output the aggregated records
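A matching hypothetical reducer follows, again a sketch rather than the deck's original script. It relies on the fact that Hadoop sorts the map output by key, so every line for one key reaches the reducer consecutively; it assumes the `key<TAB>count` format of a mapper like the one above and writes one aggregated `key<TAB>total` record per key.

```python
#!/usr/bin/env python3
"""reducer.py: hypothetical Hadoop Streaming reducer.

Hadoop sorts the mapper output by key, so all "<key><TAB><count>" lines for one
key reach this script consecutively on STDIN. It sums the counts per key and
writes "<key><TAB><total>" to STDOUT.
"""
import sys
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def parse(line: str) -> Tuple[str, int]:
    # Assumes every line has the "<key><TAB><integer>" shape produced by the mapper
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value)


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    pairs = (parse(line) for line in lines if line.strip())
    # groupby merges only *adjacent* equal keys, which sorted input guarantees
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield f"{key}\t{sum(count for _, count in group)}"


if __name__ == "__main__":
    for record in reduce_lines(sys.stdin):
        print(record)
```

Streaming through `groupby` keeps only one key's values in flight at a time, so memory use does not grow with the size of the input.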
Sanity Checking
Command

Results
This local check is practical only for small data sets
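The local sanity check amounts to `cat <sample> | <map script> | sort | <reduce script>`. The screenshot of the actual command is not available, so this is a minimal in-process sketch of that contract under that assumption: run a mapper over every line, sort the emitted text records, and hand the sorted stream to a reducer. With a grep-style mapper and a pass-through reducer it reproduces the original `grep | sort`.

```python
from typing import Callable, Iterable, List


def run_locally(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[str]],
    reducer: Callable[[List[str]], Iterable[str]],
) -> List[str]:
    """Mimic `cat sample | mapper | sort | reducer` in one process."""
    mapped = [record for line in lines for record in mapper(line)]
    # Code-point sort, matching `LC_ALL=C sort` for ASCII text; Hadoop sorts on the
    # key only, but for "key<TAB>value" records the grouping comes out the same
    mapped.sort()
    return list(reducer(mapped))


if __name__ == "__main__":
    sample = [
        "[2012/03/12/ 10:30] records sent to healthcare-3",
        "noise",
        "[2012/02/25/ 9:15] records sent to healthcare-1",
    ]

    def grep_mapper(line: str) -> List[str]:
        # Hypothetical grep-style mapper: pass matching lines through unchanged
        return [line] if "records sent" in line else []

    # A pass-through reducer turns the pipeline back into `grep ... | sort`
    for out in run_locally(sample, grep_mapper, lambda records: records):
        print(out)
```

If this produces the expected output on a small sample, the same scripts can be handed to Hadoop unchanged; only the input size and the machine count differ.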
Push file to “the distributed file system”

Put file on the DFS

Check that the file is in the cloud
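From a script, the upload and the check are two calls to the standard `hadoop fs` shell: `-put` copies a local file into HDFS and `-ls` lists the target directory. This sketch only builds and prints the commands unless `dry_run=False`. The local file name and the HDFS directory are placeholders, and it assumes the `hadoop` executable is on the `PATH` and that the target directory already exists (otherwise `-put` creates a file with that name instead).

```python
import shlex
import subprocess
from typing import List


def hdfs_upload_commands(local_path: str, hdfs_dir: str) -> List[List[str]]:
    """Commands to copy a local file into HDFS and then list the target directory."""
    return [
        ["hadoop", "fs", "-put", local_path, hdfs_dir],  # copy into the distributed file system
        ["hadoop", "fs", "-ls", hdfs_dir],               # confirm the file arrived
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact shell command
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Placeholder paths (assumptions); replace before setting dry_run=False
    run(hdfs_upload_commands("hl7_transfer.data-file", "/user/hadoop/input"))
```

In dry-run mode this prints `hadoop fs -put hl7_transfer.data-file /user/hadoop/input` followed by `hadoop fs -ls /user/hadoop/input`.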
Running in “the distributed environment”

Call the Hadoop streaming command
Pass the appropriate parameters
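The streaming job itself is launched with `hadoop jar` on the Hadoop Streaming jar, passing the input and output paths and the commands to run as mapper and reducer. The sketch below only builds that argument list. Several parts are assumptions: the jar path (typically under `contrib/streaming/` in Hadoop 1.x and `share/hadoop/tools/lib/` in Hadoop 2.x and later, with the version in the file name), the script names, and the presence of `python3` on every node. The generic `-files` option ships the scripts into each task's working directory and must come before the streaming options; older releases also accepted the now-deprecated per-script `-file` option.

```python
import shlex
from typing import List


def streaming_command(
    streaming_jar: str,
    input_path: str,
    output_path: str,
    mapper: str = "mapper.py",
    reducer: str = "reducer.py",
) -> List[str]:
    """Argument list for a Hadoop Streaming job that runs two Python scripts.

    `mapper` and `reducer` should be plain file names in the current directory,
    because `-files` places them in each task's working directory by name.
    """
    return [
        "hadoop", "jar", streaming_jar,
        # Generic option: copy the scripts to every task node (must precede -input etc.)
        "-files", f"{mapper},{reducer}",
        "-input", input_path,
        # The output directory must not exist yet, or the job is rejected
        "-output", output_path,
        # Assumes python3 is installed on every node
        "-mapper", f"python3 {mapper}",
        "-reducer", f"python3 {reducer}",
    ]


if __name__ == "__main__":
    cmd = streaming_command(
        "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar",  # placeholder path
        "/user/hadoop/input",
        "/user/hadoop/output",
    )
    print(shlex.join(cmd))
```

Once the job finishes, each reducer's output lands in a `part-NNNNN` file under the output directory, which can be read back with `hadoop fs -cat /user/hadoop/output/part-*`.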
Checking Status
—  Cluster Summary
—  Running Jobs
—  Completed Jobs
—  Failed Jobs
—  Job Statistics
—  Detailed Job Logs
Checking Distributed Cluster Health
—  List Data-Nodes
—  Dead Nodes
—  Node heartbeat information
—  Failed Jobs
—  Job Statistics
—  Detailed Job Logs
Conclusion
—  A different paradigm for solving large-scale problems
—  Best suited to specific problems that can be expressed as focused map and reduce steps
