Increase computational power
with distributed processing
Neil Stein 03 Nov 2012
A Discussion Example…
Getting the data and ordering it as needed…
Familiar with grep and sort?

—  “grep” extracts all the matching lines
—  “sort” sorts all the lines
grep "some_record_parameters" hl7_transfer.data-file | sort
[2012/02/25/ 9:15] records sent to healthcare-1
[2012/02/28/ 6:15] records sent to healthcare-2
[2012/03/12/ 10:30] records sent to healthcare-3
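To make the two stages of that pipeline explicit, here is a small Python sketch of the same filter-then-sort. The function name `grep_sort` is hypothetical. Matching uses a plain substring test, like `grep -F`, rather than grep's regular expressions. Python's default string ordering corresponds to `LC_ALL=C sort`, not to a locale-aware sort.

```python
import sys


def grep_sort(lines, pattern):
    """Keep lines containing `pattern` (like grep -F), then sort them (like sort)."""
    matches = [line.rstrip("\n") for line in lines if pattern in line]
    # sorted() holds every match in memory in a single process.
    return sorted(matches)


if __name__ == "__main__":
    # Usage: python grep_sort.py some_record_parameters < hl7_transfer.data-file
    pattern = sys.argv[1] if len(sys.argv) > 1 else "some_record_parameters"
    for line in grep_sort(sys.stdin, pattern):
        print(line)
```

The whole file passes through one process on one machine. That is fine for a sample, but it is the constraint that stops scaling as the file grows.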
A Discussion Example…
—  As the amount of data grows, the process requires more and more resources

—  What if hl7_transfer.data-file is 500GB or bigger?
—  What if there are hundreds or thousands of data files?
—  What if there are multiple types of data files?
grep "provider 1" hl7_transfer.data-file | sort

—  Ignoring the processing for a moment, how do we write all that data to disk in the first place?

Need to rethink the process
Distributed File-System – “the cloud”
—  Files can be stored across many machines
—  Files can be replicated across many machines
—  Files can be stored in a hybrid-cloud model
—  Share the file-system transparently
—  You simply see the usual file structure
—  Opportunity to leverage private and public cloud environments
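The bullets above describe files split across machines and replicated. The sketch below illustrates the idea under a deliberately simplified, hypothetical placement policy: fixed-size blocks assigned round-robin to distinct nodes. Real HDFS defaults to three replicas and places them rack-aware. The function and node names here are illustrative only.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes, replication: int):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical policy for illustration; it is not how HDFS chooses nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        # Take `replication` consecutive nodes, wrapping around the list.
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement
```

With a replication factor of two or more, losing any single node still leaves a copy of every block. That is what lets the file-system keep presenting one ordinary directory tree to the user.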
Map-Reduce – computing in “the cloud”
—  A way of processing large amounts of data across many machines
—  The data must be split into chunks that can be processed independently (Map)
—  The partial results are recombined after processing (Reduce)
—  Data flows through a sequence of simple, well-defined stages: input, map, sort, reduce, output
—  Provides a simple way to break a large task into smaller, manageable tasks
—  Increases the available computational power
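The following minimal, in-memory sketch shows the map, shuffle and reduce stages. It assumes each record's key is its last whitespace-separated token, such as `healthcare-1` in the earlier output. The function names are hypothetical. The hash routing of keys to reducers mirrors the idea behind Hadoop's default hash partitioner. In a real cluster, the framework performs the shuffle, and the chunks live on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn each input line into zero or more (key, value) pairs."""
    for line in chunk:
        words = line.split()
        if words:
            # Assumed record format: the destination is the last token.
            yield words[-1], 1


def shuffle(pairs, num_reducers):
    """Group values by key and route each key to one reducer by hash."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partition):
    """Reduce: combine all values for a key into a single result."""
    return {key: sum(values) for key, values in partition.items()}


def map_reduce(chunks, num_reducers=2):
    # Each chunk could be mapped on a different machine.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    results = {}
    for partition in shuffle(mapped, num_reducers):
        # Each partition could be reduced on a different machine.
        results.update(reduce_phase(partition))
    return results
```

Every occurrence of a key lands in the same partition, so each reducer can finish its keys without consulting the others.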
A look at Hadoop
What is Hadoop?
—  A Map-Reduce framework
—  Designed to run applications on clusters of local and remote systems

—  HDFS
—  The file system of Hadoop (Hadoop Distributed File System)
—  Designed to store and serve data across clusters of local and remote systems
Putting the pieces together…
First, we need some code…
Map

Reduce
Map

Hadoop streams the input records to us on STDIN
Separate values with a newline (for Hadoop)
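A streaming mapper along these lines might look like the sketch below. It assumes records shaped like the earlier `records sent to healthcare-1` lines and keys them on the last token. That format and the function name `map_lines` are hypothetical. The output follows the streaming convention: one `key<TAB>value` pair per line on STDOUT.

```python
import sys


def map_lines(lines):
    """Emit one 'key<TAB>1' line per non-blank record.

    Assumes (hypothetically) records such as
    '[2012/02/25/ 9:15] records sent to healthcare-1', keyed by the
    last whitespace-separated token.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield f"{fields[-1]}\t1"


if __name__ == "__main__":
    # Hadoop streaming feeds input records to the mapper on STDIN...
    for record in map_lines(sys.stdin):
        # ...and reads key/value pairs back from STDOUT, one per line.
        print(record)
```

By default, streaming treats the text before the first tab as the key, so keeping the tab separator lets Hadoop sort and group on the destination.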
Reduce

Hadoop streams the sorted map output back to us on STDIN
Output the aggregated records
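A matching reducer could be sketched as follows. It assumes the mapper emits `key<TAB>count` lines, as in a counting job. The function names are hypothetical. Because Hadoop sorts the map output by key before the reducer reads it, all lines for one key arrive together, and `itertools.groupby` can aggregate them as a stream.

```python
import sys
from itertools import groupby


def parse(lines):
    """Split 'key<TAB>count' lines into (key, int) pairs, skipping blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, count = line.partition("\t")
        yield key, int(count)


def reduce_pairs(lines):
    """Sum the counts for each run of identical, already-sorted keys."""
    for key, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield f"{key}\t{sum(count for _, count in group)}"


if __name__ == "__main__":
    for line in reduce_pairs(sys.stdin):
        print(line)
```

Only one key's group is in flight at a time, so memory use stays small however large the input is.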
Sanity Checking
Command

Results
This local check is practical only with small data-sets
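A local sanity check usually pipes a sample through the scripts with the shell's `sort` standing in for Hadoop's shuffle, along the lines of `cat sample.txt | python mapper.py | sort | python reducer.py`. The file names there are hypothetical. The sketch below does the same thing inside a single Python process. It takes any mapper and reducer as arguments and includes small stand-in counting functions (also hypothetical) so that it runs on its own.

```python
from itertools import groupby


def local_run(lines, mapper, reducer):
    """Mimic `cat data | mapper | sort | reducer` in one process.

    sorted() plays the role of Hadoop's sort between map and reduce;
    everything is held in memory, so use small samples only.
    """
    return list(reducer(sorted(mapper(lines))))


# Stand-ins so the harness is self-contained; swap in the real
# map and reduce functions when checking them.
def count_mapper(lines):
    for line in lines:
        fields = line.split()
        if fields:
            yield f"{fields[-1]}\t1"


def count_reducer(sorted_lines):
    split = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"
```

If the local results look right on a sample, the same scripts can be handed to the cluster unchanged. Only the data volume and the location of the sort differ.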
Push file to “the distributed file system”

Put file on the DFS

Check that the file is in the cloud
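The two steps on this slide correspond to the `hadoop fs -put` and `hadoop fs -ls` shell commands; Hadoop 2 and later also accept `hdfs dfs`. The sketch below wraps them with `subprocess`. The helper names and the HDFS directory `/user/demo/input` are hypothetical. The `dry_run` flag only builds the argument list, which is useful for checking arguments on a machine without Hadoop installed.

```python
import subprocess


def hadoop_fs(*args, dry_run=False):
    """Run a `hadoop fs` shell command, e.g. hadoop_fs("-ls", "/")."""
    cmd = ["hadoop", "fs", *args]
    if not dry_run:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)
    return cmd


def push_and_verify(local_file, hdfs_dir, dry_run=False):
    """Copy a local file into HDFS, then list the target directory."""
    put = hadoop_fs("-put", local_file, hdfs_dir, dry_run=dry_run)
    ls = hadoop_fs("-ls", hdfs_dir, dry_run=dry_run)
    return put, ls
```

For example, `push_and_verify("hl7_transfer.data-file", "/user/demo/input")` uploads the example file and then lists the directory, so the upload can be confirmed.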
Running in “the distributed environment”

Call the Hadoop streaming command
Pass the input, output, mapper and reducer parameters
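A Hadoop streaming job is launched with `hadoop jar` on the streaming jar, using the `-input`, `-output`, `-mapper` and `-reducer` options. `-file` ships each local script to the worker nodes; newer releases recommend the generic `-files` option instead. The sketch below only assembles that command. The jar location varies by version and install, so the path is hypothetical, as are the HDFS paths and function names. Note that the job fails if the output directory already exists.

```python
import os
import subprocess


def streaming_command(streaming_jar, input_dir, output_dir, mapper_script, reducer_script):
    """Build the argument list for a Hadoop streaming job."""
    # Shipped files land in each task's working directory under their base name.
    mapper_name = os.path.basename(mapper_script)
    reducer_name = os.path.basename(reducer_script)
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_dir,
        "-output", output_dir,  # must not exist yet
        "-mapper", f"python {mapper_name}",
        "-reducer", f"python {reducer_name}",
        "-file", mapper_script,
        "-file", reducer_script,
    ]


def run_job(cmd):
    """Submit the job; raises CalledProcessError if Hadoop reports failure."""
    subprocess.run(cmd, check=True)
```

A typical call might be `run_job(streaming_command("/path/to/hadoop-streaming.jar", "/user/demo/input", "/user/demo/output", "mapper.py", "reducer.py"))`. The results are then written to part files under the output directory.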
Checking Status
—  Cluster Summary
—  Running Jobs
—  Completed Jobs
—  Failed Jobs
—  Job Statistics
—  Detailed Job Logs
Checking Distributed Cluster Health
—  List Data-Nodes
—  Dead Nodes
—  Node Heart-beat information
—  Failed Jobs
—  Job Statistics
—  Detailed Job Logs
Conclusion
—  A different paradigm for solving large-scale problems
—  Best suited to specific problems that can be expressed in a focused map-reduce manner
