SlideShare a Scribd company logo
1 of 44
Cloud Computing Lecture #1
What is Cloud Computing?
(and an intro to parallel/distributed processing)




                             Jimmy Lin
                             The iSchool
                             University of Maryland

                             Wednesday, September 3, 2008


       Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet,
       Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
       This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States
       See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Source: http://www.free-pictures-photos.com/
What is Cloud Computing?
1.   Web-scale problems
2.   Large data centers
3.   Different models of computing
4.   Highly-interactive Web applications




                                                       The iSchool
                                           University of Maryland
1. Web-Scale Problems
   Characteristics:
       Definitely data-intensive
       May also be processing intensive
   Examples:
       Crawling, indexing, searching, mining the Web
       “Post-genomics” life sciences research
       Other scientific data (physics, astronomers, etc.)
       Sensor networks
       Web 2.0 applications
       …




                                                                     The iSchool
                                                         University of Maryland
How much data?
   Wayback Machine has 2 PB + 20 TB/month (2006)
   Google processes 20 PB a day (2008)
   “all words ever spoken by human beings” ~ 5 EB
   NOAA has ~1 PB climate data (2007)
   CERN’s LHC will generate 15 PB a year (2008)

                           640K ought to be
                           enough for anybody.




                                                             The iSchool
                                                 University of Maryland
Maximilien Brice, © CERN
Maximilien Brice, © CERN
There’s nothing like more data!
      s/inspiration/data/g;




                                          The iSchool
(Banko and Brill, ACL 2001)
                              University of Maryland
(Brants et al., EMNLP 2007)
What to do with more data?
          Answering factoid questions
                 Pattern matching on the Web
                 Works amazingly well
                            Who shot Abraham Lincoln? → X shot Abraham Lincoln

          Learning relations
                 Start with seed instances
                 Search for patterns on the Web
                 Using patterns to find more instances
                                                            Wolfgang Amadeus Mozart (1756 - 1791)
                                                            Einstein was born in 1879

               Birthday-of(Mozart, 1756)
               Birthday-of(Einstein, 1879)
                                                                    PERSON (DATE –
                                                                    PERSON was born in DATE

                                                                                                    The iSchool
(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )                   University of Maryland
2. Large Data Centers
   Web-scale problems? Throw more machines at it!
   Clear trend: centralization of computing resources in large
    data centers
       Necessary ingredients: fiber, juice, and space
       What do Oregon, Iceland, and abandoned mines have in
        common?
   Important Issues:
       Redundancy
       Efficiency
       Utilization
       Management



                                                               The iSchool
                                                   University of Maryland
Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
Key Technology: Virtualization


                           App      App          App

    App     App      App   OS       OS           OS

      Operating System           Hypervisor

          Hardware               Hardware

      Traditional Stack      Virtualized Stack




                                                   The iSchool
                                       University of Maryland
3. Different Computing Models
     “Why do it yourself if you can pay someone to do it for you?”

   Utility computing
       Why buy machines when you can rent cycles?
       Examples: Amazon’s EC2, GoGrid, AppNexus
   Platform as a Service (PaaS)
       Give me nice API and take care of the implementation
       Example: Google App Engine
   Software as a Service (SaaS)
       Just run it for me!
       Example: Gmail



                                                                         The iSchool
                                                             University of Maryland
4. Web Applications
   A mistake on top of a hack built on sand held together by
    duct tape?
   What is the nature of software applications?
       From the desktop to the browser
       SaaS == Web-based applications
       Examples: Google Maps, Facebook
   How do we deliver highly-interactive Web-based
    applications?
       AJAX (asynchronous JavaScript and XML)
       For better, or for worse…




                                                             The iSchool
                                                 University of Maryland
What is the course about?
   MapReduce: the “back-end” of cloud computing
       Batch-oriented processing of large datasets
   Ajax: the “front-end” of cloud computing
       Highly-interactive Web-based applications
   Computing “in the clouds”
       Amazon’s EC2/S3 as an example of utility computing




                                                                  The iSchool
                                                      University of Maryland
Amazon Web Services
   Elastic Compute Cloud (EC2)
       Rent computing resources by the hour
       Basic unit of accounting = instance-hour
       Additional costs for bandwidth
   Simple Storage Service (S3)
       Persistent storage
       Charge by the GB/month
       Additional costs for bandwidth
   You’ll be using EC2/S3 for course assignments!




                                                               The iSchool
                                                   University of Maryland
This course is not for you…
    If you’re not genuinely interested in the topic
    If you’re not ready to do a lot of programming
    If you’re not open to thinking about computing in new ways
    If you can’t cope with uncertainly, unpredictability, poor
     documentation, and immature software
    If you can’t put in the time



             Otherwise, this will be a richly rewarding course!



                                                                    The iSchool
                                                        University of Maryland
Source: http://davidzinger.wordpress.com/2007/05/page/2/
Cloud Computing Zen
   Don’t get frustrated (take a deep breath)…
       This is bleeding edge technology
       Those W$*#T@F! moments
   Be patient…
       This is the second first time I’ve taught this course
   Be flexible…
       There will be unanticipated issues along the way
   Be constructive…
       Tell me how I can make everyone’s experience better




                                                                      The iSchool
                                                          University of Maryland
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Things to go over…
   Course schedule
   Assignments and deliverables
   Amazon EC2/S3




                                               The iSchool
                                   University of Maryland
Web-Scale Problems?
   Don’t hold your breath:
       Biocomputing
       Nanocomputing
       Quantum computing
       …
   It all boils down to…
       Divide-and-conquer
       Throwing more hardware at the problem




                       Simple to understand… a lifetime to master…

                                                                       The iSchool
                                                           University of Maryland
Divide and Conquer

                 “Work”
                                       Partition


        w1         w2         w3


      “worker”   “worker”   “worker”


         r1         r2         r3




                 “Result”              Combine


                                                     The iSchool
                                         University of Maryland
Different Workers
   Different threads in the same core
   Different cores in the same CPU
   Different CPUs in a multi-processor system
   Different machines in a distributed system




                                                             The iSchool
                                                 University of Maryland
Choices, Choices, Choices
   Commodity vs. “exotic” hardware
   Number of machines vs. processor vs. cores
   Bandwidth of memory vs. disk vs. network
   Different programming models




                                                           The iSchool
                                               University of Maryland
Flynn’s Taxonomy

                                        Instructions

                              Single (SI)         Multiple (MI)

                                 SISD                  MISD
           Single (SD)


                            Single-threaded         Pipeline
                                process           architecture
    Data




                                SIMD                   MIMD
           Multiple (MD)




                           Vector Processing     Multi-threaded
                                                 Programming




                                                                          The iSchool
                                                              University of Maryland
SISD



                    Processor


       D   D   D        D         D   D   D




                   Instructions




                                                          The iSchool
                                              University of Maryland
SIMD

                       Processor


       D0   D0   D0       D0         D0   D0   D0
       D1   D1   D1       D1         D1   D1   D1
       D2   D2   D2       D2         D2   D2   D2
       D3   D3   D3       D3         D3   D3   D3
       D4   D4   D4       D4         D4   D4   D4
       …    …    …        …          …    …    …
       Dn   Dn   Dn       Dn         Dn   Dn   Dn



                      Instructions



                                                            The iSchool
                                                University of Maryland
MIMD
                    Processor


       D   D   D        D         D   D   D




                   Instructions

                    Processor


       D   D   D        D         D   D   D




                   Instructions


                                                          The iSchool
                                              University of Maryland
Memory Typology: Shared




           Processor            Processor

                       Memory

           Processor            Processor




                                                        The iSchool
                                            University of Maryland
Memory Typology: Distributed



       Processor   Memory             Processor   Memory


                            Network



       Processor   Memory             Processor   Memory




                                                              The iSchool
                                                  University of Maryland
Memory Typology: Hybrid


       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor


                            Network



       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor




                                                              The iSchool
                                                  University of Maryland
Parallelization Problems
    How do we assign work units to workers?
    What if we have more work units than workers?
    What if workers need to share partial results?
    How do we aggregate partial results?
    How do we know all the workers have finished?
    What if workers die?


                What is the common theme of all of these problems?



                                                                The iSchool
                                                    University of Maryland
General Theme?
   Parallelization problems arise from:
       Communication between workers
       Access to shared resources (e.g., data)
   Thus, we need a synchronization system!
   This is tricky:
       Finding bugs is hard
       Solving bugs is even harder




                                                              The iSchool
                                                  University of Maryland
Managing Multiple Workers
   Difficult because
       (Often) don’t know the order in which workers run
       (Often) don’t know where the workers are running
       (Often) don’t know when workers interrupt each other
   Thus, we need:
       Semaphores (lock, unlock)
       Conditional variables (wait, notify, broadcast)
       Barriers
   Still, lots of problems:
       Deadlock, livelock, race conditions, ...
   Moral of the story: be careful!
       Even trickier if the workers are on different machines
                                                                      The iSchool
                                                          University of Maryland
Patterns for Parallelism
    Parallel computing has been around for decades
    Here are some “design patterns” …




                                                         The iSchool
                                             University of Maryland
Master/Slaves



                master




                slaves




                                     The iSchool
                         University of Maryland
Producer/Consumer Flow




         P     C   P     C


         P     C   P     C


         P     C   P     C




                                         The iSchool
                             University of Maryland
Work Queues




        P                    C
              shared queue

        P     W W W W W      C

        P                    C




                                             The iSchool
                                 University of Maryland
Rubber Meets Road
   From patterns to implementation:
       pthreads, OpenMP for multi-threaded programming
       MPI for clustering computing
       …
   The reality:
       Lots of one-off solutions, custom code
       Write you own dedicated library, then program with it
       Burden on the programmer to explicitly manage everything
   MapReduce to the rescue!
       (for next time)




                                                                 The iSchool
                                                     University of Maryland

More Related Content

Viewers also liked

Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)embur
 
Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012confesercentivenezia
 
Bairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOBairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOPascale Gaudet
 
BioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsBioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsPascale Gaudet
 
José Cruz Toledo - Aptamer basebc2012
José Cruz  Toledo - Aptamer basebc2012José Cruz  Toledo - Aptamer basebc2012
José Cruz Toledo - Aptamer basebc2012Pascale Gaudet
 

Viewers also liked (7)

Rinaldi - ODIN
Rinaldi - ODINRinaldi - ODIN
Rinaldi - ODIN
 
Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)
 
Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012
 
Masson - ViralZone
Masson - ViralZoneMasson - ViralZone
Masson - ViralZone
 
Bairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOBairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHO
 
BioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsBioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next Developments
 
José Cruz Toledo - Aptamer basebc2012
José Cruz  Toledo - Aptamer basebc2012José Cruz  Toledo - Aptamer basebc2012
José Cruz Toledo - Aptamer basebc2012
 

Similar to Cloud computing

ccna course 2
ccna course 2ccna course 2
ccna course 2S Sridhar
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM educationSu White
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfittingLibgirlTeam
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce George Ang
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceGeorge Ang
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunamiinside-BigData.com
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7CS, NcState
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCameron Kiddle
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)Andy Lenards
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Martin
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmosopenforum
 
ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning Mark S. Steed
 

Similar to Cloud computing (20)

ccna course 2
ccna course 2ccna course 2
ccna course 2
 
Data Research Vision
Data Research VisionData Research Vision
Data Research Vision
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM education
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfitting
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduce
 
Empirical AI Research
Empirical AI Research Empirical AI Research
Empirical AI Research
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunami
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in Science
 
Environmental Science, Big Data and the Cloud
Environmental Science, Big Data and the CloudEnvironmental Science, Big Data and the Cloud
Environmental Science, Big Data and the Cloud
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible Research
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)
 
Cloud hpc-bigdata-challenges
Cloud hpc-bigdata-challengesCloud hpc-bigdata-challenges
Cloud hpc-bigdata-challenges
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmos
 
ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning
 

Recently uploaded

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 

Recently uploaded (20)

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 

Cloud computing

  • 1. Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday, September 3, 2008 Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
  • 3. What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications The iSchool University of Maryland
  • 4. 1. Web-Scale Problems  Characteristics:  Definitely data-intensive  May also be processing intensive  Examples:  Crawling, indexing, searching, mining the Web  “Post-genomics” life sciences research  Other scientific data (physics, astronomers, etc.)  Sensor networks  Web 2.0 applications  … The iSchool University of Maryland
  • 5. How much data?  Wayback Machine has 2 PB + 20 TB/month (2006)  Google processes 20 PB a day (2008)  “all words ever spoken by human beings” ~ 5 EB  NOAA has ~1 PB climate data (2007)  CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody. The iSchool University of Maryland
  • 8. There’s nothing like more data! s/inspiration/data/g; The iSchool (Banko and Brill, ACL 2001) University of Maryland (Brants et al., EMNLP 2007)
  • 9. What to do with more data?  Answering factoid questions  Pattern matching on the Web  Works amazingly well Who shot Abraham Lincoln? → X shot Abraham Lincoln  Learning relations  Start with seed instances  Search for patterns on the Web  Using patterns to find more instances Wolfgang Amadeus Mozart (1756 - 1791) Einstein was born in 1879 Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) PERSON (DATE – PERSON was born in DATE The iSchool (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … ) University of Maryland
  • 10. 2. Large Data Centers  Web-scale problems? Throw more machines at it!  Clear trend: centralization of computing resources in large data centers  Necessary ingredients: fiber, juice, and space  What do Oregon, Iceland, and abandoned mines have in common?  Important Issues:  Redundancy  Efficiency  Utilization  Management The iSchool University of Maryland
  • 13. Key Technology: Virtualization App App App App App App OS OS OS Operating System Hypervisor Hardware Hardware Traditional Stack Virtualized Stack The iSchool University of Maryland
  • 14. 3. Different Computing Models “Why do it yourself if you can pay someone to do it for you?”  Utility computing  Why buy machines when you can rent cycles?  Examples: Amazon’s EC2, GoGrid, AppNexus  Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: Google App Engine  Software as a Service (SaaS)  Just run it for me!  Example: Gmail The iSchool University of Maryland
  • 15. 4. Web Applications  A mistake on top of a hack built on sand held together by duct tape?  What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  For better, or for worse… The iSchool University of Maryland
  • 16. What is the course about?  MapReduce: the “back-end” of cloud computing  Batch-oriented processing of large datasets  Ajax: the “front-end” of cloud computing  Highly-interactive Web-based applications  Computing “in the clouds”  Amazon’s EC2/S3 as an example of utility computing The iSchool University of Maryland
  • 17. Amazon Web Services  Elastic Compute Cloud (EC2)  Rent computing resources by the hour  Basic unit of accounting = instance-hour  Additional costs for bandwidth  Simple Storage Service (S3)  Persistent storage  Charge by the GB/month  Additional costs for bandwidth  You’ll be using EC2/S3 for course assignments! The iSchool University of Maryland
  • 18. This course is not for you…  If you’re not genuinely interested in the topic  If you’re not ready to do a lot of programming  If you’re not open to thinking about computing in new ways  If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software  If you can’t put in the time Otherwise, this will be a richly rewarding course! The iSchool University of Maryland
  • 20. Cloud Computing Zen  Don’t get frustrated (take a deep breath)…  This is bleeding edge technology  Those W$*#T@F! moments  Be patient…  This is the second first time I’ve taught this course  Be flexible…  There will be unanticipated issues along the way  Be constructive…  Tell me how I can make everyone’s experience better The iSchool University of Maryland
  • 25. Things to go over…  Course schedule  Assignments and deliverables  Amazon EC2/S3 The iSchool University of Maryland
  • 26. Web-Scale Problems?  Don’t hold your breath:  Biocomputing  Nanocomputing  Quantum computing  …  It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem Simple to understand… a lifetime to master… The iSchool University of Maryland
  • 27. Divide and Conquer “Work” Partition w1 w2 w3 “worker” “worker” “worker” r1 r2 r3 “Result” Combine The iSchool University of Maryland
  • 28. Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different machines in a distributed system The iSchool University of Maryland
  • 29. Choices, Choices, Choices  Commodity vs. “exotic” hardware  Number of machines vs. processor vs. cores  Bandwidth of memory vs. disk vs. network  Different programming models The iSchool University of Maryland
  • 30. Flynn’s Taxonomy Instructions Single (SI) Multiple (MI) SISD MISD Single (SD) Single-threaded Pipeline process architecture Data SIMD MIMD Multiple (MD) Vector Processing Multi-threaded Programming The iSchool University of Maryland
  • 31. SISD Processor D D D D D D D Instructions The iSchool University of Maryland
  • 32. SIMD Processor D0 D0 D0 D0 D0 D0 D0 D1 D1 D1 D1 D1 D1 D1 D2 D2 D2 D2 D2 D2 D2 D3 D3 D3 D3 D3 D3 D3 D4 D4 D4 D4 D4 D4 D4 … … … … … … … Dn Dn Dn Dn Dn Dn Dn Instructions The iSchool University of Maryland
  • 33. MIMD Processor D D D D D D D Instructions Processor D D D D D D D Instructions The iSchool University of Maryland
  • 34. Memory Typology: Shared Processor Processor Memory Processor Processor The iSchool University of Maryland
  • 35. Memory Typology: Distributed Processor Memory Processor Memory Network Processor Memory Processor Memory The iSchool University of Maryland
  • 36. Memory Typology: Hybrid Processor Processor Memory Memory Processor Processor Network Processor Processor Memory Memory Processor Processor The iSchool University of Maryland
  • 37. Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems? The iSchool University of Maryland
  • 38. General Theme?  Parallelization problems arise from:  Communication between workers  Access to shared resources (e.g., data)  Thus, we need a synchronization system!  This is tricky:  Finding bugs is hard  Solving bugs is even harder The iSchool University of Maryland
  • 39. Managing Multiple Workers  Difficult because  (Often) don’t know the order in which workers run  (Often) don’t know where the workers are running  (Often) don’t know when workers interrupt each other  Thus, we need:  Semaphores (lock, unlock)  Conditional variables (wait, notify, broadcast)  Barriers  Still, lots of problems:  Deadlock, livelock, race conditions, ...  Moral of the story: be careful!  Even trickier if the workers are on different machines The iSchool University of Maryland
  • 40. Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” … The iSchool University of Maryland
  • 41. Master/Slaves master slaves The iSchool University of Maryland
  • 42. Producer/Consumer Flow P C P C P C P C P C P C The iSchool University of Maryland
  • 43. Work Queues P C shared queue P W W W W W C P C The iSchool University of Maryland
  • 44. Rubber Meets Road  From patterns to implementation:  pthreads, OpenMP for multi-threaded programming  MPI for clustering computing  …  The reality:  Lots of one-off solutions, custom code  Write you own dedicated library, then program with it  Burden on the programmer to explicitly manage everything  MapReduce to the rescue!  (for next time) The iSchool University of Maryland