Cloud Computing Lecture #1
What is Cloud Computing?
(and an intro to parallel/distributed processing)




                             Jimmy Lin
                             The iSchool
                             University of Maryland

                             Wednesday, September 3, 2008


       Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet,
       Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)
       This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
       See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Source: http://www.free-pictures-photos.com/
What is Cloud Computing?
1.   Web-scale problems
2.   Large data centers
3.   Different models of computing
4.   Highly-interactive Web applications




                                                       The iSchool
                                           University of Maryland
1. Web-Scale Problems
   Characteristics:
       Definitely data-intensive
       May also be processing intensive
   Examples:
       Crawling, indexing, searching, mining the Web
       “Post-genomics” life sciences research
       Other scientific data (physics, astronomy, etc.)
       Sensor networks
       Web 2.0 applications
       …




How much data?
   Wayback Machine has 2 PB + 20 TB/month (2006)
   Google processes 20 PB a day (2008)
   “all words ever spoken by human beings” ~ 5 EB
   NOAA has ~1 PB climate data (2007)
   CERN’s LHC will generate 15 PB a year (2008)

                           640K ought to be
                           enough for anybody.




Maximilien Brice, © CERN
Maximilien Brice, © CERN
There’s nothing like more data!
      s/inspiration/data/g;




(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)
What to do with more data?
          Answering factoid questions
                 Pattern matching on the Web
                 Works amazingly well
                            Who shot Abraham Lincoln? → X shot Abraham Lincoln

          Learning relations
                 Start with seed instances
                            Birthday-of(Mozart, 1756)
                            Birthday-of(Einstein, 1879)
                 Search for patterns on the Web
                            Wolfgang Amadeus Mozart (1756 - 1791) → PERSON (DATE –
                            Einstein was born in 1879 → PERSON was born in DATE
                 Use patterns to find more instances

(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
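
The relation-learning idea on this slide can be sketched in a few lines. This is only an illustration, assuming a tiny in-memory list of sentences stands in for the Web; the regular expressions approximate the two birthday patterns on the slide, and the names `PATTERNS` and `extract_birth_years` are hypothetical, not taken from the cited papers.

```python
import re

# Hypothetical surface patterns approximating "PERSON (DATE –" and
# "PERSON was born in DATE". A real system would learn these from seeds.
NAME = r"(?P<person>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
PATTERNS = [
    re.compile(NAME + r" \((?P<date>\d{4}) ?[-–]"),
    re.compile(NAME + r" was born in (?P<date>\d{4})"),
]

def extract_birth_years(sentences):
    """Return a sorted list of (person, year) pairs matched by any pattern."""
    found = set()
    for sentence in sentences:
        for pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                found.add((match.group("person"), int(match.group("date"))))
    return sorted(found)

corpus = [
    "Wolfgang Amadeus Mozart (1756 - 1791) was a composer.",
    "Einstein was born in 1879.",
    "Ada Lovelace was born in 1815.",
]
for person, year in extract_birth_years(corpus):
    print(f"Birthday-of({person}, {year})")
```

With more text to search, even crude patterns like these find more instances, which is the point of the slide.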
2. Large Data Centers
   Web-scale problems? Throw more machines at it!
   Clear trend: centralization of computing resources in large
    data centers
       Necessary ingredients: fiber, juice, and space
       What do Oregon, Iceland, and abandoned mines have in
        common?
   Important Issues:
       Redundancy
       Efficiency
       Utilization
       Management



Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
Key Technology: Virtualization


                           App      App          App

    App     App      App   OS       OS           OS

      Operating System           Hypervisor

          Hardware               Hardware

      Traditional Stack      Virtualized Stack




3. Different Computing Models
     “Why do it yourself if you can pay someone to do it for you?”

   Utility computing
       Why buy machines when you can rent cycles?
       Examples: Amazon’s EC2, GoGrid, AppNexus
   Platform as a Service (PaaS)
       Give me a nice API and take care of the implementation
       Example: Google App Engine
   Software as a Service (SaaS)
       Just run it for me!
       Example: Gmail



4. Web Applications
   A mistake on top of a hack built on sand held together by
    duct tape?
   What is the nature of software applications?
       From the desktop to the browser
       SaaS == Web-based applications
       Examples: Google Maps, Facebook
   How do we deliver highly-interactive Web-based
    applications?
       AJAX (asynchronous JavaScript and XML)
       For better, or for worse…




What is the course about?
   MapReduce: the “back-end” of cloud computing
       Batch-oriented processing of large datasets
   Ajax: the “front-end” of cloud computing
       Highly-interactive Web-based applications
   Computing “in the clouds”
       Amazon’s EC2/S3 as an example of utility computing




Amazon Web Services
   Elastic Compute Cloud (EC2)
       Rent computing resources by the hour
       Basic unit of accounting = instance-hour
       Additional costs for bandwidth
   Simple Storage Service (S3)
       Persistent storage
       Charge by the GB/month
       Additional costs for bandwidth
   You’ll be using EC2/S3 for course assignments!
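
To make the accounting units concrete, here is a rough sketch of how a bill might be estimated from instance-hours and GB-months. The rates are placeholders chosen only to keep the arithmetic easy, not Amazon's actual prices. Bandwidth charges are left out, and rounding partial hours up to full hours is an assumption of this sketch. The function name `estimate_cost` is hypothetical.

```python
import math

def estimate_cost(instances, hours, stored_gb, months,
                  rate_per_instance_hour, rate_per_gb_month):
    """Estimate a bill from instance-hours and GB-months (bandwidth excluded)."""
    # Assumption: a partial hour is billed as a full instance-hour.
    instance_hours = instances * math.ceil(hours)
    compute = instance_hours * rate_per_instance_hour
    storage = stored_gb * months * rate_per_gb_month
    return compute + storage

# Placeholder rates, for illustration only.
total = estimate_cost(instances=20, hours=2.5, stored_gb=100, months=1,
                      rate_per_instance_hour=0.10, rate_per_gb_month=0.15)
print(f"Estimated cost: ${total:.2f}")
```

Here 20 instances running for 2.5 hours count as 60 instance-hours, and 100 GB stored for one month counts as 100 GB-months.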




This course is not for you…
    If you’re not genuinely interested in the topic
    If you’re not ready to do a lot of programming
    If you’re not open to thinking about computing in new ways
    If you can’t cope with uncertainty, unpredictability, poor
     documentation, and immature software
    If you can’t put in the time



             Otherwise, this will be a richly rewarding course!



Source: http://davidzinger.wordpress.com/2007/05/page/2/
Cloud Computing Zen
   Don’t get frustrated (take a deep breath)…
       This is bleeding edge technology
       Those W$*#T@F! moments
   Be patient…
       This is the second first time I’ve taught this course
   Be flexible…
       There will be unanticipated issues along the way
   Be constructive…
       Tell me how I can make everyone’s experience better




Source: Wikipedia
Things to go over…
   Course schedule
   Assignments and deliverables
   Amazon EC2/S3




Web-Scale Problems?
   Don’t hold your breath:
       Biocomputing
       Nanocomputing
       Quantum computing
       …
   It all boils down to…
       Divide-and-conquer
       Throwing more hardware at the problem




                       Simple to understand… a lifetime to master…

Divide and Conquer

                 “Work”
                                       Partition


        w1         w2         w3


      “worker”   “worker”   “worker”


         r1         r2         r3




                 “Result”              Combine
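
The partition / worker / combine flow in the diagram can be sketched as follows. This is a minimal illustration: the task (summing squares) and the function names are made up for the example, and threads are used only because they run anywhere. In CPython, CPU-bound work would normally need processes or separate machines to run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(work, n_parts):
    """Split a list into n_parts nearly equal contiguous chunks."""
    size, extra = divmod(len(work), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks

def worker(chunk):
    """Process one work unit; here, sum the squares of its items."""
    return sum(x * x for x in chunk)

def combine(partial_results):
    """Merge the partial results into the final result."""
    return sum(partial_results)

def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)            # w1, w2, w3
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))  # r1, r2, r3
    return combine(partials)

print(divide_and_conquer(list(range(10))))
```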


Different Workers
   Different threads in the same core
   Different cores in the same CPU
   Different CPUs in a multi-processor system
   Different machines in a distributed system




Choices, Choices, Choices
   Commodity vs. “exotic” hardware
   Number of machines vs. processors vs. cores
   Bandwidth of memory vs. disk vs. network
   Different programming models




Flynn’s Taxonomy

                                   Instructions

                      Single (SI)                Multiple (MI)

    Data
      Single (SD)     SISD                       MISD
                      Single-threaded process    Pipeline architecture

      Multiple (MD)   SIMD                       MIMD
                      Vector processing          Multi-threaded programming




SISD



                    Processor


       D   D   D        D         D   D   D




                   Instructions




SIMD

                       Processor


       D0   D0   D0       D0         D0   D0   D0
       D1   D1   D1       D1         D1   D1   D1
       D2   D2   D2       D2         D2   D2   D2
       D3   D3   D3       D3         D3   D3   D3
       D4   D4   D4       D4         D4   D4   D4
       …    …    …        …          …    …    …
       Dn   Dn   Dn       Dn         Dn   Dn   Dn



                      Instructions



MIMD
                    Processor


       D   D   D        D         D   D   D




                   Instructions

                    Processor


       D   D   D        D         D   D   D




                   Instructions


Memory Typology: Shared




           Processor            Processor

                       Memory

           Processor            Processor




Memory Typology: Distributed



       Processor   Memory             Processor   Memory


                            Network



       Processor   Memory             Processor   Memory




Memory Typology: Hybrid


       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor


                            Network



       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor




Parallelization Problems
    How do we assign work units to workers?
    What if we have more work units than workers?
    What if workers need to share partial results?
    How do we aggregate partial results?
    How do we know all the workers have finished?
    What if workers die?


                What is the common theme of all of these problems?



General Theme?
   Parallelization problems arise from:
       Communication between workers
       Access to shared resources (e.g., data)
   Thus, we need a synchronization system!
   This is tricky:
       Finding bugs is hard
       Fixing bugs is even harder




Managing Multiple Workers
   Difficult because
       (Often) don’t know the order in which workers run
       (Often) don’t know where the workers are running
       (Often) don’t know when workers interrupt each other
   Thus, we need:
       Semaphores (lock, unlock)
       Condition variables (wait, notify, broadcast)
       Barriers
   Still, lots of problems:
       Deadlock, livelock, race conditions, ...
   Moral of the story: be careful!
       Even trickier if the workers are on different machines
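
As a small illustration of two of these primitives, the sketch below uses Python's standard `threading` module. A mutex-style `Lock` stands in for the lock/unlock semaphore, and a `Barrier` answers "how do we know all the workers have finished?". The shared-counter task and the constants are invented for the example.

```python
import threading

N_WORKERS = 4
INCREMENTS = 10_000

counter = 0
counter_lock = threading.Lock()               # mutual exclusion (lock, unlock)
all_done = threading.Barrier(N_WORKERS + 1)   # the workers plus the main thread

def worker():
    global counter
    for _ in range(INCREMENTS):
        # Without the lock, the read-modify-write below could interleave
        # across threads and lose updates (a race condition).
        with counter_lock:
            counter += 1
    all_done.wait()   # block until every participant has arrived

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()
all_done.wait()       # the main thread learns that all workers have finished
for t in threads:
    t.join()
print(counter)
```

All of this runs inside one process on one machine. Across machines there is no shared lock to acquire, which is part of why the distributed case is trickier.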
Patterns for Parallelism
    Parallel computing has been around for decades
    Here are some “design patterns” …




Master/Slaves



                master




                slaves




Producer/Consumer Flow




         P     C   P     C


         P     C   P     C


         P     C   P     C




Work Queues




        P                    C
              shared queue

        P     W W W W W      C

        P                    C
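
A shared work queue like the one in the diagram might look like this with Python's thread-safe `queue.Queue`. This is a sketch under simple assumptions: the work units are integers, "processing" means doubling them, and a `None` sentinel (a common convention, not something prescribed by the slide) tells each consumer to stop.

```python
import queue
import threading

def producer(q, items):
    """Put each work unit on the shared queue."""
    for item in items:
        q.put(item)

def consumer(q, results, results_lock):
    """Take work units until a None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:          # sentinel: no more work for this consumer
            break
        with results_lock:
            results.append(item * 2)

def run(n_producers=3, n_consumers=3, per_producer=5):
    q = queue.Queue()
    results, results_lock = [], threading.Lock()
    producers = [
        threading.Thread(
            target=producer,
            args=(q, range(i * per_producer, (i + 1) * per_producer)),
        )
        for i in range(n_producers)
    ]
    consumers = [
        threading.Thread(target=consumer, args=(q, results, results_lock))
        for _ in range(n_consumers)
    ]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()                  # all work has been enqueued
    for _ in consumers:
        q.put(None)               # one sentinel per consumer
    for t in consumers:
        t.join()
    return sorted(results)

print(run())
```

Producers and consumers never talk to each other directly. The queue decouples them, so either side can be scaled up without changing the other.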




Rubber Meets Road
   From patterns to implementation:
       pthreads, OpenMP for multi-threaded programming
       MPI for cluster computing
       …
   The reality:
       Lots of one-off solutions, custom code
       Write your own dedicated library, then program with it
       Burden on the programmer to explicitly manage everything
   MapReduce to the rescue!
       (for next time)





More Related Content

Viewers also liked

Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)embur
 
Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012confesercentivenezia
 
Bairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOBairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOPascale Gaudet
 
BioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsBioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsPascale Gaudet
 
José Cruz Toledo - Aptamer basebc2012
José Cruz  Toledo - Aptamer basebc2012José Cruz  Toledo - Aptamer basebc2012
José Cruz Toledo - Aptamer basebc2012Pascale Gaudet
 

Viewers also liked (7)

Rinaldi - ODIN
Rinaldi - ODINRinaldi - ODIN
Rinaldi - ODIN
 
Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)
 
Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012
 
Masson - ViralZone
Masson - ViralZoneMasson - ViralZone
Masson - ViralZone
 
Bairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOBairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHO
 
BioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsBioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next Developments
 
José Cruz Toledo - Aptamer basebc2012
José Cruz  Toledo - Aptamer basebc2012José Cruz  Toledo - Aptamer basebc2012
José Cruz Toledo - Aptamer basebc2012
 

Similar to Cloud computing

ccna course 2
ccna course 2ccna course 2
ccna course 2S Sridhar
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM educationSu White
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfittingLibgirlTeam
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce George Ang
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceGeorge Ang
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunamiinside-BigData.com
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7CS, NcState
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCameron Kiddle
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)Andy Lenards
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmosopenforum
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Martin
 

Similar to Cloud computing (20)

ccna course 2
ccna course 2ccna course 2
ccna course 2
 
Data Research Vision
Data Research VisionData Research Vision
Data Research Vision
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM education
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfitting
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduce
 
Empirical AI Research
Empirical AI Research Empirical AI Research
Empirical AI Research
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunami
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in Science
 
Environmental Science, Big Data and the Cloud
Environmental Science, Big Data and the CloudEnvironmental Science, Big Data and the Cloud
Environmental Science, Big Data and the Cloud
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible Research
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)
 
Cloud hpc-bigdata-challenges
Cloud hpc-bigdata-challengesCloud hpc-bigdata-challenges
Cloud hpc-bigdata-challenges
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmos
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Advanced Test Driven-Development @ php[tek] 2024

Cloud computing

  • 1. Cloud Computing Lecture #1: What is Cloud Computing? (and an intro to parallel/distributed processing)
      Jimmy Lin, The iSchool, University of Maryland
      Wednesday, September 3, 2008
      Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)
      This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
  • 3. What is Cloud Computing?
      1. Web-scale problems
      2. Large data centers
      3. Different models of computing
      4. Highly-interactive Web applications
  • 4. 1. Web-Scale Problems
      • Characteristics:
          • Definitely data-intensive
          • May also be processing-intensive
      • Examples:
          • Crawling, indexing, searching, mining the Web
          • “Post-genomics” life sciences research
          • Other scientific data (physics, astronomy, etc.)
          • Sensor networks
          • Web 2.0 applications
          • …
  • 5. How much data?
      • Wayback Machine has 2 PB + 20 TB/month (2006)
      • Google processes 20 PB a day (2008)
      • “all words ever spoken by human beings” ~ 5 EB
      • NOAA has ~1 PB climate data (2007)
      • CERN’s LHC will generate 15 PB a year (2008)
      “640K ought to be enough for anybody.”
  • 8. There’s nothing like more data!
      s/inspiration/data/g;
      (Banko and Brill, ACL 2001; Brants et al., EMNLP 2007)
  • 9. What to do with more data?
      • Answering factoid questions
          • Pattern matching on the Web
          • Works amazingly well
          • Example: Who shot Abraham Lincoln? → X shot Abraham Lincoln
          (Brill et al., TREC 2001; Lin, ACM TOIS 2007)
      • Learning relations
          • Start with seed instances
          • Search for patterns on the Web
          • Use patterns to find more instances
          • Example text: “Wolfgang Amadeus Mozart (1756 - 1791)”, “Einstein was born in 1879”
          • Relation instances: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879)
          • Patterns: “PERSON (DATE –”, “PERSON was born in DATE”
          (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
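The relation-learning idea on slide 9 can be sketched in a few lines. This is only an illustration under stated assumptions: a two-sentence in-memory list stands in for the Web, a PERSON is approximated as a run of capitalized words, a DATE as a four-digit year, and the `extract_birth_years` helper is a hypothetical name, not part of any of the cited systems.

```python
import re

# Assumption: PERSON is a run of capitalized words, DATE is a 4-digit year.
PERSON = r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
YEAR = r"(?P<year>\d{4})"

# The two surface patterns from the slide, written as regular expressions.
PATTERNS = [
    re.compile(PERSON + r" \(" + YEAR + r" [-–] "),   # PERSON (DATE –
    re.compile(PERSON + r" was born in " + YEAR),      # PERSON was born in DATE
]

def extract_birth_years(sentences):
    """Return the set of (person, year) pairs matched by any pattern."""
    found = set()
    for sentence in sentences:
        for pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                found.add((match.group("person"), int(match.group("year"))))
    return found

# A toy corpus standing in for Web text.
corpus = [
    "Wolfgang Amadeus Mozart (1756 - 1791) was a composer.",
    "Einstein was born in 1879.",
]
facts = extract_birth_years(corpus)
```

With more text, each newly extracted pair could in turn be used to search for further patterns, which is the bootstrapping loop the slide describes.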
  • 10. 2. Large Data Centers
      • Web-scale problems? Throw more machines at it!
      • Clear trend: centralization of computing resources in large data centers
          • Necessary ingredients: fiber, juice, and space
          • What do Oregon, Iceland, and abandoned mines have in common?
      • Important issues:
          • Redundancy
          • Efficiency
          • Utilization
          • Management
  • 13. Key Technology: Virtualization
      [Diagram comparing two stacks. Traditional Stack: apps on a single operating system on hardware. Virtualized Stack: apps on separate OS instances, on a hypervisor, on hardware.]
  • 14. 3. Different Computing Models
      “Why do it yourself if you can pay someone to do it for you?”
      • Utility computing
          • Why buy machines when you can rent cycles?
          • Examples: Amazon’s EC2, GoGrid, AppNexus
      • Platform as a Service (PaaS)
          • Give me a nice API and take care of the implementation
          • Example: Google App Engine
      • Software as a Service (SaaS)
          • Just run it for me!
          • Example: Gmail
  • 15. 4. Web Applications
      • A mistake on top of a hack built on sand held together by duct tape?
      • What is the nature of software applications?
          • From the desktop to the browser
          • SaaS == Web-based applications
          • Examples: Google Maps, Facebook
      • How do we deliver highly-interactive Web-based applications?
          • AJAX (asynchronous JavaScript and XML)
          • For better, or for worse…
  • 16. What is the course about?
      • MapReduce: the “back-end” of cloud computing
          • Batch-oriented processing of large datasets
      • Ajax: the “front-end” of cloud computing
          • Highly-interactive Web-based applications
      • Computing “in the clouds”
          • Amazon’s EC2/S3 as an example of utility computing
  • 17. Amazon Web Services
      • Elastic Compute Cloud (EC2)
          • Rent computing resources by the hour
          • Basic unit of accounting = instance-hour
          • Additional costs for bandwidth
      • Simple Storage Service (S3)
          • Persistent storage
          • Charge by the GB/month
          • Additional costs for bandwidth
      • You’ll be using EC2/S3 for course assignments!
  • 18. This course is not for you…
      • If you’re not genuinely interested in the topic
      • If you’re not ready to do a lot of programming
      • If you’re not open to thinking about computing in new ways
      • If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software
      • If you can’t put in the time
      Otherwise, this will be a richly rewarding course!
  • 20. Cloud Computing Zen
      • Don’t get frustrated (take a deep breath)…
          • This is bleeding edge technology
          • Those W$*#T@F! moments
      • Be patient…
          • This is the second first time I’ve taught this course
      • Be flexible…
          • There will be unanticipated issues along the way
      • Be constructive…
          • Tell me how I can make everyone’s experience better
  • 25. Things to go over…
      • Course schedule
      • Assignments and deliverables
      • Amazon EC2/S3
  • 26. Web-Scale Problems?
      • Don’t hold your breath:
          • Biocomputing
          • Nanocomputing
          • Quantum computing
          • …
      • It all boils down to…
          • Divide-and-conquer
          • Throwing more hardware at the problem
      Simple to understand… a lifetime to master…
  • 27. Divide and Conquer
      [Diagram: the “Work” is partitioned into w1, w2, w3; each part goes to a “worker”, producing r1, r2, r3; these are combined into the “Result”.]
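The partition / worker / combine flow of slide 27 can be sketched with Python's standard `concurrent.futures` module. The function names (`partition`, `worker`, `combine`, `divide_and_conquer`) and the sum-of-squares task are illustrative assumptions chosen only to give the workers something to do.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(work, n):
    """Split the work into at most n contiguous chunks (w1, w2, ...)."""
    size = max(1, (len(work) + n - 1) // n)  # ceiling division, at least 1
    return [work[i:i + size] for i in range(0, len(work), size)]

def worker(chunk):
    """Process one chunk and return a partial result (r1, r2, ...)."""
    return sum(x * x for x in chunk)

def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)

def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    # Each chunk is handed to a worker thread; map preserves chunk order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))
    return combine(partials)

result = divide_and_conquer(list(range(10)))
```

Threads are used here only to show the structure. For CPU-bound work in CPython, swapping in `ProcessPoolExecutor` is the usual way to get real parallelism, and on a cluster the workers would be separate machines.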
  • 28. Different Workers
      • Different threads in the same core
      • Different cores in the same CPU
      • Different CPUs in a multi-processor system
      • Different machines in a distributed system
  • 29. Choices, Choices, Choices
      • Commodity vs. “exotic” hardware
      • Number of machines vs. processors vs. cores
      • Bandwidth of memory vs. disk vs. network
      • Different programming models
  • 30. Flynn’s Taxonomy
      • SISD (single instruction, single data): single-threaded process
      • MISD (multiple instruction, single data): pipeline architecture
      • SIMD (single instruction, multiple data): vector processing
      • MIMD (multiple instruction, multiple data): multi-threaded programming
  • 31. SISD
      [Diagram: one processor applying one instruction stream to a single stream of data items D.]
  • 32. SIMD
      [Diagram: one processor applying one instruction stream to multiple data streams D0, D1, …, Dn at once.]
  • 33. MIMD
      [Diagram: two processors, each with its own instruction stream and its own stream of data items D.]
  • 34. Memory Typology: Shared
      [Diagram: several processors attached to one shared memory.]
  • 35. Memory Typology: Distributed
      [Diagram: processors each with their own memory, connected by a network.]
  • 36. Memory Typology: Hybrid
      [Diagram: groups of processors sharing a memory, with the groups connected by a network.]
  • 37. Parallelization Problems
      • How do we assign work units to workers?
      • What if we have more work units than workers?
      • What if workers need to share partial results?
      • How do we aggregate partial results?
      • How do we know all the workers have finished?
      • What if workers die?
      What is the common theme of all of these problems?
  • 38. General Theme?
      • Parallelization problems arise from:
          • Communication between workers
          • Access to shared resources (e.g., data)
      • Thus, we need a synchronization system!
      • This is tricky:
          • Finding bugs is hard
          • Solving bugs is even harder
  • 39. Managing Multiple Workers
      • Difficult because
          • (Often) don’t know the order in which workers run
          • (Often) don’t know where the workers are running
          • (Often) don’t know when workers interrupt each other
      • Thus, we need:
          • Semaphores (lock, unlock)
          • Condition variables (wait, notify, broadcast)
          • Barriers
      • Still, lots of problems:
          • Deadlock, livelock, race conditions, ...
      • Moral of the story: be careful!
          • Even trickier if the workers are on different machines
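Two of the primitives named on slide 39 can be illustrated with Python's standard `threading` module. This is a sketch under assumptions: a lock stands in for the lock/unlock semaphore, the shared counter and the constants `N_WORKERS` and `INCREMENTS` are made up for the example, and condition variables are left out to keep it short.

```python
import threading

N_WORKERS = 4          # assumed number of workers for this sketch
INCREMENTS = 10_000    # assumed amount of work per worker

counter = 0                              # shared resource
lock = threading.Lock()                  # guards every access to counter
barrier = threading.Barrier(N_WORKERS)   # releases once all workers arrive
seen_after_barrier = []

def worker():
    global counter
    for _ in range(INCREMENTS):
        # Without the lock, the read-modify-write below could interleave
        # across threads and lose updates (a race condition).
        with lock:
            counter += 1
    # No worker passes this point until every worker has finished its loop.
    barrier.wait()
    with lock:
        seen_after_barrier.append(counter)

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the barrier only releases after all workers have finished incrementing, every worker observes the final count afterwards. The order in which the workers actually ran is still unknown, which is the point of the slide.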
  • 40. Patterns for Parallelism
      • Parallel computing has been around for decades
      • Here are some “design patterns” …
  • 41. Master/Slaves
      [Diagram: one master connected to several slaves.]
  • 42. Producer/Consumer Flow
      [Diagram: producers (P) passing items to consumers (C).]
  • 43. Work Queues
      [Diagram: producers (P) feed a shared queue; workers (W) take items from the queue; consumers (C) receive the output.]
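The work-queue pattern of slide 43 can be sketched with the standard `queue` and `threading` modules. The squaring task, the `STOP` sentinel, and the variable names are assumptions made for illustration; the producer and consumer roles are played by the main thread.

```python
import queue
import threading

N_WORKERS = 5      # matches the five W boxes in the diagram
STOP = object()    # sentinel telling a worker to exit

tasks = queue.Queue()     # the shared queue
results = queue.Queue()   # where workers leave their output

def worker():
    while True:
        item = tasks.get()          # blocks until an item is available
        if item is STOP:
            break
        results.put((item, item * item))

workers = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for w in workers:
    w.start()

# Producer side: enqueue the work, then one sentinel per worker.
for n in range(20):
    tasks.put(n)
for _ in workers:
    tasks.put(STOP)

for w in workers:
    w.join()

# Consumer side: all workers have exited, so draining is safe here.
squares = {}
while not results.empty():
    n, sq = results.get()
    squares[n] = sq
```

The queue handles the assignment question from the earlier slides: whichever worker is free takes the next item, so having more work units than workers needs no special handling.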
  • 44. Rubber Meets Road
      • From patterns to implementation:
          • pthreads, OpenMP for multi-threaded programming
          • MPI for cluster computing
          • …
      • The reality:
          • Lots of one-off solutions, custom code
          • Write your own dedicated library, then program with it
          • Burden on the programmer to explicitly manage everything
      • MapReduce to the rescue!
          • (for next time)