Cloud Computing Lecture #1What is Cloud Computing?(and an intro to parallel/distributed processing)                       ...
Source: http://www.free-pictures-photos.com/
What is Cloud Computing?1.   Web-scale problems2.   Large data centers3.   Different models of computing4.   Highly-intera...
1. Web-Scale Problems   Characteristics:       Definitely data-intensive       May also be processing intensive   Exam...
How much data?   Wayback Machine has 2 PB + 20 TB/month (2006)   Google processes 20 PB a day (2008)   “all words ever ...
Maximilien Brice, © CERN
Maximilien Brice, © CERN
There’s nothing like more data!      s/inspiration/data/g;                                          The iSchool(Banko and ...
What to do with more data?          Answering factoid questions                 Pattern matching on the Web             ...
2. Large Data Centers   Web-scale problems? Throw more machines at it!   Clear trend: centralization of computing resour...
Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
Key Technology: Virtualization                           App      App          App    App     App      App   OS       OS  ...
3. Different Computing Models     “Why do it yourself if you can pay someone to do it for you?”   Utility computing     ...
4. Web Applications   A mistake on top of a hack built on sand held together by    duct tape?   What is the nature of so...
What is the course about?   MapReduce: the “back-end” of cloud computing       Batch-oriented processing of large datase...
Amazon Web Services   Elastic Compute Cloud (EC2)       Rent computing resources by the hour       Basic unit of accoun...
This course is not for you…    If you’re not genuinely interested in the topic    If you’re not ready to do a lot of pro...
Source: http://davidzinger.wordpress.com/2007/05/page/2/
Cloud Computing Zen   Don’t get frustrated (take a deep breath)…       This is bleeding edge technology       Those W$*...
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Things to go over…   Course schedule   Assignments and deliverables   Amazon EC2/S3                                    ...
Web-Scale Problems?   Don’t hold your breath:       Biocomputing       Nanocomputing       Quantum computing       …...
Divide and Conquer                 “Work”                                       Partition        w1         w2         w3 ...
Different Workers   Different threads in the same core   Different cores in the same CPU   Different CPUs in a multi-pr...
Choices, Choices, Choices   Commodity vs. “exotic” hardware   Number of machines vs. processor vs. cores   Bandwidth of...
Flynn’s Taxonomy                                        Instructions                              Single (SI)         Mult...
SISD                    Processor       D   D   D        D         D   D   D                   Instructions               ...
SIMD                       Processor       D0   D0   D0       D0         D0   D0   D0       D1   D1   D1       D1         ...
MIMD                    Processor       D   D   D        D         D   D   D                   Instructions               ...
Memory Typology: Shared           Processor            Processor                       Memory           Processor         ...
Memory Typology: Distributed       Processor   Memory             Processor   Memory                            Network   ...
Memory Typology: Hybrid       Processor                      Processor                   Memory                         Me...
Parallelization Problems    How do we assign work units to workers?    What if we have more work units than workers?   ...
General Theme?   Parallelization problems arise from:       Communication between workers       Access to shared resour...
Managing Multiple Workers   Difficult because       (Often) don’t know the order in which workers run       (Often) don...
Patterns for Parallelism    Parallel computing has been around for decades    Here are some “design patterns” …         ...
Master/Slaves                master                slaves                                     The iSchool                 ...
Producer/Consumer Flow         P     C   P     C         P     C   P     C         P     C   P     C                      ...
Work Queues        P                    C              shared queue        P     W W W W W      C        P                ...
Rubber Meets Road   From patterns to implementation:       pthreads, OpenMP for multi-threaded programming       MPI fo...
Upcoming SlideShare
Loading in...5
×

Cloud computing

290

Published on

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
290
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
6
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Cloud computing

  1. 1. Cloud Computing Lecture #1What is Cloud Computing?(and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday, September 3, 2008 Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
  2. 2. Source: http://www.free-pictures-photos.com/
  3. 3. What is Cloud Computing?1. Web-scale problems2. Large data centers3. Different models of computing4. Highly-interactive Web applications The iSchool University of Maryland
  4. 4. 1. Web-Scale Problems Characteristics:  Definitely data-intensive  May also be processing intensive Examples:  Crawling, indexing, searching, mining the Web  “Post-genomics” life sciences research  Other scientific data (physics, astronomers, etc.)  Sensor networks  Web 2.0 applications  … The iSchool University of Maryland
  5. 5. How much data? Wayback Machine has 2 PB + 20 TB/month (2006) Google processes 20 PB a day (2008) “all words ever spoken by human beings” ~ 5 EB NOAA has ~1 PB climate data (2007) CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody. The iSchool University of Maryland
  6. 6. Maximilien Brice, © CERN
  7. 7. Maximilien Brice, © CERN
  8. 8. There’s nothing like more data! s/inspiration/data/g; The iSchool(Banko and Brill, ACL 2001) University of Maryland(Brants et al., EMNLP 2007)
  9. 9. What to do with more data?  Answering factoid questions  Pattern matching on the Web  Works amazingly well Who shot Abraham Lincoln? → X shot Abraham Lincoln  Learning relations  Start with seed instances  Search for patterns on the Web  Using patterns to find more instances Wolfgang Amadeus Mozart (1756 - 1791) Einstein was born in 1879 Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) PERSON (DATE – PERSON was born in DATE The iSchool(Brill et al., TREC 2001; Lin, ACM TOIS 2007)(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … ) University of Maryland
  10. 10. 2. Large Data Centers Web-scale problems? Throw more machines at it! Clear trend: centralization of computing resources in large data centers  Necessary ingredients: fiber, juice, and space  What do Oregon, Iceland, and abandoned mines have in common? Important Issues:  Redundancy  Efficiency  Utilization  Management The iSchool University of Maryland
  11. 11. Source: Harper’s (Feb, 2008)
  12. 12. Maximilien Brice, © CERN
  13. 13. Key Technology: Virtualization App App App App App App OS OS OS Operating System Hypervisor Hardware Hardware Traditional Stack Virtualized Stack The iSchool University of Maryland
  14. 14. 3. Different Computing Models “Why do it yourself if you can pay someone to do it for you?” Utility computing  Why buy machines when you can rent cycles?  Examples: Amazon’s EC2, GoGrid, AppNexus Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: Google App Engine Software as a Service (SaaS)  Just run it for me!  Example: Gmail The iSchool University of Maryland
  15. 15. 4. Web Applications A mistake on top of a hack built on sand held together by duct tape? What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  For better, or for worse… The iSchool University of Maryland
  16. 16. What is the course about? MapReduce: the “back-end” of cloud computing  Batch-oriented processing of large datasets Ajax: the “front-end” of cloud computing  Highly-interactive Web-based applications Computing “in the clouds”  Amazon’s EC2/S3 as an example of utility computing The iSchool University of Maryland
  17. 17. Amazon Web Services Elastic Compute Cloud (EC2)  Rent computing resources by the hour  Basic unit of accounting = instance-hour  Additional costs for bandwidth Simple Storage Service (S3)  Persistent storage  Charge by the GB/month  Additional costs for bandwidth You’ll be using EC2/S3 for course assignments! The iSchool University of Maryland
  18. 18. This course is not for you…  If you’re not genuinely interested in the topic  If you’re not ready to do a lot of programming  If you’re not open to thinking about computing in new ways  If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software  If you can’t put in the time Otherwise, this will be a richly rewarding course! The iSchool University of Maryland
  19. 19. Source: http://davidzinger.wordpress.com/2007/05/page/2/
  20. 20. Cloud Computing Zen Don’t get frustrated (take a deep breath)…  This is bleeding edge technology  Those W$*#T@F! moments Be patient…  This is the second first time I’ve taught this course Be flexible…  There will be unanticipated issues along the way Be constructive…  Tell me how I can make everyone’s experience better The iSchool University of Maryland
  21. 21. Source: Wikipedia
  22. 22. Source: Wikipedia
  23. 23. Source: Wikipedia
  24. 24. Source: Wikipedia
  25. 25. Things to go over… Course schedule Assignments and deliverables Amazon EC2/S3 The iSchool University of Maryland
  26. 26. Web-Scale Problems? Don’t hold your breath:  Biocomputing  Nanocomputing  Quantum computing  … It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem Simple to understand… a lifetime to master… The iSchool University of Maryland
  27. 27. Divide and Conquer “Work” Partition w1 w2 w3 “worker” “worker” “worker” r1 r2 r3 “Result” Combine The iSchool University of Maryland
  28. 28. Different Workers Different threads in the same core Different cores in the same CPU Different CPUs in a multi-processor system Different machines in a distributed system The iSchool University of Maryland
  29. 29. Choices, Choices, Choices Commodity vs. “exotic” hardware Number of machines vs. processor vs. cores Bandwidth of memory vs. disk vs. network Different programming models The iSchool University of Maryland
  30. 30. Flynn’s Taxonomy Instructions Single (SI) Multiple (MI) SISD MISD Single (SD) Single-threaded Pipeline process architecture Data SIMD MIMD Multiple (MD) Vector Processing Multi-threaded Programming The iSchool University of Maryland
  31. 31. SISD Processor D D D D D D D Instructions The iSchool University of Maryland
  32. 32. SIMD Processor D0 D0 D0 D0 D0 D0 D0 D1 D1 D1 D1 D1 D1 D1 D2 D2 D2 D2 D2 D2 D2 D3 D3 D3 D3 D3 D3 D3 D4 D4 D4 D4 D4 D4 D4 … … … … … … … Dn Dn Dn Dn Dn Dn Dn Instructions The iSchool University of Maryland
  33. 33. MIMD Processor D D D D D D D Instructions Processor D D D D D D D Instructions The iSchool University of Maryland
  34. 34. Memory Typology: Shared Processor Processor Memory Processor Processor The iSchool University of Maryland
  35. 35. Memory Typology: Distributed Processor Memory Processor Memory Network Processor Memory Processor Memory The iSchool University of Maryland
  36. 36. Memory Typology: Hybrid Processor Processor Memory Memory Processor Processor Network Processor Processor Memory Memory Processor Processor The iSchool University of Maryland
  37. 37. Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems? The iSchool University of Maryland
  38. 38. General Theme? Parallelization problems arise from:  Communication between workers  Access to shared resources (e.g., data) Thus, we need a synchronization system! This is tricky:  Finding bugs is hard  Solving bugs is even harder The iSchool University of Maryland
  39. 39. Managing Multiple Workers Difficult because  (Often) don’t know the order in which workers run  (Often) don’t know where the workers are running  (Often) don’t know when workers interrupt each other Thus, we need:  Semaphores (lock, unlock)  Conditional variables (wait, notify, broadcast)  Barriers Still, lots of problems:  Deadlock, livelock, race conditions, ... Moral of the story: be careful!  Even trickier if the workers are on different machines The iSchool University of Maryland
  40. 40. Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” … The iSchool University of Maryland
  41. 41. Master/Slaves master slaves The iSchool University of Maryland
  42. 42. Producer/Consumer Flow P C P C P C P C P C P C The iSchool University of Maryland
  43. 43. Work Queues P C shared queue P W W W W W C P C The iSchool University of Maryland
  44. 44. Rubber Meets Road From patterns to implementation:  pthreads, OpenMP for multi-threaded programming  MPI for clustering computing  … The reality:  Lots of one-off solutions, custom code  Write you own dedicated library, then program with it  Burden on the programmer to explicitly manage everything MapReduce to the rescue!  (for next time) The iSchool University of Maryland
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×