Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Cloud Computing

453 views

Published on

What is Cloud Computing?
and an intro to parallel/distributed processing

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Cloud Computing

  1. 1. Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday, September 3, 2008 This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
  2. 2. Source: http://www.free-pictures-photos.com/
  3. 3. The iSchool University of Maryland What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications
  4. 4. The iSchool University of Maryland 1. Web-Scale Problems  Characteristics:  Definitely data-intensive  May also be processing intensive  Examples:  Crawling, indexing, searching, mining the Web  “Post-genomics” life sciences research  Other scientific data (physics, astronomers, etc.)  Sensor networks  Web 2.0 applications  …
  5. 5. The iSchool University of Maryland How much data?  Wayback Machine has 2 PB + 20 TB/month (2006)  Google processes 20 PB a day (2008)  “all words ever spoken by human beings” ~ 5 EB  NOAA has ~1 PB climate data (2007)  CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody.
  6. 6. Maximilien Brice, © CERN
  7. 7. Maximilien Brice, © CERN
  8. 8. The iSchool University of Maryland There’s nothing like more data! s/inspiration/data/g; (Banko and Brill, ACL 2001) (Brants et al., EMNLP 2007)
  9. 9. The iSchool University of Maryland What to do with more data?  Answering factoid questions  Pattern matching on the Web  Works amazingly well  Learning relations  Start with seed instances  Search for patterns on the Web  Using patterns to find more instances Who shot Abraham Lincoln? → X shot Abraham Lincoln Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) Wolfgang Amadeus Mozart (1756 - 1791) Einstein was born in 1879 PERSON (DATE – PERSON was born in DATE (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
  10. 10. The iSchool University of Maryland 2. Large Data Centers  Web-scale problems? Throw more machines at it!  Clear trend: centralization of computing resources in large data centers  Necessary ingredients: fiber, juice, and space  What do Oregon, Iceland, and abandoned mines have in common?  Important Issues:  Redundancy  Efficiency  Utilization  Management
  11. 11. Source: Harper’s (Feb, 2008)
  12. 12. Maximilien Brice, © CERN
  13. 13. The iSchool University of Maryland Key Technology: Virtualization Hardware Operating System App App App Traditional Stack Hardware OS App App App Hypervisor OS OS Virtualized Stack
  14. 14. The iSchool University of Maryland 3. Different Computing Models  Utility computing  Why buy machines when you can rent cycles?  Examples: Amazon’s EC2, GoGrid, AppNexus  Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: Google App Engine  Software as a Service (SaaS)  Just run it for me!  Example: Gmail “Why do it yourself if you can pay someone to do it for you?”
  15. 15. The iSchool University of Maryland 4. Web Applications  A mistake on top of a hack built on sand held together by duct tape?  What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  For better, or for worse…
  16. 16. The iSchool University of Maryland What is the course about?  MapReduce: the “back-end” of cloud computing  Batch-oriented processing of large datasets  Ajax: the “front-end” of cloud computing  Highly-interactive Web-based applications  Computing “in the clouds”  Amazon’s EC2/S3 as an example of utility computing
  17. 17. The iSchool University of Maryland Amazon Web Services  Elastic Compute Cloud (EC2)  Rent computing resources by the hour  Basic unit of accounting = instance-hour  Additional costs for bandwidth  Simple Storage Service (S3)  Persistent storage  Charge by the GB/month  Additional costs for bandwidth  You’ll be using EC2/S3 for course assignments!
  18. 18. The iSchool University of Maryland This course is not for you…  If you’re not genuinely interested in the topic  If you’re not ready to do a lot of programming  If you’re not open to thinking about computing in new ways  If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software  If you can’t put in the time Otherwise, this will be a richly rewarding course!
  19. 19. Source: http://davidzinger.wordpress.com/2007/05/page/2/
  20. 20. The iSchool University of Maryland Cloud Computing Zen  Don’t get frustrated (take a deep breath)…  This is bleeding edge technology  Those W$*#T@F! moments  Be patient…  This is the second first time I’ve taught this course  Be flexible…  There will be unanticipated issues along the way  Be constructive…  Tell me how I can make everyone’s experience better
  21. 21. Source: Wikipedia
  22. 22. Source: Wikipedia
  23. 23. Source: Wikipedia
  24. 24. Source: Wikipedia
  25. 25. The iSchool University of Maryland Things to go over…  Course schedule  Assignments and deliverables  Amazon EC2/S3
  26. 26. The iSchool University of Maryland Web-Scale Problems?  Don’t hold your breath:  Biocomputing  Nanocomputing  Quantum computing  …  It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem Simple to understand… a lifetime to master…
  27. 27. The iSchool University of Maryland Divide and Conquer “Work” w1 w2 w3 r1 r2 r3 “Result” “worker” “worker” “worker” Partition Combine
  28. 28. The iSchool University of Maryland Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different machines in a distributed system
  29. 29. The iSchool University of Maryland Choices, Choices, Choices  Commodity vs. “exotic” hardware  Number of machines vs. processor vs. cores  Bandwidth of memory vs. disk vs. network  Different programming models
  30. 30. The iSchool University of Maryland Flynn’s Taxonomy Instructions Single (SI) Multiple (MI) Data Multiple(M SISD Single-threaded process MISD Pipeline architecture SIMD Vector Processing MIMD Multi-threaded Programming Single(SD)
  31. 31. The iSchool University of Maryland SISD D D D D D D D Processor Instructions
  32. 32. The iSchool University of Maryland SIMD D0 Processor Instructions D0D0 D0 D0 D0 D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D1 D2 D3 D4 … Dn D0
  33. 33. The iSchool University of Maryland MIMD D D D D D D D Processor Instructions D D D D D D D Processor Instructions
  34. 34. The iSchool University of Maryland Memory Typology: Shared Memory Processor Processor Processor Processor
  35. 35. The iSchool University of Maryland Memory Typology: Distributed MemoryProcessor MemoryProcessor MemoryProcessor MemoryProcessor Network
  36. 36. The iSchool University of Maryland Memory Typology: Hybrid Memory Processor Network Processor Memory Processor Processor Memory Processor Processor Memory Processor Processor
  37. 37. The iSchool University of Maryland Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems?
  38. 38. The iSchool University of Maryland General Theme?  Parallelization problems arise from:  Communication between workers  Access to shared resources (e.g., data)  Thus, we need a synchronization system!  This is tricky:  Finding bugs is hard  Solving bugs is even harder
  39. 39. The iSchool University of Maryland Managing Multiple Workers  Difficult because  (Often) don’t know the order in which workers run  (Often) don’t know where the workers are running  (Often) don’t know when workers interrupt each other  Thus, we need:  Semaphores (lock, unlock)  Conditional variables (wait, notify, broadcast)  Barriers  Still, lots of problems:  Deadlock, livelock, race conditions, ...  Moral of the story: be careful!  Even trickier if the workers are on different machines
  40. 40. The iSchool University of Maryland Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” …
  41. 41. The iSchool University of Maryland Master/Slaves slaves master
  42. 42. The iSchool University of Maryland Producer/Consumer Flow CP P P C C CP P P C C
  43. 43. The iSchool University of Maryland Work Queues CP P P C C shared queue W W W W W
  44. 44. The iSchool University of Maryland Rubber Meets Road  From patterns to implementation:  pthreads, OpenMP for multi-threaded programming  MPI for clustering computing  …  The reality:  Lots of one-off solutions, custom code  Write you own dedicated library, then program with it  Burden on the programmer to explicitly manage everything  MapReduce to the rescue!  (for next time)

×