Infrastructure for Cloud Computing

  1. Infrastructure for Cloud Computing
     Dahai Li, 2008/06/12
  2. Agenda
     • About Cloud Computing
     • Tools for Cloud Computing in Google
     • Google’s partnerships with universities
  3. What’s new?
  4. Advantages
     • Data safety and reliability
     • Data synchronization between different devices
     • Low requirements for end devices
     • Unlimited potential of the cloud
  5. Cloud for end user: Google Cloud
  6. Cloud for web developer: Google Cloud APIs
  7. Example: Earthquake map based on Map API
  8. Agenda
     • About Cloud Computing
     • Tools for Cloud Computing in Google
     • Google’s partnerships with universities
  9. google.stanford.edu (circa 1997)
  10. google.com (1999)
  11. Google Data Center (circa 2000)
  12. Google File System (GFS)
  13. Why GFS?
     • Google has unusual requirements
     • Unfair advantage
     • Fun and challenging to build large-scale systems
  14. GFS Architecture
     (Diagram: clients, a GFS master with replica masters, and chunkservers 1 … N storing replicated chunks C0, C1, C2, C3, C5, …)
  15. Master
     • Maintains metadata:
       – File namespace
       – Access control info
       – Maps files to chunks
     • Controls system activities:
       – Monitor state of chunkservers
       – Chunk allocation and placement
       – Initiate chunk recovery and rebalancing
       – Garbage collect dead chunks
       – Collect and display stats, admin functions
  16. Client
     • Protocol implemented by client library
     • Read protocol
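The slides name the read protocol without spelling it out. The sketch below follows the steps described in the published GFS design: the client turns a byte offset into a chunk index, asks the master only for metadata (chunk handle and replica locations), then fetches the bytes from a chunkserver. The class and function names (`Master`, `Chunkserver`, `client_read`) are hypothetical and chosen for illustration; this is an in-memory toy under those assumptions, not Google's client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the published GFS paper


class Master:
    """Holds metadata only: file -> chunk handles, chunk handle -> replica locations."""

    def __init__(self):
        self.file_to_chunks = {}   # file name -> ordered list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of chunkserver names

    def lookup(self, filename, chunk_index):
        handle = self.file_to_chunks[filename][chunk_index]
        return handle, self.chunk_locations[handle]


class Chunkserver:
    """Stores chunk bytes, keyed by chunk handle (hypothetical in-memory stand-in)."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


def client_read(master, chunkservers, filename, offset, length):
    """Read a byte range that lies within a single chunk."""
    chunk_index = offset // CHUNK_SIZE                        # which chunk holds the offset
    handle, replicas = master.lookup(filename, chunk_index)   # metadata comes from the master
    server = chunkservers[replicas[0]]                        # simplification: first replica
    return server.read(handle, offset % CHUNK_SIZE, length)   # data comes from a chunkserver
```

The point of the split is that file data never flows through the master, so the master handles only small metadata requests while the chunkservers carry the read/write load.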
  17. GFS Usage in Google Cloud
     • 50+ clusters
     • Filesystem clusters of up to 1000+ machines
     • Pools of 1000+ clients
     • 10+ GB/s read/write load
       – in the presence of frequent hardware failures
  18. MapReduce
  19. What’s MapReduce
     • A simple programming model that applies to many large-scale computing problems
     • Hides messy details in the MapReduce runtime library
  20. Typical problem solved by MapReduce
     • Read a lot of data
     • Map: extract something you care about from each record
     • Shuffle and Sort
     • Reduce: aggregate, summarize, filter, or transform
     • Write the results
  21. More specifically…
     • Programmer specifies two primary methods:
       – map(k, v) → <k, v>*
       – reduce(k, <v>*) → <k, v>*
     • All v with same k are reduced together, in order.
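The two signatures above can be sketched as a small single-process runner. This is a minimal illustration, assuming everything fits in memory; `run_mapreduce`, `max_map`, `max_reduce`, and the sensor data are hypothetical names and values, and none of the distribution, scheduling, or fault tolerance of the real library is modelled.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle/sort, and reduce sequentially over (key, value) records."""
    # Map phase: each input record may emit any number of intermediate pairs.
    intermediate = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)  # shuffle: group values by key

    # Reduce phase: keys are visited in sorted order, one reduce call per key.
    results = []
    for out_key in sorted(intermediate):
        results.extend(reduce_fn(out_key, intermediate[out_key]))
    return results


# Hypothetical example: highest reading per sensor.
def max_map(sensor_id, reading):
    yield sensor_id, reading


def max_reduce(sensor_id, readings):
    yield sensor_id, max(readings)


readings = [("s2", 7), ("s1", 3), ("s1", 9), ("s2", 4)]
print(run_mapreduce(readings, max_map, max_reduce))  # prints [('s1', 9), ('s2', 7)]
```

Only `map_fn` and `reduce_fn` change from problem to problem; the grouping and ordering in between stay the same, which is the part the runtime library hides.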
  22. Example: Word Frequencies in Web Pages
     • Input is files with one document per record
     • Specify a map function that takes a key/value pair
       – key = document URL
       – value = document contents
     • Output of map function is (potentially many) key/value pairs.
       – In our case, output (word, “1”) once per word in the document
     • Example input: <“网页1”, “是也不是”> (“网页1” means “web page 1”; the contents are the four characters 是 “is”, 也 “also”, 不 “not”, 是 “is”)
     • Example map output: <“是”, “1”> <“也”, “1”> <“不”, “1”> …
  23. Continued: word frequencies in web pages
     • MapReduce library gathers together all pairs with the same key (shuffle/sort)
     • The reduce function combines the values for a key. In our case, compute the sum:
       – key = “是”, values = “1”, “1” → “2”
       – key = “也”, values = “1” → “1”
       – key = “不”, values = “1” → “1”
     • Output of reduce (usually 0 or 1 value) is paired with the key and saved:
       – “是”, “2”
       – “也”, “1”
       – “不”, “1”
  24. Example: Pseudo-code
       Map(String input_key, String input_value):
         // input_key: document name
         // input_value: document contents
         for each word w in input_value:
           EmitIntermediate(w, "1");

       Reduce(String key, Iterator intermediate_values):
         // key: a word, same for input and output
         // intermediate_values: a list of counts
         int result = 0;
         for each v in intermediate_values:
           result += ParseInt(v);
         Emit(AsString(result));
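A runnable Python rendering of the pseudo-code above, as a sketch rather than the library's actual interface. Two assumptions are labeled in the comments: words are split on whitespace (the slide example instead treats each Chinese character as a word), and `reduce_fn` yields the key together with the count so the driver can build a dictionary. The driver `word_frequencies` is a hypothetical stand-in for the shuffle/sort step that the runtime performs.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    # input_key: document name; input_value: document contents
    # Assumption: words are separated by whitespace.
    for word in input_value.split():
        yield word, "1"


def reduce_fn(key, intermediate_values):
    # key: a word; intermediate_values: a list of counts, as strings
    result = 0
    for v in intermediate_values:
        result += int(v)
    # Assumption: emit the key with the count (the pseudo-code emits only the count).
    yield key, str(result)


def word_frequencies(documents):
    """Hypothetical driver: group map output by word, then reduce each group."""
    groups = defaultdict(list)
    for name, contents in documents:
        for word, count in map_fn(name, contents):
            groups[word].append(count)
    output = {}
    for word in sorted(groups):
        for key, total in reduce_fn(word, groups[word]):
            output[key] = total
    return output


print(word_frequencies([("page1", "to be or not to be")]))
```

Counts travel as strings here only to mirror the pseudo-code; a real job would normally use integers.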
  25. Conclusion to MapReduce
     • MapReduce has proven to be a remarkably useful abstraction
     • Greatly simplifies large-scale computations at Google
     • Fun to use: focus on problem, let library deal with messy details
     • Many thousands of parallel programs written by hundreds of different programmers in last few years
       – Many had no prior parallel or distributed programming experience
  26. BigTable
  27. Overview
     • Structured data storage, not a database
     • Wide applicability
     • Scalability
     • High performance
     • High availability
  28. Basic Data Model
     • Distributed multi-dimensional sparse map
       (row, column, timestamp) → cell contents
     • (Diagram: row “www.cnn.com”, column “contents”, with versions “<html>…” stored at timestamps t1, t2, t3)
     • Good match for most of our applications
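The data model above can be pictured as a dictionary keyed by (row, column, timestamp). The sketch below is a single-process illustration only; `SparseTable`, `put`, and `get` are hypothetical names, and integer timestamps are an assumption standing in for t1, t2, t3.

```python
class SparseTable:
    """Toy sparse map: (row, column, timestamp) -> cell contents."""

    def __init__(self):
        self.cells = {}  # only cells that were written occupy space

    def put(self, row, column, timestamp, contents):
        self.cells[(row, column, timestamp)] = contents

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = [
            (ts, contents)
            for (r, c, ts), contents in self.cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None  # the cell was never written: the map is sparse
        return max(versions, key=lambda pair: pair[0])[1]


table = SparseTable()
table.put("www.cnn.com", "contents", 1, "<html>v1")
table.put("www.cnn.com", "contents", 3, "<html>v3")
print(table.get("www.cnn.com", "contents"))     # newest version
print(table.get("www.cnn.com", "contents", 2))  # newest version at or before timestamp 2
```

The linear scan in `get` keeps the example short; a real store keeps rows sorted so that lookups do not touch every cell.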
  29. BigTable API
     • Metadata operations
       – Create/delete tables, column families, change metadata
     • Writes (atomic)
       – Set(): write cells in a row
       – DeleteCells(): delete cells in a row
       – DeleteRow(): delete all cells in a row
     • Reads
       – Scanner: read arbitrary cells in a bigtable
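To make the write and read operations above concrete, here is a toy in-memory table in which all mutations to one row are applied atomically and a scan walks rows in key order. It is a sketch under stated assumptions, not the real client library: `ToyTable`, `apply`, `scan`, and the mutation tuples `("set" | "delete_cells" | "delete_row", column, value)` are hypothetical, loosely named after Set(), DeleteCells(), DeleteRow(), and Scanner. Timestamps are left out for brevity.

```python
import threading


class ToyTable:
    def __init__(self):
        self._rows = {}  # row key -> {column: value}
        self._lock = threading.Lock()

    def apply(self, row, mutations):
        """Apply every mutation to one row, or none of them if any is invalid."""
        with self._lock:
            cells = dict(self._rows.get(row, {}))  # mutate a copy, publish at the end
            for op, column, value in mutations:
                if op == "set":
                    cells[column] = value
                elif op == "delete_cells":
                    cells.pop(column, None)
                elif op == "delete_row":
                    cells.clear()
                else:
                    raise ValueError(f"unknown mutation: {op}")
            if cells:
                self._rows[row] = cells
            else:
                self._rows.pop(row, None)

    def scan(self, start_row="", end_row=None):
        """Yield (row, column, value) for rows in [start_row, end_row), in key order."""
        with self._lock:
            snapshot = {row: dict(cells) for row, cells in self._rows.items()}
        for row in sorted(snapshot):
            if row < start_row or (end_row is not None and row >= end_row):
                continue
            for column in sorted(snapshot[row]):
                yield row, column, snapshot[row][column]
```

Because mutations are applied to a copy and published only at the end, a failed batch leaves the row untouched, which is the single-row atomicity the slide refers to.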
  30. System Structure
     • Bigtable client, using the Bigtable client library: Open()
     • Bigtable cell:
       – Bigtable master: performs metadata ops, load balancing
       – Bigtable tablet servers: serve data
       – Cluster scheduling master: handles failover, monitoring
       – GFS: holds tablet data, logs
       – Lock service: holds metadata, handles master-election
  31. Current status of BigTable
     • Design/initial implementation started beginning of 2004
     • Currently ~100 BigTable cells
     • Production use or active development for many projects:
       – Google Print
       – My Search History
       – Orkut
       – Crawling/indexing pipeline
       – Google Maps/Google Earth
       – Blogger
       – …
     • Largest bigtable cell manages ~200TB of data spread over several thousand machines (larger cells planned)
  32. Typical Cluster
     • Cluster-wide services: lock service, GFS master, scheduling masters
     • Machines 1 … N, each running user applications (app1, app2, app3, …), a scheduler slave, and a GFS chunkserver on Linux
  33. Agenda
     • About Cloud Computing
     • Tools for Cloud Computing in Google
     • Google’s partnerships with universities
  34. ACCI in Oct. 2007
     • Stands for Academic Cloud Computing Initiative
     • IBM and Google partnership
     • Helps universities teach distributed-systems programming skills
     • Started at the University of Washington and is scaling to many others
  35. Google’s ACCI activities in Greater China
     • Google Greater China has helped create a cloud computing course at Tsinghua in summer 2007
     • Now scaling to other mainland China and Taiwan universities
  36. Example: THU MR Course, Fall 2007
     • “Massive Data Processing” course based on Google Cloud technology
     • Google employees gave lectures during the course
     • Got interesting results from the smart students
     • http://hpc.cs.tsinghua.edu.cn/dpcourse/
  37. Cont.: THU MR Course, Fall 2007
     • (Photo: students presenting the course project “simulating the operation of solar system based on MapReduce technology” at the Google office)
     • (Figure: massive data processing to simulate the operation of the solar system)
  38. THANK YOU
     More info on http://code.google.com/intl/zh-CN/
