Infrastructure for Cloud
Computing
Dahai Li

2008/06/12
Agenda

    • About Cloud Computing
    • Tools for Cloud Computing in Google
    • Google’s partnerships with universities




What’s new?




Advantages

• Data safety and reliability
• Data synchronization between different
 devices
• Low requirements for end devices
• Unlimited potential of the cloud
Cloud for end users




                     Google Cloud
Cloud for web developers


             Google Cloud



                            APIs
Example: Earthquake map based on the Maps API




Agenda

    • About Cloud Computing
    • Tools for Cloud Computing in Google
    • Google’s partnerships with universities




google.stanford.edu (circa 1997)
google.com (1999)
Google Data Center (circa 2000)
Google File System (GFS)




Why GFS?

     • Google has unusual requirements
     • Unfair advantage
     • Fun and challenging to build large-scale
      systems




GFS Architecture




     [Figure: GFS architecture. Many clients talk to the GFS master
     (which has master replicas) and to chunkservers 1 … N. Each chunk
     is stored on several chunkservers:]

     • Chunkserver 1: C0, C1, C5, C2
     • Chunkserver 2: C1, C5, C3, …
     • Chunkserver N: C0, C5, C2

Master

     • Maintain Metadata:
       – File namespace
       – Access control info
       – Maps files to chunks
     • Control system activities:
       – Monitor state of chunkservers
       – Chunk allocation and placement
       – Initiate chunk recovery and rebalancing
       – Garbage collect dead chunks
       – Collect and display stats, admin functions
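
The metadata listed above can be pictured as a few in-memory maps. The following is only a sketch under assumptions: the class names, field names, and the `chunkserver_failed` helper are hypothetical and are not taken from GFS itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FileEntry:
    acl: Set[str]                                           # access control info
    chunk_handles: List[int] = field(default_factory=list)  # file -> ordered chunks


@dataclass
class MasterMetadata:
    namespace: Dict[str, FileEntry] = field(default_factory=dict)     # path -> entry
    chunk_locations: Dict[int, Set[str]] = field(default_factory=dict)  # chunk -> servers
    next_handle: int = 0

    def create_file(self, path, acl):
        self.namespace[path] = FileEntry(acl=set(acl))

    def allocate_chunk(self, path, chunkservers):
        """Chunk allocation and placement: append a new chunk to a file."""
        handle = self.next_handle
        self.next_handle += 1
        self.namespace[path].chunk_handles.append(handle)
        self.chunk_locations[handle] = set(chunkservers)
        return handle

    def chunkserver_failed(self, server):
        """Forget a dead server; return the chunks that lost a replica."""
        lost_replica = []
        for handle, servers in self.chunk_locations.items():
            if server in servers:
                servers.discard(server)
                lost_replica.append(handle)
        return lost_replica
```

The chunks returned by `chunkserver_failed` are the ones a real master would schedule for recovery; the sketch stops at identifying them.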
Client

     • Protocol implemented by client library
     • Read protocol
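
The slide's diagram of the read protocol did not survive extraction. As a rough sketch based on the publicly described GFS design (not on this deck), a read works roughly like this: the client library turns a byte offset into a chunk index, asks the master for the chunk handle and replica locations, then fetches the bytes directly from a chunkserver. All class and method names below are hypothetical, and the 64 MB chunk size comes from the published design.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB, per the published GFS design (assumption here)


class Master:
    def __init__(self):
        self.files = {}      # filename -> list of chunk handles
        self.locations = {}  # chunk handle -> list of chunkserver names

    def lookup(self, filename, chunk_index):
        handle = self.files[filename][chunk_index]
        return handle, self.locations[handle]


class Chunkserver:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


class Client:
    def __init__(self, master, chunkservers, chunk_size=CHUNK_SIZE):
        self.master = master
        self.chunkservers = chunkservers  # name -> Chunkserver
        self.chunk_size = chunk_size

    def read(self, filename, offset, length):
        """Read a byte range that lies within a single chunk."""
        chunk_index = offset // self.chunk_size                       # 1. offset -> chunk
        handle, replicas = self.master.lookup(filename, chunk_index)  # 2. ask the master
        server = self.chunkservers[replicas[0]]                       # 3. pick a replica
        return server.read(handle, offset % self.chunk_size, length)  # 4. fetch the data
```

Only metadata flows through the master; file data moves between client and chunkserver. Always picking the first replica is a simplification.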




GFS Usage in Google Cloud

     • 50+ clusters
     • Filesystem clusters of up to 1000+
      machines
     • Pools of 1000+ clients
     • 10+ GB/s read/write load
       – in the presence of frequent hardware failures




MapReduce




What’s MapReduce?

     • A simple programming model that applies to
      many large-scale computing problems
     • Hides messy details in the MapReduce runtime
      library




Typical problem solved by MapReduce

     • Read a lot of data
     • Map: extract something you care about from
      each record
     • Shuffle and Sort
     • Reduce: aggregate, summarize, filter, or
      transform
     • Write the results




More specifically…

     • Programmer specifies two primary methods:
       – map(k, v) → <k', v'>*
       – reduce(k', <v'>*) → <k', v'>*
     • All v' with same k' are reduced together, in
      order.
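
To make the two signatures concrete, here is a minimal single-process sketch of what a runtime does around them. It is an illustration only: `run_mapreduce` and the per-city maximum example are hypothetical, and a real MapReduce library distributes each phase across many machines.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle/sort, and reduce in one process.

    map_fn(k, v) yields (k2, v2) pairs.
    reduce_fn(k2, values) yields (k2, v3) pairs.
    """
    groups = defaultdict(list)
    for k, v in records:                          # map phase
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)                 # shuffle: group values by key
    output = []
    for k2 in sorted(groups):                     # sort: visit keys in order
        output.extend(reduce_fn(k2, groups[k2]))  # reduce phase
    return output


def map_city_temp(station, reading):
    # Hypothetical input: key = station id, value = (city, temperature).
    city, temp = reading
    yield city, temp


def reduce_max_temp(city, temps):
    # All temperatures for one city arrive together.
    yield city, max(temps)
```

Calling `run_mapreduce(records, map_city_temp, reduce_max_temp)` on a list of `(station, (city, temperature))` records returns one `(city, maximum)` pair per city, sorted by city name.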




Example: Word Frequencies in Web Pages

     • Input is files with one document per record
     • Specify a map function that takes a key/value pair
       – key = document URL
       – value = document contents
     • Output of map function is (potentially many) key/value
      pairs.
       – In our case, output (word, “1”) once per word in the
          document
                        <“网页1”, “是也不是”>
            (key “网页1” means “web page 1”; the contents split into the words 是, 也, 不, 是)

                              <“是”, “1”>
                              <“也”, “1”>
                              <“不”, “1”>
                              …
Continued: word frequencies in web pages

     • MapReduce library gathers together all pairs with the
      same key (shuffle/sort)
     • The reduce function combines the values for a key
      In our case, compute the sum
     key = “是”               key = “也”         key = “不”
     values = “1”, “1”       values = “1”      values = “1”

           “2”                   “1”               “1”

     • Output of reduce (usually 0 or 1 value) paired with key
      and saved
                             “是”, “2”
                             “也”, “1”
                             “不”, “1”


Example: Pseudo-code


     Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

     Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));
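
A runnable Python rendering of the pseudo-code might look like the sketch below. It makes several assumptions: words are separated by whitespace (so the slide's document is written as "是 也 不 是"), the `word_count` driver stands in for the MapReduce library, and reduce yields the key together with the count rather than the count alone.

```python
from collections import defaultdict


def map_word_count(input_key, input_value):
    # input_key: document name
    # input_value: document contents
    for word in input_value.split():
        yield word, "1"


def reduce_word_count(key, intermediate_values):
    # key: a word
    # intermediate_values: a list of counts, as strings
    result = 0
    for v in intermediate_values:
        result += int(v)
    yield key, str(result)


def word_count(documents):
    """Hypothetical driver: group map output by word, then reduce each group."""
    groups = defaultdict(list)
    for name, contents in documents:
        for word, count in map_word_count(name, contents):
            groups[word].append(count)
    counts = {}
    for word, values in groups.items():
        for k, v in reduce_word_count(word, values):
            counts[k] = v
    return counts
```

For the slide's document, `word_count([("网页1", "是 也 不 是")])` gives "2" for 是 and "1" for 也 and 不, matching the earlier slides.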



Conclusion to MapReduce

     • MapReduce has proven to be a remarkably useful
      abstraction
     • Greatly simplifies large-scale computations at Google
     • Fun to use: focus on problem, let library deal with messy
      details
     • Many thousands of parallel programs written by
      hundreds of different programmers in the last few years
       – Many had no prior parallel or distributed programming
          experience




BigTable




Overview

     • Structured data storage, not a database
     • Wide applicability
     • Scalability
     • High performance
     • High availability




Basic Data Model

     • Distributed multi-dimensional sparse map
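           (row, column, timestamp) → cell contents

     [Figure: rows × columns × timestamps. The row “www.cnn.com” has a
     column “contents” whose cell holds several timestamped versions
     (t1, t2, t3) of the page, e.g. “<html>…” at t3.]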




     • Good match for most of our applications
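
The "sparse map" idea can be sketched with an ordinary dictionary keyed by (row, column, timestamp). This is only an illustration; `SparseTable` and its methods are hypothetical names, and the real system is distributed and persistent.

```python
class SparseTable:
    def __init__(self):
        # (row, column, timestamp) -> cell contents.
        # Cells that were never written take no space: that is the "sparse" part.
        self.cells = {}

    def put(self, row, column, timestamp, contents):
        self.cells[(row, column, timestamp)] = contents

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (latest if None)."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self.cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]
```

Storing two versions of the “contents” cell for “www.cnn.com” at timestamps 1 and 3 and then calling `get` with `timestamp=2` returns the older version; omitting the timestamp returns the newer one.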


BigTable API

     • Metadata operations
       – Create/delete tables, column families, change metadata
     • Writes (atomic)
       – Set(): write cells in a row
       – DeleteCells(): delete cells in a row
       – DeleteRow(): delete all cells in a row
     • Reads
       – Scanner: read arbitrary cells in a bigtable
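
The operations above can be mimicked by a toy in-memory table. This is not the BigTable client API: the Python method names only echo the slide's Set(), DeleteCells(), DeleteRow(), and Scanner, the signatures are invented for illustration, and timestamps are left out. A single lock stands in for atomic row mutations.

```python
import threading


class ToyTable:
    def __init__(self):
        self._rows = {}                # row key -> {column: value}
        self._lock = threading.Lock()  # makes each row mutation atomic

    def set(self, row, cells):
        """Write several cells of one row as a single atomic step."""
        with self._lock:
            self._rows.setdefault(row, {}).update(cells)

    def delete_cells(self, row, columns):
        """Delete the named cells of one row, ignoring missing ones."""
        with self._lock:
            row_cells = self._rows.get(row, {})
            for column in columns:
                row_cells.pop(column, None)

    def delete_row(self, row):
        """Delete all cells in a row."""
        with self._lock:
            self._rows.pop(row, None)

    def scan(self, start_row="", end_row=None, columns=None):
        """Yield (row, column, value) in row-key order for start_row <= row < end_row."""
        with self._lock:
            snapshot = {r: dict(c) for r, c in self._rows.items()}
        for row in sorted(snapshot):
            if row < start_row or (end_row is not None and row >= end_row):
                continue
            for column in sorted(snapshot[row]):
                if columns is None or column in columns:
                    yield row, column, snapshot[row][column]
```

The scanner copies the table under the lock and iterates over the copy, so a concurrent write cannot change a row halfway through a scan.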




System Structure

     • Bigtable cell
       – Bigtable master: performs metadata ops, load balancing
       – Bigtable tablet servers: serve data
     • Bigtable client, linked with the Bigtable client library
       – Open()
     • Underlying services
       – Cluster scheduling master: handles failover, monitoring
       – GFS: holds tablet data, logs
       – Lock service: holds metadata, handles master-election

Current status of BigTable

     • Design/initial implementation started beginning of 2004
     • Currently ~100 BigTable cells
     • Production use or active development for many projects:
        – Google Print
        – My Search History
        – Orkut
        – Crawling/indexing pipeline
        – Google Maps/Google Earth
        – Blogger
        – …
     • Largest bigtable cell manages ~200TB of data spread
       over several thousand machines (larger cells planned)



Typical Cluster



     • Cluster-wide services: lock service, GFS master, scheduling masters
     • Machines 1 … N, each running on Linux:
       – Scheduler slave
       – GFS chunkserver
       – User apps (Machine 1: app1, app2; Machine 2: app1, app3;
         Machine N: app3, app2)




Agenda

     • About Cloud Computing
     • Tools for Cloud Computing in Google
     • Google’s partnerships with universities




ACCI in Oct. 2007

     • Stands for Academic Cloud Computing
      Initiative
     • IBM and Google partnership
     • Helps universities teach distributed
      systems programming skills
     • Started at the University of Washington and
      is scaling to many others




Google’s ACCI activities in Greater China

• Google Greater China helped create a
 cloud computing course at Tsinghua in
 summer 2007
• Now scaling to other universities in
 mainland China and Taiwan
Example: THU MR Course, Fall 2007

• “Massive Data Processing” course based
 on Google Cloud technology
• Google employees gave lectures during
 the course
• The students produced interesting
 results



• http://hpc.cs.tsinghua.edu.cn/dpcourse/
Cont.: THU MR Course, Fall 2007




Students presenting the course project “simulating the operation of
the solar system based on MapReduce technology” at the Google office

Massive data processing to simulate the operation of the solar system
THANK YOU



More info on
        http://code.google.com/intl/zh-CN/

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Infrastructure for Cloud Computing

  • 2. Agenda • About Cloud Computing • Tools for Cloud Computing in Google • Google’s partnerships with universities
  • 4. Advantages • Data safety and reliability • Data synchronization between different devices • Low requirements for end devices • Unlimited potential of the cloud
  • 5. Cloud for end user Google Cloud
  • 6. Cloud for web developer Google Cloud APIs
  • 7. Example: Earthquake map based on Map API
  • 8. Agenda • About Cloud Computing • Tools for Cloud Computing in Google • Google’s partnerships with universities
  • 11. Google Data Center (circa 2000)
  • 12. Google File System (GFS)
  • 13. Why GFS? • Google has unusual requirements • Unfair advantage • Fun and challenging to build large-scale systems
  • 14. GFS Architecture [Diagram: many clients; a GFS master with replica masters; chunkservers 1 … N, each holding chunks (C0, C1, C2, C3, C5) that are replicated across several chunkservers]
  • 15. Master • Maintain Metadata: – File namespace – Access control info – Maps files to chunks • Control system activities: – Monitor state of chunkservers – Chunk allocation and placement – Initiate chunk recovery and rebalancing – Garbage collect dead chunks – Collect and display stats, admin functions
  • 16. Client • Protocol implemented by client library • Read protocol
  • 17. GFS Usage in Google Cloud • 50+ clusters • Filesystem clusters of up to 1000+ machines • Pools of 1000+ clients • 10+ GB/s read/write load – in the presence of frequent hardware failures
  • 19. What’s MapReduce? • A simple programming model that applies to many large-scale computing problems • Hides messy details in the MapReduce runtime library
  • 20. Typical problem solved by MapReduce • Read a lot of data • Map: extract something you care about from each record • Shuffle and Sort • Reduce: aggregate, summarize, filter, or transform • Write the results
  • 21. More specifically… • Programmer specifies two primary methods: – map(k, v) → <k', v'>* – reduce(k', <v'>*) → <k', v'>* • All v' with same k' are reduced together, in order.
  • 22. Example: Word Frequencies in Web Pages • Input is files with one document per record • Specify a map function that takes a key/value pair – key = document URL – value = document contents • Output of map function is (potentially many) key/value pairs. – In our case, output (word, “1”) once per word in the document: <“Web page 1”, “is also not is”> → <“is”, “1”> <“also”, “1”> <“not”, “1”> …
  • 23. Continued: word frequencies in web pages • MapReduce library gathers together all pairs with the same key (shuffle/sort) • The reduce function combines the values for a key. In our case, compute the sum: – key = “is”, values = “1”, “1” → “2” – key = “also”, values = “1” → “1” – key = “not”, values = “1” → “1” • Output of reduce (usually 0 or 1 value) is paired with the key and saved: “is”, “2”; “also”, “1”; “not”, “1”
  • 24. Example: Pseudo-code
      Map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
          EmitIntermediate(w, "1");

      Reduce(String key, Iterator intermediate_values):
        // key: a word, same for input and output
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
          result += ParseInt(v);
        Emit(AsString(result));
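The word-count pseudo-code can be tried out on a single machine with a small Python sketch. This is only an illustration of the programming model: `run_mapreduce` is a hypothetical in-process stand-in for the MapReduce runtime library (it does no distribution, fault tolerance, or I/O), and the function names and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(input_key: str, input_value: str) -> Iterator[Tuple[str, str]]:
    """Emit (word, "1") once per whitespace-separated word in the document."""
    for word in input_value.split():
        yield word, "1"


def reduce_fn(key: str, intermediate_values: Iterable[str]) -> Iterator[str]:
    """Sum the string-encoded counts collected for one word."""
    yield str(sum(int(v) for v in intermediate_values))


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, str]]],
    reducer: Callable[[str, Iterable[str]], Iterator[str]],
) -> List[Tuple[str, str]]:
    """Hypothetical single-process runtime: map, then shuffle/sort, then reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    # Map phase: run the mapper on every input record.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle: gather all values that share an intermediate key.
            groups[out_key].append(out_value)
    output: List[Tuple[str, str]] = []
    # Sort + reduce phase: visit keys in sorted order, one reduce call per key.
    for out_key in sorted(groups):
        for result in reducer(out_key, groups[out_key]):
            output.append((out_key, result))
    return output


counts = run_mapreduce([("page1", "is also not is")], map_fn, reduce_fn)
print(counts)
```

Only `map_fn` and `reduce_fn` contain problem-specific logic; everything else lives in the stand-in runtime, which is the split of responsibilities the programming model relies on.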
  • 25. Conclusion to MapReduce • MapReduce has proven to be a remarkably useful abstraction • Greatly simplifies large-scale computations at Google • Fun to use: focus on problem, let library deal with messy details • Many thousands of parallel programs written by hundreds of different programmers in the last few years – Many had no prior parallel or distributed programming experience
  • 27. Overview • Structured data storage, not a database • Wide applicability • Scalability • High performance • High availability
  • 28. Basic Data Model • Distributed multi-dimensional sparse map: (row, column, timestamp) → cell contents [Figure: row “www.cnn.com”, column “contents”, with versions of the cell at timestamps t1, t2, t3, e.g. “<html>…”] • Good match for most of our applications
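The (row, column, timestamp) → cell contents mapping can be pictured with a small in-memory sketch. It is an illustration only, under the assumption that a plain Python dictionary is a fair stand-in for the sparse map; the class name `SparseTable` and its methods are hypothetical, and nothing here reflects how BigTable distributes or stores data.

```python
from typing import Dict, Optional, Tuple


class SparseTable:
    """Hypothetical in-memory model of a sparse (row, column, timestamp) map."""

    def __init__(self) -> None:
        # Only cells that were actually written take up space (sparse).
        self._cells: Dict[Tuple[str, str, int], str] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        """Store one version of a cell."""
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[str]:
        """Return the cell at an exact timestamp, or the newest version if none is given."""
        versions = {
            ts: value
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        }
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        return versions.get(timestamp)


table = SparseTable()
table.put("www.cnn.com", "contents", 1, "<html>version 1")
table.put("www.cnn.com", "contents", 3, "<html>version 3")
print(table.get("www.cnn.com", "contents"))      # newest version
print(table.get("www.cnn.com", "contents", 1))   # version at timestamp 1
```

Keeping the timestamp in the key is what lets several versions of the same cell coexist.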
  • 29. BigTable API • Metadata operations – Create/delete tables, column families, change metadata • Writes (atomic) – Set(): write cells in a row – DeleteCells(): delete cells in a row – DeleteRow(): delete all cells in a row • Reads – Scanner: read arbitrary cells in a bigtable
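A rough sketch of how atomic per-row writes and a scanner might fit together is given below. It is not the BigTable client API: `TinyTable`, `RowMutation`, and every method name are hypothetical, timestamps are left out, and a single lock stands in for whatever mechanism makes row writes atomic in the real system.

```python
import threading
from typing import Dict, Iterator, List, Optional, Tuple


class RowMutation:
    """Hypothetical batch of write operations that all target one row."""

    def __init__(self, row_key: str) -> None:
        self.row_key = row_key
        self.ops: List[Tuple[str, str, str]] = []

    def set(self, column: str, value: str) -> None:
        self.ops.append(("set", column, value))

    def delete_cells(self, column: str) -> None:
        self.ops.append(("delete_cells", column, ""))

    def delete_row(self) -> None:
        self.ops.append(("delete_row", "", ""))


class TinyTable:
    """Hypothetical in-memory table: row key -> {column -> value}."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._lock = threading.Lock()

    def apply(self, mutation: RowMutation) -> None:
        """Apply every queued operation for the row as one atomic step."""
        with self._lock:
            row = dict(self._rows.get(mutation.row_key, {}))
            for op, column, value in mutation.ops:
                if op == "set":
                    row[column] = value
                elif op == "delete_cells":
                    row.pop(column, None)
                elif op == "delete_row":
                    row.clear()
            # Readers see either the old row or the fully updated row.
            if row:
                self._rows[mutation.row_key] = row
            else:
                self._rows.pop(mutation.row_key, None)

    def scan(self, start_row: str = "", end_row: Optional[str] = None) -> Iterator[Tuple[str, str, str]]:
        """Yield (row, column, value) for rows in [start_row, end_row), in key order."""
        with self._lock:
            snapshot = {key: dict(cols) for key, cols in self._rows.items()}
        for row_key in sorted(snapshot):
            if row_key >= start_row and (end_row is None or row_key < end_row):
                for column in sorted(snapshot[row_key]):
                    yield row_key, column, snapshot[row_key][column]


demo = TinyTable()
write = RowMutation("www.cnn.com")
write.set("contents", "<html>…")
write.set("language", "EN")
demo.apply(write)
print(list(demo.scan()))
```

Batching the operations in a `RowMutation` and applying them under one lock means a concurrent scan never observes a half-updated row.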
  • 30. System Structure [Diagram of a Bigtable cell: a Bigtable client uses the Bigtable client library (Open()); the Bigtable master performs metadata ops and load balancing; several Bigtable tablet servers serve data. Underneath: the cluster scheduling master handles failover and monitoring; GFS holds tablet data and logs; the lock service holds metadata and handles master election]
  • 31. Current status of BigTable • Design/initial implementation started at the beginning of 2004 • Currently ~100 BigTable cells • Production use or active development for many projects: – Google Print – My Search History – Orkut – Crawling/indexing pipeline – Google Maps/Google Earth – Blogger – … • Largest bigtable cell manages ~200TB of data spread over several thousand machines (larger cells planned)
  • 32. Typical Cluster [Diagram: cluster-wide lock service, GFS master, and scheduling masters; each of machines 1 … N runs several user apps (app1, app2, app3) alongside a scheduler slave and a GFS chunkserver, all on Linux]
  • 33. Agenda • About Cloud Computing • Tools for Cloud Computing in Google • Google’s partnerships with universities
  • 34. ACCI in Oct. 2007 • Stands for Academic Cloud Computing Initiative • IBM and Google partnership • Helps universities teach distributed systems programming skills • Started at the University of Washington and is scaling to many others
  • 35. Google’s ACCI activities in Greater China • Google Greater China helped create a cloud computing course at Tsinghua University in summer 2007 • Now scaling to other universities in mainland China and Taiwan
  • 36. Example: THU MR Course, Fall 2007 • “Massive Data Processing” course based on Google Cloud technology • Google employees gave lectures during the course • Got interesting results from the smart students • http://hpc.cs.tsinghua.edu.cn/dpcourse/
  • 37. Cont.: THU MR Course, Fall 2007 • Students presenting the course project “Simulating the operation of the solar system based on MapReduce technology” at the Google office • Massive data processing to simulate the operation of the solar system
  • 38. THANK YOU • More info at http://code.google.com/intl/zh-CN/