This document discusses infrastructure for cloud computing and Google's tools. It describes Google's MapReduce and BigTable frameworks, which were developed for large-scale data processing and storage. It also outlines Google's Academic Cloud Computing Initiative (ACCI) partnership with universities to provide cloud computing education and skills. ACCI has helped create cloud computing courses at schools like Tsinghua University in China.
4. Advantages
• Data safety and reliability
• Data synchronization between different devices
• Low requirements for end devices
• Unlimited potential of the cloud
15. Master
• Maintain Metadata:
– File namespace
– Access control info
– Maps files to chunks
• Control system activities:
– Monitor state of chunkservers
– Chunk allocation and placement
– Initiate chunk recovery and rebalancing
– Garbage collect dead chunks
– Collect and display stats, admin functions
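The metadata duties listed above can be pictured with a small in-memory sketch. This is an illustration only, not Google's implementation: `ToyMaster`, its method names, and the chunkserver names are hypothetical, and the 64 MB chunk size is taken from the GFS paper rather than from these slides.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given in the GFS paper


@dataclass
class ToyMaster:
    """Hypothetical in-memory stand-in for the master's metadata tables."""
    file_chunks: dict[str, list[int]] = field(default_factory=dict)     # file path -> ordered chunk handles
    chunk_locations: dict[int, set[str]] = field(default_factory=dict)  # chunk handle -> chunkserver names
    next_handle: int = 0

    def create(self, path: str) -> None:
        # File namespace: register an empty file.
        self.file_chunks[path] = []

    def allocate_chunk(self, path: str, servers: set[str]) -> int:
        # Chunk allocation and placement: append a new chunk to the file.
        handle = self.next_handle
        self.next_handle += 1
        self.file_chunks[path].append(handle)
        self.chunk_locations[handle] = set(servers)
        return handle

    def lookup(self, path: str, offset: int) -> tuple[int, set[str]]:
        # Maps files to chunks: translate a byte offset into a chunk handle.
        handle = self.file_chunks[path][offset // CHUNK_SIZE]
        return handle, self.chunk_locations[handle]

    def server_failed(self, server: str) -> list[int]:
        # Monitoring: drop a dead chunkserver, return chunks that lost a replica.
        affected = []
        for handle, servers in self.chunk_locations.items():
            if server in servers:
                servers.discard(server)
                affected.append(handle)
        return affected


master = ToyMaster()
master.create("/logs/web.log")
master.allocate_chunk("/logs/web.log", {"cs1", "cs2", "cs3"})
master.allocate_chunk("/logs/web.log", {"cs2", "cs3", "cs4"})
handle, replicas = master.lookup("/logs/web.log", 70 * 1024 * 1024)
```

The handles returned by `server_failed` are the ones a real master would queue for re-replication, which corresponds to the "initiate chunk recovery" bullet.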
17. GFS Usage in Google Cloud
• 50+ clusters
• Filesystem clusters of up to 1000+ machines
• Pools of 1000+ clients
• 10+ GB/s read/write load
– in the presence of frequent hardware failures
19. What’s MapReduce
• A simple programming model that applies to many large-scale computing problems
• Hides messy details in the MapReduce runtime library
20. Typical problem solved by MapReduce
• Read a lot of data
• Map: extract something you care about from each record
• Shuffle and Sort
• Reduce: aggregate, summarize, filter, or transform
• Write the results
21. More specifically…
• Programmer specifies two primary methods:
– map(k, v) → <k', v'>*
– reduce(k', <v'>*) → <k', v'>*
• All v' with same k' are reduced together, in order.
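The two signatures, together with the read / map / shuffle-and-sort / reduce / write steps, can be sketched as a single-process driver. This is a minimal illustration under stated assumptions, not the MapReduce library itself: `run_mapreduce` and the sensor-reading job are hypothetical, and a real runtime distributes these phases across many machines.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle/sort, and reduce in one process.

    records:   iterable of (k, v) input pairs
    map_fn:    (k, v)     -> iterable of (k2, v2) pairs
    reduce_fn: (k2, [v2]) -> iterable of (k2, v3) pairs
    """
    # Map: every input record may emit zero or more intermediate pairs.
    groups = defaultdict(list)
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)  # Shuffle: collect values by intermediate key.

    # Sort: visit intermediate keys in order, then reduce each group.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


# Hypothetical job: highest reading per sensor.
def map_reading(line_no, line):
    sensor, value = line.split(",")
    yield sensor, float(value)


def reduce_max(sensor, values):
    yield sensor, max(values)


readings = [(0, "a,1.5"), (1, "b,0.2"), (2, "a,3.0")]
result = run_mapreduce(readings, map_reading, reduce_max)
print(result)  # [('a', 3.0), ('b', 0.2)]
```

The programmer writes only `map_reading` and `reduce_max`; grouping and ordering stay inside the driver, which is the division of labour the slides describe.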
22. Example: Word Frequencies in Web Pages
• Input is files with one document per record
• Specify a map function that takes a key/value pair
– key = document URL
– value = document contents
• Output of map function is (potentially many) key/value pairs.
– In our case, output (word, “1”) once per word in the document
Input: <“网页1”, “是也不是”>
Output:
<“是”, “1”>
<“也”, “1”>
<“不”, “1”>
…
23. Continued: word frequencies in web pages
• MapReduce library gathers together all pairs with the same key (shuffle/sort)
• The reduce function combines the values for a key
– In our case, compute the sum
key = “是”: values = “1”, “1” → “2”
key = “也”: values = “1” → “1”
key = “不”: values = “1” → “1”
• Output of reduce (usually 0 or 1 value) paired with key and saved
“是”, “2”
“也”, “1”
“不”, “1”
24. Example: Pseudo-code
Map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
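For readers who want to run the pseudo-code, here is a rough Python rendering applied to the document from the word-frequency example. It is a sketch under assumptions: `word_frequencies` is a hypothetical stand-in for the library's shuffle/sort step, and treating every non-space character as one word only suits the slide's Chinese sample, so real text would need a proper tokenizer.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    # input_key: document name; input_value: document contents.
    # Assumption: each non-space character counts as one "word".
    for w in input_value:
        if not w.isspace():
            yield w, "1"


def reduce_fn(key, intermediate_values):
    # key: a word; intermediate_values: the counts emitted for that word.
    result = 0
    for v in intermediate_values:
        result += int(v)
    return key, str(result)


def word_frequencies(documents):
    # Stands in for the library: group intermediate values by key, then reduce.
    grouped = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            grouped[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


counts = word_frequencies({"网页1": "是也不是"})
print(counts)  # {'是': '2', '也': '1', '不': '1'}
```

Counts stay strings, as in the pseudo-code, where `Emit` receives `AsString(result)`.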
25. Conclusion to MapReduce
• MapReduce has proven to be a remarkably useful abstraction
• Greatly simplifies large-scale computations at Google
• Fun to use: focus on problem, let library deal with messy details
• Many thousands of parallel programs written by hundreds of different programmers in the last few years
– Many had no prior parallel or distributed programming experience
27. Overview
• Structured data storage, not a database
• Wide applicability
• Scalability
• High performance
• High availability
28. Basic Data Model
• Distributed multi-dimensional sparse map
(row, column, timestamp) → cell contents
[Figure: ROWS (e.g. “www.cnn.com”) by COLUMNS (e.g. “contents”); the cell at that row and column holds several versions of “<html>…”, one per TIMESTAMP t1, t2, t3]
• Good match for most of our applications
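The (row, column, timestamp) mapping can be mimicked with an ordinary dictionary. This is a minimal sketch, not how BigTable stores data: `put` and `get_latest` are hypothetical helpers, and the integer timestamps 1, 2, 3 stand in for t1, t2, t3.

```python
# Sparse map: only cells that were actually written take up space.
table = {}  # (row, column, timestamp) -> cell contents


def put(row, column, timestamp, contents):
    table[(row, column, timestamp)] = contents


def get_latest(row, column):
    """Return the contents with the highest timestamp, or None if the cell is empty."""
    versions = [(ts, val) for (r, c, ts), val in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None


put("www.cnn.com", "contents", 1, "<html>v1")
put("www.cnn.com", "contents", 3, "<html>v3")
put("www.cnn.com", "contents", 2, "<html>v2")

latest = get_latest("www.cnn.com", "contents")
print(latest)  # <html>v3
```

Because absent cells have no entry at all, a table with many possible columns and few populated ones stays small, which is what "sparse" refers to here.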
29. BigTable API
• Metadata operations
– Create/delete tables, column families, change metadata
• Writes (atomic)
– Set(): write cells in a row
– DeleteCells(): delete cells in a row
– DeleteRow(): delete all cells in a row
• Reads
– Scanner: read arbitrary cells in a bigtable
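To make the shape of these operations concrete, here is a toy in-memory table. It is an assumption-laden sketch, not the real client API: `ToyTable` and its snake_case methods merely echo the slide's Set(), DeleteCells(), DeleteRow() and Scanner, the "language" column is made up, and timestamps are left out for brevity.

```python
class ToyTable:
    """Hypothetical in-memory table; names mirror the slide, not a real client API."""

    def __init__(self):
        self.rows = {}  # row key -> {column: value}

    def set(self, row, cells):
        """Write several cells of one row in a single step."""
        self.rows.setdefault(row, {}).update(cells)

    def delete_cells(self, row, columns):
        """Delete the named cells of one row; missing cells are ignored."""
        for column in columns:
            self.rows.get(row, {}).pop(column, None)

    def delete_row(self, row):
        """Delete every cell of one row."""
        self.rows.pop(row, None)

    def scan(self, start_row="", end_row=None, columns=None):
        """Yield (row, column, value) for rows in [start_row, end_row), in row order."""
        for row in sorted(self.rows):
            if row < start_row or (end_row is not None and row >= end_row):
                continue
            for column, value in sorted(self.rows[row].items()):
                if columns is None or column in columns:
                    yield row, column, value


t = ToyTable()
t.set("www.cnn.com", {"contents": "<html>a", "language": "EN"})
t.set("www.example.com", {"contents": "<html>b"})
t.delete_cells("www.cnn.com", ["language"])
found = list(t.scan(columns={"contents"}))
```

Each write touches exactly one row, which is the unit the slide marks as atomic; a single-threaded toy gets that for free, whereas a real server has to enforce it.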
30. System Structure
[Figure: system structure of a Bigtable cell]
• Bigtable client, using the Bigtable client library: Open()
• Bigtable master: performs metadata ops, load balancing
• Bigtable tablet servers: serve data
• Cluster scheduling master: handles failover, monitoring
• GFS: holds tablet data, logs
• Lock service: holds metadata, handles master-election
31. Current status of BigTable
• Design/initial implementation started at the beginning of 2004
• Currently ~100 BigTable cells
• Production use or active development for many projects:
– Google Print
– My Search History
– Orkut
– Crawling/indexing pipeline
– Google Maps/Google Earth
– Blogger
– …
• Largest bigtable cell manages ~200TB of data spread over several thousand machines (larger cells planned)
32. Typical Cluster
[Figure: typical cluster]
• Cluster-wide services: lock service, GFS master, scheduling masters
• Machine 1, Machine 2, …, Machine N: each runs user apps (app1, app2, app3) alongside a scheduler slave and a GFS chunkserver, all on Linux
33. Agenda
• About Cloud Computing
• Tools for Cloud Computing in Google
• Google’s partnerships with universities
34. ACCI in Oct. 2007
• Stands for Academic Cloud Computing Initiative
• IBM and Google partnership
• Helps universities teach distributed-systems programming skills
• Started at the University of Washington and is scaling to many others
35. Google’s ACCI activities in Greater China
• Google Greater China helped create a cloud computing course at Tsinghua in summer 2007
• Now scaling to other universities in mainland China and Taiwan
36. Example: THU MR Course, Fall 2007
• “Massive Data Processing” course based on Google Cloud technology
• Google employees gave lectures during the course
• Got interesting results from the smart students
• http://hpc.cs.tsinghua.edu.cn/dpcourse/
37. Continued: THU MR Course, Fall 2007
[Photo: students presenting the course project “simulating the operation of solar system based on MapReduce technology” at Google office]
[Photo: massive data processing to simulate the operation of the solar system]