Google technology


  1. 1. Chapter Three: Google Technology

“Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results.... Fast crawling technology is needed to gather the Web documents and keep them up to date. Storage space must be used efficiently to store indices and, optionally, the documents themselves. The indexing system must process hundreds of gigabytes of data efficiently. Queries must be handled quickly, at the rate of hundreds to thousands per second.”
– Sergey Brin and Lawrence Page, 1997 [1]

In the beginning, there was BackRub, the service that became Google. Today, Google is most closely associated with its PageRank algorithm. PageRank is a voting algorithm weighted for importance. The indicator of a Web page’s importance is the number of pages that link to it.

Messrs. Brin and Page soon added another factor that voted for the importance of a Web page: the number of people who click on it. The more clicks on a Web page, the more weight that page was given. Over time, still other factors have been added to the PageRank algorithm; for example, the frequency with which content on a page is changed.

Google’s PageRank technology is closely allied with Internet search. Voting algorithms are less effective in enterprise search, for instance. The attention given to Google and its search technology dominates popular thinking about the company. Google search is like a nova. The luminescence makes it difficult for the observer to see other aspects of the phenomenon clearly or easily.

[1] From “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” www.-

The Google Legacy

  2. 2.

Radiance aside, Google is a technology company. [2] Some of that technology, when described in technical papers such as the earliest one, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” is demanding. Later papers such as “MapReduce: Simplified Data Processing on Large Clusters” can be a slow read. [3] Since Google is technology, explaining what Google does in an easily digestible meal is difficult. The diagram below provides an unauthorized snapshot of Google’s computing framework.

[Figure: Google’s computing framework, with regions labeled a, b, c and d]

Important Google technologies that underlie this diagram of the Googleplex include:
[a] modifications to Linux to permit large file sizes and other functions so as to accelerate the overall system;
[b] a distributed architecture that allows applications and scaling to be “plugged in” without the type of hands-on set-up other operating systems require;
[c] a technical architecture that is similar at every level of scale;
[d] a Web-centric architecture that allows new types of applications to be built without a programming language limitation.

[2] The annex to this monograph contains a listing of more than 60 Google patents. The list is not all-inclusive; however, it does provide the patent number and a brief description for some of Google’s most important patents. The PageRank patent belongs to the trustees of Stanford University. Google’s patent efforts have focused on systems and methods for relevance, advertising, and other core foci of the company. Google is creating a patent fence to protect its interests.

[3] Jeff Dean, former Alta Vista researcher and a Google senior engineer, has been an advocate of MapReduce. His most recent papers are available on his Web page at http://
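The idea of PageRank as a voting algorithm weighted for importance can be made concrete with a small example. The sketch below is a minimal power-iteration version of the textbook formulation, not Google’s production system: the function name `pagerank`, the damping value of 0.85, the iteration count and the four-page toy Web are all assumptions chosen for illustration. It uses links only and ignores the click and freshness factors mentioned in the text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by inbound links; `links` maps a page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: three pages link to "c", and "c" links back to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In this toy Web, page "c" ends up with the highest rank because three pages vote for it, and page "a" outranks "b" and "d" because its single inbound link comes from the highly ranked "c".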
  3. 3.

Google’s technology has emerged from a series of continuous improvements, or what Japanese management consultants call kaizen. Each Google technical change may be inconsequential to the average user of Google. But when taken as a whole, Google’s “technological advantage” comes from Google’s incremental innovations, clever adaptations of research-computing concepts, and Byzantine tweaks to Linux. Some day, a historian of technology will be able to identify, from the hundreds of improvements that Google has engineered in the last nine years, one or two that stand with PageRank as of major importance. Critics of Google will see that the company has grafted to its core technology processes from many different sources.

To illustrate, the structure of Google’s data centers and the messages passed to and from these data centers are in many ways a variant of grid computing. [4] Google’s ability to read data from many computers simultaneously is reminiscent of BitTorrent’s technology. [5] Google’s use of commodity or “white box” hardware in its data centers is an indication of Google’s hacker ethos. The use of memory and discs to store multiple copies of data comes from the frontiers of computing.

Google’s approach to technology, then, is eclectic and in many ways represents a building block approach to large-scale systems. Google benefits from that eclecticism in several ways. First, Google’s computational framework delivers sizzling performance from low-cost hardware. Second, Google worked around the bottlenecks of such operating systems as Solaris, Windows Advanced Server, and off-the-shelf Linux. Third, Google took good programming ideas from other languages, implementing new functions and libraries to eliminate most of the manual coding required to parallelise an application across Google’s servers. [6]

According to Jeff Dean, one of Google’s senior engineers, “Google engineering is sort of chaotic.” [7] This is neither surprising nor necessarily a negative. The Googleplex is a toy box for engineers and programmers. The tools are sophisticated. The challenges of the problems and peers make Google “the place to be” for the best and brightest technical talent in the world. The nature of creativity combined with Google’s approach to innovation makes it difficult to predict the next big thing from Google.

Before reviewing selected parts of Google’s technology in somewhat more detail, the diagram “Google’s Computing Framework” provides an overview of the Googleplex and some of its technologies. These will be touched upon in this section.

[4] Grid computing is applying resources from many computers in a network to a single problem or application. Google uses grid-like technology in its distributed computing system.

[5] BitTorrent is a peer-to-peer file distribution tool written by programmer Bram Cohen in 2001. The reference implementation is written in Python and is released under the MIT License.

[6] Google has anywhere from 100,000 to 165,000 or more servers. Servers are organized into clusters. Clusters may reside within one rack or across multiple racks of servers. Some Google functions are distributed across data centers.

[7] From Dr Dean’s speech at the University of Washington in October 2003. See http://
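The point about libraries that remove most of the manual coding needed to parallelise an application can be illustrated in the map-and-reduce style named in the MapReduce paper’s title. The sketch below runs in a single process and only shows the division of labour: the programmer writes a map function and a reduce function, and a generic driver does the grouping. The names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and nothing here reflects how Google’s own library distributes the work across servers.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Toy driver: apply the mapper, group values by key, then reduce each group.

    A real system would spread the map and reduce calls over many machines;
    this loop stands in for that machinery.
    """
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical two-document corpus.
docs = {"page1": "google search engine", "page2": "search engine index"}
counts = run_mapreduce(docs, map_words, reduce_counts)
```

Because the application code consists only of the two small functions, the same program could in principle be run on one machine or many without being rewritten, which is the benefit the text attributes to Google’s libraries.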
  4. 4.

PageRank requires a lot of computing horsepower to work. When Google got underway in 1996, Messrs. Brin and Page had limited computing horsepower. In order to make PageRank work, they had to figure out how to get the PageRank algorithm to run on the garden-variety computers available to them.

From the beginning – and this is an important issue with regard to Google’s almost-certain collision course with Microsoft – Google had to solve both software engineering and hardware engineering issues to make Google Search viable. In fact, when discussing Google technology, it is important to keep in mind that PageRank is important only because it can run quickly in the real world, not in a sterile computer lab illuminated with the blue glow of supercomputers.

The figure Google’s Fusion: Hardware and Software Engineering shows that Google’s technology framework has two areas of activity. There is the software engineering effort that focuses on PageRank and other applications. Software engineering, as used here, means writing code and thinking about how computer systems operate in order to get work done quickly. Quickly means the sub one-second response times that Google is able to maintain despite its surging growth in usage, applications and data processing.

[Figure: Google’s Fusion: Hardware and Software Innovations. The Google phenomenon comes from the fission occurring when PageRank’s software and hardware engineering interact. Google’s technology delivers super computer applications for mass markets.]

The other effort focuses on hardware. Google has refined server racks, cable placement, cooling devices, and data center layout. The payoff is lower operating costs and the ability to scale as demand for computing resources increases. With faster turnaround and the elimination of such troublesome jobs as backing up data, Google’s hardware innovations give it a competitive advantage few of its rivals can equal as of mid-2005.

  5. 5.

PageRank, with its layering of additional computations added over the years, is a software problem of considerable difficulty. The Google system must find Web pages and perform dozens, if not hundreds, of analyses of those Web pages. Consider the links pointing to a Web page. Google must keep track of them for more than eight billion Web pages. For a single Web page with one link pointing to it, the problem is trivial. One link equals one pointer. But what happens when a site has 10,000 links pointing to it? The problem becomes many times larger and more computationally demanding. Some of these links are likely to come from sites that have more traffic than others. Some of the links may come from sites that have spoofed Google for fun or profit. The calculations to sort out the “value” of each of these links add to the computational work associated with PageRank. Keeping track of these factors is a big job. Sizing up different factors against one another for a single page can be hard without a calculator to help. Take the same task and apply it to a couple of billion Web pages, and the computing task becomes one for a supercomputer.

Yet this task is everyday stuff for Google and its PageRank process. Users do not give much thought to what technology underpins a routine query or the 300 million queries Google handles each day. In a single second, Google’s technology handles around 3,500 queries on average, in dozens of languages from users worldwide.

Google’s technology cannot be separated from search. Search was the prime mover in the Google universe. Once Messrs. Brin and Page were able to fiddle with a limited number of commodity computers and make their PageRank algorithm work, Google was headed down a road that it still follows.

The software requires a suitable hardware and network infrastructure in which to operate. Without Google’s hardware and software, there would be no Google. Hardware and software are inextricably linked at Google. With each new advance in software, Google’s engineers must make correspondingly significant advances in hardware. And when hardware engineers come up with an advance, the software engineers greedily use that advance to up the functionality of their software.

What Google owns is its own snappy, turbocharged supercomputer, interesting software tools, and several thousand people trying to figure out what else the Googleplex can do. Some of the tinkerers come at the problem from bits and bytes, writing code, and weaving applications out of the available functions. The result is a brilliant product.

Others come at the problem from the soldering iron and screwdriver angle. These engineers look for ways to build hardware and physical systems that can perform the calculations needed to make PageRank work. Google’s approach to data centers, the racks in the data centers, and the devices in the racks in the data centers is as clever as the company’s search system. The hardware has to be more than clever. The hardware has to work 24x7, under continuous load, and in locations from Switzerland to Beijing. The synergy between software and hardware is perhaps one of Google’s major accomplishments.
  6. 6.

How Google Is Different from MSN and Yahoo

Google’s technology is simultaneously just like other online companies’ technology, and very different. A data center is usually a facility owned and operated by a third party where customers place their servers. The staff of the data center manage the power, air conditioning and routine maintenance. The customer specifies the computers and components. When a data center must expand, the staff of the facility may handle virtually all routine chores and may work with the customer’s engineers for certain more specialized tasks.

Before looking at some significant engineering differences between Google and two of its major competitors, review this list of characteristics for a Google data center.

1. Google data centers now number about two dozen, although no one outside Google knows the exact number or their locations. They come online and automatically, under the direction of the Google File System, start getting work from other data centers. These facilities, sometimes filled with 10,000 or more Google computers, find one another and configure themselves with minimal human intervention.
2. The hardware in a Google data center can be bought at a local computer store. Google uses the same types of memory, disc drives, fans and power supplies as those in a standard desktop PC.
3. Each Google server comes in a standard case called a pizza box with one important change: the plugs and ports are at the front of the box to make access faster and easier.
4. Google racks are assembled for Google to hold servers on their front and back sides. This effectively allows a standard rack, normally holding 40 pizza box servers, to hold 80.
5. A Google data center can go from a stack of parts to online operation in as little as 72 hours, unlike more typical data centers that can require a week or even a month to get additional resources online.
6. Each server, rack and data center works in a way that is similar to what is called “plug and play.” Like a mouse plugged into the USB port on a laptop, Google’s network of data centers knows when more resources have been connected. These resources, for the most part, go into operation without human intervention.

Several of these factors are dependent on software. This overlap between the hardware and software competencies at Google, as previously noted, illustrates the symbiotic relationship between these two different engineering approaches. At Google, from its inception, Google software and Google hardware have been tightly coupled. Google is not a software company, nor is it a hardware company. Google is, like IBM, a company that owes its existence to both hardware and software. Unlike IBM, Google has a business model that is advertiser supported. Technically, Google is conceptually closer to IBM (at one time a hardware and software company) than it is to Microsoft (primarily a software company) or Yahoo! (an integrator of multiple softwares).
  7. 7.

Software and hardware engineering cannot be easily segregated at Google. At MSN and Yahoo, hardware and software are more loosely coupled. Two examples will illustrate these differences.

Microsoft – with some minor excursions into the Xbox game machine and peripherals – develops operating systems and traditional applications. Microsoft has multiple operating systems, and its engineers are hard at work on the company’s next generation of operating systems. Microsoft does not design or make its own hardware. Its operating systems are coded, for example, for processors that evolved from the Intel chips for personal computers. Recently Microsoft embarked on a new path with its game machine, the Xbox 360. The new Xbox uses a processor from IBM’s family of PowerPC chips also used in the Macintosh computer, the Sony PS/3, and Nintendo next-generation game machines. Microsoft’s applications run on Microsoft operating systems, although versions of Microsoft Office and Internet Explorer run on Apple’s Macintosh.

In addition, Microsoft buys hardware from various suppliers to run its online systems. Most of these suppliers, not surprisingly, are certified by Microsoft. Examples include Microsoft’s use of Dell Computers. Microsoft’s engineers use these machines in configurations required by the Microsoft operating systems and applications. For example, Microsoft servers often require a load balancing feature. Microsoft implements its load balancing via software. When more performance is required, Microsoft upgrades the hardware, adds memory, or shifts to higher-speed hard drive technology instead of recoding the operating system itself to deliver higher performance as Google does. Once a function is released to customers, Microsoft’s engineers focus on stamping out bugs. Re-engineering a software application for higher performance is not typically a priority.

Several observations are warranted:

1. Unlike Google, Microsoft does not focus on performance as an end in itself. As a result, Microsoft gets performance the way most computer users do. Microsoft buys or upgrades machines. Microsoft does not fiddle with its operating systems and their subfunctions to get that extra time slice or two out of the hardware.
2. Unlike Google, Microsoft has to support many operating systems and invest time and energy in making certain that important legacy applications such as Microsoft Office or SQLServer can run on these new operating systems. Microsoft has a boat anchor tied to its engineers’ ankles. The boat anchor is the need to ensure that legacy code works in Microsoft’s latest and greatest operating systems.
3. Unlike Google, Microsoft has no significant track record in designing and building hardware for distributed, massively parallelised computing. The mice and keyboards were a success. Microsoft has continued to lose money on the Xbox, and the sudden demise of Microsoft’s entry into the home network hardware market provides more evidence that Microsoft does not have a hardware competency equal to Google’s.
  8. 8.

In terms of technology, Google has the hardware and software engineering expertise to build applications rapidly, perform computationally intensive applications quickly, and deliver high-reliability services from low-cost, commodity hardware.

Yahoo! operates differently from both Google and Microsoft. Yahoo! is in mid-2005 a direct competitor to Google for advertising dollars. Yahoo! has grown through acquisitions. In search, for example, Yahoo acquired to handle Chinese language search and retrieval. Yahoo bought Inktomi to provide Web search. Yahoo bought Stata Labs in order to provide users with search and retrieval of their Yahoo! mail. Yahoo! also, a Web search site created by FAST Search & Transfer. Yahoo! owns the Overture search technology used by advertisers to locate key words to bid on. Yahoo! owns Alta Vista, the Web search system developed by Digital Equipment Corp. Yahoo! licenses InQuira search for customer support functions. Yahoo has a jumble of search technology; Google has one search technology.

Historically Yahoo has acquired technology companies and allowed each company to operate its technology in a silo. Integration of these different technologies is a time-consuming, expensive activity for Yahoo. Each of these software applications requires servers and systems particular to each technology. The result is that Yahoo has a mosaic of operating systems, hardware and systems. Yahoo!’s problem is different from Microsoft’s legacy boat-anchor problem. Yahoo! faces a Balkan-states problem.

There are many voices, many needs, and many opposing interests. Yahoo! must invest in management resources to keep the peace. Yahoo! does not have a core competency in hardware engineering for performance and consistency. Yahoo! may well have considerable competency in supporting a crazy-quilt of hardware and operating systems, however. Yahoo! is not a software engineering company. Its engineers make functions from disparate systems available via a portal.

Google also acquires technology. A good example is Picasa. The photo management software runs on the user’s Windows PC. The program has been integrated with several of Google’s network-centric applications:

1. Gmail. The user’s images can be uploaded and sent via email to friends, colleagues and family. A Picasa user without a Gmail account is able to register and receive a user name and password. The Gmail account can also be used, if the user wishes, for other Google services, including Fusion, which is Google’s personalized portal, and the search history function, which saves a registered user’s Google queries for later reference.
2. Blog Publishing. The user can post pictures to a Google property, The image publishing function is simplified to one or two clicks. Posting images on some Web log systems is beyond the expertise of many computer users.
3. Image Printing. The user can send images to online photo processing services.
  9. 9.

[Figure: the Picasa interface. Callouts: one-click access to functions performed on the user’s local computer; recently viewed images; one-click access to network services available as part of the user’s virtual application.]

In sharp contrast to Yahoo’s approach, Google integrated the Picasa application into the Googleplex. The “hooks” are painless to the user. [8] Google has bundled into one free application point-and-click solutions to make management of digital still images intuitive and fluid. Yahoo!’s acquisitions, in general, are not woven into a seamless experience with other Yahoo! services. Consider the search system. That service remains a separate Chinese language operation available from mostly non-English Yahoo pages. Google constructs an application using some code on the user’s PC and other software running on the Googleplex somewhere on the Internet.

These three companies, different in structure and technical focus, are on a collision course. Like vessels in America’s Cup, each is going toward the same goal, but subject to forces difficult for their helmsmen to control. Even though there is market space between the three, collisions are inevitable.

[8] Picasa requires a download. The installation process is smooth. Indexing speed was about five times faster than ACDSee’s image management program, a competitive product. With Picasa, Google’s technologists demonstrate a rapid, trouble-free installation and an intuitive interface.

  10. 10.

The figure below provides an overview of the mid-2005 technical orientation of Google, Microsoft and Yahoo.

MSN, and by extension Microsoft Corporation, has a core competency in software. The company has grown from its operating system roots to provide a range of products for mobile devices, desktop and notebook computers, and enterprise-class servers. Looking forward, the company’s Dot Net technology is Microsoft’s framework for virtual applications. In some ways, Dot Net is a less-open version of the AJAX technology that Google uses in the Google Maps and Gmail products. Microsoft has expended great effort to push Windows downward to mobile devices and outward to network-centric computers in an effort to increase revenue. For Microsoft to continue to be the dominant force in software in the future, the company must be able to capture a commanding share of the market for network-centric applications. However, Microsoft’s weakness (whether real or perceived) is its products’ vulnerability to security breaches. Patch after patch, problem after problem, then promise after promise have done little to bolster the firm’s credibility for delivering secure systems and software. Looking forward over the next 12 to 18 months, Microsoft’s prospects hinge on security, cost and its developer community. The growth of open source alternatives is hard proof that die-hard Microsoft users are willing to shift for security, cost savings and functionality. Microsoft has weaknesses that can be attacked by Google and other competitors.

Yahoo’s situation is typical of many American organizations. Most large US corporations are a hotch-potch of different systems, incompatible architectures and a Tower of Babel of data formats. For Yahoo to deliver specific markets to its advertisers, Yahoo must integrate information from disparate systems and be able to segment and deliver ads to those users efficiently. Yahoo is now spending money to break down the walls of its data silos and integrate its user data. If Yahoo cannot deliver narrowly segmented markets, advertisers may abandon Yahoo for services that offer more targeted marketing opportunities. After years of flirting with becoming a New Age America Online, Yahoo is beginning to behave like a traditional media company.
  11. 11.

MSN and Yahoo! are becoming ad-supported versions of general-interest portals like Yahoo, America Online and Tiscali. In contrast, Google is focusing on applications that tie users to its Googleplex. The company’s focus on hardware and software engineering gives it a cost and performance advantage over MSN and Yahoo, among others competing in Web search. Google’s high-performance, homogeneous Googleplex means that the company does not struggle with some integration, performance and cost issues that bedevil Microsoft and MSN. Google may not be doing everything right from a computer science point of view. Compared to MSN or Yahoo, Google is doing less wrong than these two aggressive competitors.

The Technology Precepts

Google’s technology uses concepts and techniques from the leading edge of computer science. Most of these innovations are difficult to explain to engineers steeped in traditional approaches to massively distributed, highly parallelized computing. The eclectic footnotes and references in the earlier BackRub paper have been sharpened in Google’s later technical presentations. Readers without a first-hand understanding of NOW-Sort, River, and BAD-FS are unlikely to craft dinner conversation from Google’s explanations of the influence of these research computing demonstrations. [9]

For the purposes of this monograph and understanding the nature of Google’s technology, five precepts thread through Google’s technical papers and presentations. The following snapshots are extreme simplifications of complex, yet extremely fundamental, aspects of the Googleplex.

Cheap Hardware and Smart Software

Google’s use of commodity hardware for high-demand, 24x7 systems has existed as a core precept since 1996. Most of its competitors’ online systems combine branded hardware from IBM, Sun Microsystems, Hewlett-Packard, and Dell Computers with specialized peripherals. The operating systems in use are a combination of Unix and Microsoft operating systems with some Linux and open source components.

Google approaches the problem of reducing the costs of hardware, set up, burn-in and maintenance pragmatically. A large number of cheap devices using off-the-shelf commodity controllers, cables and memory reduces costs. But cheap hardware fails.

In order to minimize the “cost” of failure, Google conceived of smart software that would perform whatever tasks were needed when hardware devices fail. A single device or an entire rack of devices could crash, and the overall system would not fail. More important, when such a crash occurs, no full-time systems engineering team has to perform technical triage at 3 a.m.

[9] See for example Andrea C. Arpaci-Dusseau et al., “High-Performance Sorting on Networks of Workstations,” in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, May 1997; or John Bent et al., “Explicit Control in a Batch-Aware Distributed File System,” in Proceedings of the 1st USENIX Symposium on Networked Systems Design and Implementation, March 2004.
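The “smart software” precept, in which the system as a whole keeps working when a cheap device fails, can be sketched as a simple failover loop. This is an illustration only, under the assumption that a task can be handed to any of several interchangeable servers; the names `ServerDown`, `run_with_failover`, `broken_box` and `healthy_box` are hypothetical and do not come from Google’s software.

```python
class ServerDown(Exception):
    """Raised by a (simulated) server whose hardware has failed."""


def run_with_failover(task, servers):
    """Try each server in turn and return the first successful result.

    Also returns the names of the servers that failed, so that repairs can be
    scheduled later instead of being handled as an emergency.
    """
    failed = []
    for server in servers:
        try:
            return server(task), failed
        except ServerDown:
            failed.append(server.__name__)
    raise RuntimeError("all servers failed for task: " + task)


# Two simulated pizza box servers: one with a dead disk, one healthy.
def broken_box(task):
    raise ServerDown("disk failure")


def healthy_box(task):
    return f"done: {task}"


result, skipped = run_with_failover("index page", [broken_box, healthy_box])
```

The caller gets its answer from the healthy server and never sees the failure; the list of skipped servers is the only trace that something went wrong.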
  12. 12. Chapter Three: Google TechnologyThe focus on low-cost, commodity hardware and smart software is part of the Google culture.In one presentation at a December 2004 technical conference, a Google spokesman joked thatanyone in the room could buy the same hardware that Google uses at Frye’s Electronics, aretail chain with stores in Palo Alto and other cities in California.Logical ArchitectureGoogle’s technical papers do not describe the architecture of the Googleplex as self-similar.Google’s technical papers provide tantalizing glimpses of an approach to online systems thatmakes a single server share features and functions of a cluster of servers, a complete datacenter, and a group of Google’s data centers.The diagram below shows a representation of the Googleplex’s tightly organized, highlyregular organization of files, servers, clusters, and more than two dozen data centers in a stableorganizational pattern.10 The Googleplex A data centre is a larger uses the same instance of the design and is organization of composed of a single pizza racks. box server. A single Google cluster embodies the same organizing A single principle as a replicated single pizza box Google file server reflects the controllling A single Google organizing pizza box server principleThe diagram illustrates that Google’s technical infrastructure is similar at every level in theGoogleplex. The collections of servers running Google applications on the Google version ofLinux is a supercomputer. The Googleplex can perform mundane computing chores liketaking a user’s query and matching it to documents Google has indexed. Further more, theGoogleplex can perform side calculations needed to embed ads in the results pages shown touser, execute parallelized, high-speed data transfers like computers running state-of-the-artstorage devices, and handle necessary housekeeping chores for usage tracking and billing. 
10. The illustration is a Sierpinski triangle, chosen because it conveys how each component in Google’s infrastructure replicates other, larger combinations of servers and data centers. The overall structure, in this illustration an equilateral triangle, expresses the stability of the Google approach to its system. This famous fractal connotes how Google scales without altering the micro or macro structure of the Googleplex.
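The self-similarity that the illustration conveys can be made concrete with a small recursive data structure in which a server, a cluster, a data center, and the whole Googleplex all expose the same interface. This is a minimal sketch of the idea and not Google's code: the `Unit` class, its fields, the `build` helper, and the sample sizes are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """One level of a self-similar hierarchy (hypothetical model).

    A unit without children stands for a single pizza box server; a unit
    with children stands for a cluster, a data center, or the whole system.
    Every level answers the same question in the same way.
    """
    name: str
    children: List["Unit"] = field(default_factory=list)
    alive: bool = True

    def server_count(self) -> int:
        """Count live leaf servers at or beneath this unit."""
        if not self.alive:
            return 0
        if not self.children:
            return 1
        return sum(child.server_count() for child in self.children)

    def add(self, child: "Unit") -> None:
        """Plug a new sub-unit in; no other level needs to change."""
        self.children.append(child)


def build(name: str, fanout: List[int]) -> Unit:
    """Build a hierarchy; fanout lists the children per level, top down."""
    unit = Unit(name)
    if fanout:
        for i in range(fanout[0]):
            unit.add(build(f"{name}/{i}", fanout[1:]))
    return unit


if __name__ == "__main__":
    # Illustrative sizes only: 2 data centers x 3 clusters x 40 servers.
    plex = build("plex", [2, 3, 40])
    print(plex.server_count())                        # 240
    plex.children[0].children[1].alive = False        # one cluster fails
    print(plex.server_count())                        # 200
    plex.children[1].add(build("plex/1/new", [40]))   # add a 40-server cluster
    print(plex.server_count())                        # 240
```

Because every level is the same type, adding capacity or losing a whole branch is handled by the same few lines regardless of scale, which is the property the chapter attributes to the Googleplex.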
What is of interest is that Google does this with low-cost commodity hardware running on Google’s version of Linux. Google has infused the Googleplex with logic that allows software to handle data recovery, to streamline messages passed from server to server, and to grab additional computing resources in order to complete a job quickly. When Google needs to add processing capacity or additional storage, Google’s engineers plug in the needed resources. Due to self-similarity, the Googleplex can recognize, configure and use the new resource. Google has an almost unlimited flexibility with regard to scaling and accessing the capabilities of the Googleplex. Unlike a collection of different building materials, Google’s approach delivers a homogeneous computing system.

A good example is bringing a new rack of 40 or more pizza box servers online and creating one of the many types of servers Google uses.11 Servers, according to the fractal architecture, consist of two or more clusters of pizza boxes. A cluster allows data to be replicated and work shared among pizza boxes with spare capacity. A rack is assembled and then Google’s pizza box servers are “plugged in.” Cables are attached among the pizza boxes and the rack is then plugged into a network hub. An engineer turns on the power, and the other devices become aware of the new rack’s resources. Master servers (Google’s term for the pizza box that is in charge of one or more clusters) instruct other servers to copy data to the new cluster and begin using the clusters to do work.

In Google’s self-similar architecture, the loss of an individual device is irrelevant. In fact, a rack or a data center can fail without data loss or taking the Googleplex down. The Google operating system ensures that each file is written three to six times to different storage devices. When a copy of that file is not available, the Googleplex consults a log for the location of the copies of the needed file.
The application then uses that replica of the needed file and continues with the job’s processing. Redundancy and other engineering tweaks to Linux give the Googleplex ways to eliminate or reduce the bottlenecks associated with traditional online computer systems’ operation. The Google technical recipe includes distributed computing, optimized file handling, and embedded logic to make the servers working on tasks smarter. This architecture allows Google to expand its computational capacity, its storage and its supported applications with an ease and at a price point rivals cannot easily match. According to Jeff Dean, one of Google’s senior engineers, “At Google, everything is about scale.”12

Speed and Then More Speed

Google Search is fast, with most results coming back to the user in less than one second. In commercial data centers, speed has traditionally been achieved by buying high-end, high-performance hardware from manufacturers such as Sun Microsystems and using advanced storage devices connected to the servers by exotic fibre optics.

11. Data centers use computer cases that are shaped like the boxes used to hold pizzas. The term pizza boxes has been appropriated by engineers to describe one of the standard form factors for servers housed in rack mounts in data centers.

12. Statement made at the University of Washington, October 2004.
Not Google. Google uses commodity pizza box servers organized in a cluster. A cluster is a group of computers that are joined together to create a more robust system. Instead of using exotic servers with eight or more processors, Google generally uses servers that have two processors similar to those found in a typical home computer.

Through proprietary changes to Linux and other engineering innovations, Google is able to achieve supercomputer performance from components that are cheap and widely available. The table below provides some data from 2002 about the speed with which Google can read data from hard drives:13

[Table: 2002 read-rate data for two clusters. Caption: These data show the results of two clusters’ performance. Google’s read throughput has gone up since 2002. Based on increases in commodity drive throughput, Google’s read rate may be close to 2,000 megabytes per second, although that estimate may reflect a Google watcher’s enthusiasm boosting already-robust figures.]

To put these data in the context of 2002 technology, consider that an IBM EXP3 storage device available in 2002 could read data in burst mode at the rate of about 58 MB per second. Google’s read rate in 2002 averaged ten times the read rate of the IBM EXP3. The write rate is comparable. The cost of a single IBM EXP3 in 2002 was about $18,000 for 360 gigabytes of storage, excluding controller and cables. Google’s cost for comparable storage and the higher performance was about $1,000. For greater speed, Google spends less. In the world of ever-increasing demands for speed and storage, Google has a strong one-two punch.14 Advances in commodity storage devices translate to even faster performance for Google. Google has not updated its read rate data, but engineers familiar with Google believe that read rates may in some clusters approach 2,000 megabytes a second. When commodity hardware gets better, Google runs faster without paying a premium for that performance gain.

Google engineers for computational speed.
Google’s approach has been to focus on making its software engineering produce the turbocharged performance. Speed is crucial to Google’s PageRank and other analytic processes. If Google’s computational throughput were slow, Google could not perform the work needed to know that, for a particular query, a particular set of indexed Web pages is the best match. Without fast response to a query, users would not be willing to run multiple queries and interact fluidly with the Google applications.

13. From “The Google File System” by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Google), ACM SOSP 2003 Conference Proceedings, 1-58113-757-5/03/0010, page 12.

14. With Google’s advanced programming tools, Google is able to increase the productivity of its engineers. Combined with hardware speed and performance, Google squeezes out more productivity by applying its engineering talents to application development. This is a one-two-three punch to which Google’s competitors have to respond.

Google does not mindlessly match key words in a user’s query to the terms in the Google index. Google’s approach is more subtle and computationally involved, although term matching is an important part of the Google process. Google reviews data in the form of various scores or values produced by certain algorithms. Google then uses these different values in other algorithms to find search results, identify the best match (Google’s “Feeling Lucky” link), extract matching ads from its advertising server, and continuously update values as Google users click on links. Once these various query and ad matching processes are complete, Google displays the results page to the user, typically in less than one second across a public network.

Google is a hot rod computer that can perform the basic mathematics needed to deliver most search results in less than a half second, display maps with the speed of a dedicated desktop application like Encarta, and look at a Web page matching a user’s query and, in some applications, insert additional hyperlinks to related content before displaying the results page to the user. The Googleplex does experience slowdowns. When these occur, the Googleplex allocates additional resources to eliminate the brownout.

Speed has many meanings at Google. Speed means that users can interact with the Google products and services as if the Google application were running on a dedicated PC in front of the user. Speed also means that Google must be able to expand its computational and storage capacity quickly. Speed also means rapid development and deployment of new products. Speed, like Google’s ability to scale, is a core functionality of the Googleplex.

Google applies its high-speed technology to search and to other types of servers.
Among the servers using Google’s go-fast technology are those shown below:

Type                 Function
Advertising server   Delivers text and other paid advertisements for AdWords and AdSense.
Chunkserver          Schedules and delivers blocks of data for further processing.
Image servers        Serve images for Google Image, Print and Video services.
Index server         The workhorse of search. Handles search-and-retrieval.
Mail server          Delivers the Gmail service.
News server          Gathers, analyses and displays news.
Web server           Orders results and makes them available to users.

What does the combination of go-fast technology plus multiple types of Google data allow the company to do? Google can engage in fast new product development. One example is Google Maps. Google developed a basic mapping product over the course of 2004. In late 2004, Google purchased Keyhole. By June 30, 2005, Google had:

 1 Released a basic mapping product.
 2 Integrated information from Google Local in early 2005.
 3 Hooked Keyhole satellite imagery into Google Maps in early May 2005.
 4 Announced Google Earth in May 2005.
 5 Upgraded the system to integrate two-dimensional point-to-point routes on top of satellite imagery.
 6 Demonstrated a function that accepts a query in another language, translates the results to the user’s language, and displays the data in a three-dimensional mode.

The image below shows that Google’s Map and Earth service pushes the functions of online map and data integration to another level. In the span of several days, Google integrated Keyhole technology, launched, upgraded and redefined online mapping services.15

[Image caption: These are the results of a Japanese language Google Maps-Earth query for the location of Wendy’s restaurants in New York City. The addition of the Japanese language support, the three-dimensional view of the section of Manhattan where the user wants directions, and the integration of hot links, the two-dimensional map, and information about the restaurants was part of Google’s fast-cycle launch and enhancement program designed to beat Microsoft to the market.]

Another key notion of speed at Google concerns writing computer programs to deploy to Google users. Google has developed short cuts to programming. An example is Google’s creation of a library of canned functions to make it easy for a programmer to optimize a program to run on the Googleplex computer. At Microsoft or Yahoo, a programmer must write some code or fiddle with code to get different pieces of a program to execute simultaneously using multiple processors. Not at Google. A programmer writes a program, uses a function from a Google bundle of canned routines, and lets the Googleplex handle the details. Google’s programmers are freed from much of the tedium associated with writing software for a distributed, parallel computer.

15. The source for this image was

What does increased programmer productivity mean? In terms of money, Google makes each engineering dollar go farther. If a single programmer can reduce by 10 percent the time required to code a program, the savings could be several thousand dollars. If a programmer can slash coding time in half, Google gets twice the potential productivity out of each of its 3,000-plus programmers.16

Eliminate or Reduce Certain System Expenses

Some lucky investors jumped on the Google bandwagon early. Nevertheless, Google was frugal, partly by necessity and partly by design. The focus on frugality influenced many hardware and software engineering decisions at the company. Spending money wisely does not mean spending cheaply. Examples of how Google eliminates or reduces certain system expenses include:

 • Google eliminates the costs associated with backing up and restoring data when a hardware failure occurs. The fractal principle requires that Google replicate data three to six times elsewhere in the Googleplex. When a device fails, the “master server” for a task looks at a file that tells where the other copies of the data or the programs are. The “master server” then uses those data or those processes to complete a task. No tape, no human intervention, and no downtime; Google does not have these costs due to its engineering acumen.
 • Google does not have to certify new hardware.
   When additional storage or computational capacity is required, Google technicians assemble one or more racks of Google “pizza boxes.” Once the servers are in the rack, the Googleplex recognizes the new resources in a way that is similar to how a laptop knows when a user plugs in a USB mouse. The expensive certification processes otherwise required for some high-end hardware are eliminated. Google engineers plug in resources and let the Googleplex handle the other tasks.
 • Google innovation uses open source code as a starting point. Many of Google’s most striking technical advances are based on modifying open source software to benefit from insights gained from experimental results in supercomputing. Google does not have to work around known bottlenecks in some commercial operating systems. Unlike Microsoft, Google did not write a complete operating system for its Googleplex. Google made key changes to Linux, adding necessary services and functions to meet the specific requirements of Google applications. Google’s approach is pragmatic and less time-consuming than Microsoft’s “death march” to get Longhorn shipped by late 2006. Compared with Yahoo, Google’s approach is more cohesive. Yahoo faces integration drudgery as a result of its multiple systems and heterogeneous hardware and data. Google has used Linux, standards, and open source software for virtually all of its core services and thus spends less time pounding disparate systems and data into a standard type.17
 • Google does not spend money for high-performance devices to make its system perform faster.

16. Some Google programmers have complained about the peer pressure to perform. Google management faces a challenge in managing its programming talent. Staff burnout or defections could impair Google’s technical resources.

To illustrate the financial payoff from the use of commodity hardware, Google engineers revealed a back-of-the-envelope calculation. Although dated, it underscores the economies of the Google approach:18

   The cost advantages of using inexpensive, PC-based clusters over high-end multiprocessor servers can be quite substantial, at least for a highly parallelisable application like ours. For example, a $278,000 rack contains 176 2-GHz Xeon CPUs, 176 Gbytes of RAM, and 7 Tbytes of disk space. In comparison, a typical x86-based server contains eight 2-GHz Xeon CPUs, 64 Gbytes of RAM, and 8 Tbytes of disk space; it costs about $758,000. In other words, the multi-processor server is about three times more expensive but has 22 times fewer CPUs, three times less RAM, and slightly more disk space. Much of the cost difference derives from the much higher interconnect bandwidth and reliability of a high-end server, but again, Google’s highly redundant architecture does not rely on either of these attributes. [Emphasis added]

This means that when Microsoft or Yahoo! spends US$3.00 for better performance, Google spends less than US$1.00.19 Over time, competitors such as IBM, Microsoft or Yahoo may implement similar features into their network-centric services. Until then, Google has a cost advantage at least with regard to scaling online operations.
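The ratios in the quoted calculation can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the quotation; the dictionary layout and the `ratio` helper are illustrative names, not anything from the source.

```python
# Figures as quoted in this chapter from the Barroso, Dean and Hölzle paper.
# The dictionary layout and function name are illustrative, not from the source.

RACK = {"cost_usd": 278_000, "cpus": 176, "ram_gb": 176, "disk_tb": 7}
SERVER = {"cost_usd": 758_000, "cpus": 8, "ram_gb": 64, "disk_tb": 8}


def ratio(a: dict, b: dict, key: str) -> float:
    """How many times larger a[key] is than b[key]."""
    return a[key] / b[key]


if __name__ == "__main__":
    print(f"price: server is {ratio(SERVER, RACK, 'cost_usd'):.2f}x the rack")
    print(f"CPUs:  rack has {ratio(RACK, SERVER, 'cpus'):.0f}x the server")
    print(f"RAM:   rack has {ratio(RACK, SERVER, 'ram_gb'):.2f}x the server")
    print(f"disk:  server has {ratio(SERVER, RACK, 'disk_tb'):.2f}x the rack")
    per_cpu_rack = RACK["cost_usd"] / RACK["cpus"]
    per_cpu_server = SERVER["cost_usd"] / SERVER["cpus"]
    print(f"cost per CPU: ${per_cpu_rack:,.0f} vs ${per_cpu_server:,.0f}")
```

The price ratio comes out near 2.7, which the quotation rounds to three, and the RAM ratio is 2.75. Measured per processor rather than per system, the gap is much wider: roughly $1,580 against $94,750, a factor of about 60.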
If these 2002 data can be accepted, Google spends about one-third as much as companies using a traditional server architecture and gets far more computing horsepower and nearly as much disk space.

Snapshots of Google Technology

Google engineers generate a large volume of technical information. Some of the data are in the form of patents, often written in a style that communicates little of the patent’s substance to a lay reader. The link for Google’s publications can shift unexpectedly.20 Exploring biographies of Google executives and Google Web logs can yield some useful technical information. For example, one Google biography linked to more than 36 personal projects, including one by Google’s CEO.21 Surprisingly, Google’s search engine does a hit-and-miss job of indexing Google’s own technical information.

17. Google does not explicitly state that it has embraced a services oriented architecture or SOA. However, many of Google’s practices illustrate an informed use of certain features of SOA.

18. Luiz André Barroso, Jeffrey Dean, and Urs Hölzle, “Web Search for a Planet: The Google Cluster Architecture,” IEEE Computer Society 0272-1732/03, March-April 2003.

19. A review of Google’s cost estimates for this monograph revealed that Google is understating its cost advantage by one or two orders of magnitude. As the performance of commodity hardware goes up, the cost of that hardware goes down. Bulk purchasing chops as much as 50 percent off the cost of some hardware. Google can replicate its data and give away free gigabytes of email storage. The cost to Google can be as low as a few cents a gigabyte.

20. See on June 1, 2005.

Useful engineering information appears on the Google Web site. The monographs, white papers and technical notes cover a wide range of subjects. For example, in mid-2005, papers were available on such topics as algorithms, compiler optimization, information retrieval, artificial intelligence, file system design, data mining, genetic algorithms, software engineering and design, and operating systems and distributed systems, among others. Google explains its use of very large files as well as how the Google-modified version of Linux automatically allocates work and avoids the file system bottlenecks that can plague Solaris and Windows Advanced Server 2003, among others.

Google’s technical papers and Google patents provide some insight into areas of interest at Google. For example, Google is posting more information about operating systems and applications. The thrust of Google’s innovation is to build out the search platform and expand the functionality of its back-office programs such as those used for advertising services.

The annex to this monograph provides information about more than 60 patents for which Google is believed to be the assignee. To provide a more fine-grained look at Google technology, the table below identifies selected examples of innovations documented by Google engineers or researchers close to the company.
Most of these papers appeared prior to Google’s receiving a patent for the technology referenced in these reports:

Technology: Google Suggest
Purpose: Helps users find needed information by analysing queries and suggesting other queries.
To Learn More: Services Computing, 2004 IEEE International Conference on (SCC04), by Stephen Davies, Serdar Badem, Michael D. Williams, Roger King, September 2004.

Technology: Video Object Search
Purpose: User types an object name and Google finds that object in a video.
To Learn More: Ninth IEEE International Conference on Computer Vision, Volume 2, Josef Sivic, Andrew Zisserman. Publication date: October 2003.

Technology: MapReduce
Purpose: New functions in Google Linux to speed programming and other processes involving large data sets.
To Learn More: OSDI Proceedings, December 2004.

Technology: Google File System
Purpose: Extension to Google Linux to allow high-speed data reads and writes from commodity drives.
To Learn More: ACM Publication 1-58113-757-5/03/0010.

Technology: Identify Authoritative or High-Value Sources in Web Content
Purpose: Uses pattern mining in order to generate a numeric value to indicate an authoritative source as an indication of content quality.
To Learn More: Seventh International Database Engineering and Applications Symposium (IDEAS03), Haofeng Zhou, Yubo Lou, Qingqing Yuan, Wilfred Ng, Wei Wang, Baile Shi, July 2003.

Technology: MetaCrystal
Purpose: Metasearch technology to allow a single query to retrieve and organize results in a visual display.
To Learn More: Second International Conference on Coordinated & Multiple Views in Exploratory Visualization (CMV04), Anselm Spoerri, July 2004.

21. This is the lex project that “helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine.”

Drawbacks of the Googleplex

The coaching mantra, “No pain without gain,” is true for Google. Google does make mistakes, and some big ones. The example fresh in news headlines is Web Accelerator. The product was introduced in May 2005 and withdrawn less than six weeks later. Speed and nimbleness aside, Web Accelerator was technology that ran head on into “issues.” Of greater consequence are the periodic slowdowns for Gmail. The Googleplex is scalable, but until more servers are online, users may face annoying delays.

Going Too Fast: The Google Web Accelerator

The Web Accelerator software was supposed to use Google servers to store Web pages a user viewed. Web Accelerator parsed a page in the user’s browser. The Web Accelerator function then followed each link on that specific page. Each linked page was then stored in a Google cache. When the user clicked on a link, the user would see the page from the Google cache, thus reducing the time required to display the page to the user.

Web Accelerator worked fine on such sites as a, which makes minimal use of advanced Web services. Unfortunately, the Web Accelerator function followed links that transmitted instructions to Web applications.
For example, Web Accelerator would click on “delete” links, causing some Web applications such as Backpack to remove the user’s preferences or content.22 Web Accelerator blithely ignored confirmations generated by JavaScript so that unintentional instructions were transmitted. Some Google watchers raised questions about caching data as well as privacy and copyright issues. Before these concerns reached a crescendo, Google reported that Web Accelerator had reached its capacity. Google blocked downloads for the product.

The Laws of Physics: Heat and Power 101

Google does not reveal the number of servers it uses, but the number is believed to be in the 150,000 to 170,000 range as of June 30, 2005. Conflicting information surfaces in Web logs and in talks at conferences. In reality, no one knows. Google has a rapidly expanding number of data centers. The data center near Atlanta, Georgia, is one of the newest deployed. This state-of-the-art facility reflects what Google engineers have learned about heat and power issues in its other data centers. Within the last 12 months, Google has shifted from concentrating its servers at about a dozen data centers, each with 10,000 or more servers, to about 60 data centers, each with fewer machines.23 The change is a response to the heat and power issues associated with larger concentrations of Google servers.

22. Backpack is a Web application that sends a user the contents of any page as email. See

The most failure prone components are:

 • Fans.
 • IDE drives, which fail at the rate of one per 1,000 drives per day.
 • Power supplies, which fail at a lower rate.

Repairs are batch operations. Scheduling the fixes is a major job and work is underway to improve the Google-developed scheduling capability. Google has to locate hosting facilities that can meet the company’s heat and power requirements.

Other Data Center Issues

Google data centers have access to multiple high-speed lines and normal data center functions such as redundant power, traffic routing and strict rules governing access to the physical boxes.

PRWeaver’s Web log contained a posting of a photograph allegedly taken inside a Google data center. If true, the physical layout of the racks holding an estimated 2,000 or more servers squeezes a large amount of hardware into a tightly-packed space.

This type of dense configuration helps explain the comments about Google’s heat and power concerns. Most data centers were not designed to handle dense concentrations of thousands of servers. Heat contributes to hard drive failures. On the plus side, the dense configuration makes set up and maintenance somewhat easier. Google packs servers on two sides of a rack.

A unique property of the data centers is that replicated content can be written from one data center to another.
Google data within the data center are replicated on other servers and other clusters running in the racks.

The Google “plug and play” engineering philosophy appears to be used in and across data centers. If a data center, such as the one shown above, needs additional index server capacity, the technicians in that center can build a Google rack of 40 pizza box servers. These servers are connected to the network. When the rack is powered up, it becomes available to the master servers for that data center. These master servers then mark the rack’s resources as available. Master servers then begin sending work to the new devices. The information about data centers indicates that this “plug and play” concept and automatic discovery of new resources applies to new data centers, not just the racks within them.

23. These data appear at

It may be an exaggeration to say that a Google rack and the data center in which the rack resides work like a USB mouse. The general concept seems to be what Google engineers have tried to achieve. Having eliminated such tasks as certifying and configuring Small Computer System Interface (SCSI) RAID storage devices, Google is content to let the auto-discovery functionality alert a “master server” to a new resource, master servers to alert other master servers, masters to notify clients of tasks, and data centers to pass information that racks, clusters or a new data center are available for use.

As a Google engineer said, “Wherever we put a cluster, we have heat, cooling and power issues. When we put in a data center, that data center operator faces new challenges. We use each day four megawatts of electric power.”

The problems include:

 1 Heat. Special racks with fans that cool the core of the rack are used.
 2 Power. The power demand at load is greater than data centers typically sustain. “Our cages are custom built and there’s a lot of work done by us and the data center people before we can flip the switch,” said Jeff Dean, a senior Google engineer.
 3 Network management tools. Google has had to create network management tools to manage its self-healing, automatic failover operating system.

What’s Up, Sergey?

The Google data centers are concentrated in North America with other data centers located in Switzerland, the Pacific Rim, and Beijing.24

Because the Google operating system (GOS) is self-healing, the operating system and the various “master computers” in a cluster know what device is online and what device is dead. Off-the-shelf network management tools are not tailored to Google’s requirements.
Therefore, Google is developing network management and monitoring tools so that the information in the Google operating system log files can be displayed in a meaningful way to Google network engineers.

The overall Googleplex works and continues working even if a device, rack or data center goes dark or dies. Network management tools have to provide a broad range of monitoring and support functions for the global network, devices, data flows, work loads and potential problem areas. Google is developing the needed network management tools specifically for the Googleplex.

24. The Beijing data center was purpose built to conform to the ruling body’s requirements for online access, monitoring and related issues. Google complied in order to do business in China. Yahoo! bought in order to accelerate its effort in China.
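The behaviour described in this chapter, in which a master server records where the copies of each file live, knows which devices are online or dead, and redirects work to a surviving replica, can be sketched in a few lines. This is a toy model under stated assumptions and not the Google File System's implementation: the `Master` class, its method names, and the random placement policy are hypothetical. The default replica count of three is the low end of the range given in the text.

```python
import random
from typing import Dict, List, Set


class Master:
    """Toy master server: tracks live devices and replica locations."""

    def __init__(self, replicas: int = 3, seed: int = 0) -> None:
        self.replicas = replicas
        self.live: Set[str] = set()
        self.locations: Dict[str, List[str]] = {}  # the "log" of copies
        self._rng = random.Random(seed)

    def register(self, device: str) -> None:
        """A newly powered-up device announces itself (plug and play)."""
        self.live.add(device)

    def mark_dead(self, device: str) -> None:
        """Record that a device has stopped responding."""
        self.live.discard(device)

    def write(self, filename: str) -> List[str]:
        """Place copies of a file on distinct live devices."""
        if len(self.live) < self.replicas:
            raise RuntimeError("not enough live devices for replication")
        chosen = self._rng.sample(sorted(self.live), self.replicas)
        self.locations[filename] = chosen
        return chosen

    def locate(self, filename: str) -> str:
        """Return a live device holding the file, skipping dead ones."""
        for device in self.locations[filename]:
            if device in self.live:
                return device
        raise LookupError(f"all replicas of {filename} are unavailable")

    def re_replicate(self, filename: str) -> None:
        """Copy the file to spare devices until the target count is met."""
        holders = [d for d in self.locations[filename] if d in self.live]
        if not holders:
            raise LookupError(f"no surviving copy of {filename} to copy from")
        spares = sorted(self.live - set(holders))
        self._rng.shuffle(spares)
        while len(holders) < self.replicas and spares:
            holders.append(spares.pop())
        self.locations[filename] = holders


if __name__ == "__main__":
    master = Master()
    for i in range(5):
        master.register(f"box-{i}")
    copies = master.write("index-shard-7")
    master.mark_dead(copies[0])             # the first replica's device dies
    print(master.locate("index-shard-7"))   # served from a surviving copy
    master.register("box-new")              # a new device is plugged in
    master.re_replicate("index-shard-7")
    print(master.locations["index-shard-7"])
```

The point of the sketch is that a lost device requires no restore from tape and no operator: the lookup simply skips the dead device, and a later pass brings the replica count back up using whatever capacity has been plugged in.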
Unanticipated Faults Could Derail Google’s Juggernaut

Google’s network uses a number of concepts from the fringes of computer innovation as well as hands-on knowledge gained from the Googleplex itself. The result is a highly-resilient network that may breed problems not previously encountered. Although Google has operated for more than five years without downtime from system failure, the possibility, however remote, does exist that something unanticipated could occur. A sufficiently large problem could deal Google a severe blow. The advanced technology of Google’s MapReduce tool and its 400-module library could pose as yet unforeseen technical problems.

[Figure caption: The diagram shows how Google’s approach eliminates the bottleneck in parallelized systems produced by excessive message traffic flowing through a server coordinating work among different computers. This is a diagram produced by Google engineers.]

Summary of Google’s Drawbacks

Critics of Google can point to three “problems” with Google’s approach to performance.

First, Google is a one-trick pony. The changes to Linux and the other technical modifications are little more than hackers’ attempts to squeeze out a small performance gain.

Second, Google’s use of commodity hardware and cheap storage is a risky solution. Unknown problems may lurk when cheap components are used in a mission-critical system. Increasing the potential risk are the changes Google makes to speed up program execution.
Finally, other operating systems, including those from computer research laboratories and even Microsoft, do the same things and have for years.

Leveraging the Googleplex

Google has demonstrated that search is just one application that can run in the Google environment. Many other applications can benefit from Google’s approach to online services:

 1 Applications that require a high performance payoff for a low cost, such as electronic mail.
 2 Applications that can run in Google’s redundant environment, where there is no private-state replication such as that found in IBM’s AS/400 operating environment and others.
 3 Computationally-intensive, stateless applications.
 4 Applications that require request-level parallelism, a characteristic exploitable by running individual requests on separate servers, such as Google Earth.

There is little to be gained by trotting out war-horses to trample Google. The user experience speaks for itself. Google’s approach to massively-parallel distributed computing works, even on dial-up networks.

Google fused the type of thinking associated with small, cash-strapped companies with techniques from advanced computer systems. Commodity products keep costs down. A modified Linux delivers fast performance at a bargain basement cost. Google is taking a strategic risk with commodity hardware and a souped-up version of Linux. Each day Google bets that its technologists can keep the system humming.

Another reason why Google’s approach to technology is paying off is that Google employs the same pragmatism and cleverness in application development. Google uses standard engineering practices, proprietary knowledge, and off-the-shelf techniques such as its use of Web services. Google uses the same Web programming techniques that millions of Web developers use. The payoff is that it is easy for Google to hire people who can code for the Googleplex.
Google so far has not had to spend money for developer marketing programs or train new hires to work in the Googleplex.

The biggest boost to Google’s technical approach is that its competitors are following different, more expensive approaches. Yahoo is a fruit cake of hardware, operating systems, and applications coded at different times in different languages by different people. Microsoft uses its own operating systems but relies on other operating systems as well, including Solaris. Microsoft must invest in hardware to squeeze performance out of its platforms. Yahoo wrestles with its many different platforms. Microsoft seems powerless to enhance the speed of its operating system. Both are digital ostriches burying their heads in their own marketing material.
Google’s technology is one major challenge to Microsoft and Yahoo. So to conclude this cursory and vastly simplified look at Google technology, consider these items:

 1 Google is fast anywhere in the world.
 2 Google learns. When the heat and power problems at dense data centers surfaced, Google introduced cooling and power conservation innovations to its two dozen data centers.
 3 Programmers want to work at Google. “Google has cachet,” said one recent University of Washington graduate.
 4 Google’s operating and scaling costs are lower than those of most other firms offering similar services.
 5 Google squeezes more work out of programmers and engineers by design.
 6 Google does not break down, or at least it has not gone offline since 2000.
 7 Google’s Googleplex can deliver desktop-server applications now.
 8 Google’s applications install and update without burdening the user with gory details and messy crashes.
 9 Google’s patents provide basic technology insight pertinent to Google’s core functionality.

A young programmer in Osaka or Beijing is very likely to have been influenced by Google. The skilled programmers want to work at Google, develop for the Googleplex, and, if possible, create their own Google killer. The mantra is, “Be like Sergey and Larry.”

Google has a next-generation computing platform. That platform is optimised to deliver virtual applications to its users worldwide. Google uses standard Web technologies in clever ways. Although the technical challenges facing Google are formidable, the company has advanced the art of online computing.