Google Architecture - Breaking it Open


Transcript

  • 1. Learning and Development presents the Open Talk Series: a series of illuminating talks and interactions that open our minds to new ideas and concepts, make us look for newer or better ways of doing what we did, or point us to exciting things we have never done before. A range of topics on Technology, Business, Fun and Life. Be part of the learning experience at Aditi. Join the talks. It's free: free as in freedom at work, not free beer. It's not training. It's a mind-opener. Speak at these events, or bring an expert/friend to talk. Mail OpenTalk@aditi.com with topic and availability. Usually at 4.30 PM on Wednesdays.
  • 2. How to enjoy a talk: bring coffee & friends; switch OFF mobile; switch ON mind; sign attendance sheet; SHARE your wisdom; QUESTION notions; THANK the talker; SPREAD the good word.
  • 3. New Champion: Sahil Sagar. Aditi Technologies | Partnering Innovation
  • 4. Agenda • We are not talking about the crawler • No discussion on PageRank… maybe?
  • 5. The art of scale: 10-50 users, 100-500 users, 500-10,000 users
  • 6. Scale? 800,000 machines. Largest Linux base.
  • 7. What gives us this scale? Good code? More servers? Powerful servers?
  • 8. Let's see what gives Google the scale. Architecture, top to bottom:
      • The apps on top of it: Google Apps (Gmail...), Search (index, crawl), Google App Engine; written in Python, Java, C++, Sawzall, other
      • The Secret Sauce: GWQ, MapReduce, BigTable, Chubby Lock, GFS / GFS II
      • Infrastructure: interior network (IPv6), RHEL 2.6.X PAE, server hardware, rack, DC, exterior network
  • 9. Scale in Google Architecture: 1. The first touch 2. Size does matter 3. The Safe 4. Operating System Implementation 5. Interior Network Architecture
  • 10. The first touch to the services
  • 11. The first touch to the service. Diagram components: client browser; perimeter firewall (ports 80/443); DMZ with NetScaler (HTTP multiplexing) and Squid reverse proxy; firewall (ports 80/443); GWS web server farm; cell interior network (GFS II etc.)
  • 12. The touch is not always real
      • Uses Squid reverse proxy
      • Perimeter cache hit rates of 30-60% = huge!
      • Hit rate depends on search complexity, user preferences and traffic type
      • All image thumbnails cached, much multimedia cached
      • Expensive common queries cached (common words like 'Obama'), as they require significant back-end processing
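To make the idea of a perimeter cache hit rate concrete, here is a minimal sketch of a least-recently-used query cache that counts hits and misses. It is not Squid and not Google's setup; the class `QueryCache`, the function `slow_search` and the sample queries are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class QueryCache:
    """Tiny LRU cache standing in for a perimeter reverse-proxy cache (hypothetical)."""

    def __init__(self, capacity, backend):
        self.capacity = capacity
        self.backend = backend          # function called on a cache miss
        self.entries = OrderedDict()    # query -> cached response, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)   # mark as most recently used
            return self.entries[query]
        self.misses += 1
        response = self.backend(query)        # the expensive back-end call
        self.entries[query] = response
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return response

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_search(query):
    # Placeholder for the real back end; it just echoes the query.
    return f"results for {query!r}"


cache = QueryCache(capacity=2, backend=slow_search)
for q in ["obama", "weather", "obama", "obama", "news", "obama"]:
    cache.get(q)
print(f"hit rate: {cache.hit_rate():.0%}")  # prints "hit rate: 50%"
```

A repeated common query is served from the cache after its first miss, which is why a small set of popular queries can account for a large share of hits.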
  • 13. Size does matter
  • 14. Worldwide Data Centres: the last estimates were 36 data centres, 300+ GFS II clusters and upwards of 800K machines.
  • 15. The Modular Data Centre: a standard Google modular DC (cell) holds 1160 servers in 30 racks (40U) and draws 250 kW. This is the "atomic" data centre building block of Google. A data centre would consist of 100s of modular cells.
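The cell figures on this slide imply some per-rack and per-server numbers. The short calculation below derives them, under the assumption (not stated on the slide) that the whole 250 kW is divided evenly across the servers.

```python
# Figures taken from the slide.
servers = 1160
racks = 30
power_kw = 250

servers_per_rack = servers / racks
# Assumption: all of the cell's power is attributed to servers, split evenly.
watts_per_server = power_kw * 1000 / servers

print(round(servers_per_rack, 1))  # 38.7
print(round(watts_per_server))     # 216
```

Roughly 39 servers per 40U rack is consistent with about one bare board per rack unit.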
  • 16. The Safe: how is a server stored in the data centre?
  • 17. Google Rack (GOOG rack): EVERYTHING custom!
      • Optimized motherboards
      • Have their own HW builds
      • Build redundancy on top of failure
      • Motherboard directly mounted into rack
      • Servers have no casing, just bare boards
      • Assists with heat dispersal issues
  • 18. The Operating System: the core software on each of those servers
  • 19. Operating System
      • 100% Red Hat Linux based since 1998 inception
      • RHEL, 2.6.X kernel, PAE
      • Custom glibc, rpc, ipvs...
      • Custom FS (GFS II)
      • Custom Kerberos
      • Custom NFS
      • Custom CUPS
      • Custom gPXE bootloader
      • Custom EVERYTHING...
      • Kernel/subsystem modifications:
          – tcmalloc: replaces glibc 2.3 malloc; much faster, works very well with threads
          – rpc: the rpc layer extensively modified to increase performance and reduce latency (52%/40%)
          – Significantly modified kernel and subsystems, all IPv6 enabled
  • 20. The Secret Sauce
  • 21. Section II – Google's Major Glue: 1. Google File System Architecture – GFS II 2. Google Database – BigTable 3. Google Computation – MapReduce
  • 22. Google File System: manages the underlying data on behalf of the upper layers and ultimately the applications
  • 23. GFS versus NFS (source: University of Pennsylvania)
      • Network File System (NFS)
          – Single machine makes part of its file system available to other machines
          – Sequential or random access
          – PRO: simplicity, generality, transparency
          – CON: storage capacity and throughput limited by single server
      • Google File System (GFS)
          – Single virtual file system spread over many machines
          – Optimized for sequential read and local accesses
          – PRO: high throughput, high capacity
          – "CON": specialized for particular types of applications
  • 24. File System I – GFS II
      • Elegant master failover
      • Chunk size is now 1 MB
      • Only ever lost one 64 MB chunk (in GFS I) during its entire production deployment, so assumed extremely reliable
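The chunk sizes above determine how a byte offset in a file maps to a chunk, and how many chunks a file needs. The sketch below shows that arithmetic only; the function names `locate` and `chunk_count` are illustrative and are not part of any GFS API.

```python
CHUNK_64MB = 64 * 1024 * 1024   # GFS I chunk size quoted on the slide
CHUNK_1MB = 1 * 1024 * 1024     # GFS II chunk size quoted on the slide


def locate(offset, chunk_size):
    """Return (chunk index, offset inside that chunk) for a byte offset."""
    return divmod(offset, chunk_size)


def chunk_count(file_size, chunk_size):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


file_size = 1024 ** 3  # a 1 GiB file, chosen for illustration
print(chunk_count(file_size, CHUNK_64MB))      # 16
print(chunk_count(file_size, CHUNK_1MB))       # 1024
print(locate(200 * 1024 * 1024, CHUNK_64MB))   # (3, 8388608)
```

Smaller chunks mean many more chunks per file, so more chunk metadata has to be tracked for the same amount of data.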
  • 25. CAP Theorem (Brewer's theorem)
      • Consistency: all nodes see the same data at the same time
      • Availability: node failures do not prevent survivors from continuing to operate
      • Partition tolerance: the system continues to operate despite arbitrary message loss
  • 26. Google Database: accesses the underlying data on behalf of the upper layers and ultimately the applications
  • 27. Why not a commercial DB?
      • Scale is too large for most commercial databases
      • Cost would be very high
          – Building internally means the system can be applied across many projects for low incremental cost
      • Low-level storage optimizations help performance significantly
          – Much harder to do when running on top of a database layer
      • "Also fun and challenging to build large-scale systems"
  • 28. BigTable
      • A distributed storage system for managing structured data
      • Scalable
          – Thousands of servers
          – Terabytes of in-memory data
          – Petabytes of disk-based data
          – Millions of reads/writes per second, efficient scans
      • Self-managing
          – Servers can be added/removed dynamically
          – Servers adjust to load imbalance
      • Used for many Google projects
          – Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
  • 29. BigTable
      • Physically sorted on row key, like a row store
      • Column families, like column stores
      • Variable (record-by-record) columns within a column family
      • Column values versioned; stored in reverse chronological order
      • Designed to store the hyperlink structure of the web
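The data model on this slide can be pictured as a sorted map from (row key, column, timestamp) to value. The toy in-memory class below is a sketch of that idea only; `ToyTable` and its methods are hypothetical and do not mirror the real BigTable API. The reversed-domain row keys and the `anchor:` and `contents:` column names are illustrative choices.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory sketch: rows sorted by key, columns named 'family:qualifier'."""

    def __init__(self):
        self.row_keys = []   # kept sorted, so range scans are cheap
        self.rows = {}       # row key -> {column: [(timestamp, value), ...]}

    def put(self, row_key, column, timestamp, value):
        if row_key not in self.rows:
            bisect.insort(self.row_keys, row_key)
            self.rows[row_key] = defaultdict(list)
        versions = self.rows[row_key][column]
        versions.append((timestamp, value))
        # Keep versions in reverse chronological order (newest first).
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value for a cell, or None if it does not exist."""
        versions = self.rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, end):
        """Return row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, end)
        return self.row_keys[lo:hi]


table = ToyTable()
table.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")
table.put("com.cnn.www", "anchor:cnnsi.com", 2, "CNN Sports")
table.put("com.example.www", "contents:", 1, "<html>...")
table.put("com.cnn.money", "contents:", 1, "<html>...")

print(table.get("com.cnn.www", "anchor:cnnsi.com"))  # CNN Sports
print(table.scan("com.cnn", "com.cnn~"))  # ['com.cnn.money', 'com.cnn.www']
```

Because rows are sorted by key, pages from the same domain sit next to each other and can be read with one short scan; each row can carry a different set of columns within a family.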
  • 30. Google MapReduce: computes the underlying data on behalf of the applications
  • 31. MapReduce I
      • MapReduce can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors
      • The major functions are MAP and REDUCE (with a number of intermediary steps)
          – MAP: break the task down into parallel steps
          – REDUCE: combine the results into the final output
      • Shown is a 2-pipeline MapReduce (there are 24 MapReduces in the indexing pipeline)
      • Mappers and reducers usually run on separate processors (with 90% of reducers lost, the job still completed!)
  • 32. Word-Count using MapReduce. Problem: determine the frequency of each word in a large document collection
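The word-count problem can be sketched on a single machine to show the shape of the map, shuffle and reduce steps. This is an illustration of the programming model only, not Google's implementation; the function names and the sample documents are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """MAP: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Intermediate step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """REDUCE: combine the values for one word into its total."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in groups.items())


docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(docs)
print(counts["the"], counts["fox"])  # 3 2
```

Each call to `map_phase` depends on one document only, and each call to `reduce_phase` on one word only, which is what lets a real framework spread them over many machines.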
  • 33. What runs on top of all this
  • 34. PageRank: Intuition. Shouldn't E's vote be worth more than F's? How many levels should we consider? (Diagram: link graph over pages A to J.)
      • Imagine a contest for The Web's Best Page
          – Initially, each page has one vote
          – Each page votes for all the pages it has a link to
          – To ensure fairness, pages voting for more than one page must split their vote equally between them
          – Voting proceeds in rounds; in each round, each page has the number of votes it received in the previous round
          – In practice, it's a little more complicated, but not much!
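The voting contest can be run directly as a loop. The sketch below follows the rules on the slide for a small made-up link graph; `voting_round` and the three-page graph are hypothetical, and a page without outgoing links simply loses its votes here, which is a simplification.

```python
def voting_round(links, votes):
    """One round: every page splits its current votes equally among its links."""
    new_votes = {page: 0.0 for page in links}
    for page, targets in links.items():
        if not targets:
            continue  # simplification: a page with no links casts no votes
        share = votes[page] / len(targets)
        for target in targets:
            new_votes[target] += share
    return new_votes


# Made-up graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
votes = {page: 1.0 for page in links}  # initially, each page has one vote

for _ in range(50):
    votes = voting_round(links, votes)

print({page: round(v, 3) for page, v in votes.items()})
# {'A': 1.2, 'B': 0.6, 'C': 1.2}
```

After enough rounds the vote counts stop changing: C ends up with more votes than B because it is linked from two pages, and A inherits C's weight because C's only link points to it.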
  • 35. Random Surfer Model
      • PageRank has an intuitive basis in random walks on graphs
      • Imagine a random surfer, who starts on a random page and, in each step,
          – with probability d, clicks on a random link on the page
          – with probability 1-d, jumps to a random page (bored?)
      • The PageRank of a page can be interpreted as the fraction of steps the surfer spends on the corresponding page
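The random surfer can be simulated in a few lines. This is a sketch under stated assumptions: the value d = 0.85 is a commonly quoted choice that the slide does not give, the graph is made up, and a page with no links always triggers a random jump. The function name `random_surfer` is illustrative.

```python
import random


def random_surfer(links, d=0.85, steps=20000, seed=0):
    """Estimate PageRank as the fraction of steps spent on each page."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)        # start on a random page
    for _ in range(steps):
        visits[current] += 1
        targets = links[current]
        if targets and rng.random() < d:
            current = rng.choice(targets)   # click a random link on the page
        else:
            current = rng.choice(pages)     # bored: jump to a random page
    return {page: count / steps for page, count in visits.items()}


# Made-up graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = random_surfer(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page B has no incoming links, so the surfer only reaches it through random jumps and it gets the smallest share of steps.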
  • 36. Build Your Own Google: the basic open source tools
  • 37. The Google Stack (vs Yahoo'ish/Open Source): conceptual overview
      • Google architecture, top to bottom: apps (Google Apps, Gmail..., Search, App Engine, index, crawl) in Python, Java, C++, Sawzall, other; Google's Secret Sauce (GWQ, MapReduce, BigTable, Chubby Lock, GFS / GFS II); interior network (IPv6); RHEL 2.6.X PAE; server hardware; rack; DC; exterior network
      • Open Source (Yahoo'ish) architecture, top to bottom: client application in Pig Latin, Python, PHP, Java... anything; open source layer (task queue, job tracker, Hadoop MapReduce framework, HBase as the BigTable equivalent, HDFS from Hadoop; other tools such as crawlers and indexers are readily available); interior network (IPv6); CentOS 2.6.X PAE; server hardware; rack; DC; exterior network
  • 38. END (Thank you)
  • 39. Pre-presentation: the Google philosophy (according to Ed)
      • Jedis build their own lightsabres (the MS "eat your own dog food")
      • Parallelize everything
      • Distribute everything (to atomic level if possible)
      • Compress everything (CPU is cheaper than bandwidth)
      • Secure everything (you can never be too paranoid)
      • Cache (almost) everything
      • Redundantize everything (in triplicate usually)
      • Latency is VERY evil
  • 40. Special thanks to: The Anatomy of the Google Architecture, "The Unofficial Version", V1.0, November 2009 • Ed Austin • {ed, edik} @i-dot.com
  • 41. Keep learning. For any suggestions on topics, feedback etc., contact OpenTalk@aditi.com