The Technology That Supports Google (구글을 지탱하는 기술) – chapter1.ppt
1. First Appearance of Google
2. Main Concepts
3. Search Engine Structure
    - Each Component's Role
    - Back-end Structure
    - Index Structure
4. Total Structure
First Appearance of Google


• Why?
           Get useful results


• Who?
           Sergey Brin & Larry Page
Main Concepts



Scaling by adding hardware


Ranking Function
         – Page Rank
         – Anchor Text
         – Word
Search Engine Structure




[Diagram: the Search Engine and the Internet]
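Search Engine Structure

Search Server's Role

[Diagram: Search Server, Index, Back-end]

• Manages communication
• Interprets each request and decides what needs to be processed
• Finds the required information in the index
• Edits the results and sends them to the user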
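
The last three steps above can be sketched as a small request handler. This is only an illustration: the `/search?q=...` URL shape, the function names, and the `find_in_index` callback are assumptions made for this example, and the networking side (communication management) is left out.

```python
from urllib.parse import parse_qs, urlparse


def interpret_request(request_url):
    """Interpret the request: pull the search words out of the query string.

    The parameter name "q" is an assumption for this sketch.
    """
    query = parse_qs(urlparse(request_url).query)
    return query.get("q", [""])[0].split()


def handle_request(request_url, find_in_index):
    """Interpret the request, consult the index, and edit the results.

    find_in_index is a hypothetical callback that takes a list of words
    and returns a list of (title, url) pairs.
    """
    words = interpret_request(request_url)
    hits = find_in_index(words)
    lines = [f"{n}. {title} - {url}" for n, (title, url) in enumerate(hits, 1)]
    return "\n".join(lines) or "No results"


def tiny_index(words):
    # Stand-in for the real index, used only to exercise the handler.
    if "sejong" in words:
        return [("Sejong University", "Sejong.ac.kr")]
    return []


print(handle_request("/search?q=sejong+university", tiny_index))
```

Passing the index lookup in as a callback keeps the search server separate from the index, which mirrors the split between the two components on the slide.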
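Search Engine Structure

Back-end's Role

[Diagram: Search Server, Index, Back-end]

• Crawling

     • The technique of collecting web pages

     • Takes a long time -> multiple crawlers are used

     • Collected pages are kept in the Repository


• Creating Index

     • Builds the index from the web pages stored in the Repository

     • Structure analysis, word processing, link processing, ranking, etc.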
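
Because crawling is slow, several crawlers run side by side. A minimal sketch of that idea, assuming a thread pool stands in for the separate crawlers and a plain dictionary stands in for the Repository; `crawl`, `fetch`, and `crawler_count` are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, fetch, crawler_count=3):
    """Download every URL with several crawler workers running in parallel.

    fetch is a hypothetical function that takes a URL and returns the page
    text. The returned dict (url -> page) plays the part of the Repository.
    """
    repository = {}
    with ThreadPoolExecutor(max_workers=crawler_count) as pool:
        # pool.map yields results in the same order as the input URLs.
        for url, page in zip(urls, pool.map(fetch, urls)):
            repository[url] = page
    return repository


def fake_fetch(url):
    # Stand-in for a real download so the sketch runs without a network.
    return f"<html><title>{url}</title></html>"


pages = crawl(["Sejong.ac.kr", "Cyworld.com"], fake_fetch)
print(sorted(pages))
```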
Search Engine Structure

Index's Role

[Diagram: Search Server, Index, Back-end]

• Stores the given data safely
• Finds the requested data
• Serves as the search engine's database
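Search Engine Structure
Back-end Structure

Crawling

The technique of collecting web pages

Early Google had 24 million web pages registered

Sustaining an average of 40 pages per second
requires keeping hundreds of downloads running at once

-> And now??

A Google search reports 3,070,000,000 results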
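
The two figures on this slide allow a quick back-of-the-envelope check. The page count and the rate come from the slide; the 5 seconds per download is an assumed value chosen only to show why the number of simultaneous downloads ends up in the hundreds.

```python
# Figures taken from the slide.
PAGES = 24_000_000
PAGES_PER_SECOND = 40

# Assumption for illustration only: average time to download one page.
ASSUMED_SECONDS_PER_PAGE = 5.0

SECONDS_PER_DAY = 86_400


def crawl_days(pages, pages_per_second):
    """Days needed to fetch all pages at a steady rate."""
    return pages / pages_per_second / SECONDS_PER_DAY


def concurrent_downloads(pages_per_second, seconds_per_page):
    """Little's law: downloads in flight = completion rate x time per download."""
    return pages_per_second * seconds_per_page


print(f"{crawl_days(PAGES, PAGES_PER_SECOND):.1f} days")                 # 6.9 days
print(concurrent_downloads(PAGES_PER_SECOND, ASSUMED_SECONDS_PER_PAGE))  # 200.0
```

At 40 pages per second the 24 million pages take about a week, and any per-page download time of a few seconds pushes the required concurrency into the hundreds.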
Search Engine Structure
Back-end Structure

Crawler

[Diagram: URL server, crawlers, Internet, Repository]

The URL server directs all of the crawlers

Each crawler downloads web pages as instructed

Pages are stored temporarily in the Repository

• docID – a unique numeric value
• url  – the URL
• text – the compressed page content
• etc. – date, page length…
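
The Repository record above can be sketched as a small data class. The field names follow the slide; the use of `zlib` for compression, the sequential docID assignment, and the class and method names are assumptions made for this example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RepositoryRecord:
    doc_id: int        # unique numeric value
    url: str
    text: bytes        # compressed page content
    fetched_on: str    # "etc." field: date
    page_length: int   # "etc." field: length before compression


class Repository:
    """In-memory stand-in for the Repository (hypothetical design)."""

    def __init__(self):
        self._records = {}
        self._next_doc_id = 1  # assumption: docIDs are handed out in order

    def store(self, url, html, fetched_on):
        raw = html.encode("utf-8")
        record = RepositoryRecord(
            self._next_doc_id, url, zlib.compress(raw), fetched_on, len(raw)
        )
        self._records[record.doc_id] = record
        self._next_doc_id += 1
        return record.doc_id

    def load(self, doc_id):
        """Return the original page text for a docID."""
        return zlib.decompress(self._records[doc_id].text).decode("utf-8")


repo = Repository()
doc_id = repo.store("Sejong.ac.kr", "<html>...</html>", "2024-01-01")
print(doc_id, repo.load(doc_id))
```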
Search Engine Structure
Back-end Structure

Crawler

[Diagram: URL server, crawlers, Internet, Repository]

Address resolution takes a lot of time
-> a DNS cache is maintained internally

After a page is stored in the Repository,
the URL server assigns the next address
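
A minimal sketch of an internal DNS cache, assuming a plain dictionary keyed by host name. The class name and the `lookups` counter are invented for this example, and cache expiry is left out to keep it short.

```python
import socket


class DnsCache:
    """Remember resolved addresses so each host is looked up only once."""

    def __init__(self, resolver=socket.gethostbyname):
        # The resolver is injectable so the sketch can run without a network.
        self._resolver = resolver
        self._cache = {}
        self.lookups = 0  # number of real (uncached) resolutions

    def resolve(self, host):
        if host not in self._cache:
            self.lookups += 1
            self._cache[host] = self._resolver(host)
        return self._cache[host]


def fake_resolver(host):
    # Stand-in resolver; 192.0.2.1 is a reserved documentation address.
    return "192.0.2.1"


cache = DnsCache(fake_resolver)
cache.resolve("Sejong.ac.kr")
cache.resolve("Sejong.ac.kr")
print(cache.lookups)  # 1
```

Only the first request for a host pays the resolution cost; later pages from the same host reuse the stored address.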
Search Engine Structure
Back-end Structure

Creating Index

Analyzing Web Page structures

Example page:

    <html>
    <head>
    <title>Sejong University</title>
    </head>
    <h1>Academic Information</h1>
    ….

Extracted record:

    docID   1
    url     Sejong.ac.kr
    title   Sejong University
    etc.    …

DocIndex
– Stores basic information about each web page
– Uses docID as the key
– Fields: docID, url, title, etc.

URLlist
– Uses url as the key
– Used to look up the docID
– Fields: url, docID
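
The two tables can be sketched as a pair of dictionaries that are filled in together. The title extraction uses Python's standard `html.parser`; the function and variable names are assumptions made for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def register_page(doc_id, url, html, doc_index, url_list):
    """Analyze one page and record it in both DocIndex and URLlist."""
    parser = TitleParser()
    parser.feed(html)
    doc_index[doc_id] = {"url": url, "title": parser.title}  # key: docID
    url_list[url] = doc_id                                   # key: url


doc_index, url_list = {}, {}
page = "<html><head><title>Sejong University</title></head><h1>Academic Information</h1></html>"
register_page(1, "Sejong.ac.kr", page, doc_index, url_list)
print(doc_index[1]["title"], url_list["Sejong.ac.kr"])
```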
Search Engine Structure
Back-end Structure

Creating Index

Word Index

Lexicon
 – word -> wordID

    word          wordID
    Sejong        101
    University    102
    Academic      201
    Information   202

Barrels
 – docID, wordID, position, size, etc.

    docID   wordID#1   Position#1   Size#1   Etc.#1
                       Position#2   Size#2   Etc.#2
            wordID#2   Position#1   Size#1   Etc.#1
                       Position#2   Size#2   Etc.#2
            …

Inverted Index
 – Uses wordID as the key
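
A minimal sketch of how the Lexicon, the Barrels, and the Inverted Index relate. It assumes whitespace tokenization, sequential wordIDs (the slide uses 101, 102, ...), and records only word positions; the size and other per-occurrence fields from the slide are omitted. All names are invented for this example.

```python
from collections import defaultdict


class WordIndex:
    def __init__(self):
        self.lexicon = {}  # word -> wordID
        # Barrels keyed by docID: docID -> wordID -> positions
        self.forward = defaultdict(lambda: defaultdict(list))
        # Inverted index keyed by wordID: wordID -> docID -> positions
        self.inverted = defaultdict(lambda: defaultdict(list))

    def word_id(self, word):
        """Return the wordID for a word, assigning a new one if needed."""
        if word not in self.lexicon:
            self.lexicon[word] = len(self.lexicon) + 1  # hypothetical ID scheme
        return self.lexicon[word]

    def add_document(self, doc_id, text):
        for position, word in enumerate(text.lower().split()):
            wid = self.word_id(word)
            self.forward[doc_id][wid].append(position)
            self.inverted[wid][doc_id].append(position)

    def documents_containing(self, word):
        wid = self.lexicon.get(word.lower())
        return sorted(self.inverted[wid]) if wid is not None else []


index = WordIndex()
index.add_document(1, "Sejong University Academic Information")
index.add_document(3, "Academic calendar")  # made-up text for a second page
print(index.documents_containing("Academic"))  # [1, 3]
```

Keying the same occurrences by wordID instead of docID is what lets a query go straight from a word to the documents that contain it.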
Search Engine Structure
Back-end Structure

Creating Index

Link Index

[Diagram: the page with docID 1 (url Sejong.ac.kr) links to the page with docID 3 (url Cyworld.com)]

URLlist

    url            docID
    Sejong.ac.kr   1
    Cyworld.com    3

Links

    from docID   to docID
    1            3

Anchortext
- Information about the linked page
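
A sketch of how link records might be produced: each `<a>` element is turned into a (from docID, to docID, anchor text) triple by looking the target up in the URLlist. The parser uses Python's standard `html.parser`; the record layout and names are assumptions, and links to URLs missing from the URLlist are simply skipped here.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def link_records(from_doc_id, html, url_list):
    """Turn a page's links into (from docID, to docID, anchor text) triples."""
    parser = LinkParser()
    parser.feed(html)
    return [
        (from_doc_id, url_list[href], text)
        for href, text in parser.links
        if href in url_list
    ]


url_list = {"Sejong.ac.kr": 1, "Cyworld.com": 3}
page = '<p>See <a href="Cyworld.com">Cyworld</a></p>'
print(link_records(1, page, url_list))  # [(1, 3, 'Cyworld')]
```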
Search Engine Structure
Back-end Structure

Creating Index

Ranking Index

Page Rank - Link
                       Links between web pages are analyzed as a kind of vote
                       -> a document that receives more links = a better document
Anchortext
Word       - Barrels
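
The voting idea can be illustrated with the usual iterative PageRank computation. This is a textbook-style sketch, not the slide author's or Google's production method; the damping factor of 0.85, the iteration count, and the handling of pages without outgoing links are assumptions.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Compute a rank per docID from a dict of docID -> linked docIDs."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {doc: 1.0 / n for doc in nodes}
    for _ in range(iterations):
        new_rank = {doc: (1.0 - damping) / n for doc in nodes}
        for doc in nodes:
            targets = links.get(doc, [])
            if targets:
                # Each page splits its vote evenly among the pages it links to.
                share = damping * rank[doc] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in nodes:
                    new_rank[target] += damping * rank[doc] / n
        rank = new_rank
    return rank


# Made-up link graph: pages 1 and 2 both link to 3, and 3 links back to 1.
ranks = page_rank({1: [3], 2: [3], 3: [1]})
print(max(ranks, key=ranks.get))  # 3
```

Page 3 receives the most votes and ends up ranked highest, while page 2, which nobody links to, ends up lowest.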
Search Engine Structure
Index Structure

[Diagram: DocIndex, Lexicon, Barrels]

DocIndex
– Stores basic information about each web page
– Uses docID as the key

Lexicon
– word -> wordID

Barrels
– Storage
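
A sketch of how a single-word lookup might pass through the three structures: the Lexicon turns the word into a wordID, the Barrels give the docIDs for that wordID, and the DocIndex supplies the page information. The sample data and the `search` function are made up for this example.

```python
# Sample data, invented for illustration.
lexicon = {"sejong": 101, "university": 102}   # word -> wordID
barrels = {101: {1: [0]}, 102: {1: [1]}}       # wordID -> docID -> positions
doc_index = {1: {"url": "Sejong.ac.kr", "title": "Sejong University"}}


def search(word, lexicon, barrels, doc_index):
    """Return the DocIndex entries of every page containing the word."""
    word_id = lexicon.get(word.lower())
    if word_id is None:
        return []
    return [doc_index[doc_id] for doc_id in sorted(barrels.get(word_id, {}))]


print(search("Sejong", lexicon, barrels, doc_index))
```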
Total Structure

[Diagram: overall structure, with columns User, Search Server, Index, Back-end, Internet]

Index
– DocIndex
– Lexicon
– Barrels

Back-end
– URL server and crawlers
– Repository
– Index creation: Structure, Word, Link, Ranking
– URLlist, Links
Thanks for your attention