The Technology That Supports Google (구글을 지탱하는 기술) – chapter1.ppt
1. First Appearance of Google
2. Main Concepts
3. Search Engine Structure
    - Roles (Search Server, Back-end, Index)
    - Back-end Structure
    - Index Structure
4. Total Structure
First Appearance of Google


• Why?
           Get useful results


• Who?
           Sergey Brin & Larry Page
Main Concepts



Scalable hardware


Ranking Function
         – PageRank
         – Anchor Text
         – Word
Search Engine Structure




[Diagram: Internet and Search Engine]
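
Search Engine Structure

Search Server's Role

[Diagram: Search Server, Index, Back-end]

• Manages communication
• Interprets each request and decides what has to be processed
• Finds the required information in the index
• Edits the results and sends them to the user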
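
The four duties above can be sketched as a small request pipeline. This is only an illustration: the toy index contents, the `example.ac.kr` entry, and the function names are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str


# Hypothetical toy index: word -> docIDs, and docID -> page details.
WORD_TO_DOCS = {"sejong": [1], "university": [1, 2]}
DOCS = {
    1: Page("Sejong.ac.kr", "Sejong University"),
    2: Page("example.ac.kr", "Example University"),
}


def parse_request(raw_query: str) -> list[str]:
    """Interpret the request: here, just lowercase and split into words."""
    return raw_query.lower().split()


def find_in_index(words: list[str]) -> list[int]:
    """Return the docIDs that contain every query word, in ascending order."""
    if not words:
        return []
    doc_sets = [set(WORD_TO_DOCS.get(w, [])) for w in words]
    return sorted(set.intersection(*doc_sets))


def format_results(doc_ids: list[int]) -> str:
    """Edit the hits into text that can be sent back to the user."""
    return "\n".join(f"{DOCS[d].title} - {DOCS[d].url}" for d in doc_ids)


def handle_request(raw_query: str) -> str:
    """Chain the three steps for one incoming query."""
    return format_results(find_in_index(parse_request(raw_query)))
```

Calling `handle_request("Sejong University")` walks through all three steps for one query; a real search server would also handle the network connection itself.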
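
Search Engine Structure

Back-end's Role

[Diagram: Search Server, Index, Back-end]

• Crawling

     • The technique of collecting web pages
     • Takes a long time -> multiple crawlers are used
     • Collected pages are kept in the Repository

• Creating Index

     • Builds the index from the web pages stored in the Repository
     • Structure analysis, word processing, link processing, ranking, etc.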

Search Engine Structure

Index's Role

[Diagram: Search Server, Index, Back-end]

• Stores the given data safely
• Finds the requested data
• Acts as the search engine's database
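
Search Engine Structure
Back-end Structure

Crawling

The technique of collecting web pages

Early Google: 24 million web pages registered

To sustain an average of 40 pages per second,
several hundred downloads have to be kept running at the same time

-> And now?

A Google search returned 3,070,000,000 results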
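
The link between 40 pages per second and several hundred simultaneous downloads can be checked with Little's law (items in flight = rate x time per item). The time per download is not given on the slide, so the 5 to 10 seconds used in the note below is an assumption.

```python
def concurrent_downloads(pages_per_second: float, seconds_per_page: float) -> float:
    """Little's law: downloads in flight = throughput * time each download takes."""
    return pages_per_second * seconds_per_page


def crawl_days(total_pages: int, pages_per_second: float) -> float:
    """Days needed to fetch total_pages at a steady rate."""
    seconds_per_day = 86_400
    return total_pages / pages_per_second / seconds_per_day
```

With an assumed 5 to 10 seconds per download, `concurrent_downloads(40, 5)` and `concurrent_downloads(40, 10)` give 200 and 400 downloads in flight, and `crawl_days(24_000_000, 40)` comes to roughly a week.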

Search Engine Structure
Back-end Structure

Crawler

[Diagram: URL server, three crawlers, Internet, Repository]

The URL server directs all of the crawlers

Each crawler downloads web pages as instructed

Pages are stored temporarily in the Repository

• docID – unique numeric value
• url  – URL
• text – compressed content
• etc. – date, page length…
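
A Repository entry with these fields might look like the sketch below. The slide only says the text is compressed; using `zlib`, the field names, and the date format are assumptions for the example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RepositoryRecord:
    doc_id: int        # unique numeric value
    url: str
    text: bytes        # compressed page body
    fetched_on: str    # "etc." field: date (string format is an assumption)
    page_length: int   # "etc." field: length of the uncompressed page in bytes


def make_record(doc_id: int, url: str, html: str, fetched_on: str) -> RepositoryRecord:
    """Compress the page and wrap it with its metadata."""
    raw = html.encode("utf-8")
    return RepositoryRecord(doc_id, url, zlib.compress(raw), fetched_on, len(raw))


def read_page(record: RepositoryRecord) -> str:
    """Recover the original page text from a stored record."""
    return zlib.decompress(record.text).decode("utf-8")
```

Keeping the uncompressed length next to the compressed body lets later stages size buffers or report page length without decompressing first.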

Search Engine Structure
Back-end Structure

Crawler

[Diagram: URL server, three crawlers, Internet, Repository]

Address resolution takes a lot of time
-> a DNS cache is maintained internally

After a page is saved to the Repository,
the URL server assigns the next address
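
The crawl loop and the DNS cache can be sketched as follows. The class names and the injected `resolve` and `fetch` callables are hypothetical; they stand in for real network code so the example stays self-contained.

```python
from collections import deque
from typing import Callable, Optional
from urllib.parse import urlsplit


class URLServer:
    """Hands out one URL at a time to whichever crawler asks next."""

    def __init__(self, urls: list[str]) -> None:
        self._pending = deque(urls)

    def next_url(self) -> Optional[str]:
        return self._pending.popleft() if self._pending else None


class Crawler:
    def __init__(self, resolve: Callable[[str], str],
                 fetch: Callable[[str, str], str]) -> None:
        self._resolve = resolve   # slow hostname -> IP lookup
        self._fetch = fetch       # downloads a page given (ip, url)
        self._dns_cache: dict[str, str] = {}

    def _lookup(self, host: str) -> str:
        # Call the slow resolver only the first time a host is seen.
        if host not in self._dns_cache:
            self._dns_cache[host] = self._resolve(host)
        return self._dns_cache[host]

    def run(self, server: URLServer, repository: dict[str, str]) -> None:
        # Save each page, then ask the URL server for the next address.
        while (url := server.next_url()) is not None:
            host = urlsplit(url).hostname or ""
            repository[url] = self._fetch(self._lookup(host), url)
```

In a real crawler `resolve` could be something like `socket.gethostbyname`; here two URLs on the same host trigger only one lookup.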
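
Search Engine Structure
Back-end Structure

Creating Index

Analyzing Web Page structures

Example page (docID 1, url Sejong.ac.kr):

     <html>
     <head>
     <title>Sejong University</title>
     </head>
     <body>
     <h1>Academic Information</h1>
     ….

Extracted fields:

     docID   1
     url     Sejong.ac.kr
     Title   Sejong University
     etc.    …

DocIndex
– Stores the basic information of each web page
– Uses docID as the key
– Columns: docID, url, title, etc.

URLlist
– Uses url as the key
– Used to look up the docID
– Columns: url, docID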
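
A minimal sketch of filling both tables from one page, using Python's standard `html.parser`. The `TitleParser` class, the `register_page` helper, and the dictionary layout are assumptions for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


doc_index: dict[int, dict[str, str]] = {}   # key: docID
url_list: dict[str, int] = {}               # key: url


def register_page(doc_id: int, url: str, html: str) -> None:
    """Analyze the page structure and record it in DocIndex and URLlist."""
    parser = TitleParser()
    parser.feed(html)
    doc_index[doc_id] = {"url": url, "title": parser.title.strip()}
    url_list[url] = doc_id
```

The two dictionaries hold the same pairing in opposite directions: `doc_index` answers "what is page 1?", while `url_list` answers "which docID belongs to this URL?".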

Search Engine Structure
Back-end Structure

Creating Index

Word Index

Lexicon
 – word -> wordID

     word          wordID
     Sejong        101
     University    102
     Academic      201
     Information   202

Barrels
 – docID wordID position size etc.

     docID    wordID#1   Position#1   Size#1    Etc.#1
                         Position#2   Size#2    Etc.#2
              wordID#2   Position#1   Size#1    Etc.#1
                         Position#2   Size#2    Etc.#2
              …

Inverted Index
 – Uses wordID as the key
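
The relationship between the Lexicon, the docID-keyed Barrels, and the wordID-keyed Inverted Index can be sketched like this. It keeps only word positions (size and other fields are left out), and the sequential wordID numbering is an assumption; the 101, 102, ... values on the slide are example data.

```python
from collections import defaultdict

lexicon: dict[str, int] = {}                   # word -> wordID
# Forward barrel: docID -> wordID -> positions of that word in the document.
forward: dict[int, dict[int, list[int]]] = {}


def word_id(word: str) -> int:
    """Look up a wordID, assigning the next free number to unseen words."""
    if word not in lexicon:
        lexicon[word] = len(lexicon) + 1
    return lexicon[word]


def index_document(doc_id: int, text: str) -> None:
    """Record every word occurrence of one document in the forward barrel."""
    hits: dict[int, list[int]] = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        hits[word_id(word)].append(position)
    forward[doc_id] = dict(hits)


def invert() -> dict[int, list[tuple[int, list[int]]]]:
    """Re-key the barrel by wordID: wordID -> [(docID, positions), ...]."""
    inverted: dict[int, list[tuple[int, list[int]]]] = defaultdict(list)
    for doc_id in sorted(forward):
        for wid, positions in forward[doc_id].items():
            inverted[wid].append((doc_id, positions))
    return dict(inverted)
```

Keying by wordID is what makes a query fast: the search side looks up each query word once and reads off the matching documents directly, instead of scanning every document.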

Search Engine Structure
Back-end Structure

Creating Index

Link Index

Example: the page with docID 1 (url Sejong.ac.kr) links to the page with docID 3 (url Cyworld.com)

URLlist

     url             docID
     Sejong.ac.kr    1
     Cyworld.com     3

Links

     docID (source)   docID (target)
     1                3

Anchortext
- Information about the linked page
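
The link-processing step can be sketched as turning URL pairs into docID pairs through the URLlist, while filing the anchor text under the page being linked to. The `add_link` helper and the choice to skip unknown URLs are assumptions for the example.

```python
from collections import defaultdict

url_list = {"Sejong.ac.kr": 1, "Cyworld.com": 3}   # url -> docID, as on the slide

links: list[tuple[int, int]] = []                      # (source docID, target docID)
anchor_text: dict[int, list[str]] = defaultdict(list)  # target docID -> anchor texts


def add_link(source_url: str, target_url: str, text: str) -> bool:
    """Record one hyperlink; skip it if either page has no docID yet."""
    if source_url not in url_list or target_url not in url_list:
        return False
    source, target = url_list[source_url], url_list[target_url]
    links.append((source, target))
    anchor_text[target].append(text)
    return True
```

Storing the anchor text with the target rather than the source matches the slide's description of it as information about the linked page.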

Search Engine Structure
Back-end Structure

Creating Index

Ranking Index

Page Rank - Link
                       Analyzes the links between web pages as a kind of vote
                       -> a document that receives more links = a better document
Anchortext
Word       - Barrels
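
The voting idea can be sketched with a simplified iterative PageRank over (source, target) docID pairs. The damping factor of 0.85, the fixed iteration count, and the handling of pages without outgoing links are assumptions of this sketch, not details from the slides.

```python
def pagerank(links: list[tuple[int, int]], damping: float = 0.85,
             iterations: int = 50) -> dict[int, float]:
    """Return a score per docID; scores sum to 1."""
    pages = sorted({p for link in links for p in link})
    n = len(pages)
    if n == 0:
        return {}
    out_links: dict[int, list[int]] = {p: [] for p in pages}
    for source, target in links:
        out_links[source].append(target)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # A page with no outgoing links spreads its vote over all pages.
            targets = out_links[page] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

For the links `[(1, 3), (2, 3), (3, 1)]`, page 3 receives two votes and ends up ranked highest, and page 2, which nobody links to, ends up lowest. A vote also counts for more when the page casting it is itself highly ranked.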

Search Engine Structure
Index Structure

[Diagram: DocIndex, Lexicon, Barrels]

DocIndex
– Stores the basic information of each web page
– Uses docID as the key

Lexicon
– word -> wordID

Barrels
– storage

Total Structure

[Diagram of the whole system, grouped by column]
User
Search Server
Index: DocIndex, Lexicon, Barrels
Back-end: crawlers, URL server, Repository, URLlist, Links; index-building steps: Structure, word, Link, Ranking
Internet
Thanks for your attention