• Save
구글을 지탱하는 기술
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
1,779
On Slideshare
1,709
From Embeds
70
Number of Embeds
3

Actions

Shares
Downloads
0
Comments
0
Likes
0

Embeds 70

http://kms.sec.samsung.net 66
http://hellogomtae.blogspot.com 2
http://hellogomtae.blogspot.kr 2

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. 구글을 지탱하는 기술
  • 2. 구글을 지탱하는 기술 – chapter1.ppt
  • 3. 1. First Appearance of Google 2. Main Concepts 3. Search Engine Structure - ‘s Roll - Back-end Structure - Index Structure 4. Total Structure
  • 4. First Appearance of Google • Why? Get useful results • Who? Sergey Brin & Larry Page
  • 5. Main Concepts Hardware expands Ranking Function – Page Rank – Anchor Text – Word
  • 6. Search Engine Structure Internet Search Engine
  • 7. Search Engine Structure Search Server’s Roll • 통신 관리 Back- Search Index Server end • 요청 해석하여 처리할 내용 판단 • 인덱스에서 필요한 정보 찾아냄 • 결과를 편집해 이용자에게 보냄
  • 8. Search Engine Structure Back-end’s Roll • Crawling •Web page 수집해 오는 기술 Back- Search Index Server end •많은 시간 -> 복수의 crawler 사용 •수집한 것을 Repository에 보관 • Creating Index •Repository에 저장된 web page 로 Index를 만들어 냄 •구조분석, 단어처리, 링크 처리 랭킹 등
  • 9. Search Engine Structure Index’s Roll • 주어진 Data를 안전하게 저장 Back- Search Index Server end • 요청 받은 Data를 찾아냄 • Search Engine의 Data Base 역 할
  • 10. Search Engine Structure Back-end Structure Crawling Web page 수집해 오는 기술 초기 Google 2400만개 Web Page 등록 초당 avg40page를 유지하기 위해선 동시에 수백 개의 download유지 -> 현재는?? 구글 검색했을 때 3,070,000,000개 결과
  • 11. Search Engine Structure Back-end Structure URL server crawler Crawler crawler URL server 가 전체 crawler 지휘 각 crawler는 지시에 따라 crawler Internet Web Page download Repository에 임시 저장 • docID – 고유 숫자 값 Repository • url – URL • text – 압축물 • etc. – date, page length…
  • 12. Search Engine Structure Back-end Structure URL server crawler Crawler crawler 주소해석이 시간 많이 소요 -> 내부에 DNS cache 관리 crawler Internet Repository에 저장후 URL server가 다음주소 할당 Repository
  • 13. Search Engine Structure Back-end Structure docID Sejong.ac.k url r <html> 1 <head> Creating Index <title>세종대학교</title> </body> <h1>학사정보<h1> 세종대학교 Title …. 기타 … Analyzing Web Page structures DocIndex – Web Page의 기본정보 저장 – docID를 key로 사용 DocIndex URLlist URLlist – url을 key로 사용 docID url title etc. url docID – docID를 가져오기 위함
  • 14. Search Engine Lexicon Structure word wordID Back-end Structure 세종 101 Barrels 대학교 102 학사 201 Creating Index 정보 202 Barrels docID wordID#1 Position#1 Size#1 Etc.#1 Word Index Position#2 Size#2 Etc.#2 Lexicon wordID#2 Position#1 Size#1 Etc.#1 – word -> wordID Position#2 Size#2 Etc.#2 … Barrels – docID wordID position size etc. Inverted Index – wordID를 Key로 사용
  • 15. Search Engine Structure Back-end Structure docID Sejong.ac.k docID 3 Creating Index url r url Cyworld.com 1 Link Link Index URLlist URLlist Links Links Sejong.ac.kr 1 1 3 Cyworld.com 3 Anchortext - A information of linked page
  • 16. Search Engine Structure Back-end Structure Creating Index Ranking Index Page Rank - Link Web Page 사이의 link를 일종의 투표처럼 분석 -> 더 많은 link를 받은 문서 = 더 좋은 문서 Anchortext Word - Barrels
  • 17. Search Engine Structure DocIndex Index Structure Lexicon DocIndex – Web Page의 기본정보 저장 – docID를 key로 사용 Lexicon – word -> wordID Barrels Barrels – storages
  • 18. Total Structure User Index Back-end Internet crawler DocIndex Search Server crawler Lexicon crawler Structure URL server word Barrels Barrels Barrels Repository Link URLlist Ranking Links
  • 19. Thanks for your attention