Mathematics in Internet Information Retrieval. Zhi-Ming Ma, April 24, 2009, Xiamen. Email: mazm@amt.ac.cn  http://www.amt.ac.cn/member/mazhiming/index.html
 
How can Google rank 2,040,000 pages in 0.11 seconds?
A main task of Internet (Web) information retrieval is the design and analysis of search engine (SE) algorithms, which involves plenty of mathematics.
The Internet is a large-scale complex random network. The Earth is developing an electronic nervous system, a network with diverse nodes and links.
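Search engine workflow (diagram with an online part and an offline part). Components shown: Web, web crawler, page parser, cached pages, Pages, Links & Anchors, Page & Site database, web graph builder, web graph, Link Map, Link Analysis, Page Ranks, index builder, inverted index, Indexing and Ranking, cache, query, user interface.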
Static Rank
Dynamic Rank
Research on Complex Networks and Information Retrieval
 
 
Outline
 
 
 
 
 
HITS (1998, Jon Kleinberg, Cornell University) and PageRank
Nevanlinna Prize (2006): Jon Kleinberg
PageRank, the ranking system used by the Google search engine.
 
Markov chain describing  surfing behavior
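More generally, we may consider a personalized preference vector d. PageRank is the unique positive eigenvector of the transition matrix of the surfing chain, normalized to sum to 1. By the strong ergodic theorem, it is also the long-run fraction of time the random surfer spends on each page.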
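The surfing chain above can be sketched as a power iteration. This is a minimal illustration, not the slides' own formulation: the damping factor `alpha = 0.85`, the treatment of dangling pages (they jump according to `d`), the function name `pagerank`, and the toy graph are all assumptions.

```python
def pagerank(links, alpha=0.85, d=None, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary distribution of the surfing chain.

    links: dict mapping each page to the list of pages it links to.
    alpha: damping factor (assumed value, not taken from the slides).
    d:     personalized preference distribution; uniform if None.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    if d is None:
        d = {p: 1.0 / n for p in pages}
    rank = dict(d)  # start from the preference distribution
    for _ in range(max_iter):
        new = {p: 0.0 for p in pages}
        dangling = 0.0  # probability mass sitting on pages without out-links
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = rank[p] / len(outs)
                for q in outs:
                    new[q] += alpha * share
            else:
                dangling += rank[p]
        # random jump plus redistributed mass of dangling pages
        for p in pages:
            new[p] += (1.0 - alpha + alpha * dangling) * d[p]
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical four-page web: page "d" has no in-links.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy)
```

Passing a non-uniform `d` gives the personalized variant: the random jump then lands on pages in proportion to `d` instead of uniformly.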
Problem:
 
 
PageRank as a Function of the Damping Factor. Paolo Boldi, Massimo Santini, Sebastiano Vigna, DSI, Università degli Studi di Milano. WWW 2005 paper. Sections: 3 General Behaviour; 3.1 Choosing the damping factor; 3.2 Getting close to 1.
is the limit distribution of  P  when the starting distribution is uniform, that is, Conjecture 1   :
Research results by our group:
Weak points of PageRank
 
Letting Web Users Vote for Page Importance (09/09/10, Yuting Liu @ SIGIR'08)
 
Browsing Process
 
 
 
BrowseRank: User browsing graph
Vertex: Web page
Edge: Transition
Edge weight w_ij: The number of transitions from page i to page j
Staying time T_i: The time spent on page i
Reset probability: Normalized frequency of the page as the first page of a session
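The quantities of the user browsing graph can be collected from browsing sessions roughly as follows. This is a sketch under assumptions: the session format (an ordered list of `(page, seconds)` visits) and the name `build_browsing_graph` are hypothetical, and real logs would need session segmentation first.

```python
from collections import defaultdict


def build_browsing_graph(sessions):
    """Collect edge weights, staying times, and reset probabilities.

    sessions: list of sessions; each session is an ordered list of
              (page, seconds_spent) visits (hypothetical log format).
    """
    weights = defaultdict(int)  # w_ij: number of observed transitions i -> j
    stay = defaultdict(list)    # observed staying times T_i on page i
    first = defaultdict(int)    # how often page i is the first page of a session
    for session in sessions:
        if not session:
            continue
        first[session[0][0]] += 1
        for page, seconds in session:
            stay[page].append(seconds)
        for (src, _), (dst, _) in zip(session, session[1:]):
            weights[(src, dst)] += 1
    total = sum(first.values())
    # reset probability: normalized frequency as first page of a session
    reset = {page: count / total for page, count in first.items()}
    return dict(weights), dict(stay), reset


toy_sessions = [
    [("a", 10), ("b", 30)],
    [("a", 20), ("c", 5), ("b", 15)],
    [("b", 40)],
]
toy_w, toy_stay, toy_reset = build_browsing_graph(toy_sessions)
```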
Mathematical Deduction
Maximum likelihood estimation of staying time.
Assume the observation noise follows a chi-square distribution with k degrees of freedom.
Ideally we would have: ... However, due to data sparseness, we encounter challenges.
To tackle this challenge, we turn it into optimization problems.
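One way to combine transitions and staying times into a page score is sketched below. It relies on a standard fact about continuous-time Markov processes: the stationary distribution is proportional to the stationary distribution of the embedded jump chain times the mean staying time. Everything else is an assumption for illustration: the name `browserank`, the damping value `alpha = 0.85`, and in particular the use of a plain mean staying time per page instead of the noise-aware estimate obtained from the optimization problems on the slides.

```python
def browserank(weights, mean_stay, reset, alpha=0.85, iters=200):
    """Score pages by embedded-chain stationary probability times mean staying time.

    weights:   dict (i, j) -> number of transitions i -> j
    mean_stay: dict page -> mean staying time (simplified estimate, see lead-in)
    reset:     dict page -> reset probability (must sum to 1)
    """
    pages = sorted(mean_stay)
    out_total = {p: 0 for p in pages}
    for (src, _), w in weights.items():
        out_total[src] += w
    pi = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: 0.0 for p in pages}
        # mass on pages with no observed outgoing transition restarts a session
        lost = sum(pi[p] for p in pages if out_total[p] == 0)
        for (src, dst), w in weights.items():
            new[dst] += alpha * pi[src] * w / out_total[src]
        for p in pages:
            new[p] += (1.0 - alpha + alpha * lost) * reset.get(p, 0.0)
        pi = new
    raw = {p: pi[p] * mean_stay[p] for p in pages}
    z = sum(raw.values())
    return {p: s / z for p, s in raw.items()}


# Hypothetical toy inputs.
toy_weights = {("a", "b"): 1, ("a", "c"): 1, ("c", "b"): 1}
toy_mean_stay = {"a": 15.0, "b": 28.0, "c": 5.0}
toy_reset_prob = {"a": 2 / 3, "b": 1 / 3}
toy_scores = browserank(toy_weights, toy_mean_stay, toy_reset_prob)
```

With this construction a page gains importance both by being visited often and by holding users longer, which is the intuition behind letting users vote with their browsing behavior.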
 
Experiments
Website-level: Find good
Website-level: Fight spam
 
BrowseRank: Letting Web Users Vote for Page Importance. Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li. July 23, 2008, Singapore, the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Best student paper!
Further Studies
Dynamic Rank
Outline
Learning to Rank (diagram: Learning System, Model, Ranking System; objective: min Loss). Source: Wei-Ying Ma, Microsoft Research Asia.
Learning to rank in IR is a two-layer statistical learning problem.
Document level vs. query level
Microsoft Scholar Fellowship
The two-layer structure of the training data is not artificial but arises from the real world, especially from learning to rank in Information Retrieval.
Two-Layer Statistical Learning Framework
Instances and descriptions of instances. Instances are the objectives we are concerned with.
Possible forms of an instance:
a score (or label) of a document
an order on a pair of documents
a permutation (list) of documents
Training Process i.i.d. For each i,  the associated samples ,  distribution the training data is denoted as
Empirical object-level loss; loss function; expected object-level loss
Expected risk
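The difference between averaging at the object (query) level and pooling all instances can be seen in a small sketch. The function names and the loss values are hypothetical; the per-instance losses are assumed to be already computed by some loss function.

```python
def query_level_risk(losses_per_query):
    """Average the per-query mean loss, so every query counts equally."""
    per_query = [sum(ls) / len(ls) for ls in losses_per_query if ls]
    return sum(per_query) / len(per_query)


def pooled_risk(losses_per_query):
    """Average over all instances, so queries with more instances dominate."""
    flat = [x for ls in losses_per_query for x in ls]
    return sum(flat) / len(flat)


# Hypothetical losses: a small query ranked badly, a large query ranked well.
toy_losses = [[1.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
q_risk = query_level_risk(toy_losses)  # (1.0 + 0.0) / 2
p_risk = pooled_risk(toy_losses)       # 2.0 / 8
```

The pooled average hides the poorly served small query, while the query-level average does not, which is why the two-layer view matters for ranking.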
Generalization Analysis based on Stability Theory
Definition: We say an algorithm possesses object-level uniform leave-one-out stability, abbreviated as object-level stability, if a bound holds between the function learned from the full training data and the function learned from the training data with one object left out.
Generalization based on Object-level Stability: with probability at least ..., a bound holds that depends on the object-level stability and the number of training objects.
Note: if ..., then the bound makes sense. This condition can be satisfied in many practical cases. As case studies, we investigate Ranking SVM and RankBoost. We show that after introducing query-level normalization to its objective function, Ranking SVM will have query-level stability. For RankBoost, query-level stability can be achieved if we introduce both query-level normalization and regularization to its objective function. These analyses agree largely with our experiments and with the experiments in Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006 [5] and [11].
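Query-level normalization of a pairwise hinge-loss objective might look like the sketch below. The exact objective on the slides is not preserved, so the form used here (an L2 penalty with weight `lam` plus an average over queries of the per-query mean hinge loss) is an assumption, as are the names `rsvm_objective`, `hinge`, and `dot`.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge(margin):
    return max(0.0, 1.0 - margin)


def rsvm_objective(w, queries, lam=0.1, normalize=True):
    """Pairwise hinge-loss objective of a linear ranker.

    queries:   list of queries; each query is a list of (preferred, other)
               feature-vector pairs.
    normalize: if True, each query's loss is divided by its number of pairs
               (query-level normalization); otherwise all pairs are pooled.
    """
    total = 0.0
    for pairs in queries:
        if not pairs:
            continue
        q_loss = sum(hinge(dot(w, p) - dot(w, o)) for p, o in pairs)
        total += q_loss / len(pairs) if normalize else q_loss
    denom = len(queries) if normalize else sum(len(pairs) for pairs in queries)
    return lam * dot(w, w) + total / denom


# Hypothetical data: query 1 has one correctly ordered pair,
# query 2 has three wrongly ordered pairs.
toy_w = [1.0, 0.0]
toy_queries = [
    [([1.0, 0.0], [0.0, 0.0])],
    [([0.0, 0.0], [1.0, 0.0])] * 3,
]
normalized = rsvm_objective(toy_w, toy_queries, normalize=True)
pooled = rsvm_objective(toy_w, toy_queries, normalize=False)
```

In the pooled form the query with more pairs dominates the loss; dividing by the number of pairs per query bounds the influence of any single query, which is the property the stability argument relies on.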
Query-level Empirical Risk. Generalization Bound:
Generalization Bounds Comparison: Modified RSVM
RankBoost with Query-level Normalization and Regularization
Query-level normalization alone cannot make RankBoost have query-level stability.
Experimental Results (I)
Experimental Results (II)
Future Problems and Challenges
Outline
Thank you!
 
 

More Related Content

What's hot

Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
Editor IJARCET
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
Ding Li
 
Computing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engineComputing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engine
csandit
 
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
IJDKP
 
Done reread deeperinsidepagerank
Done reread deeperinsidepagerankDone reread deeperinsidepagerank
Done reread deeperinsidepagerank
James Arnold
 
Sub1579
Sub1579Sub1579
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
Traian Rebedea
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
IOSR Journals
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
Saeedeh Shekarpour
 
Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0
Ed Chi
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
CloudTechnologies
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics Presentation
Skylar Ritchie
 
Approaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep ProjectApproaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep Project
UKOLN (dev), University of Bath
 
Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257
Editor IJARCET
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
Pvrtechnologies Nellore
 
Context-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML DataContext-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML Data
1crore projects
 

What's hot (17)

Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020Volume 2-issue-6-2016-2020
Volume 2-issue-6-2016-2020
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Computing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engineComputing semantic similarity measure between words using web search engine
Computing semantic similarity measure between words using web search engine
 
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
AN EFFECTIVE RANKING METHOD OF WEBPAGE THROUGH TFIDF AND HYPERLINK CLASSIFIED...
 
Done reread deeperinsidepagerank
Done reread deeperinsidepagerankDone reread deeperinsidepagerank
Done reread deeperinsidepagerank
 
Sub1579
Sub1579Sub1579
Sub1579
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0Using Information Scent to Model Users in Web1.0 and Web2.0
Using Information Scent to Model Users in Web1.0 and Web2.0
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics Presentation
 
Approaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep ProjectApproaches to automated metadata extraction : FixRep Project
Approaches to automated metadata extraction : FixRep Project
 
Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Entity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutionsEntity linking with a knowledge base issues techniques and solutions
Entity linking with a knowledge base issues techniques and solutions
 
Context-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML DataContext-Based Diversification for Keyword Queries over XML Data
Context-Based Diversification for Keyword Queries over XML Data
 

Similar to Mazhiming

Macran
MacranMacran
Macran
Pradip Rahul
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine Learning
Pradip Rahul
 
K1803057782
K1803057782K1803057782
K1803057782
IOSR Journals
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET Journal
 
Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web search
Emrullah Delibas
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
ijcsa
 
PageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportPageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey report
IOSR Journals
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
Zac Darcy
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IJwest
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms
dannyijwest
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
butest
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
butest
 
A017250106
A017250106A017250106
A017250106
IOSR Journals
 
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
iosrjce
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
IOSR Journals
 
H017124652
H017124652H017124652
H017124652
IOSR Journals
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
inventionjournals
 
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
ijnlc
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
kevig
 

Similar to Mazhiming (20)

Macran
MacranMacran
Macran
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine Learning
 
K1803057782
K1803057782K1803057782
K1803057782
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
 
Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web search
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
 
PageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey reportPageRank algorithm and its variations: A Survey report
PageRank algorithm and its variations: A Survey report
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
 
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMSIDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
IDENTIFYING IMPORTANT FEATURES OF USERS TO IMPROVE PAGE RANKING ALGORITHMS
 
Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms Identifying Important Features of Users to Improve Page Ranking Algorithms
Identifying Important Features of Users to Improve Page Ranking Algorithms
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
A017250106
A017250106A017250106
A017250106
 
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
Implemenation of Enhancing Information Retrieval Using Integration of Invisib...
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
 
H017124652
H017124652H017124652
H017124652
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
 
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 

Recently uploaded

PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
ArianaBusciglio
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
MATATAG CURRICULUM: ASSESSING THE READINESS OF ELEM. PUBLIC SCHOOL TEACHERS I...
NelTorrente
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 

Recently uploaded (20)

PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 

Mazhiming

  • 1. Mathematics in Internet Information Retrieval. Zhi-Ming Ma, April 24, 2009, Xiamen. Email: mazm@amt.ac.cn http://www.amt.ac.cn/member/mazhiming/index.html
  • 2.  
  • 3. How can Google rank 2,040,000 pages in 0.11 seconds?
  • 4. A main task of Internet (Web) information retrieval = design and analysis of search engine (SE) algorithms, involving plenty of mathematics
  • 5. The Internet is a large-scale complex random network. The Earth is developing an electronic nervous system, a network with diverse nodes and links are
  • 6. Search engine workflow (diagram). Labels: Web, Links & Anchors, Pages, Link Map, Query, Online part, Offline part, Link Analysis, Cache, Page parser, Inverted index, Page & Site, Database, Web graph, Web crawler, User interface, Cached pages, Index editor, Page Ranks, Web graph generator, Indexing and Ranking
  • 7.
  • 8.
  • 9.
  • 10.  
  • 11.  
  • 12.
  • 13.  
  • 14.  
  • 15.  
  • 16.  
  • 17.  
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.  
  • 23. Markov chain describing surfing behavior
  • 24. Markov chain describing surfing behavior
  • 25.
  • 26. where
  • 27. More generally, we may consider a personalized d. PageRank is the unique positive eigenvector: By the strong ergodic theorem:
  • 29.  
  • 30.  
  • 31.
  • 32. is the limit distribution of P when the starting distribution is uniform, that is, Conjecture 1 :
  • 33.
  • 34.
  • 35.  
  • 36.
  • 37.  
  • 38.
  • 39.  
  • 40.  
  • 41.  
  • 42. BrowseRank: User browsing graph. 09/09/10 Yuting Liu@SIGIR'08. Vertex: web page. Edge: transition. Edge weight w_ij: the number of transitions. Staying time T_i: the time spent on page i. Reset probability: normalized frequencies as first page of session.
  • 43. Mathematical Deduction. Maximum likelihood estimation of staying time
  • 45.
  • 46. Mathematical Deduction. Assume noise: chi-square distribution with k degrees of freedom
  • 47. Mathematical Deduction. Ideally we would have: However, due to data sparseness, we encounter challenges.
  • 48. Mathematical Deduction. To tackle this challenge, we turn it into optimization problems:
  • 49.  
  • 50.
  • 51.
  • 52.
  • 53.  
  • 54.
  • 55. Website-level: Find good
  • 56. Website-level: Fight spam
  • 57.  
  • 58. BrowseRank: Letting Web Users Vote for Page Importance. Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li. July 23, 2008, Singapore. The 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval. Best student paper!
  • 59.
  • 60.
  • 61.
  • 62.
  • 63. Learning to Rank (diagram: Model, Learning System, Ranking System, min Loss). Wei-Ying Ma, Microsoft Research Asia
  • 64.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70. Training Process i.i.d. For each i, the associated samples , distribution the training data is denoted as
  • 71.
  • 72. empirical object level loss loss function on expected object level loss
  • 73.
  • 74.
  • 75.
  • 76. Definition: We say an algorithm possesses object-level uniform leave-one-out stability (abbreviated as object-level stability) if: Function learned from training data. Function learned from training data.
  • 77. Generalization based on Object-level Stability Object-level stability The number of training objects With probability at least
  • 78. Note: if , then the bound makes sense. This condition can be satisfied in many practical cases. As case studies, we investigate Ranking SVM and RankBoost. We show that after introducing query-level normalization to its objective function, Ranking SVM will have query-level stability. For RankBoost, the query-level stability can be achieved if we introduce both query-level normalization and regularization to its objective function. These analyses agree largely with our experiments and the experiments in Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006 [5] and [11].
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87.
  • 89.  
  • 90.  
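
The Markov chain picture of PageRank on slides 23 to 27 (a random surfer whose limit distribution is the PageRank vector) can be illustrated with a short power-iteration sketch. The slide formulas did not survive extraction, so everything here is an assumption made for illustration: the function name `pagerank`, the damping value 0.85, the uniform jump distribution, and the choice to spread the mass of pages without out-links uniformly. It is not the author's exact formulation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution of a damped random-surfer chain.

    links: dict mapping each page to the list of pages it links to.
    damping, tol, max_iter: illustrative defaults, not taken from the slides.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Assumption: pages without out-links pass their mass to all pages uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each out-link carries an equal share of the page's current rank.
                new[q] += damping * rank[p] / len(outs)
        # Stop once the L1 change between successive iterates is below tol.
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

Starting from the uniform distribution mirrors the "limit distribution when the starting distribution is uniform" wording on slide 32. The iteration converges geometrically at a rate governed by the damping value.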

Editor's Notes

  1. Algorithm-based web search technology
  2. That is, when we compute page importance (PI), we should first base it on real user behavior: every transition reflects a user's real endorsement from one page to another, with no artificial assumptions about how users behave. Second, we should base it on complete user behavior: not only the transitions, but also the time users spend on web pages.
  3. Now we introduce the user browsing graph (UBG). First, consider a collection of user behavior data. Each record shows when the user visited a web page and with which visiting type. From a huge number of such records, we generate the user browsing graph. In this graph, a vertex denotes a web page and a directed edge denotes a transition between pages, and each edge carries the number of transitions as its weight. In addition, we collect the staying time and the reset probability as metadata for each vertex. In short, the UBG is a directed weighted graph with metadata on each vertex.
  4. In this setting, we do not distinguish between pages in the same website, and we collect the user behavior data from a commercial search engine. We compare BR with PR and TR. We want to show that BR is an effective method for finding good websites and fighting spam websites.
  5. This figure shows the results. It lists the top 20 websites ranked by the three methods, and we highlight the Web 2.0 websites, which are considered more important because more users visit them frequently and spend longer on them. From the figure, we find that BR ranks more Web 2.0 websites in top positions than PR and TR.
  6. For spam pages, we ran the following experiment. First, we use the three methods to produce a ranking list of all websites, and split each list into 15 buckets so that the sum of PR in each bucket is equal. The three columns of numbers give the number of spam websites in these buckets. We find that BR ranks fewer spam websites in the top buckets than the other two methods, and pushes most spam websites into the tail bucket.
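
Notes 3 and 6 describe two mechanical steps that a small sketch may make concrete: building the user browsing graph from session logs, and splitting a ranked list into buckets of equal total score to count spam sites per bucket. The log format (a session as a list of `(page, seconds_on_page)` pairs), the function names `build_browsing_graph` and `equal_mass_buckets`, and the greedy bucket assignment are all assumptions made for illustration; the original data layout and bucketing procedure are not given in the notes.

```python
from collections import defaultdict


def build_browsing_graph(sessions):
    """Build edge weights, staying times, and reset probabilities.

    sessions: list of sessions, each a list of (page, seconds_on_page) pairs
    in visit order (a hypothetical log format).
    """
    edge_weight = defaultdict(int)     # (i, j) -> number of transitions i -> j
    staying_times = defaultdict(list)  # i -> observed staying times on page i
    first_counts = defaultdict(int)    # i -> number of sessions that start at i
    for session in sessions:
        if not session:
            continue
        first_counts[session[0][0]] += 1
        for page, seconds in session:
            staying_times[page].append(seconds)
        for (src, _), (dst, _) in zip(session, session[1:]):
            edge_weight[(src, dst)] += 1
    total = sum(first_counts.values())
    # Reset probability: normalized frequency as the first page of a session.
    reset = {page: count / total for page, count in first_counts.items()}
    return dict(edge_weight), dict(staying_times), reset


def equal_mass_buckets(scores, n_buckets):
    """Split sites, ranked by score, into buckets of roughly equal total score.

    Greedy assignment (an assumption): a site goes to the bucket that contains
    the cumulative score mass accumulated before it.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = sum(scores.values())
    buckets = [[] for _ in range(n_buckets)]
    running = 0.0
    for site in ranked:
        idx = min(int(running / total * n_buckets), n_buckets - 1)
        buckets[idx].append(site)
        running += scores[site]
    return buckets


if __name__ == "__main__":
    logs = [[("a", 10), ("b", 5), ("a", 3)], [("b", 7), ("c", 2)]]
    edges, stays, reset = build_browsing_graph(logs)
    print(edges, stays, reset)

    buckets = equal_mass_buckets({"a": 4, "b": 2, "c": 1, "d": 1}, 2)
    spam = {"d"}  # hypothetical spam labels
    print([sum(site in spam for site in bucket) for bucket in buckets])
```

Counting labeled spam sites per bucket, as in the last line, gives the kind of per-bucket column described in note 6. With real scores the bucket sums are only approximately equal, because sites are never split across buckets.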