Internet 信息检索中的数学 (Mathematics in Internet Information Retrieval)
Published on: At 15:00 on April 23, 2009, Academician Zhi-Ming Ma gave a lecture in the third-floor lecture hall of the Keli Building at Xiamen University, titled "Mathematics in Internet Information Retrieval". The whole talk was excellent.
Transcript of "Internet 信息检索中的数学" (Mathematics in Internet Information Retrieval)
1. Internet 信息检索中的数学 (Mathematics in Internet Information Retrieval). Zhi-Ming Ma, April 24, 2009, Xiamen. Email: mazm@amt.ac.cn, http://www.amt.ac.cn/member/mazhiming/index.html
2. How can Google rank 2,040,000 pages in 0.11 seconds?
3. A main task of Internet (Web) Information Retrieval is the design and analysis of search engine (SE) algorithms, which involves plenty of mathematics.
4. The Internet is a large-scale complex random network. The Earth is developing an electronic nervous system, a network with diverse nodes and links ...
5. Search engine workflow (diagram). Offline part: web crawler, page parser, web graph generator, Page & Site database, link analysis (Page Ranks), index builder, inverted index. Online part: user interface, query, link map, cache, cached pages. Together these support indexing and ranking.
6. Static rank: importance ranking.
   - Goal: compute page importance (page authority).
   - Method: link analysis, based on the topology of the graph of the whole Web (page → node, hyperlink → edge).
   - Algorithms: HITS (Kleinberg) [5], PageRank (Google) [6].
7. Dynamic rank: relevance ranking.
   - Goal: compute the content-match relevance score between pages and a query.
   - Method: statistical machine learning.
   - Algorithms: pointwise, e.g. BM25 [7]; pairwise, e.g. RankBoost [3], RankSVM [4], RankNet [1]; listwise, e.g. ListNet [10].
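For reference, the pointwise baseline BM25 cited as [7] is usually written as the following scoring function (the standard Okapi form; the exact variant used in the deck is not shown):

```latex
\mathrm{score}(q,d) \;=\; \sum_{t \in q} \mathrm{IDF}(t)\,
\frac{f(t,d)\,(k_1+1)}{f(t,d) + k_1\!\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad k_1 \in [1.2,\,2.0],\;\; b \approx 0.75,
```

where f(t,d) is the frequency of term t in document d, |d| is the document length, and avgdl is the average document length in the collection.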
8. Research on Complex Networks and Information Retrieval. In recent years we have been involved in the research direction of random complex networks and information retrieval. I shall briefly review some of our recent results (in collaboration with Microsoft Research Asia) in this direction.
9. Outline:
   - Markov chain methods in search engines
   - Point process describing browsing behavior
   - Two-layer statistical learning
   - Stochastic complement method in ranking Web sites
   - Final remarks
10. BrowseRank vs. PageRank?
11. HITS and PageRank, 1998. HITS: Jon Kleinberg, Cornell University. PageRank: Sergey Brin and Larry Page, Stanford University.
12. Nevanlinna Prize (2006): Jon Kleinberg.
   - One of Kleinberg's most important research achievements concerns the internetwork structure of the World Wide Web.
   - Prior to Kleinberg's work, search engines focused only on the content of web pages, not on the link structure.
   - Kleinberg introduced the idea of "authorities" and "hubs": an authority is a web page that contains information on a particular topic, and a hub is a page that contains links to many authorities.
13. PageRank, the ranking system used by the Google search engine:
   - query independent,
   - content independent,
   - uses only the web graph structure.
14. Markov chain describing surfing behavior.
15. Markov chain describing surfing behavior (continued).
16. Web surfers usually have two basic ways to access web pages:
   1. with probability α, they visit a web page by clicking a hyperlink;
   2. with probability 1 - α, they visit a web page by entering its URL address.
17. This yields the transition matrix $P(\alpha) = \alpha P + (1-\alpha)\,\mathbf{1}\,d^{\mathsf T}$, where $P$ is the row-stochastic hyperlink transition matrix, $\mathbf{1}$ is the all-ones column vector, and $d$ is the uniform distribution over pages.
18. More generally we may consider a personalized $d$. PageRank is the unique positive eigenvector $\pi$ with $\pi P(\alpha) = \pi$ and $\pi\mathbf{1} = 1$. By the strong ergodic theorem, $\lim_{n\to\infty} \nu P(\alpha)^{n} = \pi$ for every initial distribution $\nu$.
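A minimal power-iteration sketch of the construction above (an illustration only, not the speaker's implementation; the function name `pagerank` and the toy link matrix are made up here):

```python
import numpy as np

def pagerank(P, d=None, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for pi = pi * (alpha*P + (1-alpha)*1*d^T).

    P : row-stochastic hyperlink transition matrix (n x n)
    d : personalization (teleportation) distribution; uniform if None
    """
    n = P.shape[0]
    if d is None:
        d = np.full(n, 1.0 / n)
    pi = np.full(n, 1.0 / n)                      # start from the uniform distribution
    for _ in range(max_iter):
        new = alpha * (pi @ P) + (1 - alpha) * d  # pi sums to 1, so the teleport term is (1-alpha)*d
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

# Toy 3-page web: 0 -> 1, 1 -> 2, 2 -> {0, 1}
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
print(pagerank(P))
```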
19. Problem:
20. PageRank as a Function of the Damping Factor. Paolo Boldi, Massimo Santini, Sebastiano Vigna, DSI, Università degli Studi di Milano, WWW 2005 paper. Section 3, general behaviour; 3.1, choosing the damping factor; 3.2, getting close to 1. Questions: can we somehow characterise the properties of the limit as the damping factor tends to 1? What makes it different from the other (infinitely many, if P is reducible) limit distributions of P?
21. Conjecture 1: the limit of PageRank as the damping factor tends to 1 is the limit distribution of P when the starting distribution is uniform.
22. Research results by our group:
   - Limit of PageRank
   - Comparison of different irreducible Markov chains
   - N-step PageRank
   - ...
23. Weak points of PageRank:
   - uses only the static web graph structure;
   - reflects only the will of web managers but ignores the will of users, e.g. the staying time of users on a web page;
   - cannot effectively fight spam and junk pages.
24. Letting web users vote for page importance. When calculating the page importance:
   - use the users' real browsing behavior (make no artificial assumption on the users' behavior);
   - use the users' complete browsing behavior (including the time information).
25. Browsing process: Markov property; time-homogeneity.
26. BrowseRank: user browsing graph.
   - Vertex: web page.
   - Edge: transition.
   - Edge weight w_ij: the number of transitions from page i to page j.
   - Staying time T_i: the time spent on page i.
   - Reset probability: normalized frequencies of pages appearing as the first page of a session.
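A minimal sketch of how such a browsing graph could be assembled from session logs (the log format, field names, and function name below are assumptions made for illustration, not the format used in the actual experiments):

```python
from collections import defaultdict

def build_browsing_graph(sessions):
    """sessions: one list per session of (page_id, staying_time_seconds) pairs."""
    weight = defaultdict(int)      # w_ij: number of observed transitions i -> j
    stay = defaultdict(list)       # T_i: observed staying times on page i
    first_page = defaultdict(int)  # session entry counts, used for reset probabilities

    for session in sessions:
        if not session:
            continue
        first_page[session[0][0]] += 1
        for (page, t), (nxt, _) in zip(session, session[1:]):
            weight[(page, nxt)] += 1
            stay[page].append(t)
        last_page, last_t = session[-1]
        stay[last_page].append(last_t)

    total = sum(first_page.values()) or 1
    reset = {p: c / total for p, c in first_page.items()}  # normalized entry frequencies
    return weight, stay, reset
```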
27. Mathematical deduction: maximum likelihood estimation of the staying time.
28. Mathematical deduction (continued): solving the likelihood equation gives the estimator of the staying-time parameter.
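The transcript drops the formulas. Under the usual assumption that the noise-free staying time on page i is exponential with rate λ_i, the maximum likelihood estimate from n_i observed staying times t_{i1}, ..., t_{i n_i} takes the standard form below (stated here as a sketch of that assumption, not a verbatim copy of the slide):

```latex
L(\lambda_i) \;=\; \prod_{k=1}^{n_i} \lambda_i\, e^{-\lambda_i t_{ik}},
\qquad
\hat\lambda_i \;=\; \arg\max_{\lambda_i}\,\log L(\lambda_i)
             \;=\; \frac{n_i}{\sum_{k=1}^{n_i} t_{ik}},
\qquad
\widehat{\mathbb{E}[T_i]} \;=\; \frac{1}{\hat\lambda_i} \;=\; \bar t_i .
```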
29. Mathematical deduction. Additional noise in the observed staying time comes from:
   - the speed of the Internet connection,
   - the length of the page,
   - the layout of the page,
   - the user doing something else (e.g., answering a phone call),
   - other factors.
30. Mathematical deduction. Assume the noise follows a chi-square distribution with k degrees of freedom.
31. Mathematical deduction. Ideally we would estimate these quantities directly; however, due to data sparseness, we encounter challenges ...
32. Mathematical deduction. To tackle this challenge, we turn it into optimization problems.
33. Mathematical deduction. Stationary distribution:
   - it involves the mean staying time on page i: the more important a page is, the longer the staying time on it;
   - it involves the mean first re-visit time at page i: the more important a page is, the smaller the re-visit time and the larger the visit frequency.
34. Mathematical deduction. Properties of the Q-process:
   - the jumping probability is conditionally independent of the jumping time;
   - the sequence of visited pages forms an embedded Markov chain with its own transition probability matrix.
35. Mathematical deduction. The stationary distribution of the embedded (discrete) chain is easy to compute: the power method is applied to its transition matrix, and the staying times are estimated from the log data.
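One way to put these pieces together, assuming the standard relation between a continuous-time process and its embedded jump chain (a sketch, not necessarily the exact computation in the paper): compute the stationary distribution of the embedded chain by the power method, then weight each page by its estimated mean staying time and renormalize.

```python
import numpy as np

def browse_rank(P_embedded, mean_stay, tol=1e-12, max_iter=10000):
    """Stationary distribution of a continuous-time process via its embedded chain.

    P_embedded : row-stochastic jump matrix of the embedded Markov chain
    mean_stay  : estimated mean staying time on each page (e.g. 1 / lambda_i)
    """
    n = P_embedded.shape[0]
    tilde = np.full(n, 1.0 / n)
    for _ in range(max_iter):              # power method for the embedded chain
        new = tilde @ P_embedded
        converged = np.abs(new - tilde).sum() < tol
        tilde = new
        if converged:
            break
    pi = tilde * np.asarray(mean_stay)     # weight by mean holding time ...
    return pi / pi.sum()                   # ... and renormalize
```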
36. Experiments.
   - Data set: 5.6 million vertices, 53 million edges.
   - Baselines: PageRank, TrustRank.
   - Aims: find good websites; fight spam websites.
37. Website level: finding good websites.
38. Website level: fighting spam.
39. BrowseRank: Letting Web Users Vote for Page Importance. Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li. July 23, 2008, Singapore, the 31st Annual International ACM SIGIR Conference on Research & Development in Information Retrieval. Best student paper!
40. BrowseRank: Letting Web Users Vote for Page Importance. A Google search returns 110,000,000 results for "Browse Rank".
41. Further studies. Browsing processes will be a basic mathematical tool in Internet information retrieval.
   - What about inhomogeneous processes?
   - Marked point processes: hyperlinks are not reliable; users' real behavior should be considered.
42. Dynamic rank: relevance ranking.
   - Goal: compute the content-match relevance score between pages and a query.
   - Method: statistical machine learning.
   - Algorithms: pointwise, e.g. BM25 [7]; pairwise, e.g. RankBoost [3], RankSVM [4], RankNet [1]; listwise, e.g. ListNet [10].
43. Outline:
   - Markov chain methods in search engines
   - Point process describing browsing behavior
   - Two-layer statistical learning
   - Stochastic complement method in ranking Web sites
   - Final remarks
44. Learning to rank (diagram; Wei-Ying Ma, Microsoft Research Asia): a learning system minimizes a loss over training data to produce a ranking model; a ranking system then applies the model to rank documents.
45. Learning to rank in IR is a two-layer statistical learning problem:
   - distributions of the relevance judgments of documents may vary from query to query;
   - the numbers of documents for different queries may differ greatly;
   - evaluation in IR is usually conducted at the query level.
46. Document level vs. query level:
   - two queries in total;
   - the same error in terms of pairwise classification: 780/790 = 98.73%;
   - different errors in terms of query-level evaluation: 99% vs. 50%.
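A small numerical illustration of this point (the per-query split below is hypothetical; the slide only gives the totals 780/790, 99%, and 50%): pooling all document pairs hides how badly the small query is ranked, while averaging per query exposes it.

```python
# Hypothetical split consistent with the slide's totals:
# query 1: 780 pairs, 775 ranked correctly; query 2: 10 pairs, 5 ranked correctly.
pairs   = {"q1": 780, "q2": 10}
correct = {"q1": 775, "q2": 5}

pooled = sum(correct.values()) / sum(pairs.values())
per_query = {q: correct[q] / pairs[q] for q in pairs}
macro = sum(per_query.values()) / len(per_query)

print(f"pooled pairwise accuracy: {pooled:.2%}")   # 98.73%
print(f"per-query accuracy      : {per_query}")    # about 99% vs. 50%
print(f"query-level average     : {macro:.2%}")    # 74.68%
```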
47. Query-Level Stability and Generalization in Learning to Rank, to appear in Proceedings of the 25th International Conference on Machine Learning (ICML 2008). Yanyan Lan, Tie-Yan Liu, Tao Qin, Zhiming Ma, Hang Li.
   Two-Layer Statistical Learning and Applications in Information Retrieval, in preparation. Yanyan Lan, Hang Li, Tie-Yan Liu, Zhi-Ming Ma, Tao Qin. (Microsoft Scholar Fellowship.)
48. We propose a new framework of statistical learning in which the training data are composed in two layers. The two-layer structure of the training data is not artificial but arises from the real world, especially from learning to rank in information retrieval.
49. Two-layer statistical learning framework.
   - First layer: objects.
   - Second layer: associated samples, each consisting of an instance and a description of the instance. The instances are what we are ultimately concerned with.
50. In learning to rank for IR, an object is a query; an instance and its corresponding description can be interpreted as:
   - a single document (pointwise algorithms), described by a score (or label) of the document;
   - a pair of documents (pairwise algorithms), described by an order on the pair of documents;
   - a set of documents (listwise algorithms), described by a permutation (list) of the documents.
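As an illustration of the pairwise case (a sketch; the function name and data layout are made up for this example), each query's labeled documents are turned into preference pairs, which become the second-layer samples associated with that query, the first-layer object:

```python
from itertools import combinations

def pairwise_instances(docs):
    """docs: list of (feature_vector, relevance_label) tuples for one query.

    Returns second-layer samples ((x_i, x_j), +1), meaning x_i should rank above x_j.
    """
    pairs = []
    for (x_i, y_i), (x_j, y_j) in combinations(docs, 2):
        if y_i > y_j:
            pairs.append(((x_i, x_j), +1))
        elif y_j > y_i:
            pairs.append(((x_j, x_i), +1))
        # documents with equal labels yield no preference pair
    return pairs
```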
51. Training process: the objects are drawn i.i.d.; for each object i, the associated samples are drawn i.i.d. from the distribution attached to that object; together they constitute the training data.
52. A training algorithm learns from the training data a function that will be used to predict the features of new instances.
53. The loss function induces an empirical object-level loss and an expected object-level loss.
54. Averaging over objects then gives the empirical risk and the expected risk.
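Written out under the standard two-layer assumptions (objects q_1, ..., q_n i.i.d.; for each q_i, samples z_i^{(1)}, ..., z_i^{(m_i)} i.i.d. from the distribution D_{q_i} attached to q_i), the quantities on these slides take the following schematic form; this is a sketch of the framework, not a verbatim copy of the slides' formulas:

```latex
\widehat{\ell}(f; q_i) = \frac{1}{m_i}\sum_{j=1}^{m_i} \ell\!\left(f; z_i^{(j)}\right),
\qquad
\ell(f; q) = \mathbb{E}_{z \sim D_q}\,\ell(f; z),
\\[6pt]
\widehat{R}(f) = \frac{1}{n}\sum_{i=1}^{n} \widehat{\ell}(f; q_i),
\qquad
R(f) = \mathbb{E}_{q}\!\left[\ell(f; q)\right].
```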
55. The challenge is that, when dealing with two-layer training data, most existing results of statistical learning cannot be applied directly. Thus we have to develop new results or modify the existing ones to suit the new model. Much research remains to be done in this respect.
56. Generalization analysis based on stability theory.
   - Devroye and Wagner (1979): stability; stability bounds depend on properties of the algorithm itself rather than on properties of the function class.
   - Bousquet and Elisseeff (2002): uniform leave-one-out stability.
   - Motivated by the above work, we introduce object-level uniform leave-one-out stability, in short, object-level stability.
57. Definition: we say an algorithm possesses object-level uniform leave-one-out stability (abbreviated as object-level stability) if, uniformly over all samples, the losses of the function learned from the full training data and of the function learned from the training data with one object left out differ by at most a stability coefficient.
58. Generalization based on object-level stability: with probability at least 1 - δ, the expected risk is bounded by the empirical risk plus a term determined by the object-level stability coefficient and the number of training objects.
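The transcript drops the formulas. Schematically, in the spirit of Bousquet and Elisseeff (2002) with n read as the number of training objects (queries), the definition and the bound have the following form; the exact constants in the paper may differ:

```latex
\text{object-level stability } \tau(n):\quad
\sup_{z}\;\bigl|\,\ell(f_S; z) - \ell(f_{S^{\setminus i}}; z)\,\bigr| \;\le\; \tau(n)
\quad\text{for all training sets } S \text{ and all } i,
\\[6pt]
\text{with probability at least } 1-\delta:\quad
R(f_S) \;\le\; \widehat{R}(f_S) \;+\; 2\,\tau(n)
      \;+\; \bigl(4\,n\,\tau(n) + B\bigr)\sqrt{\frac{\ln(1/\delta)}{2n}},
```

where B bounds the loss.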
59. Note: if the stability coefficient decays fast enough in the number of training objects, then the bound is meaningful. This condition can be satisfied in many practical cases. As case studies, we investigate Ranking SVM and RankBoost. We show that after introducing query-level normalization into its objective function, Ranking SVM has query-level stability. For RankBoost, query-level stability can be achieved if we introduce both query-level normalization and regularization into its objective function. These analyses agree largely with our experiments and with the experiments in Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006, [5] and [11].
60. IRSVM: the modified Ranking SVM of Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon, 2006, obtained by introducing query-level normalization; it minimizes a query-level empirical risk and admits a generalization bound.
61. Generalization bounds comparison: Ranking SVM versus the modified Ranking SVM, each with its own generalization bound.
62. RankBoost with query-level normalization and regularization.
   - Introducing query-level normalization alone into RankBoost does not lead to good performance [11]; query-level normalization alone cannot make RankBoost query-level stable.
   - Adding both query-level normalization and regularization to the objective function, query-level stability can be achieved.
   - Thus the framework offers us a way of modifying RankBoost for good ranking performance.
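One plausible way to write such a modified objective, as a schematic sketch only (the exact formulation in the paper and in [11] may differ): normalize the pairwise exponential loss by each query's number of preference pairs and add a regularization term on the combination weights of the weak rankers.

```latex
L(f) \;=\; \sum_{q}\frac{1}{|P_q|}\sum_{(i,j)\in P_q}
      \exp\!\bigl(-\bigl(f(x_i^{q}) - f(x_j^{q})\bigr)\bigr)
      \;+\; \lambda\sum_{t}\alpha_t^{2},
\qquad
f \;=\; \sum_{t}\alpha_t h_t ,
```

where P_q is the set of preference pairs for query q, the h_t are the weak rankers, and λ > 0 controls the regularization.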
63. Experimental results (I): query-level stability.
   - 1200 queries from a search engine's data repository.
   - 200 queries for training, 500 queries for validation, and 500 queries for test.
   - Five relevance labels; we treat the first three labels as "relevant" and the other two as "irrelevant" to construct pairs.
64. Experimental results (II): query-level generalization bound.
65. Future problems and challenges.
   - It is worth investigating whether new learning-to-rank algorithms can be derived under the guidance of our theoretical studies.
   - We have investigated generalization analysis based on the novel two-layer statistical learning; we will continue to conduct other theoretical analyses.
   - We have proposed object-level stability; we will try to investigate other tools.
   - Two-layer statistical learning in other fields.
66. Outline: Markov chain methods in search engines; point process describing browsing behavior; two-layer statistical learning; stochastic complement method in ranking Web sites; final remarks.
67. Outline: Markov chain methods in search engines; point process describing browsing behavior; two-layer statistical learning; stochastic complement method in ranking Web sites; final remarks.
68. We have briefly reviewed part of our recent joint work (in collaboration with Microsoft Research Asia) concerning Internet information retrieval. Mathematics is becoming more and more important in the area of Internet information retrieval, and Internet information retrieval has become a rich source of interesting and challenging problems in mathematics.
69. Thank you!