How Search Engines Work

Phase 2 EDA:
Exploration
Search engines and indicators

Challenges

‣ Build the best possible corpus
‣ Make the corpus queryable
‣ Return the most relevant results possible
in as little time as possible

How can this goal
be reached?

‣ Approach 1: topological
‣ Approach 2: semantic
‣ Approach 3: a careful blend of the two

Ranking
How the indicators are built

The inverted list
Doc1: word1 word2
Doc2: word2 word3
Doc3: word1 word4

Word1: [Doc1 Doc3]
Word3: [Doc2]
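
The inverted list above can be built mechanically from the three documents. The sketch below is a minimal illustration, assuming whitespace tokenization and English placeholder tokens (word1 to word4); the function name build_inverted_index is hypothetical and not part of the slides.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: words are separated by whitespace, no normalization.
        for word in text.split():
            index[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in index.items()}


# The three documents from the slide.
docs = {
    "Doc1": "word1 word2",
    "Doc2": "word2 word3",
    "Doc3": "word1 word4",
}

index = build_inverted_index(docs)
print(index["word1"])  # ['Doc1', 'Doc3']
print(index["word3"])  # ['Doc2']
```

Answering a query then only requires reading the lists of the query words instead of scanning every document.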

From BackRub to Google
A short history of the no. 1 search engine

The web creates new challenges for information retrieval. The amount of
information on the web is growing rapidly, as well as the number of new
users inexperienced in the art of web research.

People are likely to surf the web using its link graph, often starting with high
quality human maintained indices such as Yahoo! or with search engines. Human
maintained lists cover popular topics effectively but are subjective, expensive to
build and maintain, slow to improve, and cannot cover all esoteric topics.

Automated search engines that rely on keyword matching usually return too
many low quality matches. To make matters worse, some advertisers attempt to
gain people's attention by taking measures meant to mislead automated search
engines. We have built a large-scale search engine which addresses many of the
problems of existing systems. It makes especially heavy use of the additional
structure present in hypertext to provide much higher quality search results.

We chose our system name, Google, because it is a common spelling of
googol, or 10^100 and fits well with our goal of building very large-scale search
engines.

PageRank, or the random
surfer

http://www.youtube.com/watch?v=H6QRv_bCzEI

Modeling a web user's behavior:

1. Pick a web page at random

2. Draw a number 0 < p < 1

3. If p > c, pick a new page at random

4. If p < c, pick a link at random on the current page and follow it

The probability that this user is on a given page at a given
moment is equal to the PageRank of that page.

If the PageRank is high, the probability of being visited is high.
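
The four steps above can be simulated directly. The sketch below is an illustration under stated assumptions: the three-page graph is hypothetical, c = 0.85 is a commonly used value that the slides do not specify, and a page without outgoing links triggers a random jump (a case the slides do not cover).

```python
import random
from collections import Counter


def random_surfer(links, c=0.85, steps=100_000, seed=0):
    """Estimate visit frequencies by simulating the random surfer."""
    rng = random.Random(seed)
    pages = list(links)
    current = rng.choice(pages)              # step 1: random start page
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        p = rng.random()                     # step 2: draw p
        if p > c or not links[current]:
            # step 3: jump to a random page (assumption: also done
            # when the current page has no outgoing link)
            current = rng.choice(pages)
        else:
            # step 4: follow a random link on the current page
            current = rng.choice(links[current])
    return {page: visits[page] / steps for page in pages}


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
frequencies = random_surfer(links)
for page, freq in frequencies.items():
    print(page, round(freq, 3))
```

The visit frequencies approximate the PageRank values; the estimate gets closer as the number of steps grows.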

PageRank: the classic view

[Figure: page u receives links from pages v1 to v5. Each page v passes c*PR(v)/#links(v) to u: c*PR(v1), c*PR(v2), c*PR(v3)/2, c*PR(v4)/3, c*PR(v5).]

[Figure: the same graph with one more term. In addition to the contributions received through links, every page receives a constant (1-c)/N.]

Initialization: ∀u PR(u) = 1/N

Iterative computation:

\[ PR(u) = \frac{1-c}{N} + c \sum_{v \to u} \frac{PR(v)}{\#\mathrm{links}(v)} \]

where N is the number of pages and #links(v) is the number of outgoing links on page v.
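
The initialization and update rule can be turned into a short loop. This is a minimal sketch, not a production implementation: the graph and the value c = 0.85 are hypothetical, and the code assumes every page has at least one outgoing link and that every link target is itself a key of the dictionary.

```python
def pagerank(links, c=0.85, iterations=100):
    """Iterate PR(u) = (1 - c)/N + c * sum over v -> u of PR(v)/#links(v)."""
    pages = list(links)
    n = len(pages)
    pr = {u: 1.0 / n for u in pages}         # initialization: PR(u) = 1/N
    for _ in range(iterations):
        # Every page starts from the constant (1 - c)/N term.
        new_pr = {u: (1.0 - c) / n for u in pages}
        for v, targets in links.items():
            for u in targets:
                # v splits c * PR(v) evenly among its outgoing links.
                new_pr[u] += c * pr[v] / len(targets)
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in ranks.items():
    print(page, round(value, 4))
```

A fixed number of iterations keeps the sketch short; stopping once the values change by less than a small threshold is a common alternative.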

The web in 1839

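[Figure: a web of three pages. Yahoo (y) links to itself and to Amazon, passing y/2 along each link. Amazon (a) links to Yahoo and to M'soft, passing a/2 along each link. M'soft (m) links only to Amazon, passing m.]

y = y/2 + a/2
a = y/2 + m
m = a/2

With y + a + m = 1, the solution is y = 2/5, a = 2/5, m = 1/5.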
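
The stated solution can be checked numerically. The sketch below encodes the three-page graph described above, verifies with exact fractions that (2/5, 2/5, 1/5) is left unchanged by the flow equations, and then reaches the same values by repeated application starting from 1/3 each. The names flow_step and stated are illustrative, and there is no (1-c)/N term here since this example does not use one.

```python
from fractions import Fraction

# Yahoo links to itself and Amazon, Amazon to Yahoo and M'soft,
# M'soft only to Amazon.
links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}


def flow_step(rank):
    """Apply the flow equations once: each page splits its rank evenly."""
    new_rank = {page: 0 for page in links}
    for page, targets in links.items():
        share = rank[page] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank


# Exact check: the stated solution is a fixed point of the equations.
stated = {"y": Fraction(2, 5), "a": Fraction(2, 5), "m": Fraction(1, 5)}
print(flow_step(stated) == stated)  # True

# Numerical check: iterating from the uniform vector approaches it.
rank = {page: 1 / 3 for page in links}
for _ in range(200):
    rank = flow_step(rank)
print({page: round(value, 6) for page, value in rank.items()})
```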

∀u 0 < PageRank(u) < 1
Σ_u PageRank(u) = 1
PageRank is appealing because it is a simple notion that is
easy to compute.

Relation to Toolbar PageRank (TPR):

[Figure: PageRank axis from 0 to 1 split into intervals: 0 < PageRank < 0.8 gives TPR = 1, 0.8 < PR < 0.96 gives TPR = 2, etc.]

04/02/2009 SEO Campus 2009 : Pagerank et optimisation 5/2

What now?
How the indicators are built

PageRank and company
Information scarcity: reverse engineering, white
papers, patents...
New models: BrowseRank, user-sensitive
PageRank, etc.

PageRank and company
‣ Document date
‣ Content changes
‣ Analysis of queries and of clicks on results
‣ Criteria on the links in the page
‣ Anchor text
‣ Traffic
‣ Visitor behavior
‣ Domain name information
‣ Previous rankings
‣ Bookmarks
‣ Unique words and anchors
‣ Irrelevant links
‣ Document topic
