How Does Google Google?

David F. Gleich
Computer Science
Purdue University

A journey into the wondrous mathematics
behind your favorite websites
Mathematics underlies an
enormous number of the
websites we use every day!
1.  Google's PageRank

2.  Multi-armed bandits and
internet experiments
Larry Page
Sergey Brin

•  Created a web-search algorithm
called "BackRub"
•  Spun off a company, Google (a play
on "googol"), based on the paper

•  The importance of a page is
determined by the importance of
pages that link to it.
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry
Winograd. "The PageRank Citation Ranking: Bringing
Order to the Web." Technical Report, Stanford InfoLab, 1999.
A web search primer
1.  Crawl webpages
2.  Analyze webpage text (information retrieval)
3.  Analyze webpage links
4.  Fit over 200 measures to human evaluations
5.  Produce rankings
6.  Continuously update
Pages, nodes, incoming links,
outgoing links, and “importance”
[Figure: pages a, b, and c link to a single page. Captions: "Important" pages that link to me! / "Important" pages that link to Purdue!]
Tim Davis and Yifan Hu, Sparse Matrix Gallery
http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
1000 vertices on 8.5-by-11 paper
The web: 1,000,000,000,000 vertices (one trillion)
Paper the size of Manhattan island (23 sq miles)?
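The Manhattan comparison is straightforward scaling arithmetic. The sketch below assumes the drawing density of 1,000 vertices per 8.5-by-11 inch sheet stays fixed. Under that assumption, a trillion vertices needs a billion sheets, or roughly 23.3 square miles of paper, which matches the slide's 23 sq miles.

```python
# Back-of-the-envelope check of the slide's "paper the size of Manhattan" claim.
# Assumption: the drawing density stays at 1,000 vertices per 8.5-by-11 inch sheet.
SHEET_SQ_IN = 8.5 * 11              # area of one letter-size sheet (square inches)
VERTICES_PER_SHEET = 1_000
WEB_VERTICES = 1_000_000_000_000    # one trillion vertices
SQ_IN_PER_SQ_MILE = 63_360 ** 2     # a mile is 5,280 ft * 12 in = 63,360 inches


def paper_area_sq_miles(n_vertices: int) -> float:
    """Square miles of paper needed to draw n_vertices at the fixed density."""
    sheets = n_vertices / VERTICES_PER_SHEET
    return sheets * SHEET_SQ_IN / SQ_IN_PER_SQ_MILE


if __name__ == "__main__":
    print(f"{paper_area_sq_miles(WEB_VERTICES):.1f} square miles")  # about 23.3
```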
We need something better!
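A wee web-graph: link
counting is too easy to game!
[Figure: a six-node web graph. Node 1 links to nodes 2, 3, and 4 (weight 1/3 each); node 2 links to nodes 3 and 6 (weight 1/2 each); node 3 links to node 4; nodes 4 and 5 link to each other; node 6 has no outgoing links.]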
The importance of a
page is determined
by the importance of
pages that link to it.
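$$
\begin{aligned}
x_1 &= 0\\
x_2 &= \tfrac{1}{3}x_1\\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2\\
x_4 &= \tfrac{1}{3}x_1 + x_3 + x_5\\
x_5 &= x_4\\
x_6 &= \tfrac{1}{2}x_2
\end{aligned}
$$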
The importance of a page is determined
by the importance of pages that link to it
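$$x_i = \sum_{j \in B_i} \frac{1}{d_j} x_j$$
•  $x_i$: "importance" of page i
•  $x_j$: "importance" of page j
•  $B_i$: the "back-links" of page i, the pages that link to it (why it was called BackRub!)
•  $d_j$: number of links page j uses (its out-degree, in graph theory)

For example, $x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2$, because page 1 splits its importance over 3 links and page 2 over 2 links.
[Figure: nodes 1 and 2 link to node 3, with weights 1/3 and 1/2.]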
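The back-link sum can be evaluated directly from a list of out-links. This is a minimal sketch, assuming the wee web-graph is encoded as a hypothetical `OUT_LINKS` dictionary (page to the pages it links to); `back_links` and `importance` are illustrative names. Setting every $x_j = 1$ reproduces the coefficients in the $x_3$ example: 1/3 + 1/2 = 5/6.

```python
from fractions import Fraction

# Hypothetical encoding of the wee web-graph: page -> pages it links to.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def back_links(out_links, i):
    """B_i: every page j that links to page i."""
    return [j for j, targets in out_links.items() if i in targets]


def importance(out_links, x, i):
    """Right-hand side of x_i = sum over j in B_i of x_j / d_j.

    d_j = len(out_links[j]) is page j's out-degree; it is at least 1
    for every j in B_i, so the division is safe.
    """
    return sum((x[j] / len(out_links[j]) for j in back_links(out_links, i)), Fraction(0))


# Give every page importance 1 and evaluate the right-hand side for page 3.
x = {page: Fraction(1) for page in OUT_LINKS}
print(back_links(OUT_LINKS, 3), importance(OUT_LINKS, x, 3))  # [1, 2] 5/6
```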
We can rewrite this equation in a more
mathematically convenient way
$$
\begin{aligned}
x_1 &= 0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6\\
x_2 &= \tfrac{1}{3}x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6\\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6\\
x_4 &= \tfrac{1}{3}x_1 + 0x_2 + 1x_3 + 0x_4 + 1x_5 + 0x_6\\
x_5 &= 0x_1 + 0x_2 + 0x_3 + 1x_4 + 0x_5 + 0x_6\\
x_6 &= 0x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6
\end{aligned}
$$
or
$$
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
1/3 & 0 & 0 & 0 & 0 & 0\\
1/3 & 1/2 & 0 & 0 & 0 & 0\\
1/3 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}
$$
And even more conveniently!
$$x = Px$$
Element k in column m = "probability" of
going from node m to node k
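The matrix P can be assembled mechanically from the link structure: entry (k, m) is $1/d_m$ whenever page m links to page k. This is a minimal sketch using exact fractions and a hypothetical `OUT_LINKS` dictionary for the wee web-graph; indices are 0-based in the code, so node m is column m − 1. The column sums show that every column is a probability distribution except node 6's, which is all zeros.

```python
from fractions import Fraction

# Hypothetical encoding of the wee web-graph: page -> pages it links to.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def link_matrix(out_links):
    """Dense P with P[k][m] = 1/d_m when page m+1 links to page k+1 (0-based rows/cols)."""
    n = len(out_links)
    P = [[Fraction(0)] * n for _ in range(n)]
    for m, targets in out_links.items():
        for k in targets:
            P[k - 1][m - 1] = Fraction(1, len(targets))
    return P


P = link_matrix(OUT_LINKS)
column_sums = [sum(P[k][m] for k in range(len(P))) for m in range(len(P))]
print(column_sums)  # page 6 has no out-links, so its column sums to 0
```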
The matrix P for websites
shows a lot of structure
Every dot is a non-zero element indicating a link
Matrices are sparse and generally have block structure
Block structure can be exploited to speed up ranking algorithms
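Sparsity is what makes P storable at all: instead of an n-by-n array, one can keep only each page's out-links and compute Px by pushing importance along links, so the work grows with the number of links rather than with $n^2$. This is a minimal sketch, assuming the hypothetical `OUT_LINKS` dictionary encoding of the wee web-graph; production systems use compressed sparse formats, which this does not attempt to model.

```python
# Sketch: keep P implicit as out-link lists so y = P x costs O(number of links),
# not O(n^2) as with a dense n-by-n array. The encoding is hypothetical.
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def sparse_matvec(out_links, x):
    """Compute y = P x by having each page m push x[m] / d_m to every page it links to."""
    y = {page: 0.0 for page in out_links}
    for m, targets in out_links.items():
        if not targets:
            continue  # no out-links: an all-zero column of P contributes nothing
        share = x[m] / len(targets)
        for k in targets:
            y[k] += share
    return y


y = sparse_matvec(OUT_LINKS, {page: 1.0 for page in OUT_LINKS})
print(y)
```

With every page starting at 1, the total after one product is 5 rather than 6: page 6's importance has nowhere to go.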
But this idea doesn’t work for
the wee web-graph
[Figure: the same six-node wee web-graph.]
Nodes 1, 4, and 5
determine everything!
$$
\begin{aligned}
x_1 &= 0\\
x_2 &= \tfrac{1}{3}x_1 = 0\\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 = 0\\
x_4 &= \tfrac{1}{3}x_1 + x_3 + x_5 = x_5\\
x_5 &= x_4\\
x_6 &= \tfrac{1}{2}x_2 = 0
\end{aligned}
$$
•  Node 1: "lonely"
•  Nodes 4 and 5: "mutual admiration societies"
•  Node 6: "anti-social"
These nodes need to be "fixed" to get a
reliable and useful ranking!
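Iterating x ← Px on the unfixed graph makes the three problem nodes concrete. This minimal sketch uses exact fractions and the hypothetical `OUT_LINKS` encoding of the wee web-graph, starting from equal importance 1/6 per page. Node 1 drops to 0 after one step, node 6 lets importance leak away, and after three steps all that remains (13/18 of the total) sits on nodes 4 and 5, which then trade it back and forth indefinitely.

```python
from fractions import Fraction

# Hypothetical encoding of the wee web-graph (before any fix-ups).
OUT_LINKS = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}


def step(out_links, x):
    """One round of x <- P x: each page splits its importance evenly over its links."""
    y = {page: Fraction(0) for page in out_links}
    for m, targets in out_links.items():
        for k in targets:  # page 6 has no targets, so its importance is simply lost
            y[k] += x[m] / len(targets)
    return y


def iterate(out_links, x, steps):
    """Apply step() the given number of times."""
    for _ in range(steps):
        x = step(out_links, x)
    return x


x0 = {page: Fraction(1, 6) for page in OUT_LINKS}
for t in range(1, 5):
    xt = iterate(OUT_LINKS, x0, t)
    print(t, [str(xt[p]) for p in sorted(xt)], "total:", sum(xt.values()))
```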
The gang of four to the rescue
Andrei Markov
Oskar Perron
Georg Frobenius
Richard von Mises
Let’s fix it up and force node 6 to
choose, or link to everyone
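$$
P =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
1/3 & 0 & 0 & 0 & 0 & 0\\
1/3 & 1/2 & 0 & 0 & 0 & 0\\
1/3 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad\longrightarrow\quad
P =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 1/2 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 1 & 0 & 1 & 1/6\\
0 & 0 & 0 & 1 & 0 & 1/6\\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
$$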
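The fix on this slide, replacing an all-zero column of P with 1/n entries, is one rule applied per column. This is a sketch in pure Python with exact fractions; `fix_dangling` is a hypothetical helper name. Afterwards every column of P sums to 1, so P is column-stochastic.

```python
from fractions import Fraction

F = Fraction
# The wee web-graph's P from the slide (rows and columns are nodes 1..6).
P = [
    [0, 0, 0, 0, 0, 0],
    [F(1, 3), 0, 0, 0, 0, 0],
    [F(1, 3), F(1, 2), 0, 0, 0, 0],
    [F(1, 3), 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, F(1, 2), 0, 0, 0, 0],
]


def fix_dangling(P):
    """Copy P, replacing each all-zero column (a node with no out-links) by 1/n in every row."""
    n = len(P)
    fixed = [row[:] for row in P]
    for m in range(n):
        if all(P[k][m] == 0 for k in range(n)):
            for k in range(n):
                fixed[k][m] = F(1, n)
    return fixed


P_fixed = fix_dangling(P)
print([sum(P_fixed[k][m] for k in range(6)) for m in range(6)])  # every column now sums to 1
```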
Taxation is the way to
representation!
[Figure: pages a, b, and c link to a page]
If a page is a good page, then
it'll still be a good page if
we "tax" the importance it receives
from a, b, and c.

We can redistribute the
taxed amounts to all pages,
including lonely nodes!
The importance of a page is determined
by the importance of pages that link to it*
* After tax and any benefits
$$x_i = \sum_{j \in B_i} \alpha \frac{x_j}{d_j} + (1-\alpha) b_i$$
•  $\alpha x_j / d_j$: the total importance that page j contributes to page i
•  $b_i$: benefits to page i
•  $1-\alpha$: the taxation rate, the same for all pages
$$
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}
= \alpha
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 0 & 0 & 0 & 1/6\\
1/3 & 1/2 & 0 & 0 & 0 & 1/6\\
1/3 & 0 & 1 & 0 & 1 & 1/6\\
0 & 0 & 0 & 1 & 0 & 1/6\\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}
+ (1-\alpha)
\begin{bmatrix}b_1\\b_2\\b_3\\b_4\\b_5\\b_6\end{bmatrix}
$$
Perron and Frobenius showed the new
equation always has a unique solution (for $0 \le \alpha < 1$):
$$x = \alpha P x + (1-\alpha) b$$
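Because the taxed equation is linear, it can be rearranged to $(I - \alpha P)x = (1-\alpha)b$ and solved directly, which makes the unique solution easy to check on the wee web-graph. This is a minimal sketch with exact fractions, assuming α = 0.85 and uniform benefits $b_i = 1/6$ (values not stated on this slide); `solve` and `pagerank_direct` are illustrative helpers, not a library API. Direct elimination costs $O(n^3)$ work and is only practical for tiny graphs.

```python
from fractions import Fraction

F = Fraction
# The fixed-up P for the wee web-graph (node 6 now links to everyone).
P = [
    [0, 0, 0, 0, 0, F(1, 6)],
    [F(1, 3), 0, 0, 0, 0, F(1, 6)],
    [F(1, 3), F(1, 2), 0, 0, 0, F(1, 6)],
    [F(1, 3), 0, 1, 0, 1, F(1, 6)],
    [0, 0, 0, 1, 0, F(1, 6)],
    [0, F(1, 2), 0, 0, 0, F(1, 6)],
]


def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[F(v) for v in A[i]] + [F(rhs[i])] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def pagerank_direct(P, alpha, b):
    """Solve (I - alpha P) x = (1 - alpha) b, the rearranged taxed equation."""
    n = len(P)
    A = [[(1 if i == j else 0) - alpha * P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, [(1 - alpha) * bi for bi in b])


x = pagerank_direct(P, F(17, 20), [F(1, 6)] * 6)  # assumed alpha = 0.85, uniform b
print([round(float(v), 2) for v in x])
```

Since every column of P sums to 1, the entries of the solution sum to exactly 1, which the exact arithmetic confirms.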
[Figure: the same six-node wee web-graph.]
What von Mises and Richardson showed
is that guess, check, and correct works!
$$x^{(\text{new})} = \alpha P x^{(\text{old})} + (1-\alpha) b$$
$$
x^{(\text{start})} = \begin{bmatrix}0.17\\0.17\\0.17\\0.17\\0.17\\0.17\end{bmatrix}, \quad
x^{(1)} = \begin{bmatrix}0.05\\0.10\\0.17\\0.38\\0.19\\0.12\end{bmatrix}, \quad
x^{(2)} = \begin{bmatrix}0.04\\0.06\\0.10\\0.36\\0.36\\0.08\end{bmatrix}, \quad
x^{(\infty)} = \begin{bmatrix}0.03\\0.04\\0.06\\0.43\\0.39\\0.05\end{bmatrix}
$$
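The guess, check, and correct loop takes only a few lines of code. This is a minimal pure-Python sketch; α = 0.85 and uniform benefits $b_i = 1/6$ are assumptions not stated on the slide, but with them the sketch reproduces the printed $x^{(1)}$ and $x^{(2)}$ to two decimals, and 100 iterations land on the final vector. `richardson` is an illustrative name.

```python
ALPHA = 0.85          # assumed taxation parameter
N = 6
B = [1 / N] * N       # assumed uniform benefits b_i = 1/6
# Fixed-up P for the wee web-graph (column m holds node m's outgoing probabilities).
P = [
    [0, 0, 0, 0, 0, 1 / 6],
    [1 / 3, 0, 0, 0, 0, 1 / 6],
    [1 / 3, 1 / 2, 0, 0, 0, 1 / 6],
    [1 / 3, 0, 1, 0, 1, 1 / 6],
    [0, 0, 0, 1, 0, 1 / 6],
    [0, 1 / 2, 0, 0, 0, 1 / 6],
]


def richardson(P, alpha, b, x_start, iterations):
    """Guess, check, and correct: repeat x_new = alpha * P x_old + (1 - alpha) * b."""
    n = len(P)
    x = list(x_start)
    history = [x]
    for _ in range(iterations):
        x = [alpha * sum(P[k][m] * x[m] for m in range(n)) + (1 - alpha) * b[k]
             for k in range(n)]
        history.append(x)
    return history


history = richardson(P, ALPHA, B, [1 / N] * N, iterations=100)
for label, x in [("start", history[0]), ("1", history[1]), ("2", history[2]), ("100", history[-1])]:
    print(label, [round(v, 2) for v in x])
```

Each step shrinks the error by at least a factor of α in the 1-norm, so 100 iterations are far more than enough at two-decimal precision.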
There's still a lot of work left to
do to make a search engine
•  Make it fast!
•  Watch out for spam
•  Watch out for manipulation
•  Personalize
•  Experiment!
1.  Google's PageRank

2.  Multi-armed bandits and
internet experiments
Image: http://adamlofting.com/736/drawn-multi-armed-bandit-experiments/multi-armed-bandit/
Not this!
Image: http://upload.wikimedia.org/wikipedia/en/8/82/Las_Vegas_slot_machines.jpg
This!
•  Pays out $0.92/dollar
•  Pays out $0.98/dollar
•  Pays out $0.95/dollar
•  Pays out $0.99/dollar
What in the heck does a multi-armed
bandit have to do with Google?
•  Pays out $0.92/view
•  Pays out $0.66/view
•  Pays out $0.91/view (show ads)
•  Pays out -$0.02/view (hide ads)
How to optimize your website
without exploiting the bandits
Try condition A 100 times, find 45 “wins”
Try condition B 100 times, find 85 “wins”
Try condition C 100 times, find 10 “wins”
…
Choose the best!
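The pure-exploration recipe amounts to computing each condition's win rate and taking the largest. This is a minimal sketch using the slide's tallies; the `RESULTS` dictionary and the function names are illustrative.

```python
# Tallies from the slide: condition -> (wins, trials).
RESULTS = {"A": (45, 100), "B": (85, 100), "C": (10, 100)}


def estimated_rates(results):
    """Estimated win rate of each condition: wins / trials."""
    return {name: wins / trials for name, (wins, trials) in results.items()}


def choose_best(results):
    """Pure exploration, then exploit once: pick the highest estimated rate."""
    rates = estimated_rates(results)
    return max(rates, key=rates.get)


print(estimated_rates(RESULTS), choose_best(RESULTS))
```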
This field has some of the
best terminology

•  Explore
•  Exploit
•  Regret
This field has some of the
best terminology

•  Explore – Visiting Las Vegas!
•  Exploit – Your new winning strategy!
•  Regret – That you didn't quit after
winning the first round
This field has some of the
best terminology

•  Explore – Testing slot machines/experiments
for their reward
•  Exploit – Playing the best reward
you've found so far
•  Regret – How much you lost due
to exploration
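The explore/exploit trade-off can be seen in a small simulation. The slides do not name a specific strategy, so this sketch swaps in epsilon-greedy, one common textbook rule: try each arm once, then explore a random arm with probability epsilon and otherwise exploit the best observed win rate. The arm win rates (0.45, 0.85, 0.10), epsilon = 0.1, and the seed are illustrative assumptions, and `epsilon_greedy` is a hypothetical name. Regret is reported per play, matching the regret arithmetic later in the deck.

```python
import random


def epsilon_greedy(true_rates, plays, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy play on Bernoulli slot machines.

    Try every arm once, then with probability epsilon explore a uniformly
    random arm; otherwise exploit the arm with the best observed win rate.
    Returns (wins, pulls, regret), where regret is expected wins lost per play.
    """
    if plays <= 0:
        raise ValueError("plays must be positive")
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms
    pulls = [0] * n_arms
    for t in range(plays):
        if t < n_arms:
            arm = t                                   # initial round: try each arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)               # explore
        else:
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])  # exploit
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:            # Bernoulli reward
            wins[arm] += 1
    expected = sum(p * r for p, r in zip(pulls, true_rates))
    regret = (max(true_rates) * plays - expected) / plays
    return wins, pulls, regret


# Hypothetical arms with the win rates from the A/B/C example.
wins, pulls, regret = epsilon_greedy([0.45, 0.85, 0.10], plays=1_000)
print(pulls, round(regret, 3))
```

Smaller epsilon exploits sooner but risks locking onto a worse arm; larger epsilon keeps paying an exploration tax forever, so its regret grows linearly with the number of plays.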
How to optimize your website
without exploiting the bandits
Try condition A 100 times, find 45 “wins”
Try condition B 100 times, find 85 “wins”
Try condition C 100 times, find 10 “wins”
…
Choose the best!
Pure exploration!
We only exploit our findings at the end!
How to optimize your website
exploiting the bandits
Pure exploration!
Try condition A 5 times, find 4 wins
Try condition B 5 times, find 4 wins
Try condition C 5 times, find 2 wins

Exploit our knowledge
Try condition A 7 times, find 3 wins
Try condition B 7 times, find 5 wins
Try condition C 1 time, find 0 wins

Condition     A     B     C
Est. Return   0.58  0.75  0.33
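The estimated returns in the table pool both rounds: A won 7 of 12 tries, B 9 of 12, and C 2 of 6. This is a minimal sketch of that bookkeeping, with a hypothetical `ROUNDS` list transcribing the slide's counts.

```python
# The slide's two rounds: condition -> (wins, trials).
ROUNDS = [
    {"A": (4, 5), "B": (4, 5), "C": (2, 5)},  # pure exploration: equal trials
    {"A": (3, 7), "B": (5, 7), "C": (0, 1)},  # exploit: favor the better-looking A and B
]


def pooled_estimates(rounds):
    """Add up wins and trials across rounds, then estimate each return as wins / trials."""
    totals = {}
    for rnd in rounds:
        for name, (wins, trials) in rnd.items():
            w, t = totals.get(name, (0, 0))
            totals[name] = (w + wins, t + trials)
    return {name: w / t for name, (w, t) in totals.items()}


print({name: round(r, 2) for name, r in pooled_estimates(ROUNDS).items()})
```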
The goal of these problems is to construct
optimal strategies to minimize regret
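Regret: how much you left "on the table" by exploring
$$\text{regret} = \mathbb{E}[\text{play best always}] - \mathbb{E}[\text{plays made based on data}]$$
A zero-regret strategy is one where regret(T trials) is sublinear in T
as the number of plays T → ∞.
$$\text{regret}_{\text{100-each}} = \frac{255}{300} - \frac{140}{300} \approx 0.38$$
$$\text{regret}_{\text{30-mixed}} = \frac{25.5}{30} - \frac{0.45 \times 12 + 0.85 \times 12 + 0.1 \times 6}{30} = 0.31$$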
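Both regret figures follow from one formula: the expected wins of always playing the best condition, minus the expected wins of the allocation actually used, divided by the number of plays. This is a sketch, assuming (as the slide's arithmetic does) that the true win rates are 0.45, 0.85, and 0.10; `per_play_regret` is an illustrative name.

```python
# Assumption, matching the slide's arithmetic: true win rates equal the
# 100-trial estimates (0.45, 0.85, 0.10).
TRUE_RATE = {"A": 0.45, "B": 0.85, "C": 0.10}


def per_play_regret(plays, true_rate):
    """(Expected wins from always playing the best arm - expected wins earned) / total plays."""
    total = sum(plays.values())
    best_possible = max(true_rate.values()) * total
    earned = sum(true_rate[arm] * n for arm, n in plays.items())
    return (best_possible - earned) / total


print(round(per_play_regret({"A": 100, "B": 100, "C": 100}, TRUE_RATE), 2))  # 100-each
print(round(per_play_regret({"A": 12, "B": 12, "C": 6}, TRUE_RATE), 2))      # 30-mixed
```

The adaptive 30-play schedule loses less per play than 100 plays of each condition because it shifts trials away from C early.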
[The bandit problem] was formulated during the [second
world] war, and efforts to solve it so sapped the energies
and minds of Allied analysts that the suggestion was
made that the problem be dropped over Germany, as the
ultimate instrument of intellectual sabotage.	

Peter Whittle (Whittle, 1979)
Discussion of “Bandit processes and dynamical allocation indices”
Their importance to website optimization,
advertising, and recommendation has
rejuvenated research on these problems
with fascinating new questions. 
Math is everywhere, and
especially in your favorite
websites!
Matrices and probability are
key ingredients.
PageRank on Wikipedia
α = 0.50          α = 0.85                α = 0.99
United States     United States           C:Contents
C:Living people   C:Main topic classif.   C:Main topic classif.
France            C:Contents              C:Fundamental
Germany           C:Living people         United States
England           C:Ctgs. by country      C:Wikipedia admin.
United Kingdom    United Kingdom          P:List of portals
Canada            C:Fundamental           P:Contents/Portals
Japan             C:Ctgs. by topic        C:Portals
Poland            C:Wikipedia admin.      C:Society
Australia         France                  C:Ctgs. by topic
Note: Top 10 articles on Wikipedia with highest PageRank
David F. Gleich (Sandia), Sensitivity, Purdue

Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

How does Google Google: A journey into the wondrous mathematics behind your favorite websites

  • 1. How Does Google Google? David F. Gleich, Computer Science, Purdue University. A journey into the wondrous mathematics behind your favorite websites
  • 2. Mathematics underlies an enormous number of the websites we use every day!
  • 3. Outline: 1. Google’s PageRank; 2. Multi-armed bandits and internet experiments
  • 4.
  • 5. Larry Page and Sergey Brin created a web-search algorithm called “BackRub” and spun off a company, “Google” (a play on “googol”), based on the paper. The importance of a page is determined by the importance of pages that link to it. Reference: Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Technical Report, Stanford InfoLab, 1999.
  • 6. A web-search primer: 1. Crawl webpages; 2. Analyze webpage text (information retrieval); 3. Analyze webpage links; 4. Fit over 200 measures to human evaluations; 5. Produce rankings; 6. Continuously update.
  • 7. Pages, nodes, incoming links, outgoing links, and “importance”. [Figure: pages a, b, and c linking to a single page] “Important” pages that link to me! “Important” pages that link to Purdue!
  • 8.
  • 9. Tim Davis and Yifan Hu, Sparse Matrix Gallery
  • 10. The web: 1000 vertices fit on one sheet of 8.5-by-11 paper, but the web has 1,000,000,000,000 vertices (one trillion; http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html). Would we need paper the size of Manhattan island (23 sq miles)?
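The Manhattan comparison is a scaling estimate. A quick check of the arithmetic, assuming the drawing density of 1000 vertices per letter-size sheet stays fixed (the constant names are ours, not the slide's):

```python
# Back-of-envelope check of the "paper the size of Manhattan" estimate.
# Assumption: density stays at 1000 vertices per 8.5-by-11 inch sheet.
SHEET_SQ_IN = 8.5 * 11             # area of one sheet in square inches
VERTICES_PER_SHEET = 1_000
WEB_VERTICES = 1_000_000_000_000   # one trillion URLs
SQ_IN_PER_SQ_MILE = 63_360 ** 2    # 5280 ft * 12 in = 63,360 in per mile

sheets = WEB_VERTICES / VERTICES_PER_SHEET
area_sq_miles = sheets * SHEET_SQ_IN / SQ_IN_PER_SQ_MILE
print(f"{area_sq_miles:.1f} square miles")  # → 23.3 square miles
```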
  • 11. We need something better!
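  • 12. A wee web-graph: link counting is too easy to game! [Figure: a six-node graph; node 1 links to nodes 2, 3, and 4 with weight 1/3 each, and node 2 links to nodes 3 and 6 with weight 1/2 each]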
  • 13. A wee web-graph: link counting is too easy to game! [Figure: the six-node wee web-graph] The importance of a page is determined by the importance of pages that link to it:
    $$\begin{aligned} x_1 &= 0 \\ x_2 &= \tfrac{1}{3}x_1 \\ x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 \\ x_4 &= \tfrac{1}{3}x_1 + x_3 + x_5 \\ x_5 &= x_4 \\ x_6 &= \tfrac{1}{2}x_2 \end{aligned}$$
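  • 14. The importance of a page is determined by the importance of pages that link to it:
    $$x_i = \sum_{j \in B_i} \frac{1}{d_j}\, x_j$$
    where $x_i$ is the “importance” of page $i$, $x_j$ is the “importance” of page $j$, $B_i$ is the set of back-links of page $i$, that is, the pages that link to it (why it was called BackRub!), and $d_j$ is the number of links page $j$ uses (its out-degree, in graph-theory terms). For example, in the wee web-graph $x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2$.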
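The back-link formula can be evaluated directly from a list of links. A minimal sketch, assuming the edge list below (our reading of the wee web-graph from the equations on slide 13) and exact fractions so the 1/3 and 1/2 weights stay exact:

```python
from fractions import Fraction

# Our reading of the wee web-graph: (j, i) means page j links to page i.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 6), (3, 4), (4, 5), (5, 4)]
NODES = range(1, 7)

# d_j: out-degree, the number of links page j uses
out_degree = {j: sum(1 for src, _ in EDGES if src == j) for j in NODES}
# B_i: back-links of page i, the pages that link to i
back_links = {i: [src for src, dst in EDGES if dst == i] for i in NODES}

def importance_rhs(x, i):
    """Right-hand side of x_i = sum_{j in B_i} x_j / d_j for a guess x."""
    return sum(Fraction(x[j]) / out_degree[j] for j in back_links[i])

x = {j: 1 for j in NODES}      # an arbitrary guess: every page has importance 1
print(importance_rhs(x, 3))    # x1/3 + x2/2
```

Page 6 has out-degree 0, but it never appears in any `back_links` list, so the formula never divides by its degree; that "anti-social" node is exactly what causes trouble later.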
  • 15. We can rewrite this equation in a more mathematically convenient way:
    $$\begin{aligned}
    x_1 &= 0\,x_1 + 0\,x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6 \\
    x_2 &= \tfrac{1}{3}x_1 + 0\,x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6 \\
    x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6 \\
    x_4 &= \tfrac{1}{3}x_1 + 0\,x_2 + 1\,x_3 + 0\,x_4 + 1\,x_5 + 0\,x_6 \\
    x_5 &= 0\,x_1 + 0\,x_2 + 0\,x_3 + 1\,x_4 + 0\,x_5 + 0\,x_6 \\
    x_6 &= 0\,x_1 + \tfrac{1}{2}x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6
    \end{aligned}$$
  • 16. And even more conveniently:
    $$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} \quad \text{or} \quad x = Px$$
    Element $k$ in column $m$ is the “probability” of going from node $m$ to node $k$.
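Building $P$ from the links follows the rule on the slide: column $m$ spreads $1/d_m$ over the pages that $m$ links to. A sketch, assuming the same edge list as our reading of slide 13 (0-based Python indices, pages numbered 1 to 6):

```python
from fractions import Fraction

# Our reading of the wee web-graph: (m, k) means page m links to page k.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 6), (3, 4), (4, 5), (5, 4)]
N = 6

def link_matrix(edges, n):
    """P[k][m] = 1/d_m if page m links to page k, else 0 (0-based indices)."""
    out_degree = [0] * n
    for m, _ in edges:
        out_degree[m - 1] += 1
    P = [[Fraction(0)] * n for _ in range(n)]
    for m, k in edges:
        P[k - 1][m - 1] = Fraction(1, out_degree[m - 1])
    return P

def matvec(P, x):
    """Return P x, i.e. the right-hand sides of all six equations at once."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

P = link_matrix(EDGES, N)
for row in P:
    print("  ".join(f"{str(v):>3}" for v in row))
```

Column 6 comes out all zeros because page 6 links nowhere, so that column sums to 0 rather than 1.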
  • 17. The matrix P for websites shows a lot of structure: every dot is a non-zero element indicating a link. These matrices are sparse and generally have block structure, which can be exploited to speed up the ranking algorithm.
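Sparsity means most entries of $P$ are zero, so a ranking code need only store and multiply the non-zeros. One common layout is compressed sparse column (CSC); the sketch below uses it as an illustrative choice (the slide does not say which format is used) and does not attempt the block-structure speedups:

```python
# Hypothetical CSC layout for the wee web-graph's P (0-based indices).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 6), (3, 4), (4, 5), (5, 4)]
N = 6

def to_csc(edges, n):
    """Return (col_ptr, row_idx, vals): column m's non-zeros live in
    row_idx/vals[col_ptr[m]:col_ptr[m + 1]]."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src - 1] += 1
    cols = [[] for _ in range(n)]
    for src, dst in edges:
        cols[src - 1].append((dst - 1, 1.0 / out_degree[src - 1]))
    col_ptr, row_idx, vals = [0], [], []
    for col in cols:
        for r, v in sorted(col):
            row_idx.append(r)
            vals.append(v)
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx, vals

def csc_matvec(col_ptr, row_idx, vals, x):
    """y = P x, touching only the stored non-zeros."""
    y = [0.0] * len(x)
    for m, xm in enumerate(x):
        for p in range(col_ptr[m], col_ptr[m + 1]):
            y[row_idx[p]] += vals[p] * xm
    return y

col_ptr, row_idx, vals = to_csc(EDGES, N)
print(len(vals), "stored entries instead of", N * N)
print(csc_matvec(col_ptr, row_idx, vals, [1.0] * N))
```

The work per multiplication grows with the number of links rather than with the square of the number of pages, which is what makes a trillion-vertex graph conceivable at all.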
  • 18. But this idea doesn’t work for the wee web-graph. [Figure: the six-node wee web-graph] Nodes 1, 4, and 5 determine everything!
    $$\begin{aligned} x_1 &= 0 \\ x_2 &= \tfrac{1}{3}x_1 = 0 \\ x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 = 0 \\ x_4 &= \tfrac{1}{3}x_1 + x_3 + x_5 = x_5 \\ x_5 &= x_4 \\ x_6 &= \tfrac{1}{2}x_2 = 0 \end{aligned}$$
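The collapse is easy to watch numerically. A sketch, assuming our reading of the wee web-graph and starting every page at 1/6: repeatedly applying $x \leftarrow Px$ with the unfixed links drives nodes 1, 2, 3, and 6 to zero, leaks node 6's share out of the system, and leaves nodes 4 and 5 trading values forever instead of settling on a ranking.

```python
from fractions import Fraction

# Our reading of the wee web-graph: (j, i) means page j links to page i.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 6), (3, 4), (4, 5), (5, 4)]
N = 6
OUT_DEGREE = [sum(1 for src, _ in EDGES if src == j) for j in range(1, N + 1)]

def step(x):
    """One sweep of x_i <- sum_{j in B_i} x_j / d_j using the unfixed graph."""
    y = [Fraction(0)] * N
    for src, dst in EDGES:
        y[dst - 1] += x[src - 1] / OUT_DEGREE[src - 1]
    return y

x = [Fraction(1, 6)] * N          # start with equal importance
for k in range(1, 6):
    x = step(x)
    print(k, [str(v) for v in x])
# Nodes 1, 2, 3, 6 reach 0 by step 3; nodes 4 and 5 then swap values forever.
```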
  • 19. But this idea doesn’t work for the wee web-graph. [Figure: the six-node wee web-graph] Node 1 → “lonely”; nodes 4 and 5 → “mutual admiration societies”; node 6 → “anti-social”. These nodes need to be “fixed” to get a reliable and useful ranking!
  • 20. The gang of four to the rescue: Andrei Markov, Oskar Perron, Georg Frobenius, Richard von Mises
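  • 21. Let’s fix it up and force node 6 to choose, or link to everyone. [Figure: the six-node wee web-graph]
    $$P = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \longrightarrow \quad P = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 1 & 0 & 1 & 1/6 \\ 0 & 0 & 0 & 1 & 0 & 1/6 \\ 0 & 1/2 & 0 & 0 & 0 & 1/6 \end{bmatrix}$$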
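The fix generalizes to any graph: every column belonging to a page with no out-links is replaced by $1/n$ in each row. A sketch under the same edge-list assumption as before; afterwards every column of $P$ sums to 1:

```python
from fractions import Fraction

# Our reading of the wee web-graph: (j, i) means page j links to page i.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 6), (3, 4), (4, 5), (5, 4)]
N = 6

def fixed_link_matrix(edges, n):
    """Column-stochastic P where any page with no out-links links to everyone."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src - 1] += 1
    P = [[Fraction(0)] * n for _ in range(n)]
    for src, dst in edges:
        P[dst - 1][src - 1] = Fraction(1, out_degree[src - 1])
    for m in range(n):
        if out_degree[m] == 0:              # dangling ("anti-social") page
            for k in range(n):
                P[k][m] = Fraction(1, n)    # spread its vote over all n pages
    return P

P = fixed_link_matrix(EDGES, N)
col_sums = [sum(P[k][m] for k in range(N)) for m in range(N)]
print(col_sums)   # every column now sums to 1
```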
  • 22. Taxation is the way to representation! [Figure: pages a, b, and c linking to a single page] If the page that a, b, and c link to is a good page, then it will still be a good page if we “tax” the importance from a, b, and c. We can redistribute the taxed amounts to all pages, including lonely nodes!
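  • 23. The importance of a page is determined by the importance of pages that link to it* (*after tax and any benefits):
    $$x_i = \sum_{j \in B_i} \alpha \frac{x_j}{d_j} + (1 - \alpha)\, b_i$$
    Here $\alpha x_j / d_j$ is the total importance that page $j$ contributes to page $i$, $b_i$ is the benefit paid to page $i$, and $\alpha$ sets the taxation rate for all pages: each page passes on a fraction $\alpha$ of its importance and is taxed the remaining $1 - \alpha$.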
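Taxation redistributes importance rather than creating or destroying it. A short derivation, assuming every page has at least one out-link (true once node 6 is fixed) and that the benefits sum to one, $\sum_i b_i = 1$:

$$
\begin{aligned}
\sum_i x_i &= \alpha \sum_i \sum_{j \in B_i} \frac{x_j}{d_j} + (1-\alpha)\sum_i b_i \\
&= \alpha \sum_j x_j \underbrace{\sum_{i \,:\, j \in B_i} \frac{1}{d_j}}_{=\,1 \text{ (page } j \text{ links to } d_j \text{ pages)}} + (1-\alpha) \\
&= \alpha \sum_j x_j + (1-\alpha)
\end{aligned}
\quad\Longrightarrow\quad (1-\alpha)\sum_i x_i = 1-\alpha \quad\Longrightarrow\quad \sum_i x_i = 1 \quad (\alpha < 1).
$$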
  • 24. In matrix form:
    $$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \alpha \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 0 & 0 & 0 & 1/6 \\ 1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 1 & 0 & 1 & 1/6 \\ 0 & 0 & 0 & 1 & 0 & 1/6 \\ 0 & 1/2 & 0 & 0 & 0 & 1/6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} + (1 - \alpha) \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}, \quad \text{i.e.} \quad x = \alpha P x + (1 - \alpha) b$$
    Perron and Frobenius showed that the new equation always has a unique solution (for $0 \le \alpha < 1$).
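Because the solution is unique, a small instance can simply be solved as the linear system $(I - \alpha P)x = (1 - \alpha)b$. A sketch using dense Gaussian elimination, assuming $\alpha = 0.85$ and $b_i = 1/6$ (these values are our assumption; they reproduce the numbers on the next slide):

```python
# Fixed matrix from slide 21 (node 6 links to everyone); alpha and b are assumed.
ALPHA = 0.85
P = [
    [0,   0,   0, 0, 0, 1/6],
    [1/3, 0,   0, 0, 0, 1/6],
    [1/3, 1/2, 0, 0, 0, 1/6],
    [1/3, 0,   1, 0, 1, 1/6],
    [0,   0,   0, 1, 0, 1/6],
    [0,   1/2, 0, 0, 0, 1/6],
]
b = [1 / 6] * 6
N = len(P)

def solve(A, rhs):
    """Dense Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# (I - alpha P) x = (1 - alpha) b
A = [[(1.0 if i == j else 0.0) - ALPHA * P[i][j] for j in range(N)] for i in range(N)]
x = solve(A, [(1 - ALPHA) * bi for bi in b])
print([round(v, 2) for v in x])   # [0.03, 0.04, 0.06, 0.43, 0.39, 0.05]
```

A direct solve is fine for six pages but hopeless for a trillion, which is why the iterative approach matters.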
  • 25. What von Mises and Richardson showed is that guess, check, and correct works! [Figure: the six-node wee web-graph] Repeat
    $$x^{(\text{new})} = \alpha P x^{(\text{old})} + (1 - \alpha) b$$
    For the wee web-graph (the values below correspond to $\alpha = 0.85$ and $b_i = 1/6$):
    $$\begin{aligned} x^{(\text{start})} &= (0.17,\ 0.17,\ 0.17,\ 0.17,\ 0.17,\ 0.17)^T \\ x^{(1)} &= (0.05,\ 0.10,\ 0.17,\ 0.38,\ 0.19,\ 0.12)^T \\ x^{(2)} &= (0.04,\ 0.06,\ 0.10,\ 0.36,\ 0.36,\ 0.08)^T \\ x^{(\infty)} &= (0.03,\ 0.04,\ 0.06,\ 0.43,\ 0.39,\ 0.05)^T \end{aligned}$$
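The guess-check-correct loop is a few lines of code. A minimal sketch of this Richardson iteration on the fixed wee web-graph, assuming $\alpha = 0.85$, uniform $b$, and a stopping tolerance of our choosing; the first two iterates and the limit round to the vectors on the slide:

```python
ALPHA = 0.85                  # assumption: reproduces the slide's numbers
P = [
    [0,   0,   0, 0, 0, 1/6],
    [1/3, 0,   0, 0, 0, 1/6],
    [1/3, 1/2, 0, 0, 0, 1/6],
    [1/3, 0,   1, 0, 1, 1/6],
    [0,   0,   0, 1, 0, 1/6],
    [0,   1/2, 0, 0, 0, 1/6],
]
b = [1 / 6] * 6

def richardson_step(x):
    """x_new = alpha * P x_old + (1 - alpha) * b"""
    return [ALPHA * sum(pij * xj for pij, xj in zip(row, x)) + (1 - ALPHA) * bi
            for row, bi in zip(P, b)]

def pagerank(tol=1e-12, max_iter=10_000):
    x = list(b)                       # x(start): every page gets 1/6
    history = [x]
    for _ in range(max_iter):
        x_new = richardson_step(x)
        change = sum(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        history.append(x)
        if change < tol:              # stop once the correction is negligible
            break
    return x, history

x, history = pagerank()
for k in (1, 2):
    print(k, [round(v, 2) for v in history[k]])
print("limit", [round(v, 2) for v in x])
```

Since $P$ is column-stochastic, each step shrinks the error by at least a factor of $\alpha$ (in the 1-norm), so the loop here stops after a couple of hundred sweeps.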
  • 26.
  • 27. There’s still a lot of work left to do to make a search engine: make it fast; watch out for spam; watch out for manipulation; personalize; experiment!
  • 28. Outline: 1. Google’s PageRank; 2. Multi-armed bandits and internet experiments
  • 30. [Photo: Las Vegas slot machines, http://upload.wikimedia.org/wikipedia/en/8/82/Las_Vegas_slot_machines.jpg] This! Pays out $0.92/dollar. Pays out $0.98/dollar. Pays out $0.95/dollar. Pays out $0.99/dollar.
  • 31. What in the heck does a multi-armed bandit have to do with Google?
  • 32. What in the heck does a multi-armed bandit have to do with Google? Pays out $0.92/view, pays out $0.66/view, pays out $0.91/view: to show ads. Pays out −$0.02/view: hide ads.
  • 33. How to optimize your website without exploiting the bandits: try condition A 100 times, find 45 “wins”; try condition B 100 times, find 85 “wins”; try condition C 100 times, find 10 “wins”; … Choose the best!
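The "choose the best" step amounts to comparing observed win rates. A sketch using the slide's tallies (the dictionary layout is ours):

```python
# Tallies from the slide: condition -> (trials, wins)
results = {"A": (100, 45), "B": (100, 85), "C": (100, 10)}

win_rate = {cond: wins / trials for cond, (trials, wins) in results.items()}
best = max(win_rate, key=win_rate.get)   # the condition with the highest rate
print(win_rate, "->", best)
```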
  • 34. This field has some of the best terminology: Explore! Exploit! Regret!
  • 35. This field has some of the best terminology: Explore – Visiting Las Vegas! Exploit – Your new winning strategy! Regret – That you didn’t quit after winning the first round.
  • 36. This field has some of the best terminology: Explore – Testing slot machines or experiments for their reward. Exploit – Playing the option with the best reward you’ve found so far. Regret – How much you lost due to exploration.
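The explore/exploit trade-off can be made concrete with epsilon-greedy, a standard bandit strategy that the slides do not name; it is swapped in here purely as an illustration. The sketch assumes the slide's observed win rates are the true payout rates and uses a fixed random seed:

```python
import random

# Assumption: treat the slide's observed win rates as the true payout rates.
TRUE_RATES = {"A": 0.45, "B": 0.85, "C": 0.10}

def epsilon_greedy(T, epsilon=0.1, seed=0):
    """Explore a random arm with probability epsilon, otherwise exploit the arm
    with the best observed win rate. Returns pulls, wins, and the expected
    regret versus always playing the best arm."""
    rng = random.Random(seed)
    arms = list(TRUE_RATES)
    pulls = {a: 0 for a in arms}
    wins = {a: 0 for a in arms}
    for t in range(T):
        if t < len(arms):
            arm = arms[t]                                      # try each arm once
        elif rng.random() < epsilon:
            arm = rng.choice(arms)                             # explore
        else:
            arm = max(arms, key=lambda a: wins[a] / pulls[a])  # exploit
        pulls[arm] += 1
        wins[arm] += rng.random() < TRUE_RATES[arm]            # True counts as 1
    best = max(TRUE_RATES.values())
    regret = sum((best - TRUE_RATES[a]) * n for a, n in pulls.items())
    return pulls, wins, regret

pulls, wins, regret = epsilon_greedy(T=1000)
print(pulls, wins, round(regret, 1))
```

Every exploratory pull of A or C adds to the regret; a smaller `epsilon` explores less but risks locking onto a bad arm after an unlucky start.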
  • 37. How to optimize your website without exploiting the bandits: try condition A 100 times, find 45 “wins”; try condition B 100 times, find 85 “wins”; try condition C 100 times, find 10 “wins”; … Choose the best! Pure exploration! We only exploit our findings at the end!
  • 38. How to optimize your website exploiting the bandits. Pure exploration: try condition A 5 times, find 4 wins; try condition B 5 times, find 4 wins; try condition C 5 times, find 2 wins. Exploit our knowledge: try condition A 7 times, find 3 wins; try condition B 7 times, find 5 wins; try condition C 1 time, find 0 wins. Estimated return: A 0.58, B 0.75, C 0.33.
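The estimated returns pool both phases for each condition (for example, A won 4 + 3 = 7 of 5 + 7 = 12 plays). A sketch of that tally; the tuple layout is ours:

```python
from collections import defaultdict

# (condition, trials, wins) in the order played on the slide
rounds = [("A", 5, 4), ("B", 5, 4), ("C", 5, 2),   # pure exploration
          ("A", 7, 3), ("B", 7, 5), ("C", 1, 0)]   # exploit our knowledge

trials, wins = defaultdict(int), defaultdict(int)
for cond, t, w in rounds:
    trials[cond] += t
    wins[cond] += w

estimate = {c: wins[c] / trials[c] for c in sorted(trials)}
print({c: round(v, 2) for c, v in estimate.items()})  # {'A': 0.58, 'B': 0.75, 'C': 0.33}
```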
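  • 39. The goal of these problems is to construct optimal strategies that minimize regret. Regret is how much you left “on the table” by exploring:
    $$\text{regret} = \mathbb{E}[\text{play best always}] - \mathbb{E}[\text{plays made based on data}]$$
    A zero-regret strategy is one where regret($T$ trials) is sublinear in $T$ as the number of plays $T \to \infty$. Per play, for the two strategies above:
    $$\text{regret}_{\text{100-each}} = \frac{255 - 140}{300} \approx 0.38, \qquad \text{regret}_{\text{30-mixed}} = \frac{25.5 - (0.45 \times 12 + 0.85 \times 12 + 0.1 \times 6)}{30} = 0.31$$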
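The two regret figures follow from the same calculation: compare the expected reward of always playing the best condition with the expected reward of the plays actually made, then divide by the number of plays. A sketch, assuming the win rates 0.45, 0.85, and 0.10 are the true rates:

```python
# Assumption: the observed win rates are the true payout rates.
TRUE_RATES = {"A": 0.45, "B": 0.85, "C": 0.10}

def per_play_regret(plays):
    """(E[play best always] - E[plays actually made]) / number of plays."""
    T = sum(plays.values())
    best = max(TRUE_RATES.values()) * T
    got = sum(TRUE_RATES[a] * n for a, n in plays.items())
    return (best - got) / T

print(round(per_play_regret({"A": 100, "B": 100, "C": 100}), 2))  # 100-each: 0.38
print(round(per_play_regret({"A": 12, "B": 12, "C": 6}), 2))      # 30-mixed: 0.31
```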
  • 40. “[The bandit problem] was formulated during the [second world] war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage.” Peter Whittle (Whittle, 1979), discussion of “Bandit processes and dynamical allocation indices”. The importance of bandit problems to website optimization, advertising, and recommendation has rejuvenated research on them, with fascinating new questions.
  • 41. Math is everywhere, especially in your favorite websites! Matrices and probability are key ingredients.
  • 42. PageRank on Wikipedia: the top 10 articles with the highest PageRank for three values of α (C: = Category, P: = Portal).
    α = 0.50: United States, C:Living people, France, Germany, England, United Kingdom, Canada, Japan, Poland, Australia.
    α = 0.85: United States, C:Main topic classif., C:Contents, C:Living people, C:Ctgs. by country, United Kingdom, C:Fundamental, C:Ctgs. by topic, C:Wikipedia admin., France.
    α = 0.99: C:Contents, C:Main topic classif., C:Fundamental, United States, C:Wikipedia admin., P:List of portals, P:Contents/Portals, C:Portals, C:Society, C:Ctgs. by topic.
    David F. Gleich (Sandia), Sensitivity, Purdue
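The Wikipedia rankings cannot be reproduced without the Wikipedia link graph, but the effect of α can be explored on the wee web-graph. A hedged toy sketch, assuming uniform $b$ and the fixed $P$ from slide 21, that reruns the iteration for the same three α values and prints each score vector and ranking:

```python
# Fixed wee web-graph matrix (node 6 links to everyone); uniform b assumed.
P = [
    [0,   0,   0, 0, 0, 1/6],
    [1/3, 0,   0, 0, 0, 1/6],
    [1/3, 1/2, 0, 0, 0, 1/6],
    [1/3, 0,   1, 0, 1, 1/6],
    [0,   0,   0, 1, 0, 1/6],
    [0,   1/2, 0, 0, 0, 1/6],
]
N = len(P)

def pagerank(alpha, tol=1e-12, max_iter=100_000):
    """Iterate x <- alpha P x + (1 - alpha) b until the 1-norm change < tol."""
    b = [1 / N] * N
    x = list(b)
    for _ in range(max_iter):
        x_new = [alpha * sum(p * xj for p, xj in zip(row, x)) + (1 - alpha) * bi
                 for row, bi in zip(P, b)]
        if sum(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

results = {alpha: pagerank(alpha) for alpha in (0.50, 0.85, 0.99)}
for alpha, x in results.items():
    ranking = sorted(range(1, N + 1), key=lambda i: -x[i - 1])   # pages, best first
    print(alpha, [round(v, 3) for v in x], "ranking:", ranking)
```

Larger α means less tax, so convergence slows (the error shrinks by roughly a factor of α per sweep) and more of the importance pools in the mutual-admiration pair, nodes 4 and 5.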