Comparing the Performance of US College Football Teams in the Web and on the Field - Presentation Transcript
Comparing the Performance of
US College Football Teams
in the Web and on the Field
Martin Klein Olena Hunsicker Michael L. Nelson
mklein@cs.odu.edu koval_olena@yahoo.com mln@cs.odu.edu
Old Dominion University
Hypertext 2009
Torino, Italy
06/30/2009
Naming Conventions
Football
Soccer
Motivation
• “Does Authority mean Quality?”[Amento00]
• Link-based web page metrics can be used to estimate
experts’ assessment of quality
• Lists compiled by experts are cool!
• Companies, schools, people, places, etc
• “Big 3” search engines play a central role in our lives
• “If I can’t find it in the top 10 it doesn’t exist in the web”
• SEOs
Do expert rankings of real-world entities
correlate with search engine ranking of
corresponding web resources?
Background
• Expert ranking of real-world entities:
• Collegiate football programs in the US
• Associated Press (AP) poll
• 65 sportswriters and broadcasters
• USA Today Coaches poll
• 63 college football head coaches
• Published once a week, top 25 teams, 25-1 point system
• “Big 3” search engines
• Google, Yahoo and MSN Live (APIs)
US College Football Season 2008
• 2008 season began on August 28th 2008
• Concluded January 8th 2009
• 18 instances of poll data:
• Final polls from 2007 season (as a baseline)
• 2008 pre-season polls
• once for each of the 16 weeks of the 2008 season
Mapping Resources to URLs
• Often impossible to distill the
canonical URL for a football program
• e.g. the query “Virginia Tech college football” returned:
• Official school page
• Commercial sports sites
• Wikipedia
• Blogs, Fan sites, etc
Mapping Resources to URLs
• Query 3 search engine APIs for representative URLs
• Query: schoolname+College+Football
• e.g.: Ohio+State+College+Football
• Aggregate the top 8 representative URLs (n = 1 .. 8)
• Temporal aspect in mind:
• Repeat query and renew aggregation weekly
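The query pattern above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `build_query` and `top_n_urls` are hypothetical, the actual search engine API calls are left out, and dropping repeated URLs while keeping result order is an assumption about how the top n URLs are collected.

```python
def build_query(school_name):
    """Join the school name with the fixed terms, '+'-separated as on the slide."""
    return "+".join(school_name.split() + ["College", "Football"])


def top_n_urls(result_urls, n):
    """Return the first n distinct URLs of one result list, keeping their order.

    Assumption: n is limited to 1..8, the range named on the slide.
    """
    if not 1 <= n <= 8:
        raise ValueError("n is expected to be in 1..8")
    top = []
    for url in result_urls:
        if url not in top:
            top.append(url)
        if len(top) == n:
            break
    return top


if __name__ == "__main__":
    print(build_query("Ohio State"))
    # Placeholder result list; a real run would use URLs returned by a search API.
    sample = ["http://a.example/", "http://b.example/", "http://a.example/"]
    print(top_n_urls(sample, 2))
```

Repeating the query weekly would then amount to calling these helpers once per week and storing the resulting URL lists with a timestamp.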
Ordinal Ranking of URLs from SE Queries
We are not interested in computing a search engine’s absolute
ranking for a particular URL (PR values)
rank(URL_A) = 0.92 rank(URL_B) = 0.73
rank(URL_C) = 0.42 rank(URL_D) = 0.13
BUT
We are determining that a search engine ranks URLs in order
rank(URL_A) ≥ rank(URL_B) ≥ rank(URL_C) ≥ rank(URL_D)
distance(URL_A, URL_B) = distance(URL_B, URL_C)
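A small sketch of what ordinal ranking means here, using the four example values from the slide. The helper name `ordinal_positions` is hypothetical. The absolute scores are used only to produce an order; afterwards each URL keeps just its position, so neighbouring URLs are always one step apart regardless of how far apart their scores were.

```python
def ordinal_positions(urls_in_result_order):
    """Map each URL to its 1-based position in an ordered result list."""
    return {url: pos for pos, url in enumerate(urls_in_result_order, start=1)}


def order_by_score(scores):
    """Order URLs by descending score; the scores themselves are then discarded."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Example values taken from the slide.
    scores = {"URL_A": 0.92, "URL_B": 0.73, "URL_C": 0.42, "URL_D": 0.13}
    positions = ordinal_positions(order_by_score(scores))
    print(positions)
    # Ordinal distance between neighbours is constant.
    print(positions["URL_B"] - positions["URL_A"],
          positions["URL_C"] - positions["URL_B"])
```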
Ordinal Ranking of URLs from SE Queries
• Search engines enforce query restrictions (query length,
number of queries per day, etc.)
• Build unbiased and overlapping queries
• site and OR operators
• Variation of strand sort
USC Georgia Ohio State Oklahoma Florida
site:http://usctrojans.cstv.com/sports/m-footbl/usc-m-footbl-body.html OR
site:http://uga.rivals.com/ OR
site:http://sportsillustrated.cnn.com/football/ncaa/teams/ohiost/ OR
site:http://www.soonersports.com/ OR
site:http://www.gatorzone.com/
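The following sketch shows one way such overlapping `site:` queries could be assembled, together with the textbook form of strand sort. Both are assumptions for illustration: the window size, the overlap, and the function names are hypothetical, and the authors' variation of strand sort is not described on the slide, so only the standard algorithm is shown.

```python
import heapq


def build_site_queries(urls, per_query=5, overlap=1):
    """Split urls into windows of per_query URLs joined by OR.

    Consecutive windows share `overlap` URLs so their result orders can be related.
    """
    if not 0 <= overlap < per_query:
        raise ValueError("overlap must be >= 0 and smaller than per_query")
    step = per_query - overlap
    queries = []
    for start in range(0, len(urls), step):
        window = urls[start:start + per_query]
        queries.append(" OR ".join("site:" + url for url in window))
        if start + per_query >= len(urls):
            break  # the last window already reaches the end of the list
    return queries


def strand_sort(items):
    """Textbook strand sort: pull ascending runs out of the input and merge them."""
    remaining = list(items)
    result = []
    while remaining:
        strand = [remaining.pop(0)]
        rest = []
        for item in remaining:
            if item >= strand[-1]:
                strand.append(item)  # extends the current ascending run
            else:
                rest.append(item)    # left for a later pass
        remaining = rest
        result = list(heapq.merge(result, strand))
    return result


if __name__ == "__main__":
    urls = ["http://www.soonersports.com/", "http://www.gatorzone.com/",
            "http://uga.rivals.com/"]
    for query in build_site_queries(urls, per_query=2, overlap=1):
        print(query)
    print(strand_sort([3, 1, 4, 1, 5, 9, 2, 6]))
```

Strand sort fits this setting because each query response is already an ordered run, which matches the sorted strands the algorithm merges.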
Weighting Ranked URLs
• If real-world resources are mapped to more than one URL
(n > 1)
• Need to accumulate ranking score
• Determine one final overall school score
• Assign weights per URL depending on their rank
Weight = 1 − P / T
P - Position of URL in result set
T - Total number of URLs in the list (n * number of teams)
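The weighting can be illustrated with a short sketch. Two points are assumptions, since the slide does not state them: positions P are taken as 1-based (so the last URL gets weight 0), and a school's overall score is taken as the sum of the weights of its URLs. The function names are hypothetical.

```python
from collections import defaultdict


def url_weight(position, total):
    """Weight = 1 - P/T, with P assumed to be a 1-based position."""
    if not 1 <= position <= total:
        raise ValueError("position must be in 1..total")
    return 1 - position / total


def school_scores(ranked_urls, url_to_school):
    """Accumulate URL weights per school (summing is an assumption)."""
    total = len(ranked_urls)
    scores = defaultdict(float)
    for position, url in enumerate(ranked_urls, start=1):
        scores[url_to_school[url]] += url_weight(position, total)
    return dict(scores)


if __name__ == "__main__":
    # Two hypothetical schools with n = 2 URLs each, so T = 4.
    ranked = ["a1", "b1", "a2", "b2"]
    owner = {"a1": "School A", "a2": "School A", "b1": "School B", "b2": "School B"}
    print(school_scores(ranked, owner))
```

Sorting the schools by descending score then yields the search engine side of the comparison with the polls.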
Correlation Results
Kendall Tau used to test for statistically significant (p<0.05) correlation
[Figure: Kendall Tau (y-axis) over time intervals (x-axis: 2007, W1 to W15) for Yahoo, Google and MSN, with P>0.05 values marked. Panels: Top 10 AP Poll, Top 10 USA Poll]
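For readers unfamiliar with the measure, here is a minimal pure-Python sketch of Kendall Tau. It computes the simple tau-a variant, which ignores ties in the denominator, and it does not compute the p-value used for the significance test; which variant the authors used is not stated on the slides. In practice a library routine such as `scipy.stats.kendalltau`, which also reports a p-value, would be the more common choice.

```python
from itertools import combinations


def kendall_tau(rank_x, rank_y):
    """Kendall tau-a between two rankings of the same items.

    rank_x[i] and rank_y[i] are the ranks of item i in the two lists.
    """
    if len(rank_x) != len(rank_y):
        raise ValueError("both rankings must cover the same items")
    n = len(rank_x)
    if n < 2:
        raise ValueError("at least two items are needed")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if sign > 0:
            concordant += 1   # pair ordered the same way in both rankings
        elif sign < 0:
            discordant += 1   # pair ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    poll = [1, 2, 3, 4]     # hypothetical poll ranks of four schools
    engine = [1, 3, 2, 4]   # hypothetical search engine ranks of the same schools
    print(kendall_tau(poll, engine))
```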
Correlation Results
Kendall Tau used to test for statistically significant (p<0.05) correlation
[Figure: Kendall Tau (y-axis) over time intervals (x-axis: 2007, W1 to W15) for Yahoo, Google and MSN, with P>0.05 values marked. Panels: Top 25 AP Poll, Top 25 USA Poll. Annotation: “Inertia”]
n-Values for Correlation
[Figure: n (y-axis, 1 to 8) over time intervals (x-axis: 2007, W1 to W15) for Yahoo, Google and MSN, with P>0.05 values marked. Panels: Top 10 AP Poll, Top 10 USA Poll]
n-Values for Correlation
[Figure: n (y-axis, 1 to 8) over time intervals (x-axis: 2007, W1 to W15) for Yahoo, Google and MSN, with P>0.05 values marked. Panels: Top 25 AP Poll, Top 25 USA Poll. Annotation: n=2..6]
Correlation of Overlapping URLs
Over Time
• 12 schools occur in all AP polls throughout the season
• Given the “inertia”, by how much does the web trail?
• Can we measure a “delayed correlation”?
• Declare AP ranking for each week as separate “truth values”
• Compute correlation between truth values and search engine
ranking
• Expect to see an increased correlation in the weeks following
the truth value
USC Georgia Ohio State Oklahoma
Florida Missouri Texas Texas Tech
Alabama BYU Penn State Utah
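The idea of a delayed correlation can be sketched as below. This is an illustration under assumptions, not the authors' procedure: rankings are stored as one list of ranks per week over a fixed school order, the correlation is the simple Kendall tau-a, and the names `lagged_correlations`, `ap_by_week` and `se_by_week` are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_x, rank_y):
    """Kendall tau-a between two rankings of the same items (no tie correction)."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    total = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        total += (sign > 0) - (sign < 0)
    return total / (n * (n - 1) / 2)


def lagged_correlations(ap_by_week, se_by_week, truth_week, max_lag):
    """Correlate the AP ranking of truth_week with search engine rankings
    of the same week and of up to max_lag following weeks."""
    truth = ap_by_week[truth_week]
    result = {}
    for lag in range(max_lag + 1):
        week = truth_week + lag
        if week >= len(se_by_week):
            break  # no search engine data that far ahead
        result[lag] = kendall_tau(truth, se_by_week[week])
    return result


if __name__ == "__main__":
    # Made-up ranks for three schools, for illustration only.
    ap = [[1, 2, 3]]
    se = [[3, 2, 1], [1, 3, 2], [1, 2, 3]]
    print(lagged_correlations(ap, se, truth_week=0, max_lag=2))
```

If the web trails the polls, the values in the returned mapping would be expected to rise with the lag before falling off again.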
Concluding Remarks
• Inspired by “Does Authority mean Quality?” we asked “Does
Quality mean Authority?”
• High correlations for the last season’s final rankings and
rankings early in the season
• Correlation decreases because of “inertia”
• No correlation between attendance and search engine rankings
• Better query for mapping URLs e.g., include nicknames such as
“Hokies”
• Since link-based metrics seem too slow, investigate more
dynamic metrics such as magnitude of search results, fan-based
message board activity, etc.
Although authority means quality,
quality does not necessarily mean
authority - at least not immediately.
Questions?