yourHistory - entity linking for a personalized timeline of historic events

David Graus
[Title slide: a timeline with decade ticks from 1900 to 2010, showing example events: Gaza War, Britches, World War II, Berlin Wall, Woodstock, 9/11, Gulf War, BET Hiphop Awards]

David Graus, Maria-Hendrike Peetz,
Daan Odijk, Maarten de Rijke, Ork de Rooij

Entity Linking for a personalized timeline of historic events

• Motivation
• Method
  • Part I: Fetch Candidate Historic Events
  • Part II: Generate User Profile
  • Part III: Matching Events to User Profile
  • Part IV: Scoring & Ranking Events
• Future Work

[…] To design and build innovative and robust prototypes and
demos for tools that analyse and/or integrate open web data for
educational purposes.
History education
Personalized historic timeline

[Figure: timeline with decade ticks from 1900 to 2010, showing example events: Gaza War, Britches, World War II, Berlin Wall, Woodstock, 9/11, Gulf War, BET Hiphop Awards]
Part I: Candidate Historic Events

select ?concept
where {
  ?concept rdf:type dbpedia-owl:Event
}
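As a rough sketch, the query above might be prepared for a SPARQL endpoint from Python as follows. The endpoint URL, the `dbo:` prefix (the current DBpedia name for what the slide calls `dbpedia-owl:`), the `LIMIT`, the `format` parameter, and both function names are assumptions for illustration, not details from the talk. The sketch only builds the request URL; fetching it is left to the caller.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the slides do not say which endpoint was used.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_event_query(limit=100):
    """Return a SPARQL query selecting resources typed as dbo:Event."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?concept WHERE {\n"
        "  ?concept rdf:type dbo:Event\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as a GET URL that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(build_event_query(limit=10))
print(url)
```

The `LIMIT` is only there to keep a first test request small; the query on the slide has none.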
Part II: User Profile

[Figure: my Facebook profile, consisting of a bio, posts, and likes]
Extract Information from Facebook profile

[Figure: my Facebook profile, consisting of a bio, posts, and likes]
Access Facebook profile

[Figure: my Facebook profile (bio, posts, likes) next to the JSON returned for it. The slide crops the response; the same response is shown in full under "Extract text attributes".]

Extract text attributes

  {
    "id": "1183880085",
    "likes": {
      "data": [
        {
          "category": "Musician/band",
          "created_time": "2013-10-27T11:37:51+0000",
          "name": "NAS",
          "id": "113591595350795"
        },
        {
          "category": "Company",
          "created_time": "2013-10-17T07:45:36+0000",
          "name": "Infinibase",
          "id": "573216229380347"
        },
        {
          "category": "Magazine",
          "created_time": "2013-10-04T13:55:10+0000",
          "name": "New Scientist NL",
          "id": "369158433181445"
        },
        {
          "category": "Tv show",
          "created_time": "2010-05-09T01:06:27+0000",
          "name": "The Wire",
          "id": "5991693871"
        }
      ]
    }
  }
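
A minimal sketch of how the "name" attributes might be pulled out of a response shaped like the one above. The function name `extract_like_names` is hypothetical, and the embedded response is a trimmed copy of the slide's example (the `created_time` fields are left out); the slides do not show the actual extraction code.

```python
import json

# Trimmed copy of the example response from the slide.
response_text = """
{
  "id": "1183880085",
  "likes": {
    "data": [
      {"category": "Musician/band", "name": "NAS", "id": "113591595350795"},
      {"category": "Company", "name": "Infinibase", "id": "573216229380347"},
      {"category": "Magazine", "name": "New Scientist NL", "id": "369158433181445"},
      {"category": "Tv show", "name": "The Wire", "id": "5991693871"}
    ]
  }
}
"""


def extract_like_names(profile):
    """Collect the "name" of every liked page, skipping entries without one."""
    likes = profile.get("likes", {}).get("data", [])
    return [item["name"] for item in likes if "name" in item]


names = extract_like_names(json.loads(response_text))
print(names)
```

Using `.get` with defaults means a profile without any likes yields an empty list instead of raising an error.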

• Story
• Omroep Maxim
• Gamer01
• Breaking Bad
• AT5
• Mad Men
• The Wire
• Monty Python's Flying Circus
• Flight of the Conchords
• Donnie Darko
• Flevopark Film Festival
• Do The Right Thing
• A Clockwork Orange
• Wild Style
• Princess Mononoke
• The Fountain
• Pi
• Northfork
• La Haine
• Zen and the Art of Motorcycle Maintenance
• Moon Palace
• The Fountainhead
• The Wind-Up Bird Chronicle
• Wu-Tang
• J.Cole
• NAS
• Pusha T
• ASAP Rocky
• Ab-Soul
• Chance The Rapper
• Cannibal Ox
• Bonobo
• Aesop Rock
• Boards Of Canada
• Jurassic 5
• GREMS
• Quasimoto
• Strange Journey Volume Three
• Drop Velvet
• MODESELEKTOR
• IAM
• Derek
• The Onion
• Imgur
• De Speld
• I Am Fucking Ambivalent About Science
• Chrietitie
• Infinibase
• Marktplaatspoëzie
• Jeannette Span : Spelen
Entity Linking
• Given a Knowledge Base
  • Link mentions of entities (or concepts) to their referent entities
Entity Linking
• From Wikipedia:
  • Extract anchor texts (words used to link to Wikipedia pages)
  • For each n-gram n ↔ Wikipedia page W, estimate:
    • Probability of using n-gram n to refer to Wikipedia page W
Entity Linking Example
Link Probability
“Nas” occurs 2475x in Wikipedia
• is anchor: 1723x, so 1723/2475 = 69.6%
• is no anchor: 752x, so 752/2475 = 30.4%
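
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `link_probability` is hypothetical, and the counts are the ones quoted on the slide.

```python
def link_probability(anchor_count, total_count):
    """Fraction of an n-gram's occurrences in which it is used as anchor text."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return anchor_count / total_count


# Counts for "Nas" as given on the slide: 1723 anchor uses out of 2475 occurrences.
p_link = link_probability(1723, 2475)
print(f"is anchor:    {p_link:.1%}")
print(f"is no anchor: {1 - p_link:.1%}")
```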
Entity Linking Example
Commonness
• Nas is used to refer to:
  • http://en.wikipedia.org/wiki/Nas: 14x, so 14/25 = 56%
  • http://en.wikipedia.org/wiki/Naas: 4x, so 4/25 = 16%
  • http://en.wikipedia.org/wiki/Nås: 3x, so 3/25 = 12%
  • http://en.wikipedia.org/wiki/Nas (Ikaria): 2x, so 2/25 = 8%
  • http://en.wikipedia.org/wiki/Untitled Nas album: 2x, so 2/25 = 8%
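
A small sketch of the commonness computation, using the counts from this example (14, 4, 3, 2, and 2 uses out of 25). The function name `commonness` and the dictionary layout are assumptions for illustration.

```python
def commonness(target_counts):
    """Normalise per-page anchor counts into the share of uses pointing at each page."""
    total = sum(target_counts.values())
    if total == 0:
        return {}
    return {page: count / total for page, count in target_counts.items()}


# How often the anchor text "Nas" points at each Wikipedia page, per the slide.
nas_targets = {
    "Nas": 14,
    "Naas": 4,
    "Nås": 3,
    "Nas (Ikaria)": 2,
    "Untitled Nas album": 2,
}

scores = commonness(nas_targets)
for page, share in scores.items():
    print(f"{page}: {share:.0%}")
```

The shares sum to 1, which is a quick sanity check on the percentages.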
  {
    "text": "Nas",
    "links": [
      {
        "senseProbability": 0.726027397260274,
        "title": "Nas",
        "url": "http://en.wikipedia.org/wiki/Nas"
      },
      {
        "senseProbability": 0.125,
        "title": "Naas",
        "url": "http://en.wikipedia.org/wiki/Naas"
      },
      {
        "senseProbability": 0.1111111111111111,
        "title": "Nås",
        "url": "http://en.wikipedia.org/wiki/N%C3%A5s"
      },
      {
        "senseProbability": 0.0006523157208088715,
        "title": "Nas (Ikaria)",
        "url": "http://en.wikipedia.org/wiki/Nas%20%28Ikaria%29"
      },
      {
        "senseProbability": 0.0006523157208088715,
        "title": "Untitled Nas album",
        "url": "http://en.wikipedia.org/wiki/Untitled%20Nas%20album"
      }
    ]
  }
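
One way to use output like this is to keep the candidate with the highest "senseProbability". The sketch below assumes that selection rule and an optional cut-off; the function name `best_sense` and the `threshold` parameter are hypothetical, and the slides do not state how the final link is chosen.

```python
# Candidate links for the mention "Nas", copied from the output above.
candidates = [
    {"senseProbability": 0.726027397260274, "title": "Nas"},
    {"senseProbability": 0.125, "title": "Naas"},
    {"senseProbability": 0.1111111111111111, "title": "Nås"},
    {"senseProbability": 0.0006523157208088715, "title": "Nas (Ikaria)"},
    {"senseProbability": 0.0006523157208088715, "title": "Untitled Nas album"},
]


def best_sense(links, threshold=0.0):
    """Return the most probable candidate, or None if it scores below the threshold."""
    if not links:
        return None
    top = max(links, key=lambda link: link["senseProbability"])
    return top if top["senseProbability"] >= threshold else None


print(best_sense(candidates)["title"])
```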
[Slide: the names extracted from the profile, AT5 through J.Cole, shown again]
Match Events to Profile Entities
Map Events to Wikipedia Entities
Matching metric #1: link overlap
[Figure: link graphs of a profile entity and two events. NAS links to U.S., Hiphop, Kanye West, Jay-Z, and Damian Marley. World War II links to Global War, U.S., Allies, and Axis, so it shares 1 link with NAS (U.S.). 51st Grammy Awards links to Jay-Z, Hiphop, Kanye West, and "Link #4", so it shares 3 links with NAS.]
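
A minimal sketch of link overlap as a set intersection, using the link sets read off the figure. Treating the metric as a plain count of shared links is an assumption based on the numbers 1 and 3 shown on the slide; the function name `link_overlap` is hypothetical.

```python
# Outgoing links as drawn on the slide ("Link #4" is the slide's own placeholder).
nas_links = {"U.S.", "Hiphop", "Kanye West", "Jay-Z", "Damian Marley"}
event_links = {
    "World War II": {"Global War", "U.S.", "Allies", "Axis"},
    "51st Grammy Awards": {"Jay-Z", "Hiphop", "Kanye West", "Link #4"},
}


def link_overlap(entity_links, other_links):
    """Count the pages that both link sets have in common."""
    return len(entity_links & other_links)


overlaps = {event: link_overlap(nas_links, links) for event, links in event_links.items()}
print(overlaps)
```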
Matching metric #2: direct link
[Figure: NAS with its links (U.S., Hiphop, Kanye West, Jay-Z, Damian Marley) and 51st Grammy Awards with its links (Jay-Z, Hiphop, Kanye West)]
Matching metric #3: textual similarity
[Figure: NAS compared with 51st Grammy Awards. A following slide lists the events 51st Grammy Awards and World War II with the scores 0.74 and 0.35.]
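
The slides do not say which textual similarity measure is used. As an illustration only, the sketch below uses cosine similarity over raw term counts, which is one common choice; the tokenizer, the function names, and the example strings are all assumptions rather than the method from the talk.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for the text of an entity page and an event page.
entity_text = "American rapper and songwriter from New York"
event_text = "Awards ceremony honoring music, including rap albums from American artists"
print(cosine_similarity(entity_text, event_text))
```

In practice the term counts would usually be weighted (for example with TF-IDF) so that frequent words contribute less.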
Combine scores & rank events
  "5043324": {
    "event_title": "Iraq War",
    "related_entity_title": "The Wire",
    "score": 1.0,
    "event_date": "2003-03-20"
  },
  "1376628": {
    "event_title": "Blankets (comics)",
    "related_entity_title": "Princess Mononoke",
    "score": 0.11465851113504691,
    "event_date": "2003-07-23"
  },
  "15694206": {
    "event_title": "2006 LG Hockey Games",
    "related_entity_title": "Reimersholme",
    "score": 0.3467068139664613,
    "event_date": "2006-04-29"
  },
  "4861876": {
    "event_title": "2005 UEFA Champions League Final",
    "related_entity_title": "Istanbul",
    "score": 1.0,
    "event_date": "2005-05-25"
  },
  "31966809": {
    "event_title": "63rd Primetime Emmy Awards",
    "related_entity_title": "Mad Men",
    "score": 0.04039278737569369,
    "event_date": "2011-09-18"
  },
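
A sketch of the ranking step on records like the ones above: keep the highest-scoring events, then order them by date for display on a timeline. How the three matching metrics are combined into "score" is not shown on the slides, so the sketch starts from the final scores; `top_events` and `timeline` are hypothetical names.

```python
# Scored events copied from the output above (related_entity_title omitted for brevity).
results = {
    "5043324": {"event_title": "Iraq War", "score": 1.0, "event_date": "2003-03-20"},
    "1376628": {"event_title": "Blankets (comics)", "score": 0.11465851113504691, "event_date": "2003-07-23"},
    "15694206": {"event_title": "2006 LG Hockey Games", "score": 0.3467068139664613, "event_date": "2006-04-29"},
    "4861876": {"event_title": "2005 UEFA Champions League Final", "score": 1.0, "event_date": "2005-05-25"},
    "31966809": {"event_title": "63rd Primetime Emmy Awards", "score": 0.04039278737569369, "event_date": "2011-09-18"},
}


def top_events(scored, k):
    """Return the k events with the highest score (ties keep their input order)."""
    ranked = sorted(scored.values(), key=lambda event: event["score"], reverse=True)
    return ranked[:k]


def timeline(events):
    """Order events chronologically; ISO dates sort correctly as plain strings."""
    return sorted(events, key=lambda event: event["event_date"])


for event in timeline(top_events(results, 3)):
    print(event["event_date"], event["event_title"], event["score"])
```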
Future Work
• Log interactions
• Interpret clicks as (implicit) feedback:
  • Click on Event: user is interested
  • No click on Event: user is not
• Learn scoring & ranking functions
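
As a sketch of the first two bullets, logged interactions could be turned into binary labels, which a learned scoring function might later be trained on. The function name `clicks_to_labels`, the 1/0 encoding, and the example log are assumptions; this is future work in the talk, so no implementation is described there.

```python
def clicks_to_labels(shown_event_ids, clicked_event_ids):
    """Label each shown event 1 if it was clicked (interested), else 0 (not interested)."""
    clicked = set(clicked_event_ids)
    return {event_id: 1 if event_id in clicked else 0 for event_id in shown_event_ids}


# Hypothetical log: three events were shown, and the user clicked one of them.
shown = ["5043324", "1376628", "4861876"]
clicked = ["4861876"]
labels = clicks_to_labels(shown, clicked)
print(labels)
```

Clicks are noisy evidence: an event that was shown but not clicked may simply not have been seen, which is why the slide calls the feedback implicit.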
Thank you! Questions?
Try yourHistory: http://apps.facebook.com/yourHistory
See our poster: #98

David Graus
d.p.graus@uva.nl
@dvdgrs
yourHistory - entity linking for a personalized timeline of historic events

  • 1. Gaza War Britches World War II Berlin Wall Woodstock 1950 1900 1910 1970 1920 9/11 Gulf War 1930 1980 1940 1950 1990 1960 BET Hiphop Awards 2000 1970 1980 2010 1990 2000 David Graus, Maria-Hendrike Peetz, Daan Odijk, Maarten de Rijke, Ork de Rooij 2010
  • 2. Entity Linking for a personalized timeline of historic events • Motivation • Method • • Part II: Generate User Profile • Part III: Matching Events to User Profile • • Part I: Fetch Candidate Historic Events Part IV: Scoring & Ranking Events Future Work
  • 3. • […] To design and build innovative and robust prototypes and demos for tools that analyse and/or integrate open web data for educational purposes.
  • 6. Personalized historic timeline Gaza War Britches World War II Berlin Wall Woodstock 1950 1900 1910 1970 1920 9/11 Gulf War 1930 1980 1940 1950 1990 1960 BET Hiphop Awards 2000 1970 1980 2010 1990 2000 2010
  • 7. Part I: Candidate Historic Events
  • 8. Part I: Candidate Historic Events select  ?concept     where  {       ?concept  rdf:type  dbpedia-­‐owl:Event       }
  • 9. concept       ept  rdf:type  dbpedia-­‐owl:Event    
  • 10. concept       ept  rdf:type  dbpedia-­‐owl:Event    
  • 14. Part II: User Profile MY FACEBOOK PROFILE BIO POST POST LIKES POST
  • 20. Extract Information from Facebook profile MY FACEBOOK PROFILE BIO POST POST LIKES POST
  • 21. Access Facebook profile MY FACEBOOK PROFILE BIO POST POST LIKES POST {   "id":  "1183880085",   "likes":  {          "data":  [              {                  "category":  "Musician/band",                  "created_time":  "2013-­‐10-­‐27T11:37:51+0                "name":  "NAS",                  "id":  "113591595350795"              },              {                  "category":  "Company",                  "created_time":  "2013-­‐10-­‐17T07:45:36+0                "name":  "Infinibase",                  "id":  "573216229380347"              },              {                  "category":  "Magazine",                  "created_time":  "2013-­‐10-­‐04T13:55:10+0                "name":  "New  Scientist  NL",                  "id":  "369158433181445"              },  
  • 22. Extract text attributes • • • • • • {   "id":  "1183880085",   "likes":  {          "data":  [              {                  "category":  "Musician/band",                  "created_time":  "2013-­‐10-­‐27T11:37:51+0000",                  "name":  "NAS",                  "id":  "113591595350795"              },              {                  "category":  "Company",                  "created_time":  "2013-­‐10-­‐17T07:45:36+0000",                  "name":  "Infinibase",                  "id":  "573216229380347"              },              {                  "category":  "Magazine",                  "created_time":  "2013-­‐10-­‐04T13:55:10+0000",                  "name":  "New  Scientist  NL",                  "id":  "369158433181445"              },              {                  "category":  "Tv  show",                  "created_time":  "2010-­‐05-­‐09T01:06:27+0000",                  "name":  "The  Wire",                  "id":  "5991693871"              }  ]   } • • • • • • • • • • • • • • Story   Omroep  Maxim   Gamer01   Breaking  Bad   AT5   Mad  Men   The  Wire   Monty  Python's   Flying  Circus   Flight  of  the   Conchords   Donnie  Darko   Flevopark  Film   Festival   Do  The  Right   Thing   A  Clockwork   Orange   Wild  Style   Princess   Mononoke   The  Fountain   Pi   Northfork   La  Haine   Zen  and  the  Art   of  Motorcycle   Maintenance   Moon  Palace   • • • • • • • • • • • • • • • • • • • • • • • • Fountainhead   The  Wind-­‐Up   Bird  Chronicle   Wu-­‐Tang   J.Cole   NAS   Pusha  T   ASAP  Rocky   Ab-­‐Soul   Chance  The   Rapper   Cannibal  Ox   Bonobo   Aesop  Rock   Boards  Of   Canada   Jurassic  5   GREMS   Quasimoto   Strange  Journey   Volume  Three   Drop  Velvet   MODESELEKTOR   IAM   Derek   The  Onion   Imgur   De  Speld   Wu-­‐Tang  
  • 23. • ASAP Rocky • Ab-Soul • Chance The Rapper • Cannibal Ox • Bonobo • Aesop Rock • Boards Of Canada • Jurassic 5 • GREMS • Quasimoto • Strange Journey Volume Three • Drop Velvet • MODESELEKTOR • IAM • Derek • The Onion • Imgur • De Speld • Wu-Tang • J.Cole • I Am Fucking Ambivalent About Science • NAS • Pusha T • ASAP Rocky • Chrietitie • Infinibase • Marktplaatspoëzie • Jeannette Span : Spelen
  • 24. Entity Linking • Given a Knowledge Base • Link mentions of entities (or concepts) to their referent entities
  • 25. Entity Linking • From Wikipedia: • Extract anchor texts (words used to link to Wikipedia pages) • For each n-gram n ↔ Wikipedia page W estimate: • Probability of using n-gram n to refer to Wikipedia page W
  • 26. Entity Linking Example Link Probability • “Nas” occurs 2475x in Wikipedia • is anchor: 1723x • is no anchor: 752x
  • 27. Entity Linking Example Link Probability • “Nas” occurs 2475x in Wikipedia • is anchor: 1723/2475 = 69.6% • is no anchor: 752/2475 = 30.4%
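The link probability example can be reproduced with a few lines of Python. This is a minimal sketch using the counts from the slide; the function name `link_probability` is an illustrative choice, not a name from the talk.

```python
def link_probability(anchor_count: int, total_count: int) -> float:
    """Fraction of an n-gram's occurrences in which it is used as anchor text."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return anchor_count / total_count


# "Nas" occurs 2475 times in Wikipedia, 1723 of them as anchor text.
nas_link_probability = link_probability(1723, 2475)
print(f"{nas_link_probability:.1%}")  # prints 69.6%
```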
  • 28. Entity Linking Example Commonness • Nas is used to refer to: • http://en.wikipedia.org/wiki/Nas • http://en.wikipedia.org/wiki/Naas • http://en.wikipedia.org/wiki/Nås • http://en.wikipedia.org/wiki/Nas (Ikaria) • http://en.wikipedia.org/wiki/Untitled Nas album
  • 29. Entity Linking Example Commonness • Nas is used to refer to: • http://en.wikipedia.org/wiki/Nas 14x • http://en.wikipedia.org/wiki/Naas 4x • http://en.wikipedia.org/wiki/Nås 3x • http://en.wikipedia.org/wiki/Nas (Ikaria) 2x • http://en.wikipedia.org/wiki/Untitled Nas album 2x
  • 30. Entity Linking Example Commonness • Nas is used to refer to: • http://en.wikipedia.org/wiki/Nas 14/25 = 56% • http://en.wikipedia.org/wiki/Naas 4/25 = 16% • http://en.wikipedia.org/wiki/Nås 3/25 = 12% • http://en.wikipedia.org/wiki/Nas (Ikaria) 2/25 = 8% • http://en.wikipedia.org/wiki/Untitled Nas album 2/25 = 8%
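Commonness is the share of an anchor text's links that point to each candidate page. The sketch below recomputes it from the per-page counts on the slide (14, 4, 3, 2 and 2 out of 25 links); the function name `commonness` and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict

# How often the anchor text "Nas" links to each Wikipedia page (counts from the slide).
nas_anchor_targets = {
    "Nas": 14,
    "Naas": 4,
    "Nås": 3,
    "Nas (Ikaria)": 2,
    "Untitled Nas album": 2,
}


def commonness(target_counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize per-page link counts into a probability for each candidate page."""
    total = sum(target_counts.values())
    if total == 0:
        return {}
    return {page: count / total for page, count in target_counts.items()}


nas_commonness = commonness(nas_anchor_targets)
for page, probability in nas_commonness.items():
    print(f"{page}: {probability:.0%}")
```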
  • 31.
    {
      "text": "Nas",
      "links": [
        { "senseProbability": 0.726027397260274, "title": "Nas", "url": "http://en.wikipedia.org/wiki/Nas" },
        { "senseProbability": 0.125, "title": "Naas", "url": "http://en.wikipedia.org/wiki/Naas" },
        { "senseProbability": 0.1111111111111111, "title": "Nås", "url": "http://en.wikipedia.org/wiki/N%C3%A5s" },
        { "senseProbability": 0.0006523157208088715, "title": "Nas (Ikaria)", "url": "http://en.wikipedia.org/wiki/Nas%20%28Ikaria%29" },
        { "senseProbability": 0.0006523157208088715, "title": "Untitled Nas album", "url": "http://en.wikipedia.org/wiki/Untitled%20Nas%20album" }
      ]
    }
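One way to consume linker output of this shape is to keep only the most probable sense for each mention. The following is a minimal sketch, assuming the field names shown on the slide; the function `best_sense` and the `min_probability` threshold are hypothetical and are not described in the talk.

```python
from typing import Dict, Optional

# Excerpt of the linker output from the slide (three of the five candidates).
linker_output = {
    "text": "Nas",
    "links": [
        {"senseProbability": 0.726027397260274, "title": "Nas",
         "url": "http://en.wikipedia.org/wiki/Nas"},
        {"senseProbability": 0.125, "title": "Naas",
         "url": "http://en.wikipedia.org/wiki/Naas"},
        {"senseProbability": 0.1111111111111111, "title": "Nås",
         "url": "http://en.wikipedia.org/wiki/N%C3%A5s"},
    ],
}


def best_sense(output: Dict, min_probability: float = 0.5) -> Optional[Dict]:
    """Return the candidate with the highest senseProbability, or None if
    there are no candidates or the best one falls below the threshold."""
    links = output.get("links", [])
    if not links:
        return None
    top = max(links, key=lambda link: link["senseProbability"])
    return top if top["senseProbability"] >= min_probability else None


chosen = best_sense(linker_output)
print(chosen["title"] if chosen else "no confident link")
```

A threshold like this trades recall for precision: ambiguous mentions are dropped from the profile instead of being linked to a doubtful page.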
  • 32. • AT5 • Mad Men • The Wire • Monty Python's Flying Circus • Flight of the Conchords • Donnie Darko • Flevopark Film Festival • Do The Right Thing • A Clockwork Orange • Wild Style • Princess Mononoke • The Fountain • Pi • Northfork • La Haine • Zen and the Art of Motorcycle Maintenance • Moon Palace • The Fountainhead • The Wind-Up Bird Chronicle • Wu-Tang • J.Cole
  • 34. Match Events to Profile Entities
  • 35. Match Events to Profile Entities
  • 36. Map Events to Wikipedia Entities
  • 37. Match Events to Profile Entities
  • 38. Matching metric #1: link overlap
  • 39. Matching metric #1: link overlap
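The slides name link overlap as a metric but do not say how it is computed. One common choice, used here purely as an assumption, is the Jaccard coefficient over the sets of Wikipedia pages that the profile entity and the event link to. The link sets below are toy data loosely based on the labels in the talk's figures, not real Wikipedia link lists.

```python
from typing import Set


def link_overlap(links_a: Set[str], links_b: Set[str]) -> float:
    """Jaccard coefficient: shared links divided by all distinct links."""
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a & links_b) / len(union)


# Toy link sets (assumed for illustration).
entity_links = {"Hiphop", "Kanye West", "Jay-Z", "Damian Marley"}  # profile entity: NAS
event_links = {"Jay-Z", "Hiphop", "Kanye West"}                    # event: 51st Grammy Awards

overlap = link_overlap(entity_links, event_links)
print(overlap)  # 3 shared links out of 4 distinct links
```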
  • 45. Matching metric #2: direct link (figure labels: U.S. Hiphop, NAS, Kanye West, Jay-Z, Damian Marley, Jay-Z, Hiphop, Kanye West, 51st Grammy Awards)
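A direct-link check can be sketched as a lookup in a table of outgoing links. This is a minimal sketch under the assumption that "direct link" means one page links to the other in either direction; the talk does not define it further. The `outlinks` graph is made-up toy data and `has_direct_link` is a hypothetical helper.

```python
from typing import Dict, Set

# Toy graph of outgoing page links (assumed for illustration, not real Wikipedia data).
outlinks: Dict[str, Set[str]] = {
    "51st Grammy Awards": {"Jay-Z", "Kanye West", "Hiphop"},
    "Jay-Z": {"51st Grammy Awards", "Hiphop"},
    "NAS": {"Hiphop", "Jay-Z"},
}


def has_direct_link(entity: str, event: str, graph: Dict[str, Set[str]]) -> bool:
    """True if the entity page links to the event page or the other way round."""
    return event in graph.get(entity, set()) or entity in graph.get(event, set())


print(has_direct_link("Jay-Z", "51st Grammy Awards", outlinks))
print(has_direct_link("NAS", "51st Grammy Awards", outlinks))
```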
  • 46. Matching metric #3: textual similarity (figure labels: NAS, 51st Grammy Awards)
  • 47. Matching metric #3: textual similarity (figure labels: NAS, 51st Grammy Awards)
  • 48. Matching metric #3: textual similarity (figure labels: NAS, 51st Grammy Awards)
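The slides do not state which textual similarity measure is used. As an assumption, the sketch below uses cosine similarity over bag-of-words term counts, a standard baseline for comparing two texts. The two example texts are invented placeholders, not actual Wikipedia content.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder texts standing in for the entity page and the event page.
entity_text = "American rapper and hip hop artist"
event_text = "Music awards ceremony honoring rap and hip hop albums"
similarity = cosine_similarity(entity_text, event_text)
print(round(similarity, 3))
```

In practice a weighting such as TF-IDF would usually replace raw counts so that frequent words like "and" contribute less.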
  • 50. Combine scores & rank events
    "5043324": { "event_title": "Iraq War", "related_entity_title": "The Wire", "score": 1.0, "event_date": "2003-03-20" },
    "1376628": { "event_title": "Blankets (comics)", "related_entity_title": "Princess Mononoke", "score": 0.11465851113504691, "event_date": "2003-07-23" },
    "15694206": { "event_title": "2006 LG Hockey Games", "related_entity_title": "Reimersholme", "score": 0.3467068139664613, "event_date": "2006-04-29" },
    "4861876": { "event_title": "2005 UEFA Champions League Final", "related_entity_title": "Istanbul", "score": 1.0, "event_date": "2005-05-25" },
    "31966809": { "event_title": "63rd Primetime Emmy Awards", "related_entity_title": "Mad Men", "score": 0.04039278737569369, "event_date": "2011-09-18" },
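Given scored events in the shape shown on this slide, the ranking step can be sketched as a sort. How the three matching metrics are combined into the single `score` is not stated in the slides, so this sketch only covers ranking already-combined scores. The tie-break on event date and the function name `rank_events` are assumptions.

```python
from typing import Dict, List, Optional

# Scored events copied from the slide, keyed by event id.
scored_events: Dict[str, Dict] = {
    "5043324": {"event_title": "Iraq War", "related_entity_title": "The Wire",
                "score": 1.0, "event_date": "2003-03-20"},
    "1376628": {"event_title": "Blankets (comics)", "related_entity_title": "Princess Mononoke",
                "score": 0.11465851113504691, "event_date": "2003-07-23"},
    "15694206": {"event_title": "2006 LG Hockey Games", "related_entity_title": "Reimersholme",
                 "score": 0.3467068139664613, "event_date": "2006-04-29"},
    "4861876": {"event_title": "2005 UEFA Champions League Final", "related_entity_title": "Istanbul",
                "score": 1.0, "event_date": "2005-05-25"},
    "31966809": {"event_title": "63rd Primetime Emmy Awards", "related_entity_title": "Mad Men",
                 "score": 0.04039278737569369, "event_date": "2011-09-18"},
}


def rank_events(events: Dict[str, Dict], top_n: Optional[int] = None) -> List[Dict]:
    """Sort events by descending score; ties are broken by earlier event date."""
    ranked = sorted(events.values(), key=lambda e: (-e["score"], e["event_date"]))
    return ranked if top_n is None else ranked[:top_n]


for event in rank_events(scored_events, top_n=3):
    print(event["event_date"], event["event_title"], event["score"])
```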
  • 53. Future Work • Log interactions • Interpret clicks as (implicit) feedback: • Click on Event: user is interested • No click on Event: user is not • Learn scoring & ranking functions
  • 54. Thank you! Questions? Try yourHistory: http://apps.facebook.com/yourHistory See our poster: #98

 

 David Graus

 d.p.graus@uva.nl @dvdgrs