Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems
Cataldo Musto, Giovanni Semeraro, Fedelucio Narducci
state of the art.
C.Musto, G.Semeraro - Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous
content-based adaptive systems - World Summit on Big Data and Organization Design, Paris, 16-17 May 2013
our research: personalization
Recommender Systems
Relevant items (movies, news, books, etc.) are pushed to the
user according to her preferences or her needs.
Amazon.com
Recommendations
current recommendation technologies share three
important drawbacks.
(1) training is a bottleneck.
need for explicit information about user interests.
(2) recsys are black boxes.
(3) suggestions are not surprising.
solution: exploiting big data to build a novel generation of content-based adaptive systems
current work.
near future work.
big data.
Information Overload
we can handle 126 bits of information
we deal with 393 bits of information
ratio: more than 3x (Source: Adrian C. Ott, The 24-Hour Customer)
consequence:
Information Overload
Big Data: obstacle or opportunity?
cornerstone 1
exploit social media to model user preferences.
social media are an opportunity: they provide information about user preferences
example
user preferences in music from Facebook
implicit preferences
example
Play.me
playlist
The most popular songs of the artists extracted from Last.fm (as well as those of the artists added through the enrichment) are proposed to the user.
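The playlist step above (artists from the user's profile, an enrichment step that adds further artists, then the most popular songs of each artist) can be sketched as follows. This is a minimal sketch under stated assumptions: the artist names, the similarity table used for enrichment and the per-artist song rankings are hypothetical in-memory placeholders, not calls to the Facebook or Last.fm APIs, and `enrich` and `build_playlist` are illustrative names, not Play.me's code.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data (placeholders, not real Facebook or Last.fm output):
# artists found in the user's profile, a similarity table used for the
# enrichment step, and each artist's songs ordered by popularity.
PROFILE_ARTISTS: List[str] = ["Artist A", "Artist B"]
SIMILAR_ARTISTS: Dict[str, List[str]] = {
    "Artist A": ["Artist C", "Artist B"],
    "Artist B": ["Artist D"],
}
TOP_SONGS: Dict[str, List[str]] = {
    "Artist A": ["A1", "A2", "A3"],
    "Artist B": ["B1", "B2"],
    "Artist C": ["C1"],
    "Artist D": ["D1", "D2"],
}


def enrich(artists: List[str], similar: Dict[str, List[str]]) -> List[str]:
    """Append artists similar to the profile artists, skipping duplicates."""
    enriched = list(artists)
    for artist in artists:
        for other in similar.get(artist, []):
            if other not in enriched:
                enriched.append(other)
    return enriched


def build_playlist(
    artists: List[str], top_songs: Dict[str, List[str]], per_artist: int = 2
) -> List[Tuple[str, str]]:
    """Take the `per_artist` most popular songs of every (enriched) artist."""
    playlist: List[Tuple[str, str]] = []
    for artist in artists:
        for song in top_songs.get(artist, [])[:per_artist]:
            playlist.append((artist, song))
    return playlist


if __name__ == "__main__":
    enriched = enrich(PROFILE_ARTISTS, SIMILAR_ARTISTS)
    for artist, song in build_playlist(enriched, TOP_SONGS):
        print(f"{artist}: {song}")
```

Keeping enrichment separate from song selection means the enrichment source can be swapped without touching the playlist logic.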
Myusic
recommendations
cornerstone 2
exploit entity linking algorithms to make user profiles more transparent and LOD-aware
MyFeeds
RSS recommendations
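A content-based RSS recommender can be made transparent by ranking items on the profile concepts they share and returning those shared concepts as the explanation for each suggestion. The sketch below assumes a user profile already expressed as weighted Wikipedia concepts and feed items already annotated with concepts; the data, the additive weighting and the `rank_items` name are hypothetical and are not taken from MyFeeds.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical profile: Wikipedia concepts with preference weights.
PROFILE: Dict[str, float] = {
    "Valentino_Rossi": 0.5,
    "MotoGP": 0.25,
    "Scuderia_Ferrari": 0.125,
}
# Hypothetical RSS items, each already annotated with linked concepts.
ITEMS: Dict[str, Set[str]] = {
    "feed-1": {"MotoGP", "Valentino_Rossi", "Mugello_Circuit"},
    "feed-2": {"Scuderia_Ferrari", "Formula_One"},
    "feed-3": {"Stock_market"},
}


def rank_items(
    profile: Dict[str, float], items: Dict[str, Set[str]]
) -> List[Tuple[str, float, List[str]]]:
    """Score each item by the summed weight of the profile concepts it contains.

    The matched concepts are returned with the score, so every
    suggestion carries its own explanation."""
    ranked = []
    for item_id, concepts in items.items():
        matched = sorted(c for c in concepts if c in profile)
        score = sum((profile[c] for c in matched), 0.0)
        ranked.append((item_id, score, matched))
    # Highest score first; ties broken by item id for a stable order.
    ranked.sort(key=lambda r: (-r[1], r[0]))
    return ranked


if __name__ == "__main__":
    for item_id, score, matched in rank_items(PROFILE, ITEMS):
        reason = ", ".join(matched) or "-"
        print(f"{item_id} score={score} because you like: {reason}")
```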
MyFeeds
transparent user preferences
extracted from Facebook.
MyFeeds
transparent user preferences
further processing
MyFeeds
entity linking algorithms
• They map free text to structured information
• Wikipedia pages or DBpedia nodes
• examples
• Tag.me, Wikipedia Miner, DBpedia Spotlight, etc.
Tag.me
extracts the Wikipedia pages the content refers to.
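To illustrate what mapping free text to Wikipedia pages means, here is a toy dictionary-based linker that uses greedy longest match over a hypothetical anchor dictionary. It is not Tag.me's algorithm: real linkers such as Tag.me also score the candidate senses of each anchor and disambiguate them, which this sketch skips entirely.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical anchor dictionary: lower-cased surface form -> Wikipedia page title.
ANCHORS: Dict[str, str] = {
    "valentino rossi": "Valentino_Rossi",
    "rossi": "Valentino_Rossi",
    "motogp": "MotoGP",
    "mugello": "Mugello_Circuit",
}


def link_entities(
    text: str, anchors: Dict[str, str], max_len: int = 3
) -> List[Tuple[str, str]]:
    """Greedy longest-match linking of word n-grams to Wikipedia page titles."""
    tokens = re.findall(r"\w+", text.lower())
    links: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in anchors:
                links.append((span, anchors[span]))
                i += n
                break
        else:
            # No anchor starts here: move to the next token.
            i += 1
    return links
```

For example, `link_entities("Valentino Rossi wins at Mugello, MotoGP fans celebrate", ANCHORS)` returns `[("valentino rossi", "Valentino_Rossi"), ("mugello", "Mugello_Circuit"), ("motogp", "MotoGP")]`; the shorter anchor "rossi" is not emitted separately because the longer match consumed it.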
Linked Open Data Cloud
DBpedia: a structured (RDF) representation of the information stored in Wikipedia.
Linked Open Data Cloud
Profiles based on Tag.me are LOD-aware
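One way to see why such profiles are LOD-aware: an English Wikipedia page title maps directly to a DBpedia resource URI, so each profile concept becomes a node in the LOD cloud whose RDF triples can be followed to related entities. The sketch below assumes a small hand-written list of triples (using `dct:subject`, the predicate DBpedia uses to link resources to Wikipedia categories) instead of a live SPARQL query; the triples and the helper names are illustrative only.

```python
from typing import List, Set, Tuple

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

Triple = Tuple[str, str, str]


def to_dbpedia_uri(wikipedia_title: str) -> str:
    """Map an English Wikipedia page title to its DBpedia resource URI
    (simplified: spaces become underscores, no further escaping)."""
    return DBPEDIA_RESOURCE + wikipedia_title.replace(" ", "_")


# Hypothetical sample triples; a real system would query DBpedia instead.
TRIPLES: List[Triple] = [
    (to_dbpedia_uri("Valentino Rossi"), DCT_SUBJECT,
     to_dbpedia_uri("Category:Italian motorcycle racers")),
    (to_dbpedia_uri("Max Biaggi"), DCT_SUBJECT,
     to_dbpedia_uri("Category:Italian motorcycle racers")),
    (to_dbpedia_uri("MotoGP"), DCT_SUBJECT,
     to_dbpedia_uri("Category:Grand Prix motorcycle racing")),
]


def expand(profile: Set[str], triples: List[Triple], predicate: str) -> Set[str]:
    """Objects reached from profile entities through `predicate`."""
    return {o for s, p, o in triples if s in profile and p == predicate}


def related_entities(profile: Set[str], triples: List[Triple], predicate: str) -> Set[str]:
    """Entities outside the profile that share an object (e.g. a category) with it."""
    shared = expand(profile, triples, predicate)
    return {s for s, p, o in triples if p == predicate and o in shared and s not in profile}
```

With a profile containing the Valentino Rossi and MotoGP resources, `related_entities` surfaces Max Biaggi through the shared category, which is the kind of reasoning over linked data that a plain keyword profile cannot support.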
cornerstone 3
exploit open knowledge sources to make recommendation techniques more serendipitous.
‘in vitro’ experiments
Watchmi plug-in
developed by Aprico.tv
From BOW to eBOW
Given the description of a TV show, we exploit ESA (Explicit Semantic Analysis) to obtain an enhanced representation.
The original bag-of-words (BOW) features are enriched with the Wikipedia articles most related to the TV show.
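The ESA step can be sketched in a simplified form: each Wikipedia article acts as a concept, words are weighted by TF-IDF within articles, and a text is interpreted as the sum of the concept weights of its words; the top-k concepts are then added to the original bag of words. The four-article corpus below is a hypothetical toy, and full ESA also normalizes the vectors and prunes the index, which this sketch omits.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy "Wikipedia": article title -> article text.
ARTICLES: Dict[str, str] = {
    "MotoGP": "motogp motorcycle racing championship grand prix riders duels",
    "Valentino_Rossi": "valentino rossi italian motogp rider champion",
    "Scuderia_Ferrari": "ferrari formula one racing team italian",
    "Rad_(Heraldik)": "rad wheel heraldry symbol coat of arms",
}
SHOW = "Rad an Rad Die besten Duelle der MotoGP"


def build_esa_index(articles: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Inverted index: word -> {concept: TF-IDF weight of the word in that article}."""
    n_docs = len(articles)
    tokenized = {title: text.lower().split() for title, text in articles.items()}
    df: Counter = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    index: Dict[str, Dict[str, float]] = {}
    for title, tokens in tokenized.items():
        for word, tf in Counter(tokens).items():
            idf = math.log(n_docs / df[word])
            if idf > 0:  # words found in every article carry no signal
                index.setdefault(word, {})[title] = tf * idf
    return index


def esa_concepts(text: str, index: Dict[str, Dict[str, float]], top_k: int = 2) -> List[str]:
    """Interpret the text as a weighted concept vector and return the top-k concepts."""
    scores: Counter = Counter()
    for word, tf in Counter(text.lower().split()).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += tf * weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:top_k]]


def ebow(text: str, index: Dict[str, Dict[str, float]], top_k: int = 2) -> Set[str]:
    """Original bag of words enriched with the top-k ESA concepts."""
    return set(text.lower().split()) | set(esa_concepts(text, index, top_k))
```

With this toy corpus, `esa_concepts(SHOW, build_esa_index(ARTICLES))` returns `["Rad_(Heraldik)", "MotoGP"]`: the repeated word "rad" pulls in the heraldry article, much like rad (heraldik) appears among the articles in the TV show example.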
TV SHOW
Rad an Rad: Die besten Duelle der MotoGP
(Wheel to wheel: the best duels in MotoGP)
Wikipedia articles:
großer preis von italien (motorrad)
großer preis von malaysia (motorrad)
großer preis von tschechien (motorrad)
scuderia ferrari
valentino rossi
motorrad-wm-saison 2005
motorrad-wm-saison 2006
max biaggi
großer preis der usa (motorrad)
motorrad-wm-saison 2008
rad (heraldik)
loris capirossi
shin'ya nakano
motogp
example
From BOW to eBOW
challenges.
issues.
recommendations.
Challenges and Issues
• Main challenge and issue:
• data representation and data filtering
• How to exploit these novel data silos?
• What information is relevant for personalization?
• What kind of processing do these data need?
• Which representation is best?
• Do reasoning techniques improve profile transparency and personalization accuracy?
• Do people accept the exploitation of these data?
• How to model the context?
Recommendations
• Cornerstones
• Social media-based user profiling
• LOD-aware user profiles
• Open Knowledge Sources for Serendipitous Encounters
• Recommendations
• Promote the LOD initiative: publish data in a structured form to enable reasoning on the information
• Make data silos interconnected
• Design applications able to properly model, manage and exploit the large amount of data coming from social media.
questions?
Cataldo Musto, Ph.D. - cataldo.musto@uniba.it
