Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous content-based adaptive systems

Cataldo Musto, Giovanni Semeraro, Fedelucio Narducci

state of the art.
C.Musto, G.Semeraro - Mining Big Data and Open Knowledge Sources to develop transparent and serendipitous
content-based adaptive systems - World Summit on Big Data and Organization Design, Paris, 16-17 May 2013

our research: personalization

Recommender Systems
Relevant items (movies, news, books, etc.) are pushed to the
user according to her preferences or her needs.

Amazon.com
Recommendations

current recommendation technologies share three
important drawbacks.

(1) training is a bottleneck.

need for
explicit
information
about
user interests.

(2) recsys are black boxes.

(3) suggestions are not surprising.

exploiting big data to build a novel generation
of content-based adaptive systems
solution

current work.
near future work.

big data.

Information
Overload
we can handle 126 bits of information
we deal with 393 bits of information
ratio: more than 3x (Source: Adrian C. Ott, The 24-hour customer)
consequence:

Information Overload

Big Data: obstacle or
opportunity?

cornerstone 1
exploit social media to
model user
preferences.

social media are an opportunity
provide information about user preferences

example
user preferences in music from Facebook

implicit preferences
example

Play.me
playlist
Most popular songs of the artists extracted from Last.fm (as well as
those added through the enrichment) are proposed to the user.
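
The playlist step can be sketched as follows. This is a minimal illustration, not the Myusic implementation: the `TOP_TRACKS` and `RELATED` tables, the artist names, and the `build_playlist` function are hypothetical stand-ins for data that a real system would fetch from external services.

```python
from itertools import chain

# Hypothetical data: most popular tracks per artist, most popular first.
TOP_TRACKS = {
    "Artist A": ["A1", "A2", "A3"],
    "Artist B": ["B1", "B2"],
    "Artist C": ["C1"],
}

# Hypothetical enrichment table: artist -> related artists to add.
RELATED = {"Artist A": ["Artist C"]}


def build_playlist(liked_artists, per_artist=2):
    """Return the top tracks of the liked artists plus the enrichment artists."""
    enriched = list(liked_artists)
    for artist in liked_artists:
        for other in RELATED.get(artist, []):
            if other not in enriched:  # avoid duplicate artists
                enriched.append(other)
    # Keep at most `per_artist` tracks for each artist, in artist order.
    return list(
        chain.from_iterable(TOP_TRACKS.get(a, [])[:per_artist] for a in enriched)
    )


playlist = build_playlist(["Artist A", "Artist B"])
```

Here "Artist C" enters the playlist only through the enrichment step, which is the part that widens the user's explicit preferences.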

Myusic
recommendations

cornerstone 2
exploit entity linking algorithms
to make user profiles more
transparent and LOD-aware

MyFeeds
RSS recommendations

MyFeeds
transparent user preferences
extracted from Facebook.

MyFeeds
transparent user preferences
further processing

MyFeeds
entity linking algorithms
• They map free text to structured
information
• Wikipedia pages or DBpedia nodes
• examples
• Tag.me, Wikipedia Miner, DBpedia
Spotlight, etc.
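
To make the idea concrete, here is a toy sketch of dictionary-based entity linking. It is not the API of Tag.me, Wikipedia Miner or DBpedia Spotlight: the `ANCHORS` dictionary, its entries, and the `link_entities` function are assumptions made for illustration, and real linkers also disambiguate each mention using its context.

```python
import re

# Hypothetical anchor dictionary: surface form -> DBpedia resource URI.
# The entries are illustrative; a real linker derives them from Wikipedia.
ANCHORS = {
    "valentino rossi": "http://dbpedia.org/resource/Valentino_Rossi",
    "motogp": "http://dbpedia.org/resource/Grand_Prix_motorcycle_racing",
    "ferrari": "http://dbpedia.org/resource/Scuderia_Ferrari",
}


def link_entities(text, anchors=ANCHORS, max_len=3):
    """Greedy longest-match lookup of word n-grams in the anchor dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    links, i = [], 0
    while i < len(tokens):
        # Try the longest span first so "valentino rossi" beats "rossi".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in anchors:
                links.append((span, anchors[span]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return links


links = link_entities("Valentino Rossi wins another MotoGP race")
```

Each linked mention carries a URI, so a profile built this way is a set of nodes in the LOD cloud rather than a bag of ambiguous keywords.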

Tag.me
extracts the Wikipedia pages the content refers to.

Linked Open Data Cloud
Structured
(RDF)
representation
of the information
stored in Wikipedia.

Linked Open Data Cloud
Profiles based
on Tag.me are
LOD-aware

cornerstone 3
exploit open knowledge sources
to make recommendation
techniques more serendipitous.

‘in vitro’ experiments
Watchmi plug-in
developed by Aprico.tv

From BOW to eBOW
Given a description of a TV show, we exploit Explicit Semantic
Analysis (ESA) to obtain an enhanced representation.
The original set of features is enriched with the set of
Wikipedia articles most related to the TV show.

TV SHOW
Rad an Rad
Die besten Duelle der MotoGP
(Wheel to wheel
The best duels in the MotoGP)
Wikipedia Articles
großer preis von italien (motorrad)
großer preis von malaysia (motorrad)
großer preis von tschechien (motorrad)
scuderia ferrari
valentino rossi
motorrad-wm-saison 2005
max biaggi
großer preis der usa (motorrad)
rad (heraldik)
loris capirossi
shin’ya nakano
motogp
example
From BOW to eBOW
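
The enrichment from BOW to eBOW can be sketched as below. This is a simplified, ESA-style illustration under stated assumptions, not the pipeline used in the experiments: the four-article `ARTICLES` corpus is invented, and `build_index` and `ebow` are hypothetical names. Terms are matched without stemming or stop-word removal.

```python
import math
from collections import Counter, defaultdict

# Hypothetical miniature "Wikipedia": article title -> article text.
ARTICLES = {
    "valentino rossi": "italian motorcycle racer motogp champion duel",
    "motogp": "motorcycle grand prix racing championship motogp",
    "rad (heraldik)": "wheel symbol heraldry coat arms",
    "scuderia ferrari": "italian racing team formula one car",
}


def build_index(articles):
    """Inverted index: term -> {article title: TF-IDF weight}."""
    n_docs = len(articles)
    tfs = {title: Counter(text.split()) for title, text in articles.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    index = defaultdict(dict)
    for title, tf in tfs.items():
        for term, count in tf.items():
            index[term][title] = count * math.log(n_docs / df[term])
    return index


def ebow(description, index, k=2):
    """Return the BOW followed by the k highest-scoring article titles."""
    bow = description.lower().split()
    scores = Counter()
    for term in bow:
        # Sum the article weights of every term occurrence in the description.
        for title, weight in index.get(term, {}).items():
            scores[title] += weight
    concepts = [title for title, score in scores.most_common(k) if score > 0]
    return bow + concepts


index = build_index(ARTICLES)
enriched = ebow("the best duel in motogp wheel to wheel", index)
```

With this toy corpus the repeated word "wheel" pulls in "rad (heraldik)" next to "valentino rossi", which mirrors the loosely related article in the example above: the added concepts go beyond the literal topic of the show.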

challenges.
issues.
recommendations.

Challenges and Issues
• Main challenge and issue:
• data representation and data filtering
• How to exploit these novel data silos?
• What information is relevant for personalization?
• What kind of processing do data need?
• Which one is the best representation?
• Do reasoning techniques improve profile transparency and
personalization accuracy?
• Do people accept the exploitation of these data?
• How to model the context?

Recommendations
• Cornerstones
• Social media-based user profiling
• LOD-aware user profiles
• Open Knowledge Sources for Serendipitous Encounters
• Recommendations
• Promote the LOD initiative to publish data in a structured
form and enable reasoning on the information
• Interconnect data silos
• Design applications able to properly model, manage and
exploit the large amount of data coming from social media.

questions?
Cataldo Musto, Ph.D. - cataldo.musto@uniba.it
