The document compares different methods for concept recommendation in open innovation scenarios using Linked Data. It describes using hyProximity and Random Indexing to suggest related concepts for innovation problems by exploring relationships between concepts in DBpedia. An evaluation of the methods on real problems showed they provided a reasonable list of concepts and had higher precision and unexpectedness than a baseline in a user study.
Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario
1. Linked Data-based Concept Recommendation:
Comparison of Different Methods in Open Innovation Scenario
Danica Damljanovic, Milan Stankovic, Philippe Laublet
4. Finding Meaningful Connections
(Figure: Kaolinite, Clay, mining, extraction from … rocks …)
Different communities use different terms and concepts to speak about semantically related things. Such “language” defines communities and separates them. Being able to find meaningful connections between concepts would enable us to build bridges between people and content.
http://bit.ly/hyProximity
5. Concept recommendation
• Concepts you might not know but might want to use: to annotate your content, to search for content, to search for people…
• Help problem promoters discover relevant concepts (problem promoters are sometimes not field experts)
• Discovery = relevance + unexpectedness
6. Discovering Direct and Lateral Concepts
• hyProximity, a structure-based similarity measure
• Structure-based statistical semantics similarity:
Random Indexing, a well-known statistical semantics
method brought from Information Retrieval to RDF
7. Linked Data-based Concept Recommendation
(Figure: Textual Input → Zemanta → Concepts found in the text → DBpedia Exploration → DBpedia suggestions)
8. hyProximity
• We start from several seed concepts found directly in the text, and search the DBpedia graph
• The concepts found in the proximity of several seed concepts are considered more “in context” for the given input
• Concepts found at a shorter distance from the seed concepts have higher hyProximity
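The three points above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the `max_depth` cut-off and the `1 / distance` weighting are assumptions chosen only to show that candidates close to several seeds accumulate a higher score.

```python
from collections import deque


def distances_from(graph, seed, max_depth):
    """Breadth-first search: hop count from `seed` to each concept within max_depth."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the exploration radius
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def proximity_scores(graph, seeds, max_depth=2):
    """Sum an assumed 1/distance weight over every seed that reaches a candidate."""
    scores = {}
    for seed in seeds:
        for concept, d in distances_from(graph, seed, max_depth).items():
            if concept in seeds:
                continue  # seeds themselves are not suggestions
            scores[concept] = scores.get(concept, 0.0) + 1.0 / d
    return scores


# Toy undirected graph with hypothetical links (not real DBpedia data).
toy_graph = {
    "Kaolinite": ["Clay"],
    "Clay": ["Kaolinite", "Mining", "Ceramics"],
    "Mining": ["Clay", "Rocks"],
    "Rocks": ["Mining"],
    "Ceramics": ["Clay"],
}

scores = proximity_scores(toy_graph, seeds={"Kaolinite", "Rocks"})
ranked = sorted(scores, key=lambda c: (-scores[c], c))
```

In this toy graph "Clay" and "Mining" are each reached from both seeds, so they outrank "Ceramics", which is reached from only one.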
9. Different Distance Functions
(Figure: category nodes Things in France, Rivers in France, Cities in France, Products of France, Car Industry; concept nodes Marne, Seine, Paris, Chanel, Peugeot, BMW; edge types skos:broader and other property; distance labels 2, 2, 2, 2+1)
• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal
research.hypios.com/hyproximity
10. Different Distance Functions
(Figure: category nodes Things in France, Rivers in France, Cities in France, Products of France, Car Industry; concept nodes Marne, Seine, Paris, Chanel, Peugeot, BMW; edge types skos:broader and other property; transversal link labels flows through, famous for “fashion”, competitor; distance labels 1, 1, 1)
• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal
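The mixed function is described only as a linear combination, so the following is a hedged sketch of what that could look like. The function name `mixed_score`, the equal weights and the per-concept scores are all hypothetical; the slides do not give the actual coefficients.

```python
def mixed_score(hierarchical, transversal, w_hier=0.5, w_trans=0.5):
    """Linear combination of two per-concept score dictionaries.

    The weights are assumptions; a concept missing from one dictionary
    contributes 0 for that component.
    """
    concepts = set(hierarchical) | set(transversal)
    return {
        c: w_hier * hierarchical.get(c, 0.0) + w_trans * transversal.get(c, 0.0)
        for c in concepts
    }


# Illustrative scores for the seed "Peugeot" (made-up numbers).
hierarchical = {"Chanel": 0.5, "BMW": 0.5}  # reached through shared categories
transversal = {"BMW": 1.0}                  # reached through a direct property link

mixed = mixed_score(hierarchical, transversal)
```

With these made-up inputs "BMW" ends up ahead of "Chanel" because it is supported by both the hierarchical and the transversal component.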
11. Random Indexing
• Words which appear in similar contexts (with the
same set of other words) are contextually related,
e.g. synonyms.
• Synonyms tend not to co-occur with one another
directly, so indirect inference is required to draw
associations between words used to express the
same idea
12. Two steps to Random
Indexing
• Indexing
o For an RDF graph, generate virtual documents
o Prepare the corpus (pre-processing)
o Generate semantic index
• Search - given a term X, calculate the cosine similarity
between the vector of that term and the other vectors
in the semantic space
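The two steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the setup used in the experiments: the term-based variant, the vector dimension, the number of non-zero entries and the toy documents are all hypothetical choices.

```python
import math
import random


def index_vector(dim, nonzero, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def build_semantic_index(documents, dim=200, nonzero=10, seed=0):
    """Indexing step: a term's context vector is the sum of the index
    vectors of the terms it co-occurs with in a document."""
    rng = random.Random(seed)
    index_vectors, context = {}, {}
    for doc in documents:
        terms = list(dict.fromkeys(doc))  # de-duplicate, keep order
        for term in terms:
            if term not in index_vectors:
                index_vectors[term] = index_vector(dim, nonzero, rng)
                context[term] = [0] * dim
        for term in terms:
            for other in terms:
                if other != term:
                    for i, v in enumerate(index_vectors[other]):
                        context[term][i] += v
    return context


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(index, term, top_n=3):
    """Search step: rank the other terms by cosine similarity to `term`."""
    target = index[term]
    scored = [(cosine(target, vec), other) for other, vec in index.items() if other != term]
    return sorted(scored, reverse=True)[:top_n]


# Toy "virtual documents"; "car" and "automobile" never co-occur.
docs = [
    ["car", "engine", "wheel"],
    ["automobile", "engine", "wheel"],
    ["clay", "kaolinite", "mineral"],
]
index = build_semantic_index(docs)
best = most_similar(index, "car", top_n=1)
```

Although "car" and "automobile" never appear in the same document, they share the same neighbours, so their context vectors coincide. This is the indirect inference that direct co-occurrence counts would miss.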
14. Indexing: virtual documents
(Figure: the representative subgraph for URI=S, made of subject S, properties P1 to P10, objects O1 and O2 and literals L1 to L8, is lexicalised into the virtual document for URI=S)
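As a rough illustration of lexicalising a subgraph into a virtual document, the sketch below flattens the triples about one URI into a string of words. It rests on assumptions: only one hop around the subject is used, URIs are lexicalised by their last path segment, and the `example.org` triples are invented; the slide does not specify how the representative subgraph is actually selected.

```python
def local_name(uri):
    """Crude lexicalisation: last URI segment, underscores turned into spaces."""
    tail = uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    return tail.replace("_", " ")


def virtual_document(triples, subject):
    """Concatenate lexicalised properties and objects of all triples about `subject`."""
    parts = [local_name(subject)]
    for s, p, o in triples:
        if s != subject:
            continue
        parts.append(local_name(p))
        # URI objects are lexicalised; plain literals are kept as text.
        parts.append(local_name(o) if o.startswith("http") else o)
    return " ".join(parts)


# Hypothetical triples (subject, property, object); not real DBpedia data.
triples = [
    ("http://example.org/resource/Kaolinite",
     "http://example.org/ontology/type",
     "http://example.org/resource/Clay_mineral"),
    ("http://example.org/resource/Kaolinite",
     "http://example.org/ontology/label",
     "Kaolinite"),
    ("http://example.org/resource/Clay_mineral",
     "http://example.org/ontology/label",
     "Clay mineral"),
]

doc = virtual_document(triples, "http://example.org/resource/Kaolinite")
```

The resulting string can then be tokenised and fed to the pre-processing and semantic indexing steps like any ordinary text document.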
15. Experiments
• 26 real innovation problems from Hypios
• Measure of success: the suggested concepts
appear in the actual solutions (precision, recall,
F-measure)
(+) reasonable list of concepts from real scenarios
(-) not complete:
o User study: measure discovery = relevance
+ unexpectedness
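For reference, the three measures can be computed from a list of suggested concepts and the set of concepts found in the actual solutions. The sketch below uses the standard definitions; the concept names are made up and do not come from the Hypios problems.

```python
def precision_recall_f1(suggested, gold):
    """Standard set-based precision, recall and balanced F-measure."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical suggestions and gold-standard solution concepts.
suggested = ["Clay", "Mining", "Ceramics", "Rocks"]
gold = ["Clay", "Mining", "Kaolinite"]

precision, recall, f1 = precision_recall_f1(suggested, gold)
```

Here two of the four suggestions are in the gold standard (precision 0.5) and two of the three gold concepts are recovered (recall 2/3), giving an F-measure of 4/7.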
16. DBpedia Dataset
• Select a number of properties relevant to the Open
Innovation-related scenario
• dbo:product, dbp:products, dbo:industry,
dbo:service, dbo:genre, and properties serving to
establish a hierarchical categorization of
concepts, namely dc:subject and skos:broader
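One way to restrict exploration to the selected properties is to list them in a SPARQL `VALUES` clause. The sketch below only builds the query string and does not contact any endpoint. The helper name `neighbours_query`, the grouping into two lists, the seed URI and the binding of `dc:` to the Dublin Core terms namespace are assumptions made for illustration.

```python
# Prefix bindings; the dc: binding is an assumption about the intended namespace.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/property/",
    "dc": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Property names as listed on the slide (the dbp:products spelling is assumed).
DOMAIN_PROPERTIES = ["dbo:product", "dbp:products", "dbo:industry",
                     "dbo:service", "dbo:genre"]
HIERARCHY_PROPERTIES = ["dc:subject", "skos:broader"]


def neighbours_query(seed_uri, properties):
    """Build a SPARQL query for the objects linked to seed_uri by the given properties."""
    prefix_block = "\n".join(f"PREFIX {k}: <{v}>" for k, v in PREFIXES.items())
    values = " ".join(properties)
    return (
        f"{prefix_block}\n"
        "SELECT ?p ?o WHERE {\n"
        f"  VALUES ?p {{ {values} }}\n"
        f"  <{seed_uri}> ?p ?o .\n"
        "}"
    )


# Illustrative seed URI.
query = neighbours_query("http://dbpedia.org/resource/Peugeot", DOMAIN_PROPERTIES)
```

Passing `HIERARCHY_PROPERTIES` instead would produce the corresponding query over the categorization links.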
17. Evaluation
• “Gold standard”
o Extract problem URIs
o Extract solution URIs
• Baseline:
o Google AdWords Keyword Tool: finds similar
topics based on their distribution in textual
corpora and the corpora of search queries.
o Suggesting up to 600 concepts which are then
used for Web crawling for finding experts.
19. User Study
• Suggestions being both relevant and unexpected
o the most valuable discoveries for the user
• 12 users
• 34 problem evaluations
o 3060 suggested concepts/keywords.
• For the chosen innovation problem, the evaluators
were presented with the lists of 30 top-ranked
suggestions generated by AdWords, hyProximity
(mixed approach) and Random Indexing.
22. Conclusion
• Linked Data valuable source of knowledge for
concept recommendation
• Our two methods complementary
o hyProximity better for precision
o Random Indexing better for recall
• User study: unexpectedness higher with our
methods than with baseline
• Subjective user comment:
o Random Indexing: generic
o hyProximity: granular
o AdWords: redundant
23. Thank You!
Find out more:
• http://research.hypios.com/?page_id=165
Contact us:
• Danica Damljanovic: @dancheeee
• Milan Stankovic: @milstan