This document describes using collaborative knowledge bases like Wikipedia to support exploratory search tasks. It presents an approach that extracts concepts and their relationships from Wikipedia to build a concept network. Documents are then ranked based on their relationships to these concepts. An experiment ranks journal abstracts given a seed abstract, comparing the proposed Wikipedia-based approach to a maximal marginal relevance technique. The Wikipedia approach provided more diverse results while maintaining high relevance, showing potential for improving exploratory search.
7. Exploratory Search Task
Given a journal abstract, rank other abstracts
based on their relevancy to the seed abstract.
Evaluation is based on relevancy and diversity.
8. Concepts
Seed document → n-grams (1 to 3) → candidate concepts (n-grams that match
a Wikipedia page title and are connected through the ontology).

The matrices are related by a factorization, where each document row d holds
Tf-idf(D) weights and each concept row k holds Tf-idf(K) weights:

DOCUMENT–WORD D (D × W) = DOCUMENT–CONCEPT Θ (D × K) * CONCEPT–WORD B (K × W)

D: Documents, K: Concepts, W: Words; B is the concept–word matrix (rows β_k).
Documents are ranked by Argsort(row.sum(Θ)).
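As a sketch, the factorization above and the recovery of Θ can be written with NumPy. The shapes, random values, and the least-squares solver are illustrative assumptions; the slides do not specify how Θ is computed:

```python
import numpy as np

# Illustrative shapes (assumed): 4 documents, 3 concepts, 6 words.
rng = np.random.default_rng(0)
Theta_true = rng.random((4, 3))   # document-concept weights, Theta (D x K)
B = rng.random((3, 6))            # concept-word matrix, B (K x W)
D = Theta_true @ B                # document-word matrix, D (D x W) = Theta @ B

# Recover the document-concept weights by least squares: Theta @ B ~ D.
Theta, *_ = np.linalg.lstsq(B.T, D.T, rcond=None)
Theta = Theta.T
assert np.allclose(Theta @ B, D, atol=1e-8)
```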
9. EXTRACTING CONCEPT NETWORK
“Representation independence formally characterizes the encapsulation provided by language constructs for data abstraction and justifies reasoning by simulation. Representation independence has been shown for a variety of languages and constructs but not for shared references to mutable state; indeed it fails in general for such languages. This article formulates representation independence for classes, in an imperative, object-oriented language with pointers, subclassing and dynamic dispatch, class-oriented visibility control, recursive types and methods, and a simple form of module. An instance of a class is considered to implement an abstraction using private fields and so-called representation objects. Encapsulation of representation objects is expressed by a restriction, called confinement, on aliasing. Representation independence is proved for programs satisfying the confinement condition. A static analysis is given for confinement that accepts common designs such as the observer and factory patterns. The formalization takes into account not only the usual interface between a client and a class that provides an abstraction but also the interface (often called “protected”) between the class and its subclasses.”
11. WIKIPEDIA PAGES AS CONCEPTS
Solar System
“The Solar System consists of the Sun and the astronomical objects gravitationally bound in orbit around it, all of which formed from the collapse of a giant molecular cloud approximately 4.6 billion years ago…”
(http://en.wikipedia.org/wiki/Solar_System)
Word Stem | Occ. | Freq.
abstract  |  53  | 0.056
program   |  44  | 0.046
langu     |  33  | 0.035
spec      |  16  | 0.017
comput    |  12  | 0.013
conceiv   |  12  | 0.013
dat       |  12  | 0.013
β_k = p(W_i | k) = N{W_i ∈ k} / Σ_i N{W_i ∈ k}

where N{W_i ∈ k} counts the occurrences of word W_i in the page of concept k.
β_k : per-concept word distribution
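A minimal sketch of this per-concept word distribution, computed from the tokens of a concept's Wikipedia page (the function name and token list are illustrative assumptions):

```python
from collections import Counter

def concept_word_distribution(page_tokens):
    """beta_k: p(W_i | k) = count of W_i in concept k's page / total count."""
    counts = Counter(page_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Tiny made-up page: "abstract" occurs 2 times out of 4 tokens.
beta = concept_word_distribution(["abstract", "program", "abstract", "langu"])
# beta["abstract"] == 0.5
```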
12. RANKING DOCUMENTS
Given the document–word matrix D (D × W) and the concept–word matrix
B (K × W) extracted from Wikipedia, solve for the document–concept
weights Θ (D × K):

DOCUMENT–WORD D (D × W) = DOCUMENT–CONCEPT Θ (D × K) * CONCEPT–WORD B (K × W)

D: Documents, K: Concepts, W: Words.
13. SORT DOCUMENTS
Using the same factorization,

DOCUMENT–WORD D (D × W) = DOCUMENT–CONCEPT Θ (D × K) * CONCEPT–WORD B (K × W)

D: Documents, K: Concepts, W: Words;
documents are sorted by the total concept weight per row: Argsort(row.sum(Θ)).
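The sort criterion Argsort(row.sum(Θ)) can be sketched with NumPy (the Θ values below are made up for illustration):

```python
import numpy as np

# Theta: document-concept weight matrix (illustrative values).
Theta = np.array([[0.1, 0.2],
                  [0.9, 0.4],
                  [0.3, 0.3]])

# Rank documents by total concept weight, highest first,
# mirroring Argsort(row.sum(Theta)) from the slides.
ranking = np.argsort(Theta.sum(axis=1))[::-1]
# ranking -> [1, 2, 0]
```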
14. EXPERIMENT
Given a journal abstract, rank other abstracts based on
their relevancy to the seed abstract.
• Data: 619 abstracts of the Journal of the ACM
(JACM) and their references.
• Task: Select the top-k (5, 10, 15, and 20) relevant abstracts.
• Observe: Relevancy (measured by LSA vector similarity) and
diversity (measured through the coverage of the references).
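Relevancy via LSA vector similarity reduces to cosine similarity between the seed's vector and each candidate's vector. A sketch, assuming the LSA vectors are already computed (the vectors below are made up):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assumed: row i is the LSA vector of abstract i; the seed is row 0.
lsa = np.array([[1.0, 0.0, 0.5],
                [0.9, 0.1, 0.4],
                [0.0, 1.0, 0.0]])
seed = lsa[0]
scores = [cosine(seed, v) for v in lsa[1:]]
ranked = 1 + np.argsort(scores)[::-1]   # candidate indices, best first
```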
15. MAXIMAL MARGINAL RELEVANCE
• A measure to increase the diversity of documents retrieved by an IR system.
- Similarity to query: BM25 (Xapian [1])
- Similarity to results: LSA similarity (Gensim [2])
1. http://xapian.org
2. http://radimrehurek.com/gensim/
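A greedy MMR selection can be sketched as follows. The function name and inputs are assumptions: in the deck's setup, similarity to the query would come from BM25 (Xapian) and pairwise document similarity from LSA (Gensim):

```python
def mmr(sim_query, sim_docs, k, lam=0.7):
    """Greedy Maximal Marginal Relevance (Carbonell & Goldstein, 1998).

    sim_query: similarity of each document to the query (e.g., BM25 scores)
    sim_docs:  pairwise document similarities (e.g., LSA cosine), n x n
    lam:       trade-off between relevance and novelty
    Returns the indices of the k selected documents, in MMR selection order.
    """
    selected, remaining = [], list(range(len(sim_query)))
    while remaining and len(selected) < k:
        def score(i):
            # Penalize similarity to already-selected documents (novelty term).
            novelty = max(sim_docs[i][j] for j in selected) if selected else 0.0
            return lam * sim_query[i] - (1 - lam) * novelty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lam = 1 the ranking reduces to pure query relevance; lowering lam trades relevance for diversity, so a near-duplicate of an already-selected document is skipped in favor of a less similar one.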
18. CONCLUDING REMARKS
• Our Wikipedia-based technique provides high
diversity with low relevancy loss.
• Semantics embedded in concept networks
extracted from Wikipedia can improve
exploratory search tasks.
Editor's Notes
However, the majority of the inquiries go beyond simple fact checks:
- searches involving the cognitive processing and interpretation of new knowledge
- searches requiring critical assessment before being integrated into knowledge bases
- search-driven exploration activities
Exploratory search relies on other information/cognitive behaviors: sense-making (organizing and analyzing search results) and decision making.
p.24: This kind of ill-structured problems 1) begin with a lack of information necessary to develop a solution or even precisely define the problem, 2) have no single right approach for solution, 3) have problem definitions that change as new information is gathered, and 4) have no identifiable ‘correct’ solution [3]. -- Highlighted jul 19, 2013
It’s hard for search systems to identify concepts and their relationships --
Concepts are characterized as distributions over observed words in Wikipedia pages. Use posterior expectations / approximate posterior inference: Gibbs sampling, variational inference.
Ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. Ontologies can be used to model concepts and their interrelationships (Lanzenberger et al., 2010). In this sense, ontologies represent the relevant aspects of context. To effectively comprehend cross-lingual corpora, tools that can explore the dependencies between language and context are needed.
Concepts are characterized as distributions over observed words in Wikipedia pages. Each topic is a distribution over words.
Today, most user searches are of an exploratory nature, in the sense that users are interested in retrieving pieces of information that cover many aspects of their information needs.
The principle is similar to TF-IDF, where query terms are weighted based on frequency in a document (tf) and across the corpus (idf). In addition, the ratio of the document length to the average document length is taken into account in K, and BM25 is parameterized for further optimization. We used Xapian's implementation of BM25 with default parameters.
Maximal Marginal Relevance (MMR): one approach to diversifying search results is to optimize them based on two criteria: similarity to the query (relevance) and dissimilarity to the other relevant documents (novelty). MMR \cite{Carbonell:1998ja}, for example, works on this principle: the similarity of a document to a query is adjusted based on its similarity to the other documents that are more similar to the query.
\begin{displaymath}
MMR = \underset{D_i \in R \setminus S}{\mathrm{argmax}} \left[ \lambda\, Sim_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} Sim_2(D_i, D_j) \right]
\end{displaymath}
can arguably be lessened, because the semantics strips away extraneous context while at the same time providing better diversity within the universe of relevant documents