Unlike full reading, skim reading is the process of looking quickly over information to cover more material whilst retaining only a superficial view of the underlying content. In this work, we emulate this natural human activity by providing a dynamic, graph-based view of entities automatically extracted from text using shallow parsing and processing techniques. We provide a preliminary web-based tool, called SKIMMR, that generates a network of inter-related concepts from a set of documents. In SKIMMR, a user may browse the network to investigate the lexically driven information space extracted from the documents. When a particular area of that space looks interesting, the tool can display the documents most relevant to the concepts on display. We present this as a simple, viable methodology for browsing a document collection (such as a collection of scientific research articles) that limits the information overload of examining the collection in full.
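To make the workflow concrete, here is a minimal sketch of the idea (illustrative only, not SKIMMR's actual code; the regex-based term extractor, the NetworkX graph, and all function names are assumptions): shallow processing extracts terms per document, co-occurring terms are linked into a network, and a term-to-document index lets the tool surface the texts behind whichever part of the graph the user is looking at.

```python
# Minimal sketch of the skim-reading pipeline described above (illustrative,
# not SKIMMR's implementation): shallow term extraction, a co-occurrence
# network over the extracted terms, and a term -> documents provenance index.
import itertools
import re
from collections import defaultdict

import networkx as nx  # assumed dependency for the graph view

STOPWORDS = {"the", "a", "an", "of", "in", "and", "or", "to", "is", "are"}

def shallow_terms(text):
    """Very superficial 'parsing': lowercase word tokens minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_network(documents):
    """Return a term co-occurrence graph and a term -> {doc ids} index."""
    graph = nx.Graph()
    provenance = defaultdict(set)
    for doc_id, text in documents.items():
        terms = set(shallow_terms(text))
        for term in terms:
            provenance[term].add(doc_id)
        # link every pair of terms that co-occur in the same document
        for t1, t2 in itertools.combinations(sorted(terms), 2):
            old = graph.get_edge_data(t1, t2, {"weight": 0})["weight"]
            graph.add_edge(t1, t2, weight=old + 1)
    return graph, provenance

def documents_for(concepts, provenance):
    """Rank documents by how many of the selected concepts they mention."""
    hits = defaultdict(int)
    for concept in concepts:
        for doc_id in provenance.get(concept, ()):
            hits[doc_id] += 1
    return sorted(hits, key=hits.get, reverse=True)
```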
Machine-Aided Skim Reading
Traditional (Skim) Reading
full reading – deep insights (slow)
skim reading – superficial overview (quicker)
How Can Automation Help?
going deep is hard
large-scale shallow processing is more feasible
What Kind of Automation?
extraction (text and data mining)
augmentation (computing more complex content)
indexing and querying
presentation of the results
Related Work
processing: text mining, graph analysis, distributional semantics, fuzzy IR
presentation: GoPubMed, Textpresso, IVEA, CORAAL, Exhibit, ...
Computing the Knowledge Base
Distributional Representation
aggregated co-occurrence/relation statements
statements → tensor representation
every element still linked to its provenance
matrix perspectives of the tensor
Augmentation
perspectives give rise to emergent patterns like:
semantic similarity
concept clusters and taxonomies
IF-THEN rules
concept ordering and relative relevance
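A minimal sketch of this representation follows, using made-up statements and NumPy (the term and relation names, and the cosine-based similarity, are illustrative assumptions rather than SKIMMR's exact computation): aggregated statements fill a three-way array, fixing the relation yields one matrix perspective, and comparing term rows in that perspective exposes one of the emergent patterns listed above, semantic similarity.

```python
# Illustrative sketch of the tensor view: aggregated (term, relation, term)
# statements as a 3-way array, a matrix "perspective" obtained by fixing the
# relation, and similarity as cosine similarity of term rows. Data is made up.
import numpy as np

terms = ["gene", "protein", "disease"]
relations = ["cooccurs_with", "related_to"]

# aggregated statements: (subject, relation, object) -> weight
statements = {
    ("gene", "cooccurs_with", "protein"): 3.0,
    ("gene", "cooccurs_with", "disease"): 1.0,
    ("protein", "related_to", "disease"): 2.0,
}

tensor = np.zeros((len(terms), len(relations), len(terms)))
for (s, r, o), w in statements.items():
    tensor[terms.index(s), relations.index(r), terms.index(o)] = w

# one matrix perspective: terms x terms under a single relation
cooccurrence = tensor[:, relations.index("cooccurs_with"), :]

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# emergent pattern: how similar are two terms in this perspective?
print(cosine(cooccurrence[terms.index("gene")],
             cooccurrence[terms.index("protein")]))
```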
Querying the Knowledge Base
Initial Result Term Set
example query: ? ↔ Tx AND (? ↔ Ty OR ? ↔ Tz )
term index look-up:
$F_x = \{(\bar{T}_1, w_{x,1}), (\bar{T}_2, w_{x,2}), \dots, (\bar{T}_n, w_{x,n})\}$
$F_y = \{(\bar{T}_1, w_{y,1}), (\bar{T}_2, w_{y,2}), \dots, (\bar{T}_n, w_{y,n})\}$
$F_z = \{(\bar{T}_1, w_{z,1}), (\bar{T}_2, w_{z,2}), \dots, (\bar{T}_n, w_{z,n})\}$
combining atomic results: $F_x \cap (F_y \cup F_z)$
Complete Results
terms: $R_T = \{(T_1, w_1^T), (T_2, w_2^T), \dots, (T_n, w_n^T)\}$, where $w_i^T$ are the weights resulting from the combination
statements: $R_S = \{(S_1, w_1^S), (S_2, w_2^S), \dots, (S_m, w_m^S)\}$, where $w_i^S = f_\nu\!\left(\sum_{j=1}^{n} w_j^T c_{j,i}\right)$
provenances: $R_P = \{(P_1, w_1^P), (P_2, w_2^P), \dots, (P_q, w_q^P)\}$, where $w_i^P = f_\nu\!\left(\sum_{j=1}^{m} w_j^S w_{j,i}\right)$
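The following sketch shows one way these formulas can be evaluated in code. It assumes min/max as the AND/OR operators on the fuzzy term sets and a simple max-based normalisation standing in for $f_\nu$; SKIMMR's actual operators and normalisation may differ.

```python
# Sketch of the query evaluation above (illustrative): fuzzy term sets
# F_x, F_y, F_z are combined with min/max for AND/OR, and statement weights
# are a normalised weighted sum over term-statement memberships c[j][i].
def fuzzy_and(a, b):
    return {t: min(a.get(t, 0.0), b.get(t, 0.0)) for t in set(a) | set(b)}

def fuzzy_or(a, b):
    return {t: max(a.get(t, 0.0), b.get(t, 0.0)) for t in set(a) | set(b)}

def normalise(weights):
    """Placeholder for f_nu: scale weights into [0, 1]."""
    top = max(weights.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in weights.items()}

def statement_weights(term_weights, memberships):
    """w_i^S = f_nu(sum_j w_j^T * c_{j,i}); memberships[term][stmt] = c_{j,i}."""
    scores = {}
    for term, w_t in term_weights.items():
        for stmt, c in memberships.get(term, {}).items():
            scores[stmt] = scores.get(stmt, 0.0) + w_t * c
    return normalise(scores)

# example query: ? <-> Tx AND (? <-> Ty OR ? <-> Tz)
F_x = {"T1": 0.9, "T2": 0.4}
F_y = {"T1": 0.5, "T3": 0.8}
F_z = {"T2": 0.7}
result_terms = fuzzy_and(F_x, fuzzy_or(F_y, F_z))
```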
Let’s Learn About Some Grim Stuff!
What to Evaluate?
Quality of the Extracted/Computed Content
“noise-to-signal” ratio
relevance of results w.r.t. queries
information value (obvious vs. enlightening)
User Experience
usability of SKIMMR
general
domain-specific
performance benefits (over a base-line)
How to Evaluate?
Quality of the Extracted/Computed Content
identification (or creation) of a gold standard
generalised IR measures
committee-based annotation of the results
User Experience
SUS survey
domain-specific survey
user performance analysis (SKIMMR vs. base-line)
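As one possible reading of the "generalised IR measures" mentioned above (an assumption for illustration, not a measure prescribed by these slides), precision and recall can be generalised to weighted result sets compared against a weighted gold standard:

```python
# One possible instantiation of "generalised IR measures" (illustrative):
# precision and recall over weighted result sets, evaluated against a gold
# standard of item -> relevance weights.
def generalised_precision(results, gold):
    """Overlap of result and gold memberships relative to the result mass."""
    overlap = sum(min(w, gold.get(item, 0.0)) for item, w in results.items())
    total = sum(results.values())
    return overlap / total if total else 0.0

def generalised_recall(results, gold):
    """Overlap of result and gold memberships relative to the gold mass."""
    overlap = sum(min(w, results.get(item, 0.0)) for item, w in gold.items())
    total = sum(gold.values())
    return overlap / total if total else 0.0

gold = {"S1": 1.0, "S2": 0.5}
results = {"S1": 0.8, "S3": 0.4}
print(generalised_precision(results, gold), generalised_recall(results, gold))
```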
Conclusions and Future Work
Current Status
machine-aided skim reading notion coined
basic theoretical background proposed
a prototype implemented (general and biomedical versions)
http://pypi.python.org/pypi/skimmr_gt/0.1-a1
http://pypi.python.org/pypi/skimmr_bm/0.1-a1
Next Steps
evaluation (with a gold standard and sample users)
dissemination and follow-ups (write-up, proposals)
back-end extensions:
more (complex) types of relations
proper APIs (development, web service, ...)
database and/or cloud storage
front-end extensions:
smoother transition between the graphs
complex querying
additional visualisations (trends, focused provenances, ...)