Unlike full reading, skim reading is the process of looking quickly over information to cover more material whilst retaining only a superficial view of the underlying content. In this work, we emulate this natural human activity by providing a dynamic, graph-based view of entities automatically extracted from text using shallow parsing and processing techniques. We provide a preliminary web-based tool, called SKIMMR, that generates a network of inter-related concepts from a set of documents. In SKIMMR, a user may browse the network to investigate the lexically driven information space extracted from the documents. When a particular area of that space looks interesting, the tool can display the documents most relevant to the concepts on display. We present this as a simple, viable methodology for browsing a document collection (such as a collection of scientific research articles) that limits the information overload of examining the collection in full.
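To make the workflow concrete, here is a minimal sketch of the idea (illustrative only, not SKIMMR's actual code; the regex-based term extractor, the NetworkX graph, and all function names are assumptions): shallow processing extracts terms per document, co-occurring terms are linked into a network, and a term-to-document index lets the tool surface the texts behind whichever part of the graph the user is looking at.

```python
# Minimal sketch of the skim-reading pipeline described above (illustrative,
# not SKIMMR's implementation): shallow term extraction, a co-occurrence
# network over the extracted terms, and a term -> documents provenance index.
import itertools
import re
from collections import defaultdict

import networkx as nx  # assumed dependency for the graph view

STOPWORDS = {"the", "a", "an", "of", "in", "and", "or", "to", "is", "are"}

def shallow_terms(text):
    """Very superficial 'parsing': lowercase word tokens minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_network(documents):
    """Return a term co-occurrence graph and a term -> {doc ids} index."""
    graph = nx.Graph()
    provenance = defaultdict(set)
    for doc_id, text in documents.items():
        terms = set(shallow_terms(text))
        for term in terms:
            provenance[term].add(doc_id)
        # link every pair of terms that co-occur in the same document
        for t1, t2 in itertools.combinations(sorted(terms), 2):
            old = graph.get_edge_data(t1, t2, {"weight": 0})["weight"]
            graph.add_edge(t1, t2, weight=old + 1)
    return graph, provenance

def documents_for(concepts, provenance):
    """Rank documents by how many of the selected concepts they mention."""
    hits = defaultdict(int)
    for concept in concepts:
        for doc_id in provenance.get(concept, ()):
            hits[doc_id] += 1
    return sorted(hits, key=hits.get, reverse=True)
```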
Machine-Aided Skim Reading
Traditional (Skim) Reading
full reading – deep insights (slow)
skim reading – superficial overview (quicker)
How Can Automation Help?
going deep is hard
large-scale shallow processing is more feasible
What Kind of Automation?
extraction (text and data mining)
augmentation (computing more complex content)
indexing and querying
presentation of the results
Related Work
processing: text mining, graph analysis, distributional semantics, fuzzy IR
presentation: GoPubMed, Textpresso, IVEA, CORAAL, Exhibit, ...
Computing the Knowledge Base
Distributional Representation
aggregated co-occurrence/relation statements
statements → tensor representation
every element still linked to its provenance
matrix perspectives of the tensor
Augmentation
perspectives give rise to emergent patterns like:
semantic similarity
concept clusters and taxonomies
IF-THEN rules
concept ordering and relative relevance
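A minimal sketch of this representation follows, using made-up statements and NumPy (the term and relation names, and the cosine-based similarity, are illustrative assumptions rather than SKIMMR's exact computation): aggregated statements fill a three-way array, fixing the relation yields one matrix perspective, and comparing term rows in that perspective exposes one of the emergent patterns listed above, semantic similarity.

```python
# Illustrative sketch of the tensor view: aggregated (term, relation, term)
# statements as a 3-way array, a matrix "perspective" obtained by fixing the
# relation, and similarity as cosine similarity of term rows. Data is made up.
import numpy as np

terms = ["gene", "protein", "disease"]
relations = ["cooccurs_with", "related_to"]

# aggregated statements: (subject, relation, object) -> weight
statements = {
    ("gene", "cooccurs_with", "protein"): 3.0,
    ("gene", "cooccurs_with", "disease"): 1.0,
    ("protein", "related_to", "disease"): 2.0,
}

tensor = np.zeros((len(terms), len(relations), len(terms)))
for (s, r, o), w in statements.items():
    tensor[terms.index(s), relations.index(r), terms.index(o)] = w

# one matrix perspective: terms x terms under a single relation
cooccurrence = tensor[:, relations.index("cooccurs_with"), :]

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# emergent pattern: how similar are two terms in this perspective?
print(cosine(cooccurrence[terms.index("gene")],
             cooccurrence[terms.index("protein")]))
```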
Querying the Knowledge Base
Initial Result Term Set
example query: ? ↔ Tx AND (? ↔ Ty OR ? ↔ Tz )
term index look-up:
$F_x = \{(\bar{T}_1, w_{x,1}), (\bar{T}_2, w_{x,2}), \dots, (\bar{T}_n, w_{x,n})\}$
$F_y = \{(\bar{T}_1, w_{y,1}), (\bar{T}_2, w_{y,2}), \dots, (\bar{T}_n, w_{y,n})\}$
$F_z = \{(\bar{T}_1, w_{z,1}), (\bar{T}_2, w_{z,2}), \dots, (\bar{T}_n, w_{z,n})\}$
combining atomic results: $F_x \cap (F_y \cup F_z)$
Complete Results
terms: $R_T = \{(T_1, w_1^T), (T_2, w_2^T), \dots, (T_n, w_n^T)\}$, where $w_i^T$ are the weights resulting from the combination
statements: $R_S = \{(S_1, w_1^S), (S_2, w_2^S), \dots, (S_m, w_m^S)\}$, where $w_i^S = f_\nu\!\left(\sum_{j=1}^{n} w_j^T c_{j,i}\right)$
provenances: $R_P = \{(P_1, w_1^P), (P_2, w_2^P), \dots, (P_q, w_q^P)\}$, where $w_i^P = f_\nu\!\left(\sum_{j=1}^{m} w_j^S w_{j,i}\right)$
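The following sketch shows one way these formulas can be evaluated in code. It assumes min/max as the AND/OR operators on the fuzzy term sets and a simple max-based normalisation standing in for $f_\nu$; SKIMMR's actual operators and normalisation may differ.

```python
# Sketch of the query evaluation above (illustrative): fuzzy term sets
# F_x, F_y, F_z are combined with min/max for AND/OR, and statement weights
# are a normalised weighted sum over term-statement memberships c[j][i].
def fuzzy_and(a, b):
    return {t: min(a.get(t, 0.0), b.get(t, 0.0)) for t in set(a) | set(b)}

def fuzzy_or(a, b):
    return {t: max(a.get(t, 0.0), b.get(t, 0.0)) for t in set(a) | set(b)}

def normalise(weights):
    """Placeholder for f_nu: scale weights into [0, 1]."""
    top = max(weights.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in weights.items()}

def statement_weights(term_weights, memberships):
    """w_i^S = f_nu(sum_j w_j^T * c_{j,i}); memberships[term][stmt] = c_{j,i}."""
    scores = {}
    for term, w_t in term_weights.items():
        for stmt, c in memberships.get(term, {}).items():
            scores[stmt] = scores.get(stmt, 0.0) + w_t * c
    return normalise(scores)

# example query: ? <-> Tx AND (? <-> Ty OR ? <-> Tz)
F_x = {"T1": 0.9, "T2": 0.4}
F_y = {"T1": 0.5, "T3": 0.8}
F_z = {"T2": 0.7}
result_terms = fuzzy_and(F_x, fuzzy_or(F_y, F_z))
```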
Let’s Learn About Some Grim Stuff!
What to Evaluate?
Quality of the Extracted/Computed Content
“noise-to-signal” ratio
relevance of results w.r.t. queries
information value (obvious vs. enlightening)
User Experience
usability of SKIMMR
general
domain-specific
performance benefits (over a base-line)
How to Evaluate?
Quality of the Extracted/Computed Content
identification (or creation) of a gold standard
generalised IR measures
committee-based annotation of the results
User Experience
SUS survey
domain-specific survey
user performance analysis (SKIMMR vs. base-line)
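As one possible reading of the "generalised IR measures" mentioned above (an assumption for illustration, not a measure prescribed by these slides), precision and recall can be generalised to weighted result sets compared against a weighted gold standard:

```python
# One possible instantiation of "generalised IR measures" (illustrative):
# precision and recall over weighted result sets, evaluated against a gold
# standard of item -> relevance weights.
def generalised_precision(results, gold):
    """Overlap of result and gold memberships relative to the result mass."""
    overlap = sum(min(w, gold.get(item, 0.0)) for item, w in results.items())
    total = sum(results.values())
    return overlap / total if total else 0.0

def generalised_recall(results, gold):
    """Overlap of result and gold memberships relative to the gold mass."""
    overlap = sum(min(w, results.get(item, 0.0)) for item, w in gold.items())
    total = sum(gold.values())
    return overlap / total if total else 0.0

gold = {"S1": 1.0, "S2": 0.5}
results = {"S1": 0.8, "S3": 0.4}
print(generalised_precision(results, gold), generalised_recall(results, gold))
```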
Conclusions and Future Work
Current Status
machine-aided skim reading notion coined
basic theoretical background proposed
a prototype implemented (general and biomedical versions)
http://pypi.python.org/pypi/skimmr_gt/0.1-a1
http://pypi.python.org/pypi/skimmr_bm/0.1-a1
Next Steps
evaluation (with a gold standard and sample users)
dissemination and follow-ups (write-up, proposals)
back-end extensions:
more (complex) types of relations
proper APIs (development, web service, ...)
database and/or cloud storage
front-end extensions:
smoother transition between the graphs
complex querying
additional visualisations (trends, focused provenances, ...)