Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Introduction Method Results Summary
Contextualization of Topics
browsing through terms, authors, journals and cluster allo...
Introduction Method Results Summary
Introduction
What are essence and boundary of a scientific field?
Different ways to find c...
Introduction Method Results Summary
Ariadne: interactive context explorer
Ariadne is an interactive interface which allows...
Introduction Method Results Summary
Research questions
Q1: How does the Ariadne algorithm work on a much smaller,
field spe...
Introduction Method Results Summary
LittleAriadne
LittleAriadne: context explorer over Astrophysics data
Offline: generates ...
Introduction Method Results Summary
LittleAriadne
An example article
Article ID ISI:000276828000006
Title On the Mass Tran...
Introduction Method Results Summary
LittleAriadne
An example article
Article ID ISI:000276828000006
Title On the Mass Tran...
Introduction Method Results Summary
LittleAriadne
Six different clustering solutions
x Source y=#Cluster #Cluster in Little...
Introduction Method Results Summary
LittleAriadne
Entities in the Astrophysis dataset
There are in total 90,343 entities a...
Introduction Method Results Summary
LittleAriadne
Build semantic representation
Basic assumptions
Entities can be represen...
Introduction Method Results Summary
LittleAriadne
An example article
Article ID ISI:000276828000006
Title On the Mass Tran...
Introduction Method Results Summary
LittleAriadne
Dimension reduction using Random Projection
mass
transfer
rate
[subject:...
Introduction Method Results Summary
LittleAriadne
Dimension reduction using Random Projection
mass
transfer
rate
[subject:...
Introduction Method Results Summary
LittleAriadne
From semantic representation to visualisation and more
Each entity has i...
Introduction Method Results Summary
LittleAriadne
From semantic representation to visualisation and more
Each entity has i...
Introduction Method Results Summary
Experiment 1: Exploring context
Experiment 1: Exploring context
Now we can explore
Let...
Introduction Method Results Summary
Experiment 1: Exploring context
Contextual view of stars
Introduction Method Results Summary
Experiment 2: Labelling clusters
Experiment 2: Labelling clusters
What is cluster a 2?
Introduction Method Results Summary
Experiment 2: Labelling clusters
Experiment 2: Labelling clusters
Introduction Method Results Summary
Experiment 2: Labelling clusters
Experiment 2: Labelling clusters
Cluster ID Top 9 mos...
Introduction Method Results Summary
Experiment 3: Comparing clustering solutions
Experiment 3: Comparing clustering soluti...
Introduction Method Results Summary
Experiment 3: Comparing clustering solutions
Highly similar clustering solutions
Introduction Method Results Summary
Experiment 3: Comparing clustering solutions
Partially agreeing clustering solutions
Introduction Method Results Summary
Experiment 3: Comparing clustering solutions
An overview of all clustering solutions
Introduction Method Results Summary
Summary
Summary
We present a method and an interface that allows visual
exploration th...
Introduction Method Results Summary
Future extensions
Future extensions
Add more types of entities, such as citations, pub...
Introduction Method Results Summary
Thank you
Thank you
http://thoth.pica.nl/astro/relate
Rob Koopman (rob.koopman@oclc.or...
Upcoming SlideShare
Loading in …5
×

Contextualization of topics - browsing through terms, authors, journals and cluster allocations

Presentation at the 15th International Society of Scientometrics and Informetrics Conference, 29 June to 4 July, Istanbul, Turkey (http://www.issi2015.org).

  • Login to see the comments

Contextualization of topics - browsing through terms, authors, journals and cluster allocations

  1. 1. Introduction Method Results Summary Contextualization of Topics browsing through terms, authors, journals and cluster allocations Rob Koopman1 Shenghui Wang1 Andrea Scharnhorst2 1OCLC Research 2DANS-KNAW ISSI 2015
  2. 2. Introduction Method Results Summary Introduction What are essence and boundary of a scientific field? Different ways to find clusters in scientific literature based on connectivity in terms of authorship, citations, language similarity, etc. Ambiguous nature in science
  3. 3. Introduction Method Results Summary Ariadne: interactive context explorer Ariadne is an interactive interface which allows users to explore the context of entities such as authors, journals, topical terms, etc. It builds on semantic indexing statistically computed from a large scale bibliographic corpus It was originally implemented to explore 1M topical terms, 3M authors, 35K journals and 700+ Dewey decimal classes associated with 65M articles.
  4. 4. Introduction Method Results Summary Research questions Q1: How does the Ariadne algorithm work on a much smaller, field specific dataset? Q2: Can we use Ariadne to label the clusters produce by the different methods? Q3: Can we use Ariadne to compare different clustering solutions?
  5. 5. Introduction Method Results Summary LittleAriadne LittleAriadne: context explorer over Astrophysics data Offline: generates a semantic representation for each entity Online: finds the most related entities and using multidimensional scaling to display
  6. 6. Introduction Method Results Summary LittleAriadne An example article Article ID ISI:000276828000006 Title On the Mass Transfer Rate in SS Cyg Abstract The mass transfer rate in SS Cyg at quiescence, estimated from the observed luminosity of the hot spot, is log M-tr = 16.8 +/- 0.3. This is safely below the critical mass transfer rates of log M-crit = 18.1 (corresponding to log T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to the “revised” value of log T-crit(0) = 3.65). The mass transfer rate during outbursts is strongly enhanced Author [author:smak j] ISSN [issn:0001-5237] Subject [subject:accretion, accretion disks] [subject:cataclysmic variables] [subject:disc instability model] [subject:dwarf novae] [subject:novae, cataclysmic variables] [subject:outbursts] [subject:parameters] [subject:stars] [subject:stars dwarf novae] [subject:stars individual ss cyg] [subject:state] [subject: superoutbursts]
  7. 7. Introduction Method Results Summary LittleAriadne An example article Article ID ISI:000276828000006 Title On the Mass Transfer Rate in SS Cyg Abstract The mass transfer rate in SS Cyg at quiescence, estimated from the observed luminosity of the hot spot, is log M-tr = 16.8 +/- 0.3. This is safely below the critical mass transfer rates of log M-crit = 18.1 (corresponding to log T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to the “revised” value of log T-crit(0) = 3.65). The mass transfer rate during outbursts is strongly enhanced Author [author:smak j] ISSN [issn:0001-5237] Subject [subject:accretion, accretion disks] [subject:cataclysmic variables] [subject:disc instability model] [subject:dwarf novae] [subject:novae, cataclysmic variables] [subject:outbursts] [subject:parameters] [subject:stars] [subject:stars dwarf novae] [subject:stars individual ss cyg] [subject:state] [subject: superoutbursts] Cluster label [cluster:a 19] [cluster:b 16] [cluster:c 15] [cluster:d 51] [cluster:e 17] [cluster:f 1]
  8. 8. Introduction Method Results Summary LittleAriadne Six different clustering solutions x Source y=#Cluster #Cluster in LittleAriadne a cwts 1.8 23 23 b UMSI 23 23 c oclc 20 20 20 d hu 139 48 e sts 5664 229 f ECOOM 15 15
  9. 9. Introduction Method Results Summary LittleAriadne Entities in the Astrophysis dataset There are in total 90,343 entities associated with 111,616 astrophysics articles 59 journals 27,027 author names (no disambiguation applied) 39,577 topical terms 23,322 subjects (extracted from ”Author Keywords” and ”Keywords Plus”) 358 cluster labels (source + cluster id)
  10. 10. Introduction Method Results Summary LittleAriadne Build semantic representation Basic assumptions Entities can be represented by its context Entities which share more context are more likely to be related Context is the textual environment where an entity occurs
  11. 11. Introduction Method Results Summary LittleAriadne An example article Article ID ISI:000276828000006 Title On the Mass Transfer Rate in SS Cyg Abstract The mass transfer rate in SS Cyg at quiescence, estimated from the observed luminosity of the hot spot, is log M-tr = 16.8 +/- 0.3. This is safely below the critical mass transfer rates of log M-crit = 18.1 (corresponding to log T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to the “revised” value of log T-crit(0) = 3.65). The mass transfer rate during outbursts is strongly enhanced Author [author:smak j] ISSN [issn:0001-5237] Subject [subject:accretion, accretion disks] [subject:cataclysmic variables] [subject:disc instability model] [subject:dwarf novae] [subject:novae, cataclysmic variables] [subject:outbursts] [subject:parameters] [subject:stars] [subject:stars dwarf novae] [subject:stars individual ss cyg] [subject:state] [subject: superoutbursts] Cluster label [cluster:a 19] [cluster:b 16] [cluster:c 15] [cluster:d 51] [cluster:e 17] [cluster:f 1]
  12. 12. Introduction Method Results Summary LittleAriadne Dimension reduction using Random Projection mass transfer rate [subject:outburst] [subject:sstars] [subject:parameters] [author:smak j] [cluster: a19] [issn:0001-5237]
  13. 13. Introduction Method Results Summary LittleAriadne Dimension reduction using Random Projection mass transfer rate [subject:outburst] [subject:sstars] [subject:parameters] [author:smak j] [cluster: a19] [issn:0001-5237]
  14. 14. Introduction Method Results Summary LittleAriadne From semantic representation to visualisation and more Each entity has its semantic representation Cosine similarity between entities can be computed very fast, based on which the 2D visualisation is implemented
  15. 15. Introduction Method Results Summary LittleAriadne From semantic representation to visualisation and more Each entity has its semantic representation Cosine similarity between entities can be computed very fast, based on which the 2D visualisation is implemented For each article, we collected the semantic representation of all the entities in which it involves, and take an average as its semantic representation We applied a standard K-means clustering method to cluster these articles based on their semantic representations
  16. 16. Introduction Method Results Summary Experiment 1: Exploring context Experiment 1: Exploring context Now we can explore Let’s start with stars An overview of all journals
  17. 17. Introduction Method Results Summary Experiment 1: Exploring context Contextual view of stars
  18. 18. Introduction Method Results Summary Experiment 2: Labelling clusters Experiment 2: Labelling clusters What is cluster a 2?
  19. 19. Introduction Method Results Summary Experiment 2: Labelling clusters Experiment 2: Labelling clusters
  20. 20. Introduction Method Results Summary Experiment 2: Labelling clusters Experiment 2: Labelling clusters Cluster ID Top 9 most related topical terms a 2 ”cosmology” ”dark energy” ”density perturbations” ”cosmologies” ”planck” ”cosmological” ”spatial curvature” ”inflationary” ”inflation” b 2 ”cosmology” ”cosmological constant” ”cosmologies” ”cosmological” ”universes” ”dark energy” ”quadratic” ”tensor” ”planck” c 17 ”power spectrum” ”cosmological parameters” ”cmb” ”last scattering” ”anisotropies” ”microwave background” ”power spectra” ”planck” ”cosmic microwave” d 28 ”density perturbations” ”inflationary” ”inflation” ”dark energy” ”scale invariant” ”spatial curvature” ”cosmological perturbations” ”inflationary models” ”cosmologies”
  21. 21. Introduction Method Results Summary Experiment 3: Comparing clustering solutions Experiment 3: Comparing clustering solutions Cluster labels are treated as entities Let’s compare
  22. 22. Introduction Method Results Summary Experiment 3: Comparing clustering solutions Highly similar clustering solutions
  23. 23. Introduction Method Results Summary Experiment 3: Comparing clustering solutions Partially agreeing clustering solutions
  24. 24. Introduction Method Results Summary Experiment 3: Comparing clustering solutions An overview of all clustering solutions
  25. 25. Introduction Method Results Summary Summary Summary We present a method and an interface that allows visual exploration through the contexts of entities We can provide the most related topical terms to clusters although expert knowledge is needed to transform them into real labels/topics LittleAriadne provides a visual way of comparing different clustering solutions Our na¨ıve way of clustering is worth exploring further
  26. 26. Introduction Method Results Summary Future extensions Future extensions Add more types of entities, such as citations, publishers, conferences, etc, to provide richer context Add direct links to articles to answer information retrieval needs Study context sensitivity compare ”young” and ”young”
  27. 27. Introduction Method Results Summary Thank you Thank you http://thoth.pica.nl/astro/relate Rob Koopman (rob.koopman@oclc.org) Shenghui Wang (shenghui.wang@oclc.org) Andrea Scharnhorst (andrea.scharnhorst@dans.knaw.nl)

×