Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
 

  • This one uses the division labels from the IEEE web site to show the data distribution. Purple is IEEE, red is MeSH, blue is DTIC.
  • Blob plot: 1998 IEEE terms only. Size of node is relative to the number of documents indexing the thesaurus branch below the given term. Colored by IEEE division. Yellow is Division VI: mostly governance and general science/engineering (cross-cutting).
  • IEEE Transactions on Information Theory
  • IEEE Transactions on Magnetics
  • IEEE only – term clusters linearized
  • Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  • IEEE only. Circular plot showing all IEEE output. IEEE term clusters from linear plot ordered around circle starting at dot (top in linear) and going counterclockwise.
  • Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  • Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  • IEEE + DTIC (blue) + MeSH (red). Labels indicate positions of key terms and IEEE division numbers.

Presentation Transcript

  • SciTech Strategies, Inc. Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output. Kevin W. Boyack, Marjorie M.K. Hlava. Feb 26, 2010
  • Agenda
    • Work in progress presentation
    • Introduction
      » Science mapping background
      » Questions with visual answers
    • Mapping IEEE thesaurus space
      » Expanding thesaurus space to include adjacencies
    • Overlay data on thesaurus space
      » Compare databases
      » Compare journals
      » Trends
    • Summary
    SciTech Strategies Better Maps Better Decisions Well Formed Data • Semantic Enrichment • Access Innovations • Data Harmony
  • Science mapping
    • 30-40 year tradition of science mapping
      » Well-established methodologies
      » Current computing power and data availability enable large-scale mapping and analysis
    • Science maps can/have been created using
      » Articles
      » Journals
      » Authors
      » Terms
    • Maps used for communication, strategy, planning, evaluation …
  • Questions with visual answers
    • From a society / publisher perspective
      » Which topical areas form our core? periphery?
      » Where is the coverage dense? thin?
      » Which topical areas are most active? least active?
      » Which topical areas seem to be emerging? declining?
      » Which topical areas are interrelated? isolated?
      » What are the overlaps between journals / segments?
      » Where are the potential expansion points?
    • From a thesaurus perspective
      » What terms are too broadly defined?
      » How do actual topical relationships differ from the thesaurus structure?
  • Preparing the data
    • Index 1.2 million Xplore records
      » Using the IEEE Thesaurus
      » Using MeSH (Medical Subject Headings)
      » Using the DTIC Thesaurus
    • Normalize and enrich the XML as needed
    • Create an XML / SQL database
    • Look for outliers
    • Massage for images
  • Mapping IEEE thesaurus space
    • Simple map: process
      » Obtain IEEE thesaurus
      » Index IEEE content (assign thesaurus terms to documents)
      » Calculate relationships between thesaurus terms
      » Map thesaurus terms based on relationships
    [Diagram labels: 6k terms; IEEE; 1.2M documents; TERM MAP]
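The "calculate relationships between thesaurus terms" step above can be illustrated with a small sketch. The slides do not say which relationship measure was used; the version below assumes a cosine-normalized co-occurrence count, and the function name `term_similarities` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def term_similarities(doc_terms):
    """Return {(term_a, term_b): cosine} for every pair of terms that
    co-occur in at least one document.

    doc_terms is a list of term lists, one per indexed document.
    Cosine here is cooccurrence / sqrt(count_a * count_b); this measure
    is an assumption, not something stated in the presentation.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in doc_terms:
        unique = sorted(set(terms))  # ignore repeated terms within a document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(term_counts[a] * term_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Toy, made-up indexing of three documents.
docs = [
    ["magnetic fields", "coils"],
    ["magnetic fields", "coils", "decoding"],
    ["decoding", "channel capacity"],
]
sims = term_similarities(docs)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 3))
```

Pairs are keyed in alphabetical order, so each relationship appears once. The resulting weighted pairs are the kind of input a graph layout routine would take to position terms on a map.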
  • Mapping IEEE thesaurus space
    • We are more interested in an expanded map that includes adjacencies to the IEEE data
      » Expanded term set shows adjacent white space; opportunities for expansion
      » Similar process to that for simple map except …
      » We need additional terms to add
    • Criteria for additional terms
      » Low occurrence rate in IEEE documents
      » Linkage to terms in IEEE documents
      » Similar level of detail to current IEEE thesaurus terms
    • Where do we find these terms? How can we add them?
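The first two criteria for additional terms can be read as a simple filter over candidate terms from other thesauri. The sketch below is only one possible reading: the thresholds, the function name `select_adjacent_terms`, and the candidate records are all hypothetical, and the third criterion (similar level of detail) is not modeled.

```python
def select_adjacent_terms(candidates, ieee_doc_count,
                          max_ieee_rate=0.001, min_linked_terms=2):
    """Pick candidate terms that are rare in IEEE documents but linked to IEEE terms.

    candidates maps a term to a dict with:
      "ieee_docs": number of IEEE documents indexed with the term
      "linked_ieee_terms": set of IEEE thesaurus terms it co-occurs with
    Both threshold defaults are illustrative assumptions.
    """
    selected = []
    for term, stats in candidates.items():
        rate = stats["ieee_docs"] / ieee_doc_count
        linked = len(stats["linked_ieee_terms"])
        if rate <= max_ieee_rate and linked >= min_linked_terms:
            selected.append(term)
    return sorted(selected)


# Made-up candidate statistics; 1.2M is the IEEE corpus size from the slides.
candidates = {
    "candidate_low_linked": {"ieee_docs": 300, "linked_ieee_terms": {"t1", "t2"}},
    "candidate_common": {"ieee_docs": 90_000, "linked_ieee_terms": {"t1", "t2", "t3"}},
    "candidate_unlinked": {"ieee_docs": 5, "linked_ieee_terms": set()},
}
adjacent = select_adjacent_terms(candidates, ieee_doc_count=1_200_000)
print(adjacent)
```

Only the first candidate passes: the second occurs too often in IEEE documents to count as adjacent white space, and the third has no linkage to IEEE terms.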
  • Defining expanded term space: 0. Desired result [Diagram labels: 6k terms; IEEE; 1.2M documents]
  • Defining expanded term space: 1. Limit IEEE thesaurus
  • Defining expanded term space: 2. Select related corpora [Diagram labels: 475k patents; 14k DTIC; 2k terms; IEEE; 1.2M documents; 24k MeSH; PubMed; 525k docs]
  • Defining expanded term space: 3. Identify related terms [Diagram labels: 2k terms; IEEE; 1.2M documents]
  • Defining expanded term space: 4. Resulting term set [Diagram labels: 2k terms; IEEE; 1.2M documents]
  • Clustering of terms (loose clustering)
  • Clustering of terms (tight clustering)
  • Remove non-linked MeSH
  • Cluster the term clusters
  • Linearize the term cluster order
  • IEEE corpus distribution over topics
  • USPTO corpus distribution over topics
  • PubMed corpus distribution over topics
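A corpus distribution over topics, as named in the three slides above, can be sketched as the share of a corpus falling into each term cluster, reported in the linearized cluster order. The slides do not describe the counting rule; the version below assumes fractional counting (a document touching several clusters is split evenly among them), and all names and data are hypothetical.

```python
from collections import Counter


def corpus_distribution(doc_terms, term_to_cluster, cluster_order):
    """Return the fraction of the corpus in each cluster, in cluster_order.

    Each document contributes a total weight of 1, split evenly across the
    distinct clusters its terms belong to (an assumed counting rule).
    Documents with no mapped terms are ignored.
    """
    counts = Counter()
    for terms in doc_terms:
        clusters = {term_to_cluster[t] for t in terms if t in term_to_cluster}
        for cluster in clusters:
            counts[cluster] += 1 / len(clusters)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in cluster_order]


# Made-up clusters and documents.
cluster_order = ["c0", "c1", "c2"]
term_to_cluster = {"t1": "c0", "t2": "c0", "t3": "c1", "t4": "c2"}
docs = [["t1", "t2"], ["t1", "t3"], ["t3"], ["unmapped term"]]
distribution = corpus_distribution(docs, term_to_cluster, cluster_order)
print(distribution)
```

Running the same function on different corpora (or on the documents of a single journal) over one shared cluster order gives curves that can be overlaid for comparison.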
  • Summary
    • Term space can be mapped effectively
    • The mapped space can be used to show distributions and trends that give answers to questions
      » Database distribution comparisons
      » Journal / segment distribution comparisons (overlaps)
      » Journal / segment trending
      » Identify groups of terms that need trimming (rule base changes)
  • Radial thesaurus structure, ordered by division
  • IEEE T Magnetics. Purple: Magnetics heading. Orange: all other.
  • [Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple]