Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output

  • This one uses the division labels from the IEEE web site to show the data distribution. Purple is IEEE, red is MeSH, blue is DTIC.
  • Blob plot – 1998 IEEE terms only – size of node relative to number of documents indexing the thesaurus branch below the given term. Colored by IEEE division. Yellow is Division VI – mostly governance and general science/engineering – cross-cutting.
  • IEEE Transactions on Information Theory
  • IEEE Transactions on Magnetics
  • IEEE only – term clusters linearized
  • Purple – IEEE Transactions on Magnetics; Blue – IEEE Transactions on Information Theory
  • IEEE only. Circular plot showing all IEEE output. IEEE term clusters from linear plot ordered around circle starting at dot (top in linear) and going counterclockwise.
  • Purple – IEEE Transactions on Magnetics; Blue – IEEE Transactions on Information Theory
  • Purple – IEEE Transactions on Magnetics; Blue – IEEE Transactions on Information Theory
  • IEEE + DTIC (blue) + MeSH (red). Labels indicate positions of key terms and IEEE division numbers.
  • Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
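
The blob-plot note above sizes each node by the number of documents indexed in the thesaurus branch below the given term. A minimal Python sketch of that roll-up follows. The `children` mapping (term to narrower terms), the `docs_by_term` mapping (term to document ids), and the function name are hypothetical and do not come from the presentation. Counting each document once per branch and including the term itself in its own branch are assumptions, and the toy data is invented.

```python
from typing import Dict, List, Set


def branch_doc_counts(
    children: Dict[str, List[str]],
    docs_by_term: Dict[str, Set[str]],
    root: str,
) -> Dict[str, int]:
    """Count distinct documents indexed at or below each term (hypothetical helper)."""
    counts: Dict[str, int] = {}

    def collect(term: str) -> Set[str]:
        # Start with documents indexed directly by this term (assumption:
        # the term counts as part of its own branch).
        docs = set(docs_by_term.get(term, set()))
        # Merge in documents from every narrower term; assumes no cycles.
        for child in children.get(term, []):
            docs |= collect(child)
        counts[term] = len(docs)
        return docs

    collect(root)
    return counts


# Toy data, invented for illustration only.
children = {"Magnetics": ["Magnetic materials", "Magnetic devices"]}
docs_by_term = {
    "Magnetics": {"d1"},
    "Magnetic materials": {"d1", "d2"},
    "Magnetic devices": {"d3"},
}
counts = branch_doc_counts(children, docs_by_term, "Magnetics")
```

Using sets rather than summing per-term counts keeps a document that is indexed by several terms in one branch from being counted more than once.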

    1. SciTech Strategies, Inc.
        Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
        Kevin W. Boyack, Marjorie M.K. Hlava
        Feb 26, 2010
    2. Agenda
        • Work in progress presentation
        • Introduction
            » Science mapping background
            » Questions with visual answers
        • Mapping IEEE thesaurus space
            » Expanding thesaurus space to include adjacencies
        • Overlay data on thesaurus space
            » Compare databases
            » Compare journals
            » Trends
        • Summary
        Footer: SciTech Strategies Better Maps Better Decisions Well Formed Data • Semantic Enrichment • Access Innovations • Data Harmony
    3. Science mapping
        • 30-40 year tradition of science mapping
            » Well-established methodologies
            » Current computing power and data availability enable large scale mapping and analysis
        • Science maps can/have been created using
            » Articles
            » Journals
            » Authors
            » Terms
        • Maps used for communication, strategy, planning, evaluation …
    6. Questions with visual answers
        • From a society / publisher perspective
            » Which topical areas form our core? periphery?
            » Where is the coverage dense? thin?
            » Which topical areas are most active? least active?
            » Which topical areas seem to be emerging? declining?
            » Which topical areas are interrelated? isolated?
            » What are the overlaps between journals / segments?
            » Where are the potential expansion points?
        • From a thesaurus perspective
            » What terms are too broadly defined?
            » How do actual topical relationships differ from the thesaurus structure?
    7. Preparing the data
        • Index 1.2 million IEEE Xplore records
            » Using the IEEE Thesaurus
            » Using MeSH (Medical Subject Headings)
            » Using the DTIC Thesaurus
        • Normalize and enrich the XML as needed
        • Create an XML / SQL database
        • Look for outliers
        • Massage for images
    8. Mapping IEEE thesaurus space
        • Simple map – process
            » Obtain IEEE thesaurus
            » Index IEEE content (assign thesaurus terms to documents)
            » Calculate relationships between thesaurus terms
            » Map thesaurus terms based on relationships
        (diagram labels: 6k terms; IEEE; 1.2M documents; TERM MAP)
    9. Mapping IEEE thesaurus space
        • We are more interested in an expanded map that includes adjacencies to the IEEE data
            » Expanded term set shows adjacent white space; opportunities for expansion
            » Similar process to that for simple map except …
            » We need additional terms to add
        • Criteria for additional terms
            » Low occurrence rate in IEEE documents
            » Linkage to terms in IEEE documents
            » Similar level of detail to current IEEE thesaurus terms
        • Where do we find these terms? How can we add them?
    10. Defining expanded term space: 0. Desired result
        (diagram labels: 6k terms; IEEE; 1.2M documents)
    11. Defining expanded term space: 1. Limit IEEE thesaurus
    12. Defining expanded term space: 2. Select related corpora
        (diagram labels: 475k patents; 14k DTIC; 2k terms; IEEE; 1.2M documents; 24k MeSH; PubMed; 525k docs)
    13. Defining expanded term space: 3. Identify related terms
        (diagram labels: 2k terms; IEEE; 1.2M documents)
    14. Defining expanded term space: 3. Identify related terms
        (diagram labels: 2k terms; IEEE; 1.2M documents)
    15. Defining expanded term space: 4. Resulting term set
        (diagram labels: 2k terms; IEEE; 1.2M documents)
    16. Clustering of terms (loose clustering)
    17. Clustering of terms (tight clustering)
    18. Remove non-linked MeSH
    19. Cluster the term clusters
    20. Linearize the term cluster order
    21. IEEE corpus distribution over topics
    22. USPTO corpus distribution over topics
    23. PubMed corpus distribution over topics
    25. Summary
        • Term space can be mapped effectively
        • The mapped space can be used to show distributions and trends that give answers to questions
            » Database distribution comparisons
            » Journal / segment distribution comparisons (overlaps)
            » Journal / segment trending
            » Identify groups of terms that need trimming (rule base changes)
    26. Radial thesaurus structure, ordered by division
    27. IEEE T Magnetics
        • Purple – Magnetics heading
        • Orange – all other
    28. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
    31. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
    32. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
    33. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
    34. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
    35. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
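
The simple-map process on slide 8 includes the step "Calculate relationships between thesaurus terms" once terms have been assigned to documents. The slides do not say which measure was used, so the following is only a sketch under an assumed choice: co-occurrence counts normalized by the cosine measure. The function name `term_similarities`, the `doc_terms` input shape, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Dict, Iterable, Set, Tuple


def term_similarities(
    doc_terms: Iterable[Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Cosine-normalized co-occurrence between term pairs (illustrative only)."""
    term_freq: Counter = Counter()   # documents indexed by each term
    pair_freq: Counter = Counter()   # documents indexed by both terms of a pair
    for terms in doc_terms:
        term_freq.update(terms)
        # Sort so each unordered pair always gets the same key.
        pair_freq.update(combinations(sorted(terms), 2))
    # Assumed measure: co-occurrence / sqrt(freq_a * freq_b).
    return {
        (a, b): n / sqrt(term_freq[a] * term_freq[b])
        for (a, b), n in pair_freq.items()
    }


# Toy indexing results, invented for illustration only.
docs = [
    {"Antennas", "Radar"},
    {"Antennas", "Radar", "Signal processing"},
    {"Signal processing"},
]
sims = term_similarities(docs)
```

A pair-similarity table of this kind is the sort of input a layout or clustering step could consume; the normalization keeps very frequent terms from dominating purely because of their frequency.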
