SciTech Strategies, Inc.




Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output
Kevin W. Boyack
Marjorie M.K. Hlava
Feb 26, 2010
Agenda
          Work in progress presentation
          Introduction
           »   Science mapping background
           »   Questions with visual answers
          Mapping IEEE thesaurus space
           »   Expanding thesaurus space to include adjacencies
          Overlay data on thesaurus space
           »   Compare databases
           »   Compare journals
           »   Trends
          Summary

SciTech Strategies    Better Maps   Better Decisions   Well Formed Data • Semantic Enrichment • Access Innovations • Data Harmony
Science mapping
          30-40 year tradition of science mapping
           »   Well-established methodologies
           »   Current computing power and data availability enable large
               scale mapping and analysis
          Science maps can/have been created using
           »   Articles
           »   Journals
           »   Authors
           »   Terms
          Maps used for communication, strategy, planning,
           evaluation …


Questions with visual answers
          From a society / publisher perspective
           »   Which topical areas form our core? periphery?
           »   Where is the coverage dense? thin?
           »   Which topical areas are most active? least active?
           »   Which topical areas seem to be emerging? declining?
           »   Which topical areas are interrelated? isolated?
           »   What are the overlaps between journals / segments?
           »   Where are the potential expansion points?
          From a thesaurus perspective
           »   What terms are too broadly defined?
           »   How do actual topical relationships differ from the thesaurus
               structure?


Preparing the data
          Index 1.2 million IEEE Xplore records
           »   Using the IEEE Thesaurus
           »   Using MeSH (Medical Subject Headings)
           »   Using the DTIC Thesaurus
          Normalize and enrich the XML as needed
          Create an XML / SQL database
          Look for outliers
          Massage the data for images




Mapping IEEE thesaurus space
          Simple map – process
           »       Obtain IEEE thesaurus
           »       Index IEEE content (assign thesaurus terms to documents)
           »       Calculate relationships between thesaurus terms
           »       Map thesaurus terms based on relationships
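
The "calculate relationships" step can be sketched as follows. The slides do not say which relationship measure was used; this minimal example assumes document-level co-occurrence counts normalized by cosine similarity, and the function name, term names, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def term_similarities(doc_terms):
    """Return {(term_a, term_b): cosine similarity} from per-document term sets."""
    term_counts = Counter()   # number of documents indexed with each term
    pair_counts = Counter()   # number of documents indexed with both terms
    for terms in doc_terms:
        unique = sorted(set(terms))          # sorted so each pair has one key
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(term_counts[a] * term_counts[b])
        for (a, b), n in pair_counts.items()
    }

# Toy index (invented): each entry is the set of terms assigned to one document.
toy_index = [
    {"Magnetic fields", "Magnetic materials"},
    {"Magnetic fields", "Antennas"},
    {"Magnetic fields", "Magnetic materials"},
]
similarities = term_similarities(toy_index)
```

The resulting pairwise values are the kind of input a layout or clustering step would take to position terms on a map; pairs that never co-occur are simply absent from the dictionary.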




          [Diagram: IEEE, 1.2M documents × 6k terms → IEEE, 6k terms × 6k terms → TERM MAP]

Mapping IEEE thesaurus space
          We are more interested in an expanded map that
           includes adjacencies to the IEEE data
           »   Expanded term set shows adjacent white space; opportunities
               for expansion
           »   Similar process to that for simple map except …
           »   We need additional terms to add
          Criteria for additional terms
           »   Low occurrence rate in IEEE documents
           »   Linkage to terms in IEEE documents
           »   Similar level of detail to current IEEE thesaurus terms
          Where do we find these terms? How can we add them?
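
The first two criteria above lend themselves to a simple filter, sketched here. The thresholds, function name, and example data are illustrative assumptions rather than values from the presentation, and the third criterion (similar level of detail) is left to human judgment.

```python
def select_adjacent_terms(candidates, ieee_doc_freq, links_to_ieee,
                          max_ieee_docs=10, min_links=1):
    """Keep candidate terms that are rare in IEEE documents but linked to IEEE terms.

    candidates     -- iterable of candidate term strings from an outside thesaurus
    ieee_doc_freq  -- {term: number of IEEE documents indexed with the term}
    links_to_ieee  -- {term: set of IEEE thesaurus terms it co-occurs with}
    Both thresholds are assumptions chosen for illustration only.
    """
    selected = []
    for term in candidates:
        rare_in_ieee = ieee_doc_freq.get(term, 0) <= max_ieee_docs
        linked = len(links_to_ieee.get(term, ())) >= min_links
        if rare_in_ieee and linked:
            selected.append(term)
    return selected

# Invented toy data, not figures from the study.
candidates = ["Magnetic Resonance Imaging", "Electrodes", "Hospitals"]
ieee_doc_freq = {"Magnetic Resonance Imaging": 4, "Electrodes": 5200}
links_to_ieee = {
    "Magnetic Resonance Imaging": {"Magnetic fields"},
    "Electrodes": {"Biomedical electrodes"},
}
adjacent = select_adjacent_terms(candidates, ieee_doc_freq, links_to_ieee)
```

In this toy run only the first candidate passes: the second is already common in the IEEE data, and the third has no linkage to any IEEE term.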


Defining expanded term space
          0. Desired result
          [Diagram: IEEE, 1.2M documents; 6k terms]




Defining expanded term space
          1. Limit IEEE thesaurus




Defining expanded term space
          2. Select related corpora
          [Diagram: IEEE, 1.2M documents, 2k terms; 475k patents; 14k DTIC; PubMed, 525k docs, 24k MeSH]



Defining expanded term space
          3. Identify related terms
          [Diagram: IEEE, 1.2M documents; 2k terms]




Defining expanded term space
          4. Resulting term set
          [Diagram: IEEE, 1.2M documents; 2k terms]




Clustering of terms (loose clustering)




Clustering of terms (tight clustering)




Remove non-linked MeSH




Cluster the term clusters




Linearize the term cluster order
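
One way to picture linearizing the cluster order is a greedy chain through a cluster-to-cluster similarity matrix. The slide does not state the method actually used, so the approach, function name, and toy matrix below are assumptions made purely for illustration.

```python
def linearize(similarity, start=0):
    """Greedy ordering: repeatedly step to the most similar unvisited cluster."""
    n = len(similarity)
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        current = order[-1]
        # Ties are broken in favour of the lowest cluster index.
        nxt = max(remaining, key=lambda j: (similarity[current][j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Invented symmetric similarity matrix for four clusters.
toy_similarity = [
    [1.0, 0.1, 0.8, 0.2],
    [0.1, 1.0, 0.3, 0.7],
    [0.8, 0.3, 1.0, 0.4],
    [0.2, 0.7, 0.4, 1.0],
]
cluster_order = linearize(toy_similarity)
```

A greedy chain is simple but depends on the starting cluster; an ordering derived from a hierarchical clustering would be a reasonable alternative.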




IEEE corpus distribution over topics




USPTO corpus distribution over topics




PubMed corpus distribution over topics




Summary
          Term space can be mapped effectively
          The mapped space can be used to show distributions
           and trends that give answers to questions
           »   Database distribution comparisons
           »   Journal / segment distribution comparisons (overlaps)
           »   Journal / segment trending
           »   Identify groups of terms that need trimming (rule base changes)
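
A database distribution comparison of the kind listed above can be sketched as follows, assuming each document has already been assigned to one or more term clusters and the clusters have a fixed linear order. The function name, cluster labels, and corpora are hypothetical toy values.

```python
from collections import Counter

def topic_distribution(doc_clusters, cluster_order):
    """Share of cluster assignments falling in each cluster, in linear order."""
    counts = Counter(c for clusters in doc_clusters for c in clusters)
    total = sum(counts[c] for c in cluster_order)
    return [counts[c] / total if total else 0.0 for c in cluster_order]

# Invented example: three clusters in their linearized order, two small corpora.
cluster_order = ["power", "magnetics", "imaging"]
corpus_a = [["power"], ["power", "magnetics"], ["magnetics"]]
corpus_b = [["imaging"], ["imaging"], ["magnetics"], ["imaging"]]

dist_a = topic_distribution(corpus_a, cluster_order)
dist_b = topic_distribution(corpus_b, cluster_order)

# Positive values mark clusters where corpus A is relatively denser than corpus B.
difference = [a - b for a, b in zip(dist_a, dist_b)]
```

Plotting each distribution along the linearized cluster axis gives one profile per corpus, and the difference vector highlights where coverage diverges.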




Radial thesaurus structure
          Ordered by division
IEEE T Magnetics
          Purple – Magnetics heading
          Orange – all other




          [Figure legend: Division I; Division II; Division III; Division IV; Division V; Division VI; Division VII; Division IX; Division X; Multiple]


 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU
 
2012.10 - DDI Lifecycle - Moving Forward
2012.10 - DDI Lifecycle - Moving Forward2012.10 - DDI Lifecycle - Moving Forward
2012.10 - DDI Lifecycle - Moving Forward
 
Linked data and the future of scientific publishing
Linked data and the future of scientific publishingLinked data and the future of scientific publishing
Linked data and the future of scientific publishing
 
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: Metadata for Managing Scientific Research DataNISO/DCMI Webinar: Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
 
Elsevier Smart Content LDR SemTech NYC Oct-17-2012
Elsevier Smart Content LDR SemTech NYC Oct-17-2012Elsevier Smart Content LDR SemTech NYC Oct-17-2012
Elsevier Smart Content LDR SemTech NYC Oct-17-2012
 

Recently uploaded

GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 

Recently uploaded (20)

GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 

Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output

  • 1. SciTech Strategies, Inc. Found in Space: Creating and Visualizing IEEE Abstract Space for Publication Output. Kevin W. Boyack, Marjorie M.K. Hlava. Feb 26, 2010
  • 2. Agenda
      - Work in progress presentation
      - Introduction
          » Science mapping background
          » Questions with visual answers
      - Mapping IEEE thesaurus space
          » Expanding thesaurus space to include adjacencies
      - Overlay data on thesaurus space
          » Compare databases
          » Compare journals
          » Trends
      - Summary
  • 3. Science mapping
      - 30-40 year tradition of science mapping
          » Well-established methodologies
          » Current computing power and data availability enable large scale mapping and analysis
      - Science maps can/have been created using
          » Articles
          » Journals
          » Authors
          » Terms
      - Maps used for communication, strategy, planning, evaluation …
  • 6. Questions with visual answers
      - From a society / publisher perspective
          » Which topical areas form our core? periphery?
          » Where is the coverage dense? thin?
          » Which topical areas are most active? least active?
          » Which topical areas seem to be emerging? declining?
          » Which topical areas are interrelated? isolated?
          » What are the overlaps between journals / segments?
          » Where are the potential expansion points?
      - From a thesaurus perspective
          » What terms are too broadly defined?
          » How do actual topical relationships differ from the thesaurus structure?
  • 7. Preparing the data
      - Index 1.2 million Xplore records
          » Using the IEEE Thesaurus
          » Using MeSH (Medical Subject Headings)
          » Using the DTIC Thesaurus
      - Normalize and enrich the XML as needed
      - Create an XML / SQL database
      - Look for outliers
      - Massage for images
  • 8. Mapping IEEE thesaurus space
      - Simple map: process
          » Obtain IEEE thesaurus
          » Index IEEE content (assign thesaurus terms to documents)
          » Calculate relationships between thesaurus terms
          » Map thesaurus terms based on relationships
      - (diagram labels: 6k terms; IEEE 1.2M documents; term map)
  • 9. Mapping IEEE thesaurus space
      - We are more interested in an expanded map that includes adjacencies to the IEEE data
          » Expanded term set shows adjacent white space; opportunities for expansion
          » Similar process to that for simple map except …
          » We need additional terms to add
      - Criteria for additional terms
          » Low occurrence rate in IEEE documents
          » Linkage to terms in IEEE documents
          » Similar level of detail to current IEEE thesaurus terms
      - Where do we find these terms? How can we add them?
  • 10. Defining expanded term space: 0. Desired result (diagram labels: 6k terms; IEEE 1.2M documents)
  • 11. Defining expanded term space: 1. Limit IEEE thesaurus
  • 12. Defining expanded term space: 2. Select related corpora (diagram labels: 475k patents; 14k DTIC; 2k terms; IEEE 1.2M documents; 24k MeSH; PubMed 525k docs)
  • 13. Defining expanded term space: 3. Identify related terms (diagram labels: 2k terms; IEEE 1.2M documents)
  • 14. Defining expanded term space: 3. Identify related terms (diagram labels: 2k terms; IEEE 1.2M documents)
  • 15. Defining expanded term space: 4. Resulting term set (diagram labels: 2k terms; IEEE 1.2M documents)
  • 16. Clustering of terms (loose clustering)
  • 17. Clustering of terms (tight clustering)
  • 18. Remove non-linked MeSH
  • 19. Cluster the term clusters
  • 20. Linearize the term cluster order
  • 21. IEEE corpus distribution over topics
  • 22. USPTO corpus distribution over topics
  • 23. PubMed corpus distribution over topics
  • 25. Summary
      - Term space can be mapped effectively
      - The mapped space can be used to show distributions and trends that give answers to questions
          » Database distribution comparisons
          » Journal / segment distribution comparisons (overlaps)
          » Journal / segment trending
          » Identify groups of terms that need trimming (rule base changes)
  • 26. Radial thesaurus structure, ordered by division
  • 27. IEEE T Magnetics (legend: purple = Magnetics heading; orange = all other)
  • 28. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 31. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 32. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 33. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 34. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
  • 35. Legend: Division I, Division II, Division III, Division IV, Division V, Division VI, Division VII, Division IX, Division X, Multiple
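
The simple-map process on slide 8 (index documents, then calculate relationships between thesaurus terms) can be illustrated with a small sketch. The slides do not say which relationship measure was used, so the cosine-normalized co-occurrence below is an assumption, and the function name `term_similarities` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def term_similarities(doc_terms):
    """Cosine-normalized co-occurrence for every pair of co-occurring terms.

    doc_terms: iterable of sets of thesaurus terms, one set per indexed
    document. The measure is an assumption; the slides name no formula.
    """
    term_counts = Counter()   # number of documents indexed with each term
    pair_counts = Counter()   # number of documents indexed with both terms
    for terms in doc_terms:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(term_counts[a] * term_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Toy data: three made-up documents with made-up term assignments.
docs = [
    {"Magnetics", "Magnetic fields"},
    {"Magnetics", "Magnetic fields", "Information theory"},
    {"Information theory", "Coding"},
]
sims = term_similarities(docs)
for pair, value in sorted(sims.items()):
    print(pair, round(value, 3))
```

The resulting pair weights could then feed a layout or clustering step of the kind named on slides 16 to 20; that step is not sketched here because the slides give no detail on it.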

Editor's Notes

  1. This one uses the division labels from the IEEE web site to show the data distribution. Purple is IEEE, red is MeSH, blue is DTIC.
  2. Blob plot: 1998 IEEE terms only. Size of node is relative to the number of documents indexing the thesaurus branch below the given term. Colored by IEEE division. Yellow is Division VI, mostly governance and general science/engineering (cross-cutting).
  3. IEEE Transactions on Information Theory
  4. IEEE Transactions on Magnetics
  5. IEEE only – term clusters linearized
  6. Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  7. IEEE only. Circular plot showing all IEEE output. IEEE term clusters from linear plot ordered around circle starting at dot (top in linear) and going counterclockwise.
  8. Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  9. Purple: IEEE Transactions on Magnetics. Blue: IEEE Transactions on Information Theory.
  10. IEEE + DTIC (blue) + MeSH (red). Labels indicate positions of key terms and IEEE division numbers.
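
Note 2 sizes each node by the number of documents indexing the thesaurus branch below a term. A minimal sketch of that rollup follows, assuming the thesaurus is an acyclic hierarchy, that a term's own documents count toward its branch, and that a document indexed by several terms in one branch is counted once. The function name `branch_document_counts`, the hierarchy, and the document ids are hypothetical.

```python
def branch_document_counts(children, term_docs):
    """Count distinct documents indexed at or below each term.

    children:  dict mapping a term to its narrower terms (assumed acyclic).
    term_docs: dict mapping a term to the set of document ids indexed with it.
    """
    cache = {}

    def docs_below(term):
        # Union of the term's own documents and those of all its descendants.
        if term not in cache:
            docs = set(term_docs.get(term, ()))
            for child in children.get(term, ()):
                docs |= docs_below(child)
            cache[term] = docs
        return cache[term]

    all_terms = set(children) | set(term_docs)
    for narrower in children.values():
        all_terms.update(narrower)
    return {term: len(docs_below(term)) for term in all_terms}


# Toy hierarchy and index, made up for illustration only.
children = {
    "Magnetics": ["Magnetic fields", "Magnetic materials"],
    "Magnetic materials": ["Ferrites"],
}
term_docs = {
    "Magnetics": {1},
    "Magnetic fields": {1, 2},
    "Magnetic materials": {3},
    "Ferrites": {3, 4},
}
counts = branch_document_counts(children, term_docs)
print(counts["Magnetics"], counts["Magnetic materials"], counts["Ferrites"])
```

Counting distinct documents rather than summing per-term counts keeps a broad term from being inflated by documents that carry several of its narrower terms; whether the original plot did this is not stated in the notes.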