Using KOS as a basis for text analytics and trend forecasting
NKOS Networked Knowledge Organization Systems and Services
The 9th European NKOS Workshop at the 14th ECDL
Conference, Glasgow, Scotland
10th September 2010
                                            Marjorie M.K. Hlava
                                                 President
                                           mhlava@accessinn.com
                                                      www.accessinn.com



           © 2010. Access Innovations, Inc. All Rights Reserved.
Agenda
   IEEE Challenge
              Where are our publication strengths?
              What are the emerging topics?
              Use our own data to address these questions
   Access Innovations' Response
              Expanding and Mapping the IEEE Thesaurus
              Use term analytics instead of text analytics to investigate
   Findings


Who is the IEEE?




About IEEE…
   Founded in 1884, IEEE is the world’s largest professional association
    advancing technology for the benefit of humanity.

   We publish 148 technical journals, transactions and magazines,
    sponsor nearly 800 conferences annually, develop technology
    standards, and support the professional interests of more than
    400,000 members in over 160 countries.

   Members participate in 38 societies and 7 councils

   The IEEE Xplore® digital library provides access to IEEE journals,
    transactions, letters, magazines and conference proceedings, IET and
    other 3rd Party journals and conference proceedings, IEEE Standards
    and IEEE educational courses.
       Over 2.5 million documents

The New IEEE Xplore




Specific Challenges
   Is there a way, using IEEE information, to forecast future
    direction?
   Where is the industry headed? What about by technology
    sector?
   Does our coverage match the IEEE mission and vision?
   Can IEEE become smarter about their data and potential
    markets using their collection in new ways?
    Are the societies publishing and talking about what their
    charter indicates they cover?
   What are the trends – are topics emerging/cooling?
   Can IEEE use technology and their own data to explore
    these questions while enhancing their data?
Access Innovations' Response
                 SciTech Strategies' Maps
                 Access Innovations' Tools
                 IEEE Xplore data
                 Test with several thesauri




Access Innovations / Data Harmony
   Founded in 1978
   Suite of Semantic Enrichment tools
   Updated the IEEE Thesaurus in 2005
   Built a rule base to auto index IEEE
    content
             “90 % accuracy out of the box on journal data”*
             “80% out of the box on proceedings data”*
       Auto indexed 1.2 million Xplore records
             With the IEEE thesaurus terms rule base
             With the MeSH rule base
             With DTIC rule base
            *Adam D. Philippidis, Manager, Indexing & Database Production, IEEE

SciTech Strategies, Inc.
   Founded in 1982 (Center for Research
    Planning)
   Bibliometric Modeling of very large datasets
            Thomson/ISI data (1982-2004)
            Elsevier/Scopus data (2004-present)
   Focus on Accuracy
            Disciplines > Research Communities > Researchers




Disciplinary Map of Science




Relevant Disciplines for Science Mapping




Research Communities in the
                           Information Science Discipline




Intellectual Base of
                   Two Research Communities




Questions with visual answers
    From a society / publisher perspective
    Which topical areas form our core?
     Periphery?
    Where is the coverage dense? Thin?
    Which topical areas are most active? Least
     active?
    Which topical areas seem to be emerging?
     Declining?
    Which topical areas are interrelated? Isolated?
    What are the overlaps between journals /
     segments?
    Where are the potential expansion points?

Questions with visual answers
    From a thesaurus perspective
    What terms are too broadly
     defined?
    How do actual topical
     relationships differ from the
     thesaurus structure?


Preparing the data
        Index 1.2 Million Xplore records
                       Using the IEEE Thesaurus
                       Using the MeSH - Medical Subject
                        Headings
                       Using the DTIC Thesaurus
              Normalize and enrich the XML as
               needed
              Create an XML / SQL Database
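
The preparation steps above could be sketched roughly as follows. This is only an illustration: the record layout (`record`, `term`, the `thesaurus` attribute), the `indexing` table, and the helper names `normalize` and `load_records` are assumptions, not the actual Xplore schema or the Data Harmony pipeline.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real Xplore XML schema is not shown in the slides.
SAMPLE_XML = """
<records>
  <record id="r1" year="2008">
    <term thesaurus="IEEE">Antenna arrays</term>
    <term thesaurus="MeSH">Electromagnetic Fields</term>
  </record>
  <record id="r2" year="2009">
    <term thesaurus="IEEE">  antenna arrays </term>
    <term thesaurus="DTIC">Radar antennas</term>
  </record>
</records>
"""


def normalize(term):
    """Collapse whitespace and case so variant spellings of a term match."""
    return " ".join(term.split()).lower()


def load_records(xml_text, conn):
    """Parse indexed records and store one row per (record, thesaurus, term)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexing ("
        "record_id TEXT, year INTEGER, thesaurus TEXT, term TEXT, "
        "UNIQUE(record_id, thesaurus, term))"
    )
    root = ET.fromstring(xml_text)
    for rec in root.iter("record"):
        for t in rec.iter("term"):
            conn.execute(
                "INSERT OR IGNORE INTO indexing VALUES (?, ?, ?, ?)",
                (rec.get("id"), int(rec.get("year")),
                 t.get("thesaurus"), normalize(t.text)),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_records(SAMPLE_XML, conn)

# Count how many records carry each IEEE thesaurus term.
rows = conn.execute(
    "SELECT term, COUNT(*) FROM indexing "
    "WHERE thesaurus = 'IEEE' GROUP BY term"
).fetchall()
print(rows)
```

Storing one row per record, thesaurus, and term keeps the three indexing passes (IEEE, MeSH, DTIC) side by side, so later steps can query them together.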



Mapping IEEE thesaurus space
    We are more interested in an expanded
     map that includes adjacencies to the
     IEEE data
              Expanded term set shows adjacent white
               space; opportunities for expansion
              Similar process to that for simple map
               except …
              We need additional terms to add


Mapping IEEE thesaurus space
   Criteria for additional terms
             Low occurrence rate in IEEE documents
             Linkage to terms in IEEE documents
             Similar level of detail to current IEEE
              thesaurus terms
   Where do we find these terms? How can
    we add them?
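
One possible reading of the first two criteria, as a minimal sketch. The function name `select_candidates`, the thresholds `max_rate` and `min_links`, and the sample data are all hypothetical; the slides do not state how the criteria were applied. The third criterion (similar level of detail) needs editorial judgment and is not modeled here.

```python
from collections import Counter


def select_candidates(ieee_docs, external_docs, ieee_terms,
                      max_rate=0.25, min_links=1):
    """Pick external terms that are rare in IEEE documents but linked to IEEE terms.

    Each document is a set of index terms. A "link" is assumed to mean that
    the external term shares a document with at least one IEEE thesaurus term.
    """
    # Criterion 1: how often each term occurs in IEEE documents.
    ieee_counts = Counter(t for doc in ieee_docs for t in doc)

    # Criterion 2: count documents where an external term co-occurs
    # with any IEEE thesaurus term.
    links = Counter()
    for doc in external_docs:
        if doc & ieee_terms:
            links.update(doc - ieee_terms)

    n = len(ieee_docs)
    return sorted(
        t for t, k in links.items()
        if k >= min_links and ieee_counts[t] / n <= max_rate
    )


ieee_terms = {"antennas", "radar"}
ieee_docs = [
    {"antennas", "imaging"},
    {"antennas", "radar", "imaging"},
    {"radar", "imaging"},
    {"radar", "telemetry"},
]
external_docs = [
    {"antennas", "telemetry"},
    {"radar", "imaging"},
    {"oncology", "surgery"},
]

candidates = select_candidates(ieee_docs, external_docs, ieee_terms)
print(candidates)
```

In this toy data, "imaging" is linked but already common in IEEE documents, and "oncology" is never linked, so only "telemetry" passes both filters.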



Defining expanded term space

                             1. Select related corpora
            Patents: 475k
            DTIC: 14k
            IEEE: 2k terms, 1.2M documents
            MeSH: 24k; PubMed: 525k docs

Defining expanded term space
                            2. Identify related terms
            IEEE: 2k terms, 1.2M documents




Defining expanded term space
                              3. Resulting term set
            IEEE: 2k terms, 1.2M documents




Defining expanded term space

                            4. Term:Term Matrix
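
A term:term matrix can be built in several ways; the slides do not say which was used. The sketch below assumes simple document co-occurrence counts, with the diagonal holding each term's document frequency. The function name `term_term_matrix` and the sample documents are hypothetical.

```python
from itertools import combinations


def term_term_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts documents
    indexed with both terms[i] and terms[j]."""
    terms = sorted(set().union(*docs))
    index = {t: i for i, t in enumerate(terms)}
    size = len(terms)
    matrix = [[0] * size for _ in range(size)]
    for doc in docs:
        # Diagonal: number of documents carrying the term.
        for t in doc:
            matrix[index[t]][index[t]] += 1
        # Off-diagonal: symmetric co-occurrence counts.
        for a, b in combinations(sorted(doc), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return terms, matrix


docs = [
    {"antennas", "radar"},
    {"antennas", "radar", "imaging"},
    {"imaging"},
]
terms, matrix = term_term_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

A matrix of this shape is the usual input to layout or clustering software, which places strongly co-occurring terms close together.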




Defining expanded term space

                      5. Visualization
            Matrix → Visualization Software




Sample term and data maps




        Division I
        Division II
        Division III
        Division IV
        Division V
        Division VI
        Division VII
        Division IX
        Division X




PubMed




       Division I
       Division II
       Division III
       Division IV
       Division V
       Division VI
       Division VII
       Division IX
       Division X




IEEE Transactions on
       Antennas and Propagation




       Division I
       Division II
       Division III
       Division IV
       Division V
       Division VI
       Division VII
       Division IX
       Division X




IEEE Transactions on
           Electron Devices




       Division I
       Division II
       Division III
       Division IV
       Division V
       Division VI
       Division VII
       Division IX
       Division X




Findings
    Term space can be mapped effectively
    The mapped space can be used to show
     distributions and trends that give answers
     to questions
              Database distribution comparisons
              Journal / segment distribution comparisons
               (overlaps)
              Journal / segment trending
              Identify groups of terms that need trimming (rule
               base changes)
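
As an illustration of a journal / segment distribution comparison, the sketch below measures overlap as the cosine similarity between two term distributions. The choice of cosine similarity, the function names, and the sample journals are assumptions; the slides present the comparisons visually and do not name a metric.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Share of all term assignments that each term accounts for."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cosine_overlap(p, q):
    """Cosine similarity of two term distributions (0 = disjoint, 1 = identical)."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


# Hypothetical journals, each a list of per-article term sets.
journal_a = term_distribution([{"antennas", "radar"}, {"antennas"}])
journal_b = term_distribution([{"transistors"}, {"transistors", "semiconductors"}])
journal_c = term_distribution([{"antennas", "transistors"}])

print(cosine_overlap(journal_a, journal_b))  # no shared terms
print(cosine_overlap(journal_a, journal_c))  # partial overlap
```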

Answering the IEEE questions
   Yes, we can use IEEE information to forecast future
    directions
   Yes, look at each industry by technology sector over
    time to see where it is headed.
   IEEE coverage does not match the IEEE mission
    and vision by industry sector
   Provides new ways for IEEE to become smarter
    about their data and potential markets using their
    collection
   The societies are not all publishing and talking about
    what their charter indicates they cover.

Looking to the future
   We can see specific trends and which
    topics are emerging/cooling
   Using the IEEE data and this term
    analytics technology, we can explore the
    future and the boundaries of the IEEE
    future
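
One simple way to label a topic as emerging or cooling is the slope of its yearly share of documents. The sketch below assumes a least-squares slope and an arbitrary `threshold`; both, along with the function names and sample figures, are illustrative and not the method used in the study.

```python
def slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def classify_trend(yearly_share, threshold=0.01):
    """Label a term from its share of documents in consecutive years."""
    s = slope(yearly_share)
    if s > threshold:
        return "emerging"
    if s < -threshold:
        return "cooling"
    return "stable"


# Hypothetical yearly shares of documents indexed with a given term.
print(classify_trend([0.01, 0.03, 0.05, 0.08]))
print(classify_trend([0.08, 0.05, 0.03, 0.01]))
print(classify_trend([0.04, 0.04, 0.04, 0.04]))
```

Using a term's share of each year's documents rather than raw counts keeps overall growth of the collection from making every topic look emerging.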




Thank You
                                Marjorie M.K. Hlava, President,
                                Access Innovations / Data
                                  Harmony
                                mhlava@accessinn.com
                                Access Innovations
                                4725 Indian School NE Suite 100
                                Albuquerque, NM 87110
                                www.accessinn.com
                                (505) 998-0800 office
                                (505) 256-1080 fax
                                www.taxodiary.com the taxonomy news blog

Consensus: Thesaurus vs.
              Titles




Scientific Social Networking
             Based on Metadata
   The idea is not new: who is citing whom
       ISI does it with references
       AIP UniPHY does it using semantics
   Expand your options using
       good metadata and descriptors
   Map who is working in the
    field and where
   See the authors' connections




SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Using KOS as a Basis for Text Analytics and Trend Forecasting

  • 1. Using KOS as a basis for text analytics and trend forecasting
      ◦ NKOSS Networked Knowledge Organization Systems and Services
      ◦ The 9th European NKOS Workshop at the 14th ECDL Conference, Glasgow, Scotland, 10th September 2010
      ◦ Marjorie M.K. Hlava, President, mhlava@accessinn.com, www.accessinn.com
      ◦ © 2010. Access Innovations, Inc. All Rights Reserved.
  • 2. Agenda
      ◦ IEEE Challenge
          ▪ Where are our publication strengths?
          ▪ What are the emerging topics?
          ▪ Use our own data to address these questions
      ◦ Access Innovations' Response
          ▪ Expanding and Mapping the IEEE Thesaurus
          ▪ Use term analytics instead of text analytics to investigate
      ◦ Findings
  • 3. Who is the IEEE?
  • 4. About IEEE…
      ◦ Founded in 1884, IEEE is the world’s largest professional association advancing technology for the benefit of humanity.
      ◦ We publish 148 technical journals, transactions and magazines, sponsor nearly 800 conferences annually, develop technology standards, and support the professional interests of more than 400,000 members in over 160 countries.
      ◦ Members participate in 38 societies and 7 councils.
      ◦ The IEEE Xplore® digital library provides access to IEEE journals, transactions, letters, magazines and conference proceedings, IET and other 3rd Party journals and conference proceedings, IEEE Standards and IEEE educational courses.
          ▪ Over 2.5 million documents
  • 5. The New IEEE Xplore
  • 6. Specific Challenges
      ◦ Is there a way, using IEEE information, to forecast future direction?
      ◦ Where is the industry headed? What about by technology sector?
      ◦ Does our coverage match the IEEE mission and vision?
      ◦ Can IEEE become smarter about their data and potential markets using their collection in new ways? Are the societies publishing and talking about what their charter indicates they cover?
      ◦ What are the trends – are topics emerging/cooling?
      ◦ Can IEEE use technology and their own data to explore these questions while enhancing their data?
  • 7. Access Innovations' Response
      ◦ SciTech Strategies' Maps
      ◦ Access Innovations' Tools
      ◦ IEEE Xplore data
      ◦ Test with several thesauri
  • 8. Access Innovations / Data Harmony
      ◦ Founded in 1978
      ◦ Suite of Semantic Enrichment tools
      ◦ Updated the IEEE Thesaurus in 2005
      ◦ Built a rule base to auto index IEEE content
          ▪ “90% accuracy out of the box on journal data”*
          ▪ “80% out of the box on proceedings data”*
      ◦ Auto indexed 1.2 million Xplore records
          ▪ With the IEEE thesaurus terms rule base
          ▪ With the MeSH rule base
          ▪ With DTIC rule base
      ◦ *Adam D. Philippidis, Manager, Indexing & Database Production, IEEE
  • 9. SciTech Strategies, Inc.
      ◦ Founded in 1982 (Center for Research Planning)
      ◦ Bibliometric Modeling of very large datasets
      ◦ Thomson/ISI data (1982-2004)
      ◦ Elsevier/Scopus data (2004-present)
      ◦ Focus on Accuracy
      ◦ Disciplines > Research Communities > Researchers
  • 10. Disciplinary Map of Science
  • 11. Relevant Disciplines for Science Mapping
  • 12. Research Communities in the Information Science Discipline
  • 13. Intellectual Base of Two Research Communities
  • 14. Questions with visual answers: from a society / publisher perspective
      ◦ Which topical areas form our core? Periphery?
      ◦ Where is the coverage dense? Thin?
      ◦ Which topical areas are most active? Least active?
      ◦ Which topical areas seem to be emerging? Declining?
      ◦ Which topical areas are interrelated? Isolated?
      ◦ What are the overlaps between journals / segments?
      ◦ Where are the potential expansion points?
  • 15. Questions with visual answers: from a thesaurus perspective
      ◦ What terms are too broadly defined?
      ◦ How do actual topical relationships differ from the thesaurus structure?
  • 16. Preparing the data
      ◦ Index 1.2 Million Xplore records
          ▪ Using the IEEE Thesaurus
          ▪ Using MeSH (Medical Subject Headings)
          ▪ Using the DTIC Thesaurus
      ◦ Normalize and enrich the XML as needed
      ◦ Create an XML / SQL Database
  • 17. Mapping IEEE thesaurus space
      ◦ We are more interested in an expanded map that includes adjacencies to the IEEE data
      ◦ Expanded term set shows adjacent white space; opportunities for expansion
      ◦ Similar process to that for simple map except …
      ◦ We need additional terms to add
  • 18. Mapping IEEE thesaurus space
      ◦ Criteria for additional terms
          ▪ Low occurrence rate in IEEE documents
          ▪ Linkage to terms in IEEE documents
          ▪ Similar level of detail to current IEEE thesaurus terms
      ◦ Where do we find these terms? How can we add them?
  • 19. Defining expanded term space: 1. Select related corpora (figure labels: 475k patents; 14k DTIC; 2k terms IEEE, 1.2M documents; 24k MeSH; PubMed 525k docs)
  • 20. Defining expanded term space: 2. Identify related terms (figure labels: 2k terms IEEE, 1.2M documents)
  • 21. Defining expanded term space: 2. Identify related terms (figure labels: 2k terms IEEE, 1.2M documents)
  • 22. Defining expanded term space: 3. Resulting term set (figure labels: 2k terms IEEE, 1.2M documents)
  • 23. Defining expanded term space: 4. Term:Term Matrix
  • 24. Defining expanded term space: 5. Visualization (figure labels: Visualization, Matrix, Software)
  • 25. Sample term and data maps (figure legend: Divisions I, II, III, IV, V, VI, VII, IX, X)
  • 26. PubMed (figure legend: Divisions I, II, III, IV, V, VI, VII, IX, X)
  • 27. IEEE Transactions on Antennas and Propagation (figure legend: Divisions I, II, III, IV, V, VI, VII, IX, X)
  • 28. IEEE Transactions on Electron Devices (figure legend: Divisions I, II, III, IV, V, VI, VII, IX, X)
  • 29. Findings
      ◦ Term space can be mapped effectively
      ◦ The mapped space can be used to show distributions and trends that give answers to questions
          ▪ Database distribution comparisons
          ▪ Journal / segment distribution comparisons (overlaps)
          ▪ Journal / segment trending
      ◦ Identify groups of terms that need trimming (rule base changes)
  • 30. Answering the IEEE questions
      ◦ Yes, we can use IEEE information to forecast future directions
      ◦ Yes, look at each industry by technology sector over time to see where it is headed
      ◦ IEEE coverage does not match the IEEE mission and vision by industry sector
      ◦ Provides new ways for IEEE to become smarter about their data and potential markets using their collection
      ◦ The societies are not all publishing and talking about what their charter indicates they cover
  • 31. Looking to the future
      ◦ We can see specific trends and which topics are emerging/cooling
      ◦ Using the IEEE data and this term analytics technology, we can explore the future and the boundaries of the IEEE future
  • 32. Thank You
      ◦ Marjorie M.K. Hlava, President, Access Innovations / Data Harmony
      ◦ mhlava@accessinn.com
      ◦ Access Innovations, 4725 Indian School NE, Suite 100, Albuquerque, NM 87110
      ◦ www.accessinn.com
      ◦ (505) 998-0800 office, (505) 256-1080 fax
      ◦ www.taxodiary.com, the taxonomy news blog
  • 33. Consensus: Thesaurus vs. Titles
  • 34. Scientific Social Networking Based on Metadata
      ◦ The idea has been around: who is citing whom
          ▪ ISI does it with references
          ▪ AIP UniPHY does it using semantics
      ◦ Expand your options using good metadata and descriptors
      ◦ See the authors' connections
      ◦ Map who is working in the field and where
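
The slides name a Term:Term matrix and two countable criteria for additional terms (low occurrence rate in IEEE documents, linkage to IEEE terms) without showing how they were computed. The following is a minimal sketch of one way to do it, not the method Access Innovations or SciTech Strategies actually used. The function names, the thresholds `max_rate` and `min_links`, and the toy records are all hypothetical; the third criterion (similar level of detail) is left to human judgment here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered pair of terms is assigned to the same document.

    Keys are (term_a, term_b) tuples with term_a < term_b alphabetically.
    """
    pairs = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def candidate_terms(docs, core_terms, max_rate, min_links):
    """Return non-core terms that are rare in the corpus but linked to core terms.

    max_rate and min_links are illustrative thresholds, not values from the talk.
    """
    n_docs = len(docs)
    doc_freq = Counter(t for terms in docs for t in set(terms))
    links = {}  # non-core term -> set of core terms it co-occurs with
    for (a, b) in cooccurrence(docs):
        if a in core_terms and b not in core_terms:
            links.setdefault(b, set()).add(a)
        elif b in core_terms and a not in core_terms:
            links.setdefault(a, set()).add(b)
    return sorted(
        term
        for term, linked in links.items()
        if doc_freq[term] / n_docs <= max_rate and len(linked) >= min_links
    )


# Toy records: each set holds the terms assigned to one document by all rule bases.
records = [
    {"antennas", "radar", "microwave imaging"},
    {"antennas", "radar"},
    {"radar", "signal processing"},
    {"signal processing", "electrocardiography"},
    {"antennas", "signal processing"},
]
ieee_terms = {"antennas", "radar", "signal processing"}

matrix = cooccurrence(records)
adjacent = candidate_terms(records, ieee_terms, max_rate=0.25, min_links=2)
```

In this toy data, "microwave imaging" is kept because it is rare and co-occurs with two IEEE terms, while "electrocardiography" is dropped because it links to only one. The pair counts in `matrix` are the kind of input a layout or visualization tool would take.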

Editor's Notes

  1. A 125-year-old professional society, with over 148 journals, conference transactions and magazines. Sponsors approximately 800 conferences annually. Total membership over 400,000 as of Dec 31, 2009. Spans the globe, with participation in 160 countries.
  2. Key features include personalization with up to 15 saved search profiles and improved search, including facets for faster resolution, type-ahead, and breadcrumbs to easily navigate and refine your search, plus institutional branding, not to mention improved reliability and stability.
  3. We knew there was “gold in them thar hills!” but how to unlock it? As a leading source of research materials, could we extract new directions? Are the societies living up to their charters and covering the topical areas they think they are? Are there trends that were just momentary? Are they still vigorously being investigated or were they just a flash in the pan? What other things might we learn? Introducing Dick Klavans.
  4. Access Innovations and its software brand Data Harmony are known for the high caliber of their data. It is clean, well formed and very accurately semantically enriched. They updated the IEEE thesaurus in 2005, building a rule base for use in indexing at the same time. At the launch of the project, the application of the terms to the IEEE journal content was 90% accurate (that is, 90% of the terms suggested are what well-trained indexers would use from a controlled vocabulary) and 80% accurate on the more difficult proceedings data. Since then the rule base has improved, and the IEEE production team only needs to spot check about 10% of the documents to ensure a high standard of indexing is maintained. It has allowed IEEE to process a lot more documents with the same team and made the process more fun at the same time. The indexers are allowed time to think about the content, the thesaurus terms, what should be added and what other information can be collected to continue to enrich the files, because the Data Harmony software removes many of the clerical aspects of the indexing process, leveraging the mental processing of the staff. The accuracy is high enough that we simply indexed the entire contents of the Xplore database back to the earliest records in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using the Medical Subject Headings and the Defense Technical Information Center thesauri, with similar accuracy results.
  5. What do I mean by accuracy? Here’s an example of an accurate disciplinary map.
  6. What do I mean by visual accuracy? Here’s an example of an accurate disciplinary map.
  7. What do I mean by visual accuracy? Here’s an example of an accurate disciplinary map.
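
Note 4 defines indexing accuracy as the share of suggested terms that well-trained indexers would also use. A minimal sketch of that calculation follows, assuming each record has a machine-suggested term list and an indexer-approved term list. The function name, record identifiers, and terms are hypothetical and are not taken from the IEEE or Data Harmony workflow.

```python
def suggestion_accuracy(suggested, approved):
    """Fraction of machine-suggested terms that an indexer also assigned.

    suggested and approved map a record id to an iterable of thesaurus terms.
    """
    total = 0
    hits = 0
    for record_id, terms in suggested.items():
        terms = set(terms)
        total += len(terms)
        hits += len(terms & set(approved.get(record_id, ())))
    return hits / total if total else 0.0


# Hypothetical spot-check sample: 10 suggested terms, 9 confirmed by an indexer.
suggested = {
    "rec-1": ["antennas", "radar", "arrays", "microstrip", "wave propagation"],
    "rec-2": ["transistors", "doping", "silicon", "leakage currents", "robots"],
}
approved = {
    "rec-1": ["antennas", "radar", "arrays", "microstrip", "wave propagation"],
    "rec-2": ["transistors", "doping", "silicon", "leakage currents"],
}

accuracy = suggestion_accuracy(suggested, approved)
```

This measures only whether suggested terms are acceptable. It does not count terms an indexer would have added that the rule base missed, so it should not be read as a measure of completeness.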