Sources of Change in Modern Knowledge Organization Systems


  1. 1. SOURCES OF CHANGE IN MODERN KNOWLEDGE ORGANIZATION SYSTEMS
     Paul Groth (@pgroth), Disruptive Technology Director, Elsevier Labs (@elsevierlabs)
     February 2, 2016
     Contributions: Brad Allen, Michael Lauruhn
  2. 2. KNOWLEDGE ORGANIZATION IS IMPORTANT
  3. 3. https://www.elsevier.com/authors/author-schemas/elsevier-xml-dtds-and-transport-schemas
     • 548-page document
     • Defines the content structure of a document
     • “Developing a DTD alone is insufficient to allow an XML-based process; high-quality documentation helps in clarifying the interpretation of the tags and specifying the ways in which they are used”
  4. 4. Education
     • Elsevier Enterprise Content Model ontology
       • 40+ properties
       • 20 datatypes
       • 10 content types
       • 20 asset types
     • Adaptive Learning ontology
       • Recommendation
       • Teaching
       • Assessing
       • Remediation
     • SKOS ontology
       • 3 third-party vocabularies: QSEN, Bloom, etc.
     • QTI 2.1 compliant schema
     • XHTML5 schema
       • 50+ data-type attribute definitions
     • Student Learning Objective ontology
       • SKOS ontology extended with 2 properties
     • Multimedia assets, incl. Timed Text Markup Language
  5. 5. BIG KOS
  6. 6. ANSWERS ARE ABOUT THINGS, NOT JUST WORKS
     “Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated “things” without overindulging that metaphor.”
     Coyle, K. (2016). FRBR, before and after: a look at our bibliographical models. Chicago: ALA Editions.
  7. 7. KNOWLEDGE GRAPHS AND MACHINE READING TURN CONTENT INTO ANSWERS
     • Knowledge graphs are “graph structured knowledge bases (KBs) which store factual information in form of relationships between entities” (Nickel, M., Murphy, K., Tresp, V. and Gabrilovich, E. (2015). A review of relational machine learning for knowledge graphs. arXiv:1503.00759v3)
     • Knowledge graphs are metadata evolved beyond the focus on the work, linking people, concepts, things and events
     • Knowledge graphs organize data extracted from content through machine reading so that queries can provide answers
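The idea on slide 7, that a query over entity relationships can return an answer rather than a list of works, can be illustrated with a minimal sketch. It assumes a knowledge graph held as plain (subject, predicate, object) triples; the predicate names and the `describe` helper are hypothetical, and the facts are taken only from the Coyle citation on slide 6.

```python
from collections import defaultdict

# Facts drawn from the citation on slide 6; predicate names are illustrative.
TRIPLES = [
    ("Coyle, K.", "author_of", "FRBR, before and after"),
    ("FRBR, before and after", "published_by", "ALA Editions"),
    ("FRBR, before and after", "publication_year", "2016"),
    ("FRBR, before and after", "place_of_publication", "Chicago"),
]


def build_index(triples):
    """Index triples by subject so an entity's outgoing edges are easy to find."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def describe(index, entity, depth=1):
    """Collect facts reachable from `entity` by following up to `depth` hops."""
    facts = []
    frontier = [entity]
    seen = set()
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            for predicate, obj in index.get(node, []):
                facts.append((node, predicate, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return facts


INDEX = build_index(TRIPLES)
# A search on the author returns the author's work and facts about that work.
AUTHOR_FACTS = describe(INDEX, "Coyle, K.", depth=2)
```

A production graph would use resolved identifiers rather than strings as node keys; strings are used here only to keep the sketch short.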
  8. 8. ELSEVIER: KNOWLEDGE GRAPHS FOR RESEARCH
  9. 9. ELSEVIER: KNOWLEDGE GRAPHS FOR LIFE SCIENCES
     • Biological pathways extracted via semantic text mining: A upregulates B; B upregulates C; C increases disease D (A → B → C → D)
     • Normalizing vocabularies required: proteins, diseases, drugs, chemicals
     • Bioactivities through text analysis: IC50 6.3 nM, kinase binding assay, 10 mM concentration
     • Chemical structures and properties
     • Diagram labels: InChI, Name; NCBI, UniProt; EMTREE; ReaxysTree, Structures
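Slide 9 pairs pathway extraction with the need for normalizing vocabularies. The sketch below shows why the two go together: relations mined from text only chain into A → B → C → D once their surface forms are mapped to shared terms. The synonym table, the surface forms, and the function names are all hypothetical and stand in for real vocabularies.

```python
# Hypothetical synonym table mapping surface forms to normalized terms.
SYNONYMS = {
    "protein a": "A",
    "a": "A",
    "factor b": "B",
    "b": "B",
    "c": "C",
    "disease d": "D",
    "d": "D",
}

# Relations as they might come out of text mining, before normalization.
EXTRACTED = [
    ("Protein A", "upregulates", "b"),
    ("Factor B", "upregulates", "C"),
    ("c", "increases", "Disease D"),
]


def normalize(term):
    """Map a surface form to its normalized vocabulary term."""
    return SYNONYMS[term.strip().lower()]


def build_graph(relations):
    """Build an adjacency list keyed by normalized source term."""
    graph = {}
    for source, relation, target in relations:
        graph.setdefault(normalize(source), []).append((relation, normalize(target)))
    return graph


def find_path(graph, start, goal):
    """Depth-first search; returns the node list from start to goal, or None."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for _relation, target in graph.get(node, []):
            if target not in path:
                stack.append((target, path + [target]))
    return None


GRAPH = build_graph(EXTRACTED)
PATHWAY = find_path(GRAPH, "A", "D")
```

Without the `normalize` step, "b" and "Factor B" would be different nodes and no path from A to D would be found.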
  10. 10. ELSEVIER’S KNOWLEDGE PLATFORM
     • Layers: Products; Data & Content Sources; Knowledge Graphs; Platforms & Shared Services; Entity Hubs
     • Data & content sources: Usage logs, Pathways, EHRs, Articles, Authors, Institutions, Syllabi, Citations, Chemicals, Books, Drugs, Funders
     • Entity hubs: Funder Hub, Article Hub, Profile Hub, Journal Hub, Institution Hub
     • Knowledge graphs: Research, Healthcare, Life Sciences
     • Platforms & shared services: Content, Life Sciences, Search, Identity, Research
     • Products: Reaxys, CK, Sherpath, Scopus, SD, ROS
  11. 11. THE BATTLE FOR THE KNOWLEDGE GRAPH
     “I really believe that the key battleground in any industry is that of its knowledge graph. Google has it for media/advertising, Netflix has it for filmed entertainment, Uber has it for inner city transportation, Facebook has it across social media as well as messaging and the multiples speak for themselves.”
     Tony Askew, Founder/Partner at REV (personal communication, September 29, 2016)
  12. 12. CHANGE
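  13. 13. [Figure: a KOS (Concept1, Concept2, Concept3) and its sources of change: Society & Politics, Literature, Professional Curators, Non-professional contributors, Software, Data. The labels (1, 2), (3), (4, 5, 6) and (7, 8, 9) refer to the numbered sources of change on slides 14 to 20.]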
  14. 14. SOURCES OF CHANGE FOR KOS – CURRENT VIEW
     1. Dealing with changing cultural and societal norms, specifically to address or correct bias
     2. Political influence
     3. New concepts and terminology arising from discoveries or a change in perspective within a technical/scientific community
  15. 15. 4. GARDENING
     • Wikipedia categories: a 25% increase in the number of categories over the 2012-2014 period, versus a 12% increase in the number of articles. Likewise, the number of disambiguation pages increased by 13% (Bairi et al. 2015).
     • http://blog.schema.org/2015/11/schemaorg-whats-new.html
  16. 16. 5. INCREMENTAL CONTRIBUTORSHIP
     • Over 17,000 active users on Wikidata as of Feb 2017
  17. 17. 6. PROGRESSIVE FORMALIZATION
  18. 18. 7. SOFTWARE AGENTS
     • Pipeline labels: Content, Universal schema, Surface form relations, Structured relations, Factorization model, Matrix Construction, Open Information Extraction, Entity Resolution, Matrix Factorization, Knowledge graph, Curation, Predicted relations, Matrix Completion, Taxonomy, Triple Extraction
     • Volumes: 14M articles from ScienceDirect; 920K concepts from EMMeT; 3.3M facts; 475M facts; 49M facts
     • Matrix construction: p = 83, r = 176; an 83 x 176 sparse binary-valued matrix with 366 entries (axes: entity pairs; surface form relations and structured relations)
     • Matrix factorization: latent factor matrix × latent factor matrix = 83 x 176 real-valued matrix with 14,608 entries
     • Example surface form relations:
       • glaucoma | developed many years after | chronic inflammation of uveal tract
       • glaucoma | develop following | chronic inflammation of uveal tract
       • glaucoma | can appear soon in | family history of glaucoma
       • glaucoma | can appear soon in | age over 40
       • glaucoma | the risk of | functional visual field loss
       • glaucoma | contributing causes of | functional visual field loss
       • glaucoma | contributed to | functional visual field loss
       • glaucoma | is considered the second leading cause of | functional visual field loss
       • glaucoma | remains the second leading cause of | functional visual field loss
     • Example predicted relations, with type and identifier for each term:
       • glaucoma (diseases, 2791370) | have been documented to cause | contact dermatitis (diseases, 3815093)
       • glaucoma (diseases, 2791370) | is assessed through | evaluation (qualifier, 5415395)
       • glaucoma (diseases, 2791370) | progresses more rapidly than | primary open-angle glaucoma (diseases, 8247149)
       • glaucoma (diseases, 2791370) | recommend | treatment (procedures, 5216597)
       • glaucoma (diseases, 2791370) | supports the assumption that | oxidative stress (diseases, 8184588)
       • glaucoma (diseases, 2791370) | is the death of | retinal ganglion cells (anatomy, 8002088)
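The matrix construction, factorization and completion steps named on slide 18 can be sketched at toy scale. This is only an illustration under stated assumptions: it uses a plain logistic matrix factorization trained by stochastic gradient descent and treats every unobserved cell as a negative, because the slide does not specify the training objective. The row and column labels, the `causes` structured relation, the observed cells and all hyperparameters are hypothetical.

```python
import math
import random

# Rows are entity pairs, columns are relations (surface form and structured).
ENTITY_PAIRS = ["pair_0", "pair_1", "pair_2"]
RELATIONS = ["contributed to", "remains the second leading cause of",
             "developed many years after", "causes"]  # "causes" is hypothetical

# 1 = relation observed for the pair, 0 = not observed (illustrative values).
OBSERVED = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def factorize(matrix, rank=2, epochs=300, lr=0.1, reg=0.01, seed=0):
    """Learn one latent vector per row and per column by SGD on logistic loss."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_f = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    col_f = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                error = matrix[i][j] - sigmoid(dot(row_f[i], col_f[j]))
                for k in range(rank):
                    r, c = row_f[i][k], col_f[j][k]
                    row_f[i][k] += lr * (error * c - reg * r)
                    col_f[j][k] += lr * (error * r - reg * c)
    return row_f, col_f


def complete(row_f, col_f):
    """Matrix completion: a real-valued score for every (pair, relation) cell."""
    return [[sigmoid(dot(row, col)) for col in col_f] for row in row_f]


ROW_FACTORS, COL_FACTORS = factorize(OBSERVED)
COMPLETED = complete(ROW_FACTORS, COL_FACTORS)
```

The completed matrix is dense even though the observed one is sparse, which mirrors the slide's move from 366 observed entries to 14,608 real-valued ones. High scores in unobserved cells are the candidate predicted relations that would then go to curation.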
  19. 19. 8. INTEGRATION OF LARGE NUMBERS OF DATA SOURCES
     Groth, Paul, "The Knowledge-Remixing Bottleneck," IEEE Intelligent Systems, vol. 28, no. 5, pp. 44-48, Sept.-Oct. 2013. doi: 10.1109/MIS.2013.138
     • 10 different extractors
     • E.g., mapping-based infobox extractor
     • Infobox uses a hand-built ontology based on the 350 most commonly used English-language infoboxes
     • Integrates with YAGO
     • YAGO relies on Wikipedia + WordNet
     • Upper ontology from WordNet, then a mapping to Wikipedia categories based on frequencies
     • WordNet is built by psycholinguists
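The chain of dependencies on slide 19 can be written down as a small graph, which makes it possible to ask where a change in the integrated KOS might originate. This is a minimal sketch: the edges follow the slide's bullets, while the node name `integrated_kos` and both helper functions are hypothetical.

```python
# Each key depends on the sources listed as its value (edges follow slide 19).
UPSTREAM = {
    "integrated_kos": ["mapping-based infobox extractor", "YAGO"],  # hypothetical root name
    "mapping-based infobox extractor": ["hand-built infobox ontology"],
    "hand-built infobox ontology": ["English-language infoboxes"],
    "YAGO": ["Wikipedia", "WordNet"],
    "WordNet": ["psycholinguists"],
}


def upstream_sources(graph, node):
    """Return every direct or indirect upstream source of `node`."""
    found = []
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.append(current)
        stack.extend(graph.get(current, []))
    return found


def root_sources(graph, node):
    """Upstream sources with no recorded dependencies of their own."""
    return sorted(s for s in upstream_sources(graph, node) if s not in graph)


ALL_UPSTREAM = upstream_sources(UPSTREAM, "integrated_kos")
ROOTS = root_sources(UPSTREAM, "integrated_kos")
```

Any node in `ALL_UPSTREAM` is a place where an edit, a re-extraction, or an editorial decision can propagate into the integrated KOS.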
  20. 20. 9. TRAINING DATA
  21. 21. CONCLUSION AND A QUESTION
     • KOSs are important and are expanding in size
     • A focus on organizing information about entities, not just “content”
     • The construction and maintenance of massive KOSs brings new sources of change
     • Two new actors: software and non-professionals
     • How do we deal with these sources?
     • New biases, opaque systems
     • The role of a KOS observatory?
     • Empirical evidence for what to do

Editor's Notes

  • Use of open standards
  • 1700 active contributors
  • We don’t start with a full formal definition but formalize over time from usage
