
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others.
TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.
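
As a minimal sketch of how precision and recall against a gold standard can be computed, the snippet below compares a set of extracted technology names with a set of annotated ones. The function name `precision_recall` and the example data are assumptions made for illustration; they are not taken from TechMiner or from its gold standard.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against gold annotations."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data, for illustration only.
gold = {"DBpedia", "WordNet", "YAGO", "GeoNames"}
extracted = {"DBpedia", "WordNet", "RDF"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three extracted names are in the gold set, and two of the four gold names were found, so an extractor can score differently on the two measures.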

Published in: Science

  1. 1. TechMiner: Extracting Technologies from Academic Publications. Francesco Osborne, Helene de Ribaupierre, Enrico Motta. KMi, The Open University, United Kingdom. EKAW 2016.
  2. 2. Osborne, F., Motta, E. and Mulholland, P. Exploring scholarly data with Rexplore. International Semantic Web Conference 2013. technologies.kmi.open.ac.uk/rexplore/
  3. 3. Semantically Enhanced Scholarly Data. Most scholarly datasets capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. We still lack comprehensive information about the content of research papers, which is often represented simply as a collection of keywords or categories from a taxonomy. Hence, researchers are working on extracting other kinds of entities, including: – Genes – Chemical components – Epistemological concepts (e.g., hypothesis, motivation, experiments)
  4. 4. What about technologies? • Technologies such as applications, systems, languages and formats are an essential part of the Computer Science ecosystem. • Current knowledge bases cover only a small part of the technologies presented in the literature. • Identifying semantic relationships between technologies and other research entities allows: – Richer semantic search; – Monitoring the emergence and impact of new technologies, both within and across scientific fields; – Studying the scholarly dynamics associated with the emergence of new technologies; – Supporting companies in the field of innovation brokering and initiatives for encouraging software citations across disciplines, e.g. the FORCE11 Software Citation Working Group.
  5. 5. TechMiner. TechMiner (TM) is a new approach that combines NLP, machine learning and semantic technologies to mine technologies (applications, systems, languages and formats) from research publications. It generates an OWL ontology describing technologies and their relationships with other research entities. We evaluated TM on a manually annotated gold standard and found that it significantly improves both precision and recall over alternative NLP approaches. – The proposed semantic features significantly improve both recall and precision.
  6. 6. Example – Technologies created by E. Motta
  7. 7. Example – Popular Knowledge Bases in the Semantic Web. [Chart: Knowledge Bases (WordNet, DBpedia, YAGO, GeoNames), 2002 to 2012]
  8. 8. TechMiner - Architecture
  9. 9. Evaluation – Gold Standard. We tested our approach on a gold standard (GS) of manually annotated publications in the field of the Semantic Web. We selected a number of publications tagged with keywords related to this field (e.g., ‘semantic web’, ‘linked data’, ‘RDF’) and asked a group of 8 Semantic Web experts to annotate these papers with their technologies. The resulting GS includes 548 publications, each annotated by at least two experts, and 539 technologies.
  10. 10. Evaluation. [Chart: Recall for NL, NLW, TMN, TM, TMN_E and TM_E]
  11. 11. Evaluation. [Chart: Precision for NL, NLW, TMN, TM, TMN_E and TM_E]
  12. 12. Future work • Extending the approach to identify other categories of scientific objects, such as datasets and algorithms. • Trying the approach on other research fields. • Building a pipeline that allows human experts to correct and manage the information extracted by TechMiner.
  13. 13. Helene de Ribaupierre, Enrico Motta, Francesco Osborne. Osborne, F., Ribaupierre, H., and Motta, E. (2016) TechMiner: Extracting Technologies from Academic Publications. EKAW 2016, Bologna, Italy. Email: francesco.osborne@open.ac.uk Twitter: FraOsborne Site: people.kmi.open.ac.uk/francesco http://oro.open.ac.uk/47332/1/EKAW2016_TM.pdf
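
The monitoring use case mentioned in the slides (tracking the emergence of technologies over time) could be sketched as a per-year count of technology mentions. This is only an illustration under stated assumptions: the `(year, technology)` pairs and the helper `yearly_counts` are hypothetical and do not reproduce TechMiner's data or code.

```python
from collections import Counter

# Hypothetical (publication year, technology) pairs, as if produced by an
# extraction step; the values are invented for illustration.
mentions = [
    (2008, "DBpedia"), (2008, "WordNet"), (2009, "DBpedia"),
    (2009, "DBpedia"), (2009, "YAGO"), (2010, "DBpedia"),
]


def yearly_counts(mentions, technology):
    """Count how many times a technology is mentioned in each year."""
    counts = Counter(year for year, tech in mentions if tech == technology)
    # Sort by year so the result reads as a time series.
    return dict(sorted(counts.items()))


dbpedia_trend = yearly_counts(mentions, "DBpedia")
print(dbpedia_trend)
```

A series like this is the kind of input a chart of technology popularity by year would be drawn from.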