OpenMinTeD: Its Uses and Benefits for the Social Sciences

Presented at the ITOC workshop in Philadelphia, 20 February 2016.
Covers the uses and benefits of OpenMinTeD for the social sciences research community.
By GESIS - Leibniz Institute for the Social Sciences

Published in: Data & Analytics

  1. Open Mining Infrastructure for Text & Data (OpenMinTeD)
     Peter Mutschke, ITOC Workshop, Philadelphia, February 20, 2016
     twitter.com/openminted_eu
  2. Goal of Text Mining
     Implementation of transformational processes that uncover knowledge in unstructured text:
       • salient content items
       • hidden relationships between content items
     to assist researchers and scientific data curators in making sense of the textual data.
  3. The phases of text mining (taken from ICT2015 presentation, N. Manola)
     [Figure: four stages (Stage 1 to Stage 4) labelled with NLP Analysis, Entity Recognition, Data Mining, Knowledge Discovery, Information Extraction and Information Retrieval]
     OPENMINTED - The Open Mining Infrastructure for Text and Data
  4. Challenges
     Text Mining (TM):
       • remains a fragmented set of tools
       • requires particular technological and analytical skills as well as domain knowledge
       • no shared knowledge of how to apply it
       • lack of a central infrastructure (may rule out use of TM for small research groups)
       • high entry costs: need to share infrastructure costs
  5. Putting it all together: OpenMinTeD
     Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
  6. OpenMinTeD: working on many fronts (taken from ICT2015 presentation, N. Manola)
       • Accessible content: via standardised programmatic interfaces and access rules
       • Discoverable services: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
       • Efficient processing: operate on public e-Infrastructures via standardised APIs
       • TDM communities: different scientific communities have different challenges
       • Value-added apps: community-driven applications to illustrate the value of the infrastructure; engage with industry
  7. Bridging the gap between different communities
  8. The project (taken from ICT2015 presentation, N. Manola)
     Start: June 2015. Duration: 3 years.
     16 partners:
       • 6 mining research groups
       • 3 content providers
       • 1 data center
       • 1 library association
       • 2 legal experts
       • 6 community-related partners
       • 2 SMEs
     Partners: Athena RIC, Univ. of Manchester (NaCTeM), Univ. of Darmstadt, INRA, EMBL-EBI, Agro-Know, LIBER, Univ. of Amsterdam, Open University UK, EPFL, CNIO, Univ. of Sheffield (GATE), GESIS, GRNET, Frontiers, Univ. of Stirling
  9. OpenMinTeD users
       • TM consumers, to advance their science
       • Service providers, to enhance their tools
       • TM researchers, to share their algorithms
       • Content providers, to enrich their content
  10. Infrastructural approach
       • OpenMinTeD does not build new services, but adopts and adapts existing services for new communities
       • Focuses on interoperability across text mining services and content providers
       • Creates an open and collaborative space for researchers to use the best-fitting text mining services available
  11. The architecture (taken from ICT2015 presentation, N. Manola)
      [Figure: layered architecture diagram]
       • Users: researchers, curators, text-miners and new services developers
       • Layer 1: Interoperability of text mining services (platforms or components)
       • Layer 2: Interoperability of language resources & corpora
       • Layer 3: Interoperability to shared storage and computing resources
       • Other elements shown: Platform services; Registry; Workflow Management; Auth2 & Policy management; Annotator; Accounting; Mining Platforms; Proprietary architectures; Language resources and corpora registry service; Language resources; Publisher text corpus; OpenAIRE/CORE text corpus; PMC text corpus; Other text corpora; Data centre; Data centre in public cloud
  12. Bottom-up approach (taken from ICT2015 presentation, N. Manola)
      OpenMinTeD works with 4 use cases, which give their requirements and evaluate the results:
       • Research Analytics
       • Social Sciences
       • Agriculture
       • Life Sciences
  13. Science-driven approach
  14. GESIS: Infrastructure for the Social Sciences
  15. GESIS Research Data Cycle
      [Figure: cycle diagram with the labels Study planning, Archiving and registering, Searching, Data collection, Data analysis]
  16. Difficulties in Information Seeking
  17. Problems Processing Search Results
  18. Usefulness of TM-enhanced search services
  19. Social Science use case
      Develop and evaluate methods for automatic detection and linking of named entities in Social Science publications in order to advance reliable and context-sensitive retrieval and linking of relevant entities.
  20. Enhancing Search in Text and Data
       • classical named entity recognition and disambiguation of relevant entities (names, places, organizations, terms) to enhance automatic indexing
       • recognition of vague variable mentions to enhance linking of data and publications
       • enrich data with context information from text to enhance retrievability of data sets
  21. Identifying references to survey variables
      [Figure: a publication linked, via a link database, to survey variables]
       • Publication: OLGA NEŠPOROVÁ, ZDENĚK R. NEŠPOR (2009). “Religion: An Unsolved Problem for the Modern Czech Nation”
       • Survey: ISSP 2008
       • Variables: v39: Believe in life after death; v40: Believe in Heaven
  22. Benefits from user perspective
       • semantic search: understanding the contextual meaning of (search) terms
       • fuzzy phrase search: search for attitudes, survey questions in texts (under vagueness)
       • link retrieval: search and retrieval of links between text and data
       • dataset retrieval: facilitating search for research data in data catalogues at the level of items and variables
  23. Contact us
       • www.openminted.eu
       • peter.mutschke@gesis.org
       • twitter.com/openminted_eu
       • facebook.com/openminted
       • bit.do/openmintedlinked in
       • vimeo.com/openminted
       • bit.do/openmintedplus
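
Slides 20 to 22 describe recognising vague mentions of survey variables in publications and linking them to the data. The slides do not say how this is done, so the following is only a minimal sketch of one possible approach: approximate string matching of variable labels against sentence windows, using Python's standard difflib module. The variable labels come from slide 21; the function names, the threshold of 0.8 and the example sentence are assumptions made for illustration and are not part of OpenMinTeD or GESIS tooling.

```python
from difflib import SequenceMatcher

# Variable labels as they appear on slide 21 (ISSP 2008).
VARIABLES = {
    "v39": "Believe in life after death",
    "v40": "Believe in Heaven",
}

PUNCT = ".,;:!?()\"'"


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (word.strip(PUNCT).lower() for word in text.split())
    return [word for word in stripped if word]


def best_window_score(label, sentence):
    """Highest character-level similarity between the label and any
    window of the sentence that has as many tokens as the label."""
    label_tokens = tokens(label)
    sentence_tokens = tokens(sentence)
    target = " ".join(label_tokens)
    width = len(label_tokens)
    last_start = max(len(sentence_tokens) - width, 0)
    best = 0.0
    for start in range(last_start + 1):
        window = " ".join(sentence_tokens[start:start + width])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return best


def link_variables(sentence, threshold=0.8):
    """Return ids of variables whose label approximately occurs in the sentence.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return [
        var_id
        for var_id, label in VARIABLES.items()
        if best_window_score(label, sentence) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical sentence, not taken from the cited publication.
    print(link_variables("Only a minority of respondents believed in life after death."))
```

With the assumed threshold, the hypothetical sentence is linked to v39 only: "believed in life after death" differs from the label by a single character, while the closest window for v40 ("believed in life") scores noticeably lower. A much lower threshold would start to admit that second match, which illustrates why matching "under vagueness" needs careful evaluation.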