OpenAIRE-COAR conference 2014: Argo - a platform for interoperable and customisable text analytics, by Sophia Ananiadou - University of Manchester

Presentation at the OpenAIRE-COAR Conference: "Open Access Movement to Reality: Putting the Pieces Together", Athens - May 21-22, 2014.
Argo: a platform for interoperable and customisable text analytics, by Sophia Ananiadou - School of Computer Science, Director, National Centre for Text Mining, University of Manchester

Published in: Technology, Education

Transcript

  • 1. Argo: a platform for interoperable and customisable text mining. Sophia Ananiadou, National Centre for Text Mining, School of Computer Science, The University of Manchester
  • 2. Overview • Sharing tools, resources and text mining workflows • Challenges • Interoperable infrastructure for processing and annotation. Slide footer: OpenAIRE-COAR Conference, Ananiadou
  • 3. NaCTeM • 1st publicly funded national text mining centre • Location: Manchester Institute of Biotechnology • Phase I - Biology (2004-2008) • Phase II - Biology, Medicine, Social Sciences (2008-2011) • Phase III – Biology, Medicine, Humanities, Social Sciences; Fully sustainable centre (2011- ) www.nactem.ac.uk
  • 4. Challenges: Language Technology • Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …, Chinese, Hindi, Urdu, Japanese, Korean, … • Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery, … • Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, … • Text Types: Newswire, Scientific Literature, Full papers/abstracts, Twitter, Patents, Clinical records, EMR, Textbooks, monographs, Online forums, … • Technology: Sentence Splitter, Paragraph Splitter, NP Chunkers, C-parser, D-parser, Semantic parser, NE recognizers, Relation recognizers, … • Diversity of Languages, Diversity of Contexts, Diversity of Applications • TM Workflows, TM Modules: Shared!
  • 5. Metadata • Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …, Chinese, Hindi, Urdu, Japanese, Korean, … • Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery, … • Language Technology, Linguistic Resources, Knowledge Resources • Resource-Rich • Big Data, Big Text, Cloud Computing, Crowd Sourcing, Big Ontology • Text Types: Newswire, Scientific Literature, Full papers/abstracts, Twitter, Patents, Clinical records, EMR, Textbooks, monographs, Online forums, … • Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, … • OPEN SCIENCE
  • 6. Requirements from TM infrastructure • Modularity of TM modules • Interoperability among TM modules and resources • Generic across different languages, domains, and text types – Adaptability
  • 7. Module Interoperability and Adaptability (diagram labels) • Module, Module • Resources: Dictionaries, Ontologies • Adaptation: Rule Writing, (Annotated) Text • Dependency Parser, POS Tagger, Named Entity • English, French, German, Japanese, Greek • Languages, Text Types, Domains • Interoperability and Adaptability in Resource-rich TM INFRASTRUCTURES!
  • 8. Example: extracting proteins, annotations • Corpora: GENIA, PennBioIE, AIMed, GENETAG • Incompatibility: type definitions, texts • Problem: inconsistency
  • 9. The problem with incompatibility • Difficult to evaluate NERs • Which NER is best for my task? • Diagram: Corpus C, Corpus D; NER A, NER B • A: 93%, B: 36%: A is better than B • A: 63%, B: 90%: B is better than A • Why are results so different across corpora and NERs?
  • 10. Text mining workflows • A pipeline that executes particular tools and resources in order • Example: semantic search • Various versions (language- or domain-specific) of basic components needed for different applications and tasks • Different workflows can be created, compared and evaluated by the ability to seamlessly “mix and match” various versions of components • Components shown: PoS Tagger, Dictionary Lookup, NE Extraction, Chunking, Parsing, Semantic Query
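
A minimal Python sketch of the pipeline idea on slide 10, assuming each component is a plain function that takes a document and returns it with extra annotations. The function names, the toy lexicon and the dictionary-based document are illustrative assumptions, not Argo or U-Compare APIs (both build on UIMA).

```python
from typing import Any, Callable, Dict, List

# Hypothetical document model: raw text plus annotation layers added by components.
Document = Dict[str, Any]
Component = Callable[[Document], Document]


def tokenizer(doc: Document) -> Document:
    """Toy whitespace tokenizer standing in for a real tokenizer or PoS tagger."""
    out = dict(doc)
    out["tokens"] = out["text"].split()
    return out


def dictionary_lookup(doc: Document) -> Document:
    """Toy dictionary-based entity lookup; the lexicon is made up for this example."""
    lexicon = {"aspirin": "Chemical", "p53": "Protein"}
    out = dict(doc)
    out["entities"] = [
        (tok, lexicon[tok.lower()]) for tok in out["tokens"] if tok.lower() in lexicon
    ]
    return out


def run_workflow(components: List[Component], doc: Document) -> Document:
    """Execute the components in order, each one seeing the previous one's output."""
    for component in components:
        doc = component(doc)
    return doc


result = run_workflow([tokenizer, dictionary_lookup], {"text": "Aspirin inhibits p53"})
print(result["entities"])
```

Swapping `tokenizer` for a language- or domain-specific variant changes only the list passed to `run_workflow`, which is the "mix and match" property the slide describes.
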
  • 11. Text mining workflows • Interoperability • Common Data Representation and Types • Reference: Kano, Y., Miwa, M., Cohen, K. B., Hunter, L., Ananiadou, S. and Tsujii, J. (2011) U-Compare: a modular NLP workflow construction and evaluation system. IBM Journal of Research and Development.
  • 12. Common Type System • A common type system is required for complete interoperability • A single common type is almost impossible to impose on all developers • Solution: maintain local type systems and bridge them via a sharable type system • Diagram: Local Type System A and Local Type System B are each bridged to the U-Compare Sharable Type System
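
The bridging idea on slide 12 can be sketched as a mapping from each local type name to a shared one. This is only an illustration: the type names below are invented for the example and are not the actual U-Compare type system, and real bridging may also need to convert features, not just rename types.

```python
from typing import Dict, List

Annotation = Dict[str, object]

# Hypothetical local-to-shared mappings; all names are assumptions.
LOCAL_A_TO_SHARED = {"GeneOrProtein": "Protein", "Tok": "Token"}
LOCAL_B_TO_SHARED = {"protein_mention": "Protein", "word": "Token"}


def to_shared(annotations: List[Annotation], mapping: Dict[str, str]) -> List[Annotation]:
    """Rename local annotation types to shared ones, dropping types with no bridge."""
    bridged = []
    for ann in annotations:
        shared_type = mapping.get(str(ann["type"]))
        if shared_type is not None:
            bridged.append({"begin": ann["begin"], "end": ann["end"], "type": shared_type})
    return bridged


from_a = to_shared([{"begin": 0, "end": 3, "type": "GeneOrProtein"}], LOCAL_A_TO_SHARED)
from_b = to_shared([{"begin": 0, "end": 3, "type": "protein_mention"}], LOCAL_B_TO_SHARED)
print(from_a == from_b)
```

Once both outputs are expressed in the shared types, annotations from tools built on different local type systems can be compared directly.
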
  • 13. U-Compare Type System • Syntactic Level • Document Level • Semantic Level
  • 14. U-Compare: Evaluate and Compare TM Workflows • Library: Sentence Splitter A, Sentence Splitter B, POS tagger A, POS tagger B, NER • Workflow A, Workflow B and Workflow C combine different sentence splitters and POS taggers with the NER, giving F-Score A, F-Score B and F-Score C • Sentence detectors (SD): UIMA SD, OpenNLP SD, GENIA SD • Tokenizers: UIMA Tokenizer, OpenNLP Tokenizer, GENIA Tagger as Tokenizer • POS taggers: GENIA Tagger, Stepp Tagger, OpenNLP Tagger • NER: ABNER, MedT-NER, GENIA Tagger as NER
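
As a rough illustration of how many candidate workflows the components named on slide 14 could yield, the sketch below enumerates every combination of one component per stage. The component names come from the slide; the stage labels, and the assumption that every combination is compatible, are mine.

```python
from itertools import product
from typing import Dict, List

# Component names as listed on the slide; stage keys are illustrative labels.
STAGES: Dict[str, List[str]] = {
    "sentence_detector": ["UIMA SD", "OpenNLP SD", "GENIA SD"],
    "tokenizer": ["UIMA Tokenizer", "OpenNLP Tokenizer", "GENIA Tagger as Tokenizer"],
    "pos_tagger": ["GENIA Tagger", "Stepp Tagger", "OpenNLP Tagger"],
    "ner": ["ABNER", "MedT-NER", "GENIA Tagger as NER"],
}


def enumerate_workflows(stages: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return one workflow (stage -> component) per combination of components."""
    names = list(stages)
    return [dict(zip(names, combo)) for combo in product(*(stages[n] for n in names))]


workflows = enumerate_workflows(STAGES)
print(len(workflows))
print(workflows[0])
```

Each enumerated workflow would then be run on the same corpus and scored, so that the F-scores of the branches can be compared side by side.
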
  • 15. • Web-based application • Interactive creation of workflows • Cloud and high-performance computing • Integrated TM/NLP processing system • GUI for workflow creation • Library of ready-to-use processing components • Statistics, visualizations, developer APIs • Supports UIMA • http://argo.nactem.ac.uk • Reference: Rak, R., Rowley, A., Black, W.J. and Ananiadou, S. (2012) Argo: an integrative, interactive, text mining-based workbench supporting curation. Database: The Journal of Biological Databases and Curation.
  • 16. Structured Data • Remote Processing • Workflow Diagramming • Workflow Designer • Manual Editing • Annotator/Curator • Processing Components • Developers • UIMA Compliance
  • 17. Processing Components • Approaching 100 components (U-Compare) – Additional 50 will be added soon • META-NET • Developed or co-developed by NaCTeM – Planned: Make the library open to others to contribute • Generic Listener component – Developers can plug in their own locally run UIMA component to a workflow in Argo
  • 18. Remote Processing • Single machine execution – In-house high-performance machines • Distributed processing – HTCondor – VMware vCloud (EBI) EUPMC – Planned: EC2, Azure, …
  • 19. Workflows • Users create workflows as block diagrams • Workflows can be shared among users – Read only – Planned: Read & write – Planned: downloadable workflows • Workflows can be deployed as web services – Plain text (input only), XMI, RDF, BioC
  • 20. Workflows view
  • 21. Workflow Editor
  • 22. Sample Use Cases • 1. Recognition of chemical entities (chemical NER) • 2. Semi-automatic curation of metabolic pathways • 3. Evaluation of inter-annotator agreement • 4. Information extraction as a Web service
  • 23. Use Case 1: Chemical NER • Supplies gold standard corpus • Removes gold annotations so that they can be created automatically • Combinations of syntactic and semantic components create annotations • Compares and reports precision, recall and F1 of the different branches against the gold standard corpus
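
The comparison step on slide 23 can be sketched as follows, assuming annotations are (begin, end, type) triples and that a prediction counts as correct only on an exact match with a gold annotation. The matching criterion and the toy spans are assumptions for illustration; the slides do not specify how Argo's evaluation component matches annotations.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (begin offset, end offset, entity type)


def precision_recall_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Score predicted annotations against gold ones using exact span and type match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: three gold mentions, four predictions, two of them correct.
gold = [(0, 7, "Chemical"), (20, 27, "Chemical"), (40, 44, "Chemical")]
predicted = [(0, 7, "Chemical"), (20, 27, "Chemical"), (50, 55, "Chemical"), (60, 66, "Chemical")]

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Running the same function on the output of each workflow branch gives the per-branch figures that the slide says are reported.
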
  • 24. Chemical Entity Recogniser • Chemical model evaluated at BioCreative IV CHEMDNER challenge • The challenge – Data: 10,000 manually annotated PubMed abstracts – Task: automatically recognise names of chemical entities in text
  • 25. Chemical Entity Recogniser • Our solution – Ranked unique mentions: ranked 1st out of 18 groups – All mentions: ranked 3rd out of 19 groups • Results by subtask – Ranked unique mentions: precision 91%, recall 85%, F-score 88% – All mentions: precision 93%, recall 81%, F-score 87%
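
As a quick consistency check on the figures from slide 25, the F-score can be recomputed as the harmonic mean of precision and recall. This assumes the reported F-score is the balanced F1 measure, which the slide does not state explicitly.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ranked_unique = f_score(91, 85)  # about 87.9, reported as 88
all_mentions = f_score(93, 81)   # about 86.6, reported as 87
print(round(ranked_unique), round(all_mentions))
```

Both recomputed values round to the percentages given on the slide.
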
  • 26. Use Case 2: Semi-automatic Curation – Metabolic Pathways • Search for relevant documents • Manual correction of automatic annotations • NER for chemicals, genes, process indicators • Linking to ontologies: CTD, ChEBI, UniProt • Save results in various formats, e.g., RDF for querying and incorporation into databases
  • 27. Manual Annotation Editor • Create new annotations by selecting text • Create, modify or delete annotations • Edit details of annotations • Open a graphical interface to link annotations to ontologies
  • 28. Filtering and converting annotations
  • 29. Manual Annotation Editor: linking to ontologies • Automatic pre-selection can be modified by the user • Details show ontology entry webpage
  • 30. Use Case 3: Information extraction as a Web service • Web service-enabled reader • Web service-enabled writer
  • 31. Language Universal • Reusable modules • Generic TM modules: Competence • Annotated Text, corpora: Performance • Standards of Data Representation and Types for Resources: Competence • Dictionaries, Thesauri, Ontologies: Performance
