
II-SDV 2017: From KNIME to HighThroughPut Pipelining - from KNIME to HTPP


  1. 1. From KNIME 2 HTPP Transforming Prototypes into High Performance Boehringer Ingelheim Pharma GmbH & Co. KG Aleksandar Kapisoda II-SDV 2017 • April 24th 2017 • Nice, France
  2. 2. Talk in brief: 1. 2015 – Building Prototypes (background: KNIME & ChemCurator process) 2. 2016 – Transforming Prototypes into High Performance (motivation & goals, how) 3. BI UIMA Pipeline (location for input data, pipeline setup for process, processing & producing results) 4. Conclusions 5. Outlook 6. Acknowledgements
  3. 3. 2015 Building Prototypes
  4. 4. 2015 – Building Prototypes Background: 2015 - KNIME & ChemCurator process Patent Curation Matthias Negri • II-SDV Lessons Learned, weak points & limitations
  5. 5. 2016 Transforming Prototypes into High Performance
  6. 6. Motivations & Goals Why? • Customizable processing pipelines: about 10-20 different pipelines needed • Standardized processing pipeline & standardized results
  7. 7. Motivations & Goals Accomplishments: • removing complexity • processing more patents • improving speed & performance • improving quality
  8. 8. How? UIMA (Unstructured Information Management Architecture) • A general framework for information processing (factory approach) • Apache open source, modular, used by many groups, many modules already available • can process any kind of content (pictures, even rock music), e.g. used by IBM Watson
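
The factory approach on this slide can be pictured as a chain of independent components that each receive a shared document object, add to it, and pass it on. The following is a minimal plain-Python sketch of that idea only; it is not the Apache UIMA API, and the names `Document`, `normalize`, `make_dictionary_annotator`, `run_pipeline` and the two-entry dictionary are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Shared container handed from one component to the next (hypothetical)."""
    source: str
    text: str
    annotations: List[Dict[str, object]] = field(default_factory=list)


Component = Callable[[Document], Document]


def normalize(doc: Document) -> Document:
    # Collapse runs of whitespace so that later matching is simpler.
    doc.text = " ".join(doc.text.split())
    return doc


def make_dictionary_annotator(dictionary: Dict[str, str]) -> Component:
    """Build an annotator from a name -> preferred-name dictionary."""
    def annotate(doc: Document) -> Document:
        lowered = doc.text.lower()
        for name, pref_name in dictionary.items():
            # Naive substring counting; a real annotator would respect
            # token boundaries.
            count = lowered.count(name.lower())
            if count:
                doc.annotations.append(
                    {"name": name, "prefName": pref_name, "mentionCount": count}
                )
        return doc
    return annotate


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    # Apply each component in order, like stations on a factory line.
    for component in components:
        doc = component(doc)
    return doc


doc = Document("example.txt", "PPh3 was added.\n  Then   PPh3 and acetone were removed.")
annotator = make_dictionary_annotator({"PPh3": "Triphenylphosphine", "acetone": "acetone"})
result = run_pipeline(doc, [normalize, annotator])
```

Because every component has the same signature, a different pipeline is just a different list, which is one way to read the "10-20 different pipelines" requirement.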
  9. 9. BI UIMA Pipeline Location for Input Data
  10. 10. BI UIMA Pipeline Manager Location for Input Data From a prototype, “personal” KNIME process towards a cloud based UIMA pipeline Step 1: Cloud based sFTP location for input data and processed results:
  11. 11. BI UIMA Pipeline Pipeline Setup for Process
  12. 12. Pipeline Setup Step 2: Pipeline Setup for Process
  13. 13. Pipeline Setup Reader Components Reader can read XML, PDF, TXT, HTML and Office files
  14. 14. Pipeline Setup Disambiguation Components • Preparation and normalization of text Reader can read XML, PDF, TXT, HTML and Office files
  15. 15. Pipeline Setup Ontology Components • Select suitable annotator from > 40 ontologies & dictionaries
  16. 16. Pipeline Setup Chemistry Components • Chemistry is complex
  17. 17. Pipeline Setup Cleaning and Processing Clean-up and processing procedures…
  18. 18. Pipeline Setup UIMA Consumers
  19. 19. Pipeline Setup Extraction of more Knowledge Clean-up and processing procedures…
  20. 20. Pipeline Setup Overview and management of parallel processes
  21. 21. Pipeline Setup Running Processes Manager Cloud based performance • depends on task & computer • from 1 PDF patent / sec • up to 60 XML patents / sec • distributed processing possible • multi user
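
To put the throughput figures on this slide into perspective, the arithmetic can be sketched as follows. The batch size of 100,000 patents is a hypothetical value chosen for illustration, and the `workers` parameter assumes ideal linear scaling for distributed processing, which the slides do not claim.

```python
def estimate_hours(n_documents: int, docs_per_second: float, workers: int = 1) -> float:
    """Estimate wall-clock hours for a batch at a given throughput.

    Assumes constant throughput and (if workers > 1) ideal linear scaling.
    """
    if docs_per_second <= 0 or workers < 1:
        raise ValueError("docs_per_second must be > 0 and workers >= 1")
    seconds = n_documents / (docs_per_second * workers)
    return seconds / 3600.0


BATCH = 100_000  # hypothetical batch size

pdf_hours = estimate_hours(BATCH, 1.0)    # slide: from 1 PDF patent / sec
xml_hours = estimate_hours(BATCH, 60.0)   # slide: up to 60 XML patents / sec

print(f"PDF: {pdf_hours:.1f} h, XML: {xml_hours * 60:.1f} min")
```

Under these assumptions the same batch takes about 27.8 hours as PDF and about 27.8 minutes as XML, which shows how strongly the input format drives the run time.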
  22. 22. BI UIMA Pipeline Processing & Producing Results
  23. 23. Processing & Producing Results Step 3: Processing & Producing Results
  24. 24. Processing & Producing Results .csv format

      #name                  | prefName                | nameSource    | domain    | source        | sourceSection | confidence | mentionCount
      ethylacetate           | acetic acid ethyl ester | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.6        | 8
      Pd(OAc)2               |                         | chemFormula   | inorgmat  | US5428149.pdf | Body          | 0.5        | 8
      oligonucleotides       | Oligonucleotide         | dict          | polymers  | US5428149.pdf | Body          | 0.76       | 8
      carbon                 | carbon                  | dict          | chem      | US5428149.pdf | Body          | 0.65       | 7
      nucleotides            | nucleotides             | dict          | chem      | US5428149.pdf | Body          | 0.76       | 7
      tin                    |                         | n2sOpsin      | chem      | US5428149.pdf | Body          | 0.07       | 7
      2'-deoxyuridine        | 2'-Deoxyuridine         | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.76       | 6
      PPh3                   | Triphenylphosphine      | dict          | chem      | US5428149.pdf | Body          | 0.73       | 6
      triphos phates         | triphosphate group      | dict          | chemGroup | US5428149.pdf | Body          | 0.55       | 6
      disulfide              | dihydrogen disulfide    | dict          | chem      | US5428149.pdf | Body          | 0.6        | 5
      CARBON-CARBON          | carbon carbon           | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.6        | 5
      alkenyl                | alkenyl group           | dict          | chemGroup | US5428149.pdf | Body          | 0.74       | 5
      Organometallics        | organometallic group    | dict          | chemGroup | US5428149.pdf | Body          | 0.71       | 5
      Acetone                | acetone                 | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.68       | 4
      acetonitrile           | acetonitrile            | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.6        | 4
      Tributyltinhydride     | Tributyltin hydride     | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.73       | 4
      triazole               | 1,2,3-triazole          | dict          | chem      | US5428149.pdf | Body          | 0.6        | 4
      ribose                 | D-ribofuranose          | dict          | chem      | US5428149.pdf | Body          | 0.71       | 4
      vinyl                  | Vinyl radical           | dict;n2sOpsin | chem      | US5428149.pdf | Body          | 0.63       | 4
      pyrimidine nucleosides | pyrimidine nucleosides  | dict          | chem      | US5428149.pdf | Body          | 0.73       | 4
      aryl                   | aryl group              | dict          | chemGroup | US5428149.pdf | Body          | 0.70       | 4
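
The result rows on this slide lend themselves to simple downstream filtering, for example by confidence. The sketch below assumes the file is tab-separated (an assumption: values such as `dict;n2sOpsin` and `1,2,3-triazole` rule out semicolons and commas as safe delimiters, but the slide does not name the separator). The helper names `load_mentions` and `filter_by_confidence` are hypothetical, and the sample holds only four of the rows shown on the slide.

```python
import csv
import io
from typing import Dict, List

# Four rows copied from the slide; the tab delimiter is an assumption.
SAMPLE = (
    "#name\tprefName\tnameSource\tdomain\tsource\tsourceSection\tconfidence\tmentionCount\n"
    "ethylacetate\tacetic acid ethyl ester\tdict;n2sOpsin\tchem\tUS5428149.pdf\tBody\t0.6\t8\n"
    "oligonucleotides\tOligonucleotide\tdict\tpolymers\tUS5428149.pdf\tBody\t0.76\t8\n"
    "tin\t\tn2sOpsin\tchem\tUS5428149.pdf\tBody\t0.07\t7\n"
    "PPh3\tTriphenylphosphine\tdict\tchem\tUS5428149.pdf\tBody\t0.73\t6\n"
)


def load_mentions(text: str) -> List[Dict[str, object]]:
    """Parse the result table and convert the numeric columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "name": row["#name"],
            "prefName": row["prefName"],
            "domain": row["domain"],
            "confidence": float(row["confidence"]),
            "mentionCount": int(row["mentionCount"]),
        })
    return rows


def filter_by_confidence(rows: List[Dict[str, object]], threshold: float) -> List[Dict[str, object]]:
    # Keep only mentions at or above the confidence threshold.
    return [r for r in rows if r["confidence"] >= threshold]


mentions = load_mentions(SAMPLE)
confident = filter_by_confidence(mentions, 0.7)
print([m["name"] for m in confident])
```

A threshold like this would drop low-confidence hits such as "tin" (0.07) while keeping dictionary-backed names such as PPh3.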
  25. 25. Processing & Producing Results Extracted compounds (SMILES format)
  26. 26. Processing & Producing Results XML format with inline annotation
  27. 27. Processing & Producing Results Creating Lucene index for search engines
  28. 28. Processing & Producing Results GUI: BI Miner • ontology based semantic searching: e.g. “Steroids” (and all children)
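
Searching for "Steroids" and all children amounts to expanding the query term to every descendant in the ontology before matching. The following is a minimal sketch of that expansion, not BI Miner's implementation; the three-level toy hierarchy, the sample documents, and the function names `expand_term` and `semantic_search` are hypothetical.

```python
from typing import Dict, List, Set

# Toy parent -> children hierarchy, for illustration only.
TOY_ONTOLOGY: Dict[str, List[str]] = {
    "Steroids": ["Corticosteroids", "Sex steroids"],
    "Corticosteroids": ["Cortisol"],
    "Sex steroids": ["Testosterone", "Estradiol"],
}


def expand_term(ontology: Dict[str, List[str]], term: str) -> Set[str]:
    """Return the term together with all of its descendants."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles or shared children
        seen.add(current)
        stack.extend(ontology.get(current, []))
    return seen


def semantic_search(documents: Dict[str, str], ontology: Dict[str, List[str]], term: str) -> List[str]:
    """Return ids of documents mentioning the term or any descendant."""
    terms = {t.lower() for t in expand_term(ontology, term)}
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(t in text.lower() for t in terms)
    )


DOCS = {
    "doc1": "Cortisol levels were measured.",
    "doc2": "Acetone was used as solvent.",
    "doc3": "Estradiol derivatives were prepared.",
}
hits = semantic_search(DOCS, TOY_ONTOLOGY, "Steroids")
print(hits)
```

Neither matching document contains the word "steroid" itself; both are found only because the query was expanded through the hierarchy.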
  29. 29. Conclusions
  30. 30. Excerpts from our mission (Leitbild) statement We are dedicated to serving people – through researching diseases and developing new medications and treatment approaches. In order to achieve our goals, we need to be both financially successful and open to new ideas and developments. Research and development are of central importance to our future success. We concentrate our efforts on diseases that are currently not able to be treated satisfactorily. As an employer, we attract the best minds and promote diversity in the workplace. Our organization is characterized by openness, innovation, collaboration, and mutual respect. Corporate Standard Presentation 2017 – short version Performance/Quantity vs. Quality
  31. 31. Conclusions - Performance/Quantity vs. Quality • Dialectical materialism (based on Hegel's principle of negation): the law of the transformation of quantity into quality and vice versa. For our purpose, we could express this as: qualitative changes can only occur through the quantitative addition or subtraction of matter or motion. • We did not run into the quantity/quality issue: we could process more patents and extract more information with better quality. https://en.wikipedia.org/wiki/Dialectical_materialism https://en.wikipedia.org/wiki/Georg_Wilhelm_Friedrich_Hegel
  32. 32. Outlook
  33. 33. Outlook • Process improvement – Optimization of more processes – Transforming more prototypes into high-performance pipelines • Graphs & visualizations – Implementing graphics and visualization for knowledge workers & data scientists • Using the BI Pipeline Manager for data & content merging – Joining and merging different data sources – Collaboration: Boehringer Ingelheim, Deep Search 9 & OntoChem
  34. 34. Acknowledgements
  35. 35. Acknowledgements • Matthias Negri (former postdoc at Boehringer Ingelheim) • Lutz Weber (development partner)
  36. 36. 1895 vs. 2017
  37. 37. 1895 Paul Otlet We don't live in that kind of world http://ww.mondotheque.be/wiki/index.php/Here
  38. 38. 2017 Aleksandar Kapisoda We do live in that kind of world
  39. 39. aleksandar.kapisoda@boehringer-ingelheim.com Contact Information
  40. 40. Thank You Questions?
