
IC-SDV 2019: OntoChem



OntoChem IT Solutions GmbH was founded in 2015 as a purely IT-oriented offshoot of OntoChem GmbH. We had many years of experience even before then, and it has always been our mission to provide added value to our customers by helping them navigate today’s complex information world: developing cognitive computing solutions, indexing intranet and internet data, and applying semantic search solutions for pharmaceutical, material sciences and technology-driven businesses.
We strive to support our customers with the most useful tools for knowledge discovery possible, encompassing up-to-date data sources, optimized ontologies and high-throughput semantic document processing and annotation techniques.

We create new knowledge from structured and unstructured data by extracting relationships, thereby exploiting the full potential of full-text documents and databases, while also scanning social media and news flows and analyzing web pages.

We aim at an unprecedented machine understanding of text, followed by knowledge extraction and inference. Applying our methods to chemical compounds and their properties supports our customers in generating intellectual property and in using these compounds as novel therapeutics, agrochemical products, nutraceuticals, cosmetics and novel materials.
It's our mission to provide added value to customers by:
developing and applying cognitive computing solutions
creating intranet and internet data indexing and semantic search solutions
providing Big Data analytics for technology-driven businesses
supporting product development and surveillance.

We deliver useful tools for knowledge discovery for:
creating background knowledge ontologies
high-throughput semantic document processing and annotation
knowledge mining by extracting relationships
exploiting the full potential of full-text documents and databases, while also scanning social media and news flows and analyzing web pages.



  1. Nizza, April 8. SciWalker: integrating open access & private sources
  2. We support our customers by ... Ontologies: we understand life and material sciences. Data streams: normalized patents, scientific articles (open access and behind paywalls), news etc. Software & analytics tools: indexing, extracting compound properties, combining internal & external sources (documents, databases, linked data).
  3. Compound Registration. OC|processor: chemistry annotator, dictionary. Annotated text example: "… composition comprising a neonicotinoid such as imidacloprid <190000000809> and a …". Structure sources: formula & molpuzzler, name-2-structure, image-2-structure, class/group. WebAPI: is it known? If no: register. Deliver OCID and data (190000000809 for [O-][N+](=O)NC1=NCCN1CC1=CC=C(Cl)N=C1). Downstream: classify, compound store, other applications. • provides chemistry registration system based on InChI • unique, stable OCID • substructure searchable (JChem SQL Database) • synonyms & classification connected to OCID • can be used for local compounds in DBs or documents as well
  4. Growth of registered unique compounds: about 50,000 new compounds per month from new patents and articles
  5. Compound Registration: your chemistry. Call WebAPI with a structure, e.g. [O-][N+](=O)NC1=NCCN1CC1=CC=C(Cl)N=C1: is it known? If no: register. Deliver OCID and data (190000000809). Connected components: WebAPI client, compound store, compound database, other applications. Get the client & description at our public FTP. User: Mievoogh pwd: phae9Goo
  6. “SciWalker Open Data” in BigQuery ● Molecules: about 126 million unique InChI compounds from the OntoChem registration server, 74 million nucleotide and 11 million peptide sequences = 211 million unique molecules ○ OntoChem, PubChem, ChEMBL, Zinc, DssTox ○ Nucleotide sequences, protein and peptide sequences ● Ontologies: public OCID and hierarchy ○ anatomy, biomarker, chemistry, clinicalTrials, compound_classes, cosmetology, drugs, effects, herbal_drugs, human_genes, inorganic_materials, institutions, magnitudes, methods, natural_products, nutrition, proteins, regions, species, substances, toxicity ● Genes & Proteins ○ GWAS (genome-wide association studies), ClinVar ● Clinical & Drug Data ○ …, Drug Central, Drug labels
  7. Use Examples
  8. Data on Sitagliptin? SciWalker + BigQuery, WebAPI, compound link-outs
  9. Application structure changes. So far: Frontend, Middleware, Backend; text sources and databases (PMC, MedLine, Patents, Integrity, COSMIC, GWAS). New: Frontend, OIS Middleware, OIS Backend, OIS API; text sources and databases (PMC, MedLine, Patents, Integrity, COSMIC, GWAS); BigQuery with your data, reached through the Google API by Qlik, Looker, Tableau and Google Data Studio.
  10. Advantages of hybrid BigQuery architectures ● Technology integration: ○ Lucene or SQL indexes: speed with indexed data ○ BigQuery: fast sorting, deduplication and aggregation of non-indexed data ○ Seamless prototyping of OLAP search and visualization using Google Data Studio, Tableau, Looker, Qlik … ● Data integration: ○ Faster integration of novel data sources ○ JOIN operations on different sources: public + third party + private data ○ Scalability and speed ○ BigQuery: large amount of open access data
  11. Searching for patents with AT2 sequences in BQ: table sciwalker-open-data:sequences.proteins, seq = DRVYIHPF
  12. Answering pharma-related questions. Q: Which tetrazole-containing drug candidates are in how many clinical trials, and for which diseases? A: Of 303,973 published tetrazoles, 35 compounds were in 926 trials.
  13. Ian Wetherbee @Google https://datastudio. porting/1xKxWJ9R TOaJCjjzrBgB2p_ s- OstfCikt/page/M6y j
  14. Thanks to: Stephen Boyer (Collabra); Ian Wetherbee (Google); Aleksandar Kapisoda and Karlheinz Spenny (Boehringer-Ingelheim); the team at OntoChem
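
The registration flow on slides 3 and 5 ("is it known? if no: register; deliver OCID and data") can be illustrated with a minimal in-memory sketch. This is not OntoChem's WebAPI or client: the class `CompoundRegistry`, its `register` method and the starting number `FIRST_OCID` are hypothetical names chosen for illustration, and the sketch assumes the caller already has an InChI string for the structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical starting value, chosen only so that the IDs resemble the
# 12-digit OCID shown on the slides; the real numbering scheme is an assumption.
FIRST_OCID = 190000000001


@dataclass
class CompoundRegistry:
    """In-memory stand-in for a registration service keyed by InChI."""

    _by_inchi: Dict[str, int] = field(default_factory=dict)
    _next_ocid: int = FIRST_OCID

    def register(self, inchi: str) -> Tuple[int, bool]:
        """Return (ocid, was_known); unknown structures get a new stable ID."""
        key = inchi.strip()
        if not key.startswith("InChI="):
            raise ValueError("expected an InChI string")
        if key in self._by_inchi:      # "is it known?"
            return self._by_inchi[key], True
        ocid = self._next_ocid         # "if no: register"
        self._by_inchi[key] = ocid
        self._next_ocid += 1
        return ocid, False             # "deliver OCID"


registry = CompoundRegistry()
water = "InChI=1S/H2O/h1H2"
first = registry.register(water)    # new compound: gets a fresh ID
second = registry.register(water)   # same structure: same ID, flagged as known
```

Keying on the InChI is what makes the ID stable: the same structure always maps to the same entry, regardless of which name, formula or image it was derived from.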
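
The sequence lookup on slide 11 can be sketched as a parameterized BigQuery query. The table reference and the `seq` filter come from the slide; treating `seq` as a column name is an assumption, as are the helper names `to_standard_sql_table` and `build_sequence_query`. The sketch only builds the SQL text and its parameters. Running it would need a BigQuery client and access to the dataset, and how matching rows link to patents is not shown on the slide.

```python
import re
from typing import Dict, Tuple

# Table reference as written on slide 11 (legacy "project:dataset.table" form).
LEGACY_TABLE = "sciwalker-open-data:sequences.proteins"


def to_standard_sql_table(legacy_ref: str) -> str:
    """Convert 'project:dataset.table' to the backtick-quoted standard SQL form."""
    project, _, rest = legacy_ref.partition(":")
    if not project or not rest:
        raise ValueError("expected 'project:dataset.table'")
    return f"`{project}.{rest}`"


def build_sequence_query(
    sequence: str, table: str = LEGACY_TABLE
) -> Tuple[str, Dict[str, str]]:
    """Return (sql, params) for an exact match on the assumed `seq` column."""
    seq = sequence.strip().upper()
    if not re.fullmatch(r"[A-Z]+", seq):
        raise ValueError("sequence must consist of one-letter residue codes")
    # @seq is a named query parameter, so the sequence is never spliced into the SQL.
    sql = f"SELECT * FROM {to_standard_sql_table(table)} WHERE seq = @seq"
    return sql, {"seq": seq}


sql, params = build_sequence_query("DRVYIHPF")
```

Passing the sequence as a named parameter rather than formatting it into the string keeps the query safe to reuse with user-supplied input.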