MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease

Semantic Technology: The Basics

Peter Berger

GtoPdb and GtoImmuPdb in context

Presented at the Joint Statistical Meetings (JSM) 2020 (virtual) on August 3, 2020. ==== Abstract ==== The idea of “big data” has recently been drawing much attention of the scientific community as well as the general public. An example of big data in Chemistry is the data contained in PubChem, which is a public database of chemical substance descriptions and their biological activities at the National Institutes of Health. PubChem is a sizeable system with 235 million depositor-provided substance descriptions, 96 million unique chemical structures, 1.1 million biological assays, and 268 million biological activity result outcomes. It also contains significant amounts of scientific research data and the inter-relationships between chemicals, proteins, genes, scientific literature, patents and more. PubChem resources have been used in many studies for developing bioactivity and toxicity prediction models, discovering multi-target ligands, and identifying new macromolecule targets of compounds (for drug-repurposing or off-target side effect prediction). This presentation provides an overview of how PubChem’s data, tools, and services can be used for bioassay data analysis and virtual screening (VS) and discusses important aspects of exploiting PubChem for drug discovery.

Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog

Eleanor Howe

PubChem: a public chemical information resource for big data chemistry

Sunghwan Kim

IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...

SureChEMBL and Open PHACTS

George Papadatos

Patent chemisty big bang: utilities for SMEs

Data science requires so many skills, people and time before the results can be accessed. Moreover, these results cannot be static anymore. And finally, the Big Data comes to the plate and the whole tool chain needs to change. In this talk Data Fellas introduces Shar3, a tool kit aiming to bridged the gaps to build a interactive distributed data processing pipeline, or loop! Then the talk covers genomics nowadays problems including data types, processing, discovery by introducing the GA4GH initiative and its implementation using Shar3.

2016 bioinformatics i_databases_wim_vancriekinge

Prof. Wim Van Criekinge

Data Enthusiasts London: Scalable and Interoperable data services. Applied to...

Andy Petrella

Enabling HTS Hit follow up via Chemo informatics, File Enrichment, and Outsou...

Graham Smith

Spark Summit Europe: Share and analyse genomic data at scale

Andy Petrella

Presented at the Fall 2020 American Chemical Society (ACS) National Meeting (Virtual) on August 20, 2020. Sunghwan Kim, Jian Zhang, Paul Thiessen, Asta Gindulyte, Pertti J. Hakkinen & Evan Bolton National Library of Medicine, National Institutes of Health, Rockville, Maryland, United States ==== Abstract ==== PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource at the U.S. National Institutes of Health. It collects chemical information from 700+ data sources and disseminates the collected data to the public free of charge. Arguably, PubChem contains the largest amount of chemical information available in the public domain, with more than 250 million depositor-provided substance descriptions, 100 million unique chemical structures, and 265 million bioactivity outcomes from one million assays covering around twenty thousand unique protein target sequences. Included in the many types of content in PubChem is toxicological information about chemicals, e.g., human and animal toxicity, ecotoxicity, exposure limits, exposure symptoms, and antidote & emergency treatment. Notably, a substantial amount of toxicological information from resources formerly offered by the TOXicology data NETwork (TOXNET) is now integrated into PubChem, e.g., the Hazardous Substances Data Bank (HSDB), LactMed, and LiverTox. In addition, PubChem contains a large amount of bioactivity and toxicity screening data that can be used to build toxicity prediction models based on statistical and machine-learning approaches. This presentation provides an overview of PubChem’s toxicological information as well as tools and services that help users exploit this information. It also describes how open data in PubChem can be used to develop prediction models for chemical toxicity.

Toxicological information in PubChem

Sunghwan Kim

Presented at the 255th American Chemical Society (ACS) National Meeting in New Orleans, LA (March. 19, 2018). Building automated workflows that exploit the vast amount of data contained in PubChem requires programmatic access to the data through application programming interfaces (APIs). PubChem provides several programmatic access routes to its data, including Entrez Utilities (E-Utilities or E-Utils), PubChem Power User Gateway (PUG), PUG-SOAP, PUG-REST, PUG-View, and a REST-ful interface to PubChemRDF. This presentation provides an overview of these programmatic access tools, including recent updates, limitations, usage policies, and best practices. *References* (1) PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem, Nucleic Acids Research, 2015, 43(W1):W605–W611. https://doi.org/10.1093/nar/gkv396 (2) An update on PUG-REST: RESTful interface for programmatic access to PubChem, Nucleic Acids Research, 2018, 46(W1):gky294. https://doi.org/10.1093/nar/gky294

How can you access PubChem programmatically?

Sunghwan Kim

What's hot (20)

SureChEMBL patent annotations in Open PHACTS

Cassava genome hub

BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets

Overview of SureChEMBL

2009 0807 Lod Gmod

Open-source from/in the enterprise: the RDKit

Aug2015 Giab nist integration methods

Semantic Technology: The Basics

GtoPdb and GtoImmuPdb in context

Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog

PubChem: a public chemical information resource for big data chemistry

IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...

SureChEMBL and Open PHACTS

Patent chemisty big bang: utilities for SMEs

2016 bioinformatics i_databases_wim_vancriekinge

Data Enthusiasts London: Scalable and Interoperable data services. Applied to...

Enabling HTS Hit follow up via Chemo informatics, File Enrichment, and Outsou...

Spark Summit Europe: Share and analyse genomic data at scale

Toxicological information in PubChem

How can you access PubChem programmatically?

Viewers also liked

The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...

MongoDB at Medtronic

Accelerate Pharmaceutical R&D with Big Data and MongoDB

This webinar will focus on how NewWave Telecom & Technologies, Inc incorporates MongoDB into your medical/healthcare records. There was demand for NewWave to produce a scalable, clinical data acquisition product that allowed healthcare providers to collect patient information from multiple systems. They needed a database solution that could handle big and small data without turning to expensive enterprise solutions. MongoDB proved to have the capability of creating a flexible repository that both read and solved sundry system formats. This webinar will cover the initial issues with data collection of one’s electronic health records and how MongoDB delivers a new perspective on data modeling.

Webinar: Electronic Health Records (EHRs) and MongoDB - Advancing the Data Pl...

A Translational Medicine Platform at Sanofi

MongoDB is great for content management and delivery across a multitude of apps such as e-commerce websites, online publications, web content management systems (CMS), document management, archives and others. MongoDB's flexible schema and data model make it easy to catalog multiple content types with diverse meta data. -Schema design for content management -Using GridFS for storing binary files -How you can leverage MongoDB's auto-sharding to partition your content across multiple servers

Content Management with MongoDB by Mark Helmstetter

8 out of 10 couples use TheKnot.com to help plan their wedding. A key part of planning involves selecting articles, photographs, and other resources and storing these in the user's Favorites. Recently we migrated major parts of our technology stack to open source technologies. As part of our migration strategy, we zeroed in on MongoDB, since it better suited our requirements for speed and data structure as well as eliminating the need for a caching layer. The transition required a period in which both our legacy and new API where working concurrently with data being persisted on both databases (SQL and Mongo) and all records were being synched with every request. We resourced to many strategies and applications to achieve this goal, including: Pentaho, AWS SQS and SNS, a queue messenger system and some proprietary ruby gems. In this session we will review our strategy and some of the lessons we learned about successfully migrating with zero downtime.

Migration from SQL to MongoDB - A Case Study at TheKnot.com

MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)

Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...

Blockchain is a decentralized, distributed ledger in which users can validate transactions without need for an intermediary 3rd party. As a publicly available, and secure ledger it could replace traditional commercial banking as we understand it. Institutional banks are already integrating this technology as they implement their own private side-chains. As a distributed ledger, blockchain can be used for purposes outside finance, including voting systems and identity registration. Yet, Blockchain is still only one component of a full application architecture. Practical use of use of blockchain will require integration with Spark & Hadoop for analytics capabilities and MongoDB for service to real-time application loads. This talk will describe and demystify blockchain and provide integrations Spark and MongoDB to produce truly performant, innovative and robust applications.

MongoDB Europe 2016 - Distributed Ledgers, Blockchain + MongoDB

MongoDB introduces new capabilities that change the way micro-services interact with the database, capabilities that are either absent or exist only partially in high-end commercial databases such as Oracle. In this session I will share from my experiences building a cloud-based, multi-tenant SaaS application with extreme security requirements. We will cover topics including considerations for storing multi-tenant data in the database, best practices for authentication and authorization, and performance considerations specific to security in MongoDB.

Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...

Scaling Hike Messenger to 15M Users

The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells when treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data when paired with pattern matching algorithms, facilitate the discovery of connections between drugs, genes and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and handle frequent changes to the schema. We realized that a relational database did not fit our needs, and gravitated towards MongoDB for its ease of use, support for dynamic schema, complex data structures and expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the various schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.

Viewers also liked (12)

The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Gen...

MongoDB at Medtronic

Accelerate Pharmaceutical R&D with Big Data and MongoDB

Webinar: Electronic Health Records (EHRs) and MongoDB - Advancing the Data Pl...

A Translational Medicine Platform at Sanofi

Content Management with MongoDB by Mark Helmstetter

Migration from SQL to MongoDB - A Case Study at TheKnot.com

MongoDB as a Data Warehouse: Time Series and Device History Data (Medtronic)

Replacing Traditional Technologies with MongoDB: A Single Platform for All Fi...

MongoDB Europe 2016 - Distributed Ledgers, Blockchain + MongoDB

Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...

Scaling Hike Messenger to 15M Users

Similar to MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease

MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...

Spark Summit EU talk by Erwin Datema and Roeland van Ham

Spark Summit

Genome in a bottle for ashg grc giab workshop 181016

Bioinformatics and Computational Biosciences Branch

Platforms CIBERER and INB-ELIXIR-es

Joaquin Dopazo

Wikidata for biomedical knowledge integration and curation

Gregory Stupp

Intro to databases

bhargvi sharma

Variant analysis and whole exome sequencing

Advanced Bioinformatics for Genomics and BioData Driven Research

European Bioinformatics Institute

In this webcast on February 19th, Gabe Rudy, Vice President of Product Development, will showcase publicly available databases and resources available for interpreting rare and novel mutations in the context of his own personal exome obtained through a limited 23andMe pilot in 2012. The last couple years have seen many changes in well-established resources such as OMIM and dbSNP, while motivating new efforts such as ClinVar and PhenoDB to bring NGS interpretation to clinical grade through a global data sharing effort. In this webcast, Gabe will cover: The changing landscape of public annotations: Then, Now, and Soon. Will the new human reference (GRCh38) released in December be a game changer? Specific examples of improvements in annotation and algorithms that result in more accurate analysis of his own exome. The utility and progress of NGS to different clinical applications in terms of public resources: carrier screening, hereditary cancer risk, pharmacogenomics, oncology care, and genetic disorder diagnosis. Sharing of new clinical data: How both variation and phenotype level data is currently being shared and what will be the way forward to match rare and undiagnosed cases at a global scale.

Using Public Access Clinical Databases to Interpret NGS Variants

Golden Helix Inc

Genome in a bottle for amp GeT-RM 181030

Rna seq

Amitha Dasari

Guide to Immunopharmacology update

Genome Reference Consortium

Stratergies for the intergration of information (IPI_ConfEX)

Ben Gardner

171017 giab for giab grc workshop

GIAB update for GRC GIAB workshop 191015

Syed Ahmad Chan Bukhari, PhD

MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...

From Expression to Pathways Using Online Tools

Ali Kishk

IUPHAR/BPS Guide to PHARMACOLOGY in 2017: new features and updates

Guide to PHARMACOLOGY

iOMICS Research

InterpretOmics

171017 giab for giab grc workshop

During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.

Similar to MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease (20)

MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...

Spark Summit EU talk by Erwin Datema and Roeland van Ham

Genome in a bottle for ashg grc giab workshop 181016

Platforms CIBERER and INB-ELIXIR-es

Wikidata for biomedical knowledge integration and curation

Intro to databases

Variant analysis and whole exome sequencing

Advanced Bioinformatics for Genomics and BioData Driven Research

Using Public Access Clinical Databases to Interpret NGS Variants

Genome in a bottle for amp GeT-RM 181030

Rna seq

Guide to Immunopharmacology update

Stratergies for the intergration of information (IPI_ConfEX)

171017 giab for giab grc workshop

GIAB update for GRC GIAB workshop 191015

MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Seque...

From Expression to Pathways Using Online Tools

IUPHAR/BPS Guide to PHARMACOLOGY in 2017: new features and updates

iOMICS Research

171017 giab for giab grc workshop

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas

MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!

MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB

MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...

MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB

MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...

Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe. This talk covers: Common components of an IoT solution The challenges involved with managing time-series data in IoT applications Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance. How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.

MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data

MongoDB SoCal 2020: MongoDB Atlas Jump Start

Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.

MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]

Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch". This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.

MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2

MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...

MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!

When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.

MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset

MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart

MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++

MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...

MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business. This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.

MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive

Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms. How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms? In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.

MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang

MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...

Il n’a jamais été aussi facile de commander en ligne et de se faire livrer en moins de 48h très souvent gratuitement. Cette simplicité d’usage cache un marché complexe de plus de 8000 milliards de $. La data est bien connu du monde de la Supply Chain (itinéraires, informations sur les marchandises, douanes,…), mais la valeur de ces données opérationnelles reste peu exploitée. En alliant expertise métier et Data Science, Upply redéfinit les fondamentaux de la Supply Chain en proposant à chacun des acteurs de surmonter la volatilité et l’inefficacité du marché.

MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...