Identifying Metabolites and Small Molecules with Mass Spectrometry - in effect a walk through structure elucidation with MetFrag.
Presentation given for the VIB training event ‘Metabolomics Data Interpretation’
https://training.vib.be/metabolomics-data-interpretation
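As a minimal illustration of the structure-elucidation steps a MetFrag-style walk-through covers, the sketch below retrieves candidates within a mass window and ranks them by how many measured MS/MS peaks their fragment masses explain. All masses, candidate names and fragment lists here are invented for illustration, and this is not the actual MetFrag API.

```python
# Hypothetical sketch of in silico fragmentation scoring; all values invented.
PROTON = 1.007276  # proton mass, for [M+H]+ adducts

def ppm_window(mass, ppm=5.0):
    """Return the (low, high) mass window for a given ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta

def explained_peaks(candidate_fragments, measured_peaks, tol=0.005):
    """Count measured MS/MS peaks matched by a candidate's fragment masses."""
    return sum(
        any(abs(peak - frag) <= tol for frag in candidate_fragments)
        for peak in measured_peaks
    )

# Toy candidate list: name -> (neutral monoisotopic mass, fragment masses)
candidates = {
    "candidate_A": (180.0634, [163.0390, 145.0284, 135.0441]),
    "candidate_B": (180.0786, [163.0390, 120.0808]),
}

precursor_mz = 181.0707            # measured [M+H]+
neutral_mass = precursor_mz - PROTON
low, high = ppm_window(neutral_mass, ppm=10)

measured_msms = [163.0392, 135.0443, 107.0490]

# Keep candidates inside the mass window, rank by explained peaks
hits = sorted(
    ((name, explained_peaks(frags, measured_msms))
     for name, (mass, frags) in candidates.items()
     if low <= mass <= high),
    key=lambda x: -x[1],
)
print(hits)
```

Real tools add bond-dissociation heuristics, spectral similarity and metadata to this basic "candidates in window, score by explained peaks" loop.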
FSU Jena: Environmental Cheminformatics to Identify Unknowns, April 2019 - Emma Schymanski
The Environmental Cheminformatics group at the Luxembourg Centre for Systems Biomedicine focuses on the comprehensive identification of known and unknown chemicals in our environment to investigate their effects on health and disease. The environment and the chemicals to which we are exposed are incredibly complex, with over 125 million chemicals registered in the largest chemical registry and over 70,000 in household use alone. Detectable molecules in complex samples can now be captured using high resolution mass spectrometry (HRMS), which provides a “snapshot” of all chemicals present in a sample and allows for retrospective data analysis through digital archiving. However, scientists cannot yet identify the vast majority of the tens of thousands of features in each sample, leading to critical bottlenecks in identification and data interpretation. For instance, recent studies indicate a strong connection between the gut microbiome and Parkinson’s disease, yet over 60% of significant metabolites in microbiome experiments are unknown. Unknown identification remains extremely time consuming and, in many cases, a matter of luck. Prioritizing efforts to find significant metabolites or potentially toxic substances responsible for observed effects is key, and involves reconciling highly complex samples with expert knowledge and careful validation. This talk will cover European, US and worldwide community initiatives to help connect knowledge on chemistry and toxicity with environmental observations - from compound databases to spectral libraries and retrospective screening. It will touch on the challenges of standardized structure representations, data curation, deposition and communication between resources. Finally, it will show how interdisciplinary efforts and data sharing can facilitate research in metabolomics, exposomics and beyond. Kindly hosted by Prof. Christoph Steinbeck.
ASMS Fall Metabolomics Informatics Workshop 2018: Identifying Unknown Metabolites - Emma Schymanski
Characterising unknown metabolites talk from the ASMS Fall Metabolomics Informatics Workshop 2018 in San Francisco, California.
https://www.asms.org/conferences/fall-workshop/program
Slides with active hyperlinks accessible via tinyurl on the front page.
Environmental Cheminformatics for Unknown ID, UC Davis, Nov 2018 - Emma Schymanski
Environmental Cheminformatics to Identify Unknown Chemicals and their Effects
Assoc. Prof. Dr. Emma L. Schymanski
FNR ATTRACT Fellow and PI: Environmental Cheminformatics, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
The Environmental Cheminformatics group at the Luxembourg Centre for Systems Biomedicine focuses on the comprehensive identification of known and unknown chemicals in our environment to investigate their effects on health and disease. The environment and the chemicals to which we are exposed are incredibly complex, with over 125 million chemicals registered in the largest chemical registry and over 70,000 in household use alone. Detectable molecules in complex samples can now be captured using high resolution mass spectrometry (HRMS), which provides a “snapshot” of all chemicals present in a sample and allows for retrospective data analysis through digital archiving. However, scientists cannot yet identify the vast majority of the tens of thousands of features in each sample, leading to critical bottlenecks in identification and data interpretation. For instance, recent studies indicate a strong connection between the gut microbiome and Parkinson’s disease, yet over 60% of significant metabolites in microbiome experiments are unknown. Unknown identification remains extremely time consuming and, in many cases, a matter of luck. Prioritizing efforts to find significant metabolites or potentially toxic substances responsible for observed effects is key, and involves reconciling highly complex samples with expert knowledge and careful validation. This talk will cover European, US and worldwide community initiatives to help connect knowledge on chemistry and toxicity with environmental observations - from compound databases to spectral libraries and retrospective screening. It will touch on the challenges of standardized structure representations, data curation, deposition and communication between resources. Finally, it will show how interdisciplinary efforts and data sharing can facilitate research in metabolomics, exposomics and beyond.
NOTE: some slides causing errors have been removed but can be accessed through the tinyurl on the front page.
Active hyperlinks can be retrieved using the tinyurl on the front page. Please cite this work if you use any of the contents.
ASMS Fall 2018 Metabolomics Informatics Workshop: Peak Picking - Emma Schymanski
Principles of peak picking and alignment, in pictures and hands-on exercises. ASMS Fall Metabolomics Informatics Workshop 2018.
https://www.asms.org/conferences/fall-workshop/program
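The two principles in those slides can be caricatured in a few lines of code: peaks are local maxima above a noise threshold, and alignment matches features across samples within an m/z tolerance. The trace and m/z values below are invented, and real pipelines use far more robust algorithms (wavelets, retention-time warping, etc.); this is only a sketch of the idea.

```python
# Illustrative sketch only: toy peak picking and feature alignment.

def pick_peaks(intensities, threshold=100.0):
    """Return indices of local maxima above an intensity threshold."""
    return [
        i for i in range(1, len(intensities) - 1)
        if intensities[i] > threshold
        and intensities[i] > intensities[i - 1]
        and intensities[i] >= intensities[i + 1]
    ]

def align(features_a, features_b, mz_tol=0.01):
    """Greedily match features (m/z values) between two samples."""
    matches = []
    used = set()
    for mz_a in features_a:
        for j, mz_b in enumerate(features_b):
            if j not in used and abs(mz_a - mz_b) <= mz_tol:
                matches.append((mz_a, mz_b))
                used.add(j)
                break
    return matches

trace = [10, 50, 400, 80, 20, 300, 250, 30]
peak_idx = pick_peaks(trace)

sample1 = [181.071, 203.053, 255.233]
sample2 = [181.072, 255.232, 301.141]
print(peak_idx, align(sample1, sample2))
```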
Small Molecules in Big Data - Analytica Munich - Emma Schymanski
Finding small molecules in big data
E.L. Schymanski, Belvaux/LU, A.J. Williams, North Carolina/USA
Dr. Emma L. Schymanski, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Metabolomics and exposomics are amongst the youngest and most dynamic of the omics disciplines. While the molecules involved are smaller than those in proteomics and the other, larger “omics”, the challenges are in many ways greater. Elements are less constrained, there are no given “puzzle pieces” and there is a resulting explosion in terms of potential chemical space. It is impossible to even enumerate all chemically possible small molecules. The challenges and complexity of identifying small molecules, even using the most advanced analytical technologies available today, are immense. Current “big data” methods for small molecules rely heavily on chemical databases, the largest of which presently contain ~100 million chemicals. Despite this large number, high resolution mass spectrometry (HR-MS) measurements contain tens of thousands of features, of which only a few percent can be annotated as “known” and confirmed as metabolites or chemicals of interest using these chemical databases. How can we find relevant small molecules in the ever-increasing data loads? How can we annotate more of the unknown features in HR-MS experiments? This talk will present European, US and worldwide initiatives to help find small molecules in big data - from chemical databases to spectral libraries, real-time monitoring to retrospective screening. It will touch on the challenges of standardized structure representations, data curation and deposition. Finally, it will show how interdisciplinary communication, data sharing and pushing the boundaries of current capabilities can facilitate research efforts in metabolomics, exposomics and beyond. This abstract does not necessarily represent U.S. EPA policy.
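The "known" annotation step described above starts from an exact-mass query against such databases. A back-of-envelope sketch of how the query window is derived, assuming a typical 5 ppm tolerance (isotope masses taken from standard tables):

```python
# Monoisotopic masses of common elements (standard isotope values)
MONO = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052,
        "O": 15.9949146221, "S": 31.97207069}

def formula_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

# Example: glucose, C6H12O6
glucose = {"C": 6, "H": 12, "O": 6}
mass = formula_mass(glucose)

# A database query would then search within mass +/- delta
ppm = 5.0  # assumed, typical high-resolution instrument tolerance
delta = mass * ppm / 1e6
print(f"{mass:.4f} +/- {delta:.4f}")
```

Even at 5 ppm, a window of under a millidalton around ~180 Da can match dozens of candidate structures in a database of ~100 million chemicals, which is why the additional evidence the abstract mentions is needed.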
DMCM2018: Community Resources Connecting Chemistry and Toxicity Knowledge - Emma Schymanski
Community Resources Connecting Chemistry and Toxicity Knowledge to Environmental Observations presented at the Disease Map Community Meeting 2018 in Paris
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on spectral and chemical compound databases in the environmental community and beyond. Increasingly, new methods are relying on open data resources. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Smaller, selective lists of chemicals (also called “suspect lists”) can be used to perform more efficient annotation. Mass spectral libraries can then be used to increase the confidence in tentative identifications. Additional metadata (e.g. exposure and hazard information, reference and data source information) can be extremely useful to prioritize substances of high environmental interest. Exchanging information and “sharing structural linkages” between these resources requires extensive curation to ensure that the correct information is shared, yet many valuable datasets arise from scientists and regulators with little formal cheminformatics training. This talk will cover curation efforts undertaken to map spectral libraries (e.g. MassBank.EU, mzCloud) and suspect lists from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) to unique chemical identifiers associated with the US EPA CompTox Chemistry Dashboard. The curation workflow takes advantage of years of experience, as well as contact with the original data providers, to enable open access to valuable, curated datasets that support environmental scientists and the broader research community (e.g. https://comptox.epa.gov/dashboard/chemical_lists). Note: This abstract does not reflect US EPA policy.
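A recurring operation in this kind of mapping work is linking records across resources by chemical identifier. The sketch below is a hypothetical illustration using InChIKeys: match first on the full key, then fall back to the 14-character first block (connectivity only) so that stereoisomer variants still map. All keys and identifiers here are invented placeholders, not the actual curation workflow.

```python
# Hypothetical identifier-mapping sketch; all keys/ids are placeholders.

def skeleton(inchikey):
    """First block of a standard InChIKey (connectivity only)."""
    return inchikey.split("-")[0]

def link_records(list_a, list_b):
    """Map ids in list_a to ids in list_b: exact key first, then skeleton."""
    by_full = {key: rid for rid, key in list_b}
    by_skel = {skeleton(key): rid for rid, key in list_b}
    links = {}
    for rid, key in list_a:
        if key in by_full:
            links[rid] = (by_full[key], "exact")
        elif skeleton(key) in by_skel:
            links[rid] = (by_skel[skeleton(key)], "skeleton")
    return links

suspects = [("S1", "AAAABBBBCCCCDD-EEEEFFFFGG-N"),
            ("S2", "AAAABBBBCCCCDD-ZZZZYYYYXX-N"),   # stereo variant
            ("S3", "QQQQWWWWEEEERR-TTTTUUUUII-N")]   # no counterpart
dashboard = [("DTXSID001", "AAAABBBBCCCCDD-EEEEFFFFGG-N")]

print(link_records(suspects, dashboard))
```

Skeleton-level matches are exactly the cases that need the manual curation the abstract describes, since a connectivity match can hide a salt, isotope or stereochemistry mismatch.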
SETAC Rome Non-Target Screening For Chemical DiscoveryEmma Schymanski
SETAC Special Session: Solutions for emerging pollutants – Towards a holistic chemical quality status assessment in European freshwater resources
Non-target Screening for Holistic Chemical Monitoring and Compound Discovery: Open Science, Real-time and Retrospective Approaches
Emma L. Schymanski [1*], Reza Aalizadeh [2], Nikiforos Alygizakis [3], Juliane Hollender [4], Martin Krauss [5], Tobias Schulze [5], Jaroslav Slobodnik [3], Nikolaos S. Thomaidis [2] and Antony J. Williams [6]
[1] Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
[2] National and Kapodistrian University of Athens, Department of Chemistry, Athens, Greece.
[3] Environmental Institute, Koš, Slovak Republic
[4] Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
[5] UFZ – Helmholtz Centre for Environmental Research, Leipzig, Germany
[6] National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, USA
*presenting author: emma.schymanski@uni.lu
Keywords: Non-target screening, Open Science, Mass Spectrometry, Monitoring
Non-target screening (NTS) with high resolution mass spectrometry (HR-MS) provides opportunities to discover chemicals, their dynamics and effects on the environment far beyond the current 45 “priority pollutants” or even “known” chemicals. Open science and the exchange of information (for example, between scientists and regulatory authorities) have a critical role to play in the continuing evolution of NTS. Using a variety of case studies from Europe, this talk will highlight how open science activities such as MassBank.EU (https://massbank.eu), the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) and NORMAN Digital Sample Freezing Platform (http://norman-data.eu) as well as the US EPA CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/) can support NTS. Further, it will show how initiatives such as near “real time” monitoring of the River Rhine and retrospective screening via so-called “digital freezing” platforms have opened up new potential for exploring the dynamics and distribution even of as-yet-unidentified chemicals. Collaborative European and international activities facilitate data exchange amongst analytical data scientists and enable quick, effective and reproducible provisional compound identification in digitally archived HR-MS data. This is leading to new ways of assessing and prioritizing the next generation of “emerging pollutants” in the environment, enabling a pro-active approach to environmental assessment unthinkable only a few years ago. Note: This abstract does not reflect US EPA policy.
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on chemical compound databases. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Additional data (e.g. fragmentation, physicochemical properties, reference and data source information) is then used to select potential candidates, depending on the experimental context. However, these strategies require the presence of substances of interest in these compound databases, which is often not the case as no database can be fully inclusive. Prominent examples with clear data gaps are surfactants, which are used in many products in our daily lives yet often absent as discrete structures in compound databases. Linear alkylbenzene sulfonates (LAS) are a common, high use and high priority surfactant class that have highly complex transformation behaviour in wastewater. Despite extensive reports in the environmental literature, few of the LAS and none of the related transformation products were reported in any compound database during an investigation into Swiss wastewater effluents, even though these formed the most intense signals. The LAS surfactant class will be used to demonstrate how the coupling of environmental observations with high resolution mass spectrometry and detailed literature data (expert knowledge) on the transformation of these species can be used to progressively “fill the gaps” in compound databases. The LAS and their transformation products have been added to the CompTox Chemistry Dashboard (https://comptox.epa.gov/) using a combination of “representative structures” and “related structures” starting from the structural information contained in the literature.
By adding this information into a centralized open resource, future environmental investigations can now profit from the expert knowledge previously scattered throughout the literature. Note: This abstract does not reflect US EPA policy.
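Surfactant classes such as the LAS form homologue series whose members differ by a repeating CH2 unit, which is one way such signals can be recognized in feature lists even when the structures are absent from databases. A minimal sketch of that idea, with invented feature masses:

```python
CH2 = 14.01565  # monoisotopic mass of a CH2 repeating unit

def find_series(masses, tol=0.002, min_members=3):
    """Return maximal chains of masses spaced by ~CH2 (input sorted)."""
    chains = []
    for start in masses:
        chain = [start]
        while True:
            target = chain[-1] + CH2
            nxt = next((m for m in masses if abs(m - target) <= tol), None)
            if nxt is None:
                break
            chain.append(nxt)
        if len(chain) >= min_members:
            chains.append(chain)
    # drop chains fully contained in a longer chain
    return [c for c in chains if not any(set(c) < set(o) for o in chains)]

# Invented feature masses: four form a CH2-spaced series, one is unrelated
features = sorted([297.149, 311.165, 325.180, 339.196, 412.001])
print(find_series(features))
```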
Read-across (RAx) is probably the most widely used strategy to waive new, demanding in vivo tests in the toxicological assessment of chemicals. It is based on the possibility of translating available information from well-characterized chemicals (source) to a substance for which there is a toxicological data gap (target). In spite of its widespread use, regulatory acceptance is still limited. New Approach Methods (NAM) may be used to confirm chemical and toxicological similarities and to contribute to reducing this uncertainty.
Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This presentation will provide an overview of freely available data, tools, databases and approaches available to support chemical structure verification and elucidation and highlight some of the known issues regarding data quality and suggest approaches for resolving some of the issues. The importance of structure and spectral standards for data exchange will be discussed, especially with regard to how spectral data can be made openly available to the community via online tools and through scientific publishing. This work does not necessarily reflect U.S. EPA policy.
High throughput mining of the scholarly literature; talk at NIH - petermurrayrust
The scientific and medical literature contains huge amounts of valuable unused information. This talk shows how to discover, extract, re-use and interpret it. Wikidata is presented as a key new tool and infrastructure. Everyone can become involved. However, some of the barriers to use are sociopolitical, and these are identified and discussed.
Building linked data large-scale chemistry platform - challenges, lessons and... - Valery Tkachenko
Chemical databases have been around for decades, but in recent years we have observed a qualitative change from rather small in-house built proprietary databases to large-scale, open and increasingly complex chemistry knowledgebases. This tectonic shift has imposed new requirements for database design and system architecture as well as the implementation of completely new components and workflows which did not exist in chemical databases before. Probably the most profound change is being caused by the linked nature of modern resources - individual databases are becoming nodes and hubs of a huge and truly distributed web of knowledge. This change has important aspects such as data and format standards, interoperability, provenance, security, quality control and metainformation standards.
ChemSpider at the Royal Society of Chemistry was the first public chemical database to incorporate rigorous quality control, introducing both community curation and automated quality checks at the scale of tens of millions of records. Yet we have come to realize that this approach may now be incomplete in a quickly changing world of linked data. In this presentation we will talk about challenges associated with building modern public and private chemical databases as well as lessons that we have learned from our past and present experience. We will also talk about solutions for some common problems.
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) - Michel Dumontier
In the quest to translate the results of biomedical research into effective clinical applications, many are now trying to make sense of the large and rapidly growing amount of public biomedical data. However, substantial challenges exist in traversing the currently fragmented data landscape. In this talk, I will discuss our efforts to use Semantic Web technologies to facilitate biomedical research through the formulation, publication, integration, and exploration of facts, expert knowledge, and web services.
Identification of unknowns in mass spectrometry based non-targeted analyses (NTA) requires the integration of complementary pieces of data to arrive at a confident, consensus structure. Researchers use chemical reference databases, spectral matching, fragment prediction tools, retention time prediction tools, and a variety of other data to arrive at tentative, probable, and confirmed, if possible, identifications. With the diverse, robust data contained within the US EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov), the goal of this research is to identify and implement a harmonized identification tool and workflow using previously generated chemistry data. Data has been compiled from product use, functional use prediction models, environmental media occurrence prediction models, and PubMed references, among other sources. We will report on our development of a visualization tool whereby users can visualize the relative contribution of identification-based metrics on a list of candidate structures and observe the greatest likelihood of occurrence. These data and visualization tools support NTA identification via the Dashboard and demonstrate an open, accessible tool for all users of HRMS data. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
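A weighted combination of normalized evidence metrics is one simple way such a harmonized candidate ranking could work. The metric names, weights and values below are invented for illustration and are not the Dashboard's actual scoring scheme.

```python
# Hypothetical consensus ranking sketch; metrics/weights are invented.

def consensus_score(metrics, weights):
    """Weighted sum of normalized (0-1) evidence metrics for one candidate."""
    return sum(weights[name] * value for name, value in metrics.items())

weights = {"spectral_match": 0.4, "data_sources": 0.3, "references": 0.3}

candidates = {
    "cand_X": {"spectral_match": 0.9, "data_sources": 0.8, "references": 0.7},
    "cand_Y": {"spectral_match": 0.9, "data_sources": 0.2, "references": 0.1},
}

# Rank candidates by their combined score, best first
ranked = sorted(candidates,
                key=lambda c: consensus_score(candidates[c], weights),
                reverse=True)
print(ranked)
```

The point of such a scheme is exactly what the abstract describes: two candidates with identical spectral matches can still be separated by metadata such as data-source counts and literature references.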
Generating Biomedical Hypotheses Using Semantic Web Technologies - Michel Dumontier
With its focus on investigating the nature and basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. Over the past 15 years, hundreds of projects have developed or leveraged ontologies for entity recognition and relation extraction, semantic annotation, data integration, query answering, consistency checking, association mining and other forms of knowledge discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, discover significant biological associations across these data using a set of partially overlapping ontologies, and identify new avenues for drug discovery by applying measures of semantic similarity over phenotypic descriptions. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability and an understanding of how to maximize their value, increasing numbers of biomedical researchers will be strategically poised to pursue increasingly sophisticated knowledge representation projects aimed at improving our overall understanding of the capability and behavior of biological systems.
A video of this lecture can be found in the NTK repository: http://repozitar.techlib.cz/record/784
Would you like to know more? Find presentations, reports, conference videos, photos and much more in our institutional repository at: http://repozitar.techlib.cz/?ln=en
A Global Commons for Scientific Data: Molecules and Wikidata - petermurrayrust
Methods for extracting facts from the scientific literature and linking them to Wikidata IDs. Wikidata is introduced through an architectural example and a bioscience example. We then explore how data can be extracted from text and from images.
Linking the silos. Data and predictive models integration in toxicology. - Nina Jeliazkova
Silo storage systems are typically designed to store one single type of grain. An information silo is characterized by a rigid design, not allowing easy exchange of information and integration with other systems. Life sciences software systems, and those applied in toxicology in particular, are often developed independently, and compatibility is rarely perceived as a primary design goal. The traditional approach to the integration challenge is to build data warehouses and custom applications managing all the information under a central authority and storing data in a single database schema. The majority of the existing chemical databases rely on a centralized design, offering data access via unique and incompatible interfaces. Loosely coupled systems, allowing end users to share and integrate data on the fly, are increasingly considered more appropriate and actively developed [1], but not yet the mainstream in cheminformatics. Our experience in several past and on-going projects, dealing with data integration of chemical structures and toxicity databases, has been reflected in the architecture of the systems that we have designed (e.g. OpenTox, a set of distributed web services, with semantic representation of resources, including data [2], [3], [4]). The core concepts of the information integration are the web service oriented architecture and a data representation design that includes semantic and provenance information uncoupled from the software code and any proprietary metadata.
This presentation highlights known challenges in the production of high-quality chemical databases and outlines recent efforts made to address these challenges. Specific examples will be provided illustrating these challenges within the U.S. Environmental Protection Agency (EPA) Computational Toxicology Program, including consolidating EPA’s ACToR and DSSTox databases, augmenting computed properties and list search features, and introducing quality metrics to assess confidence in chemical structure assignments across hundreds of thousands of chemical substance records. The past decade has seen enormous investments in the generation and release of data from studies of chemicals and their toxicological effects. There is, however, commonly little attention given to provenance and, more generally, to the quality of the data. The presentation will emphasize the importance of rigorous data review procedures, progress in web-based public access to accurate chemical data sets for use in predictive modeling, and the benefits these efforts will deliver to toxicologists as they embrace the “Big Data” era.
This abstract does not necessarily represent the views of the U.S. Environmental Protection Agency.
Asking the scientific literature to tell us about metabolism - Peter Murray-Rust
Talk at Lhasa (https://www.lhasalimited.org/) a leading organization for "in silico prediction and database systems for use in metabolism, toxicology and related sciences". ContentMine software can extract data from papers on compound metabolism in reusable semantic form, including metabolic pathways, pharmacokinetic data.
Can machines understand the scientific literature? - Peter Murray-Rust
With over 5000 scientific articles published per day, we need machines to help us understand the content. This material is to be used at an interactive session for the Science Society at Trinity College, Cambridge, UK.
Towards Responsible Content Mining: A Cambridge perspective - Peter Murray-Rust
ContentMining (Text and Data Mining) is now legal in the UK for non-commercial research. Cambridge UK is a natural centre, with several components:
* a world-class University and Library
* many publishers, both Open Access and conventional
* a digital culture
* ContentMine - a leading proponent and practitioner of mining
Cambridge University Press welcomes content mining and invited PMR to give a talk there. He showed the technology and protocols and proposed a practical way forward in 2017
The phrase “Big Data” is generally used to describe a large volume of structured and/or unstructured data that cannot be processed using traditional database and software techniques. In the domain of chemistry, the Royal Society of Chemistry certainly hosts large structured databases of chemistry data, for example ChemSpider, as well as unstructured content in the form of our collection of scientific articles. Our research literature provides value to its readership and, as an example of one of our databases, ChemSpider is currently accessed by many tens of thousands of scientists every day. But do these collections themselves constitute “Big Data”, or is it the potential that lies within them that can contribute to the Big Data movement? This presentation will discuss our activities to contribute both data and service-based access to our data sets, to support grant-based projects such as the Innovative Medicines Initiative Open PHACTS project (to support drug discovery) and the PharmaSea initiative (to identify novel natural products from the ocean). We will also provide an overview of our activities to mine public patent collections and examine what can be done with the data. We are presently extracting physicochemical properties and textual forms of NMR spectra and, with the resulting data, are building predictive models (for melting points at present) and assembling a large NMR spectral database containing many hundreds of thousands of spectrum-structure pairs. Our experience to date has demonstrated that we are working at the edge of current algorithmic and computing capabilities for predictive model building, with over a quarter of a million melting points producing a matrix of over 200 billion descriptors. Our work to produce the NMR spectral database will necessitate batch processing of the data to examine consistency between the spectrum-structure pairs, along with other forms of data validation.
The intention is to take our experience from this work on a public patent corpus and apply it to the RSC back file of publications, to mine data and enable new paths to the discoverability of both the data and the associated publications.
This workshop, organized by HESI’s Technical Committee on Multicomponent Substances (MCS) and Chemical Substances of Unknown or Variable Composition, Complex Reaction Products and Biological Materials (UVCB), was held on December 10-11, 2018 at the HESI Office in Washington, DC. The primary objective of the workshop was to explore approaches to characterize and group UVCB substances, considering both established traditional chemical characterization approaches and lesser-utilized “outside the box” approaches. This information was intended to inform the design of a roadmap to guide future work towards substance prioritization and hazard and risk assessment frameworks. The work focused on ecological assessment, though human health risk assessment was not excluded from the discussion.
This talk provided an overview of non-targeted screening approaches for structure identification, especially with regard to UVCB chemicals with homologous series. It also provided an overview of the CompTox Chemicals Dashboard and cheminformatics approaches related to UVCB chemicals.
Developing tools for high resolution mass spectrometry-based screening via th... - Andrew McEachran
Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized how chemicals are detected in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemistry Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow including visualization and access via the Chemistry Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a freely available software tool to support structure identification and non-targeted analysis. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
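The consensus-ranking step described above can be illustrated with a minimal sketch. This is not EPA code and the metric names, weights, and candidate values are all invented; it only shows the general idea of combining several normalized candidate-ranking metrics into one score.

```python
# Hypothetical sketch of consensus candidate ranking: each metric is assumed
# to be pre-normalized to [0, 1], and the consensus is a weighted sum.

def consensus_score(candidate, weights):
    """Weighted sum of normalized metric scores for one candidate."""
    return sum(weights[m] * candidate[m] for m in weights)

# Invented weights and metrics (not the Dashboard's actual scheme)
weights = {"spectral_match": 0.4, "rt_plausibility": 0.3, "metadata_rank": 0.3}

candidates = [
    {"name": "A", "spectral_match": 0.9, "rt_plausibility": 0.8, "metadata_rank": 0.6},
    {"name": "B", "spectral_match": 0.7, "rt_plausibility": 0.9, "metadata_rank": 0.9},
]

# Rank candidates by descending consensus score
ranked = sorted(candidates, key=lambda c: consensus_score(c, weights), reverse=True)
```

With these invented numbers, a candidate that is moderately good on every metric can outrank one that excels on a single metric, which is the motivation for consensus approaches.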
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on spectral and chemical compound databases in the environmental community and beyond. Increasingly, new methods rely on open data resources. Candidate structures are often retrieved by exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Smaller, selective lists of chemicals (also called “suspect lists”) can be used to perform more efficient annotation. Mass spectral libraries can then be used to increase confidence in tentative identifications. Additional metadata (e.g. exposure and hazard information, reference and data source information) can be extremely useful for prioritizing substances of high environmental interest. Exchanging information and “sharing structural linkages” between these resources requires extensive curation to ensure that information is shared correctly, yet many valuable datasets arise from scientists and regulators with little formal cheminformatics training. This talk will cover curation efforts undertaken to map spectral libraries (e.g. MassBank.EU, mzCloud) and suspect lists from the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) to unique chemical identifiers associated with the US EPA CompTox Chemistry Dashboard. The curation workflow takes advantage of years of experience, as well as contact with the original data providers, to enable open access to valuable, curated datasets that support environmental scientists and the broader research community (e.g. https://comptox.epa.gov/dashboard/chemical_lists). Note: This abstract does not reflect US EPA policy.
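One common step in mapping suspect lists to unique identifiers is matching on the first (skeleton) block of the InChIKey, so that salt and stereo variants of the same parent structure group together. The sketch below is illustrative only, not the actual curation workflow; the reference table and the unmatched key are invented (the caffeine InChIKey and DTXSID are real, but serve here only as an example).

```python
# Hypothetical sketch: link suspect-list InChIKeys to database identifiers
# via the 14-character skeleton block of a standard InChIKey.

def skeleton(inchikey):
    """Return the first block of an InChIKey (the 2D connectivity layer)."""
    return inchikey.split("-")[0]

# Tiny illustrative reference table: skeleton block -> database identifier
reference = {
    "RYYVLZVUVIJVGH": "DTXSID0020232",  # caffeine
}

suspects = [
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N",  # caffeine (will match)
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # made-up key (will not match)
]

# Unmatched entries map to None and would go to manual curation
mapped = {key: reference.get(skeleton(key)) for key in suspects}
```

Matching on the skeleton block deliberately ignores stereochemistry and protonation state; a real curation workflow would review those collapsed matches by hand.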
SETAC Rome Non-Target Screening For Chemical Discovery - Emma Schymanski
SETAC Special Session: Solutions for emerging pollutants – Towards a holistic chemical quality status assessment in European freshwater resources
Non-target Screening for Holistic Chemical Monitoring and Compound Discovery: Open Science, Real-time and Retrospective Approaches
Emma L. Schymanski [1*], Reza Aalizadeh [2], Nikiforos Alygizakis [3], Juliane Hollender [4], Martin Krauss [5], Tobias Schulze [5], Jaroslav Slobodnik [3], Nikolaos S. Thomaidis [2] and Antony J. Williams [6]
[1] Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
[2] National and Kapodistrian University of Athens, Department of Chemistry, Athens, Greece.
[3] Environmental Institute, Koš, Slovak Republic
[4] Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
[5] UFZ – Helmholtz Centre for Environmental Research, Leipzig, Germany
[6] National Center for Computational Toxicology, US EPA, Research Triangle Park, NC, USA
*presenting author: emma.schymanski@uni.lu
Keywords: Non-target screening, Open Science, Mass Spectrometry, Monitoring
Non-target screening (NTS) with high resolution mass spectrometry (HRMS) provides opportunities to discover chemicals, their dynamics and their effects on the environment far beyond the current 45 “priority pollutants” or even “known” chemicals. Open science and the exchange of information (for example, between scientists and regulatory authorities) have a critical role to play in the continuing evolution of NTS. Using a variety of case studies from Europe, this talk will highlight how open science activities such as MassBank.EU (https://massbank.eu), the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) and the NORMAN Digital Sample Freezing Platform (http://norman-data.eu), as well as the US EPA CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/), can support NTS. Further, it will show how initiatives such as near “real time” monitoring of the River Rhine and retrospective screening via so-called “digital freezing” platforms have opened up new potential for exploring the dynamics and distribution even of as-yet-unidentified chemicals. Collaborative European and international activities facilitate data exchange amongst analytical data scientists and enable quick, effective and reproducible provisional compound identification in digitally archived HRMS data. This is leading to new ways of assessing and prioritizing the next generation of “emerging pollutants” in the environment, enabling a pro-active approach to environmental assessment unthinkable only a few years ago. Note: This abstract does not reflect US EPA policy.
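The candidate-retrieval step mentioned above (querying large compound databases by exact mass) can be sketched in a few lines. The tiny "database" and the measured mass below are illustrative, not real query results; the monoisotopic masses are standard literature values.

```python
# Minimal sketch of exact-mass candidate retrieval: keep every structure
# whose monoisotopic mass lies within a ppm tolerance of the measured value.

def within_ppm(measured, candidate_mass, ppm=5.0):
    """True if measured mass is within `ppm` of the candidate mass."""
    return abs(measured - candidate_mass) / candidate_mass * 1e6 <= ppm

database = [  # (name, neutral monoisotopic mass in Da)
    ("caffeine", 194.08038),       # C8H10N4O2
    ("theophylline", 180.06473),   # C7H8N4O2
    ("paraxanthine", 180.06473),   # C7H8N4O2 (isomer)
]

measured = 180.0648  # hypothetical neutral mass from an HRMS feature
hits = [name for name, mass in database if within_ppm(measured, mass)]
```

Note how the two isomers are indistinguishable by exact mass alone, which is exactly why fragmentation data, retention time and metadata are needed downstream.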
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on chemical compound databases. Candidate structures are often retrieved by exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Additional data (e.g. fragmentation, physicochemical properties, reference and data source information) are then used to select potential candidates, depending on the experimental context. However, these strategies require that the substances of interest be present in these compound databases, which is often not the case, as no database can be fully inclusive. A prominent example with clear data gaps is surfactants, used in many products in our daily lives, yet often absent as discrete structures in compound databases. Linear alkylbenzene sulfonates (LAS) are a common, high-use and high-priority surfactant class with highly complex transformation behaviour in wastewater. Despite extensive reports in the environmental literature, few of the LAS and none of the related transformation products were present in any compound database during an investigation into Swiss wastewater effluents, despite these compounds forming the most intense signals. The LAS surfactant class will be used to demonstrate how coupling environmental observations from high resolution mass spectrometry with detailed literature data (expert knowledge) on the transformation of these species can progressively “fill the gaps” in compound databases. The LAS and their transformation products have been added to the CompTox Chemistry Dashboard (https://comptox.epa.gov/) using a combination of “representative structures” and “related structures” starting from the structural information contained in the literature.
By adding this information into a centralized open resource, future environmental investigations can now profit from the expert knowledge previously scattered throughout the literature. Note: This abstract does not reflect US EPA policy.
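Homologous series like the LAS are convenient targets for mass-based screening because successive members differ by exactly one CH2 unit (14.01565 Da). The sketch below computes neutral monoisotopic masses for an assumed LAS series, taking C16H26O3S as the formula of the C10-chain homologue; it is an illustration of the arithmetic, not output from any of the tools named above.

```python
# Illustrative sketch: monoisotopic masses of a homologous series where
# each successive member adds one CH2 unit. Formula of the C10 LAS
# (decylbenzene sulfonic acid) assumed as C16H26O3S.

MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "S": 31.97207117}
CH2 = MASS["C"] + 2 * MASS["H"]  # 14.01565 Da homologue spacing

def monoisotopic(formula):
    """Monoisotopic mass from a dict of element counts."""
    return sum(MASS[el] * n for el, n in formula.items())

c10_las = {"C": 16, "H": 26, "O": 3, "S": 1}
base = monoisotopic(c10_las)  # ~298.16027 Da

# Neutral monoisotopic masses for alkyl chain lengths C10-C13
series = {10 + i: round(base + i * CH2, 5) for i in range(4)}
```

Screening for a fixed mass difference of 14.01565 Da between detected features is a standard way to flag candidate homologue series in non-target data.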
Read-across (RAx) is probably the most widely used strategy to waive demanding new in vivo tests in the toxicological assessment of chemicals. It is based on the possibility of translating available information from well-characterized chemicals (source) to a substance for which there is a toxicological data gap (target). In spite of its widespread use, regulatory acceptance is still limited.
New Approach Methods (NAMs) may be used to confirm chemical and toxicological similarities and to contribute to reducing this uncertainty.
Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software approaches that can be used to support automated structure verification and elucidation, specifically for Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS). This presentation will provide an overview of freely available data, tools, databases and approaches to support chemical structure verification and elucidation, highlight some of the known issues regarding data quality, and suggest approaches for resolving them. The importance of structure and spectral standards for data exchange will be discussed, especially with regard to how spectral data can be made openly available to the community via online tools and through scientific publishing. This work does not necessarily reflect U.S. EPA policy.
High throughput mining of the scholarly literature; talk at NIH - Peter Murray-Rust
The scientific and medical literature contains huge amounts of valuable unused information. This talk shows how to discover it, extract, re-use and interpret it. Wikidata is presented as a key new tool and infrastructure. Everyone can become involved. However some of the barriers to use are sociopolitical and these are identified and discussed.
Building linked data large-scale chemistry platform - challenges, lessons and... - Valery Tkachenko
Chemical databases have been around for decades, but in recent years we have observed a qualitative change from rather small in-house built proprietary databases to large-scale, open and increasingly complex chemistry knowledgebases. This tectonic shift has imposed new requirements for database design and system architecture as well as the implementation of completely new components and workflows which did not exist in chemical databases before. Probably the most profound change is being caused by the linked nature of modern resources - individual databases are becoming nodes and hubs of a huge and truly distributed web of knowledge. This change has important aspects such as data and format standards, interoperability, provenance, security, quality control and metainformation standards.
ChemSpider at the Royal Society of Chemistry was the first public chemical database to incorporate rigorous quality control, introducing both community curation and automated quality checks at the scale of tens of millions of records. Yet we have come to realize that this approach may now be incomplete in a quickly changing world of linked data. In this presentation we will discuss the challenges associated with building modern public and private chemical databases, as well as lessons we have learned from our past and present experience. We will also discuss solutions to some common problems.
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) - Michel Dumontier
In the quest to translate the results of biomedical research into effective clinical applications, many are now trying to make sense of the large and rapidly growing amount of public biomedical data. However, substantial challenges exist in traversing the currently fragmented data landscape. In this talk, I will discuss our efforts to use Semantic Web technologies to facilitate biomedical research through the formulation, publication, integration, and exploration of facts, expert knowledge, and web services.
Identification of unknowns in mass spectrometry-based non-targeted analyses (NTA) requires the integration of complementary pieces of data to arrive at a confident, consensus structure. Researchers use chemical reference databases, spectral matching, fragment prediction tools, retention time prediction tools, and a variety of other data to arrive at tentative, probable and, where possible, confirmed identifications. With the diverse, robust data contained within the US EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov), the goal of this research is to identify and implement a harmonized identification tool and workflow using previously generated chemistry data. Data have been compiled from product use, functional use prediction models, environmental media occurrence prediction models, and PubMed references, among other sources. We will report on our development of a visualization tool with which users can visualize the relative contribution of identification-based metrics for a list of candidate structures and observe the greatest likelihood of occurrence. These data and visualization tools support NTA identification via the Dashboard and demonstrate an open, accessible tool for all users of HRMS data. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Generating Biomedical Hypotheses Using Semantic Web Technologies - Michel Dumontier
With its focus on investigating the nature and basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. Over the past 15 years, hundreds of projects have developed or leveraged ontologies for entity recognition and relation extraction, semantic annotation, data integration, query answering, consistency checking, association mining and other forms of knowledge discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, discover significant biological associations across these data using a set of partially overlapping ontologies, and identify new avenues for drug discovery by applying measures of semantic similarity over phenotypic descriptions. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability and an understanding of how to maximize their value, increasing numbers of biomedical researchers will be strategically poised to pursue increasingly sophisticated KR projects aimed at improving our overall understanding of the capability and behavior of biological systems.
A video of this talk is available in the NTK repository: http://repozitar.techlib.cz/record/784
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi... - Kamel Mansouri
AAAS annual meeting (Boston, Feb 2017)
Humans are potentially exposed to tens of thousands of man-made chemicals in the environment. It is well known that some environmental chemicals mimic natural hormones and thus have the potential to be endocrine disruptors. Most of these environmental chemicals have never been tested for their ability to disrupt the endocrine system, in particular their ability to interact with the estrogen receptor. EPA needs tools to prioritize thousands of chemicals, for instance in the Endocrine Disruptor Screening Program (EDSP). The Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) was intended as a demonstration of the use of predictive computational models on HTS data, including ToxCast and Tox21 assays, to prioritize a large chemical universe of 32464 unique structures for one specific molecular target: the estrogen receptor. CERAPP combined multiple computational models for prediction of estrogen receptor activity and used the predicted results to build a unique consensus model. Models were developed in collaboration between 17 groups in the U.S. and Europe and applied to the common set of chemicals. Structure-based techniques such as docking and several QSAR modeling approaches were employed, mostly using a common training set of 1677 compounds provided by the U.S. EPA, to build a total of 42 classification models and 8 regression models for binding, agonist and antagonist activity. All predictions were evaluated against ToxCast data and an external validation set collected from the literature. To overcome the limitations of single models, a consensus was built by weighting models based on their prediction accuracy scores (including sensitivity and specificity against the training and external sets). Individual model scores ranged from 0.69 to 0.85, showing high prediction reliability. The final consensus predicted 4001 chemicals as actives, to be considered high priority for further testing, and 6742 as suspicious chemicals.
The same approach is now being applied on a larger scale project to predict the potential androgen receptor (AR) activity of chemicals. This project called CoMPARA (Collaborative Modeling Project for Androgen Receptor Activity) is a collaboration between 35 international groups working on a common set of ~55k chemicals.
This abstract does not necessarily reflect U.S. EPA policy
The Advanced Serology Course is a comprehensive and specialized program designed for professionals in the fields of clinical diagnostics, immunology, and laboratory medicine who seek an in-depth understanding of advanced serological techniques and methodologies. This advanced-level course builds upon foundational knowledge in serology and delves into sophisticated concepts and cutting-edge technologies.
The Karolinska Institute (KI) is the largest centre for medical education and research in Sweden and the home of the Nobel Prize in Physiology or Medicine.
KI consists of 22 departments and 600 research groups dedicated to improving human health through research and higher education.
The role of the Kohonen/Grafström team has been to guide the application, analysis, interpretation and storage of so called “omics” technology-derived data within the service-oriented subproject “ToxBank”.
Data analytics to support exposome research course slidesChirag Patel
We present new publicly available tools to bootstrap your own data-driven investigations to correlate the environment with phenotype. Course materials here: http://www.chiragjpgroup.org/exposome-analytics-course/
Mel Reichman on Pool Shark’s Cues for More Efficient Drug DiscoveryJean-Claude Bradley
Mel Reichman, senior investigator and director of the LIMR Chemical Genomics Center at the Lankenau Institute for Medical Research presents at the chemistry department at Drexel University on November 12, 2009.
Modern drug discovery by high-throughput screening (HTS) begins with testing hundreds of thousands of compounds in biological assays. The confirmed hit rate for typical HTS is less than 0.5%; therefore, 99.5% of the costs of HTS are for generating null data. Orthogonal convolution of compound libraries (OCL) is 500% more efficient than present HTS practice. The OCL method combines 10 compounds per well. An advantage of this method is that each compound is represented twice in two separately arrayed pools. The potential for the approach to better enable academic centers of excellence to validate medicinally relevant biological targets is discussed.
The importance of data curation on QSAR Modeling: PHYSPROP open data as a cas...Kamel Mansouri
This presentation highlighted how data curation impacts the reliability of QSAR models. We examined key datasets related to environmental endpoints to validate across chemical structure representations (e.g., mol file and SMILES) and identifiers (chemical names and registry numbers), and approaches to standardize data into QSAR-ready formats prior to modeling procedures. This allowed us to quantify and segregate data into quality categories. This improved our ability to evaluate the resulting models that can be developed from these data slices, and to quantify to what extent efforts developing high-quality datasets have the expected pay-off in terms of predicting performance. The most accurate models that we build will be accessible via our public-facing platform and will be used for screening and prioritizing chemicals for further testing.
MelTree: A Novel Workflow for the Automated Identification of a Large Number ...Kate Barlow
Jean-Christophe Avarre, Head of the High Throughput qPCR Platform and Research Group Leader, University of Montpellier, France
Nucleic acid characterization by High Resolution Melting (HRM) is a simple, flexible, low-cost and powerful technique for identifying sequence variations, making it attractive for a broad range of diagnostic and research applications. However, current procedures for analyzing HRM curves are not suited for large data sets.
For this reason, we have developed an innovative method that enables the simultaneous discrimination of a large number of variants from their HRM profile. This method relies on the establishment of a melting profile library and offers a fully automated analysis workflow.
Using this workflow, it was possible to discriminate 19 nontuberculous mycobacterial species from their HRM profile. It was also possible to design a multiplex diagnostic tool targeting five pathogens responsible for abortive diseases in cattle.
Presented by Richard Kidd at "The Future Information Needs of Pharmaceutical & Medicinal Chemistry", Monday 28 November 2011 at The Linnean Society, Burlington Square, London run by the RSC CICAG group.
The Role of Retention Time in Untargeted MetabolomicsJan Stanstrup
This presentation is about why retention time is important in untargeted metabolomics and how it can be used to identify compounds without having access to standards.
Toxicological Screening and Quantitation Using Liquid Chromatography/Time-of-...Annex Publishers
In recent years, an increasing number of new designer-drugs have increased the demands for general toxicological screening [1]. Limited screening based on immunoassays is commonly used in clinical toxicology, whereas more comprehensive approaches are common in forensic toxicology such as screening based on Gas or Liquid Chromatography (GC or LC) approaches [2]. The classic approach has been gas chromatography-Mass Spectrometry (GC-MS) combined with LC-diode-array detection (DAD) for systematic toxicological analysis. This setup has the advantage of covering a very broad spectrum of drugs and illicit substances when combined with library search facilities. However, the analytical sensitivity and specificity of LC-DAD may not be optimal. Thus, more sensitive and specific screening techniques based exclusively on LC combined with mass spectrometry have gained popularity. Multi-target screening and quantitation methods based on LC-tandem mass spectrometry (MS/MS) may provide detection of hundred or more compounds [3]. Using ion-trap MSn detection, several hundred compounds can be detected [4]. A more extended screening is possible using time-of-flight (TOF) mass spectrometry, which is a high-resolution mass spectrometry technique that detects drugs on the basis of their exact mass. Using this technique, scanning is performed over all masses for low molecular drugs, and detected signals can be related to a library of exact drug masses. Retention time and fragmentation pattern contribute to the identification. In principle, it is possible to screen for thousands of compounds, although issues related to software capabilities may limit the number of compounds to several hundred in daily practice [5-7].
Examining gene expression and methylation with next gen sequencingStephen Turner
Slides on RNA-seq and methylation studies using next-gen sequencing given at the University of Miami Hussman Institute for Human Genomics "Genetic Analysis of Complex Human Diseases" course in 2012 (http://hihg.med.miami.edu/educational-programs/analysis-of-complex-human-diseases/genetic-analysis-of-complex-human-diseases/)
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxRASHMI M G
Abnormal or anomalous secondary growth in plants. It defines secondary growth as an increase in plant girth due to vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
Toxic effects of heavy metals : Lead and Arsenicsanjana502982
Heavy metals are naturally occuring metallic chemical elements that have relatively high density, and are toxic at even low concentrations. All toxic metals are termed as heavy metals irrespective of their atomic mass and density, eg. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
2. Environmental Cheminformatics and Biomedicine?
Chemistry (what?), Medicine (why?), Biology (how?)
Model systems: patient cohort, yeast, mouse, human, zebrafish, microbiome
3. Our challenge? We still have many unknowns …
[Figures: wastewater and cell data. (l) Data from Schymanski et al. 2014, ES&T, DOI: 10.1021/es4044374; (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich.]
4. Target, Suspect and Non-Target Screening
Common workflow: sampling, extraction (SPE), HPLC separation, HR-MS/MS.
TARGET ANALYSIS (knowns): targets found; confirmation and quantification of compounds present.
SUSPECT SCREENING (suspects): spectrum search and spectral match; suspects found; candidate selection (retention time, MS/MS, calculated properties).
NON-TARGET SCREENING (no prior knowledge): masses of interest (molecular formula); database search or structure generation; candidate selection as above.
Time, effort and number of compounds all increase from target to non-target screening.
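All three tiers start from accurate-mass matching of detected ions against a list of known or suspected masses. A minimal illustrative sketch of that first step (function names and the example target list are my own; the 5 ppm tolerance mirrors the slides, and the [M-H]- masses are standard values):

```python
# Illustrative accurate-mass target matching with a ppm tolerance window.

def ppm_window(mz: float, ppm: float = 5.0) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a given ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

def match_targets(measured_mz: float, targets: dict[str, float],
                  ppm: float = 5.0) -> list[str]:
    """Return names of targets whose ion mass falls within the ppm window."""
    low, high = ppm_window(measured_mz, ppm)
    return [name for name, mz in targets.items() if low <= mz <= high]

# Small example target list: compound name -> [M-H]- exact mass (Da)
targets = {"acesulfame": 161.9867, "saccharin": 181.9917, "diclofenac": 294.0094}
print(match_targets(161.9866, targets))  # ['acesulfame']
```

In real target analysis the match is then confirmed with retention time and MS/MS against a reference standard; the mass window alone is only the entry point.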
5. Identification Strategies and Confidence
Schymanski et al. 2014, ES&T, DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
Non-target HR-MS(/MS) acquisition is followed by peak picking (or XICs), then target screening (target list), suspect screening (suspect list) or non-target screening.
Level 1: Confirmed structure, by reference standard
Level 2: Probable structure, by library/diagnostic evidence
Level 3: Tentative candidate(s): suspect, substructure, class
Level 4: Unequivocal molecular formula, insufficient structural evidence
Level 5: Mass of interest: multiple detection, trends, …
Identification confidence increases from Level 5 to Level 1; contradictory evidence can "downgrade" an identification.
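Read as a decision rule, the scheme assigns the highest level the available evidence supports; a toy encoding (the argument names are my own, the levels follow the published scheme):

```python
def identification_level(has_reference_standard: bool,
                         has_library_or_diagnostic_evidence: bool,
                         has_tentative_candidate: bool,
                         has_unequivocal_formula: bool) -> int:
    """Map available evidence to a confidence level (1 = best, 5 = weakest)."""
    if has_reference_standard:
        return 1  # confirmed structure
    if has_library_or_diagnostic_evidence:
        return 2  # probable structure
    if has_tentative_candidate:
        return 3  # tentative candidate(s)
    if has_unequivocal_formula:
        return 4  # unequivocal molecular formula
    return 5      # mass of interest

print(identification_level(False, True, True, True))  # 2
```

The "downgrading" on the slide corresponds to evidence flags being revoked when contradictory data appear, which moves a peak back down this cascade.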
7. Target Analysis: Status Quo (>364 targets)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Target list; sampling, extraction (SPE), HPLC separation, HR-MS/MS; targets found; confirmation and quantification of compounds present. Transformation products (TPs!) also appear.
8. Target Analysis: Status Quo (>364 targets), continued
[Figure: detected targets plotted as m/z vs. retention time (RT).]
9. Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Prominent among the top 30 peaks: artificial sweeteners and diclofenac.
Pictures: www.coca-cola.com; www.rivella.ch; www.voltarengel.com
10. Suspect Screening: Different Approaches
Workflow: target and suspect lists; sampling, extraction (SPE), HPLC separation, HR-MS/MS; targets and suspects found; candidate selection (retention time, MS/MS, calculated properties); confirmation and quantification of compounds present.
o Search in mass spectral libraries
o Screen for predicted transformation products of known parent compounds
o Look for "well known" substances without reference standards
o Screen for known homologue series
11. Searching Mass Spectral Libraries
o … which one?
Peisl, Schymanski & Wilmes, 2018, Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
12. Do we need all these libraries?
Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005
o Yes … most libraries still have many unique entries
[Figure: overlap of library content, e.g. HMDB, GNPS, MassBank, ReSpect; compound lists provided by S. Stein, R. Mistrik, Agilent.]
13. Mind the Gap!
Frainay, C. et al. (2018) "Mind the Gap: …", Metabolites: http://www.mdpi.com/2218-1989/8/3/51
o Best library to choose depends highly on your dataset
• Example: MSforID (https://msforid.com/) is poor for metabolic networks, but great for forensic toxicology!
15. Creating High-Quality Mass Spectra
Stravs, Schymanski, Singer and Hollender, 2013, Journal of Mass Spectrometry, 48, 89-99. DOI: 10.1002/jms.3131
Automatic MS and MS/MS recalibration and clean-up: remove interfering peaks; annotate spectra with experimental details and compound information.
https://github.com/MassBank/RMassBank/
http://bioconductor.org/packages/RMassBank/
16. Communicating Mass Spectra for Mixtures
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
[Structure: SPA-9C surfactant, m+n=6.]
Chromatography and MS/MS annotation; literature (LIT00034,35), sample (ETS00002) and standard (ETS00016,17,19,20) spectra.
Formulas: http://sourceforge.net/projects/genform/ (Meringer et al., 2011, MATCH 65, 259-290)
Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
https://github.com/MassBank/RMassBank/
17. Suspect Screening: Different Approaches
(Overview slide repeated; see slide 10.)
18. Suspect Screening: Benzotriazole TPs
Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z
28 suspects; HPLC separation and HR-MS/MS; suspect screening yielded 11 masses for 6 suspect formulas, 7 with MS/MS and 1 with a reference standard: 1 TP confirmed, 1 TP "likely" (no standard).
Pathway prediction system: [UM-PPS] → Eawag-PPS → [enviPath]
19. Suspect Screening: Benzotriazole TPs (continued)
Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z
[Structures: two benzotriazole TPs.] One was predicted with Eawag-PPS, with no standard and not in databases (at that time); the other was confirmed with a reference standard and observed in WWTP effluents.
21. Suspect Screening: Different Approaches
(Overview slide repeated; see slide 10.)
22. European (World-)Wide Exchange of Suspects
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
NORMAN Suspect List Exchange:
http://www.norman-network.com/?q=node/236
23. NORMAN Suspect List Exchange
o http://www.norman-network.com/?q=node/236
Schymanski, Aalizadeh et al., in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
[Screenshot: full lists and references.]
24. NORMAN SusDat (merged list)
https://www.norman-network.com/nds/susdat/
https://comptox.epa.gov/dashboard/chemical_lists/susdat
Schymanski, Aalizadeh et al. in prep; https://www.researchgate.net/project/Supporting-Mass-Spectrometry-Through-Cheminformatics
25. Suspect Screening: Different Approaches
(Overview slide repeated; see slide 10.)
26. RECAP: Target Analysis: Status Quo (>364 targets)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
[Target analysis workflow and m/z vs. RT figure repeated from slides 7-8.]
28. Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
[Structures: sulfonate-type fragment ions at m/z = 79.96 and m/z = 183.01.]
Picture: www.momsteam.com
29. Surfactant Screening From Literature
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Literature sources:
o Formulas, masses (ions), retention times and intensities
o Spectra of selected compounds (different instruments)
Gonzalez et al., Rapid Comm. Mass Spec. 2008, 22: 1445-54; Lara-Martin et al., EST, 2010, 44: 1670-1676
30. Homologous Series Detection
M. Loos & H. Singer, 2017, J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374
http://www.envihomolog.eawag.ch/
Search for discrete mass differences between detected masses.
[Structures: two surfactant homologue series with repeating units m, n.]
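The mass-difference search can be sketched as chaining peaks separated by a repeating-unit mass; this mimics the idea behind the homologue-detection tools above, not their actual code (peak values and tolerance are made up for illustration):

```python
CH2 = 14.01565  # exact mass of a CH2 repeating unit (Da)

def find_series(mzs, unit=CH2, tol=0.002, min_length=3):
    """Find chains m, m+unit, m+2*unit, ... in a peak list, within tolerance."""
    mzs = sorted(mzs)
    series = []
    for start in mzs:
        # skip starts already reachable from a smaller member of a series
        if any(abs(start - unit - m) <= tol for m in mzs):
            continue
        chain = [start]
        while True:
            target = chain[-1] + unit
            nxt = next((m for m in mzs if abs(m - target) <= tol), None)
            if nxt is None:
                break
            chain.append(nxt)
        if len(chain) >= min_length:
            series.append(chain)
    return series

peaks = [199.170, 283.264, 297.280, 311.295, 325.311]
print(find_series(peaks))  # [[283.264, 297.280, 311.295, 325.311]]
```

Real tools additionally constrain retention time trends along a series and support several repeating units (e.g. CH2 or C2H4O for ethoxylates); this sketch keeps only the mass-difference core.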
31. Homologous Series Detection (continued)
M. Loos & H. Singer, 2017, J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374
[Structures: surfactant homologue classes DATS, SPAC and STAC.]
http://www.envihomolog.eawag.ch/
32. Supporting Evidence for Homologues
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
[SPA-9C example (m+n=6) repeated from slide 16: chromatography and MS/MS annotation; literature (LIT00034,35), sample (ETS00002) and standard (ETS00016,17,19,20) spectra; formulas via GenForm (Meringer et al., 2011, MATCH 65, 259-290); data from Schymanski et al. 2014, ES&T, 48: 1811-1818.]
https://github.com/MassBank/RMassBank/
33. Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Annotated peaks include acesulfame, diclofenac, cyclamate, saccharin and surfactant homologues: C10DATS, C10SPAC, SPA5C, C15DATS, STA6C, C9DATS, SPA2DC.
[Structures: SPAC and DATS surfactant classes.]
34. Cross-Linking Homologues in the Dashboard
Schymanski, Grulke, Williams et al., in prep. & Williams et al. 2017, J. Cheminformatics, 9:61, DOI: 10.1186/s13321-017-0247-6
https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
35. What about Non-Target Screening?
The full workflow now runs with no prior information: sampling, extraction (SPE), HPLC separation, HR-MS/MS; target analysis and suspect screening as before; NON-TARGET SCREENING yields masses of interest (molecular formula), followed by database search or structure generation, candidate selection (retention time, MS/MS, calculated properties) and confirmation/quantification. The number of compounds grows toward the non-target end.
36. [Non-target workflow, annotated with tools]
Sampling, extraction (SPE), HPLC separation, HR-MS/MS; conversion (Proteowizard) and peak picking (enviPick, xcms, MZmine, …); detection of blank/blind/noise/internal standards and time trend analysis (enviMass); componentization (nontarget); prioritization (enviMass); MS/MS extraction (RMassBank); gather evidence (nontarget, ReSOLUTION, RMassBank); molecular formula determination (enviPat, GenForm); suspect lists (e.g. NORMAN, LMC, Eawag-PPS, ReSOLUTION); non-target identification (MetFrag2.3, ReSOLUTION); interpretation, confirmation, peak inventory, confidence and reporting.
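The molecular formula determination step (GenForm, enviPat in the workflow above) boils down to testing which element compositions reproduce a measured mass. A minimal forward check with standard monoisotopic masses (the toluene example is arbitrary and mine, not from the slides):

```python
# Monoisotopic masses of a few elements (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207069}

def exact_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of a neutral molecule given as element counts."""
    return sum(MONO[el] * n for el, n in formula.items())

def formula_matches(formula: dict[str, int], measured: float,
                    ppm: float = 5.0) -> bool:
    """True if the formula's exact mass is within ppm of the measured mass."""
    m = exact_mass(formula)
    return abs(m - measured) <= m * ppm / 1e6

toluene = {"C": 7, "H": 8}
print(round(exact_mass(toluene), 4))      # 92.0626
print(formula_matches(toluene, 92.0625))  # True
```

Full formula generators run this check in reverse, enumerating candidate compositions under element and chemical-plausibility constraints, and score them against isotope and fragment patterns as well.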
37. MetFrag Relaunched!
Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016), J. Cheminf., DOI: 10.1186/s13321-016-0115-9
Status: 2010 => 2019
Example: m/z 213.9637 ([M-H]-, RT 4.54 min) searched in ChemSpider or PubChem within ± 5 ppm (fragment tolerance 0.001 Da); 355 InChI/RT pairs. Candidate scoring draws on references, external refs, data sources, RSC count, PubMed count, suspect lists and MS/MS peaks (134.0054 / 339689; 150.0001 / 77271; 213.9607 / 632466), with element constraints C, N, S.
[Structure: sulfonic acid candidate.]
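The ± 5 ppm database query in the example above is simple arithmetic on the measured ion mass; a sketch of that window calculation (the proton mass is a standard constant, the m/z value is the one from the slide, and the actual database query step is omitted):

```python
PROTON = 1.007276  # proton mass (Da)

def neutral_mass_from_mh_minus(mz: float) -> float:
    """Neutral monoisotopic mass corresponding to an [M-H]- ion."""
    return mz + PROTON

def search_window(mass: float, ppm: float = 5.0) -> tuple[float, float]:
    """Mass window to query in a compound database (e.g. PubChem)."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta

m = neutral_mass_from_mh_minus(213.9637)   # measured [M-H]- from the slide
low, high = search_window(m)
print(f"neutral mass {m:.4f} Da, query window {low:.4f}-{high:.4f} Da")
```

Every structure in the window becomes a candidate; MetFrag then fragments each candidate in silico and scores the fragments against the measured MS/MS peaks, alongside the metadata terms listed on the slide.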
38. MetFrag Relaunched!
Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016), J. Cheminf., DOI: 10.1186/s13321-016-0115-9
Try the web interface: http://msbi.ipb-halle.de/MetFragBeta/
39. MetFrag Relaunched! (web interface, continued)
40. State of the Art in Small Molecule Identification
Schymanski et al., 2017, J. Cheminf., DOI: 10.1186/s13321-017-0207-1; www.casmi-contest.org
Metadata is critical to improving annotation of known unknowns!
44. Connecting and Enhancing Open Resources
https://www.slideshare.net/EmmaSchymanski/small-molecules-in-big-data-analytica-munich
o Sharing knowledge is a win-win situation
2014-2015: found in waters across Europe; 2016: 1 datapoint cross-annotates 3072 in GNPS (hits in GNPS MassIVE datasets; surfactants: http://goo.gl/7sY9Pf); 2017: Early-Warning System is born; 2018: highlighted in Science.
45. NORMAN Digital Sample Freezing Platform
"Live" retrospective screening of known and unknown chemicals in European samples (various matrices).
http://norman-data.eu/ and Alygizakis et al., submitted.
46. NORMAN Digital Sample Freezing Platform
Retrospective screening of REACH chemicals in Black Sea samples (various matrices).
Interactive heatmap available at http://norman-data.eu/NORMAN-REACH
47. Real-time Monitoring of the Rhine River
Hollender, Schymanski, Singer & Ferguson, 2018, ES&T Feature, 51:20, 11505-11512. DOI: 10.1021/acs.est.7b02184
Previously unknown chemicals detected due to "stand-out" patterns.
48. NORMAN Digital Sample Freezing Platform
Future work: use results of unknowns to drive prioritization efforts.
http://norman-data.eu/ and Alygizakis et al., submitted.
50. Take Home Messages: Unknowns and High Resolution Mass Spectrometry
o Over 60 % of HR-MS peaks are potentially relevant but unknown
[Figures: environment and cell data.]
51. Take Home Messages: Unknowns and High Resolution Mass Spectrometry
o Over 60 % of HR-MS peaks are potentially relevant but unknown
o Annotating unknowns requires data and evidence from many different sources
o Many excellent workflows are available to collate this information
o Incorporation of all available metadata is critical to success!
o E.g. MetFrag has greatly improved the speed and success of tentative identification of "known unknowns": 15 % => 89 % ranked number 1
o https://ipb-halle.github.io/MetFrag/
52. Take Home Messages: Unknowns and High Resolution Mass Spectrometry
o Over 60 % of HR-MS peaks are potentially relevant but unknown
o Annotating unknowns requires data and evidence from many different sources
o Exchange expert knowledge worldwide: community efforts contribute greatly to improved cross-annotation
o Information in the public domain helps everyone! You never know when it will help you.
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018, ES&T, DOI: 10.1021/acs.est.8b00365