SETAC Rome Non-Target Screening For Chemical DiscoveryEmma Schymanski
SETAC Special Session: Solutions for emerging pollutants – Towards a holistic chemical quality status assessment in European freshwater resources
Non-target Screening for Holistic Chemical Monitoring and Compound Discovery: Open Science, Real-time and Retrospective Approaches
Emma L. Schymanski [1*], Reza Aalizadeh [2], Nikiforos Alygizakis [3], Juliane Hollender [4], Martin Krauss [5], Tobias Schulze [5], Jaroslav Slobodnik [3], Nikolaos S. Thomaidis [2] and Antony J. Williams [6]
[1] Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
[2] National and Kapodistrian University of Athens, Department of Chemistry, Athens, Greece.
[3] Environmental Institute, Koš, Slovak Republic
[4] Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
[5] UFZ – Helmholtz Centre for Environmental Research, Leipzig, Germany
[6] National Centre for Computational Toxicity, US EPA, Research Triangle Park, Durham, NC, USA
*presenting author: emma.schymanski@uni.lu
Keywords: Non-target screening, Open Science, Mass Spectrometry, Monitoring
Non-target screening (NTS) with high resolution mass spectrometry (HR-MS) provides opportunities to discover chemicals, their dynamics and effects on the environment far beyond the current 45 “priority pollutants” or even “known” chemicals. Open science and the exchange of information (between for example scientists and regulatory authorities) has a critical role to play in the continuing evolution of NTS. Using a variety of case studies from Europe, this talk will highlight how open science activities such as MassBank.EU (https://massbank.eu), the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) and NORMAN Digital Sample Freezing Platform (http://norman-data.eu) as well as the US EPA CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/) can support NTS. Further, it will show how initiatives such as near “real time” monitoring of the River Rhine and retrospective screening via so-called “digital freezing” platforms have opened up new potential for exploring the dynamics and distribution even of as-yet-unidentified chemicals. Collaborative European and international activities facilitate data exchange amongst analytical data scientists and enable quick, effective and reproducible provisional compound identification in digitally archived HR-MS data. This is leading to new ways of assessing and prioritizing the next generation of “emerging pollutants” in the environment, enabling a pro-active approach to environmental assessment unthinkable only a few years ago. Note: This abstract does not reflect US EPA policy.
Small Molecules in Big Data - Analytica MunichEmma Schymanski
Finding small molecules in big data
E.L. Schymanski, Belvaux/LU, A.J. Williams, North Carolina/USA
Dr. Emma L. Schymanski, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Metabolomics and exposomics are amongst the youngest and most dynamic of the omics disciplines. While the molecules involved are smaller than proteomics and the other, larger “omics”, the challenges are in many ways greater. Elements are less constrained, there are no given “puzzle pieces” and there is a resulting explosion in terms of potential chemical space. It is impossible to even enumerate all chemically possible small molecules. The challenges and complexity of identifying small molecules even using the most advanced analytical technologies available today is immense. Current “big data” methods for small molecules rely heavily on chemical databases, the largest of which presently available contain ~100 million chemicals. Despite this large number, high resolution mass spectrometry (HR-MS) measurements contain tens of thousands of features, of which only a few percent can be annotated as “known” and confirmed as metabolites or chemicals of interest using these chemical databases. How can we find relevant small molecules in the ever increasing data loads? How can we annotate more of the unknown features in HR-MS experiments? This talk will present European, US and worldwide initiatives to help find small molecules in big data - from chemical databases to spectral libraries, real-time monitoring to retrospective screening. It will touch on the challenges of standardized structure representations, data curation and deposition. Finally, it will show how interdisciplinary communication, data sharing and pushing the boundaries of current capabilities can facilitate research efforts in metabolomics, exposomics and beyond. This abstract does not necessarily represent U.S. EPA policy.
DMCM2018 Community Resources Connecting Chemistry and Toxicity KnowledgeEmma Schymanski
Community Resources Connecting Chemistry and Toxicity Knowledge to Environmental Observations presented at the Disease Map Community Meeting 2018 in Paris
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
Chemical databases have been around for decades, but in recent years we have observed a qualitative change from rather small in-house built proprietary databases to large-scale, open and increasingly complex chemistry knowledgebases. This tectonic shift has imposed new requirements for database design and system architecture as well as the implementation of completely new components and workflows which did not exist in chemical databases before. Probably the most profound change is being caused by the linked nature of modern resources - individual databases are becoming nodes and hubs of a huge and truly distributed web of knowledge. This change has important aspects such as data and format standards, interoperability, provenance, security, quality control and metainformation standards.
ChemSpider at the Royal Society of Chemistry was first public chemical database which incorporated rigorous quality control by introducing both community curation and automated quality checks at the scale of tens of millions of records. Yet we have come to realize that this approach may now be incomplete in a quickly changing world of linked data. In this presentation we will talk about challenges associated with building modern public and private chemical databases as well as lessons that we have learned from our past and present experience. We will also talk about solutions for some common problems.
SETAC Rome Non-Target Screening For Chemical DiscoveryEmma Schymanski
SETAC Special Session: Solutions for emerging pollutants – Towards a holistic chemical quality status assessment in European freshwater resources
Non-target Screening for Holistic Chemical Monitoring and Compound Discovery: Open Science, Real-time and Retrospective Approaches
Emma L. Schymanski [1*], Reza Aalizadeh [2], Nikiforos Alygizakis [3], Juliane Hollender [4], Martin Krauss [5], Tobias Schulze [5], Jaroslav Slobodnik [3], Nikolaos S. Thomaidis [2] and Antony J. Williams [6]
[1] Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
[2] National and Kapodistrian University of Athens, Department of Chemistry, Athens, Greece.
[3] Environmental Institute, Koš, Slovak Republic
[4] Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
[5] UFZ – Helmholtz Centre for Environmental Research, Leipzig, Germany
[6] National Centre for Computational Toxicity, US EPA, Research Triangle Park, Durham, NC, USA
*presenting author: emma.schymanski@uni.lu
Keywords: Non-target screening, Open Science, Mass Spectrometry, Monitoring
Non-target screening (NTS) with high resolution mass spectrometry (HR-MS) provides opportunities to discover chemicals, their dynamics and effects on the environment far beyond the current 45 “priority pollutants” or even “known” chemicals. Open science and the exchange of information (between for example scientists and regulatory authorities) has a critical role to play in the continuing evolution of NTS. Using a variety of case studies from Europe, this talk will highlight how open science activities such as MassBank.EU (https://massbank.eu), the NORMAN Suspect Exchange (http://www.norman-network.com/?q=node/236) and NORMAN Digital Sample Freezing Platform (http://norman-data.eu) as well as the US EPA CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/) can support NTS. Further, it will show how initiatives such as near “real time” monitoring of the River Rhine and retrospective screening via so-called “digital freezing” platforms have opened up new potential for exploring the dynamics and distribution even of as-yet-unidentified chemicals. Collaborative European and international activities facilitate data exchange amongst analytical data scientists and enable quick, effective and reproducible provisional compound identification in digitally archived HR-MS data. This is leading to new ways of assessing and prioritizing the next generation of “emerging pollutants” in the environment, enabling a pro-active approach to environmental assessment unthinkable only a few years ago. Note: This abstract does not reflect US EPA policy.
Small Molecules in Big Data - Analytica MunichEmma Schymanski
Finding small molecules in big data
E.L. Schymanski, Belvaux/LU, A.J. Williams, North Carolina/USA
Dr. Emma L. Schymanski, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Metabolomics and exposomics are amongst the youngest and most dynamic of the omics disciplines. While the molecules involved are smaller than proteomics and the other, larger “omics”, the challenges are in many ways greater. Elements are less constrained, there are no given “puzzle pieces” and there is a resulting explosion in terms of potential chemical space. It is impossible to even enumerate all chemically possible small molecules. The challenges and complexity of identifying small molecules even using the most advanced analytical technologies available today is immense. Current “big data” methods for small molecules rely heavily on chemical databases, the largest of which presently available contain ~100 million chemicals. Despite this large number, high resolution mass spectrometry (HR-MS) measurements contain tens of thousands of features, of which only a few percent can be annotated as “known” and confirmed as metabolites or chemicals of interest using these chemical databases. How can we find relevant small molecules in the ever increasing data loads? How can we annotate more of the unknown features in HR-MS experiments? This talk will present European, US and worldwide initiatives to help find small molecules in big data - from chemical databases to spectral libraries, real-time monitoring to retrospective screening. It will touch on the challenges of standardized structure representations, data curation and deposition. Finally, it will show how interdisciplinary communication, data sharing and pushing the boundaries of current capabilities can facilitate research efforts in metabolomics, exposomics and beyond. This abstract does not necessarily represent U.S. EPA policy.
DMCM2018 Community Resources Connecting Chemistry and Toxicity KnowledgeEmma Schymanski
Community Resources Connecting Chemistry and Toxicity Knowledge to Environmental Observations presented at the Disease Map Community Meeting 2018 in Paris
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
Chemical databases have been around for decades, but in recent years we have observed a qualitative change from rather small in-house built proprietary databases to large-scale, open and increasingly complex chemistry knowledgebases. This tectonic shift has imposed new requirements for database design and system architecture as well as the implementation of completely new components and workflows which did not exist in chemical databases before. Probably the most profound change is being caused by the linked nature of modern resources - individual databases are becoming nodes and hubs of a huge and truly distributed web of knowledge. This change has important aspects such as data and format standards, interoperability, provenance, security, quality control and metainformation standards.
ChemSpider at the Royal Society of Chemistry was first public chemical database which incorporated rigorous quality control by introducing both community curation and automated quality checks at the scale of tens of millions of records. Yet we have come to realize that this approach may now be incomplete in a quickly changing world of linked data. In this presentation we will talk about challenges associated with building modern public and private chemical databases as well as lessons that we have learned from our past and present experience. We will also talk about solutions for some common problems.
The Open PHACTS project delivers an online platform integrating a wide variety of data from across chemistry and the life sciences and an ecosystem of tools and services to query this data in support of pharmacological research, turning the semantic web from a research project into something that can be used by practising medicinal chemists in both academia and industry. In the summer of 2015 it was the first winner of the European Linked Data Award. At the Royal Society of Chemistry we have provided the chemical underpinnings to this system and in this talk we review its development over the past five years. We cover both our early work on semantic modelling of chemistry data for the Open PHACTS triplestore and more recent work building an all-purpose data platform, for which the Open PHACTS data has been an important test case, what has worked well, what's missing and where this is is likely to go in future.
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on chemical compound databases. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Additional data (e.g. fragmentation, physicochemical properties, reference and data source information) is then used to select potential candidates, depending on the experimental context. However, these strategies require the presence of substances of interest in these compound databases, which is often not the case as no database can be fully inclusive. A prominent example with clear data gaps are surfactants, used in many products in our daily lives, yet often absent as discrete structures in compound databases. Linear alkylbenzene sulfonates (LAS) are a common, high use and high priority surfactant class that have highly complex transformation behaviour in wastewater. Despite extensive reports in the environmental literature, few of the LAS and none of the related transformation products were reported in any compound databases during an investigation into Swiss wastewater effluents, despite these forming the most intense signals. The LAS surfactant class will be used to demonstrate how the coupling of environmental observations with high resolution mass spectrometry and detailed literature data (expert knowledge) on the transformation of these species can be used to progressively “fill the gaps” in compound databases. The LAS and their transformation products have been added to the CompTox Chemistry Dashboard (https://comptox.epa.gov/) using a combination of “representative structures” and “related structures” starting from the structural information contained in the literature. By adding this information into a centralized open resource, future environmental investigations can now profit from the expert knowledge previously scattered throughout the literature. Note: This abstract does not reflect US EPA policy.
Developing tools for high resolution mass spectrometry-based screening via th...Andrew McEachran
Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized how chemicals are detected in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemistry Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow including visualization and access via the Chemistry Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a freely available software tool to support structure identification and non-targeted analysis. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
3D printing & open access databases for crystallographic college educationVincent Scalfani
Moeck, P.; Stone-Sundberg, J.; Snyder, T.J.; Kaminsky, W.; Gražulis, S.; and all members of the International Advisory Board of the Crystallography Open Database. In 3D printing & open access databases for crystallographic college education, IUCr 23rd Congress on Crystallography, Montreal, Canada, 2014.
How can you access PubChem programmatically?Sunghwan Kim
Presented at the 255th American Chemical Society (ACS) National Meeting in New Orleans, LA (March. 19, 2018).
Building automated workflows that exploit the vast amount of data contained in PubChem requires programmatic access to the data through application programming interfaces (APIs). PubChem provides several programmatic access routes to its data, including Entrez Utilities (E-Utilities or E-Utils), PubChem Power User Gateway (PUG), PUG-SOAP, PUG-REST, PUG-View, and a REST-ful interface to PubChemRDF. This presentation provides an overview of these programmatic access tools, including recent updates, limitations, usage policies, and best practices.
*References*
(1) PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem, Nucleic Acids Research, 2015, 43(W1):W605–W611. https://doi.org/10.1093/nar/gkv396
(2) An update on PUG-REST: RESTful interface for programmatic access to PubChem, Nucleic Acids Research, 2018, 46(W1):gky294. https://doi.org/10.1093/nar/gky294
G. Tautkeviciene and I. Ceseviciute. Open Science in Horizon 2020: Open Aire ...Aidis Stukas
Access to scientific information accelerates research, discovery and economic development. For more than a decade libraries take part in the open access movement to develop an environment which provides access to scientific information. Open access stakeholders and experts are working together to achieve these aims. The presentation will elaborate on the infrastructure of open access which bridges the gap between academia and society by sharing research findings in new ways.
OpenAIRE is an initiative by the European Commission for an Open Access Infrastructure for Research in Europe to support open scholarly communication, Open Science and access to the research output of the European funded projects. The infrastructure of OpenAIRE supports the Open Science policy implemented in the programme Horizon2020 by the European Commission.
Vaidas Morkevičius - Open Data for Enhancing the Quality of ResearchAidis Stukas
The European Social Survey (ESS) is an academically driven cross-national survey that has been conducted across Europe since 2001. The presentation will focus on three pillars of quality enhancement in social research: good data, good documentation and good analytics.
Synopsis:
D3.js is a Javascript library primarily used to create interactive data visualizations in the browser. Despite its growing popularity and warm community, getting started with D3 can be tricky. This talk covers the basics of D3 and sheds light on some of its main conceptual hurdles. It concludes by discussing some applications of D3 to big data.
About the speaker:
Sam Selikoff [ http://www.samselikoff.com/ | @samselikoff ] is a self-taught full-stack web developer. Formerly a graduate student of economics and finance, he unexpectedly discovered a passion for programming while doing data work for a consulting firm. He is currently focusing on client-side MVC and data visualization.
Thanks to our Sponsors
Microsoft [ http://microsoftnewengland.com ] for providing awesome venue for the event.
Rovi [ http://rovi.com ] for providing the food/drinks.
cognizeus [ http://cognizeus.com ] for hosting the event and providing books to give away as raffle.
Where do data visualizations come from? Flatiron students Emily Simonton and Mandy Yeung talk D3—a Javascript Library for manipulating documents based on data.
Brigita Serafinavičiūtė. Open science – The Essential Ingredient of Our FutureAidis Stukas
3O - is the new popular abbreviation in the European Research Area. Pushed forward by Commissioner for Research, Science and Innovation it stands for "Open Science, Open Innovation, Open to the World". What is the ideology behind this? Maybe it is just a nicely coined catchy expression? What is the role of Open Science for our future? How far have we gone towards this vision?
Crystallography orientations of Silicon wafers (<100> and <110>) with different primary flats, the positions of <111> planes, and the visualization of etching profiles in crystal level.
More information and resources at - http://ow.ly/yq3OR
2014 is the UNESCO International Year of Crystallography, celebrating 100 years since X-ray diffraction allowed scientists to study the detailed structure of crystalline materials. Now crystallography is used in practically all science disciplines, from geologists analysing and dating meteorites to chemists synthesising brand new drugs to fight disease. Advanced technology, including synchrotrons and electron microscopes, now allow scientists to ‘see’ the structure of a variety of materials, including proteins, viruses, and drugs.
This session will feature Prof Andrea Gerson, who uses crystallography in her current research at UniSA. Andrea is internationally recognised, and can speak firsthand about the exciting work happening at the Australian Synchrotron. Teachers watching online will be able to ask Andrea questions using the RiAus chatroll.
This online session, targeted at Years 7-10 teachers, will outline the scientific principles behind crystallography, how it is being used currently and the problems it could help to solve in the future. Potential career paths and in-class activities will also be discussed.
The Open PHACTS project delivers an online platform integrating a wide variety of data from across chemistry and the life sciences and an ecosystem of tools and services to query this data in support of pharmacological research, turning the semantic web from a research project into something that can be used by practising medicinal chemists in both academia and industry. In the summer of 2015 it was the first winner of the European Linked Data Award. At the Royal Society of Chemistry we have provided the chemical underpinnings to this system and in this talk we review its development over the past five years. We cover both our early work on semantic modelling of chemistry data for the Open PHACTS triplestore and more recent work building an all-purpose data platform, for which the Open PHACTS data has been an important test case, what has worked well, what's missing and where this is is likely to go in future.
The increasing popularity of high mass accuracy non-target mass spectrometry methods has yielded extensive identification efforts based on chemical compound databases. Candidate structures are often retrieved with either exact mass or molecular formula from large resources such as PubChem, ChemSpider or the EPA CompTox Chemistry Dashboard. Additional data (e.g. fragmentation, physicochemical properties, reference and data source information) is then used to select potential candidates, depending on the experimental context. However, these strategies require the presence of substances of interest in these compound databases, which is often not the case as no database can be fully inclusive. A prominent example with clear data gaps are surfactants, used in many products in our daily lives, yet often absent as discrete structures in compound databases. Linear alkylbenzene sulfonates (LAS) are a common, high use and high priority surfactant class that have highly complex transformation behaviour in wastewater. Despite extensive reports in the environmental literature, few of the LAS and none of the related transformation products were reported in any compound databases during an investigation into Swiss wastewater effluents, despite these forming the most intense signals. The LAS surfactant class will be used to demonstrate how the coupling of environmental observations with high resolution mass spectrometry and detailed literature data (expert knowledge) on the transformation of these species can be used to progressively “fill the gaps” in compound databases. The LAS and their transformation products have been added to the CompTox Chemistry Dashboard (https://comptox.epa.gov/) using a combination of “representative structures” and “related structures” starting from the structural information contained in the literature. By adding this information into a centralized open resource, future environmental investigations can now profit from the expert knowledge previously scattered throughout the literature. Note: This abstract does not reflect US EPA policy.
Developing tools for high resolution mass spectrometry-based screening via th...Andrew McEachran
Non-targeted and suspect screening studies using high resolution mass spectrometry (HRMS) have revolutionized how chemicals are detected in complex matrices. However, data processing remains challenging due to the vast number of chemicals detected in samples, software and computational requirements of data processing, and inherent uncertainty in confidently identifying chemicals from candidate lists. The US EPA has developed functionality within the CompTox Chemistry Dashboard (https://comptox.epa.gov) to address challenges related to data processing and analysis in HRMS. These tools include the generation of “MS-Ready” structures to optimize database searching, retention time prediction for candidate reduction, consensus ranking using chemical metadata, and in silico MS/MS fragmentation prediction for spectral matching. Combining these tools into a comprehensive workflow improves certainty in candidate identification. This presentation will introduce the tools and combined workflow including visualization and access via the Chemistry Dashboard. These tools, data, and visualization approaches within an open chemistry resource provide a freely available software tool to support structure identification and non-targeted analysis. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
3D printing & open access databases for crystallographic college educationVincent Scalfani
Moeck, P.; Stone-Sundberg, J.; Snyder, T.J.; Kaminsky, W.; Gražulis, S.; and all members of the International Advisory Board of the Crystallography Open Database. In 3D printing & open access databases for crystallographic college education, IUCr 23rd Congress on Crystallography, Montreal, Canada, 2014.
How can you access PubChem programmatically?Sunghwan Kim
Presented at the 255th American Chemical Society (ACS) National Meeting in New Orleans, LA (March. 19, 2018).
Building automated workflows that exploit the vast amount of data contained in PubChem requires programmatic access to the data through application programming interfaces (APIs). PubChem provides several programmatic access routes to its data, including Entrez Utilities (E-Utilities or E-Utils), PubChem Power User Gateway (PUG), PUG-SOAP, PUG-REST, PUG-View, and a REST-ful interface to PubChemRDF. This presentation provides an overview of these programmatic access tools, including recent updates, limitations, usage policies, and best practices.
*References*
(1) PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem, Nucleic Acids Research, 2015, 43(W1):W605–W611. https://doi.org/10.1093/nar/gkv396
(2) An update on PUG-REST: RESTful interface for programmatic access to PubChem, Nucleic Acids Research, 2018, 46(W1):gky294. https://doi.org/10.1093/nar/gky294
G. Tautkeviciene and I. Ceseviciute. Open Science in Horizon 2020: Open Aire ...Aidis Stukas
Access to scientific information accelerates research, discovery and economic development. For more than a decade libraries take part in the open access movement to develop an environment which provides access to scientific information. Open access stakeholders and experts are working together to achieve these aims. The presentation will elaborate on the infrastructure of open access which bridges the gap between academia and society by sharing research findings in new ways.
OpenAIRE is an initiative by the European Commission for an Open Access Infrastructure for Research in Europe to support open scholarly communication, Open Science and access to the research output of the European funded projects. The infrastructure of OpenAIRE supports the Open Science policy implemented in the programme Horizon2020 by the European Commission.
Vaidas Morkevičius - Open Data for Enhancing the Quality of ResearchAidis Stukas
The European Social Survey (ESS) is an academically driven cross-national survey that has been conducted across Europe since 2001. The presentation will focus on three pillars of quality enhancement in social research: good data, good documentation and good analytics.
Synopsis:
D3.js is a Javascript library primarily used to create interactive data visualizations in the browser. Despite its growing popularity and warm community, getting started with D3 can be tricky. This talk covers the basics of D3 and sheds light on some of its main conceptual hurdles. It concludes by discussing some applications of D3 to big data.
About the speaker:
Sam Selikoff [ http://www.samselikoff.com/ | @samselikoff ] is a self-taught full-stack web developer. Formerly a graduate student of economics and finance, he unexpectedly discovered a passion for programming while doing data work for a consulting firm. He is currently focusing on client-side MVC and data visualization.
Thanks to our Sponsors
Microsoft [ http://microsoftnewengland.com ] for providing awesome venue for the event.
Rovi [ http://rovi.com ] for providing the food/drinks.
cognizeus [ http://cognizeus.com ] for hosting the event and providing books to give away as raffle.
Where do data visualizations come from? Flatiron students Emily Simonton and Mandy Yeung talk D3—a Javascript Library for manipulating documents based on data.
Brigita Serafinavičiūtė. Open science – The Essential Ingredient of Our FutureAidis Stukas
3O - is the new popular abbreviation in the European Research Area. Pushed forward by Commissioner for Research, Science and Innovation it stands for "Open Science, Open Innovation, Open to the World". What is the ideology behind this? Maybe it is just a nicely coined catchy expression? What is the role of Open Science for our future? How far have we gone towards this vision?
Crystallography orientations of Silicon wafers (<100> and <110>) with different primary flats, the positions of <111> planes, and the visualization of etching profiles in crystal level.
More information and resources at - http://ow.ly/yq3OR
2014 is the UNESCO International Year of Crystallography, celebrating 100 years since X-ray diffraction allowed scientists to study the detailed structure of crystalline materials. Now crystallography is used in practically all science disciplines, from geologists analysing and dating meteorites to chemists synthesising brand new drugs to fight disease. Advanced technology, including synchrotrons and electron microscopes, now allow scientists to ‘see’ the structure of a variety of materials, including proteins, viruses, and drugs.
This session will feature Prof Andrea Gerson, who uses crystallography in her current research at UniSA. Andrea is internationally recognised, and can speak firsthand about the exciting work happening at the Australian Synchrotron. Teachers watching online will be able to ask Andrea questions using the RiAus chatroll.
This online session, targeted at Years 7-10 teachers, will outline the scientific principles behind crystallography, how it is being used currently and the problems it could help to solve in the future. Potential career paths and in-class activities will also be discussed.
Benchmarking Commercial RDF Stores with Publications Office DatasetGhislain Atemezing
The slides present a benchmark of RDF stores with real-world datasets and queries from the EU Publications Office (PO). The study compares the performance of four commercial triple stores: Stardog 4.3 EE, GraphDB 8.0.3 EE, Oracle 12.2c and Virtuoso 7.2.4.2 with respect to the following requirements: bulk loading, scalability, stability and query execution.
A novel programmable attenuator based low Gm-OTA for biomedical applicationsHoopeer Hoopeer
dokumen, Scribd, SlideShare, book: Microelectronic-Devices-and-Circuits, Bit-Vector Pattern Matching Systems on the Basis of Analog-Digital Field Reprogrammable Arrays, Linear System Theory: The State Space Approach, National Academies Press (NAP), Chopper-Stabilized Low-Noise Multipath Operational Amplifier with Dual Ripple Rejection Loops, A Single Slope ADC With Row-Wise Noise Reduction Technique for CMOS Image Sensor, Docsity, TSpace, iThenticate, EBSCO, OpenAIRE, DOAJ, Novel Schmitt trigger and square-wave generator using single current amplifier, Integrated Systems Laboratory, xDevs, A Noninvasive Glucose Monitoring SoC Based on Single Wavelength Photoplethysmography, Slewing Mitigation Technique for Switched Capacitor Circuits, mathjax, Tezzaron Semiconductor, Continuous-Time ΔΣ Modulator, AMiner, A novel programmable attenuator based low Gm-OTA for biomedical applications
_Link24
Open Source Collaboration in Drug Discovery in PharmaKees van Bochove
How pre-competitive collaboration in the pharmaceutical sector through open source platforms enables joint innovation of academics, pharma, SMEs and non-profits.
On 29 January 2020 ARCHIVER launched its Request for Tender with the purpose to award several Framework Agreements and work orders for the provision of R&D for hybrid end-to-end archival and preservation services that meet the innovation challenges of European Research communities, in the context of the European Open Science Cloud.
The tender was closed on 28 April 2020 and 15 R&D bids were submitted, with consortia that included 43 companies and organisations. The best bids have been selected and will start the first phase of the ARCHIVER R&D (Solution Design) in June 2020.
On Monday 8 June the selected consortia for the ARCHIVER design phase have been announced during a Public Award Ceremony starting at 14.00 CEST.
In light of the COVID-19 outbreak and the and consequent movement restrictions imposed in several countries, the event has been organised as a webinar, virtually hosted by Port d’Informació Científica (PIC), a member of the Buyers Group of the ARCHIVER consortium.
The Kick-off marks the beginning of the Solution Design Phase.
This talk was part of the 2020 Disease Map Modeling Community meeting, covering the steps towards publishing reproducible simulation studies (based on a reused model). Links to different COMBINE guidelines, tutorials and efforts. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
2. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
Data Sharing and Reproducible Research
. . . the imperative
◮ “/. . . / research which yields nonsignificant results is not
published. Such research being unknown to other
investigators may be repeated independently until
eventually by chance a significant result occurs /. . . / the
literature of such a field consists in substantial part of
false conclusions” [Sterling, 1959]
◮ in < 1/2 of the microarray publications, analyses are not
reproducible due to lack of
data/protocols/software [Ioannidis et al., 2009]
◮ “If you use p = 0.05 to suggest that you have made a
discovery, you will be wrong at least 30% of the time. If,
as is often the case, experiments are underpowered, you
will be wrong” most of the time1
. [Colquhoun, 2014]
1
Emphasis mine. S.G.
2 / 27
3. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
Data Sharing in Crystallography
Started quite early
◮ 1948 Acta Cryst. (IUCr) The Acta Crystallographica
journal was launched, all coordinates were printed in
journal articles, and Acta Crystallographica published
the structure factors as well
◮ 1965 CSD (CCDC) The CCDC was established at the
Department of Chemistry, Cambridge University /. . . /
about 2000 structures published before 1965 were
gradually incorporated into the developing database
◮ 1971 PDB In June 1971, the two communities
attended the Cold Spring Harbor Symposium on
Quantitative Biology (Cold Spring Laboratory Press,
1972)
3 / 27
5. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
The COD project
But what if crystallographers work together to establish a public
domain database with all relevant crystallographic data? This would
not only overcome the current situation with ’fragmented’ databases,
it would also prevent for becoming dependent from monopolists.
What would be needed?
1. A small team of engaged scientists with some experience in database
and software design to coordinate the project.
2. The authors (i.e. the scientific community = YOU) who provides the
project with database entries (note, that if you have’nt sold your
experimental results exclusively, you are free to distribute the data
to such a database, even if they have already been part of a
publication - and a lot of good data have never been published).
3. Free software a) for maintaining the database, b) for data
evaluation and calculation of derived data (e.g. calculated powder
pattern from crystal structures for search-match purposes), c) for
browsing and retrieval.
gemstonede (Dr. Michael BERNDT) Fri Feb 14, 2003 1:26 pm
5 / 27
7. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
Commong framework: the CIF
The Crystallographic Interchange Framework (CIF) is developed and curated
by the International Union of Crystallography (IUCr).
examples/data/2100858-head.cif:
data_2100858
loop_
_publ_author_name
’Buttner, R. H.’
’Maslen, E. N.’
_publ_section_title
;
Structural parameters and electron difference density in BaTiO~3~
;
_journal_issue 6
_journal_name_full ’Acta Crystallographica Section B’
_journal_page_first 764
_journal_page_last 769
_journal_volume 48
_journal_year 1992
_chemical_compound_source ’synthetic, from a mixture of KF:KMoO4:BaTiO3’
_chemical_formula_sum ’Ba O3 Ti’
_chemical_formula_weight 233.24
_symmetry_cell_setting tetragonal
_symmetry_space_group_name_Hall ’P 4 -2’
_symmetry_space_group_name_H-M ’P 4 m m’
_cell_angle_alpha 90.0
_cell_angle_beta 90.0
_cell_angle_gamma 90.0
_cell_formula_units_Z 1
_cell_length_a 3.9998(8)
_cell_length_b 3.9998(8)
_cell_length_c 4.0180(8)
7 / 27
8. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
Description of semantics
CIF dictionaries
data_cell_length_
loop_ _name ’_cell_length_a’
’_cell_length_b’
’_cell_length_c’
_category cell
_type numb
_type_conditions esd
_enumeration_range 0.0:
_units A
_units_detail ’angstroms’
_definition
; Unit-cell lengths in angstroms corresponding to the structure
reported. The values of _refln_index_h, *_k, *_l must
correspond to the cell defined by these values and _cell_angle_
values. The values of _diffrn_refln_index_h, *_k, *_l may not
correspond to these values if a cell transformation took place
following the measurement of the diffraction intensities. See
also _diffrn_reflns_transf_matrix_.
;
8 / 27
9. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
TCOD dictionary contents
The most basic data names
◮ cif_tcod.dic: ver. 0.008, last update 2015-06-16, 107 data
names;
◮ cif_dft.dic: ver. 0.019, last update 2016-04-13, 87 data
names.
e.g. (same as NOMAD atom_forces?):
data_tcod_atom_site_residual_force
loop_ _name
’_tcod_atom_site_resid_force_Cartn_x’
’_tcod_atom_site_resid_force_Cartn_y’
’_tcod_atom_site_resid_force_Cartn_z’
# ... some names omitted for brevity
_type numb
_units eV/%A
_units_detail ’electronvolts per Angstroem’
_definition
;
These data items describe residual forces on atoms in the final
structure. For a converged computation of a stable structure these
...
;
9 / 27
12. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
COD query examples
Web, REST, SQL
◮ Via the WWW interface – go for “search” in:
◮ http://www.crystallography.net/cod
◮ http://www.crystallography.net/tcod
◮ http://www.crystallography.net/pcod
◮ Via the stable URLs (REST):
◮ http://www.crystallography.net/cod/2000000.cif
◮ http://www.crystallography.net/tcod/10000002.cif
◮ http://www.crystallography.net/cod/result?text=perovskite
◮ Via the views of the SQL database:
◮ mysql -u cod_reader cod -h www.crystallography.net
-e ’select file, a, b, c, vol, formula
from data where
date between "2013-01-01" and
"2014-12-31" and
formula regexp " C[0-9]* "
order by vol desc limit 10’
12 / 27
17. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
*COD data citation
The Research Data Alliance has just published and
endorsed recommendations from the RDA Working
Group on Data Citation:
https://www.rd-alliance.org/groups/data-citation-wg.html
COD data can be cited in several ways:
◮ Using a data reference URI:
Srivastava, R. C.; Klooster, W. T.; Koetzle, T. F. “Neutron Structures of
Ammonium Fluoroberyllate” (1999) The Crystallography Open
Database, rev. 176759, the COD Advisory Board (eds.),
http://www.crystallography.net/cod/2002926.cif. [Retrieved
2016-09-21 16:48 EEST]
◮ Using a “landing page” URI:
Srivastava, R. C.; Klooster, W. T.; Koetzle, T. F. “Neutron Structures of
Ammonium Fluoroberyllate” (1999) The Crystallography Open
Database, rev. 176759, the COD Advisory Board (eds.),
http://www.crystallography.net/cod/2002926.html. [Retrieved
2016-09-21 16:48 EEST]
17 / 27
18. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
*COD data citation (2)
COD data can be cited in several ways:
◮ Using a data reference URI with explicit revision:
Srivastava, R. C.; Klooster, W. T.; Koetzle, T. F. “Neutron Structures of
Ammonium Fluoroberyllate” (1999) The Crystallography Open
Database, rev. 176759, the COD Advisory Board (eds.),
http://www.crystallography.net/cod/2002926.cif@176759. [Retrieved
2016-09-21 16:48 EEST]
◮ Using a content-negotiable URI (with or without
explicit revision):
Srivastava, R. C.; Klooster, W. T.; Koetzle, T. F. “Neutron Structures of
Ammonium Fluoroberyllate” (1999) The Crystallography Open
Database, rev. 176759, the COD Advisory Board (eds.),
http://www.crystallography.net/cod/2002926. [Retrieved 2016-09-21
16:48 EEST]
18 / 27
19. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
*COD metadata
COD metadata are available in RDF format:
http://www.crystallography.net/cod/2002926.rdf
examples/data/2002926-example.rdf:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cod="http://www.crystallography.net/cod/doc/rdf/">
<rdf:Description rdf:about=
"http://www.crystallography.net/cod/2002926.html">
<cod:Rall>0.0476</cod:Rall>
<cod:Robs>0.0476</cod:Robs>
<cod:Z>8</cod:Z>
<cod:Zprime>2</cod:Zprime>
<cod:a>15.017</cod:a>
<cod:acce_code>BK0051</cod:acce_code>
<cod:alpha>90</cod:alpha>
<cod:author>Srivastava, R. C.</cod:author>
<cod:author>Klooster, W. T.</cod:author>
<cod:author>Koetzle, T. F.</cod:author>
<cod:b>5.876</cod:b>
<cod:beta>90</cod:beta>
<cod:c>10.418</cod:c>
<cod:calcformula>Be F4 H8 N2</cod:calcformula>
<cod:celltemp>163</cod:celltemp>
<cod:chemname>ammonium tetrafluoroberyllate</cod:chemname>
<!-- Some content omitted for brevity ... -->
</rdf:Description>
</rdf:RDF>
19 / 27
23. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
Acknowledgements
VU Institute of
Biotechnology
Virginijus Siksnys
(head of the dept.)
Andrius Merkys
Antanas Vaitkus
QM community
Björkman
Torbjörn
Stefaan Cottenier
Nicola Marzari
Giovanni Pizzi
Lubomir Smrcok
Linas Vilˇciauskas
Chris Wolverton
COD Advisory board
Daniel Chateigner
Robert T. Downs
Werner Kaminsky
Armel Le Bail
Luca Lutterotti
Peter Moeck
Peter Murray-Rust
Miguel Quirós
Thanks to the Lorentz Center for supporting participation
of the Vilnius group
Thanks to commercial COD users and supporters – Bruker,
PANalytical, Rigaku; thanks to IUCr for support and
consultations.
23 / 27
25. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
References I
Bernstein, H. J., Bollinger, J. C., Brown, I. D., Gražulis, S., Hester, J. R., McMahon, B.,
Spadaccini, N., Westbrook, J. D., and Westrip, S. P. (2016).
Specification of the Crystallographic Information File format, version 2.0.
Journal of Applied Crystallography, 49(1).
Boullay, P., Lutterotti, L., and Chateigner, D. (2012).
Quantitative analysis of electron diffraction ring patterns using the MAUD program.
Boullay, P., Lutterotti, L., Chateigner, D., and Sicard, L. (2014).
Fast microstructure and phase analyses of nanopowders using combined analysis of transmission
electron microscopy scattering patterns.
Acta Crystallographica Section A, 70:448–456.
Colquhoun, D. (2014).
An investigation of the false discovery rate and the misinterpretation of p-values.
Royal Society Open Science, 1(3):140216.
Ioannidis, J. P. A., Allison, D. B., Ball, C. A., Coulibaly, I., Cui, X., Culhane, A. C., Falchi, M.,
Furlanello, C., Game, L., Jurman, G., Mangion, J., Mehta, T., Nitzberg, M., Page, G. P., Petretto,
E., and van Noort, V. (2009).
Repeatability of published microarray gene expression analyses.
Nat Genet, 41(2):149–155.
Le Bail, A. (2008).
Frontiers between crystal-structure prediction and determination by powder diffractometry.
Powder Diffraction Suppl., pages S5–S12.
Pizzi, G., Cepellotti, A., Sabatini, R., Marzari, N., and Kozinsky, B. (2016).
AiiDA: automated interactive infrastructure and database for computational science.
Computational Materials Science, 111:218–230.
A path to freedom: GNU → Linux → Ubuntu → MySQL → R → LATEX→ TikZ → Beamer
26. ThisprojecthasreceivedfundingfromtheEuropeanUnion’sHorizon2020
researchandinnovationprogramundergrantagreementNo689868.
References II
Sadowski, P. and Baldi, P. (2013).
Small-molecule 3d structure prediction using open crystallography data.
Journal of Chemical Information and Modeling, 53:3127–3130.
Smith, A. M., Katz, D. S., and Niemeyer, K. E. (2016).
Software citation principles.
PeerJ Computer Science, 2:e86.
Sterling, T. D. (1959).
Publication decisions and their possible effects on inferences drawn from tests of significance —
or vice versa.
Journal of the American Statistical Association, 54:30–34.
A path to freedom: GNU → Linux → Ubuntu → MySQL → R → LATEX→ TikZ → Beamer