The document summarizes the eNanoMapper database and web application. Some key points:
- eNanoMapper is an open source database and ontology framework for nanomaterials design and safety assessment. It was developed through several EU funded projects including NANoREG.
- The database builds on an existing chemical structure database and can represent a wide range of experimental data and endpoints of regulatory interest.
- An ontology was developed and existing ontologies are reused to harmonize terminology. Tools allow processing and importing data from various sources and formats.
- The database and ontology can be accessed through a searchable web interface or via an API for programmatic access and data analysis. It integrates data from multiple sources and projects.
The eNanoMapper database for nanomaterial safety information: storage and query - Nina Jeliazkova
A number of challenges exist in engineered nanomaterials (ENM) data representation and integration, mainly due to data complexity and provenance. We have recently described the eNanoMapper database [doi:10.1109/BIBM.2014.699936] as part of the computational infrastructure for toxicological data management of ENM, developed within the EU FP7 eNanoMapper project. The ontology-supported data model is based on an exhaustive review of existing nano-related data models, databases, and nanomaterial-related entries in chemical and toxicogenomic databases. We demonstrate how this approach provides a common ground for integration of data represented in diverse formats (ISA-TAB, OECD HT, custom RDF and a set of spreadsheet templates used by the EU NanoSafety Cluster projects) and enables a uniform approach to the import, storage and searching of ENM physicochemical measurements and biological assay results. A configurable parser enables import of the data stored in spreadsheet templates, accommodating different organizations of the data. The configuration metadata is defined in a separate file, mapping the spreadsheet into the internal data model. The demonstration data provided by eNanoMapper partners ((i) NanoWiki, (ii) a literature dataset on protein coronas and (iii) the ModNanoTox project dataset consisting of 86 assays and 100 different endpoints) illustrates the capability of the associated REST API to support a variety of tests and endpoints recommended by the OECD Working Party on Manufactured Nanomaterials. The API is tightly integrated with a chemical structure search, allowing a structure's function as core, coating or functionalisation to be highlighted. The REST API enables graphical summaries of the data and integration into applications such as NanoQSAR modelling via programmatic interaction.
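The configuration-driven import described above can be sketched as follows. This is a minimal illustration of the idea, not the eNanoMapper parser itself: the column names, field names and converter functions are invented for the example, while the real configuration maps spreadsheet templates onto the full substance/study data model.

```python
# Mapping configuration: internal field -> (spreadsheet column, converter).
# All names here are hypothetical placeholders.
CONFIG = {
    "material_name": ("Nanomaterial", str),
    "core":          ("Core chemistry", str),
    "size_nm":       ("Primary size (nm)", float),
    "assay":         ("Assay", str),
    "endpoint":      ("Endpoint", str),
    "value":         ("Result", float),
}

def import_rows(header, rows, config=CONFIG):
    """Map raw spreadsheet rows onto internal records using the config."""
    col_index = {name: i for i, name in enumerate(header)}
    records = []
    for row in rows:
        record = {}
        for field, (column, convert) in config.items():
            raw = row[col_index[column]]
            record[field] = convert(raw)
        records.append(record)
    return records

header = ["Nanomaterial", "Core chemistry", "Primary size (nm)",
          "Assay", "Endpoint", "Result"]
rows = [["NM-101", "TiO2", "21", "MTS", "cell viability", "87.5"]]
print(import_rows(header, rows))
```

Because the mapping lives in the configuration rather than in code, a differently organized template only needs a new configuration file, not a new parser.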
ACS 248th Paper 136: JSmol/JSpecView Eureka Integration - Stuart Chalk
Integration of the combined JSmol/JSpecView molecular and spectral viewer software into the Eureka Research Workbench. It can display molecular structures and spectra, and in the linked view, clicking on an IR peak shows the corresponding molecular motion.
The Royal Society of Chemistry was pleased to contribute to Open PHACTS, a three-year project funded by the European Union's Innovative Medicines Initiative. Over those three years we developed our existing platforms, created new and innovative widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced open semantic web standards. As a result, RSC served as the centralized chemistry data hub for the project. With the conclusion of the Open PHACTS project, we will report on our experiences from participating in the project and provide an overview of the tools, capabilities and data that have been released to the community as a result, and how this may influence future projects. This will include the Open PHACTS open chemistry data dump, with the chemistry-related data in both chemistry and semantic web consumable formats, as well as some of the resulting chemistry software released to the community. The Open PHACTS project resulted in significant contributions to the chemistry community as well as the supporting pharmaceutical companies and the biomedical community.
The Royal Society of Chemistry publishes many thousands of articles per year, the majority containing rich chemistry data that is, in general, limited in its value when confined to the HTML or PDF forms of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data, especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process and as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform, including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of electronic lab notebooks (ELNs) for facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data and to facilitate improved discovery.
Reproducibility of model-based results: standards, infrastructure, and recogn... - FAIRDOM
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
247th ACS Meeting: The Eureka Research Workbench - Stuart Chalk
Academic scientists need a tool to capture the science they do so that it can be shared openly, integrated with linked data, and searched. Eureka is an evolving platform for doing this.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria.
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as typed packages of all the components of an investigation. If we stop thinking in terms of publishing papers and start thinking in terms of releasing Research Objects (software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of the virtual and the embedded; and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers, and Linked Data provides the metadata framework for constructing the container manifests and profiles. It's not just theory: it works in practice, with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I'll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, "Why linked data is not enough for scientists", Future Generation Computer Systems, 29(2), 2013, pp. 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
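The packaging idea in the abstract above can be made concrete with a toy manifest in the spirit of a Research Object: a JSON-LD document that aggregates the parts of an investigation with minimal provenance. The context URI, identifiers and file names below are placeholders for illustration, not the actual Research Object specification.

```python
import json

def make_manifest(ro_id, author, parts):
    """Build a toy JSON-LD manifest aggregating the parts of an investigation.

    parts: list of (uri, mediatype) tuples for the aggregated components.
    """
    return {
        "@context": "https://w3id.org/bundle/context",
        "id": ro_id,
        "createdBy": {"name": author},
        "aggregates": [{"uri": uri, "mediatype": mt} for uri, mt in parts],
    }

manifest = make_manifest(
    "urn:example:ro-1",
    "A. Researcher",
    [("data/results.csv", "text/csv"),
     ("workflow/analysis.cwl", "text/x-cwl"),
     ("paper/preprint.pdf", "application/pdf")],
)
print(json.dumps(manifest, indent=2))
```

The point of the manifest is exactly what the abstract argues: data, workflow and narrative travel together as one described, provenance-bearing unit rather than as loose supplementary files.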
Presentation on the Chemical Analysis Metadata Platform (ChAMP), a new project to characterize and organize metadata about chemical analysis methods. The project will develop an ontology, controlled vocabularies, and design rules.
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications - Francesco Osborne
In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, and citations. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning and semantic technologies to mine technologies from research publications and generate an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as richer semantic search, which can exploit the technology dimension to improve retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; and studying the scholarly dynamics associated with the emergence of new technologies.
TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.
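The final serialisation step that an approach like TechMiner performs, turning extracted technology entities into ontology triples, can be sketched as below. The namespaces, class names and the example entities are invented for illustration and are not TechMiner's actual vocabulary.

```python
def to_turtle(technologies):
    """Serialise extracted technology entities as toy Turtle triples.

    technologies: list of dicts with hypothetical 'name' and 'field' keys.
    """
    lines = ["@prefix ex: <http://example.org/techminer#> .",
             "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."]
    for tech in technologies:
        uri = "ex:" + tech["name"].replace(" ", "_")
        lines.append(f"{uri} a ex:Technology ;")
        lines.append(f'    rdfs:label "{tech["name"]}" ;')
        lines.append(f"    ex:usedInField ex:{tech['field']} .")
    return "\n".join(lines)

print(to_turtle([{"name": "LSTM", "field": "NLP"},
                 {"name": "SPARQL", "field": "SemanticWeb"}]))
```

Once in triple form, the knowledge base supports the tasks listed in the abstract, e.g. a SPARQL query for all technologies used in a given field.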
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... - Stuart Chalk
Scientists are looking for ways to leverage Web 2.0 technologies in the research laboratory, and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach, the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented, along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
As the volume and complexity of data from myriad Earth observing platforms, both remote sensing and in situ, increases, so does the demand for access to both the data and the information products derived from them. The audience is no longer restricted to investigator teams with specialist science credentials. Non-specialist users, from scientists in other disciplines and the science-literate public to teachers, the general public and decision makers, want access. What prevents them from accessing these resources? It is the very complexity of specialist-developed data formats, data set organizations and specialist terminology. What can be done in response? We must shift the burden from the user to the data provider. To achieve this, our data infrastructures will likely need greater internal code and data structure complexity in order to achieve (relatively) simpler end-user experiences. Evidence from numerous technical and consumer markets supports this scenario. We will cover the elements of modern data environments, the new use cases, and how we can respond to them.
Cytoscape is an open source network analysis tool. These slides describe its basic features and give a brief tutorial on how to use the tool in innovative ways.
Automation of (Biological) Data Analysis and Report Generation - Dmitry Grapov
I've been experimenting with automating simple and complex data analysis and report generation tasks for biological data, mostly using R and LaTeX. These slides show some of my progress and the challenges encountered.
Opportunities in chemical structure standardization - Valery Tkachenko
This talk was given at EBI's Wellcome Trust Genome Campus and is dedicated to outlining problems with chemical information standardization and various efforts to tackle this problem.
Tools and approaches for data deposition into nanomaterial databases - Valery Tkachenko
Sustainable research progress in many scientific disciplines critically depends on the existence of robust specialized databases that integrate and structure all available experimental information in the respective fields. The need for such a reference database is especially critical for nanoscience and nanomaterial research, given the significant diversity of shapes, sizes, and properties of engineered nanomaterials and the difficulty of synthesizing engineered nanoparticles with controlled properties. The acquisition of data from public sources is inefficient, time consuming and limited in scope. Moreover, it is not clear where the resources will come from to support this activity on a perpetual basis. The NIH has recently announced its intention to provide special funds toward data deposition by experimental investigators through the ‘data sharing plan’ for each proposal. However, this points to a current weakness: all laboratories use different data collection approaches, each of which requires interpretation by staff hosting the database. It would be far more efficient and useful to provide a template with key terms that each investigator could modify to add new or important additional data or parameters. We will discuss tools and approaches to facilitate collection and direct deposition of experimental data into the Nanomaterial Registry (https://www.nanomaterialregistry.org/), a versatile, semantically enriched, template-based platform for registering diverse data pertaining to nanomaterials research.
The Linked Open Data (LOD) cloud contains tremendous numbers of interlinked instances, from which we can retrieve abundant knowledge. However, because the ontologies involved are heterogeneous and large, it is time consuming to learn all of them manually, and it is difficult to observe which properties are important for describing instances of a specific class. In order to construct an ontology that helps users easily access various data sets, we propose a semi-automatic ontology integration framework that can reduce the heterogeneity of ontologies and retrieve frequently used core properties for each class. The framework consists of three main components: graph-based ontology integration, machine-learning-based ontology schema extraction, and an ontology merger. By analyzing the instances of the linked data sets, this framework acquires ontological knowledge and constructs a high-quality integrated ontology, which is easily understandable and effective for knowledge acquisition from various data sets using simple SPARQL queries.
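The "frequently used core properties per class" idea can be sketched as a simple frequency analysis over instance data: group property usage by class and keep the properties used by at least a threshold fraction of that class's instances. The classes, properties and threshold below are invented for the example; the framework in the abstract additionally performs graph-based integration and schema merging.

```python
from collections import Counter, defaultdict

def core_properties(types, triples, threshold=0.6):
    """Return the frequently used properties per class.

    types:   {instance: class} assignments (a stand-in for rdf:type).
    triples: (instance, property) usage pairs.
    A property is 'core' for a class when at least `threshold` of that
    class's instances use it.
    """
    class_counts = Counter(types.values())
    per_class = defaultdict(Counter)
    for inst, prop in triples:
        per_class[types[inst]][prop] += 1
    return {cls: sorted(p for p, n in props.items()
                        if n / class_counts[cls] >= threshold)
            for cls, props in per_class.items()}

types = {"p1": "Person", "p2": "Person", "c1": "City"}
triples = [("p1", "name"), ("p1", "birthPlace"),
           ("p2", "name"), ("c1", "population")]
print(core_properties(types, triples))
# "name" is used by all Person instances; "birthPlace" by only half,
# so it falls below the 0.6 threshold and is dropped.
```

A user querying the integrated ontology then sees only these core properties per class, which is what makes the simple SPARQL access mentioned above practical.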
Chemistry Validation and Standardization Platform v2.0 - Valery Tkachenko
In recent years there has been explosive growth in the number of public chemical databases available online, a number of these containing tens of millions of chemical structures. Examples include PubChem, ChemSpider and ChEMBL, and users of these databases have become increasingly aware of the data quality issues associated with these public resources. Seamless integration and mapping between databases, even for some common chemicals, is challenged by differing approaches to chemical standardization prior to registration into a database. The lack of standards for representing and handling chemical information certainly contributes to this problem. The Chemistry Validation and Standardization Platform (CVSP), originally built to support the European Innovative Medicines Initiative project known as Open PHACTS, was intended to provide an open platform for processing and standardizing chemical compounds. The system has been used to process millions of chemical compounds for dissemination through public websites and, unlike other validation and standardization systems, provides support for both standard and custom rulesets. We will provide an overview of CVSP 2.0, the next generation of the platform, extending support to new cheminformatics toolkits and adding capabilities such as collaborative rules authoring.
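The standard-plus-custom ruleset design mentioned above can be sketched as an ordered pipeline of named rules, each of which validates or transforms a record. The rules below are simplified string-level stand-ins invented for illustration; CVSP's actual rules operate on full chemical structure representations.

```python
def strip_whitespace(smiles):
    """Remove stray leading/trailing whitespace from the input string."""
    return smiles.strip()

def reject_empty(smiles):
    """Validation rule: an empty structure record is an error."""
    if not smiles:
        raise ValueError("empty structure")
    return smiles

def normalize_salt_separator(smiles):
    """Hypothetical normalisation: canonicalise the salt separator."""
    return smiles.replace(" . ", ".")

STANDARD_RULES = [strip_whitespace, reject_empty, normalize_salt_separator]

def standardize(smiles, rules=STANDARD_RULES):
    """Apply each rule in order; a custom ruleset just extends the list."""
    for rule in rules:
        smiles = rule(smiles)
    return smiles

print(standardize("  CCO . [Na+]  "))  # -> "CCO.[Na+]"
```

The design point is that validation and standardization are data, not code: a depositor-specific ruleset is simply a different list handed to the same pipeline.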
An analysis of the quality issues of the properties available in the Spanish ... - Nandana Mihindukulasooriya
DBpedia exposes data from Wikipedia as machine-readable Linked Data. The DBpedia data extraction process generates RDF data in two ways: (a) using the mappings that map data from Wikipedia infoboxes to the DBpedia ontology and other vocabularies, and (b) using infobox properties, i.e., properties that are not defined in the DBpedia ontology but are auto-generated from the infobox attribute-value pairs. The work presented in this paper inspects the quality issues of the properties used in the Spanish DBpedia dataset along the conciseness, consistency, syntactic validity, and semantic accuracy quality dimensions.
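One syntactic-validity check in the spirit of this analysis is to tally the apparent datatype of each value a property takes and flag the property when its values disagree. The property names and values below are invented examples, not actual Spanish DBpedia data.

```python
from collections import Counter

def apparent_type(value):
    """Classify a raw string value as integer, decimal, or string."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"

def inconsistent_properties(prop_values):
    """prop_values: {property: [raw string values]} -> flagged properties
    with their datatype distribution."""
    flagged = {}
    for prop, values in prop_values.items():
        counts = Counter(apparent_type(v) for v in values)
        if len(counts) > 1:
            flagged[prop] = dict(counts)
    return flagged

data = {"populationTotal": ["3223334", "unknown", "165000"],
        "areaCode": ["91", "93"]}
print(inconsistent_properties(data))
# populationTotal mixes integers with a string and gets flagged;
# areaCode is consistently integer-typed and passes.
```

Auto-generated infobox properties are especially prone to this kind of inconsistency, since no ontology definition constrains the range of their values.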
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ... - Felipe Albrecht
Short description and updates about DeepBlue Epigenomic Data Server that I presented during the last Blueprint (http://www.blueprint-epigenome.eu/) Jamboree in Madrid (June 2016)
Royal Society of Chemistry activities to develop a data repository for chemis... - Ken Karapetyan
The Royal Society of Chemistry publishes many thousands of articles per year, the majority containing rich chemistry data that is, in general, limited in its value when confined to the HTML or PDF forms of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data, especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process and as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform, including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of electronic lab notebooks (ELNs) for facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data and to facilitate improved discovery.
247th ACS Meeting: The Eureka Research WorkbenchStuart Chalk
Academic scientists need a tool to capture the science they do so that it can be shared in open science, integrated with linked data, and shared/searched. Eureka is an evolving platform to do this.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as different types and as packages all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (software), then scholar exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
Presentation on the Chemical Analysis Metadata Platform (ChAMP) as a new project to characterize and organize metadata about chemical analysis methods. The project will develop an ontology, controlled vocabularies, and design rules
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsFrancesco Osborne
In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others.
TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...Stuart Chalk
Scientists are looking for ways to leverage web 2.0 technologies in the research laboratory and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
As the volume and complexity of data from myriad Earth Observing platforms, both remote sensing and in-situ increases so does the demand for access to both data and information products from these data. The audience no longer is restricted to an investigator team with specialist science credentials. Non-specialist users from scientists from other disciplines, science-literate public, to teachers, to the general public and decision makers want access. What prevents them from this access to resources? It is the very complexity and specialist developed data formats, data set organizations and specialist terminology. What can be done in response? We must shift the burden from the user to the data provider. To achieve this our developed data infrastructures are likely to need greater degrees of internal code and data structure complexity to achieve (relatively) simpler end-user complexity. Evidence from numerous technical and consumer markets supports this scenario. We will cover the elements of modern data environments, what the new use cases are and how we can respond to them.
cytoscape is open source network analyses tools, in this slides we define the basic features of this tool, and a brief tutorial of how can you use this tool in innovative way.
Automation of (Biological) Data Analysis and Report GenerationDmitry Grapov
I've been experimenting with automating simple and complex data analysis and report generation tasks for biological data and mostly using R and LATEX. You can see some of my progress and challenges encountered.
Opportunities in chemical structure standardizationValery Tkachenko
This talk was given at EBI's Wellcome Trust Genome Campus and is dedicated to outlining problems with chemical information standardization and various efforts to tackle this problem.
Tools and approaches for data deposition into nanomaterial databasesValery Tkachenko
Sustainable research progress in many scientific disciplines critically depends on the existence of robust specialized databases that integrate and structure all available experimental information in the respective fields. The need for such reference database is especially critical for nanoscience and nanomaterial research given the significant diversity of shapes, sizes, and properties of engineered nanomaterials and the difficulty of synthesizing engineered nanoparticles with controlled properties. The acquisition of data from public sources is inefficient, time consuming and limited in scope. Moreover, it is not clear where the resources come from to support this activity on a perpetual basis. The NIH has recently posted its intention to provide special funds toward data deposition by the experimental investigators through the ‘data sharing plan’ for each proposal. However, this points to a current weakness which is that all laboratories use different data collection approaches each of which requires interpretation by staff hosting the database. It would be far more efficient and useful if a template with key terms that could be modified to add new or important additional data or parameters for each investigator. We will discuss tools and approaches to facilitate collection and direct deposition of experimental data into Nanomaterial Registry (https://www.nanomaterialregistry.org/) - a versatile semantically enriched templates-based platform for registering diverse data pertaining to nanomaterials research.
The Linked Open Data (LOD) cloud contains tremendous amounts of interlinked instances, from which we can retrieve abundant knowledge. However, because of the heterogeneous and large ontologies, it is time consuming to learn all the ontologies manually, and it is difficult to observe which properties are important for describing instances of a specific class. In order to construct an ontology that can help users easily access various data sets, we propose a semi-automatic ontology integration framework that can reduce the heterogeneity of ontologies and retrieve frequently used core properties for each class. The framework consists of three main components: graph-based ontology integration, machine-learning-based ontology schema extraction, and an ontology merger. By analyzing the instances of the linked data sets, this framework acquires ontological knowledge and constructs a high-quality integrated ontology, which is easily understandable and effective in knowledge acquisition from various data sets using simple SPARQL queries.
Chemistry Validation and Standardization Platform v2.0 - Valery Tkachenko
In recent years there has been explosive growth in the number of public chemical databases available online, a number of these containing 10s of millions of chemical structures. Examples include PubChem, ChemSpider and ChEMBL and users of these databases have become increasingly aware of the issue of data quality associated with these public resources. Seamless integration and mapping between databases, even for some common chemicals, is challenged by differing approaches to chemical standardization prior to registration into a database. The lack of standards in representing and handling chemical information certainly contributes to aspects of this problem. The Chemistry Validation and Standardization Platform (CVSP), originally developed to support the European Innovative Medicines Initiative project known as OpenPHACTS, was developed with the intention of providing an open platform for processing and standardizing chemical compounds. The system has been used to process millions of chemical compounds for dissemination through public websites and, unlike other validation and standardization systems, the system provides support for both standard and custom rulesets. We will provide an overview of CVSP 2.0, the next generation of the platform extending support to new cheminformatics toolkits and additional capabilities such as collaborative rules authoring.
An analysis of the quality issues of the properties available in the Spanish ... - Nandana Mihindukulasooriya
DBpedia exposes data from Wikipedia as machine-readable Linked Data. The DBpedia data extraction process generates RDF data in two ways: (a) using the mappings that map the data from Wikipedia infoboxes to the DBpedia ontology and other vocabularies, and (b) using infobox properties, i.e., properties that are not defined in the DBpedia ontology but are auto-generated using the infobox attribute-value pairs. The work presented in this paper inspects the quality issues of the properties used in the Spanish DBpedia dataset according to the conciseness, consistency, syntactic validity, and semantic accuracy quality dimensions.
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ... - Felipe Albrecht
Short description and updates about DeepBlue Epigenomic Data Server that I presented during the last Blueprint (http://www.blueprint-epigenome.eu/) Jamboree in Madrid (June 2016)
Royal Society of Chemistry activities to develop a data repository for chemis... - Ken Karapetyan
The Royal Society of Chemistry publishes many thousands of articles per year, the majority of these containing rich chemistry data that, in general, is limited in its value when isolated in the HTML or PDF form of the articles commonly consumed by readers. RSC also has an archive of over 300,000 articles containing rich chemistry data, especially in the form of chemicals, reactions, property data and analytical spectra. RSC is developing a platform integrating these various forms of chemistry data. The data will be aggregated both during the manuscript deposition process and as the result of text-mining and extraction of data from across the RSC archive. This presentation will report on the development of the platform, including our success in extracting compounds, reactions and spectral data from articles. We will also discuss our developing process for handling data at manuscript deposition and the integration and support of eLab Notebooks (ELNs) in terms of facilitating data deposition and sourcing data. Each of these processes is intended to ensure long-term access to research data with the intention of facilitating improved discovery.
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume related challenges, including the use of high performance computing. In this talk, we will mainly focus on other challenges from the perspective of collaborative sharing and reuse of broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state of the art work in supporting physicists (including astrophysicists) [1], life sciences [2], material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice oriented talk will complement more vision oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
"Data Provenance: Principles and Why it matters for BioMedical Applications"Pinar Alper
Tutorial given at the Informatics for Health 2017 Conference. These slides are for the second part of the tutorial, describing provenance capture and management tools.
At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. Combining a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., together with the confusion of licensing and embargoing, and providing for data exchange and integration with services and platforms external to the repository, the challenge is significant. All this at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed, and the potential impact on the future of scientific publishing will also be examined.
In this talk at the CECAM 2015 Workshop on Future Technologies in Automated Atomistic Simulations, I will discuss the Materials Project Ecosystem, an initiative to develop a comprehensive set of open-source software and data tools for materials informatics. The Materials Project is a US Department of Energy-funded initiative to make the computed properties of all known inorganic materials publicly available to all materials researchers to accelerate materials innovation. Today, the Materials Project database boasts more than 58,000 materials, covering a broad range of properties, including energetic properties (e.g., phase and aqueous stability, reaction energies), electronic structure (bandstructures, DOSs) and structural and mechanical properties (e.g., elastic constants).
A linchpin of the Materials Project is its robust data and software infrastructure, built on best open-source software development practices such as continuous testing and integration, and comprehensive documentation. I will provide an overview of the open-source software modules that have been developed for materials analysis (Python Materials Genomics), error handling (Custodian) and scientific workflow management (FireWorks), as well as the Materials API, a first-of-its-kind interface for accessing materials data based on REpresentational State Transfer (REST) principles. I will show how a materials researcher may use and build on these software and data tools for materials informatics, as well as accelerate their own research.
A (vintage) presentation about a database system for the study of gene expression data, including distributed metadata annotation and some interactive analytics. Some of the ideas are still relevant today.
BioTeam Bhanu Rekepalli Presentation at BICoB 2015 - The BioTeam Inc.
Adapting life sciences applications to next generation supercomputers with Intel Xeon Phi coprocessors to accelerate large scale data analysis and discovery.
An Overview of the iMicrobe Project and available tools in the iPlant Cyberinfrastructure. This talk was given at a workshop at ASLO in Granada, Spain focused on applications in Oceanography and Limnology.
On chemical structures, substances, nanomaterials and measurements - Nina Jeliazkova
"On chemical structures, substances, nanomaterials and measurements"
Nina Jeliazkova, Ideaconsult
This talk attempts to highlight how I came to recognize the fundamental role of measurements, coming from the realm of data modelling and data analysis. Besides retaining the data provenance, it provides insights into how we go beyond chemical structures and address the challenges of representing the identity of chemical substances and nanomaterials (with examples from the latest developments of the AMBIT web services and the OpenTox API). Finally, supporting the vision of a distributed, open, web-like approach towards recording subtle experimental details is essential, not only for the chemists and biologists in the labs, but for all of us using, modelling, storing and querying the data.
Presented July 14, 2014 in Cambridge, UK
Defining the Future for Open Notebook Science – A Memorial Symposium Celebrating the Work of Jean-Claude Bradley
http://inmemoriamjcb.wikispaces.com/Jean-Claude+Bradley+Memorial+Symposium
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me... - Yongyao Jiang
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access
New Developments in H2O: April 2017 Edition - Sri Ambati
H2O presentation at Trevor Hastie and Rob Tibshirani's Short Course on Statistical Learning & Data Mining IV: http://web.stanford.edu/~hastie/sldm.html
PDF and Keynote version of the presentation available here: https://github.com/h2oai/h2o-meetups/tree/master/2017_04_06_SLDM4_H2O_New_Developments
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... - Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Richard's adventures in two entangled wonderlands - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN - Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Multi-source connectivity as the driver of solar wind variability in the heli... - Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Introduction:
RNA interference (RNAi), or Post-Transcriptional Gene Silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing by which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that were causing the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA: length 23-25 nt; trans-acting; binds the target mRNA with mismatches; translation inhibition.
siRNA: length 21 nt; cis-acting; binds the target mRNA with a perfectly complementary sequence.
piRNA (Piwi-interacting RNA): length 25-36 nt; expressed in germ cells; regulates transposon activity.
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-multiprotein complex which triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active components of RISC are the Argonaute (Ago) proteins (endonucleases), which cleave the target mRNA.
DICER: an endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute.
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille) domain: recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis) domain: breaks the phosphodiester bond of the mRNA (RNase H activity).
MiRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Making project data available through the eNanoMapper database
1. Making project data available through
eNanoMapper database:
NANoREG, NanoReg2, caLIBRAte and GRACIOUS
Nina Jeliazkova, Ideaconsult Ltd., Sofia, Bulgaria
Presented by Georgia Tsiliki, ATHENA Research Center
2. • Open source database and web application
• Builds upon a chemical structure database with support for substances
• The data model supporting experimental data is capable of representing all endpoints of regulatory interest and other types of data
• eNanoMapper ontology, developed by an experienced team at EBI; existing ontologies are reused
• Tools to process and import data; export in various formats
• Searchable: free-text search based on the ontology
• eNanoMapper modelling: integration of data analysis tools via API, led by NTUA; WEKA, R and Python routines uploaded to the Jaqpot platform
• Flexible data hosting architecture
ENM Summary & data solutions:
eNanoMapper overview
• FP7 eNanoMapper - A Database and Ontology Framework for Nanomaterials Design and Safety Assessment
• Grant Agreement: 604134
• Duration: 1 Feb 2014 – 31 Jan 2017; 8 partners
3. • Challenges:
• Diverse data sources
• Diverse data input formats
• Different data organization
• Diverse modelling tools
• Approach: enable mappings! (i.e. eNanoMapper)
• Physico-chemical identity: different analytic techniques, manufacturing conditions, batch effects, mixtures, impurities, size distribution, differences in the amount of surface modification, etc.
• Biological identity: wide variety of measurements, toxicity pathways, effects of ENM coronas, modes-of-action, interactions (cell lines, assays).
• Processes requiring information: from raw data (science) to study summaries for regulatory purposes; linking with experimental protocols; risk assessment; grouping; safety-by-design.
• Support for data analysis: requires a “spreadsheet” or matrix view of the data. The experimental data in the public datasets is usually not in a form appropriate for modelling (merging multiple values, conditions and similar experiments into matrix form is a challenge).
Organising the nanosafety data
eNanoMapper overview
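The matrix view mentioned above can be sketched as a simple pivot: long-format study records (one measurement per row) are reshaped into a material-by-endpoint matrix. The records and field names below are hypothetical, purely to illustrate the reshaping step:

```python
# Hypothetical long-format records; real eNanoMapper study results carry
# conditions, units and uncertainty alongside each value.
records = [
    {"material": "NM-101", "endpoint": "size", "value": 25.0},
    {"material": "NM-101", "endpoint": "zeta", "value": -30.1},
    {"material": "NM-110", "endpoint": "size", "value": 147.0},
]

# Pivot into a material-by-endpoint matrix (dict of dicts).
matrix = {}
for r in records:
    matrix.setdefault(r["material"], {})[r["endpoint"]] = r["value"]
```

In practice the merging is harder than this sketch suggests: replicate measurements and differing experimental conditions must be aggregated before a single cell value can be filled in.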
4. • https://data.enanomapper.net
• Mostly literature data + partial content provided by the MODENA and MARINA projects; links to external DBs
• Free text search https://search.data.enanomapper.net/enm
Public eNanoMapper database
eNanoMapper overview
8. https://search.enanomapper.net
Search data integration
Project data files feed into separate eNanoMapper database instances, integrated by the search front end:
• ENM data, caNanoLab → eNanoMapper db instance
• NANoREG / TNO DB, Excel files → eNanoMapper db instance (NR1 data)
• MARINA files → eNanoMapper db instance (MARINA)
• NanoTest files → eNanoMapper db instance (NanoTest)
• NanoGenotox files → eNanoMapper db instance (NanoGenotox)
• ….
Contributing projects: NANoREG, NanoReg2, caLIBRAte, eNanoMapper
Protection (user rights)
9. Free text and faceted search
How to retrieve the data?
• Facets or filters that permit easy refinement of the search
• List of data resources with direct links to the DB
• Selected filters
10. Export example
How to retrieve and download the data?
• To download the search results, specify “Filtered entries” and “Study results”.
• Click on the TXT icon; the download button caption will change to “Download filtered entries as TSV” (tab-separated values).
• Note that only a limited set of fields is exported in the TXT format.
• The JSON and XML formats contain the full set of fields.
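The exported TSV can be consumed with standard tooling. A minimal Python sketch (the column names below are illustrative, not the actual export schema):

```python
import csv
import io

# Illustrative excerpt of a "Download filtered entries as TSV" export;
# a real export carries the database's own column names.
tsv = (
    "material\tendpoint\tvalue\tunit\n"
    "NM-104\tPrimary particle size\t21.0\tnm\n"
)

# DictReader with a tab delimiter turns each line into a dict keyed
# by the header row.
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
```

The same parsing works unchanged on a downloaded file by replacing the `io.StringIO` wrapper with an open file handle.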
11. • Application Programming Interface (API): a way computer programs talk to one another; can be understood in terms of how a programmer sends instructions between programs.
• Access the database via any programming language, workflow system, or data analysis tool (R, JavaScript, Java and Ruby are used by eNanoMapper partners).
• eNanoMapper tutorials:
http://www.enanomapper.net/enm-tutorials
https://github.com/enanomapper/tutorials
Mapping internal database functions to the external world
How to retrieve and analyse the data?
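As a minimal sketch of programmatic access: the endpoint path and parameter names below follow AMBIT/eNanoMapper REST conventions but are assumptions; check them against the tutorials above before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://data.enanomapper.net"  # public instance from the slides


def substance_search_url(query, page=0, pagesize=10):
    """Build a substance free-text search URL (path and parameter
    names assumed, see the lead-in)."""
    params = urlencode({"search": query, "page": page, "pagesize": pagesize})
    return f"{BASE}/substance?{params}"


def fetch_substances(query):
    """Fetch matching substances as parsed JSON, via content negotiation."""
    req = Request(substance_search_url(query),
                  headers={"Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Because the API is plain HTTP plus content negotiation, the same URL scheme works from R, curl, or any workflow system.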
12. • Input data files: >1,000 Excel files roughly following
• IOM templates (most projects except NANoREG)
• NANoREG / JRC ISA-TAB-logic files
• The NANoREG / JRC files are NOT ISA-TAB-NANO compliant
• Assay naming: multiple names per assay, e.g.
• Colony Forming
• Colony Forming Assay
• Colony Forming Efficiency Assay
• CFE
• Cell viability - Colony Forming Efficiency Assay
• Similar issues in naming materials, cells, methods, etc.
Data entry issues
How is the data imported?
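One way to tame the assay-name variability during import is a synonym map; the dict below is a hypothetical illustration (in eNanoMapper the harmonization is ontology-driven, not a hard-coded table):

```python
# Hypothetical synonym map: several spellings of the same assay,
# taken from the slide, normalized to one preferred label.
ASSAY_SYNONYMS = {
    "colony forming": "Colony Forming Efficiency Assay",
    "colony forming assay": "Colony Forming Efficiency Assay",
    "colony forming efficiency assay": "Colony Forming Efficiency Assay",
    "cfe": "Colony Forming Efficiency Assay",
    "cell viability - colony forming efficiency assay":
        "Colony Forming Efficiency Assay",
}


def normalize_assay(name):
    """Return the preferred label, or the input unchanged if unknown."""
    return ASSAY_SYNONYMS.get(name.strip().lower(), name.strip())
```

Unknown names pass through unchanged, so unmapped assays surface for manual review rather than being silently dropped.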
13. An ontology is a controlled vocabulary enhanced with relationships between terms.
Usage:
• Harmonize data from several sources
• Analysis (logical inference, semantic search)
Examples:
• Is cytotoxicity an endpoint, or is EC50 also an endpoint?
• Is TEM a protocol, or a technology?
• Is the “Water - Fish - D. rerio” assay the same one as the “Zebrafish Embryo Toxicity Test”?
Mapping terms: ontology
eNanoMapper overview
15. Mapping spreadsheet content into the data model through JSON configuration
eNanoMapper overview
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
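A sketch of what such a parser configuration could look like, and how it might drive the mapping of one spreadsheet row. The keys and column indices are illustrative only, not the actual eNanoMapper template schema:

```python
import json

# Hypothetical excerpt of a JSON parser configuration mapping
# spreadsheet columns to fields of the internal data model.
config = json.loads("""
{
  "SHEET": "Results",
  "SUBSTANCE_NAME": {"COLUMN_INDEX": 0},
  "PROTOCOL_ENDPOINT": {"COLUMN_INDEX": 3},
  "VALUE": {"COLUMN_INDEX": 4},
  "UNIT": {"COLUMN_INDEX": 5}
}
""")


def map_row(row, cfg):
    """Apply the column mapping to one spreadsheet row (a list of cells)."""
    return {key: row[spec["COLUMN_INDEX"]]
            for key, spec in cfg.items()
            if isinstance(spec, dict)}  # skip scalar settings like SHEET


record = map_row(["NM-101", "", "", "LC50", "12.5", "mg/L"], config)
```

Because the mapping lives in a separate configuration file rather than in code, the same parser can accommodate differently organized templates by swapping the configuration.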
22. Future plans
Type of data:
• Physical-chemical characterisation: particle size, distribution,
solubility, zeta potential…
• Toxicological characterisation: ALI, submerged-cell, simulated body
fluid, ROS generation.
Type of NPs:
• Engineered NPs (ZrO2, CeO, Sb-Sn Ox, SnO)
• Process-generated NPs (atmospheric plasma spraying)
[Figure: LDH release (% of positive control) after exposure at 6.4 mg/m3, high voltage (1000 V); y-axis 0-140; bars for exposure control, CeO2 NPs 2 h, CeO2 NPs 4 h, positive control, and positive control + CeO2 NPs 4 h; cell viability at 24 h; *p < 0.05, ****p < 0.0001 vs. exposure control]
WC, Cr, Ni
Engineered NPs / Process-generated NPs