The document describes the eBank UK project, which seeks to link e-research data, scholarly communication, and e-learning by building connections from data generated in experiments through publications and into educational resources. It discusses the scholarly knowledge cycle and how eBank UK is addressing the bottleneck of data publication by developing a distributed information architecture with common data standards and ontologies. This will allow an aggregator to harvest metadata from repositories holding experimental data and publications and provide a single access point for discovery across distributed resources through services like search and retrieval.
A consistent and efficient graphical User Interface Design and Querying Organ... (CSCJournals)
We propose a software layer called GUEDOS-DB, built on top of an object-relational database management system (ORDBMS). In this work we apply it to molecular biology, specifically to complete organelle genomes. We aim to offer biologists a unified way to access information spread among heterogeneous genome databanks. The goal of this paper is, first, to present a visual schema graph through a number of illustrative examples. The adopted human-computer interaction technique for visual design and querying makes it much easier for biologists to formulate database queries than a linear textual query representation.
Metabolomics Society meeting 2011 - presentation by Kees (thehyve)
This document summarizes three challenges for metabolomics study databases: 1) representing the biological context and complex study designs of samples through metadata, 2) implementing data preprocessing, identification, and quantification methods, and 3) embedding metabolomics data with other 'omics' data from the same samples. It provides an overview of the Netherlands Metabolomics Centre's open-source Data Support Platform, which allows flexible representation of study metadata and metabolomics data from various assays to address these challenges.
The International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or a private publisher running journals for monetary benefit; we are an association of scientists and academics focused solely on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal primarily aims to bring out the research talent and work of scientists, academics, engineers, practitioners, scholars, and postgraduate students of engineering and science. The journal covers scientific research in a broad sense rather than a niche area, enabling researchers from various verticals to publish their papers. It also aims to provide a platform for researchers to publish in a shorter time, enabling them to continue their work. All published articles are freely available to scientific researchers in government agencies, educators, and the general public. We are making serious efforts to promote our journal across the globe in various ways, and we are confident it will serve as a scientific platform for all researchers to publish their work online.
The article outlines extensions to the model and algorithm of spyware detection procedures; spyware, in particular, presents a potential threat to medical information systems. The approaches and solutions in this paper make it possible to programmatically implement binary file segmentation using the discrete wavelet transform (WT). Unlike existing approaches, the model and algorithm proposed in the article take into account local extrema of the wavelet coefficients (WC). The described solutions allow the number of bytes, as well as the entropy, of individual segments of files submitted for analysis to be examined for potentially dangerous and spyware files.
National Resource for Network Biology's TR&D Theme 3: Although networks have been very useful for representing molecular interactions and mechanisms, network diagrams do not visually resemble the contents of cells. Rather, the cell involves a multi-scale hierarchy of components – proteins are subunits of protein complexes which, in turn, are parts of pathways, biological processes, organelles, cells, tissues, and so on. In this technology research project, we will pursue methods that move Network Biology towards such hierarchical, multi-scale views of cell structure and function.
Technology R&D Theme 2: From Descriptive to Predictive Networks (Alexander Pico)
National Resource for Network Biology's TR&D Theme 2: Genomics is mapping complex data about human biology and promises major medical advances. However, the routine use of genomics data in medical research is in its infancy, due mainly to the challenges of working with highly complex “big data”. In this theme, we will use network information to help organize, analyze and integrate these data into models that can be used to make clinically relevant diagnoses and predictions about an individual.
A survey of heterogeneous information network analysis (SOYEON KIM)
A Survey of Heterogeneous Information Network Analysis
Chuan Shi, Member, IEEE,
Yitong Li, Jiawei Zhang, Yizhou Sun, Member, IEEE,
and Philip S. Yu, Fellow, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015
Towards a Query Rewriting Algorithm Over Proteomics XML Resources (CSCJournals)
Querying and sharing Web proteomics data is not an easy task. Given that several data sources can be used to answer the same sub-goals of the global query, there can be many candidate rewritings. The user query is formulated using concepts and properties related to proteomics research (a domain ontology), while semantic mappings describe the contents of the underlying sources. In this paper, we propose a characterization of the query rewriting problem that represents the semantic mappings as an associated hypergraph. The generation of candidate rewritings can then be formulated as the discovery of the minimal transversals of a hypergraph. We exploit and adapt algorithms from hypergraph theory to find all candidate rewritings for a query answering problem. In future work, relevant criteria could help determine optimal, high-quality rewritings according to user needs and source performance.
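The hypergraph formulation above lends itself to a small illustration. The sketch below is not the paper's algorithm (which adapts dedicated hypergraph-theory methods); it enumerates minimal transversals by brute force, where each hyperedge lists the sources able to answer one sub-goal and each minimal transversal is one candidate rewriting. The source names are hypothetical.

```python
from itertools import combinations

def minimal_transversals(edges):
    """Enumerate all minimal transversals (hitting sets) of a hypergraph,
    given as an iterable of vertex sets. Brute force over subsets by
    increasing size, so any previously found transversal is no larger
    than the current candidate; suitable for small instances only."""
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))

    def hits(s):
        return all(s & e for e in edges)  # intersects every hyperedge

    found = []
    for r in range(1, len(vertices) + 1):
        for combo in combinations(vertices, r):
            s = frozenset(combo)
            if hits(s) and not any(t < s for t in found):
                found.append(s)  # minimal: no smaller transversal inside it
    return [set(t) for t in found]

# Each hyperedge lists the sources able to answer one sub-goal of the query;
# each minimal transversal is one candidate rewriting covering all sub-goals.
print(minimal_transversals([{"src1", "src2"}, {"src2", "src3"}, {"src1", "src3"}]))
```

With three pairwise-overlapping sub-goal edges, every two-source combination covers all sub-goals, so three candidate rewritings come out.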
This talk was part of the 2020 Disease Map Modeling Community meeting, covering the steps towards publishing reproducible simulation studies (based on a reused model). Links to different COMBINE guidelines, tutorials and efforts. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Data Provenance and Scientific Workflow Management (NeuroMat)
Introductory class on techniques and tools to manage scientific data, focusing on sources of information and data analysis. Lecturer: Prof. Kelly Rosa Braghetto, a NeuroMat associate investigator and a professor at the University of São Paulo's Department of Computer Science.
Introduction to FAIR principles in the context of computational biology models. Presented at a Workshop at the Basel Conference of Computational Biology. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Network-based machine learning approach for aggregating multi-modal data (SOYEON KIM)
This document summarizes the author's PhD dissertation which develops network-based machine learning approaches for aggregating multi-modal data. It presents two key techniques: 1) multi-view network clustering to integrate different network views for clustering tasks like social-tagged image clustering, and 2) multi-layered network based pathway activity inference to infer pathway activities from integrated gene networks using techniques like directed random walks. It outlines experiments applying these techniques to various biomedical datasets for clustering, survival/metastasis prediction, and identifying cancer-related pathways and genes.
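The "directed random walks" mentioned above can be pictured with a minimal random-walk-with-restart sketch. This is a simplified stand-in, not the dissertation's actual DRW formulation; the gene names, edges, and parameters are purely illustrative.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Random walk with restart over a directed gene network: stationary
    visit probabilities score how strongly each node is linked to the
    seed set. A simplified stand-in for directed random walk methods."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}  # restart to the seed set
        for u in nodes:
            outs = adj[u]
            if outs:
                share = (1.0 - restart) * p[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                nxt[u] += (1.0 - restart) * p[u]  # dangling nodes keep their mass
        p = nxt
    return p

# Toy directed network (edges illustrative only); TP53 is the seed gene.
adj = {"TP53": ["MDM2", "CDKN1A"], "MDM2": ["TP53"],
       "CDKN1A": ["TP53"], "BRCA1": ["TP53"]}
scores = random_walk_with_restart(adj, seeds={"TP53"})
print({g: round(s, 3) for g, s in scores.items()})
```

Genes reachable from the seed accumulate probability mass; genes with no inbound paths from the seed score zero, which is what lets the walk rank pathway genes by their connectivity to a phenotype-associated seed set.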
An information-theoretic, all-scales approach to comparing networks (Jim Bagrow)
My presentation at NetSci 2018 on Portrait Divergence, a new approach to comparing networks that is simple, general-purpose, and easy to interpret.
The preprint: https://arxiv.org/abs/1804.03665
The code: https://github.com/bagrow/portrait-divergence
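For intuition, here is a simplified pure-Python take on the portrait idea; the real method lives in the linked repository and differs in its exact distribution construction and weighting. The "portrait" counts, for each distance l, how many nodes see exactly k others at that distance, and two portraits are then compared with a Jensen-Shannon divergence.

```python
import math
from collections import deque, Counter

def portrait(adj):
    """Network 'portrait': rows[l][k] = number of nodes that have exactly
    k other nodes at shortest-path distance l (BFS from every node)."""
    rows = {}
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        shell = Counter(d for n, d in dist.items() if n != src)
        for l, k in shell.items():
            rows.setdefault(l, Counter())[k] += 1
    return rows

def portrait_distribution(adj):
    """Flatten a portrait into a probability distribution over (l, k) cells."""
    counts = Counter()
    for l, row in portrait(adj).items():
        for k, c in row.items():
            counts[(l, k)] += c
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0) + q.get(k, 0)) for k in keys}
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a.get(k, 0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical graphs diverge by 0; different topologies give a positive score.
path = {1: [2], 2: [1, 3], 3: [2]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(js_divergence(portrait_distribution(path), portrait_distribution(path)))
print(js_divergence(portrait_distribution(path), portrait_distribution(triangle)))
```

Because the portrait aggregates shortest-path structure at every scale, the comparison needs no node correspondence between the two networks, which is what makes the approach general-purpose.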
Domains such as drug discovery, data science, and policy studies increasingly rely on combining complex analysis pipelines with integrated data sources to reach conclusions. A key question then arises: what are these conclusions based upon? There is thus a tension between integrating data for analysis and understanding where that data comes from (its provenance). In this talk, I describe recent work that attempts to facilitate transparency by combining provenance tracked within databases with the data integration and analytics pipelines that feed them. I discuss this with respect to use cases from public policy as well as drug discovery.
Given at: http://ccct.uva.nl/content/ccct-seminar-21-february-2014
This document provides a summary of the 2013 annual progress report for the National Resource for Network Biology (NRNB). It describes the NRNB network in 2013, including personnel from various institutions collaborating on projects. It also summarizes the findings of the NRNB External Advisory Committee meeting in December 2012. The Committee found that the NRNB had made strong progress in developing new network analysis tools and demonstrating their value. They provided suggestions to optimize projects, measure success beyond citations, and prepare for renewal. Specific projects like TRD C on network-extracted ontologies were praised for progress. Cytoscape 3.0 development was also discussed.
This document summarizes the accomplishments of the National Resource for Network Biology over a reporting period. It lists numerous quantitative metrics of success, including over 100 publications citing their grants, thousands of daily downloads and uses of their software tools, and training over 100 users. It also provides details on improvements and developments made to several of their modeling frameworks, algorithms, and software applications. Finally, it outlines the formation of a new working group on single-cell RNA-seq analysis and visualization, and improvements made to their computing infrastructure.
The National Resource for Network Biology (NRNB) held its External Advisory Council meeting on December 12, 2012. The NRNB is focused on developing network biology tools and collaborating with investigators. It oversees various technology research and development projects, software releases including Cytoscape 3.0, collaboration projects, and outreach/training events. The meeting agenda covered progress updates and sought advice on future plans.
This document provides an annual progress report for the National Resource for Network Biology (NRNB) for the period of May 1, 2011 to April 30, 2012. It summarizes the following:
1) Advances made in developing algorithms to identify network modules and use modules as biomarkers for disease. This includes methods to capture complex logical relationships within modules.
2) Progress on tools to enable new network analysis and visualization capabilities, including a new version of Cytoscape.
3) Growth of collaborations through the NRNB, which have nearly doubled over the past year to around 100 projects.
4) Continued development of the Cytoscape App Store to support the user and developer community.
The National Resource for Network Biology (NRNB) aims to advance network biology science through bioinformatic methods, software, infrastructure, collaboration, and training. In the past year, the NRNB made progress in its specific aims, including developing new network analysis methods, catalyzing changes in network representation, establishing software and databases, engaging in collaborations, and providing training opportunities. Going forward, the NRNB plans to further develop methods for differential and predictive network analysis, multi-scale network representation, and pathway analysis tools.
National Resource for Network Biology's TR&D Theme 1: In this theme, we will develop a series of tools and methodologies for conducting differential analyses of biological networks perturbed under multiple conditions. The novel algorithmic methodologies enable us to use high-throughput proteomic data to recover biological networks under specific biological perturbations. The software tools developed in this project enable researchers to predict, analyze, and visualize the effects of these perturbations and alterations, while also letting them aggregate additional information about the known roles of the involved interactions and their participants.
Drug Repurposing using Deep Learning on Knowledge Graphs (Databricks)
Discovering new drugs is a lengthy and expensive process, which means that finding new uses for existing drugs can yield new treatments in less time and at lower cost. The difficulty is in finding these potential new uses.
How do we find these undiscovered uses for existing drugs?
We can unify the available structured and unstructured data sets into a knowledge graph. This is done by fusing the structured data sets, and performing named entity extraction on the unstructured data sets. Once this is done, we can use deep learning techniques to predict latent relationships.
In this talk we will cover:
Building the knowledge graph
Predicting latent relationships
Using the latent relationships to repurpose existing drugs
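As a toy illustration of "predicting latent relationships", the sketch below trains TransE-style embeddings on a hypothetical four-triple knowledge graph. It is not the pipeline from the talk, which fuses real datasets and uses deep learning frameworks at scale; it only shows the core idea that unseen (drug, treats, disease) edges can be scored from learned embeddings.

```python
import random

def train_transe(triples, dim=16, epochs=200, lr=0.01, margin=1.0, seed=0):
    """Tiny TransE-style trainer: embeddings are adjusted so that
    head + relation lands near the tail for observed triples and far
    from corrupted (randomly replaced) tails. Illustrative only."""
    rng = random.Random(seed)
    ents = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    rels = sorted({r for _, r, _ in triples})
    emb = {x: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for x in ents + rels}

    def score(h, r, t):
        # Squared L2 distance of (head + relation) from tail; lower = more plausible.
        return sum((emb[h][i] + emb[r][i] - emb[t][i]) ** 2 for i in range(dim))

    for _ in range(epochs):
        for h, r, t in triples:
            t_bad = rng.choice([e for e in ents if e != t])  # corrupted tail
            if score(h, r, t) + margin > score(h, r, t_bad):
                for i in range(dim):
                    g_pos = 2.0 * (emb[h][i] + emb[r][i] - emb[t][i])
                    g_neg = 2.0 * (emb[h][i] + emb[r][i] - emb[t_bad][i])
                    emb[h][i] -= lr * (g_pos - g_neg)
                    emb[r][i] -= lr * (g_pos - g_neg)
                    emb[t][i] += lr * g_pos
                    emb[t_bad][i] -= lr * g_neg
    return score

# Hypothetical mini knowledge graph: known treatments plus shared drug targets.
triples = [("aspirin", "treats", "pain"), ("metformin", "treats", "diabetes"),
           ("aspirin", "targets", "COX1"), ("ibuprofen", "targets", "COX1")]
score = train_transe(triples)
# Lower scores mark stronger unseen (drug, treats, disease) candidates.
print(score("ibuprofen", "treats", "pain"), score("metformin", "treats", "pain"))
```

The repurposing intuition is that ibuprofen shares a target with aspirin, so their embeddings end up nearby and the unseen "treats pain" edge becomes a candidate worth ranking.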
An Introduction to Chemoinformatics for the postgraduate students of Agriculture (Devakumar Jain)
1. Chemoinformatics is the application of informatics methods to solve chemical problems and encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information.
2. It combines aspects of chemistry and computer science to address challenges such as representing and searching large chemical structure databases, predicting molecular properties, and aiding in drug discovery.
3. Chemoinformatics tools and methods have applications in diverse areas including organic synthesis, analytical chemistry, toxicology prediction, and agrochemical discovery.
The document summarizes the accomplishments of the National Resource for Network Biology (NRNB) over the past year, including:
- Over 100 publications citing NRNB funding and high usage of Cytoscape tools
- 18 supported tools, 93 collaborations, and training of over 100 users
- Progress on developing algorithms for differential network analysis, predictive networks, and multi-scale networks
- Launch of two new NRNB workgroups on single cell genomics and patient similarity networks
- 18 new collaboration projects in areas like cancer, neuroinflammation, and drug transporters
TWO LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLS (IJDKP)
The biomedical research literature is one of many domains that holds precious knowledge, and the biomedical community makes extensive use of this scientific literature to discover facts about biomedical entities such as diseases and drugs. MEDLINE is a huge database of biomedical research papers that remains a significantly underutilized source of biological information. Discovering useful knowledge in such a huge corpus raises various problems related to the type of information, such as identifying the concepts relevant to the domain of the texts and the semantic relationships among them. In this paper, we propose a two-level model for self-supervised relation extraction from MEDLINE using the Unified Medical Language System (UMLS) knowledge base. The model uses a self-supervised approach to relation extraction (RE), constructing enhanced training examples using information from UMLS. The model shows better results than current state-of-the-art and naïve approaches.
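The "enhanced training examples" idea rests on distant supervision: sentences mentioning both entities of a known knowledge-base relation are treated as weak training examples for that relation. Below is a naive sketch, with a hypothetical mini-KB standing in for UMLS and plain substring matching standing in for concept recognition.

```python
def distant_supervision(kb, sentences):
    """Construct weakly labeled training examples: any sentence mentioning
    both entities of a known KB relation is labeled with that relation.
    Matching is naive lowercase substring search, illustration only."""
    examples = []
    for e1, rel, e2 in kb:
        for sent in sentences:
            low = sent.lower()
            if e1.lower() in low and e2.lower() in low:
                examples.append((sent, e1, e2, rel))
    return examples

# Hypothetical relations and corpus; a real system would draw these from
# UMLS and MEDLINE abstracts respectively.
kb = [("aspirin", "treats", "headache"),
      ("ibuprofen", "causes", "nausea")]
sentences = [
    "Aspirin is commonly prescribed for headache relief.",
    "Patients on ibuprofen occasionally reported nausea.",
    "Aspirin inhibits platelet aggregation.",
]
for ex in distant_supervision(kb, sentences):
    print(ex)
```

The third sentence yields no example because only one entity of a known pair appears in it; the noise inherent in this labeling is exactly what the paper's two-level refinement is meant to reduce.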
2016 Bio-IT World Cell Line Coordination 2016-04-06v1 (Bruce Kozuma)
The document discusses enabling cross-group collaboration on cell lines at the Broad Institute by developing a common platform for sharing cell line metadata. It proposes establishing a Cell Line Master Data Review Board to set standards for metadata categories and curation. A framework is described using an institutional database as the canonical source of cell line definitions, with local data management systems ingesting this data and linking it to project-specific metadata. This approach aims to facilitate collaboration by providing a shared understanding of cell lines across different research groups.
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS... (ijseajournal)
This document evaluates the performance of structured and semi-structured tools for accessing bioinformatics databases. It compares the Sequence Retrieval System (SRS) and Entrez search tools for structured data retrieval to Perl and BioPerl programs for semi-structured data retrieval. The study retrieves gene information from the European Bioinformatics Institute and National Centre for Biotechnology Information databases using each method. It finds that semi-structured tools provide an alternative to structured tools, though each approach has advantages and disadvantages for certain types of queries.
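For a sense of the structured route, an Entrez query can be as simple as constructing an E-utilities request. The sketch below only builds the efetch URL without sending any request; the accession number is just an example.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities efetch endpoint; no network call is made here.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(db, uid, rettype="gb", retmode="text"):
    """Build an Entrez efetch URL that would retrieve one record as text."""
    return EUTILS + "?" + urlencode(
        {"db": db, "id": uid, "rettype": rettype, "retmode": retmode})

print(efetch_url("nucleotide", "NM_000546"))
```

A semi-structured BioPerl-style approach would instead fetch and parse the record programmatically, which is the trade-off the study evaluates.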
A presentation given by Manjula Patel (UKOLN) at the Repository Curation Environments (RECURSE) Workshop held at the 4th International Digital Curation Conference, Edinburgh, 1st December 2008.
http://www.dcc.ac.uk/events/dcc-2008/programme/
As we may link: a model to support aggregated scientific knowledge (Prashant Gupta)
Today, researchers are bogged down by a continually growing amount of complex and diverse scientific knowledge, fragmented and dispersed among various disciplines, communities, and information resources. Contemporary digital tools are efficient at dealing with the complexity and diversity of scientific knowledge and the process of science, but they have compartmentalized scientific knowledge among various disparate and disconnected systems. For example, databases structure data to facilitate easy retrieval; workflows represent the process of experiments; analytical tools support data analysis; and visualization tools render data and results for better understanding. However, these tools rarely connect or join together to synthesize an integrated view. Our digital knowledge ecosystem is siloed, making it a challenge for researchers to search, comprehend, and reproduce scientific experiments.
Vannevar Bush, in his article 'As We May Think', discussed the deluge of data and information and the challenge posed by the fragmentary nature of scientific knowledge. He proposed an imaginary machine, the Memex, that could tie knowledge records into a mesh of associative trails, which could be reviewed and consulted as a form of graph search. This talk discusses a model that adopts Bush's associationist view to integrate scientific knowledge. Categories are commonly used in databases (in the form of logical schemas) and ontologies (as concepts and properties), but these artifacts are often disconnected from each other. The proposed model connects categories, along with their process of construction and evolution, to a database and an ontology via tools that support their evolution. Explicitly connecting these knowledge artifacts (via their digital tools) not only provides an integrated view but may also be leveraged to mediate among the artifacts and keep them consistent with new conceptualizations. Such mediation among scientific artifacts will reconnect computationally enabled science with the knowledge underpinning it.
This talk was part of the 2020 Disease Map Modeling Community meeting, covering the steps towards publishing reproducible simulation studies (based on a reused model). Links to different COMBINE guidelines, tutorials and efforts. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Data Provenance and Scientific Workflow ManagementNeuroMat
Introductory class on techniques and tools to manage scientific data, focusing on sources of information and data analysis. Lecturer: Prof. Kelly Rosa Braghetto, a NeuroMat associate investigator and a professor at the University of São Paulo's Department of Computer Science.
Introduction to FAIR principles in the context of computational biology models. Presented at a Workshop at the Basel Conference of Computational Biology. Grants: European Commission: EOSCsecretariat.eu - EOSCsecretariat.eu (831644)
Network-based machine learning approach for aggregating multi-modal dataSOYEON KIM
This document summarizes the author's PhD dissertation which develops network-based machine learning approaches for aggregating multi-modal data. It presents two key techniques: 1) multi-view network clustering to integrate different network views for clustering tasks like social-tagged image clustering, and 2) multi-layered network based pathway activity inference to infer pathway activities from integrated gene networks using techniques like directed random walks. It outlines experiments applying these techniques to various biomedical datasets for clustering, survival/metastasis prediction, and identifying cancer-related pathways and genes.
An information-theoretic, all-scales approach to comparing networksJim Bagrow
My presentation at NetSci 2018 on Portrait Divergence, a new approach to comparing networks that is simple, general-purpose, and easy to interpret.
The preprint: https://arxiv.org/abs/1804.03665
The code: https://github.com/bagrow/portrait-divergence
Domains such as drug discovery, data science, and policy studies increasing rely on the combination of complex analysis pipelines with integrated data sources to come to conclusions. A key question then arises is what are these conclusions based upon? Thus, there is a tension between integrating data for analysis and understanding where that data comes from (its provenance). In this talk, I describe recent work that is attempting to facilitate transparency by combining provenance tracked within databases with the data integration and analytics pipelines that feed them. I discuss this with respect to use cases from public policy as well as drug discovery.
Given at: http://ccct.uva.nl/content/ccct-seminar-21-february-2014
This document provides a summary of the 2013 annual progress report for the National Resource for Network Biology (NRNB). It describes the NRNB network in 2013, including personnel from various institutions collaborating on projects. It also summarizes the findings of the NRNB External Advisory Committee meeting in December 2012. The Committee found that the NRNB had made strong progress in developing new network analysis tools and demonstrating their value. They provided suggestions to optimize projects, measure success beyond citations, and prepare for renewal. Specific projects like TRD C on network-extracted ontologies were praised for progress. Cytoscape 3.0 development was also discussed.
This document summarizes the accomplishments of the National Resource for Network Biology over a reporting period. It lists numerous quantitative metrics of success, including over 100 publications citing their grants, thousands of daily downloads and uses of their software tools, and training over 100 users. It also provides details on improvements and developments made to several of their modeling frameworks, algorithms, and software applications. Finally, it outlines the formation of a new working group on single-cell RNA-seq analysis and visualization, and improvements made to their computing infrastructure.
The National Resource for Network Biology (NRNB) held its External Advisory Council meeting on December 12, 2012. The NRNB is focused on developing network biology tools and collaborating with investigators. It oversees various technology research and development projects, software releases including Cytoscape 3.0, collaboration projects, and outreach/training events. The meeting agenda covered progress updates and sought advice on future plans.
This document provides an annual progress report for the National Resource for Network Biology (NRNB) for the period of May 1, 2011 to April 30, 2012. It summarizes the following:
1) Advances made in developing algorithms to identify network modules and use modules as biomarkers for disease. This includes methods to capture complex logical relationships within modules.
2) Progress on tools to enable new network analysis and visualization capabilities, including a new version of Cytoscape.
3) Growth of collaborations through the NRNB, which have nearly doubled over the past year to around 100 projects.
4) Continued development of the Cytoscape App Store to support the user and developer community.
The National Resource for Network Biology (NRNB) aims to advance network biology science through bioinformatic methods, software, infrastructure, collaboration, and training. In the past year, the NRNB made progress in its specific aims, including developing new network analysis methods, catalyzing changes in network representation, establishing software and databases, engaging in collaborations, and providing training opportunities. Going forward, the NRNB plans to further develop methods for differential and predictive network analysis, multi-scale network representation, and pathway analysis tools.
National Resource for Networks Biology's TR&D Theme 1: In this theme, we will develop a series of tools and methodologies for conducting differential analyses of biological networks perturbed under multiple conditions. The novel algorithmic methodologies enable us to make use of high-throughput proteomic level data to recover biological networks under specific biological perturbations. The software tools developed in this project enable researchers to further predict, analyze, and visualize the effects of these perturbations and alterations, while enabling researchers to aggregate additional information regarding the known roles of the involved interactions and their participants.
Drug Repurposing using Deep Learning on Knowledge GraphsDatabricks
Discovering new drugs is a lengthy and expensive process, so finding new uses for existing drugs can deliver new treatments in less time and at lower cost. The difficulty is in finding these potential new uses.
How do we find these undiscovered uses for existing drugs?
We can unify the available structured and unstructured data sets into a knowledge graph. This is done by fusing the structured data sets, and performing named entity extraction on the unstructured data sets. Once this is done, we can use deep learning techniques to predict latent relationships.
In this talk we will cover:
Building the knowledge graph
Predicting latent relationships
Using the latent relationships to repurpose existing drugs
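The link-prediction step can be illustrated with a toy translational-embedding (TransE-style) scorer, one common deep-learning approach to predicting latent relationships in a knowledge graph. This is a minimal sketch: the entities, relation, and embedding vectors below are invented for illustration and are not from the talk.

```python
import math

# Toy 3-dimensional embeddings for a few hypothetical entities and one relation.
embeddings = {
    "aspirin":      [0.9, 0.1, 0.0],
    "inflammation": [1.0, 0.6, 0.2],
    "migraine":     [1.1, 0.7, 0.1],
    "treats":       [0.1, 0.5, 0.2],  # relation vector
}

def transe_score(head, relation, tail):
    """TransE scores a triple by how close head + relation lands to tail:
    a lower distance means the triple is more plausible."""
    h, r, t = embeddings[head], embeddings[relation], embeddings[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Rank candidate diseases for the latent triple (aspirin, treats, ?).
candidates = ["inflammation", "migraine"]
ranked = sorted(candidates, key=lambda d: transe_score("aspirin", "treats", d))
print(ranked[0])  # the most plausible candidate under this toy model
```

In a real system the embeddings would be learned from the fused knowledge graph rather than hand-set, but the scoring and ranking logic is the same.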
An Introduction to Chemoinformatics for the postgraduate students of AgricultureDevakumar Jain
1. Chemoinformatics is the application of informatics methods to solve chemical problems and encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information.
2. It combines aspects of chemistry and computer science to address challenges such as representing and searching large chemical structure databases, predicting molecular properties, and aiding in drug discovery.
3. Chemoinformatics tools and methods have applications in diverse areas including organic synthesis, analytical chemistry, toxicology prediction, and agrochemical discovery.
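A small example of the kind of chemical-information processing described above is computing a molecular weight from a formula string. This is a simplified sketch (no parentheses or isotopes, and only a handful of rounded average atomic masses), not a production chemoinformatics tool.

```python
import re

# Average atomic masses (g/mol), rounded; enough for a small demo.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}

def molecular_weight(formula):
    """Compute the molecular weight of a simple formula like 'C6H12O6'.
    Handles an element symbol followed by an optional count."""
    weight = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        weight += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return weight

print(round(molecular_weight("C6H12O6"), 2))  # glucose -> 180.16
```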
The document summarizes the accomplishments of the National Resource for Network Biology (NRNB) over the past year, including:
- Over 100 publications citing NRNB funding and high usage of Cytoscape tools
- 18 supported tools, 93 collaborations, and training of over 100 users
- Progress on developing algorithms for differential network analysis, predictive networks, and multi-scale networks
- Launch of two new NRNB workgroups on single cell genomics and patient similarity networks
- 18 new collaboration projects in areas like cancer, neuroinflammation, and drug transporters
TWO LEVEL SELF-SUPERVISED RELATION EXTRACTION FROM MEDLINE USING UMLSIJDKP
The biomedical research literature is one of many domains that hides precious knowledge, and the biomedical community makes extensive use of this scientific literature to discover facts about biomedical entities such as diseases and drugs. MEDLINE is a huge database of biomedical research papers that remains a significantly underutilized source of biological information. Discovering useful knowledge in such a huge corpus raises problems related to the type of information involved, such as the concepts belonging to the domain of the texts and the semantic relationships among them. In this paper, we propose a two-level model for self-supervised relation extraction from MEDLINE using the Unified Medical Language System (UMLS) knowledge base. The model takes a self-supervised approach to relation extraction (RE), constructing enhanced training examples using information from UMLS. The model shows better results than current state-of-the-art and naïve approaches.
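The construction of training examples from a knowledge base can be sketched as distant supervision: a sentence that mentions both entities of a pair already related in the knowledge base is labeled with that relation. The mini knowledge base and sentences below are invented for illustration and are not drawn from UMLS or MEDLINE.

```python
# Hypothetical mini knowledge base of (entity, entity) -> relation facts,
# standing in for UMLS semantic relations.
kb = {
    ("aspirin", "headache"): "treats",
    ("ibuprofen", "fever"): "treats",
}

sentences = [
    "Aspirin is commonly prescribed for headache relief.",
    "Ibuprofen reduced fever in most patients.",
    "Aspirin was first synthesized in 1897.",
]

def label_sentences(kb, sentences):
    """Distant supervision: a sentence mentioning both entities of a known
    KB pair becomes a (positive) training example for that relation."""
    examples = []
    for sent in sentences:
        lowered = sent.lower()
        for (e1, e2), rel in kb.items():
            if e1 in lowered and e2 in lowered:
                examples.append((e1, rel, e2, sent))
    return examples

training = label_sentences(kb, sentences)
print(len(training))  # the third sentence matches no KB pair
```

Real systems add negative examples and filter the inevitable noisy matches; this sketch only shows the labeling step.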
2016 Bio-IT World Cell Line Coordination 2016-04-06v1Bruce Kozuma
The document discusses enabling cross-group collaboration on cell lines at the Broad Institute by developing a common platform for sharing cell line metadata. It proposes establishing a Cell Line Master Data Review Board to set standards for metadata categories and curation. A framework is described using an institutional database as the canonical source of cell line definitions, with local data management systems ingesting this data and linking it to project-specific metadata. This approach aims to facilitate collaboration by providing a shared understanding of cell lines across different research groups.
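The canonical-source pattern described above can be sketched as a join keyed on a shared identifier: the institutional database defines each cell line once, and local systems attach project-specific metadata under the same key. The identifiers and fields below are hypothetical, not the Broad Institute's actual schema.

```python
# Hypothetical canonical cell-line records from the institutional database.
canonical = {
    "CL-0001": {"name": "HeLa", "organism": "Homo sapiens"},
    "CL-0002": {"name": "A549", "organism": "Homo sapiens"},
}

# Project-specific metadata kept in a local data management system,
# keyed by the same canonical identifier.
project_metadata = {
    "CL-0001": {"passage": 12, "project": "screen-A"},
}

def linked_view(cell_line_id):
    """Join the canonical definition with any local project metadata,
    so every group shares one definition of the cell line."""
    record = dict(canonical[cell_line_id])
    record.update(project_metadata.get(cell_line_id, {}))
    return record

view = linked_view("CL-0001")
print(view["name"], view["project"])
```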
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...ijseajournal
This document evaluates the performance of structured and semi-structured tools for accessing bioinformatics databases. It compares the Sequence Retrieval System (SRS) and Entrez search tools for structured data retrieval to Perl and BioPerl programs for semi-structured data retrieval. The study retrieves gene information from the European Bioinformatics Institute and National Centre for Biotechnology Information databases using each method. It finds that semi-structured tools provide an alternative to structured tools, though each approach has advantages and disadvantages for certain types of queries.
A presentation given by Manjula Patel (UKOLN) at the Repository Curation Environments (RECURSE) Workshop held at the 4th International Digital Curation Conference, Edinburgh, 1st December 2008.
http://www.dcc.ac.uk/events/dcc-2008/programme/
As we may link: a model to support aggregated scientific knowledgePrashant Gupta
Today, researchers are bogged down by a continually growing amount of complex and diverse scientific knowledge, fragmented and dispersed among various disciplines, communities and information resources. Contemporary digital tools are efficient in dealing with the complexity and diversity of scientific knowledge and the process of science, but they have compartmentalized scientific knowledge among various disparate and disconnected systems. For example, databases are used to structure data and facilitate its easy retrieval; workflows are used to represent the process of experiments; analytical tools support data analysis; and visualization tools help visualize data and results to gain better understanding. However, these tools rarely connect or join together to synthesize an integrated view. Our digital knowledge ecosystem is siloed and makes it challenging for researchers to search, comprehend and reproduce scientific experiments.
Vannevar Bush, in his article ‘As we may think’, discussed the huge data and information deluge and the challenge brought by the fragmentary nature of scientific knowledge. He proposed an imaginary machine – Memex – that could tie knowledge records into a mesh of associative trails, which could be reviewed and consulted as a form of graph search. This talk will discuss a model that adopts Bush’s associationist view to integrate scientific knowledge. Categories are commonly used in databases (in the form of logical schemas) and ontologies (as concepts and properties), but these artifacts are often disconnected from each other. The proposed model connects categories, along with their process of construction and evolution, with a database and ontology via tools that support their evolution. Connecting these knowledge artifacts (via their digital tools) explicitly not only provides an integrated view, but may also be capitalized on to support mediation among these artifacts and keep them consistent with new conceptualizations. Such mediation among scientific artifacts will reconnect computationally enabled science with the knowledge underpinning it.
The document presents a cell-cycle knowledge integration framework that aims to capture the semantics, temporal aspects, and dynamics of the cell cycle regulatory process. It proposes an enhanced Cell-Cycle Ontology (CCO) as an extension to existing ontologies like the Gene Ontology. A data integration pipeline is described that covers the development and maintenance of the CCO knowledge base by integrating data from multiple sources. The CCO knowledge base is intended to facilitate computational analysis of cell cycle studies and enable hypothesis evaluation through reasoning services.
Curation and Preservation of Crystallography DataManjulaPatel
A presentation given by Manjula Patel (UKOLN) at "Chemistry in the Digital Age: A Workshop connecting research and education", June 11-12th 2009, Penn State University,
http://www.chem.psu.edu/cyberworkshop09
Supporting scientific discovery through linkages of literature and dataDon Pellegrino
Poster on "Supporting scientific discovery through linkages of literature and data" by Don Pellegrino, Jean-Claude Bradley and Chaomei Chen. To be presented on May 3, 2011 as part of an event for the Center for Visual and Decision Informatics. More this NSF funded center can be found online at [http://nsf-cvdi.louisiana.edu/]. More on Don Pellegrino's research activities can be found online at [http://www.donpellegrino.com].
Excited to share our vision for bioinformatics education for students and researchers who want to apply advanced multi-omics integration and machine learning to large biomedical datasets. Practice and learn from real-life projects.
Connecting and synchronizing scientific knowledgePrashant Gupta
This document discusses connecting and synchronizing scientific knowledge through an associationist model. It proposes a process model for tracking the evolution of categories in scientific knowledge over time. An example tool called AdvoCate is described that would allow researchers to model changes to categories and maintain connections between changing categories and tools like databases and ontologies. Future work includes formalizing the data model, developing change recognition rules for different category models, visualizing category evolution, and integrating with ontology and database evolution tools.
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data, more specifically an open database platform covering the full path from raw data coming from experimental measurements or models, through intermediate manipulations, to finally published results. The sheer amount of data involved creates some unique software-technical challenges. One of these challenges is addressed in the part of the study presented here: characterizing scientific data (with the initial focus being detailed chemistry data from the combustion kinetics community) so that efficient searches can be made. This characterization is formalized as schemas describing tags and keywords for the data, and as ontologies describing the relationships between data types and between the characterizations themselves. These will be translated into metadata tags attached to the data points within a non-relational database for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from merely making available data accessible to enabling the community itself to contribute its own data with minimal effort. This will involve, for example, the concept of the ‘electronic lab notebook’ and the availability of extensive concept-extraction tools, primarily from the chemical informatics field.
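The schema-and-tags idea can be sketched as a small validation step: a community schema declares which tags exist and what type of value each carries, and records are only accepted when their tags conform. The tag vocabulary and the sample record below are hypothetical, not the project's actual schema.

```python
# A hypothetical community schema: allowed tags and the value type each carries.
schema = {
    "species": str,
    "temperature_K": float,
    "mechanism": str,
}

def tag_record(data, tags, schema):
    """Attach metadata tags to a data record, rejecting tags that the
    schema does not define or whose value has the wrong type."""
    for key, value in tags.items():
        if key not in schema or not isinstance(value, schema[key]):
            raise ValueError(f"tag {key!r} violates the schema")
    return {"data": data, "tags": tags}

record = tag_record(
    data=[1.2e13, 0.0, 185000.0],  # e.g. rate parameters for one reaction
    tags={"species": "CH4", "temperature_K": 1200.0, "mechanism": "GRI-Mech"},
    schema=schema,
)
print(sorted(record["tags"]))
```

Searching then reduces to querying over the validated tags rather than over the heterogeneous raw data itself.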
A semantic framework and software design to enable the transparent integratio...Patricia Tavares Boralli
This document proposes a conceptual framework to unify representations of natural systems knowledge. The framework is based on separating the ontological nature of an object of study from the context of its observation. Each object is associated with a concept defined in an ontology and an observation context describing aspects like location and time. Models and data are treated as generic knowledge sources with a semantic type and observation context. This allows flexible integration and calculation of states across heterogeneous sources by composing their observation contexts and resolving semantic compatibility. The framework aims to simplify knowledge representation by abstracting away complexity related to data format and scale.
The document summarizes research on enabling data reuse from published datasets. It reviews 40 papers that catalog 39 distinct dataset features that can enable reuse. These features are grouped into categories related to enabling access, documenting methodological choices and quality, and helping users understand and situate the data. The paper presents a case study analyzing over 1.4 million data files from more than 65,000 repositories on GitHub, relating dataset engagement metrics to various reuse features. Using these metrics as proxies for reuse, an initial deep learning model is developed to predict a dataset's reusability from its documented features. This work demonstrates the gap between existing principles for enabling reuse and actionable insights that can help data publishers and tools implement functionalities proven to support reuse.
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
Requirements for reproducibility in computational chemistry publications include making available the data, code or algorithms, and results from the study. Authors should provide all data necessary to understand and assess their conclusions. Source code or detailed algorithm descriptions should also be included to allow independent reproduction of the work. Finally, publications must contain the actual results from applying the method rather than just describing results. Adopting these standards of transparency helps ensure others can evaluate and build upon published research claims.
This document outlines plans to develop systems and tools to support the University of Liverpool in meeting requirements for the upcoming Research Excellence Framework (REF), which will replace the Research Assessment Exercise (RAE). It discusses developing an institutional repository for research publications, providing open access to publications, collecting metadata, and creating a searchable "oracle of research excellence." The library will work on migrating data from existing systems, tools for data collection, and redeveloping staff research profiles and other reports. This will help support REF requirements while also providing benefits like increased research visibility and funding.
The NRNB has been funded as an NIGMS Biomedical Technology Research Resource since 2010. During the previous five-year period, NRNB investigators introduced a series of innovative methods for network biology including network-based biomarkers, network-based stratification of genomes, and automated inference of gene ontologies using network data. Over the next five years, we will seek to catalyze major phase transitions in how biological networks are represented and used, working across three broad themes: (1) From static to differential networks, (2) From descriptive to predictive networks, and (3) From flat to hierarchical networks bridging across scales. All of these efforts leverage and further support our growing stable of network technologies, including the popular Cytoscape network analysis infrastructure.
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
Talk covering how knowledge graphs are making us rethink how change occurs in Knowledge Organization Systems. Based on https://arxiv.org/abs/1611.00217
This document discusses profiling linked open data, outlining the research background, plan, and preliminary results. The research aims to automatically generate new statistics and knowledge patterns that provide dataset summaries and inspect data quality. Preliminary results include profiling Italian public administration websites for compliance with open data policies and automatically classifying over 1,000 linked data sets into 8 topics with over 80% accuracy. Future work involves enriching the framework with additional statistics and applying it to unstructured microdata.
Next-Generation Search Engines for Information RetrievalWaqas Tariq
In recent years, there have been significant advancements in scientific data management and retrieval techniques, particularly in standards and protocols for archiving data and metadata. Scientific data is generally rich, not easy to understand, and spread across different places. To integrate these pieces, a data archive and associated metadata should be generated. This data should be stored in a format that is locatable, retrievable and understandable; more importantly, it should be in a form that will continue to be accessible as technology changes, such as XML. New search technologies are being implemented around these protocols, making searching easy, fast and yet robust. One such system is Mercury, a metadata harvesting, data discovery, and access system built for researchers to search for, share and obtain spatiotemporal data used across a range of climate and ecological sciences.
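Parsing an XML metadata record of the kind such a harvester indexes can be sketched with the standard library. The Dublin-Core-flavoured record below is made up for illustration; it is not an actual Mercury record.

```python
import xml.etree.ElementTree as ET

# A made-up Dublin-Core-flavoured metadata record, of the kind a
# harvesting system might index for discovery.
record = """
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Soil respiration measurements, 2003-2005</dc:title>
  <dc:creator>Example Ecology Lab</dc:creator>
  <dc:coverage>45.2N 93.5W</dc:coverage>
</metadata>
"""

# Namespace prefixes must be mapped explicitly when querying with ElementTree.
ns = {"dc": "http://purl.org/dc/elements/1.1/"}
root = ET.fromstring(record)
title = root.find("dc:title", ns).text
coverage = root.find("dc:coverage", ns).text
print(title, "|", coverage)
```

A harvester would extract fields like these from every record and load them into a search index, which is what makes cross-archive discovery fast.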
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...University of Bari (Italy)
The current abundance of electronic documents requires automatic techniques that support users in understanding their content and extracting useful information. To this aim, improving retrieval performance must go beyond simple lexical interpretation of user queries and pass through an understanding of their semantic content and aims. It goes without saying that any digital library would benefit enormously from effective Information Retrieval techniques to offer its users. This paper proposes an approach to Information Retrieval based on a correspondence between the domain of discourse of the query and that of the documents in the repository. This association is based on standard general-purpose linguistic resources (WordNet and WordNet Domains) and on a novel similarity assessment technique. Although the work is at a preliminary stage, interesting initial results suggest continuing to extend and improve the approach.
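One simple way to compare a query and a document by domain of discourse is cosine similarity over per-domain weight vectors. This is a generic sketch, not the paper's actual similarity measure; the domain labels (WordNet-Domains-style) and weights are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse domain-weight vectors
    represented as {domain: weight} dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical domain profiles for a query and two documents.
query = {"medicine": 0.8, "biology": 0.2}
doc_a = {"medicine": 0.7, "pharmacy": 0.3}
doc_b = {"economy": 0.9, "law": 0.1}

best = max([("doc_a", doc_a), ("doc_b", doc_b)],
           key=lambda d: cosine(query, d[1]))
print(best[0])  # the document sharing the query's domain of discourse
```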
This document discusses the need for improved scientific data management systems to support data-driven discovery. It proposes adopting a digital asset management (DAM) approach used in creative fields like photography. Key points:
- Current scientific data management is manual and cannot scale with increasing data volumes and complexity, slowing the pace of discovery.
- A DAM framework is proposed to automate data acquisition, organization, access and sharing using metadata and models tailored for each scientific domain.
- The framework would transform how scientists interact with data, facilitating analysis and reproducibility.
- An initial DAM platform called DERIVA is presented and has been evaluated positively in early use cases.
The interplay between data-driven and theory-driven methods for chemical scie...Ichigaku Takigawa
The 1st International Symposium on Human InformatiX
X-Dimensional Human Informatics and Biology
ATR, Kyoto, February 27-28, 2020
https://human-informatix.atr.jp
Similar to E bank uk_linking_research_data_scholarly (20)
This document is a research report on nanotechnology synthesis study conducted by researchers at the University of Houston. It investigated potential nanotechnology applications in highway pavements, mainly in two categories: smart materials for pavement construction and sensors for transportation infrastructure condition monitoring. The report describes various nanomaterials that could be used in pavement construction as well as several MEMS and nanosensor technologies that could enable low-cost wireless sensor networks for monitoring pavement infrastructure conditions. It also provides examples of potential nanotechnology applications including a smart stop sign prototype developed by the researchers.
This document discusses the ethical, legal and social aspects of systems from the perspective of a system administrator. It notes that system administrators have broad access to computer systems which requires adherence to codes of ethics regarding privacy and appropriate use of privileged access. It also summarizes some of the social impacts of technology including health issues from excessive screen time, impacts on employment, and changing social dynamics with the rise of mobile phones and internet use. Finally, it discusses the "digital divide" between those who do and do not have access to modern information and communications technologies.
This document presents a study on high speed data transmission of images over the visible light spectrum using a PIC microcontroller. It discusses how visible light communication (VLC) provides higher bandwidth than wireless fidelity (Wi-Fi) as the visible light spectrum is much wider than the microwave spectrum used for Wi-Fi. The study involves designing a transmitter and receiver system to send sensor data as images using light as the transmission medium. The system is implemented using a microcontroller to convert electrical signals to light signals and vice versa using a light emitting diode and photodiode. It is expected to provide secure transmission of data through light between dedicated nodes without human interference for applications like smart lighting and the internet of things.
This document discusses potential nanoscale substitutes for silicon in computer memory devices. It begins by explaining the limitations of silicon due to Moore's law and the physical challenges of continuing to shrink transistors. Then it examines several potential substitutes like indium gallium arsenide, vanadium oxide bronze, and carbon nanotubes. For each material, it provides details on their synthesis and properties that make them promising replacements for silicon. It suggests that within the next two decades, computer chips made from these nanoscale substitutes will replace current silicon-based chips due to their ability to store more information at lower power.
This document discusses big data security issues and encryption techniques. It begins with introducing the authors and providing an abstract about addressing security issues related to data integrity, confidentiality and availability. The main body then covers authentication, data, network and generic security challenges with big data as well as proposed solutions like encryption, logging and honeypot nodes. The document focuses on describing symmetric and asymmetric encryption algorithms including DES, AES, RSA and ECC. It compares the algorithms and concludes that encryption techniques can help secure big data stored in clouds, though current security levels may be improved.
This document provides a summary of a research article that conducted a survey of indoor positioning and navigation systems and technologies. It discusses how positioning and navigation technologies developed first for outdoor use but researchers have attempted to implement them indoors with varying levels of success. It outlines several technologies used for indoor positioning including infrared, ultrasound, radio frequency, and pedestrian dead reckoning. It also discusses positioning techniques like time of arrival, received signal strength indication, and fingerprinting. The survey analyzed accuracy, complexity, cost and other metrics of different positioning technologies. It concluded that the study has implications for future research on indoor positioning and navigation systems.
This document provides an overview of lossless data compression techniques. It discusses Huffman coding, Shannon-Fano coding, and Run Length Encoding as common lossless compression algorithms. Huffman coding assigns variable length binary codes to symbols based on their frequency, with more common symbols getting shorter codes. Shannon-Fano coding similarly generates a binary tree to assign codes but aims for a roughly equal probability between left and right subtrees. Run Length Encoding replaces repeated sequences with the length of the run and the symbol. The document contrasts lossless techniques that preserve all data with lossy techniques used for media that can tolerate some loss of information.
Delete unnecessary files and uninstall unused programs before performing disk cleanup and defragmentation. Run disk cleanup to delete temporary files and uninstall programs, check all boxes and confirm deletion. Then defragment the hard drive by selecting the drive and waiting for the process to finish to optimize file organization.
Shs applied empowerment-technologies-for-the-strandLuisa Francisco
The document outlines an Empowerment Technologies curriculum for senior high school students in the applied track. It covers 18 topics over two semesters, focusing on using information and communication technologies as tools for learning in various professional fields. Students will learn about online platforms, productivity tools, graphic design, web design, collaboration, multimedia, and using ICTs for social change. They will apply their skills by developing collaborative online projects advocating for issues related to their professional track.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
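As a concrete illustration of steps 1–4, here is a minimal Python model of a synchronous TDM multiplexer and demultiplexer. The four-source framing is an assumed toy setup, not a real transmission system:

```python
# Illustrative model of synchronous TDM: the multiplexer interleaves one
# unit from each source into fixed-size frames; the demultiplexer recovers
# each stream from its fixed slot position within every frame.

def tdm_multiplex(sources):
    """Interleave the sources round-robin; each frame carries exactly
    one unit from every source, in a fixed slot order."""
    frames = []
    for units in zip(*sources):      # one time slot per source, cyclically
        frames.append(list(units))   # a frame = one unit from each source
    return frames

def tdm_demultiplex(frames, n_sources):
    """Recover each original stream from its fixed slot index."""
    return [[frame[i] for frame in frames] for i in range(n_sources)]

# Four signals sharing one channel: slot i always belongs to source i.
sources = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"], ["D1", "D2"]]
frames = tdm_multiplex(sources)
# frames == [["A1", "B1", "C1", "D1"], ["A2", "B2", "C2", "D2"]]
recovered = tdm_demultiplex(frames, 4)
assert recovered == sources
```

Because the slot order is fixed and known to both ends, no per-slot addressing is needed; the clock-driven synchronization described above is what keeps slot index i meaning "source i" at the receiver.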
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
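The difference between the two types can be sketched with a small statistical TDM model: because slot positions are no longer fixed, each slot must carry a source address alongside the payload. The queue-based setup here is an assumption for illustration:

```python
# Illustrative model of statistical (asynchronous) TDM: slots are granted
# only to sources that currently have data, so every slot carries the
# source address so the receiver can route the payload correctly.

def stat_tdm_multiplex(queues):
    """Emit (source_id, unit) slots round-robin, skipping empty queues."""
    slots = []
    while any(queues):
        for src_id, queue in enumerate(queues):
            if queue:                          # slot assigned only when data exists
                slots.append((src_id, queue.pop(0)))
    return slots

def stat_tdm_demultiplex(slots, n_sources):
    """Route each addressed slot back to its source's output stream."""
    streams = [[] for _ in range(n_sources)]
    for src_id, unit in slots:
        streams[src_id].append(unit)
    return streams

# Source 1 is idle; unlike synchronous TDM, no slots are wasted on it.
slots = stat_tdm_multiplex([["A1", "A2"], [], ["C1"]])
# slots == [(0, "A1"), (2, "C1"), (0, "A2")]
```

The trade-off is visible in the slot format: statistical TDM never transmits an empty slot, but pays for that efficiency with addressing overhead in every slot it does transmit.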
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM allows multiple signals to share a single transmission medium, so the channel's capacity is used continuously rather than being left idle between transmissions.
Embedded machine learning-based road conditions and driving behavior monitoring
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
eBank UK: linking research data, scholarly communication and learning
Dr Liz Lyon, Rachel Heery, Monica Duke,
UKOLN, University of Bath
Dr Simon Coles, Dr Jeremy Frey, Prof. Michael Hursthouse,
School of Chemistry, University of Southampton
Dr Leslie Carr, Christopher Gutteridge,
School of Electronics and Computer Science, University of Southampton
Abstract

This paper includes an overview of the changing landscape of scholarly communication and describes outcomes from the innovative eBank UK project, which seeks to build links from e-research through to e-learning. As an introduction, the scholarly knowledge cycle is described, and the role of digital repositories and aggregator services in linking data-sets from Grid-enabled projects to e-prints through to peer-reviewed articles as resources in portals and Learning Management Systems is assessed. The development outcomes from the eBank UK project are presented, including the distributed information architecture, requirements for common ontologies, data models, metadata schema, open linking technologies, provenance and workflows. Some emerging challenges for the future are presented in conclusion.
1. Introduction and context – the scholarly knowledge cycle

The background and context for the eBank project is the changing landscape of scholarly communication afforded by the developing global network infrastructure (Internet, World Wide Web, Semantic Web and Grid), together with the potential for increasing links between the activities of e-research / e-science, digital libraries and e-learning at all levels from schools to post-graduate.

The flow of data and information in the process of research and learning in an academic setting leads ultimately to the creation and acquisition of knowledge. Data and information are continuously used and re-used in new ways as part of a research activity or in the development of course materials in support of learning. These workflows are interlinked and together underpin the scholarly knowledge cycle (figure 1) [SKC].
The creation, management, sharing and interoperation of original data (which may be, for example, numerical data generated by an experiment or a survey, or images captured as part of a clinical study, together with the associated metadata) is an intrinsic part of e-Science, keeping data and metadata together. This process consists of one or more subprocesses, which might include aggregation of experimental data, selection of a particular data subset, repetition of a laboratory experiment, statistical analysis or modelling of a set of data, manipulation of a molecular structure, annotation of a diagram or editing of a digital image, and which in turn generate modified datasets. This in turn entails managing version control and access to previous versions, as well as auditing information on processes and transformations.

Figure 1: The Scholarly Knowledge Cycle

The successive levels of derived data are closely related to the original data (through refinement, transformation, interpretation, generalisation etc.) and may themselves be further processed to form successive interpretations or abstractions. If the data at any particular stage of processing is considered to be generically useful, it may be considered for inclusion in a publicly-available database for use beyond the boundaries of the project which funded the data. This has only been undertaken in fairly specialised fields of joint community endeavour (for example, in the bioinformatics field, the protein and genome sequence databases; in chemistry, material safety information and the crystal structures of the Cambridge Structural Database), as such databases require some level of service guarantee from a service provider. Provision in other chemistry areas, e.g. spectroscopic data, is patchy due to the cost of maintaining the data as well as attitudes towards sharing.
Eventually, useful and interesting knowledge can be interpreted from the processed data; this knowledge is incorporated into a scientific article for communication with the scientific community through the publication process. Sufficient data must be included to support the arguments and claims advanced in the article; however, the practical limits of the publication medium mean that the data is shown in a very restricted form (such as a graph or summary table). As well as being submitted for peer review and publication, the paper may be deposited in an eprint archive for immediate dissemination and discussion by the community.

These secondary items (the articles and database tables) can themselves re-enter the research cycle through a citation in a related paper or inclusion in a meta-study, or enter postgraduate training or undergraduate learning activities through a reference in a reading list or in modular materials as part of an online course in a Learning Management System.

The eBank UK project is addressing this challenge of whole-lifecycle reuse by investigating the role of aggregator services in linking data-sets from Grid-enabled projects to e-prints contained in digital repositories through to peer-reviewed articles as resources in e-learning portals.
2. The eBank UK Project – addressing the data publication bottleneck

This innovative JISC-funded project, which is led by UKOLN in partnership with the Universities of Southampton and Manchester, is seeking to build the links between e-research data, scholarly communication and other on-line sources. It is working in the chemistry domain with the EPSRC-funded eScience testbed CombeChem [combechem], a pilot project that seeks to apply the Grid philosophy to integrate existing structure and property data sources into an information and knowledge environment. The specific exemplar chosen from this subject area is crystallography, as it has a strict workflow and produces data that is rigidly formatted to an internationally accepted standard, and it demonstrates the usefulness of an end-to-end philosophy, tracking from laboratory to literature and back again (figure 2).

Figure 2. A generalised workflow for crystallography experiments.

The EPSRC National Crystallography Service (NCS) is housed in the School of Chemistry at the University of Southampton, and is an ideal case study due to its high sample throughput, state-of-the-art instrumentation, expert personnel and profile in the academic chemistry community. Moreover, recent advances in crystallographic technology and computational resources have caused an explosion of crystallographic data, as shown by the recent exponential growth of the Cambridge Structural Database (CSD). However, despite this rise it is commonly recognised that, via traditional publication routes, only approximately 20% of the data generated globally is reaching the public domain. The situation is even worse in the high-throughput NCS scenario, where only approximately 15% of its output is disseminated, despite the service producing more than 60 peer-reviewed journal articles per annum. With the advent of the eScience environment imminent, this problem can only get more severe.
The key issue is that the accepted unit of dissemination is the peer-reviewed journal article; the relationship between the article and the collection of data that underpins it (as made available in the published article itself) is weak, being expressed in reduced graphical or tabular form. Consequently, not only is a small minority of scientific research being published in recognised outlets, but the amount of useful information being disseminated by those articles is a tiny fraction of the available (original) information.

Figure 3. A crystallographic EPrint record, containing bibliographic information, an interactive visualisation of the derived structure, metadata and links to all the data generated during the course of the experiment.
To improve dissemination of published articles, the Open Access Initiative allows researchers to share metadata describing papers that they make available in 'institutional' or 'subject-based' repositories. Building on the OAI concept, we present an institutional repository that also makes available all the raw, derived and results data from a crystallographic experiment that cannot be included in the peer-reviewed published article (figure 3). A schema for the crystallographic experiment has been devised that details crystallographic metadata items and is built on a generic schema for scientific experiments.

The deposition process generates metadata to be presented to the OAI interface by two different mechanisms. Firstly, the depositor must enter metadata manually, which is mainly common bibliographic information but also includes chemical information items that have been added to the conventional Dublin Core schema.
[Workflow diagram labels: EPrint (Local), EBank (World), JOURNAL PUBLICATION, EBank REPORT, STRUCTURE REPORT (EPrint), CIF, DATASET (contains DATAFILES), RESULTS, DERIVED, RAW DATA, HOLDING, INVESTIGATION]
Files and metadata associated with each stage of the crystallography workflow:

| Stage | Description of the stage | Files (type): description | Metadata (type) |
|---|---|---|---|
| Initialisation | Mount new sample on diffractometer; parameterisation to set up data collection | `datcol.non` (ASCII): parameters for producing i*.kcd files; `i*.kcd` (BINARY): unit cell determination images; `.drx` (ASCII) | Unit cell; Morphology (STRING) |
| Collection | Collect data | `collect.rmat` (ASCII): orientation of the crystal; `datcol.non` (ASCII): command file; `.py` (ASCII): proprietary configuration file; `s*.kcd` (BINARY): diffraction images; `*scan*.jpg` (JPG): visual version of .kcd file | Instrument_Type (STRING); Temperature (INTEGER); Software_Name (STRING, x n); Software_Version (INTEGER, x n); Software_URL (URL, x n) |
| Processing | Process and correct images | `scale_all.in` (ASCII): result of processing; `scale_all.out` (ASCII): result of correction on processed data; `.hkl` (ASCII): derived data set; `.htm` (HTML): report file | Cell_a, Cell_b, Cell_c, Cell_alpha, Cell_beta, Cell_gamma (INTEGER); Crystal_system (STRING); Completeness (INTEGER); Software_Name (STRING, x n); Software_Version (INTEGER, x n); Software_URL (URL, x n) |
| Solution | Solve structures | `.prp` (ASCII): symmetry file, log of process; `xs.lst` (ASCII): solution log file | Space_group (STRING); Figure_of_merit (INTEGER); Software_Name (STRING, x n); Software_Version (INTEGER, x n); Software_URL (URL, x n) |
| Refinement | Refine structure | `xl.lst` (ASCII): final refinement listing; `.res` (ASCII): output coordinates | R1_obs, wR2_obs, R1_all, wR2_all (INTEGER); Software_Name (STRING, x n); Software_Version (INTEGER, x n); Software_URL (URL, x n) |
| CIF | Produce CIF | `.cif` (ASCII): final results | — |
| Report | Generate e-Print report | `.html` (HTML): publication format (HTML/XHTML) | Authors, Affiliations, Formula, Compound_name, 2D_diagram (STRING) |
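A record conforming to one stage of the schema above might be modelled in memory as follows. This is an illustrative sketch only: the class names and example values are ours, not part of the eBank schema, and only the field names mirror the table:

```python
# Hypothetical in-memory model of one stage of the crystallography
# experiment schema shown above. Field names follow the table; the
# classes and example values are illustrative, not the project's code.
from dataclasses import dataclass, field

@dataclass
class StageFile:
    name: str          # e.g. "xl.lst"
    file_type: str     # e.g. "ASCII"
    description: str

@dataclass
class Stage:
    name: str
    description: str
    files: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # metadata name -> value

# Example: the Refinement stage, with made-up metadata values.
refinement = Stage(
    name="Refinement",
    description="Refine structure",
    files=[StageFile("xl.lst", "ASCII", "Final refinement listing"),
           StageFile(".res", "ASCII", "Output coordinates")],
    metadata={"R1_obs": 0.041, "Software_Name": ["example-program"]},
)
```

Grouping files and metadata per stage in this way mirrors the table's structure: each workflow step carries both the artefacts it produced and the descriptive items that make them searchable.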
The chemical information includes, for example, an InChI code, an international identifier that uniquely describes the two dimensional structure.

During the deposition of data in a Crystallographic EPrint, metadata items are seamlessly extracted and indexed for searching at the local archive level. The top level document includes 'Dublin Core bibliographic' and 'chemical identifier' metadata elements in an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [OAIPMH] compliant form, which allow access to a secondary level of searchable crystallographic metadata items (figure 4), which are in turn directly linked to the associated archived data. In this manner the output from a crystallographic experiment may be disseminated as 'data' in such a way that aggregator services and researchers may add value to it and transform it into knowledge, and the publication bottleneck problem can be overcome.
3. Architecture and Data Flow

The metadata about the datasets available in the repository will be harvested by a central service using the OAI-PMH. This metadata will then be indexed together with any other available metadata about research publications. A searchable interface will enable users to discover datasets as well as related literature. The metadata contains links back to the datasets, which the user will be able to follow in order to obtain access to the original data, when this is available. Harvested metadata from different repositories not only provides a common entry point to potentially disparate resources (such as datasets in dataset repositories and published literature which may reside elsewhere) but also offers the potential of enhancement of the metadata, such as the addition of subject keywords to research datasets based on the knowledge of subject classification terms assigned to related publications. A further area of work investigates the embedding of the search interface within web sites, adopting their look-and-feel. The PSIgate portal (http://www.psigate.ac.uk/) will be used to pilot these embedding techniques based on CGI-mechanisms and portal-related standards, particularly for resource discovery in a teaching and research training context.
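The harvest-then-index flow described above can be sketched as follows. The base URL is a placeholder and the parsing handles only a minimal oai_dc response, so this is an illustration of the OAI-PMH ListRecords exchange rather than the eBank aggregator's actual implementation:

```python
# Minimal sketch of the OAI-PMH harvesting step: build a ListRecords
# request and pull (identifier, title) pairs out of the XML response.
# The repository URL used below is a placeholder, not a real endpoint.
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Compose a ListRecords request; 'from' enables cumulative harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # harvest only records updated since then
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_records(response_xml):
    """Extract (OAI identifier, dc:title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    out = []
    for record in root.iter(OAI + "record"):
        ident = record.find(f"{OAI}header/{OAI}identifier")
        title = record.find(f".//{DC}title")
        out.append((ident.text, title.text if title is not None else None))
    return out

url = list_records_url("http://example.org/oai", from_date="2004-01-01")
# "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2004-01-01"
```

The `from` argument is what makes the cumulative, last-updated-date harvesting discussed in section 4 possible: an aggregator records when it last harvested and asks each repository only for records changed since then.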
During deposition of the crystallographic experiment data sets into the e-data repository, a number of descriptive elements are either entered by the depositor (figure 6), or generated automatically, as described previously. The repository is OAI-PMH compliant, such that it can respond to a number of (HTTP) requests made by 'harvesters'. Harvesters collect metadata from the e-data repositories and can pool metadata from distributed sources, thus performing the function of aggregators.

Figure 4. A representation of the crystallography experiment schema, indicating all the files generated during the course of the experiment.
Figure 5. The schema elements presented to the OAI interface.

| Data Name | Data Description | Data Type | XML wrapped content |
|---|---|---|---|
| EPrint_type | 'Crystal Structure' | String | Phrase 'Crystal Structure' |
| Creators | ePrint author(s) | String | ePrint authors 'Surname, Christian name, initial' |
| Affiliations | Institution(s) of creator(s) | String | Various authors' addresses |
| Formula_empirical | Total atom count | String | Atom symbols with their total count (can be a real number) as subscript |
| Title | IUPAC chemical name | String | Chemical name with text & integers |
| CCDC_Code | Cambridge Structural Database identifier | String | 6 digit code |
| Compound_class | Chemical category | String | 1 word descriptor of chemical category |
| Available_data | Actual data available for the various eData Report stages | String (set) | Presence of data associated with all stages |
| Related_publications | Other output relating to this compound/structure | String | Literature reference link |
| Publication_date | Date of releasing eData Report to eBank/world | String | Date of public release of ePrint |
| Keywords | Undefined descriptors | String | Phrase describing chemical relevance |
| Structure_diagram | 3D structure diagram | CML | Three dimensional structural diagram in CML format |
| InChI | International Chemical Identifier | String | Unique compound identifier (contains some structural information) |
ments
the services that it can offer are directly related
to the quality, quantity and content of the
metadata that is available (see section 4).
4. Meeting the interoperability
challenge: design of an XML schema
for harvesting.
The OAI-PMH supports selective harvesting of
repositories, based on the partitioning of
resources into ‘sets’ or on last updated date, so
that cumulative harvesting of repositories can be
performed. Data Exchange in the OAI-PMH is
carried out in XML. Whilst the protocol
specifies provision of the simple Dublin Core
set of elements as a minimum interoperability
requirement, the exc
the services that it can offer are directly related
to the quality, quantity and content of the
metadata that is available (see section 4).
4. Meeting the interoperability
challenge: design of an XML schema
for harvesting.
The OAI-PMH supports selective harvesting of
repositories, based on the partitioning of
resources into ‘sets’ or on last updated date, so
that cumulative harvesting of repositories can be
performed. Data Exchange in the OAI-PMH is
carried out in XML. Whilst the protocol
specifies provision of the simple Dublin Core
set of elements as a minimum interoperability
requirement, the exc
for
regator has harvested collections of
metadata, it can take on the role of the ‘service
provider’ (in OAI-PMH terms). eBank UK has
developed a demonstrator of one such
aggregator, or service provider.
One immediate benefit of the harvesting
approach is that the service provider can now
act as a single entry point for the discovery of
resources that are distributed in disparate
repositories. Once the metadata has been
harvested by the eBank UK aggregator, an
SGML indexing and search engine is used to
provide the searching functionality over the
metadata. The search results can be presented
to the user through an HTM
e searching functionality over the
metadata. The search results can be presented
to the user through an HTM nificant chemistry information used in
eBank). The metadata, encoded in XML, is
simply wrapped up in the OAI-PMH response
wrappers (also encoded in XML). The
alternative metadata formats can be specified in
the requests by stating the ‘metadata format’
parameter required.
Experience within the ePrints community
has shown that there are significant challenges
to be met when building services on aggregated
metadata [Guy]. Although a simple set of
metadata elements like the Du
nificant chemistry information used in
eBank). The metadata, encoded in XML, is
simply wrapped up in the OAI-PMH response
wrappers (also encoded in XML). The
alternative metadata formats can be specified in
the requests by stating the ‘metadata format’
parameter required.
Experience within the ePrints community
has shown that there are significant challenges
to be met when building services on aggregated
metadata [Guy]. Although a simple set of
metadata elements like the Du
s
s
ginal data resides. Any links made between
different data sets or data sets and published
literature by searching across the aggregated
metadata are revealed to the user through the
display of the search results.
Moreover, the Cheshire search engine
[Cheshire] used in the eBank UK service
provider makes it simple to make the data
available through additional machine interfaces,
such as the SOAP-based Se
ginal data resides. Any links made between
different data sets or data sets and published
literature by searching across the aggregated
metadata are revealed to the user through the
display of the search results.
Moreover, the Cheshire search engine
[Cheshire] used in the eBank UK service
provider makes it simple to make the data
available through additional machine interfaces,
such as the SOAP-based Se
e
e
is for embedding the search service into
external frameworks, such as portals; other
mechanisms that could be employed include
simple CGI-based mechanisms and portal
protocols such as WSRP.
The harvesting approach overcomes the
limitations of cross searching protocols, which
are well-documented in the digital library
literature [Powell, Lossau]. Howeve
is for embedding the search service into
external frameworks, such as portals; other
mechanisms that could be employed include
simple CGI-based mechanisms and portal
protocols such as WSRP.
The harvesting approach overcomes the
limitations of cross searching protocols, which
are well-documented in the digital library
literature [Powell, Lossau]. Howeve
guidelines are required to help reach some
consensus on the specific interpretation of the
use of those elements, for example by
specifying an acceptable minimum of metadata
to be completed, agreeing on the format for
author names and encoding the identifiers of the
resources described [Powell2].
In the eBank UK project we have made use of
the extension mechanisms of the so-called
qualified Dublin Core. We reuse some of the
guidelines are required to help reach some
consensus on the specific interpretation of the
use of those elements, for example by
specifying an acceptable minimum of metadata
to be completed, agreeing on the format for
author names and encoding the identifiers of the
resources described [Powell2].
In the eBank UK project we have made use of
the extension mechanisms of the so-called
qualified Dublin Core. We reuse some of the
6. basic 15 Dublin Core elements, such as creator
and type [SimpleDC]. The identifier field
contains a link to the HTML page in the edata
repository that contains links to all the data sets
subject of the experimental data. The encoding
of these vocabularies in the XML schemas has
been developed in line with the
recommendations of the Dublin Core Metadata
Initiative [DCguidelines], and they can be easily
substituted when standard vocabularies emerge
within the Crystallography community. Further,
the Dublin Core guidelines for
lines], and they can be easily
substituted when standard vocabularies emerge
within the Crystallography community. Further,
the Dublin Core guidelines for
Figure 6. Entering metadata during the
deposition process.
related to an experiment. The metadata also
contains references to the identifiers of
individual sets of data within an experiment, to
enable future links to be made should that data
be referenced elsewhere. Vocabularies specific
to the crystallography data are being used to
describe the datasets types as well as the names
and identifiers of the chemicals which are the
encoding in
d that “Encoding schemes
.”
There are a number of guidelines for emergent
sta
DC should
cilitate the mapping of the elements to simple
Du
ndards (if not the articles
emselves) [announcement]. The eBank UK
pro
proach of offering data (often in
the region of terabytes in size) through one
e other hand, the CCLRC
g a model which attempts to
encoding in
d that “Encoding schemes
.”
There are a number of guidelines for emergent
sta
DC should
cilitate the mapping of the elements to simple
Du
ndards (if not the articles
emselves) [announcement]. The eBank UK
pro
proach of offering data (often in
the region of terabytes in size) through one
e other hand, the CCLRC
g a model which attempts to
XML recommen
should be implemented using the 'xsi:type'
attribute of the XML element for the property
XML recommen
should be implemented using the 'xsi:type'
attribute of the XML element for the property
ndards in the chemistry community
[Chemistry xml]; an example from our schema
is the representation of the Chemical empirical
formula in the following manner:
<dc:subject
xsi:type="ebankterms:empiricalFormula">C288
H200 Cl24 F48 O48 P16 Pd16</dc:subject>
There are a number of reasons for using the
combination of Simple and Qualified Dublin
Core. Firstly, the use of qualified
ndards in the chemistry community
[Chemistry xml]; an example from our schema
is the representation of the Chemical empirical
formula in the following manner:
<dc:subject
xsi:type="ebankterms:empiricalFormula">C288
H200 Cl24 F48 O48 P16 Pd16</dc:subject>
There are a number of reasons for using the
combination of Simple and Qualified Dublin
Core. Firstly, the use of qualified
fa
fa
blin Core, which as mentioned earlier, is a
requisite for OAI-PMH compliant repositories.
Secondly, by re-using and adhering to a well-
documented standard it is hoped that the
applicability of the approach will extend to
other communities that the eBank UK project
hopes to interact with in the future.
In principle, the Dublin Core fields used for
‘bibliographic’ information in the data set
descriptions should work well with descriptions
of published literature available through the
OAI-PMH and which could be harvested and
cross-searched in the eBank UK service. In
practice the Open Access approach is not
currently as widespread in the Chemistry
community as in some other fields, and
metadata available via the OAI-PMH that
describes the published literature is rather
scarce. However, some publishers have
recently become more willing to expose
bibliographic information on articles in journals
that they publish more openly i.e. through
specified open sta
blin Core, which as mentioned earlier, is a
requisite for OAI-PMH compliant repositories.
Secondly, by re-using and adhering to a well-
documented standard it is hoped that the
applicability of the approach will extend to
other communities that the eBank UK project
hopes to interact with in the future.
In principle, the Dublin Core fields used for
‘bibliographic’ information in the data set
descriptions should work well with descriptions
of published literature available through the
OAI-PMH and which could be harvested and
cross-searched in the eBank UK service. In
practice the Open Access approach is not
currently as widespread in the Chemistry
community as in some other fields, and
metadata available via the OAI-PMH that
describes the published literature is rather
scarce. However, some publishers have
recently become more willing to expose
bibliographic information on articles in journals
that they publish more openly i.e. through
specified open sta
th
th
ject has been pro-active in engaging the
leading organisations involved in the quality
control, publication and dissemination of
The project has been pro-active in engaging the leading organisations involved in the quality control, publication and dissemination of crystallography data, and it is hoped that the relevant publishers can be encouraged to release their metadata for interaction with the eBank UK service demo.
One further point that should be noted is that the metadata currently exchanged hides, to an extent, a degree of complexity in the underlying experimental data being deposited by the crystallographers. Such data is often manifested as a series of related files that may have some underlying ordering, based, for example, on the stage of the experiment from which the data was produced. In this context, the eBank UK project is investigating the use of so-called ‘packaging standards’, which attempt to address the need to describe more complex objects. This is an issue of increasing interest to the OAI-PMH community [Herbert]. One other initiative, currently seeking to increase access to scientific (climatic) data, has opted for a simplified approach: an ‘access’ point identified by a single identifier [German]. On the other hand, CCLRC [cclrc] is developing an interdisciplinary model that seeks to make explicit the inherent complexity of scientific datasets, and the eBank UK project could contribute the experience gained to date to this effort.
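To make the idea of a packaging standard concrete, the sketch below shows how a compound dataset might be expressed as an MPEG-21 DIDL-style hierarchy of items and components, in the spirit of [Herbert]. This is a minimal illustration only: the dataset identifier, stage labels and file names are invented, and a production package would carry richer descriptors.

```python
# Illustrative sketch: an MPEG-21 DIDL-style package grouping the related
# files of one crystallography experiment by experimental stage.
# Element names follow the DIDL pattern (DIDL > Item > Component > Resource);
# the identifiers and file names below are hypothetical.
import xml.etree.ElementTree as ET

DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"


def build_package(dataset_id, files_by_stage):
    """Wrap a dataset's files in one DIDL Item, with one sub-Item per stage."""
    ET.register_namespace("didl", DIDL_NS)
    didl = ET.Element(f"{{{DIDL_NS}}}DIDL")
    item = ET.SubElement(didl, f"{{{DIDL_NS}}}Item", {"id": dataset_id})
    for stage, files in files_by_stage.items():
        # One sub-Item per stage preserves the ordering of the experiment.
        stage_item = ET.SubElement(item, f"{{{DIDL_NS}}}Item", {"id": stage})
        for uri in files:
            comp = ET.SubElement(stage_item, f"{{{DIDL_NS}}}Component")
            ET.SubElement(comp, f"{{{DIDL_NS}}}Resource", {"ref": uri})
    return ET.tostring(didl, encoding="unicode")


xml_doc = build_package(
    "ecrystals-001",  # hypothetical dataset identifier
    {
        "collection": ["data/frames-001.img"],
        "refinement": ["data/structure.cif", "data/structure.hkl"],
    },
)
```

A package of this shape could itself be exposed as a metadata record over OAI-PMH, allowing an aggregator to harvest the structure of the dataset rather than a single flat description.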
5. Conclusions – issues and challenges for the future
This paper recounts the lessons learnt so far in the provision of experimental data together with the results of analysis linked to an e-print article, and has assessed the requirements for future work and identified the key challenges that still need to be addressed. Finally, the immediate benefits to the learning & teaching and research communities in the scholarly environment are indicated and considered, together with the potential for very significant impact in the longer term.
An immediate challenge for the project is to work with the crystallographic community to develop an internationally standard vocabulary for the exchange of structural chemistry data via OAI-PMH procedures. The eBank UK approach must then be promoted to crystallography experimentalists and sold to the publishing community as a whole in order to embed it into the scholarly learning and research cycles. At the same time, the issues of maintenance of the software and archive, along with preservation of the data, may also be addressed.
The primary challenge for the future will be to generalise this approach to archiving and disseminating crystallographic data so that it may be employed in other areas of chemistry and also further to all the experimental sciences.
References
[announcement] Institute of Physics. http://www.iop.org/news/0467j
[Chemistry xml] XML Data Dictionaries in Chemistry. http://www.iupac.org/projects/2002/2002-022-1-024.html
[cclrc] Sufi, S., Matthews, B. & Kleese van Dam, K. (2003). An interdisciplinary model for the representation of scientific studies and associated data holdings. UK e-Science All Hands Meeting, Nottingham, 2-4 September 2003. http://www.nesc.ac.uk/events/ahm2003/AHMCD/pdf/020.pdf
[Cheshire] Cheshire Web Page. http://cheshire.berkeley.edu/
[combechem] Frey, J. G., Bradley, M., Essex, J. W., Hursthouse, M. B., Lewis, S. M., Luck, M. M., Moreau, L., De Roure, D. C., Surridge, M. and Welsh, A. (2003). Grid Computing – Making the Global Infrastructure a Reality, in Berman, F., Fox, G. and Hey, T., Eds. Wiley Series in Communications Networking and Distributed Systems, pages 945-962. John Wiley and Sons.
[dcguidelines] Andy Powell and Pete Johnston (2003). Guidelines for implementing Dublin Core in XML. http://www.dublincore.org/documents/2003/04/02/dc-xml-guidelines/
[eBank UK Project] eBank UK Project. JISC/EPSRC E-Science. http://www.ukoln.ac.uk/projects/ebank-uk/
[German] Publication and Citation of Scientific Primary Data. DFG-Support Program: "Information-infrastructure of network-based scientific-cooperation and digital publication". Project page: http://www.std-doi.de/
[Guy] Marieke Guy and Andy Powell (2004). Improving the Quality of Metadata in Eprint Archives. Ariadne, Issue 38, January 2004. http://www.ariadne.ac.uk/issue38/guy/intro.html
[Herbert] Jeroen Bekaert, Patrick Hochstenbach and Herbert Van de Sompel (2003). Using MPEG-21 DIDL to Represent Complex Digital Objects in the Los Alamos National Laboratory Digital Library. D-Lib Magazine, Vol. 9, No. 11. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
[Lossau] Norbert Lossau (2004). Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet. D-Lib Magazine, Volume 10, Number 6, June 2004. ISSN 1082-9873. http://www.dlib.org/dlib/june04/lossau/06lossau.html
[oai-PMH] The Open Archives Initiative Protocol for Metadata Harvesting. http://www.openarchives.org/OAI/openarchivesprotocol.html
[powell] Andy Powell (2001). An OAI Approach to Sharing Subject Gateway Content. Poster paper, 10th WWW conference, Hong Kong, May 2001. http://www.rdn.ac.uk/publications/www10/oaiposter/
[powell2] Andy Powell, Michael Day and Peter Cliff (2003). Using simple Dublin Core to describe eprints. March 2003. http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/
[SKC] Liz Lyon (2003). eBank UK – building the links between research data, scholarly communication and learning. Ariadne, Issue 36. http://www.ariadne.ac.uk/issue36/lyon/
[SRW] SRW Search/Retrieve Web Service. http://www.loc.gov/z3950/agency/zing/srw/
[SimpleDC] Expressing Simple Dublin Core in XML. http://www.dublincore.org/documents/2002/07/31/dcmes-xml/