ELIXIR is a European research infrastructure for biological information that aims to support life science research. It brings together major bioinformatics providers and is supported by 17 EU member states. ELIXIR works to safeguard biological data and build sustainable data services. It establishes a distributed infrastructure to handle the large growth of data and provides tools, services, and platforms to facilitate access and analysis of data. ELIXIR also develops standards and provides training to support computational biology. Key activities include establishing national nodes, technical task forces, and pilot projects in areas like cloud resources, data transfer, and linking distributed databases.
ELIXIR aims to establish a pan-European infrastructure for biological information to support life sciences research. It will do this by coordinating nodes that provide services and resources, establishing standards, and closing skills gaps. Key challenges include sustaining data and services, ensuring interoperability, and dealing with increasingly large datasets. ELIXIR is working on pilots and task forces to address issues like cloud computing, storage, authentication and authorization.
This document discusses challenges in life sciences data management and services provided by ELIXIR to address these challenges. ELIXIR aims to facilitate life sciences research by building a sustainable infrastructure for biological data in Europe. It coordinates several nodes across member states that provide specialized data services. ELIXIR is also running pilot projects to test integration of services, including providing cloud access to reference data and distributed authentication and access to clinical archives. Future challenges include sustaining funding and scaling to handle exponentially growing data volumes.
ELIXIR is a European research infrastructure that aims to facilitate life sciences research by coordinating the development of sustainable bioinformatics services and tools across Europe. It is working on several pilot projects to improve data integration, access to cloud computing resources, and authentication and authorization processes for sensitive data. The document discusses ELIXIR's goals and various technical work streams and task forces focused on developing strategies and standards to address challenges in integrating distributed biological data resources.
The Norway Node of ELIXIR will provide expertise and infrastructure focused on marine resources and medical research. It will address challenges in processing and analyzing data from next-generation sequencing and other high-throughput methods. The Node will collaborate with several Norwegian universities to provide genomic resources, tools, and training, with emphasis on marine genomics, biomedical analysis, and secure handling of personal health data.
EMBL Australia Bioinformatics Resource BioInfoSummer 2016Philippa Griffin
EMBL-ABR is a distributed national research infrastructure in Australia that provides bioinformatics support to life science researchers. It aims to increase Australia's capacity to analyze large heterogeneous datasets, contribute to developing best practices in data management and tools, and enable engagement in international bioinformatics programs. EMBL-ABR has nodes across Australian institutions and works to showcase Australian research internationally and coordinate training in bioinformatics.
DisGeNET: A discovery platform for the dynamical exploration of human disease...Núria Queralt Rosinach
The document describes DisGeNET, a discovery platform for exploring gene-disease associations through integrated data sources including expert-curated databases and text mining of biomedical literature. DisGeNET contains over 17,000 genes and 14,000 diseases with 429,000 associations, integrating information from various sources through normalization. It provides tools for users to explore gene-disease relationships and supports biomedical research.
ELIXIR aims to establish a pan-European infrastructure for biological information to support life sciences research. It will do this by coordinating nodes that provide services and resources, establishing standards, and closing skills gaps. Key challenges include sustaining data and services, ensuring interoperability, and dealing with increasingly large datasets. ELIXIR is working on pilots and task forces to address issues like cloud computing, storage, authentication and authorization.
This document discusses challenges in life sciences data management and services provided by ELIXIR to address these challenges. ELIXIR aims to facilitate life sciences research by building a sustainable infrastructure for biological data in Europe. It coordinates several nodes across member states that provide specialized data services. ELIXIR is also running pilot projects to test integration of services, including providing cloud access to reference data and distributed authentication and access to clinical archives. Future challenges include sustaining funding and scaling to handle exponentially growing data volumes.
ELIXIR is a European research infrastructure that aims to facilitate life sciences research by coordinating the development of sustainable bioinformatics services and tools across Europe. It is working on several pilot projects to improve data integration, access to cloud computing resources, and authentication and authorization processes for sensitive data. The document discusses ELIXIR's goals and various technical work streams and task forces focused on developing strategies and standards to address challenges in integrating distributed biological data resources.
The Norway Node of ELIXIR will provide expertise and infrastructure focused on marine resources and medical research. It will address challenges in processing and analyzing data from next-generation sequencing and other high-throughput methods. The Node will collaborate with several Norwegian universities to provide genomic resources, tools, and training, with emphasis on marine genomics, biomedical analysis, and secure handling of personal health data.
EMBL Australia Bioinformatics Resource BioInfoSummer 2016Philippa Griffin
EMBL-ABR is a distributed national research infrastructure in Australia that provides bioinformatics support to life science researchers. It aims to increase Australia's capacity to analyze large heterogeneous datasets, contribute to developing best practices in data management and tools, and enable engagement in international bioinformatics programs. EMBL-ABR has nodes across Australian institutions and works to showcase Australian research internationally and coordinate training in bioinformatics.
DisGeNET: A discovery platform for the dynamical exploration of human disease...Núria Queralt Rosinach
The document describes DisGeNET, a discovery platform for exploring gene-disease associations through integrated data sources including expert-curated databases and text mining of biomedical literature. DisGeNET contains over 17,000 genes and 14,000 diseases with 429,000 associations, integrating information from various sources through normalization. It provides tools for users to explore gene-disease relationships and supports biomedical research.
ELIXIR is a European research infrastructure for biological information that aims to facilitate life sciences research across Europe. It brings together life science resources from member countries to build a robust infrastructure for biological data. Individual organizations or countries cannot achieve this alone. ELIXIR establishes national nodes that leverage local strengths and priorities to deliver shared services through a distributed network. This allows the infrastructure to scale effectively with increasing data challenges. ELIXIR also works to improve data integration and interoperability across distributed resources through activities like developing standards and linking related communities.
This document provides an overview of the European Life Sciences Infrastructure for Biological Information (ELIXIR). ELIXIR aims to build a sustainable infrastructure for biological data by coordinating existing life science resources across Europe. It will provide services for data, tools, computing, standards, and training. ELIXIR is establishing pilot projects to test integration of these services and address challenges like data access and scale. A Technical Coordination Group leads implementation and coordinates task forces to develop each area of the infrastructure. ELIXIR also partners with e-infrastructures to utilize high-performance computing and networking resources. Its goal is to support life science research and translation to areas like medicine, the environment, and bioindustry.
- Data transfer
- Compute access
- Training
Data Hub:
- Secure data sharing
- Data management
- Metadata catalogues
AAI:
- Single sign-on
- User attributes
- Group management
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
Text (personal views position statement) to accompany presentation on what research infrastructures really need for data, XLDB-Europe, 8-10th June 2011, Edinburgh
eROSA Stakeholder WS1: Ensembl, ELIXIR and engineering interconnectionse-ROSA
EMBL-EBI provides biological data services and is home to many data resources including Ensembl, Gene Ontology, Protein Data Bank, and others. It handles large volumes of data and users. Ensembl is a suite of software for genome analysis and visualization that was originally for vertebrates but now includes other taxa. ELIXIR coordinates biological data resources across Europe to enable access and ensure data is FAIR. There are challenges in agreeing standards, funding, and divisions of labor across many countries and domains.
The document discusses ELIXIR, the European Life Sciences Infrastructure for Biological Information. It provides information on ELIXIR's governance structure, member countries, and nodes. The nodes work with the central ELIXIR hub to develop and deliver bioinformatics services, resources, training, and more. The goal is to support life science research through integrated, interoperable data and tools.
Pistoia Alliance Debates - Data Sustainability 15-07-2015 16.00Pistoia Alliance
This document summarizes a webinar on sustaining translational data for secondary use. It introduces the panelists, who work in areas like pharmaceutical R&D and ELIXIR. Examples of sustainability models are discussed, such as those used by private companies, public-private partnerships like IMI, and disease organizations. ELIXIR services for archiving and discovering biomedical data, like the European Genome-phenome Archive, are also summarized. The webinar involved a panel discussion and Q&A on challenges of sustaining translational data.
ELIXIR UK Node presentation to the ELIXIR BoardCarole Goble
The document provides information about ELIXIR-UK, which is the UK node of ELIXIR, a European infrastructure for biological information. ELIXIR-UK is a network of 18 UK organizations and has established training programs, services, and communities. It coordinates UK participation in ELIXIR and related EU projects. ELIXIR-UK also works to establish interoperability across biological data resources and help make these resources FAIR (Findable, Accessible, Interoperable, Reusable). It is working to establish BioFAIR, a proposed new institute that would coordinate UK life science data infrastructure.
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based, and support researchers across their whole RDM lifecycle, navigating the complexity this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aims to do just that.
The document discusses ELIXIR, the European infrastructure for biological information. ELIXIR aims to build a sustainable infrastructure to support life science research and its translation to various sectors. It coordinates existing bioinformatics services across Europe. The Italian node of ELIXIR (ELIXIR-ITA) coordinates domestic bioinformatics resources and connects them to the broader ELIXIR network. It works to aggregate and integrate Italy's small but excellent bioinformatics community and supports the large amount of data being generated through high-throughput sequencing platforms in Italy.
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
Gianpaolo Coro, ISTI-CNR, at BlueBRIDGE workshop on "Data Management services to support stock assessement", held during the Annual ICES Science conference 2016
The Tryggve project facilitates cross-border biomedical research by providing secure computing services across the Nordic countries. These services allow sensitive human data to be analyzed while protecting individual privacy through secure data storage, transfer and computing environments. The goal is to enable research collaboration while preventing unauthorized access to personal health data.
This document discusses standards and data integration in life sciences databases. It notes that there are many diverse and dispersed databases in molecular biology. Standards facilitate data sharing, integration, and reuse by defining common data representation and description formats. However, integrating data across different sources is challenging due to variables like different interfaces, data types, and levels of information. Initiatives like ELIXIR and HUPO PSI aim to improve interoperability between life sciences resources through defining community standards and best practices.
The document summarizes discussions from the Technical Coordinator Group (TCG) meeting. The TCG is an advisory body to the Heads of Nodes Committee and consists of technical experts from each ELIXIR Node. They discuss technical and scientific aspects of ELIXIR and identify best practices. The summary outlines the members of TCG, describes short term working groups led by technical coordinators on specific technical efforts, and provides updates on various ELIXIR task forces focusing on areas such as cloud, storage, authentication and authorization, service registry, and training.
This document discusses the challenges of interdisciplinarity and managing large amounts of biological data. It notes that while challenges seem easy from the perspective of one's own discipline, knowledge from other disciplines is needed. It then lists the names and affiliations of members of the "Dutch Team" working on issues related to biological data integration and management.
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 5Gianpaolo Coro
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations. e-Infrastructures allow scientists residing at distant places to collaborate. They offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, e.g. data transfer, data harmonization, data processing workflows etc. e-Infrastructures are gaining an important place in the field of biodiversity conservation. Their computational capabilities help scientists to reuse models, obtain results in shorter time and share these results with other colleagues. They are also used to access several and heterogeneous biodiversity catalogues.
In this course, the D4Science e-Infrastructure will be used to conduct experiments in the field of biodiversity conservation. D4Science hosts models and contributions by several international organizations involved in the biodiversity conservation field. The course will give students an overview of the models, the practices and the methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, the course will introduce students to the basic concepts under e-Infrastructures, Virtual Research Environments, data sharing and experiments reproducibility.
The document summarizes a workshop aimed at integrating resources between several bioinformatics standards registries. The workshop agenda includes presentations from Identifiers.org, BioSharing, BMB Service Registry, EDAM ontology, and the BMB standards registry. Breakout sessions will identify overlaps and potential synergies between the registries, and define areas for collaboration. The goal is to reduce duplication of efforts and develop a common integration and development strategy across registries.
Proteomics repositories integration using EUDAT resourcesRafael C. Jimenez
This document discusses plans to integrate proteomics data repositories using resources from the EUDAT data infrastructure. It describes replicating data from the ELIXIR repository PRIDE to EUDAT data centers for backup and access. This will test using EUDAT services like B2SAFE for replication and assigning persistent identifiers (PIDs) to datasets and files. The current status describes installing necessary software at participating sites and initial testing of replication from PRIDE to the Swedish National Bioinformatics Infrastructure data center. Future plans include syncing data changes and exploring data push/pull models between repositories.
ELIXIR is a European research infrastructure for biological information that aims to facilitate life sciences research across Europe. It brings together life science resources from member countries to build a robust infrastructure for biological data. Individual organizations or countries cannot achieve this alone. ELIXIR establishes national nodes that leverage local strengths and priorities to deliver shared services through a distributed network. This allows the infrastructure to scale effectively with increasing data challenges. ELIXIR also works to improve data integration and interoperability across distributed resources through activities like developing standards and linking related communities.
This document provides an overview of the European Life Sciences Infrastructure for Biological Information (ELIXIR). ELIXIR aims to build a sustainable infrastructure for biological data by coordinating existing life science resources across Europe. It will provide services for data, tools, computing, standards, and training. ELIXIR is establishing pilot projects to test integration of these services and address challenges like data access and scale. A Technical Coordination Group leads implementation and coordinates task forces to develop each area of the infrastructure. ELIXIR also partners with e-infrastructures to utilize high-performance computing and networking resources. Its goal is to support life science research and translation to areas like medicine, the environment, and bioindustry.
- Data transfer
- Compute access
- Training
Data Hub:
- Secure data sharing
- Data management
- Metadata catalogues
AAI:
- Single sign-on
- User attributes
- Group management
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
National
resources
Text (personal views position statement) to accompany presentation on what research infrastructures really need for data, XLDB-Europe, 8-10th June 2011, Edinburgh
eROSA Stakeholder WS1: Ensembl, ELIXIR and engineering interconnectionse-ROSA
EMBL-EBI provides biological data services and is home to many data resources including Ensembl, Gene Ontology, Protein Data Bank, and others. It handles large volumes of data and users. Ensembl is a suite of software for genome analysis and visualization that was originally for vertebrates but now includes other taxa. ELIXIR coordinates biological data resources across Europe to enable access and ensure data is FAIR. There are challenges in agreeing standards, funding, and divisions of labor across many countries and domains.
The document discusses ELIXIR, the European Life Sciences Infrastructure for Biological Information. It provides information on ELIXIR's governance structure, member countries, and nodes. The nodes work with the central ELIXIR hub to develop and deliver bioinformatics services, resources, training, and more. The goal is to support life science research through integrated, interoperable data and tools.
Pistoia Alliance Debates - Data Sustainability 15-07-2015 16.00Pistoia Alliance
This document summarizes a webinar on sustaining translational data for secondary use. It introduces the panelists, who work in areas like pharmaceutical R&D and ELIXIR. Examples of sustainability models are discussed, such as those used by private companies, public-private partnerships like IMI, and disease organizations. ELIXIR services for archiving and discovering biomedical data, like the European Genome-phenome Archive, are also summarized. The webinar involved a panel discussion and Q&A on challenges of sustaining translational data.
ELIXIR UK Node presentation to the ELIXIR BoardCarole Goble
The document provides information about ELIXIR-UK, which is the UK node of ELIXIR, a European infrastructure for biological information. ELIXIR-UK is a network of 18 UK organizations and has established training programs, services, and communities. It coordinates UK participation in ELIXIR and related EU projects. ELIXIR-UK also works to establish interoperability across biological data resources and help make these resources FAIR (Findable, Accessible, Interoperable, Reusable). It is working to establish BioFAIR, a proposed new institute that would coordinate UK life science data infrastructure.
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based, and support researchers across their whole RDM lifecycle, navigating the complexity this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aims to do just that.
The document discusses ELIXIR, the European infrastructure for biological information. ELIXIR aims to build a sustainable infrastructure to support life science research and its translation to various sectors. It coordinates existing bioinformatics services across Europe. The Italian node of ELIXIR (ELIXIR-ITA) coordinates domestic bioinformatics resources and connects them to the broader ELIXIR network. It works to aggregate and integrate Italy's small but excellent bioinformatics community and supports the large amount of data being generated through high-throughput sequencing platforms in Italy.
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
Gianpaolo Coro, ISTI-CNR, at BlueBRIDGE workshop on "Data Management services to support stock assessement", held during the Annual ICES Science conference 2016
The Tryggve project facilitates cross-border biomedical research by providing secure computing services across the Nordic countries. These services allow sensitive human data to be analyzed while protecting individual privacy through secure data storage, transfer and computing environments. The goal is to enable research collaboration while preventing unauthorized access to personal health data.
This document discusses standards and data integration in life sciences databases. It notes that there are many diverse and dispersed databases in molecular biology. Standards facilitate data sharing, integration, and reuse by defining common data representation and description formats. However, integrating data across different sources is challenging due to variables like different interfaces, data types, and levels of information. Initiatives like ELIXIR and HUPO PSI aim to improve interoperability between life sciences resources through defining community standards and best practices.
The document summarizes discussions from the Technical Coordinator Group (TCG) meeting. The TCG is an advisory body to the Heads of Nodes Committee and consists of technical experts from each ELIXIR Node. They discuss technical and scientific aspects of ELIXIR and identify best practices. The summary outlines the members of TCG, describes short term working groups led by technical coordinators on specific technical efforts, and provides updates on various ELIXIR task forces focusing on areas such as cloud, storage, authentication and authorization, service registry, and training.
This document discusses the challenges of interdisciplinarity and managing large amounts of biological data. It notes that while challenges seem easy from the perspective of one's own discipline, knowledge from other disciplines is needed. It then lists the names and affiliations of members of the "Dutch Team" working on issues related to biological data integration and management.
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 5Gianpaolo Coro
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations. e-Infrastructures allow scientists residing at distant places to collaborate. They offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, e.g. data transfer, data harmonization, data processing workflows etc. e-Infrastructures are gaining an important place in the field of biodiversity conservation. Their computational capabilities help scientists to reuse models, obtain results in shorter time and share these results with other colleagues. They are also used to access several and heterogeneous biodiversity catalogues.
In this course, the D4Science e-Infrastructure will be used to conduct experiments in the field of biodiversity conservation. D4Science hosts models and contributions by several international organizations involved in the biodiversity conservation field. The course will give students an overview of the models, the practices and the methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, the course will introduce students to the basic concepts under e-Infrastructures, Virtual Research Environments, data sharing and experiments reproducibility.
The document summarizes a workshop aimed at integrating resources between several bioinformatics standards registries. The workshop agenda includes presentations from Identifiers.org, BioSharing, BMB Service Registry, EDAM ontology, and the BMB standards registry. Breakout sessions will identify overlaps and potential synergies between the registries, and define areas for collaboration. The goal is to reduce duplication of efforts and develop a common integration and development strategy across registries.
Proteomics repositories integration using EUDAT resourcesRafael C. Jimenez
This document discusses plans to integrate proteomics data repositories using resources from the EUDAT data infrastructure. It describes replicating data from the ELIXIR repository PRIDE to EUDAT data centers for backup and access. This will test using EUDAT services like B2SAFE for replication and assigning persistent identifiers (PIDs) to datasets and files. The current status describes installing necessary software at participating sites and initial testing of replication from PRIDE to the Swedish National Bioinformatics Infrastructure data center. Future plans include syncing data changes and exploring data push/pull models between repositories.
- ELIXIR is a European research infrastructure for biological information that aims to facilitate life sciences research. It brings together over 100 bioinformatics service providers from 17 EU member states.
- The large increase in biological data from sources like DNA sequencing and mass spectrometry is outpacing storage capabilities and transfer speeds. This "data deluge" threatens to overwhelm existing infrastructure for data sharing and analysis in life sciences.
- Cloud computing provides potential solutions like more storage, data compression, keeping data close to computation, and provisioning researchers directly with storage and tools. ELIXIR and Google Cloud Platform UK discussed collaborating to host processed data, provide joint solutions for large data producers, and leverage Google Cloud capabilities to help
The European life-science data infrastructure: Data, Computing and Services ...Rafael C. Jimenez
The document provides an update on the European Life Sciences Infrastructure for Biological Information (ELIXIR). ELIXIR aims to establish a distributed infrastructure to handle the growing volume of life science data. It coordinates several national nodes that provide bioinformatics resources and services. Key recent activities include establishing legal agreements with member states, developing a technical coordinator network, and running pilot projects to test solutions and foster collaboration between nodes. Moving forward, priorities include further establishing the infrastructure and community, providing visible and useful services to users, and ensuring sustainable data management.
The European Life Sciences Infrastructure for Biological Information (ELIXIR) coordinates biological data resources across Europe. It has several task forces working on key issues. This document summarizes discussions from the Technical Coordinators Group (TCG) meeting and provides updates from 7 task forces: Cloud, Storage, Authentication and Authorization (AAI), Service Registry, Metrics and Monitoring, Communication, and Website. Each section briefly describes the task force's goals, current work, and plans to coordinate with other groups to develop technical strategies for ELIXIR.
This document provides an introduction to programmatic access and web services for querying biological data resources. It discusses different types of query interfaces including graphical user interfaces, application programming interfaces, and web services. It then focuses on describing web services, including REST and SOAP web services. Examples are given of using PSICQUIC REST and SOAP services to query molecular interaction data. The document also introduces workflows and workflow management systems like Taverna and myExperiment that allow sharing and reusing workflows that combine multiple services.
Life science requirements from e-infrastructure:initial results from a joint...Rafael C. Jimenez
This document summarizes a workshop on life science requirements from e-infrastructure held by BioMedBridges. It discusses how big data is affecting challenges like data growth outpacing storage and transfer speeds. Potential solutions proposed include improving storage, compression, networking, partitioning data, and computing approaches like clouds. The workshop concluded that e-infrastructures need to better understand research infrastructure problems, evaluate bottlenecks, discuss solutions, and define requirements as big data will change current approaches to data sharing and management.
Data submissions and archiving raw data in life sciences. A pilot with Proteo...Rafael C. Jimenez
European Life Sciences Infrastructure for Biological Information aims to provide data infrastructure for biological information sharing. It is running a pilot project with proteomics data to enable standardized submission and dissemination of data between major proteomics resources like PRIDE and PeptideAtlas. The pilot allows direct archiving of raw proteomics data in PRIDE for the first time. It uses the EUDAT program for data storage and access and ProteomeXchange as a framework to link proteomics databases together. The goal is to prepare for the rapid growth of life sciences data and keep up with processing and storing the large volumes of raw data being generated.
SASI, A lightweight standard for exchanging course informationRafael C. Jimenez
The document describes SASI (Scientific Announcement Standards Initiative), a lightweight standard for exchanging course information between life science organizations. It discusses problems with the current redundant and inconsistent annotation and distribution of announcements. The proposed solution is a centralized registry for annotation using agreed-upon standards, with decentralized distribution of announcements. This would allow automatic exchange of standardized announcements to reduce effort and improve discoverability.
The BioJS project is a collection of JavaScript components for presenting biological data. It started as a student project in 2011 and a component registry was released in 2012. The goals are to create reusable visualization components that are easy to install, configure and extend. Components are developed following common guidelines and released with documentation to be shared and enhanced by the community.
Rafael C Jimenez is the technical coordinator for ELIXIR. ELIXIR aims to coordinate life science resources across Europe through a federative infrastructure based on collaboration. Jimenez oversees the implementation of ELIXIR's technical strategy, which includes areas like compute, data, standards, tools, and access. He coordinates nodes, task forces, and central management. The timeline shows ELIXIR moving from a preparatory phase to a permanent phase between 2013 and 2022.
This document discusses Bi JS, a collection of JavaScript components for presenting biological information that follow common guidelines. It describes how Bi JS components are designed and layered, including data representation, code, dependencies, and style. The document outlines benefits for both creating and using Bi JS components such as sharing development, enhancing visibility, having a common structure, and more. It provides two use case examples and information on the BioJS project members and collaborators.
Java tutorial: Programmatic Access to Molecular InteractionsRafael C. Jimenez
This document provides information about a tutorial on programmatic access to molecular interactions. It discusses setting up the required software including Java, Maven, and a Java IDE. It also describes downloading the course source code and preparing the project. The tutorial will cover topics such as standards for molecular interaction data like PSIMITAB and PSICQUIC, as well as exercises on accessing and working with this data programmatically.
An introduction. Programmatic access to interaction resourcesRafael C. Jimenez
This document provides an introduction to networks and pathways bioinformatics for biologists. It discusses different types of interfaces like graphical user interfaces, application programming interfaces, and web services that allow programmatic access to interaction resources. It specifically focuses on web services, describing REST and SOAP web services. Examples are provided of querying PSICQUIC services through REST and SOAP. Workflows and tools like Taverna and myExperiment are introduced for creating, running, sharing and discovering workflows and services. The document demonstrates a simple workflow in Taverna to query molecular interactions from IntAct using PSICQUIC.
The document introduces the BioJS project, which is a collection of JavaScript components for presenting biological information. It began as a student project in 2011 and a registry of components was released by the EMBI in 2012. The BioJS project aims to create reusable components that can be combined to build biological applications. It provides benefits for both developers in sharing and extending components, and for users in reusing standardized visualizations across different sites and institutions.
The document discusses BioJS, a guideline for developing JavaScript components for biological data visualization. It aims to define a common minimal architecture for such components. BioJS is not a framework but a collection of graphical components with a standardized structure. This includes standardized components, options, inputs, documentation, and ways to develop, discover, test, use, combine, customize and extend BioJS components.
Challenges in protein networks representation and integrationRafael C. Jimenez
The document discusses challenges in representing protein networks and integrating interaction data from multiple sources. It describes how the EMBL-EBI is working to address these challenges through the Complex Portal database of protein complexes and use of PSI-MI standards. The standards include an ontology, file formats, and the PSICQUIC common query interface to facilitate access and integration of molecular interaction data from different databases and web services.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
1. European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Rafael C Jimenez
ELIXIR CTO
RDA Fourth Plenary Meeting, ELIXIR Bridging Force IG
2014,Tuesday 23 September
ELIXIR
5. ELIXIR
• European life sciences research infrastructure for
biological information to support all life science
research
• Safeguard data and build sustainable data
services
5
6. ELIXIR Members
6
• Participated by major bioinformatics service providers
(~130) and supported by 17 EU member states and
EMBL
• 11 EU states signed agreement
• Czech Republic, Estonia, Denmark,
Finland,Israel, Netherlands,
Norway,Portugal, Switzerland,
Sweden, UK
• 6 EU signed MoU
• Belgium, Greece, France, Italy,
Slovenia, Spain
8. ELIXIR strategic drivers
1. Establish a distributed infrastructure to scale with the challenge of data
growth
2. Secure and deliver the core data resources underpinning life science research
3. Provide discoverable tools, services and connectors to drive data access and
exploitation
4. Provide robust technical platforms and clouds for secure data access, data
exchange and compute
5. Develop and maintain standards for data management, reuse and integration
6. Drive partnerships with user communities and other organisations to ensure
high impact
7. Close the computational biology skills gap through a comprehensive training
programme for professionals
8. Support innovation in big data biology
8
9. Areas of interest and programme of work
9
Domain specific services
Data interoperability, vocabulary and ontology services
Tools interoperability and Service Registry
Data resources & services
Technical Services
Management and operations
Training
Compute
Dat
a
Standards
Tools
Training
10. • Working groups to coordinate the ELIXIR technical strategy
• Driven by national technical leads
• Plan, agree and implement technical strategies
• Represent ELIXIR on one specific technical topic
Task forces
10
Scientific
and funding
strategies
ELIXIR Programme of work
Task forces
Planning Agreement Implementation
Technical strategies
Pilot projects
HoNs
HoNs
HoNs
TCG
11. Task forces
11
Data interoperability, vocabulary and ontology services
Tools interoperability and Service Registry
Data resources & services
Technical Services
Management and operations
Training
• Interoperability
• Service registry
• Metrics, monitoring & quality control
• Cloud
• Storage
• AAI
• Training portal
• E-learning
• Communications
• Website
12. ELIXIR Pilot Projects
1. ELIXIR Facing Cloud Support and Virtual Machines - with SIB
2. ELIXIR Data IO to pilot the continuous transfer of major archive
resources to a remote European location - with CSC, Finland
3. Establishing EGA Distributed authentication - with CSC, Finland
4. Establishing EGA as joint venture – with CRG, Spain
5. Improving links between Human Proteome Atlas (HPA) and EMBL-
EBI resources
6. BILS-ProteomeXchange integration using EUDAT resources
7. Interoperable controlled-access big data transfer technology for
ELIXIR - application to EGA EBI / CRG ELIXIR collaboration and
beyond
8. Harmonising Marine Metagenomics pipelines
13. ELIXIR technical activities
• ELIXIR Node activities, Task forces, Pilots
• Technical activities among different interest groups …
13
ELIXIR
Technical
activities
BMBESFRIs e-infrastructures
ELIXIR
nodes
Communities
14. Affinity with RDA groups
Life science
• The BioSharing Registry: connecting data policies,
standards & databases in life sciences
• Wheat Data Interoperability WG
• Agriculture Data Interest Group (IGAD)
• Marine Data Harmonization IG
• Metabolomics
• Structural Biology IG
• Toxicogenomics Interoperability IG
14
15. Affinity with RDA groups
• Data Description Registry Interoperability
• Big Data Analytics IG
• Domain Repositories Interest Group
• Education and Training on handling of research data
• Federated Identity Management
• Metadata IG
• Preservation e-Infrastructure IG
• Service Management IG
• Active Data Management Plans
• Sustainability of eResearch / Cyberinfrastructure
15
16. Affinity with RDA groups
• Data Citation WG
• Data Foundation and Terminology WG
• Metadata Standards Directory Working Group
• PID Information Types WG
• RDA/WDS Publishing Data Bibliometrics WG
• RDA/WDS Publishing Data Services WG
• RDA/WDS Publishing Data Workflows WG
• Repository Audit and Certification DSA–WDS Partnership WG
• Long tail of research data IG
• PID Interest Group
• RDA/WDS Publishing Data IG
• Research Data Provenance
16
19. Data resources in life science
• Many
• Diverse
• Disperse
Nucleic Acids Research annual Database Issue
and the NAR online Molecular Biology Database Collection in 2012.
MY Galperin, GR Cochrane – Nucleic Acids Research, 2011
~1800molecular biology
data resources
22. Improving Links Between distributed European
resources
ELIXIR pilot: Interoperability of protein expressions resources
The Human Protein Atlas portal is a publicly available database
with millions of high-resolution images showing the spatial
distribution of proteins in 46 different normal human tissues and
20 different cancer types, as well as 47 different human cell lines.
25. Data types examples
25
Raw data Process data Metadata
DNA
Human
Liver
Mitochondria
W. Smith
…
Peptide
Mouse
Heart
Nucleus
J. Heinz
…
LPISASHSSK…
TTGTTATCCG…
… … …
29. Cross-site VM Operation - pilot
29
• Perform analysis via cloud infrastructures and VMs
• Transfer VMs between computing centers to allow researchers
to perform analyses that they could not otherwise do locally
• Supported by 5 NRENs and in collaboration with
30. Cross-site VM Operation
30
CSC
EMBL-EBI
University of Groningen
Data Analysis
tools
Computation
Data
Analysis
tools
VM
VM
VM
Chipster
200GB
NBIC Galaxy
50GB
GoNL
60TB
ENA
3.2PB 1GB lightpath
1GB lightpath
1GB lightpath
Funet
Janet
SURFnet
31. European ELIXIR Data - ”LightPath” (EBI / CSC)
• Aim
• To explore the replication of large scale
(Petabyte scale) archives to remote sites
• To create a separate source of data files for
challenging DataIO projects
• Update:
• Selection of pilot data transfer technology
between EBI and CSC
• Established a dedicated light path between
datacenters in London and Kajaani
• Development of model for future IO needs in
the lifesciences in Europe
32. REMS - Resource Entitlement Management System
Principal
investigator
Applicant
Research group
Members of the
application
Metadata on
dataset 1&2
Dataset 1
Dataset 2
DAC 1
Approver
DAC 2
Approver
REMS
Workflow
Reports
Entitlements
IdP
IdP
IdP
SP
1. Apply
for access
4. Approve
5. Access
3. Circulate
to approver
2. Commit to
licence terms
• Access to sensitive data (genomics) granted by a Data Access Committee
• In collaboration with eduGAIN
• Agreements to be applied to other domains: FI-CLARIN & FI-CESSDA
https://tnc2013.terena.org/getfile/421 Starting point: https://remsdemo.csc.fi/
33. Training
33
For Big Data to become huge, however, there are still
hurdles to leap. For one thing, the tools to analyse data
are not yet good enough. And people with the
skills to analyse data are scarce and will
become scarcer. By 2018 there will be a “talent
gap” of between 140,000 and 190,000 people, …
34. Challenges
• Sustain data and services
• Make data and service interoperable
• Necessary to integrate data
• Specially medical, clinical and research
• Data too big to store, exchange & compute? Forthcoming
challenges …
• Data production grows faster than storage
• Cost of data production technologies declines faster than storage
• It takes longer to transfer data than produce the data.
• Privacy, security & access (AAI)
• Training
34
35. • EGA is to be distributed
effort with archive,
submission, and data
distribution capacity at both
the EBI and CRG
35
European Genome-phenome archive
• From the users point of view, EGA remains one integrated
Archive of secure human biomedical research data.
• Search of datasets at either website is "global" across the EGA.
37. ELIXIR pilots to address key challenges in
biomedical research:
1. Cloud computing
“Embassy cloud”: Access reference
data in a virtual environment – work
as though you are at EMBL-EBI or
SIB, Switzerland
3. High-Performance Computing
“Lightpath”: Connections for on-demand reference
data to remote HPC centres at EMBL-EBI and CSC
Finland
2. Authentication & Authorisation
Improved methods and processes for
access to clinical data
43. Data sharing
The casual approach
‘data on my disk and available to anyone who requests it'
Submission to data repositories
Will big data affect data deposition?
Creating a robust infrastructure for biological information is a bigger task than any individual organisation or nation can take on alone
The programme of work defines the ELIXIR Strategic plan for the next 4 years
Programme of work
5 years plan
Work streams
Strategy focused on generic topic
Task forces
Define technical implementation of specific tasks
The programme of work defines the ELIXIR Strategic plan for the next 4 years
Test beds for integration of ELIXIR services. Pilot Projects are being staffed by seconded staff from EMBL-EBI.
Cost of storage more than cost of production
Access to sensitive data granted by a Data Access Committee (DAC)
Sensitive data
Knowledgeable about technology, data and services in life sciences and be knowledgeable in ICT
- EGA is to be distributed effort with archive, submission, and data distribution capacity at both the EBI and CRG
- From the users point of view, EGA remains one integrated Archive of secure human biomedical research data.
Search of datasets at either website is "global" across the EGA.
- EBI and CRG teams are working together to improve search functions based on disease and technology platform type and
- Archived datasets may exist at one or both location, providing load balancing and backup in cases where data is mirrored.
- Data can be discovered and requested from either location, and will be streamed from the appropriate archive in a way transparent to users.
necessary to understand, develop or reproduce published research
Not all the data make it to the public repositories
necessary to understand, develop or reproduce published research