Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
"Ontology-centric navigation of the scientific literature"bridgingworlds2008
This document discusses ontology-centric knowledge navigation of scientific literature. It motivates the need for scientists to integrate information from various sources, and notes that over 50% of information is unstructured. It proposes that providing structured access to information according to explicit knowledge representations can help scientists. The document then outlines various applications of semantic web technologies like ontologies, reasoning, and text mining to develop ontology-driven systems that can integrate information from multiple sources and enable complex queries over the structured knowledge.
Research objects aim to preserve digital science by aggregating all elements needed to understand a research investigation, including data, computational processes, and annotations. They promote reuse and verification of reproducibility. The anatomy of a research object includes resources like datasets and workflows that are described and related using semantic technologies. Tools are being developed to work with research objects, and standards like the Open Annotation Data Model and PROV are being used to represent their evolution over time.
Menager H - Mobyle web framework: new features (Jan Aerts)
Mobyle is an easy-to-use command-line and pipeline tool for bioinformatics. It integrates common bioinformatics tools through BMID and supports straightforward pipeline design and execution with BMPS. Upcoming releases will add new editing widgets and improvements to BMID and BMPS; continued development is supported by NIAID and GenOuest. Mobyle is available through apt-get or from its source code repositories.
This document summarizes a talk on open science given by Jonathan Eisen. Some key points:
1. Eisen recounted his early skepticism of open access but eventual conversion after experiences like publishing an open access paper that received more attention.
2. He discussed experiments with openly releasing genomic data that helped convince him of the benefits of openness in science.
3. Eisen argued that limiting access to scientific literature and data hinders scientific progress, and outlined several ways scientists can promote openness.
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
Jan Aerts is a faculty member at the Faculty of Engineering - ESAT/SCD who is involved in genomics research including DNA sequencing of chickens, cows, and humans. Their research aims to identify genetic variations responsible for phenotypes and diseases. They focus on developing visual analytics tools to help with (1) filtering large datasets to find relevant parameters and (2) making sense of patterns in the data such as gene networks. The goal is to help researchers better understand complicated genomic data through interactive visualization.
"Ontology-centric navigation of the scientific literature"bridgingworlds2008
This document discusses ontology-centric knowledge navigation of scientific literature. It motivates the need for scientists to integrate information from various sources, and notes that over 50% of information is unstructured. It proposes that providing structured access to information according to explicit knowledge representations can help scientists. The document then outlines various applications of semantic web technologies like ontologies, reasoning, and text mining to develop ontology-driven systems that can integrate information from multiple sources and enable complex queries over the structured knowledge.
Research objects aim to preserve digital science by aggregating all elements needed to understand a research investigation, including data, computational processes, and annotations. They promote reuse and verification of reproducibility. The anatomy of a research object includes resources like datasets and workflows that are described and related using semantic technologies. Tools are being developed to work with research objects, and standards like the Open Annotation Data Model and PROV are being used to represent their evolution over time.
Menager H - Mobyle web framework: new featuresJan Aerts
Mobyle is an easy to use command line and pipeline tool for bioinformatics. It features integration with common bioinformatics tools through BMID and easy pipeline design and execution with BMPS. Upcoming releases will include new edition widgets, improvements to BMID and BMPS, and continued development is supported by NIAID and GenOuest. Mobyle is available through apt-get or from source code repositories.
This document summarizes a talk on open science given by Jonathan Eisen. Some key points:
1. Eisen recounted his early skepticism of open access but eventual conversion after experiences like publishing an open access paper that received more attention.
2. He discussed experiments with openly releasing genomic data that helped convince him of the benefits of openness in science.
3. Eisen argued that limiting access to scientific literature and data hinders scientific progress, and outlined several ways scientists can promote openness.
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
Jan Aerts is a faculty member at the Faculty of Engineering - ESAT/SCD who is involved in genomics research including DNA sequencing of chickens, cows, and humans. Their research aims to identify genetic variations responsible for phenotypes and diseases. They focus on developing visual analytics tools to help with (1) filtering large datasets to find relevant parameters and (2) making sense of patterns in the data such as gene networks. The goal is to help researchers better understand complicated genomic data through interactive visualization.
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu... (GigaScience, BGI Hong Kong)
Eamonn Maguire's talk on "The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe" at ISCB-Asia, December 17th 2012
Managing Experimental Metadata using ISA data structures discusses using the ISA (Investigation/Study/Assay) format and tools to capture experimental workflows, make annotations explicit and discoverable, and structure descriptions for consistency and tracking. The ISA format supports data provenance tracking using a node/edge concept and tabular representation inspired by object models. It can be applied to experiments in various omics domains like microarrays, sequencing, flow cytometry, and mass spectrometry. The ISA tools provide a suite of modular, open source tools for creating, validating, loading, browsing, and analyzing ISA-formatted metadata and linking it to associated data files.
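The node/edge reading of an ISA-Tab-style row can be sketched in a few lines of Python. The column names below are illustrative, not the normative ISA-Tab set: alternating material/data columns become nodes and "Protocol REF" columns become the process edges connecting them.

```python
# Sketch: reading one row of an ISA-Tab-style table as a provenance chain
# of nodes (materials/data) connected by process edges (protocols).
# Column and sample names here are hypothetical.

row = {
    "Source Name": "patient-01",
    "Protocol REF.1": "sample collection",
    "Sample Name": "blood-01",
    "Protocol REF.2": "RNA extraction",
    "Extract Name": "rna-01",
}

def row_to_graph(row):
    """Alternating node / edge columns become a linear provenance chain."""
    nodes, edges = [], []
    for column, value in row.items():
        if column.startswith("Protocol REF"):
            edges.append(value)      # a process applied to the previous node
        else:
            nodes.append(value)      # a material or data node
    # pair each process edge with the nodes on either side of it
    return [(nodes[i], edges[i], nodes[i + 1]) for i in range(len(edges))]

print(row_to_graph(row))
# [('patient-01', 'sample collection', 'blood-01'),
#  ('blood-01', 'RNA extraction', 'rna-01')]
```

This chain structure is what makes provenance queries ("which protocol produced this extract?") straightforward over the tabular representation.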
Pal gov.tutorial2.session13 1.data schema integration (Mustafa Jarrar)
This document discusses data schema integration, which involves identifying correspondences between different data schemas and resolving conflicts between them to create an integrated schema. It describes challenges in schema integration including identifying corresponding concepts and analyzing conflicts. It then presents a generic framework for schema integration involving schema transformation, schema matching to identify correspondences, and integration and mapping generation to create the integrated schema and mappings. Finally, it provides examples of different types of conflicts and integration methods.
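One common first step in the schema-matching phase described above is name-based matching. The sketch below compares normalized attribute names with a string-similarity score; real matchers also exploit types, instances and structure, and the schemas shown are hypothetical.

```python
# Sketch: a minimal name-based schema matcher. Real matchers combine this
# with type, instance and structural evidence; this compares names only.
from difflib import SequenceMatcher

def normalize(name):
    return name.lower().replace("_", "").replace("-", "")

def match_schemas(attrs_a, attrs_b, threshold=0.8):
    """Return candidate correspondences between two attribute lists."""
    matches = []
    for a in attrs_a:
        for b in attrs_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

# Illustrative schemas from two sources describing the same entity.
print(match_schemas(["StudentName", "birth_date"],
                    ["student-name", "DateOfBirth"]))
# [('StudentName', 'student-name', 1.0)]
```

Note how `birth_date` and `DateOfBirth` are missed: semantically identical names can score poorly on surface similarity, which is exactly the kind of naming conflict the integration framework has to resolve.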
White Paper: Hadoop in Life Sciences — An Introduction (EMC)
This White Paper reviews the Apache Hadoop technology, its components — MapReduce and the Hadoop Distributed File System — and its adoption in the life sciences, with an example in genomics data analysis.
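The MapReduce model can be sketched in plain Python with a genomics-flavoured example: counting k-mers across reads, the kind of embarrassingly parallel task the white paper has in mind. The read data is made up; in Hadoop Streaming the map and reduce phases would run as separate processes over HDFS blocks, with the shuffle simulated here by a dict.

```python
# Sketch of MapReduce in plain Python: counting k-mers in sequencing reads.
# Hadoop would distribute map() over HDFS blocks and shuffle/sort the
# emitted pairs before reduce(); here the shuffle is a local dict.
from collections import defaultdict

def map_kmers(read, k=3):
    """Map phase: emit (k-mer, 1) pairs for one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts for each key."""
    totals = defaultdict(int)
    for kmer, count in pairs:          # shuffle/sort is implicit here
        totals[kmer] += count
    return dict(totals)

reads = ["GATTACA", "TACAGAT"]         # illustrative reads
pairs = [pair for read in reads for pair in map_kmers(read)]
print(reduce_counts(pairs))
```

Because each map call touches only one read and each reduce key is independent, the job scales by simply adding nodes, which is the core appeal of Hadoop for sequencing workloads.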
"Towards a Science of Reproducible Science?" DPRMA Workshop talk at JCDL 2013, Indianapolis, 25th July 2013. Workshop website is http://dprma.oerc.ox.ac.uk/
The accompanying paper is: David De Roure. 2013. Towards computational research objects. In Proceedings of the 1st International Workshop on Digital Preservation of Research Methods and Artefacts (DPRMA '13). ACM, New York, NY, USA, 16-19. DOI: 10.1145/2499583.2499590, http://doi.acm.org/10.1145/2499583.2499590
This document provides an overview of the Web Ontology Language (OWL). OWL is built on top of RDF and is used to process information on the web by computers. It allows for stronger constraints and rules than RDF. There are three sublanguages of OWL with varying expressiveness. OWL is written in XML and is a W3C standard, making it suitable for exchanging and processing web information across different systems.
This tutorial discusses the Web Ontology Language (OWL). OWL is built on top of RDF and is used to process information on the web by computers. It allows for stronger constraints and rules than RDF. There are three sublanguages of OWL with varying expressiveness. OWL is written in XML and is a W3C standard for representing ontologies on the semantic web.
Pal gov.tutorial2.session13 2.gav and lav integration (Mustafa Jarrar)
This document discusses Global-As-View (GAV) and Local-As-View (LAV) integration approaches. GAV defines the global schema in terms of the local schemas by writing views over the local schemas. LAV defines the local schemas in terms of the global schema by writing views from the global schema to the local schemas. The document provides an example of each approach and discusses how queries are executed differently under GAV versus LAV.
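The GAV direction can be illustrated with SQLite views. In this sketch, two hypothetical local sources each hold part of the data, and the global relation is defined as a view (a query) over them, so a global query unfolds directly into source queries.

```python
# Sketch: GAV integration with SQLite views. Table and column names are
# illustrative, not from the tutorial.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- local source schemas
    CREATE TABLE src1_genes(gene_id TEXT, symbol TEXT);
    CREATE TABLE src2_annotations(gene_id TEXT, function TEXT);
    INSERT INTO src1_genes VALUES ('g1', 'TP53');
    INSERT INTO src2_annotations VALUES ('g1', 'tumor suppression');

    -- GAV: the global relation is a view written over the local sources
    CREATE VIEW global_gene AS
        SELECT s1.gene_id, s1.symbol, s2.function
        FROM src1_genes s1 JOIN src2_annotations s2 USING (gene_id);
""")

print(db.execute("SELECT symbol, function FROM global_gene").fetchall())
# [('TP53', 'tumor suppression')]
```

Under LAV the direction is reversed: each source would instead be described as a view over the global schema, and answering a global query requires rewriting it in terms of those source views, which is why query processing differs so much between the two approaches.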
Presentation given at a topic meeting of the Hauptbibliothek Universität Zürich on "New open access topics of relevance to academic libraries", 23 July 2012
Cloud Programming Models: eScience, Big Data, etc. (Alexandru Iosup)
This document discusses cloud programming models. It begins by defining programming models and noting that they provide an abstraction of a computer system through a language, libraries and runtime system. It then lists some key characteristics of a cloud programming model including efficiency, scalability, fault tolerance and data models. The document outlines an agenda to cover programming models for compute-intensive and big data workloads. It provides examples of bags of tasks and workflow programming models and their applications in fields like bioinformatics.
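The bag-of-tasks model mentioned above can be approximated locally with a process pool: each task is independent, so a scheduler can hand tasks to any idle worker, the same shape used to run many independent alignments or simulations in the cloud. The task function below is a stand-in for real work.

```python
# Sketch: the bag-of-tasks model with a local process pool. In a cloud
# setting the pool would be a cluster of workers pulling from a task queue.
from multiprocessing import Pool

def task(n):
    """One independent unit of work; no communication with other tasks."""
    return n * n

if __name__ == "__main__":
    bag = range(10)                      # the "bag" of independent inputs
    with Pool(processes=4) as pool:
        results = pool.map(task, bag)    # dynamic assignment to idle workers
    print(sum(results))                  # 285
```

Because tasks share nothing, this model gets fault tolerance almost for free: a failed task can simply be resubmitted, one of the cloud-model characteristics the talk lists.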
This document provides an overview of storing Resource Description Framework (RDF) graphs in relational database management systems. Specifically:
- RDF represents data as subject-predicate-object triples that form a directed graph. This triples-based data model allows for easy data integration.
- RDF graphs are typically stored as a single subject-predicate-object table in a relational database for persistent storage.
- Queries to retrieve and manipulate data in the RDF graph can then be performed using SQL on this table.
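The single subject-predicate-object table described above can be sketched with SQLite: a graph pattern with several triple patterns becomes a self-join on the one table, once per extra pattern. The triples below are invented for illustration.

```python
# Sketch: storing an RDF graph as one subject-predicate-object table and
# querying it with SQL self-joins. Triple values are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples(s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("gene:TP53",  "rdf:type",     "so:Gene"),
    ("gene:TP53",  "ex:locatedOn", "chr:17"),
    ("gene:BRCA1", "rdf:type",     "so:Gene"),
])

# "Which subjects are Genes located on chr17?" -- one self-join per
# additional triple pattern in the graph query.
rows = db.execute("""
    SELECT t1.s FROM triples t1
    JOIN triples t2 ON t1.s = t2.s
    WHERE t1.p = 'rdf:type'     AND t1.o = 'so:Gene'
      AND t2.p = 'ex:locatedOn' AND t2.o = 'chr:17'
""").fetchall()
print(rows)    # [('gene:TP53',)]
```

The join-per-pattern cost is why production triple stores add indexes over (s, p, o) permutations rather than relying on the bare table.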
This document proposes representing scientific workflows as first-class citizens called research objects. It presents a model for workflow research objects that aggregates all necessary elements to understand an investigation. These include experiments, annotations, results, datasets and provenance. Research objects are encoded using semantic technologies like RDF and follow standards such as OAI-ORE (Object Reuse and Exchange). The lifecycle of research objects is also described.
Being Reproducible: SSBSS Summer School 2017 (Carole Goble)
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transfer between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns over credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
This document provides an overview of a tutorial on data integration and open information systems. It discusses the goals of the semantic web and linked data, which aim to create a universal medium for data exchange by publishing and connecting structured data on the web. Currently, web APIs allow access to data but use different data models and formats. Linked data uses common RDF standards and links entities to enable querying across diverse domains and data sources, forming a global data web.
This document discusses the intersection of machine learning and search-based software engineering (ML & SBSE). It provides examples of how data miners can find signals in software engineering artifacts using machine learning techniques. It then discusses how better algorithms do not necessarily lead to better mining yet and emphasizes the importance of sharing data, models, and analysis methods. Finally, it outlines a vision for "discussion mining" to guide teams in walking across the space of local models, with the goal of building a science of localism in ML and SBSE.
The document discusses Oracle Semantic Technologies for storing and querying RDF data. It provides an overview of how RDF data is stored and organized in Oracle databases using ID triples and URI mapping tables. It describes how the SEM_MATCH SQL function allows querying RDF data using a SPARQL-like syntax. Optimization techniques for SEM_MATCH queries include indexes and materialized views. The core entities in the Oracle Semantic Store include semantic networks, models, rulebases, and entailments. Functionality includes bulk loading, incremental loading, SPARQL querying, and built-in or user-defined inference rules.
This document discusses visualizing genomic variation from DNA sequencing data. It begins by defining genomic variation such as single nucleotide polymorphisms and structural variations. It then discusses analyzing multiple samples, showing affected genes and clustering individuals. The document outlines challenges in visualizing high-dimensional genomic data from deep sequencing at scale, while maintaining computational performance for interactivity. It proposes representing rearranged chromosomes based on segment relationships to focus on functional impacts.
Visual Analytics in Omics - why, what, how? (Jan Aerts)
This document discusses visual analytics in omics data. It begins by noting the shift from hypothesis-driven to data-driven research due to large datasets. Visual analytics can help explore these data by opening the "black box" of algorithms and enabling researchers to develop hypotheses. Effective visualization leverages human perception through techniques like preattentive vision and Gestalt laws. Challenges to visual analytics include scalability issues for large datasets and identifying interesting patterns for further analysis. Examples demonstrate data exploration, filtering, and user-guided analysis in genomic applications.
Visual Analytics in Omics: why, what, how? (Jan Aerts)
Visual Analytics in omics can help address several challenges in analyzing complex biological data:
- It allows researchers to explore large datasets in an interactive way to generate hypotheses, as the initial analysis is often exploratory rather than driven by a specific hypothesis.
- It opens the "black box" of automated analysis by making the analysis process transparent and understandable to domain experts.
- Effective visualization techniques leverage human visual perception and cognition to facilitate reasoning about the data.
This document discusses the shift from hypothesis-driven to data-driven scientific research paradigms and the role of visualization in facilitating human reasoning about complex data. It describes visualization as a framework involving interaction, visual representations, and analytics to support biological data exploration and hypothesis generation. Examples are provided of visualization tools that enable interactive analysis, algorithm development by making black boxes transparent, and user-guided analysis through continuous refinement. Challenges in scalability, uncertainty, evaluation and infrastructure are also discussed.
Visualizing the Structural Variome (VMLS-Eurovis 2013) (Jan Aerts)
This document discusses visualizing structural variation in genomes. It begins by defining structural variation and copy number variation. It then discusses why structural variation is important, listing examples of traits influenced by copy number differences. The document outlines challenges in visualizing structural variation data from techniques like array CGH and sequencing. It proposes dual approaches - focusing on functional impact and representing rearranged chromosomes based on segment relationships. Future directions discussed include single-cell analysis and cross-omic data integration.
The document discusses humanizing data analysis by putting the human back in the loop of data analysis processes. It notes that current data analysis involves filtering and other automated tasks that act as a "black box" for humans. The author argues that data analysis should involve generating hypotheses with the human perspective in mind through techniques like visual analytics and cognitive tasks to make the data analysis process more transparent and understandable for people.
This document provides an introduction to data visualization. It discusses what data visualization is, why it is used, and the stages involved in creating visualizations from data. Key points include:
- Data visualization involves using visual representations of data to help people analyze and communicate information more effectively.
- Visualizations are used for tasks like recording information, analyzing data to support reasoning, and communicating information.
- The process of creating visualizations involves understanding the properties of the data, properties of images and perception, and rules for mapping data to visual encodings.
- Important considerations include which visual variables to use to encode different data properties, principles of visual perception, and enabling interaction with the data. Validation of the effectiveness of a visualization is also discussed.
L Fu - Dao: a novel programming language for bioinformatics (Jan Aerts)
The document introduces Dao, a new programming language for bioinformatics. It discusses Dao's key features like optional typing, native support for concurrent programming, an LLVM-based JIT compiler, simple C interfaces, and the ClangDao tool for wrapping C/C++ libraries. An example demonstrates using thread tasks and futures for concurrent programming. The document outlines future plans to develop BioDao, an open source project providing bioinformatics modules to the Dao language.
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module... (Jan Aerts)
Presentation at BOSC2012 by J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module for distributed analysis of large-scale biological data
B Temperton - The Bioinformatics Testing Consortium (Jan Aerts)
The Bioinformatics Testing Consortium aims to improve bioinformatics software by having software tested by others in addition to the developers. It will assign testers to review open source bioinformatics projects and ensure they meet minimum standards through running standard tests and verifying output matches test data. This benefits new users by providing more reliable software, developers by identifying bugs, testers by learning quality standards, and journal editors by ensuring published software is fit for purpose. The consortium seeks feedback, participation, test cases, and engagement on Twitter to achieve its goals.
J Goecks - The Galaxy Visual Analysis Framework (Jan Aerts)
The document describes Galaxy, an open-source web-based platform for visual analysis of genomic data. Galaxy provides tools for obtaining, integrating, analyzing, visualizing, sharing and publishing complete genomic analyses through a graphical user interface. It allows users to easily chain tools and create complex analysis workflows. The document highlights several Galaxy visualization tools, including Trackster for interactive exploration of large genomic datasets, Paramamonster for parameter space exploration, and Circster for circular genome-wide views. Future directions include expanding visualization capabilities to other data types and integrating multiple coordinated views.
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
B Chapman - Toolkit for variation comparison and analysis
The document describes a toolkit for comparing variant calls from different variant callers and sequencing technologies. It proposes establishing a set of true variants by comparing calls across multiple callers and technologies on gold standard genomes. The toolkit includes a comparison architecture that analyzes variants, identifies real variants by summarizing metrics, and scales to large numbers of variants and samples. It also describes building analysis pipelines in Clojure and providing comparison results through a web interface with metrics. The goal is to help answer biological questions by determining true variants and prioritizing based on existing evidence.
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
The KUPKB integrates thousands of kidney and urinary pathway studies into an RDF knowledge base using ontologies to provide schema and annotation. The iKUP browser exposes the knowledge in a simple web interface, allowing biologists to more easily survey biological publications and generate hypotheses than traditional literature searches. The tools and APIs used make it possible to build such applications at relatively low cost.
A Kalderimis - InterMine: Embeddable datamining components
InterMine is an integrated data warehouse with an optimizing query engine. It provides web services and embeddable widgets to make powerful data querying accessible to non-technical users. InterMine runs databases for various model organisms and is working to make machine-readable APIs and data displays universally accessible.
E Afgan - Zero to a bioinformatics analysis platform in four minutes
This document discusses how to quickly set up a bioinformatics analysis platform in four minutes using various open source tools. It introduces CloudBioLinux for building custom tool suites, CloudMan for creating scalable processing platforms, Galaxy for exploratory analysis, and BioCloudCentral for getting started easily. A new Python library called Blend is also introduced for automating repetitive tasks related to analysis and infrastructure manipulation using the APIs of these tools.
B Kinoshita - Creating biology pipelines with BioUno
BioUno is an open source project that uses continuous integration tools like Jenkins to create biology pipelines. It was created by Bruno Kinoshita in Brazil as a way to apply DevOps practices to biology. BioUno uses Jenkins for its jobs, notifications, and integration with other tools. The next steps are to enhance documentation, find new developers and users, and compare BioUno to other similar biology tools.
The document discusses updates to the Galaxy API and automatic parallelization capabilities. The RESTful Galaxy API now uses JSON and authentication keys instead of usernames/passwords. Tools can be configured for automatic parallelization to take advantage of available resources. The Tool Shed allows simple installation and updating of tools and workflows in a Galaxy instance.
P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
1.
The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
BOSC, Long Beach, July 13-14, 2012
Philippe Rocca-Serra (Ph.D.)
ISA Team
twitter: @isatools
philippe.rocca-serra@oerc.ox.ac.uk
http://www.isa-tools.org
Friday, 13 July 2012
5.
MAIN THEME:
It is all about structuring experimental information to make it available to computer and software agents to enable mining.
But let's proceed gradually...
Notes in Lab Books (information for humans) → Spreadsheets and Tables (the compromise) → Facts as RDF statements (information for machines)
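The progression above, from human-readable notes through tables to machine-readable facts, can be sketched in a few lines of Python. The subject and predicate URIs below are invented purely for illustration; they are not part of any ISA vocabulary:

```python
# A spreadsheet row: the "compromise" format, readable by humans.
row = {"sample": "H1.sample1", "organism": "Homo sapiens", "age": "35"}

# The same facts expressed as RDF statements in N-Triples syntax
# (information for machines). All URIs here are hypothetical.
subject = "http://example.org/sample/H1.sample1"
triples = [
    f'<{subject}> <http://example.org/vocab/organism> "{row["organism"]}" .',
    f'<{subject}> <http://example.org/vocab/age> "{row["age"]}" .',
]
for t in triples:
    print(t)
```

Each triple is a standalone fact, so a software agent can mine the statements without knowing the layout of the original table.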
6.
Observations
• Experiments are expensive and often publicly funded, yet many never see the light of day.
• Spreadsheets are the most common vehicle for tracking so-called 'omics' (functional genomics) experimental metadata.
• Technology-centric repositories form de facto silos.
• Conversions are required to allow deposition to public databases.
• Submitting common information across a series of repositories is inefficient.
8.
Many ontologies, many formats, many requirements...
Grr... where are the tools!?!
Credits: http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/
10.
Why ISA format and Tools?
– Supports data provenance tracking
– Built on an underlying node/edge concept
– Tabular as a compromise: a presentation layer inspired by object models (FuGE, MAGE-OM)
– A generic representation, applied to:
  • microarray-based experiments (MAGE)
  • sequencing-based experiments (SRA)
  • flow cytometry-based experiments (FuGE-Flow Cyt)
  • mass spectrometry and NMR spectroscopy experiments
11.
Why ISA format and Tools?

investigation: a high-level concept to link related studies.
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays.
assay: a test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data).
assay(s): pointers to data file names/locations.
data: external files in native or other formats.

[Figure: example rows such as "H1 | H. Sapiens | 35 Years | H1.sample1 | Labeling | H1.sample1.labeled | h1-s1.cel", shown both as a flat table and as the corresponding node/edge graph.]

ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas (MAGE-Tab, Pride-xml, SRA-xml)

Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLS IG and the ToxBank Consortium.
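The Investigation/Study/Assay hierarchy just described can be sketched as plain Python dataclasses. This is a toy model for illustration only; the real ISA tools define a far richer object model, and all field names here are our own:

```python
from dataclasses import dataclass, field

@dataclass
class Assay:
    # A test producing measurements; points at external data files.
    measurement: str
    data_files: list = field(default_factory=list)

@dataclass
class Study:
    # The central unit: subject, characteristics, treatments, assays.
    subject: str
    characteristics: dict = field(default_factory=dict)
    assays: list = field(default_factory=list)

@dataclass
class Investigation:
    # High-level concept linking related studies.
    title: str
    studies: list = field(default_factory=list)

# Mirror the example row from the slide.
inv = Investigation(
    title="Example investigation",
    studies=[Study(
        subject="H1",
        characteristics={"organism": "H. sapiens", "age": "35 Years"},
        assays=[Assay(measurement="transcription profiling",
                      data_files=["h1-s1.cel"])],
    )],
)
print(inv.studies[0].assays[0].data_files)  # ['h1-s1.cel']
```

The nesting makes the containment explicit: data files hang off assays, assays off studies, studies off an investigation.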
16.
ISA syntax and Table definition
• Material Transformations:
  – Inputs and outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).

[Diagram: Material Node → Protocol REF → Material Node → Data File Node]
  – Material Node attributes: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
  – Protocol REF attributes: Parameter Value[…], Performer (operator effect), Date (day effect)
  – Data File Node attributes: Comment[…]
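The node/edge reading of a flat ISA-Tab row can be illustrated with a small parser. The column names follow the slide, but the splitting logic is a deliberate simplification of the real format, which also handles attribute columns like Characteristics[…]:

```python
# A flattened row: Material Node -> Protocol REF -> Material Node -> Data File Node
header = ["Source Name", "Protocol REF", "Sample Name", "Raw Data File"]
row = ["H1", "Labeling", "H1.sample1.labeled", "h1-s1.cel"]

# Node columns become graph nodes; a Protocol REF becomes the edge
# connecting the node before it to the node after it.
NODE_COLUMNS = {"Source Name", "Sample Name", "Extract Name",
                "Labeled Extract Name", "Raw Data File"}

nodes, edges = [], []
pending_protocol = None
for col, value in zip(header, row):
    if col == "Protocol REF":
        pending_protocol = value
    elif col in NODE_COLUMNS:
        if nodes and pending_protocol is not None:
            edges.append((nodes[-1], pending_protocol, value))
            pending_protocol = None
        nodes.append(value)

print(nodes)  # ['H1', 'H1.sample1.labeled', 'h1-s1.cel']
print(edges)  # [('H1', 'Labeling', 'H1.sample1.labeled')]
```

Reading the table column by column recovers the provenance chain the format encodes: which protocol transformed which material into which output.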
19.
How do ISA tools access ontology servers?
20.
The ISAcreator
Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features...
But these are just some of them... we also have a data entry wizard and an import utility...
21.
Select and Annotate in ISAcreator
23.
Plugins in ISAcreator
In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it's really good.
• Plugins can be developed for 3 different purposes:
  – Search (adds extra search space for the ontology tool)
  – Custom cell editors (for the spreadsheet)
  – Extra general functionality (which appears in a plugin menu)
• Two examples of ISA plugins:
  – Access to local metadata stores: Novartis Plugin to Ontology Widget
  – Annotation of findings: Metabolite Identification Plugin (MetaboLights repository contribution to the ISA project).
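ISAcreator's real plugins are Java/OSGi bundles discovered by Apache Felix; as a language-neutral sketch of the same idea, a registry keyed by the three plugin purposes, here is a toy Python version. Every name in it is illustrative, not part of the ISAcreator API:

```python
# Toy plugin registry mirroring the three ISAcreator plugin purposes.
PLUGIN_PURPOSES = ("search", "cell_editor", "menu")

registry = {purpose: [] for purpose in PLUGIN_PURPOSES}

def register(purpose):
    """Decorator that files a plugin class under one known purpose."""
    def wrap(cls):
        registry[purpose].append(cls)
        return cls
    return wrap

@register("search")
class MetastoreSearch:
    """Adds an extra search space to the ontology search tool
    (loosely modelled on the Novartis metastore example)."""
    def search(self, term):
        return f"searching metastore for {term!r}"

print([cls.__name__ for cls in registry["search"]])  # ['MetastoreSearch']
```

The host application only needs to iterate over `registry["search"]` to pick up any plugin dropped into the right slot, which is the same decoupling OSGi provides for the Java tool.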
24.
Plugins example 1: Novartis Metastore Search
The search function on the Novartis Metastore integrates search results from the metastore into the ontology search tool.
So, with the Novartis plugin in your plugin directory, you'll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording the term source, etc.
25.
Plugins example 2: Metabolite Identification plugin
Credits: Kenneth Haug, MetaboLights
26.
Potential issues and known hurdles
• The problem of conflicting versions
  – especially acute when working with big consortia
  – distributed, decentralized groups of users
• Lack of version control and history
• Absence of collaborative features
  – Looking for new solutions while retaining the existing features!
• OntoMaton: bringing Google Docs, NCBO BioPortal and ISA-Tab together!
30.
OntoMaton
• Public release: http://goo.gl/2OKFV
• Can be used in any Google Spreadsheet document
• Applications:
  – Annotating data records
  – Supporting ontology development (see OBI Quick Term Templates)
31.
ISA2RDF work in progress
• Use case on the W3C HCLS scientific discourse list
  – deciding on the granularity of representation
  – building on previous experience
  – evaluating alternative representations
• Participation in the BioHackathon 2011
  – http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
  – discussing best practices
• PURL URIs and identifiers.org as identifiers
• OpenPHACTS guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)
32.
Preparing for Linked Open Data
✴ ISA2RDF (ToxBank collaboration): a contribution to an ecosystem of software tools supporting the ISA syntax
✴ reliance on internet-resolvable identifiers
✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)
✴ TODO: specify comparator groups, analysis methods, and the resulting measurements and statistical measures
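Internet-resolvable identifiers of the kind the slide relies on can be built mechanically. The helper below is ours, but the URI pattern http://identifiers.org/&lt;namespace&gt;/&lt;accession&gt; is the identifiers.org scheme in use at the time, and the PMID is the Gene Expression RDF note cited on this slide:

```python
def identifiers_org_uri(namespace, accession):
    """Build a resolvable identifiers.org URI using the
    http://identifiers.org/<namespace>/<accession> pattern."""
    return f"http://identifiers.org/{namespace}/{accession}"

# The PubMed record for the W3C Gene Expression RDF note (PMID: 22449719)
print(identifiers_org_uri("pubmed", 22449719))
# http://identifiers.org/pubmed/22449719
```

Minting identifiers this way, rather than using local accession strings, is what lets RDF statements from different tools link up in the Linked Data cloud.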
35.
ISA2RDF: work in progress
Credits: Nina Jeliazkova (ToxBank project)
37.
ISA2OWL
• OWL API
• ISA parser (in-memory BII object store objects)
• Mapping the ISA syntax into a target ontological space
• Decoupling the mapping from the conversion engine
  – avoids being tied to one semantic framework
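The "decouple the mapping from the conversion engine" point can be made concrete with a sketch: the mapping is plain data, the engine is generic, and moving to a different semantic framework means swapping the mapping table. All names and URIs below are hypothetical, not the ISA2OWL implementation:

```python
# Mapping: ISA syntax element -> target ontology class URI.
# Plain data, editable without touching the engine; URIs are placeholders.
MAPPING = {
    "Source Name": "http://example.org/onto/Source",
    "Sample Name": "http://example.org/onto/Sample",
}

def convert(record, mapping):
    """Generic engine: type each ISA field using whichever mapping it
    is handed, so it is not tied to one semantic framework."""
    return {field: {"value": value, "type": mapping.get(field)}
            for field, value in record.items()}

record = {"Source Name": "H1", "Sample Name": "H1.sample1"}
typed = convert(record, MAPPING)
print(typed["Source Name"]["type"])  # http://example.org/onto/Source
```

Retargeting the output (say, to a different ontology) then only requires authoring a new `MAPPING`; `convert` stays untouched.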
40.
ISA2OWL: mapping issues
• Stability over time
• Keeping track of resource versions
• Gaps in coverage
• Use of local extensions
• Direct requests/contributions
41.
ISA2OWL: development
• include graph metadata (graph provenance to aid indexing)
• extend semantic validation of the ISA archive
• augment annotation by suggesting additions
• facilitate curation work
• create new mappings to other frameworks (OPM model, SIO, …)
42.
Publication
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level.
Philippe Rocca-Serra, Marco Brandizi, Eamonn Maguire, Nataliya Sklyar, Chris Taylor, Kimberly Begley, Dawn Field, Stephen Harris, Winston Hide, Oliver Hofmann, Steffen Neumann, Peter Sterk, Weida Tong, Susanna-Assunta Sansone.
Bioinformatics 2010, 26: 2354-2356.
43.
Acknowledgements
Groups and individuals participating in:
MIBBI: http://mibbi.org
ISA-Tab format: http://isatab.sf.net
OBO Foundry: http://obofoundry.org
OBI: http://obi-ontology.org/page/Main_Page
ISA Infrastructure Team:
Alejandra Gonzalez-Beltran (Oxford), Eamonn Maguire (Oxford), Philippe Rocca-Serra (Oxford)
Collaborators at: Cambridge University, EuNuGO, Harvard School of Public Health, FDA's NCTR, Leibniz Plant Institute, NERC's NEBC, SIDR, INIST, MetaboLights, EMBL-EBI
Funders: EU Carcinogenomics Project, UK BBSRC
44.
Groups and individuals participating in:
Winston Hide: HSPH
Oliver Hofmann: HSPH
Shannan Ho Sui: HSPH
Brad Chapman: HSPH
Christoph Steinbeck: MetaboLights
Kenneth Haug: MetaboLights
Paula de Matos: MetaboLights
Magali Roux: INIST
Florian Mazur: INIST
Alain Zasadzinki: INIST
Marie Christine Jacquemot: INIST
Nina Jeliazkova: ToxBank
And many more who have to forgive us!