This document describes Bio-NGS, a BioRuby plugin for conducting programmable workflows for Next Generation Sequencing (NGS) data. Bio-NGS provides a software development framework, application, and project environment for analyzing NGS data. It integrates third-party bioinformatics tools as wrappers or bindings and allows for modular, reusable plugins. The document outlines features of the Bio-NGS application, software development framework, and project environment.
BioRuby is a bioinformatics library for the Ruby programming language. It provides object-oriented tools for tasks like sequence analysis, format conversion, running bioinformatics tools, and working with biological data. The latest version added features like improved support for phylogenetic XML (PhyloXML), next-generation sequencing FASTQ format reading/writing, and a REST API wrapper for the NCBI database. BioRuby development follows agile principles and its large developer community contributes new code frequently on GitHub. The project aims to improve integration with R and data visualization while maintaining a stable core.
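FASTQ stores each read as four lines: an "@" header, the sequence, a "+" separator, and a quality string with one character per base. The Python sketch below illustrates reading and writing that format. It is not BioRuby's API. It assumes unwrapped four-line records and Phred+33 quality encoding, and the function names are hypothetical.

```python
from __future__ import annotations

import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str  # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four unwrapped lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a quality string into Phred scores (Phred+33 by default)."""
    return [ord(char) - offset for char in quality]


def write_fastq(records: Iterable[FastqRecord], handle: TextIO) -> None:
    """Write records back out as four-line FASTQ."""
    for record in records:
        handle.write(f"@{record.name}\n{record.sequence}\n+\n{record.quality}\n")


if __name__ == "__main__":
    records = list(read_fastq(io.StringIO("@read1\nACGT\n+\nII#!\n")))
    print(records[0].name, phred_scores(records[0].quality))
```

With Phred+33, "I" decodes to 40 and "!" to 0. Older Illumina data used an offset of 64, which is why the offset is a parameter here.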
H Mishima - Biogem, Ruby UCSC API, and BioRuby (Jan Aerts)
BioRuby is an open-source bioinformatics library for the Ruby programming language that has been in development since 2000. It utilizes a centralized approach where code changes are reviewed by core committers. In recent years, efforts have been made to decentralize development through the use of GitHub and biogems - plug-ins that can be developed and maintained independently while still following standard guidelines. There are now over 60 biogems covering various domains. The biogems framework aims to further expand and motivate contributions to BioRuby.
The document summarizes the BioLib project, which aims to create C/C++ libraries for common biological functionality that can be accessed from multiple bioinformatics programming languages to avoid duplication of efforts. It has created bindings for several existing libraries, including Affyio, Staden IO, GSL, Rlib, and others. The project uses Git for version control, CMake for building, and SWIG for generating language bindings in an effort to maximize code reuse across languages.
The document describes the different qualities and characteristics of the person's friends, including those who understand despite differences, help in difficult times, sometimes criticize but constructively, are calm or energetic, protect the peace, never give up, show affection, need affection, are original, daring, sociable, enthusiastic, loud, protective, absent-minded, close though far away, hard-working, dreamers, troublesome but...
BioLib is a C and C++ library that aims to provide common bioinformatics functionality to multiple bioinformatics programming languages like BioPerl, BioJava, and BioPython in order to prevent duplication of efforts given the limited number of contributing bioinformatics programmers. The BioLib project seeks to create a 'kernel' of reusable software for bioinformatics by initially focusing on sequence analysis, structure prediction, and biological database access.
Experiences with logic programming in bioinformatics (Chris Mungall)
This document discusses experiences applying logic programming techniques in bioinformatics. It describes Obol, a system that used definite clause grammars to parse biological terms, and Blipkit, a reusable bioinformatics toolkit built for SWI-Prolog. Blipkit includes domain models, I/O modules, and tools for integrating with relational databases and web services. The document discusses applications of logic programming to tasks such as genome inference, phenotype matching, and consistency checking of biological data. It evaluates different logic programming approaches for representing genomic data and rules.
Open Bioinformatics Foundation: 2014 Update & Some Introspection (Hilmar Lapp)
The Open Bioinformatics Foundation (OBF) is a non-profit organization that promotes open source software development and open science in bioinformatics. It sponsors conferences like BOSC and initiatives that nurture the bioinformatics community. The OBF accepts donations for and manages assets on behalf of member projects. In the past year, the OBF board of directors was elected, the OBF participated in Google Summer of Code, and the organization is working to improve sustainability and professionalism as an all-volunteer group.
Sharing Data: An Introductory Workshop from OpenAIRE and Foster (OpenAIRE)
This document provides an introduction to open data sharing. It discusses the benefits of open data such as saving time, enabling reproducibility, and strengthening the scholarly record. Funders increasingly require that research data be made openly available. However, it also acknowledges reasons one may not want to share data, such as privacy or ethical concerns. The document outlines key points to consider in open data sharing like metadata, formats, licensing, and long-term preservation.
Software engineering methodologies also work for ontology engineering. This presentation from Bio-Ontologies 2012 describes how we are using Jenkins CI for GO and other ontologies.
BioJava is an open source Java framework for processing biological data. It provides tools for analyzing and manipulating sequences, structures, and other biological data. The latest version, BioJava 1.7, includes improved support for 3D structures and modularization into separate modules. The project aims to facilitate rapid bioinformatics application development and is supported by an active developer community.
Use Integrated Genome Browser to explore, analyze, and publish genomic data (Ann Loraine)
The document discusses the genome browser IGB (Integrated Genome Browser) and how it can be used to analyze genomic data. IGB allows users to load, visualize, and analyze genomic data. It supports fast zooming and is highly interactive. Data can be shared using QuickLoad sites and IGB is extensible via apps. The document provides an example analysis of the MEOX1 gene using IGB to investigate alternative splicing and its effects on protein function. RNA-seq data was loaded and filtered in IGB to find evidence of exon skipping, which deletes a conserved homeodomain in the protein.
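Exon skipping shows up in RNA-seq data as split reads whose junction joins two exons while jumping over one in between. The Python sketch below shows that check. It assumes junctions are given as (donor, acceptor) coordinates in the same coordinate system as the annotated exons. It is an illustration only, not how IGB computes or filters its junction tracks, and the gene model in the demo is hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def skipped_exons(exons: List[Exon], junction: Tuple[int, int]) -> List[int]:
    """Return indices of annotated exons that a split-read junction jumps over.

    junction is (donor, acceptor): the last base before the gap and the first
    base after it, as reported for a spliced read. An exon lying entirely
    inside the gap is absent from that transcript, i.e. it was skipped.
    """
    donor, acceptor = junction
    return [
        index
        for index, (start, end) in enumerate(exons)
        if donor < start and end < acceptor
    ]


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]  # hypothetical gene model
    print(skipped_exons(exons, (200, 500)))  # junction joining exon 0 to exon 2
```

A junction linking adjacent exons returns an empty list, so only reads that support the alternative isoform are flagged.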
Data and Computational Challenges in Integrative Biomedical Informatics (Joel Saltz)
Joel Saltz MD, PhD discusses data and computational challenges in integrative biomedical informatics. His research center analyzes complex patient data like medical images, pathology slides, and "omic" data to characterize diseases at multiple scales. Machine learning is used to automatically segment and classify features in images and identify patterns across different data types that can improve disease classification, predict outcomes, and uncover new biology. Large computing resources are required to handle and analyze huge biomedical datasets.
Graph DB + Bioinformatics: Bio4j, recent applications and future directions (Pablo Pareja Tobes)
Graph DB + Bioinformatics describes applications of graph databases in bioinformatics. Bio4j is a graph database that integrates biological data from sources such as UniProt, the Gene Ontology, and the NCBI Taxonomy. It provides a new framework for querying and managing protein information that scales better and integrates new knowledge more easily than traditional relational databases. Era7 Bioinformatics develops Bio4j and other bioinformatics tools using an open source business model.
Road towards Owasp Orizon 2.0 (November 2009 update) (Paolo Perego)
The document provides an update on the OWASP Orizon 2.0 project roadmap. It summarizes the current state of the Orizon 1.19 tool and outlines goals for improving the tool, community, and development process. Key plans for the roadmap include reworking the architecture and implementation, improving usability, adding new features like taint analysis, and releasing version 2.0 by June 2010.
Bio4j: A pioneer graph based database for the integration of biological Big D... (graphdevroom)
The document describes Bio4j, an open source graph database for biological data integration. Bio4j stores biological data such as UniProt, the Gene Ontology, and taxonomies as a graph built on Neo4j. This allows flexible querying of the semantic relationships between data. Bio4j provides APIs for easy access to the integrated data and can be extended with additional datasets. It aims to improve on relational databases for biological data through its scalable graph model.
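In a graph model, relationships such as "protein is annotated with GO term" and "GO term is_a parent term" are stored as typed edges, so a query becomes a traversal instead of a chain of joins. The toy Python sketch below illustrates that idea in memory. It is not Bio4j's or Neo4j's API, and the node names and edge types (ANNOTATED_WITH, IS_A) are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Set, Tuple


class PropertyGraph:
    """Toy directed graph with typed edges; a stand-in, not Bio4j's or Neo4j's API."""

    def __init__(self) -> None:
        self.out_edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        self.out_edges[source].append((edge_type, target))

    def neighbours(self, node: str, edge_type: str) -> List[str]:
        return [target for kind, target in self.out_edges.get(node, []) if kind == edge_type]

    def reachable(self, start: str, edge_type: str) -> Set[str]:
        """Breadth-first search along edges of a single type."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in self.neighbours(queue.popleft(), edge_type):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


def go_terms_with_ancestors(graph: PropertyGraph, protein: str) -> Set[str]:
    """Direct GO annotations of a protein plus every term above them via IS_A."""
    terms: Set[str] = set()
    for term in graph.neighbours(protein, "ANNOTATED_WITH"):
        terms.add(term)
        terms |= graph.reachable(term, "IS_A")
    return terms
```

Following IS_A edges upward is the kind of variable-depth query that needs recursive joins in a relational schema but is a plain traversal in a graph.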
Bio4j: A pioneer graph based database for the integration of biological Big Data (Pablo Pareja Tobes)
1. Bio4j
2. What’s Bio4j?: Data included
3. What’s Bio4j?: A completely new and powerful framework for protein
4. What’s Bio4j?: Neo4j --> very scalable
5. What's Bio4j?: Everything in Bio4j is open source released under AGPLv3
6. Bioinformatics DBs and Graphs: Highly interconnected overlapping knowledge spread throughout different databases
7. Bioinformatics DBs and Graphs: Data is in most cases modeled in relational databases (sometimes even just as plain CSV files)
8. Bioinformatics DBs and Graphs: Problems of a relational paradigm
9. Bioinformatics DBs and Graphs: Life + Biology like a graph
10. Bioinformatics DBs and Graphs: NoSQL
11. Bioinformatics DBs and Graphs: NoSQL data models
12. Bioinformatics DBs and Graphs: The Graph DB model: representation
13. Bioinformatics DBs and Graphs: Neo4j
14. Initial motivation: Why start all this?
15. Initial motivation: Processes had to be automated for BG7 (http://bg7.ohnosequences.com)
Easy-peasy OSGi Development with Bndtools - Neil Bartlett (mfrancis)
Developing OSGi bundles is just too hard! That gnarly old manifest; listing all the imported packages (again!); writing all those XML files... who needs that kind of hassle?
My goal when I began development of Bndtools was to change this picture drastically: to make it actually _easier_ to develop and test OSGi applications, composed of high quality reusable bundles, than it is to develop "traditional" non-modular Java applications. Now with the help of a growing community of contributors and brave alpha testers, that vision is beginning to come true.
Bndtools is an Eclipse-based IDE for OSGi development, built on bnd. In this talk I will demonstrate the features of Bndtools that help to make it fun and easy to build bundles. I will also talk about: Declarative Services with Java annotations; OBR for release management and provisioning; and integration testing strategies.
This document discusses validating biomedical scientific publications through a system called PaperMaker. It begins by outlining some key aspects of publishing, including agreeing/disagreeing on current science, bringing new results, and gaining new knowledge. It then describes ongoing work towards integrating literature into bioinformatics resources, supporting different domains, tracking provenance, and enabling inference and reasoning. Several of the presenter's past efforts are outlined, including named entity recognition, terminology resources, annotation formats, corpus annotation, and deploying solutions through services. The talk concludes by introducing PaperMaker as a way to validate scientific literature against these various resources and efforts.
This document provides an overview of Apache Maven, including:
- Maven is a software project management and comprehension tool based on conventions like standardized project descriptors (POMs) and build lifecycles.
- Key concepts include dependencies, versions, profiles, repositories, and a plugin-based architecture that supports custom goals and extensions.
- Maven 3.x focused on improving backward compatibility, performance through parallel builds and caching, and extensibility through new APIs and classloader partitioning for plugins.
Taverna Server allows users to run workflows remotely on a deployment host through a web portal. It runs in a Tomcat container and uses the CXF framework to host per-user Taverna workflow engines and file managers as web services accessed through REST and SOAP interfaces. Results are stored in a document store and managed through a common system and management model that is accessible to clients such as the Taverna Workbench and a Ruby client.
This document discusses bringing model organism databases onto the Semantic Web using SADI (Semantic Automated Discovery and Integration). SADI allows bioinformatics data and software to be integrated automatically through web services that consume and generate RDF. The document describes how SADI has been implemented for GMOD (Generic Model Organism Database) to provide services for accessing sequence feature data from model organism databases. It outlines the structure of the SADI services and their inputs and outputs, and provides instructions for setting up and registering the services.
This document discusses the Microsoft Biology Foundation (MBF), an open source bioinformatics toolkit built on the .NET Framework. MBF provides components for working with biological data formats, sequences, and algorithms. It is intended as a foundation for other tools and applications to build upon, rather than an application itself. The community around MBF consists of scientific programmers with a range of expertise. MBF is cross-platform, and new features in version 2 include advanced math functions, comparative assembly, command line tools, and visualization capabilities. The source code and community are hosted on CodePlex to encourage open source development of the toolkit.
Mobyle 1.0 introduces new features including workflows, viewers, and improved workspace management. Workflows allow automated chaining of sequential or parallel jobs specified through an XML description. Viewers display specialized output like VARNA diagrams through custom HTML code. Authentication and reporting have also been expanded with OpenID and Google Analytics integration.
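Mobyle specifies workflows in an XML description; the Python sketch below only illustrates the chaining semantics (sequential stages, parallel jobs within a stage), with in-process functions standing in for jobs. The stage-list representation and the function name are assumptions for illustration, not Mobyle's format or API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence

Job = Callable[[Any], Any]


def run_workflow(stages: Sequence[Sequence[Job]], initial_input: Any) -> Any:
    """Run stages one after another; the jobs inside a stage run in parallel.

    A single-job stage passes its result on unchanged. A multi-job stage gives
    every job the same input and passes on the list of results, in the order
    the jobs were listed (not the order they finished).
    """
    data = initial_input
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            if len(stage) == 1:
                data = stage[0](data)
            else:
                futures = [pool.submit(job, data) for job in stage]
                data = [future.result() for future in futures]
    return data
```

For example, the stages [[str.upper], [len, lambda s: s.count("G")], [sum]] applied to "acgtg" uppercase the input, then compute its length and G count in parallel, then add the two results, giving 7.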
This document summarizes a presentation given at the Bioinformatics Open Source Conference 2011 about the G-language Project from Keio University and Cornell University. Over the past 10 years, the G-language Project has developed several genome analysis tools including the Genome Analysis Environment (GAE), Genome Projector, and Pathway Projector. It has also created REST/SOAP web services and the Keio Bioinformatics Web Services (KBWS) to provide access to over 500 bioinformatics tools and databases.
The document describes an R package called isobar that provides tools for analyzing quantitative proteomics data from isobaric labeling experiments. The package extracts identification and quantitative information from mass spectrometry data. It models technical variability, normalizes data, and handles biological variability to determine differential protein expression accurately. Statistical tests and data visualization help determine which proteins are significantly regulated. Automated reporting of results in PDF format via Sweave is demonstrated.
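Before reporter-ion ratios are compared, channels are usually brought to a common scale so that loading differences do not masquerade as regulation. isobar is an R package with its own noise model; the Python sketch below is only a simple, swapped-in median normalization that shows the idea. The channel names "114" and "115" in the demo are used for illustration.

```python
from __future__ import annotations

import math
from statistics import median


def normalize_channels(intensities: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale each reporter channel so all channels share the same median.

    intensities maps channel name -> positive reporter intensities, listed in
    the same spectrum order for every channel.
    """
    medians = {channel: median(values) for channel, values in intensities.items()}
    target = median(list(medians.values()))
    return {
        channel: [value * target / medians[channel] for value in values]
        for channel, values in intensities.items()
    }


def log2_ratios(data: dict[str, list[float]], channel: str, reference: str) -> list[float]:
    """Per-spectrum log2 ratio of one channel against a reference channel."""
    return [math.log2(a / b) for a, b in zip(data[channel], data[reference])]
```

If one channel is uniformly twice as intense as another, for example because twice as much sample was loaded, normalization removes the difference and every log2 ratio becomes 0.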
1) Biopython is a free and open source library for bioinformatics written in Python. It has regular releases and is supported by an international team of volunteer developers.
2) Recent Biopython releases include versions 1.55-1.57, which added new features like Python 3 support, command line wrappers, and SQLite sequence indexing.
3) Biopython has participated in Google Summer of Code for the past two years, funding student projects on topics like PDB parsing and biomolecular interface analysis. Integrated testing via Buildbot helps ensure high code quality.
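Biopython's SQLite-backed indexing (Bio.SeqIO.index_db) stores the position of each record in a database instead of holding records in memory. The standard-library sketch below illustrates that idea for FASTA only. It is not Biopython's implementation, and its table layout and function names are hypothetical.

```python
import io
import sqlite3
from typing import BinaryIO


def build_index(handle: BinaryIO, db_path: str = ":memory:") -> sqlite3.Connection:
    """Record the byte offset of every FASTA header in an SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seq_index (name TEXT PRIMARY KEY, byte_offset INTEGER)"
    )
    handle.seek(0)
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()  # identifier = first word of header
            conn.execute("INSERT INTO seq_index VALUES (?, ?)", (name, offset))
        offset += len(line)
    conn.commit()
    return conn


def fetch_sequence(conn: sqlite3.Connection, handle: BinaryIO, name: str) -> str:
    """Seek straight to one record using the index and return its sequence."""
    row = conn.execute(
        "SELECT byte_offset FROM seq_index WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    handle.seek(row[0])
    handle.readline()  # skip the header line
    chunks = []
    for line in handle:
        if line.startswith(b">"):
            break  # reached the next record
        chunks.append(line.strip())
    return b"".join(chunks).decode()


if __name__ == "__main__":
    fasta = io.BytesIO(b">seqA first\nACGT\nTTGG\n>seqB\nCCCC\n")
    index = build_index(fasta)
    print(fetch_sequence(index, fasta, "seqB"))
```

Passing a file path as db_path instead of ":memory:" lets the index persist between runs, so a large file only has to be scanned once.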
UGENE is a multiplatform open-source toolkit for complex genome analysis. It integrates popular bioinformatics tools into a unified visual and computational solution. Written in C++/Qt, UGENE has a modular structure and integrated plugin system. It supports over 20 common biological data formats and allows users to retrieve information from remote databases. UGENE features rich visualization capabilities and algorithms optimized for multi-core CPUs and GPUs. The UGENE Workflow Designer provides a visual environment for constructing computational workflows combining algorithms, data formats, and high performance capabilities.
Cloud BioLinux provides an open source and fully customizable bioinformatics computing environment on the cloud for genomics research. It allows small labs to perform large-scale genomic analyses independently without dedicated on-site computing resources. The project is a community effort led by Ntino Krampis and developed by a core team, with contributions from bioinformaticians worldwide. Cloud BioLinux builds on existing open source tools by providing pre-configured virtual machines on Amazon Web Services that can be easily deployed through a desktop computer with an internet connection. This allows researchers to share projects, analyses, and computing environments with collaborators through whole-system snapshots.
Cytoscape is a popular open-source software package for biological network visualization and analysis. It has a large plugin ecosystem that allows for customization, but this has led to issues with backward compatibility and maintenance over time. Cytoscape 3.0 aims to address these issues with a new architecture based on OSGi, semantic versioning, and Maven that better defines dependencies, enforces modularity, and makes plugins easier to develop and maintain going forward. This involves rewriting parts of Cytoscape but is intended to improve the long-term sustainability and extensibility of the software.
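Semantic versioning lets a plugin state which core versions it can safely run against: a major bump signals breaking API changes, while minor and patch bumps must stay backward compatible. The Python sketch below shows that compatibility rule, assuming plain MAJOR.MINOR.PATCH strings. Real OSGi metadata expresses this through version ranges in bundle manifests, which this sketch does not model, and the function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(required: str, provided: str) -> bool:
    """Can a plugin built against `required` run on `provided`?

    Same major version and at least the required minor/patch level.
    """
    req, prov = parse_version(required), parse_version(provided)
    if req[0] == 0:
        return prov == req  # 0.y.z promises no stability, so demand an exact match
    return prov[0] == req[0] and prov >= req
```

Tuple comparison orders versions lexicographically, so (3, 1, 2) >= (3, 0, 0) holds while (3, 0, 5) >= (3, 1, 0) does not.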
Cloud BioLinux provides a customizable open source bioinformatics computing environment on cloud platforms for genomics research. As sequencing technologies become more accessible, individual labs will need to perform their own analysis but may lack dedicated infrastructure. Cloud BioLinux addresses this need by offering pre-configured virtual machines with over 100 bioinformatics tools on Amazon EC2, allowing researchers to perform large-scale analysis using only a web browser. Developers can extend Cloud BioLinux by specifying additional software configurations using a framework based on Python fabric scripts. The project aims to expand its community and tool offerings to better support sequencing centers and individual researchers.
BioMart 0.8 offers new features like an integrated tool for building and configuring BioMart servers, support for more database backends, and analysis and visualization plugins. It provides more flexible querying interfaces like diverse web GUIs and programmatic APIs. BioMart hides the complexity of underlying databases and provides a simple conceptual model with concepts like datasets, attributes, and filters. It is used by several large collaborative projects for data management and federation.
The document discusses how InterMine uses RESTful web services for data integration and interoperability between biological databases, providing examples of how the InterMine API can be used to query databases and manage workflows through client libraries and sugar syntax. It also covers lessons learned in developing InterMine including using JSON for data exchange and implementing token-based authentication.
The document discusses several ontology tools and services including the Ontology Lookup Service (OLS) from the European Bioinformatics Institute (EBI) and the National Center for Biomedical Ontology (NCBO) BioPortal. It introduces OntoCAT, an ontology database and browser that provides a REST service for searching and accessing ontology data in a standardized way. OntoCAT aims to facilitate integration across ontologies and has been used to build applications like concept recognizers.
This document summarizes a presentation about CloudMan, a platform for deploying cloud computing resources and Galaxy instances. CloudMan allows users to setup cloud clusters in minutes without expertise, provides over 700GB of reference genomes and bioinformatics tools, and enables customization and sharing of derived cluster instances. It bridges users, isolated Galaxy instances, and cloud infrastructure. Key features include deployment on AWS, automated configuration, dynamic storage, and elastic scaling of resources.
Hadoop-BAM is a small Java library that allows Binary Alignment Map (BAM) files, a common format for storing aligned DNA sequencing reads, to be directly manipulated and processed on Hadoop. It handles challenges like BAM's binary format and compression by detecting record boundaries and providing access to the files through the Picard SAM API. The library was used to build tools for preprocessing large BAM files for interactive browsing of genome data on Hadoop, demonstrating good scaling on a test of over 50GB of sequencing data from 1000 Genomes Project. Future work involves developing more BAM analysis tools that can leverage Hadoop-BAM.
WebApollo is a web-based genome annotation editor that allows multiple users to collaboratively edit annotations in real-time. It uses a client-server model where the web-based client communicates with an annotation editing engine and data server through a framework called Trellis. This allows annotations to be accessed and edited without installing software by running in a web browser.
This document describes MyGene.Info, a gene annotation service that provides gene data through a RESTful API. It discusses how gene data is stored and queried from a CouchDB document-oriented database. Users can search for genes and retrieve full or filtered annotation objects for specific genes in JSON format through HTTP requests to the MyGene.Info website. The service aims to allow quick development of gene-centric websites and applications without needing to maintain a local gene annotation database.
This document discusses the potential for open source artificial intelligence to help understand molecular biology data. It argues that capturing common sense knowledge computationally has been challenging, but knowledge about molecular biology exists explicitly. An open source AI focused on molecular biology could help explain genomic data by developing a comprehensive knowledge base and using abductive inference. However, explaining biological phenomena is difficult and requires judgment. The document advocates for open source development to gain productivity advantages and build trust through transparency. It outlines challenges and opportunities for facilitating an open source AI community focused on understanding life.
OBIWEE is an open source bioinformatics cloud environment for running intensive workflows. It uses SLICEE as a workflow authoring tool and runs jobs on a scalable virtual cluster deployed on private or public clouds using OpenNebula or EC2. Workflows are authored by describing command lines and OBIWEE handles parallelization and job submission to the cluster. It provides tools for deployment, data management, and client access via command line or GUI interfaces like Kepler.
D03-NextGen-Bio-NGS
1. Bio-NGS: BioRuby plugin to conduct programmable workflows for Next Generation Sequencing data
Raoul J.P. Bonnal (bonnal@ingm.org)
Co-authors: Francesco Strozzi, Valeria Ranzani, Toshiaki Katayama
Integrative Biology Program, Istituto Nazionale di Genetica Molecolare, Italy
July 15, 2011
BOSC, Vienna, Austria
2. Bio-Gem
Authors: Raoul J.P. Bonnal, Pjotr Prins, Toshiaki Katayama
• A software generator for creating BioRuby plugins
• Last year (@BOSC 2010) it was an idea and a prototype
• Features:
  – Extend BioRuby
  – Modularity
  – Easy sharing:packaging:publishing
  – Just Code!
Biogems:
• bio-assembly (0.1.0)
• bio-blastxmlparser (0.6.1)
• bio-bwa (0.2.2)
• bio-cnls_screenscraper (0.1.0)
• bio-emboss_six_frame_nucleotide_sequences (0.1.0)
• bio-gem (0.2.2)
• bio-genomic-interval (0.1.2)
• bio-gff3 (0.8.6)
• bio-graphics (1.4)
• bio-hello (0.0.0)
• bio-isoelectric_point (0.1.1)
• bio-kb-illumina (0.1.0)
• bio-lazyblastxml (0.4.0)
• bio-logger (0.9.0)
• bio-nexml (0.0.1)
• bio-ngs (0.2.1)
• bio-octopus (0.1.1)
• bio-samtools (0.2.4)
• bio-sge (0.0.0)
• bio-tm_hmm (0.2.0)
• bio-ucsc-api (0.1.0)
Dev: https://github.com/helios/bioruby-gem
Install: gem install bio-gem
3. Bio-NGS
An Application
A Software Development Framework
A Project Environment
4. Application
• Stand-alone
  – Automatically installs everything it needs (sandbox/isolation)
  – System-wide or per-user (RVM, Ruby Version Manager)
• Multi-platform
  – Linux, OS X
  – MRI, JRuby
• Command line
  – Thor: a simple and efficient tool for building self-documenting command-line utilities
• Common syntax across different applications
• Collection of tasks
  – Basic, Advanced
RVM: https://rvm.beginrescueend.com/
Thor: https://github.com/wycats/thor
5. Software Development Framework
• Expand BioRuby's functionalities to NGS
• API + consistent namespace
• Integrate third-party tools
  – Wrapping: quick, easy support, increased productivity
  – Binding: low-level functionalities
• Modular: reuse other plug-ins
  – BioBwa (binding)
  – BioSamtools (binding)
6. Project Environment
• Directory scaffold
• Customize
  – Tasks: Thor or Rake (Ruby DSL)
  – Configurations: YAML
• History
  – Embedded DB: SQLite3
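The project environment above names YAML as the configuration format. As a minimal sketch, assuming a project keeps its settings in a YAML mapping, the snippet below parses such a document with Ruby's standard `yaml` library and overlays it on defaults. The keys (`threads`, `genome_index`, `output_dir`) and the `load_project_config` helper are hypothetical, not Bio-NGS's actual configuration schema.

```ruby
require 'yaml'

# Hypothetical defaults; the real Bio-NGS configuration keys may differ.
DEFAULT_CONFIG = {
  'threads'      => 6,
  'genome_index' => nil,
  'output_dir'   => 'output'
}.freeze

# Parse a YAML mapping and overlay it on the defaults:
# keys present in the YAML win, missing keys fall back to DEFAULT_CONFIG.
def load_project_config(yaml_text)
  user = YAML.safe_load(yaml_text) || {}
  raise ArgumentError, 'configuration must be a YAML mapping' unless user.is_a?(Hash)
  DEFAULT_CONFIG.merge(user)
end

config = load_project_config(<<~YAML)
  threads: 8
  genome_index: /data/index/hg19
YAML

puts config['threads']    # 8
puts config['output_dir'] # output
```

Using `safe_load` rather than `load` keeps a project file from instantiating arbitrary Ruby objects.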
7. Tools
[Figure: logos of the integrated third-party tools; recoverable labels include Bowtie, BWA, FASTX-Toolkit, "Quant" and "More…"]
9. Wrapper
    module Bio
      module Ngs
        module Cufflinks
          class Compare
            include Bio::Command::Wrapper
            set_program Bio::Ngs::Utils.binary("cufflinks/cuffcompare")
            use_aliases
            add_option "outprefix", :type => :string, :aliases => '-o', :default => "Comparison"
            add_option "gtf_combine_file", :type => :string, :aliases => '-i'
            add_option "gtf_reference", :type => :string, :aliases => '-r'
            add_option "only_overlap", :type => :boolean, :aliases => '-R'
            add_option "discard_transfrags", :type => :boolean, :aliases => '-M'
          end
        end
      end
    end
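The wrapper above declares the program and its options once, then lets callers fill in values. To illustrate how such a declarative DSL can turn declarations into a command line, here is a stdlib-only sketch. `MiniWrapper` and `CompareSketch` are hypothetical stand-ins, not `Bio::Command::Wrapper`; in particular, merging on `params=` and rendering booleans as bare flags are assumptions. The sketch only builds the argument vector and never executes the tool.

```ruby
# MiniWrapper is a hypothetical, simplified command-wrapper DSL; it is NOT
# Bio::Command::Wrapper, only an illustration of the same declarative idea.
module MiniWrapper
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :program

    def set_program(path)
      @program = path
    end

    def use_aliases
      @use_aliases = true
    end

    def use_aliases?
      @use_aliases == true
    end

    def options
      @options ||= {}
    end

    # Declare an option: its name, type, optional short alias and default value.
    def add_option(name, type:, aliases: nil, default: nil)
      options[name] = { type: type, aliases: aliases, default: default }
    end
  end

  def params
    @params ||= {}
  end

  # Assigning params merges them into the existing ones instead of replacing them.
  def params=(user_params)
    params.merge!(user_params)
  end

  # Build the argv: declared defaults first, then user params override them.
  # Boolean options become bare flags; other values follow their flag.
  def command_line(arguments = [])
    klass = self.class
    defaults = {}
    klass.options.each { |name, spec| defaults[name] = spec[:default] unless spec[:default].nil? }

    flags = defaults.merge(params).flat_map do |name, value|
      spec = klass.options.fetch(name) { raise ArgumentError, "unknown option: #{name}" }
      flag = klass.use_aliases? && spec[:aliases] ? spec[:aliases] : "--#{name}"
      if spec[:type] == :boolean
        value ? [flag] : []
      else
        [flag, value.to_s]
      end
    end
    [klass.program, *flags, *arguments]
  end
end

# A cut-down, hypothetical analogue of the Cufflinks::Compare wrapper.
class CompareSketch
  include MiniWrapper
  set_program 'cuffcompare'
  use_aliases
  add_option 'outprefix', type: :string, aliases: '-o', default: 'Comparison'
  add_option 'gtf_reference', type: :string, aliases: '-r'
  add_option 'only_overlap', type: :boolean, aliases: '-R'
end

cmp = CompareSketch.new
cmp.params = { 'gtf_reference' => 'ref.gtf' }
cmp.params = { 'only_overlap' => true }
p cmp.command_line(['sample.gtf'])
# => ["cuffcompare", "-o", "Comparison", "-r", "ref.gtf", "-R", "sample.gtf"]
```

Keeping the option table at class level means every instance shares one declaration, while each instance carries its own parameter values.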
10. Wrapper
The Cufflinks::Compare wrapper class from slide 9, used from irb:
    irb(main):001:0> require 'bio-ngs'
    irb(main):001:1> cuffcompare = Bio::Ngs::Cufflinks::Compare.new
    irb(main):001:2> cuffcompare.params = {….}
    irb(main):001:3> cuffcompare.run(:arguments=>[…])
    => #<Bio::Ngs::Cufflinks::Compare:0x0000000c1630f8 @program="/usr/local/lib/ruby/gems/1.9.1/gems/bio-ngs-0.2.1/lib/bio/ngs/ext/bin/linux/cufflinks/cuffcompare", @options={}, @params={}>
11. Tasks
    No binary found with this name: setupBclToQseq.py
    No binary found with this name: fastq_quality_boxplot_graph.sh
    No binary found with this name: blastn
    No binary found with this name: blastx
    WARNING: no program is associated with BCLQSEQ task, does not make sense to create a thor task.
    WARNING: no program is associated with BLASTN task, does not make sense to create a thor task.
    WARNING: no program is associated with BLASTX task, does not make sense to create a thor task.

    bwa
    ---
    biongs bwa:aln:long [FASTQ] --file-out=FILE_OUT --prefix=PREFIX
    biongs bwa:aln:short [FASTQ] --file-out=FILE_OUT --prefix=PREFIX
    biongs bwa:index:long [FASTA]
    biongs bwa:index:short [FASTA]
    biongs bwa:sam:paired --fastq=one two three --file-out=FILE_OUT --prefix=PREFIX --sai=one two three
    biongs bwa:sam:single [SAI] --fastq=FASTQ --file-out=FILE_OUT --prefix=PREFIX

    convert
    -------
    biongs convert:bam:extract_genes BAM GENES --ensembl-release=N -o, --output=OUTPUT
    biongs convert:bam:merge -i, --input-bams=one two three
    biongs convert:bam:sort BAM [PREFIX]
    biongs convert:bcl:qseq:convert RUN OUTPUT [JOBS]
    biongs convert:illumina:de:gene DIFF GTF
    biongs convert:illumina:de:isoform DIFF GTF
    biongs convert:illumina:de:rename_qs DIFF_FILE NAMES
    biongs convert:illumina:fastq:trim_b FASTQ
    biongs convert:illumina:humanize:build_compare_kb GTF
    biongs convert:illumina:humanize:isoform_exp GTF ISOFORM
    biongs convert:qseq:fastq:by_file FIRST OUTPUT
    biongs convert:qseq:fastq:by_lane LANE OUTPUT
    biongs convert:qseq:fastq:by_lane_index LANE INDEX OUTPUT
    biongs convert:qseq:fastq:samples_by_lane SAMPLES LANE OUTPUT

    history
    -------
    biongs history:8  # Task convert:illumina:de:isoform PARAMETERS: /Users/bonnalraoul/Desktop/RRep16giugno/DE_lane1-2-3-4-6-8/DE_lane1-2-3-4-6-8/isoform_exp.diff /Users/bonnalraoul/Desktop/RRep16giugno/COMPARE_lane1-2-3-4-6-8/COMPA...

    homology
    --------
    biongs homology:convert:blast2text [XML FILE] --file-out=FILE_OUT
    biongs homology:convert:go2json
    biongs homology:db:export [TABLE] --fileout=FILEOUT
    biongs homology:db:init
    biongs homology:download:all
    biongs homology:download:goannotation
    biongs homology:download:uniprot
    biongs homology:load:blast [FILE]
    biongs homology:load:goa
    biongs homology:report:blast

    ontology
    --------
    biongs ontology:db:export [TABLE] --fileout=FILEOUT
    biongs ontology:db:init
    biongs ontology:download:all
    biongs ontology:download:go
    biongs ontology:download:goslim
    biongs ontology:load:genego [FILE]
    biongs ontology:load:go [FILE]
    biongs ontology:report:go

    project
    -------
    biongs project:new [NAME]
    biongs project:update [TYPE]

    quality
    -------
    biongs quality:boxplot FASTQ_QUALITY_STATS
    biongs quality:fastq_stats FASTQ
    biongs quality:illumina_b_profile_raw FASTQ --read-length=N
    biongs quality:illumina_b_profile_svg FASTQ --read-length=N
    biongs quality:reads FASTQ
    biongs quality:reads_coverage FASTQ_QUALITY_STATS
    biongs quality:scatterplot EXPR1 EXPR2 OUTPUT
    biongs quality:trim FASTQ

    rna
    ---
    biongs rna:compare GTF_REF OUTPUTDIR GTFS_QUANTIFICATION
    biongs rna:idx2fasta INDEX FASTA
    biongs rna:mapquant DIST INDEX OUTPUTDIR FASTQS
    biongs rna:quant GTF OUTPUTDIR BAM
    biongs rna:tophat DIST INDEX OUTPUTDIR FASTQS

    sff
    ---
    biongs sff:extract [FILE]
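The warnings at the top of the listing show tasks being skipped when their executable cannot be found. A minimal sketch of that pattern, assuming a plain PATH search: `find_binary` and `TaskRegistry` are hypothetical names, and the real lookup (`Bio::Ngs::Utils.binary`) appears to resolve tools bundled under the gem's `ext/bin` directory (see the irb output on slide 10), so it may behave differently.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical PATH lookup: return the first executable file named `name`, or nil.
def find_binary(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical registry that only enables a task when its binary can be found,
# collecting a warning for every task it has to skip.
class TaskRegistry
  attr_reader :tasks, :warnings

  def initialize(path = ENV.fetch('PATH', ''))
    @path = path
    @tasks = {}
    @warnings = []
  end

  def register(task_name, binary)
    executable = find_binary(binary, @path)
    if executable.nil?
      @warnings << "No binary found with this name: #{binary}"
      return false
    end
    @tasks[task_name] = executable
    true
  end
end

# Usage: a temporary directory plays the role of PATH and holds one fake tool.
Dir.mktmpdir do |dir|
  fake_bwa = File.join(dir, 'bwa')
  File.write(fake_bwa, "#!/bin/sh\n")
  FileUtils.chmod(0o755, fake_bwa)

  registry = TaskRegistry.new(dir)
  registry.register('bwa:index:short', 'bwa')
  registry.register('homology:blastn', 'blastn') # hypothetical task name
  p registry.tasks.keys # => ["bwa:index:short"]
  p registry.warnings   # => ["No binary found with this name: blastn"]
end
```

Disabling a task at registration time, rather than failing when it runs, keeps the command listing honest about what the current machine can actually do.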
12. Tasks
The task listing from slide 11, annotated with callouts:
• No binary: task disabled
• Keep everything organized
• Recall an old analysis
• Reporting
• Basic, Advanced
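The history task recalls a past run together with its parameters (`biongs history:8` in the listing). Bio-NGS keeps this history in an embedded SQLite3 database; to stay dependency-free, the sketch below swaps in a JSON file via Ruby's standard `json` library. `History`, `record` and `recall` are hypothetical names, the recorded parameters are made-up, and the example is not safe for concurrent writers.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical task history; a JSON file stands in for Bio-NGS's SQLite3 database.
class History
  def initialize(file)
    @file = file
  end

  # Append a task invocation and return its 1-based id.
  def record(task, parameters)
    entries = load_entries
    entries << { 'task' => task, 'parameters' => parameters }
    File.write(@file, JSON.generate(entries))
    entries.size
  end

  # Return the entry with the given id, or nil when no such entry exists.
  def recall(id)
    entries = load_entries
    id.between?(1, entries.size) ? entries[id - 1] : nil
  end

  private

  def load_entries
    File.exist?(@file) ? JSON.parse(File.read(@file)) : []
  end
end

Dir.mktmpdir do |dir|
  history = History.new(File.join(dir, 'history.json'))
  history.record('quality:trim', ['reads.fastq'])
  id = history.record('convert:illumina:de:isoform', ['isoform_exp.diff', 'genes.gtf'])
  entry = history.recall(id)
  puts "#{id}: #{entry['task']} #{entry['parameters'].join(' ')}"
  # prints: 2: convert:illumina:de:isoform isoform_exp.diff genes.gtf
end
```

Because the history lives in a file rather than in memory, a new `History` object pointed at the same file can recall entries recorded by an earlier run.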
13. Tasks
    class Rna < Thor
      desc "mapquant DIST INDEX OUTPUTDIR FASTQS", "map and quantify"
      method_option :paired, :type => :boolean, :default => false,
                    :desc => 'Are reads paired? If you choose this option, pass just the basename of the file without forward/reverse and .fastq'
      def mapquant(dist, index, outputdir, fastqs)
        # tophat
        invoke :tophat, [dist, index, outputdir, fastqs], :paired => options.paired
        # cufflinks quantification on gtf
        invoke :quant, ["#{index}.gtf", File.join(outputdir, "quantification"), File.join(outputdir, "accepted_hits_sort.bam")]
      end
      …
    end
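`mapquant` hands files from one task to the next: the `tophat` task writes `accepted_hits.bam` under OUTPUTDIR, and quantification reads `accepted_hits_sort.bam` plus the `#{index}.gtf` annotation. The sketch below describes that data flow as plain data and computes which inputs must already exist before the pipeline starts. `Step`, `mapquant_plan` and `external_inputs` are hypothetical; the sketch assumes `convert:bam:sort` writes the `_sort` file that `mapquant` expects, and it leaves out DIST because the mate distance changes TopHat's options, not the file flow.

```ruby
# Hypothetical description of the mapquant data flow; it runs nothing.
Step = Struct.new(:task, :inputs, :outputs)

def mapquant_plan(index, outputdir, fastqs, paired: false)
  reads    = paired ? ["#{fastqs}_forward.fastq", "#{fastqs}_reverse.fastq"] : [fastqs]
  accepted = File.join(outputdir, 'accepted_hits.bam')
  # Assumption: convert:bam:sort names its output accepted_hits_sort.bam.
  sorted   = File.join(outputdir, 'accepted_hits_sort.bam')
  [
    Step.new('rna:tophat',       reads,                    [accepted]),
    Step.new('convert:bam:sort', [accepted],               [sorted]),
    Step.new('rna:quant',        ["#{index}.gtf", sorted], [File.join(outputdir, 'quantification')])
  ]
end

# Inputs that no earlier step produces must already exist on disk.
def external_inputs(plan)
  produced = []
  plan.each_with_object([]) do |step, external|
    external.concat(step.inputs - produced)
    produced.concat(step.outputs)
  end
end

plan = mapquant_plan('hg19', 'out', 'sample', paired: true)
plan.each { |s| puts "#{s.task}: #{s.inputs.join(' ')} -> #{s.outputs.join(' ')}" }
p external_inputs(plan)
# => ["sample_forward.fastq", "sample_reverse.fastq", "hg19.gtf"]
```

Modelling the chain as data makes it easy to check, before launching a long alignment, that every intermediate file is produced by some earlier step.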
14. Tasks
class Rna < Thor
  # you'll end up with 3 accepted_hits files: regular, sorted, and sorted-indexed
  desc "tophat DIST INDEX OUTPUTDIR FASTQS", "run tophat as from the command line (6 threads by default), then create a sorted, indexed BAM."
  method_option :paired, :type => :boolean, :default => false, :desc => 'Are reads paired? If you choose this option, pass just the…'
  Bio::Ngs::Tophat.new.thor_task(self, :tophat) do |wrapper, task, dist, index, outputdir, fastqs|
    wrapper.params = task.options # merge passed options to the wrapper.
    wrapper.params = {"mate-inner-dist"=>dist, "output-dir"=>outputdir, "num-threads"=>6, "solexa1.3-quals"=>true}
    # paired reads: derive the forward/reverse FASTQ names from the basename
    fastq_files = task.options[:paired] ? ["#{fastqs}_forward.fastq", "#{fastqs}_reverse.fastq"] : ["#{fastqs}"]
    wrapper.run :arguments=>[index, fastq_files].flatten, :separator=>"="
    accepted_hits_bam_fn = File.join(outputdir, "accepted_hits.bam")
    task.invoke "convert:bam:sort", [accepted_hits_bam_fn] # call the sorting procedure.
  end
end
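The tophat task hands the wrapper a params hash and `:separator=>"="`. The slide does not show how Bio-NGS turns those params into a command line, so this is a plausible sketch under stated assumptions: each key becomes a `--key` long option, `true` becomes a bare flag, `false`/`nil` are omitted, and any other value is joined to its key with the separator. `to_argv` is a hypothetical helper, and the index and FASTQ names are example values.

```ruby
# Hypothetical sketch (not the Bio-NGS implementation): turn a
# wrapper-style params hash into an argv array for an external tool.
def to_argv(program, params, arguments, separator: "=")
  options = params.flat_map do |key, value|
    case value
    when true then ["--#{key}"]          # boolean switch, e.g. --solexa1.3-quals
    when false, nil then []              # switched off: omit entirely
    else
      if separator.strip.empty?
        ["--#{key}", value.to_s]         # "--key value" as two argv entries
      else
        ["--#{key}#{separator}#{value}"] # "--key=value" as one argv entry
      end
    end
  end
  [program, *options, *arguments]
end

params = { "mate-inner-dist" => 200, "output-dir" => "out",
           "num-threads" => 6, "solexa1.3-quals" => true }
argv = to_argv("tophat", params, ["hg19", "sample_forward.fastq", "sample_reverse.fastq"])
puts argv.join(" ")
# tophat --mate-inner-dist=200 --output-dir=out --num-threads=6 --solexa1.3-quals hg19 sample_forward.fastq sample_reverse.fastq
```

Keeping the result as an array rather than a single string means it can be passed to `system(*argv)` without going through a shell, so file names with spaces need no quoting.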
15. Tasks
class Rna < Thor
  desc "quant GTF OUTPUTDIR BAM", "Genes and transcripts quantification"
  Bio::Ngs::Cufflinks::Quantification.new.thor_task(self, :quant) do |wrapper, task, gtf, outputdir, bam|
    wrapper.params = task.options # merge passed options to the wrapper.
    wrapper.params = {"num-threads" => 6, "output-dir" => outputdir, "GTF" => gtf}
    wrapper.run :arguments=>[bam], :separator => "="
  end
end
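Both wrapper tasks assign `task.options` to `wrapper.params` first and the hard-coded values second. Assuming `params=` merges with later keys winning, as the "merge passed options" comment in the tophat task suggests (the slides do not confirm this), a user-supplied `num-threads` would be overwritten by `6`. This plain-hash sketch contrasts that order with the reverse one; it uses no Bio-NGS code, and the hash contents are example values.

```ruby
# Plain-hash illustration of option precedence; no Bio-NGS code involved.
defaults = { "num-threads" => 6, "output-dir" => "quantification" }
user     = { "num-threads" => 12 }

# Order used in the slides (assuming params= merges): values applied last win,
# so the hard-coded defaults override what the user passed.
slide_order = {}.merge(user).merge(defaults)

# Reverse order: defaults first, so user-supplied values take precedence.
user_wins = {}.merge(defaults).merge(user)

puts slide_order["num-threads"] # → 6
puts user_wins["num-threads"]   # → 12
```

Applying defaults first and user options last is the usual way to let command-line flags override built-in settings while still filling in anything the user left out.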
16. Next?
• Support more software, not only NGS
• Wrap EMBOSS on the fly by reading ACD files
• Tune according to hardware
• Share tasks
  – Thor & Rake
• Improve JRuby compatibility
• Contributions
• Scalability
  – Cloud? BioLinux
  – BioHub: distribute tasks using messaging
    • ActiveMQ
    • Stomp
    • ActiveMessaging
• Adapters for Queuing Systems
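The "Tune according to hardware" and "BioHub: distribute tasks using messaging" items point toward replacing the hard-coded `"num-threads"=>6` and running independent jobs in parallel. A minimal sketch, assuming an in-process thread pool is an acceptable stand-in for a message broker (this is not how BioHub, ActiveMQ, or Stomp work): the worker count defaults to `Etc.nprocessors`, and `run_distributed` and the per-sample job are hypothetical.

```ruby
require "etc"

# Hypothetical in-process stand-in for distributing tasks to workers.
# A broker-based setup (e.g. ActiveMQ over Stomp) would replace this Queue;
# only the producer/worker pattern is shown here.
def run_distributed(jobs, workers: Etc.nprocessors, &work)
  workers = [workers, 1].max
  queue = Queue.new
  jobs.each { |job| queue << job }
  workers.times { queue << :done } # one stop marker per worker

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (job = queue.pop) != :done
        results << [job, work.call(job)]
      end
    end
  end
  threads.each(&:join)

  # Drain the thread-safe results queue into a job => result hash.
  Array.new(results.size) { results.pop }.to_h
end

# Hypothetical job: derive the sorted-BAM name for each sample.
samples = %w[sample1 sample2 sample3]
sorted = run_distributed(samples) { |s| "#{s}_sort.bam" }
puts sorted.sort.map { |sample, bam| "#{sample} -> #{bam}" }
```

On MRI only one thread runs Ruby code at a time, so this pattern pays off when each job waits on an external process (as the tophat and cufflinks wrappers do) rather than doing CPU-heavy work in Ruby itself.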
18. Acknowledgments
Serena Curti
Francesco Strozzi (1, 3)
Groningen Bioinformatics Centre
Debora Mascheroni
Pjotr Prins (2)
Alessandra Stella
Valeria Parente
Valeria Ranzani (1)
Anna Ripamonti
Grazisa Rossetti
Riccardo L. Rossi
Laboratory of Genome Database
Dan MacLean (4)
Roberto Sciarreca
Toshiaki Katayama (1, 2)
The Genome Analysis Centre
Ricardo Ramirez-Gonzalez (4)
Massimiliano Pagani

(1) bio-ngs, (2) bio-gem, (3) bio-bwa, (4) bio-samtools
19. Questions?
INFO
E-mail: bonnal@ingm.org / r@bioruby.org
Dev: http://github.com/helios/bioruby-ngs
Docs: https://github.com/helios/bioruby-ngs/blob/master/README.rdoc
Wiki: http://bioruby.open-bio.org/wiki/Next_Generation_Sequencing
BioRuby-ML: http://lists.open-bio.org/mailman/listinfo/bioruby
IRC: #bioruby (irc.freenode.org)