genomation is an R package that provides a collection of tools for visualizing and analyzing genome-wide data sets. It works with a variety of genomic interval file types and makes it easy to summarize and annotate high-throughput data sets with given genomic annotations. http://al2na.github.io/genomation/
AviPulse is a not-for-profit voluntary organization working for bird conservation. This presentation is a compilation of our work over the past semester (yes! we're college students - well, most of us anyway, at the time of uploading :P), presented at the YETI (Young Ecologists Talk and Interact) conference held on 20th Jan, 2016 in Delhi.
Computational genomics approaches to precision medicine (Altuna Akalin)
The document announces a computational genomics conference from September 12-23, 2016 in Berlin, Germany that will focus on approaches to precision medicine. The conference will include lectures from experts in computational genomics and nine course modules covering topics like RNA-seq analysis, ChIP-seq analysis, variant calling, data integration, and cancer classification using high-throughput sequencing data. The application deadline for the conference is July 1, 2016.
2015 Computational genomics course poster. Theoretical lectures will be followed by practical sessions in which students apply what they have learned. The programming will be done mainly in R. The course will benefit first-year computational biology PhD students and experimental biologists who want to start analyzing data or who seek a better understanding of computational genomics.
This document introduces analyzing methylation data from Reduced Representation Bisulfite Sequencing (RRBS) experiments using the R package methylKit. It begins with an overview of basic R operations and data structures. Next, it discusses relevant genomics packages in Bioconductor like GenomicRanges and IRanges that are useful for working with genomic intervals. Finally, it demonstrates how to use methylKit to analyze RRBS methylation data, including working with annotated methylation events.
This document discusses various methods for normalization of RNA-seq read count data, including RPKM/FPKM, TMM, Limma voom, and TPM. It provides explanations of each method and how they aim to correct for differences in sequencing depth, transcript length, and transcript pool composition between samples. The document also provides examples of calculating RPKM, TPM, and comparing the two methods. Lastly, it discusses using tools like RSEM, Bowtie, and EBSeq for determining differentially expressed genes from RNA-seq data through a count-based strategy.
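The RPKM and TPM calculations mentioned above can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not code from the summarized document; the function names and the toy counts and lengths are assumptions made for the example.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    # count / (length in kb) / (total reads in millions)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


# Hypothetical toy data: three transcripts in one sample.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 1000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

One practical difference between the two: TPM values always sum to one million within a sample, whereas the sum of RPKM values depends on the length distribution of the expressed transcripts, which makes TPM proportions easier to compare across samples.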
The document describes a competition between Dries Godderis and Wim Van Criekinge to find the longest words in English, Dutch, and Portuguese that meet certain criteria. Dries Godderis provides examples of words that meet each of the criteria in the different languages. He also discusses how a repeated stretch of letters at the beginning and end of a sequence could stabilize hairpin loop structures in biology.
This document summarizes the work of Hans Jansen and Christiaan Henkel with long read nanopore sequencing. They have sequenced several genomes including carp, eel, king cobra, and Agrobacterium using MinION. Their longest reads were 120 kbp and 93.5 kbp. They also established the MinION Access Program to improve genomes by resolving repeats. As part of this, they formed the MinION Analysis and Reference Consortium to standardize protocols and understand variability between labs. Their work with the E. coli genome demonstrated sources of variation in read counts, lengths, and alignments between labs.
Long reads from nanopore sequencing provide advantages for sequencing small genomes like bacteria and viruses. Key benefits include easier genome assembly from long overlapping reads, resolution of structural variants and repeats from single long reads, and real-time analysis. Nanopore devices are scalable and can sequence genomes from viruses to plants depending on the device and number of flow cells used.
This document discusses high-throughput DNA sequencing technologies and their application to genome assembly projects. It provides a brief history of DNA sequencing, from early chemical and chain termination methods to current massively parallel sequencing technologies. It also describes several long-read sequencing technologies, including Pacific Biosciences SMRT sequencing and Oxford Nanopore sequencing. Examples are given of genome projects utilizing these technologies along with short-read sequencing data.
This document describes a method for de novo genome assembly of the forest pathogen Grosmannia clavigera using sequencing data from multiple platforms. Sanger, 454, and Illumina sequencing technologies were used to generate sequence data, which were then assembled using Velvet and Forge assemblers. The final assembly was approximately 32.5 Mb in length with an N50 scaffold size of 782 kb. The method demonstrates that eukaryotic genomes can be accurately assembled by integrating sequence data from different sequencing platforms.
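For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is commonly computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the toy scaffold lengths are illustrative assumptions, not data from the summarized assembly.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Hypothetical scaffold lengths in kb.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```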
This document provides an introduction and overview of manual genome annotation using the Apollo genome annotation tool. It begins with an outline of the webinar topics, which include an introduction to manual annotation and its necessity, an overview of the Apollo tool and its functionality for collaborative curation, and examples and demonstrations. The document then covers key concepts for manual annotation such as the definition of a gene, genome curation steps, transcription and translation including reading frames, splice sites, and phase. The goal of the webinar is to help participants better understand genome curation and manual annotation using Apollo to identify and modify gene models.
Apollo is a web-based application that supports and enables collaborative genome curation in real time, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Apollo allows researchers to break down large amounts of data into manageable portions to mobilize groups of researchers with shared interests.
A Workshop at the Stowers Institute for Medical Research.
FANS is a bioinformatics tool that predicts the functional effects of novel single nucleotide polymorphisms (SNPs) and point mutations in humans and mice. It classifies variants into risk levels - very high, high, medium, and very low - based on their potential impact on coding regions, splicing sites, protein domains and protein function. It integrates data from multiple databases and algorithms to analyze the functional consequences of variants.
The document provides information about various bioinformatics tools for DNA sequence analysis. It describes tools for finding protein coding regions like GeneMark and GENSCAN. It discusses tools for predicting promoters like SoftBerry Promoter and Promoter 2.0. It outlines how Tandem Repeat Finder can detect tandem repeats and how RepeatMasker can mask interspersed repeats in a sequence. It also discusses UTRScan for finding UTR locations and CpG Islands for detecting CpG islands. For each tool, it provides the procedure and interpretation of sample results.
Lightning is a tool that facilitates queries, machine learning, and interactive analysis of hundreds to millions of genomes by utilizing tiling to represent genomes in a compact and standardized format. It tiles genomes by aligning common k-mers to generate unique tile identifiers, then represents genomes as arrays of these identifiers. This enables genomes to be loaded into memory and compared more efficiently via genotypic queries and machine learning. Common uses enabled by Lightning include sharing clinically important variants, standardizing variant call format files, adding annotations, comparing genomes, and finding genomes with specific genotypes in large cohorts.
Overview of methods for variant calling from next-generation sequence data (Thomas Keane)
This document provides an overview of methods for variant calling from next-generation sequencing data. It describes the SAM/BAM format for storing read alignments and details various tools for manipulating and visualizing BAM files. It also discusses approaches for calling SNPs, small indels, and structural variants from aligned sequencing data, highlighting factors to consider for accurate variant detection.
1. The document discusses various strategies for RNA-seq data analysis including fast sequence alignment strategies using hash tables and suffix trees.
2. It also covers criteria for choosing aligners for different types of sequences, tools for mapping reads across splice junctions, and normalization methods like RPKM and TPM.
3. The key steps involved in the identification of differentially expressed genes from RNA-seq data using count-based and FPKM-based strategies are described. This includes mapping, quantifying transcripts, and using tools like Cufflinks, RSEM, DESeq and edgeR for detection of differentially expressed genes.
This document provides an overview of RNA-Seq analysis. It begins with considerations for RNA-Seq experiments such as computational requirements. It then describes the general RNA-Seq analysis workflow including short-read alignment, transcript reconstruction, abundance estimation, visualization, and statistics. The document focuses on explaining the "Tuxedo" analysis pipeline which includes Bowtie, Tophat, Cufflinks, Cuffmerge, Cuffdiff and CummeRbund. It provides examples of commands for each step and discusses alternative tools. The document concludes with resources for further information on RNA-Seq analysis.
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma... (GenomeInABottle)
The document discusses Genome in a Bottle (GIAB) and its efforts to characterize human genomes and provide reference materials and benchmarks to evaluate genome sequencing and variant calling. Specifically, it summarizes how GIAB has characterized 7 human genomes, provides extensive public sequencing data for benchmarking, and is now using linked and long reads to expand the small variant benchmark set, develop a structural variant benchmark, and perform diploid assembly of difficult regions. It also shows how new benchmarks that include more difficult regions have revealed errors in previous benchmarks and reduced performance metrics for variant calling tools.
This document provides an optimized protocol for preparing inexpensive Nextera mate pair libraries with improved performance for scaffolding genomes. Key optimizations include adjusting the tagmentation reaction to maximize DNA in the target size range, performing multiple rounds of Covaris shearing to achieve a narrower insert size distribution peaked at 450-500 bp, and reducing PCR cycles to 10 or fewer to limit duplicates. Through these adjustments to yield and read length, the protocol aims to improve scaffolding by focusing on read pairs that contain junction adaptors.
Apollo: A workshop for the Manakin Research Coordination Network (Monica Munoz-Torres)
Apollo is a web-based, collaborative genomic annotation editing platform. We need annotation editing tools to modify and refine precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically.
This presentation is an introduction to how the manual annotation process takes place using Apollo. It is addressed to the members of the Manakin Genomics research community.
This document provides an improved protocol for preparing Nextera Mate Pair libraries with the goals of enhancing yield and detecting junction adaptors. Key optimizations include testing multiple tagmentation conditions to maximize DNA within the targeted size range, performing successive Covaris shearing to achieve fragment sizes suitable for junction detection, and reducing PCR cycles to minimize amplification bias. The protocol recommends analysis and sequencing approaches to validate library quality and assess the proportion of reads containing junction adaptors.
The document outlines topics that will be covered in a bioinformatics course, including biological databases, sequence alignments, database searching, phylogenetics, protein structure, gene prediction, gene ontologies, hidden Markov models, and non-coding RNA. It then provides more details on topics like using sequence similarity to gather information from unknown protein sequences, finding conserved patterns in alignments using profiles and hidden Markov models, and challenges around gene prediction from raw genomic sequences.
1) The study aimed to improve the genome annotation of Brassica rapa by assembling its transcriptome using RNA-Seq data from multiple tissues and detecting novel transcripts.
2) Over 128 RNA-Seq libraries were generated, sequenced, and quality controlled using resources on the iPlant Collaborative and XSEDE. De novo and reference-based assembly was performed.
3) The analysis detected over 3,500 novel transcripts and updated the gene models for over 28,000 genes in the B. rapa genome. The hybrid assembly approach provided a more accurate annotation compared to single methods.
The document discusses genome assembly and finishing processes. It begins by outlining typical project goals of completely restoring the genome and producing a high-quality consensus sequence. It then describes the evolution of sequencing technologies from Sanger to newer platforms and their impact on draft assemblies. Key steps in the assembly and finishing process include library preparation, assembly, identifying gaps, and improving consensus quality.
This document discusses progress in plant genome sequencing. It begins by outlining advances in DNA sequencing technologies that have improved data quality and volume. It then describes the diversity of plant genomes and challenges of sequencing and assembly. Key applications of plant genome sequencing discussed include model genomes, crop genomes, sequencing plant biodiversity, rare/threatened species, and pan-genomes. Details are provided on the steps of sequencing and assembling plant genomes. Emerging long-read sequencing technologies and their advantages over short-read techniques are also summarized.
Data analysis patterns, tools and data types in genomics (Altuna Akalin)
This document provides an overview of common data analysis patterns, tools, and data types used in genomics. It discusses the general genomics workflow, which includes data collection, quality checking and cleaning, processing, exploratory analysis and modeling, and visualization and reporting. For each step, examples of relevant tools and file formats used in genomics are provided. The document emphasizes that visualization and exploration of data is an important part of analysis. It provides advice for getting started with learning bioinformatics, including using programming languages like R and Galaxy, getting help from others, and not worrying about perfection initially.
This document announces a three-day computational genomics course on bisulfite sequencing analysis and data integration to be held May 23-25, 2018 in Berlin, Germany. The course will cover bisulfite sequencing background and alignment on day one, data integration and visualization on day two, and segmentation and differential methylation on day three. It provides the names and affiliations of the four instructors and a link to apply on the Berlin Institute for Medical Systems Biology website.
Computational genomics and RNA biology summer school | Berlin (Altuna Akalin)
The document announces a summer school from September 25-29 on computational genomics and RNA biology. It lists keynote lecturers from various universities who will speak, including Rolf Backofen, Peter Stadler, Uwe Ohler, and Nikolaus Rajewsky. A tentative schedule and more information can be found on the provided website, with an application deadline of July 1, 2017.
Epigenetic dysregulation talk at NGSAsia 2016 (Altuna Akalin)
The document discusses epigenetic dysregulation in acute myeloid leukemia (AML). It describes how mutations in genes involved in epigenetic regulation, like IDH1/2, TET2, and WT1, can disrupt the production of 5-hydroxymethylcytosine (5hmC) and lead to epigenetic dysregulation in AML. The document outlines strategies for analyzing bisulfite sequencing data to identify differentially methylated regions and changes in methylation, and how to integrate these analyses with other genomic data.
This document summarizes the Bioinformatics Platform at the Max Delbrück Center for Molecular Medicine in Berlin-Buch. The platform provides data analysis and computational resources to support genomic and epigenomic research. Services include maintaining databases and scientific software, providing scientific IT support, running training courses, and collaborating with researchers on data analysis projects. Monthly walk-in consultations are also offered to help with bioinformatics problems.
A bioinformatics hackathon will take place on September 14-15, 2015 in Berlin where participants can pitch their own ideas or work on suggested RNA bioinformatics topics using public data to build open source data analysis and processing tools. Registration is available online.
The document introduces analyzing methylation data from Reduced Representation Bisulfite Sequencing (RRBS) experiments using the R package methylKit. It begins with an introduction and outline, then covers downloading example RRBS data, basics of R including vectors, matrices, and data frames, genomics and R packages for working with genomic intervals, and using methylKit to analyze RRBS methylation data.
genomation: summary of genomic intervals
1. genomation
a toolkit to summarize, annotate and visualize genomic intervals
Altuna Akalın
February 24, 2014
Presented by Altuna Akalın. Package developed by Altuna Akalın and Vedran Franke
Sections: Usage and ubiquity of genomic interval summaries; Using genomation; More information
2-6. Quick introduction
genomation is an R package that expedites genomic interval summary and annotation. It has the following features:
1. Annotation of genomic intervals: e.g. see what % of your intervals overlap with exon/intron/promoters
2. Summary of genomic scores or read coverages over pre-defined regions, e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (nonequi-width regions)
3. Visualize genomic interval summaries as meta-region plots or heatmaps.
4. Work with multiple file formats, e.g. BAM, BED, bigWig, GFF and generic tabular text files containing chromosome location information.
5. Do all these in R :)
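The first feature (what % of intervals overlap a given annotation class) can be illustrated with a few lines of base R. This is only a conceptual sketch: the helper `overlaps_any()` and all coordinates are made up for illustration, and genomation's own annotation functions are not used here.

```r
# Base R sketch (illustrative only, not genomation's implementation).
# overlaps_any() is a hypothetical helper: TRUE for each query interval
# that overlaps at least one subject interval on the same chromosome.
overlaps_any <- function(q.start, q.end, s.start, s.end) {
  sapply(seq_along(q.start), function(i)
    any(q.start[i] <= s.end & q.end[i] >= s.start))
}

# made-up coordinates on a single chromosome
peaks     <- data.frame(start = c(100, 900, 2500, 4000),
                        end   = c(200, 1000, 2600, 4100))
promoters <- data.frame(start = c(50, 3900), end = c(150, 4050))
exons     <- data.frame(start = c(950, 5000), end = c(1200, 5200))

# percentage of peaks overlapping each annotation class
pct <- c(
  promoter = 100 * mean(overlaps_any(peaks$start, peaks$end,
                                     promoters$start, promoters$end)),
  exon     = 100 * mean(overlaps_any(peaks$start, peaks$end,
                                     exons$start, exons$end))
)
pct
```

With real data the same question would normally be answered on GRanges objects, which also handle multiple chromosomes and strands.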
7. Genomic interval summaries are widely used
Summaries of genomic intervals are a useful way to communicate high-dimensional data.
Traditionally, regions of interest are picked and the distribution of genomic intervals is summarized over those regions.
8. Genomic interval summaries are widely used:
Examples from literature
Figure: Erkek, S., et al. (2013). Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa. Nature Structural & Molecular Biology
9. Genomic interval summaries are widely used:
Examples from literature
Figure: Stadler, M., Murr, R., Burger, L., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature
10. Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similar enrichment profile?
[Figure: "average profile around anchor"; x-axis: bases (-100 to 100), y-axis: average score]
11. Utility and futility of average profiles
Only 1/3 of windows have such enrichment. Be careful when interpreting average profiles.
[Figure: heatmap of the individual windows; x-axis: -100 to 100 bases around the anchor]
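This caveat can be reproduced with a small simulation. The sketch below uses base R only, and every number in it (300 windows, a Gaussian bump added to one third of them, the threshold of 6) is an assumption chosen for illustration, not a value taken from the slides.

```r
# Simulated example (hypothetical data, not from the presentation):
# only one third of the windows carry an enrichment around the anchor.
set.seed(1)
n.windows <- 300
positions <- -100:100

# background signal for every window
scores <- matrix(rnorm(n.windows * length(positions), mean = 3.5, sd = 0.5),
                 nrow = n.windows)

# add a bump centred on the anchor to the first third of the windows only
bump <- 4.5 * exp(-(positions / 30)^2)
enriched <- seq_len(n.windows / 3)
scores[enriched, ] <- sweep(scores[enriched, ], 2, bump, "+")

# the average profile peaks at the anchor ...
avg.profile <- colMeans(scores)

# ... although the signal at the anchor is high in only a fraction of rows
at.anchor <- scores[, positions == 0]
frac.enriched <- mean(at.anchor > 6)
```

Calling `plot(positions, avg.profile, type = "l")` draws a single clean peak, while `frac.enriched` shows how few windows actually contribute to it. A heatmap of `scores` makes the same point visually.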
12. Genomic interval summaries are widely used:
Examples from literature
Figure: Lister, R., et al. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature
13. Genomic interval summaries are widely used:
Examples from literature
Figure: Feng, S. et al. (2010). Conservation and divergence of methylation patterning in plants and animals. PNAS
14. Issues to keep in mind when developing summary
methods
Genomic data comes in many formats, so we need a method that is able to work with multiple flat file formats.
We need a method that is not specialized for one type of data set, such as read counts. It should also work easily with other scoring schemes (e.g. conservation scores).
Regions of interest are not always equi-width, so you should be able to normalize for length differences by binning.
Multiple visualization options and fast heatmap generation should be available.
Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) should be possible on the heatmap.
Ease of use: it should not take hours of coding to generate and visualize summaries.
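The point about normalizing for length differences by binning can be made concrete with a short base R sketch. The helper `bin_scores()` is hypothetical (it is not a genomation function), and the two score vectors are made-up data.

```r
# Hypothetical helper (not part of genomation): summarise a per-base score
# vector into a fixed number of bins so regions of different width compare.
bin_scores <- function(x, bin.num = 10, fun = mean) {
  stopifnot(length(x) >= bin.num)
  # assign each position to one of bin.num roughly equal-sized bins
  bin.id <- cut(seq_along(x), breaks = bin.num, labels = FALSE)
  as.vector(tapply(x, bin.id, fun))
}

# two regions of unequal width (made-up scores rising from 0 to 1)
region.a <- seq(0, 1, length.out = 200)
region.b <- seq(0, 1, length.out = 1000)

# after binning, both regions are described by the same number of values
binned <- rbind(a = bin_scores(region.a), b = bin_scores(region.b))
dim(binned)
```

Because both rows now have ten columns, the regions can be stacked into one matrix and averaged or drawn as a heatmap despite their different widths.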
15. Overview of genomation features
[Figure: overview of the genomation workflow]
Input, genomic intervals: BAM, BigWig, BED, GFF, tab txt, GRanges
Input, annotation: BED, GFF, tab txt, GRanges
Summarize: ScoreMatrix/ScoreMatrixList object (regions 1..m by base-pairs/bins 1..n)
Annotate: piecharts for annotation (Intergenic, Intron, Exon, Promoter)
Visualize: meta-region plots, meta-region heatmaps, heatmaps for genomic interval sets (TF1 to TF4 shown as examples)
16. Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
# install CRAN dependencies
install.packages(c("data.table", "plyr", "reshape2", "ggplot2",
                   "gridBase", "devtools"))
# install Bioconductor dependencies
# (BiocManager replaces the retired biocLite.R script)
install.packages("BiocManager")
BiocManager::install(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
# install the package
# (current devtools expects "user/repo" instead of the username argument)
library(devtools)
install_github("al2na/genomation")
# install the data package, needed for the examples
install_github("al2na/genomationData")
17. Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
library(genomation)
tab.file1 <- system.file("extdata/tab1.bed", package = "genomation")
readGeneric(tab.file1)
## GRanges with 6 ranges and 0 metadata columns:
##       seqnames             ranges strand
##          <Rle>          <IRanges>  <Rle>
##   [1]    chr21 [9437272, 9439473]      *
##   [2]    chr21 [9483485, 9484663]      *
##   [3]    chr21 [9647866, 9648116]      *
##   [4]    chr21 [9708935, 9709231]      *
##   [5]    chr21 [9825442, 9826296]      *
##   [6]    chr21 [9909011, 9909218]      *
##   ---
##   seqlengths:
##    chr21
##       NA
18. Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
ScoreMatrix() is used when all of the windows have the same width (e.g. regions around TSS).
ScoreMatrixBin() is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
## scoreMatrix with dims: 1055 2001
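The idea behind a score matrix can be sketched without genomation: one row per window and one column per base, which is presumably why the object above reports two dimensions. The following base R example is a conceptual illustration only. The per-base scores, anchor positions and flank size are invented, and this is not how ScoreMatrix() is implemented.

```r
# Conceptual sketch in base R (NOT the genomation implementation):
# build a windows-by-positions score matrix for equal-width windows.
set.seed(42)
chrom.scores <- rpois(10000, lambda = 2)   # made-up per-base scores
anchors      <- c(1500, 4000, 7200, 9000)  # made-up anchor positions (e.g. TSS)
flank        <- 100                        # bases on each side of the anchor

# extract the scores of each window and stack them as rows
score_matrix <- t(sapply(anchors, function(a) {
  chrom.scores[(a - flank):(a + flank)]
}))
colnames(score_matrix) <- -flank:flank

dim(score_matrix)                # one row per window, one column per base
meta <- colMeans(score_matrix)   # the meta-region (average) profile
```

Equal window width is what makes the simple row stacking possible. Windows of unequal width first need to be reduced to a common number of bins.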
19. Visualizing ScoreMatrix: summary of genomic
intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
# save the current outer margins so they can be restored afterwards
oldoma <- par()$oma
par(oma = c(0, 0, 0, 0))
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
par(oma = oldoma)
[Figure: heatmap of the score matrix and meta-region plot of the average score; x-axis: bases, -1000 to 1000 around the anchor]
20. Working with BAM files
BAM files can also be used in ScoreMatrix() and ScoreMatrixBin()
functions
bam.file = system.file('tests/test.bam', package='genomation')
windows = GRanges(rep(c(1,2),each=2),
IRanges(rep(c(1,2), times=2), width=5))
scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam')
21. Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores, downloaded from http://goo.gl/fEVu0g
my.bed12.file=system.file("extdata/chr21.refseq.hg19.bed",
package = "genomation")
feats=readTranscriptFeatures(my.bed12.file,up.flank=500,down.flank=500)
sm=ScoreMatrix(target="wgEncodeUwDnaseA549RawRep1.bw",
windows=feats$promoters,type='bigWig',strand.aware=TRUE)
plotMeta(sm,xcoords=c(-500,500),main="DHS around TSS",line.col="blue")
[Figure: meta-region plot "DHS around TSS"; y-axis: average score]
22. Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes a ScoreMatrixList object. Here we used CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
ctcf.peaks=readRDS("ctcf.peaks.rds")
dataPath = system.file("extdata", package = "genomationData")
bam.files = list.files(dataPath, full= T,pattern = "bam$")[c(1:4,6)]
sml = ScoreMatrixList(bam.files, ctcf.peaks, bin.num = 50,type = "bam")
names(sml)=c("CTCF","P300","Suz12","Rad21","Znf143")
multiHeatMatrix(sml, xcoords = c(-500, 500),cex.axis=0.35,common.scale = T,
col = c("lightgray", "blue"),winsorize=c(0,95))
[Figure: side-by-side heatmaps for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: -500 to 500 around the CTCF peaks; common color scale]
23. Multiple profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
multiHeatMatrix() can also apply K-means clustering. Extreme
values are trimmed using the "winsorize" argument.
multiHeatMatrix(sml, xcoords = c(-500, 500),kmeans=TRUE,k=3,common.scale = T,
cex.axis=0.4,col = c("lightgray", "blue"),winsorize=c(0,95))
[Figure: side-by-side heatmaps for CTCF, P300, Suz12, Rad21 and Znf143 with rows grouped into clusters 1, 2 and 3; x-axis: -500 to 500; common color scale]
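The two ideas on this slide, capping extreme values at a percentile and grouping rows with K-means, can be tried on a plain matrix. The sketch below is base R only. `winsorize_matrix()` is a hypothetical helper written for this illustration and the data are simulated, so it should be read as an approximation of the concept rather than a copy of what multiHeatMatrix() does internally.

```r
# Base R sketch of the two ideas (an illustration, not genomation's code).
# winsorize_matrix() is a hypothetical helper that caps values outside
# the given lower/upper percentiles of the whole matrix.
winsorize_matrix <- function(mat, limits = c(0, 95)) {
  q <- quantile(mat, probs = limits / 100, na.rm = TRUE)
  mat[mat < q[1]] <- q[1]
  mat[mat > q[2]] <- q[2]
  mat
}

set.seed(7)
# made-up score matrix: 60 regions x 50 bins, three groups of rows
mat <- rbind(matrix(rexp(20 * 50, rate = 1),    nrow = 20),
             matrix(rexp(20 * 50, rate = 0.2),  nrow = 20),
             matrix(rexp(20 * 50, rate = 0.05), nrow = 20))

# cap everything above the 95th percentile
capped <- winsorize_matrix(mat, limits = c(0, 95))

# cluster rows into k groups and order the rows by cluster for plotting
cl <- kmeans(capped, centers = 3, nstart = 10)
ordered <- capped[order(cl$cluster), ]
```

Capping before clustering keeps a handful of very large values from dominating both the color scale and the distance calculation.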
24. Multiple profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
Multiple average profiles can be visualized with heatMeta(). Here,
we also apply a scaling function to all the matrices.
# take log2 of all matrices
sml2=scaleScoreMatrixList(sml,scalefun=function(x) log2(x+1))
heatMeta(sml2,legend.name="average profiles",xcoords=c(-500, 500),
xlab="bp around peaks")
[Figure: heatmap of average profiles for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bp around peaks (-500 to 500); legend: average profiles]
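What the scaling step and the average-profile heatmap amount to can be shown on a plain list of matrices. This is a base R sketch with simulated counts standing in for a ScoreMatrixList. It does not call scaleScoreMatrixList() or heatMeta(), and the names used are only examples.

```r
# Base R sketch (illustrative only): apply one scaling function to every
# matrix in a list, then reduce each matrix to its average profile.
set.seed(3)
# made-up list of score matrices standing in for a ScoreMatrixList
mats <- list(CTCF = matrix(rpois(100 * 50, 8), nrow = 100),
             P300 = matrix(rpois(100 * 50, 2), nrow = 100))

# the same transformation as on the slide: log2 with a pseudo-count of 1
scalefun <- function(x) log2(x + 1)
scaled   <- lapply(mats, scalefun)

# one row per data set, one column per bin: the input of a profile heatmap
profiles <- t(sapply(scaled, colMeans))
dim(profiles)
```

Each row of `profiles` corresponds to one row of the heatmap, so data sets with very different read depths stay comparable after the log transformation.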
25. Multiple profiles
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Multiple average profiles can also be visualized with plotMeta()
plotMeta(sml2,profile.names=names(sml2),
xcoords=c(-500, 500),
main="mult. profiles")
[Figure: "mult. profiles"; line plot of the average score for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bases]
26. Future work...
Explore overlap statistics between two genomic data sets: do TF1 binding sites overlap with TF2 sites more than expected?
This was previously explored in the GenometriCorr package. This functionality can be included in the form of a dependency.
Performance improvements for certain functions: faster is always better...
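One common way to ask whether two sets of sites overlap more than expected is a permutation test: count the observed overlaps, then compare against counts obtained after placing one set at random positions. The base R sketch below illustrates that general idea on simulated data. The helper names are hypothetical, and this is neither the GenometriCorr method nor a planned genomation API.

```r
# Base R sketch of a permutation test for interval overlap (hypothetical
# helper names; not how GenometriCorr or genomation implement it).
count_overlaps <- function(a, b) {
  # a, b: data.frames with start/end columns on the same chromosome
  sum(sapply(seq_len(nrow(a)), function(i)
    any(a$start[i] <= b$end & a$end[i] >= b$start)))
}

shuffle <- function(x, chrom.len) {
  # keep the widths, draw new random start positions
  w <- x$end - x$start
  s <- sample.int(chrom.len - max(w), nrow(x), replace = TRUE)
  data.frame(start = s, end = s + w)
}

set.seed(11)
chrom.len <- 1e6
tf1 <- data.frame(start = sort(sample.int(chrom.len - 200, 100)))
tf1$end <- tf1$start + 200
# made-up TF2 sites: half sit next to TF1 sites, half are random
tf2 <- data.frame(start = c(tf1$start[1:50] + 50,
                            sample.int(chrom.len - 200, 50)))
tf2$end <- tf2$start + 200

observed <- count_overlaps(tf1, tf2)
null     <- replicate(200, count_overlaps(tf1, shuffle(tf2, chrom.len)))
# empirical p-value with the usual +1 correction
p.value  <- (sum(null >= observed) + 1) / (length(null) + 1)
```

Uniform shuffling is the simplest null model. Real genomes have unmappable regions and uneven site density, so a realistic test would restrict where shuffled sites may land.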
27. Further information
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
The genomation package is available at
http://al2na.github.io/genomation. You can find the link
to the vignette on the webpage as well.
Code that generated this presentation is available at
http://github.com/al2na/genomation_presentation
Questions and bug reports
You can view/open issues on GitHub
https://github.com/al2na/genomation/issues?state=open
You can ask questions by sending an e-mail to
genomation@googlegroups.com or using the web interface to
Google Groups
Developed by Altuna Akalın and Vedran Franke
28. Session Info
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] C
##
## attached base packages:
## [1] methods   grid      stats     graphics  grDevices utils     datas
## [8] base
##
## other attached packages:
## [1] genomation_0.99.0.2 knitr_1.5
##
## loaded via a namespace (and not attached):
##  [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
##  [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
##  [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
## [10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
## [13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
## [16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
## [19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2