Presentation at 2019 ASHG GRC/GIAB workshop describing features and recent updates to the vg toolkit, including examples of comparisons to other methods used for alignment and variant detection.
The MANE Project aims to define a standardized set of "representative" transcripts for protein-coding genes by selecting a single "default" transcript (MANE Select) and additional well-supported transcripts (MANE Plus) that are identical between RefSeq and Ensembl. This will provide a consistent annotation across genomic resources. So far, MANE Select covers 67% of the genome and aims to reach 75-80% coverage by the end of the year. While MANE cannot capture full biological complexity, it establishes a baseline for clinical reporting and comparative genomics.
Presentation at 2019 ASHG GRC/GIAB workshop describing history of the human reference genome, current curation efforts and future plans, and the relationship of all 3 to efforts to produce a human pan-genome.
Platform presentation at ASHG 2019 describing recent updates to the human reference genome assembly (GRCh38) and future plans with relevance to pan-genomic representations.
Speaker: Benedict C. S. Cross, PhD, Team leader (Discovery Screening), Horizon Discovery
CRISPR–Cas9 mediated genome editing provides a highly efficient way to probe gene function. Using this technology, thousands of genes can be knocked out and their function assessed in a single experiment. We have conducted over 150 of these complex and powerful screens and will use our experience to guide you through the process of screen design, performance and analysis.
We'll be discussing:
• How to use CRISPR screening for target ID and validation, understanding drug MOA and patient stratification
• The screen design, quality control and how to evaluate success of your screening program
• Horizon’s latest developments to the platform
• Horizon’s novel approaches to target validation screening
Review of Liao et al - A draft human pangenome reference - Nature (2023)Stuart MacGowan
The document summarizes a review of a recent Nature paper that presents a draft human pangenome reference assembled from 47 genetically diverse individuals. Key points:
- The draft pangenome improves upon the current human genome reference by capturing more genetic diversity and revealing new variants and sequences.
- It was constructed by assembling high-quality genomes from diverse individuals using long-read sequencing and integrating them into a graph-based reference structure.
- Evaluation showed it captured over 99% of expected sequences with high accuracy and identified new variants, improving tools for variant discovery and disease research.
The document describes an RNA-seq analysis workflow that includes:
1. Preprocessing raw reads including quality control, filtering, and alignment to a reference genome using tools like FastQC, Bowtie2, and TopHat.
2. Assembling transcripts and estimating abundance using Cufflinks and HTseq-count.
3. Identifying differentially expressed genes between samples using DESeq and Cuffdiff.
4. Providing gene annotations and visualizing results using tools like GO, KEGG, and CummeRbund.
The workflow follows a typical reference-based analysis approach and uses various open source tools for read mapping, assembly, quantification, and differential expression.
This document provides an overview of genome-wide association studies (GWAS). It discusses the basic concept of GWAS, running and analyzing a GWAS, and interpreting the results. Key points include: GWAS genotype individuals for hundreds of thousands to millions of SNPs to look for associations with traits; extensive quality control is required; imputation can increase SNP coverage; statistical analysis includes computing p-values and correcting for multiple testing; significant findings still require replication in independent samples.
The MANE Project aims to define a standardized set of "representative" transcripts for protein-coding genes by selecting a single "default" transcript (MANE Select) and additional well-supported transcripts (MANE Plus) that are identical between RefSeq and Ensembl. This will provide a consistent annotation across genomic resources. So far, MANE Select covers 67% of the genome and aims to reach 75-80% coverage by the end of the year. While MANE cannot capture full biological complexity, it establishes a baseline for clinical reporting and comparative genomics.
Presentation at 2019 ASHG GRC/GIAB workshop describing history of the human reference genome, current curation efforts and future plans, and the relationship of all 3 to efforts to produce a human pan-genome.
Platform presentation at ASHG 2019 describing recent updates to the human reference genome assembly (GRCh38) and future plans with relevance to pan-genomic representations.
Speaker: Benedict C. S. Cross, PhD, Team leader (Discovery Screening), Horizon Discovery
CRISPR–Cas9 mediated genome editing provides a highly efficient way to probe gene function. Using this technology, thousands of genes can be knocked out and their function assessed in a single experiment. We have conducted over 150 of these complex and powerful screens and will use our experience to guide you through the process of screen design, performance and analysis.
We'll be discussing:
• How to use CRISPR screening for target ID and validation, understanding drug MOA and patient stratification
• The screen design, quality control and how to evaluate success of your screening program
• Horizon’s latest developments to the platform
• Horizon’s novel approaches to target validation screening
Review of Liao et al - A draft human pangenome reference - Nature (2023)Stuart MacGowan
The document summarizes a review of a recent Nature paper that presents a draft human pangenome reference assembled from 47 genetically diverse individuals. Key points:
- The draft pangenome improves upon the current human genome reference by capturing more genetic diversity and revealing new variants and sequences.
- It was constructed by assembling high-quality genomes from diverse individuals using long-read sequencing and integrating them into a graph-based reference structure.
- Evaluation showed it captured over 99% of expected sequences with high accuracy and identified new variants, improving tools for variant discovery and disease research.
The document describes an RNA-seq analysis workflow that includes:
1. Preprocessing raw reads including quality control, filtering, and alignment to a reference genome using tools like FastQC, Bowtie2, and TopHat.
2. Assembling transcripts and estimating abundance using Cufflinks and HTseq-count.
3. Identifying differentially expressed genes between samples using DESeq and Cuffdiff.
4. Providing gene annotations and visualizing results using tools like GO, KEGG, and CummeRbund.
The workflow follows a typical reference-based analysis approach and uses various open source tools for read mapping, assembly, quantification, and differential expression.
This document provides an overview of genome-wide association studies (GWAS). It discusses the basic concept of GWAS, running and analyzing a GWAS, and interpreting the results. Key points include: GWAS genotype individuals for hundreds of thousands to millions of SNPs to look for associations with traits; extensive quality control is required; imputation can increase SNP coverage; statistical analysis includes computing p-values and correcting for multiple testing; significant findings still require replication in independent samples.
Transgene-free CRISPR/Cas9 genome-editing methods in plantsCIAT
"Transgene-free CRISPR/Cas9 genome-editing methods in plants" by Matthew R. Willmann, Ph.D. Director, Plant Transformation Facility College of Agriculture and Life Sciences, School of Integrative Plant Science, Cornell University.
This document summarizes Shivendra Kumar's class presentation on SNP genotyping using KASP. It introduces SNP genotyping and the KASP platform. It describes using KASP to genotype a wheat mapping population derived from a cross between an introgression line containing stripe rust resistance genes and a susceptible cultivar. KASP markers were developed and used to map the resistance genes. One candidate resistance gene was identified and further analyzed through expression studies and development of a linked KASP marker. Recombinants were identified and confirmed through additional KASP genotyping.
This document provides an overview of DNA sequencing technologies. It begins with a brief history of DNA sequencing, including the discovery of DNA's structure and Sanger sequencing. The document then focuses on next generation sequencing technologies, describing several platforms such as 454 sequencing, Illumina sequencing, Ion Torrent sequencing, and Pacific Biosciences sequencing. It also discusses third generation sequencing and compares the sequencing approaches, workflows, and applications of various sequencing technologies. In conclusion, the document notes the progress and future directions of sequencing, including increased clinical applications and reduced costs.
New Generation Sequencing Technologies: an overviewPaolo Dametto
The document provides a history of DNA sequencing technologies. It begins with the discovery of DNA's structure in 1953 and the development of recombinant DNA technology in the 1970s. First generation Sanger sequencing produced short reads over 1,000 years to sequence the human genome. Next generation sequencing (NGS) platforms since 2005 have dramatically reduced costs while increasing throughput. NGS methods like Roche/454 pyrosequencing, Illumina/Solexa sequencing by synthesis, SOLiD ligation sequencing, and single-molecule real-time sequencing by Pacific Biosciences now enable large-scale genome and transcriptome analysis.
The document provides an overview of gene knockout and homology-directed repair using CRISPR. It discusses designing guide RNAs and comparing delivery methods like lipofection, electroporation, and microinjection. It also covers designing repair templates for homology-directed repair to insert or change DNA sequences. Optimization of guide RNAs, delivery method, and repair template design can improve genome editing efficiency.
Genotyping by Sequencing is a robust,fast and cheap approach for high throughput marker discovery.It has applications in crop improvement programs by enhancing identification of superior genotypes.
1. Karen Miga presented on reaching complete telomere-to-telomere chromosome assemblies using long-read sequencing.
2. The current human reference genome is incomplete, with gaps and unresolved issues remaining.
3. Miga's lab generated a high-quality assembly of chromosome 21 for the CHM13 genome using Oxford Nanopore long reads totaling 155 GB of sequence data.
Course: Bioinformatics for Biomedical Research (2014).
Session: 2.1.1- Next Generation Sequencing. Technologies and Applications. Part I: NGS Introduction and Technology Overview.
Statistics and Bioinformatisc Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Handout available at
https://drive.google.com/open?id=0B52D4j3rC4WDN2hRV25CaW9kb0E
Define DNA sequencing.
Describe the history of DNA sequencing.
Describe the steps of Sanger sequencing.
Define next generation sequencing.
Describe the steps of next generation DNA sequencing.
Describe use of DNA sequencing.
Struggling with low editing efficiency or delivery problems? IDT has developed a simple and affordable CRISPR-Cas9 solution that outperforms other methods. In this presentation we present the advantages of using a Cas9:tracrRNA:crRNA ribonucleoprotein (RNP) complex in genome editing experiments, and explain why it is the most efficient driver for genome editing compared to alternative methods, such as expression plasmids or the use of sgRNAs. We also review RNP delivery using cationic lipids and electroporation, and provide tips for optimized transfection in your system.
1. CRISPR is a bacterial immune system that provides immunity against viruses. It works by matching DNA sequences from viruses to DNA spacers and cleaving invading DNA.
2. The type II CRISPR system is the most well studied and requires only three components - Cas9 protein, CRISPR RNA (crRNA), and trans-activating CRISPR RNA (tracrRNA) - to function. Combining crRNA and tracrRNA into a single RNA molecule called sgRNA was crucial for developing the CRISPR technique.
3. CRISPR technology has become a powerful genome editing tool due to its simplicity, high efficiency, low cost, and ease of use. It allows targeted
This document provides an overview of CRISPR/Cas9 genome editing. It discusses the history and limitations of prior genome engineering techniques like recombinant DNA and zinc finger nucleases. It then explains how CRISPR/Cas9 works as a RNA-guided DNA endonuclease and how this allows it to efficiently and specifically edit genomes. The document outlines several applications of CRISPR/Cas9 like generating knockout animals and cell lines. It also notes some concerns about using the technique for human genome editing.
microRNA in Plant Defence and Pathogen Counter-defenceMahtab Rashid
The presentation is about the role of microRNA in plant defence and the pathogen counter-defences which they adopt to escape or evade the plant defence mechanism.
Primer design is a key step in PCR that requires considering various factors to optimize the reaction. These include primer length, melting temperature, GC content, specificity, and potential for secondary structures. Well-designed primers are unique in targeting a single region, have compatible melting temperatures, and do not form hairpins or primer dimers that could inhibit the reaction. Choosing appropriate primers is essential for successful PCR amplification.
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...Torsten Seemann
This document discusses de novo genome assembly, which is the process of reconstructing long genomic sequences from many short sequencing reads without the aid of a reference genome. It is challenging due to factors like short read lengths, repetitive sequences that complicate the assembly graph, and sequencing errors. The goals of assembly are to produce contiguous sequences with high completeness and correctness by resolving overlaps between reads into consensus sequences. Metrics like N50, core gene content, and read remapping are used to assess assembly quality.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)Akshay Deshmukh
clustered regularly interspaced short palindromic repeats is a family of DNA sequences found in the genomes of prokaryotic organisms such as bacteria. Now CRISPR use as genome editing tool in different Plant Breeder to manipulate the DNA of the crop
The Genome in a Bottle Consortium provides reference samples and genomic data to enable the development and benchmarking of genome analysis tools, and they have recently published new resources from linked and long read sequencing including 10x Genomics, PacBio, and Oxford Nanopore data, which have expanded their small variant benchmark set and enabled the development of a structural variant benchmark.
Transgene-free CRISPR/Cas9 genome-editing methods in plantsCIAT
"Transgene-free CRISPR/Cas9 genome-editing methods in plants" by Matthew R. Willmann, Ph.D. Director, Plant Transformation Facility College of Agriculture and Life Sciences, School of Integrative Plant Science, Cornell University.
This document summarizes Shivendra Kumar's class presentation on SNP genotyping using KASP. It introduces SNP genotyping and the KASP platform. It describes using KASP to genotype a wheat mapping population derived from a cross between an introgression line containing stripe rust resistance genes and a susceptible cultivar. KASP markers were developed and used to map the resistance genes. One candidate resistance gene was identified and further analyzed through expression studies and development of a linked KASP marker. Recombinants were identified and confirmed through additional KASP genotyping.
This document provides an overview of DNA sequencing technologies. It begins with a brief history of DNA sequencing, including the discovery of DNA's structure and Sanger sequencing. The document then focuses on next generation sequencing technologies, describing several platforms such as 454 sequencing, Illumina sequencing, Ion Torrent sequencing, and Pacific Biosciences sequencing. It also discusses third generation sequencing and compares the sequencing approaches, workflows, and applications of various sequencing technologies. In conclusion, the document notes the progress and future directions of sequencing, including increased clinical applications and reduced costs.
New Generation Sequencing Technologies: an overviewPaolo Dametto
The document provides a history of DNA sequencing technologies. It begins with the discovery of DNA's structure in 1953 and the development of recombinant DNA technology in the 1970s. First generation Sanger sequencing produced short reads over 1,000 years to sequence the human genome. Next generation sequencing (NGS) platforms since 2005 have dramatically reduced costs while increasing throughput. NGS methods like Roche/454 pyrosequencing, Illumina/Solexa sequencing by synthesis, SOLiD ligation sequencing, and single-molecule real-time sequencing by Pacific Biosciences now enable large-scale genome and transcriptome analysis.
The document provides an overview of gene knockout and homology-directed repair using CRISPR. It discusses designing guide RNAs and comparing delivery methods like lipofection, electroporation, and microinjection. It also covers designing repair templates for homology-directed repair to insert or change DNA sequences. Optimization of guide RNAs, delivery method, and repair template design can improve genome editing efficiency.
Genotyping by Sequencing is a robust,fast and cheap approach for high throughput marker discovery.It has applications in crop improvement programs by enhancing identification of superior genotypes.
1. Karen Miga presented on reaching complete telomere-to-telomere chromosome assemblies using long-read sequencing.
2. The current human reference genome is incomplete, with gaps and unresolved issues remaining.
3. Miga's lab generated a high-quality assembly of chromosome 21 for the CHM13 genome using Oxford Nanopore long reads totaling 155 GB of sequence data.
Course: Bioinformatics for Biomedical Research (2014).
Session: 2.1.1- Next Generation Sequencing. Technologies and Applications. Part I: NGS Introduction and Technology Overview.
Statistics and Bioinformatisc Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Handout available at
https://drive.google.com/open?id=0B52D4j3rC4WDN2hRV25CaW9kb0E
Define DNA sequencing.
Describe the history of DNA sequencing.
Describe the steps of Sanger sequencing.
Define next generation sequencing.
Describe the steps of next generation DNA sequencing.
Describe use of DNA sequencing.
Struggling with low editing efficiency or delivery problems? IDT has developed a simple and affordable CRISPR-Cas9 solution that outperforms other methods. In this presentation we present the advantages of using a Cas9:tracrRNA:crRNA ribonucleoprotein (RNP) complex in genome editing experiments, and explain why it is the most efficient driver for genome editing compared to alternative methods, such as expression plasmids or the use of sgRNAs. We also review RNP delivery using cationic lipids and electroporation, and provide tips for optimized transfection in your system.
1. CRISPR is a bacterial immune system that provides immunity against viruses. It works by matching DNA sequences from viruses to DNA spacers and cleaving invading DNA.
2. The type II CRISPR system is the most well studied and requires only three components - Cas9 protein, CRISPR RNA (crRNA), and trans-activating CRISPR RNA (tracrRNA) - to function. Combining crRNA and tracrRNA into a single RNA molecule called sgRNA was crucial for developing the CRISPR technique.
3. CRISPR technology has become a powerful genome editing tool due to its simplicity, high efficiency, low cost, and ease of use. It allows targeted
This document provides an overview of CRISPR/Cas9 genome editing. It discusses the history and limitations of prior genome engineering techniques like recombinant DNA and zinc finger nucleases. It then explains how CRISPR/Cas9 works as a RNA-guided DNA endonuclease and how this allows it to efficiently and specifically edit genomes. The document outlines several applications of CRISPR/Cas9 like generating knockout animals and cell lines. It also notes some concerns about using the technique for human genome editing.
microRNA in Plant Defence and Pathogen Counter-defenceMahtab Rashid
The presentation is about the role of microRNA in plant defence and the pathogen counter-defences which they adopt to escape or evade the plant defence mechanism.
Primer design is a key step in PCR that requires considering various factors to optimize the reaction. These include primer length, melting temperature, GC content, specificity, and potential for secondary structures. Well-designed primers are unique in targeting a single region, have compatible melting temperatures, and do not form hairpins or primer dimers that could inhibit the reaction. Choosing appropriate primers is essential for successful PCR amplification.
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...Torsten Seemann
This document discusses de novo genome assembly, which is the process of reconstructing long genomic sequences from many short sequencing reads without the aid of a reference genome. It is challenging due to factors like short read lengths, repetitive sequences that complicate the assembly graph, and sequencing errors. The goals of assembly are to produce contiguous sequences with high completeness and correctness by resolving overlaps between reads into consensus sequences. Metrics like N50, core gene content, and read remapping are used to assess assembly quality.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)Akshay Deshmukh
clustered regularly interspaced short palindromic repeats is a family of DNA sequences found in the genomes of prokaryotic organisms such as bacteria. Now CRISPR use as genome editing tool in different Plant Breeder to manipulate the DNA of the crop
The Genome in a Bottle Consortium provides reference samples and genomic data to enable the development and benchmarking of genome analysis tools, and they have recently published new resources from linked and long read sequencing including 10x Genomics, PacBio, and Oxford Nanopore data, which have expanded their small variant benchmark set and enabled the development of a structural variant benchmark.
Peter krusche population based targeted validation of structural variant brea...GenomeInABottle
This document summarizes Illumina's process for validating structural variants using population data. It describes several community resources that are used, including Platinum Genomes and Polaris datasets. The validation process involves jointly calling variants in these populations and checking for Mendelian consistency and Hardy-Weinberg equilibrium. Variants are compared between calls sets to identify representations needing improvement. Future plans include genome-wide graph realignment and validation and improving targeted validation tools.
Integration of single molecule, genome mapping data in a web-based genome bro...William Chow
Sequence, Finishing and the Future Conference (SFAF 2015) Poster submission. Santa Fe, New Mexico.
Poster describes the gEVAL browser, and the integration of genome/optical map data for use in evaluating/curation of genome assemblies. Human, Mouse, Zebrafish, Pig, Helminth, Chicken.
VectorBase - PopGenBase Meeting at ASTMH08Yoosook Lee
PopGenBase introduction meeting at the American Society of Tropical Medicine and Hygiene targeted for vector biologists, ecologist and epidemiologists. PopGenBase are designed to hold genetic variation data for natural populations of mosquitoes.
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Golden Helix Inc
This document summarizes Golden Helix's capabilities for handling big data in genomics. It discusses how Golden Helix's software tools like VarSeq and the Variant Storage Warehouse can scale to handle large volumes of genomic and clinical data from whole genome sequencing, exome sequencing, and large population studies. It provides examples of how these tools have been used for clinical and research applications like variant filtering, annotation, and analysis of rare variants. The presentation concludes with a demonstration of the software.
Nils Gehlenborg presented on visualizing 3D genome data. He discussed how Hi-C data captures chromosomal interactions to measure 3D genome structure, but results in massive interaction matrices. Existing tools have limitations in scale, comparison across conditions, and navigation. Gehlenborg demonstrated HiGlass, an interactive multi-resolution viewer for exploring these matrices. He also introduced HiPiler for investigating detected patterns, by filtering, aggregating, and correlating them with other data. HiGlass and HiPiler address challenges of visualizing 3D genome data at different scales and across many samples.
GenomeSnip: Fragmenting the Genomic Wheel to augment discovery in cancer rese...Maulik Kamdar
The document describes GenomeSnip, a semantic visual analytics platform for knowledge exploration and discovery in cancer research. It integrates multiple genomic and biomedical datasets and uses a circular "Genomic Wheel" visualization and linear "Genomic Tracks" display to show relationships between chromosomes and allow analysis of genomic features. An evaluation found it effectively integrated knowledge and enabled simultaneous, dynamic analysis compared to other tools. Future work includes integrating it with genome browsers for improved linear visualization of tracks.
The document provides information about an atlas of alternative polyadenylation (APA) outliers and rare variants. It describes the data and methods used, including collecting RNA-seq and whole genome sequencing data from the GTEx project. The data was analyzed to identify APA outliers in single tissues and across multiple tissues. The document then outlines a three part guide for querying outliers of interest, viewing them in a genome browser, and downloading the associated data.
Alzheimer’s disease (AD) is a devastating neurodegenerative disease that is genetically complex. Although great progress has been made in identifying fully penetrant mutations in genes that cause early-onset AD, these still represent a very small percentage of AD cases. Large-scale, genome-wide association studies (GWAS) have identified at least 20 additional genetic risk loci for the more common form: late-onset AD. However, the identified SNPs are typically not the actual risk variants, but are in linkage disequilibrium with the presumed causative variants [1].
To help identify causative genetic variants, we have combined highly accurate, long-read sequencing with hybrid-capture technology. In this collaborative webinar*, we present this method and show how combining IDT xGen® Lockdown® Probes with PacBio SMRT® Sequencing allows targeting and sequencing of candidate genes from genomic DNA and corresponding transcripts from cDNA. Using a panel of target capture probes for 35 AD candidate genes, we demonstrate the power of this approach by looking at data for two individuals with AD. Some additional benefits of this method include the ability to leverage long reads, phase heterozygous variants, and link corresponding transcript isoforms to their respective alleles.
Reference: 1. Van Cauwenberghe C, Van Broeckhoven C, Sleegers K. (2016) The genetic landscape of Alzheimer disease: clinical implications and perspectives. Genet Med, 18(5):421–430.
* This presentation represents a collaboration between Pacific Biosciences and Integrated DNA Technologies. The individual opinions expressed may not reflect shared opinions of Pacific Biosciences and Integrated DNA Technologies.
1) The document describes a computational tool called RASA-GD that was developed to extract and analyze genetic data from GenBank files in a more readable graphical format.
2) RASA-GD accepts GenBank files as input, extracts information on coding regions, exons, introns and other elements, and calculates length statistics which are visualized graphically.
3) As a demonstration, the tool was used to analyze variation in the FOXP2 gene, which is responsible for speech, across different species. The output provided insights into the evolutionary history of this important gene.
1) The document describes a computational tool called RASA-GD that was developed to extract and analyze genetic data from GenBank files in a more readable graphical format.
2) RASA-GD accepts GenBank files as input, extracts information on coding regions, exons, introns and other elements, and calculates statistical values and generates graphs of the length variations.
3) As a demonstration, the tool was used to visualize and analyze variations in the foxp2 gene, which is responsible for speech, across different species. The data generated insights into the evolutionary history of this gene.
The document describes the Cassava Genome Hub, which provides big genomic data management and analysis resources for cassava. It discusses how the hub handles big data through its architecture and tools. The hub stores terabytes of cassava genomic, transcriptomic and other omics data. It provides tools like JBrowse, SNiPlay, GIGWA and Galaxy to enable visualization, exploration and analysis of the large datasets.
From peer-reviewed to peer-reproduced: a role for research objects in scholar...Alejandra Gonzalez-Beltran
The document discusses how research objects and computational workflows can help capture experimental processes and reproduce findings in life sciences research. It describes a computational experiment evaluating three genome assembly algorithms on bacterial, insect, and human genomes. Key steps included identifying resources, designing the experimental workflow, running the experiment in Galaxy, and publishing results as nanopublications aggregated in a research object to enable verification and reuse. The goal is to improve reproducibility by making experimental descriptions and reviews more structured and transparent.
This session will follow up from transcript quantification of RNAseq data and discusses statistical means of identifying differentially regulated transcripts, and isoforms and contrasts these against microarray analysis approaches.
The Rat Genome Database has enhanced their genome browser, called GBrowse, to provide researchers with a comprehensive platform for visualizing rat genomic features and comparative genomics data between rat, human and mouse. The browser displays tracks of protein-coding genes, RNA genes, SNPs, CNVs, SSLPs and more. Synteny tracks show ortholog positions across species to enable comparative analysis. The browser is being updated to accommodate additional rat strain genomes and visualize strain-specific differences, furthering insight into genotype-phenotype relationships.
This document describes a method for haplotype-resolved structural variant assembly using long reads. PacBio and BioNano data are hybrid assembled to generate highly contiguous and complete haplotype-specific assemblies. The hybrid approach resolves many gaps in current references assemblies and detects complex structural variants and rearrangements. Analysis of trios from the 1000 Genomes Project and GIAB project using this pipeline detects numerous insertions, deletions, inversions and other structural variants.
The Rat Genome Database (RGD) provides several genomics tools and resources for researchers, including GBrowse for visualizing and exploring the rat genome, GViewer for viewing genomic regions associated with functions and phenotypes, and RatMine for flexible searching and analysis of integrated rat data. RGD has recently updated GBrowse with new functionality and improved performance, and also provides SNPlotyper for viewing strain-specific SNPs and the Human Genome Browser for comparative purposes. These tools integrate data from multiple sources to facilitate genomics research on the rat model.
This document provides an overview and discussion of next-generation sequencing technologies by C. Titus Brown. It begins by outlining some basics of shotgun sequencing and how increasing density leads to squared increases in the number of sequences obtained. It then discusses current costs for Illumina sequencing and the amount of data needed for different applications. Some challenges and problems with sequencing data are also reviewed, such as systematic bias, genome assembly and scaffolding difficulties, reference gene models, and mRNA isoform construction. Emerging long-read sequencing technologies are also briefly discussed.
Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Cas...StampedeCon
At the StampedeCon 2013 Big Data conference in St. Louis, Jeff Melching, Big Data Engineer and Architect at Monsanto, discussed Legacy Analysis: How Hadoop Streaming Enables Software Reuse – A Genomics Case Study. The bioinformatics domain and in particular computational genomics has always had the problem of computing analytics against very large data sets. Traditionally, these analytics have leveraged grid and compute farm technologies. Additionally, the analytics software and algorithms have been built up over the past 30 years by contributions from both the public and private domain and written in a number of programming languages. When these software packages are brought in house and combined with the skills and preferences of internal bioinformatics researchers, what you get is a myriad of different technologies linked together in an analytics pipeline. The rise of technologies like MapReduce in hadoop have made the execution of such pipelines much more efficient, but what about all those analytic pipelines I have built up over the years that aren’t written in MapReduce? Do I have to rewrite them? Do I have to know java? This talk will explain how hadoop streaming can help you reuse instead of rewriting. It will also touch on techniques for packaging and deploying hadoop applications without having to centrally manage software versions on the cluster.
Similar to Genome variation graphs with the vg toolkit (20)
Presentation at IMGC 2019 workshop describing the latest improvements to the mouse reference genome assembly and analyses performed in preparation for the next release of the mouse genome assembly (GRCm39).
Presentation at PanGenomics in the Cloud Hackathon, run by NCBI at UCSC (https://ncbiinsights.ncbi.nlm.nih.gov/2019/02/06/pangenomics-cloud-hackathon-march-2019/). Presents points to consider about the adoption of a pangenome reference, emphasizing aspects for long-term data management and wide-spread adoption.
This document discusses the Matched Annotation from NCBI and EMBL-EBI (MANE) project. The project aims to define a set of identical reference transcripts between RefSeq and Ensembl/GENCODE for protein-coding genes. The MANE set will include a "Select" transcript that is representative of each gene locus, as well as additional "Plus" and "Extended" transcripts. The methodology involves choosing the most representative transcript based on expression, conservation, and manual curation. 5' and 3' ends will be matched based on CAGE and polyA sequencing data. The project aims to initially match 50% of genes and have the dataset available in late 2018, with the goal of matching 90%
The Locus Reference Genomic (LRG) project provides stable reference sequences for reporting clinically-relevant genetic variants. It aims to address challenges of keeping track of variants over time as genomes and transcripts change. LRG records are manually curated and contain genomic, transcript, and protein sequences selected by experts that do not change versions. They are used to report variants in a standardized way compatible with HGVS nomenclature. While the LRG project is not scalable due to being fully manual, there is potential convergence with the automated but expert-guided MANE Plus project to cover more transcripts. The LRG project also works independently of the GRCh38 human reference genome, creating records for genes where an alternate sequence is preferred.
1) De novo genome assembly and phasing methods can reconstruct complete haplotype information from sequenced reads without relying on a reference genome.
2) Trio binning uses k-mer profiling of parents' genomes to separate child's reads into maternal and paternal bins before assembly, producing fully phased haplotig assemblies.
3) The human pan-genome project aims to build a collection of diverse, high-quality haplotype-resolved genomes from different populations using trio binning to improve representation of genetic variations.
Presentation by Benedict Paten at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on updates to the human reference assembly, GRCh38.
Presentation by Valerie Schneider at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on updates to the human reference assembly, GRCh38.
Presentation by Tina Graves-Lindsay at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on production of reference grade assemblies for various human populations.
The document discusses challenges in characterizing structural variations across genomes and populations using long-read sequencing technologies. It describes tools developed for accurate mapping and structural variant calling from long reads, including NGMLR for mapping and Sniffles for variant calling. It also presents results of applying these tools to call structural variants in the NA12878 genome from PacBio and Nanopore long reads, detecting many more variants than from short read data. The goal is to better understand variability between genomes and impact on gene regulation.
Presentation by Justin Zook at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on benchmarks for indels and structural variants.
Presentation by Karen Miga at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on centromere assemblies.
The document discusses laboratory techniques for generating high-quality genome assemblies, including PacBio long-read sequencing, 10X Genomics linked reads, and BioNano physical mapping. PacBio sequencing of various library preparation methods produced reads over 10kb in length. 10X Genomics linked reads provided long-range phasing information to resolve alleles from repeats. BioNano mapping revealed a large inversion in one genome through detection of nick sites. The integration of these long-read and long-range techniques aims to capture more human genetic diversity in reference genomes.
The document discusses new challenges in reference assembly management due to changing sequencing technologies and new resources. It summarizes an evaluation of de novo assemblies of the CHM1 and CHM13 haploid hydatidiform mole samples from different assemblers. Key metrics like contiguity, alignment to reference transcripts and protein coding regions, and error rates are compared between the assemblies and reference. The evaluation aims to assess suitability of the assemblies for use in reference curation.
The document discusses efforts to improve the human reference genome by generating new reference-grade assemblies using long-read sequencing technologies. Several human genomes are being sequenced to high coverage and assembled using PacBio long reads to generate "gold standard" references representing more of human genetic diversity. Assemblies are being improved using optical mapping data and finished BAC sequences. The improved assemblies will help represent structural variation and allelic diversity more accurately than the current reference.
This document discusses how linked-read technology from 10x Genomics now enables routine de novo diploid genome assembly from individual human samples. Linked-reads generate long-range genomic information by barcoding DNA fragments derived from the same long input molecules. This allows for improved physical coverage compared to short reads or synthetic long reads. The document shows how a single 10x Genomics library with only 1 ng of input DNA and modest sequencing can generate high-quality human genome assemblies in 2 days. De novo assembly excels at resolving structurally complex and diverse genomic regions that remain difficult for short-read and alignment-based methods.
The document discusses the human reference genome assembly GRCh38 and how to access and utilize its data. It provides an overview of assembly basics, describes how GRCh38 represents haplotypes and alternative loci. Key statistics about GRCh38 such as improved contiguity, novel sequence, and updated annotations are highlighted. Resources for accessing GRCh38 data from the Genome Reference Consortium website and other sources are also reviewed.
The summary is as follows:
1) Creating reference-grade human genome assemblies is an ongoing process as technologies improve and additional samples are sequenced to better represent global genetic diversity.
2) New long-read sequencing technologies have enabled improved assemblies of genomes like CHM1 that resolve structural variants and fill gaps compared to the current reference.
3) Additional "gold standard" genomes from diverse populations are being sequenced, assembled and improved to provide more representative references.
This document discusses the Genome in a Bottle Consortium's efforts to develop reference materials and data to evaluate whole genome sequencing performance. It summarizes the release of new reference materials, including additional Genome in a Bottle samples from the Personal Genome Project and microbial genomic DNA standards. The consortium aims to apply principles of metrology to genome analysis by generating extensively characterized reference genomes and associated data that can be used to develop and validate analysis methods.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...Advanced-Concepts-Team
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of Moon and artificial
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfSelcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...Scintica Instrumentation
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
4. is a complete,
open source solution
for graph construction,
read mapping,
and variant calling.
https://github.com/vgteam/vg
4
5. Garrison et al (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology.
Better read mapping in regions with variants
5
6. Better allele balances at heterozygous sites
Deletion Size Insertion Size
SNP
AltAlleleFraction
Garrison et al (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology.
6
8. Graph-based SV genotypers:
vg, Paragraph, BayesTyper
Traditional SV genotypers:
Delly, SVTyper
Genotyping SVs from
short-read data
High-quality SV catalogs from
long-read sequencing studies
(HGSVC 2019, GIAB 2019, SVPOP 2019).
Hickey et al (2019). Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv.
8
9. Graph from de novo assemblies
Experiment with 12 yeast strains.
- better read mapping.
- SV better supported by reads.
Hickey et al (2019). Genotyping structural variants in pangenome graphs using the vg toolkit. bioRxiv.
9
12. New graph formats optimized for memory and speed
VG: Legacy
PG (PackedGraph): Small
HG (HashGraph): Fast
OG (Optimized Dynamic Graph Index):
Balanced
XG: Small and fast, but static
Jordan Eizenga, Emily Fujimoto, Erik Garrison
https://github.com/vgteam/libbdsg
12
13. 13
Graph Burrows-Wheeler Transform
Stores 2,504 haplotypes from 1K Genomes Project in 14.6 GiB
Can scale past tens of thousands of haplotypes
Sirén et al. (2018). Haplotype-aware graph indexes. In 18th International Workshop on Algorithms in Bioinformatics (WABI)
https://github.com/jltsiren/gbwtgraph
14. Faster short read mapping with giraffe
Xian Chang, Jouni Siren, Adam Novak
Assuming that most indels are in the graph and a large
haplotype database.
- Minimizer-based indexing.
- Restrict to known haplotypes.
- Gapless extension.
- Faster clustering using a snarl tree.
14
16. Pipelines in Toil and WDL
https://github.com/vgteam/toil-vg
https://github.com/vgteam/vg_wdl
toil-vg WDL
Graph construction x
Read mapping x x
Variant calling( SNV/indels & SVs) x x
Charles Markello, Glenn Hickey, Mike Lin, Eric T. Dawson
16
17. Acknowledgements
Benedict Paten
Adam Novak
Erik Garrison
Jordan Eizenga
Glenn Hickey
Jouni Siren
David Heller
Check out Charles Markello’ Platform Talk (PgmNr 15) on applying vg for
rare variant discovery!
https://github.com/vgteam/vg
17
Jonas Sibbesen
Xian Chang
Charles Markello
Yohei Rosen
Robin Rounthwaite
Susanna Morin
Emily Fujimoto
Eric Dawson
Mike Lin
Wolfgang Beyer
Richard Durbin
Daniel Zerbino