This document summarizes the progress of sequencing the Medicago truncatula genome. Key points include:
- M. truncatula is an important forage crop and model legume with a relatively small genome that is being sequenced to further legume research.
- Initial whole genome shotgun sequencing at low coverage identified highly repetitive regions. A mapped BAC approach is now being used.
- Over 1,000 BACs have been sequenced with a goal of completing the euchromatic regions of four chromosomes. Other chromosomes are being sequenced at other institutions.
- Gene predictions from the sequenced data indicate a density of about one gene per 6.5-7.6 kb, with over 13,000 genes identified so far.
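The gene density in the last point is quoted as kilobases per gene. As a quick unit conversion, the sketch below restates it as genes per megabase. The function name `genes_per_mb` is hypothetical, and the only inputs are the 6.5 and 7.6 kb values quoted in the summary.

```python
def genes_per_mb(kb_per_gene: float) -> float:
    """Convert 'one gene per N kb' into genes per megabase (1 Mb = 1,000 kb)."""
    if kb_per_gene <= 0:
        raise ValueError("kb_per_gene must be positive")
    return 1000.0 / kb_per_gene


# The two ends of the range quoted in the summary (values taken from the text).
low_density = genes_per_mb(7.6)   # sparser end: one gene per 7.6 kb
high_density = genes_per_mb(6.5)  # denser end: one gene per 6.5 kb

print(f"{low_density:.1f} to {high_density:.1f} genes per Mb")
```

Under these figures the range works out to roughly 132 to 154 genes per megabase of sequenced euchromatin.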
1. Single-cell RNA sequencing was performed on hematopoietic stem cells isolated from myelodysplastic syndrome patients and normal individuals to characterize heterogeneity. Cells were collected before and after treatment with decitabine from responders and non-responders.
2. Differential expression analysis identified genes dysregulated in MDS compared to normal, including pathways involved in hematopoiesis. Clusters of patients were identified based on expression of hematopoietic stem cell signature genes.
3. The study aims to understand heterogeneity in MDS, factors influencing response to therapy, and disease progression by characterizing gene expression profiles at the single-cell level. This may help identify new therapeutic targets.
1. The document describes a method called Anchored Assembly for detecting structural variants from short-read sequencing data using read overlap assembly and reference removal.
2. The method was benchmarked against other SV detection tools using SVs validated by fosmid/PacBio sequencing, and it detected 15 previously undetected SVs with high sensitivity and specificity.
3. Examples are given of validated deletions and insertions detected in an Ashkenazi Jewish trio that were identical in the offspring and followed expected inheritance patterns from parents.
1. Variation in the genome of the fungal wheat pathogen Zymoseptoria tritici facilitates rapid evolution through mechanisms like gaining virulence mutations, chromosomal rearrangements that result in gene loss or gain, and transposable element activity providing a source of evolutionary novelty.
2. Analysis of multiple Z. tritici genomes revealed a large flexible pan-genome with a small conserved core and many lineage-specific genes, facilitating adaptation to different wheat cultivars and environments. Recent losses of core genes were enriched for secreted effectors.
3. Signatures of recent strong positive selection were detected in pathogen populations, indicating adaptive evolution in response to pressures like new resistant wheat cultivars.
RefSeq curation in depth. Examples of targeted transcript and protein curation, presented at the 8th International Biocuration conference (April 2015).
This document summarizes a presentation given by Luke Hickey of Pacific Biosciences on human genome sequencing using PacBio systems. It discusses PacBio sequencing technology developments, sequencing and assembly of the NA12878 genome, and the role of the NIST Genome in a Bottle (GIAB) reference materials. Specifically, it notes that PacBio sequenced the GIAB Ashkenazim trio genomes to high coverage and made the data publicly available. The sequencing and assembly of these genomes helps validate and improve PacBio sequencing technologies and supports the development and release of the trio as new NIST reference materials.
Advances in genetics. The usefulness of NGS and bioinformatics. (BBK Innova Sarea)
27 October 2014. Presentation by Pablo Lapunzina, Director of the Instituto de Medicina Genética Médica y Molecular (INGEMM), of IDIPAZ and of CEBERER, at the event "Jornada Avances en Genética y Tecnología Social. La experiencia de la Fundación Síndrome de Dravet".
The document discusses curating sequence and literature data for RefSeq and Gene at the National Center for Biotechnology Information. It provides an overview of RefSeq, describing what RefSeq is, how it compares to GenBank, its advantages, and how the RefSeq dataset is built through curated data and sequence analysis. It then discusses the curation process in depth, including examples of curating genes, transcripts, proteins, and literature. It also describes the tools and quality assurance checks used in curation.
Next Generation Sequencing and its Applications in Medical Research - Frances... (Sri Ambati)
The so-called “next-generation” sequencing (NGS) technologies allow us to sequence massive amounts of DNA quickly and in parallel, overcoming the limitations of the original Sanger sequencing methods used to sequence the first human genome. NGS technologies have had an enormous impact on biomedical research within a short time frame. This talk will give an overview of these applications with specific examples from Mendelian genomics and cancer research.
Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi... (shabeel pn)
This document discusses genetic and molecular characterization of Actinobacillus actinomycetemcomitans (A.a.), a dental pathogen, using genomic approaches. Key points include:
1) A.a.'s genome has been sequenced which will help study its iron acquisition systems, Fur and iron regulons, and virulence factors.
2) In a rat model used to study A.a. pathogenesis, infection induced colonization, an immune response, and bone loss similar to those seen in human infections.
3) Future studies aim to use genomics and DNA microarrays to better understand A.a. biology, host-pathogen interactions, and develop new therapies.
The document discusses bioinformatics at the International Institute of Tropical Agriculture (IITA). It defines bioinformatics and describes the large amount of biological data that is now available, including sequences, structures, and expressions. It then outlines IITA's bioinformatics resources, including personnel, computing hardware, and software tools for analyzing different types of biological data like genotyping by sequencing (GBS) and RNA sequencing. Specific IITA projects involving the analysis of cassava, yam, and other crop sequencing data are also mentioned.
Genome sequencing and the development of our current information library (ZarlishAttique1)
This document provides information about genome projects and the development of current information libraries. It discusses different types of genome projects conducted on organisms from all domains of life. These include projects on humans, plants, animals, fungi, bacteria, archaea, and viruses. It also describes the methods used in genome projects, such as genome assembly, annotation, and high-throughput sequencing techniques including de novo sequencing and resequencing. Genome annotation methods and tools are also outlined. The document concludes by noting the tremendous progress made in high-throughput sequencing capabilities, allowing for rapid sequencing of many genomes.
This document summarizes epigenetics quality control (EpiQC) efforts for several reference epigenome samples, including various epigenetic assays and datasets that have been or will be generated. It discusses completed whole-genome bisulfite sequencing, methylation array, and single molecule real-time sequencing data for several samples, as well as planned additional assays including oxidative bisulfite sequencing and single-cell reduced representation bisulfite sequencing. The goal is to establish high-quality reference epigenomes for use in assay development and validation through a collaborative consortium.
This lecture introduces next-generation sequencing and its applications in biomedical research. It discusses how next-gen sequencing is transforming genetic disease diagnosis and personalized medicine. The lecture covers sequencing workflows including read alignment, variant calling, and annotation. It also describes different sequencing experiments like whole genome, exome, RNA-seq, and ChIP-seq. Finally, it discusses how next-gen sequencing is advancing research into genetic diseases and cancer genomics.
The document describes BioNano Genomics' Irys system for generating genome maps using single molecule imaging. The Irys system labels sites in native genomic DNA, linearizes and images the molecules to create digital maps over 100kb in length. These maps can then be assembled into consensus maps over 30Mb long and used for structural variation detection, genome finishing by aligning sequencing data, and validation of genome assemblies. Examples are provided analyzing data from the NIST GIAB trio to validate structural variants and correct conflicts between sequencing and genetic maps.
The document describes a comparative analysis of human chromosome 22q11.1-q12.3 and syntenic regions in chimpanzee, baboon, bovine, mouse, pufferfish and zebrafish genomes. It finds that while the human and chimpanzee genomes are about 98% identical, differences increase in other species like baboons at 92% and mice at 90%. Various deletions, duplications and other structural variations are observed between species. About half of predicted human genes on chromosome 22 have been experimentally validated.
A plant genome project aims to discover all genes and their function in a particular plant species.
The main objective of genomic research in any species is to sequence the whole genome and determine the functions of all the coding and non-coding sequences.
These techniques helped in the preparation of molecular maps of many plant genomes.
Plant genome projects initially focused on a few model organisms characterized by small genomes or by their amenability to genetic studies.
As sequencing technologies have advanced, sequencing costs have dropped, and bioinformatics tools have improved, the genomes of many plant species, including the enormous genome of bread wheat, have been assembled.
Genome sequencing projects have been carried out on all three plant genomes: the nuclear, chloroplast and mitochondrial genomes.
This has opened avenues for advanced molecular breeding and manipulation of plant species, and has also accelerated phylogenetic studies among species.
Several excellent curated plant genome databases, in addition to the general nucleotide database archives, provide public access to plant genomes.
This presentation highlights the basics and applications of genome editing strategies in plants, strategies to reduce off-target mutations, the identification and analysis of mutants, etc.
This document summarizes a presentation on characterizing extreme diversity in the human genome using a single haplotype genomic resource called CHM1. The presentation discusses how CHM1, which is a hydatidiform mole genome, provides a highly contiguous single haplotype representation of the genome that can help identify misassemblies in the current reference genome and regions with high genetic variation. It also describes how finishing additional diverse genomes and incorporating them into a population reference graph could help make the reference more representative of human genetic diversity.
The document provides an overview of the COLO829 datasets available for precision medicine research from the USC Translational Genomics lab. The lab conducts applied genomics and bioinformatics research for precision medicine in areas such as rare childhood disorders, cancer genetics, and neurogenomics. It coordinates high-throughput sequencing, data analysis, and clinical trial execution to enable molecular characterization and therapy selection. The datasets include exome, transcriptome, and structural variant data on cell lines and patient samples, stored in databases such as dbGaP and made available to collaborators for studies in precision oncology and rare diseases.
The document summarizes the rice genome sequencing project. The International Rice Genome Sequencing Project was established in 1997 with the goal of sequencing the rice genome through international collaboration. Representatives from 11 countries were assigned different rice chromosomes to sequence. The project found that the rice genome is 420 Mb in size and contains 37,544 protein-coding genes, more than previously estimated for rice. Understanding the rice genome sequence could help improve rice breeding, increase crop yields, and develop disease-resistant rice varieties.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle. (Jennifer Shelton)
This document summarizes Dr. Tara N. Marriage's RNA-Seq analysis of the cell cycle transcriptome in the multicellular alga Gonium pectorale. RNA was extracted from G. pectorale cells collected hourly across a 24 hour period and pooled into time points corresponding to different cell cycle phases. RNA-Seq libraries were constructed and sequenced, and the reads were mapped and analyzed for differential gene expression. Preliminary results identified over 2400 differentially expressed genes across the cell cycle and hierarchical clustering of expression profiles. Several key cell cycle genes were found to be differentially expressed during mitosis. The analysis is ongoing to further investigate cell cycle regulation and changes contributing to multicellularity in Gonium compared
This document provides an overview of rice genomics. It discusses the history of genomics from the 1980s development of DNA markers and PCR, to major milestones like the sequencing of rice genomes in 2002. It describes the International Rice Genome Sequencing Project's clone-by-clone sequencing approach. The rice genome was found to contain over 37,000 genes and significant repetitive elements. Comparative genomics with other cereals revealed conserved synteny. The 3,000 Rice Genomes Project aims to sequence a diverse set of rice varieties to explore genetic diversity.
RNA-Seq Analysis of Blueberry Fruit Development and Ripening (Ann Loraine)
This document summarizes an RNA-Seq analysis of blueberry fruit development and ripening. Researchers sequenced RNA from five stages of fruit development to generate over 20 million reads per sample. Reads were aligned to the blueberry genome assembly to identify over 50,000 expressed genes and their expression profiles across stages. Analysis identified thousands of differentially expressed genes between stages and clusters of genes with similar expression patterns. Pathway analysis revealed metabolic pathways active during fruit development, including a potential new pathway for bixin biosynthesis with high expression during fruit maturation. Resources from the project include an online blueberry browser and gene expression data.
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri... (Ann Loraine)
I presented these slides at the Plant Metabolic Network workshop held at the Plant Animal Genome Conference (PAG) XXII, January, 2014. The main goals of the talk were to describe RNA-Seq based annotation of a blueberry genome assembly and explain how we used PlantCyc enzyme data to associate blueberry genes with metabolic pathways.
Next-generation sequencing technologies have rapidly advanced since 2005. Key developments include massively parallel sequencing reactions that enabled sequencing of entire human genomes for less than $1,000 by 2015. While Illumina dominates the market, other platforms like Ion Torrent and PacBio are increasing capabilities. Routine human whole genome sequencing is now used in research and medicine, enabling new opportunities like liquid biopsies and single-cell analysis. However, data storage and analysis remain challenges due to the large volumes of sequencing data.
Institute of Learning in Retirement - Miami University (Ohio) (Andor Kiss)
CRISPR/Cas9 is a new genetic engineering technique that uses a bacterial immune system to edit DNA. It involves using an RNA guide sequence and Cas9 protein to cut DNA at a targeted location. This allows genes to be knocked out or altered. CRISPR has many advantages over older techniques and has greatly improved efficiency of genetic engineering. However, it also raises ethical concerns about its applications, including the first reported use in human embryos.
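The targeting step described above can be illustrated with a small sketch that scans a DNA sequence for candidate Cas9 sites. It rests on assumptions the summary does not state: the commonly used Streptococcus pyogenes Cas9, a 20-nucleotide guide, and an NGG PAM immediately 3' of the target. It searches the forward strand only, and the function name `find_cas9_targets` is hypothetical.

```python
import re


def find_cas9_targets(seq: str, guide_len: int = 20):
    """Return (start, protospacer, pam) for each forward-strand candidate site.

    Assumes an SpCas9-style NGG PAM directly after the protospacer; the
    reverse strand and off-target scoring are deliberately left out.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up example sequence, for illustration only.
example = "ACGT" * 5 + "AGGCC"
for start, protospacer, pam in find_cas9_targets(example):
    print(start, protospacer, pam)
```

A real guide-design workflow would also scan the reverse complement and rank candidates by predicted off-target activity; this sketch only shows how a guide sequence and an adjacent PAM together define where the cut is directed.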
Overview on arabidopsis and rice genome (Gopal Singh)
This document summarizes the sequencing of the Arabidopsis and rice genomes. It describes that Arabidopsis was the first plant and third multicellular organism to have its genome sequenced, which was completed in 2000 through an international collaboration. The rice genome sequencing project began in 1997 and was completed in 2005, providing a 389Mb sequence with 95% accuracy. Both projects used BAC and PAC libraries to sequence the genomes. The Arabidopsis genome is 115Mb across 5 chromosomes, while the rice genome is larger at 400-430Mb across 12 chromosomes.
Presentation by Justin Zook at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on benchmarks for indels and structural variants.
Presentation by Justin Zook at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on benchmarks for indels and structural variants.
Robert Hunt works a dreary job for the Department of Information Technology and Telecommunications in New York. After a disturbing dream about his deceased wife, Robert encounters his old friend John at a bar. Robert later overhears classified government transmissions at work and traces the call, discovering a corrupt scheme. Frightened, Robert steals money and flees to John for help. John agrees to assist Robert in exchange for a cut of the stolen funds. Meanwhile, the police raid Robert's apartment searching for clues to his whereabouts.
The company was facing increasing patient acquisition costs from traditional media marketing. They implemented a new digital marketing strategy focusing on search, display, and social media advertising. This led to an 82% increase in digitally acquired patient inquiries and an 18% decrease in the total cost per patient acquisition. The outcome was both lower acquisition costs and higher patient volumes, improving the financial performance and market position of the company.
The document summarizes the history of construction and renovations of the Church of Cabanillas. It describes how the church was initially built between 1581-1587 with three naves and chapels, using stones from the nearby river. Major renovations occurred in 1609 when the tower's lightning rod was replaced, in 1626 when the first bell was added, and in 1676 when the main chapel was built. Throughout the 17th-18th centuries, several consecutive works were needed to repair the church. The most recent renovations started in 1994 to repair the roof, tower, and interior, with work continuing into the 2010s.
The medical provider wanted to increase patient volume and conversion rates. Their current marketing tactics were acquiring non-converting inquiries, leading to a drop in inquiries and patients. Market research identified demographic segments with higher conversion rates. The new tactical program targeted these better converting segments through customized creative assets, improving the overall conversion rate from inquiry to patient by 500 basis points. This resulted in a 29% higher concentration of patients most likely to convert to treatment.
1. Atahualpa, the Inca ruler, was lured into a trap by Francisco Pizarro and kidnapped by Spanish forces in Cajamarca.
2. When Atahualpa refused to convert to Christianity, the Spanish opened fire and massacred thousands of Inca people.
3. To secure his release, Atahualpa offered to fill a room with gold and two rooms with silver, which the Spanish accepted. However, after receiving the treasure, the Spanish executed Atahualpa in 1533 for alleged crimes against the Inca people.
Francisco Pizarro invited the Inca ruler Atahualpa to meet with him in Cajamarca. It was a trap, and when Atahualpa arrived with thousands of men, the Spanish launched a surprise attack, killing many Incas. Atahualpa was taken prisoner. In exchange for his freedom, Atahualpa offered to fill a room with gold and two rooms with silver. However, after the treasure was delivered, the Spanish accused Atahualpa of various crimes and executed him on July 26, 1533, despite his ransom offer.
The document provides details about the mise en scene of a scene set in an undiscovered cave. It summarizes that the location is a pitch black underground cave in America that was filmed in Scotland. Costumes are normal day clothes since the characters are caving, while the "crawlers" have prosthetic makeup and no clothes. Various props are used to make the setting feel realistic, like bones, cameras, and climbing equipment. Lighting includes flares, headlamps, and night vision to see in the dark cave. The movement of the fast crawlers and actors' quick reactions to scares were intended to grab audience attention.
Valve Corporation publishes its own games through its digital distribution platform Steam. It was founded in 1996 by Gabe Newell and Mike Harrington after leaving Microsoft. Notable games include Half-Life, Portal, and Counter-Strike. Valve focuses on PC gaming and emphasizes creative freedom for employees without formal managers or directors. It uses proprietary Source engine software as well as filmmaker and Havok middleware to develop high-quality games across multiple genres.
Market Study and Feasibility of Amphibious Vehicles in GoaGaurav Sharma
This document presents the findings of a feasibility study for introducing amphibious vehicles in Goa, India. It analyzes the market opportunity through frameworks like Go-To-Market and Porter's Five Forces. Key findings include that the tourism industry in Goa has been growing, with over 2 million tourists annually. Market segmentation identified families and young travelers as potential early adopters. A SWOT analysis found strengths in the unique experience but weaknesses around regulations. Recommendations include starting with a small pilot and focusing marketing on experience seekers looking for something new. Limitations around regulations and seasonality are noted.
The document provides instructions for conducting a skin test prior to applying makeup for a special effect. It instructs the person to test only the materials they plan to use in a well-ventilated area by the sink. They should apply the test to the inside of the wrist and wait a few minutes to check for irritation, washing immediately with soap and water if any occurs. It also has sections to provide personal details and describe the intended makeup effect and location on the body.
The document discusses a budget. It appears to be written by Airidas Cironka and focuses on financial planning and allocation of funds. In a concise manner, the author outlines considerations for income, expenses, and savings over a set period of time.
Una ciudad inteligente se caracteriza por el uso intensivo de las tecnologías de la información y la comunicación para crear y mejorar los sistemas de la ciudad, con el fin de aumentar la eficiencia en el uso de recursos y mejorar la calidad de vida de los habitantes. La inteligencia de la ciudad se refleja en aspectos como la educación, salud, transporte y seguridad, los cuales pueden comportarse de forma más eficiente a través del uso de las TIC. Algunas ideas principales de las ciudades inteligentes incluyen cuestiones ambient
Communiqués de presse, poster, liste des publications. Cellule d'ingénierie des connaissances et d'assistance à la publication scientifique (CICAP). Unité mixte de recherche 1347 Agroécologie, INRA, Dijon, France.
This document provides an overview of the field of bioinformatics. It discusses that bioinformatics is the analysis of biological information using computers and statistical techniques, and involves organizing, storing, analyzing and visualizing genomic data. It also discusses various databases used in bioinformatics, including nucleotide sequence databases like GenBank, protein sequence databases like Swiss-Prot, structure databases like PDB, and species-oriented databases. Examples of analyzing genomic sequences, predicting protein structures, and correlating gene expression and disease are also provided.
Johannes Bergsten lecture on Thursday, Sept 17, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyShaojun Xie
1. Researchers at DOE JGI sequenced the genome of switchgrass (Panicum virgatum) to support the development of cellulosic biofuels.
2. They produced an initial draft genome assembly (v0.0) using 454 sequencing but it was fragmented.
3. To improve the assembly, they developed a genetic map of switchgrass using 250 offspring from a cross between cultivars AP13 and VS16.
4. They used the genetic map to order and orient contigs into chromosomes, producing version 1.0 of the switchgrass reference genome.
The word clone has been extensively used to indicate the product of recombinant DNA technology that allows geneticist to create identical copies of a DNA fragment, more often alluded to as gene. In practice, the procedure is carried out by inserting a fragment of desired DNA into another DNA molecule, a vector, and allowing this chimeric molecule to replicate inside a fast replicating living cell such as bacterium.
This document describes a DNA sequencing process. It begins with DNA extraction from an insect sample, followed by PCR and gel electrophoresis to amplify and isolate the target DNA fragment. The DNA is then sequenced using the dideoxy sequencing method. The sequenced DNA can be used for tasks like identifying the insect species, performing forensics analysis, or providing genetic information for medical insurance purposes. Bioinformatics tools are used to analyze the sequenced DNA data.
The document discusses microarray studies to analyze gene expression. It provides background on the history and evolution of cDNA microarrays. It then describes the basic process of building microarray chips, preparing RNA samples, hybridizing chips, and analyzing data. Specific details are given on preparing probes, synthesizing oligonucleotides, making microarray chips in the lab, obtaining tissue samples, and analyzing emission from hybridized chips. The overall aim is to induce rupture of atherosclerotic plaques in mice and use microarrays to find genes expressed in ruptured plaques by comparing results to histopathology. Various drugs being tested to induce rupture are also listed.
Genomics is the study of genomes through sequencing and analysis. Key points:
- Genomics involves mapping and sequencing genomes to understand genes and how they function. It uses techniques from genetics and molecular biology.
- The human genome contains 23 chromosome pairs and around 24,000 genes. Genomics aims to sequence whole genomes and analyze gene function.
- Early developments included identifying DNA's structure in 1953 and sequencing the first genome in the 1970s. The Human Genome Project aimed to map the entire human genome between 1990-2003.
- Genomics has applications in medicine like gene therapy for genetic diseases and in understanding health, disease, and drug responses through analysis of genetic variations.
This document summarizes Sujai Kumar's presentation on insights into planarian regeneration from sequencing the genome of Girardia tigrina. Key points include:
1) G. tigrina is a freshwater planarian species with remarkable regenerative abilities. Sequencing its genome provides resources to study the genetic basis of regeneration.
2) Kumar's team generated a draft genome assembly, predicted genes, and performed functional annotation to create genomic resources for G. tigrina.
3) These resources are being used to identify genes and pathways involved in regeneration, perform orthology analysis across species, and study cis-regulatory elements in regeneration.
126 micro array study for gene expressionSHAPE Society
The document describes the process of performing a microarray study to analyze gene expression. It discusses the history of microarrays and the overall process which includes building the chip by amplifying cDNA clones via PCR, preparing RNA by isolating it from cell cultures, hybridizing the array by labeling probes with fluorescent dyes and incubating RNA samples on the chip, and analyzing the data by quantifying fluorescence levels. The goal is to induce rupture of atherosclerotic plaques in mice and use microarrays to identify genes expressed in ruptured versus stable plaques. Various drugs will be tested for their ability to rupture plaques by altering hemodynamics and oxidative stress.
The document provides information about genomics and the Human Genome Project. It defines genomics as the study of the structure and function of entire genomes. It describes the goals of the Human Genome Project as identifying all human genes, determining DNA sequences, and making the data publicly available. Sequencing techniques used include shotgun sequencing and Sanger sequencing. The document also discusses how DNA is amplified and prepared for sequencing.
The Genome in a Bottle Consortium is developing reference materials, reference methods, and reference data to assess confidence in human whole genome variant calls. The Consortium is characterizing several human genomes including the NA12878 genome, an Ashkenazi Jewish trio, and a Chinese trio from the Personal Genome Project. Data generated for these genomes includes various sequencing technologies from Illumina, Complete Genomics, PacBio, BioNano, and others. The Consortium is developing high-confidence variant calls for SNPs, indels, structural variants, and phasing. Individual datasets and integrated variant calls will be made publicly available on the GIAB FTP site.
Role of bioinformatics in life sciences researchAnshika Bansal
1. The document discusses bioinformatics and summarizes some of its key applications and tools. It describes how bioinformatics merges biology and computer science to solve biological problems by applying computational tools to molecular data.
2. It provides examples of common bioinformatics tasks like retrieving sequences from databases, comparing sequences, analyzing genes and proteins, and viewing 3D structures.
3. The document lists several popular databases for nucleotide sequences, protein sequences, literature, and other biological data. It also introduces common bioinformatics tools for tasks like sequence alignment, translation, and structure analysis.
This document outlines a DNA barcoding protocol for Census of Marine Life (CoML) investigators to determine DNA barcodes from collected specimens. The protocol recommends preserving specimens in 95% ethanol, amplifying and sequencing the cytochrome c oxidase subunit I (COI) gene as the primary barcode marker, and submitting sequences to public databases linked to specimen data. Alternate targets may be needed for some taxa. The goal is to provide a uniform method for species identification that will aid CoML research and have broader scientific applications.
whole genome analysis
history
needs
steps involved
human genome data
NGS
pyrosequencing
illumina
SOLiD
Ion torrent
PacBio
applications
problems
benefits
The document discusses microarray studies to analyze gene expression. It describes the history of microarrays and the current process which involves building chips by amplifying cDNA clones via PCR, printing the clones on slides, preparing RNA by isolating it from cell cultures, producing cDNA, labeling samples, hybridizing the chip, and analyzing data. The aim is to induce rupture of atherosclerotic plaque in mice using drugs to alter gene expression and identify genes associated with ruptured plaques. The process will involve designing probes, synthesizing cDNA, obtaining tissue samples, and analyzing hybridized chips to find intensity differences in genes.
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
06.09.15
Invited Talk
2006 Synthetic Biology Symposium
Aliso Creek Inn
Title: Building a Community Cyberinfrastructure to Support Marine Microbial Ecology Metagenomics
Laguna Beach, CA
The document describes a seminar on high-throughput sequencing bioinformatics. It discusses analyzing microbiome samples using 16S rRNA sequencing and tools like Mothur and QIIME. It provides an overview of analyzing 16S sequences, including quality filtering, OTU clustering, classification, and diversity analysis. It also outlines running a Mothur tutorial to analyze a mock microbiome dataset from 21 samples using the Mothur MiSeq standard operating procedure.
1. The Medicago truncatula genome: a progress report
Dr. Bruce A. Roe
Advanced Center for Genome Technology
Department of Chemistry and Biochemistry
University of Oklahoma
broe@ou.edu www.genome.ou.edu
Plant and Animal Genome
San Diego, January 11, 2004
Photos by Steve Hughes, Genetic Resource Centre (PIRSA-SARDI), Adelaide, Australia.
http://www.fao.org/ag/AGP/AGPC/doc/gallery/pictures/meditrunc/meditrunc.htm
2. Why sequence the Medicago genome?
• An important forage crop
• A genetically tractable model legume
• A relatively small (~500 Mbp) diploid genome
• Active legume research community
• Medicago Research Consortium
• Large collection of ESTs
• Excellent BAC library
• Integrated physical and genetic map
• Large number of BAC-end sequences
3. Sequence Pipeline at the University of Oklahoma Genome Center, OU-ACGT
From DNA to GenBank:
• DNA shearing (Hydroshear™)
• Colony picking (QPixII™)
• Growing subclones (HiGro™)
• Subclone isolation I (Mini-Staccato™)
• Subclone isolation II (VPrep™)
• Thermocycling (ABI 9700)
• Sequencing (ABI 3700)
• Data assembly and analysis
• Closure
Supporting steps:
• Primer synthesis
• Miscellaneous liquid handling
4. Subclone Isolation (Mini-Staccato™)
• This Zymark robot has a 384-cannula array, four built-in shakers, three attached storage racks, built-in barcoding and a Twister II robotic arm.
• This automation has allowed us to perform the DNA isolation completely unattended from as many as eighty 384-well plates of bacterial cells per
5. Subclone Isolation (Mini-Staccato™)
• Once all three solutions have been added, the plates are transferred from the SciClone workspace deck to a storage rack by the Twister II robotic arm.
6. Subclone Isolation and Sequencing Reaction Pipetting (Velocity 11 VPrep)
• Liquid handling station with 384-channel pipettor head
• Four movable shelves on either side of the pipettor head
• Used for subclone isolation, sequencing reaction set-up and clean-up.
7. Data assembly and Analysis
• Sun V880 server: 32 GB RAM running Solaris 8 OS and 3 TB of data stored on RAID-5 arrays with autoloader tape backup
• Also: 12 workstations each with 1 GB RAM
• Software: Phred/Phrap/Consed, Exgap
8. Initial WGS Skimming for ~500 Mb Medicago truncatula genome
• Collected ~25,000 end-sequences from ~12,500 plasmid-based WGS clones.
• Of these ~25,000 sequences, ~1,000 have homology with Medicago truncatula ESTs.
• URL: http://www.genome.ou.edu/medicago.html
9. Phrap assembly of our Medicago truncatula whole genome shotgun survey sequencing data at 0.005-fold genomic sequence coverage
10. DotPlot of a Phrap assembled whole genome shotgun contig showing multiple repeated regions
[Dot plot: both axes in bases, 0 to 700]
11. DotPlot of a Phrap assembled whole genome shotgun contig showing 4 repeated blocks of ~600 bases
[Dot plot: both axes in bases, 0 to 1000]
12. Yet another genomic contig showing extensive repeated regions
Contig 1931
[Dot plot: both axes in bases, 0 to 600]
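The dot plots on slides 10 to 12 compare a contig against itself, so repeats show up as runs of dots off the main diagonal. The slides do not say which program drew them. What follows is only a minimal sketch of the general idea, assuming an exact k-mer match as the dot criterion; the function names, the word size `k=8` and the 20-base toy sequence are all hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def self_dotplot(seq, k=8):
    """Return sorted (i, j) pairs with i < j where the k-mer at i recurs at j.

    Each pair is one off-diagonal dot in a self-versus-self dot plot.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def repeat_offsets(dots):
    """Count dots per diagonal offset (j - i).

    Many dots sharing one offset form a line parallel to the main diagonal,
    which is how a repeat with that spacing appears in the plot.
    """
    counts = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return dict(counts)


# Hypothetical toy input: four tandem copies of a 20-base unit.
unit = "ACGTTGCAAGCTTAGGCTAA"
toy_contig = unit * 4

dots = self_dotplot(toy_contig, k=8)
offsets = repeat_offsets(dots)
print(sorted(offsets))  # diagonal offsets, all multiples of the unit length
```

For a real contig, exact k-mer matching would miss diverged repeat copies, so a tool that tolerates mismatches would likely be needed; the sketch only shows why tandem or dispersed repeats produce parallel diagonals.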
14. Summary of our Medicago truncatula WGS Sequencing Assembly with only 0.005-fold Genomic Sequence Coverage
• The largest contig (21,157 bp) contained the 26S rRNA genes
• 19 smaller contigs (105,455 bp total) were from the chloroplast genome
• The remaining ~500 contigs, ranging in size from 2,000 to 12,000 bp, contained highly repetitive DNA that was unique to Medicago, as it had no significant homology in the GenBank database
• We concluded that a more directed strategy was needed
15. Mapped BAC approach in collaboration with Doug Cook and DJ Kim at U.C. Davis with funding from the Noble Foundation, Ardmore, OK
16. The first ~1000 Medicago truncatula BACs
• Initially concentrated on BACs with known biological markers and in regions of biological interest that were supplied to us by the UC Davis group.
• Requests for sequencing specific BACs were directed to Doug Cook and DJ Kim at UC Davis, and they supplied us with the BACs once these BACs had been characterized.
• Once the BACs were received, we created the shotgun libraries, isolated the sequencing templates and obtained the working draft sequence, followed by closure and finishing.
• All data was made publicly available in GenBank within 24 hours of sequence assembly.
19. The next ~750 Medicago truncatula BACs
• With recent NSF funding, we will be sequencing BACs from chromosomes 1, 4, 6, and 8 with the goal of completing the sequence of the euchromatic regions of these chromosomes over the next 3 years.
• Chromosomes 2 and 7 will be sequenced at TIGR, chromosome 3 at The Sanger Institute and chromosome 5 at Genoscope.
• All data will be released immediately as before.
27. Gene Size Distribution (All Sequence Data) (FgenesH vs. Genscan)
[Bar chart: number of genes (0 to 4,500) by gene size range, in 1,000-base bins from 1-1000 to 19001-20000 plus 20001-above, FgeneSH vs. Genscan]
13,396 FgeneSH predicted genes
11,488 Genscan predicted genes
28. Exon Size Distribution (All Sequence Data) (FgenesH vs. Genscan)
[Bar chart: number of exons (0 to 20,000) by exon size range, from 1-50 to 3501-4000, FgeneSH vs. Genscan]
59,808 FgeneSH predicted exons
55,792 Genscan predicted exons
29. Intron Size Distribution (All Sequence Data) (FgenesH vs. Genscan)
[Bar chart: number of introns (0 to 12,000) by intron size range, from 1-50 to 3501-4000, FgeneSH vs. Genscan]
46,412 FgeneSH predicted introns
44,305 Genscan predicted introns
30. Gene Density of the ~450 Mb Medicago truncatula genome
                                        FgeneSH         Genscan
Total number of genes                   13,397          11,488
Total length of genes (bp)              30,793,326      51,687,528
Total exon length (bp)                  15,794,243      14,400,445
Total number of exons                   59,808          55,792
Total intron length (bp)                14,999,083      37,287,083
Total number of introns                 46,412          44,305
Base pairs sequenced                    87,423,457      87,423,457
Gene space (gene length/bp sequenced)   35%             59%
Gene density (genes/200 Mb)             30,649          26,281
                                        1 gene/6.5 kb   1 gene/7.6 kb
Arabidopsis: 25,498 protein coding genes
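The derived rows of the slide 30 table (gene space, kb per gene, genes per 200 Mb) follow from the raw totals by simple division. The sketch below recomputes them from the numbers printed on the slide as a consistency check. The dictionary layout and the function name `summarize` are illustrative assumptions, and the 200 Mb scaling factor is simply the one the slide uses.

```python
BP_SEQUENCED = 87_423_457  # "Base Pairs Sequenced" on slide 30

# Raw totals copied from the slide; the dictionary layout is illustrative.
predictions = {
    "FgeneSH": {
        "genes": 13_397,
        "gene_length": 30_793_326,
        "exon_length": 15_794_243,
        "intron_length": 14_999_083,
    },
    "Genscan": {
        "genes": 11_488,
        "gene_length": 51_687_528,
        "exon_length": 14_400_445,
        "intron_length": 37_287_083,
    },
}


def summarize(stats, bp_sequenced=BP_SEQUENCED, scale_bp=200_000_000):
    """Recompute the derived rows of the table from the raw totals."""
    return {
        # Gene space: share of sequenced bases that lie inside predicted genes.
        "gene_space_pct": round(100 * stats["gene_length"] / bp_sequenced),
        # Average spacing: sequenced kilobases per predicted gene.
        "kb_per_gene": round(bp_sequenced / stats["genes"] / 1000, 1),
        # Linear extrapolation of the gene count to scale_bp bases.
        "genes_per_scale": round(stats["genes"] * scale_bp / bp_sequenced),
    }


summary = {name: summarize(stats) for name, stats in predictions.items()}
for name, row in summary.items():
    print(name, row)
```

Both predictors' exon and intron totals add up exactly to their total gene lengths. The extrapolation is linear, so it assumes the sequenced BACs are representative of whatever 200 Mb region the slide has in mind.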
32. Metabolic Overview of Medicago
13,396 FgeneSH predicted genes using the COG Database
• DNA Metabolism 23%
• Cellular Processes 23%
• Metabolism 24%
• Poorly Characterized 17%
• No Hits 5%
• Multiple COG Hits 8%
33. Metabolic Overview (detailed view) of Medicago
13,396 FgeneSH predicted genes using the COG Database
• No Hits 5%
• Translation, ribosomal structure & biogenesis 7%
• Transcription 5%
• DNA replication, recombination & repair 11%
• Multiple COG Hits 8%
• Poorly Characterized 17%
• Cell division & chromosome partitioning 2%
• Posttranslational modification, protein turnover, chaperones 5%
• Cell envelope biogenesis, outer membrane 4%
• Cell motility & secretion 3%
• Inorganic ion transport & metabolism 3%
• Signal transduction mechanisms 5%
• Energy production & conversion 5%
• Carbohydrate transport & metabolism 4%
• Amino acid transport & metabolism 5%
• Nucleotide transport & metabolism 2%
• Coenzyme metabolism 2%
• Lipid metabolism 2%
• Secondary metabolites biosynthesis, transport & catabolism 3%
35. Gene Duplication: Three copies of phosphoglycerate kinase in one BAC
AC138448.fg.10 MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----
AC138448.fg.11 MA-KKSVGDLSGAELKGKKVFVRADLNVPLDDNQNITDDTRIRAAIPTIKYLIQNGAKVILSSHL-----
AC138448.fg.8 MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT
AC138448.fg.10 ------------------------------------------GRPKGVTPKYSLKPLVPRLSELLGTQVK
AC138448.fg.11 ------------------------------------------GRPKGVTPKYSLAPLVPRLSELIGIEVI
AC138448.fg.8 EVSVSEYNLAVSEYKLAISDTYRYRIRVRHDSSPFLEYRGSQGRPKGVTPKYSLKPLVPRLSELLETQVK
AC138448.fg.10 IADDSIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNDPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.11 KAEDSIGPEVEKLVASLPDGGVLLLENVRFYKEEEKNDPEHAKKLAALADLYVNDAFGTAHRAHASTEGV
AC138448.fg.8 ISDDCIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNEPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.10 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.11 TKYLKPSVAGFLLQKELDYLVGAVSSPKRPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.8 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIYTFYKA
AC138448.fg.10 QGYAVGSSLVEEDKLDLATTLIEKAKAKGVSLLLPTDVVIADKFAADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.11 QGLAVGSSLVEEDKLELATTLIAKAKAKGVSLLLPSDVVIADKFAPDANSQIVPASAIPDGWMGLDIGPD
AC138448.fg.8 QGYSIGSSLVEEDKLDLATSLMEKAKAKGVSLLLPTDVVIADKFSADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.10 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.11 SIKTFNEALDTTQTIIWNGPMGVFEFDKFAVGTESIAKKLADLSGKGVTTIIGGGDSVAAVEKVGVADVM
AC138448.fg.8 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.10 SHISTGGGASLELLEGKPLPGVLALDDA* 401 amino acids
AC138448.fg.11 SHISTGGGASLELLEGKELPGVLALDEATPVAV* 405 amino acids, differs at 42 positions
AC138448.fg.8 SHISTGGGASLELLEGKPLPGVLALDDA* 448 amino acids, differs at 6 positions
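Counts such as "differs at 42 positions" on slide 35 come from comparing aligned sequences column by column. The slide does not state how gap columns were treated, so the sketch below reports substitutions and gap columns separately; the function name `count_differences` is hypothetical. The two input strings are the first 20 alignment columns of AC138448.fg.10 and AC138448.fg.11 as shown on the slide.

```python
def count_differences(aligned_a, aligned_b, gap="-"):
    """Compare two rows of an alignment column by column.

    Returns (substitutions, gap_columns). Columns where both rows carry a
    gap are ignored, since neither sequence has a residue there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    substitutions = 0
    gap_columns = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        if x == gap or y == gap:
            gap_columns += 1
        elif x != y:
            substitutions += 1
    return substitutions, gap_columns


# First 20 alignment columns of two of the three copies on slide 35.
fg10 = "MATKRSVGTLKEAELKGKRV"
fg11 = "MA-KKSVGDLSGAELKGKKV"

result = count_differences(fg10, fg11)
print(result)  # (5, 1): five substitutions and one gap column
```

Applied to the full alignment rows, the same function would give per-pair totals, although whether they match the slide's figures depends on the unstated gap convention.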
36. Printrepeat Analysis of M. truncatula BAC AC121240 vs. A. thaliana Chr.2
Expansion, Duplication, Repeat Elements
~5 kb region
~25 kb region
38. Medicago truncatula Summary and Conclusions
• Average Predicted Gene Density of 1 gene per 6.5 to 7.6 Kb by FgeneSH and Genscan, respectively.
• Genome characteristics such as %GC, intron/exon size and conserved unique 5’ splice sites reveal Medicago characteristics
• The sequence of the Medicago truncatula genome shows homology to the sequenced Arabidopsis thaliana genome but expansion, rearrangements and duplications are evident.
39. Data Release and Preliminary Annotation
• All our sequence data is available through links on our web site to GenBank and on our ftp site at URL: ftp.genome.ou.edu/medicago
• Keyword and BLAST searches can be done on our web site at URL: http://www.genome.ou.edu/medicago.html
• Additional annotation via the Genome Browser database is available on our web site at URL: http://www.genome.ou.edu/medicago_table.html
• E-mail suggestions for additional annotation to Bruce Roe at: broe@ou.edu
40. Three Year Plan
• Obtain the contiguous sequence of the gene-rich regions of four of the 8 Medicago truncatula chromosomes at OU, with the remaining four being completed by our international partners at TIGR, Sanger, and Genoscope.
• This information will serve as a solid foundation for anticipated comparative and functional legume genomics.
41. A
C
G
T
Laboratory OrganizationLaboratory Organization
Bruce Roe, PIBruce Roe, PI
InformaticsInformatics
Support TeamsSupport Teams
ProductionProduction AdministrationAdministration
Jim White
Steve Kenton
Hongshing Lai
Sean Qian
Rose Morales-Diaz*
Mounir Elharam*
Yonas Tesfai
Steve Shaull**
Doug White
Work-study Undergraduates**
Kay Lynn Hale
Dixie Wishnuck
Tami Womack
Mary Catherine Williams
DNA Synthesis
Phoebe Loh*
Sulan Qi
Bart Ford*
Reagents & Equip. Maint.
Mounir Elharam*
Doug White
Axin Hua
Weihong Xu
Jami Milam
Sara Downard**
Limei Yang
Angie Prescott*
Audra Wendt**
Mandi Aycock**
Ziyun Yao
Steve Shaull*
Youngju Yoon
Trang Do
Anh Do
Lily Fu
Yang Ye
James Yu
Tessa Manning**
Fu Ying
Liping Zhou
Ruihua Shi
Junjie Wu
Stephan Deschamps
Shelly Oommen
Christopher Lau
Yanhong Li
Research Teams
Doris Kupfer
Julia Kim*
Sun So
Graham Wiley**
Lauren Ritterhouse**
Lin Song
Ying Ni
Huarong Jiang
ShaoPing Lin
Honggui Jia
Hongming Wu
Baifang Qin
Peng Zhang
Fares Najar
Chunmei Qu
Keqin Wang
Carson Qu
Shuling Li
Funding from the Noble Foundation, DOE, and NSF
Collaborators at Univ. Minnesota, UC Davis, TIGR,
Sanger, Genoscope, and the Noble Foundation
Phoebe Loh**
Sulan Qi
Bart Ford*
* Previous undergraduate research student
** Present undergraduate research student
44. Conserved Intron/Exon Boundary Features by a FELINES** Analysis of 181,444 Medicago truncatula ESTs in GenBank vs Genomic Sequence

           Size Range        Mean Length
Exons      6 - 5,789 nt      268 nt
Introns    20 - 3,921 nt     429 nt

Intron Conserved Splice Site Sequence Elements    Percent
Introns w/ 5’ GU                                  99.21%
Introns w/ 5’ GC                                  0.36%*
Introns w/ 5’ AU                                  0.31%
Introns w/ U12 branch sites instead of A12        0.13%

* Compared to 0.5 - 2.5% in fungi, and 0.5% in mammals with an EST minimum identity of 90%
** S. Drabenstot, D. Kupfer, J. White, D. Dyer, B. Roe, K. Buchanan and J. Murphy. FELINES: A Utility for Extracting and Examining EST-Defined Introns and Exons. Nucleic Acids Research 31(22), e141 (2003).
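As a rough illustration of how splice-site percentages like those above could be tabulated, the following minimal Python sketch classifies introns by their 5’ dinucleotide and reports the size range, mean length, and class percentages. It is not the FELINES implementation; the function names (`classify_5prime`, `summarize`) and the toy sequences are hypothetical and chosen only for this example.

```python
from collections import Counter
from statistics import mean


def classify_5prime(intron: str) -> str:
    """Return the 5' dinucleotide class of an intron: GU, GC, AU, or 'other'.

    DNA input is accepted; T is converted to U so classes match the slide.
    """
    donor = intron[:2].upper().replace("T", "U")
    return donor if donor in {"GU", "GC", "AU"} else "other"


def summarize(introns):
    """Compute size range, mean length, and percent per 5' class."""
    if not introns:
        raise ValueError("at least one intron sequence is required")
    lengths = [len(seq) for seq in introns]
    counts = Counter(classify_5prime(seq) for seq in introns)
    total = len(introns)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "percent": {cls: 100.0 * n / total for cls, n in counts.items()},
    }


# Toy intron sequences (hypothetical, not taken from the Medicago data).
toy_introns = [
    "GTAAGTTTTCTAACAG",
    "GTATGCCCTTGACAG",
    "GCAAGTTTTTTTAG",
    "ATATCCTTTTCCTTAACAC",
]
summary = summarize(toy_introns)
for cls, pct in sorted(summary["percent"].items()):
    print(f"Introns w/ 5' {cls}: {pct:.2f}%")
```

With real data, the input would be the intron sequences extracted from EST-to-genome alignments rather than a hand-written list.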
45. Consensus Logogram of the 5’ GU vs the 5’ AU Class of Introns in Medicago truncatula, Determined by FELINES
Figure panels: AU intron consensus; GU intron consensus