The human reference genome is becoming more complex, moving from a single consensus sequence to a model that represents multiple haplotypes and genomic diversity. The current assembly, GRCh38, includes 178 regions with alternate loci; its 261 alt loci contribute 3.6 Mb of sequence that is novel relative to the chromosomes. Future assemblies will aim to better define sequence contexts and provide coordinate information for multiple genomes and patches. Challenges include developing compatible analysis tools and determining how best to represent updated regions in new assembly releases.
1. Advancing the Human Reference Assembly
Valerie Schneider
NCBI
25 February 2015
The Human Reference Genome: Today, Tomorrow and Next?
http://genomereference.org
3. Outline
• The assembly model
  • Basics
  • Value added
  • Challenges
• Future relevance of the reference
  • Multiple genomes
  • Haploid genomes
• Assembly updates
  • Mechanisms
  • Requirements/Challenges
4. GRC Assembly Model
[Figure: sequences from haplotype 1 and sequences from haplotype 2]
Old assembly model: compress into a consensus
Current assembly model: represent both (or many) haplotypes
5. GRC Assembly Model
[Figure: structure of an assembly (e.g. GRCh38): a primary assembly unit; a non-nuclear assembly unit (e.g. MT); PAR; genomic regions (MHC, UGT2B17, MAPT); and alternate loci assembly units ALT 1 through ALT 7]
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
6. GRC Assembly Model
Alt loci alignments are an integral part of the assembly model
alignment to chr + scaffold sequence = Alt
7. GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 alt loci: 3.6 Mb novel sequence relative to chromosomes
• Average alt length = 400 kb, max = ~5 Mb
8. GRC Assembly Model
The human reference assembly represents population genomic diversity in the context of linear sequences
19. Assembly Updates
[Figure: structure of a patched assembly (e.g. GRCh38.p1): a primary assembly unit; a non-nuclear assembly unit (e.g. MT); alternate loci assembly units ALT 1 through ALT 7; PAR; genomic regions (MHC, UGT2B17, MAPT); and a patches assembly unit with its own genomic regions (ABO, FOXO6, FCGBP)]
Patches
• FIX: scaffold status at next major assembly release: -- (integrated). Treat as: preferred
• NOVEL: scaffold status at next major assembly release: alt loci. Treat as: allelic
21. Assembly Updates
GRC Criteria for Reference Assembly Component Sequences
• Finished Quality
• INSDC Accessioned
• Representative of an actual DNA molecule
23. GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
GRC Credits
Workshop sponsor:
http://genomereference.org
Editor's Notes
I’d like to open this workshop by reminding everyone of the difference between a genome and an assembly. A human genome is a physical object. An assembly is our representation of that object. It is a model. And as shown here, genome models can take many forms.
And as these atomic models illustrate, scientific models evolve over time to reflect our growing knowledge base. And so it is with the human assembly model, the reference genome.
Today’s workshop addresses the advancement of the human reference genome assembly in the context of new data and technologies.
In my talk, I’ll discuss the current reference assembly, highlighting the following topics (read outline). I’ll be followed by Karyn, Tina and Deanna, who will each be talking in more detail about some of these items that I introduce.
When assembling the genome of a single diploid individual, there may be divergent haplotypes that confound genome assembly. In the original reference assembly model, which was essentially a stick model of linear chromosomes, there really wasn’t a good way to represent highly variant or complex genomic regions. Different haplotypes were simply compressed into a consensus. The insertion of different haplotypes, however, often led to non-existent allele combinations and artificial gaps, as illustrated here.
This issue led the GRC to develop a new assembly model several years ago that has a mechanism to cleanly represent multiple haplotypes: alternate loci. They allow the reference assembly to contain alternate representations for regions where haplotype compression isn’t appropriate or a single sequence path is considered insufficient. At the same time, the model retains the linear chromosomes with which most users are comfortable.
As a result of the adoption of this model, it’s important to understand that the reference assembly isn’t a haploid or even a diploid genome representation. For any locus, it can represent many haplotypes.
This slide explains how the assembly model accomplishes this. The first thing to know is that the “assembly” is comprised of multiple assembly units.
The primary assembly unit is the collection of chromosomes and unlocalized and unplaced scaffolds. This is essentially the original haploid assembly model.
Non-nuclear genomes are assigned to their own assembly unit.
Regions are defined for those areas of the genome for which alternate sequence representation is desired.
Alternate sequence representations for those regions go into alternate loci assembly units. The first alternate sequence representation for each region goes into one assembly unit. Each additional sequence representation for a region goes into its own assembly unit.
We also define the PAR regions, to account for sequence shared by the sex chromosomes.
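The placement rule for alternate loci can be illustrated with a minimal Python sketch. It assumes one reading of the rule described above: the k-th alternate representation of any region goes into the k-th alternate loci unit. The unit names (`ALT_REF_LOCI_k`), the scaffold names, and the number of alts per region are hypothetical placeholders; only the region names come from the slides.

```python
from collections import defaultdict

def assign_alt_units(region_alts):
    """Place the k-th alternate scaffold of every region into unit k.

    region_alts maps a region name to an ordered list of alt scaffold names.
    Returns a dict mapping a unit name to a list of (region, scaffold) pairs.
    """
    units = defaultdict(list)
    for region, scaffolds in region_alts.items():
        for k, scaffold in enumerate(scaffolds, start=1):
            # Unit naming is an assumption made for this illustration.
            units[f"ALT_REF_LOCI_{k}"].append((region, scaffold))
    return dict(units)

# Hypothetical scaffold names and counts; region names are from the slides.
region_alts = {
    "MHC": [f"mhc_alt_{i}" for i in range(1, 8)],
    "UGT2B17": ["ugt2b17_alt_1"],
    "MAPT": ["mapt_alt_1"],
}

units = assign_alt_units(region_alts)
for name, members in units.items():
    print(name, [scaffold for _, scaffold in members])
```

Under this reading, the first unit holds one scaffold per region, and a region with seven alternate representations is what drives the total up to seven alternate loci units.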
The alternate loci are stand-alone accessioned scaffold sequences that are given chromosome context via their alignment to the primary assembly unit. This image shows a portion of GRCh38 chr. 17, with its regions and alt loci alignments. As you can see, the relationships of the alts to the primary assembly can be complex, with indels and inversions. For this reason, the GRC curates these alignments.
One point I want to make is that the alignments of the alt loci to the chromosomes are an integral part of the assembly model. The alignment, in conjunction with the sequence, is what defines the alt. The alignments are available for download with the assembly from GenBank.
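To make concrete why the alignment is what gives an alt its chromosome context, here is a small Python sketch that converts a position on an alt scaffold to a chromosome position. It assumes the alignment has been reduced to a list of ungapped blocks; the `Block` type, the function name, and all coordinates are hypothetical and do not reflect the format of the alignment files distributed with the assembly.

```python
from typing import List, NamedTuple, Optional

class Block(NamedTuple):
    alt_start: int          # 0-based start of the block on the alt scaffold
    chr_start: int          # 0-based start of the block on the chromosome
    length: int             # length of the ungapped block
    reverse: bool = False   # True when the block aligns in inverted orientation

def alt_to_chr(pos: int, blocks: List[Block]) -> Optional[int]:
    """Return the chromosome position aligned to alt position `pos`.

    Returns None when `pos` falls in sequence with no chromosome alignment,
    such as an insertion that is unique to the alt.
    """
    for b in blocks:
        offset = pos - b.alt_start
        if 0 <= offset < b.length:
            if b.reverse:
                # Inverted block: alt coordinates run opposite to the chromosome.
                return b.chr_start + b.length - 1 - offset
            return b.chr_start + offset
    return None

# Hypothetical alignment: alt positions 100-149 are an insertion in the alt,
# and the last block is inverted relative to the chromosome.
blocks = [
    Block(alt_start=0, chr_start=1000, length=100),
    Block(alt_start=150, chr_start=1100, length=50),
    Block(alt_start=200, chr_start=1200, length=40, reverse=True),
]

print(alt_to_chr(10, blocks))   # position inside the first block
print(alt_to_chr(120, blocks))  # position inside the alt-only insertion
```

Without the block list, the scaffold sequence alone says nothing about where it belongs on the chromosome, which is the sense in which the alignment and the sequence together define the alt.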
The ideogram image in this slide shows the genome-wide locations of alternate loci in GRCh38, along with some basic alt loci stats.
What all this means is that you don’t have to wait for the development of a graph-based genome representation and corresponding tool suites to do genomic analyses that benefit from variant sequence representations. The current assembly model allows the reference to represent population genomic diversity in the context of linear sequences, which are the currency for most existing analysis pipelines. The next couple of slides show you some of the value added to analyses by use of the full assembly model.
Gene content is one way in which alt loci add value to the assembly. In this slide, you can see several genes annotated in the regions of this alternate representation of the chr. 19 KIR region that have no alignment to the chromosome. Deanna will tell you more about genes unique to alt loci in her talk.
[Thus, if you’re not using the entire assembly in your analyses, you may be missing genes. This can affect the development of exome capture reagents. In addition, many of these alts contain paralogous gene copies that will affect alignments and your understanding of the protein content of the genome.]
Alternate loci also have implications for read mapping and data interpretation. This image from the NCBI 1000G browser shows a region of GRCh37 chr. 7 encompassing UPK3B, a gene expressed in primary mesothelial cells. The chromosomal representation of UPK3B has not changed in GRCh38, and is believed to represent a relatively rare insertion allele. An alternate locus for this region is included in GRCh38, and represents the deletion allele, as illustrated by its alignment to GRCh37.
As illustrated by the 3 samples shown here, alignment profiles in this region vary depending on the alignment method used, in this case bwa or mosaik. As a result, it’s difficult to ascertain the genotypes of these samples or the distribution of these alleles in the human population.
However, with the inclusion of the alternate scaffold, we can better interpret the data. This slide shows the alignment of previously unmapped reads from one of these 1000G samples to the GRCh38 alt across the indel boundary, indicating that the sample contains the deletion allele. From analyses such as these, we can see how the inclusion of alternate loci in alignment target sets may improve alignments and data interpretation.
Alternate loci also have a broader impact on read alignments. Since we first developed this model, we’ve been interested in the effect of alt loci on read mapping. This slide describes a study we did a few years ago with the GRCh37.p9 assembly. We looked at the alignment behavior of simulated reads sourced from sequence unique to alt loci. We asked what happened to them when aligned to the primary assembly unit without the alt loci, where their true target is missing. We aligned the reads either as singletons or pairs, using two different aligners (BWA and srprism).
As shown in this graph, regardless of read pairing or the aligner, 25% of these reads failed to align (red). What’s particularly concerning is that nearly three-quarters had an off-target alignment on the primary assembly unit (in blue). These off-target alignments are likely to result in errors in variation analyses. This analysis demonstrates the broader value of including alternate loci in alignment target sets.
While it’s clear that alternate loci add value to the reference assembly, you need the right set of tools to take advantage of them. Unfortunately, using many common analysis suites and file formats with the current assembly model is kind of like eating yogurt with chopsticks. They give you a taste of the richness of the data, but leave a lot behind. This is a point that Deanna will address in greater detail later today, so I’ll only outline the challenges researchers face in using the full assembly. But because the assembly model is still based on linear sequences, it should be possible to modify our current tools and file formats to take full advantage of the reference, rather than starting from scratch.
The first issue is allelic duplication. Most current aligners cannot distinguish the allelic duplication introduced by alternate loci from segmental duplication. As a result, reads aligning to sequence common to the chromosomes and alternate loci tend to be down-weighted and excluded from further analysis. This slide shows a graphical view of an alt locus scaffold, with the alignments of the chromosome and reads from a 1000G sample. The top set of alignments represent reads that aligned to both the alt and the chr. The bottom set are the reads that aligned only to the alt. Zooming in, we see these are reads aligning to an insertion in the alt sequence.
Unless the aligner can distinguish chromosomal regions associated with alt loci and not down-weight alignments in those regions, the gains in picking up new read alignments are likely to be offset by the discarding of other alignments.
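The distinction an alt-aware aligner has to draw can be sketched in Python as follows. This is an illustration only, not how bwa-mem or SRPRISM implement alt awareness. The region coordinates, the alt scaffold name, and the helper names are hypothetical, and the sketch assumes that each alt is already associated with a chromosomal region.

```python
from typing import List, NamedTuple

class Hit(NamedTuple):
    seq: str    # name of the sequence the read aligned to
    start: int  # 0-based start of the alignment
    end: int    # end of the alignment (exclusive)

# Hypothetical lookup tables: region coordinates and the alts that belong to them.
REGIONS = {"MAPT": ("chr17", 45_000_000, 46_500_000)}
ALT_TO_REGION = {"alt_mapt_1": "MAPT"}

def locus_of(hit: Hit) -> str:
    """Map a hit to a locus label, folding alts into their chromosomal region."""
    if hit.seq in ALT_TO_REGION:
        return ALT_TO_REGION[hit.seq]
    for name, (chrom, start, end) in REGIONS.items():
        if hit.seq == chrom and start <= hit.start and hit.end <= end:
            return name
    return f"{hit.seq}:{hit.start}"

def is_allelic_only(hits: List[Hit]) -> bool:
    """True when multiple hits are explained by allelic duplication alone.

    All hits must fall in one locus, and no sequence may carry more than
    one hit, so a duplication within a single sequence still counts as
    multi-mapping.
    """
    loci = {locus_of(h) for h in hits}
    seqs = [h.seq for h in hits]
    return len(loci) == 1 and len(seqs) == len(set(seqs))

allelic = [Hit("chr17", 45_900_000, 45_900_100), Hit("alt_mapt_1", 250_000, 250_100)]
paralogous = [Hit("chr17", 45_900_000, 45_900_100), Hit("chr1", 1_000_000, 1_000_100)]

print(is_allelic_only(allelic))
print(is_allelic_only(paralogous))
```

A read flagged as allelic-only could keep its mapping confidence rather than being down-weighted, while a read that also hits an unrelated location would still be treated as ambiguous.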
Another challenge to using alternate loci comes in reporting features associated with more than one location. As shown in this image that illustrates the TNXB locus on the reference chromosome and 3 alts, genes may have different structures in different locations. Modifications to file formats such as GFF will make it easier to recognize sequence relationships across the assembly when reporting gene and exon locations.
Variant analysis and reporting is another area where changes are needed. As illustrated here, GRCh38 includes representations for the two major haplotypes at the MAPT locus. Depending on sample genotype, it may be desirable to report on more than one representation. However, the VCF format requires modification to support this.
A GRC workshop held last fall led to a publication that helped raise awareness of these issues, and some proposals, such as this one by Aaron Quinlan to make VCF alt-friendly, were discussed. There’s a git issue available for those who are interested. Additionally, bwa-mem recently became alt aware, joining SRPRISM as an alternate aware short read aligner. These changes show that use of the full assembly model is possible and the necessary tools are starting to become available.
I now want to shift gears and discuss the place of the reference as we enter a new era in which this assembly may no longer stand apart in terms of its quality or completeness. It’s important to remember that the human reference assembly is a special kind of genome model. In today’s era of personal genome sequencing, most assemblies only model a haploid or diploid genome.
But the reference assembly is a model of many diploid genomes, meant to represent the “human” genome. This slide shows the assembly composition of the GRCh38 primary assembly. While 70% of the genome comes from one donor, sequence from more than 70 individuals is represented.
Because the reference is derived from many individuals and includes alternate sequence representations, it is likely to remain our best resource for putting sequences identified in any individual into a genomic context. Likewise, because a common coordinate system remains critical for communication and reporting purposes, we’re likely to see the reference retain this role as well.
The table shows the latest versions of human genome assemblies in GenBank. Those in red are population-specific, and more population-specific genomes are under construction today. When analyzing samples from known populations, population-specific references or collections of population-specific genomes may be particularly valuable for variant or haplotype analysis. Even with a reference that is a graph of population variation, certain analyses may benefit from using only sub-paths in the graph. However, it’s important to realize that the utility of population-specific references may be limited for admixed samples. Given that much of the US population is admixed, this is an important consideration for resource development. Today you’ll hear from Tina about gold genomes, a set of genomes from diverse populations that are being sequenced to provide new representations for some of the genome’s most variable regions. These data will be incorporated into the reference.
Karyn and Tina will also be talking today about platinum assemblies that are derived from hydatidiform moles, which have haploid genomes. Without allelic duplication complicating their assembly, these resources facilitate the resolution of some of the most complex segmentally duplicated genome regions. These platinum genomes will be assembled to reference quality. However, it’s important to realize that there are no plans to replace the reference with either of these platinum mole assemblies. Like other individual genomes, they are limited in their representation of diversity. As you’ll hear, the GRC does intend to use these genomes to improve or augment the reference. As we enter a new era of multiple high quality genomes, we still envision the reference playing important roles.
In the last few minutes of this talk, I’ll discuss ongoing efforts to improve the reference. The “patches” feature of the model allows the GRC to make assembly updates available in a timely fashion without disrupting the chromosome coordinates upon which other users rely.
Regions are defined for the genomic locations to be updated, and the sequences representing those updates are put into the “Patches” assembly unit. Like the alt loci, the patches are stand-alone scaffold sequences with alignments.
It’s important to distinguish the two types of patches and the ways in which they should be used for analysis:
(1) FIX patches correct problems in the assembly; they are deprecated in the next assembly release.
(2) NOVEL patches add new alternate sequence representations to the assembly; they become alternate loci in the next assembly release.
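The two patch types and their fates can be summarized in a short sketch. This is an illustration of the distinction drawn above, not GRC software. The `PatchType` enum, the function names, and the patch names in the usage note are hypothetical.

```python
from enum import Enum


class PatchType(Enum):
    FIX = "FIX"      # corrects a problem in the assembly
    NOVEL = "NOVEL"  # adds a new alternate sequence representation


def fate_in_next_release(patch_type):
    """Describe what happens to a patch of this type at the next assembly release."""
    if patch_type is PatchType.FIX:
        return "deprecated"
    if patch_type is PatchType.NOVEL:
        return "becomes alternate locus"
    raise ValueError(f"unknown patch type: {patch_type!r}")


def split_patches(patches):
    """Split (name, PatchType) pairs into fix patch names and novel patch names."""
    fixes = [name for name, kind in patches if kind is PatchType.FIX]
    novels = [name for name, kind in patches if kind is PatchType.NOVEL]
    return fixes, novels
```

For example, `split_patches([("patch_a", PatchType.FIX), ("patch_b", PatchType.NOVEL)])` separates the two kinds so that an analysis can treat corrections and additional representations differently.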
An example of a GRCh38 fix patch is shown on top in this issue summary from the GRC website, where sequence from a fosmid was used to patch a deleted BAC disrupting representation of the FOXO6 gene. An example of a GRCh38 novel patch is shown on the bottom, where the GRC (in collaboration with the Pharmacogenomics Research Network) added representation for another structural variant of the CYP2D6 locus. The GRC releases patches on a quarterly cycle with the next release planned for the end of March.
With all of the NGS and new genome data, you might think that the GRC is awash in sequences with which to update the assembly. But the reality sometimes feels more like this. Although there is a lot of sequence data available, sequence meeting all 3 of these reference criteria is still limited. Quality is less of an issue today than it was a couple of years ago. However, more groups doing sequencing and assembly are putting their data on “public” FTP sites without submitting it to an INSDC database. We encourage groups to submit their data so that it can contribute to this valuable public resource. Lastly, the reference assembly is clone based, and all component sequences are representative of a DNA molecule found in an actual individual. As long as the community feels it is important for the reference to represent actual sequences, the ability to phase or resolve haplotypes in newly sequenced genomes, or to generate finished-quality sequence from single molecules, will be critical to incorporating new sequence into the assembly.