The document discusses various methods for high-throughput genotyping used in molecular breeding, including SNP arrays, genotyping by sequencing, and TaqMan assays. It also covers marker-assisted selection strategies and factors critical for successful molecular breeding, such as choice of germplasm, accurate phenotyping, high-quality genotyping data, and involvement of breeders. Genome-wide association studies are presented as a method for population mapping of traits compared to bi-parental mapping of families.
2. High Throughput Marker Analysis
• SNP Arrays
• Illumina Infinium assay (BeadChip assay)
• Affymetrix Axiom genotyping
• TaqMan SNP Genotyping Assays - Life Technologies
• Genotyping By Sequencing (GBS)
3. Cost Effective Molecular Breeding Strategies
The cost of applying molecular markers is now very low.
Whole-genome scans (mainly of parents) are becoming routine,
replacing single-marker screening.
4. Marker Assisted Selection
Assumption: DNA markers can reliably predict phenotype
Very useful when selecting for:
• Low heritability traits (traits dependent on the
environment)
• Traits hard to measure (aroma)
• Traits measured late in the plant life cycle (Fertility genes)
• Disease resistance genes (multiple assays needed for each
selection)
8. Critical factors for the success of molecular
breeding
Germplasm
• Elite vs. non-elite
• Wild relatives
• Structure
Genotyping
• High-throughput level
• Cost-effectiveness
• Informatics
• Data management
Phenotyping
• Identify key traits
• Stress management
• Accurate
Breeding
• Choice of MB strategy
• Breeders fluent with MB
• Stakeholders involvement
9. QTL mapping
Family mapping
• Development of the experimental
population
• Collection of genotypic and
phenotypic data
• Construction of linkage map
• Linkage analysis between the
trait and the markers
• Identification of QTL
• QTL cloning
10. Construction of Genetic Map
A high-density genetic map of SNP markers in apple produced
using the 20K Infinium Array
11. Family (linkage) QTL mapping
PROs
• Highly controlled experiments
• High statistical power to detect ‘major QTL’
• Only a limited number of markers and of phenotyped samples is
required
CONs
• Experimental crosses, time consuming, laborious, long generation time
• QTL intervals are quite long, 5-10 cM (1 cM might be equal to 300 kb up
to 3000 kb of DNA) and contain MANY genes
• Fine mapping to identify individual genes in a QTL requires a large
number of crosses to generate a sufficient number of meiotic events
• QTL identification is based on bi-parental crosses, so QTL are quite
often specific to the bi-parental population and not useful in wider
genetic backgrounds
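As a rough illustration of why a 5-10 cM interval can contain many genes, the figures quoted above can be multiplied out. This is only a sketch of the arithmetic; the function name is hypothetical and the kb-per-cM ratio varies along the genome and between species.

```python
def qtl_interval_kb(interval_cm, kb_per_cm):
    """Convert a genetic interval (cM) into an approximate physical size (kb)."""
    return interval_cm * kb_per_cm


# Ranges quoted on the slide: 5-10 cM intervals, 300-3000 kb per cM.
smallest = qtl_interval_kb(5, 300)
largest = qtl_interval_kb(10, 3000)
print(f"{smallest / 1000:.1f} Mb to {largest / 1000:.1f} Mb")
# prints: 1.5 Mb to 30.0 Mb
```

Under these figures a QTL interval spans roughly 1.5 to 30 Mb of DNA.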
12. Population mapping vs Family mapping
• Sampling natural variation: existing populations/germplasm
collections
• Breeding accessions (high agronomic value, low diversity)
• Landraces and local varieties
• Wild varieties (low agronomic value, high diversity, source of new alleles)
• Rely on the detection of Linkage Disequilibrium
• Take advantage of recombination events accumulated over
many generations
• Searching for genotype-phenotype correlations in unrelated
individuals
Family mapping: closed system. Population mapping: open system.
14. What does it look like at the molecular level?
Marker/marker correlation
Disequilibrium matrix for
polymorphic sites
Selfing species
selfing reduces opportunities
for recombination . . . DNA
variation tends to be structured
into distinct haplotypes coupled
with extensive LD
Outcrossing species
recombination among
heterozygotes breaks up
haplotypes and reduces LD
15. Why does linkage disequilibrium matter?
– LD refers to the correlation between polymorphisms in a population
– The power of an association study depends on the strength of this
correlation
– The resolution with which a QTL can be mapped is a function of
how quickly LD decays over distance
High LD
Whole-genome scan; low number
of markers and low resolution.
Typical of selfing plants
Low LD
Candidate gene approach; high
number of markers and high resolution.
Typical of outcrossing plants
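The correlation between two polymorphisms is commonly summarised as r². The sketch below is one way to compute it for two biallelic loci; it assumes phased haplotypes coded as 0/1 pairs, and the function name and toy data are illustrative only, not taken from the slides.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci, from phased haplotypes given as (a, b) pairs of 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Two distinct haplotypes only (as expected under extensive LD).
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
# All four haplotypes equally frequent (no LD).
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)] * 3

print(ld_r2(complete_ld), ld_r2(no_ld))
```

Computing r² for marker pairs at increasing distances and plotting it against distance gives the LD decay curve that determines mapping resolution.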
16. Population structure
Presence of a systematic difference in allele frequencies
between subpopulations in a population.
Causes:
Geographic origin: natural selection may favor genes for
adaptation (fixation of alleles flanking a favored
variant)
Nonrandom mating: geographic isolation, assortative mating,
breeding history
Coancestry
Domestication: Selection of desired agronomic and end-use
quality characteristics, co-selection of loci during
breeding for multiple traits
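One common way to quantify a systematic difference in allele frequencies between subpopulations is Wright's F_ST. The slides do not name this statistic, so the sketch below is an assumption made for illustration; it handles a single biallelic locus and two equally sized subpopulations.

```python
def fst_two_subpops(p1, p2):
    """Wright's F_ST for one biallelic locus and two equally sized subpopulations.

    p1 and p2 are the frequencies of the same allele in each subpopulation.
    """
    # Mean expected heterozygosity within subpopulations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("locus is monomorphic in the pooled sample")
    return (h_t - h_s) / h_t


print(round(fst_two_subpops(0.9, 0.1), 2))  # strongly differentiated subpopulations
print(round(fst_two_subpops(0.5, 0.5), 2))  # no differentiation
```

Values near 0 indicate no structure at the locus; values approaching 1 indicate that the subpopulations are close to fixed for different alleles.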
18. Population mapping
PROs
• No need for the development of biparental populations
• High-resolution mapping that takes advantage of LD, of the historical
recombination events present within the gene pool of an organism and
of natural genetic diversity
• Availability of broader genetic variations with wider background for
marker-trait correlation (i.e., many alleles evaluated simultaneously)
and also more QTLs underlying the traits
CONs
• GWA blind approach, facilitates the identification of new regions
containing no a priori candidate genes
• Less statistical power (presence of alleles with low MAF, rare alleles)
• Spurious signals of association derived from population genetic
structure or non-random mating (relatedness)
• Population admixture, selection and genetic drift can bias the detected
association
• Large sample size – significant phenotyping
19. Genome Wide Association Studies (GWAS)
Different statistical methods are available for association mapping (AM) studies
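The simplest of these methods is a single-marker test: regress the phenotype on the allele dosage (0, 1 or 2) of one SNP at a time. The sketch below is a minimal version of that idea with made-up data and a hypothetical function name. It deliberately omits any correction for population structure or relatedness, which real analyses add as covariates or random effects.

```python
import math


def single_marker_test(dosages, phenotypes):
    """Least-squares regression of phenotype on SNP dosage.

    Returns (slope, t_statistic). No correction for structure or kinship.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("marker is monomorphic")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx                       # estimated allele substitution effect
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)     # standard error of the slope
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat


# Toy data: six individuals, one SNP, one quantitative trait.
dosages = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, t_stat = single_marker_test(dosages, phenotypes)
print(f"effect per allele copy: {slope:.2f}, t = {t_stat:.1f}")
```

In a genome-wide scan this test is repeated for every marker, so the significance threshold has to be adjusted for multiple testing.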
22. Test of associated SNP as candidate for MAS
Case 1: Perfect association
Case 2: Obfuscated association
23. Genomic Selection
Genomic selection (GS) was first suggested in animal breeding
(Meuwissen et al., 2001)
GS is a form of MAS that simultaneously considers the effects of
all markers (as many loci as possible) across the whole genome to
calculate the Genomic Estimated Breeding Value (GEBV).
GS avoids QTL mapping altogether
• In GS, the joint effects of all markers are fitted as random
effects in a linear model
• Trait values are predicted from a weighted index calculated
for each marker
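The weighted index can be sketched as a sum over markers of genotype dosage times estimated marker effect. The marker effects, line names and genotypes below are invented for illustration; in practice the effects come from a model trained on a genotyped and phenotyped population.

```python
def gebv(genotype, marker_effects):
    """Genomic estimated breeding value: sum of dosage x estimated effect over all markers."""
    return sum(g * e for g, e in zip(genotype, marker_effects))


# Hypothetical effects for three markers, as estimated in a training population.
marker_effects = [0.5, -0.2, 0.1]

# Hypothetical selection candidates, genotyped but not phenotyped (dosages 0/1/2).
candidates = {
    "line_A": [2, 0, 1],
    "line_B": [0, 2, 2],
    "line_C": [1, 1, 0],
}

ranking = sorted(candidates, key=lambda name: gebv(candidates[name], marker_effects),
                 reverse=True)
print(ranking)
# prints: ['line_A', 'line_C', 'line_B']
```

Candidates are then selected on their GEBV rank without identifying which individual markers are significant.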
25. Genomic Selection
• Simulation studies have shown that across different
numbers of QTLs (from 20 up to 100) and levels of heritability (h2),
responses to GS were 18 to 43% larger than responses to MAS
(Bernardo and Yu, 2007)
• GS was found more useful with complex traits and low h2
• GS focuses on the genetic improvement of quantitative
traits rather than on understanding their genetic basis
• GS and QTL discovery are not mutually exclusive.
27. Genomic Selection Models
Prediction problem due to large p, small n (far more markers than phenotyped individuals)
• Best Linear unbiased prediction (BLUP)
• Ridge regression
• Bayesian regression
• Kernel Regression
• Machine learning model
• Partial least squares regression (PLS)
• Principal component regression (PCR)
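To see how one of these models copes with large p, small n, the sketch below fits ridge regression to five markers using only three individuals. Ordinary least squares has no unique solution in that setting, whereas the ridge penalty makes the system solvable. Everything here (function names, toy genotypes, the penalty value) is assumed for illustration, and the plain Gaussian-elimination solver is only suitable for tiny examples.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_marker_effects(x, y, lam):
    """Ridge estimates: solve (X'X + lam * I) beta = X'y."""
    n, p = len(x), len(x[0])
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(xtx, xty)


# Toy training set: n = 3 individuals, p = 5 markers (centred coding -1/0/1).
genotypes = [
    [1, 0, -1, 1, 0],
    [0, 1, 1, -1, 0],
    [-1, -1, 0, 0, 1],
]
phenotypes = [1.2, -0.3, -0.9]   # centred trait values

effects = ridge_marker_effects(genotypes, phenotypes, lam=1.0)
print([round(e, 3) for e in effects])
```

The penalty `lam` shrinks all marker effects toward zero; in practice it is chosen by cross-validation or derived from variance components.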