As increasing numbers of people choose to have their genomes sequenced and made available for research, more genomic data is available for analysis by machine learning approaches. Single Nucleotide Polymorphisms (SNPs) are known to be a major factor influencing many physical traits, diseases and other phenotypes. Using publicly available data and tools we predict phenotype from genotype using SNP data (1 to 2 million SNPs). We utilize data analysis and machine learning approaches only, no domain knowledge, so that our automated approach may be generally used to predict different phenotypes from genotype. In the first application of our method we predicted eye color with 87% accuracy.
Molecular Breeding in Plants is an introduction to the fundamental techniques...UNIVERSITI MALAYSIA SABAH
This slide describe the process of molecular breeding in plants which involves the application of molecular markers for Marker Assisted Selection and Marker Assisted Breeding.
Perspectives and Challenges of Phenotyping in Crop Improvement. - Copy.pptxRonikaThakur
Plant breeding programmes have been supplemented with the rapid advancements in modern technology. But these cannot be exploited fully until a précised phenotypic data is available which can bridge the gap between Genotype and Environment.
So, this presentation is made to have an overview how the advanced high throughput phenotyping platforms are playing a crucial role in the crop improvement.
Pathogenesis-related proteins (initially named “b” proteins) were discovered in tobacco leaves
hypersensitively reacting to TMV by two independently working groups (Van Loon and Van Kammen,
1970; Gianinazzi et al., 1970)
A measure of group distance based on multiple charaters.
It introduce by P.C.Mahalanobis in 1928.
Rao 1952 use this technique for assessment of genetic diversity in plant breeding.The genotypes for study of genetic diversity includes germplasm lines, and varieties.
3.Grouping of genotypes into clusters
4.Average Intra and Inter-cluster Distance
5.Cluster Diagram
6.Contributation of individual characters towards total divergence
The MEGA software is one of the most widely used software tools in molecular taxonomy and bioinformatics. This module describes how MEGA can be employed in a classroom setting to teach the fundamentals of molecular taxonomy.
Marker Assisted Selection in Crop BreedingPawan Chauhan
Marker Assisted Selection is a value addition to conventional methods of Crop Breeding. It has been gaining importance in plant breeding with new generation of plant breeders and to get accurate and fast desired result from plant breeding.
Linkage and QTL mapping Populations and Association mapping population.
F2, Immortalized F2, Backcross (BC), Near isogenic lines (NIL), RIL, Double haploids(DH), Nested Association mapping (NAM), MAGIC and Interconnected populations.
Molecular Breeding in Plants is an introduction to the fundamental techniques...UNIVERSITI MALAYSIA SABAH
This slide describe the process of molecular breeding in plants which involves the application of molecular markers for Marker Assisted Selection and Marker Assisted Breeding.
Perspectives and Challenges of Phenotyping in Crop Improvement. - Copy.pptxRonikaThakur
Plant breeding programmes have been supplemented with the rapid advancements in modern technology. But these cannot be exploited fully until a précised phenotypic data is available which can bridge the gap between Genotype and Environment.
So, this presentation is made to have an overview how the advanced high throughput phenotyping platforms are playing a crucial role in the crop improvement.
Pathogenesis-related proteins (initially named “b” proteins) were discovered in tobacco leaves
hypersensitively reacting to TMV by two independently working groups (Van Loon and Van Kammen,
1970; Gianinazzi et al., 1970)
A measure of group distance based on multiple charaters.
It introduce by P.C.Mahalanobis in 1928.
Rao 1952 use this technique for assessment of genetic diversity in plant breeding.The genotypes for study of genetic diversity includes germplasm lines, and varieties.
3.Grouping of genotypes into clusters
4.Average Intra and Inter-cluster Distance
5.Cluster Diagram
6.Contributation of individual characters towards total divergence
The MEGA software is one of the most widely used software tools in molecular taxonomy and bioinformatics. This module describes how MEGA can be employed in a classroom setting to teach the fundamentals of molecular taxonomy.
Marker Assisted Selection in Crop BreedingPawan Chauhan
Marker Assisted Selection is a value addition to conventional methods of Crop Breeding. It has been gaining importance in plant breeding with new generation of plant breeders and to get accurate and fast desired result from plant breeding.
Linkage and QTL mapping Populations and Association mapping population.
F2, Immortalized F2, Backcross (BC), Near isogenic lines (NIL), RIL, Double haploids(DH), Nested Association mapping (NAM), MAGIC and Interconnected populations.
Moving Towards a Validated High Throughput Sequencing Solution for Human Iden...Thermo Fisher Scientific
Presented by Jennifer D. Churchill, PhD during a special Lunch and Learn session during the American Academy of Forensic Sciences (AAFS) 67th annual conference, February 2015. / Conclusions
• Robust panels of identity and ancestry SNPs
• Robust STR panel
• Whole genome mtDNA sequencing
• Highly informative
• Sensitive
• Quantitative – scaling comparison
• Low density chip is not necessarily a bad chip
• Wide range of density can still yield high quality data
• Based on results continue development and validation
ASHG 2015 - Redundant Annotations in Tertiary AnalysisJames Warren
After obtaining genetic variants from next generation sequencing data, a precursory step in tertiary analysis is to annotate each variant with available relevant information. There is no standardized compendium for this purpose; researchers instead are required to compile data from a motley of annotation tools and public datasets. These sources for annotation are independently maintained, and accordingly there is limited concordance between their reported contents. The choice of annotation datasets thus has a direct and significant impact on the results of the analysis.
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Reid Robison
Tute Genomics is cloud-based software that can rapidly analyze entire human genomes. The cost of whole genome sequencing is dropping rapidly and we are in the middle of a genomic revolution. Tute is opening a new door for personalized medicine by helping researchers & healthcare organizations analyze human genomes.
Manteia non confidential-presentation 2003-09Pascal Mayer
A non confidential corporate presentation of "Manteia Predictive Médicine" as of September 2003. Présents DNA colony sequencing resutls, instrument, DNA preparation for genotyping.
Similar to Predicting phenotype from genotype with machine learning (20)
New Directions in Targeted Therapeutic Approaches for Older Adults With Mantl...i3 Health
i3 Health is pleased to make the speaker slides from this activity available for use as a non-accredited self-study or teaching resource.
This slide deck presented by Dr. Kami Maddocks, Professor-Clinical in the Division of Hematology and
Associate Division Director for Ambulatory Operations
The Ohio State University Comprehensive Cancer Center, will provide insight into new directions in targeted therapeutic approaches for older adults with mantle cell lymphoma.
STATEMENT OF NEED
Mantle cell lymphoma (MCL) is a rare, aggressive B-cell non-Hodgkin lymphoma (NHL) accounting for 5% to 7% of all lymphomas. Its prognosis ranges from indolent disease that does not require treatment for years to very aggressive disease, which is associated with poor survival (Silkenstedt et al, 2021). Typically, MCL is diagnosed at advanced stage and in older patients who cannot tolerate intensive therapy (NCCN, 2022). Although recent advances have slightly increased remission rates, recurrence and relapse remain very common, leading to a median overall survival between 3 and 6 years (LLS, 2021). Though there are several effective options, progress is still needed towards establishing an accepted frontline approach for MCL (Castellino et al, 2022). Treatment selection and management of MCL are complicated by the heterogeneity of prognosis, advanced age and comorbidities of patients, and lack of an established standard approach for treatment, making it vital that clinicians be familiar with the latest research and advances in this area. In this activity chaired by Michael Wang, MD, Professor in the Department of Lymphoma & Myeloma at MD Anderson Cancer Center, expert faculty will discuss prognostic factors informing treatment, the promising results of recent trials in new therapeutic approaches, and the implications of treatment resistance in therapeutic selection for MCL.
Target Audience
Hematology/oncology fellows, attending faculty, and other health care professionals involved in the treatment of patients with mantle cell lymphoma (MCL).
Learning Objectives
1.) Identify clinical and biological prognostic factors that can guide treatment decision making for older adults with MCL
2.) Evaluate emerging data on targeted therapeutic approaches for treatment-naive and relapsed/refractory MCL and their applicability to older adults
3.) Assess mechanisms of resistance to targeted therapies for MCL and their implications for treatment selection
Prix Galien International 2024 Forum ProgramLevi Shapiro
June 20, 2024, Prix Galien International and Jerusalem Ethics Forum in ROME. Detailed agenda including panels:
- ADVANCES IN CARDIOLOGY: A NEW PARADIGM IS COMING
- WOMEN’S HEALTH: FERTILITY PRESERVATION
- WHAT’S NEW IN THE TREATMENT OF INFECTIOUS,
ONCOLOGICAL AND INFLAMMATORY SKIN DISEASES?
- ARTIFICIAL INTELLIGENCE AND ETHICS
- GENE THERAPY
- BEYOND BORDERS: GLOBAL INITIATIVES FOR DEMOCRATIZING LIFE SCIENCE TECHNOLOGIES AND PROMOTING ACCESS TO HEALTHCARE
- ETHICAL CHALLENGES IN LIFE SCIENCES
- Prix Galien International Awards Ceremony
Knee anatomy and clinical tests 2024.pdfvimalpl1234
This includes all relevant anatomy and clinical tests compiled from standard textbooks, Campbell,netter etc..It is comprehensive and best suited for orthopaedicians and orthopaedic residents.
Tom Selleck Health: A Comprehensive Look at the Iconic Actor’s Wellness Journeygreendigital
Tom Selleck, an enduring figure in Hollywood. has captivated audiences for decades with his rugged charm, iconic moustache. and memorable roles in television and film. From his breakout role as Thomas Magnum in Magnum P.I. to his current portrayal of Frank Reagan in Blue Bloods. Selleck's career has spanned over 50 years. But beyond his professional achievements. fans have often been curious about Tom Selleck Health. especially as he has aged in the public eye.
Follow us on: Pinterest
Introduction
Many have been interested in Tom Selleck health. not only because of his enduring presence on screen but also because of the challenges. and lifestyle choices he has faced and made over the years. This article delves into the various aspects of Tom Selleck health. exploring his fitness regimen, diet, mental health. and the challenges he has encountered as he ages. We'll look at how he maintains his well-being. the health issues he has faced, and his approach to ageing .
Early Life and Career
Childhood and Athletic Beginnings
Tom Selleck was born on January 29, 1945, in Detroit, Michigan, and grew up in Sherman Oaks, California. From an early age, he was involved in sports, particularly basketball. which played a significant role in his physical development. His athletic pursuits continued into college. where he attended the University of Southern California (USC) on a basketball scholarship. This early involvement in sports laid a strong foundation for his physical health and disciplined lifestyle.
Transition to Acting
Selleck's transition from an athlete to an actor came with its physical demands. His first significant role in "Magnum P.I." required him to perform various stunts and maintain a fit appearance. This role, which he played from 1980 to 1988. necessitated a rigorous fitness routine to meet the show's demands. setting the stage for his long-term commitment to health and wellness.
Fitness Regimen
Workout Routine
Tom Selleck health and fitness regimen has evolved. adapting to his changing roles and age. During his "Magnum, P.I." days. Selleck's workouts were intense and focused on building and maintaining muscle mass. His routine included weightlifting, cardiovascular exercises. and specific training for the stunts he performed on the show.
Selleck adjusted his fitness routine as he aged to suit his body's needs. Today, his workouts focus on maintaining flexibility, strength, and cardiovascular health. He incorporates low-impact exercises such as swimming, walking, and light weightlifting. This balanced approach helps him stay fit without putting undue strain on his joints and muscles.
Importance of Flexibility and Mobility
In recent years, Selleck has emphasized the importance of flexibility and mobility in his fitness regimen. Understanding the natural decline in muscle mass and joint flexibility with age. he includes stretching and yoga in his routine. These practices help prevent injuries, improve posture, and maintain mobilit
The prostate is an exocrine gland of the male mammalian reproductive system
It is a walnut-sized gland that forms part of the male reproductive system and is located in front of the rectum and just below the urinary bladder
Function is to store and secrete a clear, slightly alkaline fluid that constitutes 10-30% of the volume of the seminal fluid that along with the spermatozoa, constitutes semen
A healthy human prostate measures (4cm-vertical, by 3cm-horizontal, 2cm ant-post ).
It surrounds the urethra just below the urinary bladder. It has anterior, median, posterior and two lateral lobes
It’s work is regulated by androgens which are responsible for male sex characteristics
Generalised disease of the prostate due to hormonal derangement which leads to non malignant enlargement of the gland (increase in the number of epithelial cells and stromal tissue)to cause compression of the urethra leading to symptoms (LUTS
Recomendações da OMS sobre cuidados maternos e neonatais para uma experiência pós-natal positiva.
Em consonância com os ODS – Objetivos do Desenvolvimento Sustentável e a Estratégia Global para a Saúde das Mulheres, Crianças e Adolescentes, e aplicando uma abordagem baseada nos direitos humanos, os esforços de cuidados pós-natais devem expandir-se para além da cobertura e da simples sobrevivência, de modo a incluir cuidados de qualidade.
Estas diretrizes visam melhorar a qualidade dos cuidados pós-natais essenciais e de rotina prestados às mulheres e aos recém-nascidos, com o objetivo final de melhorar a saúde e o bem-estar materno e neonatal.
Uma “experiência pós-natal positiva” é um resultado importante para todas as mulheres que dão à luz e para os seus recém-nascidos, estabelecendo as bases para a melhoria da saúde e do bem-estar a curto e longo prazo. Uma experiência pós-natal positiva é definida como aquela em que as mulheres, pessoas que gestam, os recém-nascidos, os casais, os pais, os cuidadores e as famílias recebem informação consistente, garantia e apoio de profissionais de saúde motivados; e onde um sistema de saúde flexível e com recursos reconheça as necessidades das mulheres e dos bebês e respeite o seu contexto cultural.
Estas diretrizes consolidadas apresentam algumas recomendações novas e já bem fundamentadas sobre cuidados pós-natais de rotina para mulheres e neonatos que recebem cuidados no pós-parto em unidades de saúde ou na comunidade, independentemente dos recursos disponíveis.
É fornecido um conjunto abrangente de recomendações para cuidados durante o período puerperal, com ênfase nos cuidados essenciais que todas as mulheres e recém-nascidos devem receber, e com a devida atenção à qualidade dos cuidados; isto é, a entrega e a experiência do cuidado recebido. Estas diretrizes atualizam e ampliam as recomendações da OMS de 2014 sobre cuidados pós-natais da mãe e do recém-nascido e complementam as atuais diretrizes da OMS sobre a gestão de complicações pós-natais.
O estabelecimento da amamentação e o manejo das principais intercorrências é contemplada.
Recomendamos muito.
Vamos discutir essas recomendações no nosso curso de pós-graduação em Aleitamento no Instituto Ciclos.
Esta publicação só está disponível em inglês até o momento.
Prof. Marcus Renato de Carvalho
www.agostodourado.com
HOT NEW PRODUCT! BIG SALES FAST SHIPPING NOW FROM CHINA!! EU KU DB BK substit...GL Anaacs
Contact us if you are interested:
Email / Skype : kefaya1771@gmail.com
Threema: PXHY5PDH
New BATCH Ku !!! MUCH IN DEMAND FAST SALE EVERY BATCH HAPPY GOOD EFFECT BIG BATCH !
Contact me on Threema or skype to start big business!!
Hot-sale products:
NEW HOT EUTYLONE WHITE CRYSTAL!!
5cl-adba precursor (semi finished )
5cl-adba raw materials
ADBB precursor (semi finished )
ADBB raw materials
APVP powder
5fadb/4f-adb
Jwh018 / Jwh210
Eutylone crystal
Protonitazene (hydrochloride) CAS: 119276-01-6
Flubrotizolam CAS: 57801-95-3
Metonitazene CAS: 14680-51-4
Payment terms: Western Union,MoneyGram,Bitcoin or USDT.
Deliver Time: Usually 7-15days
Shipping method: FedEx, TNT, DHL,UPS etc.Our deliveries are 100% safe, fast, reliable and discreet.
Samples will be sent for your evaluation!If you are interested in, please contact me, let's talk details.
We specializes in exporting high quality Research chemical, medical intermediate, Pharmaceutical chemicals and so on. Products are exported to USA, Canada, France, Korea, Japan,Russia, Southeast Asia and other countries.
Ozempic: Preoperative Management of Patients on GLP-1 Receptor Agonists Saeid Safari
Preoperative Management of Patients on GLP-1 Receptor Agonists like Ozempic and Semiglutide
ASA GUIDELINE
NYSORA Guideline
2 Case Reports of Gastric Ultrasound
Acute scrotum is a general term referring to an emergency condition affecting the contents or the wall of the scrotum.
There are a number of conditions that present acutely, predominantly with pain and/or swelling
A careful and detailed history and examination, and in some cases, investigations allow differentiation between these diagnoses. A prompt diagnosis is essential as the patient may require urgent surgical intervention
Testicular torsion refers to twisting of the spermatic cord, causing ischaemia of the testicle.
Testicular torsion results from inadequate fixation of the testis to the tunica vaginalis producing ischemia from reduced arterial inflow and venous outflow obstruction.
The prevalence of testicular torsion in adult patients hospitalized with acute scrotal pain is approximately 25 to 50 percent
Predicting phenotype from genotype with machine learning
1. Predicting phenotype from
genotype with machine learning
Patricia Francis-Lyon, Shraddha Lanka,
Lakshmi Navin Arbatti, Gaurika Tyagi, Hung Do
University of San Francisco
SciPy 2017
2. Opportunity: availability of genomic data
Bryant F. McAllister
• More genomic data available for predictive analytics:
health risk assessment, etc.
• 23andme research:
- 2,000,000 genotyped customers
- 85% opt to participate in research
• OpenSNP genomic data
is publicly available
• Ancestry.com, others
3. Goals for genotype -> phenotype
• A general method to be applied to traits or diseases
that have a significant genetic component
• Agnostic to application: uses no domain knowledge
• Prediction of trait
• Emphasis on interpretability of results: detection of
novel SNPs, discovery of rules for interactions of genes
• Use publicly available tools and data sources
4. Genomic information:
some basics
Genomic information is contained in
all the cells of the body, encoded as
nucleotides A,C,T,G
A person’s genome contains ~ 3 billion of these base pairs
Average difference between DNA of unrelated persons is
about 0.1%
There are ~3.8 to 4 million variants in a genome: Single
Nucleotide Polymorphisms (SNPs), insertions, deletions,
duplications, inversions
5. The human genome
A person’s genome is organized into
46 chromosomes: 22 are paired
autosomes, and 2 sex chromosomes.
Chromosomes are inherited: one of each pair from the
mother and one from the father.
Typically a person has 2 copies of each gene, one on
each paired chromosome. They have a combined affect
on trait and functionality.
6. Variants
Variants may occur within a protein coding region, a
regulatory region, an RNA gene, or an unknown region
Some variants have no impact on function. Ex: a gene
may contain a silent mutation or loss of function may
be compensated by the other copy of the gene.
Some phenotypes (traits) are Mendelian: controlled by
whether the inherited gene alleles are dominant or
recessive.
Some traits are polygenic, controlled by more than one
gene in more complex patterns
7. Known variants
Regions of genome containing prevalent SNPs are
typically reported by a sequencing company in variant
call format (vcf) or similar plain text.
Nucleotides in the SNP region are reported
Sequencing is not perfect,
called with ~ 1% error rate.
From a 23andme file
8. Tools
• Human genomic data from OpenSNP,
mostly originating from 23andme and
Ancestry.com. Full dataset ~1.5 TB.
Our dataset of 830 people : 56 GB
• Information on millions of SNPs from dbSNP in .vcf
format - applicable to any human trait: 206 MB
• Python : Numpy, Pandas, Scikit-learn
• R (may use from Python with rpy2)
SNP FAQ Archive [Internet]
SNP file from dbSNP in .vcf format
9. Our Approach
• Build a comprehensive SNP database containing all
SNP information from dbSNP joined with gene
information
• Read each person file, extract those SNPs that occur
in our database
• Reduce the millions of SNPs to a set of 100 or so
candidates from which to select features in
Supervised learning
• Employ ML tools to classify people by phenotype,
rank SNPs, and develop rules for interactions of SNPs.
10. Finding Candidate SNPs
• Encode each SNP as number of mutated alleles:
( 0, 1 or 2)
• Employ Pandas for ease of merging information on
millions of SNPs per person into one data structure
per class.
• Calculate aggregate statistics on each SNP on a class
level: calculate the percentage of members of each
class that have each SNP value.
• Select as candidate features those SNPs that exhibit
within-group commonality and between-group
differences.
11. Machine Learning : scikit-learn
• Split into training and test sets for supervised learning
• Employ random forest feature importance to rank
SNPs by importance to the prediction.
Examine results: are there known or novel SNPs?
• Train a random forest to classify people by phenotype
based on their SNPs.
• Employ cross-validation grid search to tune
hyperparameters for a decision tree to classify.
• Examine the tree for known or novel relationships.
Extract rules.
12. Logistic Regression modeling
• Import into R for all people the 15 SNPs that were
ranked most important by the scikit-learn random
forest: SNPs as a) integer and b) factor variables
• To improve regression fit, remove SNPs whose
correlation is higher than 0.7. These tend to be in
linkage disequilibrium, inherited together
• Detect interactions of SNPs associated with
phenotype by fitting a logistic regression model to the
person-SNP data.
14. Eye color
• Eye color has recessive monogenic aspects:
- breaking both copies of the OCA2 gene disrupts/breaks
pigment production chain, result is blue eye color
- breaking both copies HERC2 region regulating OCA2 results in
OCA2 never activated, stopping pigment protein, blue eye color
• Eye color has polygenic aspects (next slide)
• Limitations of our eye color dataset:
- self-reported data, text descriptions led to judgement calls in
assigning class
- no phasing information (haplotypes are unknown), so difficult to
detect compound heterozygous and polygenic
15. Eye color
Eye color has polygenic aspects:
If OCA2 is broken on one chromosome and HERC2 on
the other, the combination can result in blue eyes while
neither HERC2 nor OCA2 are fully mutated
OCA2 Column1 HERC2Column2
M D M D
0 1 0 1
1 0 1 0
0 1 1 0
1 0 0 1
NB: zeros are unmutated, functional pigmentation genes
Bottom row:
D allele has functioning OCA2, but doesn't get turned on due to broken HERC2
M allele has functioning HERC2, but it regulates for a broken protein
Next higher row is vice-versa
Polygenic relationship: consider haplotype from mother (M), from father (F)
16. Random Forest results
Brown and blue-green eye color were predicted with
89% accuracy using no domain knowledge
Validation of method: RF top 30 SNPs and their genes:
Of the millions of human SNPs and tens of thousands of
genes, all genes of the top 30 SNPs (HERC2, OCA2,
SLC24A4, IRF4, SLC45A2, TYRP1, TYR) are known to be
involved in eye color, and most of the top SNPs appear
in the literature as having predictive value
17. Random Forest
Sensitivity: 0.892
Specificity: 0.869
Accuracy: 0.88
Random Forest
Confusion Matrix
Gene: SNP Importance
HERC2:rs12913832 most important: Han et al, Sturm et al, Eiberg et al, patent SNP, etc.
HERC2:rs1667394 Rotterdam Study patent SNP, Kayser et al, Sulem et al, Sturm et al
HERC2:rs8039195 Kayser et al, Candille et al, substitute in patent for rs7183877
HERC2:rs11636232 Mengel-From et al, Eiberg et al
OCA2:rs4778241 Rotterdam Study patent associated SNP, Kayser et al, Eiberg et al
HERC2:rs16950987 Rotterdam Study patent substitute for rs12592730, Kayser et al
HERC2:rs3935591 Rotterdam Study patent substitute for rs1667394, Eiberg et al,
SLC24A4:rs12887171 unknown clinical significance, 51208 bps from patent SNP rs12896399
OCA2:rs7495174 Rotterdam Study patent SNP, Kayser et al, Sulem et al, Duffy et al
OCA2:rs7174027 Mengel-From et al, SNPedia, substitute in patent for rs7495174
predicted
class blue brown
blue 119 18
brown 15 124
Top 10 SNPs in order of feature importance
Patent: Method for prediction of human iris color US 20110312534 A1
18. Decision Tree Confusion
Matrix Comparison
Training set Test set
234 42 122 15
24 258 14 125
Decision Tree
Sensitivity: 0.899
Specificity: 0.891
Accuracy: 0.895
Sensitivity: 0.915
Specificity: 0.848
Accuracy: 0.882
param_grid={"criterion": ["gini", "entropy"],
"min_samples_split": [.01, .015, .02, .025],
"max_depth": [None, 4, 5],
"min_samples_leaf": [.0025, .005, .01,.015],
"max_features":[.3,.4,.5,]
}
To avoid overfitting,
hyperparameters were tuned
using cross validation grid
search on the training set
20. Decision Tree
The general consensus in the literature is that HERC2
rs12913832 alone is the most important SNP for eye color
prediction. A SNP value of fully mutated (2 mutated alleles)
is indicative of blue eye color
HERC2 rs12913832 appears at the top of the decision tree
and divides nodes broadly into those that are primarily blue
(right-hand side: rs12913832 =2) and nodes that are
primarily brown (left-hand side rs12913832 = 0 or 1)
HERC2 rs12913832 can be involved in polygenic recessive if
both HERC2 rs12913832 and an OCA2 gene are partly
mutated so as to prevent function of both copies of protein
Look for evidence of polygenic relationship on decision tree
21. Example in general agreement with literature:
Node where HERC2rs12913832 = 0 is
97.8% probability Brown
Can we find evidence of polygenic?
Caveat: we don’t know haplotype
Even so, consider:
HERC2rs12913832 = 1
OCA2rs7170989 = 2 : 13.6% probability Blue
Contrast with:
HERC2rs12913832 = 1
OCA2rs7170989 = 0 or 1: 27.4% probability Blue
Node includes all OCA2rs7170989 = 0 (likely brown)
plus polygenic combinations from previous slide:
0101,1010,0110,1001 (last 2 are blue)
Decision Tree rules
22. Logistic Regression on top 15 SNPs
Imported from all people the 15 most important SNPs (identified by RF)
Created a dataset with int encoding of SNPs
Created a dataset with SNPs as factor levels (like one hot encoding)
Removed 7 highly correlated (cor > .7) SNPs
Correlation of remaining 8 SNPs
Correlation of top 15 SNPs
23. Ran step function to find best model of all combinations of main effects and
2-way interactions: then refit with only significant variables
Found 2 significant main effects and 3 significant interactions:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.8174 0.4920 11.825 < 2e-16 ***
HERC2rs12913832 -2.5663 0.3404 -7.540 4.72e-14 ***
TYRP1rs2762458 -1.1576 0.3727 -3.106 0.001895 **
TYRP1rs2762458:IRF4rs3778607 -0.3548 0.1339 -2.649 0.008077 **
HERC2rs12913832:SLC24A4rs12887171 -0.8384 0.2209 -3.796 0.000147 ***
TYRP1rs2762458:SLC24A4rs12887171 0.7311 0.2341 3.123 0.001789 **
HERC2rs12913832:SLC24A4rs12887171 a unit increase in this value results in
a decrease in the log odds of having brown eyes
exp(-0.8384)= 0.43: reduction in odds of having brown eyes by factor of .43
In general agreement with decision tree (next slide)
Logistic Regression with HERC2 rs12913832
as only predictor shows good fit:
In general agreement with literature and
decision tree (previous slide)
Pseudo R2 for logistic regression:
Hosmer and Lemeshow R2: 0.511
Cox and Snell R2: 0.507
Nagelkerke R2: 0.676
Logistic regression on uncorrelated SNPs as int
24. Decision tree comparison with logistic regression
Interaction found in logistic regression (LR):
HERC2rs12913832:SLC24A4rs12887171
Tree combined effect of right branch where
HERC2rs12913832 = 2 , SLC24A4rs12887171 = 1 or 2
Blue increases with mutation, in agreement with LR:
SLC24A4rs12887171 = 0: 73.9 %
SLC24A4rs12887171 = 1 or 2: 95.0%
25. Future
• Apply this method to a human disease with a
significant genetic component to create a risk
assessment tool
• Extend the method to employ elastic net logistic
regression for logistic regression with feature
selection, XGBoost for decision trees with pruning
26. Conclusion
• This is a general method for using supervised learning to
predict phenotype from human genomes
• We focus on gaining understanding: detecting SNPs
important to prediction, elucidating interactions and
relationships between SNPs and genes
• Testing with a well-studied problem achieved good
prediction. We also detected all genes known be
implicated in eye color and the SNPs reported to be most
influential
• We employ publicly available tools and data in an
approach that may be used for the different organisms in
dbSNP databases
• Our code will be publicly available
27. Manfred Heinz Kayser, Fan Liu, Albert Hofman. Patent: “Method for prediction of human iris color” US 20110312534 A1 (2011), data
source is Rotterdam Study, Hofman et al (2007)
Kayser, Manfred, Fan Liu, A. Cecile JW Janssens, Fernando Rivadeneira, Oscar Lao, Kate van Duijn, Mark Vermeulen et al. "Three genome-
wide association studies and a linkage analysis identify HERC2 as a human iris color gene." The American Journal of Human Genetics 82,
no. 2 (2008): 411-423.
Sulem, Patrick, Daniel F. Gudbjartsson, Simon N. Stacey, Agnar Helgason, Thorunn Rafnar, Kristinn P. Magnusson, Andrei Manolescu et al.
"Genetic determinants of hair, eye and skin pigmentation in Europeans." Nature genetics 39, no. 12 (2007): 1443-1452.
Sturm, Richard A., David L. Duffy, Zhen Zhen Zhao, Fabio PN Leite, Mitchell S. Stark, Nicholas K. Hayward, Nicholas G. Martin, and Grant
W. Montgomery. "A single SNP in an evolutionary conserved region within intron 86 of the HERC2 gene determines human blue-brown
eye color." The American Journal of Human Genetics 82, no. 2 (2008): 424-431.
Eiberg, Hans, Jesper Troelsen, Mette Nielsen, Annemette Mikkelsen, Jonas Mengel-From, Klaus W. Kjaer, and Lars Hansen. "Blue eye
color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene
inhibiting OCA2 expression." Human genetics 123, no. 2 (2008): 177-187.
Han, Jiali, Peter Kraft, Hongmei Nan, Qun Guo, Constance Chen, Abrar Qureshi, Susan E. Hankinson et al. "A genome-wide association
study identifies novel alleles associated with hair color and skin pigmentation." PLoS genetics 4, no. 5 (2008): e1000074.
Mengel-From, Jonas, Claus Børsting, Juan J. Sanchez, Hans Eiberg, and Niels Morling. "Human eye colour and HERC2, OCA2 and
MATP." Forensic Science International: Genetics 4, no. 5 (2010): 323-328.
Mengel-From, Jonas, Terence H. Wong, Niels Morling, Jonathan L. Rees, and Ian J. Jackson. "Genetic determinants of hair and eye colours
in the Scottish and Danish populations." BMC genetics 10, no. 1 (2009): 88.
Duffy, David L., Grant W. Montgomery, Wei Chen, Zhen Zhen Zhao, Lien Le, Michael R. James, Nicholas K. Hayward, Nicholas G. Martin,
and Richard A. Sturm. "A three–single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color
variation." The American Journal of Human Genetics 80, no. 2 (2007): 241-252.
Candille, Sophie I., Devin M. Absher, Sandra Beleza, Marc Bauchet, Brian McEvoy, Garrison Nanibaa’A, Jun Z. Li et al. "Genome-wide
association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations." PLoS One 7, no. 10 (2012):
e48294.
Sources: Eye color studies for validation of detected SNPs (slide 17)
28. Top row:
By JDrewes - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3117810
By Larali21 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=22281525
By Grodrig1 - Taken with a cell phone camera, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=18953885
Middle row:
By Picture taken by Matthew Goldthwaite. - 영국말 위키백과에 있는 en:Image:Humaniris.jpg, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=1622680
By Rick - Owner, CC0, https://commons.wikimedia.org/w/index.php?curid=13400572
By Manicjedi - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=16812864
Bottom row:
CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=68654
By Arctice at the English language Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6910071
By Cecikierk - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12347264
Sources: Eye color images from Wikipedia commons (slide 13):