PHARMACOGENOMIC DATA MINING
with Hierarchical Clustering Algorithms
Ohene Z. Frank
CSC 576 Data Warehousing and Mining
Final Report
Frank | PGX Data Mining 1
PHARMACOGENOMIC DATA MINING WITH HIERARCHICAL CLUSTERING ALGORITHMS
Designer drugs, individualized drugs and personalized medicine are a few of the
buzzwords proliferating along the biotech information superhighway; they are
widely used by pharmaceutical scientists, clinical scientists, researchers and medical
humanitarians when referring to pharmacogenomics. Malorye Branca of Bio-IT
World stated, “One of the most seductive lures of the genomic revolution is the
promise of personalized medicine” [4]. Pharmacogenomics is the study of how one’s
genetic makeup affects the body’s response to drugs, and hence lies at the intersection of
genetics, pharmacodynamics and pharmacokinetics. Pharmacogenetics is widely
used synonymously with pharmacogenomics. Conceptually, these terms
are interchangeable, but from a purist view, pharmacogenomics is the technology
whereas pharmacogenetics is the science. Genaissance Pharmaceuticals defined
pharmacogenomics as the application of genome science (genomics) to the study
of human variability in drug response.
So, what’s the real tumult? In the United States, there are at least 100,000 deaths
annually due to adverse reactions (side effects) to prescription drugs. Moreover,
millions of people are being treated with drugs that are ineffective or have very little
pharmacological effect: beta-blockers given to reduce blood pressure are
ineffective in one-third of patients, and many antidepressants are ineffective in half
of the people who take them [1].
The culpability for the lack of efficacy and intolerance of many drugs lies mainly with
our genes, which help to determine the way in which our body reacts to, absorbs,
distributes, metabolizes and excretes drugs. Small genetic variations between
people (known as polymorphisms) can alter the behavior of proteins that carry a
drug to its target cells or tissues, neutralize the enzymes that activate a drug or aid in
the excretion process, or alter the structure of the receptor to which a drug is
supposed to bind [1]. Variation in immune-system genes can also influence how
particular drugs are tolerated. These slight genetic variations mean that the dose at
which a drug will work may vary hugely from person to person; hence,
one-size-fits-all drug development and prescribing can lead to life-threatening
adverse drug reactions or, in some cases, death.
On the right path forward, the genomics revolution has given us the tools to identify
people who don't fit the standard prescribing mold. Genomics is the use of
high-throughput molecular biology technologies to study large numbers of genes and
gene products simultaneously in whole cells, whole tissues, or whole organisms [2].
The genome is all of the genetic material in a cell or an organism. According to the
U.S. Department of Energy, the genome is an organism’s complete set of DNA. In
the human genome, DNA is arranged into 24 distinct chromosomes, which are
physically separate molecules that range in length from about 50 million to 250
million base pairs [3]. Each chromosome is a single, very long DNA double helix
(as illustrated in Figure 1).
Figure 1¹: Illustration of a chromosome replicating its DNA before a cell divides.
Single nucleotide polymorphisms (SNPs) are single-letter variations in the genetic
code that are scattered throughout the genome. Most SNPs are benign, with
absolutely no effect on gene structure or expression; however, a subset of these
variations provides crucial links to disease-causing genes, either because they
directly alter a gene's activity or aid in pinpointing the location of a disease-related
gene [1].
¹ Figure courtesy of Genaissance Pharmaceuticals, Inc.
The profusion of SNPs and the ease with which they can be identified make them ideal
biomarkers for clinical studies. SNPs are also found in genes for drug-metabolizing
enzymes, influencing individuals' ability to process a drug properly.
Many companies have compiled large collections of SNPs with the intention of
developing diagnostic and prognostic tests, as well as guiding the development of
a new generation of drugs that would target genetically determined subsets of
patients [1]. All in all, this type of genomic technology, which aims to identify the best
possible medications for individuals by maximizing efficacy and minimizing toxicity,
is known as pharmacogenomics.
Due to the gravity and promise of pharmacogenomics, several genomics companies
are manufacturing DNA microarrays to identify common SNPs that influence the
activity of various enzymes. Ultimately, these chips could help to
prevent life-threatening reactions to drugs, identify appropriate drug doses, and
select the right drug combination (or concomitant medications) to give to
patients with complex conditions.
To bring this to fruition at a faster pace, one can apply data mining
techniques, specifically hierarchical clustering algorithms, to a clinical data
warehouse that contains both clinical trial data and genomic data (anonymized
genotyping and microarray data).
The data mining technique most widely utilized for the analysis of gene expression
data is hierarchical clustering. This type of clustering algorithm has the advantage
of being relatively simple, and its results can be easily visualized. Hierarchical
clustering is an agglomerative approach in which single expression profiles are
joined to form groups, which are further joined until the process has been carried to
completion, forming a single hierarchical tree [5].
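The agglomerative procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in any of the cited work: the toy profiles, the function names `single_linkage` and `agglomerate`, and the choice of Euclidean distance with single linkage are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def single_linkage(a, b):
    """Smallest pairwise distance between two clusters of profiles."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(profiles, linkage=single_linkage):
    """Merge the two closest clusters until only one remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a hierarchical tree is drawn from.
    """
    # Start with every profile in its own cluster, keyed by its index.
    clusters = {(i,): [p] for i, p in enumerate(profiles)}
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        history.append((a, b, linkage(clusters[a], clusters[b])))
        # Replace the two clusters with their union.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return history


# Hypothetical expression profiles for four genes across three samples.
toy_profiles = [
    (0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (5.0, 5.0, 5.0),
    (5.0, 5.0, 7.0),
]
merges = agglomerate(toy_profiles)
```

With these toy values, profiles 0 and 1 are joined first, then profiles 2 and 3, and the two resulting groups are joined last, which completes the tree. The exhaustive pair search is kept for readability and would be too slow for a full microarray data set.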
There are six main hierarchical clustering algorithms (single-linkage, complete-linkage,
average-linkage, weighted pair-group average, within-groups and Ward’s
method) that can be applied to gene expression profiling (microarray) data analysis.
These clustering algorithms differ in how distances are
calculated between the growing clusters and the remaining members (including
other clusters) in the data set [5].
Single-linkage Clustering: This method is also referred to as the minimum, or
nearest-neighbor, method. The distance between two clusters, x and y, is
calculated as the minimum distance between a member of cluster x and a
member of cluster y. This method tends to produce “loose” clusters, because two
clusters can be joined if any two of their members are close together. It often
results in the sequential addition of single samples to an existing cluster, which in
turn produces trees with many long, single-addition branches representing clusters
that have grown by accumulation.
Complete-linkage Clustering: This method is also referred to as the maximum
or furthest-neighbor method. The distance between two clusters is calculated
as the greatest distance between members of the relevant clusters. This
method tends to produce very compact clusters of elements and the clusters
are often very similar in size.
Average-linkage Clustering: This method is also referred to as the unweighted
pair-group method using arithmetic averages (UPGMA). The distance between two
clusters is the average of the distances between each point in one cluster and
every point in the other cluster. The two clusters with the lowest average
distance are joined together to form a new cluster.
Weighted Pair-group Average: This method is identical to average-linkage
clustering (as described above), except that the size of the respective clusters
is used as a weight in the computations. This method should be used when
the cluster sizes are suspected to be greatly uneven.
Within-groups Clustering: This method is also similar to average-linkage clustering,
except that the clusters are merged and a cluster average is used for
further calculations instead of the individual cluster elements. This method
tends to produce tighter clusters than average-linkage clustering.
Ward's Method: In this method, clusters are determined by calculating the total
sum of squared deviations from the mean of each cluster and joining the pair of
clusters that produces the smallest possible increase in the sum of squared
errors.
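As a rough sketch of how the linkage rules above differ, the following Python functions compute the between-cluster distance for single-, complete- and average-linkage, plus the merge cost that Ward's method minimizes. The function names and the toy clusters `x` and `y` are hypothetical, Euclidean distance is assumed, and the weighted pair-group and within-groups variants are left out because the descriptions above do not pin down their exact formulas.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def pairwise(x, y):
    """All distances between a member of cluster x and a member of cluster y."""
    return [dist(p, q) for p in x for q in y]


def single_linkage(x, y):
    return min(pairwise(x, y))  # nearest neighbor


def complete_linkage(x, y):
    return max(pairwise(x, y))  # furthest neighbor


def average_linkage(x, y):
    d = pairwise(x, y)
    return sum(d) / len(d)  # unweighted pair-group average


def centroid(cluster):
    """Component-wise mean of the profiles in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def sse(cluster):
    """Sum of squared deviations of the members from the cluster mean."""
    c = centroid(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)


def ward_cost(x, y):
    """Increase in the total sum of squared errors if x and y are merged."""
    return sse(x + y) - sse(x) - sse(y)


# Hypothetical two-dimensional profiles, chosen for easy arithmetic.
x = [(0.0, 0.0), (0.0, 2.0)]
y = [(4.0, 0.0), (4.0, 2.0)]
```

For these toy clusters, single-linkage gives 4, complete-linkage gives the diagonal distance (about 4.47), average-linkage falls between the two, and the Ward merge cost is 16.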
Figure 3²: Hierarchical Clustering Demonstration
Figure 3 shows gene expression data that were subjected to average-linkage,
complete-linkage and single-linkage hierarchical clustering using a
Euclidean distance metric; gene-expression families (A–J) are color coded
for comparison. Genes that are up-regulated appear in red, and those that are
down-regulated appear in green, with the relative log2(ratio) reflected by the
intensity of the color [5].
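The color coding described for the figure can be illustrated with a small Python sketch. It is only an approximation of the idea: the function names, the RGB encoding and the `saturation` cutoff are assumptions for this example and are not taken from the figure's source.

```python
from math import log2


def log2_ratio(sample, reference):
    """log2 of the expression ratio; positive means up-regulated."""
    return log2(sample / reference)


def heatmap_color(value, saturation=3.0):
    """Map a log2 ratio to an (R, G, B) tuple: red if up, green if down.

    `saturation` is an assumed cutoff beyond which the color is at full
    intensity; a ratio of exactly zero maps to black.
    """
    intensity = round(255 * min(abs(value), saturation) / saturation)
    return (intensity, 0, 0) if value > 0 else (0, intensity, 0)
```

For example, a gene expressed at one quarter of the reference level has a log2 ratio of -2 and is drawn in a medium green, while a gene at or beyond eight times the reference level is drawn in full red.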
² Courtesy of Nature Reviews, Nature Publishing Group
The aim and allure of pharmacogenomic data mining is to discover knowledge
from a clinical genomic data warehouse (comprising both genomic and clinical
data) in order to identify and prescribe the most effective and least toxic drug for
an individual based on the person’s genetic makeup and the targeted disease.
References
[1] Abbott, A., Nature 425, 760-762 (23 October 2003).
[2] Genaissance Pharmaceuticals, Inc., Online Glossary (2004).
[3] US Department of Energy, Human Genome Information Project,
Pharmacogenomics (2004).
[4] Branca, M., The New, New Pharmacogenomics, Bio-IT World (Sept. 9, 2002).
[5] Quackenbush, J., Nature Reviews Genetics 2, 418-427 (2001).
[6] Brown, M., Essentials of Medical Genomics, 163-198 (2003).
[7] Hollinger, M.A., Introduction to Pharmacology 2, 288-290 (2003).

 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 

PGX Data Mining

  • 1. PHARMACOGENOMIC DATA MINING with Hierarchical Clustering Algorithms Ohene Z. Frank CSC 576 Data Warehousing and Mining Final Report
  • 2. Frank | PGX Data Mining 1 PHARMACOGENOMIC DATA MINING WITH HIERARCHICAL CLUSTERING ALGORITHMS Designer drugs, individualized drugs and personalized medicine are a few of the buzzwords proliferating across the biotech information superhighway; they are widely used by pharmaceutical scientists, clinical scientists, researchers and medical humanitarians when referring to pharmacogenomics. Malorye Branca of Bio-IT World stated, “One of the most seductive lures of the genomic revolution is the promise of personalized medicine” [4]. Pharmacogenomics is the study of how one’s genetic makeup affects the body’s response to drugs, and hence sits at the intersection of genetics, pharmacodynamics and pharmacokinetics. Pharmacogenetics is widely used synonymously with pharmacogenomics. Conceptually, these terms are interchangeable, but from a purist view, pharmacogenomics is the technology whereas pharmacogenetics is the science. Genaissance Pharmaceuticals defined pharmacogenomics as the application of genome science (genomics) to the study of human variability in drug response. So, what’s the real tumult? In the United States, there are at least 100,000 deaths annually due to adverse reactions (side effects) to prescription drugs. Moreover, millions of people are being treated with drugs that are ineffective or have very little pharmacological effect; beta-blockers given to reduce blood pressure are ineffective in one-third of patients, and many antidepressants are ineffective in half of the people who take them [1]. The culpability for the lack of efficacy and the intolerance of many drugs lies mainly with our genes, which help to determine the way in which our body reacts to, absorbs,
  • 3. distributes, metabolizes and excretes drugs. Small genetic variations between people (known as polymorphisms) can alter the behavior of proteins that carry a drug to its target cells or tissues, neutralize the enzymes that activate a drug or aid in the excretion process, or alter the structure of the receptor to which a drug is supposed to bind [1]. Variation in immune-system genes can also influence how particular drugs are tolerated. These slight genetic variations mean that the dose at which a drug will work may vary hugely from person to person; hence, one-size-fits-all drug development and prescribing can lead to a life-threatening adverse reaction to a drug or, in some cases, death. On the right path forward, the genomics revolution has given us the tools to identify people who don't fit the standard prescribing mold. Genomics is the use of high-throughput molecular biology technologies to study large numbers of genes and gene products simultaneously in whole cells, whole tissues, or whole organisms [2]. The genome is all of the genetic material in a cell or an organism. According to the U.S. Department of Energy, the genome is an organism’s complete set of DNA. In the human genome, DNA is arranged into 24 distinct chromosomes, which are physically separate molecules that range in length from about 50 million to 250 million base pairs [3]. Each chromosome is a single, very long DNA double-helix molecule (as illustrated in Figure 1).
  • 4. Figure 1: Illustration of a chromosome replicating its DNA before a cell divides (figure courtesy of Genaissance Pharmaceuticals, Inc.). Single nucleotide polymorphisms (SNPs) are single-letter variations in the genetic code that are scattered throughout the genome. Most SNPs are benign, with absolutely no effect on gene structure or expression; however, a subset of these variations provides crucial links to disease-causing genes, either because they directly alter a gene's activity or because they aid in pinpointing the location of a disease-related gene [1].
  • 5. The profusion of SNPs and the ease with which they can be identified make them ideal biomarkers for clinical studies. SNPs are also found in genes for drug-metabolizing enzymes, where they influence an individual's ability to process a drug properly. Many companies have compiled large collections of SNPs with the intention of developing diagnostic and prognostic tests, as well as guiding the development of a new generation of drugs that would target genetically determined subsets of patients [1]. All in all, this type of genomic technology, which aims to identify the best possible medications for individuals while maximizing efficacy and minimizing toxicity, is known as pharmacogenomics. Given the gravity and promise of pharmacogenomics, several genomics companies are manufacturing DNA microarrays to identify common SNPs that influence the activity of various enzymes. Ultimately, these gene expression chips could help to prevent life-threatening reactions to drugs, identify appropriate drug doses, and determine the right drug combination (or concomitant medications) to give to patients with complex conditions. To bring this to fruition at a faster pace, one can apply data mining techniques, using hierarchical clustering algorithms, to a clinical data warehouse that contains both clinical trials data and genomic data (anonymized genotyping and microarray data).
  • 6. The data mining technique most widely utilized for the analysis of gene expression data is hierarchical clustering. This type of clustering algorithm has the advantage of being relatively simple, and its results can be easily visualized. Hierarchical clustering is an agglomerative approach in which single expression profiles are joined to form groups, which are further joined until the process has been carried to completion, forming a single hierarchical tree [5]. There are six main hierarchical clustering algorithms (single-linkage, complete-linkage, average-linkage, weighted pair-group average, within-groups and Ward’s method) that can be applied to gene expression profiling (microarray) data analysis. These clustering algorithms differ in how distances are calculated between the growing clusters and the remaining members (including other clusters) of the data set [5]. Single-linkage Clustering: This method is also referred to as the minimum, or nearest-neighbor, method. The distance between two clusters, x and y, is calculated as the minimum distance between a member of cluster x and a member of cluster y. This method tends to produce “loose” clusters, because two clusters can be joined if any two of their members are close together. It often results in the sequential addition of single samples to an existing cluster, which in turn produces trees with many long, single-addition branches representing clusters that have grown by accumulation. Complete-linkage Clustering: This method is also referred to as the maximum, or furthest-neighbor, method. The distance between two clusters is calculated
  • 7. as the greatest distance between members of the relevant clusters. This method tends to produce very compact clusters of elements, and the clusters are often very similar in size. Average-linkage Clustering: This method is also referred to as the unweighted pair-group method average. The distance between two clusters is calculated as the average of the distances between each point in one cluster and all points in the other cluster. The two clusters with the lowest average distance are joined together to form a new cluster. Weighted Pair-group Average: This method is identical to average-linkage clustering (as described above), except that the size of the respective clusters is used as a weight in the computations. This method should be used when the cluster sizes are suspected to be greatly uneven. Within-groups Clustering: This method is also similar to average-linkage clustering, except that the clusters are merged and a cluster average is used for further calculations instead of the individual cluster elements. This method tends to produce tighter clusters than average-linkage clustering. Ward's Method: This method calculates the total sum of squared deviations from the mean of each cluster and joins the clusters whose merger produces the smallest possible increase in the sum of squared errors.
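The linkage rules described above differ only in how the cost of joining two clusters is scored, which a short sketch can make concrete. The Python below is a minimal illustration rather than the method of any particular package: the function names `centroid`, `merge_cost` and `agglomerate` and the four sample profiles are hypothetical, Euclidean distance is assumed, and only single, complete, average and Ward's linkage are covered. The expression used for Ward's increase in squared error is the standard size-weighted squared centroid distance, which the report itself does not spell out.

```python
from itertools import combinations
from math import dist


def centroid(cluster, points):
    """Mean profile of a cluster given as a list of row indices."""
    dims = len(points[0])
    return [sum(points[i][k] for i in cluster) / len(cluster) for k in range(dims)]


def merge_cost(a, b, points, method):
    """Cost of joining clusters a and b under the chosen linkage rule."""
    if method == "ward":
        # Increase in the within-cluster sum of squared errors caused by the merge.
        gap = dist(centroid(a, points), centroid(b, points))
        return len(a) * len(b) / (len(a) + len(b)) * gap ** 2
    pairwise = [dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # nearest neighbor
        return min(pairwise)
    if method == "complete":    # furthest neighbor
        return max(pairwise)
    if method == "average":     # unweighted pair-group average
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, method="average"):
    """Join the two cheapest clusters repeatedly; return the merge history."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]], points, method),
        )
        cost = merge_cost(clusters[x], clusters[y], points, method)
        history.append((sorted(clusters[x]), sorted(clusters[y]), cost))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return history


# Hypothetical expression profiles (one row per gene, one column per condition).
profiles = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (0.0, 7.0)]

for rule in ("single", "complete", "average", "ward"):
    print(rule, [round(cost, 3) for _, _, cost in agglomerate(profiles, rule)])
```

The merge history is the information a dendrogram draws: each entry names the two clusters joined and the height at which they join. This sketch recomputes every pairwise cost at each step for clarity, so it is suitable only for small examples.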
  • 8. Figure 3: Hierarchical Clustering Demonstration (figure courtesy of Nature Reviews, Nature Publishing Group). Figure 3 is a representation of gene expression data that were subjected to average-linkage, complete-linkage and single-linkage hierarchical clustering using a Euclidean distance metric, with gene-expression families (A–J) color coded for comparison. Genes that are up-regulated appear in red, and those that are down-regulated appear in green, with the relative log2(ratio) reflected by the intensity of the color [5].
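As a rough illustration of the quantities named in the figure description, the sketch below computes log2 ratios from two sets of intensities, assigns each value the red or green coloring described, and measures the Euclidean distance between two profiles. All names and numbers are hypothetical, and showing an unchanged gene as black is an assumption that the report does not state.

```python
from math import dist, log2


def log2_ratios(sample, reference):
    """log2(sample / reference) for each pair of intensities."""
    return [log2(s / r) for s, r in zip(sample, reference)]


def heatmap_color(value):
    """Red for up-regulated, green for down-regulated (black is an assumption)."""
    if value > 0:
        return "red"
    if value < 0:
        return "green"
    return "black"


# Hypothetical intensities for one gene across three conditions.
gene_a = log2_ratios([400.0, 100.0, 50.0], [100.0, 100.0, 200.0])
gene_b = log2_ratios([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])

print(gene_a, [heatmap_color(v) for v in gene_a])
print("Euclidean distance between profiles:", dist(gene_a, gene_b))
```

Working in log2 ratios makes a doubling and a halving equal in magnitude and opposite in sign, which is why color intensity can be read symmetrically around zero.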
  • 9. The aim and allure of pharmacogenomic data mining is to discover knowledge from a clinical genomic data warehouse (comprising both genomic and clinical data), in order to identify and prescribe the most effective and least toxic drug for an individual based on the person’s genetic makeup and the targeted disease. References [1] Abbott, A., Nature 425, 760-762 (23 October 2003). [2] Genaissance Pharmaceuticals, Inc., Online Glossary (2004). [3] US Department of Energy, Human Genome Information Project, Pharmacogenomics (2004). [4] Branca, M., The New, New Pharmacogenomics, Bio-IT World (Sept. 9, 2002). [5] Quackenbush, J., Nature Reviews Genetics 2, 418-427 (2001). [6] Brown, M., Essentials of Medical Genomics, 163-198 (2003). [7] Hollinger, M.A., Introduction to Pharmacology 2, 288-290 (2003).