Decision Trees in
Medical Diagnosis
By Sonu Kumar
Head, AI CoE, Orbit Shifters Inc.
Today we will see how we can use the decision tree algorithm in the medical domain. A
decision tree is a simple yet powerful supervised learning algorithm that resembles a
flow chart; we will talk more about this in just a minute. Decision trees are
commonly used in fields such as medicine, astronomy (for example, for filtering
noise from Hubble Space Telescope images or for classifying star-galaxy clusters),
manufacturing and production (for example, by Boeing to discover flaws in the
manufacturing process), and object recognition (for example, for recognizing 3D
objects).
My main aim in this article is to talk about “Using Decision Trees to Make a
Medical Diagnosis”: how we can use a decision tree in the medical domain to extract as
much information as possible and mimic a doctor’s decision-making process. Let’s consider
an example where a number of patients have suffered from the same illness, such as
a rare form of “Basorexia”.
Let us further assume that the true causes of the disease remain unknown to this day,
and that all the information available to us consists of a bunch of physiological
measurements. For example, we might have access to the following information:
• A patient’s blood pressure (‘BP’)
• A patient’s cholesterol level (‘cholesterol’)
• A patient’s gender (‘sex’)
• A patient’s age (‘age’)
• A patient’s blood sodium concentration (‘Na’)
• A patient’s blood potassium concentration (‘K’)
Based on all this information, let’s suppose a doctor recommended that each
patient treat the disease using one of four possible drugs: drug A, B, C, or D. We
have this data available for 20 different patients:
From the data, we can ask: what was the doctor’s reasoning for prescribing
drugs A, B, C, or D? Can we see a relationship between a patient’s blood values and
the drug that the doctor prescribed?
Let’s see if a decision tree can uncover these hidden relationships.
Understanding the data
Understanding the data is the first step in tackling a new ML problem. If you get a
good sense of the data, much of the task is already done. You can see that the “Drug”
column is not a feature like the other columns; it holds the “target labels”. In other
words, the inputs to our ML algorithm will be the blood values, age, and gender of a
patient, and the output will be a prediction of which drug to prescribe. As the “Drug”
column is not numerical (though we can easily convert it for such a small dataset
stored as dictionaries), we know that this is a “classification task”.
Thus, it would be a good idea to remove all ‘drug’ entries from the dictionaries. For
this, we need to go through the list and extract the ‘drug’ entry, which is easiest to
do with a list comprehension.
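The original table is not reproduced here, so the following is only a minimal sketch of that step. The three records are hypothetical placeholders, and the variable names data and target are assumptions.

```python
# Hypothetical records standing in for the patient table (NOT the original data).
data = [
    {'age': 33, 'sex': 'F', 'BP': 'high', 'cholesterol': 'high',
     'Na': 0.66, 'K': 0.06, 'drug': 'A'},
    {'age': 77, 'sex': 'F', 'BP': 'high', 'cholesterol': 'normal',
     'Na': 0.19, 'K': 0.03, 'drug': 'D'},
    {'age': 88, 'sex': 'M', 'BP': 'normal', 'cholesterol': 'normal',
     'Na': 0.80, 'K': 0.05, 'drug': 'B'},
]

# pop() returns the label and removes it from the dictionary in one step,
# so only the features remain in `data` afterwards.
target = [d.pop('drug') for d in data]

print(target)             # ['A', 'D', 'B']
print('drug' in data[0])  # False
```

Using pop instead of plain indexing keeps the labels and the features from ever being mixed up later on.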
For the sake of simplicity, we may want to focus on the numerical features first: ‘age’,
‘K’, and ‘Na’. We can plot them using Matplotlib as follows:
However, this plot is not very informative, because all data points have the same
color. We want each data point to be colored according to the drug that was
prescribed. So, let’s convert the labels ‘A’ to ‘D’ into numerical values. For this, we
can use the ASCII value of each character.
[Find more at: http://www.asciitable.com]
In Python, it is accessible through the built-in function ord. For example, the
character ‘A’ has the value 65, ‘B’ has 66, and so on.
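As a small sketch of that conversion (the label list below is a hypothetical example, not the original data):

```python
# Hypothetical drug labels for five patients.
target = ['A', 'D', 'B', 'C', 'A']

# ord() maps each one-character string to its integer code point.
target_num = [ord(t) for t in target]

print(target_num)  # [65, 68, 66, 67, 65]
```

The absolute values do not matter for plotting; what matters is that each drug gets its own distinct integer.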
We can now pass these integers to Matplotlib’s scatter function, which will know
to choose different colors for the different labels (c=target in the following
code).
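The original plotting code is not available, so here is one possible sketch. The six data points are hypothetical placeholders, and the choice of which feature pair goes in which panel is an assumption; with only three numerical features, the fourth panel of the grid stays empty.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; remove this line to open a window
import matplotlib.pyplot as plt

# Hypothetical values for six patients (placeholders, NOT the original data).
age = [33, 77, 88, 39, 43, 82]
sodium = [0.66, 0.19, 0.80, 0.19, 0.36, 0.09]
potassium = [0.06, 0.03, 0.05, 0.02, 0.03, 0.09]
target = [ord(t) for t in ['A', 'D', 'B', 'C', 'D', 'C']]

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
panels = [
    (sodium, potassium, 'sodium (Na)', 'potassium (K)'),
    (age, potassium, 'age', 'potassium (K)'),
    (age, sodium, 'age', 'sodium (Na)'),
]
for ax, (x, y, xlabel, ylabel) in zip(axes.flat, panels):
    # c=target colors every point by its (numeric) drug label.
    ax.scatter(x, y, c=target, s=100)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
# The fourth panel of the 2x2 grid is left empty in this sketch.
```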
The preceding code will produce a figure with a 2×2 grid containing four subplots, as
follows:
Can you spot any relationship between the feature values and the target labels?
There are some interesting observations we can make. For example, from the first
and third subplots, we can see that the light blue points are clustered around high
sodium levels. Similarly, all red points seem to have both low sodium and low
potassium levels. The rest is less clear. So let’s see how a decision tree can help us
here.
Preprocessing the data
In order for our data to be understood by our decision tree algorithm, we need to
convert all categorical features into numerical features. We will do it using
scikit-learn’s DictVectorizer. We feed the dataset that we want to convert to the
fit_transform method:
If we want to look at the first data point, we can match the feature names with the
corresponding feature values:
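A minimal sketch of both steps, assuming three hypothetical records in place of the original table. The names vec and data_pre are assumptions; feature_names_ is the attribute in which DictVectorizer stores the generated column names.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical feature dictionaries (labels already removed).
data = [
    {'age': 33, 'sex': 'F', 'BP': 'high', 'cholesterol': 'high',
     'Na': 0.66, 'K': 0.06},
    {'age': 77, 'sex': 'F', 'BP': 'high', 'cholesterol': 'normal',
     'Na': 0.19, 'K': 0.03},
    {'age': 88, 'sex': 'M', 'BP': 'normal', 'cholesterol': 'normal',
     'Na': 0.80, 'K': 0.05},
]

# sparse=False returns a dense NumPy array instead of a sparse matrix.
vec = DictVectorizer(sparse=False)
data_pre = vec.fit_transform(data)

# String-valued features are one-hot encoded ('BP=high', 'sex=F', ...),
# numerical features are passed through unchanged.
for name, value in zip(vec.feature_names_, data_pre[0]):
    print(name, value)
```

Each categorical feature becomes one column per observed category, which is why the three records above yield nine columns from six original features.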
To make sure that our data variables are compatible with OpenCV, we need to
convert everything to floating-point values:
Then all that’s left is to split the data into 15 training samples and 5 test samples.
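The conversion and the 15/5 split could look like the sketch below. The arrays are random placeholders with the right shape, and the use of train_test_split with a fixed random_state is an assumption, not necessarily what the original code did.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random placeholders standing in for the 20 vectorized patients.
rng = np.random.default_rng(42)
data_pre = rng.random((20, 9))
target = rng.integers(65, 69, size=20)  # codes of 'A'..'D'

# OpenCV's ML module expects 32-bit floating-point arrays.
data_pre = np.array(data_pre, dtype=np.float32)
target = np.array(target, dtype=np.float32)

# An integer test_size is an absolute number of test samples.
X_train, X_test, y_train, y_test = train_test_split(
    data_pre, target, test_size=5, random_state=42)

print(X_train.shape, X_test.shape)  # (15, 9) (5, 9)
```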
Constructing the tree
Building a decision tree with OpenCV is relatively easy compared to other algorithms. We
can create an empty decision tree using the following:
In order to train the decision tree on the training data, we use the method train:
Then we can predict the labels of the new data points with predict:
We can even check the score as follows:
This shows that we got only 40 percent of the test samples right. Since there are only
5 samples to test, this means that 2 out of 5 were classified correctly, which I
consider acceptable for such a small dataset.
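To make the arithmetic behind that score explicit, here is a small sketch. The accuracy helper and both label lists are hypothetical; they are chosen only so that 2 of 5 predictions match.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Hypothetical true and predicted drugs for the 5 test patients.
y_test = ['A', 'B', 'C', 'D', 'A']
y_pred = ['A', 'C', 'C', 'B', 'D']

print(accuracy(y_test, y_pred))  # 0.4
```

With only 5 test samples, a single additional correct prediction would move the score by 20 percentage points, so the number should be read with caution.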
Let’s check how it performs on the training set:
Voila! The decision tree scores 100 percent on the training data while doing much
worse on the test data. This is called “overfitting”. We will talk about it.
Visualizing a trained decision tree
I think it’s time to switch to scikit-learn. Its implementation allows us to customize
the algorithm and makes it a lot easier to investigate the inner workings of the tree.
It resides in the tree module.
We can create an empty decision tree using the DecisionTreeClassifier constructor:
We can train it using the fit method. We can compute the accuracy score on both the
training and test samples using the score method:
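A minimal sketch of these steps follows. The features are random placeholders and the labels are a fixed hypothetical pattern, so the test score it prints says nothing about the original data; the name dtc is an assumption.

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split

# Placeholder data: 20 samples, 6 features, 4 classes (NOT the original data).
rng = np.random.default_rng(0)
X = rng.random((20, 6)).astype(np.float32)
y = np.array([0, 1, 2, 3] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=5, random_state=0)

# Create an empty tree, then train it on the 15 training samples.
dtc = tree.DecisionTreeClassifier(random_state=0)
dtc.fit(X_train, y_train)

# score() returns the mean accuracy on the given samples.
print('train:', dtc.score(X_train, y_train))
print('test: ', dtc.score(X_test, y_test))
```

An unconstrained tree keeps splitting until every leaf is pure, which is why the training score reaches 1.0 even on random data. Limiting max_depth is one common way to rein this in.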
Here’s the cool thing: if you want to know what the tree looks like, you can use
GraphViz to create a PDF file (or any other supported file type) from the tree
structure. You have to install GraphViz first using a conda or pip command:
Then, back in the IDE, you can import GraphViz and export the tree in GraphViz
format to a file tree.dot using scikit-learn’s export_graphviz exporter:
Then, back on the command line, you can use GraphViz to turn tree.dot into (for example)
a PNG file:
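The export step might look like the sketch below. The four training rows and the feature and class names are hypothetical; export_graphviz returns the DOT source as a string when out_file=None, which is then written to tree.dot.

```python
from sklearn import tree

# Tiny hypothetical training set: [Na, cholesterol=high] -> drug.
X = [[0.2, 0], [0.9, 1], [0.5, 0], [0.8, 1]]
y = ['C', 'A', 'C', 'A']
dtc = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

# With out_file=None the DOT description is returned as a string.
dot_text = tree.export_graphviz(
    dtc,
    out_file=None,
    feature_names=['Na', 'cholesterol=high'],
    class_names=['A', 'C'],  # must follow the sorted order of dtc.classes_
    filled=True,
)

with open('tree.dot', 'w') as f:
    f.write(dot_text)
```

The resulting tree.dot is a plain-text file, so it can be inspected in any editor before it is rendered.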
Investigating the inner workings of a decision
tree
The process starts at the root node, where we split the data into two groups based
on some decision rule. Then the process is repeated until all remaining samples
have the same target label, at which point we have reached a leaf node. You can see
that the first question asked was whether the sodium concentration was smaller than or
equal to 0.72. This resulted in two subgroups:
• All data points where Na <= 0.72 (node 1), which was true for 9 data points
• All data points where Na > 0.72 (node 2), which was true for 6 data points
At node 1, the next question asked was whether the remaining data points did not
have high cholesterol levels, which was true for 5 data points (node 3) and false for
4 data points (node 4). At node 3, all 5 remaining data points had the same target
label, which was drug C (class=C), meaning that there was no more ambiguity to resolve.
We call such nodes pure. Thus, node 3 became a leaf node. Back at node 4, the next
question asked was whether sodium levels were lower than or equal to 0.445
(Na <= 0.445), and the remaining 4 data points were split into node 7 and node 8. At
this point, both node 7 and node 8 became leaf nodes.
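What a single decision node does can be sketched in a few lines of plain Python. The split_node helper and the five sodium values are hypothetical illustrations, not scikit-learn internals.

```python
def split_node(samples, feature, threshold):
    """Send samples with feature <= threshold left, all others right."""
    left = [s for s in samples if s[feature] <= threshold]
    right = [s for s in samples if s[feature] > threshold]
    return left, right

# Hypothetical sodium readings for five patients.
samples = [{'Na': 0.19}, {'Na': 0.66}, {'Na': 0.72},
           {'Na': 0.80}, {'Na': 0.91}]

left, right = split_node(samples, 'Na', 0.72)
print(len(left), len(right))  # 3 2
```

A tree applies this same step recursively to each subgroup until a subgroup contains only one target label.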
Rating the importance of features
The preceding root node split the data according to Na <= 0.72, but who told the
tree to focus on sodium first? Also, where does the number 0.72 come from
anyway?
scikit-learn provides a way to rate “feature importance”, which is a number
between 0 and 1 for each feature, where 0 means not used at all in any decision
made and 1 means perfectly predicts the target.
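In scikit-learn, a fitted tree exposes these ratings through its feature_importances_ attribute. The sketch below uses random placeholder features and hypothetical feature names, so the printed numbers are meaningless beyond illustrating the call.

```python
import numpy as np
from sklearn import tree

# Placeholder data: 20 samples, 4 features, 4 classes (NOT the original data).
rng = np.random.default_rng(0)
feature_names = ['BP=high', 'Na', 'K', 'age']  # hypothetical column names
X = rng.random((20, 4))
y = np.array([0, 1, 2, 3] * 5)

dtc = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

# One value per feature; the values are normalized so they sum to 1.
for name, importance in zip(feature_names, dtc.feature_importances_):
    print(f'{name:10s} {importance:.3f}')
```

A feature that never appears in a split gets an importance of exactly 0.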
Now it becomes evident that the most telling feature for knowing which drug to
administer to patients was actually whether the patient had a normal cholesterol
level. Age, sodium, and potassium levels were also important. Gender and blood
pressure did not seem to make any difference, but that doesn’t mean that this
information is useless.
But, hold on. If the cholesterol level is so important, why was it not picked as the
first feature in the tree (that is, the root node)? Why would you choose to split on the
sodium level first? This is where I need to tell you about that ominous “gini” label in
the figure earlier.
• criterion='gini': The Gini impurity is a measure of misclassification, with the aim
of minimizing the probability of misclassification. A perfect split of the data,
where each subgroup contains data points of a single target label, would result
in a Gini index of 0. We can measure the Gini index of every possible split of the
tree, and then choose the one that yields the lowest Gini impurity.
• criterion='entropy': Splitting on this criterion is also known as maximizing
information gain. Entropy is a measure of the amount of uncertainty associated with a
signal or distribution. A perfect split of the data would have 0 entropy.
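Both measures can be computed by hand, which also shows where a threshold such as 0.72 comes from: it is the candidate split with the lowest weighted impurity. The helper functions and label lists below are a hypothetical sketch using the standard textbook definitions, not scikit-learn's internal code.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_impurity(groups, measure):
    """Impurity of a split: per-group impurity weighted by group size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)

mixed = ['A', 'A', 'B', 'B']             # hypothetical node before splitting
perfect = [['A', 'A'], ['B', 'B']]       # split into two pure subgroups

print(gini(mixed), entropy(mixed))       # 0.5 1.0
print(weighted_impurity(perfect, gini))  # 0.0
```

A tree-growing algorithm evaluates weighted_impurity for every candidate feature and threshold and keeps the split with the smallest value.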
If you want to use entropy, you would type the following:
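A minimal sketch, assuming the scikit-learn classifier from earlier; the four training rows are hypothetical and only there to show that the tree trains as usual.

```python
from sklearn import tree

# Same constructor as before, with the split criterion switched to entropy.
dtc_entropy = tree.DecisionTreeClassifier(criterion='entropy', random_state=0)

# Tiny hypothetical training set: one feature (Na) -> drug.
X = [[0.2], [0.4], [0.8], [0.9]]
y = ['C', 'C', 'A', 'A']
dtc_entropy.fit(X, y)

print(dtc_entropy.score(X, y))  # 1.0
```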
So, this was a very simple approach to using a decision tree in medical diagnosis.
If you want to practice on a real dataset, you can use the “Breast Cancer (Wisconsin)”
dataset from the UCI Machine Learning Repository.
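One convenient starting point, offered here as a suggestion rather than as part of the original walkthrough, is the copy of the Wisconsin diagnostic breast cancer data that ships with scikit-learn. The 80/20 split and the random_state values below are arbitrary assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 569 samples with 30 numerical features; labels are malignant/benign.
X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

dtc = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print('train:', dtc.score(X_train, y_train))
print('test: ', dtc.score(X_test, y_test))
```

Comparing the two scores on this larger dataset is a good way to see how much of the overfitting observed above was due to having only 20 patients.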
Thank You
