Kyla Marino EE 471: Machine Learning
3/16/2016
Decision Tree for Democratic Primary
Problem Statement
Politicians attempt to predict voter behavior predominantly through landline telephone polls. This causes a gap between prediction and outcome, because landline telephones have largely given way to cell phones, which are protected by laws against cold calls. A more accurate method would be to predict the outcome from previous voting results.
Theory
A decision tree has a simple structure, reminiscent of its namesake, that can be broken down into leaf and non-leaf nodes. A leaf holds the class name, or decision; each non-leaf node is a test on an attribute. Decision trees attempt to divide the data set so that each non-leaf node separates the classes as cleanly as possible. An optimal build starts with the most informative test at the root, followed by the next most informative, and so on, until every branch ends in a leaf [1]. The best decision tree contains only non-trivial partitions and, when several trees fit the data, is the simplest of them.
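The leaf/non-leaf structure can be sketched as a small data structure. This is a minimal illustration only: the class names `Leaf` and `Node` and the `classify` helper are hypothetical, not part of the original report, and the example tree is made up rather than taken from Figure 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf stores the class name, i.e. the decision."""
    label: str


@dataclass
class Node:
    """A non-leaf node tests one attribute and has one subtree per attribute value."""
    attribute: str
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree: Union[Node, Leaf], example: Dict[str, str]) -> str:
    """Follow the attribute tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[example[tree.attribute]]
    return tree.label


# Hypothetical two-level tree, for illustration only.
toy = Node("education", {
    "yes": Leaf("Sanders"),
    "no": Node("population", {"yes": Leaf("Sanders"), "no": Leaf("Clinton")}),
})
print(classify(toy, {"education": "no", "population": "no"}))  # Clinton
```

Classification only ever walks one root-to-leaf path, which is why a shallow tree with informative tests near the root is cheap to evaluate.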
The attribute with the highest information gain, i.e. the one whose partitions leave the least remaining entropy, gives the most informative partition. Entropy is calculated from the class proportions p_i using:

$$\mathrm{Entropy}(p) = -\sum_i p_i \log_2 p_i \qquad \text{(eq. 1)}$$
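Eq. 1 translates directly into a few lines of Python. In this sketch the `entropy` helper is a hypothetical name; it takes a list of class labels and derives each p_i from the label counts. An even two-class split gives the maximum of 1 bit, and a pure set gives 0.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Eq. 1: -sum_i p_i * log2(p_i), where p_i is the share of class i in labels."""
    n = len(labels)
    # Counter only reports classes that actually occur, so p_i > 0 and log2 is defined.
    return sum(-(count / n) * math.log2(count / n) for count in Counter(labels).values())


print(entropy(["Clinton"] * 10 + ["Sanders"] * 10))  # 1.0 (evenly mixed)
print(entropy(["Sanders"] * 4))                       # 0.0 (pure)
```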
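The information gained by splitting a set S on an attribute F is the entropy of S minus the size-weighted entropy of the subsets S_f, where S_f contains the members of S for which F takes the value f:

$$\mathrm{Gain}(S, F) = \mathrm{Entropy}(S) - \sum_{f \in \mathrm{values}(F)} \frac{|S_f|}{|S|}\,\mathrm{Entropy}(S_f) \qquad \text{(eq. 2)}$$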
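Eq. 2 can likewise be sketched in Python, assuming each data point is a dictionary of attribute values plus a class label. The function names and the four-row example are hypothetical: attribute `x` predicts the class perfectly and earns the full 1 bit, while `y` is unrelated to the class and earns nothing.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Eq. 1 over the class proportions in labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Eq. 2: Entropy(S) minus the size-weighted entropy of each subset S_f."""
    subsets: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])  # group labels by value f
    n = len(rows)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy([row[target] for row in rows]) - remainder


rows = [
    {"x": "yes", "y": "yes", "class": "A"},
    {"x": "yes", "y": "no", "class": "A"},
    {"x": "no", "y": "yes", "class": "B"},
    {"x": "no", "y": "no", "class": "B"},
]
print(gain(rows, "x", "class"))  # 1.0
print(gain(rows, "y", "class"))  # 0.0
```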
Method
The data set was collected from the U.S. Census and covers three attributes of Iowa county populations: education, size, and minority percentage. The numerical values were then converted into yes/no answers to simplify partitioning (see Appendix I for data). A county's population is highly educated if more than 20% hold higher-education degrees, a county is large if it has more than 20,000 people, and it has a large minority population if more than 4% of its residents are minorities. Iowa was chosen because it was the first state to vote in the primary season.
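The yes/no conversion described above might look like the following sketch. The function and constant names are hypothetical, and it assumes the minority share is passed in directly as a percentage. Appendix I appears to list the non-minority share instead (every county with a value below 96 is marked as having a large minority), so the example passes 100 minus that column.

```python
from typing import Dict

# Cutoffs from the Method section; each test is a strict "more than".
EDUCATION_CUTOFF_PCT = 20.0   # share of residents with higher-education degrees
POPULATION_CUTOFF = 20_000    # county residents
MINORITY_CUTOFF_PCT = 4.0     # share of residents who are minorities


def yes_no(condition: bool) -> str:
    return "yes" if condition else "no"


def binarize(education_pct: float, population: int, minority_pct: float) -> Dict[str, str]:
    """Turn one county's census figures into the three yes/no attributes."""
    return {
        "education": yes_no(education_pct > EDUCATION_CUTOFF_PCT),
        "population": yes_no(population > POPULATION_CUTOFF),
        "minority": yes_no(minority_pct > MINORITY_CUTOFF_PCT),
    }


# Boone County from Appendix I, assuming 96.8 is the non-minority percentage.
print(binarize(20.3, 26_433, 100 - 96.8))
# {'education': 'yes', 'population': 'yes', 'minority': 'no'}
```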
The top-level pseudocode of the decision tree is:
BEGIN
    READ "data.txt"
    CALCULATE entropy
    CALCULATE gain
    PRINT tree
END
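The pseudocode can be fleshed out into a small ID3-style program. This is a sketch under several assumptions, not the author's implementation: the layout of "data.txt" is not given in the report, so the binarized Appendix I data is embedded as a whitespace-separated string with a header row; ties in gain go to the attribute listed first; and a branch stops when its counties all share one winner or no attributes remain, in which case the majority winner becomes the leaf. The tree it prints need not match Figure 1.

```python
import math
from collections import Counter, defaultdict

# Stand-in for "data.txt" (format assumed): binarized attributes and winner
# for the 20 counties in Appendix I, in the same order.
DATA = """\
education population minority winner
no no no Clinton
no no no Clinton
yes yes no Sanders
no no no Sanders
no no no Clinton
no yes yes Clinton
no no yes Sanders
yes yes yes Clinton
no no no Sanders
no no no Clinton
no yes yes Sanders
yes yes yes Clinton
no no no Clinton
no yes yes Sanders
yes no no Sanders
no no no Clinton
yes no no Sanders
no no no Clinton
yes no yes Sanders
no yes no Sanders
"""
TARGET = "winner"


def read_rows(text):
    """READ: parse the header and rows into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def entropy(labels):
    """CALCULATE entropy (eq. 1)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, attribute):
    """CALCULATE gain (eq. 2)."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[TARGET])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[TARGET] for row in rows]) - remainder


def build(rows, attributes):
    """Split on the highest-gain attribute and recurse on each of its values."""
    labels = [row[TARGET] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority winner
    best = max(attributes, key=lambda a: gain(rows, a))
    rest = [a for a in attributes if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        children[value] = build(subset, rest)
    return (best, children)


def show(tree, indent=""):
    """PRINT tree: one line per test outcome, leaves marked with '->'."""
    if isinstance(tree, str):
        print(f"{indent}-> {tree}")
        return
    attribute, children = tree
    for value, child in children.items():
        print(f"{indent}{attribute} = {value}")
        show(child, indent + "    ")


rows = read_rows(DATA)
tree = build(rows, ["education", "population", "minority"])
show(tree)
```

On this data the root test chosen is higher education, in line with the Results section.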
Results
Figure 1 shows that higher education gave the greatest information gain, followed by large population and large minority population, which are tied (see Appendix I). The lower section of the figure, labeled "True Classes," shows the actual winner of each county entered into the system.
Figure 1. Decision Tree Clinton/Sanders Results
Conclusion
The final decision tree is not as simple as it could be. The second and third large-minority-population nodes are unnecessary, since the same candidate wins on both the yes and the no branch of each of these nodes, so the test does not change the prediction.
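One way to remove such nodes after the tree is built is a bottom-up collapse, sketched below under the assumption of a hypothetical representation: a leaf is a winner's name, and a non-leaf node is an `(attribute, {value: subtree})` pair. Because a collapsed node predicted the same winner on every branch, the pruned tree makes the same predictions as before. The example tree is made up for illustration and is not the tree in Figure 1.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]


def prune(tree: Tree) -> Tree:
    """Bottom-up: replace any node whose children are identical leaves with that leaf."""
    if isinstance(tree, str):
        return tree  # already a leaf
    attribute, children = tree
    children = {value: prune(child) for value, child in children.items()}
    leaves = list(children.values())
    if all(isinstance(leaf, str) for leaf in leaves) and len(set(leaves)) == 1:
        return leaves[0]  # every branch gives the same winner: the test is trivial
    return (attribute, children)


# Hypothetical tree with one redundant minority test.
example = ("education", {
    "no": ("minority", {"no": "Sanders", "yes": "Sanders"}),
    "yes": ("minority", {"no": "Sanders", "yes": "Clinton"}),
})
print(prune(example))
# ('education', {'no': 'Sanders', 'yes': ('minority', {'no': 'Sanders', 'yes': 'Clinton'})})
```

Pruning bottom-up matters: once a child collapses into a leaf, its parent may become trivial too and is collapsed on the way back up.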
Future runs of the decision tree should use a larger data sample; this one covers only 20 counties. It may also be beneficial to re-evaluate the cutoff that turns each attribute into a yes/no answer. A population of 20,000 may be too generous a cutoff for a large population.
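Rather than fixing cutoffs by hand, a common approach for continuous attributes (similar in spirit to C4.5-style learners, and not what this report did) is to try every midpoint between consecutive observed values and keep the one with the highest gain. The sketch below does this for county population using the 20 counties in Appendix I. The names `gain_at` and `best_cutoff` are hypothetical, and with so few counties the best-scoring cutoff is likely to overfit, so it is a starting point for a larger sample rather than a replacement rule.

```python
import math
from collections import Counter

# (population, winner) for the 20 counties in Appendix I.
COUNTIES = [
    (7_454, "Clinton"), (3_875, "Clinton"), (26_433, "Sanders"), (15_006, "Sanders"),
    (9_866, "Clinton"), (20_562, "Clinton"), (18_411, "Sanders"), (43_254, "Clinton"),
    (11_836, "Sanders"), (12_264, "Clinton"), (48_051, "Sanders"), (77_400, "Clinton"),
    (8_781, "Clinton"), (40_255, "Sanders"), (7_022, "Sanders"), (9_200, "Clinton"),
    (12_375, "Sanders"), (10_722, "Clinton"), (17_325, "Sanders"), (20_454, "Sanders"),
]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_at(cutoff, data=COUNTIES):
    """Gain (eq. 2) of the yes/no test 'population > cutoff'."""
    large = [w for p, w in data if p > cutoff]
    small = [w for p, w in data if p <= cutoff]
    n = len(data)
    remainder = sum(len(s) / n * entropy(s) for s in (large, small) if s)
    return entropy([w for _, w in data]) - remainder


def best_cutoff(data=COUNTIES):
    """Try the midpoint between each pair of consecutive distinct populations."""
    values = sorted({p for p, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: gain_at(c, data))


print(f"gain at 20,000: {gain_at(20_000):.3f}")
cutoff = best_cutoff()
print(f"best cutoff: {cutoff:,.0f} (gain {gain_at(cutoff):.3f})")
```

The same search applies unchanged to the education and minority percentages by swapping in those columns.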
Appendix I:
No.  County       Higher Ed. %  Educated  Population  Large  Non-minority %  Large Minority  Winner
1    Adair        16.3          no        7,454       no     98.2            no              Clinton
2    Adams        13.7          no        3,875       no     97.8            no              Clinton
3    Boone        20.3          yes       26,433      yes    96.8            no              Sanders
4    Butler       15            no        15,006      no     98.1            no              Sanders
5    Calhoun      19.1          no        9,866       no     96.4            no              Clinton
6    Carroll      18.8          no        20,562      yes    95.5            yes             Clinton
7    Cedar        19.5          no        18,411      no     95.9            yes             Sanders
8    Cerro Gordo  21            yes       43,254      yes    95.7            yes             Clinton
9    Cherokee     19.6          no        11,836      no     97              no              Sanders
10   Chickasaw    13.8          no        12,264      no     98.3            no              Clinton
11   Clinton      17.7          no        48,051      yes    94              yes             Sanders
12   Dallas       43.6          yes       77,400      yes    92.7            yes             Clinton
13   Davis        16.4          no        8,781       no     98.3            no              Clinton
14   Des Moines   18.9          no        40,255      yes    90.4            yes             Sanders
15   Fremont      20.5          yes       7,022       no     97.6            no              Sanders
16   Greene       17.4          no        9,200       no     97.4            no              Clinton
17   Grundy       20.7          yes       12,375      no     98.3            no              Sanders
18   Guthrie      17.1          no        10,722      no     98              no              Clinton
19   Jefferson    31.2          yes       17,325      no     85.6            yes             Sanders
20   Jones        17.2          no        20,454      yes    96.2            no              Sanders
Iowa (State)      25.7                    3,107,126          92.1
H(C) = 1.581
Education 0.521   Population 0.530   Minority 0.530

                   Higher Education  Large Population  Large Minority
Attribute = yes    0.275             0.345             0.345
Attribute = no     0.690             0.647             0.647
SUM                0.965             0.992             0.992
Gain               0.616             0.589             0.589
Description: Predict a county's Democratic primary winner from features of its population. A population is highly educated if more than 20% hold higher-education degrees, large if it exceeds 20,000 people, and has a large minority population if more than 4% are minorities.
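The weighted-entropy rows above can be rechecked from the yes/no columns and the Winner column with a short script. This sketch assumes each "yes"/"no" entry is the term |S_f|/|S| · Entropy(S_f) from eq. 2; under that reading it reproduces 0.275, 0.690, and 0.965 for higher education and 0.345, 0.647, and 0.992 for the other two attributes. Because the table lists 10 Clinton and 10 Sanders counties, the script takes Entropy(S) to be exactly 1 bit, so its gains (about 0.035 for higher education and 0.008 for the other two) are smaller than the tabulated ones, although the ranking, with higher education first, is the same.

```python
import math
from collections import Counter

# (education, population, minority, winner) for the 20 counties above, in order.
COUNTIES = [
    ("no", "no", "no", "Clinton"), ("no", "no", "no", "Clinton"),
    ("yes", "yes", "no", "Sanders"), ("no", "no", "no", "Sanders"),
    ("no", "no", "no", "Clinton"), ("no", "yes", "yes", "Clinton"),
    ("no", "no", "yes", "Sanders"), ("yes", "yes", "yes", "Clinton"),
    ("no", "no", "no", "Sanders"), ("no", "no", "no", "Clinton"),
    ("no", "yes", "yes", "Sanders"), ("yes", "yes", "yes", "Clinton"),
    ("no", "no", "no", "Clinton"), ("no", "yes", "yes", "Sanders"),
    ("yes", "no", "no", "Sanders"), ("no", "no", "no", "Clinton"),
    ("yes", "no", "no", "Sanders"), ("no", "no", "no", "Clinton"),
    ("yes", "no", "yes", "Sanders"), ("no", "yes", "no", "Sanders"),
]
ATTRIBUTES = ["Higher Education", "Large Population", "Large Minority"]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


h_c = entropy([row[3] for row in COUNTIES])
print(f"H(C) = {h_c:.3f}")
table = {}
for i, name in enumerate(ATTRIBUTES):
    terms = {}
    for value in ("yes", "no"):
        subset = [row[3] for row in COUNTIES if row[i] == value]
        # |S_f| / |S| * Entropy(S_f), the term summed in eq. 2
        terms[value] = len(subset) / len(COUNTIES) * entropy(subset)
    total = terms["yes"] + terms["no"]
    table[name] = (terms["yes"], terms["no"], total, h_c - total)
    print(f"{name}: yes {terms['yes']:.3f}  no {terms['no']:.3f}  "
          f"sum {total:.3f}  gain {h_c - total:.3f}")
```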
References
[1] S. Marsland, Machine Learning: An Algorithmic Perspective. Boca Raton, FL: CRC Press, 2015.
