2018 Catalytics, LLC - Proprietary and Confidential
Analyzing Breast Cancer Dataset with Azure Machine Learning (ML) Studio
Frank Mendoza
CEO, Catalytics
Chicago Technology for Value-Based Healthcare Meetup
January 23, 2018
Breast Cancer Wisconsin (Diagnostic) Dataset Description
• Total of 569 records in dataset, donated in 1995
• 30 distinct numerical attributes (or features) associated with each record
• No categorical features available within the dataset
Location: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Breast Cancer Wisconsin (Diagnostic) Dataset Description, cont.
• Column identified as “Diagnosis” is the dataset label
• M = malignant (200+ records)
• B = benign (300+ records)
Example of Measurements
Core Steps to build Predictive Models using Machine Learning
Step 5(a): Test API
Acquire Data & Prepare
• Dataset did not have any missing values
• Manipulation was still required to ensure the training process would be successful (normalization, etc.)
• Split data into sets to train and test the model
• Training set = 311 records (~55%)
• Testing set 1 = 208 records (~37%)
• An additional testing set was held back to test the model after the API was created (step 5(a))
• Testing set 2 = 50 records (~9%)
• Training set and Testing set 1 were uploaded to Azure Machine Learning (ML) Studio
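The three-way split described on this slide can be sketched in plain Python. This is a minimal illustration, not the presenter's actual procedure: the function name `three_way_split`, the random shuffle, and the seed are assumptions; only the record counts (569 total, 311 / 208 / 50) come from the slide.

```python
import random


def three_way_split(n_records, n_train, n_test1, seed=0):
    """Shuffle record indices and cut them into three disjoint groups."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    train = indices[:n_train]
    test1 = indices[n_train:n_train + n_test1]
    test2 = indices[n_train + n_test1:]  # the remainder is held back for the API test
    return train, test1, test2


# Counts taken from the slide; the indices stand in for rows of the dataset.
train_idx, test1_idx, test2_idx = three_way_split(569, 311, 208)

for name, part in [("training", train_idx), ("testing 1", test1_idx), ("testing 2", test2_idx)]:
    print(f"{name}: {len(part)} records ({len(part) / 569:.1%} of 569)")
```

Keeping the third group out of the Studio upload entirely means the deployed API is checked against records the training experiment never touched.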
Training Predictive Model
Choosing algorithms
• Since the label has two classes (Benign vs. Malignant), it was clear that a classification model would be necessary
• Multiple models were developed to identify the best algorithm to use
• Two-Class Logistic Regression
• Two-Class Support Vector Machine
• Two-Class Boosted Decision Tree
• Two-Class Neural Network (WINNER)
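Comparing candidate classifiers comes down to scoring each one against the same held-out labels and keeping the best. The sketch below shows that selection step only. The labels and predictions are made up for illustration, the names `candidate_a` and `candidate_b` are hypothetical, and none of the numbers are results from the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must be the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred, positive="M"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


# Made-up labels for illustration only (M = malignant, B = benign).
y_true = ["M", "B", "B", "M", "B", "M", "B", "B"]
candidate_predictions = {
    "candidate_a": ["M", "B", "B", "B", "B", "M", "B", "B"],
    "candidate_b": ["M", "B", "M", "B", "B", "M", "B", "M"],
}

# Score every candidate on the same test labels and keep the most accurate one.
scores = {name: accuracy(y_true, preds) for name, preds in candidate_predictions.items()}
best = max(scores, key=scores.get)
print(best, scores[best], confusion_counts(y_true, candidate_predictions[best]))
```

For a diagnostic label, the false-negative count (malignant cases predicted benign) is worth reading alongside accuracy, since two models with similar accuracy can differ in how many malignant cases they miss.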
Optimizing Neural Network Model
• Feature Selection – identify which attributes matter
[Chart: attributes ranked from Important to Less Important]
Feature Selection, continued
• Azure ML contains a module called “Permutation Feature Importance” that tests each feature to identify its importance
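The idea behind permutation feature importance can be sketched in a few lines: shuffle one feature column at a time and measure how much the model's score drops. This is a generic illustration of the technique under stated assumptions, not the internals of the Azure ML module; the synthetic data, `toy_model`, and the use of accuracy as the score are all assumptions.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop observed when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


# Synthetic data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = ["M" if r[0] > 0.5 else "B" for r in rows]


def toy_model(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return "M" if row[0] > 0.5 else "B"


importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

A feature whose shuffling leaves the score unchanged (feature 1 here) is a candidate for removal, which is the reasoning behind dropping attributes that do not contribute to the model.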
Cross Validation
• Azure ML contains a module called “Cross-Validate Model” that evaluates a model by partitioning the data into folds; it is used to check that the model will perform well against unseen (new) data
10 folds
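How 10-fold partitioning works can be sketched as follows. The record count of 311 assumes cross-validation was run on the training set, which the slide does not state; the function name `k_fold_indices` is hypothetical. In practice the records would usually be shuffled before being cut into folds.

```python
def k_fold_indices(n_records, k=10):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    base, extra = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, validation
        start += size


# Assumption: 311 records, matching the training set size given earlier.
folds = list(k_fold_indices(311, k=10))

for number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {number}: train on {len(train)}, validate on {len(validation)}")
```

Each record lands in exactly one validation fold, so every record is scored once by a model that never saw it during training.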
Neural Network Classification Model Optimized
• Feature selection allowed us to remove 14 attributes that did not contribute to improving the model
• Accuracy improved from 0.976 to 0.981
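To put the accuracy change in concrete terms, the short calculation below finds how many correct predictions each reported accuracy corresponds to. It assumes the figures were measured on the 208-record Testing set 1, which the slide does not state; if they came from cross-validation or another set, the counts would differ.

```python
N_TEST = 208  # assumption: accuracy was measured on Testing set 1


def matching_correct_counts(reported_accuracy, n_records):
    """Whole numbers of correct predictions whose accuracy rounds to the reported value."""
    return [c for c in range(n_records + 1) if round(c / n_records, 3) == reported_accuracy]


before = matching_correct_counts(0.976, N_TEST)
after = matching_correct_counts(0.981, N_TEST)
remaining_features = 30 - 14  # attributes left after feature selection

print(f"0.976 corresponds to {before} correct out of {N_TEST}")
print(f"0.981 corresponds to {after} correct out of {N_TEST}")
print(f"features kept: {remaining_features}")
```

Under that assumption the improvement amounts to one additional record classified correctly, achieved with 16 of the original 30 attributes.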
AZURE ML DEMONSTRATION
AZURE ML API / EXCEL DEMONSTRATION
Frank Mendoza, CEO & Chief Catalyst
900 E. Pecan St, Suite 300-286 Pflugerville, TX 78660-8048
Phone: +1 (512) 767-8604
Fax: +1 (737) 703-5478
Email: Frank@CatalyticsConsulting.com
linkedin.com/in/fxmendoza
Twitter: @DataDrivenMind
Appendix
Attribute Information
1) ID number
2) Diagnosis (M = malignant, B = benign)
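3-32) Ten real-valued features are computed for each cell nucleus; the mean, standard error, and "worst" (largest) value of each are recorded, giving 30 feature columns: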
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
Location: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.707&rep=rep1&type=pdf
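The column layout above can be sketched as a small parser. In the UCI description of this dataset, each of the ten features is recorded as a mean, a standard error, and a "worst" value, with the ten means first, then the ten standard errors, then the ten worst values. The column names and the `parse_record` helper below are hypothetical, and the sample line is synthetic rather than a real record.

```python
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]  # three statistics per feature

# Columns 1-2 are the ID and the label; columns 3-32 are the 30 features.
COLUMNS = ["id", "diagnosis"] + [f"{f}_{s}" for s in STATISTICS for f in FEATURES]


def parse_record(line):
    """Turn one comma-separated line into a dict keyed by column name."""
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {"id": fields[0], "diagnosis": fields[1]}
    for name, value in zip(COLUMNS[2:], fields[2:]):
        record[name] = float(value)
    return record


# Synthetic line for illustration only: ID 0, label M, feature values 0.0 to 29.0.
synthetic_line = "0,M," + ",".join(str(float(i)) for i in range(30))
record = parse_record(synthetic_line)
print(len(COLUMNS), record["diagnosis"], record["radius_mean"], record["radius_worst"])
```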

Analyzing Breast Cancer Data with Azure ML

  • 1. Analyzing Breast Cancer Dataset with Azure Machine Learning (ML) Studio
      Frank Mendoza, CEO, Catalytics
      Chicago Technology for Value-Based Healthcare Meetup, January 23, 2018
      2018 Catalytics, LLC - Proprietary and Confidential
  • 2. Breast Cancer Wisconsin (Diagnostic) Dataset Description
      • Total of 569 records in dataset – donated in 1995
      • 30 distinct numerical attributes (or features) associated with each record
      • No categorical features available within the dataset
      Location: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
  • 3. Breast Cancer Wisconsin (Diagnostic) Dataset Description, cont.
      • Column identified as “Diagnosis” is the dataset label
          • M = malignant
          • B = benign
      [Chart: records per class – 300+ benign, 200+ malignant]
      [Figure: Example of Measurements]
  • 4. Core Steps to build Predictive Models using Machine Learning
      [Diagram: core steps, including step 5(a) Test API]
  • 5. Acquire Data & Prepare
      • Dataset did not have any missing values
      • Manipulation was still required to ensure training process would be successful – normalization, etc.
      • Split data into two sets to Train & Test model
          • Training = 311 records (~55%)
          • Testing set 1 = 208 records (~37%)
      • An additional Testing set was held back to test the model after the API was created – step 5(a)
          • Testing set 2 = 50 records (~9%)
      • Training & Testing set 1 were uploaded to Azure Machine Learning (ML) Studio
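The split and normalization on slide 5 can be sketched in plain Python. This is only an illustration, not the author's procedure: the slides do not say how records were assigned to each set or which normalization was applied, so the seeded random shuffle, the z-score scaling, and the helper names `split_records`, `fit_zscore`, and `apply_zscore` are all assumptions.

```python
import random
import statistics

def split_records(n_records, sizes, seed=0):
    """Shuffle record indices and cut them into consecutive groups of the given sizes."""
    if sum(sizes) != n_records:
        raise ValueError("sizes must add up to n_records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    groups, start = [], 0
    for size in sizes:
        groups.append(indices[start:start + size])
        start += size
    return groups

def fit_zscore(column):
    """Return (mean, population std) computed on training values only."""
    return statistics.fmean(column), statistics.pstdev(column)

def apply_zscore(column, mean, std):
    """Scale values with the training statistics; a constant column maps to 0.0."""
    return [(value - mean) / std if std else 0.0 for value in column]

# Sizes taken from slide 5: training, testing set 1, testing set 2.
train, test1, test2 = split_records(569, [311, 208, 50], seed=42)
for name, group in [("train", train), ("test 1", test1), ("test 2", test2)]:
    print(f"{name}: {len(group)} records ({len(group) / 569:.1%})")
# train: 311 records (54.7%)
# test 1: 208 records (36.6%)
# test 2: 50 records (8.8%)
```

Fitting the scaling statistics on the training records and reusing them for both testing sets keeps information from the held-out records out of the training step.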
  • 6. Training Predictive Model: Choosing algorithms
      • Since the label has 2 classes (Benign vs. Malignant), it was clear that a Classification model would be necessary
      • Multiple models were developed to identify the best algorithm to use
          • Two-Class Logistic Regression
          • Two-Class Support Vector Machine
          • Two-Class Boosted Decision Tree
          • Two-Class Neural Network - WINNER
  • 7. Optimizing Neural Network Model
      • Feature Selection – identify which attributes matter
      [Figure: attributes ranked from Important to Less Important]
  • 8. Feature Selection, continued
      • Azure ML contains a module called “Permutation Feature Importance” that tests each feature to identify its importance
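The idea behind permutation feature importance can be shown with a small stand-alone sketch: shuffle one feature column at a time and measure how much accuracy drops. This is not Azure ML's implementation. The toy data, the stand-in `toy_model`, and the helper names are made up for illustration.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with only the chosen column replaced.
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data (made up): feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]

def toy_model(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0

scores = permutation_importance(toy_model, rows, labels)
print(scores)  # feature 0 shows a clear drop; feature 1 scores exactly 0.0
```

A feature whose shuffling leaves the score unchanged, like feature 1 here, is a candidate for removal.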
  • 9. Cross Validation
      • Azure ML contains a module called “Cross-Validate Model” that evaluates a model by partitioning the data (10 folds) – used to estimate how the model will perform against unseen/new data
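The 10-fold partitioning on slide 9 works roughly as follows: every record is used for testing exactly once and for training in the other nine rounds. The sketch below is a generic k-fold splitter, not the Azure ML module. The helper names are hypothetical, and 569 records are used only for illustration, since the slides do not state which records were cross-validated.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each record lands in exactly one test fold."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one record.
    base, extra = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx

def cross_validate(score_fold, n_records, k=10):
    """Average a caller-supplied score_fold(train_idx, test_idx) over k folds."""
    scores = [score_fold(tr, te) for tr, te in k_fold_indices(n_records, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(569, k=10))
print([len(test) for _, test in folds])
# [57, 57, 57, 57, 57, 57, 57, 57, 57, 56]
```

In practice `score_fold` would train a model on `train_idx` and return its accuracy on `test_idx`. A small spread of scores across folds suggests the model is not overly sensitive to which records it saw.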
  • 10. Neural Network Classification Model Optimized
      • Feature selection allowed us to remove 14 attributes that did not contribute to improving the model
      • Accuracy improved from 0.976 to 0.981
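To put the numbers on slide 10 in perspective, the arithmetic below works out how many features remain and what the two accuracies would mean in records. It assumes, purely for illustration, that accuracy was scored on the 208-record testing set 1. The slides do not say which set was used, and the figures would read differently if they came from cross-validation.

```python
def correct_count(accuracy, n_records):
    """Nearest whole number of correct predictions implied by an accuracy."""
    return round(accuracy * n_records)

N_FEATURES = 30   # from slide 2
N_REMOVED = 14    # from slide 10
N_TEST = 208      # assumption: testing set 1 from slide 5

print(N_FEATURES - N_REMOVED)  # 16 features kept
for acc in (0.976, 0.981):
    hits = correct_count(acc, N_TEST)
    print(acc, hits, round(hits / N_TEST, 3))
# 0.976 203 0.976
# 0.981 204 0.981
```

Under that assumption the improvement amounts to one additional correctly classified record, achieved with roughly half the input attributes.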
  • 12. AZURE ML API / EXCEL DEMONSTRATION
  • 13. Frank Mendoza, CEO & Chief Catalyst
      900 E. Pecan St, Suite 300-286, Pflugerville, TX 78660-8048
      Phone: +1 (512) 767-8604
      Fax: +1 (737) 703-5478
      Email: Frank@CatalyticsConsulting.com
      linkedin.com/in/fxmendoza
      Twitter: @DataDrivenMind
  • 15. Attribute Information
      1) ID number
      2) Diagnosis (M = malignant, B = benign)
      3-32) Ten real-valued features are computed for each cell nucleus:
          a) radius (mean of distances from center to points on the perimeter)
          b) texture (standard deviation of gray-scale values)
          c) perimeter
          d) area
          e) smoothness (local variation in radius lengths)
          f) compactness (perimeter^2 / area - 1.0)
          g) concavity (severity of concave portions of the contour)
          h) concave points (number of concave portions of the contour)
          i) symmetry
          j) fractal dimension ("coastline approximation" - 1)
      Location: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.707&rep=rep1&type=pdf
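The attribute list names ten features but spans columns 3-32. According to the UCI dataset description, each feature is recorded three times per image (mean, standard error, and "worst" value), which gives the 30 attributes cited on slide 2. The sketch below builds that 32-column layout. The raw data file has no header row, so the column names here are hypothetical labels chosen for readability.

```python
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]  # mean, standard error, worst (largest)

def wdbc_columns():
    """Column layout: ID, diagnosis, then ten means, ten SEs, ten worst values."""
    features = [f"{name}_{stat}" for stat in STATISTICS for name in BASE_FEATURES]
    return ["id", "diagnosis"] + features

columns = wdbc_columns()
print(len(columns))                          # 32
print(columns[2], columns[12], columns[22])  # radius_mean radius_se radius_worst
```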