Analyzing Breast Cancer Data with Azure ML


This presentation was given by Frank Mendoza of Catalytics, a member of the Chicago Technology for Value-Based Healthcare Meetup (https://www.meetup.com/Chicago-Technology-For-Value-Based-Healthcare-Meetup/), on January 23, 2018.


2018 Catalytics, LLC - Proprietary and Confidential
Analyzing Breast Cancer Dataset with Azure Machine Learning (ML) Studio
Frank Mendoza
CEO, Catalytics
Chicago Technology for Value-Based Healthcare Meetup
January 23, 2018

Breast Cancer Wisconsin (Diagnostic) Dataset Description
• Total of 569 records in dataset – donated in 1995
• 30 distinct numerical attributes (or features) associated with each record
• No categorical features available within the dataset
Location: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

Breast Cancer Wisconsin (Diagnostic) Dataset Description, cont.
• Column identified as “Diagnosis” is the dataset label
• M = malignant
• B = benign
[Figure: record counts per diagnosis class (labeled 300+ and 200+) and an example of measurements]

Core Steps to Build Predictive Models Using Machine Learning
[Figure: process diagram of the core steps, including step 5(a), Test API]

Acquire Data & Prepare
• Dataset did not have any missing values
• Manipulation (normalization, etc.) was still required to ensure the training process would be successful
• Data was split into three sets: one to train the model and two to test it
• Training set = 311 records (~55%)
• Testing set 1 = 208 records (~37%)
• Testing set 2 = 50 records (~9%), held back to test the model after the API was created – step 5(a)
• The Training set and Testing set 1 were uploaded to Azure Machine Learning (ML) Studio
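The split and normalization described on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the slide does not say which normalization was used or how records were assigned to each set, so min-max scaling, the random shuffle, and the helper names `split_records`, `fit_min_max`, and `apply_min_max` are all hypothetical. The data below is random stand-in data, not the real dataset.

```python
import random

def split_records(records, sizes, seed=0):
    """Shuffle records and cut them into consecutive chunks of the given sizes."""
    if sum(sizes) != len(records):
        raise ValueError("sizes must add up to the number of records")
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    chunks, start = [], 0
    for size in sizes:
        chunks.append(shuffled[start:start + size])
        start += size
    return chunks

def fit_min_max(rows):
    """Return per-column (min, max) bounds learned from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, bounds):
    """Rescale each column with previously fitted bounds (0 to 1 on the fitted rows)."""
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0
         for value, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

# Stand-in data: 569 rows of 30 random values (NOT the real measurements).
rng = random.Random(42)
records = [[rng.uniform(0, 100) for _ in range(30)] for _ in range(569)]

# Sizes taken from the slide: training, testing set 1, testing set 2.
train, test1, test2 = split_records(records, [311, 208, 50])

# Assumption: bounds are fitted on the training rows only, then reused,
# so test values may fall slightly outside the 0 to 1 range.
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
test1_scaled = apply_min_max(test1, bounds)
```

Fitting the bounds on the training rows alone is a common design choice because it keeps information from the test sets out of the preparation step.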

Training Predictive Model
Choosing algorithms
• Since the label has two classes (Benign vs. Malignant), it was clear that a classification model would be necessary
• Multiple models were developed to identify the best algorithm to use
• Two-Class Logistic Regression
• Two-Class Support Vector Machine
• Two-Class Boosted Decision Tree
• Two-Class Neural Network - WINNER
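The "train several candidates, keep the best" pattern behind this slide can be sketched as below. The slide does not state which metric picked the winner, so accuracy on a held-out set is an assumption here. The candidates are made-up threshold rules standing in for trained models, and `accuracy` and `pick_best` are hypothetical helper names, not Azure ML functions.

```python
def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)

def pick_best(candidates, rows, labels):
    """Score every candidate on the same held-out data and return the top one."""
    scores = {name: accuracy(predict, rows, labels)
              for name, predict in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy held-out data (made up): one feature, malignant when the value is large.
test_rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
test_labels = ["B", "B", "B", "M", "M", "M"]

# Hypothetical stand-ins for trained classifiers.
candidates = {
    "threshold_at_3.5": lambda row: "M" if row[0] > 3.5 else "B",
    "threshold_at_1.5": lambda row: "M" if row[0] > 1.5 else "B",
    "always_benign": lambda row: "B",
}

best_name, scores = pick_best(candidates, test_rows, test_labels)
```

Scoring every candidate on the same held-out records is what makes the comparison fair.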

Optimizing Neural Network Model
• Feature Selection – identify which attributes matter
[Figure: attributes grouped as Important vs. Less Important]

Feature Selection, continued
• Azure ML contains a module called “Permutation Feature Importance” that scores each feature by randomly shuffling its values and measuring how much the model's performance drops
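The idea behind permutation feature importance can be sketched in a few lines of plain Python. This is not the Azure ML module itself, only a minimal illustration of the general technique: shuffle one feature column at a time and record how much accuracy is lost. The toy data and `toy_model` are made up for the example.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)

def permutation_importance(predict, rows, labels, seed=0):
    """Score each feature by the accuracy lost when its column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        permuted = [row[:j] + [value] + row[j + 1:]
                    for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(predict, permuted, labels))
    return importances

# Toy data (made up): feature 0 decides the label, feature 1 is pure noise.
noise = random.Random(1)
rows = [[float(i), noise.random()] for i in range(20)]
labels = ["M" if row[0] > 9.5 else "B" for row in rows]

def toy_model(row):
    """Hypothetical stand-in for a trained classifier; it only reads feature 0."""
    return "M" if row[0] > 9.5 else "B"

importances = permutation_importance(toy_model, rows, labels)
```

A feature the model never uses, like feature 1 here, gets an importance of exactly zero, which is the signal used to decide that an attribute can be dropped.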

Cross Validation
• Azure ML contains a module called “Cross Validate Model” that evaluates a model by partitioning the data into folds (10 folds here) – used to check that the model will perform against unseen/new data
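The partitioning behind 10-fold cross validation can be sketched as follows. This shows the general technique, not the internals of the Azure ML module. The record count of 519 assumes that cross validation ran on the two uploaded sets (311 + 208), which the slide does not state, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_records, n_folds=10):
    """Yield (train_idx, test_idx) pairs so every record is tested exactly once."""
    base, extra = divmod(n_records, n_folds)
    indices = list(range(n_records))
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder evenly
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Assumption: 519 records = 311 training + 208 testing set 1.
folds = list(k_fold_indices(519, n_folds=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

In each round the model is trained on nine folds and scored on the remaining one, and the ten scores are then averaged. In practice the records would be shuffled before partitioning.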

Neural Network Classification Model, Optimized
• Feature selection allowed us to remove 14 attributes that did not contribute to improving the model (leaving 16 of the original 30)
• Accuracy improved from 0.976 to 0.981
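To put the two accuracy figures in perspective, the arithmetic below shows what they would mean in record counts. It assumes, and the slides do not confirm, that accuracy was measured on Testing set 1 (208 records). Under that assumption the reported values are consistent with 203 and 204 correct predictions, a difference of a single record.

```python
TEST_SET_SIZE = 208  # Testing set 1; assumed (not stated) to be the evaluation set

def accuracy(correct, total):
    """Accuracy is the share of records classified correctly."""
    return correct / total

# Hypothetical counts chosen because they reproduce the reported figures.
before = round(accuracy(203, TEST_SET_SIZE), 3)
after = round(accuracy(204, TEST_SET_SIZE), 3)
```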

Azure ML Demonstration

Azure ML API/Excel Demonstration

Frank Mendoza, CEO & Chief Catalyst
900 E. Pecan St, Suite 300-286 Pflugerville, TX 78660-8048
Phone: +1 (512) 767-8604
Fax: +1 (737) 703-5478
Email: Frank@CatalyticsConsulting.com
linkedin.com/in/fxmendoza
Twitter: @DataDrivenMind

Appendix

Attribute Information
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
The mean, standard error, and "worst" (largest) value of each of these ten features are recorded for each image, which yields the 30 attributes in columns 3-32.
Location: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.707&rep=rep1&type=pdf
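The column layout in the attribute information can be made concrete with a short sketch. In the UCI data file, columns 3-12 hold the mean of each of the ten features, columns 13-22 the standard error, and columns 23-32 the "worst" value. The snake_case column names and the helpers `column_names` and `parse_record` are naming choices made for this illustration, and the sample line is made up rather than taken from the dataset.

```python
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]  # order used in the data file

def column_names():
    """Build the 32 column names: ID, diagnosis, then 10 features x 3 statistics."""
    names = ["id", "diagnosis"]
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            names.append(f"{feature}_{stat}")
    return names

def parse_record(line, names):
    """Turn one comma-separated line into a dict with numeric feature values."""
    fields = line.strip().split(",")
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    record = dict(zip(names, fields))
    for name in names[2:]:  # everything after ID and diagnosis is numeric
        record[name] = float(record[name])
    return record

columns = column_names()

# Made-up line for illustration only (hypothetical ID and values).
sample_line = "0001,M," + ",".join(str(i / 10) for i in range(1, 31))
record = parse_record(sample_line, columns)
```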

