Breast Cancer Diagnosis
Mini Project Presentation for the MCA
Course by Ankit Gupta
Introduction
What is machine learning, and why are we using it?
Machine learning (ML) is a field of artificial intelligence that uses statistical
techniques to give computer systems the ability to "learn" (e.g., progressively
improve performance on a specific task) from data, without being explicitly
programmed.
[Diagram: ML provides mathematical tools to AI; AI is the "self-thinking" computer.]
11/23/2018Ankit Gupta 1719214832 2
Breast Cancer: An Overview
• Breast cancer is the second leading cause of cancer death in women, second only to lung
cancer.
• The leading risk factor for breast cancer is simply being a woman. Though breast cancer does
occur in men, the disease is about 100 times more common in women.
• For 2017, the American Cancer Society estimated that 2,470 new cases of invasive breast
cancer would be diagnosed in men in the U.S.
• A woman has about a one in eight chance of being diagnosed with breast cancer in her
lifetime, according to the National Cancer Institute.
• Most women (about eight out of 10) who get breast cancer do not have a family history of the
disease.
• But women who have close blood relatives with breast cancer have a higher risk. Having a
first-degree relative (mother, sister or daughter) with breast cancer almost doubles a woman’s
risk.
Why am I using machine learning?
Breast cancer → Biopsy data → Machine learning → Diagnosis
Understanding the Algorithm
Lazy Learning – Classification Using Nearest Neighbors
K-nearest neighbor (kNN) classifiers are defined by their characteristic of classifying
unlabeled examples by assigning them the class of similar labeled examples.
Example applications:
• Computer vision applications, including optical character recognition and facial
recognition in both still images and video.
• Predicting whether a person will enjoy a movie or music recommendation.
• Identifying patterns in genetic data, perhaps to use them in detecting specific
proteins or diseases
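The nearest-neighbor idea above can be sketched in a few lines of base R. This is only an illustration, not the code used in the project: the function name `knn_predict` and the toy measurements are made up, and distance is assumed to be Euclidean.

```r
# Hypothetical toy data: two already-rescaled measurements per labeled example.
train <- data.frame(
  m1 = c(0.10, 0.15, 0.12, 0.80, 0.85, 0.78),
  m2 = c(0.20, 0.18, 0.25, 0.90, 0.88, 0.95)
)
train_labels <- c("Benign", "Benign", "Benign",
                  "Malignant", "Malignant", "Malignant")

# Classify one unlabeled example by majority vote among its k nearest neighbors.
knn_predict <- function(train, labels, query, k) {
  # Subtract the query from every training row, column by column
  diffs <- sweep(as.matrix(train), 2, query)
  # Euclidean distance from the query to each training example
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k closest training examples
  nearest <- order(dists)[1:k]
  # Majority vote over their labels
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

prediction <- knn_predict(train, train_labels, query = c(0.75, 0.90), k = 3)
print(prediction)
```

No model is fitted ahead of time; all the work happens when a new example is classified, which is why the approach is called lazy learning.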
Flow chart
[Flow chart stages: Biopsy procedure, Measurements, Analysis of Measurements, Preparation of ML Models, Predictions & Validation, Evaluation, Reports, Diagnosis]
A Look at the Biopsy Data
Analysis with R language
Step 1 – collecting data
We will use the Wisconsin Breast Cancer Diagnostic dataset from the UCI
Machine Learning Repository at http://archive.ics.uci.edu/ml. This data was
donated by researchers of the University of Wisconsin. The breast cancer data
includes 569 examples of cancer biopsies, each with 32 features: an ID number,
the diagnosis, and 30 numeric measurements.
Kaggle Competition
Step 2 – exploring and preparing the data
Importing the dataset into the R IDE
The table() output indicates that 357 masses are benign while 212 are malignant.
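A minimal sketch of the import-and-count step follows. The slides do not show the actual commands, so everything here is an assumption: the five-row CSV is a made-up stand-in written to a temporary file so the example runs on its own, and the column names are illustrative. With the real file, the same `table()` call would report the 357 benign and 212 malignant counts.

```r
# Made-up stand-in for the real data file (5 rows instead of 569).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,diagnosis,radius_mean,area_mean",
  "1,B,12.3,464.1",
  "2,M,20.6,1326.0",
  "3,B,11.4,402.9",
  "4,M,17.9,1001.0",
  "5,B,13.1,520.0"
), csv_path)

# Import the CSV as a data frame, keeping text columns as plain strings
wbcd <- read.csv(csv_path, stringsAsFactors = FALSE)

# Drop the ID column: it identifies a patient but carries no diagnostic signal
wbcd <- wbcd[-1]

# Recode the diagnosis as a factor with readable labels
wbcd$diagnosis <- factor(wbcd$diagnosis,
                         levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))

# Count the examples in each class, then show the split as percentages
diagnosis_counts <- table(wbcd$diagnosis)
print(diagnosis_counts)
print(round(prop.table(diagnosis_counts) * 100, 1))
```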
Visualization 1.
Do you notice anything
problematic about the
values?
Visualization 2.
So let's apply normalization to rescale the features to a
standard range of values.
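To see why rescaling matters for a distance-based method such as kNN, consider two biopsies described by one small-valued feature and one large-valued feature. The numbers and feature names below are made up for illustration; the point is only that the large-valued feature swamps the distance.

```r
# Hypothetical raw measurements for two biopsies (illustrative values only).
a <- c(smoothness = 0.09, area = 500)
b <- c(smoothness = 0.15, area = 520)

# Euclidean distance on the raw, unscaled values
dist_raw <- sqrt(sum((a - b)^2))

# Fraction of the squared distance contributed by the area feature alone
area_share <- unname((a["area"] - b["area"])^2 / sum((a - b)^2))

print(dist_raw)
print(area_share)
```

Here the area difference accounts for almost all of the distance, so the smoothness difference would barely influence which neighbors count as nearest.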
Visualization 3.
The normalization function takes a vector x of numeric values
and, for each value in x, subtracts the minimum value in x
and divides by the range of values in x.
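The min-max rescaling described above can be written as a one-line R function. The name `normalize` and the toy data frame are assumptions for this sketch; the slide shows the original code only as an image.

```r
# Min-max normalization: rescale a numeric vector to the range [0, 1].
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Vectors on very different scales map to the same rescaled values
small_scale <- normalize(c(1, 2, 3, 4, 5))
large_scale <- normalize(c(10, 20, 30, 40, 50))
print(small_scale)
print(large_scale)

# Applying it to every column of a (hypothetical) data frame of measurements
toy <- data.frame(area = c(500, 520, 1000),
                  smoothness = c(0.09, 0.15, 0.12))
toy_n <- as.data.frame(lapply(toy, normalize))
print(toy_n)
```

Note that the function divides by zero if every value in x is identical, so a constant column would need separate handling.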
Data preparation – creating training and test
datasets
We will split the wbcd_n data frame into wbcd_train and wbcd_test:
If the preceding commands are confusing, remember that data is extracted from
data frames using the [row, column] syntax. A blank value for the row or column
indicates that all the rows or columns should be included. Hence, the first
line of code takes rows 1 to 469 and all columns, and the second line takes the
100 rows from 470 to 569 and all columns.
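The split described above might look like the following sketch. Because the real wbcd_n data frame is not available here, a random stand-in with 569 rows is generated first; the label vector names are assumptions, while wbcd_train and wbcd_test follow the slide.

```r
set.seed(1)  # make the stand-in data reproducible

# Random stand-in for the normalized data frame (NOT the real measurements)
wbcd_n <- as.data.frame(matrix(runif(569 * 3), nrow = 569))
all_labels <- factor(sample(c("Benign", "Malignant"), 569, replace = TRUE))

# Rows 1 to 469, all columns, become the training set
wbcd_train <- wbcd_n[1:469, ]
# Rows 470 to 569, all columns, become the test set
wbcd_test <- wbcd_n[470:569, ]

# The class labels are split the same way and kept in separate vectors
wbcd_train_labels <- all_labels[1:469]
wbcd_test_labels <- all_labels[470:569]

print(dim(wbcd_train))
print(dim(wbcd_test))
```

Taking the first 469 rows is only a fair split if the rows are already in random order; otherwise, shuffling the rows first would be the safer choice.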
Step 3 – training a model on the data
Now sit back and relax: R will do all the work, and you don't have to do
any more calculations, unless you don't yet know the value of k for the
kNN algorithm.
k = sqrt(nrow(wbcd))
As a rule of thumb, k is the square root of the number of rows in the dataset, and it
should be odd so that a two-class vote cannot end in a tie.
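A sketch of how the rule of thumb and the training step could fit together, using knn() from the class package (one of R's recommended packages). The data is synthetic and the variable names are assumptions; rounding down before forcing an odd value is one possible choice, not necessarily the one used in the project.

```r
library(class)  # provides knn()

set.seed(42)  # make the synthetic data reproducible

# Synthetic stand-in for 569 normalized biopsies (NOT the real data):
# malignant examples are shifted upward so the classes are separable.
n <- 569
labels <- factor(sample(c("Benign", "Malignant"), n, replace = TRUE))
shift <- ifelse(labels == "Malignant", 0.4, 0)
wbcd_n <- data.frame(f1 = runif(n) * 0.6 + shift,
                     f2 = runif(n) * 0.6 + shift)

# Rule of thumb: k near sqrt(number of rows), nudged to an odd number
k <- floor(sqrt(n))
if (k %% 2 == 0) k <- k + 1

# Same positional split as before: 469 training rows, 100 test rows
wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]
train_labels <- labels[1:469]

# knn() returns a factor with one predicted class per test row
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = train_labels, k = k)

print(k)
print(table(wbcd_test_pred))
```

For 569 rows the square root is about 23.9, so this sketch settles on k = 23.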
Confusion Matrix to calculate Accuracy
            Benign  Malignant
Benign          77          0
Malignant        2         21
Accuracy of the machine learning model: 98.24%