1. BREAST CANCER USING
MACHINE LEARNING
Submitted by
S. Rajayogha
2nd M.Sc. Data Science
Kalasalingam Academy of Research and Education
9921146010@klu.ac.in
2. OVERVIEW
• Introduction
• Breast cancer: an overview
• Why use machine learning?
• Understanding the algorithm
• Analysis in R
• Confusion matrix and accuracy
3. INTRODUCTION
What is machine learning, and why are we using it?
Machine learning (ML) is a field of artificial intelligence that uses statistical techniques to
give computer systems the ability to “learn” from data, without being explicitly programmed.
4. BREAST CANCER: AN OVERVIEW
• Breast cancer is the second leading cause of cancer death in women, second only
to lung cancer.
• The leading risk factor for breast cancer is simply being a woman. Although breast
cancer does occur in men, the disease is about 100 times more common in women.
• Men can also get breast cancer. For 2017, the American Cancer Society estimated that 2,470
new cases of invasive breast cancer would be diagnosed in men in the US.
• A woman has about a one-in-eight chance of being diagnosed with breast cancer in
her lifetime, according to the National Cancer Institute.
• Most women who get breast cancer do not have a family history of the disease.
• However, women who have close blood relatives with breast cancer have a higher risk.
Having a first-degree relative (mother, sister or daughter) with breast cancer
almost doubles a woman’s risk.
6. UNDERSTANDING THE ALGORITHM
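Lazy learning: classification using nearest neighbors
k-nearest neighbor (k-NN) classifiers classify unlabeled examples by assigning them the
class of the most similar labeled examples. Similarity is usually measured as Euclidean
distance, and the predicted class is decided by a majority vote among the k closest
training examples. The method is called “lazy” because it builds no model in advance:
the training data are simply stored, and all the work happens at prediction time.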
Examples:
• Computer vision applications, including optical character recognition and facial
recognition in both still images and video.
• Recommendation systems that predict whether a person will enjoy a movie or a song.
• Identifying patterns in genetic data, perhaps to use them in detecting specific
proteins or diseases.
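To make the nearest-neighbor rule concrete, here is a minimal from-scratch sketch in R, assuming Euclidean distance and a simple majority vote (the same choices documented for knn() in the class package). The helper name knn_predict and the six toy points are hypothetical, for illustration only; unlike knn(), this sketch does not break ties at random.

```r
# Hypothetical helper: classify one query point with k-NN
# (Euclidean distance + majority vote); for illustration only
knn_predict <- function(train, labels, query, k = 3) {
  # Euclidean distance from the query to every training row
  dists <- sqrt(rowSums(sweep(train, 2, query)^2))
  # Labels of the k closest training examples
  nearest <- labels[order(dists)[seq_len(k)]]
  # Majority vote; ties go to the class name that sorts first
  votes <- table(nearest)
  names(votes)[which.max(votes)]
}

# Hypothetical toy data: two features, two well-separated classes
train <- matrix(c(1.0, 1.1,
                  1.2, 0.9,
                  0.8, 1.0,
                  5.0, 5.2,
                  5.1, 4.9,
                  4.8, 5.0), ncol = 2, byrow = TRUE)
labels <- c("B", "B", "B", "M", "M", "M")

knn_predict(train, labels, c(1.1, 1.0), k = 3)  # returns "B"
knn_predict(train, labels, c(4.9, 5.1), k = 3)  # returns "M"
```

An odd k avoids tied votes when there are only two classes.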
9. ANALYSIS WITH R LANGUAGE
STEP 1 – COLLECTING DATA
We will use the Wisconsin Breast Cancer Diagnostic dataset from the UCI
Machine Learning Repository at http://archive.ics.uci.edu/ml. The data were donated
by researchers at the University of Wisconsin. The dataset includes 569
examples of cancer biopsies, each with 32 features: an identification number, the
diagnosis (B = benign, M = malignant), and 30 numeric measurements of the cell nuclei.
10. STEP 2 – EXPLORING AND PREPARING THE DATA
Importing the dataset into R:
> wbcd <- read.csv("wisc_bc_data.csv", stringsAsFactors = FALSE)
The table() function shows that 357 masses are benign and 212 are malignant:
> table(wbcd$diagnosis)
  B   M
357 212
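The slide stops after counting the classes. The sketch below shows two typical preparation steps, both assumptions rather than something shown in these slides: the reported counts are turned into percentages, and a hypothetical helper normalize() rescales a numeric feature to the range [0, 1] (min-max normalization). Rescaling matters for k-NN because Euclidean distances are otherwise dominated by features measured on larger scales. The commented lines sketch how it might be applied to wbcd, assuming the first column holds the ID and the second the diagnosis.

```r
# Class proportions from the counts reported by table(wbcd$diagnosis)
counts <- c(B = 357, M = 212)
round(prop.table(counts) * 100, 1)  # B: 62.7, M: 37.3 (percent)

# Hypothetical helper: min-max normalization to the range [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

normalize(c(1, 2, 3, 4, 5))  # 0.00 0.25 0.50 0.75 1.00

# Hypothetical usage on the real data (assumes column 1 = ID,
# column 2 = diagnosis, columns 3 to 32 = the 30 numeric features):
# wbcd <- wbcd[-1]                      # drop the ID column
# wbcd$diagnosis <- factor(wbcd$diagnosis, levels = c("B", "M"),
#                          labels = c("Benign", "Malignant"))
# wbcd_n <- as.data.frame(lapply(wbcd[2:31], normalize))
```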
11. STEP 3 – TRAINING A MODEL ON THE DATA
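The knn() function from the class package classifies each test example using its
k = 21 nearest training examples:
> library(class)
> wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
+                       cl = wbcd_train_labels, k = 21)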
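The objects wbcd_train, wbcd_test and wbcd_train_labels are not defined on these slides, so the call cannot be run on its own. This sketch runs the same kind of knn() call end to end on a tiny hypothetical dataset (two invented, already-scaled features and six labelled rows). It is not the Wisconsin data, and k = 3 is used only because the toy training set is so small.

```r
library(class)  # knn() ships with R's recommended 'class' package

# Hypothetical toy data standing in for wbcd_train / wbcd_test
# (two already-normalized features; NOT the Wisconsin data)
wbcd_train <- data.frame(x1 = c(0.10, 0.20, 0.15, 0.80, 0.90, 0.85),
                         x2 = c(0.20, 0.10, 0.15, 0.90, 0.80, 0.85))
wbcd_train_labels <- factor(c("B", "B", "B", "M", "M", "M"))
wbcd_test <- data.frame(x1 = c(0.12, 0.88),
                        x2 = c(0.18, 0.90))

# k must not exceed the number of training rows; 3 suits this toy set
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 3)
wbcd_test_pred
# [1] B M
# Levels: B M
```

knn() returns a factor with one predicted label per test row, which is what the confusion matrix in the next step compares against the true labels.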
12. CONFUSION MATRIX AND ACCURACY
A confusion matrix is a table used to evaluate the performance of a
classification model on a set of test data. It can only be computed when the true values
for the test data are known.
The confusion matrix is then used to calculate accuracy.
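As a hedged illustration, base R's table() can cross-tabulate actual against predicted labels. The ten labels below are invented for the example; with the real model they would be the true diagnoses of the test set and wbcd_test_pred. Treating malignant (M) as the positive class, the diagonal cells are the correct predictions (true negatives and true positives) and the off-diagonal cells are the errors.

```r
# Hypothetical actual and predicted labels for 10 test cases
actual    <- factor(c("B", "B", "B", "B", "B", "B", "M", "M", "M", "M"),
                    levels = c("B", "M"))
predicted <- factor(c("B", "B", "B", "B", "B", "M", "M", "M", "M", "B"),
                    levels = c("B", "M"))

# Rows = actual class, columns = predicted class
cm <- table(Actual = actual, Predicted = predicted)
cm
# B->B: 5 (true negatives),  B->M: 1 (false positive)
# M->B: 1 (false negative),  M->M: 3 (true positives)
```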
13. CONFUSION MATRIX AND ACCURACY
Accuracy is a classification metric that gives the percentage of correct
predictions. We calculate it by dividing the number of correct predictions by the
total number of predictions.
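In confusion-matrix terms, accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below computes it as the sum of the diagonal (the correct predictions) divided by the sum of all cells; the function name accuracy() and the counts in cm are hypothetical.

```r
# Hypothetical helper: correct predictions lie on the diagonal
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Hypothetical 2x2 confusion matrix (rows = actual, columns = predicted)
cm <- matrix(c(5, 1,
               1, 3), nrow = 2, byrow = TRUE,
             dimnames = list(Actual = c("B", "M"),
                             Predicted = c("B", "M")))

accuracy(cm)  # (5 + 3) / 10 = 0.8, i.e. 80% of predictions are correct
```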