MAJOR PROJECT
ON
BREAST CANCER DETECTION
Guided By: Bivasa Ranjan Parida
Submitted By:
Mahima Milan Mohapatra - 1801219074
Suroshree Ghosh - 1801219161
Gyana Prakash Sahoo - 1801219056
Biswajit Sahoo - 1801219035
Dept. of Computer Science & Engineering
College of Engineering Bhubaneswar
:Content:
Introduction
System Specification
Methodologies
System Architecture
Project Interface
Task Performed
Confusion Matrix
Project Interface for breast cancer detected
Project Interface for breast cancer not detected
Advantages & Disadvantages
Applications
Future Scope
Conclusion
Reference
Introduction
 Cancer is a disease in which abnormal cells divide uncontrollably and destroy body tissue.
 Tumors are mainly of two types:
 Malignant (cancerous)
 Benign (non-cancerous)
 Breast cancer is the second leading cause of cancer deaths among women.
 At the same time, it is among the most curable cancer types if it is diagnosed early.
System Specification
Hardware Requirements:
 System: Pentium IV 2.4 GHz
 Hard Disk: 500 GB
 RAM: 4 GB
 Any desktop/laptop system with the above configuration or higher
Software Requirements:
 Operating System: Windows 7 and above
 Coding Language: Python 2.7 and above
 Scripting tool: Jupyter Notebook
 Libraries: Pandas, NumPy, scikit-learn (sklearn), stats, Matplotlib, statistics.
Methodologies
What is a Support Vector Machine (SVM)?
• A supervised pattern classification method
• A powerful and versatile machine learning model
• Well suited to small or medium-sized datasets
• A training algorithm for learning classification and regression rules from data
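To make the idea of supervised classification concrete, here is a minimal sketch that trains an SVM on a tiny hand-made dataset. It assumes scikit-learn is installed; the data points and labels are invented for illustration and are not the project's data.

```python
from sklearn.svm import SVC

# Toy, hand-made training set (illustrative only, not the project's data):
# two features per sample, label 0 for one group and 1 for the other.
X_train = [[1, 1], [1, 2], [2, 1], [2, 2],
           [8, 8], [8, 9], [9, 8], [9, 9]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# A linear kernel is enough for data this cleanly separated.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Classify two unseen points, one near each group.
predictions = clf.predict([[1.5, 1.5], [8.5, 8.5]])
print(predictions)

# The fitted model keeps only the samples that define the margin.
print(len(clf.support_vectors_), "of", len(X_train), "samples are support vectors")
```

The learned rule depends only on the support vectors, which is why SVMs work well on small and medium-sized datasets.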
System Architecture:
Start → Training data (breast cancer detection) → Preprocessed data → Cleaned dataset → Data visualization → Prediction using SVM algorithm → Analysis of the output and performance → Stop
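The flow above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: it uses the breast cancer dataset bundled with scikit-learn as a stand-in for the project's CSV file, skips the visualization step, and the split ratio, scaler and kernel are choices made for this sketch rather than the project's actual settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Training data: scikit-learn's bundled dataset stands in for the project's CSV.
X, y = load_breast_cancer(return_X_y=True)

# Hold out part of the data so performance is measured on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing and prediction: scale the features, then fit an SVM classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# Analysis of the output and performance.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", round(accuracy, 3))
print(matrix)
```

Putting the scaler and the classifier in one pipeline means the scaling parameters are learned from the training split only, so the test split stays unseen.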
Project Interface
Task Performed
Preparing the Data:
The following packages are loaded:
1. import pandas as pd
2. import numpy as np
3. import matplotlib.pyplot as plt
4. import seaborn as sns
Using pandas we will load the dataset and print some basic
information.
df = pd.read_csv("cell_samples.csv")
df.head()
df.tail()
• Output: df.head() and df.tail() display the first and last rows of the dataset used in our model.
• Now we can calculate how many diagnoses are malignant and how many are benign, as shown below.
Output:
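The class count described above can be sketched with pandas as follows. The small DataFrame, the column name "Class" and the text labels are hypothetical stand-ins chosen for this example; the real column name and label coding in cell_samples.csv may differ.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset; the real column name and
# label coding in cell_samples.csv may differ.
df = pd.DataFrame(
    {"Class": ["benign", "malignant", "benign", "benign", "malignant"]}
)

# Count how many samples fall into each diagnosis.
counts = df["Class"].value_counts()
print(counts)

# The same counts expressed as fractions of the whole dataset.
fractions = df["Class"].value_counts(normalize=True)
print(fractions)
```

Checking the class balance early matters because a heavily imbalanced dataset can make accuracy alone a misleading measure.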
• Now we can use seaborn to create a heat map of the correlations between the features.
plt.figure(figsize=(14, 11))
# Correlation matrix of the features, annotated with the coefficient values
sns.heatmap(df.corr(), annot=True, cmap='viridis')
plt.show()
Output:
Fig: Heat map
Why Choose SVC?
(Fig: Confusion Matrix)
                   Predicted negative   Predicted positive
Actual negative    TN = 114             FP = 4
Actual positive    FN = 3               TP = 54
From the confusion matrix we can calculate accuracy, error, precision and recall.
1. Accuracy = (TP + TN) / Total = (114 + 54) / 175 = 168 / 175 = 0.96
2. Error = 1 - Accuracy = 1 - 0.96 = 0.04
3. Precision = TP / Predicted positive = 54 / 58 = 0.93
4. Recall = TP / Actual positive = 54 / 57 = 0.95
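The arithmetic above can be checked with a short script. TN and TP are taken directly from the figures on this slide; FP and FN are not stated explicitly and are derived here from the predicted-positive total (58) and the actual-positive total (57).

```python
# TN and TP are given on the slide; FP and FN are derived from the
# "predicted positive = 58" and "actual positive = 57" totals.
tn, tp = 114, 54
fp = 58 - tp  # predicted positives that were actually negative
fn = 57 - tp  # actual positives that were predicted negative
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
error = 1 - accuracy
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("Total samples:", total)
print("Accuracy: ", round(accuracy, 2))
print("Error:    ", round(error, 2))
print("Precision:", round(precision, 2))
print("Recall:   ", round(recall, 2))
```

In a screening setting, recall deserves particular attention because each false negative is a malignant case that the model missed.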
Project Interface for Breast Cancer Detected
Project Interface for Breast Cancer Not Detected
Advantages
 Effective in high-dimensional spaces.
 Still effective in cases where the number of dimensions is greater than the number of samples.
 Memory efficient, because the decision function uses only a subset of the training points (the support vectors).
Disadvantages
 If the number of features is much greater than the number of samples, the kernel function and regularization term must be chosen carefully to avoid over-fitting.
 SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
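As a rough illustration of the probability point, the sketch below uses scikit-learn's SVC with probability=True, which asks the library to fit the extra cross-validated calibration during training. The toy data is invented for this example, and newer scikit-learn releases may recommend a separate calibration step instead of this flag.

```python
from sklearn.svm import SVC

# Toy data (illustrative only): ten points near the origin labelled 0,
# ten points near (5, 5) labelled 1.
X = [[i * 0.1, (i % 3) * 0.3] for i in range(10)]
X += [[5 + i * 0.1, 5 + (i % 3) * 0.3] for i in range(10)]
y = [0] * 10 + [1] * 10

# probability=True triggers the extra internal cross-validation at fit time.
clf = SVC(kernel="linear", probability=True, random_state=0)
clf.fit(X, y)

samples = [[0.5, 0.5], [5.5, 5.5]]

# Calibrated class probabilities: one row per sample, one column per class.
proba = clf.predict_proba(samples)
print(proba)

# The raw signed distance to the boundary is available without calibration.
scores = clf.decision_function(samples)
print(scores)
```

When only a ranking or a yes/no decision is needed, the decision_function scores avoid the cost of the calibration step.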
Applications
 Early detection leads to more treatment options and a better chance of survival.
 Breast cancer detected at an early stage has a 93 percent or higher survival rate in the first five years.
 It is much easier to treat at an early stage than at a late stage.
Future Scope
Detecting breast cancer at an early stage will help save the lives of thousands of women, and even men. Hence, this project can play an important role in the future:
• Projects like this help real-world patients and doctors gather as much information as they can.
• By using machine learning algorithms we will be able to classify a tumor as benign or malignant.
• Machine learning algorithms can be used for medical research; they improve the system and reduce human error and manual mistakes.
Conclusion
• After applying the different classification models, we obtained the following accuracies: Decision Tree, K-NN, Support Vector Machine and Logistic Regression achieved 94.64 percent, 89.22 percent, 96.87 percent and 94.67 percent accuracy, respectively.
• This research established the model's performance and the significant factors affecting breast cancer patients' survival rates, which may be used in clinical practice, especially in the Asian context.
Reference
1. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-8-56
2. https://airccse.org/journal/ijdps/papers/4313ijdps09.pdf
3. https://link.springer.com/article/10.1007/s10489-007-0073-z
4. https://www.sciencedirect.com/science/article/pii/S1877050916302575
5. https://www.academia.edu/71848246/Prediction_of_Breast_Cancer_Disease_using_Machine_Learning_Algorithms
Thank You
