SESSION 2020-2021
SUBMITTED TO:-
INDRANIL DAS
SIGN:……………….
SUBMITTED BY:-
ARSHITA BISWAS
ACKNOWLEDGEMENT
I would like to express my special thanks and gratitude to my
teacher, Mr. Indranil Das, who gave me the golden opportunity
to do this wonderful project on the topic Breast Cancer
Prediction Model. The project also led me to do a lot of
research, through which I came to know about many new things. I am
really thankful to him.
Secondly, I would also like to thank my friends, who helped me
a lot in finalizing this project within the limited time frame.
PYTHON BASICS
Python is a general-purpose, high-level programming language. You
can use Python to develop desktop GUI applications, websites and
web applications. As a high-level programming language, Python also
lets you focus on the core functionality of the application by taking
care of common programming tasks.
Python is also widely used for machine learning and artificial intelligence.
WHY PYTHON?
More productive. The first and foremost reason for Python's
popularity is that it is often considered more productive than other
programming languages such as C++ and Java. ... Python is also well
known for its simple syntax, code readability and English-like
commands, which make coding in Python easier and more efficient.
WHAT IS MACHINE LEARNING?
Machine learning (ML) is the study of computer algorithms that improve
automatically through experience. It is seen as a subset of artificial
intelligence. Machine learning algorithms build a mathematical
model based on sample data, known as "training data", in order to make
predictions or decisions without being explicitly programmed to do
so. Machine learning algorithms are used in a wide variety of applications,
such as email filtering and computer vision, where it is difficult or
infeasible to develop conventional algorithms to perform the needed
tasks.
WHY IS PYTHON GOOD FOR MACHINE LEARNING?
Many developers choose Python as their go-to programming
language because of the many benefits that make it particularly suitable
for machine learning and deep learning projects. Python's simple syntax
and readability promote rapid testing of complex algorithms and make
the language accessible to non-programmers.
IMPORTANT LIBRARIES FOR MACHINE LEARNING
NumPy
Scikit-learn
Pandas
Matplotlib
NUMPY LIBRARY
NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along
with a large collection of high-level mathematical functions to
operate on these arrays.
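As a small illustration of the array support described above, the following sketch builds a two-dimensional array and applies a few vectorized operations. The values are made up for the example and are not taken from the project data.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns; the values are illustrative only.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

col_means = a.mean(axis=0)  # mean of each column
doubled = a * 2             # element-wise multiplication, no explicit loop
product = a @ a.T           # matrix product of a with its transpose, shape (2, 2)

print(a.shape)
print(col_means)
print(product)
```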
SCIKIT-LEARN LIBRARY
Scikit-learn is a free software machine learning library for the
Python programming language. It features various
classification, regression and clustering algorithms, including
support vector machines.
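A minimal sketch of how a scikit-learn classifier is typically used, here with a support vector machine. It assumes the breast cancer data set bundled with scikit-learn (load_breast_cancer) rather than the project's CSV file, and the pipeline and parameter choices are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: use scikit-learn's bundled data set instead of breast-cancer.csv.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale the features, then fit a support vector classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

svm_accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
print("SVM accuracy:", svm_accuracy)
```

Most scikit-learn estimators follow the same pattern of fit, then predict or score, so another classifier can be swapped in by changing one line.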
PANDAS LIBRARY
In computer programming, pandas is a software library written for
the Python programming language for data manipulation and
analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series.
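The table handling described above can be sketched with a tiny DataFrame. The column names and values below are hypothetical and only mimic the shape of the project data.

```python
import pandas as pd

# Hypothetical miniature table; column names and values are illustrative only.
df_demo = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.9, 11.4, 12.1, 20.6],
})

# Count the rows in each class and average a numeric column per class.
counts = df_demo["diagnosis"].value_counts()
mean_by_class = df_demo.groupby("diagnosis")["radius_mean"].mean()

print(counts)
print(mean_by_class)
```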
MATPLOTLIB LIBRARY
Matplotlib is a plotting library for the Python programming
language and its numerical mathematics extension NumPy. It
provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits like Tkinter.
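A short sketch of the object-oriented style mentioned above: the figure and axes are created as objects and the plot is configured through their methods. The non-interactive "Agg" backend is an assumption made here so that the example runs without a display, and the plotted values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Create a figure and one axes object, then draw on the axes.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Object-oriented Matplotlib sketch")
fig.canvas.draw()  # render the figure in memory
```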
HOW DOES LINEAR REGRESSION WORK?
Linear regression is a machine learning algorithm based on
supervised learning. ... Linear regression predicts the value of a
dependent variable (y) from a given independent variable (x) by fitting
a straight line to the training data.
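The idea of predicting y from x can be sketched with ordinary least squares for a single input variable. The function name fit_line and the sample points are hypothetical; this is one common way to fit the line, not necessarily the method used in the project.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predicted y for x = 5
print(slope, intercept, prediction)
```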
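HOW DOES LOGISTIC REGRESSION WORK?
Logistic regression is a classification algorithm. It is used to
predict a binary outcome (1 / 0, Yes / No, True / False) given a set
of independent variables. It passes a weighted sum of the inputs
through the logistic (sigmoid) function to obtain a probability, which
is then compared with a threshold to choose the class. To represent a
binary or categorical outcome, we use dummy variables.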
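A minimal sketch of how a fitted logistic regression model turns inputs into a binary prediction: a weighted sum is passed through the sigmoid function and the resulting probability is compared with a threshold. The weights, bias and inputs below are hypothetical values chosen for illustration; in practice they are learned from the training data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Probability of the positive class for one input vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict_label(x, weights, bias, threshold=0.5):
    """Return 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical parameters, not learned from any data.
weights = [1.5, -2.0]
bias = 0.25

p = predict_proba([2.0, 0.5], weights, bias)  # z = 3.0 - 1.0 + 0.25 = 2.25
print(p, predict_label([2.0, 0.5], weights, bias))
```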
HOW DOES KNN WORK?
KNN works by finding the distances between a query and all
the examples in the data, selecting the specified number of
examples (K) closest to the query, and then voting for the most
frequent label among them.
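The three steps above (compute distances, keep the K closest examples, take a majority vote) can be sketched in plain Python. The function name knn_predict and the toy points are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training example.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k examples with the smallest distance.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: return the most frequent label among them.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two well-separated groups labelled "B" and "M".
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```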
HOW DOES RANDOMFORESTCLASSIFIER WORK?
The random forest combines hundreds or thousands of decision trees, trains
each one on a slightly different set of the observations, and splits nodes in
each tree considering only a limited number of the features. For
classification, the final prediction combines the predictions of the
individual trees, by majority vote or by averaging their predicted class
probabilities; for regression, the predictions of the trees are averaged.
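The combination step can be checked directly in scikit-learn, whose RandomForestClassifier averages the class probabilities predicted by its trees. The sketch below assumes the bundled load_breast_cancer data set instead of the project's CSV file, and the number of trees is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled data set stands in for breast-cancer.csv.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Each tree is trained on a bootstrap sample and considers a random
# subset of the features (here the square root of their number) per split.
forest = RandomForestClassifier(n_estimators=25, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)

# Average the per-tree class probabilities and pick the most probable class.
mean_proba = np.mean(
    [tree.predict_proba(X_test) for tree in forest.estimators_], axis=0
)
manual_pred = forest.classes_[np.argmax(mean_proba, axis=1)]
forest_pred = forest.predict(X_test)

print("same predictions:", np.array_equal(manual_pred, forest_pred))
print("accuracy:", forest.score(X_test, y_test))
```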
HOW DOES A DECISION TREE WORK?
A decision tree builds classification or regression models in the form
of a tree structure. It breaks down a data set into smaller and
smaller subsets while, at the same time, an associated decision
tree is incrementally developed.
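How a tree decides where to break the data into subsets can be sketched with the Gini impurity, one commonly used split criterion. The helper names gini and best_threshold and the sample values are hypothetical, and the search is simplified to a single feature with thresholds taken at the observed values.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    if n == 0:
        return 0.0
    impurity = 1.0
    for cls in set(labels):
        p = labels.count(cls) / n
        impurity -= p * p
    return impurity

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (t, weighted)
    return best

# Illustrative feature values with their class labels.
values = [11.0, 12.0, 13.0, 18.0, 19.0, 21.0]
labels = ["B", "B", "B", "M", "M", "M"]

print(gini(labels))
print(best_threshold(values, labels))
```

A full tree repeats this search on each resulting subset until the subsets are pure or a stopping rule applies.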
BREAST CANCER PREDICTION MODEL
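# Libraries for data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Read the data set and inspect it
df = pd.read_csv("breast-cancer.csv")
df.head()
df.info()
df.isnull().sum()
# Drop the identifier column, which carries no predictive information
df = df.drop(["id"], axis=1)
df.head()
df.info()
df.isnull().sum()
# Features: every column after the first. Target: the first column
# (assumed to be the diagnosis once "id" has been dropped)
df_x = df.iloc[:, 1:].values
df_y = df.iloc[:, 0]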
# Hold out 30% of the rows for testing
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(df_x, df_y, test_size=0.3, random_state=42)
# Standardize the features; the scaler is fitted on the training set only
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train_x)
train_x = scaler.transform(train_x)
test_x = scaler.transform(test_x)
# K-nearest neighbours classifier with K = 5
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(train_x, train_y)
y_pred = classifier.predict(test_x)
# Evaluate the predictions on the held-out test set
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(test_y, y_pred))
print(classification_report(test_y, y_pred))
from sklearn import metrics
print("accuracy:", metrics.accuracy_score(test_y, y_pred) * 100)
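# Class balance: number of samples per diagnosis
import seaborn as sb
sb.countplot(x='diagnosis', data=df)
plt.grid()
plt.show()
# Correlation heatmap over the numeric columns only (text columns cannot be correlated)
plt.figure(figsize=(20, 10))
sb.heatmap(df.select_dtypes(include="number").corr(), cmap='Blues')
plt.show()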
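# Random forest with 10 trees (named "forest" to avoid shadowing the random module)
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=10)
forest.fit(train_x, train_y)
y_pred = forest.predict(test_x)
from sklearn.metrics import accuracy_score
print("random forest accuracy:", accuracy_score(test_y, y_pred))
# Single decision tree for comparison
from sklearn.tree import DecisionTreeClassifier
decision = DecisionTreeClassifier()
decision.fit(train_x, train_y)
y_pred = decision.predict(test_x)
print("decision tree accuracy:", accuracy_score(test_y, y_pred))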
