ACKNOWLEDGEMENT
I would like to express my special gratitude to my teacher,
Mr. Indranil Das, who gave me the golden opportunity to do this
wonderful project on the topic Breast Cancer Prediction Model.
It also helped me do a lot of research, and I came to know
about many new things. I am really thankful to him.
Secondly, I would also like to thank my friends, who helped me
a lot in finalizing this project within the limited time frame.
PYTHON BASICS
Python is a general-purpose, high-level programming language. You
can use Python to develop desktop GUI applications, websites, and
web applications. As a high-level language, Python also lets you
focus on the core functionality of an application by taking care of
common programming tasks.
Python is also widely used for machine learning and artificial intelligence.
WHY PYTHON?
More productive. The first and foremost reason Python is so
popular is that it is more productive than other programming
languages such as C++ and Java. ... Python is also well known
for its simple syntax, code readability, and English-like commands,
which make coding in Python easier and more efficient.
WHAT IS MACHINE LEARNING?
Machine learning (ML) is the study of computer algorithms that improve
automatically through experience. It is seen as a subset of artificial
intelligence. Machine learning algorithms build a mathematical
model based on sample data, known as "training data", in order to make
predictions or decisions without being explicitly programmed to do
so. Machine learning algorithms are used in a wide variety of applications,
such as email filtering and computer vision, where it is difficult or
infeasible to develop conventional algorithms to perform the needed
tasks.
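To make "learning from training data" concrete, here is a minimal pure-Python sketch that is not part of the original project. Instead of hard-coding a rule, it derives a decision threshold from a few made-up labelled values (halfway between the mean of each class, a one-feature nearest-centroid rule) and then uses the learned threshold to classify new values. The function names and numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical example: learn a classification rule from labelled data
# instead of writing the rule by hand.

def fit_threshold(values, labels):
    """Learn a threshold halfway between the mean value of class 0 and class 1."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    # Also remember which side of the threshold class 1 lies on
    return (mean0 + mean1) / 2, mean1 > mean0

def predict_label(value, threshold, class1_is_above):
    """Apply the learned rule to a new value."""
    above = value > threshold
    return 1 if above == class1_is_above else 0

# Made-up "training data": one feature value per example plus its label
train_values = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
train_labels = [0, 0, 0, 1, 1, 1]

threshold, class1_is_above = fit_threshold(train_values, train_labels)
print(threshold)                                        # 4.0
print(predict_label(5.0, threshold, class1_is_above))   # 1
print(predict_label(2.5, threshold, class1_is_above))   # 0
```

If the training data changed, the threshold would change with it; nothing in the prediction code has to be rewritten by hand.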
WHY IS PYTHON GOOD FOR MACHINE LEARNING?
Many developers choose Python as their go-to programming language
for machine learning and deep learning projects because of the many
benefits that suit this kind of work. Python's simple syntax and
readability support rapid testing of complex algorithms and make
the language accessible to non-programmers.
IMPORTANT LIBRARIES FOR MACHINE LEARNING
NumPy
Scikit-learn
Pandas
Matplotlib
NUMPY LIBRARY
NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along
with a large collection of high-level mathematical functions to
operate on these arrays.
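A short sketch of what "multi-dimensional arrays" and "high-level mathematical functions" mean in practice. The array values are arbitrary examples, not project data.

```python
import numpy as np

# A 2-D array (a matrix) with 2 rows and 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)         # (2, 3)

# Element-wise arithmetic is applied to every entry without a Python loop
print(a * 10)          # every element multiplied by 10

# Mathematical functions can work along an axis
print(a.sum(axis=0))   # [5 7 9]   (column sums)
print(a.mean(axis=1))  # [2. 5.]   (row means)

# Matrix multiplication: (2 x 3) @ (3 x 2) gives a (2 x 2) result
product = a @ a.T
print(product)         # rows: [14 32] and [32 77]
```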
SCIKIT-LEARN LIBRARY
Scikit-learn is a free software machine learning library for the
Python programming language. It features various
classification, regression, and clustering algorithms, including
support vector machines.
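The sketch below shows the fit/predict pattern that scikit-learn estimators share, using a support vector machine as the text mentions. It assumes scikit-learn's bundled `load_breast_cancer` dataset as a stand-in for the project's CSV; the scaler-plus-SVC pipeline is an illustrative choice, not the model used later in this report.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn bundles the Wisconsin diagnostic breast cancer data (569 samples, 30 features)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Every estimator shares the same interface: construct it, fit(), then predict() or score()
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"SVM test accuracy: {accuracy:.3f}")
```

Swapping `SVC` for another classifier changes only the constructor line; `fit`, `predict`, and `score` stay the same.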
PANDAS LIBRARY
In computer programming, pandas is a software library written for
the Python programming language for data manipulation and
analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series.
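A small sketch of the "numerical tables and time series" operations described above. The rows are illustrative values only, not taken from the project's CSV.

```python
import pandas as pd

# Illustrative rows only (not taken from the project's dataset)
df = pd.DataFrame({
    "radius_mean": [18.0, 13.5, 20.5, 8.2],
    "diagnosis": ["M", "B", "M", "B"],
})

# Filtering rows with a boolean condition
malignant = df[df["diagnosis"] == "M"]
print(len(malignant))  # 2

# Grouping and aggregating: mean radius per diagnosis
means = df.groupby("diagnosis")["radius_mean"].mean()
print(means)

# Time series: a date index enables resampling, here summing into 2-day bins
ts = pd.Series([1, 2, 3, 4], index=pd.date_range("2024-01-01", periods=4, freq="D"))
two_day_totals = ts.resample("2D").sum()
print(two_day_totals)
```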
MATPLOTLIB LIBRARY
Matplotlib is a plotting library for the Python programming
language and its numerical mathematics extension NumPy. It
provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits such as Tkinter.
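A minimal sketch of Matplotlib's object-oriented API: create a Figure and an Axes, then call methods on the Axes. It selects the non-interactive "Agg" backend so it also runs without a display, and the output file name `sine_cosine.png` is a hypothetical choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Object-oriented API: a Figure object holding one Axes object
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Two NumPy arrays plotted with Matplotlib")
ax.legend()

fig.savefig("sine_cosine.png")  # hypothetical output file name
plt.close(fig)
```

In a GUI application the same `fig` object could be embedded in a toolkit window instead of being saved to a file.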
HOW DOES LINEAR REGRESSION WORK?
Linear regression is a machine learning algorithm based on
supervised learning. ... It predicts the value of a dependent variable (y)
from a given independent variable (x) by fitting the straight line
y = b0 + b1x that minimizes the sum of squared differences between
the predicted and actual values of y.
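A worked sketch of simple linear regression using the ordinary least-squares formulas for one feature. The five data points are made up to lie roughly on y = 2x + 1; `predict_linear` is a hypothetical helper name.

```python
import numpy as np

# Illustrative data lying roughly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Ordinary least squares for a single feature:
#   slope     b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   intercept b0 = y_mean - b1 * x_mean
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

def predict_linear(x_new):
    """Predict y for a new x using the fitted line."""
    return b0 + b1 * x_new

print(f"y = {b0:.2f} + {b1:.2f}x")      # y = 1.09 + 1.97x
print(round(predict_linear(6.0), 2))    # 12.91
```

In practice, `sklearn.linear_model.LinearRegression` computes the same least-squares fit and also handles several features at once.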
HOW DOES LOGISTIC REGRESSION WORK?
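Logistic regression is a classification algorithm. It is used to
predict a binary outcome (1 / 0, Yes / No, True / False) from a set
of independent variables. It passes a weighted sum of the inputs
through the sigmoid function 1 / (1 + e^(-z)), which turns it into a
probability between 0 and 1, and predicts class 1 when that probability
reaches a threshold such as 0.5. To represent a binary or categorical
outcome, we use dummy variables.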
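A from-scratch sketch of logistic regression trained by plain gradient descent on the average log-loss. The learning rate, epoch count, and the symmetric one-feature data are illustrative assumptions; `fit_logistic` and `predict_logistic` are hypothetical helper names, and scikit-learn's `LogisticRegression` would normally be used instead.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by gradient descent on the average log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                  # predicted P(class 1) for every sample
        w -= lr * (X.T @ (p - y)) / n_samples   # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient of the log-loss w.r.t. b
    return w, b

def predict_logistic(X, w, b, threshold=0.5):
    """Predict class 1 when the probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Illustrative one-feature data: negative values are class 0, positive values class 1
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = fit_logistic(X, y)
print(predict_logistic(X, w, b))            # [0 0 0 1 1 1]
print(sigmoid(np.array([[0.5]]) @ w + b))   # probability of class 1 at x = 0.5
```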
HOW DOES KNN WORK?
KNN works by computing the distance between a query and every
example in the data, selecting the specified number of examples (K)
closest to the query, and then voting for the most frequent label
among them.
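The three steps above (distances, K closest, majority vote) can be sketched in pure Python as follows. Euclidean distance is an assumption here (it is also scikit-learn's default for `KNeighborsClassifier`), and the points and "B"/"M" labels are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Distance from the query to every training example (Euclidean)
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # 2. Keep the k closest examples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Vote for the most frequent label among them
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Illustrative 2-D training data with two labels
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_labels = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(train_points, train_labels, (2, 2), k=3))  # B
print(knn_predict(train_points, train_labels, (6, 5), k=3))  # M
```

Because distances are compared directly, features on large scales dominate the vote, which is why feature standardization is usually applied before KNN.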
HOW DOES RANDOMFORESTCLASSIFIER WORK?
A random forest combines hundreds or thousands of decision trees. Each
tree is trained on a slightly different random sample of the observations,
and each node split considers only a limited random subset of the
features. The final prediction of the random forest is made by averaging
the predictions of the individual trees.
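To illustrate the description above, this sketch builds a small forest by hand from scikit-learn decision trees: each tree gets a bootstrap sample of the rows (drawn with replacement) and `max_features="sqrt"` so each split sees a random subset of features, and the trees' class probabilities are averaged. This is an assumption-laden approximation of what `RandomForestClassifier` does internally, not its actual code; the 50-tree count and the bundled `load_breast_cancer` data are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(0)
trees = []
for i in range(50):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": each split only considers a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Average the trees' class-probability estimates, then pick the most probable class
avg_proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
y_pred = avg_proba.argmax(axis=1)
accuracy = np.mean(y_pred == y_test)
print(f"hand-made forest accuracy: {accuracy:.3f}")
```

Each individual tree overfits its own sample, but their errors differ, so averaging them tends to give a more stable prediction than any single tree.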
HOW DOES A DECISION TREE WORK?
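A decision tree builds classification or regression models in the form
of a tree structure. It breaks a dataset down into smaller and
smaller subsets while an associated decision tree is incrementally
developed. At each node it chooses the feature and threshold that best
separate the data (for classification, commonly by minimizing Gini
impurity or entropy), and each leaf holds a final prediction.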
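A compact from-scratch sketch of how a classification tree can be grown: try every feature and threshold, keep the split with the lowest weighted Gini impurity, and recurse until a node is pure or a depth limit is reached. Gini impurity, the `max_depth=3` limit, the helper names, and the toy data are all illustrative assumptions; scikit-learn's `DecisionTreeClassifier` uses a much more optimized implementation.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) of the best binary split."""
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted(set(row[f] for row in rows)):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, threshold, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split the data into smaller subsets; leaves hold the majority label."""
    feature, threshold, _ = best_split(rows, labels)
    if feature is None or depth == max_depth:
        return max(set(labels), key=labels.count)
    left = [(r, l) for r, l in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }

def predict_tree(node, row):
    """Follow the splits from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Illustrative two-feature data
rows = [(2.0, 1.0), (3.0, 1.5), (6.0, 5.0), (7.0, 4.5)]
labels = ["B", "B", "M", "M"]
tree = build_tree(rows, labels)
print(tree)                            # root splits on feature 0 at threshold 3.0
print(predict_tree(tree, (2.5, 1.2)))  # B
print(predict_tree(tree, (6.5, 4.0)))  # M
```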
BREAST CANCER PREDICTION MODEL
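import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset and take a first look at it
df = pd.read_csv("breast-cancer.csv")
print(df.head())
df.info()
print(df.isnull().sum())

# "id" is an identifier, not a measurement, so it is dropped before modelling
df = df.drop(["id"], axis=1)
print(df.head())
df.info()
print(df.isnull().sum())

# Features: every column after the first; label: the first column ("diagnosis")
# (assumes "diagnosis" is the first column once "id" has been dropped)
df_x = df.iloc[:, 1:].values
df_y = df.iloc[:, 0]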
# Hold out 30% of the rows as a test set
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(df_x, df_y, test_size=0.3, random_state=42)

# Standardize features: fit the scaler on the training set only, then apply it to both sets
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train_x)
train_x = scaler.transform(train_x)
test_x = scaler.transform(test_x)
# Model 1: K-nearest neighbours with K = 5
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(train_x, train_y)
y_pred = classifier.predict(test_x)

# Evaluate on the held-out test set
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(test_y, y_pred))
print(classification_report(test_y, y_pred))
from sklearn import metrics
print("accuracy:", metrics.accuracy_score(test_y, y_pred) * 100)
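# Class balance of the diagnosis labels
import seaborn as sb
sb.countplot(x='diagnosis', data=df)
plt.grid()
plt.show()
# Correlation heatmap; numeric_only=True skips text columns (pandas 2.x raises an error on them)
plt.figure(figsize=(20, 10))
sb.heatmap(df.corr(numeric_only=True), cmap='Blues')
plt.show()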
# Model 2: random forest (named "forest" so it does not shadow Python's random module)
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators=10)
forest.fit(train_x, train_y)
y_pred = forest.predict(test_x)
from sklearn.metrics import accuracy_score
print("random forest accuracy:", accuracy_score(test_y, y_pred))

# Model 3: a single decision tree
from sklearn.tree import DecisionTreeClassifier
decision = DecisionTreeClassifier()
decision.fit(train_x, train_y)
y_pred = decision.predict(test_x)
print("decision tree accuracy:", accuracy_score(test_y, y_pred))
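The listing above scores each model on a single 70/30 split, so the accuracies depend on which rows happen to land in the test set. A possible refinement, sketched here under assumptions, is 5-fold cross-validation with the scaler inside a pipeline so it is refit on each training fold. It assumes scikit-learn's bundled `load_breast_cancer` data stands in for "breast-cancer.csv" (the CSV's exact contents are not shown here), and it uses 100 trees and `random_state=42` for the forest rather than the 10 trees above; both are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Wisconsin diagnostic data stands in for breast-cancer.csv
X, y = load_breast_cancer(return_X_y=True)

models = {
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision tree": DecisionTreeClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    # The scaler is refit on each training fold, so test folds never influence it
    pipeline = make_pipeline(StandardScaler(), model)
    fold_scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {fold_scores.mean() * 100:.1f}% (+/- {fold_scores.std() * 100:.1f})")
```

Reporting the mean and spread over five folds gives a fairer comparison between the three models than one accuracy figure each.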