Course Title Portfolio
Name
Email
Abstract—This document describes the use of density estimation with a two-class Naïve Bayes classifier to label handwritten images of the digits 0 and 1. Two features, the mean and the standard deviation of the pixel brightness values, were extracted from each image; the mean and variance of each feature over the training sets were used to build Gaussian probability density functions, and each test image was labeled with whichever digit produced the larger posterior probability. The accuracy of the resulting classifications was then computed for each digit.
Keywords—mean, standard deviation, variance, probability
density function, classifier
I. INTRODUCTION
A Naïve Bayes classifier applies Bayes' theorem under the simplifying assumption that the features of a sample are independent of one another, which makes it a fast and practical choice for two-class problems such as distinguishing handwritten digits [1].
This project practiced density estimation through several calculations via the Naïve Bayes classifier. Two features, the mean and the standard deviation of the pixel brightness values, were extracted from every training image. Without using a built-in function, the mean could be calculated using the equation in Equ. 1 and the standard deviation using the equation in Equ. 2; in the implementation, these values were obtained by calling 'numpy.mean()' and 'numpy.std()' on the training sets for digit 0 and digit 1. The test images were then classified based on these calculations, and the accuracy of the computations was determined.
The project consisted of four tasks:
A. Extract features from the original training set
There were two features that needed to be extracted from
the original training set for each image. The first feature was
the average pixel brightness values within an image array.
The second was the standard deviation of all pixel
brightness values within an image array.
B. Calculate the parameters for the two-class Naïve Bayes
Classifiers
Using the features extracted from task A, multiple
calculations needed to be performed. For the training set
involving digit 0, the mean of all the average brightness
values was calculated. The variance was then calculated for
the same feature, regarding digit 0. Next, the mean of the
standard deviations involving digit 0 had to be computed. In
addition, the variance for the same feature was determined.
These four calculations had to then be repeated using the
training set for digit 1.
C. Classify all unknown labels of incoming data
Using the parameters obtained in task B, every image in
each testing sample had to be compared with the
corresponding training set for that particular digit, 0 or 1.
The probability of that image being a 0 or a 1 needed to be
determined so it could then be classified.
D. Calculate the accuracy of the classifications
Using the predicted classifications from task C, the
accuracy of the predictions needed to be calculated for both
digit 0 and digit 1, respectively.
The mean and the standard deviation of the data were the two features of interest for every image. These features helped formulate the probability density function when determining the classification.
II. DESCRIPTION OF SOLUTION
This project required a series of computations in order to successfully classify each image in the test samples. Once the data was acquired, the appropriate calculations could be made.
A. Finding the mean and standard deviation
The data was provided in the form of NumPy arrays, which made it convenient to perform routine mathematical operations. Without using a built-in function, the first feature, the mean, could be calculated using the equation in Equ. 1, and the second feature, the standard deviation, could be calculated using the equation in Equ. 2. Utilizing the training set for digit 0, the mean of the pixel brightness values was determined by calling 'numpy.mean()' for each image in the set, and the standard deviation was determined by calling 'numpy.std()', another useful NumPy function. These features extracted from the training set for digit 0 also had to be extracted from the training set for digit 1. Once all the features for each image were obtained from both training sets, the next task could be completed.
Equ. 1. Mean formula: μ = (1/N) Σ xᵢ
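The per-image feature extraction described above can be sketched as follows. This is a minimal sketch rather than the project's actual code; `train0` is a hypothetical stand-in for the digit-0 training images, assumed to be a NumPy array of shape (num_images, 28, 28):

```python
import numpy as np

def extract_features(images):
    """Return the mean and standard deviation of pixel brightness per image."""
    flat = images.reshape(images.shape[0], -1)   # one row per image
    return flat.mean(axis=1), flat.std(axis=1)   # numpy.mean() / numpy.std() per image

# Hypothetical stand-in for the MNIST training images for digit 0.
train0 = np.random.rand(100, 28, 28)
means0, stds0 = extract_features(train0)
```

The same call would be repeated on the training set for digit 1 to obtain its feature arrays.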
B. Determining the parameters for the Naïve Bayes
Classifiers
To determine the parameters, four values were computed from the features extracted in task A. Using the training set for digit 0, the mean and the variance of the array of average brightness values were calculated, followed by the mean and the variance of the array of standard deviations. The same four calculations were then performed on the array of the averages and the array of the standard deviations created for digit 1.
Equ. 2. Variance formula: σ² = (1/N) Σ (xᵢ − μ)²
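The four parameters per training set (Equ. 1 and Equ. 2 applied to each feature array) can be sketched as below; the feature arrays and dictionary keys here are illustrative assumptions, not the project's data:

```python
import numpy as np

def nb_parameters(feature_means, feature_stds):
    """Compute the four Naive Bayes parameters for one digit's training set."""
    return {
        "mean_of_means": np.mean(feature_means),  # mean of the average-brightness feature
        "var_of_means": np.var(feature_means),    # variance of that feature (Equ. 2, ddof=0)
        "mean_of_stds": np.mean(feature_stds),    # mean of the standard-deviation feature
        "var_of_stds": np.var(feature_stds),      # variance of that feature
    }

# Hypothetical per-image feature arrays for digit 0.
params0 = nb_parameters(np.array([0.20, 0.25, 0.22]), np.array([0.30, 0.31, 0.29]))
```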
This equation was used to find the probability of each feature under the Gaussian distribution fitted to a digit's training set. For every image in the test sample for digit 0, the mean of the pixel brightness values was determined by calling 'numpy.mean()' and the standard deviation by calling 'numpy.std()'. The probability density function then gave the probability of the image's mean and the probability of its standard deviation under the parameters for digit 0, and the product of the two was multiplied by the prior probability, which is 0.5 in this case because the value is either a 0 or a 1.
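The scoring step described above can be sketched as a univariate Gaussian density for each feature, multiplied together with the 0.5 prior. The parameter values and dictionary keys below are assumptions for illustration only:

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian probability density function."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def class_score(img_mean, img_std, params, prior=0.5):
    """Density of the image's mean, times density of its std, times the prior."""
    return (gaussian_pdf(img_mean, params["mean_of_means"], params["var_of_means"])
            * gaussian_pdf(img_std, params["mean_of_stds"], params["var_of_stds"])
            * prior)

# Hypothetical parameters for digit 0 (mean/variance of each feature).
params0 = {"mean_of_means": 0.22, "var_of_means": 0.01,
           "mean_of_stds": 0.30, "var_of_stds": 0.005}
score0 = class_score(0.21, 0.29, params0)
```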
This entire procedure had to be conducted once again but
utilizing the test sample for digit 1 instead. This meant
finding the mean and standard deviation of each image, using
the probability density function to calculate the probability of
the mean and probability of the standard deviation for digit 0,
and calculating the probability that the image is classified as
digit 0. The same operations had to be performed again, but
for the training set for digit 1. The probability of the image
being classified as digit 0 had to be compared to the
probability of the image being classified as digit 1. Again,
the larger of the two values suggested which digit to classify
as the label.
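The comparison rule in the paragraph above reduces to picking the larger of the two class scores; the score values below are hypothetical:

```python
def classify(score_digit0, score_digit1):
    """Label the image with whichever class score is larger."""
    return 0 if score_digit0 > score_digit1 else 1

label = classify(3.2e-4, 1.1e-6)   # the digit-0 score is larger, so the label is 0
```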
C. Determining the accuracy of the label
For each test sample, the number of images predicted as the correct digit was divided by the total number of images in that sample. The accuracy for digit 0 was therefore the count of test images labeled 0 divided by the total number of images in the test sample for digit 0, and likewise the accuracy for digit 1 was the count labeled 1 divided by the total number of images in the test sample for digit 1.
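The accuracy computation amounts to dividing the count of correct labels by the sample size; a small sketch with made-up predictions:

```python
def accuracy(predicted_labels, true_label):
    """Fraction of test images whose predicted label matches the sample's digit."""
    correct = sum(1 for label in predicted_labels if label == true_label)
    return correct / len(predicted_labels)

# Hypothetical predictions for four images from the digit-0 test sample.
acc0 = accuracy([0, 0, 1, 0], true_label=0)   # 3 of 4 correct -> 0.75
```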
III. RESULTS
The features extracted from the test images for digit 0 were generally larger than those for digit 1: where the mean of the pixel brightness values was higher, the standard deviation was typically also higher.
TABLE I. TRAINING SET FOR DIGIT 0
When comparing the test images, the higher values of the means and the standard deviations were typically labeled as digit 0 and the lower ones as digit 1. This was not always the case, however; otherwise the calculated accuracy would have been 100%.
After classifying all the images in the test sample for digit 0, the total number predicted as digit 0 was 899. This meant that the accuracy of classification was 0000%, which is represented in Fig. 5.
Fig. 5. Accuracy of classification for digit 0
The total number of images in the test sample for digit 1 was 0000. After classifying all the images in the test sample for digit 1, the total number predicted as digit 1 was 00000. This meant that the accuracy of classification was 00000%, which is represented in Fig. 6.
IV. LESSONS LEARNED
The procedures practiced in this project required skill in
the Python programming language, as well as understanding
concepts of statistics. It required plenty of practice to
implement statistical equations, such as finding the mean,
the standard deviation, and the variance. My foundational
knowledge of mathematical operations helped me gain an
initial understanding of how to set up classification
problems. My lack of understanding of the Python language
made it difficult to succeed initially. Proper syntax and
built-in functions had to be learned first before continuing
with solving the classification issue. For example, I had very
little understanding of NumPy prior to this project. I learned
that it was extremely beneficial for producing results of
mathematical operations. One of the biggest challenges for me was creating and navigating through NumPy arrays rather than standard Python lists. Looking back, it was a simple
issue that I solved after understanding how they were
uniquely formed. Once I had a grasp on the language and
built-in functions, I was able to create the probability
density function in the code and then apply classification
towards each image.
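The NumPy-array-versus-Python-list distinction mentioned above comes down to elementwise semantics; a small illustration:

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

repeated = py_list * 2   # list repetition: [1, 2, 3, 1, 2, 3]
scaled = np_array * 2    # elementwise arithmetic: array([2, 4, 6])
```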
One aspect of machine learning that I understood better after completing the project was the Gaussian distribution. This normal distribution displays a bell-shaped curve in which the peak of the bell is located at the mean of the data [4]. A bimodal distribution is one that displays two bell-shaped distributions on the same graph. After calculating the features for both digit 0 and digit 1, the probability density function gave the statistical odds of a particular image being classified under a specific bell-shaped curve. An example of a bimodal distribution can be seen in Fig. 7 below.
Fig. 7. Bimodal distribution example [5]
Upon completion of the project, I was able to realize how these statistical concepts combine into a working classifier: two simple features and their Gaussian densities were enough to separate the two digit classes.
[Chart: Accuracy for Digit 0 — images predicted as digit 0 vs. predicted as digit 1]
V. REFERENCES
[1] N. Kumar, "Naïve Bayes Classifiers," GeeksforGeeks, May 15, 2020. Accessed: Oct. 15, 2021. [Online]. Available: https://www.geeksforgeeks.org/naive-bayes-classifiers/
[2] J. Brownlee, "How to Develop a CNN for MNIST Handwritten Digit Classification," Aug. 24, 2020. Accessed: Oct. 15, 2021. [Online]. Available: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/
[3] "What is NumPy," June 22, 2021. Accessed: Oct. 15, 2021. [Online]. Available: https://numpy.org/doc/stable/user/whatisnumpy.html
[4] J. Chen, "Normal Distribution," Investopedia, Sept. 27, 2021. Accessed: Oct. 15, 2021. [Online]. Available: https://www.investopedia.com/terms/n/normaldistribution.asp
[5] "Bimodal Distribution," Velaction, n.d. Accessed: Oct. 15, 2021. [Online]. Available: https://www.velaction.com/bimodal-distribution/
[Your Name]
[Street Address]
[City, ST ZIP Code]
[Date]
[Recipient Name]
[Title]
[Company Name]
[Street Address]
[City, ST ZIP Code]
Dear [Recipient Name]:
The first paragraph should thank the individual who interviewed you, mentioning the specific title of the position and the date. It should include a leading sentence about your qualifications, and the paragraph should be no longer than three sentences.
The second paragraph should focus on a specific topic covered
in the interview that shows you are a strong candidate for the
position. In this statement, you should tie your strength back to
the company’s projects or goals. The paragraph should be
approximately three to five sentences.
You may choose to do a third paragraph, if you think you did
not cover something that makes you a strong candidate or you
felt that you didn’t answer something to the best of your ability.
In this statement, you may want to reiterate a skill, knowledge
or qualification that makes you a good candidate. This
paragraph should be two to five sentences.
The last paragraph emphasizes your enthusiasm for the position,
the best time and phone number to reach you and mention any
follow-up date that you obtained during the interview. This
should be two to three sentences.
Sincerely,
[Your Name]
2/26/22, 9:04 PM CSE578Project
localhost:8888/nbconvert/html/CSE578Project.ipynb?download
=false 1/13
In [71]: import pandas as pd
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
import warnings
%matplotlib inline

df = pd.read_csv("data/adult.data", header=None, sep=", ")
df.columns = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class"]
df = df[df["workclass"] != '?']
df = df[df["education"] != '?']
df = df[df["marital-status"] != '?']
df = df[df["occupation"] != '?']
df = df[df["relationship"] != '?']
df = df[df["race"] != '?']
df = df[df["sex"] != '?']
df = df[df["native-country"] != '?']
below = df[df["class"] == "<=50K"]
above = df[df["class"] == ">50K"]

<ipython-input-71-d873bf4dac12>:19: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  df = pd.read_csv("data/adult.data", header=None, sep=", ")
2/26/22, 9:04 PM CSE578Project
localhost:8888/nbconvert/html/CSE578Project.ipynb?download
=false 2/13
In [61]: above_50k = Counter(above['native-country'])
below_50k = Counter(below['native-country'])
print('native-country')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

native-country
In [62]: above_50k = Counter(above['race'])
below_50k = Counter(below['race'])
print('race')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

race
In [63]: above_50k = Counter(above['education'])
below_50k = Counter(below['education'])
print('education')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

education
In [64]: above_50k = Counter(above['workclass'])
below_50k = Counter(below['workclass'])
print('workclass')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

workclass
In [65]: fig, axes = plt.subplots(ncols=2, nrows=3, figsize=(8,8))
fig.subplots_adjust(hspace=.5)
x = below['capital-gain']
y = below['age']
axes[0, 0].scatter(x,y)
axes[0, 0].set_title("<=50K")
axes[0, 0].set_xlabel('capital-gain')
axes[0, 0].set_ylabel('age')
x = above['capital-gain']
y = above['age']
axes[0, 1].scatter(x,y)
axes[0, 1].set_title(">50K")
axes[0, 1].set_xlabel('capital-gain')
axes[0, 1].set_ylabel('age')
x = below['age']
y = below['hours-per-week']
axes[1, 0].scatter(x,y)
axes[1, 0].set_title("<=50K")
axes[1, 0].set_xlabel('age')
axes[1, 0].set_ylabel('hours-per-week')
x = above['age']
y = above['hours-per-week']
axes[1, 1].scatter(x,y)
axes[1, 1].set_title(">50K")
axes[1, 1].set_xlabel('age')
axes[1, 1].set_ylabel('hours-per-week')
x = below['hours-per-week']
y = below['capital-gain']
axes[2, 0].scatter(x,y)
axes[2, 0].set_title("<=50K")
axes[2, 0].set_xlabel('hours-per-week')
axes[2, 0].set_ylabel('capital-gain')
x = above['hours-per-week']
y = above['capital-gain']
axes[2, 1].scatter(x,y)
axes[2, 1].set_title(">50K")
axes[2, 1].set_xlabel('hours-per-week')
axes[2, 1].set_ylabel('capital-gain')
plt.show()
In [50]: fig, axes = plt.subplots(ncols=1, nrows=1, figsize=(15,10))
fig.subplots_adjust(hspace=.5)
mosaic(df, ['occupation', 'class'], ax=axes, axes_label=False)
plt.show()
In [51]: fig, axes = plt.subplots(ncols=1, nrows=1, figsize=(15,10))
fig.subplots_adjust(hspace=.5)
mosaic(df, ['marital-status', 'class'], ax=axes, axes_label=False)
plt.show()
In [54]: fig, axes = plt.subplots(ncols=1, nrows=1, figsize=(15,12))
fig.subplots_adjust(hspace=.5)
mosaic(df, ['education-num', 'class'], ax=axes, axes_label=False)
plt.show()
In [90]: train = df
train = train.drop("capital-loss", axis=1)
train = train.drop("native-country", axis=1)
train = train.drop("fnlwgt", axis=1)
train = train.drop("education", axis=1)

def get_occupation(x):
    if x in ["Exec-managerial", "Prof-specialty", "Protective-serv"]:
        return 1
    elif x in ["Sales", "Transport-moving", "Tech-support", "Craft-repair"]:
        return 2
    else:
        return 3

def get_relationship(x):
    if x == "Own-child":
        return 6
    elif x == "Other-relative":
        return 5
    elif x == "Unmarried":
        return 4
    elif x == "Not-in-family":
        return 3
    elif x == "Husband":
        return 2
    else:
        return 1

def get_race(x):
    if x == "Other":
        return 5
    elif x == "Amer-Indian-Eskimo":
        return 4
    elif x == "Black":
        return 3
    elif x == "White":
        return 2
    else:
        return 1

def get_sex(x):
    if x == "Male":
        return 2
    else:
        return 1

def get_class(x):
    if x == ">50K":
        return 1
    else:
        return 0

def get_workclass(x):
    if x == "Without-pay":
        return 7
    elif x == "Private":
        return 6
    elif x == "State-gov":
        return 5
    elif x == "Self-emp-not-inc":
        return 4
    elif x == "Local-gov":
        return 3
    elif x == "Federal-gov":
        return 2
    else:
        return 1

def get_marital_status(x):
    if x == "Never-married":
        return 7
    elif x == "Separated":
        return 6
    elif x == "Married-spouse-absent":
        return 5
    elif x == "Widowed":
        return 4
    elif x == "Divorced":
        return 3
    elif x == "Married-civ-spouse":
        return 2
    else:
        return 1

train['workclass'] = train['workclass'].apply(get_workclass)
train['marital-status'] = train['marital-status'].apply(get_marital_status)
train['occupation'] = train['occupation'].apply(get_occupation)
train['relationship'] = train['relationship'].apply(get_relationship)
train['race'] = train['race'].apply(get_race)
train['sex'] = train['sex'].apply(get_sex)
train['class'] = train['class'].apply(get_class)
train.head()

Out[90]:
   age  workclass  education-num  marital-status  occupation  relationship  race  sex  capital-gain  hours-per-week  class
0   39          5             13               7           3             3     2    2          2174              40
1   50          4             13               2           1             2     2    2             0              13
2   38          6              9               3           3             3     2    2             0              40
3   53          6              7               2           3             2     3    2             0              40
4   28          6             13               2           1             1     3    1             0              40
In [96]: test = pd.read_csv("data/adult.test", header=None, sep=", ")
feature = train.iloc[:, :-1]
labels = train.iloc[:, -1]
feature_matrix1 = feature.values
labels1 = labels.values
train_data, test_data, train_labels, test_labels = train_test_split(feature_matrix1, labels1, test_size=0.2, random_state=42)
transformed_train_data = MinMaxScaler().fit_transform(train_data)
transformed_test_data = MinMaxScaler().fit_transform(test_data)

<ipython-input-96-90f00b23459c>:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  test = pd.read_csv("data/adult.test", header=None, sep=", ")

In [114]: mod = LogisticRegression().fit(transformed_train_data, train_labels)
test_predict = mod.predict(transformed_test_data)
acc = accuracy_score(test_labels, test_predict)
f1 = f1_score(test_labels, test_predict)
prec = precision_score(test_labels, test_predict)
rec = recall_score(test_labels, test_predict)

In [115]: print("%.4f\t%.4f\t%.4f\t%.4f\t%s" % (acc, f1, prec, rec, 'Logistic Regression'))

0.8409 0.6404 0.7500 0.5588 Logistic Regression
Individual Contribution Report
Pradeep Peddnade
Id: 1220962574
Reflection:
My overall role in the team was Data Analyst, where I was responsible for combining theory and practice to produce and communicate data insights that enabled my team to make informed inferences about the data. Through skills such as data analytics and statistical modeling, my role as a data analyst was crucial in mining and gathering data. Once the data was ready, I performed exploratory analysis of the native-country, race, education, and workclass variables of the dataset.
The other role I was charged with as a data analyst in the group was to apply statistical tools to interpret the mined data, giving specific attention to the trends and patterns that would lead to predictive analytics and enable the group to make informed decisions and predictions.
Another role I performed for the group was data cleansing. This involved managing the data through procedures that ensure it is properly formatted and that irrelevant data points are removed.
Lessons Learned:
The wisdom I would share with others regarding research design is to keep the design straightforward and aimed at answering the research question; an appropriate design helps the group answer that question effectively. I would also advise the team to think carefully, at the time of data collection, about which sources to draw from and how to shape the data into a form the team will actually want to analyze. To apply these lessons well, the team should make sure the data is analyzed and structured appropriately, and that it is cleansed, with outliers removed or normalized.
As a group, we can conclude that the research was an honest effort, and the lessons learned extend beyond the project. Collecting the data from primary sources protected the group from the biases of previously conducted research. In a world of unlimited data, choosing the right variables to answer the research questions, using correlation and other techniques, is very important.
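The outlier-removal step mentioned above can be sketched with a simple z-score filter; the data here is synthetic (a normal "hours" column with one planted outlier), and the 3-standard-deviation cutoff is one common convention, not the team's documented choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"hours": rng.normal(40, 5, 200)})
df.loc[0, "hours"] = 200  # plant an obvious outlier

# Z-score filter: keep rows within 3 standard deviations of the mean.
z = (df["hours"] - df["hours"].mean()) / df["hours"].std()
clean = df[z.abs() <= 3]
print(len(df) - len(clean))  # number of rows removed
```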
Assessment:
Additional skills I learned from the course and during the project work include choosing the visualization type and the variables from the data set, which is very important in data analysis. This skill allowed me to conceptualize, analyze, and interpret big data that requires data modeling and management. It was also through the group that I developed my communication skills, since the data-analyst role needed an excellent communicator to interpret and explain the various inferences to the group.
Because group members were in different time zones, scheduling a time to meet was strenuous, but everyone on the team was accommodating.
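One way to make the "choosing the visualization type" skill concrete: bar charts suit categorical variables while histograms suit numeric ones. The sketch below is a generic matplotlib illustration with invented data, not the group's actual plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; renders without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"education": ["HS-grad", "Bachelors", "HS-grad"],
                   "age": [39, 50, 28]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
# Categorical variable: bar chart of category counts.
df["education"].value_counts().plot.bar(ax=ax1, title="categorical: bar")
# Numeric variable: histogram of the value distribution.
df["age"].plot.hist(ax=ax2, title="numeric: histogram")
fig.savefig("eda.png")
```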
Future Application:
In my current role, I analyze cluster metrics and logs to monitor the health of different services using Elasticsearch, Kibana, and Grafana. The topics I learned in this course will be greatly useful: I can apply them to build a metrics-based Kibana dashboard for management to see the usage and cost incurred by each service running in the cluster, and I will use statistical methods to pick the fields of interest among the thousands of available fields.

  • 1. Course Title Portfolio Name Email Abstract—This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document. Keywords—mean, standard deviation, variance, probability density function, classifier I. INTRODUCTION This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document This document. [1]. This project practiced the use of density estimation
  • 2. through several calculations via the Naïve Bayes Classifier. The data for each equation was used to find the probability of the mean for. Without using a built-in function, the first feature, the mean, could be calculated using the equation in Fig. 1. The second feature, the standard deviation, could be calculated using the equation in Fig. 2. Utilizing the training set for digit 0, the mean of the pixel brightness values was determined by calling ‘numpy.mean()digit 0 or digit 1. The test images were then classified based on the previous calculations and the accuracy of the computations were determined. The project consisted of 4 tasks: A. Extract features from the original training set There were two features that needed to be extracted from the original training set for each image. The first feature was the average pixel brightness values within an image array. The second was the standard deviation of all pixel brightness values within an image array. B. Calculate the parameters for the two-class Naïve Bayes Classifiers Using the features extracted from task A, multiple calculations needed to be performed. For the training set involving digit 0, the mean of all the average brightness values was calculated. The variance was then calculated for the same feature, regarding digit 0. Next, the mean of the standard deviations involving digit 0 had to be computed. In addition, the variance for the same feature was determined. These four calculations had to then be repeated using the training set for digit 1. C. Classify all unknown labels of incoming data
Using the parameters obtained in task B, every image in each testing sample had to be compared against the parameters for each digit, 0 or 1. The probability of that image being a 0 or a 1 needed to be determined so the image could then be classified.

D. Calculate the accuracy of the classifications
Using the predicted classifications from task C, the accuracy of the predictions needed to be calculated for both digit 0 and digit 1, respectively.

The two extracted features, the mean and the standard deviation of the pixel brightness values, helped formulate the probability density function used when determining the classification.

II. DESCRIPTION OF SOLUTION

This project required a series of computations in order to successfully classify the test images of handwritten digits [2]. Once the data was acquired, the appropriate calculations could be made.
A. Finding the mean and standard deviation
The data was provided in the form of NumPy arrays, which made it convenient to perform routine mathematical operations [3]. Utilizing the training set for digit 0, the mean of the pixel brightness values was determined by calling ‘numpy.mean()’ for each image in the set. In addition, the standard deviation of the pixel brightness values was calculated for each image by calling ‘numpy.std()’, another useful NumPy function. These features also had to be extracted from the training set for digit 1. Once all the features for each image were obtained from both training sets, the next task could be completed.

Equ. 1. Mean formula: μ = (1/N) Σ xᵢ

B. Determining the parameters for the Naïve Bayes classifiers
To determine the parameters, the mean and the variance were computed over the array of average brightness values and the array of the standard deviations created for digit 0, and likewise over the array of the average brightness values and the array of the standard deviations created for digit 1.
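The feature extraction and parameter estimation described above can be sketched as follows. This is a minimal sketch, not the project's actual code: the function names, the array shapes, and the random stand-in data are assumptions for illustration (the real project used the digit-0 and digit-1 training images).

```python
import numpy as np

def extract_features(images):
    """The two features per image: the mean and the standard deviation
    of its pixel brightness values (via numpy.mean and numpy.std)."""
    means = np.array([np.mean(img) for img in images])
    stds = np.array([np.std(img) for img in images])
    return means, stds

def class_parameters(images):
    """The four Naive Bayes parameters for one class: the mean and the
    variance of each of the two features across a training set."""
    means, stds = extract_features(images)
    return {"mean_of_means": np.mean(means), "var_of_means": np.var(means),
            "mean_of_stds": np.mean(stds), "var_of_stds": np.var(stds)}

# Stand-in data: 100 hypothetical 28x28 images with random brightness.
rng = np.random.default_rng(0)
train_0 = rng.random((100, 28, 28))
params_0 = class_parameters(train_0)
```

The same `class_parameters` call would be repeated on the digit-1 training set to obtain the second set of four parameters.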
Equ. 2. Variance formula: σ² = (1/N) Σ (xᵢ − μ)²

For each image in the test sample for digit 0, the mean and the standard deviation of the pixel brightness values were calculated. The probability density function was then used to find the probability of each of these feature values under the parameters estimated for digit 0, and the product of the two probabilities was multiplied by the prior probability, which is 0.5 in this case because the value is either a 0 or a 1. This entire procedure had to be conducted once again, utilizing the test sample for digit 1 instead. This meant finding the mean and standard deviation of each image, using the probability density function to calculate the probability of the mean and the probability of the standard deviation, and calculating the probability that the image is classified as
digit 0. The same operations had to be performed again, but using the parameters from the training set for digit 1. The probability of the image being classified as digit 0 then had to be compared to the probability of the image being classified as digit 1; the larger of the two values indicated which digit to assign as the label.

C. Determining the accuracy of the label
The accuracy was determined by dividing the number of test images predicted as the correct digit by the total number of images in the corresponding test sample. This was done for the test sample for digit 0 and then repeated, dividing by the total number of images in the test sample for digit 1.
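The scoring, comparison, and accuracy steps above can be sketched as follows. This is a hedged sketch rather than the project's implementation: the parameter dictionaries, the toy images, and all numeric values are made up purely for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian probability density function."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def score(image, params, prior=0.5):
    """Class score: the prior (0.5 here, since a label is either 0 or 1)
    times the product of the two per-feature Gaussian densities,
    treating the two features as independent (the naive assumption)."""
    m, s = np.mean(image), np.std(image)
    return (prior
            * gaussian_pdf(m, params["mean_of_means"], params["var_of_means"])
            * gaussian_pdf(s, params["mean_of_stds"], params["var_of_stds"]))

def classify(image, params_0, params_1):
    """Assign the label whose score is larger."""
    return 0 if score(image, params_0) >= score(image, params_1) else 1

# Made-up parameters for illustration only.
params_0 = {"mean_of_means": 0.40, "var_of_means": 0.01,
            "mean_of_stds": 0.30, "var_of_stds": 0.01}
params_1 = {"mean_of_means": 0.10, "var_of_means": 0.01,
            "mean_of_stds": 0.10, "var_of_stds": 0.01}

# A toy "digit 0" test sample of three images.
img = np.zeros((28, 28))
img[:14] = 0.8  # mean 0.4, std 0.4: close to the digit-0 parameters
test_sample_0 = [img, img, np.zeros((28, 28))]

predictions = np.array([classify(im, params_0, params_1) for im in test_sample_0])
accuracy_0 = np.mean(predictions == 0)  # fraction of digit-0 images labeled 0
```

With these toy values, two of the three images score higher under the digit-0 parameters, so the computed accuracy is 2/3.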
III. RESULTS

The means of the pixel brightness values computed from the training set for digit 0 were generally higher than those for digit 1, and the standard deviations were also higher.

TABLE I. TRAINING SET FOR DIGIT 0

When comparing the test images, the higher values of the means and the standard deviations were typically labeled as digit 0 and the lower ones as digit 1. However, this was not always the case; otherwise the calculated accuracy would have been 100%. After classifying all the images in the test sample for digit 0, the total number predicted as digit 0 was 899. This meant that the accuracy of classification was 0000%, which can be represented in Fig. 1.
Fig. 1. Accuracy of classification for digit 0

The total number of images in the test sample for digit 1 was 0000. After classifying all the images in the test sample for digit 1, the total number predicted as digit 1 was 00000. This meant that the accuracy of classification was 00000%, which can be represented in Fig. 6.

IV. LESSONS LEARNED

The procedures practiced in this project required skill in the Python programming language, as well as an understanding of concepts from statistics. It took plenty of practice to implement statistical equations, such as those for the mean, the standard deviation, and the variance. My foundational knowledge of mathematical operations helped me gain an initial understanding of how to set up classification problems. My lack of understanding of the Python language made it difficult to succeed initially. Proper syntax and built-in functions had to be learned before continuing with solving the classification problem. For example, I had very little understanding of NumPy prior to this project. I learned that it was extremely beneficial for performing mathematical operations. One of the biggest challenges for me was creating and navigating through NumPy arrays rather than Python lists. Looking back, it was a simple issue that I solved after understanding how they are uniquely formed. Once I had a grasp of the language and built-in functions, I was able to create the probability
density function in the code and then apply classification to each image.

One aspect of machine learning that I understood better after completing the project was the Gaussian distribution. This normalized distribution displays a bell shape in which the peak of the bell is located at the mean of the data [4]. A bimodal distribution is one that displays two bell-shaped distributions on the same graph. After calculating the features for both digit 0 and digit 1, the
probability density function gave statistical odds of a particular image being classified under a specific bell-shaped curve. An example of a bimodal distribution can be seen in Fig. 2 below.

Fig. 2. Bimodal distribution example [5]

Upon completion of the project, I was able to see how these statistical concepts applied directly to classifying the two digits.
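A bimodal picture like the one in Fig. 2 can be reproduced with a short script. This is an illustrative sketch only: the means and variances below are made-up values, not the parameters estimated in the project.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian probability density function."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Two bell curves on one axis, as when a feature's distribution is
# plotted for digit 1 and digit 0 (illustration values only).
x = np.linspace(-0.5, 1.5, 400)
plt.plot(x, gaussian_pdf(x, 0.2, 0.02), label="digit 1")
plt.plot(x, gaussian_pdf(x, 0.8, 0.04), label="digit 0")
plt.xlabel("feature value")
plt.ylabel("density")
plt.legend()
plt.savefig("bimodal_example.png")
```

Each curve peaks at its class mean, which is why an image whose feature value falls near one peak is far more likely under that class's density.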
V. REFERENCES

[1] N. Kumar, "Naïve Bayes Classifiers," GeeksforGeeks, May 15, 2020. Accessed on: Oct. 15, 2021. [Online]. Available: https://www.geeksforgeeks.org/naive-bayes-classifiers/
[2] J. Brownlee, "How to Develop a CNN for MNIST Handwritten Digit Classification," Aug. 24, 2020. Accessed on: Oct. 15, 2021. [Online]. Available: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-
digit-classification/
[3] "What is NumPy," June 22, 2021. Accessed on: Oct. 15, 2021. [Online]. Available: https://numpy.org/doc/stable/user/whatisnumpy.html
[4] J. Chen, "Normal Distribution," Investopedia, Sept. 27, 2021. Accessed on: Oct. 15, 2021. [Online]. Available: https://www.investopedia.com/terms/n/normaldistribution.asp
[5] "Bimodal Distribution," Velaction, n.d. Accessed on: Oct. 15, 2021. [Online]. Available: https://www.velaction.com/bimodal-distribution/
[Your Name]
[Street Address]
[City, ST ZIP Code]
[Date]

[Recipient Name]
[Title]
[Company Name]
[Street Address]
[City, ST ZIP Code]

Dear [Recipient Name]:

The first paragraph should thank the individual who interviewed you, mentioning the specific title of the position and the date. It should include a leading sentence on your qualifications, and the paragraph should be no longer than three sentences.

The second paragraph should focus on a specific topic covered
in the interview that shows you are a strong candidate for the position. In this statement, you should tie your strengths back to the company's projects or goals. The paragraph should be approximately three to five sentences.

You may choose to add a third paragraph if you think you did not cover something that makes you a strong candidate, or if you felt that you did not answer something to the best of your ability. In this statement, you may want to reiterate a skill, knowledge, or qualification that makes you a good candidate. This paragraph should be two to five sentences.

The last paragraph emphasizes your enthusiasm for the position and the best time and phone number to reach you, and mentions any follow-up date that you obtained during the interview. This should be two to three sentences.

Sincerely,

[Your Name]
2/26/22, 9:04 PM CSE578Project
localhost:8888/nbconvert/html/CSE578Project.ipynb

In [71]:
import pandas as pd
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
import warnings
%matplotlib inline

df = pd.read_csv("data/adult.data", header=None, sep=", ")
df.columns = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class"]
df = df[df["workclass"] != '?']
df = df[df["education"] != '?']
df = df[df["marital-status"] != '?']
df = df[df["occupation"] != '?']
df = df[df["relationship"] != '?']
df = df[df["race"] != '?']
df = df[df["sex"] != '?']
df = df[df["native-country"] != '?']

below = df[df["class"] == "<=50K"]
above = df[df["class"] == ">50K"]

<ipython-input-71-d873bf4dac12>:19: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from 's+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  df = pd.read_csv("data/adult.data", header=None, sep=", ")
In [61]:
above_50k = Counter(above['native-country'])
below_50k = Counter(below['native-country'])
print('native-country')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

native-country
In [62]:
above_50k = Counter(above['race'])
below_50k = Counter(below['race'])
print('race')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

race

In [63]:
above_50k = Counter(above['education'])
below_50k = Counter(below['education'])
print('education')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

education

In [64]:
above_50k = Counter(above['workclass'])
below_50k = Counter(below['workclass'])
print('workclass')
fig, axes = plt.subplots(ncols=1, nrows=2, figsize=(5,10))
axes[0].pie(above_50k.values(), labels=above_50k.keys(), autopct='%1.0f%%')
axes[0].set_title(">50K")
axes[1].pie(below_50k.values(), labels=below_50k.keys(), autopct='%1.0f%%')
axes[1].set_title("<=50K")
plt.show()

workclass

In [65]:
fig, axes = plt.subplots(ncols=2, nrows=3, figsize=(8,8))
fig.subplots_adjust(hspace=.5)
x = below['capital-gain']
y = below['age']
axes[0, 0].scatter(x,y)
axes[0, 0].set_title("<=50K")
axes[0, 0].set_xlabel('capital-gain')
axes[0, 0].set_ylabel('age')

x = above['capital-gain']
y = above['age']
axes[0, 1].scatter(x,y)
axes[0, 1].set_title(">50K")
axes[0, 1].set_xlabel('capital-gain')
axes[0, 1].set_ylabel('age')

x = below['age']
y = below['hours-per-week']
axes[1, 0].scatter(x,y)
axes[1, 0].set_title("<=50K")
axes[1, 0].set_xlabel('age')
axes[1, 0].set_ylabel('hours-per-week')

x = above['age']
y = above['hours-per-week']
axes[1, 1].scatter(x,y)
axes[1, 1].set_title(">50K")
axes[1, 1].set_xlabel('age')
axes[1, 1].set_ylabel('hours-per-week')

x = below['hours-per-week']
y = below['capital-gain']
axes[2, 0].scatter(x,y)
axes[2, 0].set_title("<=50K")
axes[2, 0].set_xlabel('hours-per-week')
axes[2, 0].set_ylabel('capital-gain')

x = above['hours-per-week']
y = above['capital-gain']
axes[2, 1].scatter(x,y)
axes[2, 1].set_title(">50K")
axes[2, 1].set_xlabel('hours-per-week')
axes[2, 1].set_ylabel('capital-gain')

plt.show()
In [50]:
fig, axes = plt.subplots(ncols=1, nrows=1, figsize=(15,10))
fig.subplots_adjust(hspace=.5)
mosaic(df, ['occupation', 'class'], ax=axes, axes_label=False)
plt.show()

In [51]:
fig, axes = plt.subplots(ncols=1, nrows=1, figsize=(15,10))
fig.subplots_adjust(hspace=.5)
mosaic(df, ['marital-status', 'class'], ax=axes, axes_label=False)
plt.show()
In [54]:
fig, axes = plt.subplots(ncols=1, nrows=1, figsize=(15,12))
fig.subplots_adjust(hspace=.5)
mosaic(df, ['education-num', 'class'], ax=axes, axes_label=False)
plt.show()

In [90]:
train = df
train = train.drop("capital-loss", axis=1)
train = train.drop("native-country", axis=1)
train = train.drop("fnlwgt", axis=1)
train = train.drop("education", axis=1)

def get_occupation(x):
    if x in ["Exec-managerial", "Prof-specialty", "Protective-serv"]:
        return 1
    elif x in ["Sales", "Transport-moving", "Tech-support", "Craft-repair"]:
        return 2
    else:
        return 3

def get_relationship(x):
    if x == "Own-child":
        return 6
    elif x == "Other-relative":
        return 5
    elif x == "Unmarried":
        return 4
    elif x == "Not-in-family":
        return 3
    elif x == "Husband":
        return 2
    else:
        return 1

def get_race(x):
    if x == "Other":
        return 5
    elif x == "Amer-Indian-Eskimo":
        return 4
    elif x == "Black":
        return 3
    elif x == "White":
        return 2
    else:
        return 1

def get_sex(x):
    if x == "Male":
        return 2
    else:
        return 1

def get_class(x):
    if x == ">50K":
        return 1
    else:
        return 0

def get_workclass(x):
    if x == "Without-pay":
        return 7
    elif x == "Private":
        return 6
    elif x == "State-gov":
        return 5
    elif x == "Self-emp-not-inc":
        return 4
    elif x == "Local-gov":
        return 3
    elif x == "Federal-gov":
        return 2
    else:
        return 1

def get_marital_status(x):
    if x == "Never-married":
        return 7
    elif x == "Separated":
        return 6
    elif x == "Married-spouse-absent":
        return 5
    elif x == "Widowed":
        return 4
    elif x == "Divorced":
        return 3
    elif x == "Married-civ-spouse":
        return 2
    else:
        return 1

train['workclass'] = train['workclass'].apply(get_workclass)
train['marital-status'] = train['marital-status'].apply(get_marital_status)
train['occupation'] = train['occupation'].apply(get_occupation)
train['relationship'] = train['relationship'].apply(get_relationship)
train['race'] = train['race'].apply(get_race)
train['sex'] = train['sex'].apply(get_sex)
train['class'] = train['class'].apply(get_class)
Out[90]:
   age  workclass  education-num  marital-status  occupation  relationship  race  sex  capital-gain  hours-per-week  cla
0   39          5             13               7           3             3     2    2          2174              40
1   50          4             13               2           1             2     2    2             0              13
2   38          6              9               3           3             3     2    2             0              40
3   53          6              7               2           3             2     3    2             0              40
4   28          6             13               2           1             1     3    1             0              40

In [96]:
test = pd.read_csv("data/adult.test", header=None, sep=", ")
feature = train.iloc[:, :-1]
labels = train.iloc[:, -1]
feature_matrix1 = feature.values
labels1 = labels.values
train_data, test_data, train_labels, test_labels = train_test_split(feature_matrix1, labels1, test_size=0.2, random_state=42)
transformed_train_data = MinMaxScaler().fit_transform(train_data)
transformed_test_data =
MinMaxScaler().fit_transform(test_data)

In [97]:
t

In [114]:
mod = LogisticRegression().fit(transformed_train_data, train_labels)
test_predict = mod.predict(transformed_test_data)
acc = accuracy_score(test_labels, test_predict)
f1 = f1_score(test_labels, test_predict)
prec = precision_score(test_labels, test_predict)
rec = recall_score(test_labels, test_predict)

In [115]:
print("%.4f\t%.4f\t%.4f\t%.4f\t%s" % (acc, f1, prec, rec, 'Logistic Regression'))

In [ ]:

<ipython-input-96-90f00b23459c>:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from 's+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  test = pd.read_csv("data/adult.test", header=None, sep=", ")

0.8409  0.6404  0.7500  0.5588  Logistic Regression

Individual Contribution Report
Pradeep Peddnade
Id: 1220962574
Reflection:

My overall role in the team was Data Analyst, where I was responsible for combining theory and practice within the group to produce and communicate data insights that enabled my team to make informed inferences about the data. Through skills such as data analytics and statistical modeling, my role as a data analyst was crucial in mining and gathering data. Once the data was ready, I performed exploratory analysis for the native-country, race, education, and workclass variables of the dataset. The other role I was charged with as a data analyst in the group was to apply statistical tools to interpret the mined data, giving specific attention to the trends and various patterns that would lead to predictive analytics, enabling the group to make informed decisions and predictions. Another role that I performed for the group was data cleansing. This involved managing the data through procedures that ensure the data is properly formatted and that irrelevant data points are removed.

Lessons Learned:

The wisdom that I would share with others regarding research design is to ensure that the design is straightforward and aimed at answering the research question. Having an appropriate research design will help the group answer the research question effectively. I would also share with the team that it is important, at the time of collection, to consider the data sources and to shape the data into something the team would want to analyze. To best apply these lessons, the team should ensure that the data is analyzed and structured appropriately: make sure the data is cleansed and outliers are removed or normalized. As a group, we can conclude that the research was an honest effort, and the lessons learned extend beyond the project. Our data analytics skills ensured that the analyzed data was collected from primary sources, which protected the group from the biases of previously conducted research. In today's data world there is unlimited data, so choosing the right variables to answer the research questions, using correlation and other techniques, is very important.

Assessment:

An additional skill that I learned from the course and during the project work is choosing the visualization type and the variables from the dataset, which is very important in the analysis of data. Through this skill, I was able to conceptualize, properly analyze, and interpret big data that requires data modeling and management. It is also through the group that I was able to develop my communication skills, since the data analyst role needed an excellent communicator who could interpret and explain the various inferences to the group. Because group members were in different time zones, scheduling a time to meet was strenuous, but everyone in the team was accommodating.

Future Application:

In my current role, I analyze cluster metrics and logs to monitor the health of different services using Elasticsearch, Kibana, and Grafana. The topics I learned in this course will be greatly useful: I can apply them in building a metrics-based Kibana dashboard for management to see the usage and cost incurred by each service running in the cluster, and I will use statistical methods to pick the fields of interest from among thousands of available fields.