1. COVID-19 INDIA
HEMA-BHARAT-KHUSHBU Page 1
SHREE N. J. SONECHA MNG & TECH INSTITUTE
CHANDUVAV
Developed By:
1) Asker Hema [195533693002]
2) Dusara Khushbu [195533693008]
3) Makvana Bharat [195533693020]
DATA ANALYSIS
PREFACE
The theory of any subject is important, but without practice it becomes useless, particularly for a computer student. A student developing a project cannot become a capable technologist without a practical understanding of the field. This visit therefore provides a golden opportunity for all student developers.
The principal objective of the in-office visit is to learn the details of the operational processes carried out in various places. Another attractive feature is learning office management and discipline, which are equally important in life.
ACKNOWLEDGEMENT
The success of any project is never limited to the individuals undertaking it. It is the collective effort of the people around an individual that spells success. For all the effort behind this successful project, we are highly indebted to the following personalities, without whom this project would never have been completed.
We thank Mr. Chirag Rachchh, HOD of the MCA Department, who guided us and regularly supervised our project. We would also like to express our deep gratitude to all our friends for their valuable suggestions and cooperation.
Finally, our special thanks go to Mr. Dipak Thanki, Assistant Professor, who encouraged and motivated us directly and indirectly.
INDEX
NO CONCEPT
1. Basic Objectives
1.1 Introduction
1.2 Need for Data Analytics
1.3 Problem Statement
1.4 Introduction to Python
2. Understand Data
2.1 About Data Source
2.2 Understand Data: Basic Questions
2.3 Understand Data: Data Wrangling
2.4 Exploratory Analysis
3. Methodology
3.1 Extract Features & Model Methodology
3.2 Introduction to Model and Methodology
3.3 Data Visualization
3.4 Various query outcomes, visualization, analysis and conclusion
3.5 Implementation of Model & Methodology
3.6 Advantages and Limitations of Proposed Model
4. Conclusion
Conclusion
5. References
List of References
INTRODUCTION
Dataset Title: Covid-19 India
Project For: N. J. Sonecha Mgt & Tech Institute, Chanduvav
Developed By: Dusara Khushbu, Asker Hema, Makvana Bharat
College: N. J. Sonecha Mgt and Tech Institute, Chanduvav
Project Guide: Thanki Dipak Sir
1. Basic Objectives
Introduction
Data science projects offer a promising way to kick-start your career in this field. Not only do you learn data science by applying it, you also get projects to showcase on your CV! Nowadays, recruiters evaluate a candidate's potential by his or her work and don't put much emphasis on certifications. It doesn't matter how much you say you know if you have nothing to show! That's where most people struggle and miss out.
You might have worked on several problems before, but if you can't make your work presentable and easy to explain, how would anyone know what you are capable of? That's where these projects help. Think of the time you spend on these projects as training sessions: the more time you spend practicing, the better you'll become!
We've made sure to give you a taste of a variety of problems from different domains. We believe everyone must learn to work smartly with huge amounts of data, hence large datasets are included. We've also made sure all the datasets are open and free to access.
Many data science courses spend a significant amount of time on theory and not enough on practical application. To make real progress along the path toward becoming a data scientist, it's important to start building data science projects as soon as possible.
Need for Data Analytics
Data analytics is the science of analyzing raw data in order to draw conclusions about that information. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
Data analytics techniques can reveal trends and
metrics that would otherwise be lost in the mass of
information. This information can then be used to
optimize processes to increase the overall efficiency of
a business or system.
Data analytics (DA) is the process of examining data
sets in order to draw conclusions about the
information they contain, increasingly with the aid of
specialized systems and software. Data analytics
technologies and techniques are widely used in
commercial industries to enable organizations to
make more-informed business decisions and by
scientists and researchers to verify or disprove
scientific models, theories and hypotheses.
Why Data Analytics:
Data analytics is important because it helps businesses optimize their performance. Implementing it in the business model can help companies reduce costs by identifying more efficient ways of doing business and of storing large amounts of data.
Problem Statement
Project Definition:
Our project is about COVID-19 in India and the problems (symptoms) it causes, such as:
• Cough
• Fever
• Difficulty breathing (severe cases)
• Tiredness
Our project detail:
These datasets relate to COVID-19, which started out in China and has now spread globally, with countries scrambling to tackle it. What started out as a healthcare emergency has now begun to have serious economic consequences.
For the purpose of this analysis, we will only be looking at the data at a country level and not at the province/state level. Let's create a consolidated dataset that combines the datasets for cases, deaths and recoveries. I have also created a function to get the daily count from the cumulative data.
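The report does not show its daily-count function, so the following is only a minimal sketch of one way to do it, assuming pandas and a table whose columns are named as in the Column Detail list. The helper name `daily_from_cumulative` and the sample rows are hypothetical; the numbers are made up, not taken from the Kaggle dataset.

```python
import pandas as pd

def daily_from_cumulative(df, group_col, date_col, value_cols):
    """Return a copy of df with a daily_<col> column for each cumulative column.

    Rows are sorted by group and date; each daily value is the difference
    between consecutive cumulative reports of the same group.
    """
    out = df.sort_values([group_col, date_col]).copy()
    for col in value_cols:
        diffs = out.groupby(group_col)[col].diff()
        # The first report of each group has no predecessor, so its daily
        # count is taken to be the cumulative value itself (an assumption).
        out[f"daily_{col}"] = diffs.fillna(out[col]).astype(int)
    return out

# Hypothetical sample with made-up numbers for two unnamed states.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "Name of State / UT": ["State A"] * 3 + ["State B"] * 3,
    "Total Confirmed cases": [3, 5, 9, 1, 1, 4],
    "Death": [0, 0, 1, 0, 0, 0],
})

daily = daily_from_cumulative(
    sample, "Name of State / UT", "Date", ["Total Confirmed cases", "Death"]
)
```

Grouping by state before taking differences keeps one state's last report from being subtracted from the next state's first report.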
Column Detail:
(1) Date:
Date of cumulative report
(2) Name of State / UT:
Name of the state / Union Territory / National Capital Region
(3) Total Confirmed cases (Indian National):
Cumulative count of Indian nationals confirmed with COVID-19
(4) Total Confirmed cases (Foreign National):
Cumulative count of foreign nationals confirmed with COVID-19
(5) Cured/Discharged/Migrated:
Cumulative count of cured / discharged cases
(6) Latitude:
Latitude of the location
(7) Longitude:
Longitude of the location
(8) Death:
Cumulative count of deaths reported
(9) Total Confirmed cases:
Total confirmed cases
(10) Gender:
Gender of the patient
(11) detected_city:
City in which the case was detected
(12) detected_district:
District in which the case was detected
(13) detected_state:
State in which the case was detected
(14) state_code:
Code of the state in which the case was detected
(15) current_status:
Current status of the case
(16) Notes:
Notes
(17) suspected_contacted_patient:
Suspected contacted patient
(18) Nationality:
Nationality of the patient
(19) type_of_transmission:
Type of transmission
(20) status_change_date:
Date on which the case status changed
(21) backup_notes:
Backup notes
Introduction To Python
What is Python:
Python is a popular programming language. It was created by Guido van Rossum and released in 1991. Python is a general-purpose programming language that is becoming ever more popular for data science. Companies worldwide are using Python to gain insight from their data and a competitive edge.
What can Python do:
Python can be used on a server to create web applications.
Python can be used alongside software to create workflows.
Python can connect to database systems. It can also read and modify files.
Python can be used to handle big data and perform complex mathematics.
Python can be used for rapid prototyping or for production-ready software development.
Why Python?
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
Features in Python:
• Easy to code
• Free and Open Source
• Object-Oriented Language
• GUI Programming Support
• High-Level Language
• Extensible
• Portable Language
• Integrated Language
2. Understand Data
About Data Source:
Source: COVID-19 India
Data source link: https://www.kaggle.com/imdevskp/covid19-corona-virus-india-dataset
Basic Questions:
Q.1) What is a coronavirus?
Coronaviruses are a large family of viruses known to cause illnesses ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).
Q.2) Who is most at risk from the coronavirus disease?
People of all ages can be infected by the new coronavirus (2019-nCoV). Older people, and people with pre-existing medical conditions (such as asthma, diabetes, or heart disease), appear to be more vulnerable to becoming severely ill with the virus.
WHO advises people of all ages to take steps to protect themselves from the virus, for example by following good hand hygiene and good respiratory hygiene.
Q.3) Is there a vaccine for the coronavirus disease?
When a disease is new, there is no vaccine until one is developed. It can take a number of years for a new vaccine to be developed.
Q.4) Can coronaviruses be transmitted from person to person?
Yes, some coronaviruses can be transmitted from person to person, usually after close contact with an infected patient, for example in a household, workplace, or health care centre.
Q.5) Is there a treatment for a novel coronavirus?
There is no specific treatment for disease caused by a novel coronavirus.
Q.6) What can I do to protect myself?
Standard recommendations to reduce exposure to and transmission of a range of illnesses include maintaining basic hand and respiratory hygiene and safe food practices, and avoiding close contact, when possible, with anyone showing symptoms of respiratory illness such as coughing and sneezing.
Q.7) Are health workers at risk from a novel coronavirus?
Yes, they can be, as health care workers come into contact with patients more often than the general public. WHO recommends that health care workers consistently apply appropriate
Data Wrangling
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" form into another format, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
A data wrangler is a person who performs these transformation operations.
Use of Data Wrangling:
The data transformations are typically applied to distinct
entities (e.g. fields, rows, columns, data values etc.)
within a data set, and could include such actions as
extractions, parsing, joining, standardizing, augmenting,
cleansing, consolidating and filtering to create desired
wrangling outputs that can be leveraged downstream.
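Several of the actions named above (parsing, standardizing, cleansing, consolidating and filtering) can be shown on a tiny table. This is a minimal pandas sketch under stated assumptions: the raw rows are invented to contain typical problems, the helper `wrangle` is hypothetical, and treating a missing death count as 0 is an illustrative choice, not something the report specifies.

```python
import pandas as pd

# Hypothetical raw rows: stray spaces, inconsistent case, dates stored as
# text, a duplicated record and a missing count.
raw = pd.DataFrame({
    "Date": ["2020-03-10", "2020-03-10", "2020-03-11", "2020-03-09"],
    "Name of State / UT": [" state a", "State A", "State B ", "state b"],
    "Death": ["0", "0", "1", None],
})

def wrangle(df, start_date="2020-03-10"):
    out = df.copy()
    # Parsing: turn date strings into datetime values.
    out["Date"] = pd.to_datetime(out["Date"])
    # Standardizing: trim whitespace and use consistent title case.
    out["Name of State / UT"] = out["Name of State / UT"].str.strip().str.title()
    # Cleansing: convert counts to numbers; missing counts become 0 (assumption).
    out["Death"] = pd.to_numeric(out["Death"]).fillna(0).astype(int)
    # Consolidating: records that are now identical collapse into one.
    out = out.drop_duplicates().reset_index(drop=True)
    # Filtering: keep only reports from start_date onward.
    return out[out["Date"] >= pd.Timestamp(start_date)].reset_index(drop=True)

clean = wrangle(raw)
```

The order matters: duplicates are only detectable after the state names have been standardized.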
Exploratory Analysis:
Exploratory data analysis (EDA) is an approach
to analyzing datasets to summarize their main
characteristics, often with visual methods.
A statistical model may or may not be used, but EDA is primarily for seeing what the data can tell us beyond the formal modeling or hypothesis-testing task.
Typical graphical techniques used in EDA are:
• Box Plot
• Histogram
• Run Chart
• Pareto Chart
• Pie Chart
• Ogive Chart
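The box plot and histogram in this list both summarize a single numeric column. As a rough sketch, the numbers behind those two charts can be computed with Python's standard `statistics` module on a made-up list of daily case counts; plotting libraries such as matplotlib (`pyplot.boxplot`, `pyplot.hist`) draw the same summaries graphically. The 1.5 * IQR outlier rule is the common Tukey convention, assumed here rather than taken from the report, and the helper names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum: the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def tukey_outliers(values):
    """Points beyond 1.5 * IQR from the quartiles (drawn as dots in a box plot)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def histogram_counts(values, width):
    """Count integer values in equal-width bins, keyed by each bin's lower edge."""
    first = min(values) // width * width
    last = max(values) // width * width
    counts = {edge: 0 for edge in range(first, last + width, width)}
    for v in values:
        counts[v // width * width] += 1
    return counts

# Hypothetical daily case counts (made-up numbers).
daily_cases = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]
summary = five_number_summary(daily_cases)    # (2, 4.25, 8.0, 11.5, 40)
outliers = tukey_outliers(daily_cases)        # [40]
hist = histogram_counts(daily_cases, width=10)
```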
3. Methodology
Extract Features & Model Methodology:
Important Columns:
Date:
Date of cumulative report
Name of State / UT:
Name of the state / Union Territory / National Capital Region
Total Confirmed cases (Indian National):
Cumulative count of Indian nationals confirmed with COVID-19
Total Confirmed cases (Foreign National):
Cumulative count of foreign nationals confirmed with COVID-19
Cured/Discharged/Migrated:
Cumulative count of cured / discharged cases
Latitude:
Latitude of the location
Longitude:
Longitude of the location
Death:
Cumulative count of deaths reported
Total Confirmed cases:
Total confirmed cases
Types of Model:
Data modeling is the process of producing a descriptive diagram of relationships between various types of information that are to be stored in a database.
1. Test Datasets
2. Classification Test Problems
3. Regression Test Problems
Test Datasets:
A test dataset is used to provide an unbiased evaluation of a final model fit on the training dataset. It is independent of the training dataset but follows the same probability distribution. If a model fit to the training dataset also fits the test dataset well, minimal overfitting has taken place.
Test datasets are small contrived problems that allow you to test and debug your algorithms and test harness:
• They can be generated quickly and easily.
• They are small and easily visualized in two dimensions.
• They can be scaled up trivially.
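The idea of an independent test set can be sketched in a few lines of standard-library Python: shuffle a contrived dataset, hold part of it out, fit only on the rest, and measure the error on the held-out rows. The helper names, the 25% test fraction, the seed and the noise-free line y = 2x + 1 are all assumptions chosen for illustration.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows; the first test_fraction becomes the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def fit_line(points):
    """Ordinary least-squares fit of y = m*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def mean_absolute_error(model, points):
    m, b = model
    return sum(abs(y - (m * x + b)) for x, y in points) / len(points)

# Contrived, noise-free dataset: y = 2x + 1 for x = 0..19.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = holdout_split(data)
model = fit_line(train)                          # fit on training rows only
test_error = mean_absolute_error(model, test)    # evaluate on unseen rows
```

Because the data are noise-free, the test error is essentially zero; on real data a large gap between training and test error is the usual sign of overfitting.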
Classification Test Problems:
After building a predictive classification model, you need to evaluate its performance, that is, how good the model is at predicting the outcome of new observations (test data) that were not used to train it.
• Blobs Classification Problem:
Generates points drawn from Gaussian distributions ("blobs"). You can control how many blobs to generate and the number of samples to generate.
• Moons Classification Problem:
Used for binary classification; it generates a swirl pattern of two interleaving half circles, which makes it a good test for algorithms capable of learning nonlinear class boundaries.
• Circles Classification Problem:
Generates samples that fall into concentric circles. You can control the amount of noise in the shapes.
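The three problems above correspond to generator functions in scikit-learn's `sklearn.datasets` module (`make_blobs`, `make_moons`, `make_circles`). A minimal sketch, assuming scikit-learn is installed: each problem is split into training and test data, and a classifier is scored only on the test part. The choice of logistic regression as the classifier, the blob centers and all other parameter values are assumptions for illustration.

```python
from sklearn.datasets import make_blobs, make_circles, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Three small contrived problems, 300 samples each, with fixed seeds.
problems = {
    # Two Gaussian blobs placed far apart (centers chosen for this sketch).
    "blobs": make_blobs(n_samples=300, centers=[(-5.0, -5.0), (5.0, 5.0)],
                        cluster_std=1.0, random_state=1),
    # Two interleaving half circles with a little noise.
    "moons": make_moons(n_samples=300, noise=0.1, random_state=1),
    # An inner circle (factor=0.5 of the outer radius) inside an outer circle.
    "circles": make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=1),
}

scores = {}
for name, (X, y) in problems.items():
    # Hold out 30% of the points so the score reflects data the model never saw.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1)
    model = LogisticRegression().fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # fraction classified correctly
```

A linear model like this should score close to 1.0 on the well-separated blobs but much lower on the circles, since no straight line separates concentric circles; that gap is what makes these problems useful for probing nonlinear learners.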
Regression Test Problems:
Regression is the problem of predicting a quantity given an observation. We will create a dataset with a linear relationship between the inputs and the outputs.
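A dataset with a known linear relationship can be generated and then fit, to check that a regression method recovers what was put in. This is a minimal NumPy sketch; the helper `make_linear_dataset`, the true slope 3.0, intercept 5.0 and noise level are all hypothetical choices.

```python
import numpy as np

def make_linear_dataset(n_samples=100, slope=3.0, intercept=5.0, noise=1.0, seed=0):
    """Contrived regression problem: y = slope * x + intercept + Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n_samples)
    y = slope * x + intercept + rng.normal(0.0, noise, size=n_samples)
    return x, y

x, y = make_linear_dataset()
# np.polyfit with deg=1 performs an ordinary least-squares line fit and
# returns the coefficients highest power first: [slope, intercept].
fitted_slope, fitted_intercept = np.polyfit(x, y, deg=1)
```

Because the true coefficients are known, the fitted values can be compared against them directly; with 100 points and unit noise they should land close to 3.0 and 5.0.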
Problems and Issues of Linear Regression:
o Specification
o Proxy Variables and Measurement Error
o Selection Bias
o Multicollinearity
o Autocorrelation
o Heteroskedasticity
o Simultaneous Equations
o Limited Dependent Variables
Introduction To Model & Methodology:
What is Regression?
Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.
This technique is used for forecasting, time series modeling and finding the causal effect relationship between variables.
Why do we use Regression Analysis?
Regression analysis estimates the relationship between two or more variables. There are multiple benefits of using regression analysis:
1. It indicates the significant relationships between the dependent variable and the independent variables.
2. It indicates the strength of impact of multiple independent variables on a dependent variable.
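The second benefit, measuring how strongly each of several independent variables affects the dependent variable, can be sketched as a multiple linear regression solved by least squares. The data below are synthetic, with made-up true effects of 4.0 for `x1` and 0.5 for `x2`; all names are hypothetical.

```python
import numpy as np

# Synthetic data: y depends strongly on x1 and weakly on x2.
rng = np.random.default_rng(seed=7)
n = 200
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
y = 2.0 + 4.0 * x1 + 0.5 * x2 + rng.normal(0, 1.0, size=n)

# Design matrix: a column of ones for the intercept, then the two predictors.
A = np.column_stack([np.ones(n), x1, x2])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coef
```

Comparing `b1` with `b2` as a measure of impact is only meaningful here because both predictors share the same scale (0 to 10); with predictors in different units, they are usually standardized first.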
Types Of Regression:
1. Linear Regression:
It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics people pick up while learning predictive modeling.
In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear.
There must be a linear relationship between the independent and dependent variables.
Linear regression is very sensitive to outliers, which can badly distort the regression line and, eventually, the forecasted values.
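The sensitivity to outliers is easy to demonstrate: fit a least-squares line to points lying exactly on y = 2x, then add a single extreme point and fit again. The data points and the helper name `least_squares` are invented for this sketch.

```python
def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                       # exactly y = 2x
clean_slope, clean_intercept = least_squares(xs, ys)              # 2.0, 0.0
outlier_slope, outlier_intercept = least_squares(xs + [6], ys + [40])
```

One point at (6, 40) triples the slope from 2 to 6 and drags the intercept from 0 to about -9.33, so every forecast made from the line changes.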
2. Logistic Regression:
Logistic regression is used to find the probability of event = Success and event = Failure.
We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature.
Logistic regression is widely used for classification problems.
Logistic regression doesn't require a linear relationship between the dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio.
Logistic regression estimates the parameters of a logistic model and is a form of binomial regression.
Logistic regression is used to deal with data that has two possible outcomes (criteria) and to model the relationship between those outcomes and the predictors.
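The log transformation mentioned above can be seen directly: the model is linear in the log-odds, log(p / (1 - p)) = w*x + b, and the sigmoid turns that back into a probability. Below is a minimal NumPy sketch that fits `w` and `b` by plain gradient descent on the mean log-loss. Real libraries use more refined solvers; the learning rate, epoch count, helper names and the 0/1 data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map log-odds z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = sigmoid(w * x + b)
        # Gradients of the mean log-loss with respect to w and b.
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

# Hypothetical single-predictor data with binary outcomes (made-up values).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)

w, b = fit_logistic(x, y)
p = sigmoid(w * x + b)          # predicted probability of event = Success
log_odds = np.log(p / (1 - p))  # the logit transform recovers w*x + b
```

The classes overlap (x = 2.0 is a success, x = 3.5 a failure), which keeps the fitted coefficients finite; on perfectly separable data, plain gradient descent would keep growing `w`.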
3. Polynomial Regression:
A regression equation is a polynomial regression equation if the power of the independent variable is more than 1. It is used for curvilinear data. Polynomial regression is fit with the method of least squares.
The goal of regression analysis is to model the expected value of a dependent variable y in terms of the independent variable x.
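A least-squares polynomial fit can be sketched with NumPy's `polyfit`. The data here are generated without noise from the hypothetical curve y = 1 + 2x + 0.5x^2, so the fit should return those coefficients exactly (up to rounding).

```python
import numpy as np

# Hypothetical curvilinear data generated from y = 1 + 2x + 0.5x^2 (no noise).
x = np.linspace(-3.0, 3.0, 13)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# Least-squares fit of a degree-2 polynomial; coefficients come back
# highest power first: [x^2 coefficient, x coefficient, constant].
coeffs = np.polyfit(x, y, deg=2)
fitted = np.polyval(coeffs, x)
```

Although the curve is nonlinear in x, the model is still linear in its coefficients, which is why ordinary least squares applies.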
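Formula:
$$y = mx + b$$
$$m = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{N\sum x^{2} - \left(\sum x\right)^{2}}$$
$$b = \frac{\sum y - m\sum x}{N}$$
where N is the number of data points.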
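To check the summation formulas for m and b, here is a direct translation into plain Python applied to a small made-up dataset with N = 5. By hand: the sum of x is 15, the sum of y is 20, the sum of xy is 66 and the sum of x^2 is 55, so m = (5*66 - 15*20) / (5*55 - 15^2) = 30/50 = 0.6 and b = (20 - 0.6*15) / 5 = 2.2. The function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b of y = m*x + b from the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Made-up data points for illustration.
m, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # m = 0.6, b = 2.2
```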
5. List of References:
1. Kaggle: http://www.kaggle.com
2. Python data set
3. Google: google.com
4. Medical News Today: http://medicalnewstoday.com
Thank You.