Project presentation sowjanya_132
2. Data Analysis of
COVID-19 pandemic and Vaccination breakdown
Including Twitter data analysis
Sowjanya Bojja
3. • I am Sowjanya. I completed a B.Tech in Electronics and
Communications Engineering at JNTU.
• I worked at Tata Consultancy Services and
have 5 years of experience in the Oracle domain.
6. Why I want to learn data science:
In addition to job opportunities, another motivation for joining
the data science course is my love for technology and research.
A data science career is very experimental and involves a lot of
creativity, so it is the perfect career path to suit my hobbies.
7. Agenda
• Business Problem
• Objective of the Project
• Web Scraping – Details
• Summary of the Data
• Data Cleaning
• Data Manipulation
• Univariate Analysis and Bivariate Analysis
• Key Business Question
• Conclusion (Key findings overall)
• Q&A Slide
8. Business Problem
COVID-19, short for coronavirus disease 2019, was declared a
pandemic by the World Health Organization (WHO) on 11 March 2020.
Vaccines arrived to help break this pandemic, but many people
are not sure how effective they are, and some are not ready to
take the vaccines for one reason or another.
9. Objective of the Project
• This project aims to bring together COVID-19 cases worldwide and show how
they have been controlled through vaccination programs.
• The analysis shows that the most vaccinated countries have fewer
cases, which helps teach the public why vaccination is important.
10. Objective of the Project:
• This project also handles data related to COVID-19 vaccines, and the WHO has been
unsuccessful in guiding people appropriately through this pandemic outbreak.
• This study analyzes two types of tweets gathered during the vaccination period.
• Understanding sentiments and opinions toward vaccination using Twitter may help
public health agencies to increase positive messaging and eliminate opposing
messages in order to enhance vaccine uptake.
11. Web Scraping – Details:
• Website for case information:
'https://www.worldometers.info/coronavirus/#countries'
• Website for available WHO-approved vaccines (URL):
'https://covid19.trackvaccines.org/agency/who/'
• Websites for vaccination data and vaccination breakdown data:
- 'https://news.google.com/covid19/map?hl=en-US&gl=US&ceid=US%3Aen&state=4'
- 'https://data.ca.gov/dataset/covid-19-post-vaccination-infection-data/resource/22066d16-3465-4339-94d6-a4f1b3a91101'
• Website for COVID vaccine tweets:
www.twitter.com
12. Tools for web scraping:
• BeautifulSoup (for case and vaccination details)
• snscrape (for Twitter data)
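The deck does not show its scraping code. As a minimal sketch, assuming the case data sits in an HTML `<table>` with one `<tr>` per country, BeautifulSoup can turn the rows into lists of cell text. The inline HTML and the helper name `parse_table` are hypothetical; the real Worldometer markup (table ids, column order) is not described here, and the HTTP request is left out.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a scraped page; real table ids and columns may differ.
SAMPLE_HTML = """
<table id="cases">
  <tr><th>Country</th><th>Total Cases</th></tr>
  <tr><td>USA</td><td>1,234</td></tr>
  <tr><td>India</td><td>987</td></tr>
</table>
"""

def parse_table(html):
    """Return the cell text of every data row (rows with <td> cells) in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain only <th> cells, so they are skipped
            rows.append(cells)
    return rows

rows = parse_table(SAMPLE_HTML)
print(rows)  # [['USA', '1,234'], ['India', '987']]
```

In practice the HTML would come from an HTTP request (for example `requests.get(url).text`), and the rows can then be loaded into a pandas DataFrame for cleaning. Note that the numbers are still strings with thousands separators at this point.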
13. Summary of the Data
• Data collected for total cases, hospitalized cases, deaths and
active COVID-19 cases.
• Data collected for available vaccines, total doses given,
and vaccination details by country and region.
• Data collected for positive and negative tweets about COVID-19.
14. Data Cleaning:
Data cleaning is the process of modifying data to ensure
that it is free of irrelevant and incorrect data.
In this project I faced many challenging tasks while cleaning
the data.
15. Vaccination data cleaning:
The vaccination data that I scraped from the website was not clean,
so I first examined the dataset and then applied the following
techniques to clean it.
16. Vaccination data cleaning:
• To get the number of approving countries, I used regex to extract
a substring from the original string.
Initially, the value in the approved-countries column was
"Approved in 36 countries".
With the help of regex functions, I extracted 36 from that string.
Likewise, "Approved in 106 countries" becomes 106.
17. Vaccination data cleaning:
I also used regex functions to get the number of trial countries.
Since there was more than one column, I dropped the one that was
not required.
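A minimal pandas sketch of this step, assuming the trial text follows the same "... in N countries" pattern: `Series.str.extract` with a single capture group and `expand=False` returns the digits as a Series of strings, and `DataFrame.drop(columns=...)` removes columns that are no longer needed. The column names `Trials` and `Notes` are hypothetical, not taken from the deck.

```python
import pandas as pd

# Hypothetical scraped vaccine table; column names are illustrative assumptions.
vaccines = pd.DataFrame({
    "Vaccine": ["Vaccine A", "Vaccine B"],
    "Trials": ["Trials in 12 countries", "Trials in 3 countries"],
    "Notes": ["n/a", "n/a"],
})

# One capture group + expand=False -> a Series of digit strings (still str, not int)
vaccines["Trial_countries"] = vaccines["Trials"].str.extract(r"(\d+)", expand=False)

# Drop the raw text column and a column the analysis does not need
vaccines = vaccines.drop(columns=["Trials", "Notes"])
print(vaccines)
```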
18. Vaccination data cleaning:
• The number of countries is extracted as a substring of that column,
so it is still a string and needs to be converted to an integer.
I used the astype(int) method to convert str to int.
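One detail worth knowing: `astype(int)` raises an error if the column still contains missing values (for example, rows where the regex found no number). A minimal sketch, using a hypothetical `Approved_countries` Series, fills the gap first; `pd.to_numeric(..., errors="coerce")` is shown as an alternative that turns anything non-numeric into NaN and keeps a float column.

```python
import pandas as pd

# Hypothetical extracted counts; None stands for a row where no number was found.
counts = pd.Series(["36", "106", None], name="Approved_countries")

# astype(int) would fail on the missing value, so fill it before converting.
as_int = counts.fillna("0").astype(int)

# Alternative: coerce non-numeric entries to NaN; the result is float because of NaN.
as_numeric = pd.to_numeric(counts, errors="coerce")

print(as_int.tolist())      # [36, 106, 0]
print(as_numeric.tolist())  # [36.0, 106.0, nan]
```

Filling with 0 is a modelling choice; dropping those rows may be more appropriate when a missing count really means "unknown".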
19. Vaccination data cleaning
• There are null values in the vaccination dataset, so I used the fillna
method to fill in those null values.
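`fillna` and `dropna` do different things: `fillna` keeps every row and replaces the missing values, while `dropna` removes the rows that contain them. A small sketch on a hypothetical frame shows the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical vaccination rows with one missing dose count.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Total_doses": [100.0, np.nan, 250.0],
})

filled = df.fillna({"Total_doses": 0})          # keeps all 3 rows; NaN becomes 0
dropped = df.dropna(subset=["Total_doses"])     # removes row "B" entirely

print(filled["Total_doses"].tolist())  # [100.0, 0.0, 250.0]
print(dropped["Country"].tolist())     # ['A', 'C']
```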
21. COVID cases data cleaning:
In the cases dataset, the values in the country column contain
special characters such as ',' and spaces, as well as -1 and null values.
22. COVID cases data cleaning:
• I used the .replace method to replace ',' with an empty string.
23. COVID cases data cleaning:
• Some values were 'N/A'. I replaced those values
with '0'.
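The three cleaning steps on these slides can be chained on a string column. A minimal sketch, assuming a hypothetical `Total_cases` column scraped as text: remove thousands separators, trim spaces, map 'N/A' to '0', then convert to integers.

```python
import pandas as pd

# Hypothetical raw values as scraped: thousands separators, stray spaces, "N/A".
raw = pd.DataFrame({"Total_cases": ["1,234,567", " 89,012", "N/A"]})

cleaned = (
    raw["Total_cases"]
    .str.replace(",", "", regex=False)  # drop thousands separators
    .str.strip()                        # trim surrounding spaces
    .replace("N/A", "0")                # treat "N/A" as zero, as on the slide
    .astype(int)                        # now safe to convert to integers
)
print(cleaned.tolist())  # [1234567, 89012, 0]
```

Treating 'N/A' as 0 makes the column numeric, but it also counts unknown values as zero, which can understate totals.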
24. COVID cases data cleaning:
• In the cases DataFrame, columns 15, 16, 17, 18, 19 and 20 are not
required, so I used the drop method to drop those columns.
25. COVID cases data cleaning:
• I renamed the columns from [0, 1, 2, ..., 14] to
["Country", "Total_cases", "New_cases", "Total_deaths", "New_deaths",
"Total_recovered", "New_recovered", "Active_cases",
"Serious/critical", "Total_cases/1M", "Deaths/1M", "Total_tests",
"Tests/1M", "Population", "Continent"].
26. COVID cases data cleaning:
• I needed to find the % increase in cases, % increase in deaths and % recovered
data. I used
df["%Inc_cases"] = df["New_cases"] / df["Total_cases"] * 100
to find those results.
27. COVID cases data cleaning:
• After computing the % increase values, I inserted three new columns
named after % increase in cases, % increase in deaths and
% recovered.
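Only the cases formula is given on the slides. A sketch of all three columns, assuming by analogy that % increase in deaths is `New_deaths / Total_deaths * 100` and % recovered is `Total_recovered / Total_cases * 100` (both assumptions), with the toy values being hypothetical. Countries with zero totals produce NaN (0/0), so those cells are filled with 0.

```python
import pandas as pd

# Hypothetical per-country values; the second row has zero totals.
df = pd.DataFrame({
    "Total_cases": [200, 0],
    "New_cases": [10, 0],
    "Total_deaths": [4, 0],
    "New_deaths": [1, 0],
    "Total_recovered": [150, 0],
})

# Formula from the slide: new cases as a percentage of total cases.
df["%Inc_cases"] = df["New_cases"] / df["Total_cases"] * 100
# Assumed analogues for deaths and recoveries (not spelled out on the slides).
df["%Inc_deaths"] = df["New_deaths"] / df["Total_deaths"] * 100
df["%Recovered"] = df["Total_recovered"] / df["Total_cases"] * 100

# 0/0 yields NaN (a nonzero value over 0 would yield inf instead); fill NaN with 0.
pct_cols = ["%Inc_cases", "%Inc_deaths", "%Recovered"]
df[pct_cols] = df[pct_cols].fillna(0)
print(df[pct_cols])
```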
28. Twitter data cleaning with neattext module
• To clean the Twitter data, I imported the neattext module.
29. Twitter data cleaning with neattext module
• nfx.remove_userhandles(x) to remove user handles
30. Twitter data cleaning with neattext module
⢠nfx.remove_multiple_spaces to remove multiple spaces.
31. Twitter data cleaning with neattext module
• re.sub(r"\S*https?:\S*", "", str(x)) to remove hyperlinks
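The neattext internals are not shown in the deck. As a hedged sketch, the three cleaning steps can be approximated with plain `re`: the URL pattern `\S*https?:\S*` removes any non-space run containing `http:` or `https:`, `@\w+` is an assumed pattern for user handles, and runs of whitespace are collapsed to one space. The function name `clean_tweet` is hypothetical, and this is a stand-in, not neattext's implementation.

```python
import re

def clean_tweet(text: str) -> str:
    """Approximate the cleaning steps: strip URLs, @handles and extra spaces."""
    text = re.sub(r"\S*https?:\S*", "", str(text))  # hyperlinks
    text = re.sub(r"@\w+", "", text)                  # user handles (assumed pattern)
    return re.sub(r"\s+", " ", text).strip()          # collapse runs of whitespace

print(clean_tweet("@WHO  vaccines arrived! https://t.co/abc  "))  # vaccines arrived!
```

With pandas, the same function can be applied to a tweet column with `df["text"].apply(clean_tweet)`.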
32. Data Manipulation
• In the Twitter dataset, I wanted to filter the values of a column based on
conditions from other columns. So I used the reset_index
method to reset the index, so that we can filter the values based
on the index.
• To handle the null values in the cases dataset, I used the fillna method
to fill null values with zero.
• To get the vaccination details of two specific countries into a single
DataFrame, I used the merge function.
• To get the top 10 countries affected by COVID, I first sorted
the values.
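A minimal sketch of the merge and top-N steps, assuming pandas `merge` on a shared `Country` key and `sort_values(...).head(n)` for the ranking (the slide only says the values were sorted first). All table names, columns and numbers are hypothetical; `reset_index(drop=True)` renumbers the result from 0.

```python
import pandas as pd

# Hypothetical per-country tables; names and numbers are illustrative only.
doses = pd.DataFrame({"Country": ["India", "USA"], "Total_doses": [500, 300]})
fully = pd.DataFrame({"Country": ["India", "USA"], "Fully_vaccinated": [200, 150]})

# Combine two tables into one frame on the shared "Country" key.
combined = doses.merge(fully, on="Country", how="inner")

cases = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total_cases": [50, 400, 10, 250],
})
# Sort descending and keep the top N (10 in the deck; 2 here for brevity),
# then renumber the index from 0.
top = cases.sort_values("Total_cases", ascending=False).head(2).reset_index(drop=True)
print(top)
```

`cases.nlargest(2, "Total_cases")` is an equivalent shortcut for the sort-then-head pattern.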
33. Data Visualization
Data visualization provides a good, organized pictorial
representation of the data, which makes it easier to
understand, observe and analyze.
Python provides various libraries for visualizing data. Each library
comes with different features and supports different types of
graphs.
40. Univariate Analysis
Univariate analysis is the simplest form of analyzing data. "Uni"
means "one", so in other words your data has only one variable.
It doesn't deal with causes or relationships (unlike regression),
and its major purpose is to describe: it takes data, summarizes
that data and finds patterns in it.
In this project I used several univariate analysis plots to analyze
the COVID data.
41. The number of active cases is high
in Europe and North America.
The pie chart is a univariate analysis
of active cases by continent.
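A pie chart like this needs one total per continent. A minimal sketch, assuming Matplotlib as the plotting library (the deck does not name it) and a hypothetical frame with `Continent` and `Active_cases` columns: group, sum, then pass the totals to `Axes.pie`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical country rows; values are illustrative only.
df = pd.DataFrame({
    "Continent": ["Europe", "Europe", "North America", "Asia"],
    "Active_cases": [300, 200, 400, 100],
})

# Univariate summary: total active cases per continent.
active_by_continent = df.groupby("Continent")["Active_cases"].sum()

fig, ax = plt.subplots()
ax.pie(active_by_continent, labels=active_by_continent.index, autopct="%1.1f%%")
ax.set_title("Active cases by continent")
```

Call `fig.savefig(path)` or `plt.show()` to output the chart.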
42. Histogram of total cases, continent-wise:
A histogram is a classic visualization tool that represents
the distribution of one or more variables by counting the
number of observations that fall within discrete bins.
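To make "counting observations within discrete bins" concrete: NumPy's `np.histogram` returns exactly those counts along with the bin edges, and a plotting library's histogram draws the same counts as bars. The toy values are hypothetical.

```python
import numpy as np

total_cases = np.array([1, 2, 2, 3, 4, 5])  # hypothetical observations

# Two equal-width bins over [min, max] give edges 1, 3, 5.
# Every bin is half-open [a, b) except the last, which is closed [3, 5].
counts, edges = np.histogram(total_cases, bins=2)
print(counts)  # [3 3]
print(edges)   # [1. 3. 5.]
```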
44. The pandemic has directly affected the mental and physical health
of the global population. People have mixed feelings about the
COVID-19 virus.
45. This is the word cloud of positive tweets
about the COVID-19 virus.
It mainly focuses on the arrival of vaccines, which
suggests that people are happy that vaccines
have arrived.
46. The research demonstrates that although people tweeted
mostly positively about COVID-19, users were busy
retweeting the negative tweets, and no useful words could be
found in the word cloud or in word-frequency computations on the tweets.
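A word cloud sizes each word by how often it appears, so the word-frequency computations mentioned here come down to counting tokens. A minimal sketch with `collections.Counter`; the tweets, the tokenizing regex and the tiny stop-word list are hypothetical, since the deck does not show its own preprocessing.

```python
import re
from collections import Counter

# Hypothetical cleaned tweets and a tiny illustrative stop-word list.
tweets = [
    "vaccine arrived today",
    "the vaccine is here",
    "worried about the vaccine safety",
]
STOP_WORDS = {"the", "is", "about", "today", "here"}

def word_frequencies(texts):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counter = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counter[word] += 1
    return counter

freqs = word_frequencies(tweets)
print(freqs.most_common(1))  # [('vaccine', 3)]
```

If the `wordcloud` package is used, such a frequency mapping can be passed to `WordCloud().generate_from_frequencies(...)`. When the top words are mostly generic terms, the counts carry little signal, which matches the observation on this slide.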
47. Word cloud of neutral tweets about the
virus.
Some people have no opinion on COVID-19.
Surprisingly, that number is very high.
48. The bar plot is a univariate data visualization
drawn on a two-dimensional axis.
According to the latest information, in this plot we can observe
that total deaths (green bar) are lower than active cases (red bar).
We can also see that total recovered cases (blue bar) are high.
49. • From the above graph we can observe that new deaths are zero
and new recovered cases are the same as active cases.
This plot is based on the latest information.
50. • From the above plot we can observe that the percentage
increase in recovered cases is very high (based on the latest information).
51. Top 20 countries by total vaccinations
Here is the list of the top 20 countries
that administered the most vaccinations.
52. Percentage of people fully vaccinated, country-wise.
From this plot we can say that China is 87% vaccinated,
while worldwide only half of the people
are vaccinated.
53. Top 40 countries by percentage of population
fully vaccinated
India is in 40th place.
55. Multivariate Analysis:
• Multivariate analysis (MVA) is a statistical procedure for analyzing
data involving more than one type of measurement or
observation.
• It may also mean solving problems where more than one
dependent variable is analyzed simultaneously with other
variables.
• Multivariate analysis is used to study more complex sets of data
than univariate analysis methods can handle.
• In this project I used some multivariate plots.
56. The plot below shows the top 5 countries affected by COVID-19 and
their case breakdown.
Total cases in the USA and India are higher than in any other country.
57. • The above plot shows the COVID-19 breakdown for the five least
affected countries.
• Cook Islands and Vatican City are the least affected by COVID-19.
58. Pair plot of Total_cases vs Total_deaths,
continent-wise.
60. Below is the heat map of total doses vs fully
vaccinated, which shows the two are proportional.
61. • Even though more cases are reported daily,
the death rate is very low and the recovery rate
is increasing.
62. This is the comparison between vaccinated, unvaccinated and boosted
cases. People who received a booster had lower hospitalization and
death rates than vaccinated and unvaccinated people. The
unvaccinated death rate is high.
63. Bivariate Analysis:
Bivariate analysis is one of the simplest forms of quantitative
(statistical) analysis. It involves the analysis of two variables
(often denoted as X, Y) for the purpose of determining the
empirical relationship between them. ... It is the analysis of the
relationship between the two variables.
In this project I used some bivariate plots.
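One common way to put a number on the empirical relationship between X and Y is the Pearson correlation coefficient; the deck itself relies on plots, so this is an illustrative addition rather than the project's method. pandas `Series.corr` computes Pearson correlation by default. The two columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical country-level values for two variables (X = cases, Y = deaths).
df = pd.DataFrame({
    "Total_cases": [100, 200, 300, 400],
    "Total_deaths": [2, 4, 6, 8],
})

# Pearson r: +1 is a perfect positive linear relationship, 0 is none, -1 is perfectly negative.
r = df["Total_cases"].corr(df["Total_deaths"])
print(round(r, 6))
```

A scatter plot of the same two columns shows the relationship visually; the coefficient summarizes its linear strength in one number.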
64. In China, both the vaccination count and the total doses are much
higher than in any other country.
65. The plots below show that total deaths are low even
though cases are high.
66. Africa has more COVID cases than the other continents.
As we can see, no African country is among the top 20
countries by vaccination.
68. In the USA, people who received a booster are less likely to be hospitalized.
69. Negative tweet word cloud about vaccination:
From this plot we can observe that people are
worried about vaccine safety.
Percentage of fully vaccinated people (table):
In the United States, only 65% of people are fully vaccinated
even though vaccines are available.
70. The research demonstrates that although people tweeted mostly positively
about COVID-19 vaccines, users were busy retweeting the negative tweets,
and no useful words could be found in the word cloud or in word-frequency
computations on the tweets.
71. Key Business Question
When will COVID-19 end, and when will the whole
world be vaccinated?
72. Conclusion:
Millions of people worldwide have been infected with COVID-19, and so far
more than a million have lost their lives because of the pandemic.
A huge global research effort is under way to bring fast-tracked vaccines to the
market.
Currently, more than 165 vaccines are being developed.
While the COVID-19 outbreak is foremost a public health crisis,
it has also caused substantial damage to the global economy.
National governments are spending trillions of dollars to
fight the negative economic impact,
but until people get their vaccines and get fully vaccinated,
the financial cost will continue to be felt around the world.
Vaccines continue to reduce a person's risk of contracting the virus that causes
COVID-19, and they are highly effective against severe illness.
But as we see from the Twitter analysis, people are still confused about
vaccination. There are negative feelings, and some people are opposed to the vaccines.
73. As we have seen, in the USA the death rate dropped to zero after vaccination.
Vaccines are working extremely well to reduce serious infections,
but boosters are critical for the best protection against hospitalization,
death and against multiple variants. Even after you have had a COVID infection,
vaccination and boosters will provide increased protection.
Public discussion and perception of COVID-19 vaccines on Twitter were influenced by
vaccine development and the pandemic, and varied depending on the geography
and demographics of Twitter users.
Understanding sentiments and opinions toward vaccination using
Twitter may help public health agencies to increase positive messaging
and eliminate opposing messages in order to enhance vaccine uptake.
75. Finally,
It was a wonderful experience making this project. The experience exceeded all my
expectations.
There were three key lessons I learned from this project:
• Web scraping is a highly effective method to extract data from websites.
• The Beautiful Soup library is a fun and scrappy resource for scraping data off public
websites.
• I've honestly found web scraping to be super helpful when I'm looking to work on a new
project or need information for an existing one.
On the other hand, I found it very difficult to extract relevant tweet data from the Twitter
platform.
Even though it's a bit complicated, I am very interested in scraping challenging
websites and collecting data.
It's fun and engaging, and there is a lot more to learn.
Overall the experience was good, and I am looking forward to more opportunities like this.
Thank you!