2. ANALYSIS OF BUSINESSES
Being in the home care assistance business, we decided to analyze millions
of reviews from the web.
Note that the data contains only businesses with at least one review.
Search results show businesses near a given location, so when we search
near cities that are close to the border, we can get businesses outside
the US (e.g. in Canada). Furthermore, if a business is near two different
cities, it may be listed twice, so we also have to remove duplicates.
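The cleaning step described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the field names `business_id` and `country`, and the sample records, are assumptions made for the example.

```python
from typing import Dict, List


def clean_businesses(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep US businesses only and drop repeated listings.

    Assumes each record has a 'business_id' and a 'country' field
    (hypothetical names, chosen for this sketch).
    """
    seen = set()
    cleaned = []
    for record in records:
        # Searches near the border can return non-US businesses.
        if record["country"] != "US":
            continue
        # A business near two searched cities appears twice.
        if record["business_id"] in seen:
            continue
        seen.add(record["business_id"])
        cleaned.append(record)
    return cleaned


# Made-up sample records, for illustration only.
sample = [
    {"business_id": "a1", "name": "Sunrise Home Care", "country": "US"},
    {"business_id": "a1", "name": "Sunrise Home Care", "country": "US"},
    {"business_id": "b2", "name": "Maple Leaf Care", "country": "CA"},
    {"business_id": "c3", "name": "Harbor Home Health", "country": "US"},
]
cleaned = clean_businesses(sample)
print([r["business_id"] for r in cleaned])
```

Deduplicating on a stable identifier rather than on the business name avoids merging two different businesses that happen to share a name.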
From Data from the National Study of Long-Term Care Providers (PDF):
“12,400 home health agencies [in the United States]
At least 60% of home health agencies, hospices, nursing homes, and residential care
communities were for profit, while about 40% of adult day services centers were for profit.”
3. That means there are roughly 0.6×12,400=7440 home care assistance providers in the
United States that operate for profit. By drawing many random samples of Yelp pages
(a Monte Carlo approach), it's possible to estimate the proportion of home health
businesses with reviews. We find that approximately 1 in 4 businesses has at least
one review on Yelp. Applying this to the figure above, we expect
approximately 0.25×7440=1860 home care assistance businesses with
reviews on Yelp. This roughly matches our data, in which we count 1922 businesses.
On average, a business has 8997/1922≈4.7 reviews.
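The arithmetic above, together with the idea behind the sampling estimate, can be sketched as follows. The function `estimate_reviewed_share` and the synthetic population are assumptions made for illustration; the real estimate was based on samples of Yelp pages, which are not reproduced here.

```python
import random


def estimate_reviewed_share(has_review_flags, sample_size, n_trials, seed=0):
    """Average the share of reviewed businesses over many random samples."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_trials):
        sample = rng.sample(has_review_flags, sample_size)
        shares.append(sum(sample) / sample_size)
    return sum(shares) / n_trials


# Figures taken from the text.
for_profit_agencies = round(0.6 * 12_400)                  # 7440
expected_with_reviews = round(0.25 * for_profit_agencies)  # 1860
reviews_per_business = round(8997 / 1922, 1)               # 4.7

# Synthetic population (an assumption): 1 in 4 businesses has a review.
population = [True] * 250 + [False] * 750
estimate = estimate_reviewed_share(population, sample_size=100, n_trials=2000)

print(for_profit_agencies, expected_with_reviews, reviews_per_business)
print(round(estimate, 2))
```

With enough trials, the averaged sample share settles close to the true share of the population, which is the property the estimate in the text relies on.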
4. HOME CARE REVIEWS
LOOKING DEEPER
Let's check the distribution of businesses across US states. For readability's sake, only the first
30 states are shown in the chart.
5. California leads with a disproportionate number of home care
assistance businesses: almost 900! On the other hand, there are states, like Vermont,
with only 11 businesses. This last fact can be checked quite easily: for example, some
online directories list 11 businesses in Vermont, but only one has reviews (just one
review, actually). It’s interesting to visualize the density of the home care assistance
businesses on a map. We first compute the number of businesses per city, then we pair
each city with its coordinates. Here’s a map of the nation, showing the
density of home health care companies by city/region.
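The two steps behind the map (count businesses per city, then attach coordinates) can be sketched as below. The function name, the input shapes, and the sample data are assumptions for illustration; the coordinates are approximate.

```python
from collections import Counter


def city_density(business_cities, city_coords):
    """Return (city, lat, lon, count) tuples, most businesses first.

    business_cities: one city name per business.
    city_coords: mapping from city name to (latitude, longitude).
    Cities without known coordinates are skipped.
    """
    counts = Counter(business_cities)
    points = []
    for city, n in counts.most_common():
        if city in city_coords:
            lat, lon = city_coords[city]
            points.append((city, lat, lon, n))
    return points


# Made-up sample: one entry per business, with approximate coordinates.
cities = ["San Francisco", "Los Angeles", "San Francisco",
          "San Francisco", "Los Angeles", "Unknown Town"]
coords = {
    "San Francisco": (37.77, -122.42),
    "Los Angeles": (34.05, -118.24),
}
points = city_density(cities, coords)
print(points)
```

Each resulting tuple can then be passed to a plotting library, with the count controlling marker size or color.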
6. As expected, San Francisco, Los Angeles and San Diego (all in California) have the
highest concentration of home care businesses. New York, Miami, Chicago and Austin
closely follow. From the map, it's also clear that the west coast has a very high
concentration in fewer spots, while on the other side of the country the businesses are
more evenly distributed. This trend follows the population density:
8. It’s not surprising that California reaches almost 900 businesses, given that Californian cities
make up half of the top 15 and take the first three spots. We can also confirm what we
anticipated before: with the exception of California, most of the
top cities offering home care assistance are in the Midwest, the South or the Northeast.
Let’s examine the distribution of ratings amongst them. The ratings of businesses were
found directly online; they are calculated using a custom formula that takes into
consideration only the recommended reviews.
9. The distribution of the ratings is, very interestingly, U-shaped. This property of star ratings
is well known and has been studied extensively (here is one short paper on the topic).
This is probably one reason why YouTube switched their five-star system to a likes/dislikes
system. The most popular star rating on YouTube was 5. People also gave 1 star ratings,
but very few people gave 2-4 star ratings (source).
As to why this happens, some possible explanations are:
Vote Stuffing – Some people are more interested in changing the
average rating for an item than in giving their true opinion on it.
(IMDb tries to address this problem, as explained here.)
Selection Bias – People who vote 2-4 stars don't have a strong
enough desire to represent themselves in the dataset.
Controversiality – Some things are just naturally controversial.
Check the reviews of some pens made for women on Amazon for
a hilarious example.
10. We'll now analyze the reviews. We will start with a very general analysis,
then do a frequency analysis of the words in the reviews, and finally perform
sentiment analysis on them. Let's plot the distribution of review length in words:
REVIEWS OF HOME CARE COMPANIES IN THE USA
Amazingly enough, there are reviews that surpass 1000 words. The reviewers clearly went
in depth on those! Review length peaks at around 100 words, which is roughly a
paragraph. This page lists a number of examples that show what different word counts look
like.
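A distribution of review length in words can be computed along these lines. This is a sketch under simple assumptions: words are split on whitespace, and the bin width of 50 words is an arbitrary choice for the example.

```python
from collections import Counter


def length_histogram(reviews, bin_width=50):
    """Count reviews per word-count bin; keys are the lower bin edges."""
    bins = Counter()
    for text in reviews:
        n_words = len(text.split())
        bins[(n_words // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


# Made-up sample reviews of 20, 1 and 120 words.
reviews = ["great care " * 10, "ok", "word " * 120]
histogram = length_histogram(reviews)
print(histogram)
```

The resulting dictionary maps each bin's lower edge to the number of reviews in it, which is enough to draw a bar chart of the distribution.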
11. FREQUENCY ANALYSIS OF WORDS
USED IN HOME HEALTH CARE REVIEWS
We want to extract the unique words. To do so, we first join all the reviews. The first thing
we’ll do is create a word cloud, where each word's size is proportional to its frequency in the
reviews.
12. We can clearly see that 'care' is overwhelmingly present in the reviews. Other frequent
words are 'time', 'home', 'one', 'service', 'need', 'mother'. Some of those words aren't
really meaningful, while others, like 'mother' and 'time', can offer great insight into what the
reviewers were thinking. As we did before, let's check the actual frequencies of the
15 most common words:
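Counting word frequencies after joining all the reviews can be sketched as follows. The tokenization rule and the small stopword list are assumptions made for this example, not necessarily what was used to produce the chart.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "and", "to", "of", "my", "was", "is",
             "for", "in", "they", "with"}


def top_words(reviews, n=15):
    """Join all reviews, tokenize, and return the n most common words."""
    text = " ".join(reviews).lower()
    words = re.findall(r"[a-z']+", text)
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Made-up sample reviews.
reviews = [
    "The care my mother received was great",
    "Great care and great service",
    "They took care of my mother",
]
most_common = top_words(reviews, n=3)
print(most_common)
```

The same counts can feed both the word cloud and the bar chart of the most common words.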
13. SENTIMENT ANALYSIS OF HOME CARE REVIEWS
To perform sentiment analysis we'll use the textblob package with NLTK's
NaiveBayesClassifier, which is trained on a movie reviews corpus and
is therefore quite suitable for analyzing our reviews.
Let's plot the distribution of positive vs negative reviews.
We can see that the majority of reviews carry a positive sentiment. Negative reviews
account for approximately 1 in 8 reviews. Could it be that users review
businesses more willingly when their expectations are met with great service? It's very
interesting to plot the proportion of negative to positive reviews broken down
by star rating.
14. From the above chart we see that on average a 1-star review is much more negative than
a 5-star one. Note that our classifier still estimated that some 1-star reviews were positive.
This could be due to the fact that in a review many different aspects are considered. It
could be the case that only one feature is criticized in the review, while the others are
considered good enough. However, if that single feature is deemed very important by the
reviewer, they are going to rate the service 1 star, even though the language used could
be more or less positive.
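The per-star breakdown discussed above can be sketched as follows, assuming each review has already been classified. The `(stars, label)` input shape and the `'pos'`/`'neg'` labels are assumptions for this example, and the sample data is made up.

```python
from collections import defaultdict


def negative_share_by_stars(reviews):
    """Return the share of negative reviews for each star rating.

    reviews: iterable of (stars, label) pairs, where label is
    'pos' or 'neg' as assigned by a sentiment classifier.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for stars, label in reviews:
        totals[stars] += 1
        if label == "neg":
            negatives[stars] += 1
    return {stars: negatives[stars] / totals[stars] for stars in sorted(totals)}


# Made-up classified reviews: note a 1-star review labeled positive.
classified = [
    (1, "neg"), (1, "neg"), (1, "pos"),
    (5, "pos"), (5, "pos"), (5, "neg"), (5, "pos"),
]
shares = negative_share_by_stars(classified)
print(shares)
```

A 1-star review labeled `'pos'` in this sample mirrors the case described above, where the language of a review is milder than its rating.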