New York City Restaurant Inspection Analysis

Data mining project on restaurant inspection results for New York City.

Published in: Data & Analytics

  1. Should You Eat There? An Analysis of NYC Restaurant Inspection Data (Business Intelligence & Data Analytics)
  2. Meet Our Team, Group 2: Samantha Grant, Jingshu Sun, Akash Dhruv, Candice Brown, Leeyat Slyper
  3. Agenda: (1) The Data, (2) Data Exploration, (3) Unsupervised Learning, (4) Supervised Learning, (5) Recommendations
  4. Business Objectives: (1) identifying violation trends, (2) predicting violations, (3) reducing violations. Help NYC restaurants and restaurant-goers by...
  5. So there won’t be any more of this...
  6. Part 1: The Data
  7. Data Attributes (477,000 rows). Restaurant details: ID, Restaurant Name, Cuisine Description, New York Boro, Zip Code. Violation details: Inspection Date, Inspection Type, Violation Code, Critical Flag, Grade (A, B, C), Scores.
  8. Data Cleaning, in four steps: (1) removed bad data, (2) shrank the data set, (3) fixed spelling and inconsistencies, (4) replacement and flag creation. Specifically: removed rows with inspection dates in the future; reduced the number of rows; fixed spelling errors; replaced ‘Not Yet Graded’ with ‘N’; broke ‘Inspection Type’ into two columns; created violation categories, inspection categories, seasonal flags and landmark flags.
  9. Replacement: 110 violation codes → 13 violation categories: Allergies/Safety, Animals, Certification, Documentation, Facility Amenities, Tobacco, Facility Cleanliness, Hazardous Chemicals, Food Temperature, Food Contamination, Worker Cleanliness, Other.
  10. Part 2: Data Exploration
  11. Violation Trends: What are the most common violation types? Top 3 violation categories: #1 Facility Amenities, #2 Animals, #3 Facility Cleanliness.
  12. Violation Density: Which borough has the most violations? (Map: Staten Island, Queens, Brooklyn, Bronx, Manhattan)
  13. Restaurant Density vs. Percent Violations: Manhattan 39%, Brooklyn 24%, Queens 24%, Bronx 9%, Staten Island 3% of violations. Bronx: 1,438,159 population, 13,221 persons/sq. km, 9.48% of violations. Queens: 2,321,580 population, 8,237 persons/sq. km, 24.07% of violations.
  14. These articles confirm our findings...
  15. Do inspection scores differ across boroughs? Insight: There are no major differences in average restaurant scores despite differing borough wealth and popularity.
  16. Inspection Type: How do scores differ across inspection types? Recommendation: Re-opening inspections have the lowest average scores. A separate process could be put in place for re-openings to ensure good scores.
  17. Restaurant Grade Distribution (Grades A, B, C). Takeaways: Hamburgers, cafes and American food have the highest percentage of A grades; Indian food has the largest share of C grades. Source: “What’s the safest food in New York City?”, Data Diversions (tumblr.com); NYC Open Data.
  18. Part 3: Unsupervised Learning
  19. Association Rules among Animals, Facility Amenities, Worker Cleanliness, Facility Cleanliness, Food Temperature and Food Contamination (lift 1.06 and 1.01).
  20. Seasonal Trends: Which season has the most violations? Violations per season: Winter ~2k, Spring >2k, Summer <1.5k, Fall ~1.5k. Spring has the most violations, and American, Chinese and Italian food had the most violations.
  21. Clustering Results
  22. Clustering Results: Segment Size
  23. Clustering Results
  24. Takeaway: The seasonal dummy variable was the most influential across the board (Variable Worth).
  25. Cluster Findings: What is the prevalence of violations by season? Takeaways: Cluster 1: Spring. Cluster 2: Summer. Cluster 3: Winter, highest Manhattan incidence. Cluster 4: Spring. All clusters: American and Chinese food violations, Manhattan and Brooklyn; score is impactful on all clusters, especially 1 and 4. Other findings: Staten Island is not impactful on any cluster.
  26. Cluster Findings: What is the prevalence of violations by grade? Takeaways: Cluster 1: C grade, food temperature, flies/food refuse violations, mice. Cluster 2: A grade. Cluster 3: A grade, highest Manhattan incidence. Cluster 4: B grade. All clusters: Manhattan is impactful on all clusters.
  27. Part 4: Supervised Learning
  28. Should you eat at Chipotle?
  29. Focus Point: Chipotle. Answer: Yes, in Staten Island: no violations were detected at any Chipotle outlet there. Top borough for violations at Chipotle outlets: Manhattan.
  30. Focus Point: Chipotle. Takeaways: the most common violation categories are (1) Animals: 04N; (2) Food Temperature: 02B, 02G; (3) Worker Cleanliness: 06A, 06B.
  31. Do landmark NY restaurants perform better?
  32. Focus Point: Landmark Restaurants. Landmark restaurant criteria: famous, oldest, movie scenes, favorites.
  33. Focus Point: Landmark Restaurants. Hypothesis confirmed: non-critical violations are more common for landmark restaurants than for others.
  34. Focus Point: Landmark Restaurants. Hypothesis confirmed: landmark restaurants have a higher percentage of A grades.
  35. Focus Point: Landmark Restaurants. Finding: the second most common violation for landmark restaurants is failing to clean surfaces after each use. Recommendation: hire an employee who cleans while the chefs cook.
  36. Focus Point: Landmark Restaurants. Hypothesis not supported: violations, or the lack thereof, are not indicators of landmark restaurants.
  37. What factors lead to a critical-violation judgement?
  38. Part One: Decision Tree Model. Violation Prediction: Interpreting the Inspection Result. What kinds of restaurants are more likely to be judged to have a critical violation? Key: create a CRITICAL_DUMMY from CRITICAL_FLAG and assign it Role “Target” and Level “Binary” (Not Critical: Critical_Dummy = 0; Critical: Critical_Dummy = 1).
  39. Unsupervised Learning: Variable Selection. SCORE, CRITICAL_FLAG, Cheating, Splitting.
  40. Unsupervised Learning: Two-Way & Three-Way. Running the model: data partition of 70% training data and 30% validation data.
  41. Findings (Two-Way): Grade 1.0000, Inspection_Type 0.4314, BORO 0.1675. Restaurants scoring under B are 68.17% likely to be judged to have a critical violation, compared with 48% for Grade A. Restaurants with an initially low grade are more likely to be judged to have a critical violation during re-inspection, with a probability of nearly 70%. “BORO” does not appear to have much effect on critical violations: the probability of a critical judgement is around 52% for re-inspections with initially high grades in all regions.
  42. Part Two: Logistic Regression. Outcome: Critical_Dummy. Variable selection: stepwise.
  43. Findings (similar to the decision tree): Score 0.0983, GRADE B 0.0948, Inspection Type 0.0610, GRADE C 0.1596.
  44. Part 5: Recommendations
  45. For the hungry consumer: Dine after spring, since restaurants have been issued the most violations by that time. Be wary of Indian and Chinese restaurants in New York City. Don’t pay Manhattan prices; Manhattan does not have cleaner restaurants. If you want to eat at Chipotle, go to Staten Island.
  46. For restaurants: Hire a dedicated cleaner in high-volume landmark restaurants. Since Facility Amenities violations are the most common, construction is a critical stage, so do extensive research before contracting. Focus on cleanliness in the spring. Be sure to do well on re-inspection: you’ll either pass with flying colors or be severely penalized. Set a benchmark to be met before allowing re-opening.
  47. Questions? Thanks for listening!
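
The cleaning steps on slide 8, the code-to-category replacement on slide 9 and the CRITICAL_DUMMY target on slide 38 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the team's pipeline. The following are all hypothetical:

- the field names (`inspection_date`, `inspection_type`, `violation_code`, `critical_flag`, `grade`);
- the "Program / Stage" format assumed for Inspection Type;
- the calendar-month season boundaries;
- the helper `clean_rows`.

The category table holds only the five codes named on slide 30.

```python
from datetime import date

# Partial, hypothetical code table: only the five codes named on slide 30.
# The full 110-code -> 13-category mapping is not given in the slides.
CODE_TO_CATEGORY = {
    "04N": "Animals",
    "02B": "Food Temperature",
    "02G": "Food Temperature",
    "06A": "Worker Cleanliness",
    "06B": "Worker Cleanliness",
}

# Assumed season boundaries (meteorological seasons); the deck does not state them.
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
SEASONS = ("Winter", "Spring", "Summer", "Fall")


def clean_rows(rows, today):
    """Apply the slide-8/9 cleaning steps to a list of dicts (hypothetical field names)."""
    cleaned = []
    for row in rows:
        # Remove bad data: skip inspections dated in the future.
        if row["inspection_date"] > today:
            continue
        out = dict(row)  # copy so the input rows are left untouched
        # Replace 'Not Yet Graded' with 'N'.
        if out.get("grade") == "Not Yet Graded":
            out["grade"] = "N"
        # Break 'Inspection Type' into two columns, assuming 'Program / Stage' values.
        program, _, stage = out["inspection_type"].partition(" / ")
        out["inspection_program"] = program.strip()
        out["inspection_stage"] = stage.strip()
        # Collapse the raw code into a category; codes not in the table become 'Other'.
        out["violation_category"] = CODE_TO_CATEGORY.get(out["violation_code"], "Other")
        # Seasonal flags: one 0/1 column per season.
        season = MONTH_TO_SEASON[out["inspection_date"].month]
        for name in SEASONS:
            out["is_" + name.lower()] = int(name == season)
        # Binary target (slide 38): 'Critical' -> 1, anything else (e.g. 'Not Critical') -> 0.
        out["critical_dummy"] = 1 if out["critical_flag"] == "Critical" else 0
        cleaned.append(out)
    return cleaned


rows = [
    {"inspection_date": date(2015, 4, 2), "grade": "Not Yet Graded",
     "inspection_type": "Cycle Inspection / Re-inspection",
     "violation_code": "04N", "critical_flag": "Critical"},
    {"inspection_date": date(2099, 1, 1), "grade": "A",
     "inspection_type": "Cycle Inspection / Initial Inspection",
     "violation_code": "02B", "critical_flag": "Not Critical"},
]
result = clean_rows(rows, today=date(2016, 1, 1))
print(len(result), result[0]["violation_category"], result[0]["is_spring"])  # prints: 1 Animals 1
```

Using `str.partition` means a value with no " / " separator yields an empty stage column rather than raising. That keeps malformed rows visible for later inspection instead of silently dropping them.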

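Slide 19 reports lift values of 1.06 and 1.01. Lift is the support of two categories together divided by the product of their individual supports. A lift of 1 means the categories co-occur exactly as often as independence predicts, so values this close to 1 indicate only a weak positive association.

The sketch below assumes that each transaction is the set of violation categories cited in one inspection, since the deck does not say how transactions were defined. The functions `support`, `rule_metrics` and `pairwise_lifts` are illustrative names, not part of any tool the team used.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions (sets) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a  # P(consequent | antecedent)
    lift = confidence / s_c  # P(A and C) / (P(A) * P(C))
    return s_ac, confidence, lift


def pairwise_lifts(transactions):
    """Lift for every pair of categories that co-occurs at least once."""
    items = sorted(set().union(*transactions))
    lifts = {}
    for a, b in combinations(items, 2):
        s_ab = support(transactions, {a, b})
        if s_ab > 0:
            lifts[(a, b)] = s_ab / (support(transactions, {a}) * support(transactions, {b}))
    return lifts


# Toy data (assumed, not the deck's): categories cited in each of four inspections.
inspections = [
    {"Animals", "Facility Cleanliness"},
    {"Animals", "Facility Cleanliness"},
    {"Food Temperature"},
    {"Animals"},
]
print(rule_metrics(inspections, {"Animals"}, {"Facility Cleanliness"}))
# support 0.5, confidence ~0.667, lift ~1.333
```

Lift is symmetric in the two categories, which is why `pairwise_lifts` only needs each unordered pair once. Confidence, by contrast, depends on which side of the rule a category sits.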
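
Slide 40 partitions the data into 70% training and 30% validation, and slide 41 reads off leaf probabilities such as 68.17% versus 48%. A tree leaf's probability is the share of rows with `critical_dummy == 1` in that leaf. A one-level tree with one branch per grade therefore reduces to a per-grade critical rate.

This is a minimal sketch, assuming rows are dicts with the hypothetical keys `grade` and `critical_dummy`. The team's actual tree was built in its own modelling tool and may use more levels and a different split criterion. `partition` and `critical_rate_by` are illustrative names.

```python
import random
from collections import defaultdict


def partition(rows, train_frac=0.7, seed=42):
    """Shuffle and split rows into (training, validation), 70/30 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def critical_rate_by(rows, key):
    """Share of rows with critical_dummy == 1 for each value of `key`.

    These are the leaf probabilities of a one-level tree with one branch per
    value of `key`; a real tree may use more levels and a different split criterion.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [critical count, total count]
    for row in rows:
        tally = counts[row[key]]
        tally[0] += row["critical_dummy"]
        tally[1] += 1
    return {value: crit / total for value, (crit, total) in counts.items()}


# Toy rows with hypothetical keys; not the deck's data.
rows = (
    [{"grade": "A", "critical_dummy": d} for d in (1, 0, 0, 1, 0)]
    + [{"grade": "B", "critical_dummy": d} for d in (1, 1, 0, 1, 1)]
)
train, valid = partition(rows)
print(len(train), len(valid))           # 7 3
print(critical_rate_by(rows, "grade"))  # {'A': 0.4, 'B': 0.8}
```

In practice the rates would be computed on `train` and checked against `valid` to see whether the gap between grades holds on unseen inspections.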