1. MACHINE LEARNING
Student: Ehsan Ullah
Instructor: Ahmed B. Mahmood, Ph.D
Introduction to Machine Learning
Classification of 911 dataset
2. BACKGROUND
• Various services are provided at city or provincial level
• Canada uses the same system as the USA because the two telephone
systems are closely integrated, so 911 is used everywhere in
Canada
• Canada adopted 911 in 1972, and the first city to implement
the system was London, Ontario, in 1974
• Winnipeg's 999 service, introduced in 1959, was a forerunner
to 911 service in centres across North America (changed to
911 in 1972).
• Prince Edward Island was the last province to get 911 service
in 2000.
3. WHY 911 EVOLVED
• Need Analysis
Members' minimal knowledge about their rights (Authority)
• Technology
Evolving Telecom, Exchange, Data Networks, Data processing capabilities (GPU vs
CPU)
4. WHY IS ANALYSIS REQUIRED
• Real time analysis and feedback will help create
understanding of expected calls which might help
with resource allocation for call centers
• Reduce Stress (Call center, Emergency Services)
• Help strategy makers anticipate the occurrences of
emergencies and enable them to effectively handle
the emergency by appropriate allocation of resources
5. DATASETS – ATTRIBUTES(COLUMNS)
• Emergency - 911 Calls - Montgomery
County, PA - Data Source -
https://www.kaggle.com/mchirico/montcoalert
• Baltimore 911 Calls - Records of 2.8 million
calls from 2015 onwards - Data Source -
https://www.kaggle.com/sohier/baltimore-911-calls
• Guelph Police Calls
New Link:
https://www.guelphpolice.ca/en/about-gps/open-data.aspx
Old Link:
http://www.guelphpolice.ca/en/about-us/Guelph-Police-Open-Data.asp
6. PROCESSING THE DATASETS
• Missing Values
• Time-Date from Timestamp
• Train the algorithm
• Classification of Emergency Calls
• Measure accuracy of results
• Google Maps API
• Python Script
• Microsoft Power BI
8. VISUAL INTERPRETATION
• Hours vs No. of calls
• Calls vs Type of calls
• Rate of Calls over Months
• Call distribution over cities and streets – Classification of call types
9. OBJECTIVE - TYPES OF CLASSIFIERS
• Linear Regression - Completed
• Logistic Regression - In Progress
• Decision Trees - Completed
• Nearest Neighbor - In Progress
10. DATA ANALYSIS - PYTHON
• Tools used:
• Jupyter Notebook
• Excel Power Query
• Excel or SQL Server
• Libraries Used:
• Data modeling Libraries: NumPy, pandas
• Graph Libraries: matplotlib, seaborn
11. PREPROCESSING - PYTHON
• Loading Data
• Cleaning Data (Missing or Null Values)
• Analyzing the Data (Converting Data Types,
Adding Derived Fields)
• Encoding Data for Analysis
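The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (title, timeStamp, twp, zip), the inline sample rows, and the helper name preprocess are assumptions made for the example.

```python
import io

import pandas as pd

# Tiny inline sample standing in for the real CSV. The column names
# (title, timeStamp, twp, zip) are assumptions about the source file.
SAMPLE_CSV = """title,timeStamp,twp,zip
EMS: ASSAULT VICTIM,2015-12-10 17:40:00,NEW HANOVER,19525
Fire: GAS-ODOR/LEAK,2015-12-10 18:05:00,HATFIELD TOWNSHIP,
Traffic: VEHICLE ACCIDENT -,2015-12-11 08:15:00,,19401
EMS: RESPIRATORY EMERGENCY,2015-12-12 23:50:00,NORRISTOWN,19401
"""


def preprocess(csv_source):
    """Hypothetical helper: load, clean, derive fields, and encode."""
    # Loading data
    df = pd.read_csv(csv_source)

    # Cleaning: drop rows with no township, fill missing zip codes with 0
    df = df.dropna(subset=["twp"]).copy()
    df["zip"] = df["zip"].fillna(0).astype(int)

    # Converting data types: parse the timestamp text into datetime values
    df["timeStamp"] = pd.to_datetime(df["timeStamp"])

    # Adding derived fields
    df["hour"] = df["timeStamp"].dt.hour
    df["month"] = df["timeStamp"].dt.month
    df["day_of_week"] = df["timeStamp"].dt.day_name()
    df["call_type"] = df["title"].str.split(":").str[0]

    # Encoding: map each call type to an integer code for the algorithms
    df["call_type_code"] = df["call_type"].astype("category").cat.codes
    return df


clean = preprocess(io.StringIO(SAMPLE_CSV))
print(clean[["call_type", "hour", "day_of_week", "call_type_code"]])
```

With the real dataset, the file path would replace the io.StringIO sample.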
12. DATA ANALYSIS - PYTHON
• Analyzing the data
• Converting data for heat maps
• Converting data for Algorithms (Linear
Regression, Decision Tree, Naive Bayes)
13. VISUALIZATION OF DATA - PYTHON
• Creating Tabular vs Graphical
Visualization of data
• Top 5 townships generating 911
calls
• Classifying emergencies
by type
• Top 5 townships generating
Type of Calls
(EMS, Fire, Traffic)
• Top 5 calls – Types of Calls
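A rough sketch of how these tabular summaries might be produced with pandas. The sample rows and the column names title and twp are assumptions for illustration; only the "Type: Reason" title pattern (for example "EMS: ASSAULT VICTIM") comes from the slides.

```python
import pandas as pd

# Hypothetical sample rows; "title" and "twp" are assumed column names.
calls = pd.DataFrame({
    "title": [
        "EMS: ASSAULT VICTIM",
        "EMS: RESPIRATORY EMERGENCY",
        "Traffic: VEHICLE ACCIDENT -",
        "Fire: FIRE ALARM",
        "EMS: VEHICLE ACCIDENT",
        "Traffic: VEHICLE ACCIDENT -",
    ],
    "twp": [
        "NORRISTOWN", "ABINGTON", "NORRISTOWN",
        "LOWER MERION", "ABINGTON", "NORRISTOWN",
    ],
})

# Split "EMS: ASSAULT VICTIM" into type "EMS" and reason "ASSAULT VICTIM"
parts = calls["title"].str.split(":", n=1, expand=True)
calls["call_type"] = parts[0].str.strip()
calls["reason"] = parts[1].str.strip(" -")

# Top 5 townships by number of calls
top_townships = calls["twp"].value_counts().head(5)

# Top 5 most frequent call titles
top_titles = calls["title"].value_counts().head(5)

# Calls per township broken down by type (EMS, Fire, Traffic)
type_by_township = pd.crosstab(calls["twp"], calls["call_type"])

print(top_townships)
print(top_titles)
print(type_by_township)
```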
14. VISUALIZATION OF DATA - PYTHON
• Call Counts vs Day of the Week
/ Month (EMS, Fire, Traffic)
• Calls per month
• Calls Rate (EMS, Fire, Traffic)
• Calls – Hours Vs Day of Week
Heat Map
15. VISUALIZATION OF DATA - PYTHON
• Linear Regression
• EMS: ASSAULT VICTIM
• EMS: RESPIRATORY EMERGENCY
• EMS: ASSAULT VICTIM vs EMS: VEHICLE ACCIDENT
16. MACHINE LEARNING - ALGORITHMS
• Decision Tree: accuracy
58.05% to 50.44%,
depending on training vs
testing data
• Naïve Bayes Classifier:
accuracy dropped to
50.68%
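A hedged sketch of how such a comparison could be run with scikit-learn. The features (hour, day of week, month), the synthetic labels, the tree depth, and the split sizes are all assumptions for illustration; the numbers it prints are unrelated to the accuracies reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in features: hour of day, day of week, month (assumed).
X = np.column_stack([
    rng.integers(0, 24, n),
    rng.integers(0, 7, n),
    rng.integers(1, 13, n),
])

# Synthetic labels (0 = EMS, 1 = Fire, 2 = Traffic), loosely tied to the hour,
# with 30% of rows relabelled at random to imitate noisy real data.
y = np.where(X[:, 0] < 8, 0, np.where(X[:, 0] < 16, 2, 1))
noisy = rng.random(n) < 0.3
y[noisy] = rng.integers(0, 3, int(noisy.sum()))


def evaluate(model, test_size):
    """Fit on the training split and return accuracy on the testing split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


results = {}
for test_size in (0.2, 0.4):
    results[("tree", test_size)] = evaluate(
        DecisionTreeClassifier(max_depth=5, random_state=0), test_size
    )
    results[("naive_bayes", test_size)] = evaluate(GaussianNB(), test_size)

for (name, test_size), accuracy in results.items():
    print(f"{name:12s} test_size={test_size:.1f} accuracy={accuracy:.2%}")
```

Varying test_size is one way to see how accuracy shifts with the training vs testing split.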
17. MICROSOFT POWER BI –
INTERACTIVE DASHBOARDS
• Power BI is a business intelligence platform
• Analytics toolset
• Connection to different Sources
• Cleaning Data using R
• Formatting Data
• Ad-hoc Analysis
• Dataflow (Refresh Schedule, Workspace)
18. INTRODUCTION TO POWER BI
• Install Power BI
• Create Data Source Connections
• Create Query
• Clean Up Data (Visual, R Script)
• Apply changes or add more data
• Select data and add custom visuals
19. MY DASHBOARD – POWER BI
• Dashboard
Montgomery County, PA
Home
• Hours Vs Weeks – Calls
• Classification of Calls (Location, Heat
map)
• Distribution of calls (Townships)
• Clustering
20. CHALLENGES
• Software Installation and compatibilities (SQL, Connectors)
• Using Google and Bing to get geo coordinates from addresses (Paid Service)
• Data processing using Excel vs SQL
• Data import into SQL from CSV, Excel
21. CONCLUSION AND FUTURE WORKS
Visualization graphs helped to understand:
• Time spans in which calls are made
• Types of calls
• Affected streets in cities
The results of this analysis can help the public safety services to make strategic
decisions about how to fairly allocate fire, medical and police resources.
Power BI Dashboard and Jupyter Notebook will be published on GitHub.
The analysis can also help reduce the stress of 911 operators, who suffer from significant employee turnover.
Using seaborn's lmplot() to create a linear fit on the number of calls per month.
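A small sketch of the calls-per-month fit under stated assumptions. The timestamps are made up, the timeStamp column name is assumed, and numpy's polyfit stands in for the plot so the fitted line can be inspected without a display; the commented lmplot() call shows where seaborn would draw the same kind of fit.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps: month m (1..6) gets 10 + 2*m calls.
stamps = [
    f"2016-{m:02d}-01 12:00:00"
    for m in range(1, 7)
    for _ in range(10 + 2 * m)
]
calls = pd.DataFrame({"timeStamp": pd.to_datetime(stamps)})
calls["month"] = calls["timeStamp"].dt.month

# Count calls per month and turn the result into a two-column table
by_month = calls.groupby("month").size().reset_index(name="calls")

# Least-squares straight line through (month, calls)
slope, intercept = np.polyfit(
    by_month["month"].to_numpy(), by_month["calls"].to_numpy(), deg=1
)
print(by_month)
print(f"calls ~ {slope:.2f} * month + {intercept:.2f}")

# With seaborn installed, the same table could be plotted with a linear fit:
# import seaborn as sns
# sns.lmplot(x="month", y="calls", data=by_month)
```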
Restructure the DataFrame so that the columns become the hours and the index becomes the day of the week.
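One way this restructuring might look in pandas, as a sketch only. The sample timestamps and the timeStamp column name are assumptions; groupby followed by unstack is one common approach and not necessarily the one used in the notebook.

```python
import pandas as pd

# Hypothetical sample of call timestamps (assumed column name: timeStamp).
calls = pd.DataFrame({"timeStamp": pd.to_datetime([
    "2015-12-10 17:40:00",  # Thursday
    "2015-12-10 17:55:00",  # Thursday
    "2015-12-11 08:15:00",  # Friday
    "2015-12-12 23:50:00",  # Saturday
])})
calls["hour"] = calls["timeStamp"].dt.hour
calls["day_of_week"] = calls["timeStamp"].dt.day_name()

# Count calls per (day, hour), then move "hour" from the index to the columns
day_hour = (
    calls.groupby(["day_of_week", "hour"])
    .size()
    .unstack(fill_value=0)
)
print(day_hour)

# With seaborn installed, this table can be passed to a heat map:
# import seaborn as sns
# sns.heatmap(day_hour)
```

Each row is a day of the week and each column an hour, so every cell holds the call count for that combination.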
Microsoft Power BI is a business intelligence platform that offers a business analytics toolset. It is designed to assist businesses in their efforts to systematically analyze data and share insights. Power BI can access data from different sources, simplify its preparation and cleaning, and enable ad-hoc analysis.
Using the R engine to render the visuals
Developing R scripts to process data