Developing a real-time
predictive algorithm for
Nick Nudell, MS, NRP
Dakota State University
• Approximately 7 billion emergency calls annually
• Police – Fire – EMS
Real World Problem
• What action to take?
• Where to do it?
• Who should do it?
• How quickly does it need to be done?
• Why was it done?
• Caller phone number: call routing information, mobile/fixed,
single/multiple user (like an IP address), GPS/tower, eCall/Automatic
• Resources/system status: what people, vehicles, equipment, etc.
• Environment: Weather, crowding & traffic (granular to the device),
street corner/high rise/wilderness, ferry/train/plane schedules
• Call center, paramedics, hospital, police records, fire records, public
• Social media: Twitter, Facebook, Instagram, etc.
• 50 years of Operations Research / Management
• 25 years of decision tool/tree validation
• 10 years of clinical registry prediction tool validation
• 15 years of decision support in emergency calling “appropriateness”
• 6 months of deep data mining exploratory work
Why is it so complex?
• Chinese city with 9 million residents
• 2.5 calls per resident over 5 years (0.5/person/year)
• Repeat callers average 2.09 calls per year
• USA with 320 million residents
• 240 million 911 calls per year (0.75/person/year)
• 41,000 calls per Public Safety Answering Point
• $4.51 per call, just to maintain the ICT & dispatching system
• 10,000+ ICD-10 diagnosis codes
• 19,000 EMS services across 50 states & 6 territories
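The per-capita rates quoted above follow from simple division, and the same figures imply a few totals the slides do not state. A quick sketch of that arithmetic, assuming the 41,000 calls per PSAP figure is annual and that the $4.51 cost applies to every 911 call (both are assumptions; the variable names are illustrative only):

```python
# Figures taken from the bullets above; variable names are illustrative.
city_residents = 9_000_000
city_calls_per_resident_5y = 2.5
us_residents = 320_000_000
us_calls_per_year = 240_000_000
calls_per_psap = 41_000        # assumed to be an annual figure
cost_per_call_cents = 451      # $4.51, kept in cents to avoid rounding noise

# Rates stated on the slide
city_rate = city_calls_per_resident_5y / 5           # calls/person/year
us_rate = us_calls_per_year / us_residents           # calls/person/year

# Derived totals (not stated in the source, implied by its figures)
city_calls_per_year = city_residents * city_rate
implied_psaps = us_calls_per_year / calls_per_psap
annual_cost_usd = us_calls_per_year * cost_per_call_cents / 100

print(f"City: {city_rate:.2f} calls/person/year, {city_calls_per_year:,.0f} calls/year")
print(f"USA:  {us_rate:.2f} calls/person/year")
print(f"Implied number of PSAPs: {implied_psaps:,.0f}")
print(f"Implied annual ICT & dispatch cost: ${annual_cost_usd:,.0f}")
```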
• Started in 1978…
• 36 Families of problem types
• Level of Urgency: Hot or Not
• Omega, Alpha, Bravo, Charlie, Delta, Echo
• Nuanced descriptors help determine what
kind of first-aid instructions are to be given
1120 * 8 = 8,960 hours of
138,116 total calls
5,730 high priority (Cardiac
Arrest & Choking)
53,481 life threatening
78,905 non-life threatening
Decision Tree – Manual Deductive Reasoning
• Dispatching priority relies on standardized keywords compared to
a known list of static scenarios
• IF shooting THEN
• Urgently send police, apply tourniquet, stop bleeding.
• IF not breathing/pulseless THEN
• Start CPR, urgently send paramedics.
• IF cardiac history THEN
• Urgently send paramedics, take aspirin, stay calm.
• Known as clustering in computer science
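The keyword-to-scenario lookup described above can be sketched as a small rule table. This is a minimal illustration only, assuming simple substring matching on the caller's words and first-match-wins ordering; the rule table, the `triage` function, and the matching strategy are hypothetical and are not taken from any real dispatch protocol.

```python
# Static rules: (keywords, actions). The content mirrors the three examples
# on the slide; the structure and names are hypothetical.
RULES = [
    (("shooting",),
     ["Urgently send police", "Apply tourniquet", "Stop bleeding"]),
    (("not breathing", "pulseless"),
     ["Start CPR", "Urgently send paramedics"]),
    (("cardiac history",),
     ["Urgently send paramedics", "Take aspirin", "Stay calm"]),
]

def triage(transcript):
    """Return the actions of the first rule whose keyword appears in the text."""
    text = transcript.lower()
    for keywords, actions in RULES:
        if any(keyword in text for keyword in keywords):
            return actions
    return []  # no static scenario matched

print(triage("My husband collapsed and he is not breathing"))
print(triage("There has been a shooting outside the store"))
print(triage("I locked my keys in the car"))
```

Because the rules are static, anything the caller phrases differently from the keyword list falls through to the empty result, which is the limitation the later slides set out to address.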
Questions / Prioritization / Instructions
• Priorities designed to purposefully over-triage rather than increase
specificity as risk management tool
• Lots of vehicles / fewer vehicles
• Lights & Sirens / no L&S
• Queuing theory using probabilistic expected delays for paramedics,
police, or fire department responders
• Targeting the longest acceptable delay because time = money
• Knowledge discovery opportunities are overlooked!
• Crowdsource trained people for faster response
• Electronic medical records describe historical risk
• Caller behavior, word choice, history, location, etc. are untapped indicators
Queuing Theory – Planning to Disappoint
• Operations Research, Management Science, & Computer Science
disciplines rely on probabilistic calculations
• A model is constructed so that queue lengths and waiting time
can be predicted
• Interarrival times & service times are independent random variables
• Designed to select next task to perform
• The most commonly used laws are:
• FIFO - First In First Out: who comes earlier leaves earlier
• LIFO - Last In First Out: who comes later leaves earlier
• RS - Random Service: the customer is selected randomly
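The three service disciplines listed above differ only in which waiting task is picked next. A minimal sketch, assuming all calls are already waiting when service starts (a real queue would interleave arrivals and departures); the function name and call labels are hypothetical:

```python
import random
from collections import deque

def service_order(arrivals, discipline, seed=0):
    """Return the order in which waiting items are served under a discipline."""
    waiting = deque(arrivals)      # leftmost item arrived first
    rng = random.Random(seed)      # seeded so the RS order is reproducible
    served = []
    while waiting:
        if discipline == "FIFO":
            served.append(waiting.popleft())   # earliest arrival first
        elif discipline == "LIFO":
            served.append(waiting.pop())       # latest arrival first
        elif discipline == "RS":
            i = rng.randrange(len(waiting))    # any waiting item, uniformly
            served.append(waiting[i])
            del waiting[i]
        else:
            raise ValueError(f"unknown discipline: {discipline}")
    return served

calls = ["call-1", "call-2", "call-3", "call-4"]
for discipline in ("FIFO", "LIFO", "RS"):
    print(discipline, service_order(calls, discipline))
```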
Erlang Call Center Algorithm
Estimate how many agents you
need in your call center for
each hour during an eight hour
How many taxis for a particular
time of day?
How many hospital beds? Fire
trucks? Paramedics? Police?
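The staffing question above is commonly answered with the Erlang C formula, which gives the probability that an arriving call must wait when a given number of agents serve a given offered load. The sketch below assumes Erlang C is the formula meant by the slide. The hourly call volumes, the 180-second handling time, and the target of 90% of calls answered within 10 seconds are hypothetical values chosen for illustration.

```python
import math

def erlang_c(agents, offered_load):
    """Probability that a call has to wait; offered_load is in Erlangs."""
    if agents <= offered_load:
        return 1.0  # more work arrives than can be served: every call waits
    top = (offered_load ** agents / math.factorial(agents)) * (
        agents / (agents - offered_load))
    bottom = sum(offered_load ** k / math.factorial(k) for k in range(agents)) + top
    return top / bottom

def service_level(agents, calls_per_hour, avg_handle_s, target_wait_s):
    """Fraction of calls answered within target_wait_s seconds."""
    load = calls_per_hour * avg_handle_s / 3600.0
    if agents <= load:
        return 0.0
    p_wait = erlang_c(agents, load)
    return 1.0 - p_wait * math.exp(-(agents - load) * target_wait_s / avg_handle_s)

def agents_needed(calls_per_hour, avg_handle_s, target_wait_s=10, target_level=0.9):
    """Smallest number of agents that meets the service-level target."""
    load = calls_per_hour * avg_handle_s / 3600.0
    agents = math.floor(load) + 1
    while service_level(agents, calls_per_hour, avg_handle_s, target_wait_s) < target_level:
        agents += 1
    return agents

# Hypothetical call volumes for each hour of an eight-hour shift.
hourly_calls = [12, 18, 30, 45, 60, 48, 25, 15]
for hour, volume in enumerate(hourly_calls, start=1):
    print(f"hour {hour}: {volume} calls -> {agents_needed(volume, 180)} agents")
```

The same calculation applies to taxis, beds, or response units once "calls" and "handling time" are replaced with the relevant demand and service durations, provided the independence assumptions from the queuing slide hold.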
Natural Language Processing
• Machine learning to determine semantic meaning
• Based on ontologies and probabilistic decisions
• “Understanding” of words, meanings, intents
• Better suited for structured, grouped or otherwise trained text such as
physician narratives or same language categorization
• Excels at spelling, grammar, and Named Entity Recognition that are relatively
• Well suited for classifying/parsing simple or common statements
• Generally “trained” by humans (expensive)
• Handling unstructured data, stemming, bag of words, TF-IDF, topic modeling.
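Two of the techniques named above, bag of words and TF-IDF, can be shown in a few lines. This is a minimal sketch using only the standard library, with no stemming or stop-word removal; the three call transcripts are invented examples, and the weighting used is the plain tf × log(N/df) variant (libraries often apply smoothing).

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and strip basic punctuation; no stemming is applied."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def tf_idf(docs):
    """Return one {word: weight} dict per document."""
    bags = [Counter(tokenize(d)) for d in docs]            # bag of words per doc
    n_docs = len(bags)
    doc_freq = Counter(word for bag in bags for word in bag)  # docs containing word
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return scores

# Invented transcripts, for illustration only.
calls = [
    "He is not breathing, please send help",
    "There was a shooting, please send help",
    "My chest hurts, please send help",
]
for text, weights in zip(calls, tf_idf(calls)):
    distinctive = sorted(w for w, s in weights.items() if s > 0)
    print(f"{text!r}: {distinctive}")
```

Words that appear in every transcript ("please", "send", "help") receive a weight of zero, so the remaining terms are the ones that distinguish one call from another.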
Machine Learning - Inductive
• Learns from the information itself
• Classifier accuracy is similar to that of human experts
• Common Algorithm Types
• K-nearest neighbors (KNN)
• Linear regression
• Logistic regression
• Naive Bayes
• Decision trees, bagged trees, boosted trees, boosted stumps
• Random Forests
• Neural networks
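As an illustration of learning "from the information itself", the first algorithm in the list above, k-nearest neighbors, fits in a few lines: a new case takes the majority label of its closest labeled examples. The features (patient age, prior calls this year), the labels, and every data point below are synthetic and carry no clinical meaning.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Synthetic examples: ((age, prior_calls_this_year), label)
train = [
    ((72, 1), "urgent"),
    ((80, 0), "urgent"),
    ((68, 2), "urgent"),
    ((25, 6), "routine"),
    ((31, 4), "routine"),
    ((19, 8), "routine"),
]

print(knn_predict(train, (75, 1)))
print(knn_predict(train, (22, 5)))
```

In practice the features would need scaling so that no single one dominates the distance, which is one of the trade-offs a comparison of supervised learning algorithms typically records.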
Comparing Supervised Learning Algorithms
e by you?
Handles lots of
KNN | Either | Yes | Yes | Lower | Fast | Minimal | No | No | No | Yes | No | Yes
Regression | Yes | Yes | Lower | Fast | Fast | Yes | No | No | N/A | Yes
Classification | Somewhat | Somewhat | Lower | Fast | Fast | Yes | No | No | Yes | Yes
Naive Bayes | Classification | Somewhat | Somewhat | Lower | Some for feature | Yes | Yes | No | No | Yes | No
Decision trees | Either | Somewhat | Somewhat | Lower | Fast | Fast | Some | No | No | Yes | Possibly | No | No
Either | A little | No | Higher | Slow | Moderate | Some | No | Yes (unless noise ratio is very high) | Yes | Possibly | No | No
AdaBoost | Either | A little | No | Higher | Slow | Fast | Some | No | Yes | Yes | Possibly | No | No
Either | No | No | Higher | Slow | Fast | Lots | No | Yes | Yes | Possibly | No | Yes
Support Vector Machine (SVM)
Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011).
Natural language processing: an introduction. Journal of the
American Medical Informatics Association : JAMIA, 18(5), 544–
• Very similar level of accuracy
• Will use similar attributes for
• May vary when categorical vs
• Primary difference is in efficiency
• Big-O Notation is a relative
representation of the complexity of
• It has been widely shown that random forests
are one of the most accurate existing
• It can deal with a huge number of features
• It runs efficiently on large datasets
• It can help estimate which variables are
important in classification
• It can be extended to an unsupervised version
to work with unlabeled data.
• It is relatively robust to noise
• They tend to overfit noisy data.
• Not as intuitive as some other classification
• Might take a while to build the forest (but once
it's built classification is very fast)
The Turing Test
• In 1950 Alan Turing wondered ‘Can computers think?’
• Proposed The Imitation Game
• Interrogator and two players, one human and one computer
• Based on typewritten responses the interrogator was to guess which
player was the computer
• He believed having adequate storage was the primary limiting factor
with speed being next
• Learning machine is like a child being taught
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
• Can an a priori algorithmic, inductive reasoning based approach be
• improve the speed of the decision making process during emergency call
taking and dispatching?
• improve the accuracy of the resource assignment for emergency call
Discussion – Present Considerations
• Flowchart/Tree: veracity of the reporting party, socio-economic and
demographic factors of the patient/victim, the capability of the
responding unit, the quality of services provided by the responding
individual, and the specificity of the dispatching algorithm itself are
not factored into the decision model.
Discussion – Future Considerations & Research
• Future research: develop an AI/ML-based approach.
• Obtain detailed 911 call and electronic Patient Care Records for approximately
five million patients where an outcome is identified.
• unfounded/no merit, patient treated but not transported, patient treated and
transported, and patient transferred to another responder.
• The clinical condition at the time of the outcome will be determined based on standard
paramedic coding practices.
• Data split by randomization to a training dataset and test dataset.
• A Random Forest model built from the training dataset, then applied to the test dataset
• Comparative statistics to evaluate the resource assignments, reduced
demand, and potential savings of the new model
• New knowledge model is a dynamic and real-time application
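The proposed split, train, and apply steps can be sketched end to end. This is a simplified illustration, not the study's method: the data are synthetic, the four features (age, prior calls, hour of day, priority level) and the rule that generates the labels are invented, only two of the four outcome categories are used, and the forest is a small pure-Python version (bootstrap samples, a random feature subset at each split, majority vote). A real analysis would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Lowest-impurity (score, feature, threshold) among candidates, or None."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or score < best[0]:
                    best = (score, f, t)
    return best

def build_tree(rows, labels, rng, depth=0, max_depth=4):
    """Grow one tree; each split considers a random subset of the features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: a class label
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority
    _, f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo], rng, depth + 1, max_depth),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi], rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Synthetic records: (age, prior_calls, hour_of_day, priority_level).
rng = random.Random(42)
rows = [(rng.randint(18, 95), rng.randint(0, 10), rng.randint(0, 23), rng.randint(1, 5))
        for _ in range(200)]
# Invented labeling rule, used only so the example has something to learn.
labels = ["treated_and_transported" if age >= 60 else "treated_not_transported"
          for age, *_ in rows]

# Randomized split into training (80%) and test (20%) datasets.
order = list(range(len(rows)))
rng.shuffle(order)
cut = int(0.8 * len(order))
train_idx, test_idx = order[:cut], order[cut:]

forest = build_forest([rows[i] for i in train_idx], [labels[i] for i in train_idx])
hits = sum(predict_forest(forest, rows[i]) == labels[i] for i in test_idx)
accuracy = hits / len(test_idx)
print(f"train={len(train_idx)} test={len(test_idx)} accuracy={accuracy:.2f}")
```

Holding the test dataset out of training is what makes the accuracy figure an estimate of performance on unseen calls rather than a measure of memorization.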