SlideShare a Scribd company logo
1 of 20
Download to read offline
@alexpmil
Alex P. Miller
Love Thy K-Nearest Neighbors
Best Predictive Model:
Pedestrian Usage of Downtown
Mall using Wifi Data
APPLIED MACHINE
LEARNING CONFERENCE
OPEN DATA
CHALLENGE
@alexpmil
The Team
@alexpmil
The Team
@alexpmil
The Team
@alexpmil
The Team
Alex P. Miller
Ph.D. Student, Information Systems
I study:
● Recommendation systems
● A/B testing
● Algorithmic decision making
Big thanks to @BecomingDataSci (Data Science
Renee) for tweeting about the competition!
@alexpmil
The Data
@alexpmil
The Data
Calendar features:
Day of week, quarter, federal
holidays, etc.
Quite important given test period!
@alexpmil
The Data
Weather data:
Temperature, wind, humidity
(from Weather Underground API)
@alexpmil
The Data
Local events data:
http://downtowncharlottesville.net
Actually did have predictive power!
@alexpmil
The Data
UVA Men’s Basketball Schedule:
@alexpmil
The Data
UVA Men’s Basketball Schedule:
@alexpmil
The Data
UVA Men’s Basketball Schedule:
Home/away, ranked opponent, etc.
Not highly predictive, but still fun
@alexpmil
The Data
Calendar
Weather
Local events
UVA Basketball
60+ possible features
Hand-picked < 30 that I
thought would be most
meaningful
@alexpmil
The Model
Simple linear model with < 30 features
@alexpmil
The Model
Simple linear model with < 30 features
Custom loss function:
● MAPE objective with L2 regularization (tuned with 5-fold CV)
@alexpmil
The Tricks
● Very hard to
de-trend time-series
with only one year of
data!
● Test period included
Christmas
Training Period Test Period
@alexpmil
The Tricks
Training Period Test Period
● Solution:
exponentially
smoothed weighting
of time series +
Thanksgiving
@alexpmil
The Code
● Built my own linear model object, fit using numerical
optimization (MAPE has no analytical minimum, like OLS)
● Built my own cross validator that could accommodate a MAPE
model objective with L2 regularization
@alexpmil
The Lessons
● Use better data, not better models!
● Optimize directly for your evaluation criterion
● Avoid overfitting during exploratory phase
○ Have a holdout dataset!
● Above all, use common sense
○ When N is small, weight your data intelligently
○ Make sure your model passes sanity checks (e.g., low
traffic on Christmas)
@alexpmil
Thanks!
Will write blog post about
methodology… stay in touch!
alexmill@upenn.edu@alexpmil

More Related Content

More from Astraea, Inc.

Detecting Solar Farms Using Deep Learning
Detecting Solar Farms Using Deep LearningDetecting Solar Farms Using Deep Learning
Detecting Solar Farms Using Deep LearningAstraea, Inc.
 
2018 IEEE WIE Presentation - Dr. Kimberly Scott
2018 IEEE WIE Presentation - Dr. Kimberly Scott2018 IEEE WIE Presentation - Dr. Kimberly Scott
2018 IEEE WIE Presentation - Dr. Kimberly ScottAstraea, Inc.
 
2018 Charlottesville Open Data Challenge - Team DSB
2018 Charlottesville Open Data Challenge - Team DSB2018 Charlottesville Open Data Challenge - Team DSB
2018 Charlottesville Open Data Challenge - Team DSBAstraea, Inc.
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryAstraea, Inc.
 
RasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine LearningRasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine LearningAstraea, Inc.
 
Machine Learning, FOSS, and open data to map deforestation trends in the Braz...
Machine Learning, FOSS, and open data to map deforestation trends in the Braz...Machine Learning, FOSS, and open data to map deforestation trends in the Braz...
Machine Learning, FOSS, and open data to map deforestation trends in the Braz...Astraea, Inc.
 

More from Astraea, Inc. (6)

Detecting Solar Farms Using Deep Learning
Detecting Solar Farms Using Deep LearningDetecting Solar Farms Using Deep Learning
Detecting Solar Farms Using Deep Learning
 
2018 IEEE WIE Presentation - Dr. Kimberly Scott
2018 IEEE WIE Presentation - Dr. Kimberly Scott2018 IEEE WIE Presentation - Dr. Kimberly Scott
2018 IEEE WIE Presentation - Dr. Kimberly Scott
 
2018 Charlottesville Open Data Challenge - Team DSB
2018 Charlottesville Open Data Challenge - Team DSB2018 Charlottesville Open Data Challenge - Team DSB
2018 Charlottesville Open Data Challenge - Team DSB
 
Using Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite ImageryUsing Deep Learning to Derive 3D Cities from Satellite Imagery
Using Deep Learning to Derive 3D Cities from Satellite Imagery
 
RasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine LearningRasterFrames: Enabling Global-Scale Geospatial Machine Learning
RasterFrames: Enabling Global-Scale Geospatial Machine Learning
 
Machine Learning, FOSS, and open data to map deforestation trends in the Braz...
Machine Learning, FOSS, and open data to map deforestation trends in the Braz...Machine Learning, FOSS, and open data to map deforestation trends in the Braz...
Machine Learning, FOSS, and open data to map deforestation trends in the Braz...
 

Recently uploaded

How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...SOFTTECHHUB
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?Paolo Missier
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfalexjohnson7307
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfAnubhavMangla3
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdfMuhammad Subhan
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 

Recently uploaded (20)

How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 

2018 Charlottesville Open Data Challenge - Alex Miller