
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly change how drivers plan their day by alerting users before they travel, helping them find the best times to travel, and, over time, learning from new IoT data such as road conditions and incidents. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As part of this work, we conducted a case study over five large US metropolitan areas, covering 2.58 billion traffic records and 262 million weather records, to quantify the boost in traffic prediction accuracy from using weather data. We will provide an overview of our lambda architecture, in which Apache Spark builds prediction models from weather and traffic data and Spark Streaming scores the models to provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark for analyzing geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms used with the Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and work is under way to scale the system to provide predictions in over 100 cities. Attendees will learn about our experience scaling Spark in offline and streaming modes, building statistical and deep-learning pipelines with Spark, and techniques for working with geospatial and time-series data.

Published in: Data & Analytics

  1. 1. Ramya Raghavendra IBM Research rraghav@us.ibm.com IMPROVING TRAFFIC PREDICTION USING WEATHER DATA #EUent7
  2. 2. #EUent7 Pranita Dewan Joshua Rosenkranz Ramya Raghavendra Mudhakar Srivatsa About me • PhD, CS from UC Santa Barbara • Researcher at IBM TJ Watson
  3. 3. Machine Learning Process Business Understanding • Challenge • Why it is important • Why it is hard Data Collection • Traffic • Weather • Archival • Real-time Data preprocessing • Cleaning • Joins • Spark time series library Traffic modeling • ARIMA • Random forest • LSTM #EUent7
  4. 4. Machine Learning Process Business Understanding • Challenge • Why it is important • Why it is hard Data Collection • Traffic • Weather • Archival • Real-time Data preprocessing • Cleaning • Joins • Spark time series library Traffic modeling • ARIMA • Random forest • LSTM #EUent7
  5. 5. The Challenge • With 36.2 billion trucking hours wasted because of traffic congestion, and the average citizen losing nearly $800 per year in wasted fuel and time, we need to PREDICT traffic to increase efficiency. • What time should I leave tomorrow to get to Newark the quickest? • With snow expected in the morning, what time do I need to leave to get to work by 8:00? • What should I tell my morning viewers about their evening commute today? • Predictive Traffic Demo • [Background graphic: UBI (Usage Based Insurance) panel showing driver speed, speed limit, reference speed, weather condition, temperature reading, and congestion index] Driver behavior data is only valid in the context of what is also happening on the road. Limited analysis can lead to inaccurate assessments and impact retention. More data, and driver-relevant data, will lead to greater understanding of behavior and associated risk. #EUent7
  6. 6. Why It’s Important • TWC TRAFFIC SURVEY • How often respondents check traffic: several times/day 22%, once/day 32%, 2-3 times/week 13%, <2 times/week 6%, never 12% • 54% CHECK TRAFFIC DAILY • [Chart: “Before I leave” vs. “As I'm driving” for six kinds of traffic information; category labels are truncated in the source: Drive times …, Drive times for …, Best routes for …, Best routes to get …, How weather is …, Maps showing …; plotted values: 62%, 59%, 63%, 62%, 68%, 63% and 31%, 28%, 26%, 26%, 29%, 37%] • 2:1 PEOPLE WANT TRAFFIC DATA BEFORE THEY LEAVE #EUent7
  7. 7. The Challenge – No Easy Task • We historically know general traffic patterns, but many variables can significantly change expectations. Weather is one of the primary variables. • So what did we do? – Selected 5 unique cities in different US geographies – Analyzed 1 year of both traffic and weather data – Built a cognitive model that predicts traffic flows from 15 minutes to 24 hours into the future • 2.58 billion traffic records in the five cities studied • 262 million weather records in the 1-year study • Weekday vs. weekend, morning commute vs. evening commute • Results tabulated on bad weather days, where impacts matter the most #EUent7
  8. 8. Machine Learning Process Business Understanding • Challenge • Why it is important • Why it is hard Data Collection • Traffic • Weather • Archival • Real-time Data preprocessing • Cleaning • Joins • Spark time series library Traffic modeling • ARIMA • Random forest • LSTM #EUent7
  9. 9. Weather Data • History on Demand – Weather features accessed via lat/lon or bounding box – Hourly historical information from July 2011 • Enhanced Forecast – Forecasts at 4 km resolution every 15 minutes • https://business.weather.com/products/weather-data-packages #EUent7
  10. 10. Traffic Data • Traffic, road and incident data – 300M sources – 8M kilometers of road • Real-time traffic flow information for all functional road classifications • eXtreme Definition segments (XD) – 100-350 m long – traffic updated every 5 minutes #EUent7
  11. 11. Setup • Data Sources: Traffic (historical); Weather (historical + predicted); Incident Reports (police, construction, traffic cam, tweets) • Machine Learning Models: First Order Models (ARIMA/BATS); Second Order Models (Spatial Correlation, Causality); Higher Order Models (Random forest, LSTM) • Analytics Platform: Apache Spark [1] and Spark Streaming for training and scoring, with HDFS/Cassandra for storage • [1] Apache Spark extensions to handle time series and geospatial data #EUent7
  12. 12. Spark-TimeSeries: Library for Distributed Time Series Analytics on Apache Spark • Scale out: Single JVM (Streams); Horizontal (ShortTSRDD); Longitudinal (LongTSRDD) • Data types: fully templated; Integers, Doubles, Strings etc.; full support for geo locations • Windowing: record based; time based; activity based • Runtime support: periodic, aperiodic, hybrid; aligned/unaligned timeseries • Multivariate analysis: temporal joins; record-based join • Languages: Scala; Java; Python* #EUent7
  13. 13. Class: Features/Models
      Runtime support
      • Runtime datatypes: Java streams; Short timeseries RDD (horizontal partitioning); Long timeseries RDD (longitudinal partitioning); Timeseries Partitioner
      • Runtime timeseries transforms: Map/Transform; Segmentation (record, time, burst, regression); Temporal Join; Interpolation (linear, cubic-spline); Forecast; Filter/slice
      Algorithms
      • Unsupervised/Semi-supervised learning: Similar sequence detection (Damerau-Levenshtein, Dynamic Time Warping); Semi-supervised clustering (motif-based); Timeseries clustering (k-means, k-shape); Subsequence mining (frequent, discriminatory, timeseries motifs); Automatic model selection (Autoforecaster), Grid-search (for H-W), Hannan-Rissanen, Yule-Walker
      • Math: Kalman Filter, convolution/deconvolution, autocorrelation, cross-correlation, FFT, DCT
      • Statistical tests: Ljung-Box test, Augmented Dickey-Fuller test, Granger Causality
      • Seasonal + Trend Modeling, Non-Linear: Holt-Winters Additive, Holt-Winters Multiplicative, Segmented Models, Seasonal-Trend Decomposition, Multi-Seasonality, BATS (Box-Cox, ARMA Error)
      • Linear Modeling: ARIMA / ARMA, Linear Regression, Ridge Regression, Moving Averaging
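The temporal join listed among the runtime transforms can be illustrated with a small sketch. This is plain Python, not the Spark-TimeSeries API, and the function name `temporal_join` and the sample records are hypothetical. It assumes an "as-of" join: each traffic record (updated every few minutes) is paired with the most recent weather record (reported hourly) at or before its timestamp.

```python
from bisect import bisect_right


def temporal_join(traffic, weather):
    """Pair each traffic record with the latest weather record at or before it.

    traffic: list of (timestamp_seconds, speed)
    weather: list of (timestamp_seconds, condition), sorted by timestamp
    Returns a list of (timestamp, speed, condition or None).
    """
    weather_times = [ts for ts, _ in weather]
    joined = []
    for ts, speed in traffic:
        # Index of the last weather record whose timestamp is <= ts.
        idx = bisect_right(weather_times, ts) - 1
        condition = weather[idx][1] if idx >= 0 else None
        joined.append((ts, speed, condition))
    return joined


# Hypothetical data: traffic samples around the hour mark, hourly weather.
traffic = [(3300, 61.0), (3600, 58.5), (3900, 42.0), (7200, 35.5)]
weather = [(0, "clear"), (3600, "snow"), (7200, "heavy snow")]
joined = temporal_join(traffic, weather)
```

A distributed implementation would partition both series by time or by road segment before joining; the binary search here only shows the matching rule.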
  14. 14. Machine Learning Process Business Understanding • Challenge • Why it is important • Why it is hard Data Collection • Traffic • Weather • Archival • Real-time Data preprocessing • Cleaning • Joins • Spark time series library Traffic modeling • ARIMA • Random forest • LSTM #EUent7
  15. 15. ARIMA Based Model • ARIMA (autoregressive integrated moving average) – Used for time-series forecasting • Use ARIMA to predict future speeds per road segment based on previously observed values • Can model hour-of-day and day-of-week patterns • Cannot handle non-periodic “incidents” • Parameters – p: number of autoregressive terms – d: number of non-seasonal differences needed for stationarity – q: number of lagged forecast errors in the prediction equation • 75% accuracy • Time: ~3 mins (linear scaleout with TSRDD) • [Charts: ARIMA prediction example; prediction errors over a 24-hour window; prediction error tail] #EUent7
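To make the p, d, q parameters on the ARIMA slide concrete, here is a minimal sketch of the simplest non-trivial case, ARIMA(1,1,0): one difference (d = 1), one autoregressive term (p = 1), and no moving-average terms (q = 0). It is a pure-Python illustration and not the model used in the talk; the function names are hypothetical, and the AR coefficient is fitted by least squares without an intercept, which is an assumption made to keep the example short.

```python
def difference(series):
    """First-order differencing: the d = 1 step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares AR(1) coefficient (no intercept): the p = 1 step."""
    num = sum(cur * prev for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den if den else 0.0


def forecast_arima_110(speeds, steps):
    """Forecast `steps` future values of one road segment's speed series."""
    if len(speeds) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(speeds)
    phi = fit_ar1(diffs)
    level, last_diff = speeds[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predict the next difference
        level += last_diff            # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical speed readings for a single road segment.
forecast = forecast_arima_110([0.0, 1.0, 1.5, 1.75], steps=1)
```

Capturing the hour-of-day and day-of-week patterns mentioned on the slide would need seasonal terms, which this sketch leaves out.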
  16. 16. Random Forest Based Model • Congestion on a road segment affects connected road segments • Per-road-segment regression tree for prediction • Regression tree features: – Current speeds on the road segment – Current speeds on “connected” road segments – Predicted weather on the road segment • Connected road segment extraction methodologies: → Spatial radius → Correlation → Causality • Accuracy: 89% with weather, 82% without weather • Time: 6-8 mins (linear scaleout with TSRDD) #EUent7
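The feature list on the random forest slide can be sketched as follows. This covers only the feature construction, using the correlation-based way of finding "connected" segments; the 0.8 threshold, the segment ids, and the weather encoding are all hypothetical, and the resulting vector would still have to be passed to a regression-tree or random-forest learner.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def connected_segments(target, history, threshold=0.8):
    """Segments whose speed history correlates strongly with the target's."""
    return sorted(
        seg for seg, speeds in history.items()
        if seg != target and pearson(history[target], speeds) >= threshold
    )


def feature_vector(target, history, predicted_weather, threshold=0.8):
    """Current target speed + current neighbor speeds + predicted weather."""
    neighbors = connected_segments(target, history, threshold)
    return (
        [history[target][-1]]
        + [history[seg][-1] for seg in neighbors]
        + list(predicted_weather)
    )


# Hypothetical speed histories (mph) for three segments; the last value is "current".
history = {
    "A": [60, 55, 40, 30],
    "B": [58, 54, 42, 33],
    "C": [30, 45, 50, 62],
}
# Hypothetical weather encoding: [snow flag, temperature in Celsius].
features = feature_vector("A", history, predicted_weather=[1.0, -2.0])
```

Spatial radius or causality tests, the other two methods named on the slide, would replace `connected_segments` without changing the shape of the feature vector.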
  17. 17. LSTM + Node Embedding as Feature Vector • Create a node embedding • Concatenate the node embedding with the time series data • Node embeddings allow the model to learn the spatial components of the graph, while the time series data incorporates the temporal components • [Diagram: Vu + training per node] #EUent7
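The concatenation step on the LSTM slide can be shown without any deep-learning library. In this sketch the embedding values, window length, and function name are hypothetical; it only builds the input sequences, where every time step carries the observed speed followed by the node's (fixed) embedding, which is one common way to combine the two and not necessarily the exact layout used in the talk.

```python
def build_lstm_inputs(speeds, embedding, window):
    """Build sliding-window sequences of [speed] + embedding feature vectors.

    speeds: list of speed observations for one node (road segment)
    embedding: list of floats, the node's embedding vector
    window: number of time steps per sequence
    Returns a list of sequences shaped (window, 1 + len(embedding)).
    """
    sequences = []
    for start in range(len(speeds) - window + 1):
        sequence = [
            [speeds[t]] + list(embedding)  # temporal part + spatial part
            for t in range(start, start + window)
        ]
        sequences.append(sequence)
    return sequences


# Hypothetical node with four speed readings and a 2-dimensional embedding.
inputs = build_lstm_inputs([50, 48, 45, 40], embedding=[0.1, -0.3], window=3)
```

Each element of `inputs` could then be fed to a recurrent model as one training sequence for that node.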
  18. 18. Architecture • Inputs: Traffic and Weather, combined with temporal & spatial joins • Offline: CSV, Parquet, and JSON files in HDFS; Spark trains the models; one model per city and per prediction time horizon; updated every three months; no raw data is stored • Online: CSV and JSON (15 min per-city updates) arrive through Kafka and are scored by Spark Streaming, which receives model updates from the offline path; one Kafka and one Spark Streaming job per city; predictions over multiple time horizons are stored against the edge id key in Redis; the REST API only accesses Redis #EUent7
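The serving pattern on the architecture slide, where predictions for several horizons are stored under one edge id key and the REST layer only reads that store, can be sketched as below. The `PredictionStore` class is an in-memory stand-in for Redis so the example runs without a server; the key format `edge:<id>`, the JSON payload, and the function names are assumptions, not details from the talk.

```python
import json


class PredictionStore:
    """In-memory stand-in for the key-value store (Redis in the slides)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def write_predictions(store, edge_id, predictions_by_horizon):
    """Streaming side: store all horizons for one edge under a single key."""
    store.set(f"edge:{edge_id}", json.dumps(predictions_by_horizon))


def read_prediction(store, edge_id, horizon_minutes):
    """REST side: look up one horizon; never touches raw data or models."""
    raw = store.get(f"edge:{edge_id}")
    if raw is None:
        return None
    # JSON object keys are strings, so the horizon is looked up as a string.
    return json.loads(raw).get(str(horizon_minutes))


store = PredictionStore()
# Hypothetical predicted speeds (mph) at 15-minute and 60-minute horizons.
write_predictions(store, "xd-1001", {15: 42.5, 60: 38.0})
speed_in_15 = read_prediction(store, "xd-1001", 15)
```

Keeping the read path limited to one key lookup is what lets the REST API stay independent of the Spark jobs.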
  19. 19. The Results: Significant Improvements in Accuracy in All Geographies Modeled
      Percentage reduction in prediction error
      City          | Total | Morning rush hour | Evening rush hour
      Chicago       | 34.4% | 16.9%             | 41.5%
      Houston       | 30.6% | 19.3%             | 17.9%
      Philadelphia  | 24.7% | 9.5%              | 19.5%
      Atlanta       | 15.1% | 3.3%              | 2.19%
      Portland      | 23.0% | 15.3%             | 23.8%
      #EUent7
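The slides do not say how "percentage reduction in prediction error" was computed. One common definition, assumed here, is the relative drop in error when weather features are added. The sketch applies it to the two accuracy figures quoted on the random forest slide (82% without weather, 89% with weather), treating error as 100% minus accuracy; the result is an illustration of the formula and not a number reported in the talk.

```python
def error_reduction_pct(accuracy_without, accuracy_with):
    """Relative reduction in error, in percent, under the assumed definition.

    Error is taken to be 100 - accuracy (an assumption, not from the slides).
    """
    error_without = 100.0 - accuracy_without
    error_with = 100.0 - accuracy_with
    return (error_without - error_with) / error_without * 100.0


# Accuracy figures from the random forest slide: 82% without, 89% with weather.
reduction = error_reduction_pct(82.0, 89.0)
print(f"{reduction:.1f}% reduction in error")
```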
  20. 20. Commuting gets better with Predictive Traffic • Predictive Traffic will significantly impact how drivers plan their day. We will… – Alert users, before they travel, that their journey may take longer than normal. – Deliver intelligent mobile tools to find the best times to travel, if at all. • Over time, Predictive Traffic gets smarter by learning from new IoT data: road conditions, local traffic behavior, weather sensors, incidents, user-generated feedback, traffic cameras, etc. #EUent7
  21. 21. Open source details #EUent7 https://ibm.github.io/ https://www.ibm.com/developerworks
