This document summarizes MOVI, a model-free approach to dynamic fleet management that optimizes taxi dispatching to reduce passenger wait times, and compares it to a baseline receding horizon control (RHC) approach. MOVI uses a deep Q-network trained with the double DQN algorithm to learn dispatch policies in a distributed, model-free manner. Evaluation on real taxi data shows that MOVI reduces rejection rates and wait times compared to RHC, and its faster computation makes it more practical for real-time dispatching. Future work includes handling partial observability and exploring other reinforcement learning frameworks.
INFOCOM 2018 Talk: MOVI
1. MOVI: A Model-Free Approach
to Dynamic Fleet Management
Takuma Oda and Carlee Joe-Wong
Carnegie Mellon University
IEEE INFOCOM, 19 April 2018, Honolulu, HI
2. Vehicle Dispatch Problem
Optimization of taxi dispatch/cruising
Reduce passengers' waiting time
Increase drivers' revenue
3. Challenges
Real-time (on-demand) operation
Complexity: large state space, demand uncertainty, coordination of a large-scale fleet
Prior work: centralized, model-based approach
F. Miao et al., "Taxi Dispatch With Real-Time Sensing Data in Metropolitan Areas: A Receding Horizon Control Approach," IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, pp. 463–478, Apr. 2016.
Limited modeling of vehicle dynamics
Computationally intractable for real-time applications
Our work: a distributed, model-free approach
5. Assumptions
All rides are requested with an app
Vehicle state information is available in real time
Requests are rejected if no vehicle is available within a fixed range, e.g., 5 km
6. Approach
Baseline: Receding Horizon Control (RHC)
Our approach: Deep Q-network (DQN)

Policy           RHC                          DQN
Formulation      Deterministic optimization   Reinforcement learning
Coordination     Centralized                  Distributed
Model            Model-based                  Model-free
Discretization   Taxi zone                    Grid
7. RHC Approach
Action: the number of vehicles to send to each region in each time slot
Reward: weighted sum of unserved requests and idle cruising cost
Transition model: leftover vehicles + vehicles sent (net dispatch) + taxis dropping off passengers
8. DQN Approach
Action: where each taxi should go in the next timeslot
Reward model: weighted sum of pickup reward and idle cruising cost
Optimal action-value function: the maximum expected return achievable by any policy
Loss function: squared error between a Bellman-backup target value and the current Q estimate
11. DQN Architecture
Fully convolutional network with auxiliary inputs
Inputs: demand and supply heat maps
Outputs: Q-values for each possible move
12. DQN Training
Algorithm: Double DQN with experience replay
Exploration: epsilon-greedy with an activation rate
[Training curves: average loss and average max Q vs. training step]
13. Performance Comparison Over a Week

                   Reject Rate   Wait Time     Idle Cruising Time
Relative to NO     reduced 76%   reduced 34%   increased by 1.3%
Relative to RHC    reduced 20%   reduced 12%   increased by 4%
14. Discussion
DQN outperforms RHC due to real-time dispatch decisions
DQN forward pass: < 100 ms; RHC computation: a few seconds
DQN is more beneficial for drivers
DQN predicts the best action for each individual vehicle
More realistic to implement in the real world
[Figure: average and minimum vehicle utilization rates]
15. Conclusion
Contribution
Demonstrated the benefits of applying a model-free, distributed solution to a large-scale taxi dispatch problem
Future Work
Partially observable environments
Other reinforcement learning frameworks
21. Q-learning
V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
Q-learning algorithm with function approximation:
1. Take some action a_t and observe the transition (s_t, a_t, r_t, s_{t+1})
2. Set target values y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; θ)
3. Perform a gradient descent step on (y_t − Q(s_t, a_t; θ))²
22. Problem Definition
[Architecture diagram: vehicle states and past demands feed the Dispatch Center; a data pre-processing module produces features F_t; a demand prediction module outputs X_{t:t+T} and W_{t:t+T}; the RHC/DQN policy engine computes dispatch decisions a_t; a vehicle/passenger matching module handles incoming requests. Other diagram labels: w_{t-1}, I_t.]
In traditional taxi networks, individual drivers look for passengers hailing on the street, relying on their experience and knowledge.
But this can be inefficient if drivers do not know future demand and are not coordinated.
For instance, suppose there are two vacant taxis on the streets that cruise or are dispatched to the same regions, while customers request rides at other locations. In this case, the dispatch decisions were optimal for neither customers nor drivers: one of the drivers has to spend a lot of time cruising.
Modern ride-hailing fleet networks such as Uber and Lyft can track vehicles' GPS locations and passengers' pickup locations in real time.
This data can be used to predict future passenger demand and vehicle mobility patterns, which enables proactive dispatch of vehicles to predicted future pickup locations.
In this way, optimizing taxi dispatch can reduce passengers' waiting time for a ride and increase drivers' revenue.
There are several challenges in this problem.
For an on-demand ride-hailing application, it needs to be solved in real time.
However, the large state space, uncertain customer demand, and the need for coordination across a large-scale fleet make it difficult to solve efficiently.
Most previous work on fleet management addresses this problem with a model-based approach.
A model-based approach first models vehicle dynamics and interactions with passengers, and then optimally solves the dispatch problem given these models.
Although model-based approaches can improve performance, modeling the complex dynamics of a fleet network is inherently limited, and solving the problem for a large-scale fleet in real time tends to be computationally intractable.
In this work, we propose a model-free, distributed approach to tackle these challenges.
The contributions of this work are to:
Design and evaluate a distributed, model-free approach to the taxi dispatch problem
Compare the model-free, distributed approach with a model-based, centralized approach
Demonstrate the effectiveness of the new approach in a realistic simulated environment
Let me define the problem more precisely.
We assume there are an environment and an agent. The environment consists of vehicles and passengers with a mobile app.
The agent takes an action by dispatching, that is, by sending a vacant taxi to another location.
The agent observes each vehicle's location and availability status and all passengers' pickup requests. Using this real-time information, the agent determines proactive dispatching for vacant vehicles.
Since we focus on optimizing proactive dispatching, we incorporate the matching algorithm between passengers and available vehicles into the environment.
The agent's goal is to optimize sequential dispatch decisions so as to maximize the accumulated reward.
We assume that all rides are requested with a mobile app, so the agent gets pickup and drop-off locations in real time.
Vehicle state information is available in real time, including location, occupancy status, and destination.
Requests are rejected if no vehicle is available within a fixed range; we use 5 km in our experiments.
We use the Receding Horizon Control (RHC) approach as our baseline policy.
It is a centralized, model-based approach formulated as a deterministic optimization problem.
For our approach, we present a distributed, model-free method using a popular reinforcement learning framework, the Deep Q-Network (DQN).
The action variable for the baseline is the number of vehicles to send to each region in each time slot, denoted by u_t.
We wish to choose u_t to maximize the reward, defined as a weighted sum of the number of rejected requests and the vehicles' idle cruising time.
The number of vehicles in the next time slot is computed by the transition model sketched below.
The first term corresponds to the leftover vehicles after pickups.
The second term is the net number of vehicles dispatched to the region.
The last two terms represent occupied vehicles dropping off passengers within time slot t+1.
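Since the slide's equation did not survive extraction, here is a hedged reconstruction of the transition model from the description above; the symbol names are ours, not the slide's:

```latex
x_{t+1,i}
  = \underbrace{\bigl(x_{t,i} - p_{t,i}\bigr)}_{\text{leftover vehicles}}
  + \underbrace{\textstyle\sum_{j} u_{t,ji} - \sum_{j} u_{t,ij}}_{\text{net vehicles sent}}
  + \underbrace{d_{t+1,i}}_{\text{drop-offs}}
```

Here x_{t,i} is the number of idle vehicles in region i at time t, p_{t,i} the served pickups, u_{t,ij} the number of vehicles dispatched from region i to region j, and d_{t+1,i} the occupied vehicles dropping off passengers in region i during slot t+1.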
Assuming the future demand is known, we can find the optimal dispatch actions that maximize the accumulated reward over a horizon of T steps.
At every time step, we solve the RHC problem to determine the next T actions, but execute only the current action.
The first constraint ensures that the total number of vehicles dispatched from the i-th region does not exceed the number of idle vehicles there.
The second constraint ensures that we do not dispatch vehicles to regions with travel times that exceed d_t, so that all dispatch movements complete within a time interval.
For simplicity, we assume that the u_{t,ij} are continuous variables; we can then solve the optimization problem efficiently with linear programming methods, as sketched below.
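As a sketch of the resulting program, written in our own notation (the slide's exact constraints are not fully recoverable), the receding-horizon problem solved at each step t has roughly this shape:

```latex
\max_{u_{\tau,ij} \,\ge\, 0}\;
  \sum_{\tau=t}^{t+T}
    \Bigl( -\, w_1 \cdot \text{(unserved requests at } \tau\text{)}
           \;-\; w_2 \cdot \text{(idle cruising cost at } \tau\text{)} \Bigr)
\quad \text{s.t.} \quad
  \sum_{j} u_{\tau,ij} \le x_{\tau,i} \;\; \forall i, \tau,
\qquad
  u_{\tau,ij} = 0 \;\text{ if the travel time from } i \text{ to } j \text{ exceeds one slot}
```

With the u_{τ,ij} relaxed to continuous values, the objective and constraints are all linear, so an off-the-shelf LP solver applies; only the first action u_t is executed before re-solving at t+1.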
The action variable is where each taxi should go in the next timeslot.
Similar to the baseline, we express the reward function for each vehicle as a weighted sum of a pickup reward and an idle cruising cost.
We would like to learn the optimal action-value function, which is defined as the maximum expected return achievable by any policy.
Since the state space is huge, we use a neural network as a function approximator for Q.
For the loss function we use the mean squared error, with a target value computed by a Bellman backup of the current estimate.
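In the standard DQN formulation of Mnih et al. (cited later in this deck), which matches this description, these quantities are:

```latex
Q^{*}(s,a) = \max_{\pi}\, \mathbb{E}\Bigl[\,\textstyle\sum_{k \ge 0} \gamma^{k} r_{t+k} \,\Big|\, s_t = s,\, a_t = a,\, \pi \Bigr]
\qquad
L(\theta) = \mathbb{E}\bigl[\,(y - Q(s,a;\theta))^{2}\,\bigr],
\quad
y = r + \gamma \max_{a'} Q(s',a';\theta^{-})
```

where θ⁻ denotes the parameters of a periodically updated target network and y is the Bellman-backup target value.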
To evaluate the RHC and DQN policies, we designed and implemented MOVI as a taxi fleet simulator.
This diagram shows the MOVI architecture.
A fleet object simulates the states of all vehicles.
In every time step, MOVI generates ride requests based on real trip records and matches each request to a vehicle with a nearest-neighbor algorithm.
Next, the agent observes the current state of the environment, which includes vehicle and request information.
The agent then computes actions using either the RHC or the DQN policy and sends dispatch orders to idle vehicles.
For each dispatch order, MOVI creates an estimated trajectory to the dispatched location by computing the shortest path in the OSM road network graph.
Finally, all vehicles update their states according to their matching and dispatch assignments.
The dispatch policy is a separate module that does not affect the other simulator modules, so we can compare different dispatch policies under the same settings.
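To make the loop concrete, here is a minimal Python sketch of one simulator step; the object names (fleet, policy, and their methods) are our assumptions, not MOVI's actual API:

```python
import networkx as nx  # the OSM road network as a graph

def simulate_step(fleet, requests, policy, road_graph):
    """One MOVI-style time step: match, observe, dispatch, update."""
    # 1. Match each ride request to the nearest available vehicle.
    for req in requests:
        vehicle = fleet.nearest_available(req.pickup_location, max_km=5.0)
        if vehicle is None:
            req.reject()              # no vehicle within the fixed range
        else:
            vehicle.assign(req)
    # 2. The agent observes vehicles and requests, then picks actions.
    state = fleet.observe()
    dispatch_orders = policy.dispatch(state)   # RHC or DQN policy
    # 3. Route each dispatched idle vehicle along the shortest path.
    for vehicle, destination in dispatch_orders:
        route = nx.shortest_path(road_graph, vehicle.node, destination,
                                 weight="travel_time")
        vehicle.follow(route)
    # 4. Advance all vehicle states by one time step.
    fleet.update()
```

The policy object is the only component that differs between runs, mirroring the modular design described above.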
We used NYC taxi trip records for the experiments
This map shows the regions in our experiment and their geographical demand pattern.
The area is roughly 40 km x 40 km.
We trained the DQN and the other machine learning models with one month of data and evaluated the metrics with one week of data.
The temporal demand patterns of the two periods are roughly similar.
We use a fully convolutional neural network with a 15 x 15 output map.
Each grid cell corresponds to the Q-value of a possible move from the center location.
The fully convolutional design enables faster learning and inference due to the absence of fully connected layers.
Inputs: the state of the environment.
As input features, we use demand and supply heat maps surrounding the agent vehicle as the environment state. This makes the input size independent of the service area.
The larger the input heat maps, the more distant demand the agents can see when making decisions, but the more computationally intensive inference becomes.
To keep the input size small, we also use smoothed heat maps so that agents can easily see information from farther away.
Another key design choice is incorporating other agents' destinations into the input. This mitigates the non-stationarity of the environment because an agent can learn its optimal action conditioned on the other agents' current actions.
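As a concrete illustration, here is a minimal PyTorch sketch of such a fully convolutional Q-network; the channel count and layer sizes are our assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class DispatchFCN(nn.Module):
    """Fully convolutional Q-network sketch: heat maps in, Q-value map out."""

    def __init__(self, in_channels: int = 4):  # demand/supply map channels (assumed)
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # one Q-value per spatial cell
        )
        # Reduce whatever spatial size the input heat maps have to the
        # 15 x 15 map of candidate moves around the vehicle.
        self.head = nn.AdaptiveAvgPool2d((15, 15))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) -> (batch, 15, 15) Q-value map
        return self.head(self.body(x)).squeeze(1)
```

Because there are no fully connected layers, the parameter count stays small and inference remains fast regardless of the input map size.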
We trained the DQN with the double DQN algorithm and experience replay.
The network weights and replay memory are shared among all agents.
We customized the epsilon-greedy exploration method by adding an activation rate, which controls the probability of moving versus staying. We found that this contributes to stable and faster training.
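Below is a minimal sketch of the double-DQN target and the activation-rate exploration described here; the tensor shapes and helper names are assumptions, not the paper's code:

```python
import random
import torch

def double_dqn_target(reward, next_state, online_net, target_net, gamma=0.99):
    """Double DQN: the online net picks the action, the target net scores it."""
    with torch.no_grad():
        q_online = online_net(next_state).flatten(1)        # (batch, 15*15)
        best_action = q_online.argmax(dim=1, keepdim=True)  # online net's choice
        q_target = target_net(next_state).flatten(1)
        return reward + gamma * q_target.gather(1, best_action).squeeze(1)

def select_action(q_map, epsilon=0.1, activation_rate=0.8):
    """Epsilon-greedy over moves, gated by an activation rate for move-vs-stay."""
    if random.random() > activation_rate:
        return None                                  # stay: no dispatch this step
    if random.random() < epsilon:
        return random.randrange(q_map.numel())       # explore: random move
    return int(q_map.argmax())                       # exploit: best move
```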
These graphs show the training curves of the average loss and the average max Q during training.
The max Q value starts decreasing after it reaches 100. This can be explained by environmental changes due to increased competition among agents; it also indicates that coordination emerges in a distributed manner.
We ran simulations with the DQN policy, the RHC policy, and a no-dispatch policy, and calculated three metrics for each day of the week: reject rate, passenger wait time, and idle cruising time.
All simulations were run with a 1-minute time step and 8,000 vehicles.
On every day of the week, the DQN policy significantly reduces the reject rate and the wait time compared to no dispatching, while the idle cruising time stays almost the same.
Compared with RHC, DQN reduces the reject rate by 20% and the wait time by 12%.
Although the DQN policy does not make coordinated decisions for idle vehicles, our results show that DQN performs better than RHC.
We attribute this to DQN's faster, distributed dispatch decisions: a DQN forward pass takes less than 100 ms, while the RHC computation takes a few seconds and grows with the number of regions.
To investigate the effect of the on-demand, distributed nature of DQN, we simulated a "batch" version of the DQN policy. The results, plotted as DQN*, show that the batch DQN policy performs almost the same as RHC. This indicates that the faster, on-demand computation of DQN contributes to its rapid adaptation to the environment state.
Another interesting feature of the DQN policy is that it is more beneficial for drivers because it predicts the best action for each individual vehicle. The figure shows the average and minimum utilization rates over all vehicles: DQN achieves a better minimum utilization rate than RHC. Thus, the DQN policy may be more realistic to implement in real-world applications.
The utilization rate is strongly related to driver revenue.
Let me conclude our work.
Regarding our contributions:
There are several possible extensions of our work.