Augmenting Decisions of Taxi Drivers through
Reinforcement Learning for Improving Revenues
AAAI Association for the Advancement of Artificial Intelligence, 2017
Tanvi Verma, Pradeep Varakantham, Sarit Kraus, Hoong Chuin Lau
November 3, 2021
Presenter: Kyunghwan Mun
Contents
• Introduction
• Related Work
• Methodology
• Experiment
• Conclusion and Discussion
Introduction
• Taxis often roam around without a customer (cruising)
▪ It is important to reduce cruising time and increase revenue
• Be at the right “location” at the right “time”
• Reinforcement Learning (RL)
▪ Maximizing the long-term revenue
• Requires making a sequence of decisions
• Wait for 5 minutes
• The reinforcement (reward signal) is well defined
• Revenue earned from a customer
• Cost of travelling between locations
• Uncertain customer demand
• Reinforcement Learning captures uncertainty well
• The learning focus of RL can adapt to demand patterns
Introduction
• Contributions
▪ Annotation procedure for the trajectory data
▪ Monte Carlo Reinforcement Learning
▪ Iterative abstraction
▪ Evaluation method
• The average revenue earned by the learned policy >> the top 10 percentile revenue
• The agent's performance >> the top 1 percentile revenue (in some time intervals)
• Taxi utilization increases when employing the revenue maximization objective
Related Work
• Taxi Guidance
▪ Pick-up probability to recommend a driving route for profit maximization
▪ Cruising route to vacant taxis such that vacancy time is minimized
▪ Driver’s experience to find parking spots for a cruising taxi
▪ Taxi trajectories to learn traffic patterns and estimate travel time
▪ Locations for taxi drivers by constructing a spatio-temporal profitability map
• Surrounding regions of the driver
• Computing potential profit using historical data
▪ Considers long term revenue
▪ Any preferences with respect to areas are inherently captured
▪ Relies on past experiences
▪ Taxi trajectory data
Related Work
• Reinforcement Learning (RL)
▪ Model-based learning
• Transition probabilities
• Reward function to compute values of states
▪ Model-free learning
• When obtaining samples of experience from the dataset
• Temporal Difference method
• Monte Carlo method
• Estimate state-action values
Related Work
• Deep Reinforcement Learning (DeepRL)
▪ Ideal for environments where tens of millions of learning episodes are available
▪ Inappropriate to apply in the taxi case
• The number of features in the state space is too small
Methodology
• Taxi Dataset
▪ A major company in Singapore
▪ Each log entry of the data
• Latitude (GPS)
• Longitude (GPS)
• Taxi ID
• Driver ID
• Taxi Status
• Free (meter off, actively looking for the next passenger)
• Busy (not accepting bookings)
• POB (Passenger On Board)
• Off-line
Methodology
• Driver Activity Graphs - 1
▪ Cruising trajectory
• “Free” state → “Non-free” state (passenger on board, busy, break, off-line, on call etc.)
• Cruising trajectories of drivers from the dataset
• Annotating the trajectories with the decisions made
▪ Figure 1.
• Starts at A
• Terminates at E
• B, C, and D are intermediate decision coordinates
▪ Desired path
• The shortest path between A and E
• Evaluate whether the driver could have made the decision to go to D at A
→ If not, include C in the trajectory and repeat to obtain the final trajectory
Methodology
• Driver Activity Graphs – 2
▪ Convert each cruising trajectory into an activity graph
• A directed graph with decision coordinates as nodes
• Distance travelled between the coordinates
• Weight of the edge between them
• Terminating node of the activity graph
• Contains information about revenue earned
• Earned Revenue
• The fare of trip – The cost of travel for the trip
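A minimal sketch of how one cruising trajectory could be stored as an activity graph. The class name `ActivityGraph`, its fields, and the sample fare, distance, and per-km cost values are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ActivityGraph:
    """Directed path of decision coordinates for one cruising trajectory."""
    nodes: list = field(default_factory=list)  # decision coordinates, e.g. ["A", "C", "E"]
    edges: list = field(default_factory=list)  # (from_node, to_node, distance_km)
    trip_fare: float = 0.0                     # information kept on the terminating node
    trip_distance_km: float = 0.0

    def add_move(self, src, dst, distance_km):
        # The edge weight is the distance travelled between two decision coordinates.
        if not self.nodes:
            self.nodes.append(src)
        self.nodes.append(dst)
        self.edges.append((src, dst, distance_km))

    def earned_revenue(self, cost_per_km):
        # Earned revenue = fare of the trip - cost of travel for the trip.
        return self.trip_fare - cost_per_km * self.trip_distance_km


graph = ActivityGraph(trip_fare=20.0, trip_distance_km=10.0)
graph.add_move("A", "C", 1.5)
graph.add_move("C", "E", 2.0)
revenue = graph.earned_revenue(cost_per_km=0.5)
```

Keeping the fare on the graph rather than on an edge mirrors the slide's point that the terminating node carries the revenue information.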
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ State is given as follows:
• <day of week, zone, time interval>
• Divide the entire map of Singapore into several zones
• Time interval (0-6 hours, 6-9 hours, 9-12 hours, 12-17 hours, 17-20 hours, 20-24 hours)
• For n zones, n available actions (stay in the current zone / move to one of the remaining n-1 zones)
▪ Episodes
• “Non-free” state → “Free” state → “Non-free” state (termination)
• The cost of travel between nodes → a fixed cost per km applied to the weight of the edge
• Positive reward: the fare of the trip – the cost of travelling the trip
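The state definition and reward structure above can be sketched as follows. The function names and the convention of appending the trip reward as a final step are assumptions made for illustration; only the six time intervals and the fare-minus-cost reward come from the slide.

```python
# Time intervals from the slide, as [start_hour, end_hour) pairs.
TIME_INTERVALS = [(0, 6), (6, 9), (9, 12), (12, 17), (17, 20), (20, 24)]


def time_interval_index(hour):
    """Map an hour of day (0 <= hour < 24) to the index of its time interval."""
    for index, (start, end) in enumerate(TIME_INTERVALS):
        if start <= hour < end:
            return index
    raise ValueError(f"hour out of range: {hour}")


def make_state(day_of_week, zone, hour):
    """State tuple <day of week, zone, time interval>."""
    return (day_of_week, zone, time_interval_index(hour))


def episode_rewards(edge_distances_km, trip_fare, trip_distance_km, cost_per_km):
    """Rewards for one episode: a travel cost per cruising move, then the trip reward."""
    rewards = [-cost_per_km * distance for distance in edge_distances_km]
    rewards.append(trip_fare - cost_per_km * trip_distance_km)
    return rewards
```

For example, `make_state(2, 17, 8)` places 08:00 in the 6-9 hours interval, giving the state `(2, 17, 1)`.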
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ (Algorithm 1) Monte Carlo Estimation of Q Values
• Return (The cumulative reward accumulated till the end of the episode)
• 𝑄(𝑠, 𝑎) : The value of (𝑠, 𝑎) pair
• Variable “min-count” to avoid inaccurate estimated value
• 𝐶𝑜𝑢𝑛𝑡(𝑠, 𝑎) : The total number of training episodes in which (𝑠, 𝑎) was visited
• Policy 𝜋(𝑠) : mapping state 𝑠 to its optimal action
• 𝑆 : The set of states
• 𝐴 : The set of actions
• 𝑆_𝑙𝑒𝑎𝑟𝑛𝑒𝑑 : The set of states for which we could learn an optimal policy
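A rough sketch of Monte Carlo estimation of Q values using the quantities listed above. It assumes undiscounted returns and counts each (s, a) pair at most once per episode; the function name `monte_carlo_q` and the episode format are hypothetical, and the details of the paper's Algorithm 1 may differ.

```python
from collections import defaultdict


def monte_carlo_q(episodes, min_count):
    """episodes: list of episodes, each a list of (state, action, reward) steps.

    Returns (Q, policy, learned_states).
    """
    return_sum = defaultdict(float)
    count = defaultdict(int)

    for episode in episodes:
        # Return at step t = sum of rewards from t to the end of the episode.
        returns, running = [], 0.0
        for _, _, reward in reversed(episode):
            running += reward
            returns.append(running)
        returns.reverse()

        seen = set()
        for (state, action, _), episode_return in zip(episode, returns):
            if (state, action) in seen:
                continue  # count each pair once per episode
            seen.add((state, action))
            return_sum[(state, action)] += episode_return
            count[(state, action)] += 1

    # Keep only pairs visited in at least min_count episodes.
    q = {sa: return_sum[sa] / count[sa] for sa in count if count[sa] >= min_count}

    policy = {}
    for (state, action), value in q.items():
        if state not in policy or value > q[(state, policy[state])]:
            policy[state] = action
    return q, policy, set(policy)


episodes = [
    [("z1", "go_z2", -2.0), ("z2", "stay", 10.0)],
    [("z1", "go_z2", -2.0), ("z2", "stay", 6.0)],
    [("z1", "stay", 3.0)],
]
q, policy, learned = monte_carlo_q(episodes, min_count=2)
```

In this toy data, `("z1", "stay")` is seen in only one episode, so it is dropped by the min-count filter and the learned policy for `z1` is `go_z2`.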
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ Zone Structure
• Zones that are too big
→ Increase uncertainty in the outcome of actions
• Zones that are too small
→ Do not have sufficient training data to learn anything meaningful
→ It is important to balance uncertainty and granularity
▪ Method 1.) Static Zones
• Start with a large number of uniformly distributed zones
• Check how many relevant episodes are present in each zone
• If the number < 𝑚𝑖𝑛-𝑐𝑜𝑢𝑛𝑡, merge the zone with a neighboring zone so that the merged zone has sufficient data
(500 zones → 111 zones)
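One possible way to sketch the static zone merging step. The merge rule used here (fold a sparse zone into the neighboring group with the most episodes) and the names `merge_sparse_zones`, `episode_counts`, and `neighbors` are assumptions for illustration; the slide does not specify how the merge target is chosen.

```python
def merge_sparse_zones(episode_counts, neighbors, min_count):
    """Return a mapping zone -> label of the merged zone it belongs to."""
    parent = {zone: zone for zone in episode_counts}
    counts = dict(episode_counts)

    def find(zone):
        # Follow merge links to the representative of the merged zone.
        while parent[zone] != zone:
            zone = parent[zone]
        return zone

    # Visit the sparsest zones first.
    for zone in sorted(episode_counts, key=episode_counts.get):
        root = find(zone)
        if counts[root] >= min_count:
            continue
        candidates = {find(n) for n in neighbors.get(zone, [])} - {root}
        if not candidates:
            continue  # isolated zone: nothing to merge with
        target = max(candidates, key=lambda z: counts[z])
        parent[root] = target
        counts[target] += counts[root]

    return {zone: find(zone) for zone in episode_counts}


zone_map = merge_sparse_zones(
    episode_counts={"a": 2, "b": 50, "c": 3, "d": 40},
    neighbors={"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]},
    min_count=10,
)
```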
Methodology
• Reinforcement Learning (RL) for Taxi Driver
▪ Method 2.) Dynamic Zones
• Fix 𝑡𝑖𝑚𝑒-𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 and 𝑑𝑎𝑦-𝑜𝑓-𝑡ℎ𝑒-𝑤𝑒𝑒𝑘
→ Each zone maps to a unique state and a unique action
• Decide whether certain low-valued zones need to be split into smaller zones
• For split zones, learn Q-values for the new set of zones
• Check if certain zones can be split
• Decrease the uncertainty in the outcome of the optimal action
• If the smaller zones have adequate data and increase the overall value of the bigger zone
→ Split the larger zone into smaller zones
Methodology
• (Algorithm 2) Dynamic zoning
▪ Start with four large uniform zones
▪ Split the zones repeatedly until no further split is possible
• (Algorithm 3) WorthSplitting(z)
▪ Split the zones using K-Means Clustering
• Size of child zones > min-size
• $\max_a Q(s_1, a) + \max_a Q(s_2, a) > \max_a Q(s, a)$
• $\arg\max_a Q(s_1, a) \neq \arg\max_a Q(s, a)$ OR $\arg\max_a Q(s_2, a) \neq \arg\max_a Q(s, a)$
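The split test can be sketched as below, assuming all three listed conditions must hold together and that parent and child zones share comparable action labels. The function name `worth_splitting`, the per-zone dictionaries, and the notion of "size" as a plain number are hypothetical. The K-Means step that proposes the two child zones is not shown.

```python
def worth_splitting(q_parent, q_child1, q_child2, size_child1, size_child2, min_size):
    """Each q_* maps action -> Q-value for the state of that zone."""
    # Condition 1: both child zones must be larger than min-size.
    if size_child1 <= min_size or size_child2 <= min_size:
        return False

    best_parent = max(q_parent, key=q_parent.get)
    best_child1 = max(q_child1, key=q_child1.get)
    best_child2 = max(q_child2, key=q_child2.get)

    # Condition 2: the children's best values together exceed the parent's best value.
    value_improves = (
        q_child1[best_child1] + q_child2[best_child2] > q_parent[best_parent]
    )
    # Condition 3: at least one child prefers a different action than the parent.
    action_changes = best_child1 != best_parent or best_child2 != best_parent

    return value_improves and action_changes


q_parent = {"A": 5.0, "B": 3.0}
q_child1 = {"A": 4.0, "B": 2.0}
q_child2 = {"A": 1.0, "B": 3.0}
split = worth_splitting(q_parent, q_child1, q_child2, 10, 12, min_size=5)
```

Here the second child prefers action `B` while the parent prefers `A`, and 4.0 + 3.0 > 5.0, so the split is accepted.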
Experiments
• Evaluation Method
▪ Compare (a), (b) and (c)
• Average revenue earned by our learning agent … (a)
• The top percentile revenue of drivers … (b)
• Revenue earned by greedy heuristics typically employed by drivers during cruising … (c)
▪ Simulation of Agent Movements
• Assigning the available trips to the agent while considering competition from active drivers
• Trip data and trajectories of all active drivers during a given date and time-interval
• Finding the relevant available trips (non pre-booked trips) that originated from each state
• Revenue earned, duration and distance for each trip
• Assignment probability: $p_{assign}^{st} = \dfrac{\text{number of trips available}}{\text{number of cruising drivers present in the state at that time}}$
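A small sketch of how the assignment probability might be used in a simulation step. Capping the ratio at 1 and the handling of zero cruising drivers are assumptions added here so the value is always a valid probability; the function names are hypothetical.

```python
import random


def assignment_probability(trips_available, cruising_drivers):
    """Ratio of available trips to cruising drivers, capped at 1 (assumption)."""
    if cruising_drivers <= 0:
        # Assumption: with no competitors, any available trip goes to the agent.
        return 1.0 if trips_available > 0 else 0.0
    return min(1.0, trips_available / cruising_drivers)


def agent_gets_trip(trips_available, cruising_drivers, rng):
    """Sample whether the agent is assigned a trip in this state."""
    return rng.random() < assignment_probability(trips_available, cruising_drivers)


rng = random.Random(0)
p = assignment_probability(trips_available=3, cruising_drivers=12)
```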
Experiments
• Evaluation Method
▪ Driver revenue
• It is difficult to estimate the exact cruising distance of our agent
• Apply cost of travel per cruising minute
• Compute the time duration for which the driver was not hired in the time interval
• 𝐶𝑟𝑢𝑖𝑠𝑖𝑛𝑔-𝑐𝑜𝑠𝑡 per minute is applied for this duration
• Driver’s revenue in a time interval
= Fares of all the driver's trips in the time interval – Cost of travelling all trip distance – Cost of all cruising
▪ Heuristic strategy
• The remaining probability ($p_{stay} = 0.5$)
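The driver revenue formula on this slide can be written as a short function. The parameter names and the sample cost values are hypothetical; only the structure (fares minus trip travel cost minus per-minute cruising cost) follows the slide.

```python
def driver_revenue(trip_fares, trip_distances_km, cruising_minutes,
                   cost_per_km, cruising_cost_per_minute):
    """Revenue of one driver in one time interval."""
    fare_total = sum(trip_fares)
    trip_cost = cost_per_km * sum(trip_distances_km)
    # Cruising is charged per minute because the exact cruising distance is unknown.
    cruising_cost = cruising_cost_per_minute * cruising_minutes
    return fare_total - trip_cost - cruising_cost


example = driver_revenue(
    trip_fares=[20.0, 10.0],
    trip_distances_km=[8.0, 4.0],
    cruising_minutes=30,
    cost_per_km=0.5,
    cruising_cost_per_minute=0.25,
)
```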
Experiments
• Evaluation Method
▪ Agent revenue
• Compute agent’s revenue for each time-interval
• Initialize time with a start time of the interval
Experiments
• Experimental Results
▪ Evaluate dataset period : 1 month
▪ Average agent revenue VS Average of top percentile revenues earned by drivers
• Compare with top 10 percentile revenues
▪ Starting states of agent : Top 500 drivers in each time interval
• For a given time interval and day, the agent revenue is averaged over 500,000 executions
(500 different initial states × 1,000 executions)
Conclusion and Discussion
• Limitations & Requirements
▪ One single learning agent
→ Multiple learning agents
▪ Starting states of agent : Top 500 drivers in each time interval
→ Dynamic starting states with multiple learning agents
▪ Simple taxi states → Construct diverse taxi states (K trips, etc.)
▪ Construction of the time intervals
→ Based on historical data
▪ Condition under which the episode ends
→ End an episode once it exceeds a specific threshold
(e.g., K trips, cruising distance, waiting time, …) to reduce executions
Thank you
Any questions?

costume and set research powerpoint presentation
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Improving Revenues

  • 1. Augmenting Decisions of Taxi Drivers through Reinforcement Learning for Improving Revenues AAAI Association for the Advancement of Artificial Intelligence, 2017 Tanvi Verma, Pradeep Varakantham, Sarit Kraus, Hoong Chuin Lau November 3, 2021 Presenter: Kyunghwan Mun
  • 2. Contents • Introduction • Related Work • Methodology • Experiment • Conclusion and Discussion
  • 3. Introduction • Taxis roam around without a customer (cruising) ▪ It is important to reduce cruising time and increase revenue • Right “location” at the right “time” • Reinforcement Learning (RL) ▪ Maximizing the long-term revenue • Requires making a sequence of decisions • Wait for 5 minutes • The RL problem is well defined: • Revenue earned from a customer • Cost of travelling between locations • Uncertain customer demand • Reinforcement Learning captures uncertainty well • The learning focus of RL can adapt to demand patterns
  • 4. Introduction • Contributions ▪ Annotation procedure for the trajectory data ▪ Monte Carlo Reinforcement Learning ▪ Iterative abstraction ▪ Evaluation method: • The average revenue earned by the learned policy >> the top 10 percentile revenue • The agent performance >> the top 1 percentile revenue (in some time intervals) • Taxi utilization increases when employing the revenue maximization objective
  • 5. Related Work • Taxi Guidance ▪ Pick-up probability to recommend a driving route for profit maximization ▪ Cruising route to vacant taxis such that vacancy time is minimized ▪ Driver’s experience to find parking spots for a cruising taxi ▪ Taxi trajectories to learn traffic patterns and estimate travel time ▪ Locations for taxi drivers by constructing a spatio-temporal profitability map • Surrounding regions of the driver • Computing potential profit using historical data ▪ Considers long-term revenue ▪ Any preferences with respect to areas are inherently captured ▪ Relies on past experiences ▪ Taxi trajectory data
  • 6. Related Work • Reinforcement Learning (RL) ▪ Model-based learning • Transition probabilities • Reward function to compute values of states ▪ Model-free learning • When obtaining samples of experience from the dataset • Temporal Difference method • Monte Carlo method • Estimate state-action values
  • 7. Related Work • Deep Reinforcement Learning (DeepRL) ▪ Ideal for environments where tens of millions of learning episodes are available ▪ Not appropriate for the taxi case: • The number of features within the state space is too small
  • 8. Methodology • Taxi Dataset ▪ A major company in Singapore ▪ Each log entry of the data • Latitude (GPS) • Longitude (GPS) • Taxi ID • Driver ID • Taxi Status: • Free (meter off, actively looking for next passenger) • Busy (not accepting bookings) • POB (Passenger On Board) • Off-line
  • 9. Methodology • Driver Activity Graphs - 1 ▪ Cruising trajectory • “Free” state → “Non-free” state (passenger on board, busy, break, off-line, on call etc.) • Cruising trajectories of drivers from the dataset • Annotating the trajectories with the decisions made ▪ Figure 1. • Starts at A • Terminates at E • B, C, and D are intermediate decision coordinates ▪ Desired path • The shortest path between A and E • Evaluate whether the driver could have made the decision to go to D at A → If not, include C in the trajectory and repeat to obtain the final trajectory
  • 10. Methodology • Driver Activity Graphs - 2 ▪ Convert each cruising trajectory into an activity graph • A directed graph with decision coordinates as nodes • Distance travelled between the coordinates • Weight of the edge between them • Terminating node of the activity graph • Contains information about revenue earned • Earned revenue = fare of the trip - cost of travel for the trip
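The edge weights and the terminal revenue of an activity graph can be turned into a per-step reward sequence. The following is only a minimal sketch, assuming a single fixed `cost_per_km` applied to every edge and to the trip itself; the function name and all parameter names are hypothetical and do not come from the slides.

```python
from typing import List


def episode_rewards(
    edge_distances_km: List[float],
    trip_fare: float,
    trip_distance_km: float,
    cost_per_km: float,
) -> List[float]:
    """Turn one activity graph (a path of cruising edges) into step rewards.

    Assumption: every cruising edge costs cost_per_km times its weight (km),
    and the terminating node pays the trip fare minus the trip's travel cost.
    """
    # Cruising edges only cost money, so their rewards are negative.
    rewards = [-cost_per_km * d for d in edge_distances_km]
    # Terminal reward: earned revenue = fare - cost of travel for the trip.
    rewards.append(trip_fare - cost_per_km * trip_distance_km)
    return rewards
```

For example, two cruising edges of 2 km and 4 km followed by a 10 km trip with a fare of 20 and a cost of 0.5 per km give the rewards -1, -2 and 15.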
  • 11. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ State is given as follows: • <day of week, zone, time interval> • Divide the entire map of Singapore into several zones • Time interval (0-6 hours, 6-9 hours, 9-12 hours, 12-17 hours, 17-20 hours, 20-24 hours) • For n zones, n available actions (Stay in the current zone / Move to one of the remaining n-1 zones) ▪ Episodes • “Non-free” state → “Free” state → “Non-free” state (Termination) • The cost of travel between nodes → fixed cost per km applied to the weight of the edge • Positive reward: the fare of the trip - the cost to travel the trip
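The state tuple above can be sketched in a few lines. This is an illustrative sketch only: the interval boundaries are taken from the slide, while the integer zone id, the use of Python's `weekday()` numbering (Monday = 0) and the function names are assumptions.

```python
from datetime import datetime
from typing import Tuple

# Interval boundaries in hours, as listed on the slide.
TIME_INTERVALS = [(0, 6), (6, 9), (9, 12), (12, 17), (17, 20), (20, 24)]


def time_interval_index(hour: int) -> int:
    """Return the index of the time interval that contains the given hour."""
    for idx, (start, end) in enumerate(TIME_INTERVALS):
        if start <= hour < end:
            return idx
    raise ValueError(f"hour must be in [0, 24), got {hour}")


def make_state(timestamp: datetime, zone: int) -> Tuple[int, int, int]:
    """Build the <day of week, zone, time interval> state tuple.

    Assumption: zones are identified by an integer id and the day of week
    follows datetime.weekday() (Monday = 0).
    """
    return (timestamp.weekday(), zone, time_interval_index(timestamp.hour))
```

With n zones, the action set for any state is simply the n zone ids: choosing the current zone means staying, and any other id means moving there.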
  • 12. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ (Algorithm 1) Monte Carlo Estimation of Q Values • Return (the cumulative reward accumulated until the end of the episode) • $Q(s, a)$ : The value of the $(s, a)$ pair • Variable “min-count” to avoid inaccurate estimated values • $Count(s, a)$ : The total number of training episodes in which $(s, a)$ was visited • Policy $\pi(s)$ : mapping from state $s$ to its optimal action • $S$ : The set of states • $A$ : The set of actions • $S_{learned}$ : The set of states for which we could learn an optimal policy
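The quantities listed for Algorithm 1 can be illustrated with a small first-visit Monte Carlo estimator. This is a sketch under stated assumptions, not the authors' algorithm: returns are undiscounted, each episode is a list of hypothetical `(state, action, reward)` triples, and a pair is kept only if its count reaches `min_count`.

```python
from collections import defaultdict


def monte_carlo_q(episodes, min_count=2):
    """First-visit Monte Carlo estimate of Q(s, a) from logged episodes.

    Assumptions: undiscounted returns; an (s, a) pair is kept only when it
    was visited in at least min_count episodes.
    Returns (q, counts, policy); the keys of policy play the role of S_learned.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        # Walk backwards to accumulate the return from each step to the end.
        ret = 0.0
        returns = []
        for state, action, reward in reversed(episode):
            ret += reward
            returns.append((state, action, ret))
        # Walk forwards and count only the first visit of each pair.
        seen = set()
        for state, action, g in reversed(returns):
            if (state, action) in seen:
                continue
            seen.add((state, action))
            totals[(state, action)] += g
            counts[(state, action)] += 1
    # Discard pairs with too few samples to give a reliable estimate.
    q = {sa: totals[sa] / counts[sa] for sa in counts if counts[sa] >= min_count}
    # Greedy policy over the pairs that survived the min_count filter.
    policy = {}
    for (state, action), value in q.items():
        if state not in policy or value > q[(state, policy[state])]:
            policy[state] = action
    return q, dict(counts), policy
```

States that never accumulate enough samples for any action are left out of the policy, which mirrors the distinction between $S$ and $S_{learned}$.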
  • 13. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ Zone Structure • Zones that are too big → increased uncertainty in the outcome of actions • Zones that are too small → insufficient training data to learn something meaningful • It is important to balance uncertainty and granularity ▪ Method 1.) Static Zones • Start with a large number of uniformly distributed zones • Check how many relevant episodes are present in each zone • If the number < $min\text{-}count$, merge the zone so that the merged zone has sufficient data (500 zones → 111 zones)
  • 14. Methodology • Reinforcement Learning (RL) for Taxi Driver ▪ Method 2.) Dynamic Zones • Fix $time\text{-}interval$ and $day\text{-}of\text{-}the\text{-}week$ → each zone maps to a unique state and a unique action • Decide whether certain low-valued zones need to be split into smaller zones • For split zones, learn Q-values for the new set of zones • Check if certain zones can be split • Decrease the uncertainty in the outcome of the optimal action • If the smaller zones have adequate data and increase the overall value of the bigger zone → split the larger zone into smaller zones
  • 15. Methodology • (Algorithm 2) Dynamic zoning ▪ Start with four large uniform zones ▪ Split the zones repeatedly until further splits are not possible • (Algorithm 3) WorthSplitting(z) ▪ Split the zones using K-Means Clustering • Size of child zones > min-size • $\max_a Q(s_1, a) + \max_a Q(s_2, a) > \max_a Q(s, a)$ • $\arg\max_a Q(s_1, a) \neq \arg\max_a Q(s, a)$ OR $\arg\max_a Q(s_2, a) \neq \arg\max_a Q(s, a)$
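The three WorthSplitting conditions can be written as a small predicate. The slide does not state how the conditions are combined, so this sketch assumes all three must hold; the K-Means step is left out, and the function signature (per-zone dictionaries mapping action to Q-value, plus child sizes) is hypothetical.

```python
def worth_splitting(q_parent, q_child1, q_child2, size1, size2, min_size):
    """Decide whether a zone s should be split into child zones s1 and s2.

    Each q_* argument maps action -> Q-value for the corresponding state.
    Assumption: all three conditions from the slide must hold together.
    """
    # Condition 1: both child zones must be larger than min_size.
    if size1 <= min_size or size2 <= min_size:
        return False
    best_parent = max(q_parent, key=q_parent.get)
    best1 = max(q_child1, key=q_child1.get)
    best2 = max(q_child2, key=q_child2.get)
    # Condition 2: the children's best values together exceed the parent's.
    value_improves = q_child1[best1] + q_child2[best2] > q_parent[best_parent]
    # Condition 3: at least one child prefers a different action.
    action_changes = best1 != best_parent or best2 != best_parent
    return value_improves and action_changes
```

Condition 3 is what makes a split useful for the policy: if both children pick the same action as the parent, the finer zoning would not change any decision.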
  • 16. Experiments • Evaluation Method ▪ Compare (a), (b) and (c) • Average revenue earned by our learning agent … (a) • The top percentile revenue of drivers … (b) • Revenue earned by greedy heuristics typically employed by drivers during cruising … (c) ▪ Simulation of Agent Movements • Assigning the available trips to the agent while considering competition from active drivers • Trip data and trajectories of all active drivers during a given date and time-interval • Finding the relevant available trips (non pre-booked trips) that originated from each state • Revenue earned, duration and distance for each trip • Assignment probability: $p_{assign}^{s_t} = \frac{\text{number of trips available}}{\text{number of cruising drivers present in the state at that time}}$
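The assignment probability can be sketched as a ratio followed by a random draw. This is an assumption-laden illustration: clamping the ratio to 1.0 and the handling of a state with no cruising drivers are choices made here, not details given on the slide, and both function names are hypothetical.

```python
import random


def assignment_probability(trips_available: int, cruising_drivers: int) -> float:
    """Trips available divided by cruising drivers present in the state.

    Assumptions: the ratio is clamped to 1.0, and with no competing drivers
    the agent gets a trip whenever one is available.
    """
    if cruising_drivers <= 0:
        return 1.0 if trips_available > 0 else 0.0
    return min(1.0, trips_available / cruising_drivers)


def agent_gets_trip(trips_available: int, cruising_drivers: int,
                    rng: random.Random) -> bool:
    """Sample whether the simulated agent is assigned a trip in this state."""
    return rng.random() < assignment_probability(trips_available, cruising_drivers)
```

Passing an explicit `random.Random` instance keeps repeated simulation runs reproducible when a seed is fixed.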
  • 17. Experiments • Evaluation Method ▪ Driver revenue • It is difficult to estimate the exact cruising distance of our agent: • Apply cost of travel per cruising minute • Compute the time duration for which the driver was not hired in the time interval • $cruising\text{-}cost$ per minute is applied for this duration • Driver’s revenue in a time interval = revenue from all the trips of the driver in the time interval - cost of travelling all trip distance - cost of all cruising ▪ Heuristic strategy • The remaining probability ($p_{stay} = 0.5$)
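The driver revenue formula can be worked through in code. The sketch below assumes each trip is a hypothetical `(fare, distance_km, duration_min)` triple and that the unhired time is the interval length minus the total trip duration; the parameter names and the numbers in the example are illustrative only.

```python
def driver_revenue(trips, interval_minutes, cost_per_km, cruising_cost_per_minute):
    """Revenue of one driver in one time interval.

    trips: list of (fare, distance_km, duration_min) tuples (assumed layout).
    Revenue = trip fares - cost of all trip distance - cost of all cruising,
    where cruising is charged per minute the driver was not hired.
    """
    total_fare = sum(fare for fare, _, _ in trips)
    trip_cost = cost_per_km * sum(dist for _, dist, _ in trips)
    hired_minutes = sum(duration for _, _, duration in trips)
    # Time without a passenger is treated as cruising time.
    cruising_minutes = max(0.0, interval_minutes - hired_minutes)
    return total_fare - trip_cost - cruising_cost_per_minute * cruising_minutes
```

For two trips with fares 20 and 10, distances 10 km and 5 km, and durations 30 and 15 minutes in a 180-minute interval, a cost of 0.5 per km and 0.1 per cruising minute gives 30 - 7.5 - 13.5 = 9.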
  • 18. Experiments • Evaluation Method ▪ Agent revenue • Compute agent’s revenue for each time-interval • Initialize time with the start time of the interval
  • 19. Experiments • Experimental Results ▪ Evaluation dataset period: 1 month ▪ Average agent revenue vs. average of top percentile revenues earned by drivers • Compare with top 10 percentile revenues ▪ Starting states of agent: top 500 drivers in each time interval • For a given time interval and day, the agent revenue is averaged over 500,000 executions (500 different initial states × 1000 executions)
  • 20. Conclusion and Discussion • Limitations & Requirements ▪ One single learning agent → Multiple learning agents ▪ Starting states of agent: top 500 drivers in each time interval → Dynamic starting states with multiple learning agents ▪ Simple taxi states → Construct diverse taxi states (K trips, etc.) ▪ Construction of the time intervals → Based on historical data ▪ Condition that ends the episode → End the episode once a specific threshold is exceeded (e.g. K trips, cruising distance, waiting time, …) to reduce executions