Large-Scale Nonparametric Estimation of Vehicle Travel Time Distributions

Data
Model and Fitting
Experimental Results


Large-Scale Nonparametric Estimation
of Vehicle Travel Time Distributions
Rikiya Takahashi, Takayuki Osogami,
and Tetsuro Morimura
{rikiya,osogami,tetsuro}@jp.ibm.com

IBM Research - Tokyo



Route recommendation and traffic simulation
Which route (e.g. A or B) does a car driver choose?
Route recommendation: which route should you select?
Traffic simulation: which route do you select?

Dijkstra's algorithm for minimizing expected travel-time is inflexible because of
Risk unawareness: variability of travel-time is not considered.
Unrealistic homogeneity: everyone takes the same route.



Example of risk-sensitive route choice: ICTE
Instead of the mean, evaluate the Iterated Conditional Tail
Expectation (ICTE) (Osogami, 2011) of travel-time.
Quantiles of the travel-time distribution are utilized.
The value q of CTE_q can differ among drivers.
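
To illustrate why the tail matters, here is a minimal sketch of a single-link empirical CTE_q, taken here to mean the average of the worst (1 - q) fraction of travel-time samples. It is not the iterated version of Osogami (2011); the function name and the two sample routes are hypothetical.

```python
import math

def empirical_cte(samples, q):
    """Mean of the largest (1 - q) fraction of the samples (assumed CTE_q)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of tail samples; the small tolerance guards against float
    # round-off, and at least one sample is always kept.
    k = max(1, math.ceil((1.0 - q) * n - 1e-9))
    tail = ordered[n - k:]
    return sum(tail) / k

# Hypothetical travel-times in minutes for two candidate routes.
route_a = [10, 10, 11, 11, 12, 12, 13, 13, 14, 30]  # faster on average, risky
route_b = [13, 13, 14, 14, 14, 14, 15, 15, 15, 15]  # slower on average, stable

mean_a, mean_b = empirical_cte(route_a, 0.0), empirical_cte(route_b, 0.0)
tail_a, tail_b = empirical_cte(route_a, 0.8), empirical_cte(route_b, 0.8)
```

With q = 0 the score reduces to the mean and route A looks better; with q = 0.8 only the worst 20% of trips count and route B looks better, so drivers with different q can reasonably choose different routes.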




Agenda
What we need: the probability density function (p.d.f.) of
travel-time for every link of a road network.
Main proposal: a data-mining algorithm for interpolating the
p.d.f. for every link.

1. Summary of real data
2. Model and how to fit it
3. Experimental prediction performance




Our road network and travel-time samples

We have a road network and a probe-car dataset:
1.2M intersections and 3.3M links in the Greater Tokyo Area.
3.1M travel-time samples from a total of 58,584 taxis.
Data are sparse, especially in suburban and rural regions.

Figure: Heatmaps based on the total number of travel-time
samples in 24 hours for each link. The green, yellow, and red points
are located on the links that have at least 1, 10, and 100 samples,
respectively.


Distribution of relative travel-time
Histogram of the relative travel-time y
y = (actual travel-time) / (travel-time at the legal speed limit)
Modes of P(y) lie roughly between 0 and 2.
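
As a small worked example of the ratio above, the sketch below computes y for one traversal. The argument names and units (metres, km/h, seconds) are assumptions for illustration; the slides do not state how the dataset stores link length or speed limit.

```python
def relative_travel_time(actual_seconds, link_length_m, speed_limit_kmh):
    """Actual travel-time divided by the travel-time at the legal speed limit."""
    speed_limit_ms = speed_limit_kmh * 1000.0 / 3600.0   # km/h -> m/s
    standard_seconds = link_length_m / speed_limit_ms
    return actual_seconds / standard_seconds

# A hypothetical 500 m link with a 60 km/h limit takes 30 s at the limit,
# so a 45 s traversal has a relative travel-time of about 1.5.
y = relative_travel_time(45.0, 500.0, 60.0)
```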

Figure: Histograms of the relative travel-time, x = (actual time)/(standard time) on a 0 to 10 axis, for Link ID='1049171' and Link ID='1539993' in hourly time slots between 8:00 and 22:59. Each panel is built from 31 to 103 samples.




Issues we must solve
Scalability: the road network and the set of travel-time
samples are both large.
Data sparseness: travel-time samples are limited or missing on
suburban links.
Non-Gaussianity: the distribution of travel-time is not Gaussian.
Multi-modality or heavy tails can occur.
Least-squares (L2-loss) regression is inflexible.
Assumption used to address these: connected links have similar
distributions of vehicle velocities, with the similarity depending on
the number of hops between them.




Conditional density estimator of relative travel-time
Conditional p.d.f. of the relative travel-time y on a link e:
\[
f_e(y) = \frac{\lambda_0 \phi_0(y) + \sum_{i=1}^{m} \lambda_i K(e, e_{\pi[i]}) \phi_i(y)}{\lambda_0 + \sum_{i=1}^{m} \lambda_i K(e, e_{\pi[i]})}
\]
$\{e_{\pi[1]}, \dots, e_{\pi[m]}\}$: subset of the link set $E$
$\Phi = \{\phi_0, \phi_1, \dots, \phi_m\}$: set of basis density functions
$K(\cdot, \cdot)$: similarity function between links
$\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m)^T$: vector of link importance
The link-independent terms $\lambda_0$ and $\phi_0(\cdot)$ are introduced for
handling the case $\forall i \in \{1, \dots, m\}$, $K(e, e_{\pi[i]}) \equiv 0$.
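
A minimal sketch of how this estimator can be evaluated for one link, assuming the similarities K(e, e_π[i]) for that link are already available as a list. The gamma bases, weights, and similarity values below are made up for illustration.

```python
import math

def gamma_pdf(y, shape, scale):
    """Density of a gamma distribution at y (zero for y <= 0)."""
    if y <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def conditional_density(y, lam0, phi0, lams, sims, phis):
    """f_e(y): similarity-weighted average of the basis densities.

    sims[i] plays the role of K(e, e_pi[i]) for the link e of interest.
    """
    numerator = lam0 * phi0(y)
    denominator = lam0
    for lam, sim, phi in zip(lams, sims, phis):
        numerator += lam * sim * phi(y)
        denominator += lam * sim
    return numerator / denominator

# Hypothetical bases and importance weights.
phi0 = lambda y: gamma_pdf(y, 2.0, 0.6)
phis = [lambda y: gamma_pdf(y, 4.0, 0.25), lambda y: gamma_pdf(y, 9.0, 0.3)]
lams = [1.5, 0.5]

density_near = conditional_density(1.0, 0.1, phi0, lams, [0.8, 0.1], phis)
density_isolated = conditional_density(1.0, 0.1, phi0, lams, [0.0, 0.0], phis)
```

When every similarity is zero, the estimate falls back to the link-independent density ϕ0, which is the case the λ0 and ϕ0 terms are there to handle.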



3 steps in estimating the parameters
\[
f_e(y) = \frac{\lambda_0 \phi_0(y) + \sum_{i=1}^{m} \lambda_i K(e, e_{\pi[i]}) \phi_i(y)}{\lambda_0 + \sum_{i=1}^{m} \lambda_i K(e, e_{\pi[i]})}
\]
A) Basis functions $\Phi = \{\phi_0, \phi_1, \dots, \phi_m\}$: mixtures of gamma or
log-normal distributions, using convex clustering.
B) Link similarity $K(\cdot, \cdot)$: sparse diffusion kernel on a
link-connectivity graph.
C) Link importance $\lambda = (\lambda_0, \lambda_1, \dots, \lambda_m)^T$: Kullback-Leibler
Importance Estimation Procedure (KLIEP)
(Sugiyama et al., 2008).
Stability of fitting: each component can be fitted with either
convex optimization or simple matrix multiplication.



A) Fitting nonparametric basis density functions
Each basis is a mixture of at most L gamma or log-normal distributions:
\[
\phi_i(y) = \sum_{\ell=1}^{L} \theta_{i\ell} \psi_\ell(y)
\]
Optimize the mixture weights as
\[
\max_{\theta_i} \sum_{y \in Y_i} \log\left[ \sum_{\ell=1}^{L} \theta_{i\ell} \psi_\ell(y) \right].
\]
Figure: Sliding windows for fitting $\psi_1, \dots, \psi_L$
Convex w.r.t. $\theta_i = (\theta_{i1}, \dots, \theta_{iL})^T$
Fast convergence with Sequential Minimal Optimization (SMO)
(Takahashi, 2011)
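
A minimal sketch of the weight optimization with fixed components, assuming the weights are constrained to be non-negative and to sum to one. It swaps in a plain EM update for the weights rather than the SMO solver cited above; the log-normal components and the samples are hypothetical.

```python
import math

def lognormal_pdf(y, mu, sigma):
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(theta, dens):
    return sum(math.log(sum(t * d for t, d in zip(theta, row))) for row in dens)

def fit_mixture_weights(samples, components, iterations=200):
    """EM updates for mixture weights only; the components stay fixed."""
    L = len(components)
    # dens[n][l] = psi_l(y_n), computed once because the psi_l never change.
    dens = [[psi(y) for psi in components] for y in samples]
    theta = [1.0 / L] * L
    for _ in range(iterations):
        new = [0.0] * L
        for row in dens:
            mix = sum(t * d for t, d in zip(theta, row))
            for l in range(L):
                new[l] += theta[l] * row[l] / mix   # responsibility of component l
        theta = [v / len(samples) for v in new]
    return theta, dens

# Hypothetical fixed components centred at y = 1, 3 and 6.
components = [lambda y, m=m: lognormal_pdf(y, math.log(m), 0.2) for m in (1.0, 3.0, 6.0)]
samples = [0.8, 0.9, 1.0, 1.1, 1.0, 0.95, 3.0, 3.2]
theta, dens = fit_mixture_weights(samples, components)
```

Because only the weights are optimized, the objective is concave in them, so the update cannot get stuck in a poor local optimum; components that explain no data end up with weights near zero.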




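B) Link connectivity graph and its Laplacian
Adjacency matrix $A = (a_{ij})$ for links $e_i = (u_i, v_i)$ and $e_j = (u_j, v_j)$:
\[
a_{ij} =
\begin{cases}
\dfrac{1}{2} + \dfrac{\Delta(e_i)^T \Delta(e_j)}{2 \|\Delta(e_i)\| \|\Delta(e_j)\|} & \text{if } u_i = v_j \text{ or } v_i = u_j \\
0 & \text{otherwise}
\end{cases}
\]
Figure: Values of $\{a_{ij}\}$ when the wide arrow represents $e_i$.

$\Delta(e) = x_v - x_u$ for $e = (u, v)$, where $x_u, x_v \in \mathbb{R}^2$ are locations
$D = \mathrm{diag}\left( \sum_{j=1}^{|E|} a_{1j}, \dots, \sum_{j=1}^{|E|} a_{|E|j} \right)$
$H = D^{-1/2} (A - D) D^{-1/2}$: negative normalized Laplacian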
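
A minimal sketch of building A and H for a toy network, assuming the adjacency value is one half plus half the cosine between the two link direction vectors (so a straight continuation scores 1 and a right-angle turn scores 0.5). Dense lists are used only for readability, and the node coordinates and links are hypothetical.

```python
import math

def direction(link, xy):
    u, v = link
    return (xy[v][0] - xy[u][0], xy[v][1] - xy[u][1])

def adjacency(links, xy):
    n = len(links)
    A = [[0.0] * n for _ in range(n)]
    for i, (ui, vi) in enumerate(links):
        di = direction(links[i], xy)
        for j, (uj, vj) in enumerate(links):
            # Links are adjacent when the head of one is the tail of the other.
            if i != j and (ui == vj or vi == uj):
                dj = direction(links[j], xy)
                cos = (di[0] * dj[0] + di[1] * dj[1]) / (math.hypot(*di) * math.hypot(*dj))
                A[i][j] = 0.5 + 0.5 * cos
    return A

def negative_normalized_laplacian(A):
    n = len(A)
    d = [sum(row) for row in A]
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: rows of isolated links (zero degree) are left at zero.
            if d[i] > 0.0 and d[j] > 0.0:
                diag = d[i] if i == j else 0.0
                H[i][j] = (A[i][j] - diag) / math.sqrt(d[i] * d[j])
    return H

xy = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (1.0, 1.0)}
links = [(0, 1), (1, 2), (1, 3)]   # e0 continues straight into e1, turns into e2
A = adjacency(links, xy)
H = negative_normalized_laplacian(A)
```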



B) Sparse diffusion kernel as link similarity
The diffusion kernel $\exp(\beta H)$ (Kondor and Lafferty, 2002)
is dense and computationally infeasible, while $H$ is sparse.
Assume that traffic does not diffuse broadly in a short time.
Then $\beta$ is small and an approximate kernel matrix is
\[
K(\beta, p) = \left( I + \frac{\beta}{p} H \right)^{p} = \sum_{q=0}^{p} \frac{p!\, \beta^q}{q!\, (p-q)!\, p^q} H^q,
\]
where $p$ is a resolution hyperparameter in discretization.
The $(i, j)$-th element of the matrix $K(\beta, p)$ gives the
similarity value between the edges $e_i$ and $e_j$.
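
A minimal sketch of the approximate kernel on a toy three-link chain, computed both as a p-fold product and through the binomial expansion. Dense lists stand in for the sparse matrices a real implementation would need at this scale, and the matrix H below is a hypothetical example.

```python
from math import comb, sqrt

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def approx_diffusion_kernel(H, beta, p):
    """K(beta, p) = (I + (beta / p) H)^p by repeated multiplication."""
    n = len(H)
    step = [[(1.0 if i == j else 0.0) + beta / p * H[i][j] for j in range(n)]
            for i in range(n)]
    K = identity(n)
    for _ in range(p):
        K = matmul(K, step)
    return K

def binomial_expansion(H, beta, p):
    """The same kernel as sum_q C(p, q) (beta / p)^q H^q."""
    n = len(H)
    K = [[0.0] * n for _ in range(n)]
    power = identity(n)                      # H^0
    for q in range(p + 1):
        coef = comb(p, q) * (beta / p) ** q
        for i in range(n):
            for j in range(n):
                K[i][j] += coef * power[i][j]
        power = matmul(power, H)
    return K

# Negative normalized Laplacian of a chain of three links (hypothetical).
c = 1.0 / sqrt(2.0)
H = [[-1.0, c, 0.0], [c, -1.0, c], [0.0, c, -1.0]]
K8 = approx_diffusion_kernel(H, 1.0, 8)
K1 = approx_diffusion_kernel(H, 1.0, 1)
```

With p = 1 the two end links, which are two hops apart, still have similarity zero, while p = 8 lets similarity reach them. More generally, entries for links more than p hops apart stay zero, which is what keeps the kernel sparse.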




C) Optimize the link importance with SMO
The vector of link importance $\lambda$ is optimized with KLIEP as
\[
\max_{\lambda} \sum_{e \in E_+} \sum_{y \in Y[e]} \log\left[ \lambda_0 \phi_0(y) + \sum_{i=1}^{m} \lambda_i K(e, e_{\pi[i]}) \phi_i(y) \right]
\]
\[
\text{s.t.} \quad \sum_{e \in E_+} \sum_{y \in Y[e]} \left[ \lambda_0 + \sum_{i=1}^{m} \lambda_i K(e, e_{\pi[i]}) \right] = n.
\]
Convex optimization
The objective is equivalent to that of convex clustering, under a
variable transformation
Can also be accelerated with SMO
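
One way to see the equivalence is sketched below. The sample index t, the shorthand k_{ti} and s_i, and the assumptions that every λ_i is non-negative, every s_i is positive, and the convex-clustering weights live on the probability simplex are introduced here for illustration and are not stated on the slide.

```latex
% Index the n samples by t, with link e_t and relative travel-time y_t.
% Shorthand: k_{t0} = 1, \quad k_{ti} = K(e_t, e_{\pi[i]}), \quad s_i = \sum_{t=1}^{n} k_{ti}.
\begin{align*}
\text{Constraint:}\quad & \sum_{t=1}^{n} \sum_{i=0}^{m} \lambda_i k_{ti}
   = \sum_{i=0}^{m} \lambda_i s_i = n . \\
\text{Substitute:}\quad & w_i = \frac{\lambda_i s_i}{n}
   \;\Longrightarrow\; \sum_{i=0}^{m} w_i = 1, \quad w_i \ge 0 . \\
\text{Objective:}\quad & \sum_{t=1}^{n} \log \sum_{i=0}^{m} \lambda_i k_{ti} \phi_i(y_t)
   = \sum_{t=1}^{n} \log \sum_{i=0}^{m} w_i \tilde{\psi}_i(t),
   \qquad \tilde{\psi}_i(t) = \frac{n\, k_{ti}\, \phi_i(y_t)}{s_i} .
\end{align*}
```

In the variables w the problem has the same form as the mixture-weight problem of step A, a sum of logarithms of a weighted sum maximized over the simplex, so the same solver can be reused.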



Experimental setting
10-fold likelihood cross-validation to evaluate predictive
performance.
Performance is evaluated independently for each of the 24 hourly
datasets.
Hyperparameters are also chosen by validation.
L = 100 and p = 8 (fixed)
r ∈ {1, 1.5, 2, · · · , 3} and β ∈ {1, 2, 3, 4, 5}.

Compared with parametric regression methods that assume a
single log-normal distribution.
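
A minimal sketch of likelihood cross-validation, using a single log-normal fitted by maximum likelihood as the model under test. The fold assignment, the seed, and the synthetic samples are assumptions for illustration and do not reproduce the experiments.

```python
import math
import random

def fit_lognormal(samples):
    """Maximum-likelihood mu and sigma of a single log-normal."""
    logs = [math.log(y) for y in samples]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(max(var, 1e-12))   # floor avoids a zero sigma

def lognormal_logpdf(y, mu, sigma):
    z = (math.log(y) - mu) / sigma
    return -0.5 * z * z - math.log(y * sigma * math.sqrt(2.0 * math.pi))

def cv_log_likelihood(samples, folds=10, seed=0):
    """Average held-out log-likelihood over a k-fold split."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(folds):
        test = idx[f::folds]                 # every folds-th shuffled index
        held_out = set(test)
        train = [samples[i] for i in idx if i not in held_out]
        mu, sigma = fit_lognormal(train)
        total += sum(lognormal_logpdf(samples[i], mu, sigma) for i in test)
    return total / len(samples)

# Synthetic relative travel-times (hypothetical data).
rng = random.Random(1)
samples = [rng.lognormvariate(0.0, 0.5) for _ in range(200)]
score = cv_log_likelihood(samples)
```

Each sample is scored exactly once by a model that never saw it, so the average is comparable across methods; a higher value means better predictive density.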




Time-dependent size of the data
Table: The numbers, N, of travel-time samples, and the numbers,
|E+|, of links that have at least one sample, for each time slot.

hour    N        |E+|      hour    N        |E+|
0:00-   273,168  69,126    12:00-  129,148  41,569
1:00-   185,567  53,018    13:00-  133,987  40,083
2:00-   109,662  38,994    14:00-  128,288  37,594
3:00-    49,821  25,620    15:00-  130,971  36,980
4:00-    22,501  15,484    16:00-  134,056  37,794
5:00-    24,433  16,189    17:00-  174,748  43,074
6:00-    23,868  16,579    18:00-  196,978  45,676
7:00-    62,753  30,025    19:00-  162,816  41,468
8:00-   149,906  47,400    20:00-  149,438  42,592
9:00-   154,597  47,067    21:00-  169,125  47,856
10:00-  131,383  42,445    22:00-  169,956  49,328
11:00-  111,664  37,080    23:00-  165,835  47,297


Experimental predictive performance
The nonparametric CDEs perform best on all of the datasets.
Figure: Average test-set log-likelihood for each hourly dataset,
based on the 10-fold likelihood cross-validations. Horizontal axis: hour
(index of the dataset); vertical axis: avg. test-set log-likelihood.
Methods compared: Euclid-kNN, Nadaraya-Watson, CDE(Gamma),
CDE(LogNormal), CDE(MixGamma), CDE(MixLogNormal).



Links having complex distributions
$g_e(y)$: single exponential-family approximation of $f_e(y)$
based on moment matching
Cauchy-Schwarz (CS) divergence (Príncipe, 2010)
\[
\mathrm{CS}(f, g \mid e) = -\log \frac{\int_y f_e(y)\, g_e(y)\, dy}{\sqrt{\int_y f_e^2(y)\, dy \int_y g_e^2(y)\, dy}}.
\]
Figure: Links having top-1% highest CS divergence scores (panels: 12:00-, 18:00-, 0:00-).
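
A minimal sketch of the CS divergence computed by numerical integration, assuming f is a two-component log-normal mixture and g is a single log-normal matched on the mean and variance of log y. The matching rule, the grid, and all parameter values are illustrative choices, not taken from the slides.

```python
import math

def lognormal_pdf(y, mu, sigma):
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def cs_divergence(f, g, lo=1e-3, hi=20.0, steps=20000):
    """-log( int f g / sqrt(int f^2 * int g^2) ) with the midpoint rule."""
    dy = (hi - lo) / steps
    fg = ff = gg = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        fy, gy = f(y), g(y)
        fg += fy * gy * dy
        ff += fy * fy * dy
        gg += gy * gy * dy
    return -math.log(fg / math.sqrt(ff * gg))

# Hypothetical bimodal density: modes near y = 1 and y = 4.
weights, mus, sigmas = [0.6, 0.4], [0.0, math.log(4.0)], [0.25, 0.25]

def f(y):
    return sum(w * lognormal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))

# Single log-normal matching E[log y] and E[(log y)^2] of the mixture.
mean_log = sum(w * m for w, m in zip(weights, mus))
second_log = sum(w * (s * s + m * m) for w, m, s in zip(weights, mus, sigmas))
sigma_g = math.sqrt(second_log - mean_log ** 2)

def g(y):
    return lognormal_pdf(y, mean_log, sigma_g)

cs_fg = cs_divergence(f, g)
cs_ff = cs_divergence(f, f)
```

The divergence is zero when the two densities coincide and grows as the single-distribution approximation misses structure such as a second mode, which is why a high score flags links with complex distributions.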



Conclusion and future directions
A novel nonparametric estimator of travel-time
distributions conditioned on the link of a road network.
A) Basis density functions as mixtures of gamma or
log-normal distributions
B) Sparse diffusion kernel as link similarity
C) Optimizing link importance with KLIEP and SMO

Future directions
Interpolate p.d.f.s in the time domain as well as in the
spatial domain
Incorporate correlation among links
Estimate each driver's preference for realistic simulation


