Applied Mathematics of Logistics
-Mathematics of inventory-
@nobo0409
The University of Tokyo
Nishinari Laboratory, B4
July 9, 2019
@nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 1 / 54
Contents
1 Newsboy model
2 Optimal policies for a multi-echelon inventory system
Real stock model
Echelon inventory model
3 Dynamic programming
Dynamic systems and dynamic programming
The DP algorithm
Optimal stopping problems
Deterministic dynamic programming
Infinite horizon dynamic programming problems
Stochastic shortest path problems
Discounted dynamic programming problems
Average cost dynamic programming problems
4 Optimality of the multi inventory system
5 Stochastic inventory model considering prices
Newsboy model
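• A newsboy sells only one kind of newspaper.
• The unit inventory (holding) cost $h$ and the unit shortage cost $b$ are constant.
• The demand $D \ge 0$ is a random variable with distribution function $F$.
• The stock level $s \ge 0$ is chosen to minimize the expected cost
\[ C(s) = \mathbb{E}\bigl[\, h\,[s - D]^+ + b\,[s - D]^- \,\bigr], \tag{1} \]
where $[x]^+ = \max\{x, 0\}$ and $[x]^- = \max\{-x, 0\}$.
Optimal stock level $s = s^*$
\[ F(s^*) = \Pr[D \le s^*] = \frac{b}{b + h} \quad (=: \omega, \text{ the critical ratio}). \tag{2} \]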
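The critical-ratio condition (2) can be evaluated numerically once a demand distribution is fixed. The sketch below assumes, purely for illustration, a normal demand with mean 100 and standard deviation 20 (so the constraint $D \ge 0$ holds only approximately); the function name `newsvendor_stock` and all numbers are hypothetical.

```python
from statistics import NormalDist


def newsvendor_stock(h, b, demand):
    """Return s* solving F(s*) = b / (b + h) for a continuous demand model."""
    omega = b / (b + h)           # critical ratio from Eq. (2)
    return demand.inv_cdf(omega)  # invert the demand distribution function F


# Assumed demand model (not from the slides): D ~ Normal(100, 20^2).
demand = NormalDist(mu=100.0, sigma=20.0)
s_star = newsvendor_stock(h=1.0, b=3.0, demand=demand)
print(round(s_star, 2))
```

A higher shortage cost $b$ pushes the critical ratio toward 1 and therefore raises $s^*$; with $b = h$ the optimal stock is the median demand.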
Optimal policies for a multi-echelon inventory system
Setup [1]
Multi-echelon inventory model
• Time $t \ge 0$ is continuous.
• The demand process is stationary.
• Orders (of newspapers) can be placed at any time.
[1] Clark, A. J. & Scarf, H. (2004). Optimal Policies for a Multi-Echelon Inventory Problem. Management Science, 50, 1782-1790.
Real stock model
Notation
$I'_i(t)$  On-hand (real) inventory at stage $i$.
$B'_i(t)$  Back orders at stage $i$.
$IN'_i(t)$  Net inventory.
$IO_i(t)$  Inventory on order.
$IT_i(t)$  Inventory in transit.
$IOP'_i(t)$  Inventory ordering position.
$ITP'_i(t)$  Inventory transit position.
$L'_i$  Lead time.
$D(s, t]$  Demand during the interval $(s, t]$ (random variable).
$s'_i$  Base stock level.
$b$  Back order cost per unit per unit time.
$h'_i$  Inventory cost per unit per unit time at stage $i$.
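By definition,
\[ IN'_i(t) = I'_i(t) - B'_i(t), \quad I'_i(t) = [IN'_i(t)]^+, \quad B'_i(t) = [IN'_i(t)]^-, \tag{3} \]
\[ IO_i(t) - IT_i(t) = B'_{i+1}(t), \tag{4} \]
\[ IOP'_i(t) = IN'_i(t) + IO_i(t), \tag{5} \]
\[ ITP'_i(t) = IN'_i(t) + IT_i(t), \tag{6} \]
\[ IOP'_i(t) - ITP'_i(t) = B'_{i+1}(t), \tag{7} \]
\[ IOP'_i(t) = s'_i. \tag{8} \]
Equation (4) states that whatever stage $i$ has ordered but is not yet in transit is backlogged at its supplier, stage $i+1$. Equation (7) follows from (4)-(6), and (8) is the base stock policy.
Moreover, inventory flow is conserved:
\[ IN'_i(t + L'_i) = ITP'_i(t) - D(t, t + L'_i], \tag{9} \]
and therefore, by (7) and (8),
\[ IN'_i(t + L'_i) = s'_i - B'_{i+1}(t) - D(t, t + L'_i]. \tag{10} \]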
Since the demand process is stationary, the distribution of $D(t, t + L'_i]$ does not depend on $t$, so we write $D(t, t + L'_i] = D_i$.
Computing the back orders $B'_i$
From (10) and $B'_i(t) = [IN'_i(t)]^-$ we obtain the recursion
\[ B'_{n+1} = 0, \tag{11} \]
\[ B'_i = [\, s'_i - B'_{i+1} - D_i \,]^-, \quad i = n, n-1, \dots, 1. \tag{12} \]
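The recursion (11)-(12) can be traced for one demand realization as follows. This is a minimal sketch: the function name `backorders` and the sample numbers are illustrative assumptions, and list index `k` stands for stage `k + 1`.

```python
def backorders(base_stock, demand):
    """Apply Eqs. (11)-(12) for one realization of the lead-time demands.

    base_stock[k] is s'_{k+1} and demand[k] is a sample of D_{k+1}.
    Returns the list [B'_1, ..., B'_n].
    """
    n = len(base_stock)
    upstream = 0.0                 # B'_{n+1} = 0, Eq. (11)
    result = [0.0] * n
    for i in range(n - 1, -1, -1):  # stages n, n-1, ..., 1
        net = base_stock[i] - upstream - demand[i]
        upstream = max(-net, 0.0)   # [x]^- = max(-x, 0), Eq. (12)
        result[i] = upstream
    return result


# Hypothetical two-stage example: s' = (5, 3), D = (4, 6).
print(backorders([5, 3], [4, 6]))
```

In this example stage 2 is short by 3 units, and that shortage propagates downstream so that stage 1 ends up with 2 back orders even though its own base stock exceeds its demand.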
Echelon inventory model
Definition
The echelon inventory of a stage is all inventory between that stage and the final customer: the stock at the stage itself plus everything held at, or in transit to, the stages downstream of it.
Redefined notation
\[ I_i(t) := I'_i(t) + \sum_{j=1}^{i-1} \bigl\{ IT_j(t) + I'_j(t) \bigr\} \]
\[ B(t) := B'_1(t) \]
\[ IN_i(t) := I_i(t) - B(t) \]
\[ IOP_i(t) := IN_i(t) + IO_i(t) \]
\[ ITP_i(t) := IN_i(t) + IT_i(t) \]
$s_i$  Echelon base stock level, $IOP_i(t) \equiv s_i$.
\[ h_i := h'_i - h'_{i+1} \]
Conservation of inventory flow
\[ IN_i(t + L'_i) = ITP_i(t) - D(t, t + L'_i]. \tag{13} \]
If stage $i+1$ has enough stock, stage $i$ receives everything it ordered, so $ITP_i(t) = IOP_i(t) = s_i$; otherwise it receives only what is available, the net inventory $IN_{i+1}(t)$. Hence
\[ ITP_i(t) = \min\{ s_i, IN_{i+1}(t) \}. \tag{14} \]
This gives the equilibrium solution of the echelon inventory:
\[ IOP_n = s_n, \tag{15} \]
\[ IN_i = ITP_i - D_i, \tag{16} \]
\[ ITP_i = \min\{ s_i, IN_{i+1} \}. \tag{17} \]
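The equilibrium relations (15)-(17) can be evaluated from the top stage downward for one demand realization. The sketch below adds one assumption that the slides do not state explicitly: the outside supplier of stage $n$ is never short, so $ITP_n = IOP_n = s_n$. The function name and numbers are hypothetical.

```python
def echelon_equilibrium(s, d):
    """Evaluate Eqs. (15)-(17) for one realization of the demands.

    s[k] is the echelon base stock level s_{k+1}, d[k] a sample of D_{k+1}.
    Returns (ITP, IN) as lists indexed by stage.
    """
    n = len(s)
    itp = [0.0] * n
    inv = [0.0] * n
    upstream_in = float("inf")  # assumption: unlimited supply above stage n
    for i in range(n - 1, -1, -1):
        itp[i] = min(s[i], upstream_in)  # Eq. (17); gives s_n at the top stage
        inv[i] = itp[i] - d[i]           # Eq. (16)
        upstream_in = inv[i]
    return itp, inv


# Hypothetical two-stage example: s = (4, 10), D = (3, 7).
print(echelon_equilibrium([4, 10], [3, 7]))
```

Here stage 2 is left with a net inventory of 3 after its lead-time demand, which caps the transit position of stage 1 at 3 instead of its target 4.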
Optimization of $s_i$
The optimization problem
\[ \text{minimize} \quad \mathbb{E}\Bigl[\, \sum_{i=1}^{n} h_i\, IN_i + (b + h'_1)\, B \Bigr]. \tag{18} \]
We optimize the stages one at a time, starting from the most downstream stage $i = 1$. The justification is proved later.
Notation
$\underline{C}_i(x)$  The minimum cost of stages $1, \dots, i$ when $IN_{i+1} = x$.
$\hat{C}_i(x)$  The minimum cost of stages $1, \dots, i$ when $IN_i = x$.
$\bar{C}_i(y)$  The minimum cost of stages $1, \dots, i$ when $ITP_i = y$.
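Algorithm for computing the optimal stock levels $s^*_i$
Here $\hat{C}_i$, $\bar{C}_i$ and $\underline{C}_i$ denote the minimum cost of stages $1, \dots, i$ as a function of $IN_i$, $ITP_i$ and $IN_{i+1}$, respectively.
Initialization
\[ \underline{C}_0(x) = (b + h'_1)\,[x]^-. \tag{19} \]
Determine $\hat{C}_i$, $\bar{C}_i$, $\underline{C}_i$ for $i = 1, \dots, n$
\[ \hat{C}_i(x) = h_i x + \underline{C}_{i-1}(x), \qquad \bar{C}_i(y) = \mathbb{E}\bigl[\, \hat{C}_i(y - D_i) \,\bigr], \tag{20} \]
\[ \underline{C}_i(x) = \bar{C}_i(\min\{ s^*_i, x \}). \tag{21} \]
Optimize $s_i$ and $s'_i$
\[ s^*_i = \arg\min_{y > 0} \bar{C}_i(y), \qquad s^{-*}_i = \min_{j \ge i} s^*_j, \tag{22} \]
\[ s'^{*}_i = s^{-*}_i - s^{-*}_{i-1}. \tag{23} \]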
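The last two steps, (22) and (23), turn echelon levels into local base stock levels. A small worked sketch follows; it assumes the convention $s^{-*}_0 = 0$ (not stated on the slide), and the input values are made up.

```python
def local_base_stock(s_star):
    """Convert optimal echelon levels s*_i into local levels s'*_i.

    s_star[k] is s*_{k+1}. Returns (s_minus, local) where
    s_minus[k] = min_{j >= k} s_star[j]        (second part of Eq. (22))
    local[k]   = s_minus[k] - s_minus[k - 1]   (Eq. (23), with s_minus[-1] := 0)
    """
    n = len(s_star)
    s_minus = list(s_star)
    for i in range(n - 2, -1, -1):
        s_minus[i] = min(s_minus[i], s_minus[i + 1])
    local = [s_minus[0]] + [s_minus[i] - s_minus[i - 1] for i in range(1, n)]
    return s_minus, local


# Hypothetical echelon optima for three stages.
print(local_base_stock([5, 8, 7]))
```

Taking the running minimum makes the echelon levels nondecreasing in the stage index, so every local level in (23) is nonnegative.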
Dynamic programming
Dynamic systems and dynamic programming
General structure of finite horizon optimal control problems [2]
The model has two principal features:
1 a discrete-time dynamic system,
2 a cost function that is additive over time.
[2] Dimitri P. Bertsekas (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol. 1, 4th edition.
Discrete-time dynamic system
The system has the form
\[ x_{t+1} = f_t(x_t, u_t, w_t), \quad t = 0, 1, \dots, T-1, \tag{24} \]
where
$t$ indexes discrete time,
$x_t \in S_t$ is the state of the system and summarizes the past information that is relevant for future optimization,
$u_t \in U_t(x_t) \subset C_t$ is the control or decision variable to be selected at time $t$,
$w_t \in W_t$ is a random parameter with distribution $\Pr[\,\cdot \mid x_t, u_t]$,
$f_t$ is a function that describes the system, in particular the mechanism by which the state is updated.
The optimization problem
A cost $g_t$ is incurred at each stage and accumulates over time. We therefore formulate the problem as the minimization of the expected total cost
\[ \text{minimize} \quad \mathbb{E}\Bigl[\, g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, u_t, w_t) \Bigr]. \tag{25} \]
Definition
A policy is a sequence of functions
\[ \pi = \{\mu_t\}_{t=0}^{T-1} = \{\mu_0, \dots, \mu_{T-1}\}, \tag{26} \]
where
\[ \mu_t : S_t \ni x_t \longmapsto \mu_t(x_t) \in C_t, \quad t = 0, \dots, T-1. \tag{27} \]
Definition
A policy $\pi$ is called admissible if $\mu_t(x_t) \in U_t(x_t)$ for all $x_t \in S_t$ and all $t$. The set of all admissible policies is denoted by $\Pi$.
Given $x_0$ and an admissible policy $\pi = \{\mu_t\}_{t=0}^{T-1}$, the $x_t$ and $w_t$ are random variables with distributions defined through the system equation
\[ x_{t+1} = f_t(x_t, \mu_t(x_t), w_t), \quad t = 0, \dots, T-1. \tag{28} \]
Thus, the expected cost of $\pi$ starting at $x_0$ is
\[ J_\pi(x_0) = \mathbb{E}_{x_1, \dots, x_T,\, w_0, \dots, w_{T-1}}\Bigl[\, g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, \mu_t(x_t), w_t) \Bigr]. \tag{29} \]
An optimal policy $\pi^*$ is one that minimizes this cost, i.e.,
\[ J_{\pi^*}(x_0) = \min_{\pi \in \Pi} J_\pi(x_0). \tag{30} \]
This optimal cost is denoted by $J^*(x_0)$.
The DP algorithm
Principle of optimality
Theorem
For every initial state $x_0$, the optimal cost $J^*(x_0)$ of the basic problem is equal to $J_0(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from period $T-1$ to period $0$:
\[ J_T(x_T) = g_T(x_T), \tag{31} \]
\[ J_t(x_t) = \min_{u_t \in U_t(x_t)} \mathbb{E}_{w_t}\bigl[\, g_t(x_t, u_t, w_t) + J_{t+1}(f_t(x_t, u_t, w_t)) \,\bigr], \quad t = 0, \dots, T-1, \tag{32} \]
where the expectation is taken with respect to the probability distribution of $w_t$, which depends on $x_t$ and $u_t$. Furthermore, if $u^*_t = \mu^*_t(x_t)$ minimizes the right side of Eq. (32) for each $x_t$ and $t$, the policy $\pi^* = \{\mu^*_t\}_{t=0}^{T-1}$ is optimal.
Proof.
See the Appendix.
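The backward recursion (31)-(32) translates almost line by line into code when the state, control and noise sets are finite. The sketch below is one possible rendering under that assumption; the function `backward_induction`, its argument layout, and the tiny inventory example (capacity 2, demand 0 or 1 with equal probability, unit costs 1, 1 and 4) are all hypothetical.

```python
def backward_induction(T, states, controls, f, g, g_T, noise):
    """Finite-state DP algorithm, Eqs. (31)-(32).

    controls(t, x) -> iterable of admissible u
    noise(t, x, u) -> list of (w, probability) pairs
    Returns (J_0, policy), where policy[t][x] is the minimizing control.
    """
    J = {x: g_T(x) for x in states}          # Eq. (31)
    policy = []
    for t in reversed(range(T)):             # t = T-1, ..., 0
        J_t, mu_t = {}, {}
        for x in states:
            best_u, best = None, float("inf")
            for u in controls(t, x):
                value = sum(p * (g(t, x, u, w) + J[f(t, x, u, w)])
                            for w, p in noise(t, x, u))  # Eq. (32)
                if value < best:
                    best, best_u = value, u
            J_t[x], mu_t[x] = best, best_u
        J = J_t
        policy.insert(0, mu_t)
    return J, policy


# Hypothetical example: stock in {0, 1, 2}, lost sales, two periods.
CAP = 2
J0, policy = backward_induction(
    T=2,
    states=range(CAP + 1),
    controls=lambda t, x: range(CAP - x + 1),
    f=lambda t, x, u, w: max(x + u - w, 0),
    g=lambda t, x, u, w: u + max(x + u - w, 0) + 4 * max(w - x - u, 0),
    g_T=lambda x: 0.0,
    noise=lambda t, x, u: [(0, 0.5), (1, 0.5)],
)
print(J0[0], policy[0][0])
```

The returned `policy` is a list of per-period lookup tables, which mirrors the definition of a policy as a sequence of functions $\mu_0, \dots, \mu_{T-1}$.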
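Optimal stopping problems
$w_t$  The asset value (the offer received in period $t$).
$x_t$  The state: $\top$ if the asset has already been sold, otherwise the current offer (not yet sold).
$u_t \in \{\text{sell}, \text{not sell}\}$  The decision.
The system of the optimal stopping problem is
\[ x_{t+1} = \begin{cases} \top & x_t = \top \text{ or } u_t = \text{sell}, \\ w_t & \text{otherwise}, \end{cases} \qquad t = 0, \dots, T-1, \tag{33--34} \]
and the expected gross profit has the form of Eq. (29), now maximized instead of minimized, where
\[ g_T(x_T) = \begin{cases} x_T & x_T \ne \top, \\ 0 & \text{otherwise}, \end{cases} \tag{35--36} \]
and
\[ g_t(x_t, u_t, w_t) = \begin{cases} (1 + r)^{T-t}\, x_t & x_t \ne \top \text{ and } u_t = \text{sell}, \\ 0 & \text{otherwise}. \end{cases} \tag{37--38} \]
Here $r$ is the interest rate earned on the proceeds between the sale and time $T$.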
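DP algorithm for the optimal stopping problem
Initialization
\[ J_T(x_T) = \begin{cases} x_T & x_T \ne \top, \\ 0 & \text{otherwise}. \end{cases} \tag{39--40} \]
Iteration
\[ J_t(x_t) = \begin{cases} \max\bigl\{ (1 + r)^{T-t}\, x_t,\ \mathbb{E}[J_{t+1}(w_t)] \bigr\} & x_t \ne \top, \\ 0 & \text{otherwise}, \end{cases} \qquad t = T-1, \dots, 0. \tag{41--42} \]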
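For offers drawn from a finite set, the iteration above reduces to computing one continuation value $\mathbb{E}[J_{t+1}(w_t)]$ per period; selling is optimal whenever $(1 + r)^{T-t} x_t$ is at least that value. The following sketch assumes i.i.d. offers with a known discrete distribution; the function name and the example numbers are illustrative.

```python
def continuation_values(offers, probs, T, r):
    """Return [c_0, ..., c_{T-1}] with c_t = E[J_{t+1}(w_t)].

    offers/probs describe an assumed i.i.d. discrete offer distribution.
    In period t it is optimal to sell iff (1 + r)**(T - t) * x_t >= c_t.
    """
    J_next = [float(x) for x in offers]       # J_T(x) = x, Eqs. (39)-(40)
    cont = [0.0] * T
    for t in range(T - 1, -1, -1):
        c = sum(p * j for p, j in zip(probs, J_next))
        cont[t] = c
        # Eqs. (41)-(42) evaluated at every possible offer x
        J_next = [max((1 + r) ** (T - t) * x, c) for x in offers]
    return cont


# Hypothetical data: offers 1, 2, 3 equally likely, no interest, T = 2.
cont = continuation_values([1, 2, 3], [1 / 3, 1 / 3, 1 / 3], T=2, r=0.0)
print(cont)
```

The continuation values shrink as $t$ approaches $T$, so the seller becomes less selective when fewer offers remain.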
Deterministic dynamic programming
Finite-state, discrete-time deterministic dynamic programming problem
• The state spaces $S_t$, $t = 0, \dots, T$, are finite.
• There is no randomness, so $w_t$ can be eliminated.
Thus the problem reduces to a shortest path problem [3].
[3] http://web.mit.edu/15.053/www/AMP-Chapter-11.pdf
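Backward dynamic programming algorithm
Here $c^t_{ij}$ is the cost of moving from state $i$ at time $t$ to state $j$ at time $t+1$, and $n+1$ denotes the artificial terminal node.
Initialization
\[ J_T(i) = c^T_{i,n+1}, \quad \forall i \in S_T. \tag{43} \]
Iteration
\[ J_t(i) = \min_{j \in S_{t+1}} \bigl[\, c^t_{ij} + J_{t+1}(j) \,\bigr], \quad \forall i \in S_t. \tag{44} \]
Optimal value
\[ J^* = \min_{j \in S_0} J_0(j). \tag{45} \]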
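Forward dynamic programming algorithm
Initialization
\[ J_0(i) = 0, \quad \forall i \in S_0. \tag{46} \]
Iteration
\[ J_t(j) = \min_{i \in S_{t-1}} \bigl[\, c^{t-1}_{ij} + J_{t-1}(i) \,\bigr], \quad \forall j \in S_t. \tag{47} \]
Optimal value
\[ J^* = \min_{i \in S_T} \bigl[\, c^T_{i,n+1} + J_T(i) \,\bigr]. \tag{48} \]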
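Both recursions can be checked against each other on a small staged graph. The sketch below stores the arc costs $c^t_{ij}$ as nested dictionaries, `costs[t][i][j]`, and the terminal costs $c^T_{i,n+1}$ as `terminal[i]`; this layout, the function names, and the example graph are assumptions made for illustration.

```python
from math import inf


def backward_dp(costs, terminal):
    """Eqs. (43)-(45): cost-to-go from each state, computed backward in time."""
    J = dict(terminal)                                    # Eq. (43)
    for stage in reversed(costs):
        J = {i: min(c + J[j] for j, c in arcs.items())    # Eq. (44)
             for i, arcs in stage.items()}
    return min(J.values())                                # Eq. (45)


def forward_dp(costs, terminal):
    """Eqs. (46)-(48): cost-to-arrive at each state, computed forward in time."""
    J = {i: 0.0 for i in costs[0]}                        # Eq. (46)
    for stage in costs:
        nxt = {}
        for i, arcs in stage.items():
            for j, c in arcs.items():
                nxt[j] = min(nxt.get(j, inf), J.get(i, inf) + c)  # Eq. (47)
        J = nxt
    return min(terminal[i] + J[i] for i in J)             # Eq. (48)


# Hypothetical graph: S_0 = {a, b}, S_1 = {c, d}, S_2 = {e}.
costs = [
    {"a": {"c": 2, "d": 5}, "b": {"c": 4, "d": 1}},
    {"c": {"e": 3}, "d": {"e": 1}},
]
terminal = {"e": 1}
print(backward_dp(costs, terminal), forward_dp(costs, terminal))
```

The two functions return the same optimal value, since a shortest path has the same length whether it is built from the end or from the start.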
Infinite horizon dynamic programming problems
Definition
The problem is to minimize
\[ J_\pi(x_0) = \lim_{T \to \infty} \mathbb{E}_{w_0, \dots, w_{T-1}}\Bigl[\, \sum_{t=0}^{T-1} \alpha^t g_t(x_t, \mu_t(x_t), w_t) \Bigr], \tag{49} \]
where $\alpha$, $0 < \alpha \le 1$, is the discount factor.
There are three types of problems:
1 Stochastic shortest path problems ($\alpha = 1$, $\lim_{T \to \infty} c^T_{i,n+1} = 0$).
2 Discounted dynamic programming problems ($\alpha < 1$, $|g(x, u, w)| < \infty$).
3 Average cost dynamic programming problems, where the cost is
\[ J_\pi(x_0) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{w_0, \dots, w_{T-1}}\Bigl[\, \sum_{t=0}^{T-1} g_t(x_t, \mu_t(x_t), w_t) \Bigr]. \]
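Stochastic shortest path problems
Notation and assumptions
\[ p_{ij}(u) = \Pr[x_{t+1} = j \mid x_t = i,\ u_t = u], \quad i, j = 1, \dots, n \]
\[ g(i, u) = \sum_{j \in S} p_{ij}(u)\, g(i, u, j) \]
The termination state $0$ (also written $\top$) is absorbing and cost-free:
\[ p_{00}(u) = 1, \qquad g(0, u) = 0, \qquad \forall u \in U. \]
Under every policy, termination is reached with positive probability within $m$ stages:
\[ \forall \pi \in \Pi,\ \exists m \in \{1, \dots, n\}: \quad \rho_\pi := \max_{i = 1, \dots, n} \Pr[x_m \ne \top \mid x_0 = i,\ \pi] < 1. \]
Note
The results to be presented are valid under more general circumstances. Furthermore, we can always use $m = n$.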
Theorem
1 Given any initial conditions $J_0(i)$, $i = 1, \dots, n$, the sequence $J_t(i)$ generated by the iteration
\[ J_{t+1}(i) = \min_{u \in U(i)} \Bigl[\, g(i, u) + \sum_{j=1}^{n} p_{ij}(u)\, J_t(j) \Bigr], \quad \forall i = 1, \dots, n, \tag{50} \]
converges to the optimal cost $J^*(i)$ for each $i$.
2 The optimal costs $J^*(i)$, $i = 1, \dots, n$, satisfy Bellman's equation,
\[ J^*(i) = \min_{u \in U(i)} \Bigl[\, g(i, u) + \sum_{j=1}^{n} p_{ij}(u)\, J^*(j) \Bigr], \quad \forall i = 1, \dots, n, \tag{51} \]
and in fact they are the unique solution of this equation.
Corollary
1 For any stationary policy $\mu$, the costs $J_\mu(i)$, $i = 1, \dots, n$, are the unique solution of the equation
\[ J_\mu(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i))\, J_\mu(j), \quad \forall i = 1, \dots, n. \tag{52} \]
Furthermore, given any initial conditions $J_0(i)$, $i = 1, \dots, n$, the sequence $J_t(i)$ generated by the DP algorithm
\[ J_{t+1}(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i))\, J_t(j), \quad \forall i = 1, \dots, n, \tag{53} \]
converges to $J_\mu(i)$ for each $i$.
2 A stationary policy $\mu$ is optimal if and only if for every state $i$, $\mu(i)$ attains the minimum in Bellman's equation.
Computational method for stochastic shortest path problems
1 Value iteration
2 Policy iteration
3 Linear programming
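Value iteration
Value iteration algorithm
Initialization
\[ J_0(i) = 0, \quad \forall i = 1, \dots, n. \tag{54} \]
Iteration
\[ J_{t+1}(i) = \min_{u \in U(i)} \Bigl[\, g(i, u) + \sum_{j=1}^{n} p_{ij}(u)\, J_t(j) \Bigr]. \tag{55} \]
Note:
• In general, infinitely many iterations are needed for exact convergence.
• The convergence speed is $\rho^K$.
• Error bounds (see Appendix): $J_{t+1}(j) + (N^*(j) - 1)\,\underline{c}_t \le J^*(j) \le J_{\mu_t}(j) \le J_{t+1}(j) + (N_t(j) - 1)\,\bar{c}_t$.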
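A compact rendering of the iteration (54)-(55) is given below. Since infinitely many iterations cannot be run, the sketch stops when successive iterates differ by less than a tolerance, which is a practical assumption and not part of the slide. The data layout (`g[i][u]` for expected stage costs, `p[i][u][j]` for transition probabilities among non-terminal states, with the remaining probability going to the termination state) and the two-state example are hypothetical.

```python
def value_iteration(g, p, tol=1e-10, max_iter=10_000):
    """Iterate Eq. (55) from J_0 = 0 until the update is smaller than tol."""
    J = {i: 0.0 for i in g}                                   # Eq. (54)
    for _ in range(max_iter):
        J_new = {
            i: min(g[i][u] + sum(q * J[j] for j, q in p[i][u].items())
                   for u in g[i])
            for i in g
        }
        if max(abs(J_new[i] - J[i]) for i in g) < tol:
            return J_new
        J = J_new
    return J


# Hypothetical example with states 1, 2 and a termination state.
# Probability mass not listed in p goes to the termination state.
g = {1: {"safe": 4.0, "risky": 1.0}, 2: {"go": 1.0, "jump": 3.5}}
p = {1: {"safe": {}, "risky": {1: 0.5}}, 2: {"go": {1: 1.0}, "jump": {}}}
J_star = value_iteration(g, p)
print(J_star)
```

In this example the "risky" control at state 1 costs 1 per attempt and terminates with probability 0.5, so its expected total cost is 2, which beats the "safe" cost of 4.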
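Policy iteration
Policy iteration algorithm
Initialization
$J_0(i) = 0$, $\forall i = 1, \dots, n$. $\mu_0$: a stationary admissible policy.
Iteration (repeat until $J_{\mu_{t+1}}(i) = J_{\mu_t}(i)$ for all $i$)
• Policy evaluation step: solve the linear system below for $J$ and set $J_{\mu_t}(i) \leftarrow J(i)$.
\[ J(i) = g(i, \mu_t(i)) + \sum_{j=1}^{n} p_{ij}(\mu_t(i))\, J(j), \quad \forall i = 1, \dots, n. \tag{56} \]
• Policy improvement step
\[ \mu_{t+1}(i) = \arg\min_{u \in U(i)} \Bigl[\, g(i, u) + \sum_{j=1}^{n} p_{ij}(u)\, J_{\mu_t}(j) \Bigr], \quad \forall i = 1, \dots, n. \tag{57} \]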
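Finite termination of the policy iteration
Under the assumption that every policy reaches the termination state with positive probability, the algorithm generates an improving sequence of policies [i.e., $J_{\mu_{t+1}}(i) \le J_{\mu_t}(i)$, $\forall i$, $\forall t$] and terminates after finitely many iterations (because $S$ and $U$ are finite sets) with an optimal policy.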
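The evaluation and improvement steps can be sketched as follows. To stay dependency-free, this version replaces the exact linear solve of the policy evaluation step with repeated application of the fixed-policy DP iteration, which converges to $J_\mu$ under the termination assumption; that substitution, the tie-breaking tolerance `eps`, the data layout and the example are all assumptions made for illustration.

```python
def evaluate(mu, g, p, tol=1e-12, max_iter=100_000):
    """Approximate J_mu by iterating the fixed-policy DP update."""
    J = {i: 0.0 for i in g}
    for _ in range(max_iter):
        J_new = {i: g[i][mu[i]] + sum(q * J[j] for j, q in p[i][mu[i]].items())
                 for i in g}
        if max(abs(J_new[i] - J[i]) for i in g) < tol:
            return J_new
        J = J_new
    return J


def policy_iteration(g, p, eps=1e-9):
    """Alternate policy evaluation and policy improvement until mu is stable."""
    mu = {i: next(iter(g[i])) for i in g}      # arbitrary initial policy
    while True:
        J = evaluate(mu, g, p)

        def q_value(i, u):
            return g[i][u] + sum(q * J[j] for j, q in p[i][u].items())

        new_mu = {}
        for i in g:
            best_u = mu[i]                     # keep current control on ties
            for u in g[i]:
                if q_value(i, u) < q_value(i, best_u) - eps:
                    best_u = u
            new_mu[i] = best_u
        if new_mu == mu:
            return mu, J
        mu = new_mu


# Hypothetical data: probability mass not listed goes to the termination state.
g = {1: {"safe": 4.0, "risky": 1.0}, 2: {"go": 1.0, "jump": 3.5}}
p = {1: {"safe": {}, "risky": {1: 0.5}}, 2: {"go": {1: 1.0}, "jump": {}}}
mu_star, J_mu = policy_iteration(g, p)
print(mu_star, J_mu)
```

Keeping the current control unless another one is strictly better (by more than `eps`) prevents the loop from cycling between controls of equal cost.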
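Linear programming
If $J_0$ satisfies $J_0(i) \le \min_{u \in U(i)} \bigl[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_0(j) \bigr]$ for all $i$, then value iteration started from $J_0$ gives $J_t(i) \le J_{t+1}(i)$ for all $t$, and hence $J_0(i) \le J^*(i)$. Thus $J^*$ is the "largest" $J$ that satisfies the constraint
\[ J(i) \le g(i, u) + \sum_{j=1}^{n} p_{ij}(u)\, J(j), \quad \forall i = 1, \dots, n,\ u \in U(i). \tag{58} \]
In particular, $J^*(1), \dots, J^*(n)$ solve the linear program of maximizing $\sum_{i=1}^{n} J(i)$ subject to the constraint (58).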
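Discounted dynamic programming problems
This problem can be converted to a stochastic shortest path problem by replacing $p_{ij}(u)$ with $\alpha\, p_{ij}(u)$; the remaining probability $1 - \alpha$ is assigned to the transition to the termination state.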
Average cost dynamic programming problems
The problem is essentially equivalent to a stochastic shortest path problem.
1 The optimal average cost is independent of the initial state.
2 Bellman's equation takes the form
\[ \lambda + h(i) = \min_{u \in U(i)} \Bigl[\, g(i, u) + \sum_{j=1}^{n} p_{ij}(u)\, h(j) \Bigr], \tag{59} \]
where $h(n) = 0$ and $\lambda$ is the optimal average cost.
3 There are versions of value iteration and policy iteration for this problem.
Optimality of the multi inventory system
Notation
t Time
It Net Inventory (xt)
qt order (ut)
Dt Demand (wt)
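Inventory control
System
\[ I_{t+1} = I_t + q_t - D_t \]
Total expected cost
\[ \mathbb{E}\Bigl[\, \sum_{t=0}^{T-1} \bigl( h \max\{ I_t + q_t - D_t, 0 \} + b \max\{ D_t - I_t - q_t, 0 \} \bigr) \Bigr] + C(I_T), \tag{60} \]
where $C$ is the terminal cost.
DP algorithm
\[ J_T(I_T) = C(I_T), \tag{61} \]
\[ J_t(I_t) = \min_{q_t \ge 0} \bigl[\, H_t(I_t + q_t) + \mathbb{E}[J_{t+1}(I_t + q_t - D_t)] \,\bigr], \tag{62} \]
where $H_t(y) := \mathbb{E}\bigl[ h \max\{ y - D_t, 0 \} + b \max\{ D_t - y, 0 \} \bigr]$ is the expected one-period cost.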
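Substituting $y_t = I_t + q_t$, we can rewrite Eq. (62) as
\[ J_t(I_t) = \min_{y_t \ge I_t} \bigl[\, H_t(y_t) + \mathbb{E}[J_{t+1}(y_t - D_t)] \,\bigr] = \min_{y_t \ge I_t} g_t(y_t). \tag{63} \]
Optimal policies and the cost
Optimal policy (a base stock policy; $g_t$ is convex when the terminal cost $C$ is convex)
\[ \mu^*_t(I_t) = \begin{cases} S_t - I_t & I_t < S_t, \\ 0 & \text{otherwise}, \end{cases} \qquad S_t := \arg\min_{y \in \mathbb{R}} g_t(y). \tag{64--65} \]
Optimal cost
\[ J_t(I_t) = \begin{cases} H_t(S_t) + \mathbb{E}[J_{t+1}(S_t - D_t)] & I_t < S_t, \\ H_t(I_t) + \mathbb{E}[J_{t+1}(I_t - D_t)] & \text{otherwise}. \end{cases} \tag{66--67} \]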
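The order-up-to structure can be observed numerically by running the recursion (63) on an integer grid. The sketch below makes several assumptions that are not in the slides: zero terminal cost, i.i.d. demand uniform on {0, 1, 2}, $h = 1$, $b = 3$, $T = 3$, integer stock levels, and an upper search bound `Y_MAX`. All names are hypothetical.

```python
from functools import lru_cache

H_COST, B_COST = 1.0, 3.0                     # assumed h and b
DEMAND = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}       # assumed demand distribution
T = 3                                         # assumed horizon
Y_MAX = 6                                     # assumed upper bound for y_t


def one_period_cost(y):
    """H(y) = E[h max(y - D, 0) + b max(D - y, 0)]."""
    return sum(p * (H_COST * max(y - d, 0) + B_COST * max(d - y, 0))
               for d, p in DEMAND.items())


@lru_cache(maxsize=None)
def g(t, y):
    """g_t(y) = H(y) + E[J_{t+1}(y - D)] from Eq. (63)."""
    return one_period_cost(y) + sum(p * J(t + 1, y - d) for d, p in DEMAND.items())


@lru_cache(maxsize=None)
def J(t, inv):
    """Cost-to-go J_t(I_t); the terminal cost is assumed to be zero."""
    if t == T:
        return 0.0
    return min(g(t, y) for y in range(inv, max(inv, Y_MAX) + 1))


def order_quantity(t, inv):
    """Optimal q_t = y_t - I_t found by direct minimization over y_t >= I_t."""
    best_y = min(range(inv, max(inv, Y_MAX) + 1), key=lambda y: g(t, y))
    return best_y - inv


print([order_quantity(0, inv) for inv in range(-2, 5)])
```

In the printed list the order quantity falls by one for each extra unit of starting stock until it reaches zero, which is the base stock pattern $\mu^*_t(I_t) = \max\{S_t - I_t, 0\}$.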
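Positive fixed cost
Suppose that placing an order incurs a fixed cost $K > 0$:
\[ C(q_t) = \begin{cases} K & q_t > 0, \\ 0 & \text{otherwise}. \end{cases} \tag{68--69} \]
DP algorithm
\[ J_T(I_T) = C(I_T), \]
\[ J_t(I_t) = \min\Bigl\{ H_t(I_t) + \mathbb{E}[J_{t+1}(I_t - D_t)],\ \min_{y_t \ge I_t} \bigl[\, K + H_t(y_t) + \mathbb{E}[J_{t+1}(y_t - D_t)] \,\bigr] \Bigr\}. \]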
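Note: $(s, S)$ policies
With $S_t := \arg\min_y g_t(y)$ and the reorder point
\[ s_t := \min\{\, y : g_t(y) = K + g_t(S_t) \,\}, \]
the policy
\[ \mu^*_t(I_t) = \begin{cases} S_t - I_t & I_t < s_t, \\ 0 & I_t \ge s_t \end{cases} \tag{70--71} \]
is the optimal policy.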
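The two levels can be read off a tabulated cost function. The sketch below restricts itself to a single period, so that $g(y)$ is just the expected one-period cost, and to an integer grid, where the equality defining $s$ is replaced by the first grid point with $g(y) \le K + g(S)$. These simplifications, the demand data ($D$ uniform on {0, 1, 2}, $h = 1$, $b = 3$, $K = 1$) and the function names are assumptions.

```python
DEMAND = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}   # assumed demand distribution
H_COST, B_COST, K = 1.0, 3.0, 1.0         # assumed h, b and fixed cost K


def g(y):
    """Single-period stand-in for g_t: expected holding plus shortage cost."""
    return sum(p * (H_COST * max(y - d, 0) + B_COST * max(d - y, 0))
               for d, p in DEMAND.items())


def s_S_levels(cost, fixed_cost, grid):
    """Return (s, S): S minimizes cost, s is the smallest y <= S whose cost
    does not exceed fixed_cost + cost(S) (grid version of the definition)."""
    S = min(grid, key=cost)
    s = min(y for y in grid if y <= S and cost(y) <= fixed_cost + cost(S))
    return s, S


def order(inv, s, S):
    """(s, S) rule of Eqs. (70)-(71)."""
    return S - inv if inv < s else 0


s, S = s_S_levels(g, K, range(0, 7))
print(s, S, [order(inv, s, S) for inv in range(0, 4)])
```

With these numbers, a starting stock of 1 is left alone because raising it to 2 would save only 1/3 in expected cost, which is less than the fixed cost of ordering.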
Stochastic inventory model considering prices
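Notation
$h$  Inventory cost.
$c$  Order cost.
$b$  Back order cost.
$p$  Price (the control $u$).
$D$  Demand (the noise $w$): $D(p, \varepsilon) = y(p) + \varepsilon$, with $y(p) = -a(p - P_0) + D_0$ and $\varepsilon \sim F$.
$s$  Order quantity.
$z$  $z = s - y(p)$.
$R$  Revenue, net of the order, inventory and back order costs:
\[ R(z, p) = \begin{cases} p\,(y(p) + \varepsilon) - c\,(y(p) + z) - h\,(z - \varepsilon) & \varepsilon \le z, \\ p\,(y(p) + z) - c\,(y(p) + z) - b\,(\varepsilon - z) & \varepsilon > z. \end{cases} \tag{72--73} \]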
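Optimal $z$, $p$
Optimal $z = z^*$
\[ F(z^*) = \frac{p + b - c}{h + p + b}. \tag{74} \]
Optimal price $p = p^*$, where $f$ is the density of $F$:
\[ p^* = \frac{a P_0 + D_0 + a c - \int_z^{\infty} (x - z)\, f(x)\, dx}{2a}. \tag{75} \]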
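Both conditions can be evaluated in closed form once a distribution for $\varepsilon$ is chosen. The sketch below assumes a zero-mean normal $\varepsilon$, for which the integral in (75) equals $\sigma\,[\varphi(k) - k\,(1 - \Phi(k))]$ with $k = z / \sigma$; the distribution, the function names and all parameter values are assumptions for illustration.

```python
from statistics import NormalDist


def optimal_z(p, h, b, c, eps):
    """Solve Eq. (74), F(z*) = (p + b - c) / (h + p + b), for a given price p."""
    return eps.inv_cdf((p + b - c) / (h + p + b))


def expected_shortage(z, eps):
    """E[(eps - z)^+], i.e. the integral in Eq. (75), for normal eps."""
    k = (z - eps.mean) / eps.stdev
    std = NormalDist()
    return eps.stdev * (std.pdf(k) - k * (1.0 - std.cdf(k)))


def optimal_price(z, a, P0, D0, c, eps):
    """Evaluate Eq. (75) for a given z."""
    return (a * P0 + D0 + a * c - expected_shortage(z, eps)) / (2 * a)


eps = NormalDist(mu=0.0, sigma=1.0)   # assumed noise model, mean zero
z_star = optimal_z(p=10.0, h=4.0, b=2.0, c=4.0, eps=eps)
p_star = optimal_price(z=0.0, a=2.0, P0=10.0, D0=20.0, c=4.0, eps=eps)
print(z_star, p_star)
```

Because (74) depends on $p$ and (75) depends on $z$, one natural way to use the two functions is to alternate between them from a starting guess, although the slides do not prescribe a particular solution procedure.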
References
Kubo, M. (久保幹雄) (2007). 『ロジスティクスの数理』 [Mathematics of Logistics]. Kyoritsu Shuppan (共立出版).
Dimitri P. Bertsekas (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol. 1, 4th edition.

Inventory theory presentation

  • 1.
    Applied Mathematics ofLogistics -Mathematics of inventory- @nobo0409 The University of Tokyo Nishinari labotory B4 July 9, 2019 @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 1 / 54
  • 2.
    Contents 1 Newsboy model 2Optimal policies for a multi-echelon inventory system Real stock model Echelon inventory model 3 Dynamic programming Dynamic systems and dynamic programming The DP algorithm Optimal stopping problems Deteministic dynamic programming Infinite horizon dynamic programming problems Stochastic shortest path problems Discounted dynamic programming problems Average cost dynamic programming problems 4 Optimality of the multi inventory system 5 Stochastic inventory model cosidering prices @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 2 / 54
  • 3.
    Newsboy model Contents 1 Newsboymodel 2 Optimal policies for a multi-echelon inventory system Real stock model Echelon inventory model 3 Dynamic programming Dynamic systems and dynamic programming The DP algorithm Optimal stopping problems Deteministic dynamic programming Infinite horizon dynamic programming problems Stochastic shortest path problems Discounted dynamic programming problems Average cost dynamic programming problems 4 Optimality of the multi inventory system 5 Stochastic inventory model cosidering prices @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 3 / 54
  • 4.
    Newsboy model Newsboy model Newsboymodel • A newsboy sells only one kind of newspaper. • The inventory cost h and the shortage cost b are constant. • The demand D ≥ 0 is a random valuable ∼ F. • The amount of stock s ≥ 0 is determined by minimize to C(s) = E [ h [s − D]+ + b [s − D]−] . (1) Optimal s = s∗ F(s∗ ) = Pr[D ≤ s∗ ] = b b + h (=: ω critical ratio) (2) @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 4 / 54
  • 5.
    Optimal policies fora multi-echelon inventory system Contents 1 Newsboy model 2 Optimal policies for a multi-echelon inventory system Real stock model Echelon inventory model 3 Dynamic programming Dynamic systems and dynamic programming The DP algorithm Optimal stopping problems Deteministic dynamic programming Infinite horizon dynamic programming problems Stochastic shortest path problems Discounted dynamic programming problems Average cost dynamic programming problems 4 Optimality of the multi inventory system 5 Stochastic inventory model cosidering prices @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 5 / 54
  • 6.
    Optimal policies fora multi-echelon inventory system Setup1 Multi inventory model • Time t ≥ 0 is continious. • Stationary process. • We can order some pieces of newspaper anytime. 1 J. Clark, Andrew & Scarf, Herbert. (2004). Optimal Policies for a Multi-Echelon Inventory Problem. Management Science. 50. 1782-1790. @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 6 / 54
  • 7.
    Optimal policies fora multi-echelon inventory system Real stock model 1 Newsboy model 2 Optimal policies for a multi-echelon inventory system Real stock model Echelon inventory model 3 Dynamic programming Dynamic systems and dynamic programming The DP algorithm Optimal stopping problems Deteministic dynamic programming Infinite horizon dynamic programming problems Stochastic shortest path problems Discounted dynamic programming problems Average cost dynamic programming problems 4 Optimality of the multi inventory system 5 Stochastic inventory model cosidering prices @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 7 / 54
  • 8.
    Optimal policies fora multi-echelon inventory system Real stock model Notation I′ i(t) The real amount of inventory. B′ i(t) The amount of back order. IN′ i(t) Net Inventory. IOi(t) Inventory on Order. ITi(t) Inventory in transit. IOP′ i (t) Inventory Ordering Position. ITP′ i (t) Inventory Transit Position. L′ i Lead time. D(s, t] Demand(r.v.). s′ i base stock level. b unit time. h′ i unit time at i. @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 8 / 54
  • 9.
    Optimal policies fora multi-echelon inventory system Real stock model By the definition, it follows that IN′ i(t) = Ii(t) − B′ i(t), I′ i(t) = [IN′ i(t)]+ , B′ i(t) = [IN′ i(t)]− , (3) IOi(t) − ITi(t) = B′ i(t), (4) IOP′ i (t) = IN′ i(t) + IOi(t), (5) ITP′ i (t) = IN′ i(t) + ITi(t), (6) IOP′ i (t) − ITP′ i (t) = B′ i+1(t), (7) IOP′ i (t) = s′ i. (8) And, inventory flow conserve. IN′ i(t + L′ i) = ITP′ i (t) − D(t, t + L′ i] (9) then IN′ i(t + L′ i) = s′ i − B′ i+1(t) − D(t, t + L′ i]. (10) @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 9 / 54
  • 10.
    Optimal policies fora multi-echelon inventory system Real stock model Now, the process which r.v. D follows is not depend of t. So, we can denote D(t, t + L′ i] = Di. The method to calculate back order B′ i By (10) and B′ i(t) = [IN′ i(t)]−, we can get the folowing method { B′ i = 0 i = n + 1, B′ i = [s′ i − B′ i+1 − Di]− i = n, n − 1, · · · , 1. (11) (12) @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 10 / 54
  • 11.
    Optimal policies fora multi-echelon inventory system Echelon inventory model 1 Newsboy model 2 Optimal policies for a multi-echelon inventory system Real stock model Echelon inventory model 3 Dynamic programming Dynamic systems and dynamic programming The DP algorithm Optimal stopping problems Deteministic dynamic programming Infinite horizon dynamic programming problems Stochastic shortest path problems Discounted dynamic programming problems Average cost dynamic programming problems 4 Optimality of the multi inventory system 5 Stochastic inventory model cosidering prices @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 11 / 54
  • 12.
    Optimal policies fora multi-echelon inventory system Echelon inventory model Echelon invemtory model Definition Echelon inventory is defined as the inventory between a stage in the supply chain and the final customer. Re-notetion Ii(t) := I′ i(t) + i−1∑ j=1 { ITj(t) + I′ j(t) } B(t) := B′ 1(t) INi(t) := Ii(t) − B(t) IOPi(t) := INi(t) + IOi(t) ITPi(t) := INi(t) + ITi(t) si Echelon base stock level. IOPi(t) ≡ si hi := h′ i − h′ i+1 @nobo0409 (Nishinari lab.) Applied Mathematics of Logistics July 9, 2019 12 / 54
  • 13.
    The conservation of inventory flow
      $IN_i(t + L'_i) = ITP_i(t) - D(t, t + L'_i]$ (13)
    If stage i+1 has enough stock, stage i receives the full amount it ordered, so that $ITP_i(t) = IOP_i(t) = s_i$; otherwise it is limited to the net inventory $IN_{i+1}(t)$. That is,
      $ITP_i(t) = \min\{s_i, IN_{i+1}(t)\}$. (14)
    This gives the equilibrium solution.
    The equilibrium solution of the echelon inventory
      $IOP_n = s_n$, (15)
      $IN_i = ITP_i - D_i$, (16)
      $ITP_i = \min\{s_i, IN_{i+1}\}$. (17)
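The equilibrium relations (15)-(17) can be evaluated stage by stage, starting from the most upstream stage n. The sketch below is an illustration, not the author's implementation: the function name `echelon_state` is hypothetical, position i-1 of each list holds stage i, and it is assumed that the external supplier of stage n is never short, so that ITP_n = s_n.

```python
def echelon_state(s, d):
    """Return (ITP, IN) lists for echelon base stock levels s_i and demands D_i.

    s[i-1] = s_i and d[i-1] = D_i (assumed layout).
    """
    n = len(s)
    itp = [0.0] * n
    net = [0.0] * n
    # Assumption: unlimited supply above stage n, so ITP_n = IOP_n = s_n (Eq. 15).
    upstream_net = float("inf")
    for i in range(n - 1, -1, -1):  # stages n, n-1, ..., 1
        itp[i] = min(s[i], upstream_net)  # Eq. (17)
        net[i] = itp[i] - d[i]            # Eq. (16)
        upstream_net = net[i]
    return itp, net


itp_example, net_example = echelon_state([4.0, 10.0], [3.0, 7.0])
```

In the example, stage 2 ends with a net inventory of 3, which caps the inventory transit position of stage 1 at 3 even though its base stock level is 4.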
  • 14.
    The optimization of $s_i$
    The optimization problem
      minimize $E\left[ \sum_{i=1}^{n} h_i IN_i + (b + h'_1) B \right]$ (18)
    We optimize the stages in order from the bottom stage (i = 1) upward. The justification is proved later.
    Notation
      $\underline{C}_i(x)$: the minimum cost up to stage i when $IN_{i+1} = x$.
      $\hat{C}_i(x)$: the minimum cost up to stage i when $IN_i = x$.
      $C_i(y)$: the minimum cost up to stage i when $ITP_i = y$.
  • 15.
    The algorithm to calculate an optimal stock $s^*_i$
    Initialization
      $\underline{C}_0(x) = (b + h'_1)[x]^-$ (19)
    Determine $\hat{C}_i$, $C_i$, $\underline{C}_i$ for $i = 1, \dots, n$
      $\hat{C}_i(x) = h_i x + \underline{C}_{i-1}(x) \;\to\; C_i(y) = E[\hat{C}_i(y - D_i)]$ (20)
      $\underline{C}_i(x) = C_i(\min\{s^*_i, x\})$ (21)
    Optimize $s_i$, $s'_i$
      $s^*_i = \arg\min_{y > 0} C_i(y) \;\to\; s^{-*}_i = \min_{j \ge i} s^*_j$ (22)
      $s'^*_i = s^{-*}_i - s^{-*}_{i-1}$ (23)
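The recursion (19)-(22) can be tried out numerically when demands are discrete and the candidate levels lie on a finite grid. The following is a rough sketch under those assumptions, not the method of the slides: the function name, the dictionary representation of the demand distributions, the grid `y_grid`, and the convention h'_{n+1} = 0 are all hypothetical choices made for illustration.

```python
from functools import lru_cache


def optimal_echelon_base_stocks(h_local, b, demand_pmfs, y_grid):
    """Return [s*_1, ..., s*_n] on the grid y_grid.

    h_local[i-1] = h'_i, demand_pmfs[i-1] = {value: probability} for D_i.
    """
    n = len(h_local)
    # Echelon holding costs h_i = h'_i - h'_{i+1}; h'_{n+1} = 0 is an assumption.
    h = [h_local[i] - (h_local[i + 1] if i + 1 < n else 0.0) for i in range(n)]

    def under(x):
        # Eq. (19): cost function with index 0, (b + h'_1) [x]^-
        return (b + h_local[0]) * max(-x, 0.0)

    s_star = []
    for i in range(n):
        def hat(x, under=under, h_i=h[i]):
            # Eq. (20), left part: h_i x plus the previous stage's cost
            return h_i * x + under(x)

        @lru_cache(maxsize=None)
        def cost(y, hat=hat, pmf=demand_pmfs[i]):
            # Eq. (20), right part: expectation over the demand D_i
            return sum(p * hat(y - d) for d, p in pmf.items())

        s_i = min(y_grid, key=cost)  # Eq. (22), restricted to the grid
        s_star.append(s_i)

        def under(x, cost=cost, s_i=s_i):
            # Eq. (21): cost seen from the next stage upstream
            return cost(min(s_i, x))
    return s_star


uniform = {d: 0.1 for d in range(10)}
single_stage = optimal_echelon_base_stocks([1.0], 3.0, [uniform], range(0, 20))
```

With a single stage the recursion reduces to the newsboy model: for h = 1 and b = 3 the critical ratio is 0.75, and the smallest grid point whose cumulative probability reaches it is 7.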
  • 16.
    Dynamic programming
  • 17.
    Dynamic programming: Dynamic systems and dynamic programming
  • 18.
    General structure of finite horizon optimal control problems [2]
    The model has two principal features:
      1 a discrete-time dynamic system,
      2 a cost function that is additive over time.
    [2] Dimitri P. Bertsekas (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol. 1. 4th edition.
  • 19.
    Discrete-time dynamic system
    The system has the form
      $x_{t+1} = f_t(x_t, u_t, w_t), \quad t = 0, 1, \dots, T - 1$, (24)
    where
      t indexes discrete time,
      $x_t \in S_t$ is the state of the system and summarizes past information that is relevant for future optimization,
      $u_t \in U_t(x_t) \subset C_t$ is the control or decision variable to be selected at time t,
      $w_t \in W_t$ is a random parameter with distribution $\Pr[\,\cdot \mid x_t, u_t]$,
      $f_t$ is a function that describes the system, in particular the mechanism by which the state is updated.
  • 20.
    The optimization problem
    The cost function, denoted by $g_t$, accumulates over time. We therefore formulate the problem as the minimization of the expected cost
      minimize $E\left[ g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, u_t, w_t) \right]$. (25)
    Definition
    A policy is a sequence of functions
      $\pi = \{\mu_t\}_{t=0}^{T-1} = \{\mu_0, \dots, \mu_{T-1}\}$, (26)
    where
      $\mu_t : S_t \ni x_t \mapsto \mu_t(x_t) \in C_t, \quad t = 0, \dots, T - 1$. (27)
  • 21.
    Definition
    A policy $\pi$ is called admissible if $\mu_t(x_t) \in U_t(x_t)$ for all $x_t \in S_t$ and all t. The set of all admissible policies is denoted by $\Pi$.
    Given $x_0$ and an admissible policy $\pi = \{\mu_t\}_{t=0}^{T-1}$, the $x_t$ and $w_t$ are random variables with distributions defined through the system equation
      $x_{t+1} = f_t(x_t, \mu_t(x_t), w_t), \quad t = 0, \dots, T - 1$. (28)
    Thus, the expected cost of $\pi$ starting at $x_0$ is
      $J_\pi(x_0) = E_{x_1, \dots, x_T,\, w_0, \dots, w_{T-1}}\left[ g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, \mu_t(x_t), w_t) \right]$. (29)
    An optimal policy $\pi^*$ is one that minimizes this cost, i.e.,
      $J_{\pi^*}(x_0) = \min_{\pi \in \Pi} J_\pi(x_0)$. (30)
    This optimal cost is denoted by $J^*(x_0)$.
  • 22.
    Dynamic programming: The DP algorithm
  • 23.
    Principle of optimality
    Theorem
    For every initial state $x_0$, the optimal cost $J^*(x_0)$ of the basic problem is equal to $J_0(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from period $T - 1$ to period 0:
      $J_T(x_T) = g_T(x_T)$, (31)
      $J_t(x_t) = \min_{u_t \in U_t(x_t)} E_{w_t}\left[ g_t(x_t, u_t, w_t) + J_{t+1}(f_t(x_t, u_t, w_t)) \right], \quad t = 0, \dots, T - 1$, (32)
    where the expectation is taken with respect to the probability distribution of $w_t$, which depends on $x_t$ and $u_t$. Furthermore, if $u^*_t = \mu^*_t(x_t)$ minimizes the right side of Eq. (32) for each $x_t$ and t, the policy $\pi^* = \{\mu^*_t\}_{t=0}^{T-1}$ is optimal.
    Proof. See the Appendix.
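For finite state, control and disturbance sets, the backward recursion (31)-(32) can be written as a short Python function. This is a minimal sketch under assumptions that are not in the slides: the state set is the same at every period, the disturbance distribution is supplied as a dictionary {w: probability}, and all argument names (`controls`, `f`, `g`, `g_T`, `noise_pmf`) are hypothetical.

```python
def backward_dp(T, states, controls, f, g, g_T, noise_pmf):
    """Return (J_0, policy) with policy[t][x] the minimizing control at (t, x)."""
    J = {x: g_T(x) for x in states}  # Eq. (31)
    policy = []
    for t in range(T - 1, -1, -1):   # periods T-1, ..., 0
        J_t, mu_t = {}, {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls(t, x):
                # Eq. (32): expected stage cost plus cost-to-go
                val = sum(
                    prob * (g(t, x, u, w) + J[f(t, x, u, w)])
                    for w, prob in noise_pmf(t, x, u).items()
                )
                if val < best_val:
                    best_u, best_val = u, val
            J_t[x], mu_t[x] = best_val, best_u
        J = J_t
        policy.insert(0, mu_t)
    return J, policy


# Toy problem: raise a level from x to 2; each unit step costs 1,
# and every unit still missing at the end costs 3.
J0, mu = backward_dp(
    T=2,
    states=[0, 1, 2],
    controls=lambda t, x: [0, 1],
    f=lambda t, x, u, w: min(x + u, 2),
    g=lambda t, x, u, w: float(u),
    g_T=lambda x: 3.0 * (2 - x),
    noise_pmf=lambda t, x, u: {0: 1.0},
)
```

In the toy problem the disturbance is degenerate, so the result can be checked by hand: starting from 0 it is cheapest to step up in both periods, for a total cost of 2.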
  • 24.
    Dynamic programming: Optimal stopping problems
  • 25.
    Optimal stopping problems
      $w_t$: the asset value at time t,
      $x_t$: the state, either $\top$ (already sold) or the current asset value (not yet sold),
      $u_t \in \{\text{sell}, \text{not sell}\}$.
    The system of the optimal stopping problem is
      $x_{t+1} = \begin{cases} \top & x_t = \top \text{ or } u_t = \text{sell}, \\ w_t & \text{otherwise}, \end{cases} \quad t = 0, \dots, T - 1$, (33)-(34)
    and the expected gross profit is given by Eq. (29), where
      $g_T(x_T) = \begin{cases} x_T & x_T \ne \top, \\ 0 & \text{otherwise}, \end{cases}$ (35)-(36)
  • 26.
    and
      $g_t(x_t, u_t, w_t) = \begin{cases} (1 + r)^{T-t} x_t & x_t \ne \top \text{ and } u_t = \text{sell}, \\ 0 & \text{otherwise}. \end{cases}$ (37)-(38)
    Here r is the interest rate per period earned on the proceeds of the sale.
  • 27.
    DP algorithm of the optimal stopping problem
    Initialization
      $J_T(x_T) = \begin{cases} x_T & x_T \ne \top, \\ 0 & \text{otherwise}. \end{cases}$ (39)-(40)
    Iteration
      $J_t(x_t) = \begin{cases} \max\left\{ (1 + r)^{T-t} x_t, \; E[J_{t+1}(w_t)] \right\} & x_t \ne \top, \\ 0 & \text{otherwise}, \end{cases} \quad t = T - 1, \dots, 0$. (41)-(42)
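The iteration (39)-(42) only needs the continuation value E[J_{t+1}(w_t)] at each period, which suggests the following sketch. It assumes, beyond what the slides state, that the asset values w_t are independent and identically distributed with a known discrete distribution; the function name and the dictionary `offer_pmf` are hypothetical.

```python
def asset_selling(T, r, offer_pmf):
    """Return (continuation, thresholds) for t = 0, ..., T-1.

    continuation[t] = E[J_{t+1}(w_t)]; selling at time t is optimal when
    x_t >= thresholds[t] = continuation[t] / (1 + r)^(T - t).
    """
    continuation = [0.0] * T
    # E[J_T(w)] = E[w], since J_T(x) = x for an unsold asset (Eq. 39)
    expected_next = sum(p * w for w, p in offer_pmf.items())
    for t in range(T - 1, -1, -1):
        continuation[t] = expected_next
        growth = (1.0 + r) ** (T - t)
        # E[J_t(w)] with J_t(x) = max{(1 + r)^(T-t) x, E[J_{t+1}(w_t)]} (Eq. 41)
        expected_next = sum(
            p * max(growth * w, continuation[t]) for w, p in offer_pmf.items()
        )
    thresholds = [continuation[t] / (1.0 + r) ** (T - t) for t in range(T)]
    return continuation, thresholds


cont, thr = asset_selling(T=2, r=0.0, offer_pmf={1.0: 0.5, 3.0: 0.5})
```

With two periods, no interest, and values 1 or 3 with equal probability, waiting at the last decision point is worth 2 on average, and waiting one period earlier is worth 2.5, so the selling threshold falls as the deadline approaches.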
  • 28.
    Dynamic programming: Deterministic dynamic programming
  • 29.
    Finite discrete-time deterministic dynamic programming problem
      • $S_t$, $t = 0, \dots, T$, are finite.
      • $w_t$ can be eliminated.
    Thus, the problem reduces to a shortest path problem [3].
    [3] http://web.mit.edu/15.053/www/AMP-Chapter-11.pdf
  • 30.
    Backward dynamic programming algorithm
    Initialization
      $J_T(i) = c^T_{i,n+1}, \quad \forall i \in S_T$ (43)
    Iteration
      $J_t(i) = \min_{j \in S_{t+1}} \left[ c^t_{ij} + J_{t+1}(j) \right], \quad \forall i \in S_t$ (44)
    Optimal value
      $J^* = \min_{j \in S_0} J_0(j)$ (45)
  • 31.
    Forward dynamic programming algorithm
    Initialization
      $J_0(i) = 0, \quad \forall i \in S_0$ (46)
    Iteration
      $J_t(j) = \min_{i \in S_{t-1}} \left[ c^{t-1}_{ij} + J_{t-1}(i) \right], \quad \forall j \in S_t$ (47)
    Optimal value
      $J^* = \min_{i \in S_T} \left[ c^T_{i,n+1} + J_T(i) \right]$ (48)
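The backward recursion (43)-(45) and the forward recursion (46)-(48) should return the same optimal value, which is easy to check on a small layered graph. The sketch below assumes a hypothetical data layout, not one given in the slides: `costs[t][i][j]` holds $c^t_{ij}$, `terminal[i]` holds $c^T_{i,n+1}$, and there is at least one transition (T ≥ 1).

```python
def backward_shortest_path(costs, terminal):
    """Optimal value via Eqs. (43)-(45)."""
    J = list(terminal)  # J_T(i) = c^T_{i,n+1}
    for c_t in reversed(costs):  # t = T-1, ..., 0
        J = [min(c_t[i][j] + J[j] for j in range(len(J))) for i in range(len(c_t))]
    return min(J)


def forward_shortest_path(costs, terminal):
    """Optimal value via Eqs. (46)-(48)."""
    J = [0.0] * len(costs[0])  # J_0(i) = 0
    for c_t in costs:  # transitions from layer t to layer t+1
        n_next = len(c_t[0])
        J = [min(c_t[i][j] + J[i] for i in range(len(J))) for j in range(n_next)]
    return min(terminal[i] + J[i] for i in range(len(J)))


layer_costs = [
    [[1.0, 4.0], [2.0, 1.0]],  # c^0_ij
    [[3.0, 1.0], [1.0, 5.0]],  # c^1_ij
]
terminal_costs = [2.0, 0.0]    # c^2_{i,n+1}
value_backward = backward_shortest_path(layer_costs, terminal_costs)
value_forward = forward_shortest_path(layer_costs, terminal_costs)
```

Both functions return 2.0 for this graph. The backward version also yields the cost-to-go from every state, which is what the stochastic case needs; the forward version is only available because the problem is deterministic.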
  • 32.
    Dynamic programming: Infinite horizon dynamic programming problems
  • 33.
    Infinite horizon dynamic programming problems
    Definition
    The problem aims at minimizing
      $J_\pi(x_0) = \lim_{T \to \infty} E_{w_0, \dots, w_{T-1}}\left[ \sum_{t=0}^{T-1} \alpha^t g_t(x_t, \mu_t(x_t), w_t) \right]$, (49)
    where $\alpha$ is a discount factor, $0 < \alpha \le 1$.
  • 34.
    There are the following three types of problem.
      1 Stochastic shortest path problems ($\alpha = 1$, $\lim_{T \to \infty} c^T_{i,n+1} = 0$).
      2 Discounted dynamic programming problems ($\alpha < 1$, $|g(x, u, w)| < \infty$).
      3 Average cost dynamic programming problems ($J_\pi(x_0) = \lim_{T \to \infty} \frac{1}{T} E_{w_0, \dots, w_{T-1}}\left[ \sum_{t=0}^{T-1} g_t(x_t, \mu_t(x_t), w_t) \right]$).
  • 35.
    Stochastic shortest path problems
    Notation and assumptions
      $p_{ij}(u) = \Pr[x_{t+1} = j \mid x_t = i, u_t = u], \quad i, j = 1, \dots, n$
      $g(i, u) = \sum_{j \in S} p_{ij}(u) g(i, u, j)$
      $p_{0,0}(u) = 1, \quad \forall u \in U$
      $g(0, u) = 0, \quad \forall u \in U$
      $\forall \pi \in \Pi, \; \exists m \in \{1, \dots, n\}: \quad \rho_\pi := \max_{i=1,\dots,n} \Pr[x_m \ne \top \mid x_0 = i, \pi] < 1$
    Here state 0 is the cost-free termination state, also written $\top$.
    Note
    The results to be presented are valid under more general circumstances. Furthermore, we can always use m = n.
  • 36.
    Theorem
      1 Given any initial conditions $J_0(i)$, $i = 1, \dots, n$ (for example $J_0(i) = 0$), the sequence $J_t(i)$ generated by the iteration
          $J_{t+1}(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_t(j) \right], \quad \forall i = 1, \dots, n$, (50)
        converges to the optimal cost $J^*(i)$ for each i.
      2 The optimal costs $J^*(i)$, $i = 1, \dots, n$, satisfy Bellman's equation,
          $J^*(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J^*(j) \right], \quad \forall i = 1, \dots, n$, (51)
        and in fact they are the unique solution of this equation.
  • 37.
    Corollary
      1 For any stationary policy $\mu$, the costs $J_\mu(i)$, $i = 1, \dots, n$, are the unique solution of the equation
          $J_\mu(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i)) J_\mu(j), \quad \forall i = 1, \dots, n$. (52)
        Furthermore, given any initial conditions $J_0(i)$, $i = 1, \dots, n$, the sequence $J_t(i)$ generated by the DP algorithm
          $J_{t+1}(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i)) J_t(j), \quad \forall i = 1, \dots, n$, (53)
        converges to $J_\mu(i)$ for each i.
      2 A stationary policy $\mu$ is optimal if and only if for every state i, $\mu(i)$ attains the minimum in Bellman's equation.
  • 38.
    Computational methods for stochastic shortest path problems
      1 Value iteration
      2 Policy iteration
      3 Linear programming
  • 39.
    Value iteration
    Algorithm of the value iteration
    Initialization
      $J_0(i) = 0, \quad \forall i = 1, \dots, n$. (54)
    Iteration
      $J_{t+1}(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_t(j) \right]$ (55)
    Note:
      • In general, infinitely many iterations are needed for exact convergence.
      • The convergence rate is of order $\rho^K$.
      • $J_{t+1}(j) + (N^*(j) - 1)\underline{c}_t \le J^*(j) \le J_{\mu_t}(j) \le J_{t+1}(j) + (N_t(j) - 1)\bar{c}_t$ (Appendix)
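A direct transcription of (54)-(55) into Python might look as follows. It is a sketch under assumptions that are not part of the slides: the non-terminal states are numbered 0 to n-1 in the code, the termination state is left implicit (its cost-to-go is zero, so it drops out of the sum), and the stopping tolerance `tol` and the cap `max_iter` are hypothetical additions, since the exact algorithm never stops.

```python
def value_iteration(n, controls, p, g, tol=1e-10, max_iter=100000):
    """Approximate J* for a stochastic shortest path problem.

    controls(i) lists the admissible controls, p(i, u, j) is the transition
    probability between non-terminal states, g(i, u) the expected stage cost.
    """
    J = [0.0] * n  # Eq. (54)
    for _ in range(max_iter):
        # Eq. (55): one Bellman update for every state
        J_new = [
            min(g(i, u) + sum(p(i, u, j) * J[j] for j in range(n)) for u in controls(i))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(J, J_new)) < tol:
            return J_new
        J = J_new
    return J


# One state, two controls: u = 0 costs 1 and terminates with probability 0.5,
# u = 1 costs 3 and terminates at once.
J_star = value_iteration(
    n=1,
    controls=lambda i: [0, 1],
    p=lambda i, u, j: 0.5 if u == 0 else 0.0,
    g=lambda i, u: 1.0 if u == 0 else 3.0,
)
```

For this toy problem Bellman's equation reads J = min{1 + 0.5 J, 3}, whose solution is J = 2, and the iterates 1, 1.5, 1.75, ... approach it geometrically.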
  • 40.
    Policy iteration
    Algorithm of the policy iteration
    Initialization
      $J_0(i) = 0, \quad \forall i = 1, \dots, n$; $\mu_0$: a stationary admissible policy.
    Iteration (repeat until $J_{\mu_{t+1}}(i) = J_{\mu_t}(i)$ for all i)
      • Policy evaluation step: solve the linear system below for J and set $J_{\mu_t}(i) \leftarrow J(i)$,
          $J(i) = g(i, \mu_t(i)) + \sum_{j=1}^{n} p_{ij}(\mu_t(i)) J(j), \quad \forall i = 1, \dots, n$ (56)
      • Policy improvement step
          $\mu_{t+1}(i) = \arg\min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_{\mu_t}(j) \right], \quad \forall i = 1, \dots, n$ (57)
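The two steps (56)-(57) can be sketched as below. Note one substitution: instead of solving the linear system (56) exactly, this sketch evaluates a policy by repeated application of its own DP iteration until the change falls under a tolerance. The function names, the state numbering 0 to n-1, the implicit termination state, and the tolerances are all assumptions made for illustration.

```python
def evaluate_policy(n, mu, p, g, tol=1e-12, max_iter=100000):
    """Approximate J_mu by fixed-point iteration on Eq. (56)."""
    J = [0.0] * n
    for _ in range(max_iter):
        J_new = [
            g(i, mu[i]) + sum(p(i, mu[i], j) * J[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(J, J_new)) < tol:
            return J_new
        J = J_new
    return J


def policy_iteration(n, controls, p, g, mu0):
    """Return (mu, J_mu) once the improvement step no longer changes mu."""
    mu = list(mu0)
    while True:
        J = evaluate_policy(n, mu, p, g)  # policy evaluation step
        new_mu = []
        for i in range(n):
            def q(u):
                return g(i, u) + sum(p(i, u, j) * J[j] for j in range(n))

            best = min(controls(i), key=q)  # policy improvement step, Eq. (57)
            if q(mu[i]) <= q(best) + 1e-9:
                best = mu[i]  # keep the current control on ties
            new_mu.append(best)
        if new_mu == mu:
            return mu, J
        mu = new_mu


mu_opt, J_opt = policy_iteration(
    n=1,
    controls=lambda i: [0, 1],
    p=lambda i, u, j: 0.5 if u == 0 else 0.0,
    g=lambda i, u: 1.0 if u == 0 else 3.0,
    mu0=[1],
)
```

Starting from the policy that always pays 3 to terminate, one improvement step switches to the cheaper control, and the next pass confirms that no further change is possible. Keeping the current control on ties avoids cycling between equally good policies.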
  • 41.
    Finite termination of the policy iteration
    Under the assumption stated for stochastic shortest path problems, the algorithm generates an improving sequence of policies [i.e. $J_{\mu_{t+1}}(i) \le J_{\mu_t}(i)$, $\forall i$, $\forall t$] and terminates after finitely many iterations (because S and U are finite sets) with an optimal policy.
  • 42.
    Linear programming
    If $J_0(i) \le \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_0(j) \right]$ for all i, then value iteration gives $J_t(i) \le J_{t+1}(i)$, and hence $J_0(i) \le J^*(i)$. Thus $J^*$ is the "largest" J that satisfies the constraint
      $J(i) \le g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J(j), \quad \forall i = 1, \dots, n, \; u \in U(i)$. (58)
    In particular, $J^*(i)$, $i = 1, \dots, n$, solve the linear program of maximizing $\sum_{i=1}^{n} J(i)$ subject to this constraint.
  • 43.
    Discounted dynamic programming problems
    This problem can be converted to a stochastic shortest path problem by replacing $p_{ij}(u)$ with $\alpha p_{ij}(u)$; the remaining probability $1 - \alpha$ is assigned to the transition to the termination state.
  • 44.
    Average cost dynamic programming problems
    The problem is essentially equivalent to a stochastic shortest path problem.
      1 The optimal average cost is independent of the initial state.
      2 Bellman's equation takes the form
          $\lambda + h(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) h(j) \right]$, (59)
        where $h(n) = 0$ and $\lambda$ is the optimal average cost.
      3 There are average cost versions of the iteration algorithms.
  • 45.
    Optimality of the multi inventory system
  • 46.
    Optimality of the multi inventory system
    Notation
      t: time
      $I_t$: net inventory ($x_t$)
      $q_t$: order quantity ($u_t$)
      $D_t$: demand ($w_t$)
  • 47.
    Inventory control
    System
      $I_{t+1} = I_t + q_t - D_t$
    Total expected cost
      $E\left[ \sum_{t=0}^{T-1} \left( h \max\{I_t + q_t - D_t, 0\} + b \max\{D_t - I_t - q_t, 0\} \right) + C(I_T) \right]$ (60)
    DP algorithm
      $J_T(I_T) = C(I_T)$, (61)
      $J_t(I_t) = \min_{q_t \ge 0} \left[ H_t(I_t + q_t) + E[J_{t+1}(I_t + q_t - D_t)] \right]$, (62)
    where $H_t(y) := E\left[ h \max\{y - D_t, 0\} + b \max\{D_t - y, 0\} \right]$ is the expected one-period holding and shortage cost.
  • 48.
    With $y_t := I_t + q_t$, we can rewrite Eq. (62) as
      $J_t(I_t) = \min_{y_t \ge I_t} \left[ H_t(y_t) + E[J_{t+1}(y_t - D_t)] \right] = \min_{y_t \ge I_t} g_t(y_t)$. (63)
    Optimal policies and the cost
    Optimal policy
      $\mu^*_t(I_t) = \begin{cases} S_t - I_t & (I_t < S_t), \\ 0 & (\text{otherwise}), \end{cases} \quad S_t := \arg\min_{y \in \mathbb{R}} g_t(y)$ (64)-(65)
    Optimal cost
      $J_t(I_t) = \begin{cases} H_t(S_t) + E[J_{t+1}(S_t - D_t)] & (I_t < S_t), \\ H_t(I_t) + E[J_{t+1}(I_t - D_t)] & (\text{otherwise}). \end{cases}$ (66)-(67)
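The base stock levels $S_t$ in (63)-(65) can be computed numerically for integer inventories. The sketch below rests on several assumptions that the slides do not make: demand is i.i.d. with a discrete distribution given as a dictionary, the terminal cost C is zero, order-up-to levels are searched on the integers 0 to `y_max`, and the function name is hypothetical.

```python
from functools import lru_cache


def base_stock_levels(T, h, b, demand_pmf, y_max):
    """Return ([S_0, ..., S_{T-1}], J) with J(t, I) the optimal cost-to-go."""

    def H(y):
        # Expected one-period holding and shortage cost
        return sum(
            p * (h * max(y - d, 0) + b * max(d - y, 0)) for d, p in demand_pmf.items()
        )

    @lru_cache(maxsize=None)
    def g(t, y):
        # g_t(y) = H(y) + E[J_{t+1}(y - D_t)], as in Eq. (63)
        return H(y) + sum(p * J(t + 1, y - d) for d, p in demand_pmf.items())

    @lru_cache(maxsize=None)
    def J(t, inventory):
        if t == T:
            return 0.0  # assumption: zero terminal cost
        # Order-up-to levels y >= current inventory, capped at y_max
        return min(g(t, y) for y in range(inventory, max(inventory, y_max) + 1))

    levels = [min(range(0, y_max + 1), key=lambda y, t=t: g(t, y)) for t in range(T)]
    return levels, J


uniform_demand = {d: 0.1 for d in range(10)}
levels, cost_to_go = base_stock_levels(T=2, h=1.0, b=3.0, demand_pmf=uniform_demand, y_max=12)
```

Because there is no ordering cost and demand is stationary in this example, both periods end up with the newsboy level 7, the smallest integer whose cumulative probability reaches b/(b + h) = 0.75.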
  • 49.
    Positive fixed cost
    When the ordering cost is
      $C(q_t) = \begin{cases} K & (q_t > 0), \\ 0 & (\text{otherwise}), \end{cases}$ (68)-(69)
    the DP algorithm becomes
      $J_T(I_T) = C(I_T)$,
      $J_t(I_t) = \min\left\{ H_t(I_t) + E[J_{t+1}(I_t - D_t)], \; \min_{y_t \ge I_t} \left[ K + H_t(y_t) + E[J_{t+1}(y_t - D_t)] \right] \right\}$.
  • 50.
    Note: (s, S) policies
    With $s_t := \min\{ y : g_t(y) = K + g_t(S_t) \}$,
      $\mu^*_t(I_t) = \begin{cases} S_t - I_t & (I_t < s_t), \\ 0 & (I_t \ge s_t) \end{cases}$ (70)-(71)
    is the optimal policy.
  • 51.
    Stochastic inventory model considering prices
  • 52.
    Stochastic inventory model considering prices
    Notation
      h: inventory cost
      c: order cost
      b: backorder cost
      p: price (u)
      D: demand (w), $D(p, \varepsilon) = y(p) + \varepsilon$, $y(p) = -a(p - P_0) + D_0$, $\varepsilon \sim F$
      s: amount of order, $z = s - y(p)$
      R: revenue,
        $R(z, p) = \begin{cases} p(y(p) + \varepsilon) - c(y(p) + z) - h(z - \varepsilon) & \varepsilon \le z, \\ p(y(p) + z) - c(y(p) + z) - b(\varepsilon - z) & \varepsilon > z. \end{cases}$ (72)-(73)
  • 53.
    Optimal z, p
    Optimal $z = z^*$
      $F(z^*) = \frac{p + b - c}{h + p + b}$ (74)
    Optimal price $p = p^*$ (f is the density of F; this form corresponds to $E[\varepsilon] = 0$)
      $p^* = \frac{a P_0 + D_0 + a c - \int_z^\infty (x - z) f(x)\,dx}{2a}$ (75)
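Formulas (74) and (75) become explicit once a distribution for the noise is fixed. The sketch below assumes, purely for illustration, that ε is uniform on [-w, w] (so its mean is zero); then F(z) = (z + w) / (2w) and the integral in (75) equals (w - z)^2 / (4w). The function names and the parameter `w` are hypothetical.

```python
def optimal_z_uniform(p, c, h, b, w):
    """Solve Eq. (74) for z when epsilon is uniform on [-w, w]."""
    ratio = (p + b - c) / (h + p + b)
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("critical ratio must lie in [0, 1]")
    return -w + 2.0 * w * ratio  # inverse of F(z) = (z + w) / (2w)


def expected_shortage_uniform(z, w):
    """Integral of (x - z) f(x) over x > z for epsilon uniform on [-w, w]."""
    if not -w <= z <= w:
        raise ValueError("z must lie in [-w, w]")
    return (w - z) ** 2 / (4.0 * w)


def optimal_price(z, a, P0, D0, c, w):
    """Evaluate Eq. (75) for a given z."""
    return (a * P0 + D0 + a * c - expected_shortage_uniform(z, w)) / (2.0 * a)


z_star = optimal_z_uniform(p=6.0, c=2.0, h=1.0, b=1.0, w=10.0)
p_star = optimal_price(z_star, a=2.0, P0=5.0, D0=20.0, c=2.0, w=10.0)
```

With these numbers the critical ratio is 5/8, giving z = 2.5, and the price formula then evaluates to about 8.15. Since (74) depends on p and (75) depends on z, a joint optimum would require solving the two equations together, for example by alternating between them; that step is not shown here.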
  • 54.
    References
      Mikio Kubo (2007). Mathematics of Logistics (ロジスティクスの数理). Kyoritsu Shuppan.
      Dimitri P. Bertsekas (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol. 1. 4th edition.