Inventory theory presentation

Applied Mathematics of Logistics
-Mathematics of inventory-
@nobo0409
The University of Tokyo
Nishinari Laboratory, B4
July 9, 2019

Contents
1 Newsboy model
2 Optimal policies for a multi-echelon inventory system
Real stock model
Echelon inventory model
3 Dynamic programming
Dynamic systems and dynamic programming
The DP algorithm
Optimal stopping problems
Deterministic dynamic programming
Infinite horizon dynamic programming problems
Stochastic shortest path problems
Discounted dynamic programming problems
Average cost dynamic programming problems
4 Optimality of the multi inventory system
5 Stochastic inventory model considering prices

Newsboy model
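• A newsboy sells only one kind of newspaper.
• The inventory cost $h$ and the shortage cost $b$ are constant.
• The demand $D \ge 0$ is a random variable with distribution function $F$.
• The stock level $s \ge 0$ is chosen to minimize the expected cost
$$C(s) = \mathbb{E}\left[ h\,[s - D]^+ + b\,[s - D]^- \right], \tag{1}$$
where $[x]^+ = \max\{x, 0\}$ and $[x]^- = \max\{-x, 0\}$.
Optimal $s = s^*$:
$$F(s^*) = \Pr[D \le s^*] = \frac{b}{b + h} =: \omega \quad \text{(critical ratio)} \tag{2}$$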
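The critical ratio in Eq. (2) can be checked numerically. The sketch below assumes a discrete demand distribution (a hypothetical uniform demand on {0, ..., 9} with $h = 1$, $b = 3$) and compares the smallest $s$ with $F(s) \ge \omega$ against a brute-force minimization of Eq. (1). The function names are illustrative only.

```python
def expected_cost(s, pmf, h, b):
    """C(s) = E[h*[s-D]^+ + b*[s-D]^-] for a discrete demand pmf {d: prob}."""
    return sum(p * (h * max(s - d, 0) + b * max(d - s, 0)) for d, p in pmf.items())


def critical_ratio_stock(pmf, h, b):
    """Smallest demand value s with F(s) >= b / (b + h)."""
    omega = b / (b + h)
    cdf = 0.0
    for d in sorted(pmf):
        cdf += pmf[d]
        if cdf >= omega - 1e-12:   # small tolerance for floating-point sums
            return d
    return max(pmf)


# Hypothetical example data (not from the slides).
pmf = {d: 0.1 for d in range(10)}
h, b = 1.0, 3.0
s_star = critical_ratio_stock(pmf, h, b)
s_brute = min(range(10), key=lambda s: expected_cost(s, pmf, h, b))
print(s_star, s_brute)
```

For a discrete demand the equality in Eq. (2) generally cannot hold exactly, which is why the sketch uses the smallest $s$ whose cumulative probability reaches $\omega$.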

Optimal policies for a multi-echelon inventory system
Setup [1]
Multi inventory model
• Time $t \ge 0$ is continuous.
• The demand process is stationary.
• We can order any number of newspapers at any time.
[1] Clark, Andrew J. & Scarf, Herbert (2004). Optimal Policies for a Multi-Echelon Inventory Problem. Management Science, 50, 1782-1790.

Optimal policies for a multi-echelon inventory system Real stock model

Notation
$I'_i(t)$  Real (on-hand) inventory at stage $i$.
$B'_i(t)$  Backorders at stage $i$.
$\mathit{IN}'_i(t)$  Net inventory.
$\mathit{IO}_i(t)$  Inventory on order.
$\mathit{IT}_i(t)$  Inventory in transit.
$\mathit{IOP}'_i(t)$  Inventory ordering position.
$\mathit{ITP}'_i(t)$  Inventory transit position.
$L'_i$  Lead time.
$D(s, t]$  Demand during $(s, t]$ (random variable).
$s'_i$  Base stock level.
$b$  Backorder cost per unit time.
$h'_i$  Inventory cost per unit time at stage $i$.

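By definition, it follows that
$$\mathit{IN}'_i(t) = I'_i(t) - B'_i(t), \quad I'_i(t) = [\mathit{IN}'_i(t)]^+, \quad B'_i(t) = [\mathit{IN}'_i(t)]^-, \tag{3}$$
$$\mathit{IO}_i(t) - \mathit{IT}_i(t) = B'_{i+1}(t), \tag{4}$$
$$\mathit{IOP}'_i(t) = \mathit{IN}'_i(t) + \mathit{IO}_i(t), \tag{5}$$
$$\mathit{ITP}'_i(t) = \mathit{IN}'_i(t) + \mathit{IT}_i(t), \tag{6}$$
$$\mathit{IOP}'_i(t) - \mathit{ITP}'_i(t) = B'_{i+1}(t), \tag{7}$$
$$\mathit{IOP}'_i(t) = s'_i. \tag{8}$$
Moreover, inventory flow is conserved:
$$\mathit{IN}'_i(t + L'_i) = \mathit{ITP}'_i(t) - D(t, t + L'_i]. \tag{9}$$
Substituting (7) and (8) into (9) gives
$$\mathit{IN}'_i(t + L'_i) = s'_i - B'_{i+1}(t) - D(t, t + L'_i]. \tag{10}$$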

Since the demand process is stationary, the distribution of $D(t, t + L'_i]$ does not depend on $t$, so we write $D(t, t + L'_i] = D_i$.
The method to calculate the backorders $B'_i$
By (10) and $B'_i(t) = [\mathit{IN}'_i(t)]^-$, we obtain the following recursion:
$$B'_{n+1} = 0, \tag{11}$$
$$B'_i = [s'_i - B'_{i+1} - D_i]^-, \quad i = n, n - 1, \dots, 1. \tag{12}$$
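The recursion (11)-(12) can be sketched for one realization of the lead-time demands. The function name and the two-stage numbers below are hypothetical, chosen only to illustrate the order of computation from stage $n$ down to stage 1.

```python
def backorders(base_stock, demand):
    """Apply B'_{n+1} = 0 and B'_i = [s'_i - B'_{i+1} - D_i]^- for i = n, ..., 1.

    base_stock[i-1] holds s'_i and demand[i-1] holds one realization of D_i.
    Returns [B'_1, ..., B'_n].
    """
    n = len(base_stock)
    B = [0.0] * (n + 2)                      # B[n+1] = 0, Eq. (11)
    for i in range(n, 0, -1):
        net = base_stock[i - 1] - B[i + 1] - demand[i - 1]   # IN'_i from Eq. (10)
        B[i] = max(-net, 0.0)                # [x]^- = max(-x, 0), Eq. (12)
    return B[1:n + 1]


# Hypothetical two-stage example: s' = (5, 8), realized demands D = (7, 10).
print(backorders([5, 8], [7, 10]))  # [4.0, 2.0]
```

Averaging the result over many sampled demand vectors would give a Monte Carlo estimate of the expected backorders; that step is left out here.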

Optimal policies for a multi-echelon inventory system Echelon inventory model

Echelon inventory model
Definition
The echelon inventory of a stage is the total inventory between that stage in the supply chain and the final customer.
Redefined notation
$$I_i(t) := I'_i(t) + \sum_{j=1}^{i-1} \left\{ \mathit{IT}_j(t) + I'_j(t) \right\}$$
$$B(t) := B'_1(t)$$
$$\mathit{IN}_i(t) := I_i(t) - B(t)$$
$$\mathit{IOP}_i(t) := \mathit{IN}_i(t) + \mathit{IO}_i(t)$$
$$\mathit{ITP}_i(t) := \mathit{IN}_i(t) + \mathit{IT}_i(t)$$
$s_i$  Echelon base stock level, $\mathit{IOP}_i(t) \equiv s_i$.
$$h_i := h'_i - h'_{i+1}$$

The conservation of inventory flow.
$$\mathit{IN}_i(t + L'_i) = \mathit{ITP}_i(t) - D(t, t + L'_i] \tag{13}$$
If stage $i + 1$ has enough stock, stage $i$ receives the full amount ordered, so $\mathit{ITP}_i(t) = \mathit{IOP}_i(t) = s_i$; otherwise it receives only the net inventory $\mathit{IN}_{i+1}(t)$. It means that
$$\mathit{ITP}_i(t) = \min\{s_i, \mathit{IN}_{i+1}(t)\}. \tag{14}$$
Then, we have the equilibrium solution.
The equilibrium solution of the echelon inventory.
$$\mathit{IOP}_n = s_n, \tag{15}$$
$$\mathit{IN}_i = \mathit{ITP}_i - D_i, \tag{16}$$
$$\mathit{ITP}_i = \min\{s_i, \mathit{IN}_{i+1}\}. \tag{17}$$

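The optimization of $s_i$
The optimization problem
$$\text{minimize} \quad \mathbb{E}\left[ \sum_{i=1}^{n} h_i\,\mathit{IN}_i + (b + h'_1) B \right] \tag{18}$$
We optimize stage by stage, starting from the most downstream stage. The justification is given later.
Notation
$\underline{C}_i(x)$  The minimum cost up to stage $i$ when $\mathit{IN}_{i+1} = x$.
$\hat{C}_i(x)$  The minimum cost up to stage $i$ when $\mathit{IN}_i = x$.
$\bar{C}_i(y)$  The minimum cost up to stage $i$ when $\mathit{ITP}_i = y$.

The algorithm to calculate an optimal stock $s^*_i$
Initialization
$$\underline{C}_0(x) = (b + h'_1)[x]^- \tag{19}$$
Determine $\hat{C}_i$, $\bar{C}_i$, $\underline{C}_i$
$$\hat{C}_i(x) = h_i x + \underline{C}_{i-1}(x) \;\rightarrow\; \bar{C}_i(y) = \mathbb{E}\left[ \hat{C}_i(y - D_i) \right] \tag{20}$$
$$\underline{C}_i(x) = \bar{C}_i(\min\{s^*_i, x\}) \tag{21}$$
Optimize $s_i$, $s'_i$
$$s^*_i = \arg\min_{y > 0} \bar{C}_i(y) \;\rightarrow\; s^{-*}_i = \min_{j \ge i} s^*_j \tag{22}$$
$$ {s'}^{*}_i = s^{-*}_i - s^{-*}_{i-1} \tag{23}$$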
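The steps (19)-(23) can be sketched on an integer grid. This is a minimal illustration under stated assumptions, not the slides' own implementation: demands $D_i$ are discrete with given probability tables, the search for $s^*_i$ is restricted to $0, \dots, y_{\max}$, and $s^{-*}_0 = 0$ is assumed as the convention in Eq. (23). All function and parameter names are hypothetical.

```python
def clark_scarf(h_ech, b, h1_local, pmfs, y_max):
    """Echelon base stock levels s*_1, ..., s*_n via Eqs. (19)-(22).

    h_ech[i-1] = h_i (echelon cost), h1_local = h'_1,
    pmfs[i-1] = {d: prob} for D_i; y is searched over 0..y_max (an assumption).
    """
    def lower(x):                                   # Eq. (19): cost when IN_1 = x
        return (b + h1_local) * max(-x, 0)

    s_star = []
    for h_i, pmf in zip(h_ech, pmfs):
        def hat(x, h_i=h_i, lower=lower):           # Eq. (20), left part
            return h_i * x + lower(x)

        def bar(y, pmf=pmf, hat=hat):               # Eq. (20), right part
            return sum(p * hat(y - d) for d, p in pmf.items())

        s_i = min(range(y_max + 1), key=bar)        # Eq. (22), left part
        s_star.append(s_i)

        def lower(x, bar=bar, s_i=s_i):             # Eq. (21)
            return bar(min(s_i, x))
    return s_star


def local_base_stocks(s_star):
    """Eqs. (22)-(23): s^{-*}_i = min_{j >= i} s*_j and s'*_i = s^{-*}_i - s^{-*}_{i-1}."""
    n = len(s_star)
    s_min = [min(s_star[i:]) for i in range(n)]
    return [s_min[i] - (s_min[i - 1] if i > 0 else 0) for i in range(n)]


# Hypothetical data: uniform demand on {0, ..., 9} at every stage.
pmf = {d: 0.1 for d in range(10)}
single = clark_scarf([1.0], 3.0, 1.0, [pmf], 20)            # one stage: newsboy case
two = clark_scarf([1.0, 1.0], 3.0, 2.0, [pmf, pmf], 40)     # h' = (2, 1), so h = (1, 1)
print(single, two, local_base_stocks(two))
```

With a single stage the recursion reduces to the newsboy problem, which gives a convenient sanity check against Eq. (2).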

Dynamic programming

Dynamic programming Dynamic systems and dynamic programming

General structure of finite horizon optimal control problems [2]
The model has two principal features:
1 a discrete-time dynamic system,
2 a cost function that is additive over time.
[2] Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol. 1, 4th edition.

Discrete time dynamic system
This system has the form
$$x_{t+1} = f_t(x_t, u_t, w_t), \quad t = 0, 1, \dots, T - 1, \tag{24}$$
where
$t$ indexes discrete time,
$x_t \in S_t$ is the state of the system and summarizes past information that is relevant for future optimization,
$u_t \in U_t(x_t) \subset C_t$ is the control or decision variable to be selected at time $t$,
$w_t \in W_t$ is a random parameter with distribution $\Pr[\,\cdot \mid x_t, u_t]$,
$f_t$ is a function that describes the system and in particular the mechanism by which the state is updated.

The optimization problem
The cost function, denoted by $g_t$, accumulates over time. We therefore formulate the problem as an optimization of the expected cost
$$\text{minimize} \quad \mathbb{E}\left[ g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, u_t, w_t) \right] \tag{25}$$
Definition
The class of policies consists of sequences of functions
$$\pi = \{\mu_t\}_{t=0}^{T-1} = \{\mu_0, \dots, \mu_{T-1}\} \tag{26}$$
where
$$\mu_t : S_t \ni x_t \longmapsto \mu_t(x_t) \in C_t, \quad t = 0, \dots, T - 1. \tag{27}$$

Definition
If $\mu_t(x_t) \in U_t(x_t)$ for all $x_t \in S_t$ and all $t$, the policy $\pi$ is called admissible. The set of all admissible policies is denoted by $\Pi$.
Given $x_0$ and an admissible policy $\pi = \{\mu_t\}_{t=0}^{T-1}$, the $x_t$ and $w_t$ are random variables with distributions defined through the system equation
$$x_{t+1} = f_t(x_t, \mu_t(x_t), w_t), \quad t = 0, \dots, T - 1. \tag{28}$$
Thus, the expected cost of $\pi$ starting at $x_0$ is
$$J_\pi(x_0) = \mathbb{E}_{x_1, \dots, x_T,\ w_0, \dots, w_{T-1}}\left[ g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, \mu_t(x_t), w_t) \right] \tag{29}$$
An optimal policy $\pi^*$ is one that minimizes this cost; i.e.,
$$J_{\pi^*}(x_0) = \min_{\pi \in \Pi} J_\pi(x_0). \tag{30}$$
This optimal cost is denoted by $J^*(x_0)$.

Dynamic programming The DP algorithm
Principle of optimality
Theorem
For every initial state $x_0$, the optimal cost $J^*(x_0)$ of the basic problem is equal to $J_0(x_0)$ given by the last step of the following algorithm, which proceeds backward in time from period $T - 1$ to period $0$:
$$J_T(x_T) = g_T(x_T), \tag{31}$$
$$J_t(x_t) = \min_{u_t \in U_t(x_t)} \mathbb{E}_{w_t}\left[ g_t(x_t, u_t, w_t) + J_{t+1}(f_t(x_t, u_t, w_t)) \right], \quad t = 0, \dots, T - 1, \tag{32}$$
where the expectation is taken with respect to the probability distribution of $w_t$, which depends on $x_t$ and $u_t$. Furthermore, if $u^*_t = \mu^*_t(x_t)$ minimizes the right side of Eq. (32) for each $x_t$ and $t$, the policy $\pi^* = \{\mu^*_t\}_{t=0}^{T-1}$ is optimal.
Proof.
Appendix.
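The backward recursion (31)-(32) can be sketched for finite state, control and disturbance sets. The interface below (a single state set shared by all periods, and the callables `controls`, `w_dist`, `f`, `g`, `g_T`) is an assumption made for illustration, and the small stock example is hypothetical.

```python
def backward_dp(T, states, controls, w_dist, f, g, g_T):
    """Finite-horizon DP, Eqs. (31)-(32).

    controls(t, x) -> iterable of admissible u; w_dist(t, x, u) -> {w: prob};
    f(t, x, u, w) -> next state (must lie in `states`); g(t, x, u, w) -> stage cost.
    Returns (J_0, [mu_0, ..., mu_{T-1}]) as dictionaries keyed by state.
    """
    J = {x: g_T(x) for x in states}                     # Eq. (31)
    policy = []
    for t in range(T - 1, -1, -1):                      # t = T-1, ..., 0
        J_t, mu_t = {}, {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls(t, x):
                val = sum(p * (g(t, x, u, w) + J[f(t, x, u, w)])
                          for w, p in w_dist(t, x, u).items())   # Eq. (32)
                if val < best_val:
                    best_u, best_val = u, val
            J_t[x], mu_t[x] = best_val, best_u
        J = J_t
        policy.insert(0, mu_t)
    return J, policy


# Hypothetical example: stock in {0, 1, 2}, order u with x + u <= 2,
# demand 0 or 1 with equal probability, unit order cost 1, lost-sale penalty 4.
states = [0, 1, 2]
controls = lambda t, x: range(0, 3 - x)
w_dist = lambda t, x, u: {0: 0.5, 1: 0.5}
f = lambda t, x, u, w: max(x + u - w, 0)
g = lambda t, x, u, w: u + 4 * max(w - x - u, 0)
J0, policy = backward_dp(2, states, controls, w_dist, f, g, lambda x: 0.0)
print(J0, policy[0])
```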

Dynamic programming Optimal stopping problems

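$w_t$  The asset value (the offer in period $t$).
$x_t$  The state: either the current offer (asset not yet sold) or the termination state $\top$ (asset sold).
$u_t \in \{\text{sell}, \text{not sell}\}$
The system of the optimal stopping problem is
$$x_{t+1} = \begin{cases} \top & x_t = \top \text{ or } u_t = \text{sell}, \\ w_t & \text{otherwise}, \end{cases} \quad t = 0, \dots, T - 1, \tag{33, 34}$$
and the expected gross profit is given by Eq. (29), where
$$g_T(x_T) = \begin{cases} x_T & x_T \ne \top, \\ 0 & \text{otherwise}, \end{cases} \tag{35, 36}$$
and
$$g_t(x_t, u_t, w_t) = \begin{cases} (1 + r)^{T-t} x_t & x_t \ne \top \text{ and } u_t = \text{sell}, \\ 0 & \text{otherwise}. \end{cases} \tag{37, 38}$$
Here $r$ is the interest rate earned on the proceeds until time $T$.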

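DP alg. of the optimal stopping problem
Initialization
$$J_T(x_T) = \begin{cases} x_T & x_T \ne \top, \\ 0 & \text{otherwise}. \end{cases} \tag{39, 40}$$
Iteration
$$J_t(x_t) = \begin{cases} \max\left\{ (1 + r)^{T-t} x_t,\ \mathbb{E}[J_{t+1}(w_t)] \right\} & x_t \ne \top, \\ 0 & \text{otherwise}, \end{cases} \quad t = T - 1, \dots, 0. \tag{41, 42}$$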
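The iteration (41)-(42) implies a threshold rule: at time $t$ it is optimal to sell when $(1 + r)^{T-t} x_t \ge \mathbb{E}[J_{t+1}(w_t)]$. The sketch below computes these thresholds, assuming i.i.d. offers from a finite distribution; the function name and the two-offer example are hypothetical.

```python
def asset_selling(offers, T, r):
    """Thresholds a_0, ..., a_{T-1}: sell at time t iff the current offer x_t >= a_t.

    offers: {value: prob} for the i.i.d. offers w_t (an assumption of this sketch).
    """
    J_next = {w: w for w in offers}                     # J_T(x) = x, Eq. (39)
    thresholds = []
    for t in range(T - 1, -1, -1):
        cont = sum(p * J_next[w] for w, p in offers.items())   # E[J_{t+1}(w_t)]
        growth = (1 + r) ** (T - t)
        thresholds.insert(0, cont / growth)
        J_next = {x: max(growth * x, cont) for x in offers}    # Eq. (41)
    return thresholds


# Hypothetical example: offers 1 or 2 with equal probability, no interest.
print(asset_selling({1: 0.5, 2: 0.5}, T=2, r=0.0))  # [1.75, 1.5]
```

The thresholds decrease as $t$ approaches $T$: with fewer remaining offers, the seller accepts lower values.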

Dynamic programming Deterministic dynamic programming

Finite discrete time deterministic dynamic programming problem
• $S_t$, $t = 0, \dots, T$, are finite.
• $w_t$ can be eliminated.
Thus, the problem reduces to a shortest path problem [3].
[3] http://web.mit.edu/15.053/www/AMP-Chapter-11.pdf

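Backward dynamic programming algorithm
Initialization
$$J_T(i) = c^T_{i,n+1}, \quad \forall i \in S_T \tag{43}$$
Iteration
$$J_t(i) = \min_{j \in S_{t+1}} \left[ c^t_{ij} + J_{t+1}(j) \right], \quad \forall i \in S_t \tag{44}$$
Optimal value
$$J^* = \min_{j \in S_0} \left[ J_0(j) \right] \tag{45}$$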

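Forward dynamic programming algorithm
Initialization
$$J_0(i) = 0, \quad \forall i \in S_0 \tag{46}$$
Iteration
$$J_t(j) = \min_{i \in S_{t-1}} \left[ c^{t-1}_{ij} + J_{t-1}(i) \right], \quad \forall j \in S_t \tag{47}$$
Optimal value
$$J^* = \min_{i \in S_T} \left[ c^T_{i,n+1} + J_T(i) \right] \tag{48}$$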
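Both recursions give the same optimal value on a staged graph. The sketch below assumes the arc costs are stored as nested lists, `costs[t][i][j]` for $c^t_{ij}$ and `terminal[i]` for $c^T_{i,n+1}$, with at least one stage of arcs; this data layout and the small example are hypothetical.

```python
def backward(costs, terminal):
    """Backward DP, Eqs. (43)-(45)."""
    J = list(terminal)                                   # J_T(i) = c^T_{i,n+1}
    for c in reversed(costs):                            # t = T-1, ..., 0
        J = [min(c[i][j] + J[j] for j in range(len(J))) for i in range(len(c))]
    return min(J)                                        # J* = min over S_0


def forward(costs, terminal):
    """Forward DP, Eqs. (46)-(48); assumes costs is non-empty."""
    J = [0.0] * len(costs[0])                            # J_0(i) = 0
    for c in costs:                                      # t = 1, ..., T
        J = [min(c[i][j] + J[i] for i in range(len(c))) for j in range(len(c[0]))]
    return min(terminal[i] + J[i] for i in range(len(J)))


# Hypothetical example with T = 2 and two states per stage.
costs = [[[1, 4], [2, 1]],
         [[3, 1], [1, 5]]]
terminal = [2, 0]
print(backward(costs, terminal), forward(costs, terminal))
```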

Dynamic programming Infinite horizon dynamic programming problems

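Definition
The problem aims at minimizing
$$J_\pi(x_0) = \lim_{T \to \infty} \mathbb{E}_{w_0, \dots, w_{T-1}}\left[ \sum_{t=0}^{T-1} \alpha^t g_t(x_t, \mu_t(x_t), w_t) \right], \tag{49}$$
where $\alpha$ is a discount factor, $0 < \alpha \le 1$.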

There are the following three types of problem.
1 Stochastic shortest path problems ($\alpha = 1$, $\lim_{T \to \infty} c^T_{i,n+1} = 0$).
2 Discounted dynamic programming problems ($\alpha < 1$, $|g(x, u, w)| < \infty$).
3 Average cost dynamic programming problems, with cost
$$J_\pi(x_0) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{w_0, \dots, w_{T-1}}\left[ \sum_{t=0}^{T-1} g_t(x_t, \mu_t(x_t), w_t) \right].$$

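Notation and assumptions
$$p_{ij}(u) = \Pr[x_{t+1} = j \mid x_t = i,\ u_t = u], \quad i, j = 1, \dots, n$$
$$g(i, u) = \sum_{j \in S} p_{ij}(u)\, g(i, u, j)$$
State $0$ (also written $\top$) is a cost-free termination state:
$$p_{00}(u) = 1, \quad g(0, u) = 0, \quad \forall u \in U$$
Termination has positive probability under every policy:
$$\forall \pi \in \Pi,\ \exists m \in \{1, \dots, n\}: \quad \rho_\pi := \max_{i = 1, \dots, n} \Pr[x_m \ne \top \mid x_0 = i, \pi] < 1$$
Note
The results to be presented are valid under more general circumstances. Furthermore, we can always use $m = n$.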

Theorem
1 Given any initial conditions $J_0(i)$, $i = 1, \dots, n$ (for example $J_0(i) = 0$), the sequence $J_t(i)$ generated by the iteration
$$J_{t+1}(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_t(j) \right], \quad \forall i = 1, \dots, n, \tag{50}$$
converges to the optimal cost $J^*(i)$ for each $i$.
2 The optimal costs $J^*(i)$, $i = 1, \dots, n$, satisfy Bellman's equation,
$$J^*(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J^*(j) \right], \quad \forall i = 1, \dots, n, \tag{51}$$
and in fact they are the unique solution of this equation.

Corollary
1 For any stationary policy $\mu$, the costs $J_\mu(i)$, $i = 1, \dots, n$, are the unique solution of the equation
$$J_\mu(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i)) J_\mu(j), \quad \forall i = 1, \dots, n. \tag{52}$$
Furthermore, given any initial conditions $J_0(i)$, $i = 1, \dots, n$, the sequence $J_t(i)$ generated by the DP alg.
$$J_{t+1}(i) = g(i, \mu(i)) + \sum_{j=1}^{n} p_{ij}(\mu(i)) J_t(j), \quad \forall i = 1, \dots, n, \tag{53}$$
converges to $J_\mu(i)$ for each $i$.
2 A stationary policy $\mu$ is optimal if and only if for every state $i$, $\mu(i)$ attains the minimum in Bellman's equation.

Computational method for stochastic shortest path problems
1 Value iteration
2 Policy iteration
3 Linear programming

Value iteration
Alg. of the value iteration
Initialization
$$J_0(i) = 0, \quad \forall i = 1, \dots, n. \tag{54}$$
Iteration
$$J_{t+1}(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_t(j) \right] \tag{55}$$
Note:
• In general, infinitely many iterations are needed.
• The convergence speed is equal to $\rho^K$.
• $J_{t+1}(j) + (N^*(j) - 1)\,\underline{c}_t \le J^*(j) \le J_{\mu_t}(j) \le J_{t+1}(j) + (N_t(j) - 1)\,\overline{c}_t$ (Appendix)
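Value iteration (54)-(55) can be sketched with plain lists. The sketch assumes the same control set in every state, stores `P[u][i][j]` for $p_{ij}(u)$ over the non-terminal states only (rows may sum to less than 1, the missing mass goes to the termination state), and stops when successive iterates differ by less than a tolerance. The two-state example is hypothetical.

```python
def bellman(J, P, g):
    """One application of Eq. (55); returns (new J, minimizing controls)."""
    n = len(J)
    Q = [[g[u][i] + sum(P[u][i][j] * J[j] for j in range(n)) for u in range(len(P))]
         for i in range(n)]
    return [min(q) for q in Q], [q.index(min(q)) for q in Q]


def value_iteration(P, g, tol=1e-10, max_iter=100000):
    """Iterate Eq. (55) from J_0 = 0 until the sup-norm change is below tol."""
    J = [0.0] * len(g[0])                                # Eq. (54)
    mu = [0] * len(J)
    for _ in range(max_iter):
        J_new, mu = bellman(J, P, g)
        if max(abs(a - b) for a, b in zip(J, J_new)) < tol:
            return J_new, mu
        J = J_new
    return J, mu


# Hypothetical example with two non-terminal states and two controls.
# Control 0 ("slow"): state 0 -> state 1 at cost 1; state 1 stays w.p. 0.5, else terminates, cost 1.
# Control 1 ("fast"): terminates immediately, at cost 4 from state 0 and 3 from state 1.
P = [[[0.0, 1.0], [0.0, 0.5]],
     [[0.0, 0.0], [0.0, 0.0]]]
g = [[1.0, 1.0],
     [4.0, 3.0]]
J, mu = value_iteration(P, g)
print(J, mu)
```

In this example the optimal costs are $J^*(1) = 2$ and $J^*(0) = 3$, both attained by the "slow" control, and the iterates approach them geometrically.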

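Policy iteration
Alg. of the policy iteration
Initialization  $J_0(i) = 0$, $\forall i = 1, \dots, n$. $\mu_0$: a stationary admissible policy.
Iteration (repeat until $J_{\mu_{t+1}}(i) = J_{\mu_t}(i)$ for all $i$)
• Policy evaluation step: solve the following linear system for $J$ and set $J_{\mu_t}(i) \leftarrow J(i)$.
$$J(i) = g(i, \mu_t(i)) + \sum_{j=1}^{n} p_{ij}(\mu_t(i)) J(j), \quad \forall i = 1, \dots, n \tag{56}$$
• Policy improvement step
$$\mu_{t+1}(i) = \arg\min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J_{\mu_t}(j) \right], \quad \forall i = 1, \dots, n \tag{57}$$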

Finite termination of the policy iteration
Under the assumptions above, the algorithm generates an improving sequence of policies [i.e. $J_{\mu_{t+1}}(i) \le J_{\mu_t}(i)$, $\forall i, \forall t$] and terminates after finitely many iterations (because $S$ and $U$ are finite sets) with an optimal policy.
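The two steps (56)-(57) can be sketched without external libraries by solving the evaluation system with a small Gaussian elimination. The sketch assumes a common control set in every state and an initial policy under which termination is reached (otherwise the linear system is singular). On ties the current control is kept, which is a design choice made here so that the loop stops. All names and the example data are hypothetical.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def policy_iteration(P, g, mu):
    """P[u][i][j] = p_ij(u), g[u][i] = g(i, u); mu is the initial stationary policy."""
    n = len(mu)
    while True:
        # Policy evaluation, Eq. (56): solve (I - P_mu) J = g_mu.
        A = [[(1.0 if i == j else 0.0) - P[mu[i]][i][j] for j in range(n)]
             for i in range(n)]
        J = solve(A, [g[mu[i]][i] for i in range(n)])
        # Policy improvement, Eq. (57).
        new_mu = []
        for i in range(n):
            q = [g[u][i] + sum(P[u][i][j] * J[j] for j in range(n))
                 for u in range(len(P))]
            best = min(range(len(P)), key=lambda u: q[u])
            new_mu.append(mu[i] if q[mu[i]] <= q[best] + 1e-12 else best)
        if new_mu == mu:
            return J, mu
        mu = new_mu


# Hypothetical example: control 0 moves slowly toward termination, control 1 terminates at once.
P = [[[0.0, 1.0], [0.0, 0.5]],
     [[0.0, 0.0], [0.0, 0.0]]]
g = [[1.0, 1.0],
     [4.0, 3.0]]
J, mu = policy_iteration(P, g, [1, 1])
print(J, mu)
```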

Linear programming
If $J_0(i) \le J_1(i)$ for all $i$, value iteration gives $J_t(i) \le J_{t+1}(i)$ and hence $J_0(i) \le J^*(i)$. Thus $J^*$ is the "largest" $J$ that satisfies the constraint
$$J(i) \le g(i, u) + \sum_{j=1}^{n} p_{ij}(u) J(j), \quad \forall i = 1, \dots, n,\ u \in U(i). \tag{58}$$
In particular, $J^*(i)$, $i = 1, \dots, n$, solve the linear program of maximizing $\sum_{i=1}^{n} J(i)$ subject to this constraint.

A discounted problem can be converted to a stochastic shortest path problem by replacing $p_{ij}(u)$ with $\alpha p_{ij}(u)$; the remaining probability $1 - \alpha$ goes to the termination state.

The average cost problem is essentially equivalent to a stochastic shortest path problem.
1 The optimal average cost is independent of the initial state.
2 Bellman's equation takes the form
$$\lambda + h(i) = \min_{u \in U(i)} \left[ g(i, u) + \sum_{j=1}^{n} p_{ij}(u) h(j) \right] \tag{59}$$
where $h(n) = 0$ and $\lambda$ is the optimal average cost.
3 There are versions of value iteration and policy iteration for this problem.

Optimality of the multi inventory system

Notation
$t$  Time.
$I_t$  Net inventory (the state $x_t$).
$q_t$  Order quantity (the control $u_t$).
$D_t$  Demand (the disturbance $w_t$).

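Inventory control
System
$$I_{t+1} = I_t + q_t - D_t$$
Total expected cost
$$\mathbb{E}\left[ \sum_{t=0}^{T-1} \left( h \max\{I_t + q_t - D_t, 0\} + b \max\{D_t - I_t - q_t, 0\} \right) + C(I_T) \right] \tag{60}$$
DP algorithm
$$J_T(I_T) = C(I_T) \tag{61}$$
$$J_t(I_t) = \min_{q_t \ge 0} \left[ H(I_t + q_t) + \mathbb{E}[J_{t+1}(I_t + q_t - D_t)] \right] \tag{62}$$
where $H(y) = \mathbb{E}\left[ h \max\{y - D_t, 0\} + b \max\{D_t - y, 0\} \right]$ is the expected one-period cost.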

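With $y_t = I_t + q_t$, we can rewrite Eq. (62) as
$$J_t(I_t) = \min_{y_t \ge I_t} \left[ H(y_t) + \mathbb{E}[J_{t+1}(y_t - D_t)] \right] = \min_{y_t \ge I_t} g_t(y_t). \tag{63}$$
Optimal policies and the cost
Optimal policy, with $S_t := \arg\min_{y \in \mathbb{R}} g_t(y)$:
$$\mu^*_t(I_t) = \begin{cases} S_t - I_t & (I_t < S_t) \\ 0 & (\text{otherwise}) \end{cases} \tag{64, 65}$$
Optimal cost
$$J_t(I_t) = \begin{cases} H(S_t) + \mathbb{E}[J_{t+1}(S_t - D_t)] & (I_t < S_t) \\ H(I_t) + \mathbb{E}[J_{t+1}(I_t - D_t)] & (\text{otherwise}) \end{cases} \tag{66, 67}$$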
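The base stock levels $S_t$ in (64)-(67) can be computed by evaluating $g_t$ on an integer grid. The sketch below assumes a discrete i.i.d. demand, a zero terminal cost $C(I_T) = 0$, and a search window $0, \dots, y_{\max}$ for $y$; these choices and all names are assumptions made for illustration.

```python
from functools import lru_cache


def base_stock_levels(pmf, h, b, T, y_max):
    """Return [S_0, ..., S_{T-1}] with S_t = argmin_y g_t(y), y in 0..y_max."""
    def H(y):                                   # expected one-period cost
        return sum(p * (h * max(y - d, 0) + b * max(d - y, 0)) for d, p in pmf.items())

    @lru_cache(maxsize=None)
    def g(t, y):                                # g_t(y) = H(y) + E[J_{t+1}(y - D_t)]
        return H(y) + sum(p * J(t + 1, y - d) for d, p in pmf.items())

    @lru_cache(maxsize=None)
    def J(t, I):                                # Eq. (63), with terminal cost 0 (assumed)
        if t == T:
            return 0.0
        return min(g(t, y) for y in range(I, max(I, y_max) + 1))

    return [min(range(y_max + 1), key=lambda y: g(t, y)) for t in range(T)]


# Hypothetical data: uniform demand on {0, ..., 9}, h = 1, b = 3, three periods.
pmf = {d: 0.1 for d in range(10)}
levels = base_stock_levels(pmf, 1.0, 3.0, 3, 15)
print(levels)
```

With no ordering cost and a zero terminal cost, every period's level coincides with the newsboy solution of Eq. (2) in this example.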

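Positive fixed cost
When ordering incurs the fixed cost
$$C(q_t) = \begin{cases} K & (q_t > 0), \\ 0 & (\text{otherwise}), \end{cases} \tag{68, 69}$$
the DP algorithm becomes
$$J_T(I_T) = C(I_T)$$
$$J_t(I_t) = \min\left\{ H(I_t) + \mathbb{E}[J_{t+1}(I_t - D_t)],\ \min_{y_t \ge I_t} \left[ K + H(y_t) + \mathbb{E}[J_{t+1}(y_t - D_t)] \right] \right\}$$

Note:
$(s, S)$ policies
With
$$s_t := \min\{\, y : g_t(y) = K + g_t(S_t) \,\},$$
the policy
$$\mu^*_t(I_t) = \begin{cases} S_t - I_t & (I_t < s_t) \\ 0 & (I_t \ge s_t) \end{cases} \tag{70, 71}$$
is the optimal policy.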
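The levels $(s_t, S_t)$ can be sketched on an integer grid in the same way. Because equality $g_t(y) = K + g_t(S_t)$ rarely holds exactly on a grid, the sketch takes $s_t$ as the smallest grid point with $g_t(y) \le K + g_t(S_t)$. It also assumes a discrete i.i.d. demand, a zero terminal cost, and a search window $y_{\min}, \dots, y_{\max}$; all names are hypothetical.

```python
from functools import lru_cache


def s_S_levels(pmf, h, b, K, T, y_min, y_max):
    """Return [(s_0, S_0), ..., (s_{T-1}, S_{T-1})] on the grid y_min..y_max."""
    def H(y):                                   # expected one-period cost
        return sum(p * (h * max(y - d, 0) + b * max(d - y, 0)) for d, p in pmf.items())

    @lru_cache(maxsize=None)
    def g(t, y):                                # g_t(y) = H(y) + E[J_{t+1}(y - D_t)]
        return H(y) + sum(p * J(t + 1, y - d) for d, p in pmf.items())

    @lru_cache(maxsize=None)
    def J(t, I):                                # DP with fixed ordering cost K
        if t == T:
            return 0.0                          # terminal cost 0 (assumed)
        no_order = g(t, I)
        order = K + min((g(t, y) for y in range(I + 1, y_max + 1)),
                        default=float("inf"))
        return min(no_order, order)

    levels = []
    for t in range(T):
        S = min(range(y_min, y_max + 1), key=lambda y: g(t, y))
        s = next(y for y in range(y_min, S + 1) if g(t, y) <= K + g(t, S))
        levels.append((s, S))
    return levels


# Hypothetical data: uniform demand on {0, ..., 9}, h = 1, b = 3, one period.
pmf = {d: 0.1 for d in range(10)}
with_fixed_cost = s_S_levels(pmf, 1.0, 3.0, 2.0, 1, -5, 15)
without_fixed_cost = s_S_levels(pmf, 1.0, 3.0, 0.0, 1, -5, 15)
print(with_fixed_cost, without_fixed_cost)
```

With $K = 0$ the two levels coincide and the policy reduces to the base stock policy.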

Stochastic inventory model considering prices

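Notation
$h$: inventory cost.
$c$: order cost.
$b$: backorder cost.
$p$: price (the control $u$).
$D$: demand (the disturbance $w$), $D(p, \varepsilon) = y(p) + \varepsilon$, $y(p) = -a(p - P_0) + D_0$, $\varepsilon \sim F$.
$s$: order quantity.
$z = s - y(p)$
$R$: revenue,
$$R(z, p) = \begin{cases} p(y(p) + \varepsilon) - c(y(p) + z) - h(z - \varepsilon) & \varepsilon \le z \\ p(y(p) + z) - c(y(p) + z) - b(\varepsilon - z) & \varepsilon > z \end{cases} \tag{72, 73}$$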

Optimal $z$, $p$
Optimal $z = z^*$:
$$F(z^*) = \frac{p + b - c}{h + p + b} \tag{74}$$
Optimal price $p = p^*$, where $f$ is the density of $F$:
$$p^* = \frac{a P_0 + D_0 + a c - \int_z^\infty (x - z) f(x)\,dx}{2a} \tag{75}$$
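Equations (74) and (75) follow from the first-order conditions of the expected revenue. The derivation below is a sketch under the assumptions that $F$ has a density $f$ and that $\mathbb{E}[\varepsilon] = 0$; the second assumption is not stated on the slide but is what makes the result agree with (75).

```latex
\begin{align*}
\mathbb{E}[R(z,p)]
  &= (p - c)\,y(p) - c z + p\,\mathbb{E}[\min\{\varepsilon, z\}]
     - h\,\mathbb{E}[z - \varepsilon]^+ - b\,\mathbb{E}[\varepsilon - z]^+ , \\
\frac{\partial}{\partial z}\,\mathbb{E}[R]
  &= (p + b)\bigl(1 - F(z)\bigr) - h F(z) - c = 0
  \;\Longrightarrow\; F(z^*) = \frac{p + b - c}{h + p + b}, \\
\frac{\partial}{\partial p}\,\mathbb{E}[R]
  &= y(p) - a (p - c) + \mathbb{E}[\min\{\varepsilon, z\}] = 0, \\
\mathbb{E}[\min\{\varepsilon, z\}]
  &= \mathbb{E}[\varepsilon] - \int_z^\infty (x - z) f(x)\,dx , \\
2 a p^*
  &= a P_0 + D_0 + a c + \mathbb{E}[\varepsilon] - \int_z^\infty (x - z) f(x)\,dx .
\end{align*}
```

Setting $\mathbb{E}[\varepsilon] = 0$ in the last line and dividing by $2a$ gives Eq. (75).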

References
Kubo, Mikio (2007). ロジスティクスの数理 [Mathematics of Logistics]. Kyoritsu Shuppan.
Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol. 1, 4th edition.
