Dissertation

UNIVERSITY OF EDINBURGH
Call Centre Staffing Problem:
Application in Erlang-A Model
By
Yanran Zhu
Dissertation presented for Honours Degree in Bachelor of
Science of Mathematics and Business Studies
2014/2015
Name of Dissertation Advisor:
Dr. Belen Martin-Barragan

Abstract
Call centres have become the ‘public face’ of many firms, and much research has been
devoted to managing them scientifically. The most common models used in telephone call
centre design are the Erlang-C (M/M/N), Erlang-B (M/M/N/N) and Erlang-A (M/M/N+M)
models, but only the Erlang-A model incorporates the most important human factor:
impatient customers. The symbol ‘+M’ in Erlang-A represents the exponentially distributed
patience time of customers.
This dissertation focuses on applications of the Erlang-A model in medium-to-large
call centres. We first outline two methods, the exact method and the approximation method,
which are used to compute the performance measures of the Erlang-A model in
quality-and-efficiency regimes. Each method has its own benefits and drawbacks, and we
implement both in MATLAB. Notably, we overcome the ‘numerical difficulties’ reported by
Garnett et al. (2002) by using log transformations in the exact method. We then develop a
simulation-based method for the Erlang-A model in MATLAB and compare it with the exact
method. Finally, we compare all three methods; our recommendation is to blend them: use
the approximation and exact methods for insight and calibration, and use the simulation
method for fine tuning.
Limitations of the methods and of the Erlang-A model are discussed in the final chapter.
Our recommendation to businesses is to establish large call centres, which benefit from
economies of scale, and we also provide call centre managers with practical guidance on
applying these methods.

Table of Contents
Chapter 1: Introduction
Chapter 2: Overview of Queuing theory
2.1 Kendall’s Notation
2.2 Performance Measure
2.3 Queuing Models: Erlang-A, Erlang-B, Erlang-C
2.4 Birth-and-death process of Erlang A model
2.5 Quasi-birth-death process QBD
2.6 Significance of Abandonment in call centre staffing problems
Chapter 3: Literature Review of Call Centre Staffing Problem
Part A: Staffing Problem
3.1 Operational Regimes
3.2 QED; ED; QD; QED+ED Approximation
3.3 Fluid and Diffusion Approximation
3.4 Retrials and Balking
Part B: Joint Design, Staffing and Control Problem
Chapter 4: Erlang-A Example Paper
4.1 Exact Method
4.2 Approximation Method
4.3 Experiments design and Empirical Results
4.4 Critical Review
4.5 Sensitivity Analysis
Chapter 5: Three Experiments Design
5.1 Introduction
5.2 Exact Method
5.3 Approximation Method
5.4 Simulation Method
Chapter 6: Computational Results
6.1 Exact Results using MATLAB
6.2 Approximation Results using MATLAB
6.3 Simulation Results using MATLAB
6.4 Comparison among Three approaches
Chapter 7: Conclusions
7.1 Limitations of Erlang-A model
7.2 Implementation in Business
7.3 Future Research
Reference Lists
Appendices
Appendix A: Approximation MATLAB Code
Appendix B: Exact & Simulation Matlab Code
List of Tables, Figures, Diagram:
Figure 1: Operational Scheme of a simple call centre.
Figure 2: A schematic representation of the Erlang-A model.
Figure 3: A schematic representation of the H2/E3/1 queue.
Figure 4: Summary of practical recommendations.
Figure 5: A schematic representation of the V-model.
Figure 6: Performance measure of the form E[f(V, X)].
Figure 7: Probability to abandon vs. average waiting time.
Figure 8: Approximation on P{W>0}.
Figure 9: Approximation on P{Abandon}.
Figure 10: Approximation on P{W>10sec}.
Figure 11: Approximation on P{Ab|W>10sec}.
Figure 12: A comparison of the Erlang-A exact method against real data.
Figure 13: A comparison of Erlang-A approximation data against real data.
Figure 14: Comparison between our simulation and exact calculation method of
four performance measures. (maxiter = 10000).
four performance measures (maxiter = 1000).
four performance measures (maxiter = 100).
Table 1: Performance measure of M/M/1 model.
Table 2: Call centre queueing models.
Table 3: Exact and Approximation staffing level of four regimes.
Table 4: Results for 9 experiments as a base.
Table 5: The abandonment-rate elasticities ε(f, θ) of several performance measures.
Table 6: Arrival-rate elasticities ε (f, λ).
Table 7: Service-rate elasticities ε (f, μ).
Table 8: The second derivatives of the performance measures of service rate.
Table 9: Approximation formula of Erlang-A model.
Table 10: Approximation method comparison between our results and Garnett’s results.
Table 11: Comparison among all three methods
Table 12: Large system with high abandonment rate, high call volume: P{W > 0} = ε; θ = 100; λ = 3000.
Table 13: Small system with high abandonment rate, low call volume: P{W > 0} = ε; θ = 10; λ = 30.
Table 14: Economies of Scale Staffing Example.
Diagram 1: Transition state diagram of Erlang-A model.

Chapter 1: Introduction
Call centres have become an invaluable function for a wide range of companies,
particularly because they serve as the “public face” of a business. Most academic work
on call centres addresses the following main topics: Forecasting
(e.g. Weinberg et al., 2007; Soyer and Tarimcilar, 2007; Shen and Huang, 2007);
Capacity Planning (e.g. Keblis and Chen, 2006); Queueing (e.g. Koole and Mandelbaum,
2002; Mandelbaum and Zeltyn, 2006); and Personnel Scheduling (e.g. Gans et al., 2003;
Saltzman, 2005; Saltzman and Mehrotra, 2007). Only the fourth topic, personnel
scheduling, is considered in this dissertation.
Personnel scheduling is crucial in call centres. Indeed, staffing costs account for roughly
60%-80% of the overall operational budget (Aksin et al., 2007). The aim of personnel
scheduling is to ensure that the right number of staff members (‘staffing’) with the right
skills are allocated (‘routing’) to the right places at the right times (‘scheduling’) in
order to meet the uncertain, time-varying demand for the call centre’s service.
A central challenge in managing any service operation is to achieve a balance between
operational efficiency and service quality. This is an essential requirement for call centres,
as they may receive thousands of calls per day, every one of which demands a response
within a few seconds. The call centre manager has to maximize server utilization, keep
the mean waiting time of customers short, and control operating costs.
Therefore, call centre managers first need to determine the operational regime in which
there is an acceptable balance of quality and efficiency. In addition, they need to
consider the following performance measures: P{Ab}, P{W>0}, P{W>t}, and P{Ab|W>t}.
A typical representation of a call centre system is shown in Figure 1. Incoming calls
form an invisible queue, waiting for service from one of n identical servers. There is a total
of (k+n) lines in the telephone trunk. If a customer’s call arrives when all the lines are
occupied, the customer encounters a busy signal and has two choices: (1) give up
immediately (‘balking’); or (2) try again later (‘retrial’). If a customer arrives when there
are free lines but all call centre agents are busy (that is, the number of customers in the
call centre is between n and (k+n)), then his/her call is placed in the queue. During the
waiting time, if the customer loses patience before being served, he or she will hang up
(this is termed ‘abandonment’); if the customer’s patience exceeds the waiting time, the
customer waits until an agent answers the call. After abandoning the queue, a customer
may phone again and re-join the queue (Mandelbaum and Zeltyn, 2005).
Figure 1: Operational Scheme of a simple call centre.
This dissertation mainly considers three methods in addressing the Erlang-A (‘A’ for
‘Abandonment’) model for staffing problems in medium-sized and large-sized call centres.
The structure of this dissertation is as follows:
• Chapter 2 provides an overview of the queuing theory that is mainly used in
addressing such staffing problems.
• Chapter 3 examines related literature on call centre staffing problems, covering two
issues: the staffing problem, and the joint staffing, design and control problem. More
attention is paid to four sub-questions of the staffing problem.
• In Chapter 4, with reference to Garnett et al. (2002), two methods used to
address the Erlang-A problem are presented and reviewed: the exact method and the
approximation method.
• In Chapter 5, using the same data as Garnett et al. (2002), these two
methods are tested in MATLAB. More importantly, a new MATLAB simulation model of
Erlang-A in a real-world situation is detailed.
• Chapter 6 presents the significant computational results for all three methods
(described in Chapters 4 and 5), as well as comparisons between those methods.
• Chapter 7 draws conclusions, with recommendations, limitations, implications
for business and directions for further research.
Chapter 2: Overview of Queuing theory
Call centres can be viewed as queuing systems. Due to their increasing abundance in
Western economies, call centres have become the subject of widespread academic
research, notably in asymptotic queuing theory. Simulation models and queuing
models are the two main methods used to calculate the number of call centre
staff members required during a given time interval. Queuing models (Koole and
Mandelbaum, 2002; Mandelbaum and Zeltyn, 2006) yield analytic results, but rely on
simplifications of the real world. Simulation models (Mehrotra and Fama, 2003) can take
many practical factors into account, but they are computationally expensive. Queuing
models are more commonly used in call centre applications (Alfares, 2007) whereby they
are used to determine call centre staffing levels in order to satisfy specific service-level
criteria. Wallace and Whitt (2005) developed an algorithm based on queuing methods
and simulation for both staffing and routing, with the aim of minimising the total number
of call centre staff. Alfares (2007) preferred a queuing model to simulation methods to
estimate hourly staffing demands. An integer programming model is then constructed to
find the optimum employee tour schedules that satisfy labour requirements with minimum
cost.
2.1 Kendall’s Notation
Kendall’s notation characterizes a queuing system in the form A/B/C/X/Y/Z,
where A and B indicate the inter-arrival time distribution and the service time distribution,
respectively. The M (‘Markovian’, i.e. exponential) distribution is the most common choice.
Other distributions include the following: the E (Erlang) distribution; the D (deterministic)
distribution; the G (general) distribution; the GI distribution (a general distribution in
which successive arrivals are independent of each other); and the Ph (phase-type)
distribution, which includes the hyperexponential and Erlang distributions. The term
C indicates the number of servers: C = 1 for a single-server system, and C > 1 for a
multi-server system. X denotes the system capacity (buffer size). For example, M/M/1/K
is a single-server, finite-capacity system; M/M/c/∞ is a multi-server, unlimited-capacity
system. Finally, Y represents the size of the customer population, and Z
indicates the queue scheduling discipline. The default value for the system capacity
and the population size is infinite, and the default scheduling discipline is First
Come First Served (FCFS). The simplest queue, M/M/1, has a Markovian arrival process
(Poisson), a Markovian service process (exponential), and a single server; it also has
unlimited space to hold waiting customers, an infinite population from which
customers are drawn, and an FCFS scheduling policy.
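The defaults described above can be made concrete with a small parser. The following Python sketch is illustrative only: the names KendallQueue and parse_kendall are hypothetical and are not taken from the MATLAB code in the appendices. It assumes numeric server counts and capacities, and it does not handle the ‘+M’ patience suffix used for the Erlang-A model.

```python
import math
from dataclasses import dataclass


@dataclass
class KendallQueue:
    """Fields of Kendall's A/B/C/X/Y/Z notation (hypothetical helper type)."""
    arrival: str                      # A: inter-arrival time distribution
    service: str                      # B: service time distribution
    servers: int                      # C: number of servers
    capacity: float = math.inf        # X: system capacity (default: unlimited)
    population: float = math.inf      # Y: customer population (default: infinite)
    discipline: str = "FCFS"          # Z: scheduling discipline (default: FCFS)


def parse_kendall(notation: str) -> KendallQueue:
    """Parse a string such as 'M/M/1' or 'M/M/5/5', filling in the defaults."""
    parts = notation.split("/")
    if not 3 <= len(parts) <= 6:
        raise ValueError("expected between 3 and 6 fields separated by '/'")
    return KendallQueue(
        arrival=parts[0],
        service=parts[1],
        servers=int(parts[2]),
        capacity=float(parts[3]) if len(parts) > 3 else math.inf,
        population=float(parts[4]) if len(parts) > 4 else math.inf,
        discipline=parts[5] if len(parts) > 5 else "FCFS",
    )


if __name__ == "__main__":
    print(parse_kendall("M/M/1"))
    print(parse_kendall("M/M/5/5"))
```

With this sketch, 'M/M/1' expands to a single-server queue with unlimited capacity, infinite population and FCFS discipline, matching the description of the simplest queue above.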
2.2 Performance Measure
Operational service level is quantified in terms of congestion or performance
measures. Abandonment, waiting times and retrials are significant performance
measures, which can be used to assess the fit between queueing models and
call centres. Abandonment is measured by the fraction of customers that abandon the
queue prior to being served, denoted as P{Ab}. Waiting is measured by the average speed
of answer (ASA), denoted as E[W], or by some percentile of the waiting time distribution.
A standard service-level target for telephone services is the 80/20 rule, which requires
that at least 80% of customers wait no more than 20 seconds. Finally, retrials are
quantified by the fraction of customers whose request is satisfied on the first attempt.
These performance measures are correlated. For example, there is a remarkably linear
relation between P{Ab} and E[W] (as illustrated later in Chapter 4). Also, in contrast
to waiting time, which is objective, abandonment and retrials are subjective, as they
incorporate customers’ views on whether the offered service is worth waiting for
(abandonment) or returning to (retrials) (Koole and Mandelbaum, 2002). In Chapter 4, we
take P{Ab}, P{W>0}, P{W>T} and P{Ab|W>T} as the main performance measures,
where P{W>0} is the delay probability and P{W>T} is the telephone service factor (TSF).
In addition, the delay probability differs across operational regimes. In the QED
regime, the delay probability is $P(\beta)$, between 0 and 1; in the ED regime, the delay
probability is $P(\beta/\sqrt{m}) \to 1$; and in the QD regime, the delay probability is
$P(\beta\sqrt{m}) \to 0$.
Some other performance measures are also needed to assess the effectiveness of a
queueing system:
1. The number of customers in the system (𝐿)
2. The rate at which work enters the system (traffic intensity 𝜌)
In order to use Little’s formula, we first need to introduce 𝑊. The time a customer
spends in the system, from the instant of arrival at the queue to the instant of departure
from the server, is called the response time (sojourn time), denoted as 𝑊. The response
time 𝑊 includes the waiting time in the queue (𝑊𝑞) and the service time (𝑠). In a
multiserver system, the traffic intensity (ρ = λ/μL) is the rate at which work enters the
system, where λ is the average arrival rate of customers and 1/μ is the mean service time.
The most widely used formula in queueing theory is Little’s formula. It is simple to state
and intuitive to apply. Little’s formula states that the average number of customers in the
system is the arrival rate of customers to the system multiplied by the response time; this
is written as $L = \lambda W$. Likewise, $L_q = \lambda W_q$ and $L_S = \lambda s$, where
$L_q$ and $L_S$ are the average number of customers waiting in the queue and the average
number of customers receiving service, respectively. In total:
$L = L_q + L_S = \lambda W_q + \lambda s = \lambda (W_q + s) = \lambda W$
Performance Measure | Formula for M/M/1
Mean number in system | $L = (1-\rho)\,\frac{\rho}{(1-\rho)^2} = \frac{\lambda}{\mu-\lambda}$
Variance of number in system | $\mathrm{Var}[L] = \rho\,\frac{1+\rho}{(1-\rho)^2} - \left(\frac{\rho}{1-\rho}\right)^2 = \frac{\rho}{(1-\rho)^2}$
Mean queue length | $L_q = \rho L = L - \rho$
Table 1. Performance measure of M/M/1 model.
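As a worked check of Little’s formula, the following Python sketch evaluates the standard M/M/1 closed forms and verifies that $L = \lambda W$ and $L_q = \lambda W_q$ hold numerically. The function name mm1_measures and the example rates (λ = 3, μ = 4) are illustrative assumptions, not values used in the dissertation.

```python
def mm1_measures(lam: float, mu: float) -> dict:
    """Steady-state measures of a stable M/M/1 queue (requires lam < mu)."""
    if not 0 < lam < mu:
        raise ValueError("a stable M/M/1 queue needs 0 < lam < mu")
    rho = lam / mu                    # traffic intensity of the single server
    return {
        "rho": rho,
        "L": rho / (1 - rho),         # mean number in system
        "Lq": rho * rho / (1 - rho),  # mean queue length, equal to rho * L
        "W": 1 / (mu - lam),          # mean response (sojourn) time
        "Wq": rho / (mu - lam),       # mean waiting time in the queue
    }


if __name__ == "__main__":
    lam, mu = 3.0, 4.0                # illustrative rates only
    m = mm1_measures(lam, mu)
    print(m)
    # Little's formula links the counts to the times.
    print(abs(m["L"] - lam * m["W"]) < 1e-12)
    print(abs(m["Lq"] - lam * m["Wq"]) < 1e-12)
```

For these example rates the traffic intensity is 0.75, so on average three customers are in the system and each spends one time unit there.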
2.3 Queuing Models: Erlang-A, Erlang-B, Erlang-C
The Erlang-C (footnote 2) queuing model M/M/N is preferred in most call centre
applications. It was first proposed by Erlang (1948), and was later formalized by Halfin
and Whitt (1981). However, this model ignores blocking and customer abandonment. The
Erlang-B model (‘B’ for blocking) characterizes the blocking (footnote 3), or busy-signal,
probability for the associated M/M/N/N system. Specifically, if a customer arrives at a
queuing system that has n servers but no waiting positions and finds all servers busy,
then this customer will hang up and never retry. The Erlang-A (‘A’ for Abandonment)
model, which incorporates both busy signals and abandonment, is termed M/M/N+M (‘+M’
means that the patience time is exponentially distributed). Within the Erlang-A model (as
mentioned in Chapter 1), patience is defined as the maximum length of time that the
customer is willing to wait for service; if not served within this time, the customer
abandons the queue. Recent journal papers tend to focus more on the Erlang-A model. More
accurate values can be computed by using several approximation methods based on
Erlang-A, rather than by inputting substitute values into an exact formula.
2. The Erlang C model assumes there are no lost calls or busy signals, so it may
overestimate the required staff numbers.
3. Blocking: the percentage of calls that are blocked because not enough lines are available.
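As a numerical illustration of the Erlang-B and Erlang-C models, the following Python sketch computes the blocking probability of the M/M/N/N system with the standard Erlang-B recursion, and derives the Erlang-C delay probability of the M/M/N system from it. The function names and example values are assumptions made for illustration; this is not the dissertation’s MATLAB code.

```python
def erlang_b(n: int, offered_load: float) -> float:
    """Blocking probability of M/M/n/n with offered load R = lam / mu.

    Uses the recursion B(0) = 1, B(k) = R * B(k-1) / (k + R * B(k-1)),
    which avoids the large factorials of the closed-form expression.
    """
    b = 1.0
    for k in range(1, n + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_c(n: int, offered_load: float) -> float:
    """Delay probability P{W>0} of a stable M/M/n queue (requires R < n)."""
    if offered_load >= n:
        raise ValueError("Erlang-C needs offered load below the number of servers")
    b = erlang_b(n, offered_load)
    rho = offered_load / n            # utilisation per server
    return b / (1.0 - rho * (1.0 - b))


if __name__ == "__main__":
    # Illustrative values only: 2 servers, offered load of 1 Erlang.
    print(erlang_b(2, 1.0))
    print(erlang_c(2, 1.0))
```

For the same number of servers and offered load, the Erlang-C delay probability is never smaller than the Erlang-B blocking probability, because delayed customers remain in the system instead of being lost.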
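Table 1 (continued).
Performance Measure | Formula for M/M/1
Average response time | $E[W] = \frac{1}{\mu - \lambda}$
Average waiting time in the queue | $W_q = \frac{\rho}{\mu - \lambda}$
Queue size for nonempty queue | $L'_q = \frac{1}{1 - \rho} = \frac{\mu}{\mu - \lambda}$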
2.4 Birth-and-death process of Erlang A model
In the Erlang-A model, customers arrive at the queue according to a Poisson process
with rate λ. Each customer is equipped with a patience time X; patience times are i.i.d.
(independent and identically distributed) according to exp(θ). Service times are also
i.i.d., according to exp(μ). Figure 2 is a schematic representation
of the Erlang-A model (Mandelbaum and Zeltyn, 2005), showing the traffic
flow in the model. Compared with the general call centre representation
(described in Chapter 1), the Erlang-A model is greatly simplified, as it omits
retrials and lost calls. Nevertheless, as will be shown in Chapter 6, the Erlang-A model
is still useful both in theory and in practice.
Figure 2. A schematic representation of the Erlang-A model
The transition state diagram of Erlang-A is provided in Diagram 1 below. Here 𝐿(𝑡) is the
total number of customers in the system at time t, and 𝐿 = {𝐿(𝑡), 𝑡 ≥ 0} is a Markov
birth-and-death process. Note that λ is the arrival rate, μ is the service rate, n is the
number of servers, and θ is the abandonment rate, so the average patience is 1/θ. If the
number of customers exceeds the number of servers, customers leave the queue at a rate of
nμ through service completions, in addition to abandonment at rate θ per waiting customer.
Diagram 1. Transition state diagram of Erlang-A model
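The birth-and-death structure gives the steady-state distribution directly: with birth rate λ in every state and death rate min(j, n)μ + max(j − n, 0)θ in state j, the stationary probability of state j is proportional to the product of λ divided by the death rates of states 1 to j. The following Python sketch evaluates these weights in log-space so that large systems do not overflow. It is an independent illustration and not the MATLAB implementation in the appendices; the function name, the truncation tolerance tol and the example parameters are assumptions. The abandonment probability uses the rate balance P{Ab} = θ·E[queue length]/λ.

```python
import math


def erlang_a_measures(lam, mu, theta, n, tol=1e-12):
    """Return (P{W>0}, P{Ab}, mean queue length) for an M/M/n+M queue.

    The infinite state space is truncated once the weight of a state beyond n
    falls below `tol` times the largest weight (an assumption of this sketch).
    """
    if min(lam, mu, theta) <= 0 or n < 1:
        raise ValueError("lam, mu, theta must be positive and n >= 1")
    log_w = [0.0]                     # log of the unnormalised weight of state 0
    peak = 0.0                        # largest log-weight seen so far
    j = 0
    while True:
        j += 1
        # Death rate in state j: busy servers plus abandoning waiters.
        death = min(j, n) * mu + max(j - n, 0) * theta
        log_w.append(log_w[-1] + math.log(lam) - math.log(death))
        peak = max(peak, log_w[-1])
        if j > n and log_w[-1] < peak + math.log(tol):
            break
    # Subtracting the peak keeps every exponent at or below zero.
    weights = [math.exp(lw - peak) for lw in log_w]
    total = sum(weights)
    pi = [w / total for w in weights]
    p_wait = sum(pi[n:])              # arrivals see time averages (PASTA)
    mean_queue = sum((k - n) * p for k, p in enumerate(pi) if k > n)
    p_abandon = theta * mean_queue / lam
    return p_wait, p_abandon, mean_queue


if __name__ == "__main__":
    # Illustrative parameters only.
    print(erlang_a_measures(lam=10.0, mu=1.0, theta=1.0, n=10))
```

A convenient sanity check is the case θ = μ, where the Erlang-A queue has the same state distribution as an M/M/∞ queue, so the number in the system is Poisson with mean λ/μ.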

2.5 Quasi-birth-death process QBD
For the single-server queue, the probability laws proposed to model the inter-arrival
or service time distribution include not only the exponential distribution (birth-death
process), but also phase-type distributions (quasi-birth-death process, QBD). Phase-type
distributions cover more general situations, with distributions such as the Erlang and
hyperexponential forms. For example, the 𝑀/𝐸𝑟/1 queue is the Erlang-r service model, and
the 𝐸𝑟/𝑀/1 queue is the Erlang-r arrival model. Similarly, 𝑀/𝐻𝑟/1 and 𝐻𝑟/𝑀/1 are
single-server queues with hyperexponential service distributions and hyperexponential
arrival distributions, respectively. One example is the 𝐻2/𝐸3/1 queue in Figure 3. The
arrival process of the queue has a two-phase hyperexponential distribution, while the
service process is an Erlang-3 distribution in which all three phases have the same rate,
$\mu_1 = \mu_2 = \mu_3 = 3\mu$. The graph is shown as follows (Stewart, 2009):
Figure 3. A schematic representation of the 𝐻2/𝐸3/1 queue
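To make the two phase-type building blocks of the 𝐻2/𝐸3/1 queue concrete, the following Python sketch draws samples from a two-phase hyperexponential distribution and from an Erlang-3 distribution. The function names, branch probabilities and rates are hypothetical values chosen for illustration; they do not come from Stewart (2009) or from the dissertation.

```python
import random


def sample_hyperexponential(rng, probs, rates):
    """Pick phase i with probability probs[i], then draw Exp(rates[i])."""
    phase = rng.choices(range(len(rates)), weights=probs)[0]
    return rng.expovariate(rates[phase])


def sample_erlang(rng, r, phase_rate):
    """Sum of r independent exponential phases, each with rate phase_rate."""
    return sum(rng.expovariate(phase_rate) for _ in range(r))


if __name__ == "__main__":
    rng = random.Random(2015)         # fixed seed for reproducibility
    mu = 1.0                          # illustrative overall service rate
    n_samples = 20000
    # H2 inter-arrival times with assumed branch probabilities and rates.
    arrivals = [sample_hyperexponential(rng, (0.4, 0.6), (1.0, 3.0))
                for _ in range(n_samples)]
    # Erlang-3 service times: three phases of rate 3 * mu, so the mean is 1 / mu.
    services = [sample_erlang(rng, 3, 3 * mu) for _ in range(n_samples)]
    print(sum(arrivals) / n_samples)  # should be near 0.4/1.0 + 0.6/3.0 = 0.6
    print(sum(services) / n_samples)  # should be near 1 / mu = 1.0
```

Setting every Erlang phase rate to 3μ keeps the mean service time at 1/μ while reducing its variability relative to a single exponential phase.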
2.6 Significance of Abandonment in call centre staffing problems
Abandonment has become the most common issue in research on call centre staffing
problems. More than 40% of call centres set a target for the fraction of abandonment
𝑃{𝐴𝑏}; ignoring abandonment will cause either understaffing or overstaffing. On the one
hand, when some customers abandon the queue, the other waiting customers experience
shorter delays, and there will be further calls arriving at the queue. On the other hand,
fewer servers are required to meet the abandonment service goal if the call centre uses
appropriate workforce management tools. Moreover, experiments by researchers have
demonstrated the importance of abandonment. Figure 4 (Dai and He, 2012) below
compares a queue with customer abandonment (𝑀/𝑀/50 + 𝑀) with a queue without
customer abandonment (𝑀/𝑀/50). The mean waiting time, mean queue length and delay
probability for 𝑀/𝑀/50 + 𝑀 are smaller than those of the other system.
Therefore, under the same service capacity and throughput, the performance of
the queue with abandonment is significantly better than that of the queue without
abandonment.
Figure 4. Comparison between queues with and without customer abandonment
Chapter 3: Literature Review of Call Centre Staffing Problem
There is an extensive corpus of literature concerning call centre staffing problems, within
which there are four main streams of research (according to the number of classes/pools
of customers/servers): (1) single class of customers, single pool of identical servers; (2)
multiple classes of customers, single pool of identical servers; (3) single class of
customers, multiple pools of identical servers; and (4) multiple classes of customers,
multiple pools of identical servers. Most of the literature on call centre staffing problems
focuses on the first type. The main models used for this type of call centre are the well-
known Erlang-C formula (Gans et al., 2003, Borst et al., 2004) and its extension, the
Erlang-A formula (Garnett et al., 2002). Those two models, based on single-class M/M/N
queuing system, provide callers with the desired quality of service in a steady state.
Armony (2005) considered models with a single customer class and multiple agent types.
This asymptotic optimality result was inspired by the work of Borst et al. (2004), who
formulated and established asymptotic optimality for the single-class, single-pool 𝑀/𝑀/𝑁
queue. However, what if the call centre manager wishes to provide a customised service
catering to different customer classes? For example, organizations such as banks and
airlines usually have special class designations (such as Platinum, Gold, Silver, and
Economy), where customers receive a differentiated quality of service depending on their
class designation. The V-model, as an extension of the single-class Erlang-C and Erlang-A
models, can handle this situation (Gurvich et al., 2008). Gurvich et al. (2008) explored
the server diffusion limits in the Quality-and-Efficiency Driven Regimes (QED), which
were first introduced by Halfin and Whitt (1981). Gurvich and Whitt (2010) took multiple
customer classes and a heterogeneous server pool into consideration; Armony and
Mandelbaum (2011) considered the symmetric case of a single customer type and a
heterogeneous server pool. However, designs for the multiclass/multipool type of call
centre are still in their infancy and rely mostly on simulation-based methods (Wallace and
Whitt, 2005). Recent work on multiclass/multipool systems focuses on two approaches: the
model-based approach (Harrison and Zeevi, 2005), and the data-driven approach
(Bassamboo and Zeevi, 2009). Harrison and Zeevi (2005) were the first to propose a
staffing method in queuing models with multiple customer classes and server pools. Their
approach was based on reducing the original capacity optimization problem to a
multidimensional newsvendor problem, which addresses the question of capacity
planning under parameter uncertainty. Their findings inspired Bassamboo et al. (2006)
and Bassamboo and Zeevi (2009), whose subsequent work focused on the reduction of
joint staffing and dynamic call assignment problems. Bassamboo et al. (2010) conducted
a more detailed investigation of the efficacy of the newsvendor prescription model of
Harrison and Zeevi (2005), and they concluded that (i) the traditional square-root safety
staffing rule is no longer valid, and yet (ii) a simple capacity prescription
derived via a suitable newsvendor problem is surprisingly accurate.
With the exception of the first type, call centre models include multiple
classes or pools of customers or servers, which means that in order to address staffing
problems we also need to consider ‘server scheduling’, i.e. the assignment of customers
to the appropriate server upon a service completion or a customer’s arrival. That
consideration falls under the discipline of ‘Routing’ or ‘Control’, which is outside the
realm of ‘Staffing’ problems. Therefore, this literature review mainly focuses on the
single-class/single-pool call centre staffing problem, with some mention of joint staffing
and control problems. The structure of the literature review is as follows: Part A concerns
single-class/single-pool call centre problems, and Part B contains a literature review on
related joint staffing and routing problems.
Part A: Staffing Problem
The seminal work on queuing systems with impatient customers was written
by Palm (1946), who introduced the basic Erlang-A model 𝑀/𝑀/𝑛 + 𝑀, where the
last 𝑀, after the 𝑛, represents the exponentially distributed patience time. Similarly,
Gnedenko and Kovalenko (1989) studied deterministically distributed patience times
(𝑀/𝑀/𝑛 + 𝐷), and Jurkevic et al. (2004) and Baccelli and Hebuterne (1981) studied
generally distributed patience times (𝑀/𝑀/𝑛 + 𝐺); Mandelbaum and Zeltyn (2009)
demonstrated an 𝑀/𝑀/𝑛 + 𝐺 model with exponentially distributed service times and
generally distributed patience times; Mandelbaum and Momcilovic (2012) described an
𝑀/𝐺/𝑛 + 𝐺 model with generally distributed service and patience times. Dai et al. (2010)
used an 𝑀/𝑃ℎ/𝑛 + 𝑀 model, with a phase-type service time distribution and an exponential
patience time distribution, to model the queue in two regimes: (i) a critically loaded
regime (also known as the QED limiting regime), and (ii) an overloaded regime. Other call
centre queuing models are shown in Table 2. However, it is important to note that applying
exact formulae for the performance measures of Erlang-A models has several drawbacks:
the formulae are relatively complicated, as they involve double integration of the
patience distribution, and they give rise to numerical problems when there is a large
number of servers. Moreover, the whole patience distribution has to be taken into account,
which is a complicated estimation task (Brown et al., 2005). Therefore, researchers have
sought to develop various approximation schemes. There are two main types of
approximation schemes: the ‘square root staffing’ approximations and the ‘fluid and
diffusion’ approximations. They are described in Chapter 3.2 and Chapter 3.3, respectively.
References | Call Centre Model (Erlang-A/-B/-C) | Queuing Model (A/B/C/X/Y/Z)
Halfin, S. and W. Whitt (1981) | Erlang C | M/M/N
Reed, J. E. (2007) | Generalized Erlang C | G/GI/N
Halfin, S. and W. Whitt (1981) | Generalized Erlang C | GI/M/N
Jelenkvic et al. (2004) | Generalized Erlang C | GI/D/N
Puhalskii, A. A. and Reiman, M. I. (2000) | Generalized Erlang C | GI/Ph/N
Gamarnik, D., Momcilovic, P. (2008) | Generalized Erlang C | GI/GI/N
Janssen, A.J.E.M and J.S.H. van Leeuwaarden, B. Zwart. (2008) | Erlang B | M/M/N/N Loss system
Palm, C. (1946) | Erlang A | M/M/N + M
Gnedenko, B and Kovalenko, I. (1989); Jelenkvic et al (2004) | Generalized Erlang A | M/M/N + D
Baccelli, F and G. Hebuterne. (1981) | Generalized Erlang A | M/M/N + G
Whitt, W. (2005a) | Generalized Erlang A | G/M/N + M
Dai et al. (2010) | Generalized Erlang A | G/Ph/N + GI
Dai et al. (2010) | Generalized Erlang A | G/Ph/N + M
Dai et al. (2010) | Generalized Erlang A | G/GI/N + GI
Kang, W. N and Ramanan, K. (2010) | Generalized Erlang A | GI/GI/N + G
Table 2. Call centre queueing models

3.1 Operational Regimes
Halfin and Whitt (1981) first introduced the concept of the QED (quality-and-efficiency-driven)
operational regime, which achieves, jointly, high levels of system efficiency (high server
utilization) and service quality (short waiting times; scarce abandonment). They
established an important paradigm: as λ increases, maintaining the QED operational regime
with β > 0 requires keeping the delay probability P{W > 0} at a fixed level strictly
between 0 and 1. Whitt (1992) studied QED approximations for call centres without
abandonment. Garnett et al. (2002) investigated the QED regime for Erlang-A models with
exponentially distributed patience. For more recent papers on the QED regime, see
Mandelbaum and Momcilovic (2008), Reed (2009), and Kaspi and Ramanan (2011). Zeltyn and
Mandelbaum (2005) presented a
comprehensive study of QED, ED (efficiency-driven) and QD (quality-driven) regimes in
steady state for the 𝑀/𝑀/𝑛 + 𝐺 queue. However, ED approximation tends to be cruder
than QED approximation. Whitt (2006a) presented an ED approximation for 𝐺/𝐺/𝑛 + 𝐺
queuing systems with generally distributed arrivals, services and patience times, and
reported that ED approximation is useful when the uncertainty of the arrival rate is taken
into account. Whitt (2006a) showed that Erlang-A and other queue-with-abandonment
models are sensitive to changes in the arrival rate. Hence, Whitt (2006a) studied the ED
approximation for such a model, and developed asymptotic rules for optimal staffing. In
addition to the staffing problem, ED approximation is also useful when addressing other
problems. For example, Whitt (2006a) studied an ED approximation for skill-based routing,
and Bassamboo et al. (2006) used ED approximation to provide an asymptotic method of
routing and admission control. Finally, the development of an ED+QED model was
pioneered by Baron and Milner (2009), who developed the ED+QED approximation for
the tail probability of wait in the 𝑀/𝑀/𝑛 + 𝐺 model in order to address staff
outsourcing problems. Shortly afterwards, Mandelbaum and Zeltyn (2009) applied this
method to staffing problems.
3.2 QED; ED; QD; QED+ED Approximation
The offered load 𝑅 = 𝜆/𝜇 = 𝜆 ∙ 𝐸[𝑆], where 𝜆 is the arrival rate, 𝜇 is the service rate and
𝐸[𝑆] is the mean service time, quantifies the amount of work, measured in time-units of
service, that arrives per unit of time. If the staffing level exceeds R, the result is a
high quality of service; whereas if the staffing level is less than R, there is a high
utilization of the servers. By using the square-root staffing rule, we can approximate the
staffing level for each of the operational regimes. The staffing level of each regime,
together with its exact and approximate formulae, is shown in Table 3.
Table 3. Exact and approximate staffing levels of the four regimes.
Regime | Constraint | Exact staffing level | Approximate staffing level
QED | P{W > 0} ≤ α | $n_{QED} = R + \beta\sqrt{R} + o(\sqrt{R})$, $-\infty < \beta < \infty$ | $n^*_{QED} = [R + \beta^*\sqrt{R}]$
ED | P{Ab} > 10% | $n_{ED} = (1-\gamma) \cdot R + o(R)$, $\gamma > 0$ | $n^*_{ED} = [(1-\gamma) \cdot R]$, $\gamma > 0$
QD | P{W > 0} ≤ 2% | $n_{QD} = (1+\gamma) \cdot R + o(R)$, $\gamma > 0$ | $n^*_{QD} = [(1+\gamma) \cdot R]$, $\gamma > 0$
ED+QED | P{W > T} ≤ α | $n_{ED+QED} = (1-\gamma) \cdot R + \delta\sqrt{R} + o(\sqrt{R})$, $\gamma > 0$ | $n^*_{ED+QED} = [(1-\gamma^*) \cdot R + \delta^*\sqrt{R}]$
The QED operational regime corresponds to the least staffing level that satisfies the
delay-probability constraint P{W > 0} ≤ α, with α strictly between 0 and 1. The QED
staffing level is
\[ n_{QED} = R + \beta\sqrt{R} + o(\sqrt{R}) \]
where β is a quality-of-service (QoS) parameter (the larger it is, the better the service
level) and $o(\sqrt{R})$ denotes a term that is negligible relative to $\sqrt{R}$, that is,
$o(\sqrt{R})/\sqrt{R} \to 0$ as $R \to \infty$ (Mandelbaum and Zeltyn, 2009). Furthermore,
the approximation formula for QED is
\[ n^*_{QED} = [R + \beta^*\sqrt{R}] \]
If we fix the service rate 𝜇 and the patience distribution G, and let λ and n tend to
infinity together under the QED staffing rule, the delay probability converges to a
constant strictly between zero and one: P{W > 0} → α(β). At the same time, the probability
of abandonment and the average waiting time vanish at rate $1/\sqrt{n}$ (P{Ab} → 0,
E[W] → 0); see Garnett et al. (2002) and Zeltyn and Mandelbaum (2005). Mandelbaum and
Zeltyn (2009) demonstrated that the QED approximation is accurate for small to
moderate-sized centres (<10 agents).
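The limit α(β) is not written out in the text above. As a hedged illustration, the sketch below evaluates the form usually attributed to Garnett et al. (2002) for the M/M/n+M queue, $P\{W>0\} \approx [1 + \sqrt{\theta/\mu}\, h(\beta\sqrt{\mu/\theta}) / h(-\beta)]^{-1}$, where h is the standard normal hazard rate. This formula is quoted from memory and should be checked against the original paper before use; the function names and the example parameters are assumptions made for illustration only.

```python
import math


def normal_hazard(x):
    """Hazard rate h(x) = phi(x) / (1 - Phi(x)) of the standard normal."""
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return density / survival


def qed_delay_probability(beta, mu, theta):
    """QED approximation of P{W > 0} for service grade beta (assumed formula)."""
    ratio = math.sqrt(theta / mu)
    return 1.0 / (1.0 + ratio * normal_hazard(beta / ratio) / normal_hazard(-beta))


# Hypothetical illustration: service rate 1 per minute, average patience 1 minute.
for beta in (-0.5, 0.0, 0.5, 1.0):
    print(beta, round(qed_delay_probability(beta, mu=1.0, theta=1.0), 4))
```

When θ = μ the expression reduces to 1 − Φ(β), which is a convenient sanity check because the number of customers in the system then behaves like that of an infinite-server queue.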
The staffing level for the ED regime is
\[ n_{ED} = (1-\gamma) \cdot R + o(R) \]
The ED operational regime implies understaffing with respect to the offered load, which
results in a high utilization of servers. As n and λ increase indefinitely, nearly all
customers experience a delay and the probability of abandonment converges to a constant
(P{W > 0} → 1, P{Ab} → γ, where γ is a constant). Meanwhile, the average wait converges
to a constant that depends on the patience distribution. According to the numerical
results of Zeltyn and Mandelbaum (2005), a satisfactory fit of the ED approximation
requires a large number of agents (more than 100). The ED approximation for the optimal
staffing level is:
\[ n^*_{ED} = [(1-\gamma) \cdot R] \]
The staffing level for the QD regime is
\[ n_{QD} = (1+\gamma) \cdot R + o(R) \]
The QD regime is the opposite extreme to the ED regime. Within the quality-driven regime,
almost all customers are served immediately upon calling, and the delay probability
converges to zero. A low probability of delay usually indicates high service quality, but
it can also arise because customers are highly impatient and leave before joining the
queue (balking), which frees capacity for subsequent customers. The approximate staffing
level for the QD regime is:
\[ n^*_{QD} = [(1+\gamma) \cdot R] \]

The quality-driven approximation is good, but not as good as the rationalized (QED)
approximation, even in the quality-driven regime. The efficiency-driven solution is the
worst of the three and is substantially over-staffed in the quality-driven regime (Borst
et al., 2004).
Most recently, Mandelbaum and Zeltyn (2009) established a new regime, ED+QED, which
corresponds to the lowest staffing level that obeys the constraint P{W > T} ≤ α, where
T is of the order of a mean service time, P{W > T} is the tail probability of delay (the
delay probability P{W > 0} is the special case T = 0), and α is a number between 0 and 1.
If we vary the number of servers according to the ED staffing rule and hold the other
parameters constant, there is a critical ED parameter γ* such that:
\[ P\{W > T\} \to \begin{cases} 0 & \text{if } \gamma < \gamma^* \\ 1 - G(T) & \text{if } \gamma > \gamma^* \end{cases} \]
where G is the patience distribution. The staffing level for the ED+QED regime is
\[ n_{ED+QED} = (1-\gamma) \cdot R + \delta\sqrt{R} + o(\sqrt{R}) \]
When 0 ≤ α ≤ 1 − G(T), the ED approximation is too “crude” for the constraint
P{W > T} ≤ α. QED fine-tuning around the ED staffing level (1 − γ*)R then provides a
staffing level that satisfies P{W > T} ≈ α. That is why the new ED+QED regime is needed.
In addition, this regime provides the well-known “rule of thumb” for the Erlang-A model:
a waiting time of 30-60 seconds corresponds to approximately 10% abandonment. Figure 4
shows the operational regimes to be used at different levels of the performance measures.
Figure 4. Summary of practical recommendations.
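The four approximate staffing rules of Table 3 can be sketched in a few lines. This is a minimal illustration rather than the dissertation's MATLAB implementation: the function names are hypothetical, and the square brackets in the formulae are assumed here to mean rounding up to the next integer.

```python
import math


def _round_up(x):
    """Round up to an integer, ignoring floating-point noise below 1e-9."""
    return math.ceil(round(x, 9))


def offered_load(lam, mu):
    """Offered load R = lambda / mu."""
    return lam / mu


def staffing_qed(R, beta):
    """Approximate QED staffing level [R + beta * sqrt(R)]."""
    return _round_up(R + beta * math.sqrt(R))


def staffing_ed(R, gamma):
    """Approximate ED staffing level [(1 - gamma) * R], gamma > 0."""
    return _round_up((1.0 - gamma) * R)


def staffing_qd(R, gamma):
    """Approximate QD staffing level [(1 + gamma) * R], gamma > 0."""
    return _round_up((1.0 + gamma) * R)


def staffing_ed_qed(R, gamma, delta):
    """Approximate ED+QED staffing level [(1 - gamma) * R + delta * sqrt(R)]."""
    return _round_up((1.0 - gamma) * R + delta * math.sqrt(R))


# Hypothetical example: 100 calls per minute, mean service time of one minute.
R = offered_load(lam=100.0, mu=1.0)
print(staffing_qed(R, beta=0.5), staffing_ed(R, gamma=0.25),
      staffing_qd(R, gamma=0.25), staffing_ed_qed(R, gamma=0.25, delta=0.5))
```

The parameter values are arbitrary; in practice β, γ and δ would be chosen from the service-level constraint of the relevant regime.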

3.3 Fluid and Diffusion Approximation
Only the M/M/N+M model, with its Poisson arrival process and exponential service and
patience time distributions, can be used for an exact analysis of multi-server queues with
customer abandonment. However, Brown et al. (2005) observed that both the service time
distribution and the patience time distribution are far from exponential. Hence, one must
use general distributions to model service and patience times. More recently, researchers
including Zeltyn and Mandelbaum (2005), Dai et al. (2010), and Mandelbaum and
Momcilovic (2012) have demonstrated that the performance of a multi-server queue in the
QED regime is largely insensitive to the patience time distribution, depending on it
mainly through the patience time density at the origin. When the service and patience time
distributions are general, no suitable analytical or numerical methods, other than
computer simulation, are available to evaluate the performance of such a queue. Therefore,
it is useful to study approximate models for multi-server queues.
The most important one is the diffusion approximation for multi-server queues in the QED
regime. The theory of diffusion approximation for multi-server queues can be traced back
to the seminal paper by Halfin and Whitt (1981), who established a diffusion limit for the
GI/M/n queue. Puhalskii and Reiman (2000) established a diffusion limit for the GI/Ph/n
queue. Garnett et al. (2002) identified a diffusion limit for the M/M/n+M queue, which
allows for customer abandonment. In 2012, Reed and Tezcan defined a diffusion limit for
the GI/M/n+GI queue. Whitt (2004) generalized these results for the G/M/n/M queue. Also,
Dai et al. (2010) extended these results to the G/Ph/n+GI queue. They ran two diffusion
approximation models: one was an M/M/n+GI queue (in which the service time distribution
is exponential), wherein every step of the diffusion approximation could be shown in detail;
the other was an M/H2/n+GI queue. The resulting diffusion process of the first model
was a one-dimensional piecewise Ornstein-Uhlenbeck (OU) process, while the latter was
a two-dimensional OU process. Dai et al. (2012) also demonstrated that diffusion models
are accurate in predicting system performance in the QED regime, even for queues with
as few as 20 servers.

Meanwhile, for a multi-server queue in the ED regime, fluid approximation has been shown
to be useful. Whitt (2006a) studied a fluid model to estimate the performance of a
multi-server queue in the ED regime. Bassamboo and Randhawa (2010) addressed the staffing
problem of an M/M/n+GI queue; for that queue, exact optimization was not possible, and
so they employed the fluid model to approximate the queue. The optimized staffing level
places the queue in the ED regime, and the fluid approximation was shown to be particularly
accurate when the underlying system operates in the ED regime. Dai et al. (2012)
showed that the fluid model is adequate for estimating the performance of a multi-server
queue in the ED regime.
3.4 Retrials and Balking
There has been extensive research into retrial queues, queuing systems in which an
arriving customer who finds that all the servers and waiting positions are occupied may
retry the service after a period of time (Yang and Templeton, 1987). Yang and Templeton
were the first to discuss the queuing model for multi-server retrial queues. Falin (1995)
considered how best to estimate the rate of retrials with the help of integral estimators
in an M/M/1 queuing system; indeed, estimating the number of retrials is a difficult
problem because retrials cannot be fully observed in retrial queues. There are also
other papers that consider retrial queues (e.g. Falin and Templeton, 1997; Walfield and
Foers, 1985; Lewis and Leonard, 1982), but they ignore abandonment behaviour.
Hoffman and Harris (1986) incorporated abandonment and retrials in a model.
Mandelbaum et al. (1999) considered a multi-server system with both abandonment and
retrials, using fluid approximation analysis. Aguir et al. (2004) extended this approach
by including non-stationary call arrivals, in order to investigate the impact of retrials
on the performance of call centres using fluid approximation. Gans et al. (2003),
Artalejo and Pla (2009), and Koole and Mandelbaum (2002) stated that the retrial
phenomenon cannot be disregarded in a careful design of a call centre queueing system.
Zhu et al. (2007) investigated the retrial queuing model with infinite
Quasi-Birth-and-Death (QBD) processes, approximating the original infinite QBD model by
another infinite one which is solvable. That method has also been studied by Neuts and
Rao (1990), and by Artalejo and Pozo (2002). Moreover, Artalejo and Pla (2009)
illustrated the influence of retrials on telecommunication systems with infinite waiting
room and orbit in a retrial queue; they also proposed two truncation methods for analysing
the underlying Markov chain in a retrial queue. In addition, some other researchers have
discussed the applications of retrial queuing models for the performance evaluation of
cellular mobile and computer networks (e.g. Phung-Duc et al., 2009a; Artalejo and
Lopez-Herrero, 2010).
There has been some academic research into balking. One idea is to consider strategic
consumer behaviours. According to classical queuing theory, it is the server(s) that
make(s) the decisions, and the customers are forced to follow them. However, in reality,
it is the arriving customer who decides whether to enter the system or to balk, to wait or
abandon, and to purchase priority status (e.g. upgrade to a ‘premium account’) or not.
Naor (1969) was the first to identify the strategic consumer behaviour in the joining-
balking dilemma in an M/M/1 queue. Hassin and Haviv (2003) conducted a
comprehensive review into state-dependent join or balk behaviour. Cui et al. (2014)
studied retrials that take balking into account, and established a rational retrials model.
When considering customer strategic response, one important question needs to be
considered: how does the information level influence the customer’s decision? Hassin
(1986) argued that servers should provide less information to the customer. Hassin and
Haviv (1994) also claimed that if servers provide more information, this will lead to
counterintuitive results. Whitt (1999) considered balking and abandonment, and argued
that more information will reduce the waiting. There have also been studies into the
joining-balking dilemma in single-server queues during vacation times (e.g. Economou et
al., 2011).
Part B: Joint Design, Staffing and Control Problem
In modern call centre service systems, it is common to have multiple classes of customers
and server skills. Three interrelated problems should be addressed: ‘design’, ‘staffing’
and ‘control’ (routing). Design (according to Gurvich, 2004) concerns the long-term
problem of determining the class partitioning of customers, and the types of servers
serving different customer classes. Staffing (again, according to Gurvich, 2004) concerns
the short-term problem of determining the number of servers needed to satisfy the given
demand. Control (Gurvich, 2004) concerns customer routing and server scheduling, that is,
the assignment of the right customer to the right server. These three problems are closely
interrelated, but they are usually solved separately because of the high complexity that
results if they are solved simultaneously. Insights into the coordination of skill set
design, staffing and routing decisions for multi-skill server centres were offered by
Wallace and Whitt (2005). The routing policy they consider is static overflow routing:
each call class has an ordered list of agent types that can handle it; upon arrival, a
call of that class is assigned to the first agent type in the list that has an available
agent. Similarly, each agent type has an ordered list of call classes from which to pick
up calls when it becomes available. The problem here is to minimize staffing costs subject
to constraints on service levels (SLs). Atlason et al. (2004) and Cezik and L’Ecuyer
(2008) addressed this problem by using integer programming with cutting planes. This
method can handle arbitrarily complex call centre operating conditions. Avramidis et al.
(2009) aimed to provide an approximation of the SL per class in a multi-skill centre with
a special type of overflow routing. They also mentioned that Bassamboo et al. (2005, 2006)
and Bassamboo and Zeevi (2008) made earlier efforts to establish asymptotic feasibility
or optimality for all three problems (design, staffing, control). Their aim was to
minimize costs in terms of waiting, abandonment and customer rejections. Bassamboo and
Zeevi (2008) also considered abandonment constraints, but not tail-probability SL
constraints. Gurvich and Whitt (2010) developed the queue-ratio routing method, which can
handle tail-probability SL constraints.
The V-model (also called the General V-model) is one example of a skills-based routing
model, whereby several customer types are served by one pool of servers. The simplest
and most intuitive control is the generalized-cμ (Gcμ) rule, which was first introduced
by Van Mieghem (1995) for the multi-class, single-server V-model. The importance of the
General V-model is that it distributes different types of customers among a group of
heterogeneous servers. Figure 5 shows a schematic representation of the V-model. This
model has Poisson arrival streams and identical exponential service times:
𝜇𝑖 = 𝜇, ∀ 𝑖 = 1,2,3, … , 𝐽, where 𝐽 is the number of classes.

Figure 5. A schematic representation of the V-model.
There are several papers in which the control problems of the V-model in the QED regime
are discussed. Control of a V-model under cost-minimization objectives was evaluated
comprehensively by Gurvich (2004). Armony and Maglaras (2004a, 2004b) used the
same approach with two classes of differentiated service levels to address the constraint
satisfaction problem, and they were the first researchers to investigate dynamic
control in the QED regime. Maglaras and Zeevi (2005) considered profit maximization
for a two-class loss-system V-model with admission, sizing, and pricing control.
Chapter 4: Erlang-A Example Paper
4.1 Exact Method
In Garnett et al. (2002), the performance measures of an M/M/N+M model in steady state
are expressed as expectations of simple functions of 𝑉 and 𝑋, that is 𝐸[𝑓(𝑉, 𝑋)], where
𝑉 is the virtual (potential) waiting time and 𝑋 is the patience of a customer. (Our method
follows Garnett et al. (2002).) For example, 𝑃{𝐴𝑏} = 𝐸[𝕝(𝑋,∞)(𝑉)] expresses the
probability of abandonment as the expectation of an indicator function of 𝑉: the indicator
equals 1 if 𝑋 < 𝑉 < ∞, that is, if the customer's patience runs out before service could
begin, and 0 otherwise, so that 𝑃{𝐴𝑏} = 𝑃{𝑉 > 𝑋}. Similarly,
𝑃{𝑊 > 𝑡} = 𝐸[𝕝(𝑡,∞)(𝑉⋀𝑋)], where 𝑉⋀𝑋 = min(𝑉, 𝑋) is the actual waiting time of a
customer: the indicator equals 1 if 𝑡 < 𝑉⋀𝑋 < ∞ and 0 otherwise, so that
𝑃{𝑊 > 𝑡} = 𝑃{min(𝑉, 𝑋) > 𝑡}. More of these formulae are listed in Figure 6.
Figure 6. Performance measure of the form 𝐸[𝑓(𝑉, 𝑋)]
In order to generate exact calculation results for the 𝑀/𝑀/𝑁 + 𝑀 model, three different
methods of performing this calculation (each with its own virtues and drawbacks) are
considered. Before that, it is necessary to decompose 𝐸[𝑓(𝑉, 𝑋)] into two components:
\[ E[f(V,X)] = E[f(V,X) \cdot \mathbb{1}_{(0,\infty)}(V)] + E[f(V,X) \cdot \mathbb{1}_{\{0\}}(V)] \]
\[ \qquad = E[f(V,X) \cdot \mathbb{1}_{(0,\infty)}(V)] + E\left[ f(0,X) \cdot \left( \pi_B + \sum_{k=0}^{N-1} \pi_k \right) \right] \]
where 𝜋 denotes the stationary distribution of the queue length process 𝑄(𝑡), and $\pi_n$
is defined as the limiting probability of this process, that is
\[ \lim_{t \to \infty} P\{Q(t) = n\} = \pi_n, \quad n = 0, 1, 2, \ldots, B \]
Note that 𝐸[𝑓(0, 𝑋)] equals either 0 or 1, and that the distribution function of the
potential waiting time 𝑉 of a typical customer is 𝐹𝑤.
Three methods of calculating 𝐸[𝑓(𝑉, 𝑋)] will be compared. Method A conditions on the
number of customers in the queue upon arrival, but this method is numerically unstable
because the terms of the resulting sum vary in sign. Method B is similar to Method A
except that the sum is avoided; this is costly since the integrals must be solved
numerically. Unlike Methods A and B, Method C solves the more general case of an
M/M/N/B+M queue, where B is the buffer size (taken to be infinite in the basic Erlang-A
model), and the resulting integral can usually be solved analytically and more easily
(Garnett et al., 2002).
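The overall formula for Method C is:
\[ E[f(V,X) \cdot \mathbb{1}_{(0,\infty)}(V)] = \int_0^\infty \int_0^\infty f(t,x)\, \theta e^{-x\theta}\, f_v^{+}(t)\, dx\, dt \]
where
\[ f_v^{+}(t) = N\mu\pi_N \left[ 1 - \frac{\Upsilon\left(B-N,\ \frac{\lambda}{\mu}\left(1 - e^{-\theta t}\right)\right)}{\Gamma(B-N)} \right] \]
Γ(x) is the gamma function and Υ(x, y) is the incomplete gamma function:
\[ \Gamma(x) = \int_0^\infty t^{x-1} \exp(-t)\, dt \]
\[ \Upsilon(x,y) = \int_0^y t^{x-1} \exp(-t)\, dt, \quad y > 0 \]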
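The ratio Υ(B − N, y)/Γ(B − N) in the density above can overflow if the two gamma functions are evaluated separately for large B − N. A minimal sketch of one way to avoid this is shown below: the ratio is summed as a series whose terms are evaluated through logarithms. The series is a standard identity for the lower incomplete gamma function; the function names, the tolerance, and the requirement that the caller supplies $\pi_N$ are assumptions of this sketch, which is not the MATLAB code used in the dissertation.

```python
import math


def regularized_lower_gamma(a, y, tol=1e-15, max_terms=100000):
    """Return Upsilon(a, y) / Gamma(a) for a > 0 using a log-space series.

    Uses Upsilon(a, y) / Gamma(a) = sum_{k>=0} y^(a+k) e^(-y) / Gamma(a+k+1).
    """
    if y <= 0.0:
        return 0.0
    log_y = math.log(y)
    total = 0.0
    for k in range(max_terms):
        # Each term is exponentiated only after combining the logarithms.
        term = math.exp((a + k) * log_y - y - math.lgamma(a + k + 1.0))
        total += term
        # Terms peak near k = y, so only stop once past the peak.
        if k > y and term <= tol * total:
            break
    return min(total, 1.0)


def waiting_density(t, n, buffer_size, lam, mu, theta, pi_n):
    """Density f_v^+(t) as written in the text; pi_n is supplied by the caller."""
    a = buffer_size - n
    y = (lam / mu) * (1.0 - math.exp(-theta * t))
    return n * mu * pi_n * (1.0 - regularized_lower_gamma(a, y))
```

For a = 1 the ratio reduces to 1 − exp(−y), which gives a quick check of the series.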
Within the simplest Erlang-A model, the buffer size B, which is the maximum system
capacity, is usually assumed to be infinite. However, if we set 𝐵 = ∞, the steady-state
solution $\pi_j$ of the Markov process includes infinite sums that can cause numerical
problems. To overcome this, the method of Palm (1946) is applied; this represents the
steady-state distribution, and some important performance measures, in terms of the gamma
function and the incomplete gamma function. So instead of calculating $\pi_k$ in terms of
B, we use the blocking probability 𝑃{𝐵𝑙} of an 𝑀/𝑀/𝑁/𝑁 model to solve this problem.
Unfortunately, the procedures and formulae for using 𝑃{𝐵𝑙} are not provided in Garnett et
al. (2002); nevertheless, these formulae are clarified in Chapter 5, and the results are
provided in Chapter 6.
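For orientation, the steady-state distribution of the M/M/N+M queue can also be obtained directly from its birth-death structure: the arrival rate is λ in every state, and the departure rate in state k is kμ for k ≤ N and Nμ + (k − N)θ for k > N. The sketch below is not Palm's gamma-function representation and not the dissertation's MATLAB implementation. It truncates the state space at a large level (the truncation heuristic is an assumption) and works with logarithms of the unnormalised probabilities to avoid overflow. P{Ab} and E[W] follow from the balance equation θ·E[queue length] = λ·P{Ab} and from Little's law.

```python
import math


def erlang_a_measures(lam, mu, theta, n, max_state=None):
    """Steady-state measures of M/M/n+M from a truncated birth-death chain."""
    if max_state is None:
        # Heuristic truncation level (assumption): far above any likely queue.
        load_bound = lam / theta
        max_state = n + int(load_bound + 10.0 * math.sqrt(load_bound)) + 100
    log_weights = [0.0]  # log of the unnormalised probability of state 0
    for k in range(1, max_state + 1):
        death_rate = k * mu if k <= n else n * mu + (k - n) * theta
        log_weights.append(log_weights[-1] + math.log(lam) - math.log(death_rate))
    shift = max(log_weights)  # subtract the maximum before exponentiating
    weights = [math.exp(w - shift) for w in log_weights]
    norm = sum(weights)
    pi = [w / norm for w in weights]
    mean_queue = sum((k - n) * p for k, p in enumerate(pi) if k > n)
    return {
        "pi": pi,
        "P_wait": sum(pi[n:]),                  # arrivals see the steady state
        "P_abandon": theta * mean_queue / lam,  # abandonment balance equation
        "E_wait": mean_queue / lam,             # Little's law
    }


# Hypothetical example: 50 calls per minute, mu = 1, average patience 1 minute.
result = erlang_a_measures(lam=50.0, mu=1.0, theta=1.0, n=50)
print(result["P_wait"], result["P_abandon"], result["E_wait"])
```

When θ = μ the number of customers in the system has a Poisson distribution with mean λ/μ, which provides an independent check of the computed delay probability.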
4.2 Approximation Method
The most significant and difficult feature of the approximation method is the estimation
of the abandonment rate and service grade parameters.
• Estimation of 𝜃:
The abandonment rate 𝜃 is difficult to measure since direct data collection is restricted.
A steady-state balance equation is therefore applied, equating (i) the rate at which
customers abandon the queue, 𝜃 ∙ 𝐸[# waiting in queue], and (ii) the rate at which
customers who will eventually abandon enter the system, 𝜆 ∙ 𝑃{𝐴𝑏}:
\[ \theta \cdot E[\#\,\text{waiting in queue}] = \lambda \cdot P\{Ab\} \]
By Little's law,
\[ E[\#\,\text{waiting in queue}] = \lambda \cdot E[W] \]
Therefore,
\[ \theta = \frac{P\{Ab\}}{E[W]} = \frac{\%\,\text{Abandonment}}{\text{Average Wait}} \quad \text{(Garnett et al., 2002)} \]
\[ \%\,\text{Abandonment} = \frac{E[W]}{E[R]} \quad \text{(Brown et al., 2005)} \]
Thus, we conclude that
\[ \theta = \frac{1}{E[R]} \]
where 𝐸[𝑅] is the expected time a customer is willing to wait (referred to as the average
duration of a customer's patience).
Indeed, the linear relation between 𝐸[𝑊] and 𝑃{𝐴𝑏} has already been verified using one
year of data from an Israeli bank call centre, covering 4158 hourly intervals (see
Figure 7). The left-hand graph is smeared with a “cloud” of data points, whereas for the
right-hand graph the authors used an aggregation procedure designed to place more
emphasis on the predominant patterns. The slope of the line is an estimate of 𝜃, and its
reciprocal gives an average patience of 446 seconds.
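The relation 𝑃{𝐴𝑏} = 𝜃 ∙ 𝐸[𝑊] suggests estimating 𝜃 as the slope of a line through the origin fitted to interval-level pairs of average wait and abandonment fraction. The sketch below uses an ordinary least-squares slope through the origin, which is an assumption of this example and not necessarily the aggregation procedure behind Figure 7; the data values are invented purely for illustration.

```python
def estimate_theta(average_waits, abandon_fractions):
    """Least-squares slope through the origin of P{Ab} against E[W]."""
    numerator = sum(w * p for w, p in zip(average_waits, abandon_fractions))
    denominator = sum(w * w for w in average_waits)
    return numerator / denominator


# Hypothetical hourly observations: average wait in seconds, fraction abandoning.
waits_sec = [20.0, 45.0, 80.0, 130.0]
abandon = [0.05, 0.10, 0.17, 0.30]
theta_hat = estimate_theta(waits_sec, abandon)
print("theta (per second):", theta_hat)
print("average patience (seconds):", 1.0 / theta_hat)
```

The reciprocal of the estimated slope is the implied average patience, in the same time unit as the waits.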

Figure 7. Probability to abandon vs. average waiting time. (Mandelbaum and Zeltyn, 2005)
Mandelbaum and Zeltyn (2005) also provided a more general approach, which is to
calibrate the patience parameter until the model estimates and the data more closely match
each other. The advantage of that method is the greater flexibility in choosing the
performance measure being matched, which in turn depends on the given ACD data.
• Estimation of service grade 𝛽 and 𝜖:
The estimation of the service grade depends on the operational regime. In the QED
(rationalized) operational regime, the staffing level is
\[ N = R + \beta\sqrt{R} \]
so the service grade 𝛽 is
\[ \beta = \sqrt{\mu/\lambda}\; N \left( 1 - \frac{\lambda}{N\mu} \right) \]
For the QD regime, the service grade is
\[ \epsilon = \frac{N\mu}{\lambda} - 1 \]
For the ED regime, the service grade is
\[ \epsilon = 1 - \frac{N\mu}{\lambda} \]
In practice, the value of 𝛽 lies in the range −0.5 < 𝛽 < 1.
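The three service-grade formulae above translate directly into code. The sketch below is a minimal illustration with hypothetical function names; it also shows that the QED expression is algebraically the same as (N − R)/√R with R = λ/μ.

```python
import math


def service_grade_qed(n, lam, mu):
    """beta = sqrt(mu / lam) * N * (1 - lam / (N * mu))."""
    return math.sqrt(mu / lam) * n * (1.0 - lam / (n * mu))


def service_grade_qd(n, lam, mu):
    """epsilon = N * mu / lam - 1 (overstaffing relative to the offered load)."""
    return n * mu / lam - 1.0


def service_grade_ed(n, lam, mu):
    """epsilon = 1 - N * mu / lam (understaffing relative to the offered load)."""
    return 1.0 - n * mu / lam


# Hypothetical example: 50 calls per minute, mu = 1 per minute.
lam, mu = 50.0, 1.0
offered = lam / mu
print(service_grade_qed(57, lam, mu), (57 - offered) / math.sqrt(offered))
print(service_grade_qd(60, lam, mu), service_grade_ed(40, lam, mu))
```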
4.3 Experiments design and Empirical Results

The aim of this experiment is to compare the approximation method with the exact method by
evaluating their performance measures for the 𝑀/𝑀/𝑁 + 𝑀 model. The parameter values are
assumed as follows: 𝜆 = 50; 𝜇 = 1; and the number of agents 𝑁, which is the variable in
each experiment, ranges over [20, 80]. There are three comparison experiments for each
performance measure, distinguished by the average patience time: 𝑎 = 10 min (very
patient); 𝑏 = 1 min (moderately patient); 𝑐 = 6 secs (very impatient). In total, five
performance measures are examined in this paper: P{W > 0}, P{Ab}, P{W > 10s},
P{Ab | W > 10s}, E[Wait | Served].
The results of the experiments are shown from two aspects: one is to show the theoretical
validity of the approximation; the other is to illustrate how the average (im)patience
time affects the performance measures.
Similar patterns were found for the performance results of very patient customers (Graph
A) and moderately patient customers (Graph B), in that the approximation and exact
lines fitted each other. However, this was not the case for the results of the
impatient customers (Graph C), particularly for the accuracy of the approximation of
P{Ab | W > 10 sec}. This was the result of the tiny value of P{W > 10 sec}. Since
P{Ab | W > 10 sec} = P{Ab; W > 10 sec} / P{W > 10 sec}, the smaller the denominator, the
greater the difficulty in calculating a precise value of P{Ab | W > 10 sec}. It is
therefore assumed that N equals 50 agents, and a simulation method was preferred over the
exact method. In addition, the figures below showing P{W > 0}, P{W > 10s} and
E[Wait | Served] show a relatively small yet consistent underestimation of empirical
values for medium-sized call centres.
As shown in the graphs, the fit of the approximation improves as the patience time
increases. Specifically, when the patience time is 10 minutes, the two lines (‘+’ for the
approximation, solid line for exact values) nearly coincide with each other. However,
when the patience time is 1 minute, the approximation results fall below the exact values.
Furthermore, the approximation method does not produce accurate results for the behaviour
of impatient customers contacting medium-sized call centres.

The next step was to analyse the influence of average patience time on the performance
measures. P{W > 0} is the fraction of customers who encounter a delay (Figure 8). When
the number of agents (N) equals 45, the values of P{W > 0} in graphs A, B and C, denoted
𝑃{𝑊𝑎 > 0}, 𝑃{𝑊𝑏 > 0} and 𝑃{𝑊𝑐 > 0} respectively, are as follows:
𝑃{𝑊𝑎 > 0} = 1.00
𝑃{𝑊𝑏 > 0} = 0.80
𝑃{𝑊𝑐 > 0} = 0.44
This shows that when there are 45 agents available, virtually all very patient customers
are delayed, 80% of moderately patient customers are delayed, and only 44% of impatient
customers are delayed. Therefore, the higher the patience time, the higher the
probability of customers waiting in the queue, and thus the greater the delay for
subsequent customers. In addition, when N is larger than 45 agents, there is a sharp
decrease in the probability of delay. This shows the economies of scale: it would be more
valuable if the call centre hired more agents (with the same financial cost per agent, but
with a reduction in the probability of delay in the queue). As shown in Figure 8, line b
is similar to line a, whereas line c follows a flatter trajectory and is thus not affected
by economies of scale. For line a:
𝑁 = 45: 𝑃{𝑊𝑎 > 0} = 1.00
𝑁 = 50: 𝑃{𝑊𝑎 > 0} = 0.80
𝑁 = 55: 𝑃{𝑊𝑎 > 0} = 0.40
𝑁 = 60: 𝑃{𝑊𝑎 > 0} = 0.13
It is also necessary to compare the performance of P{Ab} with P{Ab | W > 10sec}. Figure
9 shows the performance measure P{Ab}, wherein lines a, b and c nearly coincide in a
single line. This unified line equals 0.6 when N equals 20, and it nearly equals zero when
N = 80. As for the performance measure P{Ab | W > 10sec}, line a keeps the same shape.
However, lines b and c have smaller slopes than those in Figure 9. The pattern for line b
is attributed to the increase in P{W > 10sec}, and that for line c to the small
probability that very impatient customers wait longer than 10 seconds. Indeed, the results
for P{Ab | W > 10sec} are lower than for 𝑃{𝐴𝑏}, as the former ignores the probability of
‘balking’, i.e. customers who hang up before getting into the queue:
P{abandon} = P{balk} + 𝜃 ∙ E[Wait]. Nevertheless, most of the impatient customers leave the
queue within 6 seconds, and will not even wait for 10 seconds.
Figure 8. Approximation on P{W>0}
Figure 9. Approximation on P{Abandon}
Figure 10. Approximation on P{W>10sec}
Figure 11. Approximation on P{Ab|W>10sec}
4.4 Critical Review
Mandelbaum and Zeltyn (2007) examined two comparison experiments for a moderately
large call centre: (1) a comparison of the Erlang-A exact method against real data, and
(2) a comparison of Erlang-A approximation data against real data. In their experiments
they considered three performance measures: 𝑃{𝐴𝑏}, 𝐸[𝑊], 𝑃{𝑊 > 0}. Again, they sourced
their data from an Israeli call centre.

Their results for the comparison experiments are shown below (Figure 12). Perfect
agreement between the exact Erlang-A formula and the real data corresponds to the line
y = x, so the closer the points lie to this line, the better the Erlang-A formula
describes reality. In general, the exact formula yielded a respectable estimation of the
real data, although there were slight underestimates in the two left-hand graphs.
Figure 12. A comparison of the Erlang-A exact method against real data
For the second comparison experiment (Figure 13), the authors found that the approximation
provided somewhat larger values than the exact values for 𝑃{𝐴𝑏} and 𝐸[𝑊], but ones
which were closer to the real data. However, the results for 𝑃{𝑊 > 0} were less accurate,
as the formula generated lower values when predicting the probability of wait.
Figure 13. A comparison of Erlang-A approximation data against real data.
Nevertheless, in general the predictions of the exact Erlang-A method and of its
approximation are really close to the real data.

4.5 Sensitivity Analysis
The reliability of an application depends on the sensitivity of the model to its
parameters. The greater the sensitivity to a parameter, the stronger the influence of
that parameter on the margin of error in the results. Indeed, the arrival rate in M/M/N+M
is relatively large, so the uncertainty of the arrival rate deserves particular attention.
Whitt (2006b) carried out a sensitivity analysis of this example paper, applying the
approximation method to investigate the sensitivity of the Erlang-A model to its
parameters: the arrival rate 𝜆, the service rate 𝜇, the number of servers 𝑠 and the
individual abandonment rate 𝜃.
In order to evaluate the sensitivity, a direct approach is to calculate the derivatives of
a performance measure with respect to the parameters, but derivatives are difficult to
interpret. Instead, Whitt introduced the concept of ‘elasticity’. For example, if 𝑓(𝜆) is
the abandonment probability as a function of the arrival rate, with derivative 𝑓′, then
the arrival-rate elasticity of the abandonment probability is
\[ \varepsilon(f,\lambda) \equiv \frac{\lambda f'}{f} \equiv \frac{\lambda f'(\lambda)}{f(\lambda)} \]
This formula gives the percentage change in the abandonment probability resulting from a
small percentage change in the arrival rate.
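When no tractable formula for the derivative is available, the elasticity can be approximated numerically. The sketch below uses a central finite difference, which is an assumption of this example rather than the method of Whitt (2006b). The performance function is a hypothetical fluid-style stand-in, 1 − (capacity/λ), chosen only because its elasticity is easy to verify by hand.

```python
def elasticity(f, x, rel_step=1e-5):
    """Approximate x * f'(x) / f(x) with a central finite difference."""
    h = x * rel_step
    derivative = (f(x + h) - f(x - h)) / (2.0 * h)
    return x * derivative / f(x)


def toy_abandonment_probability(lam, capacity=100.0):
    """Hypothetical stand-in: fraction of arrivals in excess of service capacity."""
    return max(0.0, 1.0 - capacity / lam)


# Arrival-rate elasticity of the stand-in measure at lambda = 110.
print(elasticity(toy_abandonment_probability, 110.0))
```

For this stand-in the exact elasticity at λ = 110 is (100/110)/(1 − 100/110) = 10, so a 1% increase in the arrival rate raises the abandonment fraction by roughly 10%.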
Next, in order to calculate the derivatives, it is natural to differentiate the
performance measure directly, which is possible only when a tractable formula is
available. Usually we use the exact numerical algorithm for the 𝑀/𝑀/𝑁 + 𝑀 model. Whitt
(2006b) analysed the elasticities of the diffusion approximation in the QED regime of
Garnett et al. (2002) and showed that, when there is a large number of servers,
performance is remarkably sensitive to changes in the arrival rate and service rate, but
remarkably insensitive to changes in the abandonment rate. In the QED regime, the
elasticities with respect to the arrival rate, service rate and number of servers are all
of order O(√s) as s → ∞. However, when we investigate the elasticities in the ED regime,
we find a contrasting pattern, in which the sensitivity to the abandonment rate is large
(with elasticity nearly equal to 1), whilst the elasticities with respect to the other
parameters remain bounded (in the ED regime, the arrival-rate, service-rate and
number-of-servers elasticities are all of order O(1) as s → ∞).

In his paper, Whitt calculates the exact values of the derivatives and the elasticities
for the Erlang-A model, with the input values 𝜆 = 𝑠 = 100 and 𝜃 = 𝜇 = 1. His basic
experiment considers 9 cases, with 3 values of s (s = 10, 100, 1000) and 3 values of 𝜃
(𝜃 = 10, 1, 0.1). The values of the basic performance measures are shown in Table 4.
Table 4. Results for 9 experiments as a base.
In Table 4, SD(Q) is the standard deviation of the steady-state queue length; SD(N) is the
standard deviation of the steady-state number of customers in the system; SD(W) is the
standard deviation of the steady-state waiting time; and EN is the expected steady-state
number of customers in the system (waiting and in service).
The author then considers the arrival-rate elasticities 𝜀(𝑓, 𝜆), the service-rate
elasticities 𝜀(𝑓, 𝜇) and the abandonment-rate elasticities 𝜀(𝑓, 𝜃) in scaled versions.
Table 5. The abandonment-rate elasticities 𝜀(𝑓, 𝜃) of several performance measures.
Table 5 (in this paper) presents the abandonment-rate elasticities of the performance
measures. These elasticities are not large (less than 1.0), an indication that changes in
the abandonment rate result in proportionally smaller changes in those performance
measures. The arrival-rate elasticities 𝜀(𝑓, 𝜆) and service-rate elasticities 𝜀(𝑓, 𝜇) are
shown in Table 6 and Table 7. It is necessary to note that after the additional scaling by
√s, these elasticities indeed become of order O(1).
Table 6. Arrival-rate elasticities 𝜀 (𝑓, 𝜆).
Table 7. Service-rate elasticities 𝜀 (𝑓, 𝜇).
Table 8. The second derivatives of the performance measures of service rate.

Later in his paper, the author presents the second derivatives of the performance
measures with respect to the service rate (Table 8). The scaled second derivatives were
calculated by dividing the previous results by s again, which gives the reader
a clearer picture of how strongly the sensitivity of performance to the service rate grows as s
increases. The scaled second-derivative results were produced by modifying the
base case: s and 𝜃 are fixed, while the arrival rate varies over 𝜆 = 90,
100, 110. The next stage is to calculate the second derivatives of elasticities for patience
(mean time to abandon), arrival rate and service rate. This time, the arrival-rate and
service-rate elasticities of these performance measures are of the order O(√𝑠) = 10 (see footnote 9).
Chapter 5: Three Experiments Design
5.1 Introduction
This chapter details three methods: the exact calculation method, the
approximation method, and the simulation method. All of them seek to compute the
performance measures of an 𝑀/𝑀/𝑁 + 𝑀 queue for the call centre staffing problem. The
difference is that the first two methods apply closed-form formulae and return
deterministic results, whereas the third imitates the real-world system by drawing random
numbers, so its results carry sampling uncertainty. The objective is to derive a simulation
model and, following Garnett et al. (2002), to implement their exact calculation formulae
and approximation formulae in MATLAB. The same input data as in Garnett et al. (2002)
are used: 𝜆 (the arrival rate, 50 customers per minute); 𝜇 (the service rate,
1 customer served per minute per agent, so that the average handling time 1/𝜇 is 1 minute);
N (the staffing level, from 20 to 80 agents). The traffic intensity equals 𝜆/(N𝜇) and
therefore ranges from 50/80 = 0.625 to 50/20 = 2.5. Three values of
average patience are used, a = 10 min; b = 1 min; c = 6 sec (or 1/10 min), in order to set up
comparison experiments for each method.
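The input data above can be written down as a small parameter grid. The sketch below is only an illustration in Python (the dissertation's own implementation is in MATLAB); the variable names are hypothetical, and the abandonment rate 𝜃 is taken as the reciprocal of the mean patience, which follows from the exponential patience assumption.

```python
# Input data of the experiments, with time measured in minutes.
lam = 50.0                      # arrival rate (customers per minute)
mu = 1.0                        # service rate per agent (customers per minute)
staffing_levels = range(20, 81)  # N = 20, ..., 80 agents

# Traffic intensity rho = lam / (N * mu) for every staffing level.
rho = {n: lam / (n * mu) for n in staffing_levels}

# Mean patience in minutes for cases a, b, c and the matching
# abandonment rates theta = 1 / (mean patience).
mean_patience = {"a": 10.0, "b": 1.0, "c": 0.1}
theta = {case: 1.0 / minutes for case, minutes in mean_patience.items()}
```

With these values the system is overloaded (rho > 1) whenever N is below 50 and underloaded above it.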
9 O(√𝑆): Implies that the sensitivity of performance to the parameters is growing as s increases,
regardless of the direct scaling of the performance measures.

5.2 Exact Method
The purpose of this method is to find the steady-state solution of the Erlang-A
model (M/M/N+M). Garnett et al. (2002) used Method C (detailed in Chapter 4). In
this experiment Method C is applied, together with the function prescribed by Palm
(1946) to deal with the unlimited buffer size B. Following Mandelbaum and Zeltyn (2005),
𝐸1,𝑛 denotes the blocking probability in the M/M/N/N (Erlang-B) system with n agents.
The classic Erlang-B formula is
$$E_{1,n} = \frac{(\lambda/\mu)^{n} / n!}{\sum_{j=0}^{n} (\lambda/\mu)^{j} / j!}$$
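The simplest way to compute 𝐸1,𝑛 is by recursion, where 𝜌 = 𝜆/(𝑛𝜇) is the offered load
per agent when there are n agents (so 𝜌 changes at every step of the recursion):
$$E_{1,0} = 1; \qquad E_{1,n} = \frac{\rho E_{1,n-1}}{1 + \rho E_{1,n-1}}, \quad n \ge 1$$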
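The steady-state distribution is then
$$\pi_j = \begin{cases} \pi_n \cdot \dfrac{n!}{j! \cdot (\lambda/\mu)^{n-j}}, & 0 \le j \le n \\[2ex] \pi_n \cdot \dfrac{(\lambda/\theta)^{j-n}}{\prod_{k=1}^{j-n} \left( \frac{n\mu}{\theta} + k \right)}, & j \ge n+1 \end{cases}$$
where
$$\pi_n = \frac{E_{1,n}}{1 + \left[ A\!\left( \frac{n\mu}{\theta}, \frac{\lambda}{\theta} \right) - 1 \right] \cdot E_{1,n}}$$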
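A minimal numerical sketch of this distribution is given below, in Python rather than the dissertation's MATLAB. It works with logarithms of the ratios 𝜋j/𝜋n, in the spirit of the log transformation mentioned in this dissertation, and normalises them numerically instead of evaluating the function A explicitly; the truncation rule, the function name `erlang_a_exact` and the returned keys are assumptions made for illustration. P{Ab} is obtained from the rate balance P{Ab} = 𝜃·E[Q]/𝜆, where E[Q] is the mean queue length.

```python
import math


def erlang_a_exact(lam, mu, theta, n, drop=40.0, max_extra=1_000_000):
    """Steady-state measures of M/M/n+M from log-weights log(pi_j / pi_n)."""
    load = lam / mu
    # States 0..n: log of n! / (j! * load**(n - j)).
    log_w = [math.lgamma(n + 1) - math.lgamma(j + 1) - (n - j) * math.log(load)
             for j in range(n + 1)]
    peak = max(log_w)
    # States above n: add log(lam / (n*mu + k*theta)) term by term.
    log_term = 0.0
    for k in range(1, max_extra + 1):
        log_term += math.log(lam) - math.log(n * mu + k * theta)
        log_w.append(log_term)
        peak = max(peak, log_term)
        # Stop once the weights are decreasing and negligible.
        if lam < n * mu + k * theta and log_term < peak - drop:
            break
    # Log-sum-exp normalisation keeps every exponent non-positive.
    log_total = peak + math.log(sum(math.exp(v - peak) for v in log_w))
    pi = [math.exp(v - log_total) for v in log_w]
    mean_queue = sum((j - n) * p for j, p in enumerate(pi) if j > n)
    mean_busy = sum(min(j, n) * p for j, p in enumerate(pi))
    return {
        "p_wait": sum(pi[n:]),                 # P{W > 0}: all agents busy
        "p_abandon": theta * mean_queue / lam,  # P{Ab}
        "mean_queue": mean_queue,
        "mean_busy": mean_busy,
    }


# Case a (mean patience 10 min, theta = 0.1) with 20 agents.
case_a_20 = erlang_a_exact(50.0, 1.0, 0.1, 20)
```

A useful sanity check on any such implementation is the flow balance 𝜆(1 − P{Ab}) = 𝜇·E[busy agents], which must hold in steady state.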

5.3 Approximation Method
An approximation of the various performance measures of the Erlang-A model is a
valuable tool: it demands less computational effort and gives an
accurate empirical fit to the simple Erlang-A model. It is used here to ‘replicate’ the
experiments of Garnett et al. (2002). The approximation MATLAB code is provided in
Appendix A. Four
performance measures are used: P{W > 0}, P{Ab}, P{W > t}, and P{Ab|
W > t}. E[Wait|Served] is omitted, since no approximation
formula for it is given in the paper by Garnett et al. (2002). The measures are computed from the Erlang-A
parameters 𝜇, 𝜆, 𝜃 and t, with time measured in minutes, for every staffing
level from 20 to 80 agents. The results are displayed in Table 10.
The following formulae are used to approximate the
performance measures:
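$$P\{W > 0\} \approx w\!\left(-\beta, \sqrt{\mu/\theta}\right)$$
$$P\{Ab\} \approx w\!\left(-\beta, \sqrt{\mu/\theta}\right) \cdot \left[ 1 - \frac{h\!\left(\beta\sqrt{\mu/\theta}\right)}{h\!\left(\beta\sqrt{\mu/\theta} + \sqrt{\theta/(N\mu)}\right)} \right]$$
$$P\{W > t\} \approx w\!\left(-\beta, \sqrt{\mu/\theta}\right) \cdot \frac{h\!\left(\beta\sqrt{\mu/\theta}\right)}{\Psi\!\left(\beta\sqrt{\mu/\theta}, \sqrt{N\mu\theta}\,t\right)} \cdot e^{-\theta t}, \quad t \ge 0$$
$$P\{Ab \mid W > t\} \approx 1 - \frac{\Psi\!\left(\beta\sqrt{\mu/\theta}, \sqrt{N\mu\theta}\,t\right)}{\Psi\!\left(\beta\sqrt{\mu/\theta} + \sqrt{\theta/(N\mu)}, \sqrt{N\mu\theta}\,t\right)} \cdot e^{\theta t}, \quad t \ge 0$$
with the auxiliary functions
$$w(x, y) = \left[ 1 + \frac{h(-xy)}{y\,h(x)} \right]^{-1}, \qquad \Psi(x, y) = \frac{\phi(x)}{1 - \Phi(x + y)}$$
$$h(x) = \frac{\phi(x)}{1 - \Phi(x)} \quad \text{(hazard rate function of the standard normal distribution)}$$
$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2} \quad \text{(standard normal density function)}$$
$$\Phi(x) = \int_{-\infty}^{x} \phi(y)\,dy \quad \text{(standard normal distribution function)}$$
Table 9. Approximation formulae of the Erlang-A model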
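The first two formulae of the table can be sketched as follows. This is a Python illustration, not the MATLAB code of Appendix A. It assumes the usual square-root relation N = R + 𝛽√R with offered load R = 𝜆/𝜇, so that 𝛽 = (N − R)/√R; that relation is not restated in this chapter and is an assumption here, as are the function names. The normal tail 1 − Φ(x) is computed with `math.erfc`, which is adequate for the parameter range of these experiments but would underflow for very large arguments.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    """1 - Phi(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def hazard(x):
    """Hazard rate h(x) of the standard normal distribution."""
    return normal_pdf(x) / normal_tail(x)


def w(x, y):
    return 1.0 / (1.0 + hazard(-x * y) / (y * hazard(x)))


def erlang_a_approx(lam, mu, theta, n):
    """Approximate P{W > 0} and P{Ab} from the formulae of the table."""
    load = lam / mu
    beta = (n - load) / math.sqrt(load)      # assumed: N = R + beta * sqrt(R)
    ratio = math.sqrt(mu / theta)
    p_wait = w(-beta, ratio)
    x = beta * ratio
    p_ab_given_wait = 1.0 - hazard(x) / hazard(x + math.sqrt(theta / (n * mu)))
    return p_wait, p_wait * p_ab_given_wait


# Case b (mean patience 1 min, theta = 1) at the critically loaded
# staffing level N = 50, where beta = 0.
p_wait_50, p_ab_50 = erlang_a_approx(50.0, 1.0, 1.0, 50)
```

For 𝛽 = 0 and 𝜃 = 𝜇 the two hazard terms in w cancel, so the approximate delay probability is exactly one half.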
5.4 Simulation Method
Our simulation model is a straightforward discrete-event simulation. Its purpose
is to predict the long-term, steady-state behaviour of the
queueing system; random numbers are generated with the ‘mt19937ar’ (Mersenne Twister)
generator. The paper most closely related to ours is Robbins et al. (2010), who also used
simulation to examine the validity of the Erlang model in call centres, but for the basic
Erlang-C model, whereas we design a model for Erlang-A. The advantage of simulation is
its generality: in contrast to most analytic and numerical methods, it imposes no
restriction on the probability distributions involved. For example, several assumptions of
Erlang-A, such as a constant arrival rate, exponentially distributed service times and
exponentially distributed patience times, can be relaxed. In our model, an arriving call is
routed to the server that has been idle for the longest time. We assume that there is no
balking: customers decide only after joining the queue, at which point they have two
choices, (i) abandoning or (ii) waiting. We also assume that there are no priorities in the
queue, which follows the first-come-first-served discipline.
The simulation model was built in stages: first M/M/1 (a single server), then
M/M/N (N servers), and finally M/M/N+M (N servers and exponentially distributed
patience). The variables of these models include arrival times, departure times and
waiting times; the M/M/N+M model also contains virtual waiting time and abandonment
variables. The main differences between M/M/1 and M/M/N lie in the formulation of the
departure time and the waiting time:

Departure time: in the M/M/1 model, the departure time of the first customer is the sum of
its arrival time and service time. For every subsequent customer, the departure time is the
service time plus the maximum of the previous, (𝑖 − 1)th, customer’s departure
time and the 𝑖th customer’s arrival time. The maximum is needed because of waiting:
when the 𝑖th customer arrives, the previous customer may
still be in service. Likewise, in an M/M/N system, the departure time of each of the first N
customers equals its arrival time plus its service time. For subsequent customers,
however, the relevant quantity is the earliest time at which one of the N servers becomes
free, rather than the departure time of the immediately preceding customer: the departure
time is the service time plus the maximum of that earliest server-free time and
the 𝑖th customer’s arrival time. This is because customers need not depart in order of
arrival; the first customer might not finish before the second one does.
Waiting time: a customer waits if it arrives before a server is available.
For a model with one server the waiting time of the first customer
is zero; similarly, for a model with N servers the waiting time of the first N customers is
zero. The difference between the models becomes apparent when subsequent
customers join the queue. In the M/M/1 model, the waiting time of a subsequent
customer equals the previous customer’s departure time minus the new customer’s
arrival time, or zero if this difference is negative. In the M/M/N model, the
waiting time is the earliest server-free time minus the arrival time of the new customer,
again floored at zero.
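The departure-time and waiting-time rules described above can be sketched with a heap of server-free times. The example is a Python illustration (not the dissertation's MATLAB code) run on a small hand-made data set without randomness, so the recursion can be checked by hand; the function name `fcfs_waits` and the sample data are hypothetical. Setting the number of servers to 1 gives the M/M/1 rule.

```python
import heapq


def fcfs_waits(arrivals, services, n_servers):
    """Waiting and departure times of a first-come-first-served queue."""
    free_at = [0.0] * n_servers          # time at which each server is next free
    waits, departures = [], []
    for arrival, service in zip(arrivals, services):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)   # wait only if no server is free
        heapq.heappush(free_at, start + service)
        waits.append(start - arrival)
        departures.append(start + service)
    return waits, departures


arrivals = [0.0, 1.0, 2.0, 3.0]
services = [5.0, 5.0, 1.0, 1.0]
waits_two, departures_two = fcfs_waits(arrivals, services, n_servers=2)
waits_one, departures_one = fcfs_waits(arrivals, services, n_servers=1)
```

With two servers the third customer waits for the first server to free up at time 5, although it arrived after the second customer, who is still in service until time 6.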
Finally, there is the simulation model for the M/M/N+M system. In addition to the
considerations of the M/M/N model, patience time is taken into account in order to
determine whether a customer waits in the queue or hangs up. For the first N
customers the waiting time is zero, as before. For subsequent customers a new
random variable is drawn: the exponentially distributed patience time. The
waiting time is then the minimum of the time a customer has to wait
to get service (virtual waiting time, SV) and the time the customer is willing to wait
(patience, P), that is W = min(SV, P). If the patience is greater than the virtual waiting
time, the customer is served and the waiting time equals the virtual waiting time;
otherwise the customer abandons and the waiting time equals the patience.
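The rule W = min(SV, P) can be turned into a compact simulation. The sketch below is a Python illustration under the stated assumptions (Poisson arrivals, exponential service and patience, first-come-first-served, no balking); it is not the dissertation's MATLAB program, and the function name and returned keys are hypothetical. Python's `random` module is also based on the Mersenne Twister generator.

```python
import heapq
import random


def simulate_erlang_a(lam, mu, theta, n, customers, seed=0):
    """Simulate an M/M/n+M queue customer by customer."""
    rng = random.Random(seed)
    free_at = [0.0] * n          # heap of server-free times
    clock = 0.0
    abandoned = delayed = 0
    total_wait = 0.0
    for _ in range(customers):
        clock += rng.expovariate(lam)                    # next arrival
        virtual_wait = max(0.0, free_at[0] - clock)      # SV
        patience = rng.expovariate(theta)                # P
        if virtual_wait > 0.0:
            delayed += 1
        if patience >= virtual_wait:
            # Served: occupy the earliest-free server.
            start = clock + virtual_wait
            heapq.heapreplace(free_at, start + rng.expovariate(mu))
            total_wait += virtual_wait
        else:
            # Abandons after waiting for its patience; servers unaffected.
            abandoned += 1
            total_wait += patience
    return {
        "p_abandon": abandoned / customers,
        "p_wait": delayed / customers,
        "mean_wait": total_wait / customers,
    }


# Heavily overloaded case: 20 agents, mean patience 1 minute.
result = simulate_erlang_a(50.0, 1.0, 1.0, 20, customers=20000, seed=1)
```

In such an overloaded system the agents are almost always busy, so the fraction of abandoning customers should be close to 1 − N𝜇/𝜆 = 0.6, which gives a quick plausibility check on the output.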
Chapter 6: Computational Results
6.1 Exact Results using MATLAB
Table 9 shows our results compared with those of Garnett et al. (2002). The left-hand graphs
show the MATLAB results generated with the formulae described in Chapter
5. The right-hand graphs (solid lines only) show the exact-method results from
Garnett’s study. In our own results, cases a, b and c are represented by blue, red and
green lines respectively. Overall, the results in the figures are almost identical, except for
the sensitivity of 𝑃{𝑊 > 0} to the number of agents. The MATLAB results show
a delay probability of practically zero at around n = 55 agents, which is 15 agents fewer
than in Garnett’s results. Reaching zero delay probability with fewer
agents seems to be a benefit; however, it could also indicate that this method is unstable with
respect to staffing levels. There are small differences between the 𝑃{𝑊 > 10} values in
the two figures, although these may have arisen from differences in the scale of the graphs. Most
importantly, the numerical difficulties reported for 𝑃{𝐴𝑏|𝑊 > 0} in Garnett’s paper
appear to have been overcome. In Garnett’s paper, the exact graphs for cases b and c were
not calculated for all values up to 80 agents; the authors stated that this was because of
numerical difficulties in obtaining these values, and that they had addressed the problem
by simulation (the simulation results are not shown in the graph). In this
dissertation the problem was tackled with a log transformation, so that the curves
are no longer restricted to 50 agents and cover a
wider range of staffing levels. Full details are provided in Appendix C.
Measures | Own Work | Garnett’s Work
(Blue line for ‘a’, red line for ‘b’, green line for ‘c’)
[Graphs not reproduced. Rows: P{W>0}, P{Ab}, P{W>T}, P{Ab|W>T}]
Table 9: Exact method comparison between our results and Garnett’s results.

6.2 Approximation Results using MATLAB
Four important performance measures, P{W > 0}, P{Ab}, P{W > T} and P{Ab|W > T},
were calculated by replicating both the approximation formulae and the data for modelling
abandonment in the paper by Garnett et al. (2002). The results are displayed in Table 10.
As before, the left-hand graphs (blue for ‘𝑎’, red for ‘𝑏’, green for ‘𝑐’)
are our MATLAB results and the right-hand graphs (‘+’ sign lines only) are
Garnett’s approximation results. As a whole, the MATLAB results are similar, but not
identical, to the published ones: the
graph patterns for P{W > 0}, P{W > T} and P{Ab|W > T} fit those in
the journal paper well, but this is not so for P{Ab}. The MATLAB results show three separate
lines (a, b, c) rather than the single line plotted by Garnett et al. (2002), who
produced only one graph (for b) because the other two lines
coincided with it, which implies that the abandonment probability P{Ab} with 𝜌 > 1 is
independent of the abandonment rate 𝜃. In the MATLAB results, however, at around
N = 43 the ‘very impatient’ line shows a low fraction of abandoning customers. This
need not indicate a high level of service; it may instead reflect an urgent need for the service.
Measures | Own Work | Garnett’s Work
[Graphs not reproduced. Rows: P{W>0}, P{Ab}, P{W>T}, P{Ab|W>T}]
Table 10: Approximation method comparison between our results and Garnett’s results.
6.3 Simulation Results using MATLAB
In order to evaluate the results of the simulation model, they were compared against the
exact calculation method (also implemented in this dissertation), using the
same values of the input parameters. Roughly speaking, the simulation results fit
the data generated by the exact calculation well. Across the different patience times of
cases a, b and c, the more impatient the customers, the better the fit
between the exact and simulated data: cases c (green lines) and
b (red lines) fit better than case a (blue lines). However, as discussed, although the
exact method replicates the input data and formulae of Garnett et al. (2002), there
is a small margin of error that is difficult to eliminate, and this in turn contributes to a relatively
larger margin of error between the exact and simulation
results. Furthermore, the simulation results become more reliable as the number of
iterations (‘maxiter’) increases; a small number of iterations, and hence
a small number of random variates, produces a visible deviation.
To demonstrate this, simulation results for three values of maxiter (100; 1000; 10000)
are shown in Figure 14. They illustrate that the larger the number of iterations,
the more accurate the simulation.
We also raised maxiter to 100000, but this produced almost the same results
as maxiter = 10000. To keep the MATLAB running time reasonable,
we therefore chose maxiter = 10000. Our simulation
results (‘x’ sign lines) are compared with the exact method (solid lines) below.
(Blue line for ‘graph a’, red line for ‘graph b’, green line for ‘graph c’)
Figure 14. Comparison between our simulation and exact calculation method of four
performance measures (maxiter = 10000).
[Graphs not reproduced. The same comparison of the four performance measures follows for maxiter = 1000 and for maxiter = 100.]
6.4 Comparison among Three approaches
With our exact results validated against Garnett’s, the
exact method forms a bridge linking the simulation and approximation methods. The results
generated by the three methods are illustrated in the table below:
Measures | Simulation (‘x’ signs) vs Exact | Exact vs Approximation (‘+’ signs)
[Graphs not reproduced. Rows: P{Ab}, P{W>0}, P{W>T}, P{Ab|W>T}]
Table 11. Comparison among all three methods

For line c (the green line, 6 seconds of patience), simulation gives an accurate
result for P{W>T}, and approximation gives an accurate result for P{Ab}. For P{W>0},
however, neither the simulation nor the approximation is accurate over the whole range:
the simulation fits the exact method (solid line) well when the number of staff is below 40,
and the approximation fits almost perfectly when the number of staff is above 40.
For P{Ab|W>T}, only the simulation
yields a precise prediction, and only for small staffing levels.
For lines b and a, the approximation results are more accurate than the simulation
results for all four performance measures. It is therefore recommended to use the
approximation method when customers are moderately patient (i.e., 1 min patience)
or very patient (i.e., 10 min patience). When customers are very
impatient (i.e., 6 sec patience), it is best to choose the method according to the
performance measure, even though using both models to determine the required staffing
level may be time-consuming. Overall, it is recommended to use the approximation
method for insight and calibration, and the simulation method for fine tuning.
Chapter 7: Conclusions
In modern firms, determining the optimal number of agents in a medium-to-large
call centre is a crucial problem, especially when customers are impatient. Three
methods that address this problem are examined in this dissertation: the exact
calculation method, the approximation method and the simulation method, each used
to generate performance measures of the Erlang-A (M/M/N+M) model.
Generally speaking, the data produced by the approximation
method fit the data generated by the exact method well, and we found that the
approximation method can accurately calculate the staffing level for customers with
moderate-to-large patience times. In addition, the approximation method proved to be
superior to the exact one for some performance measures, such as P{Ab} and E[W].
Furthermore, the simulation method was compared against the exact method. The results
showed that the simulation model can be useful at small staffing levels for customers whose
patience is only a few seconds, but for large numbers of staff the approximation still provides
the best accuracy. Our recommendation is therefore to use the approximation and exact methods
for insight and calibration, and simulation for fine tuning. However, all three methods have
limitations in terms of both their assumptions and the methods themselves, so several
directions for future research remain.
7.1 Limitations of Erlang-A model
Analytical models are commonly contrasted with simulation, which has been gaining ground. This is partly due to the
improved accessibility of simulation tools, partly due to the scarcity of the mathematical skills
required for analytic models, but perhaps mostly due to the broadening gap between the
complexity of modern call centres and the parameters of the analytical models.
Roughly speaking, analytic models are limited by their assumptions, and
simulation models are limited by the accuracy of their results as well as by the
randomness of parameters under relaxed assumptions. This section outlines
several limitations in both assumptions and methods.
Limitation on Assumptions
 Exponentially distributed Patience time
It is assumed that patience time is exponentially distributed, but this is not necessarily the
case in practice. Figure 17 shows estimates of the hazard rate (footnote 10) of customers’ patience
for two bank call centres: a large U.S. bank and a small Israeli bank. For
both call centres there are clearly non-exponential patterns. For the U.S.
bank there is a high probability of abandonment at the beginning of the wait,
which becomes more stable after around 10 seconds. The pattern for the Israeli bank
includes two surges of abandonment: one at the beginning and another
peak at around 60 seconds. Mandelbaum and Zeltyn (2005) describe these
patterns as results of human behaviour; for instance, the ‘10 seconds mark’
reflects a fairly normal point at which many customers change their minds. Such
behaviour is not captured by the exponential assumption used in this dissertation, so the
validity of the Erlang-A formula still needs to be considered.
10 Hazard rate of an exponential random variable is a constant.
 Exponentially distributed service time
The Erlang-A model assumes that the service time follows an exponential distribution.
However, several empirical analyses show that a lognormal distribution (as mentioned
in Chapter 3.3; see footnote 11) fits the service time distribution better (Gans et al. 2003, Brown
et al. 2005).
 Constant Arrival Rate
Judging by the results of the sensitivity analysis, the Erlang-A model is
quite sensitive to changes in the arrival rate, the service rate and the number of agents.
In the Erlang-A formula, it is assumed that during each time interval
arrivals follow a homogeneous Poisson process and that call handling times (service
times) are exponentially distributed. It is justifiable to treat the service rate as
constant, but it is unrealistic to assume a constant arrival rate. Jongbloed and Koole (2001)
proposed a Poisson mixture model that takes the uncertainty of the arrival rate into account
while remaining related to the Erlang formula, and which incorporates the over-dispersion
associated with a random arrival rate.
11 Garnett et al.’s paper was published in 2002. In recent years, many researchers have
identified and addressed its limitations, so only the limitations are mentioned here. The improved
approaches are discussed in the literature review (Chapter 3.3).
The method has two steps: first a rate 𝜆 is drawn from the mixing distribution 𝐻 on (0, ∞),
and then a Poisson variable with that rate
is generated. The distribution of 𝑋 is then a Poisson mixture with mixing distribution 𝐻:
$$P_H(X = x) = \int_0^{\infty} \frac{\lambda^{x}}{x!} e^{-\lambda} \, dH(\lambda)$$
The realized values x1, x2, …, xk are observations of independent and identically distributed
random variables X1, X2, …, Xk, each distributed as X. However, 𝐻 is not known,
so it has to be estimated from the data. Two approaches can be used: one is
parametric (estimate the distribution by estimating its parameters); the other estimates
the distribution function 𝐻 non-parametrically by maximum likelihood.
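The two-step sampling of a Poisson mixture can be sketched as follows. The gamma mixing distribution and its parameters are assumptions chosen purely for illustration (the text above leaves 𝐻 unspecified), and the function names are hypothetical. The Poisson draw counts exponential inter-arrival times within one time unit, which keeps the example within the Python standard library.

```python
import random


def poisson_draw(rng, rate):
    """Number of arrivals of a rate-`rate` Poisson process in one time unit."""
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def poisson_mixture_sample(rng, shape, scale, size):
    """Step 1: draw a rate from H (here a gamma); step 2: draw a Poisson count."""
    sample = []
    for _ in range(size):
        rate = rng.gammavariate(shape, scale)   # mean rate = shape * scale
        sample.append(poisson_draw(rng, rate))
    return sample


rng = random.Random(1)
counts = poisson_mixture_sample(rng, shape=4.0, scale=12.5, size=4000)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
```

For a plain Poisson variable the variance equals the mean; in the mixture the variance is inflated by the variance of the random rate, which is the over-dispersion the model is meant to capture.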
Limitation on Methods:
 Exact method
Usually, the performance measures of the exact method are expressed in the form 𝐸[𝑓(𝑉, 𝑋)].
However, this form is limited in its capacity to represent quotients. P{Ab|W > 0} is an
example: this performance measure has no direct expression as 𝐸[𝑓(𝑉, 𝑋)]; instead, the
only way to obtain it is as a function of other
performance measures (as shown below). Hence its accuracy
depends heavily on the accuracy of those other measures, which may lead to
fluctuations in accuracy. If t is negligibly small, P{Ab|W > 0} can be
replaced by P{Ab|W > t}, which is given by
$$P\{Ab \mid W > t\} = \frac{P\{W > t;\ Ab\}}{P\{W > t\}} = \frac{P\{(W > t) \cap Ab\}}{P\{W > t\}} = \frac{E\left[\mathbb{1}_{(t,\infty)}(V \wedge X)\,\mathbb{1}_{(X,\infty)}(V)\right]}{E\left[\mathbb{1}_{(t,\infty)}(V \wedge X)\right]}$$
 Simulation method
Although the simulation results computed in Chapter 6 are reasonable,
simulation also has some important drawbacks. First, in order to estimate the
probability of an event accurately, one needs to collect many independent
observations of it in the simulation run; hence the simulation method can be very
time-consuming. In addition, a simulation run in principle yields estimates for only one set of
parameter values, so for many parameter values the simulation has to be
repeated many times. Moreover, simulation is unsuitable for problems that involve
estimating the probabilities of rare events, i.e., events with a very low probability of
occurrence (e.g., 10^−6 or less) (de Boer, 2000). Such events are of much interest in
queueing models of telecommunications systems, since these are designed to have a very
low packet loss (footnote 12) probability to guarantee a good quality of service. However, such low
probabilities imply that the system can be simulated for a long time without the event
occurring even once.
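The scale of the rare-event problem can be made concrete with a short calculation. For n independent observations, the estimated probability of an event with true probability p has standard error √(p(1 − p)/n), so reaching a relative error r requires about (1 − p)/(p·r²) observations. The sketch below evaluates this standard result; the function name and the 10% target are illustrative assumptions.

```python
import math


def required_samples(p, rel_error):
    """Observations needed so that the standard error equals rel_error * p."""
    return math.ceil((1.0 - p) / (p * rel_error ** 2))


# An event of probability 1e-6 estimated to within 10% relative error.
n_rare = required_samples(1e-6, 0.1)
# A common event of probability 0.5 at the same relative error.
n_common = required_samples(0.5, 0.1)
```

The rare event needs on the order of 10^8 observations, against about 100 for the common one, which is why plain simulation is impractical for loss probabilities of this size.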
 Diffusion approximation and square-root staffing
Mandelbaum and Zeltyn (2009) found that square-root staffing in a model with
abandonment was not as robust as in the model without abandonment. For the Erlang-A
model, they also observed that the square-root staffing rule is far from optimal. Zhang et al. (2010)
established a staffing refinement as a characterization of the optimality gap of
conventional square-root staffing. The conventional square-root staffing rule is 𝑆∗ = 𝜆 + 𝛽∗√𝜆.
Implementing the results of Mandelbaum and Zeltyn (2009), the optimal staffing level for
the Erlang-A model has the form
$$S_{opt} = \lambda + \beta^{*}\sqrt{\lambda} + \beta^{\bullet} + \mathcal{O}(\lambda^{-1/2})$$
and a corrected diffusion approximation for the objective function was developed.
The refined staffing rule is of the form
$$S^{\bullet} = \lambda + \beta^{*}\sqrt{\lambda} + \beta^{\bullet}$$
so that
$$S_{opt} - S^{\bullet} = \mathcal{O}(\lambda^{-1/2})$$
12 Packet loss refers to lost data or lost calls, which usually occur under congestion. A good estimate of the
packet loss probability helps the manager to raise the service level.

Zhang et al. (2010) refer to the difference between the exact
optimal staffing level 𝑆opt and the approximate staffing level 𝑆•, which is of order
𝒪(𝜆^(−1/2)), as the optimality gap.
This gap shrinks as 𝜆 increases, so the staffing level 𝑆• becomes more
accurate for larger systems. Similarly, the difference between the conventional staffing level
𝑆∗ and the refined staffing level 𝑆• is 𝛽•, which is of order 𝒪(1) (because 𝑆• = 𝑆∗ + 𝛽•). This indicates
that 𝑆• is a more accurate prescription than 𝑆∗, since
$$S_{opt} - S^{*} = \beta^{\bullet} + \mathcal{O}(\lambda^{-1/2})$$
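The two staffing rules differ only by the constant correction term, which the following sketch makes explicit. The function name, the rounding up to a whole number of agents and the values of 𝛽∗ and 𝛽• used in the example are all hypothetical; the actual constants come from Zhang et al. (2010) and are not reproduced here. As in the formulae above, the service rate is taken as 1, so the offered load equals 𝜆.

```python
import math


def square_root_staffing(offered_load, beta, correction=0.0):
    """Staffing level: load + beta * sqrt(load) + optional O(1) correction."""
    level = offered_load + beta * math.sqrt(offered_load) + correction
    return math.ceil(level)       # agents come in whole numbers


# Conventional rule S* and refined rule with a hypothetical correction of 0.5.
conventional = square_root_staffing(100.0, beta=1.0)
refined = square_root_staffing(100.0, beta=1.0, correction=0.5)
```

Because the correction is of order 1 while the safety staffing grows like √𝜆, the refinement matters most, in relative terms, for small systems.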
In addition, Zhang et al. (2010) provide a refined approximation
formula. Although the expression is complicated, its computation is not difficult.
Tables 12 and 13 (Zhang et al. 2010) show that, for both a large and a small system and
regardless of whether customer patience is high or low, 𝑆• provides a far more
accurate approximation of 𝑆opt than 𝑆∗.
Table 12. Large system with high abandonment rate and high call volume: P{W > 0} = 𝜖; 𝜃 =
100; 𝜆 = 3000.
Table 13. Small system with high abandonment rate and low call volume: P{W > 0} = 𝜖; 𝜃 =
10; 𝜆 = 30.
In closing, Zhang et al. (2010) found that the refinement can be significant when the constraint is
tight, regardless of the customer patience level or the system size. It is therefore highly
recommended that the refined square-root staffing rule be used for call centres of
any size. The performance of the refined square-root staffing rule
(in terms of the Erlang-A approach) is still highly regarded among researchers.
7.2 Implementation in Business
The results of this dissertation may prove beneficial to the call centre sector, especially
to large call centres. According to Deloitte’s 2013 Global Contact Centre survey,
77% of contact centres expected to retain or expand their capacity during 2014 and 2015.
Large call centres have several advantages. Firstly, larger centres employ a larger pool
of agents, so fewer customers lose patience and the
probability of abandonment is minimal. (The probability of abandonment may also approach zero owing
to urgent needs for the service rather than a high service level, but it is more likely that only a
small percentage of customers have such urgent needs.) In addition, large call centres
enjoy greater economies of scale and are better placed to achieve improvements in
both service quality and cost performance. This was demonstrated by The Customer
Group in 2001, as shown in Table 14 below. With higher staffing levels, an agent can
handle more calls at a given service level than she or he could in a small team. More
importantly, with sustained call volumes and specified working hours for agents, the
workload ratio falls, resulting in reduced pressure on agents and relatively good service
quality.
Table 14. Economies of Scale Staffing Example from ‘The Customer Group’.
Moreover, larger call centres provide much better command and control capabilities than
small ones. It is easier to manage a continuously operating (24/7) centre across
multiple sites or within one large building than to network many small centres, which is
both technically and economically infeasible.
Another recommendation for call centre managers is to consider the quantitative models
described in this dissertation. The analytical and simulation models can be used to define
benchmarks and may help managers to make informed decisions. Quantitative methods
include monitoring, data analysis, and making quantitative changes in call centre
performance. Managers should track relevant performance indicators continuously, rather
than changing course only when those indicators reach unacceptable levels. In other words, call centre
management should be pro-active rather than reactive. For instance, managers should
ensure in advance that customers are not kept waiting for unreasonably long periods of
time, rather than hurriedly conscripting more agents when there is excessive demand.
7.3 Future Research
One direction for future research is the further development of refined square-root
staffing for the Erlang-A model. Such a refinement has been developed for the Erlang-C model and has
proved to be significantly more accurate. Another issue that merits further research is the
study of models that consider abandonment, retrials and after-call work
concurrently.

Reference Lists:
Alfares, H.K. 2007. "Operator staffing and scheduling for an IT-help call centre." European
Journal of Industrial Engineering 1(4), pp. 414-430.
Armony, M., and Maglaras, C. 2004a. "Contact centres with a call-back option and real-
time delay information." Operations Research 52(4), pp. 527-545.
Armony, M., and Maglaras, C. 2004b. On customer contact centres with a call-back option:
Customer decisions, routing rules and system design. Operations Research 52(2), pp.
271–292.
Aksin, Z., Armony, M., and Mehrotra, V. 2007. "The Modern Call Centre: A Multi‐
Disciplinary Perspective on Operations Management Research." Production and
Operations Management 16(6), pp. 665-688.
Armony, M. 2005 "Dynamic routing in large-scale service systems with heterogeneous
servers." Queueing Systems 51(3-4), pp. 287-329.
Armony, M., and Mandelbaum, A. 2011. "Routing and staffing in large-scale service
systems: The case of homogeneous impatient customers and heterogeneous servers."
Operations research 59(1), pp. 50-65.
Atlason, J., Epelman, M. A., and Henderson, S. G. 2004. "Call centre staffing with
simulation and cutting plane methods." Annals of Operations Research 127(1-4), pp. 333-
358.
Avramidis, A. N., Chan, W., and L'Ecuyer, P. 2009. "Staffing multi-skill call centres via
search methods and a performance approximation." Iie Transactions 41(6), pp. 483-497.
Aguir, S., Karaesmen, F., Akşin, O. Z., and Chauvet, F. 2004. “The impact of retrials on
call centre performance.” OR Spectrum, 26(3), 353-376.
Artalejo, J. R., and Pozo, M. 2002. "Numerical calculation of the stationary distribution of
the main multiserver retrial queue." Annals of Operations Research 116(1-4), pp. 41-56.
Artalejo, J. R., and Pla, V. 2009."On the impact of customer balking, impatience and
retrials in telecommunication systems." Computers & Mathematics with Applications
57(2), pp. 217-229.
Artalejo, J. R., and Lopez-Herrero, M. J. 2010. "Cellular mobile networks with repeated
calls operating in random environment." Computers & operations research 37(7), pp.
1158-1166.
Baccelli, Francois, and Gerard Hebuterne.1981. "On queues with impatient customers."
Baron, O., and Milner, J. 2009."Staffing to maximize profit for call centres with alternate
service-level agreements." Operations Research 57.3, pp. 685-700.
