This document summarizes a paper that discusses the presence of "fat-tailed" inputs in large manufacturing systems that produce documents. Fat-tailed distributions have long tails which result in a higher probability of extreme or outlier observations compared to normal distributions. The paper describes how job size distributions in some large print shops have been observed to follow fat-tailed statistical patterns. This leads to high variability and difficulties in predicting performance. The paper proposes designing autonomous production cells and dynamic scheduling policies to improve productivity in such manufacturing systems affected by fat-tailed inputs. A case study example found improvements including reduced throughput times, defects, and increased productivity after implementing the proposed solution.
Fat-tail inputs in manufacturing systems (Industrial Engineering Research Conference paper)
Proceedings of the 2008 Industrial Engineering Research Conference
J. Fowler and S. Mason, eds.
Fat-tail inputs in manufacturing systems
Sudhendu Rai
Xerox Corporation
800 Phillips Road, MS 128-51E, Webster, NY 14580
Abstract
The increase in product diversity and customization has led to the emergence of extreme levels of variability in
some production environments. In this paper, we will discuss and characterize some large manufacturing systems
that produce documents and where we have observed inputs (e.g. job-size) that have fat-tail statistical distributions.
The consequence of these on manufacturing productivity will be described and novel approaches to effectively
design and operate such systems will be presented.
Keywords
Fat-tails, manufacturing systems, autonomous cells
1. Introduction
Print shops are document manufacturing systems. They take raw material and information as input and through a
series of processing steps create the final finished document products such as books, brochures, checks, invoices and
the like. High variability as characterized by fat-tailed distributions in job sizes, job inter-arrival rate and job mix has
been observed in large print shops. Although fat-tail phenomena have been observed in internet traffic [2] and
computer workload on distributed processors [7] this behavior has not yet been reported in manufacturing systems
literature.
Large print shops are the first manufacturing systems reported to exhibit power-law distribution characteristics, and
in many instances – a fat tail. An example is given in Figure 1.
Figure 1: Fat-tailed job size distribution from a print shop
Definition of fat-tailed distribution. Let X be a random variable with cumulative distribution function (cdf) F(x) =
P[X ≤ x] and complementary cdf (ccdf) Fc(x) = P[X > x]. We say here that a distribution F(x) is fat-tailed if

$$F_c(x) \sim c\,x^{-\alpha}, \qquad 0 < \alpha < 2 \tag{1}$$

in the limit x → ∞, where c is a positive constant. In this limit,

$$\lim_{x \to \infty} \frac{d \log F_c(x)}{d \log x} = -\alpha \tag{2}$$
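As a brief worked step (a sketch that assumes the approximation in (1) holds with equality for large x), taking logarithms shows why the tail of the ccdf appears as a straight line of slope −α on a log-log plot:

```latex
\begin{align*}
F_c(x) &\approx c\,x^{-\alpha} \\
\log F_c(x) &\approx \log c - \alpha \log x \\
\frac{d \log F_c(x)}{d \log x} &\approx -\alpha
\end{align*}
```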
The theory of stable distributions [5] [13] has been used to characterize the behavior of these distributions. A
discussion of efficient methods for estimating the parameters and properties of these distributions can be found in
[9]. Fat-tailed distributions behave quite differently from distributions such as normal distributions or exponential
distributions that are most frequently used in characterizing manufacturing system behavior. Because their tails
decline relatively slowly, the probability of very large observations is not negligible. Fat-tail distributions have
infinite variance indicating high variability in the underlying process that generates these distributions.
In this paper, we propose a shop design and scheduling approach for large shops with fat-tail job size
distribution inputs. The implementation of the solution in a real document manufacturing facility has been shown to
significantly improve productivity.
2. Shop design and scheduling policy design in the presence of fat-tail inputs
A simple method to detect the presence of a fat-tail distribution is to plot ln(CCDF) against ln(JobSize) as shown in
Figure 2. As seen from equation (2), the slope of this plot for large job sizes will be equal
to −α. The slope of the curve in Figure 2 towards its end (i.e. for large job sizes) is -0.56, which gives α ≈ 0.56
and, since 0 < α < 2, implies a fat-tail distribution.
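The detection method can be sketched in code. The following is a minimal illustration, not the procedure used in the paper: the function names, the choice of the largest 10% of observations as "the tail", and the synthetic Pareto data are all assumptions made for the example. It builds the empirical CCDF, fits a least-squares line to ln(CCDF) versus ln(JobSize) over the tail, and reads the tail index as the negative of the slope.

```python
import math
import random


def empirical_ccdf(sizes):
    """Return (x, P[X > x]) pairs for the sorted sample.

    The largest observation is skipped because its empirical CCDF is 0,
    which has no logarithm. Sizes are assumed to be positive.
    """
    xs = sorted(sizes)
    n = len(xs)
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs[:-1])]


def tail_slope(sizes, tail_fraction=0.1):
    """Least-squares slope of ln(CCDF) versus ln(size) over the largest jobs.

    tail_fraction is a hypothetical tuning parameter, not a value from the paper.
    """
    points = empirical_ccdf(sizes)
    k = max(2, int(len(points) * tail_fraction))
    tail = [(math.log(x), math.log(p)) for x, p in points[-k:]]
    mean_x = sum(x for x, _ in tail) / len(tail)
    mean_y = sum(y for _, y in tail) / len(tail)
    sxx = sum((x - mean_x) ** 2 for x, _ in tail)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in tail)
    return sxy / sxx


# Synthetic job sizes drawn from a Pareto distribution with tail index 0.56
# (chosen only to mirror the slope quoted in the text).
random.seed(0)
synthetic_sizes = [random.paretovariate(0.56) for _ in range(20000)]
alpha_hat = -tail_slope(synthetic_sizes)
print("estimated tail index:", round(alpha_hat, 2))
```

A straight-line fit on the log-log plot is the simplest estimator and is sensitive to where the tail is assumed to begin, so the estimate should be checked against several values of `tail_fraction`.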
Fat-tailed distributions have many small jobs mixed with a few large jobs. Therefore, even though most of the job
sizes are small, the major contribution to the sample mean or variance comes from the few large observations. In such
distributions, the difference between the mean and the median is usually very large. For example, for the print shop data shown in
Figure 1, the mean is 17170 and the median is 260.
Figure 2: Ln(CCDF) versus Ln(JobSize) plot of job size data from a print shop (plot title: Heavy-tailed distribution determination)
2.1 Calculation of moments and steady state
To analyze the behavior of the sample mean, we are concerned with the convergence properties of sums of random
variables. The normal starting point for such discussions would be the Central Limit Theorem (CLT). Unfortunately,
the CLT applies only to sums of random variables with finite variance, and so does not apply in this case. In the
place of the CLT we instead have limit theorems for fat-tailed random variables first formulated by Lévy [5] [13].
To introduce these results we need to define the notation $A \xrightarrow{d} B$, which means that the random variable A
converges in distribution to B (roughly, has distribution B for large n). Then the usual CLT can be stated as: for
X_i i.i.d. and drawn from some distribution F with mean µ and variance σ² < ∞, define

$$A_n = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{3}$$

and

$$Z_n = n^{1/2} (A_n - \mu) \tag{4}$$

then

$$Z_n \xrightarrow{d} N(0, \sigma^2) \tag{5}$$

where N(0, σ²) is a Normal distribution.
However, if the X_i are i.i.d. and drawn from some distribution F that is fat-tailed with tail index 1 < α < 2, then if we
define

$$Z_n = n^{1 - 1/\alpha} (A_n - \mu) \tag{6}$$

we find that

$$Z_n \xrightarrow{d} S_\alpha \tag{7}$$

where S_α is an α-stable distribution [22].
The consequence of equation (6) is that as α gets smaller, the convergence of the sample mean to the population
mean becomes very slow – much slower than in the case of distributions with finite variance. In Figure 3 we show
the sample mean calculations performed on samples drawn from two distributions – one being normal and another
that is fat-tailed. It shows how the mean of the sample drawn from normal distribution converges at 100 data-points
but the mean of samples drawn from a fat-tail job size distribution of a large print shop has not converged even
when the sample size is 2000. Crovella [3] notes that simulations performed with fat-tailed inputs converge slowly
to steady state and show high variability at steady state. The practical consequence is that it becomes difficult for
such print shops to predict performance within tight bounds.
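To make the slowdown concrete, here is a short worked calculation based on the scaling in (6). The value α = 1.2 and the tenfold accuracy target are illustrative assumptions, not figures from the print shop data. Because Z_n has a limiting distribution, the typical error of the sample mean shrinks like n raised to the power −(1 − 1/α):

```latex
\begin{align*}
|A_n - \mu| &\sim n^{-(1 - 1/\alpha)} \\
\text{finite variance (exponent } 1/2\text{):}\quad
  & 10\times \text{ smaller error needs } 10^{2}\times \text{ more samples} \\
\alpha = 1.2 \text{ (exponent } 1 - 1/1.2 = 1/6\text{):}\quad
  & 10\times \text{ smaller error needs } 10^{6}\times \text{ more samples}
\end{align*}
```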
Figure 3: Mean calculations on samples drawn from a normal distribution and a fat-tail job-size distribution.
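The comparison summarized in Figure 3 can be reproduced in spirit with synthetic data. The sketch below is not the paper's experiment: the Pareto distribution with α = 1.2, the normal distribution parameters, the seed and the function names are all assumptions chosen for illustration. It tracks the running sample mean of each sample as observations accumulate.

```python
import random


def running_means(samples):
    """Return the sample mean after each successive observation."""
    means, total = [], 0.0
    for count, value in enumerate(samples, start=1):
        total += value
        means.append(total / count)
    return means


def simulate(n=2000, alpha=1.2, seed=1):
    """Running means for a normal sample and a Pareto (fat-tailed) sample.

    alpha, the normal parameters (mean 100, std 20) and the seed are
    illustrative assumptions, not values taken from the print shop data.
    """
    rng = random.Random(seed)
    normal = [rng.gauss(100.0, 20.0) for _ in range(n)]
    fat_tailed = [rng.paretovariate(alpha) for _ in range(n)]
    return running_means(normal), running_means(fat_tailed)


normal_means, fat_means = simulate()
for n in (100, 500, 1000, 2000):
    print(n, round(normal_means[n - 1], 2), round(fat_means[n - 1], 2))
```

Rerunning with different seeds typically shows the normal running mean settling quickly, while the Pareto running mean keeps jumping whenever a very large observation arrives.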
2.2 Scheduling Approach
Harchol-Balter et al. [7] discuss the implications of scheduling policies when the task-size distribution is fat-tailed.
They consider four task-assignment policies for distributed server systems: Round Robin; Random; Size-Based,
in which all tasks within a given size range are assigned to a given host; and Dynamic-Least-Work-Remaining,
in which a task is assigned to the host with the least outstanding work. They reported that when the task sizes
are not highly variable, the Dynamic-Least-Work-Remaining policy is preferable. However, when the task-size
distribution is fat-tailed and exhibits high variability, the Size-Based policy is the best choice.
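The four policies can be written down compactly. The following is a simplified sketch and not the implementation evaluated in [7]: the function names and the size cutoffs are hypothetical, and hosts are assumed to process work at unit rate so that backlog drains by the elapsed time between arrivals. Each function returns the host index chosen for every task.

```python
import random


def round_robin(sizes, hosts):
    """Assign task i to host i mod hosts, ignoring task size."""
    return [i % hosts for i in range(len(sizes))]


def random_assignment(sizes, hosts, seed=0):
    """Assign each task to a host chosen uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(hosts) for _ in sizes]


def size_based(sizes, cutoffs):
    """Assign by size range: host k takes sizes in (cutoffs[k-1], cutoffs[k]].

    The host index is the number of cutoffs the size exceeds, so
    len(cutoffs) + 1 hosts are used. The cutoffs are assumed to be sorted.
    """
    return [sum(size > c for c in cutoffs) for size in sizes]


def least_work_remaining(sizes, arrivals, hosts):
    """Send each arriving task to the host with the least unfinished work.

    Assumes arrivals are in nondecreasing time order and unit service rate.
    """
    backlog = [0.0] * hosts
    last_time = 0.0
    assignment = []
    for size, time in zip(sizes, arrivals):
        elapsed = time - last_time
        backlog = [max(0.0, b - elapsed) for b in backlog]
        last_time = time
        host = backlog.index(min(backlog))
        backlog[host] += size
        assignment.append(host)
    return assignment


sizes = [5, 50, 500, 10]
print(round_robin(sizes, 2))
print(size_based(sizes, [10, 100]))
print(least_work_remaining(sizes, [0, 0, 0, 0], 2))
```

With a size-based assignment, small tasks never queue behind a very large one, which is the intuition for why this policy suits fat-tailed task sizes.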
Unlike the single-stage computer systems discussed in Harchol-Balter’s [7] work, the large transaction print shops
where fat-tailed behavior was observed have multiple production stages and also sequence-dependent setups. Print
jobs arrive in the print shop and are sent to the printer queues for printing. Subsequently, they are moved to inserting
equipment for folding and insertion into envelopes. A setup in the form of roll-changeover is incurred each time the
underlying pre-printed form changes. Similarly, a new setup is needed when requirements on the inserting machines
change. The combined effect of sequence-dependent setups and fat-tail job size distribution creates difficulties in
achieving high throughput in these shops.
[Figure 3 plot: Mean Value as a Function of Sample Size for a fat-tailed vs. normally distributed job size; x-axis: Sample Size; y-axis: Mean Value; series: Avg_Formtype 1 (a = 0.6) and Avg_Formtype 1 (a = 2.0)]
2.3 Heuristic Solution
Most print shops use a departmental layout and the batch-and-queue approach for processing jobs. It is widely
accepted that batch-and-queue causes large turnaround times and low equipment utilization. Another class of print
shops utilizes highly-automated inline printing and finishing devices (e.g. a printer connected to a shrink-wrapper
via conveyor belt and inline bar-code readers to control jobs). The latter maintain no buffer between the various
connected machines and that makes them highly sensitive to component failures thereby leading to low long-term
productivity primarily because of absence of buffers to protect against random downtime and repair time [
We focus on two types of variability in print shops namely, job type or workflow variety and job size variability. In
Figure 4 we show a bar chart description of a shop that processes 961 different product workflows. The process
cycle efficiency of large shops is quite low (approximately 10%). While there are many factors that contribute to the
low efficiency of these shops, high product variety can be an important factor. Donati et al. [4] report the potential
for phase transitions in scheduling efficiency, whereby the scheduling efficiency of shops drops dramatically when
product workflow variety increases.
Figure 4: Print shop showing the job count associated with each workflow type.
As discussed earlier, job size distributions in large shops have also shown fat-tail behavior. To improve the
productivity of such shops, an architecture based on the concepts of autonomous cells [12] and hierarchical
scheduling [14] is proposed. Each cell is fully equipped with the material, personnel and machines needed to fully produce a
few different types of products. The architecture also uses novel job-splitting, routing, event-driven scheduling and WIP management
approaches, with finite inter-process buffers [6] and labor cross-training, to improve the productivity of print shops.
By limiting the number of job types associated with each autonomous cell, the negative impact on scheduling
efficiency [4] is also minimized.
Next we present a heuristic for creating autonomous cells and the associated scheduling policy. Figure 5 shows the
scheduling architecture for creating cells and scheduling jobs in large print shops with fat-tailed inputs. In step 1, the
incoming job stream is separated into two sets using a job-size threshold parameter: one that contains small jobs
and the other that contains jobs above the threshold. Autonomous cells are designed to process the small jobs in their
entirety. The larger jobs are further split into two sets: one whose jobs require less frequent setups and the other whose jobs require
more frequent setups. Multiple strategies are used to perform this partitioning. In one strategy, the number of unique
setup requirements is determined separately for the print production step and the insertion step for each job. Each
job is assigned a number that is the maximum of the two setup numbers, and a setup threshold is determined. Jobs whose
assigned number is less than the threshold are put in the low-setup pool, and the rest are put in the high-setup pool.
Autonomous cells are designed for each set. The key idea behind this strategy is to separate jobs that require many
setups from those that require few, and to design different types of autonomous production cells to handle
the two types.
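The partitioning strategy described above can be sketched as a small classifier. This is an illustration only: the `Job` fields, the pool names and the threshold values in the usage example are hypothetical, and the paper determines its thresholds through simulation rather than fixing them in advance.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    size: int           # job size, e.g. number of document items (assumed unit)
    print_setups: int   # unique setup requirements at the print step
    insert_setups: int  # unique setup requirements at the insertion step


def classify(job, size_threshold, setup_threshold):
    """Return the pool for a job: small, large low-setup, or large high-setup."""
    if job.size <= size_threshold:
        return "small"
    # Each large job is scored by the larger of its two setup counts.
    setup_number = max(job.print_setups, job.insert_setups)
    if setup_number < setup_threshold:
        return "large_low_setup"
    return "large_high_setup"


def build_pools(jobs, size_threshold, setup_threshold):
    """Group job ids by pool so that cells can be designed per pool."""
    pools = {"small": [], "large_low_setup": [], "large_high_setup": []}
    for job in jobs:
        pools[classify(job, size_threshold, setup_threshold)].append(job.job_id)
    return pools


jobs = [
    Job("A", size=200, print_setups=1, insert_setups=4),
    Job("B", size=50000, print_setups=1, insert_setups=2),
    Job("C", size=80000, print_setups=5, insert_setups=2),
]
print(build_pools(jobs, size_threshold=1000, setup_threshold=3))
# {'small': ['A'], 'large_low_setup': ['B'], 'large_high_setup': ['C']}
```

Small jobs are routed without looking at setups because, as stated above, their cells process them in their entirety.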
Batch-splitting policies are further implemented to speed up production within the autonomous cell. Efficient
sequencing policies that optimize the desired production requirement (minimizing % of late jobs, improving
utilization and the like) are determined. The interaction of these variables and the resulting impact on cell design and
scheduling policies and performance can be quite complex. Discrete-event simulation models are built that allow
iteration and optimization of the parameters of the architecture shown in Figure 5.
[Figure 4 chart: Job Count by unique workflow; x-axis: Unique Job Type ID (1 to 961); y-axis: Number of Jobs in Category (0 to 160)]
Figure 5: A scheduling architecture for large print shops with fat-tailed job size inputs
The solution described by the architecture diagram in Figure 5 is a function of the job size, setup requirements, and
related characteristics of the incoming job stream. As the customers’ job mix changes, the cells may require re-
adjustment and optimization. While the proposed architecture delivers superior performance to a functionally
organized print shop, it needs ongoing maintenance and optimization as the job mix changes over time.
3. Case Study Results
Large shops in the document manufacturing industry can have over 100 employees and up to 100 pieces of
equipment with sequence-dependent setup requirements, and can process up to 1000 jobs per day with cycle time
requirements that vary from a few hours to a few days.
The methodology discussed above was implemented in a large shop. The job size distribution of this shop exhibited
fat-tail behavior, and the equipment required up to six different types of setup depending on the sequence of jobs
processed on it. The shop had a very high WIP level, with WIP in excess of 3 million document items on a given
day. The proposed solution was to separate small jobs from the large jobs and dedicate autonomous cells to handle
each class separately. Detailed simulation models were constructed to determine the structure of the autonomous
cells and the corresponding scheduling policy. In Figure 6 we show the results of implementing the solution
shown in Figure 5 on the production operations.
Metric                               Result
Space Savings                        15%
Product Travel Distance Reduction    75%
Throughput Time Savings              31%
Defects Per Million Reduction        55%
Productivity Improvement             12%
Sigma Metric Improvement             4.8 to 5.0
Yield Improvement                    99.941 to 99.973
Figure 6: Productivity improvements achieved as a result of implementing the solution at a large shop
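As a rough consistency check on two of the reported figures, the yield values can be converted to defects per million. This sketch assumes that the yield figures are percentages and that each non-conforming item counts as one defect. The paper does not state how its metrics were computed, so the function name and the conversion are assumptions.

```python
def defects_per_million(yield_percent):
    """Defects per million items implied by a yield percentage.

    Assumes one defect per non-conforming item (an assumption, not from the paper).
    """
    return (100.0 - yield_percent) * 1_000_000 / 100.0


before = defects_per_million(99.941)
after = defects_per_million(99.973)
reduction = (before - after) / before * 100.0
print(round(before), round(after), round(reduction, 1))
# 590 270 54.2
```

Under these assumptions the implied reduction of about 54% is close to the reported 55% reduction in defects per million.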
4. Conclusions
[Figure 5 diagram labels: Job arrival; Job classifier by size (Job size threshold parameter); Small job pool scheduler; Big job pool classifier (Machine Setup > Setup Threshold); High setup job router; Low setup job router; six cell blocks, each labeled Cell1 with Queue sequencing, Batch Splitting, Sorting, Machine assignment]
This paper discusses fat-tail inputs in manufacturing systems, specifically document manufacturing systems. Novel
methods are proposed to improve the productivity of such shops based on the concepts of autonomous cells [12] and
dynamic scheduling policies. Improvements achieved from one such implementation are presented.
References
1. Colorni, A., Dorigo, M., Maniezzo, V. and Trubian, M., 1994, “Ant system for Job-shop Scheduling”.
JORBEL - Belgian Journal of Operations Research, Statistics and Computer Science, 34(1), 39-53.
2. Crovella, M.E., Taqqu, M.S. and Bestavros, A., 1998, “Heavy-tailed probability distributions in the world
wide web”. In A Practical Guide to Heavy Tails, Chapter 1, pages 1-23. Chapman & Hall, New York.
3. Crovella, M.E. and Lipsky, L., 1997, “Long Lasting transient conditions in simulations with heavy-tail
workloads”. In Proceedings of the 1997 Winter Simulation Conference.
4. Donati, A.V., Darley, V. and Ramachandran, B., 2004, “An Ant-Bidding Algorithm for Generalized Flow
Shop Scheduling Problem: Optimization and Phase Transitions”. (Working paper)
www.santafe.edu/~vince/
5. Feller, W., 1971, “An Introduction to Probability Theory and Its Applications”. Volume II. Second edition.
John Wiley and Sons.
6. Gershwin, S.B., 1994, “Manufacturing Systems Engineering”. Prentice-Hall, Englewood Cliffs, NJ.
ISBN: 0-135-60608X.
7. Harchol-Balter, M., Crovella, M.E. and Murta, C.D., 2004, “On Choosing a Task Assignment policy for a
distributed server system”. Lecture Notes in Computer Science, Springer Berlin/Heidelberg, ISBN 0302-9743.
8. Harchol-Balter, M. and Downey, A., 1996, “Exploiting process lifetime distributions for dynamic load
balancing”. In Proceedings of SIGMETRICS ’96, pages 13-24.
9. Lin, J., Jacobs, T. and Rai, S., 2006, “System and method for determining an optimal batch size for a print
job”, XEROX 20050683-US-NP.
10. Nolan, J.P., 1998, “Multivariate stable distributions: approximation, estimation, simulation and
identification”. In R.J. Adler, R.E. Feldman, and M.S. Taqqu (Eds.), A Practical Guide to Heavy Tails, pp.
509-526. Boston: Birkhauser.
11. Rai, S. and Viassolo, D., 2001, “Production server architecture and methods for automated control of
production document management”, US patent 7,051,328, filed in Jan. 2001, granted in June 2006.
12. Rai, S., Godambe, A., Duke, C. B. and Williams, G., “Printshop resource optimization via the use of
autonomous cells”, US patent 7,079,266, filed in Nov. 2000, granted in July 2006.
13. Samorodnitsky, G. and Taqqu, M., 1994, “Stable Non-Gaussian Random Processes”. Stochastic Modeling.
Chapman and Hall, New York.
14. Squires, D. and Rai, S., 2000, “Production server for automated control of production document management”,
US patent 7,065,567, filed in Nov. 2000, granted in June 2006.