Advanced Production Accounting
Industrial Modeling Framework (APA-IMF)
IndustrIALgorithms LLC. (IAL)
www.industrialgorithms.com
July 2013
Introduction to Advanced Production Accounting, UOPSS and QLQP
Presented in this short document is a description of what we call "Advanced" Production
Accounting (APA). APA is the term given to the technique of vetting, screening or cleaning
past production data using statistical data reconciliation and regression (DRR) when
continuous-processes are assumed to be at steady-state (Kelly and Hedengren, 2013), i.e., when
there is no significant material accumulation. Essentially, the model and data define a
simultaneous mass and volume DRR problem in which density links the two. Figure 1 depicts a
relatively small production accounting flowsheet problem configured in our
unit-operation-port-state superstructure (UOPSS) (Kelly, 2004a, 2005; Zyngier and Kelly, 2012).
Figure 1. Oil-Refinery Production Accounting Flowsheet (Kelly and Mann, 2005).
The diamond shapes or objects are the sources and sinks known as perimeters, the triangle
shapes are the pools or tanks and the rectangle shapes with the cross-hairs are
continuous-process units; as mentioned, these units should have a steady-state detection (SSD)
algorithm installed to determine whether they are steady or stationary. The circle shapes with
no cross-hairs are in-ports, which can accept one or more inlet flows and are considered to be
simple or uncontrolled mixers. The cross-haired circles are out-ports, which can allow one or
more outlet flows and are considered to be simple or uncontrolled splitters. The lines, arcs or
edges between the various shapes are known as internal and external streams and represent, in
this context, the flows of materials from one shape to another. This example and its data are
taken directly from Kelly and Mann (2005) but are mapped to our UOPSS modeling framework,
which includes only one time-period, typically defined as one business or calendar day. A
related technique using multiple time-periods can be found in Kelly et al. (2005) to trace or
track production qualities throughout any process network; it is useful for real-time or on-line
monitoring applications as it involves dynamic DRR.
In this example, we have a crude-oil distillation unit (CRD), vacuum distillation unit (VAC),
fluidized catalytic cracking unit (FCC) and a catalytic reformer (REF) as well as twenty-four (24)
tanks for crude-oil, intermediate and final product storage. The continuous-process units only
conserve mass whereas all of the tanks conserve both mass and volume using density as the
conversion from volume to mass. There are five (5) perimeter units which represent pipeline
deliveries and liftings as well as a fuel gas burner export to a cogeneration utilities plant. For
this data set, no finished-product blending occurred over this production accounting
time-period and hence no blending header units are shown.
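As a rough illustration of the two kinds of balances just described, the sketch below checks a steady-state unit (mass only) and a tank (volume and mass, with density converting volume to mass) over one period. The `Flow` class, the function names, the units and all the numbers are hypothetical, and the tank balance assumes that opening and closing inventory share a single tank density. DRR would adjust the measurements until such imbalances vanish rather than merely report them.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A stream over one accounting period (hypothetical units: m3 and t/m3)."""
    volume: float
    density: float

    @property
    def mass(self) -> float:
        return self.volume * self.density  # density converts volume to mass


def unit_mass_imbalance(inlets, outlets):
    """Steady-state continuous-process unit: conserves mass only (in - out)."""
    return sum(f.mass for f in inlets) - sum(f.mass for f in outlets)


def tank_imbalances(opening_vol, closing_vol, tank_density, receipts, deliveries):
    """Tank over one period: returns (volume, mass) imbalances.

    Accumulation is allowed; opening and closing inventory are assumed to be
    at one tank density (a simplifying assumption of this sketch)."""
    vol = (opening_vol + sum(f.volume for f in receipts)
           - sum(f.volume for f in deliveries) - closing_vol)
    mass = ((opening_vol - closing_vol) * tank_density
            + sum(f.mass for f in receipts) - sum(f.mass for f in deliveries))
    return vol, mass


# Hypothetical, internally consistent numbers.
crude = [Flow(1000.0, 0.85)]
products = [Flow(300.0, 0.70), Flow(400.0, 0.85), Flow(300.0, 1.00)]
unit_err = unit_mass_imbalance(crude, products)
vol_err, mass_err = tank_imbalances(500.0, 550.0, 0.80,
                                    receipts=[Flow(100.0, 0.80)],
                                    deliveries=[Flow(50.0, 0.80)])
print(unit_err, vol_err, mass_err)  # each close to zero for these consistent numbers
```

With measured data these imbalances are generally non-zero, and that inconsistency is exactly what the reconciliation distributes across the measurements according to their variances.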
The key difference between the modeling found in Kelly and Mann (2005) and our formulation is
that we use the concept of "ports" which allows for a more unambiguous and parsimonious
representation of the quantity, logic and quality phenomenological (QLQP) data. For instance,
on the CRD at out-port JVAC there are two simultaneous outlet flows (quantity), which require
only one density (quality) measurement because JVAC is an implied splitter. By contrast, the
Kelly and Mann (2005) formulation, which does not employ the concept of ports, requires two
density measurements, i.e., one for each outlet stream, which means more pre-processing of the
data to manage the density of each individual stream. The efficiency of UOPSS and QLQP is
that only one density measurement at JVAC needs to be configured; through the topology of
the superstructure, the out-port quality is propagated to each outlet stream properly and
automatically.
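A minimal sketch of the port idea, assuming each external stream simply inherits the density of the out-port it leaves: one density is configured for "CRD,JVAC" and both outlet streams obtain their masses from it. The destination in-port names, volumes and density are hypothetical, and this only illustrates propagation through the topology, not how IMPRESS stores or reconciles the data.

```python
def propagate_out_port_quality(port_density, streams):
    """Give each external stream the density of the out-port it leaves.

    port_density: {out_port: density}, i.e. one measurement per out-port.
    streams: (out_port, in_port, volume) tuples.
    Returns {(out_port, in_port): mass}."""
    return {(src, dst): vol * port_density[src] for src, dst, vol in streams}


# Hypothetical values; "DEST_A,in" and "DEST_B,in" are illustrative in-ports.
port_density = {"CRD,JVAC": 0.90}
streams = [("CRD,JVAC", "DEST_A,in", 300.0),
           ("CRD,JVAC", "DEST_B,in", 100.0)]
masses = propagate_out_port_quality(port_density, streams)
print(masses)
```

Adding a third outlet from JVAC would add one more stream tuple but no new density entry, whereas a stream-based formulation would need a density for every stream.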
Industrial Modeling Framework (IMF), IMPRESS and SIIMPLE
To implement the mathematical formulation of this and other systems, IAL offers a unique
approach incorporated into our Industrial Modeling and Pre-Solving System, which we call
IMPRESS. IMPRESS has its own modeling language called IML (short for Industrial Modeling
Language), which is a flat or text-file interface, as well as a set of APIs called IPL (short
for Industrial Programming Language), which can be called from any computer programming
language such as C, C++, Fortran, Java (SWIG), C# or Python (CTYPES) to both build the model
and view the solution. Models can be a mix of linear, mixed-integer and nonlinear variables and
constraints and are solved using a combination of LP, QP, MILP and NLP solvers such as
COINMP, GLPK, LPSOLVE, SCIP, CPLEX, GUROBI, LINDO, XPRESS, CONOPT, IPOPT and
KNITRO as well as our own implementation of SLP called SLPQPE (Successive Linear &
Quadratic Programming Engine) which is a very competitive alternative to the other nonlinear
solvers and embeds all available LP and QP solvers.
In addition, and specific to DRR problems, we also have a special solver called SECQPE
(Sequential Equality-Constrained QP Engine), which computes the least-squares solution, and a
post-solver called SORVE (Supplemental Observability, Redundancy and Variability Estimator) to
estimate the usual DRR statistics found in Kelly (1998, 2004b) and Kelly and Zyngier (2008a).
SECQPE also includes a Levenberg-Marquardt regularization method for nonlinear data regression
problems and can be presolved using SLPQPE, i.e., SLPQPE warm-starts SECQPE. SORVE is run
after the SECQPE solver and also computes the well-known "maximum-power" gross-error statistics
to help locate outliers, defects and/or faults, i.e., malfunctions in the measurement system and
mis-specifications in the logging system.
The underlying system architecture of IMPRESS is called SIIMPLE (we hope literally) which is
short for Server, Interacter (IPL), Interfacer (IML), Modeler, Presolver Libraries and Executable.
The Server, Presolver and Executable are primarily model or problem-independent whereas the
Interacter, Interfacer and Modeler are typically domain-specific, i.e., model- or
problem-dependent. Fortunately, for most industrial planning, scheduling, optimization, control
and monitoring problems found in the process industries, IMPRESS's standard Interacter,
Interfacer and Modeler are well suited and comprehensive enough to model the most difficult
production and process complexities, allowing for the formulation of straightforward coefficient
equations, ubiquitous conservation laws, rigorous constitutive relations, empirical correlative
expressions and other necessary side constraints.
User, custom, ad hoc or external constraints can be augmented or appended to IMPRESS when
necessary in several ways. For MILP or logistics problems we offer user-defined constraints
configurable from the IML file or the IPL code where the variables and constraints are
referenced using unit-operation-port-state names and the quantity-logic variable types. It is also
possible to import a foreign LP file (a row-based format, unlike the column-based MPS format), which can be generated by any
algebraic modeling language or matrix generator. This file is read just prior to generating the
matrix and before exporting to the LP, QP or MILP solver. For NLP or quality problems we offer
user-defined formula configuration in the IML file and single-value and multi-value function
blocks writable in C, C++ or Fortran. The nonlinear formulas may include intrinsic functions
such as EXP, LN, LOG, SIN, COS, TAN, MIN, MAX, IF, NOT, EQ, NE, LE, LT, GE, GT and KIP,
LIP, SIP (constant, linear and monotonic spline interpolation) as well as user-written extrinsic
functions.
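The interpolation semantics are not spelled out here, so the following is only a sketch under stated assumptions: KIP is taken to be a piecewise-constant (step) lookup and LIP piecewise-linear interpolation over sorted breakpoints, both clamped at the end points. The monotonic spline (SIP) is omitted, and the Python functions below are illustrations named after the intrinsics, not IMPRESS code.

```python
from bisect import bisect_right


def kip(x, xs, ys):
    """Piecewise-constant lookup: y at the last breakpoint <= x, clamped to the ends."""
    i = bisect_right(xs, x) - 1
    return ys[min(max(i, 0), len(ys) - 1)]


def lip(x, xs, ys):
    """Piecewise-linear interpolation between breakpoints, clamped to the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


xs = [0.0, 10.0, 20.0]  # hypothetical breakpoints
ys = [1.0, 3.0, 2.0]
print(kip(15.0, xs, ys), lip(15.0, xs, ys))  # 3.0 2.5
```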
Industrial modeling frameworks or IMFs are intended to provide a jump-start to an industrial
project implementation, i.e., a pre-project if you will, whereby pre-configured IML files and/or
IPL code specific to your problem at hand are available. The IML files and/or IPL code can be
easily enhanced, extended, customized, modified, etc. to meet the diverse needs of your project
as it evolves over time. IMFs also provide graphical user interface prototypes for
drawing the flowsheet as in Figure 1 and typical Gantt charts and trend plots to view the solution
of quantity, logic and quality time-profiles. Current developments use Python 2.3 and 2.7
integrated with open-source Dia and Matplotlib modules respectively but other prototypes
embedded within Microsoft Excel/VBA for example can be created in a straightforward manner.
However, the primary purpose of the IMFs is to provide a timely, cost-effective, manageable
and maintainable deployment of IMPRESS to formulate and optimize complex industrial
manufacturing systems in either off-line or on-line environments. Using IMPRESS alone would
be somewhat similar to (but not as bad as) learning the syntax and semantics of an AML as well
as having to code all of the necessary mathematical representations of the problem, including
the details of digitizing your data into time-points and periods, demarcating past, present and
future time-horizons, defining sets, index-sets and compound-sets to traverse the network or
topology, calculating independent and dependent parameters to be used as coefficients and
bounds and finally creating all of the necessary variables and constraints to model the complex
details of logistics and quality industrial optimization problems. Instead, IMFs and IMPRESS
provide, in our opinion, a more elegant and structured approach to industrial modeling and
solving so that you can capture the benefits of advanced decision-making faster, better and
cheaper.
"Advanced" Production Accounting Synopsis
At this point we explore further the purpose of "advanced" production accounting in terms of its
diagnostic capability of aiding in the detection, identification and elimination of "bad" production
data, where "bad" really implies inconsistent data. The major advantage of DRR is its ability to
use redundant data, i.e., to solve what are sometimes referred to as over-determined or
over-specified problems. The redundancy primarily occurs because of the inclusion of a model,
i.e., equations or equality constraints relating flow, holdup and density variables together as
in the laws of conservation of matter, energy and momentum. Some of these variables are
measured or reconciled, some are unmeasured or regressed while others are fixed or rigid.
Measured variables have a known (finite) raw variance, unmeasured variables have a large and
unknown (infinite) variance and fixed variables have no (zero) variance. The DRR objective is
to minimize the weighted sum of squares of the differences between the raw measurements and
their reconciled estimates, where the weights are simply the inverses of the raw variances
(Kelly, 1998). At a converged DRR solution using SECQPE we have estimates of the
reconciled and unmeasured or regressed variables and after running SORVE we have new
variance estimates for the reconciled and unmeasured or regressed variables as well as
redundancy and observability estimates for each measured and unmeasured variable
respectively. Furthermore, using these variances we can compute individual gross-error
detection statistics for the measured variables and equality constraints as well as confidence
intervals for each unmeasured variable using the Student-t tables to determine statistical
threshold or critical values. In addition, we can also compute a global or overall Hotelling
statistic on the objective function value to detect if at least one gross-error exists.
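For the linear case, the weighted least-squares problem described above has a closed-form solution that makes the statistics easier to picture. The following is a standard textbook sketch assuming linear balances $A x = 0$, measurements $y$, a diagonal matrix $V$ of raw variances and every variable measured; the symbols $A$, $V$, $r$ and $S$ are introduced here for illustration only, and the nonlinear mass, volume and density problem solved by SECQPE is more involved.

$$\min_{x}\; J = (y - x)^{\top} V^{-1} (y - x) \quad \text{subject to} \quad A x = 0$$

$$r = A y, \qquad S = A V A^{\top}, \qquad \hat{x} = y - V A^{\top} S^{-1} r, \qquad J^{*} = r^{\top} S^{-1} r$$

$$\mathrm{MP}_i = \frac{\left| \left( A^{\top} S^{-1} r \right)_i \right|}{\sqrt{\left( A^{\top} S^{-1} A \right)_{ii}}}$$

Here $r$ holds the balance residuals at the raw data, $\hat{x}$ is the reconciled estimate and $\mathrm{MP}_i$ is the maximum-power statistic of measurement $i$, i.e., its variance-weighted adjustment divided by that adjustment's standard deviation. Because $V$ is diagonal, $J$ is the sum of $(y_i - x_i)^2 / V_{ii}$, matching the verbal definition; in the closed form a fixed variable corresponds to $V_{ii} = 0$ (so $\hat{x}_i = y_i$) and an unmeasured one to $V_{ii} \to \infty$.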
If we apply these techniques to the data set found in Kelly and Mann (2005), where the
flowsheet has been slightly modified to transform it into UOPSS and no gross-errors are
injected into the system, we arrive at an objective function value of 34.87 with a Hotelling
critical value of 43.2, indicating that there are no detectable gross-errors. However, if we add
a significant bias, drift or offset of 0.05 to the density of pool T300 storing LPG, i.e., the
density changes from 0.600 to 0.650, the objective function inflates to 334.64 while the
Hotelling critical value does not change. This indicates that at least one of the measurements
is in gross-error and/or there is a leak or unexpected flow in or out of one of the nodes. The
individual maximum-power measurement statistics show three significant values, for the
densities on the "FCC,lpg" and "REF,lpg" out-ports and on "T300,LPG", of 17.315, 17.353 and
17.313 respectively, which are very similar to those found in Table 3 of Kelly and Mann (2005).
Although this does not pinpoint "T300,LPG" exactly as the location of the gross-error, it
accurately isolates the area, section or region of the flowsheet where the possible outlier may
exist, which is very useful for large flowsheets. An interesting property or artifact of the
maximum-power measurement statistics is that if a measurement is deleted or removed, i.e.,
made unmeasured, then the reduction in the weighted least-squares objective function will
approximately equal the square of its maximum-power statistic. For example, 17.313^2 = 299.74,
and when we subtract this amount from 334.64 we get 334.64 - 299.74 = 34.90, which is very
close to our original objective function value of 34.87 with no detectable gross-error. Note
that it is called the maximum-power statistic because, if there is only one gross-error in the
system, this statistic has the maximum power or "maximum probability" of detecting that it is a
true outlier.
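The deletion property can be checked numerically on a toy problem. The pure-Python sketch below assumes linear balances with hypothetical flows and unit variances (not the Kelly and Mann data), approximates an unmeasured variable by a very large variance `BIG`, and computes the reconciled values, the objective and the maximum-power statistics from the closed-form expressions for the linear case.

```python
# Linear DRR sketch: reconcile, compute maximum-power (MP) statistics and check
# that making measurement i unmeasured lowers the objective by MP_i**2.

BIG = 1e8  # assumed stand-in for an "infinite" variance (unmeasured/regressed)


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def reconcile(a, y, var):
    """Minimise sum((y - x)**2 / var) subject to a x = 0.

    Returns (x_hat, objective, mp) using the closed-form linear solution."""
    m, n = len(a), len(y)
    r = [sum(a[i][j] * y[j] for j in range(n)) for i in range(m)]    # r = A y
    s = [[sum(a[i][k] * var[k] * a[p][k] for k in range(n)) for p in range(m)]
         for i in range(m)]                                          # S = A V A^T
    lam = solve(s, r)                                                # S^-1 r
    g = [sum(a[i][j] * lam[i] for i in range(m)) for j in range(n)]  # A^T S^-1 r
    x_hat = [y[j] - var[j] * g[j] for j in range(n)]                 # y - V A^T S^-1 r
    obj = sum(ri * li for ri, li in zip(r, lam))                     # r^T S^-1 r
    mp = []
    for j in range(n):
        col = [a[i][j] for i in range(m)]
        h = sum(ci * wi for ci, wi in zip(col, solve(s, col)))       # (A^T S^-1 A)_jj
        mp.append(abs(g[j]) / h ** 0.5 if h > 0 else 0.0)
    return x_hat, obj, mp


# Hypothetical network: F1 splits into F2 and F3, and F3 passes on as F4.
A = [[1, -1, -1, 0],   # F1 - F2 - F3 = 0
     [0, 0, 1, -1]]    # F3 - F4 = 0
y = [100.0, 60.0, 45.0, 44.0]
var = [1.0, 1.0, 1.0, 1.0]

x_hat, obj, mp = reconcile(A, y, var)
# What-if: make F1 unmeasured (very large variance) and re-solve.
_, obj_without_f1, _ = reconcile(A, y, [BIG, 1.0, 1.0, 1.0])
print(obj, mp[0] ** 2, obj_without_f1)
```

It prints values close to 8.6, 8.1 and 0.5: removing F1 lowers the objective by its squared maximum-power statistic, the same identity applied above to 17.313^2 = 299.74. For linear balances the identity holds exactly (up to the large-variance approximation); for the nonlinear density problem it holds only approximately, which is consistent with 34.90 versus 34.87.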
More generally, there are essentially two types of what-if scenarios used in APA to ultimately
"close" a production accounting period data set to within statistical control limits, i.e., to
declare the production accounting period to be in statistical production control. The first is
the one mentioned above, whereby a measured/reconciled variable is determined to be in
gross-error by switching it to an unmeasured/regressed variable and checking whether the
objective function and the other measurement and constraint statistics are below their
statistical critical values. The second is making a fixed or rigid variable into an unmeasured or
regressed variable. In most industrial plants found in the process industries (especially in
pipe-less plants) there is flexibility in how materials or resources can be routed, connected or
streamed from one piece of equipment to another (Kelly, 2000). The logging or recording of these
movements can also be erroneous, even to the point where they are not logged at all. If the
system knows all of the possible routes, lineups or external streams (out-port to in-port), then
it is prudent to change a suspect route from being fixed or rigid, i.e., not open, active or set
up, with a tolerance or variance of zero (0), to being unmeasured or regressed, with an unknown
value and a variance of infinity. If a scenario with one of the routes changed from fixed to
unmeasured results in a significant reduction in the objective function, then this is potentially
a mis-logged or mis-specified connection and should be investigated further (Kelly, 1999).
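The second type of what-if scenario can be sketched on a toy linear network: a route that is logged as not open is held fixed at zero with zero variance, and each candidate is freed in turn by giving it a very large variance. The network, names and numbers below are hypothetical, the balances are assumed linear, and `BIG` is an assumed stand-in for an infinite variance.

```python
# What-if screening sketch: free fixed routes (variance 0) one at a time and
# report how much the linear DRR objective drops.

BIG = 1e8  # assumed stand-in for an "infinite" variance (unmeasured/regressed)


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def objective(a, y, var):
    """J = r^T (A V A^T)^-1 r for linear balances a x = 0.

    A variance of 0 keeps that variable fixed at its logged value."""
    m, n = len(a), len(y)
    r = [sum(a[i][j] * y[j] for j in range(n)) for i in range(m)]
    s = [[sum(a[i][k] * var[k] * a[p][k] for k in range(n)) for p in range(m)]
         for i in range(m)]
    return sum(ri * li for ri, li in zip(r, solve(s, r)))


def screen(a, y, var, candidates):
    """Switch each candidate from fixed/measured to unmeasured; return drops in J."""
    base = objective(a, y, var)
    drops = {}
    for name, j in candidates.items():
        trial = list(var)
        trial[j] = BIG
        drops[name] = base - objective(a, y, trial)
    return base, drops


# Hypothetical network: unit U (feed F0, products F1 and F2) and tank T1, whose
# gauged receipts G1 should equal F1 plus any flow on a possible but unlogged
# lineup R from U to T1 (logged as closed: value 0, variance 0).
#      F0  F1  F2  G1   R
A = [[1, -1, -1, 0, -1],   # U:  F0 - F1 - F2 - R = 0
     [0, 1, 0, -1, 1]]     # T1: F1 + R - G1 = 0
y = [100.0, 50.0, 40.0, 60.0, 0.0]
var = [1.0, 1.0, 1.0, 1.0, 0.0]

base, drops = screen(A, y, var, {"R (lineup U->T1)": 4, "F2 (meter)": 2})
print(base, drops)
```

Here the base objective is 60; freeing the unlogged lineup R removes almost all of it, while freeing the F2 meter removes only about 10. In this toy network, freeing F1 instead would give the same drop as R, because both appear in the same pair of balances, so a large drop flags a connection worth investigating rather than proving it is mis-logged.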
In conclusion, the primary benefit of APA is to statistically scrutinize the production accounting
data on a regular and timely basis to quickly and accurately highlight anomalies in the flowsheet
where possible defects exist. When gross-errors are detected and identified it is then prudent to
eliminate these faults by re-calibrating instruments, improving the logging or recording of
manually entered transactional data such as temporary stream flows, updating or refreshing
auxiliary data sources more frequently, etc. (Kelly, 2000). If, for example, advanced planning
and scheduling (APS) decisions are made using bad or poor-quality production data, then these
decisions are unfortunately suspect and can significantly and negatively impact the
performance and profitability of your production-chain (Kelly and Zyngier, 2008b).
Finally, Appendices A and B show the APA-IMF.UPS and APA-IMF.IML files used to configure
both the model and the data of the APA problem. The UPS file contains the UOPSS constructs
or shapes and the IML file contains all of the static and dynamic QLQP capacity data referenced
by the UOPSS constructs. The UPS file can be automatically created using the open-source
drawing software called GNOME Dia and using the Python 2.3 programming language to
access Dia's object model to retrieve the UOPSS sheet shapes. The IML file is a simple text file
with several categories or classifications of both the model (master, static) data and the cycle
(transactional, dynamic) data. An interesting feature of the IML file is the use of "Calcs"
(values assigned to symbols), which can be used to manage dynamic data from the field such as
flow meter readings and laboratory analysis results. This means that interfacing or binding the
various data sources to the IML file is achieved by changing the value of a Calc and then using
this Calc in the rest of the data categories of the IML file. Another interesting feature is the
use of a "missing-value" or "missing-data" number we call a "non-naturally occurring number"
(NNON), typically set to -99999. If a value is NNON, the corresponding variable is regressed
in the DRR, i.e., it is switched from measured to unmeasured. This is useful when performing the
gross-error detection and identification analysis, which is similar to running multiple
scenarios, cases or situations to determine whether the problem contains bad data before the
production accounting data is disseminated to other decision-making applications.
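A small sketch of how an NNON flag might be turned into DRR inputs, assuming each field value arrives as a Calc-like entry keyed by tag and that an NNON value means "regress this variable" (infinite variance). The "FLOW_A" tag and all of the variances are hypothetical, and the dictionary layout is illustrative rather than the IML file format.

```python
import math

NNON = -99999.0  # "non-naturally occurring number" flagging missing data


def to_drr_inputs(calcs, variances):
    """Map raw field values to (value, variance) DRR inputs.

    A value equal to NNON becomes unmeasured/regressed (infinite variance);
    anything else is measured with its configured raw variance."""
    inputs = {}
    for tag, value in calcs.items():
        if value == NNON:
            inputs[tag] = (None, math.inf)          # to be regressed
        else:
            inputs[tag] = (value, variances[tag])   # to be reconciled
    return inputs


# 0.650 is the biased T300 density from the text; other entries are hypothetical.
calcs = {"T300,LPG": 0.650, "FLOW_A": 120.0}
variances = {"T300,LPG": 0.0001, "FLOW_A": 4.0}

base_case = to_drr_inputs(calcs, variances)
# What-if: switch the suspect density to unmeasured by overwriting its Calc with NNON.
what_if = to_drr_inputs({**calcs, "T300,LPG": NNON}, variances)
print(base_case, what_if)
```

Running the base case and the what-if case through the same reconciliation and comparing objectives is the measured-to-unmeasured scenario described above, driven purely by editing one Calc.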
References
Kelly, J.D., "A regularization approach to the reconciliation of constrained data sets", Computers
& Chemical Engineering, 1771, (1998).
Kelly, J.D., "Practical issues in the mass reconciliation of large plant-wide flowsheets", AIChE
Spring Meeting, Houston, March, (1999).
Kelly, J.D., "The necessity of data reconciliation", NPRA Computer Conference, Chicago,
November, (2000).
Kelly, J.D., "Production modeling for multimodal operations", Chemical Engineering Progress,
February, 44, (2004a).
Kelly, J.D., "Techniques for solving industrial nonlinear data reconciliation problems",
Computers & Chemical Engineering, 2837, (2004b).
Kelly, J.D., Mann, J.L., "Improve yield accounting by including density measurements explicitly",
Hydrocarbon Processing, January, (2005).
Kelly, J.D., Mann, J.L., Schulz, F.G., "Improve accuracy of tracing production qualities using
successive reconciliation", Hydrocarbon Processing, April, (2005).
Kelly, J.D., "The unit-operation-stock superstructure (UOSS) and the quantity-logic-quality
paradigm (QLQP) for production scheduling in the process industries", In: MISTA 2005
Conference Proceedings, 327, (2005).
Kelly, J.D., Zyngier, D., "A new and improved MILP formulation to optimize observability,
redundancy and precision for sensor network problems", American Institute of Chemical
Engineering Journal, 54, 1282, (2008a).
Kelly, J.D., Zyngier, D., "Continuously improve planning and scheduling models with parameter
feedback", FOCAPO 2008, July, (2008b).
Zyngier, D., Kelly, J.D., "UOPSS: a new paradigm for modeling production planning and
scheduling systems", ESCAPE 22, June, (2012).
Kelly, J.D., Hedengren, J.D., "A steady-state detection (SSD) algorithm to detect non-stationary
drifts in processes", Journal of Process Control, 23, 326, (2013).
Appendix A - APA-IMF.UPS (UOPSS) File
Appendix B - APA-IMF.IML File