Advanced Property Tracking/Tracing
Industrial Modeling Framework (APT-IMF)
Industrial Algorithms LLC (IAL)
www.industrialgorithms.com
July 2013
Introduction to Advanced Property Tracking/Tracing, UOPSS and QLQP
Presented in this short document is a description of what we call "Advanced" Property Tracking or Tracing (APT). APT is the technique of predicting, simulating, calculating or estimating the properties (e.g., densities, compositions, conditions, qualities) in a network or superstructure with significant inventory using statistical data reconciliation and regression (DRR). Essentially, the model and data define a simultaneous material-and-property DRR problem, where the properties in this case are compositions or concentrations (Kelly et al., 2005). Figure 1 depicts a small property tracking/tracing flowsheet problem configured in our unit-operation-port-state superstructure (UOPSS) (Kelly, 2004a, 2005; Zyngier and Kelly, 2012).
Figure 1. Small Property Tracking/Tracing Flowsheet (Kelly et al., 2005).
The diamond shapes are the sources and sinks, known as perimeters, and the triangle shapes are the pools or tanks (inventory units). The circles without cross-hairs are in-ports, which can accept one or more inlet flows and are considered simple or uncontrolled mixers. The cross-haired circles are out-ports, which can allow one or more outlet flows and are considered simple or uncontrolled splitters. The lines, arcs or edges between the various shapes are known as internal and external streams, and here they represent the flows of material from one shape to another. This example and its data are taken directly from Kelly et al. (2005). They are mapped into our UOPSS modeling framework and include only the first time-period, defined as a single hour. Multiple time-periods can be handled in one of two ways for the APT problem. The first is to treat each time-period separately, sequentially or "successively", as is done in Kelly et al. (2005), where each time-period is solved independently over a time-horizon of typically one day. The second is to model all time-periods over the time-horizon and solve them "simultaneously" as one large multi-time-period model known as dynamic DRR. From a diagnostic point of view, it is perhaps easier to diagnose measurement gross-errors or outliers using the first approach, given that it steps chronologically through the field data one time-period at a time.
In this example, we have five measured flows, represented by the five out-port to in-port arrows, with one of the stream flows being a recycle. There are three tanks, each with both opening and closing measured holdup values. As well, we have two compositions, C1 and C2, which represent some measurable phenomenon. The compositions are measured on the out-port of P1 and the in-port of P2, and they are also given as the two opening composition values for each of the three tanks. This leads to 5 (flows) + 3 (opening holdups) + 3 (closing holdups) + 4 (stream compositions) + 6 (opening tank compositions) = 21 measurable quantities and qualities per time-period.
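The tally above can be restated in a few lines of Python. The constant names are illustrative. The code only reproduces the counts given in the text.

```python
# Illustrative restatement of the per-time-period measurement tally.
N_FLOWS = 5              # out-port to in-port arrows (one is a recycle)
N_TANKS = 3              # T1, T2, T3
N_COMPONENTS = 2         # C1 and C2
N_MEASURED_PORTS = 2     # out-port of P1 and in-port of P2

counts = {
    "flows": N_FLOWS,
    "opening holdups": N_TANKS,
    "closing holdups": N_TANKS,
    "stream compositions": N_MEASURED_PORTS * N_COMPONENTS,
    "opening tank compositions": N_TANKS * N_COMPONENTS,
}
total = sum(counts.values())
print(f"{total} measurable quantities and qualities per time-period")
```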
The key difference between the modeling found in Kelly et al. (2005) and our formulation is that we use the concept of "ports". Ports allow a less ambiguous and more parsimonious representation of the quantity, logic and quality phenomenological (QLQP) data. For instance, on T3 at out-port F4 there are two simultaneous outlet flows (quantity). Out-port F4 therefore acts as an implicit or implied splitter, replacing the explicit splitter object (S1 in Figure 1 of Kelly et al. (2005)). In addition, the Kelly et al. (2005) flowsheet is stream-based, whereas in the port-based UOPSS flowsheet streams by themselves have no explicit properties attached. Their properties are found on, or implied from, the out-ports, which is why out-ports are implied splitters.
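To make the port idea concrete, here is a minimal Python sketch built on two assumptions:

- A composition attached to an out-port is shared by every stream leaving it.
- An in-port blends its inlets by flow-weighted averaging. This is valid only when flows and compositions are on a consistent basis.

The class names and every port label other than F4 are hypothetical. They are not part of IMPRESS.

```python
from dataclasses import dataclass, field

@dataclass
class OutPort:
    """Implied splitter: every stream leaving it carries this composition."""
    name: str
    composition: dict

@dataclass
class Stream:
    """A stream holds only a flow; its properties come from its source out-port."""
    source: OutPort
    flow: float

    @property
    def composition(self):
        return self.source.composition

@dataclass
class InPort:
    """Implied (uncontrolled) mixer: flow-weighted blend of its inlet streams."""
    name: str
    inlets: list = field(default_factory=list)

    def composition(self):
        total = sum(s.flow for s in self.inlets)
        keys = sorted({k for s in self.inlets for k in s.composition})
        return {k: sum(s.flow * s.composition.get(k, 0.0) for s in self.inlets) / total
                for k in keys}

# Illustrative ports; only the F4 and P1 labels echo the text.
f4 = OutPort("T3.F4", {"C1": 0.3, "C2": 0.7})
p1 = OutPort("P1.out", {"C1": 0.5, "C2": 0.5})
s1, s2 = Stream(f4, 2.0), Stream(f4, 1.0)   # two simultaneous flows out of F4
s3 = Stream(p1, 3.0)
mixer = InPort("mixer.in", [s2, s3])        # hypothetical downstream in-port
mix = mixer.composition()
print(mix)
```

Because `Stream.composition` simply reads its out-port, the two streams leaving F4 cannot disagree on composition. That is the "implied splitter" behaviour.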
Industrial Modeling Framework (IMF), IMPRESS and SIIMPLE
To implement the mathematical formulation of this and other systems, IAL offers a unique approach that is incorporated into our Industrial Modeling and Pre-Solving System, which we call IMPRESS. IMPRESS has its own modeling language called IML (short for Industrial Modeling Language), which is a flat or text-file interface. It also has a set of APIs called IPL (short for Industrial Programming Language), which can be called from any computer programming language, such as C, C++, Fortran, Java (SWIG), C# or Python (CTYPES), to both build the model and view the solution. Models can be a mix of linear, mixed-integer and nonlinear variables and constraints. They are solved using a combination of LP, QP, MILP and NLP solvers such as COINMP, GLPK, LPSOLVE, SCIP, CPLEX, GUROBI, LINDO, XPRESS, CONOPT, IPOPT and KNITRO. Another option is our own implementation of SLP called SLPQPE (Successive Linear & Quadratic Programming Engine), which is a very competitive alternative to the other nonlinear solvers and embeds all available LP and QP solvers.
In addition, and specific to DRR problems, we also have a special solver called SECQPE (Sequential Equality-Constrained QP Engine), which computes the least-squares solution. A post-solver called SORVE (Supplemental Observability, Redundancy and Variability Estimator) then estimates the usual DRR statistics found in Kelly (1998, 2004b) and Kelly and Zyngier (2008). SECQPE also includes a Levenberg-Marquardt regularization method for nonlinear data regression problems and can be presolved using SLPQPE (i.e., SLPQPE warm-starts SECQPE). SORVE is run after the SECQPE solver and also computes the well-known "maximum-power" gross-error statistics. These help locate outliers, defects and/or faults, i.e., malfunctions in the measurement system and mis-specifications in the logging system.
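The Levenberg-Marquardt idea mentioned above can be illustrated with a textbook sketch in Python/NumPy. This is not SECQPE. It is a generic damped Gauss-Newton loop on a hypothetical, noise-free exponential fit. It shows how the damping term `lam * I` regularizes the normal equations and is relaxed as steps succeed.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, p0, lam=1e-2, tol=1e-10, max_iter=200):
    """Textbook Levenberg-Marquardt for min 0.5*||r(p)||^2 (not SECQPE itself)."""
    p = np.asarray(p0, dtype=float)
    r = residual(p)
    cost = 0.5 * float(r @ r)
    for _ in range(max_iter):
        r, J = residual(p), jacobian(p)
        g = J.T @ r                                # gradient of the cost
        if np.max(np.abs(g)) < tol:
            break
        # Regularized (damped) normal equations: (J'J + lam*I) step = -J'r
        step = np.linalg.solve(J.T @ J + lam * np.eye(p.size), -g)
        trial = p + step
        r_trial = residual(trial)
        trial_cost = 0.5 * float(r_trial @ r_trial)
        if trial_cost < cost:
            p, cost = trial, trial_cost
            lam *= 0.3                             # accepted: lean toward Gauss-Newton
        else:
            lam *= 10.0                            # rejected: lean toward steepest descent
    return p

# Hypothetical data: y = a * exp(b * x) with a = 2.0, b = -1.3 (noise-free).
x = np.linspace(0.0, 2.0, 9)
y = 2.0 * np.exp(-1.3 * x)

def residual(p):
    return p[0] * np.exp(p[1] * x) - y

def jacobian(p):
    e = np.exp(p[1] * x)
    return np.column_stack((e, p[0] * x * e))

p_hat = levenberg_marquardt(residual, jacobian, [1.0, -0.5])
print(p_hat)
```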
The underlying system architecture of IMPRESS is called SIIMPLE (we hope literally), which is short for Server, Interacter (IPL), Interfacer (IML), Modeler, Presolver Libraries and Executable. The Server, Presolver and Executable are primarily model- or problem-independent. The Interacter, Interfacer and Modeler are typically domain-specific, i.e., model- or problem-dependent. Fortunately, this covers most industrial planning, scheduling, optimization, control and monitoring problems found in the process industries. For these, IMPRESS's standard Interacter, Interfacer and Modeler are comprehensive and well-suited to modeling the most difficult production and process complexities. They allow the formulation of straightforward coefficient equations, ubiquitous conservation laws, rigorous constitutive relations, empirical correlative expressions and other necessary side constraints.
User, custom, ad hoc or external constraints can be added to IMPRESS when necessary in several ways. For MILP or logistics problems, we offer user-defined constraints configurable from the IML file or the IPL code. Here the variables and constraints are referenced using unit-operation-port-state names and the quantity-logic variable types. It is also possible to import a foreign LP file (row-based MPS file), which can be generated by any algebraic modeling language or matrix generator. This file is read just before the matrix is generated and exported to the LP, QP or MILP solver. For NLP or quality problems, we offer user-defined formula configuration in the IML file, plus single-value and multi-value function blocks writable in C, C++ or Fortran. The nonlinear formulas may include the following intrinsic functions, as well as user-written extrinsic functions:

- EXP, LN, LOG, SIN, COS, TAN, MIN, MAX, IF, NOT, EQ, NE, LE, LT, GE, GT
- KIP, LIP, SIP (constant, linear and monotonic spline interpolation)
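The exact semantics of KIP and LIP are not spelled out here, so the following Python functions are hypothetical stand-ins. They assume KIP behaves as a step (zero-order-hold) lookup and LIP as linear interpolation, both clamped at the table ends. SIP's monotonic spline is not sketched.

```python
from bisect import bisect_right

def kip(xs, ys, x):
    """Hypothetical constant (step) interpolation: value at the last breakpoint <= x."""
    i = bisect_right(xs, x) - 1
    return ys[max(0, min(i, len(ys) - 1))]     # clamp below the first breakpoint

def lip(xs, ys, x):
    """Hypothetical linear interpolation, clamped to the end values outside the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1                # segment [xs[i], xs[i+1]] contains x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

# Hypothetical lookup table (e.g. a property versus temperature).
xs, ys = [0.0, 10.0, 20.0], [1.0, 2.0, 4.0]
print(kip(xs, ys, 15.0), lip(xs, ys, 15.0))
```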
Industrial modeling frameworks (IMFs) are intended to provide a jump-start to an industrial project implementation, i.e., a pre-project if you will. They supply pre-configured IML files and/or IPL code specific to your problem at hand. The IML files and/or IPL code can easily be enhanced, extended, customized or modified to meet the diverse needs of your project as it evolves over time and use. IMFs also provide graphical user interface prototypes for drawing the flowsheet, as in Figure 1, and typical Gantt charts and trend plots to view the solution of quantity, logic and quality time-profiles. Current developments use Python 2.3 and 2.7, integrated with the open-source Dia and Matplotlib modules respectively. Other prototypes, embedded within Microsoft Excel/VBA for example, can be created in a straightforward manner.

However, the primary purpose of the IMFs is to provide a timely, cost-effective, manageable and maintainable deployment of IMPRESS to formulate and optimize complex industrial manufacturing systems in either off-line or on-line environments. Using IMPRESS alone would be somewhat similar (but not as bad) to learning the syntax and semantics of an algebraic modeling language (AML). It would also mean coding all of the necessary mathematical representations of the problem yourself:

- digitizing your data into time-points and periods;
- demarcating past, present and future time-horizons;
- defining sets, index-sets and compound-sets to traverse the network or topology;
- calculating independent and dependent parameters to be used as coefficients and bounds;
- creating all of the necessary variables and constraints to model the complex details of logistics and quality industrial optimization problems.

Instead, IMFs and IMPRESS provide, in our opinion, a more elegant and structured approach to industrial modeling and solving, so that you can capture the benefits of advanced decision-making faster, better and cheaper.
"Advanced" Property Tracking/Tracing Synopsis
At this point we explore further the purpose of "advanced" property tracking/tracing in terms of its predictive and diagnostic capability. That capability aids in the detection, identification and elimination of "bad" flow, holdup and property data, where "bad" really means inconsistent. The major advantage of DRR is its ability to use redundant data, i.e., over-determined or over-specified problems, which makes it more powerful than simulation techniques alone. The redundancy occurs primarily because of the inclusion of a model. The model consists of equations or equality constraints relating flow, holdup and property variables, as in the laws of conservation of matter, energy and momentum.

Some of these variables are measured or reconciled, some are unmeasured or regressed, while others are fixed or rigid. Measured variables have a known (finite) raw variance, unmeasured variables have a large and unknown (effectively infinite) variance and fixed variables have zero variance. The DRR objective function minimizes the weighted sum of squares of the raw measurements minus their reconciled estimates, where the weights are simply the inverses of the raw variances (Kelly, 1998). At a converged DRR solution using SECQPE, we have estimates of the reconciled and unmeasured or regressed variables. After running SORVE, we also have new variance estimates for the reconciled and unmeasured or regressed variables. SORVE additionally provides redundancy and observability estimates for each measured and unmeasured variable, respectively.

These variances support several further statistics:

- individual gross-error detection statistics for the measured variables and equality constraints;
- confidence intervals for each unmeasured variable, using Student-t tables to determine the statistical threshold or critical values;
- a global or overall Hotelling statistic on the objective function value, to detect whether at least one gross-error exists.
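For the linear case in which every variable is measured, the DRR problem described above has a closed-form solution, and the maximum-power measurement test follows directly from it. The Python/NumPy sketch below rests on these assumptions:

- Measurement variances are diagonal.
- The flowsheet is a hypothetical tank feeding an out-port splitter, with a deliberate +1.0 bias on F1.
- The classical chi-square global test stands in for the Hotelling statistic mentioned in the text.
- Critical values are hard-coded at 95% rather than taken from any SORVE output.

```python
import numpy as np

def reconcile(y, sigma, A):
    """Linear DRR with all variables measured:
    minimize (y - x)' S^-1 (y - x) subject to A x = 0, with S = diag(sigma^2)."""
    S = np.diag(sigma ** 2)              # raw measurement covariance
    r = A @ y                            # constraint residuals of the raw data
    V = A @ S @ A.T                      # covariance of those residuals
    Vinv = np.linalg.inv(V)
    x_hat = y - S @ A.T @ Vinv @ r       # reconciled estimates
    objective = float(r @ Vinv @ r)      # weighted sum of squared adjustments
    d = A.T @ Vinv @ r                   # equals S^-1 (y - x_hat)
    W = A.T @ Vinv @ A                   # covariance of d
    z = np.abs(d) / np.sqrt(np.diag(W))  # maximum-power measurement statistics
    return x_hat, objective, z

names = ["H0", "H1", "F1", "F2", "F3", "F4"]
A = np.array([
    [-1.0, 1.0, -1.0, 1.0, 0.0, 0.0],    # tank: H1 - H0 - F1 + F2 = 0
    [0.0, 0.0, 0.0, 1.0, -1.0, -1.0],    # out-port split: F2 - F3 - F4 = 0
])
y = np.array([20.0, 22.0, 11.0, 8.0, 5.0, 3.0])  # F1 carries a +1.0 bias
sigma = 0.01 * np.abs(y)                          # 1% relative standard error

x_hat, objective, z = reconcile(y, sigma, A)
CHI2_95_2DOF = 5.991   # chi-square 95% critical value, 2 independent constraints
Z_95 = 1.96            # two-sided 95% normal quantile (no multiple-test correction)
suspects = [n for n, zi in zip(names, z) if zi > Z_95]
print(f"objective {objective:.3f}, global test fails: {objective > CHI2_95_2DOF}")
print("suspect measurements:", suspects)
```

H0, H1 and F1 receive essentially identical statistics because they appear only in the same balance. The data can therefore detect the bias but cannot say which of the three measurements is at fault. Telling them apart would need more redundancy, such as extra measured constraints. A variable that appears in no constraint would give a zero diagonal in `W` and should be excluded before computing `z`.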
The major driving force behind APT is to use quantity and quality variables to accurately predict or precisely calculate other quantity and quality variables, subject to normal random errors, using a model or set of variables and constraints. Gross-errors, defects, offsets or biases may be detected and identified using the diagnostic capability of DRR. If so, they should be removed or eliminated before the estimated properties are used or reported in other information and decision-making systems. We applied this technique to the data set found in Kelly et al. (2005). The flowsheet was slightly modified to transform it into UOPSS, and no gross-errors were injected into the system. We employed a one percent (1%) relative standard error for the flow and holdup measurements and a one percent (1%) absolute standard error for the composition measurements. With these settings, we arrive at an objective function of 0.00004 for the very first time-period. This is close to the objective function of 0.0001 quoted in Table 4 for time-period 1 of Kelly et al. (2005). We note that the opening tank holdups for T1 and T2 are in error in Kelly et al. (2005). Instead, T1 has an opening inventory of 20.0 m^3 and T2 one of 10.0 m^3. To estimate the other time-period values using a successive or sequential approach, the closing holdups and compositions from the first time-period are used as the opening values for the second time-period, and so on.
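The successive approach can be sketched for a single tank in plain Python. The sketch makes these assumptions:

- There is one linear holdup balance per period.
- Flows and holdups carry a 1% relative standard error.
- Each carried-over opening holdup is treated as fixed (zero variance), while the very first opening holdup is measured.

The period data and function names are hypothetical. Only the 20.0 m^3 opening value for T1 comes from the text.

```python
def reconcile_balance(values, sigmas, coeffs):
    """Weighted least-squares adjustment of values so that sum(c * x) == 0.

    Closed form for a single linear constraint with diagonal variances;
    a zero sigma leaves that value unadjusted (fixed)."""
    r = sum(c * v for c, v in zip(coeffs, values))          # raw imbalance
    V = sum((c * s) ** 2 for c, s in zip(coeffs, sigmas))   # variance of the imbalance
    return [v - s * s * c * r / V for v, s, c in zip(values, sigmas, coeffs)]

def successive(opening, periods, rel_err=0.01):
    """Reconcile one tank period by period; the reconciled closing holdup
    becomes the (fixed) opening holdup of the next period."""
    results, h_open = [], opening
    for k, (f_in, f_out, h_close) in enumerate(periods):
        values = [h_open, f_in, f_out, h_close]
        coeffs = [-1.0, -1.0, 1.0, 1.0]        # h_close - h_open - f_in + f_out = 0
        sigmas = [rel_err * abs(v) for v in values]
        if k > 0:
            sigmas[0] = 0.0                    # carried-over opening is fixed
        x = reconcile_balance(values, sigmas, coeffs)
        results.append(x)
        h_open = x[3]                          # closing becomes next opening
    return results

# Hypothetical hourly data for T1: (inflow, outflow, measured closing holdup).
periods = [(5.0, 3.0, 22.3), (4.0, 4.5, 21.6), (2.0, 3.0, 20.7)]
results = successive(20.0, periods)
for k, (h0, fi, fo, h1) in enumerate(results, start=1):
    print(f"period {k}: opening {h0:.3f} in {fi:.3f} out {fo:.3f} closing {h1:.3f}")
```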
Finally, Appendices A and B show the APT-IMF.UPS and APT-IMF.IML files used to configure both the model and the data of the APT problem. The UPS file contains the UOPSS constructs or shapes, and the IML file contains all of the static and dynamic QLQP capacity data referenced by the UOPSS constructs. The UPS file can be created automatically using the open-source drawing software GNOME Dia, with Python 2.3 used to access Dia's object model and retrieve the UOPSS sheet shapes. The IML file is a simple text file with several categories or classifications of both the model (master, static) data and the cycle (transactional, dynamic) data.

An interesting feature of the IML file is the use of "Calcs" (values assigned to symbols). Calcs can be used to manage dynamic data from the field, such as flow meter readings and laboratory analysis results. Interfacing or binding the various data sources to the IML file is therefore achieved by changing the value of a Calc and then using this Calc in the rest of the data categories of the IML file. Another interesting feature is the use of a "missing-value" or "missing-data" number, which we call a "non-naturally occurring number" (NNON), typically set to -99999. If a value is NNON, then it is regressed in the DRR, so NNON is useful for switching a measurement from measured to unmeasured. This switching is used when performing gross-error detection and identification analysis, similar to running multiple scenarios, cases or situations. The goal is to determine whether the problem contains bad data before the property tracking/tracing data is disseminated to other decision-making applications.
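The NNON convention can be sketched in Python. The sketch assumes that a value equal to -99999 simply moves a variable from the measured set (finite variance) to the unmeasured set (regressed). The helper names and the leave-one-out scenario loop are hypothetical illustrations of the multi-scenario analysis described above, not IML features.

```python
NNON = -99999.0  # "non-naturally occurring number" flagging missing data

def classify(raw, rel_err=0.01):
    """Split raw values into measured {name: (value, sigma)} and unmeasured names."""
    measured, unmeasured = {}, []
    for name, value in raw.items():
        if value == NNON:
            unmeasured.append(name)            # regressed in the DRR, not reconciled
        else:
            measured[name] = (value, rel_err * abs(value))
    return measured, unmeasured

def leave_one_out(raw):
    """Yield one scenario per measured variable with that value switched to NNON."""
    for name, value in raw.items():
        if value != NNON:
            scenario = dict(raw)               # copy so the raw data stay untouched
            scenario[name] = NNON
            yield name, scenario

raw = {"F1": 10.0, "F2": 8.0, "F3": NNON}
measured, unmeasured = classify(raw)
scenarios = dict(leave_one_out(raw))
print(unmeasured, sorted(scenarios))
```

Each scenario would then be reconciled and its objective compared. If the objective drops sharply when one measurement is switched to NNON, that measurement is a likely suspect.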
References
Kelly, J.D., "A regularization approach to the reconciliation of constrained data sets", Computers
& Chemical Engineering, 1771, (1998).
Kelly, J.D., "Production modeling for multimodal operations", Chemical Engineering Progress,
February, 44, (2004a).
Kelly, J.D., "Techniques for solving industrial nonlinear data reconciliation problems",
Computers & Chemical Engineering, 2837, (2004b).
Kelly, J.D., Mann, J.L., Schulz, F.G., "Improve accuracy of tracing production qualities using
successive reconciliation", Hydrocarbon Processing, April, (2005).
Kelly, J.D., "The unit-operation-stock superstructure (UOSS) and the quantity-logic-quality
paradigm (QLQP) for production scheduling in the process industries", In: MISTA 2005
Conference Proceedings, 327, (2005).
Kelly, J.D., Zyngier, D., "A new and improved MILP formulation to optimize observability,
redundancy and precision for sensor network problems", American Institute of Chemical
Engineering Journal, 54, 1282, (2008).
Zyngier, D., Kelly, J.D., "UOPSS: a new paradigm for modeling production planning and
scheduling systems", ESCAPE 22, June, (2012).
Appendix A - APT-IMF.UPS (UOPSS) File
Appendix B - APT-IMF.IML File