36  www.aiche.org/cep  March 2016  CEP
Special Section: Big Data Analytics
Big data in the process industries has many of the
characteristics represented by the four Vs — volume,
variety, veracity, and velocity. However, process
data can be distinguished from big data in other industries
by the complexity of the questions we are trying to answer
with process data. Not only do we want to find and interpret
patterns in the data and use them for predictive purposes, but
we also want to extract meaningful relationships that can be
used to improve and optimize a process.
	 Process data are also often characterized by the presence
of large numbers of variables from different sources,
something that is generally much more difficult to handle
than just large numbers of observations. Because of the
multisource nature of process data, engineers conducting a
process investigation must work closely with the IT department
that provides the necessary infrastructure to put these
data sets together in a contextually correct way.
	 This article presents several success stories from different
industries where big data has been used to answer
complex questions. Because most of these studies involve
the use of latent variable (LV) methods such as principal
component analysis (PCA) (1) and projection to latent
structures (PLS) (2, 3), the article first provides a brief
overview of those methods and explains the reasons such
methods are particularly suitable for big data analysis.
Latent variable methods
	 Historical process data generally consist of measure-
ments of many highly correlated variables (often hundreds
to thousands), but the true statistical rank of the process, i.e.,
the number of underlying significant dimensions in which the
process is actually moving, is often very small (about two to
ten). This situation arises because only a few dominant events
are driving the process under normal operations (e.g., raw
material variations, environmental effects). In addition, more
sophisticated online analyzers such as spectrometers and
imaging systems are being used to generate large numbers of
highly correlated measurements on each sample, which also
require lower-rank models.
	 Latent variable methods are uniquely suited for the
analysis and interpretation of such data because they are
based on the critical assumption that the data sets are of
low statistical rank. They provide low-dimension latent
variable models that capture the lower-rank spaces of
the process variable (X) and the response (Y) data without
over-fitting the data. This low-dimensional space is
defined by a small number of statistically significant latent
variables (t1, t2, …), which are linear combinations of the
measured variables. Such variables can be used to construct
simple score and loading plots, which provide a way
to visualize and interpret the data.
Big data holds much potential for optimizing
and improving processes. See how it has
already been used in a range of industries,
from pharmaceuticals to pulp and paper.
Salvador García Muñoz
Eli Lilly and Co.
John F. MacGregor
ProSensus, Inc.
BIG DATA: Success Stories in the Process Industries
Copyright © 2016 American Institute of Chemical Engineers (AIChE)
	 The scores can be thought of as scaled weighted averages
of the original variables, using the loadings as the
weights for calculating the weighted averages. A score plot
is a graph of the data in the latent variable space. The loadings
are the coefficients that reveal the groups of original
variables that belong to the same latent variable, with one
loading vector (W*) for each latent variable. A loading
plot provides a graphical representation of the clustering of
variables, revealing the identified correlations among them.
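The relationship between scores and loadings can be sketched with a small PCA computed by singular value decomposition. This is only an illustration under stated assumptions: the data are synthetic (50 correlated variables driven by two hidden events), and the names `events`, `mixing`, `scores`, and `loadings` are hypothetical rather than taken from any study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 50 correlated variables that are
# driven by only 2 underlying events, plus a little measurement noise.
n_obs, n_vars, true_rank = 200, 50, 2
events = rng.normal(size=(n_obs, true_rank))
mixing = rng.normal(size=(true_rank, n_vars))
X = events @ mixing + 0.05 * rng.normal(size=(n_obs, n_vars))

# Mean-center and scale each variable to unit variance before PCA.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via the singular value decomposition: Xs = U S Vt.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = S**2 / np.sum(S**2)

n_components = 2
loadings = Vt[:n_components].T   # (n_vars, 2): weights on the original variables
scores = Xs @ loadings           # (n_obs, 2): t1 and t2 for every observation

print("variance explained by t1, t2:", explained[:n_components].round(3))
```

Plotting the two columns of `scores` against each other gives a score plot, and plotting the two columns of `loadings` gives the companion loading plot.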
	 The uniqueness of latent variable models is that they
simultaneously model the low-dimensional X and Y
spaces, whereas classical regression methods assume that
there is independent variation in all X and Y variables
(which is referred to as full rank). Latent variable models
show the relationships between combinations of variables
and changes in operating conditions — thereby allowing
us to gain insight and optimize processes based on such
historical data.
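A compact way to see how a latent variable model treats X and Y together is the NIPALS algorithm for PLS. The function below is a minimal sketch, not the implementation used in the cited studies; the function name `pls_nipals`, the synthetic data, and the choice of two components are assumptions made for illustration. It returns the scores T, the W* loadings that map the original X directly to the scores, and the Y loadings Q.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Fit a PLS model by NIPALS; X and Y must already be centered and scaled."""
    X, Y = X.copy(), Y.copy()
    W, P, Q, T = [], [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                    # start from a (nonzero) column of Y
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)       # X weights
            t = X @ w                    # X scores
            q = Y.T @ t / (t.T @ t)      # Y loadings
            u_new = Y @ q / (q.T @ q)    # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)          # X loadings
        X -= t @ p.T                     # deflate X
        Y -= t @ q.T                     # deflate Y
        W.append(w); P.append(p); Q.append(q); T.append(t)
    W, P, Q, T = (np.hstack(m) for m in (W, P, Q, T))
    W_star = W @ np.linalg.inv(P.T @ W)  # so that T equals the original X @ W_star
    return T, W_star, Q

# Hypothetical low-rank data: two hidden events drive 20 X and 3 Y variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 20)) + 0.05 * rng.normal(size=(100, 20))
Y = latent @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(100, 3))
Xc = (X - X.mean(0)) / X.std(0, ddof=1)
Yc = (Y - Y.mean(0)) / Y.std(0, ddof=1)

T, W_star, Q = pls_nipals(Xc, Yc, n_components=2)
Y_hat = T @ Q.T
r2 = 1 - np.sum((Yc - Y_hat) ** 2) / np.sum(Yc ** 2)
print("fraction of Y variance captured by two latent variables:", round(r2, 3))
```

Because the synthetic X and Y share the same two hidden events, two latent variables are enough to describe both spaces, which is the low-rank situation described above.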
	 The remainder of the article presents several industrial
applications of big data for:
	 • the analysis and interpretation of historical data and
troubleshooting process problems
	 • optimizing processes and product performance
	 • monitoring and controlling processes
	 • integrating data from multivariate online analyzers
and imaging sensors.
Learning from process data
	 A data set containing about 200,000 measurements was
collected from a batch process for drying an agrochemical
material — the final step in the manufacturing process. The
unit is used to evaporate and collect the solvent contained in
the initial charge and to dry the product to a target residual
solvent level.
	 The objective was to determine the operating conditions
responsible for the overall low yields when off-specification
product is rejected. The problem is highly complex because
it requires the analysis of 11 initial raw material conditions,
10 time trajectories of process variables (trends in the evolution
of process variables), and the impact of the process
variables on 11 physical properties of the final product.
	 The available data were arranged in three blocks:
	 • Block X contained the time trajectories measured through
the batch, which were characterized by milestone events (e.g.,
slope, total time for stage of operation)
	 • Block Z contained measurements of the chemistry of
the incoming materials
	 • Block Y consisted of the 11 physical properties of the
final product.
	 A multiblock PLS (MBPLS) model was fitted to the three
data blocks. The results were used to construct score plots
(Figure 1), which show the batch-to-batch product quality
variation, and the companion loading plots (Figure 2), which
show the regressor variables (in X and Z) that were most
highly correlated with such variability.
	 Contrary to the initial hypothesis that the chemistry variables
(Z) were responsible for the off-spec product, the analysis
isolated the time-varying process variables as a plausible
cause for the product quality differences (Figure 1, red) (4).
This was determined by observing the direction in which the
product quality changes (arrow in Figure 1) and identifying
the variables that line up in this direction of change (Figure 2).
Variables z1–z11 line up in a direction that is close to perpendicular
to the direction of quality change, so they are only weakly related to it.
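The visual comparison of a loading direction with the direction of quality change can also be expressed numerically as the cosine of the angle between the two vectors. The sketch below is illustrative only: the direction vector and the loading values are hypothetical and are not the values plotted in Figure 2.

```python
import math

def alignment(loading, direction):
    """Absolute cosine of the angle between a loading vector and a direction."""
    dot = sum(a * b for a, b in zip(loading, direction))
    norm = math.hypot(*loading) * math.hypot(*direction)
    return abs(dot / norm)

# Hypothetical direction of quality change read off a score plot (t1, t2).
quality_direction = (1.0, 1.0)

# Hypothetical (W*[1], W*[2]) loadings; these are not the values from Figure 2.
loadings = {
    "Time2": (0.30, 0.35),
    "TempSlope": (0.25, 0.20),
    "z4": (-0.20, 0.22),
    "z7": (0.15, -0.13),
}

ranked = sorted(loadings, key=lambda name: -alignment(loadings[name], quality_direction))
for name in ranked:
    print(f"{name:10s} alignment = {alignment(loadings[name], quality_direction):.2f}")
```

A value near 1 means the variable lines up with the direction of change, and a value near 0 means it is close to perpendicular to it.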
Figure 1. A score plot of two latent variables (t1 vs. t2) shows lots clustered by product quality: On-Spec, On-Spec (High Residual Solvent), and Off-Spec. Source: (4).
Figure 2. A companion loading plot (W*[1] vs. W*[2]) reveals the process parameters that were aligned with the direction of change in the score plot. Plotted variables: Z1–Z11, Level1, Temp1, Temp2, Time1–Time4, TempSlope, and Weight Wet Cake. Source: (4).
Optimizing process operations
	 The manufacture of formulated products (such as
pharmaceutical tablets) generates a complex data set that
extends beyond process conditions to include information
about the raw materials used in the manufacture of each
lot of final product and the physical properties of those raw
materials. This case study can be represented by multiple
blocks of data: the final quality of the product of interest
(Y), the weighted average of the physical properties of the
raw materials used in each lot (RXI), and the process and
environmental conditions at which each lot was manufactured
(Z). These blocks of data were used to build an MBPLS
model that was later embedded within a mixed-integer nonlinear
programming (MINLP) optimization framework. The
physical properties of the lots of material available in inventory
are represented by data block XA, and the properties of
the lots of material used to manufacture the final product are
represented by data block X.
	 The objective for the optimization routine was to determine
the materials available in inventory that should be
combined and the ratios (r) of those that should be blended
to obtain the best next lot of finished product. The square of
the difference between the predicted and the target quality of
the product was used to choose the lots and blending ratios.
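The structure of this objective can be illustrated with a deliberately small stand-in. The sketch below replaces the MINLP solver with a brute-force search over pairs of lots and a coarse grid of blending ratios, and it replaces the latent variable model with a hypothetical linear function `predict_quality`; the inventory values, the target, and all names are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical inventory: lot id -> one physical property (e.g., particle size).
inventory = {"A": 42.0, "B": 55.0, "C": 61.0, "D": 48.0}

def predict_quality(blend_property):
    """Hypothetical stand-in for the latent variable model's prediction."""
    return 10.0 + 0.4 * blend_property

target = 31.0
ratios = [i / 20 for i in range(1, 20)]   # r = 0.05, 0.10, ..., 0.95

best = None
for lot_1, lot_2 in combinations(inventory, 2):
    for r in ratios:
        # Weighted-average property of the blend, then squared deviation from target.
        blend = r * inventory[lot_1] + (1 - r) * inventory[lot_2]
        cost = (predict_quality(blend) - target) ** 2
        if best is None or cost < best[0]:
            best = (cost, lot_1, lot_2, r)

cost, lot_1, lot_2, r = best
print(f"blend {r:.2f} of lot {lot_1} with {1 - r:.2f} of lot {lot_2}; squared error {cost:.4f}")
```

The discrete choice of lots and the continuous blending ratio are what make the real problem a mixed-integer nonlinear program; exhaustive search is only practical here because the example is tiny.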
	 The underlying calculations reduce the problem to the
score space, where the differences in quality — in this case
tablet dissolution — correspond to different locations on
the score plot (Figure 3). The MINLP optimization routine
identified the candidate materials available in inventory
that should be blended together to make the final product
so that the score for the next lot lands in the score space
corresponding to the desired quality (i.e., target dissolution).
Implementing this optimization routine in real time
significantly improved the quality of the product produced
in this manufacturing process (Figure 4).
	 Selecting the materials from inventory to be used in
manufacturing a product is not as simple as choosing those
that will produce the best lot of product. If you choose
materials aiming to produce the best next lot, you will
inevitably consume the best materials very quickly; this may
be acceptable for a low-volume product. For high-volume
products, however, the same calculation leads to
an undesirable situation in which the best materials have been
depleted and only the less-desirable raw materials are left. In
this case, it is better to run the optimization routine
for the best next campaign (a series of lots), which
accounts for the fact that more than one acceptable lot of
product is being manufactured. The optimization calculation
in this latter case balances the use of inventory
and enables better management of desirable vs. less-desirable
raw materials for the entire campaign of manufactured
product.
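The difference between the two strategies shows up even in a toy inventory. In the sketch below, every product lot is assumed to be a 50/50 blend of two inventory lots, quality is assumed to equal the blended property, and exhaustive enumeration stands in for the MINLP formulation; the inventory values, target, and specification width are hypothetical.

```python
from itertools import combinations

target, spec_half_width = 10.0, 2.0
# Hypothetical property of four inventory lots; each product lot blends two 50/50.
inventory = [10.0, 10.0, 7.0, 7.0]

def squared_error(pair):
    return ((pair[0] + pair[1]) / 2 - target) ** 2

def best_next_lot(lots):
    """Greedy: repeatedly pick the pair that is best for the very next lot."""
    remaining, plan = list(lots), []
    while len(remaining) >= 2:
        i, j = min(combinations(range(len(remaining)), 2),
                   key=lambda ij: squared_error((remaining[ij[0]], remaining[ij[1]])))
        plan.append((remaining[i], remaining[j]))
        for k in sorted((i, j), reverse=True):
            remaining.pop(k)
    return plan

def best_next_campaign(lots):
    """Exhaustive: choose all pairings at once to minimize total squared error."""
    if len(lots) < 2:
        return []
    first, rest = lots[0], lots[1:]
    candidates = [[(first, partner)] + best_next_campaign(rest[:k] + rest[k + 1:])
                  for k, partner in enumerate(rest)]
    return min(candidates, key=lambda plan: sum(squared_error(p) for p in plan))

results = {}
for name, plan in [("lot", best_next_lot(inventory)),
                   ("campaign", best_next_campaign(inventory))]:
    total = sum(squared_error(p) for p in plan)
    off_spec = sum(squared_error(p) > spec_half_width ** 2 for p in plan)
    results[name] = (total, off_spec)
    print(f"best-next-{name}: {plan}, total squared error {total}, off-spec lots {off_spec}")
```

In this example the greedy strategy makes one perfect lot and then one off-spec lot from the leftovers, while the campaign strategy spreads the good material and keeps both lots within specification.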
	 The MINLP objective function must be tailored to the
material management needs for the given product so that
it adequately considers operational constraints, such as
the maximum number of lots of the same material to
blend (5, 6).
Figure 3. The dissolution speed of a pharmaceutical tablet is identified on a score plot of the latent variables (t1 vs. t3), with regions marked Slow Dissolution, Target Dissolution, and Fast Dissolution. Source: (5).
Figure 4. A control chart of the degree of dissolution (%) of a pharmaceutical tablet across lots of finished goods, with USL, LSL, and target lines, reveals the onset of quality problems. Quality problems are reduced by the implementation of a best-next-lot solution, then eliminated by the best-next-campaign approach. Source: (6).
Monitoring processes
	 Perhaps the best-known application of principal
component analysis in the chemical process industries
(CPI) is its use as a monitoring tool, enabling true multivariate
statistical process control (MSPC) (7, 8). In this
example, a PCA model was used to describe the normal
variability in the operation of a closed spray drying system
in a pharmaceutical manufacturing process (9). The system
(Figure 5) includes measurements of 16 process variables,
which can be projected by a PCA model into two principal
components (t1 and t2), each of which describes a different
source of variability in the process. A score plot that
updates in real time can then be used as a graphical tool
to determine when the process is exhibiting abnormal
behavior. This is illustrated in Figure 6, where the red dots
indicate the current state of the process, which is clearly
outside of the normal operating conditions (gray markers).
	 It is important to emphasize that this model could be
used to effectively monitor product quality without the
need to add online sensors to measure product properties.
Building an effective monitoring system requires a good
data set that is representative of the normal operating conditions
of the process.
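One common way to turn a score plot into an automatic check is to summarize the distance of each new observation from the center of the normal-operation scores with Hotelling's T2 statistic. The sketch below assumes synthetic normal-operation data for 16 variables and, as a simplification, uses the empirical 99th percentile of the training T2 values as the limit rather than a distribution-based limit; none of the numbers come from the spray drying study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical normal-operation data: 500 observations of 16 variables
# driven by two underlying sources of variability.
n_vars = 16
mixing = rng.normal(size=(2, n_vars))
X_normal = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, n_vars))

mean, std = X_normal.mean(axis=0), X_normal.std(axis=0, ddof=1)
Xs = (X_normal - mean) / std

# Two-component PCA model of normal operation.
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:2].T                       # loadings, shape (16, 2)
T = Xs @ P                         # scores of the normal-operation data
score_var = T.var(axis=0, ddof=1)

def hotelling_t2(x):
    """Scaled squared distance of one observation from the score-plot center."""
    t = ((x - mean) / std) @ P
    return float(np.sum(t**2 / score_var))

# Simplification: empirical 99th percentile of the training T2 as the limit.
limit = np.percentile(np.sum(T**2 / score_var, axis=1), 99)

# One observation at the center and one pushed 8 standard deviations along t1.
x_center = mean
x_abnormal = mean + 8.0 * np.sqrt(score_var[0]) * std * P[:, 0]
for label, x in [("center", x_center), ("abnormal", x_abnormal)]:
    t2 = hotelling_t2(x)
    print(f"{label}: T2 = {t2:.1f}, outside normal region = {t2 > limit}")
```

A production monitoring scheme would normally also track the residual (the part of each observation that the two components do not explain), which this sketch omits.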
Control of batch processes
	 Multivariate PLS models built from process data that
relate the initial conditions of the batch (Z), the time-varying
process trajectories (X), and the final quality attributes (Y)
(10) provide an effective way to control product quality and
productivity of batch processes. Those models can be used
online to collect evolving data of any new batch (first the
initial data in Z and then the evolving data in X), which are
then used to update the predictions of final product quality
(Y) at every time interval during the batch process. At
certain critical decision points (usually each batch has one
or two), a multivariate optimization routine is run to identify
control actions that will drive the final quality into a desired
target region and maximize productivity while respecting all
operating constraints (11–13).
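The decision-point logic can be sketched in its simplest form with a single quality attribute, a single adjustable variable, and a linear prediction. Everything below is a hypothetical illustration: the coefficients in `predict_final_quality`, the limits on the adjustment `u`, and the numbers are assumptions, and the productivity term of the real optimization is left out.

```python
def predict_final_quality(z, x_so_far, u):
    """Hypothetical linear prediction of final quality from the initial condition z,
    a summary of the trajectory measured so far, and a future adjustment u."""
    return 2.0 + 0.5 * z + 0.8 * x_so_far + 1.5 * u

def midcourse_correction(z, x_so_far, target, u_min, u_max):
    """Pick the adjustment u that moves the prediction to target, within limits."""
    no_action = predict_final_quality(z, x_so_far, 0.0)
    gain = predict_final_quality(z, x_so_far, 1.0) - no_action
    u_ideal = (target - no_action) / gain
    return min(max(u_ideal, u_min), u_max)   # respect the operating constraint

z, x_so_far, target = 1.2, 3.0, 5.6
u = midcourse_correction(z, x_so_far, target, u_min=-0.5, u_max=0.5)
print("predicted quality without action:", round(predict_final_quality(z, x_so_far, 0.0), 3))
print("chosen adjustment:", round(u, 3))
print("predicted quality with action:", round(predict_final_quality(z, x_so_far, u), 3))
```

When the target cannot be reached within the allowed range, the adjustment is held at the limit, which is the one-variable analogue of respecting operating constraints.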
	 Figure 7 displays one quality attribute of a high-value
food product before and after this advanced process
control method was implemented over many thousands of
batches. The process control method reduced the root-mean-square
deviation from the target for all final product
quality attributes by 50–70% and increased batch productivity
by 20%.
Figure 5. A closed-loop spray drying system in a pharmaceutical manufacturing facility is being monitored by the measurement of 16 variables that a PCA model projects into two principal components. The diagram shows the feed pump, drying chamber, cyclone, baghouse, process heater, HEPA filters, condenser, supply and exhaust fans, thermal mass flow sensor, and exhaust pressure transducer; drying gas flowrate is controlled by supply fan speed, exhaust pressure is controlled by exhaust fan speed, and product collection weight is logged. Source: (9).
Figure 6. A score plot of the two principal components (t1 vs. t2) describing the closed-loop spray drying system (Figure 5) shows that the process is operating under abnormal conditions. Source: (9).
Figure 7. Advanced control eliminated the variation in the final product quality attribute of a food product. Axis labels: Deviation from Target and Final Product Quality Attribute; series: With Control and No Control. Source: (9).
Analyzing information from advanced analyzers and imaging sensors
	 The use of more-sophisticated online analyzers (e.g.,
online spectrometers) and image-based sensors for online
process monitoring is becoming more prevalent in the
CPI. With that comes the need for more powerful methods
to handle and extract information from the large and
diverse data blocks acquired from such sophisticated
online monitors. Latent variable methods provide an effective
approach (14).
	 Consider a soft sensor (i.e., virtual sensor software that
processes several measurements together) application for
predicting the quality of product exiting a lime kiln at a pulp
and paper mill. Real-time measurements on many process
variables were combined with images from a color camera
capturing the combustion region of the kiln. The information
extracted from the combustion zone images and data from
the process data blocks were combined using the online
multivariate model to assess combustion stability and make
2-hr-ahead predictions of the exit lime quality.
Concluding remarks
	 Contextually correct historical data are a critical asset
that a corporation can use to expedite assertive
decisions (3). A potential pitfall in the analysis of big
data is assuming that the data will contain information
just because there is an abundance of data. Data contain
information only if they are organized in a contextually correct
manner; the practitioner should not underestimate the
effort and investment necessary to organize data such that
information can be extracted from them.
	 Multivariate latent variable methods are effective tools
for extracting information from big data. These methods
reduce the size and complexity of the problem to simple
and manageable diagnostics and plots that are accessible to
all consumers of the information, from the process designers
and line engineers to the operations personnel.
Literature Cited
1. Jackson, J. E., “A User’s Guide to Principal Components,” 1st ed., John Wiley and Sons, Hoboken, NJ (1991).
2. Höskuldsson, A., “PLS Regression Methods,” Journal of Chemometrics, 2 (3), pp. 211–228 (June 1988).
3. Wold, S., et al., “PLS: Partial Least-Squares Projection to Latent Structures,” in Kubinyi, H., ed., “3D-QSAR in Drug Design,” ESCOM Science Publishers, Leiden, The Netherlands, pp. 523–550 (1993).
4. García Muñoz, S., et al., “Troubleshooting of an Industrial Batch Process Using Multivariate Methods,” Industrial and Engineering Chemistry Research, 42 (15), pp. 3592–3601 (2003).
5. García Muñoz, S., and J. A. Mercado, “Optimal Selection of Raw Materials for Pharmaceutical Drug Product Design and Manufacture Using Mixed Integer Non-Linear Programming and Multivariate Latent Variable Regression Models,” Industrial and Engineering Chemistry Research, 52 (17), pp. 5934–5942 (2013).
6. García Muñoz, S., et al., “A Computer Aided Optimal Inventory Selection System for Continuous Quality Improvement in Drug Product Manufacture,” Computers and Chemical Engineering, 60, pp. 396–402 (Jan. 10, 2014).
7. MacGregor, J. F., and T. Kourti, “Statistical Process Control of Multivariable Processes,” Control Engineering Practice, 3 (3), pp. 403–414 (1995).
8. Kourti, T., and J. F. MacGregor, “Recent Developments in Multivariate SPC Methods for Monitoring and Diagnosing Process and Product Performance,” Journal of Quality Technology, 28 (4), pp. 409–428 (1996).
9. García Muñoz, S., and D. Settell, “Application of Multivariate Latent Variable Modeling to Pilot-Scale Spray Drying Monitoring and Fault Detection: Monitoring with Fundamental Knowledge,” Computers and Chemical Engineering, 33 (12), pp. 2106–2110 (2009).
10. Kourti, T., et al., “Analysis, Monitoring and Fault Diagnosis of Batch Processes Using Multiblock and Multiway PLS,” Journal of Process Control, 5, pp. 277–284 (1995).
11. Yabuki, Y., and J. F. MacGregor, “Product Quality Control in Semibatch Reactors Using Midcourse Correction Policies,” Industrial and Engineering Chemistry Research, 36, pp. 1268–1275 (1997).
12. Yabuki, Y., et al., “An Industrial Experience with Product Quality Control in Semi-Batch Processes,” Computers and Chemical Engineering, 24, pp. 585–590 (2000).
13. Flores-Cerrillo, J., and J. F. MacGregor, “Within-Batch and Batch-to-Batch Inferential Adaptive Control of Semi-Batch Reactors,” Industrial and Engineering Chemistry Research, 42, pp. 3334–3345 (2003).
14. Yu, H., et al., “Digital Imaging for Online Monitoring and Control of Industrial Snack Food Processes,” Industrial and Engineering Chemistry Research, 42 (13), pp. 3036–3044 (2003).
Copyright © 2016 American Institute of Chemical Engineers (AIChE)

More Related Content

What's hot

Cutting power & Energy Consideration in metal cutting
Cutting power & Energy Consideration in metal cuttingCutting power & Energy Consideration in metal cutting
Cutting power & Energy Consideration in metal cuttingDushyant Kalchuri
 
Presentation joining processes
Presentation joining processesPresentation joining processes
Presentation joining processesR G Sanjay Prakash
 
Thread ans weld joints
Thread ans weld jointsThread ans weld joints
Thread ans weld jointsAjit C
 
日本心理学会ポスター発表
日本心理学会ポスター発表日本心理学会ポスター発表
日本心理学会ポスター発表kaori enomoto
 
Oblique Cutting
Oblique CuttingOblique Cutting
Oblique CuttingAdil Malik
 
Submerged Arc Welding
Submerged Arc WeldingSubmerged Arc Welding
Submerged Arc Weldingswargpatel283
 
Khớp nối - chương 14
Khớp nối - chương 14Khớp nối - chương 14
Khớp nối - chương 14Chau Nguyen
 
Colonias sanas y seguras. nuevo laredo se transforma contigo2011
Colonias sanas y seguras.  nuevo laredo se transforma contigo2011Colonias sanas y seguras.  nuevo laredo se transforma contigo2011
Colonias sanas y seguras. nuevo laredo se transforma contigo2011consegul
 
処方箋データとNDBを用いた処方動向調査
処方箋データとNDBを用いた処方動向調査処方箋データとNDBを用いた処方動向調査
処方箋データとNDBを用いた処方動向調査Yasuyuki Okumura
 
Suc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCM
Suc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCMSuc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCM
Suc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCMcuong nguyen
 
バリデーション研究の計画・報告・活用
バリデーション研究の計画・報告・活用バリデーション研究の計画・報告・活用
バリデーション研究の計画・報告・活用Yasuyuki Okumura
 
医療データベース研究における バイアスと交絡への対処法
医療データベース研究におけるバイアスと交絡への対処法医療データベース研究におけるバイアスと交絡への対処法
医療データベース研究における バイアスと交絡への対処法Takashi Fujiwara
 
Metal cutting basics min
Metal cutting basics minMetal cutting basics min
Metal cutting basics minNagarajpatil42
 

What's hot (20)

Bai 1
Bai 1Bai 1
Bai 1
 
Cutting power & Energy Consideration in metal cutting
Cutting power & Energy Consideration in metal cuttingCutting power & Energy Consideration in metal cutting
Cutting power & Energy Consideration in metal cutting
 
Presentation joining processes
Presentation joining processesPresentation joining processes
Presentation joining processes
 
Thread ans weld joints
Thread ans weld jointsThread ans weld joints
Thread ans weld joints
 
Codigo de Dibujo Tecnico - Mecanico
Codigo de Dibujo Tecnico - Mecanico Codigo de Dibujo Tecnico - Mecanico
Codigo de Dibujo Tecnico - Mecanico
 
日本心理学会ポスター発表
日本心理学会ポスター発表日本心理学会ポスター発表
日本心理学会ポスター発表
 
Gas welding
Gas weldingGas welding
Gas welding
 
Oblique Cutting
Oblique CuttingOblique Cutting
Oblique Cutting
 
Submerged Arc Welding
Submerged Arc WeldingSubmerged Arc Welding
Submerged Arc Welding
 
Khớp nối - chương 14
Khớp nối - chương 14Khớp nối - chương 14
Khớp nối - chương 14
 
Colonias sanas y seguras. nuevo laredo se transforma contigo2011
Colonias sanas y seguras.  nuevo laredo se transforma contigo2011Colonias sanas y seguras.  nuevo laredo se transforma contigo2011
Colonias sanas y seguras. nuevo laredo se transforma contigo2011
 
処方箋データとNDBを用いた処方動向調査
処方箋データとNDBを用いた処方動向調査処方箋データとNDBを用いた処方動向調査
処方箋データとNDBを用いた処方動向調査
 
Catalogo de Acero Aisi 4340
Catalogo de Acero Aisi 4340Catalogo de Acero Aisi 4340
Catalogo de Acero Aisi 4340
 
Suc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCM
Suc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCMSuc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCM
Suc ben vat_lieu Phân hiệu Đại học Giao thông Vận tải HCM
 
Soldaduras de oleoductos
Soldaduras de oleoductosSoldaduras de oleoductos
Soldaduras de oleoductos
 
バリデーション研究の計画・報告・活用
バリデーション研究の計画・報告・活用バリデーション研究の計画・報告・活用
バリデーション研究の計画・報告・活用
 
医療データベース研究における バイアスと交絡への対処法
医療データベース研究におけるバイアスと交絡への対処法医療データベース研究におけるバイアスと交絡への対処法
医療データベース研究における バイアスと交絡への対処法
 
第2回DARM勉強会
第2回DARM勉強会第2回DARM勉強会
第2回DARM勉強会
 
Metal cutting basics min
Metal cutting basics minMetal cutting basics min
Metal cutting basics min
 
Tiempo procesamiento torno
Tiempo procesamiento tornoTiempo procesamiento torno
Tiempo procesamiento torno
 

Similar to Data science in chemical manufacturing

Automated well test analysis ii using ‘well test auto’
Automated well test analysis ii using ‘well test auto’Automated well test analysis ii using ‘well test auto’
Automated well test analysis ii using ‘well test auto’Alexander Decker
 
An effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataAn effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataeSAT Publishing House
 
Kelly zyngier oil&gasbookchapter_july2013
Kelly zyngier oil&gasbookchapter_july2013Kelly zyngier oil&gasbookchapter_july2013
Kelly zyngier oil&gasbookchapter_july2013Jeffrey Kelly
 
Unit-Operation Nonlinear Modeling for Planning and Scheduling Applications
Unit-Operation Nonlinear Modeling for Planning and Scheduling ApplicationsUnit-Operation Nonlinear Modeling for Planning and Scheduling Applications
Unit-Operation Nonlinear Modeling for Planning and Scheduling ApplicationsAlkis Vazacopoulos
 
Production_planning_of_a_furniture_manufacturing_c.pdf
Production_planning_of_a_furniture_manufacturing_c.pdfProduction_planning_of_a_furniture_manufacturing_c.pdf
Production_planning_of_a_furniture_manufacturing_c.pdfMehnazTabassum20
 
Supply chain design under uncertainty using sample average approximation and ...
Supply chain design under uncertainty using sample average approximation and ...Supply chain design under uncertainty using sample average approximation and ...
Supply chain design under uncertainty using sample average approximation and ...SSA KPI
 
Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27IJARIIE JOURNAL
 
Quality of service management
Quality of service managementQuality of service management
Quality of service managementselinasimpson2301
 
Stochastic behavior analysis of complex repairable industrial systems
Stochastic behavior analysis of complex repairable industrial systemsStochastic behavior analysis of complex repairable industrial systems
Stochastic behavior analysis of complex repairable industrial systemsISA Interchange
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithmsFarhan Zaki
 
Informing product design with analytical data
Informing product design with analytical dataInforming product design with analytical data
Informing product design with analytical dataTeam Consulting Ltd
 
Optimizing transformation for linearity between online
Optimizing transformation for linearity between onlineOptimizing transformation for linearity between online
Optimizing transformation for linearity between onlineAlexander Decker
 
Evaluation of matcont bifurcation w jason picardo
Evaluation of matcont bifurcation   w jason picardoEvaluation of matcont bifurcation   w jason picardo
Evaluation of matcont bifurcation w jason picardoFatima Muhammad Saleem
 
Ying hua, c. (2010): adopting co-evolution and constraint-satisfaction concep...
Ying hua, c. (2010): adopting co-evolution and constraint-satisfaction concep...Ying hua, c. (2010): adopting co-evolution and constraint-satisfaction concep...
Ying hua, c. (2010): adopting co-evolution and constraint-satisfaction concep...ArchiLab 7
 
LINEAR REGRESSION MODEL FOR KNOWLEDGE DISCOVERY IN ENGINEERING MATERIALS
LINEAR REGRESSION MODEL FOR KNOWLEDGE DISCOVERY IN ENGINEERING MATERIALSLINEAR REGRESSION MODEL FOR KNOWLEDGE DISCOVERY IN ENGINEERING MATERIALS
LINEAR REGRESSION MODEL FOR KNOWLEDGE DISCOVERY IN ENGINEERING MATERIALScscpconf
 
SIMULATION-BASED OPTIMIZATION USING SIMULATED ANNEALING FOR OPTIMAL EQUIPMENT...
SIMULATION-BASED OPTIMIZATION USING SIMULATED ANNEALING FOR OPTIMAL EQUIPMENT...SIMULATION-BASED OPTIMIZATION USING SIMULATED ANNEALING FOR OPTIMAL EQUIPMENT...
SIMULATION-BASED OPTIMIZATION USING SIMULATED ANNEALING FOR OPTIMAL EQUIPMENT...Sudhendu Rai
 
Quotes on quality management
Quotes on quality managementQuotes on quality management
Quotes on quality managementselinasimpson331
 

Similar to Data science in chemical manufacturing (20)

Automated well test analysis ii using ‘well test auto’
Automated well test analysis ii using ‘well test auto’Automated well test analysis ii using ‘well test auto’
Automated well test analysis ii using ‘well test auto’
 
An effective adaptive approach for joining data in data
An effective adaptive approach for joining data in dataAn effective adaptive approach for joining data in data
An effective adaptive approach for joining data in data
 
C054
C054C054
C054
 
Kelly zyngier oil&gasbookchapter_july2013
Kelly zyngier oil&gasbookchapter_july2013Kelly zyngier oil&gasbookchapter_july2013
Kelly zyngier oil&gasbookchapter_july2013
 
Unit-Operation Nonlinear Modeling for Planning and Scheduling Applications
Unit-Operation Nonlinear Modeling for Planning and Scheduling ApplicationsUnit-Operation Nonlinear Modeling for Planning and Scheduling Applications
Unit-Operation Nonlinear Modeling for Planning and Scheduling Applications
 
Production_planning_of_a_furniture_manufacturing_c.pdf
Production_planning_of_a_furniture_manufacturing_c.pdfProduction_planning_of_a_furniture_manufacturing_c.pdf
Production_planning_of_a_furniture_manufacturing_c.pdf
 
Supply chain design under uncertainty using sample average approximation and ...
Supply chain design under uncertainty using sample average approximation and ...Supply chain design under uncertainty using sample average approximation and ...
Supply chain design under uncertainty using sample average approximation and ...
 
Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27
 
Quality of service management
Quality of service managementQuality of service management
Quality of service management
 
Stochastic behavior analysis of complex repairable industrial systems
Stochastic behavior analysis of complex repairable industrial systemsStochastic behavior analysis of complex repairable industrial systems
Data science in chemical manufacturing

CEP, March 2016, p. 36, www.aiche.org/cep
Special Section: Big Data Analytics

Big data in the process industries has many of the characteristics represented by the four Vs: volume, variety, veracity, and velocity. However, process data can be distinguished from big data in other industries by the complexity of the questions we are trying to answer with process data. Not only do we want to find and interpret patterns in the data and use them for predictive purposes, but we also want to extract meaningful relationships that can be used to improve and optimize a process.

Process data are also often characterized by the presence of large numbers of variables from different sources, something that is generally much more difficult to handle than just large numbers of observations. Because of the multisource nature of process data, engineers conducting a process investigation must work closely with the IT department that provides the necessary infrastructure to put these data sets together in a contextually correct way.

This article presents several success stories from different industries where big data has been used to answer complex questions. Because most of these studies involve the use of latent variable (LV) methods such as principal component analysis (PCA) (1) and projection to latent structures (PLS) (2, 3), the article first provides a brief overview of those methods and explains the reasons such methods are particularly suitable for big data analysis.

Latent variable methods

Historical process data generally consist of measurements of many highly correlated variables (often hundreds to thousands), but the true statistical rank of the process, i.e., the number of underlying significant dimensions in which the process is actually moving, is often very small (about two to ten). This situation arises because only a few dominant events are driving the process under normal operations (e.g., raw material variations, environmental effects).
In addition, more sophisticated online analyzers such as spectrometers and imaging systems are being used to generate large numbers of highly correlated measurements on each sample, which also require lower-rank models.

Latent variable methods are uniquely suited for the analysis and interpretation of such data because they are based on the critical assumption that the data sets are of low statistical rank. They provide low-dimension latent variable models that capture the lower-rank spaces of the process variable (X) and the response (Y) data without over-fitting the data. This low-dimensional space is defined by a small number of statistically significant latent variables (t1, t2, …), which are linear combinations of the measured variables. Such variables can be used to construct simple score and loading plots, which provide a way to visualize and interpret the data.

Big data holds much potential for optimizing and improving processes. See how it has already been used in a range of industries, from pharmaceuticals to pulp and paper.

Salvador García Muñoz, Eli Lilly and Co.
John F. MacGregor, ProSensus, Inc.

BIG DATA: Success Stories in the Process Industries

Copyright © 2016 American Institute of Chemical Engineers (AIChE)
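The low-rank idea described in the section on latent variable methods can be illustrated with a minimal sketch. It is not the authors' software: the data are synthetic, the sensitivities in `coeffs` are hypothetical, and the helper names `mean_center` and `first_component` are invented for this example. Four measured variables are driven by a single hidden event plus small noise, and a power iteration recovers the first loading vector and the scores t1, which are linear combinations of the measured variables.

```python
import math
import random


def mean_center(X):
    """Subtract each column mean so the component describes variation only."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def first_component(X, iters=100):
    """Return the leading loading vector p (unit length) and scores t = X p.

    Uses power iteration on X'X, written out with plain lists.
    """
    n, m = len(X), len(X[0])
    p = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        t = [sum(x * w for x, w in zip(row, p)) for row in X]       # t = X p
        p = [sum(t[i] * X[i][j] for i in range(n)) for j in range(m)]  # p = X' t
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    t = [sum(x * w for x, w in zip(row, p)) for row in X]
    return p, t


# Synthetic data: one hidden driver moves four correlated measurements.
random.seed(0)
coeffs = [1.0, -0.8, 0.5, 1.2]  # hypothetical sensitivities to the driver
X_raw = []
for _ in range(200):
    driver = random.gauss(0.0, 1.0)
    X_raw.append([c * driver + random.gauss(0.0, 0.05) for c in coeffs])

X = mean_center(X_raw)
p, t = first_component(X)

# Fraction of the total sum of squares captured by the first latent variable.
explained = sum(v * v for v in t) / sum(v * v for row in X for v in row)

print("loadings:", [round(v, 2) for v in p])
print("fraction of variation captured by t1:", round(explained, 3))
```

Because only one event drives the synthetic process, a single latent variable captures nearly all of the variation even though four variables were measured. In practice a library PCA or PLS routine would be used instead of a hand-written power iteration.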
The scores can be thought of as scaled weighted averages of the original variables, using the loadings as the weights for calculating the weighted averages. A score plot is a graph of the data in the latent variable space. The loadings are the coefficients that reveal the groups of original variables that belong to the same latent variable, with one loading vector (W*) for each latent variable. A loading plot provides a graphical representation of the clustering of variables, revealing the identified correlations among them.

The uniqueness of latent variable models is that they simultaneously model the low-dimensional X and Y spaces, whereas classical regression methods assume that there is independent variation in all X and Y variables (which is referred to as full rank). Latent variable models show the relationships between combinations of variables and changes in operating conditions, thereby allowing us to gain insight and optimize processes based on such historical data.

The remainder of the article presents several industrial applications of big data for:
      • the analysis and interpretation of historical data and troubleshooting process problems
      • optimizing processes and product performance
      • monitoring and controlling processes
      • integrating data from multivariate online analyzers and imaging sensors.

Learning from process data

A data set containing about 200,000 measurements was collected from a batch process for drying an agrochemical material, the final step in the manufacturing process. The unit is used to evaporate and collect the solvent contained in the initial charge and to dry the product to a target residual solvent level. The objective was to determine the operating conditions responsible for the overall low yields when off-specification product is rejected.
The problem is highly complex because it requires the analysis of 11 initial raw material conditions, 10 time trajectories of process variables (trends in the evolution of process variables), and the impact of the process variables on 11 physical properties of the final product. The available data were arranged in three blocks:
      • the time trajectories measured through the batch, which were characterized by milestone events (e.g., slope, total time for stage of operation), comprised Block X
      • Block Z contained measurements of the chemistry of the incoming materials
      • Block Y consisted of the 11 physical properties of the final product.

A multiblock PLS (MBPLS) model was fitted to the three data blocks. The results were used to construct score plots (Figure 1), which show the batch-to-batch product quality variation, and the companion loading plots (Figure 2), which show the regressor variables (in X and Z) that were most highly correlated with such variability.

Contrary to the initial hypothesis that the chemistry variables (Z) were responsible for the off-spec product, the analysis isolated the time-varying process variables as a plausible cause for the product quality differences (Figure 1, red) (4). This was determined by observing the direction in which the product quality changes (arrow in Figure 1) and identifying the variables that line up in this direction of change (Figure 2). Variables z1–z11 line up in a direction that is close to perpendicular to the direction of quality change.

Figure 1. A score plot of two latent variables shows lots clustered by product quality. Source: (4). [Plot not reproduced. Axes: t1, t2. Legend: On-Spec (High Residual Solvent), On-Spec, Off-Spec.]

Figure 2. A companion loading plot reveals the process parameters that were aligned with the direction of change in the score plot. Source: (4). [Plot not reproduced. Axes: W*[1], W*[2]. Variables plotted: Z1–Z11, Level1, Temp1, Temp2, Time1–Time4, TempSlope, Weight Wet Cake.]
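The reasoning used with Figures 1 and 2 (find the direction of quality change in the score plot, then see which variables' loadings line up with it) can be sketched numerically. This is an illustration only: the scores, loadings, and variable names below are hypothetical and are not read from the figures, and the cosine-based `alignment` measure is an assumption of this sketch rather than the procedure reported in (4).

```python
import math


def centroid(points):
    """Mean position of a group of lots in the (t1, t2) score plane."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def cosine(u, v):
    """Cosine of the angle between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))


# Hypothetical scores (t1, t2) for on-spec and off-spec lots.
on_spec = [(-2.0, -1.0), (-1.5, -0.5), (-2.5, -1.5)]
off_spec = [(2.0, 1.0), (1.5, 0.5), (2.5, 1.5)]

# Direction of quality change: from the on-spec cluster to the off-spec cluster.
c_on, c_off = centroid(on_spec), centroid(off_spec)
direction = (c_off[0] - c_on[0], c_off[1] - c_on[1])

# Hypothetical loadings (W*[1], W*[2]) for a few regressor variables.
loadings = {
    "process_var_A": (0.6, 0.3),
    "process_var_B": (-0.4, -0.2),
    "chem_var_C": (-0.2, 0.4),
    "chem_var_D": (0.1, -0.25),
}

# |cosine| near 1: variable lines up with the quality change.
# |cosine| near 0: variable is close to perpendicular to it.
alignment = {name: abs(cosine(w, direction)) for name, w in loadings.items()}
ranked = sorted(alignment, key=alignment.get, reverse=True)

for name in ranked:
    print(f"{name}: {alignment[name]:.2f}")
```

In this made-up example the two process variables align with the direction of change while the chemistry variables are nearly perpendicular to it, which mirrors the kind of conclusion drawn in the case study.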
Optimizing process operations

The manufacture of formulated products (such as pharmaceutical tablets) generates a complex data set that extends beyond process conditions to also include information about the raw materials used in the manufacture of each lot of final product, and the physical properties of the raw materials.

This case study can be represented by multiple blocks of data: the final quality of the product of interest (Y), the weighted average for the physical properties of the raw materials used in each lot (RXI), and the process and environmental conditions at which each lot was manufactured (Z). These blocks of data were used to build an MBPLS model that was later embedded within a mixed-integer nonlinear programming (MINLP) optimization framework. The physical properties of the lots of material available in inventory are represented by data block XA and the properties of the lots of material used to manufacture the final product are represented by data block X.

The objective for the optimization routine was to determine the materials available in inventory that should be combined and the ratios (r) of those that should be blended to obtain the best next lot of finished product. The square of the difference between the predicted and the target quality of the product was used to choose the lots and blending ratios.

The underlying calculations reduce the problem to the score space, where the differences in quality (in this case, tablet dissolution) correspond to different locations on the score plot (Figure 3). The MINLP optimization routine identified the candidate materials available in inventory that should be blended together to make the final product so that the score for the next lot lands in the score space corresponding to the desired quality (i.e., target dissolution).
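The best-next-lot objective described above (choose lots and blending ratios r that minimize the squared difference between predicted and target quality) can be sketched as follows. Several simplifications are assumed: a hypothetical one-variable linear model stands in for the MBPLS model, a brute-force search over pairs of lots and a coarse grid of ratios stands in for the MINLP solver, and every number and name (`inventory`, `predict_quality`, `best_next_lot`) is invented for illustration.

```python
from itertools import combinations

# Hypothetical inventory: one physical property value per available lot.
inventory = {"lot_A": 10.0, "lot_B": 14.0, "lot_C": 20.0, "lot_D": 26.0}


def predict_quality(blend_property):
    """Hypothetical linear stand-in for the latent variable model."""
    return 5.0 + 1.5 * blend_property


def best_next_lot(inventory, target, steps=20):
    """Search pairs of lots and blending ratios for the smallest squared error.

    Blending at most two lots mimics an operational constraint on the
    maximum number of lots per blend. Returns (squared_error, ratios).
    """
    best = None
    for a, b in combinations(sorted(inventory), 2):
        for k in range(steps + 1):
            r = k / steps  # fraction of lot a; the remainder is lot b
            blend = r * inventory[a] + (1.0 - r) * inventory[b]
            error = (predict_quality(blend) - target) ** 2
            if best is None or error < best[0]:
                best = (error, {a: r, b: 1.0 - r})
    return best


target_quality = 30.0  # hypothetical target, e.g., percent dissolution
error, ratios = best_next_lot(inventory, target_quality)

blend_value = sum(ratios[name] * inventory[name] for name in ratios)
print("chosen blend:", ratios)
print("predicted quality:", round(predict_quality(blend_value), 2))
```

A best-next-campaign variant would optimize several lots at once and track how much of each material remains, so that the most desirable lots are not consumed first.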
Implementing this optimization routine in real time significantly improved the quality of the product produced in this manufacturing process (Figure 4).

Figure 3. The dissolution speed of a pharmaceutical tablet is identified on a score plot of the latent variables. Source: (5). [Plot not reproduced. Axes: t1, t3. Regions: Slow Dissolution, Fast Dissolution, Target Dissolution.]

Figure 4. A control chart of the degree of dissolution of a pharmaceutical tablet reveals the onset of quality problems. Quality problems are reduced by the implementation of a best-next-lot solution, then eliminated by the best-next-campaign approach. Source: (6). [Plot not reproduced. Axes: Dissolution (%) vs. Lots of Finished Goods. Reference lines: USL, LSL, Target. Periods: Historical Data, Quality Problems, Best-Next-Lot Approach, Best-Next-Campaign Approach.]

Selecting the materials from inventory to be used in manufacturing a product is not as simple as choosing those that will produce the best lot of product. If you choose materials aiming to produce the best next lot, you will inevitably consume the best materials very fast; this may be acceptable for a low-volume product. For high-volume products, however, using this same calculation will lead to an undesired situation where the best materials have been depleted and the less-desirable raw materials are left.

In this case, it is better to perform the optimization routine for the best next campaign (a series of lots), which will account for the fact that more than one acceptable lot of product is being manufactured. The optimization calculation in this latter case will then balance the use of inventory and enable better management of desirable vs. less-desirable raw materials for the entire campaign of manufactured product. The MINLP objective function must be tailored to the material management needs for the given product so that it adequately considers operational constraints, such as the maximum number of lots of the same material to blend (5, 6).

Monitoring processes

Perhaps the most well-known application of principal component analysis in the chemical process industries (CPI) is its use as a monitoring tool, enabling true multivariate statistical process control (MSPC) (7, 8). In this example, a PCA model was used to describe the normal variability in the operation of a closed-loop spray drying system in a pharmaceutical manufacturing process (9). The system (Figure 5) includes measurements of 16 process variables, which can be projected by a PCA model into two principal components (t1 and t2), each of which describes a different source of variability in the process. A score plot that updates in real time can then be used as a graphical tool to determine when the process is exhibiting abnormal behavior. This is illustrated in Figure 6, where the red dots indicate the current state of the process, which is clearly outside of the normal operating conditions (gray markers).

It is important to emphasize that this model could be used to effectively monitor product quality without the need to add online sensors to measure product properties. Building an effective monitoring system requires a good data set that is representative of the normal operating conditions of the process.

Figure 5. A closed-loop spray drying system in a pharmaceutical manufacturing facility is being monitored by the measurement of 16 variables that a PCA model projects into two principal components. Source: (9). [Diagram not reproduced. Labeled equipment: Feed Pump, Drying Chamber, Cyclone, Baghouse, Process Heater, HEPA Filters, Thermal Mass Flow Sensor, Supply Fan, Condenser, Exhaust Fan, Exhaust Pressure Transducer. Notes: data logging of product collection weight; exhaust pressure controlled by exhaust fan speed; drying gas flowrate controlled by supply fan speed.]

Figure 6. A score plot of the two principal components describing the closed-loop spray drying system (Figure 5) shows that the process is operating under abnormal conditions. Source: (9). [Plot not reproduced. Axes: t1, t2. Legend: Abnormal Operating Conditions, Normal Operating Conditions.]

Control of batch processes

Multivariate PLS models built from process data that relate the initial conditions of the batch (Z), the time-varying process trajectories (X), and the final quality attributes (Y) (10) provide an effective way to control product quality and productivity of batch processes. Those models can be used online to collect evolving data of any new batch (first the initial data in Z and then the evolving data in X), which are then used to update the predictions of final product quality (Y) at every time interval during the batch process. At certain critical decision points (usually each batch has one or two), a multivariate optimization routine is run to identify control actions that will drive the final quality into a desired target region and maximize productivity while respecting all operating constraints (11–13).

Figure 7 displays one quality attribute of a high-value food product before and after this advanced process control method was implemented over many thousands of batches. The process control method reduced the root-mean-square deviation from the target for all final product quality attributes by 50–70% and increased batch productivity by 20%.

Figure 7. Advanced control eliminated the variation in the final product quality attribute of a food product. Source: (9). [Plot not reproduced. Axis labels: Deviation from Target, Final Product Quality Attribute. Series: With Control, No Control.]

Analyzing information from advanced analyzers and imaging sensors

The use of more-sophisticated online analyzers (e.g., online spectrometers) and image-based sensors for online process monitoring is becoming more prevalent in the CPI. With that comes the need for more powerful methods to handle and extract information from the large and diverse data blocks acquired from such sophisticated online monitors. Latent variable methods provide an effective approach (14).

Consider a soft sensor (i.e., virtual sensor software that processes several measurements together) application for predicting the quality of product exiting a lime kiln at a pulp and paper mill. Real-time measurements on many process variables were combined with images from a color camera capturing the combustion region of the kiln. The information extracted from the combustion zone images and data from the process data blocks were combined using the online multivariate model to assess combustion stability and make 2-hr-ahead predictions of the exit lime quality.

Concluding remarks

Contextually correct historical data is a critical asset that a corporation can take advantage of to expedite assertive decisions (3). A potential pitfall in the analysis of big data is assuming that the data will contain information just because there is an abundance of data. Data contain information if they are organized in a contextually correct manner; the practitioner should not underestimate the effort and investment necessary to organize data such that information can be extracted from them. Multivariate latent variable methods are effective tools for extracting information from big data.
These methods reduce the size and complexity of the problem to simple and manageable diagnostics and plots that are accessible to all consumers of the information, from the process designers and line engineers to the operations personnel.

Literature Cited

1. Jackson, E., “A User’s Guide to Principal Components,” 1st ed., John Wiley and Sons, Hoboken, NJ (1991).
2. Höskuldsson, A., “PLS Regression Methods,” Journal of Chemometrics, 2 (3), pp. 211–228 (June 1988).
3. Wold, S., et al., “PLS: Partial Least-Squares Projection to Latent Structures,” in Kubinyi, H., ed., “3D-QSAR in Drug Design,” ESCOM Science Publishers, Leiden, The Netherlands, pp. 523–550 (1993).
4. García Muñoz, S., et al., “Troubleshooting of an Industrial Batch Process Using Multivariate Methods,” Industrial and Engineering Chemistry Research, 42 (15), pp. 3592–3601 (2003).
5. García Muñoz, S., and J. A. Mercado, “Optimal Selection of Raw Materials for Pharmaceutical Drug Product Design and Manufacture Using Mixed Integer Non-Linear Programming and Multivariate Latent Variable Regression Models,” Industrial and Engineering Chemistry Research, 52 (17), pp. 5934–5942 (2013).
6. García Muñoz, S., et al., “A Computer Aided Optimal Inventory Selection System for Continuous Quality Improvement in Drug Product Manufacture,” Computers and Chemical Engineering, 60, pp. 396–402 (Jan. 10, 2014).
7. MacGregor, J. F., and T. Kourti, “Statistical Process Control of Multivariable Processes,” Control Engineering Practice, 3 (3), pp. 403–414 (1995).
8. Kourti, T., and J. F. MacGregor, “Recent Developments in Multivariate SPC Methods for Monitoring and Diagnosing Process and Product Performance,” Journal of Quality Technology, 28 (4), pp. 409–428 (1996).
9. García Muñoz, S., and D. Settell, “Application of Multivariate Latent Variable Modeling to Pilot-Scale Spray Drying Monitoring and Fault Detection: Monitoring with Fundamental Knowledge,” Computers and Chemical Engineering, 33 (12), pp. 2106–2110 (2009).
10. Kourti, T., et al., “Analysis, Monitoring and Fault Diagnosis of Batch Processes Using Multiblock and Multiway PLS,” Journal of Process Control, 5, pp. 277–284 (1995).
11. Yabuki, Y., and J. F. MacGregor, “Product Quality Control in Semibatch Reactors Using Midcourse Correction Policies,” Industrial and Engineering Chemistry Research, 36, pp. 1268–1275 (1997).
12. Yabuki, Y., et al., “An Industrial Experience with Product Quality Control in Semi-Batch Processes,” Computers and Chemical Engineering, 24, pp. 585–590 (2000).
13. Flores-Cerrillo, J., and J. F. MacGregor, “Within-Batch and Batch-to-Batch Inferential Adaptive Control of Semi-Batch Reactors,” Industrial and Engineering Chemistry Research, 42, pp. 3334–3345 (2003).
14. Yu, H., et al., “Digital Imaging for Online Monitoring and Control of Industrial Snack Food Processes,” Industrial and Engineering Chemistry Research, 42 (13), pp. 3036–3044 (2003).