Daamen r 2010scwr-cpaper
1. Development of Inferential Sensors for Real-time Quality Control of Water-Level Data for the Everglades Depth Estimation Network
Ruby Daamen, Advanced Data Mining Int’l
Edwin Roehl, Advanced Data Mining Int’l
Paul Conrads, USGS – SC Water Science Center
Matthew Petkewich, USGS – SC Water Science Center
2. Presentation Outline
• What is an “Inferential Sensor” (IS)?
• Background - Industrial application
• Everglades Depth Estimation Network (EDEN)
• Automated Data Assurance and Management (ADAM) - Inferential Sensor development for EDEN
3. Development of IS in Industry
• A tough environment to monitor
– Emissions regulations require measurements of
effluent gases
– Smoke stack burns up probes
– Need alternative to “hard” sensors
4. Hard Sensor vs. Inferential Sensor
• Virtual sensor replaces actual sensor
– Temporarily gage the smoke stack
– Operate plant to cover range of emissions
– Develop model of emissions based on
operations
– Model becomes the “Inferential Sensor”
5. Inferential Sensors for Real-Time Data
• Problem – Need to minimize missing and erroneous data
• Use an approach similar to that taken by industry to predict real-time data, i.e., “Inferential Sensors”
– QA/QC hard sensor
– Provide accurate estimates for hard sensor
– Provide redundant signal
6. EDEN Problem
• EDEN – integrated network of real-time water-level (WL) stations, ground-elevation models, and water-surface models
– Real-time data are used to generate EDEN WL surfaces used by scientists, engineers, and water-resource managers (253+ stations)
– Data are used to guide large-scale field operations, integrate hydrologic and ecological responses, and support assessments that measure ecosystem responses to the Comprehensive Everglades Restoration Plan (CERP)
– Correcting errors is often time-consuming; gaging stations may be in remote areas
– Need to identify errors and to provide estimates on a daily basis
8. EDEN / ADAM Overview
[Figure: EDEN / ADAM overview diagram, highlighting the Inferential Sensor application]
9. Some of the Challenges
• 253+ gaging stations
• The inferential sensor uses data from one or more correlated gages. At any given datetime, it is not known which stations will have available data
• Stations added/removed over time
10. Implementation
• Create models “on the fly”
– Use “best” available stations
– Simplifies addition / removal of stations
• Needs to consider
– Assessment of data – prevent use of suspect
data in models
– Model inputs need to be decorrelated
11. Methods
• Two algorithms sequenced to analyze data
– Univariate filtering
• Provides information about the quality and behavior of each station's WL
• A Statistical Process Control (SPC) scheme consisting of 14 univariate filters, set uniquely for each station
– Estimate parameter value
• Select a “pool” of candidate gaging stations using
matrix of Pearson coefficients
• Principal component analysis (PCA) – calculates
decorrelated inputs
• Multivariate linear regression
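The estimation steps listed above (Pearson-based candidate pool, PCA, multivariate linear regression) can be sketched as follows. This is a minimal illustration, not the ADAM implementation: it assumes NumPy, gap-free and time-aligned station histories, and hypothetical function names (`select_candidate_pool`, `fit_is_model`, `predict`). The defaults `pool_size=5` and `n_pcs=2` follow the "up to 5 gaging stations" and "rarely more than 2 principal components" figures given later in the deck.

```python
import numpy as np


def select_candidate_pool(target, candidates, pool_size=5):
    """Rank candidate stations by |Pearson r| with the target station.

    target: 1-D array of the target station's WL history (assumed gap-free).
    candidates: dict of station name -> 1-D array aligned with target.
    Returns the names of the pool_size most highly correlated stations.
    """
    scores = {name: abs(np.corrcoef(target, series)[0, 1])
              for name, series in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:pool_size]


def fit_is_model(target, candidates, pool, n_pcs=2):
    """Fit a PCA + multivariate linear regression estimator for the target."""
    X = np.column_stack([candidates[name] for name in pool])
    mean = X.mean(axis=0)
    B = X - mean                                   # mean-subtracted inputs
    eigvals, eigvecs = np.linalg.eigh(np.cov(B, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    E = eigvecs[:, order[:n_pcs]]                  # retained eigenvectors
    pcs = B @ E                                    # decorrelated inputs
    A = np.column_stack([np.ones(len(pcs)), pcs])  # intercept + PCs
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return mean, E, coef


def predict(model, x_row):
    """Estimate the target WL from one row of pool-station readings."""
    mean, E, coef = model
    pcs = (np.asarray(x_row, dtype=float) - mean) @ E
    return coef[0] + pcs @ coef[1:]
```

Because the model is refit from whichever stations currently have good data, adding or removing a station only changes the `candidates` dictionary passed in.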
12. Univariate Filtering
Univariate filter check | Description | Precedence | Water-level limit (ft)
LOST_SIGNAL | no signal | 1 | NA
GT_RNG_UL | x(t) > signal range Upper Range Limit | 2 | 15.19
LT_RNG_LL | x(t) < signal range Lower Range Limit | 3 | 6.99
GT_UCL | x(t) > signal Upper Control Limit | 4 | 14.73
LT_LCL | x(t) < signal Lower Control Limit | 5 | 8.56
Sn_LT_L | flatlined: x'(t) = x(t) - x(t-1); SUM[|x'(t)|, …, |x'(t-n+1)|] < Limit | 6 | 0.00
D1_GT_L_1 | very fast, very large increase: x(t) - x(t-1) > Limit | 7 | 1.92
D1_LT_L_1 | very fast, very large decrease: x(t) - x(t-1) < Limit | 8 | -2.34
D1Sn_GT_L_1 | fast, very large increase: x'(t) = x(t) - x(t-1); SUM[x'(t), …, x'(t-n+1)] > Limit | 9 | 1.98
D1Sn_LT_L_1 | fast, very large decrease: x'(t) = x(t) - x(t-1); SUM[x'(t), …, x'(t-n+1)] < Limit | 10 | -2.52
D1_GT_L_2 | very fast, large increase: x(t) - x(t-1) > Limit | 11 | 1.69
D1_LT_L_2 | very fast, large decrease: x(t) - x(t-1) < Limit | 12 | -0.25
D1Sn_GT_L_2 | fast, large increase: x'(t) = x(t) - x(t-1); SUM[x'(t), …, x'(t-n+1)] > Limit | 13 | 1.98
D1Sn_LT_L_2 | fast, large decrease: x'(t) = x(t) - x(t-1); SUM[x'(t), …, x'(t-n+1)] < Limit | 14 | -0.27
• Additional filters
– Dry Protocol – set using offset from ground elevation
• Any filter trips are flagged for review
• Data triggering a filter is not used in any
predictions
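The filter table can be read as a list of checks evaluated in precedence order. The sketch below is a hypothetical illustration, not ADAM code: the limits are the example values from the table for a single station, the window length `n=4` is an assumed value, missing readings are represented by `None`, and the function returns every tripped filter (sorted by precedence) so all trips can be flagged for review. The flatline check uses `<=` instead of `<` as an assumption, because a sum of absolute differences can never be strictly below a 0.00 limit. The Dry Protocol filter is not included.

```python
# Example limits for one station, taken from the table above (ft).
# In practice each station gets its own limits.
LIMITS = {
    "GT_RNG_UL": 15.19, "LT_RNG_LL": 6.99,
    "GT_UCL": 14.73, "LT_LCL": 8.56,
    "Sn_LT_L": 0.00,
    "D1_GT_L_1": 1.92, "D1_LT_L_1": -2.34,
    "D1Sn_GT_L_1": 1.98, "D1Sn_LT_L_1": -2.52,
    "D1_GT_L_2": 1.69, "D1_LT_L_2": -0.25,
    "D1Sn_GT_L_2": 1.98, "D1Sn_LT_L_2": -0.27,
}


def tripped_filters(x, lim=LIMITS, n=4):
    """Return the filters tripped by the latest reading x[-1], in precedence order.

    x: list of WL readings, oldest first; None marks a missing reading.
    n: number of first differences summed by the Sn filters (hypothetical value).
    """
    xt = x[-1]
    if xt is None:
        return ["LOST_SIGNAL"]  # precedence 1

    # First difference x'(t) = x(t) - x(t-1), if the previous reading exists.
    d1 = xt - x[-2] if len(x) >= 2 and x[-2] is not None else None

    # Sums over the last n first differences (needs n + 1 gap-free readings).
    window = x[-(n + 1):]
    if len(window) == n + 1 and None not in window:
        diffs = [b - a for a, b in zip(window, window[1:])]
        s_abs, s = sum(abs(v) for v in diffs), sum(diffs)
    else:
        s_abs = s = None

    def gt(v, key):
        return v is not None and v > lim[key]

    def lt(v, key):
        return v is not None and v < lim[key]

    checks = [
        ("GT_RNG_UL", gt(xt, "GT_RNG_UL")),
        ("LT_RNG_LL", lt(xt, "LT_RNG_LL")),
        ("GT_UCL", gt(xt, "GT_UCL")),
        ("LT_LCL", lt(xt, "LT_LCL")),
        # Assumption: <= rather than <, since the sum cannot be below 0.00.
        ("Sn_LT_L", s_abs is not None and s_abs <= lim["Sn_LT_L"]),
        ("D1_GT_L_1", gt(d1, "D1_GT_L_1")),
        ("D1_LT_L_1", lt(d1, "D1_LT_L_1")),
        ("D1Sn_GT_L_1", gt(s, "D1Sn_GT_L_1")),
        ("D1Sn_LT_L_1", lt(s, "D1Sn_LT_L_1")),
        ("D1_GT_L_2", gt(d1, "D1_GT_L_2")),
        ("D1_LT_L_2", lt(d1, "D1_LT_L_2")),
        ("D1Sn_GT_L_2", gt(s, "D1Sn_GT_L_2")),
        ("D1Sn_LT_L_2", lt(s, "D1Sn_LT_L_2")),
    ]
    return [name for name, hit in checks if hit]
```

Any reading with a non-empty result would be excluded from the inputs used for predictions.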
13. Synthesize WL Measurements
• WL at candidate stations are correlated – no
surprise
• The first approach examined selected the most highly correlated station as a “standard” signal and then attempted to decorrelate the other stations by computing their differences from the standard.
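The first (difference-from-standard) approach amounts to the following. This is a minimal sketch assuming NumPy, gap-free aligned series, and a hypothetical function name `differences_from_standard`; it only shows how the standard is chosen and the differences formed, and makes no claim about how well the differences decorrelate.

```python
import numpy as np


def differences_from_standard(target, candidates):
    """Pick the candidate most correlated with the target as the "standard"
    signal and express every other candidate as a difference from it.

    target: 1-D array of the target station's WL history.
    candidates: dict of station name -> 1-D array aligned with target.
    Returns (standard station name, dict of name -> difference series).
    """
    r = {name: abs(np.corrcoef(target, s)[0, 1]) for name, s in candidates.items()}
    standard = max(r, key=r.get)
    diffs = {name: s - candidates[standard]
             for name, s in candidates.items() if name != standard}
    return standard, diffs
```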
14. Principal Component Analysis (PCA)
• PCA is a statistical technique used to
“reduce the dimensionality of a data set
consisting of a large number of interrelated
variables, while retaining as much as
possible of the variation present in the data
set. This is achieved by transforming to a
new set of variables, the principal
components (PCs), which are uncorrelated,
and which are ordered so that the first few
retain most of the variation present in all of
the original variables” (Jolliffe)
15. PCA – Main Points
• PCA - the main points
– Principal components are uncorrelated
– Transforms a set of correlated variables into a
smaller number of uncorrelated variables
– The first principal component (PC) accounts for as much of the variability in the data as possible
16. PCA - Analysis
• PCA – a brief description of the analysis:
– Calculate the eigenvectors and eigenvalues of
the covariance matrix
• Create a data set of n inputs with no gaps
• Subtract the mean from each of the n inputs
• Calculate the covariance matrix (square n × n matrix)
• Calculate the eigenvectors
– Sort by eigenvalues (highest to lowest)
• Eigenvector with the largest eigenvalue = 1st principal component
• Use the eigenvalues to determine how many PCs to include
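The analysis steps above map onto a few NumPy calls. This is a sketch, not the ADAM code: it assumes NumPy, uses NaN to mark gaps, uses `numpy.linalg.eigh` (appropriate for the symmetric covariance matrix) in place of whatever eigen-solver ADAM uses, and picks the number of PCs with a hypothetical cumulative-variance cutoff `var_fraction=0.95`, which is one common way to "use eigenvalues to determine how many PCs to include".

```python
import numpy as np


def pca(X, var_fraction=0.95):
    """PCA following the steps above.

    X: (rows, n) array of n station inputs; rows containing NaN are dropped.
    var_fraction: hypothetical share of total variance the kept PCs must explain.
    Returns (principal components, kept eigenvectors, all eigenvalues sorted
    from highest to lowest).
    """
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]        # data set with no gaps
    B = X - X.mean(axis=0)                 # subtract the mean of each input
    C = np.cov(B, rowvar=False)            # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]      # sort highest to lowest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, var_fraction)) + 1, len(eigvals))
    E = eigvecs[:, :k]                     # eigenvectors of the kept PCs
    return B @ E, E, eigvals
```

With two nearly identical gage records, the first eigenvalue dominates and a single PC is kept.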
17. PCA – A 2-Dimension Example
[Figure panels: Original Data (Mean Subtracted); Eigenvectors; Principal Components; Eigenvectors Plotted on Data (scatter of the Normalized Data, both axes -4 to 4, with eigenvectors E1 and E2 overlaid)]
18. ADAM - Functionality
• Setup
– File paths
– PCA setup – period, number of sites to include
– Add / Edit / Remove sites
– Univariate filters
• Inferential Sensor – option to run daily (hourly and 15-minute data), quarterly, or annual (hourly) analyses of the daily files
• Review results
• Output the daily median files required for the EDEN water-surface map
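The daily median output can be sketched with the standard library alone. This assumes the readings passed in have already been QA/QC'd (flagged values replaced by estimates or set to `None`); the function name `daily_medians` is hypothetical, and writing the actual EDEN file format is not shown.

```python
from collections import defaultdict
from statistics import median


def daily_medians(readings):
    """Compute the median WL for each calendar day.

    readings: iterable of (datetime, water_level) pairs, e.g. hourly or
    15-minute data; None water levels are skipped.
    Returns a dict of date -> median WL, ordered by date.
    """
    by_day = defaultdict(list)
    for ts, wl in readings:
        if wl is not None:
            by_day[ts.date()].append(wl)
    return {day: median(vals) for day, vals in sorted(by_day.items())}
```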
19. ADAM – Control Worksheet
Select Daily, Quarterly, or Annual Run Analysis
Resume, redo, or continue from last analyzed
Fill Setup
Remove, add, or edit sites included in ADAM
Set pathnames for files used by ADAM
20. ADAM – Control Worksheet
Loads data from selected run for review
Creates output files required to generate EDEN water-surface map
Dumps a listing of sites tripping any filters
27. PCA – The methodology
• From our favorite source, Wikipedia, PCA is:
– A mathematical procedure that transforms a set of correlated variables into a smaller number of
uncorrelated variables. The first principal component accounts for as much of the variability in
the data as possible and each succeeding component accounts for as much of the remaining
variability as possible.
• How to do it: In the broadest sense, it is an eigenvector-based multivariate analysis. The methodology used is:
– Assemble the data
• In EDENIS, the data will be water-level data from up to 5 gaging stations. For 90 days of hourly data, this equates to up to 2160 vectors (8640 for 15-minute data)
– Remove any vectors that contain any missing data (1)
• Resulting matrix X[n,m] where n = number of fully populated vectors; m = number of gages included
– Subtract the mean from each of the data dimensions (m); let's call this the normalized matrix B (2)
• B[n,m] stores mean-subtracted data
– Calculate the Covariance matrix (3)
• Covariance matrix is a square matrix with dimension mXm: C[m,m]
– Calculate the eigenvalues and eigenvectors of the covariance matrix
• This is an iterative process. If you want to look up more on this, I used the Jacobi eigenvalue algorithm. Some important properties of eigenvectors:
– Can only be found for square matrices. A real symmetric m × m matrix, such as the covariance matrix, always has m eigenvectors
– The eigenvectors of a symmetric matrix can always be chosen to be mutually orthogonal. This is important: when the data are expressed in terms of these eigenvectors, the resulting principal components are uncorrelated
• For a refresher course on eigenvalues / eigenvectors, here's a link:
http://en.wikipedia.org/wiki/Eigenvalue,_eigenvector_and_eigenspace
– Sort the results: Largest eigenvalue to smallest eigenvalue. (4)
– Decide how many components (k) to keep and calculate the new data set to be used. This is a simple matrix multiplication B × E, where E is the m × k matrix of retained eigenvectors (the full m × m eigenvector matrix if all are kept). (5)
• For the 10-12 sites I looked at when presenting the PCA results, I would expect we’ll rarely, if ever, use more than 2 principal components out of a possible 5 to make the regression predictions.
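Step (4) above mentions the Jacobi eigenvalue algorithm. The sketch below is a textbook cyclic Jacobi method for a real symmetric matrix, written with NumPy for illustration; it is not the ADAM implementation, and the tolerance and sweep limit (`tol`, `max_sweeps`) are assumed values. Each plane rotation zeroes one off-diagonal entry, and repeating sweeps drives the matrix toward diagonal form, leaving the eigenvalues on the diagonal and the accumulated rotations as eigenvector columns.

```python
import numpy as np


def jacobi_eigen(C, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi eigenvalue algorithm for a real symmetric matrix C.

    Returns (eigenvalues, eigenvectors as columns), sorted from largest
    to smallest eigenvalue, matching step (4).
    """
    A = np.array(C, dtype=float)
    m = A.shape[0]
    V = np.eye(m)                        # accumulated rotations
    for _ in range(max_sweeps):
        off = np.sqrt(np.sum(np.tril(A, -1) ** 2))
        if off < tol:                    # off-diagonal part is negligible
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(A[p, q]) < tol:
                    continue
                # Rotation angle chosen so that the rotated A[p, q] is zero.
                theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(m)
                J[p, p] = c
                J[q, q] = c
                J[p, q] = s
                J[q, p] = -s
                A = J.T @ A @ J
                V = V @ J
    eigvals = np.diag(A)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], V[:, order]
```

Building a full rotation matrix per step keeps the sketch short; a production version would update only rows and columns p and q.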
28. PCA – A 2-Dimension Example
Panels: 1. Original Data; 2. Mean Subtracted; 3. Covariance Matrix; 4. Eigenvalues / Eigenvectors; 5. New Data Set
Note the high correlation of the original X1, X2 vs. no correlation of PC1, PC2.
E1 and E2 are the 2 eigenvectors laid on top of the data. Note that E1 and E2 are perpendicular. Also note that E1 goes through the middle of the data, like a best fit. E2 provides less information about the variance in the data.
[Figure: scatter plot of the Normalized Data (both axes -4 to 4) with eigenvectors E1 and E2 overlaid]
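The five panels of this example can be reproduced numerically. This sketch uses synthetic, hypothetical data (not the data plotted on the slide) and NumPy; it checks the two points noted above: the original X1 and X2 are highly correlated, while PC1 and PC2 are uncorrelated because the eigenvectors E1 and E2 are perpendicular.

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=200)  # strongly correlated with x1 (synthetic)

X = np.column_stack([x1, x2])               # 1. original data
B = X - X.mean(axis=0)                      # 2. mean subtracted
C = np.cov(B, rowvar=False)                 # 3. covariance matrix
eigvals, E = np.linalg.eigh(C)              # 4. eigenvalues / eigenvectors
order = np.argsort(eigvals)[::-1]           #    largest eigenvalue first (E1)
eigvals, E = eigvals[order], E[:, order]
PC = B @ E                                  # 5. new data set (PC1, PC2)

r_orig = np.corrcoef(X, rowvar=False)[0, 1]
r_pcs = np.corrcoef(PC, rowvar=False)[0, 1]
print(f"corr(X1, X2) = {r_orig:.3f}, corr(PC1, PC2) = {r_pcs:.3e}")
print("E1 . E2 =", E[:, 0] @ E[:, 1])       # dot product of perpendicular vectors
```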
29. Challenges
• Develop 253+ models using artificial neural
networks (ANNs) (1 model per station)
– Pros
• the authors have had prior success modeling complex processes using ANNs
• ANNs use non-linear curve fitting to capture complex
behaviors
– Cons
• Do not know what stations will have “good” data at
any given datetime
• Stations are removed and added
WL data at real-time gages are related to ungaged areas, and, using ground-elevation data, water depths throughout the Everglades are computed
Data are used by scientists, engineers, and water-resource managers to support CERP (Comprehensive Everglades Restoration Plan)
Used to: 1) guide large-scale field operations, 2) integrate hydrologic and ecological responses, and 3) support biological and ecological assessments that measure ecosystem responses to the CERP