SlideShare a Scribd company logo
1 of 11
Download to read offline
sean@pingthings.io
21, rue d’Artois, F-75008 PARIS CIGRE US National Committee
http://www.cigre.org 2019 Grid of the Future Symposium
Who Monitors the Monitors?
Automated, Hierarchical Data Quality Assessment for Time Synchronized
Measurements
B. BENGFORT, S. P. MURPHY M. ANDERSEN K. D. JONES
PingThings, Inc. PingThings, Inc. Dominion Energy
USA South Africa USA
SUMMARY
Distributed sensor systems are a critical component required for the operation of a reliable,
resilient, and efficient electrical grid. However, just as the grid requires monitoring, so too
does the sensor system itself. Therefore, the continuous assessment of the quality of the data
captured by sensors becomes an important obligation and task for transmission and
distribution system operators. This paper examines the underlying philosophy of data quality
assessment and then uses this perspective to craft a set of requirements for an operational
system. The paper then discusses one potential system implementation that meets or exceeds
the requirements posed and then examines a few ways data quality assessment could evolve.
KEYWORDS
Time synchronized measurements, synchrophasor, data quality, distributed sensor network
2
1. INTRODUCTION
Death and taxes are joined by a third unavoidable certainty - data quality issues - for any organization
that operates a network of sensors measuring the real world. Electric utilities, stewards of
transmission and distributions systems together valued at over a trillion dollars in the US, are no
exception. If an organization’s assets are valuable enough to be instrumented with sensors, they are
valuable enough for the organization to monitor the quality of the data that is generated by those
sensors. As entropy perpetually assaults such a system, data quality assessments must be done
continuously. It is not something that, done once, is forever resolved and disappears.
The quality of the data captured by sensors can be impacted by a number of different factors that span
disciplines, from the field of metrology itself to the installation and configuration of the actual
hardware, to the communications channel, to the networking layer of the system, to the computer
systems and software receiving and persisting the data. No single engineer or subject matter expert has
deep knowledge in all of these fields, potentially making data quality problems especially pernicious.
Further, as a result of its interdisciplinary nature, responsibility for data quality can often fall through
the organizational cracks of a system operator and no individual or no group “owns” data quality.
Over the last 5 years, the authors of this paper have examined over a petabyte of synchrophasor data
from utilities across North America, including both the transmission and distribution grids. While this
total data volume examined may sound large, it represents less than one percent of the total
synchrophasor data available if all deployed phasor measurement capabilities were activated. Using a
conservative estimate of 20 channels of data per sensor, each at 30 samples per second, we estimate
that over 150 Petabytes of time synchronized, phasor measurements would be generated per year if all
of the hundreds of thousands of sensors were simply turned on. At either scale, the need for data
quality assessment becomes obvious as does the futility of human-based, ad-hoc assessments. The
quality of terabytes, petabytes, or even larger data sets must be interrogated, assessed, and captured in
an automated fashion. Further, given the magnitude of this analysis, it should be done once and only
once in a standardized fashion and included with data sent to each downstream consumer of data, be it
an application, an analytic, or an end user.
The modern utility operates an enormous, distributed sensor network composed of up to millions of
sensors of different types--smart meters, synchrophasors, power quality meters, point on wave, and
more--and reporting at varying data rates from once every 30 minutes to over a hundred thousand
samples per second. This paper will explore the requirements for a system capable of monitoring the
quality of the data originating from such a system at scale. We then examine particular aspects of a
universal sensor analytics platform that enable continuous data quality surveillance, attempting to
meet or achieve all requirements set forth. Finally, the paper will conclude with a glimpse into the
future of data quality assessments. With this effort, the authors of this paper hope to answer the
question, who (or what) monitors the monitors and how.
2. DATA QUALITY ASSESSMENT REQUIREMENTS
The high-level objective of such a system is to maximize the amount of “usable” data generated by the
sensor network while minimizing the consumption of resources to accomplish this goal. To make this
more tangible, consider a fictional grid measured by 100 sensors, each providing 10 channels of data
with 100Hz sampling per channel. This sensor network generates 100 x 10 x 100 = 100,000
measurements describing the grid per second. The objective of the associated data quality monitoring
system is to ensure that as many of these 100,000 measurements per second are “usable” by as many
downstream consumers as possible.
The first general requirement that we have of the system is that it must be capable of assessing the data
quality of the distributed sensor system for two different time periods. The first is historical, starting
3
with the current moment and stretching back in time to when the sensor first captured data. This
period allows those responsible for the sensor data to understand what problems occurred in the past,
what problems that were intermittent might still be impacting data quality, and what problems may
still be persisting today. The second time period is the present; the system must be able to examine and
analyze the quality of data as it arrives. This allows an alert to be triggered as bad data arrives,
enabling the system or its operators to take corrective action quickly to decrease the length of time bad
data pollutes the system in pursuit of one hundred percent good data.
The second general requirement for the system is that the data quality assessment must be persisted so
that downstream consumers do not have to replicate the analysis unless their data quality needs are
extraordinary. This record of the data quality must be forever associated to the original data. As a
caveat to this requirement, storage, both on premise and in the cloud, can be expensive. Amazon
AWS’s storage tiers range in cost from less than a penny per gigabyte per month for the slowest tier,
Glacier, to $0.30 per gigabyte per month for Amazon’s Elastic File Storage as of August 2019 [1]. It is
quite possible that the data quality assessment for each measurement could consume at least as much
space as the original value. Depending on how data is stored, the data quality assessment could inflate
the total storage size by 50-100% at a minimum if full temporal resolution is kept. Thus, it may make
sense to persist only an aggregated data quality assessment in lieu of full resolution.
The last general requirement for the system is to handle two broad categories of data quality
assessments. The first is a generic assessment that can be applied to every time series data stream
ingested by the platform, answering basic questions that are applicable to any sensor data. The second
is to be flexible enough to have custom data quality assessors applied to different types of time series,
some potentially specified by users.
3. SYSTEM IMPLEMENTATION
Assessing data quality for high frequency sensor networks should not be a bolt on after-thought but
rather one of the first design requirements for any system handling real world sensor data. Sensor
networks are distributed over a wide geographic area and send data through a large number of
subsystems including data concentrators and intermediate caches, all of which may contribute to poor
data quality. Because of this, the problem of data quality assessment is closely related to the problem
of how to use sensor data effectively. These design requirements mean that the data quality
assessment must occur as far downstream and as close to the end consumer as possible, ideally where
the sensor data is centrally and ultimately stored.
The PredictiveGridTM
platform was designed from the ground up to be a third-generation data
management, analytics, and artificial intelligence platform specifically for high-frequency time series
(e.g. dense telemetry generated from sensor data) [2]. Because the platform was architected to store,
process, and analyze dense telemetry faster than real-time, it is well-suited to both historical data
analysis and model generation as well as applying computation to live, streaming data. Because of this
dual role, the PredictiveGrid platform is the last stop in the automatic data processing pipeline before
data is presented to users, and therefore also is required to provide data quality assessment and
assurance.
3.1 Enabling General Purpose Sensor Data Quality Assessment: The Berkeley Tree
The data structure at the heart of the PredictiveGrid is the Berkeley Tree [3] - a time-partitioning,
multi-resolution, versioned k-ary tree. An artist’s rendering of the tree is shown in Figure 1. The leaf
nodes of the tree store fixed blocks of (time, value) pairs for a specific time-range. These blocks are
summarized by statistical aggregates, which are themselves stored in fixed time-ranges and are also
summarized by higher levels of the tree. This creates a hierarchical data structure where different
levels of the tree represent different time resolutions, from all epoch representable time at the root
node to leaf nodes containing nanoseconds of data through granularities representing years, months,
4
weeks, hours, etc. Finally, links between nodes are annotated with a monotonically increasing version
number when the underlying data is modified, ensuring that the tree is consistent even as data is being
written.
Figure 1: Representation of the Berkeley Tree is a time-partitioning, multi-resolution, versioned k-ary tree. This novel data
structure allows users to rapidly make both historical and online data quality assessments.
One function of the Berkeley Tree is to provide highly performant dense telemetry writes and
aggregate reads. The tree structure allows data to be inserted out of order unlike other append-only
systems. This ensures that sensor systems subject to variable network latencies and intermittent
communication outages can apply writes out of order or across overlapping windows as efficiently as
possible. Each write updates a version, ensuring that analysts can read at a specific version even while
writes are ongoing, giving their computations consistency through snapshot isolation [4]. Because
most time series analytics consider aggregate windows of data at varying resolution, the tree allows
users to query exactly the level of the tree where the aggregates are already stored without
computation. Secondarily, but just as importantly, the platform also allows for rich data quality
assessments using similar features: versioning, aggregates, and ingest distillers.
Data streaming into BTrDB (the Berkeley Tree Database) is generally written in blocks to the
underlying tree, causing a new version of the tree to be created. When the tree is read, the user must
specify a version, often the most recent, and only data belonging to that version is returned even as the
tree is being concurrently updated, e.g. they are seeing a specific snapshot of the data. Snapshot
isolation also ensures that results are repeatable as data changes - a key requirement for consistent
reporting. From a data quality perspective, versioning and snapshot isolation allows us to be more
aggressive when addressing data quality both because anomalous data can be recovered without
interrupting live streams and because the tree makes it possible to identify only what has changed
during automatic data cleaning. The ComputeDiff() function returns a list of time ranges where data
has changed between two versions, without requiring a query of raw data. By itself, simply tracking
the number of ranges that are changing between subsequent versions may indicate a problem (data is
being updated as much as it is being inserted). Tracking changes between more distant versions allows
data quality assessments to zoom directly into regions of question without having to query the entire
raw data set.
5
Figure 2: A snapshot of an interactive data quality heatmap over several months where each row is a stream and streams from
the same synchrophasor are adjacent. Darker regions indicate areas with a higher percentage of bad data (missing data, null
values, etc). This interactive visualization is made possible by only querying the aggregate values at a high-enough time
resolution for each stream, making interactive data quality visualization possible.
The interior tree nodes contain statistical aggregates, capturing the statistical properties and behavior
of all data points below them in the tree. Currently, the platform has implemented the following
statistical aggregates:
• count - the number of data points;
• max - the maximum value of the measurements;
• min - the minimum value of the measurements; and
• mean - the mean value of the measurements
for all data points below that specific interior node. In reality, any associative function can be used as a
statistical aggregate and the standard deviation has already been implemented in a mathematically
associative algorithm.
From a data quality perspective, these statistical aggregates create an almost ideal mechanism to assess
time series data quality in a generic way. For example, count allow you to infer from the sample
frequency if data is missing from a specific time range. This missing data metric is relevant to all
regularly sampled sensor data. The range of the window, defined by the min and max value of the raw
data, may indicate if there are any domain errors for even just a single point. Finally, standard
deviations may express abnormal changes in variability between windows that could also indicate
systemic problems or other types of changes that need to be addressed. As the tree is built in memory
upon data ingest and maintained by the database, all of the capabilities are “free” from the end user’s
perspective (and do not cost in terms of performance or latency). This allows for interactive data
visualizations like the one shown in Figure 2: a heatmap of bad data that relies on the performance of
querying statistical points to display groups of related streams or windows of time with poor data
quality.
3.2 Enabling Custom, Sensor-Specific Data Quality Assessment: Ingest Distillers
Timeliness in discovering and repairing data quality issues is essential to the PredictiveGrid platform.
Because the platform deals with very large datasets of dense telemetry, timeliness must be achieved
through Big Data analytics using distributed computing to parallelize computation across multiple
machines. The platform provides several options for distributed computing, including open source
options such as Apache Spark for general purpose machine learning and Google’s TensorFlow for
deep learning. However, to support automated data quality assessment with adaptive requirements, the
platform primarily relies on the DISTIL analytics framework [5].
6
DISTIL provides an environment for materializing analytical streams given a dataflow graph, a
directed acyclic graph that describes how data flows from input sources through different algorithmic
outputs before finally being stored as output streams in BTrDB. Each node in the dataflow graph (a
distiller) represents a stateless, idempotent computation that can be applied to a window of raw values
or statistical points (with some before and after the computational window). While seemingly simple,
the dataflow abstraction can be used to compose arbitrarily complex computations and, importantly, in
such a way that computations can be scheduled in parallel across a distributed computing cluster.
When a distiller is added to the platform, it is immediately applied to all new incoming data that match
the distiller inputs. Note that BTrDB supports two first class functions, GetNearestValue() and
ComputeDiff() that ensure the distiller is applied efficiently to only changed data, even if that data
arrives out of order. Further, the distiller also begins to back-process historical data in the stream while
prioritizing computation on incoming data. Given sufficient computational resources, the distillers will
eventually be applied to the entire input dataset while continuously making computations on incoming
streaming data. Because of this behavior, DISTIL is ideal for implementing an adaptive data quality
assessment process that can evolve with changing environment conditions and new error conditions.
Consider a distiller that takes as input a voltage or current phasor group measured by a PMU and
produces a per-second data quality score between 0 and 1 such that higher scores indicate higher data
quality. A first pass at this distiller might simply use the known sample rate of the PMU to measure
the number of missing points or to apply rules such as there can be no two consecutive, non-zero but
identical values for a stream. Later, after a few months of measurement, we may feel confident about
applying statistical thresholds to reduce our data quality score if there are measurements outside of a
few standard deviations of a computed mean. As more data enters the system, we can begin to add
depth to our data quality measurement by using equipment-specific flag streams or ensuring that the
magnitudes between phases are within a physical model based on their angle differences. As the data
quality system evolves, the distiller not only applies these new analytics to new, incoming data but
also to historical data, ensuring that a system-specific data quality assessment can be developed and
automatically applied organically. Note that versions in the data quality stream also allow analysts to
compare and understand how data quality changes as the assessment evolves, also deepening the
understanding of the underlying data.
4. REAL WORLD DATA QUALITY ISSUES
In this section, we will explore two case studies of data quality assessments that rely on time
synchronized measurements: data loss and outage detection and configuration change detection. We
will see how the error due to drift causes uncertainty in these assessments. Finally, we’ll explore how
timestamp jitter can lead to other data issues like decreasing the effectiveness of compression
mechanisms or causing variability across aggregated windows during analytics.
4.1 Data Loss and Outages
Given enough time or a big enough system, some data simply will not make it from the sensors to the
analytical data concentrator. This leads to the first and perhaps most fundamental data quality
question: are we receiving all of the data we should be? If the sample rate of the device and the
timestamp the first data point was received are known, it is easy to compare the total number of points
stored to the expected number of points received. When, inevitably, this ratio falls below one, new
questions arise: what is the distribution of outage durations, do they occur at regular intervals, are
these outages correlated with other devices, etc. For example, the analysis shown in Figure 3
demonstrates that there are both device-independent outages as well as routine outages that occur
across groups of devices. This may be related to maintenance windows or other processes that could
be adapted to keep better caches.
7
Figure 3: An analysis of the per-device outages across 2 months of data collection for a system operator. This analysis was
conducted on approximately 153B data points but completed in less than one minute via tree pruning that reduced the outage
search space.
A brute force approach to finding outages would be to scan every point and determine if the time delta
between those two points is greater than some minimum threshold (e.g. the sample interval). Besides
the obvious problem of the time required to compute this approach on large dense telemetry datasets,
it is also ineffective to apply this approach to online data as it streams in: data may arrive out of order
(e.g. it’s not lost, it’s simply late) or there could be a drift in the periodicity of the sample rate that
appears as an outage that is really just a time compression of two consecutive measurements.
The platform provides an effective solution to this problem because at higher levels of time
granularity, stat point aggregates mask the drift of time synchronization. Said another way, at lower
time resolutions (e.g. at the microsecond scale the number of points per period is more variable than
the number of points per period at the hourly scale. Because of the multi-resolution tree structure, the
platform ensures that outage analysis can occur in both offline, whole dataset analyses, and during
online applications because the tree ensures that only the time areas with probably outages are
inspected - pruning the search space and making the assessment more efficient.
A sketch of the algorithm is as follows. In a depth-first fashion traverse, the tree such that at each level
of the tree only nodes are kept that meet particular filtering criteria. First, determine the number of
expected points per node at the time-resolution of the level. For each consecutive pair of nodes at the
level, determine if they contain the expected number of points; if not, expand the next level of the tree
by selecting the aligned windows from the start time of the first node to the end time of the second. If
they do contain the expected number of points, compare end time of the first node to the start time of
the second to determine if the time delta between those nodes is greater than twice the expected
sample interval at that level. If so, an outage has occurred, and the duration of the outage can be
computed by subtracting the timestamp of the last point in the first node and the first point of the
second.
8
This mechanism works because the Berkeley Tree does not materialize nodes for data that does not
exist. If there is an hour-long outage, there will be a missing stat point at a pointwidth of 41 (statistical
points that cover a period of approximately 36.65) and the two consecutive stat points on either side
will have less than their required number points. The search will zoom in on this window of time and
ignore hours at the previous level that have complete stat points, pruning the tree only to the areas of
interest. Because we consider consecutive stat points, we also minimize the variability of the
periodicity that may mean that one statpoint has more than expected points while the one after has
fewer than expected or vice versa.
This data quality assessment can be conducted in both automated offline analyses and online
applications. In the offline analysis, the worst-case scenario is that every other point is missing, which
would result in an analysis equivalent to the brute force approach. In most cases, system providers
require a 95% uptime SLA, therefore tree pruning will identify gaps in logarithmic time and only
query the data rangers where gaps exist. In the online case, the hierarchical nature of the analysis
means that windows can be monitored at multiple granularities that create missing data alarms which
increase in severity as the outage continues. Because the distiller applies the data outage algorithm on
every insert or update, the alarm state will be eliminated if the data comes in out of order and the gap
is repaired or if the system returns to normal operation without repair, the outage and its severity will
be logged.
4.2 Detecting Configuration Changes
In the outage detection example, the algorithm required the sample rate to be known a priori. For
many time series analytics, particularly those that use windowing, the sample rate must also be known
in advance. This leads to the data quality question: are the devices configured at the expected sample
rate? For many devices the sample rate is a configuration with common settings at 15Hz, 30Hz, 60Hz,
and 120Hz; therefore, the corollary to this is to determine when a device configuration has changed,
which can have a significant, but hard to detect effect on downstream analytics. Moreover, if devices
are configured at too small a sample rate, events in very narrow time windows may be missed by
being sampling around.
The sample rate is usually expressed in Hz; however, loss of frequency may be a data loss problem,
not a configuration issue. Moreover, smaller aggregate windows like seconds may be affected by jitter
or other time synchronization issues. However, the platform makes this problem trivial to solve since
each interior node in the tree is defined by a point count and a time duration, such that frequency can
easily be computed and plotted as shown in Figure 4. At least initially, a visual analytics approach is
well-suited to this application, allowing the user to choose different time resolutions to see if they can
quickly spot changes in the frequency while minimizing the variability of the sample rate.
9
Figure 4: The frequency of this device is plotted at a time resolution of 1.63 days (pointwidth 47). In shows a clear
configuration change around June 9, where the sample rate is set to 60Hz, then again on June 11 where the sample rate is
reconfigured to 100Hz. The spikes in the graph also indicate where there may have been outages.
To automate this analysis, we note that the configuration changes usually change in step functions and
that it is rare to reconfigure devices more than once a day. Therefore, the automated analysis of this
stream operates at a daily time resolution and detects if two consecutive windows have a higher
frequency than the previous window. If so, this is likely not an outage, but rather an actual change.
Other approaches such as a rules based approach that uses standard configuration values may also be
applied to normalize the data quality alerts, or if there is a change in frequency, that window can be
expanded with the stat points at the lower level to determine if there were multiple configuration
changes in the same day.
4.3 Sample Rate Jitter
In the previous two case studies, we used the unique data structure of the platform to efficiently and
effectively diagnose data quality problems. In both cases, we noted that sample rate jitter is the cause
of a large number of analytical issues and used statistical aggregates to smooth the variability of the
data. More practically, data compression algorithms work better on less variable data, especially the
platform’s compression algorithm, which is designed for low variance sample rates. More
compression means less data storage, reducing the expense of the overall system.
10
Figure 5: Data quality issues due to timestamp jitter are only observed at a very low time resolution. This analysis that covers
over 14k streams for an hour’s worth of data shows that most streams are missing at least a few points, many of which may
have been sampled in either the previous or next periods meaning that there was no actual data loss.
It is important to conduct a thorough analysis of the sample interval jitter so that algorithmic solutions
such as truncating to the nearest microsecond or other configurable solutions may be employed. Many
sensor specifications require a sample rate at the millisecond or microsecond scale allowing
nanosecond time measurements to be aligned simply by rounding to the nearest microsecond as shown
in Figure 6. Higher resolution sensors such as Point on Wave sensors, which operate at kilohertz
sampling rates and beyond, still operate above the 50-200ns average network clock drift and can be
normalized in the same way. If sample rates are still too variable even after truncation or rounding,
distribution analysis allows for fuzzy alignment techniques such as binarization or linear interpolation
to smooth the dataset’s variability.
Figure 6: At the nanosecond scale, sample intervals can be highly variable, however if microsecond granularity is required,
then these timestamps could be truncated so that the majority of them would align with 33333 microseconds, and only a
small amount of variance would occur for those timestamps sampled at 33334 microseconds.
11
5. THE FUTURE
The overall objective function—to maximize the percentage of measurements captured by the system
that are usable downstream while minimizing the resources consumed in the process—steers future
directions for data quality monitoring. We will examine two efforts in progress that will help sensor
network operators optimize based on this objective function: (1) reporting and (2) automation via
artificial intelligence.
5.1 Reporting
The state of data quality for a network of sensors must be reported to multiple groups with differing
interests, potentially both internal and external to the system operator. The volume of information
reported and the frequency of reporting will vary greatly depending on the needs and goals of the
teams. With systems such as this, there is always the concern that too frequent of alarms will
desensitize the groups responsible for responding. Thus, there are multiple types of reporting that may
be needed including:
• real-time alerts broadcast via email or SMS,
• historical summaries showing system behavior over hours, days, weeks, months, or years, and
• interactive visualization of both the original measurements and the data quality assessments to
explore possible causes for data quality problems.
5.2 Automation
As a reminder, the overall objective is to maximize useful data captured while minimizing resources
consumed by handling data quality. The pursuit of this goal drives us toward increasing automation
where possible. While users care about the quantification of data quality, those tasked with addressing
the inevitable issues that arise would rather be informed as to the probable cause of the problem and
not just that there is a problem. As a result, we are using machine learning algorithms to classify the
probable cause based on the data quality issues that are detected in the data. Work on this approach is
very early but promising.
BIBLIOGRAPHY
[1] https://aws.amazon.com, retrieved August 2019
[2] S. P. Murphy, M. Andersen, K. D. Jones, M. Bariya, and J. Schuman, “A Universal Sensor
Analytics and Artificial Intelligence Platform for the Grid,” presented at the 2018 Grid of the
Future Symposium, 2018.
[3] M. P. Andersen and D. E. Culler, “BTrDB: Optimizing Storage System Design for Timeseries
Processing.,” in FAST, 2016, pp. 39–52.
[4] H. Berenson, P. A. Bernstein, J. Gray, J. Melton, E. J. O’Neil, and P. E. O’Neil. A critique of
ANSI SQL isolation levels. In ACM SIGMOD Conf., 1995.
[5] M. P. Andersen, S. Kumar, C. Brooks, A. von Meier, and D. E. Culler, “DISTIL: Design and
implementation of a scalable synchrophasor data processing system,” in Smart Grid
Communications (SmartGridComm), 2017 IEEE International Conference on, 2015, pp. 271–
277.

More Related Content

What's hot

Network Traffic Anomaly Detection Through Bayes Net
Network Traffic Anomaly Detection Through Bayes NetNetwork Traffic Anomaly Detection Through Bayes Net
Network Traffic Anomaly Detection Through Bayes NetGyan Prakash
 
Probabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network trafficProbabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network trafficAlexander Decker
 
ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION
ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION
ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION Editor IJCATR
 
IEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and AbstractIEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and Abstracttsysglobalsolutions
 
A Survey of Fault Tolerance Methods in Wireless Sensor Networks
A Survey of Fault Tolerance Methods in Wireless Sensor NetworksA Survey of Fault Tolerance Methods in Wireless Sensor Networks
A Survey of Fault Tolerance Methods in Wireless Sensor NetworksIRJET Journal
 
Reliable and Efficient Data Acquisition in Wireless Sensor Network
Reliable and Efficient Data Acquisition in Wireless Sensor NetworkReliable and Efficient Data Acquisition in Wireless Sensor Network
Reliable and Efficient Data Acquisition in Wireless Sensor NetworkIJMTST Journal
 
Self-checking method for fault tolerance solution in wireless sensor network
Self-checking method for fault tolerance solution  in wireless sensor network Self-checking method for fault tolerance solution  in wireless sensor network
Self-checking method for fault tolerance solution in wireless sensor network IJECEIAES
 
Efficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor NetworksEfficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor NetworksIRJET Journal
 
Bidirectional data centric routing protocol to improve the energy efficiency ...
Bidirectional data centric routing protocol to improve the energy efficiency ...Bidirectional data centric routing protocol to improve the energy efficiency ...
Bidirectional data centric routing protocol to improve the energy efficiency ...prj_publication
 
A Survey of Various Data Communication Schemes in WSN
A Survey of Various Data Communication Schemes in WSNA Survey of Various Data Communication Schemes in WSN
A Survey of Various Data Communication Schemes in WSNIRJET Journal
 
Information Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and ApproachesInformation Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and ApproachesM H
 
Energy Consumption Minimization in WSN using BFO
 Energy Consumption Minimization in WSN using BFO Energy Consumption Minimization in WSN using BFO
Energy Consumption Minimization in WSN using BFOrahulmonikasharma
 
Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...journalBEEI
 
DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...
DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...
DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...IJEEE
 
J031101064069
J031101064069J031101064069
J031101064069theijes
 
ENERGY CONSUMPTION IN WIRELESS SENSOR NETWORKS USING DATA FUSION ASSURANCE
ENERGY CONSUMPTION IN WIRELESS SENSOR  NETWORKS USING DATA FUSION ASSURANCEENERGY CONSUMPTION IN WIRELESS SENSOR  NETWORKS USING DATA FUSION ASSURANCE
ENERGY CONSUMPTION IN WIRELESS SENSOR NETWORKS USING DATA FUSION ASSURANCEijwmn
 
Iaetsd extending sensor networks into the cloud using tpss and lbss
Iaetsd extending sensor networks into the cloud using tpss and lbssIaetsd extending sensor networks into the cloud using tpss and lbss
Iaetsd extending sensor networks into the cloud using tpss and lbssIaetsd Iaetsd
 

What's hot (18)

Network Traffic Anomaly Detection Through Bayes Net
Network Traffic Anomaly Detection Through Bayes NetNetwork Traffic Anomaly Detection Through Bayes Net
Network Traffic Anomaly Detection Through Bayes Net
 
Probabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network trafficProbabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network traffic
 
ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION
ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION
ENERGY EFFICIENCY IN FILE TRANSFER ACROSS WIRELESS COMMUNICATION
 
IEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and AbstractIEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and Abstract
 
A Survey of Fault Tolerance Methods in Wireless Sensor Networks
A Survey of Fault Tolerance Methods in Wireless Sensor NetworksA Survey of Fault Tolerance Methods in Wireless Sensor Networks
A Survey of Fault Tolerance Methods in Wireless Sensor Networks
 
Reliable and Efficient Data Acquisition in Wireless Sensor Network
Reliable and Efficient Data Acquisition in Wireless Sensor NetworkReliable and Efficient Data Acquisition in Wireless Sensor Network
Reliable and Efficient Data Acquisition in Wireless Sensor Network
 
Self-checking method for fault tolerance solution in wireless sensor network
Self-checking method for fault tolerance solution  in wireless sensor network Self-checking method for fault tolerance solution  in wireless sensor network
Self-checking method for fault tolerance solution in wireless sensor network
 
Efficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor NetworksEfficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
Efficient Data Gathering with Compressive Sensing in Wireless Sensor Networks
 
Bidirectional data centric routing protocol to improve the energy efficiency ...
Bidirectional data centric routing protocol to improve the energy efficiency ...Bidirectional data centric routing protocol to improve the energy efficiency ...
Bidirectional data centric routing protocol to improve the energy efficiency ...
 
A Survey of Various Data Communication Schemes in WSN
A Survey of Various Data Communication Schemes in WSNA Survey of Various Data Communication Schemes in WSN
A Survey of Various Data Communication Schemes in WSN
 
Information Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and ApproachesInformation Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and Approaches
 
Energy Consumption Minimization in WSN using BFO
 Energy Consumption Minimization in WSN using BFO Energy Consumption Minimization in WSN using BFO
Energy Consumption Minimization in WSN using BFO
 
Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...Intelligent flood disaster warning on the fly: developing IoT-based managemen...
Intelligent flood disaster warning on the fly: developing IoT-based managemen...
 
Sub1568
Sub1568Sub1568
Sub1568
 
DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...
DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...
DATA TRANSMISSION IN WIRELESS SENSOR NETWORKS FOR EFFECTIVE AND SECURE COMMUN...
 
J031101064069
J031101064069J031101064069
J031101064069
 
ENERGY CONSUMPTION IN WIRELESS SENSOR NETWORKS USING DATA FUSION ASSURANCE
ENERGY CONSUMPTION IN WIRELESS SENSOR  NETWORKS USING DATA FUSION ASSURANCEENERGY CONSUMPTION IN WIRELESS SENSOR  NETWORKS USING DATA FUSION ASSURANCE
ENERGY CONSUMPTION IN WIRELESS SENSOR NETWORKS USING DATA FUSION ASSURANCE
 
Iaetsd extending sensor networks into the cloud using tpss and lbss
Iaetsd extending sensor networks into the cloud using tpss and lbssIaetsd extending sensor networks into the cloud using tpss and lbss
Iaetsd extending sensor networks into the cloud using tpss and lbss
 

Similar to Who Monitors the Monitors? Automated, Hierarchical Data Quality Assessment for Time Synchronized Measurements

Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...M H
 
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...acijjournal
 
Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...
Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...
Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...Power System Operation
 
An intrusion detection algorithm for ami
An intrusion detection algorithm for amiAn intrusion detection algorithm for ami
An intrusion detection algorithm for amiIJCI JOURNAL
 
IRJET- Surveillance of Object Motion Detection and Caution System using B...
IRJET-  	  Surveillance of Object Motion Detection and Caution System using B...IRJET-  	  Surveillance of Object Motion Detection and Caution System using B...
IRJET- Surveillance of Object Motion Detection and Caution System using B...IRJET Journal
 
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train CoachesIRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train CoachesIRJET Journal
 
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...IRJET Journal
 
A data estimation for failing nodes using fuzzy logic with integrated microco...
A data estimation for failing nodes using fuzzy logic with integrated microco...A data estimation for failing nodes using fuzzy logic with integrated microco...
A data estimation for failing nodes using fuzzy logic with integrated microco...IJECEIAES
 
A Survey on Mobile Sensing Technology and its Platform
A Survey on Mobile Sensing Technology and its PlatformA Survey on Mobile Sensing Technology and its Platform
A Survey on Mobile Sensing Technology and its PlatformEswar Publications
 
Stream Processing Environmental Applications in Jordan Valley
Stream Processing Environmental Applications in Jordan ValleyStream Processing Environmental Applications in Jordan Valley
Stream Processing Environmental Applications in Jordan ValleyCSCJournals
 
Rapid Sensor Data Modeling
Rapid Sensor Data ModelingRapid Sensor Data Modeling
Rapid Sensor Data ModelingAlys Spillman
 
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Drjabez
 
Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...
Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...
Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...chokrio
 
Internet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detectionInternet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detectionGyan Prakash
 
Data Analysis In The Cloud
Data Analysis In The CloudData Analysis In The Cloud
Data Analysis In The CloudMonica Carter
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisIRJET Journal
 

Similar to Who Monitors the Monitors? Automated, Hierarchical Data Quality Assessment for Time Synchronized Measurements (20)

Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...Information extraction from sensor networks using the Watershed transform alg...
Information extraction from sensor networks using the Watershed transform alg...
 
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas...
 
Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...
Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...
Real Time Dynamics Monitoring System (RTDMS™): Phasor Applications for the Co...
 
An intrusion detection algorithm for ami
An intrusion detection algorithm for amiAn intrusion detection algorithm for ami
An intrusion detection algorithm for ami
 
IRJET- Surveillance of Object Motion Detection and Caution System using B...
IRJET-  	  Surveillance of Object Motion Detection and Caution System using B...IRJET-  	  Surveillance of Object Motion Detection and Caution System using B...
IRJET- Surveillance of Object Motion Detection and Caution System using B...
 
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train CoachesIRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
 
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
 
A data estimation for failing nodes using fuzzy logic with integrated microco...
A data estimation for failing nodes using fuzzy logic with integrated microco...A data estimation for failing nodes using fuzzy logic with integrated microco...
A data estimation for failing nodes using fuzzy logic with integrated microco...
 
IJET-V3I2P15
IJET-V3I2P15IJET-V3I2P15
IJET-V3I2P15
 
A Survey on Mobile Sensing Technology and its Platform
A Survey on Mobile Sensing Technology and its PlatformA Survey on Mobile Sensing Technology and its Platform
A Survey on Mobile Sensing Technology and its Platform
 
23709008 smart-sensors
23709008 smart-sensors23709008 smart-sensors
23709008 smart-sensors
 
Stream Processing Environmental Applications in Jordan Valley
Stream Processing Environmental Applications in Jordan ValleyStream Processing Environmental Applications in Jordan Valley
Stream Processing Environmental Applications in Jordan Valley
 
Rapid Sensor Data Modeling
Rapid Sensor Data ModelingRapid Sensor Data Modeling
Rapid Sensor Data Modeling
 
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
 
Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...
Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...
Optimal Operating Performances of Wireless Protocols for Intelligent Sensors ...
 
Internet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detectionInternet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detection
 
Data Analysis In The Cloud
Data Analysis In The CloudData Analysis In The Cloud
Data Analysis In The Cloud
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend Analysis
 
chapter 4.pdf
chapter 4.pdfchapter 4.pdf
chapter 4.pdf
 
chapter 4.docx
chapter 4.docxchapter 4.docx
chapter 4.docx
 

More from Power System Operation

Thermography test of electrical panels
Thermography test of electrical panelsThermography test of electrical panels
Thermography test of electrical panelsPower System Operation
 
Big Data Analytics for Power Grid Operations
Big Data Analytics for Power Grid OperationsBig Data Analytics for Power Grid Operations
Big Data Analytics for Power Grid OperationsPower System Operation
 
SPS to RAS Special Protection Scheme Remedial Action Scheme
SPS to RAS Special Protection Scheme  Remedial Action SchemeSPS to RAS Special Protection Scheme  Remedial Action Scheme
SPS to RAS Special Protection Scheme Remedial Action SchemePower System Operation
 
SVC PLUS Frequency Stabilizer Frequency and voltage support for dynamic grid...
SVC PLUS Frequency Stabilizer Frequency and voltage support for  dynamic grid...SVC PLUS Frequency Stabilizer Frequency and voltage support for  dynamic grid...
SVC PLUS Frequency Stabilizer Frequency and voltage support for dynamic grid...Power System Operation
 
Principles & Testing Methods Of Earth Ground Resistance
Principles & Testing Methods Of Earth Ground ResistancePrinciples & Testing Methods Of Earth Ground Resistance
Principles & Testing Methods Of Earth Ground ResistancePower System Operation
 
Gas Insulated Switchgear? Gas-Insulated High-Voltage Switchgear (GIS)
Gas Insulated Switchgear?  Gas-Insulated High-Voltage Switchgear (GIS)Gas Insulated Switchgear?  Gas-Insulated High-Voltage Switchgear (GIS)
Gas Insulated Switchgear? Gas-Insulated High-Voltage Switchgear (GIS)Power System Operation
 
Electrical Transmission Tower Types - Design & Parts
Electrical Transmission Tower  Types - Design & PartsElectrical Transmission Tower  Types - Design & Parts
Electrical Transmission Tower Types - Design & PartsPower System Operation
 
The Need for Enhanced Power System Modelling Techniques & Simulation Tools
The Need for Enhanced  Power System  Modelling Techniques  &  Simulation Tools The Need for Enhanced  Power System  Modelling Techniques  &  Simulation Tools
The Need for Enhanced Power System Modelling Techniques & Simulation Tools Power System Operation
 
Power Quality Trends in the Transition to Carbon-Free Electrical Energy System
Power Quality  Trends in the Transition to  Carbon-Free Electrical Energy SystemPower Quality  Trends in the Transition to  Carbon-Free Electrical Energy System
Power Quality Trends in the Transition to Carbon-Free Electrical Energy SystemPower System Operation
 

More from Power System Operation (20)

ENERGY TRANSITION OUTLOOK 2021
ENERGY TRANSITION OUTLOOK  2021ENERGY TRANSITION OUTLOOK  2021
ENERGY TRANSITION OUTLOOK 2021
 
Thermography test of electrical panels
Thermography test of electrical panelsThermography test of electrical panels
Thermography test of electrical panels
 
What does peak shaving mean
What does peak shaving meanWhat does peak shaving mean
What does peak shaving mean
 
What's short circuit level
What's short circuit levelWhat's short circuit level
What's short circuit level
 
Power System Restoration Guide
Power System Restoration Guide  Power System Restoration Guide
Power System Restoration Guide
 
Big Data Analytics for Power Grid Operations
Big Data Analytics for Power Grid OperationsBig Data Analytics for Power Grid Operations
Big Data Analytics for Power Grid Operations
 
SPS to RAS Special Protection Scheme Remedial Action Scheme
SPS to RAS Special Protection Scheme  Remedial Action SchemeSPS to RAS Special Protection Scheme  Remedial Action Scheme
SPS to RAS Special Protection Scheme Remedial Action Scheme
 
Substation Neutral Earthing
Substation Neutral EarthingSubstation Neutral Earthing
Substation Neutral Earthing
 
SVC PLUS Frequency Stabilizer Frequency and voltage support for dynamic grid...
SVC PLUS Frequency Stabilizer Frequency and voltage support for  dynamic grid...SVC PLUS Frequency Stabilizer Frequency and voltage support for  dynamic grid...
SVC PLUS Frequency Stabilizer Frequency and voltage support for dynamic grid...
 
Principles & Testing Methods Of Earth Ground Resistance
Principles & Testing Methods Of Earth Ground ResistancePrinciples & Testing Methods Of Earth Ground Resistance
Principles & Testing Methods Of Earth Ground Resistance
 
Gas Insulated Switchgear? Gas-Insulated High-Voltage Switchgear (GIS)
Gas Insulated Switchgear?  Gas-Insulated High-Voltage Switchgear (GIS)Gas Insulated Switchgear?  Gas-Insulated High-Voltage Switchgear (GIS)
Gas Insulated Switchgear? Gas-Insulated High-Voltage Switchgear (GIS)
 
Electrical Transmission Tower Types - Design & Parts
Electrical Transmission Tower  Types - Design & PartsElectrical Transmission Tower  Types - Design & Parts
Electrical Transmission Tower Types - Design & Parts
 
What is load management
What is load managementWhat is load management
What is load management
 
What does merit order mean
What does merit order meanWhat does merit order mean
What does merit order mean
 
What are Balancing Services ?
What are  Balancing Services ?What are  Balancing Services ?
What are Balancing Services ?
 
The Need for Enhanced Power System Modelling Techniques & Simulation Tools
The Need for Enhanced  Power System  Modelling Techniques  &  Simulation Tools The Need for Enhanced  Power System  Modelling Techniques  &  Simulation Tools
The Need for Enhanced Power System Modelling Techniques & Simulation Tools
 
Power Quality Trends in the Transition to Carbon-Free Electrical Energy System
Power Quality  Trends in the Transition to  Carbon-Free Electrical Energy SystemPower Quality  Trends in the Transition to  Carbon-Free Electrical Energy System
Power Quality Trends in the Transition to Carbon-Free Electrical Energy System
 
Power Purchase Agreement PPA
Power Purchase Agreement PPA Power Purchase Agreement PPA
Power Purchase Agreement PPA
 
Harmonic study and analysis
Harmonic study and analysisHarmonic study and analysis
Harmonic study and analysis
 
What is leakage current testing
What is leakage current testingWhat is leakage current testing
What is leakage current testing
 

Recently uploaded

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 

Recently uploaded (20)

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 

Who Monitors the Monitors? Automated, Hierarchical Data Quality Assessment for Time Synchronized Measurements

  • 1. sean@pingthings.io 21, rue d’Artois, F-75008 PARIS CIGRE US National Committee http://www.cigre.org 2019 Grid of the Future Symposium Who Monitors the Monitors? Automated, Hierarchical Data Quality Assessment for Time Synchronized Measurements B. BENGFORT, S. P. MURPHY M. ANDERSEN K. D. JONES PingThings, Inc. PingThings, Inc. Dominion Energy USA South Africa USA SUMMARY Distributed sensor systems are a critical component required for the operation of a reliable, resilient, and efficient electrical grid. However, just as the grid requires monitoring, so too does the sensor system itself. Therefore, the continuous assessment of the quality of the data captured by sensors becomes an important obligation and task for transmission and distribution system operators. This paper examines the underlying philosophy of data quality assessment and then uses this perspective to craft a set of requirements for an operational system. The paper then discusses one potential system implementation that meets or exceeds the requirements posed and then examines a few ways data quality assessment could evolve. KEYWORDS Time synchronized measurements, synchrophasor, data quality, distributed sensor network
  • 2. 2 1. INTRODUCTION Death and taxes are joined by a third unavoidable certainty - data quality issues - for any organization that operates a network of sensors measuring the real world. Electric utilities, stewards of transmission and distributions systems together valued at over a trillion dollars in the US, are no exception. If an organization’s assets are valuable enough to be instrumented with sensors, they are valuable enough for the organization to monitor the quality of the data that is generated by those sensors. As entropy perpetually assaults such a system, data quality assessments must be done continuously. It is not something that, done once, is forever resolved and disappears. The quality of the data captured by sensors can be impacted by a number of different factors that span disciplines, from the field of metrology itself to the installation and configuration of the actual hardware, to the communications channel, to the networking layer of the system, to the computer systems and software receiving and persisting the data. No single engineer or subject matter expert has deep knowledge in all of these fields, potentially making data quality problems especially pernicious. Further, as a result of its interdisciplinary nature, responsibility for data quality can often fall through the organizational cracks of a system operator and no individual or no group “owns” data quality. Over the last 5 years, the authors of this paper have examined over a petabyte of synchrophasor data from utilities across North America, including both the transmission and distribution grids. While this total data volume examined may sound large, it represents less than one percent of the total synchrophasor data available if all deployed phasor measurement capabilities were activated. Using a conservative estimate of 20 channels of data per sensor, each at 30 samples per second, we estimate that over 150 Petabytes of time synchronized, phasor measurements would be generated per year if all of the hundreds of thousands of sensors were simply turned on. At either scale, the need for data quality assessment becomes obvious as does the futility of human-based, ad-hoc assessments. The quality of terabytes, petabytes, or even larger data sets must be interrogated, assessed, and captured in an automated fashion. Further, given the magnitude of this analysis, it should be done once and only once in a standardized fashion and included with data sent to each downstream consumer of data, be it an application, an analytic, or an end user. The modern utility operates an enormous, distributed sensor network composed of up to millions of sensors of different types--smart meters, synchrophasors, power quality meters, point on wave, and more--and reporting at varying data rates from once every 30 minutes to over a hundred thousand samples per second. This paper will explore the requirements for a system capable of monitoring the quality of the data originating from such a system at scale. We then examine particular aspects of a universal sensor analytics platform that enable continuous data quality surveillance, attempting to meet or achieve all requirements set forth. Finally, the paper will conclude with a glimpse into the future of data quality assessments. With this effort, the authors of this paper hope to answer the question, who (or what) monitors the monitors and how. 2. DATA QUALITY ASSESSMENT REQUIREMENTS The high-level objective of such a system is to maximize the amount of “usable” data generated by the sensor network while minimizing the consumption of resources to accomplish this goal. To make this more tangible, consider a fictional grid measured by 100 sensors, each providing 10 channels of data with 100Hz sampling per channel. This sensor network generates 100 x 10 x 100 = 100,000 measurements describing the grid per second. The objective of the associated data quality monitoring system is to ensure that as many of these 100,000 measurements per second are “usable” by as many downstream consumers as possible. The first general requirement that we have of the system is that it must be capable of assessing the data quality of the distributed sensor system for two different time periods. The first is historical, starting
  • 3. 3 with the current moment and stretching back in time to when the sensor first captured data. This period allows those responsible for the sensor data to understand what problems occurred in the past, what problems that were intermittent might still be impacting data quality, and what problems may still be persisting today. The second time period is the present; the system must be able to examine and analyze the quality of data as it arrives. This allows an alert to be triggered as bad data arrives, enabling the system or its operators to take corrective action quickly to decrease the length of time bad data pollutes the system in pursuit of one hundred percent good data. The second general requirement for the system is that the data quality assessment must be persisted so that downstream consumers do not have to replicate the analysis unless their data quality needs are extraordinary. This record of the data quality must be forever associated to the original data. As a caveat to this requirement, storage, both on premise and in the cloud, can be expensive. Amazon AWS’s storage tiers range in cost from less than a penny per gigabyte per month for the slowest tier, Glacier, to $0.30 per gigabyte per month for Amazon’s Elastic File Storage as of August 2019 [1]. It is quite possible that the data quality assessment for each measurement could consume at least as much space as the original value. Depending on how data is stored, the data quality assessment could inflate the total storage size by 50-100% at a minimum if full temporal resolution is kept. Thus, it may make sense to persist only an aggregated data quality assessment in lieu of full resolution. The last general requirement for the system is to handle two broad categories of data quality assessments. The first is a generic assessment that can be applied to every time series data stream ingested by the platform, answering basic questions that are applicable to any sensor data. The second is to be flexible enough to have custom data quality assessors applied to different types of time series, some potentially specified by users. 3. SYSTEM IMPLEMENTATION Assessing data quality for high frequency sensor networks should not be a bolt on after-thought but rather one of the first design requirements for any system handling real world sensor data. Sensor networks are distributed over a wide geographic area and send data through a large number of subsystems including data concentrators and intermediate caches, all of which may contribute to poor data quality. Because of this, the problem of data quality assessment is closely related to the problem of how to use sensor data effectively. These design requirements mean that the data quality assessment must occur as far downstream and as close to the end consumer as possible, ideally where the sensor data is centrally and ultimately stored. The PredictiveGridTM platform was designed from the ground up to be a third-generation data management, analytics, and artificial intelligence platform specifically for high-frequency time series (e.g. dense telemetry generated from sensor data) [2]. Because the platform was architected to store, process, and analyze dense telemetry faster than real-time, it is well-suited to both historical data analysis and model generation as well as applying computation to live, streaming data. Because of this dual role, the PredictiveGrid platform is the last stop in the automatic data processing pipeline before data is presented to users, and therefore also is required to provide data quality assessment and assurance. 3.1 Enabling General Purpose Sensor Data Quality Assessment: The Berkeley Tree The data structure at the heart of the PredictiveGrid is the Berkeley Tree [3] - a time-partitioning, multi-resolution, versioned k-ary tree. An artist’s rendering of the tree is shown in Figure 1. The leaf nodes of the tree store fixed blocks of (time, value) pairs for a specific time-range. These blocks are summarized by statistical aggregates, which are themselves stored in fixed time-ranges and are also summarized by higher levels of the tree. This creates a hierarchical data structure where different levels of the tree represent different time resolutions, from all epoch representable time at the root node to leaf nodes containing nanoseconds of data through granularities representing years, months,
  • 4. 4 weeks, hours, etc. Finally, links between nodes are annotated with a monotonically increasing version number when the underlying data is modified, ensuring that the tree is consistent even as data is being written. Figure 1: Representation of the Berkeley Tree is a time-partitioning, multi-resolution, versioned k-ary tree. This novel data structure allows users to rapidly make both historical and online data quality assessments. One function of the Berkeley Tree is to provide highly performant dense telemetry writes and aggregate reads. The tree structure allows data to be inserted out of order unlike other append-only systems. This ensures that sensor systems subject to variable network latencies and intermittent communication outages can apply writes out of order or across overlapping windows as efficiently as possible. Each write updates a version, ensuring that analysts can read at a specific version even while writes are ongoing, giving their computations consistency through snapshot isolation [4]. Because most time series analytics consider aggregate windows of data at varying resolution, the tree allows users to query exactly the level of the tree where the aggregates are already stored without computation. Secondarily, but just as importantly, the platform also allows for rich data quality assessments using similar features: versioning, aggregates, and ingest distillers. Data streaming into BTrDB (the Berkeley Tree Database) is generally written in blocks to the underlying tree, causing a new version of the tree to be created. When the tree is read, the user must specify a version, often the most recent, and only data belonging to that version is returned even as the tree is being concurrently updated, e.g. they are seeing a specific snapshot of the data. Snapshot isolation also ensures that results are repeatable as data changes - a key requirement for consistent reporting. From a data quality perspective, versioning and snapshot isolation allows us to be more aggressive when addressing data quality both because anomalous data can be recovered without interrupting live streams and because the tree makes it possible to identify only what has changed during automatic data cleaning. The ComputeDiff() function returns a list of time ranges where data has changed between two versions, without requiring a query of raw data. By itself, simply tracking the number of ranges that are changing between subsequent versions may indicate a problem (data is being updated as much as it is being inserted). Tracking changes between more distant versions allows data quality assessments to zoom directly into regions of question without having to query the entire raw data set.
  • 5. 5 Figure 2: A snapshot of an interactive data quality heatmap over several months where each row is a stream and streams from the same synchrophasor are adjacent. Darker regions indicate areas with a higher percentage of bad data (missing data, null values, etc). This interactive visualization is made possible by only querying the aggregate values at a high-enough time resolution for each stream, making interactive data quality visualization possible. The interior tree nodes contain statistical aggregates, capturing the statistical properties and behavior of all data points below them in the tree. Currently, the platform has implemented the following statistical aggregates: • count - the number of data points; • max - the maximum value of the measurements; • min - the minimum value of the measurements; and • mean - the mean value of the measurements for all data points below that specific interior node. In reality, any associative function can be used as a statistical aggregate and the standard deviation has already been implemented in a mathematically associative algorithm. From a data quality perspective, these statistical aggregates create an almost ideal mechanism to assess time series data quality in a generic way. For example, count allow you to infer from the sample frequency if data is missing from a specific time range. This missing data metric is relevant to all regularly sampled sensor data. The range of the window, defined by the min and max value of the raw data, may indicate if there are any domain errors for even just a single point. Finally, standard deviations may express abnormal changes in variability between windows that could also indicate systemic problems or other types of changes that need to be addressed. As the tree is built in memory upon data ingest and maintained by the database, all of the capabilities are “free” from the end user’s perspective (and do not cost in terms of performance or latency). This allows for interactive data visualizations like the one shown in Figure 2: a heatmap of bad data that relies on the performance of querying statistical points to display groups of related streams or windows of time with poor data quality. 3.2 Enabling Custom, Sensor-Specific Data Quality Assessment: Ingest Distillers Timeliness in discovering and repairing data quality issues is essential to the PredictiveGrid platform. Because the platform deals with very large datasets of dense telemetry, timeliness must be achieved through Big Data analytics using distributed computing to parallelize computation across multiple machines. The platform provides several options for distributed computing, including open source options such as Apache Spark for general purpose machine learning and Google’s TensorFlow for deep learning. However, to support automated data quality assessment with adaptive requirements, the platform primarily relies on the DISTIL analytics framework [5].
  • 6. 6 DISTIL provides an environment for materializing analytical streams given a dataflow graph, a directed acyclic graph that describes how data flows from input sources through different algorithmic outputs before finally being stored as output streams in BTrDB. Each node in the dataflow graph (a distiller) represents a stateless, idempotent computation that can be applied to a window of raw values or statistical points (with some before and after the computational window). While seemingly simple, the dataflow abstraction can be used to compose arbitrarily complex computations and, importantly, in such a way that computations can be scheduled in parallel across a distributed computing cluster. When a distiller is added to the platform, it is immediately applied to all new incoming data that match the distiller inputs. Note that BTrDB supports two first class functions, GetNearestValue() and ComputeDiff() that ensure the distiller is applied efficiently to only changed data, even if that data arrives out of order. Further, the distiller also begins to back-process historical data in the stream while prioritizing computation on incoming data. Given sufficient computational resources, the distillers will eventually be applied to the entire input dataset while continuously making computations on incoming streaming data. Because of this behavior, DISTIL is ideal for implementing an adaptive data quality assessment process that can evolve with changing environment conditions and new error conditions. Consider a distiller that takes as input a voltage or current phasor group measured by a PMU and produces a per-second data quality score between 0 and 1 such that higher scores indicate higher data quality. A first pass at this distiller might simply use the known sample rate of the PMU to measure the number of missing points or to apply rules such as there can be no two consecutive, non-zero but identical values for a stream. Later, after a few months of measurement, we may feel confident about applying statistical thresholds to reduce our data quality score if there are measurements outside of a few standard deviations of a computed mean. As more data enters the system, we can begin to add depth to our data quality measurement by using equipment-specific flag streams or ensuring that the magnitudes between phases are within a physical model based on their angle differences. As the data quality system evolves, the distiller not only applies these new analytics to new, incoming data but also to historical data, ensuring that a system-specific data quality assessment can be developed and automatically applied organically. Note that versions in the data quality stream also allow analysts to compare and understand how data quality changes as the assessment evolves, also deepening the understanding of the underlying data. 4. REAL WORLD DATA QUALITY ISSUES In this section, we will explore two case studies of data quality assessments that rely on time synchronized measurements: data loss and outage detection and configuration change detection. We will see how the error due to drift causes uncertainty in these assessments. Finally, we’ll explore how timestamp jitter can lead to other data issues like decreasing the effectiveness of compression mechanisms or causing variability across aggregated windows during analytics. 4.1 Data Loss and Outages Given enough time or a big enough system, some data simply will not make it from the sensors to the analytical data concentrator. This leads to the first and perhaps most fundamental data quality question: are we receiving all of the data we should be? If the sample rate of the device and the timestamp the first data point was received are known, it is easy to compare the total number of points stored to the expected number of points received. When, inevitably, this ratio falls below one, new questions arise: what is the distribution of outage durations, do they occur at regular intervals, are these outages correlated with other devices, etc. For example, the analysis shown in Figure 3 demonstrates that there are both device-independent outages as well as routine outages that occur across groups of devices. This may be related to maintenance windows or other processes that could be adapted to keep better caches.
  • 7. 7 Figure 3: An analysis of the per-device outages across 2 months of data collection for a system operator. This analysis was conducted on approximately 153B data points but completed in less than one minute via tree pruning that reduced the outage search space. A brute force approach to finding outages would be to scan every point and determine if the time delta between those two points is greater than some minimum threshold (e.g. the sample interval). Besides the obvious problem of the time required to compute this approach on large dense telemetry datasets, it is also ineffective to apply this approach to online data as it streams in: data may arrive out of order (e.g. it’s not lost, it’s simply late) or there could be a drift in the periodicity of the sample rate that appears as an outage that is really just a time compression of two consecutive measurements. The platform provides an effective solution to this problem because at higher levels of time granularity, stat point aggregates mask the drift of time synchronization. Said another way, at lower time resolutions (e.g. at the microsecond scale the number of points per period is more variable than the number of points per period at the hourly scale. Because of the multi-resolution tree structure, the platform ensures that outage analysis can occur in both offline, whole dataset analyses, and during online applications because the tree ensures that only the time areas with probably outages are inspected - pruning the search space and making the assessment more efficient. A sketch of the algorithm is as follows. In a depth-first fashion traverse, the tree such that at each level of the tree only nodes are kept that meet particular filtering criteria. First, determine the number of expected points per node at the time-resolution of the level. For each consecutive pair of nodes at the level, determine if they contain the expected number of points; if not, expand the next level of the tree by selecting the aligned windows from the start time of the first node to the end time of the second. If they do contain the expected number of points, compare end time of the first node to the start time of the second to determine if the time delta between those nodes is greater than twice the expected sample interval at that level. If so, an outage has occurred, and the duration of the outage can be computed by subtracting the timestamp of the last point in the first node and the first point of the second.
  • 8. 8 This mechanism works because the Berkeley Tree does not materialize nodes for data that does not exist. If there is an hour-long outage, there will be a missing stat point at a pointwidth of 41 (statistical points that cover a period of approximately 36.65) and the two consecutive stat points on either side will have less than their required number points. The search will zoom in on this window of time and ignore hours at the previous level that have complete stat points, pruning the tree only to the areas of interest. Because we consider consecutive stat points, we also minimize the variability of the periodicity that may mean that one statpoint has more than expected points while the one after has fewer than expected or vice versa. This data quality assessment can be conducted in both automated offline analyses and online applications. In the offline analysis, the worst-case scenario is that every other point is missing, which would result in an analysis equivalent to the brute force approach. In most cases, system providers require a 95% uptime SLA, therefore tree pruning will identify gaps in logarithmic time and only query the data rangers where gaps exist. In the online case, the hierarchical nature of the analysis means that windows can be monitored at multiple granularities that create missing data alarms which increase in severity as the outage continues. Because the distiller applies the data outage algorithm on every insert or update, the alarm state will be eliminated if the data comes in out of order and the gap is repaired or if the system returns to normal operation without repair, the outage and its severity will be logged. 4.2 Detecting Configuration Changes In the outage detection example, the algorithm required the sample rate to be known a priori. For many time series analytics, particularly those that use windowing, the sample rate must also be known in advance. This leads to the data quality question: are the devices configured at the expected sample rate? For many devices the sample rate is a configuration with common settings at 15Hz, 30Hz, 60Hz, and 120Hz; therefore, the corollary to this is to determine when a device configuration has changed, which can have a significant, but hard to detect effect on downstream analytics. Moreover, if devices are configured at too small a sample rate, events in very narrow time windows may be missed by being sampling around. The sample rate is usually expressed in Hz; however, loss of frequency may be a data loss problem, not a configuration issue. Moreover, smaller aggregate windows like seconds may be affected by jitter or other time synchronization issues. However, the platform makes this problem trivial to solve since each interior node in the tree is defined by a point count and a time duration, such that frequency can easily be computed and plotted as shown in Figure 4. At least initially, a visual analytics approach is well-suited to this application, allowing the user to choose different time resolutions to see if they can quickly spot changes in the frequency while minimizing the variability of the sample rate.
  • 9. 9 Figure 4: The frequency of this device is plotted at a time resolution of 1.63 days (pointwidth 47). In shows a clear configuration change around June 9, where the sample rate is set to 60Hz, then again on June 11 where the sample rate is reconfigured to 100Hz. The spikes in the graph also indicate where there may have been outages. To automate this analysis, we note that the configuration changes usually change in step functions and that it is rare to reconfigure devices more than once a day. Therefore, the automated analysis of this stream operates at a daily time resolution and detects if two consecutive windows have a higher frequency than the previous window. If so, this is likely not an outage, but rather an actual change. Other approaches such as a rules based approach that uses standard configuration values may also be applied to normalize the data quality alerts, or if there is a change in frequency, that window can be expanded with the stat points at the lower level to determine if there were multiple configuration changes in the same day. 4.3 Sample Rate Jitter In the previous two case studies, we used the unique data structure of the platform to efficiently and effectively diagnose data quality problems. In both cases, we noted that sample rate jitter is the cause of a large number of analytical issues and used statistical aggregates to smooth the variability of the data. More practically, data compression algorithms work better on less variable data, especially the platform’s compression algorithm, which is designed for low variance sample rates. More compression means less data storage, reducing the expense of the overall system.
  • 10. 10 Figure 5: Data quality issues due to timestamp jitter are only observed at a very low time resolution. This analysis that covers over 14k streams for an hour’s worth of data shows that most streams are missing at least a few points, many of which may have been sampled in either the previous or next periods meaning that there was no actual data loss. It is important to conduct a thorough analysis of the sample interval jitter so that algorithmic solutions such as truncating to the nearest microsecond or other configurable solutions may be employed. Many sensor specifications require a sample rate at the millisecond or microsecond scale allowing nanosecond time measurements to be aligned simply by rounding to the nearest microsecond as shown in Figure 6. Higher resolution sensors such as Point on Wave sensors, which operate at kilohertz sampling rates and beyond, still operate above the 50-200ns average network clock drift and can be normalized in the same way. If sample rates are still too variable even after truncation or rounding, distribution analysis allows for fuzzy alignment techniques such as binarization or linear interpolation to smooth the dataset’s variability. Figure 6: At the nanosecond scale, sample intervals can be highly variable, however if microsecond granularity is required, then these timestamps could be truncated so that the majority of them would align with 33333 microseconds, and only a small amount of variance would occur for those timestamps sampled at 33334 microseconds.
  • 11. 11 5. THE FUTURE The overall objective function—to maximize the percentage of measurements captured by the system that are usable downstream while minimizing the resources consumed in the process—steers future directions for data quality monitoring. We will examine two efforts in progress that will help sensor network operators optimize based on this objective function: (1) reporting and (2) automation via artificial intelligence. 5.1 Reporting The state of data quality for a network of sensors must be reported to multiple groups with differing interests, potentially both internal and external to the system operator. The volume of information reported and the frequency of reporting will vary greatly depending on the needs and goals of the teams. With systems such as this, there is always the concern that too frequent of alarms will desensitize the groups responsible for responding. Thus, there are multiple types of reporting that may be needed including: • real-time alerts broadcast via email or SMS, • historical summaries showing system behavior over hours, days, weeks, months, or years, and • interactive visualization of both the original measurements and the data quality assessments to explore possible causes for data quality problems. 5.2 Automation As a reminder, the overall objective is to maximize useful data captured while minimizing resources consumed by handling data quality. The pursuit of this goal drives us toward increasing automation where possible. While users care about the quantification of data quality, those tasked with addressing the inevitable issues that arise would rather be informed as to the probable cause of the problem and not just that there is a problem. As a result, we are using machine learning algorithms to classify the probable cause based on the data quality issues that are detected in the data. Work on this approach is very early but promising. BIBLIOGRAPHY [1] https://aws.amazon.com, retrieved August 2019 [2] S. P. Murphy, M. Andersen, K. D. Jones, M. Bariya, and J. Schuman, “A Universal Sensor Analytics and Artificial Intelligence Platform for the Grid,” presented at the 2018 Grid of the Future Symposium, 2018. [3] M. P. Andersen and D. E. Culler, “BTrDB: Optimizing Storage System Design for Timeseries Processing.,” in FAST, 2016, pp. 39–52. [4] H. Berenson, P. A. Bernstein, J. Gray, J. Melton, E. J. O’Neil, and P. E. O’Neil. A critique of ANSI SQL isolation levels. In ACM SIGMOD Conf., 1995. [5] M. P. Andersen, S. Kumar, C. Brooks, A. von Meier, and D. E. Culler, “DISTIL: Design and implementation of a scalable synchrophasor data processing system,” in Smart Grid Communications (SmartGridComm), 2017 IEEE International Conference on, 2015, pp. 271– 277.