User defined functions for performing interpolation
1. Introduction
Interpolation, or kriging, is the process of estimating the sensor data values at all points of a grid. Environmental
scientists record data values for each timestamp at the points where the sensors are located. Using these recorded
values, we estimate the data values at all points of a specified grid. A grid is a two-dimensional set of equally
spaced points on a map; the distance between consecutive points is decided by the granularity of the grid. The grid
serves as a two-dimensional evaluation space: the values of a particular measurement are calculated at each grid
point, and together these values capture the environmental state of the database at a particular timestamp.
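The grid described above can be sketched as follows. This is an illustrative construction only (the function name `make_grid` and its parameters are not from the system); it enumerates the equally spaced evaluation points between the minimum and maximum coordinates at a chosen granularity:

```python
def make_grid(xmin, xmax, ymin, ymax, step):
    """Enumerate the two-dimensional evaluation grid: equally spaced
    points between (xmin, ymin) and (xmax, ymax), `step` apart."""
    points = []
    x = xmin
    while x <= xmax:
        y = ymin
        while y <= ymax:
            points.append((x, y))
            y += step
        x += step
    return points

# 3 x-values (1, 2, 3) times 2 y-values (1, 2) -> 6 grid points
points = make_grid(1, 3, 1, 2, 1)
```

A finer granularity (smaller `step`) produces more grid points and hence more interpolation work, which is why the performance figures later in this report are reported per grid size.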
2. System Architecture
2.1. Data smoothing (done as SQL queries inside the database)
The data first needs to be smoothed. The smoothing is performed as follows:
1) A reference timestamp is chosen. The difference between each data timestamp and the reference timestamp is
computed in minutes, and each timestamp is assigned to its own n-minute interval by taking the floor of this
difference divided by n (supposing the data needs to be smoothed into n-minute intervals).
For example, consider a data timestamp of 11:35 PM, 31-08-2007, with smoothing into 30-minute intervals. The time
difference between this timestamp and the reference timestamp is calculated in minutes and divided by 30. The floor
of this division is multiplied by 30, and the result is added to the reference timestamp. This calculation returns
the bucket for this timestamp, which is 11:30 PM, 31-08-2007. All timestamps lying between 11:30 PM and 12:00 AM of
31-08-2007 fall into this bucket. The average of the data values of all such rows is calculated and is taken as the
cumulative value at this timestamp (11:30 PM, 31-08-2007).
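The bucketing step above can be sketched as follows. The function name `bucket` is illustrative (the actual system performs this as SQL inside the database), but the arithmetic is exactly the floor-divide-and-multiply described in the text:

```python
from datetime import datetime, timedelta

def bucket(ts, reference, n_minutes):
    """Assign a timestamp to its n-minute smoothing interval:
    floor((ts - reference) in minutes / n) * n, added back to the reference."""
    diff_min = (ts - reference).total_seconds() / 60
    return reference + timedelta(minutes=(diff_min // n_minutes) * n_minutes)

ref = datetime(2007, 1, 1)
b = bucket(datetime(2007, 8, 31, 23, 35), ref, 30)
# -> 2007-08-31 23:30:00, the bucket from the worked example
```

Every timestamp between 11:30 PM and midnight of 31-08-2007 maps to this same bucket, so averaging the rows that share a bucket yields one smoothed value per interval.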
Once the data has been smoothed with respect to the given time interval, it is used as input to the user defined
functions described below.
2.2. User defined functions
General working:
The user defined functions first take as input the maximum and minimum values of x and y, which define the grid for
which the environmental state needs to be calculated. For each grid point, the effect of every sensor that measures
the given measurement must be determined.
To measure the effect of each sensor on a given grid point, we take two things into consideration:
1) The distance of the sensor from the grid point.
2) The value of the measurement at that sensor (note that all values mentioned here are with respect to a
particular timestamp).
The effect of each sensor on the value at a grid point is:
1) Inversely proportional to the distance of the sensor from the grid point.
2) Directly proportional to the value at the sensor.
Hence, the equation that we use is the inverse-distance-weighted average:
CE = (V1/d1 + V2/d2 + ... + Vn/dn) / (1/d1 + 1/d2 + ... + 1/dn)
where CE represents the cumulative effect on a single grid point, V1, ..., Vn represent the values at sensor points
1..n, and d1, ..., dn represent the distances between those sensor points and the grid point.
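The cumulative effect at one grid point can be sketched as below. Since the original equation is not reproduced in this text, the sketch assumes the standard inverse distance weighting form, which matches the two stated proportionalities; the function name `cumulative_effect` is illustrative:

```python
def cumulative_effect(values, distances):
    """Assumed inverse-distance-weighted estimate at one grid point:
    CE = sum(Vi/di) / sum(1/di). `values` and `distances` are the
    sensor readings and their distances to the grid point."""
    num = sum(v / d for v, d in zip(values, distances))
    den = sum(1.0 / d for d in distances)
    return num / den

# Two sensors: value 10 at distance 1, value 20 at distance 4.
# (10/1 + 20/4) / (1/1 + 1/4) = 15 / 1.25 = 12.0
ce = cumulative_effect([10, 20], [1, 4])
```

Note how the nearer sensor (distance 1) dominates the result, as the inverse-distance weighting intends.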
Therefore, for each grid point, the sensors measuring the given measurement are taken. Their data is smoothed, as
described above, according to the specified interval. The smoothed data is then interpolated using the formula
above, once for every interval.
SQL query generation: Queries are issued by the user, who is oblivious to the processing taking place with respect
to the query.
Parsing and transforming SQL queries: The queries issued by the user are parsed and a query tree is formed. The
arguments for the user defined functions are extracted from this parse tree and passed to the functions, which
replace the grid table wherever it is used in the query.
Cursor based approach:
This is the slowest user defined function in terms of performance.
Input: maximum, minimum co-ordinates of grid and measurement id.
Output: interpolated environmental state.
A cursor returns the grid points lying between the maximum and minimum points given as input.
The cursor then passes each row of its result set to point-influence, a function which calculates the influence of
the sensors at each point for a given measurement. Inside this function, a second cursor returns the smoothed data
for the particular measurement, ordered by time.
For each timestamp, all the sensor data is used in the formula above to calculate the influence at the grid point.
This user defined function performs slowly because the cursor reads each value from the database one after the
other; there is no buffer storing a group of rows from which we can read cheaply. As each call has to do database
I/O, this function is the slowest of all.
Data structure based approach:
It is possible to write CLR enabled functions in .NET languages to perform the interpolation. Such functions query
the database and receive a data reader, which acts as a buffer for the queried data. Each value can be read from the
data reader one at a time without per-row database I/O overhead.
There is, however, a drawback to this approach. The environmental data, which spans sensors all over Switzerland and
about three years of time, is quite huge, amounting to several terabytes. Storing a subset of such data in in-memory
data structures is not advisable: as the grid size increases due to finer granularity, the data structures might not
be able to hold data of this magnitude. A substitute for this approach, in case this problem occurs, is discussed in
the next function.
Write-file function:
In this function, we query the database for the relevant grid points, and the results are returned through a data
reader. Along with this, the data values for the particular measurement are also queried from the database. These
two data readers are enough to calculate the interpolated values at a particular timestamp. The second data reader
is then queried for the next timestamp, to calculate the grid values at that timestamp, and so on.
In MS SQL Server 2005, to keep two result sets open at a time the MARS (Multiple Active Result Sets) option must be
set to true. This option is not enabled in the SQL Server 2005 native client, but SQL Server 2005 is the version
used in SwissEx. The workaround is to query the database for the first timestamp and store the results in a local
file, which is equivalent to caching the results in a data reader: we write the results for the first timestamp to a
file as comma separated values (CSV), read the file back to calculate the interpolation, then overwrite the file
with the results for the next timestamp, and so on.
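The file-based caching described above can be sketched as follows. This is a simplified illustration (the real implementation is a CLR function; the function names and the three-column row layout of sensor id, value, and distance are assumptions), showing one timestamp's rows being written to a CSV file and read back, with each new timestamp overwriting the previous one:

```python
import csv
import os
import tempfile

def write_timestamp_rows(path, rows):
    """Cache one timestamp's rows (sensor_id, value, distance) as CSV,
    overwriting whatever the previous timestamp left in the file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_timestamp_rows(path):
    """Read the cached rows back, restoring their types."""
    with open(path, newline="") as f:
        return [(int(s), float(v), float(d)) for s, v, d in csv.reader(f)]

path = os.path.join(tempfile.gettempdir(), "bucket_cache.csv")
write_timestamp_rows(path, [(1, 10.0, 1.0), (2, 20.0, 4.0)])
rows = read_timestamp_rows(path)
```

Because only one timestamp's data lives in the file at a time, only a single database result set needs to be open, sidestepping the missing MARS support.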
Table valued function:
This is the pure SQL version of the user defined functions presented so far. Here a single SQL statement performs
the interpolation. The statement is parsed by the server itself, which then executes its query execution plan (QEP).
This plan is not well optimized, so execution is slower than the write-file method.
Non-CLR function:
This function queries the database over a SQL connection and retrieves result set rows one after the other. The
advantage of this function is that multiple active result sets can be open at the same time. The drawback is more
I/O, as results must be retrieved from the database row by row. Thanks to the multiple active result sets, this
function performs best among all the stored procedures so far.
Chunk Processing function:
The processing is done completely outside the database, in chunks of uniform size rather than in chunks holding the
data of one timestamp. The performance is slightly worse than the CSV file based function. This may be because,
although disk I/O is reduced, processing in "chunks" of tuples rather than chunks of timestamps forces the program
to store previous data in a hash map, so that when the next chunk of data belonging to the same timestamp is
processed, the earlier data can be reused.
The algorithm uses the size of the chunk or the value of the timestamp as a stopping point, whichever comes first.
So if we process in chunks of 20 tuples and the data contains 42 tuples, it gets processed as sets of 20, 20, and 2
tuples. There is therefore not only the burden of storing data from previous chunks, but also a waste of space (and
hence time) while processing the remaining 2 tuples.
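The carry-over between chunks can be sketched as below. This is an illustrative reconstruction (the function name and the use of running sums/counts per timestamp are assumptions): the dict plays the role of the hash map that holds partial results when one timestamp's tuples straddle a chunk boundary:

```python
def process_in_chunks(rows, chunk_size):
    """Process (timestamp, value) rows in fixed-size chunks. Partial sums
    for a timestamp split across chunks are carried in a dict (the 'hash
    map' from the text) so later chunks can reuse earlier data."""
    carry = {}  # timestamp -> (running sum, running count)
    for start in range(0, len(rows), chunk_size):
        for ts, value in rows[start:start + chunk_size]:
            s, c = carry.get(ts, (0.0, 0))
            carry[ts] = (s + value, c + 1)
    # once all chunks are consumed, emit the per-timestamp averages
    return {ts: s / c for ts, (s, c) in carry.items()}

# "t1" straddles the chunk boundary, so its partial sum is carried over.
avgs = process_in_chunks([("t1", 10), ("t1", 20), ("t2", 4)], 2)
```

With a chunk size of 2 and 3 tuples, the data is processed as sets of 2 and 1, mirroring the 20-20-2 example in the text.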
Grid size | Gridcalculate (cursor based) | ShowGrid (join based) | WriteFile (CSV file based) | Non-CLR          | Chunk processing
1,000     | 34 min                       | 4 min 15 sec          | 1 min 56 sec               | 1 min 32 sec     | 2 min 29 sec
10,000    | NA                           | 53 min 41 sec         | 19 min 33 sec              | 15 min 40 sec    | 25 min 14 sec
50,000    | NA                           | 4 hr 21 min 50 sec    | 1 hr 39 min 25 sec         | 1 hr 6 min 38 sec | 2 hr 15 min 29 sec
100,000   | NA                           | NA                    | 3 hr 16 min 19 sec         | 2 hr 19 min 31 sec | 4 hr 29 min
Table 2: Performance comparison of user defined functions.
Note: The experiment was carried out on an octa-core server.
2.3. Query Transformation
SQL queries are issued on the grid data as if the interpolation were already done. We need to parse these SQL
statements, perform the interpolation, and substitute the results wherever the grid table is used.
For example:
select * from grid ,sensor where xval between 1 and 10 and yval between 1 and 10 and s_id=1
For this purpose we require a transformation tool which parses the where clause of the SQL query for arguments and
replaces the grid table with the user defined function, passing the parsed arguments as input.
JavaCC is the most popular parser generator for the Java platform. As it concentrates on parser generation in only
one language, it produces fewer errors. It is easy to use, and it provides functions to auto-document the parser, to
generate a parser given the AST (Abstract Syntax Tree), to dump the AST, and to perform a specific action when
specific nodes are encountered, which is exactly what we want.
JavaCC transforms the SQL query into an AST with a set of nodes. We parse the node formed by the where clause and
extract the arguments for the user defined function from it, outputting the rest of the where clause as is. We then
find the node whose table name is grid and replace it with the user defined function, with the parsed arguments as
input.
Output: select * from showgrid(1,1,10,10,1),sensor
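The transformation on this example can be sketched as below. The real tool walks a JavaCC-built AST; this is a deliberately simplified regex sketch that handles only the example query's shape, to make the input-to-output mapping concrete:

```python
import re

def transform(sql):
    """Simplified sketch of the grid substitution (the real tool uses a
    JavaCC AST, not regexes). Extract the xval/yval ranges and s_id from
    the where clause and replace the grid table with a showgrid(...) call."""
    x = re.search(r"xval between (\d+) and (\d+)", sql)
    y = re.search(r"yval between (\d+) and (\d+)", sql)
    s = re.search(r"s_id=(\d+)", sql)
    # argument order assumed from the example: xmin, ymin, xmax, ymax, s_id
    args = f"{x.group(1)},{y.group(1)},{x.group(2)},{y.group(2)},{s.group(1)}"
    head = sql.split(" where ")[0]
    return head.replace("grid", f"showgrid({args})").replace(" ,", ",")

q = ("select * from grid ,sensor where xval between 1 and 10 "
     "and yval between 1 and 10 and s_id=1")
# -> select * from showgrid(1,1,10,10,1),sensor
out = transform(q)
```

The where clause is consumed entirely here because all of its predicates feed the function's arguments; in the general case the remaining predicates are emitted unchanged, as the text describes.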

More Related Content

What's hot

Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
IJDKP
 
Parallel Algorithm Models
Parallel Algorithm ModelsParallel Algorithm Models
Parallel Algorithm Models
Martin Coronel
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
Pietro Michiardi
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
Divya Grover
 
A comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistributionA comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistribution
IJCNCJournal
 
Skyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentSkyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed Environment
IJMER
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
EuroIoTa
 
An Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingAn Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather Forecasting
IOSR Journals
 
Load balancing
Load balancingLoad balancing
Load balancing
Pooja Dixit
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
Maria Stylianou
 
Soft computing based cryptographic technique using kohonen's selforganizing m...
Soft computing based cryptographic technique using kohonen's selforganizing m...Soft computing based cryptographic technique using kohonen's selforganizing m...
Soft computing based cryptographic technique using kohonen's selforganizing m...
ijfcstjournal
 
Implementation of query optimization for reducing run time
Implementation of query optimization for reducing run timeImplementation of query optimization for reducing run time
Implementation of query optimization for reducing run time
Alexander Decker
 
Big Data on Implementation of Many to Many Clustering
Big Data on Implementation of Many to Many ClusteringBig Data on Implementation of Many to Many Clustering
Big Data on Implementation of Many to Many Clustering
paperpublications3
 
IRJET- Clustering the Real Time Moving Object Adjacent Tracking
IRJET-  	  Clustering the Real Time Moving Object Adjacent TrackingIRJET-  	  Clustering the Real Time Moving Object Adjacent Tracking
IRJET- Clustering the Real Time Moving Object Adjacent Tracking
IRJET Journal
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016
ijcsbi
 
Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...
IRJET Journal
 
Solution(1)
Solution(1)Solution(1)
Solution(1)
Gopi Saiteja
 

What's hot (20)

Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Parallel Algorithm Models
Parallel Algorithm ModelsParallel Algorithm Models
Parallel Algorithm Models
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
 
A comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistributionA comparison of efficient algorithms for scheduling parallel data redistribution
A comparison of efficient algorithms for scheduling parallel data redistribution
 
Skyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentSkyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed Environment
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
 
An Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingAn Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather Forecasting
 
Load balancing
Load balancingLoad balancing
Load balancing
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
 
Soft computing based cryptographic technique using kohonen's selforganizing m...
Soft computing based cryptographic technique using kohonen's selforganizing m...Soft computing based cryptographic technique using kohonen's selforganizing m...
Soft computing based cryptographic technique using kohonen's selforganizing m...
 
Implementation of query optimization for reducing run time
Implementation of query optimization for reducing run timeImplementation of query optimization for reducing run time
Implementation of query optimization for reducing run time
 
Big Data on Implementation of Many to Many Clustering
Big Data on Implementation of Many to Many ClusteringBig Data on Implementation of Many to Many Clustering
Big Data on Implementation of Many to Many Clustering
 
IRJET- Clustering the Real Time Moving Object Adjacent Tracking
IRJET-  	  Clustering the Real Time Moving Object Adjacent TrackingIRJET-  	  Clustering the Real Time Moving Object Adjacent Tracking
IRJET- Clustering the Real Time Moving Object Adjacent Tracking
 
TO_EDIT
TO_EDITTO_EDIT
TO_EDIT
 
assignment_3
assignment_3assignment_3
assignment_3
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016
 
Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...
 
Solution(1)
Solution(1)Solution(1)
Solution(1)
 

Similar to user_defined_functions_forinterpolation

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangEugine Kang
 
VCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxVCE Unit 01 (1).pptx
VCE Unit 01 (1).pptx
skilljiolms
 
Data Structures Notes
Data Structures NotesData Structures Notes
Data Structures Notes
RobinRohit2
 
Parameter Estimation User Guide
Parameter Estimation User GuideParameter Estimation User Guide
Parameter Estimation User GuideAndy Salmon
 
Ca unit v 27 9-2020
Ca unit v 27 9-2020Ca unit v 27 9-2020
Ca unit v 27 9-2020
Thyagharajan K.K.
 
Concepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, AttentionConcepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, Attention
SaumyaMundra3
 
Transaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptxTransaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptx
Roshni814224
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptx
ShimoFcis
 
Erik Proposal Final
Erik Proposal FinalErik Proposal Final
Erik Proposal FinalErik Messier
 
CS150_Project Report
CS150_Project ReportCS150_Project Report
CS150_Project ReportJingwei You
 
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceParallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
University of Technology - Iraq
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsJigisha Aryya
 
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
Florent Renucci
 
Data structure introduction
Data structure introductionData structure introduction
Data structure introduction
NavneetSandhu0
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
eSAT Journals
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
eSAT Publishing House
 
House price prediction
House price predictionHouse price prediction
House price prediction
SabahBegum
 

Similar to user_defined_functions_forinterpolation (20)

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine Kang
 
VCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxVCE Unit 01 (1).pptx
VCE Unit 01 (1).pptx
 
Data Structures Notes
Data Structures NotesData Structures Notes
Data Structures Notes
 
Parameter Estimation User Guide
Parameter Estimation User GuideParameter Estimation User Guide
Parameter Estimation User Guide
 
Project Report (Summer 2016)
Project Report (Summer 2016)Project Report (Summer 2016)
Project Report (Summer 2016)
 
Ca unit v 27 9-2020
Ca unit v 27 9-2020Ca unit v 27 9-2020
Ca unit v 27 9-2020
 
Concepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, AttentionConcepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, Attention
 
Transaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptxTransaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptx
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptx
 
Erik Proposal Final
Erik Proposal FinalErik Proposal Final
Erik Proposal Final
 
CS150_Project Report
CS150_Project ReportCS150_Project Report
CS150_Project Report
 
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault ToleranceParallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
Parallel Algorithms: Sort & Merge, Image Processing, Fault Tolerance
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systems
 
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
 
Data structure introduction
Data structure introductionData structure introduction
Data structure introduction
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
 
House price prediction
House price predictionHouse price prediction
House price prediction
 

user_defined_functions_forinterpolation

  • 1. User defined functions for performing interpolation 1. Introduction Interpolation or kringing is the process of finding out the sensor data values at all points of a grid.Environmental scientists record the datavalues for each timestamp at all points where the sensors are located.Using these values we find out the data values at all the points on a specified grid.A grid consists of a two dimensional set of points at equal intervals on a map. Such a two-dimensional set of points on a map help to capture the environmental state of the database at a particular timestamp. Grid is a set of two dimensional points equally spaced. The distance between consecutive points is decided by the granularity of the grid. It is a two-dimensional evaluation space. The values of a particular measurement are calculated at each point of the grid. These values at all points of the grid represent the environmental state at a particular timestamp. 2. System Architecture 2.1. Data smoothing: Done as sql queries inside the database The data first needs to be smoothened. The smoothening of data is performed as follows: 1) A reference point is considered. The difference of the data timestamp and the reference timestamp is divided into n minute intervals. Each timestamp is put into n minute intervals. Each timestamp is put into its own interval by calculating the floor of its division with n. For example consider a data timestamp 11.40 PM,31-08-2007.The time difference between this time stamp and the reference timestamp is calculated and divided into n minute interval(Supposing the data needs to be smoothed into n minute intervals)Each timestamp is put into its own interval by calculating the floor of its division with n. For example, consider a data timestamp 11:35 PM,31-08007.The time difference between this timestamp and the reference timestamp is calculates in minutes. 
This difference in minutes is divided by 30(supposing the smoothening needs to be performed in 30 min intervals).The floor of this division is calculated and multiplied by 30.This result is added to the reference timestamp. This mathematical calculation returns the bucket for this timestamp which is 11.30 pm, 31-08-07.All the timestamps lying between 11.30 and 12 pm of 31-08007 fall into this bucket. The average of data values of all such rows is calculated and is considered as the cumulative value at this timestamp (11.30 pm, 31-08-2007). Once the data has been smoothened with reference to the time-interval given, this data is used as input to the user defined functions written. 2.2. User defined functions General working-- The user defined functions first takes as input the maximum, minimum values of x and y to calculate the grid for which environmental state needs to be calculated. For each such grid point, the effect of all the sensors which measure the input measurement must be found out. For measuring the effect of each such sensor on the specified grid point, we take into consideration two things: 1) The distance of each sensor from the given grid point. 2) The value of the measurement at that particular sensor (please note that all the values mentioned here are with respect to a particular timestamp) The effect of each sensor on the value at a particular timestamp is: 1) Inversely proportional to the distance of sensor from the grid point 2) Directly proportional to value of sensor. Hence, the equation that we use is: Where CE represents the cumulative effect on a single grid point V1 ,…Vn represent the values at sensor points 1..n. d1,…dn represent the distance metric between the sensor point and the actual . Therefore, for each grid point, the sensors calculating the given measurement are taken. This data is then smoothened as described before according to a specified interval.
  • 2. The smoothed data is then interpolated for each interval using the formula above. This is calculated for every interval. Sql query generation:queries are passed by the user oblivious to the processing that is taking place wrt the query. parsing and transforming sql queries:The queries passed by the user are parsed and a query tree is formed.The arguments for the user defined functions are extracted from this parse tree and passed to the functions.The functions replace the grid wherever it is used in the query. Cursor based approach: This is the slowest user defined function in terms of performance. Input: maximum, minimum co-ordinates of grid and measurement id. Output: interpolated environmental state. A cursor returns the grid points lying between maximum, minimum points given as input. The cursor then gives each row of its result set to point-influence, a function which calculates influence of sensors at each point for a given measurement. Inside this function, a cursor returns the smoothened data for a particular measurement. It also orders them by time. For each timestamp all the sensor data is used in the formula above to calculate influence at a particular grid point. This is carried for each timestamp. This user defined function performs slowly because a cursor reads each value from the database one after the other. There is no buffer which stores a group from which we can read easily. As each call has to do database I/O,the function is the slowest of all the other functions. Data structure based approach: It is possible to write CLR enabled functions using .net languages to perform interpolation.CLR enabled functions query the database and return a data reader class. The data reader acts as a buffer for the queried data. From this data reader each value can be read one at a time and no overhead occurs on database I/O. But there is a drawback to such an approach. 
The environmental data, which spans sensors all over Switzerland and three years of time, is quite huge, taking several terabytes. Storing a subset of such data in in-memory data structures is not advisable: as the grid grows with finer granularity, the data structures might not be able to hold such a magnitude of data. There is a substitute for this approach in case the problem occurs; it is discussed in the next function.

Write-file function: In this function, we query the database for the relevant grid points and the results are returned as a data reader class. Along with this, the data values for a particular measurement are also queried inside the database. These two data readers are enough to calculate the interpolation values at a particular timestamp. The second data reader is then opened and queried for the next timestamp to calculate the grid values at that timestamp, and so on. In MS SQL Server 2005, to open two result sets at a time you need to set the MARS (Multiple Active Result Sets) option to true. This option is not enabled for the MS SQL Server 2005 native client, but MS SQL Server 2005 is the version used in SwissEx. The workaround is to query the database for the first timestamp and store the results in a local file, which is equivalent to caching the results in a data reader. Therefore we write the results for the first timestamp into a file as comma separated values (CSV), then read the file to calculate the interpolation. The results for the next timestamp then overwrite the file, and so on.

Table valued function: This is the SQL version of the user defined functions presented so far. In this function a single SQL statement performs the interpolation.
This SQL statement is parsed by the server itself, which then builds the QEP (query execution plan). This plan is not optimized, hence the execution is slower than the write-file method.

Non-CLR function: This function queries the database over an SQL connection and retrieves result set rows one after the other. The advantage of this function is that we can have multiple active result sets open. The drawback is more I/O, as we need to retrieve results from the database row by row. This function performs the best among all the stored procedures so far, due to the multiple active result sets that we can keep open.

Chunk Processing function:
The processing is done completely outside the database, in chunks of uniform size instead of chunks dependent on the data for a particular timestamp. The performance is a little worse than the CSV file based function. This can be because, though we have decreased the disk I/O, processing in "chunks" of tuples rather than chunks of timestamps forces the program to store previous data in a hash map, so that when the next chunk of data related to the same timestamp is processed, it can use the previous data. The algorithm uses the size of the chunk or the value of the timestamp as a stopping point (whichever comes first). So, if we are processing in chunks of 20 tuples and the data contains 42 tuples, it gets processed in sets of 20, 20 and 2 tuples. Therefore, there is not only the burden of storing data from previous processing, but also a wastage of space (and hence time) while processing the remaining 2 tuples.

Rows      Gridcalculate    ShowGrid          WriteFile           Non-CLR           Chunk processing
          (cursor based)   (join based)      (CSV file based)
1,000     34 min           4 min 15 s        1 min 56 s          1 min 32 s        2 min 29 s
10,000    NA               53 min 41 s       19 min 33 s         15 min 40 s       25 min 14 s
50,000    NA               4 h 21 min 50 s   1 h 39 min 25 s     1 h 6 min 38 s    2 h 15 min 29 s
100,000   NA               NA                3 h 16 min 19 s     2 h 19 min 31 s   4 h 29 min

Table 2: Performance comparison of user defined functions.
Note: The experiment was carried out on an octa-core server.

2.3. Query Transformation

SQL queries are issued on the grid data assuming that the interpolation is already done. We need to parse the SQL statements to perform the interpolation and substitute the results wherever the grid table is used.
For example:

select * from grid ,sensor where xval between 1 and 10 and yval between 1 and 10 and s_id=1

For this purpose we require a transformation tool which transforms the SQL query, parsing the where clause for arguments and replacing the grid table with the user defined function, with the parsed arguments as input. JavaCC is the most popular parser generator for the Java platform. As it concentrates on parser generation in only one language, it gives rise to fewer errors. It is easy to use and contains functions to auto-document the parser and to generate a parser given the AST (Abstract Syntax Tree). It also contains functions to dump the AST and to perform a specific action when specific nodes are encountered, which is exactly what we want. JavaCC transforms the SQL query into an AST with a set of nodes. We parse the node formed by the where clause and extract the arguments for the user defined function from it, outputting the rest of the where clause as is. We then find the node with the table name grid and replace it with the user defined function, with the parsed arguments as input.

Output:

select * from showgrid(1,1,10,10,1),sensor
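As a rough illustration, the rewrite above can be mimicked in Python with a regular expression. This is only a toy stand-in for the JavaCC-based AST transformation: it handles just the exact "xval between ... and yval between ... and s_id=..." pattern of the example, and, like the example's output, it drops the consumed where clause.

```python
import re

def rewrite(query):
    """Replace the 'grid' table with a call to the showgrid() user defined
    function, pulling its arguments out of the where clause.
    Toy stand-in for the JavaCC transformer described above."""
    m = re.search(
        r"xval between (\d+) and (\d+) and "
        r"yval between (\d+) and (\d+) and s_id=(\d+)",
        query, re.IGNORECASE)
    if not m:
        return query  # pattern not found: leave the query untouched
    xmin, xmax, ymin, ymax, sid = m.groups()
    head = query[:query.lower().index(" where ")]
    # Argument order (xmin, ymin, xmax, ymax, s_id) follows the example output.
    return head.replace("grid", f"showgrid({xmin},{ymin},{xmax},{ymax},{sid})")

q = ("select * from grid ,sensor where xval between 1 and 10 "
     "and yval between 1 and 10 and s_id=1")
print(rewrite(q))  # select * from showgrid(1,1,10,10,1) ,sensor
```

A real transformer would, as the text describes, walk the parse tree and preserve any remaining where-clause predicates instead of matching a fixed pattern.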