Management and Analysis of Large Scale Heterogeneous Time-Series Data


Copyright Martin Litzenberger at CeDEM14


  1. Management and Analysis of Large Scale Heterogeneous Time-Series Data. Sensor and Government Data: Their Role in Public Policy. Martin Litzenberger, Senior Engineer, DSS SNI, Safety and Security Department, AIT Austrian Institute of Technology.
  2. Motivation: Public institutions today collect a plethora of heterogeneous data with various sensors. But the data and their use are usually restricted to the domain or department they belong to, e.g. security surveillance, traffic, public transport, air quality, power grids. Reasons: lack of interoperability, and often a lack of communication and cooperation among data owners. 23.05.2014
  3. Advantages: Connecting these data, or even collecting them on a common platform, would allow new ways of analysis and insight into important and interesting mechanisms (e.g. traffic / air quality). But the data are heterogeneous in many aspects, such as format, update frequency, representation, owner and accessibility, which makes a joint analysis a big challenge. Real-time 24/7 processing and availability, not a “one-time” academic investigation!
  4. Challenge: Heterogeneity of Data. Temporal heterogeneity: discrete events versus regular time series. Spatial heterogeneity: “on-site” versus “as near as possible”. Semantic heterogeneity: the same parameters might have different significance in different contexts. Technical heterogeneity: non-standardized interfaces, formats, etc. Political heterogeneity: “owners” of data have different missions and goals.
  5. Case Study: Investigating effects of traffic state (free flow / stop & go) on local air quality. Data sources: traffic monitor for traffic volume and acceleration; black carbon sensor at the road side and at a background station; meteorological station.
  6. Case Study: Combined Air Quality and Traffic Monitoring. Different owners: City Council, State AQ Department and the project's own sensors. Different data intervals: traffic data arrive per individual vehicle (~4000 data sets per hour, each with speed, acceleration and vehicle class!); air quality and meteo data arrive at a fixed frequency as 30 min averages (48 data points/day). Pre-processing: temporal alignment and aggregation.
  7. Goal: Investigating a “black carbon equivalent” for traffic. Accelerating cars have higher tailpipe emissions than “free flowing” vehicles. Approach: $Q_{\text{“BC”}} = Q_{\text{total-vehicles}} + 6 \cdot Q_{\text{accelerating-vehicles}}$ (this can be made even more complex by including weight factors for HGV etc.). Local (road-side) black carbon concentrations need to be reduced by “background” values to “isolate” the traffic-related component: $C_{\text{BC}} = C_{\text{road}} - C_{\text{background}}$. And of course wind speed is of interest at the same time!
  8. Solution: What is openUwedat? OpenUwedat is a toolbox for building time-series applications. The toolbox contains many ready-made, adaptable programs. The toolbox also contains libraries for writing your own programs, which integrate seamlessly with the existing ones. (Diagram labels: Driver, Database, configurable.)
  9. What can I do with openUwedat? openUwedat can interact with any kind of time-series device. You can integrate new devices by writing new modules which act as “drivers”. Typical devices are: measurement devices, data acquisition systems (station computers), other time-series management systems, databases (SQL and NoSQL), …
  10. Implementation in openUwedat: Powerful scripting language “Formula 3”. Real-time interfaces and real-time processing pipes. Example code showing how to implement the BC-equivalent function in Formula 3:
      @A="name=Database; type=Aggregation;Source=TDS;Sensor=S4.TDS1;Lane=0"
      @B="name=Database; type=Aggregation;Source=TDS;Sensor=S4.TDS1;Lane=1"
      <<(A.accCount[i]+B.accCount[i]+A.decCount[i]+B.decCount[i])* talFlow[i]+B.totalFlow[i]>> | << sum( _ ]t-60mins..t] ) >> every 60 mins
  11. Typical Result, Traffic / Air Quality: Very good correlation, but it depends on meteorological conditions! During episodes of stronger wind, the correlation drops.
  12. Conclusions: Plenty of heterogeneous data are collected on a regular basis by public authorities, day by day. The potential to analyse these data together remains mostly unused because of a lack of cooperation between authorities / departments and a lack of interoperability of the systems. The case study on traffic / air quality shows the potential of heterogeneous data analysis to create new insights. AIT’s OpenUwedat data management toolbox allows collection of large-scale heterogeneous time-series data from different sources, and complex analysis using a powerful scripting language.
  13. AIT Austrian Institute of Technology, your ingenious partner. Martin Litzenberger
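
The pre-processing step named on slide 6 (temporal alignment and aggregation) can be illustrated with a minimal Python sketch. It is not the openUwedat implementation. It assumes per-vehicle records of the form (timestamp, speed, acceleration), assumes that the 30 min air quality averages are keyed by the start of their interval, and uses a hypothetical threshold `acc_threshold` to decide when a vehicle counts as accelerating (the slides do not define one). The sample records are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 30-minute interval."""
    return ts - timedelta(
        minutes=ts.minute % 30, seconds=ts.second, microseconds=ts.microsecond
    )


def aggregate_vehicles(records, acc_threshold=0.5):
    """Count vehicles per 30-minute bin.

    records: iterable of (timestamp, speed, acceleration), one per vehicle.
    acc_threshold: hypothetical cut-off above which a vehicle is counted
    as accelerating (an assumption, not taken from the slides).
    Returns {bin_start: {"total": n, "accelerating": m}}.
    """
    bins = defaultdict(lambda: {"total": 0, "accelerating": 0})
    for ts, _speed, acc in records:
        counts_for_bin = bins[bin_start(ts)]
        counts_for_bin["total"] += 1
        if acc > acc_threshold:
            counts_for_bin["accelerating"] += 1
    return dict(bins)


# Made-up example records, for illustration only.
records = [
    (datetime(2014, 5, 23, 8, 1, 10), 12.0, 0.9),
    (datetime(2014, 5, 23, 8, 17, 45), 14.5, 0.1),
    (datetime(2014, 5, 23, 8, 31, 0), 9.0, 1.2),
]
counts = aggregate_vehicles(records)
```

After this step the irregular vehicle events and the fixed-frequency sensor values share the same 30 min time base and can be joined on the bin start.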
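
The two formulas on slide 7 can be written as a short Python sketch. The weight 6 comes from the slide; the function and parameter names are illustrative assumptions, and the HGV weight factors the slide mentions are left out.

```python
ACC_WEIGHT = 6  # weight for accelerating vehicles, as given on slide 7


def bc_equivalent(total_vehicles, accelerating_vehicles, acc_weight=ACC_WEIGHT):
    """Traffic "black carbon equivalent": Q_total + 6 * Q_accelerating."""
    return total_vehicles + acc_weight * accelerating_vehicles


def traffic_bc(c_road, c_background):
    """Traffic-related black carbon: road-side minus background concentration."""
    return c_road - c_background
```

For example, `bc_equivalent(2000, 150)` gives 2900, and `traffic_bc(3.5, 1.0)` gives 2.5 in whatever concentration unit the sensors report.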
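
The last stage of the Formula 3 pipe on slide 10, `sum( _ ]t-60mins..t] ) every 60 mins`, appears to sum the incoming values over a 60-minute window once per hour. The Python sketch below mimics that behaviour. It assumes that `]t-60mins..t]` denotes an interval that excludes its left end and includes its right end; this is a reading of the notation, not something the slides confirm, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def windowed_sums(samples, start, end, window=timedelta(minutes=60)):
    """Sum values over consecutive windows (t - window, t].

    samples: list of (timestamp, value) pairs.
    Emits one (t, total) pair for t = start + window, start + 2 * window, ...
    up to and including end.
    """
    results = []
    t = start + window
    while t <= end:
        # Left end excluded, right end included, as in ]t-60mins..t].
        total = sum(value for ts, value in samples if t - window < ts <= t)
        results.append((t, total))
        t += window
    return results
```

With samples at 08:00, 08:30, 09:00, 09:30 and 10:00, the sum emitted at 09:00 covers the 08:30 and 09:00 samples but not the one at 08:00, which belongs to the previous window.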
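
The observation on slide 11, that the correlation drops during episodes of stronger wind, suggests evaluating the correlation separately for calm and windy intervals. The sketch below shows one way to do that with a plain Pearson coefficient. The slides do not say which correlation measure was used, so Pearson is an assumption, and `wind_limit` is a hypothetical threshold the caller must choose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlation_by_wind(rows, wind_limit):
    """Split rows by wind speed and correlate each group separately.

    rows: iterable of (bc_equivalent, traffic_bc, wind_speed) per interval.
    wind_limit: hypothetical wind-speed threshold separating calm from windy.
    Returns (correlation_calm, correlation_windy).
    """
    calm = [(q, c) for q, c, wind in rows if wind <= wind_limit]
    windy = [(q, c) for q, c, wind in rows if wind > wind_limit]

    def corr(pairs):
        if not pairs:
            return None
        qs, cs = zip(*pairs)
        return pearson(qs, cs)

    return corr(calm), corr(windy)
```

Comparing the two returned values for different choices of `wind_limit` would show at which wind speeds the relation between traffic and road-side black carbon starts to break down.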