Why Smart Meters Need Informix TimeSeries
Informix Update - This presentation was given at IBM Data Server Day on 22 May in Stockholm by Simon David, Technical Product Manager, Competitive Technologies & Enablement, Informix Development

Transcript

  • 1. Why Smart Meters Need Informix TimeSeries. IBM Data Server Day, Stockholm 2012-05-22. Cosmo@uk.ibm.com © 2012 IBM Corporation
  • 2. Please Note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  • 3. Acknowledgements and Disclaimers: Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. © Copyright IBM Corporation 2012. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM, the IBM logo, ibm.com and IBM Informix are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml Other company, product, or service names may be trademarks or service marks of others.
  • 4. Why Smart Meters Need Informix TimeSeries
      - What challenges are being faced in the Energy & Utilities sector today?
      - What is a Smart Meter and how can it help?
      - How does Informix TimeSeries fit in?
      - Case studies
          - 1M Oncor PoC
          - 35M internal benchmark
          - 100M AMT Sybex benchmark
  • 5. Consumers need Smart Meters
      - Samuel Palmisano, chief executive officer of International Business Machines Corp., said improving the U.S. electric-transmission grid depends on providing better information to consumers. Companies shouldn’t wait for government to set standards for data and technologies to create a "smart grid," which lets consumers monitor their energy use and take conservation steps that can save energy and money. [September 21, 2010, at the Gridwise Global Forum in Washington]
  • 6. Energy Usage Issues
      - Emission reduction goals:
          - EU 20% emissions reduction by 2020 as compared to 1990.
          - UK is 60% reduction by 2050.
      - Long lead times for new, “clean” energy supply.
      - Lasting legacy of energy inefficiency:
          - 80% of refrigerators bought in 2007 will be in use in 2020.
          - Less than 1/3 of industrial infrastructure will be replaced by 2020.
          - Over 20% of cars bought in 2007 will still be on the road in 2020.
      - Household efficiency a priority:
          - 25-30% of carbon emissions are from regular households.
          - 80% of home energy usage is heating.
          - EC projects 27% savings through efficiency in buildings.
  • 7. Why Smart Meters
      - Access to near-real-time electricity usage information.
      - Better control and management of electricity usage.
      - Enable retail electric providers to develop and offer new, innovative plans that will lower consumer bills.
      - Help make smarter decisions and change behaviours to help reduce consumption, or modify usage patterns.
      "Smart meter" often refers to an electrical meter, but it can also mean a device measuring natural gas or water consumption.
  • 8. Who is Using Smart Meters
      - Utility companies:
          - In the U.S., stimulus money used for smart meters.
          - Main drive is not reducing billing costs.
          - Better analysis of usage patterns.
          - Can different tariffs change energy consumption?
      - Consumers:
          - Looking to reduce energy costs.
          - Wanting to improve their green credentials.
      - Governments:
          - Need to show improvements in emissions.
          - Want to reduce energy consumption/reliance.
  • 9. Smart Meters Solve Real Problems
      - Real-time information on energy usage.
      - Gain control over personal energy usage
          - Modify electrical consumption:
              - California study: reductions of 5.7% to 8.7%.
              - Norwegian study: reductions of 9%.
              - UK study: reduction of 12%.
              - Oncor, Texas: reductions of 5%-10%.
      - Power companies:
          - Develop new innovative rate plans.
          - Avoid building new plants.
          - Avoid buying power from other sources.
          - Meet Green standards.
          - Reliable power restored quicker after outages.
  • 10. Data issues with Smart Meters
      - Data issues - terabytes of new data:
          - Ability to bring on new meters.
          - Stores data for new regulatory reasons.
          - Analyse usage.
          - Automatically read meters.
      - New data, new applications:
          - Billing
          - Portal
          - Compliance
          - New analytics
          - Combine meter and weather data
  • 11. Informix Timeseries Overview
  • 12. What is Time Series Data?
      - Time series data is:
          - A set of data where each item is time-stamped
              - Think of an array where each element can be indexed by time or by a timestamp: “Give me the Jan 1st element from time series X”
      - Most useful when a range of data is normally read: “Give me the Jan 1st thru Jan 10th elements from time series X”
      - Access to one time series is usually completed before moving to the next time series.
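The "array indexed by timestamp" idea on this slide can be sketched in a few lines of Python. This is only an illustration of the access pattern, not how Informix stores or reads data; the in-memory series `series_x` and the helper names `element_at` and `clip` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Hypothetical series "X": one (timestamp, value) pair per day of January 2010,
# kept sorted by timestamp so it can be searched like an array.
series_x = [(datetime(2010, 1, d), float(d)) for d in range(1, 32)]
stamps = [ts for ts, _ in series_x]

def element_at(when):
    """Return the value stamped exactly `when`, or None if there is none."""
    i = bisect_left(stamps, when)
    if i < len(stamps) and stamps[i] == when:
        return series_x[i][1]
    return None

def clip(start, end):
    """Return every element with start <= timestamp <= end."""
    lo = bisect_left(stamps, start)
    hi = bisect_right(stamps, end)
    return series_x[lo:hi]

# "Give me the Jan 1st element" and "Jan 1st thru Jan 10th elements".
jan1 = element_at(datetime(2010, 1, 1))
jan1_to_10 = clip(datetime(2010, 1, 1), datetime(2010, 1, 10))
```

Because the elements are held in time order, a range read is one search followed by a contiguous slice, which is the property the slide highlights.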
  • 13. How are Time Series Used?
      - They access the data by time range
          - Look at a range of data in the past
          - Make predictions about a range in the future
      - Their analysis is often very proprietary
      - Many keep large volumes of data online
      - Many take in huge volumes of data each second
      - All these markets use relational data as well
      - All need to combine their relational data with time series data
  • 14. Key Strengths of Informix Timeseries
      - Performance
          - Extremely fast data access
              - Data layout optimized on disk
          - Handles operations hard or impossible to do in standard SQL
      - Space savings
          - Can be over 70% space savings over standard relational layout
      - Toolkit approach allows users to develop their own algorithms
          - Algorithms run in the database to leverage buffer pool
      - Conceptually closer to how users think of time series
  • 15. Relational Time Series Representation
      Meter_ID  TimeStamp         phase1  phase2  ...  temp
      1         2010-06-01 00:00  1.3     0            15.6
      1         2010-06-01 00:30  1.6     0            15.6
      1         2010-06-01 01:00  1.4     0            15.5
      1         2010-06-01 01:30  1.4     0            15.4
      1         2010-06-01 02:00  1.4     0            15.5
      ...
      2         2010-06-01 00:00  0.4     0            12.3
      2         2010-06-01 00:30  0.3     0            12.3
      2         2010-06-01 01:00  0.2     0            12.2
      2         2010-06-01 01:30  0.5     0            12.3
      ...
      3         2010-06-01 00:00  0.0     3.5          13.6
      3         2010-06-01 00:30  0.0     4.3          12.2
      Growth: the table gains rows as readings arrive.
  • 16. Same Table Stored as a Time Series
      Meter_ID  Origin      00:00            00:30            01:00           01:30           ...
      1         2010-06-01  (1.3,0...15.6)   (1.6,0...15.6)   (1.4,0...15.5)  (1.4,0...15.4)
      2         2010-06-01  (0.4,0...12.3)   (0.3,0...12.3)   (0.2,0...12.2)  (0.5,0...12.3)
      3         2010-06-01  (0,3.5...13.6)   (0,4.3...12.2)
      - There are only as many rows as meters
      - Growth: each row is very long and grows as data is inserted
      - Very fast access to a timeslot once the Meter_ID is selected
      - Very fast to read a time-ordered set of values
  • 17. Informix TimeSeries
      - A “timeseries” datatype is available in Informix
          - First introduced by Illustra in 1996
      - Additional “objects” associated with timeseries:
          - “Calendar” datatype
              - For defining when data can be collected
          - Row types
              - For defining what should be collected
          - Containers
              - For defining where the data should be stored
          - Several support tables:
              - Calendar, tsinstancestable, tscontainertable
  • 18. Key Concepts: Regular Time Series
      - Data collected uniformly over time intervals is a “regular” time series
          - For example: daily, hourly, etc.
      - A regular time series always has exactly one record per interval
      - If an interval is missing data then:
          - Missing data on an existing page takes up (a little) space
          - If all the intervals for a page are missing data then the page takes no space
      - Values in one interval typically do not carry into the next
      - Can be thought of as an array of data
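To illustrate why a regular series "can be thought of as an array", the following Python sketch computes an array slot from a timestamp using only an origin and a fixed interval. It assumes a simple 30-minute calendar with no "off" periods; `ORIGIN`, `INTERVAL`, `slot_index` and `slot_time` are names made up for this example.

```python
from datetime import datetime, timedelta

# Assumed origin and interval for the illustration.
ORIGIN = datetime(2010, 6, 1)
INTERVAL = timedelta(minutes=30)

def slot_index(when):
    """Map a timestamp to its array slot; reject times that are off the grid."""
    offset = when - ORIGIN
    if offset < timedelta(0) or offset % INTERVAL != timedelta(0):
        raise ValueError("timestamp does not fall on an interval boundary")
    return offset // INTERVAL

def slot_time(index):
    """Recover the timestamp of a slot from its position alone."""
    return ORIGIN + index * INTERVAL

# One value per interval; None marks an interval with missing data.
readings = [1.3, 1.6, None, 1.4]
value_at_0130 = readings[slot_index(datetime(2010, 6, 1, 1, 30))]
```

Since the timestamp can always be recomputed from the slot position, it does not need to be stored with each element.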
  • 19. Key Concepts: Irregular Time Series
      - Irregular time series also use intervals of time, however:
          - Unlike regular time series, irregular time series can store more than one record in a given time interval
              - For instance, multiple alerts can occur in the same second
          - Missing data never takes any room on disk
      - Values in an irregular time series can be treated in two ways:
          - Values may persist until the next value arrives (stair step)
              - Total usage counter
          - Values are only valid at their given time point and do not “persist” (discrete)
              - Power outage alert
      - Can also be thought of as an array of data
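The difference between the stair-step and discrete interpretations can be shown with a small Python sketch. It is a conceptual illustration only; the function `value_at` and its `persist` flag are hypothetical and are not part of the TimeSeries API.

```python
from bisect import bisect_right
from datetime import datetime

def value_at(series, when, persist):
    """Look up an irregular series (sorted (timestamp, value) pairs) at `when`.

    persist=True  -> stair step: the latest earlier value still applies.
    persist=False -> discrete: only a value stamped exactly `when` counts.
    """
    stamps = [ts for ts, _ in series]
    i = bisect_right(stamps, when) - 1
    if i < 0:
        return None  # nothing recorded at or before `when`
    ts, value = series[i]
    if persist or ts == when:
        return value
    return None

# Hypothetical total-usage counter with two readings.
counter = [
    (datetime(2010, 6, 1, 1, 3), 100),
    (datetime(2010, 6, 1, 1, 45), 105),
]
stair = value_at(counter, datetime(2010, 6, 1, 1, 30), persist=True)
point = value_at(counter, datetime(2010, 6, 1, 1, 30), persist=False)
```

A usage counter reads naturally as a stair step, while an outage alert only has meaning at the instant it was raised.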
  • 20. Key Concepts: Calendar Datatype
      - Every Timeseries has an associated calendar
      - A calendar is made up of several parts:
          - A name
          - A pattern of intervals
          - A start date
      - For instance, to create a calendar called “weekday” that starts on Jan 1 2010 and defines regular work days you would issue this query:
          INSERT INTO Calendartable (c_name, c_calendar)
          VALUES ('weekday', 'start(2010-01-01 00:00:00), pattern({5 on, 2 off}, day)');
      - The system catalog called “calendartable” holds all the calendars that have been defined
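As a rough sketch of what a pattern such as `{5 on, 2 off}` means, the Python below expands it into a list of dated on/off flags. It assumes the pattern simply repeats from the start date, which is a simplification of the calendar semantics; `expand_pattern` is a made-up helper.

```python
from datetime import date, timedelta
from itertools import cycle, islice

def expand_pattern(start, pattern, count):
    """Expand [(run_length, is_on), ...] into `count` (day, is_on) pairs."""
    flags = [on for length, on in pattern for _ in range(length)]
    return [
        (start + timedelta(days=i), on)
        for i, on in enumerate(islice(cycle(flags), count))
    ]

# Two weeks of the slide's pattern, starting on its start date.
days = expand_pattern(date(2010, 1, 1), [(5, True), (2, False)], 14)
on_days = [d for d, on in days if on]
```

Under this simple reading the alignment with the working week depends on the start date: 1 January 2010 was a Friday, so a start date that falls on a Monday would be needed for the five "on" days to be Monday to Friday.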
  • 21. Key Concepts: Row Types
      - A Timeseries is made up of a series of timestamped rows
      - The granularity of the timestamp is 10 microseconds (.00001 seconds)
      - The SQL syntax that defines a row type is:
          CREATE ROW TYPE reading (tstamp DATETIME, phase1 DECIMAL,…)
          - NOTE: Timeseries requires the type of the first column (the type of tstamp) to be “datetime year to fraction(5)”
      - Data in the row can be missing (NULL)
          - Missing data takes no space in a time series
      - Rows can be marked as hidden
          - Useful for holidays and other times where data is not available
  • 22. Key Concepts: Containers
      - A “container” is the name given to the data structure that holds data for one or more time series.
      - It guarantees that time series data is stored clustered and sorted on disk
      - A container is explicitly created using this SQL syntax:
          EXECUTE PROCEDURE TsContainerCreate('cont_name', 'dbspace_name', 'rowtype_name', first_extent, next_extent);
          - dbspace_name is the name of an existing dbspace (predefined area of disk)
          - rowtype_name is the name of an existing row type
          - first_extent is the size of the first extent of storage
          - next_extent is the size of the subsequent extents of storage
      - TimeSeries 5.00 has an automatic container allocation mechanism
          - With no container definition the dbspace of the table is used
          - Otherwise user-defined pools can be used
          - Policy can be round robin or user defined
  • 23. Putting it all Together
      - Create a calendar for 30 minute intervals:
          INSERT INTO Calendartable (c_name, c_calendar)
          VALUES ('interval', 'start(2010-01-01 00:00:00), pattern({1 on, 29 off}, minute)');
      - Create a row type:
          CREATE ROW TYPE reading (tstamp datetime year to fraction(5), phase1 DECIMAL, phase2 DECIMAL, phase3 DECIMAL, temp DECIMAL);
      - Create a container:
          EXECUTE PROCEDURE TsContainerCreate ('int_cont1', 'tsdbs', 'reading', 1024, 1024);
      - Create a table and insert a “blank” row for 1 meter:
          CREATE TABLE meters (Meter_ID char(64), Actual timeseries(reading));
          INSERT INTO meters VALUES ("9908898", "origin (2010-01-01 00:00:00), calendar(interval), container(int_cont1), regular");
  • 24. Relational Storage – Traditional Index Method
      - Data pages have mixed Meter_IDs
      - Multiple page access required
      - Key stored in both index and data
      [Diagram: a B-tree with a root for 2010 and branch nodes keyed on Meter_ID, Start, End (MX001, MX002 ... MX1209980, each 00:00 to 23:30), leading to an index page whose entries point into a data page.]
      Index page (each entry also holds a pointer to its data row):
        Meter_ID  TStamp
        MX001     2010-06-01 00:00
        MX001     2010-06-01 00:30
        MX001     2010-06-01 01:00
        MX001     2010-06-01 01:30
        MX001     2010-06-01 02:00
      Data page:
        Meter_ID  TStamp            usage
        MX001     2010-06-01 00:00  1.6
        MX001     2010-06-01 00:30  1.8
        MX002     2010-06-01 12:30  3.6
        MX003     2010-06-01 06:00  8.2
        MX001     2010-06-01 01:00  4.7
  • 25. Relational Storage – “High Performance” Index Method
      - Only index page access required
      - But all data is stored in both index and data pages
      [Diagram: the same B-tree (root for 2010, branch nodes keyed on Meter_ID, Start, End for MX001, MX002 ... MX1209980), with the usage value carried in the index page as well as the data page.]
      Index page (each entry also holds a pointer to its data row):
        Meter_ID  TStamp            Usage
        MX001     2010-06-01 00:00  1.6
        MX001     2010-06-01 00:30  1.8
        MX001     2010-06-01 01:00  4.7
        MX001     2010-06-01 01:30  2.5
        MX001     2010-06-01 02:00  2.1
      Data page:
        Meter_ID  TStamp            Usage
        MX001     2010-06-01 00:00  1.6
        MX001     2010-06-01 00:30  1.8
        MX002     2010-06-01 12:30  3.6
        MX003     2010-06-01 06:00  8.2
        MX001     2010-06-01 01:00  4.7
  • 26. An Informix Table Containing a Timeseries Column
      The timeseries in the table is a physical reference to a mini-btree in a container.
        Meter_ID  Timeseries(reading)
        MX001     [container_A, 1]
        MX002     [container_B, 2]
        MX003     [container_A, 3]
        MX004     [container_C, 4]
        MX234     [container_C, 5]
        MX239     [container_B, 6]
        MX675     [container_C, 7]
        MX521     [container_C, 8]
      [Diagram: each row points into one of Container “A”, Container “B” or Container “C”.]
  • 27. Timeseries Container Layout
      - The btree index key is the time series id plus either:
          - An integer for regular time series
          - A timestamp for irregular time series
      - Each low-level page holds sorted data for exactly one time series
      [Diagram: index and twig pages above low-level pages labelled 4, 5, 7, 8, 12, 16.]
  • 28. Irregular Timeseries Storage Compared to Relational
      - Data values only stored once
      - No data pointers or pages
      - Multiple, smaller btrees
      [Diagram: a btree with a root for 2010-01-01 and nodes keyed on TS_ID, Start, End (TS_ID 1, 2 ... 1000, each 00:00 to 23:30), leading to a timeseries page; the relational pointer column and data page are shown for comparison.]
      Timeseries page (irregular):
        Meter_ID  TStamp            Usage
        MX001     2010-06-01 01:03  1.6
        MX001     2010-06-01 01:45  1.8
        MX001     2010-06-01 02:06  1.9
        MX001     2010-06-01 02:08  2.1
        MX001     2010-06-01 02:25  1.8
  • 29. Regular Timeseries Storage Compared to Relational
      - Data is only stored once
      - No timestamps or data pages
      - Multiple, smaller btrees
      [Diagram: a btree with a root for 2010-01-01 and nodes keyed on TS_ID, Start, End (TS_ID 1, 2 ... 1000, each 00:00 to 23:30), leading to a timeseries page; the relational timestamp and pointer columns and data page are shown for comparison.]
      Timeseries page (regular), values for MX001 at 00:00, 00:30, 01:00, 01:30, 02:00 on 2010-06-01:
        1.6, 1.8, 1.9, 2.1, 1.8
  • 30. Informix Timeseries Space Saving
      - There is a small overhead for the b-tree pages
          - Meter_ID and Timestamp stored
          - Also pointer to Timeseries page
      - Irregular Timeseries must store Timestamp for each element
          - 8 bytes extra overhead per element
      - Regular Timeseries uses known offsets
          - No Timestamp stored
          - Even more efficient
      - NULL data is compressed
          - NULL elements (missing regular elements) take zero space
          - Sparse arrays are not stored at all if no elements in time range
          - Unlike relational storage NULL values take NO SPACE
          - A row type of (DECIMAL(12), INTEGER, INTEGER) is 7 + 4 + 4 = 15 bytes
          - Storing (NULL, 1, NULL) would only require 4 bytes
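The NULL-compression arithmetic on this slide can be reproduced with a short Python sketch. It takes the column widths exactly as quoted on the slide and ignores any per-element bookkeeping the server may add, so it is an illustration of the claim rather than a model of the on-disk format; `stored_bytes` is a hypothetical helper.

```python
# Column widths as quoted on the slide: DECIMAL(12), INTEGER, INTEGER.
COLUMN_BYTES = (7, 4, 4)

def stored_bytes(row):
    """Bytes needed for one element if NULL columns occupy no space."""
    return sum(size for size, value in zip(COLUMN_BYTES, row) if value is not None)

full_row = stored_bytes((12.5, 1, 2))
sparse_row = stored_bytes((None, 1, None))
```

With these widths a fully populated element costs 15 bytes and the slide's `(NULL, 1, NULL)` example costs 4.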
  • 31. Worked Example – Relational Method
      Number of meters: 3,000,000
      Interval: 15 minutes (96 readings per day)
      Meter ID length: 8 bytes
      Timestamp length: 12 bytes
      Data length: 8 + 6 bytes + 2 bytes slot overhead
      Data space: 3000000 * 96 * ( 8 + 12 + 8 + 6 + 2 ) = 10GB
      Index space: 3000000 * 96 * ( 8 + 12 + 8 + 2 ) + 10% b+tree overhead = 9GB
      Total storage: 19GB per day
  • 32. Worked Example – Informix Timeseries
      Number of meters: 3,000,000
      Interval: 15 minutes (96 readings per day)
      Meter ID length: 64 bytes
      Timestamp length: 12 bytes
      Timeseries metadata: 86 bytes
      Data length: 8 + 6 bytes + 2 bytes slot overhead
      Fixed data space: 3000000 * ( 64 + 86 ) = 429MB
      Timeseries overhead: 3000000 * ( 12 + 4 + 2 ) + 10% = 66MB
      Variable data space: 3000000 * 96 * ( 8 + 6 + 2 ) = 4.4GB
      That is a huge saving of 76%
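The two worked examples can be recomputed directly from the formulas printed on the slides. The Python sketch below does only that: it evaluates the formulas in bytes, with the 10% overhead applied as an integer multiplication. The slides do not say how they round or which unit convention they use, so the results here differ slightly from the rounded figures shown; evaluated this way the saving comes out at roughly 74% rather than the quoted 76%.

```python
METERS = 3_000_000
READINGS_PER_DAY = 96  # 15-minute intervals

# Relational method (slide 31): every row repeats meter id and timestamp.
relational_data = METERS * READINGS_PER_DAY * (8 + 12 + 8 + 6 + 2)
relational_index = METERS * READINGS_PER_DAY * (8 + 12 + 8 + 2) * 110 // 100
relational_total = relational_data + relational_index

# TimeSeries method (slide 32): one fixed part per meter, then bare values.
ts_fixed = METERS * (64 + 86)
ts_overhead = METERS * (12 + 4 + 2) * 110 // 100
ts_variable = METERS * READINGS_PER_DAY * (8 + 6 + 2)
ts_total = ts_fixed + ts_overhead + ts_variable

saving = 1 - ts_total / relational_total

print(f"relational: {relational_total / 1e9:.2f} GB per day")
print(f"timeseries: {ts_total / 1e9:.2f} GB for the first day")
print(f"saving:     {saving:.1%}")
```

Note that the fixed and overhead terms are per meter rather than per reading, so on later days only the variable data term recurs.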
  • 33. Timeseries Simplicity – Example
      - Much simpler SQL: apply a tariff
      Relational:
          SELECT meter_id, SUM(value * 1.76)
          FROM meters
          WHERE (tstamp BETWEEN '2010-06-02 00:00' AND '2010-06-02 06:59')
             OR (tstamp BETWEEN '2010-06-02 21:00' AND '2010-06-02 23:59')
          GROUP BY 1;
      Timeseries:
          SELECT meter_id, apply_tariff(readings, tariff, '2010-06-02 00:00', '2010-06-02 23:59')::Timeseries(applied_cost)
          FROM meters;
      - But what if there is a missing value in the interval data?
      - What if you want to reference data outside the query range?
  • 34. Building Applications with the TimeSeries Datablade
      - Standard client access to server
          - ESQL/C
          - ODBC, JDBC, .NET
          - Perl DBD::Informix, PHP, Ruby
      - Several Timeseries-specific interfaces are available:
          - SQL
          - VTI
          - SPL
          - Java (client & server)
          - C-API (client & server)
      - It’s a toolkit approach!
          - Allow people to build their analytics in the server
  • 35. Informix Timeseries SQL Interface
      - Timeseries data is usually accessed through user defined routines (UDRs) from SQL
      - Over 80 predefined functions come with Informix Timeseries:
          - Clip() - clip a range of a time series and return it
          - GetLastElem(), GetFirstElem() - return the last (first) element in the time series
          - Apply() - run a query across a time series
              - Apply filters, project only a subset of columns, apply functions to elements, etc.
          - AggregateBy() - roll up or down values
              - Change the frequency of a Timeseries from hourly to daily, for instance
          - SetContainerName() - move a Timeseries from one container to another
          - BulkLoad() - load data into a Timeseries from a file
  • 36. TimeSeries SQL Examples
      - Get all meter data for meter 3 for the last day:
          SELECT Clip(reading, CURRENT - 1 UNITS DAY, CURRENT) FROM meters WHERE Meter_ID = '3';
      - Get the last meter record for meter 3:
          SELECT GetLastElem(reading) FROM meters WHERE Meter_ID = '3';
      - Find the maximum usage by week for meter 3 over the last 30 days:
          SELECT AggregateBy('max($usage)', 'weeklycal', reading, CURRENT - 30 UNITS DAY, CURRENT) FROM meters WHERE Meter_ID = '3';
  • 37. Informix Timeseries VTI Interface
      - Makes time series data look like standard relational data
          - Useful for programs that can’t read our proprietary Timeseries format
          - There is a small penalty for using VTI
      - Restrictions
          - No secondary indices are allowed
          - No triggers allowed
      - SQL to create a VTI table:
          - If you have a table called “meters” with a time series column the following query will create an equivalent VTI table:
            EXECUTE PROCEDURE tscreatevirtualtab('readings', 'meters');
  • 38. VTI Interface: Continued
      Meters – the Timeseries data:
        Meter_ID  Origin      00:00  01:00  02:00  03:00  ...
        MX001     2010-06-01  1.3    1.6    1.4    1.5
        MX002     2010-06-01  0.4    0.3    0.2    0.5
        MX003     2010-06-01  3.5    4.3
      Readings – a virtual view of the Timeseries data:
        Meter_ID  TStamp            usage
        MX001     2010-06-01 00:00  1.3
        MX001     2010-06-01 01:00  1.6
        MX001     2010-06-01 02:00  1.4
        MX001     2010-06-01 03:00  1.5
        ...
        MX002     2010-06-01 00:00  0.4
        MX002     2010-06-01 01:00  0.3
      The VTI view is equivalent to the tall thin relational table and can be easily accessed by any SQL client.
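Conceptually, the virtual table unrolls each wide time series row into one tall, thin row per element. The Python sketch below mimics that transformation on the slide's sample data; it illustrates the mapping only and says nothing about how the VTI layer is implemented. The dictionary `meters` and the generator `virtual_rows` are hypothetical.

```python
from datetime import datetime, timedelta

# Sample data from the slide: meter id -> (origin, hourly values).
meters = {
    "MX001": (datetime(2010, 6, 1), [1.3, 1.6, 1.4, 1.5]),
    "MX002": (datetime(2010, 6, 1), [0.4, 0.3, 0.2, 0.5]),
    "MX003": (datetime(2010, 6, 1), [3.5, 4.3]),
}

def virtual_rows(table, interval):
    """Yield (meter_id, tstamp, usage) rows, one per stored element."""
    for meter_id, (origin, values) in table.items():
        for i, usage in enumerate(values):
            if usage is not None:  # missing elements produce no row
                yield (meter_id, origin + i * interval, usage)

readings = list(virtual_rows(meters, timedelta(hours=1)))
```

The resulting `readings` list has the same shape as the "Readings" view on the slide, with the timestamp rebuilt from the origin and the element position.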
  • 39. Informix Timeseries 5.00 VTI Interface
      - TimeSeries 5.00 VTI enhancements
          - Update regular VTI using primary key only
          - Use of TimeSeries expressions (read only)
      - SQL to create a VTI table with an expression:
          EXECUTE PROCEDURE TSCreateExpressionVirtualTab(
              'day_agg_readings', 'devices',
              'AggregateBy("sum($kwh),avg($phase_a),avg($phase_b),avg($phase_c)", "cal1day", readings, 0)',
              'reading', 1024, 'readings');
  • 40. Comparison of VTI vs Native Time Series Queries
      - Select a range of data for a meter:
          - Native: SELECT Clip(reading, "2010-01-01", "2010-01-10") FROM Meters WHERE Meter_ID = "2";
          - VTI: SELECT * FROM readings WHERE tstamp BETWEEN "2010-01-01" AND "2010-01-10" AND Meter_ID = "2";
      - Find the max usage for a given meter in a given period of time:
          - Native: SELECT Apply("max($usage)", "2010-01-01", "2010-01-10", reading) FROM Meters WHERE Meter_ID = "2";
          - VTI: SELECT max(usage) FROM readings WHERE tstamp BETWEEN "2010-01-01" AND "2010-01-10" AND Meter_ID = "2";
      - Note:
          - Native will normally be faster than VTI, probably in the 5 to 10% range
          - It is often much faster to write custom user defined functions
          - VTI functions are very convenient for standard SQL clients
  • 41. TimeSeries C-API Interface
      - Client and server versions of the API
      - Treats a time series like a table (sort of)
          - Functions to open and close a time series
          - Functions to scan a time series between 2 timestamps
          - Functions to create a time series
          - Functions to retrieve, insert, delete, update
      - Plus another 70 functions defined
  • 42. Timeseries Data Loading
  • 43. Timeseries Data Loading
      - Timeseries is a specialist type and benefits from a specialist data loading mechanism
      - Traditionally the Real Time Loader has been used for high speed Timeseries data insert
          - Developed for stock market trade data
          - Good for irregular Timeseries
          - Small symbol universe: 10s of thousands of stocks
          - Data arriving in timestamp order
          - Small number of active stocks
          - Needs to cope with very high peak loads at exchange open & close
      - Smart Meter data is a new challenge
          - Timeseries is regular
          - Many millions of meters
          - Data batched by Meter Identifier
          - All meters equally active
  • 44. Smart Meter Data Loader
      - Uses a similar internal mechanism to RTL to directly access containers
      - Builds internal map of Meter ID and Timeseries ID
      - Can use fragmentation of base table for better parallelism
      - Parallel sessions can work on separate disks to reduce contention
      - Load rates can be in excess of 50,000 intervals per second per core
  • 45. Smart Meter Data Loader – Architecture
      [Diagram: meter data loaders use a hash table to map each Meter_ID to a TS ID and write, with random distribution, to containers spread over physical disks.]
      Hash table:
        Meter_ID  TS ID
        7898765   1
        2168768   2
        9879821   3
        1656578   4
        8787987   5
        4678768   6
        7354658   7
        2537591   8
        8973547   9
        1352857   10
        3451759   11
        7656472   12
        6543897   13
        3324516   14
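The role of the Meter ID to TS ID map can be illustrated with a small Python sketch. The slides do not describe how work is divided among loaders, so the modulo rule used here is purely an assumption for the example, as are the names `meter_to_ts`, `N_LOADERS` and `route`.

```python
from collections import defaultdict

# First few entries of the slide's hash table: Meter_ID -> TS ID.
meter_to_ts = {"7898765": 1, "2168768": 2, "9879821": 3, "1656578": 4}

N_LOADERS = 2  # hypothetical number of parallel loader sessions

def route(readings):
    """Group incoming (meter_id, tstamp, value) readings by loader.

    The loader is chosen as ts_id % N_LOADERS, an assumed rule that keeps
    all data for one time series in the same session.
    """
    batches = defaultdict(list)
    for meter_id, tstamp, value in readings:
        ts_id = meter_to_ts[meter_id]  # one dictionary lookup per reading
        batches[ts_id % N_LOADERS].append((ts_id, tstamp, value))
    return dict(batches)

incoming = [
    ("7898765", "2010-06-01 00:00", 1.3),
    ("2168768", "2010-06-01 00:00", 0.4),
    ("7898765", "2010-06-01 00:15", 1.6),
]
batches = route(incoming)
```

Resolving the meter identifier once, in memory, means each reading can be sent straight to the session that owns its time series.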
  • 46. Oncor PoC
  • 47. Oncor PoC Details
      - Simulation
          - 90 days' worth of meter data for 1 million meters
              - 15 minute intervals
              - One value stored per interval
          - 200 locations
          - 500 feeders
          - 34 substations
      - Hardware
          - Power7 with 2 sockets each with 8 cores
          - 64 bit SUSE Linux 11
          - 128 GB of memory
              - Memory actually needed: 44GB, although could probably be less
          - 6 disks dedicated to the database, 2 additional for OS and LSE staging
              - Disk space actually used by the database: about 350GB (110 days)
          - Additional disks for the operating system and staging area for files
      - Software
          - Informix Ultimate Edition 11.7
          - Informix Timeseries
  • 48. Informix Time Series Schema
      The Meter table looks like this:
          CREATE TABLE meters (
              esi_id      char(64) not null primary key,
              suffix      char(32),
              location    char(16),
              feeder      char(16),
              sub_station char(16),
              dbspace     varchar(128),
              container   varchar(128),
              actual      Timeseries(meter_data),
              estimated   Timeseries(meter_data),
              valid       Timeseries(update_day)
          )
      A Meter reading looks like this:
          CREATE ROW TYPE meter_data (
              tstamp datetime year to fraction(5),
              value  decimal (14,3)
          );
      An update (correction) record looks like:
          CREATE ROW TYPE update_day (
              tstamp      datetime year to fraction(5),
              last_update datetime year to fraction(5)
          );
      Hierarchy is sub_station->feeder->meter. There are also tables for location, sub_station and feeder not shown above.
  • 49. Primary Use Cases
      - Load 90 days' worth of data for 1 million meters from LSE files
          - Original set of LSE files massaged to generate 1 million distinct meters
          - Oracle: 6 hours. Timeseries: 18 minutes
      - 6-day ERCOT Settlement Extract
          - Show support for the ERCOT settlement processes by creating an LSE file consisting of every record (every meter) for operating day - 6 (calendar day that occurred 6 days prior to current day). Must be able to extract and create the LSE files for 1M meters for a specific day.
          - Oracle: 5 hours. Timeseries: <7 minutes
      - 22-Day Update ERCOT Settlement Extract
          - Show support for the ERCOT settlement processes by creating LSE files consisting of every record that has had a consumption interval record update since the prior extract / pull (6-Day). Only extract the last or most current update for each meter, so if a meter has been updated four times, only the last / current record is sent. The entire 96 15-minute intervals are sent each time as well.
          - Oracle: 8 hours. Timeseries: 4 minutes (90 day: 11 minutes)
      - Missing Record ERCOT Settlement Extract
          - Show support for the ERCOT settlement processes by creating an LSE file consisting of only the meter IDs and date that is provided in a missing meter ID file from ERCOT. The dates will be as far back as 90 days and no sooner than 28 days back in time.
          - 4000 random reads on one day: 6 seconds
          - 4000 random reads on many days: 24 seconds
  • 50. Other Use Cases
      - Determine the count and the list of meter IDs for all meters with missing intervals and / or register reads on a given day
          - Oracle: 3-4 hours. Timeseries: <7 minutes
      - Determine the 90 day history for a given meter (90 day aggregation)
          - Oracle: > 1 second. Timeseries: 0.04 seconds
      - Determine the count and list of meter IDs that exceeded a given high interval value for a given day or given time period (multiple days). For example, count and list of meters that had an interval value of 12 or higher for a given period of time.
          - Timeseries: <6 minutes
      - Determine list of meters that have 5 or more consecutive days with estimated values only (no actual interval reads during a period of 5 days or more)
          - Oracle: 6 hours. Timeseries: 17 minutes
  • 51. Internal Benchmark
  • 52. Internal Benchmark - Requirements
      - 35 million meters
          - 10 minute intervals with 5 values
          - 5 billion intervals per day
      - 12 months data storage
          - Over 1.8 trillion intervals
          - Regular TimeSeries 30TB
          - Predicted relational 84TB
      - OLTP concurrent users
          - All running while data is loading
      - Complex aggregations
          - Required new TSRollUp function
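The volume figures on this slide follow from the meter count and the interval length, as the short Python calculation below shows. It assumes a 365-day year for the 12-month total; the variable names are chosen for the example.

```python
METERS = 35_000_000
INTERVALS_PER_DAY = 24 * 60 // 10  # one interval every 10 minutes
DAYS_PER_YEAR = 365                # assumed length of the 12-month window

intervals_per_day = METERS * INTERVALS_PER_DAY
intervals_per_year = intervals_per_day * DAYS_PER_YEAR

# Space comparison quoted on the slide, in TB.
timeseries_tb, relational_tb = 30, 84
space_saving = 1 - timeseries_tb / relational_tb

print(f"{intervals_per_day:,} intervals per day")
print(f"{intervals_per_year:,} intervals per year")
print(f"space saving vs predicted relational: {space_saving:.0%}")
```

This gives about 5.04 billion intervals per day and about 1.84 trillion per year, matching the slide's "5 billion" and "over 1.8 trillion", and a space saving of roughly 64% for 30TB against 84TB.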
  • 53. Internal Benchmark - Hardware
      - IBM P780 with AIX 7.1
      - Storage: IBM DS8000 - 576 HDD 146GB/15krpm
      - Space used
          - TimeSeries intervals: 30TB
              - Split over 64 logical devices, 768 containers
          - Relational tables: 112GB
              - 1 main data dbspace, 70 fragmentation dbspaces
          - System use: 148GB
              - Root, log dbspaces + 6 temp dbspaces
      - 64 cores, primary CPU thread affinitied to 64 Virtual Processors
      - 1TB main memory, up to 950GB assigned to database server
          - 80GB relational data buffers
          - 680GB TimeSeries data buffers
          - 45GB system memory
  • 54. Internal Benchmark - Results
      - Data loading
          - Single day load: 20 minutes (64 cores used)
          - Historical load of 12 months: <6 days
          - Daily load during queries: 160 minutes (8 cores used)
          - Data cleansing after load: 2 minutes
      - Query performance
          - 3,000 concurrent sessions
          - Single meter queries sub-second response time
          - Larger summary queries executed in <5 seconds
          - No performance degradation during data load
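As a rough consistency check, the single-day load time can be turned into a per-core rate using only figures from the benchmark slides: 35 million meters at 10-minute intervals, loaded in 20 minutes on 64 cores. The Python below does that arithmetic; it assumes the load was spread evenly over all cores.

```python
METERS = 35_000_000
INTERVALS_PER_DAY = 24 * 60 // 10   # 10-minute intervals
LOAD_SECONDS = 20 * 60              # single day load: 20 minutes
CORES = 64

intervals = METERS * INTERVALS_PER_DAY
rate_per_core = intervals // (LOAD_SECONDS * CORES)

print(f"{rate_per_core:,} intervals per second per core")
```

The result is about 65,000 intervals per second per core, in line with the loader slide's "in excess of 50,000".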
  • 55. AMT Sybex Benchmark
  • 56. AMT Sybex Benchmark
      - Most ambitious Smart Meter benchmark to date
      - 100 million meters
          - 30 minute intervals
          - 1, 2 or 3 daily registers
      - Target was to confirm a 24hr operational window
          - Load data
          - Validate data
          - Calculate estimated corrections
          - Billing run for 6% of the meters
      [Diagram: Validation, Load, VEE, Database and Query on a single IBM Power 750 server.]
  • 57. AMT Sybex Benchmark
      - Hardware
          - IBM Power 750 32 cores (3.5GHz) running AIX 7.1
          - 1 x Gb LAN Fibre adapter (dual port, using 1 port)
          - 2 x 8Gb FC adapters (dual port, using 4 ports)
          - 512GB memory
          - 1 x IBM XIV Storage System with 15 x 2TB data modules
      - Software
          - IBM Informix Dynamic Server 11.70.FC3
          - IBM Informix TimeSeries 5.00.FC1
          - AMT-SYBEX SmartDTS v 6.0
      - Database server
          - 101,000,000 x 4KB buffers
          - 16 cpu vps
          - 30 x 2GB logical logs
          - 40GB physical log
          - The time series were stored over 16 logical disks
  • 58. AMT Sybex Benchmark – Processing time
      [Chart: Daily Processing Time, showing predictability of processing as database size increases. Left axis: minutes (0 to 540). Right axis: space used in TB (0.0 to 4.0). Series: Validation, Loading, VEE, Space Used.]
  • 59. AMT Sybex Benchmark – Performance Results
      Individual operations
        Operation  Time in hrs  CPU
        Validate   2:18         100%
        Load       3:15         80%
        VEE        2:10         100%
        Total      7:43
  • 60. AMT Sybex Benchmark – Performance Results
      Individual operations
        Operation      Time in hrs  CPU
        Validate       2:18         100%
        Load           3:15         80%
        VEE            2:10         100%
        Total          7:43
        Billing Query  4:21         5%
        Overall total  12:04
  • 61. AMT Sybex Benchmark – Performance Results
      Combined operations: the Billing Query and the load can be run concurrently
        Operation       Time in hrs  CPU
        Validate        2:18         100%
        Load + Billing  4:41         85%
        VEE             2:10         100%
        Overall Total   9:09
      This result confirmed that a 9hr processing window was sufficient for the daily processing
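The totals in the three performance tables can be checked by adding the step durations. The Python below does this with two small helpers, `to_minutes` and `to_hhmm`, which are written for this example only.

```python
def to_minutes(hhmm):
    """Convert an 'h:mm' duration string to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def to_hhmm(total_minutes):
    """Convert whole minutes back to an 'h:mm' string."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"

# Step durations as quoted on the slides.
individual = {"Validate": "2:18", "Load": "3:15", "VEE": "2:10"}
billing = "4:21"
combined = {"Validate": "2:18", "Load + Billing": "4:41", "VEE": "2:10"}

total_individual = sum(to_minutes(t) for t in individual.values())
total_sequential = total_individual + to_minutes(billing)
total_combined = sum(to_minutes(t) for t in combined.values())

print(to_hhmm(total_individual), to_hhmm(total_sequential), to_hhmm(total_combined))
```

Running the load and billing query together takes 4:41 instead of 3:15 + 4:21 = 7:36, which is where the reduction from 12:04 to 9:09 comes from.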
  • 62. How Does This Benchmark Compare?
      Comparison of published benchmarks for Meter Data Management
                              Meters  Daily Reads  Total Cores  Total RAM  DB cores  App cores  DB RAM  App RAM
        Informix TimeSeries   100M    4.9B         16           500        16        (shared)   500     (shared)
        The Competition *     10M     970M         456          3668       48        <180       384     1.5TB
      [Chart: Daily Readings (meters * registers * intervals): Informix TimeSeries 4,900,000,000; The Competition 970,000,000.]
      [Chart: Database Resources (CPU cores): Informix TimeSeries total cores 16; The Competition db cores 48; The Competition app server cores 180.]
      5 times the performance, < 1/5 the resources … with significantly simpler management using a single node system
      * Based on latest published Oracle benchmark http://www.oracle.com/us/industries/utilities/ultilities-exadata-exalogic-wp-1499854.pdf
  • 63. http://www.ibm.com/informix Cosmo@uk.ibm.com