
SAP HANA SPS10- Series Data/ TimeSeries

See what's new in SAP HANA SPS10- Series Data/ TimeSeries

Published in: Technology

  1. SAP HANA SPS 10 - What’s New? Series Data / TimeSeries (Delta from SPS 09 to SPS 10). SAP HANA Product Management, June 2015. © 2014 SAP AG or an SAP affiliate company. All rights reserved.
  2. Disclaimer (Public). © 2015 SAP SE or an SAP affiliate company. All rights reserved. This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
  3. Agenda
     • Overview
       – Series Data Overview
       – SPS09 Summary
       – SPS10 Overview
     • Store Enhancements
       – Enhanced Support for Equidistant Series
       – Support for Equidistant Series with Multiple Increments, Offsets
     • Query Enhancements
       – Updates to SERIES_ROUND
     • Analytic Enhancements
       – New Analytic Functions
  4. Overview
  5. Series Data Overview
     • Series Data is synonymous with Time Series
     • Series Data support was introduced in SPS09 as a core SAP HANA capability
     • Series Data - What is it?
       – Ordered sequence of data points/measurements
       – Measured at points in time or within time intervals
         o E.g. a discrete measurement taken from a sensor every 10 s
         o E.g. energy consumed by a home in every 15-minute interval (smart metering)
     • Series Data - What do we do with it?
       – Analyze and predict
         o Extract useful statistical information
         o Forecasting
     • Series Data – Relevance?
       – Foundational technology for IoT
         o Industry 4.0 / Industrial Internet of Things (IIoT)
         o IT/OT Convergence
  6. Series Data – SPS09 Review
     • Support very high volumes of data using effective compression techniques
       – Lossless compression; all values originally inserted remain accessible for auditing/regulatory purposes
     • Support both equidistant and non-equidistant data
       – Often, source data will be non-equidistant; it will then be “snapped” to an equidistant “grid” for analysis, model fitting, etc.
     • Allow time series manipulation, cleaning, and analytic operations to be expressed naturally in SQL while maintaining high performance
       – Table creation via CREATE COLUMN TABLE extensions for Series Data
       – Efficient grouping to different granularities (GROUP BY SERIES_ROUND(…))
       – Built-in SQL functions for efficient handling of Series Data
         o SERIES_GENERATE; SERIES_DISAGGREGATE; SERIES_ROUND; SERIES_PERIOD_TO_ELEMENT; SERIES_ELEMENT_TO_PERIOD
       – New analytical SQL functions
         o CORR; CORR_SPEARMAN; LINEAR_APPROX; MEDIAN
  7. Series Data – SPS10 Overview
     • Handle timestamp data that is NOT equidistant with a single offset in the entire table
       – With good compression for reduced memory consumption
       – Range block indexes for efficient handling of range queries
     • Enhance SERIES_ROUND with new rounding modes and the ability to accept an offset
       – Enhanced usability and greater expressive power for querying series data
     • New aggregate and window functions
     • CDS support
     • Store
       – Equidistant series w/ any alignment
       – Generated rounded columns
       – Piecewise equidistant series
     • Query
       – Round to computed interval
       – Granulize (any offset)
     • Analyze
       – AUTO_CORR, CROSS_CORR
       – BINNING
       – CUBIC_SPLINE_APPROX
       – DFT
       – RANDOM_PARTITION
       – SERIES_FILTER
       – WEIGHTED_AVG
       – Sliding window support
       – {FIRST/NTH/LAST}_VALUE
  8. Store Enhancements: Enhanced Support for Equidistant Series
  9. Enhanced Support for Equidistant Series: Limitations of SPS09
     • Restrictions/limitations in SPS09 on equidistant series
       – Only one equidistant property per table
         o i.e. only a single INCREMENT BY is supported; it is defined at table creation time and applies to all of the series in the table
         o Efficient compression can be provided on the timestamp column, but only if it is exactly aligned on the increment boundary, i.e. no support for any offset
         o Can be encoded as a line t = mx (i.e. single slope ‘m’, no offset from the INCREMENT boundary)
       – Data needed to be ordered on INSERT (ordered by ‘Series Key, TimeStamp’) for good compression
     • SPS09 equidistant series support works well for series data and use cases that meet the above criteria
  10. Enhanced Support for Equidistant Series: Many use cases require more flexible handling of timestamps/periods
     • However, in many use cases
       – there are ‘runs of data’ in which timestamps for consecutive data points differ by a constant interval
         o i.e. data effectively has multiple INCREMENTs
         o can be due to different intervals for different series in the table
         o can be due to different intervals within a single series in the table
       – timestamps are not necessarily aligned to INCREMENT boundaries
         o i.e. offsets can exist from the INCREMENT boundaries
       – there may often be slight local variations in the timestamp, i.e. some “jitter”
  11. New Representation for Timestamps
     • Encode series timestamps/periods as t = mx + b + j
       – x is an integer value (monotonically increasing)
       – m represents the slope (i.e. INCREMENT BY)
       – b is an offset value (locally constant)
       – j is a jitter value (can have a few distinct values)
     • Offers good compression even with different slopes and offsets in the series
       – Slight differences between the ideal line representation and the recorded timestamps (j) are represented efficiently with n-bit compression
     • Enables support for alternate periods
       – Useful when the period column needs to be offset by some constant
         o e.g. for time zone differences, daylight saving time, differences in starting day of week, etc.
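As a rough illustration of the t = mx + b + j idea, the Python sketch below splits a sorted list of integer timestamps (for example, seconds) into runs with a constant increment and stores each run as (b, m, count). The jitter term j is taken to be 0 throughout. The function names and the greedy run-detection rule are assumptions made for illustration; this is not SAP HANA's internal algorithm.

```python
def encode_piecewise(timestamps):
    """Split sorted integer timestamps into runs of constant increment.

    Each run is stored as (b, m, count): offset b, slope m, and the number
    of points, so that point x of the run is t = m * x + b (jitter j = 0).
    """
    runs = []
    i = 0
    n = len(timestamps)
    while i < n:
        b = timestamps[i]
        if i + 1 == n:
            # A single trailing point: the slope is irrelevant, store 0.
            runs.append((b, 0, 1))
            break
        m = timestamps[i + 1] - b
        count = 2
        # Extend the run while consecutive differences stay equal to m.
        while i + count < n and timestamps[i + count] - timestamps[i + count - 1] == m:
            count += 1
        runs.append((b, m, count))
        i += count
    return runs


def decode_piecewise(runs):
    """Rebuild the timestamps from the (b, m, count) runs."""
    return [m * x + b for b, m, count in runs for x in range(count)]
```

Seven timestamps with two different increments collapse into two runs here, which is the kind of saving the piecewise representation aims for when increments and offsets change within a table.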
  12. Grammar Updates to Support Equidistant Piecewise Series (Supported via CDS)
     • Note: The new syntax is currently only supported via CDS and not via CREATE TABLE
       – CREATE TABLE support may be provided in a future version
       – Use of the syntax via an SQL statement will give errors

     series_definition := SERIES '(' series_spec_list ')'

     series_spec_list:
         SERIES KEY '(' column_name_list ')'
       | NO MINVALUE | MINVALUE str_const
       | NO MAXVALUE | MAXVALUE str_const
       | PERIOD FOR SERIES '(' {column|NULL} [',' {column|NULL}] ')'
       | series_equidistant_definition
       | reorganize_process
       | ALTERNATE PERIOD FOR SERIES (column [, column ...])

     series_equidistant_definition:
         NOT EQUIDISTANT
       | EQUIDISTANT INCREMENT BY constant [MISSING ELEMENTS [NOT] ALLOWED]
       | EQUIDISTANT PIECEWISE
  13. Grammar Updates to Support Equidistant Piecewise Series (Supported via CDS)
     • CDS specification:

     entity Weather {
       station_id   String(3)    not null;
       ts_utc       UTCTimestamp not null;  -- UTC time at start of period
       ts_local     UTCTimestamp not null;  -- local time at start of period
       temp         Decimal(3,1) not null;  -- mean temp ℃
       wind_speed   Decimal(2)   null;      -- wind speed (Km/h)
       ts_utc_month UTCTimestamp not null;  -- period rounded to months
         GENERATED ALWAYS AS SERIES_ROUND(ts_utc, 'INTERVAL 1 MONTH');
     }
     SERIES (
       SERIES KEY(station_id)
       EQUIDISTANT PIECEWISE
       PERIOD FOR SERIES (ts_utc)
       ALTERNATE PERIOD FOR SERIES(ts_local)
     )

     • On activation of the CDS document, the physical representation of the series table is:

     CREATE COLUMN TABLE Weather_(
       station_id   varchar(3) NOT NULL,
       ts_utc_      timestamp NULL,
       ts_utc_x_    integer default 0 NOT NULL,
       ts_utc_m_    decimal default 1 NOT NULL,
       ts_utc_b_    timestamp default timestamp'0001-01-01 00:00' NOT NULL,
       ts_utc_j_    decimal default 0 NOT NULL,
       ts_local_    timestamp NULL,
       ts_local_d_  decimal default 1 NOT NULL,
       temp         decimal(3,1) NOT NULL,
       wind_speed   decimal(2) NULL,
       ts_utc_month TIMESTAMP NOT NULL GENERATED ALWAYS AS
         SERIES_ROUND(
           COALESCE(ts_utc, ADD_SECONDS(_series_b, _series_m*_series_x + _series_j)),
           'INTERVAL 1 MONTH'),
       flags_       int default 0 not null
     )
     SERIES (
       SERIES KEY(station_id)
       EQUIDISTANT INCREMENT BY 1
       PERIOD FOR SERIES (ts_utc_x)
     )

     • Logical representation of the series table:

     CREATE VIEW Weather AS
       SELECT station_id,
              COALESCE(ts_utc_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_)) AS ts_utc,
              COALESCE(ts_local_, ADD_SECONDS(ts_utc_b_, ts_utc_m_ * ts_utc_x_ + ts_utc_j_ + ts_local_o_)) AS ts_local,
              temp, wind_speed, ts_utc_month
       FROM Weather_;
  14. Representation for Equidistant Piecewise Series
     • Physical representation of the series table: the CREATE COLUMN TABLE Weather_ statement shown on slide 13
     • On first insert, ts_utc_ is stored unmodified
     • After a reorg step, the x, m, b, j components (ts_utc_x_, etc.) are calculated, and ts_utc_ is set to NULL
     • The view is defined, using COALESCE, to read either the original timestamp value or, after the reorganization, the calculated timestamp value
     • Reorg is done via the ALTER TABLE SERIES REORGANIZE command
       – Must be initiated by the user
     • Generated rounded columns: use rounded period columns for good performance on range queries
     • Note: in SPS10, the j component is not yet implemented; it is set to 0. This will be fixed in a subsequent release.
  15. Equidistant Piecewise Series – Reorg Step: ALTER TABLE SERIES REORGANIZE for compression
     • On first INSERT, the period columns (including alternate period columns) are stored as is (i.e. in uncompressed form)
     • ALTER TABLE SERIES REORGANIZE is required to store timestamps in their equidistant piecewise form (i.e. x, m, b, j components), which provides compression
       – Reorders the rows by (series key, period) by deleting existing rows (deletion gives good $rowid$ compression by ensuring that row IDs follow time order)
       – Calculates the equidistant piecewise representation components (i.e. m, x, b, j) to give good compression while maintaining the correct timestamp value
       – Sets the period column to NULL (after this, the timestamps are calculated from the components)
     • ALTER TABLE SERIES REORGANIZE
       – Must be initiated by the user
       – Can be run against subsets of data (e.g. partitions) and be limited to processing a fixed number of rows during a run
       – Will find the rows that are not optimally encoded and process them
       – Should be run against sufficiently large sets of rows (thousands to hundreds of thousands) for good compression
       – Is resource intensive, so it is best run during quiet periods
     • The M_SERIES_TABLE monitor view returns various statistics on series tables, including the number of rows reorganized
  16. Generated Rounded Columns: Rounded Period Columns for Better Performance on Range Predicates and OLAP Queries
     • Generated rounded columns can be used to store period or alternate period columns rounded to a coarser level (e.g. day, week, month)
     • Have great compression
     • Are optional
     • Multiple such columns can be created (on different period columns, at different levels of coarseness)
     • Used automatically by the server for improved performance of range predicates on the original column, as well as for OLAP queries (the server can limit the number of rows for which exact timestamps need to be calculated)
     • Example (generated rounded columns store values rounded to a coarser interval):

     CREATE COLUMN TABLE Weather_(
       station_id   varchar(3) NOT NULL,
       ts_utc_      timestamp NULL,
       …
       ts_utc_month TIMESTAMP NOT NULL GENERATED ALWAYS AS
         SERIES_ROUND(
           COALESCE(ts_utc, ADD_SECONDS(_series_b, _series_m*_series_x + _series_j)),
           'INTERVAL 1 MONTH')
       …
     )
     SERIES (
       SERIES KEY(station_id)
       EQUIDISTANT INCREMENT BY 1
       PERIOD FOR SERIES (ts_utc_x)
     )
  17. Summary: Benefits of Equidistant Piecewise Representation
     • Order-independent INSERT w/ no degradation in compression
     • Good compression for multiple INCREMENT BY scenarios
     • Good compression for scenarios with multiple offsets from zero in the timestamp
     • Good compression for scenarios where timestamps have jitter
     • Support for local time variations w/ good compression
     • Efficient range comparisons on timestamp columns
     • Efficient GROUP BY for timestamp columns
  18. Query Enhancements: SERIES_ROUND Updates
  19. SERIES_ROUND: New Rounding Modes & Non-Zero Alignment
     • Syntax: SERIES_ROUND (<value>, {<increment_by> | SERIES TABLE <series_table>} [, <rounding_mode> [, <alignment_expression>]])
     • New rounding modes are especially useful for intervals of months and years, because months and years have variable lengths
     • The default rounding mode is ROUND_HALF_UP
     • The <alignment_expression> allows specification of a non-zero alignment for the interval data type
       – Allows MINVALUE to have a non-zero offset
       – E.g. allows for summarizing weeks that begin on Mondays (as opposed to Saturdays, which is the natural zero 0001-01-01 for the datetime data type)
     • Interval widths (INCREMENT BY) can be dynamically specified
     • New rounding modes (mode: semantics)
       – ROUND_HALF_UP: Default value. The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded up, away from zero.
       – ROUND_HALF_DOWN: The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded down, towards zero.
       – ROUND_HALF_EVEN: The value is rounded to the nearest series value. Values that fall halfway between two series values are rounded to the even series value, based on element number.
       – ROUND_UP: The value is always rounded away from zero, to the larger series value.
       – ROUND_DOWN: The value is always rounded towards zero, to the smaller series value.
       – ROUND_CEILING: The value is always rounded in a positive direction, to the larger series value.
       – ROUND_FLOOR: The value is always rounded in a negative direction, to the smaller series value.
  20. SERIES_ROUND: Examples of Rounding with Month and Year Intervals
     • Note that the rounding result depends on the number of days in the period!
     • Month intervals (period length | expression | result with default ROUND_HALF_UP)
       – 28 days | SERIES_ROUND('2014-02-14 23:59:59', 'INTERVAL 1 MONTH') | 2014-02-01 00:00:00
       – 28 days | SERIES_ROUND('2014-02-15 00:00:00', 'INTERVAL 1 MONTH') | 2014-03-01 00:00:00
       – 29 days | SERIES_ROUND('2012-02-15 11:59:59', 'INTERVAL 1 MONTH') | 2012-02-01 00:00:00
       – 29 days | SERIES_ROUND('2012-02-15 12:00:00', 'INTERVAL 1 MONTH') | 2012-03-01 00:00:00
       – 30 days | SERIES_ROUND('2014-04-15 23:59:59', 'INTERVAL 1 MONTH') | 2014-04-01 00:00:00
       – 30 days | SERIES_ROUND('2014-04-16 00:00:00', 'INTERVAL 1 MONTH') | 2014-05-01 00:00:00
       – 31 days | SERIES_ROUND('2014-01-16 11:59:59', 'INTERVAL 1 MONTH') | 2014-01-01 00:00:00
       – 31 days | SERIES_ROUND('2014-01-16 12:00:00', 'INTERVAL 1 MONTH') | 2014-02-01 00:00:00
       – 59 days (31+28) | SERIES_ROUND('2014-01-30 11:59:59', 'INTERVAL 2 MONTH') | 2014-01-01 00:00:00
       – 59 days (31+28) | SERIES_ROUND('2014-01-30 12:00:00', 'INTERVAL 2 MONTH') | 2014-03-01 00:00:00
       – 92 days (31+31+30) | SERIES_ROUND('2014-08-15 23:59:59', 'INTERVAL 3 MONTH') | 2014-07-01 00:00:00
       – 92 days (31+31+30) | SERIES_ROUND('2014-08-16 00:00:00', 'INTERVAL 3 MONTH') | 2014-10-01 00:00:00
     • Year intervals (period length | expression | result with default ROUND_HALF_UP)
       – 365 days | SERIES_ROUND('2014-07-02 11:59:59', 'INTERVAL 1 YEAR') | 2014-01-01 00:00:00
       – 365 days | SERIES_ROUND('2014-07-02 12:00:00', 'INTERVAL 1 YEAR') | 2015-01-01 00:00:00
       – 366 days | SERIES_ROUND('2012-07-01 23:59:59', 'INTERVAL 1 YEAR') | 2012-01-01 00:00:00
       – 366 days | SERIES_ROUND('2012-07-02 00:00:00', 'INTERVAL 1 YEAR') | 2013-01-01 00:00:00
       – 730 days (365+365) | SERIES_ROUND('2014-12-31 23:59:59', 'INTERVAL 2 YEAR') | 2014-01-01 00:00:00
       – 730 days (365+365) | SERIES_ROUND('2015-01-01 00:00:00', 'INTERVAL 2 YEAR') | 2016-01-01 00:00:00
       – 731 days (366+365) | SERIES_ROUND('2012-12-31 11:59:59', 'INTERVAL 2 YEAR') | 2012-01-01 00:00:00
       – 731 days (366+365) | SERIES_ROUND('2012-12-31 12:00:00', 'INTERVAL 2 YEAR') | 2014-01-01 00:00:00
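The dependence on period length can be reproduced with a small Python model of half-up rounding on a month grid. This is only a sketch: the function name is hypothetical, year intervals are expressed as 12 or 24 months, and the grid alignment (month index computed as year * 12 + month - 1) is an assumption chosen because it reproduces the results in the tables above. It says nothing about how SERIES_ROUND is implemented.

```python
from datetime import datetime


def month_start(index):
    """Convert a month index (year * 12 + zero-based month) to a datetime."""
    year, month0 = divmod(index, 12)
    return datetime(year, month0 + 1, 1)


def round_half_up_months(ts, n_months):
    """Round ts to the nearest boundary of an n_months-wide grid.

    The midpoint is computed from the actual length of the period, so the
    switch-over instant differs between 28-, 29-, 30- and 31-day months.
    """
    index = ts.year * 12 + (ts.month - 1)
    lower_index = index - index % n_months
    lower = month_start(lower_index)
    upper = month_start(lower_index + n_months)
    midpoint = lower + (upper - lower) / 2
    # Exactly halfway counts as "up", mirroring ROUND_HALF_UP.
    return upper if ts >= midpoint else lower
```

For February 2014 (28 days) the midpoint is 2014-02-15 00:00:00, while for February 2012 (29 days) it moves to 2012-02-15 12:00:00, which matches the first four rows of the month table.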
  21. SERIES_ROUND: Examples of Rounding with Specified Alignment Values (expression | result | explanation)
     • SERIES_ROUND(8, 10, 3) | 13 | because 8 is the midpoint between 3 and 13 and the default rounding mode ROUND_HALF_UP rounds away from 0
     • SERIES_ROUND(5, 10, 3) | 3 | because 5 is closer to 3 than to 13
     • SERIES_ROUND(12, 10, 3) | 13 | because 12 is closer to 13 than to 3
     • SERIES_ROUND(19, 10, 3) | 23 | because 19 is closer to 23 than to 13
     • SERIES_ROUND('2015-02-27', 'INTERVAL 7 DAY', '2015-01-05 09:00:00', ROUND_UP) | '2015-03-02 09:00:00' | because 2015-01-05 is a Monday, and 2015-02-27 is a Friday that is closer to Monday 2015-03-02 than to Monday 2015-02-23
     • SERIES_ROUND('2015-03-01', 'INTERVAL 2 MONTH', '2014-02-01') | '2015-02-01' | because '2015-03-01' lies closer to '2015-02-01' than to '2015-04-01'
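The numeric examples can be checked with a minimal Python model of rounding onto a grid of the form alignment + n * increment. The function name, the argument order, and the choice to apply "towards/away from zero" to the element number n are assumptions made for this sketch; it is not the SAP HANA implementation, and it only handles numeric values.

```python
import math
from fractions import Fraction

MODES = {
    "ROUND_HALF_UP", "ROUND_HALF_DOWN", "ROUND_HALF_EVEN",
    "ROUND_UP", "ROUND_DOWN", "ROUND_CEILING", "ROUND_FLOOR",
}


def series_round(value, increment, alignment=0, mode="ROUND_HALF_UP"):
    """Round value onto the grid {alignment + n * increment}."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    # Exact element number of the value on the grid (may be fractional).
    q = Fraction(value - alignment) / Fraction(increment)
    lo, hi = math.floor(q), math.ceil(q)
    half = Fraction(1, 2)
    if lo == hi:                      # already on the grid
        n = lo
    elif mode == "ROUND_CEILING":
        n = hi
    elif mode == "ROUND_FLOOR":
        n = lo
    elif mode == "ROUND_UP":          # away from zero
        n = hi if q > 0 else lo
    elif mode == "ROUND_DOWN":        # towards zero
        n = lo if q > 0 else hi
    elif q - lo != half:              # not a tie: take the nearest element
        n = hi if q - lo > half else lo
    elif mode == "ROUND_HALF_UP":
        n = hi if q > 0 else lo
    elif mode == "ROUND_HALF_DOWN":
        n = lo if q > 0 else hi
    else:                             # ROUND_HALF_EVEN: even element number
        n = lo if lo % 2 == 0 else hi
    return alignment + n * increment
```

With increment 10 and alignment 3 the grid is ..., 3, 13, 23, ..., so 8 is a tie between 3 and 13 and the rounding mode decides the outcome.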
  22. SERIES_ROUND: Rounding to an Evaluated Interval Width
     • Some use cases require a dynamic granularity for the interval width
     • E.g. to split data into n buckets per year (where n is a variable):

     SELECT bucket, max(value)
     FROM (
       SELECT SERIES_ROUND(ts, 'interval ' || 3600*24*365/n || ' second') AS bucket, value
       FROM T
     ) D
     GROUP BY bucket
  23. Analytic Enhancements: New Analytic Functions
  24. Analytic Functions: Summary of New Functions (function: description)
     • AUTO_CORR(col, maxlag {SERIES(…) | ORDER BY c1, …}): Aggregate that computes all autocorrelation coefficients for a given input column.
     • DFT(col, N {SERIES(…) | ORDER BY c1, …}).{REAL|IMAGINARY|AMPLITUDE|PHASE}: Aggregate that computes the Discrete Fourier Transform of a column for the first N values and returns an array with exactly N elements.
     • FIRST_VALUE(col ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]: Aggregate function that returns the first value (with the given ordering).
     • LAST_VALUE(col ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]: Aggregate function that returns the last value (with the given ordering).
     • NTH_VALUE(col, n ORDER BY c1, …) [RESPECT NULLS | IGNORE NULLS]: Aggregate function that returns the n-th value (with the given ordering).
     • CUBIC_SPLINE_APPROX(col, type, mode, par1, par2) OVER (PARTITION BY <…> ORDER BY <…>): Window function that replaces NULL values with a cubic spline approximation.
     • CROSS_CORR(col1, col2, N ORDER BY …): Computes the correlation between two value columns for a given number of lags.
     • BINNING(col, name => val) OVER(…): Window function that assigns input values to bins using different algorithms.
     • RANDOM_PARTITION(n1, n2, n3, seed) OVER(…): Window function that assigns input rows randomly to different sets (training/validation/test).
     • WEIGHTED_AVG(col, weight_array) OVER(…): Window function that computes a weighted moving average with the provided weight values.
     • SERIES_FILTER(col, filter) OVER(…): Window function that applies filtering or smoothing, for example exponential smoothing or an autoregressive filter.
     • SERIES_FORECAST(model).{FITTED | LOW95 | HIGH95 | LOW80 | HIGH80} OVER (…): Forecast based on a model built using PAL.
  25. Analytic Functions: First, Last, Nth Value Aggregate Functions
     • Example (SAP stock price): changing the time granularity from days to months

     SELECT min("date") AS "date",
            first_value("open" ORDER BY "date") AS "open",
            last_value("close" ORDER BY "date") AS "close",
            max("high") AS "high",
            min("low") AS "low",
            sum("volume") AS "volume"
     FROM "I058576"."sap_stock_price"
     GROUP BY SERIES_ROUND("date", 'INTERVAL 1 MONTH', ROUND_DOWN)
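To make the semantics of the monthly roll-up concrete, here is a minimal Python sketch of the same aggregation on in-memory rows. The tuple layout and the function name are assumptions for illustration; grouping by the first day of the month plays the role of SERIES_ROUND with ROUND_DOWN.

```python
from collections import defaultdict
from datetime import date


def monthly_ohlc(rows):
    """Aggregate daily (date, open, high, low, close, volume) rows per month."""
    groups = defaultdict(list)
    for row in rows:
        day = row[0]
        # Round the date down to the start of its month.
        groups[date(day.year, day.month, 1)].append(row)

    result = []
    for month in sorted(groups):
        days = sorted(groups[month], key=lambda r: r[0])
        result.append({
            "date": days[0][0],                   # min("date")
            "open": days[0][1],                   # first_value("open" ORDER BY "date")
            "close": days[-1][4],                 # last_value("close" ORDER BY "date")
            "high": max(r[2] for r in days),      # max("high")
            "low": min(r[3] for r in days),       # min("low")
            "volume": sum(r[5] for r in days),    # sum("volume")
        })
    return result
```

The open and close values depend on ordering within the group, which is why plain MIN/MAX cannot express them and ordered aggregates such as FIRST_VALUE and LAST_VALUE are needed.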
  26. Analytic Functions: First, Last, Nth Value Aggregate Functions
     • Query without series feature:

     select distinct GF_ISIN,
       TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
         CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
         * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER))) AS bin_datetime,
       FIRST_VALUE(GF_LAST) over (partition by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))
         order by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))) as open_price,
       max(GF_LAST) over (partition by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))
         order by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))) as high_price,
       min(GF_LAST) over (partition by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))
         order by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))) as low_price,
       LAST_VALUE(GF_LAST) over (partition by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))
         order by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))) as close_price,
       COUNT(GF_LAST) over (partition by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))
         order by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))) as num_trades,
       sum(GF_LAST_VOL) over (partition by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))
         order by GF_ISIN,
         TO_TIMESTAMP(GF_DATE || ' ' || ADD_SECONDS('09:00:00.000', CAST(
           CEIL(SECONDS_BETWEEN('09:00:00.000', GF_TIME) / CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200))
           * CEIL(1 + SECONDS_BETWEEN('09:00:00.000', '18:00:00.000') / 200) as INTEGER)))) as bin_vol
     from RAP_USER.GF_TICKS
     where GF_TIME >= '08:59:59.999' and GF_TIME <= '18:00:00.001'
       and GF_DATE = '2012-01-13'
       and GF_LAST_VOL > 0
       and GF_ISIN = 'DE0007164600'

     • Same query with series feature:

     SELECT min("date") AS "date",
            first_value("open" ORDER BY "date") AS "open",
            last_value("close" ORDER BY "date") AS "close",
            max("high") AS "high",
            min("low") AS "low",
            sum("volume") AS "volume"
     FROM "I058576"."sap_stock_price"
     GROUP BY SERIES_ROUND("date", 'INTERVAL 1 MONTH', ROUND_DOWN)
  27. Analytic Functions: Cubic Spline Approximation
     • Replacement of null values by interpolating the gaps and extrapolating any leading or trailing null values
     • Interpolation can be done by
       – Linear interpolation
       – Cubic spline interpolation

     SELECT "ts", "temperature",
            linear_approx("temperature") OVER (ORDER BY "ts"),
            cubic_spline_approx("temperature") OVER (ORDER BY "ts")
     FROM "weather"
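The linear variant of gap filling is simple enough to sketch in Python. The version below assumes equally spaced rows, represents NULL as None, and extrapolates leading and trailing gaps by extending the nearest line segment; that extrapolation rule and the function name are assumptions for illustration and may differ from what LINEAR_APPROX does by default.

```python
def linear_approx(values):
    """Replace None entries by linear interpolation over the row index."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    if len(known) < 2:
        return list(values)  # not enough points to define a line

    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = next((k for k in reversed(known) if k[0] < i), None)
        right = next((k for k in known if k[0] > i), None)
        if left is None:       # leading gap: extend the first segment
            left, right = known[0], known[1]
        elif right is None:    # trailing gap: extend the last segment
            left, right = known[-2], known[-1]
        (x0, y0), (x1, y1) = left, right
        out[i] = y0 + (y1 - y0) * (i - x0) / (x1 - x0)
    return out
```

A cubic spline would replace the straight segments with smooth piecewise cubic polynomials through the same known points, which usually gives a more natural curve for slowly varying signals such as temperature.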
  28. Analytic Functions: Auto Correlation and Cross Correlation
     • Series data functions used to find periodic patterns in the data, such as seasonality
     • Auto-correlation looks for periodicity between values of the same series as a function of the time lag between them
     • Cross-correlation looks for periodicity between values of different series as a function of the time lag between them

     SELECT corr, ordinality AS lag
     FROM unnest((
       SELECT auto_corr(temperature, 1000 ORDER BY ts) FROM weather
     )) WITH ORDINALITY AS tt(corr)
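For readers who want to see what an autocorrelation coefficient measures, here is a small Python sketch using the common textbook estimator (lagged covariance divided by the overall variance). Whether AUTO_CORR uses exactly this normalization, and which lag the first array element corresponds to, is not stated in the slides; both are assumptions here, as is the function name.

```python
def auto_corr(values, max_lag):
    """Return autocorrelation coefficients for lags 1..max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(1, max_lag + 1)
    ]
```

A series that alternates between two values gives a strongly negative coefficient at lag 1 and a strongly positive one at lag 2, which is the signature of a period of two samples.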
  29. Analytic Functions: Weighted Moving Average
     • Data smoothing via a weighted moving average with linearly decreasing weights
     • The window frame defines the smoothing window

     SELECT "ts", "temperature",
            weighted_avg("temperature") OVER (ORDER BY "ts" ROWS BETWEEN 7 PRECEDING AND CURRENT ROW)
     FROM "weather"
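A weighted moving average over a trailing window can be sketched in Python as follows. The weighting scheme (the current row gets the largest weight, and weights fall by one for each step back in time) and the handling of shorter windows at the start of the series are assumptions for illustration; the slides only state that the weights decrease linearly.

```python
def weighted_moving_avg(values, preceding):
    """Trailing weighted average over the current row and `preceding` rows."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        # Oldest row in the window gets weight 1, the current row the largest.
        weights = range(1, len(window) + 1)
        total = sum(w * v for w, v in zip(weights, window))
        out.append(total / sum(weights))
    return out
```

Calling `weighted_moving_avg(temperatures, 7)` corresponds to the frame ROWS BETWEEN 7 PRECEDING AND CURRENT ROW in the query above, under the stated assumptions.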
  30. Analytic Functions: Filtering of Series Data
     • Filter function for different filter methods
       – Exponential smoothing
       – Autoregressive and moving average filter
     • Available in SPS10
       – Single exponential smoothing
       – Double exponential smoothing
     • PAL functions integrated into series data; support for smoothing and forecasting

     -- single exponential smoothing with a smoothing parameter alpha = 0.2
     SELECT "ts", "temperature",
            series_filter(value => "temperature", method_name => 'SINGLESMOOTH', alpha => 0.2)
              OVER (ORDER BY "ts") AS SINGLESMOOTH
     FROM "weather"
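Single exponential smoothing itself is a one-line recurrence, shown here as a Python sketch. The initialization with the first observation is a common textbook convention and an assumption here; the PAL implementation behind SERIES_FILTER may initialize or index the series differently.

```python
def single_exp_smooth(values, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # seed with the first observation
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

A small alpha such as 0.2 reacts slowly to new values and smooths strongly, while an alpha close to 1 follows the raw series almost exactly.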
  31. Analytic Functions: Binning
     • Binning assigns data values to bins
     • Different binning methods
       – Number of equal-width bins
       – Width of the bins
       – Number of bins with an equal number of records
       – Number of standard deviations left and right from the mean
     • PAL function integrated into series data
     • [Chart: histogram of record counts per bin, bins 1 to 8]

     -- compute histogram
     SELECT bin_number, count(bin_number) AS cnt
     FROM (
       SELECT binning(value => "open", bin_count => 8) OVER (ORDER BY "date") AS bin_number
       FROM "I058576"."sap_stock_price"
     )
     GROUP BY bin_number
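The first binning method (a fixed number of equal-width bins) can be sketched in Python as below. Bin numbers start at 1 and the maximum value is placed in the last bin; both conventions and the function name are assumptions for this sketch rather than documented BINNING behavior.

```python
from collections import Counter


def equal_width_bins(values, bin_count):
    """Assign each value a bin number in 1..bin_count using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # all values identical: a single bin
    width = (hi - lo) / bin_count
    # The maximum would land in bin_count + 1, so clamp it into the last bin.
    return [min(int((v - lo) / width) + 1, bin_count) for v in values]


def histogram(values, bin_count):
    """Count how many values fall into each bin."""
    return Counter(equal_width_bins(values, bin_count))
```

The `histogram` helper mirrors the outer GROUP BY of the query above: it counts rows per bin number.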
  32. Analytic Functions: Random Partition
     • Partitioning divides the input data into three sets (a training, a validation, and a test set) that are used in machine learning
     • Support for
       – Random partitioning
       – Stratified partitioning
     • PAL function integrated into series data

     -- stratified partitioning with fractional partition sizes (70% training, 20% validation, 10% test)
     SELECT *, random_partition(0.7, 0.2, 0.1, 42) OVER (PARTITION BY "weather_station") AS "PARTITION"
     FROM "weather"
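Stratified random partitioning can be sketched in Python as follows: rows are grouped by a stratum key, shuffled within each group with a seeded generator, and then split by the requested fractions. The label strings, the rounding of set sizes, and the function name are assumptions for illustration; RANDOM_PARTITION's actual return values are not shown in the slides.

```python
import random
from collections import defaultdict


def random_partition(keys, train, validation, test, seed):
    """Return one label per row, splitting each stratum by the given fractions."""
    if abs(train + validation + test - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)

    by_key = defaultdict(list)
    for idx, key in enumerate(keys):
        by_key[key].append(idx)

    labels = [None] * len(keys)
    for key in sorted(by_key):          # fixed order keeps the result reproducible
        idxs = by_key[key]
        rng.shuffle(idxs)
        n_train = round(train * len(idxs))
        n_val = round(validation * len(idxs))
        for pos, idx in enumerate(idxs):
            if pos < n_train:
                labels[idx] = "training"
            elif pos < n_train + n_val:
                labels[idx] = "validation"
            else:
                labels[idx] = "test"    # the remainder forms the test set
    return labels
```

Splitting per stratum (here, per weather station) keeps the 70/20/10 proportions within every station rather than only across the whole table.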
  33. Analytic Functions: Discrete Fourier Transform
     • Discrete Fourier transforms are used in spectral analysis of series data, e.g. in vibration analysis
     • Computation uses the FFT algorithm and returns
       – Amplitude / phase
       – Real part / imaginary part

     SELECT ordinality AS "frequency", "amplitude"/4096 AS "amplitude"
     FROM unnest((
       SELECT dft("amplitude", 4096 ORDER BY "ts").amplitude FROM "vibration"
     )) WITH ORDINALITY AS tt(amplitude)
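The amplitude spectrum returned by a DFT can be illustrated with a direct Python evaluation of the defining sum. This sketch deliberately uses the naive O(N²) formula rather than an FFT, and the function name is an assumption; it is meant to show what the amplitude values mean, not how the database computes them.

```python
import cmath
import math


def dft_amplitude(values):
    """Return |X[k]| for k = 0..N-1, where X is the DFT of values."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n)
    ]


# One full cosine cycle over 8 samples: the energy shows up at k = 1 and k = 7.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft_amplitude(signal)
```

For a real cosine with unit amplitude, the two mirrored peaks each have magnitude N/2, which is why spectra are commonly scaled by the number of samples, as the query above does with 4096.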
  34. Analytic Functions: Miscellaneous Updates
     • MEDIAN as a window function with arbitrary window frames
     • CORR_SPEARMAN for character columns
     • Aggregate functions in the series library
       – Standard deviation (sample and population)
       – Variance (sample and population)
       – Covariance (sample and population)
  35. Thank you. Contact information: Raj Rathee, SAP HANA Product Management, AskSAPHANA@sap.com
