SlideShare a Scribd company logo
Time Series Data Mining -
from PhD to Startup
Peter Laurinec
October 27, 2018
Highlights
Time series data mining - from PhD to start-up:
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
• TS data mining methods,
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
• TS data mining methods,
• PhD. study thesis - combining and developing TS data
mining methods,
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
• TS data mining methods,
• PhD. study thesis - combining and developing TS data
mining methods,
• TSrepr R package - TS representations,
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
• TS data mining methods,
• PhD. study thesis - combining and developing TS data
mining methods,
• TSrepr R package - TS representations,
• Work after Phd - energy start-up,
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
• TS data mining methods,
• PhD. study thesis - combining and developing TS data
mining methods,
• TSrepr R package - TS representations,
• Work after Phd - energy start-up,
• Differences and my thoughts,
1/27
Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amount of long
time series (TS),
• TS data mining methods,
• PhD. study thesis - combining and developing TS data
mining methods,
• TSrepr R package - TS representations,
• Work after Phd - energy start-up,
• Differences and my thoughts,
• What we do there...
1/27
Time Series Data in Energetics
Smart metering
• Measuring electricity consumption or production
(photovoltaic panels) from every consumer or
producer (together prosumer) every 5, 15, or 30
minutes,
• This creates a large amount of time series data,
• 3 years of data from consumer 96*365*3 =
105120...from 10 thousand consumers... > 1 billion rows
of multiple columns,
• Smart grid - set of consumers and producers,
2/27
Time Series Data in Energetics
Smart metering
• Measuring electricity consumption or production
(photovoltaic panels) from every consumer or
producer (together prosumer) every 5, 15, or 30
minutes,
• This creates a large amount of time series data,
• 3 years of data from consumer 96*365*3 =
105120...from 10 thousand consumers... > 1 billion rows
of multiple columns,
• Smart grid - set of consumers and producers,
Characteristics:
• High-dimensionality,
• Multiple seasonalities (daily, weekly, yearly),
• Large amount of stochastic factors as: weather,
holidays, black-outs, changes on market etc. 2/27
Examples of Consumers TS
0 250 500 750 1000
20
40
60
0
5
10
15
0.0
0.5
1.0
1.5
2.0
0
1
2
Time (3 weeks)
ElectricityConsumption(kW)
3/27
Typical Use Cases
• Forecasting el. consumption or production - market
planning, black-outs prevention etc.,
4/27
Typical Use Cases
• Forecasting el. consumption or production - market
planning, black-outs prevention etc.,
• Extract typical profiles of consumption - changes in tariffs,
create new ones etc.,
4/27
Typical Use Cases
• Forecasting el. consumption or production - market
planning, black-outs prevention etc.,
• Extract typical profiles of consumption - changes in tariffs,
create new ones etc.,
• Optimizing electricity consumption of some consumer,
4/27
Typical Use Cases
• Forecasting el. consumption or production - market
planning, black-outs prevention etc.,
• Extract typical profiles of consumption - changes in tariffs,
create new ones etc.,
• Optimizing electricity consumption of some consumer,
• Optimizing whole smart grid,
4/27
Typical Use Cases
• Forecasting el. consumption or production - market
planning, black-outs prevention etc.,
• Extract typical profiles of consumption - changes in tariffs,
create new ones etc.,
• Optimizing electricity consumption of some consumer,
• Optimizing whole smart grid,
• Monitoring smart grid,
4/27
Typical Use Cases
• Forecasting el. consumption or production - market
planning, black-outs prevention etc.,
• Extract typical profiles of consumption - changes in tariffs,
create new ones etc.,
• Optimizing electricity consumption of some consumer,
• Optimizing whole smart grid,
• Monitoring smart grid,
• Anomaly detection.
4/27
TS Data Mining Methods
• Methods for working with TS:
5/27
TS Data Mining Methods
• Methods for working with TS:
• TS representations,
5/27
TS Data Mining Methods
• Methods for working with TS:
• TS representations,
• TS distance measures,
5/27
TS Data Mining Methods
• Methods for working with TS:
• TS representations,
• TS distance measures,
• Tasks:
5/27
TS Data Mining Methods
• Methods for working with TS:
• TS representations,
• TS distance measures,
• Tasks:
• TS classification,
• TS clustering,
• TS forecasting,
• TS anomaly detection,
• TS indexing.
5/27
PhD. Thesis Goals
• The thesis had the goal to investigate, in the broader
context, the usage of time series data mining (analysis)
methods in order to improve the predictive performance
of machine learning methods and its combinations.
6/27
PhD. Thesis Goals
• The thesis had the goal to investigate, in the broader
context, the usage of time series data mining (analysis)
methods in order to improve the predictive performance
of machine learning methods and its combinations.
• In more detail, the goal was to investigate the usage of
various time series representations for seasonal time
series, clustering, and forecasting methods for electricity
consumption forecasting accuracy improvement.
6/27
Approach Overview
7/27
I. Time Series Representations
8/27
I. Time Series Representations
What can we do for solving problems with high-dimensional
TS?
9/27
I. Time Series Representations
What can we do for solving problems with high-dimensional
TS?
• Use time series representations!
9/27
I. Time Series Representations
What can we do for solving problems with high-dimensional
TS?
• Use time series representations!
They are excellent to:
• Reduce memory load.
• Accelerate subsequent machine learning algorithms.
• Implicitly remove noise from the data.
• Emphasize the essential characteristics of the data.
• Help to find patterns in data (or motifs).
9/27
4.00
4.25
4.50
4.75
0 500 1000
Time
Load
4.0
4.2
4.4
4.6
4.8
0 50 100 150
Length
Load
4.0
4.2
4.4
4.6
4.8
0 50 100 150
Length
Load
10/27
4.00
4.25
4.50
4.75
0 500 1000
Time
Load
4.2
4.3
4.4
4.5
4.6
0 10 20 30 40 50
Length
Load
4.2
4.4
4.6
0 100 200 300
Length
Load
11/27
I. Time Series Representations 1
I used TS representations for:
1
Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science:
Proceedings of The World Congress on Engineering and Computer Science 2016.
12/27
I. Time Series Representations 1
I used TS representations for:
• Dimensionality reduction (curse of dimensionality),
1
Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science:
Proceedings of The World Congress on Engineering and Computer Science 2016.
12/27
I. Time Series Representations 1
I used TS representations for:
• Dimensionality reduction (curse of dimensionality),
• Emphasising the main characteristics of data,
1
Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science:
Proceedings of The World Congress on Engineering and Computer Science 2016.
12/27
I. Time Series Representations 1
I used TS representations for:
• Dimensionality reduction (curse of dimensionality),
• Emphasising the main characteristics of data,
• More accurate clustering of consumers TS to create more
predictable (forecastable) groups of aggregated TS of
electricity consumption.
1
Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science:
Proceedings of The World Congress on Engineering and Computer Science 2016.
12/27
Clustered TS Representations
17 18 19 20
13 14 15 16
9 10 11 12
5 6 7 8
1 2 3 4
0 20 40 0 20 40 0 20 40 0 20 40
−1
0
1
2
3
−2
−1
0
1
2
3
−2
0
2
−2
0
2
−2
−1
0
1
2
3
−1
0
1
2
−3
−2
−1
0
1
2
−2
0
2
−2
−1
0
1
2
0
2
4
−2
0
2
4
−1
0
1
2
−2
0
2
4
−2
0
2
−2
−1
0
1
2
3
−1
0
1
2
−1
0
1
2
3
0
2
4
−2
−1
0
1
2
−2
−1
0
1
2
Length
RegressionCoefficients
13/27
Groups of Aggregated TS
17 18 19 20
13 14 15 16
9 10 11 12
5 6 7 8
1 2 3 4
0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000
−0.5
0.0
0.5
1.0
1.5
−1.5
−1.0
−0.5
0.0
0.5
1.0
−0.50
−0.25
0.00
0.25
−1
0
1
−1.0
−0.5
0.0
0.5
−1.0
−0.5
0.0
0.5
1.0
−1.0
−0.5
0.0
0.5
−1.0
−0.5
0.0
0.5
1.0
−1.0
−0.5
0.0
0.5
1.0
−0.5
0.0
0.5
1.0
0
1
0
1
2
−0.5
0.0
0.5
−1.0
−0.5
0.0
0.5
1.0
−0.5
0.0
0.5
1.0
1.5
0
1
0
1
2
0
1
2
3
4
5
−1.0
−0.5
0.0
0.5
1.0
−1
0
1
Time
NormalizedLoad
14/27
TSrepr
TSrepr - CRAN2, GitHub3
• R package for time series representations computing
• Large amount of various methods are implemented
• Several useful support functions are also included
• Easy to extend and to use
data <- rnorm(1000)
repr_paa(data, func = median, q = 10)
2
https://CRAN.R-project.org/package=TSrepr
3
https://github.com/PetoLau/TSrepr/
15/27
All type of time series representations methods are implemented, so far these:
• PAA - Piecewise Aggregate Approximation ( repr_paa )
• DWT - Discrete Wavelet Transform ( repr_dwt )
• DFT - Discrete Fourier Transform ( repr_dft )
• DCT - Discrete Cosine Transform ( repr_dct )
• PIP - Perceptually Important Points ( repr_pip )
• SAX - Symbolic Aggregate Approximation ( repr_sax )
• PLA - Piecewise Linear Approximation ( repr_pla )
• Mean seasonal profile ( repr_seas_profile )
• Model-based seasonal representations based on linear model ( repr_lm )
• FeaClip - Feature extraction from clipping representation ( repr_feaclip )
Additional useful functions are implemented as:
• Windowing ( repr_windowing )
• Matrix of representations ( repr_matrix )
• Normalisation functions - z-score ( norm_z ), min-max ( norm_min_max )
16/27
Usage of TSrepr
mat <- "some matrix with lot of time series"
mat_reprs <- repr_matrix(mat, func = repr_lm,
args = list(method = "rlm", freq = c(48, 48*7)),
normalise = TRUE, func_norm = norm_z)
mat_reprs <- repr_matrix(mat, func = repr_feaclip,
windowing = TRUE, win_size = 48)
clustering <- kmeans(mat_reprs, 20)
17/27
Simple Extensibility of TSrepr
Example #1:
library(moments)
data_ts_skew <- repr_paa(data, q = 48, func = skewness)
Example #2:
repr_fea_extract <- function(x)
c(mean(x), median(x), max(x), min(x), sd(x))
data_fea <- repr_windowing(data,
win_size = 100, func = repr_fea_extract)
18/27
II. Time Series Clustering
19/27
II. Clustering Multiple Data Streams 4
Motivation:
4
https://github.com/PetoLau/ClipStream/
20/27
II. Clustering Multiple Data Streams 4
Motivation:
• Deal with velocity of data coming,
4
https://github.com/PetoLau/ClipStream/
20/27
II. Clustering Multiple Data Streams 4
Motivation:
• Deal with velocity of data coming,
• Dynamic change of number of clusters,
4
https://github.com/PetoLau/ClipStream/
20/27
II. Clustering Multiple Data Streams 4
Motivation:
• Deal with velocity of data coming,
• Dynamic change of number of clusters,
• Automatic anomaly detection (anomalous consumers),
4
https://github.com/PetoLau/ClipStream/
20/27
II. Clustering Multiple Data Streams 4
Motivation:
• Deal with velocity of data coming,
• Dynamic change of number of clusters,
• Automatic anomaly detection (anomalous consumers),
• Automatic change detection.
4
https://github.com/PetoLau/ClipStream/
20/27
II. Clustering Multiple Data Streams 4
Motivation:
• Deal with velocity of data coming,
• Dynamic change of number of clusters,
• Automatic anomaly detection (anomalous consumers),
• Automatic change detection.
Approach:
• Take advantage of incrementality of clipped representation
(windowing),
• Fast detection of anomalous consumers from extracted features from
clipping,
• Change detection by Anderson-Darling test.
4
https://github.com/PetoLau/ClipStream/
20/27
21/27
III. Time Series Forecasting
22/27
III. Time Series Forecasting
Large number of methods suitable for forecasting:
• Time series analysis methods:
• ARIMA,
• Exponential smoothing,
• Theta,
23/27
III. Time Series Forecasting
Large number of methods suitable for forecasting:
• Time series analysis methods:
• ARIMA,
• Exponential smoothing,
• Theta,
• Regression methods:
• Linear regression, GAM,
• SVR, Gaussian process,
• Regression trees, Bagging, Random Forest, Boosting,
• Artificial Neural Networks.
23/27
III. Time Series Forecasting 5
Finding the most suitable forecasting methods with
clustering...
• STL+ARIMA, Exponential smoothing, Tree-based methods,
Advanced ANNs (S2S + LSTM nets).
5
https://github.com/PetoLau/TSMedianBasedEnsembleLearning/,
https://github.com/PetoLau/UnsupervisedEnsembles/,
https://github.com/PetoLau/DensityEnsembles/
24/27
III. Time Series Forecasting 5
Finding the most suitable forecasting methods with
clustering...
• STL+ARIMA, Exponential smoothing, Tree-based methods,
Advanced ANNs (S2S + LSTM nets).
The problem of choosing the most suitable method among the
set of methods...
Solution:
• Ensemble learning - combining forecasts.
5
https://github.com/PetoLau/TSMedianBasedEnsembleLearning/,
https://github.com/PetoLau/UnsupervisedEnsembles/,
https://github.com/PetoLau/DensityEnsembles/
24/27
Life after PhD
• I was happy to be hired by start-up PowereX.
• We solve problems strongly related with my thesis.
25/27
Life after PhD
• I was happy to be hired by start-up PowereX.
• We solve problems strongly related with my thesis.
PowereX
• P2P energy sharing - commodity and also capacity,
• Analysis of consumers smart meter data,
• Forecasting and modelling maximal load (hourly, daily,
etc.).
25/27
Differences between PhD and Business
PhD:
• Strong focus on accuracy measures - ↓ % of Mean
Absolute Percentage Error, or internal validation indexes
for clustering...
26/27
Differences between PhD and Business
PhD:
• Strong focus on accuracy measures - ↓ % of Mean
Absolute Percentage Error, or internal validation indexes
for clustering...
• Many times working with poor academic datasets.
26/27
Differences between PhD and Business
PhD:
• Strong focus on accuracy measures - ↓ % of Mean
Absolute Percentage Error, or internal validation indexes
for clustering...
• Many times working with poor academic datasets.
Business:
• Finding real value for customers,
• Accuracy is not that important,
• Working on real rich data.
26/27
Differences between PhD and Business
PhD:
• Strong focus on accuracy measures - ↓ % of Mean
Absolute Percentage Error, or internal validation indexes
for clustering...
• Many times working with poor academic datasets.
Business:
• Finding real value for customers,
• Accuracy is not that important,
• Working on real rich data.
But...they are also related and need each other...
26/27
Conclusions
TS data mining:
27/27
Conclusions
TS data mining:
• TS representations are our fiends in clustering,
forecasting, classification etc.,
27/27
Conclusions
TS data mining:
• TS representations are our fiends in clustering,
forecasting, classification etc.,
• Implemented in TSrepr package,
27/27
Conclusions
TS data mining:
• TS representations are our fiends in clustering,
forecasting, classification etc.,
• Implemented in TSrepr package,
• PhD study is great practice before work.
27/27
Conclusions
TS data mining:
• TS representations are our fiends in clustering,
forecasting, classification etc.,
• Implemented in TSrepr package,
• PhD study is great practice before work.
Questions: Peter Laurinec laurinec.peter@gmail.com
Code: https://github.com/PetoLau/
More research: https://petolau.github.io/research
Blog: https://petolau.github.io
27/27

More Related Content

Similar to Time Series Data Mining - from PhD to Startup

IRJET - Predicting the Maximum Computational Power of Microprocessors using M...
IRJET - Predicting the Maximum Computational Power of Microprocessors using M...IRJET - Predicting the Maximum Computational Power of Microprocessors using M...
IRJET - Predicting the Maximum Computational Power of Microprocessors using M...
IRJET Journal
 
Teradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_DataTeradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_Datapepeborja
 
Introduction to Statistics and Probability:
Introduction to Statistics and Probability:Introduction to Statistics and Probability:
Introduction to Statistics and Probability:
Shrihari Shrihari
 
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
IJPEDS-IAES
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
ibankuk
 
Big data
Big dataBig data
Big data
Zeeshan Khan
 
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...
Luigi Vanfretti
 
Big data
Big dataBig data
Big data
Big dataBig data
Big data
Harshit Namdev
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
1904saikrishna
 
Monitoring of Transmission and Distribution Grids using PMUs
Monitoring of Transmission and Distribution Grids using PMUsMonitoring of Transmission and Distribution Grids using PMUs
Monitoring of Transmission and Distribution Grids using PMUs
Luigi Vanfretti
 
MSPresentation_Spring2011
MSPresentation_Spring2011MSPresentation_Spring2011
MSPresentation_Spring2011Shaun Smith
 
Skillwise Big data
Skillwise Big dataSkillwise Big data
Skillwise Big data
Skillwise Group
 
Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...
Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...
Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...
Smart Villages
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
Zagazig University
 
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Aimilia-Myrsini Theologi
 
Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...
Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...
Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...
IRJET Journal
 

Similar to Time Series Data Mining - from PhD to Startup (20)

IRJET - Predicting the Maximum Computational Power of Microprocessors using M...
IRJET - Predicting the Maximum Computational Power of Microprocessors using M...IRJET - Predicting the Maximum Computational Power of Microprocessors using M...
IRJET - Predicting the Maximum Computational Power of Microprocessors using M...
 
Teradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_DataTeradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_Data
 
Ac26185187
Ac26185187Ac26185187
Ac26185187
 
Introduction to Statistics and Probability:
Introduction to Statistics and Probability:Introduction to Statistics and Probability:
Introduction to Statistics and Probability:
 
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
 
Big data
Big dataBig data
Big data
 
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode...
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
 
presentation_btp
presentation_btppresentation_btp
presentation_btp
 
Monitoring of Transmission and Distribution Grids using PMUs
Monitoring of Transmission and Distribution Grids using PMUsMonitoring of Transmission and Distribution Grids using PMUs
Monitoring of Transmission and Distribution Grids using PMUs
 
MSPresentation_Spring2011
MSPresentation_Spring2011MSPresentation_Spring2011
MSPresentation_Spring2011
 
Energy saving policies final
Energy saving policies finalEnergy saving policies final
Energy saving policies final
 
Skillwise Big data
Skillwise Big dataSkillwise Big data
Skillwise Big data
 
Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...
Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...
Panama | May 2017 | Lecciones y factores de éxito - Microrredes Rurales Fotov...
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
 
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms...
 
Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...
Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...
Wide Area Monitoring, Protection and Control (WAMPAC) Application in Transmis...
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 

Time Series Data Mining - from PhD to Startup

  • 1. Time Series Data Mining - from PhD to Startup Peter Laurinec October 27, 2018
  • 2. Highlights Time series data mining - from PhD to start-up: 1/27
  • 3. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), 1/27
  • 4. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), • TS data mining methods, 1/27
  • 5. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), • TS data mining methods, • PhD. study thesis - combining and developing TS data mining methods, 1/27
  • 6. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), • TS data mining methods, • PhD. study thesis - combining and developing TS data mining methods, • TSrepr R package - TS representations, 1/27
  • 7. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), • TS data mining methods, • PhD. study thesis - combining and developing TS data mining methods, • TSrepr R package - TS representations, • Work after Phd - energy start-up, 1/27
  • 8. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), • TS data mining methods, • PhD. study thesis - combining and developing TS data mining methods, • TSrepr R package - TS representations, • Work after Phd - energy start-up, • Differences and my thoughts, 1/27
  • 9. Highlights Time series data mining - from PhD to start-up: • Problems and solutions for using large amount of long time series (TS), • TS data mining methods, • PhD. study thesis - combining and developing TS data mining methods, • TSrepr R package - TS representations, • Work after Phd - energy start-up, • Differences and my thoughts, • What we do there... 1/27
  • 10. Time Series Data in Energetics Smart metering • Measuring electricity consumption or production (photovoltaic panels) from every consumer or producer (together prosumer) every 5, 15, or 30 minutes, • This creates a large amount of time series data, • 3 years of data from consumer 96*365*3 = 105120...from 10 thousand consumers... > 1 billion rows of multiple columns, • Smart grid - set of consumers and producers, 2/27
  • 11. Time Series Data in Energetics Smart metering • Measuring electricity consumption or production (photovoltaic panels) from every consumer or producer (together prosumer) every 5, 15, or 30 minutes, • This creates a large amount of time series data, • 3 years of data from consumer 96*365*3 = 105120...from 10 thousand consumers... > 1 billion rows of multiple columns, • Smart grid - set of consumers and producers, Characteristics: • High-dimensionality, • Multiple seasonalities (daily, weekly, yearly), • Large amount of stochastic factors as: weather, holidays, black-outs, changes on market etc. 2/27
  • 12. Examples of Consumers TS 0 250 500 750 1000 20 40 60 0 5 10 15 0.0 0.5 1.0 1.5 2.0 0 1 2 Time (3 weeks) ElectricityConsumption(kW) 3/27
  • 13. Typical Use Cases • Forecasting el. consumption or production - market planning, black-outs prevention etc., 4/27
  • 14. Typical Use Cases • Forecasting el. consumption or production - market planning, black-outs prevention etc., • Extract typical profiles of consumption - changes in tariffs, create new ones etc., 4/27
  • 15. Typical Use Cases • Forecasting el. consumption or production - market planning, black-outs prevention etc., • Extract typical profiles of consumption - changes in tariffs, create new ones etc., • Optimizing electricity consumption of some consumer, 4/27
  • 16. Typical Use Cases • Forecasting el. consumption or production - market planning, black-outs prevention etc., • Extract typical profiles of consumption - changes in tariffs, create new ones etc., • Optimizing electricity consumption of some consumer, • Optimizing whole smart grid, 4/27
  • 17. Typical Use Cases • Forecasting el. consumption or production - market planning, black-outs prevention etc., • Extract typical profiles of consumption - changes in tariffs, create new ones etc., • Optimizing electricity consumption of some consumer, • Optimizing whole smart grid, • Monitoring smart grid, 4/27
  • 18. Typical Use Cases • Forecasting el. consumption or production - market planning, black-outs prevention etc., • Extract typical profiles of consumption - changes in tariffs, create new ones etc., • Optimizing electricity consumption of some consumer, • Optimizing whole smart grid, • Monitoring smart grid, • Anomaly detection. 4/27
  • 19. TS Data Mining Methods • Methods for working with TS: 5/27
  • 20. TS Data Mining Methods • Methods for working with TS: • TS representations, 5/27
  • 21. TS Data Mining Methods • Methods for working with TS: • TS representations, • TS distance measures, 5/27
  • 22. TS Data Mining Methods • Methods for working with TS: • TS representations, • TS distance measures, • Tasks: 5/27
  • 23. TS Data Mining Methods • Methods for working with TS: • TS representations, • TS distance measures, • Tasks: • TS classification, • TS clustering, • TS forecasting, • TS anomaly detection, • TS indexing. 5/27
  • 24. PhD. Thesis Goals • The thesis had the goal to investigate, in the broader context, the usage of time series data mining (analysis) methods in order to improve the predictive performance of machine learning methods and its combinations. 6/27
  • 25. PhD. Thesis Goals • The thesis had the goal to investigate, in the broader context, the usage of time series data mining (analysis) methods in order to improve the predictive performance of machine learning methods and its combinations. • In more detail, the goal was to investigate the usage of various time series representations for seasonal time series, clustering, and forecasting methods for electricity consumption forecasting accuracy improvement. 6/27
  • 27. I. Time Series Representations 8/27
  • 28. I. Time Series Representations What can we do for solving problems with high-dimensional TS? 9/27
  • 29. I. Time Series Representations What can we do for solving problems with high-dimensional TS? • Use time series representations! 9/27
  • 30. I. Time Series Representations What can we do for solving problems with high-dimensional TS? • Use time series representations! They are excellent to: • Reduce memory load. • Accelerate subsequent machine learning algorithms. • Implicitly remove noise from the data. • Emphasize the essential characteristics of the data. • Help to find patterns in data (or motifs). 9/27
  • 31. 4.00 4.25 4.50 4.75 0 500 1000 Time Load 4.0 4.2 4.4 4.6 4.8 0 50 100 150 Length Load 4.0 4.2 4.4 4.6 4.8 0 50 100 150 Length Load 10/27
  • 32. 4.00 4.25 4.50 4.75 0 500 1000 Time Load 4.2 4.3 4.4 4.5 4.6 0 10 20 30 40 50 Length Load 4.2 4.4 4.6 0 100 200 300 Length Load 11/27
  • 33. I. Time Series Representations 1 I used TS representations for: 1 Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016. 12/27
  • 34. I. Time Series Representations 1 I used TS representations for: • Dimensionality reduction (curse of dimensionality), 1 Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016. 12/27
  • 35. I. Time Series Representations 1 I used TS representations for: • Dimensionality reduction (curse of dimensionality), • Emphasising the main characteristics of data, 1 Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016. 12/27
  • 36. I. Time Series Representations 1 I used TS representations for: • Dimensionality reduction (curse of dimensionality), • Emphasising the main characteristics of data, • More accurate clustering of consumers TS to create more predictable (forecastable) groups of aggregated TS of electricity consumption. 1 Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016. 12/27
  • 37. Clustered TS Representations 17 18 19 20 13 14 15 16 9 10 11 12 5 6 7 8 1 2 3 4 0 20 40 0 20 40 0 20 40 0 20 40 −1 0 1 2 3 −2 −1 0 1 2 3 −2 0 2 −2 0 2 −2 −1 0 1 2 3 −1 0 1 2 −3 −2 −1 0 1 2 −2 0 2 −2 −1 0 1 2 0 2 4 −2 0 2 4 −1 0 1 2 −2 0 2 4 −2 0 2 −2 −1 0 1 2 3 −1 0 1 2 −1 0 1 2 3 0 2 4 −2 −1 0 1 2 −2 −1 0 1 2 Length RegressionCoefficients 13/27
  • 38. Groups of Aggregated TS 17 18 19 20 13 14 15 16 9 10 11 12 5 6 7 8 1 2 3 4 0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000 −0.5 0.0 0.5 1.0 1.5 −1.5 −1.0 −0.5 0.0 0.5 1.0 −0.50 −0.25 0.00 0.25 −1 0 1 −1.0 −0.5 0.0 0.5 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 −0.5 0.0 0.5 1.0 0 1 0 1 2 −0.5 0.0 0.5 −1.0 −0.5 0.0 0.5 1.0 −0.5 0.0 0.5 1.0 1.5 0 1 0 1 2 0 1 2 3 4 5 −1.0 −0.5 0.0 0.5 1.0 −1 0 1 Time NormalizedLoad 14/27
  • 39. TSrepr TSrepr - CRAN2, GitHub3 • R package for time series representations computing • Large amount of various methods are implemented • Several useful support functions are also included • Easy to extend and to use data <- rnorm(1000) repr_paa(data, func = median, q = 10) 2 https://CRAN.R-project.org/package=TSrepr 3 https://github.com/PetoLau/TSrepr/ 15/27
  • 40. All type of time series representations methods are implemented, so far these: • PAA - Piecewise Aggregate Approximation ( repr_paa ) • DWT - Discrete Wavelet Transform ( repr_dwt ) • DFT - Discrete Fourier Transform ( repr_dft ) • DCT - Discrete Cosine Transform ( repr_dct ) • PIP - Perceptually Important Points ( repr_pip ) • SAX - Symbolic Aggregate Approximation ( repr_sax ) • PLA - Piecewise Linear Approximation ( repr_pla ) • Mean seasonal profile ( repr_seas_profile ) • Model-based seasonal representations based on linear model ( repr_lm ) • FeaClip - Feature extraction from clipping representation ( repr_feaclip ) Additional useful functions are implemented as: • Windowing ( repr_windowing ) • Matrix of representations ( repr_matrix ) • Normalisation functions - z-score ( norm_z ), min-max ( norm_min_max ) 16/27
  • 41. Usage of TSrepr mat <- "some matrix with lot of time series" mat_reprs <- repr_matrix(mat, func = repr_lm, args = list(method = "rlm", freq = c(48, 48*7)), normalise = TRUE, func_norm = norm_z) mat_reprs <- repr_matrix(mat, func = repr_feaclip, windowing = TRUE, win_size = 48) clustering <- kmeans(mat_reprs, 20) 17/27
  • 42. Simple Extensibility of TSrepr Example #1: library(moments) data_ts_skew <- repr_paa(data, q = 48, func = skewness) Example #2: repr_fea_extract <- function(x) c(mean(x), median(x), max(x), min(x), sd(x)) data_fea <- repr_windowing(data, win_size = 100, func = repr_fea_extract) 18/27
  • 43. II. Time Series Clustering 19/27
  • 44. II. Clustering Multiple Data Streams 4 Motivation: 4 https://github.com/PetoLau/ClipStream/ 20/27
  • 45. II. Clustering Multiple Data Streams 4 Motivation: • Deal with velocity of data coming, 4 https://github.com/PetoLau/ClipStream/ 20/27
  • 46. II. Clustering Multiple Data Streams 4 Motivation: • Deal with velocity of data coming, • Dynamic change of number of clusters, 4 https://github.com/PetoLau/ClipStream/ 20/27
  • 47. II. Clustering Multiple Data Streams 4 Motivation: • Deal with velocity of data coming, • Dynamic change of number of clusters, • Automatic anomaly detection (anomalous consumers), 4 https://github.com/PetoLau/ClipStream/ 20/27
  • 48. II. Clustering Multiple Data Streams 4 Motivation: • Deal with velocity of data coming, • Dynamic change of number of clusters, • Automatic anomaly detection (anomalous consumers), • Automatic change detection. 4 https://github.com/PetoLau/ClipStream/ 20/27
  • 49. II. Clustering Multiple Data Streams 4 Motivation: • Deal with velocity of data coming, • Dynamic change of number of clusters, • Automatic anomaly detection (anomalous consumers), • Automatic change detection. Approach: • Take advantage of incrementality of clipped representation (windowing), • Fast detection of anomalous consumers from extracted features from clipping, • Change detection by Anderson-Darling test. 4 https://github.com/PetoLau/ClipStream/ 20/27
  • 50. 21/27
  • 51. III. Time Series Forecasting 22/27
  • 52. III. Time Series Forecasting Large number of methods suitable for forecasting: • Time series analysis methods: • ARIMA, • Exponential smoothing, • Theta, 23/27
  • 53. III. Time Series Forecasting Large number of methods suitable for forecasting: • Time series analysis methods: • ARIMA, • Exponential smoothing, • Theta, • Regression methods: • Linear regression, GAM, • SVR, Gaussian process, • Regression trees, Bagging, Random Forest, Boosting, • Artificial Neural Networks. 23/27
  • 54. III. Time Series Forecasting 5 Finding the most suitable forecasting methods with clustering... • STL+ARIMA, Exponential smoothing, Tree-based methods, Advanced ANNs (S2S + LSTM nets). 5 https://github.com/PetoLau/TSMedianBasedEnsembleLearning/, https://github.com/PetoLau/UnsupervisedEnsembles/, https://github.com/PetoLau/DensityEnsembles/ 24/27
  • 55. III. Time Series Forecasting 5 Finding the most suitable forecasting methods with clustering... • STL+ARIMA, Exponential smoothing, Tree-based methods, Advanced ANNs (S2S + LSTM nets). The problem of choosing the most suitable method among the set of methods... Solution: • Ensemble learning - combining forecasts. 5 https://github.com/PetoLau/TSMedianBasedEnsembleLearning/, https://github.com/PetoLau/UnsupervisedEnsembles/, https://github.com/PetoLau/DensityEnsembles/ 24/27
  • 56. Life after PhD • I was happy to be hired by start-up PowereX. • We solve problems strongly related with my thesis. 25/27
  • 57. Life after PhD • I was happy to be hired by start-up PowereX. • We solve problems strongly related with my thesis. PowereX • P2P energy sharing - commodity and also capacity, • Analysis of consumers smart meter data, • Forecasting and modelling maximal load (hourly, daily, etc.). 25/27
  • 58. Differences between PhD and Business PhD: • Strong focus on accuracy measures - ↓ % of Mean Absolute Percentage Error, or internal validation indexes for clustering... 26/27
  • 59. Differences between PhD and Business PhD: • Strong focus on accuracy measures - ↓ % of Mean Absolute Percentage Error, or internal validation indexes for clustering... • Many times working with poor academic datasets. 26/27
  • 60. Differences between PhD and Business PhD: • Strong focus on accuracy measures - ↓ % of Mean Absolute Percentage Error, or internal validation indexes for clustering... • Many times working with poor academic datasets. Business: • Finding real value for customers, • Accuracy is not that important, • Working on real rich data. 26/27
  • 61. Differences between PhD and Business PhD: • Strong focus on accuracy measures - ↓ % of Mean Absolute Percentage Error, or internal validation indexes for clustering... • Many times working with poor academic datasets. Business: • Finding real value for customers, • Accuracy is not that important, • Working on real rich data. But...they are also related and need each other... 26/27
  • 63. Conclusions TS data mining: • TS representations are our fiends in clustering, forecasting, classification etc., 27/27
  • 64. Conclusions TS data mining: • TS representations are our fiends in clustering, forecasting, classification etc., • Implemented in TSrepr package, 27/27
  • 65. Conclusions TS data mining: • TS representations are our fiends in clustering, forecasting, classification etc., • Implemented in TSrepr package, • PhD study is great practice before work. 27/27
  • 66. Conclusions TS data mining: • TS representations are our fiends in clustering, forecasting, classification etc., • Implemented in TSrepr package, • PhD study is great practice before work. Questions: Peter Laurinec laurinec.peter@gmail.com Code: https://github.com/PetoLau/ More research: https://petolau.github.io/research Blog: https://petolau.github.io 27/27