This talk focuses on the differences between doing research and applying time series data mining to real business problems on real, rich data. I will discuss why research and business need to be related, and also where they diverge. Typical time series data mining tasks in the energy sector will be shown, with use cases in R.
This paper presents an artificial intelligence approach using an artificial neural network (ANN) and a random forest (RF) to perform short-term photovoltaic (PV) output current forecasting (STPCF) for the next 24 hours. The input data for the ANN and RF consist of multiple time lags of hourly solar irradiance, temperature, hour, power, and current, used to determine the movement pattern of data that have been denoised using wavelet decomposition. The Levenberg-Marquardt optimization technique is used as the back-propagation algorithm for the ANN, and bagging-based bootstrapping is used in the RF to improve the forecasting results. The PV output current data are obtained from Green Energy Research (GERC), University Technology Mara, Shah Alam, Malaysia, and are used as the case study for estimating the PV output current for the next 24 hours. The results show that both proposed techniques are able to forecast future hourly PV output current with low error.
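The abstract's core input construction — feeding a model the previous values of each hourly signal as "time lags" — can be sketched in plain Python. This is a generic illustration of lag-feature construction, not the paper's actual pipeline (the wavelet denoising and ANN/RF training are omitted), and the toy values are invented:

```python
# Sketch: build a lagged input matrix for a forecasting model.
# Each training row holds the n_lags values preceding the target value.

def make_lag_matrix(series, n_lags):
    """Return (X, y): X[i] is the window of n_lags values before y[i]."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

current = [0.0, 0.5, 1.2, 2.0, 2.4, 2.1, 1.3, 0.6]  # toy hourly PV current
X, y = make_lag_matrix(current, n_lags=3)
print(X[0], y[0])  # first row: [0.0, 0.5, 1.2] -> 2.0
```

The same construction would be applied to each input signal (irradiance, temperature, power, current) before stacking them into the model's feature matrix.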
Presentation for the Softskills Seminar course @ Telecom ParisTech. The topic is the paper by Domingos and Hulten, "Mining High-Speed Data Streams". Presented by me on 30/11/2017.
Show and Tell - Data and Digitalisation, Digital Twins (SIF, Ofgem)
This is the third in a series of 'Show and Tell' webinars from the Ofgem Strategic Innovation Fund Discovery phase, covering the Digital Twin projects.
As the move towards a net zero energy system accelerates, network customers and consumers will require simplified and accessible digital products, processes and services that can improve their user experience. Data and digital initiatives are already beginning to show the potential to improve the efficiency of energy networks whilst making it easier for third parties to interact with and innovate for the energy system. Digitalisation of energy network activities will contribute to better coordination, planning and network optimisation.
You will hear from SIF projects which are investigating new digital products and services such as digital twins.
The Strategic Innovation Fund (SIF) is an Ofgem programme managed in partnership with Innovate UK, part of UKRI. The SIF aims to fund network innovation that will contribute to achieving Net Zero rapidly and at lowest cost to consumers, and help transform the UK into the ‘Silicon Valley’ of energy, making it the best place for high-potential businesses to grow and scale in the energy market.
For more information on the SIF visit: www.ofgem.gov.uk/sif
Or sign up for our newsletter here: https://ukri.innovateuk.org/ofgem-sif-subscription-sign-up
This presentation on Process Analysis using 3D plots was given at Emerson Exchange 2010. Details are provided on a field trial in which the DeltaV historian was modified to support array parameters and a web-enabled interface was used to provide a 3D plot of array data. Information is provided on how this was used to look at absorber column and stripper column temperature profiles.
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu... (IJPEDS-IAES)
Power electronic circuit simulation today has become increasingly demanding in both speed and accuracy. Whilst almost every simulator has its own advantages and disadvantages, co-simulations are becoming more prevalent. This paper provides an overview of the co-simulation capabilities of device-level circuit simulators. More specifically, device-level simulators and their salient features are compared and contrasted. The co-simulation interfaces between several simulation tools are discussed. A case study is presented to demonstrate co-simulation between a device-level simulator (PSIM), a system-level simulator (Simulink), and a finite element simulation tool (FLUX). Results demonstrate the necessity and convenience, as well as the drawbacks, of such a comprehensive simulation.
The RaPId Toolbox for Parameter Identification and Model Validation: How Mode... (Luigi Vanfretti)
RaPId is a recursive acronym for Rapid Parameter Identification. The toolbox was built within WP3 of the FP7 iTesla project. It uses Modelica models compiled in FMUs compliant with the FMI standard, which are imported into Simulink using the FMI Toolbox for Matlab/Simulink from Modelon. Within the Matlab environment, we have developed a plug-in architecture that lets the user choose many different (or even their own) optimization solvers for parameter calibration. Not to mention, you can choose any simulation solver available in Simulink (not just trapezoidal integration!)
Big data refers to data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Hadoop training in Chennai from BigDataTraining.IN, a leading global talent development corporation building a skilled manpower pool for global industry requirements. BigDataTraining.IN has grown to be among the world's leading talent development companies, offering learning solutions to individuals, institutions, and corporate clients.
Monitoring of Transmission and Distribution Grids using PMUs (Luigi Vanfretti)
My presentation on "Monitoring of Transmission and Distribution Grids using PMUs" for the Workshop on Energy Business Opportunities in NY State.
The Center for Integrated Electrical Energy Systems (CIEES) at Stony Brook University and the Center for Future Energy Systems (CFES) at Rensselaer Polytechnic Institute will be holding a one day Workshop on Energy Business Opportunities in NY State.
Metaheuristics-based Optimal Reactive Power Management in Offshore Wind Farms... (Aimilia-Myrsini Theologi)
The aim of the thesis is to optimally coordinate the reactive power sources in offshore wind farms in a predictive manner, based on the principle of minimizing the wind farm power losses as well as the variations of the transformer tap positions. First, an accurate neural-network-based wind speed forecasting algorithm was developed to counteract the uncertainty of the wind; then the optimal management of the available reactive sources is tackled by a metaheuristics-based method. Two cases were investigated: a far-offshore wind farm with an HVDC interconnection link, and the AC-connected Dutch wind farm BORSSELE.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Learn SQL from Basic Queries to Advanced Queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
• Why do we need yet another (open-source) Copilot?
• How can we build one?
• Architecture and evaluation
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is...
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
9. Highlights
Time series data mining - from PhD to start-up:
• Problems and solutions for using large amounts of long time series (TS),
• TS data mining methods,
• PhD thesis - combining and developing TS data mining methods,
• TSrepr R package - TS representations,
• Work after the PhD - an energy start-up,
• Differences and my thoughts,
• What we do there...
11. Time Series Data in Energetics
Smart metering
• Measuring electricity consumption or production (photovoltaic panels) from every consumer or producer (together: prosumer) every 5, 15, or 30 minutes,
• This creates a large amount of time series data,
• 3 years of data from one consumer: 96*365*3 = 105,120 readings... from 10 thousand consumers... > 1 billion rows with multiple columns,
• Smart grid - a set of consumers and producers.
Characteristics:
• High dimensionality,
• Multiple seasonalities (daily, weekly, yearly),
• A large number of stochastic factors such as weather, holidays, black-outs, market changes, etc.
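The slide's back-of-the-envelope arithmetic (15-minute metering gives 96 readings per day) can be checked in a few lines of Python:

```python
# 15-minute smart metering: 96 readings per day per consumer.
readings_per_day = 96
per_consumer_3y = readings_per_day * 365 * 3   # three years of one consumer
rows = per_consumer_3y * 10_000                # scale up to 10 thousand consumers

print(per_consumer_3y)  # 105120 readings per consumer
print(rows)             # 1051200000 -> just over 1 billion rows
```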
18. Typical Use Cases
• Forecasting electricity consumption or production - market planning, black-out prevention, etc.,
• Extracting typical consumption profiles - changing tariffs, creating new ones, etc.,
• Optimizing the electricity consumption of a consumer,
• Optimizing the whole smart grid,
• Monitoring the smart grid,
• Anomaly detection.
23. TS Data Mining Methods
• Methods for working with TS:
• TS representations,
• TS distance measures,
• Tasks:
• TS classification,
• TS clustering,
• TS forecasting,
• TS anomaly detection,
• TS indexing.
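As a concrete illustration of the "TS distance measures" bullet, here is a minimal dynamic time warping (DTW) distance, one classic TS distance measure alongside plain Euclidean distance. This is a generic pure-Python sketch (O(n·m) dynamic programming), not code from the talk:

```python
import math

def dtw(a, b):
    """DTW distance between two series: cheapest monotone alignment cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]

# Time-shifted copies of the same shape: DTW is 0, Euclidean would not be.
print(dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))  # -> 0.0
```

Distances like this (or cheaper elastic measures) are what the clustering and classification tasks in the list above are built on.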
25. PhD Thesis Goals
• The thesis had the goal of investigating, in a broader context, the usage of time series data mining (analysis) methods in order to improve the predictive performance of machine learning methods and their combinations.
• In more detail, the goal was to investigate the usage of various time series representations for seasonal time series, and of clustering and forecasting methods, to improve electricity consumption forecasting accuracy.
30. I. Time Series Representations
What can we do to tackle problems with high-dimensional TS?
• Use time series representations!
They are excellent to:
• Reduce the memory load.
• Accelerate subsequent machine learning algorithms.
• Implicitly remove noise from the data.
• Emphasize the essential characteristics of the data.
• Help to find patterns (or motifs) in the data.
36. I. Time Series Representations [1]
I used TS representations for:
• Dimensionality reduction (curse of dimensionality),
• Emphasising the main characteristics of data,
• More accurate clustering of consumers' TS to create more predictable (forecastable) groups of aggregated TS of electricity consumption.
[1] Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016.
39. TSrepr
TSrepr - CRAN [2], GitHub [3]
• R package for computing time series representations
• A large number of methods are implemented
• Several useful support functions are also included
• Easy to use and to extend
data <- rnorm(1000)
repr_paa(data, func = median, q = 10)
[2] https://CRAN.R-project.org/package=TSrepr
[3] https://github.com/PetoLau/TSrepr/
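For readers outside R, the `repr_paa` call above can be mimicked in plain Python. This is a hedged sketch of Piecewise Aggregate Approximation, assuming `q` is the window length (as I read the TSrepr call), with the aggregate function pluggable just like `func = median` on the slide:

```python
from statistics import median

def paa(series, q, func=median):
    """Piecewise Aggregate Approximation: one aggregate per window of length q."""
    return [func(series[i:i + q]) for i in range(0, len(series), q)]

series = list(range(1000))     # stand-in for rnorm(1000)
reduced = paa(series, q=10)
print(len(reduced))            # 1000 points -> 100-point representation
```

The 10x shorter series is what makes the downstream clustering and forecasting steps cheaper.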
40. Many types of time series representation methods are implemented, so far these:
• PAA - Piecewise Aggregate Approximation ( repr_paa )
• DWT - Discrete Wavelet Transform ( repr_dwt )
• DFT - Discrete Fourier Transform ( repr_dft )
• DCT - Discrete Cosine Transform ( repr_dct )
• PIP - Perceptually Important Points ( repr_pip )
• SAX - Symbolic Aggregate Approximation ( repr_sax )
• PLA - Piecewise Linear Approximation ( repr_pla )
• Mean seasonal profile ( repr_seas_profile )
• Model-based seasonal representations based on a linear model ( repr_lm )
• FeaClip - feature extraction from the clipping representation ( repr_feaclip )
Additional useful functions are implemented, such as:
• Windowing ( repr_windowing )
• Matrix of representations ( repr_matrix )
• Normalisation functions - z-score ( norm_z ), min-max ( norm_min_max )
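One of the simplest entries in the list, the mean seasonal profile, can be sketched in a few lines of Python: average the values at each position within the seasonal cycle (e.g. `freq = 48` half-hourly slots per day). This is a generic illustration, not the `repr_seas_profile` implementation:

```python
from statistics import mean

def seasonal_profile(series, freq, func=mean):
    """One aggregated value per position in the seasonal cycle of length freq."""
    return [func(series[i::freq]) for i in range(freq)]

# Two toy "days" with freq = 4 positions per day:
print(seasonal_profile([1, 2, 3, 4, 3, 2, 1, 0], freq=4))
```

The result is a `freq`-length profile, a drastic dimensionality reduction for long seasonal series.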
41. Usage of TSrepr
# mat - a matrix with many time series
mat_reprs <- repr_matrix(mat, func = repr_lm,
                         args = list(method = "rlm", freq = c(48, 48*7)),
                         normalise = TRUE, func_norm = norm_z)
mat_reprs <- repr_matrix(mat, func = repr_feaclip,
                         windowing = TRUE, win_size = 48)
clustering <- kmeans(mat_reprs, 20)
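The shape of the `repr_matrix` workflow above (normalise every series, then replace each one by its representation) translates directly to plain Python. This is a hedged analogue, with mean-based PAA standing in for `repr_lm` / `repr_feaclip`, and the function and parameter names are my own, not TSrepr's:

```python
from statistics import mean, stdev

def norm_z(series):
    """Z-score normalisation of one series."""
    m, s = mean(series), stdev(series)
    return [(x - m) / s for x in series]

def repr_row(series, q):
    """Mean-based PAA of one series with window length q."""
    return [mean(series[i:i + q]) for i in range(0, len(series), q)]

def repr_matrix_py(mat, q, normalise=True):
    """Normalise each row, then compute its representation."""
    rows = [norm_z(r) if normalise else r for r in mat]
    return [repr_row(r, q) for r in rows]

mat = [[1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]]
reprs = repr_matrix_py(mat, q=4)
print(len(reprs), len(reprs[0]))  # 2 series -> 2 rows of 2 values each
```

The resulting representation matrix is what gets handed to k-means, exactly as in the `kmeans(mat_reprs, 20)` call on the slide.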
49. II. Clustering Multiple Data Streams [4]
Motivation:
• Deal with the velocity of incoming data,
• Dynamically change the number of clusters,
• Automatic anomaly detection (anomalous consumers),
• Automatic change detection.
Approach:
• Take advantage of the incrementality of the clipped representation (windowing),
• Fast detection of anomalous consumers from features extracted from clipping,
• Change detection by the Anderson-Darling test.
[4] https://github.com/PetoLau/ClipStream/
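The general idea behind a clipped representation, which FeaClip and the approach above build on, is easy to sketch: binarise a window against its mean, then summarise the runs of 0s and 1s. This is a generic sketch of that idea, not the exact FeaClip feature set:

```python
from statistics import mean

def clip(window):
    """Clipped representation: 1 where the value is above the window mean, else 0."""
    m = mean(window)
    return [1 if x > m else 0 for x in window]

def run_lengths(bits):
    """Run-length encoding of the clipped series, e.g. [0,0,1,1,1] -> [(0,2),(1,3)]."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

window = [1, 1, 5, 6, 5, 1, 1, 1]            # one consumption window
print(run_lengths(clip(window)))             # run statistics as cheap features
```

Because each new window can be clipped independently, features like these update incrementally as data streams in, which is the "incrementality" bullet above.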
53. III. Time Series Forecasting
A large number of methods are suitable for forecasting:
• Time series analysis methods:
• ARIMA,
• Exponential smoothing,
• Theta,
• Regression methods:
• Linear regression, GAM,
• SVR, Gaussian processes,
• Regression trees, bagging, random forests, boosting,
• Artificial neural networks.
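Before any of the methods listed above, the usual baseline for seasonal load data is the seasonal-naive forecast: predict each future value with the value one season earlier (e.g. 48 half-hours = one day). A minimal sketch, with invented toy data:

```python
def seasonal_naive(series, freq, horizon):
    """Forecast horizon steps ahead by repeating the last full season."""
    return [series[len(series) - freq + (h % freq)] for h in range(horizon)]

history = [10, 20, 30, 40] * 7   # a week of a daily pattern, freq = 4
print(seasonal_naive(history, freq=4, horizon=4))  # -> [10, 20, 30, 40]
```

Any ARIMA, GAM, or tree-based model is only interesting to the extent that it beats this trivial benchmark.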
55. III. Time Series Forecasting [5]
Finding the most suitable forecasting methods with clustering...
• STL+ARIMA, exponential smoothing, tree-based methods, advanced ANNs (S2S + LSTM nets).
The problem of choosing the most suitable method among the set of methods...
Solution:
• Ensemble learning - combining forecasts.
[5] https://github.com/PetoLau/TSMedianBasedEnsembleLearning/, https://github.com/PetoLau/UnsupervisedEnsembles/, https://github.com/PetoLau/DensityEnsembles/
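The simplest way to combine forecasts, suggested by the `TSMedianBasedEnsembleLearning` repository name above (the method there may differ in detail), is a per-step median across methods:

```python
from statistics import median

def median_ensemble(forecasts):
    """forecasts: equal-length forecast lists, one per method.
    The ensemble forecast at each horizon step is the median across methods."""
    return [median(step) for step in zip(*forecasts)]

arima = [10.0, 11.0, 12.0]
ets   = [11.0, 12.0, 13.0]
trees = [30.0, 12.5, 11.0]   # one badly-off value barely moves the median
print(median_ensemble([arima, ets, trees]))  # -> [11.0, 12.0, 12.0]
```

The median's robustness to a single wildly wrong method is exactly why it is a popular combiner when no single method dominates.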
57. Life after the PhD
• I was happy to be hired by the start-up PowereX.
• We solve problems strongly related to my thesis.
PowereX
• P2P energy sharing - commodity and also capacity,
• Analysis of consumers' smart meter data,
• Forecasting and modelling maximal load (hourly, daily, etc.).
61. Differences between PhD and Business
PhD:
• Strong focus on accuracy measures - lowering the Mean Absolute Percentage Error (%), or internal validation indexes for clustering...
• Often working with poor academic datasets.
Business:
• Finding real value for customers,
• Accuracy is not that important,
• Working on real, rich data.
But... they are also related and need each other...
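For reference, the accuracy measure named above is computed as MAPE = (100/n) · Σ |actual − forecast| / |actual|, which is a one-liner in Python:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actuals must be non-zero)."""
    return 100.0 / len(actual) * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast))

print(mape([100, 200, 400], [110, 180, 400]))  # (10% + 10% + 0%) / 3 ≈ 6.67
```

Note the caveat driving the "accuracy is not that important" bullet: a lower MAPE does not by itself tell a customer anything about realised value.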
66. Conclusions
TS data mining:
• TS representations are our friends in clustering, forecasting, classification, etc.,
• Implemented in the TSrepr package,
• A PhD study is great practice before work.
Questions: Peter Laurinec laurinec.peter@gmail.com
Code: https://github.com/PetoLau/
More research: https://petolau.github.io/research
Blog: https://petolau.github.io