This document discusses time series forecasting techniques for multivariate and hierarchical time series data. It presents several cases involving energy consumption forecasting, sales forecasting, and freight transportation forecasting. For each case, it describes the time series data and components, discusses feature generation methods like nonparametric transformations and the Haar wavelet transform to extract features, and evaluates different forecasting models and their ability to generate consistent forecasts while respecting any hierarchical relationships in the data. The focus is on generating accurate forecasts while maintaining properties like consistency, minimizing errors, and handling complex time series structures.
Convex Optimization Modelling with CVXOPT (andrewmart11)
An introduction to convex optimization modelling using cvxopt in an IPython environment. The facility location problem is used as an example to demonstrate modelling in cvxopt.
Information Matrices and Optimality Values for various Block Designs (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
The Bayesian paradigm provides a coherent approach for quantifying uncertainty given available data and prior information. Aspects of uncertainty that arise in practice include uncertainty regarding parameters within a model, the choice of model, and propagation of uncertainty in parameters and models for predictions. In this talk I will present Bayesian approaches for addressing model uncertainty given a collection of competing models including model averaging and ensemble methods that potentially use all available models and will highlight computational challenges that arise in implementation of the paradigm.
The Delta Of An Arithmetic Asian Option Via The Pathwise Method (Anna Borisova)
The slides present a structured description of the methods that can be used to calculate the delta of an Asian option. Only European-style options are considered. The reference list is included as the last slide. Enjoy the presentation!
Stochastic Differential Equations: Application to Pension Funds under Adverse... (Marius García Meza)
Shown at #MCDE2014, Magno Coloquio de Doctorantes en Economía 2014, Mexico City, November 2014.
Rights Reserved to Escuela Superior de Economía, Instituto Politécnico Nacional
Inferring cause-effect relationships between variables is of primary importance in many sciences. In this talk, I will discuss two approaches for making valid inference on treatment effects when a large number of covariates are present. The first approach is to perform model selection and then to deliver inference based on the selected model. If the inference is made ignoring the randomness of the model selection process, then there could be severe biases in estimating the parameters of interest. While the estimation bias in an under-fitted model is well understood, I will address a lesser known bias that arises from an over-fitted model. The over-fitting bias can be eliminated through data splitting at the cost of statistical efficiency, and I will propose a repeated data splitting approach to mitigate the efficiency loss. The second approach concerns the existing methods for debiased inference. I will show that the debiasing approach is an extension of OLS to high dimensions, and that a careful bias analysis leads to an improvement to further control the bias. The comparison between these two approaches provides insights into their intrinsic bias-variance trade-off, and I will show that the debiasing approach may lose efficiency in observational studies.
Strategies for Sensor Data Aggregation in Support of Emergency Response (Michele Weigle)
Presented by Xianping Wang
Military Communications Conference (MILCOM)
October 6-8, 2014
Baltimore, MD
Xianping Wang, Aaron Walden, Michele C. Weigle and Stephan Olariu, "Strategies for Sensor Data Aggregation in Support of Emergency Response," In Proceedings of the Military Communications Conference (MILCOM). Baltimore, MD, October 2014.
Quarterly economic accounts: methodological advances and prospects for innovation
Seminar
Rome, 21 April 2016
Istat, Aula Magna
Via Cesare Balbo, 14
We combined low-rank tensor techniques with the FFT to compute kriging estimates, estimate variance, and compute conditional covariance. This lets us solve 3D problems at very high resolution.
Opening of our Deep Learning Lunch & Learn series. First session: introduction to Neural Networks, Gradient descent and backpropagation, by Pablo J. Villacorta, with a prologue by Fernando Velasco
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 10: Correlation and Regression
10.2: Regression
Stochastic reaction networks (SRNs) are a particular class of continuous-time Markov chains used to model a wide range of phenomena, including biological/chemical reactions, epidemics, risk theory, queuing, and supply chain/social/multi-agents networks. In this context, we explore the efficient estimation of statistical quantities, particularly rare event probabilities, and propose two alternative importance sampling (IS) approaches [1,2] to improve the Monte Carlo (MC) estimator efficiency. The key challenge in the IS framework is to choose an appropriate change of probability measure to achieve substantial variance reduction, which often requires insights into the underlying problem. Therefore, we propose an automated approach to obtain a highly efficient path-dependent measure change based on an original connection between finding optimal IS parameters and solving a variance minimization problem via a stochastic optimal control formulation. We pursue two alternative approaches to mitigate the curse of dimensionality when solving the resulting dynamic programming problem. In the first approach [1], we propose a learning-based method to approximate the value function using a neural network, where the parameters are determined via a stochastic optimization algorithm. As an alternative, we present in [2] a dimension reduction method, based on mapping the problem to a significantly lower dimensional space via the Markovian projection (MP) idea. The output of this model reduction technique is a low dimensional SRN (potentially one dimension) that preserves the marginal distribution of the original high-dimensional SRN system. The dynamics of the projected process are obtained via a discrete $L^2$ regression. 
By solving a resulting projected Hamilton-Jacobi-Bellman (HJB) equation for the reduced-dimensional SRN, we get projected IS parameters, which are then mapped back to the original full-dimensional SRN system and result in an efficient IS-MC estimator for the full-dimensional SRN. Our analysis and numerical experiments verify that both proposed IS approaches (learning-based and MP-HJB-IS) substantially reduce the MC estimator's variance, resulting in a lower computational complexity in the rare event regime than standard MC estimators. [1] Ben Hammouda, C., Ben Rached, N., Tempone, R., and Wiechert, S. Learning-based importance sampling via stochastic optimal control for stochastic reaction networks. Statistics and Computing 33, no. 3 (2023): 58. [2] Ben Hammouda, C., Ben Rached, N., Tempone, R., and Wiechert, S. (2023). Automated Importance Sampling via Optimal Control for Stochastic Reaction Networks: A Markovian Projection-based Approach. To appear soon.
Efficient Analysis of high-dimensional data in tensor formats (Alexander Litvinenko)
We solve a PDE with uncertain coefficients. The solution is approximated in the Karhunen-Loeve/PCE basis. How can we compute the maximum, the frequency, and the probability density function with almost linear complexity? We offer various methods.
This paper discusses maneuvering-target track prediction. First, a Kalman filter based on the current statistical model describes the state of the maneuvering target's motion, allowing analysis of the time range in which the target maneuver occurred. The target trajectory is then predicted in real time with an improved grey prediction model. Finally, residual tests and posterior variance tests confirm the model's accuracy.
Research internship on optimal stochastic theory with financial application u... (Asma Ben Slimene)
This is a presentation of my second-year internship on optimal stochastic theory, how it can be applied to some financial applications, and how such problems can be solved using finite difference methods!
Enjoy it!
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making. They monitor common gases, weather parameters, and particulates.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
1. Business intelligence III: Feature generation and model selection for multiscale time series forecasting
Vadim Strijov
Moscow Institute of Physics and Technology
AINL FRUCT
10-12 November 2016
2. Model selection for time series forecasting
The Internet of Things is the world of networked devices (portables, vehicles, buildings) embedded with sensors and software.
Environment and energy monitoring
Medical and health monitoring
Consumer support, sales monitoring
Urban management and manufacturing
3. Case 1. Energy consumption and price forecasting, 1-day ahead hourly
The components of multivariate time series with periodicity.
Time series: energy price, consumption, daytime, temperature, humidity, wind force, holiday schedule.
Periodicity: one-year seasons (temperature, daytime), one week, one day (working day, week-end), holidays, aperiodic events.
7. The one-day forecast: expected error is 3.1% on working days, 3.7% on week-ends
[Plot: hourly values (x 10^4) over 24 hours, history and forecast curves]
The model ŷ = f(X, w) could be a linear model, a neural network, a deep NN, an SVM, ...
14. The model performance criteria and forecast errors
Stability:
the error does not change significantly following small changes in the time series,
the distribution of the model parameters does not change.
Complexity:
the number of parameters (elements in the superposition) is minimal,
the minimum description length principle follows William of Ockham's rule.
Error: for the residual ε_j = ŷ_j − y_j, the residual sum of squares and the (symmetric) mean absolute percent error are
RSS = Σ_{j=1}^{r} ε_j²,
MAPE = (1/r) Σ_{j=1}^{r} |ε_j| / |y_j|,
sMAPE = (1/r) Σ_{j=1}^{r} 2|ε_j| / |ŷ_j + y_j|.
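The three error measures above translate directly into NumPy; this is a minimal sketch (the function name and array arguments are my own, not from the slides):

```python
import numpy as np

def forecast_errors(y_hat, y):
    """RSS, MAPE, and sMAPE for forecasts y_hat against answers y."""
    eps = y_hat - y                                       # residuals eps_j
    rss = np.sum(eps ** 2)                                # residual sum of squares
    mape = np.mean(np.abs(eps) / np.abs(y))               # mean absolute percent error
    smape = np.mean(2 * np.abs(eps) / np.abs(y_hat + y))  # symmetric variant
    return rss, mape, smape
```

Note that MAPE is undefined where y_j = 0; sMAPE dampens, but does not remove, that problem.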
15. Design matrix
Forecast is a mapping from the p-dimensional object space to the r-dimensional answer space.
X* = [ x, y; X, Y ], where the new object x (1 × n) and its answer y (1 × r) are stacked over the historical design matrix X (m × n) and answers Y (m × r).
16. Rolling validation
Z = [ ...; x_val,k (1 × n), y_val,k (1 × r); X_train,k (m_min × n), Y_train,k (m_min × r); ... ] for each split k.
The rolling validation procedure:
1) construct the validation vector x*_val,k for the time series of length Δt_r as the first row of the design matrix Z,
2) construct the remaining rows of the design matrix Z for the time after t_k,
3) optimize the model parameters w using X_train,k, Y_train,k and compute the residuals ε_k = y_val,k − f(x_val,k, w) and the MAPE,
4) increase k and repeat.
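The procedure above can be sketched as a loop over splits; the helper names and the least-squares model used in the check are my own illustration, not the lecture's code:

```python
import numpy as np

def rolling_validation(series, n, r, m_min, fit, predict):
    """Slide the split k along the series: train on the m_min most recent
    (n-history, r-answer) pairs, validate on the next pair, collect MAPE."""
    mapes = []
    for t in range(n + m_min, len(series) - r + 1):
        # training design matrix and answers from the m_min pairs before t
        X = np.array([series[s - n:s] for s in range(t - m_min, t)])
        Y = np.array([series[s:s + r] for s in range(t - m_min, t)])
        w = fit(X, Y)
        y_hat = predict(w, series[t - n:t])   # forecast the validation answer
        y_val = series[t:t + r]
        mapes.append(np.mean(np.abs(y_hat - y_val) / np.abs(y_val)))
    return mapes
```

With a plain least-squares fit (`fit = lambda X, Y: np.linalg.lstsq(X, Y, rcond=None)[0]`, `predict = lambda w, x: x @ w`) and a linear-trend series, every rolling forecast is exact.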
17. Case 2. Sales planning: to forecast the goods consumption
Retailers' daily routines:
custom inventory,
calculation of optimal insurance stocks,
consumer demand forecasting.
Historical time series of off-take volumes (foodstuffs) are given.
Let the time series be homoscedastic: its variance is time-constant.
Minimizing the loss function, one must forecast the next sample.
20. The performance criterion is minimum loss of money
Error functions: quadratic, linear, asymmetric.
21. The time series of residuals and its histogram
Given the histogram H = {X_i, g_i}_{i=1}^m and the loss function L = L(Z, X), the optimal forecast
X̃ = arg min_{Z ∈ {X_1, ..., X_m}} Σ_{i=1}^m g_i L(Z, X_i)
is the minimizer of this convolution.
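The convolution above can be minimized by direct enumeration over the histogram bins; in this sketch (names mine) the bin values serve as the candidate forecasts:

```python
import numpy as np

def optimal_forecast(bin_values, weights, loss):
    """Return the Z among {X_1, ..., X_m} minimizing sum_i g_i * L(Z, X_i)."""
    costs = [sum(g * loss(z, x) for x, g in zip(bin_values, weights))
             for z in bin_values]
    return bin_values[int(np.argmin(costs))]
```

With a quadratic loss the optimum sits at the bin closest to the weighted mean of the histogram; an asymmetric loss shifts it toward the cheaper side, which is the point of the money-loss criterion on the previous slide.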
26. Promotional profile extraction
Hypothesis: the shape of the profile (excluding the profile parameters) does not depend on the duration of the action.
Problem: to forecast the customer demand during the promotional action.
27. Forecast the residuals to boost performance
1. Model f forecasts the n(g) history ends x̂^f_t, ..., x̂^f_{t−n(g)+1}, one sample each.
2. Compute the n(g) residuals ε̂_t, ..., ε̂_{t−n(g)+1} as ε̂_{t−k} = x_{t−k} − x̂^f_{t−k}.
3. Function g forecasts the residuals ε̂_{t+i} up to max(i) time-ticks ahead.
4. Combine the forecasts x̂^{f,g}_{t+i} = x̂^f_{t+i} + ε̂_{t+i}, computing f for each sample x̂^f_{t+i}.
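A minimal sketch of the combination step (the function names and the mean-residual choice for g are illustrative assumptions, not the lecture's model):

```python
import numpy as np

def combine_forecasts(hist_forecasts, hist_actuals, base_future, residual_model):
    """Boost model f with a residual model g:
    eps_{t-k} = x_{t-k} - xhat^f_{t-k} over the history tail,
    then xhat^{f,g}_{t+i} = xhat^f_{t+i} + epshat_{t+i}."""
    eps = np.asarray(hist_actuals) - np.asarray(hist_forecasts)
    eps_hat = residual_model(eps, len(base_future))   # forecast residuals ahead
    return np.asarray(base_future) + eps_hat

# an illustrative residual model g: repeat the mean past residual
mean_residual = lambda eps, horizon: np.full(horizon, eps.mean())
```

If f has a systematic bias, even this trivial g removes it; any forecasting model over the residual series can be plugged in instead.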
28. Case 3. Forecasting volumes of Russian railways freight transportation
Keep a hierarchical structure of time series without losing performance.
Forecast with hierarchical aggregation of types of freight in stations, regions, and roads, for a day, week, month, and quarter, counting all combinations above.
Satisfy the conditions:
minimize error,
incorporate important external factors,
respect the hierarchical structure,
do not exceed the physical bounds of the forecast values.
29. The railroad map counts ~78 regions, ~4000 stations, and ~100 rail-yards
30. Independent forecasts might be inconsistent with aggregated ones
[Plot: time series over 200 time-ticks illustrating the inconsistency]
32. Performance of independent and reconciliated forecasts
Given the link matrix S, the admissible sets A and B, and the independent forecasts χ̂, make a reconciliated forecast φ̂ subject to:
consistency: φ̂ ∈ A, where A = {χ ∈ R^d | Sχ = 0},
physical limitations: φ̂ ∈ B,
precision: l_h(χ_{T+1}, φ̂) ≤ l_h(χ_{T+1}, χ̂) for any hierarchical data χ_{T+1} ∈ A ∩ B.
Theorem [Maria Stenina, 2014]. Under the listed conditions, the projection vector
φ̂ = χ_proj = arg min_{χ ∈ A ∩ B} l_h(χ, χ̂)
is guaranteed to satisfy the requirements of consistency, physical limitations, and precision.
The solution of the optimization problem φ̂ = arg min_{χ ∈ A ∩ B} ‖χ − χ̂‖₂² demonstrates a decrease in loss on all control samples.
[Plot: ΔL_t = ‖χ_t − φ̂‖₂² − ‖χ_t − χ̂‖₂² (x 10^6) over 100 control points]
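Dropping the box constraints B for simplicity, the Euclidean projection onto the consistency set A = {χ : Sχ = 0} has a closed form; this NumPy sketch (function name mine) is one way to compute it under that assumption:

```python
import numpy as np

def reconcile(chi_hat, S):
    """Project independent forecasts onto {chi : S chi = 0} in the l2 norm:
    chi_proj = chi_hat - S^T (S S^T)^+ S chi_hat  (B is ignored here)."""
    S = np.atleast_2d(S)
    correction = S.T @ np.linalg.pinv(S @ S.T) @ (S @ chi_hat)
    return chi_hat - correction
```

For a two-leaf hierarchy with total χ₀ = χ₁ + χ₂, the link matrix is S = [1, −1, −1], and the projection spreads the aggregation mismatch evenly across the nodes.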
33. Case 4. Internet of things, multiscale dataset for vector forecasting
[Figure: daily panels of the Energy, Max T., Min T., Precipitation, Wind, Humidity, and Solar series with sampling rates τ and τ′, t′_i = iτ′]
Each real-valued time series s = [s_1, ..., s_i, ..., s_T], s_i = s(t_i), 0 ≤ t_i ≤ t_max, is a sequence of observations of some real-valued signal s(t) with its own sampling rate τ.
34. To boost the forecast quality include external variables
35. Generate features with nonparametric transformation functions
Univariate (output dimension 1): √x, x√x, arctan x, ln x, x ln x.
Bivariate: plus x₁ + x₂, minus x₁ − x₂, product x₁ · x₂, division x₁/x₂, x₁/√x₂, x₁ ln x₂.
36. Nonparametric aggregation: sample statistics
Nonparametric transformations include basic data statistics:
Sum or average value of each row x_i, i = 1, ..., m: φ_i = Σ_{j=1}^n x_ij, or φ_i = (1/n) Σ_{j=1}^n x_ij.
Min and max values: φ_i = min_j x_ij, φ_i = max_j x_ij.
Standard deviation: φ_i = √( (1/(n−1)) Σ_{j=1}^n (x_ij − mean(x_i))² ).
Data quantiles: φ_i = [X_1, ..., X_K], where (1/n) Σ_{j=1}^n [X_{k−1} < x_ij ≤ X_k] = 1/K for k = 1, ..., K.
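These aggregates are easy to generate per row with NumPy; the function name and the choice K = 4 are mine:

```python
import numpy as np

def row_statistics(X, K=4):
    """Per-row features: sum, mean, min, max, standard deviation with the
    n-1 denominator, and the K quantile boundaries X_1, ..., X_K."""
    rows = []
    for x in X:
        quantiles = np.quantile(x, [k / K for k in range(1, K + 1)])
        rows.append(np.concatenate((
            [x.sum(), x.mean(), x.min(), x.max(), x.std(ddof=1)], quantiles)))
    return np.array(rows)
```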
37. Nonparametric transformations: Haar's transform
Applying Haar's transform produces multiscale representations of the same data.
Assume that n = 2^K and initialize φ^(0)_{i,j} = ψ^(0)_{i,j} = x_ij for j = 1, ..., n.
To obtain coarse-graining and fine-graining of the input feature vector x_i, repeat for k = 1, ..., K:
the data averaging step
φ^(k)_{i,j} = ( φ^(k−1)_{i,2j−1} + φ^(k−1)_{i,2j} ) / 2, j = 1, ..., n/2^k,
and the data differencing step
ψ^(k)_{i,j} = ( φ^(k−1)_{i,2j} − φ^(k−1)_{i,2j−1} ) / 2, j = 1, ..., n/2^k.
The resulting multiscale feature vectors are φ_i = [φ^(1)_i, ..., φ^(K)_i] and ψ_i = [ψ^(1)_i, ..., ψ^(K)_i].
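The two steps translate directly into NumPy; this sketch (names mine) returns both multiscale vectors for one row x of length n = 2^K:

```python
import numpy as np

def haar_features(x):
    """Per level k: averages of adjacent pairs (coarse-graining) and
    half-differences (fine-graining); concatenate across all K levels."""
    current = np.asarray(x, dtype=float)
    assert len(current) & (len(current) - 1) == 0, "n must be a power of two"
    averages, differences = [], []
    while len(current) > 1:
        avg = (current[0::2] + current[1::2]) / 2    # averaging step
        diff = (current[1::2] - current[0::2]) / 2   # differencing step
        averages.append(avg)
        differences.append(diff)
        current = avg                                # recurse on the coarse level
    return np.concatenate(averages), np.concatenate(differences)
```

For x = [1, 2, 3, 4] the level-1 averages are [1.5, 3.5] and the level-2 average is [2.5], so the coarse-graining vector is [1.5, 3.5, 2.5].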
38. Monotone functions
By growth rate:
Linear: w₁x + w₀
Exponential rate: exp(w₁x + w₀), w₁ > 0
Polynomial rate: exp(w₁ ln x + w₀), w₁ > 1
Sublinear polynomial rate: exp(w₁ ln x + w₀), 0 < w₁ < 1
Logarithmic rate: w₁ ln x + w₀, w₁ > 0
Slow convergence: w₀ + w₁/x, w₁ ≠ 0
Fast convergence: w₀ + w₁ · exp(−x), w₁ ≠ 0
Other:
Soft ReLU: ln(1 + e^x)
Sigmoid: 1/(w₀ + exp(−w₁x)), w₁ > 0
Softmax: 1/(1 + exp(−x))
Hyperbolic tangent: tanh(x)
Softsign: |x| / (1 + |x|)
39. Collection of parametric transformation functions
Function name: formula (output dim., num. of args, num. of pars)
Add constant: x + w (1, 1, 1)
Quadratic: w₂x² + w₁x + w₀ (1, 1, 3)
Cubic: w₃x³ + w₂x² + w₁x + w₀ (1, 1, 4)
Logarithmic sigmoid: 1/(w₀ + exp(−w₁x)) (1, 1, 2)
Exponent: exp x (1, 1, 0)
Normal: (1/(w₁√(2π))) exp(−(x − w₂)²/(2w₁²)) (1, 1, 2)
Multiply by constant: x · w (1, 1, 1)
Monomial: w₁x^{w₂} (1, 1, 2)
Weibull-2: w₁w₂x^{w₂−1} exp(−w₁x^{w₂}) (1, 1, 2)
Weibull-3: w₁w₂x^{w₂−1} exp(−w₁(x − w₃)^{w₂}) (1, 1, 3)
...
40. Parametric transformations
Optimization of the transformation function parameters b is iterative:
1. Fix the vector b̂, collected over all the primitive functions {g} that generate the features φ, and optimize the model parameters: ŵ = arg min_w S(w | f(w, x), y), where φ(b̂, s) ⊆ x.
2. Optimize the transformation parameters b̂ given the model parameters ŵ: b̂ = arg min_b S(b | f(ŵ, x), y).
Repeat these steps until the vectors ŵ, b̂ converge.
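The alternation can be illustrated on a toy setup: a monomial transform φ(b, s) = s^b feeding a scalar model f = w · φ. Everything here (the transform, the grid search for b, the data in the check) is my own illustrative assumption, not the lecture's procedure:

```python
import numpy as np

def alternate_fit(s, y, b0=1.0, iters=100):
    """Alternate: (1) closed-form w for fixed b; (2) grid search for b for fixed w."""
    b_grid = np.linspace(0.1, 5.0, 491)             # step 0.01
    b = b0
    for _ in range(iters):
        phi = s ** b
        w = (phi @ y) / (phi @ phi)                 # step 1: optimal w given b
        errors = [np.sum((w * s ** bb - y) ** 2) for bb in b_grid]
        b = b_grid[int(np.argmin(errors))]          # step 2: best b given w
    return w, b
```

On data generated as y = 3 s², the coordinate descent settles near (w, b) = (3, 2); a gradient-based optimizer would replace the grid search in practice.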
41. Markup the time series, two types of marks: Up and Down
42. Parameters of the local models
More feature generation options:
Parameters of SSA approximation of the time series x(q).
Parameters of the FFT of each x(q).
Parameters of polynomial/spline approximation of each x(q).
43. Parameters of the local models: SSA
For the time series s construct the Hankel matrix with period k and shift p, so that for s = [s_1, ..., s_T]
H* = [ s_T ... s_{T−k+1}; ...; s_{k+p} ... s_{1+p}; s_k ... s_1 ], where 1 ≤ p ≤ k.
Reconstruct the regression to the first column of the matrix H* = [h, H] and denote its least-squares parameters as the feature vector
φ(s) = arg min_φ ‖h − Hφ‖₂².
For the original feature vector x = [x^(1), ..., x^(Q)] use the parameters φ(x^(q)), q = 1, ..., Q, as the features.
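A sketch of this feature extractor with shift p = 1 (the construction and the function name are mine; np.linalg.lstsq plays the role of the least-squares fit):

```python
import numpy as np

def ssa_features(s, k):
    """Build the Hankel matrix H* with rows [s_t, s_{t-1}, ..., s_{t-k+1}],
    split H* = [h, H], and regress the first column h on the lag columns H."""
    T = len(s)
    H_star = np.array([[s[t - j] for j in range(k)]
                       for t in range(T - 1, k - 2, -1)])
    h, H = H_star[:, 0], H_star[:, 1:]
    phi, *_ = np.linalg.lstsq(H, h, rcond=None)
    return phi
```

On a linear trend, which satisfies s_t = 2s_{t−1} − s_{t−2}, the recovered coefficients are exactly [2, −1] and extrapolate the next value.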
44. SSA: principal components
Compute the SVD of the covariance matrix of H,
(1/N) HᵀH = VΛVᵀ, Λ = diag(λ_1, ..., λ_N),
and find the principal components y_j = Hv_j.
[Plot: trajectory in the plane of the principal components y_1 and y_2]
45. Models and features
Models:
Baseline method: ŝ_i = s_{i−1}.
Multivariate linear regression (MLR) with l₂-regularization; regularization coefficient: 2.
SVR with multiple output; kernel type: RBF, p₁: 2, p₂: 0, γ: 0.5, λ: 4.
Feed-forward ANN with a single hidden layer of size 25.
Random forest (RF); number of trees: 25, number of variables for each decision split: 48.
Feature combinations:
History: the standard regression-based forecast with no additional features.
SSA, Cubic, Conv, Centroids, NW: history + a particular feature.
All: all of the above, with no feature selection.
PCA and NPCA: all generation strategies with feature selection.
48. Case 5. Classification of human physical activity with mobile devices
3D-projection of the acceleration time series onto the spatial axes:
x = {acc_x(t), acc_y(t), acc_z(t)}_{t=1}^n → y ∈ R^S.
[Plots: acceleration along x, y, z versus time t for slow walking (0-5 s) and jogging (0-4 s)]
49. Local transformations for a deep learning neural network
The model f = a(h_N(... h_1(x)))(w) contains autoencoders h_k and a softmax classifier a:
f(w, x) = exp(a(x)) / Σ_j exp(a_j(x)), a(x) = W₂ᵀ tanh(W₁ᵀ x), h_k(x) = σ(W_k x + b_k),
where w minimizes the error function.
Feature generation by local transformations:
parameters of the SSA approximation of the time series x,
the FFT of x,
parameters of a polynomial/spline approximation,
could reduce the complexity of this model down to the complexity of logistic regression.
50. Parameters of the local models: SSA
For time series s construct the Hankel matrix with
a period k and shift p, so that for s = [s1, . . . , sT ]
the matrix
H∗
=
sT . . . sT−k+1
...
...
...
sk+2 . . . s2
sk . . . s1
.
−5 0 5
−6
−4
−2
0
2
4
6
8
Principal component, y1
Principalcomponent,y2
Reconstruct the regression of the first column of the matrix H∗ = [h, H] on the remaining columns and take its least-squares parameters as the feature vector
φ(s) = arg minφ ‖h − Hφ‖₂².
For the original feature vector x = [x(1), . . . , x(Q)] use the parameters φ(x) as features.
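A sketch of this construction, assuming unit shift (p = 1) and ordinary least squares via `numpy.linalg.lstsq`:

```python
import numpy as np

def ssa_features(s, k):
    """phi(s) = argmin ||h - H phi||^2, where [h, H] is the Hankel matrix
    of the series s built from length-k windows (unit shift assumed)."""
    T = len(s)
    # Row i is [s_{T-i}, ..., s_{T-i-k+1}] in the slide's 1-based notation.
    H_full = np.array([s[T - k - i:T - i][::-1] for i in range(T - k + 1)])
    h, H = H_full[:, 0], H_full[:, 1:]
    phi, *_ = np.linalg.lstsq(H, h, rcond=None)
    return phi, h, H

s = np.sin(0.4 * np.arange(60))   # toy series with an exact order-2 recurrence
phi, h, H = ssa_features(s, k=6)
print(phi, np.linalg.norm(H @ phi - h))
```

For a pure sinusoid the residual is near zero, since the series satisfies an exact linear recurrence of order two.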
51. Human gait detection with time series segmentation
Find a dissection of the trajectory of the principal components yj = Hvj, where H is the Hankel matrix and vj are the eigenvectors of
(1/N) HᵀH = VΛVᵀ, Λ = diag(λ1, . . . , λN).
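A sketch of this decomposition on a synthetic quasi-periodic series; the window length and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(400)
s = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(len(t))

k = 50                                          # window length (assumed)
N = len(s) - k + 1
H = np.array([s[i:i + k] for i in range(N)])    # Hankel (trajectory) matrix

# (1/N) H^T H = V Lambda V^T; columns of V are the eigenvectors v_j.
evals, V = np.linalg.eigh(H.T @ H / N)
order = np.argsort(evals)[::-1]
y1, y2 = H @ V[:, order[0]], H @ V[:, order[1]]
# For a quasi-periodic series the curve (y1, y2) traces a closed loop; each
# revolution marks one period, which yields the segmentation (gait-cycle) points.
print(evals[order[:3]])
```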
52. Metric features: distances to the centroids of local clusters
Apply the kernel trick to the time series.
1. For the objects xi from X compute the k-means centroids c1, . . . , cp.
2. Use a distance function ρ to build the feature vector
φi = [ρ(c1, xi), . . . , ρ(cp, xi)] ∈ Rp₊.
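A sketch with k-means from scikit-learn and Euclidean ρ; the data and the number of clusters p are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))   # 30 toy objects x_i in R^8

p = 4                              # number of local clusters (assumed)
km = KMeans(n_clusters=p, n_init=10, random_state=0).fit(X)

# phi_i = [rho(c_1, x_i), ..., rho(c_p, x_i)] with rho the Euclidean distance.
Phi = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
print(Phi.shape)   # (30, 4): the new metric feature description
```

The same matrix is returned by `km.transform(X)`.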
56. Performance of the human physical activities classification model
Per-class accuracy (mean accuracy: 0.9823):
1) walk forward: 99.5%
2) walk left: 97.8%
3) walk right: 98.7%
4) go upstairs: 98.8%
5) go downstairs: 97.5%
6) run forward: 100.0%
7) jump up and down: 99.4%
8) sit and fidget: 93.4%
9) stand: 97.0%
10) sleep: 99.9%
11) elevator up: 99.4%
12) elevator down: 97.2%
57. Case 6. How many parameters must be used? Relations of 24 hourly models
[Plot: parameters selected by each of the 24 hourly models; x-axis: hours, y-axis: parameters]
58. Selection of a stable set of features of restricted size
The sample contains multicollinear features χ1, χ2 and noisy features χ5, χ6 among the columns of the design matrix X. We want to select two features from six.
Stability and accuracy for a fixed complexity
The solution: χ3, χ4 is an orthogonal set of features minimizing the error function.
59. Multicollinear features and the forecast: possible configurations
[2×2 panels: inadequate and correlated; adequate and random; adequate and redundant; adequate and correlated]
60. Model parameter values with regularization
The vector function f = f(w, X) = [f(w, x1), . . . , f(w, xm)]ᵀ ∈ Ym.
[Left: parameters w versus regularization τ for S(w) = ‖f(w, X) − y‖² + γ²‖w‖². Right: parameters w versus the parameter sum Σi |wi| for S(w) = ‖f(w, X) − y‖² subject to T(w) ≤ τ.]
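For the l2-penalized criterion the minimizer has a closed form and can be traced over a grid of regularization strengths. The design matrix below is a synthetic example with one near-duplicate column, to show how regularization shrinks the unstable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(50)   # multicollinear pair
w_true = np.array([1.0, -1.0, 2.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(50)

# Minimizer of S(w) = ||Xw - y||^2 + gamma^2 ||w||^2 over a grid of gamma.
norms = []
for gamma in [0.0, 0.5, 2.0, 8.0]:
    w = np.linalg.solve(X.T @ X + gamma**2 * np.eye(6), X.T @ y)
    norms.append(float(np.linalg.norm(w)))
    print(f"gamma = {gamma:3.1f}   ||w|| = {norms[-1]:.3f}")
```

‖w‖ shrinks monotonically as γ grows; the τ-constrained (lasso-type) variant in the right panel has no closed form and requires a dedicated solver.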
61. Empirical distribution of model parameters
Given a sample {w1, . . . , wK} of realizations of the multivariate random variable w and an error function S(w|D, f), consider the set of points {sk = exp(−S(wk|D, f)) | k = 1, . . . , K}.
[Surface and contour plots of the empirical parameter distribution; x- and y-axes: parameters w1, w2; z-axis: exp(−S(w))]
62. Minimize number of similar and maximize number of relevant features
Introduce a feature selection method QP(Sim, Rel) that solves the optimization problem
a∗ = arg min_{a∈Bⁿ} (aᵀQa − bᵀa),
where the matrix Q ∈ Rⁿˣⁿ of pairwise similarities of the features χi and χj is
Q = [qij], qij = Sim(χi, χj) = Cov(χi, χj) / √(Var(χi) Var(χj)),
and the vector b ∈ Rⁿ of feature relevances to the target is
b = [bi], bi = Rel(χi),
where the elements bi equal the absolute values of the sample correlation coefficient between the feature χi and the target vector y.
The number of mutually correlated features (Sim) is minimized, the number of features correlated with the target (Rel) is maximized.
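A small sketch of QP(Sim, Rel), using absolute correlations as the similarity (an assumption). For six features the Boolean problem under a two-feature budget can be solved exactly by enumeration; the data and the cardinality constraint are illustrative.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 6
X = rng.standard_normal((m, n))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(m)   # chi_1, chi_2 multicollinear
y = X[:, 2] + X[:, 3] + 0.1 * rng.standard_normal(m)

Q = np.abs(np.corrcoef(X, rowvar=False))           # Sim: pairwise |correlation|
b = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)]))  # Rel

def criterion(idx):
    a = np.zeros(n)
    a[list(idx)] = 1.0                             # Boolean indicator a in B^n
    return float(a @ Q @ a - b @ a)

best = min(combinations(range(n), 2), key=criterion)
print("selected features:", best)   # the relevant, mutually uncorrelated pair
```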
64. ECoG brain signals to forecast upper limbs’ movements
Neurotycho.org food-tracking dataset:
32 epidural electrodes capture the brain signals (ECoG) of the monkey,
11 sensors track movements of the hand contralateral to the implant side,
the experiment lasts 15 minutes,
the experimenter feeds the monkey ≈ 4.5 times per minute,
ECoG and motion data were sampled at 1 kHz and 120 Hz, respectively.
65. Weather forecasting and geo (spatio) temporal dataset
Monthly Ecosystem Respiration by M. Reichstein, GHG-Europe.
66. Spatio-temporal dataset: frequency versus time at a given spatial position
Electric field measurements from the Van Allen Probes, by I. Zhelavskaya, Skoltech.
67. More on the problems above
Autoregressive forecasting and feature selection
Katrutsa A., Strijov V. The multicollinearity problem in feature selection for regression problems // Information Technologies, 2015, 1: 8-18.
Neychev R.G., Katrutsa A.M., Strijov V. Selection of an optimal feature subset from a multicorrelated set for forecasting // Industrial Laboratory. Materials Diagnostics, 2016, 3.
Motrenko A.P., Rudakov K.V., Strijov V.V. Accounting for exogenous factors in nonparametric time series forecasting // Moscow University Computational Mathematics and Cybernetics, 2016.
Classification of accelerometer time series
Ignatov A., Strijov V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015: 1-14.
Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016.
Popova M.S., Strijov V.V. Selection of an optimal model for classifying physical activity from accelerometer measurements // Informatics and Applications, 2015, 9(1): 79-89.
Popova M.S., Strijov V.V. Construction of deep learning networks for time series classification // Systems and Means of Informatics, 2015, 25(3): 60-77.
Goncharov A.V., Strijov V.V. Metric classification of time series with weighted alignment to class centroids // Informatics and Applications, 2016, 2.
Reconciliation of hierarchical time series forecasts
Stenina M.M., Strijov V.V. Forecast reconciliation in hierarchical time series forecasting // Informatics and Applications, 2015, 9(2): 77-89.
Stenina M., Strijov V. Reconciliation of aggregated and detailed forecasts in nonparametric forecasting problems // Systems and Means of Informatics, 2014, 24(2): 21-34.
68. Model selection for time series forecasting
Thanks to the Chair of Intelligent Systems, MIPT
Anastasia Motrenko
Mikhail Kuznetsov
Alexandr Aduenko
Arsentiy Kuzmin
Maria Stenina
Alexandr Katrutsa
Oleg Bakhteev
Maria Popova
Andrey Kulunchakov
Mikhail Karasikov
Radoslav Neychev
Alexey Goncharov
Roman Isachenko
Maria Vladimirova
http://machinelearning.ru
strijov@phystech.edu