This document provides an overview of computational intelligence methods for time series prediction. It begins with introductions to time series analysis and machine learning approaches for prediction. Specific models discussed include autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) processes. Parameter estimation techniques for AR models are also covered. The document outlines applications in areas like forecasting, wireless sensors, and biomedicine and concludes with perspectives on future directions.
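As an illustrative aside (not drawn from the summarized document itself), the parameter estimation step mentioned above can be sketched for the simplest case: for an AR(1) process x[t] = phi * x[t-1] + noise, phi can be estimated by ordinary least squares. The function and simulation below are hypothetical names of my own, assuming a zero-mean series:

```python
import random

def fit_ar1(x):
    """OLS estimate of phi in the AR(1) model x[t] = phi * x[t-1] + e[t]
    (zero-mean series assumed, so no intercept term)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

# simulate an AR(1) series with known coefficient 0.7
random.seed(0)
x = [0.0]
for _ in range(5000):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

phi_hat = fit_ar1(x)  # should land close to the true 0.7
```

For higher-order AR(p) models the same idea generalizes to solving the Yule-Walker equations or a multivariate least-squares problem.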
A Monte Carlo strategy for structured multiple-step-ahead time series prediction, by Gianluca Bontempi
The document proposes a Monte Carlo approach called SMC (Structured Monte Carlo) for multiple-step-ahead time series forecasting that takes into account the structural dependencies between predictions. It generates samples using a direct forecasting approach and weights them based on how well they satisfy dependencies identified by an iterated approach. Experiments on three benchmark datasets show the SMC approach achieves more accurate forecasts as measured by SMAPE than iterated, direct, or other comparison methods for most prediction horizons tested.
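To make the iterated-versus-direct distinction concrete, here is a minimal sketch of my own (not the SMC method itself) on an AR(1) series, where the true h-step dependence is known to be phi**h. The direct strategy regresses x[t+h] on x[t]; the iterated strategy fits the one-step model and applies it h times:

```python
import random

random.seed(1)
phi, n, h = 0.7, 5000, 3

# simulate an AR(1) series: x[t] = phi * x[t-1] + noise
x = [0.0]
for _ in range(n + h):
    x.append(phi * x[-1] + random.gauss(0, 1))

def ols_slope(xs, ys):
    """Least-squares slope through the origin (zero-mean series)."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

# direct strategy: fit a separate model mapping x[t] straight to x[t+h]
direct = ols_slope(x[:-h], x[h:])

# iterated strategy: fit the 1-step model, then chain it h times
one_step = ols_slope(x[:-1], x[1:])
iterated = one_step ** h
```

Both estimates should approach phi**h here; the SMC approach described above combines the two strategies, which this toy example does not attempt.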
Machine Learning Strategies for Time Series Prediction, by Gianluca Bontempi
This document introduces machine learning strategies for time series prediction. It begins with an introduction to the speaker and his background and research interests. It then provides an outline of the topics to be covered, including notions of time series, machine learning approaches for prediction, local learning methods, forecasting techniques, and applications and future directions. The document discusses what the audience should know coming into the course and what they will learn.
Local modeling in regression and time series prediction, by Gianluca Bontempi
The document discusses global modeling versus local modeling approaches for regression and time series prediction problems. Global modeling fits a single analytical function to all input data, while local modeling performs separate fits to subsets of nearby data points. The document outlines the local modeling approach using lazy learning, which stores all training data and performs local fits when making predictions for new query points. It then applies lazy learning techniques to problems in regression, time series prediction, and feature selection.
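The lazy-learning idea described above, store everything and fit only at query time, can be sketched in its simplest form as k-nearest-neighbour local averaging. The code below is an illustrative toy of my own, not the document's actual algorithm:

```python
def knn_predict(train_x, train_y, query, k=5):
    """Lazy learning: keep all training data, fit nothing globally;
    at query time, average the targets of the k nearest neighbours."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k

# training data from a nonlinear function y = x^2 on a grid
xs = [i / 100 for i in range(101)]
ys = [x * x for x in xs]

pred = knn_predict(xs, ys, 0.5)  # local average near x = 0.5
```

A full local-modeling scheme would fit a local polynomial to the neighbours rather than averaging them, but the query-time structure is the same.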
This document provides an introduction and outline for a machine learning summer school session on machine learning strategies for time series prediction. It introduces the speaker, Gianluca Bontempi, and his background in machine learning. It then discusses what attendees should know and expect to learn, including foundations of statistical machine learning and strategies for forecasting. The outline presents topics that will be covered, including notions of time series, machine learning for prediction, local learning approaches, forecasting techniques, applications, and future directions.
This document introduces the plm package for panel data econometrics in R. It discusses panel data models and estimation approaches, describes the software approach and functions in plm, and compares plm to other R packages for longitudinal data analysis. Key features of plm include functions for estimating linear panel models with fixed effects, random effects, and GMM, as well as data management and diagnostic testing capabilities for panel data. The package aims to make panel data analysis intuitive and straightforward for econometricians in R.
Autoregressive Convolutional Neural Networks for Asynchronous Time Series, by Gautier Marti
In this talk, we present a CNN architecture for predicting autoregressive asynchronous time series. We illustrate its application to predicting traders’ quotes of credit default swaps (a proprietary dataset from Hellebore Capital) and to artificial time series. The paper is available here: http://proceedings.mlr.press/v80/binkowski18a/binkowski18a.pdf
A Comparison between FPPSO and B&B Algorithm for Solving Integer Programming ..., by Editor IJCATR
This document summarizes and compares two algorithms for solving integer programming problems: branch and bound (B&B) and an improved version of the flower pollination algorithm combined with particle swarm optimization (FPPSO). B&B is commonly used but has high computational costs, while FPPSO is able to obtain optimal results faster than traditional methods like B&B. The FPPSO combines the flower pollination algorithm with particle swarm optimization to improve search accuracy for integer programming problems.
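For readers unfamiliar with the particle swarm component of FPPSO, here is a minimal continuous PSO sketch of my own (the flower pollination hybrid and the integer constraints are not shown; function names and parameters are assumptions):

```python
import random

def pso_minimize(f, lo, hi, n=30, iters=200, seed=1):
    """Plain particle swarm optimization: each particle moves under
    inertia plus attraction to its own best and the swarm's best point."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n)]
    vel = [0.0] * n
    pbest = pos[:]
    gbest = min(pbest, key=f)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and attraction coefficients
    for _ in range(iters):
        for i in range(n):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] += vel[i]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i]
        gbest = min(pbest, key=f)
    return gbest

best = pso_minimize(lambda x: (x - 3) ** 2, -10, 10)  # minimum at x = 3
```

An integer-programming variant would round or repair positions to the integer lattice after each move; FPPSO additionally interleaves flower pollination moves.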
This document discusses queuing theory, simulation analysis, and computer simulation. It provides definitions and examples of queuing theory concepts like single queuing nodes and Kendall's notation. It outlines the advantages and disadvantages of simulation analysis. Simulation models are defined as mathematical models that calculate the impact of uncertain inputs on outcomes. Computer simulation is described as using a computer model to represent the dynamic responses of a real system.
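As a concrete anchor for the queuing concepts mentioned above, the single queuing node M/M/1 (in Kendall's notation: Poisson arrivals, exponential service, one server) has closed-form steady-state metrics. The helper below is an illustrative sketch, not code from the document:

```python
def mm1_metrics(lam, mu):
    """Analytic steady-state metrics of an M/M/1 queue:
    Poisson arrivals at rate lam, exponential service at rate mu, one server."""
    rho = lam / mu                 # server utilisation; must be < 1 for stability
    assert rho < 1, "queue is unstable"
    L = rho / (1 - rho)            # mean number of customers in the system
    W = 1 / (mu - lam)             # mean time in the system; Little's law: L = lam * W
    return rho, L, W
```

For example, arrivals at rate 1 and service at rate 2 give utilisation 0.5, one customer in the system on average, and a mean sojourn time of 1. Simulation, as discussed in the document, becomes necessary once these analytic assumptions break down.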
The document discusses the development of an intelligent system using case-based reasoning to predict customer profiles and the risk of fraud or delinquency. It motivates the goals of the project, reviews relevant machine learning techniques like decision trees and k-nearest neighbors, describes implementing the techniques in Ruby, tests the system on several datasets, and discusses improving the system in the future with additional data. The system is able to accurately predict customer risk levels in experiments, but the author notes limitations with the available data.
Nigeria aviation industry drifting in turbulent waters, by Dung Rwang Pam
DRIFTING IN TURBULENT WATERS!
AVIATION INDUSTRY 2004 OVERVIEW
From a global perspective, the aviation industry is just on the verge of initiating a recovery. The fallout of September 11, 2001 is still resonating in the background. The war in Iraq and SARS have taken their toll, and the ripple effects cannot altogether be avoided. Finally, the fuel crisis has robbed the industry of profitability in 2004.
Just as a constitution is the final guiding document of any jurisprudent society, so the civil aviation policy is the lighthouse towards which the nation's stakeholders should be moving. The minimum ICAO standards and recommended practices will form the benchmark for determining how the Nigerian aviation industry has fared this year. This will enable readers to be the true final assessors of the journey so far.
In giving a fair appraisal, it is necessary to x-ray the component parts of the industry.
Nigeria Civil Aviation Authority (NCAA)
Open skies and the International Aviation Safety Assessment.
On 26th August 2000, the Nigerian government signed a provisional open skies agreement with the USA, with the expectation that the NCAA would be able to achieve IASA (International Aviation Safety Assessment) Category 1 certification soon after.
Under the leadership of the current DG, the regulatory body has made
spirited efforts to ensure that it meets the minimum ICAO safety oversight
requirements. Simply put, we needed to prove to ICAO and the world that
we satisfy ALL the following five requirements:
1. The country has laws or regulations necessary to support the certification
and oversight of air carriers in accordance with minimum international
standards;
2. The NCAA has the technical expertise, resources, and organization to
license or oversee air carrier operations;
3. The NCAA has adequately trained and qualified technical personnel;
4. The NCAA has provided adequate inspector guidance to ensure enforcement
of, and compliance with, minimum international standards; and
5. The NCAA has sufficient documentation and records of certification and
adequate continuing oversight and surveillance of air carrier operations.
More than four years thereafter, and despite the efforts of the NCAA, this certification has eluded us. This means that no aircraft on the Nigerian register is considered safe enough to fly to the USA, because it has not undergone the minimum safety certification process.
70 illegal aerodromes, airstrips, helipads operating in Nigeria
In the ministerial brief of December 2004, the Minister admitted that "the high-powered ministerial committee set up on the monitoring and control of the private airports in the country" had discovered more than 70 aerodromes, airstrips and helipads operating illegally across Nigeria without licence, control or supervision.
This obviously means the system of continuous surveillance of air operations by
the NCAA is grossly inadequate.
A discourse on aviation with the Nigeria Aviation Safety Initiative (NASI), by Dung Rwang Pam
Excerpts of the discourse between the 'Leadership Magazine' editor and the Chair of the governing council of the Nigeria Aviation Safety Initiative (NASI), Capt. Dung Rwang Pam, over the state of the Nigerian aviation industry. Rather long-winded, but you be the judge.
When safety comes last; A short synopsis of events in Nigeria aviation (Pam,..., by Dung Rwang Pam
After over 40 years of aviation, we have the following to show:
1. A national carrier that cost us over $3 billion to run and ruin. Finally, it generated so much embarrassment and national shame that we could not bear to rescue or redeem it. Two cancers: gross mismanagement and persistent adverse government interference.
2. About 42 indigenous airlines, mostly famous for being infamous, none of which is a world-class airline. The current acid test is the IATA (International Air Transport Association) Operational Safety Audit (IOSA). Not a single maintenance facility capable of C checks for any commercial jet. Thus, most of the aeroplanes that arrive in Nigeria have an average lifespan of 5 years, after which they become cadavers defacing our airports or, of late, are being put to good use by aluminium kitchen-utensil makers. Not a single commercial simulator facility to cater for recurrent pilot training locally. The Nigerian College of Aviation Technology (NCAT), Zaria, is currently in critical condition.
3. Over 21 airports with infrastructure we’re unable to maintain. How many of them have the required ICAO ARFF (International Civil Aviation Organisation Airport Rescue and Fire Fighting) capability to support the aircraft their runways were designed for? Of course, only five of them have proven economic viability, though cows have attempted to take over Port Harcourt and Jos airports at some point in time. The Nigerian Meteorological Agency (NIMET) is still unable to provide our operational airports with reliable 24-hour weather reports and forecasts.
4. A regulatory system shackled with bureaucracy and devoid of international credibility; its only hope being Nigeria’s strategic location, large population and the potential of being the hub of the West African subregion. Any aircraft registered in Nigeria, if at all allowed to fly outside the continent, is often subjected to humiliating detailed spot checks. For these and other reasons, most lessors will not allow their aircraft to be registered in Nigeria, even if operated in Nigeria.
5. A history of fatal accidents with attendant loss of lives has earned us a place as the runner-up (coming behind the DR of Congo) on the continent with the worst safety record globally. Within the last one year, our score amounts to nearly one casualty every day.
Safe and sustainable aviation in Africa; alignment of policies, regulation an..., by Dung Rwang Pam
Aviation is considered a vital tool for economic development in Africa. This becomes even more critical considering the level of surface transport development across the continent. Aviation generated around 450,000 jobs and contributed more than $10 billion to Africa’s GDP in 2007 (ATAG). While air transport plays an important role in itself, its main role is to facilitate economic activity. Unfortunately, the region has suffered a history of high airline failure rates, poor infrastructure and an accident rate that is 8 times the global average. A major challenge now facing the continent is the lack of sustainable levels of the requisite skilled workforce at all levels, which is necessary to steer the course of both governance and industry. The global community, through various governmental and non-governmental agencies, has proffered a plethora of initiatives and interventions designed to redress the situation. However, the successes recorded through these efforts have been marginal.
It is time Africa learnt from its past mistakes and focused on achieving safe, sustainable, reliable and efficient air travel, supported by sound infrastructure and concern for the environment. If the continent is at all serious about aviation, urgent steps must be put in place. These must include the following criteria, which must be strategically laid out in a detailed policy and supported by legal processes that will aid successful implementation.
• Governments must be transparent, accountable and guided by democratic principles.
• Transformational leadership should result in social and political stability that will create the suitable environment for regional economic integration.
• This integration will be easier to achieve if the region aligns its aviation policies and regulations to optimise the workforce available.
• All member States must pool resources to invest in infrastructure, aircraft acquisitions, fuel purchase-agreements and workforce training.
• Africa must understand that all infrastructure or equipment procured will need to be entrusted into the hands of a competent and skilled workforce if the industry is to achieve its objectives.
• Aviation professionals in the region must be proactive and visible.
• The airlines should consider strategic commercial agreements and mergers to benefit from possible cost synergies.
• Safety and economic benefits will accrue from having a single African sky, a fly Africa policy and one Multi-lateral Air Service Agreement between Africa and Indian Ocean region and the rest of the world.
Creating human capital takes time; lost time is irretrievable. The region is running out of both time and human capital and the competition is not waiting.
Aviation is an industry of continental strategic importance to Africa. Africa depends mostly on air transport to link people with each other and the rest of the world at large. A safe, secure and efficient aviation Industry is crucial to support t
This document presents a general framework for enhancing time series prediction performance. It discusses using multiple predictions from a base method such as neural networks, ARIMA or Holt-Winters to improve accuracy. Short-term enhancement applies support vector regression to statistical and reliability features of the multiple predictions to enhance 1-step-ahead predictions. Long-term enhancement trains additional models on the short-term predictions to enhance longer-horizon predictions. The framework is evaluated on traffic flow data with prediction horizons of 1 week and 13 weeks.
Time series analysis examines patterns in data over time. It relies on identifying trends, measuring past patterns to forecast the future, and decomposing time series into four main components: secular trends, cyclical movements, seasonal variations, and irregular variations. Secular trends represent long-term direction, while cyclical and seasonal variations have recurring patterns over different time scales. Various techniques can depict trends and identify variations, including freehand drawing, semi-averages, moving averages, least squares, and exponential smoothing.
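The moving-average technique mentioned above can be shown in miniature: a centered moving average whose window equals the seasonal period averages the seasonal variation away and exposes the secular trend. The sketch below is illustrative (function and series are my own constructions):

```python
def centered_ma(series, window):
    """Centered moving average: the classical way to expose the secular
    trend by averaging out seasonal and irregular variation."""
    half = window // 2
    return {t: sum(series[t - half:t + half + 1]) / window
            for t in range(half, len(series) - half)}

# trend t plus a seasonal pattern of period 5 that sums to zero
seasonal = [2, -1, 0, -2, 1]
series = [t + seasonal[t % 5] for t in range(30)]

trend = centered_ma(series, 5)  # recovers the linear trend exactly here
```

With an even window (e.g. period 4 or 12) the standard refinement is a 2 x window average so the result stays centered on an observation.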
This document provides an overview of time series analysis and forecasting techniques. It discusses key concepts such as stationary and non-stationary time series, additive and multiplicative models, smoothing methods like moving averages and exponential smoothing, autoregressive (AR), moving average (MA) and autoregressive integrated moving average (ARIMA) models. The document uses examples to illustrate how to identify patterns in time series data and select appropriate models for description, explanation and forecasting of time series.
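Of the smoothing methods listed, simple exponential smoothing is the most compact to write down: each smoothed level is a weighted average of the newest observation and the previous level. A minimal sketch (my own, not the document's code):

```python
def exp_smooth(series, alpha):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the new observation and the previous smoothed level."""
    level = series[0]
    out = [level]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
        out.append(level)
    return out
```

For example, exp_smooth([0, 10, 10], 0.5) yields [0, 5.0, 7.5]: the level moves halfway toward each new observation. Larger alpha tracks the series more closely; smaller alpha smooths more aggressively.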
Extending and integrating a hybrid knowledge representation system into the c..., by Valentina Rho
This document summarizes Valentina Rho's thesis on extending a hybrid knowledge representation system called Dual-PECCS into the cognitive architecture ACT-R. Dual-PECCS represents concepts using both classical and typical information based on dual-process theory. The objectives were (a) extending Dual-PECCS and (b) integrating it into ACT-R. Dual-PECCS was translated into ACT-R chunks and a new action allows accessing its subsystems. Experiments using riddles showed the integrated system had similar accuracy to humans in conceptual categorization and representation proxyfication. Future work includes improving generalization in the typical system and integrating Dual-PECCS into other architectures.
This document provides an introduction to exploring and visualizing data using the R programming language. It discusses the history and development of R, introduces key R packages like tidyverse and ggplot2 for data analysis and visualization, and provides examples of reading data, examining data structures, and creating basic plots and histograms. It also demonstrates more advanced ggplot2 concepts like faceting, mapping variables to aesthetics, using different geoms, and combining multiple geoms in a single plot.
How to win data science competitions with Deep Learning, by Sri Ambati
This document summarizes a presentation about how to win data science competitions using deep learning with H2O. It discusses H2O's architecture and capabilities for deep learning. It then demonstrates live modeling on Kaggle competitions, providing step-by-step explanations of building and evaluating deep learning models on three different datasets - an African soil properties prediction challenge, a display advertising challenge, and a Higgs boson machine learning challenge. It concludes with tips and tricks for deep learning with H2O and an invitation to the H2O World conference.
This document discusses applying data mining techniques to analyze active users on Reddit. It defines active users as those who posted or commented in at least 5 subreddits and have at least 5 posts/comments in each subreddit. The preprocessing steps extract over 25,000 active users and their posts from the raw Reddit data. K-means clustering is then used to cluster the active users into 10 groups based on their activities to gain insights into different types of active users on Reddit.
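The k-means step referred to above is Lloyd's algorithm: alternate between assigning points to their nearest centre and moving each centre to its cluster mean. Here is an illustrative 1-D toy of my own (not the Reddit pipeline, which clusters users by activity features):

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Lloyd's algorithm on 1-D data: alternate between assigning each
    point to its nearest centre and moving each centre to its cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# two well-separated 1-D blobs around 0.245 and 10.245
points = [i * 0.01 for i in range(50)] + [10 + i * 0.01 for i in range(50)]
centers = kmeans_1d(points, 2)
```

On real user-activity data one would use multi-dimensional features and several random restarts, since k-means is sensitive to initialization.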
Transform your Business with AI, Deep Learning and Machine Learning, by Sri Ambati
Video: https://www.youtube.com/watch?v=R3IXd1iwqjc
Meetup: http://www.meetup.com/SF-Bay-ACM/events/231709894/
In this talk, Arno Candel presents a brief history of AI and how Deep Learning and Machine Learning techniques are transforming our everyday lives. Arno will introduce H2O, a scalable open-source machine learning platform, and show live demos on how to train sophisticated machine learning models on large distributed datasets. He will show how data scientists and application developers can use the Flow GUI, R, Python, Java, Scala, JavaScript and JSON to build smarter applications, and how to take them to production. He will present customer use cases from verticals including insurance, fraud, churn, fintech, and marketing.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Some resources on how to navigate the hardware space in order to build your own workstation for training deep learning models.
Alternative download link: https://www.dropbox.com/s/o7cwla30xtf9r74/deepLearning_buildComputer.pdf?dl=0
Deep learning - Conceptual understanding and applications, by Buhwan Jeong
This document provides an overview of deep learning, including conceptual understanding and applications. It defines deep learning as a deep and wide artificial neural network. It describes key concepts in artificial neural networks like signal transmission between neurons, graphical models, linear/logistic regression, weights/biases/activation, and backpropagation. It also discusses popular deep learning applications and techniques like speech recognition, natural language processing, computer vision, representation learning using restricted Boltzmann machines and autoencoders, and deep network architectures.
This document summarizes a master's thesis that implemented a continuous sequential importance resampling (CSIR) algorithm to estimate predictive densities in stochastic volatility (SV) models. The thesis began with an introduction to relevant econometrics concepts. It then explained SV models and particle filtering approaches. The thesis described implementing and testing functions to develop an R package for CSIR estimation in SV models. Diagnostics and parameter estimates from simulated and real stock return data were reported. The thesis concluded by discussing the package's applications and potential for future development.
This document provides information about a computational stochastic processes course, including lecture details, prerequisites, syllabus, and examples. The key points are:
- Lectures will cover Monte Carlo simulation, stochastic differential equations, Markov chain Monte Carlo methods, and inference for stochastic processes.
- Prerequisites include probability, stochastic processes, and programming.
- Assessments will include a coursework and exam. The coursework will involve computational problems in Python, Julia, R, or similar languages.
- Motivating examples discussed include using Monte Carlo methods to evaluate high-dimensional integrals and simulating Langevin dynamics in statistical physics.
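The high-dimensional-integral motivation above has a standard two-line demonstration: estimate an integral as the fraction of uniform samples landing inside a region. The sketch below (an assumption of mine, not course material) estimates the area of the quarter unit disc, whose true value is pi/4:

```python
import random

def mc_quarter_disc(n_samples, seed=0):
    """Plain Monte Carlo estimate of the 2-D integral of the indicator
    1{x^2 + y^2 <= 1} over the unit square, whose true value is pi/4."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return hits / n_samples

estimate = mc_quarter_disc(100_000)  # close to pi/4 ~ 0.7854
```

The appeal in high dimensions is that the error decays like 1/sqrt(n) regardless of dimension, whereas grid-based quadrature degrades exponentially.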
The document discusses the development of an intelligent system using case-based reasoning to predict customer profiles and the risk of fraud or delinquency. It motivates the goals of the project, reviews relevant machine learning techniques like decision trees and k-nearest neighbors, describes implementing the techniques in Ruby, tests the system on several datasets, and discusses improving the system in the future with additional data. The system is able to accurately predict customer risk levels in experiments, but the author notes limitations with the available data.
Nigeria aviation industry drifting in turbulent watersDung Rwang Pam
DRIFTING IN TURBULENT WATERS!
AVIATION INDUSTRY 2004 OVERVIEW
On a global perspective, the aviation industry is just on the verge of initiating
a recovery. The fallout of September 11 2001 is still resonating in the
background. The war in Iraq and SARS has had their toll and the ripple effects
cannot altogether be avoided. Finally, the fuel crisis has robbed the industry of
profitability in 2004.
Just as a constitution is the final guiding document of any jurisprudent society.
So is the civil aviation policy the lighthouse towards which the Nation’s stakeholders
should be moving towards. The minimum ICAO standards and
recommended practices will form the benchmark for determining how the
Nigerian aviation industry has fared this year. This will enable the readers to be
the true final assessors of the journey so far.
In giving a fair appraisal, it is necessary to x-ray the component parts of industry.
Nigeria Civil Aviation Authority (NCAA)
Open skies and the International Aviation Safety Assessment.
In 26th August 2000, the Nigerian government signed a provisional open skies agreement with the USA with the expectation that the NCAA will be able to
achieve the IASA (International Aviation Safety Assessment) category 1
certification soon after.
Under the leadership of the current DG, the regulatory body has made
spirited efforts to ensure that it meets the minimum ICAO safety oversight
requirements. Simply put, we needed to prove to ICAO and the world that
we satisfy ALL the following five requirements:
1. The country has laws or regulations necessary to support the certification
and oversight of air carriers in accordance with minimum international
standards;
2. The NCAA has the technical expertise, resources, and organization to
license or oversee air carrier operations;
3. The NCAA has adequately trained and qualified technical personnel;
4. The NCAA has provided adequate inspector guidance to ensure enforcement
of, and compliance with, minimum international standards; and
5. The NCAA has sufficient documentation and records of certification and
adequate continuing oversight and surveillance of air carrier operations.
More than four years thereafter, and despite the efforts of the NCAA, this
certification has eluded us. This means that any aircraft on the Nigerian register is
not safe enough to fly to the USA, because it has not undergone the minimum safe
certification process.
70 illegal aerodromes, airstrips, helipads operating in Nigeria
In the ministerial brief of December 2004,The Minister admitted that “ the high
powered ministerial committee set up on the monitoring and control of the
private airports in the country" has discovered more than 70 aerodromes, airstrips
and heli-pads operating illegally across Nigeria without license and control or
supervision.
This obviously means the system of continuous surveillance of air operations by
the NCAA is grossly inadequate.
A discourse on aviation with the Nigeria aviation safety initiaitive (NASI)Dung Rwang Pam
Excerpts of the discourse between 'Leadership Magazine' editor and the Chair governing council of the Nigeria aviation safety initiative (NASI) Capt. Dung Rwang Pam over the state of the Nigerian aviation industry. Rather long winded, but you be the judge..
When safety comes last; A short synopisis of events in Nigeria aviation (Pam,...Dung Rwang Pam
After over 40 years of aviation we have the following to show.
1. A national carrier that cost us over $3billion to run and
ruin. Finally, it generated so much embarrassment and
national shame we could not bear to rescue or redeem
it. Two cancers, gross mismanagement and persistent
adverse government interference
2. About 42 indigenous airlines mostly famous for being
infamous; none of which is a world class airline. The
current acid test is the IATA (international air transport
association) operational safety audit (IOSA). Not a single
maintenance facility capable of C checks for any
commercial jet. Thus, most of the aeroplanes that arrive
in Nigeria have an average lifespan of 5 years, after
which they become cadavers defacing our airports or of
late they are being put to good use by aluminium
kitchen utensil makers. Not a single commercial
simulator facility to cater for recurrent pilot training
locally. The Nigeria College of aviation technology
(NCAT) Zaria is currently in critical condition.
3. Over 21 airports with infrastructure we're unable to maintain. How many of them have the required ICAO (International Civil Aviation Organisation) Airport Rescue and Fire Fighting (ARFF) capability to support the aircraft their runways were designed for? Of course, only five of them have proven economic viability, though cows have attempted to take over Port Harcourt and Jos airports at some point. The Nigeria Meteorological Agency (NIMET) is still unable to provide our operational airports with reliable 24-hour weather reports and forecasts.
4. A regulatory system shackled with bureaucracy and devoid of international credibility; its only hope being Nigeria's strategic location, large population and the potential of being the hub of the West African subregion. Any aircraft registered in Nigeria, if allowed to fly outside the continent at all, is often subjected to humiliating, detailed spot checks. For these and other reasons, most lessors will not allow their aircraft to be registered in Nigeria even if operated in Nigeria.
5. A history of fatal accidents with attendant loss of lives has earned us a place as the runner-up (behind the DR of Congo) on the continent with the worst safety record globally. Within the last year, our score amounts to nearly one casualty every day.
Safe and sustainable aviation in africa; alignment of policies, regulation an...Dung Rwang Pam
Aviation is considered a vital tool for economic development in Africa. This becomes all the more critical given the level of surface transport development across the continent. Aviation generated around 450,000 jobs and contributed more than $10 billion USD to Africa’s GDP in 2007 (ATAG). While air transport plays an important role in itself, its main role is to facilitate economic activity. Unfortunately, the region has suffered a history of high airline failure rates, poor infrastructure and an accident rate that is 8 times the global average. A major challenge now facing the continent is the lack of a sustainable supply of the requisite skilled workforce at all levels, which is necessary to steer the course of both governance and industry. The global community, through various governmental and non-governmental agencies, has proffered a plethora of initiatives and interventions designed to redress the situation. However, the successes recorded through these efforts have been marginal.
It is time Africa learned from its past mistakes and focused on achieving safe, sustainable, reliable and efficient air travel, supported by sound infrastructure and concern for the environment. If the continent is at all serious about aviation, urgent steps must be taken. These include the following criteria, which need to be strategically laid out in a detailed policy and supported by legal processes that will aid successful implementation.
• Governments must be transparent, accountable and guided by democratic principles.
• Transformational leadership should result in social and political stability that will create the suitable environment for regional economic integration.
• This integration will be easier to achieve if the region aligns its aviation policies and regulations to optimise the workforce available.
• All member States must pool resources to invest in infrastructure, aircraft acquisitions, fuel purchase-agreements and workforce training.
• Africa must understand that all infrastructure or equipment procured will need to be entrusted into the hands of a competent and skilled workforce if the industry is to achieve its objectives.
• Aviation professionals in the region must be proactive and visible.
• The airlines should consider strategic commercial agreements and mergers to benefit from possible cost synergies.
• Safety and economic benefits will accrue from having a single African sky, a fly Africa policy and one Multi-lateral Air Service Agreement between Africa and Indian Ocean region and the rest of the world.
Creating human capital takes time; lost time is irretrievable. The region is running out of both time and human capital and the competition is not waiting.
Aviation is an industry of continental strategic importance to Africa. Africa depends mostly on air transport to link people with each other and the rest of the world at large. A safe, secure and efficient aviation Industry is crucial to support t
This document presents a general framework for enhancing time series prediction performance. It discusses using multiple predictions from a base method like neural networks, ARIMA or Holt-Winters to improve accuracy. Short-term enhancement uses support vector regression on statistic and reliability features of the multiple predictions to enhance 1-step ahead predictions. Long-term enhancement trains additional models on the short-term predictions to enhance longer-horizon predictions. The framework is evaluated on traffic flow data with prediction horizons of 1 week and 13 weeks.
Time series analysis examines patterns in data over time. It relies on identifying trends, measuring past patterns to forecast the future, and decomposing time series into four main components: secular trends, cyclical movements, seasonal variations, and irregular variations. Secular trends represent long-term direction, while cyclical and seasonal variations have recurring patterns over different time scales. Various techniques can depict trends and identify variations, including freehand drawing, semi-averages, moving averages, least squares, and exponential smoothing.
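The trend-depicting techniques listed above can be made concrete with a minimal sketch of one of them, a centered moving average. This is not from the summarized document; the toy series and window length are illustrative assumptions.

```python
def centered_moving_average(series, window):
    """Centered moving average; endpoints without a full window are
    dropped, as in classical decomposition."""
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]

# Toy series: a linear secular trend plus a repeating seasonal wiggle.
trend = [2 * t for t in range(12)]
season = [3, -1, -2, 0] * 3
y = [t + s for t, s in zip(trend, season)]

smoothed = centered_moving_average(y, 3)  # irons out short-run variation
```

A window equal to the seasonal period would cancel the seasonal component entirely; the short window here only attenuates it.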
This document provides an overview of time series analysis and forecasting techniques. It discusses key concepts such as stationary and non-stationary time series, additive and multiplicative models, smoothing methods like moving averages and exponential smoothing, autoregressive (AR), moving average (MA) and autoregressive integrated moving average (ARIMA) models. The document uses examples to illustrate how to identify patterns in time series data and select appropriate models for description, explanation and forecasting of time series.
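As a companion to the AR models the summary mentions, here is a minimal sketch, on assumed synthetic data, of estimating the coefficient of an AR(1) process by conditional least squares:

```python
import random

def fit_ar1(x):
    """Estimate phi in x[t] = phi * x[t-1] + noise by least squares:
    phi = sum(x[t-1] * x[t]) / sum(x[t-1]^2)."""
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = sum(a * a for a in x[:-1])
    return num / den

# Simulate an AR(1) series with known coefficient 0.7.
random.seed(0)
phi_true = 0.7
x = [0.0]
for _ in range(5000):
    x.append(phi_true * x[-1] + random.gauss(0, 1))

phi_hat = fit_ar1(x)  # should land close to 0.7
```

For long series this estimator essentially coincides with the Yule-Walker estimate.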
Extending and integrating a hybrid knowledge representation system into the c...Valentina Rho
This document summarizes Valentina Rho's thesis on extending a hybrid knowledge representation system called Dual-PECCS into the cognitive architecture ACT-R. Dual-PECCS represents concepts using both classical and typical information based on dual-process theory. The objectives were (a) extending Dual-PECCS and (b) integrating it into ACT-R. Dual-PECCS was translated into ACT-R chunks and a new action allows accessing its subsystems. Experiments using riddles showed the integrated system had similar accuracy to humans in conceptual categorization and representation proxyfication. Future work includes improving generalization in the typical system and integrating Dual-PECCS into other architectures.
This document provides an introduction to exploring and visualizing data using the R programming language. It discusses the history and development of R, introduces key R packages like tidyverse and ggplot2 for data analysis and visualization, and provides examples of reading data, examining data structures, and creating basic plots and histograms. It also demonstrates more advanced ggplot2 concepts like faceting, mapping variables to aesthetics, using different geoms, and combining multiple geoms in a single plot.
How to win data science competitions with Deep LearningSri Ambati
This document summarizes a presentation about how to win data science competitions using deep learning with H2O. It discusses H2O's architecture and capabilities for deep learning. It then demonstrates live modeling on Kaggle competitions, providing step-by-step explanations of building and evaluating deep learning models on three different datasets - an African soil properties prediction challenge, a display advertising challenge, and a Higgs boson machine learning challenge. It concludes with tips and tricks for deep learning with H2O and an invitation to the H2O World conference.
This document discusses applying data mining techniques to analyze active users on Reddit. It defines active users as those who posted or commented in at least 5 subreddits and have at least 5 posts/comments in each subreddit. The preprocessing steps extract over 25,000 active users and their posts from the raw Reddit data. K-means clustering is then used to cluster the active users into 10 groups based on their activities to gain insights into different types of active users on Reddit.
Transform your Business with AI, Deep Learning and Machine LearningSri Ambati
Video: https://www.youtube.com/watch?v=R3IXd1iwqjc
Meetup: http://www.meetup.com/SF-Bay-ACM/events/231709894/
In this talk, Arno Candel presents a brief history of AI and how Deep Learning and Machine Learning techniques are transforming our everyday lives. Arno will introduce H2O, a scalable open-source machine learning platform, and show live demos on how to train sophisticated machine learning models on large distributed datasets. He will show how data scientists and application developers can use the Flow GUI, R, Python, Java, Scala, JavaScript and JSON to build smarter applications, and how to take them to production. He will present customer use cases from verticals including insurance, fraud, churn, fintech, and marketing.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Some resources how to navigate in the hardware space in order to build your own workstation for training deep learning models.
Alternative download link: https://www.dropbox.com/s/o7cwla30xtf9r74/deepLearning_buildComputer.pdf?dl=0
Deep learning - Conceptual understanding and applicationsBuhwan Jeong
This document provides an overview of deep learning, including conceptual understanding and applications. It defines deep learning as a deep and wide artificial neural network. It describes key concepts in artificial neural networks like signal transmission between neurons, graphical models, linear/logistic regression, weights/biases/activation, and backpropagation. It also discusses popular deep learning applications and techniques like speech recognition, natural language processing, computer vision, representation learning using restricted Boltzmann machines and autoencoders, and deep network architectures.
This document summarizes a master's thesis that implemented a continuous sequential importance resampling (CSIR) algorithm to estimate predictive densities in stochastic volatility (SV) models. The thesis began with an introduction to relevant econometrics concepts. It then explained SV models and particle filtering approaches. The thesis described implementing and testing functions to develop an R package for CSIR estimation in SV models. Diagnostics and parameter estimates from simulated and real stock return data were reported. The thesis concluded by discussing the package's applications and potential for future development.
This document provides information about a computational stochastic processes course, including lecture details, prerequisites, syllabus, and examples. The key points are:
- Lectures will cover Monte Carlo simulation, stochastic differential equations, Markov chain Monte Carlo methods, and inference for stochastic processes.
- Prerequisites include probability, stochastic processes, and programming.
- Assessments will include a coursework and exam. The coursework will involve computational problems in Python, Julia, R, or similar languages.
- Motivating examples discussed include using Monte Carlo methods to evaluate high-dimensional integrals and simulating Langevin dynamics in statistical physics.
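The first motivating example above, Monte Carlo evaluation of high-dimensional integrals, can be sketched in a few lines; the integrand and sample count are illustrative assumptions.

```python
import random

def mc_integral(f, dim, n_samples, rng):
    """Monte Carlo estimate of the integral of f over the unit hypercube
    [0,1]^dim: the average of f at uniformly random points."""
    total = 0.0
    for _ in range(n_samples):
        total += f([rng.random() for _ in range(dim)])
    return total / n_samples

# Integral of sum(x_i) over [0,1]^10 is exactly 10 * 1/2 = 5.
rng = random.Random(42)
estimate = mc_integral(sum, 10, 20000, rng)
```

The error decays as O(1/sqrt(n_samples)) regardless of dimension, which is what makes the method attractive for high-dimensional problems.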
Chaotic Secure Communication Using Iterated Filtering Method P. Karthik -Assistant Professor,
D. Gokul Prashanth -UG Scholar,
T. Gokul - UG Scholar,
Department of Electronics and Communication Engineering,
SNS College of Engineering, Coimbatore, India.
The document discusses sampling theory and analog-to-digital conversion. It begins by explaining that most real-world signals are analog but must be converted to digital for processing. There are three steps: sampling, quantization, and coding. Sampling converts a continuous-time signal to a discrete-time signal by taking samples at regular intervals. The sampling theorem states that the sampling frequency must be at least twice the highest frequency of the sampled signal to avoid aliasing. Finally, it provides an example showing how to calculate the minimum sampling rate, or Nyquist rate, given the highest frequency of a signal.
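The Nyquist-rate calculation and the aliasing it guards against can be sketched as follows; the particular frequencies are illustrative assumptions.

```python
import math

def nyquist_rate(f_max_hz):
    """Minimum sampling rate that avoids aliasing: twice the highest
    frequency present in the signal."""
    return 2.0 * f_max_hz

# Aliasing demo: a 3 Hz sine sampled at only 4 Hz (below its 6 Hz
# Nyquist rate) is indistinguishable at the sample instants from a
# -1 Hz sine, because the two frequencies differ by exactly fs.
fs = 4.0
samples_3hz = [math.sin(2 * math.pi * 3 * n / fs) for n in range(8)]
samples_alias = [math.sin(2 * math.pi * -1 * n / fs) for n in range(8)]
```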
Signals and Systems is an introduction to analog and digital signal processing, a topic that forms an integral part of engineering systems in many diverse areas, including seismic data processing, communications, speech processing, image processing, defense electronics, consumer electronics, and consumer products.
Presentation for the Burstiness satellite at CCS'16 AmsterdamMikko Kivelä
Slides for the presentation "Estimating inter-event times distributions from finite observation periods" which I gave at the Burstiness satellite at the Conference on Complex Systems 2016 Amsterdam. See https://burstinesssatellite.wordpress.com
This document provides an overview of an advanced digital signal processing lecture. It discusses pre-requisites for the course including basic signals and communications knowledge and MATLAB proficiency. It outlines the course structure, including chapters covered, textbook references, and assessment breakdown. Key concepts from the first lecture are summarized such as characterizing signals as continuous or discrete, common signal representations including exponentials and sinusoids, and introducing linear time-invariant systems.
Implementation of adaptive stft algorithm for lfm signalseSAT Journals
Abstract
Time-frequency analysis is normally done by sliding a window through the time-domain data and computing the Fourier transform of the data within the window. The choice of window length determines whether specular or resonant information will be emphasized: a narrow window isolates specular reflections but is not wide enough to accommodate the slowly varying global resonances, while a wide window cannot temporally separate resonance and specular information. We therefore adapt the window length according to changes in frequency, realizing the specifications of a Linear Frequency Modulation (LFM) signal.
Index Terms—LFM, FFT, DFT, STFT and ASTFT.
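The sliding-window analysis the abstract describes — a plain fixed-window STFT, before any window adaptation — can be sketched as below; the window length, hop size and test tone are illustrative assumptions.

```python
import cmath
import math

def stft(x, win_len, hop):
    """Plain short-time Fourier transform: slide a rectangular window
    of length win_len with step hop and take the DFT of each frame."""
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        frame = x[start:start + win_len]
        frames.append([
            sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / win_len)
                for t in range(win_len))
            for k in range(win_len)
        ])
    return frames

# 16 samples of a tone whose frequency falls exactly on bin 2 of an
# 8-point window, analysed in two non-overlapping frames.
x = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
frames = stft(x, win_len=8, hop=8)
```

An adaptive STFT would vary `win_len` from frame to frame; here it is held fixed to show the baseline trade-off.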
This lecture covers signals and systems analysis, including:
1) Definitions of signals, systems, and their properties like time-invariance, linearity, stability, causality, and memory.
2) Classification of signals as continuous-time vs discrete-time, analog vs digital, deterministic vs random, periodic vs aperiodic.
3) Concepts of orthogonality, correlation, autocorrelation as they relate to signal comparison.
4) Review of the Fourier series and Fourier transform as tools to represent signals in the frequency domain.
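The autocorrelation concept from point 3 can be illustrated with a minimal sketch; the alternating test signal is an illustrative assumption.

```python
def autocorr(x, lag):
    """Sample autocorrelation at the given lag, normalized so that
    lag 0 gives 1 and values lie in [-1, 1]."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    den = sum((v - m) ** 2 for v in x)
    return num / den

# An alternating +1/-1 signal is anti-correlated with itself at lag 1
# and strongly correlated again at lag 2, its period.
x = [1.0, -1.0] * 8
r1 = autocorr(x, 1)
r2 = autocorr(x, 2)
```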
Communication Systems_B.P. Lathi and Zhi Ding (Lecture No 4-9)Adnan Zafar
Lecture No 4: https://youtu.be/E3QT55J9uWs
Lecture No 5: https://youtu.be/pb7GdbcLnI0
Lecture No 6: https://youtu.be/aFXr1ufTF7Q
Lecture No 7: https://youtu.be/1Yt6ZCKhcYg
Lecture No 8: https://youtu.be/I8UWw3DC19Y
Lecture No 9: https://youtu.be/zRKFi3dotEc
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsCynthia Freeman
The document discusses various methods for anomaly detection in time series data. It begins by defining time series and anomalies, noting that anomaly detection is challenging due to issues like lack of labeled data and data imbalance. It then covers characteristics of time series like seasonality, trends, and concept drift, and how to detect them. Various anomaly detection methods are outlined, including STL, SARIMA, Prophet, Gaussian processes, and RNNs. Evaluation methods and factors to consider in choosing a detection method are also discussed. The document provides an overview of approaches to determining the optimal anomaly detection model for a given time series and application.
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMIJCSEA Journal
This document discusses phase estimation in quantum computing. It begins by introducing quantum Fourier transforms and how they are important for algorithms like Shor's algorithm. It then describes the phase estimation algorithm in detail, including how it uses two registers to estimate the phase of a quantum state and how the inverse quantum Fourier transform improves this estimate. Simulation results are presented that show the probability distribution of the estimated phase converging to the true value and how the probability of success increases with more qubits while computational costs rise polynomially. The paper concludes that the optimal number of qubits balances high success probability and low costs for phase estimation.
This document compares different methods for disaggregating low frequency economic time series data into higher frequency data: Chow-Lin (static model), Fernandez (static model), Litterman (static model), and Santo Silvacardoso (dynamic model). The Chow-Lin, Fernandez, and Litterman models are static, while Santo Silvacardoso uses a dynamic regression model. The models were used to disaggregate annual private consumption expenditure data into monthly data. Results showed that all methods produced high correlation between original and disaggregated data annually. At the monthly level, Santo Silvacardoso performed best with the lowest standard deviation, while Litterman performed worst.
COMPUTATIONAL PERFORMANCE OF QUANTUM PHASE ESTIMATION ALGORITHMcsitconf
A quantum computation problem is discussed in this paper. Many of the features that make quantum computation superior to classical computation can be attributed to the quantum coherence effect, which depends on the phase of the quantum coherent state. The quantum Fourier transform algorithm, the most commonly used algorithm, is introduced, and one of its most important applications, phase estimation of a quantum state based on the quantum Fourier transform, is presented in detail. The flow of the phase estimation algorithm and the quantum circuit model are shown; the error of the output phase value and the probability of measurement are analysed; the probability distribution of the measured phase value is presented; and the computational efficiency is discussed.
This document describes a collapsed dynamic factor analysis model for macroeconomic forecasting. It summarizes that multivariate time series models can more accurately capture relationships between economic variables compared to univariate models. The document then presents a collapsed dynamic factor model that relates a target time series (yt) to unobserved dynamic factors (Ft) estimated from related macroeconomic data (gt). Out-of-sample forecasting experiments on US personal income and industrial production data demonstrate the model achieves more accurate point forecasts than univariate benchmarks like random walk or AR(2) models.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Umberto Picchini
An important and well-studied class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent state dynamics, (ii) the variability between individuals, and (iii) measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed-data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible, general and able to deal with a large class of nonlinear SDEMEMs [1]. In more recent work [2], we also explored ways to make inference even more scalable in the number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and the nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
This document discusses the Fast Fourier Transform (FFT) and its implementation in MATLAB. It begins by introducing signals and their representation in both the time and frequency domains. It then discusses the Fourier transform and its ability to convert between these domains. The discrete Fourier transform (DFT) and how it can be used to analyze discrete, finite-length signals is also introduced. Finally, it describes how the FFT provides an efficient algorithm for computing the DFT, reducing the computation time from O(N^2) to O(N log N).
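The O(N^2)-versus-O(N log N) contrast can be made concrete by placing a naive DFT next to a radix-2 Cooley-Tukey FFT. This is a generic sketch, not the MATLAB implementation the document describes; the test signal is an illustrative assumption, and real code would use an optimized library FFT.

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT, O(N log N); len(x) must be a power of 2.
    Recursively combines the DFTs of the even- and odd-indexed samples."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k]
                for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])

signal = [1.0, 2.0, 0.0, -1.0, 1.5, 0.5, -0.5, 2.5]
spectrum_fast = fft(signal)
spectrum_slow = dft(signal)  # same result, quadratic cost
```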
Similar to Computational Intelligence for Time Series Prediction (20)
A statistical criterion for reducing indeterminacy in linear causal modelingGianluca Bontempi
This document proposes a new statistical criterion called C to help distinguish between causal patterns in completely connected triplets when inferring causal relationships from observational data. The criterion is based on differences in values of the term S, which is derived from the covariance matrix, between different causal hypotheses. This criterion informs an algorithm called RC that incorporates both relevance and causal measures to iteratively select variables. Experiments on linear and nonlinear networks show RC has higher accuracy than other algorithms at inferring network structure. The criterion C and RC algorithm help address challenges of causal inference from complex data where dependencies are frequent.
Adaptive model selection in Wireless Sensor NetworksGianluca Bontempi
This document discusses challenges in using wireless sensor networks for environmental monitoring applications. It notes that sensor nodes have limited energy, which poses challenges for applications that need to run for months or years. The document describes the hardware capabilities of wireless sensor nodes and their energy consumption during different operating modes. It also provides an overview of using machine learning models to model sensor measurements over time and across sensors, with the goal of reducing energy usage through adaptive model selection.
Combining Lazy Learning, Racing and Subsampling for Effective Feature SelectionGianluca Bontempi
This document discusses approaches to feature selection for machine learning models, specifically comparing global versus local modeling techniques. It proposes combining lazy learning, racing, and subsampling for effective feature selection. Lazy learning uses local linear models for prediction rather than global nonlinear models, improving computational efficiency when many predictions are needed. Racing and subsampling allow efficient evaluation of feature subsets during wrapper-based feature selection by discarding poor-performing subsets early based on statistical tests of performance on subsets of the data. Experimental results are said to validate this combined approach for feature selection.
A model-based relevance estimation approach for feature selection in microarr...Gianluca Bontempi
This document presents a model-based approach for estimating feature relevance for feature selection in microarray datasets. It aims to provide an unbiased relevance estimation between filter and wrapper methods. The approach combines a low-bias k-nearest neighbor cross-validation error estimator with either a direct probability model estimator or a mutual information filter estimator to reduce variance. Experimental results on 20 public microarray datasets compare the proposed combined estimators to a support vector machine wrapper approach.
This document discusses using feature selection techniques to address the curse of dimensionality in microarray data analysis. It presents the problem of having many more features than samples in bioinformatics tasks like cancer classification and network inference. It describes filter, wrapper and embedded feature selection approaches and proposes a blocking strategy that uses multiple learning algorithms to evaluate feature subsets in order to improve selection robustness when samples are limited. Finally, it lists several microarray gene expression datasets that are commonly used to evaluate feature selection methods.
THM1: Formalizing a problem as a prediction problem is often the most important contribution of a data scientist.
THM2: A predictor is an estimator, i.e. an algorithm which takes data and returns a prediction. Reality is stochastic, so data and predictions are stochastic.
THM3: Learning is challenging since data must be used both to create prediction models and to assess them. Bias and variance must be balanced to achieve good generalization.
FP7 evaluation & selection: the point of view of an evaluatorGianluca Bontempi
The document discusses the process of evaluating proposals for EU funding as an EU evaluator. It begins by introducing the author's expertise and background evaluating FP6 and FP7 proposals. It then outlines the evaluation process, which involves individual evaluation of assigned proposals followed by consensus building and panel evaluation. Key aspects covered include managing conflicts of interest, maintaining confidentiality, and adhering to a code of conduct. The evaluation criteria for integrated projects focus on relevance to program objectives, potential impact, scientific and technological excellence, quality of consortium, and quality of management.
This document discusses feature selection methods for causal inference in bioinformatics. It describes how relevance and causality differ, with relevant features not always being causal. Information theory concepts like mutual information, conditional mutual information, and interaction information are introduced to quantify dependence and independence between variables. The min-Interaction Max-Relevance (mIMR) filter method is proposed to select features based on both relevance to the target and minimal interaction, approximating causal relationships. Experimental results on breast cancer gene expression datasets show mIMR outperforms conventional ranking in predictive performance, identifying a potential causal signature for survival.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Computational Intelligence for Time Series Prediction
1. Computational Intelligence Methods for Time Series Prediction
Second European Business Intelligence Summer School (eBISS 2012)
Gianluca Bontempi
Département d’Informatique
Boulevard de Triomphe - CP 212
http://www.ulb.ac.be/di
Computational Intelligence Methods for Prediction – p. 1/83
2. Outline
• Introduction (5 mins)
• Notions of time series (30 mins)
• Machine learning for prediction (60 mins)
• bias/variance
• parametric and structural identification
• validation
• model selection
• feature selection
• Local learning
• Forecasting: one-step and multi-step-ahead (30 mins)
• Some applications (30 mins)
• time series competitions
• chaotic time series
• wireless sensor
• biomedical
• Future directions and perspectives (15 mins)
3. ULB Machine Learning Group (MLG)
• 8 researchers (2 prof, 5 PhD students, 4 postdocs).
• Research topics: Knowledge discovery from data, Classification, Computational statistics, Data mining, Regression, Time series prediction, Sensor networks, Bioinformatics, Network inference.
• Computing facilities: high-performing cluster for analysis of massive datasets, Wireless Sensor Lab.
• Website: mlg.ulb.ac.be.
• Scientific collaborations in ULB: Bioinformatique des génomes et des réseaux (IBMM), CENOLI (Sciences), Microarray Unit (Hopital Jules Bordet), Laboratoire de Médecine experimentale, Laboratoire d’Anatomie, Biomécanique et Organogénèse (LABO), Service d’Anesthesie (ERASME).
• Scientific collaborations outside ULB: Harvard Dana Farber (US), UCL Machine Learning Group (B), Politecnico di Milano (I), Universitá del Sannio (I), Helsinki Institute of Technology (FIN).
4. ULB-MLG: recent projects
1. Adaptive real-time machine learning for credit card fraud detection.
2. Discovery of the molecular pathways regulating pancreatic beta cell dysfunction and apoptosis in diabetes using functional genomics and bioinformatics: ARC (2010-2015).
3. ICT4REHAB - Advanced ICT Platform for Rehabilitation (2011-2013).
4. Integrating experimental and theoretical approaches to decipher the molecular networks of nitrogen utilisation in yeast: ARC (2006-2010).
5. TANIA - Système d’aide à la conduite de l’anesthésie. WALEO II project funded by the Région Wallonne (2006-2010).
6. "COMP2SYS" (COMPutational intelligence methods for COMPlex SYStems), MARIE CURIE Early Stage Research Training funded by the EU (2004-2008).
5. Time series
Definition A time series is a sequence of observations, usually ordered in
time.
Examples of time series
• Weather variables, like temperature, pressure
• Economic factors.
• Traffic.
• Activity of business.
• Electric load, power consumption.
• Financial index.
• Voltage.
6. Why study time series?
There are various reasons:
Prediction of the future based on the past.
Control of the process producing the series.
Understanding of the mechanism generating the series.
Description of the salient features of the series.
7. Univariate discrete time series
• Quantities, like temperature and voltage, change in a continuous way.
• In practice, however, the digital recording is made discretely in time.
• We shall confine ourselves to discrete time series (which however take
continuous values).
• Moreover we will consider univariate time series, where one type of
measurement is made repeatedly on the same object or individual.
8. A general model
Let an observed discrete univariate time series be y1, . . . , yT . This means
that we have T numbers which are observations on some variable made at T
equally distant time points, which for convenience we label 1, 2, . . . , T.
A fairly general model for the time series can be written
yt = g(t) + ϕt,   t = 1, . . . , T
The observed series is made of two components:
Systematic part: g(t), also called signal or trend, which is a deterministic function of time.
Stochastic sequence: a residual term ϕt, also called noise, which follows a probability law.
9. Types of variation
Traditional methods of time-series analysis are mainly concerned with
decomposing the variation of a series yt into:
Trend : a long-term change in the mean level, e.g. an increasing trend.
Seasonal effect : many time series (sales figures, temperature readings) exhibit variation which is seasonal (e.g. annual) in period. Measuring and removing such variation yields deseasonalized data.
Irregular fluctuations : after trend and cyclic variations have been removed from a set of data, we are left with a series of residuals, which may or may not be completely random.
We will assume here that once we have detrended and deseasonalized the
series, we can still extract information about the dependency between the
past and the future. Henceforth ϕt will denote the detrended and
deseasonalized series.
11. Probability and dependency
• Forecasting a time series is possible because the future depends on the past or, analogously, because there is a relationship between the future and the past. However this relation is not deterministic and can hardly be written in an analytical form.
• An effective way to describe a nondeterministic relation between two variables is provided by the probability formalism.
• Consider two continuous random variables ϕ1 and ϕ2 representing, for instance, the temperature today and tomorrow. We tend to believe that ϕ1 could be used as a predictor of ϕ2 with some degree of uncertainty.
• The stochastic dependency between ϕ1 and ϕ2 is summarized by the joint density p(ϕ1, ϕ2) or equivalently by the conditional probability
p(ϕ2|ϕ1) = p(ϕ1, ϕ2) / p(ϕ1)
• If p(ϕ2|ϕ1) ≠ p(ϕ2) then ϕ1 and ϕ2 are not independent or, equivalently, the knowledge of the value of ϕ1 reduces the uncertainty about ϕ2.
12. Stochastic processes
• A discrete-time stochastic process is a collection of random variables ϕt,
t = 1, . . . , T defined by a joint density
p(ϕ1, . . . , ϕT )
• Statistical time-series analysis is concerned with evaluating the
properties of the probability model which generated the observed time
series.
• Statistical time-series modeling is concerned with inferring the
properties of the probability model which generated the observed time
series from a limited set of observations.
13. Strictly stationary processes
• Definition A stochastic process is said to be strictly stationary if the joint distribution of ϕt1, ϕt2, . . . , ϕtn is the same as the joint distribution of ϕt1+k, ϕt2+k, . . . , ϕtn+k for all n, t1, . . . , tn and k.
• In other words, shifting the time origin by an amount k has no effect on the joint distribution, which depends only on the intervals between t1, . . . , tn.
• This implies that the distribution of ϕt is the same for all t.
• The definition holds for any value of n.
• Let us see what it means in practice for n = 1 and n = 2.
14. Properties
n=1 : If ϕt is strictly stationary and its first two moments are finite, we have
µt = µ,   σ²t = σ²
n=2 : Furthermore the autocovariance function γ(t1, t2) depends only on the lag k = t2 − t1 and may be written as
γ(k) = Cov[ϕt, ϕt+k]
In order to avoid scaling effects, it is useful to introduce the autocorrelation function
ρ(k) = γ(k)/σ² = γ(k)/γ(0)
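As a concrete illustration, the autocorrelation function can be estimated from a finite realization. The sketch below (Python with NumPy; the `sample_acf` helper and the sample size are illustrative choices, not from the slides) checks that a purely random series has ρ(0) = 1 and ρ(k) ≈ 0 for k ≠ 0.

```python
import numpy as np

def sample_acf(phi, max_lag):
    """Sample autocorrelation rho(k) = gamma(k)/gamma(0) for k = 0..max_lag."""
    phi = np.asarray(phi, dtype=float)
    n = len(phi)
    mu = phi.mean()
    gamma0 = np.mean((phi - mu) ** 2)          # gamma(0), the sample variance
    rho = []
    for k in range(max_lag + 1):
        # sample autocovariance at lag k
        gamma_k = np.mean((phi[:n - k] - mu) * (phi[k:] - mu))
        rho.append(gamma_k / gamma0)
    return np.array(rho)

rng = np.random.default_rng(0)
white = rng.normal(size=5000)    # a purely random (white noise) process
rho = sample_acf(white, 5)       # rho[0] = 1; rho[k] close to 0 for k > 0
```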
15. Weak stationarity
• A less restricted definition of stationarity concerns only the first two
moments of ϕt
Definition A process is called second-order stationary or weakly stationary
if its mean is constant and its autocovariance function depends only on
the lag.
• No assumptions are made about higher moments than those of second
order.
• Strict stationarity implies weak stationarity.
• In the special case of normal processes, weak stationarity implies strict
stationarity.
Definition A process is called normal if the joint distribution of ϕt1, ϕt2, . . . , ϕtn is multivariate normal for all t1, . . . , tn.
16. Purely random processes
• A purely random process consists of a sequence of random variables ϕt which are mutually independent and identically distributed. For each t and k
p(ϕt+k|ϕt) = p(ϕt+k)
• It follows that this process has constant mean and variance. Also
γ(k) = Cov[ϕt, ϕt+k] = 0
for k = ±1, ±2, . . . .
• A purely random process is strictly stationary.
• A purely random process is sometimes called white noise, particularly by engineers.
17. Example: Gaussian purely random
[Figure: a realization of a Gaussian purely random process, 100 time steps, values roughly in ±2.5]
18. Example: Uniform purely random
[Figure: a realization of a uniform purely random process, 100 time steps, values in [0, 1]]
19. Random walk
• Suppose that wt is a discrete, purely random process with mean µ and variance σ²_w.
• A process ϕt is said to be a random walk if
ϕt = ϕt−1 + wt
• If ϕ0 = 0 then
ϕt = Σ_{i=1}^{t} wi
• E[ϕt] = tµ and Var[ϕt] = tσ²_w.
• As the mean and variance change with t, the process is non-stationary.
20. Random walk (II)
• The first differences of a random walk given by
∇ϕt = ϕt − ϕt−1
form a purely random process, which is stationary.
• The best-known examples of time series which behave like random
walks are share prices on successive days.
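A small simulation (Python/NumPy sketch; the sample sizes and seed are arbitrary choices) illustrates both properties: across realizations, the variance of a random walk grows linearly with t, while its first differences recover the stationary purely random process.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_walks = 500, 2000
w = rng.normal(size=(n_walks, T))    # purely random increments, mean 0, variance 1
walks = np.cumsum(w, axis=1)         # each row is a walk: phi_t = phi_{t-1} + w_t

# Across realizations, Var[phi_t] grows like t * sigma_w^2 (non-stationarity) ...
var_t = walks.var(axis=0)            # var_t[t-1] is close to t for sigma_w = 1

# ... while the first differences recover the stationary purely random process.
diffs = np.diff(walks, axis=1)       # equal to the increments w_2, ..., w_T
```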
21. Ten random walks
Let w ∼ N(0, 1).
[Figure: ten realizations of a random walk with w ∼ N(0, 1), 500 time steps]
22. Autoregressive processes
Suppose that wt is a purely random process with mean zero and variance σ²_w. A process ϕt is said to be an autoregressive process of order p (also an AR(p) process) if
ϕt = α1ϕt−1 + · · · + αpϕt−p + wt
Note that this is like a multiple regression model where ϕ is regressed not on independent variables but on its past values (hence the prefix “auto”).
23. First order AR(1) process
If p = 1, we have the so-called Markov process AR(1)
ϕt = αϕt−1 + wt
By substitution it can be shown that
ϕt = α(αϕt−2 + wt−1) + wt = α²(αϕt−3 + wt−2) + αwt−1 + wt = · · · = wt + αwt−1 + α²wt−2 + . . .
Then
E[ϕt] = 0,   Var[ϕt] = σ²_w(1 + α² + α⁴ + . . . )
Then if |α| < 1 the variance is finite and equals
Var[ϕt] = σ²_ϕ = σ²_w/(1 − α²)
and the autocorrelation is
ρ(k) = α^k,   k = 0, 1, 2, . . .
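These moments can be checked numerically. The following sketch (Python with NumPy; the seed, series length and burn-in are arbitrary choices) simulates an AR(1) process with α = 0.8 and compares the empirical variance and autocorrelation with σ²_w/(1 − α²) and α^k.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, T = 0.8, 200_000
w = rng.normal(size=T)                      # sigma_w = 1
phi = np.empty(T)
phi[0] = w[0]
for t in range(1, T):
    phi[t] = alpha * phi[t - 1] + w[t]      # AR(1): phi_t = alpha*phi_{t-1} + w_t

x = phi[1000:] - phi[1000:].mean()          # drop the transient
var_emp = np.mean(x * x)
var_theory = 1.0 / (1.0 - alpha ** 2)       # sigma_w^2 / (1 - alpha^2)

rho1 = np.mean(x[:-1] * x[1:]) / var_emp    # estimate of rho(1); theory: alpha
rho2 = np.mean(x[:-2] * x[2:]) / var_emp    # estimate of rho(2); theory: alpha^2
```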
24. General order AR(p) process
We can find again the duality between AR and infinite-order MA processes. By using the backward shift operator B, the AR(p) process is
(1 − α1B − · · · − αpB^p)ϕt = wt
or equivalently
ϕt = wt/(1 − α1B − · · · − αpB^p) = f(B)wt
where
f(B) = (1 − α1B − · · · − αpB^p)^(−1) = (1 + β1B + β2B² + . . . )
It has been shown that a necessary and sufficient condition for stationarity is that the roots of the equation
φ(B) = 1 − α1B − · · · − αpB^p = 0
lie outside the unit circle.
25. Autocorrelation in AR(p)
Unlike the autocorrelation function of an MA(q) process, which cuts off at lag q, the autocorrelation of an AR(p) process attenuates slowly.
[Figure: absolute value of the autocorrelation of an AR(16) process over lags 0–80, decaying slowly from about 0.8]
26. Fitting an autoregressive process
Fitting an autoregressive process to a set of data DT = {ϕ1, . . . , ϕT } requires solving two problems:
1. The estimation of the order p of the process.
2. The estimation of the set of parameters {α1, . . . , αp}.
27. Estimation of AR(p) parameters
Suppose we have an AR(p) process
ϕt = α1ϕt−1 + · · · + αpϕt−p + wt
Given T observations, the parameters may be estimated by least squares, i.e. by minimizing
α̂ = arg min_α Σ_{t=p+1}^{T} [ϕt − (α1ϕt−1 + · · · + αpϕt−p)]²
In matrix form this amounts to solving the multiple least-squares problem where
28. Estimation of AR(p) parameters (II)
X =
⎡ ϕT−1  ϕT−2  · · ·  ϕT−p   ⎤
⎢ ϕT−2  ϕT−3  · · ·  ϕT−p−1 ⎥
⎢   ⋮     ⋮            ⋮    ⎥
⎣ ϕp    ϕp−1  · · ·  ϕ1     ⎦

Y = (ϕT , ϕT−1, . . . , ϕp+1)ᵀ    (1)

Most of the considerations made for multiple linear regression still hold in this case.
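A minimal sketch of this estimator, assuming NumPy is available (the helper name `fit_ar` and the simulated AR(2) example are illustrative, not from the slides): it builds the regression matrix X and target Y as above and solves the least-squares problem.

```python
import numpy as np

def fit_ar(phi, p):
    """Least-squares AR(p) fit: build X and Y as on the slide and
    solve min_alpha ||Y - X alpha||^2."""
    phi = np.asarray(phi, dtype=float)
    T = len(phi)
    # Row for target phi_t holds (phi_{t-1}, ..., phi_{t-p}), for t = p+1, ..., T.
    X = np.column_stack([phi[p - j - 1:T - j - 1] for j in range(p)])
    Y = phi[p:]
    alpha_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return alpha_hat

# Sanity check on simulated data: recover the coefficients of an AR(2) process.
rng = np.random.default_rng(3)
true_alpha = np.array([0.5, -0.3])   # roots outside the unit circle: stationary
T = 50_000
w = rng.normal(size=T)
phi = np.zeros(T)
for t in range(2, T):
    phi[t] = true_alpha[0] * phi[t - 1] + true_alpha[1] * phi[t - 2] + w[t]

alpha_hat = fit_ar(phi, 2)           # close to (0.5, -0.3)
```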
29. The NAR representation
• AR models assume that the relation between past and future is linear.
• Once we assume that the linear assumption does not hold, we may extend the AR formulation to a Nonlinear AutoRegressive (NAR) formulation
ϕt = f(ϕt−d, ϕt−d−1, . . . , ϕt−d−n+1) + w(t)
where the missing information is lumped into a noise term w.
• In what follows we will consider this relationship as a particular instance of a dependence
y = f(x) + w
between a multidimensional input x ∈ X ⊂ R^n and a scalar output y ∈ R.
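The rearrangement of a series into such input/output pairs can be sketched as follows (Python/NumPy; the `embed` helper and its argument names are illustrative):

```python
import numpy as np

def embed(phi, n, d=1):
    """Turn a series into supervised pairs: y = phi_t and
    x = (phi_{t-d}, phi_{t-d-1}, ..., phi_{t-d-n+1})."""
    phi = np.asarray(phi, dtype=float)
    first = d + n - 1                       # earliest t with a complete input vector
    X = np.array([[phi[t - d - j] for j in range(n)]
                  for t in range(first, len(phi))])
    y = phi[first:]
    return X, y

# Each target is paired with its n most recent past values (here n = 2, d = 1).
X, y = embed(np.arange(6), n=2, d=1)
```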
30. Nonlinear vs. linear
• AR models assume that the relation between past and future is linear.
• The advantages of linear models are numerous:
• the least-squares β̂ estimate can be expressed in an analytical form
• the least-squares β̂ estimate can be easily calculated through matrix computation
• statistical properties of the estimator can be easily defined
• recursive formulations for sequential updating are available
• the relation between empirical and generalization error is known
• Unfortunately, in real problems it is extremely unlikely that the input and output variables are linked by a linear relation.
• Moreover, the form of the relation is often unknown and only a limited amount of samples is available.
31. Supervised learning
[Diagram: an unknown dependency maps inputs to outputs; the observed input/output pairs form the training dataset, a prediction model is learned from it, and the prediction error between the model output and the true output drives the learning]
The prediction problem is also known as the supervised learning problem,
because of the presence of the outcome variable which guides the learning
process.
Collecting a set of training data is analogous to the situation where a teacher
suggests the correct answer for each input configuration.
32. The regression plus noise form
• A typical way of representing the unknown input/output relation is the
regression plus noise form
y = f(x) + w
where f(·) is a deterministic function and the term w represents the
noise or random error. It is typically assumed that w is independent of x
and E[w] = 0.
• Suppose that we have available a training set {(xi, yi) : i = 1, . . . , N}, where xi = (xi1, . . . , xin), generated according to the previous model.
• The goal of a learning procedure is to find a model h(x) which is able to give a good approximation of the unknown function f(x).
• But how to choose h, if we do not know the probability distribution underlying the data and we have only a limited training set?
33. A simple example
[Figure: training data scattered around an unknown function, x ∈ [−2, 2], Y ∈ [−5, 15]]
34. Model 1
[Figure: degree-1 polynomial fit to the same data; training error = 2]
35. Model 2
[Figure: degree-3 polynomial fit to the same data; training error = 0.92]
36. Model 3
[Figure: degree-18 polynomial fit to the same data; training error = 0.4]
37. Generalization and overfitting
• How to estimate the quality of a model? Is the training error a good
measure of the quality?
• The goal of learning is to find a model which is able to generalize, i.e. able
to return good predictions for input sets independent of the training set.
• In a nonlinear setting, it is possible to find models with such a complicated structure that they have a null empirical risk. Are these models good?
• Typically NOT, since doing very well on the training set could mean doing badly on new data.
• This is the phenomenon of overfitting.
• Using the same data for training a model and assessing it is typically a wrong procedure, since this returns an over-optimistic assessment of the model’s generalization capability.
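Overfitting is easy to reproduce numerically. In the sketch below (Python/NumPy, with a hypothetical cubic target; degrees 1, 3 and 9 stand in for models of increasing complexity), the training error decreases monotonically with the polynomial degree, which is precisely why it is a poor measure of model quality.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 20
x = np.linspace(-2, 2, N)
# Hypothetical target: a cubic signal plus Gaussian noise.
y = x ** 3 - 2 * x + rng.normal(scale=1.0, size=N)

def training_error(x, y, degree):
    """Mean squared residual of a least-squares polynomial fit (empirical risk)."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

# Training error shrinks as the hypothesis class grows, regardless of
# whether the extra flexibility captures signal or just fits the noise.
errors = {deg: training_error(x, y, deg) for deg in (1, 3, 9)}
```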
38. Bias and variance of a model
A result of estimation theory shows that the mean squared error, i.e. a measure of the generalization quality of an estimator, can be decomposed into three terms:
MSE = σ²_w + squared bias + variance
where the intrinsic noise term reflects the target alone, the bias reflects the target’s relation with the learning algorithm and the variance term reflects the learning algorithm alone.
This result is theoretical, since these quantities cannot be measured on the basis of a finite amount of data. However, it provides insight into what makes a learning process accurate.
39. The bias/variance trade-off
• The first term is the variance of y around its true mean f(x) and cannot be avoided no matter how well we estimate f(x), unless σ²_w = 0.
• The bias measures the difference in x between the average of the outputs of the hypothesis functions over the set of possible training sets DN and the regression function value f(x).
• The variance reflects the variability of the guessed h(x, αN) as one varies over training sets of fixed dimension N. This quantity measures how sensitive the algorithm is to changes in the data set, regardless of the target.
40. The bias/variance dilemma
• The designer of a learning machine has no access to the term MSE but can only estimate it on the basis of the training set. Hence, the bias/variance decomposition is relevant in practical learning since it provides a useful hint about the features to control in order to make the error MSE small.
• The bias term measures the lack of representational power of the class
of hypotheses. To reduce the bias term we should consider complex
hypotheses which can approximate a large number of input/output
mappings.
• The variance term warns us against an excessive complexity of the
approximator. This means that a class of too powerful hypotheses runs
the risk of being excessively sensitive to the noise affecting the training
set; therefore, our class could contain the target but it could be
practically impossible to find it out on the basis of the available dataset.
41. • In other terms, it is commonly said that a hypothesis with large bias but low variance underfits the data, while a hypothesis with low bias but large variance overfits the data.
• In both cases, the hypothesis gives a poor representation of the target
and a reasonable trade-off needs to be found.
• The task of the model designer is to search for the optimal trade-off
between the variance and the bias term, on the basis of the available
training set.
43. The learning procedure
A learning procedure aims at two main goals:
1. to choose a parametric family of hypothesis h(x, α) which contains or
gives good approximation of the unknown function f (structural
identification).
2. within the family h(x, α), to estimate on the basis of the training set DN
the parameter αN which best approximates f (parametric identification).
In order to accomplish that, a learning procedure is made of two nested
loops:
1. an external structural identification loop which goes through different
model structures
2. an inner parametric identification loop which searches for the best
parameter vector within the family structure.
44. Parametric identification
The parametric identification of the hypothesis is done according to the ERM (Empirical Risk Minimization) principle, where
αN = α(DN) = arg min_{α∈Λ} MISE_emp(α)
minimizes the empirical risk, or training error,
MISE_emp(α) = (1/N) Σ_{i=1}^{N} (yi − h(xi, α))²
constructed on the basis of the data set DN.
45. Parametric identification (II)
• The computation of αN requires a procedure of multivariate optimization
in the space of parameters.
• The complexity of the optimization depends on the form of h(·).
• In some cases the parametric identification problem may be an NP-hard
problem.
• Thus, we must resort to some form of heuristic search.
• Examples of parametric identification procedures are linear least squares for linear models and backpropagated gradient descent for feedforward neural networks.
46. Validation techniques
How can MISE be measured reliably on a finite dataset? The most common techniques to estimate MISE are:
Testing: a testing sequence independent of DN and distributed according to the same probability distribution is used to assess the quality. In practice, unfortunately, an additional set of input/output observations is rarely available.
Holdout: the holdout method, sometimes called test sample estimation, partitions the data DN into two mutually exclusive subsets, the training set Dtr and the holdout or test set Dts.
k-fold Cross-validation: the set DN is randomly divided into k mutually exclusive test partitions of approximately equal size. The cases not found in each test partition are independently used for selecting the hypothesis which will be tested on the partition itself. The average error over all the k partitions is the cross-validated error rate.
47. The K-fold cross-validation
This is the algorithm in detail:
1. Split the dataset DN into K roughly equal-sized parts.
2. For the kth part, k = 1, . . . , K, fit the model to the other K − 1 parts of the data, and calculate the prediction error of the fitted model when predicting the kth part of the data.
3. Do the above for k = 1, . . . , K and average the K estimates of prediction error.
Let k(i) be the part of DN containing the ith sample. Then the cross-validation estimate of the MISE prediction error is
MISE_CV = (1/N) Σ_{i=1}^{N} (yi − ŷ_i^(−k(i)))² = (1/N) Σ_{i=1}^{N} (yi − h(xi, α^(−k(i))))²
where ŷ_i^(−k(i)) denotes the fitted value for the ith observation returned by the model estimated with the k(i)th part of the data removed.
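A compact sketch of the procedure (Python/NumPy; the `kfold_mise` helper and the linear example are illustrative, not from the slides): any hypothesis class can be plugged in through the `fit` and `predict` callables.

```python
import numpy as np

def kfold_mise(X, y, fit, predict, K=10, seed=0):
    """K-fold cross-validation estimate of the prediction error (MISE_CV).

    `fit(X, y)` returns a model and `predict(model, X)` returns predictions;
    both are supplied by the caller.
    """
    N = len(y)
    idx = np.random.default_rng(seed).permutation(N)
    sq_err = np.empty(N)
    for part in np.array_split(idx, K):      # the K test partitions
        train = np.setdiff1d(idx, part)      # the remaining K - 1 parts
        model = fit(X[train], y[train])
        sq_err[part] = (y[part] - predict(model, X[part])) ** 2
    return float(sq_err.mean())

# Example with a linear hypothesis fitted by least squares.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=200)

fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta
mise_cv = kfold_mise(X, y, fit, predict, K=10)   # close to the noise variance 0.01
```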
48. 10-fold cross-validation
K = 10: at each iteration 90% of data are used for training and the remaining
10% for the test.
49. Leave-one-out cross validation
• The cross-validation algorithm where K = N is also called the
leave-one-out algorithm.
• This means that for each ith sample, i = 1, . . . , N,
1. we carry out the parametric identification, leaving that observation
out of the training set,
2. we compute the predicted value for the ith observation, denoted by ŷ_i^(−i).
The corresponding estimate of the MISE prediction error is
MISE_LOO = (1/N) Σ_{i=1}^{N} (yi − ŷ_i^(−i))² = (1/N) Σ_{i=1}^{N} (yi − h(xi, α^(−i)))²
where α^(−i) is the set of parameters returned by the parametric identification performed on the training set with the ith sample set aside.
50. Model selection
• Model selection concerns the final choice of the model structure in the
set that has been proposed by model generation and assessed by
model validation.
• In real problems, this choice is typically a subjective issue and is often
the result of a compromise between different factors, like the quantitative
measures, the personal experience of the designer and the effort
required to implement a particular model in practice.
• Here we will consider only quantitative criteria. There are two possible approaches:
1. the winner-takes-all approach
2. the combination of estimators approach.
51. Model selection
[Diagram: structural and parametric identification. A realization of the stochastic process yields a training set of size N; an external structural identification loop runs over the classes of hypotheses Λ1, Λ2, . . . , ΛS; for each class the parametric identification returns α_N^1, α_N^2, . . . , α_N^S; validation attaches the assessments G_N^1, G_N^2, . . . , G_N^S; model selection picks the learned model]
52. Winner-takes-all
The best hypothesis is selected in the set {α_N^s}, s = 1, . . . , S, according to
s̃ = arg min_{s=1,...,S} MISE^s
A model with complexity s̃ is trained on the whole dataset DN and used for future predictions.
53. Winner-takes-all pseudo-code
1. for s = 1, . . . , S: (structural loop)
   • for j = 1, . . . , N:
     (a) inner parametric identification (for leave-one-out):
         α_{N−1}^s = arg min_{α∈Λs} Σ_{i=1..N, i≠j} (yi − h(xi, α))²
     (b) ej = yj − h(xj, α_{N−1}^s)
   • MISE_LOO(s) = (1/N) Σ_{j=1}^{N} ej²
2. Model selection: s̃ = arg min_{s=1,...,S} MISE_LOO(s)
3. Final parametric identification: α_N^s̃ = arg min_{α∈Λs̃} Σ_{i=1}^{N} (yi − h(xi, α))²
4. The output prediction model is h(·, α_N^s̃)
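The pseudo-code above can be instantiated, for example, with polynomial hypotheses of degree s (Python/NumPy sketch; the cubic data-generating process is an arbitrary choice for illustration):

```python
import numpy as np

def mise_loo(x, y, degree):
    """Leave-one-out MISE of a polynomial hypothesis of the given degree."""
    N = len(y)
    errors = np.empty(N)
    for j in range(N):
        mask = np.arange(N) != j
        coeffs = np.polyfit(x[mask], y[mask], degree)   # inner parametric identification
        errors[j] = y[j] - np.polyval(coeffs, x[j])
    return float(np.mean(errors ** 2))

rng = np.random.default_rng(6)
x = np.linspace(-2, 2, 40)
y = x ** 3 - 2 * x + rng.normal(scale=1.0, size=40)     # hypothetical cubic target

scores = {s: mise_loo(x, y, s) for s in range(1, 8)}    # structural loop over degrees
s_best = min(scores, key=scores.get)                    # model selection
final_coeffs = np.polyfit(x, y, s_best)                 # final parametric identification
```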
54. Model combination
• The winner-takes-all approach is intuitively the approach which should
work the best.
• However, recent results in machine learning show that the performance
of the final model can be improved not by choosing the model structure
which is expected to predict the best but by creating a model whose
output is the combination of the output of models having different
structures.
• The reason is that in reality any chosen hypothesis h(·, αN ) is only an
estimate of the real target and, like any estimate, is affected by a bias
and a variance term.
• Theoretical results on the combination of estimators show that the
combination of unbiased estimators leads to an unbiased estimator with
reduced variance.
• This principle is at the basis of approaches like bagging or boosting.
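A toy numerical illustration of the principle (a simplified simulation with independent estimators, not bagging itself): averaging M unbiased estimators of a mean leaves the bias unchanged while shrinking the variance.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, M, trials = 5.0, 10, 2000

single, combined = [], []
for _ in range(trials):
    # M unbiased estimators of mu, each the sample mean of 20 noisy observations
    estimates = [rng.normal(mu, 1.0, size=20).mean() for _ in range(M)]
    single.append(estimates[0])           # one estimator kept alone
    combined.append(np.mean(estimates))   # the combination of all M

# The combination stays (nearly) unbiased but has a much smaller variance.
assert abs(np.mean(combined) - mu) < 0.05
assert np.var(combined) < np.var(single)
```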
Computational Intelligence Methods for Prediction – p. 54/83
55. Local modeling procedure
The learning of a local model in x_q ∈ Rⁿ can be summarized in these steps:
1. Compute the distance between the query xq and the training samples
according to a predefined metric.
2. Rank the neighbors on the basis of their distance to the query.
3. Select a subset of the k nearest neighbors according to the bandwidth
which measures the size of the neighborhood.
4. Fit a local model (e.g. constant, linear,...).
Each of the local approaches has one or more structural (or smoothing)
parameters that control the amount of smoothing performed.
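The four steps can be sketched with a local constant model (a k-nearest-neighbors average), where the bandwidth is the number of neighbors k; the data and the quadratic target function are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
X_tr = rng.uniform(-2, 2, size=100)            # training inputs
y_tr = X_tr ** 2 + 0.1 * rng.normal(size=100)  # noisy targets

def local_predict(xq, k=7):
    d = np.abs(X_tr - xq)   # 1. distance between the query and the training samples
    order = np.argsort(d)   # 2. rank the neighbors by distance to the query
    nn = order[:k]          # 3. select the k nearest (the bandwidth)
    return y_tr[nn].mean()  # 4. fit a local model (here: constant)

assert abs(local_predict(1.0) - 1.0) < 0.5     # close to the true value 1² = 1
```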
Let us focus on the bandwidth selection.
Computational Intelligence Methods for Prediction – p. 55/83
56. The bandwidth trade-off: overfit
[Figure: two scatter plots of y versus x; with a narrow bandwidth, the local fit around the query point q follows the noise of the few neighboring samples, yielding a large prediction error e.]
Too narrow bandwidth ⇒ overfitting ⇒ large prediction error e.
In terms of bias/variance trade-off, this is typically a situation of high variance.
Computational Intelligence Methods for Prediction – p. 56/83
57. The bandwidth trade-off: underfit
[Figure: two scatter plots of y versus x; with a large bandwidth, the local fit around the query point q averages over too many samples and misses the local shape of the function, yielding a large prediction error e.]
Too large bandwidth ⇒ underfitting ⇒ large prediction error e
In terms of bias/variance trade-off, this is typically a situation of high bias.
Computational Intelligence Methods for Prediction – p. 57/83
58. Bandwidth and bias/variance trade-off
[Figure: mean squared error versus 1/bandwidth; the bias term dominates for large bandwidths (many neighbors, underfitting) while the variance term dominates for small bandwidths (few neighbors, overfitting).]
Computational Intelligence Methods for Prediction – p. 58/83
59. The PRESS statistic
• Cross-validation can provide a reliable estimate of the algorithm
generalization error but it requires the training process to be repeated K
times, which sometimes means a large computational effort.
• In the case of linear models there exists a powerful statistical procedure
to compute the leave-one-out cross-validation measure at a reduced
computational cost
• It is the PRESS (Prediction Sum of Squares) statistic, a simple formula
which returns the leave-one-out (l-o-o) error as a by-product of the
least-squares fit.
Computational Intelligence Methods for Prediction – p. 59/83
60. Leave-one-out for linear models
[Diagram: on the left, a single parametric identification on the N samples of the training set followed by the PRESS statistic; on the right, N repetitions of putting the j-th sample aside, identifying the parameters on the remaining N−1 samples and testing on the j-th sample.]
The leave-one-out error can be computed in two equivalent ways: the slow
way (on the right), which repeats the training and test procedure N times,
and the fast way (on the left), which performs the parametric identification
only once and then computes the PRESS statistic.
Computational Intelligence Methods for Prediction – p. 60/83
61. The PRESS statistic
• This allows a fast cross-validation without repeating N times the
leave-one-out procedure. The PRESS procedure can be described as
follows:
1. We use the whole training set to estimate the linear regression
   coefficients

       β̂ = (Xᵀ X)⁻¹ Xᵀ Y

2. This procedure is performed only once on the N samples and returns
   as a by-product the Hat matrix

       H = X (Xᵀ X)⁻¹ Xᵀ

3. We compute the residual vector e, whose j-th term is e_j = y_j − x_jᵀ β̂.

4. We use the PRESS statistic to compute e_j^loo as

       e_j^loo = e_j / (1 − H_jj)

   where H_jj is the j-th diagonal term of the matrix H.
Computational Intelligence Methods for Prediction – p. 61/83
62. The PRESS statistic
Thus, the leave-one-out estimate of the local mean integrated squared error
is:
    MISE_LOO = (1/N) Σ_{i=1}^N ( (y_i − ŷ_i) / (1 − H_ii) )²
Note that PRESS is not an approximation of the loo error but simply a faster
way of computing it.
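This identity is easy to check numerically; the sketch below (synthetic data, illustrative only) compares the PRESS residuals against N explicit leave-one-out refits:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=N)

# Fast route: a single least-squares fit, the Hat matrix, the PRESS residuals.
beta = np.linalg.solve(X.T @ X, X.T @ Y)
H = X @ np.linalg.solve(X.T @ X, X.T)
e = Y - X @ beta
e_press = e / (1 - np.diag(H))

# Slow route: N separate refits, each leaving one sample aside.
e_loo = np.empty(N)
for j in range(N):
    mask = np.arange(N) != j
    b_j = np.linalg.lstsq(X[mask], Y[mask], rcond=None)[0]
    e_loo[j] = Y[j] - X[j] @ b_j

assert np.allclose(e_press, e_loo)   # identical: PRESS is exact, not approximate
```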
Computational Intelligence Methods for Prediction – p. 62/83
63. Selection of the number of neighbours
• For a given query point x_q, we can compute a set of predictions
ŷ_q(k) = x_qᵀ β̂(k), together with the associated leave-one-out errors
MISE_LOO(k), for a number of neighbors k ranging in [k_min, k_max].
• If the selection paradigm, frequently called winner-takes-all, is adopted,
the most natural way to extract a final prediction ŷ_q consists in
comparing the predictions obtained for each value of k on the basis of the
classical mean square error criterion:

    ŷ_q = x_qᵀ β̂(k̂),  with  k̂ = arg min_k MISE_LOO(k)
Computational Intelligence Methods for Prediction – p. 63/83
64. Local Model combination
• As an alternative to the winner-takes-all paradigm, we can use a
combination of estimates.
• The final prediction of the value yq is obtained as a weighted average of
the best b models, where b is a parameter of the algorithm.
• Suppose the predictions ŷ_q(k) and the l-o-o errors MISE_LOO(k) have been
ordered, creating a sequence of integers {k_i} so that
MISE_LOO(k_i) ≤ MISE_LOO(k_j), ∀ i < j. The prediction ŷ_q is given by

    ŷ_q = ( Σ_{i=1}^b ζ_i ŷ_q(k_i) ) / ( Σ_{i=1}^b ζ_i ),

where the weights are the inverse of the mean square errors:
ζ_i = 1/MISE_LOO(k_i).
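A minimal sketch of this combination step; the candidate predictions and their leave-one-out errors are hypothetical values:

```python
import numpy as np

def combine(preds, mise_loo, b):
    """Weighted average of the b predictions with smallest leave-one-out error."""
    order = np.argsort(mise_loo)[:b]                 # the b best local models
    zeta = 1.0 / np.asarray(mise_loo, float)[order]  # weights = inverse MISE
    return np.sum(zeta * np.asarray(preds, float)[order]) / np.sum(zeta)

# The accurate candidates dominate; the poor one (MISE = 10) is discarded by b = 2.
yhat = combine(preds=[1.0, 1.2, 5.0], mise_loo=[0.1, 0.2, 10.0], b=2)
assert abs(yhat - 16.0 / 15.0) < 1e-12   # (10·1.0 + 5·1.2) / (10 + 5)
```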
Computational Intelligence Methods for Prediction – p. 64/83
65. One step-ahead and iterated prediction
• Once a model of the embedding mapping is available, it can be used for
two objectives: one-step-ahead prediction and iterated prediction.
• In one-step-ahead prediction, the n previous values of the series are
available and the forecasting problem can be cast in the form of a
generic regression problem
• In the literature, a number of supervised learning approaches have been
used with success to perform one-step-ahead forecasting on the basis
of historical data.
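The casting into a regression problem amounts to building the embedding (lag) matrix of the n previous values; a minimal sketch:

```python
import numpy as np

def embed(series, n):
    """Inputs [phi_t, ..., phi_{t-n+1}] and one-step-ahead targets phi_{t+1}."""
    s = np.asarray(series, float)
    X = np.column_stack([s[n - 1 - i : len(s) - 1 - i] for i in range(n)])
    y = s[n:]
    return X, y

X, y = embed([1, 2, 3, 4, 5, 6], n=3)
# Each row holds the n most recent values, most recent first:
assert list(X[0]) == [3.0, 2.0, 1.0] and y[0] == 4.0
assert X.shape == (3, 3)
```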
Computational Intelligence Methods for Prediction – p. 65/83
66. One step-ahead prediction
[Diagram: the approximator f̂ fed through a tapped delay line with the values ϕ_{t−1}, ϕ_{t−2}, ..., ϕ_{t−n}.]
The approximator f̂ returns the prediction of the value of the time series at
time t + 1 as a function of the n previous values (the rectangular box
containing z⁻¹ represents a unit delay operator, i.e., ϕ_{t−1} = z⁻¹ ϕ_t).
Computational Intelligence Methods for Prediction – p. 66/83
67. Multi-step ahead prediction
• The prediction of the value of a time series H > 1 steps ahead is called
H-step-ahead prediction.
• We classify the methods for H-step-ahead prediction according to two
features: the horizon of the training criterion and the single-output or
multi-output nature of the predictor.
Computational Intelligence Methods for Prediction – p. 67/83
68. Multi-step ahead prediction strategies
The most common strategies are
1. Iterated: the model predicts h steps ahead by iterating a one-step-ahead
predictor whose parameters are optimized to minimize the training error
on one-step-ahead forecast (one-step-ahead training criterion).
2. Iterated strategy where parameters are optimized to minimize the
training error on the iterated htr-step-ahead forecast (htr-step-ahead
training criterion) where 1 < htr ≤ H.
3. Direct: the model makes a direct forecast at time t + h − 1, h = 1, . . . , H
by modeling the time series in a multi-input single-output form
4. DirRec: a direct forecast where the input vector is extended at each step
with the predicted values.
5. MIMO: the model returns a vectorial forecast by modeling the time
series in a multi-input multi-output form
Computational Intelligence Methods for Prediction – p. 68/83
69. Iterated (or recursive) prediction
• In the case of iterated prediction, the predicted output is fed back as
input for the next prediction.
• Here, the inputs consist of predicted values as opposed to actual
observations of the original time series.
• As the feedback values are typically distorted by the errors made by the
predictor in previous steps, the iterative procedure may produce
undesired effects of accumulation of the error.
• Low performance is expected on long-horizon tasks, since these are
essentially models tuned with a one-step-ahead criterion, which is not
capable of taking the temporal behavior over the whole horizon into account.
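A minimal sketch of the recursive mechanism, with a toy one-step model standing in for the learned predictor f̂:

```python
def iterated_forecast(one_step, last_values, H):
    """Iterate a one-step predictor H times, feeding each forecast back as input."""
    window = list(last_values)
    n = len(last_values)
    preds = []
    for _ in range(H):
        yhat = one_step(window[-n:])  # input progressively contains predicted values
        preds.append(yhat)
        window.append(yhat)           # the forecast is fed back for the next step
    return preds

# Toy one-step model: the series increases by 1, so phi_{t+1} = phi_t + 1.
assert iterated_forecast(lambda w: w[-1] + 1, [1, 2, 3], H=4) == [4, 5, 6, 7]
```

Any error made by `one_step` at an early step is carried along in `window`, which is exactly the accumulation effect described above.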
Computational Intelligence Methods for Prediction – p. 69/83
70. Iterated prediction
[Diagram: the approximator f̂ with its output fed back through the delay line ϕ̂_{t−1}, ϕ̂_{t−2}, ..., ϕ̂_{t−n}.]
The approximator f̂ returns the prediction of the value of the time series at
time t + 1 by iterating the predictions obtained in the previous steps (the
rectangular box containing z⁻¹ represents a unit delay operator, i.e.,
ϕ̂_{t−1} = z⁻¹ ϕ̂_t).
Computational Intelligence Methods for Prediction – p. 70/83
71. Iterated with h-step training criterion
• This strategy adopts one-step-ahead predictors but adapts the model
selection criterion in order to take into account the multi-step-ahead
objective.
• Methods like Recurrent Neural Networks belong to such class. Their
recurrent architecture and the associated training algorithm (temporal
backpropagation) are suitable to handle the time-dependent nature of
the data.
• In [?] we proposed an adaptation of the Lazy Learning algorithm where
the number of neighbors is optimized in order to minimize the
leave-one-out error over a horizon larger than one. This technique
ranked second in the 1998 KULeuven Time Series competition.
• A similar technique has been proposed by [?] who won the competition.
Computational Intelligence Methods for Prediction – p. 71/83
72. Direct strategy
• The Direct strategy [?, ?, ?] learns independently H models f_h

    ϕ_{t+h} = f_h(ϕ_t, ..., ϕ_{t−n+1}) + w_{t+h}

with t ∈ {n, ..., N − H} and h ∈ {1, ..., H}, and returns a multi-step
forecast by concatenating the H predictions.
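A minimal sketch of the Direct strategy, using linear models f_h as an illustrative choice; on a noiseless linear trend each f_h recovers the h-step extrapolation exactly:

```python
import numpy as np

def direct_forecast(series, n, H):
    """Learn one linear model per horizon h and concatenate the H forecasts."""
    s = np.asarray(series, float)
    N = len(s)
    rows = range(n, N - H + 1)
    X = np.array([s[t - n:t][::-1] for t in rows])  # inputs, most recent value first
    A = np.column_stack([np.ones(len(X)), X])
    xq = np.concatenate([[1.0], s[-n:][::-1]])      # last observed window
    preds = []
    for h in range(1, H + 1):
        y = np.array([s[t + h - 1] for t in rows])  # h-step-ahead targets
        beta = np.linalg.lstsq(A, y, rcond=None)[0] # model f_h, fit independently
        preds.append(xq @ beta)
    return preds

preds = direct_forecast(np.arange(1, 21), n=2, H=3)
assert np.allclose(preds, [21, 22, 23])
```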
Computational Intelligence Methods for Prediction – p. 72/83
73. Direct strategy
• Since the Direct strategy does not use any approximated values to
compute the forecasts, it is not prone to the accumulation of errors: each
model is tailored to the horizon it is supposed to predict. It nevertheless
has some weaknesses.
• Since the H models are learned independently, no statistical
dependency between the predictions ŷ_{N+h} is taken into account [?, ?, ?].
• Direct methods often require higher functional complexity [?] than
iterated ones in order to model the stochastic dependency between two
series values at two distant instants [?].
• This strategy demands a large computational time since the number of
models to learn is equal to the size of the horizon.
• Different machine learning models have been used to implement the
Direct strategy for multi-step forecasting tasks, for instance neural
networks [?], nearest neighbors [?] and decision trees [?].
Computational Intelligence Methods for Prediction – p. 73/83
74. DirRec strategy
• The DirRec strategy [?] combines the architectures and the principles
underlying the Direct and the Recursive strategies.
• DirRec computes the forecasts with different models for every
horizon (like the Direct strategy) and, at each time step, it enlarges the
set of inputs by adding variables corresponding to the forecasts of the
previous step (like the Recursive strategy).
• Unlike the previous strategies, the embedding size n is not the same for
all the horizons. In other terms, the DirRec strategy learns H models f_h
from the time series [y_1, ..., y_N], where

    y_{t+h} = f_h(y_{t+h−1}, ..., y_{t−n+1}) + w_{t+h}

with t ∈ {n, ..., N − H} and h ∈ {1, ..., H}.
• The technique is prone to the curse of dimensionality. The use of feature
selection is recommended for large h.
Computational Intelligence Methods for Prediction – p. 74/83
75. MIMO strategy
• This strategy [?, ?] (also known as Joint strategy [?]) avoids the simplistic
assumption of conditional independence between future values made by
the Direct strategy [?, ?] by learning a single multiple-output model
    [y_{t+H}, ..., y_{t+1}] = F(y_t, ..., y_{t−n+1}) + w

where t ∈ {n, ..., N − H}, F : Rⁿ → Rᴴ is a vector-valued function [?],
and w ∈ Rᴴ is a noise vector with a covariance that is not necessarily
diagonal [?].
• The forecasts are returned in one step by a multiple-output model F̂
where

    [ŷ_{t+H}, ..., ŷ_{t+1}] = F̂(y_N, ..., y_{N−n+1})
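A minimal sketch with a multiple-output linear least-squares model standing in for F; the forecasts are returned here in increasing horizon order, and the linear trend is an illustrative assumption:

```python
import numpy as np

def mimo_forecast(series, n, H):
    """One multiple-output least-squares fit mapping the last n values to H outputs."""
    s = np.asarray(series, float)
    N = len(s)
    rows = range(n, N - H + 1)
    X = np.array([np.concatenate([[1.0], s[t - n:t][::-1]]) for t in rows])
    Y = np.array([s[t:t + H] for t in rows])  # vector targets [y_{t+1}, ..., y_{t+H}]
    B = np.linalg.lstsq(X, Y, rcond=None)[0]  # a single fit shared by all horizons
    xq = np.concatenate([[1.0], s[-n:][::-1]])
    return xq @ B

preds = mimo_forecast(np.arange(1, 21), n=2, H=3)
assert np.allclose(preds, [21, 22, 23])
```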
Computational Intelligence Methods for Prediction – p. 75/83
76. MIMO strategy
• The rationale of the MIMO strategy is to model, between the predicted
values, the stochastic dependency characterizing the time series. This
strategy avoids the conditional independence assumption made by the
Direct strategy as well as the accumulation of errors which plagues the
Recursive strategy. So far, this strategy has been successfully applied to
several real-world multi-step time series forecasting tasks [?, ?, ?, ?].
• However, the wish to preserve the stochastic dependencies constrains
all the horizons to be forecasted with the same model structure. Since
this constraint could reduce the flexibility of the forecasting approach [?],
a variant of the MIMO strategy has been proposed in [?, ?].
Computational Intelligence Methods for Prediction – p. 76/83
77. Validation of time series methods
• The huge variety of strategies and algorithms that can be used to infer a
predictor from observed data calls for a rigorous procedure of
comparison and assessment.
• Assessment demands benchmarks and a benchmarking procedure.
• Benchmarks can be defined by using
• Simulated data, obtained by simulating AR, NAR and other stochastic
processes. This is particularly useful for validating theoretical
properties in terms of bias/variance.
• Public domain benchmarks, like the one provided by Time Series
Competitions.
• Real measured data
Computational Intelligence Methods for Prediction – p. 77/83
78. Competitions
• Santa Fe Time Series Prediction and Analysis Competition (1994) [?]
• International Workshop on Advanced Black-box techniques for nonlinear
modeling Competition (Leuven, Belgium; 1998)
• NN3 competition [?]: 111 monthly time series drawn from a homogeneous
population of empirical business time series.
• NN5 competition [?]: 111 time series of the daily cash withdrawal
amounts from independent cash machines at different, randomly selected
locations across England.
Computational Intelligence Methods for Prediction – p. 78/83
79. MLG projects on forecasting
Wireless sensor
Anesthesia
Car market prediction
Computational Intelligence Methods for Prediction – p. 79/83