Cross-validation aggregation is proposed as a method to combine the benefits of cross-validation and forecast aggregation. It involves using cross-validation to generate K models trained on different partitions of the data and aggregating the predictions from these K models to obtain a final prediction. This approach saves the predictions from each of the K models rather than discarding them, as is usually done in cross-validation for model selection.
Classification using L1-Penalized Logistic Regression - Setia Pramana
L1-penalized logistic regression is commonly used for classification in high-dimensional data such as microarrays. This slide deck presents a brief overview of the algorithm.
This document provides an overview of resampling methods, including jackknife, bootstrap, permutation, and cross-validation. It explains that resampling methods are used to approximate sampling distributions and estimate parameters' reliability when the true sampling distribution is difficult to derive. The document then describes each resampling method, their applications, and sampling procedures. It provides examples to illustrate permutation tests and how they are conducted through permutation resampling.
Cross-validation aggregation for forecasting - Devon Barrow
Cross-validation aggregation combines the benefits of cross-validation and forecast aggregation. It saves the predictions from models estimated on different cross-validation folds and averages these predictions to obtain the final forecast. Empirical results on 111 time series show that cross-validation aggregation outperforms simple model averaging and bagging, with the lowest errors on validation sets. Different cross-validation aggregation methods perform best depending on data characteristics like time series length and forecast horizon.
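The scheme described in these summaries can be sketched in a few lines: estimate one model per cross-validation fold, keep every model, and average their forecasts. The toy "model" below is just a sample mean, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def crogging_forecast(series, k=5):
    """Fit one model per CV fold, keep every model, average the forecasts."""
    series = np.asarray(series, dtype=float)
    folds = np.array_split(np.arange(len(series)), k)
    forecasts = []
    for fold in folds:
        train = np.delete(series, fold)   # hold this fold out
        forecasts.append(train.mean())    # toy "model": the sample mean
    # Aggregation: average all K forecasts instead of discarding K-1
    # models as plain cross-validation for model selection would.
    return float(np.mean(forecasts))

print(crogging_forecast([10.0, 12.0, 11.0, 13.0, 12.0], k=5))
```

With a real forecasting model, the line computing `train.mean()` would be replaced by fitting the model on `train` and producing an h-step-ahead forecast.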
A Study on the Importance of Adaptive Seed Value Exploration - Seoung-Ho Choi
This document discusses a study on the importance of adaptively exploring seed values for training deep learning models. It explains that setting the initial seed value is important as it impacts the overall training direction. Through two experiments on different datasets, the study shows that a model's performance can vary significantly depending on the seed value and batch size. The conclusion is that efficiently setting the initial seed value is important for training deep learning models, and future work should explore adapting seed values and using quantum random number generators for more secure key generation.
This document discusses different sampling methods including probability sampling methods (such as simple random sampling, stratified random sampling, and multistage sampling), non-probability sampling methods (such as judgment sampling), and mixed sampling methods. It provides examples and definitions of each method type. Simple random sampling involves an equal chance of selection, while stratified random sampling divides the population into homogeneous groups before sampling. Multistage sampling involves multiple stages of sampling units. Non-probability sampling relies on the sampler's judgment.
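Stratified random sampling with proportional allocation, as described above, can be sketched as follows; the strata and sizes are invented for illustration.

```python
import random

def stratified_sample(strata, total_n):
    """Draw from each stratum in proportion to its size."""
    pop = sum(len(units) for units in strata.values())
    sample = []
    for name, units in strata.items():
        n_h = round(total_n * len(units) / pop)   # proportional share
        sample.extend(random.sample(units, n_h))  # SRS within the stratum
    return sample

random.seed(0)
strata = {"urban": list(range(80)), "rural": list(range(100, 120))}
s = stratified_sample(strata, total_n=10)
print(len(s))  # 8 urban units + 2 rural units
```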
Combining and pooling forecasts based on selection criteria - Devon K. Barrow
Nearly 50 years of research has focused on forecast combination versus selection. On the one hand, findings suggest that correctly identifying the ‘best’ model for a given series can lead to significant accuracy improvements, in some cases up to 20-30%. Selecting the best method is challenging in two ways: identifying the best method given sampling uncertainty, and choosing the appropriate selection criterion. For example, selecting an optimal model based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can yield different results. These challenges suggest that there are potential benefits from combining forecasts of models instead. While simple combinations across all available methods may not always be an ideal strategy, combinations performed across a set of suitable methods, or using appropriate combination weights, usually outperform the selection of a single model. Prior research has focused mainly on combining forecasts from different models, parameters, or fitting samples, e.g. through bootstrapping. In contrast, here we propose a new way to construct pools of forecasts to be combined, by combining across different criteria of ‘optimal’ models. Our approach combines forecasts selected by different selection methods, including the Akaike, Bayesian, and Hannan-Quinn information criteria and cross-validation. We benchmark this strategy against the combination from a single model selection criterion and evaluate the forecasting performance of the two approaches under different experimental setups, providing recommendations for practice.
Grouped time-series forecasting: Application to regional infant mortality counts - hanshang
This document presents a method for constructing prediction intervals for grouped time series forecasting. It applies this method to forecasts of regional infant mortality counts in Australia. The bottom-up method is used to generate point forecasts, which are more accurate than independent forecasts at higher levels of the hierarchy. A parametric bootstrap method is then used to construct prediction intervals for the point forecasts. This method is shown to provide prediction intervals with empirical coverage close to the nominal level.
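The bottom-up step is simple to illustrate: forecast each bottom-level (regional) series and sum the forecasts, so the higher levels of the hierarchy are coherent by construction. The regions and the naive last-value forecasts below are made up for illustration.

```python
regional = {
    "region_A": [4.0, 5.0, 6.0],
    "region_B": [2.0, 2.0, 2.0],
}

# Naive bottom-level forecast: repeat the last observation.
bottom = {r: series[-1] for r, series in regional.items()}

# Bottom-up aggregation: the national forecast is the sum of the
# regional forecasts, so the hierarchy is coherent by construction.
national = sum(bottom.values())
print(bottom, national)
```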
Machine Learning, Data Mining, Genetic Algorithms, Neural ... - butest
The document discusses various machine learning concepts including concept learning, decision trees, genetic algorithms, and neural networks. It provides details on each concept, such as how concept learning uses positive and negative examples to learn concepts, how decision trees use nodes and branches to classify data, and how genetic algorithms and neural networks are modeled after biological processes. It also gives examples of applications for each concept, such as using decision trees for classification and neural networks for tasks like handwriting recognition where explicit rules are difficult to define.
Probability density estimation using Product of Conditional Experts - Chirag Gupta
This document discusses probability density estimation using a product of conditional experts model. It summarizes that density estimation constructs a probability distribution function from observed data to understand the underlying pattern. A product of conditional experts model is proposed, where simple classification models like logistic regression are used as experts to estimate the conditional probability. The experts are combined by multiplying their probabilities. The model is trained using gradient ascent to maximize the log probability. When evaluated on artificial and real datasets, the product of conditional experts model is shown to learn distributions close to the true distributions and generalize better than linear and non-linear baseline models. The document also explores applying the model to outlier detection.
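The combination rule at the heart of a product of experts can be shown on a small discrete domain: multiply the experts' probabilities pointwise and renormalize. The two "experts" below are fixed toy distributions, not trained logistic-regression models as in the document.

```python
import numpy as np

domain = np.arange(4)
expert1 = np.array([0.1, 0.4, 0.4, 0.1])    # toy expert distribution
expert2 = np.array([0.25, 0.25, 0.25, 0.25])  # a second, uniform expert

# Product of experts: multiply probabilities, then renormalize so the
# result is again a probability distribution over the domain.
product = expert1 * expert2
density = product / product.sum()
print(density)
```

In the actual model, each expert's probability would come from a trained classifier, and the (intractable) normalizer is why training maximizes the log probability by gradient ascent.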
The document provides an overview of key concepts related to estimation in statistics, including:
- Estimation involves using sample data to estimate unknown population parameters. Common estimators include the sample mean, proportion, and standard deviation.
- There are two main types of estimates - point estimates and interval estimates. Point estimates are single values while interval estimates specify a range.
- The process of estimation involves identifying the parameter, selecting a random sample, choosing an estimator, and calculating the estimate.
- Estimates can differ from the true population value due to sampling error and non-sampling error. Bias occurs when the expected value of the estimate differs from the true parameter value.
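A minimal sketch of a point estimate (the sample mean) and a normal-approximation 95% interval estimate for the population mean, with made-up data:

```python
import math

sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.9, 5.1]
n = len(sample)
mean = sum(sample) / n                                # point estimate
var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
se = math.sqrt(var / n)                               # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)             # interval estimate
print(mean, ci)
```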
SMOTE and K-Fold Cross Validation-Presentation.pptx - HaritikaChhatwal1
SMOTE is a technique used to handle class imbalance problems in data. It involves over-sampling the minority class by synthesizing new minority class examples and under-sampling the majority class. This helps improve recall, or the detection of truly positive instances from the minority class, which is often prioritized over precision in class imbalance situations. K-fold cross-validation is a resampling method used to evaluate machine learning models on limited data. It involves splitting the dataset into k groups, using each group as a test set while the remaining form the training set, and averaging the results.
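The core SMOTE idea, synthesizing a new minority example by interpolating between a minority point and a neighbour, can be sketched as follows. This is a simplification (one fixed neighbour, no k-NN search), not the full published algorithm.

```python
import random

def smote_point(a, b, rng):
    """Synthetic point on the segment between minority examples a and b."""
    gap = rng.random()  # random position between the two points
    return [a_i + gap * (b_i - a_i) for a_i, b_i in zip(a, b)]

rng = random.Random(42)
minority = [[1.0, 1.0], [2.0, 2.0]]
synthetic = smote_point(minority[0], minority[1], rng)
print(synthetic)  # lies between the two minority points
```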
Statistical Learning and Model Selection (1).pptx - rajalakshmi5921
This document discusses statistical learning and model selection. It introduces statistical learning problems, statistical models, the need for statistical modeling, and issues around evaluating models. Key points include: statistical learning involves using data to build a predictive model; a good model balances bias and variance to minimize prediction error; cross-validation is described as the ideal procedure for evaluating models without overfitting to the test data.
This document discusses processing and analyzing data. It defines processing as editing, coding, classifying, and tabulating raw data. Analysis is categorized as descriptive or inferential. Descriptive analysis studies distributions through measures like mean, median and correlation, while inferential analysis determines relationships through regression and hypothesis testing. Multivariate analysis simultaneously analyzes more than two variables using techniques like multiple regression, discriminant analysis, and ANOVA. Proper data analysis requires understanding concepts like sampling, standard error, and estimation to make valid statistical inferences.
Explore the latest techniques and technologies used in classifying fetal health, from traditional methods to cutting-edge AI approaches. Understand the importance of accurate classification for prenatal care and fetal well-being. Join us to delve into this critical aspect of healthcare. Visit https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/ for more data science insights.
This document provides an overview of sampling techniques and sampling design. It discusses the significance of sampling and defines key terms like population, sample, sampling frame, representative sample, and sampling bias. It also describes different probability sampling techniques like simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, and multi-stage sampling. Finally, it briefly covers non-probability sampling techniques and factors to consider in sample size and design.
This document discusses sampling distributions and related statistical concepts. It defines descriptive and inferential statistics, and explains that inferential statistics uses samples to draw conclusions about populations. Key concepts covered include sampling, probability distributions, sampling distributions, and the central limit theorem. The sampling distribution of the sample mean is examined in depth. For a sample mean, the expected value is equal to the population mean, while the standard error depends on factors like the population standard deviation and sample size. Examples are provided to illustrate these statistical properties.
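The standard-error claim is easy to check numerically: for samples of size n from a population with standard deviation sigma, the sample means spread out with standard error sigma/sqrt(n). The population parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
pop_sd, n, reps = 2.0, 25, 20000

# Draw many samples of size n and compute each sample's mean.
means = rng.normal(loc=10.0, scale=pop_sd, size=(reps, n)).mean(axis=1)

empirical_se = means.std()
theoretical_se = pop_sd / np.sqrt(n)   # sigma / sqrt(n) = 0.4
print(empirical_se, theoretical_se)
```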
Ensemble Learning is a technique that creates multiple models and then combines them to produce improved results.
Ensemble learning usually produces more accurate solutions than a single model would.
Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. It is primarily used to improve performance on tasks such as classification, prediction, and function approximation.
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
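A minimal sketch of one common combination rule, majority voting; the three "models" here are hard-coded prediction lists rather than trained learners.

```python
from collections import Counter

model_preds = [
    [1, 0, 1, 1],   # predictions from model 1
    [1, 1, 0, 1],   # model 2
    [0, 0, 1, 1],   # model 3
]

# For each instance, take the most common label across the models.
ensemble = [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_preds)]
print(ensemble)
```

For regression the analogous rule is averaging the models' numeric predictions, as in bagging.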
Modeling selection pressure in XCS for proportionate and tournament selection - kknsastry
In this paper, we derive models of the selection pressure in XCS for proportionate (roulette wheel) selection and tournament selection. We show that these models can explain the empirical results previously presented in the literature. We validate the models on simple problems, showing that (i) when the model assumptions hold, the theory perfectly matches the empirical evidence; and (ii) when the model assumptions do not hold, the theory can still provide qualitative explanations of the experimental results.
This document provides an overview of machine learning methods, including supervised and unsupervised learning. It discusses commonly used machine learning algorithms like support vector machines (SVM), hidden Markov models, decision trees, random forests, Bayesian networks, and neural networks. It also covers datasets, assessment metrics, and caveats to consider when using machine learning.
Ensemble learning methods were very successful in the Netflix Prize competition to improve movie recommendations. These methods combine the predictions from multiple models to obtain better accuracy than single models. Popular ensemble techniques included bagging, boosting, and random forests. The winning teams in the Netflix Prize all used ensemble methods that blended the predictions of dozens or hundreds of individual models.
This document defines key terms and concepts related to probability distributions, including discrete and continuous random variables, and the mean, variance, and standard deviation of probability distributions. It also describes the characteristics and computations for the binomial, hypergeometric, and Poisson probability distributions. Examples are provided to illustrate how to calculate probabilities using these three specific probability distributions.
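As a worked example for the binomial case, the probability mass function P(X = k) = C(n, k) p^k (1-p)^(n-k), together with the mean np and variance np(1-p):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
print(binom_pmf(5, n, p))       # P(exactly 5 successes in 10 fair trials)
print(n * p, n * p * (1 - p))   # mean and variance of the distribution
```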
Diversity mechanisms for evolutionary populations in Search-Based Software En... - Annibale Panichella
This document discusses mechanisms for maintaining diversity in evolutionary algorithms. It begins by explaining the importance of balancing exploration and exploitation. Several techniques for preserving diversity are then presented, including modifying genetic operators, changing the objective function, and applying statistical methods. Empirical evaluations demonstrate how diversity mechanisms can improve performance in search-based software engineering problems like test data generation and test suite optimization, which often suffer from premature convergence and getting stuck in local optima due to loss of diversity. Parameter tuning techniques like adjusting the mutation rate and niching methods like fitness sharing are also described as ways to explicitly promote diversity.
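Fitness sharing, mentioned above as a niching method, divides an individual's fitness by a niche count so that crowded regions of the search space are penalized. The sharing function and parameters below are one common textbook form, not this document's exact setup.

```python
def sharing(distance, sigma=1.0, alpha=1.0):
    """Sharing function: 1 at distance 0, falling to 0 at distance sigma."""
    return 1 - (distance / sigma) ** alpha if distance < sigma else 0.0

def shared_fitness(fitness, position, population, sigma=1.0):
    # Niche count: sum of sharing values over the whole population
    # (including the individual itself, so the count is >= 1).
    niche = sum(sharing(abs(position - other), sigma) for other in population)
    return fitness / niche

population = [0.0, 0.1, 0.2, 5.0]   # three crowded points, one isolated
crowded = shared_fitness(1.0, 0.0, population)
isolated = shared_fitness(1.0, 5.0, population)
print(crowded, isolated)  # the isolated individual keeps more fitness
```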
This document provides a summary of a 4-part training program on using PASW Statistics 17 (SPSS 17) software to perform descriptive statistics, tests of significance, regression analysis, and chi-square/ANOVA. The agenda covers topics like frequency analysis, correlations, t-tests, ANOVA, importing/exporting data, and more. The goal is to help users answer research questions and test hypotheses using techniques in PASW Statistics.
This document summarizes ensemble classification methods including bagging, boosting, and random forests. It discusses discriminative vs generative models and reviews literature on various machine learning algorithms. It provides details on bagging, boosting, random forests algorithms and compares their pros and cons. It discusses empirical comparisons of algorithm performance on different datasets and problems.
The document discusses key concepts in sampling techniques and survey methods: defining a population and sample; different types of sampling designs (e.g. simple random sampling); the importance of a sampling frame; and total survey design, which considers all aspects of designing and implementing a survey, from objectives to analysis. It concludes with an example of designing a sampling plan to estimate average farm acreage and corn acreage for a population of 4 farms.
Similar to Euro 2013 barrow crone - slideshare (20)
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of these features offer convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
2. Outline
1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work
Cross validation aggregation for forecasting Motivation 1
3. Motivation
• Scenario: the statistician constructs a model and wishes to estimate the error rate of this model when used to predict future values.
5.–11. Motivation: bootstrapping vs. cross-validation

Bootstrapping (Efron, 1979)
– Goal: estimating generalisation error
– Procedure: random sampling with replacement from a single learning set (bootstrap samples); the validation set is the same as the original learning set
– Properties: low variance, but downward biased (Efron and Tibshirani, 1997)
– Forecast aggregation: Bagging (Breiman, 1996) aggregates the outputs of models trained on bootstrap samples

Cross-validation (Stone, 1974)
– Goal: estimating generalisation error
– Procedure: splits the data into mutually exclusive subsets, using one subset to train each model and the remaining part as a validation sample (Arlot & Celisse, 2010)
– Properties: generalisation error estimate is nearly unbiased, but can be highly variable (Efron and Tibshirani, 1997)

1996 – Breiman introduces bootstrap aggregation (Bagging).

Bagging for time series forecasting:
• Forecasting with many predictors (Watson 2005)
• Macro-economic time series, e.g. consumer price inflation (Inoue & Kilian 2008)
• Volatility prediction (Hillebrand & M. C. Medeiros 2010)
• Small datasets – few observations (Langella 2010)
• With other approaches, e.g. feature selection – PCA (Lin and Zhu 2007)

[Figure: citation results for publications on bagging for time series – (a) published items in each year, (b) citations in each year]

Research gap: in contrast to bootstrapping, cross-validation has not been used for forecast aggregation.

Research contribution: we propose to combine the benefits of cross-validation and forecast aggregation – Crogging.
12.–17. Motivation: The Bagging algorithm
• Inputs: learning set S = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
• Select the number of bootstraps K
• For k = 1 to K:
  – Generate a bootstrap sample S_k from S (using your favourite bootstrap method)
  – Using training set S_k, estimate a model m̂_k such that m̂_k(x_i) ≈ y_i
• Combine the K models to obtain: M̂(x) = (1/K) Σ_{k=1..K} m̂_k(x)
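The steps above can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation; the `fit_linear` base learner is a toy stand-in for any model):

```python
import random

def bagging(S, fit, K=10, seed=0):
    """Bagging (Breiman, 1996): fit K models on bootstrap samples of the
    learning set S and return an aggregate that averages their outputs."""
    rng = random.Random(seed)
    N = len(S)
    models = []
    for _ in range(K):
        # Bootstrap sample S_k: draw N pairs (x, y) with replacement from S
        S_k = [S[rng.randrange(N)] for _ in range(N)]
        models.append(fit(S_k))  # m_k estimated on S_k
    # Combined predictor: M(x) = (1/K) * sum_k m_k(x)
    return lambda x: sum(m(x) for m in models) / K

def fit_linear(sample):
    # Toy base learner (assumption): least-squares slope through the origin
    num = sum(x * y for x, y in sample)
    den = sum(x * x for x, _ in sample) or 1.0
    w = num / den
    return lambda x: w * x

S = [(float(i), 2.0 * float(i)) for i in range(1, 21)]  # noiseless y = 2x
M = bagging(S, fit_linear, K=25)
print(round(M(10.0), 6))  # 20.0 for this noiseless example
```

With noisy data and an unstable base learner, each bootstrap fit would differ and the averaging is what reduces variance.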
18. Outline
2. Cross-validation and model selection
19. Cross-validation: Background
• Cross-validation is a widely used strategy for:
  – Estimating the predictive accuracy of a model
  – Performing model selection, e.g.:
    • Choosing among variables in a regression, or the degrees of freedom of a nonparametric model (selection for identification)
    • Parameter estimation and tuning (selection for estimation)

20. Cross-validation: Background
• Main features:
  – Main idea: test the model on data not used in estimation
  – Split the data once or several times
  – Part of the data is used for training each model (the training sample), and the remaining part is used for estimating the prediction error of the model (the validation sample)
22.–27. Cross-validation: How it works
• K-fold cross-validation: the data is divided into K samples (each of one or more observations): Sample 1, Sample 2, ..., Sample K-1, Sample K
• In each repetition, one sample is held out for validation and the remaining samples are used for estimation; this is repeated K times, so that each sample serves exactly once as the validation set.
28.–32. Cross-validation strategies
• k-fold cross-validation
  – Divides the data into k non-overlapping, mutually exclusive sub-samples of approximately equal size
  – k = 2 gives 2-fold cross-validation; k = 10 gives 10-fold cross-validation
  – k = N gives leave-one-out cross-validation (LOOCV)
• Monte Carlo cross-validation
  – Randomly splits the data into two sub-samples (training and validation) multiple times, each time drawing randomly without replacement
• Hold-out method
  – A single split into two data sub-samples
Cross validation aggregation for forecasting Cross-validation aggregation 7
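The strategies above can be contrasted with a short index-generation sketch (illustrative only, not from the slides; hold-out is simply the one-split special case and LOOCV is k-fold with k = n):

```python
import random

def kfold_splits(n, k):
    """k-fold CV: k mutually exclusive validation blocks of ~equal size."""
    idx = list(range(n))
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment
    for f in folds:
        held_out = set(f)
        train = [i for i in idx if i not in held_out]
        yield train, f

def monte_carlo_splits(n, n_splits, val_frac=0.2, seed=0):
    """Monte Carlo CV: repeated random train/validation splits, each drawn
    without replacement (validation sets may overlap across repetitions)."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(n * val_frac)
        yield idx[cut:], idx[:cut]

splits = list(kfold_splits(10, 5))
print(len(splits))  # 5 folds
print(sorted(i for _, val in splits for i in val))  # each index validated once
```

Note the key difference: in k-fold every observation appears in exactly one validation set, while Monte Carlo splits give no such guarantee.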
33.–36. Cross validation: model selection
• Goal: select the model having the smallest generalisation error
• Compute an approximation of the generalisation error, defined as:
  E_gen(m) = lim_{N→∞} (1/N) Σ_{i=1..N} (y_i − m̂(x_i))²
• Estimate model m on the training set, and calculate the error on the validation set for sample k:
  Ê_k(m) = (K/N) Σ_{i=1..N/K} (y_i^val − m̂(x_i^val))²
• Estimate the generalisation error after K repetitions as the average error across all repetitions:
  Ê_gen(m) = (1/K) Σ_{k=1..K} Ê_k(m)
Cross validation aggregation for forecasting Cross-validation 8
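The estimator Ê_gen can be computed directly from those definitions (a sketch; the squared-error loss is from the slides, while the `fit_mean` base model is an illustrative assumption):

```python
def cv_generalisation_error(data, fit, K):
    """K-fold estimate of the generalisation error:
    E_hat = (1/K) * sum_k E_k, where E_k is the mean squared error of the
    model trained on the other K-1 folds, evaluated on fold k."""
    folds = [data[i::K] for i in range(K)]  # mutually exclusive folds
    fold_errors = []
    for k in range(K):
        val = folds[k]
        train = [p for j, f in enumerate(folds) if j != k for p in f]
        m = fit(train)  # model estimated on the K-1 training folds
        # E_k(m): squared error averaged over the N/K validation points
        e_k = sum((y - m(x)) ** 2 for x, y in val) / len(val)
        fold_errors.append(e_k)
    return sum(fold_errors) / K

def fit_mean(sample):
    # Toy base model (assumption): predicts the training-set mean of y
    mu = sum(y for _, y in sample) / len(sample)
    return lambda x: mu

data = [(i, 1.0) for i in range(12)]  # constant target, error should be 0
print(cv_generalisation_error(data, fit_mean, K=4))  # 0.0
```

In model selection, this number is compared across candidate models and only the winner is refitted on all the data.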
41. Cross-validation aggregation: Crogging
• In model selection, the model obtained is the one built on all the data (no data reserved for validation)
  – However, predictive accuracy is judged on models built on different parts of the data
  – These supplementary models are thrown away after they have served their purpose

42.–47. Cross-validation aggregation: Crogging
• The proposed approach:
  – We save the predictions made by the K estimated models
  – This gives us a prediction for every observation in the training sample, derived from a model that was built when that observation was in the validation sample
  – We then average across the predictions from the K models to produce a final prediction:
    M̂(x_t) = (1/K) Σ_{k=1..K} m̂_k(x_t)
  – In the case of neural networks, we also use the validation samples for early stopping of training
  – We average across multiple initialisations together with cross-validation aggregation (to reduce variance)
Cross validation aggregation for forecasting Cross-validation aggregation 10
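Crogging differs from Bagging only in how the K training sets are generated (CV folds instead of bootstrap samples) and in keeping, rather than discarding, the K fitted models. A minimal sketch (pure-Python illustration; the early stopping and multiple initialisations mentioned above are omitted, and `fit_linear` is a toy base learner):

```python
def crogging(data, fit, K):
    """Cross-validation aggregation: estimate one model per CV fold
    (training on the other K-1 folds) and average the K models' forecasts."""
    folds = [data[i::K] for i in range(K)]
    models = []
    for k in range(K):
        train = [p for j, f in enumerate(folds) if j != k for p in f]
        models.append(fit(train))  # kept, not thrown away as in model selection
    # Final forecast: M(x_t) = (1/K) * sum_k m_k(x_t)
    return lambda x: sum(m(x) for m in models) / K

def fit_linear(sample):
    # Toy base learner (assumption): least-squares slope through the origin
    num = sum(x * y for x, y in sample)
    den = sum(x * x for x, _ in sample) or 1.0
    return lambda x, w=num / den: w * x

series = [(float(t), 3.0 * float(t)) for t in range(1, 25)]
M = crogging(series, fit_linear, K=4)
print(round(M(30.0), 6))  # 90.0 for this noiseless y = 3x example
```

Unlike Bagging, every observation is used for validation exactly once, so the same K models yield both the aggregate forecast and an almost-unbiased error estimate.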
49. Evaluation: Design and implementation
• Time series data
• NN3 dataset: 111 time series from the NN3 competition (Crone, Hibon, and Nikolopoulos 2011)

Summary description of the NN3 competition time series dataset (Complete Dataset / Reduced Dataset):

                Short     Long      Normal   Difficult   SUM
Non-Seasonal    25 (NS)   25 (NL)   4 (NN)   3 (ND)      57
Seasonal        25 (SS)   25 (SL)   4 (SN)   –           54
SUM             50        50        8        3           111
50. Evaluation: Design and implementation
[Figure: plot of 10 time series from the NN3 dataset (panels NN3_101 to NN3_108 visible)]
51. Evaluation: Design and implementation
• The following experimental setup is used:
  – Forecast horizon: 12 months
  – Holdout period: 18 months
  – Error measures: SMAPE and MASE
  – Rolling-origin evaluation (Tashman, 2000)

52. Evaluation: Design and implementation
• Neural network specification:
  – A univariate Multilayer Perceptron (MLP) with lags from Yt up to Yt-13
  – Each MLP network contains a single hidden layer with two hidden nodes, and a single output node with a linear identity function; the hyperbolic tangent transfer function is used
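The two error measures can be stated precisely; the sketch below uses their common textbook forms (SMAPE with the symmetric denominator, MASE scaled by the in-sample one-step naive MAE), which may differ in detail from the exact variants used in the study:

```python
def smape(actual, forecast):
    """Symmetric MAPE (%): mean of |y - f| / ((|y| + |f|) / 2)."""
    terms = [abs(y - f) / ((abs(y) + abs(f)) / 2.0)
             for y, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

def mase(actual, forecast, insample):
    """MASE: out-of-sample forecast MAE scaled by the in-sample MAE
    of the one-step naive (random-walk) forecast."""
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    naive = sum(abs(insample[t] - insample[t - 1])
                for t in range(1, len(insample))) / (len(insample) - 1)
    return mae / naive

insample = [10.0, 12.0, 11.0, 13.0]            # naive MAE = (2+1+2)/3
actual, forecast = [14.0, 15.0], [13.0, 16.0]  # forecast MAE = 1.0
print(round(mase(actual, forecast, insample), 3))  # 0.6
```

A MASE below 1 means the model beats the in-sample naive forecast on average, which makes the values in the tables below comparable across series of different scales.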
53. Evaluation: Findings
• Across all time series:
  – On the validation set, Monte Carlo cross-validation is always best
  – All Crogging variants outperform the benchmark Bagging algorithm and the hold-out method (NN model averaging)

MASE and SMAPE averaged over all time series on the training, validation and test datasets:

MASE
Method     Train   Validation   Test
BESTMLP    1.25    0.96         1.49
HOLDOUT    0.64    0.75         1.20
BAG        0.76    0.70         1.21
MONTECV    0.76    0.41         1.16
10FOLDCV   0.69    0.45         1.07
2FOLDCV    0.73    0.60         1.15

SMAPE
Method     Train   Validation   Test
BESTMLP    12.36   11.10        17.89
HOLDOUT    11.78   12.57        16.08
BAG        12.95   13.17        16.32
MONTECV    13.81   8.29         15.35
10FOLDCV   12.65   8.94         15.52
2FOLDCV    13.68   11.19        15.29
54. Evaluation: Findings
• Across all time series
[Figure: boxplots of the MASE and SMAPE averaged over all time series for the different methods. The line of reference represents the median value of the distributions.]
55. Evaluation: Findings
• Data conditions:
  – Long time series: 10-fold cross-validation has the smallest error for medium to long horizons, and over forecast lead times 1-18

SMAPE on test set averaged over long time series, by forecast horizon:
Length   Method     1-3     4-12    13-18   1-18
Long     BESTMLP    10.79   16.59   20.02   16.77
         HOLDOUT    9.34    14.96   16.20   14.43
         BAG        9.74    15.46   16.38   14.81
         MONTECV    10.86   15.16   15.43   14.54
         10FOLDCV   10.39   14.04   14.82   13.69
         2FOLDCV    9.03    14.64   15.69   14.06
56. Evaluation: Findings
• Data conditions:
  – Short time series: 2-fold cross-validation and Monte Carlo cross-validation outperform 10-fold cross-validation for all forecast horizons

SMAPE on test set averaged over short time series, by forecast horizon:
Length   Method     1-3     4-12    13-18   1-18
Short    BESTMLP    16.83   17.03   20.66   18.20
         HOLDOUT    17.59   17.04   20.12   18.16
         BAG        17.20   17.27   20.96   18.49
         MONTECV    15.47   14.71   19.05   16.28
         10FOLDCV   16.00   15.91   20.25   17.37
         2FOLDCV    15.86   14.51   18.95   16.21
57. Evaluation: Findings
• Data conditions:
[Figure: boxplots of the SMAPE averaged across long (left) and short (right) time series]
65.–66. Conclusions and future work
• Crogging is not a forecasting method in itself, but a general method for improving the accuracy of a forecast model

67.–70. Conclusions and future work
• Conclusions:
  – Cross-validation aggregation outperforms model selection, Bagging, and the current approaches to model averaging, which use a single hold-out (validation) sample
  – It is especially effective when the amount of data available for training the model is limited, as shown for short time series
  – Improvements in forecast accuracy increase with the forecast horizon
  – It offers promising results on the NN3 competition

71. Conclusions and future work
• Future work:
  – Perform bias-variance decomposition and analysis
  – Consider base model types other than neural networks
  – Evaluate forecast accuracy on a larger set of time series: the M3 Competition data (3003 time series, an established benchmark)
72. Devon K. Barrow
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel.: +44 (0) 7960271368
Email: d.barrow@lancaster.ac.uk