Euro area GDP forecasting using large survey datasets: A random forest approach. Olivier Biau and Angela D’Elia, Directorate General for Economic and Financial Affairs, European Commission. Groupe Travail prévision, Paris, 12 May 2010. Views expressed represent exclusively the positions of the authors and do not necessarily correspond to those of the European Commission.
Introduction
Outline
DG ECFIN: mission statement
Data
From binary trees to Random Forest
How to grow a tree with CART?
What is the tree predictor?
RF algorithm
Benchmark
Results (1)
Results (2)
Conclusion

Accelerate your Kubernetes clusters with Varnish Caching
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 

Presentation by Olivier Biau: random forests and short-term economic analysis

  • 1. Euro area GDP forecasting using large survey datasets A random forest approach Olivier Biau – Angela D’Elia Directorate General for Economic and Financial Affairs European Commission Groupe Travail prévision Paris, 12 May 2010 Views expressed represent exclusively the positions of the authors and do not necessarily correspond to those of the European Commission

Editor's Notes

  1. In recent years there has been increasing interest in forecasting methods that use large data sets. A huge quantity of information is available in the economic arena that might be useful for forecasting, but standard econometric techniques are not well suited to extracting it in a useful form. This is not only an issue of academic interest: central bankers and policy makers are interested in summarising large data sets for forecasting purposes (Eklund and Kapetanios give a wide review of the recent literature on this issue). Since the reference paper of Stock and Watson (2002), factor methods have been at the forefront of developments in forecasting with large data sets. The methodology for extracting a small number of common factors has become more and more sophisticated, moving from principal components to dynamic factor models (for example in Forni et al.), or dealing with data sets that are unbalanced at the end of the sample (the 'jagged edge' feature of the data), as in the recent contribution from Giannone on real-time nowcasting. Finally, factor analysis combined with linear models (bridge models) has been among the main tools for summarising large data sets for forecasting purposes.
  2. In our study, we would like to present a new statistical approach to forecasting macroeconomic aggregates, based on the random forest (RF) technique proposed by Breiman in the 2000s. This technique is widely used in biostatistics, is becoming more and more popular, and appears to be very powerful in many different applications (classification problems as well as regression problems). But it is largely unknown in economics. To our knowledge, the only application in the economic field is the paper I wrote with my co-authors (Gérard Biau and Laurent Rouvière) when I was working for the French INSEE. In that paper, we used the micro data from the French Industry Business Tendency Survey to track manufacturing output.
  3. If RF is so widely used in medical research, it is because it enjoys good prediction properties, is robust to noise and can handle a very large number of input variables (which is very often the case in medicine, where you can collect a lot of variables on your patients but the number of observations is very often limited). RF is considered to be one of the most accurate general-purpose learning techniques available, independent of any functional or distributional assumptions. RF is considered to be very powerful… but it is not clearly elucidated from a mathematical point of view. Although the mechanism appears simple, it involves many different driving forces, which makes it difficult to analyse. In fact, its mathematical properties remain largely unknown to date and, up to now, most theoretical studies have concentrated on isolated parts or stylised versions of the algorithm. Lin and Jeon established a connection with adaptive nearest neighbours. The most recent paper from Gérard Biau is a step forward, as it gives a consistency theorem for the RF algorithm.
  4. After this introduction, here is the outline of the presentation. I will first present the dataset. To explain the algorithm, I will take a basic example. Then I will present the benchmarks, that is, the competitors of the random forest forecasts. Finally, we will see our results and the perspectives of this work.
  6. Well, the data set used in this paper is based on the Joint Harmonised EU Programme of Business and Consumer Surveys (BCS). It covers 5 sectors. A key aspect of the business surveys is that most questions ask for qualitative responses, reflecting the sentiment or confidence (optimistic, pessimistic or neutral) of managers and consumers. The Programme covers all 27 Member States, Croatia, the FYROM and Turkey. More than 125 000 firms and over 40 000 consumers are surveyed every month.
  7. To be more precise, the dataset mainly consists of the euro area balances of opinion (% positive minus % negative). The time series used in the analysis are those available at the end of the third month of each quarter: the level series (monthly $S_t$ or quarterly $S_q$) and the difference series ($S_t - S_{t-1}$, $S_t - S_{t-2}$, $S_t - S_{t-3}$ for monthly questions, $S_q - S_{q-1}$ for quarterly questions). The dataset is composed of $p = 172$ 'soft' series $X_i$. The only 'hard' variable is the euro area GDP quarter-on-quarter growth series $Y_i$. Finally, we have what is called in this literature a 'learning set' $L = \{(X_1, Y_1), \dots, (X_n, Y_n)\}$ with $n = 57$ (from the 3rd quarter of 1995 to the 3rd quarter of 2009).
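
As a rough illustration of how the level and difference series described above could be assembled, the sketch below takes one monthly balance series and, for each quarter, reads off the values available in the third month. The function name `quarter_end_features`, the assumed data layout and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
def quarter_end_features(monthly):
    """For each third month of a quarter, return the level S_t and the
    differences S_t - S_{t-1}, S_t - S_{t-2}, S_t - S_{t-3}.

    `monthly` is a list of balances of opinion whose first element is
    assumed to be the first month of a quarter (hypothetical layout).
    """
    rows = []
    # Third months of each quarter sit at indices 2, 5, 8, ...
    for t in range(2, len(monthly), 3):
        if t < 3:
            continue  # not enough history for the 3-month difference
        s = monthly[t]
        rows.append({
            "level": s,
            "d1": s - monthly[t - 1],
            "d2": s - monthly[t - 2],
            "d3": s - monthly[t - 3],
        })
    return rows


# Toy balances of opinion (% positive minus % negative), made up for illustration.
balances = [-5, -3, -2, 0, 1, 4, 3, 2, -1]
features = quarter_end_features(balances)
```

Each monthly question thus contributes four columns to the learning set (one level and three differences); a quarterly question would contribute two.
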
  8. I will now speak about RF. But before dealing with the forest, I have to explain what a tree is and, above all, how it is grown. What is a tree? It is a partition of the space into regions. The most important kind is the binary tree, in which each node has just 2 children. Here we have an example: we first split the space into two regions. One or both of these regions are split into two more regions, and this process is continued until some stopping rule is applied. The questions are how to grow a tree (the answer is, for example, with CART) and what the tree predictor (or tree regressor) is. In my next slide, I will take a basic example…
  9. Imagine I want to predict the income of someone who enters this room, using the information I have collected about all of you. So I know a lot of characteristics Xi about you, such as your gender, height, age and position (I mean, do you have responsibilities or not, are you head of unit, etc.), and of course I have collected your income Y. We have to find the first node of the tree, that is, the first splitting variable Xj and the first split point s that discriminate the most (by solving this expression, if we want to model the response by, for example, a constant in each region). If we take the variable 'height', N1 represents the short ones (shorter than me) and N2 the tall ones. C1 is the average income of the short ones and C2 the average income of the tall ones. Do you think this would be the best partition in terms of minimum sum of squares? Undoubtedly the answer is NO. The first node will be the position (the chiefs / the others…) or the age. So, having found the first split by scanning all the covariates, CART repeats the process within each of the resulting regions until, for example, the terminal nodes contain a user-specified number of individuals.
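
The expression on the slide is not reproduced in these notes. In the standard CART formulation for regression (an assumption about what the slide showed), the first split solves $\min_{j,s}\big[\sum_{i:\,X_i \in N_1(j,s)} (Y_i - C_1)^2 + \sum_{i:\,X_i \in N_2(j,s)} (Y_i - C_2)^2\big]$, where $C_1$ and $C_2$ are the average responses in the two regions. A minimal Python sketch of that exhaustive search follows; the helper names `sse` and `best_split` and the data are made up for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Scan every covariate j and every candidate split point s, and return
    (j, s, total_sse) minimising the within-region sum of squares."""
    best = None
    for j in range(len(X[0])):
        # Candidate split points: observed values of covariate j
        # (the largest one is skipped, since it would leave region N2 empty).
        for s in sorted(set(row[j] for row in X))[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


# Made-up data: columns are (height in cm, head of unit: 1 = yes, 0 = no).
X = [[160, 0], [185, 0], [170, 1], [178, 1], [165, 0], [190, 1]]
y = [30, 32, 60, 62, 31, 61]  # incomes, illustrative units

j, s, total = best_split(X, y)
```

On this toy data the search picks the position variable (column 1) rather than height, which mirrors the point made in the example: height barely separates incomes, while position does.
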
  10. Now the tree is grown… Given the characteristics X of a new person entering the room (gender, height, position, …), I drop him down the tree. He falls into a terminal node N(X), and I predict his income by averaging the observed Yi over the observations i 'falling' in that node. For example, if the new person entering this room is a young man, 150 cm tall, French and not a head of unit (HoU), … I will predict his income by averaging those of the young, short French men who are not chiefs.
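
The prediction step can be sketched in a few lines. The tree below is hand-built and the incomes are invented; the nested-dict layout and the name `predict` are illustrative assumptions rather than anything taken from the slides.

```python
# A hand-built binary tree stored as nested dicts (illustrative layout).
# Internal nodes hold a splitting variable and a split point; terminal nodes
# hold the observed responses Y_i of the learning-set individuals in them.
tree = {
    "var": "head_of_unit", "split": 0,          # <= 0 goes left, > 0 right
    "left": {
        "var": "age", "split": 35,
        "left": {"responses": [28, 30, 32]},     # young, not chiefs
        "right": {"responses": [40, 44]},        # older, not chiefs
    },
    "right": {"responses": [60, 62, 61]},        # chiefs
}


def predict(node, x):
    """Drop x down the tree and average the responses in its terminal node."""
    while "responses" not in node:
        node = node["left"] if x[node["var"]] <= node["split"] else node["right"]
    return sum(node["responses"]) / len(node["responses"])


newcomer = {"head_of_unit": 0, "age": 29}
income = predict(tree, newcomer)  # average of the terminal node [28, 30, 32]
```
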
  11. Breiman has demonstrated that substantial gains in prediction accuracy can be achieved by using a set of simpler trees. Here we have the RF algorithm as described in the book by Hastie. More precisely, a random forest is a collection of K tree predictors, where each tree is constructed from a bootstrap sample of the learning set L. However, instead of determining the optimal split at a given node by evaluating all possible covariates, only a subset of covariates drawn at random is used. Finally, we aggregate the K tree predictors. By doing that, it is possible to approximate a rich class of functions, and it gives an accurate approximation of the conditional mean. For the free parameters K, nodesize and mtry (the number of variables chosen at random from the p available), we used the default values of the randomForest R package: 500, 5 and p/3.
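
To make the three ingredients concrete (bootstrap samples, a random subset of covariates at each node, and averaging over trees), here is a minimal pure-Python sketch. It is not the randomForest R package: the function names are hypothetical, the stopping rule is simplified (a node with at most `nodesize` observations is not split further), and the data are simulated.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X, y, mtry, nodesize, rng):
    """Grow a regression tree; at each node only `mtry` covariates drawn at
    random are scanned for the best split."""
    if len(y) <= nodesize:
        return {"value": mean(y)}
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        for s in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= s]
            right = [i for i, row in enumerate(X) if row[j] > s]
            total = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or total < best[0]:
                best = (total, j, s, left, right)
    if best is None:  # all sampled covariates are constant in this node
        return {"value": mean(y)}
    _, j, s, left, right = best
    return {
        "var": j, "split": s,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left],
                          mtry, nodesize, rng),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right],
                           mtry, nodesize, rng),
    }


def predict_tree(node, x):
    while "value" not in node:
        node = node["left"] if x[node["var"]] <= node["split"] else node["right"]
    return node["value"]


def random_forest(X, y, K=500, nodesize=5, mtry=None, seed=0):
    """Grow K trees, each on a bootstrap sample of the learning set."""
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry or max(1, p // 3)
    forest = []
    for _ in range(K):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap draw
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                mtry, nodesize, rng))
    return forest


def predict_forest(forest, x):
    """Aggregate the trees by averaging their predictions."""
    return mean([predict_tree(tree, x) for tree in forest])


# Toy learning set (simulated): y depends on the first covariate only.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(60)]
y = [2.0 * row[0] for row in X]

forest = random_forest(X, y, K=50)  # fewer trees than the default, for speed
low = predict_forest(forest, [-0.8, 0, 0, 0, 0, 0])
high = predict_forest(forest, [0.8, 0, 0, 0, 0, 0])
```

The defaults in `random_forest` mirror the values quoted above (500 trees, node size 5, one third of the covariates per split); the usage lines only lower K to keep the toy run fast.
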
  12. In addition, RF possesses a number of features that can be used, for example, to deal with the problem of missing values. In this study, we use the variable importance feature to reduce data dimensionality. Here n, the number of observations, is much lower than p, the dimension of the explanatory variables. The variable importance makes it possible to discriminate between informative and non-informative variables. For each variable, the idea is to compare the prediction error with the prediction error obtained when the values of that variable are randomly permuted: large positive values for a variable indicate that this variable is predictive, since noising it up increases the prediction error; zero or negative importance values indicate non-predictive variables. In my previous example, the importance of height would very likely have been negative, while that of age or position would have been positive…
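
The permutation idea can be sketched independently of any particular forest implementation. The version below is a simplification: it measures the error on one fixed data set with an arbitrary prediction function, whereas the R package works tree by tree on out-of-bag observations. The names `mse`, `permutation_importance` and `model`, and the simulated data, are assumptions for illustration.

```python
import random


def mse(predict, X, y):
    return sum((predict(x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, j, rng):
    """Increase in prediction error when covariate j is randomly permuted."""
    baseline = mse(predict, X, y)
    shuffled = [row[j] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
    return mse(predict, X_perm, y) - baseline


# Toy setting (simulated): the response depends on covariate 0 only, and the
# stand-in "fitted model" below uses covariate 0 and ignores covariate 1.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def model(x):
    return 3.0 * x[0]


imp_informative = permutation_importance(model, X, y, 0, rng)
imp_noise = permutation_importance(model, X, y, 1, rng)
```

Permuting the informative covariate raises the error, so its importance is positive; permuting the ignored covariate changes nothing, so its importance is zero. Ranking variables by this quantity is what allows a short list of candidate predictors to be drawn from the 172 series.
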
  13. So our aim is to predict GDP growth for quarter Q, based on the data available at the end of quarter Q. To be clear, based on the information available until the end of June 2010, we want to forecast the 2nd quarter of 2010 (a nowcast; remember that the flash estimate of GDP will be released by Eurostat in mid-August). To assess the quality of our forecasts, we do an out-of-sample analysis over 2004Q1 to 2009Q3. The criterion is the out-of-sample MSE. We started our study with a univariate AR model… which appears to be a poor competitor. So we decided to compare our results with a fair competitor: the quarterly projections of the Eurozone Economic Outlook (EZEO), jointly released by three major European economic institutes: the German IFO, the French INSEE and the Italian ISAE. EZEO is a quarterly publication. Based on the information available at the end of the previous quarter, EZEO forecasts GDP growth for quarter Q. In fact, this publication also provides 2-step-ahead projections (Q+1 and Q+2) for GDP, industrial production (IP), consumption and inflation, and also describes the economic links explaining these forecasts… Here, however, our concern was just to assess how a data-driven model like RF performs relative to competitors for GDP nowcasting.
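
An out-of-sample exercise for the univariate benchmark could look like the sketch below. The notes do not state the AR order or whether the model was re-estimated each quarter, so the AR(1) specification, the recursive re-estimation, the function names and the growth figures are all assumptions made for illustration.

```python
def fit_ar1(series):
    """OLS estimate of y_t = a + b * y_{t-1} + e_t; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def out_of_sample_mse(series, first_test):
    """Recursive exercise: for each t >= first_test, refit on series[:t]
    only, forecast series[t], and average the squared forecast errors."""
    errors = []
    for t in range(first_test, len(series)):
        a, b = fit_ar1(series[:t])
        forecast = a + b * series[t - 1]
        errors.append((series[t] - forecast) ** 2)
    return sum(errors) / len(errors)


# Made-up quarterly growth rates (in %), not actual euro area GDP data.
growth = [0.5, 0.6, 0.4, 0.7, 0.5, 0.3, 0.6, 0.8, 0.4, 0.2, -0.3, -1.0, 0.1]
mse_ar = out_of_sample_mse(growth, first_test=8)
```

The key design point is that each forecast uses only observations dated before the quarter being forecast, so the MSE is a genuine out-of-sample measure and can be compared across competing models on the same quarters.
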
  14. Results: we compare the pure (non-parametric) random forest with our benchmarks. Based on the MSE computed over the whole out-of-sample period, RF outperforms the AR model but not the Eurozone Economic Outlook. You have all the forecasts, quarter after quarter, in the paper, but in this graph I plotted how the MSE develops over time (real-time MSE). The graph highlights the good performance of RF before the crisis and its poor performance during the crisis… in fact, for 2008Q4 and 2009Q1, we made a huge error. But this is not surprising, as the pure RF forecasts are averages of values observed in the learning set, and before the crisis no negative values were present in the learning set! However, it is worth noting that the RF algorithm 'learns' and will be able to predict negative output growth in the future.
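
Why a pure forest cannot forecast below the smallest value it has seen can be written out in one line. The weights $w_i(x)$ and the notation $N_k(x)$ for the terminal node of tree $k$ containing $x$ are introduced here for illustration; they are not taken from the slides.

```latex
\hat{Y}(x) = \frac{1}{K}\sum_{k=1}^{K} \hat{Y}_k(x),
\qquad
\hat{Y}_k(x) = \frac{1}{|N_k(x)|}\sum_{i \in N_k(x)} Y_i
\;\;\Longrightarrow\;\;
\hat{Y}(x) = \sum_{i=1}^{n} w_i(x)\, Y_i,
\quad w_i(x) \ge 0,\quad \sum_{i=1}^{n} w_i(x) = 1,
\qquad\text{hence}\qquad
\min_{1 \le i \le n} Y_i \;\le\; \hat{Y}(x) \;\le\; \max_{1 \le i \le n} Y_i .
```

Each tree averages learning-set responses and the forest averages the trees, so the forecast is a weighted average of the observed $Y_i$. If every $Y_i$ in the learning set is non-negative, the forecast is non-negative too, whatever the survey data say.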
  15. To get around the problem of the absence of negative values in our learning set, we try to use the power of RF to select the 25 most important variables, and then to insert these variables in a bridge model. We use the general-to-specific procedure to choose the best linear model. RF_LINMOD is the model we retained. Remarkably, 3 of the five explanatory variables come from the industry survey (industry being the sector whose variability explains most of the GDP cycles); moreover, these variables are related to order books, which are well known to be among the most factual and reliable soft variables. We also have one question about orders in retail trade and one question from the consumer survey.
  17. Results: we continue the comparison with the RF_LINMOD outputs, using the same criterion. Based on the MSE computed over the whole out-of-sample period, RF_LINMOD performs as well as the Eurozone Economic Outlook, outperforms the pure RF, and compares well with the Eurozone Economic Outlook during the 'crisis'.
  18. For further development, we would like to continue this analysis by adding to our dataset the hard variables that are available at the end of each quarter (for example, the carry-over of industrial production, or first registrations of private cars)… and according to our tests, the carry-over of IP appears to be one of the most important variables! Moreover, this kind of information is certainly used by our colleagues in the Eurozone Economic Outlook… So this work is still ongoing, but promising.