This document is an overview of the syllabus for a social forecasting course. It covers:
- Course requirements: two assignments and one final paper. The main textbook is Practical Time Series Forecasting with R.
- Forecasting means predicting future values of a time series. Time series data are ubiquitous, and forecasts are produced by governments, organizations, private sector firms, and academics.
- Forecasting is difficult because of our innate cognitive biases and the complexity of social systems. Even so, some aspects of the future can be predicted using statistical analysis of past data and ensemble forecasting methods.
2. Trinity College Dublin, The University of Dublin
Syllabus
• 2 assignments
• 1 final paper
• Main book: Shmueli, Galit, and Kenneth C. Lichtendahl Jr. Practical Time Series Forecasting with R: A Hands-on Guide (2nd ed.). Axelrod Schnall Publishers, 2018.
3. Trinity College Dublin, The University of Dublin
“Forecast” = predict the future value of a time series
5. Trinity College Dublin, The University of Dublin
Tell us what the future holds, so we may know that you are gods. (Isaiah 41:23)
Lycurgus Consulting the Pythia (1835/1845), as imagined by Eugène Delacroix (source: Wikipedia)
6. Trinity College Dublin, The University of Dublin
Who generates forecasts?
Governments
NGOs
Corporates
Private sector
Consulting firms
Academia
8. Trinity College Dublin, The University of Dublin
“The horse is here to stay but the automobile is only a novelty, a fad”
1903, the president of Michigan Savings Bank
Stock prices have reached “what looks like a permanently high plateau… I believe the principle of the investment trusts is sound, and the public is justified in participating in them.”
Irving Fisher, October 1929
“I think there is a world market for maybe five computers.”
Thomas Watson, 1943
13. Trinity College Dublin, The University of Dublin
We are poor predictors
We like simple explanations
We don’t correct
We are overconfident
We hate randomness
21. Trinity College Dublin, The University of Dublin
Social Sciences are worse
First order chaotic systems
Second order chaotic systems
Observers observing observers who observe observers
22. Trinity College Dublin, The University of Dublin
Fundamentally unpredictable?
Multiple equilibria
Mixed strategies
23. Trinity College Dublin, The University of Dublin
Irreducible sources of error
- Specification error: cannot include all variables
- Include as much as you can? No!
- Measurement error: some variables are particularly difficult to observe
- Natural phenomena: Indian Ocean tsunami and violence in Aceh
Source: Spagat et al., “Estimating War Deaths: An Arena of Contestation”
24. Trinity College Dublin, The University of Dublin
Much is predictable
Rules
Strategies and equilibria
Structural constraints
Strong autocorrelation in: space, time
25. Trinity College Dublin, The University of Dublin
Non-trivial questions
Boring
Unpredictable
Just right
Civil war in Switzerland in 2022?
Black swans
Rare events
29. Trinity College Dublin, The University of Dublin
Which is easiest to forecast?
• Daily electricity demand in 3 days' time
• Timing of the next appearance of Halley’s comet
• Time of sunrise this day next year
• Google stock price tomorrow
• Google stock price in 6 months' time
• Maximum temperature tomorrow
• Exchange rate of $/€ next week
• Total sales of drugs in Irish pharmacies next month
30. Trinity College Dublin, The University of Dublin
How predictable?
Depends on:
1. how well we understand the factors that contribute to it
2. how much data is available
3. whether the forecasts can affect the thing we are trying to forecast
4. how similar the future is to the past
5. how little natural/unexplainable random variation there is
31. Trinity College Dublin, The University of Dublin
Improving forecasts…
…but social science forecasts are much harder
37. Trinity College Dublin, The University of Dublin
Statistics
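$\text{Nrefugees}_{t} = f(\text{casualties}, \text{unemployment}, \text{day of the week}, \text{error})$
$\text{Nrefugees}_{t} = f(\text{Nrefugees}_{t-1}, \text{Nrefugees}_{t-2}, \text{Nrefugees}_{t-3}, \ldots, \text{error})$
$\text{Nrefugees}_{t} = f(\text{Nrefugees}_{t-1}, \text{casualties}, \text{unemployment}, \ldots, \text{error})$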
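The three specifications above differ only in which inputs are handed to f: external predictors, lagged values of the series itself, or both. A minimal sketch of how such input rows could be assembled, assuming made-up numbers and a hypothetical helper `make_rows` (neither comes from the course material):

```python
def make_rows(target, exog, n_lags):
    """Build (features, y) pairs for y_t = f(y_{t-1}, ..., y_{t-n_lags}, exog_t).

    target : list of series values, one per time period
    exog   : list of tuples of external predictors, one tuple per period
    n_lags : number of lagged target values to use as features
    """
    rows = []
    for t in range(n_lags, len(target)):
        # Most recent lag first: y_{t-1}, y_{t-2}, ...
        lags = [target[t - k] for k in range(1, n_lags + 1)]
        rows.append((lags + list(exog[t]), target[t]))
    return rows


# Hypothetical data, for illustration only
refugees = [120, 135, 150, 160, 158]
casualties = [(3,), (5,), (4,), (6,), (2,)]
no_exog = [()] * len(refugees)

explanatory = make_rows(refugees, casualties, n_lags=0)   # first equation
autoregressive = make_rows(refugees, no_exog, n_lags=2)   # second equation
mixed = make_rows(refugees, casualties, n_lags=1)         # third equation
```

Each extra lag costs one usable observation at the start of the series, which matters when the series is short.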
38. Trinity College Dublin, The University of Dublin
Statistics
Learning Sample
Test Sample
Y increases by b when x increases by 1 (well, sort of)
Predictions in test sample
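The learning/test split and the reading of the slope b can be sketched as follows. This assumes a simple one-variable linear regression and noise-free, made-up data; the function `ols_line` and the split point are illustrative choices, not the course's own code:

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data generated as y = 2 + 3x with no noise
xs = list(range(10))
ys = [2.0 + 3.0 * x for x in xs]

# Keep time order: the earliest periods form the learning sample,
# the latest periods form the test sample.
split = 7
x_learn, y_learn = xs[:split], ys[:split]
x_test, y_test = xs[split:], ys[split:]

a, b = ols_line(x_learn, y_learn)          # b: change in y per unit of x
predictions = [a + b * x for x in x_test]  # predictions in the test sample
test_errors = [y - p for y, p in zip(y_test, predictions)]
```

With real data the test errors would not be zero; judging the model on the held-out periods rather than on the learning sample is what guards against over-fitting.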
39. Trinity College Dublin, The University of Dublin
Machine learning algorithms
Support vector machines
40. Trinity College Dublin, The University of Dublin
Risks of Machine Learning approaches
Over-fitting
Too little data
Don’t improve that much, if at all, over much simpler logit models
41. Trinity College Dublin, The University of Dublin
Forecasts are hard, especially about the future
Aka: how poorly do we do?
42. Trinity College Dublin, The University of Dublin
[Figure: "Commonwealth plans to drift back to surplus show the triumph of experience over hope". Actual and forecast Commonwealth underlying cash balance, per cent of GDP, by financial year ended (2009 to 2020), with one line per year in which the forecast was made (2011 to 2017) plus the actual outcome. Source: Grattan analysis of Commonwealth Budget Papers.]
43. Trinity College Dublin, The University of Dublin
Important forecasting efforts
Academic:
- Political Instability Task Force 2002-present
- DARPA ICEWS 2007-2015
- Peace Research Institute Oslo (PRIO) and Uppsala University UCDP models
- Uppsala ViEWS
- Many others
Governments: typically rely on experts, but some use large-N data:
- Germany
- Netherlands
- EU
- World Bank
- US
- Others, but often classified
45. Trinity College Dublin, The University of Dublin
Overpredicting vs underpredicting
True warnings
False alarms
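The trade-off between over- and underpredicting can be made concrete with a warning threshold: lowering it catches more real events but raises more false alarms. A small sketch, assuming hypothetical risk scores and outcomes and a made-up helper `warnings_and_alarms`:

```python
def warnings_and_alarms(scores, outcomes, threshold):
    """Count true warnings, false alarms and missed events.

    A warning is issued whenever the predicted risk score is at or
    above the threshold; outcomes are 1 (event happened) or 0.
    """
    true_warnings = 0
    false_alarms = 0
    missed = 0
    for score, outcome in zip(scores, outcomes):
        warned = score >= threshold
        if warned and outcome == 1:
            true_warnings += 1
        elif warned and outcome == 0:
            false_alarms += 1
        elif not warned and outcome == 1:
            missed += 1
    return true_warnings, false_alarms, missed


# Hypothetical predicted risks and what actually happened
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]

cautious = warnings_and_alarms(scores, outcomes, threshold=0.7)
eager = warnings_and_alarms(scores, outcomes, threshold=0.25)
```

Here the cautious threshold gives 2 true warnings, no false alarms and 1 missed event, while the eager threshold catches all 3 events at the price of 2 false alarms.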
46. Trinity College Dublin, The University of Dublin
How well: Experts
Tetlock:
284 experts
20+ years of forecasts
1000s of forecasts
47. Trinity College Dublin, The University of Dublin
Experts: results
Overpredict rare events
No better than dilettantes
All humans far worse than simple algorithms
Why so bad?
48. Trinity College Dublin, The University of Dublin
Machine learning: performance
              No conflict   Conflict
No conflict          1432         80
Conflict               58        374
(axis labels on the slide: Predicted, Observed)
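The usual summary measures can be read off the table above. The transcript does not make clear which axis is which, so this sketch assumes rows are observed outcomes and columns are predictions; if it is the other way round, precision and recall swap while accuracy is unchanged.

```python
# Counts from the slide, under the assumed orientation
# (rows = observed, columns = predicted).
tn, fp = 1432, 80    # observed no conflict: predicted no conflict / conflict
fn, tp = 58, 374     # observed conflict:    predicted no conflict / conflict

total = tn + fp + fn + tp

accuracy = (tp + tn) / total        # share of all cases classified correctly
precision = tp / (tp + fp)          # share of conflict warnings that were right
recall = tp / (tp + fn)             # share of actual conflicts that were flagged

# Reference point: always predicting "no conflict"
baseline_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} baseline={baseline_accuracy:.3f}")
```

Comparing against the always-"no conflict" baseline is useful because conflict is the rarer class, so raw accuracy alone flatters any model.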
49. Trinity College Dublin, The University of Dublin
How well? Crowds
IARPA competition: GJP the winner
The top forecasters in the Good Judgment Project (Tetlock) are "reportedly 30% better than intelligence officers with access to actual classified information."
56. Trinity College Dublin, The University of Dublin
Basic Notation
$t = 1, 2, 3, \ldots$ = time period index
$Y_t$ = value of the series at time period $t$
$F_{t+k}$ = forecast for time period $t+k$, given data until time $t$
$e_t$ = forecast error for period $t$
ForecastingBook.com
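To make the notation concrete, the sketch below computes forecast errors for a one-step-ahead naive forecast. It assumes the common sign convention that the error is actual minus forecast, and the naive rule (next period's forecast equals the latest observed value) is chosen here only for illustration; the data are made up.

```python
import math


def forecast_errors(actual, forecast):
    """e_t = Y_t - F_t for each period (assumed sign convention)."""
    return [y - f for y, f in zip(actual, forecast)]


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical series Y_1..Y_4
series = [100, 110, 120, 130]

# Naive one-step-ahead forecast (k = 1): F_{t+1} = Y_t,
# so forecasts exist for periods 2..4 only.
forecasts = series[:-1]
actuals = series[1:]

errors = forecast_errors(actuals, forecasts)
print(errors, mae(errors), rmse(errors))
```

Because this series rises by 10 every period, the naive forecast is always 10 too low, which shows up as a constant positive error.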
57. Trinity College Dublin, The University of Dublin
Time series components
Systematic part
• Level
• Trend
• Seasonal patterns
Non-systematic part
• “Noise”
Additive:
$Y_t$ = Level + Trend + Seasonality + Noise
Multiplicative:
$Y_t$ = Level × Trend × Seasonality × Noise
ForecastingBook.com
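The difference between the additive and multiplicative forms can be seen by building a series from its components. This is a sketch with made-up component values and the noise term left out (zero in the additive case, one in the multiplicative case); it also checks that taking logs turns the multiplicative form into an additive one.

```python
import math

periods = 8
level = 100.0

# Additive: components are in the units of the series and are summed.
trend_add = [1.0 * t for t in range(periods)]        # +1 per period
season_add = [5.0, -5.0] * (periods // 2)            # two-period cycle
additive = [level + tr + s for tr, s in zip(trend_add, season_add)]

# Multiplicative: components are factors and are multiplied.
trend_mult = [1.02 ** t for t in range(periods)]     # +2% per period
season_mult = [1.1, 0.9] * (periods // 2)            # +/-10% cycle
multiplicative = [level * tr * s for tr, s in zip(trend_mult, season_mult)]

# log(Level x Trend x Seasonality) = log Level + log Trend + log Seasonality
log_series = [math.log(y) for y in multiplicative]
log_sum = [math.log(level) + math.log(tr) + math.log(s)
           for tr, s in zip(trend_mult, season_mult)]
```

In the additive series the seasonal swing stays at plus or minus 5 units, whereas in the multiplicative series it grows with the level of the series.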