This file is mainly useful for undergraduate Earth science students. It explains various non-radiometric and radiometric methods for determining the age of the Earth.
Crashes on limited-access roadways typically occur because drivers are unable to react in time to avoid collisions with vehicles ahead of them that are either moving slower or merging unexpectedly. Prevailing traffic-stream conditions with high volume and low or variable speed downstream of low-volume, high-speed conditions can increase the likelihood of such collisions. Real-time vehicle trajectories collected through crowdsourcing can give information about the distribution of speeds in the traffic stream over space and time. Spatio-temporal models relating these observed speed distributions to the occurrence of crashes or near-crashes can help identify crash-prone traffic conditions as they arise, offering the opportunity to warn drivers before crashes occur.
As the analysis by Kalra & Paddock (2016) demonstrated, traditional crash data and analysis approaches may require hundreds of millions or billions of self-driving miles to achieve sufficient power to demonstrate that automated vehicles (AVs) have lower injury/fatality risk than human-driven vehicles. Moreover, crash risk for AVs is a moving target as algorithms and systems change, and the mistakes AVs will make are not necessarily the same mistakes humans make. Thus, we need to rethink both the data that will make up transportation safety datasets in the near future and the analytical approaches used. I will present some newer data-collection approaches along with some specific challenges that might call for different analytical approaches than those being used for crash data today.
As we prepare for a future of driverless cars, what new risks must we work to understand? Despite the connotation of driverless, we can expect that humans will remain in the loop at each iteration of increasingly autonomous technology integration. While our technology is advancing, our population and economy are also in transition, presenting challenging paradigm shifts that we should account for in assessing the risks of driverless cars. Let us take this holistic systems engineering approach to exploring transportation at the Statistical and Applied Mathematical Sciences Institute.
Highway crash data, with an average of 39,000 fatalities and 2.4 million nonfatal injuries per year, show repetitive and predictable patterns and may benefit from statistical predictive models to enhance highway safety and operations efforts to reduce crash fatalities and injuries. Highway crashes have patterns that repeat over fixed periods of time within the data set, for example motorcycle, bicycle, pedestrian, nighttime, fixed-object, weekend, and winter crashes. In some states, these patterns are weekly, monthly, or seasonal. Contributing factors such as age category, light condition, weather, weekday, the underlying state of the economy, and others drive these variations.
This talk will review dynamic modeling and prediction for temporal and spatio-temporal data and describe algorithms for suitable state space models. Use of dynamic models for modeling crash types by severity will be briefly illustrated. Extension of these approaches for handling irregular temporal spacing and spatial sparseness will be discussed, and a potential application to travel time prediction will be explored.
Highlights topics of discussion on remote sensing during Day 1 of the Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop.
The climate and earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment. In particular, massive amounts of data about the Earth and its environment are now continuously being generated by a large number of Earth-observing satellites as well as by physics-based earth system models running on large-scale computational platforms. These massive and information-rich datasets offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions. This talk will discuss various challenges involved in analyzing these massive data sets, as well as the opportunities they present for advancing both machine learning and the science of climate change, in the context of monitoring the state of tropical forests and surface water on a global scale.
Presentation given by Chris Swain, Water Quality Advocate, at the Opening Session on Monday, January 25, 2010, during the 2010 NEWEA Annual Conference in Boston, Massachusetts.
How we know what we know about climate change - Lisa Gardiner
Presentation for science educators at the National Science Teachers Association conference in Denver, CO 2013 about how climate scientists use data from paleoclimate proxies, current observations of the Earth system, and models of future climates to gain an understanding of Earth's climate.
Climate: Climatic Change - Evidence, Cycles and The Future - geomillie
A PowerPoint used in class to cover the key forms of evidence you need to know for the exam. Key questions are likely to focus on how we can gain information about past climatic change and how it can be used to predict the future, and I would expect you to be able to comment on the usefulness of the different types of evidence. For instance, ice cores are highly accurate and quantifiable evidence, but obtaining them is expensive, and they only give a climatic record for the site at which the snow formed. However, they do provide the longest record of change.
09-28-17 Lifelong Learning Lecture: Jim Haynes - Ellsworth1835
"Natural and Human Causes of Climate Change: What Scientists Know and How They Know It"
Presented by James M. Haynes, PhD, Interim Provost and Vice President for Academic Affairs, Professor of Environmental Science and Ecology
For eons, six slowly, often intermittently acting natural forces have changed the Earth's temperature within a range of +7 degrees Fahrenheit, leading to climate swings from ice ages to planet-wide tropical conditions. Now, a seventh rapidly acting force is changing climate: modern human civilization. What evidence of climate change is observed today, and what is likely to happen to our children and generations beyond as a result of human activity in the recent past and today? What can we do to minimize the impacts of the changes to come?
Petro Teach Free Webinar on "Rock Physics for Quantitative Seismic Reservoir ..." - Petro Teach
Rock physics ranges from basic laboratory and theoretical results to practical “recipes” that can be immediately applied in the field. In this free webinar, Professor Tapan Mukerji will present quantitative tools for understanding and predicting diagnostic seismic signatures of deposition, diagenesis, lithology, pore-fluid saturation, stress, pore pressure and temperature, and fractures. He will explain how to apply rock physics models for seismic reservoir characterization and uncertainty quantification.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy, and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the resolution of the data in space and time, the directness of the measurement of neural activity, and the organisms to which it can be applied. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings make difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
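To make the two-by-two calculation concrete, here is a minimal sketch (not part of the talk) of the classic two-group, two-period difference-in-differences estimate computed from a long-format panel; the column names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical long-format panel: one row per unit-period observation.
df = pd.DataFrame({
    "unit":    [1, 1, 2, 2, 3, 3, 4, 4],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],   # exposed to the new policy?
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],   # before/after implementation
    "outcome": [10.0, 14.0, 12.0, 15.5, 9.0, 10.5, 11.0, 12.0],
})

# Average outcome in each of the four (group, period) cells.
cell_means = df.groupby(["treated", "post"])["outcome"].mean()

# Change over time within each group.
change_treated = cell_means.loc[(1, 1)] - cell_means.loc[(1, 0)]
change_control = cell_means.loc[(0, 1)] - cell_means.loc[(0, 0)]

# The DID estimate: the differential change, interpreted causally
# only under the parallel-trends assumption described above.
did = change_treated - change_control
print(f"DID estimate of the policy effect: {did:.2f}")
```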
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women - hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool that enables the use of observational data to learn how clinicians balance competing priorities when treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first involves using ensemble machine learning termed super-learning to estimate the relevant stochastic relations between the treatment, censoring, covariates and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, the TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored w.r.t. the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
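As a concrete illustration of this two-stage logic, the sketch below computes a TMLE-style estimate of an average treatment effect for a binary outcome, substituting plain logistic regressions for the super-learner. It is a simplified sketch under those assumptions, not the authors' implementation; all data and variable names are synthetic.

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit

rng = np.random.default_rng(0)

# Hypothetical observational data: confounders W, binary treatment A, binary outcome Y.
n = 2000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.6 * W[:, 0] - 0.4 * W[:, 1]))
Y = rng.binomial(1, expit(-0.5 + A + 0.8 * W[:, 0]))

def design(a, W):
    # intercept, treatment indicator, confounders
    return np.column_stack([np.ones(len(W)), a, W])

# Stage 1a: initial outcome regression Q(A, W) (a stand-in for the super-learner).
Qfit = sm.GLM(Y, design(A, W), family=sm.families.Binomial()).fit()
Q1 = Qfit.predict(design(np.ones(n), W))    # Q(1, W)
Q0 = Qfit.predict(design(np.zeros(n), W))   # Q(0, W)
QA = np.where(A == 1, Q1, Q0)

# Stage 1b: propensity score g(W) = P(A = 1 | W).
Xg = np.column_stack([np.ones(n), W])
g = sm.GLM(A, Xg, family=sm.families.Binomial()).fit().predict(Xg)

# Stage 2 (targeting): logistic fluctuation along the least-favorable submodel,
# i.e. regress Y on the "clever covariate" H with the initial fit as an offset.
H = A / g - (1 - A) / (1 - g)
eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=logit(QA)).fit().params[0]

# Updated predictions and plug-in estimate of the average treatment effect.
Q1_star = expit(logit(Q1) + eps / g)
Q0_star = expit(logit(Q0) - eps / (1 - g))
print(f"TMLE-style ATE estimate: {np.mean(Q1_star - Q0_star):.3f}")
```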
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most noticeable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
CLIM Undergraduate Workshop: Statistical Development and challenges for Paleo-climate Reconstruction - Bo Li, Oct 24, 2017
1. Undergraduate Students Workshop
Statistical Developments and Challenges in Past Climate Reconstruction
Bo Li
Department of Statistics
University of Illinois at Urbana-Champaign
October 24, 2017
2. Acknowledgement:
A mixed group of climate scientists and statisticians:
• Caspar Ammann, Luis Barboza, Peter Craigmile, Murali Haran, Elizabeth Manshardt, Doug Nychka, Bala Rajaratnam, Jason Smerdon, Martin Tingley, Frederi Viens, Sooin Yun, Xianyang Zhang
3. Why care about the PAST climate?
• Accurate and precise reconstructions of past climate help to characterize natural climate variability on longer time scales.
• Spatially widespread instrumental temperature observations extend back to only about 1850.
• Validate climate models - Atmosphere/Ocean General Circulation Model (AOGCM)
4. How to recover past climate?
• Earth’s climate history written in ice, wood and stone.
• Reconstruct the past temperature from indirect observations (proxies) such as
– tree-ring width and densities
– Pollen
– Borehole
– Speleothems (cave deposits)
– Coral records, etc.
• Radiative Forcings: Solar, Volcanic eruption and Greenhouse gases.
5. Tree-rings
This cross-section of a giant sequoia tree shows some of the tree-rings and fire scars. The numbers indicate the year that a particular ring was laid down by the tree.
8. Speleothem
As water runs through the ground, it picks up minerals, the most common of which is calcium carbonate. When the mineral-rich water drips into caves, it leaves behind solid mineral deposits. The mineral deposits accumulate into speleothems.
10. Forcings
[Figure: radiative forcing time series, 1000-2000. a: Volcanism (contains substantial noise); b: Solar irradiance; c: Greenhouse gases]
11. A quiz for you
Start with a simple reconstruction
• Assume we observe Y and a bunch of X’s.
• Now given new X’s, how to estimate Y?
12. A quiz for you
Start with a simple reconstruction
• Assume we observe Y and a bunch of X’s.
• Now given new X’s, how to estimate Y?
Solution: Linear regression
Y = Xβ + ε
Ŷ = X β̂
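A minimal sketch of this calibrate-then-predict step (hypothetical data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical calibration data: Y observed together with several X's.
n, p = 150, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -1.0, 2.0])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Estimate beta by ordinary least squares: Y = X beta + error.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Given new X's (e.g. proxies in the pre-instrumental period),
# estimate the unobserved Y as Y_hat = X_new @ beta_hat.
X_new = rng.normal(size=(5, p))
Y_hat = X_new @ beta_hat
print(Y_hat)
```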
14. A stats perspective
The problem:
• Quantify the uncertainty in the temperature estimates from other kinds of observations (i.e. proxies).
• e.g. What is the uncertainty in the maximum decadal temperature estimates for the last 1000 years?
A statistical solution:
• Find the distribution of the temperatures given the other observations.
• Represent this distribution by an ensemble of possible reconstructions, all statistically valid.
15. An ensemble of Santas and the ensemble mean
Don't send your Xmas letter to Santa!
Credit: Cabinet 15
16. Statistical Relationship between NH temperature and the proxies
A minimal recipe for generating ensembles:
• A stationary linear relationship between the temperature and proxies
• The conditional distribution of temperature given proxies is normally distributed
• Prediction errors are correlated in time
• Adjustment made for overfitting during the calibration period using cross-validation
• Also include the uncertainty in the parameters
19. Linear Model of NH temperature on proxies
The statistical model:
T_t = p_t β + e_t
T_t, p_t: the temperature and proxies at time t.
β: a vector of regression coefficients.
The error e_t follows an AR(2) process:
e_t = φ1 e_{t−1} + φ2 e_{t−2} + ε_t,   ε_t ~ iid Normal(0, σ²)
Model fitting procedure: Generalized Least Squares
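A minimal sketch of such a fit using statsmodels' GLSAR, which iterates between the regression coefficients and the AR(2) error coefficients; the temperature and proxy series below are simulated placeholders, not the reconstruction data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated stand-ins for the calibration-period series.
n = 130                      # e.g. annual values over the instrumental period
P = rng.normal(size=(n, 4))  # proxy predictors p_t

# AR(2) errors: e_t = phi1*e_{t-1} + phi2*e_{t-2} + eps_t
phi1, phi2 = 0.5, 0.2
e = np.zeros(n)
for t in range(2, n):
    e[t] = phi1 * e[t - 1] + phi2 * e[t - 2] + rng.normal(scale=0.2)

beta_true = np.array([0.4, -0.3, 0.2, 0.1])
T = P @ beta_true + e        # temperature T_t = p_t beta + e_t

# Generalized least squares with AR(2) error structure,
# iterating between beta and the AR coefficients.
model = sm.GLSAR(T, sm.add_constant(P), rho=2)
result = model.iterative_fit(maxiter=10)
print(result.params)   # intercept and regression coefficients beta
print(model.rho)       # estimated AR(2) coefficients phi1, phi2
```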
20. Overfitting and uncertainty of model parameter estimates
• Symptom of overfitting: the variance of observed prediction errors is greater than the prediction variability derived from the model.
Solution: 10-fold cross-validation to estimate the inflation adjustment.
• Uncertainty of model parameter estimates: parametric bootstrap to estimate the sampling distribution of the parameter estimates.
• An ensemble:
T = Pβ + (e_1000, ..., e_1849) | (e_1850, ..., e_1980)
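A schematic sketch of how ensemble members could be assembled from such a fit; for brevity it ignores the cross-validation inflation adjustment and the conditioning on the calibration-period errors, and every input is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_ar2_path(phi1, phi2, sigma, length):
    """Simulate one AR(2) error path e_t = phi1*e_{t-1} + phi2*e_{t-2} + eps_t."""
    e = np.zeros(length)
    for t in range(2, length):
        e[t] = phi1 * e[t - 1] + phi2 * e[t - 2] + rng.normal(scale=sigma)
    return e

def make_ensemble(P_past, beta_boot, phi1, phi2, sigma, n_members=100):
    """Assemble reconstruction ensemble members T = P beta + AR(2) noise.

    beta_boot: bootstrap draws of the regression coefficients (one row per
    draw), representing parameter uncertainty.
    """
    members = []
    for _ in range(n_members):
        beta = beta_boot[rng.integers(len(beta_boot))]          # parameter uncertainty
        noise = draw_ar2_path(phi1, phi2, sigma, len(P_past))   # prediction error
        members.append(P_past @ beta + noise)
    return np.array(members)

# Hypothetical inputs: proxies for 850-1849 and bootstrap draws of beta.
P_past = rng.normal(size=(1000, 4))
beta_boot = rng.normal(loc=[0.4, -0.3, 0.2, 0.1], scale=0.05, size=(200, 4))
ensemble = make_ensemble(P_past, beta_boot, phi1=0.5, phi2=0.2, sigma=0.2)
print(ensemble.shape)   # (members, years)
```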
29. A statistical perspective
The Program:
• Develop a statistical (perhaps complicated) relationship between temperatures and proxies.
• The prediction is the distribution of the temperatures given the observed values of the proxies.
What cannot be addressed:
Errors in the proxies and analysis outside the statistical model. (We don't know what we don't know.)
e.g. Proxies change in their relationship to temperature over time.
30. How to integrate different data sources
Skill of each proxy and forcings
• Tree ring (Dendrochronology): annual to decadal
• Pollen: bi-decadal to semi-centennial
• Borehole: centennial and onward
• Forcings: external drivers
Goal: Reconstruct the 850-1849 temperature using all proxies, forcings and the 1850-1999 temperature
31. Bayesian Hierarchical Model (BHM)
Bayesian Hierarchical Model (BHM) to thread together all proxies, forcings and temperatures
Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)
Named after Thomas Bayes (1701-1761)
32. Bayesian Hierarchical Model (BHM)
A distribution rule:
[P, T, θ] = [P|T, θ][T|θ][θ]
Three hierarchies:
• Data Stage: [Proxies | Temperature, Parameters], the likelihood of the proxies given the temperatures
• Process Stage: [Temperature | Parameters], a physical model of the temperature process
• Parameter Stage: [Parameters], the prior specified for the parameters
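As a toy illustration of threading the three stages together, the sketch below runs a two-block Gibbs sampler for a deliberately simplified model in which each proxy is a noisy observation of temperature and the temperatures share a common mean. The variances are assumed known and all data are synthetic, so this is only a schematic sketch, not the reconstruction model from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data stage (toy): proxies P are noisy observations of temperature T.
n = 200
true_T = rng.normal(loc=0.3, scale=0.5, size=n)
P = true_T + rng.normal(scale=0.4, size=n)

tau2, sigma2 = 0.4 ** 2, 0.5 ** 2   # assumed-known observation and process variances
mu_prior_var = 100.0                # vague prior on mu (parameter stage)

n_iter, burn = 2000, 500
mu = 0.0
T_draws, mu_draws = [], []
for _ in range(n_iter):
    # Process stage update: T_t | P_t, mu (conjugate normal).
    post_var_T = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    post_mean_T = post_var_T * (P / tau2 + mu / sigma2)
    T = rng.normal(post_mean_T, np.sqrt(post_var_T))

    # Parameter stage update: mu | T (conjugate normal, prior mean 0).
    post_var_mu = 1.0 / (n / sigma2 + 1.0 / mu_prior_var)
    post_mean_mu = post_var_mu * (T.sum() / sigma2)
    mu = rng.normal(post_mean_mu, np.sqrt(post_var_mu))

    T_draws.append(T)
    mu_draws.append(mu)

# Posterior mean "reconstruction" and the posterior mean of mu, after burn-in.
T_post = np.array(T_draws[burn:])
print(T_post.mean(axis=0)[:5], np.mean(mu_draws[burn:]))
```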
33. Apply BHM to Climate from Models
National Center for Atmospheric Research (NCAR) Community Climate System Model (CCSM) Version 1.4
• Provides test beds to evaluate our reconstruction method
• The climate model is highly nonlinear and has substantial complexity
Resolution:
– 3.75° × 3.75° (roughly 400 km × 400 km) for atmosphere and land
• Generate synthetic proxies (15 Tree-Ring, 10 Pollen and 5 Borehole) from the model climate based on their own characteristics
35. Synthetic Proxies Generation
Generate synthetic proxies:
• 15 tree rings: subtract the 10-year smoothing average from each of the local temperature series
• 10 pollen: sample a 10-year smoothing average temperature series at 30-year intervals
[Figure: synthetic proxy series, 1000-2000, temperature anomaly. Tree-rings: black line; Pollen: red dots]
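A minimal sketch of these two recipes applied to a simulated local temperature series; the 10-year smoothing window and 30-year sampling interval follow the slide, everything else is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(4)

years = np.arange(1000, 2000)
# Placeholder "model climate" local temperature series.
temp = 0.3 * np.sin(2 * np.pi * (years - 1000) / 300) + rng.normal(scale=0.2, size=years.size)

def smooth(x, window=10):
    """Simple moving average used as the 10-year smoothing."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

# Tree-ring-like proxy: remove the 10-year smoothed component,
# so only the high-frequency variability is retained.
tree_ring = temp - smooth(temp, 10)

# Pollen-like proxy: the 10-year smoothed series sampled every 30 years,
# so only the low-frequency variability is retained.
pollen_years = years[::30]
pollen = smooth(temp, 10)[::30]

print(tree_ring[:5])
print(list(zip(pollen_years[:3], np.round(pollen[:3], 3))))
```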
36. • 5 borehole:
– POM-SAT Forward Model
– describes the diffusion process of surface temperature
[Figure: surface temperature pulses at years 850, 650, 450, 250 and 50 (temperature vs. year), and the corresponding borehole depth-temperature profiles (depth 0-400 m)]
37. Numerical Study
Five models with different subsets of proxy or
related data:
• local/regional Temperature series (T)
• Tree ring (Dendrochronology) only (D)
• Tree ring + Pollen (DP)
• Tree ring + Borehole (DB)
• Tree ring + Borehole + Pollen (DBP)
38. [Figure: three panels (a, b, c), each showing target vs. reconstruction temperature series over 1000-2000]
The reconstructions using tree-rings and pollen together with forcings in three scenarios. a: modeling T and without noise; b: modeling T1 and without noise; c: modeling T and with noise.
39. [Figure: smoothed spectra of reconstruction residuals vs. period (years) for the five data models T, D, DP, DB and DBP]
Using the smoothed spectrum of reconstruction residuals from the five data models to illustrate the frequency band at which proxies capture the variation of the temperature process (Li, Nychka and Ammann, 2012).
40. Messages from numerical analyses
• BHM enables integrating information from forcings and different proxies
• Forcing covariates dramatically reduce the bias and the variance, and thus the RMSE, if the included proxy data do not represent the low-frequency variability well
• Tree-rings help to retain the high-frequency variability, while the pollen captures the variability at about a 30-year period and onward
• With noise the reconstruction deteriorates somewhat, but not terribly
41. Challenges: Space-time reconstruction
– Multivariate linear regression (Smerdon, 2017):
T = BP + ε,
T: instrumental climate records; P: proxy values.
• Covariance-based matrix factorizations: PCA and canonical correlation analysis.
• Penalized regressions: ridge or lasso regression.
• Ordinary least squares vs. total least squares.
• Hybrid: split the random field at a 20-year time period.
– Bayesian reconstruction (Tingley and Huybers, 2010ab)
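For instance, the penalized-regression route could look like the following minimal sketch: regress the instrumental temperature field on the proxy matrix with a ridge penalty over the calibration period, then predict the pre-instrumental field (all arrays are simulated placeholders):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)

n_cal, n_past = 150, 1000      # calibration and reconstruction periods
n_proxy, n_grid = 50, 200      # number of proxies and temperature grid cells

# Simulated placeholders for the calibration-period fields.
P_cal = rng.normal(size=(n_cal, n_proxy))                 # proxy matrix P
B_true = rng.normal(scale=0.2, size=(n_proxy, n_grid))
T_cal = P_cal @ B_true + rng.normal(scale=0.5, size=(n_cal, n_grid))  # T = BP + error

# Ridge regression of the temperature field on the proxies.
model = Ridge(alpha=10.0)
model.fit(P_cal, T_cal)

# Reconstruct the pre-instrumental temperature field from past proxies.
P_past = rng.normal(size=(n_past, n_proxy))
T_recon = model.predict(P_past)
print(T_recon.shape)           # (years, grid cells)
```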
42. Challenges: Space-time reconstruction
Many issues remain:
• Nonstationarity in both mean and dependence
• Modeling teleconnection (Choi, Li, Zhang and Li, 2015)
• Extensive computation at the global scale
• How to evaluate the space-time reconstruction (Li and Smerdon, 2012; Li, Zhang and Smerdon, 2016; Yun, Li, Smerdon and Zhang, 2017)
• Multivariate spatial reconstruction
– Non-Gaussianity of some climate variables
– Identifiability issues, e.g., temperature and precipitation are in general not identifiable