A Statewide Archaeological Predictive Model of Pennsylvania: Lessons Learned


A recently completed archaeological predictive model (APM) of the state of Pennsylvania provided an unprecedented opportunity to explore the current status of APM methods and extend them with techniques drawn from related scientific fields, medicine, and statistical computing. Through this process, many different types of models were created and tested for validity, predictive performance, and adherence to archaeological theory. One result of this project is a comprehensive view of the problems that beset existing APM methodologies, solutions to some of these problems, and the nature of the challenges we will face going forward with new techniques. Most, if not all, of the findings of this project are applicable to the eastern deciduous United States, and much of the methodological scope is useful to APMs in any geography. This paper discusses the primary lessons learned from this project with regard to archaeological data, modeling methods, and theory, and touches on best practices for future APM efforts.


  1. Pennsylvania Predictive Model: Lessons Learned. Matthew D. Harris, AECOM - Burlington, NJ. matthew.d.harris@aecom.com
  2. FHWA Statement: “The contents of the report reflect the views of the author(s) who are responsible for the facts and accuracy of the data presented within. The contents do not necessarily reflect the official view or policies of the Department or FHWA at the time of publication.” Report available at: www.penndotcrm.org
  3. “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” ~ George E. P. Box, 1987
  4. Organization of talk • Introduction to PA Model • Data lessons • Methodological lessons • Policy lessons • Concluding observations
  5. Pennsylvania Predictive Model
  6. PA Model Specs • 45,293 square miles • 1 billion raster cells • 2 million site-present cells • 18,226 pre-contact sites • 132 geographic study areas • 528 individual models • 93 model variables • 102 billion cells processed • Random Forest, MARS, and Stepwise Logistic Regression models. Archaeo “Big Data”
  7. PA Model
  8. PA Model in comparison
  9. PA Model in comparison
  10. Data Lessons Learned • Unique characteristics of archaeological data • Representation of archaeological data • Archaeological site prevalence • Covariates and correlation • Dealing with uncertainty
  11. Characteristics of Archaeological Data. Population Generating Process: • Highly dynamic & complex • Non-mechanistic • Culture and agency • Dynamic environment • Changing parameters • Subjectively defined expression • Censored through taphonomy. Sample Generating Process: • Non-systematic • Subjective & inconsistent • Extensive measurement error • Imperfect detectability • Non-representative of population • Spatially biased • Oversimplification
  12. Data Representation
  13. Do centroids represent sites?
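The centroid question is easy to see on a toy raster. Below is a minimal sketch with synthetic data (a hypothetical slope surface and site footprint, not the PA Model's actual variables) comparing the covariate value at a site's centroid cell against the spread of values across every cell the site covers.

import numpy as np

rng = np.random.default_rng(42)

# Synthetic 100 x 100 covariate raster (e.g., slope): a smooth gradient plus noise
slope = np.arange(100)[:, None] * 0.3 + rng.normal(0, 2, size=(100, 100))

# Hypothetical site footprint covering a 9 x 15 block of cells
site_mask = np.zeros((100, 100), dtype=bool)
site_mask[40:49, 20:35] = True

footprint_vals = slope[site_mask]                      # every cell the site covers
r_c, c_c = np.argwhere(site_mask).mean(axis=0).round().astype(int)
centroid_val = slope[r_c, c_c]                         # the single centroid cell

print(f"centroid value:  {centroid_val:.1f}")
print(f"footprint range: {footprint_vals.min():.1f} to {footprint_vals.max():.1f} "
      f"(mean {footprint_vals.mean():.1f})")

If the footprint spans a wide range of covariate values, a single centroid cell under-represents the settings the site actually occupies, which is the concern behind this slide.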
  14. Background samples and model variance: how many non-site samples to use?
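A minimal sketch of that question using synthetic covariates and scikit-learn (not the project's data or its study-area models): refit a simple logistic regression over repeated random background draws of several sizes and measure how much the predictions move between draws.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic covariates: 500 site-present cells and a large pool of background cells
site_X = rng.normal(loc=1.0, scale=1.0, size=(500, 3))
background_pool = rng.normal(loc=0.0, scale=1.0, size=(100_000, 3))
eval_X = rng.normal(size=(1_000, 3))          # fixed cells to score each fitted model on

for n_background in (500, 5_000, 50_000):
    preds = []
    for _ in range(20):                       # repeated random background draws
        idx = rng.choice(len(background_pool), size=n_background, replace=False)
        X = np.vstack([site_X, background_pool[idx]])
        y = np.r_[np.ones(len(site_X)), np.zeros(n_background)]
        model = LogisticRegression(max_iter=1000).fit(X, y)
        preds.append(model.predict_proba(eval_X)[:, 1])
    spread = np.std(preds, axis=0).mean()     # between-draw variability of predictions
    print(f"{n_background:>6} background samples -> mean prediction SD {spread:.4f}")

Larger background samples typically damp this sampling variance, but they also change site prevalence in the training data, so the choice interacts with how the predicted values are interpreted.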
  15. Model uncertainty
  16. Quantifying Uncertainty: logistic regression (Bayesian GLM)
  17. Quantifying Uncertainty: 95% credibility interval
  18. Quantifying Uncertainty: 500 simulated plausible models
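Slides 16-18 describe fitting a Bayesian GLM and summarizing it as a 95% credibility interval over 500 simulated plausible models. As a rough stand-in (synthetic data; statsmodels rather than the Bayesian GLM the project used), the sketch below draws 500 coefficient vectors from the normal approximation around a maximum-likelihood logistic fit and reads off the interval for one cell's predicted value.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Synthetic training data: 3 covariates; 1 = site-present, 0 = background
X = rng.normal(size=(2_000, 3))
true_beta = np.array([0.8, -1.2, 0.4])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta - 0.5))))

Xc = sm.add_constant(X)
fit = sm.Logit(y, Xc).fit(disp=0)

# 500 plausible coefficient vectors from the fit's approximate covariance
betas = rng.multivariate_normal(fit.params, fit.cov_params(), size=500)

# Score one new cell under every plausible model and summarize the spread
cell = np.array([1.0, 0.5, -0.3, 1.1])        # intercept term + 3 covariate values
probs = 1 / (1 + np.exp(-(betas @ cell)))
lo, hi = np.percentile(probs, [2.5, 97.5])
print(f"predicted site-likelihood: {probs.mean():.2f} (95% interval {lo:.2f} to {hi:.2f})")

Mapping the width of that interval cell by cell is one way to communicate uncertainty visually, as the later lessons urge.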
  19. Methodological Lessons Learned • Define your objectives and assumptions • Reproducibility • Create a model building system • ArcGIS is only part of the answer • Understand your algorithms • Test and validate all results
  21. Reproducibility and Accountability: www.rstudio.com, www.python.org, www.esri.com, aws.amazon.com (code and pseudo-code examples shown on the slide; a sketch of the idea follows below)
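The slide's own code and pseudo-code examples are not preserved in the transcript, so the sketch below is only a hypothetical illustration of the practice the tool list points to: a scripted run with a fixed seed that writes its parameters and package versions to a log, so a model can be rerun and audited later. File names and parameter values here are invented.

import json, platform, sys
from datetime import datetime, timezone

import numpy as np
import sklearn

PARAMS = {
    "random_seed": 20160406,          # fixed seed so reruns give identical results
    "n_background_samples": 10_000,   # modeling choices recorded as data, not clicks
    "algorithm": "logistic_regression",
    "cv_folds": 10,
}

rng = np.random.default_rng(PARAMS["random_seed"])

# ... variable creation, sampling, and model fitting would go here, driven by PARAMS ...

# Save the full run context next to the outputs so anyone can reproduce the model
run_log = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "scikit_learn": sklearn.__version__,
    "params": PARAMS,
}
with open("model_run_log.json", "w") as f:
    json.dump(run_log, f, indent=2)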
  22. Model Building System • Variable creation and analysis • Train model hyperparameters • Algorithm selection • Test error with cross-validation • Assess performance • Model selection • Mosaic and aggregate
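A minimal sketch of the middle of that system (training hyperparameters, comparing algorithms, and estimating test error with cross-validation) on synthetic data. Logistic regression and random forest stand in here; the project's actual candidates were Random Forest, MARS, and stepwise logistic regression fit separately by study area.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 8))                          # 8 hypothetical covariates
y = (X[:, 0] - 0.7 * X[:, 1] + rng.normal(0, 1, 2_000) > 0).astype(int)

candidates = {
    "logistic": GridSearchCV(
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
        cv=5, scoring="roc_auc",
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(n_estimators=100, random_state=0),
        param_grid={"max_depth": [3, 6, None]},
        cv=5, scoring="roc_auc",
    ),
}

# Nested cross-validation: hyperparameters tuned inside, test error estimated outside
for name, search in candidates.items():
    scores = cross_val_score(search, X, y, cv=5, scoring="roc_auc")
    print(f"{name:>13}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")

Keeping the comparison inside one scripted loop like this is what allows hundreds of study-area models to be selected, mosaicked, and aggregated consistently.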
  23. Validation and error: does this model predict new sites?
  24. “The generalization performance of a learning method relates to its prediction capability on independent test data.” ~ Hastie et al. (2008)
  25. Bias & Variance Tradeoff
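A small illustration of both the Hastie quote and the tradeoff on slide 25, with synthetic data and a decision tree as a stand-in for the project's models: as the model is allowed to become more flexible, training error keeps falling, while error on an independent held-out set bottoms out and then worsens.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(2_000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 1, 2_000) > 0.5).astype(int)

# Hold out an independent test set; all fitting happens on the training split only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 2, 4, 8, 16, None):          # increasing model flexibility
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)
    test_err = 1 - tree.score(X_test, y_test)
    print(f"max_depth={str(depth):>4}  train error {train_err:.3f}  test error {test_err:.3f}")

For an APM this is the "does it predict new sites?" question: only error measured on sites the model never saw says anything about generalization.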
  26. Policy Lessons Learned • Model purpose dictates policy applications • Implementation requires explicit assumptions • Error rates and uncertainty must be known • Scale of data is critical in scale of use • Methods to visualize uncertainty
  27. How it all works... PURPOSE → ASSUMPTIONS → METHODS → ALGORITHMS / MODELS → INTERPRETATION → POLICY
  28. Lessons learned
  29. Reproducibility: accountability in all aspects of model building; clear and understandable assumptions
  30. Validation: test predictions on independent data to assess error; balance models to achieve appropriate generalization
  31. Uncertainty: understand and control for sources of uncertainty; communicate uncertainty in text and visually
  32. Purpose: assess all aspects of a model relative to its purpose; policy and implementation are based on model purpose
  33. Not all doom and gloom! • Face modeling issues head-on • Model for our unique data • Standardize our approaches • Formalize our theory • Compare our results
  34. THANK YOU!!! @md_harris github.com/mrecos matthewdharris.com www.penndotcrm.org
