Paradigm shifts in wildlife and biodiversity management through machine learning

  1. Machine Learning in the Environmental Discipline: Statistical Paradigm Shifts, Combining Data Mining with Geographic Information Systems (GIS), and Educating the Next Generation of Environmental Researchers. Falk Huettmann, EWHALE lab, 419 Irving 1, Inst. of Arctic Biology, Biology & Wildlife Dept, Fairbanks, Alaska 99775 USA, fhuettmann@alaska.edu
  2. Applying and Teaching Machine Learning at Public Institutions. What’s at stake?
     - a better pre-screening of data (data mining)
     - a better inference (better model fits)
     - a better hypothesis (more relevant)
     - a faster and more accurate analysis
     - a better prediction (pro-active decisions)
     => progress in science & society => a new culture
  3. This presentation:
     - Environmental Sciences (Sustainability)
     - Paradigm Shift in the Sciences
     - Geographic Information Systems (GIS)
     - Teaching Machine Learning at Public Institutions
  4. The earth and its atmosphere... we only have one...
  5. The atmosphere
  6. Rio Convention 1992 and its goals
  7. The Earth at night… people are everywhere these days, mostly at the coast, affecting Wildlife, Habitats and Ecological Services
  8. An Earth Sampler: The Three Poles (figure labels 1, 2, 3)
  9. Three poles: Ice, glaciers and snow => A cool (or a hot?) place to be... (figure labels 1, 2, 3)
  10. No data… and “nothing is known”…? Climbing and Understanding Data Mountains with Data Mining: A free copy of GBIF data… www.gbif.org, seamap.env.duke.edu/, mountainbiodiversity.org. Megascience
  11. Examples of people using and publishing on Machine Learning in the Biodiversity, Wildlife and Sustainability Sciences…
  12. Classify, pre-screen, explain and predict data. A heap of data, e.g. online and merged, free of research design.
      Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 + …
      (e.g. clustering & multiple regression)
  13. Multiple Regressions: Are they any good?!
      Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 + …
      The issues are, for instance:
      - Many predictors, e.g. 30
      - Correlations
      - Interactions (2-way, 3-way and higher)
      - Coefficients + Offset
      - (Multi-) hypothesis testing
      - Model Selection
      - Parsimony
      => A single formula vs. a software code as a result?!
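
To make the "Interactions (2-way, 3-way and higher)" issue concrete, the following minimal Python sketch counts how many distinct interaction terms a fully specified formula would need for the slide's example of 30 predictors. The function name `count_interaction_terms` is an illustrative assumption and is not taken from the presentation.

```python
from math import comb


def count_interaction_terms(n_predictors, max_order):
    """Return {order: number of distinct interaction terms} for orders 2..max_order."""
    return {order: comb(n_predictors, order) for order in range(2, max_order + 1)}


# Slide example: 30 predictors, interactions up to 3-way.
terms = count_interaction_terms(30, 3)
for order, n_terms in terms.items():
    print(f"{order}-way interactions: {n_terms}")
print("total interaction terms:", sum(terms.values()))
```

With 30 predictors there are already 435 possible 2-way and 4,060 possible 3-way terms, which illustrates why writing the model as one explicit formula becomes impractical.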
  14. What statistical philosophies and schools exist?
      - Frequency Statistics: R. A. Fisher, Occam (Parsimony)
      - Bayesian Statistics: Bayes
      - Machine Learning: L. Breiman
  15. What statistical philosophies and schools exist? E.g. Frequency Statistics:
      - Normal distribution
      - Independence
      - Linear functions
      - Variance explained
      - Parsimony
  16. What statistical philosophies and schools exist? E.g. Bayesian Statistics:
      - Priors
      - Fit a function
      - Variance explained
      - (alternative) confidence intervals
      - Linear functions (usually)
  17. What statistical philosophies and schools exist? E.g. Machine Learning:
      - Inherently non-linear
      - No prior assumptions (non-parametric)
      - Fast
      - Computationally intense
  18. What linear models and frequency statistics cannot achieve
      Linear (~unrealistic):
      - ‘Mean’
      - SD
      - => potentially low r2
      - Virtually not Pro-Active
      Non-Linear (driven by data):
      - ‘Mean’?
      - SD?
      - Machine Learning, e.g. CART, TreeNet & RandomForest (there are many other algorithms!)
      - Pro-active Potential, ~Pre-cautionary
  19. What machine learning is: Machine Learning algorithm. A numerical pattern; the mimicked pattern. Virtually NOT parsimonious (no AIC, no R2, etc.)
  20. What is a non-linear algorithm…?
      - Through the origin vs. not through the origin
      - Single formula vs. many formulas
  21. What is a non-linear algorithm…?
      - Through the origin vs. not through the origin
  22. What is a non-linear algorithm…?
      - Single formula vs. many formulas
  23. What is a non-linear algorithm…?
      - Through the origin vs. not through the origin
      - Single formula vs. many formulas
      - A binary software algorithm that captures the mathematical relationship vs. a single formula with (a) coefficient
  24. Linear regression as an artefact, aka statistical & institutional terror?! Linear regression: a + bx vs. Machine Learning?!
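
The contrast between "a single formula" (a + bx) and "many formulas" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, and the one-split regression tree ("stump") stands in for the tree-based algorithms named in the slides; `fit_line`, `fit_stump` and `sum_squared_error` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of the single formula y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean).

    Needs at least two distinct x values.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or err < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, threshold, left_mean, right_mean)
    return best[1:]


def sum_squared_error(predict, xs, ys):
    """Sum of squared differences between predictions and observations."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))


# Made-up step-shaped data (an assumption for illustration only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 2, 2, 2, 10, 10, 10, 10]

a, b = fit_line(xs, ys)
threshold, left_mean, right_mean = fit_stump(xs, ys)

line_error = sum_squared_error(lambda x: a + b * x, xs, ys)
stump_error = sum_squared_error(
    lambda x: left_mean if x < threshold else right_mean, xs, ys
)
print("line:", a, b, "squared error:", line_error)
print("stump:", threshold, left_mean, right_mean, "squared error:", stump_error)
```

On this step-shaped toy data the stump reproduces the pattern exactly while the straight line cannot. On data that really are linear, the line would be the better description.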
  25. Predictions. Predictions as the prime goal, and linked with Adaptive Management (done in Random Forest). Best data used and available? Pelagic King Penguin prediction using RF (French telemetry data from SCAR-MARBIN)
  26. Outliers and variance… (plot: reality vs. predicted). Root Mean Square Error (RMSE)
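
RMSE is the square root of the mean squared difference between observed ("reality") and predicted values. A minimal Python sketch follows; the function name `rmse` and the sample numbers are illustrative assumptions, not data from the presentation.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equally long, non-empty sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equally long")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the last prediction is an outlier.
reality = [3.0, 5.0, 7.0, 9.0]
close_predictions = [3.5, 4.5, 7.5, 8.5]
with_outlier = [3.5, 4.5, 7.5, 15.0]

print("RMSE, all predictions close:", rmse(reality, close_predictions))
print("RMSE, one outlier:", rmse(reality, with_outlier))
```

Because the errors are squared before averaging, a single outlier can dominate the RMSE, which is one reason to screen outliers before interpreting it.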
  27. Outliers and variance… Confusion Matrix (=> ROC curve)

                      Predicted: Yes   Predicted: No
      Reality: Yes         80%              20%
      Reality: No          20%              80%
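
A confusion matrix like the one above can be tallied directly from paired yes/no values. The Python sketch below rebuilds the slide's 80%/20% split from made-up records (100 true "yes" and 100 true "no" cases, an assumption for illustration). The name `confusion_matrix` here is a local helper, not a library call.

```python
def confusion_matrix(reality, predicted):
    """Count true/false positives and negatives from paired booleans."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for real, pred in zip(reality, predicted):
        if real and pred:
            counts["tp"] += 1      # reality yes, predicted yes
        elif real and not pred:
            counts["fn"] += 1      # reality yes, predicted no
        elif not real and pred:
            counts["fp"] += 1      # reality no, predicted yes
        else:
            counts["tn"] += 1      # reality no, predicted no
    return counts


# Made-up records matching the slide's percentages.
reality = [True] * 100 + [False] * 100
predicted = [True] * 80 + [False] * 20 + [True] * 20 + [False] * 80

cm = confusion_matrix(reality, predicted)
sensitivity = cm["tp"] / (cm["tp"] + cm["fn"])          # true positive rate
specificity = cm["tn"] / (cm["tn"] + cm["fp"])          # true negative rate
false_positive_rate = cm["fp"] / (cm["fp"] + cm["tn"])
print(cm, sensitivity, specificity, false_positive_rate)
```

One confusion matrix gives a single point (false positive rate, true positive rate) on a ROC curve. The full curve comes from repeating the tally while varying the threshold that turns a model's continuous score into yes/no.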
  28. Moving into Data Mining (non-parametric, rel. high data error). A heap of data, e.g. online and merged, free of research design. Your classic Frequentist Statistician might get very scared and horrified with such data…
  29. Moving into Data Mining. A heap of data, e.g. online and merged => Data Mining (Describe Patterns, Screening Outliers, Data Quantification (~prediction)) => Informed Traditional Analysis.
      Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
  30. Data Mining. A heap of data, e.g. online and merged => Data Mining (Describe Patterns, Screening Outliers, Data Quantification (~prediction)) => Informed (Traditional) Analysis. (Craig and Huettmann 2008)
      Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 122, Springer-Verlag Berlin Heidelberg.
  31. Data Mining ⇒ SOFTWARE. This becomes a standard, e.g. BEST PROFESSIONAL PRACTICE. A heap of data, e.g. online and merged => Data Mining (Describe Patterns, Screening Outliers, Data Quantification (~prediction)) => Informed (Traditional) Analysis. (Craig and Huettmann 2008; Magness et al. 2008)
  32. (Adaptive/Science-based) Wildlife Management: How it works…
      - Identify a problem
      - Do the science
      - Provide a management suggestion
      - Implement science-based management
      - Assess and fine-tune
      => Pro-active (before trouble occurs) => Predictive in Space and Time
  33. Paradigm Shifts
      - Data Mining as a means to inform an analysis
      - Predictions as a goal in itself
      - Linear regressions and algorithms as a poor performer for modern problems of relevance to mankind
      - Metrics like r2, p-values and AIC as being suboptimal
      - Statistics based on (very advanced) software
      - Decision-support systems, Pro-Active Management and Operations Research
  34. RandomForest as a scientific discipline?!
      - a big and new research field per se (e.g. machine learning, ensemble models)
      - comes naturally with computing power and the internet
      - changes science (and fieldwork, e.g. design-based)
      - changes education (e.g. universities & stats depts)
      - changes pro-active/adaptive management, multi-species
      - changes society
      - affects land- and seascapes
  35. Adding Spatial Complexity: What is a GIS (Geographic Information System)?
      - “Computerized Mapping”
      - Consists of Hardware & Software (Monitor, PC and Database)
      - ArcGIS by ESRI as a major software suite
      => Can be linked with Machine Learning, e.g. for predictions, classifications and data mining
  36. What is a GIS (Geographic Information System)?
  37. RandomForest & GIS: Mapping Supervised & Unsupervised Classification
      Supervised Classification:
      - Multiple Regression (classification or continuous)
      - Multiple Response, e.g. YAIMPUTE RandomForest
      Unsupervised Classification:
      1. Proximity Matrix via Bagging/Voting (RF)
      2. Similarity Matrix
      3. e.g. Regular Clustering (mclust, PAM)
      4. Visualize Result
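
The first two unsupervised steps above can be sketched in Python. In a random forest, the proximity of two records is the fraction of trees in which both end up in the same terminal node (leaf). The sketch assumes the leaf assignments are already available as `leaf_ids[tree][record]`. That input layout, the toy values and the helper names are assumptions for illustration, and no real forest is trained here.

```python
def proximity_matrix(leaf_ids):
    """Fraction of trees in which each pair of records shares a terminal node.

    leaf_ids[t][i] is the leaf reached by record i in tree t (hypothetical input).
    """
    n_trees = len(leaf_ids)
    n_records = len(leaf_ids[0])
    shared = [[0] * n_records for _ in range(n_records)]
    for tree in leaf_ids:
        for i in range(n_records):
            for j in range(n_records):
                if tree[i] == tree[j]:
                    shared[i][j] += 1
    return [[count / n_trees for count in row] for row in shared]


def dissimilarity_matrix(proximity):
    """Turn proximities into dissimilarities (1 - proximity, one common choice)."""
    return [[1.0 - p for p in row] for row in proximity]


# Toy leaf assignments: 3 trees, 4 records.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 3, 3],
]

prox = proximity_matrix(leaf_ids)
dist = dissimilarity_matrix(prox)
for row in prox:
    print(row)
```

The resulting dissimilarity matrix is the kind of input a regular clustering routine (step 3) would take. The cluster labels can then be joined back to map locations in a GIS for step 4.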
  38. 11 Cliome Clusters (RF). Climate Cluster Data, Canada & Alaska. Credit: M. Lindgren et al.
  39. Teaching Machine Learning: What’s the issue?
      - the current teaching concept re. math & stats
      - wider introduction of machine learning
      - parsimony vs. holistic analysis and approach
      - better predictions
  40. Teaching math, statistics and machine learning… Usually, machine learning starts where maximum-likelihood theory ends… => maximum-likelihood starts at college… But it does not have to be that way!
  41. Teaching math, statistics and machine learning…

      Student’s age   Topic learned
      16              Basic Math
      5               Frequency Statistics (Maximum Likelihood)
      23 - 30         Machine Learning
  42. Teaching math, statistics and machine learning… the better way

      Student’s age   Topic learned
      15              Basic Math
      16              Frequency Statistics
      18              (Maximum Likelihood)
      19              Machine Learning

      Why & How done? Via Greybox, Blackbox
  43. Teaching math, statistics and machine learning… Parsimony (complexity vs. variance explained)

      Model name   AIC etc
      Model 1      1000
      Model 2      900
      Model 3      600
      Model 4      500
      Model 5      1200
      Model 6      1100

      Breaking down a complex real-world problem into a single linear term… and for a mechanistic understanding…??
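
For reference, the parsimony-based selection that this slide questions works as follows: AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood, and the model with the lowest AIC is preferred. The minimal Python sketch below ranks the slide's six example values. The helper name `rank_by_aic` is an illustrative assumption.

```python
def rank_by_aic(aic_scores):
    """Return (model, AIC, delta AIC) tuples sorted from lowest to highest AIC."""
    best = min(aic_scores.values())
    ranked = sorted(aic_scores.items(), key=lambda item: item[1])
    return [(name, aic, aic - best) for name, aic in ranked]


# AIC values as listed on the slide.
aic_scores = {
    "Model 1": 1000,
    "Model 2": 900,
    "Model 3": 600,
    "Model 4": 500,
    "Model 5": 1200,
    "Model 6": 1100,
}

for name, aic, delta in rank_by_aic(aic_scores):
    print(f"{name}: AIC={aic}, delta={delta}")
```

With these values Model 4 is selected. The ranking says which candidate balances fit and complexity best among those compared, not how well any of them predicts new data, which is the slide's point of contrast with prediction-focused machine learning.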
  44. Teaching math, statistics and machine learning… Predictions (knowing things ahead of time)
      - For spatial predictions
      - For temporal predictions
      - For a pro-active management
  45. To… Salford Systems, Dan Steinberg and his team, our students, L. Strecker, S. Linke, and many, many project and research collaborators world-wide over the years!! Questions?
