Paradigm shifts in wildlife and biodiversity management through machine learning
Published in: Technology, Education
Transcript

  • 1. Machine Learning in the Environmental Discipline: Statistical Paradigm Shifts, Combining Data Mining with Geographic Information Systems (GIS), and Educating the Next Generation of Environmental Researchers
      Falk Huettmann, -EWHALE lab-, 419 Irving 1, Inst. of Arctic Biology, Biology & Wildlife Dept, Fairbanks, Alaska 99775 USA, fhuettmann@alaska.edu
  • 2. Applying and Teaching Machine Learning at Public Institutions: What’s at stake?
      - a better pre-screening of data (data mining)
      - a better inference (better model fits)
      - a better hypothesis (more relevant)
      - a faster and more accurate analysis
      - a better prediction (pro-active decisions)
      => progress in science & society
      => a new culture
  • 3. This presentation
      - Environmental Sciences (Sustainability)
      - Paradigm Shift in the Sciences
      - Geographic Information Systems (GIS)
      - Teaching Machine Learning at Public Institutions
  • 4. The earth and its atmosphere ...we only have one ...
  • 5. The atmosphere
  • 6. Rio Convention 1992 and its goals
  • 7. The Earth at night… …people are everywhere these days, mostly at the coast, affecting Wildlife, Habitats and Ecological Services
  • 8. An Earth Sampler: The Three Poles (figure panels 1, 2, 3)
  • 9. Three poles: Ice, glaciers and snow => A cool (or a hot?) place to be... (figure panels 1, 2, 3)
  • 10. No data … and “nothing is known” …? Climbing and Understanding Data Mountains with Data Mining: A free copy of GBIF data… www.gbif.org, seamap.env.duke.edu/, mountainbiodiversity.org. Megascience
  • 11. Examples of people using and publishing on Machine Learning in the Biodiversity, Wildlife and Sustainability Sciences…
  • 12. Classify, pre-screen, explain and predict data
      A heap of data, e.g. online and merged, free of research design
      Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 + … (e.g. clustering & multiple regression)
  • 13. Multiple Regressions: Are they any good?!
      Response ~ Predictor1 + Predictor2 + Predictor3 + Predictor4 + …
      The issues are, for instance:
      - Many predictors, e.g. 30
      - Correlations
      - Interactions (2-way, 3-way and higher)
      - Coefficients + Offset
      - (Multi-) hypothesis testing
      - Model Selection
      - Parsimony
      => A single formula vs. a software code as a result?!
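To put the "many predictors" and "interactions" issues from slide 13 into numbers, the following minimal sketch counts how many distinct interaction terms exist for 30 predictors. The function name is hypothetical and the sketch only counts combinations; it says nothing about which terms a given analysis would actually include.

```python
from math import comb


def interaction_counts(n_predictors, max_order=3):
    """Return {order: number of distinct interaction terms} for orders 2..max_order."""
    return {k: comb(n_predictors, k) for k in range(2, max_order + 1)}


counts = interaction_counts(30)
total_terms = 30 + sum(counts.values())  # main effects plus 2-way and 3-way terms

print(counts)       # {2: 435, 3: 4060}
print(total_terms)  # 4525
```

Even before higher-order terms are considered, the candidate term list is far longer than most datasets can support in a single fitted formula.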
  • 14. What statistical philosophies and schools exist?
      - Frequency Statistics: R. A. Fisher, Occam (Parsimony)
      - Bayesian Statistics: Bayes
      - Machine Learning: L. Breiman
  • 15. What statistical philosophies and schools exist? e.g. Frequency Statistics
      - Normal distribution
      - Independence
      - Linear functions
      - Variance explained
      - Parsimony
  • 16. What statistical philosophies and schools exist? e.g. Bayesian Statistics
      - Priors
      - Fit a function
      - Variance explained
      - (alternative) confidence intervals
      - Linear functions (usually)
  • 17. What statistical philosophies and schools exist? e.g. Machine Learning
      - Inherently non-linear
      - No prior assumptions (non-parametric)
      - Fast
      - Computationally intense
  • 18. What linear models and frequency statistics cannot achieve
      Linear (~unrealistic): ‘Mean’, SD => potentially low r2; virtually not pro-active
      Non-Linear (driven by data): ‘Mean’ ?, SD ?; Machine Learning, e.g. CART, TreeNet & RandomForest (there are many other algorithms!); pro-active potential, ~pre-cautionary
  • 19. What machine learning is
      [diagram labels: A numerical pattern; Machine Learning algorithm; The mimicked pattern]
      Virtually NOT parsimonious (no AIC, no R2 etc.)
  • 20. What is a non-linear algorithm…?
      - Through the origin vs. Not through the origin
      - Single formula vs. Many formulas
  • 21. What is a non-linear algorithm…?
      - Through the origin vs. Not through the origin
  • 22. What is a non-linear algorithm…?
      - Single formula vs. Many formulas
  • 23. What is a non-linear algorithm…?
      - Through the origin vs. Not through the origin
      - Single formula vs. Many formulas
      - A binary software algorithm that captures the mathematical relationship vs. A single formula with (a) coefficient
  • 24. Linear regression as an artefact, aka statistical & institutional terror?! Linear regression: a + bx vs. Machine Learning?!
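The contrast on slide 24 between a single formula a + bx and a data-driven algorithm can be illustrated with a toy example. The sketch below fits an ordinary least-squares line and a one-split regression tree (a "stump", used here as the simplest CART-style stand-in) to made-up step-shaped data. The data and all function names are assumptions for illustration, not material from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_stump(xs, ys):
    """One-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    levels = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring x values.
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    return best[1:]


def sse(ys, predictions):
    """Sum of squared errors between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


# Hypothetical step-shaped response: flat at 0, then flat at 10.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5

a, b = fit_line(xs, ys)
threshold, left_mean, right_mean = fit_stump(xs, ys)

line_sse = sse(ys, [a + b * x for x in xs])
stump_sse = sse(ys, [left_mean if x <= threshold else right_mean for x in xs])
print(line_sse, stump_sse)
```

On this toy data the stump reproduces the step exactly, while the straight line cannot. Real ecological data are noisier, and ensembles such as RandomForest combine many deeper trees rather than a single split.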
  • 25. Predictions
      Predictions as the prime goal, and linked with Adaptive Management (done in Random Forest)
      Best data used and available?
      Pelagic King Penguin prediction using RF (French telemetry data from SCAR-MARBIN)
  • 26. Outliers and variance … [plot: reality vs. predicted] Root Mean Square Error (RMSE)
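As a reminder of what the RMSE on slide 26 measures, here is a minimal sketch of the standard definition, the square root of the mean squared difference between reality and prediction. The numbers are invented for illustration.

```python
from math import sqrt


def rmse(reality, predicted):
    """Root Mean Square Error between observed and predicted values."""
    if len(reality) != len(predicted) or not reality:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reality, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical values: three perfect predictions and one outlier (8 vs. 4).
reality = [3.0, 5.0, 2.0, 8.0]
predicted = [3.0, 5.0, 2.0, 4.0]
error = rmse(reality, predicted)
print(error)  # 2.0
```

Because the differences are squared before averaging, the single outlier accounts for the entire error here.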
  • 27. Outliers and variance …
      Confusion Matrix (=> ROC curve)
                     Predicted Yes   Predicted No
      Reality Yes    80%             20%
      Reality No     20%             80%
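The step from a confusion matrix to a ROC curve on slide 27 can be sketched as follows. The 20 hypothetical cases are chosen to reproduce the 80% / 20% split shown on the slide. Each classification threshold yields one (false positive rate, true positive rate) point, and a ROC curve is traced by sweeping that threshold.

```python
def confusion_counts(reality, predicted):
    """Return (TP, FN, FP, TN) for binary labels coded as 1 = yes, 0 = no."""
    tp = sum(1 for r, p in zip(reality, predicted) if r == 1 and p == 1)
    fn = sum(1 for r, p in zip(reality, predicted) if r == 1 and p == 0)
    fp = sum(1 for r, p in zip(reality, predicted) if r == 0 and p == 1)
    tn = sum(1 for r, p in zip(reality, predicted) if r == 0 and p == 0)
    return tp, fn, fp, tn


# Hypothetical data: 10 real "yes" cases and 10 real "no" cases.
reality = [1] * 10 + [0] * 10
predicted = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8

tp, fn, fp, tn = confusion_counts(reality, predicted)
true_positive_rate = tp / (tp + fn)   # share of real "yes" predicted "yes"
false_positive_rate = fp / (fp + tn)  # share of real "no" predicted "yes"
print((false_positive_rate, true_positive_rate))  # (0.2, 0.8)
```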
  • 28. Moving into Data Mining (non-parametric, rel. high data error)
      A heap of data, e.g. online and merged, free of research design
      Your classic Frequentist Statistician might get very scared and horrified with such data…
  • 29. Moving into Data Mining
      [diagram labels: A heap of data, e.g. online and merged; Describe Patterns, Screening Outliers; Data Quantification (~prediction); Informed Traditional Analysis]
      Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
  • 30. Data Mining
      [diagram labels: A heap of data, e.g. online and merged; Describe Patterns, Screening Outliers; Data Quantification (~prediction); Informed (Traditional) Analysis]
      Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
      Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 122, Springer-Verlag Berlin Heidelberg.
  • 31. Data Mining ⇒ SOFTWARE
      This becomes a standard, e.g. BEST PROFESSIONAL PRACTICE
      [diagram labels: A heap of data, e.g. online and merged; Describe Patterns, Screening Outliers; Data Quantification (~prediction); Informed (Traditional) Analysis]
      Craig, E., and F. Huettmann. (2008). Using “blackbox” algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
      Magness, D.R., F. Huettmann, and J.M. Morton. (2008). Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A-E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 122, Springer-Verlag Berlin Heidelberg.
  • 32. (Adaptive/Science-based) Wildlife Management: How it works…
      - Identify a problem
      - Do the science
      - Provide a management suggestion
      - Implement science-based management
      - Assess and fine-tune
      => Pro-active (before trouble occurs)
      => Predictive in Space and Time
  • 33. Paradigm Shifts
      - Data Mining as a means to inform an analysis
      - Predictions as a goal in itself
      - Linear regressions and algorithms as a poor performer for modern problems of relevance to mankind
      - Metrics like r2, p-values and AIC as being suboptimal
      - Statistics based on (very advanced) software
      - Decision-support systems, Pro-Active Management and Operations Research
  • 34. RandomForest as a scientific discipline?!
      - a big and new research field per se (e.g. machine learning, ensemble models)
      - comes naturally with computing power and the internet
      - changes science (and fieldwork, e.g. design-based)
      - changes education (e.g. universities & stats depts)
      - changes pro-active/adaptive management, multi-species
      - changes society
      - affects land- and seascapes
  • 35. Adding Spatial Complexity: What is a GIS (Geographic Information System)?
      “Computerized Mapping”
      Consists of Hardware & Software (Monitor, PC and Database)
      ArcGIS by ESRI as a major software suite
      => Can be linked with Machine Learning, e.g. for predictions, classifications and data mining
  • 36. What is a GIS (Geographic Information System)?
  • 37. RandomForest & GIS: Mapping Supervised & Unsupervised Classification
      Supervised Classification:
      - Multiple Regression (classification or continuous)
      - Multiple Response, e.g. YAIMPUTE RandomForest
      Unsupervised Classification:
      1. Proximity Matrix via Bagging/Voting (RF)
      2. Similarity Matrix
      3. e.g. Regular Clustering (mclust, PAM)
      4. Visualize Result
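The first unsupervised step on slide 37, the proximity matrix, can be sketched in a few lines. In the usual Random Forest definition, the proximity of two samples is the fraction of trees in which both end up in the same terminal node (leaf). The leaf ids below are hypothetical stand-ins for what a fitted forest would report; no forest is trained here.

```python
def proximity_matrix(leaves):
    """leaves[i][t] is the id of the leaf that sample i reaches in tree t.

    Returns an n x n matrix whose (i, j) entry is the fraction of trees
    in which samples i and j share a leaf.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    return [
        [sum(a == b for a, b in zip(leaves[i], leaves[j])) / n_trees
         for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Hypothetical leaf assignments: 4 samples, 3 trees.
leaves = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 5, 3],
    [2, 5, 3],
]

proximity = proximity_matrix(leaves)
# Clustering routines usually expect a distance-like input, so 1 - proximity is a common choice.
dissimilarity = [[1 - p for p in row] for row in proximity]
print(proximity[2][3], proximity[0][2])  # 1.0 0.0
```

The resulting dissimilarity matrix is the kind of input a clustering method such as PAM (named in step 3) accepts. Which similarity transform and clustering routine the author used is not stated on the slide.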
  • 38. 11 Cliome Clusters (RF): Climate Cluster Data, Canada & Alaska. Credit: M. Lindgren et al.
  • 39. Teaching Machine Learning: What’s the issue?
      - the current teaching concept re. math & stats
      - wider introduction of machine learning
      - parsimony vs. holistic analysis and approach
      - better predictions
  • 40. Teaching math, statistics and machine learning…
      Usually, machine learning starts where maximum-likelihood theory ends…
      => maximum-likelihood starts at college…
      But it does not have to be that way!
  • 41. Teaching math, statistics and machine learning…
      Student’s age | Topic learned
      16            | Basic Math
      5             | Frequency Statistics (Maximum Likelihood)
      23 - 30       | Machine Learning
  • 42. Teaching math, statistics and machine learning… the better way
      Student’s age | Topic learned
      15            | Basic Math
      16            | Frequency Statistics
      18            | (Maximum Likelihood)
      19            | Machine Learning
      Why & How done? via Greybox, Blackbox
  • 43. Teaching math, statistics and machine learning…
      Parsimony (complexity vs. variance explained)
      Model name | AIC etc
      Model 1    | 1000
      Model 2    | 900
      Model 3    | 600
      Model 4    | 500
      Model 5    | 1200
      Model 6    | 1100
      Breaking down a complex real-world problem into a single linear term… and for a mechanistic understanding…??
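For readers less familiar with how a table like the one on slide 43 is used, the sketch below applies the conventional rule of picking the candidate with the lowest AIC and reporting each model's difference from it. The AIC values are those shown on the slide. The variable names are illustrative only, and the slide itself questions whether this kind of selection is the right goal.

```python
# AIC values as listed on slide 43.
aic = {
    "Model 1": 1000,
    "Model 2": 900,
    "Model 3": 600,
    "Model 4": 500,
    "Model 5": 1200,
    "Model 6": 1100,
}

best_model = min(aic, key=aic.get)  # lowest AIC is preferred by convention
delta_aic = {name: value - aic[best_model] for name, value in aic.items()}

print(best_model)            # Model 4
print(delta_aic["Model 5"])  # 700
```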
  • 44. Teaching math, statistics and machine learning…
      Predictions (knowing things ahead of time)
      - For spatial predictions
      - For temporal predictions
      - For a pro-active management
  • 45. To… Salford Systems, Dan Steinberg and his team, our students, L. Strecker, S. Linke, and many many project and research collaborators world-wide over the years!! Questions?
