Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet

Paper at the ACM Multimedia 2016 Brave New Ideas Session on Societal Impact of Multimedia Research:
Alexis Joly, Hervé Goëau, Julien Champ, Samuel Dufour-Kowalski, Henning Müller, and Pierre Bonnet. 2016. Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet. In Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, New York, NY, USA, 958-967.

Paper: https://hal-lirmm.ccsd.cnrs.fr/hal-01373762/document
Pl@ntNet app:
https://play.google.com/store/apps/details?id=org.plantnet&hl=en

Published in: Science

  1. Crowdsourcing Biodiversity Monitoring: How Sharing your Photo Stream can Sustain our Planet
     http://www.plantnet-project.org/
     Authors: Alexis Joly, Hervé Goëau, Julien Champ, Samuel Dufour-Kowalski, Henning Müller, Pierre Bonnet
     Acknowledgement: Nozha Boujemaa, Daniel Barthelemy, Jean-François Molino
  2. Context
     • Global warming, food crisis and biodiversity erosion
     • Accurate knowledge of the distribution and evolution of living species is essential
     • Ultimate goal: sustainable, global biodiversity monitoring tools
       - Surveillance of the consequences of global warming, plant and animal diseases, the impact of human activities, and the spread of invasive species
     • The taxonomic impediment
       - Fewer and fewer people can identify plants and animals
       - Fewer and fewer nature observers can produce biodiversity data
  3. Pl@ntNet project (launched 2010)
     Bridging the taxonomic impediment through an innovative crowdsourcing workflow based on automated plant identification
  4. Pl@ntNet project (launched 2010)
     The positive feedback loop does work!
  5. Pl@ntNet app today
     - 2.5 M downloads
     - 14 M sessions
     - 10-50 K users / day
     - 150 countries
     - Languages: FR, EN, ES, IT, PT, DE, AR, ZH, SK
  6. Pl@ntNet data
     Validated data = 3% of the queried plant images
     - 30K collaboratively revised observations per year (TelaBotanica)
     - Publicly available through international initiatives (GBIF, LifeCLEF)
     - Validation is a slow and hard process
  7. Pl@ntNet data
     Unlabeled data = 97% of the raw query stream
     - More than 1 million observations per year (5.1M today)
     - Not exploited today
     - A high potential for biodiversity monitoring
  8. Pl@ntNet mobile search logs
  9. Species Distribution Modelling from UGC image streams?
     Can we predict (real-time and/or long-term) Species Distribution Models directly from Pl@ntNet mobile search logs? Or from any other UGC image stream?
  10. Challenges
      1. Improve recognition in open-world streams
  11. Recognizing plants in an open world
      An open-set recognition problem
      - With 10Ks of known and unknown classes
      - Highly imbalanced training data
      We carried out an evaluation within LifeCLEF 2016
      - Training set of 1000 known species (113K pictures)
      - Test set = 8K manually annotated Pl@ntNet queries (half known, half distractors)
      - Classification Mean Average Precision on a subset of 26 invasive species
  12. Recognizing plants in an open world
      1. Improve automatic recognition of plants in open-world streams
      - Novelty affects all systems, whatever rejection method is used (even supervised)
      - No rejection method can deal with strong novelty rates
      → We are still far from being able to monitor biodiversity in Twitter or Snapchat streams!
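
The rejection idea on slide 12 can be illustrated with a minimal sketch. It assumes the simplest possible rejection method, a fixed threshold on the top classifier score; the slides do not say which rejection methods were evaluated, and the function name, the threshold value and the scores below are all hypothetical.

```python
from typing import Dict, Optional


def classify_with_rejection(scores: Dict[str, float],
                            threshold: float = 0.5) -> Optional[str]:
    """Return the top-scoring known species, or None if the query is rejected.

    Assumption: `scores` maps known species to classifier confidences.
    A query whose best score falls below `threshold` is treated as
    belonging to an unknown class (novelty).
    """
    if not scores:
        return None
    best_species = max(scores, key=scores.get)
    return best_species if scores[best_species] >= threshold else None


# Hypothetical classifier outputs for two queries.
confident = {"Quercus robur": 0.91, "Quercus petraea": 0.09}
ambiguous = {"Quercus robur": 0.40, "Quercus petraea": 0.35, "Fagus sylvatica": 0.25}

print(classify_with_rejection(confident))
print(classify_with_rejection(ambiguous))
```

A threshold of this kind only helps when unknown classes produce low scores, which is one way to read the slide's remark that no rejection method copes with strong novelty rates.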
  13. Challenges
      1. Improve recognition in open-world streams
      2. Use geo-location and date
  14. Geo-location and date?
      - Not so easy!
      - No real success within 5 years of the PlantCLEF challenge
      - Why?
        - Plant distributions are not well known (this is actually our objective!)
        - Habitats are extremely heterogeneous from one species to another (some plants live everywhere while others live in very specific biotopes)
      - What can we do?
        - Big occurrence data (like GBIF) might help but is biased, heterogeneous and incomplete (no absence data)
        - Environmental variables might help but are heterogeneous, incomplete, noisy, etc.
      → This will be one of the focuses of LifeCLEF 2017
  15. Challenges
      1. Improve recognition in open-world streams
      2. Use geo-location and date
      3. Use taxonomy
  16. Using taxonomy?
      Taxonomy = a hierarchical classification built by botanists over hundreds of years
      → 600 families > 14K genera > 300K species
      But taxonomy is highly heterogeneous and imbalanced
      → Classical hierarchical classification algorithms cannot be used directly
      - Some genera have up to 1000 very similar species
      - But many genera and families include very distinct species
      - The long-tail distribution occurs at each level and in each node
      [Figure labels: Genus Orobanche; Genus Bupleurum; Family Bupleurum]
  17. Challenges
      1. Improve recognition in open-world streams
      2. Use geo-location and date
      3. Use taxonomy
      4. Optimize and boost training data production
  18. Pro-active crowdsourcing
      [Diagram labels: Classifier (CNN); Annotators (heterogeneous skills); Tasks selection & assignment (?)]
  19. Training
      1. ConvNet predictions
      2. Create quizzes by Monte Carlo sampling
      3. Sort quizzes by difficulty (= success expectation across all workers)
      [Diagram labels: Training; Beginner; Intermediate]
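
The three steps on slide 19 can be sketched as follows. This is only one possible reading of the slide: it assumes a quiz is a short list of candidate species drawn from the ConvNet's predicted probabilities, and that difficulty is the mean success rate over workers. The function names, the species probabilities and the success rates are hypothetical.

```python
import random
from typing import Dict, List


def sample_quiz(predictions: Dict[str, float], n_choices: int,
                rng: random.Random) -> List[str]:
    """Draw distinct candidate species, weighting each draw by its
    predicted probability (Monte Carlo sampling without replacement)."""
    remaining = dict(predictions)
    choices: List[str] = []
    while remaining and len(choices) < n_choices:
        species = rng.choices(list(remaining),
                              weights=list(remaining.values()), k=1)[0]
        choices.append(species)
        del remaining[species]
    return choices


def sort_by_difficulty(success_rates: Dict[str, List[float]]) -> List[str]:
    """Order quiz ids from easiest to hardest, using the mean success
    rate across all workers as the success expectation."""
    def expected_success(quiz_id: str) -> float:
        rates = success_rates[quiz_id]
        return sum(rates) / len(rates)
    return sorted(success_rates, key=expected_success, reverse=True)


# Hypothetical ConvNet output for one image.
predictions = {"Orobanche minor": 0.5, "Orobanche hederae": 0.3,
               "Orobanche alba": 0.15, "Bupleurum falcatum": 0.05}
quiz = sample_quiz(predictions, n_choices=3, rng=random.Random(0))
print(quiz)

# Hypothetical per-worker success rates for three quizzes.
rates = {"q1": [0.9, 0.8], "q2": [0.2, 0.4], "q3": [0.6, 0.5]}
order = sort_by_difficulty(rates)
print(order)
```

Sampling by predicted probability tends to put visually confusable species in the same quiz, which is a plausible motivation for this step but is not stated on the slide.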
  20. Experiments: Simpson's paradox
      [Plot: identification success rate versus declared expertise]
      Workers are assigned tasks they have been trained on before!
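
For readers unfamiliar with Simpson's paradox, here is a small worked example. The counts are invented for illustration and are not the experiment's data; the grouping into "easy" and "hard" tasks is also an assumption. It shows how one group can have the higher success rate within every task type and still the lower rate overall, purely because of how tasks were assigned.

```python
from typing import Dict, Tuple

# (successes, attempts) per task type; all counts are invented.
Counts = Dict[str, Tuple[int, int]]
beginners: Counts = {"easy": (80, 100), "hard": (2, 10)}
experts: Counts = {"easy": (9, 10), "hard": (30, 100)}


def rate(successes: int, attempts: int) -> float:
    """Success rate for one group on one task type."""
    return successes / attempts


def overall_rate(counts: Counts) -> float:
    """Success rate pooled over all task types."""
    total_successes = sum(s for s, _ in counts.values())
    total_attempts = sum(a for _, a in counts.values())
    return total_successes / total_attempts


for task in ("easy", "hard"):
    print(task, rate(*beginners[task]), rate(*experts[task]))
print("overall", overall_rate(beginners), overall_rate(experts))
```

In this toy setting the experts mostly receive hard tasks, so their pooled rate drops below the beginners' even though they do better on each task type. Comparing pooled rates across expertise levels is therefore misleading when task assignment depends on the worker.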
  21. Challenges
      1. Improve recognition in open-world streams
      2. Use geo-location and date
      3. Use taxonomy
      4. Optimize and boost data validation processes
      5. Control bias in Species Distribution Models
  22. Modeling bias factors?
      Objective: estimate the relative abundance $A_{ij}$ of species $i$ in place $j$, supposing $N_{ij} \sim \mathrm{Law}(A_{ij}, B_{ij})$
      - $N_{ij}$: number of observations of $i$ in $j$
      - $A_{ij}$: abundance of $i$ in $j$
      - $B_{ij}$: bias, which might be complex because of the diversity of contributors, the opportunistic nature of the observations, and the confusions
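
To make the roles of the three quantities on slide 22 concrete, here is a minimal sketch under a strong simplifying assumption that the slide does not make: the expected count is the product of abundance and bias, and the bias is already known. Under that assumption, dividing counts by bias and normalizing gives a relative abundance estimate. The function name and all numbers are hypothetical.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, float],
                       bias: Dict[str, float]) -> Dict[str, float]:
    """Estimate relative abundance of each species in one place.

    Assumption (not from the slides): E[N_ij] = A_ij * B_ij with B_ij
    known and positive, so A_ij is estimated by N_ij / B_ij and then
    normalized over the species observed in that place.
    """
    corrected = {species: counts[species] / bias[species] for species in counts}
    total = sum(corrected.values())
    return {species: value / total for species, value in corrected.items()}


# Hypothetical place j: species "a" is photographed far more readily
# than species "b", so raw counts overstate its abundance.
counts = {"a": 90, "b": 10}
bias = {"a": 0.9, "b": 0.1}
estimate = relative_abundance(counts, bias)
print(estimate)
```

In practice the bias is unknown and has to be modeled jointly with abundance, which is the open question the slide raises.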
  23. Conclusion: biodiversity informatics needs MM
      Biodiversity Dimension | Biodiversity Conservation Challenge | Who? | Multimedia research topics
      Aesthetic | Enjoy and love it | Everybody | IR, Recommendation
      Diverse | Identify and classify | Taxonomists | Multimodal & Large-scale classification
      Complex | Decipher & model | Biologists | Multimedia Data analytics
      Unknown | Discover & associate | Taxonomists | Multimedia Data mining
      Endangered | Define & implement policies | Decision makers | Visualization, Interactivity
      Indispensable | Use sustainably | Everybody | Cross-media streams monitoring
  24. Thank you