Schuh ecn2013 tcn_data_structure

  1. The structure of insect-plant host data as derived from museum collections: an analysis based on data from the NSF-funded Tritrophic Database Thematic Collections Network (TTD-TCN)
     Randall T. Schuh, Katja Seltmann, Christine A. Johnson
     American Museum of Natural History
  2. TTD-TCN Rationale: “The data captured via ADBC funding will dramatically improve our understanding of the relationships among the more than 11,000 species of North American Hemiptera (scale insects, aphids, leafhoppers, true bugs, and relatives), their food plants, and the wasps that parasitize the hemipterans.”
  3. The data we will evaluate today were captured with Arthropod Easy Capture (AEC), a Web-based application developed with NSF Planetary Biodiversity Inventory funding and used by the TTD-TCN. AEC is open source and is being implemented as an appliance by the ADBC-funded Home Uniting Biocollections (HUB, iDigBio); through that implementation it will be installable with a “one-click” installer. The server code is available on SourceForge: http://sourceforge.net/projects/arthropodeasy/
  4. Specimen Count by Project (1,144,240)
  5. Sources of Insect-Plant Host Data
  6. Data on insect-plant relationships are available primarily from labels on insect specimens, as opposed to labels on plant specimens. Substantial amounts of data were captured for the family Miridae on a world basis under NSF Planetary Biodiversity Inventory funding between 2003 and 2011. The TTD-TCN is a collaboration among 17 US entomological institutions. The following graph shows the institutional contributions from these two projects, measured as numbers of specimen records. The TTD-TCN is defining the field structure for host data as used by iDigBio and by other Web aggregators such as DiscoverLife.org.
  7. Choice of Groups for Analysis
  8. To evaluate the nature of insect-host plant data derived from collections, we need to look at groups that offer large data sets. The necessary attributes are:
       1. Large numbers of specimen records with host information
       2. Large numbers of collecting events
       3. Substantial diversity of host taxa
     At present, the following taxa in our database meet these criteria:
  9. Hemiptera
       Sternorrhyncha: Aphididae (4,400 species worldwide)
       Auchenorrhyncha: Membracidae (3,200 species worldwide)
       Heteroptera: Miridae (11,000 species worldwide)
     The distribution of raw data for each taxon is shown in the following four graphs.
  10. Collection Events [graph: specimens collected by year for Miridae, Aphididae, Membracidae, and combined data]
  11. Host Records as a Proportion of Collecting Events [graph; categories: hosts unique, hosts non-unique, without hosts]
  12. [graphs for Aphididae, Miridae, and Membracidae]
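The three selection attributes listed on slide 8 can be computed per family from specimen records. The following is a minimal Python sketch under stated assumptions. It is not the TTD-TCN or AEC code. The record schema (`SpecimenRecord` with `family`, `collecting_event_id`, `host_genus`) and the function names are hypothetical. The cut-offs passed to `meets_criteria` are also hypothetical, because the slides give no numeric thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpecimenRecord:
    """One digitized specimen record (hypothetical minimal schema)."""
    family: str                # e.g. "Miridae"
    collecting_event_id: str   # hypothetical collecting-event identifier
    host_genus: Optional[str]  # None when the label gives no host


def summarize_by_family(records: Iterable[SpecimenRecord]) -> dict:
    """Per family: records with a host, distinct collecting events,
    and distinct host genera (the three attributes on slide 8)."""
    with_host = defaultdict(int)
    events = defaultdict(set)
    hosts = defaultdict(set)
    for r in records:
        events[r.family].add(r.collecting_event_id)
        if r.host_genus is not None:
            with_host[r.family] += 1
            hosts[r.family].add(r.host_genus)
    return {
        family: {
            "records_with_host": with_host[family],
            "collecting_events": len(event_ids),
            "distinct_host_genera": len(hosts[family]),
        }
        for family, event_ids in events.items()
    }


def meets_criteria(summary: dict, min_records: int, min_events: int, min_hosts: int) -> bool:
    """Apply hypothetical minimums to one family's summary."""
    return (
        summary["records_with_host"] >= min_records
        and summary["collecting_events"] >= min_events
        and summary["distinct_host_genera"] >= min_hosts
    )


records = [
    SpecimenRecord("Miridae", "E1", "Larrea"),
    SpecimenRecord("Miridae", "E1", "Larrea"),
    SpecimenRecord("Miridae", "E2", None),
    SpecimenRecord("Aphididae", "E3", "Quercus"),
]
summary = summarize_by_family(records)
print(summary["Miridae"])
# {'records_with_host': 2, 'collecting_events': 2, 'distinct_host_genera': 1}
```

Counting collecting events separately from specimen records matters because several specimens from one event (E1 above) add records but not independent observations.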
  13. Algorithmic Assessment of Data Quality
  14. [flowchart]
       COLLECTING EVENT DATA: the occurrence of an insect species on a plant genus
       HEURISTIC DATA: Larvae present? Multiple specimens? Voucher specimen available?
       1. Compute the frequency of occurrence on a particular plant genus
       2. Compare with all insect collecting events on any plant
       3. Scores: High, Medium, or Low confidence in the insect-plant association
       4. ANALYSIS: evaluate insect/plant associations with different scores
       5. Modify the algorithm to improve the fit of the model to the data, based on the results
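Steps 1 and 2 of the flowchart on slide 14 can be sketched in Python. The sketch computes the frequency on a plant genus and compares it with all of an insect's collecting events on any plant. It is a minimal illustration under these assumptions, not the TTD-TCN implementation:

- The input is a hypothetical list of `(event_id, insect_species, plant_genus)` tuples.
- Each collecting event is counted once per species and genus.
- Events without a host are left out of the denominator.
- The names `host_frequencies` and `species_A` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (event_id, insect_species, plant_genus); plant_genus is None when no host is recorded
Event = Tuple[str, str, Optional[str]]


def host_frequencies(events: Iterable[Event]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For each (insect species, plant genus) pair return (y, f), where
    y = number of distinct collecting events on that genus and
    f = y / number of distinct collecting events of the species on any plant."""
    on_genus = defaultdict(set)  # (species, genus) -> event ids
    on_any = defaultdict(set)    # species -> event ids with any plant host
    for event_id, species, genus in events:
        if genus is None:        # events without a host do not enter the ratio
            continue
        on_genus[(species, genus)].add(event_id)
        on_any[species].add(event_id)
    return {
        (species, genus): (len(ids), len(ids) / len(on_any[species]))
        for (species, genus), ids in on_genus.items()
    }


events = [
    ("E1", "species_A", "Larrea"),
    ("E2", "species_A", "Larrea"),
    ("E3", "species_A", "Prosopis"),
    ("E4", "species_A", None),
]
freqs = host_frequencies(events)
```

Here `species_A` has three host-bearing events. Larrea therefore gets y = 2 with f = 2/3, and Prosopis gets y = 1 with f = 1/3. If one event lists several genera, the frequencies for a species can sum to more than 1.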
  15. Analysis: confidence scores
       High: $f(y) \ge 15\% \wedge y \ge 5$
       Medium: $\left(f(y) \ge 2\% \wedge y \ge 3\right) \vee \left(f(y) \ge 15\% \wedge y \ge 2\right)$
       Low: not high or medium
       Data: $x = 1$; Heuristic: $x = y' + y$
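Read as a decision rule, the thresholds on slide 15 suggest the following Python sketch. It rests on assumptions the slides do not confirm:

- `y` is the number of collecting events of an insect species on a plant genus.
- `f` is that count as a fraction of the species' collecting events on any plant, following slide 14.
- The heuristic term x = y′ + y is not modeled.

The function name `confidence_score` is hypothetical.

```python
def confidence_score(y: int, f: float) -> str:
    """Score one insect-plant association with the thresholds read from slide 15.

    y: collecting events of the insect species on the plant genus (assumed meaning)
    f: fraction (0-1) of the species' host-bearing collecting events on that genus
    """
    if f >= 0.15 and y >= 5:
        return "High"
    if (f >= 0.02 and y >= 3) or (f >= 0.15 and y >= 2):
        return "Medium"
    return "Low"  # "not high or medium"


for y, f in [(6, 0.40), (2, 0.50), (3, 0.05), (1, 1.00)]:
    print(y, f, confidence_score(y, f))
# 6 0.4 High
# 2 0.5 Medium
# 3 0.05 Medium
# 1 1.0 Low
```

The last case shows why a single collecting event scores Low even when it is the only recorded host. No repetition is available to support a confidence limit.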
  16. Results of Analyses
  17. Using Larrea (creosote bush) as an example host
  18. Miridae/Larrea Association Network
  19. Miridae/Larrea Association Network with High Confidence
  20. Reasons for Low Host Scores and Methods for Improving Data Quality
  25. Reasons for Low Scores
       1. Actual low host specificity: indicated when a large number of collecting events are distributed across many plant taxa.
       2. Movement of adult specimens to alternative food sources: the algorithm reveals apparent vagility when there are multiple hosts and little or no host repetition across collecting events.
       3. Commingling of specimens in the field: the algorithm flags this problem when insect specimen numbers for a host taxon are low and the host does not recur across collecting events.
       4. Mislabeling of hosts on insect specimens from a collecting event: difficult to distinguish from actual polyphagy when all specimens from an event are mislabeled. Often appears as a unique host for a given insect taxon. More fieldwork is needed.
       5. Single collecting events: indistinguishable from absolute host fidelity based on multiple events, except that no confidence limit can be assessed. Heuristics such as the presence of larvae and large numbers of specimens lend credence to the presumed association. Resolved only by further fieldwork.
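Two of the patterns on slide 25 can be detected without any numeric threshold: a species known from a single collecting event (reason 5), and a host genus seen in only one of a species' several events (the "unique host" of reason 4, also relevant to reasons 2 and 3). The following Python sketch flags both. Its assumptions:

- The input is the same kind of hypothetical `(event_id, insect_species, plant_genus)` tuples.
- The function name and flag strings are invented.

The flags only mark candidates for review or fieldwork. As the slide notes, they cannot by themselves separate mislabeling from genuine polyphagy.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (event_id, insect_species, plant_genus); plant_genus is None when no host is recorded
Event = Tuple[str, str, Optional[str]]


def low_score_flags(events: Iterable[Event]) -> Dict[Tuple[str, str], List[str]]:
    """Flag two slide-25 patterns for each (species, genus) pair:
    - "single collecting event": the species has only one host-bearing event (reason 5)
    - "unique host": the genus occurs in only one of the species' several events (reason 4)
    """
    on_genus = defaultdict(set)  # (species, genus) -> event ids
    on_any = defaultdict(set)    # species -> event ids with any plant host
    for event_id, species, genus in events:
        if genus is None:
            continue
        on_genus[(species, genus)].add(event_id)
        on_any[species].add(event_id)
    flags = {}
    for (species, genus), ids in on_genus.items():
        found = []
        if len(on_any[species]) == 1:
            found.append("single collecting event")
        elif len(ids) == 1:
            found.append("unique host")
        flags[(species, genus)] = found
    return flags


events = [
    ("E1", "species_A", "Larrea"),
    ("E2", "species_A", "Larrea"),
    ("E3", "species_A", "Prosopis"),
    ("E9", "species_B", "Quercus"),
]
flags = low_score_flags(events)
```

In this example Larrea is repeated for `species_A` and gets no flag. Prosopis is flagged as a unique host. `species_B` is flagged as known from a single collecting event.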
  26. Implication of Results
  27. Conclusions
  28.
       1. Insect collections offer substantial data on host relationships, even though a majority of the specimens lack such information.
       2. Our algorithm demonstrates a method for assessing data quality on a large scale. Our initial analyses show that:
          - We can have confidence in a significant proportion of the available information.
          - The data demonstrate a substantial degree of host specificity in our three target groups.
       3. Assessing the degree of host specificity requires a scoring method that takes into account biological attributes, collecting techniques, and approaches to data capture in the field.
  29. Acknowledgments
       • Participating TCN and PBI Institutions
       • iDigBio
       • AMNH Database Data-entry Personnel
       • Participating TCN Data-entry Personnel
       • Michael D. Schwartz
       • National Science Foundation
