Schuh ecn2013 tcn_data_structure
Published in: Technology, Education
  • Good morning. Today I would like to speak to you about data on insect-plant associations as derived from insect collections. This presentation is a joint effort by Katja Seltmann, Christine Johnson, and me as part of our work on a TCN award from the NSF.
  • In this talk we will use TCN data to examine host data for three families of herbivorous hemipterans and evaluate three propositions:
    The degree to which collections contain information on host relationships
    The degree of confidence we are able to place in that information, and
    The degree to which those data demonstrate host specificity or the lack thereof
  • The AEC database has supported data capture for a number of NSF-supported projects. This slide shows the relative proportion of data captured by these projects, which in aggregate represent more than 1 million specimen records, the largest numbers coming from the TCN project which represents about two-thirds of the red slice of the pie.
  • Here we see the institutions with more than 10,000 specimen records, which have therefore made the most significant contributions to our knowledge of host relationships.
  • These graphs plot specimens against time, with each point representing a collecting event. The graph in the lower right is the sum of collecting events for all three groups. Note that the scale for each graph is different, with the Miridae having a much greater number of specimens per collecting event than Aphididae and Membracidae. These data represent all collecting events, irrespective of whether host data are involved or not.
  • Here we see the data in the prior graphs transformed to show the numbers of collecting events with host records for each taxon, as well as information for the remaining taxa in the database. Comparison of the right-hand bar with the remaining three gives a clear indication of the reasons for the choice of taxa for this analysis.
    Blue is for records without host information;
    Brown is for non-unique hosts;
    Yellow represents unique hosts, in other words, all host records for the insect taxon are from the same host genus in this analysis.
    Aphids almost always come with host information as a result of the collecting methods that are used in the group.
  • Here we can see the numbers of plant families on the left, and plant genera on the right, occupied by each of the three groups we have chosen to analyze. Relative to the size of the taxon sample, the Aphididae show the highest diversity of host information at both the family and generic levels. The family data also support the proposition that all three taxa are specializing on many of the same plant families, a phenomenon that is reinforced in the following graphs.
  • Here we see numbers of collecting events by plant family for the Miridae,
  • For the Membracidae, and
  • For the Aphididae. You will note that a few families loom large as hosts, usually in all three groups, notably Asteraceae, Fabaceae, Fagaceae, Rosaceae, and Pinaceae, with most other families occurring in lower frequencies.
  • Our approach to assessing the strength of host data is through a DECISION TREE: the first set of decisions is based on the frequencies and collecting event counts; the second set of decisions is based on the heuristic properties. Scores are based on fit of the model to the data and ranked from high to low.
    The main contributor to a score is frequency (f). Low frequency does not argue that information for a taxon should be completely disregarded. The score for an insect-plant association can be increased through information from the heuristics component, for example, having insect larvae collected on a given plant species, which would indicate a strong association even when there is a low number of collecting events. The existence of large numbers of specimens or of authoritatively identified plant vouchers would also improve the score for a given association. Associations with a frequency of 1 make no argument for whether the data are strong or not because no confidence limits can be established. The only way to bolster the score is through more collecting. The single-event data do suggest that when going to the field the first host to be investigated should be the one for which we already have a presumed association.
    For example, in order to get into the high category, the frequency of y, f(y), has to be greater than or equal to 15% AND y has to be equal to or greater than 5.
    In order to get a medium score, you need either one or the other score.
    The value of f(y) [frequency of y] is obtained by the following formula:
  • Here we see confidence values for the three families we have analyzed. As a proportion the Membracidae have the most high scores (in yellow). In absolute terms the Miridae present the most data on host fidelity with 842 associations with high scores, and in blue 1844 associations with medium scores; but, they also possess the greatest number of host data points based on a single collecting event (pink), a situation that obviously demands further fieldwork but may nonetheless be an indicator of a large number of valid host associations. All three insect families have large numbers of putative host associations with low scores (gray), a situation we will return to later in the presentation.
  • Here we see a histogram showing all species of Miridae known to occur on Larrea (Zygophyllaceae) in the American Southwest. The gray portion of the bar indicates the proportion of collecting events known from Larrea, while the other colors indicate the proportions of collecting events from other plant families. What does our decision tree approach tell us about these data?
    Larrea served as the model from which we developed the decision-tree criteria.
  • Here we see those same data plotted in the form of a graph: gray nodes represent insect species, green nodes represent plant genera, the large node representing Larrea. Red lines (edges) represent associations for which the decision tree indicates a high level of confidence in the host association. One might ask “Just what does this graph tell us?”
    The size of the balls is determined by the number of collecting events.
  • This slide makes clear that in order to make sense of these data we need to tease apart the noise from the real signal. This graph shows the signal, whereas the prior graph commingles noise and signal. Even though specimen labels indicate that many taxa had been collected on Larrea, only 5 of those taxa actually appear to be host specific, as seen by their connections to the large green node. All other insect taxa are shown to have their actual (breeding) host associations with taxa other than Larrea, a result that may not be clear from a naïve interpretation of the data. We might therefore wish to look at the reasons why we get these spurious answers. This graph is based on high scores only.
  • When the noise is filtered out for the Miridae as a group by the elimination of low scores, and insect genera are plotted against plant genera, we see distinct patterns develop in the group. Here we see plant groups which have high herbivore diversity and in which we also have high confidence in the data. If this graph were done on a species-by-species comparison, the strength of the signal would probably be even greater, although more complex in terms of presentation.
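
The decision-tree scoring described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that f(y) is the share of an insect's host-bearing collecting events that fall on one plant genus (f(y) = y / x, with x = y′ + y as on the slide), that the medium branch groups as (f(y) ≥ 2% and y ≥ 3) or (f(y) ≥ 15% and y ≥ 2), and that x = 1 marks a single collecting event. The function names `frequency` and `score` are hypothetical. Heuristic adjustments (larvae present, multiple specimens, vouchers) are left out because the source does not say how much they raise a score.

```python
def frequency(y: int, x: int) -> float:
    """Assumed formula: fraction of the insect's collecting events (x)
    that were on the plant genus in question (y), with x = y' + y."""
    return y / x


def score(y: int, x: int) -> str:
    """Classify one insect/plant-genus association.

    y: collecting events of the insect on this plant genus
    x: collecting events of the insect on any plant
    Thresholds follow the slide; their grouping is an assumption.
    """
    if x == 1:
        # A single collecting event: no confidence limit can be assessed.
        return "single"
    f = frequency(y, x)
    if f >= 0.15 and y >= 5:
        return "high"
    if (f >= 0.02 and y >= 3) or (f >= 0.15 and y >= 2):
        return "medium"
    # Everything that is not high or medium.
    return "low"


if __name__ == "__main__":
    # Illustrative counts only, not data from the talk.
    for y, x in [(6, 20), (3, 100), (2, 10), (1, 10), (1, 1)]:
        print(y, x, score(y, x))
```

Keeping the single-event case separate from "low" mirrors the distinction the notes draw between associations that are weakly supported and those for which no confidence limit can be established at all.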
  • Transcript

    • 1. The structure of insect-plant host data as derived from museum collections: An analysis based on data from the NSF-funded Tritrophic Database - Thematic Collections Network (TTD-TCN). Randall T. Schuh, Katja Seltmann, Christine A. Johnson. American Museum of Natural History
    • 2. TTD-TCN Rationale “The data captured via ADBC funding will dramatically improve our understanding of the relationships among the more than 11,000 species of North American Hemiptera (scale insects, aphids, leafhoppers, true bugs, and relatives), their food plants, and the wasps that parasitize the hemipterans.”
    • 3. The data we will evaluate today were captured through a Web-based application developed with NSF Planetary Biodiversity Inventory funding and used by the TTD-TCN. This software application, known as Arthropod Easy Capture (AEC), is built in open-source code, is being implemented as an appliance by the ADBC-funded Home Uniting Biocollections (HUB, iDigBio), and through that implementation will be able to be installed with a “one-click” installation application. Server code is online at Source Forge: http://sourceforge.net/projects/arthropodeasy/
    • 4. Specimen Count by Project (1,144,240)
    • 5. Sources of Insect—Plant Host Data
    • 6. Data on insect-plant relationships are available primarily from labels on insect specimens, as opposed to labels on plant specimens. Substantial amounts of data were captured for the family Miridae on a world basis under NSF Planetary Biodiversity Inventory funding between 2003 and 2011. The TTD-TCN is a collaboration among 17 US entomological institutions. The institutional contributions from these two projects, as represented by numbers of specimen records, are seen in the following graph. The TTD-TCN is defining the field structure for host data as used by iDigBio and by other Web aggregators such as DiscoverLife.org.
    • 7. Choice of Groups for Analysis
    • 8. In order to evaluate the nature of insect-host plant data derived from collections, we need to look at groups that offer large data sets. Necessary attributes are:
         1. Large numbers of specimen records with host information
         2. Large numbers of collecting events
         3. Substantial diversity of host taxa
         At the present time the following taxa in our database meet those criteria:
    • 9. Hemiptera
         Sternorrhyncha: Aphididae (4400 species worldwide)
         Auchenorrhyncha: Membracidae (3200 species worldwide)
         Heteroptera: Miridae (11,000 species worldwide)
         Raw data for each taxon are distributed as seen in the following four graphs.
    • 10. Collection Events [Figure: four panels (Miridae, Aphididae, Membracidae, Combined data); axes: Year, Specimens Collected]
    • 11. Host Records as a Proportion of Collecting Events [Figure legend: Hosts unique, Hosts non-unique, Without hosts]
    • 12. [Figure: panels labeled Aphididae, Miridae, Membracidae]
    • 13. Algorithmic Assessment of Data Quality
    • 14. [Flowchart]
         COLLECTING EVENT DATA: The occurrence of an insect species on a plant genus
         ANALYSIS: evaluate insect/plant associations with different scores
         Modify algorithm to improve fit of model to data based on results
         Compute frequency of occurrence on a particular plant genus
         Compare with all insect collecting events on any plant
         Scores: High, Medium, or Low confidence in insect-plant association
         HEURISTIC DATA: Larvae present? Multiple specimens? Voucher specimen available?
    • 15. Analysis [Decision-tree diagram; rotated labels only partly recoverable]
         Data: x = 1; x = y′ + y
         High: f(y) ≥ 15.00% and y ≥ 5
         Medium: (f(y) ≥ 2.00% and y ≥ 3) ∨ (f(y) ≥ 15.00% and y ≥ 2)
         Low: not high or medium
         Heuristic
    • 16. Results of Analyses
    • 17. Using Larrea (creosote bush) as an example host
    • 18. Miridae/Larrea Association Network
    • 19. Miridae/Larrea Association Network with High Confidence
    • 20. Reasons for Low Host Scores and Methods for Improving Data Quality
    • 25. Reasons for Low Scores
         1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa.
         2. Movement of adult specimens to alternative food sources: Algorithm points out apparent vagility when there are multiple hosts and little or no host repetition across collecting events.
         3. Commingling of specimens in the field: Algorithm points out problem when insect specimen numbers are low for a host taxon and when there is lack of repetition of host occurrence.
         4. Mislabeling of insects for hosts from a collecting event: Difficult to distinguish from actual polyphagy in cases where all specimens from an event are mislabeled. Often seen as a unique host for a given insect taxon. More fieldwork needed.
         5. Single collecting events: Indistinguishable from absolute host fidelity based on multiple events, except no confidence limit can be assessed. Heuristics such as presence of larvae and large numbers of specimens give credence to presumed association. Resolved only by further fieldwork.
    • 26. Implication of Results
    • 27. Conclusions
    • 28. 1. Insect collections offer substantial data on host relationships even though a majority of the specimens lack such information.
         2. Our algorithm demonstrates a method for assessing data quality on a large scale. Our initial analyses show that:
            - We can have confidence in a significant proportion of the available information.
            - The data demonstrate a substantial degree of host specificity in our three target groups.
         3. Assessing the degree of host specificity requires a scoring method that takes into account biological attributes, collecting techniques, and approaches to data capture in the field.
    • 29. Acknowledgments
         - Participating TCN and PBI Institutions
         - iDigBio
         - AMNH Database Data-entry Personnel
         - Participating TCN Data-entry Personnel
         - Michael D. Schwartz
         - National Science Foundation
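
The step from the full Miridae/Larrea network (slide 18) to the high-confidence network (slide 19) amounts to dropping every edge whose score is not high and then asking which insects remain tied to the focal plant. A minimal sketch of that filtering follows, assuming associations are stored as (insect, plant genus, score) tuples. The function names and the placeholder taxa `insect_A`, `plant_X`, and so on are hypothetical, not records from the database.

```python
from collections import defaultdict


def high_confidence_hosts(edges):
    """Map each insect to the set of plant genera it is linked to
    by edges scored "high"; all other edges are treated as noise."""
    hosts = defaultdict(set)
    for insect, plant, score in edges:
        if score == "high":
            hosts[insect].add(plant)
    return dict(hosts)


def specialists_on(edges, plant):
    """Insects whose only high-confidence host is the given plant genus."""
    hosts = high_confidence_hosts(edges)
    return sorted(i for i, plants in hosts.items() if plants == {plant})


if __name__ == "__main__":
    # Placeholder associations for illustration only.
    edges = [
        ("insect_A", "Larrea", "high"),
        ("insect_A", "plant_X", "low"),
        ("insect_B", "Larrea", "low"),
        ("insect_B", "plant_Y", "high"),
        ("insect_C", "Larrea", "single"),
    ]
    print(specialists_on(edges, "Larrea"))
```

In this toy input, `insect_B` carries a Larrea label but its only high-score edge points elsewhere, which is the kind of case the notes describe as having its actual host on a taxon other than Larrea.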
