Franz 2017 sols cbs seminar the limits of synthesis for integrative biology

ASU Center for Biology + Society Conversation Series. School of Life Sciences, October 11, 2017. See also https://doi.org/10.1101/157214

Published in: Science

  1. 1. The limits of synthesis for integrative biology Nico Franz School of Life Sciences, Arizona State University Center for Biology + Society Conversation Series October 11, 2017 – School of Life Sciences, ASU @ http://www.slideshare.net/taxonbytes/franz-2017-sols-cbs-seminar-the-limits-of-synthesis-for-integrative-biology
  2. 2. Premise: The notion of synthesis is appealing doi:10.1371/journal.pbio.1001468
  3. 3. Premise: The notion of synthesis is appealing https://www.nsf.gov/funding/index.jsp
  4. 4. Implementation (in systematics): Synthesis = one view (at a time) • Example: The Open Tree of Life project doi:10.1073/pnas.1423041112
  5. 5. Implementation (in systematics): Synthesis = one view (at a time) • Example: The Global Biodiversity Information Facility (GBIF) https://www.slideshare.net/mdoering/gbif-checklist-bank-and-the-
  6. 6. Implementation (in systematics): Synthesis = one view (at a time) • Example: The Global Biodiversity Information Facility (GBIF) • "It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point also providing the complete higher classification above families. The following 54 sources have been used to assemble the GBIF backbone: …" doi:10.5072/hufs9m
  7. 7. Initial questions – How to integrate biological data? • Does synthesis necessarily mean one view?
  8. 8. Initial questions – How to integrate biological data? • Does synthesis necessarily mean one view? ⇒ No. Most generally: "The combination of components or elements to form a connected whole" (~ Oxford).
  9. 9. Initial questions – How to integrate biological data? • Does synthesis necessarily mean one view? ⇒ No. Most generally: "The combination of components or elements to form a connected whole" (~ Oxford). • Is equating synthesis with one hierarchy empirically and socially adequate, or desirable?
  10. 10. Initial questions – How to integrate biological data? • Does synthesis necessarily mean one view? ⇒ No. Most generally: "The combination of components or elements to form a connected whole" (~ Oxford). • Is equating synthesis with one hierarchy empirically and socially adequate, or desirable? ⇒ Likely not if novel or conflicting views are thereby somehow suppressed.
  11. 11. Initial questions – How to integrate biological data? • Does synthesis necessarily mean one view? ⇒ No. Most generally: "The combination of components or elements to form a connected whole" (~ Oxford). • Is equating synthesis with one hierarchy empirically and socially adequate, or desirable? ⇒ Likely not if novel or conflicting views are thereby somehow suppressed. • What are the consequences of synthesis = one view? • What are the remedies? • What are the incentives to conceive of synthesis differently? • What are the obstacles to doing so?
  12. 12. Initial questions – How to integrate biological data? • Does synthesis necessarily mean one view? ⇒ No. Most generally: "The combination of components or elements to form a connected whole" (~ Oxford). • Is equating synthesis with one hierarchy empirically and socially adequate, or desirable? ⇒ Likely not if novel or conflicting views are thereby somehow suppressed. • What are the consequences of synthesis = one view? • What are the remedies? • What are the incentives to conceive of synthesis differently? • What are the obstacles to doing so? ⇒ To be explored for the use case of biological systematics / biodiversity data.
  13. 13. Language Types Background: Linnaean names refer to "non-types" contingently Dubois. 2005. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf Non-types
  14. 14. Language Background: Linnaean names refer to "non-types" contingently Dubois. 2005. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf Non-types Cleistes bifaria acc. to author 1
  15. 15. Language Background: Linnaean names refer to "non-types" contingently Dubois. 2005. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf Non-types Cleistes bifaria acc. to author 2
  16. 16. Language Background: Linnaean names refer to "non-types" contingently Dubois. 2005. http://sciencepress.mnhn.fr/sites/default/files/articles/pdf/z2005n2a8.pdf Non-types Cleistes bifaria acc. to author 3
  17. 17. The Cleistes/Cleistesiopsis use case ⇒ 20 orchid occurrence records, 3 taxonomies, 1 synthesis ⇒ Let's map them! Charly Lewis, CC BY-SA 3.0 doi:10.1101/157214
  18. 18. A. sec. Radford, Ahles & Bell 1968 – The Bible Source: Radford, Ahles & Bell. 1968. Manual of the vascular flora of the Carolinas. UNC Press, Chapel Hill.
  19. 19. B. sec. Kartesz 2010 – The Federal Standard Source: Kartesz. 2010. Floristic synthesis of North America, version 9-15-2010. Biota of North America Program, Chapel Hill.
  20. 20. C. sec. Weakley 2015 – The "Best" New Regional Flora Source: Weakley. 2015. Flora of the Southern and Mid-Atlantic States. UNC Herbarium, Chapel Hill.
  21. 21. Expert views are in conflict. One aggregate may distort any/all views!
  22. 22. D. sec. SERNEC Raw – Mid-Level Herbarium Aggregator Source: SERNEC Data Portal. 2017. Available from http://sernecportal.org. Accessed 01 June 2017.
  23. 23. E. sec. SERNEC Synthesis – Mid-Level Herbarium Aggregator Source: SERNEC Data Portal. 2017. Available from http://sernecportal.org. Accessed 01 June 2017.
  24. 24. What are the implications of "synthesis"? ⇒ The orchids are variously rare and red-listed Charly Lewis, CC BY-SA 3.0
  25. 25. Individual expert views are in conflict; however... doi:10.1101/157214
  26. 26. ...the synthesis merges the conflicts "unevenly". doi:10.1101/157214
  27. 27. One view yields novel inferences, with no expert provenance. doi:10.1101/157214
  28. 28. How to remedy? ⇒ Synthesis as a conflict exposition and alignment service Charly Lewis, CC BY-SA 3.0
  29. 29. Remedy: Representing taxonomic concepts and alignments • 9 schemata for the Cleistes/Cleistesiopsis complex doi:10.3897/rio.2.e10610
  30. 30. • 9 schemata for the Cleistes/Cleistesiopsis complex • Vertical sections identify congruent taxonomic concept regions Remedy: Representing taxonomic concepts and alignments doi:10.3897/rio.2.e10610
  31. 31. • 9 schemata for the Cleistes/Cleistesiopsis complex • Vertical sections identify congruent taxonomic concept regions • Colors identify lineages of taxonomic names (epithets) in use Remedy: Representing taxonomic concepts and alignments doi:10.3897/rio.2.e10610
  32. 32. • 9 schemata for the Cleistes/Cleistesiopsis complex • Vertical sections identify congruent taxonomic concept regions • Colors identify lineages of taxonomic names (epithets) in use • There is no consensus! Five incongruent schemata are used concurrently Remedy: Representing taxonomic concepts and alignments doi:10.3897/rio.2.e10610
  33. 33. Further diagnosis: If incongruent taxonomies are endorsed – locally, provisionally, and democratically – then what is the impact for aggregated biodiversity data?
  34. 34. Further diagnosis: ⇒ Taxonomy becomes a variable that we need to represent, and thereby control for (at the system level).
  35. 35. The 'consensus' The 'bible' The (formerly) federal 'standard' The 'best', latest regional flora "Controlling the taxonomic variable" "Just bad" Expert views are in conflict Solution: Instead of aggregating an artificial 'consensus', … doi:10.3897/rio.2.e10610
  36. 36. The 'consensus' The 'bible' The (formerly) federal 'standard' The 'best', latest regional flora "Controlling the taxonomic variable" "Just bad" Expert views are reconciled Solution: Instead of aggregating an artificial 'consensus', build translation services doi:10.3897/rio.2.e10610
  37. 37. Challenge: How can we redesign aggregation to yield high-quality biodiversity data packages? (very abbreviated version)
  38. 38. Step 1 ⇒ Represent only taxonomic concept labels (TCLs) 1 • Syntax (TCL): taxonomic name [author, year, page] sec. source 1 Multi-taxonomy input/alignment visualizations generated with Euler/X toolkit: https://github.com/EulerProject/EulerX Cleistes divaricata sec. Gregg & Catling 1993 Pogonia sec. Brown & Wunderlin 1997
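A minimal sketch of the TCL syntax from Step 1, assuming that a label can be split at the literal " sec. " separator. The `TCL` class and `parse_tcl` helper are hypothetical names for illustration and are not part of the Euler/X toolkit; the optional [author, year, page] part of the name is simply kept inside the name string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCL:
    """Taxonomic concept label: a name as used according to ('sec.') one source."""
    name: str
    source: str

    def __str__(self) -> str:
        return f"{self.name} sec. {self.source}"


def parse_tcl(label: str) -> TCL:
    """Split 'name sec. source' at the first ' sec. ' separator (an assumption)."""
    name, sep, source = label.partition(" sec. ")
    if not sep or not name.strip() or not source.strip():
        raise ValueError(f"not a taxonomic concept label: {label!r}")
    return TCL(name.strip(), source.strip())


# The two labels shown on the slide.
a = parse_tcl("Cleistes divaricata sec. Gregg & Catling 1993")
b = parse_tcl("Pogonia sec. Brown & Wunderlin 1997")
```

Because the source is part of the label's identity, the same name used by two different sources yields two distinct TCLs, which is the point of representing only TCLs.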
  39. 39. Step 2 ⇒ Represent each source coherently (Parent-Child relationships) • Syntax (PC): TCL1 is a child/parent of TCL2 [where TCL1/2 = same source] Cleistesiopsis bifaria sec. Pans. & de Barr. 2008 is a child of Cleistesiopsis sec. Pans. & de Barr. 2008
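The parent-child syntax of Step 2 could be held in a structure like the one below. This is only a sketch: the `Taxonomy` class is a hypothetical helper, and it enforces the "same source" condition by building both TCLs from the taxonomy's own source string.

```python
from typing import NamedTuple


class TCL(NamedTuple):
    name: str
    source: str


class Taxonomy:
    """Parent-child relationships among the TCLs of exactly one source."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.parent_of: dict[TCL, TCL] = {}  # child TCL -> parent TCL

    def add_child(self, child_name: str, parent_name: str) -> None:
        # Both labels get this taxonomy's source, so PC relations
        # can never mix sources.
        child = TCL(child_name, self.source)
        parent = TCL(parent_name, self.source)
        if child == parent:
            raise ValueError("a concept cannot be its own parent")
        if self.parent_of.get(child, parent) != parent:
            raise ValueError(f"{child.name} already has a different parent")
        self.parent_of[child] = parent

    def children(self, parent_name: str) -> list[TCL]:
        parent = TCL(parent_name, self.source)
        return [c for c, p in self.parent_of.items() if p == parent]


# The example from the slide.
pdb2008 = Taxonomy("Pans. & de Barr. 2008")
pdb2008.add_child("Cleistesiopsis bifaria", "Cleistesiopsis")
```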
  40. 40. Step 3 ⇒ Align concepts with Region Connection Calculus (RCC–5) • Two regions N, M are either: • congruent (N == M) • properly inclusive (N < M) • inversely properly inclusive (N > M) • overlapping (N >< M) • exclusive of each other (N ! M) Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
  41. 41. Step 3 ⇒ Align concepts with Region Connection Calculus (RCC–5) • Two regions N, M are either: • congruent (N == M) • properly inclusive (N < M) • inversely properly inclusive (N > M) • overlapping (N >< M) • exclusive of each other (N ! M) • RCC–5 articulations answer the query: "Can we join regions N and M?" • Taxonomies have multiple RCC–5 alignable components: nodes (parents, children), node-associated traits, even node-anchoring specimens. Source: Thau, D.M. 2010. Reasoning about taxonomies. Thesis, UC Davis. http://gradworks.proquest.com/3422778.pdf
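The five RCC–5 relations can be illustrated with plain sets, under the simplifying assumption that each region is given extensionally as a non-empty set of members (for example, specimens). In the talk's workflow the articulations are asserted by experts and checked by a reasoner rather than computed this way, so the `rcc5` function below is only a sketch of what the five symbols mean.

```python
def rcc5(n: frozenset, m: frozenset) -> str:
    """Return the RCC-5 symbol relating two non-empty regions N and M."""
    if not n or not m:
        raise ValueError("regions must be non-empty")
    if n == m:
        return "=="   # congruent
    if n < m:
        return "<"    # N properly included in M
    if n > m:
        return ">"    # N properly includes M
    if n & m:
        return "><"   # overlapping: shared members, plus unique members on each side
    return "!"        # exclusive of each other


# Hypothetical members, only to exercise each case.
N = frozenset({"s1", "s2"})
relations = {
    "same": rcc5(N, frozenset({"s1", "s2"})),
    "narrower": rcc5(N, frozenset({"s1", "s2", "s3"})),
    "broader": rcc5(N, frozenset({"s1"})),
    "overlap": rcc5(N, frozenset({"s2", "s3"})),
    "disjoint": rcc5(N, frozenset({"s3"})),
}
```

Exactly one of the five relations holds for any pair of non-empty regions, which is why a single symbol per TCL pair is enough to state an articulation.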
  42. 42. Step 4 ⇒ Identify occurrence records only to TCLs Records: EKY39235 MTSU003611 NCSC00040204 … Records: BOON8098 CLEMS0061133 WILLI39399 … Records: GMUF-0039355 IBE006808 USCH58399 … Records: CONV0006268 MDKY00006482 NCU00038930 … Records: BRYV0023582, BRYV0023584 KHD00032030, MISS0016604 MMNS000227, NCSC00040206 USMS_000002923, USMS_000002924 VSC0053223, VSC0065528 … Records: ARIZ393087 DBG39049 USCH51217 … Records: NCU00040710 USCH96248 VSC0053218 … Records: CLEMS0012881 FUGR0003293 GA023130 … Records: BOON8100 NCSC00040210 SJNM45487 … Records: GA023144 LSU00012494 MISS0016608 … Records: IBE006810, IND-0012374, MMNS000227 Records: NY8654 • Syntax (ID): Occurrence / organism is identified to TCL "CLEMS0012881" is identified to Cleistes divaricata sec. Smith et al. 2004 [additional ID metadata]
  43. 43. Step 5 ⇒ Generate logically consistent RCC–5 alignments • Euler/X is a toolkit that infers logically consistent RCC–5 alignments
  44. 44. Step 5 ⇒ Generate logically consistent RCC–5 alignments • Value-added: MIR – set of Maximally Informative Relations containing the RCC–5 articulation for every possible TCL pair ⇒ Scalability Reasoner inference
  45. 45. Step 6 ⇒ Integrate occurrence-to-TCL identifications & alignments Records: BOON8098, CLEMS0061133, CONV0006268, EKY39235 GMUF-0039355, IBE006808, IBE006810, IND-0012374 MDKY00006482, MMNS000227, MTSU003611, NCSC00040204 NCU00038930, NY8654, USCH58399, WILLI39399 … Records: ARIZ393087, BRYV0023582, BRYV0023584, DBG39049 KHD00032030, MISS0016604, MMNS00022, NCSC00040206 USMS_000002923, USMS_000002924, VSC0053223, VSC0065528 … Records: BOON8100, CLEMS0012881, FUGR0003293 GA023130, GA023144, LSU00012494 MISS0016608, NCSC00040210, NCU00040710 SJNM45487, USCH96248, VSC0053218 … • Specimen integration is fully driven by TCL-to-TCL RCC–5 signals
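One way Step 6 could work is sketched below: records identified to a TCL of one source are carried over to the TCLs of another source according to the RCC–5 articulation between them. The `translate` function, the record IDs `rec-001` and `rec-002`, and both articulations are assumptions made for illustration; they are not the alignments presented in the talk. Only the identification of CLEMS0012881 comes from the Step 4 slide.

```python
RAB = "Cleistes divaricata sec. RAB 1968"
SMITH = "Cleistes divaricata sec. Smith et al. 2004"
WEAKLEY = "Cleistesiopsis divaricata sec. Weakley 2015"

# (source TCL, target TCL) -> RCC-5 symbol. Illustrative values only.
articulations = {
    (RAB, WEAKLEY): ">",     # assumed: the older concept is broader
    (SMITH, WEAKLEY): "==",  # assumed for the sake of the example
}

identifications = {
    RAB: {"rec-001", "rec-002"},  # hypothetical record IDs
    SMITH: {"CLEMS0012881"},      # identification stated on the Step 4 slide
}


def translate(records_by_tcl, articulations, target_tcls):
    """For each target TCL, collect records that certainly or possibly belong."""
    result = {t: {"certain": set(), "possible": set()} for t in target_tcls}
    for src, records in records_by_tcl.items():
        for tgt in target_tcls:
            rel = articulations.get((src, tgt))
            if rel in ("==", "<"):
                # Source region lies entirely within the target region.
                result[tgt]["certain"] |= records
            elif rel in (">", "><"):
                # Only part of the source region falls inside the target.
                result[tgt]["possible"] |= records
            # "!" or no articulation: nothing is carried over.
    return result


translated = translate(identifications, articulations, [WEAKLEY])
```

Keeping "certain" and "possible" apart is a design choice of this sketch: it exposes the ambiguity that a single merged view would hide.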
  46. 46. The 'consensus' The 'bible' The (formerly) federal 'standard' The 'best', latest regional flora "Controlling the taxonomic variable" Impact: "Please select your preference (A – D); we can perform all translations" doi:10.3897/rio.2.e10610
  47. 47. • We can now respond to queries such as: • "Show all specimens identified to the taxonomic name Cleistes divaricata" • Returns many records ⇒ Resolves incongruent lineage of name usages Remedy: Aggregation as a translational service
  48. 48. • We can now respond to queries such as: • "Show all specimens identified to the taxonomic name Cleistes divaricata" • Returns many records ⇒ Resolves incongruent lineage of name usages • "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015" • Returns record subset ⇒ Resolves only one narrowly circumscribed concept Remedy: Aggregation as a translational service
  49. 49. • We can now respond to queries such as: • "Show all specimens identified to the taxonomic name Cleistes divaricata" • Returns many records ⇒ Resolves incongruent lineage of name usages • "Now show specimens with the TCL Cleistesiopsis divaricata sec. Weakley 2015" • Returns record subset ⇒ Resolves only one narrowly circumscribed concept • "Now show specimens identified to the TCL Cleistes divaricata sec. RAB 1968, yet translated into the more granular TCLs sec. Weakley 2015" • Returns (again) many records, yet represents and contrasts two treatments, as opposed to providing the ambiguous lineage view (above) • "Show all specimens with ambiguous 2010/2015 TCL identifications…" (etc.) Remedy: Aggregation as a translational service
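The difference between the first two queries above can be sketched with a small identification table. The record IDs are hypothetical, and the `by_name` and `by_tcl` helpers are illustrative names rather than functions of any existing portal.

```python
from typing import NamedTuple


class Identification(NamedTuple):
    record_id: str
    name: str
    source: str  # the 'sec.' part of the TCL


# Hypothetical records; the name/source pairs reuse TCLs mentioned in the slides.
IDENTIFICATIONS = [
    Identification("rec-001", "Cleistes divaricata", "RAB 1968"),
    Identification("rec-002", "Cleistes divaricata", "Smith et al. 2004"),
    Identification("rec-003", "Cleistesiopsis divaricata", "Weakley 2015"),
    Identification("rec-004", "Cleistes divaricata", "Gregg & Catling 1993"),
]


def by_name(name: str) -> list[str]:
    """Name-only query: mixes every usage of the name, whatever the source."""
    return [i.record_id for i in IDENTIFICATIONS if i.name == name]


def by_tcl(name: str, source: str) -> list[str]:
    """TCL query: resolves exactly one concept according to one source."""
    return [i.record_id for i in IDENTIFICATIONS
            if i.name == name and i.source == source]


name_hits = by_name("Cleistes divaricata")
tcl_hits = by_tcl("Cleistesiopsis divaricata", "Weakley 2015")
```

The third query type, translating records from one treatment into the TCLs of another, additionally needs the RCC–5 articulations between the two sources.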
  50. 50. Synthesis, conflict, and integrative biology: Incentives and obstacles
  51. 51. Understanding the attraction of synthesis = one view • Ok, so we have diagnosed an issue. How prevalent does it need to be for aggregation designs to actually change?
  52. 52. Understanding the attraction of synthesis = one view • Ok, so we have diagnosed an issue. How prevalent does it need to be for aggregation designs to actually change? • Complication: Under the one-view design, we cannot measure the extent of the phenomenon very well.
  53. 53. Understanding the attraction of synthesis = one view • Ok, so we have diagnosed an issue. How prevalent does it need to be for aggregation designs to actually change? • Complication: Under the one-view design, we cannot measure the extent of the phenomenon very well. • Is the threshold (of the prevalence of the phenomenon) shared universally between contributors and users? [⇒ Fitness for use]
  54. 54. Understanding the attraction of synthesis = one view • Ok, so we have diagnosed an issue. How prevalent does it need to be for aggregation designs to actually change? • Complication: Under the one-view design, we cannot measure the extent of the phenomenon very well. • Is the threshold (of the prevalence of the phenomenon) shared universally between contributors and users? [⇒ Fitness for use] • Are unitary aggregation systems designed to foster distrust particularly among career-advancing experts (e.g. graduate students, postdocs, early-career researchers) who tend to produce novel, "groundbreaking" views?
  55. 55. Understanding the attraction of synthesis = one view • Ok, so we have diagnosed an issue. How prevalent does it need to be for aggregation designs to actually change? • Complication: Under the one-view design, we cannot measure the extent of the phenomenon very well. • Is the threshold (of the prevalence of the phenomenon) shared universally between contributors and users? [⇒ Fitness for use] • Are unitary aggregation systems designed to foster distrust particularly among career-advancing experts (e.g. graduate students, postdocs, early-career researchers) who tend to produce novel, "groundbreaking" views? • Is the "sweeping under the rug" of conflict an expectation grounded in the long history of taxonomy? It's 2017 for crying out loud, shouldn't we have figured out orchids already? Why can't we have one unified "webpage" for every species? Or: We're so close, fund us once more and we'll promise to "get there".
  56. 56. Understanding the attraction of synthesis = one view • Ok, so we have diagnosed an issue. How prevalent does it need to be for aggregation designs to actually change? • Complication: Under the one-view design, we cannot measure the extent of the phenomenon very well. • Is the threshold (of the prevalence of the phenomenon) shared universally between contributors and users? [⇒ Fitness for use] • Are unitary aggregation systems designed to foster distrust particularly among career-advancing experts (e.g. graduate students, postdocs, early-career researchers) who tend to produce novel, "groundbreaking" views? • Is the "sweeping under the rug" of conflict an expectation grounded in the long history of taxonomy? It's 2017 for crying out loud, shouldn't we have figured out orchids already? Why can't we have one unified "webpage" for every species? Or: We're so close, fund us once more and we'll promise to "get there". • Is the quieting of conflict an increasingly acceptable design feature of big data?
  57. 57. Understanding the attraction of synthesis = one view • Better integration – that accounts for past/present/future conflict – requires a kind of cognitive readjustment. "I need to ready my data now so that a dissenting view is more easily/scalably linkable to them". That may be asking for too much…
  58. 58. Understanding the attraction of synthesis = one view • Better integration – that accounts for past/present/future conflict – requires a kind of cognitive readjustment. "I need to ready my data now so that a dissenting view is more easily/scalably linkable to them". That may be asking for too much… • Better integration will likely also force contributors and users to be more transparent upfront regarding the aims of integration, i.e., to make stronger and more transparent commitments about fitness for use. Again, asking a lot.
  59. 59. Understanding the attraction of synthesis = one view • Better integration – that accounts for past/present/future conflict – requires a kind of cognitive readjustment. "I need to ready my data now so that a dissenting view is more easily/scalably linkable to them". That may be asking for too much… • Better integration will likely also force contributors and users to be more transparent upfront regarding the aims of integration, i.e., to make stronger and more transparent commitments about fitness for use. Again, asking a lot. • It does seem that we are in the process of giving something up for the sake of big data integration. To some extent the integration designs are still too driven by technical feasibility constraints (which are a moving target, however).
  60. 60. Understanding the attraction of synthesis = one view • Better integration – that accounts for past/present/future conflict – requires a kind of cognitive readjustment. "I need to ready my data now so that a dissenting view is more easily/scalably linkable to them". That may be asking for too much… • Better integration will likely also force contributors and users to be more transparent upfront regarding the aims of integration, i.e., to make stronger and more transparent commitments about fitness for use. Again, asking a lot. • It does seem that we are in the process of giving something up for the sake of big data integration. To some extent the integration designs are still too driven by technical feasibility constraints (which are a moving target, however). • Dealing with ambiguity and conflict in the ways we humans are accustomed to in integrative biology is not something that we have translated well enough into the machine processing realm yet.
  61. 61. Understanding the attraction of synthesis = one view • Better integration – that accounts for past/present/future conflict – requires a kind of cognitive readjustment. "I need to ready my data now so that a dissenting view is more easily/scalably linkable to them". That may be asking for too much… • Better integration will likely also force contributors and users to be more transparent upfront regarding the aims of integration, i.e., to make stronger and more transparent commitments about fitness for use. Again, asking a lot. • It does seem that we are in the process of giving something up for the sake of big data integration. To some extent the integration designs are still too driven by technical feasibility constraints (which are a moving target, however). • Dealing with ambiguity and conflict in the ways we humans are accustomed to in integrative biology is not something that we have translated well enough into the machine processing realm yet. • Personal issue: At what point should my advocacy "stop"?
  62. 62. Acknowledgments • CBS hosts: Kelle Dhein, Andrea Cottrell & Beckett Sterner – Thank you! • Euler/X team: Bertram Ludäscher, Shizhuo Yu, Jessica Cheng, Ed Gilbert. • NSF DEB–1155984 (PI Franz); IIS–118088, DBI–1147273 (PI Ludäscher). • If you have to read one paper: https://doi.org/10.1093/sysbio/syw023
  63. 63. Products: Concept taxonomy in theory and in practice ZooKeys. doi:10.3897/zookeys.528.6001 Semantic Web. doi:10.3233/SW-160220 Biological Theory. doi:10.1007/s13752-017-0259-5 PLoS ONE. doi:10.1371/journal.pone.0118247 Systematics Biodiv. doi:10.1080/14772000.2013.806371 Systematic Biology. doi:10.1093/sysbio/syw023 Biodiversity Data Journal. doi:10.3897/BDJ.5.e10469 Research Ideas and Outcomes. doi:10.3897/rio.2.e10610
