Presentation on reconciling taxonomic concepts using the Euler approach, given at the 2012 Annual Meeting of the Entomological Society of America, Knoxville, TN.
Franz et al. 2012. Reconciling Succeeding Classifications, ESA 2012
1. Reconciling succeeding taxonomic classifications
Nico M. Franz
School of Life Sciences, Arizona State University
Mingmin Chen, Shizhuo Yu, Bertram Ludäscher *
Department of Computer Science, University of California at Davis
ESA Annual Meeting 2012
November 14, 2012 – Knoxville, TN
* PI – NSF-IIS 1118088: A logic-based, provenance-aware system for merging scientific data under context and classification constraints.
2. Challenge – describing classification provenance beyond synonymy
Andropogon spp. in the Carolinas, from Hackel 1889 to Weakley 2005
Source: Weakley. 2005. Flora of the Carolinas, Virginia, and Georgia. Available at http://www.herbarium.unc.edu/flora.htm
3. Challenge – describing classification provenance beyond synonymy
Individual columns represent past classifications of Andropogon.
4. Challenge – describing classification provenance beyond synonymy
Individual rows represent equivalent taxonomic entities, (almost) regardless of their name labels.
5. Challenge – describing classification provenance beyond synonymy
Name/synonymy relationships are not sufficiently granular to capture this evolution of taxonomic views of Andropogon species.
6. Tracking classification provenance with concepts and articulations
Definition: A taxonomic concept is the underlying meaning of a scientific name as stated by a particular author and publication. It represents the author's full-blown view of how the name reaches out to un-/observed objects in nature.
Labeling: The abbreviation sec. for the Latin secundum, or "according to", is preceded by the full Linnaean name and followed by the specific author and publication.
Source: Berendsohn. 1995. The concept of "potential taxa" in databases. Taxon 44: 207–212.
7. Tracking classification provenance with concepts and articulations
Definition: A taxonomic concept is the underlying meaning of a scientific name as stated
by a particular author and publication. It represents the author's full-blown
view of how the name reaches out to un-/observed objects in nature.
Labeling: The abbreviation sec. for the Latin secundum, or "according to", is preceded by
the full Linnaean name and followed by the specific author and publication.
Examples: Andropogon virginicus L. sec. Radford et al. (1968) [earlier, wider concept]
Andropogon virginicus L. sec. Weakley (2005) [later, narrower concept]
Utility: Representing multiple classifications (revisions) through concepts makes it possible
to track their similarities and differences through articulations.
Source: Berendsohn. 1995. The concept of "potential taxa" in databases. Taxon 44: 207–212.
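The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the class name `TaxonomicConcept` and its fields are assumptions of the sketch, not part of CleanTax. It shows that two concepts can share a name and still be distinct concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomicConcept:
    """Hypothetical container for a 'name sec. source' label."""
    name: str    # full Linnaean name, e.g. "Andropogon virginicus L."
    source: str  # author and publication, e.g. "Weakley (2005)"

    @property
    def label(self) -> str:
        # The name precedes "sec.", the author/publication follows it.
        return f"{self.name} sec. {self.source}"


radford = TaxonomicConcept("Andropogon virginicus L.", "Radford et al. (1968)")
weakley = TaxonomicConcept("Andropogon virginicus L.", "Weakley (2005)")

same_name = radford.name == weakley.name  # the name label is shared
same_concept = radford == weakley         # the concepts are still distinct
```

Comparing names alone would treat the wider and the narrower concept as the same thing; comparing full labels keeps them apart.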
8. Five basic articulations between two concepts C1, C2 (set theory)
equivalence
proper inclusion
inverse proper inclusion
overlap
exclusion
Use of "OR" to express uncertainty.
Example: C1 == OR > C2
Source: Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Syst. Biodiv. 7: 5–20.
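The five articulations can be read as the possible relationships between two non-empty sets. A minimal Python sketch, assuming each concept is modelled as the set of specimens it circumscribes. The function names, the specimen labels, and the symbols `<` and `|` (for inverse proper inclusion and exclusion) are assumptions of this sketch; `==`, `>` and `><` follow the slides.

```python
def articulation(c1: frozenset, c2: frozenset) -> str:
    """Return the basic articulation between two non-empty sets."""
    if not c1 or not c2:
        raise ValueError("concepts must be non-empty")
    if c1 == c2:
        return "=="  # equivalence
    if c1 > c2:
        return ">"   # proper inclusion
    if c1 < c2:
        return "<"   # inverse proper inclusion
    if c1 & c2:
        return "><"  # overlap
    return "|"       # exclusion


def satisfies(c1: frozenset, c2: frozenset, allowed: set) -> bool:
    """True if the actual articulation is one of the alternatives joined by OR."""
    return articulation(c1, c2) in allowed


# Hypothetical specimen sets
wide = frozenset({"s1", "s2", "s3"})
narrow = frozenset({"s1", "s2"})
other = frozenset({"s3", "s4"})
```

An uncertain statement such as `C1 == OR > C2` then becomes the set `{"==", ">"}` passed to `satisfies`.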
9. How does it work? Connecting Hackel 1889 and Small 1933
Step 1: Transcribe two concept hierarchies and add unique IDs
Hackel 1889 (1-12)
Small 1933 (13-16)
10. How does it work? Connecting Hackel 1889 and Small 1933
Step 2: Create a table with all concept labels
Hackel 1889 (1-12)
Small 1933 (13-16)
11. How does it work? Connecting Hackel 1889 and Small 1933
Step 3: Create a table with corresponding parent/child relationships ('is_a')
Hackel 1889 (1-12)
Small 1933 (13-16)
13. How does it work? Connecting Hackel 1889 and Small 1933
Step 4: Create a table with a suitable set of articulations
Hackel 1889 (1-12)
Small 1933 (13-16)
Translation
Congruence
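Steps 2-4 amount to three small tables: concepts, parent/child relationships, and articulations. A hedged Python sketch of their shape follows. The rows are hypothetical placeholders (the IDs merely fall within the ranges shown above), and the helper names `source_of` and `check_tables` are assumptions of the sketch, not CleanTax functions.

```python
# Hypothetical rows only: IDs fall in the ranges shown on the slides,
# but the labels and relationships are placeholders, not transcribed data.
concepts = {
    1: ("parent concept (placeholder)", "Hackel 1889"),
    2: ("child concept (placeholder)", "Hackel 1889"),
    13: ("parent concept (placeholder)", "Small 1933"),
    14: ("child concept (placeholder)", "Small 1933"),
}
is_a = [(2, 1), (14, 13)]                      # (child ID, parent ID)
articulations = [(1, "==", 13), (2, "<", 14)]  # (ID, articulation, ID)


def source_of(concept_id: int) -> str:
    """Return the classification a concept ID belongs to."""
    return concepts[concept_id][1]


def check_tables() -> list:
    """Return a list of problems; an empty list means the tables are well-formed."""
    problems = []
    for child, parent in is_a:
        # Parent/child links stay inside one classification.
        if source_of(child) != source_of(parent):
            problems.append(f"is_a {child}->{parent} crosses taxonomies")
    for left, _, right in articulations:
        # Articulations connect concepts from two different classifications.
        if source_of(left) == source_of(right):
            problems.append(f"articulation {left}/{right} stays within one taxonomy")
    return problems
```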
17. Technical challenges to creating articulations
Input of concept hierarchies
Lack of a server-based platform (e.g. Global Names Architecture)
Lack of user-friendly classification input / visualization tools
Input of articulations (goal: achieve a complete and consistent mapping)
Taxonomic experts will not input ∞ articulations
Taxonomic experts will miss relevant articulations ("mir")
Taxonomic experts could be uncertain of articulations ("possible worlds")
Taxonomic experts could posit logically inconsistent articulations
"CleanTax" is being developed to explore solutions to these challenges. (1)
(1) There is continuation/overlap with the "Exploring Taxonomic Concepts" project that focuses on character matching (DBI-1147266).
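To see why experts will not supply every articulation by hand, it helps to count the concept pairs involved. A small Python sketch using the ID ranges of the Hackel 1889 / Small 1933 example; the two expert articulations and the helper `unmapped_pairs` are hypothetical and only illustrate the bookkeeping, not how CleanTax fills the gaps.

```python
from itertools import product

hackel_ids = range(1, 13)  # concepts 1-12 (Hackel 1889)
small_ids = range(13, 17)  # concepts 13-16 (Small 1933)

# Hypothetical expert input: (Hackel ID, articulation, Small ID)
expert_articulations = [(1, "==", 13), (2, "<", 14)]


def unmapped_pairs(left, right, articulations):
    """List cross-taxonomy pairs for which no articulation was provided."""
    mapped = {(a, b) for a, _, b in articulations}
    return [pair for pair in product(left, right) if pair not in mapped]


all_pairs = len(hackel_ids) * len(small_ids)
missing = unmapped_pairs(hackel_ids, small_ids, expert_articulations)
```

Even this small example has 48 cross-taxonomy pairs, so most relationships have to be inferred rather than entered.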
20. CleanTax – technical specifications
CleanTax = a set of Python programming scripts stored on bitbucket.org
(initially developed by Dave Thau; now being developed further on many fronts)
CleanTax reads in concept/articulation tables from a PostgreSQL database
CleanTax transforms the input for processing by logic reasoners; including:
Prover9 / Mace4 theorem provers – first-order logic [thorough, yet slow]
OWL / HermiT – description logic, knowledge representation [complex]
DLV System – propositional logic, answer set programming [promising!]
CleanTax assesses consistency and completeness of articulations
Output of the set of maximally informative relationships – "mir"
Report, causal explanation, interactive repair of inconsistent articulations
Calculate multiple possible worlds (if ambiguous articulations are present)
CleanTax creates multiple user-preferred views of the input and merge taxonomies
Reduced Containment Graph – RCG; and Directed Acyclic Graph – DAG
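The consistency check, the mir, and the possible worlds can be illustrated on a toy case. The sketch below is not CleanTax's method (which hands the problem to logic reasoners); it is a brute-force enumeration over a three-element universe. The two taxonomies, the symbols `<` and `|`, and the assumptions that siblings are disjoint and that a parent is exactly the union of its children are all choices made for this sketch.

```python
from itertools import combinations, product

POINTS = (0, 1, 2)  # tiny hypothetical universe of individuals


def rel(a: frozenset, b: frozenset) -> str:
    """Basic articulation between two non-empty sets."""
    if a == b:
        return "=="
    if a > b:
        return ">"
    if a < b:
        return "<"
    return "><" if a & b else "|"


def nonempty_subsets(points):
    return [frozenset(p for i, p in enumerate(points) if mask >> i & 1)
            for mask in range(1, 2 ** len(points))]


# Two hypothetical taxonomies, each with one parent and two children.
PARENTS = {"P1": ("a1", "b1"), "P2": ("a2", "b2")}
LEAVES = ("a1", "b1", "a2", "b2")
T1, T2 = ("P1", "a1", "b1"), ("P2", "a2", "b2")


def possible_worlds(articulations):
    """Return the distinct mir tables compatible with the input articulations."""
    worlds = set()
    for sets in product(nonempty_subsets(POINTS), repeat=len(LEAVES)):
        ext = dict(zip(LEAVES, sets))
        # Assumption: sibling concepts are disjoint.
        if any(ext[x] & ext[y] for kids in PARENTS.values()
               for x, y in combinations(kids, 2)):
            continue
        # Assumption: a parent is exactly the union of its children.
        for parent, kids in PARENTS.items():
            ext[parent] = frozenset().union(*(ext[k] for k in kids))
        if all(rel(ext[a], ext[b]) in allowed for a, allowed, b in articulations):
            worlds.add(frozenset((a, rel(ext[a], ext[b]), b)
                                 for a in T1 for b in T2))
    return worlds


certain = possible_worlds([("P1", {"=="}, "P2"), ("a1", {"=="}, "a2")])
ambiguous = possible_worlds([("P1", {"=="}, "P2"), ("a1", {"==", ">"}, "a2")])
inconsistent = possible_worlds([("a1", {"=="}, "a2"), ("a1", {"|"}, "a2")])
```

With certain input there is a single world in which `b1 == b2` is newly inferred; an OR in the input yields two worlds; contradictory input yields none, which is how an inconsistency shows up in this toy setting.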
22. 'Training' CleanTax on abstract examples
Input
Output – raw html list of articulations ("look-up" + inferred)
23. 'Training' CleanTax on abstract examples
Input
Output – 72 maximally informative relationships = mir
Based on the mir, all theoretically possible articulations
of the R32 lattice can be logically deduced.
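A short Python sketch of this deduction, assuming that R32 denotes the 2^5 = 32 possible OR-combinations of the five basic articulations. The symbols `<` and `|` and the function name `deducible` are assumptions of the sketch.

```python
from itertools import combinations

BASE = ("==", ">", "<", "><", "|")  # the five basic articulations

# Every subset of the five basic articulations, read as an "OR" statement.
R32 = [frozenset(c)
       for k in range(len(BASE) + 1)
       for c in combinations(BASE, k)]


def deducible(mir: str) -> list:
    """Disjunctive articulations that hold once the single true relation is known."""
    return [r for r in R32 if mir in r]


holds = deducible(">")
```

Once the mir for a pair is known to be `>`, every disjunction containing `>` is true for that pair and every other one is false.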
24. Abstract Example 1 – Reduced Containment Graph of the merge
Input
Blue circles: shared concepts
Black circles: unique concepts
Black solid arrows: expert input
Grey dashed arrows: deducible
Red solid arrows: newly inferred
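One way to picture how such a graph could be built from a set of articulations is sketched below in Python. It assumes that a reduced containment graph merges equivalent concepts into one node and keeps only the containment arrows that are not implied by transitivity; this reading, the function name, and the example relations are assumptions, not a description of CleanTax internals.

```python
def reduced_containment_graph(concepts, triples):
    """Return parent->child edges after merging '==' and dropping implied edges."""
    # 1. Merge concepts linked by "==" into one shared node.
    group = {c: {c} for c in concepts}
    for a, rel, b in triples:
        if rel == "==" and group[a] is not group[b]:
            merged = group[a] | group[b]
            for c in merged:
                group[c] = merged
    label = {c: ".".join(sorted(group[c])) for c in concepts}

    # 2. Collect containment edges (parent, child) between merged nodes.
    edges = set()
    for a, rel, b in triples:
        if rel == ">":
            edges.add((label[a], label[b]))
        elif rel == "<":
            edges.add((label[b], label[a]))

    # 3. Transitive closure by repeated composition.
    closure = set(edges)
    while True:
        new = {(u, w) for u, v in closure for x, w in closure if v == x} - closure
        if not new:
            break
        closure |= new

    # 4. Keep only edges with no intermediate node (transitive reduction).
    nodes = set(label.values())
    return {(u, w) for u, w in closure
            if not any((u, v) in closure and (v, w) in closure for v in nodes)}


# Hypothetical input: two small taxonomies plus a few articulations.
names = ["P1", "a1", "b1", "P2", "a2", "b2"]
relations = [
    ("P1", ">", "a1"), ("P1", ">", "b1"),
    ("P2", ">", "a2"), ("P2", ">", "b2"),
    ("P1", "==", "P2"), ("a1", ">", "a2"), ("b1", "<", "b2"),
]
rcg = reduced_containment_graph(names, relations)
```

Here `P1` and `P2` collapse into one shared node, and the direct arrows to `a2` and `b1` disappear because they follow from the longer paths.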
25. More CleanTax training… our infamous Abstract Example 4
Example 4 – representing multiple 'possible worlds'
3/5 articulations are disjunctive (OR)
26. Reduced Containment Graphs of 7 'possible worlds' (combined or's)
Example 4 – CleanTax infers 7 possible worlds (user can view / select / repair / rerun)
Asserted by expert
Implied articulations
Inferred by CleanTax
Shared concepts
Unique concepts
Reduced Containment Graphs (RCGs)
27. Exploring "views" of the merge - circular Euler diagrams of PW1
Table of mir
Corresponding Euler diagram (circular)
Identical information content
28. Correspondence of circular and Directed Acyclic Diagrams
PW1: Typical Euler circles
Euler-DAG of PW1
Identical information content
30. Real-life examples, I – reconciling two weevil classifications (1)
Curculionoidea sec. Kuschel 1995 – Concepts 117-157
Curculionoidea sec. Marvaldi & Morrone 2000 – Concepts 348-372
(1) Initial articulations provided by NMF.
31. Merge taxonomy of Kuschel 1995 / Marvaldi & Morrone 2000
CleanTax RCG – 1 newly inferred articulation + several inconsistencies
Microcerinae sec. M&M 2000 [363] are included in Brachycerinae sec. KU 1995 [148]
(yes, I missed that; Kuschel 1995 only mentions it in the text, not in the main taxon list)
32. Real-life examples, II – reconciling two weevil classifications
Curculionoidea sec. Crowson 1981 – Concepts 1-17
Curculionoidea sec. Marvaldi & Morrone 2000 – Concepts 348-372
33. Merge taxonomy of Crowson 1981 / Marvaldi & Morrone 2000
CleanTax RCG – 4 newly inferred articulations / does not depict overlap (><)
e.g. {Aglycyderidae [2], Allocorynidae [3], Oxycorynidae [17]} sec. Crowson 1981 are included in Belidae [353] sec. M&M 2000
34. Euler-DAG of the Crowson / Marvaldi & Morrone merge taxonomy
Solid lines – proper inclusion
Black solid line given
Green solid line inferred
Orange solid line explanatory
[Red solid line inconsistent]
Dashed lines - overlap
Black dashed line given
Green dashed line inferred
Orange dashed line explanatory
Red dashed line inconsistent
Concept boxes - concepts
Orange square box shared
Black square box unique
Dashed square box combined
Dashed oval box inconsistent
35. DAGs generate "combined concepts"
Belidae sec. MM2000
Belidae sec. Cro1981
Intersections of overlaps: "Belidae" INT(Cro/MM)
Shared - [2,3,17,357]
36. New naming/viewing conventions – simple merges (shared, unique) *
Input
Concept A: Attelabidae CR81, AttCR81 [9]
Concept B: Attelabidae MM00, AttMM00 [55]
Output
Concept A – Concept B (AB): Attelabidae CR81 – Attelabidae MM00, AttCR81.AttMM00
* Simple extension to three or more congruent concepts.
37. New naming/viewing conventions – combined merges (overlap; T1, T2)
Input
Concept A: Belidae CR81, BelCR81 [10]
Concept B: Belidae MM00, BelMM00 [353]
Euler (circles A, B)
Ab: BelCR81.belMM00
AB: BelCR81.BelMM00
aB: BelMM00.belCR81
DAG: Ab, AB, aB
38. Combined merges (T1, T2, T3)
Input
Concept A: Curculionidae CR81, CurCR81
Concept B: Curculionidae KU95, CurKU95
Concept C: Curculionidae s.s. MM00, CurMM00
Euler
ABc: CurCR81.CurKU95.curMM00
Abc: CurCR81.curKU95.curMM00
aBc: CurKU95.curCR81.curMM00
ABC: CurCR81.CurKU95.CurMM00
AbC: CurCR81.CurMM00.curKU95
aBC: CurKU95.CurMM00.curCR81
abC: CurMM00.curCR81.curKU95
DAG: A, B, C, Abc, ABc, aBc, AbC, ABC, aBC, abC
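The naming convention for combined concepts can be generated mechanically. The Python sketch below assumes the pattern visible in the labels above: included concepts come first with their initial capital, excluded concepts follow with a lowercase initial, and parts are joined by a period. The function names are hypothetical, and the sketch lists every candidate region without checking whether a given merge actually populates it.

```python
from itertools import product
from string import ascii_uppercase


def lower_first(label: str) -> str:
    """'BelMM00' -> 'belMM00' (marks an excluded concept)."""
    return label[0].lower() + label[1:]


def region_labels(concepts: list) -> dict:
    """Map short region codes (e.g. 'Ab') to long combined-concept labels."""
    regions = {}
    letters = ascii_uppercase[:len(concepts)]
    for flags in product((True, False), repeat=len(concepts)):
        if not any(flags):
            continue  # the area outside all concepts gets no label
        code = "".join(l if f else l.lower() for l, f in zip(letters, flags))
        inside = [c for c, f in zip(concepts, flags) if f]
        outside = [lower_first(c) for c, f in zip(concepts, flags) if not f]
        regions[code] = ".".join(inside + outside)
    return regions


two = region_labels(["BelCR81", "BelMM00"])
three = region_labels(["CurCR81", "CurKU95", "CurMM00"])
```

Two overlapping concepts give three candidate regions and three concepts give seven, matching the counts on the slides.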
40. Current workflow / "usability" (CleanTax on "Lore" server, UC Davis)
Input script
Possible worlds
Visualization
Euler-DAG
Output file
Inconsistency
Repair, explanation
Interactive reduction of PWs (decision tree)
41. Shared, real use cases (Perelleschus) with ETC feature-based project
5 taxonomies, 48 concepts, expert articulations, plus textual feature diagnoses
42. Conclusions and outlook
Improvements to CleanTax will remove many of the technical challenges towards a
full-blown taxon concept approach ( improved tracking of classification provenance).
Other technical challenges are being addressed (server platform, algorithmic
scalability, intensional/ostensive articulations, visualization [Euler, combined
concepts], workflow integration).
Many non-technical challenges remain (in short: transparent/consistent use).
43. Conclusions and outlook
Improvements to CleanTax will remove many of the technical challenges towards a
full-blown taxon concept approach ( improved tracking of classification provenance).
Other technical challenges are being addressed (server platform, algorithmic
scalability, intensional/ostensive articulations, visualization [Euler, combined
concepts], workflow integration).
Many non-technical challenges remain (in short: transparent/consistent use).
The current approach treats concepts as a 'black box' – the input data are simple and
make no reference to type specimens, synapomorphies, diagnostic features, etc.
"Exploring Taxonomic Concepts" project will develop tools for a balanced view.
Nevertheless, the articulations can expose deep and varied semantic links among
succeeding classifications.
44. Conclusions and outlook
Improvements to CleanTax will remove many of the technical challenges towards a
full-blown taxon concept approach (improved tracking of classification provenance).
Other technical challenges are being addressed (server platform, algorithmic
scalability, intensional/ostensive articulations, visualization [Euler, combined
concepts], workflow integration).
Many non-technical challenges remain (in short: transparent/consistent use).
The current approach treats concepts as a 'black box' – the input data are simple and
make no reference to type specimens, synapomorphies, diagnostic features, etc.
The "Exploring Taxonomic Concepts" project will develop tools for a balanced view.
Nevertheless, the articulations can expose deep and varied semantic links among
succeeding classifications.
CleanTax may be the first attempt to 'explain' classification provenance to logic
reasoners. This could have considerable implications for future data integration.
45. Acknowledgments
Shawn Bowers, Dave Thau, Alan Weakley
NSF-IIS 1118088:
"III-SMALL: A logic-based, provenance-aware system for merging scientific data under
context and classification constraints"
"Euler" team, UC Davis