This document describes the process of creating a world catalog of the fly family Therevidae using a database called Mandala. Key aspects included:
- Inputting specimen, literature, and taxonomic data into Mandala for storage and linking.
- Using scripts to automate tasks like sorting specimens by location and generating taxon summaries.
- Exporting formatted text from Mandala for final editing and publication of the catalog, while keeping data linked and reusable online.
- Collaboration between several experts over many years to compile the necessary information for the comprehensive catalog.
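The kind of automation described above (sorting specimens by location, generating taxon summaries) can be illustrated with a minimal sketch. This is a hypothetical illustration, not Mandala's actual scripting; the record fields (`taxon`, `country`, `locality`) are assumed for the example, not Mandala's schema.

```python
# Hypothetical sketch of the automation described above: sorting specimen
# records by location and generating per-taxon summaries. Field names are
# assumptions for illustration, not Mandala's actual schema.
from collections import defaultdict

specimens = [
    {"taxon": "Thereva nobilitata", "country": "Germany", "locality": "Bavaria"},
    {"taxon": "Thereva nobilitata", "country": "France", "locality": "Alsace"},
    {"taxon": "Ozodiceromyia sp.", "country": "Mexico", "locality": "Sonora"},
]

# Sort specimens by location, e.g. for processing loans or printing labels.
by_location = sorted(specimens, key=lambda s: (s["country"], s["locality"]))

# Generate a per-taxon summary: specimen count and countries recorded.
summaries = defaultdict(lambda: {"count": 0, "countries": set()})
for s in specimens:
    summaries[s["taxon"]]["count"] += 1
    summaries[s["taxon"]]["countries"].add(s["country"])

for taxon, info in sorted(summaries.items()):
    print(f"{taxon}: {info['count']} specimens from {sorted(info['countries'])}")
```

The same grouping logic scales to the real use case: once specimen records live in one database, location sorts and taxon summaries become queries rather than manual collation.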
Kampmeier ecn 2012
1. Catalog magic: Behind the Scenes of Creating a World Catalog of the Therevidae
Gail E. Kampmeier, Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, gkamp@illinois.edu
Irina Brake, Natural History Museum, London
Kristin Algmin, University of Illinois at Urbana-Champaign
2. Why is it so Difficult to get from Here… to… Here?
Therevidae
5. 1995 Freshmen of NSF PEET*
• Towards a World
Monograph of
the Therevidae
(Insecta: Diptera)
– 1995 – 2006
• Therevidae is a
medium-sized
family with (now)
– 4 subfamilies
– ~130 genera
– ~1150 species
*National Science Foundation's Partnerships for Enhancing Expertise in Taxonomy
6. Products
• Trained
– 9 dipterists, 7 through Ph.D.
– Scientific illustrator
– Dozens of students in
databasing
• Publications
– 71 publications during grant
– 20 more since & counting
• Digitization
– Mandala database 1995-
– Website
– Collaborations with
DiscoverLife.org & GBIF
…and the world is unlikely to run
out of flies to study!
7. Process: Specimens
• Collect, sort, curate,
label, sex, determine, &
database specimen
information
– Assign unique identifiers
where none exist
• Visit & borrow material
from museums
• Examine types
8. Is All that Work Worthwhile?
• "A taxonomic paper often
plants the very seeds of its own
obsolescence." (Johnson 2011)
• There is no getting around the
work required to produce a
catalog or any taxonomic
treatment.
• What we can do is make sure
that the information is
accessible and reusable.
• Is it time to ditch traditional
catalogs?
Henicomyia by J. Marie Metz
9. What Choices Do You Have?
• Last year's symposium on Arthropod Collections
databases explored some of your options, but
not all are suitable.
– Online collections database platforms (not suitable
for creating taxonomic catalogs that span material
from collections outside the database)
• Arctos
• Specify 6
– Online taxonomic database platforms – optimize
creation of species pages
• Species File – taxonomic authority files
• Scratchpads – community-oriented contributions
• 3I – online revisions of taxa
• Encyclopedia of Life – Expert LifeDesks
10. What Choices Do You Have?
• Last year's symposium on Arthropod Collections
databases explored some of your options, but not all
are suitable.
– Online platforms designed to parse or take parsed data &
repurpose it (incl. online taxonomic database platforms
above)
• GBIF's Integrated Publishing Toolkit (IPT) – not thought of as a
workbench-level tool
• LUCID – especially good for keys & descriptive data
• Biodiversity Data Journal – will take in parsed data from
Scratchpads and IPT & eventually databases (mechanism
unclear)
– Desktop or server-based platforms – usually in Filemaker
or 4D or MSAccess
• Mandala – http://www.inhs.illinois.edu/research/mandala/
• Biota - http://viceroy.eeb.uconn.edu/biota/
• Mantis - http://insects.oeb.harvard.edu/etypes/Downloads.htm
11. The Process: Decide on a Format
• It was decided to publish as a traditional Myia catalog
• Expectations about what is in a "traditional catalog" or taxonomic
treatment & how it should be formatted
– Print styles (italics, bold, centered, hanging indents)
– Accented characters (for literature references, authority names, and localities)
– Special characters (for ♂ and ♀ signs)
– Notes kept with the taxon entry or as an appendix?
• Use Mandala to achieve retrievability & formatting of output
12. General Workflow:
Therevid Mandala Database
• Input raw data: The Bulk of the Work is HERE!
• Link data in related tables
• Create fields for catalog output for:
– Taxa & their history
– Literature (including disambiguation of similar citations)
– List of countries (& selected states/provinces) by biogeographic region for valid taxa
• Create & number notes for listing in appendix
• Create a script that finds data to be exported
• Create scripts to format data including styles (bold, italics, codes for paragraph formatting)
• Export TaxonID & catalog output field only to Filemaker Pro to isolate output & preserve formatting including accented characters
[Workflow diagram: Mandala production db → Catalog output to new FMP db → Acrobat → MSWord → Catalog]
13. Things Can Get Messy
• Some operations require expert
eyes to determine fitness-for-use
• A database can find, sort, &
summarize, but ultimately does
not "see" anomalies unless
specifically programmed to do so
• Automation (scripting, creation
of calculated fields) requires
time, refinement, & expertise
• Parsed data are key to flexibility
14. Create Taxonomic Hierarchy
Use to automate searches & sort catalog output by classification hierarchy, rank, & alphabetically
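The parent-child hierarchy on this slide is what drives the catalog's sort order. A minimal Python sketch of the idea (illustrative only: Mandala itself is a FileMaker application, and the taxon records and schema below are made up for the example):

```python
# Hypothetical taxon records keyed by name, each with a rank and a
# parent link -- this is NOT Mandala's real schema, just an illustration.
taxa = {
    "Therevidae": {"rank": "family",    "parent": None},
    "Therevinae": {"rank": "subfamily", "parent": "Therevidae"},
    "Thereva":    {"rank": "genus",     "parent": "Therevinae"},
    "Henicomyia": {"rank": "genus",     "parent": "Therevinae"},
}

def lineage(name):
    """Walk parent links up to the root; return the path root -> taxon."""
    path = []
    while name is not None:
        path.append(name)
        name = taxa[name]["parent"]
    return list(reversed(path))

# Sorting by lineage groups each taxon under its parents and, within the
# same parent and rank, orders names alphabetically.
catalog_order = sorted(taxa, key=lineage)
# → ['Therevidae', 'Therevinae', 'Henicomyia', 'Thereva']
```

The same trick works at any depth: because the sort key is the full root-to-taxon path, a genus always sorts directly under its subfamily, and sibling genera fall into alphabetical order automatically.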
16. We Used the Specimen* Table to
Define our Distribution
*based on 105,889 specimens with valid names & parsed localities
17. Script to Find & Sort Specimens
• Once sorted, export a summary for each
taxon
18. • Summary can then be formatted in MSWord
• Bring back into Filemaker for final formatting
• Spot possible outliers
• Match TaxonID to import formatted information into production db
TaxonID x Biogeographic Region x Country x State/Province
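The per-taxon distribution summary (TaxonID x Biogeographic Region x Country x State/Province) can be sketched in Python as follows (illustrative only: the specimen rows and localities below are hypothetical, not Mandala's actual tables):

```python
from collections import defaultdict

# Hypothetical specimen rows: (TaxonID, biogeographic region, country,
# state/province). Mandala's real specimen table is far richer.
specimens = [
    (101, "Nearctic",     "USA",       "Illinois"),
    (101, "Nearctic",     "USA",       "California"),
    (101, "Neotropical",  "Mexico",    "Sonora"),
    (102, "Australasian", "Australia", "Queensland"),
]

def distribution_summary(rows):
    """Collapse specimen records into one distribution string per TaxonID."""
    by_taxon = defaultdict(lambda: defaultdict(set))
    for taxon_id, region, country, state in rows:
        by_taxon[taxon_id][(region, country)].add(state)
    out = {}
    for taxon_id, places in sorted(by_taxon.items()):
        parts = [f"{region}: {country} ({', '.join(sorted(states))})"
                 for (region, country), states in sorted(places.items())]
        out[taxon_id] = "; ".join(parts)
    return out

summary = distribution_summary(specimens)
# summary[101] → "Nearctic: USA (California, Illinois); Neotropical: Mexico (Sonora)"
```

Keeping the summary keyed by TaxonID is what makes the round trip on this slide possible: the formatted text can be matched back into the production database by ID after editing in Word.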
19. Filling in the Cracks
• All taxa, literature, and specimens to be included in the
catalog were marked by an expert with a code for easier
retrieval
• Communication about scripts & field calculations was
done in Google Docs
• Literature with the same authors and years had to be
disambiguated with letters following the year.
– Used in both the literature cited and text of the catalog
• After including the notes in the text flow, the authors
decided to number them and move them to an appendix.
– Finding & sorting of these could be automated
– Replacing with a serial number series allowed numbering of notes
– Awkward (but necessary) to renumber notes when new ones
were found to be needed
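The letter-suffix disambiguation of same-author, same-year citations can be sketched in Python (the citations below are invented placeholders, not the catalog's actual literature records):

```python
from collections import defaultdict
from string import ascii_lowercase

# Hypothetical citations: (author string, year, title). The real
# disambiguation was done inside Mandala's literature table.
refs = [
    ("Author & Coauthor", 1981, "First hypothetical paper"),
    ("Author & Coauthor", 1981, "Second hypothetical paper"),
    ("Kampmeier", 2009, "Mandala chapter"),
]

def disambiguate(citations):
    """Append a, b, c... to the year wherever author+year collide."""
    groups = defaultdict(list)
    for author, year, title in citations:
        groups[(author, year)].append(title)
    labels = {}
    for (author, year), titles in groups.items():
        if len(titles) == 1:
            labels[(author, year, titles[0])] = f"{author} {year}"
        else:
            for letter, title in zip(ascii_lowercase, sorted(titles)):
                labels[(author, year, title)] = f"{author} {year}{letter}"
    return labels
```

Because the suffixed labels are computed once and stored, the same "1981a"/"1981b" forms can be used consistently in both the literature cited and the body of the catalog, as the slide describes.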
20. General Workflow
• TaxonID is for reference only
• Resize the catalog output field
(in layout mode) to page size so
all contents are always visible
• Open in Preview to check
• Save as PDF
[Workflow diagram: Mandala production db → Catalog output to new FMP db]
21. General Workflow
• This step mainly
preserves catalog text styles &
accented characters out of FMP
• Save As MS Word document
after verifying expected results.
• Saving as Word will collapse
the formatting into giant
paragraphs
[Workflow diagram: Mandala production db → Catalog output to new FMP db → Acrobat]
22. General Workflow
[Workflow diagram: Mandala production db → Catalog output to new FMP db → Acrobat → MSWord → Catalog]
• Create styles in MSWord for
formatting text & paragraphs
• Search & replace special
characters (%%, $$, zzz, ||, //);
♂ and ♀ signs
• Clean up extra spaces,
paragraphs, & punctuation
• Using Google Docs is not (yet)
an option for a traditionally
published catalog as the
formatting tools aren't adequate
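The search & replace pass over the special character codes can be sketched in Python (the slide lists the codes %%, $$, zzz, and || without defining them, so the mappings below are purely hypothetical, not Mandala's actual convention):

```python
import re

# Hypothetical code-to-output mappings for illustration only; the
# real replacements were done with styles in MSWord.
REPLACEMENTS = [
    (r"%%", "<i>"),       # hypothetical: open italics
    (r"\$\$", "</i>"),    # hypothetical: close italics
    (r"zzz", "\u2642"),   # hypothetical: male sign ♂
    (r"\|\|", "\u2640"),  # hypothetical: female sign ♀
    (r" {2,}", " "),      # clean up runs of extra spaces
]

def apply_styles(text):
    """Run each pattern replacement over the exported catalog text."""
    for pattern, repl in REPLACEMENTS:
        text = re.sub(pattern, repl, text)
    return text

print(apply_styles("%%Thereva$$  zzz ||"))  # → "<i>Thereva</i> ♂ ♀"
```

Keeping the codes as plain ASCII placeholders until this final pass is what lets the styled output survive the export through FileMaker and Acrobat intact.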
24. Consensus!
• When the experts are happy,
we're done, right?
• Still have to update the
database & web output online
– complements printed
catalog as it is dynamic
• Push corrections to public
portals of data (own website,
DiscoverLife, GBIF, etc.)
• So "magic" is a relative, kind
of wishful term—the future is
more likely in platforms such
as those being coordinated by
Pensoft.
25. References, Resources
• Miller, J. et al. 2012. From taxonomic literature to cybertaxonomic
content. BMC Biology 10: 87.
http://www.biomedcentral.com/content/pdf/1741-7007-10-87.pdf
• Johnson, N.F. 2011. A collaborative, integrated and electronic future for
taxonomy. Invertebrate Systematics 25: 471–475.
http://www.publish.csiro.au/?act=view_file&file_id=IS11052.pdf
• Biodiversity Data Journal (publication debut Dec. 2012).
http://www.pensoft.net/journals/bdj
• Symposium: Arthropod Collections Databases. 2011 ECN
meeting, Reno, NV http://www.ecnweb.org/past/2011
• Darwin Core Standard http://rs.tdwg.org/dwc/
• Kampmeier, G. E. and M. E. Irwin. 2009. Meeting the interrelated
challenges of tracking specimen, nomenclature, and literature data in
Mandala. Chapter 15 in T. Pape, D. Bickel, and R. Meier (eds.) Diptera
Diversity: Status, Challenges and Tools. Leiden: Brill Academic
Publishers, pp. 407-437.
http://www.inhs.illinois.edu/research/mandala/Ch15_Mandala_DiptDiv2009.pdf
26. More Refs & Resources
• Kennedy, J., R. Hyam, R. Kukla, T. Paterson. 2006.
Standard data model representation for taxonomic
information. OMICS: A Journal of Integrative Biology 10(2):
220–230. http://www.hyam.net/publications/omi.2006.10.220.pdf
• Penev, L., T. Georgiev, P. Stoev, D. Roberts, V. Smith.
2012. Making small data big! The Biodiversity Data
Journal (BDJ). TDWG 2012, Beijing, 22–26 October.
http://www.tdwg.org/fileadmin/2012conference/slides/Biodiversity_Data_Journal.pdf
• Catalogue of Life.
http://www.catalogueoflife.org/colwebsite/sites/default/files/2012_CoL-Standard_Dataset_v6_3.pdf
27. Acknowledgements
• Michael E. Irwin
• F. Chris Thompson
• Neal Evenhuis
• Christine Lambkin
• Shaun Winterton
• Don Webb
• Mark Metz
• Martin Hauser
• Kevin Holston
• Steve Gaimari
• J. Marie Metz
• David Yeates
• Amanda Buck
• Brian Wiegmann
• Evert Schlinger
• John Pickering
• FMWebschool
• National Science
Foundation
• Schlinger Foundation
• Illinois Natural History
Survey
• University of Illinois
• Discover Life
• Biodiversity Information
Standards (TDWG)
NSF Projects:
• Therevid PEET: DEB-95-21925; 99-77958
• Fiji Arthropod Survey: DEB-0425790
• FLYTREE: EF-0334948
• Tabanid PEET: DEB 07-31528
30. Why Use A Database?
• Flexibility
– Finely parsed data may be
pieced together for
publication, labels
– Scripting of often used
functions
• Reuse/repurposing of data
– Sharing with GBIF,
DiscoverLife.org, museums
• Centralization of work
environment
– Workers can be anywhere,
any time zone
– Backup can be automated
• Individual work environment
– Desktop platforms need not be
online (although that is a trade-off)
31. Vision
• "Taxonomy should fully embrace
electronic media and informatics tools.
Particularly, this step requires the
development and widespread
implementation of community data
standards. The barriers to progress in
these areas are not technological, but are
primarily social. The community needs to
see clear evidence of the value added
through these changes in procedures and
insist upon their use as standard practice."
Johnson, N.F. 2011. A collaborative, integrated and electronic future for taxonomy.
Invertebrate Systematics 25: 471.
32. Any Database Can Record the
Basics, but…
• How the information is related is also key
– defining taxonomic ranks as parent-child relationship
– valid taxonomic entities related to their synonyms
– types and specimens determined for a taxon
– literature associated with a taxonomic name
– collecting localities and collecting events
• Readability – if a published work rather than raw database output
• Format
– Based on existing print models?
– Print styles (italics, bold, centered, hanging indents)
– Accented characters (for literature references, authority names, and
localities)
– Special characters (for ♂ and ♀ signs)
– Notes kept with the taxon entry or as an appendix?
33. Mandala Data Model
• Not all of this is
required for a
traditional
catalog, but
these tables
contain a
wealth of vital,
interrelated
data.
• Tables with
rounded edges
are authority
files
We were fortunate to have two rounds of funding for this project on a medium-sized family of flies. We trained dipterists who are contributing their expertise even today, continuing to work on the family Therevidae as well as other Diptera.
For better or worse, not yet.
The main part of the work, which has consumed many person-hours to enter and verify, is in the Mandala database devoted to the Therevidae.
Find all specimens with valid names and a localityID
You cannot create style sheets in Acrobat
Photo of Kevin,
A spreadsheet is not flexible; neither is a field notebook or index cards.
But it is not just the community: individuals also need to see and embrace this for themselves.
All this goes on in the background, once you have indicated which taxa you want to delimit.