The biodiversity informatics landscape: a systematics perspective
2. Overview
1. Background – the biodiversity informatics domain
• The problem (i.e. why we are here)
• Representations of the domain (data, infrastructures, projects…)
• Toward an integrated view (strategy)
2. Social challenges
• Openness
• Collaboration and communities
• Standards, identifiers & protocols
3. (Big) data challenges
• Mobilising existing data (metadata, literature, collections)
• New forms of data ([meta]genomics & observatories)
4. Synthetic challenges
• Data aggregation & linking
• Visualisation
• Modeling
5. Next steps (data infrastructures & funding)
• Lessons learned: new informatics opportunities in H2020
4. The problem – integrating biodiversity research
How do we join up these activities?
• Species conservation & protected areas
• Impacts of human development
• Biodiversity & human health
• Impacts of climate change
• Food, farming & biofuels
• Invasive alien species
What infrastructures do we need? (technologies, tools, standards…)
What processes do we need? (modelling, workflows…)
What data do we need? (genes, localities…)
How do we use this as a tool?
5. Natural History – the foundation
Darwin’s “tangled bank”…
“It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, …, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us.”
C. Darwin, “On the Origin of Species”, 1859
Systematics, a foundational “law”
7. A granular understanding of biodiversity
[Figure: levels of biodiversity from genes (sequence data, e.g. GenBank) through individuals, populations and species to their interactions, scaling from local populations to global biodiversity and biological networks]
8. An informatician’s view of biodiversity
[Figure: biodiversity data resources (GenBank, MorphBank, TreeBase, IPNI, ZooBank, GBIF, IUCN, AquaMaps) and the data they supply (genotype, phenotype, biotic interactions, environment, human effects, census and population data, phylogenetic trees, taxonomy, geographic distributions), arranged as data, products and systems feeding niche & population ecology, extent of occurrence and range maps, biodiversity loss, conservation & management, and forecasts of change]
Key problems
• Landscape is complex, fragmented & hard to navigate
• Many audiences (policy makers, scientists, amateurs, citizen scientists)
• Many scales (global solutions to local problems)
Figure adapted from Peterson et al. (2010)
9. A project centric view of biodiversity
[Figure: “the dance of the initiatives”, a snapshot from 2009 of biodiversity informatics projects, with categories including scan/mark-up, phylogenetic, descriptive/classification, molecular databases, bibliographic, institutional, checklists, identification, inter-institutional, synthesis, nomenclators and biodiversity: PLAZI, Inotaxa, BHL, eFloras, CDM, GNA (NameBank), Tree of Life, TreeBase, CIPRES, EoL, Scratchpads, CATE, MorphoBank, Wikipedia, NCBI/EMBL/DDBJ, CBoL (Barcode of Life Initiative), IPNI (Kew/AUS/Harvard), Google Scholar, Connotea, ViTaL, ISI, EMu (=MOA), Recorder, uBio, TDWG, Key2Nature, IdentifyLife, BCI, BioCASE, GeoCASE, MaNIS, PESI (ERMS, Fauna Europaea, Euro+Med Plantbase), ORBIS, WoRMS, Flora Europaea, Index Fungorum, ZooBank, ING, AFD/APC/APNI, NZOR, CoL (Sp2000 & ITIS), ZooRecord, LifeWatch, GBIF, ALA, CONABIO, CRIA (Brazil), IUCN, SEEK, OPAL, DAISIE, iNaturalist]
10. The strategic view: community informatics challenges
• GBIF GBIC Report (coming soon)
• EU Biodiversity Strategy (2011)
• Biodiversity Informatics Challenges (2013)
• Grand Challenges for Biodiversity Informatics (integrating activities for H2020)
11. 2. Social challenges
- Openness
- Collaboration and communities
- Standards, identifiers & links
12. Openness in biodiversity informatics
“A piece of data or content is open if anyone is free to use, reuse, and redistribute it subject, at most, to the requirement to attribute and/or share-alike.” http://opendefinition.org/
• Sharing data is a foundation for our activities
• Normal practice in some communities (molecular)
• Mandated by some funders & governments
Many kinds of openness:
• Open Access
• Open Data
• Open Science
• Open Source
“One-half of all papers are now freely available within a year or two of publication”
E. Archambault et al., Proportion of Open Access Peer-Reviewed Papers at the European and World Levels, 2004-2011, June 2013, Science-Metrix Inc.
Incentivise through credit via citation (e.g. the Biodiversity Data Journal, BDJ)
Need to continue to incentivise openness
14. What are Scratchpads? (http://scratchpads.eu)
Collaboration & communities
Making taxonomy a team sport
e.g. Scratchpad Virtual Research Communities built around taxa, projects, regions and societies
• 544 Scratchpad communities, by 6,644 active registered users, covering 91,631 taxa in 535,317 pages
• In total more than 1,300,000 visitors
• 81 paper citations in 2012
Our infrastructures need to facilitate collaboration
15. Standards, identifiers & protocols
Facilitating data sharing across communities
A foundation for integration
Key requirements:
• Need to be inclusive, practical & extensible
• Readable by humans & machines
• Widely used
Good examples:
• Darwin Core
• CrossRef & DataCite DOIs
• ORCID author identifiers
Gaps / Problems
• Reuse & persistence of identifiers
• Vocabularies & ontologies (time-consuming / little reward)
Potential solutions
• Build them into our credit systems
• Demonstrate the potential of semantic reasoning (LOD & RDF demonstrators)
Standards can’t be developed in isolation – they must be used
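Identifiers that are readable by machines usually carry a check character. ORCID iDs, for example, end in a character computed with the ISO 7064 MOD 11-2 checksum, so software can catch a mistyped iD before linking to it. The Python sketch below implements that checksum; the function names are our own, and the sample iDs are illustrative rather than drawn from any dataset.

```python
import re

# Four hyphen-separated blocks of four characters; only the last may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its last character matches the checksum."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


for candidate in ("0000-0002-1825-0097", "0000-0002-1694-233X", "0000-0002-1825-0098"):
    print(candidate, is_valid_orcid(candidate))
```

The first two iDs pass; the third, with its final digit altered, fails. A checksum only catches typing errors: whether an iD is actually registered still has to be checked against ORCID itself.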
16. 3. (Big) data challenges
- Mobilising existing data
- New forms of data
17. Mobilising existing data
Collections, literature & metadata
How can we quickly, efficiently and cost-effectively mobilise biological data at scale?
Collections
• 1.5-3B specimens in collections worldwide
• Fragmented efforts / heterogeneous processes
• Needs ambition (NHM: 20M specimens in 5 yrs.) & coordination
Literature
• >300M pages of biodiversity literature
• BHL (41M pp.) an example of what can be done
• Needs sustainability & article-level metadata
NHM digitisation
BHL literature
Metadata registries
• Data about data (cheaper & scalable)
• e.g. bibliographic data, dataset portals
Informatics challenges
• Storage & persistence
• Automation & annotation
• Incentives to digitise & fitness for use
Bibliography of Life (RefFinder & RefBank)
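The scale of the collections problem can be made concrete with a back-of-envelope calculation that uses only the figures on this slide. The sketch assumes a constant digitisation rate and, for the per-day figure, a hypothetical 250 working days per year.

```python
def annual_rate(target: float, years: float) -> float:
    """Specimens per year needed to reach a digitisation target."""
    return target / years


def years_needed(total: float, rate_per_year: float) -> float:
    """Years to digitise `total` specimens at a constant annual rate."""
    return total / rate_per_year


nhm_rate = annual_rate(20e6, 5)  # NHM target from the slide: 20M specimens in 5 years
WORKING_DAYS = 250               # assumption, for illustration only
print(f"NHM rate: {nhm_rate:,.0f}/year, {nhm_rate / WORKING_DAYS:,.0f}/working day")

# World collections: 1.5-3 billion specimens (slide estimate)
for total in (1.5e9, 3e9):
    print(f"{total:,.0f} specimens at the NHM rate: {years_needed(total, nhm_rate):,.0f} years")

# BHL has scanned 41M of >300M pages, so 41/300 is an upper bound on its share.
print(f"BHL share of literature: under {41e6 / 300e6:.1%}")
```

At the NHM's target rate (4M specimens a year), a single institution would need roughly 375 to 750 years to cover the world's collections, which is why the slide calls for both ambition and coordination.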
18. Mobilising & managing new forms of data
Metagenomics & ecological observatories
These new data types do not depend on traditional taxonomy & systematics
New molecular approaches
• Molecular detection & monitoring of organisms is routine
• Metagenomics (environmental sequencing) is commonplace
• Becoming the primary route to understanding biodiversity
3-4 June 2013, NHM
Ecological observatories
• Automated biodiversity detection
• Remote sensing (e.g. satellite & acoustic data, drones, camera traps)
• Monitoring conspicuous, rare or invasive spp. (algal blooms, palms)
• Monitoring human activity
Informatics challenges
• Very large quantities of data (2.5-10TB per researcher per yr.)
• Doesn’t map well to existing data infrastructures
• Challenge current networking & storage capacity
• Digital and physical collections become equally important?
22 July, 2013
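A rough calculation shows why these volumes challenge current networking capacity. It takes the upper figure of 10 TB per researcher per year (decimal units) and, as a purely illustrative assumption, a dedicated link sustaining 100 Mbit/s or 1 Gbit/s with no protocol overhead:

```latex
t_{100\,\text{Mbit/s}} = \frac{10 \times 10^{12}\ \text{B} \times 8\ \text{bit/B}}{10^{8}\ \text{bit/s}}
  = 8 \times 10^{5}\ \text{s} \approx 9.3\ \text{days}

t_{1\,\text{Gbit/s}} = \frac{8 \times 10^{13}\ \text{bit}}{10^{9}\ \text{bit/s}}
  = 8 \times 10^{4}\ \text{s} \approx 22\ \text{h}
```

Moving one researcher's annual output once can therefore occupy a dedicated link for roughly a day to over a week, before any replication or backup.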
20. Aggregation & linking
Portals bringing together distributed & diverse forms of data
Giving consistent and comprehensive access to all biological data
Several approaches, with different advantages
• Tightly coupled to a few data sources (e.g. eMonocot, CDM): selective & accurate but hard to scale
• Loosely coupled to many sources (e.g. BioNames, Wikipedia): scalable but less accurate
• Hybrid forms (e.g. Canadensys, EOL, GBIF)
Informatics challenges
• Portals are hard to sustain
• New methods of data discovery & access
• Create new windows (views) on content
• New data structures, new types of database
eMonocot: 276k taxa, 8k images, 13 keys & 3 phylogenies
BioNames: 3M taxon names, 93k phylogenies & 28k articles
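The loosely coupled style of aggregation can be made concrete with a small sketch. It rests on stated assumptions: records arrive as dictionaries with a Darwin Core-style `scientificName` field, the source labels are hypothetical, and names are matched only by whitespace and case normalisation (real portals use far more sophisticated name matching). Values are kept per source rather than overwritten, which is one way to gain scalability while leaving conflicts visible instead of silently resolving them.

```python
from collections import defaultdict


def normalise_name(name: str) -> str:
    """Crude matching key: collapse whitespace and lower-case the name."""
    return " ".join(name.split()).lower()


def aggregate(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge records from several sources by normalised scientific name.

    Each merged entry records which sources contributed, and for every other
    field keeps one value per source, so disagreements remain visible.
    """
    merged: dict[str, dict] = defaultdict(
        lambda: {"sources": set(), "fields": defaultdict(dict)}
    )
    for source_name, records in sources.items():
        for record in records:
            entry = merged[normalise_name(record["scientificName"])]
            entry["sources"].add(source_name)
            for field, value in record.items():
                if field != "scientificName":
                    entry["fields"][field][source_name] = value
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical sources that disagree on the family name.
    example = {
        "sourceA": [{"scientificName": "Cocos nucifera", "family": "Arecaceae"}],
        "sourceB": [{"scientificName": "Cocos  nucifera", "family": "Palmae"}],
    }
    print(aggregate(example)["cocos nucifera"]["fields"]["family"])
```

A tightly coupled portal would instead resolve such conflicts editorially before publishing, which is more accurate but harder to scale.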
21. Visualisation
Visually synthesizing large, linked biodiversity datasets
Making biodiversity data accessible & understandable
Research opportunities
• Tool integration (e.g. GeoCat, CartoDB)
• Span multiple audiences
Outreach opportunities
• Visually compelling storytelling
• Crowdsourcing tools (e.g. Notes From Nature)
Exploiting new technologies
• Touch screens
• Mobile
• Location awareness
Informatics challenges
• Very specific to individual use cases
• Sustainability issues
NHM specimen records
http://data.nhm.ac.uk/globe/
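Tool integration usually comes down to agreeing on an interchange format. As one hedged example, GeoJSON (RFC 7946) is widely accepted by mapping tools; the sketch below converts occurrence records using the Darwin Core terms `decimalLatitude` and `decimalLongitude` into a GeoJSON FeatureCollection. Whether a particular tool such as those named above accepts this exact output is an assumption to check against that tool's import documentation.

```python
import json


def to_geojson(records: list[dict]) -> dict:
    """Convert occurrence records to a GeoJSON FeatureCollection (RFC 7946).

    Records lacking either coordinate are skipped. GeoJSON positions are
    [longitude, latitude], the reverse of the usual 'lat, long' order.
    All remaining fields are carried over as feature properties.
    """
    features = []
    for record in records:
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is None or lon is None:
            continue
        properties = {
            k: v for k, v in record.items()
            if k not in ("decimalLatitude", "decimalLongitude")
        }
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Hypothetical specimen records; the second has no coordinates and is dropped.
    specimens = [
        {"catalogNumber": "EXAMPLE-1", "decimalLatitude": 51.5, "decimalLongitude": -0.18},
        {"catalogNumber": "EXAMPLE-2"},
    ]
    print(json.dumps(to_geojson(specimens), indent=2))
```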
22. Modeling the biosphere: a (the) 30-year goal?
Reasoning across large, linked biodiversity datasets
A clear, singular, long-term vision to which biodiversity data can contribute
Conceptually has many potential uses
• Identifying trends
• Explaining patterns
• Making predictions
• Real-time alerts when data contradict current knowledge
• The ultimate policy tool
Major informatics challenges
• Technically very difficult (many years off)
• Needs effective prototypes & platforms
• Some first steps e.g. OBOE, LEFT
Nature 2013, doi:10.1038/493295a
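The "real-time alerts" idea can be sketched at toy scale. Here "current knowledge" is deliberately simplified to a hypothetical bounding box per species, and the species names are placeholders; a real system would use proper range models, and this sketch does not handle ranges that cross the antimeridian. New observations are flagged when they fall outside the known range, or when no range is known at all.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical stand-in for current knowledge: a species' range as a lat/long box."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple box test; does not handle boxes crossing the antimeridian.
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def find_alerts(
    observations: Iterable[dict], known_ranges: dict[str, BoundingBox]
) -> Iterator[tuple[dict, str]]:
    """Yield (observation, reason) for observations that contradict known ranges.

    Observations of species with no known range are also yielded, since they
    cannot be checked against current knowledge.
    """
    for obs in observations:
        box = known_ranges.get(obs["scientificName"])
        if box is None:
            yield obs, "no known range"
        elif not box.contains(obs["decimalLatitude"], obs["decimalLongitude"]):
            yield obs, "outside known range"


if __name__ == "__main__":
    ranges = {"Species A": BoundingBox(-10, 10, 20, 40)}  # hypothetical range
    stream = [
        {"scientificName": "Species A", "decimalLatitude": 0, "decimalLongitude": 30},
        {"scientificName": "Species A", "decimalLatitude": 50, "decimalLongitude": 30},
        {"scientificName": "Species B", "decimalLatitude": 0, "decimalLongitude": 0},
    ]
    for obs, reason in find_alerts(stream, ranges):
        print(reason, obs)
```

Using a generator means observations can be checked as they arrive, which is the shape a streaming alert service would need.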
24. Lessons learned: new opportunities in H2020
PATHWAYS TO INTEGRATION
(by addressing these social, data & synthetic challenges)
• Break out of discipline-, technology- & project-centric activities (they are unsustainable, inefficient & bad for science)
• Integrate & build on existing programmes where possible (LifeWatch is a potential umbrella for these activities)
• Bridge the disconnect between informaticians & users (make users informaticians & informaticians users)
• Our products are well suited to addressing these challenges
• Use H2020 as a mechanism to achieve integration
How do we join up these activities?
26. Possible biodiversity informatics design principles*
= experience from 7 years with the Scratchpads
= lessons for infrastructures in H2020?
1. Start with needs - focus on real user needs (not just the ‘official process’)
2. Do less - if someone else is doing it, link to it or use it
3. Design with data - prototype and test with real users on the live website
4. Do the hard work to make it simple - let the computer take the strain
5. Iterate. Then iterate again. - iteration reduces risk & is more sustainable
6. Build for inclusion – it’s easier in the long run
7. Understand context - we are designing for people, not a screen or a brand
8. Build digital services, not websites - there is life beyond the website
9. Be consistent, not uniform - every circumstance is different
10. Make things open: it makes things better - it’s more sustainable
*https://www.gov.uk/designprinciples
27. Mobilising existing data: how to prioritise
• A LITTLE: digitise a few things & invest in depth, description & promotion (content, fun, learning, outreach)
• A LOT: digitise lots of things, put little effort into description & promotion (aggregation, collections management, metadata, data mining, research)
Nick Poole, UK Collections Trust
28. Collaboration & communities
Making taxonomy a team sport
Figure: Average dates when increasing numbers of taxonomists were involved in describing species (cone snails, birds, mammals, amphibians, spiders, plants); Joppa et al., 2011
• Very few recent single-author papers
• Most (fundable) science is cross-disciplinary
• Need to incentivise data curation & annotation
• Need mechanisms to share annotations
Our infrastructures need to facilitate collaboration
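One existing standard for sharing annotations is the W3C Web Annotation Data Model, serialised as JSON-LD. The sketch below builds such an annotation on a specimen record, crediting the annotator by ORCID iD. The annotation and specimen URIs are hypothetical placeholders, and the choice of fields is a minimal subset of the model rather than a complete profile; a shared annotation service would also need storage and a protocol, which are not shown.

```python
import json
from datetime import datetime, timezone


def make_annotation(annotation_id: str, target_uri: str, text: str,
                    creator_orcid: str, motivation: str = "commenting") -> dict:
    """Build a minimal W3C Web Annotation (JSON-LD) as a plain dictionary.

    The creator is identified by an ORCID iD URI, so the annotation can be
    credited to a person. 'motivation' should be one of the model's defined
    motivations, e.g. "commenting", "identifying" or "questioning".
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": motivation,
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "creator": {"id": f"https://orcid.org/{creator_orcid}", "type": "Person"},
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": target_uri,
    }


if __name__ == "__main__":
    annotation = make_annotation(
        annotation_id="https://example.org/annotations/1",   # hypothetical
        target_uri="https://example.org/specimens/12345",     # hypothetical
        text="Label suggests a different collecting locality.",
        creator_orcid="0000-0002-1825-0097",                  # ORCID's sample iD
        motivation="questioning",
    )
    print(json.dumps(annotation, indent=2))
```

Tying each annotation to a persistent person identifier is one way to build curation work into credit systems, as the standards slide suggests.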