EIA Biodiversity Data Mobilisation
  • …to the GBIF network, and for reuse by others as well
  • Nick mentioned the key challenges in his presentation entitled “Why the need for a global infrastructure to discover, share, publish and use biodiversity data”
  • Transcript

    • 1. Publishing EIA Biodiversity Data: Technology and Infrastructure. Vishwas Chavan, Nick King and Francois Rogers, Global Biodiversity Information Facility (www.gbif.org), [email_address]. Scoping Workshop on Developing an EIA Biodiversity Data Publishing Framework in South Africa, 2-4 March 2010, Cape Town, South Africa.
    • 2. Contents
      ◦ EIA Biodiversity Data: Types and formats
      ◦ Data Capture & Digitisation tools
      ◦ Data Discovery
      ◦ Data Publishing
      ◦ Data Quality & fitness-for-use
      ◦ Data Hosting Centers
      ◦ Community Building Platforms
    • 3. What are the challenges?
      ◦ More data types
      ◦ Richer user interface
      ◦ Better management
      ◦ Richer content
      ◦ Better synchronisation
      ◦ Improved discovery
    • 4. EIA BIODIVERSITY DATA: TYPES AND FORMATS
    • 5. EIA biodiversity data are very diverse. Diagram categories: Taxon names, Taxon concepts, Species banks, Observation, Evidence, Metadata and Literature. Examples shown include indices, nomenclators and namebanks; biology, conservation, ecology, distribution and phylogenies; geolocation, country, collector and date; voucher specimen, blood sample, DNA barcode, image, audio and video; and literature sources such as BHL and Plazi.org.
    • 6. DATA CAPTURE AND DIGITISATION TOOLS
    • 7. Data Capture and Digitisation Tools: Florin, Pandora, Taxis, Cassia, FieldNote, Mandala, ATTA, BirdRecorder
    • 8. uBio Tools (http://www.ubio.org)
      ◦ Name recognition tool (FindIT)
      ◦ Author abbreviation resolver
      ◦ Checking classification (TSN name mapper)
      ◦ Deconstruct scientific name (ParseIT)
      ◦ Find scientific name (CrawlIT)
      ◦ etc.
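The idea behind a name deconstructor such as ParseIT can be sketched in a few lines of Python. This is not uBio's code or API: it is a simplified regular-expression parser, assuming a name is a capitalised genus, optional lowercase epithets (with an optional `subsp.` or `var.` marker) and a trailing authorship string. Hybrids, abbreviated genera and many authorship forms are not handled.

```python
import re
from typing import Optional, TypedDict


class ParsedName(TypedDict):
    genus: str
    species: Optional[str]
    infraspecies: Optional[str]
    authorship: Optional[str]


# Capitalised genus, up to two lowercase epithets (optionally "subsp."/"var."),
# and anything that follows treated as the authorship string.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?:subsp\.|var\.)?\s*(?P<infraspecies>[a-z][a-z-]+))?"
    r"(?:\s+(?P<authorship>[A-Z(].*))?$"
)


def parse_name(name: str) -> Optional[ParsedName]:
    """Split a simple scientific name into parts, or return None if it does not fit."""
    match = NAME_PATTERN.match(" ".join(name.split()))  # collapse stray whitespace first
    if match is None:
        return None
    return ParsedName(**match.groupdict())


print(parse_name("Panthera leo subsp. persica (Meyer, 1826)"))
```

A regular expression keeps the sketch short; production name parsers use full grammars because real authorship strings are far more varied.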
    • 9. GBIF Templates
      ◦ Capture data in DwC-compatible format
        ▪ Occurrence Data Template
        ▪ Names Data Template
      ◦ Facilitate authoring 'resource metadata'
        ▪ Occurrence template
        ▪ Documentation for occurrence template
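A minimal sketch of capturing records in a Darwin Core-compatible layout, assuming a plain CSV with Darwin Core term names as column headers. The column subset in `DWC_COLUMNS` is illustrative and is not the actual column set of the GBIF Occurrence Data Template.

```python
import csv
import io

# Illustrative subset of Darwin Core terms used as column headers.
DWC_COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "recordedBy", "country", "decimalLatitude", "decimalLongitude",
]


def to_dwc_csv(records: list[dict]) -> str:
    """Write field records as CSV text whose headers are Darwin Core term names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS)
    writer.writeheader()
    for record in records:
        unknown = set(record) - set(DWC_COLUMNS)
        if unknown:
            # Reject misspelt or unsupported terms instead of silently dropping them.
            raise ValueError(f"unknown Darwin Core columns: {sorted(unknown)}")
        writer.writerow(record)  # terms missing from a record become empty cells
    return buffer.getvalue()


print(to_dwc_csv([{"occurrenceID": "EIA-0001", "scientificName": "Aloe ferox",
                   "eventDate": "2010-03-02", "country": "South Africa"}]))
```

Rejecting unknown keys at capture time catches misspelt term names (for example `scientficName`) before the data reach a publisher.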
    • 10. GBIF Informatics Architecture: a distributed GBIF informatics architecture, giving improved access to names, metadata and primary biodiversity data, and faster and easier publishing of data.
    • 11. DATA DISCOVERY
      ◦ GBRDS REGISTRY
      ◦ METADATA CATALOGUE
      (GBRDS: Global Biodiversity Resources Discovery System)
    • 12. DATA DISCOVERY: GBRDS REGISTRY
    • 13. GBRDS, a Discovery System. Diagram: data publishers, service publishers and others register resources with the discovery system; consumers search, discover and retrieve them.
    • 14. That links to resources…
      ◦ Who? Institutions, collections…
      ◦ What? Data, services, GUID/LSID…
      ◦ Where? Location, access points…
      ◦ When? Temporal scope…
      ◦ How? Formats, protocols, qualities
      A distributed service… which resolves to information resources…
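The who/what/where/when/how description of a registered resource can be pictured as a simple record that publishers register and consumers filter. The sketch below is a toy in-memory registry; `RegistryEntry`, `Registry` and their fields are hypothetical and do not reflect the actual GBRDS data model or interface, and the example.org URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegistryEntry:
    """A registered resource, described along the who/what/where/when/how axes."""
    who: str                  # institution or collection
    what: str                 # e.g. "occurrence dataset", "checklist", "service"
    where: str                # access point URL
    when: tuple[int, int]     # temporal scope as (first year, last year)
    how: list[str] = field(default_factory=list)  # formats and protocols


class Registry:
    """Toy in-memory registry: publishers register entries, consumers search them."""

    def __init__(self) -> None:
        self._entries: list[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def search(self, what: Optional[str] = None, year: Optional[int] = None,
               protocol: Optional[str] = None) -> list[RegistryEntry]:
        """Return entries that satisfy every criterion that was supplied."""
        return [
            e for e in self._entries
            if (what is None or e.what == what)
            and (year is None or e.when[0] <= year <= e.when[1])
            and (protocol is None or protocol in e.how)
        ]


registry = Registry()
registry.register(RegistryEntry("Institution A", "occurrence dataset",
                                "http://example.org/a", (2005, 2009), ["DwC-A"]))
registry.register(RegistryEntry("Institution B", "checklist",
                                "http://example.org/b", (2008, 2010), ["DwC-A", "TAPIR"]))
print([e.who for e in registry.search(year=2010)])
```

A real discovery system would be distributed and persistent; the point here is only that structured who/what/where/when/how fields make resources filterable.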
    • 15. Global Biodiversity Resources Discovery System
      ◦ Institutions/Collections
      ◦ LSIDs/DOI/GUIDs
      ◦ Standards
      ◦ Protocols
      ◦ Resources
      ◦ Services/Applications
      ◦ etc.
    • 16. Global Biodiversity Resources Discovery System
      ◦ Institutions/Collections
      ◦ LSIDs/DOI/GUIDs
      ◦ Standards
      ◦ Protocols
      ◦ Resources
      ◦ Services/Applications
      ◦ etc.
      GBRDS Registry Release: April 2010
    • 17. DATA DISCOVERY: METADATA CATALOGUES
    • 18. Two perspectives on metadata (Craglia: http://www.ec-gis.org/Workshops/6ec-gis/papers/craglia-metadata.doc)
      ◦ Data producer perspective
        ▪ Document data with minimum effort
        ▪ Assess the value of the data for others
        ▪ Bridge the gap between data owners and users
        ▪ Educate users about the characteristics of the data
      ◦ User perspective
        ▪ Discover if data exists
        ▪ Identify source, provenance
        ▪ Make judgement about data quality and usability before getting it
        ▪ Minimise costs involved in the search, retrieval, integration and use of the data
    • 19. Two levels of metadata
      ◦ Discovery metadata: discover if a resource exists; get information on
        ▪ Ownership
        ▪ Location
        ▪ How to get further information
      ◦ Full metadata: provides a full description of the resource, including
        ▪ Data quality
        ▪ Data lineage
        ▪ Full access and exploitation
    • 20. Metadata Standards
      ◦ Natural Collections Descriptions (NCD)
      ◦ Ecological Metadata Language (EML)
      ◦ ISO 19115/19139
      ◦ FGDC Biological Data Profile
      ◦ Dublin Core
      ◦ MRTG Multimedia Metadata Schema
      IPT 1.1 Metadata Profile
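As an illustration of one of these standards, the sketch below builds a bare-bones Ecological Metadata Language (EML) document with Python's standard `xml.etree.ElementTree`. It assumes the EML 2.1.1 namespace and includes only a title, creator and contact; it has not been validated against the EML schema or the IPT 1.1 metadata profile, and the `system` value and example names are placeholders.

```python
import xml.etree.ElementTree as ET

# EML 2.1.1 namespace; check which EML version your metadata profile targets.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def minimal_eml(package_id: str, title: str, organisation: str, email: str) -> str:
    """Build a bare-bones EML dataset description with title, creator and contact."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "http://example.org"})  # placeholder
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # The same organisation acts as creator and contact in this sketch.
    for role in ("creator", "contact"):
        party = ET.SubElement(dataset, role)
        ET.SubElement(party, "organizationName").text = organisation
        ET.SubElement(party, "electronicMailAddress").text = email
    return ET.tostring(root, encoding="unicode")


print(minimal_eml("eia-demo-1", "EIA vegetation survey", "Example Consultants", "data@example.org"))
```

This corresponds roughly to "discovery metadata"; full metadata (coverage, methods, quality, lineage) would add further EML elements.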
    • 21. DATA PUBLISHING
    • 22. Key Components: the IPT. The Integrated Publishing Toolkit (IPT) is a state-of-the-art tool to simplify the mobilisation of biodiversity information resources such as names, metadata and primary biodiversity data. Diagram: data publisher registration (GBRDS) + publishing of names, metadata, primary biodiversity data, etc.
    • 23. Simple process! The Integrated Publishing Toolkit (IPT) is designed to simplify the mapping, indexing and harvesting of Names, Metadata and Primary Biodiversity Data!
    • 24. GBIF Integrated Publishing Toolkit (IPT)
      ◦ Open source Java web application
      ◦ Bypasses the limitations of traditional wrapper tools in publishing large amounts of data by publishing whole datasets as DwC-Archive dumps (especially useful for small data publishers or those with little or no internet access)
      ◦ Has a richer environment than current wrapper tools, providing some data cleaning, visualisation capabilities, and the ability to publish dataset metadata
      ◦ Documentation and download: http://code.google.com/p/gbif-providertoolkit/
      ◦ Demo site: http://ipt.gbif.org
    • 25. IPT publishes through… [diagram] More to come…
      * Darwin Core (Text-Archive) is based on the standard submitted to TDWG for review in Feb 2009.
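A Darwin Core Archive is, in essence, a zip file holding one or more delimited text files plus a `meta.xml` descriptor that maps each column to a Darwin Core term. The following sketch packages an occurrence core that way, assuming tab-delimited data with a header row and the element and attribute names of the Darwin Core text guidelines as later published by TDWG; check it against the current guide before relying on it. An `eml.xml` metadata file, which archives normally carry, is omitted for brevity.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"


def build_dwca(columns: list[str], rows: list[list[str]]) -> bytes:
    """Package tab-delimited occurrence rows plus a meta.xml descriptor as zip bytes.

    Assumes values contain no tabs or newlines; column 0 is the record identifier.
    """
    data = "".join("\t".join(line) + "\n" for line in [columns, *rows])

    # One <field> per column, mapping its index to a Darwin Core term URI.
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{term}"/>' for i, term in enumerate(columns)
    )
    meta = f"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
{fields}
  </core>
</archive>
"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data)
        archive.writestr("meta.xml", meta)
    return buffer.getvalue()


blob = build_dwca(["occurrenceID", "scientificName"], [["EIA-0001", "Aloe ferox"]])
print(len(blob), "bytes")
```

Shipping the whole dataset as one file is what lets publishers with poor connectivity avoid running a live query service.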
    • 26. IPT Demo
      ◦ Screencast of IPT demo
      ◦ GBIF Help Desk (helpdesk@gbif.org)
      IPT 1.1 Release: April 2010
    • 27. NAMES DATA
    • 28. Scope of the Global Names Architecture: referencing names in checklists to a common nomenclatural index
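Referencing checklist names to a common index usually starts by reducing each name string to a canonical form and looking that up. The sketch below assumes a very simple canonicalisation (genus plus lowercase epithets, rank markers dropped, authorship discarded) and a hypothetical index dictionary with made-up identifiers. It does exact matching only, so misspellings such as `Aloe ferrox` stay unmatched, and lowercase author particles (e.g. "de") would be misread as epithets; a real names service would add fuzzy matching and authorship checks.

```python
from typing import Optional

RANK_MARKERS = {"subsp.", "var.", "f."}


def canonical(name: str) -> str:
    """Reduce a name string to 'Genus epithet [epithet]' by dropping ranks and authorship."""
    words = name.split()
    kept = words[:1]                      # the genus
    for word in words[1:]:
        if word in RANK_MARKERS:
            continue                      # rank markers are not part of the canonical form
        if not word.islower():
            break                         # capitals, brackets or digits: authorship begins
        kept.append(word)
    return " ".join(kept)


def match_checklist(checklist: list[str], index: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each checklist name to an index identifier, or None if there is no exact match."""
    return {name: index.get(canonical(name)) for name in checklist}


index = {"Aloe ferox": "name:1", "Panthera leo persica": "name:2"}  # hypothetical identifiers
print(match_checklist(["Aloe ferox Mill.", "Aloe ferrox"], index))
```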
    • 29. Checklist Bank: a Name Services brokerage. A global broker of taxonomic data and an index of taxonomic catalogues and annotated checklists; it extends the GBIF network to support publishing species-level data.
    • 30. Publishing Checklists to GBIF
      ◦ Using the Integrated Publishing Toolkit
      ◦ Via pre-composed spreadsheet templates
      ◦ Exporting according to the DwC Archive format and registering a local data file (self-serve)
      ◦ GBIF desktop publishing tool
      ◦ Other taxonomic editors (EDIT/ITIS) that support the DwC Archive format
    • 31. Desktop Annotated Checklist Builder (Mac OS/Windows; Q3 2010)
      ◦ Create, manage and publish synonymised checklists, vernacular names, distribution data, bibliography, and type/specimen data
      ◦ Publishes in a “GBIF-ready” format: DwC Archive, a simple, extensible text-based format
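In a synonymised checklist published as a DwC Archive, synonyms typically point at their accepted name through `acceptedNameUsageID`. A minimal sketch of resolving that link, assuming each taxon is a dictionary keyed by `taxonID` and that `taxonomicStatus` is the plain string `synonym` (real vocabularies distinguish several synonym types); the taxon identifiers in the example are made up.

```python
def accepted_name(taxon_id: str, taxa: dict[str, dict]) -> str:
    """Follow acceptedNameUsageID links from a taxon to its accepted scientificName."""
    seen: set[str] = set()
    current = taxa[taxon_id]
    while current.get("taxonomicStatus") == "synonym":
        if current["taxonID"] in seen:
            raise ValueError(f"synonym cycle involving {current['taxonID']}")
        seen.add(current["taxonID"])
        current = taxa[current["acceptedNameUsageID"]]
    return current["scientificName"]


taxa = {
    "1": {"taxonID": "1", "scientificName": "Vachellia karroo", "taxonomicStatus": "accepted"},
    "2": {"taxonID": "2", "scientificName": "Acacia karroo",
          "taxonomicStatus": "synonym", "acceptedNameUsageID": "1"},
}
print(accepted_name("2", taxa))
```

The cycle check guards against badly edited checklists in which two synonyms point at each other.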
    • 32. Controlled Vocabularies Server (vocabularies.gbif.org)
      ◦ ISO: Countries
      ◦ ISO: Language
      ◦ DwC: Basis of Record
      ◦ DwC: Nomenclatural Status
      ◦ DwC: Sex (Gender)
      ◦ DwC: Taxonomic Status
      ◦ IUCN: Threat Status
      ◦ …
      Vocabularies publishing platform: internationalise all GBIF vocabularies
    • 33. Controlled Vocabularies Server (vocabularies.gbif.org): create, manage and publish extensions to Darwin Core that extend occurrence data and species data. Tie extensions to vocabularies that are also drafted and published on this system, then translate them into your native language.
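The point of a controlled vocabulary is that free-text values can be mapped onto agreed terms and then shown with translated labels. The sketch below assumes an illustrative subset of the Darwin Core `basisOfRecord` vocabulary and a hypothetical label table; it is not the vocabularies.gbif.org service or its data.

```python
from typing import Optional

# Illustrative subset of the Darwin Core basisOfRecord vocabulary.
BASIS_OF_RECORD = {"PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
                   "HumanObservation", "MachineObservation"}

# Hypothetical label table: term -> language code -> display label.
LABELS = {"HumanObservation": {"en": "Human observation", "fr": "Observation humaine"}}


def normalise(value: str, vocabulary: set[str]) -> Optional[str]:
    """Map a free-text value onto a vocabulary term, ignoring case, spaces and underscores."""
    key = value.replace(" ", "").replace("_", "").lower()
    for term in vocabulary:
        if term.lower() == key:
            return term
    return None  # not in the vocabulary: flag for review rather than guess


def label(term: str, lang: str) -> str:
    """Return the label for term in lang, falling back to the term itself."""
    return LABELS.get(term, {}).get(lang, term)


term = normalise("human observation", BASIS_OF_RECORD)
print(term, "->", label(term, "fr"))
```

Storing the canonical term and translating only at display time keeps the data language-neutral.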
    • 34. DATA QUALITY & FITNESS-FOR-USE
    • 35. Fitness-for-use
      ◦ Primary biodiversity data can be used for multiple purposes by various user communities worldwide.
      ◦ Assessing and enhancing the fitness-for-use of data is therefore critical for the scientific and social relevance of biodiversity science.
      ◦ Fitness-for-use varies from one use case to another.
      ◦ Data quality assessment and quality control are important components of a 'fitness-for-use' regime.
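Because fitness-for-use depends on the use case, the same record can pass one set of criteria and fail another. The sketch below assumes two hypothetical use profiles built on the Darwin Core terms `coordinateUncertaintyInMeters` and `year`; the thresholds are made up for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseProfile:
    """Hypothetical fitness-for-use criteria for one use case."""
    max_uncertainty_m: float   # largest acceptable coordinateUncertaintyInMeters
    min_year: int              # oldest acceptable value of the Darwin Core 'year' term


def fit_for_use(record: dict, profile: UseProfile) -> bool:
    """True if the record meets the profile; records lacking the fields are rejected."""
    uncertainty = record.get("coordinateUncertaintyInMeters")
    year = record.get("year")
    if uncertainty is None or year is None:
        return False  # fitness cannot be judged without these fields
    return uncertainty <= profile.max_uncertainty_m and year >= profile.min_year


site_assessment = UseProfile(max_uncertainty_m=100, min_year=2000)       # hypothetical
regional_overview = UseProfile(max_uncertainty_m=10_000, min_year=1950)  # hypothetical
record = {"scientificName": "Aloe ferox", "coordinateUncertaintyInMeters": 2000, "year": 1985}
print(fit_for_use(record, site_assessment), fit_for_use(record, regional_overview))
```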
    • 36. Loss of Data Quality
      ◦ At the time of collection
      ◦ During digitisation
      ◦ During documentation
      ◦ During storage and archiving
      ◦ During analysis and manipulation
      ◦ During dissemination and presentation
      ◦ Through the use to which they are put
    • 37. Issues influencing data quality
      ◦ Accuracy and precision
      ◦ Completeness
      ◦ Currency and timeliness
      ◦ Update frequency
      ◦ Consistency
      ◦ Flexibility
      ◦ Transparency
      ◦ Performance measures and targets
      ◦ Data cleaning
      ◦ Outliers
      ◦ Setting targets for improvement
      ◦ Truth in labelling
      ◦ Error and bias
      ◦ Uncertainty
      ◦ Auditability
      ◦ Edit controls
      ◦ Minimise duplication and reworking of data
      ◦ Maintenance of original (or verbatim) data
      ◦ Categorisation can lead to loss of data and quality
      ◦ Documentation
      ◦ Feedback
      ◦ Education and training
      ◦ Accountability
    • 38. Data quality: Responsible Players
      ◦ Collectors
      ◦ Custodian or Curator
      ◦ Aggregator
      ◦ Publisher
      ◦ Users
    • 39. Data Cleaning: definition & framework
      ◦ Definition: a process used to detect inaccurate, incomplete or unreasonable data and then improve its quality by correcting the detected errors and omissions
      ◦ General framework for data cleaning:
        1. Define and determine error types
        2. Search and identify error instances
        3. Correct the errors
        4. Document error instances and error types
        5. Modify data entry procedures to reduce future errors
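The five-step framework for data cleaning can be sketched as code, with step 5 (changing data entry procedures) left to people. Assumptions: records are dictionaries keyed by Darwin Core terms, `eventDate` holds a `datetime.date`, and the error types and the single automatic correction are hypothetical examples. The verbatim record is kept unchanged, in line with the "Maintenance of original (or verbatim) data" item on slide 37.

```python
from datetime import date


def _name(record: dict) -> str:
    return record.get("scientificName") or ""


# Step 1: error types, each a check that returns True when the error is present.
CHECKS = {
    "missing_name": lambda r: not _name(r).strip(),
    "untidy_name": lambda r: _name(r) != " ".join(_name(r).split()),
    "latitude_out_of_range": lambda r: "decimalLatitude" in r and not -90 <= r["decimalLatitude"] <= 90,
    "longitude_out_of_range": lambda r: "decimalLongitude" in r and not -180 <= r["decimalLongitude"] <= 180,
    "future_date": lambda r: "eventDate" in r and r["eventDate"] > date.today(),
}

# Step 3: automatic corrections, only for error types that can be fixed safely.
CORRECTIONS = {
    "untidy_name": lambda r: {**r, "scientificName": " ".join(_name(r).split())},
}


def clean(records: list[dict]) -> tuple[list[dict], list[tuple[str, str, bool]]]:
    """Steps 2-4: find error instances, apply safe corrections, and log every instance."""
    log = []       # Step 4: (occurrenceID, error type, corrected?) per error instance
    cleaned = []
    for record in records:
        fixed = dict(record)  # the verbatim record itself is never modified
        for error_type, has_error in CHECKS.items():  # Step 2: search
            if has_error(record):
                correct = CORRECTIONS.get(error_type)
                if correct is not None:
                    fixed = correct(fixed)
                log.append((record.get("occurrenceID", "?"), error_type, correct is not None))
        cleaned.append(fixed)
    return cleaned, log
```

Errors without a safe correction, such as an out-of-range latitude, are only logged, leaving the fix to someone who can check the source.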
    • 40. Tools and Best Practices: http://mapstedi.colorado.edu/, http://manisnet.org/GeorefGuide.html
    • 41. Tools and Best Practices: GBIF Templates
    • 42. Best Practice Guidelines: all freely available
    • 43. Best resource… (http://www2.gbif.org/TM1.pdf)
      ◦ Chapters on
        ▪ Data Quality
        ▪ Data Cleaning
        ▪ Geo-referencing
        ▪ Generalising sensitive data
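One common way to generalise sensitive locations is to replace exact coordinates with a coarser grid cell and record that this was done. The sketch below assumes snapping to the centre of a 0.1-degree cell (roughly 11 km of latitude) and notes the change in the Darwin Core `dataGeneralizations` term; the appropriate cell size is a policy decision for each sensitive species, not something this code settles, and a fuller treatment would also update `coordinateUncertaintyInMeters`.

```python
import math


def generalise(record: dict, cell_deg: float = 0.1) -> dict:
    """Return a copy with coordinates snapped to the centre of a cell_deg grid cell."""
    out = dict(record)
    for key in ("decimalLatitude", "decimalLongitude"):
        cell = math.floor(record[key] / cell_deg)        # index of the containing cell
        out[key] = round((cell + 0.5) * cell_deg, 6)     # centre of that cell
    # Record what was done so users know the coordinates are deliberately coarse.
    out["dataGeneralizations"] = f"Coordinates generalised to the centre of a {cell_deg}-degree grid cell"
    return out


print(generalise({"scientificName": "Aloe ferox", "decimalLatitude": -33.9249, "decimalLongitude": 18.4241}))
```

Snapping to the cell centre, rather than truncating, avoids systematically shifting every generalised point towards one corner of its cell.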
    • 44. DATA HOSTING CENTERS
    • 45. Data Hosting Centers
      ◦ Cater to data publishers lacking the skills and resources to publish themselves
      ◦ Facilitate long-term archival and publishing
      ◦ GBIF plans
        ▪ Criteria for establishing DHCs
        ▪ Criteria for endorsement of DHCs
        ▪ Tools and Best Practices for DHCs
    • 46. Data Hosting Centers
    • 47. COMMUNITY BUILDING PLATFORMS
    • 48. http://community.gbif.org
    • 49. ? Email: [email_address] Skype: vishwaschavan
