EIA Biodiversity Data Mobilisation

  • to GBIF network, and for reuse by others as well
  • Nick mentioned the key challenges in his presentation entitled “Why the need for a global infrastructure to discover, share, publish and use biodiversity data”
  • Transcript of "EIA Biodiversity Data Mobilisation"

    1. GLOBAL BIODIVERSITY INFORMATION FACILITY, WWW.GBIF.ORG
       Publishing EIA Biodiversity Data: Technology and Infrastructure
       Vishwas Chavan, Nick King and Francois Rogers
       Global Biodiversity Information Facility, [email_address]
       Scoping Workshop on Developing an EIA Biodiversity Data Publishing Framework in South Africa
       2-4 March 2010, Cape Town, South Africa
    2. Contents
       • EIA Biodiversity Data: Types and formats
       • Data Capture & Digitisation tools
       • Data Discovery
       • Data Publishing
       • Data Quality & fitness-for-use
       • Data Hosting Centers
       • Community Building Platforms
    3. What are the challenges?
       • More data types
       • Richer user interface
       • Better management
       • Richer content
       • Better synchronisation
       • Improved discovery
    4. EIA BIODIVERSITY DATA: TYPES AND FORMATS
    5. EIA Biodiversity data are very diverse
       Categories shown on the slide: Evidence, Metadata, Taxon names, Taxon concepts, Observation, Literature, Species banks
       Example groups shown on the slide:
       • Indices, Nomenclators, Namebanks
       • Biology, Conservation, Ecology, Distribution, Phylogenies ...
       • Geolocation, Country, Collector, Date …
       • Voucher specimen, Blood sample, DNA Barcode, Image, Audio, Video ...
       • BHL, Plazi.org ...
    6. DATA CAPTURE AND DIGITISATION TOOLS
    7. Data Capture and Digitisation Tools
       Florin, Pandora, Taxis, Cassia, FieldNote, Mandala, ATTA, BirdRecorder
    8. uBio Tools
       • Name recognition tool (FindIT)
       • Author abbreviation resolver
       • Checking classification (TSN name mapper)
       • Deconstruct scientific name (ParseIT)
       • Find scientific name (CrawlIT)
       • etc…
       • http://www.ubio.org
    9. GBIF Templates
       • Capture data in DwC compatible format
           • Occurrence Data Template
           • Names Data Template
       • Facilitate authoring 'resource metadata'
           • Occurrence template
           • Documentation for occurrence template
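To make "DwC compatible format" concrete, here is a minimal Python sketch of capturing occurrence records under Darwin Core column headers. The column selection, the function name `to_dwc_csv` and the sample record are illustrative assumptions; they are not taken from the GBIF Occurrence Data Template itself.

```python
import csv
import io

# Darwin Core term names used as column headers. This selection is an
# illustrative assumption, not the column set of the actual GBIF template.
DWC_COLUMNS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "countryCode",
    "decimalLatitude",
    "decimalLongitude",
    "recordedBy",
]


def to_dwc_csv(records):
    """Serialise dict records to CSV text with Darwin Core headers.

    Terms missing from a record become empty cells. Keys that are not in
    DWC_COLUMNS raise ValueError, so misspelt term names surface early.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=DWC_COLUMNS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical field observation, for illustration only.
    sample = {
        "occurrenceID": "eia-demo-0001",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Protea cynaroides",
        "eventDate": "2010-03-02",
        "countryCode": "ZA",
        "decimalLatitude": -33.9249,
        "decimalLongitude": 18.4241,
        "recordedBy": "Example Surveyor",
    }
    print(to_dwc_csv([sample]))
```

Rejecting unknown column names is a deliberate choice in this sketch: a misspelt term would otherwise silently produce a column that downstream tools cannot map.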
    10. GBIF Informatics Architecture
        • Improved access to Names, Metadata and Primary Biodiversity Data
        • Distributed GBIF informatics architecture
        • Faster and easier publishing of data
    11. DATA DISCOVERY
        • GBRDS REGISTRY
        • METADATA CATALOGUE
        GBRDS: Global Biodiversity Resources Discovery System
    12. DATA DISCOVERY: GBRDS REGISTRY
    13. GBRDS, a Discovery System
        Diagram labels: Consumers; Data Publishers; Searching; Retrieving; Discovering; Discovery System; Registering; Service Publishers; Others…
    14. That links to resources…
        • Who? Institutions, Collections …
        • What? Data, Services, GUID/LSID…
        • Where? Location, Access points…
        • When? Temporal Scope…
        • How? Formats, protocols, qualities
        A distributed service … which resolves to information resources …
    15. Global Biodiversity Resources Discovery System
        • Institutions/Collections
        • LSIDs/DOI/GUIDs
        • Standards
        • Protocols
        • Resources
        • Services/Applications
        • etc…
    16. Global Biodiversity Resources Discovery System
        • Institutions/Collections
        • LSIDs/DOI/GUIDs
        • Standards
        • Protocols
        • Resources
        • Services/Applications
        • etc…
        GBRDS Registry Release: April 2010
    17. DATA DISCOVERY: METADATA CATALOGUES
    18. Two perspectives on metadata
        Data Producer Perspective
        • Document data with minimum effort
        • Assess the value of the data for others
        • Bridge the gap between data owners and users
        • Educate users about the characteristics of the data
        User Perspective
        • Discover if data exists
        • Identify source, provenance
        • Make judgement about data quality and usability before getting it
        • Minimise costs involved in the search, retrieval, integration and use of the data
        Craglia: http://www.ec-gis.org/Workshops/6ec-gis/papers/craglia-metadata.doc
    19. Two levels of metadata
        Discovery Metadata
        • Discover if a resource exists; get information on:
            • Ownership
            • Location
            • How to get further information
        Full Metadata
        • Provides a full description of the resource, including:
            • Data quality
            • Data lineage
            • Full access and exploitation
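The two levels can be pictured as a small record nested inside a larger one. The Python sketch below is only an illustration of that relationship. The class and field names are assumptions and do not follow any of the metadata standards named in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiscoveryMetadata:
    """Enough to learn that a resource exists and where to ask for more."""

    title: str
    owner: str      # ownership
    location: str   # where the resource lives, e.g. a URL
    contact: str    # how to get further information


@dataclass
class FullMetadata(DiscoveryMetadata):
    """Full description: the discovery fields plus quality, lineage, access."""

    data_quality: str = ""
    lineage: List[str] = field(default_factory=list)
    access_terms: str = ""

    def discovery_view(self) -> DiscoveryMetadata:
        """Return only the discovery-level subset of this record."""
        return DiscoveryMetadata(self.title, self.owner, self.location, self.contact)


if __name__ == "__main__":
    # Hypothetical dataset description, for illustration only.
    full = FullMetadata(
        title="Example EIA bird survey",
        owner="Example Consultancy",
        location="http://example.org/datasets/birds",
        contact="data@example.org",
        data_quality="Coordinates recorded by handheld GPS",
        lineage=["field notebook", "spreadsheet", "DwC export"],
        access_terms="Free for non-commercial use",
    )
    print(full.discovery_view())
```

A catalogue could publish only `discovery_view()` for searching and hand out the full record on request.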
    20. Metadata Standards
        • Natural Collections Descriptions (NCD)
        • Ecological Metadata Language (EML)
        • ISO 19115/19139
        • FGDC Biological Data Profile
        • Dublin Core
        • MRTG Multimedia Metadata Schema
        IPT 1.1 Metadata Profile
    21. DATA PUBLISHING
    22. Key Components: the IPT
        The Integrated Publishing Toolkit (IPT) is a state-of-the-art tool to simplify the mobilisation of biodiversity information resources such as Names, Metadata and primary biodiversity data.
        Data Publisher Registration (GBRDS) + Publishing of Names, Metadata, Primary biodiversity data etc…
    23. Simple process!
        The Integrated Publishing Toolkit (IPT) is designed to simplify the mapping, indexing and harvesting of Names, Metadata and Primary Biodiversity Data!
    24. GBIF Integrated Publishing Toolkit (IPT)
        • Open source Java web application
        • Bypasses limitations of traditional wrapper tools in publishing large amounts of data by publishing whole datasets in DwC-Archive dumps (especially useful for small data publishers or those with little or no internet access)
        • Has a richer environment than current wrapper tools, providing some data cleaning, visualisation capabilities, and the ability to publish dataset metadata
        • Documentation and download
            • http://code.google.com/p/gbif-providertoolkit/
        • Demo site
            • http://ipt.gbif.org
    25. IPT Publishes Through…
        More to come….
        * Darwin Core (Text-Archive) based on standard submitted to TDWG for review Feb 2009
    26. IPT Demo
        • Screencast of IPT demo
        • GBIF Help Desk (helpdesk@gbif.org)
        IPT 1.1 Release: April 2010
    27. NAMES DATA
    28. Scope of the Global Names Architecture
        Referencing names in Checklists to a common Nomenclatural Index
    29. Checklist Bank: a Name Services brokerage
        • Global broker of taxonomic data
        • Index of Taxonomic Catalogues and Annotated Checklists
        • Extends the GBIF network to support publishing Species-level data
    30. Publishing Checklists to GBIF
        • Using Integrated Publishing Toolkit
        • Via pre-composed Spreadsheet templates
        • Exporting according to DwC Archive format and registering a local data file (self-serve)
        • GBIF desktop publishing tool
        • Other taxonomic editors (EDIT/ITIS) that support DwC Archive format
    31. Desktop Annotated Checklist Builder
        Create, manage, publish:
        • Synonymised checklists
        • Vernacular Names
        • Distribution data
        • Bibliography
        • Type/Specimen data
        Mac OS/Windows
        Publishes “GBIF-ready” format: DwC Archive, a simple, extensible text-based format
        Q3 2010
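As a rough picture of what a "simple, extensible text-based format" can look like, the Python sketch below zips a tab-delimited data file together with a `meta.xml` descriptor that maps columns to Darwin Core terms. The descriptor layout reflects my understanding of the Darwin Core text guidelines and should be checked against the current specification. The file names, function names and sample row are assumptions, and this is not how the IPT or the checklist builder produce their archives.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(columns):
    """Describe a tab-delimited core file; column 0 is treated as the record id."""
    archive = ET.Element("archive", {"xmlns": "http://rs.tdwg.org/dwc/text/"})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # literal backslash-t in the XML
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "occurrence.txt"
    ET.SubElement(core, "id", {"index": "0"})
    for index, name in enumerate(columns[1:], start=1):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + name})
    return ET.tostring(archive, encoding="unicode")


def build_archive(columns, rows):
    """Return the bytes of a zip holding occurrence.txt and meta.xml."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    data = "\n".join(lines) + "\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", data)
        zf.writestr("meta.xml", build_meta_xml(columns))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical single-row dataset, for illustration only.
    payload = build_archive(
        ["occurrenceID", "scientificName", "countryCode"],
        [["eia-demo-0001", "Protea cynaroides", "ZA"]],
    )
    with open("example-dwca.zip", "wb") as handle:
        handle.write(payload)
```

Because the data file is plain delimited text, a publisher with little or no internet access can prepare the archive offline and hand over a single zip.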
    32. Controlled Vocabularies Server
        • ISO: Countries
        • ISO: Language
        • DwC: Basis of Record
        • DwC: Nomenclatural Status
        • DwC: Sex (Gender)
        • DwC: Taxonomic Status
        • IUCN: Threat Status
        • …
        vocabularies.gbif.org
        Vocabularies publishing platform: internationalise all GBIF vocabularies
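One use of controlled vocabularies is to check captured values before publishing. The Python sketch below does this with hard-coded sets. The `basisOfRecord` values are a subset of the class names Darwin Core recommends, the country codes are a three-entry sample of ISO 3166-1 alpha-2, and the function name is hypothetical. A real workflow would load the published vocabularies rather than embed them.

```python
# Illustrative, hard-coded vocabularies. These are assumptions for the
# sketch and are not fetched from vocabularies.gbif.org.
BASIS_OF_RECORD = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
}
COUNTRY_CODES = {"ZA", "NA", "BW"}  # tiny sample of ISO 3166-1 alpha-2 codes

VOCABULARIES = {
    "basisOfRecord": BASIS_OF_RECORD,
    "countryCode": COUNTRY_CODES,
}


def vocabulary_errors(record, vocabularies=VOCABULARIES):
    """Return one message per term whose value is outside its vocabulary.

    Terms absent from the record are skipped: completeness is a separate check.
    """
    errors = []
    for term, allowed in vocabularies.items():
        value = record.get(term)
        if value is not None and value not in allowed:
            errors.append(f"{term}: {value!r} is not in the controlled vocabulary")
    return errors


if __name__ == "__main__":
    print(vocabulary_errors({"basisOfRecord": "observation", "countryCode": "ZA"}))
```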
    33. Controlled Vocabularies Server
        Create, manage, publish extensions to Darwin Core:
        • Extend Occurrence Data
        • Extend Species Data
        vocabularies.gbif.org
        Tie to vocabularies that are also drafted and published to this system. Then translate to your native language.
    34. DATA QUALITY & FITNESS-FOR-USE
    35. Fitness-for-use
        • Primary biodiversity data can be used for multiple purposes by various user communities worldwide.
        • Assessing and enhancing fitness-for-use of data is therefore critical for the scientific and social relevance of biodiversity science.
        • Fitness-for-use varies from one use case to another.
        • Data quality assessment and quality control are important components of a 'fitness-for-use' regime.
    36. Loss of Data Quality
        • At the time of collection
        • During digitisation
        • During documentation
        • During storage and archiving
        • During analysis and manipulation
        • During dissemination and presentation
        • Through the use to which they are put
    37. Issues influencing data quality
        • Accuracy and precision
        • Completeness
        • Currency and Timeliness
        • Update frequency
        • Consistency
        • Flexibility
        • Transparency
        • Performance measures and targets
        • Data cleaning
        • Outliers
        • Setting targets for improvement
        • Truth in labelling
        • Error and bias
        • Uncertainty
        • Auditability
        • Edit Controls
        • Minimise duplication and reworking of data
        • Maintenance of original (or verbatim) data
        • Categorisation can lead to loss of data and quality
        • Documentation
        • Feedback
        • Education and Training
        • Accountability
    38. Data quality: Responsible Players
        • Collectors
        • Custodian or Curator
        • Aggregator
        • Publisher
        • Users
    39. Data Cleaning: definition & framework
        • A process used to identify inaccurate, incomplete, or unreasonable data and then improve its quality through correction of detected errors and omissions
        • General framework for data cleaning
            • Define and determine error types
            • Search and identify error instances
            • Correct the errors
            • Document error instances and error types; and
            • Modify data entry procedures to reduce future errors
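The framework above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three error types, the rule that out-of-range coordinates are cleared while missing names are only flagged, and all function names. The fifth step (changing data entry procedures) is organisational, so the sketch only supplies a summary that could inform it.

```python
from collections import Counter


# Step 1: define error types as named checks (an illustrative selection).
def _missing_name(rec):
    return not rec.get("scientificName")


def _latitude_out_of_range(rec):
    lat = rec.get("decimalLatitude")
    return lat is not None and not (-90 <= lat <= 90)


def _longitude_out_of_range(rec):
    lon = rec.get("decimalLongitude")
    return lon is not None and not (-180 <= lon <= 180)


ERROR_TYPES = {
    "missing_name": _missing_name,
    "latitude_out_of_range": _latitude_out_of_range,
    "longitude_out_of_range": _longitude_out_of_range,
}

# Error types this sketch corrects automatically, and the field each one clears.
COORDINATE_FIELDS = {
    "latitude_out_of_range": "decimalLatitude",
    "longitude_out_of_range": "decimalLongitude",
}


def find_errors(records):
    """Step 2: return (record index, error type) for every failed check."""
    return [
        (index, name)
        for index, rec in enumerate(records)
        for name, check in ERROR_TYPES.items()
        if check(rec)
    ]


def clean(records):
    """Steps 3 and 4: correct what is safe to correct and log every instance.

    The input list is left untouched so the original (verbatim) data survive.
    """
    cleaned = [dict(rec) for rec in records]
    log = []
    for index, error in find_errors(records):
        field = COORDINATE_FIELDS.get(error)
        if field is not None:
            cleaned[index][field] = None
            action = "value cleared"
        else:
            action = "flagged for review"
        log.append({"record": index, "error": error, "action": action})
    return cleaned, log


def error_summary(log):
    """Input for step 5: how often each error type occurred."""
    return Counter(entry["error"] for entry in log)
```

Working on copies rather than in place follows the deck's own point about maintaining original (verbatim) data.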
    40. Tools and Best Practices
        • http://mapstedi.colorado.edu/
        • http://manisnet.org/GeorefGuide.html
    41. Tools and Best Practices
        GBIF Templates
    42. Best Practice Guidelines
        All freely available
    43. Best resource…
        Chapters on:
        • Data Quality
        • Data Cleaning
        • Geo-referencing
        • Generalising sensitive data
        http://www2.gbif.org/TM1.pdf
    44. DATA HOSTING CENTERS
    45. Data Hosting Centers
        • Caters to data publishers without skills & resources
        • Facilitate long term archival and publishing
        • GBIF Plans
        • Criteria for establishing DHC
        • Criteria for endorsement of DHC
        • Tools and Best Practices for DHC
    46. Data Hosting Centers
    47. COMMUNITY BUILDING PLATFORMS
    48. http://community.gbif.org
    49. ?
        Email: [email_address]
        Skype: vishwaschavan