Bibliographic References in BHL

Published in: Technology, Education
  • Guidelines for speakers giving presentations
    Presentations are limited to 15 minutes for each speaker plus 5 minutes for discussion.
    Presentations should clearly answer the following questions (7-8 slides), definitely focusing on the interoperability problem:
    – Type of content we discuss (e.g., occurrences, genes, behaviour, morphology, etc.)
    – Sources of content (from where)
    – Formats of content (formats, standards)
    – Methods of gathering information (e.g., harvesting, ftp uploads, protocols)
    – Methods of delivery of information (e.g., free searches, API, web services, automated exports, linking mechanisms, etc.; provide links to API and web services documentation)
    – Identifiers used (type, persistence, dereferencing, resolvability)
    – Present or forthcoming interoperability features with other platforms
    – Constraints, needs and expectations to: a) Suppliers of content, and b) Users of content
    Overall picture of what is needed within a certain domain (e.g., for names, references, genes, images, etc.) (2-3 slides)
    The final outputs of presentations and discussions should be two-fold:
    – Summary table encompassing the answers to the above questions, that will be a basis for the whitepaper and future work
    – MoU draft discussed
    Proposing an Advisory Board of key stakeholders that will form the ground for a consortium to develop and launch the future BKMS
    Tasks involved:
    – Task 2.1. Coordination and routes for cooperation across organizations, projects and e-infrastructures (lead: Plazi). Encompassing the information gathered at Workshop 1 (Leiden, February 2013) and through the online questionnaire.
    – Task 4.1. Improve technical cooperation and interoperability at the e-infrastructure level (lead: FUB-BGBM).
    – Task 4.2. Promote and monitor the development and adoption of common mark-up standards and interoperability between schemas by identifying technical and societal constraints and needs to increase collaboration and interoperability between e-platforms and projects, and by envisioning practical solutions towards the Biodiversity Knowledge Management System (lead: Plazi).
    =============
    Concrete examples of ideas for potential points in a draft MoU
    A primary purpose of the “Routes towards cooperation” meeting is to increase our reciprocal understanding and progress towards a multi-institutional Memorandum of Understanding (MoU). The following points are potential points in a draft MoU. Comments on them, and further points, are welcome here on the wiki before the meeting takes place. The results would then have to be further discussed by the appropriate levels.
    – Establishment of a multi-institutional focus group to coordinate software development to improve the efficiency of resource use by means of common Open Source based development projects using Open Source methodology.
    – Agreements on specialization, e.g., one institution specializes in geographical analysis and visualization, providing services to other institutions or projects.
    – Agreement on long-term management procedures to provide stable identifiers. This agreement may be technology neutral (except that some way to use the identifiers in the human readable as well as semantic web should be specified). Both stable http-URIs (preferred in semantic web) and DOI technology (publishing industry) are possible implementations.
    – Agreement on following the Linked Open Data example. (Note: Edinburgh may be a best practices example?)
    – Agreement to communicate the data policies according to the Linked Open Data five star scoring.
    – Policy agreements on Open Access.
    – Agreement to register all services that are provided to other Biodiversity institutions in the Biodiversity Catalogue (Univ. Manchester, myExperiment).
    – Agreement to communicate the expected and planned stability of services by means of a standard vocabulary (e.g.: undecided, experimental, long-term service without fixed API, long-term service with stable and versioned API).
    – Agreement to collaborate on the development of shared term definitions (glossary-style) with the understanding that new terms can be freely added, but an effort will be made to re-use or improve existing term definitions.
    – Agreement on crowdsourcing activities to clean up data, e.g. bibliographic references, or markup content in legacy literature, e.g. scientific names, treatments, material citations.
    – Paul Kirk: Centrally 'cached' data should have a clear mechanism for providing usage statistics back to sources.
  • [Portal User Interface]
  • [Book Viewer Interface]
  • We ask the user to provide metadata if they’re generating a chapter or book title
  • On legacy literature, what are your plans with BHL, and especially your move into content?
    – Growth
    – More Global Content
    – Taxon Names
    – Article Metadata
    – Microcitations and COinS
    – API
    – ZooBank
    – OCR improvements through Gaming
    – Crowdsource Markup
    – WFO?
  • [Citebank homepage]
  • [Citebank stats]
  • [World in which CiteBank lives]
  • [Citations in BHL and Sustainability Considerations]
  • [GNA Diagram]
  • [Define functional requirements]
  • [Where are we going?]
  • [Diagram of citations reconciliation]
  • Bibliographic References in BHL

    1. Bibliographic references in BHL
       Coordination and routes for cooperation across organizations, projects and e-infrastructures
       23rd of May 2013
       William Ulate R., Missouri Botanical Garden
    2. Questions to Answer
       1. Type of content we discuss (e.g., occurrences, genes, behaviour, morphology, etc.)
       2. Sources of content (from where)
       3. Formats of content (formats, standards)
       4. Methods of gathering information (e.g., harvesting, ftp uploads, protocols)
       5. Methods of delivery of information (e.g., free searches, API, web services, automated exports, linking mechanisms, etc.; provide links to API and web services documentation)
       6. Identifiers used (type, persistence, dereferencing, resolvability)
       7. Present or forthcoming interoperability features with other platforms
       8. Constraints, needs and expectations to:
          a) Suppliers of content, and
          b) Users of content
       9. What is needed for Bibliographic References?
    3. A brief history…
    4. The Biodiversity Heritage Library
       www.biodiversitylibrary.org
    5. Book Viewer
    6. Sharing
       BHL shares data through:
       • APIs
       • Data Export
       • OpenURL
       • OAI-PMH
    7. Open Data
       • Downloads
         – Simple tab-delimited exports of core data
         – http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf
       • Data model
         – DB schema as ERD
         – http://bhl-bits.googlecode.com/files/20090930_BHLDataModel.pdf
    8. Services
       • Names Service
         – Return all occurrences of a name throughout BHL digitized corpus
           • Documentation: http://bit.ly/2e6sg9
         – Access to 100+ million name strings using TaxonFinder & NetiNeti
           • 1.5 million unique names
         – Algorithm to detect nomenclatural & taxonomic acts
       • OpenURL
         – Facilitate links to citations: protologues, articles, references
           • Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
         – Useful to Nomenclators, Reference Systems
           • IPNI
           • Tropicos
    9. Services: OpenURL
       http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879
       http://www.tropicos.org/Name/1200408
    10. DOIs
    11. DOIs for Legacy Literature
        • BHL member of CrossRef through Smithsonian
        • Started assigning DOIs to BHL monographs
          – Low hanging fruit: Easy, non-controversial
          – 54,856 DOIs Approved to date
        • Next, other publication types / articles?
          – Process of automatically assigning CrossRef DOIs to articles has a higher potential for collisions.
    12. Article-level metadata
        • Disambiguating and locating structural components in the corpus
        • Done by automated and crowdsourced means
          – Thanks Rod Page! Welcome others!
        • Greatly increases semantic value of the dataset
        • Makes data addressable and thus linkable
        Chapter-level metadata
        Treatment-level metadata
        Part-level metadata
    13. Genesis: “BHL Article Repository”
        • Idea first introduced at TDWG 2008, Fremantle (by BHL, many have discussed for years)
        • YouTube for biodiversity articles
        • Needed (need) a way to access articles in BHL
          – “BHL has no articles.”
          – BHL has hundreds of thousands of articles but you can’t search for them via author, article title search
          – Can find via “article coordinates” using BHL’s UI & OpenURL resolver: Journal / Volume / Start Page / Year
    14. CiteBank
        • Objectives
          – Create a repository for community-vetted taxonomic bibliographies.
          – Ability to ingest, display, download, and index articles so that the BHL can operate as an article repository.
          – Provide links to content published online through other repositories.
        • Launched on December 6th 2010
        • 185609 bibliographic records to date
    15. Citations today: http://citebank.org
    16. Citations Providers
    17. Specimen Databases, Commercial Aggregators, Software Tools, Open Access Digital Libraries, Indices, Nomenclators, Open Access Publishers, International Collaborative Projects
    18. Lessons Learned
        • Biblio/Drupal data model insufficient for mass of data envisioned for all biodiversity, too flat and difficult to expand in collaboration with Biblio development community
        • Data providers want their content findable and managed in the Biodiversity Heritage Library, not a system alongside BHL
        • Maintaining two platforms for biodiversity literature threatens sustainability of the literature resources over the longer term
    19. Global Names Architecture
    20. What have we done?
        • Articles
          – Extended BHL data model to store article metadata
          – Built process to harvest data from BioStor
        • Created user interfaces for adding article metadata and associated files
          – Defined functional requirements as improvements to Drupal-based Citebank
          – Defined process flow for adding article metadata and associated files
          – Implemented UI changes
            • Changed BHL UI to accommodate article search
            • Changed BHL UI to accommodate article display (TOC)
    21. Articles in the BHL UI
    22. Articles
    23. Articles
    24. Articles
    25. Requirements for a citation repository?
        Admin. Interface
        – IMPORT AND MAPPING TOOL
          • Preview/Accept/Reject/Undo/Report on Import
          • No standard schema, MODS or BibTeX
          • Drag & drop GUI or mapped source and target field config.
        – USER MANAGEMENT
          • Self-Registration
          • Admin. Approval & Deletion
          • User Roles Assignment
        – GLOBAL UPDATES
    26. Requirements for a citation repository?
        General User Interface
        – IMPORT
          • Upload/Preview/Accept/Reject/Undo/Report on Import
        – CREATE CITATION
          • By filling a Form, via BibTeX
        – BROWSE
          • Faceted: title, author, subject, year, contributor, my citations
    27. Requirements for a citation repository?
        • CITATION TYPES
          – Journal Article, Book Chapter, Conference Proceedings, Conference Paper, Thesis, Government Report, Note, etc.
        • OAI HARVESTING
          – Harvest and serve data through OAI-PMH
        • SPECIFICATIONS FOR DATA PROVIDERS PAGE
        • CONTRIBUTORS PAGE
          – Recognize ALL contributions
        • REPORTING
          – Statistics Page by Citation and Publication type
          – Recent/Latest Uploads
    28. What are we doing?
        • Integrate BHL’s Services with ZooBank, IPNI & IF
        • Authoritative list of titles in common use for nomenclatural acts (“TL3”)
        • Harvest relevant content from Mendeley
        • Integrate services and interfaces with the GNUB data model
        • Interoperate with citation parsing tools & services
    29. Support citation reconciliation.......
        L. Sp. Pl. 2: 971. 1753
        Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753
        Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753
        Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE digestas.. 2:971. 1753
        Zea mays
    30. Questions to Answer
        1. Type of content - Literature, Images, OCR Text and Bibliographic Citations
        2. Sources of content - BHL, CB & other Repositories
        3. Formats of content - BibTeX, MODS, DC
        4. Methods of gathering info - Harvesting, FTP Uploads
        5. Methods of delivery of info - Free Searches, API, web services, exports, linking mechanisms
        6. Identifiers used - CrossRef DOIs for Monographs
        7. Interoperability with other platforms - ZooBank, IPNI, IF
        8. Constraints, needs and expectations to suppliers of content and users of content
    31. Thank you
        pro-iBiosphere Meeting 3
        Coordination and routes for cooperation across organizations, projects and e-infrastructures
        Berlin, Germany
        May 23rd, 2013
        William.Ulate@mobot.org
        Global BHL Project Manager
        BHL Technical Director
        Senior Project Manager
        Missouri Botanical Garden
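
Slides 9 and 13 describe locating content through "article coordinates" (journal, volume, start page, year) via the OpenURL resolver. A minimal sketch of building such a link follows. The query parameter names (`pid`, `volume`, `issue`, `spage`, `date`) and the base URL are copied from the example on slide 9; the function name `bhl_openurl` is hypothetical, and the resolver may accept other parameters not shown here.

```python
from urllib.parse import urlencode

# Resolver base URL as it appears in the slide 9 example.
BHL_OPENURL = "http://www.biodiversitylibrary.org/openurl"


def bhl_openurl(title_id, volume="", issue="", spage="", date=""):
    """Build a resolver link from article coordinates (hypothetical helper)."""
    params = [
        ("pid", f"title:{title_id}"),
        ("volume", volume),
        ("issue", issue),
        ("spage", spage),
        ("date", date),
    ]
    # safe=":" leaves the colon in "title:3934" unescaped, as on the slide.
    return BHL_OPENURL + "?" + urlencode(params, safe=":")


# Coordinates taken from the slide 9 example link.
url = bhl_openurl(3934, volume=14, spage=301, date=1879)
print(url)
```

Empty coordinates are still emitted as empty parameters (for example `issue=`), which mirrors the form of the link shown on the slide.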
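
Slide 29 lists four differently written citations of the same page of Species Plantarum. One way to illustrate reconciliation is to reduce each string to a (volume, page, year) key and compare keys. The sketch below is only an assumption about how this could be approached, not the method BHL or the Global Names tools use; the regular expression and the name `citation_key` are hypothetical, and it ignores author and title matching entirely.

```python
import re

# Matches a trailing "<volume> : <page>. <year>" where the separator may be
# ":", "p." or "page" (case-insensitive). Illustrative pattern only.
_COORDS = re.compile(
    r"(\d+)\s*(?::|p\.|page)\s*(\d+)\.?\s+(\d{4})\s*$",
    re.IGNORECASE,
)


def citation_key(citation):
    """Return (volume, page, year) as integers, or None if not recognised."""
    match = _COORDS.search(citation)
    if match is None:
        return None
    volume, page, year = (int(group) for group in match.groups())
    return (volume, page, year)


# The four variants shown on slide 29.
variants = [
    "L. Sp. Pl. 2: 971. 1753",
    "Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753",
    "Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753",
    "Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, "
    "ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, "
    "Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE "
    "digestas.. 2:971. 1753",
]

keys = {citation_key(v) for v in variants}
print(keys)
```

All four variants collapse to a single key, which is the property a reconciliation service needs before it can link them to one page in the corpus. A real service would also have to match abbreviated titles such as "Sp. Pl." against full titles.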
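
Slides 6 and 27 name OAI-PMH as a channel for harvesting and serving data. The sketch below shows the general shape of an OAI-PMH `ListRecords` exchange: a first request with a `metadataPrefix`, then follow-up requests that carry only the `resumptionToken`. The verb, argument names and XML namespace come from the OAI-PMH 2.0 protocol; the endpoint `http://example.org/oai`, the sample response and the helper names are placeholders, not BHL's actual service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a given endpoint."""
    if resumption_token:
        # A resumptionToken is sent on its own, without metadataPrefix.
        query = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(query)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, (token or None)


# Made-up response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:item/1</identifier></header></record>
    <record><header><identifier>oai:example.org:item/2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(first_url, ids, next_url)
```

A harvester would repeat the request with each new token until a response arrives with no token or an empty one.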
