BHL Technical Director's Report, Mar. 2014

2014 Annual Technical Report at NYBG

Transcript

  • 1. BHL Technical Director’s Report William Ulate New York Botanical Garden March 10, 2014
  • 2. More Online Content [Chart: Pages (Millions) and Volumes (in Thousands) included in BHL, Oct-08 through Oct-13. Data labels: 22.00, 40.00, 84.86, 94.6, 105.85, 120.09, 132.86; 9.2, 16.4, 31.8, 35.4, 38.9, 41.9, 42.8. Legend: Volumes (K), Pages (M).]
  • 3. Technical Group at MBG: Mike Lichtenberg (Developer); Trish Rose-Sandler (Data Analyst); William Ulate (Technical Director)
  • 4. Technical Support MBG IT Division • Manages servers, systems and telecommunications. • Installs needed software. And others: • MBL • Internet Archive • BHL-Australia • BHL-Europe
  • 5. Technical Advisory Group
  • 6. Technical Support • BHL-Australia • BHL-Europe • MBL
  • 7. Projects • Global Names • Art of Life • Purposeful Gaming • Digging into Data
  • 8. Scientific Name Extraction • TaxonFinder algorithm in production since 2008 – More than 100 million candidate name strings – More than 1.5 million unique, verified names – Available through UI, APIs, Data Exports & Internet Archive • New collaboration with Global Names project – Improved algorithm, better precision & recall – More data with TaxonFinder and Neti Neti! – http://gnrd.globalnames.org/
  • 9. Taxon Names
      BEFORE
      Name Instances | 101,591,803 | 101,288,804
      Unique Names | 7,498,554 | 7,464,924
      Verified Names | 1,905,507 | 1,902,803
      EOL Names | 63,130,350 | 62,963,582
      EOL Pages | 13,579,868 | 13,532,684
      AFTER
      Name Instances | 151,222,182 | 150,066,425
      Unique Names | 29,246,382 | 29,091,767
      Verified Names | 10,153,165 | 10,109,540
      EOL Names | 87,791,695 | 87,135,089
      EOL Pages | 15,466,713 | 15,342,867
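To put the BEFORE and AFTER figures on slide 9 side by side, the growth factor for each row can be worked out directly. The sketch below uses only the first figure of each pair; the slide does not label the two columns, so treating the first one as the headline count is an assumption, and the names `BEFORE`, `AFTER` and `growth` are illustrative.

```python
# Counts copied from slide 9 (first figure of each pair; column meaning is assumed).
BEFORE = {
    "Name Instances": 101_591_803,
    "Unique Names": 7_498_554,
    "Verified Names": 1_905_507,
    "EOL Names": 63_130_350,
    "EOL Pages": 13_579_868,
}
AFTER = {
    "Name Instances": 151_222_182,
    "Unique Names": 29_246_382,
    "Verified Names": 10_153_165,
    "EOL Names": 87_791_695,
    "EOL Pages": 15_466_713,
}

# Growth factor per row: AFTER count divided by BEFORE count.
growth = {label: AFTER[label] / BEFORE[label] for label in BEFORE}

for label, factor in growth.items():
    print(f"{label}: x{factor:.2f}")
```

On these figures, verified names grow a little more than fivefold and unique names almost fourfold, while raw name instances grow by roughly half.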
  • 10. Article-level metadata Chapter-level metadata Treatment-level metadata Part-level metadata
  • 11. Articles in the BHL UI
  • 12. See also:
  • 13. Related Titles
  • 14. Art of Life
  • 15. Art of Life
  • 16. Art of Life
  • 17. Art of Life
  • 18. Art of Life
  • 19. Art of Life
  • 20. Macaw https://github.com/cajunjoel/macaw-book-metadata-tool
  • 21. Reviewing Metadata
  • 22. Reviewing Metadata
  • 23. Manually built: 1,714 sets 89,457 images
  • 24. Purposeful Gaming
  • 25. [Slide content is unreadable OCR output; no recoverable text]
  • 26. OCR Improvements • Gaming • Transcription
  • 27. OCR Improvements • Transcription • Purposeful Gaming • Looking at… – Crowdsource Markup
  • 28. Purposeful Gaming DIGITALKOOT • Joint project run by the National Library of Finland and Microtask to index the library's enormous archives so that they are searchable on the Internet for easier access to the Finnish cultural heritage.
  • 29. Purposeful Gaming DIGITALKOOT • Launched on Feb 8, 2011. By Nov 29, 2012, nearly 110,000 participants had completed over 8 million word-fixing tasks. • DigiTalkoot enabled volunteers to participate in this fixing work by playing games.
  • 30. Purposeful gaming and BHL: engaging the public in improving and enhancing access to digital texts • IMLS Grant Program: National Leadership Grants for Libraries • Partners: – Missouri Botanical Garden – Harvard University – Cornell University – New York Botanical Garden • P.I.: Trish Rose-Sandler, Missouri Botanical Garden • Dates: Dec 2013 – Nov. 2015
  • 31. Project objectives and benefits • Test new means of crowdsourcing to support the enhancement of content in BHL • Demonstrate whether digital games are an effective tool for analyzing and improving digital outputs from OCR and transcription • Benefits of gaming include: – improved access to content by providing richer and more accurate data; – an extension of limited staff resources; and – exposure of library content to communities who may not know about the collections otherwise.
  • 32. OCR Improvements German text interpreted by the OCR process as: “unb auf ben ©elnrgen be6 fublic{)en”
  • 33. OCR Improvements: different resulting texts from parsing the phrase “und auf den Gebirgen des südlichen Deutschlands” (“and on the mountains of southern Germany”)
      # | IA OCR | OCR 2 | Transcription 1 | Transcription 2 |
      1 | unb | und | und | und | Ok
      2 | den | ben | den | den | Ok
      3 | ©elnrgen | ©ebirgen | Bebirgen | Gebirgen | X
      4 | be6 | des | de5 | des | Chk
      5 | fublic{)en | fublichen | Füdlichen | Südlichen | X
      6 | £)eittfc{)(anb6 | Deutfchlanbs | Deutfchlands | Deutschlands | X
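The Ok / Chk / X marks on slide 33 are consistent with a simple agreement count across the four readings of each word. The sketch below is one way such a rule could work, not the project's actual method: the thresholds (three or more matching readings give Ok, exactly two give Chk, none give X) are inferred from the six rows on the slide, and `classify` and `ROWS` are hypothetical names.

```python
from collections import Counter

# The four readings of each word from slide 33:
# (IA OCR, OCR 2, Transcription 1, Transcription 2).
ROWS = [
    ("unb", "und", "und", "und"),
    ("den", "ben", "den", "den"),
    ("©elnrgen", "©ebirgen", "Bebirgen", "Gebirgen"),
    ("be6", "des", "de5", "des"),
    ("fublic{)en", "fublichen", "Füdlichen", "Südlichen"),
    ("£)eittfc{)(anb6", "Deutfchlanbs", "Deutfchlands", "Deutschlands"),
]


def classify(readings):
    """Return (mark, consensus_word) for one word position.

    Assumed rule: >= 3 identical readings -> "Ok", exactly 2 -> "Chk",
    no two readings alike -> "X" (no consensus word).
    """
    word, votes = Counter(readings).most_common(1)[0]
    if votes >= 3:
        return "Ok", word
    if votes == 2:
        return "Chk", word
    return "X", None


marks = [classify(row)[0] for row in ROWS]
print(marks)
```

Under this assumed rule, words marked X are the ones no automatic vote can settle, which is where human input such as a game task would be most useful.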
  • 34. Purposeful Gaming
  • 35. Currently… • Evaluating Transcription Tools… • Setting up the Workflow for
  • 36. iDigBio’s aOCR Hackathon • Improve OCR parsing of labels with clear metrics (datasets, output formats, scoring algorithm) • Libraries of regular expr. to clean up each field (different error correction for latitude/longitude coordinates than personal names or herbarium catalog numbers) • Tool for classifying segments of the image before submitting to OCR • Do a first pass of OCR to clean images before sending them to a second, 'real' pass of OCR
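The idea of field-specific regular-expression libraries can be illustrated with two small cleaners: one for coordinates, where letters that look like digits are the likely errors, and one for personal names, where the opposite holds. This is a minimal sketch under stated assumptions. The substitution tables and the function names `clean_coordinate` and `clean_name` are illustrative and do not come from the hackathon.

```python
import re

# Assumed look-alike substitutions; a real library would be tuned per collection.
TO_DIGITS = str.maketrans("OolISB", "001158")   # letters misread for digits
TO_LETTERS = str.maketrans("015", "ols")        # digits misread for letters


def clean_coordinate(raw: str) -> str:
    """Fix digit look-alikes but keep a trailing hemisphere letter (N/S/E/W)."""
    match = re.fullmatch(r"\s*(.*?)\s*([NSEW])?\s*", raw)
    body, hemisphere = match.group(1), match.group(2)
    fixed = body.translate(TO_DIGITS)
    return f"{fixed} {hemisphere}" if hemisphere else fixed


def clean_name(raw: str) -> str:
    """Fix letter look-alikes in a personal name."""
    return raw.translate(TO_LETTERS)


print(clean_coordinate("3B°l5'S"))
print(clean_name("C0l1ins"))
```

The same character needs opposite treatment in the two fields, which is why a single global clean-up pass would not work here.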
  • 37. iDigBio’s CITScribe Hackathon 1. Interoperability between public participation tools and biodiversity data systems, 2. Transcription quality assessment/quality control (QA/QC) and the reconciliation of replicate transcriptions, 3. Integration of optical character recognition (OCR) into the transcription workflow 4. User engagement
  • 38. NfN & iDigBio’s CITScribe Hackathon • Jason Best’s DarwinScore • Ben Brumfield’s Handwriting Gibberish Detector • Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names) • Word Clouds created using n-gram scoring, faceting, and Solr for indexing + Carrot2 for specimen selection (visualize and explore the use of a word of interest from the word cloud) and a data cleaning step (the system highlights infrequent words).
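The data cleaning step mentioned on slide 38, highlighting infrequent words, can be sketched with a plain word count. This is an illustration only and is not the hackathon tooling (which the slide says used Solr and Carrot2); the function name `infrequent_words`, the `max_count` threshold and the sample label text are all assumptions.

```python
import re
from collections import Counter


def infrequent_words(transcriptions, max_count=1):
    """Return the words that occur at most `max_count` times across all texts."""
    counts = Counter(
        word
        for text in transcriptions
        for word in re.findall(r"[^\W\d_]+", text.lower())
    )
    return sorted(word for word, n in counts.items() if n <= max_count)


# Hypothetical specimen-label transcriptions; "Qucrcus" is a deliberate typo.
labels = [
    "Quercus alba collected by Smith",
    "Quercus rubra collected by Smith",
    "Qucrcus alba collected by Smith",
]
flagged = infrequent_words(labels)
print(flagged)
```

A rare word is not necessarily an error: here the legitimate epithet "rubra" is flagged alongside the typo, which is where dictionaries of scientific names and collectors could help separate the two.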
  • 39. NESCent EOL-BHL Research Sprint There is no place like home: Defining “habitat” for biodiversity science Robert D. Stevenson UMass Boston, Dept. of Biology, 100 Morrissey Blvd., Boston, MA 02125-3393 Carl Nordman (Natureserve) and Evangelos Pafilis Hellenic Centre for Marine Research, P.O. Box 2214, Heraklion, 71003, Crete, Greece
  • 40. NESCent EOL-BHL Research Sprint Assessing Risk Status of Mexican Amphibians Through Data Mining. Esther Quintero and Bárbara Ayala National Commission for Knowledge and Use of Biodiversity (CONABIO) and Anne Thessen Marine Biological Laboratory and Arizona State University
  • 41. Planning for global change: using species interactions in conservation Nicole F. Angeli, Emma P. Gomez, Margot A. Wood, Applied Biodiversity Sciences Program, Texas A&M University, College Station, Texas nangeli1@jhu.edu Tweet me @auratus_nicole and Javier Otegui University of Colorado-Boulder
  • 42. There is no place like home: Defining “habitat” for biodiversity science Robert D. Stevenson UMass Boston, Dept. of Biology, 100 Morrissey Blvd., Boston, MA 02125-3393 Carl Nordman (Natureserve) Evangelos Pafilis Hellenic Centre for Marine Research, P.O. Box 2214, Heraklion, 71003, Crete, Greece http://epafilis.info/ , vagpafilis@gmail.com
  • 43. Evolution in the usage of anatomical concepts in the biodiversity literature Todd Vision (tjv@bio.unc.edu), Prashanti Manda (manda.prashanti@gmail.com), and Dongye Meng (dmeng@cs.unc.edu) University of North Carolina at Chapel Hill
  • 44. NESCent EOL-BHL Research Sprint Evolution in the usage of anatomical concepts in the biodiversity literature Todd Vision (tjv@bio.unc.edu), Prashanti Manda (manda.prashanti@gmail.com), and Dongye Meng University of North Carolina at Chapel Hill
  • 45. Some preliminary observations… • Our API seemed to work fine • Access via a taxon (or a group), for example: “I want to harvest all pages with names from this taxon (Chordata) or this common name (Vertebrate)”. • Groups started getting results after 2.5 days. • The structure of BHL was explained so researchers could understand the title, item, page and part levels and define what they wanted. Ex: one group was looking for terms in the titles and the parts’ titles. • Some others said they would harvest the OCR from IA, although they will not be able to harvest the text at page-by-page granularity (only at item level).
  • 46. NESCent EOL-BHL Research Sprint There is no place like home: Defining “habitat” for biodiversity science Robert D. Stevenson UMass Boston, Dept. of Biology, 100 Morrissey Blvd., Boston, MA 02125-3393 Carl Nordman (Natureserve) and Evangelos Pafilis Hellenic Centre for Marine Research, P.O. Box 2214, Heraklion, 71003, Crete, Greece
  • 47. Mining Biodiversity
  • 48. Mining Biodiversity • Mining Biodiversity: Enriching Biodiversity Heritage with Text Mining and Social Media • One of the international projects that won in the third round of the 2013 Digging Into Data Challenge • Promote the development of innovative computational techniques to apply to big data in the humanities and social sciences – The National Centre for Text Mining (UK) – Missouri Botanical Garden (US) – Dalhousie University's Big Data Analytics Institute (Canada) – Social Media Lab (Canada)
  • 49. MiBIO: Mining Biodiversity 1. Automatic error correction of OCR text errors. 2. Crowdsource annotation of legacy texts with semantic metadata. 3. Adapt text mining techniques to extract terminology, entities and significant events automatically and to track terminology evolution over time. 4. Use interactive visualization techniques to help users manage search results through next generation browsing capabilities, assisted by a semantic similarity network of important terms and entities. 5. Design of a social media layer, serving as an environment for diverse users to interact and collaborate on science, public education, awareness and outreach.
  • 50. MiBIO: Mining Biodiversity
  • 51. Crowdsource Markup
      Display text | Species Profile Model category
      General/summary | TaxonBiology
      Geographic range | Distribution
      Habitat | Habitat
      Food sources and feeding behavior | TrophicStrategy
      Physical description (general) | Description
      Physical description (detailed morphology) | DiagnosticDescription
  • 52. Visit to NaCTeM, Feb. 17, 2014
  • 53. NaCTeM’s Biodiversity- relevant tools
  • 54. ANNOTATION PLATFORM
  • 55. Remote Processing: workflows processed on remote machines; no attendance needed. Workflows: GUI for creating single-flow and multi-branch workflows (Workflow Designer). User Interaction: Annotation Editor allows for making changes while processing (Annotator/Curator). WebService: third-party applications. Processing Components: data (de)serialisation, search engines, NLP, NER, etc. (Developers).
  • 56. Workflows view
  • 57. Processes View
  • 58. Documents view
  • 59. Workflow editor
  • 60. Workflow as a Web service
  • 61. Workflow as a Web service http://argo.nactem.ac.uk/test/services/webservice/314 INPUT OUTPUT
  • 62. NAMED ENTITY RECOGNISERS AND NORMALISERS
  • 63. ✔ ✔ ✔ ✔ ✔
  • 64. Automatically recognised named entities
  • 65. Linking to external dictionaries
  • 66. Species and habitat recognition
  • 67. EVENT EXTRACTORS
  • 68. Events: associations between entities
  • 69. SEMANTIC SEARCH
  • 70. TERM EXTRACTION
  • 71. Dalhousie SocialLab’s Netlytic.org
  • 72. http://miningbiodiversity.com/ http://miningbiodiversity.org/
  • 73. Thank you William Ulate BHL Technical Director Missouri Botanical Garden william.ulate@mobot.org Skype: william_ulate_r
