This document summarizes the digitization workflow and the challenges faced by the Biodiversity Heritage Library (BHL) in digitizing biodiversity literature. It covers scanning, optical character recognition (OCR), scientific name mapping, and crowdsourced corrections. OCR is especially challenging for BHL because of the range of languages, publication dates, and typefaces in its collections, and because many texts contain Latin names and descriptions. The document describes manual text-correction efforts such as Wikisource and Trove, name finding with uBio's TaxonFinder, and crowdsourced article metadata through CiteBank. It concludes that BHL makes its digitized literature freely available through open access to support many kinds of multidisciplinary research.
Digitization and enhancement of biodiversity literature through OCR, scientific names mapping and crowdsourcing
1. Digitization and enhancement of biodiversity literature through OCR, scientific names mapping and crowdsourcing
Chris Freeland, Technical Director, Biodiversity Heritage Library
BioSystematics Berlin 2011, 22 Feb 2011
http://biodiversitylibrary.org/page/33061402
6. OCR is a *BIG* challenge
- All book/literature digitization projects are affected, not just BHL
- Especially problematic in BHL:
  - More than 50 languages represented in BHL
  - Dates of publication from the 1400s to the 2000s
  - Irregular typefaces and typesetting
  - Multiple languages on one page
  - Botanical descriptions in Latin
7. Abbildungenund Beschreibungen der FischeSyriens, nebst einerneuen Classification und Characteristik sämmtlicherGattungen der i JOH. JAKOB HECKEL, Inipectoiam k. k. Hof-Natur.-iUenkabinete in Wien, mehr, yelelirt. UeHtllMeii. MIfglivd. STUTTGART. E. Schweizerbart' seheVerlagshandlung, 1843.
8. *E.xvi�c�piteI von c. cXx.WptdvonfnrWmn bu�fbe;bcn.5 am cixbIa� S &3rn~ 41X a�mcv(f b1air�'o�et ertoiensr�; �', :�hlrfc�cwa ff�4am.diug bist a 6aiw~s ff oJrJtwtnof bL4ecImt& blfaframembt wag `wr 4 cnwiu 4 e8t5m.ed bvUratflb ck wuo, ma144'*4I bttE5rmbebt =rt3'kn am4ra tifvrmrWaff C * t6rmnli an `tn�ciblatGteaMw ?ffoaifrn w4wmeu nu weibe , wpiteI voE5teiri ct cobergtUcr cit cm` 91 cLibiar J ' >bSciatl�Oiff ;Bruetwacfttcnqmcx b1a bl: bt5c lttmtt bb9 lkrw.llr#eitincnxoa ff cu :rtrtuft *et� B Rn "�trv W1Rt' ?Cm cblaswaIwutrOber�citi 1V Ces ' wt gbtiemwwajfutpctt, afferain 9 c: b�titbfof�rferanmrs bra wlg auig4;f aer�m *mc vrtblatcabtfmwfruan'deg~mrtblasIaumbwWt� run fncmai b14ianf tJobrrfan ebrut4net vnberBrwtOberawawi*m.crriiibtafwfmuwwc on$ 'it ttuwttkc 5,10 $ m~Cfcatrc* cxu W�e�&mcyfbq4 Mabttmmwrc a iiubcJcnncI.end.*, blat s. au:�rprd3 rw4ftf wm c ii,+ ttCCtnwa frr9fr orfabfcfbtenbcoptitibt -r9 ceDattDcn i34M snSemi
9. 2007 Name Finding Study
- OCR error rate for names only: 35.16% (more than 35%)
- Of the 3,003 names, 1,056 were incorrectly transcribed by OCR.
- Top OCR errors
- Source: Wei et al. An Evaluation of Taxonomic Name Recognition (TNR) in the Biodiversity Heritage Library. Proceedings of TDWG, 2008. http://www.tdwg.org/proceedings/article/view/380
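The headline rate on this slide is simply the share of misread names among all names examined; recomputing it from the reported counts confirms the 35.16% figure:

```python
# Recompute the name-level OCR error rate from the counts reported
# in the 2007 name finding study (Wei et al. 2008).
total_names = 3003    # names examined in the study
misread_names = 1056  # names incorrectly transcribed by OCR

error_rate = misread_names / total_names
print(f"OCR error rate for names: {error_rate:.2%}")
```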
10. Manual techniques for text correction
- Wikisource
- Trove (National Library of Australia)
12. Goal: semi-automated text correction
- OCR + machine learning + users
- Let machines do the raw processing
- Develop algorithms for natural language processing and machine learning
- Build a community of (human) users to help
- reCAPTCHA as an example. Why not just use reCAPTCHA? Google bought it.
- *More work needed here*
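A minimal sketch of the "let machines do the raw processing, let humans help" idea, assuming a hypothetical list of correctly spelled names (`KNOWN_NAMES`) and a hypothetical `suggest_correction` helper. It uses fuzzy string matching from Python's standard `difflib` module in place of machine learning, so it only illustrates the division of labour: confident matches are corrected automatically, and everything else is queued for a human volunteer. It is not BHL's actual method.

```python
import difflib

# Hypothetical reference list of correctly spelled names. In practice this
# could come from a large names index; here it is hard-coded for illustration.
KNOWN_NAMES = ["Cyprinus carpio", "Salmo trutta", "Capoeta damascina", "Garra rufa"]


def suggest_correction(ocr_token: str, cutoff: float = 0.8):
    """Return (suggestion, needs_review) for one OCR'd name.

    The machine proposes the closest known name. Anything whose similarity
    falls below `cutoff` is flagged for a human to check instead.
    """
    if ocr_token in KNOWN_NAMES:
        return ocr_token, False  # exact match: no human effort needed
    matches = difflib.get_close_matches(ocr_token, KNOWN_NAMES, n=1, cutoff=cutoff)
    if matches:
        return matches[0], False  # confident automatic correction
    return None, True  # no confident match: route to a human volunteer


for token in ["Salmo trutta", "Sa1mo trutla", "Xq7 zz"]:
    print(token, "->", suggest_correction(token))
```

Raising `cutoff` sends more tokens to people and fewer through automatic correction; where to set it is a trade-off between volunteer effort and the risk of wrong automatic fixes.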
14. Name finding in action with uBio's TaxonFinder
- Image from scanner
- Converted to text via OCR
- Name finding via TaxonFinder: extract names
- Submit to NameBank
- TaxonFinder API response
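The name-finding step can be illustrated with a greatly simplified dictionary lookup: a capitalized word found in a list of genus names, optionally followed by a known species epithet. The word lists and the `find_names` function are hypothetical, the scanning, OCR and NameBank steps are omitted, and this is not TaxonFinder's actual algorithm.

```python
import re

# Hypothetical, tiny word lists. A real name finder would load large
# dictionaries of genus names and species epithets.
GENERA = {"Salmo", "Cyprinus", "Garra"}
EPITHETS = {"trutta", "carpio", "rufa"}


def find_names(ocr_text: str) -> list[str]:
    """Return candidate scientific names found in OCR text.

    A known genus followed by a known epithet is reported as a binomial;
    a known genus on its own is reported alone.
    """
    words = re.findall(r"[A-Za-z]+", ocr_text)
    found = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in GENERA:
            nxt = words[i + 1] if i + 1 < len(words) else ""
            if nxt in EPITHETS:
                found.append(f"{word} {nxt}")  # genus + epithet
                i += 2
                continue
            found.append(word)  # genus only
        i += 1
    return found


page_text = "The brown trout, Salmo trutta L., and the carp Cyprinus carpio; cf. Garra."
print(find_names(page_text))
```

A dictionary approach like this only finds names it already knows, which is one reason OCR errors in names (as in the 2007 study) directly reduce name-finding recall.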
28. CiteBank: http://citebank.org
- New search index to BHL content
- Platform for journals, publishers and societies in need of tools to store and share their digitized content
- Access to "crowdsourced" articles from BHL scans
34. Crowdsourcing statistics and analysis
- Analysis: http://biodiversitylibrary.blogspot.com/2009/04/pdf-article-metadata-analysis.html
- At that time, more than 80% of the PDFs created had metadata attached by users
- More than 50% contributed accurate article-level information
- New analysis over more data this summer/fall
- Now have more than 58,000 PDFs to analyze
35. Open data = more use
- Scholars: Rod Page (iPhylo, BioGUID, BioStor); Ryan Schenk
- Other apps: EarthCape, ZipcodeZoo
36. Conclusion
- BHL is a massive dataset useful for multidisciplinary research: systematics, natural language processing, humanities
- BHL is open:
  - Free to use at http://biodiversitylibrary.org
  - Open access data for scholarly use and reuse
  - BHL has APIs and data exports to enable reuse
  - BHL data can be incorporated into other virtual research environments (EOL, Scratchpads, BioStor, others)
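As an illustration of programmatic reuse, the sketch below builds a name-search request and unpacks the JSON reply. The endpoint (`https://www.biodiversitylibrary.org/api3`), the `op`, `name`, `format` and `apikey` parameters, and the `Status`/`ErrorMessage`/`Result` response fields are assumptions based on BHL's version 3 API; check the current BHL API documentation before relying on them, and note that an API key is required. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameter names (BHL API v3); verify against the
# current BHL API documentation before use.
BHL_API = "https://www.biodiversitylibrary.org/api3"


def build_name_search_url(name: str, api_key: str) -> str:
    """Build a (hypothetical helper) NameSearch request URL for a scientific name."""
    params = {"op": "NameSearch", "name": name, "format": "json", "apikey": api_key}
    return f"{BHL_API}?{urlencode(params)}"


def parse_response(body: str) -> list:
    """Return the assumed 'Result' list, raising if the API reports an error."""
    data = json.loads(body)
    if data.get("Status") != "ok":
        raise RuntimeError(data.get("ErrorMessage") or "BHL API error")
    return data.get("Result") or []


def name_search(name: str, api_key: str) -> list:
    """Fetch and parse results. This performs a live network request."""
    with urlopen(build_name_search_url(name, api_key)) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to test offline with canned responses.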
37. Questions?
Chris Freeland
Technical Director, Biodiversity Heritage Library
Director, Center for Biodiversity Informatics, Missouri Botanical Garden
Missouri Botanical Garden, 4344 Shaw Blvd., St. Louis, MO 63110 USA
Email: chris.freeland@mobot.org
Twitter: @chrisfreeland
Blog/info: chrisfreeland.com
BioSystematics Berlin 2011, 22 Feb 2011