The document discusses batch management strategies for mass digitization projects. It describes how the U.S. National Herbarium digitization project handles digitization in batches, from scanning specimens in batches of 1,000-3,000 images to transcribing specimen labels and folder labels in batches of up to 4,000 records. Each batch is assigned an identification number to keep track of the records. Batches then need to be combined to create individual records. Issues can arise if batches are incomplete or not properly imported, so batch tracking is an important management tool.
3. U.S. National Herbarium Digitization
2.7 million digital descriptive records
1.6 million specimen images
[Chart: specimens Inventoried and Imaged by period (Pre-2014, 2014, 2015, 2016, 2017); vertical axis 0 to 3,000,000]
4.
5.
6.
7.
8. Batch Management
• A batch is a quantity of material produced during a given time period or production run.
• In a mass production setting, production is usually executed in batches.
• Keeping track of records individually can be inefficient.
• Conveyor digitization creates several batches of different material each day. Batches are identified by id number.
• All batches need to meet at some future point to create individual records.
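As a minimal sketch of the idea on this slide, the snippet below splits one day's output into batches and gives each batch an id. The `Batch` class, the `make_batches` helper, the `SCAN` prefix, the id pattern, and the barcode values are all hypothetical illustrations, not the project's actual naming scheme.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Batch:
    """A set of items produced in one production run (hypothetical model)."""
    batch_id: str
    items: list = field(default_factory=list)


def make_batches(items, size, produced_on, prefix="SCAN"):
    """Split a day's items into batches of at most `size`, each with its own id."""
    batches = []
    for start in range(0, len(items), size):
        number = start // size + 1
        # Assumed id pattern: PREFIX_YYYYMMDD_BATCH_NN
        batch_id = f"{prefix}_{produced_on:%Y%m%d}_BATCH_{number:02d}"
        batches.append(Batch(batch_id, items[start:start + size]))
    return batches


# Invented barcodes, for illustration only.
barcodes = [f"{n:08d}" for n in range(1842000, 1842007)]
batches = make_batches(barcodes, size=3, produced_on=date(2016, 7, 15))
batch_ids = [b.batch_id for b in batches]
batch_sizes = [len(b.items) for b in batches]
```

Tracking a handful of batch ids instead of thousands of individual records is what makes the later reconciliation steps manageable.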
9.
10. Botany Conveyor Project: Transcription Workflow
Specimen labels
• Specimen labels transcribed by Alembo
• Picturae batches label transcriptions in sets of 4000 and does preliminary review
• NMNH Botany reviews label transcriptions at 2.5% check
• Accepted label transcription sets added to master transcription SQL db
• Rejected sets returned to Picturae for correction
• Batches of 30,000-40,000 transcriptions created from master SQL db; all records in batch reviewed for import to EMu
Cover labels
• Alembo transcribes cover taxonomic names; EMu taxonomic irns added if in picklist
• Picturae batches cover transcriptions in sets of 100-4000 and does preliminary review
• NMNH Botany reviews label transcriptions and adds missing EMu irns
• Taxonomic irns added to import batches
Import
• Import scripts run on import batch
• Import to EMu
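The 2.5% check in the workflow above could be sketched roughly as follows: draw a random sample of a transcription set, review it by hand, then accept or reject the whole set. The function names and the `max_error_rate` threshold are assumptions for illustration; the slides do not state the acceptance criterion.

```python
import random


def review_sample(record_ids, fraction=0.025, seed=None):
    """Pick a random sample (2.5% by default) of a set for manual review."""
    sample_size = max(1, round(len(record_ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(record_ids, sample_size))


def decide(sample_errors, sample_size, max_error_rate=0.05):
    """Accept or reject the whole set based on the sampled error rate.

    The 5% threshold is a made-up placeholder, not the project's rule.
    """
    return "accept" if sample_errors / sample_size <= max_error_rate else "reject"


transcription_set = list(range(1, 4001))  # one set of 4000 records
sample = review_sample(transcription_set, seed=1)
decision_ok = decide(sample_errors=2, sample_size=len(sample))
decision_bad = decide(sample_errors=10, sample_size=len(sample))
```

For a set of 4000 records, a 2.5% check means reviewing 100 records; the decision then applies to the batch as a whole, which is why a rejected set goes back to the vendor in one piece.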
11. Conveyor Batches
Scanning batch
• Specimen image set (1-3K scans) batched off the conveyor
• Images batched from conveyor server to DAMS
Batch id follows image from conveyor to EMu multimedia record
Transcriptions batch of specimen labels
• Transcription sets of specimen labels (1-4K records) batched from Alembo (1st id)
• Transcription set batched for import to EMu (2nd id)
Scanning batch id kept internally in transcription records as well as included in multimedia record
Alembo transcription batch id kept internally in transcription records
Import id included in EMu catalog record
Transcription batches of folder labels/ taxonomy
• Transcription sets of folder labels (400-2000 records) batched from Alembo
• Folder label records are assigned EMu taxonomy irns
• IRNs assigned to individual records in transcription import batch
Scanning batch id kept internally in transcription records
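One way to picture how the batch ids travel with a record is sketched below: each specimen ends up carrying a scanning batch id, an Alembo transcription batch id, and an import id, and a record is only complete once all three batches have "met". The class, field names, and sample ids are hypothetical and do not reflect the actual EMu or SQL schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecimenRecord:
    """One specimen with the ids of the batches it passed through (illustrative)."""
    barcode: str
    scanning_batch_id: Optional[str] = None
    alembo_batch_id: Optional[str] = None
    import_batch_id: Optional[str] = None

    def is_complete(self):
        ids = (self.scanning_batch_id, self.alembo_batch_id, self.import_batch_id)
        return None not in ids


def combine(scans, transcriptions, imports):
    """Merge three barcode -> batch id mappings into per-specimen records."""
    sources = (
        ("scanning_batch_id", scans),
        ("alembo_batch_id", transcriptions),
        ("import_batch_id", imports),
    )
    records = {}
    for attr, mapping in sources:
        for barcode, batch_id in mapping.items():
            record = records.setdefault(barcode, SpecimenRecord(barcode))
            setattr(record, attr, batch_id)
    return records


# Invented barcodes and batch ids, for illustration only.
scans = {"00000001": "scan-A", "00000002": "scan-A", "00000003": "scan-B"}
transcriptions = {"00000001": "alembo-1", "00000002": "alembo-1"}
imports = {"00000001": "import-1"}

records = combine(scans, transcriptions, imports)
incomplete = sorted(b for b, r in records.items() if not r.is_complete())
```

Because every record keeps its batch ids, an incomplete record can be traced back to the batch where it stalled.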
14. Scanning batch
• Specimen image set (1-3K scans) batched off the conveyor
• Images batched from conveyor server to DAMS
Batch id follows image from conveyor to EMu multimedia record
17. Transcriptions batch of specimen labels
• Transcription sets of specimen labels (1-4K records) batched from Alembo (1st id)
• Transcription set batched by NMNH for import to EMu (2nd id)
Alembo batch id kept internally in transcription records as well as included in multimedia record
Alembo transcription batch id kept internally in transcription records
NMNH Import id included in EMu catalog record
19. Transcription batches of folder labels/ taxonomy
• Transcription sets of folder labels (400-2000 records) batched from Alembo
• Folder label records are assigned EMu taxonomy irns
• IRNs assigned to individual records in transcription import batch
Scanning batch id kept internally in transcription records
20. Why does this matter?
• Important management tool
• Important for tracing errors and issues
21. Example 1
Hi Sylvia,
I just finished reviewing TSI_20160825_BATCH_01_MS.
Overall, I would probably accept this batch, but there was another issue I noticed. Several chunks of records do not have working JPG links, and I could not locate the barcodes in the correctly dated folders or just in the JPG file in general. So I am not sure where the images went for these records. There are complete transcriptions recorded for them, but I’m not sure how to check them with the images. Here were the records with issues:
ID # 138-403 (Folder dates: 02/05, 07/15, 01/29)
ID # 1154-1330 (Folder dates: 01/29, 07/22, 02/05)
ID # 1395-1458 (Folder dates: 02/05)
ID # 1483-1533 (Folder dates: 02/05, 06/16, 07/15)
So it looks like the problematic dates are: 01/29, 02/05, 06/16, 07/15, and 07/22.
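The summary in this email can be reproduced with a few lines: collect the folder dates from each flagged range and take their union. The sketch assumes the ID ranges are inclusive and refer to consecutive record ids within the batch; the variable names are illustrative.

```python
# Flagged ID ranges and their folder dates, as listed in the email.
flagged = [
    ((138, 403), ["02/05", "07/15", "01/29"]),
    ((1154, 1330), ["01/29", "07/22", "02/05"]),
    ((1395, 1458), ["02/05"]),
    ((1483, 1533), ["02/05", "06/16", "07/15"]),
]

# Union of all folder dates; MM/DD strings sort correctly as text.
problem_dates = sorted({d for _, dates in flagged for d in dates})

# Number of affected records, assuming inclusive ranges.
affected = sum(end - start + 1 for (start, end), _ in flagged)
```

Under those assumptions, 558 records in the batch are affected, and they trace back to only five production dates.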
22. Everyone,
We have multiple image groups that are not in DAMS and the VFCU reports. All the dates are Fridays with one exception.
It looks like the problematic dates are: 01/29, 02/05, 06/16, 07/15, and 07/22.
For example on 7/15 we are missing the Tiff/Iiq 01842476
From the VFCU for 7/15 I can see the last image on that day was “01842475” but the sequence does not pick up the following day of production on 7/18.
However there is a transcribed image that we can’t see nor can we find a deliverable on the Picturae server.
- Could this be a permission issue?
- A batching error?
- How are the jpgs created – from the IIQ or TIF or at the point of capture?
We are working on creating a list of everything we don’t have deliverables for from TSI_20160825_BATCH_01_MS
Our concern is that there are many other dates besides the ones I outlined above which we only discovered due to transcription checking.
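The "all Fridays with one exception" observation is the kind of pattern that is easy to check in code. The sketch below assumes the dates fall in 2016, which is inferred from the batch name TSI_20160825_BATCH_01_MS and not stated in the email.

```python
from datetime import date

YEAR = 2016  # assumption, inferred from the batch name
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

problem_dates = ["01/29", "02/05", "06/16", "07/15", "07/22"]


def weekday_name(mm_dd, year=YEAR):
    """Return the weekday name for an MM/DD string in the given year."""
    month, day = map(int, mm_dd.split("/"))
    return DAY_NAMES[date(year, month, day).weekday()]


weekdays = {d: weekday_name(d) for d in problem_dates}
exceptions = [d for d, name in weekdays.items() if name != "Friday"]
```

With that year, four of the five dates are Fridays and 06/16 is a Thursday, which matches the email and points to something systematic about end-of-week batches.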
23.
25. Example 2
Hi Stephanie. I have a few pockets of missing Multimedia records in EMu, and want to make sure that these images all exist in the DAMS before I send a request to NMNH IT to import these images to EMu. I will give you a list of the missing images. Sylvia
Sylvia,
Most were picked up by VFCU but not delivered to EMu. These are the directories we are focusing on:
The files from this list that DID go through VFCU were in these directories:
nmnh-botany-20160630
nmnh-botany-20161202
nmnh-botany-20170130
nmnh-botany-20161031-reprocessed_tifs
nmnh-botany-reshoots-2016-sep-part2-reprocessed_tifs
26. Example 2
Reasons for non-delivery to EMu
• Batches not marked for pickup by EMu
• Batches had errors in them and were not imported
• Partial loading of batch, but process failed
All missing images were affected by batch errors, not individual record errors.
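A reconciliation like the one described here can be sketched by grouping the missing images by batch directory, so that batch-level failures stand out from one-off record errors. The directory names come from the email above; the barcodes, function names, and status labels are invented for illustration, and the real DAMS and EMu systems are not queried.

```python
from collections import Counter, defaultdict


def missing_by_batch(dams_images, emu_barcodes):
    """Group barcodes present in DAMS but absent from EMu by batch directory.

    dams_images: barcode -> batch directory
    emu_barcodes: set of barcodes that have an EMu multimedia record
    """
    missing = defaultdict(list)
    for barcode, batch in sorted(dams_images.items()):
        if barcode not in emu_barcodes:
            missing[batch].append(barcode)
    return dict(missing)


def batch_status(dams_images, emu_barcodes):
    """Label each batch as delivered, partial, or not delivered."""
    totals = Counter(dams_images.values())
    missing = missing_by_batch(dams_images, emu_barcodes)
    status = {}
    for batch, total in totals.items():
        n_missing = len(missing.get(batch, []))
        if n_missing == 0:
            status[batch] = "delivered"
        elif n_missing == total:
            status[batch] = "not delivered"
        else:
            status[batch] = "partial"
    return status


# Invented barcodes; directory names taken from the email.
dams_images = {
    "00000001": "nmnh-botany-20160630",
    "00000002": "nmnh-botany-20160630",
    "00000003": "nmnh-botany-20161202",
    "00000004": "nmnh-botany-20161202",
    "00000005": "nmnh-botany-20170130",
}
emu_barcodes = {"00000003", "00000005"}

missing = missing_by_batch(dams_images, emu_barcodes)
status = batch_status(dams_images, emu_barcodes)
```

A batch that is entirely missing suggests it was never marked for pickup or was rejected for errors, while a partial batch suggests a load that failed midway.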
31. In conclusion…
• Mass digitization is mass production, and should be managed as such in batches
• Patterns are constant in large amounts of data
• Always look at the forest when thinking about the trees
Editor's Notes
We significantly increased our rate of digitization and now have over 2.398 million digital descriptive records and 1.4 million specimen images. The conveyor belt has imaged over 1 million specimens, with 900,000 records being transcribed to create digital descriptive records. The remaining 100,000 were specimens that had previously been inventoried and were simply imaged on the conveyor belt. This has been a great way for us to significantly increase our rate of digitization, and with an estimated 5 million pressed specimens in our collection, it is moving us towards our ultimate goal of a completely databased and imaged herbarium collection.