You ask, we scan: The Amsterdam City Archives and the Archiefbank
MARAC Conference October 30 2009
This morning
Growing FAST: City Archives 1848 - 2009
BUT… fewer visitors each year
Visitors to the reading rooms:
Year   Reading rooms
1982   24.027
1988   29.788
1992   27.738
1998   26.598
2002   25.014
2006   17.958
Archives are dusty
And… MORE web visitors
Visitors:
Year   Reading rooms   Website
1982   24.027
1988   29.788
1992   27.738
1998   26.598          40.048
2002   25.014          224.050
2006   17.958          512.592
New Service Concept
How to attract visitors?
How to create an internet reading room?
You ask, We Scan: scanning on customer's request, economic principles, technical issues and work process
You ask: scanning on customer's request, economic principles
We Scan: image quality and workflow principles
We Store: compression and file size
We Do: workflow, tools and practical issues
Q. How many scans can be made from 20 miles of archives?
A. 739.200.001 scans
Q. How long does it take to scan it all?
- 1 foot = 2.000 scans
- Production = 10.000 scans a week
A. 406 years
Will this be our ultimate solution?
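The 406-year answer can be reproduced from the rates on the slide. The following is a minimal sketch of that arithmetic, assuming 5,280 feet per mile and 52 production weeks per year (neither is stated on the slide). With these inputs 20 miles comes to 211,200,000 scans, so the sketch reproduces only the duration, not the scan total quoted on the slide.

```python
FEET_PER_MILE = 5280      # standard conversion, not stated on the slide
SCANS_PER_FOOT = 2000     # from the slide: 1 foot = 2.000 scans
SCANS_PER_WEEK = 10_000   # from the slide: production per week
WEEKS_PER_YEAR = 52       # assumption: production runs every week of the year


def total_scans(miles: float) -> float:
    """Number of scans contained in the given length of shelving."""
    return miles * FEET_PER_MILE * SCANS_PER_FOOT


def years_to_scan(miles: float) -> float:
    """Years needed to scan everything at the constant weekly production."""
    return total_scans(miles) / SCANS_PER_WEEK / WEEKS_PER_YEAR


if __name__ == "__main__":
    print(f"{total_scans(20):,.0f} scans, about {years_to_scan(20):.0f} years")
```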
You ask: 1. Scanning at customer's request
- We let our users set priorities in digitization.
- All archive files can be requested for digitization via the online finding aids.
- In principle all requests are honored, unless:
  - the material cannot be digitized for material reasons
  - the material is under copyright
  - disclosure restrictions apply
- We speak of a request for digitization and not of an order: the user doesn't commit to anything by placing a request, but neither does the archive.
You ask: 1. Scanning at customer's request
- The requester is not obliged to buy all scans.
- All scans made are available to all users.
- Available scans are integrated in the online finding aids.
- Costs for purchasing scans are equal for all users (the more you buy, the cheaper it gets).
You ask: 2. Low costs
- Customers think a low price is important.
- Archival research easily runs into the use of dozens to hundreds of documents.
- 100 scans should not cost $ 100.
- The price of an ordinary copy in our reading room should be the benchmark.
- The costs of purchasing scans online should be competitive with the travel costs of visiting our reading room.
- This means that the costs of producing and storing scans have to be as low as possible.
You ask: 3. Fast delivery
- Digitization takes time, but research should not have to be planned weeks ahead.
- Delivery time in a scanning-on-request service should be as short as possible.
- The aim is a delivery time of 2 to 3 weeks.
- This calls for a streamlined, efficiently organized work process.
You ask: Conclusion
If we can make sure that
- all finding aids can be selected for digitization by users,
- the scans are delivered in a short time,
- at low cost,
it can be stated that we have no backlog in digitizing, and the objective that the customer is able to consult the digitized item has been achieved.
We need:
- an efficiently organized work process
- low incidental and structural costs
We scan: Digitization at the Amsterdam City Archives in general
- In this presentation the focus is on large-scale digitization at customer's request.
- However, scanning on request is only a part of all digitization that takes place in the archives.
- Besides scanning on request, projects are based on:
  - grant money (often on specific topics, like WWII)
  - selections of photographs, drawings etc. for the Imagebank (Beeldbank)
  - cooperation with Amsterdam district councils and services
We scan: Digitization at the Amsterdam City Archives in general
- We always work on a project basis.
- Goals of digitization projects vary from access to substitution of the originals.
- In every project the quality standard and method are set, depending on purpose and type of material.
- For all projects we have one workflow.
We scan: 1. At large scale
- Large-scale production is a prerequisite for keeping production costs as low as possible.
- The more scans are made, the lower the price per scan.
We scan: 2. With a constant production
- Large-scale production can only be organized effectively when constant production is assumed.
- This way tasks can be planned best and deployment of staff is most efficient.
- Experience shows that a constant production of 10.000 scans (at customer's request) each week is achievable.
We scan: 3. A broad spectrum of document types
Documents that are digitized in this reproduction process can have the following forms:
- small and large size
- bound and loose-leaf entities
- card indexes
- old and modern material
- low and high contrast documents
- text alone, text and image together
- hybrid forms
We scan: 4. For archival research from screen or print
- Costs for producing and storing scans are determined to a large extent by the quality standard set for the scans.
- The higher the standard of quality, the higher the costs will be.
- In order to keep costs low it is prudent to let the standard of quality follow from the requirements the end user places on the scan.
- Purpose of the scans: archival research using the web, straight from screen or print.
- Textual information legible in the originals must be legible in the scans.
We scan: 4. For archival research from screen or print
Specified (basic) quality standard:
- Reproduction of all significant information.
- Reproduction of details which are not part of the textual information is not required.
- A quality higher than that will inevitably push up both incidental and structural costs, but has no added value for the customer at all.
We scan: Scan quality and legibility
Example: very "light" original
- High quality scan: optimal tonal range, excellent flexibility, poor legibility
- Modified scan (contrast): poor tonal range, little flexibility, excellent legibility
Which one would you buy?
Practice shows that what is experienced as "good legibility" is very personal. We decided to solve this problem with a smart filter in the document viewer.
We scan: 4. For archival research from screen or print
- It does make sense to let the standard of quality follow from the purpose the end user has for the scans.
- Skimping on the quality of scans (it can be better) is purely an economic decision, not one taken on principle.
Price comparison scanning costs (price rates scanning, external partner):
High-end                3 - 10 $
Legibility              0,30 - 0,75 $
Legibility, auto-feed   0,05 $
We scan: 5. For conservation and security
- The scans in the scanning-on-request service are made for the purpose of access / archival research, not as a substitute for the originals.
- Nevertheless, digitization does have a real conservation function.
- After digitization the originals can no longer be requested in the reading room.
- This way damage to or loss of the originals is ruled out.
- Conservation of the originals remains the major concern.
We scan: 6. Always complete files
- A file can contain from one to hundreds of documents.
- By definition the entire file is scanned, never just a selection of pages.
- There are a few reasons for this:
  - The costs of scanning are not so much a factor of quantity, but rather of the manual processing involved in it.
  - In the originals or the metadata it would have to be indicated which documents have been digitized.
  - When shown in the Archiefbank, the user expects completeness.
  - When non-scanned pages have to be digitized later, the entire preparation process has to be gone through once again.
We scan: 7. Contracting out the scanning to external partners
- The in-house scan facilities are not designed for large-scale digitizing.
- The complexity of the workflow and material to be scanned calls for:
  - specialized hardware and software
  - specialized set-ups
  - knowledge
  - very complex technical infrastructure
- Investing only makes sense at very high production, organized on a large scale.
- Contracting out the scanning was a logical choice.
We scan: 7. Contracting out the scanning to external partners
Contracting out scanning is more than awarding a contract to a supplier.
- There are many scanning companies.
- Most do have experience in bulk processing, but not with this degree of complexity and diversity.
- This calls for intensive collaboration.
- Also, the workflows of archive and digitizer have to dovetail.
We store: Scans with a file size as small as possible
- Storage costs are still considerable when producing large quantities of scans.
- In order to bring structural costs down, the file size of the scans has to be as small as possible.
- This can be achieved in three ways:
  1. Skimping on resolution
  2. Skimping on bit depth / number of colors (only possible in formats like TIFF and PNG)
  3. Using (lossless or lossy) compression on the files
- We use a combination of 1 and 3.
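How resolution and bit depth drive file size before any compression can be sketched with the standard raster-size formula (pixels times bytes per pixel). The function name and the 10 x 10 inch page used in the example are illustrative assumptions, not figures from the presentation.

```python
def uncompressed_size_mb(width_in: float, height_in: float,
                         dpi: int, bits_per_pixel: int = 24) -> float:
    """Size in MB (10^6 bytes) of an uncompressed raster image.

    width_in / height_in are the page dimensions in inches (hypothetical
    inputs); dpi is the scan resolution; bits_per_pixel is the bit depth.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000


if __name__ == "__main__":
    # Lowering resolution from 300 to 200 dpi cuts the size to (200/300)^2,
    # lowering bit depth from 24 to 8 bits cuts it to one third.
    for dpi, bits in [(300, 24), (200, 24), (300, 8)]:
        size = uncompressed_size_mb(10, 10, dpi, bits)
        print(f"{dpi} dpi, {bits} bits: {size:.1f} MB")
```

Compression (option 3 on the slide) then acts on top of this raw size, which is why combining a lower resolution with lossy compression gives the smallest files.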
We store: Scans with a file size as small as possible
Resolution, compression and legibility: an example
- 300 dpi, high quality JPEG
- 200 dpi, low quality JPEG
We store: Scans with a file size as small as possible
- Storage of compressed files as master images was "not done".
- The main arguments were:
  - When using lossy compression you lose information.
  - Compressed files are more vulnerable (preservation).
- But no compression means large files, and therefore high storage costs.
- Research into these arguments showed:
  - Even when using strong lossy compression, legibility is still guaranteed.
  - Compressed files are not more vulnerable to loss than uncompressed files.
  - Storage of uncompressed files is not necessary.
We store: Scans with a file size as small as possible
Comparison between file format, compression, resolution and file size
Format     Color     Resolution   Compression    Type       Avg file size   500.000 scans   %
TIFF       24 bits   300 dpi      None           ---        22,1 MB         11 TB           100%
JPEG       24 bits   400 dpi      Qua (ps) 10    Lossy      3,3 MB          1,6 TB          15%
JPEG       24 bits   200 dpi      Qua (ps) 4     Lossy      255 KB          124 GB          1,1%
JPEG       24 bits   300 dpi      Qua (ps) 10    Lossy      2,1 MB          1,1 TB          10%
JPEG       24 bits   300 dpi      Qua (ps) 12    Lossy      7,5 MB          3,7 TB          34%
JPEG2000   24 bits   300 dpi      Part 6         Lossy      120 KB          59 GB           0,5%
JPEG2000   24 bits   300 dpi      Part 1         Lossless   12 MB           6 TB            55%
We store: Scans with a file size as small as possible
Comparison storage costs
- Storage of 500.000 images
- Avg size per scan uncompressed = 22,1 MB
- Price rate: $ 7.000 per 1 TB, storage in a controlled e-repository environment on two separate locations, including IT costs (NLD, Nov 2009)
File format              Storage   Costs 1 year   Costs 10 years
TIFF uncompressed        11 TB     $ 77.000       $ 770.000
JPEG 10                  1,1 TB    $ 7.700        $ 77.000
JPEG 4 (200 dpi)         124 GB    $ 868          $ 8.680
JPEG 2000 (part 1, ll)   6 TB      $ 42.000       $ 420.000
(File) size still does matter!
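The cost figures in this comparison are the storage volume multiplied by the price rate of $ 7.000 per TB per year. A minimal sketch of that calculation follows, assuming 124 GB is counted as 0.124 TB (which is what the $ 868 figure implies) and assuming the price stays flat over ten years, as the slide's ten-year column does.

```python
PRICE_PER_TB_PER_YEAR = 7000  # USD, price rate quoted on the slide

# Storage volumes for 500.000 images, in TB, as given on the slide.
STORAGE_TB = {
    "TIFF uncompressed": 11.0,
    "JPEG 10": 1.1,
    "JPEG 4 (200 dpi)": 0.124,   # 124 GB, treated as 0.124 TB
    "JPEG 2000 (part 1, lossless)": 6.0,
}


def storage_cost(terabytes: float, years: int = 1) -> float:
    """Storage cost in USD at a flat yearly price per TB."""
    return round(terabytes * PRICE_PER_TB_PER_YEAR * years, 2)


if __name__ == "__main__":
    for file_format, tb in STORAGE_TB.items():
        print(f"{file_format}: 1 year ${storage_cost(tb):,.0f}, "
              f"10 years ${storage_cost(tb, 10):,.0f}")
```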
We Do: Developing the reproduction process
- A streamlined, standardized process is indispensable when digitizing on a large scale.
- Projects with different goals, document types and partners take place at the same time.
- Guidelines and best practices often take no account of these complex factors and the number of scans to be produced.
- We developed a process in which large scale and flexibility are starting points.
- All digitization projects follow this process.
We Do: Developing the reproduction process
- For all projects, at any moment, it has to be clear:
  - what the current status of each unit to be digitized is
  - where each unit is located
  - what current and subsequent tasks are to be performed on each unit
- This calls for workflow management with a user-friendly application.
- We developed a simple but effective workflow application in-house.
- In the following slides we focus on the weekly production of 10.000 scans in the digitizing-on-request service.
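The three things the workflow application must answer (status, location, current and subsequent tasks) map naturally onto a small record per unit. The sketch below is purely illustrative: the class, field names and task names are assumptions and do not describe the archive's in-house application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """One unit to be digitized, as tracked by a hypothetical workflow tool."""
    order_number: str
    location: str                                       # where the unit is right now
    pending: List[str] = field(default_factory=list)    # tasks still to do, in order
    done: List[str] = field(default_factory=list)       # tasks already completed

    @property
    def current_task(self) -> Optional[str]:
        """The task to be performed now, or None when the unit is finished."""
        return self.pending[0] if self.pending else None

    def complete_current(self, new_location: Optional[str] = None) -> None:
        """Mark the current task as done and optionally record a move."""
        if not self.pending:
            raise ValueError("no pending tasks for this unit")
        self.done.append(self.pending.pop(0))
        if new_location is not None:
            self.location = new_location


if __name__ == "__main__":
    unit = Unit("A20758", "repository", ["check originals", "transport", "scan"])
    unit.complete_current(new_location="checking room")
    print(unit.current_task, "|", unit.location, "|", unit.pending[1:])
```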
We Do: Production of 10.000 scans on a weekly basis
1. Requesting digitization
- All public files can be requested for digitization via the finding aids in the Archiefbank, just by clicking on the "digitize" button.
We Do: 2. Providing order numbers
- A unit to be digitized must be identifiable at each step of the handling process.
- The units therefore get a unique, meaningless order number.
- An order number is provided by the metadata management system and is the basis for:
  - communication with the digitizer
  - scanning
  - assigning filenames
  - registration of filenames
  - billing by the digitizer
- In practice: all units to be digitized get an order ticket.
We Do: 3. Assessing the originals
- The workflow system generates a list of all originals to assess from the repositories.
- The list is sorted by repository / shelf to make retrieval efficient.
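Sorting the retrieval list by repository and then by shelf means staff can walk each repository once, in shelf order. A minimal sketch follows; the record layout, repository names and order numbers are hypothetical.

```python
from typing import List, NamedTuple


class PickItem(NamedTuple):
    """One original to fetch; the fields are illustrative assumptions."""
    order_number: str
    repository: str
    shelf: int


def pick_list(items: List[PickItem]) -> List[PickItem]:
    """Return the items sorted by repository first, then by shelf number."""
    return sorted(items, key=lambda item: (item.repository, item.shelf))


if __name__ == "__main__":
    requested = [
        PickItem("A20760", "B", 12),
        PickItem("A20758", "A", 40),
        PickItem("A20759", "B", 3),
        PickItem("A20761", "A", 7),
    ]
    for item in pick_list(requested):
        print(item.repository, item.shelf, item.order_number)
```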
We Do: 4. Checking the originals
- All assessed originals are stored in a special room.
- In this room all checks are executed.
We Do: 4. Checking the originals
- A rough check of the originals takes place.
- Information about the originals in our management systems is not always complete.
A. Content
- copyrights
- publicity
- privacy
If an item falls into one of these categories the request is rejected.
B. Condition of the material
Items that are in such a condition that digitizing or transport could cause damage, or that are packaged in a way that makes scanning in conventional set-ups impossible, do not qualify for the standard way of digitization.
We Do: 4. Checking the originals
Material preparation is kept to a minimum.
We do:
- Staples are removed as a rule.
- Small repairs are carried out by our restoration employees.
We don't:
- The sequence of the originals as found in the repository is not checked or altered.
- The originals are not numbered.
We do not number the originals
- Numbering the originals has one advantage: the completeness of the scans (compared to the originals) can be guaranteed.
- But this is only true when the numbering tallies exactly, because:
  - A missing number in a sequence of scans leads to the conclusion that there is one original that has not been scanned.
  - Numbers that are assigned twice lead to illogical end numbers (100 scans: scan 100 has been numbered as 99).
- Experiments showed that faultless numbering cannot be realized in practice.
We do not number the originals
Securing completeness can be realized by other means:
- comparing scans to originals 1:1 after digitization
- scanning the originals twice: low quality (# scans = 365) and high quality master files (# scans = 365)
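The "scan twice" check comes down to comparing the number of scans from the two passes: equal counts (365 and 365 in the slide's example) suggest that no original was skipped or doubled. A minimal sketch, with an illustrative function name and return shape:

```python
from typing import Tuple


def passes_agree(low_quality_count: int, master_count: int) -> Tuple[bool, int]:
    """Compare the scan counts of two independent passes over the same unit.

    Returns (True, 0) when the counts match; otherwise (False, difference),
    where a positive difference means the master pass has more scans.
    """
    difference = master_count - low_quality_count
    return difference == 0, difference


if __name__ == "__main__":
    ok, diff = passes_agree(365, 365)
    print("complete" if ok else f"recheck unit, counts differ by {diff}")
```

Matching counts do not prove completeness (both passes could skip the same page), so this is a cheap first check rather than a replacement for the 1:1 comparison.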
We Do: 5. Transport
For secure transport, special flight cases are used.
We Do: 6. / 7. Scanning and assigning filenames
- After scanning, the scan operator or data manager has to assign filenames to the scans.
- It has to be perfectly clear which filenames these should be.
- Filenames are the key between scans and metadata.
- As a rule filenames contain no meaningful information, because when the meaning changes, filenames would have to change too.
Assigning filenames at City Archives Amsterdam
- Customer request (management systems): Archive 195, File 836
- Order ticket: Order A20758
- Filename: 12 characters; first 6: order number, last 6: serial number
- Range: A20758000001 to A20758999999
- Scanning the order: A20758000001, A20758000002, A20758000003, A20758000004, A20758000005
- Scan report: A20758000001, A20758000002, A20758000003, A20758000004, A20758000005
- Registration of filenames, import
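The naming scheme above (a 6-character order number followed by a 6-digit serial number) can be sketched as follows. The function name and the validation rules are assumptions for illustration; only the format and the example order A20758 come from the slide.

```python
from typing import List


def filenames_for_order(order_number: str, scan_count: int) -> List[str]:
    """Build the 12-character filenames for one order.

    The first 6 characters are the order number, the last 6 a zero-padded
    serial number starting at 000001.
    """
    if len(order_number) != 6:
        raise ValueError("order number must be exactly 6 characters")
    if not 1 <= scan_count <= 999_999:
        raise ValueError("serial numbers run from 000001 to 999999")
    return [f"{order_number}{serial:06d}" for serial in range(1, scan_count + 1)]


if __name__ == "__main__":
    for name in filenames_for_order("A20758", 5):
        print(name)
```

Because the name carries nothing but the order number and a counter, it stays valid even if the description of the file in the finding aid changes later.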
We Do: 10. 11. Checking scans and metadata
- Scans and metadata are checked efficiently.
- Where possible checks are automated.
- An application from which all checks can be executed is in development.
Basic checks:
Check                  Method
Viruses                Virus checker
Data integrity         MD5 checksum comparison
File format validity   JHOVE
Quality scans          Visual check reference scans, visual check production scans
Filenames              Script
Completeness           Depends on project
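The data integrity check in the table compares MD5 checksums. A minimal sketch of such a comparison using Python's standard `hashlib` module follows; it assumes the digitizer supplies an expected checksum per file, and the function names are illustrative rather than the archive's actual script.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_md5: str) -> bool:
    """True when the file's checksum matches the expected (delivered) one."""
    return md5_of(path) == expected_md5.strip().lower()
```

A mismatch means the file changed somewhere between the digitizer and the archive (for example during transport on disk) and should be delivered again before import.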
We Do: 13. 14. Import metadata and scans into management systems
- After approval of all checks, scans and metadata are imported into the management systems.
- The imports are executed automatically, on the basis of scripts and standard protocols for file transfer.
- After import the "order for digitization" of each unit is completed.
We Do: 18. Import metadata into the website
- The website is hosted from an external location.
- Metadata are uploaded to the web server by simple HTTP transfer, from any workstation at the archive, directly via the CMS of the website.
- For exchange of finding aids we use EAD.
- After import the metadata are optimized for the search system.
We Do: 17. Import scans into the website
Transport medium
- Bandwidth of the internet connections at the archive is still too small for direct SFTP (or similar) upload of large quantities of scans to the web server.
- It seems likely that this will change in the near future.
- Until then scans are transported on portable USB hard disks.
We Do: 17. Import scans into the website
Import
- After connecting the hard disk to the server the import process starts.
- Some basic checks are executed on the scans.
- Derivatives for thumbnails and zoom / contrast functionality are made.
We Do: Request complete!
The happy customer:
- When both scans and metadata have been imported, an e-mail is automatically sent to the requester.
- This e-mail contains a link to the finding aid and thumbnails on the website.
- The requester can decide whether to buy scans or not.
Mission accomplished
Government satisfied
Visitors:
Year         Reading rooms   Website
1982         24.027
1988         29.788
1992         27.738
1998         26.598          40.048
2002         25.014          224.050
2006         17.958          512.592
2007         92.678          520.483
2008         118.312         538.483
2009 (3/4)   77.298          531.143
Management satisfied
Costs Archiefbank (2008):
Digitization on request   € 140,000
Webservices               € 52,000
Digitization projects     € 200,000
Income Archiefbank (2008):
Digitization on request   € 100,000
Project funding           € 330,350
Government                € 40,000
Customers satisfied
Thanks
Free scan credits? [email_address]
 

Exploring the Participatory ArchivesExploring the Participatory Archives
Exploring the Participatory Archives
 
Extras! You Ask We Scan
Extras! You Ask We ScanExtras! You Ask We Scan
Extras! You Ask We Scan
 
Archives 2.0: An Introduction
Archives 2.0: An IntroductionArchives 2.0: An Introduction
Archives 2.0: An Introduction
 
RBMS Web 2.0 Workshop
RBMS Web 2.0 WorkshopRBMS Web 2.0 Workshop
RBMS Web 2.0 Workshop
 
Dickinson Ref Blog Educause2009
Dickinson Ref Blog   Educause2009Dickinson Ref Blog   Educause2009
Dickinson Ref Blog Educause2009
 

Recently uploaded

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 

Recently uploaded (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 

You Ask We Scan

  • 1. You ask, we scan MARAC Conference October 30 2009 The Amsterdam City Archives and the Archiefbank
  • 5. BUT… fewer visitors each year
      Year | Reading room visitors
      1982 | 24.027
      1988 | 29.788
      1992 | 27.738
      1998 | 26.598
      2002 | 25.014
      2006 | 17.958
  • 6. Archives are dusty
  • 7. And… MORE web visitors
      Year | Reading room visitors | Website visitors
      1982 | 24.027                |
      1988 | 29.788                |
      1992 | 27.738                |
      1998 | 26.598                | 40.048
      2002 | 25.014                | 224.050
      2006 | 17.958                | 512.592
  • 20. You ask, We Scan: Scanning on customer’s request, economic principles, technical issues and work process
  • 21. You ask: Scanning on customer’s request, economic principles. We Scan: Image quality and workflow principles. We Store: Compression and filesize. We Do: Workflow, tools and practical issues
  • 22. Q. How many scans can be made from 20 miles of archives? 1 foot = 2.000 scans. A. 739.200.001 scans. Q. How long does it take to scan it all? Production = 10.000 scans a week. A. 406 years. Will this be our ultimate solution?
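The arithmetic behind the 406-year answer can be sketched as follows. The sketch uses the slide's inputs (20 miles of archives, 2.000 scans per foot, 10.000 scans a week); the 5.280 feet per mile and the 52 production weeks per year are assumptions not stated on the slide, and the function name is illustrative. These inputs give roughly 211 million scans, which is the volume that corresponds to 406 years; the scan total printed on the slide is a different figure and is not derived here.

```python
FEET_PER_MILE = 5280    # assumption: statute mile
WEEKS_PER_YEAR = 52     # assumption: production runs every week of the year


def years_to_scan(miles, scans_per_foot, scans_per_week):
    """Return (total scans, years needed) for a holding of the given length."""
    total_scans = miles * FEET_PER_MILE * scans_per_foot
    weeks = total_scans / scans_per_week
    return total_scans, weeks / WEEKS_PER_YEAR


total, years = years_to_scan(miles=20, scans_per_foot=2000, scans_per_week=10000)
print(f"{total:,} scans, about {years:.0f} years")
```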
  • 23. You ask: 1. Scanning at customer’s request. We let our users set priorities in digitization. All archive files can be requested for digitization via the online finding aids. In principle all requests are honored, unless: the material cannot be digitized for material reasons; it is copyright material; disclosure restrictions apply. We speak of a request for digitization and not of an order. The user doesn’t commit to anything by placing a request, but neither does the archive.
  • 24. You ask: 1. Scanning at customer’s request. All scans made are available for all users. Available scans are integrated in the online finding aids. The requester is not obliged to buy all scans. Costs for purchasing scans are equal for all users (the more you buy, the cheaper it gets).
  • 25. You ask: 2. Low costs. Customers think a low price is important. Archival research easily runs into the use of dozens to hundreds of documents. 100 scans should not cost $ 100. The price of an ordinary copy in our reading room should be the benchmark. The costs when purchasing scans online should be competitive with travel costs when visiting our reading room. This means that costs for producing and storing scans have to be as low as possible.
  • 26. You ask: 3. Fast delivery. Digitization takes time, but research should not have to be planned weeks ahead. Delivery time in a scanning on request service should be as short as possible. The aim is a delivery time of 2 to 3 weeks. This calls for a streamlined, efficiently organized work process.
  • 27. You ask: Conclusion. If we can make sure that all finding aids can be selected for digitization by users, and that the scans are delivered in a short time and at low cost, it can be stated that we have no backlog in digitizing and that the objective, that the customer is able to consult digitized items, has been achieved. We need: an efficiently organized work process; low incidental and structural costs.
  • 28. We scan: Digitization at the Amsterdam City Archives in general. In this presentation the focus is on large scale digitization at customer’s request. However, scanning on request is only a part of all digitization that takes place in the archives. Besides scanning on request, projects are based on: grant money (often on specific topics, like WWII); selections of photographs, drawings etc. for the Imagebank (Beeldbank); cooperation with Amsterdam district councils and services.
  • 29. We scan: Digitization at the Amsterdam City Archives in general. We always work on a project basis. For all projects we have one workflow. In every project the quality standard and method are set, depending on purpose and type of material. Goals of digitization projects vary from access to substitution of the originals.
  • 30. We scan: 1. At large scale. Large scale production is a prerequisite in order to keep production costs as low as possible: the more scans being made, the lower the price per scan. 2. With a constant production. Large scale production can only be organized effectively when constant production is assumed. This way tasks can be planned best and deployment of staff is most efficient. Experience shows that a constant production of 10.000 scans (at customer’s request) each week is achievable.
  • 31. We scan: 3. A broad spectrum of document types. Documents that are being digitized in this reproduction process can have the following forms: small and large size; bound and loose-leafed entities; card indexes; old and modern material; low and high contrast documents; text alone, text and image together; hybrid forms.
  • 32. We scan: 4. For archival research from screen or print. Purpose of the scans: archival research using the web, straight from screen or print. Costs for producing and storing scans are determined to a high extent by the quality standard set for the scans. The higher the standard of quality, the higher the costs will be. In order to keep costs low it is prudent to let the standard of quality follow from the requirement the end user places on the scan: textual information legible in the originals must be legible in the scans.
  • 33. We scan: 4. For archival research from screen or print. Specified (basic) quality standard: reproduction of all significant information. Reproduction of details which are not part of the textual information is not required. A quality higher than that will inevitably push up both incidental and structural costs, but has no added value for the customer at all.
  • 34. We scan: Scan quality and legibility. Example: very “light” original. High quality scan: optimal tonal range, excellent flexibility, poor legibility. Modified scan (contrast): poor tonal range, little flexibility, excellent legibility. Which one would you buy? Practical experience shows that what is experienced as “good legibility” is very personal. We decided to solve this problem with a smart filter in the document viewer.
  • 35. We scan: 4. For archival research from screen or print. It does make sense to let the standard of quality follow from the purpose the end user places on the scans. Skimping on the quality of scans (it can be better) is purely an economic decision, not one taken on principle. Price comparison scanning costs (price rates scanning, external partner):
      Legibility, auto-feed | 0,05 $
      Legibility            | 0,30 – 0,75 $
      High-end              | 3 – 10 $
  • 36. We scan: 5. For conservation and security. The scans in the scanning on request service are made for the purpose of access / archival research, not as a substitute for the originals. Conservation of the originals remains the major concern. Nevertheless, digitization does have a real conservation function: after digitization the originals can no longer be requested in the reading room. This way damage or loss of the originals is ruled out.
  • 37. We scan: 6. Always complete files. A file can contain one to hundreds of documents. By definition the entire file is scanned, never just a selection of pages. There are a few reasons for this: the costs for scanning are not so much a factor of quantity, but rather of the manual processing involved in it; in the originals or the metadata it has to be indicated which documents are being digitized; when shown in the Archiefbank, the user expects completeness; when non-scanned pages have to be digitized later, the entire preparation process has to be gone through once again.
  • 38. We scan: 7. Contracting out the scanning to external partners. The in-house scan facilities are not designed for large-scale digitizing. The complexity of the workflow and material to be scanned calls for: specialized hard- and software; specialized set-ups; knowledge; a very complex technical infrastructure. Investing only makes sense with very high production, organized on a large scale. Contracting out of scanning was a logical choice.
  • 39. We scan: 7. Contracting out the scanning to external partners. Contracting out scanning is more than awarding a contract to a supplier. There are many scanning companies. Most do have experience in bulk processing, but not in this degree of complexity and diversity. Also, the workflows of archive and digitizer have to dovetail. This calls for intensive collaboration.
  • 40. We store: Scans with a file size as small as possible. Storage costs are still considerably high when producing large quantities of scans. In order to bring structural costs down, the file size of the scans has to be as low as possible. This can be achieved in three ways: 1. Skimping on resolution. 2. Skimping on bit depth / amount of colors (only possible in formats like TIFF and PNG). 3. Using (lossless or lossy) compression on the files. We use a combination of 1 and 3.
  • 41. We store: Scans with a file size as small as possible. Resolution, compression and legibility: an example. 300 dpi, high quality JPEG versus 200 dpi, low quality JPEG.
  • 42. We store: Scans with a file size as small as possible. Storage of compressed files as master images was “not done”. The main arguments were: when using lossy compression you’ll lose information; compressed files are more vulnerable (preservation). But no compression means: large files, so high storage costs. Research into these arguments showed: even when using strong lossy compression legibility is still guaranteed; compressed files are not more vulnerable to loss than uncompressed files; storage of uncompressed files is not necessary.
  • 43. We store: Scans with a file size as small as possible. Comparison between file format, compression, resolution and file size
      Format   | Compression     | Type     | Resolution | Color   | Avg filesize | 500.000 scans | %
      TIFF     | No              | ---      | 300 dpi    | 24 bits | 22,1 MB      | 11 TB         | 100%
      TIFF     | Quality (ps) 10 | Lossy    | 400 dpi    | 24 bits | 3,3 MB       | 1,6 TB        | 15%
      JPEG     | Quality (ps) 4  | Lossy    | 200 dpi    | 24 bits | 255 KB       | 124 GB        | 1,1%
      JPEG     | Quality (ps) 10 | Lossy    | 300 dpi    | 24 bits | 2,1 MB       | 1,1 TB        | 10%
      JPEG     | Quality (ps) 12 | Lossy    | 300 dpi    | 24 bits | 7,5 MB       | 3,7 TB        | 34%
      JPEG2000 | Part 6          | Lossy    | 300 dpi    | 24 bits | 120 KB       | 59 GB         | 0,5%
      JPEG2000 | Part 1          | Lossless | 300 dpi    | 24 bits | 12 MB        | 6 TB          | 55%
  • 44. TIFF uncompressed (example shown with the same comparison table as slide 43)
  • 45. JPEG (psd) 10 (example shown with the same comparison table as slide 43)
  • 46. JPEG (psd) 4 (example shown with the same comparison table as slide 43)
  • 47. JPEG2000 lossless (example shown with the same comparison table as slide 43)
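How the total-storage and percentage columns of the comparison table follow from the average file size can be sketched like this. The sketch assumes decimal units (1 TB = 1.000.000 MB) and takes uncompressed TIFF at 22,1 MB as the 100% baseline; it lands close to the table's rounded figures rather than reproducing them exactly. Function names are illustrative.

```python
SCANS = 500_000
MB_PER_TB = 1_000_000   # assumption: decimal units
BASELINE_MB = 22.1      # average uncompressed TIFF scan, from the table


def storage_tb(avg_mb, scans=SCANS):
    """Total storage in TB for a batch of scans with the given average size."""
    return avg_mb * scans / MB_PER_TB


def percent_of_baseline(avg_mb, baseline_mb=BASELINE_MB):
    """Average file size as a percentage of the uncompressed baseline."""
    return 100 * avg_mb / baseline_mb


for label, avg_mb in [("TIFF uncompressed", 22.1),
                      ("JPEG quality 10, 300 dpi", 2.1),
                      ("JPEG2000 lossless", 12.0)]:
    print(f"{label}: {storage_tb(avg_mb):.2f} TB, "
          f"{percent_of_baseline(avg_mb):.1f}% of baseline")
```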
  • 48. We store: Scans with a file size as small as possible. Comparison storage costs. Storage of 500.000 images; average size per scan uncompressed = 22,1 MB. Price rate: 1 TB, storage in a controlled e-repository environment on two separate locations, including IT costs: $ 7.000 (NLD, nov 2009). (File)size still does matter!
      Fileformat                     | Storage | Costs 1 year | Costs 10 years
      TIFF uncompressed              | 11 TB   | $ 77.000     | $ 770.000
      JPEG 10                        | 1,1 TB  | $ 7.700      | $ 77.000
      JPEG 4 (200 dpi)               | 124 GB  | $ 868        | $ 8.680
      JPEG 2000 (part 1, lossless)   | 6 TB    | $ 42.000     | $ 420.000
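The cost columns on this slide are the storage volume multiplied by the price rate. A minimal sketch, assuming the $ 7.000 rate applies per TB per year, that 124 GB counts as 0,124 TB, and that the rate stays flat over ten years (the last two are assumptions, not stated on the slide):

```python
PRICE_PER_TB_YEAR = 7000  # USD, price rate quoted on the slide


def storage_cost(storage_tb, years=1, price_per_tb_year=PRICE_PER_TB_YEAR):
    """Storage cost in USD for a volume in TB over a number of years."""
    return storage_tb * price_per_tb_year * years


options = {
    "TIFF uncompressed": 11.0,
    "JPEG 10": 1.1,
    "JPEG 4 (200 dpi)": 0.124,
    "JPEG 2000 (part 1, lossless)": 6.0,
}
for name, tb in options.items():
    print(f"{name}: ${storage_cost(tb):,.0f} per year, "
          f"${storage_cost(tb, years=10):,.0f} over 10 years")
```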
  • 49. We Do: Developing the reproduction process. A streamlined, standardized process is indispensable when digitizing on a large scale. Projects with different goals, document types and partners take place at the same time. Guidelines and best practices often take no account of these complex factors and the amount of scans to be produced. We developed a process in which large scale and flexibility are starting points. All digitization projects follow this process.
  • 50. We Do: Developing the reproduction process. For all projects, at any moment, it has to be clear: what the current status is of each unit to be digitized; where each unit can be located; what current and succeeding tasks are to be performed on each unit. This calls for workflow management with a user-friendly application. We developed a simple, but effective workflow application in-house.
  • 51. We Do: Developing the reproduction process. In the following slides we focus on the weekly production of 10.000 scans in the digitizing on request service.
  • 52. We Do: Production of 10.000 scans on a weekly basis. 1. Requesting digitization. All public files can be requested for digitization via the finding aids in the Archiefbank, just by clicking on the “digitize” button.
  • 53. We Do: 2. Providing order numbers. A unit to be digitized must be identifiable at each step of the handling process. The units therefore get a unique, meaningless order number. An order number is provided by the metadata management system and is the basis for: communication with the digitizer; scanning; assigning filenames; registration of filenames; billing by the digitizer. In practice: all units to be digitized get an order ticket.
  • 55. We Do: 3. Assessing the originals. The workflow system generates a list of all originals to retrieve from the repositories. The list is sorted on repository / shelf to make retrieval efficient.
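The sorted retrieval list can be illustrated with a short sketch. The record fields, the depot names and all order numbers except A20758 are hypothetical; the slides do not describe the workflow application's data model.

```python
from typing import NamedTuple


class RetrievalItem(NamedTuple):
    order_number: str   # hypothetical field names, for illustration only
    repository: str
    shelf: int


def retrieval_list(items):
    """Sort requested originals by repository, then shelf, so each repository is walked once."""
    return sorted(items, key=lambda item: (item.repository, item.shelf))


# Made-up sample requests
requests = [
    RetrievalItem("A20760", "Depot B", 12),
    RetrievalItem("A20758", "Depot A", 40),
    RetrievalItem("A20759", "Depot A", 7),
]
for item in retrieval_list(requests):
    print(item.repository, item.shelf, item.order_number)
```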
  • 56. We Do: 4. Checking the originals. All assessed originals are stored in a special room. In this room all checks are executed.
  • 57. We Do: 4. Checking the originals. A rough check of the originals takes place, because information about the originals in our management systems is not always complete. A. Content: copyrights; publicity; privacy. B. Condition of the material: items that are in such a condition that digitizing or transport could cause damage, or that are packaged in a way that scanning in conventional set-ups is not possible, do not qualify for the standard way of digitization. If an item falls into one of these categories the request is rejected.
  • 59. We Do: 4. Checking the originals. Material preparation is kept to a minimum. We do: staples are removed as a rule; small repairs are executed by our restoration employees. We don’t: the sequence of the originals as found in the repository is not checked or altered; the originals are not numbered.
  • 60. We Do: Not numbering the originals. Numbering the originals has one advantage: the completeness of the scans (compared to the originals) can be guaranteed. But this is only true when the numbering tallies exactly, because: a missing number in a sequence of scans leads to the conclusion that there is one original that has not been scanned; numbers that are assigned twice lead to illogical end numbers (100 scans: scan 100 has been numbered as 99). Experiments with numbering in practice showed that faultless numbering cannot be realized.
  • 61. We Do: Not numbering the originals. Securing completeness can be realized by other means: comparing scans to originals 1:1 after digitization; scanning the originals twice (low quality: # scans = 365; high quality master files: # scans = 365).
  • 62. We Do: 5. Transport. For secure transport, special flight cases are used.
  • 63. We Do: 6. / 7. Scanning and assigning filenames. After scanning, the scan operator or data manager has to assign filenames to the scans. It has to be perfectly clear which filenames these should be. Filenames are the key between scans and metadata. As a rule filenames contain no meaningful information because, when the meaning changes, filenames should change too.
  • 64. Assigning filenames at City Archives Amsterdam. Customer request (Archive 195, File 836) → Management systems → Order ticket (Order: A20758) → Scanning the order → Filename of 12 characters (first 6: order number; last 6: serial number; range A20758000001 to A20758999999) → Scan report (A20758000001, A20758000002, A20758000003, A20758000004, A20758000005) → Registration of filenames → Import
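The filename scheme on this slide (a 6-character order number followed by a 6-digit serial number) can be sketched as below. The validation pattern, the function names and the absence of a file extension are assumptions made for illustration; the slide only shows the 12-character names.

```python
import re

ORDER_RE = re.compile(r"^[A-Z0-9]{6}$")   # assumption: 6 characters such as "A20758"
MAX_SERIAL = 999_999                      # last name in the range ends in 999999


def filenames_for_order(order_number, scan_count):
    """Return the 12-character names for the scans of one order, serials starting at 1."""
    if not ORDER_RE.match(order_number):
        raise ValueError(f"unexpected order number: {order_number!r}")
    if not 1 <= scan_count <= MAX_SERIAL:
        raise ValueError("scan count outside the range the serial number allows")
    return [f"{order_number}{serial:06d}" for serial in range(1, scan_count + 1)]


def split_filename(name):
    """Split a 12-character name back into (order number, serial number)."""
    return name[:6], int(name[6:])


print(filenames_for_order("A20758", 5))
```

Because the name carries only the order number and a counter, the link to archive and file number lives in the management systems, which matches the rule that filenames hold no meaningful information.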
  • 65. We Do: 10. 11. Checking scans and metadata. Scans and metadata are checked efficiently. Where possible checks are automated. An application from which all checks can be executed is in development. Basic checks:
      Check                | Method
      Viruses              | Virus checker
      Data integrity       | MD5 checksum comparison
      File format validity | JHOVE
      Quality scans        | Visual check reference scans; visual check production scans
      Filenames            | Script
      Completeness         | Depends on project
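The data integrity check in the table, an MD5 checksum comparison, might look like the sketch below. It assumes the digitizer supplies a mapping of filename to expected MD5 hex digest; that delivery format and the function names are assumptions, not something the slides specify.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_integrity_checks(expected, directory):
    """Return filenames that are missing or whose checksum differs from the expected one."""
    failures = []
    for filename, expected_md5 in expected.items():
        path = Path(directory) / filename
        if not path.is_file() or md5_of_file(path) != expected_md5.lower():
            failures.append(filename)
    return failures
```

An empty result means every delivered file matched its checksum; anything listed would be sent back for rescanning or retransfer before import.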
  • 66. After import the “order for digitization” of each unit is completed After approving of all checks, scans and metadata are imported into the management systems The imports are executed automatically, on basis of scripts and standard protocols for file transfer 13. 14. Import metadata and scans into management systems We Do MARAC Conference October 30 2009
  • 67. After import the metadata are optimized for the search system For exchange of finding aids we use EAD From any workstation at the archive, directly via the CMS of the website The website is hosted from an external location Metadata are uploaded to the webserver by simple HTTP transfer 18. Import metadata into the website We Do MARAC Conference October 30 2009
  • 68. Until then scans are transported by use of portable USB harddisks Bandwith of the internet connections at the archive is still too small for direct sFTP (or suchlike) upload of large quantities of scans to the webserver It seems likely that in the near future this will change 17. Import scans into the website Transport medium We Do MARAC Conference October 30 2009
  • 69. We Do: Step 17: Import scans into the website (import). After the hard disk is connected to the server, the import process starts. Some basic checks are executed on the scans. Derivatives are made for thumbnails and for the zoom / contrast functionality.
  • 70. We Do / The happy customer: Request complete! When both scans and metadata have been imported, an e-mail is automatically sent to the person who requested the digitization. This e-mail contains a link to the finding aid and thumbnails on the website. The requester can decide whether or not to buy scans.
  • 71. We Do / The happy customer: Request completed. When both scans and metadata have been imported, an e-mail is automatically sent to the person who requested the digitization. This e-mail contains a link to the finding aid and thumbnails on the website. The requester can decide whether or not to buy scans.
  • 74.
  • 75. Government satisfied. Visitors per year (reading rooms, then website where listed): 1982: 24.027; 1988: 29.788; 1992: 27.738; 1998: 26.598 / 40.048; 2002: 25.014 / 224.050; 2006: 17.958 / 512.592; 2007: 92.678 / 520.483; 2008: 118.312 / 538.483; 2009 (3/4): 77.298 / 531.143.
  • 76. Management satisfied. Costs Archiefbank (2008): Digitization on request € 140,000; Webservices € 52,000; Digitization projects € 200,000. Income Archiefbank (2008): Digitization on request € 100,000; Project funding € 330,350; Government € 40,000.
  • 77.
  • 78. Thanks. Free scan credits? [email_address]
  • 79.  
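
Slides 64 and 65 describe a filename scheme (6-character order number plus 6-digit serial number) and two checks that lend themselves to scripting: a filename check and an MD5 checksum comparison. The following is a minimal Python sketch of how such checks could look, not the archive's actual scripts. The function names are hypothetical, and the assumption that an order number is always one capital letter followed by five digits is inferred only from the single example "A20758".

```python
import hashlib
import re

# Assumption: an order number is one capital letter plus five digits ("A20758").
ORDER_RE = re.compile(r"[A-Z]\d{5}")
# A full filename is the order number followed by a 6-digit serial number.
FILENAME_RE = re.compile(r"[A-Z]\d{5}\d{6}")


def make_filename(order_nr: str, serial: int) -> str:
    """Build a 12-character filename from an order number and a serial number."""
    if not ORDER_RE.fullmatch(order_nr):
        raise ValueError(f"unexpected order number: {order_nr!r}")
    if not 1 <= serial <= 999_999:
        raise ValueError(f"serial number out of range: {serial}")
    return f"{order_nr}{serial:06d}"


def filename_belongs_to_order(name: str, order_nr: str) -> bool:
    """Check that a filename is well-formed and falls in the order's range."""
    return FILENAME_RE.fullmatch(name) is not None and name.startswith(order_nr)


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 checksum with the value recorded for it."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

In this sketch, a scan report could be verified by calling `filename_belongs_to_order` for every listed filename and `checksum_matches` for every delivered file. Where the expected checksums come from (for example, a list supplied with the scans) is left open here, since the slides do not say.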

Editor's Notes

  1. I will take you a step deeper into the work process of creating large amounts of scans. I’ll tell you about our starting points and the choices we have made, and I’ll show you the results of some research we have done, particularly into image quality and file size. I’ll also show you some back- and front-office tools from our website.