Digitization Project Planning
- What work needs to be done
- How it will be done (according to which standards, specifications, and best practices)
- Who should do the work (and where)
- How long the work will take
- How much it will cost, both to "resource" the infrastructure and to do the content conversion

http://www.ncecho.org/dig/guide_1planning.shtml
http://www.nyu.edu/its/humanities/ninchguide/II/
Components of Digitization Projects
- Planning and Project Management
- Selection
- File Formats – master & access derivatives (see the sketch below)
- Conservation Treatment
- Reformatting
- Metadata Design & Creation
- Quality Control
- Web Platform: open source vs. proprietary systems
- Preservation
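As an illustration of the master/access distinction above, here is a minimal sketch of deriving web access copies from a preservation master. It assumes the Pillow imaging library; the file names and JPEG quality setting are hypothetical, not drawn from any particular standard.

    # Minimal sketch: derive access copies from a preservation master.
    # Assumes the Pillow library (pip install Pillow); file names are hypothetical.
    from PIL import Image

    master = Image.open("master_scan.tif")   # uncompressed preservation master
    access = master.convert("RGB")           # JPEG requires RGB (or grayscale) data

    # Full-size JPEG access derivative for web delivery; the TIFF master
    # remains untouched as the archival file of record.
    access.save("access_copy.jpg", "JPEG", quality=85)

    # Small thumbnail derivative; thumbnail() resizes in place and
    # preserves the aspect ratio.
    access.thumbnail((200, 200))
    access.save("thumbnail.jpg", "JPEG", quality=85)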
Selection Criteria
- Should they be digitized? Research value
- May they be digitized? Copyright status
- Can they be digitized? Condition and format

http://www.nedcc.org/resources/leaflets/6Reformatting/06PreservationAndSelection.php
http://www.dlib.org/dlib/september09/ooghe/09ooghe.html
Digitization Standards
Technical standards:
- Federal Agencies Digitization Guidelines Initiative (FADGI): http://www.digitizationguidelines.gov/
- NARA
- California Digital Library (CDL): http://www.cdlib.org/services/dsc/tools/docs/cdl_gdi_v2.pdf
- University of Colorado: https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf
Web Platform Options
Open source software:
- Omeka
- Greenstone
- DSpace
- Fedora

Proprietary software:
- CONTENTdm (OCLC)
- Luna Insight
- DigiTool
Web Harvesting involves:
- Identifying and collecting web resources
- Providing search capability for archived web collections
- Managing and preserving web resources
Web Harvesting
The most common web archiving technique uses web crawlers to automate the process of collecting web pages. Web crawlers typically view web pages in the same manner that users with a browser see the Web, and therefore provide a comparatively simple method of remotely harvesting web content.
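The fetch-parse-follow cycle a crawler performs can be sketched in a few lines of standard-library Python. The seed URL below is a placeholder; production harvesters such as Heritrix layer politeness delays, deduplication, and WARC-format archival output on top of this basic loop.

    # Minimal sketch of a breadth-first web crawler (standard library only).
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects <a href> targets, resolved the way a browser would."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        # Resolve relative links against the page's own URL.
                        self.links.append(urljoin(self.base_url, value))

    def crawl(seed, max_pages=10):
        """Breadth-first crawl: fetch a page, store it, queue its links."""
        queue, seen, archive = [seed], set(), {}
        while queue and len(archive) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url).read().decode("utf-8", errors="replace")
            except (OSError, ValueError):
                continue  # skip unreachable pages and non-HTTP links
            archive[url] = html  # a real harvester would write WARC records here
            parser = LinkExtractor(url)
            parser.feed(html)
            queue.extend(parser.links)
        return archive

    pages = crawl("https://example.com/")  # placeholder seed URL
    print(f"Harvested {len(pages)} pages")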
Web Crawling Problems
- The robots exclusion protocol (robots.txt) may deny crawlers access to portions of a website (see the sketch after this list).
- Large portions of a website may be hidden in the deep Web.
- Crawler traps may cause a crawler to download an infinite number of pages, so crawlers are usually configured to limit the number of dynamic pages they crawl. Calendars often cause problems for crawlers, since every "next month" link leads to yet another dynamically generated page.
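Two of these safeguards, honoring robots.txt and capping the number of pages fetched so a trap cannot run forever, can be sketched with the standard library's robotparser module. The robots.txt URL, page budget, and helper class below are illustrative assumptions, not part of any particular crawler.

    # Sketch of a fetch policy: robots exclusion check plus a page budget.
    from urllib.robotparser import RobotFileParser

    class PoliteFetchPolicy:
        """Decides whether a crawler may fetch a URL (hypothetical helper)."""
        def __init__(self, robots_url, max_pages=500, user_agent="*"):
            self.robots = RobotFileParser()
            self.robots.set_url(robots_url)
            self.robots.read()          # fetch and parse the site's robots.txt
            self.max_pages = max_pages  # hard ceiling against crawler traps
            self.user_agent = user_agent
            self.fetched = 0

        def should_fetch(self, url):
            if self.fetched >= self.max_pages:
                return False  # trap guard: the page budget is exhausted
            if not self.robots.can_fetch(self.user_agent, url):
                return False  # excluded by the robots exclusion protocol
            self.fetched += 1
            return True

    policy = PoliteFetchPolicy("https://example.com/robots.txt")  # placeholder
    print(policy.should_fetch("https://example.com/calendar?month=9999"))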
Web Harvesting Resources
- International Internet Preservation Consortium: http://netpreserve.org/about/index.php
- Library of Congress: http://www.loc.gov/webarchiving
- Archive-It (service): www.archive-it.org