
F1 hadar miller__israeli_internet_archive-nli

Hadar Miller, National Library of Israel:
“ArchioNet” Israel Internet Domain Archive

PDF file of the presentation given at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il

Published in: Education, Technology

  1. “ArchioNet” Israeli Internet Domain Archive
  2. Agenda
     • NLI Digital Library Infrastructure
     • “ArchioNet” Project Scope
     • Technical Issues
     • The Project in Numbers
     • Legislation
     • What’s Next
  3. NLI Digital Library Infrastructure
  4. “ArchioNet” Project Scope
     • Why do we need this project?
     • What do we harvest?
        • Phase A: .IL websites
        • Phase B: Hebrew-character sites
     • When did we start?
        • Phase A: 2 full crawls annually, started September 2013
        • Phase B: an additional 4 subject-based crawls annually
     • How to enable accessibility?
        • Phase A: “Wayback Machine” access inside NLI only, “ArchioNet” only
        • Phase B: over the web, with cross-reference discovery
     • Where to execute the harvest?
        • Phase A: NLI with the Internet Archive
        • Phase B: NLI infrastructure
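
The two-phase harvest scope on slide 4 could be expressed as a small URL classifier. This is only a sketch under stated assumptions: it reads "Hebrew characters sites" as hostnames that contain Hebrew letters, which the slide does not confirm, and the names `has_hebrew` and `harvest_phase` are hypothetical. Punycode-encoded hostnames (`xn--...`) are not decoded here.

```python
from typing import Optional
from urllib.parse import urlsplit


def has_hebrew(text: str) -> bool:
    """Return True if text contains a character from the Unicode Hebrew block."""
    return any("\u0590" <= ch <= "\u05FF" for ch in text)


def harvest_phase(url: str) -> Optional[str]:
    """Classify a URL as Phase A, Phase B, or out of scope (None).

    Assumption: Phase A means the hostname is under the .il top-level domain,
    and Phase B means the hostname itself contains Hebrew letters.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host == "il" or host.endswith(".il"):
        return "A"
    if has_hebrew(host):
        return "B"
    return None
```

For example, `harvest_phase("http://www.nli.org.il/")` falls into Phase A, while a `.com` site with an ASCII hostname is out of scope under these assumptions.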
  5. Technical Issues
     • Which crawler (and version) to use?
     • Cataloguing and search tool
     • What to harvest?
     • Seeds are needed
     • Depth of a site
     • robots.txt
     • The deep web
     • How to store and preserve a WARC file
     • Virus detection
     • System architecture
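
Two of the issues on slide 5, crawl depth and robots.txt, can be illustrated with Python's standard `urllib.robotparser` module. This is a minimal sketch, not the project's crawler: the depth limit `MAX_DEPTH`, the user agent string `example-crawler`, the host `example.co.il`, and the helper names are all assumptions made for illustration. Seed URLs are treated as depth 0.

```python
from urllib.robotparser import RobotFileParser

MAX_DEPTH = 3  # assumed value; the slides do not state a depth limit


def build_robots(lines):
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def should_fetch(parser, url, depth, user_agent="example-crawler"):
    """Fetch only if the link is within the depth limit and robots.txt allows it."""
    if depth > MAX_DEPTH:
        return False
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt for a hypothetical site.
robots = build_robots(["User-agent: *", "Disallow: /private/"])
```

With this robots.txt, `should_fetch(robots, "http://example.co.il/private/page", 1)` is refused, and any URL at depth 4 or deeper is skipped regardless of robots.txt. Whether an archive should honour robots.txt at all is one of the open questions the slide raises.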
  6. The Project in Numbers
     • ~220K websites
     • 0.5 GB per site
     • ~100 TB per harvest
     • Average page lifetime: ~100 days
     • 2 full harvests annually
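
The figures on slide 6 are consistent with each other, as a quick back-of-the-envelope calculation shows. The sketch below assumes decimal units (1 TB = 1000 GB) and treats the slide's rounded values as exact; the function and constant names are hypothetical.

```python
SITES = 220_000          # "~220K" websites
GB_PER_SITE = 0.5        # average size per site
HARVESTS_PER_YEAR = 2    # full harvests annually


def harvest_size_tb(sites=SITES, gb_per_site=GB_PER_SITE):
    """Estimated size of one full harvest in terabytes (1 TB = 1000 GB)."""
    return sites * gb_per_site / 1000


per_harvest_tb = harvest_size_tb()
per_year_tb = per_harvest_tb * HARVESTS_PER_YEAR
print(f"{per_harvest_tb:.0f} TB per harvest, {per_year_tb:.0f} TB per year")
```

This gives about 110 TB per harvest, in line with the slide's "~100 TB", and roughly 220 TB per year for the two full harvests, before any subject-based crawls.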
  7. Legislation
     • Can NLI harvest?
     • Where is it accessible?
     • Intellectual property
     • What can/should we block?
  8. Thank You
  9. Back
