F1 hadar miller__israeli_internet_archive-nli

Hadar Miller, National Library of Israel:
“ArchioNet” Israel Internet Domain Archive

pptx file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il

  1. “ArchioNet” Israeli Internet Domain Archive
  2. Agenda
     o NLI Digital Library Infrastructure
     o “ArchioNet” Project Scope
     o Technical Issues
     o The Project in Numbers
     o Legislation
     o What’s Next
  3. NLI Digital Library Infrastructure
  4. “ArchioNet” Project Scope
     • Why do we need this project?
     • What do we harvest?
       • Phase A: .IL websites
       • Phase B: Hebrew-character sites
     • When did we start?
       • Phase A: 2 full crawls annually, started September 2013
       • Phase B: 4 additional subject-based crawls annually
     • Where do we execute the harvest?
       • Phase A: NLI with the Internet Archive
       • Phase B: NLI infrastructure
     • How do we enable accessibility?
       • Phase A: “Wayback Machine” in NLI only, “ArchioNet” only
       • Phase B: over the web, cross-reference discovery
  5. Technical Issues
     • Which crawler (and version) to use?
     • Cataloguing and search tool
     • What to harvest?
     • Seeds are needed
     • Depth of a site
     • Robots.txt
     • The Deep Web
     • How to store and preserve a WARC file
     • Virus detection
     • System architecture
  6. The Project in Numbers
     • ~220K websites
     • 0.5 GB per site
     • ~100 TB per harvest
     • Average page lifetime: ~100 days
     • 2 full harvests annually
  7. Legislation
     • Can NLI harvest?
     • Where is it accessible?
     • Intellectual property
     • What can/should we block?
  8. Thank You
  9. Back
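
The Technical Issues slide names seeds, site depth, and robots.txt as open questions without saying how they were handled. The following is only a minimal sketch of how those three constraints can interact in a crawl plan. It is not the crawler used by the project. The link graph, the `www.example.co.il` URLs, the `example-crawler` user agent, the robots.txt content, and the depth limit are all hypothetical placeholders, and nothing is fetched over the network.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory link graph standing in for real pages.
LINKS = {
    "http://www.example.co.il/": [
        "http://www.example.co.il/a",
        "http://www.example.co.il/private/x",
    ],
    "http://www.example.co.il/a": ["http://www.example.co.il/a/b"],
    "http://www.example.co.il/a/b": ["http://www.example.co.il/a/b/c"],
}

# Hypothetical robots.txt for the placeholder site.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


def plan_crawl(seeds, links, robots, max_depth, agent="example-crawler"):
    """Breadth-first walk from the seeds, honouring robots.txt and a hop limit."""
    seen = set()
    order = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        url, depth = queue.popleft()
        # Skip pages already planned, too far from a seed, or disallowed.
        if url in seen or depth > max_depth or not robots.can_fetch(agent, url):
            continue
        seen.add(url)
        order.append(url)
        for target in links.get(url, []):
            queue.append((target, depth + 1))
    return order


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())  # parse local text instead of fetching

planned = plan_crawl(["http://www.example.co.il/"], LINKS, robots, max_depth=2)
for url in planned:
    print(url)
```

With a depth limit of 2, the page three hops from the seed is left out, and the `/private/` page is skipped because the placeholder robots.txt disallows it.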

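
The figures on the Project in Numbers slide can be cross-checked with simple arithmetic. The sketch below multiplies the number of sites by the per-site size and compares the result with the quoted ~100 TB per harvest. Two assumptions are not in the slides: decimal units (1 TB = 1000 GB), and an annual total that simply doubles one harvest with no deduplication or compression.

```python
SITES = 220_000          # ~220K websites (from the slide)
GB_PER_SITE = 0.5        # 0.5 GB per site (from the slide)
HARVESTS_PER_YEAR = 2    # 2 full harvests annually (from the slide)

GB_PER_TB = 1000         # assumption: decimal units


def harvest_size_tb(sites, gb_per_site):
    """Estimated size of one full harvest, in terabytes."""
    return sites * gb_per_site / GB_PER_TB


per_harvest_tb = harvest_size_tb(SITES, GB_PER_SITE)
# Assumption: no deduplication or compression between harvests.
per_year_tb = per_harvest_tb * HARVESTS_PER_YEAR

print(f"Per harvest: {per_harvest_tb:.0f} TB")
print(f"Per year:    {per_year_tb:.0f} TB")
```

The estimate of 110 TB per harvest is in line with the ~100 TB quoted on the slide.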