F1 hadar miller__israeli_internet_archive-nli

Hadar Miller, National Library of Israel:
“ArchioNet” Israel Internet Domain Archive

pptx file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Van Leer Jerusalem Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il

Published in: Education, Technology

  1. “ArchioNet” Israeli Internet Domain Archive
  2. Agenda
       • NLI Digital Library Infrastructure
       • “ArchioNet” Project Scope
       • Technical Issues
       • The Project in Numbers
       • Legislation
       • What’s Next
  3. NLI Digital Library Infrastructure
  4. “ArchioNet” Project Scope
       • Why do we need this project?
       • What do we harvest?
           • Phase A: .IL web sites
           • Phase B: sites in Hebrew characters
       • When did we start?
           • Phase A: 2 full crawls annually, started September 2013
           • Phase B: 4 additional subject-based crawls annually
       • How to enable accessibility?
           • Phase A: “Wayback Machine” in NLI only, “ArchioNet” only
           • Phase B: over the Web, cross-reference discovery
       • Where to execute the harvest?
           • Phase A: NLI with Internet Archive
           • Phase B: NLI infrastructure
  5. Technical Issues
       • Which crawler (and version) to use?
       • Cataloguing and search tool
       • What to harvest?
           • Seeds are needed
           • Depth of a site
           • Robots.txt
           • The Deep Web
       • How to store and preserve a WARC file
       • Virus detection
       • System architecture
  6. The Project in Numbers
       • ~220K web sites
       • 0.5 GB per site
       • ~100 TB per harvest
       • Average page lifetime: ~100 days
       • 2 full harvests annually
  7. Legislation
       • Can NLI harvest?
       • Where is it accessible?
       • Intellectual property
       • What can/should we block?
  8. Thank You
  9. Back
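
The Technical Issues slide lists seeds, site depth and Robots.txt as open questions for the harvest. As a rough illustration only (not the crawler NLI actually used), the sketch below shows how those three settings interact when planning a crawl. The link graph, the `site.example` URLs, the robots.txt text, the `example-crawler` agent name and the `plan_crawl` helper are all hypothetical; only `urllib.robotparser` is a real standard-library module.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a placeholder site (not from the slides).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical in-memory link graph standing in for real page fetches.
LINKS = {
    "http://site.example/": ["http://site.example/a",
                             "http://site.example/private/x"],
    "http://site.example/a": ["http://site.example/a/b"],
    "http://site.example/a/b": ["http://site.example/a/b/c"],
}


def plan_crawl(seeds, links, robots_txt, max_depth, agent="example-crawler"):
    """Breadth-first walk from the seeds, honouring robots.txt and a depth limit."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    queue = deque((seed, 0) for seed in seeds)
    seen = set(seeds)
    allowed = []
    while queue:
        url, depth = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue  # excluded by robots.txt
        allowed.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return allowed


plan = plan_crawl(["http://site.example/"], LINKS, ROBOTS_TXT, max_depth=2)
for url in plan:
    print(url)
```

With a depth limit of 2, the page three links away from the seed is never reached, and the `/private/` page is skipped because of the robots.txt rule. Whether a national archive should honour robots.txt at all is one of the policy questions the slides raise, so treat that check as optional.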
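
The figures on the Project in Numbers slide can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and no deduplication or compression between harvests, neither of which the slides specify; the function name `harvest_size_tb` is made up for this example.

```python
# Approximate figures quoted on the "Project in Numbers" slide.
SITES = 220_000               # ~220K web sites
GB_PER_SITE = 0.5             # 0.5 GB per site
FULL_HARVESTS_PER_YEAR = 2    # 2 full harvests annually


def harvest_size_tb(sites, gb_per_site, gb_per_tb=1000):
    """Estimated size of one full harvest in TB (assumes decimal units)."""
    return sites * gb_per_site / gb_per_tb


per_harvest_tb = harvest_size_tb(SITES, GB_PER_SITE)
per_year_tb = per_harvest_tb * FULL_HARVESTS_PER_YEAR

print(f"Per harvest: ~{per_harvest_tb:.0f} TB")
print(f"Per year:    ~{per_year_tb:.0f} TB")
```

The per-harvest estimate comes to about 110 TB, which is consistent with the "~100 TB per harvest" figure on the slide. Two full harvests a year would then need roughly 220 TB of new storage annually under these assumptions.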
