F1 hadar miller__israeli_internet_archive-nli

Hadar Miller, National Library of Israel:
“ArchioNet” Israel Internet Domain Archive

pptx file of the presentation at the
EVA/Minerva Jerusalem International Conference on Digitisation of Culture,
Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013
http://www.digital-heritage.org.il
Presentations available at: http://2013.minervaisrael.org.il

Published in: Education, Technology

Transcript

  • 1. “ArchioNet” Israeli Internet Domain Archive
  • 2. Agenda
      o NLI Digital Library Infrastructure
      o “ArchioNet” Project Scope
      o Technical Issues
      o The Project in Numbers
      o Legislation
      o What’s Next
  • 3. NLI Digital Library Infrastructure
  • 4. “ArchioNet” Project Scope
      o Why do we need this project?
      o What do we harvest?
          - Phase A: .IL websites
          - Phase B: Hebrew-character sites
      o When did we start?
          - Phase A: 2 full crawls annually, started September 2013
          - Phase B: an additional 4 subject-based crawls annually
      o How to enable accessibility:
          - Phase A: “Wayback Machine” in NLI only, “ArchioNet” only
          - Phase B: over the Web, cross-reference discovery
      o Where to execute the harvest?
          - Phase A: NLI with the Internet Archive
          - Phase B: NLI infrastructure
  • 5. Technical Issues
      o Which crawler (version) to use?
      o Cataloguing and search tool
      o What to harvest?
      o Seeds are needed
      o Depth of a site
      o Robots.txt
      o The Deep Web
      o How to store and preserve a WARC file
      o Virus detection
      o System architecture
  • 6. The Project in Numbers
      o ~220K websites
      o 0.5 GB per site
      o ~100 TB per harvest
      o Average page lifetime: ~100 days
      o 2 full harvests annually
  • 7. Legislation
      o Can NLI harvest?
      o Where is it accessible?
      o Intellectual property
      o What can/should we block?
  • 8. Thank You
  • 9. Back
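
The figures on slide 6 can be cross-checked with a few lines of arithmetic. The sketch below is not from the presentation. It assumes decimal units (1 TB = 1000 GB) and that each full harvest stores every site again with no deduplication between harvests. The function name `harvest_size_tb` is made up for illustration.

```python
# Approximate figures quoted on slide 6.
SITES = 220_000          # ~220K websites
GB_PER_SITE = 0.5        # 0.5 GB per site
HARVESTS_PER_YEAR = 2    # 2 full harvests annually


def harvest_size_tb(sites: int, gb_per_site: float) -> float:
    """Estimate the size of one full harvest in TB (assumes 1 TB = 1000 GB)."""
    return sites * gb_per_site / 1000


per_harvest = harvest_size_tb(SITES, GB_PER_SITE)
# Assumption: no deduplication, so yearly volume is a simple multiple.
per_year = per_harvest * HARVESTS_PER_YEAR

print(f"one full harvest: ~{per_harvest:.0f} TB")
print(f"per year:         ~{per_year:.0f} TB")
```

The product, about 110 TB per harvest, is consistent with the "~100 TB per harvest" quoted on the slide.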

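Slide 5 lists robots.txt among the technical issues without saying how the project handles it. As a rough illustration only, the sketch below shows how a crawler could check a site's robots.txt rules before fetching a page, using Python's standard `urllib.robotparser`. The user agent `archionet-bot`, the `example.org.il` URLs, and the sample rules are hypothetical and do not come from the presentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical user agent and URLs.
print(allowed(ROBOTS_TXT, "archionet-bot", "http://example.org.il/index.html"))
print(allowed(ROBOTS_TXT, "archionet-bot", "http://example.org.il/private/a.html"))
```

Whether a national archive should honour such rules at all is a policy question of the kind the Legislation slide raises ("What can/should we block?"). The code only shows the mechanical check.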