Can we save the web?
WEB ARCHIVING
Vangelis Banos
http://vbanos.gr/
Unconference, 9-10 December 2013
Can we save the web?
• What do you mean?
• What is web archiving?

• The practical use of web archives.
• Making your own web archive.
What is the World Wide Web?

A huge collection of digital documents (websites), stored on special computers (web servers) and interconnected with each other.
What is the World Wide Web?
What is on the web?

What isn’t on
the web?
Why save the web?
1. More and more material is born digital only!
2. Some websites contain unique data and valuable
information.
– Users take action and make important decisions based on this information.

3. The web is a live record of contemporary:
   1. Society,
   2. Culture,
   3. Science,
   4. Economy.

4. We have a responsibility to preserve the web.
5. Saving the web promotes transparency.
Isn’t the web already safe?
• The answer is: NOT really!
• Websites are in danger:
– Organisations that maintain them stop caring about
them,
– Organisations that maintain them cease to exist,
– Natural disasters destroy computer facilities (fires, floods, storms, etc.),
– Technical problems damage websites (bugs, computer viruses, backup failures, hardware failures),
– Their data are deliberately tampered with, for many reasons (political, financial, criminal, etc.)!
A major blog hosting company was shut down by the U.S. authorities
Yahoo! GeoCities has closed.
Natural disasters cause data center problems
Websites are tampered with all the time
Does this sound familiar?
Can we save the web?
• What do you mean?
• What is web archiving?

• The practical use of web archives.
• Making your own web archive.
Websites are tampered with all the time
Web Archiving

The Internet Archive has backups.

MTSR 2013, 22 Nov 2013, Thessaloniki
WEB ARCHIVING
The process of collecting portions of
the World Wide Web to ensure the
information is preserved in an
archive for future researchers,
historians, and the public.
Challenges
• How is it done technically?
• What should I choose to archive?
– The whole website? Some pages? Only some files?

• What do I want to do with the web archive I’m
creating?
• Who will have access?
• Who is the owner of the web archive content?
Archiving web pages is a technical challenge
[Figure: Generic file archiving operation: File(s), Software, Hardware, RECORD]
Archiving web pages is a technical challenge
[Figure: Web archiving operation: a Website made of many File(s), each with its own Software and Hardware]
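To see why a web page is not a single file, a minimal sketch using Python's standard html.parser can list the extra files (stylesheets, scripts, images) that one HTML page refers to; a web archiving tool has to fetch each of these as well. The set of tags checked here is an illustrative assumption, not a complete list (for instance, images referenced only from CSS are missed).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags and the attribute that usually points to a file the page needs.
# Illustrative assumption only: real crawlers inspect many more places.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",
    "iframe": "src",
    "source": "src",
}


class ResourceCollector(HTMLParser):
    """Collects the absolute URLs of files referenced by one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page's own URL.
                url = urljoin(self.base_url, value)
                if url not in self.resources:  # keep each file once
                    self.resources.append(url)


def page_resources(html: str, base_url: str) -> list[str]:
    """Return the files (besides the HTML itself) this page depends on."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="/css/site.css"><img src="logo.png">'
    print(page_resources(page, "http://example.org/library/index.html"))
```

Even this tiny page needs three separate downloads (the HTML, the stylesheet and the image) to be archived as it looked.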
How is it done?

• Possible web archiving targets:
– Government websites, Educational institutions,
– People’s suggestions, Currently popular websites,
– Popular media, Big companies,
– Special events.
Web archiving strategies
Who is working on web archiving?

Many important organisations have been working on
web archiving since 1996.
International Internet Preservation Consortium
• IIPC Members:
– National Libraries,
– Academic Libraries,
– Cultural Organisations,
– Universities,
– Software development companies.

• Web Archiving Timeline
– http://timeline.webarchivists.org/
Obligation of the National Library
• According to UNESCO:
– «a national library is responsible for the
collection and storage of the national
cultural heritage».

• In Greece, according to Law No. 3149/03:
– «publishers or authors (when there is no
publisher) of any printed material are
obliged to submit three copies of their work
to the National Library of Greece. This
obligation also includes audiovisual and e-publishing material».

• What about the Greek web?
Bibliothèque nationale de France
2006: legal deposit extended to
“signs, signals, writings,
images, sounds or messages of
any kind communicated to the
public by electronic means”.

The goal is not to gather the «best of the web»,
but to preserve a collection representative of the web
at a certain date.
Can we save the web?
• What do you mean?
• What is web archiving?

• The practical use of web archives.
• Making your own web archive.
Visiting the Internet Archive
• http://archive.org/
Internet Archive activities
• Key features, browsing, searching.
• Indicative web sites:
– Greek Ministry of Education (Υπουργείο Παιδείας), 3 Jul 2010, www.minedu.gov.gr
– Greek Ministry of Development (Υπουργείο Ανάπτυξης), 21 Dec 2009, http://www.ypoian.gr/
– The White House, 7 Apr 2000, http://www.whitehouse.gov
– BBC, 11 Sep 2001, http://www.bbc.co.uk/
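Wayback Machine addresses follow the pattern web.archive.org/web/<timestamp>/<original URL>, so captures like the ones above can be reached by building that address. A minimal sketch, assuming a date-only (YYYYMMDD) timestamp; as far as we know the Wayback Machine redirects such a request to the nearest capture it holds, which may come from a different day. The second helper only builds a query string for the public availability API (archive.org/wayback/available); it does not send the request.

```python
from datetime import date
from urllib.parse import urlencode

WAYBACK_BASE = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_url(site: str, when: date) -> str:
    """Wayback Machine address for `site` as captured around `when`."""
    # A partial timestamp (day precision) is assumed to be accepted.
    return f"{WAYBACK_BASE}/{when:%Y%m%d}/{site}"


def availability_query(site: str, when: date) -> str:
    """Query URL asking the availability API for the closest capture."""
    params = urlencode({"url": site, "timestamp": f"{when:%Y%m%d}"})
    return f"{AVAILABILITY_API}?{params}"


if __name__ == "__main__":
    print(wayback_url("http://www.bbc.co.uk/", date(2001, 9, 11)))
    print(availability_query("http://www.whitehouse.gov", date(2000, 4, 7)))
```

Opening the first printed address in a browser should show the BBC homepage as it was archived on or near 11 September 2001.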
Visiting Archive-It
• http://archive-it.org/
Archive-It activities
• Key features, browsing, searching, collections.
• Examples:
– Egypt Revolution and Politics, American University in Cairo,
– 2008 Beijing Olympic Games,
– Libyan Uprisings, University of Michigan,
– Venice Biennale 2013.
Can we save the web?
• What do you mean?
• What is web archiving?

• The practical use of web archives.
• Making your own web archive.
HTTrack website copier
http://www.httrack.com
Making your own web archive
• Using HTTrack software (Open Source)
– Installation
– Practical advice
– Features
– Usage scenarios
• Archive http://2013.futurelibrary.gr/
• Archive http://www.auth.gr/
Things worth considering
• Set limits:
– Filters to define the file types you want to copy.
– Bandwidth and connection limits, to avoid overloading the site you are archiving AND to avoid saturating your library's network.
– Time limits.

• Check the size of the files you have downloaded.
• Plan for disk space according to your needs.
• Check the target website's copyright terms. Are you allowed to:
– Archive for personal use?
– Archive for public use in library computers?
– Archive to publish on the web?

• If you are not sure, please ask the website owner before
beginning web archiving.
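The limits and checks above can be illustrated in a few lines of Python. This is not HTTrack's own code (HTTrack provides its own filter and limit settings); it is a minimal sketch assuming a hypothetical list of allowed file types. It shows a file-type filter, the pause needed to keep the average download rate under a bandwidth cap, and a disk-usage check for the downloaded copy.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical policy: file types worth keeping. Adjust to your needs.
ALLOWED_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".gif", ".pdf"}


def is_wanted(url: str, allowed=ALLOWED_EXTENSIONS) -> bool:
    """File-type filter. URLs without an extension are treated as pages."""
    suffix = Path(urlparse(url).path).suffix.lower()
    return suffix == "" or suffix in allowed


def pause_for_bandwidth(bytes_downloaded: int, elapsed_s: float,
                        max_bytes_per_s: float) -> float:
    """Seconds to wait so the average rate stays at or below the cap."""
    min_time = bytes_downloaded / max_bytes_per_s
    return max(0.0, min_time - elapsed_s)


def archive_size(root: str) -> int:
    """Total size in bytes of every file under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

For example, after downloading 50,000 bytes in 1 second under a 25,000 bytes/s cap, pause_for_bandwidth returns 1.0, so the next request waits one second. Running human_size(archive_size(...)) on the output folder after each run gives a quick figure for disk-space planning.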
Scenario: create your own mini web
archive in your library on a shoestring.
• Equipment:
– A typical Windows computer with the largest possible hard disk (the more TB, the better).
– A backup disk of equal capacity (e.g. an external USB hard disk).
– A DSL Internet connection.
– HTTrack open source software.

• Select important local websites.
• Get permissions from website owners if necessary.
• Set up a regular web archiving schedule (e.g. once per month).
• Provide information about the web archive, and access to it, on your library’s local computers for the public.
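A minimal sketch of the monthly schedule, assuming a hypothetical folder layout with one dated folder per site and capture (archive_root/site/YYYY-MM-DD), so that each month's copy is kept rather than overwritten. Actually launching the capture on those dates would be left to the operating system's scheduler (for example Windows Task Scheduler on the computer described above).

```python
import calendar
from datetime import date
from pathlib import Path


def add_months(day: date, months: int) -> date:
    """Same day `months` later, clamped to the month's last day."""
    index = day.month - 1 + months
    year = day.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def monthly_schedule(start: date, runs: int) -> list[date]:
    """Dates of `runs` monthly captures, always counted from `start`."""
    # Counting from `start` avoids drift (31 Jan -> 28 Feb -> 31 Mar).
    return [add_months(start, i) for i in range(runs)]


def snapshot_dir(archive_root: str, site: str, when: date) -> Path:
    """Folder for one capture, e.g. archive/www.auth.gr/2013-12-09."""
    return Path(archive_root) / site / when.isoformat()


if __name__ == "__main__":
    for run in monthly_schedule(date(2013, 12, 9), 3):
        print(snapshot_dir("archive", "www.auth.gr", run))
```

Keeping dated folders side by side lets visitors compare how a local website changed from month to month, at the cost of disk space that grows with every run.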
Can we save the web?

YES WE CAN!
• Questions?
• Thank you for your attention 
• Contact:
– Web: http://vbanos.gr
– Email: vbanos@gmail.com
– Twitter: @vbanos
