A new class of primary source?
Prospects and pitfalls in using web archives for research
Dr Peter Webster
Webster Research and Consulting
@pj_webster
A lost archive?
The web its own archive?
Open UK Web Archive 2004-13 comparison.
@anjacks0n http://britishlibrary.typepad.co.uk/webarchive/2014/10/what-is-still-on-the-web-after-10-years-of-archiving-.html
Disappearing predictably
Disappearing unpredictably
... but safe and sound in the archive
Reasons to care about web archiving
• education and research
• enforcement of the law
• public accountability
Three archives for the UK
Archive | Temporal scope | Content scope | Access
Open UKWA | 2004-present | Selective (14.7k) | Online
Legal Deposit UKWA | 2013-present | Comprehensive (for UK) | Onsite
JISC UK Domain Dataset | 1996-2013 | Comprehensive (for .uk) | Index only
JISC UK Web Domain Dataset (1996-2013)
• copy of Internet Archive holdings for .uk
• bought by JISC, held by British Library
• 60TB of data
• no direct access to content
• prototype search at webarchive.org.uk/shine
• derived datasets in public domain
Web archives for NI and RoI
Archive | Temporal scope | Content scope | Access
NLI Web Archive | 2011-present | Selective (542) | Online
PRONI Web Archive | 2010-present | Selective (115) | Online
Legal Deposit UKWA | 2013-present | Comprehensive (for UK!) | Onsite (TCD)
Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
Changing aesthetics
gov.ie, captured by archive.org, 15 August 2000
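A capture like this one is normally reached by URL. The following is a minimal sketch, assuming the Wayback Machine's replay address pattern of web.archive.org/web/ followed by a timestamp and the original URL. The function name wayback_url is hypothetical, and the archive may redirect to the nearest capture it holds rather than the exact date requested.

```python
from datetime import date


def wayback_url(url: str, day: date) -> str:
    """Build a Wayback Machine replay address for a page near a given day."""
    # The timestamp here is YYYYMMDD; fuller timestamps add hours, minutes, seconds.
    return f"https://web.archive.org/web/{day:%Y%m%d}/{url}"


# Address for the gov.ie capture date cited on the slide.
gov_ie_2000 = wayback_url("gov.ie", date(2000, 8, 15))
print(gov_ie_2000)
```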
Vanished content
southtippcoco.ie, captured by archive.org, 4 Jan 2014
Visualising trends: Ngram
http://www.webarchive.org.uk/shine/graph
Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
• Direct access to WARC
• Derived datasets
• API access
Derived datasets from the BL
From JISC UK Web Domain Dataset (1996-2010)
• File format profile
• Geo-index
• Crawled URL Index (CDX)
• Host Link Graph
Public domain at data.webarchive.org.uk
Creationism?
• non-evolutionary account of human origins
• modern
• a long history
• a feature of some parts of evangelicalism
• (anti-evolutionism, Intelligent Design)
The creationist web: three questions
A justified conspiracy theory about marginalisation of creationist voices?
A real danger or a moral panic (Truth in Science)?
The web as friend of the marginalised opinion?
http://peterwebster.me/2014/11/18/reading-creationism-in-the-web-archive/
UK Host Link Graph (1996-2010)
2008 | newsimg.bbc.co.uk | youtube.com | 45
2008 | archbishopofyork.org.uk | flickr.com | 1
2002 | secularism.org.uk | geocities.com | 1
Public domain at: data.webarchive.org.uk
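The sample rows above can be read into structured records. This is a minimal sketch, assuming the pipe-separated layout shown on the slide and assuming the four fields are year, linking host, linked-to host and link count. The names HostLink and parse_host_link are hypothetical, and the published files may use a different delimiter.

```python
from typing import NamedTuple


class HostLink(NamedTuple):
    year: int
    source: str   # host on which the links appear (assumed meaning)
    target: str   # host being linked to (assumed meaning)
    count: int    # number of links found in that year (assumed meaning)


def parse_host_link(line: str) -> HostLink:
    """Split one 'year | source | target | count' row into typed fields."""
    year, source, target, count = (field.strip() for field in line.split("|"))
    return HostLink(int(year), source, target, int(count))


# The three rows quoted on the slide.
SAMPLE = """\
2008 | newsimg.bbc.co.uk | youtube.com | 45
2008 | archbishopofyork.org.uk | flickr.com | 1
2002 | secularism.org.uk | geocities.com | 1
"""

rows = [parse_host_link(line) for line in SAMPLE.splitlines() if line.strip()]
for row in rows:
    print(row)
```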
Approach
• selection of key UK creationist sites
• extraction of all unique inbound referring hosts for 1996-2010
• inspection and classification
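The extraction step might look something like the sketch below. It is not the code used for the talk: it assumes link graph rows in the pipe-separated form year | source | target | count, and the function name inbound_hosts and every .example host are hypothetical.

```python
from collections import defaultdict


def inbound_hosts(lines, targets, first_year=1996, last_year=2010):
    """Map each target host to the set of distinct hosts that link to it."""
    inbound = defaultdict(set)
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        year, source, target, _count = parts
        in_range = first_year <= int(year) <= last_year
        # Self-links are ignored: a site linking to itself is not a referrer.
        if in_range and target in targets and source != target:
            inbound[target].add(source)
    return dict(inbound)


# Hypothetical rows for illustration only; not taken from the dataset.
demo_lines = [
    "2007 | church.example | creationist.example | 3",
    "2008 | church.example | creationist.example | 1",
    "2008 | school.example | creationist.example | 2",
    "2008 | newsimg.bbc.co.uk | youtube.com | 45",
    "2011 | late.example | creationist.example | 1",
]

referrers = inbound_hosts(demo_lines, {"creationist.example"})
print(sorted(referrers["creationist.example"]))
```

Each linking host is counted once however many years or pages it links from, so the resulting list is what then goes forward to inspection and classification by hand.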
Caveats on method
• partial nature of the dataset
• benchmarking of absolute numbers
• selective sample
• what does a link mean, anyway?
• not looking at number of linking resources per host
Truth in Science: how significant?
• only 46 unique inbound hosts
• ... of which many were other creationists or secularist sites
• two churches, one school
• fewer in 2010 than 2007
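A year-on-year comparison of this kind can be sketched as a count of distinct linking hosts per year. The rows below are invented for illustration and are not the talk's data. The function name inbound_count_by_year and the .example hosts are hypothetical, and the year | source | target | count layout is assumed.

```python
from collections import defaultdict


def inbound_count_by_year(lines, target):
    """Count distinct hosts linking to one target host, year by year."""
    hosts_by_year = defaultdict(set)
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        year, source, row_target, _count = parts
        if row_target == target and source != target:
            hosts_by_year[int(year)].add(source)
    return {year: len(hosts) for year, hosts in sorted(hosts_by_year.items())}


# Invented rows, for illustration only.
trend_lines = [
    "2007 | a.example | target.example | 2",
    "2007 | b.example | target.example | 1",
    "2007 | c.example | target.example | 5",
    "2010 | a.example | target.example | 1",
]

counts = inbound_count_by_year(trend_lines, "target.example")
print(counts)
```

As the caveats on method note, absolute numbers from a partial dataset need benchmarking, so a falling count is suggestive rather than conclusive.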
Conclusions
• a utopian dream unfulfilled
• a genuine moral panic
• a justified conspiracy theory
Next steps (1)
1. NI the 'creationism capital of Europe'?
(Analysis of:
• links from GB organisations to NI creationists
• links from NI to RoW)
2. What about creationism in .ie?
Next steps (2)
Project: EU National Web Spheres
• part of resaw.eu
• investigating the nature of a national web domain
• ... including the interlinking between them
• case study I: Anglican & Presbyterian churches in Ireland, north and south
Web Archives for Historians
@HistWebArchives, http://webarchivehistorians.org/
Questions?
Peter Webster
peter@websterresearchconsulting.com
@pj_webster
peterwebster.me
websterresearchconsulting.com
