Observing Web Archives: The Case for an Ethnographic Study of Web Archiving
Jessica Ogden | Susan Halford | Les Carr
Web Science Conference 2017 | Rensselaer Web Science Research Center, RPI | 25-28 June 2017
Corresponding Author: jessica.ogden@soton.ac.uk
Outline
• Web archiving as a field of practice
• Problematising practice and the field; a theoretical framework
• Methodology and empirical work
• Preliminary results of a study at the Internet Archive
Web Archiving
https://flic.kr/p/dUt1ir
Sam Hodgson, NYTimes https://nyti.ms/2mxKAGM
Practice
• Daniel Gomes et al. (2011) A Survey on Web Archiving Initiatives.
• National Digital Stewardship Alliance (2012) Web Archiving Survey Report.
• Bailey et al. (2014) Web Archiving in the United States: A 2013 Survey
• Bailey et al. (2017) Web Archiving in the United States: A 2016 Survey
BAILEY ET AL. 2017
Problematising
• Existing overviews, best practice documents, surveys are partial
• Web archiving itself is inherently selective, contingent and ≠ a copy of the Web(s)
• Need to understand the role and agency of non/human actors in process
• Understanding the why, when + how is important for claims
Framing engagement
“We will move from databases to knowledge bases. We will move, in the language of the post-modernists, to re-contextualize our activities: we will reorient ourselves from the content to the context, and from the end result to the original empowering intent, that is, from the artifact (the actual record) to the creating processes behind it, and thus to the actions, programmes, and functions behind those processes.”
-Terry Cook
Role of Theory
• From archival science to the archival turn
• The archive as generative, subjective not a ‘view from nowhere’
• Understanding non/professional practice, role of politics
• Materiality of knowledge - practice as labour, tangible implications of intervention
Labour
• Drawing on ‘information labor’ as outlined by Downey (2014)
• Labour = the work it takes to produce and transform web archives into information sources
• Process of making information ‘useful’; putting it into ‘circulation’
• Human and algorithmic; often obscure, hidden
Gregory J. Downey. 2014. Making Media Work: Time, Space, Identity, and Labor in the Analysis of Information and Communication Infrastructures. In Media Technologies: Essays on Communication, Materiality, and Society, Tarleton Gillespie, Pablo J. Boczkowski, and Kirsten A. Foot (Eds.). MIT Press, Cambridge, Massachusetts; London, England, 141–165.
Methodology
• Ethnographic methods were chosen to document patterns of work
• ‘Archival ethnography’ (Gracy 2004)
• Understanding socio-cultural meaning, not a behavioural study
• Draws on extensive body of STS literature around record creation, scientific labs
Methodology
• 16 un/semi-structured (ethnographic) interviews
• Observation records - what was done, made and used, what was said
• Documentary sources - wiki, policies, reports
• Two-tiered consent, 4 weeks
• Thematic analysis (Spradley 1979) of ‘things informants know’
The Site
• Non-profit digital library founded in 1996 by Brewster Kahle
• Headquarters in San Francisco
• Began as mechanism to capture web pages as by-product of indexing
• Expanded remit to include books, audio, film/video, images, docs, video games and software
FMCCOWN AT ENGLISH WIKIPEDIA, HTTPS://COMMONS.WIKIMEDIA.ORG/W/INDEX.PHP?CURID=10700055
2002
2006
External crawling
Self-directed crawling
Directed crawling
Archive-It
Heritrix | Brozzler | PythonWayback | WaybackMachine | AlexaToolbar
• Wide Crawls
• Survey Crawls
• Platform/service specific
• Domain Crawls
• Thematic Crawls
• Event-based Crawls
*SCHEMATIC, NOT TO SCALE AND PROBABLY NOT EVEN CHRONOLOGICAL!
Mapping Practice Roles & Activities
‘Librarians’ | Engineers | Web Archivists | ‘The Crowd’
• User-directed
Web Archival Labour
Knowledge Work
• Defining selection priorities
• Allocating resources
• Analysing corpus; curating
• Maintaining crawls
Examples
‣ popularity, novelty, ‘precarity’
‣ staff, domain/host storage limits
‣ link-based analysis, tagging
‣ monitoring activities
Web Archival Labour
Breakdown & Repair
• Breakdown as moments when contingencies of assemblage are revealed (Star & Ruhleder 1996)
• Repair and maintenance reveal ‘ethics of care’ afforded to technologies over time (Jackson 2014)
Examples
‣ crawler traps
‣ missing capture elements
‣ patch crawling
‣ other quality assurance tasks
Summary & Future Work
• Complex system of knowledge/maintenance work for prioritising web archiving
• Archive is leveraging corpus and crowd for identifying domains
• Developed multiple tagging and analysis tools for curation
Further examining algorithmic (non-human) labour
Expand to include other communities, case studies
Acknowledgements
This work was supported by the UK Engineering and Physical Sciences
Research Council and the Web Science Centre for Doctoral Training,
Grant No. EP/G036926/1.
The authors would also like to thank the Internet Archive and staff for
opening their doors and being so generous with their time and feedback.
