A survey of web-based art resources with
findings applicable to FARL electronic records
collection development
Alison Rhonemus, LIS 698, Seminar and Practicum, Dr. Tula Giannini
Frick Art Reference Library
Deborah Kempe, Chief, Collections Management & Access
Web Survey and Collection Development
Coffee on the terrace
M-LEAD-TWO
Intern enterprises -
"collection assessments, digital resource surveys,
web archiving, provide support for important
consortial programs such as shared resources"
● Brooklyn Museum: Mark Daly, Ronnette Hope,
Project Manager: Emily Atwater
● NYARC Latin American Resources (MOMA):
Ralph Baylor
● FARL: Gretchen Nadasky, Alison Rhonemus
Frick Art Reference Library
In early 2011, the Frick Art Reference Library
and the Thomas J. Watson Library at The
Metropolitan Museum of Art completed a pilot
project to address coordinated collecting of
born-digital auction catalogs using CONTENTdm
and Archive-It.
FARL web archiving program is situated in Collection Development.
Current plans for website capture include online auction catalogs and art web resources
cataloged by NYARC.
Fellow M-LEAD-TWO intern Gretchen Nadasky has just described online auction
catalogs.
My project focused on NYARC cataloged websites.
Web Archiving
"The Internet Archive is already doing it."
Actually, the IA is providing the tools for
other institutions to use in archiving.
Archive-It
uses open source tools developed by the
Internet Archive
● Heritrix Web Crawler
● Wayback Interface
● WARC format, an ISO standard
Quality Assurance
The report and manual checks
Partner and WAYBACK interface
Evaluating seeds
• Password-protected sites – cannot be archived
• JavaScript – more complicated implementations
can be difficult to capture and display. Ongoing
area of development.
• Videos – difficulty with some proprietary formats
• Form- and database-driven content – may be
archived using a sitemap or other direct links to the
content.
Robots.txt Blocks
The crawler by default respects all robots.txt files. Check
post-crawl reports for blocked seeds or documents.
If your site is blocked:
a) Contact the site owner and ask if they will unblock it
b) Ask your Partner Specialist to turn on the “ignore robots”
feature in your account
Notes:
/ denotes single directory seed
subdomains.archive.org (add individually or expand seed)
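A robots.txt block can also be checked before a crawl is run. The following is a minimal sketch using Python's standard `urllib.robotparser`; it is not part of Archive-It. The robots.txt content, the example.org seeds, and the helper name `blocked_seeds` are invented for illustration, and the user-agent string `archive.org_bot` is an assumption about how the crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: archive.org_bot
Disallow: /
"""


def blocked_seeds(robots_txt, user_agent, seeds):
    """Return the seeds that the given robots.txt text blocks for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [seed for seed in seeds if not parser.can_fetch(user_agent, seed)]


# Hypothetical seed URLs.
seeds = [
    "http://example.org/",
    "http://example.org/private/catalog.html",
]

# "archive.org_bot" is an assumed user-agent name for the archiving crawler.
print(blocked_seeds(SAMPLE_ROBOTS_TXT, "archive.org_bot", seeds))
print(blocked_seeds(SAMPLE_ROBOTS_TXT, "SomeOtherBot", seeds))
```

In this sample the archiving crawler is blocked from the whole site, while other crawlers are blocked only from the /private/ directory. Post-crawl reports remain the place to confirm what was actually blocked.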
Site Survey Criteria
● html/flash/pdf
● images
● embedded material
● links
● directories and subdomains
● terms, rights statements and permissions
Obvious ruse
More of the obvious
Sites created without the intention of
being archived are the sites in need of
archiving.
Survey Says
● 257 cataloged entries
● 168 resources are possible to capture
● 82 resources would require more research or
display definite red flags for web archiving.
● PDFs are available for at least some of the
content in 75 resources.
● Flash was an element in 23 resources
● 16 sites used HTML5
● 54 used a CMS like Drupal or WordPress
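The counts above are easier to compare as shares of the 257 cataloged entries. The sketch below does that arithmetic. It assumes every count is measured against the full 257, which the slide implies but does not state. The categories overlap, so the shares are not expected to sum to 100.

```python
TOTAL_ENTRIES = 257  # cataloged entries surveyed

# Counts as reported on the slide; labels are shortened for display.
counts = {
    "possible to capture": 168,
    "more research or red flags": 82,
    "PDF for some content": 75,
    "Flash": 23,
    "HTML5": 16,
    "CMS such as Drupal or WordPress": 54,
}


def share(count, total=TOTAL_ENTRIES):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_ENTRIES} ({share(count)}%)")
```

On these figures, roughly two thirds of the cataloged resources look capturable and about a third need further research.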
There were 3 cataloged resources no longer
available on the live web but viewable through
Internet Archive.
Another 2 defunct resources were not available
through Internet Archive.
The main page for one of these lost resources was
available as a snapshot in WAYBACK but the actual
cataloged resource was not available.
Change is Constant
Archive-It Updates:
● Heritrix 1 series to Heritrix 3 series
(February)
● Archive-It 4.8
(May)
Archive-It 4.8
Plans
● Upcoming grants
● Capture of NYARC institution websites
● Include Wayback interface links in
Arcade catalog records
● Continue to identify websites for
capture and implement capture
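Adding Wayback links to catalog records mostly comes down to building a URL from a capture date and the original address. The sketch below uses the public Wayback Machine pattern (`https://web.archive.org/web/<timestamp>/<url>`). An Archive-It collection may be served from a different host, so the prefix here is an assumption to adjust. The helper name `wayback_link` and the example.org address are hypothetical.

```python
from datetime import datetime

# Assumed prefix: the public Wayback Machine. A partner collection may use
# a different host, in which case only this constant needs to change.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_link(url, captured=None):
    """Build a Wayback link for url.

    With a capture datetime, the link points at the snapshot nearest that
    moment; without one, "*" asks for the list of all captures.
    """
    stamp = captured.strftime("%Y%m%d%H%M%S") if captured else "*"
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


# Hypothetical cataloged resource and capture date.
print(wayback_link("http://example.org/catalog", datetime(2012, 5, 1, 12, 0, 0)))
print(wayback_link("http://example.org/catalog"))
```

A link built this way could be stored alongside the live URL in a catalog record, so the resource stays reachable if the live site disappears.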
Conclusions
○ Digital resources not prevalent enough to
reassign current staff
○ Website capture most costly in terms of staff time
○ Copyright continues to be an issue
○ Long term digital preservation needs yet to be
assessed
○ Capture of Frick Collection sites and NYARC will
pose a challenging test case
