The document discusses challenges and opportunities in web archiving. It outlines the key stages in the web archiving lifecycle including selection of content, harvesting techniques, storage formats and infrastructure, ways to provide access, and the role of community. Specific challenges are discussed such as representing dynamic and social media content, optimizing storage solutions, and addressing limitations of current access interfaces. Opportunities exist in focusing collection efforts on underrepresented regions, leveraging existing archived data, and developing innovative services and tools to support researchers.
4. CCSP Project
• An internal IBM support portal that provides client-facing
audiences a by-client, holistic view of client situations
• Technologies: WebSphere Portal, DB2, deployed on
zLinux machines
5. Responsibilities
• Software Engineer
• Enterprise Applications with J2EE platform technologies for
frontend (Servlets, JSP, Portlet APIs), and backend tasks based on
EJB
• Front-end components based on Web 2.0 technologies (AJAX
based on Dojo 1.0, and JavaScript)
• Lotus Sametime (Plugins and Bot development)
• Software engineer team leader
• Support project quality activities
• Lead code review and static analysis activities
6. Responsibilities
• Administrator
• Deploying Portal solutions on WebSphere Portal
• WebSphere Portal Administration for standalone and clustered
environment
• Administration on Linux and Windows OS
• DB2 server administration for single instance and multiple
instances with HADR support
• Customer support team lead
• Leading customer support activities
10. Memento
• Memento is an HTTP extension to integrate the Past and the Current Web
I. Jacobs and N. Walsh, Architecture of the World Wide Web, Technical report, W3C, 2004. http://www.w3.org/TR/webarch/
Timeline labels: Now, T1, T2, T3
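The slide describes Memento as an HTTP extension linking the current Web to its past versions. As a rough sketch of what that means on the wire: a client sends an Accept-Datetime request header (defined in RFC 7089) and the archive answers with the capture closest to that time. The capture table and the archive.example.org URIs below are hypothetical, and the selection logic is only an illustration of "closest in time", not any archive's actual implementation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build the request header a Memento client sends to a TimeGate."""
    # RFC 7089 expects an RFC 1123 style date expressed in GMT.
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def closest_memento(mementos: dict, when: datetime) -> str:
    """Return the memento URI whose capture time is nearest to `when`.

    `mementos` maps capture datetimes to memento URIs (hypothetical data).
    """
    nearest = min(mementos, key=lambda captured: abs(captured - when))
    return mementos[nearest]


# Hypothetical captures T1, T2, T3 of one original resource.
captures = {
    datetime(2009, 6, 1, tzinfo=timezone.utc): "http://archive.example.org/t1/page",
    datetime(2010, 2, 12, tzinfo=timezone.utc): "http://archive.example.org/t2/page",
    datetime(2011, 1, 1, tzinfo=timezone.utc): "http://archive.example.org/t3/page",
}

wanted = datetime(2010, 3, 1, tzinfo=timezone.utc)
headers = accept_datetime_header(wanted)
chosen = closest_memento(captures, wanted)
```

In a real deployment the TimeGate performs this negotiation on the server side; the client only sends the header and follows the redirect.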
16. Web Archive Life Cycle
Hockx-Yu, H., 2011. The Past Issue of the Web. In Proceedings of the 3rd International Conference on Web Science, pp. 1–8.
17. Selection
• Decide what to capture
Everything, any domain
National domains
Delegate selection to partners
Users’ favorites
• We studied what is already captured
18. How Much Of The Web Is Archived?
S. G. Ainsworth, A. AlSum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson
In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL '11, Ottawa, Canada, 2011
See also: http://arxiv.org/abs/1212.6177
19. Archive categories
We have 3 categories of archives
• Internet Archive (classic interface)
• Search engine
• Other archives
Selection
Public archives (UK, US), ca. late 2010 / early 2011
20. 1000 URIs Ordered by First Observation Date
Selection
See also: http://ws-dl.blogspot.com/2011/06/2011-06-23-how-much-of-web-is-archived.html
22. How Much of the Web is Archived?
It Depends on Which Web…
Selection
Including SE cache   Excluding SE cache   2013
90%                  79%                  95%
97%                  68%                  92%
88%                  19%                  23%
35%                  16%                  26%
Changes since 2011: no more free SE APIs; greatly reduced IA quarantine period; 15 public web archives
23. Profiling Web Archive Coverage For Top-level Domain And Content Language
A. AlSum, M. C. Weigle, M. L. Nelson, and H. Van de Sompel
In Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries, TPDL 2013, 2013
See also: http://arxiv.org/abs/1309.4008
24. Where is it archived?
Selection
IA = Internet Archive; CAN = Library and Archives Canada; PO = Portuguese Web Archive; CZ = Archive of the Czech Web
LoC = Library of Congress; BL = British Library; CAT = Web Archive of Catalonia; TW = National Taiwan University
IC = Icelandic Web Archive; UK = UK National Library; CR = Croatian Web Archive; AIT = Archive-It
25. Language Coverage
Selection
26. Growth Rate
Selection
• Borrowed Portuguese material from IA
• Stopped archiving since 2008
• Steady growth
• Stopped getting new URIs, but still crawling
27. Selection Research Output
• Some portions of the web, such as India and Africa, are not well archived.
• Profiling helps us with Memento query routing.
• IIPC proposal with Herbert Van de Sompel (LANL) and David Rosenthal (SUL).
Selection
28. Selection at SUL
• Focus on the missing parts of the Web
• Twitter - Crowdsource:
• UK Web Archive: Twittervana
• Internet Memory: Collect URIs from Twitter APIs
• VA Tech: CTRNET project
• Stanford Community
• World News collection: 10 news websites from each country
• Tools:
Selection
29. Web Archive Life Cycle
Hockx-Yu, H., 2011. The Past Issue of the Web. In Proceedings of the 3rd International Conference on Web Science, pp. 1–8.
30. Harvesting
• Services
• Archive-It
• WAS @ CDLib
• Dedicated servers
• New tools
See also: http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
31. Special Harvesting Techniques
• Borrow old materials from other web archives
• Ex.: Stanford WebBase Project*
• 260 TB
• 7 Billion webpages
Harvesting
*http://www-diglib.stanford.edu/~testbed/doc2/WebBase/
32. Special Harvesting Techniques
• Social Media
• Focus on shared resources in the social media
Harvesting
Hany M. SalahEldeen, Michael L. Nelson, Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?, Proceedings of TPDL 2012
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
33. Special Harvesting Techniques
• SiteStory - Transactional Archive
Harvesting
Justin F. Brunelle, Michael L. Nelson, Lyudmila Balakireva, Robert Sanderson, Herbert Van de Sompel, Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool, Proceedings of TPDL 2013
Sitestory: http://mementoweb.github.io/SiteStory/
34. Harvesting
• Challenges
• Ajax and Web 2.0/3.0
• Streaming Media
• URI challenges
• Mobile
Harvesting
http://blog.dshr.org/2012/05/harvesting-and-preserving-future-web.html
http://netpreserve.org/sites/default/files/resources/OverviewFutureWebWorkshop.pdf
35. Web Archive Life Cycle
Hockx-Yu, H., 2011. The Past Issue of the Web. In Proceedings of the 3rd International Conference on Web Science, pp. 1–8.
36. Storage (Format)
• Flat files:
• WARC files (ISO standard)
• No-SQL db:
• HBase at Internet Memory*
• Storage at SUL:
• We need to use both
Storage
*Philippe Rigaux, Understanding HBase - The data model, IM technology blog
http://internetmemory.org/en/index.php/synapse/understanding_the_hbase_data_model/
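To make the flat-file option concrete, here is a minimal sketch of how a single WARC record is laid out: a version line, named header fields, a blank line, the payload, and two trailing CRLFs. The function name and the choice of a "resource" record are assumptions made for illustration; production crawlers write more record types and more header fields than shown here.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes, captured: datetime,
                      content_type: str = "text/html") -> bytes:
    """Serialize one WARC 'resource' record (illustrative subset of fields)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        # WARC-Date is a UTC timestamp in ISO 8601 form.
        f"WARC-Date: {captured.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; an empty line separates headers from payload.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs.
    return header_block + payload + b"\r\n\r\n"


record = build_warc_record(
    "http://www.vancouver2010.com/",
    b"<html></html>",
    datetime(2010, 2, 12, tzinfo=timezone.utc),
)
```

Because records are simply concatenated, a WARC file can be appended to cheaply, while random access by URI and time needs a separate index, which is one reason a database layer is attractive alongside the flat files.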
41. Accessing Web Archive
• Thumbnail View
• Trade-off between building the thumbnail in real time or pre-building
• Also, trade-off between representing the thumbnail by URI or by embedded binary data
• Can we build a partial thumbnail map?
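The second trade-off above can be illustrated with a small sketch: a thumbnail can be referenced by a URI that the browser fetches separately, or embedded directly in the page as base64 binary data. The thumbnail service address thumbs.example.org is hypothetical; the data URI form is standard.

```python
import base64
from urllib.parse import quote


def thumbnail_as_link(memento_uri: str,
                      service: str = "http://thumbs.example.org/render?uri=") -> str:
    """Reference the thumbnail by URI (hypothetical service); one extra request per image."""
    return service + quote(memento_uri, safe="")


def thumbnail_as_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Embed the thumbnail bytes in the page; no extra request, but a larger page."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


link = thumbnail_as_link("http://example.org/")
embedded = thumbnail_as_data_uri(b"abc")  # placeholder bytes, not a real image
```

Base64 inflates the payload by roughly one third, so embedding tends to pay off only for small thumbnails or for views that show few mementos at once.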
42. Accessing Web Archive
• Title View
• Trade-off between extracting all the titles in advance and keeping them as metadata about each memento, and extracting the title from the HTML content in real time
Implemented using Simile: http://www.simile-widgets.org/timeline/
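A minimal sketch of the real-time option, assuming the memento HTML has already been fetched: parse each capture and pull out its title element. The timestamps and HTML strings are made-up sample data; the pre-extraction option would run the same function once at ingest time and store the result as metadata.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace and line breaks inside the title.
            self.title = " ".join("".join(self._parts).split())


def extract_title(html: str):
    parser = TitleParser()
    parser.feed(html)
    return parser.title


def titles_through_time(mementos: dict) -> dict:
    """Map each capture timestamp to the title found in its HTML."""
    return {timestamp: extract_title(html) for timestamp, html in mementos.items()}


sample = {
    "20090601": "<html><head><title>Vancouver 2010</title></head></html>",
    "20100212": "<html><head><title>\n  Winter Olympics\n</title></head></html>",
}
titles = titles_through_time(sample)
```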
43. Accessing Web Archive
• Wayback Machine API
• XML interface for the list of available Mementos
44. Accessing Web Archive
• Web Page Snapshot Replay
• URI rewriting, JavaScript, and embedded resources
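URI rewriting during replay can be sketched as follows: every link and embedded resource in the archived HTML is pointed back into the archive at the same capture time, so the browser does not leak to the live Web. The archive prefix archive.example.org/web is hypothetical, and the regular expression only covers quoted href and src attributes; URIs built by JavaScript at run time are not caught, which is the hard part the slide alludes to.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches quoted href/src attributes; a deliberately simple pattern for illustration.
ATTR = re.compile(r'(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["\'])(?P<url>.*?)(?P=q)',
                  re.IGNORECASE)


def rewrite_links(html: str, page_uri: str, timestamp: str,
                  prefix: str = "http://archive.example.org/web") -> str:
    """Point links and embedded resources at archived copies (hypothetical prefix)."""

    def replace(match):
        # Resolve relative references against the original page URI first.
        absolute = urljoin(page_uri, match.group("url"))
        if urlparse(absolute).scheme not in ("http", "https"):
            return match.group(0)  # leave mailto:, javascript:, etc. untouched
        quote_char = match.group("q")
        return f'{match.group("attr")}{quote_char}{prefix}/{timestamp}/{absolute}{quote_char}'

    return ATTR.sub(replace, html)


rewritten = rewrite_links('<img src="/logo.png">',
                          "http://www.vancouver2010.com/", "20100212000000")
```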
45. Accessing Web Archive
• Page Completeness Degree
• The completeness degree could be calculated in real time by using the preserved HTTP status of the embedded resources
See also: http://arxiv.org/abs/1309.5503
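As a minimal sketch of the idea, assume the archive keeps the HTTP status observed for each embedded resource of a memento. The completeness degree is then the share of embedded resources that were captured successfully. Treating every resource as equally important is an assumption of this sketch; a weighted measure could count a missing stylesheet more heavily than a missing tracking pixel.

```python
def completeness_degree(embedded_statuses: dict) -> float:
    """Fraction of embedded resources preserved with a 2xx HTTP status.

    `embedded_statuses` maps each embedded resource URI to its preserved status code.
    A page with no embedded resources is treated as complete.
    """
    if not embedded_statuses:
        return 1.0
    preserved = sum(1 for status in embedded_statuses.values() if 200 <= status < 300)
    return preserved / len(embedded_statuses)


# Hypothetical memento with four embedded resources, one of them missing.
degree = completeness_degree({
    "http://example.org/style.css": 200,
    "http://example.org/logo.png": 200,
    "http://example.org/banner.jpg": 404,
    "http://example.org/app.js": 200,
})
```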
46. Accessing Web Archive
• Reconstructing web site
• Current approach is using the web archive public interface.
47. Accessing Web Archive
• Wayback Annotator
• Create collections
• Select and save relevant content to their collections
• Annotate & mark important parts of archived web pages
• Share their work and collaborate on archived content use
http://netpreserve.org/sites/default/files/resources/Predstavitev_07.pdf
http://netpreserve.org/sites/default/files/resources/Wayback_annotator_06.pdf
48. Accessing Web Archive
Collection-Based
• In addition to browsing the collection, you can browse the URIs in this collection
• Research questions: Collection overview
49. Accessing Web Archive
• Collection visualization
• Term frequency algorithms should be normalized to take the density of mementos into consideration
http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
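One way to read the normalization point: a period with many mementos will show inflated raw term counts simply because more pages were captured. A minimal sketch, assuming mementos are grouped by period and tokenized on whitespace, divides each term count by the number of mementos in that period. The grouping and the sample texts are hypothetical.

```python
from collections import Counter


def normalized_term_frequency(mementos_by_period: dict) -> dict:
    """Per-period term counts divided by the number of mementos in that period."""
    result = {}
    for period, texts in mementos_by_period.items():
        counts = Counter(word for text in texts for word in text.lower().split())
        density = len(texts)
        result[period] = {term: count / density for term, count in counts.items()} if density else {}
    return result


# 2010 has four times as many captures as 2009, so raw counts alone would mislead.
frequencies = normalized_term_frequency({
    "2009": ["olympics vancouver"],
    "2010": ["olympics", "olympics tickets", "olympics", "olympics medals"],
})
```

With raw counts, "olympics" appears four times as often in 2010 as in 2009; after normalization both periods show the same per-memento frequency.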
50. Accessing Web Archive
• Web Archive analytics
• ArcSpread took a query from the user, extracted related information, and displayed the results in spreadsheet style.
See also: http://ilpubs.stanford.edu:8090/1037/1/arcspread.pdf
51. Who And What Links To The Internet Archive
Y. Alnoamany, A. AlSum, M. C. Weigle, M. L. Nelson
In Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries, TPDL 2013, 2013 (Best Student Paper)
See also: http://arxiv.org/abs/1309.4016
52. Serving Robots!
• Log file analysis using Apache Pig
• Robots outnumber humans in accesses to the IA Wayback Machine:
• 10:1 in terms of sessions
• 5:4 in terms of raw HTTP accesses
• 4:1 in terms of megabytes transferred
Access
53. Where do Wayback Machine Users
Come From?
Website                     Percentage  Description
en.wikipedia.org            12.9%       Wikipedia
archive.org                 11.9%       IA Home Page
reddit.com                  10.2%       Social News Web Site
google.TLD                  9.9%        Search Engine
info-poland.buffalo.edu     1.5%        Polish Studies
de.wikipedia.org            1.4%        Wikipedia
cracked.com                 1.2%        Humor Site
snopes.com                  1.1%        Urban Legends Reference Pages
facebook.com                0.9%        Social Media
crochetpatterncentral.com   0.9%        Crocheting Hobbies
Access
55. ArcLink: Optimization Techniques To Build And Retrieve The Temporal Web Graph
A. AlSum, M. L. Nelson
IIPC GA 2013, Ljubljana, Slovenia
In Proceedings of the 13th International ACM/IEEE Joint Conference on Digital Libraries, JCDL '13, 2013
See also: http://arxiv.org/abs/1305.5959
57. Solved Questions, but hard
Q: What are the HTML titles for vancouver2010.com through time?
A: Page scraping for all mementos
Access
58. Impossible Questions
Q: What are the anchor texts that pointed to www.vancouver2010.com through time?
Access
…
<a href="http://www.vancouver2010.com">Vancouver Olympics</a>
…
…
<a href="http://www.vancouver2010.com">Winter Olympics</a>
…
…
<a href="http://www.vancouver2010.com">Vancouver 2010</a>
…
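Extracting anchor text from a page is simple once the linking page's HTML is in hand; what makes the question hard through a public archive interface is finding every archived page that links to the target in the first place. The sketch below covers only the easy half, under the assumption that the linking mementos have already been gathered. The timestamps and HTML fragments are made-up sample data.

```python
from html.parser import HTMLParser


def _normalize(uri: str) -> str:
    """Compare URIs without scheme, case, or trailing slash."""
    uri = uri.strip().lower()
    for scheme in ("http://", "https://"):
        if uri.startswith(scheme):
            uri = uri[len(scheme):]
    return uri.rstrip("/")


class AnchorTextParser(HTMLParser):
    """Collect the text of every <a> element whose href points at the target."""

    def __init__(self, target: str):
        super().__init__()
        self._target = _normalize(target)
        self._parts = None  # list while inside a matching <a>, otherwise None
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and _normalize(dict(attrs).get("href") or "") == self._target:
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._parts is not None:
            self.texts.append(" ".join("".join(self._parts).split()))
            self._parts = None


def anchor_texts_through_time(linking_mementos: dict, target: str) -> dict:
    """Map each capture timestamp of a linking page to the anchor texts it used."""
    result = {}
    for timestamp, html in linking_mementos.items():
        parser = AnchorTextParser(target)
        parser.feed(html)
        result[timestamp] = parser.texts
    return result


anchors = anchor_texts_through_time(
    {
        "2008": "<p><a href=www.vancouver2010.com >\nVancouver Olympics\n</a></p>",
        "2009": '<p><a href="http://www.vancouver2010.com/">Winter Olympics</a></p>',
        "2010": '<p><a href="http://example.org/">Other</a></p>',
    },
    "www.vancouver2010.com",
)
```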
63. Thumbnail Creation Challenges
• Scalability in Time
• IA may need 361 years to create one thumbnail per memento using one hundred machines
• Scalability in Space
• IA will need 355 TB to store one thumbnail per memento
• Page quality
Access
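The time estimate above can be turned into a quick sizing calculation. Assuming thumbnail generation parallelizes perfectly (an optimistic assumption, since storage and network contention are ignored), 361 years on one hundred machines is 36,100 machine-years, and the machine count needed for any deadline follows directly.

```python
import math


def machines_needed(years_at_baseline: float, baseline_machines: int,
                    target_years: float) -> int:
    """Machines required to finish within `target_years`, assuming perfect parallelism."""
    machine_years = years_at_baseline * baseline_machines
    return math.ceil(machine_years / target_years)


one_year = machines_needed(361, 100, 1)    # finish in one year
ten_years = machines_needed(361, 100, 10)  # finish in ten years
```

Even a ten-year horizon would call for thousands of machines, which is why the earlier question about building only a partial thumbnail map matters.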
70. Community
• I suggest becoming a member of IIPC
• Join the open Wayback Machine team
• Join the Winter Olympics 2014 collaborative project, even as an observer
71. Community
• Web Archiving Workshops
WAC 2011, Ottawa, Canada
WAC 2012, Stanford, CA, USA
WADL 2013, Indianapolis, IN, USA
TempWeb 2013, Rio de Janeiro, Brazil
72. Tools to SUL Web Archive
• Selection
• Harvest
• Analysis
• Access
73. Conclusions
• Be Selective: Cover missing parts of the Web
• Be Older: Include WebBase
• Be Smart: Innovative services
• Be Helpful: Researcher Framework/Dataset
• Be Active: Participate in the WA communities
• Make a difference
aalsum@cs.odu.edu
@aalsum
75. What is missing?
IA = Internet Archive; CAN = Library and Archives Canada; PO = Portuguese Web Archive; CZ = Archive of the Czech Web
LoC = Library of Congress; BL = British Library; CAT = Web Archive of Catalonia; TW = National Taiwan University
IC = Icelandic Web Archive; UK = UK National Library; CR = Croatian Web Archive; AIT = Archive-It