This document summarizes Dr. Michele Weigle's presentation on tools for managing the past web. It discusses how webpages can disappear quickly from the live web and may then be accessible only through web archives. It then describes several tools and projects from Old Dominion University's Web Science and Digital Libraries (WS-DL) group that make archived web content more accessible and useful, including tools that integrate the live and archived web, detect damage in archived pages, summarize collections of archived pages, and enable personal web archiving.
1. Tools for Managing the Past Web
Dr. Michele C. Weigle
Web Sciences and Digital Libraries (WS-DL) Group
Department of Computer Science
Old Dominion University
ODU - ECE Seminar
February 20, 2015
5. But webpages can disappear
• Average lifespan of a webpage: 50-100 days
• A year after publication, about 11% of content shared on social media will be gone.
SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
7. Why archives matter
• Malaysia Airlines Flight 17 (MH17)
• Ukrainian separatists originally took credit for downing a transport plane in that location
• Later deleted the post
• Internet Archive had archived the post before deletion
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
8. Web archiving in the news - 2015
http://www.newyorker.com/magazine/2015/01/26/cobweb
9. But Wayback is not Google
• Wayback Machine has no full-text search
– too big to be indexed
– 452 billion web pages, 9 petabytes of data
– growing at 20 TB/week
• Enter URL and pick a date
"It’s more like a phone book than like an archive."
- Jill Lepore, The New Yorker
11. How can I access the archives?
• MementoFox: http://ws-dl.blogspot.com/2010/03/2010-03-19-mementofox-add-on-released.html
• Memento for Chrome: http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
• Mink: http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
http://www.mementoweb.org
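These tools build on the Memento protocol (RFC 7089), in which a client asks a TimeGate for the capture closest to a desired date by sending an Accept-Datetime header. The following is a minimal sketch of how such a request could be prepared. The TimeGate prefix and the function name memento_request are assumptions for illustration, and the code only builds the request; it does not contact any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed aggregator TimeGate prefix; substitute the TimeGate of the archive you use.
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def memento_request(uri, when):
    """Return (timegate_url, headers) asking for the capture of `uri` nearest `when`."""
    # Accept-Datetime uses the HTTP date format and must be expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE_PREFIX + uri, headers


url, headers = memento_request(
    "http://www.cs.odu.edu/", datetime(2015, 2, 20, tzinfo=timezone.utc)
)
print(url)
print(headers["Accept-Datetime"])  # Fri, 20 Feb 2015 00:00:00 GMT
```

Sending this request with any HTTP client should, under the protocol, yield a redirect to the selected memento.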
13. ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
15. The State of Web Archiving
current: "Hooray! It's in the archive!"
vs.
future: "How well was it archived?"
20. How damaged are these mementos?
• live web: M = 0.17, D = 0.09
• missing main: M = 0.24, D = 0.41
• missing logo + navigation: M = 0.29, D = 0.36
Brunelle, Kelly, SalahEldeen, Weigle, and Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources", JCDL 2014, Best Student Paper
21. How to detect damage?
Brunelle et al., JCDL 2014
22. Good News
Although M is steady/increasing, D is decreasing
• M = percentage of embedded resources missing
• D = our damage metric
• Sampled 45,000 mementos
– one memento/year of ~1850 webpages
– webpages from Bitly URIs shared over Twitter and Archive-It collections
Brunelle et al., JCDL 2014
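A minimal sketch of why M and D can diverge: M treats every missing embedded resource equally, while a damage metric weights each resource by how much it matters to the page. The weights and the function weighted_damage below are hypothetical stand-ins for illustration only; they are not the formula from Brunelle et al.

```python
def percent_missing(resources):
    """M: fraction of embedded resources that failed to load."""
    return sum(1 for r in resources if r["missing"]) / len(resources)


def weighted_damage(resources):
    """Illustrative D: share of total importance carried by the missing resources."""
    total = sum(r["weight"] for r in resources)
    lost = sum(r["weight"] for r in resources if r["missing"])
    return lost / total


# Hypothetical memento: names and importance weights are invented for this example.
page = [
    {"name": "main.jpg", "weight": 10.0, "missing": True},
    {"name": "logo.png", "weight": 3.0, "missing": False},
    {"name": "style.css", "weight": 5.0, "missing": False},
    {"name": "tracker.gif", "weight": 0.5, "missing": False},
    {"name": "spacer.gif", "weight": 0.5, "missing": False},
]

m = percent_missing(page)   # 1 of 5 resources missing
d = weighted_damage(page)   # 10.0 of 19.0 total weight missing
print(f"M = {m:.2f}, D = {d:.2f}")  # M = 0.20, D = 0.53
```

Here only one resource in five is missing, yet it is the most important one, so the weighted score is much higher than M.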
23. Using JavaScript can result in damaged mementos
JavaScript is responsible for an increasing proportion of missing embedded resources over time.
Brunelle, Kelly, Weigle and Nelson, "The Impact of JavaScript on Archivability," International Journal of Digital Libraries (IJDL), 2015
25. Different parts of a page can be crawled at different times
Ainsworth and Nelson, "Evaluating Sliding and Sticky Target Policies by Measuring Temporal Drift in Acyclic Walks Through a Web Archive", JCDL 2013
26. ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
27. Which page did Chris Hayes mean to tweet?
Tweet on Oct 3, 2014
Likely target (captured Oct 1, 2014)
28. What you see depends on when you click
Oct 9, 2014
Oct 10, 2014
Nov 19-Dec 15, 2014
Today (Feb 2015) – now fergusonaction.com
29. Mapping Tweet Relevance
SalahEldeen and Nelson, "Reading the Correct History? Modeling Temporal Intention in Resource Sharing", JCDL 2013
30. Let the reader choose live or archived
31. ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
33. What did usps.com look like?
http://whatdiditlooklike.mementoweb.org/
• Animated GIF
• 1st memento of each year
• Submit a URL via Twitter: “#whatdiditlooklike URL”
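The "1st memento of each year" selection can be sketched as follows, assuming the captures are already available as (datetime, identifier) pairs. The function name and the sample identifiers are hypothetical; this is not the service's actual code.

```python
from datetime import datetime


def first_memento_per_year(mementos):
    """Given (capture_datetime, uri) pairs, return the earliest capture of each year."""
    first = {}
    for captured, uri in sorted(mementos):
        # Captures are visited oldest first, so the first one seen per year wins.
        first.setdefault(captured.year, (captured, uri))
    return [first[year] for year in sorted(first)]


# Hypothetical captures, deliberately out of order.
mementos = [
    (datetime(2013, 6, 1), "capture-2013-06"),
    (datetime(2012, 3, 9), "capture-2012-03"),
    (datetime(2013, 1, 5), "capture-2013-01"),
    (datetime(2012, 11, 30), "capture-2012-11"),
]

frames = first_memento_per_year(mementos)
print([uri for _, uri in frames])  # ['capture-2012-03', 'capture-2013-01']
```

Each selected capture would then become one frame of the animated GIF.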
34. Which tells you more about the past of www.apple.com?
• 700 thumbnails (not even all of them!)
• 32 sampled thumbnails
AlSum and Nelson, "Thumbnail Summarization Techniques for Web Archives", ECIR 2014
35. TimeMap Thumbnail Summaries
• Compare HTML, not images
• Compute SimHash of HTML
– result is a string representing the content of the page
• Calculate Hamming distance between SimHashes of consecutive mementos
• Generate thumbnails of mementos that have at least a 4 character difference in SimHash
– threshold too low -> near duplicate images
– threshold too high -> miss important changes
3 lines of difference
AlSum and Nelson, "Thumbnail Summarization Techniques for Web Archives", ECIR 2014
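The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the tokenizer, the use of MD5 as the per-token hash, the 64-bit fingerprint rendered as 16 hex characters, and always keeping the first memento are all choices made for this example.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a SimHash fingerprint of `text` as a fixed-width hex string."""
    counts = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    value = sum(1 << i for i in range(bits) if counts[i] > 0)
    return f"{value:0{bits // 4}x}"


def hamming(a, b):
    """Number of character positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_for_thumbnails(html_versions, threshold=4):
    """Return indexes of mementos to thumbnail: the first one, plus each one whose
    SimHash differs from its predecessor in at least `threshold` characters."""
    hashes = [simhash(html) for html in html_versions]
    keep = [0] if hashes else []
    for i in range(1, len(hashes)):
        if hamming(hashes[i - 1], hashes[i]) >= threshold:
            keep.append(i)
    return keep


# Hypothetical mementos: the second is identical to the first, the third is a redesign.
versions = [
    "<html><body><h1>Welcome</h1><p>Ship a package today</p></body></html>",
    "<html><body><h1>Welcome</h1><p>Ship a package today</p></body></html>",
    "<html><body><nav>Track Stamps Stores</nav><h2>Holiday rates</h2></body></html>",
]
print(select_for_thumbnails(versions))
```

Identical consecutive mementos always have distance 0 and are skipped; whether a changed page crosses the threshold depends on how much of its content differs.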
39. ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
40. Archive What I See Now
• Humanities researchers know they should archive web resources
• Standard web archiving tools are difficult for non-IT experts
"Archive What I See Now", NEH Digital Humanities Implementation Grant, 2014-2017, http://bit.ly/odu-dhig-2014
41. Why not just take a screenshot or “save as”?
• Can't interact with a screenshot
• "Save Page As..." output is difficult to keep organized, especially with multiple captures over time
42. What about archiving pages behind authentication or that change quickly?
• Facebook - requires login
• Twitter - changes faster than typical crawling rate
43. How we're addressing the problem
WARCreate
• Google Chrome extension
• Archive the current state of the page in standard Web Archive (WARC) format
• Compatible with Wayback
Kelly and Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage", JCDL 2012
Kelly, Weigle, and Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session
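To give a sense of what a WARC file contains, here is a minimal sketch that serializes a single WARC/1.0 "response" record. It is not WARCreate's code: the function name and sample response are hypothetical, and a complete WARC file normally also includes a warcinfo record and request records, which are omitted here.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri, http_response, captured):
    """Serialize one WARC/1.0 'response' record wrapping raw HTTP response bytes."""
    warc_date = captured.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # A blank line separates the WARC headers from the payload block,
    # and two CRLFs terminate the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + http_response + b"\r\n\r\n"


# Hypothetical captured response for illustration.
body = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
record = warc_response_record(
    "http://example.com/", body, datetime(2015, 2, 20, tzinfo=timezone.utc)
)
print(record.decode("utf-8"))
```

Because the record stores the HTTP response as received, a replay tool such as Wayback can serve the page back with its original headers.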
44. WARCreate - Work in Progress
• New modes of operation
– record mode
• while activated, add capture of each page visited to the WARC
– countdown mode
• every interval, refresh and add new capture of page
– event mode
• add new capture of page every time it dynamically reloads or refreshes
45. What to do with created WARCs?
WAIL
• Load created WARCs into a Wayback instance on your local computer
• Single-click install of Wayback (and other archiving tools)
• Available for Windows, OS X
Kelly, Weigle, and Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session
Kelly, Nelson, and Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013
46. Bridging the gap between the past web and the live web
Mink
• Google Chrome extension
• For each page you visit, displays the number of archived versions available
• Provides access by date
• Allows for submission to public archiving services
Kelly, Nelson, and Weigle, "Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento," poster, ACM/IEEE Digital Libraries (DL), September 2014.
48. ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
53. Storytelling For Archives
Archived collections
Storytelling services
Archived enriched stories
AlNoamany, "Using Web Archives to Enrich the Live Web Experience Through Storytelling", TCDL Bulletin, December 2013.
54. Tools for Storytelling
• Tools for Users
– use existing tools like Storify to view the stories of a collection
• Tools for Curators
– use existing stories to augment your collections
– create stories from your collections
• candidate mementos automatically selected
55. Story Types
• Fixed Page – Fixed Time (same URI, same time): differences in GeoIP, mobile, etc.
• Fixed Page – Sliding Time (same URI, different time): evolution of a single page (or domain) through time
• Sliding Page – Fixed Time (different URI, same time): different perspectives on a point in time
• Sliding Page – Sliding Time (different URI, different time): broadest possible coverage of a collection
Issues: topic modeling, eliminating duplicates, maximizing novelty, structural & content quality
56. ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
57. Web Sciences and Digital Libraries Group (WS-DL)
@WebSciDL
http://ws-dl.cs.odu.edu/
http://ws-dl.blogspot.com/
Dr. Michele C. Weigle
mweigle@cs.odu.edu
@weiglemc
http://www.cs.odu.edu/~mweigle/
Faculty
• Dr. Michael L. Nelson
• Dr. Michele C. Weigle
PhD Students
• Scott Ainsworth
• Sawood Alam
• Lulwah Alkwai
• Yasmin AlNoamany
• Mohamed Aturban
• Justin Brunelle
• Mat Kelly
• Corren McCoy
• Shawn Jones
• Amara Naas
• Louis Nguyen
• Alexander Nwala
• Hany SalahEldeen