SCAPE Information Day at BL - Characterising content in web archives with Nanite

This presentation was given by Will Palmer at ‘SCAPE Information Day at the British Library’, on 14 July 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants. In this presentation, Will Palmer introduced the SCAPE-developed tool Nanite, which can help institutions analyze their web archive data.

Characterising content in web archives with Nanite
William Palmer, SCAPE Information Day, British Library, UK, 14th July 2014

Web Archives
•When web sites are harvested, they are stored in a container format
•The main web archive container formats are ARC and WARC (an ISO standard, ISO 28500)
•They are effectively analogous to a zip file: one container holds many harvested records
Figure: WARC container
This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
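To make the zip-file analogy concrete, here is a minimal Python sketch of reading records out of a WARC container. It assumes WARC/1.0-style records (a version line, header fields, a blank line, then a content block of exactly Content-Length bytes) and ignores the older ARC format. The function names are hypothetical and are not part of Nanite or warc-hadoop-recordreaders.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content_block) for each record in an uncompressed WARC stream.

    Header names are lower-cased because WARC field names are case-insensitive.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines: the CRLF CRLF that terminates the previous record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {"warc-version": line.strip().decode("ascii")}
        while True:
            field = stream.readline()
            if not field.strip():
                break  # an empty line ends the header block
            name, _, value = field.decode("utf-8", errors="replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The content block (e.g. a full HTTP response) is exactly Content-Length bytes.
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def open_warc(path: str) -> BinaryIO:
    """Open a .warc or .warc.gz file; the gzip module transparently reads the
    concatenated per-record gzip members that compressed WARC files usually use."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Typical use would be `with open_warc(path) as f:` followed by a loop over `iter_warc_records(f)`. For response records, the block starts with the HTTP headers, which is where the server-reported Content-Type lives.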

Characterisation
•Web archives can hold billions of individual records
•To answer deeper questions, you first have to determine what data is held
•Not the same as a homogeneous collection of images
•Can contain everything and anything:
  •Correctly formed files
  •Malformed files
  •Viruses
  •Unknown files?
  •You name it
Figure: unidentified files (shown as ?) alongside identified types JPG, GIF, TXT and XLS
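The simplest form of identification looks at leading "magic" bytes. The Python sketch below is illustrative only: DROID, Apache Tika and libmagic use much larger signature databases (including patterns at other offsets), and the short signature list and MIME labels here are assumptions chosen to mirror the JPG/GIF/TXT/XLS examples.

```python
import codecs
from typing import Optional

# A handful of well-known leading byte signatures ("magic numbers").
# Illustrative only; real identification tools use far larger databases.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    # OLE2 compound document: shared by legacy XLS, DOC, PPT and others.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
]


def looks_like_text(data: bytes, sample_size: int = 512) -> bool:
    """Heuristic: no NUL bytes and valid UTF-8 (or ASCII) in the first sample_size bytes."""
    sample = data[:sample_size]
    if not sample or b"\x00" in sample:
        return False
    # An incremental decoder with final=False tolerates a multi-byte
    # character cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def identify(data: bytes) -> Optional[str]:
    """Return a MIME type for data, or None when it cannot be identified."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    if looks_like_text(data):
        return "text/plain"
    return None  # the "?" case: unknown or unidentifiable content
```

The OLE2 entry shows why leading bytes are not enough: an XLS and a DOC share the same signature, and telling them apart means looking inside the container, which DROID does with its container signatures.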

Nanite
•Nanite is formed of two main modules:
  •nanite-core: a Java API for the UK National Archives’ DROID
  •nanite-hadoop: WARC content characterisation using Hadoop
    •Identification with Apache Tika (Detector), nanite-core and libmagic-jni (a JNI binding to libmagic, the library behind ‘file’)
    •Optionally runs Tika parsers; output is written to Hadoop sequence files
    •Also lists the server-reported content type and the file extension
    •Reuses warc-hadoop-recordreaders (partially developed within SCAPE)
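The per-record work described above can be pictured as a map step that runs several identifiers on each file and also records the server-reported type and the extension. This is a Python sketch of the idea only, not Nanite's Java API: the detector functions are hypothetical stand-ins for Tika, DROID and libmagic, and the ((source, value), 1) key layout is an assumption chosen so that a reducer can simply sum counts.

```python
import posixpath
from typing import Callable, Dict, Iterator, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical stand-in for a detector such as Tika, DROID or libmagic:
# takes the record payload and returns a MIME type, or None.
Detector = Callable[[bytes], Optional[str]]
KeyValue = Tuple[Tuple[str, str], int]


def extension_of(uri: str) -> str:
    """Lower-cased extension of the URI path (query string ignored), or ''."""
    suffix = posixpath.splitext(urlparse(uri).path)[1]
    return suffix[1:].lower()


def normalise(mime: Optional[str]) -> str:
    """Drop parameters such as '; charset=UTF-8' and lower-case the type."""
    if not mime:
        return "unknown"
    return mime.split(";", 1)[0].strip().lower() or "unknown"


def map_record(uri: str, server_type: Optional[str], payload: bytes,
               detectors: Dict[str, Detector]) -> Iterator[KeyValue]:
    """Map step: emit ((source, value), 1) pairs for one archived file."""
    yield ("server", normalise(server_type)), 1
    yield ("extension", extension_of(uri) or "none"), 1
    for name, detect in detectors.items():
        yield (name, normalise(detect(payload))), 1
```

In a real Hadoop job these pairs would be shuffled by key and summed by a reducer. Normalising away MIME parameters is a design choice of this sketch; counting the raw server strings would give different totals.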

Speed
•Fast: MIME types for 1 TB (14k WARCs, 93m files) detected in 17 hours
•Nanite has also been used at the Danish State and University Library:
  •7.3 TB of data, 80k ARC files, 261m files
  •Identification using DROID and Tika
  •Characterisation using Tika
  •…in 32 hours
•On the same platform, but using FITS (not using Hadoop, but parallelised):
  •12 TB of data, 100k ARC files, 400m files
  •An entire year of processing (8760 hours)
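The figures on this slide are easier to compare as rates. The Python sketch below assumes decimal terabytes (10^12 bytes) and that the quoted hours are wall-clock time for each whole run; the Run class is purely illustrative.

```python
from dataclasses import dataclass
from typing import List

SECONDS_PER_HOUR = 3600


@dataclass
class Run:
    label: str
    terabytes: float   # decimal terabytes (10**12 bytes), an assumption
    files: int
    hours: float       # wall-clock hours as quoted on the slide

    def files_per_second(self) -> float:
        return self.files / (self.hours * SECONDS_PER_HOUR)

    def megabytes_per_second(self) -> float:
        # 1 decimal TB = 1,000,000 decimal MB
        return self.terabytes * 1_000_000 / (self.hours * SECONDS_PER_HOUR)


RUNS: List[Run] = [
    Run("Nanite (Hadoop), 1 TB", 1.0, 93_000_000, 17),
    Run("Nanite (Hadoop), State and University Library", 7.3, 261_000_000, 32),
    Run("FITS (parallelised, no Hadoop)", 12.0, 400_000_000, 8760),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.label}: {run.files_per_second():,.1f} files/s, "
              f"{run.megabytes_per_second():.1f} MB/s")
```

This works out to roughly 1,520, 2,266 and 12.7 files per second respectively. The runs covered different collections and different toolchains, so the comparison is indicative rather than a like-for-like benchmark.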
Figure: map stage: Tika Identify, Nanite/DROID, libmagic, Tika Parser

Stats
•1370 different MIME types reported by the original servers
•Tika detected 342 distinct MIME types
•DROID detected 319
•Additional information in this blog post: http://www.openplanetsfoundation.org/blogs/2014-05-28-weekend-nanite
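Counting distinct MIME types per source is a simple aggregation over (source, MIME type) pairs. The Python sketch below assumes such pairs are already available. Its optional normalisation (dropping parameters and case) illustrates one possible contributor to the much larger number of server-reported types; this is an assumption for illustration, not a finding stated in the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def distinct_types(observations: Iterable[Tuple[str, str]],
                   normalise: bool = False) -> Dict[str, int]:
    """Count distinct MIME type strings per source (e.g. server, tika, droid)."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for source, mime in observations:
        if normalise:
            # Treat 'TEXT/HTML' and 'text/html; charset=UTF-8' as 'text/html'.
            mime = mime.split(";", 1)[0].strip().lower()
        seen[source].add(mime)
    return {source: len(types) for source, types in seen.items()}
```

For example, the server strings text/html, TEXT/HTML and text/html; charset=UTF-8 count as three types raw but as one after normalisation.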

Visualising Characterisation Information
•Nanite can optionally write C3PO-compatible output
•This output can be loaded directly into C3PO for visualisation

Breaking the Kubernetes Kill Chain: Host Path Mount

Puma Security, LLC

With more memory available, system performance of three Dell devices increased, which can translate to a better user experience Conclusion When your system has plenty of RAM to meet your needs, you can efficiently access the applications and data you need to finish projects and to-do lists without sacrificing time and focus. Our test results show that with more memory available, three Dell PCs delivered better performance and took less time to complete the Procyon Office Productivity benchmark. These advantages translate to users being able to complete workflows more quickly and multitask more easily. Whether you need the mobility of the Latitude 5440, the creative capabilities of the Precision 3470, or the high performance of the OptiPlex Tower Plus 7010, configuring your system with more RAM can help keep processes running smoothly, enabling you to do more without compromising performance.

Boost PC performance: How more available memory can improve productivity

Principled Technologies

Boost Fertility New Invention Ups Success Rates.pdf

sudhanshuwaghmare1

What are drone anti-jamming systems? The drone anti-jamming systems and anti-spoof technology protect against interference, jamming, and spoofing of the UAVs. To protect their security, countries are beginning to research drone anti-jamming systems, also known as drone strike weapons. The anti-jam and anti-spoof technology protects against interference, jamming and spoofing. A drone strike weapon is a drone attack weapon that can attack and destroy enemy drones. So what is so unique about this amazing system?

What Are The Drone Anti-jamming Systems Technology?

Antenna Manufacturer Coco

Discord is a free app offering voice, video, and text chat functionalities, primarily catering to the gaming community. It serves as a hub for users to create and join servers tailored to their interests. Discord’s ecosystem comprises servers, each functioning as a distinct online community with its own channels dedicated to specific topics or activities. Users can engage in text-based discussions, voice calls, or video chats within these channels. Understanding Discord Servers Discord servers are virtual spaces where users congregate to interact, share content, and build communities. Servers may revolve around gaming, hobbies, interests, or fandoms, providing a platform for like-minded individuals to connect. Communication Features Discord offers a range of communication tools, including text channels for messaging, voice channels for real-time audio conversations, and video channels for face-to-face interactions. These features facilitate seamless communication and collaboration. What Does NSFW Mean? The acronym NSFW stands for “Not Safe For Work,” indicating content that may be inappropriate for professional or public settings. NSFW Content NSFW content encompasses material that is sexually explicit, violent, or otherwise graphic in nature. It often includes nudity, profanity, or depictions of sensitive topics.

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

UK Journal

Axa Assurance Maroc - Insurer Innovation Award 2024

The Digital Insurer

Created by Mozilla Research in 2012 and now part of Linux Foundation Europe, the Servo project is an experimental rendering engine written in Rust. It combines memory safety and concurrency to create an independent, modular, and embeddable rendering engine that adheres to web standards. Stewardship of Servo moved from Mozilla Research to the Linux Foundation in 2020, where its mission remains unchanged. After some slow years, in 2023 there has been renewed activity on the project, with a roadmap now focused on improving the engine’s CSS 2 conformance, exploring Android support, and making Servo a practical embeddable rendering engine. In this presentation, Rakhi Sharma reviews the status of the project, our recent developments in 2023, our collaboration with Tauri to make Servo an easy-to-use embeddable rendering engine, and our plans for the future to make Servo an alternative web rendering engine for the embedded devices industry. (c) Embedded Open Source Summit 2024 April 16-18, 2024 Seattle, Washington (US) https://events.linuxfoundation.org/embedded-open-source-summit/ https://ossna2024.sched.com/event/1aBNF/a-year-of-servo-reboot-where-are-we-now-rakhi-sharma-igalia

A Year of the Servo Reboot: Where Are We Now?

Igalia

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

Delhi Call girls

Automating Google Workspace (GWS) & more with Apps Script

wesley chun

[2024]Digital Global Overview Report 2024 Meltwater.pdf

hans926745

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx

How to convert PDF to text with Nanonets

Advantages of Hiring UIUX Design Service Providers for Your Business

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

CNv6 Instructor Chapter 6 Quality of Service

Data Cloud, More than a CDP by Matt Robison

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Tata AIG General Insurance Company - Insurer Innovation Award 2024

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men

08448380779 Call Girls In Friends Colony Women Seeking Men

Breaking the Kubernetes Kill Chain: Host Path Mount

Boost PC performance: How more available memory can improve productivity

Boost Fertility New Invention Ups Success Rates.pdf

What Are The Drone Anti-jamming Systems Technology?

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

Axa Assurance Maroc - Insurer Innovation Award 2024

A Year of the Servo Reboot: Where Are We Now?

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

Automating Google Workspace (GWS) & more with Apps Script

[2024]Digital Global Overview Report 2024 Meltwater.pdf

SCAPE Information Day at BL - Characterising content in web archives with Nanite

1. Characterising content in web archives with Nanite
William Palmer, SCAPE Information Day, British Library, UK, 14th July 2014

2. Web Archives
•When web sites are harvested, they are stored in a container format
•The main web archive container formats are ARC and WARC (an ISO standard)
•They are effectively analogous to a zip file
[Figure: WARC Container]
This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
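To make the zip-file analogy concrete: a WARC file is a sequence of records, each made of a version line, a block of `Name: value` header lines ending in a blank line, and a content block whose size is given by the `Content-Length` header. The function below, `iter_warc_records`, is a hypothetical minimal reader written for illustration, not part of Nanite. It assumes uncompressed records and well-formed headers; real WARC files are often gzip-compressed record by record, and in a real `response` record the content block also holds the HTTP status line and headers (where the server-reported content type comes from).

```python
import io


def iter_warc_records(stream):
    """Yield (headers, content_block) for each record in an uncompressed WARC stream.

    Sketch only: assumes no gzip compression and a valid Content-Length on every record.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line.strip() == b"":
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        for raw in iter(stream.readline, b""):
            if raw in (b"\r\n", b"\n"):
                break  # a blank line ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers["Content-Length"])
        yield headers, stream.read(length)


sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
for headers, block in iter_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Type"], block)  # response b'hello'
```

Each yielded record is the unit that a characterisation tool then has to identify.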

3. Characterisation
•Web archives can hold billions of individual records
•To answer deeper questions, you have to determine what data is held
•This is not the same as a homogeneous collection of images
•A web archive can contain everything and anything:
  •Correctly formed files
  •Malformed files
  •Viruses
  •Unknown files?
  •You name it
[Figure: a mix of unidentified records (?) alongside identified formats: JPG, GIF, TXT, XLS]
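Signature ("magic number") matching is the basic idea behind libmagic and, in a much richer form, DROID's PRONOM signatures. The `identify` function and its four-entry `SIGNATURES` table below are a toy illustration, not Nanite or DROID code. They show how a record comes back either as a known type or as "unknown" (the "?" on the slide), and why a match on the leading bytes says nothing about whether the rest of the file is well formed.

```python
# Hypothetical, deliberately tiny signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]


def identify(data: bytes) -> str:
    """Guess a MIME type from leading bytes; fall back to a crude text check, else 'unknown'."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # Crude heuristic: non-empty, no NUL bytes, and valid UTF-8 is treated as plain text.
    if data and b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return "unknown"


print(identify(b"GIF89a\x01\x00"))      # image/gif
print(identify(b"\xff\xd8\xff"))        # image/jpeg, even though this "JPEG" is truncated
print(identify(b"\x00\x01\x02\xfe"))    # unknown
```

The truncated JPEG in the usage lines is the point: identification and validation are separate questions.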

4. Nanite
•Nanite is formed of two main modules:
  •nanite-core: a Java API for the UK National Archives’ DROID
  •nanite-hadoop: WARC content characterisation using Hadoop
•Identification with Apache Tika (Detector), nanite-core and libmagic-jni (‘file’)
•Optionally uses Tika (Parsers); data is output to sequence files
•Also lists the server content type and file extension
•Reuses: warc-hadoop-recordreaders (partially SCAPE)
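Alongside the detector results, Nanite lists the server content type and the file extension for each record. The plain-Python sketch below shows that map/reduce shape under stated assumptions: `map_record` and `reduce_counts` are hypothetical names, the detected type is passed in rather than computed (in Nanite it would come from Tika, DROID or libmagic), and the real tool runs on Hadoop and writes sequence files rather than returning a `Counter`.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def map_record(url: str, server_type: str, detected_type: str):
    """Map step: emit ((server type, extension, detected type), 1) for one record."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return (server_type, suffix or "(none)", detected_type), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


records = [
    ("http://example.org/logo.gif", "image/gif", "image/gif"),
    ("http://example.org/photo.jpg", "text/html", "image/jpeg"),
    ("http://example.org/", "text/html", "text/html"),
]
for key, n in reduce_counts(map_record(*r) for r in records).items():
    print(n, *key)
```

Keying on the full tuple makes disagreements visible, for example a `.jpg` URL that the server labelled `text/html` but whose bytes were detected as `image/jpeg`.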

5. Speed
•Fast: for 1TB of data (14k WARCs, 93m files), MIME types were detected in 17 hours
•Nanite has also been used at the Danish State and University Library:
  •7.3TB data, 80k ARC files, 261m files
  •Identification using DROID and Tika
  •Characterisation using Tika
  •…in 32 hours
•The same platform but using FITS (not using Hadoop, but parallelised):
  •12TB data, 100k ARC files, 400m files
  •An entire year of processing (8760 hours)
[Figure: map stage running Tika identify, Nanite/DROID, libmagic and the Tika parser]
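The speed figures are easier to compare as average throughput. The helper below only divides the slide's file counts by its wall-clock times; the labels are descriptive, and the averages ignore any differences in what each run computed per file, so treat them as rough indicators rather than benchmarks.

```python
def files_per_second(files: float, hours: float) -> float:
    """Average throughput implied by a file count and a wall-clock time in hours."""
    return files / (hours * 3600)


# File counts and durations as given on the slide.
runs = {
    "Nanite (1TB, 14k WARCs)": (93e6, 17),
    "Nanite at the Danish State and University Library": (261e6, 32),
    "FITS, parallelised without Hadoop": (400e6, 8760),
}
for label, (files, hours) in runs.items():
    print(f"{label}: {files_per_second(files, hours):,.0f} files/s")
```

This gives roughly 1,500, 2,300 and 13 files per second respectively, so on the same platform the Danish Nanite run processed files about 180 times faster on average than the FITS run.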

6. Stats
•1370 different MIME types reported by the original servers
•Tika detected 342
•DROID detected 319
•Additional information in this blog post: http://www.openplanetsfoundation.org/blogs/2014-05-28-weekend-nanite
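Figures such as "1370 server types versus 342 from Tika" come from counting distinct type strings per source. A minimal sketch, assuming each record has already been reduced to a dict of source name to reported MIME type (`distinct_types` and the source names are hypothetical, not Nanite's output schema):

```python
from collections import defaultdict


def distinct_types(rows):
    """Count distinct MIME strings per source.

    rows: iterable of dicts mapping a source name (e.g. "server", "tika", "droid")
    to the MIME type that source reported for one record.
    """
    seen = defaultdict(set)
    for row in rows:
        for source, mime in row.items():
            if mime:  # skip records where a source gave no answer
                seen[source].add(mime)
    return {source: len(types) for source, types in seen.items()}


rows = [
    {"server": "text/html; charset=UTF-8", "tika": "text/html", "droid": "text/html"},
    {"server": "text/html", "tika": "text/html", "droid": "text/html"},
    {"server": "image/jpg", "tika": "image/jpeg", "droid": "image/jpeg"},
]
print(distinct_types(rows))  # {'server': 3, 'tika': 2, 'droid': 2}
```

Because the sketch counts raw strings, any variation in how a server spells a type (for example with or without a `charset` parameter) counts as a separate type unless it is normalised first; the slides do not say whether the reported figures were normalised.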

7. Visualising Characterisation Information
•Nanite can optionally produce C3PO-compatible output
•This output can be loaded directly into C3PO for visualisation
