Jpylyzer, a validation and feature extraction tool developed in the SCAPE project

Jpylyzer is a validation and feature extraction tool for the JP2 (JPEG 2000 Part 1) still image format. It is being developed in the SCAPE Project and was presented by Johan van der Knijff at Archiving 2012 in Copenhagen.

SCAPE

Improved validation and feature
extraction for JPEG 2000 Part 1:
the jpylyzer tool
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library

IS&T, Archiving 2012, Copenhagen, 15.6.2012

Metamorfoze
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals

146 TB

Migrate TIFF to JP2 by end 2012

JP2 from JISC 1 Newspaper Collection (BL)

“Well-formed and valid”


Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg

Hardware failure may result in
corrupted images
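One common symptom of such corruption is truncation. A JPEG 2000 codestream begins with the SOC marker (0xFF4F) and ends with the EOC marker (0xFFD9), so a codestream that stops short of EOC hints that the file was cut off. The sketch below checks only these two markers on the bytes of a codestream; it is a simplified heuristic with a made-up function name, not jpylyzer's actual validation, which checks far more.

```python
SOC = b"\xff\x4f"  # SOC: start of codestream marker
EOC = b"\xff\xd9"  # EOC: end of codestream marker


def codestream_problems(codestream: bytes) -> list[str]:
    """Return problem labels (hypothetical names); an empty list means none found."""
    problems = []
    if not codestream.startswith(SOC):
        problems.append("missing SOC marker")
    if not codestream.endswith(EOC):
        # A truncated file usually loses the tail of the codestream, EOC included.
        problems.append("missing EOC marker (possibly truncated)")
    return problems


complete = SOC + bytes(16) + EOC  # toy codestream: markers around filler bytes
truncated = complete[:-5]         # simulate a file cut off before its end
print(codestream_problems(complete))   # []
print(codestream_problems(truncated))  # ['missing EOC marker (possibly truncated)']
```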


Not all encoders
produce standard
compliant images
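An example of the kind of rule involved: JPEG 2000 Part 1 requires the File Type box to carry the brand `jp2\040` (the string `"jp2 "`), a minor version of 0, and a compatibility list that includes `jp2\040`. The sketch below tests just these three rules on the box contents (the bytes after the 8-byte box header); the function and result key names are hypothetical.

```python
import struct

JP2_BRAND = b"jp2 "  # 'jp2\040', the brand required for JP2 files


def check_file_type_payload(payload: bytes) -> dict[str, bool]:
    """Check the contents of a File Type box: brand, minor version, compatibility list."""
    # Brand (4 bytes) + minor version (4 bytes) + zero or more 4-byte entries.
    if len(payload) < 8 or len(payload) % 4 != 0:
        return {"length_ok": False}
    brand = payload[0:4]
    (min_version,) = struct.unpack(">I", payload[4:8])
    compat_list = [payload[i:i + 4] for i in range(8, len(payload), 4)]
    return {
        "length_ok": True,
        "brand_ok": brand == JP2_BRAND,
        "min_version_ok": min_version == 0,
        "compat_list_ok": JP2_BRAND in compat_list,
    }


good = b"jp2 " + struct.pack(">I", 0) + b"jp2 "
odd = b"jpx " + struct.pack(">I", 0) + b"jpx "  # a non-JP2 brand
print(check_file_type_payload(good))
print(check_file_type_payload(odd))
```

A real validator would report every failed rule rather than stop at the first, which is why the sketch returns one flag per rule.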

Possible solutions

Option 1
Improve the JPEG 2000 module of JHOVE
But: no institutional support, superseded by JHOVE2 (?)
Option 2
Develop a JPEG 2000 module for JHOVE2
Not ready for operational use (yet)
Option 3
Develop a dedicated tool

Jpylyzer tool
- First prototype: December 2011
- Refactoring of original code: January 2012
- Packaging (Debian): March 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)
- Adding remaining functionality, bug fixes: April-May 2012 (current version: 1.5)

JP2 file

JPEG 2000 Signature box

File Type box

JP2 Header box (superbox)

Contiguous Codestream box 0

Contiguous Codestream box n

IPR box

XML box(es)

UUID box(es)

UUID Info box(es) (superbox)
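Each box starts with a 4-byte big-endian length (LBox) followed by a 4-byte type (TBox). A length of 1 means the real length follows in an 8-byte XLBox field, and a length of 0 means the box runs to the end of the file. A minimal sketch of walking the top-level boxes on that basis follows; the function names and the toy sample file are illustrative only, superboxes are not descended into, and a box whose declared length runs past the end of the data (a typical sign of truncation) raises an error.

```python
import struct

# Box types defined in JPEG 2000 Part 1, mapped to the names used on the slide.
BOX_NAMES = {
    b"jP  ": "JPEG 2000 Signature box",
    b"ftyp": "File Type box",
    b"jp2h": "JP2 Header box (superbox)",
    b"jp2c": "Contiguous Codestream box",
    b"jp2i": "IPR box",
    b"xml ": "XML box",
    b"uuid": "UUID box",
    b"uinf": "UUID Info box (superbox)",
}


def iter_boxes(data: bytes):
    """Yield (type, offset, length) for each top-level box; raise on bad lengths."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError(f"incomplete box header at offset {offset}")
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_size = 8
        if length == 1:
            # The real length is stored in the 8-byte XLBox field.
            if offset + 16 > len(data):
                raise ValueError(f"incomplete XLBox at offset {offset}")
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_size = 16
        elif length == 0:
            # Length 0: the box extends to the end of the file.
            length = len(data) - offset
        if length < header_size or offset + length > len(data):
            raise ValueError(f"box {box_type!r} at offset {offset} has invalid or truncated length")
        yield box_type, offset, length
        offset += length


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit length (helper for the toy example only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Toy file: real signature content, a minimal File Type box, an empty
# placeholder JP2 Header box (not valid in a real file) and a tiny codestream.
sample = (
    make_box(b"jP  ", b"\r\n\x87\n")
    + make_box(b"ftyp", b"jp2 " + struct.pack(">I", 0) + b"jp2 ")
    + make_box(b"jp2h", b"")
    + make_box(b"jp2c", b"\xff\x4f\xff\xd9")
)
for box_type, offset, length in iter_boxes(sample):
    print(offset, length, BOX_NAMES.get(box_type, box_type))
```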

Example 1: detection of broken JP2s in JISC 1 Newspapers

Number of images 2,152,116
Total size 45 TB
Average image size 21.8 MB
Number of threads 1
Time 21 days*
Images/day/thread 100,000
TB/day/thread 2

*Includes unzipping; the time actually needed by jpylyzer is much less!
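The per-thread rates in the table follow directly from the totals. A quick arithmetic check (a sketch; the helper name is made up for illustration) shows that the slide rounds roughly 102,000 images and 2.1 TB per day to 100,000 and 2. Since the 21 days include unzipping, these are wall-clock rates for the whole job, not jpylyzer's own speed.

```python
def per_day_per_thread(total: float, days: float, threads: int = 1) -> float:
    """Average amount processed per day by a single thread."""
    return total / days / threads


IMAGES = 2_152_116  # number of images (from the table)
TOTAL_TB = 45       # total size in TB (from the table, itself rounded)
DAYS = 21           # wall-clock days, including unzipping
THREADS = 1

print(f"{per_day_per_thread(IMAGES, DAYS, THREADS):,.0f} images/day/thread")  # 102,482 images/day/thread
print(f"{per_day_per_thread(TOTAL_TB, DAYS, THREADS):.1f} TB/day/thread")     # 2.1 TB/day/thread
```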

Results

- 676 broken JP2s in JISC 1 collection (0.03 %)
TIFF originals still available

- JISC 2 (> 1 million images): 3 broken JP2s

- 19th Century books (> 22 million images): no broken JP2s
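The 0.03 % figure is simply 676 broken files out of 2,152,116 images. A sketch of how such a tally could be collected from jpylyzer's XML reports follows. It assumes each report states its verdict in an `isValidJP2` element holding `True` or `False`, as in the jpylyzer 1.x output of this period; element names can differ between jpylyzer versions, so treat the tag name as an assumption to check against the version in use. The function names are hypothetical and the sample reports are stripped-down illustrations, not real jpylyzer output.

```python
import xml.etree.ElementTree as ET

VALIDITY_TAG = "isValidJP2"  # assumption: element name as in jpylyzer 1.x output


def is_valid_jp2(report_xml: str) -> bool:
    """Return the validity verdict found in one jpylyzer XML report."""
    root = ET.fromstring(report_xml)
    for elem in root.iter():
        # Ignore any XML namespace: '{uri}isValidJP2' -> 'isValidJP2'.
        if elem.tag.rsplit("}", 1)[-1] == VALIDITY_TAG:
            return (elem.text or "").strip() == "True"
    raise ValueError("no validity element found in report")


def tally_broken(reports: dict[str, str]) -> tuple[list[str], float]:
    """Return the names of invalid files and their share as a percentage."""
    broken = sorted(name for name, xml in reports.items() if not is_valid_jp2(xml))
    return broken, 100 * len(broken) / len(reports)


# Stripped-down, illustrative reports (real jpylyzer output contains much more).
reports = {
    "page_001.jp2": "<jpylyzer><isValidJP2>True</isValidJP2></jpylyzer>",
    "page_002.jp2": "<jpylyzer><isValidJP2>False</isValidJP2></jpylyzer>",
}
print(tally_broken(reports))             # (['page_002.jp2'], 50.0)

# The JISC 1 figure from the slide: 676 broken files out of 2,152,116.
print(f"{100 * 676 / 2_152_116:.2f} %")  # 0.03 %
```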

Example 2: quality control Metamorfoze migration

146 TB

Migrate TIFF to JP2 by end 2012

http://www.openplanetsfoundation.org/software/jpylyzer

Acknowledgements

Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
- Rainer Schmidt (AIT)

Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)

Funding

This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under
FP7 ICT-2009.4.1 (Grant Agreement number 270137).

http://www.scape-project.eu

#SCAPEProject
