Jpylyzer is a tool for validation and feature extraction for the JP2 (JPEG 2000 Part 1) still image format. The tool is being developed in the SCAPE Project and was presented by Johan van der Knijff at Archiving 2012 in Copenhagen.
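As a quick illustration of how the tool is typically driven, here is a minimal sketch; the checkOneFile() entry point and the isValidJP2 element name are assumptions that may vary across jpylyzer versions:

```python
# Minimal sketch of using jpylyzer from Python. checkOneFile() and the
# isValidJP2 element name are assumptions; exact names may differ by version.
from jpylyzer import jpylyzer

result = jpylyzer.checkOneFile("example.jp2")   # returns an XML element tree
print(result.findtext("isValidJP2"))            # "True" for a valid JP2
for prop in result.find("properties"):          # extracted feature elements
    print(prop.tag)
```

From the command line, the equivalent is along the lines of `jpylyzer example.jp2 > report.xml`, with the XML report written to standard output.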
Evaluation of format identification tools – SCAPE Project
This document summarizes the evaluation of several file format identification tools conducted as part of the SCAPE project. The evaluation tested tools including DROID, FIDO, Unix File Utility, FITS and JHOVE2 on a set of scientific journal files. It assessed the tools based on various criteria related to usability, format coverage, output and performance. The evaluation found that signature-based tools had trouble with text formats and that Java-based tools were slow for single file processing but faster for batches. Tool developers provided feedback and updates based on the results.
Introduction to ROS (Robot Operating System) – hvcoup
ROS is a framework that provides a communication infrastructure and robot-specific features, removes programming-language barriers, and includes diagnostic and advanced simulation tools. It acts as a meta-operating system through its collection of frameworks, SDKs, and software that address the complexity of robot integration. ROS was created in 2008 and is currently maintained by OSRF, with 9 versions released so far.
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech... – Stuart Wrigley
This paper describes an infrastructure for the automated evaluation of semantic technologies and, in particular, semantic search technologies. For this purpose, we present an evaluation framework which follows a service-oriented approach for evaluating semantic technologies and uses the Business Process Execution Language (BPEL) to define evaluation workflows that can be executed by process engines. This framework supports a variety of evaluations, from different semantic areas, including search, and is extendible to new evaluations. We show how BPEL addresses this diversity as well as how it is used to solve specific challenges such as heterogeneity, error handling and reuse.
Presented at Data infrastructurEs for Supporting Information Retrieval Evaluation (DESIRE 2011) Workshop, Co-located with CIKM 2011, the 20th ACM Conference on Information and Knowledge Management
Friday 28th October 2011, Glasgow, UK
http://www.promise-noe.eu/events/desire-2011/
A Multidimensional Empirical Study on Refactoring Activity – Nikolaos Tsantalis
The document summarizes a multidimensional empirical study on refactoring activity in three open source projects. It aims to understand refactoring practices by developers through analyzing version control histories. The study seeks to answer five research questions related to the types of refactorings applied to test vs production code, individual contributions to refactoring, alignment with release dates, relationship between test and production refactoring, and motivations for applied refactorings. The methodology examines commit histories, detects refactorings, identifies test code, analyzes activity trends, and classifies sampled refactorings. Key findings include different refactoring focuses for test vs production code, alignment of refactoring and testing, and increased refactoring activity before releases.
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U... – Raffi Khatchadourian
Actor concurrency is becoming increasingly important in the development of real-world software systems. Although actor concurrency may be less susceptible to some multithreaded concurrency bugs, such as low-level data races and deadlocks, it comes with its own bugs that may be different. However, the fundamental characteristics of actor concurrency bugs, including their symptoms, root causes, API usages, examples, and differences when they come from different sources are still largely unknown. Actor software development can significantly benefit from a comprehensive qualitative and quantitative understanding of these characteristics, which is the focus of this work, to foster better API documentation, development practices, testing, debugging, repairing, and verification frameworks. To conduct this study, we take the following major steps. First, we construct a set of 186 real-world Akka actor bugs from Stack Overflow and GitHub via manual analysis of 3,924
Stack Overflow questions, answers, and comments and 3,315 GitHub commits, messages, original and modified code snippets, issues, and pull requests. Second, we manually study these actor bugs and their fixes to understand and classify their symptoms, root causes, and API usages. Third, we study the differences between the commonalities and distributions of symptoms, root causes, and API usages of our Stack Overflow and GitHub actor bugs. Fourth, we discuss real-world examples of our actor bugs with these symptoms and root causes. Finally, we investigate the relation of our findings with those of previous work and discuss their implications. A few findings of our study are: (1) symptoms of our actor bugs can be classified into five categories, with Error as the most common symptom and Incorrect Exceptions as the least common, (2) root causes of our actor bugs can be classified into ten categories, with Logic as the most common root cause and Untyped Communication as the least common, (3) a small number of Akka API packages are responsible for most of API usages by our actor bugs, and (4) our Stack Overflow and GitHub actor bugs can differ significantly in commonalities and distributions of their symptoms, root causes, and API usages. While some of our findings agree with those of previous work, others sharply contrast.
Methodology and Campaign Design for the Evaluation of Semantic Search Tools – Stuart Wrigley
The main problem with the state of the art in the semantic search domain is the lack of comprehensive evaluations. There exist only a few efforts to evaluate semantic search tools and to compare the results with other evaluations of their kind.
In this paper, we present a systematic approach for testing and benchmarking semantic search tools that was developed within the SEALS project. Unlike other semantic web evaluations our methodology tests search tools both automatically and interactively with a human user in the loop. This allows us to test not only functional performance measures, such as precision and recall, but also usability issues, such as ease of use and comprehensibility of the query language.
The paper describes the evaluation goals and assumptions; the criteria and metrics; the type of experiments we will conduct as well as the datasets required to conduct the evaluation in the context of the SEALS initiative. To our knowledge it is the first effort to present a comprehensive evaluation methodology for Semantic Web search tools.
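For reference, the functional performance measures named above have their usual information-retrieval definitions (TP, FP, FN: true positives, false positives, false negatives); these are the textbook forms, not formulas taken from the paper:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}
```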
This document provides an introduction to reverse engineering and discusses cracking Windows applications. It begins with a disclaimer that reverse engineering copyrighted material is illegal. It then defines reverse engineering as analyzing a system to understand its structure and function in order to modify or reimplement parts of it. The document discusses reasons for learning reverse engineering like malware analysis, bug fixing, and customizations. It outlines some of the history of reverse engineering in software development. The remainder of the document focuses on tools and techniques for reverse engineering like PE identification, decompilers, disassemblers, debuggers, patching applications in OllyDbg, and analyzing key generation and phishing techniques.
ESSIR LivingKnowledge DiversityEngine tutorial – Jonathon Hare
The document summarizes a symposium on bias and diversity in information retrieval testbeds. It introduces the Diversity Engine, which provides collections, annotation tools, and an evaluation framework to allow for collaborative and comparable research on indexing and searching documents annotated with various metadata like entities, bias, trust, and multimedia features. It describes the architecture, design decisions, supported document collections and formats, analysis modules for text and images, indexing and search functionality using Solr, application development, and evaluation framework. An example application is demonstrated by indexing a sample collection and making it searchable.
This is a general presentation of the EU Project SCAPE, http://www.scape-project.eu from 2011. The project is about large-scale digital preservation and runs from 2011 to 2014.
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D... – SCAPE Project
The State and University Library, Denmark, hosted an information and demonstration day on 25 June 2014 for delegates from other large cultural heritage institutions in Denmark. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants. Read more about the event in this blog post, http://bit.ly/SCAPE_SB_Demo.
One of the presentations was given by Asger Askov Blekinge who showed how the library has worked on integrating its digital object management system with Hadoop. The library is currently digitizing 32 million newspaper pages and is using Hadoop map/reduce jobs to do quality assurance on the digitized files with the help of the SCAPE Stager/Loader so updated QA’ed files are stored in the repository.
Duplicate detection for quality assurance of document image collections – SCAPE Project
Reinhold Huber-Mörk, Austrian Institute of Technology, presented a method for quality assurance of scanned content based on computer vision at iPres 2012, Toronto.
In: iPRES 2012 – Proceedings of the 9th International Conference on Preservation of Digital Objects. Toronto 2012, 136-143.
ISBN 978-0-9917997-0-1
Quality assurance for document image collections in digital preservation – SCAPE Project
This document discusses quality assurance for digital image collections in preservation workflows. It presents a keypoint-based approach for comparing document images that is robust to scaling, rotation and other transformations. Keypoints are detected across images and matched, and structural similarity is evaluated. The method was tested on collections from the Dunhuang manuscripts and Google books, achieving high accuracy in identifying identical and similar image pairs. The goal is to integrate this approach into digital preservation platforms to automate quality control of large image collections.
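To make the keypoint idea concrete, here is a minimal Python/OpenCV sketch; ORB and the brute-force matcher are stand-ins chosen for illustration, not necessarily what the presented method used, and the file names are hypothetical:

```python
# Illustrative keypoint comparison of two scanned pages with OpenCV.
# ORB stands in for whatever detector the presented method actually used.
import cv2

img1 = cv2.imread("page_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("page_b.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints + descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# A crude similarity score: fraction of keypoints that found a match.
score = len(matches) / max(len(kp1), len(kp2))
print(f"match score: {score:.2f}")
```

A real pipeline would add geometric verification (for example a homography fit over the matches) before trusting such a score.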
Audio Quality Assurance. An application of cross correlation – SCAPE Project
Jesper Sindahl Nielsen, State and University Library, Denmark, presented algorithms for automated quality assurance of audio files in the context of preservation actions and access. Cross-correlation is used to compare the sound waves; a minimal sketch of the idea follows the citation below.
In: iPRES 2012 – Proceedings of the 9th International Conference on Preservation of Digital Objects. Toronto 2012, 144-149.
ISBN 978-0-9917997-0-1
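A minimal NumPy sketch of waveform comparison by normalized cross-correlation; this illustrates the general idea only and is not the xcorrSound implementation:

```python
# Compare two audio waveforms via normalized cross-correlation with NumPy.
import numpy as np

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Peak of the normalized cross-correlation between two waveforms."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return float(np.correlate(a, b, mode="full").max())

# Two nearly identical sine waves should score close to 1.0.
t = np.linspace(0, 1, 8000)
x = np.sin(2 * np.pi * 440 * t)
y = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(t.size)
print(f"peak correlation: {similarity(x, y):.3f}")
```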
The document summarizes Michel Alves' 10-minute talk comparing two image quality assessment metrics: normalized cross-correlation (NCC) and the structural similarity index (SSIM). It provides background on the metrics, including their mathematical formulas and properties. Various images are used to generate similarity maps with both metrics side by side for visual comparison. While NCC is simple and fast, SSIM better captures structural changes but has greater complexity and potential numerical instability. In its conclusions, the document advises choosing a metric appropriate to the application and re-evaluating the choice in some cases.
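For reference, the standard forms of the two metrics (textbook definitions; the talk's exact notation may differ). C1 and C2 are the small constants that stabilize SSIM's divisions and relate to the numerical-instability caveat above:

```latex
\mathrm{NCC}(x,y) = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}},
\qquad
\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}
```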
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool – jkSlidevault
Presentation on jpylyzer, a new tool that performs thorough validation of JPEG 2000 Part 1 (JP2) images. Presented at the IS&T Archiving 2012 conference.
Scape information day at BL - Using Jpylyzer and Schematron for validating JP... – SCAPE Project
The SCAPE developed tool Jpylyzer has long been in production use at a variety of institutions. The British Library uses Jpylyzer in combination with Schematron to validate JPEG2000 files.
The presentation by Will Palmer was given at the ‘SCAPE Information Day at the British Library’, on 14 July 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants.
[DL Reading Group] Unpaired Image Super-Resolution Using Pseudo-Supervision – Deep Learning JP
The document summarizes an academic paper on unpaired image super-resolution using pseudo-supervision. It presents the following key points:
1. The paper proposes a method using GANs and two networks - a correction network to transform real low-resolution images to clean low-resolution, and a super-resolution network to generate high-resolution images from clean low-resolution.
2. Experiments on multiple datasets demonstrate better results than previous methods, generating high-resolution images from diverse, unpaired low-resolution data.
3. The proposed method was incorporated into Sharp's newest smartphone just 1.5 years after the paper was published, showing the speed of applying academic research.
This document summarizes the djatoka JPEG 2000 image server created by Ryan Chute and Herbert Van de Sompel. Djatoka is an open-source image server and dissemination framework built using the Kakadu JPEG 2000 library. It provides URI-addressable access to image regions, rotations, and format transformations of JPEG 2000 files in a reusable and extensible service framework enabled by the OpenURL standard. The document discusses the motivations for and features of djatoka, including its use of JPEG 2000 standards, dynamic resolution levels, quality layers for compression, and extraction of random regions.
Presented at the Digital Initiatives and Nearby History Institute, Terre Haute, IN, July 19, 2006 and the Indiana Library Federation Annual Conference: Indianapolis, IN, April 12, 2006;
The document discusses DWANGO's use of Scala and the Play framework to build APIs for niconico's Android app. It summarizes the project's history and team structure, describes the core library, API server, and management server built using Scala, and outlines some pros and cons they experienced like case class limitations, Jenkins memory issues, and Akka exceptions in Play.
Matchbox tool. Quality control for digital collections – SCAPE Training event... – SCAPE Project
This is an introduction to the Matchbox tool, a tool for quality control for digital collections. The introduction was given at the first SCAPE Training event, ‘Keeping Control: Scalable Preservation Environments for Identification and Characterisation’, in Guimarães, Portugal on 6-7 December 2012. Presenters were Roman Graf and Reinhold Huber-Mörk from Austrian Institute of Technology and Alexander Schindler from Vienna University of Technology.
The seminar addresses the main issues involved in managing large image collections: how to organize them, preserve them over time, and make them usable effectively. Besides dwelling on key aspects such as formats, metadata, cataloguing and backup, the seminar provides a comparative overview of the main software platforms available today, both proprietary and open-source.
This webinar in the LOD2 webinar series presents release 2.0 of the LOD2 stack, which contains updates to components including OntoWiki and Silk, as well as:
* the assisting SPARQL editor SPARQLED (DERI),
* the LOD-enabled Open Refine (previously Google Refine) (ZEMANTA),
* the extended version of SILK with link suggestion management from LATC (DERI),
* the rdfAuthor library, which allows managing structured information from RDFa-enhanced websites (ULEI),
* the SPARQLPROXY, a PHP-based forward proxy for remote access to SPARQL endpoints (ULEI).
Release 2.0 also contains the first contributed Debian package for a component maintained by a group outside the LOD2 consortium: with the help of ULEI, a package for the STANBOL engine (http://stanbol.apache.org/) has been contributed.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
http://lod2.eu/BlogPost/webinar-series
The document discusses analysis models for processing Large Hadron Collider (LHC) collision data using grid computing resources. It presents benchmark timing results for different analysis modes in ROOT, such as C++, Python, and Athena. Processing derived data products like D1PD, D2PD and D3PD files with a compiled C++ analysis provides the best performance, up to an order of magnitude faster than the other modes. The document aims to help physicists optimize their analysis setup by comparing the available options and estimating resource requirements.
The document discusses using annotation processing tooling (APT) to generate code and reports based on annotations in source code. Specifically, it summarizes using APT to improve internationalization (i18n) by generating .properties files and ResourceBundle classes from @MessageBundle and @Message annotations. This allows maintaining localization keys and default text in Java code while generating files for other locales.
1) DIFFER is a tool that analyzes and compares digital image file formats like TIFF, JPEG, JP2 and DjVu to identify properties, validate formats, detect differences and glitches.
2) It incorporates various existing tools and uses techniques like hashing, PSNR (see the formula below) and pixel detection to analyze files.
3) The tool can be used to help set digital preservation standards and quality control for file conversion and master files.
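For reference, PSNR has a standard definition in terms of the mean squared error between an original m×n image I and its reconstruction K, where MAX is the maximum possible pixel value (e.g. 255 for 8-bit images); this is the textbook form, not a formula taken from the DIFFER slides:

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(I(i,j) - K(i,j)\bigr)^2
```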
The document discusses JPEG 2000 software licensing. It notes that the authors' JPEG 2000 software package was originally intended for internal research and as a reference for the JPEG 2000 Part 10 standard, but that they have received numerous requests from companies and academics. It raises questions about how to balance non-commercial and commercial use policies for the code and how to provide access to it while potentially creating revenue.
This document discusses using annotation processing (APT) to improve internationalization (i18n) in Java applications. It describes how APT can be used to generate property files and resource bundle classes from annotations, avoiding issues with maintaining keys across files. The Ez18n library demonstrates this approach by processing @Message and @MessageBundle annotations to generate localized text resources for different platforms.
This document discusses using GPUs for image processing instead of CPUs. It notes that GPUs have much higher peak performance than CPUs, growing from 5,000 triangles/second in 1995 to 350 million triangles/second in 2010. However, GPU programming is more complex than CPU programming due to the different architecture and programming model, which can make it harder to implement algorithms on GPUs and to optimize them for high efficiency. The document proposes a methodology for GPU acceleration including characterizing algorithms, estimating performance, using models like Roofline to analyze bottlenecks, and benchmarking. It also describes establishing a competence center to help others overcome the challenges of GPU programming.
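The Roofline model mentioned bounds attainable throughput by the lesser of the compute ceiling and the memory ceiling, with I the arithmetic intensity (operations per byte moved); this is the model's standard form, not a formula from the slides:

```latex
P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; B_{\text{mem}} \times I\bigr)
```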
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...SCAPE Project
Sven Schlarb of the Austrian National Library gave this introduction to large scale preservation workflows with Taverna at the first SCAPE Training event, ‘Keeping Control: Scalable Preservation Environments for Identification and Characterisation’, in Guimarães, Portugal on 6-7 December 2012.
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann... – gdigugli
This session presents a multidevice content display survivor guide and discusses smart use of annotation processing. Based on some real-life stories and best practices collected while coding Java applications, it includes the following stories for you:
1. Mr. Apt loves you! APT processing explained
2. Embedded i18n and encoding survivor guide
3. Mobile Me: the ez18n project
VIPS is an image processing library for working with large images. It provides fast performance even for multi-megapixel images through parallel processing across multiple CPUs. The core is written in C but it includes bindings for C++, Python, and other languages. The library contains over 350 image processing operations and supports common image formats and color spaces.
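A small Python sketch using pyvips, the Python binding for libvips (file names are placeholders); it shows the demand-driven style in which operations form a pipeline that libvips executes in parallel:

```python
# Tiny pyvips example; operations build a pipeline that libvips runs
# in parallel across CPUs when the output is written.
import pyvips

image = pyvips.Image.new_from_file("scan.tif", access="sequential")
print(image.width, image.height, image.bands)   # basic image properties
thumb = image.resize(0.1)                       # lazily evaluated resize
thumb.write_to_file("scan_thumb.jpg")           # pipeline executes here
```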
Overview of JPEG standardization committee activities – Touradj Ebrahimi
If you need to know about JPEG standardization activities, these slides are for you. Feel free to distribute, and use in your talks, presentations, etc.
Similar to Jpylyzer, a validation and feature extraction tool developed in SCAPE project
SCAPE Information Day at BL - Characterising content in web archives with Nanite – SCAPE Project
This presentation was given by Will Palmer at ‘SCAPE Information Day at the British Library’, on 14 July 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants.
In this presentation Will Palmer introduced the SCAPE developed tool Nanite which can help institutions analyze their web archive data.
SCAPE Information Day at BL - Some of the SCAPE Outputs Available – SCAPE Project
The British Library hosted a ‘SCAPE Information Day at the British Library’, on 14 July 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants. Some tools were presented and demonstrated in more detail (see the other presentations) and the day was closed with a presentation by Will Palmer, Carl Wilson and Peter May of some of the other outputs that SCAPE has delivered.
SCAPE Information Day at BL - Large Scale Processing with Hadoop – SCAPE Project
This document discusses using Hadoop for large scale processing. It provides an overview of Hadoop and MapReduce frameworks and how they allow distributing processing across many nodes to efficiently process large amounts of data in parallel. It also gives examples of how Hadoop has been used at the British Library for digital preservation tasks like format migration and analysis.
SCAPE Information day at BL - Flint, a Format and File Validation Tool – SCAPE Project
Alecs Geuder from the British Library presented a new SCAPE developed tool called ‘Flint’ at the ‘SCAPE Information Day at the British Library’, on 14 July 2014. Flint is a format and file validation tool which can be used to validate your files and/or formats against a policy. At the British Library Flint is used to deal with non-print legal deposit.
The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants.
SCAPE Webinar: Tools for uncovering preservation risks in large repositories – SCAPE Project
This presentation originates from a webinar presented by Luís Faria. The webinar presents the SCAPE developed tools Scout and C3PO and demonstrates how to identify preservation risks in your content and, at the same time, share your content profile information with others to open new opportunities.
Scout, the preservation watch system, centralizes all the necessary knowledge on the same platform, cross-referencing this knowledge to uncover all preservation risks. Scout automatically fetches information from several sources to populate its knowledge base. For example, Scout integrates with C3PO to get large-scale characterization profiles of content. Furthermore, Scout aims to be a knowledge exchange platform, to allow the community to bring together all the necessary information into the system. The sharing of information opens new opportunities for joining forces against common problems.
The webinar was held 26 June 2014.
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20... – SCAPE Project
This presentation was given by Per Møldrup-Dalum at ‘SCAPE Information Day at the State and University Library, Denmark’, on 25 June 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants.
In this presentation an overview of the project, its results and how to sustain it is given. For more information, see this blog post, http://bit.ly/SCAPE_SB_Demo, about the event.
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat... – SCAPE Project
At the ‘SCAPE Information Day at the State and University Library, Denmark’, on 25 June 2014 Rune Bruun Ferneke-Nielsen presented how the library uses Jpylyzer, a SCAPE developed tool, to validate millions of JPEG 2000 files in connection with a large newspaper digitization project.
The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants. Read more about the event in this blog post, http://bit.ly/SCAPE_SB_Demo.
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014 – SCAPE Project
Hadoop has been used at the State and University Library, Denmark, in connection with an experiment on the migration of a large collection of audio files from mp3 to wav. This experiment was presented by Bolette Ammitzbøll Jurik at ‘SCAPE Information Day at the State and University Library, Denmark’, on 25 June 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants.
The experiment used Hadoop and Taverna but also xcorrSound waveform-compare which is a small tool developed within SCAPE to compare the content of audio files.
Read more about the event in this blog post, http://bit.ly/SCAPE_SB_Demo.
Hadoop and its applications at the State and University Library, SCAPE Inform... – SCAPE Project
Per Møldrup-Dalum introduced how the State and University Library in Denmark has deployed Hadoop in connection with the SCAPE project. With Hadoop the library has been able to process large amounts of data much faster than before.
The presentation was given at ‘SCAPE Information Day at the State and University Library, Denmark’, on 25 June 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants. For more information about the demo day, see this blog post, http://bit.ly/SCAPE_SB_Demo, about the event.
This presentation describes the EU-funded project SCAPE – Scalable Preservation Environments –, its developments and sustainability plans.
The SCAPE project has developed scalable services for planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects.
The project run-time was around 3½ years from 2011 to 2014.
Read more about SCAPE at www.scape-project.eu
LIBER Satellite Event, SCAPE by Sven Schlarb – SCAPE Project
Sven Schlarb from the Austrian National Library gave an overview of the different application scenarios at the library related to Web Archiving and the Austrian Books Online project.
The presentation was given at the LIBER Satellite Event on Long term accessibility of digital resources in theory and practice, https://liber2014.univie.ac.at/satellite-event/, in Vienna on 21 May 2014.
This presentation was given as part of a SCAPE Training event on ‘Effective Evidence-Based Preservation Planning’ in Aarhus, Denmark, 13-14 November 2013.
Artur Kulmukhametov, Vienna University of Technology, introduced the importance of content profiling and how this can be done with the help of the SCAPE developed tool C3PO. Content profiling is based on characteristics extracted from the files’ metadata and will help the user to plan digital preservation. The tool C3PO can be easily integrated with both PLATO and Scout.
The presentation was given as part of a SCAPE Training event on ‘Effective Evidence-Based Preservation Planning’ in Aarhus, Denmark, 13-14 November 2013.
Catherine Jones, Science and Technology Facilities Council, presented the concept of control policies and what is needed to produce machine understandable control policies.
Preservation Policy in SCAPE - Training, Aarhus – SCAPE Project
This presentation was given as part of a SCAPE Training event on ‘Effective Evidence-Based Preservation Planning’ in Aarhus, Denmark, 13-14 November 2013.
Barbara Sierman, Koninklijke Bibliotheek in the Netherlands, introduced the policy concept, previous work on policies and the work that has been done within SCAPE on preservation policies. SCAPE will build a catalogue of policy elements with three levels – guidance, preservation procedure, and control policies.
An image based approach for content analysis in document collections – SCAPE Project
Reinhold Huber-Mörk of the Austrian Institute of Technology presented ‘An image based approach for content analysis in document collections’ at ISVC'13 (9th International Symposium on Visual Computing) in Rethymnon, Crete, Greece, on 31 July 2013.
The development of tools for library workflows for duplicate content detection and content verification for complex documents was presented, accompanied by results of the work.
Sven Schlarb of the Austrian National Library presented SCAPE (in German). Besides giving a general overview of SCAPE the presentation also includes descriptions of SCAPE solutions, including tools, software integration, planning, and more.
The presentation was given at the Austrian Library day on ‘National Initiatives on Digital Information. Repositories, Research data and long-term preservation in Austria’ (http://www.obvsg.at/voeb-obvsg-bibliothekstage-2013/programm-410/) on 4 October 2013 in Vienna.
TAVERNA Components - Semantically annotated and sharable units of functionality – SCAPE Project
Taverna components are semantically annotated and sharable units of functionality that can be included in workflows. They are well-described, produce predictable and reliable behavior, and help hide complexity. Components must conform to agreed specifications described as component profiles. The Taverna workflow system supports finding, using, creating, and semantically annotating components to improve discoverability and interoperability.
At the iPres2013 conference in Lisbon, Portugal, in September 2013 Luís Faria, KEEP SOLUTIONS LDA, presented SCAPE work on monitoring of digital repositories and the tool, Scout, which has been developed in this connection. Scout is a web-based service that assists content holders in monitoring their digital repository and provides an ontological knowledge base for compiling the information needed to detect preservation risks and opportunities.
Barbara Sierman, the National Library of the Netherlands, presented ‘Policy levels in SCAPE’ at the iPres2013 conference in Lisbon, Portugal, in September 2013.
The policy work is part of the SCAPE project and is based on an analysis of digital preservation policies from partner institutions.
Building Production Ready Search Pipelines with Spark and Milvus – Zilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
HCL Notes and Domino license cost reduction in the world of DLAU – panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can lead to more users being counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some approaches that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement immediately
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... – SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Driving Business Innovation: Latest Generative AI Advancements & Success Story – Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
TrustArc Webinar - 2024 Global Privacy Survey – TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack – shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Communications Mining Series - Zero to Hero - Session 1 – DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of their features, but many features offer convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
National Security Agency - NSA mobile device best practices
Jpylyzer, a validation and feature extraction tool developed in SCAPE project
1. SCAPE
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
Johan van der Knijff (1,2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library
IS&T, Archiving 2012, Copenhagen, 15.6.2012
2. SCAPE
Metamorfoze
National Programme for the preservation of paper heritage
Digitisation as a means to conserve threatened paper originals
146 TB to be migrated from TIFF to JP2 by end of 2012
7. SCAPE
Possible solutions
Option 1: improve the JPEG 2000 module of JHOVE – but no institutional support, superseded by JHOVE2 (?)
Option 2: develop a JPEG 2000 module for JHOVE2 – not ready for operational use (yet)
Option 3: develop a dedicated tool
16. SCAPE
Example 1: detection of broken JP2s in JISC 1 Newspapers
Number of images:   2,152,116
Total size:         45 TB
Average image size: 21.8 MB
Number of threads:  1
Time:               21 days*
Images/day/thread:  100,000
TB/day/thread:      2
*Includes unzipping; the actual time needed by jpylyzer is much less!
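As a sanity check, the per-thread throughput figures follow directly from the totals:

```latex
\frac{2{,}152{,}116 \text{ images}}{21 \text{ days}} \approx 102{,}000 \text{ images/day},
\qquad
\frac{45 \text{ TB}}{21 \text{ days}} \approx 2.1 \text{ TB/day}
```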
17. SCAPE
Results
- 676 broken JP2s in the JISC 1 collection (0.03 %); TIFF originals still available
- JISC 2 (> 1 million images): 3 broken JP2s
- 19th Century books (> 22 million images): no broken JP2s
19. SCAPE
TIFF pixels no
identical?
pixel compare yes
Aware JP2K SDK
no
valid JP2?
JP2 Jpylyzer*
yes
image no
properties compare properties
match?
yes
properties
profile
pass fail
*Imported as module in Python-based workflow
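A minimal Python sketch of this decision chain; the checkOneFile() entry point and the output element names are assumptions that may differ between jpylyzer versions, and profile_check() and pixel_compare() are hypothetical helpers standing in for the profile and pixel comparison steps (the Aware SDK conversion happens upstream):

```python
# Sketch of the slide-19 QA chain under the assumptions named above.
from jpylyzer import jpylyzer

def qa_jp2(tiff_path, jp2_path, profile_check, pixel_compare):
    report = jpylyzer.checkOneFile(jp2_path)
    if report.findtext("isValidJP2") != "True":        # step 1: valid JP2?
        return "fail: invalid JP2"
    if not profile_check(report.find("properties")):   # step 2: properties match profile?
        return "fail: properties do not match profile"
    if not pixel_compare(tiff_path, jp2_path):         # step 3: pixels identical?
        return "fail: pixels differ from TIFF original"
    return "pass"
```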
20. SCAPE
Example 3: pre-ingest quality control at the Wellcome Library
- JP2s produced in-house and by external suppliers
- Use jpylyzer to validate against the JP2 spec
- Use extracted properties to validate against a profile (progression order, ratio, layers, ...)
- Profile coded as an XML schema (so jpylyzer output can be validated against the schema)
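A minimal sketch of that last step using lxml; profile.xsd and report.xml are hypothetical file names, and the slides do not prescribe a specific validation library:

```python
# Validate a jpylyzer XML report against a profile expressed as an XML schema.
from lxml import etree

schema = etree.XMLSchema(etree.parse("profile.xsd"))   # the coded profile
report = etree.parse("report.xml")                     # jpylyzer output

if schema.validate(report):
    print("profile check: pass")
else:
    for error in schema.error_log:                     # which constraints failed
        print("profile check: fail -", error.message)
```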
24. SCAPE
Acknowledgements
Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions),
- Rainer Schmidt (AIT)
Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
25. SCAPE
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
http://www.scape-project.eu
#SCAPEProject