SlideShare a Scribd company logo
SCAPE


Improved validation and feature
extraction for JPEG 2000 Part 1:
the jpylyzer tool
Johan van der Knijff1,2, René van der Ark1, Carl Wilson3
1 Koninklijke Bibliotheek – National Library of the Netherlands
2 Open Planets Foundation

3 The British Library



IS&T, Archiving 2012, Copenhagen, 15.6.2012
SCAPE
                  Metamorfoze
National Programme for preservation of paper
  heritage
  Digitisation as a means to conserve threatened paper
    originals


         146 TB

             Migrate by end 2012
  TIFF
                                       JP2
SCAPE
JP2 from JISC 1 Newspaper Collection (BL)
SCAPE
JP2 from JISC 1 Newspaper Collection (BL)




                              “Well-formed and valid”
SCAPE




             Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg


Hardware failure may result in
corrupted images
SCAPE




Not all encoders
produce standard
compliant images
SCAPE
               Possible solutions

Option 1
Improve JPEG 2000 module JHOVE
But no institutional support, superseded by JHOVE2 (?)
Option 2
Develop JPEG 2000 module for JHOVE2
Not ready for operational use (yet)
Option 3
Develop dedicated tool
SCAPE
                                    Jpylyzer tool




0   1   1   1   1   0   0   1   0   1   1   1       0   1   0   1   1
SCAPE
                 Jpylyzer tool
- First prototype: December 2011

- Refactoring of original code: Jan 2012

- Packaging (Debian): Mar 2012
   Univ. Southampton, KEEP Solutions, AIT Vienna

- Add remaining functionality, bugfixes: Apr-May
   2012 (current version: 1.5)
SCAPE
JP2 file


             JPEG 2000 Signature box

                  File Type box

            JP2 Header box (superbox)

           Contiguous Codestream box 0



           Contiguous Codestream box n

                     IPR box

                   XML box(es)

                  UUID box(es)

           UUID Info box(es) (superbox)
SCAPE
Command-line use
SCAPE
Result
SCAPE
Properties extraction (excerpt)
SCAPE
Properties embedded ICC profile
SCAPE
Documentation
SCAPE
Example 1: detection of broken JP2s in JISC 1
               Newspapers

    Number of images           2,152,116
    Total size                 45 TB
    Average image size         21.8 MB
    Number of threads          1
    Time                       21 days*
    Images/day/ thread 100,000
    TB/day/thread              2


    *Includes unzipping, actual time needed by jpylyzer much less!
SCAPE
                           Results

- 676 broken JP2s in JISC 1 collection (0.03 %)
  TIFF originals still available


- JISC 2 (> 1 million images): 3 broken JP2s

- 19th Century books (> 22 million images): no broken
  JP2s
SCAPE
Example 2: quality control Metamorfoze
              migration



         146 TB


            Migrate by end 2012
 TIFF
                                     JP2
SCAPE
     TIFF                                            pixels     no
                                                   identical?

                  pixel compare                     yes
Aware JP2K SDK
                                                                 no
                                                   valid JP2?

     JP2                  Jpylyzer*
                                                   yes
                    image                                       no
                  properties      compare          properties
                                                    match?

                                                   yes
                  properties
                    profile
                                                     pass        fail


    *Imported as module in Python-based workflow
SCAPE
Example 3: pre-ingest quality control Wellcome
                   Library

 - JP2s produced in-house and by external suppliers

 - Use jpylyzer to validate against JP2 spec

 - Use extracted properties to validate against a
   profile
    (Progression order, ratio, layers, ….)

 - Profile coded as XML schema
    (So jpylyzer output can be validated against schema)
SCAPE
Platforms and licensing stuff
SCAPE
http://www.openplanetsfoundation.org/software/jpylyzer
SCAPE
Community involvement
SCAPE
              Acknowledgements

Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions),
- Rainer Schmidt (AIT)


Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
SCAPE
                    Funding


This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under
FP7 ICT-2009.4.1 (Grant Agreement number 270137).


      http://www.scape-project.eu



                        #SCAPEProject

More Related Content

Viewers also liked

Presentation of SCAPE Project
Presentation of SCAPE ProjectPresentation of SCAPE Project
Presentation of SCAPE Project
SCAPE Project
 
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
SCAPE Project
 
Duplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collectionsDuplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collections
SCAPE Project
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation
SCAPE Project
 
Audio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlationAudio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlation
SCAPE Project
 
Similarity Maps Using SSIM Index
Similarity Maps Using SSIM IndexSimilarity Maps Using SSIM Index
Similarity Maps Using SSIM Index
Michel Alves
 

Viewers also liked (6)

Presentation of SCAPE Project
Presentation of SCAPE ProjectPresentation of SCAPE Project
Presentation of SCAPE Project
 
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
 
Duplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collectionsDuplicate detection for quality assurance of document image collections
Duplicate detection for quality assurance of document image collections
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation
 
Audio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlationAudio Quality Assurance. An application of cross correlation
Audio Quality Assurance. An application of cross correlation
 
Similarity Maps Using SSIM Index
Similarity Maps Using SSIM IndexSimilarity Maps Using SSIM Index
Similarity Maps Using SSIM Index
 

Similar to Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...
jkSlidevault
 
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
SCAPE Project
 
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
Deep Learning JP
 
The djatoka Image Server
The djatoka Image ServerThe djatoka Image Server
The djatoka Image Server
Herbert Van de Sompel
 
Jpeg 2000 For Digital Archives
Jpeg 2000 For Digital ArchivesJpeg 2000 For Digital Archives
Jpeg 2000 For Digital Archives
Richard Bernier
 
ドワンゴでのScala活用事例「ニコニコandroid」
ドワンゴでのScala活用事例「ニコニコandroid」ドワンゴでのScala活用事例「ニコニコandroid」
ドワンゴでのScala活用事例「ニコニコandroid」
Satoshi Goto
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...
SCAPE Project
 
Seminario Maurizio Agelli, 20-09-2012
Seminario Maurizio Agelli, 20-09-2012Seminario Maurizio Agelli, 20-09-2012
Seminario Maurizio Agelli, 20-09-2012
CRS4 Research Center in Sardinia
 
LOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar: The 2nd release of the LOD2 stackLOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar: The 2nd release of the LOD2 stack
Semantic Web Company
 
Analysis Software Benchmark
Analysis Software BenchmarkAnalysis Software Benchmark
Analysis Software Benchmark
Akira Shibata
 
Smart annotation processing - Paris JUG
Smart annotation processing - Paris JUGSmart annotation processing - Paris JUG
Smart annotation processing - Paris JUG
gdigugli
 
Bedrich Vychodil DIFFER
Bedrich Vychodil DIFFERBedrich Vychodil DIFFER
Bedrich Vychodil DIFFER
Future Perfect 2012
 
Jpeg2000
Jpeg2000Jpeg2000
Jpeg2000
imec.archive
 
Smart Annotation Processing - Marseille JUG
Smart Annotation Processing - Marseille JUGSmart Annotation Processing - Marseille JUG
Smart Annotation Processing - Marseille JUG
gdigugli
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeire
imec
 
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
SCAPE Project
 
DAWN and Scientific Workflows
DAWN and Scientific WorkflowsDAWN and Scientific Workflows
DAWN and Scientific Workflows
Matthew Gerring
 
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
gdigugli
 
Vips 4mar09e
Vips 4mar09eVips 4mar09e
Vips 4mar09e
guest0f52728
 
Overview of JPEG standardization committee activities
Overview of JPEG standardization committee activitiesOverview of JPEG standardization committee activities
Overview of JPEG standardization committee activities
Touradj Ebrahimi
 

Similar to Jpylyzer, a validation and feature extraction tool developed in SCAPE project (20)

Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyze...
 
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
Scape information day at BL - Using Jpylyzer and Schematron for validating JP...
 
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
【DL輪読会】Unpaired Image Super-Resolution Using Pseudo-Supervision
 
The djatoka Image Server
The djatoka Image ServerThe djatoka Image Server
The djatoka Image Server
 
Jpeg 2000 For Digital Archives
Jpeg 2000 For Digital ArchivesJpeg 2000 For Digital Archives
Jpeg 2000 For Digital Archives
 
ドワンゴでのScala活用事例「ニコニコandroid」
ドワンゴでのScala活用事例「ニコニコandroid」ドワンゴでのScala活用事例「ニコニコandroid」
ドワンゴでのScala活用事例「ニコニコandroid」
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...
 
Seminario Maurizio Agelli, 20-09-2012
Seminario Maurizio Agelli, 20-09-2012Seminario Maurizio Agelli, 20-09-2012
Seminario Maurizio Agelli, 20-09-2012
 
LOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar: The 2nd release of the LOD2 stackLOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar: The 2nd release of the LOD2 stack
 
Analysis Software Benchmark
Analysis Software BenchmarkAnalysis Software Benchmark
Analysis Software Benchmark
 
Smart annotation processing - Paris JUG
Smart annotation processing - Paris JUGSmart annotation processing - Paris JUG
Smart annotation processing - Paris JUG
 
Bedrich Vychodil DIFFER
Bedrich Vychodil DIFFERBedrich Vychodil DIFFER
Bedrich Vychodil DIFFER
 
Jpeg2000
Jpeg2000Jpeg2000
Jpeg2000
 
Smart Annotation Processing - Marseille JUG
Smart Annotation Processing - Marseille JUGSmart Annotation Processing - Marseille JUG
Smart Annotation Processing - Marseille JUG
 
iMinds The Conference: Jan Lemeire
iMinds The Conference: Jan LemeireiMinds The Conference: Jan Lemeire
iMinds The Conference: Jan Lemeire
 
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
 
DAWN and Scientific Workflows
DAWN and Scientific WorkflowsDAWN and Scientific Workflows
DAWN and Scientific Workflows
 
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
 
Vips 4mar09e
Vips 4mar09eVips 4mar09e
Vips 4mar09e
 
Overview of JPEG standardization committee activities
Overview of JPEG standardization committee activitiesOverview of JPEG standardization committee activities
Overview of JPEG standardization committee activities
 

More from SCAPE Project

C sz z6
C sz z6C sz z6
C sz z6
SCAPE Project
 
SCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with NaniteSCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Project
 
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs AvailableSCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Project
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Project
 
SCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation ToolSCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Project
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Project
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE Project
 
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
SCAPE Project
 
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
SCAPE Project
 
Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...
SCAPE Project
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
SCAPE Project
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven Schlarb
SCAPE Project
 
Content profiling and C3PO
Content profiling and C3POContent profiling and C3PO
Content profiling and C3PO
SCAPE Project
 
Control policy formulation
Control policy formulationControl policy formulation
Control policy formulation
SCAPE Project
 
Preservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, AarhusPreservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, Aarhus
SCAPE Project
 
An image based approach for content analysis in document collections
An image based approach for content analysis in document collectionsAn image based approach for content analysis in document collections
An image based approach for content analysis in document collections
SCAPE Project
 
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE Project
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionality
SCAPE Project
 
Automatic Preservation Watch
Automatic Preservation WatchAutomatic Preservation Watch
Automatic Preservation Watch
SCAPE Project
 
Policy levels in SCAPE
Policy levels in SCAPEPolicy levels in SCAPE
Policy levels in SCAPE
SCAPE Project
 

More from SCAPE Project (20)

C sz z6
C sz z6C sz z6
C sz z6
 
SCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with NaniteSCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with Nanite
 
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs AvailableSCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
 
SCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation ToolSCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation Tool
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
 
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
 
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
Migration of audio files using Hadoop, SCAPE Information Day, 25 June 2014
 
Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven Schlarb
 
Content profiling and C3PO
Content profiling and C3POContent profiling and C3PO
Content profiling and C3PO
 
Control policy formulation
Control policy formulationControl policy formulation
Control policy formulation
 
Preservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, AarhusPreservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, Aarhus
 
An image based approach for content analysis in document collections
An image based approach for content analysis in document collectionsAn image based approach for content analysis in document collections
An image based approach for content analysis in document collections
 
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionality
 
Automatic Preservation Watch
Automatic Preservation WatchAutomatic Preservation Watch
Automatic Preservation Watch
 
Policy levels in SCAPE
Policy levels in SCAPEPolicy levels in SCAPE
Policy levels in SCAPE
 

Recently uploaded

Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 

Recently uploaded (20)

Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

Jpylyzer, a validation and feature extraction tool developed in SCAPE project

  • 1. SCAPE Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool Johan van der Knijff1,2, René van der Ark1, Carl Wilson3 1 Koninklijke Bibliotheek – National Library of the Netherlands 2 Open Planets Foundation 3 The British Library IS&T, Archiving 2012, Copenhagen, 15.6.2012
  • 2. SCAPE Metamorfoze National Programme for preservation of paper heritage Digitisation as a means to conserve threatened paper originals 146 TB Migrate by end 2012 TIFF JP2
  • 3. SCAPE JP2 from JISC 1 Newspaper Collection (BL)
  • 4. SCAPE JP2 from JISC 1 Newspaper Collection (BL) “Well-formed and valid”
  • 5. SCAPE Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg Hardware failure may result in corrupted images
  • 6. SCAPE Not all encoders produce standard compliant images
  • 7. SCAPE Possible solutions Option 1 Improve JPEG 2000 module JHOVE But no institutional support, superseded by JHOVE2 (?) Option 2 Develop JPEG 2000 module for JHOVE2 Not ready for operational use (yet) Option 3 Develop dedicated tool
  • 8. SCAPE Jpylyzer tool 0 1 1 1 1 0 0 1 0 1 1 1 0 1 0 1 1
  • 9. SCAPE Jpylyzer tool - First prototype: December 2011 - Refactoring of original code: Jan 2012 - Packaging (Debian): Mar 2012 Univ. Southampton, KEEP Solutions, AIT Vienna - Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5)
  • 10. SCAPE JP2 file JPEG 2000 Signature box File Type box JP2 Header box (superbox) Contiguous Codestream box 0 Contiguous Codestream box n IPR box XML box(es) UUID box(es) UUID Info box(es) (superbox)
  • 16. SCAPE Example 1: detection of broken JP2s in JISC 1 Newspapers Number of images 2,152,116 Total size 45 TB Average image size 21.8 MB Number of threads 1 Time 21 days* Images/day/ thread 100,000 TB/day/thread 2 *Includes unzipping, actual time needed by jpylyzer much less!
  • 17. SCAPE Results - 676 broken JP2s in JISC 1 collection (0.03 %) TIFF originals still available - JISC 2 (> 1 million images): 3 broken JP2s - 19th Century books (> 22 million images): no broken JP2s
  • 18. SCAPE Example 2: quality control Metamorfoze migration 146 TB Migrate by end 2012 TIFF JP2
  • 19. SCAPE TIFF pixels no identical? pixel compare yes Aware JP2K SDK no valid JP2? JP2 Jpylyzer* yes image no properties compare properties match? yes properties profile pass fail *Imported as module in Python-based workflow
  • 20. SCAPE Example 3: pre-ingest quality control Wellcome Library - JP2s produced in-house and by external suppliers - Use jpylyzer to validate against JP2 spec - Use extracted properties to validate against a profile (Progression order, ratio, layers, ….) - Profile coded as XML schema (So jpylyzer output can be validated against schema)
  • 24. SCAPE Acknowledgements Debian packages - Dave Tarrant (Uni Southampton/OPF) - Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions), - Rainer Schmidt (AIT) Feedback on early versions - Christy Henshaw (Wellcome Library) - Ross Spencer (TNA) - Wouter Kool (KB)
  • 25. SCAPE Funding This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137). http://www.scape-project.eu #SCAPEProject