SCAPE

Improved validation and feature 
extraction for JPEG 2000 Part 1:
the jpylyzer tool
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
1 Koninklijke Bibliotheek, National Library of the Netherlands
2 Open Planets Foundation
3 The British Library



IS&T, Archiving 2012, Copenhagen, 15.6.2012
Metamorfoze
National Programme for the preservation of paper heritage
Digitisation as a means to conserve threatened paper originals

146 TB of TIFF images, to be migrated to JP2 by the end of 2012
JP2 from JISC 1 Newspaper Collection (BL)

“Well-formed and valid”
Hardware failure may result in corrupted images

Image source: http://img70.imageshack.us/img70/9950/serversnm2.jpg
Not all encoders produce standard-compliant images
Possible solutions

Option 1
Improve the JPEG 2000 module of JHOVE
But: no institutional support, superseded by JHOVE2 (?)

Option 2
Develop a JPEG 2000 module for JHOVE2
But: JHOVE2 not ready for operational use (yet)

Option 3
Develop a dedicated tool
Jpylyzer tool
- First prototype: December 2011
- Refactoring of original code: Jan 2012
- Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)
- Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5)
JP2 file

- JPEG 2000 Signature box
- File Type box
- JP2 Header box (superbox)
- Contiguous Codestream box 0
- ...
- Contiguous Codestream box n
- IPR box
- XML box(es)
- UUID box(es)
- UUID Info box(es) (superbox)
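The box layout above can be walked with a few lines of Python. This is a minimal sketch and not jpylyzer's own code. It relies on the box header defined in JPEG 2000 Part 1: a 4-byte big-endian length, a 4-byte type, and an 8-byte extended length when the length field equals 1 (a length of 0 means the box runs to the end of the file). The function name `iter_boxes` and the tiny hand-built sample are assumptions made for illustration; the sample is not a decodable image.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, offset, length) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        lbox, tbox = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if lbox == 1:
            # Extended length: real size is in the 8 bytes after the type.
            if offset + 16 > end:
                raise ValueError("truncated extended-length box header")
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif lbox == 0:
            # Box extends to the end of the enclosing data.
            length = end - offset
        else:
            length = lbox
        if length < header or offset + length > end:
            raise ValueError(f"bad box length at offset {offset}")
        yield tbox.decode("latin-1"), offset, length
        offset += length


# Hand-built sample (illustrative only): signature, file type,
# JP2 header superbox holding one image header box, and a codestream box.
signature = struct.pack(">I4sI", 12, b"jP  ", 0x0D0A870A)
file_type = struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 ")
image_header = struct.pack(">I4sIIHBBBB", 22, b"ihdr", 100, 200, 1, 7, 7, 0, 0)
jp2_header = struct.pack(">I4s", 8 + len(image_header), b"jp2h") + image_header
codestream = struct.pack(">I4s", 0, b"jp2c") + b"\xff\x4f\xff\xd9"
sample = signature + file_type + jp2_header + codestream

top_level = list(iter_boxes(sample))
# A superbox is parsed by running the same loop over its payload.
_, jp2h_offset, jp2h_length = top_level[2]
children = list(iter_boxes(sample, jp2h_offset + 8, jp2h_offset + jp2h_length))
```

A validator does far more than list boxes (required order, required boxes, field values, codestream markers), but every such check starts from this kind of traversal.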
Command-line use

Result

Properties extraction (excerpt)

Properties of embedded ICC profile

Documentation
Example 1: detection of broken JP2s in JISC 1 Newspapers

     Number of images          2,152,116
     Total size                45 TB
     Average image size        21.8 MB
     Number of threads         1
     Time                      21 days*
     Images/day/thread         100,000
     TB/day/thread             2

    *Includes unzipping; the actual time needed by jpylyzer is much less!
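The two throughput rows follow directly from the rows above them. The short calculation below reproduces them from the slide's own figures; the variable names are illustrative.

```python
# Figures taken from the table above.
images = 2_152_116
total_size_tb = 45
days = 21
threads = 1

images_per_day_per_thread = images / days / threads
tb_per_day_per_thread = total_size_tb / days / threads

print(f"{images_per_day_per_thread:,.0f} images/day/thread")
print(f"{tb_per_day_per_thread:.1f} TB/day/thread")
```

This gives roughly 102,000 images and 2.1 TB per day per thread, which the slide rounds to 100,000 and 2.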
Results

- 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available
- JISC 2 (> 1 million images): 3 broken JP2s
- 19th Century books (> 22 million images): no broken JP2s
Example 2: quality control Metamorfoze migration

146 TB of TIFF images, to be migrated to JP2 by the end of 2012
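- TIFF is converted to JP2 with the Aware JP2K SDK
- Pixel compare of TIFF and JP2: pixels identical? If no: fail
- Jpylyzer* on the JP2: valid JP2? If no: fail
- Compare the extracted image properties with the properties profile: properties match? If no: fail
- If all three answers are yes: pass

*Imported as module in Python-based workflow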
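The decision logic of this workflow can be sketched as one Python function. This is an illustration under stated assumptions, not the KB's production code: the function name, the argument names and the property keys are hypothetical, the pixel comparison and the jpylyzer call are assumed to have run already, and the order of the three checks is a choice made here (the flowchart does not impose one).

```python
def check_migrated_image(pixels_identical, is_valid_jp2, properties, profile):
    """Return ("pass", "") or ("fail", reason) for one TIFF/JP2 pair.

    pixels_identical: result of the pixel compare (bool)
    is_valid_jp2:     validation result reported for the JP2 (bool)
    properties:       properties extracted from the JP2 (dict)
    profile:          required property values (dict)
    """
    if not pixels_identical:
        return "fail", "pixels differ from TIFF"
    if not is_valid_jp2:
        return "fail", "not a valid JP2"
    # Every property named in the profile must have the required value.
    mismatched = sorted(
        name for name, required in profile.items()
        if properties.get(name) != required
    )
    if mismatched:
        return "fail", "properties deviate from profile: " + ", ".join(mismatched)
    return "pass", ""


# Hypothetical profile and extracted properties.
profile = {"transformation": "5-3 reversible", "layers": 1}
good = {"transformation": "5-3 reversible", "layers": 1, "levels": 5}
bad = {"transformation": "9-7 irreversible", "layers": 1, "levels": 5}

result_good = check_migrated_image(True, True, good, profile)
result_bad = check_migrated_image(True, True, bad, profile)
result_invalid = check_migrated_image(True, False, good, profile)
```

Returning a reason with each failure makes it easier to sort rejected images by cause when a large batch is processed.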
Example 3: pre-ingest quality control Wellcome Library

- JP2s produced in-house and by external suppliers
- Use jpylyzer to validate against JP2 spec
- Use extracted properties to validate against a profile (progression order, compression ratio, layers, ...)
- Profile coded as XML schema (so jpylyzer output can be validated against the schema)
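The idea of checking extracted properties against a profile can be illustrated with the Python standard library alone. Note the substitution: the slide describes a profile coded as an XML schema, whereas this sketch uses a plain dictionary as a simplified stand-in, because the standard library has no schema validator. The XML sample, its element names and the profile values are assumptions made for illustration and may not match real jpylyzer output.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical output; element names are assumptions.
SAMPLE_OUTPUT = """<jpylyzer>
  <isValidJP2>True</isValidJP2>
  <properties>
    <compressionRatio>19.8</compressionRatio>
    <contiguousCodestreamBox>
      <cod>
        <order>RPCL</order>
        <layers>8</layers>
      </cod>
    </contiguousCodestreamBox>
  </properties>
</jpylyzer>"""

# Hypothetical profile: element name -> required text value.
PROFILE = {"order": "RPCL", "layers": "8"}


def profile_mismatches(xml_text, profile):
    """Return {name: (required, found)} for every property that deviates."""
    root = ET.fromstring(xml_text)
    mismatches = {}
    for name, required in profile.items():
        # First element with this tag anywhere in the tree, or None.
        element = next(root.iter(name), None)
        found = element.text if element is not None else None
        if found != required:
            mismatches[name] = (required, found)
    return mismatches


ok = profile_mismatches(SAMPLE_OUTPUT, PROFILE)
wrong_layers = profile_mismatches(SAMPLE_OUTPUT, {"layers": "1"})
missing = profile_mismatches(SAMPLE_OUTPUT, {"levels": "5"})
```

A schema-based profile has the advantage that ranges and allowed value sets (for example a maximum compression ratio) can be expressed declaratively and checked by any schema-aware XML tool.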
Platforms and licensing

http://www.openplanetsfoundation.org/software/jpylyzer

Community involvement
Acknowledgements

Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
- Rainer Schmidt (AIT)

Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
Funding

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

http://www.scape-project.eu

#SCAPEProject
