1. Getty Common Image Service
Research & Design Report
Stefano Cossu, Software Architect
J. P. Getty Trust <scossu@getty.edu>
2. About Getty Digital (GDI)
~2-year-old department
Very active evolution
Created to consolidate IT services for all Getty programs:
Hardware & networking infrastructure
Software development
Information management
Information access & security
Getty Common Image Service R&D Report—IIIF Meeting, Göttingen, Germany, June 2019 2
3. GDI's Grand Plan for IIIF
4. 1. Align with GDI mission
One infrastructure to serve images from all Getty programs:
Image delivery services
Metadata (IIIF Presentation) services
Access policies
Discovery services
5. 2. Start Simple
~150K images
Edward Ruscha's Streets of Los Angeles (GRI)
Museum Collection Open Content images
All media cleared for open access (i.e. no auth)
6. 3. Prepare to Grow Indefinitely
7. 3. Prepare to Grow Indefinitely
50M images
A/V media
Access control
Discovery
Annotations
ID management
ETL pipelines
…
8. The Report
9. Why?
Explore all options (easier with no production data!)
Scientific approach to a challenging project
Stimulate discourse over choices within Getty and the IIIF community
Possibly improve areas that we find in need of improvement
10. Areas Covered
Source image format
Image encoding (compression)
Image Server
Ancillary tools (caching, ETL, etc.)
11. 1+2. Source Image Format & Encoding
12. Source Image Benchmark
Criteria
Compatibility with selected image servers
Decoding speed
Image size
Encoding speed
13. Source Image Benchmark
Methodology
Run conversion on a batch of sample images with different color
topologies and geometries
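A minimal sketch of how such a batch conversion run could be timed. It assumes the libvips command-line tool (`vips tiffsave`) as the converter; the report does not name the tool used, and the function names, directory layout, quality and tile size below are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def ptiff_command(src, dst, quality=90, tile=256):
    """Build a libvips command that writes a tiled, pyramidal, JPEG-compressed TIFF."""
    return [
        "vips", "tiffsave", str(src), str(dst),
        "--tile", "--pyramid",
        "--compression", "jpeg", "--Q", str(quality),
        "--tile-width", str(tile), "--tile-height", str(tile),
    ]


def time_conversion(src, dst, run=subprocess.run):
    """Convert one image; record encoding time and resulting file size."""
    start = time.perf_counter()
    run(ptiff_command(src, dst), check=True)
    elapsed = time.perf_counter() - start
    size = dst.stat().st_size if dst.exists() else None
    return {"source": src.name, "seconds": elapsed, "bytes": size}


def benchmark(sample_dir, out_dir, run=subprocess.run):
    """Convert every *.tif in sample_dir (hypothetical layout) and collect results."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        time_conversion(src, out_dir / (src.stem + ".ptif.tif"), run)
        for src in sorted(sample_dir.glob("*.tif"))
    ]
```

The `run` parameter lets the harness be exercised without libvips installed; in a real run it would stay at its default, `subprocess.run`. Encoding time and output size cover two of the four criteria; decoding speed and server compatibility would need separate measurements.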
14. Source Image Benchmark
Format: Pyramidal TIFF (PTIFF) (our pick)
Very fast decoding
Established standard
Flexible compression options & tools
Limited IIIF server support
Lots of manual tweaking
15. Source Image Benchmark
Format: JPEG 2000 (JP2)
Very fast encoding (with Kakadu)
Slower decoding
Depends on proprietary software for decent performance
16. Source Image Benchmark
Encoding: JPEG (our pick)
Very space- & CPU-efficient
A classic…
17. Source Image Benchmark
Encoding: WebP
Higher image quality than JPEG for the same storage size
Limited support
18. Source Image Benchmark
Encoding: Lossless (LZW, LZMA, ZIP, etc.)
Highest image quality
CPU-intensive
19. Source Image Benchmark
Encoding: Uncompressed Data
Highest image quality
Lowest CPU usage
Not viable for the data volumes handled
20. 3. Image Server
22. Image Server Benchmark
Methodology
HTTP load test using Locust on several "axes":
Server software (in Docker containers)
Source image size (<10 Mp, 10–75 Mp, >75 Mp)
Derivative size and type (region, full)
Number of concurrent requests (10, 100, 1000)
All caches turned off
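The test axes above can be sketched as a matrix of runs, each of which would then be driven by a separate Locust session. This is only an illustration: the server base URLs, the fixed 1024-pixel region and the bin names are assumptions, not values from the report. The request paths follow the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`.

```python
from itertools import product

# Hypothetical base URLs for the three containerized servers.
SERVERS = {
    "iipimage": "http://iipimage.example/iiif",
    "cantaloupe": "http://cantaloupe.example/iiif/2",
    "loris": "http://loris.example",
}
SIZE_BINS = ("small", "medium", "large")  # <10 Mp, 10-75 Mp, >75 Mp
DERIVATIVES = {
    # "max" is a valid size keyword in Image API 2.1 and 3.0.
    "full": "full/max/0/default.jpg",
    "region": "0,0,1024,1024/max/0/default.jpg",  # assumed region size
}
CONCURRENCY = (10, 100, 1000)


def size_bin(width, height):
    """Assign a source image to one of the three megapixel bins."""
    megapixels = width * height / 1_000_000
    if megapixels < 10:
        return "small"
    if megapixels <= 75:
        return "medium"
    return "large"


def iiif_url(base, identifier, derivative):
    """Build an IIIF Image API request URL for one derivative type."""
    return f"{base}/{identifier}/{DERIVATIVES[derivative]}"


def build_matrix():
    """Every (server, size bin, derivative, concurrency) combination to test."""
    return list(product(SERVERS, SIZE_BINS, DERIVATIVES, CONCURRENCY))
```

With three servers, three size bins, two derivative types and three concurrency levels, the matrix contains 54 runs.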
23. Image Server Benchmark
IIPImage (our pick)
Most well-established software
Fastest processing by far
Most reliable delivery (0% failure rate)
Smallest resource footprint
Version 1.1 is out!
24. Image Server Benchmark
Cantaloupe
Fast (though not quite as fast as IIPImage)
Reliable (though not quite as reliable as IIPImage)
High resource usage
25. Image Server Benchmark
Loris
Customizable (GDI is a Python shop)
Very high failure rate under stress
Getty Common Image Service R&D Report—IIIF Meeting, Göttingen, Germany, June 2019 25
26. Outcome of Locust tests
27. Outcome of Locust tests
28. Load averages for iipsrv with 10, 100, 1000 connections
29. 4. Other Components (Custom Built)
30. Gateway Service
Provides a point of entry & exit for several request handling services:
API versioning
Caching
ID management and redirect service
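How the three gateway responsibilities might fit together can be sketched as follows. This is not the Getty implementation: the path layout (`/<version>/<service>/<id>/...`), the alias table, the version set and the `fetch` callable are all hypothetical.

```python
# Hypothetical alias table: legacy identifier -> canonical identifier.
ID_ALIASES = {"old-123": "abc123"}
SUPPORTED_VERSIONS = {"v1"}


def route(path):
    """Map a public path to (status, target).

    404: unknown API version or malformed path.
    301: legacy identifier; target is the canonical public path.
    200: target is the upstream path without the version prefix.
    """
    parts = path.strip("/").split("/", 3)
    if len(parts) < 3 or parts[0] not in SUPPORTED_VERSIONS:
        return 404, None
    version, service, identifier = parts[:3]
    rest = parts[3] if len(parts) > 3 else ""
    canonical = ID_ALIASES.get(identifier)
    if canonical is not None:
        return 301, "/".join(["", version, service, canonical, rest]).rstrip("/")
    return 200, "/".join(["", service, identifier, rest]).rstrip("/")


class Gateway:
    """Single entry point: versioning, ID redirects and a naive in-memory cache."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable(upstream_path) -> bytes, stands in for the image server
        self.cache = {}

    def handle(self, path):
        status, target = route(path)
        if status != 200:
            return status, target
        if target not in self.cache:
            self.cache[target] = self.fetch(target)
        return 200, self.cache[target]
```

A production gateway would need cache eviction and expiry; the dictionary here only shows where caching sits in the request flow.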
31. ETL Pipeline
Provides batch conversion of original images to PTIFFs and source
system metadata to IIIF manifests.
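The metadata half of this step could look roughly like the sketch below, which turns a source record into a IIIF Presentation manifest. The deck does not state which Presentation API version is targeted; version 2.1 is assumed here. The record shape, URL layout and function name are hypothetical.

```python
import json


def build_manifest(record, base_url):
    """Build a minimal IIIF Presentation 2.1 manifest from a source record.

    `record` is a hypothetical source-system shape:
    {"id": ..., "title": ..., "images": [{"id", "width", "height", "label"?}]}
    """
    canvases = []
    for n, img in enumerate(record["images"], start=1):
        canvas_id = f"{base_url}/presentation/{record['id']}/canvas/{n}"
        service_id = f"{base_url}/image/{img['id']}"  # IIIF Image API base URI
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": img.get("label", str(n)),
            "width": img["width"],
            "height": img["height"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": f"{service_id}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": img["width"],
                    "height": img["height"],
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": service_id,
                        "profile": "http://iiif.io/api/image/2/level1.json",
                    },
                },
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{base_url}/presentation/{record['id']}/manifest",
        "@type": "sc:Manifest",
        "label": record["title"],
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


if __name__ == "__main__":
    sample = {"id": "obj1", "title": "Sample object",
              "images": [{"id": "img1", "width": 4000, "height": 3000}]}
    print(json.dumps(build_manifest(sample, "https://example.org"), indent=2))
```

Each canvas carries one painting annotation whose `service` block points at the image server, which is what lets a viewer request tiles from the converted PTIFF.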
32. The Stuff
33. INFO: The report has a Getty-centric scope; however, some outcomes could be useful to the general public.
WARNING: The Getty does not intend to maintain the software used for the report in the long term.
34. Report Contents
Google Drive Folder containing:
Report (PDF)
Docker containers with server setups
Reference data set
Benchmark tools