IWM DAMS
1. DAMbusters: IWM’s mission to design
and implement a bespoke DAMS
15 September 2017
Emily Dodd, Head of Collections Development & Information
Rosie Forrest, Collections Systems Manager
Rob Tyler, IT Infrastructure Manager (DAMS)
2. Overview
Our digital assets, and how we use them
Why we need a DAMS – and why an Axiell DAMS
Commissioning the DAMS development
Phases 1 and 2 of IWM’s DAMS project
Infrastructure, migration, technical aspects
Your DAMS projects?
3. Over 535,000 assets and 500TB so far:
4,000 film scan masters 450TB (DPX)
93,000 audio masters 10TB (WAV)
380,000 image masters 10TB (TIFF)
58,000 access renditions 30TB (ProRes, MPEG, Flash, MP3, JPEG)
But this is increasing daily:
• “7000 Project” for films generates 10TB per month
• Videotape scanning project will generate 900TB over four years
• Ongoing born-digital acquisition
Our digital assets
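The growth figures on this slide lend themselves to a rough storage projection. The sketch below assumes both projects grow linearly, start now, and simply add to the 500TB already held. None of that is stated in the talk, and born-digital acquisition is left out because no rate is given for it.

```python
# Rough storage projection from the figures on this slide.
# Assumptions (not from the talk): linear growth, both projects start today,
# and the "7000 Project" keeps running for the whole period.

CURRENT_TB = 450 + 10 + 10 + 30           # film, audio and image masters + renditions
FILM_7000_TB_PER_MONTH = 10               # "7000 Project"
VIDEOTAPE_TB_PER_MONTH = 900 / (4 * 12)   # 900TB spread evenly over four years


def projected_tb(months: int) -> float:
    """Estimated holdings in TB after `months`, ignoring born-digital intake."""
    # The videotape project only contributes during its four-year run.
    videotape_months = min(months, 48)
    return (CURRENT_TB
            + FILM_7000_TB_PER_MONTH * months
            + VIDEOTAPE_TB_PER_MONTH * videotape_months)


if __name__ == "__main__":
    for years in (1, 2, 4):
        print(f"After {years} year(s): about {projected_tb(years * 12):,.0f} TB")
```

On these assumptions the holdings would pass 1PB within two years, which is why storage forecasting sits in the project team's remit.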
4. Born-digital
• Official film and photographs from MOD under the PRA
• Portraits and film captured by our staff
• Sound interviews
• Video and mixed media artworks
Digitised
• Images of collections objects
• Scanned documents
• Optimised photographs
• Digitised sound and video for preservation
How we acquire digital assets
5. How we use digital assets
Commercial
• TV programme and documentary makers
• Film studios
• Direct sales: image and film
• Publishing, licensing and brand
Research
• Staff use
• Members of the public – remote access, digital copying
• Specialist and academic researchers
• Supporting collections management: condition, hazard etc
Access
• Embedded in exhibitions and audio-visual displays
• IWM website & Collections Online (6.2 million visits in 2016-17)
• Digital delivery of copies
7. For the same reasons that we have
secure stores, display spaces and
shops:
• Preservation
• Access
• Commercial (c£900k income pa)
• Evidencing we meet ISO 16363
(key for retaining our public record
deposits and ensuring
stakeholders can be sure our
digital preservation capability is fit
for purpose)
So we need a DAMS….
8. • We wanted a DAMS that was responsive to museums’ needs,
not just commercial needs
• We knew Axiell understood IWM’s challenges and infrastructure
• Adlib is one of our core corporate systems
• We needed reliable, future-proof Adlib integration
• We’ve successfully commissioned bespoke Axiell development
in the past
• Simplicity – one key supplier
Why an Axiell DAMS?
9. IWM
Emily Dodd Budget and management assurance
Rosie Forrest Technical specs, Adlib integration, user comms
Rob Tyler Infrastructure, technical specs, storage forecasting
David Walsh Film expertise and voice of the end-user
Axiell
Alex Fell Project management and client comms
Ryan Martin, Joanna Preston, Dan Moran, Christ Hagenaars, Giuseppe
Davies, Ian Brown – the Development team
Our project team
10. Specification (started Dec 2015)
• Detailed technical requirements
• User consultation and wish-list
Phase 1 (live Nov 2016)
• Proof of concept – still image ingest
• Technical database and UX
• Getting enough technical data to manage assets
Phase 2 (live April 2017)
• Major advance – time-based media ingest
• All jobs run on servers, not on user’s PC
Phase 3 (in progress!)
• Refinements – improved user experience for ingest
• Improved asset retrieval
• Audit and manage stored assets
• Additional functions from our “nice to have” list
Project summary
11. Specification
Digital preservation
Integrate with existing systems and infrastructure
Automated batch ingest
Easy but secure access to media
Easy to use, monitor and upgrade
13. UMID = Unique Material Identifier
Network ► server ► volume ► YYYY ► MMDD ► HHMM ► UMID ► filename.type
0x060A2B340101010201010F12130000006DFAD8174478DE39000000155D065A03
Universal Label Material Number Time/ Date Coordinates Org/ User
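A minimal sketch of how the UMID on this slide could be split into fields and used to build the storage path. The field boundaries assume the basic 32-byte UMID layout from SMPTE ST 330 (12-byte universal label, 1-byte length, 3-byte instance number, 16-byte material number). The server and volume names, and the idea that the date folders come from an ingest timestamp, are placeholders rather than details from the talk.

```python
from datetime import datetime
from pathlib import PurePosixPath

# The example UMID shown on the slide (64 hex characters = 32 bytes).
SLIDE_UMID = "0x060A2B340101010201010F12130000006DFAD8174478DE39000000155D065A03"


def split_basic_umid(umid_hex: str) -> dict:
    """Split a basic UMID into four fields (layout assumed from SMPTE ST 330)."""
    if umid_hex.lower().startswith("0x"):
        umid_hex = umid_hex[2:]
    raw = bytes.fromhex(umid_hex)
    if len(raw) != 32:
        raise ValueError("expected a 32-byte basic UMID")
    return {
        "universal_label": raw[:12].hex().upper(),
        "length": raw[12],                      # bytes that follow the length byte
        "instance_number": raw[13:16].hex().upper(),
        "material_number": raw[16:].hex().upper(),
    }


def storage_path(server, volume, when, umid_hex, filename):
    """Build server/volume/YYYY/MMDD/HHMM/UMID/filename, as laid out on the slide."""
    return PurePosixPath(server, volume,
                         f"{when:%Y}", f"{when:%m%d}", f"{when:%H%M}",
                         umid_hex, filename)


if __name__ == "__main__":
    print(split_basic_umid(SLIDE_UMID))
    # "server01" and "vol1" are hypothetical names.
    print(storage_path("server01", "vol1", datetime(2017, 9, 15, 10, 30),
                       SLIDE_UMID[2:], "portrait.tif"))
```

In day-to-day use the identifier is treated as opaque, so a split like this is only useful for checking that a string really is a well-formed UMID.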
14. Technical database as UMID translator
Input UMID to http service
Returns basic XML showing filepath
15. Adlib servers
DAMS servers
Media record
Adlib API
Metadata
Version
Master UMID
Ingest data
Rendition UMIDs
Object record
Keywords
Person names
Places
Descriptions
Dates
Collections
Media type
Media service record
UMID
File path
Image reference
Rendition type
FILE
Metadata
Conservation
Location
Exhibition
Management
Acquisition
Entry
Quality
How a search returns an image
Adlib API
Image reference
Firewalls
16. How a search returns an image
The same diagram as slide 15, labelled with three layers:
Context
Usability
Delivery
25. Asynchronous microservices
DAMS servers
Storage servers
User processes
Original storage
Database servers
Ingest UI (START)
Visualisation UI (END)
File
Ingest service
Storage service
Metadata extraction service
Database interaction service
Adlib API
Adlib
Transcode service
Video transcoders
Audio transcoders
Image transcoders
Rendition (one from each transcoder)
26. “…keeping digital material alive so that they remain usable as
technological advances render original hardware and software
specification obsolete.”
Harrod’s Librarians’ Glossary via Wikipedia
Digital preservation
Storage data is abstracted
Files are read-only
Easy to upgrade/ add modules
Transcoding for renewed formats
27. Phase 3 of our Axiell DAMS project is just starting
Audit/ checksum comparison tool
Enhanced transcode capability
Delete unwanted renditions or duplicate media
Improved user interfaces
Better access to media
What’s next for IWM
Emily
Today we want to share our experiences of commissioning Axiell to develop and build a bespoke DAMS for us.
I will be telling you about who we are, our analogue and digital holdings, why the right DAMS is so important to us and how the project started, then I will hand over to Rosie and Rob for their technical expertise and more information about how we handle our digital assets.
Emily
Alongside the analogue collections we have over half a million digital assets, mostly images. This doesn’t include backups and duplicates.
Though we have fewer film files in number, they are by far the largest files and we have several current projects which are adding to this at a fairly steady rate.
Emily
We acquire our born-digital assets in exactly the same ways as our analogue items:
Many come from the MOD as official deposits, which are public records, and others from service personnel and civilians, so they can arrive in all sorts of formats, including head-cam and mobile phone footage.
Our own teams of staff take portrait images, create video footage and record sound interviews, and we collect some video and mixed-media artworks.
As well as the born-digital assets, we are continually digitising parts of our collections for access and preservation.
Emily
And we use these digital assets, both digitised and born-digital, in support of all of our objectives and across all our commercial and non-commercial enterprises.
Emily
Most of all, having a DAMS enables people like me to retrieve our digital assets to illustrate our collection. Rosie will talk through the technical details of the formats and renditions that we hold, but I wanted an excuse to show one of the quirkier short films that we have.
This is a Second World War Ministry of Information newsreel film which encouraged the public to walk short distances instead of taking up much-needed space on public transport.
I hope you enjoy it
NPB 13536 AGAG [Main Title]
George the "transport hog"
Emily
So to manage all these assets, and enable us to make use of them, we need a reliable DAMS. We need to be able to preserve our collections and make them accessible.
On top of that, our film and image sales are a key part of our income stream, so the DAMS isn’t just nice to have, it’s business critical.
Just as we comply with external standards in our physical collections stores, we need to comply with the relevant standards for our digital storage, in this case ISO 16363, so that we can prove to stakeholders that we are a suitable repository.
Emily
There was no question that we needed a DAMS. We already had a DAMS in place, since about 2008, but we were increasingly finding that we needed a more configurable and integrated solution.
Naturally we turned to Axiell as the bespoke systems development projects we have done with them in the past have been successful and Adlib is our central collections management system.
We had reached a point of stability in understanding exactly what we needed, as well as investing in systems and infrastructure expertise. For me as the budget holder for both systems, having one supplier and a fully integrated solution was ideal.
I will talk more about the project management aspects later.
Emily
We put together a small team at our end and benefitted from an excellent Axiell team under Alex’s management. This was the first time we’d done a project like this with the UK office instead of Maarssen.
The team planned the time to come to our site, understand our specs and learn exactly what we needed and how it would be used.
Emily, and hand over to Rosie
We started planning in December 2015, with a long period of specification in which we made a detailed assessment of what we needed and wanted, and tested the first tools ready for phase 1.
Phase 1 was all about building the core of the system and testing it with easier tasks, the ingest and transcoding of still images. This gave us proof of concept, and the confidence that a bespoke system would work.
Phase 2 took on the bigger challenge of time-based media (sound and video) and was about scaling the system up to deal with larger files and bigger projects.
Phase 3 will be about refining our workflows based on months of experience using the system for live ingest, and securing long term sustainability for the system and our digital processes.
And I’m going to hand over to Rosie to tell you more about each phase and how the system works
Rosie
I’m going to talk about the system rather than the project.
Here are the key technical specifications we worked to at IWM
Digital preservation
Integrate with existing systems and infrastructure
Automated batch ingest
Easy but secure access to media
Easy to use, monitor and upgrade
Today’s audience is a mix of the very experienced and those just starting out on their DAMS projects, so I’ll explain the core DAMS concepts that allowed us to meet these needs.
Rosie
For me it was important to understand the difference between:
Digital Asset Management - which could in theory be a nice filing system and neat data standard
A Digital Asset Management System – which provides that nice filing and data standard, but using machine processes and automated workflows
I had to learn how to trust my system – I don’t want to be in the control room, I just want to see the outcome.
We couldn’t integrate it, meet digital preservation needs, or deal with the volume of data that we have without automating a lot of processes.
Rosie
One of the ways we have arranged our assets to be more machine-friendly is the UMID
UMID - “unique material identifier” – is a long string of characters providing a unique reference to every file
It is not very human friendly
Globally unique SMPTE standard
Some embedded meaning of unique codes and file characteristic information
But essentially we treat it as a “dumb” identifier
Allows a machine to handle filing for me
The UMID is present in all database records relating to the file
And the UMID is embedded in the filepath
But we maintained a human-friendly filename
A readable and relevant filename makes export much easier
Rosie
But how do I get a file from a UMID?
Using the technical database – we call ours MediaService
A simple data structure matching UMID to filepath - a “UMID translator”
Keep file storage data separate from everything else, e.g. usability data
Accessible via HTTP lookup – returns basic XML containing a filepath
Give an API the HTTP address and UMIDs = automated file delivery
Users only get the UMID via appropriate database and API access
Which we already have in place to protect our data
Cannot extrapolate file location (via UMID or other context)
Other benefits of keeping storage data separate - later
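A minimal sketch of the "UMID translator" lookup described above: send a UMID over HTTP, get back basic XML, pull out the filepath. The endpoint URL, the `umid` query parameter and the `<filepath>` element name are all invented for illustration, since the talk does not give the real MediaService interface.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; the real MediaService address is not given in the talk.
MEDIASERVICE_URL = "http://mediaservice.example/lookup"

# Hypothetical response shape: "basic XML containing a filepath".
SAMPLE_RESPONSE = """<media>
  <umid>060A2B340101010201010F12130000006DFAD8174478DE39000000155D065A03</umid>
  <filepath>/server01/vol1/2017/0915/1030/060A2B340101010201010F12130000006DFAD8174478DE39000000155D065A03/portrait.tif</filepath>
</media>"""


def lookup_url(umid: str) -> str:
    """Build the lookup address for one UMID."""
    return f"{MEDIASERVICE_URL}?{urlencode({'umid': umid})}"


def filepath_from_xml(xml_text: str) -> str:
    """Extract the filepath from the service's XML reply."""
    node = ET.fromstring(xml_text).find("filepath")
    if node is None or not node.text:
        raise LookupError("no filepath in response")
    return node.text.strip()


def fetch_filepath(umid: str) -> str:
    """Ask the (hypothetical) service where the file for this UMID lives."""
    with urlopen(lookup_url(umid), timeout=10) as response:
        return filepath_from_xml(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(filepath_from_xml(SAMPLE_RESPONSE))
```

Because the caller only ever holds a UMID, the storage location can change without any other system needing to know.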
Rosie
Sounds complicated – how does this work in practice?
When we search for an image or digital copy of the collections, we might search by description, keyword, creator, title... etc.
A digital asset is not much use if we don’t know what it depicts.
A data request from the Adlib API gets this info from object catalogue data.
Internal link between objects catalogue and media catalogue
In media catalogue we store descriptive and accountability data about the asset, plus the UMIDs
Another instance of the Adlib API filters the media record for an appropriate version and picks that UMID
API sends single UMID off to the technical database
…where we can get the actual file-path
…and the system can serve the appropriate media to the requester
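The chain of lookups above can be sketched with three small in-memory stores standing in for the object catalogue, the media catalogue and the technical database. Every record, identifier and path here is invented; in the real system each step is an Adlib API call or an HTTP lookup rather than a dictionary access.

```python
# Object catalogue: descriptive data plus links to media records.
OBJECTS = {
    "OBJ-1": {"description": "Newsreel about public transport", "media": ["MEDIA-1"]},
    "OBJ-2": {"description": "Portrait photograph", "media": ["MEDIA-2"]},
}

# Media catalogue: one record per asset, holding master and rendition UMIDs.
MEDIA = {
    "MEDIA-1": {"master": "UMID-A", "renditions": {"mpeg": "UMID-B", "prores": "UMID-C"}},
    "MEDIA-2": {"master": "UMID-D", "renditions": {"jpeg": "UMID-E"}},
}

# Technical database: UMID to filepath, nothing else.
MEDIA_SERVICE = {
    "UMID-B": "/store/2017/0915/1030/UMID-B/newsreel.mpg",
    "UMID-E": "/store/2017/0915/1031/UMID-E/portrait.jpg",
}


def find_files(term: str, rendition_type: str) -> list:
    """Search descriptions, pick the requested rendition, resolve it to a path."""
    paths = []
    for record in OBJECTS.values():
        if term.lower() not in record["description"].lower():
            continue                       # step 1: object catalogue search
        for media_id in record["media"]:   # step 2: follow link to media record
            umid = MEDIA[media_id]["renditions"].get(rendition_type)
            if umid is None:
                continue                   # no suitable version of this asset
            path = MEDIA_SERVICE.get(umid) # step 3: UMID to filepath
            if path is not None:
                paths.append(path)
    return paths


if __name__ == "__main__":
    print(find_files("newsreel", "mpeg"))
```

Note that the master UMIDs never resolve here: the requester only gets what the middle layer chooses to hand on.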
Rosie
In summary, we link:
Context: what does the asset show and how is it important and appropriate
Usability: Can I play/ print this file – or a version of it
Delivery: securely serve up the file
Link between context and usability in this way means no duplication of effort in adding contextual data that make our digital assets meaningful
Any user wanting to access a file (for whatever reason not just web) can only get it if the Adlib database and API let them
So we can ensure each service is only getting the data it needs to deliver appropriate file
Rosie
Each object catalogue record can have more than one asset attached
Each with its own media catalogue record
We might have multiple versions of the same asset, e.g. the original photo and a cropped and optimised version
Rosie
Or different views or pages
Rosie
Or a raw scan showing a glass negative in an original and untouched state, plus a cleaned, adjusted version ready for display or commercial use.
For each asset, a media record holds usability data:
Version and quality data help us see how that file is ready to use.
Technical metadata helps us choose the best file – maybe the biggest, or the most recent, or a particular format.
Metadata is extracted or generated at ingest using MediaInfo
Rosie
Something else that happens at ingest is transcoding
This shows a 10 minute film and its access copies
The Master is 23x bigger than the MPEG…
Each rendition is a direct copy of the original in a widely supported and generally smaller format
For ease of access
And to reserve our original quality high-resolution assets for authorised use only
The process of making these copies is Transcoding
Each rendition file is assigned its own UMID at creation
This UMID data is stored as a sub-group of the main Adlib media record
It can be accessed in the same way and we can use the API to select the best rendition per service
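One way to picture "select the best rendition per service" is a preference list per consuming service, checked against the technical metadata held for each rendition. The service names, preference orders and file sizes below are assumptions made for the sketch, not IWM's actual rules.

```python
# Renditions of one asset, as they might appear in a media record.
# UMIDs and sizes are invented.
RENDITIONS = [
    {"umid": "UMID-B", "format": "mpeg", "bytes": 400_000_000},
    {"umid": "UMID-C", "format": "prores", "bytes": 6_000_000_000},
    {"umid": "UMID-F", "format": "mpeg", "bytes": 250_000_000},
]

# Hypothetical services and the formats each would accept, best first.
SERVICE_FORMATS = {
    "website": ["mpeg", "flash"],
    "edit_suite": ["prores", "mpeg"],
}


def best_rendition(renditions, service):
    """Return the largest rendition in the first format the service accepts."""
    for fmt in SERVICE_FORMATS[service]:
        matches = [r for r in renditions if r["format"] == fmt]
        if matches:
            return max(matches, key=lambda r: r["bytes"])
    return None  # nothing suitable: the service gets no file


if __name__ == "__main__":
    print(best_rendition(RENDITIONS, "website"))
    print(best_rendition(RENDITIONS, "edit_suite"))
```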
Rosie
Our data formats and renditions
(DPX = digital picture exchange)
We use FFmpeg
Open source, with plenty of guidance available online
Uses a string of commands to produce an output file
Lots of options and filters
Axiell help us to set up our core commands
We can easily change these as our needs develop…
…simply by updating the command in a text editor
Bit fiddly, very flexible
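To illustrate "updating the command in a text editor": the command can live as one line of text with placeholders, and the system fills in the file names per job. The template below is a generic H.264/AAC example using standard FFmpeg options, not one of IWM's actual commands, and the file names are made up.

```python
import shlex

# A command template as it might be kept in a plain-text config file.
# Illustrative only; the real core commands are not shown in the talk.
TEMPLATE = "ffmpeg -i {src} -c:v libx264 -crf 23 -c:a aac -b:a 128k {dst}"


def build_command(template: str, src: str, dst: str) -> list:
    """Turn the text template into an argument list for one file.

    The template is split first and filled in second, so file names
    containing spaces stay as single arguments.
    """
    return [part.format(src=src, dst=dst) for part in shlex.split(template)]


if __name__ == "__main__":
    command = build_command(TEMPLATE, "master scan.mov", "access_copy.mp4")
    print(command)
    # To actually run it (requires FFmpeg to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Changing the output quality is then a matter of editing the one template line, for example a different `-crf` value.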
Rosie
That’s quite a lot happening at ingest – how do we manage it all?
Machine workflows for setting metadata and transcode rules
Mapping key data about the file to Adlib fields
Here’s the Phase 1 config builder – to be improved!
Set transcode rules – what copy or copies do I want
Set field mapping rules
Regular expressions use patterns in the filename to find the matching Adlib collections object record
Apply these rules to a batch of files
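A sketch of a field-mapping rule applied to a batch. The filename convention (`PREFIX_number_view.ext`), the object-number format and the list of known object numbers are all hypothetical; in the real tool the check is made against Adlib.

```python
import re

# Hypothetical naming convention: ART_001234_a.tif -> object number "ART 1234".
FILENAME_RULE = re.compile(
    r"^(?P<prefix>[A-Z]+)_(?P<number>\d+)(?:_(?P<view>\w+))?\.(?P<ext>tif|wav|dpx)$",
    re.IGNORECASE,
)

# Stand-in for "does an object number exist for this file?" in Adlib.
KNOWN_OBJECT_NUMBERS = {"ART 1234", "PHO 77"}


def object_number_for(filename: str):
    """Return the object number implied by the filename, or None if no match."""
    match = FILENAME_RULE.match(filename)
    if match is None:
        return None
    return f"{match['prefix'].upper()} {int(match['number'])}"


def apply_rule(filenames):
    """Split a batch into files matched to a known object and files that are not."""
    matched, unmatched = {}, []
    for name in filenames:
        number = object_number_for(name)
        if number in KNOWN_OBJECT_NUMBERS:
            matched[name] = number
        else:
            unmatched.append(name)   # shown to the user for manual editing
    return matched, unmatched


if __name__ == "__main__":
    print(apply_rule(["ART_001234_a.tif", "PHO_77.tif", "holiday.tif", "ART_9.tif"]))
```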
Rosie
Here’s what it looks like. Step through slide versions of ingest UI on a batch of files.
I open the ingest tool and add the folder where my files are batched
The files load in the preview window
I select a ruleset that contains metadata and transcode rules
The ruleset uses regular expressions to test my filenames against Adlib data – does an object number exist for this file?
I’ve removed some of the columns – you can still see some of the data preview of what will be written to Adlib
What about if the regular expression didn’t match? I can un-hide those items
I can actually edit those items too, and change the data before it goes into Adlib
So now I can start ingest
And go straight to the visualisation tool to see job messages as they complete
Rosie
Step through slide versions of visUI in action
Screenshot of the report my user gets when they submit a job
And progressions as that report develops
A message back from each part of the system as it completes or errors
No linear sequence of messages but the visualiser helps to categorise them
Phase 3 will refine this
Rosie
Here’s what’s happening after I send the initial messages:
We have the ingestUI and visualiser.
Here’s that one message we sent, and all the little messages we got back.
The ingest service splits messages off to each module telling them which rules apply to which files, and each part of the system reports back when done. They can all be working at once on different files, jobs, rules – asynchronous.
When Axiell built this system, the idea was that each aspect of ingest could be handled separately, so that if we ever needed to upgrade, change or add a service, it would be much easier than picking apart one bulky process. It can also be scaled across multiple servers: for example, a new transcode server is coming soon to share that part of the load. This system will support our long-term expansion.
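The "one message in, lots of little messages back" pattern can be imitated in a few lines with Python's asyncio. This is only an analogy: the real services are separate processes exchanging messages across servers, whereas here they are coroutines in one program, and the service names and delays are invented.

```python
import asyncio

# Hypothetical services and how long each pretends to take (seconds).
SERVICES = {"storage": 0.02, "metadata": 0.01, "transcode": 0.03}


async def run_service(name, delay, filename, report):
    """Do one service's work on one file, then report back."""
    await asyncio.sleep(delay)              # stands in for the real work
    report.append((name, filename, "done"))


async def ingest(filenames):
    """Fan one job out to every service for every file; collect the replies."""
    report = []
    jobs = [run_service(name, delay, filename, report)
            for filename in filenames
            for name, delay in SERVICES.items()]
    await asyncio.gather(*jobs)             # all services work at once
    return report


if __name__ == "__main__":
    for message in asyncio.run(ingest(["a.tif", "b.wav"])):
        print(message)
```

The replies arrive in completion order rather than submission order, which is the reason a visualiser is needed to group them.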
Rosie
I’ve covered getting data out of the system, getting data in – what about ongoing maintenance of that data once it is in?
A nice quote from Wikipedia which puts it very concisely!
Here’s how the data structures and processes support our infrastructure needs, which in turn support many digital preservation requirements.
Storage data is abstracted from content and use data (tech db)
Tech db makes it easy to migrate to new hardware – single point of data change
Quickly identify all files at any location
Even if file use data changes, the UMID is consistent
Files are read-only as machine storage = No need for human interference with stored files
As files are stored read-only, backup rarely changes
We can easily add new modules
We can re-transcode as new formats are needed
Rosie
Here’s what’s next for us:
We’ll look at long term management of stored assets as our infrastructure evolves.
So phase 3 will include audit tools including testing checksums.
We’ll need to re-transcode assets into more modern formats.
We might want to delete unwanted renditions or duplicate media.
We’ll always want to obtain statistics on the collection.
We’ll be working with Axiell to improve the user interfaces
I’d like to improve the way media is accessed for download and viewing
Overall happy with my system – what about yours?
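As an illustration of the checksum audit planned for phase 3: record a checksum for each stored file, then periodically recompute and compare. The choice of SHA-256 and the shape of the manifest (path to checksum) are assumptions; the talk does not say which algorithm or data structure will be used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict) -> list:
    """Return the paths that are missing or whose checksum has changed.

    `manifest` maps each file path to the checksum recorded earlier,
    for example at ingest.
    """
    failures = []
    for path, recorded in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != recorded:
            failures.append(path)
    return failures
```

Since stored files are read-only, any path this returns points to a problem with the storage rather than a legitimate edit.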
Rosie
Emily and co will give the project management story for our DAMS
I can give some tips from the system administrator technical development POV when beginning a DAMS project:
How will the data and assets be used? – within the system and by other platforms – and how will those platforms access the media?
Make use of APIs for integrating, sharing, cross-referencing data
Who needs to access the data? Modify the data?
What does my server infrastructure look like? How are backups managed?
Choose stable formats for preservation and simple formats for access
Understand data hierarchy for versions and access copies
Design and control data standards
Ask users! Build friendly and linear user processes
Consider feedback and error reporting
Plan for expansion