4. FEDORA History
Continuing Research Project
– Cornell 1997
Prototype Application
– University of Virginia
Fedora 1.0
– Open Source Release 2002
Fedora 1.2
– Tomorrow!
5. Options, options, options
Very few tools directly compete with
each other
Many tools can be used to accomplish
similar behavior
Many tools fulfill parts of the
functionality needed for a repository
Roll your own solution
6. Why Fedora?
Repository Architects & Developers
Excited
Object oriented approach to digital
objects
Open Source Project
– Funded development (and support)
Java Based
– Multiple HW Platforms
7. Flexible
Integrates well with existing systems
– CGI Scripts
– Web Services
Leaves most decisions to implementers
8. Extensible
Again, no product can do it all
– Imaging, Audio, Transformations,
Courseware
Easy to add new functionality to objects
Embraces web services
Open APIs
– Access
– Management
9. Digital Object
What is the definition of a digital object?
– Documents, such as articles, preprints, working papers, technical reports, conference papers
– Books
– Theses
– Data sets
– Computer programs
– Visualizations, simulations, and other models
– Multimedia publications
– Administrative records
– Published books
– Bibliographic datasets
– Images
– Audio files
– Video files
– Reformatted digital library collections
– Learning objects
– Web pages
list taken from the dspace.org website
11. Object Oriented
A software design method that models the
characteristics of abstract or real objects
using classes and objects.
Proven Techniques for Software
Development
– Requirements gathering – Use Cases
• Developers speak to librarians and other stakeholders
Facilitates reuse of functionality
Design Patterns
Not hacking Perl Scripts to make an
institutional repository
12. Object Oriented
Data
– Metadata
• MODS – Descriptive
• METS – Structural
• MIX, etc. – Technical
– Bit streams
• Actual Files – JPG, TIF, WAV, MP3, TEI, EAD
Methods (Behaviors)
– Do stuff with the data
13. Object Oriented Concepts
Classes
– Objects of the same type belong to a class
Interfaces
– A contract defining behaviors a class of objects
will implement
Encapsulation
– Behaviors operate on the data in an object
Reflection
– Discover what interfaces and behaviors an object
implements
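A minimal sketch of the reflection idea above, in Python rather than in FEDORA itself. The `Lyrics` class and the `list_behaviors` helper are hypothetical names used only for illustration; the point is that a caller can ask an object which behaviors it offers without knowing its class in advance.

```python
import inspect


class Lyrics:
    """Hypothetical object: the TEI text is encapsulated behind one behavior."""

    def __init__(self, tei: str) -> None:
        self._tei = tei  # data is private to the object

    def get_lyrics(self) -> str:
        return self._tei


def list_behaviors(obj: object) -> list:
    """Reflection: discover the public callable members of any object."""
    return [
        name
        for name, _member in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    ]
```

Calling `list_behaviors(Lyrics("<TEI/>"))` reports the single public behavior `get_lyrics`, while the stored text stays reachable only through that behavior.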
14. Image Objects
Two File Image Object
– Data
• High Resolution Version: TIF
• Low Resolution Version: JPG
MrSID File Image Object
– Data
• MrSID File
16. Basic Image Interface
Implementations
Two File Image Object
– getHighResolutionTIF
• returns high resolution TIF
– getLowResolutionJPG
• returns low resolution JPG
MrSID Image Object
– getHighResolutionTIF
• processes the MrSID file to return a high resolution TIF
file of the image
– getLowResolutionJPG
• processes the MrSID file to return a low resolution JPG
of the image
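The two implementations above can be sketched as one interface with two classes behind it. This is an illustration in Python, not FEDORA code: the class names are hypothetical, and `_convert` is a placeholder that only tags the bytes where a real MrSID decoder would run.

```python
from abc import ABC, abstractmethod


class BasicImage(ABC):
    """Hypothetical 'basic image' interface: the contract both object types honor."""

    @abstractmethod
    def get_high_resolution_tif(self) -> bytes: ...

    @abstractmethod
    def get_low_resolution_jpg(self) -> bytes: ...


class TwoFileImage(BasicImage):
    """Stores both renditions, so each behavior simply returns a stored file."""

    def __init__(self, tif: bytes, jpg: bytes) -> None:
        self._tif = tif
        self._jpg = jpg

    def get_high_resolution_tif(self) -> bytes:
        return self._tif

    def get_low_resolution_jpg(self) -> bytes:
        return self._jpg


class MrSidImage(BasicImage):
    """Stores one MrSID file and derives each rendition on request."""

    def __init__(self, sid: bytes) -> None:
        self._sid = sid

    def _convert(self, target: str) -> bytes:
        # Placeholder for a real MrSID decoder: it only prefixes the bytes.
        return target.encode() + b":" + self._sid

    def get_high_resolution_tif(self) -> bytes:
        return self._convert("tif")

    def get_low_resolution_jpg(self) -> bytes:
        return self._convert("jpg")


def thumbnail(image: BasicImage) -> bytes:
    """A caller depends only on the interface, not on how the data is stored."""
    return image.get_low_resolution_jpg()
```

`thumbnail` works unchanged for either object type, which is the reason for putting the behaviors behind a shared interface.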
17. Sheet Music Object
Data
– MODS Metadata
– Images of the pages (Image Objects)
– TEI encoded text of the lyrics (TEI Objects)
Behaviors
– getPageImage(pageNumber)
• Invokes getLowResolutionJPG on the page's image object to return the image!
– getMODS
– getLyrics
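A sketch of the sheet music object as a composite, assuming pages are numbered from 1 (the slide does not say). `PageImage` is a hypothetical stand-in for an image object that carries only the one behavior needed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageImage:
    """Stand-in for an image object; only the behavior used below."""

    jpg: bytes

    def get_low_resolution_jpg(self) -> bytes:
        return self.jpg


@dataclass
class SheetMusic:
    mods: str                # MODS descriptive metadata, as XML text
    pages: List[PageImage]   # one image object per page
    lyrics_tei: str          # TEI-encoded lyrics, as XML text

    def get_page_image(self, page_number: int) -> bytes:
        # Assumption: page numbers start at 1.
        if not 1 <= page_number <= len(self.pages):
            raise IndexError(f"no page {page_number}")
        # Delegate to the image object's own behavior.
        return self.pages[page_number - 1].get_low_resolution_jpg()

    def get_mods(self) -> str:
        return self.mods

    def get_lyrics(self) -> str:
        return self.lyrics_tei
```

The sheet music object never touches image bytes directly; it only calls a behavior on the page object, so the page could be a two-file or a MrSID image without any change here.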
18. FEDORA’s Interface Implementation
[Diagram: three object types and their components]
Behavior Definition Object
– Persistent ID (PID)
– Behavior Definition Metadata
– System Metadata
– Datastreams
Behavior Mechanism Object
– Persistent ID (PID)
– Service Binding Metadata (WSDL)
– System Metadata
– Datastreams
Data Object
– Persistent ID (PID)
– Disseminators
– Datastreams
– System Metadata
graphics taken from presentations available at www.fedora.info
19. What is FEDORA?
“Plumbing”
Manages associations between objects
and their interfaces
Invokes behaviors from an interface
to which an object subscribes
Manages or references files
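The "plumbing" role can be sketched as a small registry. This is a toy model in Python, not the FEDORA API: `Repository`, `define_interface`, `subscribe`, and `invoke` are hypothetical names, and real FEDORA binds behaviors to services rather than to in-process functions.

```python
from typing import Callable, Dict, List, Set, Tuple


class Repository:
    """Toy plumbing: tracks which objects subscribe to which interfaces."""

    def __init__(self) -> None:
        # interface name -> names of the behaviors it promises
        self._interfaces: Dict[str, Set[str]] = {}
        # (object PID, interface name) -> behavior name -> implementation
        self._bindings: Dict[Tuple[str, str], Dict[str, Callable[..., object]]] = {}

    def define_interface(self, name: str, behaviors: Set[str]) -> None:
        self._interfaces[name] = set(behaviors)

    def subscribe(
        self, pid: str, interface: str, mechanism: Dict[str, Callable[..., object]]
    ) -> None:
        # The mechanism must implement every behavior the interface promises.
        missing = self._interfaces[interface] - set(mechanism)
        if missing:
            raise ValueError(f"mechanism lacks behaviors: {sorted(missing)}")
        self._bindings[(pid, interface)] = dict(mechanism)

    def interfaces_of(self, pid: str) -> List[str]:
        return sorted(i for (p, i) in self._bindings if p == pid)

    def invoke(self, pid: str, interface: str, behavior: str, *args: object) -> object:
        if (pid, interface) not in self._bindings:
            raise LookupError(f"{pid} does not subscribe to {interface}")
        return self._bindings[(pid, interface)][behavior](*args)
```

A client asks the repository, not the object, to run a behavior; the repository looks up the association and dispatches the call.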
20. What does FEDORA
currently not do?
“Digital Library in a Box”
– Requires integration and custom
development
Prescribe the right way to do things
– Implementers are free to choose
– Best practices still being fleshed out
22. Choosing Repository Software
Fedora provides a foundation to build
on
LC is a member of the initial deployment team
No other software is like FEDORA
– Except general purpose programming
languages
23. How LC is implementing
FEDORA
Types of Digital Objects
– Sheet Music
– Scores
– Sound Recordings
– Compact Discs
– Manuscripts
– Photographs
– Websites
– “Collections”
Less emphasis
– Intellectual output of a university’s research faculty
24. METS Profiles
Correlates well with classes of objects
Articulates
– Structure of an object
– Metadata requirements
METS documents conforming to
profiles are ingested into repository
– Atomization
– Behavior association
26. SIP vs AIP
Complex digital objects are atomized into
small reusable objects upon ingest to
FEDORA
– Sheet Music METS Profile (SIP)
• Sheet music object (AIP)
– Structural metadata encoded in METS
– Descriptive encoded in MODS
• Image objects for each page (AIP)
– TIF and JPG Files
– Technical encoded in MIX
• TEI object for the lyrics (AIP)
– TEI File
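A sketch of atomization on ingest, under stated assumptions: the SIP is modeled as a plain dictionary (a real SIP here is a METS document), the `demo:N` PIDs are invented for illustration, and `Aip` and `atomize` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class Aip:
    pid: str
    kind: str
    datastreams: Dict[str, str]
    parts: List[str] = field(default_factory=list)  # PIDs of child objects


def atomize(sip: dict, prefix: str = "demo") -> List[Aip]:
    """Split one sheet music SIP into a parent AIP plus one AIP per part."""
    numbers = count(1)

    def next_pid() -> str:
        return f"{prefix}:{next(numbers)}"

    # Parent object keeps the structural (METS) and descriptive (MODS) metadata.
    sheet = Aip(next_pid(), "sheet-music", {"METS": sip["mets"], "MODS": sip["mods"]})
    aips = [sheet]

    # Each page becomes its own image object with TIF, JPG, and MIX datastreams.
    for page in sip["pages"]:
        image = Aip(
            next_pid(),
            "image",
            {"TIF": page["tif"], "JPG": page["jpg"], "MIX": page["mix"]},
        )
        sheet.parts.append(image.pid)
        aips.append(image)

    # The lyrics become a separate TEI object.
    lyrics = Aip(next_pid(), "tei", {"TEI": sip["lyrics_tei"]})
    sheet.parts.append(lyrics.pid)
    aips.append(lyrics)
    return aips
```

Because each page and the lyrics get their own PID, they can be reused by other objects without copying files.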
27. Why this Architecture?
Clean Separation of Concerns
– Logic: Makes it go!
– Content: From FEDORA
– Style: Web Designers
Object not bound to display
– Repository is for preservation of metadata and
files not markup (HTML)
– Markup is done in the Cocoon layer
Leverage use of METS structural metadata
Performance: Cocoon Caching
28. User Interface Development
Web Designers
– Relate to objects and behaviors
– Can develop in HTML for display
– XSLT
• Uses XML from repository to drive display
30. Other Pieces of the
Repository Puzzle
Other open source tools
– Cocoon
• XML Publishing Framework
– Lucene
• Text Indexing and Search API
Someone has to write software!
– Java to build Lucene indexes
– XSP searching
– More XSLT than you want to see
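To show what "building an index" means, here is a minimal inverted index in Python. It is not Lucene and does not use the Lucene API; `build_index` and `search` are hypothetical helpers, and the tokenizer is a deliberately crude lowercase split.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Crude tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of object PIDs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for pid, text in documents.items():
        for term in tokenize(text):
            index[term].add(pid)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return PIDs containing every term in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

A production index adds ranking, stemming, and persistence, which is the work a library such as Lucene takes over.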
31. Digital Object Production
How are we building these digital
objects?
– MySQL
– Cocoon
– XSLT
– Homegrown Java
• Technical metadata extraction
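A rough sketch of technical metadata extraction, in Python rather than the homegrown Java mentioned above. The field names are hypothetical, and this covers only generic properties (type, size, checksum); image-specific values such as those recorded in MIX would need a real image parser.

```python
import hashlib
import mimetypes
from typing import Dict, Union


def extract_technical_metadata(
    filename: str, content: bytes
) -> Dict[str, Union[str, int]]:
    """Collect generic technical properties of one file."""
    # Guess the MIME type from the file extension only.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return {
        "filename": filename,
        "mime_type": mime_type or "application/octet-stream",
        "size_bytes": len(content),
        # A checksum lets later audits detect silent corruption.
        "sha256": hashlib.sha256(content).hexdigest(),
    }
```

Running this at production time, before ingest, means the metadata travels with the file into the repository.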
32. Cocoon
XML Publishing Framework (Toolbox)
– Generate
• From files (or URLs)
• From databases
• From code (XSP, JSP, PHP)
– Transform
• XSLT
– Serialize
• XML, HTML, PDF, SVG, MIDI?
– Caching
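The generate, transform, serialize, and cache stages can be sketched as a small pipeline. This is Python with the standard library, not Cocoon: the transform step is an ordinary function standing in for XSLT, and the `<record>` input format and function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def generate(source: str) -> ET.Element:
    """Generate: turn a source (here, an XML string) into a tree."""
    return ET.fromstring(source)


def transform(record: ET.Element) -> ET.Element:
    """Transform: stand-in for an XSLT step, mapping <record> to HTML."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = record.findtext("title", default="Untitled")
    return html


def serialize(tree: ET.Element) -> str:
    """Serialize: write the tree out in the requested format."""
    return ET.tostring(tree, encoding="unicode")


@lru_cache(maxsize=128)
def render(source: str) -> str:
    """Whole pipeline, cached so repeated requests skip all three stages."""
    return serialize(transform(generate(source)))
```

Keeping the stages separate is what lets the same repository XML feed several output formats: only the transform and serialize steps change.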
33. XSLT
Philosophy
– Get data into XML as early in the workflow
as possible
Flexibility
– Easy to change logic in XSLT
– No need to recompile
Performance Issues
34. Resources Needed for
FEDORA (Cheap)
Hardware Requirements
– Minimal for experimentation
• Installs on Windows PC
• Packaged to get up and running quickly
• Demo set of objects
– Scales with hardware in a production
environment
35. Resources Needed for
FEDORA (Expensive)
1 or More Developers
– 1: Kick the tires
– or More: Real production
Application Architects
Requirement Analysts
Subject Matter Experts
– Articulate requirements
• Object Structure
• Descriptive Metadata
37. Who
Institutions with resources to do
software development
Unique requirements for digital library
software
– Preexisting tools do not fit the need
Need for integration of existing systems
into one management infrastructure
38. What
Digital Library Plumbing
Very general purpose
– Use it to build almost any digital library
application