EXPOSING HIDDEN RELATIONSHIPS:
PRACTICAL WORK IN LINKED DATA
USING DIGITAL COLLECTIONS
Cory Lampert and Silvia Southwick
UNLV University Libraries Digital Collections
April 23, 2015
Linked Data & RDF: New Frontiers in Metadata and
Access Conference
OVERVIEW
• Video Demo
• UNLV Linked Data Project
• Digital Collections Metadata: Source of Rich (But
Hidden) Relationships
• Video Demo
• Next Steps, Future
• Questions
VIDEO DEMO
This short video (no sound, just image) will give a preview of
what linked data may look like to users.
It shows the Virtuoso Pivot Viewer software acting upon
UNLV’s Linked Open Data triplestore.
Think about how this is similar/different to how users
currently view data in library systems.
[PLAY PIVOTVIEWER.mp4]
EXPLORING LOD: TAKING THEORY
TO PRACTICE
• How we started
• Goals set
• What we accomplished
HOW WE BEGAN
Conferences and
“buzz”
Curiosity and
professional
development
Exploration and pilot
project
Compelling results;
sharing impact of
what we’ve learned
Assessment
Much more to do... A
sense of humor is
helpful!
Photo: Five men with
burros, circa 1900,
Tonopah/Goldfield
Collection
MOTIVATION
Current Practice
• Information encapsulated in
records
• Records contained in collections
• Very few links are created within
and/or across collections
• Links have to be manually
created
• Existing links do not specify the
nature of the relationships
among records
This structure hides potential
context (links) within and across
collections
LOD Potential
• Free metadata from silos
• Expose rich relationships
• Leverage powerful, seamless
interlinking of data from multiple
sources
• Discover and query data in new
ways
• More precise searching
• More opportunities to repurpose
data
POLL
Please use the agree/disagree button, available from
the pull down menu at the top of the screen to
respond to the statement below:
Statement: There is interest in doing practical work
with linked open data at my institution.
FOUNDATION OF PILOT
• Our digital collections consist of unique materials
documenting the history of Southern Nevada stored
in CONTENTdm; project focused on LOD for visual
material collections
• Definition of LOD we are using: “Linked Data refers
to a set of best practices for publishing and
interlinking data on the Web.”
• A good way to better understand this is the 5-Star
Data diagram: http://5stardata.info/
PREPARING FOR DEPARTURE
Before we launch into a
discussion of how we
created our linked data,
let’s take a short trip.
We will start in our
current data: digital
collections metadata
records, and end in the
new world of linked
open data.
Photo: Photograph of
Howard Hughes in cockpit
of the second XF-11, April
4, 1947, Howard Hughes
Collection
Graphical Representation: Part of a Record
[Diagram]
EXAMPLES OF RECORDS
[Diagram; the only surviving label is the date "December 12, 1915"]
EXPOSING HIDDEN LINKS
[Diagram]
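The idea behind these diagrams can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two records, their identifiers, and the field names are invented and are not UNLV data. Once every field value is expressed as a (subject, predicate, object) triple, any value shared by two records becomes an explicit link between them.

```python
from collections import defaultdict

# Invented records for illustration only; identifiers, fields, and values are hypothetical.
records = {
    "record/1": {"creator": "Jane Doe", "depicts": "Example Hotel"},
    "record/2": {"creator": "John Roe", "depicts": "Example Hotel"},
}

def to_triples(recs):
    """Flatten field/value records into (subject, predicate, object) triples."""
    return [(rid, field, value)
            for rid, fields in recs.items()
            for field, value in fields.items()]

def shared_objects(triples):
    """Return each object that more than one subject points to, with those subjects."""
    subjects_by_object = defaultdict(set)
    for s, _p, o in triples:
        subjects_by_object[o].add(s)
    return {o: sorted(subs) for o, subs in subjects_by_object.items() if len(subs) > 1}

triples = to_triples(records)
links = shared_objects(triples)
print(links)  # {'Example Hotel': ['record/1', 'record/2']}
```

In a record-based system the shared value stays buried inside two separate records; in the triple view it is a single node that both records point to.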
POLL
Please use the agree/disagree button, available from
the pull down menu at the top of the screen to
respond to the statement below:
Statement: The diagrams helped me to see how
linked data helps to reveal hidden relationships in
existing metadata.
UNLV LINKED OPEN DATA PROJECT GOALS
Study the feasibility of developing a common
process that would allow the conversion of our
collection records into linked data preserving their
original expressivity and richness
Publish data from our collections in the Linked Open
Data Cloud to improve discoverability and
connections across our collections and with data from
other related data sets on the Web
PHASE    ACTIONS                        TECHNOLOGIES
Phase 1  Clean data, Export data        CONTENTdm
Phase 2  Import data, Prepare data,     OpenRefine
         Reconcile, Generate triples,
         Export RDF
Phase 3  Import data, Publish           Mulgara / Virtuoso
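As a rough stand-in for the "Reconcile", "Generate triples", and "Export RDF" actions, the sketch below turns one cleaned metadata row into N-Triples lines using only the Python standard library. It is not the OpenRefine workflow used in the project. The base URI, the reconciliation lookup table, and the sample row are assumptions made for illustration; the predicates are Dublin Core terms.

```python
# Hypothetical lookup standing in for reconciliation against an external vocabulary.
RECONCILED = {
    "Example Hotel": "http://example.org/vocab/example-hotel",
}
BASE = "http://example.org/item/"        # assumed base URI, not UNLV's
DCTERMS = "http://purl.org/dc/terms/"    # Dublin Core terms namespace

def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def row_to_ntriples(item_id, row):
    """Turn one cleaned metadata row into a list of N-Triples lines."""
    lines = []
    subject = f"<{BASE}{item_id}>"
    for field, value in row.items():
        value = value.strip()
        if not value:
            continue  # skip empty fields rather than emit empty literals
        predicate = f"<{DCTERMS}{field}>"
        uri = RECONCILED.get(value)
        # A reconciled value becomes a link (URI); anything else stays a literal.
        obj = f"<{uri}>" if uri else f'"{escape_literal(value)}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines

row = {"title": ' Pool at the "Example Hotel" ', "subject": "Example Hotel", "date": ""}
ntriples = row_to_ntriples("42", row)
print("\n".join(ntriples))
```

The point of the sketch is the branch in the middle: a value that reconciles to a URI becomes a link that other data sets can share, while an unreconciled value remains a plain string.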
WHAT WE LEARNED
With interest and
motivation, Linked
Open Data is a
feasible goal
Visualization tools
help convey the
benefits of LOD work
A pilot quickly turned
into a project and
then into production
Moving into the next
phase required
careful examination
of current practice
focusing on
expressing links
(relationships)
Photo: Film
transparency of a
chimpanzee with slot
machines at the Sands
Hotel, Las Vegas, circa
late 1950s, Sands
Collection
LOD APPROACH AFTER THE PILOT
• After learning the concepts, applying a model, and
testing technologies, the LOD transformation
process becomes repeatable
• Sustainability of the process depends upon data quality
• Data begins with existing metadata in current
collections; there are many lessons from the pilot
that should inform revisions to current practice
(even if LOD is more in the future than the present)
MINING THE METADATA
Application profile
Shared Vocabularies
Managing Controlled Vocabularies
Managing Linked Data
EVOLUTION OF METADATA
When should we start preparing metadata for Linked Data?
OUR FOCUS IS ON METADATA
• Why?
  • Metadata is essential for establishing relationships
• Any metadata?
  • The ability to discover relationships is directly affected by
  metadata quality
• It is critical to:
  • Use well-established Controlled Vocabularies
  (particularly if they are linked data ready)
  • Rigorously control local terms
  • Re-use URIs
  • Assign URIs for local terms
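The last two points, re-using URIs and assigning URIs for local terms, can be sketched as a single lookup. This is a minimal illustration, not the project's implementation: the local namespace, the external URI table, and the sample terms are all hypothetical.

```python
import re
import unicodedata

LOCAL_BASE = "http://example.org/local/"  # hypothetical namespace for local terms

def slugify(term):
    """Reduce a term to lowercase ASCII words joined by hyphens."""
    ascii_term = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_term.lower()).strip("-")

def uri_for(term, external_uris, minted):
    """Re-use an external URI when one is known; otherwise mint a local URI once."""
    key = " ".join(term.split()).casefold()  # normalise spacing and case
    if key in external_uris:
        return external_uris[key]
    if key not in minted:
        minted[key] = LOCAL_BASE + slugify(key)
    return minted[key]

# Placeholder table of URIs from an established vocabulary.
external = {"hotels": "http://example.org/external-vocab/hotels"}
minted = {}

a = uri_for("Hotels", external, minted)
b = uri_for("Example  Showroom", external, minted)
c = uri_for("example showroom", external, minted)
print(a, b, c, sep="\n")
```

Variant spellings of the same local term resolve to one URI, which is what allows records to be linked through it. A production version would also need to handle two different terms that reduce to the same slug.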
METADATA CREATION –
COMMON APPROACHES
• Focus is on the collection being created
• Usually metadata consistency is managed within collections
• Not much rigor is used to enter controlled vocabulary
terms
  • Examples: misspellings, use of terms that do not match the
  preferred terms, etc.
• Limited control of local terms
• Implications:
  • Ability to identify relationships within and across collections is
  decreased
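The kinds of entry errors listed above (misspellings, terms that do not match the preferred form) can be caught with a simple check at data entry time. The sketch below uses `difflib` from the Python standard library; the preferred-term list and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib

PREFERRED = ["Hotels", "Casinos", "Showgirls"]  # invented preferred terms

def check_term(term, preferred=PREFERRED, cutoff=0.8):
    """Classify an entered term and, where possible, suggest the preferred form.

    Returns (status, suggestion) where status is one of
    "ok", "case", "suggest", or "unknown".
    """
    cleaned = " ".join(term.split())
    if cleaned in preferred:
        return ("ok", cleaned)
    by_fold = {p.casefold(): p for p in preferred}
    if cleaned.casefold() in by_fold:
        return ("case", by_fold[cleaned.casefold()])
    close = difflib.get_close_matches(cleaned.casefold(), list(by_fold), n=1, cutoff=cutoff)
    if close:
        return ("suggest", by_fold[close[0]])
    return ("unknown", None)

for entered in ["Hotels", "casinos", "Hotles", "Mining"]:
    print(entered, check_term(entered))
```

A term flagged "unknown" is a candidate for the local controlled vocabulary rather than an error, so the check separates typing mistakes from genuinely new terms.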
WHAT CAN WE DO TO CREATE “SAPIENT”
METADATA?
When should we start preparing metadata for Linked Data?
Application
Profile
Re-design
strategies to
manage and
use CVs
WHAT DO I DO WITH MY LEGACY METADATA?
Adjust
metadata
according
to the
Application
Profile
Apply
strategies
to
manage
and use
CVs
effectively
METADATA MILESTONES AT UNLV LIBRARIES
Adopted an approach that considers each individual
digital collection as part of an integrated digital library.
THE UNLV APPLICATION PROFILE
• Specifies:
  • which metadata terms UNLV Libraries uses for its digital
  collections
  • the source of metadata terms
  • how metadata should be expressed
  • labels to be used for each metadata field
• Benefits:
  • Increases consistency of content across digital collections
  • Improves user interactions with digital collections
  • Indexing guidelines are easy to generate
  • Facilitates transformation to Linked Data
  • Increases compliance with regional and national aggregators
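One way to picture an application profile is as a small machine-readable table that records, for each field, its label, its source, and how values must be expressed. The sketch below is a hypothetical two-field profile with a validator. It is not the UNLV Application Profile; the field names, allowed values, and messages are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class FieldRule:
    label: str                                # label shown for the field
    source: str                               # where the metadata term comes from
    required: bool = False
    allowed: Optional[FrozenSet[str]] = None  # controlled values, if any

# Hypothetical profile with two fields drawn from Dublin Core terms.
PROFILE = {
    "title": FieldRule("Title", "http://purl.org/dc/terms/title", required=True),
    "type": FieldRule("Type", "http://purl.org/dc/terms/type",
                      allowed=frozenset({"Image", "Text"})),
}

def validate(record, profile=PROFILE):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for name, rule in profile.items():
        value = record.get(name, "").strip()
        if rule.required and not value:
            problems.append(f"{rule.label}: required field is empty")
        elif value and rule.allowed is not None and value not in rule.allowed:
            problems.append(f"{rule.label}: '{value}' is not an allowed value")
    for name in record:
        if name not in profile:
            problems.append(f"{name}: field is not in the profile")
    return problems

good = validate({"title": "A photograph", "type": "Image"})
bad = validate({"type": "Photo", "notes": "internal"})
print(good)
print(bad)
```

Because every collection is checked against the same rules, the same transformation to linked data can be applied to all of them.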
OUTCOMES
• Well-established CVs → allow re-use of URIs
• Rigorous rules of data entry → facilitate
reconciliation
• Local controlled vocabularies → allow interlinking
among local terms / names within collections
• Shared vocabularies → allow interlinkage among
local terms / names across collections
All these actions allow the creation of a single process to transform
digital collections into linked data
Video: [PLAY SUPER-SKELETON-WHH.mp4]
MOVING FROM EXPERIMENTATION TO
IMPLEMENTATION
• Cleaning and sharing controlled vocabularies from
legacy collections (time-consuming)
• Re-training metadata creators
• Re-designing workflows
• Delegating additional data management
responsibilities
DATA MANAGEMENT
• Maintenance of local URIs
  • Terms
  • Authoritative Names
• Design and implementation of new processes to
maintain synchronization between digital library and
linked data set
• Design processes to enrich relationships with
external data sets
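Keeping the linked data set synchronized with the digital library can be thought of as a comparison of two sets of triples. The sketch below is one possible approach, not a description of the UNLV process. It assumes triples are available as plain (subject, predicate, object) tuples, and the sample data is invented.

```python
def diff_triples(published, current):
    """Compare two collections of (subject, predicate, object) triples.

    Returns (to_add, to_remove): triples to insert into the published set and
    triples to delete from it so that it matches the current export.
    """
    published_set, current_set = set(published), set(current)
    to_add = sorted(current_set - published_set)
    to_remove = sorted(published_set - current_set)
    return to_add, to_remove

# Invented example: a title was corrected in the digital library after publication.
published = [
    ("item/1", "title", "Old title"),
    ("item/1", "subject", "Hotels"),
]
current = [
    ("item/1", "title", "Corrected title"),
    ("item/1", "subject", "Hotels"),
]

to_add, to_remove = diff_triples(published, current)
print(to_add)     # [('item/1', 'title', 'Corrected title')]
print(to_remove)  # [('item/1', 'title', 'Old title')]
```

Only the changed statements are touched, so stable URIs and unchanged links survive each update.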
NEXT STEPS
Future Activities
Resources
Video Demo
FUTURE ACTIVITIES
• Publish data
• Interlinking with other data sets
• Documentation
• Collaborative activities (regional controlled
vocabularies)
• Training and staff skill development
• Interface design and development
• Work with hierarchical data
VIDEO DEMO
This short video (no sound, just image) will give a preview of
what linked data may look like to users.
It shows the RelFinder software acting upon UNLV’s Linked
Open Data triplestore.
Think about how this is similar/different to how users
currently view data in library systems.
[PLAY SHOWING RELATIONSHIPS.mp4]
THE LINKED DATA CLOUD
RESOURCES
Leading to Linking: Introducing Linked Data to Academic Library
Digital Collections:
http://www.tandfonline.com/doi/pdf/10.1080/19386389.2013.826095
A Guide for Transforming Digital Collections Metadata into
Linked Data Using Open Source Technologies:
http://www.tandfonline.com/doi/pdf/10.1080/19386389.2015.1007009
UNLV Linked Data Blog (videos posted here):
https://www.library.unlv.edu/linked-data
Contact us!
THANK YOU!
Contact Us:
Cory Lampert
cory.lampert@unlv.edu
Silvia Southwick
silvia.southwick@unlv.edu
UNLV Digital
Collections
www.d.library.unlv.edu
Questions?
Photo: Photograph of
Bluebells posing
outside of Pan Am jet,
1958, Donn Arden
Collection
