Duraspace Hot Topics Series 6: Metadata and Repository Services

Presented by Declan Fleming, Arwen Hutt, and Matt Critchlow. The second in a three-part webinar series on Research Data Curation at UC San Diego, as part of the larger Research Cyberinfrastructure initiative.

Transcript

  • 1. Hot Topics Web Seminar Series: Research Data in Repositories The UC San Diego Experience Second Webinar: Metadata and Repository Services for Research Data Curation
  • 2. General Series Intro • First webinar: Intro and Framing: UC San Diego decisions and planning • Second Webinar: Deep dive into technology and metadata • Third Webinar: The perspective from researchers, next steps
  • 3. Your esteemed presenters … First webinar: David Minor – Program Director, Research Data Curation Declan Fleming - Chief Technology Strategist Second webinar: Declan Fleming - Chief Technology Strategist Arwen Hutt - Metadata Librarian Matt Critchlow - Manager of Development and Web Services Third webinar: Dick Norris – Professor, Scripps Institution of Oceanography Rick Wagner – Data Scientist at San Diego Supercomputer Center
  • 4. Today we will … • Discuss real-world researcher interaction • Document how metadata and files combine to make digital objects • Describe the DAMS data model and how it supports complex research objects • Detail the technology driving the DAMS • Point to the future
  • 5. Working with Researchers: Pilots • The Brain Observatory • NSF OpenTopography Facility • Levantine Archaeology Laboratory • Scripps Institution of Oceanography Geological Collections • The Laboratory for Computational Astrophysics
  • 6. Working with Researchers: Process • Introductory meeting • Metadata point person • Ongoing discussions • One-on-one work • Iterative, collaborative, customized, experimental…pilot!
  • 7. Working with Researchers: Data management • Collocation • Cleanup • Identifiers • Metadata
  • 8. Working with Researchers: What is an object? • What are the boundaries of a discrete set or subset of data? • What is required to make the data intelligible, usable and reusable? • What needs to be preserved? • What do they want to display and/or share? • What do they want to be able to refer to or cite?
  • 9. Working with Researchers: What is an object? [Figure: candidate object boundaries, e.g. brain or slice, site or artifact, etc.]
  • 10. Working with Researchers: Takeaways • They are the subject experts • There are a lot of broad-level similarities • But there is no such thing as one size fits all
  • 11. We want a new data model… • One that is flexible and accommodates disparate metadata from a variety of sources • While promoting consistency within the data store • One that supports relationships within and between objects • One that is more community engaged, both sharing our vocabularies and technology and utilizing others' shared vocabularies and technologies • One that supports improved management of objects and metadata
  • 12. DAMS Data Model Development Process • Five people, in a room, 16 hours a week for 4 months • Worked through existing data, use case scenarios, known data requirements, investigated known ontologies, etc. • Lots and lots and lots of discussion • Utilizes MADS (Metadata Authority Description Schema) • Results = a data dictionary and an OWL ontology • Living document
  • 13. DAMS Data Model: Flexibility • The data model provides enough flexibility that we can accommodate a wide variety of data within the schema – Vocabularies – Use of “types” or “display labels” to distinguish specific subtypes of a data field – Flexible structures and relationships – Extensible
  • 14. DAMS Data Model: Consistency • But enough consistency that searching and display rules do not need to be customized for each individual collection of material – Rules can be applied at the level of the broader concept • As well as establishing the organizational structure necessary for maintaining consistency over time – Evaluation and approval of modifications
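The flexibility/consistency balance above can be illustrated with a minimal sketch. It assumes a hypothetical note field whose `type` and `display_label` distinguish subtypes, plus hypothetical index field names (`all_notes`, `note_<type>`). One indexing rule is written once for the broader concept "note", so it needs no per-collection customization, while the subtype survives for display.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """A note whose `type` and `display_label` mark its subtype (hypothetical shape)."""
    type: str
    display_label: str
    value: str


def index_notes(notes):
    """Index notes with one rule defined at the level of the broader concept "note".

    Every subtype feeds a shared search field, and a per-type field keeps the
    subtype available for display. Field names here are hypothetical.
    """
    doc = defaultdict(list)
    for note in notes:
        doc["all_notes"].append(note.value)          # one rule covers every subtype
        doc[f"note_{note.type}"].append(note.value)  # subtype preserved for display
    return dict(doc)


notes = [
    Note("abstract", "Abstract", "Core samples from the Pacific."),
    Note("methods", "Methodology", "Samples were split and imaged."),
]
doc = index_notes(notes)
print(sorted(doc))  # ['all_notes', 'note_abstract', 'note_methods']
```

A new note subtype in some future collection would be searchable through `all_notes` without changing the rule.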
  • 15. DAMS Data Model: Relationships • It allows us to create a number of different relationships – Collections and sub-collections – Collections and objects – Objects and components (complex hierarchical objects) – Other related resources, internal or external to the DAMS • [Figure: complex object example]
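The hierarchy on this slide (collection, sub-collection, object, nested components) can be sketched as a small tree. This is a hypothetical in-memory model, not the DAMS ontology. The `Node` class, its fields, and the identifiers are illustrative assumptions, and the `related` list stands in for links to resources inside or outside the DAMS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A collection, object, or component in a hypothetical in-memory model."""
    id: str
    kind: str  # "collection", "object", or "component"
    children: list = field(default_factory=list)
    related: list = field(default_factory=list)  # links to resources inside or outside the DAMS


def walk(node, depth=0):
    """Yield (depth, kind, id) for a node and all of its descendants, depth-first."""
    yield depth, node.kind, node.id
    for child in node.children:
        yield from walk(child, depth + 1)


# A complex object: a brain with slices, one slice holding its own image component.
brain = Node(
    "obj1",
    "object",
    children=[
        Node("obj1-slice1", "component"),
        Node("obj1-slice2", "component", children=[Node("obj1-slice2-image", "component")]),
    ],
    related=["https://example.org/related-paper"],  # hypothetical external resource
)
root = Node("col1", "collection", children=[Node("col1-sub", "collection", children=[brain])])

for depth, kind, ident in walk(root):
    print("  " * depth + f"{kind}: {ident}")
```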
  • 16. DAMS Data Model: Vocabularies • Allows management of local & community vocabularies – Vocabulary terms as entities – Ability to encode authority data (vocabulary source, value URI, etc.) as well as sameAs relationships between the same term expressed in multiple sources – Ability to update authority records as community vocabularies become more formalized
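Treating a vocabulary term as an entity with `sameAs` links can be sketched by emitting N-Triples. The sketch uses the real MADS/RDF namespace (`Topic`, `authoritativeLabel`) and `owl:sameAs`, but the term URIs are hypothetical `example.org` placeholders. The actual DAMS ontology may model authority data differently. Updating an authority record as a community vocabulary matures amounts to regenerating the term with an additional `same_as` URI.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
MADS = "http://www.loc.gov/mads/rdf/v1#"


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def term_triples(term_uri, label, same_as=()):
    """Describe one vocabulary term as an entity, plus sameAs links to other sources."""
    lines = [
        f"<{term_uri}> <{RDF_TYPE}> <{MADS}Topic> .",
        f"<{term_uri}> <{MADS}authoritativeLabel> {literal(label)} .",
    ]
    lines += [f"<{term_uri}> <{OWL_SAME_AS}> <{uri}> ." for uri in same_as]
    return lines


local = "http://example.org/dams/topic/foraminifera"      # hypothetical local term
community = "http://example.org/community/foraminifera"  # hypothetical community term
print("\n".join(term_triples(local, "Foraminifera", same_as=[community])))
```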
  • 17. DAMS Data Model: Management • One that supports improved management of objects and metadata – Authority management of vocabulary terms – Event metadata!
  • 18. DAMS Architecture
  • 19. Preservation: Chronopolis • Current DAMS process: 1. Create BagIt bags for all objects 2. Host via HTTP(S) 3. Bags are retrieved and ingested into Chronopolis • DAMS4 process: 1. Create BagIt bags for changed (Δ) objects, identified using event metadata 2. Host via HTTP(S), or enqueue on a messaging queue for ingestion
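The DAMS4 step can be sketched in two parts. First, event metadata is used to select only the objects changed since the last preservation run. Second, a minimal bag is written for each, following the BagIt layout of RFC 8493 (`bagit.txt`, a `data/` payload directory, and a checksum manifest). The event record shape, identifiers, and payload are hypothetical. Production tooling would also write `bag-info.txt` and tag manifests, and would hand bags to Chronopolis over HTTP(S) or a message queue, which is not shown.

```python
import hashlib
import tempfile
from datetime import datetime
from pathlib import Path


def changed_objects(events, since):
    """Return the IDs of objects with at least one event after `since`."""
    return sorted({e["object_id"] for e in events if e["timestamp"] > since})


def make_bag(bag_dir, files):
    """Write a minimal BagIt bag (RFC 8493 layout) holding `files`.

    `files` maps a payload path (relative to data/) to its bytes. Only the
    required tag files are written; bag-info.txt and tag manifests are omitted.
    """
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    (bag / "bagit.txt").write_bytes(
        b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    )
    manifest = []
    for rel_path, payload in sorted(files.items()):
        target = bag / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        manifest.append(f"{hashlib.sha256(payload).hexdigest()}  data/{rel_path}\n")
    (bag / "manifest-sha256.txt").write_bytes("".join(manifest).encode("utf-8"))
    return bag


# Hypothetical event records; only obj1 changed since the last preservation run.
events = [
    {"object_id": "obj1", "timestamp": datetime(2013, 10, 2)},
    {"object_id": "obj2", "timestamp": datetime(2013, 9, 1)},
]
last_run = datetime(2013, 9, 15)
to_bag = changed_objects(events, last_run)

with tempfile.TemporaryDirectory() as tmp:
    for object_id in to_bag:
        bag = make_bag(Path(tmp) / object_id, {f"{object_id}/master.tif": b"pixels"})
        manifest_text = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")

print(to_bag)  # ['obj1']
print(manifest_text)
```

Bagging only the delta keeps each preservation run proportional to what changed rather than to the size of the whole repository.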
  • 20. Storage
  • 21. Storage: EMC Isilon 72NL Storage for Library Collections • 1 cluster of 5 nodes • 1 node = 36 × 2 TB drives • Total current usable storage: 320 TB • OneFS 7.0.2.1
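The figures on this slide imply the raw capacity, which can be compared with the quoted usable storage. The slide does not say what accounts for the gap. OneFS data-protection overhead and differences in how terabytes are counted are plausible causes, but that is an assumption here.

```python
NODES = 5
DRIVES_PER_NODE = 36
DRIVE_TB = 2
USABLE_TB = 320  # usable figure quoted on the slide

raw_per_node_tb = DRIVES_PER_NODE * DRIVE_TB  # raw capacity of one node
raw_cluster_tb = NODES * raw_per_node_tb      # raw capacity of the whole cluster
efficiency = USABLE_TB / raw_cluster_tb       # share of raw capacity that is usable

print(f"raw per node: {raw_per_node_tb} TB")  # raw per node: 72 TB
print(f"raw cluster: {raw_cluster_tb} TB")    # raw cluster: 360 TB
print(f"usable fraction: {efficiency:.1%}")   # usable fraction: 88.9%
```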
  • 22. Storage: OpenStack Storage for Research Data Collections • Testing: – Performance versus local storage – Large files (up to 1 TB) – Segmenting files > 5 GB – Lexical order bug fix: 1,10,2 -> 0001,0002,…0010 – Rackspace Cloud Files API vs. OpenStack REST API • Testing notes: https://libraries.ucsd.edu/blogs/dams/openstack-testing-notes/
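The lexical-order fix can be shown directly. OpenStack Swift limits a single object to 5 GB by default, so larger files are uploaded as segments. As we understand Swift's Dynamic Large Object behavior, the segments are reassembled in lexicographic order of their names, so unpadded names such as `1, 10, 2` come back in the wrong order. The sketch below assumes segments sized at the 5 GB limit and a four-digit pad, which covers up to 9,999 segments. The naming scheme `<object>/<NNNN>` is hypothetical.

```python
SEGMENT_SIZE = 5 * 1024**3  # assumed: segments sized at the 5 GB single-object limit


def segment_names(object_name, total_size, segment_size=SEGMENT_SIZE, width=4):
    """Name the segments of a file split into fixed-size pieces.

    Zero-padding to `width` digits makes lexical (string) order match numeric
    order, so a store that concatenates segments by name reassembles them correctly.
    """
    count = max(1, -(-total_size // segment_size))  # ceiling division
    return [f"{object_name}/{i:0{width}d}" for i in range(1, count + 1)]


unpadded = [str(i) for i in range(1, 11)]
print(sorted(unpadded)[:3])  # ['1', '10', '2'] (the ordering bug)

names = segment_names("dataset.tar", 1024**4)  # a 1 TiB file
print(len(names), names[0], names[-1])         # 205 dataset.tar/0001 dataset.tar/0205
```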
  • 23. DAMS Repository
  • 24. DAMS Repository • Core repository application: Create, Read, Update, Delete (CRUD) • Uses: Jena, ActiveMQ, JHOVE, Apache Tika, FFmpeg, ImageMagick • Manages: – Metadata triplestore – Storage – Solr
  • 25. DAMS Repository: Metadata Triplestore
  • 26. DAMS Repository: Metadata Triplestore • Triplestore was: AllegroGraph • Triplestore is: PostgreSQL DB + Jena – Schema: (ID), Parent, Subject, Predicate, Object • Jena usage: – Core/RDF API: parsing, loading, updating, serializing RDF – ARQ API: SPARQL queries
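The slide gives only the table's columns. A minimal sketch can show why a `parent` column is useful alongside subject, predicate, and object. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL and assumes that `parent` records which repository object a statement belongs to, which the slide does not define. With that assumption, all statements for an object, including those about its components, can be fetched or replaced with one indexed query. Predicates are hypothetical prefixed names, and Jena's role (RDF parsing, SPARQL via ARQ) is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE triples (
           id INTEGER PRIMARY KEY,
           parent TEXT NOT NULL,  -- object the statement belongs to (assumed meaning)
           subject TEXT NOT NULL,
           predicate TEXT NOT NULL,
           object TEXT NOT NULL
       )"""
)
conn.execute("CREATE INDEX triples_parent ON triples (parent)")

rows = [
    ("obj1", "obj1", "dc:title", '"Core sample 42"'),
    ("obj1", "obj1", "ex:hasComponent", "obj1-c1"),  # hypothetical predicate
    ("obj1", "obj1-c1", "dc:title", '"Slice 1"'),     # component triple, same parent
    ("obj2", "obj2", "dc:title", '"Site survey"'),
]
conn.executemany(
    "INSERT INTO triples (parent, subject, predicate, object) VALUES (?, ?, ?, ?)", rows
)


def triples_for(parent):
    """Fetch every statement stored for one repository object, in insertion order."""
    cur = conn.execute(
        "SELECT subject, predicate, object FROM triples WHERE parent = ? ORDER BY id",
        (parent,),
    )
    return cur.fetchall()


def replace_object(parent, new_rows):
    """Update an object by rewriting all of its statements in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM triples WHERE parent = ?", (parent,))
        conn.executemany(
            "INSERT INTO triples (parent, subject, predicate, object) VALUES (?, ?, ?, ?)",
            [(parent, s, p, o) for s, p, o in new_rows],
        )


replace_object("obj2", [("obj2", "dc:title", '"Site survey (revised)"')])
print(triples_for("obj1"))
print(triples_for("obj2"))
```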
  • 27. DAMS Repository: REST API
  • 28. Hydra Framework Source: https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
  • 29. DAMS Repository: Fedora API-ish
  • 30. Fedora API – Next PID
  • 31. Fedora API – Next PID
  • 32. DAMS Manager
  • 33. DAMS Manager • Java application using the Spring MVC framework • Collection management – Metadata ingest and export – File ingest – Derivative generation – Solr indexing by collection • Administrative reporting and statistics
  • 34. DAMS Hydra Head
  • 35. DAMS Hydra Head
  • 36. DAMS Hydra Head: Blacklight
  • 37. RDF in Hydra
  • 38. RDF in Hydra: (Read) Nested Attributes
  • 39. RDF in Hydra: (Create) Nested Attributes
  • 40. DAMS Hydra Head: Complex Objects
  • 41. Next Steps • Beta release: late October • Production release: January • Future: – Sufia/Curate integration for administrative functionality – Additional linked data integration and crosswalks: Schema.org, OpenURL, Dublin Core, ResourceSync – Fedora 4
  • 42. More Information • DAMS Overview: https://github.com/ucsdlib/dams/wiki/DAMS-Manual • DAMS Hydra Head: https://github.com/ucsdlib/damspas • DAMS Ontology: https://github.com/ucsdlib/dams/tree/master/ontology • DAMS REST API: https://github.com/ucsdlib/dams/wiki/REST-API • Hot Topics Series 3: Get a Head on the Repository with Hydra: http://duraspace.org/hot-topics • Hydra Technical Overview: https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts • OneFS Technical Overview: http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf • Isilon Overview: http://www.emc.com/collateral/software/data-sheet/h10541-ds-isilon-platform.pdf
  • 43. Coming Up Next Final Webinar (October 31) The researcher perspective from two of our pilot participants Dick Norris – Professor, Scripps Institution of Oceanography Rick Wagner – Data Scientist at San Diego Supercomputer Center
  • 44. Questions? Thanks! Declan Fleming @declan | dfleming@ucsd.edu Arwen Hutt @arwenh | ahutt@ucsd.edu Matt Critchlow @mattcritchlow | mcritchlow@ucsd.edu
