To architect or engineer?
Lessons from DataPool on
building RDM repositories



Steve Hitchcock, JISC DataPool Project
9th DCC Research Data Management Forum (RDMF9)
Cambridge, 14-15 November 2012
Why architecting?




                    http://datapool.soton.ac.uk
DataPool architecture (SharePoint)




    Peter Hancock, iSolutions, University of Southampton
DataPool
Building Capacity, Developing Skills, Supporting Researchers
Progress: October 2011 to March 2013

Strands: Policy and guidance; Training; Data repository

• Training: Doctoral Training Centres; Graduate & staff training services
• University Strategic Research Groups
• Case studies +: Imaging, 3D; Geodata; ++
• Data repository: SharePoint; EPrints 3.3; EPrints data apps; 3-layer metadata
• Informed by: IDMB; surveys of data practices among academics
• Developing/working with, e.g.: support for Data Management Plans; capture/share with external sources, e.g. SWORD-ARM; large-scale data storage; assign DataCite DOIs

JISCMRD Progress Workshop, 24-25 October 2012, Nottingham
Byatt, D. (D.R.Byatt@soton.ac.uk)
Hitchcock, S. (sh94r@ecs.soton.ac.uk)
White, W. (whw@soton.ac.uk)
http://datapool.soton.ac.uk/
Data repository platforms
From a data repository perspective:

Architected
• DataFlow
• MS SharePoint
• EPrints
Engineered

Other platforms available
• DSpace, CKAN, data.bris, etc.
Implementations of DataFlow Model
DataStage → SWORD → Curated repository/archive
• DataBank
• EPrints
• DSpace (QMUL)

Two-stage architecture
Addresses Dropbox effect for data producers

DataFlow: two data deposit motivations for creators: want to (practice), need to (policy)
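
The hand-off from DataStage to a curated repository in this model goes over SWORD. As a rough illustration only, the Python sketch below prepares a zipped data package and the HTTP request for a SWORD v2 binary deposit. The collection URL, file names and function names are hypothetical and are not taken from DataFlow or DataStage; the headers follow the SWORD v2 profile as commonly described, and the request is built but never sent.

```python
import hashlib
import io
import zipfile
from urllib.request import Request

# Hypothetical collection endpoint; a real repository advertises its own
# collection URLs in its SWORD service document.
COLLECTION_URL = "https://repository.example.org/sword2/collection/data"


def build_data_package(files):
    """Zip a {filename: bytes} mapping in memory and return the zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in sorted(files.items()):
            archive.writestr(name, content)
    return buffer.getvalue()


def build_sword_deposit(package, filename, in_progress=False):
    """Prepare (but do not send) a SWORD v2 binary deposit request."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Content-MD5": hashlib.md5(package).hexdigest(),
        "Packaging": "http://purl.org/net/sword/package/SimpleZip",
        # "true" would leave the deposit open for further additions.
        "In-Progress": "true" if in_progress else "false",
    }
    return Request(COLLECTION_URL, data=package, headers=headers, method="POST")


package = build_data_package(
    {"readme.txt": b"Example dataset", "data.csv": b"x,y\n1,2\n"}
)
request = build_sword_deposit(package, "dataset.zip")
print(request.get_method(), request.full_url)
for key, value in request.header_items():
    print(f"{key}: {value}")
```

Sending the request (for example with `urllib.request.urlopen`) would also need authentication, which depends on the receiving repository and is left out here.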
DataStage: Upload file




                               DataStage was developed at the University of Oxford
DataStage screenshots courtesy JISC Kaptur project http://www.vads.ac.uk/kaptur/
                                                            Thanks to Carlos Silva
DataStage: Submit as data package
3-layer metadata model




                         Takeda et al., 6th IDCC, Dec. 2010
        available from http://eprints.soton.ac.uk/169533/
    JISC Institutional Data Management Blueprint (IDMB)
                        Project, University of Southampton
SharePoint user interface 1: project
SharePoint user interface 2: data




                 + fields for format, keywords
Prof. Simon Cox (Engineering) on SharePoint
“The concept that formed part of SP thinking (at Southampton) from the very inception … that ability to use SP as a way to manage or at least collaborate as part of a 5-10 year programme of work.

“The other side is what we’re doing with intellectual property and what we’re offering for students. I chair a group design project, and every single student has said ‘I just do it all on Dropbox’. The same is happening with our research. So I think we have at least to provide a level of service and a level of integration between our research experience and our teaching experience. Would these people go to Southampton rather than University of Nowhereshire on the Web or the University of Google or the University of Dropbox? These are deep questions for us.”
ePrintsSoton: Item type: Dataset




            Currently EPrints v3.2, customised to ePrintsSoton
                                 Dataset Item Type from 2007
ePrintsSoton: start to deposit Dataset
EPrints data apps




   Apps available from EPrints Bazaar http://bazaar.eprints.org/
                            Apps work with EPrints v3.3 or later
EPrints (test repo) DataShare enabled




            App by Tim Brody, EPrints + DataPool
EPrints (test repo) Data Core enabled




Data Core “adds a few fields and doesn’t remove any fields from the eprint object. It creates an alternate workflow for datasets which is much smaller than a normal eprints workflow.”
                         App by Patrick McSweeney
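
The idea in the Data Core description (extend the record schema without removing anything, then present a shorter deposit workflow for datasets) can be sketched as follows. This is an illustration in Python only, not EPrints code (EPrints itself is written in Perl), and every field name below is hypothetical.

```python
# Hypothetical field names; a real EPrints schema is larger and named differently.
BASE_FIELDS = [
    "title", "creators", "abstract", "date", "publisher",
    "subjects", "keywords", "note",
]

# A few extra fields added for datasets (hypothetical).
DATASET_EXTRA_FIELDS = ["data_format", "collection_period"]

# The full schema keeps every base field and appends the extras.
SCHEMA = BASE_FIELDS + DATASET_EXTRA_FIELDS

# The alternate dataset workflow asks for a much smaller subset of the schema.
DATASET_WORKFLOW = ["title", "creators", "abstract", "data_format", "collection_period"]


def workflow_fields(item_type):
    """Return the fields a depositor is asked for, given the item type."""
    if item_type == "dataset":
        return DATASET_WORKFLOW
    return SCHEMA


print(len(workflow_fields("article")), "fields in the normal workflow")
print(len(workflow_fields("dataset")), "fields in the dataset workflow")
```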
EPrints (test repo) Data Core enabled 2




                        App by Patrick McSweeney
Essex Research Data metadata profile aims
“Using metadata schema relevant to UK HE and
research data (DataCite, INSPIRE and DDI
2.1), we have developed a basic metadata profile
suited to describing research data generated at
institutions with disciplinary diversity. The
inclusion of fields like Funder and Grant number
will ensure future harvesting and linking
opportunities (like RCUK Research Outcome
Systems). The metadata also suits the EPSRC data
registry requirements.”
http://researchdataessex.posterous.com/repository-beta-metadata-profile-released
EPrints: Essex Research Data repository




Screenshots courtesy JISC Research Data @Essex project
Thanks to Louise Corti, Tom Ensom, Alexis Wolton

                       EPrints v3.3.10, customised to Essex Research Data
                                          http://researchdata.essex.ac.uk/
Essex Research Data record
Essex Research Data: observations

• Assumes data deposit, so no selection of EPrints Item Type
• No selection of e.g. Creative Commons licence, just copyright
• Requirement for Time Period suggests particular type of data expected
• Fields for Geographic info (not required) suggests particular type of data expected
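
To make the required/optional distinction in these observations concrete, here is a minimal Python sketch of a dataset record checked against a small profile. The field names and the choice of which fields count as required are assumptions for illustration (only the required Time Period and the optional Geographic info come from the observations above); this is not the Essex profile itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    # Field names are hypothetical stand-ins for profile fields.
    title: str = ""
    creators: List[str] = field(default_factory=list)
    funder: str = ""
    grant_number: str = ""
    time_period: str = ""                       # required in this sketch
    copyright_holder: str = ""
    geographic_coverage: Optional[str] = None   # optional in this sketch


# Assumed set of required fields, for illustration only.
REQUIRED_FIELDS = ("title", "creators", "time_period", "copyright_holder")


def missing_required(record):
    """Return the names of required fields that are empty on the record."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


record = DatasetRecord(
    title="Example survey data",
    creators=["Example, A."],
    funder="Example Funder",
    grant_number="EX/000000/1",
    copyright_holder="Example University",
)
print(missing_required(record))
```

A record without a time period fails the check even though geographic coverage can be left empty, which is how a profile signals the kind of data it expects.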
Architects and surroundings
Nine Elms, London
usembassylondon
“On one plot aggressively crystalline blocks by Rogers Stirk Harbour are going up, their diamond shapes having nothing in particular to do with anything around them. On another Foster and Partners have designed a series of curving, stepped, blobby things, of the kind usually designed to take advantage of views on the Med or the Gulf, but are here facing each other like rows of daleks. Again, it shows little interest in anything around it.”
                  R. Moore, Utopia on Thames, Observer, 11 Nov 2012
Open access repository interoperability

Confederation of Open Access Repositories (COAR)
Dublin Core, CRIS-CERIF
OpenAIRE, RepositoryNet+, Rioxx
RCUK: Research Outcomes System, Gateway to
Research, REF

Is there the same current debate about
interoperability of data repositories?
COAR on OA interoperability
Specific initiatives designed to support interoperability: AuthorClaim, CRIS-OAR, DataCite, DINI Certificate for Document and Publication Services, DOI, DRIVER, Handle System, KE Usage Statistics Guidelines, OAI-ORE, OAI-PMH, OA-Statistik, OA Repository Junction, OpenAIRE, ORCID, PersID, PIRUS, SURE, SWORD, and UK RepositoryNet+.
COAR, The Current State of Open Access Repository Interoperability (2012), 26 Oct. 2012 v.02

   MT @gknight2000 (Gareth Knight) Lincoln's CKan
   instance impressive bit.ly/QQd1au Doesn't appear to
   support OAIPMH or preservation function #jiscmrd
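
For readers unfamiliar with what OAI-PMH support involves, the Python sketch below builds a `ListRecords` request URL and pulls Dublin Core titles out of a response. The base URL and the sample response are made up for illustration; the verb, the `oai_dc` metadata prefix and the XML namespaces are those defined by OAI-PMH 2.0 and Dublin Core. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_xml):
    """Return every dc:title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", NAMESPACES)]


print(list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A repository that answers requests of this shape can be harvested by aggregators without any platform-specific client, which is the kind of interoperability the tweet finds missing.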
What next for DataPool repositories?
SharePoint
• User test and feedback sessions scheduled, will
direct further development
EPrints apps (1 or 2 of the following, initially)
• Develop app based on Essex data repository, providing other repositories with a 1-click install of this profile
• Build interoperability (I/O) apps: e.g. Data Management Plans, Dropbox
• Automate record capture for producers of large-scale, regular data outputs

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

To architect or engineer? Lessons from DataPool on building RDM repositories

  • 1. To architect or engineer? Lessons from DataPool on building RDM repositories Steve Hitchcock, JISC DataPool Project 9th DCC Research Data Management Forum (RDMF9) Cambridge, 14-15 November 2012
  • 2. Why architecting? http://datapool.soton.ac.uk
  • 3. DataPool architecture (Sharepoint) Peter Hancock, iSolutions, University of Southampton
  • 4. DataPool: Building Capacity, Developing Skills, Supporting Researchers. Progress: October 2011 to March 2013. Strands: Policy and guidance; Training; Data repository. Informed by: IDMB; surveys of data practices among academics. Training: Doctoral Training Centres; graduate & staff training services. Case studies +: Imaging, 3D; Geodata; ++; University Strategic Research Groups. Data repository: SharePoint; EPrints 3.3; EPrints data apps; 3-layer metadata. Developing/working with, e.g.: support for Data Management Plans; capture/share with external sources, e.g. SWORD-ARM; large-scale data storage; assign DataCite DOIs. JISCMRD Progress Workshop, 24-25 October 2012, Nottingham. Byatt, D. (D.R.Byatt@soton.ac.uk), Hitchcock, S. (sh94r@ecs.soton.ac.uk), White, W. (whw@soton.ac.uk). http://datapool.soton.ac.uk/
  • 5. Data repository platforms. From a data repository perspective: Architected • DataFlow • MS Sharepoint • EPrints Engineered. Other platforms available • DSpace, CKAN, data.bris, etc.
  • 6. Implementations of DataFlow. Model: DataStage, SWORD, curated repository/archive. DataFlow: two data deposit motivations for creators: want to (practice), need to (policy). Two-stage architecture. Addresses Dropbox effect for data producers. Curated repository/archive: DataBank, EPrints, DSpace, QMUL
  • 7. DataStage: Upload file DataStage was developed at the University of Oxford DataStage screenshots courtesy JISC Kaptur project http://www.vads.ac.uk/kaptur/ Thanks to Carlos Silva
  • 8. DataStage: Submit as data package
  • 9. 3-layer metadata model Takeda et al., 6th IDCC, Dec. 2010 available from http://eprints.soton.ac.uk/169533/ JISC Institutional Data Management Blueprint (IDMB) Project, University of Southampton
  • 11. SharePoint user interface 2: data + fields for format, keywords
  • 12. Prof. Simon Cox (engng) on Sharepoint “The concept that formed part of SP thinking (at Southampton) from the very inception … that ability to use SP as a way to manage or at least collaborate as part of a 5-10 year programme of work. “The other side is what we’re doing with intellectual property and what we’re offering for students. I chair a group design project, and every single student has said ‘I just do it all on Dropbox’. The same is happening with our research. So I think we have at least to provide a level of service and a level of integration between our research experience and our teaching experience. Would these people go to Southampton rather than University of Nowhereshire on the Web or the University of Google or the University of Dropbox? These are deep questions for us.”
  • 13. ePrintsSoton: Item type: Dataset Currently EPrints v3.2, customised to ePrintsSoton Dataset Item Type from 2007
  • 14. ePrintsSoton: start to deposit Dataset
  • 15. EPrints data apps Apps available from EPrints Bazaar http://bazaar.eprints.org/ Apps work with EPrints v3.3 or later
  • 16. EPrints (test repo) DataShare enabled App by Tim Brody, EPrints + DataPool
  • 17. EPrints (test repo) Data Core enabled Data Core “adds a few fields and doesn’t remove any fields from the eprint object. It creates an alternate workflow for datasets which is much smaller than a normal eprints workflow.” App by Patrick McSweeney
  • 18. EPrints (test repo) Data Core enabled 2 App by Patrick McSweeney
  • 19. Essex Research Data metadata profile aims “Using metadata schema relevant to UK HE and research data (DataCite, INSPIRE and DDI 2.1), we have developed a basic metadata profile suited to describing research data generated at institutions with disciplinary diversity. The inclusion of fields like Funder and Grant number will ensure future harvesting and linking opportunities (like RCUK Research Outcome Systems). The metadata also suits the EPSRC data registry requirements.” http://researchdataessex.posterous.com/repository-beta-metadata-profile-released
  • 20. EPrints: Essex Research Data repository Screenshots courtesy JISC Research Data @Essex project Thanks to Louise Corti, Tom Ensom, Alexis Wolton EPrints v3.3.10, customised to Essex Research Data http://researchdata.essex.ac.uk/
  • 22. Essex Research Data: observations •Assumes data deposit, so no selection of EPrints Item Type • No selection of e.g. Creative Commons licence, just copyright • Requirement for Time Period suggests particular type of data expected • Fields for Geographic info (not required) suggests particular type of data expected
  • 23. Architects and surroundings “On one plot aggressively crystalline blocks by Rogers Stirk Harbour are going up, their diamond shapes having nothing in particular to do with anything around them. On another Foster and Partners have designed a series of curving, stepped, blobby things, of the kind usually designed to take advantage of views on the Med or the Gulf, but are here facing each other like rows of daleks. Again, it shows little interest in anything around it.” R. Moore, Utopia on Thames, Observer, 11 Nov 2012. Image: Nine Elms, London (usembassylondon)
  • 24. Open access repository interoperability Confederation of Open Access Repositories (COAR) Dublin Core, CRIS-CERIF OpenAIRE, RepositoryNet+, Rioxx RCUK: Research Outcomes System, Gateway to Research, REF Is there the same current debate about interoperability of data repositories?
  • 25. COAR on OA interoperability Specific initiatives designed to support interoperability: AuthorClaim, CRIS-OAR, DataCite, DINI Certificate for Document and Publication Services, DOI, DRIVER, Handle System, KE Usage Statistics Guidelines, OAI-ORE, OAI-PMH, OA-Statistik, OA Repository Junction, OpenAIRE, ORCID, PersID, PIRUS, SURE, SWORD, and UK RepositoryNet+. COAR, The Current State of Open Access Repository Interoperability (2012), 26 Oct. 2012 v.02 MT @gknight2000 (Gareth Knight) Lincoln's CKan instance impressive bit.ly/QQd1au Doesn't appear to support OAIPMH or preservation function #jiscmrd
  • 26. What next for DataPool repositories? Sharepoint • User test and feedback sessions scheduled, will direct further development EPrints apps (1 or 2 of following, initially) • Develop app based on Essex data repository, providing other repositories with a 1-click install of this profile • Build interoperability (I/O) apps: e.g. Data Management Plans, Dropbox • Automate record capture for producers of large-scale, regular data outputs
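
Slide 25 lists OAI-PMH among the interoperability initiatives, and the quoted tweet uses its absence as a test of a data repository. As a rough illustration of what supporting it involves, the Python sketch below builds a ListRecords request URL and reads a Dublin Core title out of a response. The endpoint address, the sample response and the function names are hypothetical and hand-written for this example; nothing here is taken from a DataPool, EPrints or CKAN installation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def summarise_records(response_xml):
    """Return (identifier, title) pairs for each record in a response."""
    root = ET.fromstring(response_xml)
    summary = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        summary.append((identifier, title))
    return summary


# Hypothetical response fragment, written by hand for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real repository.
    print(list_records_url("https://repository.example.org/oai"))
    print(summarise_records(SAMPLE_RESPONSE))
```

A harvester or search service would fetch the URL and page through the results; the point of the sketch is only that a repository exposing this interface can be read by any such service without knowing which platform sits behind it.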

Editor's Notes

  1. I thank Graham Pryor of DCC, organiser of this RDMF9 meeting, for inviting this talk, and for suggesting this topic based, presumably, on this project blog post. It sets out some of the higher-level issues while avoiding the trap of setting up a straw man pitting Sharepoint versus EPrints.
  2. That blog post included this architectural diagram, produced by Peter Hancock, director of the iSolutions IT service provider at the University of Southampton. Although it leans heavily towards referencing Sharepoint, it can be viewed as a high-level reference model, analogous to the OAIS in digital preservation, and therefore as a model that can embrace other repository types.
  3. Before we get into the detail of the presentation, here is a poster-based summary of the DataPool Project. It has a tripartite approach characteristic of similar institutional projects in the JISC MRD programme, covering data policy, training and, the area of interest here, building a data repository. It is worth noting as well, in this context, that the development partners shown in the row beneath the tripartite elements effectively represent ways of getting data in and out of the RDM service adopted, and are relevant factors in the repository design.
  4. Here is how the different repository platforms might line up on a broad spectrum of Architected vs Engineered. This is a rough-and-ready approach to illustrate the basic point. Also included is DataFlow, from the University of Oxford, perhaps the most innovative repository platform to have emerged for RDM. Given its originality, it appears towards the architected end of the spectrum. We could not claim that Sharepoint is a new software platform in the same way as DataFlow, but from an RDM perspective you don’t get anything out of the box – you have to start from scratch and ‘architect’ an RDM solution. What developers can do is try and ‘engineer’ the designed RDM element with the IT services already provided in Sharepoint. EPrints first appeared in 2001 to manage research publications. It has offered a ‘dataset’ deposit type since 2007, so provides a ready-made solution for an RDM repository, and can be ‘engineered’ to enhance that solution. As the slide notes, other RDM repository platforms are available. In the following slides we will explore the features of our three highlighted RDM platforms, starting with DataFlow.
  5. DataFlow is a two-stage architecture for data management: an open (Dropbox-like) space for data producers (DataStage), and a managed and curated repository (DataBank), connected by a standard content transfer protocol, SWORD. While DataBank provides a bespoke data management service for Oxford, we have recently noted experiments to connect an open source version of DataStage with EPrints- and DSpace-based curated repositories, thus providing the yearned-for Dropbox functionality that is apparently so in demand with research data producers.
  6. This is an example screenshot from the DataStage-EPrints experimental arrangement used by the JISC Kaptur project. It shows the Choose File and Upload button combination, familiar to Wordpress blog users among others, for uploading data. Uploaded data is then shown in a conventional file manager list.
  7. Moving data from DataStage to the curated repository, again shown in the experimental Kaptur implementation, uses this surprisingly simple SWORD client interface. If this seems an insufficient description for a curated item, presumably a more detailed SWORD client could be substituted.
  8. One basis for building a more comprehensive description, or metadata, for research data is this 3-layer model produced by the Institutional Data Management Blueprint (IDMB) Project, the project that preceded DataPool at the University of Southampton. This is quite a general-purpose and flexible model, perhaps with more flexibility than meaning. Structurally, nevertheless, we will see that this has some relevance to repository deposit workflow design.
  9. The 3-layer metadata model can be seen quite clearly in the emerging user interface for data deposit built on Sharepoint. Here we see the interface for collecting project descriptions, used once per project and then linked to each data record produced by the project.
  10. In the same style, here is the Sharepoint user interface for collecting data descriptions. One of the most noticeable features within both the Project and Data forms is the small number of mandatory fields (indicated with a red asterisk), just one on each form. Mandatory fields have to be filled in for the form to submit successfully. Most people will have experienced these fields: when completing a Web shopping form, any that are missed are invariably returned with a red text warning. In this case you could feasibly submit a project or data description containing only a title. Aspects such as this are shortly to be subjected to user testing and review of this implementation.
  11. Sharepoint has its detractors as an IT service platform, principally bemoaning its complexity-to-functionality ratio. Prof Simon Cox from Southampton University takes the opposite view passionately. This is an extract from his intervention at a DataPool Steering Group meeting (May 2012) putting the case for Sharepoint. It is a good way of understanding the wider strengths of Sharepoint, which may not be immediately apparent to users of particular Sharepoint services. Building the range of services suggested is a difficult and long-term project.
  12. EPrints supports the deposit of many item types, including datasets since 2007. When you open a new deposit process in EPrints you will first be shown this screen, where you can select an item type such as ‘dataset’.
  13. Selecting ‘dataset’ will take you to this next screen, which might look something like this from ePrintsSoton, the Southampton Institutional Repository. This is not quite a default screen for standard EPrints installs; the workflow and fields have been customised in some areas by a repository developer.
  14. EPrints users need not be restricted to standard interfaces or interfaces customised to a repository requirement. Interfaces in EPrints can be added or amended by simply installing an app from the app store, or Bazaar. Unlike the Apple app store, with which it might optimistically be compared, EPrints apps are not selected to be installed by users but installation is authorised by repository managers. There are already two apps for those managers to choose to suit particular RDM workflow requirements: DataShare and Data Core. More data apps are expected to follow. EPrints is thus being engineered for flexibility in RDM deposit. In the following slides we will explore these first two data apps.
  15. DataShare makes some minor modifications to the default EPrints workflow for deposit of datasets, highlighted with red circles here.
  16. Data Core aims to implement a minimal ‘core’ metadata for datasets. Implementing this app will overwrite the default EPrints workflow, replacing it with the minimal set, approximately half of which is shown here (the remainder in the next slide). In addition, we have a short description of the design aims for Data Core, which are unavailable for Sharepoint data deposit and the DataShare app.
  17. Taking both slides showing the Data Core deposit workflow, this is comparable, in extent, with the Sharepoint ‘data’ interface shown earlier, although it has a few more mandatory fields.
  18. Another example of an EPrints data deposit interface has been developed at the University of Essex. Like Data Core, the Essex approach has explicit design objectives, based on aligning with other metadata initiatives to support multi-disciplinary data. In other words, this does not simply expand or reduce the default EPrints workflow for data deposit, but starts with a new perspective. We have been liaising with its development team to investigate the possibility of building this approach into an Essex EPrints app for other repositories to share.
  19. Here is a section of the Essex workflow, highlighting one area of major difference with the default workflow. It shows fields for time- and geographic-based information.
  20. We’ve looked at getting data into the repository, but not yet how it is displayed as an output, or a data record from the repository. This is one example. It is not the most revealing record, but could be expanded.
  21. Essex has cited specific design criteria for its research data repository. Additionally we have observed some characteristic features, indicated here. In particular, it is a data-only repository, without provision for other data-types offered by EPrints (shown in slide 13). The indication of mandatory fields adds a further layer of insight into the implementation of the design criteria.
  22. So far in this presentation we have seen different implementations of data repository deposit interfaces, including DataFlow, Sharepoint, and multiple interfaces for EPrints. Where is this heading, and what are the common themes? Since we are exploring the difference between architecting and engineering these repositories, I was interested to see this national newspaper article about a major redevelopment of an area close to central London, Nine Elms, an area that interests me as I pass through it on a regular basis. Phrases that stand out refer to the relationship between the planned new high-rise buildings. What does this have to do with data repositories?
  23. Interoperability is the relationship between repositories and how they interact with services, such as search, through shared metadata. If repositories have ‘nothing in particular to do with anything around them’ or ‘show little interest in anything around’ them, then they will not be interoperable. If repositories stand alone rather than interoperate then they become less effective at making their contents visible. Open access repositories have long recognised the importance of interoperability, being founded on the Open Archives Initiative (OAI) over a decade ago, and efforts to improve interoperability continue with current developments. Shown here are some current interoperability initiatives from one morning’s mailbox. Data repositories will be connected to this debate, but so far it has not been a priority in all the examples we have considered here.
  24. One of the organisations listed on the previous slide, COAR, produced a report that outlines more comprehensively the scope of current interoperability initiatives for open access. While some solutions to the capture of research data seen here have reasonably been ‘architected’, that is, starting with a blank sheet to focus on the specific design needs of data deposit, these will need to catch up quickly with interoperability requirements, including most of those listed here. Data repositories ‘engineered’ on a platform such as EPrints, originally designed for other data types, do not obviously lack the flexibility to accommodate research data, and by virtue of having contributed to repository interoperability since the original OAI, already support most of the requirements shown here.
  25. As for the DataPool Project, it will continue its dual approach of developing and testing both Sharepoint and EPrints apps. As a project it does not get to choose what is ultimately adopted to run the emerging research data repository at the University of Southampton. There are repository-specific factors that will determine that; but there are other organisational factors to take into account as well. Institutions seeking to build research data repositories that are clearly focussed on this range of factors are likely to have most success in implementing a repository to attract data deposit and usage.
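
Notes 7 to 10 describe a deposit that needs little more than a title, with a project description entered once and linked to each data record, and a SWORD transfer into the curated repository. The Python sketch below illustrates that shape under stated assumptions: the class and field names are hypothetical, only the project and data levels are modelled (not the full 3-layer model), and the Atom entry with Dublin Core terms is one plausible metadata payload for a SWORD-style deposit, not the format confirmed for the Kaptur, Sharepoint or EPrints implementations shown in the slides.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

# Namespaces from the Atom and DCMI Metadata Terms specifications.
ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


@dataclass
class ProjectRecord:
    """Project-level description, entered once and shared by its data records."""
    title: str                 # the single mandatory field in this sketch
    description: str = ""      # hypothetical optional field


@dataclass
class DataRecord:
    """Description of one deposited dataset, linked to its project."""
    title: str                 # the single mandatory field in this sketch
    project: ProjectRecord
    file_format: str = ""      # optional, as on the data form (format, keywords)
    keywords: list[str] = field(default_factory=list)

    def missing_mandatory(self):
        """List mandatory fields that are empty, as a form validator would."""
        missing = []
        if not self.title.strip():
            missing.append("title")
        if not self.project.title.strip():
            missing.append("project.title")
        return missing


def to_atom_entry(record):
    """Serialise a valid record as an Atom entry carrying Dublin Core terms."""
    problems = record.missing_mandatory()
    if problems:
        raise ValueError("mandatory fields missing: " + ", ".join(problems))
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = record.title
    # Link the data record back to the project description it belongs to.
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}isPartOf").text = record.project.title
    if record.file_format:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}format").text = record.file_format
    for keyword in record.keywords:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}subject").text = keyword
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    project = ProjectRecord(title="Coastal survey")          # invented example data
    record = DataRecord(title="Tide gauge readings", project=project,
                        keywords=["geodata"])
    print(to_atom_entry(record))
```

A record with only the two titles filled in serialises successfully, which mirrors the observation in note 10; tightening the deposit is then a matter of adding names to the mandatory check rather than redesigning the record.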