Business case and cost modelling for an end-to-end RDM service
Frances Madden, Research Data and Curation Manager
Dave Cobb, Project Manager
Background
• Research intensive university
• 21 academic departments, 9,000 students, 2,000 staff
• Departments include Information Security, Physics,
Psychology
• Large arts and humanities faculty
Royal Holloway, University of London
• Have a policy, need an end-to-end RDM service
• Initial investigation four months (underestimate!)
RDM Project & deliverables
Plan Create Preserve
Overarching Service Management & Support
Planning System
(improve DMPOnline)
Active Data Storage
Collaboration Tools
Data Catalogue
Staging storage
Archive Data storage
(digital & analogue)
Preservation System
Business Case
• Funder mandates, heavily EPSRC & other Research
Council funded institution
• Improve infrastructure to facilitate research
excellence
• Improve research impact
• Be a partner of choice
Key Drivers
• Business analysis
• Focus group, interviews,
surveys
• Market investigation
• Events, networking,
presentations, Gartner
reports
• Supplier engagement
• Onsite visits, webinars,
events
• Guidance & consultancy
• TNA, DCC, ICO, Gartner,
specialist advice
Options Appraisal
Requirements led project – uptake is key
• Engaged the DCC for consultancy services
• Conducted funder analysis
• 75% of funded research is covered by OD requirements
• Conducted DAF/CARDIO Lite survey
• Circulated to all academic staff, 100+ responses
• Comparison with 2014 survey
• Informed risk and cost based business case
Consultancy
Findings
[Pie chart: Data locations. Categories: Hard drive, External hard drive, Cloud storage, Physical media, Central shared drive, Email, Commercial. Values shown: 42%, 19%, 9%, 14%, 8%, 2%, 1%, 5%]
[Bar chart: Deposit locations for data. Categories: Held internally, Project or other website, Third party data service, Other. Values shown: 40, 12, 18, 15]
Findings
[Pie chart: Funding sources. Categories: RCUK funding, EU Government, Charity funding, Overseas, Other EU, British Academy, Government funding (inc. NHS), Industry. Values shown: 63%, 11%, 10%, 6%, 4%, 3%, 2%, 1%]
[Bar chart: Data Volumes. Bands: Less than 1GB, 1-10GB, 10-100GB, 100-500GB, 500GB-1TB, 1-10TB, Over 10TB. Axis: 0 to 30 (bar values not preserved)]
Costs
• Assessed costs of delivery options for Active and
Archive storage
• Received number of existing users of Cloud services
• Developed extensive business requirements for all
deliverables
• Still did not know how much storage was required!
Supplier and User Engagement
This became this….
[Bar chart: Data Volumes. Bands: Less than 1GB, 1-10GB, 10-100GB, 100-500GB, 500GB-1TB, 1-10TB, Over 10TB. Axis: 0 to 30 (bar values not preserved)]
And this….
Methodology – User types
• Decided on low, medium, high and massive categories of users
• Based on survey findings, scaled up to work out the quantities of storage required across the College

Classification | % of users | Example
Source
  PI/Research staff | 90%e | N/A
  Machine Generated | 10%e | Psychology – MRI scanner
Storage size requirements
  Low (< 100GB) | 60% | Economics – spreadsheets; most humanities; mathematical modelling
  Medium (100GB-1TB) | 28% | Condensed matter physics
  High (1TB – 10TB) | < 9% | Computer Science – data generated from algorithms
  Massive (> 10TB) | < 3% | Physics – data received from CERN
Sensitivity
  General | 75% | N/A
  Restricted | 25% | PIR – defence policy work; Earth Sciences – commercial partnerships
  ∟ Restricted (PII) | 20% | Psychology – parent & child research; Geography – video research
Location
  Online (local) | 5% | N/A
  Online (remote) | 91% | N/A
  Offline | 4% | Geography
Bandwidth & I/O
  High | 5%e | Media Arts; Psychology – MRI scanner
  Standard | 95%e | N/A
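The scaling step described on this slide can be sketched as a small calculation. This is only an illustration under stated assumptions, not the project's actual model: the band shares come from the table above (with "< 9%" and "< 3%" treated as 9% and 3%), while the per-user volumes in ASSUMED_TB_PER_USER and the function name estimate_storage_tb are hypothetical placeholders chosen to fall inside each band.

```python
# Share of users in each storage band, from the user-types table.
# "< 9%" and "< 3%" are treated as upper bounds of 9% and 3%.
BAND_SHARE = {"low": 0.60, "medium": 0.28, "high": 0.09, "massive": 0.03}

# ASSUMPTION: a representative volume per user (in TB) for each band.
# These are illustrative placeholders, not figures from the project.
ASSUMED_TB_PER_USER = {"low": 0.05, "medium": 0.5, "high": 5.0, "massive": 20.0}


def estimate_storage_tb(user_count, shares=BAND_SHARE, tb_per_user=ASSUMED_TB_PER_USER):
    """Scale band shares up to a user population.

    Returns (total_tb, per_band_tb), where per_band_tb maps each band
    to its estimated storage in TB.
    """
    per_band = {
        band: user_count * share * tb_per_user[band]
        for band, share in shares.items()
    }
    return sum(per_band.values()), per_band


# Example: compare two of the user populations considered for licensing.
for label, qty in [("Research academics", 250), ("All research staff", 750)]:
    total, _ = estimate_storage_tb(qty)
    print(f"{label}: roughly {total:.0f} TB under the assumed per-user volumes")
```

With any choice of per-user volumes, the small share of high and massive users is likely to dominate the total, which is one reason to treat those users separately from a standard offering.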
• Calculated % which have external repositories
available based on funding councils
• Validated against estimated output per department
• Worked with R&E and academics to determine:
• Levels of personally identifiable information
• Quantity of unfunded research
• Collaboration requirements
• Offline access requirements
Methodology – User types cont.
Key stats
• 15-30% Personally identifiable
information
• 95% require collaboration capability
• 95% require offline or remote access
• 40% of research unfunded (based on
TRAC return, compared with funding
levels)
• 10% of research data produced
directly from equipment
[Pie chart: No. of grants and repositories. Categories: RCUK funded - Repository, RCUK funded - No repository, EU Funded, Funded - other (DR), Funded - other (NDR). Values shown: 29%, 42%, 11%, 13%, 5%]
Methodology – Storage requirements
• Storage allocations throughout
the lifecycle investigated and
costed
• Investigated delivery options for
archive and active
• Assessed options for different
numbers of user licences
• Appraised data for archive
storage – reducing data
volumes
User type | Qty
Research academics | 250
All research staff | 750
All research staff + RCUK students | 1000
All research staff & students | 2100
Funded research project | 240
Unfunded research project | 159
All research projects | 399
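As a quick consistency check on the project counts above, the arithmetic can be sketched as follows. The function names are hypothetical; the slides do not say whether the unfunded count was derived from the 40% figure or measured independently.

```python
def total_projects(funded, unfunded):
    """Total number of research projects."""
    return funded + unfunded


def unfunded_share(funded, unfunded):
    """Fraction of all research projects that are unfunded."""
    return unfunded / total_projects(funded, unfunded)


# Counts taken from the table above.
FUNDED = 240
UNFUNDED = 159

print(f"All research projects: {total_projects(FUNDED, UNFUNDED)}")
print(f"Unfunded share: {unfunded_share(FUNDED, UNFUNDED):.1%}")
```

The resulting share is close to the "40% of research unfunded" key stat, so the two slides agree with each other.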
Methodology – Other costs
• Integrations
• CRIS (Pure)
• DMPOnline
• Reporting tools
• New applications
• Service levels
• Staffing levels
• Application support
• Advisory service
• Project costs vs on-going costs
• Allocated vs. project costs
• Cost recovery
• Estimated 35% return over 4-5 year grant
lifecycle
• Based on TRAC return
• Overhead cost
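The cost recovery point can be illustrated with a rough sketch. It assumes that "35% return" means 35% of the service cost is recovered over the grant lifecycle, which is one possible reading of the slide. The annual cost used below is a made-up placeholder, and recovery_split is a hypothetical helper name.

```python
def recovery_split(annual_cost, years, recovery_rate=0.35):
    """Split the lifecycle cost into recovered and unrecovered parts.

    ASSUMPTION: recovery_rate is the fraction of total cost recovered
    over the whole grant lifecycle (35% per the slide, as read here).
    """
    total = annual_cost * years
    recovered = total * recovery_rate
    return {"total": total, "recovered": recovered, "unrecovered": total - recovered}


# Hypothetical annual service cost; NOT a figure from the project.
HYPOTHETICAL_ANNUAL_COST = 100_000

for years in (4, 5):
    split = recovery_split(HYPOTHETICAL_ANNUAL_COST, years)
    print(
        f"{years}-year lifecycle: total {split['total']:,.0f}, "
        f"recovered {split['recovered']:,.0f}, "
        f"left to fund {split['unrecovered']:,.0f}"
    )
```

Under this reading, most of the cost still has to be met from other budgets, and the recovered portion arrives only as grants progress through their lifecycle.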
Findings
• Balancing cost and control is a challenge
• Data security work found that the perceived risks of Cloud can be mitigated and managed
• Data provision as it stands is not adequate
• Cloud = higher OpEx – are you a CapEx or OpEx institution?
• Very time-consuming
• Involves a lot of people, disparate information sources - chase these down and use them
• More confidence in decisions
• More involvement with stakeholders across academic and professional services departments
• Use expert sources like Gartner
Recommendation/Next steps
• Procure data catalogue and archive storage in this financial year, before Aug 2016
• Procure active storage solution to roll out in Sept – Jan 2016
• Integrate planning tool with other systems to increase knowledge of planned research activities
• Develop service around the system to support it and researchers using it
Still need to…
• Rate card for exceptional requirements
• Develop process for managing exceptional requirements
• Training for admin and end users on all new systems
• Plan for analogue data storage
Questions?
Frances Madden
frances.madden@rhul.ac.uk
@maddenfc
Dave Cobb
d.cobb@rhul.ac.uk
@dave_cobb

Editor's Notes

  • #4 FM
  • #5 DC Make mention of analogue and did work but ruled out of scope.
  • #7 FM
  • #8 DC Magic quadrants & preservation technology maturity discussions; RDMF, LARD, Westminster Briefing, UCISA, JISC events
  • #9 FM
  • #10 FM
  • #11 DC and FM Data volumes are the same for a year and over a lifetime – issue in survey sampling. Researchers don’t know!
  • #13 DC Per-TB costs; on-prem vs cloud; differing security models; ISOs etc.
  • #14 DC What is this data, categorisation and quantification, how much do we actually need to store, how do we calculate a standard offering.
  • #15 Calculating appraisal reduction potential, as Frances will mention. Shaping vs not shaping the offering; managing unfunded data; calculating research project opening and closure rate per year; margins for error.
  • #16 FM and DC
  • #17 FM
  • #18 FM/DC TRAC smaller, unfunded – as calculations are based on overall data and we built in a margin for error
  • #19 FM – general considerations DC – the numbers – students as request service initially. Investigating global service.
  • #20 DC and FM TRAC: accounts for annual fluctuations in costs; penalties for claiming higher overheads; delays to include OpEx in overhead costs; delay from grant lifecycle – up to 8 years to see full potential of return.