The document summarizes the results of a research data management (RDM) pilot project at Imperial College London. It describes how £100k in funding was provided for six academic projects to develop exemplars of best practices in RDM. The funded projects developed various tools and frameworks to improve data curation, sharing, and citation. Overall, the pilot demonstrated that innovative RDM is possible but also difficult and expensive to develop sustainably. It helped establish an initial RDM community at Imperial.
Green Shoots: Research Data Management Pilot at Imperial College London
1. Green Shoots:
RDM Pilot at Imperial College London
Ian McArdle
Head of Research Systems & Information
i.mcardle@imperial.ac.uk
Torsten Reimer
Project Manager: Open Access & RDM
t.reimer@imperial.ac.uk
Presenting projects by: M. Bearpark & C. Fare; G. Thomas, S. Butcher & C. Tomlinson; M. Mueller;
H. S. Rzepa, M. J. Harvey, N. Mason & A. Mclean; G. Gorman, C. T. Jacobs & A. Avdis; N. Jones
www.imperial.ac.uk/researchsupport/rdm/policy/greenshoots
IDCC15, 10th February 2015
2. Imperial College London
• Seven London campuses
• Four Faculties: Engineering,
Medicine, Natural Sciences and
Business School
• Ranked 2nd in world
(QS University Ranking)
• Net income (2014): £855m, incl. £351m research grants and contracts
• ~15,000 students, ~7,200 staff, incl. ~3,700 academic & research staff
• Staff publish ~10,000 scholarly articles per year
http://www.imperial.ac.uk/
3. College Position Statement
“Imperial College London is committed to promoting the
highest standards of academic research, including
excellence in research data management. This includes
a robust digital curation infrastructure that supports open
data access and protects confidential data.
Imperial acknowledges legal, ethical and commercial
constraints on data sharing and the need to preserve the
academic entitlement to publication.”
- Approved by Provost Board, publicised via Staff Briefing
4. Investing in RDM
“Green Shoots” scheme is born
So where, specifically, should the College invest?
Considering large research income and reputation, Imperial cannot
afford to “get it wrong”
College acknowledges that excellence in RDM will require
significant investment and academic engagement
5. “Green Shoots” Funding - £100K Investment
What did we want?
• Academically-driven projects to
demonstrate best practice in
RDM
• Specifically frameworks /
prototypes that would comply
with funder policies and College
position
• Frameworks could be based
either on original ideas or
integrating existing solutions into
the research process
• Projects that supported Open
Innovation and open access for
data
What did we hope to achieve?
• Encourage a “bottom-up”
approach to maximise use of
local early adopters and
innovators
• Generate solutions that could be
grown to support RDM more
widely
• Demonstrate that innovative,
academically-driven, beneficial
RDM is possible and to stimulate
this further
• Advice concerning how Imperial
should proceed in supporting
RDM
6. FUNDING OPPORTUNITY:
Research Data Management
More Information: http://www.imperial.ac.uk/researchstrategy/funding
Contact: Ian McArdle i.mcardle@imperial.ac.uk
Submission Deadline: Friday 28th March 2014
Funding is available for academically-driven projects to identify
and generate exemplars of best practice in Research Data
Management (RDM), specifically frameworks and prototypes
that comply with key funder RDM policies and the College
position.
There is an expectation that solutions will support open access
for data; solutions that support Open Innovation are
strongly encouraged.
7. Funded Projects
• Haystack – A Computational Molecular Data Notebook
• M. Bearpark & C. Fare
• The Imperial College Tissue Bank: A Searchable Catalogue for Tissues, Research
Projects and Data Outcomes
• G. Thomas, S. Butcher & C. Tomlinson
• Integrated Rule-Based Data Management System for Genome Sequencing Data
• M. Mueller
• Research Data Management in Computational and Experimental Molecular
Science
• H. S. Rzepa, M. J. Harvey, N. Mason & A. Mclean
• Research Data Management: Where Software Meets Data
• G. Gorman, C. T. Jacobs & A. Avdis
• Research Data Management: Placing [Time Series] Data in its Context
• N. Jones
8. Haystack – A Computational Molecular Data Notebook
M. Bearpark & C. Fare
Idea
• Extend a working prototype of a computational chemical IPython notebook
and make it available to all on GitHub
Achievements
• Installation is now much simplified
• A tree document structure has been implemented
• Calculations using mainstream computational chemistry software can be set
up
• Calculations can be submitted to run on a high-performance computing cluster
• Data from completed calculations can be retrieved and visualised
RDM Benefits
• Enables computational molecular researchers to easily share a curated subset
of their results and document how those results were generated
More Information
• http://github.com/clyde-fare/cc_notebook
9. Imperial College Tissue Bank: A Searchable Catalogue for Tissues,
Research Projects and Data Outcomes
G. Thomas, S. Butcher & C. Tomlinson
Idea
• Extend the ICH tissue bank infrastructure to accept and catalogue research data
alongside the collection of 60,000 physical tissue specimens and donor records
Achievements
• A tool to automatically exchange data with the National Cancer Registry was built,
updating patient outcome data where known
• Built a pipeline to transfer summary sequencing data and metadata into the tissue bank,
and a UI to view this information
• Prototyped a means for tracking location of associated raw sequencing data for
future development
• Began to investigate means to link publications back to associated tissue samples
RDM Benefits
• Enhances existing datasets and enables their reuse to maximise the benefits
gained from each tissue sample
More Information
• http://www.imperial.ac.uk/tissuebank/
10. Integrated Rule-Based Data Management System for Genome
Sequencing Data
M. Mueller
Idea
• Set up a data management system for the DNA sequencing service that will integrate with
existing central Imperial HPC infrastructure for processing, analysis and dissemination of raw
data and analysis results
Achievements
• See system on following slide
• An iRODS-based system was implemented with the following steps:
• 1 – Data are transferred from the sequencer to the HPC Service (on a different campus)
• 2 – Data are reformatted and split by sample and project, and a quality report is generated
• 3 – Reads are mapped to a reference genome and reformatted again, reducing file size
• 4 – Further compression is achieved via a compression algorithm
• 5 – Data are transferred to a web server and made available for download
• Overcame concerns over authentication by excluding the HPC storage from iRODS
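Steps 2 and 4 of the pipeline above can be illustrated with a minimal Python sketch. It is not the facility's iRODS rule set: the record layout, the function names and the use of gzip are all assumptions made for illustration, since the slide does not name the file formats or the compression algorithm.

```python
import gzip
from collections import defaultdict


def split_by_project_and_sample(records):
    """Group reads by (project, sample), mirroring step 2 (hypothetical record layout)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["project"], rec["sample"])].append(rec["read"])
    return dict(groups)


def quality_report(groups):
    """Summarise each group: number of reads and mean read length."""
    return {
        key: {"reads": len(reads), "mean_length": sum(map(len, reads)) / len(reads)}
        for key, reads in groups.items()
    }


def compress(reads):
    """Compress one group's reads; gzip is only a stand-in for step 4."""
    return gzip.compress("\n".join(reads).encode("ascii"))


# Toy input standing in for sequencer output.
records = [
    {"project": "P1", "sample": "S1", "read": "ACGTACGT"},
    {"project": "P1", "sample": "S1", "read": "ACGT"},
    {"project": "P2", "sample": "S7", "read": "TTGACA"},
]

groups = split_by_project_and_sample(records)
report = quality_report(groups)
archives = {key: compress(reads) for key, reads in groups.items()}
```

Keeping each stage as a separate function loosely mirrors a rule-based design, where each step is triggered independently once the previous one has finished.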
RDM Benefits
• A robust infrastructure is now in place to effectively manage large volumes of complex
sequencing data
• The data are being made publicly available for re-use of this expensive resource
More Information
• http://www.imperial.ac.uk/genomicsfacility/informatics/
12. Research Data Management in Computational and Experimental
Molecular Science
H. S. Rzepa, M. J. Harvey, N. Mason & A. Mclean
Idea
• Address sustainability and scalability of a hub interfacing electronic lab notebooks with
HPC resources and digital data repositories
Achievements
• Produced an installer package to allow reuse of uportal DSpace front end
• Enhanced metadata in local repository to make it compliant with DataCite specifications –
all repository content automatically receives a DOI
• Integrated ORCID into their solution
• Developed a procedure using DOIs for directly retrieving data from a digital repository and
displaying it using JavaScript components
• Transferred 170,000 datasets from Cambridge to Imperial and curated them, adding standards-based metadata
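As a rough illustration of what DataCite compliance involves, the sketch below checks a metadata record for the properties that the DataCite schema treats as mandatory and builds the resolver URL for a DOI. The record layout, the helper names and the example DOI are placeholders, not the project's actual repository code.

```python
# Mandatory DataCite properties, written here as hypothetical dictionary keys.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_fields(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def doi_url(doi):
    """Build the resolver URL through which a DOI can be dereferenced."""
    return "https://doi.org/" + doi


# Placeholder record; the DOI and ORCID are illustrative example values only.
record = {
    "identifier": "10.5555/example-dataset",
    "creators": [{"name": "Researcher, A.", "orcid": "0000-0002-1825-0097"}],
    "titles": ["Example molecular calculation"],
    "publisher": "Example University",
    "publicationYear": 2014,
    "resourceType": "Dataset",
}

problems = missing_fields(record)
landing_page = doi_url(record["identifier"])
```

A check of this kind would typically run before a DOI is registered, so that incomplete records are rejected at deposit time rather than found later.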
RDM Benefits
• Molecular data can be referenced more robustly with persistent identifiers – a step forward in
data citation
More Information
• http://doi.org/10042/a3v1w
14. Research Data Management: Where Software Meets Data
G. Gorman, C. T. Jacobs & A. Avdis
Idea
• Integrating research data management into the research workflow so that data and software can
be curated at the push of a button using Figshare and Git
Achievements
• Developed and released an open source software library: PyRDM
• Automatically transfers software source code (stored under Git control) and data to Figshare
• Figshare generates a DOI for that code version and the data
• Metadata including author details and cross-referencing between code and data are uploaded
automatically
• Hoping for ORCID authentication via Figshare API to be added
• PyRDM was integrated into the Fluidity computational fluid dynamics code
• DOIs minted are stored in Fluidity to improve data provenance and allow a new revision of the
repository to be created if the data are updated at a later stage
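The cross-referencing between a code version and its data can be sketched as follows. This is not PyRDM's API: the function names, field names and DOIs are hypothetical, and no repository is contacted. In a real workflow the commit hash would come from Git (for example via `git rev-parse HEAD`) and the DOIs from the repository after upload.

```python
def describe(title, authors, commit_sha):
    """Build unpublished metadata records for a code snapshot and its output data."""
    common = {"authors": list(authors), "code_version": commit_sha}
    code = {"title": f"{title}: source code", "kind": "software", **common}
    data = {"title": f"{title}: output data", "kind": "dataset", **common}
    return code, data


def cross_reference(code, data, code_doi, data_doi):
    """Once DOIs have been assigned, record each item's DOI on the other item."""
    code = {**code, "doi": code_doi, "references": [data_doi]}
    data = {**data, "doi": data_doi, "references": [code_doi]}
    return code, data


# Placeholder values; a real run would read the SHA from the Git repository.
sha = "0123456789abcdef0123456789abcdef01234567"
code, data = describe("Example simulation", ["A. Author", "B. Author"], sha)
code, data = cross_reference(
    code, data,
    code_doi="10.5555/example-code",
    data_doi="10.5555/example-data",
)
```

Storing the exact commit hash on both records is what lets a later reader recompute the data from the same code version.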
RDM Benefits
• Research data published in line with funder expectations
• The DOI for a specific code version enables better recomputability of data
• Automated metadata generation reduces academic burden
More Information
• http://github.com/pyrdm http://dx.doi.org/10.5334/jors.bj www.fluidity-project.org
15. Research Data Management: Placing [Time Series] Data in its Context
N. Jones
Idea
• Provide a platform and technology which automatically connects researchers
through their time-series data, models and analysis methods
Achievements
• Online interdisciplinary collection of time-series data and time-series analysis code
• Functionality to automatically profile time series
• Functionality to automatically profile time-series algorithms
• Functionality to use these profiles to place a user’s work in the context of others
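The idea of profiling time series and using the profiles to place new work in context can be sketched in a few lines of Python. The three features used here (mean, standard deviation, lag-1 autocorrelation) and the unnormalised Euclidean distance are assumptions chosen for brevity; the platform's actual feature set and comparison method are not described on the slide.

```python
import math
from statistics import mean, pstdev


def profile(series):
    """Feature vector for a series: (mean, standard deviation, lag-1 autocorrelation)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return (mu, 0.0, 0.0)  # constant series: autocorrelation is undefined, use 0
    n = len(series)
    cov = sum((series[i] - mu) * (series[i + 1] - mu) for i in range(n - 1)) / n
    return (mu, sigma, cov / sigma ** 2)


def nearest(query, library):
    """Name of the library series whose profile is closest to the query's profile."""
    q = profile(query)
    return min(library, key=lambda name: math.dist(q, profile(library[name])))


# Toy library standing in for the shared collection.
library = {
    "alternating": [1, -1, 1, -1, 1, -1],
    "trend": [0, 1, 2, 3, 4, 5],
}
query = [0, 2, 4, 6, 8, 10]
match = nearest(query, library)
```

With many more features per series, the same nearest-neighbour lookup is what would connect a user's upload to related data from other disciplines.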
RDM Benefits
• Incentivises data sharing by allowing data comparison – increases the discoverability of
an academic’s data and the likelihood of finding other relevant data
• Resource also available to general public
More Information
• http://www.comp-engine.org/timeseries/
17. Overall Conclusions
Good data curation is HARD and EXPENSIVE
Development of sustainable research software is also HARD and EXPENSIVE
• Data citation is important
• Immediate incentives help
• APIs useful, preferably open
• Auto-generation of metadata
• E lab books seem useful
• Clinical data is a minefield
• Nucleus of an RDM community at Imperial
• Ideas to consider for wider deployment for cross-College benefit
18. Thanks and Questions
Review of applications:
• Kevin Ashley, DCC Director
Green Shoots academics:
• M. Bearpark & C. Fare
• G. Thomas, S. Butcher & C. Tomlinson
• M. Mueller
• H. S. Rzepa, M. J. Harvey, N. Mason & A. Mclean
• G. Gorman, C. T. Jacobs & A. Avdis
• N. Jones
Provision of funds:
• Imperial Vice-Provost Advisory Group: Research