This presentation was provided by Dr. Paul Burton of the University of Bristol during the NISO Symposium, Privacy Implications of Research Data, held on September 11, 2016, in conjunction with the International Data Week in Denver, Colorado.
Burton - Security, Privacy and Trust
1. Security, privacy and trust: why and
how might we control access to
research data?
Paul Burton, Rebecca Wilson
University of Bristol, D2K Research Program
NISO Symposium, Denver
11th September, 2016
2. • Perhaps the most important message is in the
title:
• This is a complex challenge involving science,
technology, governance and other fundamental
social issues
• No single solution will be adequate
• True transdisciplinary programs of work are essential
• Even the most complex and sophisticated of
solutions will never offer fully effective exploitation
of available data with zero risk of mistakes in
managing data or of malign interference
Security, privacy and trust
3. From Preface:
“Our view has always been that
anonymisation is a heavily context-
dependent process and only by
considering the data and its
environment as a total system
(which we call the data situation),
can one come to a well informed
decision about whether and what
anonymisation is needed.”
5. • Who might share data?
• Distinct generator and user
• Share data across a consortium
• How is ‘sharing’ achieved?
• Physically transfer data to a user
• Provide access to analyse
• Analysis on-site
• Remote analysis
• Federated analysis
• All interpretations valid and important
What does “sharing” research microdata mean?
6. • Management of intellectual property invested
and held in data – most areas of research
• Legal, ethical and other governance stipulations
to protect the welfare of research participants –
particularly in health/social/biomedical research
• Disclosure of identity
• Disclosure of associated information
• Balance between these and the societal benefits of
streamlined comprehensive data access – which is
evolving rapidly with time and social context
Why control research data at all?
10. • Risks associated with: storage; transmission; use
• Accidental v deliberate violations
• Direct v inferential disclosure
• Risks and remedies lie in the nature of the data
themselves and the contextual environment in
which the data are to be used – and potentially
misused
• Mark Elliot, Elaine Mackey, Keith Spicer, Caroline Tudor,
The Anonymisation Decision-Making Framework, 2016
Issues to consider
11. • Consider the user(s)
• Are he/she/they bona fide researchers?
• Consider the application – for example:
• Does it violate (or potentially violate) any of the
ethical permissions granted to the study or any
of the consent forms signed by the
participants or their guardians?
• Is there a risk it may produce information that
may allow individual cohort members to be
identified?
How to implement control in practice? UK
MetaDAC and ALSPAC as illustrative examples
12. • Administrative and research data held
separately
• Hard copies of data held in locked storage
• Electronic data held on password protected
systems with access restricted to those who
really need it
• All electronic data held in encrypted form
• Extensive QC (security of quality)
• Multiple back-ups (security of existence)
Managing the data and the data environment
13. • All data pseudonymised before release
• If pseudonymisation is scientifically impossible,
data can be analysed ‘on site’
• All data transfers encrypted using standard
protocols
• All linked data released on study-specific IDs
• Explicit acknowledgment that no system can
guarantee a zero risk of disclosure or misuse of
data
Managing the data and the data environment
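As a rough illustration of releasing pseudonymised data on study-specific IDs, the sketch below derives a different keyed identifier for each project and strips direct identifiers before release. This is only a minimal sketch under stated assumptions: the function names, field names and keys are hypothetical, the keyed hash (HMAC-SHA256) is one common technique chosen for illustration, and nothing here is taken from the actual MetaDAC or ALSPAC procedures.

```python
import hashlib
import hmac


def study_specific_id(participant_id: str, study_key: bytes, length: int = 16) -> str:
    """Derive a pseudonymous ID that is stable within one study.

    A keyed hash (HMAC-SHA256) is used so that the same participant gets
    unrelated IDs in different studies, as long as each study has its own key.
    """
    digest = hmac.new(study_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def pseudonymise(records, study_key, id_field="participant_id",
                 direct_identifiers=("name", "date_of_birth")):
    """Return copies of the records without direct identifiers.

    The original ID is replaced by a study-specific ID; the field names are
    hypothetical and would differ in any real dataset.
    """
    released = []
    for record in records:
        cleaned = {
            key: value
            for key, value in record.items()
            if key != id_field and key not in direct_identifiers
        }
        cleaned["study_id"] = study_specific_id(str(record[id_field]), study_key)
        released.append(cleaned)
    return released


# Illustrative records and keys only; real keys would be generated and stored securely.
records = [
    {"participant_id": "P001", "name": "A. Example", "date_of_birth": "1991-04-01", "bmi": 22.4},
    {"participant_id": "P002", "name": "B. Example", "date_of_birth": "1992-07-15", "bmi": 27.1},
]
release_a = pseudonymise(records, b"key-for-project-A")
release_b = pseudonymise(records, b"key-for-project-B")
```

Because the two releases use different keys, records for the same participant cannot be joined across projects by ID alone. As the slide acknowledges, this lowers the risk of disclosure but cannot reduce it to zero, since the remaining variables may still allow inferential disclosure.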
14. • A strong underpinning governance structure
is essential.
• The EAGDA (Expert Advisory Group on Data
Access) report 2015 considered, amongst
many other key issues, that:
• Governance must be proportionate and
context appropriate
• Governance must be transparent, auditable and
appealable
• Mutual trust and respect are needed amongst
stakeholders
EAGDA report, 2015
15. • Applicants for data through MetaDAC or from
ALSPAC sign up agreeing to governance
documents which include statements such as:
• Applicants are reminded that the Terms and Conditions for the
cohort explicitly forbid any attempt to identify individuals or to
compromise or otherwise infringe the confidentiality of
information on data subjects and their right to privacy.
• Do you understand that you must not pass on any data or
samples awarded, or any derived variables or genotypes
generated by this application to a third party (i.e. to anybody
that is not included in this list of applicants on this project, nor is
a direct employee of one of these applicants)?
How to implement control in practice? UK
MetaDAC and ALSPAC as illustrative examples
16. Role of encryption and other
technology-based forms of
privacy protection in “open
science”
17. • When research data are very sensitive, or are seen as
having a particular intellectual property value, can we
develop technology-based solutions that facilitate
access to microdata by enhancing privacy protection,
so that all intellectual property and governance
constraints are met in full while lowering the
governance bar? This can promote open science by
easing and/or speeding up access requests
• Should be seen as an additional component to be
applied on top of a data access and governance
system that is already well founded
• EAGDA report emphasises sustainability
Privacy protection in “open science”
18. [Diagram: single site DataSHIELD, with a data computer (DC) and an analysis computer (AC)]
2009: The DataSHIELD
challenge
Given that microdata are scientifically
critical and yet potentially sensitive,
can we ensure that the information
driving analysis of the data at each
centre only ever emerges from the
firewall in non-disclosive form? (i)
encryption (trivial and non-trivial);
(ii) low dimensional (ideally
sufficient) statistics
Multi-site DataSHIELD
horizontally partitioned data
19. • One step analyses: e.g. ds.table2D - request
non-disclosive output from all sources
• Multi-step analyses: e.g. ds.lexis – set up and
then request
• Iterative analyses: e.g. ds.glm - parallel processes
linked together by non-identifying summary
statistics – e.g. for glm = score vectors and
information matrices
• Can be used as equivalent to full individual
level analysis or to study level meta-analysis
The DataSHIELD
solution
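The iterative case can be sketched as follows. This is a hedged Python illustration of the general idea (each site returns only a score vector and an information matrix, and the client sums them and takes a Newton-Raphson step); it is not the DataSHIELD ds.glm code. The function names, the two toy sites and the restriction to a logistic model with an intercept and one covariate are assumptions made for the example.

```python
import math


def site_summaries(x, y, beta):
    """Computed inside one site: score vector and information matrix
    for a logistic model with an intercept and a single covariate."""
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * xi)))
        design = (1.0, xi)
        for j in range(2):
            score[j] += (yi - p) * design[j]
            for k in range(2):
                info[j][k] += p * (1.0 - p) * design[j] * design[k]
    return score, info


def fit_across_sites(sites, iterations=25):
    """Run by the analysis client: only the summaries leave each site."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in sites:
            s, m = site_summaries(x, y, beta)
            for j in range(2):
                score[j] += s[j]
                for k in range(2):
                    info[j][k] += m[j][k]
        # Newton-Raphson step: beta <- beta + info^{-1} score (2x2 inverse).
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        beta = [
            beta[0] + (info[1][1] * score[0] - info[0][1] * score[1]) / det,
            beta[1] + (info[0][0] * score[1] - info[1][0] * score[0]) / det,
        ]
    return beta


# Toy data for two hypothetical sites: (covariate values, binary outcomes).
site_1 = ([0, 1, 2, 3, 4], [0, 0, 1, 0, 1])
site_2 = ([1, 2, 3, 5, 6], [0, 1, 0, 1, 1])

pooled = fit_across_sites([site_1, site_2])
combined = fit_across_sites([(site_1[0] + site_2[0], site_1[1] + site_2[1])])
print(pooled, combined)
```

Because the score vectors and information matrices are sums over records, adding the per-site summaries gives the same update as analysing the combined individual-level data in one place, which is the equivalence the slide refers to.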
21. [Figure: DataSHIELD. Study 5 returns its information matrix and score vector; Summary Statistics (1): [36, 487.2951, 487.2951, 149]]
22. [Figure: DataSHIELD. Σ over the information matrices and score vectors returned by the studies (Study 5 shown); Summary Statistics (1): [36, 487.2951, 487.2951, 149]]
26. DataSHIELD analysis
Parameter    Coefficient  Standard Error
bintercept   -0.3296      0.02838
bBMI          0.02300     0.00621
bBMI.456      0.04126     0.01140
bSNP          0.5517      0.03295
Direct conventional analysis
Coefficients:
             Estimate  Std. Error
(Intercept)  -0.32956  0.02838
BMI           0.02300  0.00621
BMI.456       0.04126  0.01140
SNP           0.55173  0.03295
Does it
work?
27. [Diagram: an analysis client (R, client-side functions) and the BioSHaRE web site connect via web services to three data servers, each running Opal and R (server-side functions): Finrisk, Prevend and 1958BC]
Individual level data never
transmitted or seen by the
statistician in charge, or by
anybody outside the
original centre in which
they are stored.
DataSHIELD: current implementation for
horizontally partitioned data
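28. IM5:
[Diagram: an analysis computer running R connects via web services to three data computers, each running Opal and R: NHS, ALSPAC and Education. The vectors $X_A$ and $X_B$ are held on different data computers]
Regression coefficients = $(X^T X)^{-1} X^T Y$
$X^T X$: need to calculate
$$X^T X = \begin{pmatrix} X_A^T X_A & X_A^T X_B & X_A^T X_C \\ X_A^T X_B & X_B^T X_B & X_B^T X_C \\ X_A^T X_C & X_B^T X_C & X_C^T X_C \end{pmatrix}$$
$$X_A^T X_B = X_{A1} X_{B1} + X_{A2} X_{B2} + X_{A3} X_{B3} + \dots$$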
DataSHIELD: current implementation for
vertically partitioned (linked) data
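As a worked instance of the cross-product, take the two seven-element binary vectors printed elsewhere in this deck, plain.text.vector.A = (0, 1, 1, 1, 0, 0, 1) and plain.text.vector.N = (1, 1, 0, 1, 0, 0, 1), and treat them as $X_A$ and $X_B$. Pairing them this way is an assumption made only for illustration.

```latex
\[
X_A^T X_B = \sum_{i=1}^{7} X_{Ai} X_{Bi}
= 0\cdot 1 + 1\cdot 1 + 1\cdot 0 + 1\cdot 1 + 0\cdot 0 + 0\cdot 0 + 1\cdot 1 = 3
\]
```

Each term multiplies a value held at one site by the matching record's value held at another, so neither site can compute this entry of $X^T X$ from its own data alone.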
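29. IM5:
[Diagram: an analysis computer running R connects via web services to three data computers, each running Opal and R: NHS, ALSPAC and Education. The vectors $X_A$ and $X_B$ are held on different data computers; matrices $M_A$, $M_B$ and $M_C$ are added]
Regression coefficients = $(X^T X)^{-1} X^T Y$
$X^T X$: need to calculate
$$X^T X = \begin{pmatrix} X_A^T X_A & X_A^T X_B & X_A^T X_C \\ X_A^T X_B & X_B^T X_B & X_B^T X_C \\ X_A^T X_C & X_B^T X_C & X_C^T X_C \end{pmatrix}$$
$$X_A^T X_B = X_{A1} X_{B1} + X_{A2} X_{B2} + X_{A3} X_{B3} + \dots$$
DataSHIELD: current
implementation for vertically
partitioned (linked) data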
plain.text.vector.A
0 1 1 1 0 0 1
plain.text.vector.N
1 1 0 1 0 0 1
encryption.matrix
          [,1]      [,2]      [,3]
[1,] -1.444769  2.495677 -5.322736
[2,] -1.355529 -9.369041  2.687347
[3,]  4.603762 -3.622044 -2.817478
occluded.matrix.A
           [,1] [,2]       [,3]
[1,] -1.4546711    0  4.0722205
[2,]  6.4809785    1 -4.5814726
[3,]  4.4954801    1 -8.7036260
[4,]  0.1995684    1 -8.6872205
[5,] -6.4060220    0 -6.6471777
[6,] -0.5164345    0 -0.2564673
[7,] -5.8981933    1 -8.5032852
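30. IM5:
[Diagram: an analysis computer running R connects via web services to three data computers, each running Opal and R: NHS, ALSPAC and Education. Labels shown: $X_A$, $X_B$, $M_A$, $M_B$, $M_A X_A^T$ and $M_A X_A^T X_B M_B$]
Regression coefficients = $(X^T X)^{-1} X^T Y$
$X^T X$: need to calculate
$$X^T X = \begin{pmatrix} X_A^T X_A & X_A^T X_B & X_A^T X_C \\ X_A^T X_B & X_B^T X_B & X_B^T X_C \\ X_A^T X_C & X_B^T X_C & X_C^T X_C \end{pmatrix}$$
$$(M_A)^{-1} \, M_A X_A^T X_B M_B \, (M_B)^{-1} = X_A^T X_B$$
DataSHIELD: current
implementation for vertically
partitioned (linked) data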
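The masking identity on this slide can be checked numerically. The Python sketch below verifies only the algebra: the slide does not say which party holds each mask or performs each multiplication, so no protocol is implied. The second column of each data block and the two masking matrices are made-up illustrative values; the first columns reuse the deck's plain.text.vector.A and plain.text.vector.N. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def inverse_2x2(m):
    """Exact inverse of an invertible 2x2 matrix."""
    det = Fraction(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    if det == 0:
        raise ValueError("masking matrix must be invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Site A and site B each hold two columns for the same seven linked records.
x_a = [[0, 2], [1, 5], [1, 3], [1, 4], [0, 1], [0, 6], [1, 2]]
x_b = [[1, 1], [1, 0], [0, 2], [1, 3], [0, 1], [0, 4], [1, 2]]

# Hypothetical invertible masking matrices M_A and M_B.
m_a = [[2, 1], [1, 1]]
m_b = [[3, 1], [5, 2]]

masked_a = matmul(m_a, transpose(x_a))        # M_A X_A^T
masked_b = matmul(x_b, m_b)                   # X_B M_B
masked_product = matmul(masked_a, masked_b)   # M_A X_A^T X_B M_B

# Removing both masks recovers X_A^T X_B.
recovered = matmul(matmul(inverse_2x2(m_a), masked_product), inverse_2x2(m_b))
direct = matmul(transpose(x_a), x_b)
print(recovered == direct)
```

The masked product differs from the true cross-product, and the true value reappears only after both inverses are applied.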
32. Becca Wilson
If people want to know technical details about DataSHIELD or methods for secure data
sharing/analysis: see 12th September 11:30 - 13:00 Secure Multiparty Computation for Statistical
Analysis of Private Data
Demetris Avraam
Poster #12 RDA poster session Wednesday, Thursday, Friday – DataSHIELD: a method for privacy
protected analysis of individual level data
RDA Working Group for Data Security and Trust
RDA 8th Plenary in the same venue as the NISO symposium. Session on 17th September 11:00 -
12:30. We are running a survey to gather information on current data security practices in our
community. The survey is available at www.bit.ly/dash-ing (see below)
Data to Knowledge (D2K) Research Group. Contact details:
@Data2Knowledge; there is also a "contact us" page on www.datashield.ac.uk
Setting up a professional community for stakeholders in health data sharing
DASH-ING: DAta Sharing for Health - INnovation Group
The website will initially (and temporarily) be at: www.bit.ly/dash-ing. This webpage contains
questions for the RDA survey and a link to join the professional community
Additional opportunities for interaction
37. > plain.text.vector.L
[1] 0 1 1 1 0 0 1
> e.mat.1
        [,1]
[1,] 7.13763
> e.mat.1 %*% t(matrix(plain.text.vector.L))
     [,1]    [,2]    [,3]    [,4] [,5] [,6]    [,7]
[1,]    0 7.13763 7.13763 7.13763    0    0 7.13763
> e.mat.1 %*% t(matrix(plain.text.vector.L)) %*% plain.text.matrix.N
         [,1]
[1,] 21.41289
> (1/e.mat.1) * e.mat.1 %*% t(matrix(plain.text.vector.L)) %*% plain.text.matrix.N
     [,1]
[1,]    3
Why do we need to occlude the
original plain text vector?
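One reading of the output above is that multiplying a binary vector by a single secret number hides very little: zeros stay zero and every non-zero entry takes the same value. The Python sketch below illustrates that reading. The function names are hypothetical, and the scalar 7.13763 is simply the e.mat.1 value printed on the slide.

```python
def scalar_mask(vector, key):
    """'Encrypt' a vector by multiplying every element by one secret number."""
    return [key * value for value in vector]


def recover_binary(masked):
    """Recover a 0/1 vector from its scalar-masked form without the key."""
    return [0 if value == 0 else 1 for value in masked]


plain = [0, 1, 1, 1, 0, 0, 1]          # plain.text.vector.L from the slide
masked = scalar_mask(plain, 7.13763)   # zeros stay zero, ones all become 7.13763

# Anyone who sees the masked vector can read off the original pattern.
print(recover_binary(masked) == plain)
```

Embedding the vector among additional columns before multiplying by an encryption matrix, as in the occluded.matrix.A output elsewhere in the deck, avoids exposing this pattern directly.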
38. • Is there a significant risk of upsetting or
alienating cohort members or of reducing their
willingness to remain as active participants?
• Does it address topics that fall within the
acknowledged scientific remit of the cohort?
• Is access requested to an infinite resource (data
or cell line DNA) or a depletable resource? If
non-depletable, NO assessment is made of the science
underpinning the application
UK MetaDAC and ALSPAC as illustrative examples
39. • Wish to control access
• Undesirable loss of intellectual property
• Violation of legal, ethical and other governance
stipulations – particularly disclosure of identity
and/or associated information
• Wish to ensure data used widely for scientific
research and that access procedures are
streamlined
• Data and contextual “data environment” both
crucial and control systems are typically
complex and multi-faceted
How should access be controlled?