This presentation describes the concept of DataTags, which simplifies handling of sensitive datasets. It then shows the Tags toolset, and how it is integrated with Dataverse, Harvard's popular dataset repository.
DataTags, The Tags Toolset, and Dataverse Integration
1. DataTags and Harm Levels
Create and maintain a user-friendly system that allows researchers to
share data with confidence, knowing they comply with the laws and
regulations governing shared datasets.
We plan to achieve the above by the following efforts:
1. Describe the space of possible data policies using orthogonal
dimensions, allowing an efficient and unambiguous description of each
policy.
2. Harmonize American jurisprudence into a single decision-graph for
making decisions about data sharing policies applicable to a given
dataset.
3. Create an automated interview for composing data policies, such that
the resulting policy complies with the harmonized laws and regulations
(initially assuming the researcher’s answers correctly described the
dataset).
4. Create a set of “DataTags” – fully specified data policies (defined in
Describing a Tag Space), that are the only possible results of a tagging
process.
5. Create a formal language for describing the data policies space and the
harmonized decision-graph, complete with a runtime engine and
inspection tools.
6. Create an inviting, user-friendly web-based automated interview system
to allow researchers to tag their data sets, as part of the Dataverse
system.
Datasets used in social science research are often subject to legal and
human subjects protections. Not only do laws and regulations require such
protection, but also, without promises of protection, people may not share
data with researchers. On the other hand, “good science” practices
encourage researchers to share data to assure their results are reproducible
and credible. Funding agencies and publications increasingly require data
sharing too. Sharing data while maintaining protections is usually left to
the social science researcher to do with little or no guidance or assistance.
It is no easy feat. There are about 2187 privacy laws at the state and federal
levels in the United States [1]. Additionally, some data sets are collected or
disseminated under binding contracts, data use agreements, data sharing
restrictions etc. Technologically, there is an ever-growing set of solutions
to protect data – but people outside of the data security community may not
know about them and their applicability to any legal setting is not clear.
The DataTags project aims to help social scientists share their data widely
with necessary protections. This is done by means of interactive
computation, where the researcher and the system traverse a decision
graph, creating a machine-actionable data handling policy as they go. The
system then makes guarantees that releases of the data adhere to the
associated policy.
INTRODUCTION
OBJECTIVES
Harvard Research Data Security Policy[2] describes a 5-level scale for
researchers to handle research data. We extend this to a 6-level scale for
specifying data policies regarding security and privacy of data. The scale is
based on the level of harm malicious use of the data may cause. The
columns represent some of the dimensions of the data policy space.
Harmonized decision-graphs are the programs interactively executed by
the runtime and the researcher. The language we develop to create them
will support tagging statements, suggested wording for questions, sub-
routines and more. As we realize harmonized decision-graphs take a long
time to create and verify legally, we plan to support a special TODO type,
such that partially implemented harmonized decision-graphs can be
executed and reasoned about.
Part of the tooling effort is creating useful views of the harmonized
decision-graph and its sub-parts. Below are two views of a harmonized
decision-graph – one interactive (based on HTML5) and another static
(based on Graphviz). The latter was automatically generated by our
interpreter. Nodes show technical information as well as basic wording
(actual wording presented to the researcher may be different).
We have already harmonized regulations related to IRBs, consent and
HIPAA and made a summary flow chart of questions for an interview of a
researcher. We have also had legal experts review our approach and all
agreed it was sufficient, proper and prudent with respect to data sharing
under HIPAA. The views below show parts of the HIPAA harmonized
decision-graph.
Harmonized Decision-Graph
Harm Level | DUA Agreement Method | Authentication | Transit | Storage
No Risk | Implicit | None | Clear | Clear
  Data is non-confidential information that can be stored and shared freely.
Minimal | Implicit | Email/OAuth | Clear | Clear
  May have individually identifiable information but disclosure would not cause material harm.
Shame | Click Through | Password | Encrypted | Encrypted
  May have individually identifiable information that if disclosed could be expected to damage a person’s reputation or cause embarrassment.
Civil Penalties | Signed | Password | Encrypted | Encrypted
  May have individually identifiable information that includes Social Security numbers, financial information, medical records, and other individually identifiable information.
Criminal Penalties | Signed | Two-Factor | Encrypted | Encrypted
  May have individually identifiable information that could cause significant harm to an individual if exposed, including serious risk of criminal liability, psychological harm, loss of insurability or employability, or significant social harm.
Maximum Control | Signed | Two-Factor | Double Encrypted | Double Encrypted
  Defined as such, or may be life-threatening (e.g. interviews with identifiable gang members).
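The harm-level scale above can be read as a lookup from an ordered level to a set of handling requirements. The following Python sketch illustrates that reading. The class and function names are hypothetical, and the rule that the most severe level wins when several apply is an assumption, not something the poster states.

```python
from dataclasses import dataclass
from enum import IntEnum


class Harm(IntEnum):
    """The six harm levels, ordered from least to most severe."""
    NO_RISK = 0
    MINIMAL = 1
    SHAME = 2
    CIVIL = 3
    CRIMINAL = 4
    MAX_CONTROL = 5


@dataclass(frozen=True)
class Handling:
    dua: str
    authentication: str
    transit: str
    storage: str


# Values transcribed from the harm-level table.
HANDLING_BY_HARM = {
    Harm.NO_RISK: Handling("implicit", "none", "clear", "clear"),
    Harm.MINIMAL: Handling("implicit", "email/oauth", "clear", "clear"),
    Harm.SHAME: Handling("click-through", "password", "encrypted", "encrypted"),
    Harm.CIVIL: Handling("signed", "password", "encrypted", "encrypted"),
    Harm.CRIMINAL: Handling("signed", "two-factor", "encrypted", "encrypted"),
    Harm.MAX_CONTROL: Handling("signed", "two-factor",
                               "double-encrypted", "double-encrypted"),
}


def required_handling(levels):
    """Handling for data that touches several harm levels.

    Assumption: the most severe level determines the requirements.
    """
    return HANDLING_BY_HARM[max(levels)]
```

Because `Harm` is an `IntEnum`, levels compare by severity, so `max` picks the strictest row of the table.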
Screenshot of a question screen, part of the tagging process. Note
the current data tags on the right, allowing the user to see what
was achieved so far in the tagging process.
In order to define the tags and their possible values, we are developing a
formal language, designed to allow legal experts with little or no
programming experience to write interviews. This will enable frequent
updates to the system, a fundamental requirement since laws governing
research data may change. Below is the full tag space needed for HIPAA
compliance, and part of the code used to create it.
Representing the tag space as a graph allows us to reason about it using
Graph Theory. Under these terms, creating DataTags to represent a data
policy translates to selecting a sub-graph from the tag space graph. A single
node n is said to be fully-specified in sub-graph S if S contains an edge
from n to one of its leaves. A compound node c is said to be fully-specified
in sub-graph S if all its single and compound child nodes are fully-
specified in sub-graph S.
A tagging process has to yield a sub-graph in which the root node (shown
in yellow) is fully-specified.
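The definition of a fully-specified node can be sketched in a few lines of Python. The encoding here is hypothetical: a compound node is a dict of child nodes, a single node is a list of its leaf values, and a selection mirrors that shape. Aggregate ("some of") nodes are left out because the definition above covers only single and compound nodes.

```python
# Hypothetical encoding of part of a tag space:
# compound node -> dict of children, single node -> list of leaf values.
TAG_SPACE = {
    "DataType": {
        "Effort": ["Identified", "Identifiable", "DeIdentified", "Anonymous"],
        "Harm": ["NoRisk", "Minimal", "Shame", "Civil", "Criminal", "MaxControl"],
    },
}


def fully_specified(node, selection):
    """True if `selection` (the chosen sub-graph) fully specifies `node`."""
    if isinstance(node, list):
        # Single node: the selection must pick one of its leaves.
        return selection in node
    # Compound node: every child must itself be fully specified.
    if not isinstance(selection, dict):
        return False
    return all(
        name in selection and fully_specified(child, selection[name])
        for name, child in node.items()
    )
```

A tagging process would only be allowed to finish once `fully_specified(TAG_SPACE, selection)` holds for the root.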
Describing a Tag Space
DataType: Standards, Effort, Harm.

Standards: some of HIPAA, FERPA,
           ElectronicWiretapping,
           CommonRule.
Effort: one of Identified, Identifiable,
        DeIdentified, Anonymous.
Harm: one of NoRisk, Minimal, Shame, Civil,
      Criminal, MaxControl.

The tag space graph needed for HIPAA compliance, and part of the code used to
describe it. Base graph for the diagram was created by our language
interpreter.
[Figure: tag space graph. Root node: DataTags. Legend: Compound, Simple, Aggregate, Value.
  code: blue, green, orange, red, crimson
  Handling
    Storage: Clear, Encrypt, DoubleEncrypt
    Transit: Clear, Encrypt, DoubleEncrypt
    Authentication: None, Email, OAuth, Password
  DataType
    Standards: HIPAA, FERPA, ElectronicWiretapping, CommonRule
    Effort: Identified, Reidentifiable, DeIdentified, Anonymous
    Harm: NoRisk, Minimal, Shame, Civil, Criminal, MaxControl
  DUA
    TimeLimit: None, 1yr, 2yr, 5yr
    Sharing: Anyone, NotOnline, Organization, Group, NoOne
    Reidentify: NoMatching, NoEntities, NoPeople, NoProhibition, Contact
    Publication: NoRestriction, Notify, PreApprove, Prohibited
    Use: No Restriction, Research, IRB, No Product
    Acceptance: Click, Signed, SignWithId
    Approval: None, Email, Signed]
[Figure: harmonized decision-graph. Each node shows a number, a topic or question, and its current [PrivacyTagSet]; edges are labeled YES or NO.
Top-level nodes: 1. Person-specific; 2. Explicit consent; 3. Medical Records; 4. Arrest and Conviction Records; 5. Bank and Financial Records; 6. Cable Television; 7. Computer Crime; 8. Credit reporting and Investigations (including ‘Credit Repair,’ ‘Credit Clinics,’ Check-Cashing and Credit Cards); 9. Criminal Justice Information Systems; 10. Electronic Surveillance (including Wiretapping, Telephone Monitoring, and Video Cameras); 11. Employment Records; 12. Government Information on Individuals; 13. Identity Theft; 14. Insurance Records (including use of Genetic Information); 15. Library Records; 16. Mailing Lists (including Video rentals and Spam); 17. Special Medical Records (including HIV Testing); 18. Non-Electronic Visual Surveillance (also Breast-Feeding); 19. Polygraphing in Employment; 20. Privacy Statutes/State Constitutions (including the Right to Publicity); 21. Privileged Communications; 22. Social Security Numbers; 23. Student Records; 24. Tax Records; 25. Telephone Services (including Telephone Solicitation and Caller ID); 26. Testing in Employment (including Urinalysis, Genetic and Blood Tests); 27. Tracking Technologies; 28. Voter Records.
Sub-nodes shown:
  1.1. Tags= [GREEN, store=clear, transfer=clear, auth=none, basis=not applicable, identity=not person-specific, harm=negligible]
  2.1. Did the consent have any restrictions on sharing?
  2.1.1. Tags= [GREEN, store=clear, transfer=clear, auth=none, basis=Consent, effort=___, harm=___]
  2.1.2. Add DUA terms and set tags from DUA specifics
  3.1. HIPAA
  3.1.5. Covered
  3.1.5.1. Tags= [RED, store=encrypt, transfer=encrypt, auth=Approval, basis=HIPAA Business Associate, effort=identifiable, harm=criminal]
  3.2. Not HIPAA]
Two views of the same harmonized decision graph, computing
HIPAA compliance
Usability is a major challenge for DataTags to be successful. From the data
publisher point of view, a data tagging process may be experienced as a
daunting chore containing many unfamiliar terms, and carrying dire legal
consequences if not done correctly. Thus, the interview process and its user
interface will be designed to be inviting, non-intimidating and user-
friendly. For example, whenever legal or technical terms are used, a
plain-language explanation will be readily available.
As the length of the interview process depends on the answers, existing
best practices for advancement display (such as progress bars or a check
list) cannot be used. Conveying the progress made so far in a
gratifying way, and so keeping the user engaged in the process, is an open
research question that we intend to study.
User Interface
In order to make the tagging process approachable and non-intimidating,
whenever a technical or a legal term is used, an
explanation is readily available. Shown here is part of the final
tagging page, and an explained technical term.
Ben-Gurion University
of the Negev
DataTags, the Tags Toolset, and Dataverse Integration
DataTags, Data Handling Policy Spaces and the Tags Language
Michael Bar-Sinai
Computer Science Dept.
Ben-Gurion University of the Negev
Be’er-Sheva, Israel
Latanya Sweeney
Data Privacy Lab
Harvard University
Cambridge, MA
Mercè Crosas
Institute for Quantitative Social Science
Harvard University
Cambridge, MA
Abstract: Widespread sharing of scientific datasets holds great
promise for new scientific discoveries and great risks for personal
privacy. Dataset handling policies play the critical role of balancing
privacy risks and scientific value. We propose an extensible,
formal, theoretical model for dataset handling policies. We define
binary operators for policy composition and for comparing policy
strictness, such that propositions like “this policy is stricter than
that policy” can be formally phrased. Using this model, the policies
are described in a machine-executable and human-readable
Tag Type | Description | Security Features | Access Credentials
Blue | Public | Clear storage, Clear transmit | Open
Green | Controlled public | Clear storage, Clear transmit | Email- or OAuth Verified Registration
Yellow | Accountable | Clear storage, Encrypted transmit | Password, Registered, Approval, Click-through DUA
Orange | More accountable | Encrypted storage, Encrypted transmit | Password, Registered, Approval, Signed DUA
Privacy Tools for Sharing Research Data
A National Science Foundation Secure and Trustworthy Cyberspace Project,
with additional support from the Sloan Foundation and Google, Inc.
Managing Privacy in Research Data Repositories Workshop
July 13th, 2016
@michbarsinai
2. Based in part on:
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with
Confidence: The Datatags System. Technology Science
[Internet]. 2015.
Bar-Sinai M, Sweeney L, Crosas M. DataTags, Data Handling
Policy Spaces, and the Tags Language. Proceedings of the
International Workshop on Privacy Engineering. 2016. IEEE
Symposium on Security and Privacy ("Oakland")
3. We present a framework for
formally describing,
reasoning about, and
arriving at
data-handling policies
4. Making it Easier to
store and share scientific datasets
5. Why Share Data?
๏ Good Science
๏ Transparency
๏ Collaboration
๏ Research acceleration
๏ Reproducibility
๏ Data citation
๏ Compliance with requirements from sponsors and publishers
8. Sharing Data is Nontrivial
๏ Sharing may harm the data subjects
๏ Law is complex
๏ 2187 privacy laws in the US alone, at federal, state and
local level, usually context-specific [Sweeney, 2013]
๏ Technology is complex
๏ E.g. encryption standards change constantly,
as new vulnerabilities are found
๏ Specific dataset provenance (may be) complex
9. Dataset handling policies
play the critical role of balancing
privacy risks and scientific value
of sharing datasets.
12. Formalcs DHPs
W3C’s Privacy Preference Project (P3P)
Focuses on web data collection
Open Digital Rights Language (ODRL)
Models DRM, supports privacy and rule-based assertions
PrimeLife Policy Language (PPL)
Focuses on downstream usage, using rules
Data-Purpose Algebra
Models restriction transformation along data processing path
Robot Lawyers
See next session
13. DataTags
Tag Type | Description | Security Features | Access Credentials
Blue | Public | Clear storage, Clear transmit | Open
Green | Controlled public | Clear storage, Clear transmit | Email- or OAuth Verified Registration
Yellow | Accountable | Clear storage, Encrypted transmit | Password, Registered, Approval, Click-through DUA
Orange | More accountable | Encrypted storage, Encrypted transmit | Password, Registered, Approval, Signed DUA
Red | Fully accountable | Encrypted storage, Encrypted transmit | Two-factor authentication, Approval, Signed DUA
Crimson | Maximally restricted | Multi-encrypted storage, Encrypted transmit | Two-factor authentication, Approval, Signed DUA
DataTags
DataTags and their respective policies
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The Datatags System.
Technology Science [Internet]. 2015.
15. Data-handling policies consist of independent aspects.
Encryption at rest, transfer type, access credentials, etc.
Each aspect has multiple possible requirements, and can
be defined such that these requirements are ordered.
16. DHPs: From Text to Space
Construct a data-handling policy space
by viewing aspects as axes, where each
aspect’s possible requirements serve
as its coordinates.
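Viewing aspects as axes with ordered requirements makes a policy a point with integer coordinates, which is enough to compare strictness and to combine policies. The Python sketch below illustrates the idea. The axis values come from the slides, but the function names are hypothetical, and composing by taking the stricter requirement on each axis is an assumption about how the binary operators might behave, not the authors' stated definition.

```python
# Each axis lists its requirements from least to most strict.
# Axis names and values follow the slides; the encoding is illustrative.
AXES = {
    "Storage": ["clear", "encrypt", "multiEncrypt"],
    "Transit": ["clear", "encrypt"],
    "Authentication": ["none", "emailOrOAuth", "password", "twoFactor"],
}


def coordinates(policy):
    """Map a policy (axis -> requirement) to a tuple of integer coordinates."""
    return tuple(AXES[axis].index(policy[axis]) for axis in AXES)


def at_least_as_strict(a, b):
    """True if policy `a` is at least as strict as `b` on every axis."""
    return all(x >= y for x, y in zip(coordinates(a), coordinates(b)))


def compose(a, b):
    """Combine two policies, keeping the stricter requirement on each axis."""
    return {
        axis: values[max(values.index(a[axis]), values.index(b[axis]))]
        for axis, values in AXES.items()
    }
```

Strictness defined this way is a partial order: two policies can be incomparable when each is stricter on a different axis.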
17. Going from this…
[The DataTags table from slide 13, repeated.]
26. A Dataset and a Repository walk into
a DHP space…
[Figure: a two-axis DHP space. Axes: DUAagreementMethod (Implied, Click-Through, Sign) and Authentication (None, Email/OAuth, Password, TwoFactor). Plotted tags: Blue, Green, Orange, Red.]
27. A Dataset and a Repository walk into
a DHP space…
[Figure: the same DHP space, with a Dataset point added.]
28. A Dataset and a Repository walk into
a DHP space…
[Figure: the same DHP space, with the region compliance(Dataset) marked.]
29. A Dataset and a Repository walk into
a DHP space…
[Figure: the same DHP space, with the region support(Orange) marked.]
30. A Dataset and a Repository walk into
a DHP space…
[Figure: the same DHP space, showing the Dataset point and the tags Blue, Green, Orange, Red.]
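One way to read the compliance(Dataset) and support(Orange) labels is as two regions of the policy space: the policies strict enough for the dataset, and the policies a repository supporting Orange can enforce. The Python sketch below works under that assumption. The coordinates given to each tag and the `can_host` check are hypothetical and only illustrate how the two regions could be intersected.

```python
from itertools import product

# Axis values in increasing strictness, as drawn on the slides.
DUA = ["Implied", "ClickThrough", "Sign"]
AUTH = ["None", "EmailOAuth", "Password", "TwoFactor"]

# Every point of the two-axis DHP space, as (DUA index, AUTH index).
SPACE = set(product(range(len(DUA)), range(len(AUTH))))

# Assumed placement of the tags in the space.
TAGS = {
    "Blue": (0, 0),
    "Green": (0, 1),
    "Orange": (2, 2),
    "Red": (2, 3),
}


def compliance(point):
    """Policies at least as strict as `point` on every axis."""
    return {q for q in SPACE if all(qi >= pi for qi, pi in zip(q, point))}


def support(point):
    """Policies no stricter than `point` on any axis."""
    return {q for q in SPACE if all(qi <= pi for qi, pi in zip(q, point))}


def can_host(repository_tag, dataset_point):
    """True if some policy is both supported and compliant."""
    return bool(support(TAGS[repository_tag]) & compliance(dataset_point))
```

With these definitions, a dataset that needs a click-through DUA and password authentication fits a repository supporting Orange but not one that supports only Green.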
32. A tag space is a hierarchical
structure that defines a DHP
space, with some assertion
dimensions added.
Tag Space
Atom package screenshot: Gal Maman, Matan Toledano, BGU
33. Compound Slot
Atomic Slot
Block Comment
Line Comment
Description
Tag Space
Atom package screenshot: Gal Maman, Matan Toledano, BGU
34. Tag-Space Visualized
DataTags
  AccessCredentials
    Authentication: one of none, password, twoFactor
    DUAAcceptance: one of implied, clickThrough, signed
    Approval: one of none, required
    Registration: one of none, usingExternalSystem, localRegistration
  Security
    Storage: one of clear, encrypt, multiEncrypt
    Transit: one of clear, encrypt
Visualization using CliRunner (on a later slide)
and Graphviz (www.graphviz.org).
38. Decision Graph - Visualized
simple.dg
[Figure: Graphviz rendering of simple.dg. Node contents:
  start
    ask: Do the data concern humans? (yes / no)
    Set: Assertions={humanData}
    Set: Handling=[Transit:clear Storage:clear]
    call: eduCompliance
    todo: Handle IP issues here
  eduCompliance
    ask: Does the data contain educational records? (yes / no)
    ask: Was written consent obtained? (yes / no)
    REJECT: Cannot handle educational records without written consent.
    Set: Assertions={educationalRecords} Handling=[Transit:encrypt Storage:encrypt]
    Set: Handling=[Transit:encrypt]]
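A decision graph of this kind can be executed by a very small interpreter. The Python sketch below is an illustration, not the Tags runtime: the node kinds (ask, set, reject, todo) mirror those in the figure, but the dict encoding, the node names and the edges between nodes are an assumed reading of the flattened figure.

```python
# Node kinds mirror the figure: ask, set, reject, todo.
# The edges between nodes are assumptions made for this sketch.
GRAPH = {
    "start": ("ask", "Do the data concern humans?",
              {"yes": "humans", "no": "clear"}),
    "humans": ("set", {"Assertions": {"humanData"}}, "edu"),
    "clear": ("set", {"Handling": {"Transit": "clear", "Storage": "clear"}}, None),
    "edu": ("ask", "Does the data contain educational records?",
            {"yes": "consent", "no": "encryptTransit"}),
    "encryptTransit": ("set", {"Handling": {"Transit": "encrypt"}}, "ip"),
    "ip": ("todo", "Handle IP issues here", None),
    "consent": ("ask", "Was written consent obtained?",
                {"yes": "eduTags", "no": "rejectEdu"}),
    "rejectEdu": ("reject",
                  "Cannot handle educational records without written consent."),
    "eduTags": ("set", {"Assertions": {"educationalRecords"},
                        "Handling": {"Transit": "encrypt", "Storage": "encrypt"}},
                None),
}


class Rejected(Exception):
    """Raised when the interview reaches a REJECT node."""


def run(graph, answers, node="start"):
    """Traverse the graph with the given answers and return the tags set."""
    tags = {"Assertions": set(), "Handling": {}}
    answers = iter(answers)
    while node is not None:
        kind, *rest = graph[node]
        if kind == "ask":
            _question, edges = rest
            node = edges[next(answers)]
        elif kind == "set":
            updates, node = rest
            tags["Assertions"] |= updates.get("Assertions", set())
            tags["Handling"].update(updates.get("Handling", {}))
        elif kind == "reject":
            raise Rejected(rest[0])
        elif kind == "todo":
            # Placeholder for a part of the graph not yet written; skip it.
            _note, node = rest
    return tags
```

Treating a todo node as a no-op lets a partially written graph still be executed, which matches the stated purpose of the TODO type.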
39. Decision Graph - Visualized
HIPAA-Sample.dg
[Figure: Graphviz rendering of the HIPAA compliance decision graph. Node contents, by section:

Questions on consent and HIPAA status
  start: Does your data include personal information?
  explicitConsent: Did each person whose information appears in the data give explicit permission to share the data?
  consentDetails: Set DataType=[Basis:{consent}]; Did the consent have any restrictions on sharing?
  medicalRecords: Does the data contain personal health information? (yes / no (not HIPAA))
  notHIPAA: Set DataType=[Basis:{agreement}]; Did the data have any restrictions on sharing?
  HIPAA: Was the data received from a HIPAA covered entity or a business associate of one?
  coveredEntity: Are you an entity that is directly or indirectly covered by HIPAA?
  safeHarbor: Does the data visually adhere to the HIPAA Safe Harbor provision?
  3.1.1.1: Do you know of a way to put names on the patients in the data?
  statistician: Has an expert certified the data as being of minimal risk?
  limitedDataSet: Did you acquire the data under a HIPAA limited data use agreement?
  3.1.3.1: Did the limited data use agreement have any additional restrictions on sharing?
  businessAssociate: Did you acquire the data under a HIPAA Business Associate agreement?
  Did the business associate agreement have any additional restrictions on sharing?

Set nodes on consent and HIPAA status
  DataType=[Harm:noRisk] Handling=[Storage:clear Transit:clear Authentication:none]
  Handling=[Storage:clear Transit:clear Authentication:none]
  Handling=[Storage:clear Transit:encrypt Authentication:contactable]
  DataType=[Basis:{HIPAASafeHarbor} Harm:noRisk Effort:deIdentified] Handling=[Storage:clear Transit:clear Authentication:none]
  DataType=[Basis:{HIPAAStatistician} Harm:noRisk Effort:deIdentified] Handling=[Storage:clear Transit:clear Authentication:none]
  DataType=[Basis:{HIPAACoveredEntity} Harm:criminal Effort:identifiable] DUA=[Approval:signed] Handling=[Storage:encrypt Transit:encrypt Authentication:twoFactor]
  DataType=[Basis:{HIPAABusinessAssociate} Harm:criminal Effort:identifiable] DUA=[Approval:signed] Handling=[Storage:encrypt Transit:encrypt Authentication:twoFactor]
  DataType=[Basis:{HIPAALimitedDataset} Harm:criminal Effort:identifiable] DUA=[Approval:signed] Handling=[Storage:encrypt Transit:encrypt Authentication:password]

dua
  Is there any reason why we cannot store the data indefinitely? (Note: Limiting the time a dataset could be held interferes with good scie...)
  For how long should we keep the data?
    1 year: DUA=[TimeLimit:_1year]
    2 years: DUA=[TimeLimit:_2years]
    5 years: DUA=[TimeLimit:_5years]
    indefinitely: DUA=[TimeLimit:none]

duaSharing: How may the data be shared?
    freely: DUA=[Sharing:anyone]
    not online: DUA=[Sharing:notOnline]
    within same organization: DUA=[Sharing:organization]
    within immediate work group: DUA=[Sharing:group]
    sharing is prohibited: DUA=[Sharing:none]

duaReidentify (yes / no questions)
  duaReidentify: Is a qualified person prohibited from matching the data to other data?
  3.2: Is a qualified recipient prohibited from identifying and contacting people or organizations in the data?
  3.3: Is a qualified recipient prohibited from identifying and contacting people whose information is in the data?
  3.4: Is a qualified person allowed to re-identify and contact people whose information is in the data?
  3.5: Is a qualified person allowed to re-identify but not contact people whose information is in the data?
  3.6: Is a qualified person allowed to contact people whose information is in the data?
  You must select one of the reidentification options. Let's go through them again. (ok)
  Possible results: DUA=[Reidentify:noMatching], DUA=[Reidentify:noEntities], DUA=[Reidentify:noPeople], DUA=[Reidentify:contact], DUA=[Reidentify:reidentify], DUA=[Reidentify:noProhibition]

duaUsage: How may a qualified recipient use the data?
    freely: DUA=[Use:noRestriction]
    research purposes only: DUA=[Use:research]
    IRB approved research: DUA=[Use:IRB]
    no derivatives: DUA=[Use:noProduct]

duaPublication: A qualified data recipient may publish results based on the data:
    freely: DUA=[Publication:noRestriction]
    after notification: DUA=[Publication:notify]
    pending approval: DUA=[Publication:preApprove]
    publications prohibited: DUA=[Publication:prohibited]

duaApproval: Does a qualified user need further approval for using the data?
    no: DUA=[Approval:none]
    yes, by email: DUA=[Approval:email]
    yes, signed: DUA=[Approval:signed]

duaAcceptance: How should a qualified user accept the data use agreement?
    click through: DUA=[Acceptance:click]
    sign digitally: DUA=[Acceptance:signed]
    sign, with ID: DUA=[Acceptance:signWithID]]
HIPAA Compliance - Decision Graph
40. Decision Graph - Visualized
HIPAA-Sample.dg
[Figure: visualization of the full tag space.]
DataTags
  DataType
    Basis: some of consent, agreement, HIPAASafeHarbor, HIPAAStatistician, HIPAALimitedDataset, HIPAACoveredEntity, HIPAABusinessAssociate
    Harm: one of noRisk, minimal, shame, civil, criminal, maxControl
    Effort: one of notApplicable, identified, identifiable, deIdentified, anonymous
  DUA
    Approval: one of none, email, signed
    Use: one of noRestriction, research, IRB, noProduct
    Publication: one of noRestriction, notify, preApprove, prohibited
    Reidentify: one of contact, reidentify, noProhibition, noPeople, noEntities, noMatching
    Acceptance: one of implied, click, signed, signWithID
    TimeLimit: one of none, _5years, _2years, _1year
    Sharing: one of anyone, notOnline, organization, group, none
  IP: TODO
  Handling
    Storage: one of clear, encrypt, multiEncrypt
    Transit: one of clear, encrypt
    Authentication: one of none, contactable, password, twoFactor
49. I Data
http://datatags.org
http://datascience.iq.harvard.edu/about-datatags
CONCLUSIONS
The DataTags project will allow researchers to publish their data, without
breaching laws or regulations. Using a simple interview process, the
system and researcher will generate a machine-actionable data policy
appropriate for a dataset: its “DataTags”. This policy will later be used by
systems like Dataverse to decide how the data should be made available,
and to whom. The system will also be able to generate a customized DUA
based on these tags – a task that is currently done manually, consuming a
lot of time and resources.
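The interview described above can be sketched as a walk over a small decision graph. This is a minimal illustration in Python, not the DataTags language itself: the node names, the question wording, the `run_interview` and `can_download_anonymously` helpers, and the two placeholder end nodes are assumptions. Only the GREEN tag values are taken from the decision-graph figure on this poster, and routing a "no" answer on person-specific data to GREEN is inferred from its `identity=not person-specific` tag.

```python
from typing import Dict

# Hypothetical two-question graph. Question nodes carry "ask", "yes", "no";
# end nodes carry the tags they assign to the dataset.
GRAPH: Dict[str, dict] = {
    "person_specific": {
        "ask": "Does the dataset contain person-specific data?",
        "yes": "explicit_consent",
        "no": "green",
    },
    "explicit_consent": {
        "ask": "Did the data subjects give explicit consent?",
        "yes": "consented",
        "no": "needs_review",
    },
    # Tag values copied from the GREEN node in the poster's figure.
    "green": {"tags": {
        "code": "GREEN", "store": "clear", "transfer": "clear",
        "auth": "none", "basis": "not applicable",
        "identity": "not person-specific", "harm": "negligible",
    }},
    # Placeholder end nodes (assumptions, not real DataTags results).
    "consented": {"tags": {"basis": "Consent"}},
    "needs_review": {"tags": {"code": "UNDECIDED"}},
}


def run_interview(graph: Dict[str, dict], start: str,
                  answers: Dict[str, bool]) -> Dict[str, str]:
    """Follow yes/no answers from `start` until an end node is reached."""
    node_id = start
    node = graph[node_id]
    while "tags" not in node:
        branch = "yes" if answers[node_id] else "no"
        node_id = node[branch]
        node = graph[node_id]
    return node["tags"]


def can_download_anonymously(tags: Dict[str, str]) -> bool:
    """Hypothetical repository-side rule: open access only when the tags
    explicitly say no authentication and clear-text transfer."""
    return tags.get("auth") == "none" and tags.get("transfer") == "clear"


tags = run_interview(GRAPH, "person_specific", {"person_specific": False})
print(tags["code"], can_download_anonymously(tags))
```

Missing tags are treated conservatively here: a dataset whose policy does not state `auth=none` is never offered for anonymous download.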
The programming language for describing the Tag Space and harmonized
decision-graphs, and the tools related to it, will be able to describe general
harmonized decision-graphs, not just those in the legal field. While easy to learn,
the language relies on graph theory, a robust foundation that enables
various tools, including model checking and validation of programs and
harmonized decision-graphs.
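One kind of validation that a graph-based representation makes straightforward can be sketched as follows. The graph encoding (a dictionary from node id to outgoing edges) and the `validate` function are assumptions made for illustration, not the project's actual tooling. The check reports edges that point to unknown nodes, cycles that would keep an interview from terminating, and nodes that can never be reached from the start.

```python
from typing import Dict, List

# node id -> {answer: next node id}; an empty dict marks an end node.
Graph = Dict[str, Dict[str, str]]


def validate(graph: Graph, start: str) -> List[str]:
    """Return a list of human-readable problems; empty means the graph is sound."""
    problems: List[str] = []

    # 1. Every edge must point at a node that exists.
    for node_id, edges in graph.items():
        for answer, target in edges.items():
            if target not in graph:
                problems.append(f"{node_id} --{answer}--> {target}: unknown node")

    # 2. Depth-first search from the start node to detect cycles.
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node_id: WHITE for node_id in graph}

    def visit(node_id: str) -> None:
        color[node_id] = GREY
        for target in graph[node_id].values():
            if target not in graph:
                continue  # already reported as a dangling edge
            if color[target] == GREY:
                problems.append(f"cycle through {target}")
            elif color[target] == WHITE:
                visit(target)
        color[node_id] = BLACK

    if start in graph:
        visit(start)
    else:
        problems.append(f"start node {start} is missing")

    # 3. Anything still unvisited can never be reached in an interview.
    for node_id, state in color.items():
        if state == WHITE:
            problems.append(f"{node_id}: unreachable from {start}")
    return problems


good: Graph = {
    "person_specific": {"yes": "consent", "no": "green"},
    "consent": {"yes": "green", "no": "review"},
    "green": {},
    "review": {},
}
bad: Graph = {"a": {"yes": "b", "no": "missing"}, "b": {"yes": "a"}, "orphan": {}}

print(validate(good, "person_specific"))
print(validate(bad, "a"))
```

The recursive search is adequate for interview-sized graphs; a very deep graph would call for an iterative traversal instead.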
We believe DataTags will dramatically improve the rate of data sharing
among researchers, while maintaining legal compliance, at no cost to
the researcher or her institution. As a result, we expect more data to be
available to researchers, with fewer barriers to access.
REFERENCES
[1] Sweeney L. Operationalizing American Jurisprudence for Data Sharing.
White Paper. 2013
[2] http://www.security.harvard.edu/research-data-security-policy
ACKNOWLEDGEMENTS
Bob Gellman, for validating that our current harmonized decision-graph
is HIPAA compliant.
[Figure: part of the harmonized decision-graph. Branches are labeled YES or NO; some node text is truncated at the left in the source.]
1. Person-specific [PrivacyTagSet]
1.1. Tags = [GREEN, store=clear, transfer=clear, auth=none, basis=not applicable, identity=not person-specific, harm=negligible]
[PrivacyTagSet (EncryptionType): Clear, (AuthenticationType): None, (EncryptionType): Clear, (DuaAgreementMethod): None]
2. Explicit consent [PrivacyTagSet]
2.1. t have any restrictions on sharing? [PrivacyTagSet]
2.1.1. =clear, auth=none, basis=Consent, effort=___, harm=___]
ear(AuthenticationType): None(EncryptionType): Clear]
3. l Records
3.2. Not HIPAA
[Figure, continued: further decision-graph nodes, each followed by a [PrivacyTagSet] label and YES/NO branches. Node titles are truncated at the left in the source:]
ction Records
cial Records
vision
Crime
ir,’ ‘Credit Clinics,’ Check-Cashing and Credit Cards)
rmation Systems
Telephone Monitoring, and Video Cameras)
Records
on on Individuals
Theft
se of Genetic Information)
ecords
deo rentals and Spam)
ncluding HIV Testing)
ance (also Breast-Feeding)
Employment
including the Right to Publicity)
munications
Numbers
ecords
ords
hone Solicitation and Caller ID)
nalysis, Genetic and Blood Tests)
hnologies
cords
e harmonized decision graph, computing
HIPAA compliance
e for DataTags to be successful. From the data
ta tagging process may be experienced as a
any unfamiliar terms, and carrying dire legal
rrectly. Thus, the interview process and its user
be inviting, non-intimidating and user-
ever legal or technical terms are used, a
eadily available.
w process depends on the answers, existing
ent display (such as progress bars or a check
ble to convey the progress made so far in a
user engaged in the process is an open research
study.
User Interface
tagging process approachable and non-
er a technical or a legal term is used, an
available. Shown here is part of the final
and an explained technical term.
[Figure labels: Dataset, Interview, Handling, Access Control, DUAs, Legal Policies, Data Tags. Harm levels: No Risk, Minimal, Shame, Civil Penalties, Criminal Penalties, Max Control. Access modes: Direct Access, Privacy Preserving Access, Differential Privacy (ε = 1, 1/10, 1/100, 1/1000), Custom Agreement.]
Overview of a dataset ingest workflow in Dataverse, showing the
role of the DataTags project in the process.
interpreter.
Ben-Gurion University
of the Negev
[set: Thank+=you]
[end]
This work was funded by grant CNS-1237235 from the National Science Foundation.