Presentation by Angus Whyte at DCC-Arkivum event 'Data Storage & Preservation Strategies for Research Data Management' at University of Edinburgh 27 October 2014
Preparing your data for sharing and publishing (Varsha Khodiyar)
This document provides information on preparing data for sharing and publishing. It discusses organizing data through clear file and folder labeling and documenting additional context about methods and instruments. It also describes publishing data through journals such as Scientific Data, which provide peer review and credit. Sensitive data requires careful handling and may be suited to controlled-access repositories. Overall, the document offers guidance on effective data organization, documentation, sharing, and receiving credit for shared data.
Lesson 2 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017 (ARDC)
The Australian National Data Service (ANDS) aims to make Australian research data more valuable by partnering with research organizations and funding data projects. In 2015, ANDS conducted over 100 workshops and events with over 4,000 participants and developed online resources. ANDS provides guides on topics like data management and the FAIR data principles. ANDS also advocates for practices like data citation and publishing to ensure research data is preserved and reusable over time. The presentation outlines ANDS' role in supporting good research data management practices and sharing to ensure the integrity and impact of research evidence.
Introduction to the Research Integrity Advisor Data Management Workshop, Bris... (ARDC)
Dr Jacobs' introduction to the RIA Data Management Workshop in Brisbane on 31 March 2017. The RIA Data Management Workshop series is a joint collaboration of the Australian Research Council, the National Health and Medical Research Council, the Australasian Research Management Society and the Australian National Data Service.
Digital transformation to enable a FAIR approach for health data science (Varsha Khodiyar)
Invited talk for ConTech Pharma on 1st March 2022
Abstract
Health Data Research UK is the UK’s national institute for health data science, with a mission to unite the UK’s health data to enable discoveries that improve people’s lives. In this talk, Dr Varsha Khodiyar will outline how HDR UK is bringing together disparate health data from all four countries of the United Kingdom, creating the infrastructure to enable discovery of and access to health data, and convening standards-making bodies to improve data linkage and data reuse. Varsha will also discuss how HDR UK is moving beyond the traditional confines of FAIR data to ensure that data sharing and data use are transparent and ‘fair’ for the patients and lay public who are the subjects of these datasets.
This document discusses licensing research data for reuse. It begins with a scenario in which a user has downloaded a dataset but is unsure what they can do with it because of its licensing. It then explains that licensing is critical to enabling data reuse and citation. It provides information on AusGOAL, the Australian open access and licensing framework, and notes that it is recommended for data publishing by ANDS partners. It also includes links to licensing guides and FAQs. In summary, the document emphasizes the importance of data licensing for enabling reuse and outlines Australia's recommended licensing system.
The document provides guidance on writing a data management plan (DMP). It explains that DMPs are now required by many funders to accompany grant applications. A DMP outlines how research data will be managed and shared during and after a project. It should address issues like the type of data being collected, documentation, storage and backup plans, data sharing and reuse, legal and ethical concerns, and long-term preservation. Writing a DMP helps ensure good data management practices and that a project is compliant with funder policies supporting open access to research data.
The document discusses open data and data sharing, including defining open data, the benefits of open data, overcoming barriers to opening data such as concerns about scooping and sensitive data, best practices for making data open through formats, licensing and description, and the role of research databases and data citation in promoting open data.
Facilitating good research data management practice as part of scholarly publ... (Varsha Khodiyar)
Presentation given to the SciDataCon #IDW2018 session: Democratising Data Publishing: A Global Perspective, on Tuesday 6th November 2018, Gaborone, Botswana
Research data management at TU Eindhoven (Leon Osinski)
The document discusses research data management at TU Eindhoven. It outlines the long process of developing RDM practices since 2008. It describes the current organization and governance structure for RDM. Key external requirements for RDM from funders, regulations, and integrity standards are also summarized. The document concludes by outlining RDM support services available and the benefits of good RDM practices.
What funders want you to do with your data (Leon Osinski)
Funders want researchers to 1) deposit the relevant data from their research in an approved repository to make it FAIR (Findable, Accessible, Interoperable, Reusable), 2) make the data openly available whenever possible, and 3) write a Data Management Plan describing how they will manage their data during and after the project. Funders require depositing data in repositories to enable reuse, making data open access "as open as possible, as closed as necessary", and having a Data Management Plan that addresses reuse according to FAIR principles.
The document summarizes a pilot project at the University of Edinburgh to support the development of a UK Research Data Discovery Service. PhD interns engaged with researchers from various schools to describe and deposit research datasets in the university's systems so that they could be harvested by the discovery service. The pilot found mixed results across schools: humanities researchers were less comfortable sharing data because of copyright concerns and a reluctance to share their interpretations, while other schools already had established data repositories and so showed less interest in the university's system. Building research data management practices will require tailored approaches and more training over time.
An explanation of the data management plan format for PhD students at Wageningen UR. This format was developed by the library in cooperation with the Wageningen Graduate Schools.
DataONE Education Module 10: Legal and Policy Issues (DataONE)
This document discusses legal, ethical and policy issues related to managing research data. It defines key concepts like copyright, licenses and waivers, and explains why identifying ownership and control is important. Restrictions on data use and sharing are discussed, including protecting privacy and following regulations. Open licensing is presented as a way to facilitate sharing while still giving credit. The importance of behaving ethically and respecting licenses is emphasized.
Lesson 8 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
DataONE Education Module 01: Why Data Management? (DataONE)
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
This document provides guidance on writing successful data management plans (DMPs). It explains that DMPs are required by many funders to anticipate and avoid data management problems. The document outlines the key sections to include in a DMP, such as data collection, documentation, storage and sharing. It recommends keeping a DMP simple, seeking advice, and ensuring plans are feasible. Tools like DMPOnline can help write DMPs according to different funder requirements.
DataONE Education Module 03: Data Management Planning (DataONE)
Lesson 3 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Strand 1: Connecting research and researchers: An introduction to ORCID by Ed... (OAbooks)
ORCID is an open, non-profit organization that provides a registry of unique researcher identifiers and aims to link researchers to their professional activities such as publications, datasets, and more. The presentation discusses the problems ORCID aims to address like linking researchers across databases and improving discoverability. It outlines ORCID's mission, benefits to the research community, how the ORCID registry works, privacy considerations, integration opportunities, growth since launch, international usage, members, support available, and how to join ORCID.
Introduction to Research Data Management at UWA (Katina Toufexis)
This document summarizes the key benefits of research data management. It discusses how research data management helps with compliance by meeting requirements of international and national funding agencies as well as publishers. It also promotes efficiency in the research process, ensures security of data, allows access for validation and collaboration, and improves quality through enabling replication. The document provides an overview of the Research Data Management Toolkit available at UWA to support researchers in managing their data over the research lifecycle.
Research Data Management Services at UWA (November 2015) (Katina Toufexis)
Research Data Management Services at the University of Western Australia (November 2015).
Created by Katina Toufexis of the eResearch Support Unit (University Library).
CC-BY
An overview of the LSHTM Research Data Management Policy, outlining the motivations for its introduction, the obligations that need to be met, and the support available.
This document provides an introduction to research data management for geoscience PhD students. It defines research data and different data types. It discusses the importance of managing data throughout its lifecycle for efficient and valid research. It outlines funder requirements, university policies, and activities involved in good research data management like data planning, documentation, storage, sharing and preservation.
Providing support and services for researchers in good data governance (Robin Rice)
The University of Edinburgh provides support and services to help researchers with good data governance. This includes a research data policy, a research data service with various tools across the data lifecycle, and a data safe haven for sensitive data. The research data service offers centralized storage, version control, collaboration tools, and repositories for open data sharing or long-term retention. Training and outreach aim to educate researchers on topics such as data management plans, sensitive data, and GDPR compliance.
Presentation given by Sarah Jones at a seminar run by LSHTM on 6th November 2012. http://www.lshtm.ac.uk/newsevents/events/2012/11/developing-data-management-expertise-in-research---half-day-event
Stuart Macdonald steps through the process of creating a robust data management plan for researchers. Presented at the European Association for Health Information and Libraries (EAHIL) 2015 workshop, Edinburgh, 11 June 2015.
The document provides information on creating a data management plan (DMP) for grant applications. It discusses what a DMP is, why they are important, and what funders require in a DMP. A DMP outlines how research data will be collected, documented, stored, shared, and preserved. The document recommends addressing six key themes in a DMP: data types and standards; ethics and intellectual property; data access, sharing and reuse; short-term storage and management; long-term preservation; and resourcing. Developing a strong DMP helps researchers manage data effectively and makes data available and reusable by others.
Managing data throughout the research lifecycle (Marieke Guy)
This document summarizes a presentation about managing data throughout the research lifecycle. It discusses the stages of the research lifecycle, including planning, data creation, documentation, storage, sharing, and preservation. It provides examples of research lifecycle models and addresses key questions to consider at each stage, such as what formats to use, how to document data, where to store it, and how to share and preserve it. The presentation emphasizes making informed decisions about data management and talking to colleagues for support and advice.
Slides from Thursday 2nd August 2018 - Data in the Scholarly Communications Life Cycle Course which is part of the FORCE11 Scholarly Communications Institute.
Presenter - Natasha Simons
Presentation from a University of York Library workshop on research data management. The workshop provides an introduction to research data management, covering best practice for the successful organisation, storage, documentation, archiving, and sharing of research data.
Managing and Sharing Research Data: Good practices for an ideal world...in th... (Martin Donnelly)
This document discusses managing and sharing research data in an ideal world versus a real-world setting. It outlines an agenda covering an introduction, a definition of research data management, ethics and integrity, context and policy drivers, incentives for data management, practical considerations, and case studies, concluding with a Q&A. Key points include the importance of documentation, metadata, backups, and depositing data for the long term. Research data management is important for reproducibility and ethics, and is increasingly required by funders and journals.
Presentation given at the Consorcio Madrono conference on Data Management Plans in Horizon 2020 http://www.consorciomadrono.es/info/web/blogs/formacion/217.php
This document provides an overview of a webinar on digital curation and research data management for universities. The webinar covers an introduction to digital curation, the benefits and drivers for research data management, current initiatives in UK universities, and the role of libraries in supporting research data management. Libraries are increasingly involved in developing institutional policies, providing training, and advising researchers on writing data management plans and sharing data. The webinar highlights training opportunities for librarians to develop skills in research data management and digital curation.
Aim:- To show how research data management can contribute to the success of your PhD.
* What is research data and why is it important?
*The Research Data lifecycle
* Research Data – more than just your results
* FAIR data and Open Research
* DMP online tool
Survey of research data management practices up2010digschol2011heila1
An analysis of data management practices at a large South African university was conducted through interviews with researchers and students to identify needs and challenges. The findings showed that while data collection methods vary, data storage is often ad hoc with no centralized support or resources. Researchers expressed a need for a central university server or repository for secure data storage and assistance with time constraints. It was concluded that a formal research data management program and staff support are needed to improve current practices.
Session presented by Judith Carr, Research Data Manager at the University of Liverpool on Research Data Management and your PhD.
Aim:- To show how research data management can contribute to the success of your PhD.
Covers:
* What is research data and why is it important?
* The Research Data lifecycle
* Research Data – more than just your results
* FAIR data and Open Research
* DMP online tool
Presentation slides from a lecture given at the University of the West of England (UWE) as part of the MSc in Library and Library Management, University of the West of England, Frenchay Campus, Bristol, March 24, 2009
Data Management Lab: Session 4 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Creating a Data Management Plan for your Research (Robin Rice)
This document provides an overview of creating a data management plan (DMP). It discusses what a DMP is, the benefits of creating one, and what funders require. A DMP defines what data will be collected, documented, stored, shared, and preserved. Developing a DMP helps avoid problems and ensures data are reliable and secure. The document outlines six key themes a DMP should address: data types and standards, ethics, access and sharing, storage, preservation, and resources. Support is available to help researchers develop effective DMPs.
Data Management Lab: Data management plan instructions (IUPUI)
This document provides instructions for creating a data management plan. It outlines 17 components that should be addressed in a data management plan, including a description of the data, how it will be stored and backed up, how it will be shared and preserved, ethical and legal obligations, and a budget. Each component includes several questions to consider to ensure all relevant aspects are covered. Resources are also provided to help with developing policies for topics like data sharing, privacy, and legal requirements. The goal is to create a comprehensive plan for managing research data throughout its entire lifecycle and use.
The document summarizes a workshop hosted by the NIH Associate Director for Data Science to discuss charting the future of data science at NIH. The workshop goals were to get input from all stakeholders, identify strategic directions, policies, and funding initiatives, and have participants leave as advocates and supporters. The agenda included providing background, open discussion, identifying topics for breakout groups, subgroup discussions, and reporting back. The document provides context on current NIH data science efforts and examples of collaborators in building a biomedical research digital enterprise.
Getting to grips with research data management (Wendy Mears)
This document provides an overview of research data management. It defines research data management and discusses its importance. It also outlines the data lifecycle model and provides guidance on sharing data, working with data, planning for data management, and useful resources for research data management. The document aims to help researchers effectively manage the data created throughout the research process.
Similar to Long-term storage – will it fill up with the good stuff, or the big, bad, and ugly? Can checklists make a difference?
European Research Funders and data sharing: an overview of current practices (DCC-info)
This document provides an overview of data sharing policies and practices among European research funders. It finds that while many funders state a policy in support of open access to research data, fewer mandate sharing in repositories or monitor compliance. Common incentives for data sharing include guidance, tools and supported repositories, while rewards through additional funding or assessment are rare. Monitoring of data management plans and sharing is limited, occurring in only a few countries. The document discusses examples from the UK and other countries to identify best practices that could encourage data sharing while also building trust in repositories and services.
Presentation by Jim Cook at DCC-Arkivum event 'Data Storage & Preservation Strategies for Research Data Management' at University of Edinburgh 27 October 2014
Research Data Management Programme in EdinburghDCC-info
Presentation by Stuart Macdonald at DCC-Arkivum event 'Data Storage & Preservation Strategies for Research Data Management' at University of Edinburgh 27 October 2014
Presentation by Dominic Job at DCC-Arkivum event 'Data Storage & Preservation Strategies for Research Data Management' at 'University of Edinburgh 27 October 2014
Janet Cloud Services helps research and education institutions move to cloud services through guidance and collaborative purchasing. It provides a data archive framework agreement that offers benefits such as long-term data storage with 100% integrity guarantees. The agreement is available through Janet's contract with Arkivum, an archiving company spun off from Southampton University, and provides discounted pricing options for data archiving.
Long-term storage – will it fill up with the good stuff, or the big, bad, and ugly? Can checklists make a difference?
1. Long-term storage – will it fill up with
the good stuff, or the big, bad, and ugly?
Can checklists make a difference?
Angus Whyte, DCC
‘Research Data Storage and Preservation Strategies’
University of Edinburgh 27 October 2014
a.whyte@ed.ac.uk
2. Long-term storage – will it fill up with the
good stuff, or just the big, bad, and ugly?
Will checklists encourage researchers to decide?
6. But more support needed!
Top 3 support needs for institutions *
1. Defining what to retain
2. Specifying tools/ infrastructure
3. Supporting metadata creation for
research data discovery
* DCC RDM Survey of 61 institutions, March 2014
Data available at: zenodo.org/collection/user-dcc-rdm-2014
7. Data Asset Surveys
Some institutions have estimated storage requirements from these.
About your data and
its lifecycle…?
1. File type
2. Volumes
3. Density
4. Update frequency
5. Usage frequency
6. Availability req’d
7. Sensitivity
Active storage
Archival storage
Data Asset Framework Implementation guide
www.data-audit.eu/docs/DAF_Implementation_Guide.pdf
8. Data Asset Surveys (continued)
But if you provide it, will researchers use it, and at what cost?
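The survey questions above are what some institutions have used to estimate storage requirements. As a rough illustration only, the Python sketch below tallies surveyed volumes into active and archival storage. The `DataAsset` record, its field names, and the routing rule in `storage_tier` (anything still being updated, or accessed four or more times a month, stays active) are hypothetical assumptions, not part of the Data Asset Framework.

```python
from dataclasses import dataclass


# Hypothetical survey record: field names are illustrative and cover only a
# subset of the survey questions (volume, update and usage frequency).
@dataclass
class DataAsset:
    name: str
    volume_gb: float          # "Volumes": current size in gigabytes
    updates_per_month: int    # "Update frequency"
    accesses_per_month: int   # "Usage frequency"


def storage_tier(asset: DataAsset) -> str:
    """Assumed routing rule: data still being updated or in regular use
    belongs in active storage; static, rarely used data is an archival candidate."""
    if asset.updates_per_month > 0 or asset.accesses_per_month >= 4:
        return "active"
    return "archival"


def estimate_requirements(assets: list[DataAsset]) -> dict[str, float]:
    """Sum surveyed volumes (GB) per storage tier."""
    totals = {"active": 0.0, "archival": 0.0}
    for asset in assets:
        totals[storage_tier(asset)] += asset.volume_gb
    return totals


# Invented example responses.
survey = [
    DataAsset("interview transcripts", 2.5, 0, 1),
    DataAsset("sensor stream", 800.0, 30, 20),
    DataAsset("image archive", 1200.0, 0, 0),
]
print(estimate_requirements(survey))  # {'active': 800.0, 'archival': 1202.5}
```

The estimate says nothing about whether researchers will actually use the space, which is exactly the question the slide raises.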
9. Practical checklists
At key points in the research cycle, from Start to Write-up (active storage through to archival storage)
Data Mgmt Plan
1. Collection
2. Documentation
3. Ethics & legal
4. Storage & backup
5. Selection & preservation
6. Data sharing
7. Responsibilities
Repository selection
1. Policy & legal
2. Discoverable
3. Preservation
4. Reports
5. Trust
Data Selection: 5 steps to decide what to keep
1. Could - benefit
2. Must - risks
3. Should - value
4. Cost factors
5. Weigh-up 1-4
Catalogue metadata
1. Name
2. Description
3. Identifier
4. Subject
5. URL
6. Date
7. Creator
8. Rights
9. Spatial
10. Publisher
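The catalogue metadata checklist maps naturally onto a simple record type. The sketch below is a minimal, hypothetical illustration: `CatalogueRecord` and `missing_fields` are invented names, and the ten fields simply mirror the checklist above rather than the required elements of any particular metadata standard.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical record mirroring the ten catalogue checklist items, in order.
@dataclass
class CatalogueRecord:
    name: Optional[str] = None
    description: Optional[str] = None
    identifier: Optional[str] = None   # e.g. a persistent identifier
    subject: Optional[str] = None
    url: Optional[str] = None
    date: Optional[str] = None
    creator: Optional[str] = None
    rights: Optional[str] = None
    spatial: Optional[str] = None      # spatial coverage
    publisher: Optional[str] = None


def missing_fields(record: CatalogueRecord) -> list[str]:
    """Return the checklist fields that are still empty, in checklist order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


rec = CatalogueRecord(name="Survey responses 2014", creator="A. Researcher", date="2014-10-27")
print(missing_fields(rec))
# ['description', 'identifier', 'subject', 'url', 'rights', 'spatial', 'publisher']
```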
11. Data selection checklist
Straightforward steps to guide researchers
① Could this data be reused?
② Must it be kept to manage compliance risk?
③ Should it be kept for its potential value, and…
④ …considering costs,
⑤ will ✔ or won’t ✗ it be kept, and shared on what terms?
Institution or external repository
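The five questions can be read as a small decision procedure. The sketch below encodes one possible reading: `SelectionAnswers`, `decide`, and especially the precedence rule (compliance first, then reuse, value and cost together) are assumptions for illustration. In practice step ⑤ remains the researcher's judgement, and the sketch does not model the sharing terms.

```python
from dataclasses import dataclass


# Hypothetical yes/no answers to checklist steps 1-4.
@dataclass
class SelectionAnswers:
    could_be_reused: bool           # (1) reuse potential / benefit
    must_keep_for_compliance: bool  # (2) compliance risk
    should_keep_for_value: bool     # (3) potential value
    costs_covered: bool             # (4) can keeping it be paid for?


def decide(a: SelectionAnswers) -> str:
    """(5) Weigh up answers 1-4. The precedence below is an illustrative
    assumption, not a rule stated in the checklist."""
    if a.must_keep_for_compliance:
        # Compliance obligations are not optional, so a cost gap is flagged
        # for action rather than turning the answer into "do not keep".
        return "keep" if a.costs_covered else "keep: resolve cost shortfall"
    if a.could_be_reused and a.should_keep_for_value and a.costs_covered:
        return "keep"
    return "do not keep"


print(decide(SelectionAnswers(True, False, True, True)))    # keep
print(decide(SelectionAnswers(True, False, False, True)))   # do not keep
print(decide(SelectionAnswers(False, True, False, False)))  # keep: resolve cost shortfall
```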
12. Step 1 (?) What ‘must’ be kept?
Research record includes data as evidence for e.g. …
• Audit purposes
• Health & Safety (Lab book)
• Contractual requirement
Compliance is also about data that won’t be kept, or that
may only be shared with approved researchers…
Research Ethics, Duty of Confidentiality, Data Protection Act, Human Rights Act, Statistics & Registration Service Act. UK Data Archive:
http://www.data-archive.ac.uk/create-manage/consent-ethics/legal
Jisc Infonet Guidance on Managing Research Records
tools.jiscinfonet.ac.uk/downloads/bcs-rrs/managing-research-records.pdf
13. Available choices depend on what purposes the data serves
14. Step 1 (?) What ‘must’ be kept?
But what about funder & journal data policies?
“Data with acknowledged long-term value”
RCUK Common Principles on Data Policy
“Data, information and other electronic resources of long-term interest”
ESRC UK Data Archive Collections Development Policy
“Where data underpins published research there is much greater
expectation that it will be kept”
Ben Ryan, EPSRC
“An inherent principle of publication is that others should be able to
replicate and build upon the authors' published claims.”
Nature
15. Still researchers’ judgement: what purposes the data may serve
16. So make thinking about that the first step
17. Step 1 (formerly Step 2): What could it be reused for?
Any angles the researcher has not already considered?
1. Verification
2. Further analysis
3. Reputation building
4. Resource development
5. Further publications inc. data articles
6. Learning and teaching materials
7. Private reference
18. Then, relative to these, which data must be kept?
19. Step 3: What data should have value?
Do any two of these fit?
1. Good quality data and description
complete, accurate, reliable, valid, representative, etc.
2. High demand
known users, integration potential, reputation, recommendation, appeal
3. High effort to replicate
difficult, costly, or impossible to reproduce
4. Low barriers to reuse
legal/ethical and copyright: non-restrictive terms and conditions
5. Rarity value
unique copy or other copies at risk
Then: what else (e.g. software) does it depend on?
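The "any two of these" prompt is effectively a threshold test over five criteria. A minimal sketch, assuming the criteria are recorded as short labels; the label strings and the `should_have_value` name are hypothetical, and only the threshold of two comes from the slide.

```python
# The five value criteria, as short hypothetical labels.
VALUE_CRITERIA = frozenset({
    "good quality",
    "high demand",
    "high effort to replicate",
    "low barriers to reuse",
    "rarity value",
})


def should_have_value(criteria_met: set[str], threshold: int = 2) -> bool:
    """Return True if at least `threshold` recognised criteria apply."""
    chosen = set(criteria_met)           # ignore accidental duplicates
    unknown = chosen - VALUE_CRITERIA
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    return len(chosen) >= threshold


print(should_have_value({"high demand", "rarity value"}))  # True
print(should_have_value({"low barriers to reuse"}))        # False
```

Rejecting unknown labels keeps the test honest: a criterion only counts if it is one of the five.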
20. Step 4: Cost factors
Why?
• Costs incurred during project may add to value
• Post-project costs must be covered
1. Creation, collection & cleaning
2. Short-term storage & backup
3. Short-term access & security
4. Team communication & development
5. Preservation & long-term access
So what action is needed to stay on budget?
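The distinction between costs incurred during the project and those that must be covered afterwards can be made explicit with a little arithmetic. In this sketch the figures are invented, and the assumption that only preservation and long-term access falls after the project ends is illustrative; `CostItem`, `project_total` and `post_project_shortfall` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    factor: str
    amount: float        # in any single currency unit
    post_project: bool   # True if the cost falls after the grant ends


def project_total(items: list[CostItem]) -> float:
    """Costs incurred while the project is running."""
    return sum(i.amount for i in items if not i.post_project)


def post_project_shortfall(items: list[CostItem], budget: float) -> float:
    """How much of the post-project need is not yet covered by `budget`."""
    need = sum(i.amount for i in items if i.post_project)
    return max(0.0, need - budget)


# Illustrative figures only.
costs = [
    CostItem("Creation, collection & cleaning", 5000.0, False),
    CostItem("Short-term storage & backup", 800.0, False),
    CostItem("Short-term access & security", 300.0, False),
    CostItem("Team communication & development", 1200.0, False),
    CostItem("Preservation & long-term access", 1500.0, True),
]
print(project_total(costs))                   # 7300.0
print(post_project_shortfall(costs, 1000.0))  # 500.0
```

A non-zero shortfall is the signal that some action is needed before the data can be kept.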
21. Step 5: Bring it all together
Balance risks, costs and value
Document the choices made
1. Name, contributors, description, sensitivity - metadata
2. Reuse purposes and value – the ‘reuse case’
3. Risk of non-compliance and costs shortfall
4. Justification to keep or dispose
5. Actions to prepare for preservation or disposal
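The five items to document suggest a record that travels with the data into archival storage. The sketch below is one hypothetical shape for it, serialised to JSON with the standard library; the `SelectionRecord` fields, the "keep"/"dispose" vocabulary and the example values are assumptions, not a DCC format. Requiring a non-empty justification reflects item 4.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SelectionRecord:
    metadata: dict              # 1. name, contributors, description, sensitivity
    reuse_case: list[str]       # 2. reuse purposes and value
    risks: list[str]            # 3. non-compliance risk and cost shortfall
    decision: str               # outcome: "keep" or "dispose"
    justification: str          # 4. why it is kept or disposed of
    actions: list[str] = field(default_factory=list)  # 5. preparation steps

    def __post_init__(self) -> None:
        # Reject records that would leave the choice undocumented.
        if self.decision not in ("keep", "dispose"):
            raise ValueError("decision must be 'keep' or 'dispose'")
        if not self.justification.strip():
            raise ValueError("a justification must be documented")


def to_json(record: SelectionRecord) -> str:
    return json.dumps(asdict(record), indent=2)


# Invented example values.
record = SelectionRecord(
    metadata={"name": "Interview transcripts", "sensitivity": "personal data"},
    reuse_case=["verification", "learning and teaching materials"],
    risks=["consent does not cover open sharing"],
    decision="keep",
    justification="Underpins published findings; shareable with approved researchers only.",
    actions=["anonymise", "deposit in institutional repository"],
)
print(to_json(record))
```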
22. But will this work?
From a research perspective, will active selection mean bureaucracy?
23. But will it work?
Is it easier to avoid selecting the good, and let someone else deal with de-allocation?
Data Mgmt Plan: enough to identify which project this data relates to
The ugly: “don’t know its value or where else to put it”
Archival storage / Active storage
“The bad”: can’t share, as nobody knows its sensitivity
The “too big for anywhere else”