Talk given at the Westminster Higher Education Forum policy conference: Next steps for protecting research integrity in the UK, Monday 9 September 2019
The document summarizes the UK experience with open data and the establishment of data.gov.uk. It discusses the objectives of creating a single portal, establishing public data principles, providing datasets using open standards, and developing an open government license. It also discusses lessons learned, such as using various arguments to promote open data, the need for engagement at multiple levels of government, and continuously engaging with developers to highlight applications. The overall message is that open data, standards, licensing, and engagement can create social and economic value when approached adaptively.
A presentation given at the RECODE workshop on 25th September 2014. It covers what is happening in terms of opening up access to research data at the University of Glasgow and via the Digital Curation Centre. The RECODE project is developing policy recommendations for open access to research data in Europe - http://recodeproject.eu
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014 - Jisc
The Research Data Service at Bristol University aims to make research data support a regular service by August 2015. It currently operates as a pilot program with staff including a service manager, research data librarians, and a technical developer. The service offers guidance on data management plans, data storage, sharing, publication, and training. It provides researchers with 5TB of storage and tools for collaboration, deposition of published datasets, and a public catalogue. Priorities for the next year include promoting the service, establishing ongoing service levels, developing an institutional research data policy, and integrating the data repository with the university's research information system.
Digitalisation and the future of research environments - Jisc
The document discusses how higher education is embracing digitalization, distinguishing between digitization, digitalization, and transformation. It also discusses how the digitalization journey can vary across different parts of an organization. Any future research environment will need to consider administration/management/support, the research process itself, and the community being served. A digital twin is one technology that could support a future research environment by providing a virtual representation to facilitate decision making and scenario testing across the research lifecycle.
Developing a persistent identifier roadmap for open access to UK research - Jisc
The Jisc/UKRI PID roadmap project and report (developing a persistent identifier roadmap for open access to UK research).
Hilda Muchando, senior information policy officer, Jisc.
A presentation at Jisc's persistent identifiers and open access in the UK: the way forward online event on 25 June 2020.
A community response to the Jisc/UKRI PID project and its goals - Jisc
Matthew Buys, executive director, DataCite
A presentation at Jisc's persistent identifiers and open access in the UK: the way forward online event on 25 June 2020.
The document discusses making scientific data and research more open, transparent, and interoperable according to FAIR principles. It argues that achieving FAIR data requires standards for data annotation and sharing, addressing ethics concerns, maximizing infrastructure use, and engaging stakeholders across academia, industry and government. Adopting FAIR principles more widely could improve data discovery and reuse to advance scientific knowledge.
A presentation by Rachel Bruce, director open science and research lifecycle, Jisc and Matthew Spitzer, community manager, Centre for Open Science (COS).
Sansone Westminster Higher Education Forum - Open Access, Open Data - March 2015 - Susanna-Assunta Sansone
This document discusses the importance of digital research objects being not just open, but FAIR (Findable, Accessible, Interoperable, Reusable). It notes that big life science companies have evolved from keeping data and innovation mostly inside the company to now distributing data more openly and collaborating in heterogeneous partnerships across different organizations. However, current academic incentive and evaluation systems do not properly recognize or reward activities like sharing data, software, publications or patents. The document calls for rethinking these systems and designing new career paths for data scientists to better align incentives with open and collaborative research practices.
WikiRate - Data Liberation and Radical Transparency - Vishal Kapadia
WikiRate.org responds to the problem of siloed and trapped data around company CSR reporting with an open source, public platform, which allows anyone to utilise and analyse the data.
Discovery: Implementing a vision for a 'virtuous' flow of metadata across the... - Joy Palmer
This document summarizes a national initiative in the UK to develop a shared infrastructure for open metadata across education and research institutions. There was agreement on the vision of open metadata but disagreement on how to achieve it. Key points of discussion included what makes data truly open through licensing, the need for standards and engagement. The initiative aims to establish principles of open by default, adopt standard open licenses, and make metadata accessible to machines to create an ecosystem of open and reusable data.
Jisc is refreshing its research sector strategy and is consulting stakeholders on a draft strategy. The draft strategy outlines Jisc's vision to be recognized as a major provider of research infrastructure and services. It identifies 7 themes for Jisc to support, including a new national data infrastructure, recording the UK's research assets, improving research analytics, supporting open access, and realizing the potential of advanced technologies through "Research 4.0". The strategy proposes specific initiatives and services under each theme to modernize research practices and infrastructure in the UK.
Data.gov.uk by Nigel Shadbolt, Professor of Artificial Intelligence, Head of Web and Internet Science Group, Electronics and Computer Science, University of Southampton
The document discusses open access publishing and predatory publishers. It notes that many regions and governments, including in Europe, the UK, and Australia, have policies supporting open access to publicly funded research. It includes a quote from the UK government emphasizing the benefits of free and open access to taxpayer-funded research. The document also lists some reasons why open access is important and compares the intermediary and output between traditional publishers and open access publishing. It provides tips on how to identify predatory publishers and journals, including checking for signs like launching many new titles at once with few papers, unsolicited email requests, lack of editor information, and little evidence of peer review.
Open access - a guide to Jisc's evolving offer to universities - Jisc Digital... - Jisc
Universities are implementing open access to research publications, partly in response to policies from the UK funding and research councils.
This aims to provide the “big picture” of how Jisc is supporting universities in this challenge, both now and into the future.
The document discusses the role of data incubators in shaping European Data Spaces. It describes the European Data Incubator (EDI) project which incubated over 100 startups and SMEs working with data from 30 providers over 3 rounds. EDI helped broker data sharing agreements and connect companies to investors. The REACH incubator builds on EDI's work, facilitating cross-sector experimentation through an 11-month program involving startups, large corporations, and Digital Innovation Hubs to develop trusted and secure data solutions. REACH aims to demonstrate how data silos can be broken by enabling multi-stakeholder collaboration across industries.
This document summarizes ILRI's commitments and plans for open access and open science. ILRI established an open access task force in 2017 to increase staff awareness and support open policies. Key recent activities include external reviews of repositories and increasing IP awareness. Future plans include analyzing digital transformation and including informatics in an organizational assessment. The document discusses why openness is important for impact, collaboration and safeguarding legacy. It outlines ILRI's aims to empower sharing, organize research outputs, and extend accessibility and visibility. Various approaches are presented for open data, knowledge, intellectual property and multimedia. Tracking attention through tools like Altmetric is also mentioned.
This document provides an overview and status update of the "equipment.data" national equipment portal project. The project aims to establish a searchable portal of research equipment across UK higher education institutions. To date, the project has developed a branded website with a search function, recruited contributors and compiled statistics on available data. It has also engaged vendors to influence system designs and interoperability standards. Moving forward, objectives include continued community engagement, encouraging adoption of standards and procedures, and further vendor engagement to facilitate wider use and developments.
BRISSKit: biomedical research made easy - Jisc Digital Festival 2015 - Jisc
BRISSKit is a demo web application that intends to simplify the process whereby medical and translational researchers find and study patient cohorts and link to other biomedical datasets.
Skills and knowledge for scholarly communication roles - Jisc
This document analyzed 71 UK job postings for scholarly communication roles from 2015-2017 to identify common skills and qualifications. It found that general skills like liaising with researchers and advising on compliance were most common. Specific skills for open access officers included implementing policies and managing funds, while repository managers focused on maintaining systems and ensuring metadata quality. A range of degree levels were required but roles increasingly expected expertise in multiple areas like open access and research data management.
Presentation: ODINE - Open Data Incubator Europe, by Elena Simperl, University of Southampton & The ODI (UK), at the European Data Economy Workshop taking place back to back to SEMANTiCS2015 on 15 September 2015 in Vienna
Susanna Sansone - OpenCon Oxford, 1st Dec 2017 - Crossref
FAIR Data: principles and practices
A growing worldwide movement for reproducible research encourages making data, along with the experimental details, available according to the FAIR principles of Findability, Accessibility, Interoperability and Reusability (see http://www.nature.com/articles/sdata201618). Several data management and sharing policies and plans have emerged and, in parallel, a growing number of community-based groups are developing hundreds of standards to harmonize the reporting of different experiments. Community mobilization is also evident in the number of efforts and alliances, as well as the data journals and data centres being launched.
The document discusses principles and best practices for open data policies. It outlines six responsibilities for scientists, research institutions, publishers, funding agencies, professional associations, and libraries to make data openly available. Open data should be the default, with limited exceptions for privacy, safety and commercial interests justified on a case-by-case basis. Effective open data policy development requires consideration of context, content and impact. Key pillars for sustainable open data programs include supporting infrastructure, easy access, user feedback channels, high-value datasets, data quality, and privacy protection.
Talk given at the Westminster Higher Education Forum Keynote Seminar: Next steps for Open Access and Open Data research policy, Tuesday, 22nd November 2016
Accessing data for research: data publishing pathways and the Five Safes - Louise Corti
Presented at: Assessing Disclosure Risk in Population Research Data and Outputs, Children of the 90s (ALSPAC)
Bristol Medical School, 24 January 2020.
In this half-day session, we introduce the concept of a Safe Health Researcher, where both data producers and users are aware not only of the key legal, ethical and security measures surrounding the management and publication of biomedical research data, but also of any risk in the outputs they are creating.
The practical training session, aimed at data managers, looks at key elements of disclosure risk and trust in sharing biomedical data. We will cover the principles and practicalities of reviewing disclosure risk in numeric data sources and in research outputs.
This document summarizes the key aspects of an open data incubator program. The incubator provides resources like funding up to €100,000, computing resources, introductions to mentors and investors, and training to help projects make use of open data and develop businesses or services. It accepts applications on a rolling basis and projects are selected every two months for six month incubations. The goal is to help facilitate the commercialization of open data and data-driven businesses through various supports.
Lesson 2 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Advocacy in Research Data Management. Session 3.2 of the RDMRose v3 materials.
The JISC-funded RDMRose project (June 2012-May 2013) was a collaboration between the libraries of the Universities of Leeds, Sheffield and York and the Information School at Sheffield, to provide an Open Educational Resource for information professionals on Research Data Management. The materials were revised between November 2014 and February 2015 for the consortium of North West Academic Libraries (NoWAL).
http://www.sheffield.ac.uk/is/research/projects/rdmrose
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... - Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Digital transformation to enable a FAIR approach for health data science - Varsha Khodiyar
Invited talk for ConTech Pharma on 1st March 2022
Abstract
Health Data Research UK is the UK’s national institute for health data science, with a mission to unite the UK’s health data to enable discoveries that improve people’s lives. In this talk, Dr Varsha Khodiyar will outline how HDR UK is bringing together disparate health data from all four countries of the United Kingdom, creating the infrastructure to enable discovery of and access to health data, and convening standards-making bodies to improve data linkage and data reuse. Varsha will also discuss how HDR UK is moving beyond the traditional confines of FAIR data to also ensure that data sharing and data use are transparent and ‘fair’ for the patients and lay public who are the subjects of these datasets.
An introduction to open science, why it's important and how to do it. This presentation was given at the European Medical Students Association (EMSA) event, 'Open Access in Action' in Berlin on 14th-15th September 2015
The art of depositing social science data: maximising quality and ensuring go... - Louise Corti
The document provides guidance for depositing data into a research data repository. It discusses incentivizing researchers to share data, developing data policies, reviewing data for quality and disclosure risks, preparing documentation, assigning licenses, and providing support to depositors. The role of the repository manager is to work with depositors to prepare data according to best practices and the repository's standards to ensure long-term preservation and access.
Research Data Management in GLAM: Managing Data for Cultural Heritage - Sarah Anna Stewart
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
Managing Your Research Data for Maximum Impact - Rob Daley 300616_Shared - Rob Daley
This document provides an overview of best practices for managing research data. It discusses why data management is important given changing policies from funders that require making data openly available. It outlines challenges for researchers in managing data and provides guidance on developing a data management plan to address issues like data types, access, storage, and long-term preservation. The document also covers topics like formatting data, addressing legal and ethical concerns, publishing and citing data, and tools like ORCID and DOIs to help maximize the impact of research data.
Building data networks: exploring trust and interoperability between authoris... - Repository Fringe
Building data networks: exploring trust and interoperability between authors, repositories and journals. Varsha Khodiyar, Scientific Data; Neil Chue Hong, Journal of Open Research Software; Rachael Kotarski, DataCite; Peter McQuilton, BioSharing; Reza Salek, Metabolights. At Repository Fringe 2015.
This document summarizes a presentation on building data networks between authors, repositories, and journals. It discusses why researchers should work with data journals, the general criteria data journals require of repositories, and introduces the Journal of Open Research Software and initiatives like DataCite UK and BioSharing that aim to improve data sharing and reuse through standards and databases.
The document discusses managing research data and reputation. It provides tips for curating data to showcase outputs, highlight collaborations, and promote reuse. Good data practices are important for both protecting against risks and enhancing reputation. Institutions should develop policies, plans, training, and repositories to help researchers manage and share their data.
Jisc on repositories unleashing data - Daniela Duca - Repository Fringe
Jisc aims to make the UK the most digitally advanced education and research nation. It supports research through developing shared infrastructure, providing input to funders and publishers, and supporting standards. It is working on two relevant projects: the UK Research Data Discovery Service, which aims to make research data more discoverable by evaluating metadata models from Australia and Canada; and Research Data Metrics, which is scoping a tool to assess data usage and management systems through a proof of concept using the IRUS dataset.
Grampian safe haven, research data network - Jisc RDM
"Safe havens" should be developed as an environment for population-based research where the risk of identifying individuals is minimized. Researchers in safe havens are bound by strict confidentiality codes preventing disclosure of personally identifying information and providing sanctions for breaches of confidentiality.
This document provides guidance on developing research data management services at universities. It discusses 10 key steps: 1) Understanding current practices, 2) Deciding what services are needed, 3) Balancing the needs of stakeholders, 4) Securing input and buy-in, 5) Defining roles and responsibilities, 6) Positioning support appropriately, 7) Balancing internal and external provision, 8) Being agile and adaptable to change, 9) Linking systems to integrate services, and 10) Planning for long-term sustainability. The overall message is that developing effective RDM requires understanding user needs, engaging stakeholders, and continually adapting services.
Challenges for research support - Sarah Jones, University of Glasgow, Digital...Mari Tinnemans
This document provides guidance on developing research data management services at universities. It discusses 10 key points: 1) Understanding current research data practices, 2) Deciding what services are needed, 3) Balancing the needs of stakeholders, 4) Securing input and buy-in, 5) Defining roles and responsibilities, 6) Positioning support appropriately, 7) Balancing internal and external provision, 8) Being agile and adaptable to change, 9) Linking systems to integrate services, and 10) Planning for long-term sustainability. The overall message is that developing effective RDM requires understanding user needs, engaging stakeholders, and continually adapting services.
Transparency and reproducibility in research - Louise Corti
Talk given at the ESS Summer School: An introduction to using big data in the social sciences, 20-24 July 2020, University of Essex, Colchester, UK.
In the morning we look at publishing and sharing data and the importance of research replication and code sharing, examining which methodological issues peer reviewers might look for in a published paper that uses big data. An increasing number of journals in the sciences and social sciences expect a high degree of transparency, and knowing how best to publish high-quality raw (or processed) data, methodology and code is a useful skill. We show how ‘data papers’ help to elucidate how datasets were constructed, compiled and processed, and help to showcase the value of data beyond the original research.
Use of data in safe havens: ethics and reproducibility issues - Louise Corti
The document discusses ethics and reproducibility issues related to using data in safe havens. It summarizes the UK Data Service, which curates social science data and uses various safeguards to provide access to controlled data through its spectrum of access. It describes legal gateways for data sharing, the Digital Economy Act, and the UK Statistics Authority's accreditation process for researchers and projects. It also discusses the UK Statistics Authority's ethics self-assessment tool and factors that can impact reproducibility when data and code are behind access restrictions in safe havens.
Reproducibility is our gold standard, but what happens when data are analysed in a safe haven? Restrictions, which can be a high bar, on those accessing data mean that reproducers also need to meet any requirements. The UK Statistics Authority Safe Researcher type model can help create a trusted network of people with the training and skills to access data to check and rerun code. Reproducibility certification, like cascad, can help provide a robust checking process for controlled data.
Incentivising the uptake of reusable metadata in the survey production process - Louise Corti
This document discusses incentivizing the uptake of reusable metadata in survey production. It notes that there is no universal language used to document survey questions and variables, leading to wasted resources. The Data Documentation Initiative (DDI) is proposed as a standard. Barriers to adopting metadata best practices include legacy systems, manual processes, and reluctance to change. The document outlines ideas to incentivize metadata use such as specifying documentation requirements in funding calls and improving documentation tools and workflows. Showing tangible benefits through applications like question banks and data exploration systems is also suggested.
How metadata drives data sharing; UK Data Archive Louise Corti
The document discusses metadata and its importance for archiving survey data. It summarizes that metadata drives access to survey data through online browsing systems by providing essential documentation about the variables, questions, and structure of the surveys. It notes common issues with deposited survey metadata including a lack of consistent variable naming and incomplete documentation of changes over time. Improving metadata practices throughout the data lifecycle from production to archiving is important to support reuse of the data.
The role of open data in enhancing reproducibility
1. The role of open data in enhancing reproducibility
Westminster Higher Education Forum policy conference:
Next steps for protecting research integrity in the UK
Monday 9 September 2019
Louise Corti
Director, Collections Development and Producer Relations
UK Data Service
2. • Reproducibility
• Transparency
• Good science
• Helps avoid QRP
• Code
• Data:
• Truthful
• Well documented
• Ethical
• Legal
• Limit embargos on data
• Strict governance e.g. suitable measures and safeguards in place
• Undergrad to senior prof. Dissertations and their data: screen dump
3. Screen dump:
• Open Science is not equal to Open Data
• Show a good collection
• Blog Clinical Trials with Cochrane ‘How to share’
• Training Safe Health Researcher (creator/user)
4. Not just open data …high quality open data
• Pathways to access for data held by
UK Data Service
• Gaining ODI Platinum certification for open data
• Running an App Challenge
5. • Builds on 50 years of practice in data curation
• 7000 data collections across social science
spectrum (265 open)
• Supporting the ESRC Data Policy since 1995
• Work with 1000s of data creators using
accumulated tried and tested best practice
• Host the prestigious UK governmental and
academic surveys
• Concordat with Office for National Statistics
UK Data Service – trusted digital repository
6. Managing access to our data holdings
• Open: download/online access under open licence without any registration
• Safeguarded: download/online access to registered users who agreed to an End User Licence; possibly for special conditions, vetted projects etc.
• Controlled: remote or safe room access to accredited, authorised, authenticated users whose research proposals and outputs have been approved
Open where possible, closed when necessary
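The three access levels on this slide can be read as a simple decision rule. The sketch below is only an illustration of that rule, not the UK Data Service's actual system: the names `AccessLevel`, `User` and `can_access`, and the exact conditions checked, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """The three tiers named on the slide."""
    OPEN = "open"
    SAFEGUARDED = "safeguarded"
    CONTROLLED = "controlled"


@dataclass
class User:
    """Hypothetical user record; field names are illustrative only."""
    registered: bool = False        # has a user account
    agreed_eul: bool = False        # has accepted the End User Licence
    accredited: bool = False        # accredited, authorised researcher
    project_approved: bool = False  # research proposal has been approved


def can_access(level: AccessLevel, user: User) -> bool:
    """Return True if the user meets the conditions assumed for the tier."""
    if level is AccessLevel.OPEN:
        # Open data needs no registration at all.
        return True
    if level is AccessLevel.SAFEGUARDED:
        # Safeguarded data needs registration plus the End User Licence.
        return user.registered and user.agreed_eul
    # Controlled data needs an accredited user with an approved project.
    return user.accredited and user.project_approved
```

For example, `can_access(AccessLevel.SAFEGUARDED, User(registered=True))` is false until the End User Licence has also been agreed. Output checking for controlled data (the "Safe outputs" step) is left out of this sketch.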
7. Our strategies for enabling safe access
Trusted accredited digital repository
Informed consent for long-term data sharing
Protection of identities when promised
Regulated access where needed
5 SAFES - safe access to data. Fulfils demands for open science and transparency
Safe data - Safe people - Safe projects - Safe settings - Safe outputs
https://www.youtube.com/embed/Mln9T52mwj0?platform=hootsuite
8. Successful data ingest
Standard depositors licences or data sharing agreements and End User Agreements
Use robust and explicit quality assessment techniques
Use standard deposit metadata and data description
Work to an agreed timetable for data publishing
Training and capacity building to safely share research data
9. Open Innovation project - App Challenge.net
• Open data and crowdsourcing project to generate
innovative uses and outlets for our data
• Created a harmonised open dataset over 2 waves
• European Quality of Life survey
• Data and disclosure control work with data owners
• Engagement with Open Innovation Community
• Platinum certification – Open Data Institute
• Data delivered via an open API
• Ran an App Challenge
12. An open dataset: whose standards?
• 90 questions – each answer = a URL
- quality, provenance, ethics and legal, documentation,
communication
• Focus on machine readable & actionable metadata
• Massive XML file:
http://doc.ukdataservice.ac.uk/DDI25/7724.xml
• Preservation vs. linked open data challenge
• API data delivery
Developers may NOT read our beautiful documentation!
Requests to survey API must deliver weighted results
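The point about weighted results can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming each survey response arrives as an (answer, weight) pair; the function name `weighted_shares` and the sample data are hypothetical and do not describe the actual API.

```python
from collections import defaultdict


def weighted_shares(responses):
    """Return each answer's share of the total survey weight.

    `responses` is an iterable of (answer, weight) pairs.
    """
    totals = defaultdict(float)
    for answer, weight in responses:
        totals[answer] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total weight is zero; shares are undefined")
    return {answer: total / grand_total for answer, total in totals.items()}


# Made-up sample: two "yes" responses and one "no" response.
sample = [("yes", 2.0), ("no", 1.0), ("yes", 1.0)]
shares = weighted_shares(sample)
# Weighted share of "yes" is 3.0 / 4.0 = 0.75, whereas the
# unweighted share would be 2 of 3 responses.
```

An API that returned raw counts would leave this step to developers who, as the slide notes, may not read the documentation; applying the weights on the server side avoids that risk.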
13. #EULIFE entries and winners
Crowdsourced app ideas from developers across the world
• India, Sweden, Serbia, Germany, Finland, Estonia,
Poland, the United Kingdom and New Zealand
• Judges from Google, ODI, RSS, Digital Catapult,
Transport API
Winners: social facts in context
• Quizzes – educational/fun – linked to evidence
• Results linked to contextualised news
• Social or community challenges
• User-friendly visualisation for mobiles
15. Keep connected with us
• Subscribe to UK Data Service list:
www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKDATASERVICE
• Follow UK Data Service on Twitter: @UKDataService
• Facebook
• Youtube: www.youtube.com/user/UKDATASERVICE
• corti@essex.ac.uk (Collections Development and Producer
Relations team)