Toward a Frictionless Data Future
PRESENTED BY
Jo Barratt
jo.barratt@okfn.org (@jobarratt/@okfnlabs)
AT
3rd Research Data Network - St Andrews University - 30 November 2016
Licensed under cc-by v3.0 (any jurisdiction)
International non-profit founded in 2004
Who we are
● Vision
o A world where open knowledge is ubiquitous, enabling
citizens and organizations to create insights that drive
change on global and local challenges, combat injustice
and inequality and hold governments and corporations
to account.
● Mission
o Open up all essential, public interest information and
see it used to create insight that drives positive change
o Build communities, tools and skills to empower
individuals and organizations to use open information to
create insights that drive change.
Widely adopted - over 20 national governments
and 60+ local governments & cities
ckan.org/instances
£4m in 15mins
Frictionless Data is…
● Lightweight specifications for “packaging” datasets
● Integrations for loading datasets into tools and platforms relevant to
researchers
The Goals...
● Introduce a significant, measurable improvement in how research data is
shared, consumed, and analyzed.
● Make it easier to maintain and improve data quality.
The Problem
Treemap of issues …
Legal barriers
(open data, sharing
agreements etc)
Data Quality
Hard to find
Interoperability
No tool integration
Cargo loading ~1955
Manual, Slow, Costly
(and Dangerous)
Data is Shipping Pre-Containerization
Containerization
Standards
- Standards (a few, simple ones)
- Tools (primarily for integration)
- Documentation
- Datasets
Data
Containerization
http://www.flickr.com/photos/photohome_uk/1494590209/
Key Principles
1. Simplicity
2. Web Oriented
3. Existing Tools
4. Open
5. Distributed
Tabular Data Package
Data Package
JSON Table Schema
CSV
http://frictionlessdata.io/guides/tabular-data-package/
http://frictionlessdata.io/guides/data-package/
http://frictionlessdata.io/guides/json-table-schema/
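To make the layering above concrete, here is a minimal sketch of a Tabular Data Package descriptor (the datapackage.json file) built in Python: a Data Package that points at a CSV file and describes its columns with a JSON Table Schema. The dataset name, file path, and column names are hypothetical and chosen only for illustration; the guides linked above remain the reference for which properties are required.

```python
import json


def make_descriptor(name, csv_path, fields):
    """Build a minimal Tabular Data Package descriptor as a plain dict.

    `fields` is a list of (column_name, type) pairs; the types used here
    ("integer", "string", "number") are JSON Table Schema type names.
    """
    return {
        "name": name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical dataset: name, path and columns are illustrative assumptions.
descriptor = make_descriptor(
    "example-budget",
    "data/budget.csv",
    [("year", "integer"), ("department", "string"), ("amount", "number")],
)

# This JSON text is what would be saved as datapackage.json next to the CSV.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

The descriptor is deliberately just JSON sitting beside a CSV file, which is what keeps the approach simple and usable with existing tools.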
Tabular Data Package
Tooling …
Data Package
Tabular Data Package
JSON Table Schema
CSV
Tool
e.g. import to R, SQL etc
Tool
e.g. data checking à la GoodTables
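As a rough illustration of the "import to SQL" kind of tool, the sketch below uses the schema from a descriptor to create a table and load CSV rows into SQLite, using only the Python standard library. It is not the Frictionless Data tooling itself: the `load_resource` helper, the type mapping, and the sample data are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Assumed mapping from JSON Table Schema types to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "number": "REAL", "string": "TEXT"}


def load_resource(conn, table, schema, csv_text):
    """Create `table` from the schema's fields and insert the CSV rows.

    Returns the number of rows inserted. Unknown types fall back to TEXT.
    """
    fields = schema["fields"]
    columns = ", ".join(
        '"{}" {}'.format(f["name"], SQL_TYPES.get(f["type"], "TEXT"))
        for f in fields
    )
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [[row[f["name"]] for f in fields] for row in reader]
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
    )
    return len(rows)


# Hypothetical schema and data, inlined so the example is self-contained.
schema = {
    "fields": [
        {"name": "year", "type": "integer"},
        {"name": "amount", "type": "number"},
    ]
}
csv_text = "year,amount\n2015,1200.5\n2016,500.0\n"

conn = sqlite3.connect(":memory:")
inserted = load_resource(conn, "budget", schema, csv_text)
total = conn.execute('SELECT SUM("amount") FROM "budget"').fetchone()[0]
print(inserted, total)
```

Because the column types come from the schema, the tool does not have to guess them from the CSV text, which is the point of shipping the schema with the data.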
Validation
http://goodtables.okfnlabs.org
Validation
Continuous Validation
● If you’re working in a group, you need continuous validation… for data!
● In < 1 hour, we integrated elements (datapackage.json + Python libraries +
GoodTables API) to support continuous data validation
http://uk-25k.datadashboards.io/
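The idea of continuous validation can be sketched as a small check that runs on every change to the data, in the same way tests run on every change to code. The example below does not use the GoodTables API; it is a simplified stand-in written with the standard library, and the `validate_csv` helper, the supported types, and the sample data are all assumptions for illustration.

```python
import csv
import io

# Casts for a few JSON Table Schema types; anything else is treated as text.
CASTS = {"integer": int, "number": float, "string": str}


def validate_csv(schema, csv_text):
    """Return a list of (line_number, field_name, message) errors.

    Checks that the header matches the schema's field names, then that
    every value can be cast to its declared type.
    """
    errors = []
    fields = schema["fields"]
    expected = [f["name"] for f in fields]
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != expected:
        errors.append((1, None, "header mismatch: expected {}".format(expected)))
        return errors
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for f in fields:
            cast = CASTS.get(f["type"], str)
            try:
                cast(row[f["name"]])
            except (TypeError, ValueError):
                errors.append(
                    (line_number, f["name"], "not a valid {}".format(f["type"]))
                )
    return errors


# Hypothetical schema and data with one deliberate error on line 3.
schema = {
    "fields": [
        {"name": "year", "type": "integer"},
        {"name": "amount", "type": "number"},
    ]
}
csv_text = "year,amount\n2015,1200.5\ntwenty,500\n"

errors = validate_csv(schema, csv_text)
for line_number, field, message in errors:
    print("line {}: {}: {}".format(line_number, field, message))
```

In a shared repository, a script like this could be run by a continuous integration job and made to fail whenever the error list is non-empty, so broken data is caught before it is merged.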
Platform Integrations
Partners
View more: http://frictionlessdata.io/partners/
Dataship
YOUR
RESEARCH
ORG
● Project website: http://frictionlessdata.io/
● Specifications: http://specs.frictionlessdata.io/
● GitHub: https://github.com/frictionlessdata/
● User Stories: http://frictionlessdata.io/user-stories/
● Newsletter: http://frictionlessdata.io/get-involved/#newsletter
● Follow @okfnlabs on Twitter (#frictionlessdata)

Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 

Towards a frictionless data future

Editor's Notes

  1. Hello. I’m Jo Barratt, the project manager for the Frictionless Data project at Open Knowledge. I have worked as a journalist and in branding, and I tell you this just by way of explaining that I’m not a coder. So I’m here to present an overview of the technology and some of the tooling and projects we are working on. But please do come and talk to me, because there is a team of people really keen to talk to you about this.
  2. Open Knowledge. Here’s a bit of info for those of you who might not have come across us before. Open Knowledge is a non-profit founded in 2004. It’s based in the UK, though our team is spread around the world. I’m in London, as are a few other people, and we have people in Berlin, Tel Aviv and Addis Ababa, to name but a few. Our vision is a world where open knowledge is ubiquitous, and citizens and organizations are able to create insights from knowledge that drive change. We’ve had over a decade of experience working with organizations to unlock the value of their data.
  3. A lot of our work is based in government. CKAN is the world’s leading open data portal.
  4. I probably don’t need to do very much to persuade people here of the value of open data. But I’m going to start with a story, because it shows firstly the value of opening data, but also a bigger issue, which is one we are looking to solve with the work we are doing now. And it’s a true story. It shows how the UK Government saved four million pounds in fifteen minutes. As citizens, it’s quite easy for most people to understand the concept of how the data can benefit them. But even if data exists, and there is a willingness to share it, friction may prevent it “flowing” to where it is most needed and most valuable. I’m sure that, working with universities, you understand this. Universities and governments are prime examples of institutions which often show great willingness to make data available, but where culture or infrastructure makes this difficult. In this example, although it is open data, we’re actually talking about sharing data within government. In 2010, the UK began publishing transactional government spending data as open data. This was new, and nobody knew the benefits yet. Three and a half years later, Liam Maxwell, the UK Government CTO, had an idea that maybe people in other departments were using a report he was interested in, and so they were duplicating the purchase needlessly. In the old days, checking this would not have been worth his time. Open data = problem solved, right? Not quite. Each department published its data in separate monthly CSV files. It was difficult to understand and laborious to search through to find the information he was looking for. But luckily, someone had conveniently made these searchable as a whole on the OpenSpending website. Three clicks. Total estimated savings from eliminating that duplication: just over 4 million pounds. Open data had allowed him to turn a question into insight in minutes, and saved the government and the taxpayer money in doing so. So the innovation is not that the data is open, but that, because it was open, another organisation could come and build something which could link it all up. So the real value is that it is possible to find the data and use it. People have knowledge. We know things. This needs to flow between people in order to make use of this information, build on it, drive change and support the projects we are working on. We need a better way of matching everything up. Look at academic research. Universities are the places where human knowledge is advanced and pushed to the next level. There are a lot of people working on a lot of different things across the world, but how is this information shared? A paper is published and you have to hope that somebody reads it and then tells somebody else about it. There is so much out there that is not being used. ANYBODY should be able to use information to generate insight. The best thing to do with your data will be thought of by someone else.
  5. Frictionless Data is about removing the friction in working with data. We are developing a set of tools, specifications, and best practices for processing, describing, and publishing data. The heart of this project is the “Data Package”, a containerization format that’s based on existing practices for publishing open-source software.
  6. But now, let’s look a little closer at the specific problems we have working with data. Openness is not the only one. More and more data is being opened up, and more and more data is available in all sorts of different places and formats. But there are additional problems involved in getting hold of, sharing, and using this data in research, or for whatever you want to use it for. There is FRICTION in the process.
  7. And we’ve done a rough and ready proportional representation of this. On the left here are all the things which are stopping people wanting to release the data. The reasons for this might be economic. Or political. And we’ve made huge progress here in the last decade. From my experience working with government, this feels like a battle which is already won. It’s the other side of the chart which is a bit more of a worry. Data is hard to discover or find. Structure is poor and/or needs significant manipulation to be usable, so even when you do find data, it is usually a manual process to connect it with your tool. There is no standardized schema, and different data sources are hard to compare or integrate. And it might be impossible to get it into the tool you are using, which might mean laborious time spent copying data from one tool to another.
  8. In 1955 shipping was slow, it was dangerous and it was complicated. But then came modern shipping containers. On 26 April 1956, the Ideal X made its maiden voyage from Port Newark to Houston stacked with 58 metal containers. They were taking orders before they had docked, and the enterprise expanded into what became known as Sea-Land Service. Suddenly a ship could be unloaded with a tiny proportion of the men it took before, reducing the price and the cost. It wasn’t long before other companies began to adopt the same specifications. Use increased quickly and, because it made sense for everybody, standardisation followed quickly. In 1961 the ISO set the sizes for shipping containers which are still pretty much in use today.
  9. Data is shipping pre-containerization. Which is where the Frictionless Data project and Data Packages come in.
  10. Data Package (DP) = the metal box. Frictionless Data (FD) = the cranes.
  11. The main idea behind the Data Package format is to create a common interchange format: publishers can publish their data as a Data Package, while consumers can integrate Data Packages into their research workflow. They can just plug it in, whether that involves an SQL database, BigQuery, analysis using Python, pandas, or Excel. With Data Packages, data publishers only need to support this common container format to simplify export to an ever-increasing number of other tools and services. Getting these elements of the global data infrastructure right can reduce the friction experienced by researchers who work with data. We believe this will result in improved data quality, use, and sharing, leading to more insight. Now to run through some of the key points about our approach here.
  12. They are simple. They use the most basic formats. They are web oriented: they use formats that are web "native" (JSON) and work naturally with HTTP (e.g. CSV streams). We have designed the Data Package to work as easily as possible with existing tools. Everyone has tools to use CSV and it’s supported by almost every language. Why would we want to change the approach here? They are open. Anyone should be able to freely and openly use and reuse what we build. Our community is open to everyone. And they are distributed: this is not about creating a central data registry, but rather a basic framework that would enable anyone to publish and use high quality datasets more easily. Through this approach, we aim to revolutionize how research data is shared, consumed, and analyzed while also enabling massive improvements in data quality.
  13. Here is a DP, or a representation of one. In real life, this is code. This is a “tabular” data package, which is designed specifically for tabular data. There are DPs for different groups: different ways of working with data, different common metadata etc. DP: the box or carton, with some basic descriptions of the data. CSV: the data. No, or very few, changes to existing data. It all goes into the data package. JSON Table Schema: lets us know what we expect to see in the CSV file.
  14. And once you have the data package, you can play with all the tools we are creating. Or import to other tools directly. Just like with shipping containerisation, it’s not the metal box which is the innovation.
  15. Here is one of the first tools we have built: goodtables. Much of the “friction” in using data comes from the time and effort needed to identify and address errors in it before analyzing it in a given tool. So we’re focussing on this early stage in the process, to remove the friction. There are two things you can do here. You can upload a data set, which will be tested for structural errors in the table (e.g. missing headers, blank rows, etc.). And you have the option to test for schema errors, against a schema which predefines what we expect to be in the fields.
  16. Now upload… It shows what is wrong with your data and how to fix it, in a user-friendly fashion. So again, this is not big data, not complicated data, but it vastly reduces the friction involved, and if we are fixing this part of the chain, it makes the job of people further along a lot more straightforward! For example, we are exploring this in a working group we have set up around archaeological data. They are really quite excited about the difference this can make in the field, at the stage where the data is becoming digitised. But now there’s another step in the process, which is CONTINUOUS validation, and the benefits are multiplied.
  17. Software projects have long benefited from Continuous Integration services like Travis CI for making sure code is of a high quality. With every update to a bit of code, tests are automatically run and a report is generated for the project’s shared repository. And developers can find and resolve errors quickly and reliably. As with software, datasets are created, edited, and updated over time, and by different people. And with continuous validation using goodtables, we do exactly the same. A set of tests is run on the data using goodtables. If “bad” data is used, the “build” fails and issues a report indicating what went wrong. So it can be fixed and we can build stronger, better data. This is not just for science or maths subjects, where you will eventually go on and input this into a complex analysis tool. Imagine you have a series of questionnaires on a Google Form. You can use goodtables to instantly check a survey, which could save you hours and hours of time reorganising your data.
  18. Data Quality Dashboard. Something we built for the UK government, to compare the quality of openly published data across departments. You can tell just by looking at this that the issue is not the openness of the data, but its quality. This can do several things. Keep organisations in check. Allow people to get better. Show specific issues with the data. We are also building a registry which will allow you to publish data from the command line, and which will support much of the other work we are doing. ALL FREE AND OPEN SOURCE.
  19. Our overall mission is to make it easier to develop tools and services for working with data, and also to ensure greater interoperability between new and existing tools and services. A KEY part of it is to speak to people and find out what they need to help make their lives easier working with data. We already support import/export for: CKAN, BigQuery, AWS Redshift, SQL, … We are building libraries in Python, Ruby, MATLAB and R, which will allow users to easily get data into a proper backend for further use. And what else? You tell us. In piloting our approach, we are placing a particular emphasis on supporting researchers in addressing their existing data needs across various scientific disciplines.
  20. We are running targeted pilots to trial these tools and specifications on real data. Are you a researcher looking for better tooling to manage your data? Are you working on research data and would like to work with us on issues for which data packages are suited? Are you a developer and have an idea for something we can build together? Talk to us. We have time and attention to give you (and funding!)
  21. Talk to me!
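
For anyone who wants to see what notes 13, 15 and 17 look like "in real life", here is a minimal sketch in Python using only the standard library. It is not the goodtables implementation and does not use any Frictionless Data library. The dataset name, the CSV content, and the `validate` helper are all invented for illustration. The descriptor only assumes the general shape described in the notes: a small JSON document (the box) that points at a CSV file (the data) and carries a schema saying what each column should contain.

```python
import csv
import io
import json

# Hypothetical example data: one blank row and one value that breaks the schema.
CSV_TEXT = "id,name,amount\n1,Alice,10.5\n\n2,Bob,not-a-number\n"

# Illustrative descriptor in the spirit of a tabular data package:
# basic description of the data plus a schema for the CSV columns.
# All names and values here are assumptions made up for this sketch.
DESCRIPTOR = {
    "name": "example-spending",
    "resources": [
        {
            "path": "data.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "name", "type": "string"},
                    {"name": "amount", "type": "number"},
                ]
            },
        }
    ],
}

# How this sketch checks each declared type: try to convert the cell.
CASTS = {"integer": int, "number": float, "string": str}


def validate(csv_text, schema):
    """Return a list of (row_number, message) problems found in csv_text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [(0, "file is empty")]
    errors = []
    expected = [field["name"] for field in schema["fields"]]
    # Structural check: the header row must match the schema's field names.
    if rows[0] != expected:
        errors.append((1, f"header {rows[0]} does not match schema {expected}"))
    for number, row in enumerate(rows[1:], start=2):
        # Structural check: blank rows.
        if not any(cell.strip() for cell in row):
            errors.append((number, "blank row"))
            continue
        # Structural check: wrong number of cells.
        if len(row) != len(expected):
            errors.append((number, f"expected {len(expected)} cells, got {len(row)}"))
            continue
        # Schema check: every cell must convert to its declared type.
        for field, cell in zip(schema["fields"], row):
            try:
                CASTS[field["type"]](cell)
            except ValueError:
                errors.append(
                    (number, f"{field['name']}: {cell!r} is not a valid {field['type']}")
                )
    return errors


# What the descriptor would look like when saved as a JSON file.
print(json.dumps(DESCRIPTOR, indent=2))

errors = validate(CSV_TEXT, DESCRIPTOR["resources"][0]["schema"])
for row_number, message in errors:
    print(f"row {row_number}: {message}")

# In a continuous-validation job, a non-zero exit code would fail the "build".
exit_code = 1 if errors else 0
```

Run on every change to a dataset, a check like this plays the role that automated tests play for code: the blank row and the bad `amount` value are reported with their row numbers, and the non-zero `exit_code` is what a CI service would treat as a failed build.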