DATA LIBERATION
Opening Up Data by Hook or by Crook - Data Scraping, Linkage and the Value of a Good Identifier
Tony Hirst
Department of Communication and Systems
The Open University
data NOT information
by Vick
[Disruptive Innovation?]
“First” generation: data catalogues
Breathing life into data…
=importData("CSV_URL")
the spreadsheet becomes A DATABASE
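
The two slides above compress a two-step idea: pull CSV in from a URL, then treat the resulting sheet as a table that can be queried. A rough Python sketch of the same idea follows, assuming the CSV text has already been fetched. The `CSV_TEXT` values, the `spending` table and the `load_csv` helper are made up for illustration, and SQLite stands in for the spreadsheet's own query facilities.

```python
import csv
import io
import sqlite3

# Stand-in for the CSV text found at a data URL (made-up values).
CSV_TEXT = "council,amount\nAnytown,1200.5\nOthertown,310\nAnytown,99.5\n"

def load_csv(conn, table, text):
    """Create `table` from CSV text, using the header row as column names."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", reader)

conn = sqlite3.connect(":memory:")
load_csv(conn, "spending", CSV_TEXT)

# CSV values arrive as text, so cast before summing.
result = conn.execute(
    """SELECT council, SUM(CAST(amount AS REAL)) AS total
       FROM spending GROUP BY council ORDER BY council"""
).fetchall()
```

Once the rows are in a table, new "views" over the data are just further queries. Nothing needs to be re-entered by hand.
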
“Second” generation: data management systems
There’s lots more data that’s locked up in web pages…
Scraping…
“grabbing web content in a machine readable format and then processing it for your own purposes”
Original HTML web page; Extract information -> data; Accessible web page
Recreating the database that was used to populate a (templated) page
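
As a minimal sketch of that idea, assume a templated page whose records are rendered as rows of a plain HTML table. The markup, the column names and the `TableScraper` class below are illustrative and not taken from the talk. Python's standard-library `html.parser` can then turn the page back into records:

```python
from html.parser import HTMLParser

# Hypothetical templated page: every record is rendered as one table row.
PAGE = """
<table>
  <tr><th>Council</th><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Anytown</td><td>Acme Ltd</td><td>1200.50</td></tr>
  <tr><td>Othertown</td><td>Widgets plc</td><td>310.00</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

records = scrape_table(PAGE)
```

Real pages are rarely this tidy, so a scraper of this kind usually needs adjusting to each site's template.
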
…quick’n’dirty
Scrapers: Scraper, SQLite database
Views: SQLite database, Scraper
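
The scraper-plus-database pattern on these slides can be sketched with Python's built-in `sqlite3` module. The rows, the `spending` table and the `supplier_totals` view are invented for illustration. A SQL view stands in here for the "view" idea; on a platform such as Scraperwiki a view could equally be a chart or a page built over the same database.

```python
import sqlite3

# Rows as a scraper might emit them (made-up values for illustration).
scraped = [
    {"council": "Anytown", "supplier": "Acme Ltd", "amount": 1200.50},
    {"council": "Anytown", "supplier": "Widgets plc", "amount": 310.00},
    {"council": "Othertown", "supplier": "Acme Ltd", "amount": 99.50},
]

conn = sqlite3.connect(":memory:")  # a file path would persist the data
conn.execute("CREATE TABLE spending (council TEXT, supplier TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spending VALUES (:council, :supplier, :amount)", scraped
)

# A view is a saved query over the scraped table.
conn.execute(
    """CREATE VIEW supplier_totals AS
       SELECT supplier, SUM(amount) AS total
       FROM spending GROUP BY supplier"""
)

totals = dict(conn.execute("SELECT supplier, total FROM supplier_totals"))
```
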
Sometimes the data is spread across different files…
Row based aggregation
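
Row based aggregation amounts to stacking files that share the same columns. A small sketch, assuming two hypothetical monthly CSV files held as strings (the file names, values and the `stack_rows` helper are made up):

```python
import csv
import io

# Two hypothetical files with the same columns, e.g. one per month.
FILE_JAN = "supplier,amount\nAcme Ltd,100\nWidgets plc,40\n"
FILE_FEB = "supplier,amount\nAcme Ltd,75\n"

def stack_rows(named_files):
    """Append the rows of each file, recording which file they came from."""
    combined = []
    for name, text in named_files:
        for row in csv.DictReader(io.StringIO(text)):
            row["source"] = name
            combined.append(row)
    return combined

rows = stack_rows([("jan.csv", FILE_JAN), ("feb.csv", FILE_FEB)])
```

Keeping a `source` column makes it possible to trace any row back to the file it was scraped from.
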
Sometimes the data is spread across different websites…
… Normalisation…
Data Enrichment
Column Additions/Annotations
Sometimes the data is split across different files…
Column based merge
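
A column based merge combines files that describe the same things in different columns, matching rows on a shared key. The sketch below assumes two hypothetical CSV files keyed by an `inst_id` column. The identifiers, the figures and the `merge_columns` helper are all invented for illustration.

```python
import csv
import io

# Hypothetical files describing the same institutions in different columns.
GRANTS = "inst_id,grants\nOU,12\nSOTON,30\n"
STAFF = "inst_id,staff\nOU,1100\nSOTON,2400\n"

def merge_columns(left_text, right_text, key):
    """Inner-join two CSV texts on `key`, combining their columns."""
    right = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(left_text)):
        match = right.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged

merged = merge_columns(GRANTS, STAFF, "inst_id")
```

Rows whose key appears in only one file are dropped here. Whether that is the right behaviour depends on the data.
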
-> Data cleansing
Clustering…
http://mashe.hawksey.info/2012/11/mining-and-openrefineing-jiscmail-a-look-at-oer-discuss/

/via Martin Hawksey/@mhawksey
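
One simple way to cluster near-duplicate values is to reduce each one to a normalised key and group the values whose keys collide. The sketch below is loosely modelled on the "fingerprint" key-collision approach offered by tools such as OpenRefine. It is an approximation written for illustration, not OpenRefine's implementation, and the sample names are made up.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Key a string by its lowercased, de-punctuated, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values whose fingerprints collide; keep groups of 2 or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

names = ["Open University", "The Open University", "open university.",
         "University, Open", "Oxford University"]
clusters = cluster(names)
```

Note that "The Open University" is not caught by this key, because the extra token changes the fingerprint. Clusters found this way are candidates for a person to review, not automatic merges.
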
“Finessing” a common identifier
Common identifiers (common KEYS) make it MUCH easier to JOIN datasets by column
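
When two datasets name the same things slightly differently, a derived key can be "finessed" from the names and used as the join column. The following is a minimal sketch under stated assumptions: the supplier names, the `make_key` rules (lowercasing, stripping punctuation, treating "limited" as "ltd") and the table layout are all hypothetical, and real data would need rules tuned to its own quirks.

```python
import re
import sqlite3

def make_key(name):
    """Derive a join key: lowercase, strip punctuation, unify 'limited'/'ltd'."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\blimited\b", "ltd", key)
    return " ".join(key.split())

# Two made-up datasets that name the same suppliers slightly differently.
payments = [("ACME LTD.", 1200.0), ("Widgets Limited", 310.0)]
addresses = [("Acme Ltd", "Milton Keynes"), ("Widgets Ltd", "London")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (join_key TEXT, name TEXT, amount REAL)")
conn.execute("CREATE TABLE addresses (join_key TEXT, name TEXT, town TEXT)")
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)",
                 [(make_key(n), n, a) for n, a in payments])
conn.executemany("INSERT INTO addresses VALUES (?, ?, ?)",
                 [(make_key(n), n, t) for n, t in addresses])

# With a common key in both tables, the JOIN itself is straightforward.
joined = conn.execute(
    """SELECT p.name, a.town, p.amount
       FROM payments AS p JOIN addresses AS a ON p.join_key = a.join_key
       ORDER BY p.name"""
).fetchall()
```
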
Book Title -> ISBN
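
Even a good identifier may need normalising before it works as a join key. ISBNs, for instance, circulate in both 10-digit and 13-digit forms, with or without hyphens. The sketch below is an illustration, not something from the talk. It converts either form to a bare ISBN-13 using the standard "978" prefix and the ISBN-13 check digit rule. The function names are hypothetical, and the code does not validate the check digit of its input.

```python
def isbn13_check_digit(first12):
    """Check digit for the first 12 digits of an ISBN-13 (weights 1,3,1,3...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def to_isbn13(isbn):
    """Normalise an ISBN-10 or ISBN-13 string to a bare 13-digit form."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) == 10:
        core = "978" + digits[:9]  # drop the old ISBN-10 check digit
        return core + isbn13_check_digit(core)
    if len(digits) == 13:
        return digits
    raise ValueError(f"not an ISBN-10 or ISBN-13: {isbn!r}")

same_book = to_isbn13("0-306-40615-2") == to_isbn13("978-0-306-40615-7")
```
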
I am “psychemedia” on Twitter, delicious, slideshare, flickr, etc etc
Reconciliation…
Linked Data™
So who speaks SPARQL?
Diners - Journal Canteen by avlxyz
You DON’T have to….
Just think about how one piece of data might be related to another through a common means of addressing them…
http://ouseful.info

 @psychemedia

Data Liberation - Tony Hirst

Editor's Notes

  1. Tony HirstTwitter:@psychemediaBlog: http://blog.ouseful.infoPresentation prepared for: Online Info 12/11/2012DATA LIBERATION: OPENING UP DATA BY HOOK OR BY CROOK - DATA SCRAPING, LINKAGE AND THE VALUE OF A GOOD IDENTIFIERThe 1/9/90 rule is often used to characterise the way in which a small number of creators generate content that a larger number (but still small percentage in the greater scheme of things) comment on or amplify, whilst the majority just passively consume. In this presentation, I will explore the extent to which a similar view applies to the world of "data liberation". After reviewing the idea of data scraping, and some of the techniques surrounding it, I will describe how online tools such as Scraperwiki provide a platform for concentrating data scraping activity and expertise, as well as supporting the publication of data /as data/ in a variety of formats, in addition to 'end user' views in the form of graphical charts and interactive visualisations.One of the major motivations for data scraping is the aggregation of data from a variety of data sources into a larger, integrated whole. For example, the aggregation of research council funding data from separate research councils allows us to view a large proportion of the publicly funded research grants received by a single institution; or the collection of local council spending data across all UK councils allows us to see how councils spend money with each other across a range of transaction areas. But how do we actually create such aggregations when the data is sourced from different areas? In order to do this, we need to know when different datasets are actually talking about the same thing, which is where common identifiers come in. For it is surely the case that when we have common identifiers, we can have linkage, and as a result start to realise some of the benefits of Linked Data (as well as developing a wider appreciation of what those benefits might actually be...) 
     (As an aside, I'll describe how we might go about deriving such identifiers when they are missing from a data set that might otherwise, or more conveniently, be expected to publish them.)
     Throughout the presentation, I will draw on practical examples of how aggregated "liberated" data has been used as the basis of wider interest, and even status quo disrupting, services, as well as reflecting on what other sources of data we might see the data liberators turning their attention to next...
     Key learning points:
     1 - What is "data scraping", how can I do it and is my website at risk of it?
     2 - Why the secret to understanding "Linked Data" is the very idea of it, not just (or not even) the technology.
     3 - How has data scraping been used to "open up" data in actual practice?
  2. The focus of this presentation is not the release of “information”, but the release of data in raw form so that it can be interpreted and presented in informative ways by other parties.
  3. The London Datastore is an early example of a council-centric open data website. Early signs suggest it is natural to locate data websites at addresses of the form data.COUNCILNAME.gov.uk or www.COUNCILNAME.gov.uk/data
  4. Google Spreadsheets provides another example of how CSV can be used to help data flow. The =importData formula allows a user to specify a source data URL and pull the CSV data found at that location into the spreadsheet. Unlike Many Eyes Wikified, if the source data at the URL is updated, the updated data will (eventually) be pulled into the spreadsheet automatically.
  5. One of the really good reasons for getting data into a data processing environment such as a spreadsheet is that you can start to work with it. In the case of Google Spreadsheets, the spreadsheet environment can also be used as a database environment. That is, we can treat one or more data-containing sheets in a spreadsheet as a database, and generate new views over the data, as well as running queries over that data.
  6. Another way of using a Google Spreadsheet as a database is via the Google Spreadsheets API. The Google Visualisation API (?) provides a way of passing queries written in the Google Visualisation API query language from an arbitrary web page or web application, and receiving the resulting data in a standard JSON-based format, which also happens to play nicely with the Google Visualisation API. The Guardian Datastore explorer is a crude demonstration from 2009(??) showing how data from the Guardian datastore, which is stored across a range of Google spreadsheets, can be explored, queried and visualised via these APIs. Users can select a dataset from a drop-down menu, fed from a delicious account to which various datastore spreadsheets have been bookmarked using a particular set of tags, or paste in the URL of an arbitrary (public) Google spreadsheet. The first row/headings of the data can then be previewed (a simple spreadsheet is assumed, in which column headings appear in the first row of the spreadsheet).
  7. A series of list boxes are then populated with the column labels and their names, and provide a certain amount of help for the creation of a query over the spreadsheet data. A range of output formats can also be selected, from simple HTML data tables to a range of charts. URLs are also generated for HTML and CSV representations of the data returned from the query.
  8. One of the nice things about the data table widget (a standard Google Visualisation API component in this case, though similar examples exist for YUI, the Yahoo User Interface Libraries, or frameworks such as jQuery) is that it supports things like row sorting by column (for free, with no programming required!), allowing even further manipulation of the data, albeit at a simplistic level. (It’s probably worth pointing out here that it may be worth providing a preview of the column headings and first few rows (or a sample of random rows) of data when datasets are published, just so that users can see what sort of data is on offer without having to download the whole data set?)
  9. If you’re in the business of selling information as data, you are under threat where that information is published in an openly licensed way.
  10. Linked Data (the TM is something of a joke; the term refers to the particular style of publishing data according to a set of principles first outlined by the inventor of the World Wide Web, Sir Tim Berners-Lee) is one of the data formats that the Government’s data task force favour for the publication of data.
  11. There is a problem, though: at the moment, there are barriers to entry to the Linked Data world from both the query side (not many people speak SPARQL, or know how to construct a SPARQL query to an endpoint) and the results side (data is returned as RDF).
  12. So – do you speak SPARQL?
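For readers who don't, the two barriers mentioned in notes 11 and 12 can be made a little more concrete with a minimal sketch. It assumes a hypothetical endpoint URL (`https://example.org/sparql`) and an illustrative query and helper names, none of which come from the presentation. It shows the query side (a SPARQL SELECT sent as a `query` parameter, as the SPARQL Protocol describes) and the results side (flattening the standard SPARQL JSON results layout into plain rows). The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only; substitute a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Illustrative query: find up to ten resources carrying a given label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing WHERE {
  ?thing rdfs:label "The Open University" .
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and parsing the response body as JSON before handing it to `bindings_to_rows`. The rows that come back are ordinary dictionaries, which can then be joined with other datasets on a shared identifier without any further knowledge of RDF.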