Using the entity extraction extension with OpenRefine and Dandelion API
Food for thought
What we are talking about
OpenRefine (www.openrefine.org)
NER extension (http://freeyourmetadata.org/named-entity-extraction/) integrated with Dandelion API (dandelion.eu)
What industries are using OpenRefine?
https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
data journalists
metadata curators
museums
libraries
research labs
SEO folks
data scientists
enterprises
universities
patent attorneys
Open Data hackers
Social Media specialists
civil servants
What does OpenRefine offer that other data-parsing tools don't?
http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
reconciliation of text data against reference data services containing strong identifiers (Freebase, OpenCorporates, any SPARQL or RDF, etc.)
simple linking of reconciled entities to other info sources like Wikipedia, MusicBrainz, IMDB, etc.
[…]
[…]
How are we using it at SpazioDati?
OpenRefine is inside our data curation controller
normalize, clean and extract data from different sources
reconcile against internal reconciliation services (administrative regions, names and telephone numbers…)
apply rules and transformations to data, aligning it with our internal ontologies
A look at OpenRefine & reconciliation
Why is reconciliation useful?
Instruments
bla bla bla
bla bla bla bla
…
what kind of instruments?
reconciliation identifies keywords in flowing text and gives them a URL
from strings to things
instruments (data column)
musical instruments: URL
measuring instruments: URL
aeronautical instruments: URL
reconciliation works great for those fields in your dataset that contain single terms: names of people, countries, works of art, […]
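The "instruments" example can be sketched in a few lines of Python. This is only a toy illustration of the idea, not how OpenRefine's reconciliation services are implemented: the candidate table, the type names and the example.org URLs are all made up.

```python
# Toy reconciliation: map an ambiguous string to candidate "things" (URLs).
# The candidate table, type names and URLs are invented for illustration.
CANDIDATES = {
    "instruments": [
        {"type": "music", "url": "http://example.org/musical_instrument"},
        {"type": "measurement", "url": "http://example.org/measuring_instrument"},
        {"type": "aviation", "url": "http://example.org/aeronautical_instrument"},
    ],
}


def reconcile(term, type_hint=None):
    """Return candidate URLs for a term, optionally narrowed by a type hint."""
    candidates = CANDIDATES.get(term.strip().lower(), [])
    if type_hint is not None:
        candidates = [c for c in candidates if c["type"] == type_hint]
    return [c["url"] for c in candidates]


# Without a hint the string stays ambiguous; with one it becomes a single thing.
print(reconcile("Instruments"))
print(reconcile("Instruments", type_hint="music"))
```

A real reconciliation service would also rank its candidates with a score; the sketch only shows why a bare string needs context before it can be given one URL.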
and what if we have a column with unstructured texts, like this one?
we need a new step in the data curation workflow…
a new data column, labelled “dataTXT”
extract named entities using NER extension + Dandelion API
data column with some texts
in this column, there are named concepts, linked to Wikipedia
label + URI
“Collective action” + http://en.wikipedia.org/wiki/Collective_action
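Outside OpenRefine, the same "label + URI" pairs could be obtained roughly as below. This is a sketch under assumptions: the endpoint URL, the `text` and `token` parameters and the `annotations` / `label` / `uri` response fields are what we believe the Dandelion entity extraction service uses, and should be checked against the current Dandelion documentation. The sample response is hand-written, not real API output, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Dandelion entity extraction service; verify it
# against the current Dandelion API documentation before relying on it.
NEX_ENDPOINT = "https://api.dandelion.eu/datatxt/nex/v1"


def build_request_url(text, token):
    """Build the GET URL for one cell of text (token is a placeholder)."""
    return NEX_ENDPOINT + "?" + urlencode({"text": text, "token": token})


def labels_and_uris(response):
    """Turn a response dict into (label, URI) pairs, skipping incomplete items."""
    pairs = []
    for annotation in response.get("annotations", []):
        if "label" in annotation and "uri" in annotation:
            pairs.append((annotation["label"], annotation["uri"]))
    return pairs


# Hand-written sample shaped like the assumed response.
sample = {
    "annotations": [
        {
            "spot": "collective action",
            "label": "Collective action",
            "uri": "http://en.wikipedia.org/wiki/Collective_action",
        },
    ]
}

print(build_request_url("collective action", "YOUR_TOKEN"))
print(labels_and_uris(sample))
```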
make a text filter looking for a concept
classify and categorize the content
…
things, not strings
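Once each text has its concept URIs, filtering and categorizing become simple lookups on things rather than string searches. A minimal sketch, assuming hypothetical rows that already carry their extracted concepts (the texts and the row layout are invented for illustration):

```python
from collections import Counter

COLLECTIVE_ACTION = "http://en.wikipedia.org/wiki/Collective_action"

# Hypothetical rows: each text cell paired with the concept URIs extracted from it.
rows = [
    {"text": "Unions organise a strike",
     "concepts": [COLLECTIVE_ACTION, "http://example.org/concept/trade_union"]},
    {"text": "A new museum opens",
     "concepts": ["http://example.org/concept/museum"]},
    {"text": "Citizens sign a petition",
     "concepts": [COLLECTIVE_ACTION]},
]


def filter_by_concept(rows, uri):
    """Keep only the rows whose extracted concepts include the given URI."""
    return [row for row in rows if uri in row["concepts"]]


def concept_counts(rows):
    """Count how many rows mention each concept, as a basis for categories."""
    return Counter(uri for row in rows for uri in set(row["concepts"]))


for row in filter_by_concept(rows, COLLECTIVE_ACTION):
    print(row["text"])
```

Neither matching text contains the words "collective action", which is the point of filtering on the concept instead of the string.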
some scenarios
Open Data community real issues
Using OpenRefine + NER extension with Dandelion API
extract meaningful information from some CVs, like names, organizations, skills, …
http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction
normalize organization names cited in some texts
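Normalizing organization names follows from entity linking: different spellings that resolve to the same URI are the same organization. A minimal sketch, assuming hypothetical extraction results with made-up example.org URIs:

```python
# Hypothetical extraction results: the surface form found in the text
# and the URI it was linked to (URIs are invented for illustration).
mentions = [
    ("IBM", "http://example.org/org/IBM"),
    ("International Business Machines", "http://example.org/org/IBM"),
    ("I.B.M.", "http://example.org/org/IBM"),
    ("Fiat", "http://example.org/org/Fiat"),
]


def group_variants(mentions):
    """Group the different spellings of an organization under its URI."""
    groups = {}
    for surface_form, uri in mentions:
        groups.setdefault(uri, []).append(surface_form)
    return groups


for uri, variants in group_variants(mentions).items():
    print(uri, variants)
```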
Data journalists
Using OpenRefine + NER extension with Dandelion API
extract news relevant to a specific topic (a person, a brand or a company)
write a summary of a politician's speech, starting from the main concepts extracted from the text
mine specific information in judicial decisions (judge's name, court, area of law and neutral citation number)
Using OpenRefine + NER extension with Dandelion API
Text mining on tweets: easily extract brands, places and concepts from a Twitter stream related to an event
Text mining on website content: easily extract concepts and places from a webpage, to improve the website's SEO ranking
Social media specialists
Using OpenRefine + NER extension with Dandelion API
Understand your own bank account statements: extract useful information, like brands and places, to categorize and classify your own expenses
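The bank statement scenario could look like this once brands have been extracted from each description. Everything here is a hypothetical sketch: the transactions, the brand names and the brand-to-category table are invented, and the extraction step itself is assumed to have already happened.

```python
from collections import defaultdict

# Hypothetical statement lines, each with the brand extracted from its description.
transactions = [
    {"description": "POS payment ACME Supermarket", "brand": "ACME Supermarket", "amount": 42.5},
    {"description": "Card payment ACME Supermarket", "brand": "ACME Supermarket", "amount": 7.5},
    {"description": "Ticket RailCo online", "brand": "RailCo", "amount": 20.0},
    {"description": "Payment ref 8841", "brand": None, "amount": 3.0},
]

# Hand-made lookup table from brand to expense category.
CATEGORY_BY_BRAND = {
    "ACME Supermarket": "groceries",
    "RailCo": "transport",
}


def totals_by_category(transactions):
    """Sum the amounts per category; unknown or missing brands go to 'uncategorized'."""
    totals = defaultdict(float)
    for transaction in transactions:
        category = CATEGORY_BY_BRAND.get(transaction["brand"], "uncategorized")
        totals[category] += transaction["amount"]
    return dict(totals)


print(totals_by_category(transactions))
```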
“Quantified Self” movement
Analytics on Personal Data
@dandelionapi #refine #ner
Do you know other use cases? Tell us on Twitter!
@spaziodati
dandelion.eu

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 

Using entity extraction extension with OpenRefine and Dandelion API

  • 1. Using entity extraction extension with OpenRefine and Dandelion API: food for thought
  • 2. What we are talking about: OpenRefine (www.openrefine.org); NER extension (http://freeyourmetadata.org/named-entity-extraction/) integrated with Dandelion API (dandelion.eu)
  • 3. What industries are using OpenRefine? https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
  • 4. data journalists, metadata curators, museums, libraries, research labs, SEO folks, data scientists, enterprises, universities, patent attorneys, Open Data hackers, Social Media specialists, civil servants
  • 5. What does OpenRefine offer that other data-parsing tools don't? http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
  • 6. reconciliation of text data against reference data services containing strong identifiers (Freebase, OpenCorporates, any SPARQL or RDF, etc.); simple linking of reconciled entities to other info sources like Wikipedia, MusicBrainz, IMDB, etc. […] […]
  • 7. How are we using it at SpazioDati?
  • 8. OpenRefine is inside our data curation controller
  • 9. normalize, clean and extract data from different sources; reconcile against internal reconciliation services (administrative regions, names and telephone numbers…); apply rules and transformations to data, aligning it with our internal ontologies
  • 10. A look at OpenRefine & reconciliation
  • 11. Why is reconciliation useful? Instruments bla bla bla bla bla bla bla … what kind of instruments?
  • 12. reconciliation identifies keywords in flowing text and gives them a URL: from strings to things
  • 13. data column “instruments”: musical instruments (URL), measuring instruments (URL), aeronautical instruments (URL)
  • 14. reconciliation works great for those fields in your dataset that contain single terms: names of people, countries, works of art […]
  • 15. and what if we have a column with unstructured texts, like this one?
  • 16. we need a new step in the data curation workflow… start from a data column with some texts; extract named entities using NER extension + Dandelion API; get a new data column, labelled “dataTXT”
  • 17. in this column, there are named concepts, linked to Wikipedia (label + URI): “Collective action” + http://en.wikipedia.org/wiki/Collective_action
  • 18. make a text filter looking for a concept; classify and categorize the content … things, not strings
  • 20. Open Data community: real issues. Using OpenRefine + NER extension with Dandelion API: extract meaningful information from some CVs, like names, organizations, skills, … (http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction); normalize organization names cited in some texts
  • 21. Data journalists. Using OpenRefine + NER extension with Dandelion API: extract news relevant to a specific topic (a person, a brand or a company); write a summary of a politician's speech, starting from the main concepts extracted from the text; mine specific information in judicial decisions (judge's name, court, area of law and neutral citation number)
  • 22. Social media specialists. Using OpenRefine + NER extension with Dandelion API: text mining on tweets (extract brands, places and concepts easily from a Twitter stream related to an event); text mining on website content (extract concepts and places easily from a webpage, to improve website SEO ranking)
  • 23. Analytics on personal data (“Quantified Self” movement). Using OpenRefine + NER extension with Dandelion API: understand your own bank account statements by extracting useful information, like brands and places, to categorize and classify your own expenses
  • 24. Do you know other use cases? Tell us on Twitter! @dandelionapi #refine #ner @spaziodati dandelion.eu
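
As a rough illustration of the "things, not strings" step in slides 17 and 18, here is a minimal Python sketch of filtering and grouping rows once each text has its extracted concepts (label + Wikipedia URI). It does not call OpenRefine or the Dandelion API. The `Entity` class, the `ROWS` sample data and both function names are assumptions made up for this example; only the "Collective action" label and URI come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One extracted concept: a label plus the Wikipedia URI it links to."""
    label: str
    uri: str


# Hypothetical sample rows standing in for a text column and its
# extracted-entities column; the texts are invented for illustration.
ROWS = [
    {
        "text": "Unions rely on collective action to negotiate wages.",
        "entities": [
            Entity("Collective action",
                   "http://en.wikipedia.org/wiki/Collective_action"),
        ],
    },
    {
        "text": "The orchestra bought new musical instruments.",
        "entities": [
            Entity("Musical instrument",
                   "http://en.wikipedia.org/wiki/Musical_instrument"),
        ],
    },
]


def rows_about(rows, concept_uri):
    """Keep only the rows whose entities include the given concept URI."""
    return [
        row for row in rows
        if any(entity.uri == concept_uri for entity in row["entities"])
    ]


def group_by_concept(rows):
    """Map each concept label to the list of texts that mention it."""
    groups = {}
    for row in rows:
        for entity in row["entities"]:
            groups.setdefault(entity.label, []).append(row["text"])
    return groups


if __name__ == "__main__":
    uri = "http://en.wikipedia.org/wiki/Collective_action"
    for row in rows_about(ROWS, uri):
        print(row["text"])
    print(sorted(group_by_concept(ROWS)))
```

Matching on the URI rather than on the label text is the point of the exercise: two texts that use different wording for the same concept still end up in the same group.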