LOD2 Webinar Series: Zemanta / Open refine

Creating Knowledge out of Interlinked Data

LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu


LOD2 is a large-scale integrating project co-funded by the European
Commission within the FP7 Information and Communication Technologies
Work Programme. This 4-year project comprises leading Linked Open
Data technology researchers, companies, and service providers. The
partners come from 12 countries and are coordinated by the Agile
Knowledge Engineering and Semantic Web Research Group at the
University of Leipzig, Germany.

LOD2 will integrate and syndicate Linked Data with existing large-scale
applications. The project demonstrates the benefits in three scenarios:
media and publishing, corporate data intranets, and eGovernment.



Once per month the LOD2 webinar series offers a free webinar about
tools and services along the Linked Open Data Life Cycle.

Stay with us and learn more about acquisition, editing, composing,
connected applications – and finally publishing Linked Open Data.



LODRefine – LOD-enabled OpenRefine
The tool for cleansing, linking and augmenting data
by Mateja Verlic, Zemanta



Company

Zemanta brings useful content to bloggers,
connects authors to their peers and publishers
to marketers.

•  Content research services
•  Content enrichment tools

Our role in LOD2
•  Web scale link & text mining from unstructured data
•  Tools for cleansing data and crowdsourcing of cleansing

Dr. Mateja Verlič



Presentation outline

•  Terminology briefing
•  Introduction to LODRefine
•  The core: OpenRefine
•  LOD-friendly extensions
•  Demonstration
•  Q&A



Reconciling

Def: to reconcile
•  To reestablish a close relationship between.
•  To make compatible or consistent.
(The Free Dictionary)



Augmenting / extending

Def: to augment
•  To make (something already developed or well under way) greater, as in size,
extent, or quantity
(The Free Dictionary)



Crowdsourcing

Def: crowdsourcing
•  The act of outsourcing tasks, traditionally performed by an employee or
contractor, to an undefined, large group of people or community (a crowd),
through an open call.



Introduction to LODRefine

LOD-enabled OpenRefine

Google Refine ==> OpenRefine
LODGrefine ==> LODRefine

•  Supporting DBpedia (and Freebase)
•  Supporting crowdsourcing
•  Exporting RDF
•  Extracting named entities



LODRefine’s place in LOD life cycle



OpenRefine

Cross-platform server-client application
•  Runs locally
•  No dataset

Supports:
•  Faceted browsing
•  Regular expressions
•  GREL expressions
•  Extensions

value.split(",")[0].strip()
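The expression above keeps the text before the first comma and trims the surrounding whitespace. As a rough illustration only, the same transformation can be written in plain Python and applied to a whole column, followed by a count of distinct values similar to what a text facet shows. The sample cell values and the helper name `first_field` are assumptions made up for this sketch, not material from the webinar.

```python
from collections import Counter


def first_field(value: str) -> str:
    """Mimic value.split(",")[0].strip(): text before the first comma, trimmed."""
    return value.split(",")[0].strip()


# Hypothetical cell values, invented for this sketch.
cells = [" Ljubljana, Slovenia", "Leipzig , Germany", "Ljubljana,SI"]

# Apply the transformation to every cell in the column.
cleaned = [first_field(cell) for cell in cells]

# A text facet is essentially a count of the distinct values in a column.
facet = Counter(cleaned)

print(cleaned)
print(facet.most_common())
```

After the transformation the two spellings of the same city collapse into one facet value, which is the usual reason for cleaning a column before reconciling it.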



OpenRefine



The Extensions

Extend functionalities of OpenRefine

Developed by
•  Zemanta: DBpedia extension, Crowdsourcing
•  DERI: RDF Refine
•  Free Your Metadata Group: Named Entity Extraction extension



RDF Refine extension

Reconciliation and interlinking
•  DBpedia
•  Any SPARQL Endpoint or RDF dump
•  Support for Apache Stanbol
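Reconciliation matches a cell value against candidate entities and ranks them. The sketch below illustrates only the general idea, using string similarity from Python's standard `difflib` module. It is not how RDF Refine scores candidates; the function name `rank_candidates` and the small candidate table are assumptions for illustration.

```python
from difflib import SequenceMatcher


def rank_candidates(cell, candidates):
    """Return (score, uri, label) tuples sorted from best to worst match."""
    needle = cell.strip().lower()
    scored = []
    for uri, label in candidates.items():
        # ratio() is 1.0 for identical strings and lower for weaker matches.
        score = SequenceMatcher(None, needle, label.lower()).ratio()
        scored.append((score, uri, label))
    return sorted(scored, reverse=True)


# Illustrative candidates; a real service would return these from a lookup.
candidates = {
    "http://dbpedia.org/resource/Ljubljana": "Ljubljana",
    "http://dbpedia.org/resource/Lublin": "Lublin",
    "http://dbpedia.org/resource/Leipzig": "Leipzig",
}

ranked = rank_candidates(" ljubljana ", candidates)
best = ranked[0]
print(best)
```

In practice the top candidate is accepted automatically only when its score is clearly ahead of the rest; ambiguous cells are left for manual review.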

Exporting RDF
•  Defining graph shape before exporting
•  Using custom vocabularies or importing existing ones
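Defining the graph shape amounts to deciding which column becomes the subject and which predicate each remaining column maps to. The following is a minimal sketch of that idea that writes N-Triples by hand. It is not the RDF Refine exporter; the base URI, the sample row, and the function names are assumptions, while the two predicates come from the Dublin Core elements vocabulary.

```python
def literal(text):
    """Escape a string as an N-Triples plain literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rows_to_ntriples(rows, base, mapping):
    """One subject per row, one predicate per mapped column."""
    lines = []
    for row in rows:
        subject = f"<{base}{row['id']}>"
        for column, predicate in mapping.items():
            if row.get(column):  # skip empty cells
                lines.append(f"{subject} <{predicate}> {literal(row[column])} .")
    return "\n".join(lines)


# Hypothetical table row and column-to-predicate mapping.
rows = [{"id": "1", "title": "Example Book", "author": "A. Writer"}]
mapping = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
}

output = rows_to_ntriples(rows, "http://example.org/book/", mapping)
print(output)
```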

Webpage: http://refine.deri.ie/
Github: https://github.com/fadmaa/grefine-rdf-extension



RDF Refine extension - reconciling



DBpedia extension

Extending reconciled data with columns from DBpedia
•  RDF extension recommended
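Extending reconciled data means fetching the value of a chosen property for each reconciled resource. One way to picture this is as a small SPARQL query per cell. The sketch below only builds the query text and does not contact any endpoint; the function name is hypothetical, and the extension's actual queries are not shown in the slides.

```python
def build_extension_query(resource_uri, property_uri, limit=10):
    """Build a SPARQL SELECT that fetches values of one property for one resource."""
    for uri in (resource_uri, property_uri):
        # Reject characters that would break out of the <...> IRI brackets.
        if any(ch in uri for ch in '<>" \n'):
            raise ValueError(f"unsafe IRI: {uri!r}")
    return (
        "SELECT ?value WHERE { "
        f"<{resource_uri}> <{property_uri}> ?value "
        f"}} LIMIT {int(limit)}"
    )


query = build_extension_query(
    "http://dbpedia.org/resource/Ljubljana",
    "http://dbpedia.org/ontology/country",
)
print(query)
```

Each value returned for such a query would become a cell in the new column next to the reconciled one.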

Extracting Named Entities using Zemanta API
•  API key required

Webpage: http://code.zemanta.com/sparkica
Github: https://github.com/sparkica/dbpedia-extension



DBpedia extension – extending data



DBpedia extension – extracting entities



NER extension

Extracts named entities from unstructured text

Currently supports
•  Alchemy API
•  DBpedia Lookup
•  Zemanta API

API keys required

Webpage: http://freeyourmetadata.org/named-entity-extraction/
Github: https://github.com/RubenVerborgh/Refine-NER-Extension



NER extension – extracting entities



Crowdsourcing extension

Support for
•  Creating new crowdsourcing jobs
•  Publishing data on CrowdFlower service
•  Multiple labor channels (Amazon MT)
•  CrowdFlower API key required

Job templates
•  Evaluating reconciliation results
•  Finding information (e.g. URLs)
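A job that evaluates reconciliation results needs, for each unit of work, the original cell value and the candidate link a worker should judge. As a hedged sketch, such rows could be prepared as CSV before upload. The column names `cell_value` and `candidate_url` and the function name are assumptions; they are not CrowdFlower's required format or the extension's template.

```python
import csv
import io


def build_job_csv(pairs):
    """Serialize (cell value, candidate URL) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cell_value", "candidate_url"])
    for value, url in pairs:
        writer.writerow([value, url])
    return buffer.getvalue()


# Hypothetical reconciliation result to be judged by workers.
pairs = [("Ljubljana", "http://dbpedia.org/resource/Ljubljana")]

job_csv = build_job_csv(pairs)
print(job_csv)
```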

Webpage: http://code.zemanta.com/sparkica/
Github: https://github.com/sparkica/crowdsourcing



Crowdsourcing extension – create job from template



Crowdsourcing extension – upload data



Availability of LODRefine & extensions



Demonstration

Top 50 summer books by Forbes
•  Creating project
•  Preparing data
•  Reconciling, extending data with DBpedia

Reconciliation evaluation for NHL players (links extracted from blogs)
•  Create crowdsourcing job from template
•  Upload data to CrowdFlower



Contact

Zemanta
Celovska 32, SI-1000 Ljubljana, Slovenia

Presenter
Mateja Verlic
Email: mateja.verlic@zemanta.com
Twitter: @sparkica
Skype: mverlic

LODRefine and extensions – resources

LODRefine
Webpage: http://code.zemanta.com/sparkica
Github: https://github.com/sparkica/OpenRefine/tree/lodrefine

Extensions
DBpedia extension: https://github.com/sparkica/dbpedia-extension
Crowdsourcing extension: https://github.com/sparkica/crowdsourcing
Refine-stats extension: https://github.com/sparkica/refine-stats
Utilities extension: https://github.com/sparkica/utilities

Other extensions – resources

RDF extension
Webpage: http://refine.deri.ie/
Github: https://github.com/fadmaa/grefine-rdf-extension

NER extension
Webpage: http://freeyourmetadata.org/named-entity-extraction/
Github: https://github.com/RubenVerborgh/Refine-NER-Extension

LOD2 project & Webinars
LOD2 project: http://lod2.eu
Webinar series: http://lod2.eu/BlogPost/webinar-series

OpenRefine Resources
Google Group: https://groups.google.com/forum/#!forum/openrefine
Github: https://github.com/OpenRefine/OpenRefine/
Wiki: https://github.com/OpenRefine/OpenRefine/wiki

Thanks for your attention!


Credits

Jingle: R.E.M., Martin Kaltenböck, Florian Kondert
Coordination: Thomas Thurner, Martin Kaltenböck
Moderation: Martin Kaltenböck
Presented by: Mateja Verlič



Hope you enjoyed staying with us – if you need more detailed
information, visit us at www.lod2.eu and let us know how we can
improve to meet your expectations!

Don’t forget to register for our next webinar

26.02.2013 – DBpedia Spotlight (University of Mannheim)
27.03.2013 – CKAN and publicdata.eu (Open Knowledge Foundation)

Have a great day and don’t forget ...
