Facilitating the discovery of public datasets

The document discusses facilitating the discovery of public datasets. It describes Schema.org, a collaborative project to add metadata to content using microdata, RDFa or JSON-LD formats. It also discusses challenges in identifying and relating datasets, as well as properties for describing datasets, such as name, description, URL, version, and spatial/temporal coverage. An example is given of markup for a seismic hazard zones dataset using these properties.

Journal Club, April 06, 2017
Nafiseh Navabpour
nafiseh.navabpour@uni-jena.de

Nafiseh.Navabpour@uni-jena.de Facilitating the discovery of public datasets 2 / 13
• Which dataset is relevant?
• Where does the data come from?
• Is the data reliable?

Schema.org
A collaborative project between Google, Yandex, Bing and Yahoo.
Content providers use the Schema.org vocabulary with the Microdata, RDFa or JSON-LD formats to add information inside the content.

Technical / Social / Research Challenges
• Defining more consistently what constitutes a dataset
• Identifying datasets
• Relating datasets to each other
• Propagating metadata between related datasets
• Describing content of datasets
• Many datasets are described in an unstructured way

What qualifies as a dataset?
• A table or a CSV file with some data
• A file in a proprietary format that contains data
• A collection of files that together constitute some meaningful dataset
• A structured object with data in some other format that you might want to load into a special tool for processing
• Images capturing the data
Anything that looks like a dataset to you

Basic dataset properties
itemtype="http://schema.org/Dataset"
• Name
• Description
• URL(s)
• Version number
• Keywords
• Variable Measured
• Creator name (person, organization)
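The basic properties listed above can be checked mechanically before a page is published. The following is a minimal sketch, not part of the talk: it assumes that the slide's labels correspond to the schema.org property names `name`, `description`, `url`, `version`, `keywords`, `variableMeasured` and `creator`, and both the helper `missing_basic_properties` and the sample record are hypothetical.

```python
# Basic properties from the slide, written as schema.org property names.
# The mapping from slide labels to these names is an assumption of this sketch.
BASIC_PROPERTIES = (
    "name",
    "description",
    "url",
    "version",
    "keywords",
    "variableMeasured",
    "creator",
)


def missing_basic_properties(dataset: dict) -> list[str]:
    """Return the basic properties that are absent or empty in a description."""
    return [prop for prop in BASIC_PROPERTIES if not dataset.get(prop)]


# Hypothetical, partially filled description of a dataset.
example = {
    "@type": "Dataset",
    "name": "Seismic Hazard Zones",
    "description": "Liquefaction and landslide zones in the state of California.",
    "url": "http://www.example.org/story.php?title=seismic-hazard-zones",
    "version": "2011-Sep-13",
}

print(missing_basic_properties(example))
```

A check like this only reports which fields are empty; it says nothing about whether the values themselves are meaningful.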

More properties
• Data catalog properties
• Download information properties
• Temporal coverage
• Spatial coverage
  - Points
  - Coordinates
  - Named locations
• Citations and publications
• Provenance and license information
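As a rough illustration of the coverage and license properties above, the sketch below builds the values in Python. It assumes the schema.org property names `temporalCoverage` (written as an ISO 8601 interval), `spatialCoverage` (a `Place` with `GeoCoordinates`) and `license`; the helper functions and all values are hypothetical placeholders, not taken from the talk.

```python
import json
from datetime import date


def temporal_coverage(start: date, end: date) -> str:
    """Format a closed date range as an ISO 8601 interval (start/end)."""
    if end < start:
        raise ValueError("end must not precede start")
    return f"{start.isoformat()}/{end.isoformat()}"


def point_coverage(name: str, latitude: float, longitude: float) -> dict:
    """Describe a named location with a single coordinate pair."""
    return {
        "@type": "Place",
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
    }


# All values below are placeholders chosen for illustration
# (the coordinates are only approximate).
extra = {
    "temporalCoverage": temporal_coverage(date(2011, 1, 1), date(2011, 12, 31)),
    "spatialCoverage": point_coverage("Jena", 50.93, 11.59),
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

print(json.dumps(extra, indent=2))
```

The dictionary could be merged into a larger dataset description; it is kept separate here so that the coverage properties stand out.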

Example
<div itemscope="itemscope" itemtype="http://schema.org/Dataset">
  <meta itemprop="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
  <span itemprop="name">
    <a href="http://www.example.org/story.php?title=seismic-hazard-zones">
      <b>Seismic Hazard Zones</b>
    </a>
  </span>
  <span itemprop="temporal">2011</span>, version "<span itemprop="version">2011-Sep-13</span>"
  <div itemprop="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
  <div itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country" itemid="http://dbpedia.org/resource/United_States">
    <i>Country:</i>
    <a href="http://en.wikipedia.org/wiki/United_States">
      <span itemprop="name">United States</span>
    </a>
  </div>
  ...
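The same description can also be written in JSON-LD, one of the three formats mentioned earlier. The sketch below is an illustration rather than part of the talk: it copies the values from the Microdata example, assumes that `itemid` corresponds to `@id` in JSON-LD, and uses a hypothetical helper, `to_script_tag`, to wrap the result for embedding in a page.

```python
import json

# Values copied from the Microdata example; only the syntax changes.
dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Seismic Hazard Zones",
    "url": "http://www.example.org/story.php?title=seismic-hazard-zones",
    "description": (
        "This is a dataset of liquefaction and landslide zones "
        "in the state of California."
    ),
    "version": "2011-Sep-13",
    "temporal": "2011",
    "spatial": {
        "@type": "Country",
        "@id": "http://dbpedia.org/resource/United_States",  # assumed itemid mapping
        "name": "United States",
    },
}


def to_script_tag(data: dict) -> str:
    """Wrap a JSON-LD object in the script element used to embed it in HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(dataset))
```

Unlike Microdata, the JSON-LD block is separate from the visible text, so the human-readable page content has to be kept in sync with it by hand.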

Example
Bexis 2 User and Developer Conference

Thank you for your attention!
Nafiseh.navabpour@uni-jena.de

References
• https://research.googleblog.com/2017/01/facilitating-discovery-of-public.html
• http://schema.org/

Facilitating the discovery of public datasets

4. Mark up the content using microdata
<div itemscope>
  <h1>Avatar</h1>
  <span>Director: James Cameron (born August 16, 1954)</span>
  <span>Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>
Slide labels: itemscope element

5. Mark up the content using microdata
<div itemscope itemtype="http://schema.org/Movie">
  <h1>Avatar</h1>
  <span>Director: James Cameron (born August 16, 1954)</span>
  <span>Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html">Trailer</a>
</div>
Slide labels: itemscope element, itemtype attribute

6. Mark up the content using microdata
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span>Director: <span itemprop="director">James Cameron</span> (born ...
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
Slide labels: itemscope element, itemtype attribute, itemprop attribute

7. Mark up the content using microdata
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
Slide labels: itemscope element, itemtype attribute, itemprop attribute, embedded items
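To see what a consumer such as a search engine could read from the markup on slide 7, the following sketch extracts the items with Python's standard `html.parser` module. It is a simplified illustration under stated assumptions, not a full Microdata implementation: it expects well-formed HTML, ignores features such as `itemref`, and the class name `MicrodataParser` and the output layout are hypothetical.

```python
from html.parser import HTMLParser

# Elements without an end tag; they are never pushed on the open-element stack.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}
# Elements whose property value is taken from an attribute instead of the text.
VALUE_ATTRS = {"a": "href", "link": "href", "img": "src", "meta": "content"}


class MicrodataParser(HTMLParser):
    """Collect Microdata items from well-formed HTML (illustrative subset only)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the document
        self._open = []   # one frame per open element

    def _current_item(self):
        for frame in reversed(self._open):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._current_item()
        prop = attrs.get("itemprop")
        frame = {"item": None, "prop": None, "owner": owner, "text": []}
        if "itemscope" in attrs:
            # itemscope opens a new item; itemtype names its schema.org type.
            item = {"type": attrs.get("itemtype"), "properties": {}}
            frame["item"] = item
            if prop and owner is not None:
                owner["properties"].setdefault(prop, []).append(item)  # embedded item
            else:
                self.items.append(item)
        elif prop and owner is not None:
            if tag in VALUE_ATTRS:
                value = attrs.get(VALUE_ATTRS[tag])
                owner["properties"].setdefault(prop, []).append(value)
            else:
                frame["prop"] = prop  # value is the element's text, stored on close
        if tag not in VOID_TAGS:
            self._open.append(frame)

    def handle_data(self, data):
        for frame in self._open:
            if frame["prop"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        frame = self._open.pop()
        if frame["prop"] is not None:
            value = " ".join("".join(frame["text"]).split())
            frame["owner"]["properties"].setdefault(frame["prop"], []).append(value)


MARKUP = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
"""

parser = MicrodataParser()
parser.feed(MARKUP)
parser.close()
movie = parser.items[0]
print(movie["type"])
print(movie["properties"]["director"][0]["properties"]["name"])
```

Each property maps to a list because Microdata allows a property to occur more than once; the embedded Person item appears as a nested dictionary under `director`.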

Editor's Notes

This talk is about “facilitating the discovery of public datasets”. I prepared this presentation because Birgitta would like to discuss the topic in this group.
As you know, there is a huge number of data repositories in many different fields and with many different purposes. When we are looking for something, it can be extremely difficult to determine which dataset has the information we are looking for, where that information comes from, and whether it is reliable. One idea for getting the best results is to use the same vocabulary across different websites.
Schema.org is a vocabulary agreed on by Google, Yandex, Bing and Yahoo that helps search engines understand the meaning of a web page's content. In this scenario, markup describes the type of a thing. But how? Content providers use the Schema.org vocabulary with the Microdata, RDFa or JSON-LD formats to add information inside the content.
Here is an example of marking up content, in the Microdata format. First, the content provider adds the itemscope attribute to the HTML tag that encloses the information about a particular item. But that is not enough: the content provider should also specify what kind of item it is.
For example, here the item is the movie Avatar. The itemtype attribute comes immediately after itemscope, and its value is a URL defined in the schema.org type hierarchy.
Then the properties of the item are defined by adding the itemprop attribute, for example to identify the director or the genre.
Sometimes the value of an item property can itself be another item with its own set of properties. For example, we can specify that the director of the movie is an item of type Person, and that the Person has properties such as name. OK, now we know how this kind of markup makes it easy to search for an item such as a movie or a book. But what about searching for data in a scientific dataset?
We can use the same method, but there are many technical, social and research challenges. For example, we first have to define what a dataset is. Working with related datasets is also not easy: we need to know how to mark up the metadata between related datasets. How do we describe the content of a dataset? And what about unstructured datasets?
It is important to know that a dataset is anything that looks like a dataset to you.
The markup method is the same as the one we have seen before. The item type is this URL. For each dataset we need to specify some basic properties, such as the name…
We could also describe a dataset more explicitly by specifying some more properties…
I think we could also use this kind of markup on our own websites, not only for simple pages such as visitor information or information about people, but also to describe many datasets, for example publications, talks and so on.
