Dataincubator

Slides from my talk at the Sept'09 Linked Data Meetup in London. The talk introduces the DataIncubator.org project, reviewing its aims and progress to date.

DataIncubator.org: What Is It? And What's In It? Leigh Dodds, London Linked Data Meetup, 9th September 2009. http://creativecommons.org/licenses/by/2.0/uk/

What's hot

Linked Data (Danny Ayers)

Data on the web - an inconvenient truth (marcobrattinga)

Awash With Data (Leigh Dodds)

Linked Data Overview - AGI Technical SIG (Chris Ewing)

Locah Project Show and Tell (Adrian Stevenson)

LOCAH Project and Considerations of Linked Data Approaches (Adrian Stevenson)

Open data highs-lows-big-unknowns - Data Days (SarahBuelens)

Open Data - The Fingal Perspective (Fingal Open Data)

Aggregation Using Linked Data – LOCAH Project Experiences (Adrian Stevenson)

Open Government & Fingal Open Data (Fingal Open Data)

The Semantic Web – A Vision Come True, or Giving Up the Great Plan? (Martin Hepp)

Open Data Talk at a Wiki Advocate Training Workshop in Cape Town - 27 Februar... (Code for South Africa)

URI Disambiguation in the Context of Linked Data (butest)

What Is Linked Data, and What Does it Mean for Libraries? ALAO TEDSIG Spring ... (Emily Nimsakont)

Publishing and Using Linked Data (ostephens)

BIBFRAME as a Library Linked Data Standard (Thomas Meehan)

A Framework for Verifying the Fixity of Archived Web Resources (maturban)

Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data (Adrian Stevenson)

"Research Data: Management, Access, Control" Symposium at the University at B... (Charles Lyons)

How to start editing Wikidata (for Wikipedians and GLAM staff) (Sandra Fauconnier)

Viewers also liked

Fanhu.bz (Leigh Dodds)

The RDF Report Card: Beyond the Triple Count (Leigh Dodds)

Talis Platform: A Linked Data Engine (Leigh Dodds)

Cheap bots done quick lightning talk (Leigh Dodds)

Open data in bath (Leigh Dodds)

Bath: Hacked Learning Night: Introduction to CartoDB (Leigh Dodds)

Similar to Dataincubator

Open (linked) bibliographic data edmund chamberlain (university of cambridge) (RDTF-Discovery)

Open (linked) bibliographic data (Edmund Chamberlain)

Lisa green oss deck (Open Science Summit)

WoT 2013 Interop (Michael Blackstock)

RDFa From Theory to Practice (Adrian Stevenson)

Semantic Web, an introduction for bioscientists (Emanuele Della Valle)

George James :: Querying The Web (george.james)

Introduction to Linked Data (Thomas Meehan)

Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ... (Eric D. Boyd)

The Impact of Bibframe (Thomas Meehan)

DC-2008 Tutorial 3 - Dublin Core and other metadata schemas (Mikael Nilsson)

Visualize open data with Plone - eea.daviz PLOG 2013 (Antonio De Marinis)

20100614 ISWSA Keynote (Axel Polleres)

Semantic Web Services: A RESTful Approach (Otavio Ferreira)

Consuming open and linked data with open source tools (Joanne Cook)

Open Data and CKAN Data Catalogues (david-read)

DTL Partners Event - FAIR Data Tech overview - Day 1 (Luiz Olavo Bonino da Silva Santos)

Semantic Puzzle (Andreas Blumauer)

Developments in catalogues and data sharing (Edmund Chamberlain)

Gaining the Knowledge of the Open Data Protocol (OData) - Prairie Dev Con (Woodruff Solutions LLC)

1. DataIncubator.org: What Is It? And What's In It? Leigh Dodds, London Linked Data Meetup, 9th September 2009. http://creativecommons.org/licenses/by/2.0/uk/

2. Keep Up The Good Work

3. Show Don't Tell

4. Sustainability

5. Talis Connected Commons: free 50 million triple store; no service usage limits; PDDL or CC0 License. http://www.talis.com/cc

6. Community Norms

7. Linking and Attribution

8. Open Code & Open Data

9. Link Maintenance

10. So What's In It?

11. <http://nasa.dataincubator.org> a void:Dataset ; dc:title "NASA Space Flight data" ; dc:source <http://nssdc.gsfc.nasa.gov/nmc/> .

12. <http://ol.dataincubator.org> a void:Dataset ; dc:title "OpenLibrary Data" ; dc:source <http://openlibrary.org> .

13. <http://periodicals.dataincubator.org> a void:Dataset ; dc:title "Linked Periodicals" ; dc:source <http://www.nlm.nih.gov/> ; dc:source <http://highwire.stanford.edu> ; dc:source <http://crossref.org> .

14. <http://discogs.dataincubator.org> a void:Dataset ; dc:title "Discogs" ; dc:source <http://www.discogs.com> .

15. <http://airports.dataincubator.org> a void:Dataset ; dc:title "OurAirports" ; dc:source <http://ourairports.com> .

16. My Personal Wishlist* (*...or Stuff I've Not Gotten Around To Yet, But Maybe You'd Like To Help?**) (**...or maybe just do, so I can reclaim my life?)

17. Prelinger Archives

18. Lego: peeron.com, lugnet.com, bricklink.com

19.

20. Thanks!

Editor's Notes

Good afternoon, everyone. I wanted to take this opportunity to tell you about the DataIncubator project, give an overview of what the project is about, and describe some of the datasets that we've accumulated so far.
This community has had a huge amount of success in bootstrapping the Linked Data cloud. There's a definite sense of momentum, and the size of today's audience testifies to the growing interest in the technology. Personally, I think the key challenges ahead relate to demonstrating the usefulness of that data. Now that we have it, what can we do with it? BUT we shouldn't lose sight of the fact that there's still a huge amount of evangelism to be done, and a great deal of data that could still become part of the web of data. In short, we need to keep accumulating data in as many subject areas and disciplines as possible.
The DataIncubator aims to help achieve that by adopting the same “show don't tell” approach that has worked to date: actually converting some data, and showing how it can be used and interlinked. Practical evangelism, you might call it.
One goal of the project is to come up with a sustainable way to manage these dataset conversions, one that makes them increasingly easy to carry out and that demonstrates the benefits. But also, crucially, one that makes it possible for the original owners or curators of the data to build on the initial community efforts. The DataIncubator project was started by Ian Davis, and Ian, myself and a number of our other colleagues have been involved in getting the first datasets converted. The project isn't a formal Talis project, though. It just happens that in our off hours from building a platform for publishing Linked Data, we like to relax by, erm, publishing Linked Data. (I'm really trying not to think about that too closely.)
Where Talis is supporting the project is through the Connected Commons scheme. This is an initiative we launched about six months ago with the aim of supporting these kinds of bootstrapping efforts, as well as providing a sustainable approach to publishing open, public domain data, something we're quite passionate about. The deal is that if your data is published under one of the two available open data licences (PDDL or CC0), you can use the Platform for free. This gives you an online triple store, complete with a SPARQL endpoint and an integrated search engine, which you can use without any service usage limits. There's currently a soft limit of 50 million triples; the figure is arbitrary, but it provides plenty of space to play with. So this is one way the project is achieving sustainability: by building on this offer from Talis.
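As a rough illustration of what "a triple store with a SPARQL endpoint" means for a client, the sketch below builds a query URL following the SPARQL Protocol's query-via-GET pattern, where the query text travels percent-encoded in a `query` parameter. The endpoint URL is a hypothetical placeholder, not a documented Talis Platform address, and the query assumes `dc:` is the Dublin Core elements namespace.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint URL; substitute the real address of whichever store you use.
ENDPOINT = "http://example.org/stores/nasa/services/sparql"

# Assumes dc: is the Dublin Core elements 1.1 namespace.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?s ?title WHERE { ?s dc:title ?title } LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol query-via-GET URL (query in the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = sparql_get_url(ENDPOINT, QUERY)
    print(url)
    # To run the query, fetch this URL (e.g. with urllib.request) and send an
    # Accept header such as "application/sparql-results+json".
```

Only the URL construction is shown here, so the sketch runs without network access.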
There are also some community norms that we're hoping to build around the process of converting and publishing datasets. Hopefully some of these can carry through...
The first of these is to ensure that there is a sufficient amount of linking and attribution. Every dataset should reference its original sources, not just at a high level (e.g. in the VoID description) but at a deeper level, so that resources can be associated with, for example, the original web pages that describe them. Attribution is an important community norm that we should be adopting anyway, but it's especially important in this context, as we want to ensure that the original curators of the data don't think the community is trying to appropriate or steal their work. Quite the opposite: we want them to embrace it.
The other aim is to ensure that both the code and the data are open. The original data owners can build on the work of the community by making use of the effort put into modelling the data and interlinking it with other sources. This helps to lower the barrier to entry for an organization wanting to expose Linked Data, because a lot of the groundwork is already in place. By ensuring that the data conversion and hosting code is open, there's also potential for the original owner to use that too. This is unlikely, especially if the code is just scraping a website. But open code also means that if the original evangelist in the DataIncubator community gets bored or moves on, other people can build on their work much more easily. This spreads the maintenance cost and makes the converted dataset easier to manage. There are quite a few RDF and Linked Data conversions littered around the web that use the same data but apply different models, and it's hard to tell which is most used or which project is most active. We should be taking steps to avoid that.
The other aspect is to ensure that the data links are as stable as possible. The ideal outcome is that a data owner will see the benefits of embracing Linked Data, at which point the data in DataIncubator and the Platform becomes obsolete: we won't need a secondary source any longer. So the project aims to support this by committing to redirect URIs from the incubated data to its new primary home.
OK, so that's the basics of what we're attempting: a continuation of the Linking Open Data project, but with an eye on making it easier to build a community of people around managing the conversion of specific datasets. But what's actually been done so far?
One of my projects is a conversion of the NASA Space Science Dataset. This currently provides access to over 6,000 space flight launches and descriptions of a slightly higher number of satellites. My overall goal is to try and knit together the wide range of open data published by NASA into a more coherent whole; currently it's scattered across a number of different sites and services. I've also supplemented the data with information on the Apollo missions, so it's possible to find out who played which role on which mission. Should be heaven for space geeks. Pun intended.
The OpenLibrary project provides a number of data exports, including Linked Data. But despite some recent improvements, the modelling of the data is not ideal. OpenLibrary also only exposes a subset of the underlying bibliographic data, some of which was originally donated to the project by Talis. This incubation effort aims to show some alternative ways that the OpenLibrary data could be modelled and published.
The Linked Periodicals dataset currently holds data about academic journals and publications. It merges data from the National Library of Medicine, Highwire, and CrossRef (a not-for-profit that works in the publishing industry). The project started because someone in the research community was looking for an integrated set of journal data. So we took the NLM data and converted that. Highwire then offered up their journal lists for us to add, and CrossRef have done the same for their publisher lists. As a result, CrossRef are also relicensing the original data to make it more open. It's a nice small example of the process in action.
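Resource-level attribution can be as simple as emitting a `dc:source` triple for each converted resource, pointing at the page it was derived from. The sketch below writes N-Triples for one resource; the URIs in the usage example are hypothetical, and the choice of the Dublin Core elements 1.1 namespace is an assumption rather than the project's documented modelling.

```python
# Assumes the Dublin Core elements 1.1 namespace for title and source.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def describe(resource_uri: str, title: str, source_page: str) -> list:
    """Return N-Triples lines for one resource, including a link to its origin."""
    return [
        f'<{resource_uri}> <{DC_TITLE}> "{escape_literal(title)}" .',
        f"<{resource_uri}> <{DC_SOURCE}> <{source_page}> .",
    ]


if __name__ == "__main__":
    # Hypothetical URIs: an incubated resource and the original page describing it.
    for line in describe("http://example.org/thing/1", "Example resource",
                         "http://example.org/original/page/1"):
        print(line)
```

Keeping the source link on every resource, not only in the dataset's VoID description, means attribution survives when individual resources are copied or cached elsewhere.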
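One way to honour a commitment to redirect incubated URIs is a prefix map consulted before serving any request. The sketch below is a hypothetical illustration (both prefixes are placeholders): a URI under a migrated prefix gets an HTTP 301 (Moved Permanently) pointing at its new home, and anything else is served locally.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from an incubated URI prefix to the owner's new home.
REDIRECTS = {
    "http://incubated.example.org/": "http://data.example.com/",
}


def resolve(uri: str, redirects: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a requested URI.

    Longest matching prefix wins, so a more specific migration can override
    a broader one. Unmatched URIs return (200, None): serve them locally.
    """
    for old_prefix in sorted(redirects, key=len, reverse=True):
        if uri.startswith(old_prefix):
            return 301, redirects[old_prefix] + uri[len(old_prefix):]
    return 200, None


if __name__ == "__main__":
    print(resolve("http://incubated.example.org/airports/EGLL", REDIRECTS))
```

A permanent (301) rather than temporary redirect tells clients and crawlers that the new location should replace the old one in their links.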
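Answering "who played which role on which mission" is a join across triples. The sketch below uses a hypothetical modelling (not necessarily the one used in the NASA conversion) in which each crew assignment is a node linking a person and a role name, and it performs the same join a SPARQL basic graph pattern would. Prefixed names such as `ex:person` are illustrative stand-ins for full URIs; the Apollo 11 crew roles are real.

```python
# Hypothetical modelling: mission -> crewAssignment node -> (person, roleName).
TRIPLES = [
    ("ex:apollo11", "ex:crewAssignment", "ex:a11-1"),
    ("ex:a11-1", "ex:person", "ex:NeilArmstrong"),
    ("ex:a11-1", "ex:roleName", "Commander"),
    ("ex:apollo11", "ex:crewAssignment", "ex:a11-2"),
    ("ex:a11-2", "ex:person", "ex:MichaelCollins"),
    ("ex:a11-2", "ex:roleName", "Command Module Pilot"),
    ("ex:apollo11", "ex:crewAssignment", "ex:a11-3"),
    ("ex:a11-3", "ex:person", "ex:BuzzAldrin"),
    ("ex:a11-3", "ex:roleName", "Lunar Module Pilot"),
]


def objects(triples, subject, predicate):
    """All objects matching the triple pattern (subject, predicate, ?o)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def crew_roles(triples, mission):
    """Join assignment nodes into (person, role) pairs for one mission."""
    pairs = []
    for node in objects(triples, mission, "ex:crewAssignment"):
        for person in objects(triples, node, "ex:person"):
            for role in objects(triples, node, "ex:roleName"):
                pairs.append((person, role))
    return pairs


if __name__ == "__main__":
    for person, role in crew_roles(TRIPLES, "ex:apollo11"):
        print(person, role)
```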
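Merging journal lists from several providers while keeping attribution might look like the sketch below. It assumes, hypothetically, that records are matched on ISSN and that earlier sources take precedence on conflicting fields. The three source URIs are the ones listed for the Linked Periodicals dataset; the journal records in the usage example are made up for illustration.

```python
def merge_by_issn(sources: dict) -> dict:
    """Merge journal records from several sources, keyed on ISSN (an assumption).

    `sources` maps a source URI to a list of record dicts, each with an "issn".
    Sources are processed in insertion order; the first value seen for a field
    wins. Every contributing source URI is kept under "sources" for attribution.
    """
    merged = {}
    for source_uri, records in sources.items():
        for record in records:
            entry = merged.setdefault(record["issn"], {"sources": []})
            for key, value in record.items():
                entry.setdefault(key, value)  # keep the earlier source's value
            if source_uri not in entry["sources"]:
                entry["sources"].append(source_uri)
    return merged


if __name__ == "__main__":
    # Hypothetical records; only the source URIs come from the dataset description.
    data = {
        "http://www.nlm.nih.gov/": [{"issn": "1234-5678", "title": "Journal of Examples"}],
        "http://highwire.stanford.edu": [{"issn": "1234-5678", "url": "http://example.org/jex"}],
        "http://crossref.org": [{"issn": "1234-5678", "publisher": "Example Press"}],
    }
    print(merge_by_issn(data)["1234-5678"])
```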
Discogs is a conversion of the discogs.com community-managed music database. It's similar in scope to MusicBrainz but comes from a community oriented towards trading and selling records. The core data is in the public domain, and so far I've converted and loaded data on over a million artists and 600,000 different music labels. There is a huge amount of additional data about music releases, tracks, and the roles that artists have played in writing, recording, etc. I've put up around 250,000 releases' worth of data so far to get some feedback on the modelling.
Finally, there is the Airports dataset, which is a conversion of the data published by ourairports.com. This is another community-managed dataset, covering thousands of airports with a rich set of information, including runways, links to Yahoo Weather reports, etc.
So that's where we're at currently. I've got a personal wish-list of additional datasets I'd like to convert, and this seems like the ideal audience to ask for help.
The first is the Prelinger Archives. This is part of the Internet Archive (which is a whole other dataset of its own). The archives consist of over 2,000 industrial, educational, travel, and propaganda videos published from 1903 to the 1970s. The content is completely in the public domain, so it's just begging to be converted. It would be a great dataset on which to explore modelling of media, media annotations and the like.
The other is Lego. It's a continual disappointment to me that there's no Lego data on the Linked Data web. Surely the confluence of geekery involved demands that it happens? To my joy and shame, I've already scoped this one out, and there's a huge amount of open data, ranging from the Pantone colours of individual Lego bricks through to complete parts lists for every Lego set, all of which has been crowd-sourced. Wouldn't it be cool to be able to navigate all that? And then maybe someone can apply some reasoning to tell me which Lego sets I can build with the parts I already have?
I've even started making inroads into evangelising to the core Lego community. Here's the result of me trying to teach my son about the core RDF model. We used Star Wars because he's a domain expert in that. The irony is that within a few minutes he was criticising my modelling.
So that's enough from me. If you're interested in learning more about DataIncubator, the Talis Connected Commons or the Platform, I'll be around for the rest of the day.
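The "which sets can I build with parts I already have?" question reduces to multiset containment: a set is buildable if, for every (part, colour) in its parts list, you own at least the required quantity. The sketch below uses hypothetical part identifiers and set IDs; real data from the crowd-sourced sites would use their own identifiers.

```python
from collections import Counter


def buildable_sets(inventory: Counter, set_parts: dict) -> list:
    """Return the IDs of sets whose full parts list is covered by the inventory.

    Both inventory and parts lists are Counters keyed by (part, colour);
    a missing key in a Counter counts as zero owned.
    """
    return sorted(
        set_id
        for set_id, parts in set_parts.items()
        if all(inventory[part] >= qty for part, qty in parts.items())
    )


if __name__ == "__main__":
    # Hypothetical parts and sets, for illustration only.
    my_parts = Counter({("brick 2x4", "red"): 4, ("plate 1x2", "blue"): 2})
    sets = {
        "set-A": Counter({("brick 2x4", "red"): 2}),
        "set-B": Counter({("brick 2x4", "red"): 1, ("plate 1x2", "blue"): 3}),
    }
    print(buildable_sets(my_parts, sets))
```

Because the check is per part, it treats each set independently; working out which combination of sets can be built at the same time from one pile of bricks would be a harder packing problem.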
