Failing fast with Explore&Query

If you are going to fail because your assumptions were wrong, do it before you waste your time and resources.

This also applies to the promise of Linked Open Data as a data source for analytics. The mere fact that data is published does not mean it is useful for a given type of task.

This case study shows how to quickly validate a semantic data source from a quantitative and completeness perspective.
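
As a rough illustration of what a quantitative completeness check involves, the sketch below counts how many records carry each property and how many carry all of them. The function name, the sample records and the property names are made up for illustration; they are not DBpedia data and not part of Explore&Query.

```python
from collections import Counter


def property_coverage(records, properties):
    """Return (per-property counts, number of records that have every property)."""
    counts = Counter()
    complete = 0
    for record in records:
        # A property counts as present only if it has a non-None value.
        present = [p for p in properties if record.get(p) is not None]
        counts.update(present)
        if len(present) == len(properties):
            complete += 1
    return counts, complete


# Hypothetical sample records, not real DBpedia data.
films = [
    {"title": "A", "director": "X", "budget": 1_000_000},
    {"title": "B", "director": "Y"},
    {"title": "C"},
]

counts, complete = property_coverage(films, ["director", "budget"])
print(dict(counts), complete)
```

The gap between the per-property counts and the number of complete records is the "shrinking list" effect the slides walk through.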

First SlideShare on Explore&Query: http://www.slideshare.net/JrgenKerstna/can-spsrqlexplore-query-with-vinge-tutorial

2 Comments
  • That's why the role of the trusted curator is still important. And having the crowd as curator is more important still. Hopefully, Wikidata will help.
  • Surely many if not most public data collections will have deficiencies of this kind - the question is to determine what use can be made of resources that you probably won't be able to improve?

Failing fast with Explore&Query

  1. Failing Fast with Explore&Query. © 2013 Vinge Free AB, Sweden
  2. Failing fast is a good thing: If you are going to fail because your assumptions were wrong, do it before you waste your time and resources. This also applies to the promise of Linked Open Data as a data source for analytics. The mere fact that data is published does not mean it is useful for a given type of task. This case study shows how to quickly validate a semantic data source from a quantitative and completeness perspective. The tool used is the Free Edition of Explore&Query from Vinge Free.
  3. Film industry analysis: The case study goes as follows. DBpedia contains information about films: their directors, producers, release dates, starring actors, composers, budgets etc., seemingly everything needed to produce a comprehensive statistical analysis of the film industry. Let's start by looking at the data and doing a simple initial quantification: how many facts do we actually have at our disposal, and do we believe they provide a sufficiently accurate statistical base to avoid skewed results?
  4. Films: The main concept in the film industry is the film. Let's find out how many films there are in DBpedia: there are 78,000 films.
  5. Every film should have a director? But over 13,000 films are creations of no one?
  6. Every film has starring actors: But even though films usually have several stars, we keep seeing our list shrink.
  7. Budget of films v. sales reveals that the cost of a film and its turnover are only sparsely available.
  8. Length of the films v. cost: Is the money spent on quality or quantity? Leaking here as well.
  9. More fields of interest: Adding release date, studio, distributor, composer. Only 400 films left from 77,000.
  10. We can find exactly how much we lack: There are 1,769 films without the "length" property but with all the others.
  11. Not all fields are equally important: If we make everything optional, are we back in business?
  12. Have we failed or not? The answer is left to the viewer. But it was fast to find out.
  13. Download from http://www.vingefree.com/querybyexplore
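
The counting steps in the slides can be approximated with hand-written SPARQL. Explore&Query builds its queries visually and the slides do not show the generated SPARQL, so the sketch below is only an approximation. It is a small Python helper that assembles count queries with required, optional and missing properties. The helper name `count_films_query` and the choice of `dbo:` properties (`dbo:director`, `dbo:starring`, `dbo:runtime`) are assumptions; the slides do not say which DBpedia properties were actually used.

```python
# Assumed namespace: the DBpedia ontology. The slides do not name the vocabulary.
PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"


def count_films_query(required=(), optional=(), missing=()):
    """Build a SPARQL query that counts films under the given property constraints."""
    lines = ["?film a dbo:Film ."]
    for i, prop in enumerate(required):
        # A plain triple pattern: films lacking the property drop out.
        lines.append(f"?film {prop} ?r{i} .")
    for i, prop in enumerate(optional):
        # OPTIONAL keeps the film even when the property is absent.
        lines.append(f"OPTIONAL {{ ?film {prop} ?o{i} . }}")
    for i, prop in enumerate(missing):
        # FILTER NOT EXISTS keeps only films that lack the property.
        lines.append(f"FILTER NOT EXISTS {{ ?film {prop} ?m{i} . }}")
    body = "\n  ".join(lines)
    return (
        PREFIXES
        + "SELECT (COUNT(DISTINCT ?film) AS ?films)\nWHERE {\n  "
        + body
        + "\n}"
    )


# Films that have both a director and starring actors (slides 5 and 6).
print(count_films_query(required=["dbo:director", "dbo:starring"]))

# Films with a director but without a length (in the spirit of slide 10).
print(count_films_query(required=["dbo:director"], missing=["dbo:runtime"]))
```

Each required property can only shrink the count, which is the leak the slides show. Making a property optional never removes a film from the count, so with every field optional the number falls back to the total number of films.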
