I want to be a Data DJ!

This talk provides an overview of my work towards enabling Data DJs, that is, enabling users to create, remix, record, and share their data analyses as easily as DJs make and share mixes. The talk touches on a variety of topics including linked data, scientific workflows, provenance, enterprise mashups and Facebook. It draws these topics into a unified research framework and discusses future research directions.

Usage Rights

© All Rights Reserved

  • Because I want an audience… not really…
  • Records. Simple components (effects, fades) chained together: a workflow. Whole albums of DJ. Creativity (throw on a new record – backtrack) – fast to novelty. You can continually improve because it’s easy to revisit and remix. The ability to remix enables combinatorial innovation.
  • Intuitively…
  • 1800: Interchangeable parts; 1900: Gasoline engine; 1960: Integrated circuits; 1995-now: Internet
  • “Web services” is lower case because this is not about SOA… Flickr, Google Maps, Twitter, …
  • Not easy enough for users… or developers
  • Records = data and data discovery; Turntables = components and composition; Recording = capturing what’s gone on
  • Data
  • Common APIs = SPARQL and RDF. Things like Factual and YQL. Machine-readable data on the web.
  • Common APIs = SPARQL and API
  • I see that there is a technique called “drive across country” and I go ahead and import it.
  • Also, if we extract information, it is exposed as its own RDF triple (see the references field).
  • RDF Query Answering using Evolutionary Algorithms
  • Fault-tolerance, data movement, provenance tracking, validation, component discovery, reproduction
  • A proliferation of boxes-and-arrows diagrams
  • Natural instruction…
  • How do people “naturally describe workflows”? Study with myExperiment workflows
  • - Workflow for estimating the maximum accuracy of a model for a set of test data
  • Linked data + mashup (workflow) = a new cool application, but then what? Need for provenance
  • The iPod has 451 parts provided by 10 suppliers… but Apple trusts all of them. http://pcic.merage.uci.edu/papers/2007/AppleiPod.pdf http://people.ischool.berkeley.edu/~hal/people/hal/NYTimes/2007-06-28.html The problem is not mixing and matching components; the problem is the need for provenance.
  • Get applications to record process documentation! Log data! But the key here is to structure that data…
  • Guarantees that documentation will be captured… Attributable, finalizable, process-reflecting. You can also just use log4j.
  • Say it’s an
  • Condor DAG… Number of jobs
  • How many people have cell phones? How many people understand their cell phone contract?
  • I trust the contract because people I know have told me the
  • Mechanism design, trust because of enforcement
  • Trust based on the artifact itself
  • Availability of support for example
  • Trust based on experience… what you’ve seen before
  • Note that this is not to say these can’t work together

I want to be a Data DJ! Presentation Transcript

  • Paul Groth | Vrije Universiteit Amsterdam | pgroth@few.vu.nl Image: http://www.flickr.com/photos/tomk32/2988993409/ All images are under a Creative Commons license
  • Image: http://www.flickr.com/photos/lyza/2487848260/sizes/l/
  • Image: http://www.flickr.com/photos/gigi_murru/2757085392/sizes/l
    • Set of component technologies that can be combined and recombined to create new innovations
    • From Hal Varian
      • http://people.ischool.berkeley.edu/~hal/
      • Chief Economist at Google
    Images: 1. http://www.flickr.com/photos/restlessglobetrotter/448362507/sizes/m/ 2. http://www.flickr.com/photos/cwalker71/1041784395/sizes/l/ 3. http://www.flickr.com/photos/oskay/1364146497/sizes/m/
    • TCP/IP, XML, HTTP, Standard Libraries
    • 1469 Web APIs
      • http://www.programmableweb.com/
  • Image: http://www.flickr.com/photos/davestfu/2157396025/sizes/l/ Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/
    • By 2012, 90M end user programmers in the US alone
      • 13M would describe themselves as programmers
      • 55M will use spreadsheets and databases
      • [Scaffidi et al 06]
    • We have gone from dozens of markets of millions of users to millions of markets of dozens of users
    • [Adams 08]
      • The “long tail of programming” [Anderson 08]
  • 1. Records 2. Turntables and mixers 3. Recording equipment
  • Image: http://www.flickr.com/photos/melodramababs/2446537799/sizes/l/
    • Remixing ++
    • Common, flexible and usable APIs
    • Standard data models
    • Emergence of nice tools
    • Convergence of the Web and the Semantic Web
      • RDFa, OpenCalais, more “other structured data”
    No more conversion components
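As a rough illustration of why a standard data model removes the need for conversion components, the sketch below stores everything as (subject, predicate, object) triples and answers questions by pattern matching, loosely in the spirit of a SPARQL basic graph pattern. It is a toy in plain Python, not a SPARQL engine; the identifiers (`ex:talk1`, `ex:pgroth`) and the helper names `match` and `creator_names` are assumptions made up for this example.

```python
# A toy triple store: every fact is a (subject, predicate, object) tuple.
# All identifiers below are hypothetical, chosen only for illustration.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:talk1", "dc:title", "I want to be a Data DJ!"),
    ("ex:talk1", "dc:creator", "ex:pgroth"),
    ("ex:pgroth", "foaf:name", "Paul Groth"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def creator_names(triples, talk):
    """Join two patterns: talk -> creator -> name."""
    names = []
    for _, _, creator in match(triples, s=talk, p="dc:creator"):
        for _, _, name in match(triples, s=creator, p="foaf:name"):
            names.append(name)
    return names


if __name__ == "__main__":
    print(creator_names(TRIPLES, "ex:talk1"))
```

Because both data sources share one shape, the join needs no per-source conversion step; a real deployment would issue the equivalent query against a SPARQL endpoint instead.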
  • Shared Techniques [WIKIAI’09 @IJCAI]
  • Open Task Repository
    • http://www.like.nu
    • 57 endpoints
    • Over 1 billion triples
      • Billion Triple Challenge Dataset
      • We made it available on Amazon EC2
        • See http://bit.ly/13FOWT
    • Built on eRDF
      • evolutionary algorithm for searching over triples
      • Christophe Guéret and Stefan Schlobach
  • Image: http://www.flickr.com/photos/danielleblue/170497153/sizes/o/
    • Declaratively capture analysis steps and their dependencies
    • Steps are represented by components
      • Software programs (codes), Web services, …
    • Workflow systems enable the creation, editing and management of workflows and their executions
      • Wings/Pegasus, Taverna, Yahoo Pipes, VisTrails, Kepler, …
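The idea of declaratively capturing steps and their dependencies can be sketched in a few lines. This is a minimal sketch using Python's standard `graphlib`, not how any of the systems listed above work internally. The step names and the three toy components (`load`, `clean`, `summarize`) are assumptions for illustration.

```python
from graphlib import TopologicalSorter


# Hypothetical components: each takes a dict of upstream results.
def load(_inputs):
    return [3, 1, 2]


def clean(inputs):
    return sorted(inputs["load"])


def summarize(inputs):
    data = inputs["clean"]
    return {"n": len(data), "max": data[-1]}


# The workflow is plain data: step name -> (component, dependencies).
WORKFLOW = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "summarize": (summarize, ["clean"]),
}


def run(workflow):
    """Execute every step after the steps it depends on."""
    graph = {name: deps for name, (_, deps) in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        component, deps = workflow[name]
        results[name] = component({d: results[d] for d in deps})
    return results


if __name__ == "__main__":
    print(run(WORKFLOW)["summarize"])
```

Keeping the workflow as data rather than as hard-wired calls is what makes it easy to swap a component or remix steps into a new analysis.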
  • 7/10/09 SWF 2009
    • Visual programming not inherently superior
      • Green, Nardi, Moher
    • End-user development
      • See Nardi, Repenning and Ioannidou, Myers, Lieberman,...
    • Workflows are end-user development environments
  • Title: BLASTP with simplified results returned   Description: This workflow Performs a blastp search on protein sequence, extracts sequence id within the blast report and retrives the corresponding seuqences.[sic] ≅
  • - myexperiment.org - 2300 users - 750 workflows - 160 groups
  •  
  • [IUI’09] [AAAI SS 09] [SWF 2009]
    • High level templates
    • Adapt to availability of components and data
    • Use rich descriptions
    [e-science 09]
  • Data (triples) How were they produced? Which ones should I trust? Who’s responsible? From Chris Bizer From pipes.deri.org
    • Enterprises need to know where and how their data was produced.
      • Uptime
      • Compliance to regulations
      • Quality assurance
      • Performance improvements
      • … .
    http://www.ifixit.com/Teardown/iPod-touch-3rd-Generation/1158/1
  • Image: http://www.flickr.com/photos/seidsvag/122718624/sizes/l/
    • Oxford English Dictionary:
      • the fact of coming from some particular source or quarter;
      • origin, derivation
      • the history or pedigree of a work of art, manuscript, rare book, etc.;
      • concretely, a record of the ultimate derivation and passage of an item through its various owners.
    • Computer representation of provenance
    • Provenance is represented by documentation
    • Provenance is a query answered by searching over documentation
    • Instrument & Collect
    • Collate
    • Query
    • Use
    • Ensure high-quality characteristics
    • Protocol for recording documentation
    • Formalised as an abstract state machine
    • Proofs to ensure these characteristics
    [IEEE TPDS Groth 08]
    • Common logical structure shared by all creating and querying actors
    • Enables the autonomous, asynchronous production of documentation by different application components
    • Open, extensible model
      • XML + OWL serializations
    • Tools can operate on it (e.g. visualisation, reasoning)
    [ACM TOIT 08: Groth, Moreau, Miles]
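The view that provenance is a query answered by searching over documentation can be illustrated with a small sketch. It is not the recording protocol or the data model from the cited papers; the `Record` shape, the file names, and the `provenance` function are hypothetical and exist only to show the idea of recording first and querying later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One piece of process documentation: a step turned inputs into an output."""
    step: str
    inputs: tuple[str, ...]
    output: str


# Hypothetical documentation, as applications might have recorded it.
DOCUMENTATION = [
    Record("download", (), "raw.csv"),
    Record("clean", ("raw.csv",), "clean.csv"),
    Record("plot", ("clean.csv",), "figure.png"),
]


def provenance(item, documentation):
    """Search the documentation for every record that led to `item`."""
    by_output = {r.output: r for r in documentation}
    lineage, pending, seen = [], [item], set()
    while pending:
        current = pending.pop()
        record = by_output.get(current)
        if record is None or current in seen:
            continue  # no documentation for this item, or already visited
        seen.add(current)
        lineage.append(record)
        pending.extend(record.inputs)
    return lineage


if __name__ == "__main__":
    for record in provenance("figure.png", DOCUMENTATION):
        print(record.step, "->", record.output)
```

The records are written independently by each step, and the lineage only appears when someone asks the question. That separation is the point of treating provenance as a query.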
  • [e-Science 08]
  • from esaw09
  • http://www.flickr.com/photos/newbirth/2834643961/
  •  
  • Reputation
  • http://www.flickr.com/photos/el_ramon/3804532661/
  • Content http://www.flickr.com/photos/ogcodes/2095054686/
  • Content Nice Letterhead
  • Content Nice Letterhead Official Seal
  • Content Nice Letterhead Official Seal A particular statement is present
  • Content Nice Letterhead Official Seal ≈ A particular statement is present
    • Works well in open systems
    • Deals with unknown agents
      • Make a decision without reputation information
    • Deals with new agents
      • No need for a system designer to analyse new entrants
    • Deals with agents behaving unexpectedly
    • System for gaining experience about contracts (provenance)
    • Algorithms for assessment of contract proposals based on prior experience
    • Use prior experience based on provenance to ascertain trust
    [esaw 09]
    • Use prior experience based on provenance to ascertain trust
    Trust of new workflow components
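One simple way to turn prior experience into a trust estimate is a smoothed success rate over past outcomes found in a provenance log. The sketch below is not the assessment algorithm from [esaw 09]; it swaps in plain Laplace smoothing as a stand-in, and the log format, component names, and function names are assumptions for illustration.

```python
def experience_for(component, provenance_log):
    """Pull past outcomes (True = worked as promised) for one component."""
    return [ok for name, ok in provenance_log if name == component]


def trust_score(outcomes, prior_successes=1, prior_failures=1):
    """Smoothed success rate; with no experience it falls back to the prior."""
    successes = sum(1 for ok in outcomes if ok)
    failures = len(outcomes) - successes
    total = successes + failures + prior_successes + prior_failures
    return (successes + prior_successes) / total


# Hypothetical provenance log: (component name, did it behave as promised?)
PROVENANCE_LOG = [
    ("aligner", True),
    ("aligner", True),
    ("aligner", False),
    ("aligner", True),
    ("plotter", False),
]

if __name__ == "__main__":
    for name in ("aligner", "plotter", "brand-new-component"):
        print(name, round(trust_score(experience_for(name, PROVENANCE_LOG)), 2))
```

A component with no history gets a neutral score instead of being rejected, which matches the point above about making decisions for unknown and new agents.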
  • The Community http://www.flickr.com/photos/dunechaser/142079357/sizes/o/
    • Provenance Challenges
      • Testing the interoperability of provenance systems
      • 14 Teams – 3 Challenges
      • http://twiki.ipaw.info/bin/view/Challenge/
    • Open Provenance Model
      • Interoperability model for exchanging provenance
      • http://twiki.ipaw.info/bin/view/OPM/
    • The W3C Provenance Incubator Group
      • http://www.w3.org/2005/Incubator/prov/
    • Remixing = combinatorial innovation
    • Make remixing easier
    1. Data and Data Discovery 2. Component exposure and composition 3. Process capture and organization
    • Contact: pgroth@few.vu.nl
    • Follow: http://twitter.com/pgroth
    • Read: http://www.pgroth.com
    • Make syntactic errors hard
    • Make syntactic errors impossible
    • Use objects as language elements
    • Make domain-oriented languages
    • Meta-domain orientation
    • Support incremental development
    • Facilitate decomposable test units
    • Multiple views and incremental disclosure
    • Integrate with the web
    • Encourage syntonicity
    • Allow immersion
    • Scaffold typical designs
    • Community tools