Infovore: An Open Source MapReduce Framework For Processing Graph Data

This talk describes Infovore, a tool that uses the Map/Reduce approach to clean up, filter, and combine RDF data sets to deliver purpose-built data sets for practical consumers of linked data.

Transcript

  • 1. Infovore, an Open-Source Map/Reduce Framework For Processing Graph Data. Paul Houle, Ontology2
  • 2. 2+ billion facts, 20+ GB!
  • 3. the data your project needs
  • 4. Why handle complete data sets? [diagram labels: Quality Perimeter, Infovore]
  • 5. RDF Tools vs. Invalid Triples (image cc-by from arj03)
  • 6. Scaling Limits of Triple Stores [diagram labels: CPU (×6), Main Memory, Random-access bottleneck, Hard Drive or Flash Storage]
  • 7. Map/Reduce conserves memory! (image cc-by-sa from Anua22a)
  • 8. Partitioning Data: md5(“http://dbpedia.org/resource/Tree”) = b78f8f508982ceb4e8dd3510fac75f62 [diagram: partitions … 330, 331, 332, 333, 334, 335 …]
  • 9. If you really try it… [diagram: partitions … 330, 331, 332, 333, 334, 335 …]
  • 10. Preprocessing Freebase
    • Expand prefixes
    • Remove
      • fbase:type.type.instance
      • fbase:type.type.expected_by
      • rdfs:type w/ fbase:* subject
    • Reverse
      • fbase:type.permission.controls
      • fbase:dataworld_gardening_hint.replaced_by
    • Rewrite
      • fbase:type.object.type to rdfs:type
  • 11. Parallel Super Eyeball
  • 12. sort | uniq
    :Surgeon a :Occupation .
    :Surgeon rdfs:label “Surgeon”@en .
    :Surgeon :mustHave :Md .
    :Tree a :Plant .
    :Tree rdfs:label “Tree”@en .
    :Tree :has :Leaves .
    :Victory a :AbstractConcept .
    :Victory rdfs:label “Victory” .
    :Victory :emotionalTone :Positive .
  • 13. Huge scalability… [diagram labels: :Tree, :Victory, :Surgeon, Main memory]
  • 14. Pig, Hadoop and All That… Source: http://www.dbis.informatik.hu-berlin.de/forschung/projekte/query-optimization-in-rdf-databases.html
  • 15. Monitoring for Quality Control [diagram labels: Operational Statistics (rdf); Preprocess, Partition, Clean, Sort, Classify, Filter]
  • 16. :basekb
  • 17. Parallel Loading into Triple Stores [diagram: partitions … 330, 331, 332, 333, 334, 335 …; Openlink Virtuoso; 4x Speedup]
  • 18. :basekb lite [diagram labels: :Freebase, :Chosen facts, :Rulebox, :Chosen topics]
  • 19. rdf diff
  • 20. See for yourself: https://github.com/paulhoule/infovore/wiki
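
Slide 8 partitions data by taking the md5 hash of a URI. A minimal Python sketch of that idea follows. The partition count of 1024, the reduction of the digest by modulo, and the name `partition_for` are assumptions made for illustration; the slides do not say how Infovore turns the digest into a partition number.

```python
import hashlib

# Assumed value: the slides show partition numbers in the 330s
# but do not state the total number of partitions.
NUM_PARTITIONS = 1024


def partition_for(uri: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a URI to a partition number using its md5 digest.

    Hypothetical helper: the digest is read as one large integer
    and reduced modulo the partition count.
    """
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/Tree"
    print(hashlib.md5(uri.encode("utf-8")).hexdigest())
    print(partition_for(uri))
```

Because the same URI always hashes to the same partition, every triple keyed on that URI lands in the same place, so each partition can be processed independently of the others.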
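
The Freebase preprocessing rules on slide 10 can be read as a per-triple filter. The sketch below is one possible interpretation, not Infovore's actual code: it works on prefixed names as written on the slide, skips the "expand prefixes" step, assumes "reverse" means swapping subject and object while keeping the predicate, and assumes the rdfs:type removal applies only to input triples, before the rewrite. The function name `preprocess` and the sample identifiers are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# (subject, predicate, object), all as prefixed names
Triple = Tuple[str, str, str]

REMOVE_PREDICATES = {
    "fbase:type.type.instance",
    "fbase:type.type.expected_by",
}
REVERSE_PREDICATES = {
    "fbase:type.permission.controls",
    "fbase:dataworld_gardening_hint.replaced_by",
}
REWRITE_PREDICATES = {
    "fbase:type.object.type": "rdfs:type",
}


def preprocess(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Apply the remove, reverse and rewrite rules from slide 10."""
    for s, p, o in triples:
        # Remove: drop unwanted predicates outright.
        if p in REMOVE_PREDICATES:
            continue
        # Remove: drop existing rdfs:type triples with an fbase:* subject
        # (assumed to be checked before any rewriting happens).
        if p == "rdfs:type" and s.startswith("fbase:"):
            continue
        # Reverse: swap subject and object (assumed meaning of "reverse").
        if p in REVERSE_PREDICATES:
            s, o = o, s
        # Rewrite: replace the predicate if a rewrite rule exists.
        p = REWRITE_PREDICATES.get(p, p)
        yield (s, p, o)


if __name__ == "__main__":
    sample = [
        ("fbase:m.01", "fbase:type.object.type", "fbase:people.person"),
        ("fbase:m.01", "rdfs:type", "fbase:people.person"),
        ("fbase:people.person", "fbase:type.type.instance", "fbase:m.01"),
        ("fbase:m.02", "fbase:type.permission.controls", "fbase:m.03"),
    ]
    for triple in preprocess(sample):
        print(triple)
```

Each rule looks at one triple at a time, which is what makes this step a natural fit for the map phase of a Map/Reduce job.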
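
Slide 12 shows the effect of `sort | uniq` on triples: duplicates disappear and all statements about one subject end up on adjacent lines. The following Python sketch imitates that on a list of lines. The function names are hypothetical, blank lines are dropped as a simplification, and Python's code-point ordering may differ from the locale-dependent ordering of the Unix `sort` command.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def sort_uniq(lines: Iterable[str]) -> List[str]:
    """Rough equivalent of `sort | uniq`: sorted lines, duplicates removed.

    Blank lines are dropped (a simplification not made by the Unix tools).
    """
    return sorted({line for line in lines if line.strip()})


def group_by_subject(sorted_lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subject, lines) for each run of lines sharing a first token.

    Expects input that is already sorted, so that each subject forms
    one contiguous run.
    """
    for subject, run in groupby(sorted_lines, key=lambda line: line.split(None, 1)[0]):
        yield subject, list(run)


if __name__ == "__main__":
    lines = [
        ":Tree a :Plant .",
        ":Surgeon a :Occupation .",
        ":Tree a :Plant .",
        ":Tree :has :Leaves .",
    ]
    for subject, statements in group_by_subject(sort_uniq(lines)):
        print(subject, len(statements))
```

Once the lines are sorted, only the statements for a single subject need to be held in memory at a time, which appears to be the point slide 13 makes about main memory.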