HBaseCon 2012 | Storing and Manipulating Graphs in HBase

Google’s original use case for Bigtable was the storage and processing of web graph information, represented as sparse matrices. However, many organizations tend to treat HBase as merely a “web scale” RDBMS. This session covers several use cases for storing graph data in HBase, including social networks and web link graphs; MapReduce processes such as cached traversal, PageRank, and clustering; and, lastly, lower-level modeling details such as row key and column qualifier design, using FullContact’s graph processing systems as a real-world use case.

Transcript

  • 1. Storing and Manipulating Graphs in HBase. Dan Lynn, dan@fullcontact.com, @danklynn
  • 2. Keeps Contact Information Current and Complete. Based in Denver, Colorado. CTO & Co-Founder
  • 3. Turn Partial Contacts Into Full Contacts
  • 4. Refresher: Graph Theory
  • 5. Refresher: Graph Theory
  • 6. Refresher: Graph Theory (diagram label: Vertex)
  • 7. Refresher: Graph Theory (diagram label: Edge)
  • 8. Social Networks
  • 9. Tweets (diagram: @danklynn, “#HBase rocks”, and @xorlev, connected by edges labelled “retweeted”, “follows”, and “author”)
  • 10. Web Links (diagram: http://fullcontact.com/blog/ links to http://techstars.com/ via <a href="...">TechStars</a>)
  • 11. Why should you care? Vertex influence (PageRank, social influence, network bottlenecks); identifying communities
  • 12. Storage Options
  • 13. neo4j
  • 14. neo4j: very expressive querying (e.g. Gremlin)
  • 15. neo4j: transactional
  • 16. neo4j: data must fit on a single machine :-(
  • 17. FlockDB
  • 18. FlockDB: scales horizontally
  • 19. FlockDB: very fast
  • 20. FlockDB: no multi-hop query support :-(
  • 21. RDBMS (e.g. MySQL, Postgres, et al.)
  • 22. RDBMS: transactional
  • 23. RDBMS: huge amounts of JOINing :-(
  • 24. HBase: massively scalable
  • 25. HBase: data model well-suited
  • 26. HBase: multi-hop querying?
  • 27. Modeling Techniques
  • 28. Adjacency Matrix (diagram: vertices 1, 2, 3)
  • 29. Adjacency Matrix:
           1  2  3
        1  0  1  1
        2  1  0  1
        3  1  1  0
  • 30. Adjacency Matrix: can use vectorized libraries
  • 31. Adjacency Matrix: requires O(n²) memory, where n = number of vertices
  • 32. Adjacency Matrix: hard(er) to distribute
  • 33. Adjacency List (diagram: vertices 1, 2, 3)
  • 34. Adjacency List:
        1: 2, 3
        2: 1, 3
        3: 1, 2
  • 35. Adjacency List Design in HBase (diagram: vertices e:dan@fullcontact.com, p:+13039316251, t:danklynn)
  • 36. Adjacency List Design in HBase:
        row key                  “edges” column family
        e:dan@fullcontact.com    p:+13039316251= ...         t:danklynn= ...
        p:+13039316251           e:dan@fullcontact.com= ...  t:danklynn= ...
        t:danklynn               e:dan@fullcontact.com= ...  p:+13039316251= ...
  • 37. Adjacency List Design in HBase: same table as slide 36, with the callout “What to store?”
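  • 38. Custom Writables (Java):
        package org.apache.hadoop.io;
        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        public interface Writable {
            void write(DataOutput dataOutput) throws IOException;
            void readFields(DataInput dataInput) throws IOException;
        }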
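  • 39. Custom Writables (Groovy):
        import org.apache.hadoop.io.Writable
        // java.io.DataInput and java.io.DataOutput are auto-imported by Groovy
        class EdgeValueWritable implements Writable {
            EdgeValue edgeValue
            // serialize the edge weight as an 8-byte double
            void write(DataOutput dataOutput) {
                dataOutput.writeDouble(edgeValue.weight)
            }
            // read the same 8 bytes back, in the same order, and rebuild the EdgeValue
            void readFields(DataInput dataInput) {
                Double weight = dataInput.readDouble()
                edgeValue = new EdgeValue(weight)
            }
            // ...
        }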
  • 40. Don’t get fancy with byte[] (Groovy):
        class EdgeValueWritable implements Writable {
            EdgeValue edgeValue
            byte[] toBytes() {
                // use strings if you can help it
            }
            static EdgeValueWritable fromBytes(byte[] bytes) {
                // use strings if you can help it
            }
        }
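  • 41. Querying by vertex (Groovy):
        import org.apache.hadoop.hbase.client.Get
        import org.apache.hadoop.hbase.client.Result
        def get = new Get(vertexKeyBytes)    // row key = the vertex
        get.addFamily(edgesFamilyBytes)      // fetch only the edges column family
        Result result = table.get(get)
        result.noVersionMap.each { family, data ->
            // construct edge objects as needed
            // data is a Map<byte[], byte[]> of qualifier (adjacent vertex) to cell value
        }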
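  • 42. Adding edges to a vertex (Groovy):
        import org.apache.hadoop.hbase.client.Put
        import org.apache.hadoop.io.NullWritable
        def put = new Put(vertexKeyBytes)
        // HBase 1.0+ deprecates Put.add(family, qualifier, value) in favor of addColumn(...)
        put.add(
            edgesFamilyBytes,
            destinationVertexBytes,   // column qualifier = the adjacent vertex
            edgeValue.toBytes()       // your own implementation here
        )
        // if writing directly
        table.put(put)
        // if using TableReducer
        context.write(NullWritable.get(), put)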
  • 43. Distributed Traversal / Indexing (diagram: e:dan@fullcontact.com, p:+13039316251, t:danklynn)
  • 44. Distributed Traversal / Indexing (diagram: e:dan@fullcontact.com, p:+13039316251, t:danklynn)
  • 45. Distributed Traversal / Indexing: pivot vertex (diagram: e:dan@fullcontact.com, p:+13039316251, t:danklynn)
  • 46. Distributed Traversal / Indexing: MapReduce over outbound edges
  • 47. Distributed Traversal / Indexing: emit vertexes and edge data grouped by the pivot
  • 48. Distributed Traversal / Indexing: reduce key p:+13039316251 (diagram labels: “Out” vertex, “In” vertex; vertices e:dan@fullcontact.com, t:danklynn)
  • 49. Distributed Traversal / Indexing: reducer emits higher-order edge (diagram: e:dan@fullcontact.com, t:danklynn)
  • 50. Distributed Traversal / Indexing: Iteration 0
  • 51. Distributed Traversal / Indexing: Iteration 1
  • 52. Distributed Traversal / Indexing: Iteration 2
  • 53. Distributed Traversal / Indexing: Iteration 2; reuse edges created during previous iterations
  • 54. Distributed Traversal / Indexing: Iteration 3
  • 55. Distributed Traversal / Indexing: Iteration 3; reuse edges created during previous iterations
  • 56. Distributed Traversal / Indexing: 2^n hops requires only n iterations
  • 57. Tips / Gotchas
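  • 58. Do implement your own comparator (Java):
        // nested inside VertexKeyWritable; compares serialized keys without deserializing them
        public static class Comparator extends WritableComparator {
            public Comparator() {
                super(VertexKeyWritable.class);
            }
            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                // .....
            }
        }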
  • 59. Do implement your own comparator (Java):
        static {
            // register the raw comparator for VertexKeyWritable keys
            WritableComparator.define(VertexKeyWritable.class, new VertexKeyWritable.Comparator());
        }
  • 60. MultiScanTableInputFormat (Java):
        MultiScanTableInputFormat.setTable(conf, "graph");
        MultiScanTableInputFormat.addScan(conf, new Scan());
        job.setInputFormatClass(MultiScanTableInputFormat.class);
  • 61. TableMapReduceUtil (Java):
        TableMapReduceUtil.initTableReducerJob("graph", MyReducer.class, job);
  • 62. Elastic MapReduce
  • 63. Elastic MapReduce: HFiles
  • 64. Elastic MapReduce: HFiles → copy to S3 → SequenceFiles
  • 65. Elastic MapReduce: HFiles → copy to S3 → SequenceFiles → Elastic MapReduce → SequenceFiles
  • 66. Elastic MapReduce: HFiles → copy to S3 → SequenceFiles → Elastic MapReduce → SequenceFiles
  • 67. Elastic MapReduce: HFiles → copy to S3 → SequenceFiles → Elastic MapReduce → SequenceFiles → HFileOutputFormat.configureIncrementalLoad(job, outputTable) → HFiles
  • 68. Elastic MapReduce: HFiles → copy to S3 → SequenceFiles → Elastic MapReduce → SequenceFiles → HFileOutputFormat.configureIncrementalLoad(job, outputTable) → HFiles → HBase, loaded with:
        $ hadoop jar hbase-VERSION.jar completebulkload <hfile-output-dir> <table-name>
  • 69. Additional Resources:
        Google Pregel: BSP-based graph processing system
        Apache Giraph: implementation of Pregel for Hadoop
        MultiScanTableInputFormat: (code to appear on GitHub)
        Apache Mahout: distributed machine learning on Hadoop
  • 70. Thanks! dan@fullcontact.com
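Slide 11 and the abstract mention PageRank as a measure of vertex influence. What follows is a minimal single-machine Java sketch, not the talk's MapReduce job: the class PageRankSketch, the damping factor 0.85, and the three-page graph are illustrative assumptions. Each pass loosely mirrors one MapReduce round, where every vertex emits rank / out-degree to its targets and each target sums what it receives.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical power-iteration PageRank over vertices 0..n-1; links.get(u) = u's outbound links. */
final class PageRankSketch {
    private PageRankSketch() {}

    static double[] pageRank(List<List<Integer>> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            // "Map": each vertex sends rank / outDegree along every outbound edge.
            for (int u = 0; u < n; u++) {
                List<Integer> out = links.get(u);
                if (out.isEmpty()) {
                    danglingMass += rank[u]; // no outbound edges: spread evenly below
                } else {
                    double share = rank[u] / out.size();
                    for (int v : out) {
                        next[v] += share;
                    }
                }
            }
            // "Reduce": each vertex sums what it received, then applies damping.
            for (int v = 0; v < n; v++) {
                next[v] = (1 - damping) / n + damping * (next[v] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical three-page link graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1.
        List<List<Integer>> links = List.of(List.of(1), List.of(2), List.of(0, 1));
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

Rank that lands on a vertex with no outbound links is redistributed evenly, so the ranks keep summing to 1 on every pass.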
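Slides 28-32 contrast an adjacency matrix's convenience with its O(n²) memory cost. Below is a minimal Java sketch of the slides' three-vertex example, relabelled 0-2; the class AdjacencyMatrix is hypothetical. Squaring the matrix counts two-step walks, the kind of whole-matrix operation a vectorized library would perform, while every instance allocates n × n cells however sparse the graph is.

```java
import java.util.Arrays;

/** Hypothetical dense adjacency matrix for an undirected graph on vertices 0..n-1. */
final class AdjacencyMatrix {
    private final int[][] cells; // n x n entries: O(n^2) memory regardless of edge count

    AdjacencyMatrix(int n) {
        cells = new int[n][n];
    }

    /** Undirected edge: set both (u, v) and (v, u) so the matrix stays symmetric. */
    void addEdge(int u, int v) {
        cells[u][v] = 1;
        cells[v][u] = 1;
    }

    boolean hasEdge(int u, int v) {
        return cells[u][v] == 1;
    }

    /** Row sum = number of neighbours. */
    int degree(int u) {
        return Arrays.stream(cells[u]).sum();
    }

    /** Cells allocated: always n * n. */
    long cellCount() {
        return (long) cells.length * cells.length;
    }

    /** A^2: entry (i, j) counts walks of length two from i to j. */
    int[][] squared() {
        int n = cells.length;
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    out[i][j] += cells[i][k] * cells[k][j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The slides' triangle on vertices 1, 2, 3, relabelled 0, 1, 2.
        AdjacencyMatrix m = new AdjacencyMatrix(3);
        m.addEdge(0, 1);
        m.addEdge(0, 2);
        m.addEdge(1, 2);
        for (int[] row : m.cells) {
            System.out.println(Arrays.toString(row)); // same 0/1 rows as slide 29
        }
    }
}
```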
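For comparison with slides 33-34, here is a hypothetical AdjacencyListGraph in Java built from the same three edges. It stores only each vertex's neighbours, so memory grows with the number of edges rather than with n².

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical undirected adjacency list: vertex -> neighbours. */
final class AdjacencyListGraph {
    // TreeMap keeps vertices sorted, which makes printing deterministic.
    private final Map<Integer, List<Integer>> neighbours = new TreeMap<>();

    /** Undirected: record the edge under both endpoints. */
    void addEdge(int u, int v) {
        neighbours.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        neighbours.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
    }

    List<Integer> neighboursOf(int u) {
        return neighbours.getOrDefault(u, List.of());
    }

    /** Stored entries: 2 * |E| for an undirected graph. */
    int entryCount() {
        int total = 0;
        for (List<Integer> ns : neighbours.values()) {
            total += ns.size();
        }
        return total;
    }

    public static void main(String[] args) {
        AdjacencyListGraph g = new AdjacencyListGraph();
        g.addEdge(1, 2);
        g.addEdge(1, 3);
        g.addEdge(2, 3);
        for (Map.Entry<Integer, List<Integer>> e : g.neighbours.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue()); // e.g. "1 [2, 3]", as on slide 34
        }
    }
}
```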
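Slides 35-37 store each vertex as a row keyed by a type-prefixed id (e: for an email address, p: for a phone number, t: for a Twitter handle), with one column in the “edges” family per adjacent vertex. The Java sketch below imitates that layout with nested sorted maps. It is an in-memory stand-in, not the HBase client API; the class EdgesTableSketch and the edge value "1.0" are assumptions. It illustrates why the layout suits adjacency queries: one row read returns all neighbours, and the type prefix keeps rows of the same kind adjacent in key order.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for the slides' table layout:
 *   row key = typed vertex id, qualifier = typed id of the adjacent vertex,
 *   value = serialized edge data (a plain String here).
 * Nested TreeMaps mimic HBase keeping rows and qualifiers sorted; for ASCII keys,
 * String order matches HBase's unsigned byte order.
 */
final class EdgesTableSketch {
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Like one Put: a single cell in row `from`, qualifier `to`. */
    void putEdge(String from, String to, String value) {
        rows.computeIfAbsent(from, k -> new TreeMap<>()).put(to, value);
    }

    /** Undirected edges are written twice, once under each endpoint's row. */
    void putUndirected(String a, String b, String value) {
        putEdge(a, b, value);
        putEdge(b, a, value);
    }

    /** Like a Get restricted to the edges family: all neighbours from one row. */
    NavigableMap<String, String> edgesOf(String vertex) {
        return rows.getOrDefault(vertex, new TreeMap<>());
    }

    /** Like a prefix scan: every row whose key starts with the given type prefix. */
    NavigableMap<String, NavigableMap<String, String>> rowsWithPrefix(String prefix) {
        return rows.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        EdgesTableSketch t = new EdgesTableSketch();
        t.putUndirected("e:dan@fullcontact.com", "p:+13039316251", "1.0");
        t.putUndirected("e:dan@fullcontact.com", "t:danklynn", "1.0");
        t.putUndirected("p:+13039316251", "t:danklynn", "1.0");
        System.out.println(t.edgesOf("t:danklynn").keySet());
    }
}
```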
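Slides 38-39 show Hadoop's Writable interface and a Groovy EdgeValueWritable. The Java sketch below follows the same pattern without a Hadoop dependency: WritableLike is a hypothetical local interface with the same two method signatures as org.apache.hadoop.io.Writable, EdgeValue (holding only a weight) is assumed from the slide, and the serialize/deserialize helpers simply round-trip through in-memory streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical local stand-in with the same methods as org.apache.hadoop.io.Writable. */
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical edge payload: just a weight, as on slide 39. */
final class EdgeValue {
    final double weight;

    EdgeValue(double weight) {
        this.weight = weight;
    }
}

final class EdgeValueWritable implements WritableLike {
    EdgeValue edgeValue;

    EdgeValueWritable() {} // frameworks such as Hadoop create Writables reflectively

    EdgeValueWritable(EdgeValue edgeValue) {
        this.edgeValue = edgeValue;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(edgeValue.weight); // 8 bytes, big-endian
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        edgeValue = new EdgeValue(in.readDouble()); // read fields in the order written
    }

    static byte[] serialize(WritableLike w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams should not fail
        }
        return bytes.toByteArray();
    }

    static EdgeValueWritable deserialize(byte[] data) {
        EdgeValueWritable w = new EdgeValueWritable();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w;
    }
}
```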
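Slide 40's advice for cell values, “use strings if you can help it”, might be followed roughly as below. This is an assumption about what such toBytes/fromBytes methods could look like, written in Java with a hypothetical EdgeWeightCodec class: the weight is stored as its UTF-8 decimal text, which stays readable in the HBase shell and round-trips exactly through Double.toString and Double.parseDouble.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical codec: edge weight stored as human-readable UTF-8 text. */
final class EdgeWeightCodec {
    private EdgeWeightCodec() {}

    static byte[] toBytes(double weight) {
        // Double.toString emits enough digits to identify the double uniquely.
        return Double.toString(weight).getBytes(StandardCharsets.UTF_8);
    }

    static double fromBytes(byte[] bytes) {
        return Double.parseDouble(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] cell = toBytes(0.75);
        System.out.println(new String(cell, StandardCharsets.UTF_8)); // prints 0.75
        System.out.println(fromBytes(cell));                          // prints 0.75
    }
}
```

Text can be longer than the 8-byte binary form for weights with long decimal expansions, so this trades some space for easier debugging.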
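Slides 43-56 describe multi-hop traversal as repeated MapReduce passes: each edge is routed to a pivot vertex, the reducer for that pivot pairs incoming with outgoing edges, and it emits a higher-order edge. The Java code below simulates that in one process; it is a sketch, not FullContact's implementation, and PivotTraversal plus the v0-v4 path graph are hypothetical. Because each pass reuses the edges produced by earlier passes, the longest path covered doubles every iteration: after iteration 1 the sketch knows 2-hop edges, after iteration 2 it knows 4-hop ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process simulation of the pivot-vertex join. */
final class PivotTraversal {
    /** A directed edge plus the number of original hops it stands for. */
    static final class Edge {
        final String from;
        final String to;
        final int hops;

        Edge(String from, String to, int hops) {
            this.from = from;
            this.to = to;
            this.hops = hops;
        }
    }

    /** One map / group-by-pivot / reduce pass. edges.get(a).get(b) = hop count of a -> b. */
    static Map<String, Map<String, Integer>> iterate(Map<String, Map<String, Integer>> edges) {
        // Map: edge a -> b goes to pivot b as an "in" edge and to pivot a as an "out" edge.
        Map<String, List<Edge>> inByPivot = new HashMap<>();
        Map<String, List<Edge>> outByPivot = new HashMap<>();
        edges.forEach((from, targets) -> targets.forEach((to, hops) -> {
            Edge e = new Edge(from, to, hops);
            inByPivot.computeIfAbsent(to, k -> new ArrayList<>()).add(e);
            outByPivot.computeIfAbsent(from, k -> new ArrayList<>()).add(e);
        }));

        // Keep every existing edge so the next iteration can reuse it.
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        edges.forEach((from, targets) -> result.put(from, new TreeMap<>(targets)));

        // Reduce: per pivot, join each in-edge with each out-edge into a higher-order edge.
        inByPivot.forEach((pivot, inEdges) -> {
            for (Edge in : inEdges) {
                for (Edge out : outByPivot.getOrDefault(pivot, List.of())) {
                    if (in.from.equals(out.to)) {
                        continue; // skip edges that would loop back to their own start
                    }
                    result.computeIfAbsent(in.from, k -> new TreeMap<>())
                          .merge(out.to, in.hops + out.hops, Integer::min); // keep the shortest
                }
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical directed path v0 -> v1 -> v2 -> v3 -> v4, one hop per edge.
        Map<String, Map<String, Integer>> graph = new TreeMap<>();
        for (int i = 0; i < 4; i++) {
            graph.computeIfAbsent("v" + i, k -> new TreeMap<>()).put("v" + (i + 1), 1);
        }
        for (int iteration = 1; iteration <= 2; iteration++) {
            graph = iterate(graph);
            System.out.println("after iteration " + iteration + ": v0 -> " + graph.get("v0"));
        }
    }
}
```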
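Slides 58-59 recommend a custom raw comparator registered with WritableComparator.define, whose job is to order serialized keys without deserializing them. A plausible body for such a compare method is sketched below in Java. It does not extend Hadoop's WritableComparator, and RawKeyComparator is hypothetical. It compares bytes as unsigned values, the order HBase uses for row keys; the 0xff mask matters because Java's byte type is signed.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical raw-bytes comparator with the same argument list as the slide's compare(...). */
final class RawKeyComparator {
    private RawKeyComparator() {}

    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // treat each byte as unsigned 0..255
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // a proper prefix sorts first
    }

    static int compare(byte[] b1, byte[] b2) {
        return compare(b1, 0, b1.length, b2, 0, b2.length);
    }

    public static void main(String[] args) {
        byte[] email = "e:dan@fullcontact.com".getBytes(StandardCharsets.UTF_8);
        byte[] phone = "p:+13039316251".getBytes(StandardCharsets.UTF_8);
        System.out.println(compare(email, phone) < 0);                               // true: 'e' < 'p'
        System.out.println(compare(new byte[] {(byte) 0x80}, new byte[] {0x01}) > 0); // true: 0x80 is 128
    }
}
```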
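Slides 62-68 end their bulk-load pipeline with HFileOutputFormat.configureIncrementalLoad and completebulkload. As documented for HBase, configureIncrementalLoad configures a total-order partitioner from the target table's region boundaries and sets one reduce task per region, so each reducer writes HFiles that fit inside a single region. The hypothetical RegionPartitioner below sketches only that routing step in Java; the split points "", "g", and "p" are assumptions, and String order stands in for HBase's byte order (they agree for ASCII keys).

```java
import java.util.Arrays;

/** Hypothetical total-order router: region i holds keys in [start[i], start[i + 1]). */
final class RegionPartitioner {
    private final String[] regionStartKeys; // sorted; the first region starts at ""

    RegionPartitioner(String... regionStartKeys) {
        this.regionStartKeys = regionStartKeys.clone();
    }

    int regionFor(String rowKey) {
        int idx = Arrays.binarySearch(regionStartKeys, rowKey);
        // idx >= 0: rowKey is exactly a region's start key. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the owning region is the one just before insertionPoint.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        // Hypothetical table pre-split into three regions starting at "", "g" and "p".
        RegionPartitioner p = new RegionPartitioner("", "g", "p");
        for (String key : new String[] {"e:dan@fullcontact.com", "p:+13039316251", "t:danklynn"}) {
            System.out.println(key + " -> region " + p.regionFor(key));
        }
    }
}
```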
