High-PerformanceStorage Services   With Java and HailDB      Sunny Gleason      April 14, 2011
whoami• Sunny Gleason, human• passion: distributed systems engineering• previous...   Ning : custom social networks   Amaz...
whereami• twitter : twitter.com/sunnygleason• github : github.com/sunnygleason• linkedin : linkedin.com/in/sunnygleason• s...
what’s in this presentation?  • MySQL & NoSQL as Inspiration  • HailDB & InnoDB  • JNA: Integration with Java  • St8 : A R...
prior art•   Mad props to:    •   MySQL & InnoDB teams for creating InnoDB        and Embedded InnoDB    •   Stewart Smith...
MySQL & InnoDB• Super-Efficient Database Server• Tried & True Replication• Bulletproof Durability (when configured  correctl...
motivation• database on 1 box : ok• database with master/slave replication : ok• database on cluster : tricky• database on...
NoSQL• “Not Only” SQL• What’s the point?• Proponent: “reaching next level of scale”• Cynic: “cloud is hype, ops nightmare”
what does it gain?• Higher performance, scalability, availability• More robust fault-tolerance• Simplified systems design• ...
what does it lose?• Reduced / simplified programming model• No ad-hoc queries, no joins, no txns• Not ACID: Weakened Atomic...
NoSQL Map                                KV Stores                                (volatile)           Memcached,         ...
durable vs. volatile• RAM is ridiculous speed (ns), not durable• Disk is persistent and slow (3-7ms)• RAID eases the pain ...
performance &                      operational complexity*                                                         + Shard...
just a thought...What if we could use the highly optimized &durable ‘guts’ of MySQL without having to gothrough JDBC & SQL?
enter HailDB• use case:Voldemort Storage Engine• let’s evaluate relative to other NoSQL  options• focus on stability & pre...
Voldemort schema_key VARBINARY(200)_version VARBINARY(200)_value BLOBPRIMARY KEY(_key, _version)
experimental setup• OS X: 8-Core Xeon, 32GB RAM, 200GB  OWC SSD• Faban Benchmark : PUT 64-byte key, 1024-  byte value• Sce...
BDB-JE• Log-Structured B-Tree• Fast Storage When Mostly Cached• Configured without fsync() by default -  writes are batched...
Perf: BDB Put 100%
Krati• Fast Hash-Oriented Storage• Uses memory-mapped files for speed• Configured without fsync() by default -  writes are b...
Perf: Krati Put 100%
Perf: HailDB Put 100%
HailDB & Java• g414-haildb : where the magic happens• Open Source on GitHub• uses JNA: Java Native Access• dynamic binding...
implementation gotchas• InnoDB API-level usage is unclear• Synchronization & locking is unclear• Therefore... I learned to...
kinder, friendlier APIs• Level 0: JNA bindings    int err = ib_dostuff();• Level 1: Object-Oriented   Transaction t = db.o...
St8 Server• HTTP-enabled Access to HailDB• PUT /1.0/t/mytable  {  "columns":[    {"name":"a","type":"INT","length":4},    ...
rest-enabled access  • GET /1.0/d/mytable;a=0  • POST /1.0/d/mytable;a=1;b=42;c=xyz  • PUT /1.0/d/mytable;a=1;b=43;c=abc  ...
cursors & iterators• GET /1.0/i/mytable.P?q=a+ge+4• GET /1.0/i/mytable.SecIndex?q=b+le+4• GET /1.0/i/mytable.SecIndex?q=b+...
result• REST API provides fun, straightforward  access from Ruby, Python, Java, Command-  line...• very easy benchmarking ...
high-performance counts• GET /1.0/counts/mykey  0• POST /1.0/counts/mykey[?inc=1]  1• POST /1.0/counts/mykey?inc=42  43• D...
counts schema• HailDB count service schema   _id int 8-byte unsigned,   _key_hash int 8-byte unsigned,   _key varchar(80),...
raid0 put counts
ssd put counts
raid0 put/get
ssd put/get
operation: graph store• Social networks, recommendations, any  relation you can think of• Which would you prefer? • SQL ad...
nifty graph store 1                              3                   2       1                            4               ...
nifty graph store 2       1     2     3     4             5           6                         8GET /1.0/graph/topo?a=1&a...
nifty recovery tool                (Just an idea)• for recovery: shut down mysql server• run HailDB-enabled recovery tool•...
wrap-up• HailDB & InnoDB are phenomenal• With g414-haildb, can be integrated directly  into applications running on the JV...
resources• github.com/sunnygleason/g414-st8  github.com/sunnygleason/g414-haildb• haildb.com• jna.dev.java.net
Questions? Thank You!
bonus material!• we probably didn’t get this far in the live  presentation; the following material is here  for eager, bra...
future work• Improve Packaging / Installation• Codify schema refinements & perf  enhancements• Online backup/export with Xt...
InnoDB tuning• Skinny columns, skinny rows! (esp. Primary Key) • Varchar enum ‘bad’, enum, int or smallint ‘good’ • fixed-w...
HailDB schema_key VARBINARY(200)_version VARBINARY(200)_value BLOBPRIMARY KEY(_key, _version)
refined schema_id BIGINT (auto increment)_key_hash BIGINT_key VARBINARY(200)_version VARBINARY(200)_value BLOBPRIMARY KEY(_...
online backup• hot backup of data to other machine /  destination• test Percona Xtrabackup with HailDB• next step: backup/...
JNI bindings• JNI can get 2-5x perf boost vs. JNA• ... at the expense of nasty code• Will go for schema optimizations and ...
Thank You!
Upcoming SlideShare
Loading in...5
×

High-Performance Storage Services with HailDB and Java

3,048

Published on

High-Performance Storage Services with HailDB and Java

Published in: Technology
0 Comments
8 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,048
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
49
Comments
0
Likes
8
Embeds 0
No embeds

No notes for slide

High-Performance Storage Services with HailDB and Java

  1. 1. High-PerformanceStorage Services With Java and HailDB Sunny Gleason April 14, 2011
  2. 2. whoami• Sunny Gleason, human• passion: distributed systems engineering• previous... Ning : custom social networks Amazon.com : infra & web services• now... building cloud infrastructure
  3. 3. whereami• twitter : twitter.com/sunnygleason• github : github.com/sunnygleason• linkedin : linkedin.com/in/sunnygleason• slideshare : slideshare.net/sunnygleason
  4. 4. what’s in this presentation? • MySQL & NoSQL as Inspiration • HailDB & InnoDB • JNA: Integration with Java • St8 : A REST-Enabled Data Store • A Handful of Nifty Applications • Results & Next Steps
  5. 5. prior art• Mad props to: • MySQL & InnoDB teams for creating InnoDB and Embedded InnoDB • Stewart Smith & Drizzle folks for leading the HailDB charge and encouraging plugin apis • Nokia & Percona for publishing results of their Voldemort / MySQL integration • Basho for publishing Riak / InnoStore integration
  6. 6. MySQL & InnoDB• Super-Efficient Database Server• Tried & True Replication• Bulletproof Durability (when configured correctly)• Fantastic Stability, Predictability & Insight into Operation
  7. 7. motivation• database on 1 box : ok• database with master/slave replication : ok• database on cluster : tricky• database on SAN : scary
  8. 8. NoSQL• “Not Only” SQL• What’s the point?• Proponent: “reaching next level of scale”• Cynic: “cloud is hype, ops nightmare”
  9. 9. what does it gain?• Higher performance, scalability, availability• More robust fault-tolerance• Simplified systems design• Easier operations
  10. 10. what does it lose?• Reduced / simplified programming model• No ad-hoc queries, no joins, no txns• Not ACID: Weakened Atomicity / Consistency / Isolation / Durability• Operations / management is still evolving• Challenging to quantify health of system• Fewer domain experts
  11. 11. NoSQL Map KV Stores (volatile) Memcached, Redis KV Stores Dynamo, Key-Value (durable) Voldemort, Store Riak Document StoreNoSQL CouchDB, MongoDB Column Store Cassandra, BigTable, HBase Graph Neo4J Store
  12. 12. durable vs. volatile• RAM is ridiculous speed (ns), not durable• Disk is persistent and slow (3-7ms)• RAID eases the pain a bit (4-8x throughput)• SSD is providing good promise (100-300us)• FusionIO is redefining the space (30-100us)
  13. 13. performance & operational complexity* + Sharding Complexity +FusionIO +SSD MySQL Voldemort +Cluster Memcached 1K 10K 100K 1M Aggregate Operations / Sec* This is not a real graph
  14. 14. just a thought...What if we could use the highly optimized &durable ‘guts’ of MySQL without having to gothrough JDBC & SQL?
  15. 15. enter HailDB• use case:Voldemort Storage Engine• let’s evaluate relative to other NoSQL options• focus on stability & predictability of performance• Graphs are throughput (ops/sec) vs. time
  16. 16. Voldemort schema_key VARBINARY(200)_version VARBINARY(200)_value BLOBPRIMARY KEY(_key, _version)
  17. 17. experimental setup• OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD• Faban Benchmark : PUT 64-byte key, 1024- byte value• Scenarios:1, 2, 4, 8 threads• 512M Java Heap
  18. 18. BDB-JE• Log-Structured B-Tree• Fast Storage When Mostly Cached• Configured without fsync() by default - writes are batched and flushed periodically
  19. 19. Perf: BDB Put 100%
  20. 20. Krati• Fast Hash-Oriented Storage• Uses memory-mapped files for speed• Configured without fsync() by default - writes are batched and flushed periodically
  21. 21. Perf: Krati Put 100%
  22. 22. Perf: HailDB Put 100%
  23. 23. HailDB & Java• g414-haildb : where the magic happens• Open Source on GitHub• uses JNA: Java Native Access• dynamic binding to libhaildb shared library• auto-generate initial Java class from .h file (w/ JNAerator)• Pointer classes & other shenanigans
  24. 24. implementation gotchas• InnoDB API-level usage is unclear• Synchronization & locking is unclear• Therefore... I learned to love reading C• Error handling is *nasty*• Native library installation a bit of a pain (need to configure LD_LIBRARY_PATH)
  25. 25. kinder, friendlier APIs• Level 0: JNA bindings int err = ib_dostuff();• Level 1: Object-Oriented Transaction t = db.openTransaction(); t.commit();• Level 2: Templated dbt.inTransaction() { dbt.insert(value); }• Level 3: Functional Maps, Iteration, Filters, Apply
  26. 26. St8 Server• HTTP-enabled Access to HailDB• PUT /1.0/t/mytable { "columns":[   {"name":"a","type":"INT","length":4},   {"name":"b","type":"INT","length":8},   {"name":"c","type":"BLOB","length":0}, ], "indexes":[   {    "name":"P",    "clustered":true,"unique":true,    "indexColumns":[{"name":"a"}]   } ] }
  27. 27. rest-enabled access • GET /1.0/d/mytable;a=0 • POST /1.0/d/mytable;a=1;b=42;c=xyz • PUT /1.0/d/mytable;a=1;b=43;c=abc • DELETE /1.0/d/mytable;a=0*This is matrix-param style, can also use form data style for specifying data
  28. 28. cursors & iterators• GET /1.0/i/mytable.P?q=a+ge+4• GET /1.0/i/mytable.SecIndex?q=b+le+4• GET /1.0/i/mytable.SecIndex?q=b+le+4 &s=abce1212121ceeee2120911• “s” value is opaque index key of next page of results - way better than LIMIT/OFFSET! (since HailDB can seek directly to the row)
  29. 29. result• REST API provides fun, straightforward access from Ruby, Python, Java, Command- line...• very easy benchmarking with HTTP-based performance tools• range query support, and more efficient iteration model for large result sets than MySQL provides
  30. 30. high-performance counts• GET /1.0/counts/mykey 0• POST /1.0/counts/mykey[?inc=1] 1• POST /1.0/counts/mykey?inc=42 43• DELETE /1.0/counts/mykey
  31. 31. counts schema• HailDB count service schema _id int 8-byte unsigned, _key_hash int 8-byte unsigned, _key varchar(80), _count int 8-byte unsigned primary key (“_id”) unique key (“_key_hash”, “key”)
  32. 32. raid0 put counts
  33. 33. ssd put counts
  34. 34. raid0 put/get
  35. 35. ssd put/get
  36. 36. operation: graph store• Social networks, recommendations, any relation you can think of• Which would you prefer? • SQL adjacency list, stored procedure, custom storage engine, external (Memcached), ... • Graph-aware HailDB application in Java
  37. 37. nifty graph store 1 3 2 1 4 5 6 8GET /1.0/graph/bfs?a=1&maxDepth=3 => [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]
  38. 38. nifty graph store 2 1 2 3 4 5 6 8GET /1.0/graph/topo?a=1&a=5&a=8 => [8, 6, 4, 3, 2, 5, 1]
  39. 39. nifty recovery tool (Just an idea)• for recovery: shut down mysql server• run HailDB-enabled recovery tool• export as JSON or whatever
  40. 40. wrap-up• HailDB & InnoDB are phenomenal• With g414-haildb, can be integrated directly into applications running on the JVM• All the InnoDB tuning tricks apply• Opens up new applications that are tricky with a traditional SQL database
  41. 41. resources• github.com/sunnygleason/g414-st8 github.com/sunnygleason/g414-haildb• haildb.com• jna.dev.java.net
  42. 42. Questions? Thank You!
  43. 43. bonus material!• we probably didn’t get this far in the live presentation; the following material is here for eager, brave & interested folks...
  44. 44. future work• Improve Packaging / Installation• Codify schema refinements & perf enhancements• Online backup/export with XtraBackup• JNI Bindings• PBXT explorations
  45. 45. InnoDB tuning• Skinny columns, skinny rows! (esp. Primary Key) • Varchar enum ‘bad’, enum, int or smallint ‘good’ • fixed-width rows allow in-place updates• Use covering indexes strategically• More data per page means faster index scans, more efficient buffer pool utilization• You only get so many trx’s (read & write) on given CPU/RAM configuration - benchmark this!• Strategically offload reads to Memcached/Redis
  46. 46. HailDB schema_key VARBINARY(200)_version VARBINARY(200)_value BLOBPRIMARY KEY(_key, _version)
  47. 47. refined schema_id BIGINT (auto increment)_key_hash BIGINT_key VARBINARY(200)_version VARBINARY(200)_value BLOBPRIMARY KEY(_id)KEY(_key_hash)
  48. 48. online backup• hot backup of data to other machine / destination• test Percona Xtrabackup with HailDB• next step: backup/export to Hadoop/HDFS (similar to Cloudera Sqoop tool)
  49. 49. JNI bindings• JNI can get 2-5x perf boost vs. JNA• ... at the expense of nasty code• Will go for schema optimizations and InnoDB tuning tips *first*
  50. 50. Thank You!
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×