InnoDB Magic

g414-inno
Embedded InnoDB, Voldemort, St8, and a Few Other Tidbits
Sunny Gleason

What’s in this Preso
• What is InnoDB?
• Relation to MySQL & Other Products
• InnoDB Model
• g414-inno: a Java Access Library for InnoDB

What else is in this Preso
• Creating a Voldemort Storage Engine with Embedded InnoDB
• St8: A REST-based Storage Server
• Faban Benchmark Results

What is InnoDB?
• High-Performance “guts” of MySQL
• Finely Tuned B-Tree Storage Engine
• MVCC Transactional Store a la Jim Gray (“Transaction Processing: Concepts and Techniques”)
• Available Stand-Alone as Embedded InnoDB (stagnant) or HailDB (Drizzle)

Relation to MySQL
• One of many MySQL storage engines
• Transactional, in contrast to MyISAM
• Well-known, Bullet-Proof Backup, Failure & Recovery Modes
• Advanced Buffer Pool Management (adaptive hash index, tunable LRU)
• Online Backup Support (XtraBackup / hot backup)

Other Products
• Tokyo BDB, Oracle BDB & BDB-JE
• Schema-Free (No Structure / Data Types)
• Lower Concurrency (fewer writers)
• Performance Degradation in Larger DBs
• (TODO: quantify performance gap; in the meantime, see Dynamo & Voldemort)

InnoDB Model (Logical)
• Database == Tablespace
• Tablespace has Table(s) and Log(s)
• Table has columns (rich datatypes)
• Tables have a PRIMARY clustered index
• Tables may have SECONDARY indexes
• Row == Tuple
• Tuples are stored / clustered by index sort
• Secondary index stores full Primary Key

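A toy sketch of how a clustered primary index and a secondary index relate, using plain Java collections rather than InnoDB or g414-inno. The class and method names are hypothetical; the column names key1, key2 and val are borrowed from the St8 table definition later in the deck. The point it illustrates: rows live in primary-key order, and a secondary lookup first yields primary keys, which are then looked up in the clustered index.

```java
import java.util.*;

// Toy in-memory model of the logical layout (illustration only, not g414-inno code).
class ClusteredIndexDemo {
    // Clustered PRIMARY index: primary key -> full row {key2, val}, kept sorted by key.
    static final TreeMap<Integer, String[]> primary = new TreeMap<>();
    // Non-unique secondary index on key2: key2 -> primary keys (not row locations).
    static final TreeMap<String, TreeSet<Integer>> secondary = new TreeMap<>();

    // Insert only; updates that change key2 are not handled in this toy.
    static void insert(int key1, String key2, String val) {
        primary.put(key1, new String[] {key2, val});
        secondary.computeIfAbsent(key2, k -> new TreeSet<>()).add(key1);
    }

    // Secondary lookup: search the secondary index for primary keys,
    // then search the clustered index to fetch each full row.
    static List<String> findValsByKey2(String key2) {
        List<String> vals = new ArrayList<>();
        for (int pk : secondary.getOrDefault(key2, new TreeSet<>())) {
            vals.add(primary.get(pk)[1]);
        }
        return vals;
    }

    public static void main(String[] args) {
        insert(123, "ABC", "first");
        insert(7, "ABC", "second");
        insert(42, "XYZ", "third");
        System.out.println(primary.keySet());      // [7, 42, 123]
        System.out.println(findValsByKey2("ABC")); // [second, first]
    }
}
```

Because the secondary index stores the full primary key, a wide primary key also widens every secondary index.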
InnoDB Model (Txns)
• Everything uses a Transaction
• Isolation Levels: Serializable, Repeatable Read, Read Committed, Read Uncommitted
• Locks: Shared (Read-only), Exclusive (Read/Write)
• Cursors provide access to tables: Lookup by index, Iteration / Traversal
• Secondary index contains partial Tuples
• Secondary cursor can access primary (full tuple)

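Isolation levels are one of the C API's `int` enums that a Java wrapper would rather expose as a type-safe enum. InnoDB offers Serializable, Repeatable Read (MySQL's default), Read Committed and Read Uncommitted. The sketch below is hypothetical (not g414-inno's actual class), and the numeric codes are an assumption based on the declaration order of `ib_trx_level_t` in Embedded InnoDB's `innodb.h`; check them against the header you build against.

```java
import java.util.Arrays;

// Hypothetical Java enum wrapping a C int enum. Codes are ASSUMED to match
// ib_trx_level_t in innodb.h; verify before passing them across JNA.
enum IsolationLevel {
    READ_UNCOMMITTED(0),
    READ_COMMITTED(1),
    REPEATABLE_READ(2),
    SERIALIZABLE(3);

    private final int code;

    IsolationLevel(int code) {
        this.code = code;
    }

    // The plain int handed to the native call.
    int code() {
        return code;
    }

    // Map an int coming back from the C API to a constant, rejecting unknown values.
    static IsolationLevel fromCode(int code) {
        return Arrays.stream(values())
                .filter(level -> level.code == code)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown isolation level: " + code));
    }
}
```

Callers then write `IsolationLevel.SERIALIZABLE` instead of a bare `3`, and a bad value fails fast in Java rather than deep inside the native library.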
InnoDB Model (Physical)
• Tablespace is a collection of pages (16K by default)
• Pages organized as a B-Tree: infimum & supremum keys, pointers to children
• Pages contain row or index tuple data, or blob overflow data
• Changes written to the log first; pages flushed to the tablespace later, based on ‘sync’ policy

Physical Considerations
• New pages requested from OS in extend_size increments
• OS Assigns space from file system / partition “free list”
• Temporal Locality (pages close together)
• Spatial Locality / Fragmentation from Updates
• Prefer “narrow” rows / indexes: faster scan, keeps working set in-memory
• Secondary “covering” indexes can save primary index access

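Why “narrow” rows help can be shown with back-of-the-envelope arithmetic: with a fixed 16K page, fewer bytes per row means more rows per page and fewer pages per scan. This is a rough estimate only; it ignores page headers, record headers and fill factor (so real pages hold fewer rows), and the class name is hypothetical.

```java
// Rough estimate of pages touched by a full scan (illustration only).
class PageMath {
    static final int PAGE_BYTES = 16 * 1024;

    // Ignores all per-page and per-row overhead. Rows wider than a page are
    // counted as one page each; real InnoDB moves long columns to overflow pages.
    static long rowsPerPage(int rowBytes) {
        return Math.max(1, PAGE_BYTES / rowBytes);
    }

    static long pagesToScan(long rows, int rowBytes) {
        long perPage = rowsPerPage(rowBytes);
        return (rows + perPage - 1) / perPage; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000;
        System.out.println(pagesToScan(rows, 100));  // 6135
        System.out.println(pagesToScan(rows, 1000)); // 62500
    }
}
```

Making rows 10x wider costs roughly 10x the pages, and therefore roughly 10x the buffer pool needed to keep the same rows in memory.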
[Five diagrams; slide text not recoverable. Source for all five: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052]

How can we use InnoDB?
• Download Embedded InnoDB or HailDB
• Use the C API for access to InnoDB tables
• Innostore: Erlang library for InnoDB access (from Basho’s Riak NoSQL project)
• g414-inno: Open-Source Java access library for Embedded InnoDB

g414-inno Foundations
• Uses JNA (Java Native Access): Like JNI, but doesn’t provoke (as much) insanity
• JNAerator: creates a thin Java class wrapper from a C header file (innodb.h)
• But, complex C APIs are super ugly in Java
• Need to clean that up a bit...

g414-inno Library
• Provides a more Object-Oriented API to mask all of the JNA “Pointer” madness
• Transaction Objects, Cursors, Table Builder, Tuple Builder, Datatype Validation
• Java Enum Types for ‘int’ enums in C API
• inTransaction() templates (like Spring, JDBI)
• Contains sanity checks to prevent common errors (mostly C API order of operations)

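The inTransaction() idea, borrowed from Spring and JDBI, is a template method: the library owns begin/commit/rollback, and the caller supplies only the work. A minimal sketch, assuming hypothetical `Transaction` and `TransactionFactory` types (not g414-inno's real classes); a JNA-backed version would presumably wrap Embedded InnoDB's `ib_trx_begin`, `ib_trx_commit` and `ib_trx_rollback`.

```java
import java.util.function.Function;

// Template-method sketch; all type names here are hypothetical stand-ins.
final class TxTemplate {
    interface Transaction {
        void commit();
        void rollback();
    }

    interface TransactionFactory {
        Transaction begin();
    }

    private final TransactionFactory factory;

    TxTemplate(TransactionFactory factory) {
        this.factory = factory;
    }

    // Begin, run the callback, commit on success; roll back and rethrow
    // on any unchecked exception or error, so callers cannot forget cleanup.
    <T> T inTransaction(Function<Transaction, T> work) {
        Transaction txn = factory.begin();
        try {
            T result = work.apply(txn);
            txn.commit();
            return result;
        } catch (RuntimeException | Error e) {
            txn.rollback();
            throw e;
        }
    }

    // Usage with a fake transaction that only logs what happens.
    public static void main(String[] args) {
        TransactionFactory logging = () -> new Transaction() {
            public void commit() { System.out.println("commit"); }
            public void rollback() { System.out.println("rollback"); }
        };
        TxTemplate tx = new TxTemplate(logging);
        int n = tx.inTransaction(t -> 41 + 1);
        System.out.println(n);
        try {
            tx.inTransaction(t -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            System.out.println("caught " + expected.getMessage());
        }
    }
}
```

This is also where order-of-operations sanity checks naturally live: the template guarantees that every begin is paired with exactly one commit or rollback.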
Use Case: Voldemort
• Voldemort: High-Performance Key-Value Store (Amazon Dynamo clone)
• Nokia: good results with Voldemort on MySQL with InnoDB
• Typical features of DB (network connectivity, SQL language) not really necessary
• Thought: why bother with DB layer? The g414-inno project is born ...

Voldemort Storage Engines
• Trivial to integrate new persistence mechanisms with Voldemort
• 2 Classes: Config & Storage Engine
• Trivial InnoDB Table:
    key_      VARBINARY(200) NOT NULL
    version_  VARBINARY(200) NOT NULL
    value_    BLOB
    PRIMARY KEY (key_, version_)
• 3 Operations: put(k, v), get(k), delete(k)
• Complication: values are versioned (Versioned<V>), so one key can hold several versions, hence the (key_, version_) primary key

V Storage Engine: put
• put(byte[] key, byte[] version, byte[] value)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key:
    • if row.version is below, delete row
    • if row.version is above, throw exception
• Cursor.insert(key, version, value)

V Storage Engine: get
• get(byte[] key, byte[] version)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key: add to results
• Return results

V Storage Engine: delete
• delete(byte[] key)
• Start transaction, open table cursor
• Create search tuple for key
• Cursor.find(key)
• Foreach row matching key: delete row

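The put/get/delete steps above can be sketched in memory. A `TreeMap` sorted by (key, version) stands in for the InnoDB table with `PRIMARY KEY (key_, version_)`, and `subMap()` plays the role of `Cursor.find(key)` followed by iterating the matching rows. Simplifying assumptions, all labeled in the code: versions are plain `long`s rather than Voldemort's vector clocks (so concurrent versions are not modeled), an equal version is rejected, and the exception is a local stand-in named after Voldemort's `ObsoleteVersionException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory sketch of the storage-engine logic (hypothetical, not the real engine).
class VersionedStoreSketch {
    // Composite primary key (key_, version_), ordered like the clustered index.
    static final class RowKey implements Comparable<RowKey> {
        final byte[] key;
        final long version; // ASSUMPTION: a simple counter instead of a vector clock

        RowKey(byte[] key, long version) {
            this.key = key.clone();
            this.version = version;
        }

        @Override
        public int compareTo(RowKey o) {
            int c = Arrays.compareUnsigned(key, o.key); // byte-wise, like VARBINARY
            return c != 0 ? c : Long.compare(version, o.version);
        }
    }

    // Local stand-in, named after Voldemort's ObsoleteVersionException.
    static final class ObsoleteVersionException extends RuntimeException {
        ObsoleteVersionException(String msg) {
            super(msg);
        }
    }

    private final TreeMap<RowKey, byte[]> table = new TreeMap<>();

    // "Cursor.find(key)" + "foreach row matching key": live view of all versions of key.
    private NavigableMap<RowKey, byte[]> rowsMatching(byte[] key) {
        return table.subMap(new RowKey(key, Long.MIN_VALUE), true,
                            new RowKey(key, Long.MAX_VALUE), true);
    }

    void put(byte[] key, long version, byte[] value) {
        NavigableMap<RowKey, byte[]> rows = rowsMatching(key);
        // Rows are sorted by version, so the last is the newest. Checking it before
        // deleting anything leaves the map unchanged on failure, which a real engine
        // would get by rolling back the transaction. Equal versions are rejected here.
        if (!rows.isEmpty() && rows.lastKey().version >= version) {
            throw new ObsoleteVersionException(
                    "version " + version + " is not newer than " + rows.lastKey().version);
        }
        rows.clear();                               // delete all older versions
        table.put(new RowKey(key, version), value); // Cursor.insert(key, version, value)
    }

    List<byte[]> get(byte[] key) {
        return new ArrayList<>(rowsMatching(key).values());
    }

    void delete(byte[] key) {
        rowsMatching(key).clear();
    }
}
```

Clustering on (key_, version_) is what makes this cheap in InnoDB too: all versions of a key sit next to each other, so one cursor positioning plus a short scan finds them.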
V Storage Engine: TODO
• Perform Benchmarks (in EC2, local)
• Tuning / Optimization
• Clarify licenses (GPLv2 + Apache == ouch)
• Organize & streamline distribution

St8
• Simple, Open Source REST-based Storage Server
• Wraps InnoDB with thin “but pleasant” HTTP API
• Custom Tables using JSON table definitions
• Natural, JSON-based access to tables: CRUD, Index-based Query & Iteration
• Under the hood: Jetty, Jersey, Guice, Jackson, g414-inno, Embedded InnoDB

St8 Table Def
{
  "columns":[
    {"name":"key1","type":"INT","length":4},
    {"name":"key2","type":"VARCHAR","length":50},
    {"name":"val","type":"BLOB","length":0}
  ],
  "indexes":[
    {
      "name":"PRIMARY",
      "clustered":true,"unique":true,
      "indexColumns":[{"name":"key1"}]
    },
    {
      "name":"key2",
      "clustered":false,"unique":false,
      "indexColumns":[{"name":"key2"}]
    }
  ]
}

St8 Interface
• Operations for Table Management: create, describe, delete, truncate
• Operations for Data Management: Create, Retrieve, Update, Delete
• Influences g414-inno design: template methods for inTransaction(), insert, update, insertOrUpdate, delete, load
• Coming Soon: Query & Iteration APIs

St8: Sample Requests
SIMPLE GET:
curl "http://localhost:8080/d/atable;key1=123"

INSERT:
curl -X PUT "http://localhost:8080/d/atable;key1=123;key2=ABC;val=AVERYLONGDATA"

UPDATE:
curl -X POST "http://localhost:8080/d/atable;key1=123;key2=CDE;val=NEWDATA"

DELETE:
curl -X DELETE "http://localhost:8080/d/atable;key1=123"

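The same requests can be built from Java with `java.net.http` (Java 11+). This sketch only constructs the requests; the URL layout (`/d/<table>` plus `;column=value` matrix parameters) is copied from the curl lines, while the helper names are hypothetical and not part of St8. It assumes values are already URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers mirroring the curl examples; not part of St8 itself.
class St8Requests {
    // Builds <base>/d/<table>;col1=v1;col2=v2. Values are ASSUMED URL-safe
    // (no spaces, '/', ';' or '='); real code would percent-encode them.
    static URI rowUri(String base, String table, Map<String, String> columns) {
        StringBuilder sb = new StringBuilder(base).append("/d/").append(table);
        for (Map.Entry<String, String> col : columns.entrySet()) {
            sb.append(';').append(col.getKey()).append('=').append(col.getValue());
        }
        return URI.create(sb.toString());
    }

    // GET / PUT / POST / DELETE with an empty body: the data travels in the URL.
    static HttpRequest request(String method, URI uri) {
        return HttpRequest.newBuilder(uri)
                .method(method, HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("key1", "123");
        cols.put("key2", "ABC");
        cols.put("val", "AVERYLONGDATA");
        HttpRequest insert = request("PUT", rowUri("http://localhost:8080", "atable", cols));
        // prints "PUT http://localhost:8080/d/atable;key1=123;key2=ABC;val=AVERYLONGDATA"
        System.out.println(insert.method() + " " + insert.uri());
        // Sending requires a running St8 server, e.g.:
        // HttpClient.newHttpClient().send(insert, HttpResponse.BodyHandlers.ofString());
    }
}
```

A `LinkedHashMap` keeps the columns in the order they were added, so the generated URLs match the curl examples exactly.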
g414-inno: Faban Benchmark
• Row: 4-byte Key, 4096-byte value
• Insert Sequential, Random
• Single disk, 3-disk RAID 0, SSD
• TODO: Concurrent Benchmarks, Mixed Read/Write

Benchmark Results
Embedded InnoDB Latency (ms), Single-Threaded Benchmarks [bar chart; values below]

                            InsertSeq   InsertRnd   SelectRnd
Single Disk (OS X 1)           9.0         9.3         16
3-Disk RAID 0 (OS X 1)         0.47        1.4         5.2
SSD (OS X 2)                   0.51        1.2         0.71

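One way to read the results is as speedups relative to the single disk. The sketch below divides the reported single-disk latencies by the RAID 0 and SSD latencies; the numbers are copied from the Benchmark Results slide, and the class name is hypothetical. The SSD ran on a different machine (OS X 2), so its ratios mix storage and host differences.

```java
// Speedup of each configuration over the single disk, from the reported latencies (ms).
class BenchmarkRatios {
    static final String[] OPS = {"InsertSeq", "InsertRnd", "SelectRnd"};
    static final double[] SINGLE_DISK = {9.0, 9.3, 16};
    static final double[] RAID0 = {0.47, 1.4, 5.2};
    static final double[] SSD = {0.51, 1.2, 0.71};

    // Single-disk latency divided by the other configuration's latency, per operation.
    static double[] speedup(double[] other) {
        double[] ratios = new double[other.length];
        for (int i = 0; i < other.length; i++) {
            ratios[i] = SINGLE_DISK[i] / other[i];
        }
        return ratios;
    }

    public static void main(String[] args) {
        double[] raid = speedup(RAID0);
        double[] ssd = speedup(SSD);
        for (int i = 0; i < OPS.length; i++) {
            System.out.printf("%-9s  RAID0 %5.1fx  SSD %5.1fx%n", OPS[i], raid[i], ssd[i]);
        }
    }
}
```

RAID 0 helps most with sequential inserts, while the SSD's advantage is largest for random selects, which is consistent with random reads being dominated by seek time on spinning disks.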
Next Steps / Future Work
• Finish St8: Queries & Iteration, Benchmark
• Package / Qualify Voldemort Storage Engine
• Integrate with XtraBackup (hot backup)
• Integrate with Sqoop (Hadoop export)
• Explore more advanced App-Level Replication Support

Questions?
• Thank you for listening!

References / More Info
• Embedded InnoDB, HailDB (Drizzle)
• InnoDB Performance
• GitHub: g414-inno, st8, voldemort, xfaban
• Java Native Access (JNA)
• Tokyo BDB, Oracle BDB & BDB-JE
• Amazon Dynamo; Voldemort Project
