Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

InnoDB Magic

2,288 views

Published on

Published in: Technology

InnoDB Magic

  1. 1. g414-inno Embedded InnoDB,Voldemort, St8, and a Few Other Tidbits Sunny Gleason
  2. 2. What’s in this Preso • What is InnoDB? • Relation to MySQL & Other Products • InnoDB Model • g414-inno: a Java Access Library for InnoDB
  3. 3. What else is in this Preso • Creating a Voldemort Storage Engine with Embedded InnoDB • St8: A REST-based Storage Server • Faban Benchmark Results
  4. 4. What is InnoDB? • High-Performance “guts” of MySQL • Finely Tuned B-Tree Storage Engine • MVCC Transactional Store a la Jim Gray (“Transactional Processing Systems”) • Available Stand-Alone as Embedded InnoDB (stagnant) or HailDB (drizzle)
  5. 5. Relation to MySQL • One of many MySQL storage engines • Transactional, in contrast to MYISAM • Well-known, Bullet-Proof Backup, Failure & Recovery Modes • Advanced Buffer Pool Management (adaptive hash index, tunable LRU) • Online Backup Support (Xtrabackup / Hot)
  6. 6. Other Products • Tokyo BDB, Oracle BDB & BDB-JE • Schema-Free (No Structure / Data Types) • Lower Concurrency (fewer writers) • Performance Degradation in Larger DBs • (TODO: quantify performance gap - in meantime, see Dynamo & Voldemort)
  7. 7. InnoDB Model (Logical) • Database == Tablespace • Tablespace has Table(s) and Log(s) • Table has columns (rich datatypes) • Tables have a PRIMARY clustered index • Tables may have SECONDARY indexes • Row == Tuple • Tuples are stored / clustered by index sort • Secondary index stores full Primary Key
  8. 8. InnoDB Model (Txns) • Everything uses a Transaction • Isolation Level: Serialized, Read Committed, Read Uncommitted • Locks: Shared (Read-only), Exclusive (Read/Write) • Cursors provide access to tables: Lookup by index, Iteration / Traversal • Secondary index contains partial Tuples • Secondary cursor can access primary (full tuple)
  9. 9. InnoDB Model (Physical) • Tablespace is a collection of pages (16K) • Pages organized as a B-Tree: infimum & supremum keys, pointers to children • Pages contain row or index tuple data, or blob overflow data • Pages written to log first and flushed to tablespace based on ‘sync’ policy
  10. 10. Physical Considerations • New pages requested from OS in extend_size increments • OS Assigns space from file system / partition “free list” • Temporal Locality (pages close together) • Spatial Locality / Fragmentation from Updates • Prefer “narrow” rows / indexes: faster scan, keeps working set in-memory • Secondary “covering” indexes can save primary index access
  11. 11. !""#$%&$'(')'*+&,-.+* -"!./'0($('012%,$32" !"#$%&'$()*%#+(,% *++,-. %$"'() !"#$%"&' (&#& 0.%2 .!'$1 (!)#!*"&%+ !"1$%#5 /0 !""*(,-.!'$-/$%-#&,'$ ,3..$%5 3"(* '*41 0!,( .!'$1 !"#$%$ &!'() *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  12. 12. !""#$%&'()*+,-(.+, 0)>8$/.)?$ &$'($%# !"#$%# !"#$%# <$)=*%53$*/$'($%# A5%B8$)=*%53$*/$'($%# !"#$%# !"#$%# !"#$%# 4588>)?@*/$'($%# ;)'$ 456 456 456 01"*23 456 456 456 4588*.52%#$1 72$83*.52%#$1/ 456 456 72$83*9 72$83*: 72$83*% )%*$"#$%#*+*,-*.)'$/ *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  13. 13. !""#$%&'#() * !"#$%&'()*+, * -./0-12$3"456 #+,-./#( !"#$%&'( 012, * * 7890/:-2$3"456 #+,-./#( 012, !"#$%&'(&%'''')%*'+,''''!$--'./%''''0-&'./%1'''$2"%3-$45.67"'./%'88'09"-&'26-:"1 *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  14. 14. !""#$%&!"'()(*&+ ,-./0-1 !$020&-#3*&0-(&*2#-('& ! ! ."&24(&%+2-((&5(06& "#'(*&#6&0&758*2(-('& ."'() ! %+2-((&.*&#-90".:('& ;1&<-./0-1&=(1&#-& "#"+"855&8".>8(&=(1& #6&20;5(?&.6&'(6."('@& (5*(?&0"&."2(-"05& !"#$%&'&( )*'+,-'./0&.1 +2(&3 "#$%&#'()*+,- 7#58/"&3.24&A+;12(& BCDE!$&.*&0''('F *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  15. 15. !""#$%&!"'()(*&+ ,(-#"'./0 !"#$%&'&( !"#$%&'&( )*'+,-'./0&.1 )*'+,-'./0&.1 +2(&3 +2(&3 ! ,(-#"'./0&1"'()&%+ 2/((&3(.4&"#'(*& -#"2.1"5&4#/&(.-6&7(0& 8.39(5&26(&:/1;./0& 67%'&&/"&-8/25(&$9/!52%-+2+2:/(-%- 7(0*&#4&26(& -#//(*:#"'1"<&/#=*5& 9*('&2#&.--(**& -39*2(/1"<&1"'()&2#& #>2.1"&26(&'.2. ,(-#"'./0 !"#$% 67%'&&/"&-8/25(&$9/!52%-+2+2: ;<$ 4&!52(-'./+2(&3 4&!52(-'./+2(&3 *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  16. 16. How can we use InnoDB? • Download Embedded InnoDB or HailDB • Use C-API for access to InnoDB tables • Innostore: Erlang library for InnoDB access (from Basho’s Riak NoSQL project) • g414-inno: Open-Source Java access library for Embedded InnoDB
  17. 17. g414-inno Foundations • Uses JNA (Java Native Access): Like JNI, but doesn’t provoke (as much) insanity • JNAerate: creates thin Java Class wrapper from a C-based header file (innodb.h) • But, complex C API’s are super ugly in Java • Need to clean that up a bit...
  18. 18. g414-inno Library • Provides a more Object-Oriented API to mask all of the JNA “Pointer” madness • Transaction Objects, Cursors, Table Builder, Tuple Builder, Datatype Validation • Java Enum Types for ‘int’ enums in C API • inTransaction() templates (like Spring, JDBI) • Contains sanity checks to prevent common errors (mostly C API order of operations)
  19. 19. Use Case:Voldemort • Voldemort: High-Performance Key-Value Store (Amazon Dynamo clone) • Nokia: good results with Voldemort on MySQL with InnoDB • Typical features of DB (network connectivity, SQL language) not really necessary • Thought: why bother with DB layer? The g414-inno project is born ...
  20. 20. Voldemort Storage Engines • Trivial to integrate new persistence mechanisms with Voldemort • 2 Classes: Config & Storage Engine • Trivial InnoDB Table: key_ VARBINARY(200) NOT NULL version_ VARBINARY(200) NOT NULL value_ BLOB PRIMARY KEY (key_, version_) • 3 Operations: put(k, v), get(k), delete(k) • Complication: k is Versioned<Key>
  21. 21. V Storage Engine: put • put(byte[] key, byte[] version, byte[] value) • Start transaction, open table cursor • Create search tuple for key • Cursor.find(key) • Foreach row matching key if row.version is below, delete row if row.version is above, throw exception • Cursor.insert(key, version, value)
  22. 22. V Storage Engine: get • get(byte[] key, byte[] version) • Start transaction, open table cursor • Create search tuple for key • Cursor.find(key) • Foreach row matching key add to results • Return results
  23. 23. V Storage Engine: delete • delete(byte[] key) • Start transaction, open table cursor • Create search tuple for key • Cursor.find(key) • Foreach row matching key delete row
  24. 24. V Storage Engine: TODO • Perform Benchmarks (in EC2, local) • Tuning / Optimization • Clarify licenses (GPLv2 + Apache == ouch) • Organize & streamline distribution
  25. 25. St8 • Simple, Open Source REST-based Storage Server • Wraps InnoDB with thin “but pleasant” HTTP API • Custom Tables using JSON table definitions • Natural, JSON-based access to tables: CRUD, Index- based Query & Iteration • Under the hood: Jetty, Jersey, Guice, Jackson, g414- inno, Embedded InnoDB
  26. 26. St8 Table Def { "columns":[ {"name":"key1","type":"INT","length":4}, {"name":"key2","type":"VARCHAR","length":50}, {"name":"val","type":"BLOB","length":0} ], "indexes":[ { "name":"PRIMARY", "clustered":true,"unique":true, "indexColumns":[{"name":"key1"}] }, { "name":"key2", "clustered":false,"unique":false, "indexColumns":[{"name":"key2"}] } ] }
  27. 27. St8 Interface • Operations for Table Management: create, describe, delete, truncate • Operations for Data Management: Create, Retrieve, Update, Delete • Influences g414-inno design: template methods for inTransaction(), insert, update, insertOrUpdate, delete, load • Coming Soon: Query & Iteration APIs
  28. 28. St8: Sample Requests SIMPLE GET: curl "http://localhost:8080/d/atable;key1=123" INSERT: curl -X PUT "http://localhost:8080/d/atable;key1=123;key2=ABC;val=AVERYLONGDATA" UPDATE: curl -X POST "http://localhost:8080/d/atable;key1=123;key2=CDE;val=NEWDATA" DELETE: curl -X DELETE "http://localhost:8080/d/atable;key1=123"
  29. 29. g414-inno: Faban Benchmark • Row: 4-byte Key, 4096-byte value • Insert Sequential, Random • Single disk, 3-disk RAID 0, SSD • TODO: Concurrent Benchmarks, Mixed Read/Write
  30. 30. Benchmark Results Embedded InnoDB Latency (ms) 20 15 10 5 0 InsertSeq InsertRnd SelectRnd Single Disk (OS X 1) 3-Disk Raid 0 (OS X 1) SSD (OS X 2) Single-Threaded Benchmarks InsertSeq InsertRnd SelectRnd Single Disk (OS X 1) 9.0 9.3 16 3-Disk Raid 0 (OS X 1) 0.47 1.4 5.2 SSD (OS X 2) 0.51 1.2 0.71
  31. 31. Next Steps / Future Work • Finish St8: Queries & Iteration, Benchmark • Package / Qualify Voldemort Storage Engine • Integrate with Xtrabackup (hot backup) • Integrate with Sqoop (hadoop export) • Explore more advanced App-Level Replication Support
  32. 32. Questions? • Thank you for listening!
  33. 33. References / More Info • Embedded InnoDB, HailDB (drizzle) • InnoDB Performance • GitHub: g414-inno, st8, voldemort, xfaban • Java Native Access (JNA) • Tokyo BDB, Oracle BDB & BDB-JE • Amazon Dynamo;Voldemort Project

×