InnoDB Magic

Usage Rights

© All Rights Reserved

InnoDB Magic: Presentation Transcript

  • g414-inno: Embedded InnoDB, Voldemort, St8, and a Few Other Tidbits • Sunny Gleason
  • What’s in this Preso • What is InnoDB? • Relation to MySQL & Other Products • InnoDB Model • g414-inno: a Java Access Library for InnoDB
  • What else is in this Preso • Creating a Voldemort Storage Engine with Embedded InnoDB • St8: A REST-based Storage Server • Faban Benchmark Results
  • What is InnoDB? • High-Performance “guts” of MySQL • Finely Tuned B-Tree Storage Engine • MVCC Transactional Store a la Jim Gray (“Transaction Processing: Concepts and Techniques”) • Available Stand-Alone as Embedded InnoDB (stagnant) or HailDB (drizzle)
  • Relation to MySQL • One of many MySQL storage engines • Transactional, in contrast to MyISAM • Well-known, Bullet-Proof Backup, Failure & Recovery Modes • Advanced Buffer Pool Management (adaptive hash index, tunable LRU) • Online Backup Support (Xtrabackup / Hot)
  • Other Products • Tokyo BDB, Oracle BDB & BDB-JE • Schema-Free (No Structure / Data Types) • Lower Concurrency (fewer writers) • Performance Degradation in Larger DBs • (TODO: quantify performance gap - in meantime, see Dynamo & Voldemort)
  • InnoDB Model (Logical) • Database == Tablespace • Tablespace has Table(s) and Log(s) • Table has columns (rich datatypes) • Tables have a PRIMARY clustered index • Tables may have SECONDARY indexes • Row == Tuple • Tuples are stored / clustered by index sort • Secondary index stores full Primary Key
  • InnoDB Model (Txns) • Everything uses a Transaction • Isolation Level: Serializable, Repeatable Read, Read Committed, Read Uncommitted • Locks: Shared (Read-only), Exclusive (Read/Write) • Cursors provide access to tables: Lookup by index, Iteration / Traversal • Secondary index contains partial Tuples • Secondary cursor can access primary (full tuple)
  • InnoDB Model (Physical) • Tablespace is a collection of pages (16K) • Pages organized as a B-Tree: infimum & supremum keys, pointers to children • Pages contain row or index tuple data, or blob overflow data • Pages written to log first and flushed to tablespace based on ‘sync’ policy
  • Physical Considerations • New pages requested from OS in extend_size increments • OS Assigns space from file system / partition “free list” • Temporal Locality (pages close together) • Spatial Locality / Fragmentation from Updates • Prefer “narrow” rows / indexes: faster scan, keeps working set in-memory • Secondary “covering” indexes can save primary index access
  • InnoDB Database Files • [diagram not recoverable from the transcript] *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  • InnoDB Tablespaces • [diagram not recoverable from the transcript] *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  • InnoDB Logs • [diagram not recoverable from the transcript] *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  • InnoDB Indexes - Primary • Data rows are stored in the B-tree leaf nodes of a clustered index • B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added • [diagram not recoverable from the transcript] *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
  • InnoDB Indexes - Secondary • Secondary index B-tree leaf nodes contain, for each key value, the primary keys of the corresponding rows, used to access clustering index to obtain the data • [diagram not recoverable from the transcript] *source: http://www.mysqlconf.com/mysql2009/public/schedule/detail/7052
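
The relationship between the clustered primary index and a secondary index can be sketched in a few lines. This is a toy in-memory model, not InnoDB code: the `Table` class, its method names, and the use of a dict and a sorted list in place of B-trees are all assumptions made for illustration. It shows why a secondary lookup costs an extra hop through the primary index, and why a "covering" read can skip that hop.

```python
from bisect import bisect_left, insort


class Table:
    """Toy model: rows live in the clustered (primary) index, and each
    secondary index entry stores the indexed value plus the full primary key."""

    def __init__(self):
        self.clustered = {}   # primary key -> full row (stands in for the B-tree)
        self.secondary = []   # sorted list of (indexed value, primary key)

    def insert(self, pk, name, payload):
        # Assumes pk is new; updating an existing row is not modelled here.
        self.clustered[pk] = {"pk": pk, "name": name, "payload": payload}
        insort(self.secondary, (name, pk))

    def primary_keys_for(self, name):
        """Covering read: answered from the secondary index alone."""
        i = bisect_left(self.secondary, (name,))
        pks = []
        while i < len(self.secondary) and self.secondary[i][0] == name:
            pks.append(self.secondary[i][1])
            i += 1
        return pks

    def rows_for(self, name):
        """Non-covering read: secondary index first, then the clustered index."""
        return [self.clustered[pk] for pk in self.primary_keys_for(name)]
```

Calling `primary_keys_for` touches only the secondary structure, while `rows_for` needs one clustered-index lookup per matching entry, which is the access a covering index avoids.
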
  • How can we use InnoDB? • Download Embedded InnoDB or HailDB • Use C-API for access to InnoDB tables • Innostore: Erlang library for InnoDB access (from Basho’s Riak NoSQL project) • g414-inno: Open-Source Java access library for Embedded InnoDB
  • g414-inno Foundations • Uses JNA (Java Native Access): Like JNI, but doesn’t provoke (as much) insanity • JNAerator: creates thin Java Class wrapper from a C-based header file (innodb.h) • But, complex C APIs are super ugly in Java • Need to clean that up a bit...
  • g414-inno Library • Provides a more Object-Oriented API to mask all of the JNA “Pointer” madness • Transaction Objects, Cursors, Table Builder, Tuple Builder, Datatype Validation • Java Enum Types for ‘int’ enums in C API • inTransaction() templates (like Spring, JDBI) • Contains sanity checks to prevent common errors (mostly C API order of operations)
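
The inTransaction() template idea can be sketched as follows. This is a language-neutral illustration in Python, not the g414-inno API: `FakeTransaction`, `in_transaction`, and the `begin` callback are hypothetical names, and a real library would hand out a native transaction handle instead of the stand-in object used here.

```python
class FakeTransaction:
    """Stand-in for a real transaction handle; records what happened to it."""

    def __init__(self):
        self.state = "open"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def in_transaction(begin, work):
    """Run work(txn) inside a transaction: commit on success, roll back on error."""
    txn = begin()
    try:
        result = work(txn)
    except Exception:
        txn.rollback()   # undo partial work, then let the caller see the error
        raise
    txn.commit()
    return result
```

The point of the template is that callers supply only the `work` function, so the commit and rollback ordering cannot be forgotten or done in the wrong sequence.
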
  • Use Case: Voldemort • Voldemort: High-Performance Key-Value Store (Amazon Dynamo clone) • Nokia: good results with Voldemort on MySQL with InnoDB • Typical features of DB (network connectivity, SQL language) not really necessary • Thought: why bother with DB layer? The g414-inno project is born ...
  • Voldemort Storage Engines • Trivial to integrate new persistence mechanisms with Voldemort • 2 Classes: Config & Storage Engine • Trivial InnoDB Table: key_ VARBINARY(200) NOT NULL, version_ VARBINARY(200) NOT NULL, value_ BLOB, PRIMARY KEY (key_, version_) • 3 Operations: put(k, v), get(k), delete(k) • Complication: k is Versioned<Key>
  • V Storage Engine: put • put(byte[] key, byte[] version, byte[] value) • Start transaction, open table cursor • Create search tuple for key • Cursor.find(key) • For each row matching key: if row.version is below, delete row; if row.version is above, throw exception • Cursor.insert(key, version, value)
  • V Storage Engine: get • get(byte[] key, byte[] version) • Start transaction, open table cursor • Create search tuple for key • Cursor.find(key) • For each row matching key: add to results • Return results
  • V Storage Engine: delete • delete(byte[] key) • Start transaction, open table cursor • Create search tuple for key • Cursor.find(key) • For each row matching key: delete row
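
The put / get / delete steps on the three slides above can be sketched with an in-memory stand-in for the table. Several things here are assumptions rather than the author's code: versions are modelled as vector clocks in `{node: counter}` dicts instead of the byte arrays on the slides, `compare`, `ToyEngine`, and `ObsoleteVersionError` are hypothetical names, an equal version is treated the same as a newer one, and concurrent versions are kept side by side.

```python
class ObsoleteVersionError(Exception):
    """Raised when a put carries a version that an existing row supersedes."""


def compare(a, b):
    """Compare two vector clocks given as {node: counter} dicts.

    Returns "before", "after", "equal", or "concurrent" for a relative to b.
    """
    nodes = set(a) | set(b)
    a_bigger = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_bigger = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_bigger and b_bigger:
        return "concurrent"
    if a_bigger:
        return "after"
    if b_bigger:
        return "before"
    return "equal"


class ToyEngine:
    def __init__(self):
        self.rows = {}  # key -> list of (version, value); stands in for the table

    def put(self, key, version, value):
        kept = []
        for row_version, row_value in self.rows.get(key, []):
            order = compare(row_version, version)
            if order in ("after", "equal"):
                raise ObsoleteVersionError(key)   # existing row is not older
            if order == "concurrent":
                kept.append((row_version, row_value))
            # order == "before": the older row is dropped (the slide's "delete row")
        kept.append((version, value))
        self.rows[key] = kept   # assigned last, so a failed put changes nothing

    def get(self, key):
        return list(self.rows.get(key, []))

    def delete(self, key):
        return self.rows.pop(key, None) is not None
```

Because `put` only replaces the row list after every comparison has passed, a rejected write leaves the stored rows untouched, which mirrors what rolling back the transaction would do in the real engine.
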
  • V Storage Engine: TODO • Perform Benchmarks (in EC2, local) • Tuning / Optimization • Clarify licenses (GPLv2 + Apache == ouch) • Organize & streamline distribution
  • St8 • Simple, Open Source REST-based Storage Server • Wraps InnoDB with thin “but pleasant” HTTP API • Custom Tables using JSON table definitions • Natural, JSON-based access to tables: CRUD, Index-based Query & Iteration • Under the hood: Jetty, Jersey, Guice, Jackson, g414-inno, Embedded InnoDB
  • St8 Table Def { "columns":[ {"name":"key1","type":"INT","length":4}, {"name":"key2","type":"VARCHAR","length":50}, {"name":"val","type":"BLOB","length":0} ], "indexes":[ { "name":"PRIMARY", "clustered":true,"unique":true, "indexColumns":[{"name":"key1"}] }, { "name":"key2", "clustered":false,"unique":false, "indexColumns":[{"name":"key2"}] } ] }
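
A table definition in this shape is easy to sanity-check before sending it to a server. The sketch below parses the JSON from the slide and applies two checks. Both checks are assumptions chosen for illustration (exactly one clustered index, and every index column must be a declared column); the slides do not say what St8 itself validates, and `check_table_def` is a hypothetical helper, not part of St8.

```python
import json

# The table definition from the slide, unchanged apart from line breaks.
TABLE_DEF = """
{ "columns":[
    {"name":"key1","type":"INT","length":4},
    {"name":"key2","type":"VARCHAR","length":50},
    {"name":"val","type":"BLOB","length":0} ],
  "indexes":[
    { "name":"PRIMARY", "clustered":true, "unique":true,
      "indexColumns":[{"name":"key1"}] },
    { "name":"key2", "clustered":false, "unique":false,
      "indexColumns":[{"name":"key2"}] } ] }
"""


def check_table_def(text):
    """Parse a table definition and return a list of problems (empty if none)."""
    table = json.loads(text)
    columns = {c["name"] for c in table["columns"]}
    problems = []
    clustered = [i["name"] for i in table["indexes"] if i.get("clustered")]
    if len(clustered) != 1:
        problems.append(
            "expected exactly one clustered index, found %d" % len(clustered))
    for index in table["indexes"]:
        for col in index["indexColumns"]:
            if col["name"] not in columns:
                problems.append("index %s uses unknown column %s"
                                % (index["name"], col["name"]))
    return problems
```
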
  • St8 Interface • Operations for Table Management: create, describe, delete, truncate • Operations for Data Management: Create, Retrieve, Update, Delete • Influences g414-inno design: template methods for inTransaction(), insert, update, insertOrUpdate, delete, load • Coming Soon: Query & Iteration APIs
  • St8: Sample Requests • SIMPLE GET: curl "http://localhost:8080/d/atable;key1=123" • INSERT: curl -X PUT "http://localhost:8080/d/atable;key1=123;key2=ABC;val=AVERYLONGDATA" • UPDATE: curl -X POST "http://localhost:8080/d/atable;key1=123;key2=CDE;val=NEWDATA" • DELETE: curl -X DELETE "http://localhost:8080/d/atable;key1=123"
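
The sample requests follow one pattern: an HTTP method per operation, and column values appended to the table path as `;name=value` segments. A minimal sketch of building such requests is below. It only constructs the method and URL and sends nothing over the network. `build_request` and `METHODS` are hypothetical names, and percent-encoding the values is an assumption; the slides do not show how St8 handles reserved characters.

```python
from urllib.parse import quote

BASE = "http://localhost:8080/d"   # host, port, and path as in the sample requests

# Operation-to-method mapping taken from the sample requests.
METHODS = {"get": "GET", "insert": "PUT", "update": "POST", "delete": "DELETE"}


def build_request(operation, table, **columns):
    """Return (HTTP method, URL) with columns encoded as ;name=value segments."""
    params = "".join(
        ";%s=%s" % (quote(name, safe=""), quote(str(value), safe=""))
        for name, value in columns.items()
    )
    return METHODS[operation], "%s/%s%s" % (BASE, quote(table, safe=""), params)
```

For example, `build_request("get", "atable", key1=123)` yields the method and URL of the SIMPLE GET request on the slide.
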
  • g414-inno: Faban Benchmark • Row: 4-byte Key, 4096-byte value • Insert Sequential, Random • Single disk, 3-disk RAID 0, SSD • TODO: Concurrent Benchmarks, Mixed Read/Write
  • Benchmark Results • Embedded InnoDB Latency (ms), Single-Threaded Benchmarks • [bar chart of the values below, y-axis 0 to 20 ms]

    Configuration             InsertSeq   InsertRnd   SelectRnd
    Single Disk (OS X 1)      9.0         9.3         16
    3-Disk RAID 0 (OS X 1)    0.47        1.4         5.2
    SSD (OS X 2)              0.51        1.2         0.71
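
Since the benchmarks are single-threaded, each latency figure translates directly into an approximate request rate. The sketch below does that arithmetic with the numbers from the slide above. It assumes one thread issuing requests back to back with no think time, which the slides do not state, and `ops_per_second` is a hypothetical helper.

```python
# Latency in milliseconds, copied from the benchmark results slide.
LATENCY_MS = {
    "Single Disk (OS X 1)": {"InsertSeq": 9.0, "InsertRnd": 9.3, "SelectRnd": 16},
    "3-Disk RAID 0 (OS X 1)": {"InsertSeq": 0.47, "InsertRnd": 1.4, "SelectRnd": 5.2},
    "SSD (OS X 2)": {"InsertSeq": 0.51, "InsertRnd": 1.2, "SelectRnd": 0.71},
}


def ops_per_second(latency_ms):
    """Rate for one thread issuing back-to-back requests: 1000 ms / latency."""
    return 1000.0 / latency_ms


for config, results in LATENCY_MS.items():
    rates = ", ".join(
        "%s %.0f/s" % (op, ops_per_second(ms)) for op, ms in results.items())
    print("%s: %s" % (config, rates))
```
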
  • Next Steps / Future Work • Finish St8: Queries & Iteration, Benchmark • Package / Qualify Voldemort Storage Engine • Integrate with Xtrabackup (hot backup) • Integrate with Sqoop (Hadoop export) • Explore more advanced App-Level Replication Support
  • Questions? • Thank you for listening!
  • References / More Info • Embedded InnoDB, HailDB (drizzle) • InnoDB Performance • GitHub: g414-inno, st8, voldemort, xfaban • Java Native Access (JNA) • Tokyo BDB, Oracle BDB & BDB-JE • Amazon Dynamo; Voldemort Project