Os Zaitsev Presentation Transcript

  • Landscape of Open Source Transactional Storage Engines
    - Peter Zaitsev, Vadim Tkachenko
    - http://MySQLPerformanceBlog.com
  • About us
    - Founders of Percona Ltd: MySQL Performance Focused Consulting
    - http://www.MySQLPerformanceBlog.com authors
    - Worked for MySQL AB for years
    - Peter: lead of "High Performance Group"; Vadim: his right hand
    - Long-time MySQL users for a bunch of personally involved projects
  • MySQL pluggable architecture
  • MySQL Transactional Engines
    - BDB: Legacy Storage Engine, removed in 5.1, not tested
    - InnoDB: "Most popular" (the only commonly used) storage engine, by Innobase Oy.
    - SolidDB: Storage Engine from Solid Information Technology
    - PBXT: Storage Engine by SNAP Innovation (Paul McCullagh)
    - Falcon: New Storage Engine by MySQL AB, project led by Jim Starkey
    - NDB: MySQL Cluster is a whole other beast and not covered
  • InnoDB
    - http://www.innodb.com/
    - Mature Storage Engine, development started by Heikki Tuuri over 10 years ago.
    - Heikki was looking for ways to improve traditional database performance
    - Acquired by Oracle at the end of 2005
    - The only Transactional storage engine available in the MySQL 5.0 official release
  • solidDB
    - http://www.solidtech.com/ (solidDB for MySQL)
    - Open Sourced in 2006
    - Existing Storage Engine technology "integrated" with MySQL
    - Focused on reliability and Multiprocessor Scalability
    - Currently shipped as production ready.
  • PrimeBase XT (PBXT)
    - http://www.primebase.com/xt/
    - Written mainly by Paul McCullagh since 2005
    - Not a port of an existing storage engine to MySQL but a new writeup
    - Uses a number of unusual design decisions
    - Only 50% transactional
    - Focused on efficient BLOB storage
    - http://www.blobstreaming.org/
  • Falcon
    - http://dev.mysql.com/doc/falcon/en/index.html
    - Based on the "Netfrastructure" engine by Jim Starkey
    - Purchased by MySQL AB in early 2006
    - "Lightweight Design"
    - Focused on Transactional needs of Web Applications, efficient use of large amounts of memory
  • Design and Behavior
  • InnoDB design
    - MVCC and very efficient row level locks
    - Clustering by primary key, writes to same pages
    - Non-compressed secondary indexes with transaction info
    - Single tablespace or tablespace per table
    - Pessimistic locking
    - Instant Deadlock detection
    - Fuzzy Checkpointing
    - "Double Write" for partial page write protection
  • InnoDB
    - DEADLOCK detection:
        Session 1: BEGIN;
        Session 2: BEGIN;
        Session 1: UPDATE test SET name='random1-1' WHERE id=1;
        Session 2: UPDATE test SET name='random2-1' WHERE id=2;
        Session 1: UPDATE test SET name='random1-2' WHERE id=2;
        Session 2: UPDATE test SET name='random2-2' WHERE id=1;
    - InnoDB detects the deadlock (Error 1213) instantly in the second session
    - Pessimistic locking: UPDATE the same row in two concurrent transactions, and the second transaction waits on COMMIT/ROLLBACK in the first
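The deadlock scenario above can be sketched as a wait-for graph check. This is a toy Python illustration of instant cycle detection under stated assumptions, not InnoDB's actual code: `LockManager`, `lock_row`, and the session names are hypothetical.

```python
from typing import Dict, Optional, Set


class LockManager:
    """Toy row-lock table with instant deadlock detection (illustrative only)."""

    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}      # row id -> session holding the lock
        self.waits_for: Dict[str, str] = {}  # waiting session -> session it waits on

    def _creates_cycle(self, waiter: str, holder: str) -> bool:
        # Follow the wait-for chain from the holder; reaching the waiter means a cycle.
        seen: Set[str] = set()
        current: Optional[str] = holder
        while current is not None and current not in seen:
            if current == waiter:
                return True
            seen.add(current)
            current = self.waits_for.get(current)
        return False

    def lock_row(self, session: str, row_id: int) -> str:
        holder = self.owner.get(row_id)
        if holder is None or holder == session:
            self.owner[row_id] = session
            return "granted"
        if self._creates_cycle(session, holder):
            return "deadlock"  # the slide's Error 1213 case
        self.waits_for[session] = holder
        return "waiting"


locks = LockManager()
results = [
    locks.lock_row("session1", 1),  # UPDATE ... WHERE id=1
    locks.lock_row("session2", 2),  # UPDATE ... WHERE id=2
    locks.lock_row("session1", 2),  # blocks behind session2
    locks.lock_row("session2", 1),  # would close the cycle
]
print(results)
```

The fourth request is rejected at once because granting the wait would close a cycle, which is the behavior the slide attributes to the second session.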
  • InnoDB Strengths
    - Powerful MVCC
    - Good performance on a wide range of workloads
    - Great Stability
    - Great Data Protection
    - Primary Key Clustering allows a lot of optimizations
    - Transaction info in secondary indexes allows fast index-only scans
    - Adaptive Hash indexes and other advanced techniques
  • InnoDB Weaknesses
    - Slow Development pace in recent years
    - Still having scalability issues with multiple CPUs
    - Unscalable Auto-Increment, Broken Group Commit take very long to fix
    - Large footprint, especially for secondary indexes
    - It turns out not so large once we compare
    - Still messy integration with MySQL
    - How do you see how much space is free in the Innodb tablespace?
  • SolidDB Design
    - MVCC and Row level locking
    - Clustering by Primary Key
    - New data stored in new pages
    - "Bonsai Tree" used for MultiVersioning
    - OPTIMISTIC and PESSIMISTIC locking specified on table level
    - Online Backup (not usable for Slave creation)
    - Highly Available sync replication promised soon.
  • solidDB - PESSIMISTIC
    - DEADLOCK: detected in the first Session after 20 sec of waiting
    - Timeout based deadlocks
    - UPDATE two rows: second session waits on first
  • solidDB - OPTIMISTIC
    - DEADLOCK: detected in the second Session immediately, but with error 1205 (Lock wait timeout exceeded)
    - UPDATE two concurrent rows:
        SESSION 1: BEGIN;
        SESSION 2: BEGIN;
        SESSION 1: UPDATE test SET name = 'rnd' WHERE id=2;
        SESSION 2: UPDATE test SET name = 'rnd' WHERE id=2;
    - In Session 2 we got:
        ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
    - This is OK for OPTIMISTIC engines, but may cause trouble in Web applications.
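An application that expects this kind of error can restart the transaction, as the message suggests. The following is a minimal Python sketch of such a retry wrapper; `LockWaitTimeout` is a hypothetical stand-in for whatever exception a real driver raises for error 1205, and `run_with_retry` is an illustrative name, not part of any library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL error code 1205."""


def run_with_retry(transaction: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Run `transaction`, restarting it when a lock conflict is reported."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except LockWaitTimeout:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)  # simple linear back-off
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_update() -> str:
    # Simulated transaction body: conflicts twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise LockWaitTimeout("Lock wait timeout exceeded; try restarting transaction")
    return "committed"


outcome = run_with_retry(flaky_update)
print(outcome, calls["count"])
```

Web applications that issue statements without such a wrapper see the conflict as a failed request, which is the trouble the slide warns about.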
  • SolidDB Strengths and Weaknesses
    - Limited production usage to really tell
    - Out of the storage engines reviewed, most similar in design to Innodb
    - Choice of Optimistic vs Pessimistic is nice for some applications
    - No instant deadlock detection
    - So far available as a special download only (not even a plugin)
  • PBXT Design
    - MVCC with row level locking
    - "Per Database" Transactions
    - No real durability yet, weak crash recovery
    - OPTIMISTIC locking
    - Write once, write sequentially to log
    - Never update in place
    - Data cache + Key cache
    - Efficient BLOB Handling
  • PBXT
    - DEADLOCK: detected in the second session, 1213 error
    - UPDATE two concurrent rows: optimistic, second session gets:
        ERROR 1020 (HY000): Record has changed since last read in table 'test2'
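The "Record has changed since last read" outcome is what a version check produces under optimistic concurrency control. Below is a generic Python sketch of that idea, assuming a per-row version counter; it is not PBXT's implementation, and `VersionedTable` and `RecordChanged` are hypothetical names.

```python
from typing import Dict, Tuple


class RecordChanged(Exception):
    """Hypothetical error mirroring 'Record has changed since last read'."""


class VersionedTable:
    """Toy table where every row carries a version number."""

    def __init__(self) -> None:
        self.rows: Dict[int, Tuple[int, str]] = {}  # row id -> (version, value)

    def insert(self, row_id: int, value: str) -> None:
        self.rows[row_id] = (1, value)

    def read(self, row_id: int) -> Tuple[int, str]:
        return self.rows[row_id]

    def update(self, row_id: int, seen_version: int, new_value: str) -> None:
        # No lock is taken on read; the conflict is detected only at write time.
        current_version, _ = self.rows[row_id]
        if current_version != seen_version:
            raise RecordChanged("row %d changed since last read" % row_id)
        self.rows[row_id] = (current_version + 1, new_value)


table = VersionedTable()
table.insert(2, "original")
version_seen_by_1, _ = table.read(2)  # session 1 reads the row
version_seen_by_2, _ = table.read(2)  # session 2 reads the same row
table.update(2, version_seen_by_1, "random1")  # session 1 writes first
try:
    table.update(2, version_seen_by_2, "random2")  # session 2 holds a stale version
    second_outcome = "updated"
except RecordChanged:
    second_outcome = "record changed"
print(second_outcome)
```

Unlike the pessimistic case, the second session never waits; it fails at write time and has to retry.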
  • PBXT Strengths and Weaknesses
    - Not yet commonly used in production (we tried but got too many bugs)
    - Very good performance for some workloads
    - Efficient Storage, close to MyISAM
    - Focused on efficient BLOB handling, extra features like Blob Streaming
    - Still mainly a one man project
    - Large ToDo, a lot needs to be done, including Recovery
    - Potentially large Purging overhead
  • Falcon Design
    - MVCC, row level locking (in practice, not in theory)
    - PESSIMISTIC locking
    - Not clustered by primary key
    - Row cache (cache only the rows you need)
    - "Optimal" index traversal
    - "Data Compression": Nulls, Empty Strings
    - Always needs to read row data (because of index structure)
  • Falcon
    - DEADLOCK: in Session 2:
        ERROR 1020 (HY000): Record has changed since last read in table 'test2'
    - According to Ann Harrison, Falcon checks for cycles in the lock graph periodically rather than instantly on row lock wait
    - UPDATE: second session waits
  • Falcon Strengths and Weaknesses
    - Still Alpha with many bugs: early to judge
    - Very active support from MySQL AB
    - Fast development pace: bugs being fixed quickly, major performance improvement during the last 3 months
    - Good integration with MySQL, i.e. tables for performance data
    - No Primary key clustering or covering index support
    - Different design decisions can complicate migration from Innodb (though logical behavior became closer)
  • There are lies, big lies and there are Benchmarks
  • Benchmarks: things to note
    - Benchmarks may not be relevant for the performance of your application
    - Early versions were tried for Falcon and PBXT; their performance properties may change before production
    - There is not much experience out there in tuning Falcon, PBXT and Solid with MySQL, as they are barely used in production
    - We did fewer benchmarks than we wanted: spent a lot of time fighting/reporting bugs and checking fixes
  • Benchmarks
    - Read-Only on a typical table for a web application
    - DBT2: TPC-C emulation
    - Dell DVD Store: emulation of an e-commerce site
    - Sysbench: OLTP transactions
    - Sqlbench: small data set, single user, typical query patterns
  • Box
    - Dell PowerEdge 2950
    - CentOS release 4.5
    - 4 CPU
        model name : Intel(R) Xeon(R) CPU 5148 @ 2.33GHz
        stepping   : 6
        cpu MHz    : 2327.529
        cache size : 4096 KB
    - 16 GB of RAM
    - RAID 10 (6 10K RPM 3.5" SAS hard drives)
  • MySQL Versions
    - Yes, this means the version affects performance, not only the storage engine, but we could not get all storage engines working with the same MySQL version.
    - InnoDB and PBXT: 5.1.19
    - Falcon: 6.0.1-alpha, bk tree from 10-Jul
    - SolidDB: in 5.0.41-0073
  • Engines parameters
    - 12 GB of RAM for buffers
    - InnoDB
        --innodb_buffer_pool_size=12G
        --innodb_flush_method=O_DIRECT
        --innodb-log-file-size=100M
    - SolidDB
        --soliddb-cache-size=12G
    - Falcon
        --falcon_min_record_memory=2G
        --falcon_max_record_memory=4G
        --falcon_page_cache_size=8G
    - PBXT
        pbxt_index_cache_size=8G
        pbxt_record_cache_size=4G
  • DBT2 Configuration Details
    - DBT2: http://osdldbt.sourceforge.net/
    - 10 Concurrent users (about 2 for each CPU core and disk)
    - "Zero Delay" to fully load the MySQL Server
    - In the 400W configuration, available memory was reduced to 4G by locking 12GB of memory, to make it IO bound.
    - Buffer sizes were reduced to 2GB
  • DBT2: 10 warehouses
    - 10 warehouses, 10 clients (datasize ~700M)
    - Result in New Order Transactions Per Minute (NOTPM), more is better
    - PBXT crashed
    - Old version of Falcon had ~1100 NOTPM. Great improvement!
    [Chart: NOTPM by engine; legend InnoDB, SolidDB, Falcon, PBXT; bar labels 17744, 8209, 6097]
  • DBT2: 400 warehouses
    - Data size ~29GB
    - SolidDB crashed after 336 mins
    - Did not disable logs on SolidDB, to have things comparable.
    [Chart: Load time, min; legend InnoDB, PBXT, Falcon; bar labels 136, 63, 40]
  • DBT2, 400W, Data size
    - Surprisingly large size from PBXT
    - SolidDB: tables were loaded into MyISAM and then converted to SolidDB
    - It was crashing otherwise
    [Chart: Size of loaded data, MB; legend InnoDB, SolidDB, PBXT, Falcon; bar labels 42191, 41770, 38266, 30726]
  • DBT2, 400W, Results
    - PBXT crashed
    - Result in New Order Transactions Per Minute (NOTPM), more is better
    [Chart: NOTPM by engine; legend InnoDB, SolidDB, Falcon; bar labels 1105, 495, 178]
  • Dell DVD Store
    - Datasize: Medium, 1 GB; 2,000,000 Customers; 100,000 Products
    - Falcon: crashed
    - PBXT: a lot of errors
    - Result in New Orders per minute, more is better
    [Chart: orders per minute; legend InnoDB, SolidDB; bar labels 17589, 7594]
  • sysbench
    - Older Falcon used in this test. New one crashes :(
    - Couple of READ-ONLY queries against a typical table for Web applications (info of user account):
        CREATE TABLE IF NOT EXISTS sbtest (
          id int(10) unsigned NOT NULL auto_increment,
          name varchar(64) NOT NULL default '',
          email varchar(64) NOT NULL default '',
          password varchar(64) NOT NULL default '',
          dob date default NULL,
          address varchar(128) NOT NULL default '',
          city varchar(64) NOT NULL default '',
          state_id tinyint(3) unsigned NOT NULL default '0',
          zip varchar(8) NOT NULL default '',
          country_id smallint(5) unsigned NOT NULL default '0',
          PRIMARY KEY (id),
          KEY `country_id` (country_id,state_id,city)
        )
  • sysbench, read by primary key
    - SELECT name FROM sbtest WHERE id=?
    - Innodb and SolidDB have a sweet spot, being clustered by PK
    [Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256); legend Innodb, Falcon, SolidDB, PBXT]
  • sysbench, read by index
    - SELECT name FROM sbtest WHERE country_id=?
    - PBXT excels
    - Falcon comes next
    [Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256); legend Innodb, Falcon, SolidDB, PBXT]
  • sysbench, read by covered index
    - SELECT state_id FROM sbtest WHERE country_id=?
    - PBXT still best
    - Falcon can't use a covered index
    [Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256); legend Innodb, Falcon, SolidDB, PBXT]
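Why the query on this slide is "covered" while the previous one is not can be sketched as a set-containment check. This is a simplified Python illustration, not how any optimizer is implemented; `is_covered` is a hypothetical helper, and treating the primary key as implicitly available in a secondary index is an assumption that applies to engines clustered by primary key.

```python
from typing import Iterable, Sequence


def is_covered(
    index_columns: Sequence[str],
    needed_columns: Iterable[str],
    primary_key: Sequence[str] = (),
) -> bool:
    """Return True if every column the query touches can be read from the index alone.

    Assumption: for engines clustered by primary key, the secondary index
    also carries the primary key columns, so they count as available.
    """
    available = set(index_columns) | set(primary_key)
    return set(needed_columns) <= available


# KEY country_id (country_id, state_id, city) from the sbtest table definition
country_index = ("country_id", "state_id", "city")

# SELECT state_id FROM sbtest WHERE country_id=?
covered = is_covered(country_index, ["state_id", "country_id"])

# SELECT name FROM sbtest WHERE country_id=?
not_covered = is_covered(country_index, ["name", "country_id"])

print(covered, not_covered)
```

When the check fails, the engine has to fetch each matching row in addition to the index entry, which is the extra work the "read by index" test measures.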
  • sysbench, read by index, LIMIT 20
    - SELECT name FROM sbtest WHERE country_id=? LIMIT 20
    - Falcon does not optimize LIMIT
    - Innodb scales poorly
    [Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256); legend Innodb, Falcon, SolidDB, PBXT]
  • Sysbench OLTP
    - Datasize: 100,000,000 rows, ~25GB
    - Uniform distribution
    - IO-bound load
    - Read/write transactions
    - Reduced available memory by locking 12GB out of 16GB
  • Sysbench OLTP, time to load data
    - Using multi-value INSERTs rather than LOAD DATA INFILE
    - Solid and Falcon are even slower than Innodb, which is known to be slow compared to MyISAM for data load.
    [Chart: load time, sec; legend InnoDB, SolidDB, PBXT, Falcon; bar labels 3364, 2880, 1930, 1237]
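A multi-value INSERT packs several rows into one statement. The sketch below builds such a statement in Python, using the `%s` placeholder style common to Python MySQL drivers; `build_multi_insert` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from the benchmark.

```python
from typing import List, Sequence, Tuple


def build_multi_insert(
    table: str,
    columns: Sequence[str],
    rows: Sequence[Sequence[object]],
) -> Tuple[str, List[object]]:
    """Build one INSERT statement carrying many rows, plus its flat parameter list."""
    if not rows:
        raise ValueError("at least one row is required")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multi_insert(
    "sbtest",
    ["name", "email"],
    [("a", "a@example.com"), ("b", "b@example.com")],
)
print(sql)
print(params)
```

The statement and parameter list would then be passed together to a driver's execute call; batching rows this way cuts per-statement overhead compared with single-row INSERTs.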
  • Sysbench OLTP, Datasize
    - Comparison of storage of char and varchar columns in the table
    - Falcon uses dynamic length rows anyway
    - PBXT surprisingly has the same huge size in both cases
    [Chart: Datasize, varchar vs char, in GB; engines InnoDB, SolidDB, PBXT, Falcon; bar labels 26.44, 23.03, 23, 22.51, 14.8, 9.6, 8.71, 8.71]
  • Sysbench OLTP, results
    - Memory limited to 4GB, 2GB for buffers
    - Innodb and SolidDB benefit from clustering by primary key
    - All but Falcon scale well for an IO bound workload with this number of hard drives.
    [Chart: I/O bound; transactions/sec vs. clients (1, 4, 64); legend InnoDB, SolidDB, PBXT, Falcon; bar labels 46.24, 30.14, 26.11, 22.33, 19.06, 12.77, 10.62, 10.3, 5.8, 5.71, 4.86, 3.87]
  • Selected sql-bench results
    - Single operation repeated N times, total time in secs, less is better
    Operation                            |      1|      2|      3|
                                         |innodb_|pbxt_fa|soliddb|
    alter_table_add (100)                |   8.00|   3.00|  32.00|
    count (100)                          |  12.00|   8.00|  28.00|
    count_distinct (1000)                |   6.00|   8.00|  74.00|
    count_distinct_2 (1000)              |  11.00|  11.00|  16.00|
    count_group_on_key_parts (1000)      |   7.00|  10.00|  83.00|
    count_on_key (50100)                 |  70.00|  94.00| 210.00|
    delete_all_many_keys (1)             |  17.00|   2.00|  28.00|
    insert (350768)                      |   6.00|   5.00|  21.00|
    outer_join (10)                      |  14.00|   7.00|  61.00|
    select_key2_return_prim (200000)     |  30.00|  29.00|  25.00|
    select_many_fields (2000)            |   8.00|   6.00|   5.00|
    update_big (10)                      |  18.00|  56.00| 727.00|
    update_of_key_big (501)              |  19.00|   6.00| 165.00|
    update_of_primary_key_many_keys (256 |  44.00|  17.00|  55.00|
    update_with_key_prefix (100000)      |  19.00|   8.00|  10.00|
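The raw timings are easier to compare once expressed relative to one engine. The Python sketch below does that for three rows copied from the table above, using the first column (`innodb_`) as the baseline; the helper name `relative_to_first` is hypothetical, and the column order follows the table headers as they appear, including the truncated `pbxt_fa` label.

```python
from typing import Dict, Tuple

# (column 1 "innodb_", column 2 "pbxt_fa", column 3 "soliddb") timings in seconds,
# copied from three rows of the sql-bench table
timings: Dict[str, Tuple[float, float, float]] = {
    "update_big": (18.0, 56.0, 727.0),
    "count_distinct": (6.0, 8.0, 74.0),
    "select_many_fields": (8.0, 6.0, 5.0),
}


def relative_to_first(row: Tuple[float, ...]) -> Tuple[float, ...]:
    """Express each timing as a multiple of the first column (lower is faster)."""
    base = row[0]
    return tuple(round(value / base, 2) for value in row)


ratios = {operation: relative_to_first(row) for operation, row in timings.items()}

for operation, row in ratios.items():
    print(operation, row)
```

A ratio above 1 means the engine took longer than the baseline on that operation; a ratio below 1 means it was faster.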
  • Conclusion
    - All reviewed storage engines but InnoDB are currently too unstable for production use. SolidDB comes closest.
    - InnoDB is still the winner in the majority of tests
    - Falcon has serious issues with LIMIT optimization and IO bound scalability
    - PBXT and Falcon win in certain tests
    - SolidDB is currently an outsider in terms of Performance
    - Need to revisit when production versions of all storage engines are ready.
  • The End
    - Thanks for coming!
    - Slides will be published at http://www.mysqlperformanceblog.com/
    - Feel free to approach us with your questions
    - MySQL Performance Optimization Consulting Available
    - http://www.mysqlperformanceblog.com/mysql-consulting/
  • Sysbench OLTP, results, char
    - Datasize comparable with memory size
    [Chart: CPU bound; transactions/sec vs. clients (1, 4, 64); legend InnoDB, SolidDB, PBXT, Falcon; bar labels 36.71, 34.77, 29.36, 29.1, 25.11, 20.4, 18.75, 17.51, 17.27, 15.15, 13.81, 8.87]