VoltDB - Stonebraker Live! - New York City 2013

VoltDB’s Dr. Michael Stonebraker, of MIT, UC Berkeley, Ingres, and Postgres fame, presents the founding principles of solving modern data-velocity problems. Central to his beliefs: “Data is growing faster than hard drives,” “Move the computation to the data, never move the data to the computation,” “Bet on main memory because there's no other way to go fast,” and “Run transactions to completion, and you eliminate locking and multithreading.”

Not-so-coincidentally, they’re also central concepts of VoltDB.

Published in: Technology, Business

  1. VoltDB presents: Stonebraker Live! Navigating the Database Universe
  2. BRUCE READING, President and CEO
  3. Agenda • Traditional RDBMS is all wrong – Presented by Dr. Michael Stonebraker, Co-founder • Making sense of the database universe – Presented by Bruce Reading, President and CEO • Hello VoltDB 3.0 – Presented by Ryan Betts, Field CTO
  4. TRADITIONAL RDBMS WISDOM IS ALL WRONG, Dr. Michael Stonebraker
  5. Traditional RDBMS Wisdom • Data is in disk block format (heavily encoded) • With a main memory buffer pool of blocks • Query plans – Optimize CPU, I/O – Fundamental operation is read a row • Indexing via B-trees – Clustered or unclustered
  6. Traditional RDBMS Wisdom • Dynamic row-level locking • Aries-style write-ahead log • Replication (asynchronous or synchronous) – Update the primary first – Then move the log to other sites – And roll forward at the secondary(s)
  7. Traditional RDBMS Wisdom • Describes MySQL, DB2, Postgres, SQL Server, Oracle… • Focus of most college-level DBMS courses – Including M.I.T. • Focus of most DBMS textbooks
  8. Traditional RDBMS Wisdom • Is completely wrong • (More charitably) is obsolete
  9. The DBMS Marketplace • About 1/3 “data warehouses” – Lots of big reads – Bulk-loaded from OLTP systems • About 1/3 “OLTP” – Lots of small updates – And a few reads • About 1/3 “everything else” – Hadoop, NoSQL, graph DBMS, array DBMS…
  10. The DBMS Marketplace • Data warehouses – Market already moving strongly in the direction of column stores – Which have nothing to do with the traditional wisdom – Because column stores are 50–100X faster than row stores
  11. The Participants • Native column store vendors – HP/Vertica, SAP/HANA, Redshift (Amazon/ParAccel), SAP/Sybase/IQ • Native row store vendors – Microsoft, Oracle, DB2, Netezza • In transition – Teradata, Aster Data, Greenplum • If you are running a row store, then be prepared to switch!
  12. The DBMS Marketplace • OLTP – NewSQL systems are wildly faster than the traditional wisdom • Everything else – Not an RDBMS market
  13. OLTP Databases – 3 Big Decisions • Main memory vs. disk orientation • Replication strategy • Concurrency control strategy
  14. Reality Check on OLTP Databases • TP database size grows at the rate transactions increase • 1 Tbyte of main memory buyable for around $30K (or less) – (say) 64 Gbytes per server in 16 servers • 10+ Tbytes possible • If your data doesn’t fit in main memory now, then wait a couple of years and it will…
  15. Reality Check – Main Memory Performance • TPC-C CPU cycles • On the Shore DBMS prototype • “Elephants” should be similar
  16. To Go Fast • Must focus on overhead – B-trees affect a small fraction of the path length • Must get rid of all four pie slices – Anything less gives you a marginal win – TimesTen as an example
  17. Buffer Pool Overhead • Get rid of the buffer pool • i.e., run a main-memory DBMS – Like VoltDB
  18. Single Threading • Hosed unless you do this – Unless you get rid of queuing (somehow) – Or eliminate shared data structures (somehow) • VoltDB statically divides shared memory among the cores – And cores are single threaded
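
The static partitioning described on the Single Threading slide can be illustrated with a small sketch. This is not VoltDB code: the `Partition` class, the `deposit` transaction, and the CRC32-based routing are hypothetical names and choices made up for illustration. The only point it tries to show is that when each slice of data is owned by exactly one worker, transactions run to completion one at a time and the data itself needs no locks.

```python
import queue
import threading
import zlib


class Partition:
    """One slice of the data, owned by exactly one worker thread (hypothetical)."""

    def __init__(self):
        self.rows = {}               # touched only by this partition's worker
        self.inbox = queue.Queue()   # transactions wait here and run one at a time
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            txn = self.inbox.get()
            if txn is None:          # shutdown marker
                break
            txn(self.rows)           # run to completion before taking the next one


def partition_for(key, partitions):
    # Stable hash so the same key always routes to the same partition.
    return partitions[zlib.crc32(key.encode()) % len(partitions)]


def deposit(account, amount):
    # Returns a single-partition transaction as a closure over its arguments.
    def txn(rows):
        rows[account] = rows.get(account, 0) + amount
    return txn


def run_demo(num_partitions=4):
    partitions = [Partition() for _ in range(num_partitions)]
    for i in range(1000):
        account = f"acct-{i % 10}"
        partition_for(account, partitions).inbox.put(deposit(account, 1))
    for p in partitions:
        p.inbox.put(None)
        p.worker.join()
    merged = {}
    for p in partitions:
        merged.update(p.rows)
    return merged


if __name__ == "__main__":
    print(run_demo())
```

The sketch only covers transactions that touch a single partition; work that spans partitions needs coordination that is left out here.
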
  19. Concurrency Control • MVCC popular (NuoDB, Hekaton) • Time stamp order popular (VoltDB) • I don’t know anybody who is doing normal dynamic locking – It’s too slow!!!!
  20. Reality Check – High Availability (HA) • Requirement in today’s OLTP systems • Nobody will take downtime • Must be solved through replication
  21. How to Implement HA • I am only interested in ACID outcomes!!!! • Eventual consistency actually means “creates garbage” – Consider 2 customers at 2 sites, each buying the last “widget” • Even Jeff Dean (Google) has come around to this point of view
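
The “last widget” example can be made concrete with a toy sketch. The function names and the dictionary-based replicas below are assumptions for illustration only; real eventually consistent stores differ in how they merge divergent copies. The sketch shows just the core problem: two sites that each check only their local copy both report a successful sale of the same single item.

```python
def sell_locally(replica, item):
    """Sell one unit if this copy of the stock count says one is available."""
    if replica[item] > 0:
        replica[item] -= 1
        return True
    return False


def eventually_consistent():
    # Two sites hold replicas of the same logical stock count of 1.
    site_a = {"widget": 1}
    site_b = {"widget": 1}
    # Each customer is served by a different site before the replicas sync.
    return [sell_locally(site_a, "widget"), sell_locally(site_b, "widget")]


def serialized():
    # One authoritative copy; the two purchases run one after the other.
    stock = {"widget": 1}
    return [sell_locally(stock, "widget"), sell_locally(stock, "widget")]


if __name__ == "__main__":
    print(eventually_consistent())  # both sales succeed: one widget sold twice
    print(serialized())             # the second sale is refused
```
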
  22. How to Implement HA • Active-Passive – Effectively requires you to write a log – One of the four pie slices • Active-Active (VoltDB solution) – Send only the transaction, not the effect of the transaction – Allows read-queries to be sent to any replica
  23. Reality Check – Power Failures • What to do if you don’t have UPS… • Cannot lose data on a power failure!!!! • Two options – Bring back the log (and the pie slice) – Command log plus asynchronous checkpoints
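
A minimal sketch of the “command log plus asynchronous checkpoints” option, under stated assumptions: the `Database` class, the `procedure` registry, and the in-memory list standing in for a durable log are all hypothetical, and the checkpoint here is taken synchronously for simplicity. The idea it illustrates is that the log records which procedure was invoked with which arguments rather than the resulting row changes, so recovery loads the last checkpoint and re-runs the later commands. This only works if the procedures are deterministic.

```python
import copy

PROCEDURES = {}


def procedure(fn):
    """Register a deterministic stored procedure by name (hypothetical helper)."""
    PROCEDURES[fn.__name__] = fn
    return fn


@procedure
def credit(state, account, amount):
    state[account] = state.get(account, 0) + amount


@procedure
def transfer(state, src, dst, amount):
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount


class Database:
    def __init__(self):
        self.state = {}
        self.command_log = []        # (name, args) entries; stands in for a durable log
        self.checkpoint = ({}, 0)    # (snapshot, number of log entries it covers)

    def execute(self, name, *args):
        self.command_log.append((name, args))   # log the invocation, not the row changes
        PROCEDURES[name](self.state, *args)

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.state), len(self.command_log))

    def recover(self):
        snapshot, covered = self.checkpoint
        self.state = copy.deepcopy(snapshot)
        for name, args in self.command_log[covered:]:
            PROCEDURES[name](self.state, *args)  # replay commands after the checkpoint


def demo():
    db = Database()
    db.execute("credit", "alice", 100)
    db.take_checkpoint()
    db.execute("transfer", "alice", "bob", 30)
    before = dict(db.state)
    db.state = {}        # simulate losing main memory on a power failure
    db.recover()
    return before, db.state


if __name__ == "__main__":
    print(demo())
```
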
  24. Some Data From Nirmesh Malviya • Implemented Aries in VoltDB • Compared against the VoltDB command logging • Command logging about 3X faster in total throughput
  25. The Nail in the Coffin • Time stamp order compatible with active-active – As are any deterministic schemes • Locking and MVCC are not – Need a 2 phase commit between the replicas – Slow, slow, slow
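
Why a deterministic scheme fits active-active replication can be sketched in a few lines. The tuple layout `(timestamp, txn_id, key, delta)` and the function names are assumptions for illustration, not VoltDB's protocol. Each replica receives the same transactions in a different arrival order, executes them serially in timestamp order, and ends in the same state without exchanging any commit messages.

```python
import random


def apply_in_timestamp_order(transactions):
    """Execute transactions serially, ordered by (timestamp, txn id)."""
    state = {}
    for _ts, _txn_id, key, delta in sorted(transactions):
        # Deliberately non-commutative update, so execution order matters.
        state[key] = state.get(key, 0) * 2 + delta
    return state


def demo(seed=7):
    txns = [(ts, f"t{ts}", "counter", ts) for ts in range(1, 6)]
    rng = random.Random(seed)
    arrival_a = txns[:]
    arrival_b = txns[:]
    rng.shuffle(arrival_a)   # replica A sees one arrival order
    rng.shuffle(arrival_b)   # replica B sees another
    return apply_in_timestamp_order(arrival_a), apply_in_timestamp_order(arrival_b)


if __name__ == "__main__":
    replica_a, replica_b = demo()
    print(replica_a == replica_b)
```

With dynamic locking or MVCC, the outcome can depend on how concurrent transactions interleave on each replica, which is the reason the slide gives for needing a two-phase commit between replicas.
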
  26. Net-Net on OLTP • Main memory DBMS • Deterministic concurrency control • HA via active-active • Has nothing to do with the traditional wisdom • Even if your data is too big for main memory – The traditional wisdom is still wrong – Stay tuned for a paper on this topic
  27. Summary • What we teach our DBMS students is all wrong • Implementations from the “elephants” are all obsolete – One-size-does-not-fit-all – Several million lines of code per vendor are obsolete • I expect a lot of turmoil in the market off into the future
  28. MAKING SENSE OF THE DATABASE UNIVERSE, Bruce Reading
  29. The fact is… There’s only more and more to come. And it’s not slowing down… Record amounts of data are being created every day…
  30. And if that data is most valuable at the moment it’s created, how do you put it to use NOW? How do you automate decisioning against it NOW?
  31. NOW
  32. Imagine…
  33. Nice story. So what?
  34. Large, busy bank • Rogue trader • “Mistyped number”: small sum lost • “Mistyped number” & “Mistyped number”: small sum lost, small sum lost • Oblivious [Graphic: a long run of small losses accumulating]
  35. -$2BN • Large sum lost • Third largest loss in banking history
  36. UBS couldn't flag it among all the data... until it was too late.
  37. This is our world now.
  38. Same old, same old won’t cut it.
  39. What’s a developer to do?
  40. Data Value Chain: Interactive | Real-time Analytics | Record Lookup | Historical Analytics | Exploratory Analytics. Milliseconds | Hundredths of seconds | Second(s) | Minutes | Hours. • Place trade • Serve ad • Enrich stream • Examine packet • Approve trans. • Calculate risk • Leaderboard • Aggregate • Count • Retrieve clickstream • Show orders • Backtest algo • BI • Daily reports • Algo discovery • Log analysis • Fraud pattern match. Axis: Age of Data
  41. Data Value Chain: Interactive | Real-time Analytics | Record Lookup | Historical Analytics | Exploratory Analytics. Milliseconds | Hundredths of seconds | Second(s) | Minutes | Hours. • Place trade • Serve ad • Enrich stream • Examine packet • Approve trans. • Calculate risk • Leaderboard • Aggregate • Count • Retrieve clickstream • Show orders • Backtest algo • BI • Daily reports • Algo discovery • Log analysis • Fraud pattern match. Chart labels: Value of Individual Data Item, Data Value, Aggregate Data Value, Age of Data
  42. The Database Universe [Diagram labels: Traditional RDBMS; Simple, Slow, Small; Fast, Complex, Large; Application Complexity; Value of Individual Data Item; Aggregate Data Value; Data Value; Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics; Transactional, Analytic]
  43. The Database Universe [Diagram labels: Traditional RDBMS; Data Warehouse; Hadoop, etc.; NoSQL; NewSQL; Velocity; Simple, Slow, Small; Fast, Complex, Large; Application Complexity; Value of Individual Data Item; Aggregate Data Value; Data Value; Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics; Transactional, Analytic]
  44. VoltDB: The fastest, most scalable database on the market today • Ingest massive quantities of data and perform automated decisioning in real time • 3 MILLION transactions per second • Dramatically lowering your cost per transaction • VoltDB enables NOW. • A huge impact on the bottom line NOW
  45. PREVENT • ACHIEVE • Anything is possible…
  46. Electrical smart grids
  47. Micro-personalization
  48. Real-time display targeting
  49. Dynamic airline ticket purchasing
  50. State-of-the-art social networking
  51. Session management
  52. Network monitoring
  53. We enable NOW. www.VoltDB.com
  54. HELLO 3.0! Ryan Betts
  55. Introducing VoltDB 3.0 • VoltDB: a modern OLTP database built for a high velocity world. – Horizontal scalability – Hundreds of thousands of transactions per second – Relational SQL
  56. Latency and Throughput, 50-50 Read/Write Workload [Chart: latency (ms) vs. TPS; VoltDB 3.0 vs. v2.8.4.1; key/value 50/50 read/write workload; 3-node, K=1 cluster]
  57. Read/Write Workload Latency/Throughput [Chart: avg. latency (ms) vs. TPS; series: 10% read/90% write, 50% read/50% write, 90% read/10% write; VoltDB 3.0; key/value various read/write workloads; 3-node, K=1 cluster]
  58. Faster: Ad Hoc SQL Performance • Conversational SQL • Thousands to 10,000+ ad hoc SQL transactions/second • Single or multiple (batch) SQL statement transaction
  59. Easier Development: New SQL Support • SQL LIKE and NOT LIKE • UNION • Column Functions • Counting function (leaderboard ranking queries) • Ability to define index using column functions
  60. Easier Development: JSON Support • JSON values stored in a varchar column • Field() column function • Indexing on JSON elements: CREATE INDEX session_site_moderator ON user_session_table (field(json_data, site), field(json_data, moderator), username); • New JSON sample in kit
  61. Easier Development: Online Operations • Ability to re-join a failed node to cluster with no impact to existing operations • Online schema update • No service window
  62. Easier Development: Streamlined Development • Elimination of project.xml • VoltDB-specific configuration now defined in DDL • Defaulting of deployment.xml • New Volt Compiler CLI: voltdb compile
  63. Expanded Reach: Cloud-Friendly • Reduce impact of variable node performance and latency • Elimination of strict NTP configuration • Scales to large # of nodes
  64. Integration: High-Performance Export • Parallelized export • New connectors: JDBC, Netezza, Vertica
  65. Integration: Client Library Updates • New PHP Client • Node.js client v1.0 • Go Client • Coming soon: updated Erlang client • http://golang.org
  66. Other Notable New Features • Explain command • CSV loader utility • CSV snapshots • New Administration CLI: voltadmin – voltadmin save – voltadmin restore – voltadmin pause – voltadmin resume – voltadmin shutdown
  67. More Samples Available for Download • http://voltdb.com/community/volt-labs.php
  68. Volt University • Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly • Curriculum and supporting material range from beginner to advanced • Three types of instruction: – Volt University Online – Volt University Classroom – Volt Vanguard Certification
  69. Summary: VoltDB v3.0 • Run faster: transactions at high velocity scale. • Create faster: write and scale your ACID application. • Learn faster: Volt Labs & VoltDB University
  70. DOWNLOAD 3.0 at www.voltdb.com • Imagine the Possibilities
  71. More Information? • E-mail info@voltdb.com • Visit our forums: http://community.voltdb.com/forum • Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index • Follow @VoltDB on Twitter
  72. QUESTIONS?
  73. THANK YOU
