Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

"Navigating the Database Universe" was the topic of the Big Data Cloud meetup held on Jan 24th 2013 in Santa Clara, CA. This is the presentation made by Mike Stonebraker & Scott Jarr of VoltDB.

This meetup was sponsored by VoltDB.

Published in: Technology
  • done on the volt10's: Dell R510 server, 2 x Intel(R) Xeon(R) (quad core) CPU X5670 @ 2.93GHz, 64GB RAM

    1. VoltDB presents Stonebraker Live! Navigating the Database Universe
    2. SCOTT JARR, Co-founder and Chief Strategy Officer
    3. Agenda
        • The (proper) design of DBMSs
            – Presented by Dr. Michael Stonebraker, Co-founder
        • The database universe
            – Presented by Scott Jarr, Co-founder and Chief Strategy Officer
        • Introducing VoltDB 3.0
            – Presented by Mark Hydar, VP of Market Technology and Strategy
    4. We Believe…
        • “Big Data” is a rare, transformative market
        • Velocity is becoming the cornerstone
        • Specialized databases (working together) are the answer
        • Products must provide tangible customer value... Fast
    5. Dr. Michael Stonebraker: THE (PROPER) DESIGN OF THE DBMS
    6. Lessons from 40 Years of Database Design
        1. Get the user interaction right
            – Bet on a small number of easy-to-understand constructs
            – Plus standards
        2. Get the implementation right
            – Bet on a small number of easy-to-understand constructs
        3. One size does not fit all
            – At least not if you want fast, big or complex
        “Those who don’t learn from history are destined to repeat it.” -Winston Churchill
    7. #1: Get the User Interaction Right
        Historical Lesson: RDBMS vs. CODASYL vs. OODB
        • Winner: RDBMS
            – Simple data model (tables)
            – Simple access language (SQL)
            – ACID (transactions)
            – Standards (SQL)
        • Loser: CODASYL
            – Complicated data model (records; participate in “sets”; set has one owner and, perhaps, many members, etc.)
            – Messy access language (sea of “cursors”; some -- but not all -- move on every command, navigation programming)
        • Loser: OODBs
            – Complex data model (hierarchical records, pointers, sets, arrays, etc.)
            – Complex access language (navigation, through this sea)
            – No standards
    8. Interaction Take Away − Simple is Good
        • ACID was easy for people to understand
        • SQL provided a standard, high-level language and made people productive (transportable skills)
    9. #2: Get the Implementation Right (Historical Winners)
        • Leverage a few simple ideas: Early relational implementations
            – System R storage system dropped links
            – Views (protection, schema modification, performance)
            – Cost-based optimizer
        • Leverage a few simple ideas: Postgres
            – User-defined data types and functions (adopted by most everybody)
            – Rules/triggers
            – No-overwrite storage
        • Leverage a few simple ideas: Vertica
            – Store data by column
            – Compressed up the ging gong
            – Parallel load without compromising ACID
    10. #3: One Size Does NOT Fit All
        • OSFA is an old technology with hundreds of bags hanging off it
        • It breaks 100% of the time when under load
        • Load = size or speed or complexity
        • Load is increasing at a startling rate
        • Purpose-built will exceed by 10x to 100x
        • History has not been completely written yet…but let’s look at VoltDB as an example
        “…specialized systems can each be a factor of 50 faster than the single ‘one size fits all’ system…A factor of 50 is nothing to sneeze at.” -My Top 10 Assertions About Data Warehouses, 2010
    11. Example: VoltDB
        • Get the interface right
            – SQL
            – ACID
        • Implementation: Leverage a few simple ideas
            – Main memory
            – Stored procedures
            – Deterministic scheduling
        • Specialization
            – OLTP focus allowed for above implementation choices
    12. Proving the Theory
        • Challenge: OLTP performance
            – TPC-C CPU cycles
            – On the Shore DBMS prototype
            – Elephants should be similar
        [Chart: share of CPU cycles: Useful Work 4%, Recovery 24%, Latching 24%, Buffer Pool 24%, Locking 24%]
    13. Single Threaded
        • Gets rid of the latching problem
        • What about Multicore?
            – Divide the memory on an N-core node so it looks like N single-core nodes
            – Which are single threaded…
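The partitioning idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VoltDB's implementation: the names `Partition`, `Node`, `execute` and `increment` are hypothetical, data is routed by a simple hash of the key, and error handling is omitted. Each partition owns its slice of memory and exactly one worker thread, so transactions on a partition run one at a time and the data needs no latches.

```python
import queue
import threading


class Partition:
    """One slice of memory owned by exactly one worker thread (hypothetical sketch)."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:          # shutdown signal
                break
            txn, result = item
            # Only this thread ever touches self.data, so no latch is needed.
            result.put(txn(self.data))

    def submit(self, txn):
        result = queue.Queue(maxsize=1)
        self.inbox.put((txn, result))
        return result

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


class Node:
    """An N-core node treated as N independent single-threaded partitions."""

    def __init__(self, n_cores):
        self.partitions = [Partition() for _ in range(n_cores)]

    def execute(self, key, txn):
        # Route by key so every access to a given key lands on the same partition.
        partition = self.partitions[hash(key) % len(self.partitions)]
        return partition.submit(txn).get()

    def stop(self):
        for partition in self.partitions:
            partition.stop()


def increment(key):
    """Build a transaction that adds 1 to a counter and returns the new value."""
    def txn(data):
        data[key] = data.get(key, 0) + 1
        return data[key]
    return txn
```

Calling `node.execute("acct-1", increment("acct-1"))` from any number of client threads still produces a serial history on the owning partition, because the work is queued to a single thread and never shared.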
    14. Implementation Construct #1: Main Memory
        • Main memory format for data
            – Disk format gets you buffer pool overhead
        • What happens if data doesn’t fit?
            – Return to disk-buffer pool architecture (slow)
            – Anti-caching
                • Main memory format for data
                • When memory fills up, then bundle together elderly tuples and write them out
                • Run a transaction in “sleuth mode”; find the required records and move to main memory (and pin)
                • Run Xact normally
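The anti-caching steps on this slide can be made concrete with a toy Python model. It is a sketch under assumptions rather than the actual design: `AntiCacheStore`, `_Sleuth` and `EvictedAccess` are hypothetical names, a dictionary stands in for the on-disk blocks, and the oldest unpinned tuples are chosen as the "elderly" ones. A transaction first runs against a dry-run view that discards writes, any evicted block it touches is fetched and pinned, and only then does the transaction run for real.

```python
from collections import OrderedDict


class EvictedAccess(Exception):
    """Raised when a transaction touches a tuple that has been evicted."""

    def __init__(self, key):
        super().__init__(key)
        self.key = key


class _Sleuth:
    """Dry-run view of the store: finds and pins what a transaction needs."""

    def __init__(self, store):
        self.store = store
        self.overlay = {}                  # dry-run writes, thrown away afterwards

    def read(self, key):
        if key in self.overlay:
            return self.overlay[key]
        if key in self.store.evicted:
            raise EvictedAccess(key)
        self.store.pinned.add(key)
        return self.store.hot[key]

    def write(self, key, value):
        if key in self.store.evicted:
            raise EvictedAccess(key)
        self.store.pinned.add(key)
        self.overlay[key] = value


class AntiCacheStore:
    def __init__(self, capacity, block_size):
        self.hot = OrderedDict()           # key -> value in main memory, coldest first
        self.cold_blocks = {}              # block id -> {key: value}; stands in for disk
        self.evicted = {}                  # key -> id of the block that holds it
        self.pinned = set()                # keys that must stay in memory for now
        self.capacity = capacity
        self.block_size = block_size
        self.next_block_id = 0

    def read(self, key):
        if key in self.evicted:
            raise EvictedAccess(key)
        self.hot.move_to_end(key)          # mark as recently used
        return self.hot[key]

    def write(self, key, value):
        if key in self.evicted:
            raise EvictedAccess(key)
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict_if_full()

    def _evict_if_full(self):
        while len(self.hot) > self.capacity:
            victims = [k for k in self.hot if k not in self.pinned][: self.block_size]
            if not victims:
                return                     # everything left is pinned
            block_id = self.next_block_id
            self.next_block_id += 1
            # Bundle the coldest tuples into one block and write it out.
            self.cold_blocks[block_id] = {k: self.hot.pop(k) for k in victims}
            for k in victims:
                self.evicted[k] = block_id

    def _fetch_block(self, key):
        block = self.cold_blocks.pop(self.evicted[key])
        for k, v in block.items():
            del self.evicted[k]
            self.hot[k] = v
            self.pinned.add(k)

    def run(self, txn):
        while True:                        # "sleuth mode": repeat until nothing is missing
            try:
                txn(_Sleuth(self))
                break
            except EvictedAccess as miss:
                self._fetch_block(miss.key)
        try:
            return txn(self)               # run the transaction normally
        finally:
            self.pinned.clear()
            self._evict_if_full()
```

With `capacity=4` and `block_size=2`, writing six keys evicts the first two as one block; a later transaction that reads one of them triggers a single block fetch during the dry run and then completes entirely in memory.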
    15. Implementation Construct #2: Stored Procedures
        • Round trip to the DBMS is expensive
            – Do it once per transaction
            – Not once per command
            – Or even once per cursor move
        • Ad-hoc queries supported
            – Turn them into dynamic stored procedures
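To make the round-trip argument concrete, the following Python sketch counts client/server round trips for the same transfer done statement by statement and as one stored procedure. Everything here is hypothetical and simplified: `Server`, `execute`, `register`, `call` and the account data are illustrative names, not a VoltDB API, and the "network" is just a counter.

```python
class Server:
    """Toy database server that counts client round trips (hypothetical sketch)."""

    def __init__(self):
        self.balances = {"alice": 100, "bob": 0}
        self.round_trips = 0
        self.procedures = {}

    def execute(self, statement, *args):
        """One round trip per individual statement."""
        self.round_trips += 1
        return statement(self.balances, *args)

    def register(self, name, procedure):
        self.procedures[name] = procedure

    def call(self, name, *args):
        """One round trip for the whole transaction."""
        self.round_trips += 1
        return self.procedures[name](self.balances, *args)


def get_balance(balances, account):
    return balances[account]


def set_balance(balances, account, value):
    balances[account] = value


def transfer(balances, src, dst, amount):
    """All statements of the transaction run server-side in one call."""
    if balances[src] < amount:
        return False
    balances[src] -= amount
    balances[dst] += amount
    return True


# Statement-at-a-time: the client drives every step over the network.
chatty = Server()
src_balance = chatty.execute(get_balance, "alice")
dst_balance = chatty.execute(get_balance, "bob")
if src_balance >= 30:
    chatty.execute(set_balance, "alice", src_balance - 30)
    chatty.execute(set_balance, "bob", dst_balance + 30)

# Stored procedure: the same logic costs a single round trip.
batched = Server()
batched.register("transfer", transfer)
batched.call("transfer", "alice", "bob", 30)
```

Both servers end in the same state, but the first paid four round trips and the second paid one.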
    16. Implementation Construct #3: Deterministic Scheduling
        • Transactions are ordered and run to completion
            – No locking
        • Active-active replication (HA)
            – Run transaction at all replicas
            – in the same pre-determined order
        • What about a cluster-wide power failure?
            – Asynchronous checkpointing
            – With a command log
            – Wildly faster than data logging
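A small Python sketch can show why ordered, deterministic transactions make both replication and command logging work. It rests on assumptions: `Replica`, `Cluster`, `PROCEDURES` and the banking procedures are hypothetical names, the checkpoint is taken synchronously (the asynchronous part is not modeled), and the log lives in memory rather than on disk. The log records only the procedure name and its arguments; because every replica applies the same commands in the same order, replaying the log tail on top of a checkpoint reproduces the same state.

```python
import copy


def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount


def transfer(state, src, dst, amount):
    if state.get(src, 0) >= amount:       # deterministic: depends only on state and args
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount


PROCEDURES = {"deposit": deposit, "transfer": transfer}


class Replica:
    def __init__(self):
        self.state = {}
        self.applied = 0                   # number of commands applied so far

    def apply(self, command):
        name, args = command
        PROCEDURES[name](self.state, *args)   # runs to completion, no locking
        self.applied += 1


class Cluster:
    def __init__(self, n_replicas):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.command_log = []              # ordered (procedure name, args) entries
        self.checkpoint = (0, {})          # (commands covered, state snapshot)

    def submit(self, name, *args):
        command = (name, args)
        self.command_log.append(command)   # log the command, not the changed data
        for replica in self.replicas:      # every replica runs it in the same order
            replica.apply(command)

    def take_checkpoint(self):
        source = self.replicas[0]
        self.checkpoint = (source.applied, copy.deepcopy(source.state))

    def recover(self):
        """Rebuild state after a cluster-wide failure from checkpoint plus log tail."""
        applied, snapshot = self.checkpoint
        replica = Replica()
        replica.state = copy.deepcopy(snapshot)
        replica.applied = applied
        for command in self.command_log[applied:]:
            replica.apply(command)
        return replica
```

Each log entry stays small regardless of how many rows a transaction touches, which is the intuition behind the slide's claim that command logging is much faster than data logging.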
    17. Result of Design Principles: VoltDB Example
        • Good interface decisions – made developers more productive
            – SQL & ACID
        • Leveraging a few simple implementation ideas – made VoltDB wicked fast
            – Main memory
            – Stored procedures
            – Deterministic scheduling
    18. Proving the Theory
        • Answer: OLTP performance
            – 3 million transactions per second
            – 7x Cassandra
            – 15 million SQL statements per second
            – 100,000+ transactions per commodity server
        “…we are heading toward a world with at least 5 (and probably more) specialized engines and the death of the ‘one size fits all’ legacy systems.” -The End of an Architectural Era (It’s Time for a Complete Rewrite), 2007
    19. Scott Jarr: THE DATABASE UNIVERSE
    20. Technology Meets the Market
        • Believe
            – “Big Data” is a rare, transformative market
            – Velocity is becoming the cornerstone
            – Specialized databases (working together) are the answer
            – Products must provide tangible customer value… Fast
        • Observations
            – Noisy, crowded and new – kinda like Christmas shopping at the mall
            – Everyone wants to understand where the pieces fit
            – Analysts build maps on technology NOT use cases
        • What we need is…
    21. Data Value Chain
        [Figure: five stages along an Age of Data axis]
        • Interactive (Milliseconds): Place trade, Serve ad, Enrich stream, Examine packet, Approve trans.
        • Real-time Analytics (Hundredths of seconds): Calculate risk, Leaderboard, Aggregate, Count
        • Record Lookup (Second(s)): Retrieve click stream, Show orders
        • Historical Analytics (Minutes): Backtest algo, BI, Daily reports
        • Exploratory Analytics (Hours): Algo discovery, Log analysis, Fraud pattern match
    22. Data Value Chain
        [Figure: five stages along an Age of Data axis, with a vertical Data Value axis and two curves: Value of Individual Data Item and Aggregate Data Value]
        • Interactive (Milliseconds): Place trade, Serve ad, Enrich stream, Examine packet, Approve trans.
        • Real-time Analytics (Hundredths of seconds): Calculate risk, Leaderboard, Aggregate, Count
        • Record Lookup (Second(s)): Retrieve click stream, Show orders
        • Historical Analytics (Minutes): Backtest algo, BI, Daily reports
        • Exploratory Analytics (Hours): Algo discovery, Log analysis, Fraud pattern match
    23. The Database Universe
        [Figure: axes run from Simple / Slow / Small to Complex / Fast / Large; axis and curve labels: Application Complexity, Data Value, Value of Individual Data Item, Aggregate Data Value; plotted region: Traditional RDBMS; horizontal groupings: Transactional, Analytic, Exploratory; stages: Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics]
    24. The Database Universe
        [Figure: axes run from Simple / Slow / Small to Complex / Fast / Large; axis and curve labels: Application Complexity, Data Value, Value of Individual Data Item, Aggregate Data Value, Velocity; plotted regions: Traditional RDBMS, NewSQL, NoSQL, Data Warehouse, Hadoop, etc.; horizontal groupings: Transactional, Analytic, Exploratory; stages: Interactive, Real-time Analytics, Record Lookup, Historical Analytics, Exploratory Analytics]
    25. Closed-loop Big Data
        [Figure: inputs (logins, trades, authorizations, clicks, sensors, orders, impressions) flow through Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics]
    26. Closed-loop Big Data
        [Figure: inputs (logins, trades, authorizations, clicks, sensors, orders, impressions) flow through Interactive & Real-time Analytics, Historical Reports & Analytics, and Exploratory Analytics; label: Knowledge]
        • Make the most informed decision every time there is an interaction
        • Real-time decisions are informed by operational analytics and past knowledge
    27. The Velocity Use Case
        • What’s it look like?
            – High throughput, relentless data feeds
            – Fast decisions on high-value data
            – Real-time, operational analytics present immediate visibility
        • What’s the big deal?
            – Batch visibility converts to real time = immediate business impact
            – Decisions made at time of event = higher impact decisions with immediate returns
            – Ability to ingest and manage massive amounts of data = business differentiation and disruption
    28. Mark Hydar: HELLO 3.0!
    29. Introducing VoltDB 3.0
        • Available now!
            – Both commercial and open source offerings
            – www.voltdb.com/downloads
        • Key improvements
            – Even faster
            – Easier to build high-velocity applications
            – Expanded reach across developers and applications
            – Extensible to integrate with existing data infrastructure
    30. Latency and Throughput, 50-50 Read/Write Workload
        [Chart: VoltDB 3.0 vs. v2.8.4.1, Key/Value 50/50 read/write workload, 3 Node, K=1 Cluster; y-axis: Latency (ms); x-axis: TPS; series: 3.0, 2.8.4.1]
    31. Read/Write Workload Latency/Throughput
        [Chart: VoltDB 3.0 Key/Value various read/write workload, 3 Node, K=1 Cluster; y-axis: Avg. Latency (ms); x-axis: TPS; series: 10% read/90% write, 50% read/50% write, 90% read/10% write]
    32. Faster: Ad Hoc SQL Performance
        • Conversational SQL
        • Thousands to 10,000+ ad hoc SQL transactions/second
        • Single or multiple (batch) SQL statement transaction
    33. Easier Development: New SQL Support
        • SQL LIKE and NOT LIKE
        • UNION
        • Column Functions
        • Counting function (leaderboard ranking queries)
        • Ability to define index using column functions
    34. Easier Development: JSON Support
        • JSON values stored in a varchar column
        • Field() column function
        • Indexing on JSON elements
            CREATE INDEX session_site_moderator ON user_session_table (field(json_data, 'site'), field(json_data, 'moderator'), username);
        • New JSON sample in kit
    35. Easier Development: Online Operations
        • Ability to re-join a failed node to cluster with no impact to existing operations
        • Online schema update
        • No service window
    36. Easier Development: Streamlined Development
        • Elimination of project.xml
        • VoltDB-specific configuration now defined in DDL
        • Defaulting of deployment.xml
        • New Volt Compiler CLI: voltdb compile
    37. Expanded Reach: Cloud-Friendly
        • Reduce impact of variable node performance and latency
        • Elimination of strict NTP configuration
        • Scales to large # of nodes
    38. Integration: High-Performance Export
        • Parallelized export
        • New connectors: JDBC, Netezza, Vertica
    39. Integration: Client Library Updates
        • New PHP Client
        • Node.js client v1.0
        • Go Client (http://golang.org)
        • Coming soon: updated Erlang client
    40. Other Notable New Features
        • Explain command
        • CSV loader utility
        • CSV snapshots
        • New Administration CLI: voltadmin
            – voltadmin save
            – voltadmin restore
            – voltadmin pause
            – voltadmin resume
            – voltadmin shutdown
    41. More Samples Available for Download
        http://voltdb.com/community/volt-labs.php
    42. Volt University
        • Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly
        • Curriculum and supporting material range from beginner to advanced
        • Three types of instruction:
            – Volt University Online
            – Volt University Classroom
            – Volt Vanguard Certification
    43. Summary: VoltDB v3.0 Features
        • Even faster
        • Easier to build high-velocity applications
        • Expanded reach across developers and applications
        • Extensible to integrate with existing data infrastructure
        • Volt Labs
        • Volt University
    44. Imagine the Possibilities: DOWNLOAD 3.0 at www.voltdb.com
    45. More Information?
        • E-mail info@voltdb.com
        • Visit our forums: http://community.voltdb.com/forum
        • Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index
        • Follow @VoltDB on Twitter
    46. QUESTIONS?
    47. THANK YOU
