VoltDB Application Development
This presentation by Tim Callaghan gives a technical overview of VoltDB, covers its "Do's and Don'ts," and shows how to build a VoltDB application.

VoltDB Application Development: Presentation Transcript

  • Building VoltDB Applications
    Tim Callaghan, VoltDB Field Engineer
    tcallaghan@voltdb.com
  • Agenda
    Who am I?
    VoltDB Technical Overview
    VoltDB Do’s and Don’ts
    Building a VoltDB Application
    Q & A
  • Who am I?
    Tim Callaghan
    VoltDB Field Engineer and Community Advocate
    Supporting both the commercial and community customers
    Technical background
    18 years of Oracle design, development, and administration
    user/abuser of many programming languages
    Joined VoltDB in September, 2009
    Full contact information at end
  • VoltDB Technical Overview
  • Technical Overview – 1 Slide
    VoltDB avoids the overhead plaguing traditional databases…
    K-safety for fault tolerance
    - no logging
    In memory operation for maximum throughput
    - no buffer management
    Partitions operate autonomously and single-threaded
    - no latching or locking
    Built to horizontally scale
  • Technical Overview – Partitions (1/3)
    Approximately 1 partition per CPU “core”
    Data - Two types of tables
    Partitioned
    Rows exist within a single VoltDB partition
    Single table column serves as partitioning key
    High frequency of modification (transactional data)
    Replicated
    Rows exist within all VoltDB partitions
    Low frequency of modification (lookup tables: city, state, …)
    Code - Two types of work – both ACID
    Single-Partition
    All insert/update/delete operations operate on partitioned data only
    All partitioned data is local
    Select – all required data is satisfied by a single partition
    Multi-Partition
    Not all partitioned data is local, or
    Insert/update/delete of replicated data
  • Technical Overview – Partitions (2/3)
    Looking inside a VoltDB partition…
    Each partition contains data and an execution engine.
    The execution engine contains a queue for transaction requests.
    Requests are executed sequentially (single threaded).
    [Diagram: a partition containing a work queue, an execution engine, table data, and index data]
  • Technical Overview – Partitions (3/3)
    Single-partition vs. Multi-partition
    table orders (partitioned): customer_id (partition key), order_id, product_id
    table products (replicated): product_id, product_name
    Partition 1
    orders: (1, 101, 2), (1, 101, 3), (4, 401, 2)
    products: (1, knife), (2, spoon), (3, fork)
    Partition 2
    orders: (2, 201, 1), (5, 501, 3), (5, 502, 2)
    products: (1, knife), (2, spoon), (3, fork)
    Partition 3
    orders: (3, 201, 1), (6, 601, 1), (6, 601, 2)
    products: (1, knife), (2, spoon), (3, fork)
    select count(*) from orders where customer_id = 5
    single-partition
    select count(*) from orders where product_id = 3
    multi-partition
    insert into orders (customer_id, order_id, product_id) values (3,303,2)
    single-partition
    update products set product_name = 'spork' where product_id = 3
    multi-partition
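    The single-partition vs. multi-partition distinction on this slide can be sketched in plain Java. This is only an illustration under stated assumptions: the class PartitionRouting, its methods, and the modulo routing rule are hypothetical and were chosen to reproduce the slide's layout (customers 1 and 4 in partition 1, and so on). It is not VoltDB's actual hashing or planner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Illustrative routing sketch; not VoltDB's actual hashing or planner. */
public class PartitionRouting {

    // Hypothetical rule that reproduces the slide's layout:
    // customers 1 and 4 -> partition 1, 2 and 5 -> partition 2, 3 and 6 -> partition 3.
    static int partitionFor(long customerId, int partitionCount) {
        return (int) Math.floorMod(customerId - 1, (long) partitionCount) + 1;
    }

    // A statement that constrains the partition key (customer_id) touches one
    // partition; a statement that does not must visit every partition.
    static List<Integer> partitionsFor(OptionalLong customerId, int partitionCount) {
        List<Integer> touched = new ArrayList<>();
        if (customerId.isPresent()) {
            touched.add(partitionFor(customerId.getAsLong(), partitionCount));
        } else {
            for (int p = 1; p <= partitionCount; p++) {
                touched.add(p);
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        // select count(*) from orders where customer_id = 5
        System.out.println(partitionsFor(OptionalLong.of(5), 3));   // prints [2]
        // select count(*) from orders where product_id = 3 (no partition key)
        System.out.println(partitionsFor(OptionalLong.empty(), 3)); // prints [1, 2, 3]
    }
}
```

    The insert of (3,303,2) is single-partition for the same reason: customer_id = 3 names exactly one partition.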
  • Technical Overview – Compiling
    Stored Procedures
    import org.voltdb.*;
    @ProcInfo(
    partitionInfo = "HELLOWORLD.DIA
    singlePartition = true
    )
    public class Insert extends VoltPr
    public final SQLStmt sql =
    new SQLStmt("INSERT INTO HELLO
    public VoltTable[] run( String hel
    Schema
    CREATE TABLE HELLOWORLD (
    HELLO CHAR(15),
    WORLD CHAR(15),
    DIALECT CHAR(15),
    PRIMARY KEY (DIALECT)
    );
    Project.xml
    <?xml version="1.0"?>
    <project>
    <database name='data
    <schema path='ddl.
    <partition table='
    </database>
    </project>
    The database is constructed from:
    The schema
    The work load (Java stored procedures)
    The physical topology (number of hosts and partitions)
    This allows the appropriate work to be partitioned with the data.
  • Technical Overview – Transactions
    All access to VoltDB is via Java stored procedures (Java + SQL)
    A single invocation of a stored procedure is a transaction (committed on success)
    Clients can communicate with VoltDB asynchronously
    Access via native client libraries (Java, C++, and Erlang exist, more coming) or our HTTP/JSON interface
    No ODBC/JDBC
    Clients connect to one or more nodes in the VoltDB cluster; transactions are forwarded to the correct node.
  • Technical Overview - Clusters
    Linear scale (tested to 240 cores)
    Scalability
    Add servers to increase performance / capacity
    Cluster management
    Change stored procedure (V1)
    Add/drop table (V1.1)
    Add/drop column (V1.2)
    Replace failed node (V1.2)
    Add/remove nodes (Future)
    Automatic partition redistribution (Future)
    [Chart: TPS (100,000’s) vs. number of VoltDB servers]
  • Technical Overview - Durability
    High availability
    K-safety for redundancy
    Snapshots
    Scheduled, continuous, on demand
    Disaster Recovery/WAN replication (Future)
    Asynchronous replication
    Spooling to data warehouse
  • VoltDB Do’s and Don’ts
    * complete list at http://community.voltdb.com/dos_and_donts
  • VoltDB Do’s and Don’ts (1/8)
    DO #1
    Partition your tables to maximize the frequency of single-partition transactions and minimize multi-partition transactions.
    Single-partition transactions vs. a multi-partition transaction in 1 unit of time:
    [Diagram: nine single-partition transactions (s1 through s9) run in the time unit, or one multi-partition transaction (m1) occupies all nine slots]
    … now imagine this on a 12 node cluster with 96 partitions
  • VoltDB Do’s and Don’ts (2/8)
    DO #2
    Implement a partitioning strategy that creates an even distribution of data and stored procedure calls.
    Data skew – if partition 1 contains the majority of the data, your application will not be able to use much of the cluster’s RAM.
    Work skew – if partition 2 is executing 90% of the stored procedures, you will be transactionally limited to the amount of work a single CPU core can accomplish.
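    One way to spot work skew before deploying is to replay a sample of partition-key values and see what share of calls lands on the busiest partition. The sketch below assumes a simple modulo routing rule; the class SkewCheck and the method busiestShare are hypothetical names for illustration, and VoltDB's real key-to-partition mapping may differ.

```java
/** Illustrative skew check; the modulo routing rule is an assumption. */
public class SkewCheck {

    // Returns the fraction (0.0 to 1.0) of calls handled by the busiest partition.
    static double busiestShare(long[] partitionKeys, int partitionCount) {
        if (partitionKeys.length == 0) {
            return 0.0;
        }
        long[] calls = new long[partitionCount];
        for (long key : partitionKeys) {
            calls[(int) Math.floorMod(key, (long) partitionCount)]++;
        }
        long max = 0;
        for (long c : calls) {
            max = Math.max(max, c);
        }
        return (double) max / partitionKeys.length;
    }

    public static void main(String[] args) {
        // Nine of ten calls use the same key, so one partition does 90% of the work.
        long[] skewed = {1, 1, 1, 1, 1, 1, 1, 1, 1, 2};
        // Keys spread evenly over three partitions.
        long[] even = {0, 1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println("skewed: " + busiestShare(skewed, 3));
        System.out.println("even:   " + busiestShare(even, 3));
    }
}
```

    With an even distribution the busiest share approaches 1 / partitionCount; a value near 1.0 means a single core is the bottleneck.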
  • VoltDB Do’s and Don’ts (3/8)
    DO #3
    Use multiple SQL queries in your stored procedures.
    Iterating and looping in your stored procedures is fine.
    This is OK and FAST
    public final SQLStmt selectSQL = new SQLStmt("select customer_id, product_id from customer_orders where order_id = ?");
    public final SQLStmt insertSQL = new SQLStmt("insert into completed_orders (customer_id, product_id) values (?, ?)");
    ...
    voltQueueSQL(selectSQL, orderId);
    VoltTable[] results1 = voltExecuteSQL();
    for (int i = 0; i < results1[0].getRowCount(); i++) {
        voltQueueSQL(insertSQL, results1[0].fetchRow(i).getLong(0), results1[0].fetchRow(i).getLong(1));
    }
    return voltExecuteSQL();
  • VoltDB Do’s and Don’ts (4/8)
    DO #4
    Implement single SQL statement stored procedures in your project.xml file.
    package com.procedures;
    import org.voltdb.*;
    @ProcInfo(
        partitionInfo = "table1.column1: 0",
        singlePartition = true
    )
    public class SampleProcedure extends VoltProcedure {
        public final SQLStmt selectSQL = new SQLStmt("select column2, column3 from table1 where column1 = ?;");
        public VoltTable[] run(
            Long column1Value
        ) {
            voltQueueSQL(selectSQL, column1Value);
            return voltExecuteSQL(true);
        }
    }
    <procedure class='SampleProcedure' partitioninfo='table1.column1:0'><sql>select column2, column3 from table1 where column1 = ?;</sql></procedure>
  • VoltDB Do’s and Don’ts (5/8)
    DO #5
    Use our forums, ask questions, contribute to VoltDB
    details at http://community.voltdb.com
  • VoltDB Do’s and Don’ts (6/8)
    DON’T #1
    Don’t use ad hoc SQL queries as part of a production application.
    Ad hoc SQL must be compiled and planned “on the fly”
    Always executed multi-partition
    Security is all-or-nothing: ad hoc access is either allowed or disallowed, and you cannot control access to individual tables
    For development purposes we provide a browser based tool that allows execution of ad hoc queries
  • VoltDB Do’s and Don’ts (7/8)
    DON’T #2
    Don’t create queries that return large volumes of data, such as “SELECT * FROM FOO” with no constraints, especially for multi-partition transactions (this seems like an analytic type query that is better served by a data warehouse DBMS).
    Be conservative in the data returned by stored procedures.
    There are limits on the amount of data that can be transferred between partitions and between servers and clients, and on the size of temp tables.
  • VoltDB Do’s and Don’ts (8/8)
    DON’T #3
    Don’t assume exceptionally low latency (< 5ms) for any single VoltDB transaction.
    VoltDB is optimized for application throughput, not individual transaction latency.
    However, VoltDB’s latency is competitive with other database products. I’m measuring latencies in the single-digit milliseconds on properly sized clusters.
  • Building a VoltDB Application
  • Building a VoltDB Application
    “Voter” Example Application Requirements:
    Build an application to track an online voting process.
    Must support processing > 100,000 votes per second.
    Each contest contains between 2 and 12 contestants.
    Voters are allowed a maximum of 10 votes.
    A vote consists of a phone number and contestant number.
    Successful votes must be kept for audit.
    Application must provide real-time results (1x per second).
  • Building an App - Schema (1/4)
    Schema requires a single table - “vote”
    contestant_number
    phone_number
    Two transactions, “vote” and “results”
    How do I partition?
    “vote” transaction
    select count(*)
    from vote
    where phone_number = ?;
    insert into vote values (?,?);
    “results” transaction
    select contestant_number,
    count(*) num_votes
    from vote
    group by contestant_number;
  • Building an App - Schema (2/4)
    Partitioning “vote” data
    “vote” table partitioned by contestant_number (contestant_number, phone_number)
    partition 1
    4 111-111-1111
    4 111-111-1111
    4 222-222-2222
    4 444-444-4444
    4 999-999-9999
    partition 2
    2 111-111-1111
    2 555-555-5555
    5 888-888-8888
    partition 3
    6 999-999-9999
    “vote” table partitioned by phone_number (phone_number, contestant_number)
    partition 1
    111-111-1111 2
    111-111-1111 4
    111-111-1111 4
    444-444-4444 4
    partition 2
    222-222-2222 4
    555-555-5555 2
    888-888-8888 5
    partition 3
    999-999-9999 4
    999-999-9999 6
  • Building an App - Schema (3/4)
    “vote” table (contestant_number, phone_number)
    If partitioned by contestant number…
    Transaction “vote” – multi-partition, must look in all partitions to see if voter is over the limit
    Transaction “results” – multi-partition by design
    Maximum of 12 partitions (no more than 12 contestants); partitions holding popular contestants will have busier CPUs and require more RAM
  • Building an App - Schema (4/4)
    “vote” table (contestant_number, phone_number)
    If partitioned by phone number…
    Transaction “vote” – single-partition, all votes for my phone number are within the partition
    Transaction “results” – multi-partition by design
    Virtually unlimited number of partitions; votes for popular contestants are spread across the CPU and RAM of all partitions
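    The effect of partitioning by phone number can be sketched with a small in-memory model. This is a plain-Java illustration, not VoltDB code: the class VoterSketch, its hash-based routing, and the map-per-partition storage are assumptions made for the example. The point it shows is that the “vote” check and insert only ever touch one partition, while “results” has to visit all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative in-memory model of the "vote" table partitioned by phone number. */
public class VoterSketch {
    static final int MAX_VOTES = 10; // from the requirements: maximum of 10 votes per voter

    // One map per partition: phone number -> contestant numbers voted for.
    final List<Map<String, List<Integer>>> partitions = new ArrayList<>();

    VoterSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Hypothetical routing rule: hash of the partition key modulo partition count.
    int partitionFor(String phoneNumber) {
        return Math.floorMod(phoneNumber.hashCode(), partitions.size());
    }

    // "vote" transaction: the count and the insert both run in a single partition.
    boolean vote(String phoneNumber, int contestantNumber) {
        Map<String, List<Integer>> local = partitions.get(partitionFor(phoneNumber));
        List<Integer> votes = local.computeIfAbsent(phoneNumber, k -> new ArrayList<>());
        if (votes.size() >= MAX_VOTES) {
            return false; // voter is over the limit
        }
        votes.add(contestantNumber);
        return true;
    }

    // "results" transaction: multi-partition by design, so every partition is visited.
    Map<Integer, Integer> results() {
        Map<Integer, Integer> totals = new TreeMap<>();
        for (Map<String, List<Integer>> partition : partitions) {
            for (List<Integer> votes : partition.values()) {
                for (int contestant : votes) {
                    totals.merge(contestant, 1, Integer::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        VoterSketch db = new VoterSketch(3);
        for (int i = 1; i <= 11; i++) {
            System.out.println("vote " + i + " accepted: " + db.vote("111-111-1111", 4));
        }
        System.out.println(db.results());
    }
}
```

    Had the model been partitioned by contestant number instead, vote() would need to scan every partition to count a voter's earlier votes.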
  • Building an App - Views
    Increase performance by pre-aggregating results with materialized views
    VoltDB supports count() and sum()
    Extremely performant
    create view v_votes_by_phone_number
        (phone_number,
         num_votes)
    as select phone_number,
              count(*)
         from vote
        group by phone_number;
    create view v_votes_by_contestant_number
        (contestant_number,
         num_votes)
    as select contestant_number,
              count(*)
         from vote
        group by contestant_number;
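    Why pre-aggregation helps can be shown with a small model: a count kept up to date on every insert turns the “results” query into a lookup instead of a scan. The sketch below is an assumption-laden illustration in plain Java (the class VoteCountView and its methods are hypothetical) and says nothing about how VoltDB implements materialized views internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model of a count(*) view maintained incrementally on insert. */
public class VoteCountView {
    // Base table rows: one contestant number per vote.
    private final List<Integer> voteRows = new ArrayList<>();
    // Pre-aggregated "view": contestant number -> number of votes.
    private final Map<Integer, Long> votesByContestant = new TreeMap<>();

    // Each insert into the base table also updates the aggregate.
    void insertVote(int contestantNumber) {
        voteRows.add(contestantNumber);
        votesByContestant.merge(contestantNumber, 1L, Long::sum);
    }

    // Without the view: scan every row (cost grows with the number of votes).
    long countByScan(int contestantNumber) {
        long count = 0;
        for (int c : voteRows) {
            if (c == contestantNumber) {
                count++;
            }
        }
        return count;
    }

    // With the view: a single lookup (cost grows with the number of contestants).
    long countFromView(int contestantNumber) {
        return votesByContestant.getOrDefault(contestantNumber, 0L);
    }

    public static void main(String[] args) {
        VoteCountView db = new VoteCountView();
        int[] votes = {4, 4, 2, 4, 5, 6, 4, 2, 4};
        for (int contestant : votes) {
            db.insertVote(contestant);
        }
        System.out.println("scan: " + db.countByScan(4) + ", view: " + db.countFromView(4));
    }
}
```

    Both methods return the same number; the difference is only how much data each one reads.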
  • Q & A
    Any questions?
    http://community.voltdb.com (website/forums)
    tcallaghan@voltdb.com (email)
    @tmcallaghan (Twitter)
    @voltdb (Twitter)