Data SLA in the public cloud
Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

CC Attribution-ShareAlike License

Data SLA in the public cloud: Presentation Transcript

  • Data SLA in the cloud
  • About Us
    ScaleBase is a new startup targeting the database-as-a-service (DBaaS) market
    We offer unlimited database scalability and availability using our Database Load Balancer
    We launch in September 2010. Stay tuned on our site.
  • Agenda
    The requirements for data SLA in public cloud environments
    Achieving data SLA with NoSQL
    Achieving data SLA with relational databases
  • The requirements for data SLA in public cloud environments
  • What We Need
    Availability
    Consistency
    Scalability
  • Brewer's (CAP) Theorem
    It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
    Consistency (all nodes see the same data at the same time)
    Availability (node failures do not prevent survivors from continuing to operate)
    Partition Tolerance (the system continues to operate despite arbitrary message loss)
    http://en.wikipedia.org/wiki/CAP_theorem
  • What It Means
    http://guyharrison.squarespace.com/blog/2010/6/13/consistency-models-in-non-relational-databases.html
  • Dealing With CAP
    Drop Partition Tolerance
    Run everything on one machine.
    This is, of course, not very scalable.
  • Dealing With CAP
    Drop Availability
    If a partition fails, everything waits until the data is consistent again.
    This can be very complex to handle over a large number of nodes.
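As a rough illustration of dropping availability, the sketch below models a two-replica store that refuses writes while the replicas cannot reach each other. The class `ConsistentStoreDemo` and its methods are hypothetical names invented for this example, and the store rejects a write instead of blocking until the partition heals, which is a simplification of the "everything waits" behaviour described on the slide.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy two-replica store that gives up availability to stay consistent. */
class ConsistentStoreDemo {
    private final Map<String, String> replicaA = new HashMap<>();
    private final Map<String, String> replicaB = new HashMap<>();
    private boolean partitioned = false; // true while the replicas cannot talk

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    /** A write succeeds only if it can be applied to both replicas. */
    void write(String key, String value) {
        if (partitioned) {
            // Simplification: reject instead of waiting for the partition to heal.
            throw new IllegalStateException("unavailable: replicas are partitioned");
        }
        replicaA.put(key, value);
        replicaB.put(key, value);
    }

    /** Either replica can serve reads because the two never diverge. */
    String readFromA(String key) {
        return replicaA.get(key);
    }

    String readFromB(String key) {
        return replicaB.get(key);
    }

    public static void main(String[] args) {
        ConsistentStoreDemo store = new ConsistentStoreDemo();
        store.write("user:1", "alice");

        store.setPartitioned(true);
        try {
            store.write("user:1", "bob");
        } catch (IllegalStateException e) {
            System.out.println("write rejected: " + e.getMessage());
        }

        // Both replicas still agree on the last successful write.
        System.out.println(store.readFromA("user:1") + " / " + store.readFromB("user:1"));
    }
}
```

The cost is visible in `main`: during the partition the data stays consistent, but the application cannot write at all.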
  • Dealing With CAP
    Drop Consistency
    Welcome to the term “eventually consistent”.
    In the end, everything will work out just fine. And hey, sometimes this is a good enough solution
    When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
    For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
    Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
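The convergence property described above can be sketched with a toy model. Everything here is an assumption made for illustration: the class names, the caller-supplied logical timestamps, and the last-writer-wins merge rule, which is only one of several ways real systems resolve conflicting updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of eventual consistency: replicas converge once updates stop. */
class EventualConsistencyDemo {

    /** A value tagged with a logical timestamp; the higher timestamp wins. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** One replica holding its own, possibly stale, copy of the data. */
    static final class Replica {
        private final Map<String, Versioned> store = new HashMap<>();

        void write(String key, String value, long timestamp) {
            merge(key, new Versioned(value, timestamp));
        }

        String read(String key) {
            Versioned v = store.get(key);
            return v == null ? null : v.value;
        }

        /** Last-writer-wins: keep whichever entry has the newer timestamp. */
        void merge(String key, Versioned incoming) {
            Versioned current = store.get(key);
            if (current == null || incoming.timestamp > current.timestamp) {
                store.put(key, incoming);
            }
        }

        /** Push everything this replica knows to another replica. */
        void pushTo(Replica other) {
            store.forEach(other::merge);
        }
    }

    /** One propagation round: every replica pushes its state to every other. */
    static void gossipRound(List<Replica> replicas) {
        for (Replica from : replicas) {
            for (Replica to : replicas) {
                if (from != to) {
                    from.pushTo(to);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            replicas.add(new Replica());
        }

        // The update is accepted by one replica only.
        replicas.get(0).write("cart", "book", 1L);
        System.out.println("before propagation: " + replicas.get(2).read("cart"));

        // Once updates stop and propagation runs, every replica agrees.
        gossipRound(replicas);
        System.out.println("after propagation: " + replicas.get(2).read("cart"));
    }
}
```

A client reading from the third replica before `gossipRound` sees stale data; that window is what the application has to tolerate in exchange for availability.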
  • Reading More On CAP
    This is an excellent read, and some of my samples are from this blog
    http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  • Achieving data SLA with relational databases
  • Databases And CAP
    ACID – Consistency
    Availability – tons of solutions, most of them not cloud oriented
    Oracle RAC
    MySQL Proxy
    Etc.
    Replication-based solutions can solve at least read availability and scalability (see Azure SQL)
  • Database Cloud Solutions
    Amazon RDS
    NaviSite Oracle RAC
    Not that popular
    Costs to cloud providers (complexity, not standard)
  • So Where Is The Problem?
    Partition Tolerance just doesn’t work
    Scaling problems (usually write but also read)
    BigData problems
  • Scaling Up
    Scaling up runs into trouble when the dataset is just too big
    RDBMSs were not designed to be distributed
    So people began to look at multi-node database solutions
    Known as ‘scaling out’ or ‘horizontal scaling’
    Different approaches include:
    Master-slave
    Sharding
  • Scaling RDBMS – Master/Slave
    Master-Slave
    All writes go to the master. All reads are performed against the replicated slave databases
    Critical reads may be incorrect as writes may not have been propagated down yet
    Large data sets can pose problems as the master needs to duplicate data to the slaves
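The stale-read risk of master/slave replication can be shown with a small in-memory sketch. `MasterSlaveDemo`, its replication log, and the explicit `replicate()` step are hypothetical; in a real database replication runs continuously in the background, and the lag is a matter of timing, not of an explicit call.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy master/slave pair with asynchronous replication. */
class MasterSlaveDemo {
    private final Map<String, String> master = new HashMap<>();
    private final Map<String, String> slave = new HashMap<>();
    // Writes that the master has accepted but the slave has not applied yet.
    private final Queue<String[]> replicationLog = new ArrayDeque<>();

    /** All writes go to the master and are queued for the slave. */
    void write(String key, String value) {
        master.put(key, value);
        replicationLog.add(new String[] {key, value});
    }

    /** Normal reads go to the slave and may therefore be stale. */
    String read(String key) {
        return slave.get(key);
    }

    /** Critical reads can be routed to the master at the cost of extra load. */
    String readFromMaster(String key) {
        return master.get(key);
    }

    /** Apply all pending writes to the slave, in the order they were made. */
    void replicate() {
        String[] entry;
        while ((entry = replicationLog.poll()) != null) {
            slave.put(entry[0], entry[1]);
        }
    }

    public static void main(String[] args) {
        MasterSlaveDemo db = new MasterSlaveDemo();
        db.write("balance", "100");

        // The write has not reached the slave yet, so this read is stale.
        System.out.println("slave before replication: " + db.read("balance"));
        System.out.println("master: " + db.readFromMaster("balance"));

        db.replicate();
        System.out.println("slave after replication: " + db.read("balance"));
    }
}
```

Routing critical reads to the master, as `readFromMaster` does, avoids the stale read but gives back part of the read scalability that replication was meant to provide.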
  • Scaling RDBMS - Sharding
    Partitioning, or sharding
    Scales well for both reads and writes
    Not transparent, application needs to be partition-aware
    Can no longer have relationships/joins across partitions
    Loss of referential integrity across shards
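To make "partition-aware" concrete, here is a minimal sketch of hash-based shard routing, with each shard modelled as an in-memory map. The class `ShardedStoreDemo` and the modulo-of-hash routing rule are assumptions for illustration; production systems often use range-based or consistent-hash schemes instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy sharded key/value store: the application decides where each key lives. */
class ShardedStoreDemo {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedStoreDemo(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** Route a key to a shard; floorMod keeps the index non-negative. */
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    /** Reads and writes for one key touch exactly one shard. */
    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    /** Anything spanning shards must be fanned out and combined by the application. */
    int countAll() {
        int total = 0;
        for (Map<String, String> shard : shards) {
            total += shard.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ShardedStoreDemo store = new ShardedStoreDemo(4);
        store.put("user:1", "alice");
        store.put("user:2", "bob");

        System.out.println("user:1 lives on shard " + store.shardFor("user:1"));
        System.out.println("user:1 = " + store.get("user:1"));
        System.out.println("total rows across shards: " + store.countAll());
    }
}
```

Single-key operations scale because they touch one shard, while `countAll` shows the price: the database no longer performs cross-partition work such as joins or integrity checks on the application's behalf.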
  • Other ways to scale RDBMS
    Multi-Master replication
    INSERT only, no UPDATEs/DELETEs
    No JOINs, thereby reducing query time
    This involves de-normalizing data
    In-memory databases
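One possible reading of the "INSERT only" item is an append-only table: a change is recorded as a new row, and the current value is the most recently inserted row for a key. The sketch below assumes that reading; `AppendOnlyTableDemo` and its linear scan are illustrative only and stand in for what would be an indexed query in a real database.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy append-only table: rows are only ever inserted, never changed in place. */
class AppendOnlyTableDemo {

    /** An immutable row; a newer row for the same key supersedes older ones. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    /** The only write operation: append a new row. */
    void insert(String key, String value) {
        rows.add(new Row(key, value));
    }

    /** The current value is the last row inserted for the key, or null if none. */
    String latest(String key) {
        for (int i = rows.size() - 1; i >= 0; i--) {
            Row row = rows.get(i);
            if (row.key.equals(key)) {
                return row.value;
            }
        }
        return null;
    }

    /** Older versions stay in the table, so the row count only grows. */
    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        AppendOnlyTableDemo table = new AppendOnlyTableDemo();
        table.insert("status:42", "pending");
        table.insert("status:42", "shipped"); // recorded as a new row, not an UPDATE

        System.out.println("current: " + table.latest("status:42"));
        System.out.println("rows stored: " + table.rowCount());
    }
}
```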
  • Achieving data SLA with NoSQL
  • NoSQL
    A term used to designate databases which differ from classic relational databases in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term which would include classic relational databases as a subset.
    http://en.wikipedia.org/wiki/NoSQL
  • NoSQL Types
    Key/Value
    A big hash table
    Examples: Voldemort, Amazon Dynamo
    Big Table
    Big table, column families
    Examples: HBase, Cassandra
    Document based
    Collections of collections
    Examples: CouchDB, MongoDB
    Graph databases
    Based on graph theory
    Examples: Neo4J
    Each solves a different problem
  • NoSQL
    http://browsertoolkit.com/fault-tolerance.png
  • Pros/Cons
    Pros:
    Performance
    BigData
    Most solutions are open source
    Data is replicated to nodes and is therefore fault-tolerant (partitioning)
    Don't require a schema
    Can scale up and down
    Cons:
    Code changes
    No framework support
    Not ACID
    Ecosystem (BI, backup)
    There is always a database at the backend
    Some APIs are just too simple
  • Amazon S3 Code Sample
    AWSAuthConnection conn = new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey, secure, server, format);
    Response response = conn.createBucket(bucketName, location, null);
    final String text = "this is a test";
    response = conn.put(bucketName, key, new S3Object(text.getBytes(), null), null);
  • Cassandra Code Sample
    CassandraClient cl = pool.getClient();
    KeySpace ks = cl.getKeySpace("Keyspace1");
    // insert 100 values, one per row key
    ColumnPath cp = new ColumnPath("Standard1", null, "testInsertAndGetAndRemove".getBytes("utf-8"));
    for (int i = 0; i < 100; i++) {
        ks.insert("testInsertAndGetAndRemove_" + i, cp, ("testInsertAndGetAndRemove_value_" + i).getBytes("utf-8"));
    }
    // read the values back
    for (int i = 0; i < 100; i++) {
        Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
        String value = new String(col.getValue(), "utf-8");
    }
    // remove the values
    for (int i = 0; i < 100; i++) {
        ks.remove("testInsertAndGetAndRemove_" + i, cp);
    }
  • Cassandra Code Sample – Cont’
    // removing a row that does not exist must not throw
    try {
        ks.remove("testInsertAndGetAndRemove_not_exist", cp);
    } catch (Exception e) {
        fail("removing a non-existent row should not throw exceptions");
    }
    // reading an already removed value must throw NotFoundException
    for (int i = 0; i < 100; i++) {
        try {
            Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
            fail("the value should already be deleted");
        } catch (NotFoundException e) {
            // expected: the column was removed above
        } catch (Exception e) {
            fail("threw another exception, should be NotFoundException. " + e.toString());
        }
    }
    // return the client to the pool and shut the pool down
    pool.releaseClient(cl);
    pool.close();
  • Cassandra Statistics
    Facebook Search
    MySQL > 50 GB Data
    Writes Average : ~300 ms
    Reads Average : ~350 ms
    Rewritten with Cassandra > 50 GB Data
    Writes Average : 0.12 ms
    Reads Average : 15 ms
  • MongoDB
    Mongo m = new Mongo();
    DB db = m.getDB( "mydb" );
    Set<String> colls = db.getCollectionNames();
    for (String s : colls) {
        System.out.println(s);
    }
  • MongoDB – Cont’
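    // build a document with a nested "info" sub-document
    BasicDBObject doc = new BasicDBObject();
    doc.put("name", "MongoDB");
    doc.put("type", "database");
    doc.put("count", 1);
    BasicDBObject info = new BasicDBObject();
    info.put("x", 203);
    info.put("y", 102);
    doc.put("info", info);
    // coll is not defined on the slide; assumed to be a DBCollection of the db above
    // (the collection name "testCollection" is illustrative)
    DBCollection coll = db.getCollection("testCollection");
    coll.insert(doc);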
  • Neo4J
    GraphDatabaseService graphDb = new EmbeddedGraphDatabase("var/base");
    Transaction tx = graphDb.beginTx();
    try {
        // two nodes connected by one KNOWS relationship, each carrying a property
        Node firstNode = graphDb.createNode();
        Node secondNode = graphDb.createNode();
        Relationship relationship = firstNode.createRelationshipTo(secondNode, MyRelationshipTypes.KNOWS);
        firstNode.setProperty("message", "Hello, ");
        secondNode.setProperty("message", "world!");
        relationship.setProperty("message", "brave Neo4j ");
        // mark the transaction as successful so that finish() commits it
        tx.success();
        System.out.print(firstNode.getProperty("message"));
        System.out.print(relationship.getProperty("message"));
        System.out.print(secondNode.getProperty("message"));
    } finally {
        tx.finish();
        graphDb.shutdown();
    }
  • The Bottom Line
  • Data SLA
    There is no golden hammer
    Choose your tool wisely, based on what you need
    Usually
    Start with an RDBMS (shortest time to market, which is what we really care about)
    When scale issues occur – start moving to NoSQL based on your needs
    You can get Data SLA in the cloud – just think before you code!!!