Introduction to Apache Cassandra and support within WSO2 Platform
  • 1. Introduction to Apache Cassandra and support within WSO2 Platform
    Srinath Perera
    WSO2 Inc.
  • 2. Cassandra within the WSO2 Platform
    We support Apache Cassandra within the WSO2 Platform.
    This provides NoSQL data support within the platform.
    Cassandra can be used for both column family and key-value use cases.
    It is fully integrated with the platform.
    We will discuss what this means.
  • 3. What is Cassandra?
    Apache Cassandra http://cassandra.apache.org/
    NoSQL column family implementation (more about it later)
    Highly scalable and available, with no single point of failure.
    Very high write throughput and good read throughput. It is pretty fast.
    SQL-like query language (from 0.8) and search through secondary indexes (but no JOINs, GROUP BY, etc.).
    Tunable consistency and support for replication
    Flexible Schema
  • 4. Column Family Data Model
    Column – a name, a value, and a timestamp (ignore the timestamp for now). "Column" is a bit of a misnomer; maybe they should have called it a named cell.
    E.g. author=“Asimov”.
    Row – a named collection of columns. Entries are sorted by column name, so you can take a slice and get only some of the columns.
    E.g. “Second Foundation”->{author=“Asimov”, publishedDate=“..”,
    tag=“sci-fi”, tag2=“asimov” }
    Column family – a collection of rows, usually with no sort order among rows*.
    Books->{
    “Foundation”->{author=“Asimov”, publishedDate=“..”},
    “Second Foundation”->{author=“Asimov”, publishedDate=“..”},
    …..
    }
    There are other concepts, but these are the key ones.
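    As a rough illustration of the model above, here is a minimal in-memory sketch in plain Java. It is not the Cassandra or Hector API: the class name ColumnFamilySketch and its methods are made up for this example. A sorted map per row stands in for "columns sorted by name", and a slice is a sub-range of that map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy in-memory model of a column family; hypothetical, not a Cassandra API.
class ColumnFamilySketch {
    // A column is a named cell: the value plus the time it was written.
    static final class Column {
        final String value;
        final long timestamp;
        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Row key -> (column name -> column). The TreeMap keeps columns sorted
    // by name; the HashMap gives no ordering among rows.
    private final Map<String, TreeMap<String, Column>> rows = new HashMap<>();

    void insert(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(columnName, new Column(value, System.currentTimeMillis()));
    }

    // Slice: the columns of one row whose names fall in [from, to], in name order.
    SortedMap<String, Column> slice(String rowKey, String from, String to) {
        TreeMap<String, Column> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        ColumnFamilySketch books = new ColumnFamilySketch();
        books.insert("Second Foundation", "author", "Asimov");
        books.insert("Second Foundation", "tag", "sci-fi");
        books.insert("Second Foundation", "tag2", "asimov");
        // Only the two tag columns are returned by this slice.
        books.slice("Second Foundation", "tag", "tag2")
             .forEach((name, col) -> System.out.println(name + "=" + col.value));
    }
}
```

    Note that different rows can hold entirely different column names, which is why each row carries its own map.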
  • 5. Column Family Data Model (Contd.)
    It is crucial to understand that Cassandra Columns are very different from RDBMS Columns.
    Columns apply only within a given row; different rows may have different columns.
    You can have thousands to millions of columns in a row (2 billion max, and a row should fit in one node).
    Column names may represent data, not just metadata like with RDBMS.
    You will understand more with the example.
  • 6. OK?? How can I do something useful with this?
  • 7. Example: Book Rating Site
    Let us take a book rating site as an example. Users add books, comment on them, and tag them.
    Can add books (author, rank, price, link)
    Can add comments for books (text, time, name)
    Can add tags for books
    Need to list books sorted by rank
    Need to list books by tag
    Need to list comments for a book
  • 8. Relational Approach
    Schema
    Books(bookid, author, rank, price, link)
    Comments(id, text, user, time, bookid)
    Tags(id, bookid, tag)
    Queries
    SELECT * FROM Books ORDER BY rank;
    SELECT text, time, user FROM Comments WHERE bookid=? ORDER BY time;
    SELECT tag FROM Tags WHERE bookid=?;
    SELECT bookid FROM Tags WHERE tag=?;
    SELECT DISTINCT author FROM Tags, Books WHERE Tags.bookid=Books.bookid AND tag=?;
  • 9. Cassandra Approach
    Schema
    Books[BookID->(author, rank, price, link, tag1, tag2 ..) ]
    Tags2Books[TagID->(timestamp1=bookID1, timestamp2=bookID2, ..) ]
    Tags2Authors[TagID->(timestamp1=bookID1, timestamp2=bookID2, ..) ]
    Comments[BookID-> (timestamp1= text + “-” + author …)]
    Ranks[“RANK” -> (rank=bookID)]
    Example data snapshot
  • 10. Potential Solution [Contd.]
    Handling Queries
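    One way the three listing queries might be handled against the column families from the schema slide is sketched below. This is an in-memory stand-in using sorted maps, not Cassandra client code. The class BookRatingSketch, its method names, and the logical clock used for timestamps are assumptions made for this example. The point it tries to show is that every write also maintains the index rows, so each query becomes a single row read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the column families; hypothetical, not Cassandra client code.
class BookRatingSketch {
    // Books[BookID -> (author, rank, tag1, tag2, ..)], columns sorted by name.
    final Map<String, TreeMap<String, String>> books = new HashMap<>();
    // Tags2Books[Tag -> (timestamp = bookID)] and Comments[BookID -> (timestamp = text-user)].
    final Map<String, TreeMap<Long, String>> tags2Books = new HashMap<>();
    final Map<String, TreeMap<Long, String>> comments = new HashMap<>();
    // Ranks["RANK" -> (rank = bookID)]: one row whose column names are ranks.
    // Assumes ranks are unique; equal ranks would overwrite each other here.
    final TreeMap<Integer, String> ranks = new TreeMap<>();
    private long clock = 0; // logical timestamp used as a column name

    void addBook(String bookId, String author, int rank) {
        TreeMap<String, String> row = books.computeIfAbsent(bookId, k -> new TreeMap<>());
        row.put("author", author);
        row.put("rank", Integer.toString(rank));
        ranks.put(rank, bookId); // keep the rank index in step with the data
    }

    void addTag(String bookId, String tag) {
        TreeMap<String, String> row = books.computeIfAbsent(bookId, k -> new TreeMap<>());
        long n = row.keySet().stream().filter(c -> c.startsWith("tag")).count() + 1;
        row.put("tag" + n, tag);
        tags2Books.computeIfAbsent(tag, k -> new TreeMap<>()).put(++clock, bookId);
    }

    void addComment(String bookId, String text, String user) {
        comments.computeIfAbsent(bookId, k -> new TreeMap<>()).put(++clock, text + "-" + user);
    }

    // "List books sorted by rank": read the single RANK row in column order.
    List<String> booksByRank() {
        return new ArrayList<>(ranks.values());
    }

    // "List books by tag": read the row for that tag.
    List<String> booksByTag(String tag) {
        return new ArrayList<>(tags2Books.getOrDefault(tag, new TreeMap<>()).values());
    }

    // "List comments for a book": read the row for that book, oldest first.
    List<String> commentsFor(String bookId) {
        return new ArrayList<>(comments.getOrDefault(bookId, new TreeMap<>()).values());
    }
}
```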
  • 11. Some Queries You Can Not Do
    The setup above can answer the queries it was designed for.
    It cannot answer queries it was not designed for.
    For example, it cannot do the following:
    SELECT * FROM Books WHERE price > 50;
    SELECT * FROM Books WHERE author=“Asimov”;
    SELECT * FROM Books, Comments WHERE rank > 9 AND Comments.bookid=Books.bookid;
    Well, it can, but only by writing code to walk through all the data, which is like supporting search by scanning everything.
    This is a limitation, especially when queries are supplied at runtime.
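    To make "walking through the data" concrete, here is a small sketch of the price query done as a full scan. It uses plain Java maps in place of the Books column family; FullScanSketch and booksPricedAbove are hypothetical names. Every row is visited, so the cost grows with the total number of books rather than with the number of matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical full-scan helper; plain maps stand in for the Books column family.
class FullScanSketch {
    // Rough equivalent of "SELECT * FROM Books WHERE price > ?" with no index.
    static List<String> booksPricedAbove(Map<String, Map<String, String>> books, double minPrice) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : books.entrySet()) {
            // Rows need not have a price column at all, so check for it.
            String price = row.getValue().get("price");
            if (price != null && Double.parseDouble(price) > minPrice) {
                matches.add(row.getKey());
            }
        }
        Collections.sort(matches); // row order is not meaningful, so sort for stable output
        return matches;
    }
}
```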
  • 12. A Sample Program
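    // Requires the Hector client library on the classpath.
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.*;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.*;
    // keyspaceName and columnFamily are Strings naming an existing keyspace and column family.
    StringSerializer sser = StringSerializer.get(); // serializer for String row keys
    Cluster cluster = HFactory.createCluster("TestCluster",
    new CassandraHostConfigurator("localhost:9160"));
    Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);
    Mutator<String> mutator = HFactory.createMutator(keyspace, sser);
    mutator.insert("wso2", columnFamily,
    HFactory.createStringColumn("address", "4131 El Camino Real Suite 200, Palo Alto, CA 94306"));
    ColumnQuery<String, String, String> columnQuery =
    HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily(columnFamily).setKey("wso2").setName("address");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    System.out.println("received "+ result.get().getName() + "= "
    + result.get().getValue() + " ts = "+ result.get().getClock());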
  • 13. Cassandra: How does it work?
    Nodes are arranged in a ring over a key space (a P2P network that uses consistent hashing).
    Each node owns the next clockwise address space.
    If replicated, each node owns the next two clockwise address spaces.
    Any node can accept any request and route it to the correct node.
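    The ring idea can be sketched in a few lines of plain Java. This is a simplification under stated assumptions: RingSketch, its fixed RING_SIZE, and the use of String.hashCode() as the hash are all made up for illustration and are not how Cassandra's partitioners compute tokens. The owner of a key is the first node at or after the key's token going clockwise, and replicas are the nodes that follow it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified consistent-hashing ring; hypothetical, not Cassandra's partitioner.
class RingSketch {
    static final int RING_SIZE = 1024; // assumed token space for this sketch

    // Token -> node name, kept sorted so we can walk the ring clockwise.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String name, int token) {
        ring.put(token, name);
    }

    // Maps a key onto the ring; a stand-in for a real hash function.
    static int tokenFor(String key) {
        return Math.floorMod(key.hashCode(), RING_SIZE);
    }

    // The first node at or after the token owns it; the next nodes clockwise hold replicas.
    List<String> replicasForToken(int token, int replicationFactor) {
        List<String> owners = new ArrayList<>();
        if (ring.isEmpty()) {
            return owners;
        }
        Integer t = ring.ceilingKey(token);
        if (t == null) {
            t = ring.firstKey(); // wrap around past the highest token
        }
        int n = Math.min(replicationFactor, ring.size());
        while (owners.size() < n) {
            owners.add(ring.get(t));
            t = ring.higherKey(t);
            if (t == null) {
                t = ring.firstKey();
            }
        }
        return owners;
    }

    List<String> replicasFor(String key, int replicationFactor) {
        return replicasForToken(tokenFor(key), replicationFactor);
    }
}
```

    Because any node can compute the same answer from the ring layout, any node can accept a request and forward it to the right owners.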
  • 14. Cassandra: How does it work? (Contd.)
    Writes are written to enough nodes, and Cassandra repairs data while reading. (As you would guess, that is how writes are fast.)
    Data is updated in memory, and an append-only commit log is kept to recover from failures. (This avoids rotational latency at the disk.) It can do about 80-360 MB/sec per node.
    Whenever a read happens, Cassandra syncs all the nodes holding replicas (read repair).
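    Read repair can be pictured with the sketch below. It is a toy model, not Cassandra internals: each replica is a plain map, ReadRepairSketch and Versioned are hypothetical names, and the newest timestamp is assumed to win. A read collects the value from every replica, returns the newest one, and writes it back to any replica that was missing it or held an older version.

```java
import java.util.List;
import java.util.Map;

// Toy model of read repair; hypothetical names, newest timestamp wins.
class ReadRepairSketch {
    static final class Versioned {
        final String value;
        final long timestamp;
        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Reads a key from all replicas and brings stale replicas up to date.
    static String readWithRepair(List<Map<String, Versioned>> replicas, String key) {
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v;
            }
        }
        if (newest == null) {
            return null; // no replica has the key
        }
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v == null || v.timestamp < newest.timestamp) {
                replica.put(key, newest); // repair the stale or missing copy
            }
        }
        return newest.value;
    }
}
```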
  • 15. All these are great, but what is the catch?
    Do not get me wrong, Cassandra is a great tool, but you have to know where it does not work.
  • 16. Surprises if you are using Cassandra
    No transactions, no JOINs. Hope there is no surprise here.
    No foreign keys, and keys are immutable. (Well, there are no JOINs anyway; use surrogate keys if you need to change keys.)
    Keys have to be unique (use composite keys).
    Super columns and the order-preserving partitioner are discouraged.
    Searching is complicated
    There is no search in the core. Secondary indexes are layered on top, and they do not do range search or pattern search.
    When secondary indexes do not work, you have to learn the data model and build your own indexes using sort orders and slices.
    Sort orders are complicated
    Columns are always sorted by name, but row order depends on the partitioner. Sort orders are crucial when you build your own indexes.
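    As one possible illustration of building your own index from sort orders and slices, the sketch below keeps a single index row whose column names encode the price, so a price range query becomes a slice over column names. It is plain Java with a sorted map standing in for the row. PriceIndexSketch, the zero-padded name format, and prices held as non-negative whole cents are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hand-built range index in one sorted row; hypothetical, not a Cassandra API.
class PriceIndexSketch {
    // Column name = zero-padded price + ":" + bookID, value = bookID.
    private final TreeMap<String, String> indexRow = new TreeMap<>();

    // Zero padding makes string order match numeric order (non-negative prices assumed).
    private static String prefix(int priceCents) {
        return String.format(Locale.ROOT, "%010d:", priceCents);
    }

    void index(String bookId, int priceCents) {
        indexRow.put(prefix(priceCents) + bookId, bookId);
    }

    // Range query as a slice: names from "lo:" up to, but excluding, "(hi+1):".
    List<String> booksPricedBetween(int loCents, int hiCents) {
        String from = prefix(loCents);
        String to = prefix(hiCents + 1);
        return new ArrayList<>(indexRow.subMap(from, true, to, false).values());
    }
}
```

    The application has to update this row on every write to Books, which is the extra design work the slide refers to.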
  • 17. Surprises if you are using Cassandra (Cont.)
    Failed Operations may leave changes
    If the operation is successful, all is well.
    If it failed, changes may actually have been applied. But operations are idempotent, so you can retry until successful.
    Batch operations are not atomic, but you can retry until successful (as operations are idempotent).
    If a node fails, Cassandra does not detect it and self-heal. Assuming you have replicas, things will continue to work, but the whole system recovers only when a manual recovery operation is done.
    It remembers deletes
    When we delete a data item, a node may be down at the time and come back after the delete is done, still holding the old data. To avoid resurrecting that data, Cassandra marks the item as deleted (a tombstone) but does not remove it until a configurable timeout or a repair. Space is actually freed only then.
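    The tombstone behaviour can be sketched with a toy single-node store. This is plain Java under stated assumptions, not Cassandra's storage engine: TombstoneSketch, its Cell type, and the compact method with a grace period are hypothetical names for this example. A delete is stored as a timestamped marker, a late-arriving older write loses to it, and the marker is only dropped once the grace period has passed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of deletes as tombstones; hypothetical, not Cassandra's storage engine.
class TombstoneSketch {
    static final class Cell {
        final String value;
        final long timestamp;
        final boolean tombstone;
        Cell(String value, long timestamp, boolean tombstone) {
            this.value = value;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    void write(String key, String value, long timestamp) {
        apply(key, new Cell(value, timestamp, false));
    }

    // A delete is just another write: a marker with a timestamp.
    void delete(String key, long timestamp) {
        apply(key, new Cell(null, timestamp, true));
    }

    // Newest timestamp wins, so replaying the same operation changes nothing.
    private void apply(String key, Cell incoming) {
        Cell current = cells.get(key);
        if (current == null || incoming.timestamp >= current.timestamp) {
            cells.put(key, incoming);
        }
    }

    String read(String key) {
        Cell c = cells.get(key);
        return (c == null || c.tombstone) ? null : c.value;
    }

    // Drops tombstones older than the grace period; only now is space reclaimed.
    int compact(long now, long graceMillis) {
        int before = cells.size();
        cells.values().removeIf(c -> c.tombstone && now - c.timestamp > graceMillis);
        return before - cells.size();
    }
}
```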
  • 18. Cassandra within WSO2 Platform
  • 19. Cassandra within the WSO2 Platform
    As a part of WSO2 data solutions
    Because no single storage system can handle all cases
    Specifically for applications that need to scale. For applications that can work with a single DB, we have “Database as a Service”
    Two offerings
    Provide Cassandra as a Service
    Provide Cassandra within Carbon as a standalone product (integrated with WSO2 security model)
  • 20. Apache Cassandra as a Service
    Users can log in to the Web Console (both in Stratos and in WSO2 Data Server) and create Cassandra key spaces.
  • 21. Apache Cassandra as a Service (Contd.)
    Key spaces
    are allocated from a Cassandra cluster
    are isolated from other tenants in Stratos
    are integrated with the WSO2 security model.
    Users can manage and share their key spaces through the Stratos Web Console and use them through the Hector client (a Java client for Cassandra).
    In essence we provide
    Cassandra as a part of Stratos as a Service
    Multi-tenancy support
    Security integration with WSO2 security model
  • 22. A sample Program
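    // Requires the Hector client library on the classpath.
    import java.util.HashMap;
    import java.util.Map;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.*;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.*;
    // USERNAME_KEY, PASSWORD_KEY, keyspaceName and columnFamily are String constants defined elsewhere.
    StringSerializer sser = StringSerializer.get(); // serializer for String row keys
    Map<String, String> credentials = new HashMap<String, String>();
    credentials.put(USERNAME_KEY, "admin@srinath.org");
    credentials.put(PASSWORD_KEY, "admin1234");
    // The credentials are passed to createCluster, not to the host configurator.
    Cluster cluster = HFactory.createCluster("TestCluster",
    new CassandraHostConfigurator("localhost:9160"), credentials);
    Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);
    Mutator<String> mutator = HFactory.createMutator(keyspace, sser);
    mutator.insert("wso2", columnFamily,
    HFactory.createStringColumn("address", "4131 El Camino Real Suite 200, Palo Alto, CA 94306"));
    ColumnQuery<String, String, String> columnQuery =
    HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily(columnFamily).setKey("wso2").setName("address");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    System.out.println("received "+ result.get().getName() + "= "
    + result.get().getValue() + " ts = "+ result.get().getClock());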
  • 23. Implementation
  • 24. Implementation (Contd.)
    Cassandra includes a plug point for adding different security models at the server (authentication and authorization).
    We do security integration and support isolation among tenants (multi-tenancy) by writing a new implementation of this plug point.
    We also provide a Web console to manage Cassandra key spaces.
    Cassandra is already highly scalable and highly available, so no extra work is needed on that front.
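    The isolation idea might look roughly like the sketch below. Everything in it is hypothetical: KeyspaceAuthorizer is an interface invented for this example and is not Cassandra's actual plug point, and the rule that a user name such as admin@srinath.org belongs to the tenant domain after the "@" is an assumption, not a description of the WSO2 implementation. It only shows the shape of the check: a key space is registered to one tenant, and access from any other tenant is refused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tenant isolation; not Cassandra's or WSO2's real interfaces.
class TenantIsolationSketch {
    // Invented stand-in for a server-side authorization plug point.
    interface KeyspaceAuthorizer {
        boolean canAccess(String user, String keyspace);
    }

    static final class TenantAuthorizer implements KeyspaceAuthorizer {
        // Key space name -> tenant domain that owns it.
        private final Map<String, String> keyspaceOwner = new HashMap<>();

        void register(String keyspace, String tenantDomain) {
            keyspaceOwner.put(keyspace, tenantDomain);
        }

        // Assumes user names have the form "name@tenant-domain".
        @Override
        public boolean canAccess(String user, String keyspace) {
            int at = user.lastIndexOf('@');
            if (at < 0) {
                return false; // no tenant domain, so no access
            }
            String domain = user.substring(at + 1);
            return domain.equals(keyspaceOwner.get(keyspace));
        }
    }
}
```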
  • 25. Cassandra within Carbon Platform
    Users may also choose to run Carbon-enabled Cassandra in two alternative settings.
    Running the whole of Stratos within a private cloud
    Gets full support for multi-tenancy and other cloud benefits
    Lets users run it in their own controlled environment
    Running a standalone Cassandra node (without multi-tenancy)
    Gets seamless integration with the WSO2 security model
    Use the Configuration Console for Cassandra
  • 26. Demo
  • 27. Summary
    We discussed what Cassandra is, its strengths and weaknesses, and the column family data model.
    It has a data model very different from the relational style.
    It needs users to rethink their data model.
    There is complexity at design time, which is a tradeoff for achieving higher scalability.
    Of course, Cassandra is not the solution for everything. It should be used when it makes sense for the use case.
    We discussed Cassandra integration with the WSO2 platform
    Carbon integration – how to run Cassandra that is integrated with WSO2 Carbon platform security model.
    Cassandra as a Service – how to use Cassandra as a Service from WSO2 Stratos Platform as a Service offering.
  • 28. References
    Apache Cassandra
    http://cassandra.apache.org
    Understanding Column family Model - http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
    Hector Client
    http://github.com/rantav/hector
    http://prettyprint.me/2010/08/06/hector-api-v2/
    Some Theory
    Lakshman, A. and Malik, P., Cassandra: A Decentralized Structured Storage System
    Chang, F. and Dean, J. and Ghemawat, S. and Hsieh, W.C. and Wallach, D.A. and Burrows, M. and Chandra, T. and Fikes, A. and Gruber, R.E., Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems (TOCS), 2008
  • 29. Questions?