Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Upcoming SlideShare
Loading in...5
×
 

Introduce Apache Cassandra - JavaTwo Taiwan, 2012

on

  • 2,106 views

 

Statistics

Views

Total Views
2,106
Views on SlideShare
2,106
Embed Views
0

Actions

Likes
0
Downloads
53
Comments
3

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

13 of 3 Post a comment

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Introduce Apache Cassandra - JavaTwo Taiwan, 2012 Introduce Apache Cassandra - JavaTwo Taiwan, 2012 Presentation Transcript

    • 美商優科無線 資深工程師 Boris Yen
    • 專家講座 B:淺談 Apache Cassandra
    • Outline• Cassandra vs SQL Server• Overview• Data in Cassandra• Data Partitioning• Data Replication• Data Consistency• Client Libraries
    • Cassandra vs SQL Server• Cassandra o More servers = More capacity. o The concerns of scaling is transparent to application. o No single point of failure. o Horizontal scale.• SQL Server o More power machine = More capacity. o Adding capacity requires manual labor from ops people and substantial downtime. o There would be limit on how big you could go. o Vertical scale, Moore’s law scaling
    • Overview• Features are coming from Dynamo and BigTable• Distributed o Data partitioned among all nodes• Extremely Scalable o Add new node = Add more capacity o Easy to add new node• Fault tolerant o All nodes are the same o Read/Write anywhere o Automatic Data replication• High Performance
    • Overviewhttp://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance http://www.cubrid.org/blog/dev-platform/nosql- benchmarking/ http://techblog.netflix.com/2011/11/benchmarking- cassandra-scalability-on.html
    • Data in Cassandra• Keyspace ~ Database in RDBMS• Column Family ~ Table in RDBMS Keyspace ColumnFamily { column: Phone, ID Addr Phone value: 09..., Key: Boris timestamp: 1000 1 ... Taiwan 09..... } timestamp is used to resolve conflict.
    • Data in Cassandra• Keyspace o Where the replication strategy and replication factor is defined. CREATE KEYSPACE keyspace_name WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor=2;• ColumnFamily CREATE COLUMNFAMILY user ( id uuid PRIMARY KEY, address text, userName text ) WITH comment= AND comparator=text AND read_repair_chance=0.100000 AND gc_grace_seconds=864000 AND default_validation=text AND min_compaction_threshold=4 AND max_compaction_threshold=32 AND replicate_on_write=True AND compaction_strategy_class=SizeTieredCompactionStrategy AND compression_parameters:sstable_compression=org.apache.cassandra.io.compress.SnappyCompres sor;
    • Data in Cassandra• Commit log o Used to capture write activities. Data durability is assured.• Memtable o Used to store most recent write activities.• SSTable o When a memtable got flushed to disk, it becomes a sstable.
    • Data Read/Write• Write Data Commitlog Memtable Flushed SSTable• Read o Search Row cache, if the result is not empty, then return the result. No further actions are needed. o If no hit in the Row cache. Try to get data from Memtable(s) and SSTable(s) that might contain requested key. Collate the results and return.
    • Data Compaction t2 > t1 Boris:{ name: boris (t1)sstable1 phone: 092xxx (t1) addr: tainan (t1) } Boris:{ addr: tainan (t1) email: y@gmail (t2) sstableX name: boris.yen (t2) Boris:{ phone: 092xxx (t1) name: boris.yen (t2) sex: male (t2)sstable2 sex: male (t2) email: y@gmail (t2) } } . . . .
    • Data Partitioning• The total data managed by the cluster is represented as a circular space or ring.• Before a node can join the ring, it must be assigned a token.• The token determines the node’s position on the ring and the range of data it is responsible for.• Partitioning strategy o Random Partitioning  Default and Recommended o Order Partitioning  Sequential writes can cause hot spots  More administrative overhead to load balance the cluster
    • Data Partitioning Random Partitioning t1 hash(k2) hash(k1)Data: k1 t5 t2 Data: k3 hash(k4) hash(k3) t4 t3
    • Data Replication• To ensure fault tolerance and no single point of failure.• Replication is controlled by the parameters replication factor and replication strategy of a keyspace.• Replication factor controls how many copies of a row should be stored in the cluster• Replication strategy controls how the data being replicated.
    • Data Replication Random Partitioning t1 RF=3 hash(k1)Data: k1 t5 t2 coordinator t4 t3
    • Data Consistency• Cassandra supports tunable data consistency.• Choose from strong and eventual consistency depending on the need.• Can be done on a per-operation basis, and for both reads and writes.• Handles multi-data center operations
    • Consistency Level Write Read Any One One Quorum QuorumLocal_Quorum Local_QuorumEach_Quorum Each_Quorum All All
    • Built-in Consistency Repair Features• Read Repair• Hinted Handoff• Anti-Entropy Node Repairhttp://www.datastax.com/docs/0.8/dml/data_consistency#builtin-consistency
    • Client Library for Java• Hector o https://github.com/hector-client/hector.git o https://github.com/hector-client/hector/wiki/User- Guide• Astyanax o https://github.com/Netflix/astyanax.git• CQL + JDBC o http://code.google.com/a/apache- extras.org/p/cassandra-jdbc/
    • Hector• High level, simple object oriented interface to cassandra• Failover behavior on the client side• Connection pooling for improved performance and scalability• Automatic retry of downed hosts...
    • Hector// slice querySliceQuery<String, String> q = HFactory.createSliceQuery(ko, se, se, se);q.setColumnFamily(cf).setKey("jsmith").setColumnNames("first", "last","middle");Result<ColumnSlice<String, String>> r = q.execute();// multi-getMultigetSliceQuery<String, String, String> multigetSliceQuery = HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer,stringSerializer);multigetSliceQuery.setColumnFamily("Standard1");multigetSliceQuery.setKeys("fake_key_0", "fake_key_1", "fake_key_2", "fake_key_3", "fake_key_4");multigetSliceQuery.setRange("", "", false, 3);Result<Rows<String, String, String>> result = multigetSliceQuery.execute();// batch operationMutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);mutator.addInsertion("jsmith", "Standard1",HFactory.createStringColumn("first", "John")).addInsertion("jsmith","Standard1", HFactory.createStringColumn("last","Smith")).addInsertion("jsmith", "Standard1",HFactory.createStringColumn("middle", "Q"));mutator.execute();https://github.com/hector-client/hector/wiki/User-Guide
    • CQL+JDBCClass.forName("org.apache.cassandra.cql.jdbc.CassandraDriver"); String URL = String.format("jdbc:cassandra://%s:%d/%s",HOST,PORT,"system"); System.out.println("Connection URL = "+URL +""); con = DriverManager.getConnection(URL); Statement stmt = con.createStatement();// Create KeySpaceString createKS = String.format("CREATE KEYSPACE %s WITH strategy_class =SimpleStrategy AND strategy_options:replication_factor = 1;",KEYSPACE);stmt.execute(createKS);// Create the target Column family String createCF = "CREATE COLUMNFAMILY RegressionTest (keyname text PRIMARYKEY,” + "bValue boolean, “+ "iValue int “+ ") WITH comparator = ascii AND default_validation =bigint;"; stmt.execute(createCF);https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
    • CQL+JDBCStatement statement = con.createStatement();String truncate = "TRUNCATE RegressionTest;";statement.execute(truncate);String insert1 = "INSERT INTO RegressionTest (keyname,bValue,iValue) VALUES (key0,true,2000);";statement.executeUpdate(insert1);String insert2 = "INSERT INTO RegressionTest (keyname,bValue) VALUES( key1,false);";statement.executeUpdate(insert2);String select = "SELECT * from RegressionTest;";ResultSet result = statement.executeQuery(select);ResultSetMetaData metadata = result.getMetaData();...https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
    • Useful Tools• cassandra-cli o <cassandra-dir>/bin o http://www.datastax.com/docs/1.0/dml/using_cli• cqlsh o <cassandra-dir>/bin o http://www.datastax.com/docs/1.0/references/cql/index• nodetool o <cassandra-dir>/bin o http://www.datastax.com/docs/1.0/references/nodetool• stress o <cassandra-dir>/tools/bin o http://www.datastax.com/docs/1.0/references/stress_java
    • Useful Tools• OpsCenter o http://www.datastax.com/products/opscenter• sstableloader o <cassandra-dir>/bin o http://www.datastax.com/dev/blog/bulk-loading• More tools http://en.wikipedia.org/wiki/Apache_Cassandra#Tools _for_Cassandra
    • Questions?