Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Published in: Technology
  • 1. Boris Yen, Senior Engineer, Ruckus Wireless (美商優科無線)
  • 2. Expert Session B: A Brief Introduction to Apache Cassandra
  • 3. Outline
      • Cassandra vs SQL Server
      • Overview
      • Data in Cassandra
      • Data Partitioning
      • Data Replication
      • Data Consistency
      • Client Libraries
  • 4. Cassandra vs SQL Server
      • Cassandra
          o More servers = more capacity.
          o Scaling concerns are transparent to the application.
          o No single point of failure.
          o Horizontal scaling.
      • SQL Server
          o More powerful machine = more capacity.
          o Adding capacity requires manual labor from ops people and substantial downtime.
          o There is a limit to how big you can go.
          o Vertical scaling, Moore’s law scaling.
  • 5. Overview
      • Features derive from Dynamo and BigTable
      • Distributed
          o Data is partitioned among all nodes
      • Extremely scalable
          o Add a new node = add more capacity
          o Easy to add a new node
      • Fault tolerant
          o All nodes are the same
          o Read/write anywhere
          o Automatic data replication
      • High performance
  • 6. Overview
      [Figure not preserved; its truncated source link reads: benchmarking/ cassandra-scalability-on.html]
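  • 7. Data in Cassandra
      • Keyspace ~ Database in RDBMS
      • Column Family ~ Table in RDBMS
      [Figure: a Keyspace containing a ColumnFamily. The table view has the columns ID, Addr, Phone with a row 1, ..., Taiwan, 09..... The same row in Cassandra has Key: Boris and stores each column as { column: Phone, value: 09..., timestamp: 1000 }.]
      • The timestamp is used to resolve conflicts.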
  • 8. Data in Cassandra
      • Keyspace
          o Where the replication strategy and replication factor are defined.

            CREATE KEYSPACE keyspace_name
              WITH strategy_class = SimpleStrategy
              AND strategy_options:replication_factor = 2;

      • ColumnFamily

            CREATE COLUMNFAMILY user (
              id uuid PRIMARY KEY,
              address text,
              userName text
            ) WITH comment=''
              AND comparator=text
              AND read_repair_chance=0.100000
              AND gc_grace_seconds=864000
              AND default_validation=text
              AND min_compaction_threshold=4
              AND max_compaction_threshold=32
              AND replicate_on_write=True
              AND compaction_strategy_class=SizeTieredCompactionStrategy
              AND sor;
  • 9. Data in Cassandra
      • Commit log
          o Captures every write so that data durability is assured.
      • Memtable
          o Stores the most recent writes in memory.
      • SSTable
          o When a memtable is flushed to disk, it becomes an SSTable.
  • 10. Data Read/Write
      • Write
          o Data → Commit log → Memtable → (flushed) → SSTable
      • Read
          o Search the row cache. If the result is not empty, return it; no further action is needed.
          o If there is no hit in the row cache, read from the memtable(s) and SSTable(s) that might contain the requested key, then collate the results and return.
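The write and read paths above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not Cassandra's implementation: the class `ToyStore`, its `flushThreshold` parameter, and the "newest table wins" read are all hypothetical simplifications (a real node collates columns by timestamp and keeps the commit log and SSTables on disk).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the write/read path; all names here are hypothetical. */
class ToyStore {
    private final List<String> commitLog = new ArrayList<>();              // append-only record of writes
    private final Map<String, String> rowCache = new HashMap<>();          // cache of previously read rows
    private final List<Map<String, String>> sstables = new ArrayList<>();  // flushed tables, oldest first
    private TreeMap<String, String> memtable = new TreeMap<>();            // most recent writes, sorted by key
    private final int flushThreshold;                                      // assumed flush trigger (row count)

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. log the write for durability
        memtable.put(key, value);          // 2. apply it to the memtable
        rowCache.remove(key);              //    drop any stale cached copy
        if (memtable.size() >= flushThreshold) {
            sstables.add(memtable);        // 3. a full memtable becomes an SSTable
            memtable = new TreeMap<>();
        }
    }

    String read(String key) {
        String cached = rowCache.get(key);
        if (cached != null) {
            return cached;                 // row cache hit: nothing else to do
        }
        String value = memtable.get(key);  // newest data first
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key);  // then SSTables, newest to oldest
        }
        if (value != null) {
            rowCache.put(key, value);
        }
        return value;
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

With `new ToyStore(2)`, the second write triggers a flush, so a later read may be served from the memtable, from an SSTable, or from the row cache depending on where the newest copy lives.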
  • 11. Data Compaction
      [Figure, with t2 > t1: sstable1 and sstable2 are compacted into sstableX.]
      • sstable1
          Boris: { name: boris (t1), phone: 092xxx (t1), addr: tainan (t1) }
      • sstable2
          Boris: { name: boris.yen (t2), sex: male (t2), email: y@gmail (t2) }
      • sstableX (result)
          Boris: { addr: tainan (t1), email: y@gmail (t2), name: boris.yen (t2), phone: 092xxx (t1), sex: male (t2) }
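The compaction slide can be read as a per-column merge in which the newer timestamp wins. The sketch below illustrates that rule for a single row; `CompactionSketch` and `Cell` are hypothetical names, and real compaction also handles tombstones, many rows, and on-disk files, none of which is modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative merge of one row across SSTables; not Cassandra's own code. */
class CompactionSketch {

    /** A column value together with its write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges the columns of one row from several SSTables; per column, the newest timestamp wins. */
    static Map<String, Cell> compact(List<Map<String, Cell>> rowVersions) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> version : rowVersions) {
            for (Map.Entry<String, Cell> column : version.entrySet()) {
                merged.merge(column.getKey(), column.getValue(),
                        (existing, incoming) ->
                                incoming.timestamp > existing.timestamp ? incoming : existing);
            }
        }
        return merged;
    }
}
```

Feeding it the slide's two versions of the row Boris yields five columns, with `name` taken from the t2 write and `phone` and `addr` kept from t1.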
  • 12. Data Partitioning
      • The total data managed by the cluster is represented as a circular space, or ring.
      • Before a node can join the ring, it must be assigned a token.
      • The token determines the node’s position on the ring and the range of data it is responsible for.
      • Partitioning strategy
          o Random partitioning
              - Default and recommended
          o Ordered partitioning
              - Sequential writes can cause hot spots
              - More administrative overhead to load-balance the cluster
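  • 13. Data Partitioning
      [Figure: Random Partitioning. A ring of five nodes with tokens t1 to t5. Keys k1 to k4 are placed on the ring at hash(k1) to hash(k4), and each key's data (shown for k1 and k3) is stored on the node responsible for that position.]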
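One way to picture the ring lookup is a sorted map from token to node: a key belongs to the first node whose token is at or after the key's hashed position, wrapping around at the end. The sketch below assumes that rule and uses MD5 in the spirit of random partitioning; `TokenRing` and its methods are hypothetical names, and the token range is not the one a real cluster uses.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative token ring; not Cassandra's partitioner classes. */
class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>(); // token -> node name

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    /** Hashes a key to a non-negative integer position using MD5. */
    static BigInteger hash(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the node owning a position: first token >= position, wrapping to the lowest token. */
    String ownerOf(BigInteger position) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(position);
        if (entry == null) {
            entry = ring.firstEntry(); // past the highest token: wrap around
        }
        return entry.getValue();
    }

    /** Convenience overload: hash the key, then look up its owner. */
    String ownerOf(String key) {
        return ownerOf(hash(key));
    }
}
```

Because the hash spreads keys evenly over the ring, evenly spaced tokens give each node a similar share of the data, which is the reason random partitioning avoids the hot spots mentioned for ordered partitioning.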
  • 14. Data Replication
      • Replication ensures fault tolerance and no single point of failure.
      • Replication is controlled by two keyspace parameters: the replication factor and the replication strategy.
      • The replication factor controls how many copies of a row are stored in the cluster.
      • The replication strategy controls how the data is replicated.
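  • 15. Data Replication
      [Figure: Random Partitioning with RF=3. A ring of five nodes with tokens t1 to t5; one node acts as coordinator for a request on key k1, which is placed at hash(k1), and the data for k1 is stored on three nodes.]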
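A minimal sketch of replica placement, assuming the simple rule that the first replica goes to the node owning the key's position and the remaining replicas go to the next nodes clockwise on the ring. `ReplicaPlacement` and `replicasFor` are hypothetical names, tokens are plain `long` values for readability, and data-center or rack awareness is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Illustrative clockwise replica selection; not Cassandra's strategy classes. */
class ReplicaPlacement {

    /**
     * Returns the nodes that hold a key, starting at the owner of keyToken
     * and walking clockwise until replicationFactor distinct nodes are found.
     */
    static List<String> replicasFor(long keyToken, TreeMap<Long, String> ring, int replicationFactor) {
        // Nodes at or after the key's token, followed by the nodes before it (wrap-around).
        List<String> clockwise = new ArrayList<>(ring.tailMap(keyToken, true).values());
        clockwise.addAll(ring.headMap(keyToken, false).values());

        // There cannot be more replicas than nodes.
        int wanted = Math.min(replicationFactor, clockwise.size());
        return new ArrayList<>(clockwise.subList(0, wanted));
    }
}
```

On a five-node ring with tokens 100 to 500 and RF=3, a key at position 450 lands on the node with token 500 and then wraps to the nodes with tokens 100 and 200.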
  • 16. Data Consistency
      • Cassandra supports tunable data consistency.
      • Choose between strong and eventual consistency, depending on the need.
      • The level can be set per operation, for both reads and writes.
      • Handles multi-data-center operations.
  • 17. Consistency Level

        Write           Read
        Any             -
        One             One
        Quorum          Quorum
        Local_Quorum    Local_Quorum
        Each_Quorum     Each_Quorum
        All             All
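The trade-off between strong and eventual consistency can be made concrete with a little arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The helper below is a sketch of that rule only; `ConsistencyMath` and its method names are hypothetical and do not come from any client library.

```java
/** Illustrative consistency arithmetic; names are hypothetical. */
class ConsistencyMath {

    /** Number of replicas that form a quorum (a strict majority) for a replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set: R + W > RF. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("RF=" + rf + ", quorum=" + q);
        System.out.println("Quorum/Quorum overlaps: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("One/One overlaps: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

Under this rule, writing and reading at Quorum with RF=3 overlaps (2 + 2 > 3), whereas One/One does not (1 + 1 ≤ 3) and relies on the repair features to converge.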
  • 18. Built-in Consistency Repair Features
      • Read Repair
      • Hinted Handoff
      • Anti-Entropy Node Repair
  • 19. Client Library for Java
      • Hector
          o Guide
      • Astyanax
      • CQL + JDBC
  • 20. Hector
      • High-level, simple object-oriented interface to Cassandra
      • Failover behavior on the client side
      • Connection pooling for improved performance and scalability
      • Automatic retry of downed hosts...
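  • 21. Hector

        import me.prettyprint.hector.api.beans.ColumnSlice;
        import me.prettyprint.hector.api.beans.Rows;
        import me.prettyprint.hector.api.factory.HFactory;
        import me.prettyprint.hector.api.mutation.Mutator;
        import me.prettyprint.hector.api.query.MultigetSliceQuery;
        import me.prettyprint.hector.api.query.QueryResult;
        import me.prettyprint.hector.api.query.SliceQuery;

        // ko / keyspace (Keyspace), se / stringSerializer (StringSerializer)
        // and cf (column family name) are assumed to be defined elsewhere.

        // slice query: read three named columns of one row
        SliceQuery<String, String, String> q = HFactory.createSliceQuery(ko, se, se, se);
        q.setColumnFamily(cf).setKey("jsmith").setColumnNames("first", "last", "middle");
        QueryResult<ColumnSlice<String, String>> r = q.execute();

        // multi-get: read up to three columns from each of several rows
        MultigetSliceQuery<String, String, String> multigetSliceQuery =
            HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
        multigetSliceQuery.setColumnFamily("Standard1");
        multigetSliceQuery.setKeys("fake_key_0", "fake_key_1", "fake_key_2", "fake_key_3", "fake_key_4");
        multigetSliceQuery.setRange("", "", false, 3);
        QueryResult<Rows<String, String, String>> result = multigetSliceQuery.execute();

        // batch operation: queue three insertions and send them together
        Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
        mutator.addInsertion("jsmith", "Standard1", HFactory.createStringColumn("first", "John"))
               .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("last", "Smith"))
               .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("middle", "Q"));
        mutator.execute();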
  • 22. CQL+JDBC

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        // HOST, PORT and KEYSPACE are assumed to be defined elsewhere.
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        String URL = String.format("jdbc:cassandra://%s:%d/%s", HOST, PORT, "system");
        System.out.println("Connection URL = " + URL);
        Connection con = DriverManager.getConnection(URL);
        Statement stmt = con.createStatement();

        // Create the keyspace
        String createKS = String.format(
            "CREATE KEYSPACE %s WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 1;",
            KEYSPACE);
        stmt.execute(createKS);

        // Create the target column family
        String createCF = "CREATE COLUMNFAMILY RegressionTest (keyname text PRIMARY KEY, "
            + "bValue boolean, "
            + "iValue int "
            + ") WITH comparator = ascii AND default_validation = bigint;";
        stmt.execute(createCF);
  • 23. CQL+JDBC

        Statement statement = con.createStatement();

        String truncate = "TRUNCATE RegressionTest;";
        statement.execute(truncate);

        String insert1 = "INSERT INTO RegressionTest (keyname,bValue,iValue) VALUES (key0,true,2000);";
        statement.executeUpdate(insert1);

        String insert2 = "INSERT INTO RegressionTest (keyname,bValue) VALUES (key1,false);";
        statement.executeUpdate(insert2);

        String select = "SELECT * from RegressionTest;";
        ResultSet result = statement.executeQuery(select);
        ResultSetMetaData metadata = result.getMetaData();
        ...
  • 24. Useful Tools
      • cassandra-cli
          o <cassandra-dir>/bin
      • cqlsh
          o <cassandra-dir>/bin
      • nodetool
          o <cassandra-dir>/bin
      • stress
          o <cassandra-dir>/tools/bin
  • 25. Useful Tools
      • OpsCenter
      • sstableloader
          o <cassandra-dir>/bin
      • More tools
          o _for_Cassandra
  • 26. Questions?