Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012 Presentation Transcript

  • 1. Boris Yen, Senior Engineer, Ruckus Wireless
  • 2. Expert Session B: A Brief Introduction to Apache Cassandra
  • 3. Outline
      • Cassandra vs SQL Server
      • Overview
      • Data in Cassandra
      • Data Partitioning
      • Data Replication
      • Data Consistency
      • Client Libraries
  • 4. Cassandra vs SQL Server
      • Cassandra
          o More servers = more capacity.
          o Scaling concerns are transparent to the application.
          o No single point of failure.
          o Horizontal scaling.
      • SQL Server
          o More powerful machine = more capacity.
          o Adding capacity requires manual work from operations staff and substantial downtime.
          o There is a limit to how big you can go.
          o Vertical scaling (Moore's law scaling).
  • 5. Overview
      • Features are drawn from Dynamo and BigTable
      • Distributed
          o Data is partitioned among all nodes
      • Extremely scalable
          o Adding a node = adding capacity
          o Adding a node is easy
      • Fault tolerant
          o All nodes are the same
          o Read/write anywhere
          o Automatic data replication
      • High performance
  • 6. Overview
      • http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
      • http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
      • http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 7. Data in Cassandra
      • Keyspace ~ Database in RDBMS
      • Column Family ~ Table in RDBMS
      • [Diagram: a Keyspace containing a ColumnFamily. The row with key "Boris" has the columns ID (1 ...), Addr (Taiwan), and Phone (09.....).]
      • Each column is stored as { column: Phone, value: 09..., timestamp: 1000 }
      • The timestamp is used to resolve conflicts.
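The timestamp rule on this slide can be sketched in a few lines of Java. This is only an illustration of "the newest timestamp wins": the class `TimestampedRow` and its methods are hypothetical names, not Cassandra classes, and a tie keeps the stored value here, which simplifies what a real node does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of timestamp-based conflict resolution; not Cassandra's own code. */
class TimestampedRow {
    /** A column value together with the timestamp of the write that produced it. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    /** Stores the incoming value only if it is strictly newer than the stored one. */
    void write(String column, String value, long timestamp) {
        Cell current = cells.get(column);
        if (current == null || timestamp > current.timestamp) {
            cells.put(column, new Cell(value, timestamp));
        }
    }

    /** Returns the winning value for a column, or null if it was never written. */
    String read(String column) {
        Cell cell = cells.get(column);
        return cell == null ? null : cell.value;
    }
}
```

Because the decision depends only on the timestamps, two replicas that receive the same writes in a different order end up with the same value.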
  • 8. Data in Cassandra
      • Keyspace
          o Where the replication strategy and replication factor are defined.

            CREATE KEYSPACE keyspace_name
              WITH strategy_class = SimpleStrategy
              AND strategy_options:replication_factor = 2;

      • ColumnFamily

            CREATE COLUMNFAMILY user (
              id uuid PRIMARY KEY,
              address text,
              userName text
            ) WITH comment=''
              AND comparator=text
              AND read_repair_chance=0.100000
              AND gc_grace_seconds=864000
              AND default_validation=text
              AND min_compaction_threshold=4
              AND max_compaction_threshold=32
              AND replicate_on_write=True
              AND compaction_strategy_class=SizeTieredCompactionStrategy
              AND compression_parameters:sstable_compression=org.apache.cassandra.io.compress.SnappyCompressor;
  • 9. Data in Cassandra
      • Commit log
          o Captures every write, so data durability is assured.
      • Memtable
          o Holds the most recent writes in memory.
      • SSTable
          o When a memtable is flushed to disk, it becomes an SSTable.
  • 10. Data Read/Write
      • Write
          o Data -> Commit log -> Memtable -> (flushed) -> SSTable
      • Read
          o Search the row cache. If there is a hit, return the result; no further action is needed.
          o If there is no hit in the row cache, get the data from the memtable(s) and SSTable(s) that might contain the requested key, collate the results, and return them.
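The write and read paths above can be sketched as a toy in-memory store. Everything here is an assumption made for illustration: `TinyStore`, its `flushThreshold` parameter, and the use of plain maps for SSTables are hypothetical, the row cache is left out, and reads simply prefer the newest table instead of collating by timestamp.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the commit log -> memtable -> SSTable flow; not Cassandra's own code. */
class TinyStore {
    private final int flushThreshold;
    private final List<String> commitLog = new ArrayList<>();
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Flushed, read-only tables; the newest one is at the head of the deque.
    private final Deque<Map<String, String>> sstables = new ArrayDeque<>();

    TinyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. record the write for durability
        memtable.put(key, value);          // 2. apply it to the in-memory table
        if (memtable.size() >= flushThreshold) {
            flush();                       // 3. a full memtable becomes an SSTable
        }
    }

    private void flush() {
        sstables.addFirst(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    /** Checks the memtable first, then the SSTables from newest to oldest. */
    String read(String key) {
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        for (Map<String, String> sstable : sstables) {
            if (sstable.containsKey(key)) {
                return sstable.get(key);
            }
        }
        return null;
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

Writes never modify a flushed table, which is why a read may have to consult several tables for one key.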
  • 11. Data Compaction (t2 > t1)
      • sstable1
          Boris: { name: boris (t1), phone: 092xxx (t1), addr: tainan (t1) }
      • sstable2
          Boris: { name: boris.yen (t2), sex: male (t2), email: y@gmail (t2) }
      • sstableX (result of compacting sstable1 and sstable2)
          Boris: { addr: tainan (t1), email: y@gmail (t2), name: boris.yen (t2), phone: 092xxx (t1), sex: male (t2) }
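The merge shown on this slide can be sketched as follows: for each column, keep the version with the larger timestamp. `CompactionSketch` and `Cell` are hypothetical names for illustration, the timestamps 1 and 2 stand in for t1 and t2, and real compaction also handles deletions and on-disk files, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of merging row versions during compaction; not Cassandra's own code. */
class CompactionSketch {
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return value + " (t" + timestamp + ")";
        }
    }

    /** Merges several versions of one row, keeping the newest cell per column. */
    static Map<String, Cell> compact(List<Map<String, Cell>> versions) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> version : versions) {
            for (Map.Entry<String, Cell> entry : version.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (older, newer) -> newer.timestamp > older.timestamp ? newer : older);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> sstable1 = new HashMap<>();
        sstable1.put("name", new Cell("boris", 1));
        sstable1.put("phone", new Cell("092xxx", 1));
        sstable1.put("addr", new Cell("tainan", 1));

        Map<String, Cell> sstable2 = new HashMap<>();
        sstable2.put("name", new Cell("boris.yen", 2));
        sstable2.put("sex", new Cell("male", 2));
        sstable2.put("email", new Cell("y@gmail", 2));

        List<Map<String, Cell>> inputs = new ArrayList<>();
        inputs.add(sstable1);
        inputs.add(sstable2);
        compact(inputs).forEach((column, cell) -> System.out.println(column + ": " + cell));
    }
}
```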
  • 12. Data Partitioning
      • The total data managed by the cluster is represented as a circular space, or ring.
      • Before a node can join the ring, it must be assigned a token.
      • The token determines the node's position on the ring and the range of data it is responsible for.
      • Partitioning strategy
          o Random partitioning
              - Default and recommended
          o Ordered partitioning
              - Sequential writes can cause hot spots
              - More administrative overhead to load-balance the cluster
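  • 13. Data Partitioning
      • [Diagram: random partitioning on a ring of five nodes with tokens t1 to t5. The keys k1 to k4 are placed on the ring at hash(k1) to hash(k4); the diagram labels where "Data: k1" and "Data: k3" are stored.]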
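A minimal sketch of the ring lookup, under stated assumptions: the class `TokenRing` and its methods are hypothetical, the key is hashed with MD5 into an unsigned 128-bit number (a simplification of what the real random partitioner does), and each node owns the range that ends at its token, wrapping around at the end of the ring.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of token-ring lookup with hashed keys; not Cassandra's own code. */
class TokenRing {
    private final TreeMap<BigInteger, String> nodesByToken = new TreeMap<>();

    void addNode(String name, BigInteger token) {
        nodesByToken.put(token, name);
    }

    /** Hashes a row key to a non-negative token (assumption: MD5, unsigned 128 bits). */
    static BigInteger hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** Returns the first node whose token is >= the given token, wrapping around the ring. */
    String ownerOfToken(BigInteger token) {
        if (nodesByToken.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = nodesByToken.ceilingEntry(token);
        if (entry == null) {
            entry = nodesByToken.firstEntry();
        }
        return entry.getValue();
    }

    String ownerOf(String key) {
        return ownerOfToken(hash(key));
    }
}
```

Hashing spreads sequential keys around the ring, which is why random partitioning avoids the hot spots mentioned for ordered partitioning.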
  • 14. Data Replication
      • Replication ensures fault tolerance and no single point of failure.
      • Replication is controlled by two parameters of a keyspace: the replication factor and the replication strategy.
      • The replication factor controls how many copies of a row are stored in the cluster.
      • The replication strategy controls how the data is replicated.
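  • 15. Data Replication
      • [Diagram: random partitioning on a ring of five nodes with tokens t1 to t5 and RF=3. One node acts as coordinator for "Data: k1", which is placed on the ring at hash(k1).]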
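One way to picture RF=3 with SimpleStrategy is to start at the node that owns the key's token and continue clockwise until enough replicas are chosen. The sketch below assumes exactly that; `SimpleReplicaPlacement`, the `long` tokens, and the node names are hypothetical, and data-center-aware strategies are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of clockwise replica placement on a token ring; not Cassandra's own code. */
class SimpleReplicaPlacement {
    /**
     * Returns up to rf nodes for a key token: the owner of the token first,
     * then the next nodes clockwise. Assumes one token per node.
     */
    static List<String> replicasFor(TreeMap<Long, String> ring, long keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(rf, ring.size());
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();       // wrap around the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();   // wrap around the ring
            }
        }
        return replicas;
    }
}
```

Any node can act as coordinator for a request; it only needs the ring to work out which nodes hold the replicas.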
  • 16. Data Consistency
      • Cassandra supports tunable data consistency.
      • Choose between strong and eventual consistency, depending on the need.
      • The consistency level can be set per operation, for both reads and writes.
      • Handles multi-data-center operations.
  • 17. Consistency Level

        Write           Read
        Any             -
        One             One
        Quorum          Quorum
        Local_Quorum    Local_Quorum
        Each_Quorum     Each_Quorum
        All             All
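The trade-off between these levels can be checked with a little arithmetic. The sketch below assumes the usual definitions: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when the number of replicas written plus the number read exceeds the replication factor. `ConsistencyMath` and its method names are hypothetical.

```java
/** Hypothetical helper for reasoning about consistency levels; not part of any Cassandra API. */
class ConsistencyMath {
    /** Number of replicas that form a majority for the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * True when every read set must share at least one replica with every write set,
     * that is, when writeReplicas + readReplicas > replicationFactor.
     */
    static boolean readsOverlapWrites(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("Quorum for RF=" + rf + ": " + q);
        System.out.println("Quorum write + Quorum read overlap: " + readsOverlapWrites(q, q, rf));
        System.out.println("One write + One read overlap: " + readsOverlapWrites(1, 1, rf));
    }
}
```

With RF=3, writing and reading at Quorum gives strong consistency, while writing and reading at One gives only eventual consistency.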
  • 18. Built-in Consistency Repair Features
      • Read Repair
      • Hinted Handoff
      • Anti-Entropy Node Repair
      http://www.datastax.com/docs/0.8/dml/data_consistency#builtin-consistency
  • 19. Client Library for Java
      • Hector
          o https://github.com/hector-client/hector.git
          o https://github.com/hector-client/hector/wiki/User-Guide
      • Astyanax
          o https://github.com/Netflix/astyanax.git
      • CQL + JDBC
          o http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
  • 20. Hector
      • High-level, simple object-oriented interface to Cassandra
      • Failover behavior on the client side
      • Connection pooling for improved performance and scalability
      • Automatic retry of downed hosts
      • ...
  • 21. Hector

        // slice query (ko is presumably the Keyspace, se a StringSerializer, cf the column family name)
        SliceQuery<String, String, String> q = HFactory.createSliceQuery(ko, se, se, se);
        q.setColumnFamily(cf).setKey("jsmith").setColumnNames("first", "last", "middle");
        QueryResult<ColumnSlice<String, String>> r = q.execute();

        // multi-get
        MultigetSliceQuery<String, String, String> multigetSliceQuery =
            HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
        multigetSliceQuery.setColumnFamily("Standard1");
        multigetSliceQuery.setKeys("fake_key_0", "fake_key_1", "fake_key_2", "fake_key_3", "fake_key_4");
        multigetSliceQuery.setRange("", "", false, 3);
        QueryResult<Rows<String, String, String>> result = multigetSliceQuery.execute();

        // batch operation
        Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
        mutator.addInsertion("jsmith", "Standard1", HFactory.createStringColumn("first", "John"))
               .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("last", "Smith"))
               .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("middle", "Q"));
        mutator.execute();

      https://github.com/hector-client/hector/wiki/User-Guide
  • 22. CQL+JDBC

        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        String URL = String.format("jdbc:cassandra://%s:%d/%s", HOST, PORT, "system");
        System.out.println("Connection URL = " + URL);
        con = DriverManager.getConnection(URL);
        Statement stmt = con.createStatement();

        // Create KeySpace
        String createKS = String.format(
            "CREATE KEYSPACE %s WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 1;",
            KEYSPACE);
        stmt.execute(createKS);

        // Create the target Column family
        String createCF = "CREATE COLUMNFAMILY RegressionTest (keyname text PRIMARY KEY,"
            + "bValue boolean, "
            + "iValue int "
            + ") WITH comparator = ascii AND default_validation = bigint;";
        stmt.execute(createCF);

      https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
  • 23. CQL+JDBC

        Statement statement = con.createStatement();

        String truncate = "TRUNCATE RegressionTest;";
        statement.execute(truncate);

        String insert1 = "INSERT INTO RegressionTest (keyname,bValue,iValue) VALUES (key0,true,2000);";
        statement.executeUpdate(insert1);

        String insert2 = "INSERT INTO RegressionTest (keyname,bValue) VALUES( key1,false);";
        statement.executeUpdate(insert2);

        String select = "SELECT * from RegressionTest;";
        ResultSet result = statement.executeQuery(select);
        ResultSetMetaData metadata = result.getMetaData();
        ...

      https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
  • 24. Useful Tools
      • cassandra-cli
          o <cassandra-dir>/bin
          o http://www.datastax.com/docs/1.0/dml/using_cli
      • cqlsh
          o <cassandra-dir>/bin
          o http://www.datastax.com/docs/1.0/references/cql/index
      • nodetool
          o <cassandra-dir>/bin
          o http://www.datastax.com/docs/1.0/references/nodetool
      • stress
          o <cassandra-dir>/tools/bin
          o http://www.datastax.com/docs/1.0/references/stress_java
  • 25. Useful Tools
      • OpsCenter
          o http://www.datastax.com/products/opscenter
      • sstableloader
          o <cassandra-dir>/bin
          o http://www.datastax.com/dev/blog/bulk-loading
      • More tools
          o http://en.wikipedia.org/wiki/Apache_Cassandra#Tools_for_Cassandra
  • 26. Questions?