
Introduce Apache Cassandra - JavaTwo Taiwan, 2012


  1. Boris Yen, Senior Engineer, Ruckus Wireless (US)
  2. Expert Session B: A Brief Introduction to Apache Cassandra
  3. Outline
     • Cassandra vs SQL Server
     • Overview
     • Data in Cassandra
     • Data Partitioning
     • Data Replication
     • Data Consistency
     • Client Libraries
  4. Cassandra vs SQL Server
     • Cassandra
       o More servers = more capacity.
       o Scaling concerns are transparent to the application.
       o No single point of failure.
       o Scales horizontally.
     • SQL Server
       o A more powerful machine = more capacity.
       o Adding capacity requires manual work from operations staff and substantial downtime.
       o There is a limit to how big a single machine can get.
       o Scales vertically, following Moore's law.
  5. Overview
     • Features are derived from Dynamo and BigTable
     • Distributed
       o Data is partitioned among all nodes
     • Extremely scalable
       o Adding a node adds capacity
       o Adding a node is easy
     • Fault tolerant
       o All nodes are the same
       o Read/write anywhere
       o Automatic data replication
     • High performance
  6. Overview
     • http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
     • http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
     • http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  7. Data in Cassandra
     • Keyspace ~ database in an RDBMS
     • Column family ~ table in an RDBMS
     • Example row in a column family (inside a keyspace):

         Key      ID      Addr      Phone
         Boris    1 ...   Taiwan    09.....

     • Each column is stored with its name, value, and timestamp:

         { column: Phone, value: 09..., timestamp: 1000 }

     • The timestamp is used to resolve conflicts.
  8. Data in Cassandra
     • Keyspace
       o Where the replication strategy and replication factor are defined.

         CREATE KEYSPACE keyspace_name
           WITH strategy_class = 'SimpleStrategy'
           AND strategy_options:replication_factor = 2;

     • ColumnFamily

         CREATE COLUMNFAMILY user (
           id uuid PRIMARY KEY,
           address text,
           userName text
         ) WITH comment=''
           AND comparator=text
           AND read_repair_chance=0.100000
           AND gc_grace_seconds=864000
           AND default_validation=text
           AND min_compaction_threshold=4
           AND max_compaction_threshold=32
           AND replicate_on_write=True
           AND compaction_strategy_class='SizeTieredCompactionStrategy'
           AND compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
  9. Data in Cassandra
     • Commit log
       o Captures every write so that data durability is assured.
     • Memtable
       o In-memory structure that holds the most recent writes.
     • SSTable
       o When a memtable is flushed to disk, it becomes an SSTable.
  10. Data Read/Write
     • Write
       o Data -> commit log -> memtable -> (flushed) -> SSTable
     • Read
       o Search the row cache. On a hit, return the result; no further action is needed.
       o On a miss, read from the memtable(s) and SSTable(s) that might contain the requested key, collate the results, and return them.
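The write and read paths on slides 9 and 10 can be sketched as a toy, single-node, in-memory model. This is only an illustration under stated assumptions, not Cassandra's implementation: the class `WritePathSketch`, its methods, and the flush threshold are hypothetical names, the "SSTables" are just immutable maps held in memory, and the read path returns the newest value found instead of collating columns by timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of commit log -> memtable -> SSTable (hypothetical, not Cassandra code). */
final class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();             // append-only record of writes
    private TreeMap<String, String> memtable = new TreeMap<>();           // most recent writes, sorted by key
    private final List<Map<String, String>> sstables = new ArrayList<>(); // flushed tables, oldest first
    private final Map<String, String> rowCache = new HashMap<>();         // results of earlier reads
    private final int flushThreshold;                                     // assumed flush trigger: row count

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);   // 1. log the write for durability
        memtable.put(key, value);           // 2. apply it to the memtable
        rowCache.remove(key);               // drop any stale cached row
        if (memtable.size() >= flushThreshold) {
            // 3. a full memtable is frozen and becomes an SSTable
            sstables.add(Collections.unmodifiableMap(memtable));
            memtable = new TreeMap<>();
        }
    }

    String read(String key) {
        String cached = rowCache.get(key);
        if (cached != null) {
            return cached;                  // row cache hit: nothing else to do
        }
        String value = memtable.get(key);   // newest data first
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key); // then SSTables, newest to oldest
        }
        if (value != null) {
            rowCache.put(key, value);
        }
        return value;
    }

    int sstableCount() {
        return sstables.size();
    }

    int commitLogSize() {
        return commitLog.size();
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("k1", "a");
        node.write("k2", "b");   // second row reaches the threshold and triggers a flush
        node.write("k1", "c");   // newer value lives in the fresh memtable
        System.out.println(node.read("k1") + " " + node.read("k2") + " " + node.sstableCount());
        // prints "c b 1"
    }
}
```

The point of the sketch is the ordering: every write is logged before it is applied in memory, and reads consult the cheapest source first.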
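  11. Data Compaction
     • Compaction merges the fragments of a row held in several SSTables into one new SSTable. For each column, the value with the newest timestamp wins (t2 > t1).
     • sstable1
         Boris: { name: boris (t1), phone: 092xxx (t1), addr: tainan (t1) }
     • sstable2
         Boris: { name: boris.yen (t2), sex: male (t2), email: y@gmail (t2) }
     • sstableX (result of compacting sstable1 and sstable2)
         Boris: { addr: tainan (t1), email: y@gmail (t2), name: boris.yen (t2), phone: 092xxx (t1), sex: male (t2) }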
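The merge on slide 11 can be expressed as a small sketch that reuses the slide's own example row. The names `CompactionSketch`, `Cell`, and `compact` are hypothetical, timestamps are plain `long` values (t1 = 1, t2 = 2), and ties simply keep the first value seen, which is a simplification. Real compaction also handles deletions and on-disk files, which are left out here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy per-row compaction: for each column, the newest timestamp wins (hypothetical names). */
final class CompactionSketch {

    /** A column value tagged with the timestamp of the write that produced it. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges the fragments of one row found in several SSTables into a single row. */
    static Map<String, Cell> compact(List<Map<String, Cell>> fragments) {
        Map<String, Cell> merged = new TreeMap<>();   // sorted by column name
        for (Map<String, Cell> fragment : fragments) {
            for (Map.Entry<String, Cell> e : fragment.entrySet()) {
                Cell existing = merged.get(e.getKey());
                if (existing == null || e.getValue().timestamp > existing.timestamp) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long t1 = 1L;
        long t2 = 2L;

        Map<String, Cell> sstable1 = new HashMap<>();
        sstable1.put("name", new Cell("boris", t1));
        sstable1.put("phone", new Cell("092xxx", t1));
        sstable1.put("addr", new Cell("tainan", t1));

        Map<String, Cell> sstable2 = new HashMap<>();
        sstable2.put("name", new Cell("boris.yen", t2));
        sstable2.put("sex", new Cell("male", t2));
        sstable2.put("email", new Cell("y@gmail", t2));

        // Prints the five columns of sstableX in column-name order.
        for (Map.Entry<String, Cell> e : compact(Arrays.asList(sstable1, sstable2)).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().value + " (t" + e.getValue().timestamp + ")");
        }
    }
}
```

Because the winner is chosen by timestamp alone, the result does not depend on the order in which the SSTables are read.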
  12. Data Partitioning
     • The total data managed by the cluster is represented as a circular space, or ring.
     • Before a node can join the ring, it must be assigned a token.
     • The token determines the node's position on the ring and the range of data it is responsible for.
     • Partitioning strategy
       o Random partitioning
         ▪ Default and recommended
       o Ordered partitioning
         ▪ Sequential writes can cause hot spots
         ▪ More administrative overhead to load-balance the cluster
  13. Data Partitioning
     • Random partitioning (figure): a ring of five nodes with tokens t1 to t5. Keys k1 to k4 are positioned on the ring at hash(k1) to hash(k4); the labels "Data: k1" and "Data: k3" mark the nodes that store those keys.
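The ring lookup described on slides 12 and 13 can be sketched with a sorted map. The sketch assumes that a node owns the range ending at its own token, so a key belongs to the first node whose token is greater than or equal to the key's hash, wrapping around at the end of the ring. `TokenRingSketch` and its methods are hypothetical names, and the MD5-based `hash` only approximates what a random partitioner does.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: maps a key to the node responsible for it (hypothetical names). */
final class TokenRingSketch {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>(); // token -> node name

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    /** Assumed hash: MD5 of the key, read as a non-negative integer. */
    static BigInteger hash(String key) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(md5).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** First node at or after the token, wrapping to the start of the ring. */
    String ownerOfToken(BigInteger token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    String ownerOfKey(String key) {
        return ownerOfToken(hash(key));
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        BigInteger step = BigInteger.ONE.shiftLeft(127).divide(BigInteger.valueOf(5));
        for (int i = 1; i <= 5; i++) {
            // evenly spaced tokens t1..t5 over an assumed 0..2^127 token space
            ring.addNode(step.multiply(BigInteger.valueOf(i)), "node-t" + i);
        }
        for (String key : new String[] {"k1", "k2", "k3", "k4"}) {
            System.out.println(key + " -> " + ring.ownerOfKey(key));
        }
    }
}
```

Because keys are placed by hash, sequential keys scatter around the ring, which is why random partitioning avoids the hot spots listed for ordered partitioning.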
  14. Data Replication
     • Replication ensures fault tolerance and no single point of failure.
     • Replication is controlled by two keyspace parameters: the replication factor and the replication strategy.
     • The replication factor controls how many copies of a row are stored in the cluster.
     • The replication strategy controls how the data is replicated, that is, where the copies are placed.
  15. Data Replication
     • Random partitioning, RF=3 (figure): a ring of five nodes with tokens t1 to t5. A coordinator node handles the request for key k1, whose position on the ring is hash(k1), and the row (Data: k1) is stored on three nodes.
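A minimal sketch of replica placement for the RF=3 picture on slide 15, assuming the simplest strategy: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise on the ring. `ReplicaPlacementSketch` and `replicasFor` are hypothetical names, tokens are small `long` values for readability, and rack or data-center awareness is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy replica placement: owner of the token plus the next nodes clockwise (hypothetical names). */
final class ReplicaPlacementSketch {

    static List<String> replicasFor(TreeMap<Long, String> ring, long keyToken, int replicationFactor) {
        // Nodes in ring order, starting at the first token >= keyToken and wrapping around.
        List<String> clockwise = new ArrayList<>(ring.tailMap(keyToken, true).values());
        clockwise.addAll(ring.headMap(keyToken, false).values());

        List<String> replicas = new ArrayList<>();
        for (String node : clockwise) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            if (!replicas.contains(node)) {   // never place two copies on the same node
                replicas.add(node);
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(10L, "t1");
        ring.put(20L, "t2");
        ring.put(30L, "t3");
        ring.put(40L, "t4");
        ring.put(50L, "t5");

        System.out.println(replicasFor(ring, 15L, 3));   // prints [t2, t3, t4]
        System.out.println(replicasFor(ring, 45L, 3));   // prints [t5, t1, t2]
    }
}
```

If the replication factor exceeds the number of nodes, the sketch simply returns every node once.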
  16. Data Consistency
     • Cassandra supports tunable data consistency.
     • Choose between strong and eventual consistency depending on the need.
     • The consistency level can be set per operation, for both reads and writes.
     • Handles multi-data-center operations.
  17. Consistency Level

         Write           Read
         Any             -
         One             One
         Quorum          Quorum
         Local_Quorum    Local_Quorum
         Each_Quorum     Each_Quorum
         All             All
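The trade-off between strong and eventual consistency can be made concrete with a little arithmetic. A quorum is a majority of the replicas, floor(RF / 2) + 1, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas written (W) plus the number read (R) exceeds the replication factor (RF). The class and method names below are hypothetical; the sketch only does the counting and does not talk to a cluster.

```java
/** Toy consistency arithmetic for a replication factor RF (hypothetical names). */
final class ConsistencySketch {

    /** Number of replicas in a quorum: a strict majority of RF. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must overlap every write set, i.e. W + R > RF. */
    static boolean isStronglyConsistent(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for RF=3: " + q);                                      // prints 2
        System.out.println("Quorum/Quorum strong? " + isStronglyConsistent(q, q, rf));    // prints true
        System.out.println("One/One strong? " + isStronglyConsistent(1, 1, rf));          // prints false
        System.out.println("All/One strong? " + isStronglyConsistent(rf, 1, rf));         // prints true
    }
}
```

With RF=3, writing and reading at Quorum touches two replicas each, so at least one replica in every read has seen the latest write, while One/One trades that guarantee for lower latency.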
  18. Built-in Consistency Repair Features
     • Read repair
     • Hinted handoff
     • Anti-entropy node repair
     Source: http://www.datastax.com/docs/0.8/dml/data_consistency#builtin-consistency
  19. Client Library for Java
     • Hector
       o https://github.com/hector-client/hector.git
       o https://github.com/hector-client/hector/wiki/User-Guide
     • Astyanax
       o https://github.com/Netflix/astyanax.git
     • CQL + JDBC
       o http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
  20. Hector
     • High-level, simple, object-oriented interface to Cassandra
     • Failover behavior on the client side
     • Connection pooling for improved performance and scalability
     • Automatic retry of downed hosts
     • ...
  21. Hector

         // Slice query: read three named columns from the row with key "jsmith".
         // ko is the Keyspace, se a string serializer, and cf the column family name.
         SliceQuery<String, String> q = HFactory.createSliceQuery(ko, se, se, se);
         q.setColumnFamily(cf)
          .setKey("jsmith")
          .setColumnNames("first", "last", "middle");
         Result<ColumnSlice<String, String>> r = q.execute();

         // Multi-get: read the first 3 columns from each of five rows.
         MultigetSliceQuery<String, String, String> multigetSliceQuery =
             HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
         multigetSliceQuery.setColumnFamily("Standard1");
         multigetSliceQuery.setKeys("fake_key_0", "fake_key_1", "fake_key_2", "fake_key_3", "fake_key_4");
         multigetSliceQuery.setRange("", "", false, 3);
         Result<Rows<String, String, String>> result = multigetSliceQuery.execute();

         // Batch operation: queue three insertions for one row, then send them together.
         Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
         mutator.addInsertion("jsmith", "Standard1", HFactory.createStringColumn("first", "John"))
                .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("last", "Smith"))
                .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("middle", "Q"));
         mutator.execute();

     Source: https://github.com/hector-client/hector/wiki/User-Guide
  22. CQL + JDBC

         // Needs java.sql.Connection, java.sql.DriverManager, and java.sql.Statement.
         // HOST, PORT, and KEYSPACE are constants defined elsewhere in the test class.
         Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
         String URL = String.format("jdbc:cassandra://%s:%d/%s", HOST, PORT, "system");
         System.out.println("Connection URL = " + URL);
         Connection con = DriverManager.getConnection(URL);
         Statement stmt = con.createStatement();

         // Create the keyspace
         String createKS = String.format(
             "CREATE KEYSPACE %s WITH strategy_class = 'SimpleStrategy'"
             + " AND strategy_options:replication_factor = 1;", KEYSPACE);
         stmt.execute(createKS);

         // Create the target column family
         String createCF = "CREATE COLUMNFAMILY RegressionTest (keyname text PRIMARY KEY,"
             + " bValue boolean,"
             + " iValue int"
             + ") WITH comparator = ascii AND default_validation = bigint;";
         stmt.execute(createCF);

     Source: https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
  23. CQL + JDBC

         // Needs java.sql.ResultSet, java.sql.ResultSetMetaData, and java.sql.Statement.
         // con is the open java.sql.Connection from the previous slide.
         Statement statement = con.createStatement();

         // Remove any existing rows
         String truncate = "TRUNCATE RegressionTest;";
         statement.execute(truncate);

         // Insert a row with all three columns, then a row with only two
         String insert1 = "INSERT INTO RegressionTest (keyname,bValue,iValue) VALUES ('key0',true,2000);";
         statement.executeUpdate(insert1);
         String insert2 = "INSERT INTO RegressionTest (keyname,bValue) VALUES ('key1',false);";
         statement.executeUpdate(insert2);

         // Read everything back
         String select = "SELECT * from RegressionTest;";
         ResultSet result = statement.executeQuery(select);
         ResultSetMetaData metadata = result.getMetaData();
         ...

     Source: https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
  24. Useful Tools
     • cassandra-cli
       o <cassandra-dir>/bin
       o http://www.datastax.com/docs/1.0/dml/using_cli
     • cqlsh
       o <cassandra-dir>/bin
       o http://www.datastax.com/docs/1.0/references/cql/index
     • nodetool
       o <cassandra-dir>/bin
       o http://www.datastax.com/docs/1.0/references/nodetool
     • stress
       o <cassandra-dir>/tools/bin
       o http://www.datastax.com/docs/1.0/references/stress_java
  25. Useful Tools
     • OpsCenter
       o http://www.datastax.com/products/opscenter
     • sstableloader
       o <cassandra-dir>/bin
       o http://www.datastax.com/dev/blog/bulk-loading
     • More tools
       o http://en.wikipedia.org/wiki/Apache_Cassandra#Tools_for_Cassandra
  26. Questions?