Cassandra

Published in: Technology

Transcript

  • 1. Cassandra
  • 2. Where Did Cassandra Come From
      • Cassandra originated at Facebook in 2007 to solve that company’s inbox search problem:
          – large volumes of data
          – many random reads
          – many simultaneous random writes
      • It was released as an open source Google Code project in July 2008.
      • In March 2009 it was moved to an Apache Incubator project.
      • On February 17, 2010 it was voted into a top-level project.
  • 3. Cassandra in 50 Words or Less
      • Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database.
      • It bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable.
      • Created at Facebook, it is now used at some of the most popular sites on the Web.
  • 4. Who Is Using Cassandra
      • Twitter is using Cassandra for analytics.
      • Mahalo uses it for its primary near-time data store.
      • Facebook still uses it for inbox search, though they are using a proprietary fork.
      • Digg uses it for its primary near-time data store.
      • Rackspace uses it for its cloud service, monitoring, and logging.
      • Reddit uses it as a persistent cache.
      • Cloudkick uses it for monitoring statistics and analytics.
      • Ooyala uses it to store and serve near real-time video analytics data.
      • SimpleGeo uses it as the main data store for its real-time location infrastructure.
      • Onespot uses it for a subset of its main data store.
  • 5. Decentralized
      • Master/slave: if the master node fails, the whole database is in jeopardy.
      • Decentralized: all nodes are the same, so the failure of a node won’t disrupt service.
  • 6. Elastic Scalability
      • Add another machine: Cassandra will find it and start sending it work.
  • 7. High Availability and Fault Tolerance
  • 8. ACID
      • Atomic
          – all or nothing
      • Consistent
      • Isolated
          – two transactions that modify the same data do not interfere with each other
      • Durable
  • 9. Brewer’s CAP Theorem
      • You can strongly support only two of the three:
          – Consistency: all database clients will read the same value for the same query, even given concurrent updates.
          – Availability: all database clients will always be able to read and write data.
          – Partition tolerance: the database can be split across multiple machines and can continue functioning in the face of network segmentation breaks.
  • 10. CAPtransaction
  • 11. Usage
      • connect localhost/9160;
      • show cluster name;
      • show keyspaces;
      • create keyspace XXXXX;
      • use XXXXX;
      • create column family YYYYY;
      • describe keyspace XXXXX;
  • 12.
      • set YYYYY['XiaoMing']['name'] = '小明';
      • get YYYYY['XiaoMing'];
  • 13.
      • List
      • Map
      • MapList<row_id, Map>
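One way to read the slide above is that a column family behaves like a map from row id to a sorted map of columns. The sketch below models only that reading in plain Java; `ColumnFamilySketch` and its methods are hypothetical names for illustration, not part of Cassandra or Hector.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one column family:
// row key -> (column name -> value), with columns kept sorted by name.
class ColumnFamilySketch {
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Roughly the effect of the CLI statement: set CF['rowKey']['column'] = 'value';
    void set(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    // Roughly the effect of: get CF['rowKey'];  (returns every column of the row)
    SortedMap<String, String> get(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return Collections.<String, String>emptySortedMap();
        }
        return Collections.unmodifiableSortedMap(row);
    }
}
```

With this model, `set("XiaoMing", "name", "小明")` followed by `get("XiaoMing")` returns a one-column row, mirroring the CLI statements on slide 12.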
  • 14.
      • Column Family
      • create column family User with key_validation_class=UTF8Type;
  • 15. Column family• Ddd
  • 16. Super column family• d
  • 17. Clusters (Ring)
      • If the first node goes down, a replica can respond to queries. The peer-to-peer protocol allows the data to replicate across nodes in a manner transparent to the user.
      • Replication factor
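The ring and replication factor can be illustrated with a small sketch. It assumes one integer token per node and a rack-unaware placement that walks clockwise from the key's token; `RingSketch`, the integer tokens, and the node names are all illustrative assumptions rather than Cassandra's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical token ring: each node owns exactly one token.
class RingSketch {
    private final TreeMap<Integer, String> tokenToNode = new TreeMap<>();

    void addNode(int token, String node) {
        tokenToNode.put(token, node);
    }

    // The first replica is the node with the smallest token >= keyToken
    // (wrapping around the ring); further replicas are the next nodes clockwise.
    List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (tokenToNode.isEmpty()) {
            return replicas;
        }
        Integer token = tokenToNode.ceilingKey(keyToken);
        if (token == null) {
            token = tokenToNode.firstKey(); // wrap around the ring
        }
        int wanted = Math.min(replicationFactor, tokenToNode.size());
        while (replicas.size() < wanted) {
            replicas.add(tokenToNode.get(token));
            token = tokenToNode.higherKey(token);
            if (token == null) {
                token = tokenToNode.firstKey();
            }
        }
        return replicas;
    }
}
```

If the first node in the returned list is down, any of the remaining nodes in the list holds a copy and can answer the query.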
  • 18. Keyspaces
      • Don’t create too many keyspaces.
      • A keyspace is roughly analogous to a database in a relational system.
  • 19. Gossip protocols
      • Intra-ring communication so that each node can have state information about other nodes
      • Runs every second
      • Gossip messages:
          – Send: GossipDigestSynMessage
          – Ack: GossipDigestAckMessage
          – Send: GossipDigestAck2Message
      • Failure detection algorithm:
          – Phi Accrual Failure Detection
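The idea behind accrual failure detection is to output a suspicion level (phi) that grows the longer a heartbeat is overdue, instead of a yes/no answer. The sketch below is a simplified version that assumes exponentially distributed heartbeat intervals, so that phi = elapsed / (mean interval × ln 10); the class name, window size, and method names are hypothetical, and this is not Cassandra's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified phi accrual detector for a single peer (illustrative only).
class PhiAccrualSketch {
    private final Deque<Long> intervalsMillis = new ArrayDeque<>();
    private final int windowSize;
    private long lastHeartbeatMillis = -1;

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record a heartbeat (for example, a gossip message) from the peer.
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            intervalsMillis.addLast(nowMillis - lastHeartbeatMillis);
            if (intervalsMillis.size() > windowSize) {
                intervalsMillis.removeFirst(); // keep a sliding window
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    // phi = -log10(P(silence lasts this long)); with exponential intervals
    // P = exp(-elapsed / mean), which simplifies to elapsed / (mean * ln 10).
    double phi(long nowMillis) {
        if (intervalsMillis.isEmpty()) {
            return 0.0; // not enough history to suspect anything
        }
        double mean = intervalsMillis.stream().mapToLong(Long::longValue).average().orElse(1.0);
        if (mean <= 0.0) {
            return 0.0; // guard against identical timestamps
        }
        double elapsed = nowMillis - lastHeartbeatMillis;
        return elapsed / (mean * Math.log(10.0));
    }
}
```

A caller would compare `phi(now)` against a threshold of its choosing and treat the peer as down once the value exceeds it.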
  • 20. Anti-entropy
      • Anti-entropy is the replica synchronization mechanism in Cassandra for ensuring that data on different nodes is updated to the newest version.
      • Replicas are compared using Merkle trees.
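A Merkle tree lets two replicas compare a whole data range by exchanging a single root hash. The following is a minimal sketch of that comparison, assuming each replica holds an ordered list of row strings and using SHA-256; `MerkleSketch` and its method names are illustrative and do not reflect Cassandra's own tree layout.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MerkleSketch {
    // Hash the concatenation of the given byte arrays.
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Leaves are hashes of rows; each parent hashes its two children.
    // An unpaired node at the end of a level is paired with itself.
    static byte[] root(List<String> rows) {
        List<byte[]> level = new ArrayList<>();
        for (String row : rows) {
            level.add(sha256(row.getBytes(StandardCharsets.UTF_8)));
        }
        if (level.isEmpty()) {
            return sha256(new byte[0]);
        }
        while (level.size() > 1) {
            List<byte[]> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = i + 1 < level.size() ? level.get(i + 1) : left;
                parents.add(sha256(left, right));
            }
            level = parents;
        }
        return level.get(0);
    }

    // Equal roots mean the replicas agree; unequal roots mean a repair is needed.
    static boolean inSync(List<String> replicaA, List<String> replicaB) {
        return Arrays.equals(root(replicaA), root(replicaB));
    }
}
```

A fuller version would keep the intermediate levels so that, when the roots differ, the replicas can descend the tree and exchange only the ranges whose hashes disagree.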
  • 21. Memtable, SSTable, and Commit Log
      • Memtable
          – A value is written to a memory-resident data structure
      • SSTable
          – Includes: Data, Index, and Filter
          – Concept borrowed from Google’s Bigtable
          – When a memtable reaches a threshold, it is flushed to disk
      • Commit log
          – Flush status: 0 / 1
              • 1: flush started
              • 0: flush succeeded
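The write path described above can be sketched in a few lines: append to the commit log, update the memtable, and flush the memtable into an immutable SSTable once it reaches a threshold. Everything here is in memory and every name (`WritePathSketch`, `flushThreshold`) is a hypothetical stand-in; clearing the whole log on flush is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class WritePathSketch {
    private final int flushThreshold;
    final List<String> commitLog = new ArrayList<>();                 // stand-in for the on-disk log
    final List<SortedMap<String, String>> sstables = new ArrayList<>(); // oldest first
    private TreeMap<String, String> memtable = new TreeMap<>();

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. record the write for durability
        memtable.put(key, value);         // 2. apply it to the in-memory table
        if (memtable.size() >= flushThreshold) {
            flush();                      // 3. threshold reached
        }
    }

    private void flush() {
        // The sorted memtable becomes an immutable SSTable.
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed writes no longer need to be replayed
    }

    // Check the memtable first, then SSTables from newest to oldest.
    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```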
  • 22. Hinted Handoff & Compaction
      • Hinted handoff:
          – Used when the target node for a write is not available
          – Cassandra creates a hint so the write can be delivered to that node later
      • Compaction:
          – SSTables are merged
          – The merged data is sorted
          – A new index is created over the sorted data
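The merge step of compaction can be sketched as follows, assuming each SSTable is a sorted map from key to a timestamped cell and that the cell with the newest timestamp wins. `CompactionSketch` and `Cell` are hypothetical types for illustration; index rebuilding is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class CompactionSketch {
    // A value together with the timestamp of the write that produced it.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merge several SSTables into one sorted table, keeping the newest cell per key.
    static SortedMap<String, Cell> compact(List<? extends SortedMap<String, Cell>> sstables) {
        SortedMap<String, Cell> merged = new TreeMap<>();
        for (SortedMap<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (kept, candidate) -> candidate.timestamp > kept.timestamp ? candidate : kept);
            }
        }
        return merged;
    }
}
```

Because the result is a `TreeMap`, iterating over it yields keys in sorted order, which is the property a new index would be built on.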
  • 23. major compaction
      • Bloom filters are stored in memory
      • They are used to improve performance by reducing disk access on key lookups
  • 24. Tombstones
      • Known as a “soft delete”
      • Data is not removed immediately when a delete operation executes
      • Garbage Collection Grace Seconds:
          – GCGraceSeconds
              • Default: 10 days (864000 sec)
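A rough sketch of the soft-delete idea: a delete writes a tombstone marker, reads treat the marker as "no value", and a later compaction drops the marker only after the grace period has passed. The class and method names are hypothetical, times are plain seconds supplied by the caller, and the exact boundary comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class TombstoneSketch {
    static final long GC_GRACE_SECONDS = 864_000L; // 10 days, the default quoted above

    // value == null marks a tombstone; deletedAtSeconds is only meaningful then.
    private static final class Cell {
        final String value;
        final long deletedAtSeconds;

        Cell(String value, long deletedAtSeconds) {
            this.value = value;
            this.deletedAtSeconds = deletedAtSeconds;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    void put(String key, String value) {
        cells.put(key, new Cell(value, -1L));
    }

    // A delete does not remove anything; it overwrites the cell with a tombstone.
    void delete(String key, long nowSeconds) {
        cells.put(key, new Cell(null, nowSeconds));
    }

    // Readers see a tombstoned key as absent.
    String get(String key) {
        Cell cell = cells.get(key);
        return cell == null ? null : cell.value;
    }

    // Drop tombstones older than the grace period; returns how many were purged.
    int compact(long nowSeconds) {
        int before = cells.size();
        cells.values().removeIf(c -> c.value == null
                && nowSeconds - c.deletedAtSeconds > GC_GRACE_SECONDS);
        return before - cells.size();
    }

    int storedCells() {
        return cells.size();
    }
}
```

The grace period gives replicas that were down during the delete time to learn about the tombstone before it disappears.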
  • 25. Staged Event-Driven Architecture (SEDA)
      • Originally proposed in a 2001 paper called “SEDA: An Architecture for Well-Conditioned, Scalable Internet Services”
      • A stage consists of an incoming event queue, an event handler, and an associated thread pool
      • Operations handled as stages:
          – Read
          – Mutation
          – Gossip
          – Response
          – Anti-Entropy
          – Load Balance
          – Migration
          – Streaming
          – …
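A single stage can be approximated with a standard Java thread pool whose work queue plays the role of the incoming event queue. This is a minimal sketch; `StageSketch` and its methods are hypothetical names, and real SEDA stages also include controllers for tuning that are omitted here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// One stage = event queue + event handler + thread pool.
class StageSketch<E> {
    private final String name;
    private final Consumer<E> handler;
    private final ExecutorService pool;

    StageSketch(String name, int threads, Consumer<E> handler) {
        this.name = name;
        this.handler = handler;
        // The executor's LinkedBlockingQueue acts as the stage's incoming event queue.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    String name() {
        return name;
    }

    // Enqueue an event; a pool thread will pass it to the handler later.
    void enqueue(E event) {
        pool.execute(() -> handler.accept(event));
    }

    // Stop accepting events and wait for queued ones to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Giving each kind of operation (read, mutation, gossip, and so on) its own stage means a backlog in one queue does not starve the threads serving another.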
  • 26. Custom FactoryUtil
      • Prevents version incompatibility
  • 27. Configuring Cassandra
      • system_add_keyspace
          – Creates a keyspace.
      • system_rename_keyspace
          – Changes the name of a keyspace after taking a snapshot of it. Note that this method blocks until its work is done.
      • system_drop_keyspace
          – Deletes an entire keyspace after taking a snapshot of it.
      • system_add_column_family
          – Creates a column family.
      • system_drop_column_family
          – Deletes a column family after taking a snapshot of it.
      • system_rename_column_family
          – Changes the name of a column family after taking a snapshot of it. Note that this method blocks until its work is done.
  • 28. Creating a Column Family
      • column_type
          – Either Super or Standard.
      • clock_type
          – The only valid value is Timestamp.
      • comparator
          – Valid options include AsciiType, BytesType, LexicalUUIDType, LongType, TimeUUIDType, and UTF8Type.
      • subcomparator
          – Name of comparator used for subcolumns when the column_type is Super. Valid options are the same as comparator.
      • reconciler
          – Name of the class that will reconcile conflicting column versions. The only valid value at this time is Timestamp.
      • comment
          – Any human-readable comment in the form of a string.
      • rows_cached
          – The number of rows to cache.
      • preload_row_cache
          – Set this to true to automatically load the row cache.
      • key_cache_size
          – The number of keys to pull into the cache.
      • read_repair_chance
          – Valid values are a number between 0.0 and 1.0.
  • 29. Replicas
      • Simple Strategy
          – RackUnawareStrategy
      • Old Network Topology Strategy
          – RackAwareStrategy
      • Network Topology Strategy
          – DataCenterShardStrategy
          – datacenter.properties
  • 30. Replication Factor
      • Specifies how many copies of each piece of data will be stored and distributed throughout the Cassandra cluster.
      • Factor = 1: your data will exist only in a single node in the cluster. Losing that node means that data becomes unavailable.
  • 31. Increasing the Replication Factor
      • As the number of nodes grows, you should increase the replication factor.
      • How to do it:
          – Ensure that all the data is flushed to the SSTables:
              • flush -h 192.168.1.1 -p 9160
          – Stop that node.
          – Copy the data files from your keyspaces.
          – Paste those data files to the new node.
  • 32. Replica Placement Strategies
      • Simple Strategy
      • Old Network Topology Strategy
      • Network Topology Strategy
  • 33. Adding Nodes to a Cluster
      • If you want to add a new seed node, autobootstrap it first, and then change it to a seed afterward.
      • Node1:
          – listen_address: 192.168.1.1
          – rpc_address: 0.0.0.0
      • Node2:
          – auto_bootstrap: true
          – listen_address: 192.168.2.34
          – rpc_address: 0.0.0.0
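  • 34. Hector
      // Connect to (or reuse a connection to) the cluster through its Thrift port.
      Cluster myCluster = HFactory.getOrCreateCluster("Test Cluster", "192.168.2.3:9160");
      // Define column family "nb" in keyspace "s3" with UTF-8 column names.
      ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "nb", ComparatorType.UTF8TYPE);
      columnFamilyDefinition.setReplicateOnWrite(true);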
  • 35. Hector
      // Define column family "bb" in keyspace "s3" with UTF-8 keys, column names, and values.
      ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "bb", ComparatorType.UTF8TYPE);
      columnFamilyDefinition.setKeyValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
      columnFamilyDefinition.setDefaultValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
      // Use addColumnFamily the first time; afterwards update the existing definition by id.
      // myCluster.addColumnFamily(columnFamilyDefinition);
      columnFamilyDefinition.setId(1013);
      myCluster.updateColumnFamily(columnFamilyDefinition);
  • 36. Hector
      Keyspace myKeyspace = HFactory.createKeyspace("s3", myCluster);
      // The mutator serializes row keys as strings.
      Mutator<String> mutator = HFactory.createMutator(myKeyspace, StringSerializer.get());
      // Insert column "column1" into row "b" of column family "bb".
      mutator.insert("b", "bb", HFactory.createStringColumn("column1", "你好 在"));
  • 37. Hector
      // Serializers for key, column name, and column value, in that order.
      ColumnQuery<String, String, String> q = HFactory.createColumnQuery(myKeyspace,
              StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
      // Set key, name, and column family, then execute.
      QueryResult<HColumn<String, String>> r = q
              .setColumnFamily("bb")
              .setKey("b")
              .setName("column1")
              .execute();
      // Read the value from the result.
      HColumn<String, String> c = r.get();
      String value = c.getValue();
      System.out.println(value);
