Cassandra is designed to handle big data workloads across multiple nodes with no
single point of failure. Its architecture is based on the understanding that system and
hardware failures can and do occur. Cassandra addresses the problem of failures by
employing a peer-to-peer distributed system across homogeneous nodes where data
is distributed among all nodes in the cluster.
Apache Cassandra is :
● Open source
● A NoSQL ("not only SQL") database technology
● A distributed database technology
● A big data technology that provides massive scalability
● Commonly used to build a database spread across nodes in more than one datacentre, for high availability
● Based on Amazon's Dynamo and Google's Bigtable
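Cassandra takes its partitioning idea from Dynamo. Every node owns one or more tokens on a ring, and the first replica of a row lives on the node whose token range covers the hash of the row's partition key. The toy sketch below shows only that lookup. It is not Cassandra's code: it uses `cksum` (CRC-32) in place of the Murmur3 hash used by Cassandra's default Murmur3Partitioner, and the three node names and their tokens are hypothetical.

```shell
#!/usr/bin/env bash
# Toy token ring (illustration only, NOT Cassandra's implementation).
# Hypothetical ring: node i owns the range (previous token, RING_TOKENS[i]].
RING_TOKENS=(1000000000 2500000000 4000000000)   # sorted ascending
RING_NODES=(node-a node-b node-c)

# Print the node that owns a given numeric token.
node_for_token() {
  local token=$1 i
  for i in "${!RING_TOKENS[@]}"; do
    if (( token <= RING_TOKENS[i] )); then
      echo "${RING_NODES[i]}"
      return
    fi
  done
  # Beyond the highest token the ring wraps around to the first node.
  echo "${RING_NODES[0]}"
}

# Hash a partition key with cksum (CRC-32 stand-in for Murmur3) and look up its owner.
owner_of() {
  local hash
  hash=$(printf '%s' "$1" | cksum | awk '{print $1}')
  node_for_token "$hash"
}
```

For example, `owner_of user42` prints one of node-a, node-b or node-c. The same key always maps to the same node. This is roughly why any node can work out where a row belongs without a master.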
Installing Cassandra :
===============
hadoop@hadoop-VirtualBox:~/cassandra$ ls
apache-cassandra-2.1.16 apache-cassandra-2.1.16-bin.tar.gz
hadoop@hadoop-VirtualBox:~/cassandra$ cd apache-cassandra-2.1.16/
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16$ ls
bin CHANGES.txt conf interface javadoc lib LICENSE.txt NEWS.txt NOTICE.txt pylib tools
CHANGES.txt doc javadoc LICENSE.txt NOTICE.txt tools
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ ls
cassandra-env.ps1 cassandra.yaml logback.xml
cassandra-env.sh commitlog_archiving.properties metrics-reporter-config-sample.yaml
cassandra-rackdc.properties cqlshrc.sample README.txt
cassandra-topology.properties hotspot_compiler triggers
cassandra-topology.yaml logback-tools.xml
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ gedit cassandra.yaml
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ cd
hadoop@hadoop-VirtualBox:~$ sudo mkdir /var/lib/cassandra
[sudo] password for hadoop:
hadoop@hadoop-VirtualBox:~$ sudo mkdir /var/log/cassandra
hadoop@hadoop-VirtualBox:~$ ls /var/lib/cassandra
hadoop@hadoop-VirtualBox:~$ ls /var/log/cassandra
hadoop@hadoop-VirtualBox:~$ sudo chown -R $USER:$GROUP /var/lib/cassandra
hadoop@hadoop-VirtualBox:~$ sudo chown -R $USER:$GROUP /var/log/cassandra
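The directory steps above can be wrapped in a small function. This is a sketch, not part of the Cassandra distribution, and `prepare_cassandra_dirs` and the `SUDO` override are hypothetical names. It uses `mkdir -p` so that re-running it does no harm. It takes the group from `id -gn` because bash does not set a `$GROUP` variable. In the transcript `$GROUP` expands to nothing, and GNU `chown user:` then falls back to the user's login group.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create Cassandra's data/log directories and hand them
# to the current user. Set SUDO="" to run without sudo (e.g. on paths you own).
SUDO=${SUDO-sudo}

prepare_cassandra_dirs() {
  local user group dir
  user=$(id -un)
  group=$(id -gn)                 # primary group; bash itself never sets $GROUP
  for dir in "$@"; do
    $SUDO mkdir -p "$dir"         # -p: succeed if the directory already exists
    $SUDO chown -R "$user:$group" "$dir"
  done
}

# Usage (same directories as the transcript above):
#   prepare_cassandra_dirs /var/lib/cassandra /var/log/cassandra
```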
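Starting Cassandra :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-3.9/bin$ cassandra -f
cassandra: command not found
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-3.9/bin$ ./cassandra -f
Cassandra 3.0 and later require Java 8u40 or later.
The bin directory is not on the PATH, so the script has to be run as ./cassandra (-f keeps it in the foreground). Cassandra 3.9 then refuses to start because the installed JDK is 1.7.0_79 (see the java path in the ps output below). The node used in the rest of these notes is therefore Cassandra 2.1.16, which runs on this JDK.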
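The Java requirement in the error message above can be checked before starting the node. The sketch below assumes the usual first line of `java -version`, for example `java version "1.7.0_79"` or `openjdk version "11.0.2"`. The function name is hypothetical, and the function only parses the version string; it does not query Cassandra.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a `java -version` line reports
# Java 8u40 or later, the minimum Cassandra 3.x asks for.
java_ok_for_cassandra3() {
  local ver major update=""
  # Pull out the quoted version, e.g. 1.7.0_79, 1.8.0_292, 11.0.2
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$ver" ] || return 2               # unrecognised input
  if [[ $ver == 1.* ]]; then               # old scheme: 1.<major>.0_<update>
    major=${ver#1.}; major=${major%%.*}
    if [[ $ver == *_* ]]; then
      update=${ver##*_}; update=${update%%[!0-9]*}
    fi
  else                                     # Java 9+ scheme: <major>.<minor>...
    major=${ver%%[!0-9]*}
  fi
  major=${major:-0}; update=${update:-0}
  (( major > 8 || (major == 8 && 10#$update >= 40) ))
}

# Usage (java -version writes to stderr):
#   java_ok_for_cassandra3 "$(java -version 2>&1 | head -n 1)" || echo "need Java 8u40+"
```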
ps aux | grep cass --- to check the Cassandra process ID (2293 below; the second line is the grep process itself)
hadoop 2293 7.8 5.2 1935056 211064 pts/1 Sl+ 09:24 0:29 /usr/local/java/jdk1.7.0_79/bin/java -ea
-javaagent:./../lib/jamm-0.3.0.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M -Xmn100M
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-XX:CompileCommandFile=./../conf/hotspot_compiler -XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways
-XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true
-Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Dlogback.configurationFile=logback.xml
-Dcassandra.logdir=./../logs -Dcassandra.storagedir=./../data -Dcassandra-foreground=yes -cp
./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/ST4-4.0.8.jar:./../lib/airline-0.6.jar:
./../lib/antlr-runtime-3.5.2.jar:./../lib/apache-cassandra-2.1.16.jar:./../lib/apache-cassandra-clientutil-2.1.16.jar:
./../lib/apache-cassandra-thrift-2.1.16.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:
./../lib/commons-lang3-3.1.jar:./../lib/commons-math3-3.2.jar:./../lib/compress-lzf-0.8.4.jar:
./../lib/concurrentlinkedhashmap-lru-1.4.jar:./../lib/disruptor-3.0.1.jar:./../lib/guava-16.0.jar:
./../lib/high-scale-lib-1.0.6.jar:./../lib/jackson-core-asl-1.9.2.jar:./../lib/jackson-mapper-asl-1.9.2.jar:
./../lib/jamm-0.3.0.jar:./../lib/javax.inject.jar:./../lib/jbcrypt-0.3m.jar:./../lib/jline-1.0.jar:
./../lib/jna-4.0.0.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.9.2.jar:
./../lib/logback-classic-1.1.2.jar:./../lib/logback-core-1.1.2.jar:./../lib/lz4-1.2.0.jar:
./../lib/metrics-core-2.2.0.jar:./../lib/netty-all-4.0.23.Final.jar:./../lib/reporter-config-2.1.0.jar:
./../lib/slf4j-api-1.7.2.jar:./../lib/snakeyaml-1.11.jar:./../lib/snappy-java-1.0.5.2.jar:
./../lib/stream-2.5.2.jar:./../lib/super-csv-2.1.0.jar:./../lib/thrift-server-0.3.7.jar
org.apache.cassandra.service.CassandraDaemon
hadoop 2507 0.0 0.0 13580 924 pts/2 S+ 09:30 0:00 grep --color=auto cass
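`ps aux | grep cass` also matches the grep process itself (PID 2507 above). The sketch below keeps only the Cassandra JVM. It assumes the main class `org.apache.cassandra.service.CassandraDaemon` appears on the command line, as it does in the output above. `cassandra_pids` is a hypothetical name. On systems with procps, `pgrep -f CassandraDaemon` does the same job in one command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps aux` output on stdin and print the PID
# (column 2) of every Cassandra daemon. Lines whose command (column 11)
# is grep or awk are skipped, so the search itself is not reported.
cassandra_pids() {
  awk '/org\.apache\.cassandra\.service\.CassandraDaemon/ && $11 !~ /(^|\/)(grep|awk)$/ { print $2 }'
}

# Usage:
#   ps aux | cassandra_pids
```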
Main Cassandra configuration file (cassandra.yaml) :
cassandra.yaml is the main configuration file of a node. It sets the initialization properties for a cluster,
caching parameters for tables, properties for tuning and resource utilization, timeout
settings, client connections, backups, and security.
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ ls cassandra.yaml
cassandra.yaml
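The sketch below shows which of these settings are active without opening an editor. It prints a handful of standard cassandra.yaml keys: `cluster_name`, `listen_address`, `rpc_address`, `native_transport_port`, `data_file_directories`, `commitlog_directory` and `saved_caches_directory`. The choice of keys and the function name are assumptions. Lines starting with `#` are skipped, so only settings that are actually set in the file are shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print selected top-level settings from cassandra.yaml,
# plus the "- item" lines of list-valued keys such as data_file_directories.
# Lines starting with "#" (commented-out settings) are not printed.
show_core_settings() {
  awk '
    /^(cluster_name|listen_address|rpc_address|native_transport_port|data_file_directories|commitlog_directory|saved_caches_directory):/ {
      print
      inlist = ($0 ~ /:[ \t]*$/)   # key with no inline value: a list may follow
      next
    }
    inlist && /^[ \t]+-/ { print; next }
    { inlist = 0 }
  ' "${1:-conf/cassandra.yaml}"
}

# Usage, from the apache-cassandra-2.1.16 directory:
#   show_core_settings conf/cassandra.yaml
```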
To check the status of the Cassandra node :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/bin$ ./nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 51.7 KB 256 100.0% f8a55ff5-fe3f-44dc-a8b9-99175485c84c rack1
The node is in datacentre datacenter1 (the default name), rack rack1, and owns 100% of the data because it is the only node.
UN -> U for status Up and N for state Normal.
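With more than one node, a quick health check is to look for any node whose code is not UN. The sketch below assumes the two-letter status/state code is the first column of each node line, as in the output above. `nodes_not_un` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool status` output on stdin, print
# "<address> <code>" for every node whose code is not UN, and exit 1 if any.
nodes_not_un() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1; bad = 1 }
       END { exit bad }'
}

# Usage:
#   ./nodetool status | nodes_not_un && echo "all nodes Up/Normal"
```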
nodetool info :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/bin$ ./nodetool info
ID : f8a55ff5-fe3f-44dc-a8b9-99175485c84c
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 51.7 KB
Generation No : 1484625254
Uptime (seconds) : 1179
Heap Memory (MB) : 80.77 / 1014.00
Off Heap Memory (MB) : 0.00
Data Center : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : entries 4, size 312 bytes, capacity 50 MB, 18 hits, 25 requests, 0.720 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 25 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token : (invoke with -T/--tokens to see all 256 tokens)
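The heap line is the one most worth watching. The sketch below turns it into a percentage, assuming the `Heap Memory (MB) : used / max` format shown above. `heap_used_pct` is a hypothetical name. For the output above it gives 80.77 / 1014.00, about 8.0%.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool info` output on stdin and print
# heap usage as a percentage with one decimal place.
heap_used_pct() {
  awk -F: '/^Heap Memory \(MB\)/ {
    split($2, mb, "/")            # mb[1] = used, mb[2] = max (in MB)
    printf "%.1f\n", 100 * mb[1] / mb[2]
  }'
}

# Usage:
#   ./nodetool info | heap_used_pct
```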
nodetool ring (single node; excerpt, since the full output has one line for each of the 256 tokens) :
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8714037520057488677
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8848194723078119543
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8853698384018712664
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8901979223032280038
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 9053994914891736648
Warning: "nodetool ring" is used to output all the tokens of a node.
To view status related info of a node use "nodetool status" instead.
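`nodetool ring` prints one line per token, so counting lines per address recovers each node's token count. Here that count should be 256, matching `nodetool info`. The sketch below assumes the column layout shown above (Address, Rack, Status, State, ...). `tokens_per_node` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool ring` output on stdin and print
# "<address> <token count>" per node. Column 3 is Status (Up/Down), which
# filters out the header and warning lines.
tokens_per_node() {
  awk '$3 == "Up" || $3 == "Down" { n[$1]++ }
       END { for (addr in n) print addr, n[addr] }'
}

# Usage:
#   ./nodetool ring | tokens_per_node
```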
System Log file :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/logs$ ls
system.log
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ grep log4j-server.properties *
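cassandra-env.ps1: $env:JVM_OPTS = "$env:JVM_OPTS -Dlog4j.configuration=log4j-server.properties"
The only match is in the Windows PowerShell script. On Linux, Cassandra 2.1.16 configures logging through conf/logback.xml (note -Dlogback.configurationFile=logback.xml in the ps output above) and writes logs/system.log.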
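For a quick overview of system.log, the sketch below counts entries per log level. It assumes the level is the first field of each entry (for example `INFO  [main] ...`). That appears to be how the stock 2.1 logback.xml pattern formats entries, but check the pattern in your own logback.xml. Continuation lines such as stack traces are ignored. `log_level_counts` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log entries per level in a Cassandra log file.
# Assumes the level is the first whitespace-separated field of each entry.
log_level_counts() {
  awk '$1 ~ /^(TRACE|DEBUG|INFO|WARN|ERROR)$/ { n[$1]++ }
       END { for (lvl in n) print lvl, n[lvl] }' "${1:-logs/system.log}"
}

# Usage, from the apache-cassandra-2.1.16 directory:
#   log_level_counts logs/system.log
#   grep -E '^(WARN|ERROR)' logs/system.log    # show only the problems
```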
Running CQL :
Cassandra is queried with CQL (Cassandra Query Language), whose syntax is similar to SQL. Start the CQL shell from the bin directory :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/bin$ ./cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.16 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh>
Example :
cqlsh> DESCRIBE CLUSTER;
Cluster: Test Cluster
Partitioner: Murmur3Partitioner
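The sketch below is a slightly longer CQL session. It creates a keyspace and a table, inserts a row and reads it back. The keyspace `demo`, the table `users` and the helper `write_demo_cql` are hypothetical. `SimpleStrategy` with `replication_factor` 1 is only suitable for a single-node test cluster like this one. The statements are written to a file and run with cqlsh's `-f` option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a small CQL script to the file given as $1.
write_demo_cql() {
  cat > "$1" <<'CQL'
-- Single-node test keyspace: one replica of every row.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE IF NOT EXISTS users (
  user_id uuid PRIMARY KEY,
  name    text
);
INSERT INTO users (user_id, name) VALUES (uuid(), 'alice');
SELECT * FROM users;
CQL
}

# Usage, from apache-cassandra-2.1.16/bin with the node running:
#   write_demo_cql /tmp/demo.cql && ./cqlsh -f /tmp/demo.cql
```

`user_id` is the partition key, so its Murmur3 token decides which node owns each row.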