Cassandra
Cassandra is designed to handle big data workloads across multiple nodes with no
single point of failure. Its architecture is based on the understanding that system and
hardware failures can and do occur. Cassandra addresses the problem of failures by
employing a peer-to-peer distributed system across homogeneous nodes, where data
is distributed among all nodes in the cluster.
Apache Cassandra is:
● Open source.
● A NoSQL ("not only SQL") database technology.
● A distributed database technology.
● A big data technology that provides massive scalability.
● Commonly used to create a database that is spread across nodes in more than one datacenter, for
high availability.
● Based on Amazon Dynamo and Google Bigtable.
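To make "data is distributed among all nodes" concrete, here is a toy sketch of a token ring in shell. It is an illustration only: the four node names and their tokens are made up, the ring is shrunk to 0..99, and cksum stands in for Cassandra's real partitioner. The rule it shows is that a key's token belongs to the node with the smallest token greater than or equal to it, wrapping around at the end of the ring.

```shell
# Toy token ring. Node names and tokens are hypothetical; a real cluster
# uses a far larger token space and many tokens per node.
RING="20:node1 45:node2 70:node3 95:node4"   # token:node pairs, sorted by token

# Print the node that owns a token in the range 0..99.
owner_of_token() {
    local token="$1" entry
    for entry in $RING; do
        if [ "$token" -le "${entry%%:*}" ]; then
            echo "${entry#*:}"
            return 0
        fi
    done
    # Past the highest token: wrap around to the first node on the ring.
    entry="${RING%% *}"
    echo "${entry#*:}"
}

# Hash a partition key to a toy token (cksum is only a stand-in hash).
token_of_key() {
    local crc
    crc="$(printf '%s' "$1" | cksum | cut -d' ' -f1)"
    echo $(( crc % 100 ))
}

# Print the owning node for a partition key.
owner_of_key() {
    owner_of_token "$(token_of_key "$1")"
}
```

For example, owner_of_token 21 prints node2, and owner_of_token 96 wraps around and prints node1.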
Installing Cassandra :
===============
hadoop@hadoop-VirtualBox:~/cassandra$ ls
apache-cassandra-2.1.16 apache-cassandra-2.1.16-bin.tar.gz
hadoop@hadoop-VirtualBox:~/cassandra$ cd apache-cassandra-2.1.16/
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16$ ls
bin CHANGES.txt conf interface javadoc lib LICENSE.txt NEWS.txt NOTICE.txt pylib tools
CHANGES.txt doc javadoc LICENSE.txt NOTICE.txt tools
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ ls
cassandra-env.ps1 cassandra.yaml logback.xml
cassandra-env.sh commitlog_archiving.properties metrics-reporter-config-sample.yaml
cassandra-rackdc.properties cqlshrc.sample README.txt
cassandra-topology.properties hotspot_compiler triggers
cassandra-topology.yaml logback-tools.xml
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ gedit cassandra.yaml
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ cd
hadoop@hadoop-VirtualBox:~$ sudo mkdir /var/lib/cassandra
[sudo] password for hadoop:
hadoop@hadoop-VirtualBox:~$ sudo mkdir /var/log/cassandra
hadoop@hadoop-VirtualBox:~$ ls /var/lib/cassandra
hadoop@hadoop-VirtualBox:~$ ls /var/log/cassandra
hadoop@hadoop-VirtualBox:~$ sudo chown -R $USER:$GROUP /var/lib/cassandra
hadoop@hadoop-VirtualBox:~$ sudo chown -R $USER:$GROUP /var/log/cassandra
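The directory setup above can be wrapped in a small repeatable function. This is a minimal sketch, not part of the original session: the name prepare_cassandra_dirs and its arguments are hypothetical. It uses id -u and id -g rather than $USER:$GROUP, on the assumption that $GROUP is not set in a typical bash session.

```shell
# Sketch: create the Cassandra data and log directories and hand them to
# the current user. prepare_cassandra_dirs is a hypothetical helper.
#   $1      base path ("" means the real /var/... locations)
#   $2...   optional command prefix for privileged calls, e.g. sudo
prepare_cassandra_dirs() {
    local base="${1:-}" owner dir
    [ "$#" -gt 0 ] && shift
    owner="$(id -u):$(id -g)"        # resolved before any sudo is involved
    for dir in "$base/var/lib/cassandra" "$base/var/log/cassandra"; do
        "$@" mkdir -p "$dir" || return 1            # -p: fine if it already exists
        "$@" chown -R "$owner" "$dir" || return 1
    done
}
```

Calling prepare_cassandra_dirs "" sudo would act on the system paths used in the session, while prepare_cassandra_dirs /tmp/demo tries the same steps under a scratch directory without sudo.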
Starting Cassandra :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-3.9/bin$ cassandra -f
cassandra: command not found
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-3.9/bin$ ./cassandra -f
Cassandra 3.0 and later require Java 8u40 or later.
The bin directory is not on the PATH, so the script has to be run as ./cassandra. Cassandra 3.9
does not start on this machine because it runs JDK 1.7.0_79; the rest of these notes use Cassandra 2.1.16.
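The Java requirement in that message can be checked before starting Cassandra. The function below is a hedged sketch: java_meets_8u40 is a hypothetical name, and it only mirrors the wording of the message ("8u40 or later"). It does not know which newer Java releases a given Cassandra version actually supports.

```shell
# Sketch: succeed if a Java version string is 8u40 or later.
# Accepts the legacy scheme (1.8.0_40) and the newer one (11.0.2).
java_meets_8u40() {
    local v="$1" major update
    case "$v" in
        1.*) major="${v#1.}" ;;      # 1.8.0_40 -> 8.0_40
        *)   major="$v" ;;           # 11.0.2 stays as is
    esac
    major="${major%%[!0-9]*}"        # keep the leading digits only
    case "$v" in
        *_*) update="${v##*_}"; update="${update%%[!0-9]*}" ;;
        *)   update=0 ;;
    esac
    [ -n "$major" ] || return 2      # not a version string we understand
    [ "$major" -gt 8 ] && return 0
    [ "$major" -eq 8 ] && [ "${update:-0}" -ge 40 ]
}
```

One way to feed it the installed version, assuming java -version prints the version in double quotes on a line containing the word "version":

java_meets_8u40 "$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')" || echo "Java too old for Cassandra 3.x"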
ps aux | grep cass --- to check the Cassandra process ID
hadoop 2293 7.8 5.2 1935056 211064 pts/1 Sl+ 09:24 0:29 /usr/local/java/jdk1.7.0_79/bin/java -ea
-javaagent:./../lib/jamm-0.3.0.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M -Xmn100M
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-XX:CompileCommandFile=./../conf/hotspot_compiler -XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways
-XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true
-Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Dlogback.configurationFile=logback.xml
-Dcassandra.logdir=./../logs -Dcassandra.storagedir=./../data -Dcassandra-foreground=yes -cp
./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/ST4-4.0.8.jar:./../lib/airline-0.6.jar:./../lib/antlr-r
untime-3.5.2.jar:./../lib/apache-cassandra-2.1.16.jar:./../lib/apache-cassandra-clientutil-2.1.16.jar:./../lib/a
pache-cassandra-thrift-2.1.16.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:./../lib/commo
ns-lang3-3.1.jar:./../lib/commons-math3-3.2.jar:./../lib/compress-lzf-0.8.4.jar:./../lib/concurrentlinkedhash
map-lru-1.4.jar:./../lib/disruptor-3.0.1.jar:./../lib/guava-16.0.jar:./../lib/high-scale-lib-1.0.6.jar:./../lib/jackson-
core-asl-1.9.2.jar:./../lib/jackson-mapper-asl-1.9.2.jar:./../lib/jamm-0.3.0.jar:./../lib/javax.inject.jar:./../lib/jbc
rypt-0.3m.jar:./../lib/jline-1.0.jar:./../lib/jna-4.0.0.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.9.2.jar:./../lib/l
ogback-classic-1.1.2.jar:./../lib/logback-core-1.1.2.jar:./../lib/lz4-1.2.0.jar:./../lib/metrics-core-2.2.0.jar:./../li
b/netty-all-4.0.23.Final.jar:./../lib/reporter-config-2.1.0.jar:./../lib/slf4j-api-1.7.2.jar:./../lib/snakeyaml-1.11.ja
r:./../lib/snappy-java-1.0.5.2.jar:./../lib/stream-2.5.2.jar:./../lib/super-csv-2.1.0.jar:./../lib/thrift-server-0.3.7.ja
r org.apache.cassandra.service.CassandraDaemon
hadoop 2507 0.0 0.0 13580 924 pts/2 S+ 09:30 0:00 grep --color=auto cass
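In the listing above, the second column is the process ID (2293), and the last line is the grep command itself. A small filter can pull out just the Cassandra PID. This is a sketch: cassandra_pids is a hypothetical helper that matches on the main class name shown at the end of the java command line.

```shell
# Sketch: read "ps aux" style lines on stdin and print the PID (column 2)
# of every process whose command line names the Cassandra main class.
cassandra_pids() {
    awk '/org\.apache\.cassandra\.service\.CassandraDaemon/ { print $2 }'
}
```

Usage: ps aux | cassandra_pids. Where it is available, pgrep -f CassandraDaemon should give a similar result.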
Main Cassandra Conf file
cassandra.yaml is the main configuration file. It sets the initialization properties for a cluster,
caching parameters for tables, properties for tuning and resource utilization, timeout
settings, client connections, backups, and security.
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ ls cassandra.yaml
cassandra.yaml
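To look up a single setting without opening the file in an editor, a short filter is enough. The sketch below is an assumption-laden helper, not a YAML parser: yaml_get is a hypothetical name, and it only handles top-level "key: value" lines such as cluster_name, num_tokens or partitioner. List-valued settings are out of its scope.

```shell
# Sketch: print the value of a top-level scalar key from a YAML file.
#   $1 key name, $2 path to the file
yaml_get() {
    local key="$1" file="$2"
    awk -v key="$key" '
        index($0, key ":") == 1 {                 # key must start at column 1
            val = substr($0, length(key) + 2)     # text after "key:"
            sub(/^[ \t]+/, "", val)               # leading blanks
            sub(/[ \t]+#.*$/, "", val)            # trailing comment
            sub(/[ \t]+$/, "", val)               # trailing blanks
            gsub(/^["\047]|["\047]$/, "", val)    # surrounding quotes
            print val
            exit
        }' "$file"
}
```

Usage from the conf directory: yaml_get cluster_name cassandra.yaml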
To check the status of a Cassandra node :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/bin$ ./nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 51.7 KB 256 100.0% f8a55ff5-fe3f-44dc-a8b9-99175485c84c rack1
The node is listed under datacenter1.
UN -> U for Status Up and N for State Normal
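The two-letter code at the start of each node line is easy to check from a script. The filter below is a sketch under the assumption that node lines begin with a status letter (U or D) followed by a state letter (N, L, J or M), as in the legend printed by nodetool. unhealthy_nodes is a hypothetical name.

```shell
# Sketch: read "nodetool status" output on stdin and print the address and
# code of every node that is not UN (Up and Normal).
unhealthy_nodes() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1 }'
}
```

Usage: ./nodetool status | unhealthy_nodes. For the single-node output shown above it prints nothing, because the only node is UN.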
nodetool info :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/bin$ ./nodetool info
ID : f8a55ff5-fe3f-44dc-a8b9-99175485c84c
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 51.7 KB
Generation No : 1484625254
Uptime (seconds) : 1179
Heap Memory (MB) : 80.77 / 1014.00
Off Heap Memory (MB) : 0.00
Data Center : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : entries 4, size 312 bytes, capacity 50 MB, 18 hits, 25 requests, 0.720 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 25 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token : (invoke with -T/--tokens to see all 256 tokens)
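The "Heap Memory (MB)" line reports used and total heap, so a usage percentage can be derived from it. This is a small sketch with a hypothetical function name, heap_percent. It assumes the line has the "used / total" layout shown above.

```shell
# Sketch: read "nodetool info" output on stdin and print heap usage
# as a percentage of the total heap, with one decimal place.
heap_percent() {
    awk -F: '/^Heap Memory \(MB\)/ {
        split($2, part, "/")                   # part[1] = used, part[2] = total
        printf "%.1f\n", 100 * part[1] / part[2]
    }'
}
```

Usage: ./nodetool info | heap_percent. With the values above (80.77 / 1014.00) it prints 8.0.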
nodetool ring (single node) :
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8714037520057488677
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8848194723078119543
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8853698384018712664
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 8901979223032280038
127.0.0.1 rack1 Up Normal 51.7 KB 100.00% 9053994914891736648
Warning: "nodetool ring" is used to output all the tokens of a node.
To view status related info of a node use "nodetool status" instead.
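Because nodetool ring prints one line per token, counting lines per address gives the number of tokens each node holds. A minimal sketch, assuming every token line starts with an IPv4 address as in the excerpt above (tokens_per_node is a hypothetical name):

```shell
# Sketch: read "nodetool ring" output on stdin and print
# "address token_count" for each node, sorted by address.
tokens_per_node() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { count[$1]++ }
         END { for (addr in count) print addr, count[addr] }' | sort
}
```

Usage: ./nodetool ring | tokens_per_node. On this single-node setup the count should match the 256 tokens reported by nodetool status.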
System Log file :
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/logs$ ls
system.log
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/conf$ grep log4j-server.properties *
cassandra-env.ps1: $env:JVM_OPTS = "$env:JVM_OPTS -Dlog4j.configuration=log4j-server.properties"
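A quick way to use system.log is to count warnings and errors in it. The sketch below assumes each log line starts with its level (INFO, WARN, ERROR, ...), which is how the default log layout is believed to look. If your logback.xml uses a different pattern, the field test needs adjusting. log_level_counts is a hypothetical name.

```shell
# Sketch: count WARN and ERROR lines in a Cassandra log file.
#   $1 path to the log file
log_level_counts() {
    awk '$1 == "WARN" || $1 == "ERROR" { count[$1]++ }
         END { printf "WARN=%d ERROR=%d\n", count["WARN"], count["ERROR"] }' "$1"
}
```

Usage from the logs directory: log_level_counts system.log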
Running CQL :
Cassandra uses CQL (Cassandra Query Language), which plays the role that SQL plays in a relational database.
hadoop@hadoop-VirtualBox:~/cassandra/apache-cassandra-2.1.16/bin$ ./cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.16 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh>
Example :
cqlsh> DESCRIBE CLUSTER;
Cluster: Test Cluster
Partitioner: Murmur3Partitioner
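The same information can be read from a script instead of the interactive prompt. The helper below is a sketch with a hypothetical name, cluster_field. It assumes the "Name: value" layout of the DESCRIBE CLUSTER output shown above.

```shell
# Sketch: read DESCRIBE CLUSTER output on stdin and print one field.
#   $1 field name, e.g. Cluster or Partitioner
cluster_field() {
    awk -F': *' -v field="$1" '$1 == field { print $2; exit }'
}
```

Usage, assuming your cqlsh accepts the -e option for running a single statement: ./cqlsh -e 'DESCRIBE CLUSTER' | cluster_field Partitioner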