5. Key Features
Elastic scalability
Always on architecture
Fast linear-scale performance
Flexible data storage
Easy data distribution
Operational simplicity
Transaction support
6. Cluster
● Nodes
o A single node type
o 1 IP address
● Configuration
o cassandra.yaml
o cassandra-env.sh
o log4j-server.properties
● Data
o CommitLog
o Data
o Saved caches
o Log
7. Column oriented data model
● Keyspace
● Table (ColumnFamily)
● Row
o Row key
● Column
o Clustering key
o Value
o TTL
Map<PartKey, SortedMap<ClustKey, Column>>
Username   Timestamp   Message   Status
gquintana  2015-01-15  Meetup    Speaking
jdoe       2015-01-15  Meetup
jdoe       2015-01-16  Holidays
Partition
Row Key
Primary Key
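The nested-map view of the data model can be sketched in Python. This only illustrates the logical model, not Cassandra's storage engine. It assumes Username is the partition key and Timestamp the clustering key; the helper names `insert` and `read_partition` are hypothetical.

```python
from collections import defaultdict

# Logical model only: partition key -> {clustering key -> column values}.
# Assumption: Username is the partition key, Timestamp the clustering key.
table = defaultdict(dict)

def insert(username, timestamp, **columns):
    """Upsert one row into the partition identified by `username`."""
    table[username].setdefault(timestamp, {}).update(columns)

def read_partition(username):
    """Return the rows of one partition, ordered by clustering key."""
    rows = table.get(username, {})
    return sorted(rows.items(), key=lambda item: item[0])

insert("gquintana", "2015-01-15", message="Meetup", status="Speaking")
insert("jdoe", "2015-01-16", message="Holidays")
insert("jdoe", "2015-01-15", message="Meetup")

# Rows of a partition come back sorted by clustering key,
# whatever the insertion order was.
for timestamp, columns in read_partition("jdoe"):
    print(timestamp, columns)
```

Both jdoe rows live in the same partition, so reading them is a single lookup followed by an ordered scan, which is the access pattern this model is meant to favor.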
11. Multi datacenters
● Physical
o Geographic distribution
o Failover
● Logical
o Live backup
o Analytics (OLAP)
● No ETL
● Replication factor
12. Replication: How many?
● Keyspace level
create keyspace meetup
with replication={
'class':'NetworkTopologyStrategy',
'lyon':2,'paris':2};
13. Replication: Where?
● Node level:
o SimpleSnitch
o RackInferringSnitch
+ IP 10.dc.rc.n
o PropertyFileSnitch
+ cassandra-topology.properties
o GossipingPropertyFileSnitch
+ cassandra-rackdc.properties
o Ec2Snitch, GoogleCloudSnitch, CloudstackSnitch
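The 10.dc.rc.n addressing convention used with RackInferringSnitch can be sketched as follows: the second octet identifies the datacenter and the third the rack. This mimics the convention only and is not the snitch's actual Java implementation; the function name `infer_location` is hypothetical.

```python
def infer_location(ip):
    """Split an IPv4 address of the form 10.dc.rc.n into (datacenter, rack, node)."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    # The first octet (10) is the network prefix and carries no topology information.
    _, dc, rack, node = (int(octet) for octet in octets)
    return dc, rack, node

print(infer_location("10.2.1.1"))  # datacenter 2, rack 1, node 1
```

With this scheme, topology is encoded in the address plan itself, so no per-node properties file is needed.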
16. Step 1: Copy/Download
● Java 7
● Python 2.6+
o https://www.python.org/downloads/release/python-279/ (or apt-get / yum packages)
● Cassandra 2.0 DSC (DataStax Community)
o http://planetcassandra.org/cassandra/
o http://downloads.datastax.com/community/dsc-cassandra-2.0.11-bin.tar.gz
● DataStax OpsCenter Agent
o http://downloads.datastax.com/community/datastax-agent-5.0.2.tar.gz
● Injector
17. Step 2: Configure the network
● Disable Wi-Fi and the firewall
● Configure a static IP address
o IP 10.dc.rc.n
o Netmask 255.0.0.0
● Ping your neighbors and check that they can ping you
18. Step 3: Install
● Java: JAVA_HOME, java -version
● Python: python --version
● Cassandra
o If needed, delete existing data
● OpsCenter Agent
● Synchronize the clocks
Do not start Cassandra
19. Step 4a: Configure Cassandra
● cassandra.yaml
o cluster_name: 'meetup'
o listen_address: 10.dc.rc.n
o rpc_address: 0.0.0.0
o seed_provider.parameters.seeds: 10.1.1.1, 10.1.1.2, 10.2.1.1
o endpoint_snitch: GossipingPropertyFileSnitch
o commitlog_directory, data_file_directories, saved_caches_directory
Do not start Cassandra
20. Step 4b: Configure Cassandra
● cassandra-rackdc.properties
o dc=lyon|paris, rack=RAC1
● log4j-server.properties
o ...File=log/system.log
● Check your neighbor's configuration
Do not start Cassandra
21. Step 5: Configure the DataStax Agent
● datastax-agent.bat
o set JAR=datastax-agent-5.0.2-standalone.jar
● address.yaml
o stomp_interface: '10.1.0.1'
o local_interface: '10.dc.rc.n'
22. Step 6: Start Cassandra
● Wait for the go-ahead
● Start Cassandra, then the Agent
● Watch the logs
● Monitor the state
o of the node: nodetool info
o of the cluster: nodetool status
o of the tokens: nodetool ring
● Watch OpsCenter
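Checking the cluster state with nodetool status can be partly automated. The sketch below parses text shaped like its output and lists the nodes that are not both Up (U) and Normal (N). The sample text is hand-written for illustration and was not captured from a real cluster; the function names are hypothetical, and the exact column layout varies between Cassandra versions.

```python
import re

# Hand-written sample shaped like `nodetool status` output (illustrative only).
SAMPLE = """\
Datacenter: lyon
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID  Rack
UN  10.1.1.1  45.2 KB   256     25.0%  aaaa     RAC1
DN  10.1.1.2  44.8 KB   256     25.0%  bbbb     RAC1
Datacenter: paris
=================
UJ  10.2.1.1  12.1 KB   256     ?      cccc     RAC1
"""

# A node line starts with a status letter (U/D), a state letter (N/L/J/M), then an IPv4 address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\d+\.\d+\.\d+\.\d+)\s")

def parse_status(text):
    """Return a list of (datacenter, address, status, state) tuples."""
    nodes, datacenter = [], None
    for line in text.splitlines():
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        match = NODE_LINE.match(line)
        if match:
            status, state, address = match.groups()
            nodes.append((datacenter, address, status, state))
    return nodes

def nodes_needing_attention(text):
    """Addresses of nodes that are not both Up and Normal."""
    return [address for _, address, status, state in parse_status(text)
            if (status, state) != ("U", "N")]

print(nodes_needing_attention(SAMPLE))
```

On a real node, the same functions could be fed the captured output of `nodetool status` instead of `SAMPLE`.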
23. Step 7: Initialize the schema
create keyspace meetup
with replication={
'class':'NetworkTopologyStrategy',
'lyon':2,'paris':2};
create table metric (
host varchar,
name varchar,
date timestamp,
value bigint,
primary key ((host,name),date)
);
24. Step 7: Insert data
● With cqlsh
● Start the injector
describe keyspaces;
describe keyspace meetup;
use meetup;
describe tables;
describe table metric;
select * from metric;
insert into metric(host,name,date,value)
values ('localhost','cpu',dateOf(now()), 12);
25. Scenario 1: Loss of a node
● Hinted handoff
● Monitor
o nodetool status
o OpsCenter
o cqlsh
select * from system.hints;
26. Scenario 2: Adding a node
● Streaming
● Monitor
o Cluster: nodetool status
o Streaming: nodetool netstats
o OpsCenter
27. Scenario 3: Loss of a datacenter
● Monitor
o nodetool status
o OpsCenter
28. Conclusion
● Cassandra's strengths
o Fault tolerance
o Linear scalability
o Simple configuration
● Reset the network configuration
o IP configuration
o Re-enable the firewall
Editor's Notes
Elastic scalability - Allows you to easily add capacity online to accommodate more customers and more data whenever you need.
Always on architecture - Contains no single point of failure (unlike traditional master/slave RDBMSs and some other NoSQL solutions), resulting in continuous availability for business-critical applications that cannot afford to go down.
Fast linear-scale performance - Enables sub-second response times with linear scalability (double your throughput with two nodes, quadruple it with four, and so on) to deliver response time speeds your customers have come to expect.
Flexible data storage - Easily accommodates the full range of data formats (structured, semi-structured and unstructured) that run through modern applications. Also dynamically accommodates changes to your data structures as your data needs evolve.
Easy data distribution - Gives you maximum flexibility to distribute data where you need by replicating data across multiple datacenters, the cloud and even mixed cloud/on-premise environments – all of which are becoming extremely common deployment environments. Read and write to any node with all changes being automatically synchronized across a cluster.
Operational simplicity - with all nodes in a cluster being the same, there is no complex configuration to manage so administration duties are greatly simplified.
Transaction support - Delivers the atomicity, isolation and durability of ACID through a commit log that captures all writes and built-in redundancy that keeps data durable in the event of hardware failures; consistency is tunable.