Cassandra training presentation for R&D:
Training objectives links:
http://www.datastax.com/what-we-offer/products-services/training/objectives-developer
http://www.datastax.com/what-we-offer/products-services/training/objectives-administrator
14. Scalability
Cassandra – each additional server adds performance to the whole ring.
MongoDB – one more server in a replica set increases read performance.
Adding a shard requires adding a whole replica set.
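The "one more server helps the whole ring" point can be illustrated with a toy hash ring. This is a minimal sketch, not Cassandra's actual partitioner: the node names, the MD5-based `token` function and the one-token-per-node layout are assumptions made for illustration. It shows that adding a node moves only part of the keys, and that every moved key lands on the new node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (illustrative, MD5-based)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Toy hash ring with a single token per node."""

    def __init__(self, nodes):
        self._tokens = sorted((token(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._tokens, (token(node), node))

    def owner(self, key: str) -> str:
        # First node clockwise from the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, (token(key), ""))
        return self._tokens[i % len(self._tokens)][1]


keys = [f"key-{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])   # hypothetical node names
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")
after = {k: ring.owner(k) for k in keys}

# Keys whose owner changed when node-d joined.
moved = [k for k in keys if before[k] != after[k]]
```

Keys that stay put need no data transfer, which is why a ring can grow one server at a time.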
21:00
15. Indexing
Cassandra: secondary indexes
MongoDB: secondary, geospatial, and unique indexes
Every node holds only its own part of the index, so a query on an indexed column may have to ask every node
Does not scale!
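Why a per-node index does not scale can be sketched with a simplified in-memory model. This is not Cassandra's or MongoDB's implementation; the `Node` class, the helper functions and the sample rows are hypothetical. Each node indexes only the rows it stores, so a read by indexed value asks every node, while a read by primary key asks exactly one.

```python
from collections import defaultdict


class Node:
    """Toy storage node holding rows plus a local index on 'artist'."""

    def __init__(self):
        self.rows = {}                        # id -> row
        self.artist_index = defaultdict(set)  # artist -> ids stored on this node

    def put(self, row):
        self.rows[row["id"]] = row
        self.artist_index[row["artist"]].add(row["id"])


def node_for(nodes, row_id):
    # Illustrative placement: rows are spread over nodes by primary key.
    return nodes[row_id % len(nodes)]


def get_by_id(nodes, row_id):
    """Primary-key read: exactly one node is contacted."""
    return node_for(nodes, row_id).rows.get(row_id), 1


def query_by_artist(nodes, artist):
    """Indexed read: every node has to check its local index."""
    hits, contacted = [], 0
    for node in nodes:
        contacted += 1
        ids = sorted(node.artist_index.get(artist, ()))
        hits.extend(node.rows[i] for i in ids)
    return hits, contacted


nodes = [Node() for _ in range(4)]
artists = ["Sting", "Queen", "Sting", "Muse", "Queen", "Sting"]  # sample data
for i, artist in enumerate(artists):
    node_for(nodes, i).put({"id": i, "artist": artist})
```

In this model the cost of `query_by_artist` grows with the number of nodes, whereas `get_by_id` stays constant.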
18:00
16. Schema denormalization
Create additional table and duplicate data
Use instead of indexes and joins
select * from audiofile where id = 1;
select * from audiofile where artist = 'Sting';
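Both lookups above become single-key reads if every row is written twice. A minimal in-memory sketch of that idea; the table names `audiofile_by_id` and `audiofile_by_artist`, the `title` column and the sample rows are assumptions, not taken from the slides.

```python
from collections import defaultdict

# Two "tables" holding the same rows under different keys.
audiofile_by_id = {}                      # id -> row
audiofile_by_artist = defaultdict(list)   # artist -> rows


def insert_audiofile(file_id, artist, title):
    """Write the row to both tables; the duplication replaces an index or join."""
    row = {"id": file_id, "artist": artist, "title": title}
    audiofile_by_id[file_id] = row
    # Separate copy, as it would be in a second physical table.
    audiofile_by_artist[artist].append(dict(row))


insert_audiofile(1, "Sting", "Fragile")
insert_audiofile(2, "Sting", "Fields of Gold")
insert_audiofile(3, "Queen", "Innuendo")

by_id = audiofile_by_id[1]                 # like: where id = 1
by_artist = audiofile_by_artist["Sting"]   # like: where artist = 'Sting'
```

The trade-off is that every write touches two tables, and keeping the copies consistent is the application's job.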
15:00
17. CQL: Cassandra Query Language (v3.1.1)
http://cassandra.apache.org/doc/cql3/CQL.html
DDL: Data definition language
DML: Data manipulation language
CREATE TABLE monkeySpecies (
    species text PRIMARY KEY,
    common_name text,
    population varint,
    average_size int
);
CREATE KEYSPACE Excelsior
    WITH replication = {
        'class': 'SimpleStrategy',
        'replication_factor': 3
    };
SELECT time, value
    FROM events
    WHERE event_type = 'myEvent'
    AND time > '2011-02-03'
    AND time <= '2012-01-01';
INSERT INTO NerdMovies
    (movie, director, main_actor, year)
    VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
    USING TTL 86400;
12:00
18. Time series example
CREATE TABLE timeseries (
    pkey date,
    skey time,
    temperature int, -- type assumed; the slide text shows "19" here
    PRIMARY KEY (pkey, skey)
);
select * from timeseries;
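With PRIMARY KEY (pkey, skey), pkey is the partition key and skey the clustering column, so rows of one partition are kept sorted by skey. A minimal in-memory sketch of that layout; the `insert` and `select_range` helpers, the string encoding of dates and times, and the sample readings are assumptions for illustration.

```python
import bisect
from collections import defaultdict

# partition key (pkey) -> list of (skey, temperature), kept sorted by skey
timeseries = defaultdict(list)


def insert(pkey, skey, temperature):
    """Store a reading in its day's partition, ordered by time of day."""
    bisect.insort(timeseries[pkey], (skey, temperature))


def select_range(pkey, start, end):
    """Readings of one partition with start <= skey < end."""
    rows = timeseries[pkey]
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


# Sample readings, inserted out of order on purpose.
insert("2014-01-01", "12:00", 19)
insert("2014-01-01", "09:00", 17)
insert("2014-01-01", "15:00", 21)
insert("2014-01-02", "09:00", 16)

morning = select_range("2014-01-01", "09:00", "13:00")
```

A time-range read touches one partition and returns a contiguous, already sorted slice, which is what makes this layout a good fit for time series.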
9:00
19. Map-Reduce
Hadoop MapReduce is used
Advanced Task Tracker balancing
Use Pig and Hive; writing jobs directly in Java is barely feasible
6:00
22. Full text search
Solr sharding has the same problem as secondary indexes
MongoDB full text search
db.articles.ensureIndex( { subject: "text" } )
db.articles.runCommand( "text", { search: "bake coffee -cake" } )