Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2

The C* Experience at Orange
Season 2
Jean Armel Luce, Orange France
Thursday, September 11, 2014

2
The Cassandra experience at Orange - season 1
Summary
1. Why did we choose C* for our application PnS?
2. Our migration strategy (without any interruption of service)
3. After the migration …
§ For more details, watch « the Cassandra Summit Europe 2013 »:
– https://www.youtube.com/watch?v=mefOE9K7sLI&feature=youtube_gdata
Jean Armel Luce - Orange-France

3
The Cassandra experience at Orange - season 2
Summary
1. Short description of the application PnS
2. Keyring: why design customer ids with graphs in C*?
3. BYOHH (Bring Your Own Hadoop & Hive) with Cassandra

If you missed season 1:
Short description of PnS

5
PnS: Short description of the application
§ PnS means Profile and Syndication: a highly available service for
collecting and serving live data about Orange customers
§ End users:
– Orange customers (www.orange.fr)
– Sellers in Orange shops
– Some services in Orange (advertisements, …)

6
PnS: The Big Picture
§ End users → PnS: millions of HTTP requests (REST or SOAP)
§ Data providers → PnS: thousands of files (CSV, JSON or XML), scheduled data injection
§ PnS: fast and highly available web service to get or set the data stored by PnS:
– postProcessing(data1)
– postProcessing(datax)
– …
§ PnS → Database: DB queries, R/W operations

7
PnS: Architecture at the end of 2013
§ 2 DCs architecture for high availability: Bagnolet and Sophia Antipolis
§ 1 multi-DC cluster for batch updates and web services (reads and writes)

8
PnS: Some key dates about the PnS3.0 project
§ Season 1 (from April 2013 to October 2013): migration to C*
§ Season 2, part 1 (November 2013): Keyring
§ Season 2, part 2 (April 2014): Hadoop & Hive for analytics

Keyring:
Why design customer ids with graphs in C*?

10
PnS database design
§ Nearly 35 tables at the end of 2013
CREATE TABLE customers (
customer_id varchar,
col1 varchar,
col2 bigint,
col3 set<text>,
...,
coln timestamp,
PRIMARY KEY (customer_id));
§ SELECT colx, coly, colz FROM customers WHERE customer_id = '???' ;

11
Customer ids
§ What is a customer id?
– cell number
– internet account
– email address
– ISE (internal identifier used by many other Orange applications)
– ....
§ For many reasons, data is stored in tables with different primary keys
– some data is often retrieved using a cell number
→ stored when possible in a table where the PK is a cell number
– … but not all customers have a cell number
→ stored in a table where the PK is not a cell number
– …

12
Customer ids translation
§ A PnS user knows only 1 customer id
§ The user often needs to retrieve data indexed by another kind of customer id in the DB
§ Example: "My cell number is (209) 123-4567" → customer_id translation → SELECT * FROM pns WHERE cust_id = 'ISE_QWERTY'

13
Database design in the old relational databases
§ Design with secondary indexes?
SELECT email_address FROM customer_ids WHERE cell_number = ???;
§ Requires a lot of secondary indexes on values with a high cardinality
§ With C*, secondary indexes on values with a high cardinality are wasteful
§ Layout of the customer_ids table:
– Primary key: ISE
– Secondary indexes: cell_number, email_address, intranet account, …, idtypeN

14
Design with graph for C*
[Diagram: customer ids drawn as the vertices of a graph, linked by edges. Vertices shown:]
– IdType='EMAIL', IdValue='priam@orange.com'
– IdType='ISE', IdValue='myISE1'
– IdType='Internet', IdValue='99999999999'
– IdType='Cell', IdValue='(209) 123-4567'
– IdType='EMAIL', IdValue='hecuba@orange.com'
– IdType='ISE', IdValue='myISE2'
– IdType='Cell', IdValue='(209) 123-4568'
– IdType='ISE', IdValue='myISE3'
– IdType='Cell', IdValue='(209) 123-4569'
– IdType='EMAIL', IdValue='paris@orange.com'
– IdType='EMAIL', IdValue='alexander@orange.com'

15
The new « Customer ids » table in C*
§ Table of edges between customer ids
CREATE TABLE graph(
idvalue1 text, -- value of the initial vertex of the arc
idtype1 text, -- type of the initial vertex of the arc
idvalue2 text, -- value of the terminal vertex of the arc
idtype2 text, -- type of the terminal vertex of the arc
attr map<text, text>, -- a column of map type for storing any kind of property
t timestamp,
PRIMARY KEY ((idvalue1), idtype1, idtype2, idvalue2)
);
SELECT * FROM graph WHERE idvalue1 IN ('???');

16
Small independent graphs
§ 500,000,000 edges in the graph
§ The keyring graph is not a single large graph
§ It is rather a lot of small independent undirected graphs
Ø Each vertex has a small neighborhood.
Ø The search for a customer id is limited to a small subset of the edges and vertices

17
Atomicity
§ The edges are bi-directional (undirected)
– We need to insert or update 2 rows for each edge
– The atomic batch mode guarantees that the 2 directions are updated
atomically
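The two-row update described above can be sketched as follows. This is only an illustration under assumptions: the helper name `undirected_edge_batch` is hypothetical, the statement is built as plain text with `?` bind markers, and how it is prepared and sent to the cluster (driver, consistency level) is left out.

```python
from typing import List, Tuple

# One row of the graph table, using the column names from the CREATE TABLE above.
INSERT_CQL = (
    "INSERT INTO graph (idvalue1, idtype1, idtype2, idvalue2, t) "
    "VALUES (?, ?, ?, ?, ?)"
)


def undirected_edge_batch(
    type_a: str, value_a: str, type_b: str, value_b: str, ts_ms: int
) -> Tuple[str, List[object]]:
    """Build a logged batch that writes both directions of one edge.

    Hypothetical helper: returns the CQL text and the flat list of bind values.
    """
    rows = [
        (value_a, type_a, type_b, value_b, ts_ms),  # direction a -> b
        (value_b, type_b, type_a, value_a, ts_ms),  # direction b -> a
    ]
    body = "".join(f"  {INSERT_CQL};\n" for _ in rows)
    statement = "BEGIN BATCH\n" + body + "APPLY BATCH;"
    params = [value for row in rows for value in row]
    return statement, params


if __name__ == "__main__":
    cql, values = undirected_edge_batch("Cell", "(209) 123-4567", "ISE", "myISE1", 0)
    print(cql)
    print(values)
```

Because each direction lives in a different partition (the partition key is idvalue1), a single INSERT cannot cover both rows, which is why the batch is needed.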

18
Optimization of the search of the shortest path
§ We know which kinds of customer id are used by the PnS users
§ We know which kinds of customer id are used for indexing
§ For each pair, the shortest paths are predefined in our application PnS (according to the kinds of customer ids)
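One way to picture such predefined paths is a lookup table keyed by the pair (known id type, wanted id type). The sketch below is an assumption for illustration only: the table content and the names `PREDEFINED_PATHS` and `hops_to_query` are hypothetical and are not taken from the PnS code.

```python
from typing import Dict, List, Tuple

# Hypothetical table: (known id type, wanted id type) -> id types to traverse.
PREDEFINED_PATHS: Dict[Tuple[str, str], List[str]] = {
    ("Cell", "ISE"): ["Cell", "ISE"],
    ("Cell", "EMAIL"): ["Cell", "ISE", "EMAIL"],
    ("EMAIL", "Cell"): ["EMAIL", "ISE", "Cell"],
}


def hops_to_query(known_type: str, wanted_type: str) -> List[Tuple[str, str]]:
    """Return the (idtype1, idtype2) pair to follow at each hop of the path."""
    path = PREDEFINED_PATHS[(known_type, wanted_type)]
    # Consecutive id types of the path form one hop each.
    return list(zip(path, path[1:]))


if __name__ == "__main__":
    for hop in hops_to_query("Cell", "EMAIL"):
        print(hop)
```

Since idtype1 and idtype2 are the first clustering columns of the graph table, each hop could in principle be restricted to the wanted edge types in the same partition read.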

19
Search API in the graph
§ An in-house C++ library offers an API for an iterative breadth-first graph exploration
§ Example: looking for H from A
[Diagram: a graph with vertices A to I; the search starts from A and looks for H]
SELECT * FROM graph WHERE idvalue1 IN ('B', 'F');

20
Number of queries per search
§ Looking for a direct neighbour requires only 1 SELECT
§ Looking for a neighbour of a neighbour requires 2 SELECTs
§ Looking for a neighbour of a neighbour of a neighbour requires 3 SELECTs
§ …

21
Search Response time
§ Number of searches/sec: nearly 700
§ Response time per search: 2 ms < RTT < 3.5 ms
§ A search is executed using 1, 2 or 3 reads → very low response time (thanks to FusionIO and C++ code)

22
Conclusions about Keyring
§ We had to rethink this feature, because C* != RDBMS
§ At first glance, a graph looks like an exotic design … but for our use case, it works well with C* … and FusionIO.
§ Favoring access to data through the partition key is very efficient for getting a low response time and linear scalability.

BYOHH:
Bring Your Own Hadoop &
Hive with Cassandra

24
Basic architecture of the Cassandra cluster
§ Cluster without Hadoop: 2 datacenters, 16 nodes in each DC
§ RF (DC1, DC2) = (3, 3)
§ CL = ONE or LOCAL_QUORUM for online queries
§ Requests from web servers in DC1 are sent to C* nodes in DC1
§ Requests from web servers in DC2 are sent to C* nodes in DC2
[Diagram: the pool of web servers in DC1 queries the C* nodes of DC1; the pool of web servers in DC2 queries the C* nodes of DC2]

25
Adding a new datacenter for analytics
§ Cluster with Hadoop/Hive: 3 datacenters, 16 nodes in DC1, 16 nodes in DC2, 4 nodes in DC3
§ RF (DC1, DC2, DC3) = (3, 3, 1)
§ Because RF = 1 in DC3, we need less storage space in this datacenter
§ We favor cheaper disks (SATA) in DC3 rather than SSDs or FusionIO cards
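The storage argument can be checked with a little arithmetic. The sketch below uses only the node counts and replication factors given above and assumes an evenly balanced ring; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict

NODES: Dict[str, int] = {"DC1": 16, "DC2": 16, "DC3": 4}
RF: Dict[str, int] = {"DC1": 3, "DC2": 3, "DC3": 1}


def copies_in_dc(dc: str) -> int:
    """Number of full copies of the dataset stored in one datacenter."""
    return RF[dc]


def share_per_node(dc: str) -> Fraction:
    """Fraction of the dataset held by each node, assuming even distribution."""
    return Fraction(RF[dc], NODES[dc])


if __name__ == "__main__":
    for dc in NODES:
        print(dc, copies_in_dc(dc), share_per_node(dc))
```

Under these assumptions DC3 as a whole stores one copy of the data instead of three, while each of its 4 nodes still holds about a quarter of the dataset.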

26
Architecture of the Cassandra cluster with the new datacenter for analytics
[Diagram: the pools of web servers in DC1 and DC2, the C* nodes of DC1 and DC2, and the new datacenter DC3 added to the same cluster for analytics]

27
Potential impacts of map reduce tasks on online queries
[Diagram: DC1, DC2 and DC3 with the pools of web servers in DC1 and DC2; the nodes are labelled with timeouts and hinted handoffs (HH)]
§ Map reduce tasks take all the resources (CPU, RAM, IO, …)
§ Timeouts due to CL=ONE used for online READ queries
§ Hinted handoffs for online update queries not replicated in DC3

28
Isolation between online queries and map reduce tasks:
Solution for timeouts (online READ queries)
§ Consistency levels: ANY, LOCAL_ONE, ONE, LOCAL_QUORUM, EACH_QUORUM, QUORUM, ALL
§ Use a LOCAL consistency level:
– For map reduce tasks in DC3:
  – LOCAL_ONE
– For online queries in DC1 or DC2:
  – LOCAL_ONE
  – LOCAL_QUORUM
§ LOCAL_ONE is available since C* 1.2.12 (cf. JIRA CASSANDRA-6238)
§ Problem addressed: timeouts due to CL=ONE used for online READ queries
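To see why the LOCAL levels isolate DC1 and DC2 from DC3, it helps to count how many replica acknowledgements each consistency level waits for with RF = (3, 3, 1). The sketch below is a simplified model written for this purpose: the function names are hypothetical, the key "any" means the replicas may be in any datacenter, and ANY is left out because it applies to writes only.

```python
from typing import Dict

RF: Dict[str, int] = {"DC1": 3, "DC2": 3, "DC3": 1}


def quorum(replicas: int) -> int:
    """Majority of a replica set."""
    return replicas // 2 + 1


def acks_required(cl: str, local_dc: str, rf: Dict[str, int] = RF) -> Dict[str, int]:
    """Map each scope (a datacenter name, or "any") to the acks waited for."""
    total = sum(rf.values())
    if cl == "ONE":
        return {"any": 1}
    if cl == "LOCAL_ONE":
        return {local_dc: 1}
    if cl == "LOCAL_QUORUM":
        return {local_dc: quorum(rf[local_dc])}
    if cl == "QUORUM":
        return {"any": quorum(total)}
    if cl == "EACH_QUORUM":
        return {dc: quorum(n) for dc, n in rf.items()}
    if cl == "ALL":
        return {"any": total}
    raise ValueError(f"unsupported consistency level: {cl}")


def may_wait_on(cl: str, local_dc: str, other_dc: str) -> bool:
    """True if a replica in other_dc can be among those the query waits for."""
    scopes = acks_required(cl, local_dc)
    return "any" in scopes or other_dc in scopes


if __name__ == "__main__":
    for level in ("ONE", "LOCAL_ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
        print(level, acks_required(level, "DC1"), may_wait_on(level, "DC1", "DC3"))
```

In this model a query issued in DC1 with ONE may end up waiting on a DC3 replica that is busy with map reduce tasks, whereas LOCAL_ONE and LOCAL_QUORUM never do.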

29
Solution for Hinted Handoffs (online WRITE queries) 1/2
Guarantee on resources for online queries
§ Use CGROUPS:
– Can guarantee a minimum of CPU/RAM for online queries
– Cgroups cannot be used for disk I/O (map tasks call C* processes when reading data on disk)
§ Problem addressed: hinted handoffs for online update queries

30
Solution for Hinted Handoffs (online WRITE queries) 2/2
Swap global and local read repair chances
§ By default, in C* 1.2:
– read_repair_chance = 0.1
– dclocal_read_repair_chance = 0.0
§ For highly read tables, so that the read repairs are not sent to DC3:
– Set read_repair_chance = 0.00
– Set dclocal_read_repair_chance = 0.1
Ø Less load and disk I/O in DC3
§ DCLOCAL_READ_REPAIR_CHANCE=0.1 is the default since C* 2.0.9 (cf. JIRA CASSANDRA-7320)
§ Problem addressed: hinted handoffs for online update queries
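The effect of swapping the two chances can be illustrated with a simplified model of the per-read decision. The model below is an assumption made for this sketch (one random draw per read, compared first with the global chance and then with the DC-local one); it is not taken from the slides, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def read_repair_scope(
    draw: float, read_repair_chance: float, dclocal_read_repair_chance: float
) -> str:
    """Simplified model: decide which replicas a read repair would involve."""
    if draw < read_repair_chance:
        return "GLOBAL"  # replicas in every datacenter, DC3 included
    if draw < dclocal_read_repair_chance:
        return "DC_LOCAL"  # replicas of the coordinator's datacenter only
    return "NONE"


def count_scopes(
    draws: List[float], read_repair_chance: float, dclocal_read_repair_chance: float
) -> Counter:
    """Count the decisions over a list of draws in [0, 1)."""
    return Counter(
        read_repair_scope(d, read_repair_chance, dclocal_read_repair_chance)
        for d in draws
    )


if __name__ == "__main__":
    # Evenly spaced draws stand in for 1000 reads.
    draws = [i / 1000 for i in range(1000)]
    print("C* 1.2 defaults:", count_scopes(draws, 0.1, 0.0))
    print("swapped values: ", count_scopes(draws, 0.0, 0.1))
```

In this model the same share of reads triggers a repair in both configurations, but with the swapped values none of them reaches DC3.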

31
Tradeoff “ease of operations vs optimization”
§ 256 VN per C* node is usually recommended
§ At least 1 map task per virtual node in DC3
– Disabling virtual nodes in DC3: adding new nodes in DC3 is less easy, but the execution time is shorter
– Enabling virtual nodes in DC3: adding new nodes in DC3 is easier
§ What is the right number of vnodes? 64 VN/node looks good.
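The rule "at least 1 map task per virtual node" gives a quick lower bound on the number of map tasks for a job that scans a table in DC3. The sketch below applies it to the 4 nodes of DC3; the function name is hypothetical, and a node without vnodes is counted as a single token range, which is an assumption.

```python
def min_map_tasks(nodes: int, vnodes_per_node: int) -> int:
    """Lower bound on map tasks: one per (virtual) node token range."""
    # A node with vnodes disabled is counted as one range (assumption).
    return nodes * max(1, vnodes_per_node)


if __name__ == "__main__":
    for vnodes in (256, 64, 0):
        print(vnodes, min_map_tasks(4, vnodes))
```

With 4 nodes, the bound falls from 1024 map tasks at 256 vnodes per node to 256 at 64 vnodes per node and to 4 with vnodes disabled. This is the kind of reduction behind the choice of 64 VN/node.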

32
Contributions and open sourced modules
§ Hive Handler open sourced by Orange
§ Works with CDH4.4 and C* 1.2.13
§ Feature added to this handler: authentication
§ GitHub: https://github.com/Orange-OpenSource/cassandra_handler_for_hive
Thanks to Cyril Scetbon for this handler

33
Conclusions about BYOHH
§ The installation of Hadoop & Hive is tricky, but we had no choice for analytics because CQL has many limitations
§ We had to rethink our architecture. Now we are able to do analytics with Hadoop + Hive, with better isolation between online and analytics queries.
§ We have also discovered an interesting ecosystem around C* which offers more capabilities. With this ecosystem, we can benefit from the strengths of C* and work around some of its limitations.

35
Season 2: Conclusions
1. Rethink
2. Adapt
3. Leverage

36
Thank you

37
Questions

38
A few answers about hardware / OS version / Java version / Cassandra version / Hadoop version
§ Hardware:
– 16 nodes in DC1 and DC2 at the end of 2013:
  – 2 CPUs, 6 cores each, Intel® Xeon® 2.00 GHz
  – 64 GB RAM
  – FusionIO® 800 GB MLC
– 4 nodes in DC3:
  – 24 GB of RAM
  – 2 CPUs, 6 cores each, Intel® Xeon® 2.00 GHz
  – SATA disks, 15K rpm
§ OS: Ubuntu Precise (12.04 LTS)
§ Cassandra version: 1.2.13
§ Hadoop version: CDH 4.4 (with Hive 0.10): Hadoop 2 with MRv1
§ Hive handler: https://github.com/Orange-OpenSource/cassandra_handler_for_hive
§ Java version: Java7u45 (GC CMS with option CMSClassUnloadingEnabled)

39
A few answers about data and requests
§ Data types:
– Volume: 6 TB at the end of 2013
– elementary types: boolean, integer, string, date
– collection types
– complex types: json, xml (between 1 and 20 KB)
§ Requests:
– 10,000 requests/sec at the end of 2013
– 80% get
– 20% set
§ Consistency level used by PnS for online queries and batch updates:
– LOCAL_ONE (95% of the queries)
– LOCAL_QUORUM (5% of the queries)
