Nisum - Global Big Data Conference - Advance Cassandra by Faraz Mohammed

UNITED STATES | CHILE | INDIA | NISUM.COM
What is “big” in big data?
……Cassandra
Faraz Mohammed
VP @ Nisum
June 9, 2017
Global Software Architecture Conference

Who are we?
Simplifying the adoption of complex (cutting-edge) technologies, backed
by deep research and understanding.
• Innovation Lab – time-boxed, fixed-cost co-research with clients
on complex problems
• IT Consulting/Implementation

Agenda
– What is “big” in big data?
– Something interesting is happening
– Cassandra

Does data size really
matter today?
BigData
.......a breakthrough in processing
large amounts of data
RDBMS
……Large data, yet
RDBMS
…..Struggle of converting
OLTP to OLAP
Technology
Explosion
.......Too many options,
complex choices.
DBMS
……small data
5 years ago
.......data was big here …..and here ….. but not here

Example…. RDBMS vs. Big Data Tech
Today we can handle large
data… we just need to choose
the right technology.

Technology Explosion

Something interesting is happening
Before: heavy downloads,
negligible uploads
Now: heavy downloads,
heavy uploads
The Internet is turning upside down,
or, to be precise, downside up

Product digitalization: data will keep
growing
Google cars – ~2 PB per year per car

Our Observation
Data is growing significantly, and it is not going to slow down.
Yet the present-day challenge is not the volume or
variety of data, but rather the overload of
“technologies”.

Cassandra
– Continuous Availability
– Linear Scalability
– No single point of failure
– Spans multiple data centers (DCs)
– Powerful Dynamic Data Model
• Maximum Flexibility
• Fast response
• Up to 2 billion cells (columns) per partition (row)
– Open Source
– NoSQL
– Version 3.10
– Written in Java
– Used by Walmart, Facebook, Twitter, Netflix
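Continuous availability and the absence of a single point of failure come from replication: each row is stored on replication-factor (RF) nodes, and a request at consistency level QUORUM needs only a majority of them to answer. The sketch below works through that arithmetic for a single data center (across several DCs, QUORUM is a majority of the summed RFs, and LOCAL_QUORUM a majority within the local DC). The function names are hypothetical helpers, not a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replica acknowledgements a QUORUM request needs: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """Replicas that may be down while a request needing `required_acks` still succeeds."""
    return replication_factor - required_acks


def read_sees_latest_write(replication_factor: int, read_acks: int, write_acks: int) -> bool:
    """True when every read replica set must overlap every write replica set: R + W > RF."""
    return read_acks + write_acks > replication_factor


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, survives {tolerated_failures(rf, q)} replica(s) down, "
          f"QUORUM reads see QUORUM writes: {read_sees_latest_write(rf, q, q)}")
```

Running it shows that RF=1 tolerates no failures, and RF=3 is the smallest factor at which QUORUM reads and writes keep working with one replica down while still overlapping.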
Operational Complexities

Careful Cassandra
Teams often misunderstand the use case for Cassandra and
use it as a general-purpose database. It’s a great tool and we like it,
but too often we see teams run into trouble using it.
Require joins or complex search? Say no to Cassandra.
Predefined indexes/keys? Yeah… Cassandra.
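Joins and ad-hoc search have no cheap path in Cassandra because rows are located by their partition key. The toy model below, a plain Python dict standing in for the cluster and holding a hypothetical orders table, shows the difference: a lookup by the predefined key touches one partition, while filtering on a non-key column has to visit all of them (the situation CQL makes you opt into with ALLOW FILTERING). It is a sketch of the access pattern only, not of Cassandra's storage engine.

```python
from collections import defaultdict

# Hypothetical "orders" table partitioned by customer_id; the dict stands in
# for the cluster, and each key is one partition.
orders_by_customer: dict[str, list[dict]] = defaultdict(list)


def insert(order: dict) -> None:
    # The partition key alone decides which partition stores the row.
    orders_by_customer[order["customer_id"]].append(order)


def orders_for(customer_id: str) -> list[dict]:
    # Query on the predefined key: exactly one partition is read.
    return orders_by_customer.get(customer_id, [])


def orders_with_status(status: str) -> tuple[list[dict], int]:
    # Query on a non-key column: every partition has to be visited (the case
    # CQL only accepts with ALLOW FILTERING). Returns (matches, partitions scanned).
    matches, scanned = [], 0
    for rows in orders_by_customer.values():
        scanned += 1
        matches.extend(row for row in rows if row["status"] == status)
    return matches, scanned


insert({"customer_id": "c1", "order_id": 1, "status": "shipped"})
insert({"customer_id": "c2", "order_id": 2, "status": "pending"})
insert({"customer_id": "c3", "order_id": 3, "status": "shipped"})
print(len(orders_for("c1")))             # → 1 (one partition read)
print(orders_with_status("shipped")[1])  # → 3 (all partitions scanned)
```

The cost of the second query grows with the size of the whole table, which is why queries that need it are better served by another store or by a second table keyed for that query.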

Cassandra Careful - Lessons
It’s a great tool and we like it, but too often we see teams run
into trouble using it.
• Data modeling is not simple: we saw cases where engineers remodeled entire
databases multiple times to meet changing business needs.
• Not a general-purpose database: it is optimized for fast reads on large data sets
based on predefined keys or indexes.
• Time series: suitable for storing time series data or metrics.
• Require processing at retrieval? If your use case requires complex filtering or
processing when retrieving data, then Cassandra may not be the right choice for
you.
• Not row-level consistent: data integrity challenges for non-key columns.
• Operational complexities: require careful planning and consideration.
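The time-series and processing-at-retrieval points both follow from how rows are located: a well-modeled time series puts the queried entity in the partition key and time in the clustering order, so a range read is a slice of one sorted partition. The sketch models a hypothetical metrics table in plain Python, partitioned by (sensor_id, day) and ordered by timestamp; the day bucket is an assumption chosen to keep partitions from growing without bound.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# Hypothetical metrics table: partition key = (sensor_id, day bucket),
# clustering order = timestamp. Each dict entry is one partition.
partitions: dict[tuple[str, str], list[tuple[datetime, float]]] = defaultdict(list)


def write(sensor_id: str, ts: datetime, value: float) -> None:
    key = (sensor_id, ts.date().isoformat())
    # insort keeps the partition sorted by timestamp, like a clustering column.
    bisect.insort(partitions[key], (ts, value))


def read_range(sensor_id: str, start: datetime, end: datetime) -> list[tuple[datetime, float]]:
    """Readings in [start, end) for one sensor; assumes both fall on the same day."""
    rows = partitions.get((sensor_id, start.date().isoformat()), [])
    # Binary search on the sorted partition: a slice, not a scan of the table.
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


write("s1", datetime(2017, 6, 9, 10, 0), 21.5)
write("s1", datetime(2017, 6, 9, 9, 0), 20.0)
write("s1", datetime(2017, 6, 10, 9, 0), 19.0)  # lands in the June 10 partition
print(read_range("s1", datetime(2017, 6, 9, 8, 0), datetime(2017, 6, 9, 11, 0)))
# prints the two June 9 readings, oldest first
```

A query such as "all readings above 25 degrees for any sensor" has no key to start from; it would need processing at retrieval, which is exactly the case the list above warns about.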

Design Considerations – Success Factors
It’s a great tool and we like it, but too often we see teams run
into trouble using it. The keys to success are:
• In-depth understanding of the “underlying architecture”
• Infrastructure awareness
• Proactive “capacity planning”
Cassandra Underlying Architecture
AWS – Regions and Zones

Design Considerations
CPU
• Cassandra is highly concurrent and uses as many CPU cores as are available.
• Insert-heavy use cases are CPU bound.
• AWS: at least 4 vCPUs.
• AWS: choose compute-optimized instance types for heavy inserts.
Memory
• Cassandra runs on the JVM: size the heap properly and avoid overly large heaps.
• MAX_HEAP_SIZE: no more than 8 GB.
• HEAP_NEWSIZE: 100 MB per vCPU.
• Leave enough memory for the OS file cache.
• AWS: 32 GB RAM.
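The memory guidance is simple arithmetic, sketched below with the slide's AWS figures (32 GB RAM, 4 vCPUs). As far as we know, the default heap calculation in Cassandra 3.x's cassandra-env.sh has the same shape: the maximum heap is the larger of min(RAM/2, 1 GB) and min(RAM/4, 8 GB), and the young generation, set through the HEAP_NEWSIZE variable, is 100 MB per core capped at a quarter of the heap. Treat it as a sketch and check the script shipped with your version; heap_settings is a hypothetical helper.

```python
def heap_settings(ram_mb: int, vcpus: int) -> dict[str, int]:
    """Heap sizes in MB: 8 GB heap ceiling, 100 MB of young generation per vCPU."""
    # Max heap: the larger of min(RAM/2, 1 GB) and min(RAM/4, 8 GB), never above 8 GB.
    max_heap = max(min(ram_mb // 2, 1024), min(ram_mb // 4, 8192))
    # Young generation: 100 MB per vCPU, but no more than a quarter of the heap.
    new_size = min(100 * vcpus, max_heap // 4)
    # Whatever the heap does not take is left to the OS, largely as file cache.
    os_and_cache = ram_mb - max_heap
    return {"MAX_HEAP_SIZE": max_heap, "HEAP_NEWSIZE": new_size, "os_and_file_cache": os_and_cache}


print(heap_settings(ram_mb=32 * 1024, vcpus=4))
# → {'MAX_HEAP_SIZE': 8192, 'HEAP_NEWSIZE': 400, 'os_and_file_cache': 24576}
```

For the slide's node this yields an 8 GB heap and a 400 MB young generation, which would be written in cassandra-env.sh as MAX_HEAP_SIZE="8192M" and HEAP_NEWSIZE="400M", leaving roughly 24 GB to the OS and its file cache.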
Storage
• I/O is mostly sequential, but reads require random I/O.
• SSD preferred: low latency for random reads, and high throughput for the sequential writes of compaction.
• Storage requirements: budget for the storage overhead of compaction.
• Adopt the XFS or ext4 file system; avoid ext3.
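The compaction overhead can be turned into a planning formula. The sketch below assumes data spread evenly across nodes and ignores compression, commit logs and snapshots; the headroom fraction is an assumption that depends on the compaction strategy (keeping about half the disk free is the usual conservative figure for size-tiered compaction, while leveled compaction needs far less). disk_per_node_gb is a hypothetical helper.

```python
def disk_per_node_gb(raw_data_gb: float, replication_factor: int, nodes: int,
                     compaction_headroom: float) -> float:
    """Disk to provision per node so compaction has room to rewrite SSTables.

    compaction_headroom is the fraction of each disk kept free (an assumption
    that depends on the compaction strategy in use).
    """
    if not 0 <= compaction_headroom < 1:
        raise ValueError("compaction_headroom must be in [0, 1)")
    # Every row is stored replication_factor times, spread evenly across nodes.
    data_per_node = raw_data_gb * replication_factor / nodes
    # Only (1 - headroom) of the disk may hold live data.
    return data_per_node / (1 - compaction_headroom)


print(disk_per_node_gb(1000, replication_factor=3, nodes=6, compaction_headroom=0.5))  # → 1000.0
```

For 1 TB of raw data at RF=3 on six nodes, each node holds 500 GB of replicas and, with 50% headroom, needs 1,000 GB of disk.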
Network
• Gossip/replication generates heavy traffic: at least 1 Gbps of bandwidth.
• Spread nodes across Regions and Zones, i.e. DCs and racks, and configure the SNITCH settings accordingly.
• AWS: choose enhanced networking.
• VPC: number of private subnets = replication factor; plan the IP scheme.
• AWS: use ENIs for seeds, and spread seeds across zones.
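Spreading nodes across zones only protects data if replicas land in different zones. With the Ec2Snitch, Cassandra maps each AWS region to a data center and each availability zone to a rack, and NetworkTopologyStrategy prefers replicas on distinct racks. The sketch below is a simplified, single-DC model of that placement (no vnodes, MD5 standing in for the Murmur3 partitioner, hypothetical node and zone names), contrasted with a naive "next RF nodes clockwise" choice. It also shows why an RF equal to the number of zones (private subnets) gives each zone one copy.

```python
import bisect
import hashlib

# Hypothetical single-DC cluster: six nodes, two per availability zone (rack),
# with evenly spaced tokens. Zones are deliberately NOT interleaved on the ring.
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]
RING = [(i * 2**64 // 6, f"node{i}", ZONES[i // 2]) for i in range(6)]  # sorted by token


def token(key: str) -> int:
    # Stand-in partitioner: first 8 bytes of MD5 (Cassandra's default is Murmur3).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def _walk(key: str, ring: list[tuple[int, str, str]]):
    # Yield (node, rack) clockwise, starting at the first token >= the key's token.
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    for i in range(len(ring)):
        _, node, rack = ring[(start + i) % len(ring)]
        yield node, rack


def naive_replicas(key: str, ring: list[tuple[int, str, str]], rf: int) -> list[str]:
    # SimpleStrategy-style: the next rf nodes clockwise, ignoring zones.
    return [node for node, _ in _walk(key, ring)][:rf]


def rack_aware_replicas(key: str, ring: list[tuple[int, str, str]], rf: int) -> list[str]:
    # Simplified NetworkTopologyStrategy: prefer nodes in zones not used yet,
    # falling back to skipped same-zone nodes only when zones run out.
    picked, skipped, seen = [], [], set()
    for node, rack in _walk(key, ring):
        if rack in seen:
            skipped.append(node)
        else:
            seen.add(rack)
            picked.append(node)
    return (picked + skipped)[:rf]


rack_of = {node: rack for _, node, rack in RING}
for name, strategy in (("naive", naive_replicas), ("rack-aware", rack_aware_replicas)):
    nodes = strategy("user:42", RING, 3)
    print(name, nodes, "zones:", len({rack_of[n] for n in nodes}))
```

On this ring the naive choice always puts two of the three replicas in the same zone, so losing that zone removes two copies; the rack-aware choice keeps one copy per zone.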

SUMMARY
DATA IS NOT BIG; THE CHALLENGE IS TECHNOLOGY
CHOICE OVERLOAD
DATA WILL KEEP GROWING, AS THE INTERNET IS TURNING UPSIDE
DOWN
CONSIDER LAMBDA ARCHITECTURE – IT CATERS TO MANY USE
CASES
CASSANDRA CAREFUL – IT IS NOT FOR EVERYONE
@nisumtech

Faraz Mohammed
VP, INNOVATION & PRODUCT
714-204-7712
mfaraz@nisum.com
THANK YOU
www.nisum.com
500 S. Kraemer Boulevard, Suite 301, Brea, CA 92821
Building Success Together®
@Captain_Faraz
We’re hiring….
