UNITED STATES CHILE INDIA NISUM.COM P. 1
What is “big” in big data?
……Cassandra
Faraz Mohammed
VP @ Nisum
• Innovation Lab – time-boxed, fixed-cost co-research with clients
on complex problems
• IT Consulting/Implementation
June 9, 2017
Global Software Architecture Conference
Who are we?
Simplifying the adoption of complex, cutting-edge technologies, backed
by deep research and understanding.
Agenda
– What is “big” in big data?
– Something interesting is happening
– Cassandra
Does data size really matter today?
– DBMS: small data
– RDBMS: large data, yet a struggle to convert OLTP to OLAP
– Big data: a breakthrough in processing large amounts of data
– Technology explosion: too many options, complex choices
(Timeline annotation: 5 years ago, data was big here … and here … but not here)
Example: RDBMS vs. big data tech
Today we can handle large data; we just need to choose the
right technology.
Technology Explosion
Something interesting is happening
Before: heavy downloads, negligible uploads
Now: heavy downloads, heavy uploads
The Internet is turning upside down, or, to be precise,
downside up.
Product Digitalization: data will keep
growing
Google cars: ~2 PB per year per car
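To put the slide's figure in perspective, the conversion below turns “~2 PB per year per car” into daily and per-second rates. It is back-of-the-envelope arithmetic that assumes decimal units (1 PB = 10^15 bytes) and data generated evenly around the clock; it says nothing about how the figure itself was measured.

```python
# Back-of-the-envelope conversion of "~2 PB per year per car".
# Assumptions: decimal units (1 PB = 10**15 bytes), data generated evenly all year.
PB = 10**15
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def per_car_rates(bytes_per_year: float) -> tuple[float, float]:
    """Return (TB per day, MB per second) for a yearly data volume."""
    per_day = bytes_per_year / DAYS_PER_YEAR
    per_second = per_day / SECONDS_PER_DAY
    return per_day / TB, per_second / MB


tb_per_day, mb_per_s = per_car_rates(2 * PB)
print(f"{tb_per_day:.2f} TB/day, {mb_per_s:.1f} MB/s sustained")  # → 5.48 TB/day, 63.4 MB/s sustained
```

In other words, a single car at that rate would need roughly 63 MB/s of sustained ingest, before any replication.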
Our Observation
Data is growing significantly, and the growth is not going to slow down.
Even so, the present-day challenge is not the volume or variety of data,
but the overload of “technologies”.
Cassandra
– Continuous availability
– Linear scalability
– No single point of failure
– Spans multiple data centers (DCs)
– Powerful dynamic data model
• Maximum flexibility
• Fast response
• Up to 2 billion cells per partition
– Open source NoSQL database, written in Java (version 3.10 at the time of this talk)
– Used by Walmart, Facebook, Twitter, Netflix
Downside: operational complexities
Careful Cassandra
Teams often misunderstand the use case for Cassandra and
use it as a general-purpose DB. It’s a great tool and we like it,
but too often we see teams run into trouble using it.
Require joins or complex search? Say no to Cassandra.
Predefined indexes/keys? Yes, Cassandra.
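To make the key-versus-search distinction concrete, here is a toy in-memory model in Python. It is not how Cassandra stores data; it only mimics the idea that rows are grouped by a partition key chosen up front. The `PartitionedTable` class and the `orders` columns are hypothetical.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model: rows grouped by a partition key chosen at design time."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.partitions: dict[str, list[dict]] = defaultdict(list)

    def insert(self, row: dict) -> None:
        self.partitions[row[self.partition_key]].append(row)

    def get_by_key(self, key: str) -> list[dict]:
        # Direct lookup: touches one partition, regardless of table size.
        return list(self.partitions.get(key, []))

    def filter_any(self, column: str, value) -> tuple[list[dict], int]:
        # Filtering on a non-key column must visit every partition
        # (in CQL, this is the kind of query ALLOW FILTERING opts into).
        matches, visited = [], 0
        for rows in self.partitions.values():
            visited += 1
            matches.extend(r for r in rows if r.get(column) == value)
        return matches, visited


orders = PartitionedTable(partition_key="customer_id")
for i in range(1000):
    orders.insert({"customer_id": f"c{i % 100}", "order_id": i,
                   "status": "open" if i % 7 == 0 else "shipped"})

print(len(orders.get_by_key("c42")))  # → 10
matches, visited = orders.filter_any("status", "open")
print(len(matches), visited)          # → 143 100
```

The lookup by the predefined key touches a single partition, while the filter on a non-key column has to visit all 100 partitions; that full scan is the access pattern to avoid in Cassandra.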
Cassandra Careful – Lessons
It’s a great tool and we like it, but too often we see teams run
into trouble using it.
• Data modeling is not simple: we saw cases where engineers remodelled entire
databases multiple times to meet changing business needs.
• Not a general-purpose database: it is optimized for fast reads on large data sets
based on predefined keys or indexes.
• Time series: well suited to storing time-series data or metrics.
• Require processing at retrieval? If your use case requires complex filtering or
processing when retrieving data, Cassandra may not be the right choice for
you.
• Not row-level consistent: expect data-integrity challenges for non-key columns.
• Operational complexities: require careful planning and consideration.
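A common way to follow the “time series” and “predefined keys” lessons is to bucket time-series data by time, so each partition stays bounded and every query names its partitions up front. The sketch below assumes a daily bucket and a hypothetical `readings` table; the bucket size is a per-workload choice, not a recommendation from this talk.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical CQL table that this key scheme is meant to back:
#   CREATE TABLE readings (
#       sensor_id text, day text, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);


def partition_for(sensor_id: str, ts: datetime) -> tuple[str, str]:
    """Composite partition key (sensor_id, UTC day): one partition per sensor per day.

    ts should be timezone-aware; a naive datetime is interpreted as local time.
    """
    return sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partitions_for_range(sensor_id: str, start: datetime,
                         end: datetime) -> list[tuple[str, str]]:
    """Every partition a reader must query to cover [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((sensor_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


utc = timezone.utc
print(partition_for("s1", datetime(2017, 6, 9, 23, 30, tzinfo=utc)))
print(partitions_for_range("s1", datetime(2017, 6, 8, 22, tzinfo=utc),
                           datetime(2017, 6, 10, 1, tzinfo=utc)))
```

Because the reader computes the partition keys itself, a range query becomes a handful of single-partition reads rather than a scan.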
Design Considerations – Success Factors
It’s a great tool and we like it, but too often we see teams run
into trouble using it.
The keys to success are:
• An in-depth understanding of the underlying architecture
• Infrastructure awareness
• Proactive capacity planning
[Diagrams: Cassandra underlying architecture; AWS regions and zones]
Design Considerations
TEST: validate with a representative workload before settling on instance types.
CPU
• Cassandra is highly concurrent and uses as many CPU cores as are available.
• Insert-heavy use cases are CPU-bound.
• AWS: at least 4 vCPUs.
• AWS: choose compute-optimized instance types for heavy inserts.
Memory
• Runs on the JVM: size the heap properly and avoid overly large heaps.
• MAX_HEAP_SIZE: no more than 8 GB.
• HEAP_NEWSIZE: 100 MB per vCPU.
• Leave enough memory for the OS file cache.
• AWS: 32 GB RAM.
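The memory rules above can be turned into concrete settings. This is a sketch under stated assumptions: it starts from a quarter of physical RAM (an assumption, not from the slide), applies the slide's 8 GB cap and ~100 MB-per-vCPU young generation, and additionally caps the young generation at a quarter of the heap (also an assumption). In Cassandra 3.x these values are set as `MAX_HEAP_SIZE` and `HEAP_NEWSIZE` in `cassandra-env.sh`, which, as far as we know, expects the two to be set together; check the file shipped with your version.

```python
# Sketch of the slide's memory rules of thumb; not Cassandra's own sizing logic.
def heap_settings(ram_mb: int, vcpus: int) -> dict[str, int]:
    # Assumption: start from a quarter of physical RAM, then apply the 8 GB cap.
    max_heap = min(ram_mb // 4, 8 * 1024)
    # Slide's rule of thumb: ~100 MB of young generation per vCPU.
    # Assumption: never let the young generation exceed a quarter of the heap.
    new_size = min(100 * vcpus, max_heap // 4)
    return {
        "MAX_HEAP_SIZE_MB": max_heap,
        "HEAP_NEWSIZE_MB": new_size,
        # Memory outside the heap is shared by the OS file cache,
        # off-heap structures and other processes.
        "remaining_mb": ram_mb - max_heap,
    }


def env_lines(settings: dict[str, int]) -> list[str]:
    """Render the values in the form used by cassandra-env.sh (e.g. "400M")."""
    return [
        f'MAX_HEAP_SIZE="{settings["MAX_HEAP_SIZE_MB"]}M"',
        f'HEAP_NEWSIZE="{settings["HEAP_NEWSIZE_MB"]}M"',
    ]


# 32 GB instance with 4 vCPUs, matching the slide's AWS minimums.
for line in env_lines(heap_settings(32 * 1024, 4)):
    print(line)
```

On that instance the heap stays at 8 GB and roughly 24 GB is left outside the heap, which is where the OS file cache does its work.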
Storage
• I/O is mostly sequential, but some random I/O is required.
• SSDs preferred: low latency for random reads and high sequential-write
performance for compactions.
• Storage requirements: account for the storage overhead of compaction.
• Use the XFS or ext4 file system; avoid ext3.
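Capacity planning for storage has to account for replication and for compaction overhead. The function below is a rough sketch: the 50% free-space headroom is an assumed conservative figure for size-tiered compaction, data is assumed to be evenly balanced across nodes, and compression is ignored.

```python
def disk_per_node_gb(raw_data_gb: float, replication_factor: int, nodes: int,
                     compaction_headroom: float = 0.5) -> float:
    """Rough per-node disk requirement for a single data center.

    compaction_headroom: fraction of each disk kept free for compaction.
    0.5 (half the disk free) is an assumed, conservative figure; tune it
    for your compaction strategy. Compression is deliberately ignored.
    """
    if not 0 <= compaction_headroom < 1:
        raise ValueError("compaction_headroom must be in [0, 1)")
    on_disk_total = raw_data_gb * replication_factor  # every replica is a full copy
    per_node = on_disk_total / nodes                  # assumes an evenly balanced ring
    return per_node / (1 - compaction_headroom)


# 1 TB of raw data, RF 3, 6 nodes: 500 GB of data per node, 1 TB of disk per node.
print(disk_per_node_gb(1000, 3, 6))  # → 1000.0
```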
Network
• Gossip and replication generate heavy traffic: at least 1 Gbps of bandwidth.
• Spread nodes across regions and zones, i.e. DCs and racks, and configure the
snitch accordingly.
• AWS: choose enhanced networking.
• VPC: create as many private subnets as the replication factor; plan the IP scheme.
• AWS: use ENIs for seeds, and spread seeds across zones.
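Because AWS reserves five addresses in every subnet, small subnets leave little room for nodes. This sketch uses Python's standard `ipaddress` module to carve one private subnet per replica out of a VPC block and report the usable addresses in each; the VPC CIDR and prefix lengths are illustrative assumptions, and mapping each subnet to its own AZ is left to your provisioning tooling.

```python
import ipaddress
from itertools import islice

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, replication_factor: int,
                 new_prefix: int) -> list[tuple[str, int]]:
    """Carve one private subnet per replica (one per AZ) out of a VPC block.

    Returns (subnet CIDR, usable addresses) pairs.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(islice(vpc.subnets(new_prefix=new_prefix), replication_factor))
    if len(subnets) < replication_factor:
        raise ValueError("VPC block is too small for the requested subnets")
    return [(str(s), s.num_addresses - AWS_RESERVED_PER_SUBNET) for s in subnets]


for cidr, usable in plan_subnets("10.0.0.0/16", 3, 24):
    print(cidr, usable)  # each /24 leaves 251 usable addresses
```

With `new_prefix=28` the same call yields subnets of only 11 usable addresses each, which is why larger subnets are the safer default for a cluster that may grow.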
SUMMARY
• DATA IS NOT BIG; THE CHALLENGE IS TECHNOLOGY CHOICE OVERLOAD
• DATA WILL KEEP GROWING, AS THE INTERNET IS TURNING UPSIDE DOWN
• CONSIDER LAMBDA ARCHITECTURE – IT CATERS TO MANY USE CASES
• CASSANDRA CAREFUL – IT IS NOT FOR EVERYONE
@nisumtech
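The summary recommends considering a lambda architecture. As a minimal, hypothetical illustration (not a design from this talk), the toy counter below keeps the three usual parts: an append-only master log, a batch view recomputed from it, and a speed view holding increments since the last batch run, merged at query time.

```python
from collections import Counter


class LambdaPageViews:
    """Toy lambda architecture for page-view counts (hypothetical example)."""

    def __init__(self) -> None:
        self.master_log: list[str] = []       # immutable, append-only source of truth
        self.batch_view: Counter = Counter()  # recomputed from the full log (slow, accurate)
        self.speed_view: Counter = Counter()  # incremental counts since the last batch run

    def ingest(self, page: str) -> None:
        # Every event lands in the master log and in the real-time view.
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self) -> None:
        # Recompute the batch view from scratch. This toy runs synchronously,
        # so the speed view can simply be reset; a real system must discard
        # only the increments that the batch run has already absorbed.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, page: str) -> int:
        # Serving layer: merge the batch and real-time views.
        return self.batch_view[page] + self.speed_view[page]


views = LambdaPageViews()
for page in ["home", "home", "cart", "home"]:
    views.ingest(page)
views.run_batch()
views.ingest("home")
views.ingest("home")
print(views.query("home"), views.batch_view["home"], views.speed_view["home"])  # → 5 3 2
```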
Faraz Mohammed
VP, INNOVATION & PRODUCT
714-204-7712
mfaraz@nisum.com
THANK YOU
www.nisum.com
500 S. Kraemer Boulevard, Suite 301, Brea, CA 92821
Building Success Together®
@Captain_Faraz
We’re hiring….
Nisum - Global Big Data Conference - Advance Cassandra by Faraz Mohammed

Editor's Notes

  1. 2008
  2. ENI – Elastic Network Interfaces

Storage: Most of the I/O in Cassandra is sequential, but there are cases that require random I/O, for example reading SSTables during read operations. SSD is the recommended storage, as it provides extremely low-latency response times for random reads while supporting ample sequential-write performance for compaction. Replication and the storage overhead due to compaction should be taken into account when determining storage requirements. The recommended file system for all volumes is XFS. Ext4 can be used, but avoid ext3, as it is considerably slower.

Networking: Cassandra uses the gossip protocol to exchange information about the network topology with other nodes. Reads and writes involve talking to multiple nodes, which results in a lot of data transfer over the network. We recommend always choosing instances with at least 1 Gbps of network bandwidth to accommodate replication and gossip. If you use AWS, enable enhanced networking on your instances for better performance. Use a VPC, keep nodes in private subnets, and create as many subnets as the replication factor. Use NAT for address translation. Another thing to account for when planning subnets for your Cassandra cluster is that Amazon reserves the first four IP addresses and the last IP address of every subnet for IP networking purposes. Use Elastic Network Interfaces (ENIs), virtual network interfaces that can be used to manage the seed servers.

Memory: Cassandra runs on the JVM, and the JVM has to be sized appropriately for performance. Large heaps can introduce garbage collection (GC) pauses that lead to latency or even make a Cassandra node appear to have gone offline; proper heap settings minimize the impact of GC. The MAX_HEAP_SIZE parameter determines the heap size of the Cassandra JVM, and DataStax recommends not allocating more than 8 GB for the heap. The HEAP_NEWSIZE parameter is the size of the young generation in Java; a general rule of thumb is to set it at 100 MB per vCPU. Cassandra also depends heavily on the OS file cache for read performance, so choosing an optimal JVM heap size and leaving enough memory for the OS file cache is important. For production workloads we recommend at least 32 GB of DRAM.

CPU: Insert-heavy workloads become CPU-bound in Cassandra before they become I/O-bound. All write operations go to the commit log, but Cassandra writes so efficiently that the CPU becomes the limiting factor. Cassandra is highly concurrent and uses as many CPU cores as are available. We recommend at least 4 vCPUs, but test before you settle.

Others: On AWS, choose memory-optimized or storage-optimized instance types, and test a representative workload before choosing the final instance types. Spread nodes across AZs so that you can still ensure availability and uptime in case of a disaster.

Application: When building a Cassandra cluster, select the same region for your data and your application to minimize application latency. A Cassandra cluster can be made EC2-aware, and thus highly available, by defining an appropriate snitch setting; this allows Cassandra to place the replicas of each data partition on nodes in different AZs. Spread your seed nodes across multiple availability zones; seed nodes help bootstrap new nodes. Cassandra nodes can be data-center-aware or rack-aware: data center = AWS region, and rack = availability zone. Replicating to another region can also serve as a backup. Launch Cassandra in a VPC; it supports the enhanced networking feature, which means low latency. Size subnets generously: for example, prefer a /16 block over a /28, because a /28 has only 16 addresses, of which just 11 remain usable after AWS reserves five. AWS regions are independent, while the AZs within a region are connected by low-latency links. Communication between regions goes over the public internet, so ensure it is encrypted. AWS also charges for data transfer between regions, which can be a nasty surprise:
…no transfer fee – AWS Kinesis
…one-way fee, not two-way – S3 across regions
…two-way cost, in and out – transfers between EC2 instances in different AZs
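The snitch notes above explain why replicas should land in different AZs. The following is a simplified model of rack-aware placement in a single data center, with each AZ treated as a rack: walk the token ring clockwise from the key's token and prefer nodes in racks not yet used. It illustrates the idea behind Cassandra's NetworkTopologyStrategy but is not its actual implementation; node names, tokens and zones are hypothetical.

```python
from bisect import bisect_left


def place_replicas(ring: list[tuple[int, str, str]], key_token: int,
                   rf: int) -> list[str]:
    """Simplified rack-aware replica placement within one data center.

    ring: (token, node_name, rack) tuples; here rack = AWS availability zone.
    Walks clockwise from the first node whose token >= key_token, preferring
    nodes in racks not yet used; skipped nodes are used only when there are
    fewer distinct racks than rf.
    """
    ring = sorted(ring)
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)  # wrap around the ring
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        _, node, rack = ring[(start + i) % len(ring)]
        if rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(rack)
        else:
            skipped.append(node)
        if len(replicas) == rf:
            return replicas
    # Fewer distinct racks than rf: fill up with the nodes we skipped.
    return (replicas + skipped)[:rf]


ring = [(0, "n1", "us-east-1a"), (10, "n2", "us-east-1a"),
        (20, "n3", "us-east-1b"), (30, "n4", "us-east-1b"),
        (40, "n5", "us-east-1c"), (50, "n6", "us-east-1c")]
print(place_replicas(ring, 5, 3))   # → ['n2', 'n3', 'n5']
print(place_replicas(ring, 55, 3))  # → ['n1', 'n3', 'n5']
```

In both cases the three replicas sit in three different AZs, so losing one zone still leaves two copies of the data.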