Speed @ Scale with NoSQL
Aveekshith Bushan
Regional Sales and SA Director - APAC
aveek@aerospike.com
Twitter: @aveekshith
Proprietary & Confidential | © 2015 Aerospike Inc. All rights reserved.
Then and now!
Volume, Variety and Velocity
Scale - Closer to Home
1956: IBM 350 Hard Disk, 5 MB of storage, system cost: $160K
1980: IBM 3380, 1 GB of storage, cost: $50K
2015: Multiple options, 1 TB of storage, cost: $0.8K
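The three data points above are easier to compare as cost per gigabyte. The sketch below is only a worked calculation on the slide's own figures; decimal units (1 GB = 1000 MB) and the names `DRIVES` and `cost_per_gb` are assumptions made for illustration.

```python
# Storage figures as quoted on the slide: (year, device, capacity in MB, cost in USD).
# Assumption: decimal units, so 1 GB = 1000 MB and 1 TB = 1,000,000 MB.
DRIVES = [
    (1956, "IBM 350", 5, 160_000),
    (1980, "IBM 3380", 1_000, 50_000),
    (2015, "Multiple options", 1_000_000, 800),
]


def cost_per_gb(capacity_mb: int, cost_usd: int) -> float:
    """Return the cost in USD of one gigabyte at the given capacity and price."""
    return cost_usd * 1000 / capacity_mb


for year, device, capacity_mb, cost_usd in DRIVES:
    print(f"{year} {device}: ${cost_per_gb(capacity_mb, cost_usd):,.2f} per GB")
```

On these figures the cost per gigabyte falls from tens of millions of dollars in 1956 to under a dollar in 2015.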
Over the Years – Scale!
Scale Changes Everything!
Source: The Black Swan by Nassim Nicholas Taleb
The Black Swan Effect
“Known” and “Unknown” Unknowns!
Known Unknowns
• Can be Planned For
• Through BCP, Risk Matrix, etc.
Unknown Unknowns
• Difficult to Model and Foresee
• Impact can be reduced by Diversification Across Investments, Business, Markets and Product Types
What Does it Mean – IT Perspective
Positive Black Swans
• Explosion in Data
• Exposure to Different Types of Data
• Agility in IT Infrastructure
• Ex: Successful New Product or Market Launch
Negative Black Swans
• Globally Distributed IT Infrastructure
• No Vendor Lock-In
• Easy Deployment Models
• Ex: Natural or Man-made Disasters, Market Changes
Gaussian World
• Structured Data
• Predictable Growth in Data Volume
• Lower Cost of Overall Operation
• Ex: Traditional Applications
Positive Black Swans - Data
Positive Black Swans
• Explosion in Data
• Exposure to Different Types of Data
• Agility in IT Infrastructure
• Ex: Successful New Product or Market Launch
Horizontal Scalability
Dynamic Data Model
Performance
Agility
Geospatial Information
Negative Black Swans - Data
Negative Black Swans
• Globally Distributed IT Infrastructure
• No Vendor Lock-In
• Easy Deployment Models
• Ex: Natural or Man-made Disasters, Market Changes
Geographically Distributed Clusters
Built on Commodity Hardware
Cloud-Ready
Flexible Data Model
Low Cost Solution
Gaussian World - Data
Gaussian World
• Structured Data
• Predictable Growth in Data Volume
• Lower Cost of Overall Operation
• Ex: Traditional Applications
Consistency
Query Model
Structured Data
Manageability
Ecosystem
Real World ERD Diagram
Familiar World!
ORM Relational DB
Making Changes
New Table
New Table
What you don’t get with Relational Databases!
Data Types
• Unstructured Data
• Semi-structured Data
Volume
• Speed at Scale
• Petabyte Scale
Agility
• Quick Time to Market
• Agile Development
Deployment Models
• Cloud Ready
• Scale-out and Scale-up
NoSQL Types
• Key Value Stores
• Document Stores
• Columnar Stores
• Graph Stores
• Other Stores
– Time-Series
– NewSQL
– SSD Optimized DBs
– In-Memory Stores
Key Value Store
Key Value Store (SSD, Memory)
F_Name  L_Name  Dept  Location          Skill_Details
John    Marsh   E11   [45.123, 47.232]  { Skill_Name: ‘Java’, Version: ‘1.8’, Level: 3, … },
                                        { Skill_Name: ‘Go’, Version: ‘1.7’, Level: 2, … }
Ex: Aerospike, Redis
Relational
Emp_ID  F_Name  L_Name  Dept  City
1       John    Marsh   E11   New York
2       Satish  Rao     E12   Bengaluru
3       Alok    Jain    E12   New Delhi
4       Raghu   G       E11   Bengaluru

Skill_ID  Skill_Name  Version
1         Java        1.8
2         Go          1.7
3         Python      3.5

ID   Emp_ID  Skill_ID  Level
100  1       1         3
101  1       2         2
102  2       2         3
103  3       1         4
104  4       3         1
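A minimal sketch of how the three relational tables above collapse into one key-value record per employee. The variable names (`employees`, `skills`, `employee_skills`, `kv_store`) and the helper `to_kv_record` are hypothetical, and a plain Python dict stands in for the key-value store; this is not the API of Aerospike or Redis.

```python
# Rows copied from the relational tables on the slide.
# Table and variable names are hypothetical; the slide does not name the tables.
employees = {
    1: ("John", "Marsh", "E11", "New York"),
    2: ("Satish", "Rao", "E12", "Bengaluru"),
    3: ("Alok", "Jain", "E12", "New Delhi"),
    4: ("Raghu", "G", "E11", "Bengaluru"),
}
skills = {1: ("Java", "1.8"), 2: ("Go", "1.7"), 3: ("Python", "3.5")}
# (ID, Emp_ID, Skill_ID, Level)
employee_skills = [
    (100, 1, 1, 3),
    (101, 1, 2, 2),
    (102, 2, 2, 3),
    (103, 3, 1, 4),
    (104, 4, 3, 1),
]


def to_kv_record(emp_id: int) -> dict:
    """Join the three tables into one denormalized record for a single employee."""
    f_name, l_name, dept, city = employees[emp_id]
    skill_details = [
        {"Skill_Name": skills[skill_id][0], "Version": skills[skill_id][1], "Level": level}
        for _, e_id, skill_id, level in employee_skills
        if e_id == emp_id
    ]
    return {
        "F_Name": f_name,
        "L_Name": l_name,
        "Dept": dept,
        "City": city,
        "Skill_Details": skill_details,
    }


# A plain dict stands in for the key-value store; the key is the employee ID.
kv_store = {emp_id: to_kv_record(emp_id) for emp_id in employees}

# One lookup by key returns everything the relational model needs a join for.
print(kv_store[1])
```

The trade-off this illustrates: reads by key need no join, while skill names and versions are now repeated inside every record that uses them.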
Document Stores
Document DB
{
  F_Name: ‘John’,
  L_Name: ‘Marsh’,
  city: ‘New York’,
  location: [45.123, 47.232],
  skills: [
    { Skill_Name: ‘Java’, Version: ‘1.8’, Level: 3, … },
    { Skill_Name: ‘Go’, Version: ‘1.7’, Level: 2, … }
  ]
}
Ex: MongoDB, CouchDB, OrientDB
Relational
Emp_ID  F_Name  L_Name  Dept  City
1       John    Marsh   E11   New York
2       Satish  Rao     E12   Bengaluru
3       Alok    Jain    E12   New Delhi
4       Raghu   G       E11   Bengaluru

Skill_ID  Skill_Name  Version
1         Java        1.8
2         Go          1.7
3         Python      3.5

ID   Emp_ID  Skill_ID  Level
100  1       1         3
101  1       2         2
102  2       2         3
103  3       1         4
104  4       3         1
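A rough illustration of the document model above, using plain Python dicts and the standard `json` module rather than any real document database client. The helper `find_by_skill` is hypothetical; the second document is assembled from the relational rows for Satish Rao and deliberately has no `location` field, to show that documents in one collection need not share a schema.

```python
import json

# Two documents in the shape shown on the slide.
documents = [
    {
        "F_Name": "John",
        "L_Name": "Marsh",
        "city": "New York",
        "location": [45.123, 47.232],
        "skills": [
            {"Skill_Name": "Java", "Version": "1.8", "Level": 3},
            {"Skill_Name": "Go", "Version": "1.7", "Level": 2},
        ],
    },
    {
        # No "location" field here: the schema can vary per document.
        "F_Name": "Satish",
        "L_Name": "Rao",
        "city": "Bengaluru",
        "skills": [{"Skill_Name": "Go", "Version": "1.7", "Level": 3}],
    },
]


def find_by_skill(docs: list, skill_name: str, min_level: int = 1) -> list:
    """Return documents that list the skill at or above the given level."""
    return [
        doc
        for doc in docs
        if any(
            skill["Skill_Name"] == skill_name and skill["Level"] >= min_level
            for skill in doc.get("skills", [])
        )
    ]


# Documents are typically exchanged as JSON; a round trip preserves the nesting.
round_trip = json.loads(json.dumps(documents[0]))

print([doc["F_Name"] for doc in find_by_skill(documents, "Go", min_level=3)])
```

A real document store would evaluate such a query on the server, usually with an index on the nested field; the list comprehension here only mimics the result.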
Hadoop and NoSQL
• Hadoop is a Map/Reduce Framework
• Used to partition computation on large datasets
• Used where you need to analyse most of the data
• E.g.
– Count all the links on all the web pages in India
– Analyse the recommendations based on yesterday's purchases
• Use a connector to Push and Pull Data from Hadoop into NoSQL
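The link-counting example above can be sketched as a map, shuffle and reduce pipeline. This is a single-process illustration of the programming model, not Hadoop code: the page contents, the regular expression, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_links`) are all assumptions made for the sketch.

```python
import re
from collections import defaultdict

# Hypothetical input: page URL -> HTML body. A real job would read these from HDFS.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://d.example/": '<a href="http://b.example/">b</a>',
}

HREF = re.compile(r'href="([^"]+)"')


def map_phase(html: str):
    """Map: emit a (target_url, 1) pair for every link found on one page."""
    for target in HREF.findall(html):
        yield target, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: sum the counts collected for one target URL."""
    return key, sum(values)


def count_links(pages: dict) -> dict:
    """Run map, shuffle and reduce in sequence over all pages."""
    pairs = (pair for html in pages.values() for pair in map_phase(html))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


link_counts = count_links(PAGES)
print(link_counts)
```

In Hadoop the map calls run in parallel on the nodes that hold each block of input, which is why the model suits jobs that must touch most of the data. The resulting counts are the kind of output a connector would then push into a NoSQL store for low-latency lookups.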
MONGODB
Architecture
AEROSPIKE
Architecture
1) No Hotspots – Distributed Hash Table simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers
3) Shared Nothing Architecture, every node is identical
4) Smart Cluster, Zero Touch – auto-failover, rebalancing, rolling upgrades
5) Operations and long-running tasks prioritized in real-time
6) XDR – sync replication across data centers ensures Zero Downtime
Data is Distributed Randomly
Every key is hashed into a 20 byte (fixed length) string using the RIPEMD160 hash function
This hash + additional data (fixed 64 bytes) are stored in RAM in the index
12 bits of this hash are used to compute the partition id
There are 4096 partitions
Partition id maps to node id based on cluster membership

Key: cookie-abcdefg-12345678
Hash: 182023kh15hh3kahdjsh

Partition ID  Master node  Replica node
…             1            4
1820          2            3
1821          3            2
4096          4            1
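The steps above (hash the key, take 12 bits, look up the partition's nodes) can be sketched as follows. This is an illustration under stated assumptions, not Aerospike's implementation: the sketch hashes the raw key only, the choice of which 12 bits to use is assumed, the round-robin partition map is invented for the example, and SHA-1 (also 20 bytes) is substituted when the local OpenSSL build does not provide RIPEMD-160. All function names are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # 2**12, so 12 bits of the hash select a partition
NODES = [1, 2, 3, 4]  # node IDs as in the slide's example table


def digest20(key: str) -> bytes:
    """Hash a key to a fixed 20-byte digest.

    RIPEMD-160 is used when available. Some OpenSSL builds disable it, in
    which case SHA-1 is used purely as a stand-in of the same length.
    """
    data = key.encode("utf-8")
    try:
        return hashlib.new("ripemd160", data).digest()
    except ValueError:
        return hashlib.sha1(data).digest()


def partition_id(key: str) -> int:
    """Derive a partition ID from 12 bits of the digest.

    Assumption: the low 12 bits of the first two bytes, read little-endian.
    """
    digest = digest20(key)
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def build_partition_map(nodes: list) -> dict:
    """Assign a master and a replica node to every partition.

    Assumption: simple round-robin placement. A real cluster derives this
    table from cluster membership and recomputes it when nodes join or leave.
    """
    count = len(nodes)
    return {
        pid: (nodes[pid % count], nodes[(pid + 1) % count])
        for pid in range(N_PARTITIONS)
    }


def locate(key: str, partition_map: dict) -> tuple:
    """Return (partition ID, master node, replica node) for a key."""
    pid = partition_id(key)
    master, replica = partition_map[pid]
    return pid, master, replica


partition_map = build_partition_map(NODES)
print(locate("cookie-abcdefg-12345678", partition_map))
```

Because a client can hold the whole partition map in memory, it can compute the owning node locally and send each request straight to it, which is the "1 hop to data" idea from the architecture slide.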
Even record distribution
[Diagram: an Application with an AerospikeClient connects to Node A, Node B and Node C; records X, Y, Z and their replicas X’, Y’, Z’ are spread across the nodes]
Thank You!
Aveek
aveek@aerospike.com

IIMB presentation


Editor's Notes

  • #4 Inspired by Deirdre Spilane
  • #5 IBM RAMAC. Oracle: about the time Image 2 happened. MongoDB: a few years before Image 3. What do we see?
  • #6 By 2020, the world will have 40 zettabytes of data, 57 times the number of grains of sand in the world.