Aerospike : High Performance NoSQL Store with Flash Optimization
Gagan Agrawal, Sr. Principal Engineer, Snapdeal

High-performance databases are essential to the most widely used real-time internet services, where low latency and high throughput have always been of utmost importance in bringing traffic to a site. Aerospike is one such NoSQL store, designed to maintain response times under 1 millisecond even under peak load. Optimized for flash storage, Aerospike scales by adding new nodes and offers high operational efficiency because it needs little manual intervention. It supports a variety of data types such as String, Integer, Bytes, List, Map and Large Data Types. Aerospike also has an aggregation framework through which complex computation can be pushed directly to the server for analytics. As a result, Aerospike can serve a variety of use cases, including as a cache service, a persistent store or an analytics engine.

Published in: Software

  1. Aerospike : High Performance NoSQL Store with Flash Optimization. Gagan Agrawal, Sr. Principal Engineer, Snapdeal
  2. Agenda
     • What is Aerospike
     • Aerospike Architecture
     • Data Model
     • Demo
     • UDF & Aggregation framework
     • Aerospike @ Snapdeal
  3. What is Aerospike?
  4. Aerospike
     • Distributed and Scalable Key Value Store
     • Designed for Flash Optimized Storage
     • 99% < 1 millisecond
     • 1 M TPS / 100 TB
     • 100% Uptime with strong consistency (ACID)
     • Available in Community and Enterprise editions
  5. Comparison
  6. Aerospike Architecture
  7. Architecture Objectives
     • To create flexible and scalable platform
     • To provide robustness and reliability (ACID)
     • To provide Operational Efficiency
  8. Architecture
  9. 3 Layers
     • Cluster aware Client Layer
       – Track Nodes
       – Know where data resides in Cluster
       – Implements its own TCP/IP connection pool
     • Self Managing Clustering and Data Distribution Layer
       – Automatic fail over
       – Replication
       – Intelligent Re-balancing and data migration
     • Flash Optimized Data Storage Layer
       – Stores data in RAM and Flash
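
To illustrate how a cluster-aware client can "know where data resides", here is a minimal sketch in plain Python. It does not use the Aerospike client library. Aerospike's documentation describes 4096 partitions per namespace and a RIPEMD-160 key digest; the SHA-1 stand-in, the round-robin partition map and the names `partition_id`, `build_partition_map` and `node_for` are assumptions made only for this example.

```python
import hashlib

N_PARTITIONS = 4096  # partition count described in Aerospike's documentation


def partition_id(set_name: str, user_key: str) -> int:
    """Map a (set, key) pair to a partition id.

    SHA-1 is a stand-in here; it is not the digest Aerospike uses.
    """
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()
    # Two bytes of the digest are more than enough to address 4096 partitions.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def build_partition_map(nodes):
    """Hypothetical partition map: partition id -> owning node (round-robin)."""
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}


def node_for(partition_map, set_name, user_key):
    """Resolve the owning node locally, so a request needs a single network hop."""
    return partition_map[partition_id(set_name, user_key)]


nodes = ["node-a", "node-b", "node-c"]
pmap = build_partition_map(nodes)
owner = node_for(pmap, "users", "gagan")
```

Because the lookup is a local dictionary access, the client can send each request straight to the node that owns the key instead of going through a coordinator.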
  10. Cluster Layer
      ● Cluster Management Module
        – Tracks Nodes in the Cluster
        – Paxos like consensus voting process
      ● Data Migration Module
        – Balances distribution of data
        – Ensures data duplication as per replication factor
      ● Transaction Processing Module
        – Sync / Async Replication
        – Proxy
        – Duplicate Resolution
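
The data migration idea (balance partitions across nodes and keep as many copies as the replication factor asks for) can be sketched with rendezvous-style hashing. This is an illustrative technique chosen for the example, not a statement of the algorithm Aerospike implements; `succession` and `replicas` are hypothetical names.

```python
import hashlib


def succession(nodes, pid):
    """Order the nodes for one partition by a per-(node, partition) hash."""
    def score(node):
        return hashlib.sha1(f"{node}:{pid}".encode()).hexdigest()
    return sorted(nodes, key=score)


def replicas(nodes, pid, replication_factor=2):
    """The first node acts as master; the remaining ones hold replica copies."""
    return succession(nodes, pid)[:replication_factor]


# Compare the assignment before and after node n4 leaves the cluster.
before = {pid: replicas(["n1", "n2", "n3", "n4"], pid) for pid in range(4096)}
after = {pid: replicas(["n1", "n2", "n3"], pid) for pid in range(4096)}
moved = [pid for pid in before if before[pid] != after[pid]]
```

With this scheme only the partitions that had a copy on the departed node need to migrate; every other partition keeps its master and replica unchanged.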
  11. Cluster Formation
  12. Data Storage Layer
  13. Data Storage Layer
      • Designed for speed
      • Can operate all in-memory or with flash storage
      • 100 Million keys take up only 6.4 GB
      • Log Structured
      • Writes to disk are performed in large blocks
      • Bypasses standard file system
      • Built in Smart Defragmenter and Intelligent Evictor
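
The "100 Million keys take up only 6.4 GB" figure is consistent with a fixed 64-byte primary index entry per record, which is the size Aerospike's documentation gives for this generation of the server. A short worked calculation, with `index_memory_gb` as a hypothetical helper and decimal gigabytes assumed:

```python
INDEX_ENTRY_BYTES = 64  # per-record primary index size (assumed from Aerospike docs)


def index_memory_gb(n_records: int, replication_factor: int = 1) -> float:
    """Cluster-wide index memory in decimal GB (1 GB = 10**9 bytes)."""
    return n_records * INDEX_ENTRY_BYTES * replication_factor / 1e9


single_copy = index_memory_gb(100_000_000)        # one copy of every record
two_copies = index_memory_gb(100_000_000, 2)      # replication factor 2
```

The second call shows that the index cost grows with the replication factor, since every replica of a record carries its own index entry.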
  14. Scales on Commodity Server
      • Each server scales to manage up to 16 TB of data
      • Parallel access to up to 20 SSDs
      • Scales out linearly with each identical server
      • Battle tested with 100 TB+ across the cluster
  15. XDR Architecture
  16. Data Model
  17. Data Hierarchy
      • Namespace (Database)
      • Set (Table)
      • Record (Row)
      • Bins (Columns)
  18. How Data is Organized
  19. Namespace
      • Storage definition: DRAM or Flash
        – Storage block size (128 KB – 2 MB)
        – Controls size of RAM and Storage
      • Policy Container
        – Replication Factor
        – Default Expiry
      • Data Container
        – Namespace contains Sets
        – Set contains Records
  20. Set
      • Similar to a Table
      • But has no Schema
      • Inherits policy from the Namespace
      • Prefix to Primary Key
      • Name <= 63 chars
      • 1023 per Namespace
      • Cannot be deleted or renamed
  21. Record
      • A record is a row: a key and its value
      • Value : one or more bins
      • Bin has a name and type
        – Types : String, Integer, Blob, List, Map, Large Data Types (LDT)
      • Bins can be added at any time
      • Generation Counter
        – Optimistic Concurrency
      • Time-to-live
        – Auto Expiration
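
The generation counter and time-to-live bullets can be made concrete with a small in-memory model. This is a sketch of the two ideas only, not the Aerospike client API: `RecordStore`, `GenerationMismatch` and the `now` parameter (used to keep the example deterministic) are all hypothetical.

```python
import time


class GenerationMismatch(Exception):
    """Raised when a writer's expected generation is stale."""


class RecordStore:
    def __init__(self):
        self._records = {}  # key -> (bins, generation, expires_at)

    def read(self, key, now=None):
        now = time.time() if now is None else now
        rec = self._records.get(key)
        if rec is None:
            return None
        bins, generation, expires_at = rec
        if expires_at is not None and expires_at <= now:
            del self._records[key]  # auto expiration once the TTL has passed
            return None
        return dict(bins), generation

    def write(self, key, bins, expected_generation=None, ttl=None, now=None):
        now = time.time() if now is None else now
        current = self.read(key, now)
        cur_bins, cur_gen = current if current is not None else ({}, 0)
        # Optimistic concurrency: refuse the write if someone else got in first.
        if expected_generation is not None and expected_generation != cur_gen:
            raise GenerationMismatch(f"expected {expected_generation}, found {cur_gen}")
        cur_bins.update(bins)  # bins can be added at any time
        expires_at = None if ttl is None else now + ttl
        self._records[key] = (cur_bins, cur_gen + 1, expires_at)
        return cur_gen + 1
```

A writer reads the record, remembers its generation, and passes it back as `expected_generation`; if another writer updated the record in between, the second write fails and can be retried instead of silently overwriting.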
  22. Bins
      • Bins have a:
        – Name : 14 characters or fewer
        – Type : one of the following
          ● String, Integer, Blob, List, Map
          ● Large Data Types
            – Large Ordered List
            – Large Map
      • Bins are stored in the record
      • A Bin can have a different type in another record
  23. Bins (example records; blank cells: bin not present in that record)
      Id | lname   | fname | address    | favorites
      1  | Able    | John  | 123 First  | Cats, dogs, mice
      2  | Baker   | Kris  | 234 Second |
      3  | Charlie |       |            |
      4  | Delta   | Moe   | 456 Fourth | Steak, ice cream, apples
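
The bin rules above (name length limit, a fixed list of types, no schema shared between records) can be sketched as a small validation helper. Records are modelled as plain dictionaries; `validate_record` and the Python type mapping are assumptions for this example, and the limits are the ones quoted on the slides.

```python
MAX_BIN_NAME = 14   # bin name limit quoted on the slide
MAX_SET_NAME = 63   # set name limit quoted on the slide
BIN_TYPES = (str, int, bytes, list, dict)  # String, Integer, Blob, List, Map


def validate_record(set_name, bins):
    """Check naming limits and bin types; there is no per-set schema to check."""
    if len(set_name) > MAX_SET_NAME:
        raise ValueError(f"set name longer than {MAX_SET_NAME} characters")
    for name, value in bins.items():
        if len(name) > MAX_BIN_NAME:
            raise ValueError(f"bin name {name!r} longer than {MAX_BIN_NAME} characters")
        if not isinstance(value, BIN_TYPES):
            raise TypeError(f"bin {name!r} has unsupported type {type(value).__name__}")
    return True


# Records in one set may carry different bins, and the same bin name
# may hold a different type from one record to the next.
records = {
    1: {"lname": "Able", "fname": "John", "address": "123 First",
        "favorites": ["Cats", "dogs", "mice"]},
    2: {"lname": "Baker", "fname": "Kris", "address": "234 Second"},
    3: {"lname": "Charlie"},
    4: {"lname": "Delta", "fname": "Moe", "address": "456 Fourth",
        "favorites": "Steak, ice cream, apples"},
}
all_valid = all(validate_record("users", bins) for bins in records.values())
```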
  24. Type Mapping
      ● Types are mapped to the equivalent language type
  25. Demo and Code Samples
  26. Data Modeling
      ● Focus on how you will query the data
        – All Tweets from a given user
        – Give me the last 10 Tweets for a given user
        – Last 10 Tweets for all users
        – How many users Tweeted in last X minutes
  27. Data Modeling - RDBMS
  28. Data Modeling - Aerospike
      ● Users
        – Namespace : test, Set : users
        – Key : <username>
        – Bins
          ● username
          ● password
          ● gender
          ● lasttweeted (timestamp)
          ● tweetcount (total count)
  29. Data Modeling - Aerospike
      ● Tweets
        – Namespace : test, Set : tweets
        – Key : <username>:<counter>
        – Bins
          ● tweet
          ● ts
          ● username
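
The key design above is what makes "last 10 Tweets for a given user" cheap: the user record holds `tweetcount`, so the keys of the most recent tweets can be computed rather than searched for. A minimal sketch with two dictionaries standing in for the users and tweets sets; the helper names `tweet_key`, `post_tweet` and `last_tweets` are hypothetical.

```python
def tweet_key(username, counter):
    """Compose the tweet key as <username>:<counter>."""
    return f"{username}:{counter}"


def post_tweet(users, tweets, username, text, ts):
    """Bump the user's counter and store the tweet under the derived key."""
    user = users[username]
    user["tweetcount"] += 1
    user["lasttweeted"] = ts
    tweets[tweet_key(username, user["tweetcount"])] = {
        "tweet": text, "ts": ts, "username": username,
    }


def last_tweets(users, tweets, username, n=10):
    """Derive the newest n keys from tweetcount and fetch them by key."""
    count = users[username]["tweetcount"]
    keys = [tweet_key(username, c) for c in range(count, max(count - n, 0), -1)]
    return [tweets[k] for k in keys if k in tweets]


users = {"gagan": {"username": "gagan", "tweetcount": 0, "lasttweeted": 0}}
tweets = {}
for i in range(1, 13):
    post_tweet(users, tweets, "gagan", f"t{i}", ts=i)
recent = last_tweets(users, tweets, "gagan")
```

Against a real cluster the list of derived keys would map naturally onto a batch read, so the query needs no secondary index.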
  30. Secondary Index
      ● Value based lookup
      ● Query is sent to all nodes in the cluster in parallel
      ● Best for low selectivity indexes
        – Result set of 1k to 1 million
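
A secondary index query is a scatter-gather operation: every node looks up the value in its own local index and the client merges the partial results. The sketch below models that in plain Python with threads; `Node` and `query_cluster` are hypothetical names and the in-memory index is an assumption for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """One cluster node holding some records and a local index on one bin."""

    def __init__(self, records, bin_name):
        self.records = records
        self.index = {}  # bin value -> list of record keys on this node
        for key, bins in records.items():
            if bin_name in bins:
                self.index.setdefault(bins[bin_name], []).append(key)

    def query(self, value):
        return [self.records[k] for k in self.index.get(value, [])]


def query_cluster(nodes, value):
    """Send the lookup to every node in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.query(value), nodes))
    return [rec for part in partials for rec in part]


nodes = [
    Node({"a:1": {"username": "a", "tweet": "hi"},
          "b:1": {"username": "b", "tweet": "yo"}}, "username"),
    Node({"a:2": {"username": "a", "tweet": "again"}}, "username"),
]
hits = query_cluster(nodes, "a")
```

Because every node is contacted regardless of how many matches it holds, the approach pays off when each query returns a sizeable result set, which matches the "1k to 1 million" guidance on the slide.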
  31. Secondary Index
  32. UDF & Aggregation Framework
  33. UDF
      ● Move compute close to data
      ● UDFs are common in many databases
        – MySQL
        – SQL Server
        – Oracle
        – DB2
  34. UDF
      ● Written in Lua
      ● Record oriented
        – Acts on a single record (row)
      ● Stream oriented
        – Acts on a stream of records resulting from a Query
      ● Using UDFs you can move processing to the same node as the data
  35. Example UDF
  36. Aggregations
      ● Programmatic framework similar to Map Reduce
      ● Processes a collection of rows
      ● Primarily used for counts, aggregates and sums
      ● Runs in parallel on all cluster nodes
      ● Lua for function language
      ● Reduce in the client
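
The aggregation flow (map and aggregate on each node, final reduce in the client) can be sketched as follows. Real Aerospike stream UDFs are written in Lua, as the slide says; this is a Python model of the data flow only, and `node_aggregate` and `client_reduce` are hypothetical names.

```python
from collections import Counter
from functools import reduce


def node_aggregate(records):
    """Runs on each node: map every record to its username and count locally."""
    return Counter(rec["username"] for rec in records)


def client_reduce(partials):
    """Runs in the client: merge the per-node partial counts."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Records as they might be spread across three nodes.
node_data = [
    [{"username": "a"}, {"username": "b"}],
    [{"username": "a"}],
    [{"username": "c"}, {"username": "a"}],
]
partials = [node_aggregate(records) for records in node_data]
tweets_per_user = client_reduce(partials)
```

Each node ships only a small partial result rather than its raw records, which is the point of pushing the computation to the server.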
  37. Aggregations
  38. Aerospike @ Snapdeal
