Aerospike DB and Storm for real-time analytics
1. Aerospike aer·o·spike [air-oh-spahyk]
noun, 1. tip of a rocket that enhances speed and stability
STORM PERSISTENCE AND REAL-TIME ANALYTICS
APRIL 1, 2014
IN-MEMORY NOSQL DATABASE
brian@aerospike.com
2. Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 2
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Real-time analytics with
Storm and Aerospike
Brian Bulkowski
Founder and CTO
Aerospike
3. Streaming architecture
Data Warehouse, Hadoop Cluster
Real-time Interactions Server
Batch Analytics
• User segmentation
• Location patterns
• Similar audience
Real-time Interactions
• Frequency caps
• Recent ads served
• Recent search terms
User Data
Streaming (Storm)
Hadoop
4. Why not other databases?
➤ Database requests in bolts
➤ Flash optimized
§ Do you need more than 30G?
➤ Read / write optimized
➤ Faster & more reliable than Kafka (Cassandra based)
➤ Faster than Mongo
➤ More scale than Redis
5. Examples
➤ Recommendations
§ Multiple recommendation systems
§ Multi-arm bandit
§ https://github.com/tdunning/storm-counts/wiki/Bayesian-Bandit
➤ Simple fraud counts
§ Store recent requests for payment
§ Store recent users
§ Calculate fraud scores, drop events if past threshold
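The simple fraud-count flow above can be sketched as follows. This is a minimal illustration in plain Python, not a Storm bolt: the in-memory dict stands in for the database, and the window size, scoring rule, threshold, and function names are all assumptions chosen for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed look-back window
THRESHOLD = 3         # assumed: drop once a user exceeds 3 recent requests

# Stand-in for the key-value store: user id -> timestamps of recent payment requests.
recent_requests = defaultdict(deque)

def fraud_score(user_id, now):
    """Record a payment request and return the request count inside the window."""
    window = recent_requests[user_id]
    window.append(now)
    # Discard requests that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window)

def process(event):
    """Return the event if it should continue downstream, or None to drop it."""
    score = fraud_score(event["user"], event["ts"])
    return None if score > THRESHOLD else event

events = [{"user": "u1", "ts": t} for t in (0, 1, 2, 3, 100)]
passed = [e for e in events if process(e) is not None]
```

Here the fourth request inside the window is dropped, while the request at `ts=100` passes because the earlier ones have aged out.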
6. Aerospike Bolts
➤ Aerospike has speed, reliability, scale for Storm
§ Free version at http://aerospike.com/
§ Internap – free high performance SSD servers for trial
➤ Bolts available on github
§ https://github.com/aerospike/storm-aerospike
➤ EnrichBolt
§ Add fields from column after looking up a key
➤ PersistBolt
§ Store fields based on a key
➤ Benefits
§ In memory with FLASH
§ Clustered for high performance
§ HA state matches Storm’s stateless model
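The behavior of the two bolts, as described above, can be illustrated with a small sketch. It does not use the storm-aerospike code or the Aerospike client: `store`, `enrich`, and `persist` are hypothetical names, and a plain dict stands in for the database.

```python
# Stand-in for a database set: key -> dict of bins (column name -> value).
store = {"user42": {"segment": "sports", "city": "NYC"}}

def enrich(tuple_fields, key_field, wanted_bins):
    """Look up the record for tuple_fields[key_field] and add the wanted bins."""
    record = store.get(tuple_fields[key_field], {})
    enriched = dict(tuple_fields)
    for name in wanted_bins:
        # Missing records or bins yield None rather than failing the tuple.
        enriched[name] = record.get(name)
    return enriched

def persist(tuple_fields, key_field, fields_to_store):
    """Store the selected tuple fields as bins under the tuple's key."""
    bins = store.setdefault(tuple_fields[key_field], {})
    bins.update({name: tuple_fields[name] for name in fields_to_store})

out = enrich({"user": "user42", "ad": "a7"}, "user", ["segment"])
persist({"user": "user42", "last_ad": "a7"}, "user", ["last_ad"])
```

Because neither function keeps state of its own, a failed worker can be restarted without losing anything, which is one reading of the "stateless model" point above.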
7. The power of Flash
8. Flash-optimization Delivers Disruptive Performance
OTHER DATABASE
OS FILE SYSTEM
PAGE CACHE
BLOCK INTERFACE (SSD, HDD)
AEROSPIKE FLASH-OPTIMIZED IN-MEMORY DATABASE
AEROSPIKE HYBRID MEMORY SYSTEM™
BLOCK INTERFACE (SSD, SSD)
OPEN NVM (SSD)
“Ask me and I’ll tell you the answer.”
“Ask me. I’ll look up the answer and then tell it to you.”
9. …at 1/10 the hardware cost
Actual customer analysis: 500K TPS, 10 TB Storage, 2x Replication
                          DRAM & HDD               SSD & DRAM
                          (OTHER DATABASES)        (AEROSPIKE)
Servers                   186 SERVERS              ONLY 14 SERVERS
Storage / server          180 GB (196 GB server)   2.4 TB (4 x 700 GB)
TPS / server              500,000                  500,000
Cost / server             $8,000                   $11,000
Server costs              $1,488,000               $154,000
Power / server            0.9 kW                   1.1 kW
Power (2 years) *         $352,000                 $32,400
Maintenance (2 years) **  $670,000                 $50,400
Total                     $2,510,000               $236,800
* $0.12 per kWh, US average
** $3,600 per server
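The totals in this comparison can be checked with a few lines of arithmetic. The sketch below uses only the per-server figures quoted on the slide; the function and variable names are made up for the example, and the only assumption is that "2 years" means 2 x 365 x 24 hours of continuous operation.

```python
HOURS_2_YEARS = 2 * 365 * 24       # 17,520 hours, assuming continuous operation
PRICE_PER_KWH = 0.12               # US average, from the slide
MAINTENANCE_PER_SERVER = 3600      # dollars over 2 years, from the slide

def two_year_cost(servers, cost_per_server, kw_per_server):
    """Return (hardware, power, maintenance, total) in dollars."""
    hardware = servers * cost_per_server
    power = servers * kw_per_server * HOURS_2_YEARS * PRICE_PER_KWH
    maintenance = servers * MAINTENANCE_PER_SERVER
    return hardware, power, maintenance, hardware + power + maintenance

dram_hdd = two_year_cost(186, 8000, 0.9)
ssd_dram = two_year_cost(14, 11000, 1.1)
```

Under that assumption the results agree with the slide's figures to within rounding, and the ratio of the two totals is a little over 10 to 1.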
10. Measure your drives!
Aerospike Certification Tool (ACT)
http://github.com/aerospike/act
Transactional database workload
Reads: 1.5 KB (random; can’t batch or cache reads)
Writes: 128 KB blocks (log-based layout, plus defragmentation)
Turn up the load until latency is over required SLA
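The "turn up the load" procedure amounts to a simple search loop. The sketch below does not invoke ACT: `measure` is a hypothetical callback that would run one test at a given load and report the percentage of transactions slower than 1 ms, and `fake_drive` is an invented latency model used only so the example runs.

```python
def max_sustainable_load(measure, sla_pct_over_1ms, start=1, step=1, limit=200):
    """Raise the load multiplier until the measured tail exceeds the SLA.

    measure(load) must return the percentage of transactions slower than 1 ms.
    Returns the highest load that stayed within the SLA, or None if even the
    starting load fails.
    """
    best = None
    load = start
    while load <= limit:
        if measure(load) > sla_pct_over_1ms:
            break
        best = load
        load += step
    return best

# Invented drive model for illustration only: the slow tail grows with load.
def fake_drive(load):
    return 0.5 * load

result = max_sustainable_load(fake_drive, sla_pct_over_1ms=5.0)
```

In practice each `measure` call would be a long-running test, so a coarser step (or a doubling search) may be preferable to stepping by one.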
11. Micron P320h – ACT results
[root@144.bm-general.dev.nym2 act]# latency_calc/act_latency.py -l actconfig_micron_75x_1d_rssdb_20130503232823.out
        trans                     device
        %>(ms)                    %>(ms)
hour    1       8       64       1       8       64
-----   ------  ------  ------   ------  ------  ------
1       0.17    0.00    0.00     0.03    0.00    0.00
2       0.17    0.00    0.00     0.03    0.00    0.00
3       0.18    0.00    0.00     0.03    0.00    0.00
4       0.18    0.00    0.00     0.03    0.00    0.00
5       0.18    0.00    0.00     0.03    0.00    0.00
6       0.19    0.00    0.00     0.04    0.00    0.00
150K read IOPS @ 1.5K
225 MB/sec writes @ 128K
225 MB/sec reads @ 128K
$8/GB
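The "75x" in the output file name suggests the load is a multiple of a baseline. As a rough sketch, assuming the baseline is the 2K reads per second and 3 MB/sec write load quoted for the previous-generation tests, and that load scales linearly with the multiplier (both are assumptions, not stated on this slide), the figures above work out as follows. The function name is made up for the example.

```python
BASE_READS_PER_SEC = 2_000    # assumed 1x baseline
BASE_WRITE_MB_PER_SEC = 3     # assumed 1x baseline
READ_SIZE_KB = 1.5            # read size quoted on the slides

def act_load(multiplier):
    """Return (reads/sec, write MB/sec, small-read MB/sec) at a load multiple."""
    reads = BASE_READS_PER_SEC * multiplier
    write_mb = BASE_WRITE_MB_PER_SEC * multiplier
    read_mb = reads * READ_SIZE_KB / 1000   # using 1 MB = 1000 KB
    return reads, write_mb, read_mb

load_75x = act_load(75)
load_3x = act_load(3)
```

At 75x this gives 150,000 reads per second and 225 MB/sec, matching the numbers listed for the Micron P320h.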
12. Test data – the next generation
6K reads per second, 9MB/sec write load
                                    > 1 ms   > 8 ms   > 64 ms
Intel S3700, 20% OP - 6k iops       1.6      0        0         ($3/GB)
Intel S3700, 20% OP - 12k iops      5.4      0        0
Intel S3700, 20% OP - 24k iops      12.29    0        0
Intel S3700, NO OP - 24k iops       15.33    0        0
FusionIO ioDrive 2 – 6k iops        2.63     0.01     0         ($8/GB)
FusionIO ioDrive 2 – 12k iops       7.32     0.1      0
13. Test data – the previous generation
2K reads per second, 3MB/sec write load
                                    > 1 ms   > 8 ms   > 64 ms
Intel X25-M + w/No OP (160G):       17.9%    0.6%     0.4%
Intel X25-M + OP (126G):            3.4%     0.1%     0.08%
OCZ Deneva 2 SLC + OP (95G):        0.9%     0.08%    0%
Samsung SS805 (100G):               2.0%     0.09%    0%
Intel 710 + OP (158G):              4.0%     0.01%    0%
Intel 320 + OP (126G):              5.6%     0%       0%
OCZ Vertex 2 + OP (190G):           6.3%     0.5%     0.01%
SMART XceedIOPS + OP (158G):        5.4%     0.4%     0%
Intel 510 + OP (95G):               6.2%     4.0%     0.03%
Micron P300 + OP (79GB):            1.3%     1.0%     0.7%
14. Test data – the previous generation
6K reads per second, 18MB/sec write load
                                    > 1 ms   > 8 ms   > 64 ms
OCZ Deneva 2 SLC + OP (95G):        3.2%     0.4%     0%
Samsung SS805 (100G):               10.1%    0.8%     0.02%
Intel 320 + OP (126G):              22.0%    0.3%     0.03%
OCZ Deneva 2 MLC (Sync)             8.8%     0.6%     0.06%
OCZ Vertex 2 + OP (190G):           27.6%    4.6%     0.4%
SMART XceedIOPS + OP (158G):        24.5%    5.4%     1.0%
15. Why Aerospike?
16. Distributed Key Value Database + Global Data Management
➤ Key Value API
➤ Real-time Performance
➤ Read/Write Workloads
➤ Clustering
➤ High Availability
➤ Commodity Hardware
➤ RAM + Flash
➤ XDR
17. Challenges
1. Handle extremely high rates of persistent read/write transactions
2. Avoid hot spots to maintain tight latency SLAs
3. Provide immediate consistency with replication
4. Allow long running tasks with transactions
5. Scale linearly as data sizes increase
6. Add capacity with no service interruption
18. Aerospike: the gold standard for high throughput, low latency, high reliability transactions
Performance
• Over ten trillion transactions per month
• 99% of transactions faster than 2 ms
• 150K TPS per server
Scalability
• Billions of Internet users
• Clustered Software
• Automatic Data Rebalancing
Reliability
• 50 customers; zero service downtime
• Immediate Consistency
• Rapid Failover; Data Center Replication
Price/Performance
• Makes impossible projects affordable
• Flash-optimized
• 1/10 the servers required
19. 10x Performance
HIGH THROUGHPUT, LOW LATENCY
[Chart: Throughput (TPS), Balanced and Read-Heavy workloads; Aerospike, Cassandra, MongoDB, Couchbase 2.0*]
[Chart: Balanced Workload Read Latency; average latency (ms) vs. throughput (ops/sec); Aerospike, Cassandra, MongoDB]
[Chart: Balanced Workload Update Latency; average latency (ms) vs. throughput (ops/sec); Aerospike, Cassandra, MongoDB]
* “We were forced to exclude Couchbase...since when run with either disk or replica durability on it was unable to complete the test.” – Thumbtack Technology
20. High Availability
Phases:
1) 100K TPS – 4 nodes
2) Clients at Max
3) 400K TPS – 4 nodes
4) 400K TPS – 3 nodes
5) 400K TPS – 4 nodes
Aerospike Node Specs:
CentOS 6.3
Intel i5-2400 @ 3.1 GHz (Quad core)
16 GB RAM @ 1333 MHz
21. Better than the Competition
➤ Hard to Maintain
➤ Performance
➤ Latency
➤ Number of Servers
➤ Stability
➤ Cost of RAM
➤ Cost of RAM
➤ Scalability
22. Simpler Scaling: Fewer Servers, ACID, Zero Touch
1) No Hotspots – DHT with RIPEMD160 simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers
3) Shared Nothing Architecture, every node identical
4) Single row ACID – synchronous replication in cluster
5) Smart Cluster, Zero Touch – auto-failover, rebalancing, rolling upgrades
6) Transactions and long running tasks prioritized in real time
7) XDR – asynchronous replication across data centers ensures Zero Downtime
23. Intelligent Client
Shields Applications from the Complexity of the Cluster
• Implements Aerospike API
• Optimistic row locking
• Optimized binary protocol
• Cluster tracking
– Learns about cluster changes, partition map
– Gossip protocol
• Transaction semantics
– Global transaction ID
– Retransmit and timeout
24. ACID Transactions
Writing with Immediate Consistency (master, replica)
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write synchronously to master memory and replica memory
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback/rollforward)
Adding a Node (transactions continue)
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
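Steps 2 to 5 of the write path can be illustrated with a toy in-process model. This is a sketch under heavy simplification, not Aerospike's implementation: `Node` and `Partition` are hypothetical classes, both "nodes" live in one process, and conflict resolution (step 6) is left out.

```python
import threading
from collections import deque

class Node:
    def __init__(self):
        self.memory = {}            # in-memory copy of each row
        self.disk_queue = deque()   # writes waiting to be flushed to storage

class Partition:
    def __init__(self, master, replica):
        self.master, self.replica = master, replica
        self._latches = {}
        self._guard = threading.Lock()

    def _latch(self, key):
        # One lock per row, created on first use.
        with self._guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value):
        with self._latch(key):                        # step 2: block simultaneous writes
            for node in (self.master, self.replica):
                node.memory[key] = value              # step 3: master and replica memory
                node.disk_queue.append((key, value))  # step 4: queue for disk
        return "ok"                                   # step 5: signal completion

partition = Partition(Node(), Node())
status = partition.write("cookie-1", {"segment": "sports"})
```

The caller is acknowledged only after both memory copies are updated, while the disk write is deferred to the queue, which mirrors the "optional storage commit wait" noted in step 5.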
25. Distribution
➤ Distributed Hash Table with No Hotspots
§ Every key hashed with RIPEMD160 into a 20 byte (fixed length) string; NO KNOWN COLLISIONS
§ Hash + additional (fixed 64 bytes) data stored in DRAM in the index
§ Some bits from hash value are used to calculate the Partition ID (4096 partitions)
§ Partition ID maps to Node ID in the cluster
➤ 1 Hop to data
§ Smart Client simply calculates Partition ID to determine Node ID
§ No Load Balancers required
➤ Shared Nothing architecture
§ Every node is identical
Example: key cookie-abcdefg-12345678 hashes to 182023kh15hh3kahdjsh
Partition ID   Master Node ID   Replica Node ID
…              1                4
1820           2                3
1821           3                2
4096           4                1
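The key-to-node lookup described on this slide can be sketched in a few lines. Several details here are assumptions: the slide does not say which bits of the hash form the Partition ID (the sketch takes the low 12 bits of the first two bytes), only the raw key is hashed, the partition map is invented, and SHA-1 is used as a same-length stand-in when the local Python build does not provide RIPEMD-160.

```python
import hashlib

N_PARTITIONS = 4096

def digest20(key: bytes) -> bytes:
    """20-byte digest of the key; RIPEMD-160 when the local OpenSSL provides it."""
    try:
        return hashlib.new("ripemd160", key).digest()
    except ValueError:
        # Assumption for portability only: SHA-1 is also 20 bytes long.
        return hashlib.sha1(key).digest()

def partition_id(digest: bytes) -> int:
    # Assumed bit selection: low 12 bits of the first two bytes (2**12 == 4096).
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

# Invented partition map for a 4-node cluster: partition -> (master, replica).
partition_map = {p: (p % 4 + 1, (p + 1) % 4 + 1) for p in range(N_PARTITIONS)}

def nodes_for(key: str):
    """Return (partition id, (master node, replica node)) for a key."""
    pid = partition_id(digest20(key.encode()))
    return pid, partition_map[pid]

pid, (master, replica) = nodes_for("cookie-abcdefg-12345678")
```

Because the client can compute this locally from the key and a cached partition map, it can send each request straight to the owning node, which is the "1 hop to data" point.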
26. Replication that Works
➤ Super Storm Sandy 2012
§ NYC down for 17 hours
§ Back up and synched in 1 hour via Aerospike Cross-Data Center Replication (XDR)
“Aerospike allows us to handle business continuity and reliability across 4 data centers seamlessly. And we can now expand our deployment to new data centers in less than a week.”
- Elad Efraim, CTO
27. NOSQL EXTENSIBILITY
➤ Namespaces (policy containers)
§ Determine storage - DRAM or Flash
§ Determine replication factor
§ Contain records and sets
➤ Sets (tables) of records
§ Arbitrary grouping
➤ Records (rows)
§ Max 128 KB, contain key and bins
§ Bin with same name can contain values of different types
– String, integer, bytes (raw, blob, etc.)
– list (an ordered collection of values)
– map (a collection of keys and values)
§ Bins can be added anytime
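The namespace / set / record / bin hierarchy can be pictured with a small model. This is a sketch of the data model only, not the Aerospike client API: the `Namespace` class, its `put` method, and the crude size check are all made up for illustration.

```python
from dataclasses import dataclass, field

MAX_RECORD_BYTES = 128 * 1024   # record size limit quoted on the slide

@dataclass
class Namespace:
    name: str
    storage: str                # "DRAM" or "Flash"
    replication_factor: int
    # set name -> key -> bins (bin name -> value of any supported type)
    sets: dict = field(default_factory=dict)

    def put(self, set_name, key, bins):
        # Rough size check for the sketch; real sizing depends on the storage format.
        if len(repr(bins).encode()) > MAX_RECORD_BYTES:
            raise ValueError("record exceeds 128 KB")
        record = self.sets.setdefault(set_name, {}).setdefault(key, {})
        record.update(bins)     # bins can be added at any time

ns = Namespace("test", storage="Flash", replication_factor=2)
ns.put("users", "u1", {"visits": 3})
ns.put("users", "u2", {"visits": "many"})   # same bin name, different type
ns.put("users", "u1", {"tags": ["a", "b"], "prefs": {"lang": "en"}})
```

Record `u1` ends up with an integer, a list, and a map bin, while `u2` reuses the `visits` bin name with a string value.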
28. Real-time Analytics on Operational Data
DISTRIBUTED QUERIES
1. “Scatter” requests to all nodes
2. Indexes in DRAM for fast map of secondary → primary keys
3. Indexes co-located with data to guarantee ACID, manage migrations
4. Records read in parallel from all SSDs using lock-free concurrency control
5. Aggregate results on each node
6. “Gather” results from all nodes on client
STREAM AGGREGATIONS
1. Push Code / Security Policies / Rules to Data with UDFs
2. Pipe Query results through UDFs to Filter, Transform, Aggregate, Map, Reduce
REAL-TIME ANALYTICS on OPERATIONAL DATA (No ETL)
➤ In Database, within the same Cluster
➤ On the same Data, on XDR Replicated Clusters
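The scatter / aggregate / gather pattern from the distributed-query steps can be sketched with threads standing in for nodes. Everything here is illustrative: the node layout, the `city` secondary index, and the function names are assumptions, and the per-node function plays the role a UDF would play in the database.

```python
from concurrent.futures import ThreadPoolExecutor

# Each invented node holds its own records plus a local secondary index
# mapping a secondary value (city) to the primary keys stored on that node.
def make_node(records):
    index = {}
    for pk, rec in records.items():
        index.setdefault(rec["city"], []).append(pk)
    return {"records": records, "index": index}

nodes = [
    make_node({"u1": {"city": "NYC", "spend": 10}, "u2": {"city": "SF", "spend": 5}}),
    make_node({"u3": {"city": "NYC", "spend": 7}}),
    make_node({"u4": {"city": "NYC", "spend": 1}, "u5": {"city": "LA", "spend": 9}}),
]

def query_node(node, city):
    """Runs on one node: index lookup, then a local aggregate (count, total)."""
    keys = node["index"].get(city, [])
    spends = [node["records"][pk]["spend"] for pk in keys]
    return len(spends), sum(spends)

def scatter_gather(city):
    with ThreadPoolExecutor() as pool:   # scatter the request to all nodes
        partials = list(pool.map(lambda n: query_node(n, city), nodes))
    # gather: combine the per-node partial aggregates on the client
    return sum(c for c, _ in partials), sum(t for _, t in partials)

count, total = scatter_gather("NYC")
```

Only the small per-node partial results cross the network boundary in this model, which is the benefit of aggregating on each node before gathering.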
29. QUESTIONS
brian@aerospike.com
srini@aerospike.com