Real-Time Analytics in
Transactional Applications
—
Brian Bulkowski, CTO and Founder
© 2016 Aerospike Inc. All rights reserved.
What is Aerospike?
Large-scale DHT database (10B++ objects, 100T++, O(1) get / put)
... with queries, data structures, UDFs, fast clients ...
... on Linux ...
High-availability clustering & rebalancing (proven five 9s, no load balancer)
Hybrid Memory – provides either persistent DRAM or online Flash economics
KVS++ – provides queries, predicate filters, tables/columns, aggregations
Cloud-savvy – runs on EC2, GCE, and others; Docker, and more ...
Dual license: open source for devs, Enterprise for deployment
Stable business: large VC-backed, database insiders, solid revenue streams
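To make the "DHT database" and "O(1) get / put" points concrete, here is a minimal sketch of hash-based partitioning. It is an illustration only, not Aerospike's implementation: the partition count, the SHA-1 hash, the round-robin assignment, and the names `ToyDHT`, `partition_id`, and `NODES` are all assumptions made for this example.

```python
import hashlib

N_PARTITIONS = 4096  # assumed partition count for this sketch
NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def partition_id(key: str) -> int:
    """Map a key to a fixed partition using a stable hash (SHA-1 as a stand-in)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


def build_partition_map(nodes):
    """Assign every partition to a node (round-robin, for illustration only)."""
    return [nodes[p % len(nodes)] for p in range(N_PARTITIONS)]


class ToyDHT:
    """In-process stand-in for a cluster: one dict per node."""

    def __init__(self, nodes):
        self.partition_map = build_partition_map(nodes)
        self.storage = {node: {} for node in nodes}

    def owner(self, key: str) -> str:
        # One hash plus one list lookup: constant work per request.
        return self.partition_map[partition_id(key)]

    def put(self, key: str, value) -> None:
        self.storage[self.owner(key)][key] = value

    def get(self, key: str):
        return self.storage[self.owner(key)].get(key)


dht = ToyDHT(NODES)
dht.put("user:1", {"segment": "sports"})
print(dht.owner("user:1"), dht.get("user:1"))
```

Because the caller can compute the owning node from the key alone, a request can go straight to that node, which is one way to read the "no load balancer" bullet.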
Transactions and Analytics
Traditional Architecture Has Significant Limitations
Challenges
• Complex
• Maintainability
• Durability
• Consistency
• Scalability
• Cost ($)
• Data Lag
[Diagram labels: Caching Layer; Operational Database; Real-time Consumer Facing; Pricing / Inventory / Billing; Real-time Decisioning; Streaming Data; Legacy Database (Mainframe); RDBMS Database; Transactional Systems; Enterprise Environment; Legacy RDBMS; HDFS Based; Fast speed – Consumer Scale]
Simplified Data Architecture
Benefits:
• Simplicity
• Maintainability
• Durability
• Consistency
• Scalability
• Cost ($)
• Data Lag Reduced
[Diagram labels: Aerospike; Connectors; Legacy Database (Mainframe); RDBMS Database; Transactional Systems; Enterprise Environment; XDR; Aerospike; Legacy RDBMS; HDFS Based; Powered by High Performance NoSQL; Fast speed – Consumer Scale; Hybrid Memory Database; Real-time Consumer Facing; Pricing / Inventory / Billing; Real-time Decisioning; Streaming Data]
Non-Relational Real-Time
[Diagram labels: LEGACY DATABASE (Mainframe); XDR; Decisioning Engine; DATA WAREHOUSE / DATA LAKE; LEGACY RDBMS; HDFS BASED; BUSINESS TRANSACTIONS: ( Web views ) ( Payments ) ( Mobile Queries ) ( Recommendation ) ( And More ); High Performance NoSQL; “REAL-TIME BIG DATA”; “DECISIONING”]
500 business transactions per sec
× 5000 calculations per business transaction
= 2.5 M database transactions per sec
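The multiplication on this slide can be checked in a few lines. This sketch assumes, as the slide's arithmetic implies, that each calculation costs one database transaction; the function name `database_tps` is made up for the example.

```python
def database_tps(business_tps: int, calcs_per_business_txn: int) -> int:
    """Database load if every calculation is one database transaction (assumption)."""
    return business_tps * calcs_per_business_txn


load = database_tps(business_tps=500, calcs_per_business_txn=5_000)
print(f"{load:,} database transactions per second")  # 2,500,000
```

The point of the calculation is that a modest front-end rate fans out into a database rate several orders of magnitude higher.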
To Summarize
[Diagram labels: XDR; DATA WAREHOUSE / DATA LAKE; LEGACY RDBMS; HADOOP HDFS; INTERACTIONS AND ENGAGEMENTS: ( Web views ) ( Payments ) ( Recommendation ) ( And More ); High Performance NoSQL: SHARED STATE, SESSION DATA, COUNTERS, AGGREGATION; YOUR STREAMING ANALYTICS FRAMEWORK; Decisioning Engine]
Keys to Success
■ Use Application Frameworks, and Write Code
■ Allow multiple frameworks, based on problem + team
■ Primary Key Data Architecture
■ Don’t use complex queries on the front edge!
■ Understand CP vs AP tradeoffs
■ You don’t always want CP
■ Build for both data growth and transaction growth
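One way to read "Primary Key Data Architecture" and "no complex queries on the front edge" is to maintain aggregates at write time, so the serving path is a single key lookup. The sketch below uses a dict-backed stand-in store; `KeyValueStore`, `record_page_view`, `views_for`, and the key and bin naming are hypothetical and are not the Aerospike client API.

```python
from collections import defaultdict


class KeyValueStore:
    """Dict-backed stand-in for a primary-key store (hypothetical)."""

    def __init__(self):
        self._records = defaultdict(dict)

    def increment(self, key: str, bin_name: str, delta: int = 1) -> int:
        """Add delta to a counter stored under key/bin_name and return the new value."""
        record = self._records[key]
        record[bin_name] = record.get(bin_name, 0) + delta
        return record[bin_name]

    def get(self, key: str) -> dict:
        """Return a copy of the record, or an empty dict if the key is unknown."""
        return dict(self._records.get(key, {}))


def record_page_view(store: KeyValueStore, user_id: int, category: str) -> None:
    # Write path: update the counters the front edge will need later.
    store.increment(f"user:{user_id}", "views_total")
    store.increment(f"user:{user_id}", f"views:{category}")


def views_for(store: KeyValueStore, user_id: int, category: str) -> int:
    # Read path: one primary-key get, no scan or join.
    return store.get(f"user:{user_id}").get(f"views:{category}", 0)


store = KeyValueStore()
record_page_view(store, 42, "shoes")
record_page_view(store, 42, "shoes")
record_page_view(store, 42, "hats")
print(views_for(store, 42, "shoes"))  # 2
```

The trade-off is extra work on each write in exchange for a read whose cost does not depend on how much history exists.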
Real-world examples
AdTech – Real-time Advertisement Optimization
Challenge
• Rapid, custom algorithm deployment
• Low read latency (milliseconds)
• Scale from 100K to 5M operations / second
• Ensure 100% uptime with global data
Performance requirement
• 1 to 6 billion cookies tracked
• 5.0M auctions per second
• 100ms ad rendering, 50ms real-time bidding, 1ms database access
• 1.5KB median object size
Selected NoSQL
• Flash economics
• Faster algo development
• Easy partner onboarding
[Diagram labels: Ad is Displayed; Publishers; Ad Networks & SSPs; Ad Exchanges; Demand Side Platform; Data Management Platforms; Brands, Agencies, Buyers; timeline 0 ms to 100 ms]
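The performance figures on this slide imply a rough sizing, sketched below. The constants come from the slide; the assumptions are that each auction reads one median-sized object, that 1.5KB means 1500 bytes, and that database reads within a bid are issued one after another. The function names are made up for the example.

```python
AUCTIONS_PER_SEC = 5_000_000   # "5.0M auctions per second"
MEDIAN_OBJECT_BYTES = 1_500    # "1.5KB median object size", taken as 1500 bytes
BIDDING_BUDGET_MS = 50         # "50ms real-time bidding"
DB_ACCESS_MS = 1               # "1ms database access"


def read_bandwidth_gb_per_sec(reads_per_auction: int = 1) -> float:
    """Aggregate read bandwidth in GB/s, assuming median-sized objects."""
    return AUCTIONS_PER_SEC * reads_per_auction * MEDIAN_OBJECT_BYTES / 1e9


def max_sequential_reads(budget_ms: int = BIDDING_BUDGET_MS,
                         access_ms: int = DB_ACCESS_MS) -> int:
    """Upper bound on back-to-back reads if the whole budget went to the database."""
    return budget_ms // access_ms


print(read_bandwidth_gb_per_sec())  # 7.5
print(max_sequential_reads())       # 50
```

In practice the bidding logic also needs part of the 50 ms, so the usable number of sequential reads is lower than this bound.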
Fraud Prevention Beyond the Rules Engine
[Diagram labels: CREDIT CARD PROCESSING SYSTEM; FRAUD DETECTION & PROTECTION APP; ACCOUNT BEHAVIOR; ACCOUNT STATISTICS; STATIC DATA; HISTORICAL DATA; RULES: RULE 1, RULE 2, RULE 3, …; RULE 1 – PASSED ✔; RULE 2 – PASSED ✔; RULE 3 – FAILED ✗]
Challenge
■ Rapid, custom algorithm deployment
■ Overall SLA 750 ms
■ Plan for 10x growth
■ Every payment transaction requires hundreds of DB reads/writes
Need to scale reliably
■ 10 → 100 TB
■ 10B → 100B objects
■ 200k → 1 Million+ TPS
Selected NoSQL
■ Flash economics with DRAM performance
■ Cross data center (XDR) support
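The rule-by-rule outcome shown in the diagram (two rules pass, one fails) can be sketched as a small rules engine. Everything here is hypothetical: the `Rule` type, the three example checks, and the account fields are invented for illustration and are not taken from the system described on the slide.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    check: Callable[[dict, dict], bool]  # (transaction, account) -> passed?


def evaluate(rules: List[Rule], txn: dict, account: dict) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Run every rule and approve only if all of them pass."""
    results = [(rule.name, rule.check(txn, account)) for rule in rules]
    approved = all(passed for _, passed in results)
    return approved, results


# Hypothetical rules; a real deployment would load many more per transaction.
RULES = [
    Rule("RULE 1: amount within card limit",
         lambda t, a: t["amount"] <= a["limit"]),
    Rule("RULE 2: country seen before",
         lambda t, a: t["country"] in a["countries"]),
    Rule("RULE 3: at most 10x average spend",
         lambda t, a: t["amount"] <= 10 * a["avg_amount"]),
]

account = {"limit": 5000, "countries": {"US"}, "avg_amount": 40}
txn = {"amount": 900, "country": "US"}

approved, results = evaluate(RULES, txn, account)
for name, passed in results:
    print(name, "PASSED" if passed else "FAILED")
print("approved:", approved)  # approved: False
```

Each rule here reads account data already in memory; in the scenario on the slide those inputs would be fetched from the database, which is where the "hundreds of DB reads/writes" per payment come from.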
Retail Trading
Challenge
■ Real-time Exchange Monitoring
■ Per-account Risk calculation
■ Global Risk calculation
■ In-flight trade analysis
■ MemCache + DB2 has data inconsistency
Need to scale reliably
■ 3 → 13 TB
■ 100 → 400 Million objects
■ 200k → 1 Million TPS
Selected NoSQL
■ Built for Flash
■ Predictable Low latency at High Throughput
■ Immediate consistency, no data loss
■ Cross data center (XDR) support
[Diagram labels: IBM DB2 (MAINFRAME); Read/Write; Start of Day Data Loading; End of Day Reconciliation; Query; REAL-TIME DATA FEED; ACCOUNT POSITIONS; XDR]
Nielsen Machine Learning Example
Machine Learning – Data and Model Storage
Unique User DataStore
53 Servers across 4 data centers
Specs
Memory: 512GB
CPU: E5-2620 v2 (Dual-Socket)
Disk: Intel S3710 (13-15 1.2TB SSDs)
Network: Aggregated 10GB NICs
2 Namespaces
Online Learning (Models Store)
9 Servers across 3 data centers
Specs
Memory: 32GB
CPU: E5-2620 (Dual-Socket)
Disk: 1-240GB SSDs
Network: Aggregated 1GB NICs
1 Namespace
[Diagram label: Online Learning]
Similar use cases
■ Risk recalculation in minutes, not hours
■ Immediate post-trade analytics
■ Internet cache replacement / session management
■ Telecom integrated real-time billing and routing
■ Account status determines routing, traffic shaping
■ Retail predictive analytics
■ Gaming and gambling
■ Social messaging
To Summarize
[Diagram labels: XDR; DATA WAREHOUSE / DATA LAKE; LEGACY RDBMS; HADOOP HDFS; INTERACTIONS AND ENGAGEMENTS: ( Web views ) ( Payments ) ( Recommendation ) ( And More ); High Performance NoSQL: SHARED STATE, SESSION DATA, COUNTERS, AGGREGATION; YOUR STREAMING ANALYTICS FRAMEWORK; Decisioning Engine]
Questions?
Aerospike open source at
https://github.com/aerospike/aerospike-server
