Real-time Analytics for Data-Driven Applications

Milind Bhandarkar, Founder & CEO, Ampool
@techmilind

Increasing demand for intelligent experiences!
• Immediate Fulfillment
• Anywhere, Real-time
• Powered by Analytics
• Ongoing Value (∞)
…powered by all available data!
• Transactions, Points, User actions, Workflow, Location, Social, Financial, Behavioral, Contextual
• Hot (Fresh) Data
• Real-time Actions
…and actionable, timely insights driving value!
• $$$ Business Value

Real-Time Enterprise
“Companies need to learn how to catch people or things in the act of
doing something and affect the outcome”
-Paul Maritz
Executive Chairman, Pivotal
“The Real-Time Enterprise is an enterprise that competes by using up-to-date
information to progressively remove delays to the management and execution of
its critical business processes”
- Gartner
(https://www.gartner.com/doc/372176/gartner-definition-realtime-enterprise)
Core Problem:
Real-Time, Personalized, Actionable Information, in the Current Context

Meanwhile, enterprises suffer from “Data Blackout Periods”
~12-48 Hours
Data Extracts, Data Staging, Complex Joins, ETL, Data Loading, Bulk Updates, Format Conversions, File-Based Data Exchanges
OLTP RDBMS, NoSQL
OLAP Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytics (Internal)

Ampool Mission: Eliminate “Data Blackout Periods”
Ingest and update active data in real time
Analyze using “best-of-breed” engines
Serve data concurrently to multiple tenants/apps
OLTP RDBMS, NoSQL
OLAP Data Warehouse, Data Lake
(External)
BI, Analytical Apps (Internal)
Modern Data-Driven Applications
Capture and Deliver Value Now with Ampool’s Robust Memory Layer

What differentiates Ampool?
Fast Continuous Ingestion, In-place Real-time Updates, No ETL, Memory-speed Analytics, Flexible Processing, Low-Latency Serving
OLTP RDBMS, NoSQL
OLAP Data Warehouse, Data Lake
(External)
BI, Analytical Apps (Internal)
Modern Data-Driven Applications
Designed to support both transactional and analytical workloads
Best-of-breed engines
Robust in-memory technology

Data-driven Apps require several capabilities…
Analyze: Support streaming, batch/machine learning & interactive querying.
Store: Flexibility in storing data; keep up with fast ingestion needs.
Serve: Serve processed data (aggregates or insights) at scale and speed.
APP
Persistence

… that are well-supported by Ampool ADS
For ALL data processing needs near applications
1. Store ALL active data & update it, as required
2. Analyze through ‘best-of-breed’ compute engines & frameworks
3. Serve data concurrently to multiple data processing stages, tenants & applications
APP
Long-term Persistence
Manage hot data in-memory
Process where data is stored
Primary store; not a cache!
An Active Data Store between compute & long-term storage

Powered by Apache Geode©
In-Memory Distributed Sys
Low-latency Comms.
Key-Value Store
Function Pushdown
+
High Throughput
Table Abstractions
Native Interface
Pluggable Persistence
Java API
MASH (CLI Ext)
Java API
Smart Data Tiering
Mature Event Model
Tunable Consistency
Metadata/ Catalog
Security AuthZ

Built With An Extensible Architecture
Compute Frameworks (Spark, Hive, EsgynDB, Apex, CDAP, Flink, Storm, HAWQ…)
Storage Handlers, Native API (Java, REST), Shell
Security: Authentication & Authorization
Metadata, Type System, Statistics, & Smart Tiering Logic
Data Distribution & Operators (Filters, Projections…)
Off-Heap DRAM + Extended Memory (SCM, NVMe Flash) Layouts & Replication
Recovery & Persistent Secondary Stores (S3, ADLS, HDFS, HBase, MPP DB…)

Example Use-Case:
Enabling Real-Time Apps, Removing Complexity
BEFORE AFTER

Multiple Verticals & Use-cases
Financial Services
• Fraud Detection
• Credit/ Market risks
• Event-based marketing
Telecom
• Network/ quality opt.
• Mobile user analysis
• Event-based marketing
Retail
• Targeted digital offers
• Markdown optimization
• Event-based targeting
Media
• Content/ ad delivery
• Event/ behavior-based targeting
Anomaly Detection
• Event/ activity monitoring
• Real-time automated decisions
IoT Analytics
• Device management
• Comms. optimization
360 Customer Analytics
• Social media sentiment analysis
• Event-based ad targeting

Initial Performance Benchmarks
[Charts: YCSB Workloads A, B and C. Throughput vs. number of clients (4 to 288), Ampool vs. HBase.]
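For readers unfamiliar with YCSB: its standard core workloads differ mainly in their read/update mix. Workload A is 50% reads and 50% updates, Workload B is 95% reads and 5% updates, and Workload C is read-only. The sketch below only illustrates those mixes. It is not the YCSB driver, it says nothing about how the benchmarks above were configured, and the name `operation_mix` is made up for illustration.

```python
import random
from collections import Counter

# Read proportion of the standard YCSB core workloads A, B and C.
# All remaining operations are updates.
READ_PROPORTION = {"A": 0.50, "B": 0.95, "C": 1.00}


def operation_mix(workload, n_ops, seed=0):
    """Draw n_ops operations ("read" or "update") for a core workload."""
    rng = random.Random(seed)  # seeded so repeated runs give the same mix
    p_read = READ_PROPORTION[workload]
    ops = ("read" if rng.random() < p_read else "update" for _ in range(n_ops))
    return Counter(ops)


if __name__ == "__main__":
    for name in "ABC":
        print(name, dict(operation_mix(name, 10_000)))
```

Update-heavy mixes such as Workload A stress in-place updates, while Workload C isolates pure read performance.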

Customer Analytical Queries
[Chart: Analytical Query Performance (lower is better). Query time in seconds for queries 1 to 10, HBase vs. Ampool.]

Single Node Performance
[Charts: YCSB Workloads A, B and C. Ops/sec at 40, 80 and 160 clients, MySQL vs. Ampool.]
[Chart: Average Latency (80 Clients). Latency in microseconds for Workloads A, B and C, MySQL vs. Ampool.]

Let’s take a closer look!
1. ADS Core based on Apache Geode
• Tabular object/ structures
• APIs for extending capabilities
• Compute, storage & import/ export
• User-defined functions (co-processors)
2. Pre-built connectors for:
• Data ingest/ export paths
• Data processing (compute frameworks)
3. Pre-built extensions for persistence
• From on-prem shared FS to Cloud storage
Modular components for deployment flexibility and extensibility

Ampool Core: Objects & Structures
Region
• Get/Put/Delete operations
• Arbitrary object values
• JSON support (using PDX)
• Query-able (using OQL)
• Filters (Function execution)
• Suitable for user data with smaller dimensions and direct app interactions.
MTable
• Get/Put/Delete/Scan
• Typed columns (Hive-style)
• Ordered or Unordered
• Filters & Co-processors
• HBase-like APIs
• Suitable for dimensional/reference data and frequent updates.
FTable
• Get/Append/Scan (immutable)
• Typed columns (Hive-style)
• Ordered-only, typically by time
• Filters, Scan, Append, & Bulk Mutation APIs
• Suitable for continuously flowing factual data (Tx or time-series).
Options for different data needs & workloads
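The difference between a mutable, keyed table and an append-only, time-ordered table can be sketched with a toy in-memory model. This is not Ampool's API: the class names `ToyMTable` and `ToyFTable` and all their methods are hypothetical, and the model leaves out typed columns, distribution, and co-processors.

```python
import bisect


class ToyMTable:
    """Mutable, keyed table: get/put/delete/scan, rows updated in place."""

    def __init__(self):
        self._rows = {}

    def put(self, key, row):
        # Merge new column values into any existing row (in-place update).
        self._rows.setdefault(key, {}).update(row)

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        self._rows.pop(key, None)

    def scan(self):
        # Models the ordered variant: iterate rows in key order.
        for key in sorted(self._rows):
            yield key, self._rows[key]


class ToyFTable:
    """Append-only fact table kept in timestamp order."""

    def __init__(self):
        self._timestamps = []
        self._rows = []

    def append(self, timestamp, row):
        # Insert after any equal timestamps so arrival order is preserved.
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._rows.insert(i, dict(row))

    def scan(self, start=None, end=None):
        # Return (timestamp, row) pairs with start <= timestamp < end.
        lo = 0 if start is None else bisect.bisect_left(self._timestamps, start)
        hi = len(self._rows) if end is None else bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._rows[lo:hi]))
```

In this model, reference data such as a customer profile would go into `ToyMTable`, where a later `put` overwrites individual columns, while transactions or sensor readings would go into `ToyFTable`, where rows are only ever appended and read back by time range.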

EXT: Data Ingestion capabilities
Kafka Sink
• Configure as Kafka sink
• Push multiple topics across all servers
Java/ REST
• Implement your own client using Java APIs
• Directly ingest (PUT) data using REST APIs
DataFrame
• Spark import through streaming or files
• Persist DataFrames as Ampool Tables
…
Direct (stream) ingestion or import through frameworks

EXT: Data Processing options
External Tables
• Query using HiveQL
• Column projections
• Filter pushdowns
DataFrame
• Computations
• Column selection
• Value filters
• Query with Spark SQL
Pandas/Frames
• Language bindings to manipulate data
• Co-processors for data science using APIs
…
Access data programmatically or through structured queries
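Column projection and filter pushdown both cut down the data shipped from the store to the compute engine: the predicate and the column list are evaluated where the rows live, and only matching rows with the requested columns travel back. A minimal sketch, assuming a hypothetical `ToyServer` that holds one partition of a table; none of these names come from Ampool or its connectors.

```python
class ToyServer:
    """Holds one partition of a table as a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, columns=None, predicate=None):
        """Apply the filter and projection server-side, then return rows."""
        out = []
        for row in self.rows:
            if predicate is not None and not predicate(row):
                continue  # filtered out before leaving the server
            if columns is None:
                out.append(dict(row))
            else:
                out.append({c: row[c] for c in columns})
        return out


def query(servers, columns=None, predicate=None):
    """Fan the scan out to every server and concatenate the results."""
    results = []
    for server in servers:
        results.extend(server.scan(columns, predicate))
    return results


servers = [
    ToyServer([{"id": 1, "region": "EU", "amount": 40},
               {"id": 2, "region": "US", "amount": 75}]),
    ToyServer([{"id": 3, "region": "EU", "amount": 90}]),
]

# Roughly: SELECT id, amount FROM t WHERE region = 'EU'
eu_rows = query(servers, columns=["id", "amount"],
                predicate=lambda r: r["region"] == "EU")
print(eu_rows)
```

Without pushdown, the client would fetch all three full rows and discard one of them locally; with it, only two narrow rows cross the network.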

EXT: Data Persistence capabilities
Local FileSystem
• Write-ahead log for all data within memory; used for server recovery
Change Data Capture (CDC)
• Java API to capture data changes in MTable
• E.g., implement your own JDBC listener
Tiered Data Persistence
• Move older/ less used data to next tier (local/ remote)
• Seamless scan on tiered data in ORC/Parquet format
Table types shown: MTable, FTable
Options for system availability/ recovery, data tiering and archiving
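Tiered persistence with a seamless scan can be pictured as follows: rows older than a cutoff move from the in-memory tier to a colder tier, and a scan reads both tiers so the caller does not need to know where a row lives. This is a sketch under stated assumptions, not Ampool's tiering logic: `ToyTieredTable` is a hypothetical class, the eviction rule is a simple timestamp cutoff, and the cold tier is a plain Python list standing in for ORC/Parquet files.

```python
class ToyTieredTable:
    """Append-only table with a hot (memory) tier and a cold tier."""

    def __init__(self):
        self.hot = []   # list of (timestamp, row) holding the newest data
        self.cold = []  # stand-in for files in the next storage tier

    def append(self, timestamp, row):
        self.hot.append((timestamp, row))

    def evict_older_than(self, cutoff):
        """Move rows with timestamp < cutoff from the hot to the cold tier."""
        moved = [r for r in self.hot if r[0] < cutoff]
        self.hot = [r for r in self.hot if r[0] >= cutoff]
        self.cold.extend(moved)
        return len(moved)

    def scan(self, start, end):
        """Return rows with start <= timestamp < end from both tiers."""
        rows = [r for r in self.cold + self.hot if start <= r[0] < end]
        return sorted(rows, key=lambda r: r[0])
```

A scan whose time range straddles the cutoff returns rows from both tiers in timestamp order, which is the "seamless" part of the behavior described above.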

How is Ampool ADS deployed?
Diagram: Clients, Locator, Servers
• Clients store, scan & retrieve data
• Direct (REST, Java), data ingest (Kafka) or compute engines (Spark)
• Locators provide up-to-date topology info to Clients and Servers
• Servers communicate to maintain data (load) balance & consistency
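The interaction between clients, locators and servers can be sketched like this: a client asks a locator for the current server list, then routes each key to one of those servers. The hash-modulo routing below is an assumption made for illustration only; Apache Geode partitions data into buckets and its actual placement logic is more involved. `ToyLocator` and `ToyClient` are hypothetical names.

```python
import zlib


class ToyLocator:
    """Tracks which servers are currently members of the cluster."""

    def __init__(self):
        self._servers = []

    def register(self, server_name):
        if server_name not in self._servers:
            self._servers.append(server_name)

    def deregister(self, server_name):
        self._servers.remove(server_name)

    def topology(self):
        # Return a copy so callers cannot mutate the locator's view.
        return list(self._servers)


class ToyClient:
    def __init__(self, locator):
        self.locator = locator

    def server_for(self, key):
        """Pick a server for a key using the locator's latest topology."""
        servers = self.locator.topology()
        if not servers:
            raise RuntimeError("no servers available")
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]
```

Because the client re-reads the topology on every call, a server that leaves the cluster stops receiving requests as soon as the locator drops it.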

Deployment, Management & Monitoring
Deployment & Service Management
Management REST APIs for service setup
JMX endpoints for complete management
Memory Analytics SHell (MASH)
Monitoring & Performance Management
JMX attributes with complete coverage
Statistical metric sampling for diagnostics & tuning
Enterprise-grade Security
Kerberos Authentication
LDAP authorization for users, roles & data access
Diagram labels: REST, JMX, CLI, Stats, Security, JMC, LDAP, Kerberos
Production-ready services with deployment & management flexibility

Ampool in Event-Driven Architectures

Event-Driven Architecture
Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External
HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts
Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching, Stream-Brokers
Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms
API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing
Deployment, Management, Monitoring, Auditing, Governance, Security
Layers: Event Generation, Event Transport, Event Processing, Analytics & Serving, Runtime
Adapted from @rseroter
https://content.pivotal.io/blog/how-to-deliver-an-event-driven-architecture

Ampool Simplifies Event-Driven Architecture
Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External
HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts
Web-App Backend, Fast Batching, Stream-Brokers
API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing
Deployment, Management, Monitoring, Auditing, Governance, Security
Layers: Event Generation, Event Transport, Runtime

Ampool ADS for Analytics & Serving
Stream-Brokers

Ampool ADS & Event Processing Engines
Web-App Backend
Fast Batching

In summary, use Ampool ADS to…
Create an analytical foundation for Apps
• Understand usage in real-time
• Learn from App’s data ‘exhaust’
Reduce operational complexity
• Replace multiple single-function stores with a single, versatile in-memory store
Get in-memory processing speed-up
• Low-latency responses
• Serve multiple data processes & tenants, reducing data copies

As of today, Ampool ADS is Open Source
Project name “Monarch”
Apache License (ASLv2)
• Powered by Apache Geode
Includes several connectors
• Spark (1.6 & 2.x), Hive (1.2.x & 2.x), PrestoDB, Apache Kafka, R, Python
Contributions welcome!
Give it a try: http://github.com/ampool/monarch

Available on AWS Marketplace
Free Single Node AMI (EC2 charges apply)
• https://aws.amazon.com/marketplace/pp/B077D81DD1
Multi-Node Ampool ADS Cluster
• https://aws.amazon.com/marketplace/pp/B0784YHDW8
• Single Click Deployment
• Local SSD Storage (no EBS costs)
• Autoscaling
• M3.2xlarge instances (More coming soon)
• US-East & US-West Regions (More coming soon)
• 31-Day Free Trial
• Support by Email & Web-based Ticketing
• Annual Subscription Discount

Ampool ADS v 2.0 (Coming Soon)
Notable new features
• Support for in-memory columnar storage in FTables
• Support for partition pruning
• Several-fold performance gains with filter pushdowns
• Support for fast data ingestion from Kafka topics
• Integration with Kafka Connect
• New PrestoDB Connector
• New Apache Calcite Connector
• Delta Persistence
And several performance improvements and stability fixes

Try out Ampool today!
Download: http://www.ampool.io/product
Code: https://github.com/ampool/monarch
Single Node AMI: https://aws.amazon.com/marketplace/pp/B077D81DD1
Ampool Cluster: https://aws.amazon.com/marketplace/pp/B0784YHDW8
Documentation: http://docs.ampool-inc.com/
Support: support@ampool.io
Discuss: https://groups.google.com/forum/#!forum/ampool-users
