Real-time analytics is important for data-driven applications. Ampool provides an active data store (ADS) that can ingest data in real time, analyze it using a variety of compute engines, and serve the results concurrently. This eliminates "data blackout periods" and lets applications act on up-to-date information. Ampool's ADS is powered by Apache Geode and includes connectors for ingesting and processing data. It supports both transactional and analytical workloads in memory for low latency.
2. Increasing demand for intelligent experiences!
Immediate Fulfillment
Anywhere, Real-time
Powered by Analytics
Ongoing Value
…powered by all available data!
Transactions, Points, User actions, Workflow, Location, Social, Financial, Behavioral, Contextual
Hot (Fresh) Data
Real-time Actions
…and actionable, timely insights driving value!
$$$ Business Value
3. Real-Time Enterprise
“Companies need to learn how to catch people or things in the act of
doing something and affect the outcome”
-Paul Maritz
Executive Chairman, Pivotal
“The Real-Time Enterprise is an enterprise that competes by using up-to-date
information to progressively remove delays to the management and execution of
its critical business processes”
- Gartner
(https://www.gartner.com/doc/372176/gartner-definition-realtime-enterprise)
Core Problem:
Real-Time, Personalized, Actionable Information, in the Current Context
4. Meanwhile, enterprises suffer from “Data Blackout Periods”
~12-48 Hours
Data Extracts, Data Staging, Complex Joins, ETL, Data Loading, Bulk Updates, Format Conversions, File-Based Data Exchanges
OLTP RDBMS, NoSQL
OLAP Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytics (Internal)
5. Ampool Mission: Eliminate “Data Blackout Periods”
Ingest and update active data in real time
Analyze using “best-of-breed” engines
Serve data concurrently to multiple tenants/apps
OLTP RDBMS, NoSQL
OLAP Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytical Apps (Internal)
Modern Data-Driven Applications
Capture and Deliver Value Now with Ampool’s Robust Memory Layer
6. What differentiates Ampool?
Fast Continuous Ingestion, In-place Real-time Updates, No ETL, Memory-speed Analytics, Flexible Processing, Low-Latency Serving
OLTP RDBMS, NoSQL
OLAP Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytical Apps (Internal)
Modern Data-Driven Applications
Designed to support both transactional and analytical workloads
Best of Breed engines
Robust in-memory technology
7. Data-driven Apps require several capabilities…
Analyze: Support streaming, batch/machine learning & interactive querying.
Store: Flexibility in storing data; keep up with fast ingestion needs.
Serve: Serve processed data (aggregates or insights) at scale and speed.
APP
Persistence
8. … that are well-supported by Ampool ADS
For ALL data processing needs near applications
1. Store ALL active data & update it, as required
2. Analyze through ‘best-of-breed’ compute engines & frameworks
3. Serve data concurrently to multiple data processing stages, tenants & applications
APP
Long-term Persistence
Manage hot data in-memory
Process where data is stored
Primary store; not a cache!
An Active Data Store between compute & long-term storage
16. Let’s take a closer look!
ADS Core based on Apache Geode
• Tabular objects/structures
• APIs for extending capabilities
• Compute, storage & import/export
• User-defined functions (co-processors)
Pre-built connectors for:
• Data ingest/export paths
• Data processing (compute frameworks)
Pre-built extensions for persistence
• From on-prem shared FS to Cloud storage
Modular components for deployment flexibility and extensibility
17. Ampool Core: Objects & Structures
Region
• Get/Put/Delete operations
• Arbitrary object values
• JSON support (using PDX)
• Query-able (using OQL)
• Filters (Function execution)
• Suitable for user data with smaller dimensions and direct app interactions.
MTable
• Get/Put/Delete/Scan
• Typed columns (Hive-style)
• Ordered or Unordered
• Filters & Co-processors
• HBase-like APIs
• Suitable for dimensional/reference data and frequent updates.
FTable
• Get/Append/Scan (immutable)
• Typed columns (Hive-style)
• Ordered-only, typically by time
• Filters, Scan, Append, & Bulk Mutation APIs
• Suitable for continuously flowing factual data (Tx or time-series).
Options for different data needs & workloads
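To make the difference between a mutable, keyed table and an append-only, time-ordered table concrete, here is a minimal Python sketch. It is not the Ampool or Apache Geode API: the class names `MutableTableSketch` and `AppendOnlyTableSketch` and all of their methods are hypothetical, chosen only to mirror the Get/Put/Delete/Scan and Append/Scan operations listed on this slide.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class MutableTableSketch:
    """Hypothetical model of MTable-like behavior: keyed rows, updated in place."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, row: Dict[str, Any]) -> None:
        # A later put on the same key replaces the earlier row.
        self._rows[key] = dict(row)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self._rows.get(key)

    def delete(self, key: str) -> None:
        self._rows.pop(key, None)

    def scan(self) -> List[Tuple[str, Dict[str, Any]]]:
        # Keys are unique, so sorting the (key, row) pairs orders by key only.
        return sorted(self._rows.items())


class AppendOnlyTableSketch:
    """Hypothetical model of FTable-like behavior: immutable, time-ordered records."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._records: List[Dict[str, Any]] = []

    def append(self, ts: int, record: Dict[str, Any]) -> None:
        # Insert at the position that keeps timestamps sorted;
        # records already stored are never modified.
        pos = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(pos, ts)
        self._records.insert(pos, dict(record))

    def scan(self, start_ts: int, end_ts: int) -> List[Dict[str, Any]]:
        # Return records with start_ts <= ts < end_ts, in time order.
        lo = bisect.bisect_left(self._timestamps, start_ts)
        hi = bisect.bisect_left(self._timestamps, end_ts)
        return self._records[lo:hi]
```

In this sketch, reference data such as a customer tier would go into the mutable table and be overwritten as it changes, while transactions would be appended to the time-ordered table and read back by time range.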
18. EXT: Data Ingestion capabilities
Kafka Sink
• Configure as Kafka sink
• Push multiple topics across all servers
…
Java/REST
• Implement your own client using Java APIs
• Directly ingest (PUT) data using REST APIs
DataFrame
• Spark import through streaming or files
• Persist DataFrames as Ampool Tables
Direct (stream) ingestion or import through frameworks
19. EXT: Data Processing options
External Tables
• Query using HiveQL
• Column projections
• Filter pushdowns
…
DataFrame
• Computations
• Column selection
• Value filters
• Query with Spark SQL
Pandas/Frames
• Language bindings to manipulate data
• Co-processors for data science using APIs
Access data programmatically or through structured queries
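Column projection and filter pushdown both mean that the store, rather than the query engine, discards unneeded columns and rows, so less data crosses the boundary. The Python sketch below illustrates the idea only. The function `server_side_scan` and the sample rows are hypothetical and do not represent the Hive, Spark, or Ampool connector APIs.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence

Row = Dict[str, Any]


def server_side_scan(
    rows: Iterable[Row],
    columns: Sequence[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Apply the filter and the projection where the data is stored."""
    for row in rows:
        # Filter pushdown: rows that fail the predicate are never returned.
        if predicate(row):
            # Column projection: only the requested columns are returned.
            yield {name: row[name] for name in columns}


# Hypothetical stored rows, including a wide column the query does not need.
STORED_ROWS: List[Row] = [
    {"user": "a", "country": "US", "amount": 120, "note": "x" * 50},
    {"user": "b", "country": "DE", "amount": 40, "note": "y" * 50},
    {"user": "c", "country": "US", "amount": 15, "note": "z" * 50},
]

# Roughly: SELECT user, amount WHERE country = 'US' AND amount > 100
result = list(
    server_side_scan(
        STORED_ROWS,
        columns=["user", "amount"],
        predicate=lambda r: r["country"] == "US" and r["amount"] > 100,
    )
)
```

Here only one narrow row leaves the scan, instead of three wide rows being shipped to the engine and filtered there.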
20. EXT: Data Persistence capabilities
Local FileSystem
• Write-ahead log for all data within memory; used for server recovery
Change Data Capture (CDC)
• Java API to capture data changes in MTable
• E.g., implement your own JDBC listener
Tiered Data Persistence
• Move older/less-used data to next tier (local/remote)
• Seamless scan on tiered data in ORC/Parquet format
MTable FTable
Options for system availability/ recovery, data tiering and archiving
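The write-ahead log idea mentioned on this slide can be sketched as follows: every change is appended to a file on the local file system before it is applied in memory, and a restarted server rebuilds its in-memory state by replaying that file. This is a generic illustration under simplifying assumptions (one JSON entry per line, string keys and values), not Ampool's actual log format or recovery code. `WalBackedStore` and `demo_recovery` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class WalBackedStore:
    """Hypothetical in-memory store that logs each change before applying it."""

    def __init__(self, wal_path: str) -> None:
        self._wal_path = wal_path
        self._data: Dict[str, str] = {}
        self._replay()  # recover state left behind by a previous run
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self._wal_path):
            return
        with open(self._wal_path, "r", encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                if entry["op"] == "put":
                    self._data[entry["key"]] = entry["value"]
                elif entry["op"] == "delete":
                    self._data.pop(entry["key"], None)

    def _log(self, entry: Dict[str, str]) -> None:
        # Write and sync the log entry before the in-memory change.
        self._wal.write(json.dumps(entry) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())

    def put(self, key: str, value: str) -> None:
        self._log({"op": "put", "key": key, "value": value})
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._log({"op": "delete", "key": key})
        self._data.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def close(self) -> None:
        self._wal.close()


def demo_recovery() -> Dict[str, Optional[str]]:
    """Write some data, simulate a restart, and read the recovered state."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.wal")
        first = WalBackedStore(path)
        first.put("k1", "v1")
        first.put("k2", "v2")
        first.delete("k1")
        first.close()  # the in-memory state of `first` is now discarded

        recovered = WalBackedStore(path)  # replays the log on startup
        state = {"k1": recovered.get("k1"), "k2": recovered.get("k2")}
        recovered.close()
        return state
```

Calling `demo_recovery()` shows the second store seeing the same contents as the first, even though nothing but the log file was carried over.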
21. How is Ampool ADS deployed?
Clients store, scan & retrieve data
• Direct (REST, Java), data ingest (Kafka) or compute engines (Spark)
Locators provide up-to-date topology info to Clients and Servers
Servers communicate to maintain data (load) balance & consistency
Diagram: Clients, Locator, Servers
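The roles on this slide can be illustrated with a small Python sketch in which a client asks a locator for the current server list and then picks a server for each key. `LocatorSketch` and `ClientSketch` are hypothetical names, and the hash-modulo routing is an assumption made for illustration only. It is not a description of how Ampool or Apache Geode actually place or balance data.

```python
import zlib
from typing import List


class LocatorSketch:
    """Hypothetical locator: tracks which servers are currently in the cluster."""

    def __init__(self) -> None:
        self._servers: List[str] = []

    def register(self, server: str) -> None:
        if server not in self._servers:
            self._servers.append(server)

    def unregister(self, server: str) -> None:
        if server in self._servers:
            self._servers.remove(server)

    def topology(self) -> List[str]:
        # Return a sorted copy so every caller sees the same ordering.
        return sorted(self._servers)


class ClientSketch:
    """Hypothetical client: learns the topology from the locator, then routes keys."""

    def __init__(self, locator: LocatorSketch) -> None:
        self._locator = locator
        self._servers = locator.topology()

    def refresh(self) -> None:
        # Ask the locator again, e.g. after a server joins or leaves.
        self._servers = self._locator.topology()

    def server_for(self, key: str) -> str:
        if not self._servers:
            raise RuntimeError("no servers known; is the locator empty?")
        # Assumed routing rule: stable hash of the key modulo the server count.
        index = zlib.crc32(key.encode("utf-8")) % len(self._servers)
        return self._servers[index]


locator = LocatorSketch()
for name in ("server-1", "server-2", "server-3"):
    locator.register(name)

client = ClientSketch(locator)
target = client.server_for("order-42")  # one of the three registered servers
```

After a server leaves, the client calls `refresh()` and routes only to the servers the locator still reports.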
22. Deployment, Management & Monitoring
Deployment & Service Management
Management REST APIs for service setup
JMX endpoints for complete management
Memory Analytics SHell (MASH)
Monitoring & Performance Management
JMX attributes with complete coverage
Statistical metric sampling for diagnostics & tuning
Enterprise-grade Security
Kerberos Authentication
LDAP authorization for users, roles & data access
Diagram labels: REST, JMX, CLI, Stats, Security, JMC, LDAP, Kerberos
Production-ready services with deployment & management flexibility
24. Event-Driven Architecture
Event Generation: Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External
Event Transport: HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts
Event Processing: Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching
Analytics & Serving: Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms
Runtime: API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing
Deployment, Management, Monitoring, Auditing, Governance, Security
Adapted from @rseroter
https://content.pivotal.io/blog/how-to-deliver-an-event-driven-architecture
25. Ampool Simplifies Event-Driven Architecture
Event Generation: Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External
Event Transport: HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts
Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching
Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms
Runtime: API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing
Deployment, Management, Monitoring, Auditing, Governance, Security
26. Ampool ADS for Analytics & Serving
Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms
28. In summary, use Ampool ADS to…
Create an analytical foundation for Apps
• Understand usage in real-time
• Learn from App’s data ‘exhaust’
Reduce operational complexity
• Replace multiple single-function stores with
a single, versatile in-memory store
Get in-memory processing speed-up
• Low-latency responses
• Serve multiple data processes & tenants,
reducing data copies
30. As of today, Ampool ADS is Open Source
Project name “Monarch”
Apache License (ASLv2)
• Powered by Apache Geode
Includes several connectors
• Spark (1.6 & 2.x), Hive (1.2.x & 2.x), PrestoDB, Apache Kafka, R, Python
Contributions welcome!
Give it a try: http://github.com/ampool/monarch
31. Available on AWS Marketplace
Free Single Node AMI (EC2 charges apply)
• https://aws.amazon.com/marketplace/pp/B077D81DD1
Multi-Node Ampool ADS Cluster
• https://aws.amazon.com/marketplace/pp/B0784YHDW8
• Single Click Deployment
• Local SSD Storage (no EBS costs)
• Autoscaling
• M3.2xlarge instances (More coming soon)
• US-East & US-West Regions (More coming soon)
• 31-Day Free Trial
• Support by Email & Web-based Ticketing
• Annual Subscription Discount
32. Ampool ADS v 2.0 (Coming Soon)
Notable new features
• Support for in-memory columnar storage in FTables
• Support for partition pruning
• Several-fold performance gains with filter pushdowns
• Support for fast data ingestion from Kafka topics
• Integration with Kafka Connect
• New PrestoDB Connector
• New Apache Calcite Connector
• Delta Persistence
Plus several performance improvements and stability fixes
We find many companies with “Blackout Periods” during which critical hot and warm data is not available.
Data is still siloed, and companies lose hours or days conditioning the data so that multiple streams can be correlated … in real time.
They still face challenges bringing transactional/reference data together with behavioral and contextual data … as an example.