Real-Time Analytics for Data-Driven Applications
Milind Bhandarkar, Founder & CEO, Ampool
@techmilind
Increasing demand for intelligent experiences!
• Immediate Fulfillment
• Anywhere, Real-time
• Powered by Analytics
• Ongoing Value ∞
…powered by all available data!
• Transactions, Points, User actions, Workflow, Location, Social, Financial, Behavioral, Contextual
Hot (Fresh) Data
Real-time Actions
…and actionable, timely insights driving value!
$$$ Business Value
Real-Time Enterprise
“Companies need to learn how to catch people or things in the act of doing something and affect the outcome”
- Paul Maritz, Executive Chairman, Pivotal
“The Real-Time Enterprise is an enterprise that competes by using up-to-date information to progressively remove delays to the management and execution of its critical business processes”
- Gartner
(https://www.gartner.com/doc/372176/gartner-definition-realtime-enterprise)
Core Problem:
Real-Time, Personalized, Actionable Information, in the Current Context
Meanwhile, enterprises suffer from “Data Blackout Periods”
CONFIDENTIAL AND PROPRIETARY
~12-48 Hours
Data Extracts, Data Staging, Complex Joins, ETL, Data Loading, Bulk Updates, Format Conversions, File-Based Data Exchanges
OLTP: RDBMS, NoSQL
OLAP: Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytics (Internal)
Ampool Mission: Eliminate “Data Blackout Periods”
• Ingest and update active data in real time
• Analyze using “best-of-breed” engines
• Serve data concurrently to multiple tenants/apps
OLTP: RDBMS, NoSQL
OLAP: Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytical Apps (Internal)
Modern Data-Driven Applications
Capture and Deliver Value Now with Ampool’s Robust Memory Layer
What differentiates Ampool?
Fast Continuous Ingestion, In-place Real-time Updates, No ETL, Memory-speed Analytics, Flexible Processing, Low-Latency Serving
OLTP: RDBMS, NoSQL
OLAP: Data Warehouse, Data Lake
Apps, APIs, Services (External)
BI, Analytical Apps (Internal)
Modern Data-Driven Applications
Designed to support both transactional and analytical workloads
Best-of-Breed engines
Robust in-memory technology
Data-driven Apps require several capabilities…
• Analyze: Support streaming, batch/machine learning & interactive querying.
• Store: Flexibility in storing data; keep up with fast ingestion needs.
• Serve: Serve processed data (aggregates or insights) at scale and speed.
… that are well-supported by Ampool ADS
For ALL data processing needs near applications:
1. Store ALL active data & update it, as required
2. Analyze through ‘best-of-breed’ compute engines & frameworks
3. Serve data concurrently to multiple data processing stages, tenants & applications
• Long-term Persistence
• Manage hot data in-memory
• Process where data is stored
• Primary store; not a cache!
An Active Data Store between compute & long-term storage
Powered by Apache Geode©
In-Memory Distributed Sys
Low-latency Comms.
Key-Value Store
Function Pushdown
+
High Throughput
Table Abstractions
Native Interface
Pluggable Persistence
Java API
MASH (CLI Ext)
Java API
Smart Data Tiering
Mature Event Model
Tunable Consistency
Metadata/ Catalog
Security AuthZ
Built With An Extensible Architecture
Compute Frameworks (Spark, Hive, EsgynDB, Apex, CDAP, Flink, Storm, HAWQ…)
Storage Handlers, Native API (Java, REST), Shell
Security: Authentication & Authorization
Metadata, Type System, Statistics, & Smart Tiering Logic
Data Distribution & Operators (Filters, Projections…)
Off-Heap DRAM + Extended Memory (SCM, NVMe Flash) Layouts & Replication
Recovery & Persistent Secondary Stores (S3, ADLS, HDFS, HBase, MPP DB…)
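The smart tiering logic in the stack above moves older data out of memory while scans still see all of it. A minimal Python sketch under stated assumptions: `TieredTable` and `max_in_memory` are hypothetical names, tiering here is driven purely by row count in insertion order, and the secondary tier is a dict standing in for a persistent store such as HDFS or S3. It is not Ampool's tiering implementation.

```python
from collections import OrderedDict
from typing import Any, Dict, Iterator, Tuple


class TieredTable:
    """Hypothetical two-tier table: hot rows in memory, older rows spilled
    to a secondary tier (a dict standing in for HDFS/S3; NOT an Ampool API)."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory
        # Rows kept in insertion order so the oldest can be spilled first.
        self.memory: "OrderedDict[int, Any]" = OrderedDict()
        self.secondary: Dict[int, Any] = {}

    def append(self, ts: int, row: Any) -> None:
        self.memory[ts] = row
        # Spill the oldest rows once the in-memory tier exceeds its budget.
        while len(self.memory) > self.max_in_memory:
            old_ts, old_row = self.memory.popitem(last=False)
            self.secondary[old_ts] = old_row

    def scan(self) -> Iterator[Tuple[int, Any]]:
        # "Seamless" scan: merge both tiers and yield rows in timestamp order.
        merged = {**self.secondary, **self.memory}
        for ts in sorted(merged):
            yield ts, merged[ts]


table = TieredTable(max_in_memory=2)
for ts in range(5):
    table.append(ts, {"v": ts})
print(list(table.memory), sorted(table.secondary))  # [3, 4] [0, 1, 2]
```

A real tiering policy would more likely weigh age, access frequency and memory pressure; the count-based rule simply keeps the sketch short.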
Example Use-Case: Enabling Real-Time Apps, Removing Complexity
[Diagram: architecture BEFORE and AFTER adopting Ampool]
Multiple Verticals & Use-cases
Financial Services
• Fraud Detection
• Credit/ Market risks
• Event-based marketing
Telecom
• Network/ quality opt.
• Mobile user analysis
• Event-based marketing
Retail
• Targeted digital offers
• Markdown optimization
• Event-based targeting
Media
• Content/ ad delivery
• Event/ behavior-based
targeting
Anomaly Detection
• Event/ activity monitoring
• Real-time automated decisions
IoT Analytics
• Device management
• Comms. optimization
360 Customer Analytics
• Social media sentiment analysis
• Event-based ad targeting
Initial Performance Benchmarks
[Figure: throughput vs. number of clients (4 to 288) for YCSB Workloads A, B and C, Ampool vs. HBase]
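YCSB Workloads A, B and C differ in their read/update mix: A is update heavy (50% reads, 50% updates), B is read mostly (95% reads, 5% updates) and C is read only. A minimal Python sketch of those mixes follows; `generate_ops` is a hypothetical helper, not part of YCSB, and it draws keys uniformly for simplicity, whereas YCSB's own workload files default to a Zipfian request distribution.

```python
import random
from collections import Counter
from typing import Iterator, Tuple

# Read/update proportions of the YCSB core workloads benchmarked above.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},  # update heavy
    "B": {"read": 0.95, "update": 0.05},  # read mostly
    "C": {"read": 1.00, "update": 0.00},  # read only
}


def generate_ops(workload: str, n_ops: int, n_keys: int,
                 seed: int = 42) -> Iterator[Tuple[str, str]]:
    """Yield (operation, key) pairs following a workload's mix.

    Simplification: keys are uniform here, not Zipfian as in YCSB.
    """
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so runs are reproducible
    ops, weights = zip(*mix.items())
    for _ in range(n_ops):
        op = rng.choices(ops, weights=weights)[0]
        yield op, f"user{rng.randrange(n_keys)}"


print(Counter(op for op, _ in generate_ops("C", 1000, 100)))
```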
Customer Analytical Queries
[Figure: Analytical Query Performance (lower is better): query time in seconds for queries 1 to 10, HBase vs. Ampool]
Single Node Performance
[Figure: operations/sec vs. clients (40, 80, 160) for YCSB Workloads A, B and C, MySQL vs. Ampool]
[Figure: Average Latency (80 Clients) in microseconds for Workloads A, B and C, MySQL vs. Ampool]
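When reading the single-node throughput and latency charts together, Little's law may help. Assuming a closed-loop benchmark in which each of the $N$ clients keeps exactly one request outstanding (no think time), mean throughput $X$ and mean latency $R$ are related as below. The numbers in the example are illustrative only, not values read from the charts.

```latex
X = \frac{N}{R}
\qquad\text{e.g. } N = 80,\; R = 1000\,\mu\text{s} = 10^{-3}\,\text{s}
\;\Rightarrow\; X = \frac{80}{10^{-3}\,\text{s}} = 80{,}000\ \text{ops/s}
```

Under that assumption, at a fixed client count a lower average latency should show up as proportionally higher throughput.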
Let’s take a closer look!
1. ADS Core based on Apache Geode
• Tabular object/structures
• APIs for extending capabilities
• Compute, storage & import/export
• User-defined functions (co-processors)
2. Pre-built connectors for:
• Data ingest/export paths
• Data processing (compute frameworks)
3. Pre-built extensions for persistence
• From on-prem shared FS to Cloud storage
Modular components for deployment flexibility and extensibility
Ampool Core: Objects & Structures
Region
• Get/Put/Delete operations
• Arbitrary object values
• JSON support (using PDX)
• Query-able (using OQL)
• Filters (Function execution)
• Suitable for user-data with smaller dimensions and direct-app interactions.
MTable
• Get/Put/Delete/Scan
• Typed columns (Hive-style)
• Ordered or Unordered
• Filters & Co-processors
• HBase-like APIs
• Suitable for dimensional/reference data and frequent updates.
FTable
• Get/Append/Scan (immutable)
• Typed columns (Hive-style)
• Ordered-only, typically by time
• Filters, Scan, Append, & Bulk Mutation APIs
• Suitable for continuously flowing factual data (Tx or time-series).
Options for different data needs & workloads
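To make the differences between the three structures concrete, here is a small in-process Python sketch. The class names mirror the slide, but every method is a hypothetical stand-in, not Ampool's Java API: `Region` is a plain key-value map, `MTable` holds typed rows that can be updated in place and, when ordered, range-scanned by key, and `FTable` only accepts appends in non-decreasing time order.

```python
import bisect
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
Filter = Optional[Callable[[Row], bool]]


class Region:
    """Key-value map with arbitrary object values (get/put/delete)."""
    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
    def put(self, key: Any, value: Any) -> None:
        self._data[key] = value
    def get(self, key: Any) -> Any:
        return self._data.get(key)
    def delete(self, key: Any) -> None:
        self._data.pop(key, None)


class MTable:
    """Mutable table of typed rows; the ordered variant keeps keys sorted."""
    def __init__(self, schema: Dict[str, type], ordered: bool = True) -> None:
        self.schema, self.ordered = schema, ordered
        self._rows: Dict[Any, Row] = {}
        self._keys: List[Any] = []  # sorted keys, maintained only when ordered

    def put(self, key: Any, row: Row) -> None:
        for col, value in row.items():  # enforce Hive-style column types
            if col not in self.schema:
                raise KeyError(f"unknown column {col!r}")
            if not isinstance(value, self.schema[col]):
                raise TypeError(f"column {col!r} expects {self.schema[col].__name__}")
        if self.ordered and key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows.setdefault(key, {}).update(row)  # in-place (partial) update

    def get(self, key: Any) -> Optional[Row]:
        return self._rows.get(key)

    def delete(self, key: Any) -> None:
        if self._rows.pop(key, None) is not None and self.ordered:
            self._keys.remove(key)

    def scan(self, start: Any = None, stop: Any = None,
             filter_fn: Filter = None) -> Iterator[Tuple[Any, Row]]:
        # Ordered tables honour a [start, stop) key range; unordered ones ignore it.
        keys = self._keys if self.ordered else list(self._rows)
        for key in keys:
            if self.ordered and start is not None and key < start:
                continue
            if self.ordered and stop is not None and key >= stop:
                break
            row = self._rows[key]
            if filter_fn is None or filter_fn(row):
                yield key, row


class FTable:
    """Append-only table ordered by time: no in-place updates or deletes."""
    def __init__(self) -> None:
        self._rows: List[Tuple[int, Row]] = []

    def append(self, ts: int, row: Row) -> None:
        if self._rows and ts < self._rows[-1][0]:
            raise ValueError("this sketch only accepts non-decreasing timestamps")
        self._rows.append((ts, dict(row)))

    def scan(self, filter_fn: Filter = None) -> Iterator[Tuple[int, Row]]:
        for ts, row in self._rows:
            if filter_fn is None or filter_fn(row):
                yield ts, row
```

The split mirrors the "Suitable for" guidance above: mutable, keyed data goes to an MTable, while immutable, time-ordered facts go to an FTable that never pays for in-place updates.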
EXT: Data Ingestion capabilities
Kafka Sink
• Configure as Kafka sink
• Push multiple topics across all servers
Java/REST
• Implement your own client using Java APIs
• Directly ingest (PUT) data using REST APIs
DataFrame
• Spark import through streaming or files
• Persist DataFrames as Ampool Tables
…
Direct (stream) ingestion or import through frameworks
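The slide says the Kafka sink can push multiple topics across all servers but not how records are placed. The sketch below assumes hash partitioning on the record key, a common choice in distributed stores, so that every producer agrees on placement without coordination. `route` and `ingest` are hypothetical helpers; `zlib.crc32` is used because Python's built-in `hash()` of strings is randomized per process.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, dict]  # (topic, key, value)


def route(key: str, n_servers: int) -> int:
    """Pick a server for a key with a stable hash (assumed placement rule)."""
    return zlib.crc32(key.encode("utf-8")) % n_servers


def ingest(records: Iterable[Record], n_servers: int) -> Dict[int, List[Record]]:
    """Spread records from any number of topics across all servers."""
    per_server: Dict[int, List[Record]] = defaultdict(list)
    for topic, key, value in records:
        per_server[route(key, n_servers)].append((topic, key, value))
    return per_server


records = [("clicks", f"user{i}", {"n": i}) for i in range(100)]
records += [("orders", f"user{i}", {"amt": i}) for i in range(100)]
print({server: len(rows) for server, rows in sorted(ingest(records, 4).items())})
```

Because placement depends only on the key, the click and order events of one user land on the same server, whichever topic they arrive on.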
EXT: Data Processing options
External Tables
• Query using HiveQL
• Column projections
• Filter pushdowns
DataFrame
• Computations
• Column selection
• Value filters
• Query with Spark SQL
Pandas/Frames
• Language bindings to manipulate data
• Co-processors for data science using APIs
…
Access data programmatically or through structured queries
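Column projection and filter pushdown mean the store evaluates the predicate and drops unrequested columns before anything crosses the network, instead of shipping whole rows to Hive or Spark. A minimal Python sketch of the idea, assuming rows are plain dicts; `scan_with_pushdown` and `cells` are hypothetical names, not connector APIs.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]


def scan_with_pushdown(rows: Iterable[Row],
                       columns: Optional[Sequence[str]] = None,
                       predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Server-side scan: filter and project where the data lives, so only
    matching rows and requested columns are returned to the engine."""
    result: List[Row] = []
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # filter pushdown: non-matching rows never leave the server
        # column projection: copy only the requested columns (all if none given)
        result.append({c: row[c] for c in columns} if columns else dict(row))
    return result


def cells(rows: List[Row]) -> int:
    """Count values shipped, a rough proxy for bytes on the wire."""
    return sum(len(r) for r in rows)


# 1,000 rows with 5 columns; the query needs 2 columns of rows with amount > 900.
table = [{"id": i, "amount": i, "city": "SF", "channel": "web", "note": ""}
         for i in range(1000)]
pushed = scan_with_pushdown(table, columns=["id", "amount"],
                            predicate=lambda r: r["amount"] > 900)
print(cells(table), cells(pushed))  # 5000 198
```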
EXT: Data Persistence capabilities
Local FileSystem
• Write-ahead Log for all data within memory. Used for server recovery
Change Data Capture (CDC) (MTable)
• Java API to capture data changes in MTable
• E.g., implement your own JDBC listener
Tiered Data Persistence (FTable)
• Move older/less used data to next tier (local/remote)
• Seamless scan on tiered data in ORC/Parquet format
Options for system availability/recovery, data tiering and archiving
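The write-ahead log and change-data-capture options above can be illustrated together: each mutation is appended to a log before it is applied in memory, so replaying the log rebuilds the table after a restart, and registered listeners see every change as it happens. This is a minimal Python sketch; `WalBackedTable`, its JSON-lines log format and the listener signature are assumptions, not Ampool's on-disk format or its CDC API.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Optional

Listener = Callable[[str, str, Any], None]  # (op, key, value)


class WalBackedTable:
    """Hypothetical in-memory table protected by a write-ahead log,
    with listeners as a stand-in for change data capture."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data: Dict[str, Any] = {}
        self.listeners: List[Listener] = []
        self._replay()  # recovery: rebuild memory state from the log

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                entry = json.loads(line)
                self._apply(entry["op"], entry["key"], entry.get("value"))

    def _apply(self, op: str, key: str, value: Any) -> None:
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)

    def _log_and_apply(self, op: str, key: str, value: Optional[Any] = None) -> None:
        # Log first and fsync, so an acknowledged write survives a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self._apply(op, key, value)
        for listener in self.listeners:  # CDC: notify after the change is durable
            listener(op, key, value)

    def put(self, key: str, value: Any) -> None:
        self._log_and_apply("put", key, value)

    def delete(self, key: str) -> None:
        self._log_and_apply("delete", key)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.wal")
    table = WalBackedTable(path)
    table.listeners.append(lambda op, key, value: print("change:", op, key))
    table.put("a", 1)
    table.delete("a")
    table.put("b", 2)
    print(WalBackedTable(path).data)  # {'b': 2}
```

A production log would also need compaction or checkpoints so replay time stays bounded; the sketch replays the whole file.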
How is Ampool ADS deployed?
[Diagram: one Locator, four Servers and four Clients]
• Clients store, scan & retrieve data
• Clients connect directly (REST, Java), through data ingest (Kafka) or through compute engines (Spark)
• Locators provide up-to-date topology info to Clients and Servers
• Servers communicate to maintain data (load) balance & consistency
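The deployment above splits responsibilities: locators track which server owns which data, and clients use that view to talk to servers directly. A minimal Python sketch, assuming key-hashed buckets (Apache Geode's partitioned regions also divide data into buckets) and a simple round-robin bucket assignment. `Locator`, `Client` and `n_buckets` are hypothetical names, not Ampool or Geode APIs.

```python
import zlib
from typing import Dict, List, Tuple


class Locator:
    """Hypothetical locator: owns the bucket -> server map and its version."""

    def __init__(self, servers: List[str], n_buckets: int = 16) -> None:
        self.n_buckets = n_buckets
        self.version = 0
        self.assign(servers)

    def assign(self, servers: List[str]) -> None:
        # Round-robin buckets over the current servers (e.g. after a server
        # joins) and bump the version so clients know their view is stale.
        self.buckets = {b: servers[b % len(servers)] for b in range(self.n_buckets)}
        self.version += 1

    def topology(self) -> Tuple[int, Dict[int, str]]:
        return self.version, dict(self.buckets)


class Client:
    """Caches the topology and sends each key straight to its bucket's owner."""

    def __init__(self, locator: Locator) -> None:
        self.locator = locator
        self.version, self.buckets = locator.topology()

    def server_for(self, key: str) -> str:
        # A real client would discover staleness from a failed or redirected
        # request; comparing versions directly keeps this sketch short.
        if self.version != self.locator.version:
            self.version, self.buckets = self.locator.topology()
        bucket = zlib.crc32(key.encode("utf-8")) % self.locator.n_buckets
        return self.buckets[bucket]


locator = Locator(["server1", "server2"])
client = Client(locator)
print(client.server_for("user42"))
locator.assign(["server1", "server2", "server3"])  # a third server joins
print(client.server_for("user42"), client.version)
```

Round-robin reassignment moves more buckets than necessary when membership changes; it stands in for whatever rebalancing policy the servers actually use.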
Deployment, Management & Monitoring
Deployment & Service Management
• Management REST APIs for service setup
• JMX endpoints for complete management
• Memory Analytics SHell (MASH)
Monitoring & Performance Management
• JMX attributes with complete coverage
• Statistical metric sampling for diagnostics & tuning
Enterprise-grade Security
• Kerberos Authentication
• LDAP authorization for users, roles & data access
Production-ready services with deployment & management flexibility
Ampool in Event-Driven Architectures
Event-Driven Architecture
Event Generation: Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External
Event Transport: HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts
Event Processing: Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching
Analytics & Serving: Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms
Runtime: API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing
Deployment, Management, Monitoring, Auditing, Governance, Security
Adapted from @rseroter
https://content.pivotal.io/blog/how-to-deliver-an-event-driven-architecture
Ampool Simplifies Event-Driven Architecture
Event Generation: Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External
Event Transport: HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts
Ampool ADS & Event Processing Engines: Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching
Ampool ADS for Analytics & Serving: Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms
Runtime: API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing
Deployment, Management, Monitoring, Auditing, Governance, Security
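With Ampool covering both event processing and analytics & serving, the same store that absorbs events can answer queries about them. A minimal Python sketch of that pattern, assuming a stream of (user, amount) events: aggregates are updated in place as each event arrives and served directly, with no staging copy or batch ETL step in between. `ActiveStore` and `UserStats` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UserStats:
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class ActiveStore:
    """Hypothetical single store that both absorbs events and serves results."""

    def __init__(self) -> None:
        self.stats: Dict[str, UserStats] = {}

    def on_event(self, user: str, amount: float) -> None:
        # Update the aggregate in place as each event arrives.
        s = self.stats.setdefault(user, UserStats())
        s.count += 1
        s.total += amount

    def serve(self, user: str) -> UserStats:
        # Serving reads the same, always-current state; unknown users get zeros.
        return self.stats.get(user, UserStats())


store = ActiveStore()
for user, amount in [("alice", 10.0), ("bob", 5.0), ("alice", 30.0)]:
    store.on_event(user, amount)
print(store.serve("alice").count, store.serve("alice").mean)  # 2 20.0
```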
In summary, use Ampool ADS to…
Create an analytical foundation for Apps
• Understand usage in real-time
• Learn from App’s data ‘exhaust’
Reduce operational complexity
• Replace multiple single-function stores with
a single, versatile in-memory store
Get in-memory processing speed-up
• Low-latency responses
• Serve multiple data processes & tenants,
reducing data copies
One More Thing,
As of today, Ampool ADS is Open Source
Project name “Monarch”
Apache License (ASLv2)
• Powered by Apache Geode
Includes several connectors
• Spark (1.6 & 2.x), Hive (1.2.x & 2.x), PrestoDB, Apache Kafka, R, Python
Contributions welcome!
Give it a try: http://github.com/ampool/monarch
Available on AWS Marketplace
Free Single Node AMI (EC2 charges apply)
• https://aws.amazon.com/marketplace/pp/B077D81DD1
Multi-Node Ampool ADS Cluster
• https://aws.amazon.com/marketplace/pp/B0784YHDW8
• Single Click Deployment
• Local SSD Storage (no EBS costs)
• Autoscaling
• m3.2xlarge instances (More coming soon)
• US-East & US-West Regions (More coming soon)
• 31-Day Free Trial
• Support by Email & Web-based Ticketing
• Annual Subscription Discount
Ampool ADS v2.0 (Coming Soon)
Notable new features
• Support for in-memory columnar storage in FTables
• Support for partition pruning
• Several fold performance gains with filter pushdowns
• Support for fast data ingestion from Kafka topics
• Integration with Kafka Connect
• New PrestoDB Connector
• New Apache Calcite Connector
• Delta Persistence
Plus several performance improvements and stability fixes
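Partition pruning, listed among the v2.0 features, lets a scan skip whole partitions that cannot contain matching rows. A minimal Python sketch, assuming rows are (timestamp, row) pairs split into fixed-width time partitions with recorded min/max timestamps; `Partition`, `build_partitions` and `scan_range` are hypothetical names, not the v2.0 implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

TimedRow = Tuple[int, dict]


@dataclass
class Partition:
    min_ts: int
    max_ts: int
    rows: List[TimedRow]


def build_partitions(rows: Iterable[TimedRow], width: int) -> List[Partition]:
    """Group rows into fixed-width time partitions, recording min/max ts."""
    groups: Dict[int, List[TimedRow]] = {}
    for ts, row in rows:
        groups.setdefault(ts // width, []).append((ts, row))
    return [Partition(min(t for t, _ in g), max(t for t, _ in g), g)
            for _, g in sorted(groups.items())]


def scan_range(partitions: List[Partition], lo: int,
               hi: int) -> Tuple[List[TimedRow], int]:
    """Return rows with lo <= ts < hi and the number of partitions scanned."""
    out: List[TimedRow] = []
    scanned = 0
    for p in partitions:
        if p.max_ts < lo or p.min_ts >= hi:
            continue  # pruned: no row in this partition can match
        scanned += 1
        out.extend((ts, row) for ts, row in p.rows if lo <= ts < hi)
    return out, scanned


parts = build_partitions(((t, {"v": t}) for t in range(100)), width=10)
rows, scanned = scan_range(parts, 25, 35)
print(len(parts), len(rows), scanned)  # 10 10 2
```

Only the two partitions covering timestamps 20 to 39 are read; the other eight are skipped using their min/max metadata alone.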
Try out Ampool today!
Download: http://www.ampool.io/product
Code: https://github.com/ampool/monarch
Single Node AMI: https://aws.amazon.com/marketplace/pp/B077D81DD1
Ampool Cluster: https://aws.amazon.com/marketplace/pp/B0784YHDW8
Documentation: http://docs.ampool-inc.com/
Support: support@ampool.io
Discuss: https://groups.google.com/forum/#!forum/ampool-users

More Related Content

What's hot

Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
DataWorks Summit
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
Mark Kromer
 

What's hot (20)

SAP HORTONWORKS
SAP HORTONWORKSSAP HORTONWORKS
SAP HORTONWORKS
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Tarun poladi resume
Tarun poladi resumeTarun poladi resume
Tarun poladi resume
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 

Similar to Real-time Analytics for Data-Driven Applications

Unifying Analytics
Unifying AnalyticsUnifying Analytics
Unifying Analytics
Data Con LA
 

Similar to Real-time Analytics for Data-Driven Applications (20)

SnappyData @ Seattle Spark Meetup
SnappyData @ Seattle Spark MeetupSnappyData @ Seattle Spark Meetup
SnappyData @ Seattle Spark Meetup
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Unify Analytics: Combine Strengths of Data Lake and Data Warehouse
Unify Analytics: Combine Strengths of Data Lake and Data WarehouseUnify Analytics: Combine Strengths of Data Lake and Data Warehouse
Unify Analytics: Combine Strengths of Data Lake and Data Warehouse
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of Business
 
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeHybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
How to Use a Semantic Layer on Big Data to Drive AI & BI Impact
How to Use a Semantic Layer on Big Data to Drive AI & BI ImpactHow to Use a Semantic Layer on Big Data to Drive AI & BI Impact
How to Use a Semantic Layer on Big Data to Drive AI & BI Impact
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Unifying Analytics
Unifying AnalyticsUnifying Analytics
Unifying Analytics
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 

More from VMware Tanzu

More from VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Real-time Analytics for Data-Driven Applications

  • 1. Real-Time Analytics for Data-Driven Applications Milind Bhandarkar, Founder & CEO, Ampool @techmilind 1
  • 2. 2 Increasing demand for intelligent experiences! Immediate Fulfillment Anywhere, Real-time Powered by Analytics Ongoing Value ∞ …powered by all available data! Transactions Points User actions Workflow Location Social Financial Behavioral Contextual Hot (Fresh) Data Real-time Actions …and actionable, timely insights driving value! $$$ Business Value
  • 3. Real-Time Enterprise “Companies need to learn how to catch people or things in the act of doing something and affect the outcome” -Paul Maritz Executive Chairman, Pivotal The Real-Time Enterprise is an enterprise that competes by using up-to-date information to progressively remove delays to the management and execution of its critical business processes” - Gartner (https://www.gartner.com/doc/372176/gartner-definition-realtime-enterprise) 3 Core Problem: Real-Time, Personalized, Actionable Information, in the Current Context
  • 4. Meanwhile, enterprises suffer from “Data Blackout Periods” CONFIDENTIAL AND PROPRIETARY | 4 ~12-48 Hours Data Extracts, Data Staging, Complex Joins, ETL, Data Loading, Bulk Updates, Format Conversions, File-Base Data Exchanges OLTP RDBMS, NoSQL OLAP Data Warehouse, Data Lake Apps, APIs, Services (External) BI, Analytics (Internal)
  • 5. Ampool Mission: Eliminate “Data Blackout Periods” Ingest and update active data in real time Analyze using “best-of-breed” engines Serve data concurrently to multiple tenants/appsOLTP RDBMS, NoSQL OLAP Data Warehouse, Data Lake Apps, APIs, Services (External) BI, Analytical Apps (Internal) Modern Data-Driven Applications Capture and Deliver Value Now with Ampool’s Robust Memory Layer
  • 6. What differentiates Ampool? Fast continuous ingestion, in-place real-time updates, no ETL, memory-speed analytics, flexible processing, low-latency serving. Diagram: OLTP RDBMS, NoSQL; OLAP data warehouse, data lake; apps, APIs, services (external); BI, analytical apps (internal); modern data-driven applications. Designed to support both transactional and analytical workloads, with best-of-breed engines and robust in-memory technology.
  • 7. Data-driven apps require several capabilities… Analyze: support streaming, batch/machine learning, and interactive querying. Store: flexibility in storing data; keep up with fast ingestion needs. Serve: serve processed data (aggregates or insights) at scale and speed. (Diagram: App, Persistence.)
  • 8. …that are well supported by Ampool ADS, for ALL data processing needs near applications: 1. Store ALL active data and update it as required. 2. Analyze through ‘best-of-breed’ compute engines and frameworks. 3. Serve data concurrently to multiple data processing stages, tenants, and applications. An Active Data Store between compute and long-term persistence: manage hot data in memory; process where data is stored; primary store, not a cache!
  • 9. Powered by Apache Geode©: in-memory distributed system, low-latency communications, key-value store, function pushdown, high throughput, table abstractions, native interface, pluggable persistence, Java API, MASH (CLI extension), smart data tiering, mature event model, tunable consistency, metadata/catalog, security (AuthZ).
  • 10. Built With An Extensible Architecture. Layers: compute frameworks (Spark, Hive, EsgynDB, Apex, CDAP, Flink, Storm, HAWQ…); storage handlers, native API (Java, REST), shell; security: authentication & authorization; metadata, type system, statistics, & smart tiering logic; data distribution & operators (filters, projections…); off-heap DRAM + extended memory (SCM, NVMe flash); layouts & replication; recovery & persistent secondary stores (S3, ADLS, HDFS, HBase, MPP DB…).
  • 11. Example Use-Case: Enabling Real-Time Apps, Removing Complexity. (Before/after diagram.)
  • 12. Multiple Verticals & Use-cases. Financial Services: • Fraud detection • Credit/market risks • Event-based marketing. Telecom: • Network/quality optimization • Mobile user analysis • Event-based marketing. Retail: • Targeted digital offers • Markdown optimization • Event-based targeting. Media: • Content/ad delivery • Event/behavior-based targeting. Anomaly Detection: • Event/activity monitoring • Real-time automated decisions. IoT Analytics: • Device management • Communications optimization. 360 Customer Analytics: • Social media sentiment analysis • Event-based ad targeting.
  • 13. Initial Performance Benchmarks. [Charts: throughput vs. number of clients (4 to 288) for YCSB Workloads A, B, and C, Ampool vs. HBase.]
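Workloads A, B, and C in these charts are YCSB's standard core workloads, which differ in their read/update mix: A is update-heavy (50% reads, 50% updates), B is read-mostly (95% reads, 5% updates), and C is read-only. The sketch below generates an operation stream with those proportions. It is illustrative only: keys are drawn uniformly (YCSB defaults to a Zipfian request distribution), and `generate_ops` is a hypothetical helper, not part of YCSB.

```python
import random
from collections import Counter

# Read/update proportions of the YCSB core workloads A, B, and C.
WORKLOAD_MIX = {
    "A": {"read": 0.50, "update": 0.50},  # update heavy
    "B": {"read": 0.95, "update": 0.05},  # read mostly
    "C": {"read": 1.00, "update": 0.00},  # read only
}


def generate_ops(workload, n_ops, record_count=1000, seed=42):
    """Yield (operation, key) pairs following the workload's read/update mix.

    Keys are drawn uniformly for simplicity (an assumption; real YCSB
    defaults to a Zipfian request distribution).
    """
    rng = random.Random(seed)
    read_fraction = WORKLOAD_MIX[workload]["read"]
    for _ in range(n_ops):
        # random() is in [0, 1), so a read fraction of 1.0 always reads
        op = "read" if rng.random() < read_fraction else "update"
        key = f"user{rng.randrange(record_count)}"
        yield op, key


if __name__ == "__main__":
    for w in "ABC":
        counts = Counter(op for op, _ in generate_ops(w, 10_000))
        print(w, dict(counts))
```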
  • 14. Customer Analytical Queries. [Chart: Analytical Query Performance, query time in seconds (lower is better) for queries 1 to 10, HBase vs. Ampool.]
  • 15. Single Node Performance. [Charts: ops/sec vs. number of clients (40, 80, 160) for YCSB Workloads A, B, and C, MySQL vs. Ampool; average latency in microseconds at 80 clients for each workload, MySQL vs. Ampool.]
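The latency and throughput panels are linked. In a closed-loop benchmark, where each client waits for a response before issuing its next request, Little's law gives throughput X = N / (R + Z) for N clients, average latency R, and think time Z. A minimal sketch, assuming zero think time unless one is given; the function names are hypothetical.

```python
def closed_loop_throughput(clients, avg_latency_us, think_time_us=0):
    """Little's law for a closed system: X = N / (R + Z).

    clients:        N, number of concurrent clients
    avg_latency_us: R, average response time in microseconds
    think_time_us:  Z, idle time between a response and the next request
    Returns throughput in operations per second.
    """
    cycle_us = avg_latency_us + think_time_us
    if clients <= 0 or cycle_us <= 0:
        raise ValueError("clients and cycle time must be positive")
    return clients * 1_000_000 / cycle_us


def implied_latency_us(clients, ops_per_sec):
    """Average latency implied by a measured throughput, assuming Z = 0."""
    if clients <= 0 or ops_per_sec <= 0:
        raise ValueError("clients and throughput must be positive")
    return clients * 1_000_000 / ops_per_sec


if __name__ == "__main__":
    # 80 clients, each averaging 1,000 microseconds per operation
    print(closed_loop_throughput(80, 1000))
```

For example, 80 clients averaging 1,000 µs per operation yield 80,000 ops/sec. The formula is a consistency check on measured numbers, not a substitute for measuring them.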
  • 16. Let’s take a closer look! Modular components for deployment flexibility and extensibility, powered by Apache Geode. (1) ADS Core based on Apache Geode: • Tabular objects/structures • APIs for extending capabilities • Compute, storage & import/export • User-defined functions (co-processors). (2) Pre-built connectors for: • Data ingest/export paths • Data processing (compute frameworks). (3) Pre-built extensions for persistence: • From on-prem shared FS to cloud storage.
  • 17. Ampool Core: Objects & Structures. Options for different data needs & workloads. Region: Get/Put/Delete operations; arbitrary object values; JSON support (using PDX); query-able (using OQL); filters (function execution). Suitable for user data with smaller dimensions and direct app interactions. MTable: Get/Put/Delete/Scan; typed columns (Hive-style); ordered or unordered; filters & co-processors; HBase-like APIs. Suitable for dimensional/reference data and frequent updates. FTable: Get/Append/Scan (immutable); typed columns (Hive-style); ordered-only, typically by time; filters, scan, append, & bulk mutation APIs. Suitable for continuously flowing factual data (transactions or time series).
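To make the semantic differences concrete, here is a toy, single-process Python model of two of the three structures: a mutable ordered table (MTable-like: get/put/delete/scan with in-place updates) and an append-only, time-ordered table (FTable-like: append/scan, no updates). A Region behaves roughly like a plain key-value map, so it is omitted. The class and method names are hypothetical; this is not Ampool's actual Java API.

```python
import bisect
from typing import Any, Dict, List

Row = Dict[str, Any]


class SketchMTable:
    """Mutable, key-ordered table: get/put/delete/scan with in-place updates."""

    def __init__(self) -> None:
        self._keys: List[str] = []      # kept sorted for ordered range scans
        self._rows: Dict[str, Row] = {}

    def put(self, key: str, row: Row) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(row)     # insert, or overwrite in place

    def get(self, key: str):
        return self._rows.get(key)

    def delete(self, key: str) -> None:
        if self._rows.pop(key, None) is not None:
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start=None, stop=None, predicate=None):
        """Yield (key, row) for start <= key < stop that pass the filter."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if predicate is None or predicate(row):
                yield key, row


class SketchFTable:
    """Append-only table ordered by time: append and scan, no in-place update."""

    def __init__(self, time_column: str = "ts") -> None:
        self._time_column = time_column
        self._times: List[float] = []
        self._rows: List[Row] = []

    def append(self, row: Row) -> None:
        ts = row[self._time_column]
        idx = bisect.bisect_right(self._times, ts)  # late rows keep time order
        self._times.insert(idx, ts)
        self._rows.insert(idx, dict(row))

    def scan(self, t_from=None, t_to=None, predicate=None):
        """Yield rows with t_from <= time < t_to that pass the filter."""
        lo = 0 if t_from is None else bisect.bisect_left(self._times, t_from)
        hi = len(self._times) if t_to is None else bisect.bisect_left(self._times, t_to)
        for row in self._rows[lo:hi]:
            if predicate is None or predicate(row):
                yield row
```

Keeping the FTable model append-only is what makes it a natural fit for facts and time series: rows never change once written, so scans by time range stay simple and cheap.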
  • 18. EXT: Data Ingestion capabilities. Direct (stream) ingestion or import through frameworks. Kafka sink: configure as a Kafka sink; push multiple topics across all servers. Java/REST: implement your own client using Java APIs; directly ingest (PUT) data using REST APIs. DataFrame: Spark import through streaming or files; persist DataFrames as Ampool tables.
  • 19. EXT: Data Processing options. Access data programmatically or through structured queries. External tables: query using HiveQL; column projections; filter pushdowns; computations. DataFrame: column selection; value filters; query with Spark SQL. Pandas/Frames: language bindings to manipulate data; co-processors for data science using APIs.
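Filter and projection pushdown matter because they change how much data leaves the store. The toy model below answers the same query two ways, pulling every row to the client versus evaluating the predicate and column list next to the data, and counts the rows shipped in each case. It illustrates the general technique only; `SketchServer` and `client_side_query` are hypothetical names, not Ampool's connector internals.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]


class SketchServer:
    """Hypothetical storage server holding the rows of one partition."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = [dict(r) for r in rows]
        self.rows_shipped = 0  # rows sent to the client, a proxy for network cost

    def scan_all(self) -> List[Row]:
        """No pushdown: ship every row with every column."""
        self.rows_shipped += len(self.rows)
        return [dict(r) for r in self.rows]

    def scan_pushdown(self, columns: Sequence[str],
                      predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        """Pushdown: apply the filter and projection where the data lives."""
        out = [{c: r[c] for c in columns}
               for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(out)
        return out


def client_side_query(server: SketchServer, columns: Sequence[str],
                      predicate: Callable[[Row], bool]) -> List[Row]:
    """The same query answered on the client after pulling everything."""
    return [{c: r[c] for c in columns} for r in server.scan_all() if predicate(r)]
```

Both paths return identical results; the pushdown path simply ships fewer rows and narrower ones, which is where the speed-up comes from.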
  • 20. EXT: Data Persistence capabilities. Options for system availability/recovery, data tiering, and archiving. Local file system: write-ahead log for all data in memory, used for server recovery. Change Data Capture (CDC) for MTable: Java API to capture data changes in MTable (e.g., implement your own JDBC listener). Tiered data persistence for FTable: move older/less-used data to the next tier (local/remote); seamless scan on tiered data in ORC/Parquet format.
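Tiered persistence with a seamless scan can be sketched as follows, under simplifying assumptions: rows arrive in time order, the in-memory tier holds a fixed number of rows, and the "next tier" is a list of JSON-lines segments standing in for ORC/Parquet files. `SketchTieredLog` is a hypothetical name; Ampool's actual eviction policy and file formats are not modeled here.

```python
import json
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


class SketchTieredLog:
    """Time-ordered, append-only data with a bounded in-memory tier."""

    def __init__(self, max_in_memory: int, segment_size: int) -> None:
        if max_in_memory < 0 or segment_size < 1:
            raise ValueError("need max_in_memory >= 0 and segment_size >= 1")
        self.max_in_memory = max_in_memory
        self.segment_size = segment_size
        self.memory: List[Row] = []    # newest rows, in time order
        self.segments: List[str] = []  # older rows, serialized as JSON lines

    def append(self, row: Row) -> None:
        # Assumes rows arrive in non-decreasing "ts" order.
        self.memory.append(dict(row))
        while len(self.memory) > self.max_in_memory:
            # Move the oldest batch to the next tier.
            batch = self.memory[: self.segment_size]
            self.memory = self.memory[self.segment_size:]
            self.segments.append("\n".join(json.dumps(r) for r in batch))

    def scan(self, t_from: float = float("-inf")) -> Iterator[Row]:
        """Seamless scan: older tiered segments first, then memory."""
        for segment in self.segments:
            for line in segment.splitlines():
                row = json.loads(line)
                if row["ts"] >= t_from:
                    yield row
        for row in self.memory:
            if row["ts"] >= t_from:
                yield row
```

The caller never sees which tier a row came from, which is the point of a seamless scan. A real implementation would also skip whole segments using per-file time ranges instead of decoding each one.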
  • 21. How is Ampool ADS deployed? (Diagram: a locator, several servers, and several clients.) • Clients store, scan & retrieve data, directly (REST, Java), through data ingest (Kafka), or through compute engines (Spark) • Locators provide up-to-date topology info to clients and servers • Servers communicate to maintain data (load) balance & consistency
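The locator's role can be sketched as a small routing model. Keys hash into a fixed number of buckets (113 here, matching Apache Geode's default bucket count for partitioned regions), the locator assigns buckets to servers, and clients cache that assignment and refresh it when the topology version changes. All names are hypothetical, crc32 stands in for whatever hash the real system uses, and the client polling the locator's version is a simplification of how topology updates actually propagate.

```python
import zlib
from typing import Dict, List, Tuple


class SketchLocator:
    """Hypothetical locator: owns the bucket -> server assignment."""

    def __init__(self, servers: List[str], num_buckets: int = 113) -> None:
        self.num_buckets = num_buckets
        self.servers = list(servers)
        self.version = 0
        self._rebalance()

    def _rebalance(self) -> None:
        # Round-robin buckets over current servers to balance load.
        self.assignment: Dict[int, str] = {
            b: self.servers[b % len(self.servers)] for b in range(self.num_buckets)
        }
        self.version += 1

    def add_server(self, server: str) -> None:
        self.servers.append(server)
        self._rebalance()

    def topology(self) -> Tuple[int, Dict[int, str]]:
        return self.version, dict(self.assignment)


class SketchClient:
    """Caches topology from the locator and routes each key to its server."""

    def __init__(self, locator: SketchLocator) -> None:
        self.locator = locator
        self.version, self.assignment = locator.topology()

    def bucket_for(self, key: str) -> int:
        # Stable hash, unlike Python's per-process salted hash().
        return zlib.crc32(key.encode("utf-8")) % self.locator.num_buckets

    def server_for(self, key: str) -> str:
        if self.version != self.locator.version:  # topology changed: refresh
            self.version, self.assignment = self.locator.topology()
        return self.assignment[self.bucket_for(key)]
```

Routing on the client avoids an extra hop through a coordinator. The round-robin rebalance is deliberately naive: adding a server reshuffles many buckets, whereas a production rebalancer moves as few as possible.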
  • 22. Deployment, Management & Monitoring. Deployment & service management: management REST APIs for service setup; JMX endpoints for complete management; Memory Analytics SHell (MASH). Monitoring & performance management: JMX attributes with complete coverage; statistical metric sampling for diagnostics & tuning. Enterprise-grade security: Kerberos authentication; LDAP authorization for users, roles & data access. (Diagram labels: REST, JMX, CLI, Stats, JMC, Security, LDAP, Kerberos.) Production-ready services with deployment & management flexibility.
  • 23. Ampool in Event-Driven Architectures
  • 24. Event-Driven Architecture. Event Generation: Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External. Event Transport: HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts. Event Processing: Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching. Analytics & Serving: Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms. Runtime: API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing, Deployment Management, Monitoring, Auditing, Governance, Security. Adapted from @rseroter, https://content.pivotal.io/blog/how-to-deliver-an-event-driven-architecture
  • 25. Ampool Simplifies Event-Driven Architecture. Components: Mobile Applications, PoS, Call Centers, Web Applications, IoT Devices, Business Systems, External; HTTP, Message Queues, Log Files, Web Sockets, HTTP Streaming, Polling, Extracts; Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching; Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms; API Gateway, Continuous Delivery, Auto Scaling, Service Discovery, Long-Lived Service Hosting, Functions (Serverless), Load Balancing, Deployment Management, Monitoring, Auditing, Governance, Security. Stage labels: Event Generation, Event Transport, Runtime.
  • 26. Ampool ADS for Analytics & Serving: Stream-Brokers, Data Warehouse, Log Stores, State Caches, Relational DB, NoSQL DB, ML Training Platforms.
  • 27. Ampool ADS & Event Processing Engines: Web-App Backend, Micro-batching, Data Pipelines, Microservices, Stream Processing, Triggers, Fast Batching.
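As a minimal illustration of event processing and serving sharing one store, the sketch below applies a stream of (user, amount) events with in-place per-user updates, records an alert the first time a user's running total crosses a threshold, and answers reads from the same state with no ETL hop. The event shape, threshold, and class names are hypothetical; a real deployment would use a stream processor and a distributed store rather than Python dictionaries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float]  # (user_id, amount): hypothetical event shape


class SketchActiveStore:
    """One store that stream processing updates and applications read."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def apply(self, event: Event) -> None:
        user, amount = event
        self.totals[user] += amount  # in-place update, no ETL hop
        self.counts[user] += 1

    def serve(self, user: str) -> Dict[str, float]:
        # .get() on a defaultdict does not insert missing keys
        n = self.counts.get(user, 0)
        total = self.totals.get(user, 0.0)
        return {"count": n, "total": total, "avg": total / n if n else 0.0}


def process(stream: Iterable[Event], store: SketchActiveStore,
            alert_total: float) -> List[str]:
    """Apply events as they arrive; flag users whose total first crosses alert_total."""
    alerts = []
    for user, amount in stream:
        before = store.totals.get(user, 0.0)
        store.apply((user, amount))
        if before < alert_total <= store.totals[user]:
            alerts.append(user)
    return alerts
```

Because the serving path reads the same state that processing just updated, a query issued right after an event sees its effect, which is the "no blackout period" property the deck argues for.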
  • 28. In summary, use Ampool ADS to… Create an analytical foundation for apps: • Understand usage in real time • Learn from the app’s data ‘exhaust’. Reduce operational complexity: • Replace multiple single-function stores with a single, versatile in-memory store. Get in-memory processing speed-up: • Low-latency responses • Serve multiple data processes & tenants, reducing data copies.
  • 30. As of today, Ampool ADS is open source. Project name: “Monarch” • Apache License (ASLv2) • Powered by Apache Geode. Includes several connectors: • Spark (1.6 & 2.x), Hive (1.2.x & 2.x), PrestoDB, Apache Kafka, R, Python. Contributions welcome! Give it a try: http://github.com/ampool/monarch
  • 31. Available on AWS Marketplace. Free single-node AMI (EC2 charges apply): • https://aws.amazon.com/marketplace/pp/B077D81DD1. Multi-node Ampool ADS cluster: • https://aws.amazon.com/marketplace/pp/B0784YHDW8 • Single-click deployment • Local SSD storage (no EBS costs) • Autoscaling • m3.2xlarge instances (more coming soon) • US-East & US-West regions (more coming soon) • 31-day free trial • Support by email & web-based ticketing • Annual subscription discount
  • 32. Ampool ADS v2.0 (Coming Soon). Notable new features: • Support for in-memory columnar storage in FTables • Support for partition pruning • Several-fold performance gains with filter pushdowns • Support for fast data ingestion from Kafka topics • Integration with Kafka Connect • New PrestoDB connector • New Apache Calcite connector • Delta persistence. Plus several performance improvements and stability fixes.
  • 33. Try out Ampool today! • Download: http://www.ampool.io/product • Code: https://github.com/ampool/monarch • Single Node AMI: https://aws.amazon.com/marketplace/pp/B077D81DD1 • Ampool Cluster: https://aws.amazon.com/marketplace/pp/B0784YHDW8 • Documentation: http://docs.ampool-inc.com/ • Support: support@ampool.io • Discuss: https://groups.google.com/forum/#!forum/ampool-users

Editor's Notes

  1. We are finding many companies with “Blackout Periods” where critical hot and warm data is not available. Data is still siloed, and companies lose hours and days conditioning the data so that multiple streams can be correlated… in real time. They still have challenges bringing transactional/reference data together with behavioral and contextual data, as an example.