Apache Apex + Apache Geode
In-Memory Streaming, Storage & Analytics
Ashish Tadose
Streaming meets In Memory Data Grid
Apache Geode: Listeners
• CacheWriter / CacheListener
• AsyncEventListener (queue / batch)
• Parallel or Serial
• Conflation
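Conflation in an event queue can be pictured as keeping only the latest pending value per key before a batch is handed to the listener. The sketch below uses plain Java collections rather than the Geode API; `ConflatingQueue`, `offer`, and `drainBatch` are hypothetical names, and the choice to keep a conflated key at its original queue position is a simplification of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a conflating event queue; not a Geode class.
final class ConflatingQueue {
    // Insertion-ordered map: at most one pending event per key.
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Enqueue an update; a newer value overwrites the queued value for the same key.
    void offer(String key, String value) {
        pending.put(key, value);
    }

    // Hand over everything queued so far as one batch and empty the queue.
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, String> e : pending.entrySet()) {
            batch.add(e.getKey() + "=" + e.getValue());
        }
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        ConflatingQueue q = new ConflatingQueue();
        q.offer("k1", "a");
        q.offer("k2", "b");
        q.offer("k1", "c"); // conflates with the earlier k1 event
        System.out.println(q.drainBatch());
    }
}
```

Three updates go in, but the batch carries only two events, which is the point of conflation: the listener sees the latest state per key instead of every intermediate change.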
Apache Geode: Events & Notifications
Register Interest
•Individual Keys OR RegEx for Keys
•Updates Local Copy
•Examples:
• region.registerInterest("key-1");
• region1.registerInterestRegex("[a-z]+");
Continuous Query
•Receive Notification when Query condition met on server
•Example:
– SELECT * FROM /tradeOrder t WHERE t.price > 100.00
Can be DURABLE
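The two subscription styles above differ in what they filter on: register-interest filters on keys, while a continuous query filters on entry values. The following sketch models both filters in plain Java, not with the Geode client API; `SubscriptionSketch`, `keyOfInterest`, and `TradeOrder` are hypothetical names, and whole-key regex matching is an assumption of the sketch.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical model of the two subscription filters; not the Geode client API.
final class SubscriptionSketch {
    // Register-interest by regex: here the whole key must match the pattern.
    static boolean keyOfInterest(String regex, String key) {
        return Pattern.matches(regex, key);
    }

    // Minimal stand-in for an entry in the /tradeOrder region.
    static final class TradeOrder {
        final String id;
        final double price;

        TradeOrder(String id, double price) {
            this.id = id;
            this.price = price;
        }
    }

    // Condition of the continuous query from the slide: t.price > 100.00
    static final Predicate<TradeOrder> PRICE_ABOVE_100 = t -> t.price > 100.00;

    public static void main(String[] args) {
        System.out.println(keyOfInterest("[a-z]+", "abc"));
        System.out.println(keyOfInterest("[a-z]+", "key-1")); // '-' and '1' are outside [a-z]
        System.out.println(PRICE_ABOVE_100.test(new TradeOrder("o1", 150.0)));
    }
}
```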
Apex: Checkpointing Today
[Diagram: a chain of operators connected by Input, Filtered, Enriched, and Output streams; each operator writes Checkpoint State. Labels: Persistence, In-Memory]
Apex: Checkpointing with Geode
[Diagram: the same chain of operators connected by Input, Filtered, Enriched, and Output streams; each operator writes Checkpoint State. Labels: Persistence, In-Memory]
Operator Checkpointing in Geode
Apex operator checkpointing in an IMDG (Geode store)
•Checkpointing is an essential mechanism for fault tolerance
•Apex checkpoints operator state to HDFS
•Slow HDFS checkpointing hurts application performance
•Checkpointing to Geode avoids that performance penalty
•Geode has lower write latency than HDFS
Implementation: GeodeStorageAgent
https://issues.apache.org/jira/browse/APEXCORE-283
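The idea behind a Geode-backed storage agent can be sketched as a store that saves, loads, and deletes serialized operator state keyed by operator id and window id. This is a minimal illustration, not the GeodeStorageAgent from the JIRA issue: `InMemoryCheckpointStore` is a hypothetical class, and its `ConcurrentHashMap` merely stands in for a Geode region.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical checkpoint store; the map stands in for a Geode region.
final class InMemoryCheckpointStore {
    private final Map<String, byte[]> region = new ConcurrentHashMap<>();

    private static String key(int operatorId, long windowId) {
        return operatorId + "/" + windowId;
    }

    // Serialize the operator state and store it under (operatorId, windowId).
    void save(Serializable state, int operatorId, long windowId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        region.put(key(operatorId, windowId), bytes.toByteArray());
    }

    // Restore the state saved for (operatorId, windowId), or null if none exists.
    Object load(int operatorId, long windowId) {
        byte[] data = region.get(key(operatorId, windowId));
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Discard a checkpoint that is no longer needed for recovery.
    void delete(int operatorId, long windowId) {
        region.remove(key(operatorId, windowId));
    }
}
```

Keying by operator id and window id lets recovery ask for the state of one operator at one specific checkpointed window, which is the lookup a restarted operator needs.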
Apex: Input/Output Operators
[Diagram, two panels: an Input Stream flows through an Operator and an Output Operator to an Output Stream, each operator writing Checkpoint State, with output going to an in-memory data store. The first panel labels the data store as a NoSQL database; the second uses a Geode Output operator.]
Geode Output Operator
• Built-in OQL support
• Visualization support
• Persistence options
• Transaction support
Data Streams to Geode Store
Apex + Geode: Future Integrations
• Geode output operator with transactional support
• Input Operator: Ingest data from Geode to Apex DAG
• Distributed Cache Operator
• Scan Operator: Parallel query execution & result retrieval
Geode Transaction Operator
Apex Output Operator to write to Geode store with Transactions
•Apex DAG uses a TransactionableStore to guarantee that records are written exactly once, e.g. JdbcTransactionalStore
•Geode provides transaction support for efficient and safe coordinated operations
•A Geode store using transactions guarantees that records are written exactly once
•A put operator backed by a Geode transactional store can help achieve exactly-once semantics
Implementation: GeodeWindowStore as TransactionableStore
Proposed
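One common way to get exactly-once writes from a transactional store is to commit each window's data together with that window's id, so a window replayed after recovery can be recognized and skipped. The sketch below shows that idea in plain Java; `WindowedTransactionalSink` is a hypothetical class, not the proposed GeodeWindowStore, and the `synchronized` method only stands in for a real store transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical transactional sink; the maps stand in for Geode regions.
final class WindowedTransactionalSink {
    private final Map<String, Integer> totals = new HashMap<>();
    private final Map<String, Integer> buffer = new HashMap<>();
    private long committedWindowId = -1; // id of the last window fully written

    // Accumulate a tuple for the current window; nothing is visible in totals yet.
    void process(String key, int amount) {
        buffer.merge(key, amount, Integer::sum);
    }

    // "Commit": totals and the window id change together in one step.
    synchronized void endWindow(long windowId) {
        if (windowId > committedWindowId) {
            for (Map.Entry<String, Integer> e : buffer.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
            committedWindowId = windowId;
        }
        // A window at or below committedWindowId is a replay; its tuples are dropped.
        buffer.clear();
    }

    int total(String key) {
        return totals.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        WindowedTransactionalSink sink = new WindowedTransactionalSink();
        sink.process("a", 1);
        sink.endWindow(1);
        sink.process("a", 1); // same window replayed after a simulated failure
        sink.endWindow(1);
        System.out.println(sink.total("a"));
    }
}
```

Because the totals are additive, a replayed window would double-count if it were applied again; checking the stored window id is what prevents that.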
Input Operator: Streaming Geode data
Apex Input Operator to read from Geode store
•Apex input operators ingest data from external sources into the Apex DAG
•Geode provides versatile and reliable event distribution to deliver real-time updates to data
• Use case: an Apex operator that streams async events from Geode into the DAG
• Callback events reduce polling cycles over the network
Implementation: GeodeRegionStreamOperator
receives newly added tuples and emits them into the DAG
Proposed
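The callback-instead-of-polling idea can be sketched as a listener that pushes new entries into a local queue, which the operator then drains when the engine asks it for tuples. The class below is a hypothetical illustration in plain Java, not the proposed GeodeRegionStreamOperator; `onEntryCreated` and `emitTuples` are assumed names, loosely modeled on a listener callback and an input operator's emit method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical input operator; not an Apex or Geode class.
final class CallbackInputOperatorSketch {
    // Thread-safe hand-off between the listener thread and the operator thread.
    private final ConcurrentLinkedQueue<String> received = new ConcurrentLinkedQueue<>();

    // Called by the store's listener whenever an entry is added.
    void onEntryCreated(String value) {
        received.add(value);
    }

    // Called by the streaming engine; drains at most maxTuples from the local queue.
    List<String> emitTuples(int maxTuples) {
        List<String> emitted = new ArrayList<>();
        String next;
        while (emitted.size() < maxTuples && (next = received.poll()) != null) {
            emitted.add(next);
        }
        return emitted;
    }
}
```

The operator never asks the store whether anything is new; it only reads what the callbacks have already delivered, so an idle region costs no network round trips.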
Geode Cache Operator
Apex+Geode Cache Operator
•Geode provides efficient Events & Notifications
• Register interest – update local copies
• Continuous Query
• Receive notification when the query condition is met on the server
• E.g. SELECT * FROM /tradeOrder t WHERE t.price > 100.00
•Use the Geode event notification framework to maintain & invalidate the cache.
Implementation: GeodeCacheOperator
maintains a consistent cache based on the subscribed keyset/query
Proposed
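A notification-maintained cache can be sketched as a local map that is filled on first read and then kept current by update and invalidate events rather than by re-reading the remote store. This is an illustrative model in plain Java, not the proposed GeodeCacheOperator; `NotifiedCacheSketch`, `onUpdate`, and `onInvalidate` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical local cache kept consistent by server-side notifications.
final class NotifiedCacheSketch {
    private final Map<String, String> local = new HashMap<>();
    private final Function<String, String> remoteLoader; // stands in for a remote get
    int remoteReads = 0;                                   // counts trips to the remote store

    NotifiedCacheSketch(Function<String, String> remoteLoader) {
        this.remoteLoader = remoteLoader;
    }

    // Serve from the local copy; go to the remote store only on a miss.
    String get(String key) {
        return local.computeIfAbsent(key, k -> {
            remoteReads++;
            return remoteLoader.apply(k);
        });
    }

    // Notification carrying the new value: refresh the local copy in place.
    void onUpdate(String key, String newValue) {
        local.put(key, newValue);
    }

    // Notification without a value: drop the local copy so the next get reloads it.
    void onInvalidate(String key) {
        local.remove(key);
    }
}
```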
Geode Scan Operator
Apex+Geode Scan Operator
•Function Execution provides parallel query execution
•MapReduce-like execution: functions run concurrently on members, and results are collected from the members and sent to the caller
•Use case: a streaming application that depends on a large scan result from an external store
Implementation: GeodeQueryOperator
executes data-dependent queries on a distributed region
emits results into the DAG
Proposed
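The MapReduce-like pattern described above (run the same function on every member's data, then gather the results for the caller) can be sketched with a thread pool standing in for the cluster. This is plain Java, not Geode's Function Execution API; `ParallelScanSketch` and `scan` are hypothetical names, and each inner list plays the role of one member's partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical parallel scan; threads stand in for cluster members.
final class ParallelScanSketch {
    // Apply the filter to every partition concurrently and collect all matches.
    static List<Integer> scan(List<List<Integer>> partitions, Predicate<Integer> filter) {
        ExecutorService members = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<Integer>>> pending = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                // "Map" step: each member filters only its own data.
                pending.add(members.submit(() -> {
                    List<Integer> matches = new ArrayList<>();
                    for (Integer value : partition) {
                        if (filter.test(value)) {
                            matches.add(value);
                        }
                    }
                    return matches;
                }));
            }
            // "Collect" step: gather per-member results in partition order.
            List<Integer> results = new ArrayList<>();
            for (Future<List<Integer>> f : pending) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            members.shutdown();
        }
    }
}
```

Only the matching values cross back to the caller, which is why pushing the scan to the members suits the large-scan use case better than pulling the whole region to one place.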
Questions ???
Thank You …

GenCyber Cyber Security Day Presentation
 

ApexMeetup Geode - Talk2 2016-03-17

  • 1. Apache Apex + Apache Geode: In-Memory Streaming, Storage & Analytics (Ashish Tadose)
  • 2. Streaming meets In Memory Data Grid
  • 3. Apache Geode: Listeners
      • CacheWriter / CacheListener
      • AsyncEventListener (queue / batch)
      • Parallel or Serial
      • Conflation
  • 4. Apache Geode: Events & Notifications
      Register Interest
      • Individual keys OR regex for keys
      • Updates local copy
      • Examples:
          • region.registerInterest("key-1");
          • region1.registerInterestRegex("[a-z]+");
      Continuous Query
      • Receive notification when the query condition is met on the server
      • Example: SELECT * FROM /tradeOrder t WHERE t.price > 100.00
      Can be DURABLE
  • 6. Apex: Checkpointing with Geode
      [Diagram: a chain of operators connected by input, filtered, enriched, and output streams; each operator's checkpoint state goes to in-memory persistence.]
  • 7. Operator Checkpointing in Geode
      Apex operator checkpointing in an IMDG (Geode store)
      • Checkpointing is an essential mechanism for fault tolerance
      • Apex checkpoints operator state to HDFS
      • Slow HDFS checkpointing hurts application performance
      • Checkpointing to Geode aims to keep checkpoints from slowing the application
      • Geode has better write latency than HDFS
      Implementation: GeodeStorageAgent
      https://issues.apache.org/jira/browse/APEXCORE-283
  • 9. Geode Output Operator
      [Diagram: input stream, operator, and Geode output operator, each with checkpoint state; the output stream is written to an in-memory data store.]
      • Built-in OQL support
      • Visualization support
      • Persistence options
      • Transaction support
  • 10. Data Streams to Geode Store
  • 11. Apex + Geode: Future Integrations
      • Geode output operator with transactional support
      • Input Operator: ingest data from Geode into the Apex DAG
      • Distributed Cache Operator
      • Scan Operator: parallel query execution & result retrieval
  • 12. Geode Transaction Operator (proposed)
      Apex output operator that writes to the Geode store with transactions
      • An Apex DAG uses a TransactionableStore to guarantee that records are written exactly once, e.g. JdbcTransactionalStore
      • Geode provides transaction support for efficient and safe coordinated operations
      • A Geode store that uses transactions guarantees that records are written exactly once
      • A put operator backed by a GeodeTransactional store can help achieve exactly-once semantics
      Implementation: GeodeWindowStore as TransactionableStore
  • 13. Input Operator: Streaming Geode data (proposed)
      Apex input operator that reads from the Geode store
      • Apex input operators ingest data from external sources into the Apex DAG
      • Geode provides versatile and reliable event distribution to deliver real-time updates to data
      • Use case: an Apex operator that streams async events from Geode into the DAG
      • Callback events reduce polling cycles over the network
      Implementation: GeodeRegionStreamOperator receives newly added tuples and emits them into the DAG
  • 14. Geode Cache Operator (proposed)
      Apex + Geode cache operator
      • Geode provides efficient events & notifications
          • Register interest: update local copies
          • Continuous query: receive notification when the query condition is met on the server
          • E.g. SELECT * FROM /tradeOrder t WHERE t.price > 100.00
      • Use the Geode event notification framework to maintain & invalidate the cache
      Implementation: GeodeCacheOperator maintains a consistent cache based on the subscribed keyset/query
  • 15. Geode Scan Operator (proposed)
      Apex + Geode scan operator
      • Function Execution provides parallel query execution
      • MapReduce-like execution: the function runs concurrently on members, and the results are collected from the members and sent to the caller
      • Use case: a streaming application that depends on a large scan result from an external store
      Implementation: GeodeQueryOperator executes data-dependent queries on a distributed region and emits the results into the DAG
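
The exactly-once write described for the Geode Transaction Operator (slide 12) can be illustrated with a minimal sketch. It assumes the store commits a window's writes together with that window's id, so a replayed window can be detected and skipped. `WindowIdStoreSketch` and its methods are hypothetical names for illustration; this is an in-memory stand-in, not the Apex `TransactionableStore` interface, the proposed `GeodeWindowStore`, or any Geode API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a transactional store. It is NOT the
// Apex TransactionableStore interface or any Geode class.
class WindowIdStoreSketch {
    private final Map<String, Long> totals = new HashMap<>();
    private long committedWindowId = -1L; // id of the last committed window

    /**
     * Applies one window's updates and records the window id in the same
     * critical section (a real store would use a transaction here).
     * Returns false and changes nothing if the window was already committed.
     */
    synchronized boolean commitWindow(long windowId, Map<String, Long> deltas) {
        if (windowId <= committedWindowId) {
            return false; // replayed window after recovery: skip it
        }
        for (Map.Entry<String, Long> e : deltas.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        committedWindowId = windowId;
        return true;
    }

    synchronized long get(String key) {
        return totals.getOrDefault(key, 0L);
    }

    synchronized long getCommittedWindowId() {
        return committedWindowId;
    }

    public static void main(String[] args) {
        WindowIdStoreSketch store = new WindowIdStoreSketch();
        store.commitWindow(1, Map.of("orders", 3L));
        store.commitWindow(2, Map.of("orders", 2L));
        // Simulated recovery: window 2 is replayed with the same events.
        boolean applied = store.commitWindow(2, Map.of("orders", 2L));
        System.out.println("replay applied: " + applied);           // false
        System.out.println("orders total: " + store.get("orders")); // 5
    }
}
```

The sketch only works if a replayed window carries the same events as the original, which is the idempotency guarantee the notes attribute to Apex. Without the window-id check, the replay of window 2 would double-count its updates.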

Editor's Notes

  1. An IMDG such as Geode hosts data in memory and distributes it across a cluster of commodity servers. IMDGs provide an object-oriented data storage model and APIs for updating data objects, typically in well under a millisecond (depending on the size of the object). This enables streaming computation systems like Apex to use Geode for storing, accessing, and updating fast-changing, "live" data, while maintaining fast access times even as the storage workload grows.
  2. Geode provides versatile and reliable event distribution and handling for cached data. Events are content-based and asynchronous, with distributed notification and continuous querying. Event handler callbacks can be triggered before or after an event:
      • Cache-Listener: an event handler plug-in that receives synchronous, after-event callbacks for modifications to the region and its entries. Use a cache-listener to receive notifications after the data in the region changes.
      • Cache-Writer: an event handler plug-in that receives synchronous, before-event callbacks for modifications to the region and its entries. Use a cache-writer to synchronously persist the region's data in an archival system.
      • Cache-Loader: an event handler plug-in that receives callbacks for cache misses, when a requested key is not present. Use it to populate a value for a key that is not in the cache.
  3. Apex serializes the state of operators to local disks, and then asynchronously copies the serialized state to HDFS. HDFS read/write latency is bounded by disk I/O and staged writes and does not improve beyond a certain point. With the exactly-once recovery mechanism, the platform checkpoints at every window boundary and behaves synchronously, i.e. the operator is blocked until the state is copied to HDFS. For applications with many operator instances, this hurts overall application performance. Apex applications are specified as Directed Acyclic Graphs (DAGs). DAGs express processing logic using operators (vertices) and streams (edges), thereby providing a way to describe complex logic for sequential or parallel execution and to break the application logic into smaller functional components. See more at: https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/#sthash.LcqwfmRS.dpuf
  4. Apex serializes the state of operators to local disks, and then asynchronously copies the serialized state to HDFS. HDFS read/write latency is bounded by disk I/O and staged writes and does not improve beyond a certain point. With the exactly-once recovery mechanism, the platform checkpoints at every window boundary and behaves synchronously, i.e. the operator is blocked until the state is copied to HDFS. For applications with many operator instances, this hurts overall application performance.
  5. The last processed window id is stored along with the application data modified in that window. On recovery and replay, it can be used to detect what was already processed and skip it instead of writing duplicates. This technique makes results available in the database with minimal latency. It requires idempotency, the guarantee that events are always delivered in the same window on replay, which Apex provides.
  6. Geode provides versatile and reliable event distribution and handling for cached data. Events are content-based and asynchronous, with distributed notification and continuous querying. Event handler callbacks can be triggered before or after an event:
      • Cache-Listener: an event handler plug-in that receives synchronous, after-event callbacks for modifications to the region and its entries. Use a cache-listener to receive notifications after the data in the region changes.
      • Cache-Writer: an event handler plug-in that receives synchronous, before-event callbacks for modifications to the region and its entries. Use a cache-writer to synchronously persist the region's data in an archival system.
      • Cache-Loader: an event handler plug-in that receives callbacks for cache misses, when a requested key is not present. Use it to populate a value for a key that is not in the cache.
     Apex applications are specified as Directed Acyclic Graphs (DAGs). DAGs express processing logic using operators (vertices) and streams (edges), thereby providing a way to describe complex logic for sequential or parallel execution and to break the application logic into smaller functional components. See more at: https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/#sthash.LcqwfmRS.dpuf
  7. Events are content-based and asynchronous, with distributed notification and continuous querying. Event handler callbacks can be triggered before or after an event:
      • Cache-Listener: an event handler plug-in that receives synchronous, after-event callbacks for modifications to the region and its entries. Use a cache-listener to receive notifications after the data in the region changes.
      • Cache-Writer: an event handler plug-in that receives synchronous, before-event callbacks for modifications to the region and its entries. Use a cache-writer to synchronously persist the region's data in an archival system.
      • Cache-Loader: an event handler plug-in that receives callbacks for cache misses, when a requested key is not present. Use it to populate a value for a key that is not in the cache.
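
The three callback roles described in the notes (before-event writer, after-event listener, loader on a cache miss) can be sketched as a small self-contained example. `CallbackRegionSketch` is a hypothetical class that models the callbacks with plain `java.util.function` types; it does not use Geode's own `CacheWriter`, `CacheListener`, or `CacheLoader` interfaces. As a simplification, values filled in by the loader do not trigger the writer or listener here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical model of a region with three callback hooks. NOT the Geode API.
class CallbackRegionSketch {
    private final Map<String, String> entries = new HashMap<>();
    private final BiConsumer<String, String> writer;   // synchronous, before the change
    private final BiConsumer<String, String> listener; // synchronous, after the change
    private final Function<String, String> loader;     // called on a cache miss

    CallbackRegionSketch(BiConsumer<String, String> writer,
                         BiConsumer<String, String> listener,
                         Function<String, String> loader) {
        this.writer = writer;
        this.listener = listener;
        this.loader = loader;
    }

    void put(String key, String value) {
        writer.accept(key, value);   // e.g. persist to an archive; an exception aborts the put
        entries.put(key, value);
        listener.accept(key, value); // notify only after the region has changed
    }

    String get(String key) {
        String value = entries.get(key);
        if (value == null) {
            value = loader.apply(key); // cache miss: ask the loader for a value
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CallbackRegionSketch region = new CallbackRegionSketch(
                (k, v) -> log.add("before:" + k),
                (k, v) -> log.add("after:" + k),
                k -> "loaded-" + k);
        region.put("a", "1");
        System.out.println(log);             // [before:a, after:a]
        System.out.println(region.get("a")); // 1
        System.out.println(region.get("b")); // loaded-b
    }
}
```

Because the writer runs before the region is modified, an exception thrown from it aborts the `put` and the listener never fires. That ordering is what makes a before-event hook usable for synchronous persistence.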