3. Apache Geode: Listeners
• CacheWriter / CacheListener
• AsyncEventListener (queue / batch)
• Parallel or Serial
• Conflation
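The conflation bullet above can be illustrated with a small plain-Java model. This is only a sketch, not Geode's queue implementation: the class name `ConflatingQueueSketch`, its methods, and the ordering rule (a conflated key keeps its original queue position) are assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a conflating, batching event queue.
final class ConflatingQueueSketch {
    // Key -> latest pending value, kept in first-enqueue order.
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Enqueue an update; a newer value for an already queued key replaces the older one.
    synchronized void enqueue(String key, String value) {
        pending.put(key, value);
    }

    // Remove and return up to batchSize pending entries, as one listener batch.
    synchronized List<Map.Entry<String, String>> drainBatch(int batchSize) {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
        while (it.hasNext() && batch.size() < batchSize) {
            Map.Entry<String, String> e = it.next();
            batch.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            it.remove();
        }
        return batch;
    }

    public static void main(String[] args) {
        ConflatingQueueSketch queue = new ConflatingQueueSketch();
        queue.enqueue("key-1", "v1");
        queue.enqueue("key-2", "v1");
        queue.enqueue("key-1", "v2"); // conflated with the earlier key-1 update
        System.out.println(queue.drainBatch(10));
    }
}
```

With conflation, a listener that receives batches sees only the latest value per key, which reduces the work done for frequently updated entries.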
4. Apache Geode: Events & Notifications
Register Interest
•Individual Keys OR RegEx for Keys
•Updates Local Copy
•Examples:
• region.registerInterest("key-1");
• region1.registerInterestRegex("[a-z]+");
Continuous Query
•Receive Notification when Query condition met on server
•Example:
– SELECT * FROM /tradeOrder t WHERE t.price > 100.00
Can be DURABLE
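The two subscription styles on this slide can be modelled in plain Java as follows. This is a sketch of the filtering idea only, not the Geode client API: `SubscriptionSketch`, `TradeOrder`, and `onServerUpdate` are hypothetical names, and full-match semantics for the key regex are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical plain-Java model of interest registration and a continuous query.
final class SubscriptionSketch {
    // Hypothetical stand-in for an entry of the /tradeOrder region.
    static final class TradeOrder {
        final String key;
        final double price;
        TradeOrder(String key, double price) {
            this.key = key;
            this.price = price;
        }
    }

    private final Pattern keyInterest;            // models a regex interest on keys
    private final Predicate<TradeOrder> cqFilter; // models the CQ WHERE clause
    final List<String> localCopyUpdates = new ArrayList<>();
    final List<String> cqNotifications = new ArrayList<>();

    SubscriptionSketch(String keyRegex, Predicate<TradeOrder> cqFilter) {
        this.keyInterest = Pattern.compile(keyRegex);
        this.cqFilter = cqFilter;
    }

    // Called for every update on the server; only matching events reach the client.
    void onServerUpdate(TradeOrder order) {
        if (keyInterest.matcher(order.key).matches()) {
            localCopyUpdates.add(order.key); // interest: refresh the local copy
        }
        if (cqFilter.test(order)) {
            cqNotifications.add(order.key);  // CQ: query condition met
        }
    }

    public static void main(String[] args) {
        SubscriptionSketch sub = new SubscriptionSketch("[a-z]+", t -> t.price > 100.00);
        sub.onServerUpdate(new TradeOrder("abc", 50.0));
        sub.onServerUpdate(new TradeOrder("key-1", 150.0));
        System.out.println(sub.localCopyUpdates + " " + sub.cqNotifications);
    }
}
```

The point of the model is that interest is decided by key, while a continuous query is decided by the entry's content.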
6. Apex: Checkpointing with Geode
[Diagram: Input Stream, Operators, Filtered Stream, Enriched Stream, Output Stream; each Operator's Checkpoint State is written to an In-Memory store with Persistence]
7. Operator Checkpointing in Geode
Apex Operator check-pointing in an IMDG (Geode store)
•Checkpointing is an essential mechanism to ensure Fault Tolerance
•Apex checkpoints operator state to HDFS
•Slower HDFS checkpointing hurts application performance
•Checkpointing in Geode ensures that application performance is not impacted
•Geode has better latency for write operations than HDFS.
Implementation: GeodeStorageAgent
https://issues.apache.org/jira/browse/APEXCORE-283
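As a rough illustration of what a checkpoint store keyed by operator and window looks like, here is a plain-Java sketch. It is not the GeodeStorageAgent from APEXCORE-283: the class `InMemoryCheckpointStoreSketch` is hypothetical, a `ConcurrentHashMap` stands in for a Geode region, and Java serialization is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory checkpoint store; the map stands in for a Geode region.
final class InMemoryCheckpointStoreSketch {
    private final Map<String, byte[]> region = new ConcurrentHashMap<>();

    // One entry per (operator, window) checkpoint.
    private static String key(int operatorId, long windowId) {
        return operatorId + "/" + windowId;
    }

    // Serialize the operator state and store it under its checkpoint key.
    void save(Serializable state, int operatorId, long windowId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        region.put(key(operatorId, windowId), bytes.toByteArray());
    }

    // Read back and deserialize the state saved for this operator and window.
    Object load(int operatorId, long windowId) {
        byte[] data = region.get(key(operatorId, windowId));
        if (data == null) {
            throw new IllegalStateException("no checkpoint for " + key(operatorId, windowId));
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Drop a checkpoint that is no longer needed for recovery.
    void delete(int operatorId, long windowId) {
        region.remove(key(operatorId, windowId));
    }

    public static void main(String[] args) {
        InMemoryCheckpointStoreSketch store = new InMemoryCheckpointStoreSketch();
        store.save("operator-state", 1, 42L);
        System.out.println(store.load(1, 42L));
    }
}
```

Because a checkpoint write in this model is a single in-memory put, the operator is not blocked on disk I/O while it completes.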
11. Apex + Geode: Future Integrations
• Geode output operator with transactional support
• Input Operator: Ingest data from Geode to Apex DAG
• Distributed Cache Operator
• Scan Operator: Parallel query execution & result retrieval
12. Geode Transaction Operator
Apex Output Operator to write to Geode store with Transactions
•Apex DAG uses TransactionableStore to guarantee that records are written
exactly once. E.g. JdbcTransactionableStore
•Geode provides transaction support for efficient and safe coordinated operations
•Geode store using transactions guarantees that records are written exactly once
•Put operator backed by a Geode transactional store can help achieve exactly-once
semantics
Implementation: GeodeWindowStore as TransactionableStore
Proposed
13. Input Operator: Streaming Geode data
Apex Input Operator to read from Geode store
•Apex Input operators – Ingest data from external sources into Apex DAG
•Geode provides versatile and reliable event distribution to provide Real Time
updates to data
• Use case – Apex operator to stream async events from Geode into the DAG
• Callback events reduce polling cycles over the network
Implementation: GeodeRegionStreamOperator
receives newly added tuples and emits them into the DAG
Proposed
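A callback-driven input operator of this kind might be structured as below. This plain-Java sketch is not the proposed GeodeRegionStreamOperator: the class `RegionStreamSketch` is hypothetical, its method names only echo the callback and emit hooks of the two frameworks, and a `Consumer` stands in for an Apex output port.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical model: a region callback fills a buffer, the operator drains it.
final class RegionStreamSketch {
    private final Queue<String> buffer = new ConcurrentLinkedQueue<>();

    // Callback side: invoked by the event source when an entry is added.
    void afterCreate(String value) {
        buffer.add(value);
    }

    // Operator side: emit everything that has arrived so far; returns the tuple count.
    int emitTuples(Consumer<String> outputPort) {
        int emitted = 0;
        String tuple;
        while ((tuple = buffer.poll()) != null) {
            outputPort.accept(tuple);
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        RegionStreamSketch operator = new RegionStreamSketch();
        operator.afterCreate("order-1");
        operator.afterCreate("order-2");
        List<String> downstream = new ArrayList<>();
        operator.emitTuples(downstream::add);
        System.out.println(downstream);
    }
}
```

The operator never asks the store for new data; it only drains what the callbacks have already delivered, which is how polling round trips are avoided.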
14. Geode Cache Operator
Apex+Geode Cache Operator
•Geode provides efficient Events & Notifications
• Register interest – update local copies
• Continuous Query
• Receive notification when Query condition met on server
• E.g. SELECT * FROM /tradeOrder t WHERE t.price > 100.00
•Use Geode event notification framework to maintain & invalidate cache.
Implementation: GeodeCacheOperator
maintains consistent cache based on subscribed keyset/query
Proposed
15. Geode Scan Operator
Apex+Geode Scan Operator
•Function Execution provides Parallel Query Execution
•MapReduce like execution - concurrent execution on members & results are
collected from members & sent to caller.
•Use case: Streaming application depending on large scan result from external store
Implementation: GeodeQueryOperator
executes data-dependent queries on a distributed region
emits results into the DAG
Proposed
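The MapReduce-like execution described on this slide can be sketched in plain Java. This is a model of the scatter/gather pattern, not Geode's Function Execution API: `ParallelScanSketch` is a hypothetical name, each inner list stands in for the data hosted by one member, and a thread pool stands in for the cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical model of a parallel scan: filter on every member, gather at the caller.
final class ParallelScanSketch {
    static List<Double> scan(List<List<Double>> members, Predicate<Double> filter) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, members.size()));
        try {
            // Scatter: each member filters its own local data concurrently.
            List<Future<List<Double>>> partials = new ArrayList<>();
            for (List<Double> localData : members) {
                Callable<List<Double>> task = () -> {
                    List<Double> hits = new ArrayList<>();
                    for (Double value : localData) {
                        if (filter.test(value)) {
                            hits.add(value);
                        }
                    }
                    return hits;
                };
                partials.add(pool.submit(task));
            }
            // Gather: collect the partial results in member order.
            List<Double> result = new ArrayList<>();
            for (Future<List<Double>> partial : partials) {
                result.addAll(partial.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Double>> members = Arrays.asList(
                Arrays.asList(50.0, 150.0), Arrays.asList(200.0, 99.5));
        System.out.println(scan(members, price -> price > 100.00));
    }
}
```

Only the matching values travel back to the caller, so a streaming operator that depends on a large scan receives the filtered result rather than the full data set.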
An IMDG like Geode hosts data in memory and distributes it across a cluster of commodity servers.
It provides an object-oriented data storage model and APIs for updating data objects, typically in well under a millisecond (depending on the size of the object).
This enables Streaming computation systems like Apex to use Geode for storing, accessing, and updating fast-changing, “live” data, while maintaining fast access times even as the storage workload grows.
Geode provides versatile and reliable event distribution and handling for your cached data.
Events are content-based and asynchronous; Geode provides distributed notification and continuous querying.
Event handler callbacks can be triggered before or after an event.
Cache-Listener: an event handler plug-in that receives synchronous, after-event callbacks for modifications to the region and its entries. You can use a cache-listener to receive notifications after the data in the region changes.
Cache-Writer: an event handler plug-in that receives synchronous, before-event callbacks for modifications to the region and its entries. You can use a cache-writer to synchronously persist the region's data in an archival system.
Cache-Loader: an event handler plug-in that receives callbacks for cache misses, when a requested key is not present. It can be used to populate a value for a key that is not present in the cache.
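The ordering of the three callback types can be shown with a plain-Java model. This is a sketch of the before/after/miss sequence only, not Geode's plug-in interfaces: `RegionCallbackSketch`, its `callLog`, and the rule that the writer rejects null values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a region with writer, listener and loader callbacks.
final class RegionCallbackSketch {
    private final Map<String, String> entries = new HashMap<>();
    private final Function<String, String> loader; // models a cache-loader
    final List<String> callLog = new ArrayList<>();

    RegionCallbackSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    void put(String key, String value) {
        beforeUpdate(key, value); // cache-writer: runs first and may veto by throwing
        entries.put(key, value);
        afterUpdate(key);         // cache-listener: runs after the change is applied
    }

    String get(String key) {
        String value = entries.get(key);
        if (value == null) {      // cache miss: ask the loader for a value
            callLog.add("loader:" + key);
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    private void beforeUpdate(String key, String value) {
        if (value == null) {
            throw new IllegalArgumentException("writer rejected null value for " + key);
        }
        callLog.add("writer:" + key);
    }

    private void afterUpdate(String key) {
        callLog.add("listener:" + key);
    }

    public static void main(String[] args) {
        RegionCallbackSketch region = new RegionCallbackSketch(k -> "loaded-" + k);
        region.put("key-1", "v1");
        region.get("key-2");
        System.out.println(region.callLog);
    }
}
```

A rejected write never reaches the listener in this model, which is why a before-event hook suits synchronous write-through to an archival system.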
Apex serializes the state of operators to local disks, and then asynchronously copies the serialized state to HDFS.
HDFS read/write latency is limited and doesn't improve beyond a certain point because of disk I/O and staging writes.
With the exactly-once recovery mechanism, the platform checkpoints at every window boundary and behaves synchronously, i.e. the operator is blocked until the state is copied to HDFS. For applications with many operator instances, this hurts overall application performance.
Apex applications are specified as Directed Acyclic Graphs or DAGs. DAGs express processing logic using operators (vertices) and streams (edges), thereby providing a way to describe complex logic for sequential or parallel execution and breaking up the application logic into smaller functional components. - See more at: https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/#sthash.LcqwfmRS.dpuf
The last processed window id is stored along with the application data modified in the window. On recovery and replay, it can be used to detect what was already processed and skip it instead of writing duplicates. This technique makes results available in the database with minimal latency. It requires idempotency: the guarantee, provided by Apex, that events are always delivered in the same window on replay.
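The window-id technique described above can be sketched in plain Java. This is a model, not an Apex or Geode class: `WindowedStoreSketch` is a hypothetical name, a list stands in for the external store, and a synchronized method stands in for a transaction that writes the data and the window id together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of exactly-once output using a stored window id.
final class WindowedStoreSketch {
    private final List<String> committedRecords = new ArrayList<>();
    private long committedWindowId = -1; // stored together with the data

    // Write one window's records and its id as a unit; returns false for a replayed window.
    synchronized boolean commitWindow(long windowId, List<String> records) {
        if (windowId <= committedWindowId) {
            return false; // already written before the failure: skip on replay
        }
        committedRecords.addAll(records);
        committedWindowId = windowId;
        return true;
    }

    synchronized List<String> records() {
        return new ArrayList<>(committedRecords);
    }

    public static void main(String[] args) {
        WindowedStoreSketch store = new WindowedStoreSketch();
        store.commitWindow(1, Arrays.asList("a", "b"));
        store.commitWindow(2, Arrays.asList("c"));
        store.commitWindow(2, Arrays.asList("c")); // replay after recovery, ignored
        System.out.println(store.records());
    }
}
```

The skip is only safe because replay delivers the same events in the same window; without that guarantee, comparing window ids would not identify duplicates.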