SlideShare a Scribd company logo
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1
Oracle Stream Analytics
Complex Event Processing for Apache Spark Streaming
Complex Event Processing for Spark Streaming
Prabhu Thukkaram, Senior Director, Oracle Product Development
Hoyong Park, Architect, Oracle Product Development
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 2
Complex Event Processing with Continuous Query Processor
Continuous
Query Processor
Pre-registered Queries
E3@T3, E2@T2, E1@T1
ResultsInput Events
R3@T3, R2@T2, R1@T1
Heartbeats
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Why Continuous Query Processor ?
Oracle Confidential – Internal/Restricted/Highly Restricted 3
• Complex event processing requires events to be processed one at a time
– Each event must be processed as identified by its individual timestamp
– Real world events originate at different times and must be processed as such
– CEP applications seek correlation and patterns across events in time within or across
batches and irrespective of batch boundaries
• Window length can span fractional batches
• Micro-batching with Spark Streaming
– All events in the batch are identified by same time (RDD Time)
– No progression of time between events in the same batch
– No progression of time when RDD partitions are empty
• Critical for missing event scenarios. E.g. alert when order status “Received” is not followed by
“Shipped” for order Id 10001 within 1 hour
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Pattern Detection in CQP
Oracle Confidential – Internal/Restricted/Highly Restricted 4
Checks if temperature readings from a power sensor are wobbling
during a certain time interval.
The CQP code checks for a W-Pattern in temperature readings during a
10 minute interval and selects support levels as output
SELECT LAST(A.value), LAST(C.value) FROM TEMP_STREAM
MATCH_RECOGNIZE (
PARTITION BY DEVICE_ID
PATTERN (A+ B+ C+ D+) DURATION OF 10 MINUTES
DEFINE
A AS (value < PREV(value))
B AS (value > PREV(value))
C AS (value < PREV(value))
D AS (value > PREV(value))
)
A
B
C D
10 Minutes
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Distributed Complex Event Processing
Oracle Confidential – Internal/Restricted/Highly Restricted 5
• Continuous Query Processor
– Event by event processing
• Each event is assigned a unique timestamp
• Apache Spark
– Distributed computing with scale out and fault tolerance
Spark Streaming + Continuous Query Processor
=
Distributed, Scalable, and Fault Tolerant CEP Platform
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
How CQP complements Spark Streaming ?
Oracle Confidential – Internal/Restricted/Highly Restricted 6
• Continuous, event-by-event, and stateful processing
• Flexible temporal windows (Time & Rows)
• Automatic progression of time
– Heartbeat propagation to advance time
• Pattern detection without batch boundaries
– Built-in finite state automaton
• Declarative SQL like language
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
How Spark complements CQP ?
• Distributed computing framework for CQP
• OOTB sources for data Ingestion
• Horizontal scale-out
• Fault tolerance and high availability
• Spark can detect failures, restart required resources, and automatically replay parts
of the stream that experienced failure
Oracle Confidential – Internal
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
High-Level Architecture
Data Stream Oracle’s CQP Engine
Finite State Automaton for
Pattern Detection across
Discrete Events
HDFS
Journaled CQP engine State serialized to HDFS
after computing each partition
CQP Engine State restored on
Executor Restart and Recompute of a
partition
Geo Sensing Cartridge for
Spatial Analytics
RETE Rules for Conditional
Logic
Complex Pattern Detection, Temporal Queries, Spatial
Queries, Contextual Queries, and Conditional Logic are all
executed in Oracle’s CEP Engine
Distributed Cache for
Ultra-fast Context Lookup
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
CQP Spark Application
Oracle Confidential – Internal
Cluster
Spark Standalone or YARN or Mesos
Spark Driver
Spark Executor
Spark Executor
Spark Executor
• Transformation delegated to CQP
• CQP application parsed in Spark’s Driver process
• Driver builds a DAG (job definition) from CQP application
• Driver runs job for each micro-batch
CQP
CQP
CQP
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Spark Integration - High Level
• Extensions
– CQLDStream extends DStream overriding compute()
– CQLRDD extends RDD overriding compute()
• Initialization/DAG creation phase
– Create Spark Context, Streaming Context, and start CQP on each Executor
– Setup DAG of CQLDStream
• Physical plan execution phase
– Assign unique timestamp to each event in micro-batch based on RDD time
– Register query with CQP on RDD compute if not already registered
– Send micro-batch to CQP and accept returned results
– Checkpoint CQP state if HA enabled
Oracle Confidential – Internal
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Spark Integration - High Availability
• Executor Failure
– Restart CQP on newly restarted Executor
– On RDD compute, re-register query and restore query state from snapshots
– Continue processing stream
Oracle Confidential – Internal
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
Demo
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal
Q & A
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
CQP scalability in Spark
Oracle Confidential – Internal/Restricted/Highly Restricted 14
0
20000
40000
60000
80000
w1(c3) w2(c5) w3(c7) w4(c9)
Processing Time (seconds) for 40 Million
Records
0
5000
10000
15000
20000
w1(c3) w2(c5) w3(c7) w4(c9)
Avg. Processing Time Per Batch
(milliseconds)
0
20
40
60
80
100
120
w1(c3) w2(c5) w3(c7) w4(c9)
Number of Batches
Processed over 10
Minutes
Legend
Wn = n number of workers or
executors
W2(c5) means 2 executors
and a total of 5 cores across
both executors

More Related Content

What's hot

InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxData
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
confluent
 
Intro to InfluxDB
Intro to InfluxDBIntro to InfluxDB
Intro to InfluxDB
InfluxData
 
Networking in Java with NIO and Netty
Networking in Java with NIO and NettyNetworking in Java with NIO and Netty
Networking in Java with NIO and Netty
Constantine Slisenka
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
Kai Wähner
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
Vikram Shinde
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar:  Detecting row patterns with Flink SQL - Dawid WysakowiczWebinar:  Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
Ververica
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Netezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataNetezza vs Teradata vs Exadata
Netezza vs Teradata vs Exadata
Asis Mohanty
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
StreamNative
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
Paul Brebner
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Josef A. Habdank
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Flink Forward
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
Gwen (Chen) Shapira
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Kafka: All an engineer needs to know
Kafka: All an engineer needs to knowKafka: All an engineer needs to know
Kafka: All an engineer needs to know
Thao Huynh Quang
 

What's hot (20)

InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
Intro to InfluxDB
Intro to InfluxDBIntro to InfluxDB
Intro to InfluxDB
 
Networking in Java with NIO and Netty
Networking in Java with NIO and NettyNetworking in Java with NIO and Netty
Networking in Java with NIO and Netty
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar:  Detecting row patterns with Flink SQL - Dawid WysakowiczWebinar:  Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Netezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataNetezza vs Teradata vs Exadata
Netezza vs Teradata vs Exadata
 
Kafka on Pulsar
Kafka on Pulsar Kafka on Pulsar
Kafka on Pulsar
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka: All an engineer needs to know
Kafka: All an engineer needs to knowKafka: All an engineer needs to know
Kafka: All an engineer needs to know
 

Similar to Bringing complex event processing to Spark streaming

Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream Analytics
Prabhu Thukkaram
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics
Franco Ucci
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Kellyn Pot'Vin-Gorman
 
The Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result CacheThe Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result Cache
Steven Feuerstein
 
ODTUG Webinar AWR Warehouse
ODTUG Webinar AWR WarehouseODTUG Webinar AWR Warehouse
ODTUG Webinar AWR Warehouse
Kellyn Pot'Vin-Gorman
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Eren Avşaroğulları
 
Developer day v2
Developer day v2Developer day v2
Developer day v2
AiougVizagChapter
 
MySQL Replication
MySQL ReplicationMySQL Replication
MySQL Replication
Mark Swarbrick
 
Power of the AWR Warehouse
Power of the AWR WarehousePower of the AWR Warehouse
Power of the AWR Warehouse
Kellyn Pot'Vin-Gorman
 
UKOUG
UKOUG UKOUG
AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
Kellyn Pot'Vin-Gorman
 
AWR and ASH in an EM12c World
AWR and ASH in an EM12c WorldAWR and ASH in an EM12c World
AWR and ASH in an EM12c World
Kellyn Pot'Vin-Gorman
 
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute InfodeckServlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Edward Burns
 
Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2
gaougorg
 
Kellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and AshKellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and Ash
gaougorg
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Courtney Llamas
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
DataWorks Summit
 
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EECON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
Masoud Kalali
 

Similar to Bringing complex event processing to Spark streaming (20)

Apache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream AnalyticsApache Spark and Oracle Stream Analytics
Apache Spark and Oracle Stream Analytics
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
 
The Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result CacheThe Amazing and Elegant PL/SQL Function Result Cache
The Amazing and Elegant PL/SQL Function Result Cache
 
ODTUG Webinar AWR Warehouse
ODTUG Webinar AWR WarehouseODTUG Webinar AWR Warehouse
ODTUG Webinar AWR Warehouse
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
Apouc 2014-enterprise-manager-12c
Apouc 2014-enterprise-manager-12cApouc 2014-enterprise-manager-12c
Apouc 2014-enterprise-manager-12c
 
Developer day v2
Developer day v2Developer day v2
Developer day v2
 
MySQL Replication
MySQL ReplicationMySQL Replication
MySQL Replication
 
Power of the AWR Warehouse
Power of the AWR WarehousePower of the AWR Warehouse
Power of the AWR Warehouse
 
UKOUG
UKOUG UKOUG
UKOUG
 
AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
 
AWR and ASH in an EM12c World
AWR and ASH in an EM12c WorldAWR and ASH in an EM12c World
AWR and ASH in an EM12c World
 
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute InfodeckServlet 4.0 Adopt-a-JSR 10 Minute Infodeck
Servlet 4.0 Adopt-a-JSR 10 Minute Infodeck
 
Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2Kellyn Pot'Vin-Gorman - Power awr warehouse2
Kellyn Pot'Vin-Gorman - Power awr warehouse2
 
Kellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and AshKellyn Pot'Vin-Gorman - Awr and Ash
Kellyn Pot'Vin-Gorman - Awr and Ash
 
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
Zero to Manageability in 60 Minutes: Building a Solid Foundation for Oracle E...
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EECON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
CON 2107- Think Async: Embrace and Get Addicted to the Asynchronicity of EE
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Bringing complex event processing to Spark streaming

  • 1. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 1 Oracle Stream Analytics Complex Event Processing for Apache Spark Streaming Complex Event Processing for Spark Streaming Prabhu Thukkaram, Senior Director, Oracle Product Development Hoyong Park, Architect, Oracle Product Development
  • 2. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted 2 Complex Event Processing with Continuous Query Processor Continuous Query Processor Pre-registered Queries E3@T3, E2@T2, E1@T1 ResultsInput Events R3@T3, R2@T2, R1@T1 Heartbeats
  • 3. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Why Continuous Query Processor ? Oracle Confidential – Internal/Restricted/Highly Restricted 3 • Complex event processing requires events to be processed one at a time – Each event must be processed as identified by its individual timestamp – Real world events originate at different times and must be processed as such – CEP applications seek correlation and patterns across events in time within or across batches and irrespective of batch boundaries • Window length can span fractional batches • Micro-batching with Spark Streaming – All events in the batch are identified by same time (RDD Time) – No progression of time between events in the same batch – No progression of time when RDD partitions are empty • Critical for missing event scenarios. E.g. alert when order status “Received” is not followed by “Shipped” for order Id 10001 within 1 hour
  • 4. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Pattern Detection in CQP Oracle Confidential – Internal/Restricted/Highly Restricted 4 Checks if temperature readings from a power sensor are wobbling during a certain time interval. The CQP code checks for a W-Pattern in temperature readings during a 10 minute interval and selects support levels as output SELECT LAST(A.value), LAST(C.value) FROM TEMP_STREAM MATCH_RECOGNIZE ( PARTITION BY DEVICE_ID PATTERN (A+ B+ C+ D+) DURATION OF 10 MINUTES DEFINE A AS (value < PREV(value)) B AS (value > PREV(value)) C AS (value < PREV(value)) D AS (value > PREV(value)) ) A B C D 10 Minutes
  • 5. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Distributed Complex Event Processing Oracle Confidential – Internal/Restricted/Highly Restricted 5 • Continuous Query Processor – Event by event processing • Each event is assigned a unique timestamp • Apache Spark – Distributed computing with scale out and fault tolerance Spark Streaming + Continuous Query Processor = Distributed, Scalable, and Fault Tolerant CEP Platform
  • 6. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | How CQP complements Spark Streaming ? Oracle Confidential – Internal/Restricted/Highly Restricted 6 • Continuous, event-by-event, and stateful processing • Flexible temporal windows (Time & Rows) • Automatic progression of time – Heartbeat propagation to advance time • Pattern detection without batch boundaries – Built-in finite state automaton • Declarative SQL like language
  • 7. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | How Spark complements CQP ? • Distributed computing framework for CQP • OOTB sources for data Ingestion • Horizontal scale-out • Fault tolerance and high availability • Spark can detect failures, restart required resources, and automatically replay parts of the stream that experienced failure Oracle Confidential – Internal
  • 8. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | High-Level Architecture Data Stream Oracle’s CQP Engine Finite State Automaton for Pattern Detection across Discrete Events HDFS Journaled CQP engine State serialized to HDFS after computing each partition CQP Engine State restored on Executor Restart and Recompute of a partition Geo Sensing Cartridge for Spatial Analytics RETE Rules for Conditional Logic Complex Pattern Detection, Temporal Queries, Spatial Queries, Contextual Queries, and Conditional Logic are all executed in Oracle’s CEP Engine Distributed Cache for Ultra-fast Context Lookup
  • 9. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | CQP Spark Application Oracle Confidential – Internal Cluster Spark Standalone or YARN or Mesos Spark Driver Spark Executor Spark Executor Spark Executor • Transformation delegated to CQP • CQP application parsed in Spark’s Driver process • Driver builds a DAG (job definition) from CQP application • Driver runs job for each micro-batch CQP CQP CQP
  • 10. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Spark Integration - High Level • Extensions – CQLDStream extends DStream overriding compute() – CQLRDD extends RDD overriding compute() • Initialization/DAG creation phase – Create Spark Context, Streaming Context, and start CQP on each Executor – Setup DAG of CQLDStream • Physical plan execution phase – Assign unique timestamp to each event in micro-batch based on RDD time – Register query with CQP on RDD compute if not already registered – Send micro-batch to CQP and accept returned results – Checkpoint CQP state if HA enabled Oracle Confidential – Internal
  • 11. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Spark Integration - High Availability • Executor Failure – Restart CQP on newly restarted Executor – On RDD compute, re-register query and restore query state from snapshots – Continue processing stream Oracle Confidential – Internal
  • 12. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal Demo
  • 13. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal Q & A
  • 14. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | CQP scalability in Spark Oracle Confidential – Internal/Restricted/Highly Restricted 14 0 20000 40000 60000 80000 w1(c3) w2(c5) w3(c7) w4(c9) Processing Time (seconds) for 40 Million Records 0 5000 10000 15000 20000 w1(c3) w2(c5) w3(c7) w4(c9) Avg. Processing Time Per Batch (milliseconds) 0 20 40 60 80 100 120 w1(c3) w2(c5) w3(c7) w4(c9) Number of Batches Processed over 10 Minutes Legend Wn = n number of workers or executors W2(c5) means 2 executors and a total of 5 cores across both executors