SlideShare a Scribd company logo
S4 : Distributed Stream Computing Platform


     Aleksandar Bradic, Sr. Director, Engineering and R&D, Vast.com

              Software Scalability | Belgrade meetup #04
Introduction



  Simple Scalable Streaming System
  General-purpose, distributed, scalable, partially fault-
  tolerant, pluggable platform that allows programmers to
  easily develop applications for processing continuous
  unbounded streams of data
  Inspired by MapReduce and Actor models of computation
Introduction

Stream data processing | use cases :

   Real-time data analysis (financial data, twitter feeds, news
   data ...)
   High-frequency trading
   Complex Event Processing
   Real-time Search
   Social Networks
   Incorporating Web application user feedback in real time
   Search advertising personalization
   Low-latency data processing pipelines
   Online algorithm development
   ...
Introduction

  Existing large-scale MapReduce data processing platforms
  (Hadoop) are highly optimized for batch processing
  MapReduce systems typically operate on static data by
  scheduling batch jobs
  In stream computing, the paradigm is to have a stream of
  events that flow into the system at a given data rate over
  which we have no control
  The processing system must keep up with event rate or
  degrade gracefully by eliminating events (load shedding)
  The streaming paradigm dictates a very different
  architecture than the one used in batch processing
  Attempting to build a general-purpose platform for both
  batch and stream computing would result in highly complex
  system that may end up not being optimal for either task
Introduction

  Many real-world systems implement a streaming strategy of
  partitioning the input data into fixed-size segments that are
  processed by MapReduce platform
  The disadvantage of this approach is that the latency is
  proportional to the length of the segment plus the overhead
  required to do the segmentation and initiate the processing
  jobs
  Small segments will reduce latency and overhead, but will
  make it more complex to manage inter-segment
  dependencies
  The optimal segment size will depend on the application
  S4 attemps to explore a different paradigm that is simple
  and can operate on data streams in real time
Introduction

S4 | Design goals :
   Provide a simple Programming Interface for processing data
   streams
   Design a cluster with high availability that can scale using
   commodity hardware
   Minimize latency by using local memory in each processing
   node and avoiding I/O bottlenecks
   Use a decentralized and symmetric architecture; all nodes
   share the same functionality and responsibilities. There is
   no central node with specific responsibilities.
   Use a pluggable architecture, to keep the design as generic
   and customizable as possible
   Make the design flexible and easy to program
Introduction

S4 | Assumptions :

    Lossy failover is acceptable. Upon a server failure,
   processes are automatically moved to a standby server.
   The state of the processes, which is stored in local memory,
   is lost during the handoff. The state is regenerated using the
   input streams. Downstream systems must degrade
   gracefully.

   Nodes will not be added to or removed from a running
   cluster
Introduction

  Data processing is based on Processing Elements (PE) -
  corresponding to Actor model of concurrent computation
  Messages are transmitted between PEs in the form of data
  events
  The state of each PE is inaccessible to other PEs
  The framework provides capability to route events to
  appropriate PEs and create new instances of PEs
Design

 Stream = sequence of elements ("events") of the form (K,A)
     K - tuple-valued keys
     A - attributes
 Design goal is to create a flexible stream computing
 platform which consumes such a stream, computes
 intermediate values and possibly emits other streams in
 distributed computing environment
Design

Example :

   Input event contains a document with a quotation in English

   Continuously produce a sorted list of top K most frequent
   words across all documents with minimal latency
Word Count Example
Processing Elements

 Processing Elements (PEs) are the basic computational
 units in S4
 Each instance of PE is uniquely identified by four
 components:
     it's functionality as defined by PE class and associated
     configuration
     the types of events that it consumes
     the keyed attribute in those events
     the value of the keyed attribute in events which it
     consumes
Processing Elements

 Every PE consumes exactly those events which
 correspond to the value on which it is keyed and can
 produce output events
 PE is instantiated for each value of the key attribute
 Special class of PEs is the set of keyless PEs with no keyed
 attribute or value. These PEs consume all events of the type
 with which they are associated
 Keyless PEs are typically used at the input layer of an S4
 cluster where events are assigned a key
 "Garbage Collection" of PEs represents a major challenge
 for the platform
 State of the PE is lost after the "cleanup"
Processing Node

 Processing Nodes (PNs) are the logical hosts to PEs
 They are responsible for listening to events, executing
 operations on the incoming events, dispatching events with
 the assistance of communication layer and emitting output
 events
 S4 routes each event to PN based on a hash function of
 the values of all known keyed attributes in that event
 A single event may be routed to multiple PNs
 The set of all possible keying attributes is known from the
 configuration of S4 cluster
 An event listener in the PN passes incoming events to the
 processing element container (PEC) which invokes the
 appropriate PEs in appropriate order
Processing Node

 There is a special type of PE object : the PE prototype
 It has the first three components of its identity :
     functionality
     event type
     keyed attribute
 The PE prototype does not have the attribute value
 assigned
 This object is configured upon initialization and for any value
 V is capable of cloning itself to create fully qualified PEs of
 that class with identical configuration and value V for the
 keyed attribute
 This operation is triggered by the PN once for each unique
 value of the keyed attribute that it encounters
Processing Node

 As a result of this design - all events with a particular value
 of a keyed attribute are guaranteed to arrive at particular
 corresponding PN and be routed to the corresponding PE
 instance within it
 Every keyed PE can be mapped to exactly one PN based
 on the value of the hash function applied to the value of
 keyed attribute of that PE
 Keyless PEs may be instantiated on every PN
Processing Node
Communication Layer

 Communication layer provides cluster management and
 automatic failover to standby nodes and maps physical
 nodes to logical nodes
 It automatically detects hardware failures and accordingly
 updates the mapping
 Emitters specify only logical nodes when sending messages
 and are unaware of the physical nodes or when logical
 nodes are re-mapped due to failures
 The communication layer uses ZooKeeper to help
 coordinate between node in a S4 cluster.
Programming Model

 High-level programming paradigm is to write generic,
 reusable and configurable processing elements that can be
 used across various applications
 PEs are assembled into applications using Spring
 Framework
 The processing element API is fairly simple and flexible :
 developers essentially implement two primary handlers:
    input event handler processEvent() - invoked for each
    incoming event of the types the PE has subscribed to
    output mechanism output()- optional method that
    implements the output of PE to an external system. Can
    be configured to be invoked in a variety of ways - at
    regular time intervals t or on receiving n input events.
Programming Model
                                                           QueryCounterPE.java

private queryCount = 0;

public void processEvent(Event event) {
  queryCount++;
}

public void output() {
  String query = (String)this.getKeyValue().get(0);
  persister.set(query, queryCount);
}



                                                                               QueryCounterPE.xml


                <bean id="queryCounterPE" class="com.company.s4.processor.QueryCounterPE">
                <property name="keys">
                  <list>
                     <value>QueryEvent queryString</value>
                  </list>
                </property>
                <property name="persister" ref="externalPersister">
                <property name="outputFrequencyByTimeBoundary" value="600"/>
                </bean>
Example : Streaming CTR Computation


  Click-through rate (CTR) represents the ratio of the number
  of clicks divided by the number of impressions
  When sufficient historical data is available, CTR is a good
  estimate of the probability that user will click on the item
  S4 is used to measure CTR in real-time for each query-ad
  combination
  To compute CTR in this manner - we need to route click and
  serve events using a key composed of the query and ad IDs
  If the click payload doesn't include query and ad
  information, we need to do a join by serve ID prior to routing
  the events with query-ad as the key
  Once joined, the events must pass-through a bot filter
  Finally, serves and clicks are aggregated to compute CTR
Streaming CTR Computation
Example : online parameter optimization

  System for automating tuning of one or more parameters of
  a search advertising system using live traffic
  The system removes the need for manual tuning and
  consistent human intervention, while searching a large
  parameter space in less time than possible by manual
  searches
  The system ingests events emitted by the target system
  (search advertising system), measures performance,
  applies an adaptation algorithm to determine new
  parameters, and injects the new parameters back into the
  target system
  (Closed loop approach similar to traditional control systems)
Example : online parameter optimization


  We assume that the target system (TS) produces output
  that can be represented as a stream and has measurable
  performance in the form of a configurable objective function
  (OF)
  We set aside 2 randomly-assigned slices of the output
  stream of the target system as slice1 and slice2
  We require that the target system has the ability to apply
  different parameter values in each slice and that each slice
  can be identified as such output stream
Example : online parameter optimization


The system has 3 high-level functional components:

   Measurement : ingests the slice1 and slice2 streams of the
   TS and measures the value of OF in each slice. The OF is
   measured for the duration of a slot
   Comparator : takes as input the measurement produced by
   the measurement component and determines if and when
   there is a statistically significant difference in performance
   between the slice1 and slice2 slices. When it determines a
   significant difference, it sends a message to the optimizer.
   Optimizer : implements the adaption strategy. Takes as
   input the history of parameter influences and output new
   parameter values for both slice1 and slice2
Online Parameter Optimization
Questions


 Project status ?
 Future work ?
 Use cases ?
 Limitations ?
 Anti-patterns ?
 Project future ?
 Future of the paradigm ?
 ...

More Related Content

What's hot

Weh Teleport List
Weh Teleport ListWeh Teleport List
Weh Teleport List
guestde590
 
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
udaymoogala
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cIntegrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Edelweiss Kammermann
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
Scott Jenner
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Databricks
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
Altinity Ltd
 
Create Directory Under ASM Diskgroup
Create Directory Under ASM DiskgroupCreate Directory Under ASM Diskgroup
Create Directory Under ASM Diskgroup
Arun Sharma
 
Getting optimal performance from oracle e-business suite presentation
Getting optimal performance from oracle e-business suite presentationGetting optimal performance from oracle e-business suite presentation
Getting optimal performance from oracle e-business suite presentation
Berry Clemens
 
PostgreSQL High_Performance_Cheatsheet
PostgreSQL High_Performance_CheatsheetPostgreSQL High_Performance_Cheatsheet
PostgreSQL High_Performance_Cheatsheet
Lucian Oprea
 
Dmv's & Performance Monitor in SQL Server
Dmv's & Performance Monitor in SQL ServerDmv's & Performance Monitor in SQL Server
Dmv's & Performance Monitor in SQL Server
Zeba Ansari
 
Intro to Dynamic Web Pages
Intro to Dynamic Web PagesIntro to Dynamic Web Pages
Intro to Dynamic Web Pages
Jussi Pohjolainen
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
Dabbal Singh Mahara
 
Catalyst optimizer
Catalyst optimizerCatalyst optimizer
Catalyst optimizer
Ayub Mohammad
 
Oracle architecture ppt
Oracle architecture pptOracle architecture ppt
Oracle architecture ppt
Deepak Shetty
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
Databricks
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
Altinity Ltd
 
Les 13 memory
Les 13 memoryLes 13 memory
Les 13 memory
Femi Adeyemi
 
Les 05 Create Bu
Les 05 Create BuLes 05 Create Bu
Les 05 Create Bu
vivaankumar
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Databricks
 

What's hot (20)

Weh Teleport List
Weh Teleport ListWeh Teleport List
Weh Teleport List
 
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cIntegrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
Create Directory Under ASM Diskgroup
Create Directory Under ASM DiskgroupCreate Directory Under ASM Diskgroup
Create Directory Under ASM Diskgroup
 
Getting optimal performance from oracle e-business suite presentation
Getting optimal performance from oracle e-business suite presentationGetting optimal performance from oracle e-business suite presentation
Getting optimal performance from oracle e-business suite presentation
 
PostgreSQL High_Performance_Cheatsheet
PostgreSQL High_Performance_CheatsheetPostgreSQL High_Performance_Cheatsheet
PostgreSQL High_Performance_Cheatsheet
 
Dmv's & Performance Monitor in SQL Server
Dmv's & Performance Monitor in SQL ServerDmv's & Performance Monitor in SQL Server
Dmv's & Performance Monitor in SQL Server
 
Intro to Dynamic Web Pages
Intro to Dynamic Web PagesIntro to Dynamic Web Pages
Intro to Dynamic Web Pages
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
 
Catalyst optimizer
Catalyst optimizerCatalyst optimizer
Catalyst optimizer
 
Oracle architecture ppt
Oracle architecture pptOracle architecture ppt
Oracle architecture ppt
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
Les 13 memory
Les 13 memoryLes 13 memory
Les 13 memory
 
Les 05 Create Bu
Les 05 Create BuLes 05 Create Bu
Les 05 Create Bu
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 

Similar to S4: Distributed Stream Computing Platform

S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing Platform
Farzad Nozarian
 
INTERNET OF THINGS & AZURE
INTERNET OF THINGS & AZUREINTERNET OF THINGS & AZURE
INTERNET OF THINGS & AZURE
DotNetCampus
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
Intelie
 
Guido schmutz-jax2011-event-driven soa
Guido schmutz-jax2011-event-driven soaGuido schmutz-jax2011-event-driven soa
Guido schmutz-jax2011-event-driven soa
Guido Schmutz
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
Splunk
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
Splunk
 
Sap esp integration options
Sap esp integration optionsSap esp integration options
Sap esp integration options
Luc Vanrobays
 
FIWARE CEP GE introduction, ICT 2015
FIWARE CEP GE introduction, ICT 2015FIWARE CEP GE introduction, ICT 2015
FIWARE CEP GE introduction, ICT 2015
ishkin
 
Complex Event Processing - A brief overview
Complex Event Processing - A brief overviewComplex Event Processing - A brief overview
Complex Event Processing - A brief overview
István Dávid
 
07 integrated process modelling
07   integrated process modelling07   integrated process modelling
07 integrated process modelling
Yury Kupriyanov
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft Private Cloud
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Puppet
 
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE PerseoCreating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Fernando Lopez Aguilar
 
PATTERNS06 - The .NET Event Model
PATTERNS06 - The .NET Event ModelPATTERNS06 - The .NET Event Model
PATTERNS06 - The .NET Event Model
Michael Heron
 
Complex Event Processor 3.0.0 - An overview of upcoming features
Complex Event Processor 3.0.0 - An overview of upcoming features Complex Event Processor 3.0.0 - An overview of upcoming features
Complex Event Processor 3.0.0 - An overview of upcoming features
WSO2
 
Overview of VS2010 and .NET 4.0
Overview of VS2010 and .NET 4.0Overview of VS2010 and .NET 4.0
Overview of VS2010 and .NET 4.0
Bruce Johnson
 
Open Source Event Processing for Sensor Fusion Applications
Open Source Event Processing for Sensor Fusion ApplicationsOpen Source Event Processing for Sensor Fusion Applications
Open Source Event Processing for Sensor Fusion Applications
guestc4ce526
 
Webx 2010
Webx 2010Webx 2010
Webx 2010
steccami
 
Observability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesObservability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architectures
Boyan Dimitrov
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
Stephanie Locke
 

Similar to S4: Distributed Stream Computing Platform (20)

S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing Platform
 
INTERNET OF THINGS & AZURE
INTERNET OF THINGS & AZUREINTERNET OF THINGS & AZURE
INTERNET OF THINGS & AZURE
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
 
Guido schmutz-jax2011-event-driven soa
Guido schmutz-jax2011-event-driven soaGuido schmutz-jax2011-event-driven soa
Guido schmutz-jax2011-event-driven soa
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
Sap esp integration options
Sap esp integration optionsSap esp integration options
Sap esp integration options
 
FIWARE CEP GE introduction, ICT 2015
FIWARE CEP GE introduction, ICT 2015FIWARE CEP GE introduction, ICT 2015
FIWARE CEP GE introduction, ICT 2015
 
Complex Event Processing - A brief overview
Complex Event Processing - A brief overviewComplex Event Processing - A brief overview
Complex Event Processing - A brief overview
 
07 integrated process modelling
07   integrated process modelling07   integrated process modelling
07 integrated process modelling
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
 
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE PerseoCreating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
Creating a Context-Aware solution, Complex Event Processing with FIWARE Perseo
 
PATTERNS06 - The .NET Event Model
PATTERNS06 - The .NET Event ModelPATTERNS06 - The .NET Event Model
PATTERNS06 - The .NET Event Model
 
Complex Event Processor 3.0.0 - An overview of upcoming features
Complex Event Processor 3.0.0 - An overview of upcoming features Complex Event Processor 3.0.0 - An overview of upcoming features
Complex Event Processor 3.0.0 - An overview of upcoming features
 
Overview of VS2010 and .NET 4.0
Overview of VS2010 and .NET 4.0Overview of VS2010 and .NET 4.0
Overview of VS2010 and .NET 4.0
 
Open Source Event Processing for Sensor Fusion Applications
Open Source Event Processing for Sensor Fusion ApplicationsOpen Source Event Processing for Sensor Fusion Applications
Open Source Event Processing for Sensor Fusion Applications
 
Webx 2010
Webx 2010Webx 2010
Webx 2010
 
Observability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesObservability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architectures
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
 

More from Aleksandar Bradic

Creativity, Language, Computation - Sketching in Hardware 2022
Creativity, Language, Computation - Sketching in Hardware 2022Creativity, Language, Computation - Sketching in Hardware 2022
Creativity, Language, Computation - Sketching in Hardware 2022
Aleksandar Bradic
 
Technology and Power: Agency, Discourse and Community Formation
Technology and Power: Agency, Discourse and Community FormationTechnology and Power: Agency, Discourse and Community Formation
Technology and Power: Agency, Discourse and Community Formation
Aleksandar Bradic
 
Theses on AI User Experience Design - Sketching in Hardware 2020
Theses on AI User Experience Design - Sketching in Hardware 2020Theses on AI User Experience Design - Sketching in Hardware 2020
Theses on AI User Experience Design - Sketching in Hardware 2020
Aleksandar Bradic
 
Go-to-Market & Scaling: An Engineering Perspective
Go-to-Market & Scaling: An Engineering PerspectiveGo-to-Market & Scaling: An Engineering Perspective
Go-to-Market & Scaling: An Engineering Perspective
Aleksandar Bradic
 
Human-Data Interfaces: A case for hardware-driven innovation
Human-Data Interfaces: A case for hardware-driven innovationHuman-Data Interfaces: A case for hardware-driven innovation
Human-Data Interfaces: A case for hardware-driven innovation
Aleksandar Bradic
 
Computational Complexity for Poets
Computational Complexity for PoetsComputational Complexity for Poets
Computational Complexity for Poets
Aleksandar Bradic
 
Kolmogorov Complexity, Art, and all that
Kolmogorov Complexity, Art, and all thatKolmogorov Complexity, Art, and all that
Kolmogorov Complexity, Art, and all that
Aleksandar Bradic
 
De Bruijn Sequences for Fun and Profit
De Bruijn Sequences for Fun and ProfitDe Bruijn Sequences for Fun and Profit
De Bruijn Sequences for Fun and Profit
Aleksandar Bradic
 
Can Data Help Us Build Better Hardware Products ?
Can Data Help Us Build Better Hardware Products ? Can Data Help Us Build Better Hardware Products ?
Can Data Help Us Build Better Hardware Products ?
Aleksandar Bradic
 
Supplyframe Engineering
Supplyframe EngineeringSupplyframe Engineering
Supplyframe Engineering
Aleksandar Bradic
 
The CAP Theorem
The CAP Theorem The CAP Theorem
The CAP Theorem
Aleksandar Bradic
 

More from Aleksandar Bradic (11)

Creativity, Language, Computation - Sketching in Hardware 2022
Creativity, Language, Computation - Sketching in Hardware 2022Creativity, Language, Computation - Sketching in Hardware 2022
Creativity, Language, Computation - Sketching in Hardware 2022
 
Technology and Power: Agency, Discourse and Community Formation
Technology and Power: Agency, Discourse and Community FormationTechnology and Power: Agency, Discourse and Community Formation
Technology and Power: Agency, Discourse and Community Formation
 
Theses on AI User Experience Design - Sketching in Hardware 2020
Theses on AI User Experience Design - Sketching in Hardware 2020Theses on AI User Experience Design - Sketching in Hardware 2020
Theses on AI User Experience Design - Sketching in Hardware 2020
 
Go-to-Market & Scaling: An Engineering Perspective
Go-to-Market & Scaling: An Engineering PerspectiveGo-to-Market & Scaling: An Engineering Perspective
Go-to-Market & Scaling: An Engineering Perspective
 
Human-Data Interfaces: A case for hardware-driven innovation
Human-Data Interfaces: A case for hardware-driven innovationHuman-Data Interfaces: A case for hardware-driven innovation
Human-Data Interfaces: A case for hardware-driven innovation
 
Computational Complexity for Poets
Computational Complexity for PoetsComputational Complexity for Poets
Computational Complexity for Poets
 
Kolmogorov Complexity, Art, and all that
Kolmogorov Complexity, Art, and all thatKolmogorov Complexity, Art, and all that
Kolmogorov Complexity, Art, and all that
 
De Bruijn Sequences for Fun and Profit
De Bruijn Sequences for Fun and ProfitDe Bruijn Sequences for Fun and Profit
De Bruijn Sequences for Fun and Profit
 
Can Data Help Us Build Better Hardware Products ?
Can Data Help Us Build Better Hardware Products ? Can Data Help Us Build Better Hardware Products ?
Can Data Help Us Build Better Hardware Products ?
 
Supplyframe Engineering
Supplyframe EngineeringSupplyframe Engineering
Supplyframe Engineering
 
The CAP Theorem
The CAP Theorem The CAP Theorem
The CAP Theorem
 

S4: Distributed Stream Computing Platform

  • 1. S4 : Distributed Stream Computing Platform Aleksandar Bradic, Sr. Director, Engineering and R&D, Vast.com Software Scalability | Belgrade meetup #04
  • 2. Introduction Simple Scalable Streaming System General-purpose, distributed, scalable, partially fault- tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data Inspired by MapReduce and Actor models of computation
  • 3. Introduction Stream data processing | use cases : Real-time data analysis (financial data, twitter feeds, news data ...) High-frequency trading Complex Event Processing Real-time Search Social Networks Incorporating Web application user feedback in real time Search advertising personalization Low-latency data processing pipelines Online algorithm development ...
  • 4. Introduction Existing large-scale MapReduce data processing platforms (Hadoop) are highly optimized for batch processing MapReduce systems typically operate on static data by scheduling batch jobs In stream computing, the paradigm is to have a stream of events that flow into the system at a given data rate over which we have no control The processing system must keep up with event rate or degrade gracefully by eliminating events (load shedding) The streaming paradigm dictates a very different architecture than the one used in batch processing Attempting to build a general-purpose platform for both batch and stream computing would result in highly complex system that may end up not being optimal for either task
  • 5. Introduction Many real-world systems implement a streaming strategy of partitioning the input data into fixed-size segments that are processed by MapReduce platform The disadvantage of this approach is that the latency is proportional to the length of the segment plus the overhead required to do the segmentation and initiate the processing jobs Small segments will reduce latency and overhead, but will make it more complex to manage inter-segment dependencies The optimal segment size will depend on the application S4 attemps to explore a different paradigm that is simple and can operate on data streams in real time
  • 6. Introduction S4 | Design goals : Provide a simple Programming Interface for processing data streams Design a cluster with high availability that can scale using commodity hardware Minimize latency by using local memory in each processing node and avoiding I/O bottlenecks Use a decentralized and symmetric architecture; all nodes share the same functionality and responsibilities. There is no central node with specific responsibilities. Use a pluggable architecture, to keep the design as generic and customizable as possible Make the design flexible and easy to program
  • 7. Introduction S4 | Assumptions : Lossy failover is acceptable. Upon a server failure, processes are automatically moved to a standby server. The state of the processes, which is stored in local memory, is lost during the handoff. The state is regenerated using the input streams. Downstream systems must degrade gracefully. Nodes will not be added to or removed from a running cluster
  • 8. Introduction Data processing is based on Processing Elements (PE) - corresponding to Actor model of concurrent computation Messages are transmitted between PEs in the form of data events The state of each PE is inaccessible to other PEs The framework provides capability to route events to appropriate PEs and create new instances of PEs
  • 9. Design Stream = sequence of elements ("events") of the form (K,A) K - tuple-valued keys A - attributes Design goal is to create a flexible stream computing platform which consumes such a stream, computes intermediate values and possibly emits other streams in distributed computing environment
  • 10. Design Example : Input event contains a document with a quotation in English Continuously produce a sorted list of top K most frequent words across all documents with minimal latency
  • 12. Processing Elements Processing Elements (PEs) are the basic computational units in S4 Each instance of PE is uniquely identified by four components: it's functionality as defined by PE class and associated configuration the types of events that it consumes the keyed attribute in those events the value of the keyed attribute in events which it consumes
  • 13. Processing Elements Every PE consumes exactly those events which correspond to the value on which it is keyed and can produce output events PE is instantiated for each value of the key attribute Special class of PEs is the set of keyless PEs with no keyed attribute or value. These PEs consume all events of the type with which they are associated Keyless PEs are typically used at the input layer of an S4 cluster where events are assigned a key "Garbage Collection" of PEs represents a major challenge for the platform State of the PE is lost after the "cleanup"
  • 14. Processing Node Processing Nodes (PNs) are the logical hosts to PEs They are responsible for listening to events, executing operations on the incoming events, dispatching events with the assistance of communication layer and emitting output events S4 routes each event to PN based on a hash function of the values of all known keyed attributes in that event A single event may be routed to multiple PNs The set of all possible keying attributes is known from the configuration of S4 cluster An event listener in the PN passes incoming events to the processing element container (PEC) which invokes the appropriate PEs in appropriate order
  • 15. Processing Node There is a special type of PE object : the PE prototype It has the first three components of its identity : functionality event type keyed attribute The PE prototype does not have the attribute value assigned This object is configured upon initialization and for any value V is capable of cloning itself to create fully qualified PEs of that class with identical configuration and value V for the keyed attribute This operation is triggered by the PN once for each unique value of the keyed attribute that it encounters
  • 16. Processing Node As a result of this design - all events with a particular value of a keyed attribute are guaranteed to arrive at particular corresponding PN and be routed to the corresponding PE instance within it Every keyed PE can be mapped to exactly one PN based on the value of the hash function applied to the value of keyed attribute of that PE Keyless PEs may be instantiated on every PN
  • 18. Communication Layer Communication layer provides cluster management and automatic failover to standby nodes and maps physical nodes to logical nodes It automatically detects hardware failures and accordingly updates the mapping Emitters specify only logical nodes when sending messages and are unaware of the physical nodes or when logical nodes are re-mapped due to failures The communication layer uses ZooKeeper to help coordinate between node in a S4 cluster.
  • 19. Programming Model High-level programming paradigm is to write generic, reusable and configurable processing elements that can be used across various applications PEs are assembled into applications using Spring Framework The processing element API is fairly simple and flexible : developers essentially implement two primary handlers: input event handler processEvent() - invoked for each incoming event of the types the PE has subscribed to output mechanism output()- optional method that implements the output of PE to an external system. Can be configured to be invoked in a variety of ways - at regular time intervals t or on receiving n input events.
  • 20. Programming Model QueryCounterPE.java private queryCount = 0; public void processEvent(Event event) { queryCount++; } public void output() { String query = (String)this.getKeyValue().get(0); persister.set(query, queryCount); } QueryCounterPE.xml <bean id="queryCounterPE" class="com.company.s4.processor.QueryCounterPE"> <property name="keys"> <list> <value>QueryEvent queryString</value> </list> </property> <property name="persister" ref="externalPersister"> <property name="outputFrequencyByTimeBoundary" value="600"/> </bean>
  • 21. Example : Streaming CTR Computation Click-through rate (CTR) represents the ratio of the number of clicks divided by the number of impressions When sufficient historical data is available, CTR is a good estimate of the probability that user will click on the item S4 is used to measure CTR in real-time for each query-ad combination To compute CTR in this manner - we need to route click and serve events using a key composed of the query and ad IDs If the click payload doesn't include query and ad information, we need to do a join by serve ID prior to routing the events with query-ad as the key Once joined, the events must pass-through a bot filter Finally, serves and clicks are aggregated to compute CTR
  • 23. Example : online parameter optimization System for automating tuning of one or more parameters of a search advertising system using live traffic The system removes the need for manual tuning and consistent human intervention, while searching a large parameter space in less time than possible by manual searches The system ingests events emitted by the target system (search advertising system), measures performance, applies an adaptation algorithm to determine new parameters, and injects the new parameters back into the target system (Closed loop approach similar to traditional control systems)
  • 24. Example : online parameter optimization We assume that the target system (TS) produces output that can be represented as a stream and has measurable performance in the form of a configurable objective function (OF) We set aside 2 randomly-assigned slices of the output stream of the target system as slice1 and slice2 We require that the target system has the ability to apply different parameter values in each slice and that each slice can be identified as such output stream
  • 25. Example : online parameter optimization The system has 3 high-level functional components: Measurement : ingests the slice1 and slice2 streams of the TS and measures the value of OF in each slice. The OF is measured for the duration of a slot Comparator : takes as input the measurement produced by the measurement component and determines if and when there is a statistically significant difference in performance between the slice1 and slice2 slices. When it determines a significant difference, it sends a message to the optimizer. Optimizer : implements the adaption strategy. Takes as input the history of parameter influences and output new parameter values for both slice1 and slice2
  • 27. Questions Project status ? Future work ? Use cases ? Limitations ? Anti-patterns ? Project future ? Future of the paradigm ? ...