Why Stream Data as Part of Data Transformation
Glen Gomez Zuazo, Senior Solutions Architect
Presenter
Glen Gomez Zuazo, Senior Solutions Architect
● Data Science, Machine Learning, Distributed Systems, Full Stack Development, Blockchain and Enterprise Architecture
● Passionate involvement in Diversity and Inclusion
● STEM advocate for young people (Middle and High School)
● Teaching technology (CSSE, AWS and Microservices)
● Spending time with his family, including his dog (Bolillo), running and camping
Event-Driven Data Architecture in 2019
■ Event-driven architectures are increasingly part of a complete data transformation solution
■ This talk covers
● the details of each option
● their advantages and disadvantages
● how to select the best option for your company’s needs
Prevalent examples
■ Apache Kafka
■ Cloud Native Computing Foundation’s NATS
■ Amazon SQS
■ Lightbend Akka
AWS SQS
Amazon Simple Queue Service
■ Fully managed message queuing service
■ Lets you decouple and scale microservices, distributed systems, and serverless applications, moving them from synchronous to asynchronous communication
■ Eliminates the complexity and overhead of managing and operating message-oriented middleware
SQS: two types of message queues
■ Standard queues: maximum throughput, best-effort ordering, and at-least-once delivery
■ FIFO queues: designed to guarantee that messages are processed exactly once, in the exact order in which they are sent
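To make the difference between the two queue types concrete, here is a small in-memory model of FIFO behaviour in Python. It is a sketch, not the AWS API: the class and method names are hypothetical, and it only captures per-group ordering plus deduplication of repeated sends that reuse a deduplication ID within SQS's 5-minute deduplication interval. A standard queue, by contrast, may deliver a message more than once, so its consumers should be idempotent.

```python
import time
from collections import defaultdict, deque

DEDUP_INTERVAL_SECONDS = 5 * 60  # SQS FIFO deduplication interval


class FifoQueueModel:
    """Hypothetical in-memory model of SQS FIFO semantics (not the AWS API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._groups = defaultdict(deque)  # message group ID -> bodies in send order
        self._first_seen = {}              # deduplication ID -> time first accepted

    def send(self, body, group_id, dedup_id):
        """Return True if the message was enqueued, False if it was a duplicate."""
        now = self._clock()
        seen_at = self._first_seen.get(dedup_id)
        if seen_at is not None and now - seen_at < DEDUP_INTERVAL_SECONDS:
            # Real SQS still reports success here; it just never delivers the copy.
            return False
        self._first_seen[dedup_id] = now
        self._groups[group_id].append(body)
        return True

    def receive(self, group_id):
        """Return the oldest message in the group, or None if the group is empty."""
        group = self._groups[group_id]
        return group.popleft() if group else None
```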
SQS Functionality
■ Unlimited queues and messages
■ Payload
● Up to 256KB of text in any format
● Each 64KB ‘chunk’ of payload is billed as 1 request
● (e.g., a 256KB payload is billed as four requests)
● Use the Amazon SQS Extended Client Library for Java to send messages larger than 256KB
● The Extended Client Library uses Amazon S3 to store the message payload
■ Batches
● Send, receive, or delete messages in batches of up to 10 messages or 256KB
● Batches cost the same amount as single messages
● This makes batching more cost-effective for customers
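The billing and batching rules above are simple arithmetic. The following Python sketch applies them, assuming the 64KB chunk size and the 10-message/256KB batch limits listed on this slide (check current AWS pricing and quotas before relying on them); the function names are hypothetical.

```python
import math

CHUNK_BYTES = 64 * 1024          # each 64KB chunk is billed as one request
MAX_PAYLOAD_BYTES = 256 * 1024   # per-message and per-batch limit from the slide
MAX_BATCH_MESSAGES = 10


def billed_requests(payload_bytes: int) -> int:
    """Number of billed requests for one API call carrying payload_bytes."""
    if payload_bytes > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds 256KB; consider the Extended Client Library")
    return max(1, math.ceil(payload_bytes / CHUNK_BYTES))


def plan_batches(sizes):
    """Greedily group message sizes into batches of <= 10 messages and <= 256KB."""
    batches, current, current_bytes = [], [], 0
    for size in sizes:
        if size > MAX_PAYLOAD_BYTES:
            raise ValueError("single message exceeds 256KB")
        full = len(current) == MAX_BATCH_MESSAGES
        if current and (full or current_bytes + size > MAX_PAYLOAD_BYTES):
            batches.append(current)  # close the batch and start a new one
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

With 25 messages of 1KB each, sending them one by one costs 25 requests, while the greedy plan groups them into three batches of 10, 10 and 5 messages, billed as three requests.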
SQS Functionality (cont’d)
■ Long polling
● Reduces extraneous polling to minimize cost while still receiving new messages as quickly as possible
● When the queue is empty, long-poll requests wait up to 20 seconds for the next message to arrive
● Long-poll requests cost the same amount as regular requests
■ Retains messages in queues for up to 14 days
■ Send and read messages simultaneously
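A long-polling consumer can be sketched with the boto3 SQS client's receive_message and delete_message calls. This is an illustration, not production code: poll_once and its handler argument are hypothetical, the queue URL is a placeholder, and error handling is minimal. Deleting a message only after the handler succeeds relies on SQS message locking: an undeleted message becomes visible again once its visibility timeout expires.

```python
def poll_once(sqs_client, queue_url, handler, wait_seconds=20, max_messages=10):
    """Long-poll one batch from SQS, handle each message, and delete it on success.

    sqs_client is expected to behave like boto3.client("sqs").
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=wait_seconds,      # long polling: wait up to 20 s on an empty queue
    )
    handled = 0
    for message in response.get("Messages", []):  # key is absent when nothing arrived
        handler(message["Body"])  # if this raises, the message is not deleted
        # Delete only after successful processing; an undeleted message becomes
        # visible to consumers again when its visibility timeout expires.
        sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        handled += 1
    return handled
```

In a real service you would create the client with boto3.client("sqs") and call poll_once in a loop; an empty response still costs one request, but it waits up to 20 seconds instead of returning immediately.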
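SQS Functionality (cont’d)
■ Message locking
● A received message is locked (hidden from other consumers) while it is being processed; if processing fails, the lock expires and the message becomes available again
■ Queue sharing
● Anonymously
● With specific AWS accounts
■ Server-side encryption (SSE)
● Keys managed with AWS Key Management Service (AWS KMS)
■ Dead-letter queues (DLQ)
● Supported for both standard and FIFO source queues; a DLQ must be the same type as its source queue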
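Message locking and dead-letter queues work together: each receive hides the message for the visibility timeout, and once a message has been received more than maxReceiveCount times without being deleted, SQS moves it to the DLQ. The Python model below simulates that interaction with an explicit clock. Only the concepts come from SQS (visibility timeout, receive count, maxReceiveCount in the redrive policy); the classes are hypothetical and simplified, for example the move to the DLQ happens lazily on the next receive attempt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare messages by identity, not by field values
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time at which the message can be received again


@dataclass
class QueueWithDlq:
    """Hypothetical model of an SQS source queue with a redrive policy."""
    visibility_timeout: float
    max_receive_count: int
    messages: List[Message] = field(default_factory=list)
    dead_letters: List[Message] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float) -> Optional[Message]:
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # locked: another consumer received it and has not deleted it
            if msg.receive_count >= self.max_receive_count:
                # Received too often without a delete: redrive to the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        """Called by the consumer after successful processing."""
        self.messages.remove(msg)
```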
Publish-Subscribe for Application Integration
● Lets systems exchange data asynchronously
● Keeps systems independent and fault-tolerant
● Allows systems to run in different environments (OS, language)
Messaging Patterns
Message queuing
Publish-subscribe (pub-sub)
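The two patterns differ in who receives each message. A minimal Python illustration with hypothetical function names: in message queuing each message is consumed by exactly one consumer (round-robin is just one possible distribution policy), while in pub-sub every subscriber gets its own copy.

```python
from itertools import cycle


def deliver_queue(messages, consumers):
    """Message queuing: each message goes to exactly one consumer (round-robin here)."""
    inboxes = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        inboxes[consumer].append(msg)
    return inboxes


def deliver_pubsub(messages, subscribers):
    """Publish-subscribe: every subscriber receives its own copy of every message."""
    return {s: list(messages) for s in subscribers}
```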
NATS
■ High-performance, cloud native messaging system
■ Provides the foundational messaging layer for distributed systems
■ Can be used to build both synchronous and asynchronous, reliable, highly available systems
■ The 2.0 release adds significant features for both high availability and security
● Not to be confused with NGS, Synadia’s commercial, hosted NATS service
Let’s cover the details of how we plan to deploy and configure NATS, with a special focus on HA and security.
High Availability
■ Deploy NATS as a single global entity, with NATS gateways connecting the clusters in multiple regions. Both NATS and the system itself will be deployed active/active.
■ In all of these scenarios, we assume a geographically pinned single point of entry into each cluster, per standard AWS practices.
■ In "classic" active-active scenarios, you have two or more completely isolated mirrors.
Sharing Streams and Services
■ The NATS account model also provides an explicit, secure-by-default way to allow communication between accounts.
■ Account owners can export either of two things:
● A stream (write-only from the exporting account, read-only to subscribers)
● A service (read/write)
■ An export can be
● Public (allows any authorized account to import that subject)
● Private (requires an explicit, out-of-band delivery of an activation token)
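The export rules can be modelled in a few lines of Python. This sketch is hypothetical and does not use the NATS client or JWT libraries: the activation token is a plain string standing in for the signed activation JWT that the exporting account delivers out of band to one specific importing account.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Export:
    """Hypothetical model of a NATS account export (not the NATS API)."""
    subject: str
    kind: str            # "stream": exporter publishes, importers only read
                         # "service": importers send requests, exporter replies
    public: bool = True
    # Private exports only: activation token -> account it was issued to
    activations: Dict[str, str] = field(default_factory=dict)

    def issue_activation(self, importer: str) -> str:
        token = f"act:{self.subject}:{importer}"  # stand-in for a signed activation JWT
        self.activations[token] = importer
        return token

    def can_import(self, importer: str, token: Optional[str] = None) -> bool:
        if self.public:
            return True  # any authorized account may import a public export
        # Private: the token must exist and must have been issued to this importer.
        return token is not None and self.activations.get(token) == importer
```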
Security and Multi-Tenancy
■ Main considerations and concerns in a multi-tenant system that sits on top of a central messaging system:
● Security of clients and of the message traffic
● Configuration maintenance
● The complexity of running different tenants in the same cluster (e.g., K8s tenants co-existing with ECS tenants)
■ In a decentralized model, clients authenticate to NATS with signed user JWTs.
There is a hierarchy that goes from Operator to Account to User.
■ In NATS, an account is a unit of isolation and a user is a unit of client
authentication and authorization.
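The Operator to Account to User hierarchy is a chain of issuers: an operator signs account JWTs, and an account signs the JWTs of its users. The Python sketch below checks only that structural chain; the Claim type and chain_is_valid function are hypothetical, no signatures are verified, and real NATS also allows accounts to sign users with dedicated signing keys, which this simplification ignores.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Claim:
    """Simplified stand-in for a NATS JWT: who it describes and who issued it."""
    subject: str  # public key of the entity the claim describes
    issuer: str   # public key of the entity that signed the claim
    kind: str     # "operator", "account" or "user"


def chain_is_valid(user: Claim, accounts: Dict[str, Claim], trusted_operator: str) -> bool:
    """Check the Operator -> Account -> User issuer chain (no signature checks)."""
    if user.kind != "user":
        return False
    account = accounts.get(user.issuer)  # the user must be issued by a known account
    if account is None or account.kind != "account":
        return False
    return account.issuer == trusted_operator  # and that account by the trusted operator
```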
RabbitMQ
■ Messages are published to exchanges, which route them to queues.
■ Multiple consumers can connect to a queue.
■ The broker distributes messages across all available consumers.
■ A message can be redelivered if the consumer processing it fails.
■ Delivery order is guaranteed for queues with a single consumer (this is not possible when the queue has multiple consumers).
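The behaviours above (distribution across consumers, redelivery on failure, and the loss of ordering with several consumers) can be illustrated with a toy work queue in Python. It is not the AMQP or pika API: WorkQueue and its methods are hypothetical, dispatch is plain round-robin, and a failed consumer's unacknowledged messages are put back at the head of the queue in their original order.

```python
from collections import deque


class WorkQueue:
    """Hypothetical toy model of one queue shared by several consumers."""

    def __init__(self, consumers):
        self.ready = deque()
        self.unacked = {}  # delivery tag -> (consumer, message)
        self.consumers = list(consumers)
        self._next = 0
        self._tag = 0

    def publish(self, message):
        self.ready.append(message)

    def dispatch(self):
        """Hand the next ready message to the next consumer, round-robin."""
        if not self.ready or not self.consumers:
            return None
        consumer = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        self._tag += 1
        message = self.ready.popleft()
        self.unacked[self._tag] = (consumer, message)
        return self._tag, consumer, message

    def ack(self, tag):
        del self.unacked[tag]  # processed successfully; never redelivered

    def consumer_failed(self, consumer):
        """Requeue every unacknowledged message held by a failed consumer."""
        self.consumers.remove(consumer)
        # Walk newest-first so appendleft leaves the oldest message at the head.
        for tag, (owner, message) in sorted(self.unacked.items(), reverse=True):
            if owner == consumer:
                del self.unacked[tag]
                self.ready.appendleft(message)
```

With two consumers, if c1 fails before acknowledging m1 while c2 has already processed m2, m1 is redelivered to c2 afterwards, so the effective processing order becomes m2, m1.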
Architecture Considerations
■ Performance:
● RabbitMQ handles around 20,000 messages/second
■ Processing:
● Consumption is FIFO-based: messages are read from the head of the queue and processed one by one
■ HA:
● Provides high availability support
■ Open source:
● RabbitMQ is open source under the Mozilla Public License
Architecture Diagram
Apache Kafka
Kafka
■ Apache Kafka enables communication between producers and consumers through message-based topics. It is a fast, scalable, fault-tolerant, publish-subscribe messaging system.
■ It provides a platform for high-end, new-generation distributed applications and supports a large number of permanent or ad-hoc consumers.
Architecture
■ Kafka Producer API
● Allows an application to publish a stream of records to one or more Kafka topics
■ Kafka Consumer API
● Allows an application to subscribe to one or more topics and process the stream of records produced to them
■ Kafka Streams API
● Allows an application to act as a stream processor
● Consumes an input stream from one or more topics
● Produces an output stream to one or more output topics
● Effectively transforms input streams into output streams
■ Kafka Connector API
● Allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems
● Example: a connector to a relational database might capture every change to a table
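To make the Producer, Consumer and Streams roles concrete, here is an in-memory Python model of a partitioned topic. It is not the Kafka client API: Topic, Consumer and stream_step are hypothetical names, and crc32 stands in for the murmur2 hash used by Kafka's default partitioner. It shows the properties the APIs rely on: records with the same key land in the same partition, each consumer tracks its own offsets, and a stream processor consumes one topic, transforms the records, and produces them to another.

```python
import zlib
from collections import defaultdict


class Topic:
    """Hypothetical in-memory model of a partitioned, append-only topic."""

    def __init__(self, name, partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        """Append a record; the same key always maps to the same partition."""
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1  # (partition, offset)


class Consumer:
    """Reads new records and remembers its position (offset) in each partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        records = []
        for partition, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)
        return records


def stream_step(consumer, sink, transform):
    """Streams-style step: consume new input records, transform, produce output."""
    records = consumer.poll()
    for key, value in records:
        sink.produce(key, transform(value))
    return len(records)
```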
Architecture Diagram
Data Transformation - Architecture
Scylla + Kafka Users — just at Scylla Summit!
Scylla Summit 2018 Presenters
■ Discord
■ Faraday Future
■ GE
■ Grab
■ Natura
■ Nauto
■ Numberly
Scylla Summit 2019 Presenters
■ Lookout
■ Nauto
■ Numberly
■ OlaCabs
■ SmartDeployAI
■ Zeotap
Takeaway
Architectural Message Review Example
We follow a defined process to decide which technologies and patterns to apply, based on the specific requirements of the system.
We perform the following steps:
■ System Requirements
■ ASR (Architecturally Significant Requirements)
■ ADR (Architecture Decision Record)
■ System Context and Data Flow
■ PoC
■ MVPx
Architecturally Significant Requirements
Architecturally Significant Requirements (ASR) have a measurable effect on a system's
architecture, which includes application and infrastructure.
ASR Criteria
Requirements that have wide effects, are strict, or are difficult to achieve are often ASRs. According to the Wikipedia article on ASRs, common indicators that a requirement is an ASR include:
■ The requirement is associated with high business value and/or technical risk.
■ The requirement is a concern of a particularly important (that is, influential) stakeholder.
■ The requirement has a first-of-a-kind character, e.g., none of the responsibilities of existing components in the architecture addresses it.
■ The requirement has QoS/SLA characteristics that deviate from all those already satisfied by the evolving architecture.
■ The requirement has caused budget overruns or client dissatisfaction in a previous project with a similar context.
Architecturally Significant Requirements
Categories
We have split our ASRs into categories to make them easier to read and to let us provide more detail for each requirement. The categories are:
■ Availability
■ Maintainability
■ Observability
■ Performance
■ Resiliency
■ Testability
■ Usability
Architecture Decision Record
■ NATS is an open source, powerful, lightweight, secure-by-default messaging system.
■ It gives the same kind of delivery control as Kafka consumer groups (through NATS queue groups),
■ but without the maintenance overhead and operations cost.
■ NATS is essentially self-managing: no one needs to create new partitions to scale up or down.
■ Clusters form themselves and self-heal, and clients are immediately notified of cluster topology changes.
■ NATS supports traditional request/reply, pub/sub, fan-out, and many more messaging patterns.
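The consumer-group comparison refers to NATS queue groups: every plain subscriber on a subject receives each message (fan-out), but within a queue group NATS delivers each message to just one randomly chosen member. The Python model below illustrates only those delivery semantics; the Subject class is hypothetical and is not the NATS client API.

```python
import random
from collections import defaultdict


class Subject:
    """Hypothetical model of NATS delivery on a single subject."""

    def __init__(self, rng=None):
        self.plain = []                  # ordinary subscribers: each gets every message
        self.groups = defaultdict(list)  # queue group name -> member names
        self.rng = rng or random.Random()

    def subscribe(self, name, queue=None):
        if queue is None:
            self.plain.append(name)
        else:
            self.groups[queue].append(name)

    def publish(self, message):
        """Return the (subscriber, message) deliveries for one published message."""
        deliveries = [(name, message) for name in self.plain]
        for members in self.groups.values():
            # One randomly chosen member per queue group, like a consumer group.
            deliveries.append((self.rng.choice(members), message))
        return deliveries
```

Adding a worker is just another subscribe call with the same queue name; no partitions or rebalancing configuration are involved.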
Why did we need a message broker?
Our ASRs lean heavily toward:
■ Resiliency,
■ Stability, and
■ Performance
With traditional point-to-point communication, you have to handle a number of concerns that introduce points of failure, possible performance degradation, and loss of stability:
■ Service discovery (what's the address for a service?)
■ Retries and Failure Responses
■ Coping with slow connections and intermittent failure
■ Exponential back-off to avoid cascading failures
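Exponential back-off is one of the concerns that every point-to-point client has to implement for itself. For comparison, here is a minimal Python sketch of capped exponential back-off with "full jitter" (each wait is a random fraction of the current delay); the function names, the default delays, and retrying only on ConnectionError are illustrative assumptions.

```python
import random
import time


def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=5):
    """Capped exponential schedule: base, base*factor, base*factor**2, ... <= cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_retries(func, delays, sleep=time.sleep, jitter=random.random):
    """Call func; on ConnectionError wait a jittered delay and retry.

    Makes len(delays) + 1 attempts in total and re-raises the last error.
    """
    last_error = None
    for delay in [0.0] + list(delays):
        if delay:
            sleep(delay * jitter())  # full jitter: spread retries to avoid thundering herds
        try:
            return func()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```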
Why not Kafka?
Once we decided that we wanted to take advantage of a message
broker and utilize all of the asynchronous power that comes with it,
we needed to pick which broker.
■ We required a low operational burden.
■ We needed the ability to scale without delicate reconfiguration.
■ We needed fast request-response performance.
Why not RabbitMQ?
Rabbit has a reputation for reliability and speed, and some team members had used it before. One of the main reasons we decided against it was the explicit nature of its fanout exchanges:
■ Queues and subscriptions must be defined explicitly
■ Not recommended for multi-tenant systems
■ We needed the ability to add instances/subscribers without reconfiguration
NATS Security
Neither Rabbit nor Kafka gave us the kind of security support we needed. We needed the ability to explicitly control which clients can publish to which topics, and which clients can subscribe to those topics.
■ Ability to inject security information without taking the broker down
■ Flexibility to work with nkeys
■ Asymmetric (public/private) key system: nkeys use Ed25519 signatures
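Per-client publish and subscribe permissions in NATS are expressed as subject patterns, where '*' matches exactly one dot-separated token and '>' matches one or more trailing tokens. The Python sketch below implements that matching rule and applies it to a hypothetical permission table; it mirrors the idea of NATS publish/subscribe allow lists but is not the server's configuration format or code.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' = exactly one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # at least one token must remain
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical per-client allow lists (client names and subjects are illustrative).
PERMISSIONS = {
    "orders-service": {"publish": ["orders.>"], "subscribe": ["payments.*.confirmed"]},
    "reporting": {"publish": [], "subscribe": ["orders.>", "payments.>"]},
}


def is_allowed(client: str, action: str, subject: str) -> bool:
    allowed = PERMISSIONS.get(client, {}).get(action, [])
    return any(subject_matches(p, subject) for p in allowed)
```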
Comparison Matrix
The following matrix summarizes how well each option satisfies the requirements.
References
■ Apache Kafka website: https://kafka.apache.org
■ NATS website: https://nats.io
■ AWS SQS: https://aws.amazon.com/sqs/
■ RabbitMQ website: https://www.rabbitmq.com
■ Benchmarking Message Queue Latency: https://bravenewgeek.com/benchmarking-message-queue-latency/
Thank you! Stay in touch
Any questions?
Glen Gomez Zuazo
g_gomez_zuazo@hotmail.com
@ZuazoGlen

Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
ScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
ScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
ScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
ScyllaDB
 

More from ScyllaDB (20)

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 

Recently uploaded

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 

Recently uploaded (20)

Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 

Capital One: Why Stream Data as Part of Data Transformation?

  • 1. Why Stream Data as Part of Data Transformation Glen Gomez Zuazo, Senior Solutions Architect
  • 2. Presenter Glen Gomez Zuazo, Senior Solutions Architect ● Data Science, Machine Learning, Distributed Systems, Full Stack Development, Blockchain and Enterprise Architecture ● Passionate involvement in Diversity and Inclusion ● STEM advocate for young people (Middle and High School) ● Teaching technology (CSSE, AWS and Microservices) ● Spending time with his family, including his dog (Bolillo), running and camping
  • 3. Event-Driven Data Architecture in 2019 ■ Event-driven architectures are increasingly part of a complete data transformation solution ■ This talk covers ● details of each ● advantages and disadvantages ● how to select the best for your company’s needs
  • 4. Prevalent examples ■ Apache Kafka ■ Cloud Native Computing Foundation’s NATS ■ Amazon SQS ■ Lightbend Akka
  • 6. Amazon Simple Queue Service ■ Fully managed message queuing service ■ Decouples and scales microservices, distributed systems, and serverless applications by moving their communication from synchronous to asynchronous ■ Eliminates the complexity and overhead of managing and operating message-oriented middleware
  • 7. SQS: two types of message queues ■ Standard queues: maximum throughput, best-effort ordering, and at-least-once delivery ■ FIFO queues: messages are processed exactly once, in the exact order in which they are sent
  • 8. SQS Functionality ■ Unlimited queues and messages ■ Payload ● Up to 256KB of text in any format ● Each 64KB ‘chunk’ of payload is billed as 1 request ● (E.g. a 256KB payload is billed as four requests) ● Use the Amazon SQS Extended Client Library for Java to send messages larger than 256KB ● The Extended Client Library uses Amazon S3 to store the message payload ■ Batches ● Send, receive, or delete messages in batches of up to 10 messages or 256KB ● A batch request costs the same as a single-message request ● Batching is therefore more cost-effective for customers
  • 9. SQS Functionality (cont’d) ■ Long polling ● Reduce extraneous polling to minimize cost while receiving new messages as quickly as possible. ● When your queue is empty, long-poll requests wait up to 20 seconds for the next message to arrive ● Long poll requests cost the same amount as regular requests. ■ Retain messages in queues for up to 14 days. ■ Send and read messages simultaneously
  • 10. Functionality ■ Message locking ● A message is locked while it is being processed ■ Queue sharing ● Anonymously ● With specific AWS accounts ■ Server-side encryption (SSE) ● Keys managed in AWS Key Management Service (AWS KMS) ■ Dead Letter Queues (DLQ) ● Must be the same type as the source queue (standard or FIFO)
  • 11. Publish-Subscribe for Application Integration ● Exchange data asynchronously ● Keep systems independent and fault-tolerant ● Allow systems to run in different environments (OS, language)
  • 13. NATS
  • 14. NATS ■ High-performance, cloud native messaging system ■ Provides an entire foundational messaging layer ■ Can be used to build both synchronous and asynchronous, reliable, highly available systems ■ The 2.0 release adds significant features for both high availability and security ● Not to be confused with NGS, the Synadia commercial version Let’s cover the details of how we plan to deploy and configure NATS, with a special focus on HA and security.
  • 15. High Availability ■ Deploy a NATS cluster as a global entity, with NATS gateways connecting multiple regions. Both NATS and the system itself are deployed active/active. ■ We assume a geographically pinned single point of entry into each cluster in all of these scenarios, per standard AWS practices. ■ In "classic" active-active scenarios, you have two or more completely isolated mirrors.
  • 16. Sharing Streams and Services ■ The NATS account model also provides an explicit, secure-by-default way to allow communication between accounts. ● Account owners can export either a stream (write-only from the account, read-only to subscribers) ● or a service (read/write). ■ An exported service or stream can be ● Public (any authorized account can import that subject) ● Private (requires an explicit, out-of-band delivery of an activation token)
  • 19. Security and Multi-Tenancy ■ Main concerns in a multi-tenant system that sits on top of a central messaging system ● Security of clients and the message traffic ● Configuration maintenance ● Complexity when multiple multi-tenant systems run in the same cluster (e.g. K8s tenants co-existing with ECS tenants) ■ In a decentralized model, clients authenticate to NATS with signed user JWTs. There is a hierarchy that goes from Operator to Account to User. ■ In NATS, an account is a unit of isolation and a user is a unit of client authentication and authorization.
  • 22. RabbitMQ ■ Messages are published to queues (through exchanges). ■ Multiple consumers can connect to a queue. ■ The message broker distributes messages across all available consumers. ■ A message can be redelivered if its consumer fails. ■ Delivery order is guaranteed for queues with a single consumer (this is not possible when the queue has multiple consumers).
  • 23. Architecture Considerations ■ Performance: ● RabbitMQ handles around 20,000 messages/second ■ Processing: ● Consumption is FIFO-based: messages are read from the head of the queue and processed one by one ■ HA ● Provides high availability support ■ Open Source ● RabbitMQ is open source under the Mozilla Public License
  • 26. Kafka ■ We use Apache Kafka to enable communication between producers and consumers through message-based topics. Apache Kafka is a fast, scalable, fault-tolerant, publish-subscribe messaging system. ■ It provides a platform for high-end, new-generation distributed applications, and it supports a large number of permanent or ad hoc consumers.
  • 27. Architecture ■ Kafka Producer API ● Allows an application to publish a stream of records to one or more Kafka topics ■ Kafka Consumer API ● Allows an application to subscribe to one or more topics and process the stream of records produced to them ■ Kafka Streams API ● Allows an application to act as a stream processor ● Consumes an input stream from one or more topics ● Produces an output stream to one or more output topics ● Effectively transforms input streams into output streams ■ Kafka Connector API ● Allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems ● Example: a connector to a relational database might capture every change to a table
  • 29. Data Transformation - Architecture
  • 30. Scylla + Kafka Users — just at Scylla Summit! Scylla Summit 2018 Presenters ■ Discord ■ Faraday Future ■ GE ■ Grab ■ Natura ■ Nauto ■ Numberly Scylla Summit 2019 Presenters ■ Lookout ■ Nauto ■ Numberly ■ OlaCabs ■ SmartDeployAI ■ Zeotap
  • 32. Architectural Message Review Example We follow a process to decide which technologies and patterns to apply, based on the specific requirements of the system. We perform the following steps: ■ System Requirements ■ ASR (Architecturally Significant Requirements) ■ ADR (Architecture Decision Record) ■ System Context and Data Flow ■ PoC ■ MVPx
  • 33. Architecturally Significant Requirements Architecturally Significant Requirements (ASR) have a measurable effect on a system's architecture, which includes application and infrastructure. ASR Criteria Requirements that have wide effects, are strict, or are difficult to achieve are often ASRs. Per the Wikipedia article on ASRs, some common indicators for a requirement being an ASR are: ■ The requirement is associated with high business value and/or technical risk. ■ The requirement is a concern of a particularly important (that is, influential) stakeholder. ■ The requirement has a first-of-a-kind character, e.g. none of the responsibilities of already existing components in the architecture addresses it. ■ The requirement has QoS/SLA characteristics that deviate from all those already satisfied by the evolving architecture. ■ The requirement has caused budget overruns or client dissatisfaction in a previous project with a similar context.
  • 34. Architecturally Significant Requirements Categories We have split our ASRs up into categories to make them easier to read and to allow us to provide more detail for each requirement. These categories are: ■ Availability ■ Maintainability ■ Observability ■ Performance ■ Resiliency ■ Testability ■ Usability
  • 35. Architecture Decision Record ■ NATS is an open source, powerful, lightweight, secure-by-default messaging system. ■ It gives the same kind of delivery control as consumer groups in Kafka ■ but without the overhead of maintenance and operations cost. ■ NATS is essentially self-managing: it doesn’t need anyone to create new partitions to scale up or down ■ Clusters form themselves and self-heal, and clients are immediately notified of cluster topology changes. ■ NATS supports traditional request/reply, pub/sub, fanout, and many more messaging patterns.
  • 36. Why did we need a message broker? Our ASRs lean heavily toward: ■ Resiliency, ■ Stability, and ■ Performance When doing traditional point-to-point communications you have to do a number of things that introduce points of failure, possible performance degradation, and loss of stability: ■ Service discovery (what's the address for a service?) ■ Retries and Failure Responses ■ Coping with slow connections and intermittent failure ■ Exponential back-off to avoid cascading failures
  • 37. Why not Kafka? Once we decided to use a message broker and take advantage of all the asynchronous power that comes with it, we needed to pick which broker. ■ We require a low operations burden. ■ Ability to scale without delicate reconfiguration ■ Fast request-response performance
  • 38. Why not RabbitMQ? Rabbit has a reputation for reliability and speed, and some of the team members had used it before. One of the main reasons we decided against Rabbit was the explicit nature of fanout exchanges. ■ Requires explicit definition of queues and subscriptions ■ Not recommended for multi-tenant systems ■ We need the ability to add instances / subscribers without reconfiguration
  • 39. NATS Security Neither Rabbit nor Kafka gave us the kind of security support we needed. We need the ability to explicitly control which clients can publish to which topics and which clients can subscribe to those topics. ■ Ability to inject security information without taking the broker down ■ Flexibility to work with nkeys ■ Asymmetric (public/private) key system
  • 40. Comparison Matrix The following is a summary of the satisfaction of requirements for each of the options.
  • 41. References ■ Apache Kafka Website: https://kafka.apache.org ■ NATS Website: https://nats.io ■ AWS SQS: https://aws.amazon.com/sqs/ ■ RabbitMQ Website: https://www.rabbitmq.com ■ Benchmarking Message Queue Latency: https://bravenewgeek.com/benchmarking-message-queue-latency/
  • 42. Thank you Stay in touch Any questions? Glen Gomez Zuazo g_gomez_zuazo@hotmail.com @ZuazoGlen
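Slide 8 states that each 64KB chunk of payload is billed as one request and that a batch of up to 10 messages (or 256KB) costs the same as a single request. The Python sketch below turns those two rules into arithmetic. It assumes that KB means 1,024 bytes and that a batch is billed on its total payload; the function names are illustrative and not part of any AWS SDK, so check current SQS pricing before relying on the numbers.

```python
import math

CHUNK_BYTES = 64 * 1024          # each 64KB chunk is billed as one request (KiB assumed)
MAX_MESSAGE_BYTES = 256 * 1024   # per-message payload limit from the slide
MAX_BATCH_MESSAGES = 10
MAX_BATCH_BYTES = 256 * 1024


def billed_requests(payload_bytes: int) -> int:
    """Requests billed for one API call carrying `payload_bytes` of payload."""
    if payload_bytes < 0:
        raise ValueError("payload size cannot be negative")
    return max(1, math.ceil(payload_bytes / CHUNK_BYTES))


def plan_batches(sizes: list[int]) -> list[list[int]]:
    """Greedily group message sizes into batches of <= 10 messages and <= 256KB."""
    batches: list[list[int]] = []
    current: list[int] = []
    current_bytes = 0
    for size in sizes:
        if size > MAX_MESSAGE_BYTES:
            raise ValueError(f"{size} bytes exceeds 256KB; use the Extended Client Library")
        # Close the current batch if adding this message would break either limit.
        if current and (len(current) == MAX_BATCH_MESSAGES
                        or current_bytes + size > MAX_BATCH_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


def cost_comparison(sizes: list[int]) -> tuple[int, int]:
    """(billed requests sending one by one, billed requests sending in batches)."""
    single = sum(billed_requests(s) for s in sizes)
    batched = sum(billed_requests(sum(batch)) for batch in plan_batches(sizes))
    return single, batched
```

For 25 messages of 1KB each, `cost_comparison([1024] * 25)` returns `(25, 3)`: 25 billed requests when sent one by one versus 3 when grouped into batches of 10, 10 and 5.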
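Slide 9 says long polling reduces extraneous polling because an empty queue holds the request open for up to 20 seconds, while each long-poll request costs the same as a regular one. The sketch below counts receive calls under a deliberately simplified model: messages arrive at known times, each call returns as soon as a message is available (taking up to 10 at once), and short polling fires on a fixed interval. It illustrates the cost argument only; it is not a simulation of SQS internals, and the function names are hypothetical.

```python
import math


def short_poll_requests(horizon_s: float, interval_s: float) -> int:
    """Receive calls made by a consumer that polls every `interval_s` seconds."""
    return math.ceil(horizon_s / interval_s)


def long_poll_requests(arrivals: list[float], horizon_s: float,
                       wait_s: float = 20.0, max_batch: int = 10) -> int:
    """Receive calls made by a consumer that long-polls for up to `wait_s` seconds."""
    if not 0 < wait_s <= 20:
        raise ValueError("SQS long-poll waits are capped at 20 seconds")
    pending = sorted(arrivals)
    t, i, requests = 0.0, 0, 0
    while t < horizon_s:
        requests += 1
        if i < len(pending) and pending[i] < t + wait_s:
            # A message is waiting or arrives before the wait expires:
            # the call returns as soon as it is available.
            t = max(t, pending[i])
            taken = 0
            while i < len(pending) and pending[i] <= t and taken < max_batch:
                i += 1
                taken += 1
        else:
            # The queue stayed empty for the whole wait: empty response.
            t += wait_s
    return requests
```

With messages arriving at 5 s and 50 s over a one-minute window, `long_poll_requests([5.0, 50.0], 60.0)` makes 5 calls, whereas `short_poll_requests(60.0, 1.0)` makes 60.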
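Slide 22 describes a queue shared by several consumers: the broker spreads messages across them, redelivers a message whose consumer fails, and guarantees order only with a single consumer. The in-memory model below illustrates those points with round-robin dispatch. It is not the RabbitMQ client API: prefetch limits, acknowledgements and exchanges are left out, and the `failing` parameter is a hypothetical way to make a consumer drop its first delivery.

```python
from collections import deque
from itertools import cycle


def dispatch(messages, consumers, failing=frozenset()):
    """Round-robin `messages` across `consumers`, requeueing failed deliveries.

    `failing` names consumers whose first delivery fails (hypothetical, used
    only to show redelivery). Returns {consumer: [messages it processed]}.
    """
    queue = deque(messages)
    processed = {name: [] for name in consumers}
    already_failed = set()
    next_consumer = cycle(consumers)
    while queue:
        msg = queue.popleft()
        consumer = next(next_consumer)
        if consumer in failing and consumer not in already_failed:
            already_failed.add(consumer)
            # Consumer failed before acknowledging: put the message back at the head.
            queue.appendleft(msg)
            continue
        processed[consumer].append(msg)
    return processed
```

With two consumers, `dispatch(["m1", "m2", "m3", "m4"], ["a", "b"], failing={"a"})` hands m1 to `b` after `a` fails, and each consumer sees only part of the stream, so no single consumer observes the full order. With one consumer, the redelivered message is retried first and the order survives.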
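Slides 26 and 27 describe producers publishing to topics, consumers subscribing to them, and the Streams API turning input topics into output topics. The toy model below shows the mechanics underneath: a topic is a set of append-only partition logs, a key always maps to the same partition (which preserves per-key order), and a consumer group tracks an offset per partition so each record is delivered once per group. It is a stand-in, not the Kafka client: Kafka's default partitioner uses murmur2 hashing rather than `zlib.crc32`, and real partition assignment is negotiated by the group coordinator rather than fixed round-robin. All class and function names are hypothetical.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a Kafka topic: one append-only log per partition."""

    def __init__(self, name, partitions):
        self.name = name
        self.logs = [[] for _ in range(partitions)]

    def produce(self, key, value):
        """Append (key, value) to the key's partition; return (partition, offset)."""
        # Same key -> same partition, so records for one key stay in order.
        partition = zlib.crc32(key.encode()) % len(self.logs)
        self.logs[partition].append((key, value))
        return partition, len(self.logs[partition]) - 1


class ToyConsumerGroup:
    """Tracks one committed offset per partition for a single consumer group."""

    def __init__(self, topic, members):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read
        # Each partition is owned by exactly one member (fixed round-robin here).
        self.assignment = {p: members[p % len(members)] for p in range(len(topic.logs))}

    def poll(self, member):
        """Return new records from the member's partitions and commit the offsets."""
        records = []
        for partition, owner in self.assignment.items():
            if owner != member:
                continue
            log = self.topic.logs[partition]
            records.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)
        return records


def stream_transform(source, sink, fn):
    """Streams-style step: consume every record of `source`, produce fn(value) to `sink`."""
    group = ToyConsumerGroup(source, ["transformer"])
    for key, value in group.poll("transformer"):
        sink.produce(key, fn(value))
```

Because offsets are tracked per group, a second `ToyConsumerGroup` over the same topic starts from offset 0 and sees every record again, which is how several independent applications can read one topic.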
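Slide 35 claims NATS gives the same kind of delivery control as Kafka consumer groups, alongside plain pub/sub fan-out. In NATS this is done with queue groups: every plain subscriber to a subject receives each message, while only one member of each queue group does. The class below is a hypothetical in-memory model of those semantics, not the `nats` client library; it matches subjects exactly (no wildcards) and uses round-robin where a real NATS server picks a queue member at random.

```python
from collections import defaultdict
from itertools import count


class ToyNats:
    """In-memory model of NATS core delivery semantics (not a real client)."""

    def __init__(self):
        self._plain = defaultdict(list)                        # subject -> [inbox]
        self._groups = defaultdict(lambda: defaultdict(list))  # subject -> group -> [inbox]
        self._rr = count()

    def subscribe(self, subject, queue=None):
        """Return an inbox list that receives messages published to `subject`."""
        inbox = []
        if queue is None:
            self._plain[subject].append(inbox)
        else:
            self._groups[subject][queue].append(inbox)
        return inbox

    def publish(self, subject, data):
        # Fan-out: every plain subscriber gets its own copy.
        for inbox in self._plain[subject]:
            inbox.append(data)
        # Queue groups: exactly one member of each group gets the message,
        # which spreads the work the way a Kafka consumer group does.
        for members in self._groups[subject].values():
            members[next(self._rr) % len(members)].append(data)
```

An audit service can subscribe without a queue name to see every message, while workers joined to the same queue group split the load between them, all without creating or rebalancing partitions.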
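Slide 36 lists exponential back-off among the things each service must implement itself when it talks point-to-point instead of through a broker. A minimal sketch of capped exponential back-off with optional "full jitter" follows. The default base delay, cap and retry count are arbitrary assumptions, the function names are hypothetical, and the broad `except Exception` should be narrowed to genuinely transient errors in real code.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=10.0, rng=None):
    """Delay in seconds before each retry: base * 2**n, capped at `cap`.

    Pass a random.Random as `rng` to apply full jitter: each delay becomes a
    uniform draw in [0, capped delay], so clients that failed together spread out.
    """
    delays = []
    for n in range(retries):
        ceiling = min(cap, base * 2 ** n)
        delays.append(rng.uniform(0, ceiling) if rng is not None else ceiling)
    return delays


def call_with_retries(fn, retries=5, base=0.1, cap=10.0, sleep=time.sleep, rng=None):
    """Call fn(); after each failure, wait for the next back-off delay and retry."""
    for delay in backoff_delays(retries, base, cap, rng):
        try:
            return fn()
        except Exception:  # narrow to transient errors (timeouts, 5xx) in real code
            sleep(delay)
    return fn()  # final attempt: any exception now propagates to the caller
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies, and passing an `rng` avoids synchronized retry storms that can turn one failure into a cascade.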
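Slide 39 requires explicit control over which clients may publish or subscribe to which topics. NATS expresses such permissions as subject patterns, where `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The sketch below implements that matching plus a simple allow/deny check in which deny wins. The dictionary layout is a hypothetical stand-in for a server configuration, and treating anything not explicitly allowed as denied is a simplification that may differ from NATS's own defaults.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def is_allowed(permissions: dict, action: str, subject: str) -> bool:
    """Check `action` ("publish" or "subscribe") on `subject`; deny rules win."""
    rules = permissions.get(action, {})
    if any(subject_matches(p, subject) for p in rules.get("deny", [])):
        return False
    return any(subject_matches(p, subject) for p in rules.get("allow", []))
```

For example, a client given `{"publish": {"allow": ["orders.>"], "deny": ["orders.admin.>"]}}` can publish `orders.eu.created` but not `orders.admin.reset`.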

Editor's Notes

  1. Event-driven architectures are increasingly part of a complete data transformation solution. Learn how to employ Apache Kafka, Cloud Native Computing Foundation’s NATS, Amazon SQS, or other message queueing technologies. This talk covers the details of each, their advantages and disadvantages, and how to select the best for your company’s needs.
  2. Notes: Lightbend Akka is beyond the scope of my analysis of Capital One applications for this presentation. However, at least one other presenter, Alexandros Bantis from Tubi.tv, is going to speak about Akka/Scala. Even though it may have been beyond Capital One's consideration, you may wish to mention it in a roundup of popular solutions.
  3. Extra Notes: Send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available. Get started using the AWS console, the Command Line Interface, or the SDK of your choice, and three simple commands.
  4. Message locking: When a message is received, it becomes “locked” while being processed. This keeps other computers from processing the message simultaneously. If the message processing fails, the lock will expire and the message will be available again. Queue sharing: Securely share Amazon SQS queues anonymously or with specific AWS accounts. Queue sharing can also be restricted by IP address and time-of-day. Server-side encryption (SSE): Protect the contents of messages in Amazon SQS queues using keys managed in the AWS Key Management Service (AWS KMS). SSE encrypts messages as soon as Amazon SQS receives them. The messages are stored in encrypted form and Amazon SQS decrypts messages only when they are sent to an authorized consumer. Dead Letter Queues (DLQ): Handle messages that have not been successfully processed by a consumer with Dead Letter Queues. When the maximum receive count is exceeded for a message it will be moved to the DLQ associated with the original queue. Set up separate consumer processes for DLQs which can help analyze and understand why messages are getting stuck. DLQs must be of the same type as the source queue (standard or FIFO).
  5. In a solution where every service requires NATS to be available in order to function, we clearly need to ensure that NATS meets or exceeds our Top Resiliency Tier SLAs. To do this, we'll deploy a NATS cluster as a global entity, with NATS gateways connecting east and west. Both NATS and the system proper will be deployed active/active. In all of these scenarios, we assume a geographically pinned single point of entry into each cluster, per standard AWS practice. In "classic" active/active scenarios, you have two or more completely isolated mirrors: the geolocated clusters are completely unaware of each other. The failure of an individual component is isolated within its region, and if an entire region fails, routes are updated to direct all traffic to the remaining regions.
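The regional failover rule in this note can be sketched as a routing function. This is a minimal Python illustration under assumptions: `route` and the region names are hypothetical, and the health map stands in for whatever health checks and DNS or load-balancer updates a real deployment would use.

```python
from __future__ import annotations


def route(home_region: str, healthy: dict[str, bool]) -> str:
    """Return the region that should serve a client pinned to `home_region`.

    Regions are isolated active/active mirrors that know nothing about each
    other, so routing is the only place where failover happens.
    """
    if healthy.get(home_region, False):
        return home_region  # normal case: stay on the local point of entry
    survivors = sorted(region for region, ok in healthy.items() if ok)
    if not survivors:
        raise RuntimeError("no healthy region left to route to")
    # A deterministic choice keeps the sketch testable; a real router might
    # weight the surviving regions by latency or capacity instead.
    return survivors[0]
```

Because each region is a complete mirror, a component failure never reaches this function; only a whole-region outage changes where traffic goes.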
  6. The NATS account model also provides an explicit, secure-by-default means of communication between accounts. As an account owner, you can export either a stream (write-only from the account, read-only to subscribers) or a service (read/write). When you export a service or stream, you can make it a public or a private export. A public export allows any authorized account to import that subject. A private export requires explicit, out-of-band delivery of an activation token to the account wishing to import it; without this token, an account cannot import a private export. What this boils down to is that, with some facilitation by a service that generates keys and tokens, tenants can manage their own topic namespaces, their own users (connected clients), and their own imports/exports with no manual operations overhead. We get security by default, decentralized configuration, self-service secure message exchange, and a "service marketplace" where account (tenant) owners can browse exported subjects and request imports, much like adding items to a shopping cart.
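The difference between public and private exports can be modelled in a few lines of Python. This is a hypothetical sketch, not the NATS API: `Account`, `Export`, `import_subject`, and the subject names are invented for illustration, and a random hex string stands in for the real activation token.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class Export:
    subject: str
    kind: str  # "stream" (one-way) or "service" (request/reply)
    public: bool
    # Activation tokens handed out of band, keyed by importing account name.
    tokens: dict[str, str] = field(default_factory=dict)


@dataclass
class Account:
    name: str
    exports: dict[str, Export] = field(default_factory=dict)
    imports: set[tuple[str, str]] = field(default_factory=set)  # (exporter, subject)

    def export(self, subject: str, kind: str, public: bool) -> None:
        if kind not in ("stream", "service"):
            raise ValueError("kind must be 'stream' or 'service'")
        self.exports[subject] = Export(subject, kind, public)

    def issue_token(self, subject: str, importer: str) -> str:
        # Hypothetical stand-in for the activation token of a private export.
        token = secrets.token_hex(16)
        self.exports[subject].tokens[importer] = token
        return token

    def browse(self) -> list[str]:
        # What other tenants would see in the "service marketplace".
        return sorted(self.exports)


def import_subject(importer: Account, exporter: Account, subject: str,
                   token: str | None = None) -> bool:
    """Record an import if the export is public or the importer's token matches."""
    exp = exporter.exports.get(subject)
    if exp is None:
        return False
    if not exp.public:
        expected = exp.tokens.get(importer.name)
        if token is None or expected is None or not secrets.compare_digest(token, expected):
            return False  # private export without a valid activation token
    importer.imports.add((exporter.name, subject))
    return True
```

Tokens are bound to a named importer here, so a token leaked to another tenant does not let that tenant import the private subject.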
  7. In a multi-tenant system that sits on top of a central messaging system, our main concerns included not just the security of clients and message traffic but also configuration maintenance. If we had to rewrite a configuration file and send an update signal to a server every time we added or removed a tenant, maintenance would become a nightmare, and even more so with two multi-tenant systems running in the same cluster (e.g., K8s tenants coexisting with ECS tenants). In a decentralized model, clients authenticate to NATS with signed user JWTs. There is a hierarchy that goes from Operator to Account to User: in NATS, an account is a unit of isolation and a user is a unit of client authentication and authorization. This decentralized security model also solves a number of other problems we would inevitably have run into.
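Treating an account as a unit of isolation means two tenants can use the same subject name without seeing each other's traffic, and adding a tenant is a runtime operation rather than a config-file rewrite. The toy broker below is a hypothetical Python sketch of that idea only: it omits the signed JWTs and keys that NATS actually uses to authenticate users and accounts, and every class and method name is invented.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class AccountScopedBroker:
    """Hypothetical broker in which each account has a private subject namespace."""

    def __init__(self, operator: str) -> None:
        self.operator = operator  # top of the Operator -> Account -> User hierarchy
        self._accounts: set[str] = set()
        self._user_account: dict[str, str] = {}  # user -> owning account
        self._subs: defaultdict[tuple[str, str], list[Callable[[str], None]]] = defaultdict(list)

    def add_account(self, account: str) -> None:
        self._accounts.add(account)  # no config rewrite or server restart

    def add_user(self, user: str, account: str) -> None:
        if account not in self._accounts:
            raise KeyError(f"unknown account {account!r}")
        self._user_account[user] = account

    def subscribe(self, user: str, subject: str, handler: Callable[[str], None]) -> None:
        # Subscriptions are keyed by (account, subject), never by subject alone.
        self._subs[(self._user_account[user], subject)].append(handler)

    def publish(self, user: str, subject: str, data: str) -> int:
        handlers = self._subs.get((self._user_account[user], subject), [])
        for handler in handlers:
            handler(data)
        return len(handlers)
```

Crossing the account boundary would require an explicit export and import, as described in the previous note; this sketch deliberately leaves that out.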
  8. RabbitMQ is open-source message-broker software (sometimes called message-oriented middleware) that originally implemented the Advanced Message Queuing Protocol (AMQP) and has since been extended with a plug-in architecture to support the Streaming Text Oriented Messaging Protocol (STOMP), Message Queuing Telemetry Transport (MQTT), and other protocols. The output of RabbitMQ design:
  9. One of Kafka's best features is that it is highly available, resilient to node failures, and supports automatic recovery. This makes Apache Kafka well suited to communication and integration between the components of large-scale, real-world data systems.
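Much of the availability described in this note comes from partition replication. As a simplified back-of-the-envelope sketch in Python (the function name is hypothetical), a partition with replication factor RF can keep serving committed data after losing up to RF - 1 brokers, while producers using `acks=all` can keep writing only as long as at least `min.insync.replicas` replicas remain in sync. The sketch assumes every replica is in sync when failures start and ignores unclean leader election.

```python
from __future__ import annotations


def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> dict[str, int]:
    """Broker failures one partition can absorb, assuming every replica starts in sync."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return {
        # Committed data stays available while any in-sync replica survives to lead.
        "reads": replication_factor - 1,
        # acks=all writes are rejected once fewer than min.insync.replicas remain.
        "acks_all_writes": replication_factor - min_insync_replicas,
    }
```

With the frequently recommended RF = 3 and `min.insync.replicas` = 2, one broker can fail without interrupting `acks=all` producers, and committed data survives the loss of two.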
  10. Point-to-point operations are generally synchronous, though you can accomplish some decent asynchronous operations with gRPC streaming. Finally, point-to-point means that no other interested party can become aware of a communication unless the sender goes out of its way to make multiple point-to-point connections or emit secondary events. Our thinking is: if you're going to emit secondary events anyway, why not build the entire substrate out of asynchronous messaging and skip point-to-point altogether? Service discovery, especially explicit discovery that requires a discovery broker like Netflix Eureka, introduces a new single point of failure to the entire system and, even when working perfectly, adds the latency cost of at least one more network hop (and if you cache discovery results, you have to deal with the consequences of outdated discovery data).
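The contrast with point-to-point calls is that a publisher sends once and every interested party receives the message, including parties the sender has never heard of. Here is a minimal in-process Python sketch of that fan-out; `Bus` and the subject names are hypothetical, and a real broker would deliver over the network and asynchronously rather than by direct function calls.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class Bus:
    """Hypothetical in-process publish/subscribe bus (synchronous delivery for clarity)."""

    def __init__(self) -> None:
        self._subs: defaultdict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Callable[[str], None]) -> None:
        self._subs[subject].append(handler)

    def publish(self, subject: str, data: str) -> int:
        # The publisher never names its receivers, so adding a subscriber
        # requires no change on the sending side and no discovery lookup.
        handlers = self._subs.get(subject, [])
        for handler in handlers:
            handler(data)
        return len(handlers)
```

A second subscriber that joins later starts receiving events immediately, without the publisher opening another connection or emitting a secondary event.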
  11. Because of the history and precedent of using Kafka within Capital One, including its role as the backbone behind the Streaming Data Platform (SDP), we considered using Kafka for our broker. Once we decided that we wanted to take advantage of a message broker and all of the asynchronous power that comes with it, we needed to pick which broker. There are several critical reasons why we decided against Kafka. First and foremost, we wanted a low operations burden, and Kafka is anything but that. Further, we needed the ability to scale our services and to dynamically add new topics and new subscribers live, at runtime, in production, without delicate reconfiguration. Because of the way Kafka works, we would have had to reconfigure partitions and topics manually or through some form of potentially brittle automation; you can't simply scale subscribers and publishers up and down without altering the Kafka configuration accordingly. We also needed incredibly fast request-response performance: we wanted the flexibility of an asynchronous substrate without sacrificing synchronous point-to-point performance. We could not get that with Kafka, and NATS outperformed Kafka for non-durable messages in every benchmark.
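One concrete reason subscribers can't simply be scaled up in Kafka: within a consumer group, each partition is consumed by exactly one member, so the partition count caps useful parallelism. The Python sketch below uses a simplified round-robin assignment (Kafka ships several assignor strategies; `assign_round_robin` is a hypothetical stand-in for them) to show an extra consumer sitting idle.

```python
from __future__ import annotations


def assign_round_robin(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Simplified assignment of one topic's partitions to a consumer group.

    Each partition goes to exactly one consumer, so at most `num_partitions`
    consumers receive work; the rest stay idle until partitions are added.
    """
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment
```

Adding partitions to put the idle consumer to work is exactly the kind of reconfiguration this note describes, and Kafka does not allow a topic's partition count to be reduced afterwards.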
  12. With RabbitMQ, clients must explicitly define the queues, subscriptions, and exchanges they use when they connect, which can create problems in multi-tenant systems. We needed a system where we could dynamically scale the number of instances of a queue subscriber AND add more subscribers to the same queue without negatively impacting existing service or requiring a reconfiguration (manual or automatic) of the message broker.
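The requirement in this note, adding subscribers to the same queue at runtime with no broker reconfiguration, is what NATS queue groups provide: each message on a subject is delivered to one member of the group, and members can join or leave at any time. The Python model below is hypothetical (`QueueGroups` and its methods are invented) and uses round-robin selection only so the result is deterministic; the NATS server makes its own choice of member.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class QueueGroups:
    """Hypothetical model of queue-group delivery: one member per message."""

    def __init__(self) -> None:
        self._members: defaultdict[tuple[str, str], list[Handler]] = defaultdict(list)
        self._turn: defaultdict[tuple[str, str], int] = defaultdict(int)

    def join(self, subject: str, group: str, handler: Handler) -> None:
        self._members[(subject, group)].append(handler)  # scale out at runtime

    def leave(self, subject: str, group: str, handler: Handler) -> None:
        self._members[(subject, group)].remove(handler)  # scale in at runtime

    def publish(self, subject: str, data: str) -> None:
        for (subj, group), members in self._members.items():
            if subj != subject or not members:
                continue
            # Exactly one member of each group on this subject gets the message.
            key = (subj, group)
            members[self._turn[key] % len(members)](data)
            self._turn[key] += 1
```

Note that nothing about the broker changes when a worker joins or leaves; the group's membership is the only state that moves.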
  13. Because the client list is external to the message broker (a 1:1 correlation with tenant services), this security information needs to be injectable into the broker cluster, no matter how many broker instances are running, without ever taking the broker down in production. NATS security not only gives us this but also lets us work with nkeys, an incredibly powerful Ed25519-based public-key signature system that is less vulnerable to attack than traditional SSH keys and that allows security information to flow easily from a Kubernetes secret to tenant services and the broker configuration.
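Because nkeys are plain strings, a seed can travel from a Kubernetes secret into a service as a mounted file. The Python sketch below (all names hypothetical) checks only the string shape of an nkey: public keys are 56 base32 characters whose first letter gives the role (for example `O` operator, `A` account, `U` user, `N` server), and seeds are 58 characters starting with `S` followed by the role letter. It does not verify the embedded checksum or perform any Ed25519 signing; an official NATS nkeys library is the right tool for that.

```python
from __future__ import annotations

import re

ROLE_PREFIXES = {"O": "operator", "A": "account", "U": "user", "N": "server", "C": "cluster"}
_BASE32 = re.compile(r"[A-Z2-7]+")  # RFC 4648 alphabet, no padding


def classify_nkey(key: str) -> tuple[str, str] | None:
    """Return ("public" | "seed", role) from an nkey's shape, or None if it doesn't fit.

    Shape check only: the trailing CRC16 and the key material are not verified.
    """
    key = key.strip()
    if not _BASE32.fullmatch(key):
        return None
    if len(key) == 58 and key[0] == "S" and key[1] in ROLE_PREFIXES:
        return ("seed", ROLE_PREFIXES[key[1]])
    if len(key) == 56 and key[0] in ROLE_PREFIXES:
        return ("public", ROLE_PREFIXES[key[0]])
    return None


def load_seed(path: str, expected_role: str = "user") -> str:
    """Read a seed from a mounted secret file and reject anything of the wrong kind."""
    with open(path, encoding="ascii") as fh:
        seed = fh.read().strip()
    if classify_nkey(seed) != ("seed", expected_role):
        # Never echo the seed itself: it is the private key.
        raise ValueError(f"{path} does not contain a {expected_role} seed")
    return seed
```

A tenant service might call `load_seed("/etc/nats/creds/user.nk")`, where that path is a hypothetical mount point for its Kubernetes secret; only the matching public key would ever need to reach the broker side.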