William McKnight
www.mcknightcg.com
214-514-1444
Trends in Streaming Analytics and
Message-oriented Middleware
@williammcknight
The ETL Legacy
• An ad hoc manner of connecting sources and destinations
• ETL surfaced in the 1990s
– Far fewer data platforms and types
– Built for DW
– Bottleneck in DW population
– Time and Resource intensive
– Batch
• Can be chaotic and unmanageable
EAI
• Then came EAI
– Facilitate exchange of business transactions messages between
applications
– Used Enterprise Service classes underneath the covers
– Works for small scale data
– Not designed to handle the breadth of data that modern workloads
require, such as sensor data
Modern Realities of Data Integration
• Desire for consolidated methods for data integration
• New types of data sources
– Logs, sensors, etc.
• We have more than OLTP and OLAP
– Distributed data platforms
• Desire for real-time data
• High-velocity data increasingly needs integration
• Traditional approaches, without Stream Processing, turn
into ETL+custom scripts+middleware+MQ
Streaming: Real-Time and Scalable
• Streaming is forward-thinking
• Real-time and scale are becoming the rule, not the exception
[Figure: weekly timeline (Sun to Sat) contrasting batch and real-time processing against scalability, with ETL, EAI, and the streaming platform as labels]
Point to point
• Old way
• Add another database? Repeat process
[Figure: point-to-point integration, with ERP, CRM, Financials, and HR sources feeding staging tables, BI tools, OLAP clients, and physical OLAP cubes]
ETL is Insufficient for this combination
• Data platforms operating at an enterprise-wide scale
• A high variety of data sources
• Real-time/streaming data
• ETL forces either real-time loading without being scalable or
scalability with batch loading
– Data, produced from numerous sources, is a torrent of flowing
information, needing to be timestamped, dispatched, and even
duplicated (to protect against data loss)
– A postman is needed to distribute data from message senders to
receivers at the right place at the right time.
Real-Time Data
• A.k.a. messaging, live feeds, real-time, event-driven
• Comes in continuously and often quickly, so we also call it
streaming data
• Needs special attention and can be of immense value, but
only if we are alerted in time
• Foundation for Artificial Intelligence excellence
– Stream data forms the core of data for artificial intelligence
Message Brokers
• Message Brokers are a way of decoupling the sending and
receiving services through the concept of Publish &
Subscribe
• Another thing Message Brokers do is queue or retain the
message till the consumer picks it up
• Streaming allows us to have both Pub-Sub and queuing
features (historically, such brokers supported only one or
the other)
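The difference between the two delivery models can be sketched in a few lines. This is a minimal in-memory illustration, not any broker's actual API; the class names `Topic` and `WorkQueue` and the subscriber names are hypothetical.

```python
from collections import deque


class Topic:
    """Publish-subscribe: every subscriber receives its own copy of each message."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        self.subscribers[name] = deque()

    def publish(self, message):
        for inbox in self.subscribers.values():
            inbox.append(message)

    def poll(self, name):
        inbox = self.subscribers[name]
        return inbox.popleft() if inbox else None


class WorkQueue:
    """Queuing: a message is retained until exactly one consumer takes it."""

    def __init__(self):
        self.messages = deque()

    def publish(self, message):
        self.messages.append(message)

    def poll(self):
        return self.messages.popleft() if self.messages else None


topic = Topic()
topic.subscribe("billing")
topic.subscribe("audit")
topic.publish("order-1")
fanout = [topic.poll("billing"), topic.poll("audit")]  # both get a copy

queue = WorkQueue()
queue.publish("order-1")
competing = [queue.poll(), queue.poll()]  # only the first poll gets it
```

A streaming platform that offers both models lets the same published data be fanned out to some consumers and load-shared among others.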
Streaming Architecture
[Figure: a central streaming platform linking apps, the DW, and Hadoop through change logs, streaming data pipelines, messaging / stream processing, and request-response]
All Data Can Be Represented as Streams
[Figure: a central streaming platform connected to the DW, Hadoop, RDBMS, NoSQL, apps, real-time analytics, search, and monitoring]
Streaming Data
• Unbounded, continuous flow of real-time records
• Stream APIs transform and enrich data
• Millisecond latency
• Stateless or stateful
• Incorporate data into your applications; deploy anywhere,
including containers
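As a rough illustration of stateless versus stateful processing, the sketch below chains two Python generators over a stand-in feed. The sensor names, readings, and function names are invented for the example, and a real stream API would of course run against an unbounded source.

```python
def to_celsius(readings):
    """Stateless: each record is transformed on its own."""
    for sensor, fahrenheit in readings:
        yield sensor, round((fahrenheit - 32) * 5 / 9, 1)


def running_max(readings):
    """Stateful: the output depends on records seen earlier on the stream."""
    highest = {}
    for sensor, value in readings:
        highest[sensor] = max(value, highest.get(sensor, value))
        yield sensor, highest[sensor]


def sensor_feed():
    # Stand-in for an unbounded source; a real feed would never end.
    yield from [("a", 68.0), ("b", 50.0), ("a", 59.0), ("a", 86.0)]


# Records flow through both stages one at a time, never as a full batch.
results = list(running_max(to_celsius(sensor_feed())))
```

The stateless stage can be scaled out freely, while the stateful stage needs its per-sensor state kept somewhere durable if the job is to recover after a failure.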
Enter Message-Oriented Middleware aka Streaming and
message queuing technology
• Messages can be any kind of data wrapped in a neat package
with a very simple header as a bow on top.
• Messages are sent by “producers”—systems, sensors, or
devices that generate the messages—toward a “broker.”
• A broker does not process the messages, but instead routes
them into queues according to the information enclosed in the
message header or its own routing process.
• Then “consumers” retrieve the messages from the queues to
which they subscribe (although sometimes messages are
pushed to consumers rather than pulled).
• The consumers open the messages and perform some kind of
action on them.
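The producer, broker, and consumer roles described above can be sketched as follows. This is a toy in-memory model under simple assumptions: the `Message` and `Broker` classes, the `"queue"` header field, and the sensor payloads are all hypothetical and do not mirror any particular product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    header: dict   # small routing envelope
    body: bytes    # opaque payload; the broker never inspects it


class Broker:
    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, message):
        # Route on header information only, without processing the body.
        self.queues[message.header["queue"]].append(message)

    def receive(self, queue_name):
        # Pull model: the consumer asks for the next message.
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None


# Producers (for example, sensors) send messages toward the broker.
broker = Broker()
broker.send(Message({"queue": "temperature"}, b"21.5"))
broker.send(Message({"queue": "humidity"}, b"40"))
broker.send(Message({"queue": "temperature"}, b"22.0"))

# A consumer subscribed to "temperature" opens each message and acts on it.
readings = []
while (msg := broker.receive("temperature")) is not None:
    readings.append(float(msg.body.decode("utf-8")))
```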
Streaming solutions
Intelligent data platform for fast data:
Connect, process, and store data in real-time
…in a unified, flexible solution
…able to meet demanding SLAs even at scale
…without operational burdens and complexity
Performance and scalability in streaming
• Storage: ability to retain varying volumes of messages for
varying lengths of time
• Throughput: high, sustainable rate of message processing
• Latency: fast, consistent responsiveness for publishing and
consumption
• Operations: minimizing operational burden for scaling,
tuning, and monitoring
Comprehensive capabilities
• Stream-native functions: apply processing functions on data
• Multi-tenancy: a single cluster can support many tenants and
use cases
• Durability: data replicated and synced to disk
• Geo-replication: out-of-the-box support for geographically
distributed applications
• Unified messaging model: support for both topic and queue
semantics in a single model
• Delivery guarantees: at least once, at most once, and
effectively once
• Scalability: supports millions of topics in a single cluster
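One common way to get "effectively once" on top of at-least-once delivery is to make the consumer idempotent. The sketch below assumes each message carries a unique id; the class name and the sample deliveries are hypothetical, and a production system would keep the seen-id set in durable storage.

```python
class IdempotentConsumer:
    """Applies each message id once, even when the broker redelivers it."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def handle(self, message_id, amount):
        if message_id in self.seen_ids:
            return False          # duplicate delivery: acknowledge, do not reapply
        self.seen_ids.add(message_id)
        self.total += amount
        return True


# At-least-once delivery: message 2 arrives twice after a simulated lost ack.
deliveries = [(1, 10), (2, 5), (2, 5), (3, 7)]
consumer = IdempotentConsumer()
applied = [consumer.handle(mid, amt) for mid, amt in deliveries]
```

At-most-once would instead acknowledge before processing and accept that a crash can lose the message.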
Apache Kafka
• Open source streaming platform developed at LinkedIn
• A distributed publish-subscribe messaging system that maintains feeds of
messages called topics
– Publishers write data to topics and subscribers read from topics
– Kafka topics are partitioned and replicated across multiple brokers in the Kafka
cluster
• Enables “source to sink” data pipelines
• Kafka messages are simple byte arrays that can store objects in
virtually any format, often JSON, with a key attached to each message
• E&L in ETL through Kafka Connect API
• T in ETL through Kafka Streams API
• Fault-tolerant
• DIY
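To make the keyed, byte-array message model concrete, here is a small sketch that does not use a Kafka client at all. It serializes JSON values to bytes and assigns each message to a partition from a hash of its key. The partition count, event data, and function names are assumptions, and CRC32 is only a stand-in: Kafka's own Java client uses a different hash (murmur2) by default.

```python
import json
import zlib

NUM_PARTITIONS = 3  # hypothetical topic size


def encode(key, value):
    """Serialize a key and a JSON value to the raw bytes a topic would store."""
    return key.encode("utf-8"), json.dumps(value, sort_keys=True).encode("utf-8")


def choose_partition(key_bytes, num_partitions=NUM_PARTITIONS):
    # CRC32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(key_bytes) % num_partitions


events = [("user-1", {"page": "/home"}), ("user-2", {"page": "/cart"}),
          ("user-1", {"page": "/checkout"})]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    key_bytes, value_bytes = encode(key, value)
    partitions[choose_partition(key_bytes)].append(value_bytes)
```

Because the partition depends only on the key, all events for one user land in the same partition and keep their relative order there.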
Sources and Sinks
[Figure: a source and a sink, each connected through the Connect API]
Application programming interfaces
• A ubiquitous method and de facto standard of communication among modern
information technologies.
• APIs have begun to replace older, more cumbersome methods of information
sharing with lightweight endpoints.
• Due to the popularity and proliferation of APIs and microservices, the need has
arisen to manage the multitude of services a company relies on—both internal
and external.
• Organizations depend on these services to be properly managed, with high
performance and availability.
API & Microservices Ecosystem
• Public: over 20,000 public APIs*
• Private - External: external partners
• Private - Internal: connected apps and data
*according to https://www.programmableweb.com/apis/directory
The Need for Management
• HTTP Basic Auth
• OAuth2.0
• OpenID
• API Keys
• Test / Production
• Rate limiting
• Analytics
• Transformations
• Quotas
• Caching
• CORS
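Of the concerns listed above, rate limiting is the easiest to show in code. The sketch below uses a token bucket, which is one common technique and not necessarily what any given API management product implements. The class name, rate, and capacity are illustrative assumptions.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1, capacity=2)
# Timestamps are passed in explicitly so the example is deterministic.
decisions = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.5)]
```

The first two requests use up the burst allowance, the third is rejected, and the fourth succeeds once enough time has passed to refill a token.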
Platform Architecture
[Figure: clients 1..n reach API endpoints 1..n through a load balancer (Nginx, HAProxy, ELB, etc.), backed by API nodes and a database back end]
API Requirements
• Performance: Good for high performance
workloads (>1,000 TPS)
• Reliability: All workloads completed with 100%
message completion
• Complexity: Multiple plugins enabled
RabbitMQ
• Open source message broker platform
• Created in 2007 and managed by Pivotal Software
• Uses an exchange to receive messages from producers and route them to queues,
from which they are pushed to the registered consumers
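• The broker pushes messages toward the consumers; queues are first-in, first-out,
although delivery order is not guaranteed once messages are requeued or shared
among competing consumers.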
• Brokers are persistently connected to consumers, and they know which ones are
subscribed to which queues
• Consumers cannot fetch specific messages and should not rely on strict ordering
– unaware of the queue state
• Messages, queues, and exchanges do not persist unless otherwise instructed.
– If a broker is restarted or fails, the messages are lost
– Has settings to make both queues and messages durable. Moreover, non-critical messages
can be tagged by the producer to not be sent to a durable queue
• Allows producers’ and consumers’ code to declare new queues and exchanges
• Several replication and load balancing alternatives
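The persistence rules above can be modeled with a toy broker. This is an in-memory sketch of the behavior only, not RabbitMQ's client API; the `ToyBroker` class, queue names, and messages are hypothetical.

```python
class ToyBroker:
    """In-memory model of durable vs. transient queues and messages."""

    def __init__(self):
        self.queues = {}  # name -> {"durable": bool, "messages": [(body, persistent)]}

    def declare_queue(self, name, durable=False):
        self.queues.setdefault(name, {"durable": durable, "messages": []})

    def publish(self, queue, body, persistent=False):
        self.queues[queue]["messages"].append((body, persistent))

    def restart(self):
        # Only durable queues survive, and within them only persistent messages.
        self.queues = {
            name: {"durable": True,
                   "messages": [m for m in q["messages"] if m[1]]}
            for name, q in self.queues.items() if q["durable"]
        }


broker = ToyBroker()
broker.declare_queue("orders", durable=True)
broker.declare_queue("metrics")                    # transient queue
broker.publish("orders", "order-1", persistent=True)
broker.publish("orders", "heartbeat")              # non-critical, not persisted
broker.publish("metrics", "cpu=0.4", persistent=True)
broker.restart()
```

After the simulated restart only the persistent message in the durable queue remains; a persistent message sent to a transient queue is lost along with the queue.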
Amazon Kinesis
• Similar to Kafka
• In an enterprise-ready package
• Amazon users pay by the shard-hour and by payload
Apache Pulsar
• Originally developed at Yahoo
• Began its incubation at Apache in late 2016
• Has been in production at Yahoo since 2013
• Utilized in popular services and applications like Yahoo! Mail, Finance, Sports,
Flickr, Gemini Ads, and Sherpa
• Follows the publisher-subscriber model (pub-sub), and has the same producers,
topics, and consumers as some of the aforementioned technologies
• Uses built-in multi-datacenter replication
• Architected for multi-tenancy and uses concepts of properties and namespaces
Streamlio
• Enterprise-ready deployment of Pulsar
• Unified solution for connecting, processing and storing fast-moving
data
• The unified messaging model has three components:
– Consumption
– Acknowledgement
– Retention
• Three modes of subscription: exclusive, failover, and shared.
• Supports both persistent and non-persistent states.
• Has a configurable time-to-live (TTL) feature that can be set to handle
messages that have not been consumed.
• A unified platform gives enterprises the best of both the streaming
and message queuing worlds.
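The three subscription modes differ in how messages on one subscription are spread across its consumers. The function below is a simplified sketch of that dispatch logic, not the Pulsar client API; the function name, consumer names, and round-robin policy for shared mode are assumptions made for illustration.

```python
def dispatch(messages, consumers, mode):
    """Return {consumer: [messages]} for one subscription."""
    received = {c: [] for c in consumers}
    if mode in ("exclusive", "failover"):
        # One active consumer gets everything. In failover mode the others
        # stand by to take over (the takeover itself is not modeled here).
        # Exclusive mode would normally refuse a second consumer;
        # in this sketch any extra consumer is simply left idle.
        received[consumers[0]].extend(messages)
    elif mode == "shared":
        # Messages are spread round-robin across all consumers.
        for i, message in enumerate(messages):
            received[consumers[i % len(consumers)]].append(message)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return received


msgs = ["m1", "m2", "m3", "m4"]
exclusive = dispatch(msgs, ["c1"], "exclusive")
failover = dispatch(msgs, ["c1", "c2"], "failover")
shared = dispatch(msgs, ["c1", "c2"], "shared")
```

Exclusive and failover preserve ordering through a single active consumer (streaming semantics), while shared trades ordering for parallelism (queuing semantics).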
Workloads are Distinguished by
• The number of topics
• The size of the messages being produced and consumed
• The number of subscriptions per topic
• The number of producers per topic
• The rate at which producers produce messages (per
second)
• The size of the consumer’s backlog (in gigabytes)
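These dimensions combine into rough sizing figures with simple arithmetic. The sketch below is a back-of-the-envelope calculation under stated assumptions: every producer publishes at the same per-producer rate, every subscription reads every message, and all numbers are made up for the example.

```python
def workload_profile(topics, producers_per_topic, msgs_per_sec, msg_bytes,
                     subscriptions_per_topic):
    # msgs_per_sec is the rate of a single producer.
    publish_rate = topics * producers_per_topic * msgs_per_sec      # messages/s
    ingress = publish_rate * msg_bytes                               # bytes/s in
    egress = ingress * subscriptions_per_topic                       # bytes/s out
    return {"publish_rate": publish_rate, "ingress_mb_s": ingress / 1e6,
            "egress_mb_s": egress / 1e6}


def backlog_gb(ingress_mb_s, outage_seconds):
    """Backlog that builds up while a subscription's consumers are offline."""
    return ingress_mb_s * outage_seconds / 1000


profile = workload_profile(topics=10, producers_per_topic=4, msgs_per_sec=500,
                           msg_bytes=1_000, subscriptions_per_topic=2)
backlog = backlog_gb(profile["ingress_mb_s"], outage_seconds=600)
```

With these inputs the cluster takes in 20 MB/s, serves 40 MB/s, and a ten-minute consumer outage leaves a 12 GB backlog per subscription.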
Creating a Streaming Application
• Configure the Application
• Serialize data
• Set up tables for change logs
Migrating ETL to Stream Processing
• Sessionization of event data
• Tools to acquire:
– Message bus
– Data storage (e.g., HDFS with S3)
– Operations support
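Sessionization means grouping a user's events into sessions separated by periods of inactivity. The sketch below is one minimal way to do it, assuming events arrive ordered by timestamp within each user and using a 30-minute gap; the function name, gap value, and sample events are illustrative.

```python
def sessionize(events, gap_seconds=1800):
    """Group (user, timestamp) events into sessions split by inactivity gaps.

    Assumes events arrive ordered by timestamp within each user.
    Returns (user, start, end, event_count) tuples.
    """
    open_sessions = {}   # user -> [start, end, count]
    closed = []
    for user, ts in events:
        session = open_sessions.get(user)
        if session and ts - session[1] <= gap_seconds:
            session[1] = ts
            session[2] += 1
        else:
            if session:
                closed.append((user, *session))
            open_sessions[user] = [ts, ts, 1]
    # Flush whatever is still open at the end of the input.
    closed.extend((user, *s) for user, s in open_sessions.items())
    return closed


events = [("u1", 0), ("u1", 600), ("u2", 700), ("u1", 4000), ("u2", 900)]
sessions = sessionize(events)
```

In a batch ETL job this grouping runs over a full day of data; in a stream processor the open sessions become state that is updated as each event arrives.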
Biggest Challenges in Streaming
• Getting data live at scale
• Accenting data with metadata
• Misordered events
• Job recovery
• High operational workload
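Misordered events are often handled by buffering briefly and releasing events only once a watermark has passed them. The sketch below shows that idea with a heap; it is one possible approach under a fixed maximum-delay assumption, and the function name and sample events are hypothetical.

```python
import heapq


def reorder(events, max_delay):
    """Yield (timestamp, value) events in timestamp order.

    An event is held until the highest timestamp seen so far is at least
    `max_delay` ahead of it. Events arriving behind that watermark are dropped.
    """
    buffer = []
    watermark = float("-inf")
    for ts, value in events:
        if ts < watermark:
            continue  # arrived too late; a real pipeline might route it elsewhere
        heapq.heappush(buffer, (ts, value))
        watermark = max(watermark, ts - max_delay)
        while buffer and buffer[0][0] <= watermark:
            yield heapq.heappop(buffer)
    # End of input: release everything still buffered.
    while buffer:
        yield heapq.heappop(buffer)


arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (0, "late")]
ordered = list(reorder(arrivals, max_delay=2))
```

A larger `max_delay` tolerates more disorder at the cost of added latency, which is the usual trade-off when tuning this kind of buffer.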
Future of Data Integration
[Figure: a source and a destination joined to the streaming solution through the Connect API, with the Streams API serving app transformations; the streaming platform also connects to the DW, Hadoop, RDBMS, NoSQL, apps, real-time analytics, search, and monitoring]
In Conclusion
• Streaming and message queuing have lasting value to
organizations.
• They will be as prevalent as ETL was and is in the world of data
warehousing and integration.
• APIs have begun to replace older, more cumbersome methods
of information sharing with lightweight endpoints.
• Streaming and messaging will be able to meet the data
volume, variety, and timing requirements of the coming years.
• Data-driven organizations will benefit from these technologies
because they will be able to ingest data and operate at a scale
that would have been practically impossible just a few years
ago.
Second Thursday of
Every Month, at 2:00 ET
Presented by: William McKnight
President, McKnight Consulting Group
www.mcknightcg.com (214) 514-1444
#AdvAnalytics

More Related Content

What's hot

Core banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov OsloCore banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov OsloAlexander Petrov
 
The Need to Know for Information Architects: Big Data to Big Information
The Need to Know for Information Architects: Big Data to Big InformationThe Need to Know for Information Architects: Big Data to Big Information
The Need to Know for Information Architects: Big Data to Big InformationDATAVERSITY
 
Strategic imperative the enterprise data model
Strategic imperative the enterprise data modelStrategic imperative the enterprise data model
Strategic imperative the enterprise data modelDATAVERSITY
 
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsDATAVERSITY
 
DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...DATAVERSITY
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesDATAVERSITY
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyDATAVERSITY
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...DLT Solutions
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
Do-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance FrameworkDo-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance FrameworkDATAVERSITY
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
Metadata Governance for Vocabularies, Dictionaries, and Data
Metadata Governance for Vocabularies, Dictionaries, and DataMetadata Governance for Vocabularies, Dictionaries, and Data
Metadata Governance for Vocabularies, Dictionaries, and DataDATAVERSITY
 
Data-Ed Online Webinar: Metadata Strategies
Data-Ed Online Webinar: Metadata StrategiesData-Ed Online Webinar: Metadata Strategies
Data-Ed Online Webinar: Metadata StrategiesDATAVERSITY
 
Next generation Data Governance
Next generation Data GovernanceNext generation Data Governance
Next generation Data GovernanceVladimiro Borsi
 
Cloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitCloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitMing Yuan
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
RWDG Slides: Data Governance and Three Levels of Metadata Management
RWDG Slides: Data Governance and Three Levels of Metadata ManagementRWDG Slides: Data Governance and Three Levels of Metadata Management
RWDG Slides: Data Governance and Three Levels of Metadata ManagementDATAVERSITY
 
Information & Data Architecture
Information & Data ArchitectureInformation & Data Architecture
Information & Data ArchitectureSammer Qader
 

What's hot (20)

Core banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov OsloCore banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
 
The Need to Know for Information Architects: Big Data to Big Information
The Need to Know for Information Architects: Big Data to Big InformationThe Need to Know for Information Architects: Big Data to Big Information
The Need to Know for Information Architects: Big Data to Big Information
 
Strategic imperative the enterprise data model
Strategic imperative the enterprise data modelStrategic imperative the enterprise data model
Strategic imperative the enterprise data model
 
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced AnalyticsADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
ADV Slides: What Happened of Note in 1H 2020 in Enterprise Advanced Analytics
 
DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...
 
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDM Radio Webinar: Adopting a Streaming-Enabled Architecture
DM Radio Webinar: Adopting a Streaming-Enabled Architecture
 
Slides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data LakesSlides: Accelerating Queries on Cloud Data Lakes
Slides: Accelerating Queries on Cloud Data Lakes
 
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data StrategyBecoming a Data-Driven Organization - Aligning Business & Data Strategy
Becoming a Data-Driven Organization - Aligning Business & Data Strategy
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
Bringing Strategy to Life: Using an Intelligent Data Platform to Become Data ...
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Do-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance FrameworkDo-It-Yourself (DIY) Data Governance Framework
Do-It-Yourself (DIY) Data Governance Framework
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
Metadata Governance for Vocabularies, Dictionaries, and Data
Metadata Governance for Vocabularies, Dictionaries, and DataMetadata Governance for Vocabularies, Dictionaries, and Data
Metadata Governance for Vocabularies, Dictionaries, and Data
 
Data-Ed Online Webinar: Metadata Strategies
Data-Ed Online Webinar: Metadata StrategiesData-Ed Online Webinar: Metadata Strategies
Data-Ed Online Webinar: Metadata Strategies
 
Next generation Data Governance
Next generation Data GovernanceNext generation Data Governance
Next generation Data Governance
 
Cloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitCloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummit
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
RWDG Slides: Data Governance and Three Levels of Metadata Management
RWDG Slides: Data Governance and Three Levels of Metadata ManagementRWDG Slides: Data Governance and Three Levels of Metadata Management
RWDG Slides: Data Governance and Three Levels of Metadata Management
 
Information & Data Architecture
Information & Data ArchitectureInformation & Data Architecture
Information & Data Architecture
 

Similar to ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware

Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...confluent
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibDavid Nzoputa Ofili
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaAndrew Schofield
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?OVHcloud
 
Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...
Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...
Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...IndicThreads
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1lisanl
 
Big SQL 3.0 - Fast and easy SQL on Hadoop
Big SQL 3.0 - Fast and easy SQL on HadoopBig SQL 3.0 - Fast and easy SQL on Hadoop
Big SQL 3.0 - Fast and easy SQL on HadoopWilfried Hoge
 
The Overview of Microservices Architecture
The Overview of Microservices ArchitectureThe Overview of Microservices Architecture
The Overview of Microservices ArchitectureParia Heidari
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
API’s and Micro Services 0.5
API’s and Micro Services 0.5API’s and Micro Services 0.5
API’s and Micro Services 0.5Richard Hudson
 
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...StreamNative
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...confluent
 
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Denodo
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Societyconfluent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 

Similar to ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware (20)

Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLib
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...
Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...
Power Your Mobile Applications On The Cloud [IndicThreads Mobile Application ...
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
 
Big SQL 3.0 - Fast and easy SQL on Hadoop
Big SQL 3.0 - Fast and easy SQL on HadoopBig SQL 3.0 - Fast and easy SQL on Hadoop
Big SQL 3.0 - Fast and easy SQL on Hadoop
 
The Overview of Microservices Architecture
The Overview of Microservices ArchitectureThe Overview of Microservices Architecture
The Overview of Microservices Architecture
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
API’s and Micro Services 0.5
API’s and Micro Services 0.5API’s and Micro Services 0.5
API’s and Micro Services 0.5
 
Microservices-101
Microservices-101Microservices-101
Microservices-101
 
Big Data
Big DataBig Data
Big Data
 
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
How Tencent Applies Apache Pulsar to Apache InLong —— A Streaming Data Integr...
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
 
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 

ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware

  • 1. William McKnight www.mcknightcg.com 214-514-1444 Trends in Streaming Analytics and Message-oriented Middleware @williammcknight
  • 2. The ETL Legacy • An ad hoc manner of connecting sources and destinations • ETL surfaced in the 1990s – Far fewer data platforms and types – Built for DW – Bottleneck in DW population – Time and resource intensive – Batch • Can be chaotic and unmanageable
  • 3. EAI • Then came EAI – Facilitates the exchange of business transaction messages between applications – Used Enterprise Service classes under the covers – Works for small-scale data – Not designed to handle the span of data required today, such as sensor data
  • 4. Modern Realities of Data Integration • Desire for consolidated methods for data integration • New types of data sources – Logs, sensors, etc. • We have more than OLTP and OLAP – Distributed data platforms • Desire for real-time data • High-velocity data increasingly needs integration • Traditional approaches, without stream processing, turn into ETL + custom scripts + middleware + MQ
  • 5. Streaming: Real-Time and Scalable • Streaming is forward-thinking • Real-time and scale are becoming the rule, not the exception [Figure: ETL, EAI, and streaming platform plotted on batch vs. real-time and scalability axes]
  • 6. Point to point • Old way • Add another database? Repeat process [Diagram labels: ERP; CRM; Financials; HR; Staging Tables; Physical OLAP Cubes; Physical Object; BI Tools; BI Tools/OLAP Clients]
  • 7. ETL is Insufficient for this combination • Data platforms operating at an enterprise-wide scale • A high variety of data sources • Real-time/streaming data • ETL forces either real-time loading without being scalable or scalability with batch loading – Data, produced from numerous sources, is a torrent of flowing information, needing to be timestamped, dispatched, and even duplicated (to protect against data loss) – A postman is needed to distribute data from message senders to receivers at the right place at the right time.
  • 8. Real-Time Data • A.k.a. messaging, live feeds, real-time, event-driven • Comes in continuously and often quickly, so we also call it streaming data • Needs special attention and can be of immense value, but only if we are alerted in time • Foundation for artificial intelligence excellence – Stream data forms the core of data for artificial intelligence
  • 9. Message Brokers • Message brokers decouple the sending and receiving services through the concept of publish and subscribe (pub-sub) • Message brokers also queue or retain a message until the consumer picks it up • Streaming allows us to have both pub-sub and queuing features (historically, such brokers supported only one or the other)
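
The two behaviours on the slide above can be sketched in a few lines. This is a minimal in-memory illustration, not any real broker's API: the class `TinyBroker` and its method names are hypothetical. It shows that publishing to a topic gives every subscriber its own copy, while a queue retains a message until exactly one consumer takes it.

```python
from collections import defaultdict, deque


class TinyBroker:
    """Hypothetical in-memory broker showing topic and queue semantics."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of subscriber inboxes
        self.queues = defaultdict(deque)      # queue name -> retained messages

    def subscribe(self, topic):
        inbox = deque()
        self.subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Pub-sub: every subscriber receives its own copy.
        for inbox in self.subscribers[topic]:
            inbox.append(message)

    def enqueue(self, queue, message):
        # Queuing: the message is retained until one consumer takes it.
        self.queues[queue].append(message)

    def dequeue(self, queue):
        q = self.queues[queue]
        return q.popleft() if q else None


broker = TinyBroker()
a = broker.subscribe("orders")
b = broker.subscribe("orders")
broker.publish("orders", "order-1")   # both a and b see it

broker.enqueue("work", "job-1")
first = broker.dequeue("work")        # one consumer gets the job
second = broker.dequeue("work")       # nothing left for the next one
```

The sender never references a receiver directly, which is the decoupling the slide describes.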
  • 10. Streaming Architecture [Diagram labels: Apps; Streaming Platform; Change logs; Streaming data pipelines; Messaging / Stream processing; Request - Response; DW; Technical Support; Web Services API; Big Data Analysis; IDE / Developer GUI; Hadoop; Parallel Tools; Multi-Threaded Math Libraries; Cluster support]
  • 11. All Data Can Be Represented as Streams [Diagram labels: Streaming Platform; DW; Hadoop; RDBMS; NoSQL; Apps; Real-time Analytics; Search; Monitoring; Web Services API; Big Data Analysis; Parallel Tools; Multi-Threaded Math Libraries; Cluster support]
  • 12. Streaming Data • Unbounded, continuous flow of real-time records • Stream APIs transform and enrich data • Millisecond latency • Stateless or stateful • Incorporate data into your applications; deploy anywhere, including containers
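
To make the "stateless or stateful" distinction above concrete, here is a small sketch using plain Python generators rather than any particular stream API. The function names and the sensor data are made up for illustration. The first transform looks at one record at a time; the second has to remember per-sensor totals between records.

```python
from collections import defaultdict


def to_celsius(readings):
    """Stateless: each record is transformed on its own."""
    for sensor, fahrenheit in readings:
        yield sensor, round((fahrenheit - 32) * 5 / 9, 1)


def running_average(readings):
    """Stateful: per-sensor count and sum are kept across records."""
    count = defaultdict(int)
    total = defaultdict(float)
    for sensor, value in readings:
        count[sensor] += 1
        total[sensor] += value
        yield sensor, total[sensor] / count[sensor]


stream = [("s1", 50.0), ("s2", 212.0), ("s1", 32.0)]
celsius = list(to_celsius(stream))
averages = list(running_average(to_celsius(stream)))
```

In a real deployment the stateful step is the one that needs checkpointing, because its state is lost if the job restarts.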
  • 13. Enter Message-Oriented Middleware, a.k.a. streaming and message queuing technology • Messages can be any kind of data wrapped in a neat package with a very simple header as a bow on top. • Messages are sent by “producers” (systems, sensors, or devices that generate the messages) toward a “broker.” • A broker does not process the messages, but instead routes them into queues according to the information enclosed in the message header or its own routing process. • Then “consumers” retrieve the messages from the queues to which they subscribe (although sometimes messages are pushed to consumers rather than pulled). • The consumers open the messages and perform some kind of action on them.
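
The producer, broker, and consumer roles described above can be sketched as follows. This is an illustrative toy under stated assumptions: `Message`, `RoutingBroker`, the `"type"` header field, and the queue names are all hypothetical. The point it shows is that the broker routes on the header alone and never opens the body; only the consumer interprets the payload.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    header: dict   # small routing metadata, e.g. {"type": "sensor"}
    body: bytes    # opaque payload; the broker never inspects it


class RoutingBroker:
    def __init__(self, routes, default_queue="unrouted"):
        self.routes = routes              # header "type" value -> queue name
        self.default_queue = default_queue
        self.queues = defaultdict(deque)

    def send(self, message):
        # Route using only the header; the body stays untouched.
        queue = self.routes.get(message.header.get("type"), self.default_queue)
        self.queues[queue].append(message)

    def receive(self, queue):
        q = self.queues[queue]
        return q.popleft() if q else None


broker = RoutingBroker({"sensor": "telemetry", "order": "sales"})

# Producers send messages toward the broker.
broker.send(Message({"type": "sensor"}, b"21.5"))
broker.send(Message({"type": "order"}, b"order-42"))
broker.send(Message({"type": "audit"}, b"login"))

# A consumer pulls from its queue, opens the message, and acts on it.
reading = broker.receive("telemetry")
value = float(reading.body.decode())
```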
  • 14. Streaming solutions • Intelligent data platform for fast data: connect, process, and store data in real time … in a unified, flexible solution … able to meet demanding SLAs even at scale … without operational burdens and complexity
  • 15. Performance and scalability in streaming • Storage: ability to retain varying volumes of messages for varying lengths of time • Throughput: high, sustainable rate of message processing • Latency: fast, consistent responsiveness for publishing and consumption • Operations: minimizing operational burden for scaling, tuning, and monitoring
  • 16. Comprehensive capabilities • Stream-native functions: apply processing functions on data • Multi-tenancy: a single cluster can support many tenants and use cases • Durability: data replicated and synced to disk • Geo-replication: out-of-the-box support for geographically distributed applications • Unified messaging model: supports both topic and queue semantics in a single model • Delivery guarantees: at least once, at most once, and effectively once • Scalability: supports millions of topics in a single cluster
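
One common way to get "effectively once" behaviour on top of at-least-once delivery is to make the consumer idempotent. The sketch below assumes each message carries a unique ID. The class `DedupConsumer` is hypothetical, and the slides do not say which mechanism any given platform uses. A redelivered message is recognised by its ID and skipped, so its effect is applied a single time.

```python
class DedupConsumer:
    """Applies each message ID once, even if the broker redelivers it."""

    def __init__(self):
        self.seen = set()   # IDs already processed
        self.balance = 0    # state changed by processing

    def handle(self, message_id, amount):
        if message_id in self.seen:
            return False    # duplicate delivery: ignore
        self.seen.add(message_id)
        self.balance += amount
        return True


# At-least-once delivery may repeat "m1" after a missed acknowledgement.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]
consumer = DedupConsumer()
applied = [consumer.handle(mid, amt) for mid, amt in deliveries]
```

In practice the set of seen IDs has to be stored durably and bounded in size. This sketch ignores both concerns.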
  • 17. Apache Kafka • Open source streaming platform developed at LinkedIn • A distributed publish-subscribe messaging system that maintains feeds of messages called topics – Publishers write data to topics and subscribers read from topics – Kafka topics are partitioned and replicated across multiple nodes (brokers) in the Kafka cluster • Enables “source to sink” data pipelines • Kafka messages are simple byte arrays that can store objects in virtually any format, with an optional key attached to each message; payloads are often JSON • E&L in ETL through the Kafka Connect API • T in ETL through the Kafka Streams API • Fault-tolerant • DIY
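
The idea of partitioned topics with keyed messages can be illustrated without a running cluster. The sketch below is not Kafka client code. `Topic` and `partition_for` are hypothetical, and CRC32 is used only as a stand-in for a stable hash (Kafka's own default partitioner uses a different hash function). What it shows is the general pattern: the key decides the partition, so all messages with the same key land in the same append-only log and keep their relative order.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in stable hash; an assumption for illustration only.
    return zlib.crc32(key) % num_partitions


class Topic:
    """A topic as a list of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes):
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1
        return p, offset


topic = Topic(num_partitions=3)
p1, o1 = topic.append(b"customer-7", b'{"event": "login"}')
p2, o2 = topic.append(b"customer-7", b'{"event": "purchase"}')
```

Ordering is only guaranteed within a partition, which is why the choice of key matters for consumers that depend on event order.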
  • 19. Application programming interfaces • A ubiquitous method and de facto standard of communication among modern information technologies. • APIs have begun to replace older, more cumbersome methods of information sharing with lightweight endpoints. • Due to the popularity and proliferation of APIs and microservices, the need has arisen to manage the multitude of services a company relies on, both internal and external. • Organizations depend on these services to be properly managed, with high performance and availability.
  • 20. API & Microservices Ecosystem [Diagram labels: Public; Private - External; Private - Internal; External Partners; Connected Apps & Data] • Over 20,000 public APIs (according to https://www.programmableweb.com/apis/directory)
  • 21. The Need for Management • HTTP Basic Auth • OAuth 2.0 • OpenID • API keys • Test • Production • Rate limiting • Analytics • Transformations • Quotas • Caching • CORS
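
Of the management concerns listed above, rate limiting is the easiest to show in code. The following is a token-bucket sketch, one common technique among several. The slides do not say which algorithm any API gateway uses, and the `TokenBucket` class is hypothetical. The clock is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One request per second sustained, with a burst allowance of two.
bucket = TokenBucket(rate_per_sec=1, capacity=2)
decisions = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.2)]
```

The third request is rejected because the burst allowance is spent; by 1.2 seconds enough tokens have refilled to admit another.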
  • 22. Platform Architecture [Diagram labels: Client 1, Client 2, Client …n; Load Balancer (Nginx, HAProxy, ELB, etc.); API Nodes; Database; Back End; API Endpoint 1, API Endpoint 2, API Endpoint …n]
  • 23. API Requirements • Performance: good for high-performance workloads (>1,000 TPS) • Reliability: all workloads completed with 100% message completion • Complexity: multiple plugins enabled
  • 24. RabbitMQ • Open source message broker platform • Created in 2007 and managed by Pivotal Software • Uses an exchange to receive messages from producers and route them to queues, from which they are pushed to the registered consumers • The broker pushes messages (which are queued first in, first out) toward the consumers. • The broker is persistently connected to consumers, and it knows which ones are subscribed to which queues • Consumers cannot fetch specific messages; they receive what the broker delivers and are unaware of the queue state • Messages, queues, and exchanges do not persist unless otherwise instructed. – If a broker is restarted or fails, the messages are lost – Has settings to make both queues and messages durable. Moreover, the producer can tag non-critical messages so that they are not persisted • Allows producers’ and consumers’ code to declare new queues and exchanges • Several replication and load balancing alternatives
  • 25. Amazon Kinesis • Similar to Kafka • Offered as an enterprise-ready package • Amazon users pay by the shard-hour and by payload
  • 26. Apache Pulsar • Originally developed at Yahoo • Began its incubation at Apache in late 2016 • Has been in production at Yahoo since 2013 • Utilized in popular services and applications like Yahoo! Mail, Finance, Sports, Flickr, Gemini Ads, and Sherpa • Follows the publisher-subscriber model (pub-sub), and has the same producers, topics, and consumers as some of the aforementioned technologies • Uses built-in multi-datacenter replication • Architected for multi-tenancy and uses concepts of properties and namespaces
  • 27. Streamlio • Enterprise-ready deployment of Pulsar • Unified solution for connecting, processing, and storing fast-moving data • The unified messaging model has three components: – Consumption – Acknowledgement – Retention • Three modes of subscription: exclusive, failover, and shared. • Supports both persistent and non-persistent states. • Has a configurable time-to-live (TTL) feature that can be set to handle messages that have not been consumed. • A unified platform gives enterprises the best of both the streaming and message queuing worlds.
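
The three subscription modes named above differ in how one subscription's messages are spread over its consumers. The sketch below is a simplified model, not the Pulsar client API. The `dispatch` function is hypothetical, and "shared" is modelled as plain round-robin, which is an assumption. Exclusive permits a single consumer, failover sends everything to one active consumer while the others stand by, and shared distributes messages across all consumers.

```python
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Return {consumer: [messages]} for one subscription."""
    if mode == "exclusive":
        if len(consumers) != 1:
            raise ValueError("exclusive allows exactly one consumer")
        return {consumers[0]: list(messages)}
    if mode == "failover":
        # First consumer is active; the rest are idle standbys.
        result = {c: [] for c in consumers}
        result[consumers[0]] = list(messages)
        return result
    if mode == "shared":
        # Messages are spread across consumers (round-robin here).
        result = {c: [] for c in consumers}
        for message, consumer in zip(messages, cycle(consumers)):
            result[consumer].append(message)
        return result
    raise ValueError(f"unknown mode: {mode}")


failover = dispatch([1, 2, 3], ["a", "b"], "failover")
shared = dispatch([1, 2, 3], ["a", "b"], "shared")
```

Exclusive and failover behave like streaming (one reader, ordered), while shared behaves like a work queue, which is how a single model can cover both.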
  • 28. Workloads are Distinguished by • The number of topics • The size of the messages being produced and consumed • The number of subscriptions per topic • The number of producers per topic • The rate at which producers produce messages (per second) • The size of the consumer’s backlog (in gigabytes)
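
The workload dimensions above combine into a few back-of-the-envelope numbers. The helper functions and the example figures below are illustrative assumptions, not measurements from the presentation. The sketch assumes every subscription receives a full copy of its topic and uses 1 GB = 1,000 MB.

```python
def ingest_mb_per_sec(topics, producers_per_topic, msgs_per_sec_per_producer, msg_bytes):
    """Total publish rate across all topics, in MB/s."""
    return topics * producers_per_topic * msgs_per_sec_per_producer * msg_bytes / 1e6


def egress_mb_per_sec(ingest, subscriptions_per_topic):
    """Total delivery rate if each subscription reads every message."""
    return ingest * subscriptions_per_topic


def backlog_drain_seconds(backlog_gb, consume_mb_per_sec, ingest_mb_per_sec_now):
    """Time to clear a backlog while new data keeps arriving."""
    spare = consume_mb_per_sec - ingest_mb_per_sec_now
    if spare <= 0:
        return float("inf")  # consumers never catch up
    return backlog_gb * 1000 / spare


# Hypothetical workload: 10 topics, 2 producers each, 500 msg/s, 1 KB messages.
ingest = ingest_mb_per_sec(10, 2, 500, 1000)
egress = egress_mb_per_sec(ingest, subscriptions_per_topic=3)
drain = backlog_drain_seconds(5, consume_mb_per_sec=15, ingest_mb_per_sec_now=ingest)
```

With these made-up figures, ingest is 10 MB/s, egress is 30 MB/s, and a 5 GB backlog takes 1,000 seconds to drain.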
  • 29. Creating a Streaming Application • Configure the application • Serialize data • Set up tables for change logs
  • 30. Migrating ETL to Stream Processing • Sessionization of event data • Tools to acquire: – Message bus – Data storage (e.g., HDFS with S3) – Operations support
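
Sessionization, mentioned above, means grouping a user's events into sessions separated by periods of inactivity. A minimal sketch follows, assuming events arrive sorted by timestamp and that a gap longer than `gap_seconds` starts a new session. The function name, the 30-minute gap, and the sample events are illustrative choices, not taken from the slides.

```python
def sessionize(events, gap_seconds):
    """Tag each (user, timestamp) event with a per-user session number.

    Assumes `events` is sorted by timestamp.
    """
    last_seen = {}    # user -> timestamp of the previous event
    session_no = {}   # user -> current session number
    tagged = []
    for user, ts in events:
        if user not in last_seen or ts - last_seen[user] > gap_seconds:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        tagged.append((user, ts, session_no[user]))
    return tagged


events = [("u1", 0), ("u1", 100), ("u2", 120), ("u2", 200), ("u1", 2000)]
sessions = sessionize(events, gap_seconds=1800)
```

The per-user dictionaries are exactly the kind of state a stream processor has to keep and recover, which a batch ETL job gets for free by seeing all the data at once.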
  • 31. Biggest Challenges in Streaming • Getting data live at scale • Accenting data with metadata • Misordered events • Job recovery • High operational workload
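
One way to cope with the misordered events listed above is a small reorder buffer driven by a watermark. The sketch below is a simplified illustration. The `reorder` function and the `max_delay` parameter are hypothetical, and real stream processors offer richer handling of late data. Events are held until the newest timestamp seen is at least `max_delay` ahead of them, then released in timestamp order; anything arriving behind the watermark is set aside as late.

```python
import heapq


def reorder(events, max_delay):
    """Return (ordered, late) for a stream of (timestamp, payload) events."""
    heap, ordered, late = [], [], []
    watermark = float("-inf")   # everything at or below this has been released
    for ts, payload in events:
        if ts < watermark:
            late.append((ts, payload))   # too late to place in order
            continue
        heapq.heappush(heap, (ts, payload))
        watermark = max(watermark, ts - max_delay)
        while heap and heap[0][0] <= watermark:
            ordered.append(heapq.heappop(heap))
    while heap:                          # end of input: flush the buffer
        ordered.append(heapq.heappop(heap))
    return ordered, late


arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (4, "d"), (1, "z")]
ordered, late = reorder(arrivals, max_delay=2)
```

A larger `max_delay` tolerates more disorder at the cost of added latency, which is the trade-off behind this challenge.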
  • 32. Future of Data Integration [Diagram labels: Source; Dest; Connect API; Streaming Solution; Streams API; App; Transformations; Streaming Platform; DW; Hadoop; RDBMS; NoSQL; Apps; Real-time Analytics; Search; Monitoring; Web Services API; Big Data Analysis; Parallel Tools; Multi-Threaded Math Libraries; Cluster support]
  • 33. In Conclusion • Streaming and message queuing have lasting value to organizations. • They will be as prevalent as ETL was and is in the world of data warehousing and integration. • APIs have begun to replace older, more cumbersome methods of information sharing with lightweight endpoints. • Streaming and messaging will be able to meet the data volume, variety, and timing requirements of the coming years. • Data-driven organizations will benefit from these technologies because they will allow them to ingest data and operate at a scale that would have been practically impossible just a few years ago.
  • 34. Second Thursday of Every Month, at 2:00 ET Presented by: William McKnight President, McKnight Consulting Group www.mcknightcg.com (214) 514-1444 #AdvAnalytics