In-Stream Processing Service Blueprint
Reference architecture for real-time Big Data applications
Anton Ovchinnikov, Data Scientist, Grid Dynamics
2
What we’ll talk about today
• What’s In-Stream Processing?
• How it’s used to process huge data streams in real time
• How to build in-stream processing with open source
• All about scale and reliability considerations
• Example of large-scale customer implementation
• Where you can learn more (hint: blog.griddynamics.com)
3
In a complex landscape of Big Data systems…
4
…In-Stream Processing Service is
an approach to build real-time Big Data applications
Today’s focus
5
• Fraud detection
• Sentiment analytics
• Preventive maintenance
• Facilities optimization
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Clickstream analytics
• Dynamic pricing
• Supply chain optimization
• Predictive medicine
• Transaction cost analysis
• Market data management
• Algorithmic trading
• Data warehouse augmentation
Multiple industries and use cases
6
Open source world is diverse and confusing
7
What credentials do we have to talk about this?
Big Data history @Grid Dynamics
8
Blueprint goals
Pre-integrated, Enterprise-grade, Cloud-ready
• Proven mission-critical use
• Built 100% from leading open source projects
• Production-ready
• Portable across clouds
• Extendable
9
Target performance & reliability SLAs
Throughput: up to 100,000 events per second
Latency: 1-60 seconds
Retention: raw data and results archived for 30 days
Reliability: built-in data loss mitigation mechanism in case of faults
Availability: 99.999% on commodity cloud infrastructure
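As a rough sanity check on these targets, the arithmetic below converts the availability figure into a yearly downtime budget and the throughput and retention figures into an archive size. The availability figure is read as 99.999%, and the 1 KB average event size is an assumption made only for illustration; it does not come from the slides.

```python
# Back-of-the-envelope sizing for the SLA targets above.
# ASSUMPTION: the average event size used below is a made-up figure.

SECONDS_PER_DAY = 24 * 60 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime allowed by an availability target (365-day year)."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * 365 * 24 * 60


def events_retained(events_per_second: int, retention_days: int) -> int:
    """Events held if peak throughput is sustained for the whole window."""
    return events_per_second * SECONDS_PER_DAY * retention_days


def archive_terabytes(events: int, avg_event_bytes: int) -> float:
    """Raw archive size in decimal terabytes, before replication or compression."""
    return events * avg_event_bytes / 1e12


if __name__ == "__main__":
    events = events_retained(100_000, 30)
    print(f"downtime budget: {downtime_minutes_per_year(99.999):.2f} min/year")
    print(f"events in 30 days at peak: {events:,}")
    print(f"raw archive at 1 KB/event: {archive_terabytes(events, 1_000):.1f} TB")
```

Under these assumptions the budget is a little over five minutes of downtime per year, and sustained peak load for 30 days comes to about 259 billion events.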
10
Conceptual architecture
11
Selected stack
• REST API
• Message Queue
• HDFS
12
Every component is scalable in its own way
• No single point of failure
• Automatic failover
• Data replication
13
Multi-node Kafka cluster
• Undisputed modern choice for real-time message-oriented middleware (MOM)
• Retention and replay
• Scalable via partitioning
• Persistent
• Super-fast
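To illustrate "scalable via partitioning", here is a minimal sketch of key-based partition assignment: every event with the same key lands in the same partition, so partitions can be consumed in parallel while per-key order is preserved. The `partition_for` and `route` helpers and the MD5-based hash are illustrative stand-ins, not Kafka's own partitioner.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a stable hash (hypothetical helper).

    MD5 is used only because it is deterministic across processes;
    it is NOT the hash a real Kafka producer uses.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events, num_partitions):
    """Group (key, payload) events by partition, keeping arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


if __name__ == "__main__":
    clicks = [("user-1", "view"), ("user-2", "view"), ("user-1", "buy")]
    for partition, batch in sorted(route(clicks, 4).items()):
        print(partition, batch)
```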
14
Single ZooKeeper cluster for all components
• Distributed coordination service, facilitates high availability (HA) of other clustered services
• Guaranteed consistent storage
• Client monitoring
• Leader election
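The leader-election bullet can be illustrated with a common ZooKeeper recipe: each candidate registers an ephemeral sequential node, the lowest sequence number leads, and when the leader's session ends the next-lowest candidate takes over. The sketch simulates that rule in memory; the `Election` class is hypothetical and does not talk to a real ZooKeeper ensemble.

```python
import itertools


class Election:
    """In-memory stand-in for an election path of ephemeral sequential nodes."""

    def __init__(self):
        self._next_seq = itertools.count()
        self._nodes = {}  # sequence number -> candidate name

    def join(self, candidate: str) -> int:
        """Register a candidate and return its sequence number."""
        seq = next(self._next_seq)
        self._nodes[seq] = candidate
        return seq

    def leave(self, seq: int) -> None:
        """Simulate session loss: the candidate's ephemeral node disappears."""
        self._nodes.pop(seq, None)

    def leader(self):
        """The live candidate with the lowest sequence number, or None."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]


if __name__ == "__main__":
    election = Election()
    first = election.join("worker-a")
    election.join("worker-b")
    print(election.leader())
    election.leave(first)  # worker-a fails, so leadership passes on
    print(election.leader())
```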
15
Spark Streaming:
Central component of the platform
Why Spark Streaming?
• Leading In-Stream Processing middleware
• Active community
• Rate of adoption
• Vendor support
• Excellent integration with the Hadoop ecosystem
Key architectural considerations
• Runs on top of Hadoop
• Out-of-the-box integration with Kafka
• Support for machine learning pipelines
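Spark Streaming processes a stream as a sequence of small batches, one per batch interval, which is consistent with a latency target measured in seconds. The sketch below imitates that model in plain Python so it runs without a cluster; `micro_batches` and `count_by_type` are hypothetical helpers and are not part of the Spark API.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval_s):
    """Group (timestamp_s, event_type) pairs into fixed-width time batches.

    Returns a list of (batch_start_s, event_types) sorted by batch start.
    """
    batches = defaultdict(list)
    for timestamp, event_type in events:
        batch_start = int(timestamp // batch_interval_s) * batch_interval_s
        batches[batch_start].append(event_type)
    return sorted(batches.items())


def count_by_type(batch):
    """Per-batch computation: here, a simple count of events by type."""
    return dict(Counter(batch))


if __name__ == "__main__":
    stream = [(0.5, "click"), (1.2, "click"), (4.9, "view"), (5.1, "click"), (9.0, "view")]
    for start, batch in micro_batches(stream, batch_interval_s=5):
        print(start, count_by_type(batch))
```

A shorter batch interval lowers latency at the cost of more scheduling overhead per event, which is the main tuning decision in this model.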
16
Cassandra as operational store
• Massively scalable, highly available NoSQL database
• Ideal choice as large operational store (100s of GB) for streaming applications
• Needed when event processing is stateful, and the state is quite large
Example: stores user profiles as they are being updated in real time from clickstreams
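The user-profile example amounts to a read-modify-write of per-key state for every event. A minimal sketch follows, with a plain dictionary standing in for the Cassandra table; the profile fields and the `apply_click` function are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile shape; the slides do not define one."""
    user_id: str
    page_views: int = 0
    categories: dict = field(default_factory=dict)  # category -> view count
    last_seen: float = 0.0


def apply_click(store: dict, user_id: str, category: str, timestamp: float) -> UserProfile:
    """Read the profile, fold one click event into it, and write it back.

    `store` is an in-memory stand-in for the operational store.
    """
    profile = store.get(user_id) or UserProfile(user_id)
    profile.page_views += 1
    profile.categories[category] = profile.categories.get(category, 0) + 1
    profile.last_seen = max(profile.last_seen, timestamp)
    store[user_id] = profile
    return profile


if __name__ == "__main__":
    profiles = {}
    apply_click(profiles, "u1", "sports", 10.0)
    apply_click(profiles, "u1", "news", 12.5)
    apply_click(profiles, "u1", "sports", 11.0)  # late event leaves last_seen unchanged
    print(profiles["u1"])
```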
17
Redis as a lookup database
• Simple, cheap and super-performant lookup store
• Needed when event processing requires frequent access to GBs of reference data
• Can be updated from outside
• Master/slave architecture
Example: IP geo-mapping, dictionaries, training sets
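For the IP geo-mapping example, one common approach is to keep sorted address ranges as reference data and run a range search for each event. The sketch keeps the ranges in a Python list rather than in Redis, and both the ranges (reserved documentation addresses) and the region labels are invented for illustration.

```python
import bisect
import ipaddress

# ASSUMPTION: toy reference data using reserved documentation ranges;
# the region labels are invented.
RANGES = sorted(
    (int(ipaddress.ip_network(cidr)[0]), int(ipaddress.ip_network(cidr)[-1]), label)
    for cidr, label in [
        ("192.0.2.0/24", "region-a"),
        ("198.51.100.0/24", "region-b"),
        ("203.0.113.0/24", "region-c"),
    ]
)
RANGE_STARTS = [start for start, _, _ in RANGES]


def lookup_region(ip: str):
    """Return the label of the range containing `ip`, or None if unmapped."""
    address = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= address, then check its end.
    index = bisect.bisect_right(RANGE_STARTS, address) - 1
    if index < 0:
        return None
    start, end, label = RANGES[index]
    return label if start <= address <= end else None


if __name__ == "__main__":
    for ip in ["198.51.100.7", "203.0.113.255", "8.8.8.8"]:
        print(ip, lookup_region(ip))
```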
18
Putting all the pieces together:
end-to-end platform configuration
• No single points of failure
• No bottlenecks
• Scaling or recovering any component cluster mitigates availability issues
• Caveat: pathologies do happen, even in this design – for example, dynamic repartitioning is not supported
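One reason changing the partition count is disruptive under hash-modulo routing, offered here as an illustration rather than as the authors' explanation, is that most keys move to a different partition, so any per-partition state no longer lines up. The sketch measures that effect; the MD5-based `partition_for` helper is a hypothetical stand-in for a real partitioner.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash-modulo partition assignment (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose partition changes when the partition count changes."""
    moved = sum(
        1 for key in keys
        if partition_for(key, old_count) != partition_for(key, new_count)
    )
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    print(f"{moved_fraction(keys, 4, 5):.0%} of keys change partition going from 4 to 5")
```

With a uniform hash, roughly four keys in five are expected to move in this 4-to-5 case, since a key stays put only when its hash gives the same remainder for both counts.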
19
Out of scope
• Dashboards for visualization, monitoring, and reporting of In-Stream Processing results
• Tools and modeling environments for data scientists
• Logging and monitoring
• Data encryption, in motion or at rest
• Dynamic repartitioning is not supported and will require downtime
• Disaster recovery
• Infrastructure provisioning and deployment automation
20
Case study:
Large media agency
Business opportunity
• Real-time popularity trends for content across all of the client’s properties drive audience coverage with the most interesting, trending articles
Work done
• Implementation used the Amazon Kinesis and Amazon EMR/Spark Streaming stack
• Data egress: S3 and Amazon Redshift
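The real-time popularity trend in this case study can be pictured as a sliding-window count of article views. The sketch below is a single-process illustration with invented article IDs and window size; it is not the Kinesis and Spark Streaming implementation described above.

```python
from collections import Counter, deque


class TrendingWindow:
    """Counts article views over the most recent `window_s` seconds."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._events = deque()  # (timestamp_s, article_id), oldest first
        self._counts = Counter()

    def add(self, timestamp_s: float, article_id: str) -> None:
        """Record one view; timestamps are assumed to arrive in order."""
        self._events.append((timestamp_s, article_id))
        self._counts[article_id] += 1
        self._expire(timestamp_s)

    def _expire(self, now_s: float) -> None:
        """Drop views that have fallen out of the window."""
        while self._events and self._events[0][0] <= now_s - self.window_s:
            _, old_id = self._events.popleft()
            self._counts[old_id] -= 1
            if self._counts[old_id] == 0:
                del self._counts[old_id]

    def top(self, n: int):
        """The n most-viewed articles in the current window."""
        return self._counts.most_common(n)


if __name__ == "__main__":
    window = TrendingWindow(window_s=60)
    for ts, article in [(0, "a"), (10, "b"), (20, "b"), (65, "c"), (70, "b")]:
        window.add(ts, article)
    print(window.top(2))
```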
21
Case study:
Implementation details
22
Summary
• In-Stream Processing is a hot new technology
• It can process mind-boggling volumes of events in real time and discover insights
• You can build a whole platform with 100% open source components
• We give you a complete blueprint on how to put it together
• It will run on any public cloud
23
What we didn’t get to talk about today
• Docker, Docker, Docker: how to make auto-deployment and auto-scaling work
• Data scientist’s kitchen: what they are doing when no one is watching
• Cloud sandbox for our In-Stream Processing Blueprint: how to take it for a spin on AWS
• Demo app: see how social analytics is done using the blueprint
• All this, and more: coming up soon in our blog (blog.griddynamics.com)
• Please subscribe!
24
Thank you!
Anton Ovchinnikov: aovchinnikov@griddynamics.com
Grid Dynamics blog: blog.griddynamics.com
Follow us on Twitter: @griddynamics
We are hiring!
https://www.griddynamics.com/careers