SlideShare a Scribd company logo
1 Private and Confidential1 Public
In-Stream Processing Service Blueprint
Reference architecture for real-time Big Data applications
Anton Ovchinnikov, Data Scientist, Grid Dynamics
2
What we’ll talk about today
• What’s In-Stream Processing?
• How it’s used to process huge data streams in real time
• How to build in-stream processing with open source
• All about scale and reliability considerations
• Example of large-scale customer implementation
• Where can you learn more (hint: blog.griddynamics.com)
3
In a complex landscape of Big Data systems…
4
…In-Stream Processing Service is
an approach to build real-time Big Data applications
Today’s focus
5
• Fraud detection
• Sentiment analytics
• Preventive maintenance
• Facilities optimization
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Clickstream analytics
• Dynamic pricing
• Supply chain optimization
• Predictive medicine
• Transaction cost analysis
• Market data management
• Algorithmic trading
• Data warehouse augmentation
Multiple industries and use cases
6
Open source world is diverse and confusing
7
What credentials do we have to talk about this?
Big Data history @Grid Dynamics
8
Blueprint goals
Pre-integrated Enterprise-gradeCloud-ready
Proven mission-
critical use
Built 100% from
leading open
source projects
Production-ready
Portable across
clouds
Extendable
9
Target performance & reliability SLAs
Throughput Up to 100,000 events per second
Latency 1-60 seconds
Retention Raw data and results archived for 30 days
Reliability Built-in data loss mitigation mechanism in case of faults
Availability 99.999 on commodity cloud infrastructure
10
Conceptual architecture
11
Selected stack
• REST API
• Message
Queue
• HDFS
12
Every component is scalable in its own way
• No single point of failure
• Automatic failover
• Data replication
13
Multi-node Kafka cluster
• Undisputed modern choice
for real time MOM
• Retention and replay
• Scalable via partitioning
• Persistent
• Super-fast
14
Single Zookeeper cluster for all components
• Distributed coordination
service, facilitates HA
of other clustered services
• Guaranteed
consistent storage
• Client monitoring
• Leader election
15
Spark streaming:
Central component of the platform
Why Spark streaming?
• Leading In-Stream Processing middleware
• Active community
• Rate of adoption
• Vendor support
• Excellent integration with Hadoop eco-
system
Key architectural considerations
• Runs on top of Hadoop
• Out-of-the-box integration with Kafka
• Support for machine learning pipelines
16
Cassandra as operational store
• Massively scalable, highly available
NoSQL database
• Ideal choice as large operational
store (100s of GB) for streaming
applications
• Needed when event processing is
stateful, and the state is quite large
Example: stores user profiles as
they are being updated in real time
from clickstreams
17
REDIS as a lookup database
• Simple, cheap and super-
performant lookup store
• Needed when event processing
requires frequent access to GBs of
reference data
• Can be updated from outside
• Master/slave architecture
Example: IP geo-mapping,
dictionaries, training sets
18
Putting all the pieces together:
end-to-end platform configuration
• No single points of failure
• No bottlenecks
• Scaling or recovering any
component cluster mitigates
availability issues
• Caveat: pathologies do
happen, even in this design –
for example, dynamic
repartitioning is not supported
19
Out of scope
• Dashboards for visualization, monitoring, and reporting
of In-Stream Processing results
• Tools and modeling environments for data scientists
• Logging and monitoring
• Data encryption, in motion or at rest
• Dynamic re-partitioning is not supported and will require down-time
• Disaster recovery
• Infrastructure provisioning and deployment automation
20
Case study:
Large media agency
Business opportunity
• Real-time popularity trends for
the content across all client’s
properties drives audience
coverage with the most
interesting, trending articles
Work done
• Implementation used Amazon
Kinesis & Amazon
EMR/Spark Streaming stack
• Data egress: S3 and Amazon
Redshift
21
Case study:
Implementation details
22
Summary
• In-Stream Processing is a hot new technology
• It can process mind-boggling volumes of events in real-time and discover insights
• You can build a whole platform with 100% open source components
• We give you a complete blueprint on how to put it together
• It will run on any public cloud
23
What we didn’t get to talk about today
• Docker, Docker, Docker: how to make auto-deployment and auto-scaling work
• Data scientist’s kitchen: what they are doing when no one is watching
• Cloud sandbox for our In-Stream Processing Blueprint: how to take it for a spin on AWS
• Demo app: see how social analytics is done using the blueprint
• All this, and more: coming up soon in our blog (blog.griddynamics.com)
• Please, subscribe!
24
Thank you!
Anton Ovchinnikov: aovchinnikov@griddynamics.com
Grid Dynamics blog: blog.griddynamics.com
Follow up on twitter: @griddynamics
We are hiring!
https://www.griddynamics.com/careers
In-Stream Processing Service Blueprint, Reference architecture for real-time Big Data applications

More Related Content

What's hot

Capgemini: Observability within the Dutch government
Capgemini: Observability within the Dutch governmentCapgemini: Observability within the Dutch government
Capgemini: Observability within the Dutch government
Elasticsearch
 
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
Daniel Bryant
 
Webinar: Automating the Creation and Use of Virtual Testing Environments
Webinar: Automating the Creation and Use of Virtual Testing Environments Webinar: Automating the Creation and Use of Virtual Testing Environments
Webinar: Automating the Creation and Use of Virtual Testing Environments
Skytap Cloud
 
DevOpsCon 2020: The Past, Present, and Future of Cloud Native API Gateways
DevOpsCon 2020: The Past, Present, and Future of Cloud Native API GatewaysDevOpsCon 2020: The Past, Present, and Future of Cloud Native API Gateways
DevOpsCon 2020: The Past, Present, and Future of Cloud Native API Gateways
Daniel Bryant
 
RHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersRHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersJerome Marc
 
Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...
Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...
Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...
Red Hat Developers
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
Rakuten Group, Inc.
 
Containers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecasesContainers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecases
Ashnikbiz
 
Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
Using Apache Spark for Predicting Degrading and Failing Parts in AviationUsing Apache Spark for Predicting Degrading and Failing Parts in Aviation
Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
Databricks
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoring
Gang Tao
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
DevOps.com
 
REST API Overview with Nutanix
REST API Overview with Nutanix REST API Overview with Nutanix
REST API Overview with Nutanix
NEXTtour
 
DevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-OracleDevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-Oracle
atSistemas
 
GitLab Integration Adapter - Datasheet
GitLab Integration Adapter - DatasheetGitLab Integration Adapter - Datasheet
GitLab Integration Adapter - Datasheet
Kovair
 
Digital Transformation: Highly Resilient Streaming Architecture and Strategies
Digital Transformation: Highly Resilient Streaming Architecture and StrategiesDigital Transformation: Highly Resilient Streaming Architecture and Strategies
Digital Transformation: Highly Resilient Streaming Architecture and Strategies
HostedbyConfluent
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
Robert Metzger
 
Nutanix basic
Nutanix basicNutanix basic
Nutanix basic
ganang saputro
 
Moving from application automation to true DevOps by including the database
Moving from application automation to true DevOps by including the databaseMoving from application automation to true DevOps by including the database
Moving from application automation to true DevOps by including the database
Red Gate Software
 
Netherlands OSUG | Sep 30
Netherlands OSUG | Sep 30Netherlands OSUG | Sep 30
Netherlands OSUG | Sep 30
CatarinaPereira64715
 
Commerical imaging constellation
Commerical imaging constellation Commerical imaging constellation
Commerical imaging constellation SkyboxImaging
 

What's hot (20)

Capgemini: Observability within the Dutch government
Capgemini: Observability within the Dutch governmentCapgemini: Observability within the Dutch government
Capgemini: Observability within the Dutch government
 
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
 
Webinar: Automating the Creation and Use of Virtual Testing Environments
Webinar: Automating the Creation and Use of Virtual Testing Environments Webinar: Automating the Creation and Use of Virtual Testing Environments
Webinar: Automating the Creation and Use of Virtual Testing Environments
 
DevOpsCon 2020: The Past, Present, and Future of Cloud Native API Gateways
DevOpsCon 2020: The Past, Present, and Future of Cloud Native API GatewaysDevOpsCon 2020: The Past, Present, and Future of Cloud Native API Gateways
DevOpsCon 2020: The Past, Present, and Future of Cloud Native API Gateways
 
RHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersRHTE2015_CloudForms_Containers
RHTE2015_CloudForms_Containers
 
Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...
Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...
Building Reactive Applications With Node.Js And Red Hat JBoss Data Grid (Gald...
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
Containers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecasesContainers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecases
 
Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
Using Apache Spark for Predicting Degrading and Failing Parts in AviationUsing Apache Spark for Predicting Degrading and Failing Parts in Aviation
Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoring
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
 
REST API Overview with Nutanix
REST API Overview with Nutanix REST API Overview with Nutanix
REST API Overview with Nutanix
 
DevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-OracleDevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-Oracle
 
GitLab Integration Adapter - Datasheet
GitLab Integration Adapter - DatasheetGitLab Integration Adapter - Datasheet
GitLab Integration Adapter - Datasheet
 
Digital Transformation: Highly Resilient Streaming Architecture and Strategies
Digital Transformation: Highly Resilient Streaming Architecture and StrategiesDigital Transformation: Highly Resilient Streaming Architecture and Strategies
Digital Transformation: Highly Resilient Streaming Architecture and Strategies
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
Nutanix basic
Nutanix basicNutanix basic
Nutanix basic
 
Moving from application automation to true DevOps by including the database
Moving from application automation to true DevOps by including the databaseMoving from application automation to true DevOps by including the database
Moving from application automation to true DevOps by including the database
 
Netherlands OSUG | Sep 30
Netherlands OSUG | Sep 30Netherlands OSUG | Sep 30
Netherlands OSUG | Sep 30
 
Commerical imaging constellation
Commerical imaging constellation Commerical imaging constellation
Commerical imaging constellation
 

Similar to In-Stream Processing Service Blueprint, Reference architecture for real-time Big Data applications

Потоковая обработка больших данных
Потоковая обработка больших данныхПотоковая обработка больших данных
Потоковая обработка больших данных
CEE-SEC(R)
 
Big Data for Smart City
Big Data for Smart CityBig Data for Smart City
Big Data for Smart CityKoltiva
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Grid Dynamics
 
The Gib Five - Modern IT Architecture
The Gib Five - Modern IT ArchitectureThe Gib Five - Modern IT Architecture
The Gib Five - Modern IT Architecture
Anatole Tresch
 
Techniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloudTechniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloudAkshay Mathur
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Kai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
confluent
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
DATAVERSITY
 
Big Data and Semantic Web in Manufacturing
Big Data and Semantic Web in ManufacturingBig Data and Semantic Web in Manufacturing
Big Data and Semantic Web in Manufacturing
Nitesh Khilwani
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Dataconomy Media
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Architecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12cArchitecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12c
Gustavo Rene Antunez
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of Business
Ali Hodroj
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
confluent
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
OVHcloud
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld
 

Similar to In-Stream Processing Service Blueprint, Reference architecture for real-time Big Data applications (20)

Потоковая обработка больших данных
Потоковая обработка больших данныхПотоковая обработка больших данных
Потоковая обработка больших данных
 
Big Data for Smart City
Big Data for Smart CityBig Data for Smart City
Big Data for Smart City
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 
Operational-Analytics
Operational-AnalyticsOperational-Analytics
Operational-Analytics
 
The Gib Five - Modern IT Architecture
The Gib Five - Modern IT ArchitectureThe Gib Five - Modern IT Architecture
The Gib Five - Modern IT Architecture
 
Techniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloudTechniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloud
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
Big Data and Semantic Web in Manufacturing
Big Data and Semantic Web in ManufacturingBig Data and Semantic Web in Manufacturing
Big Data and Semantic Web in Manufacturing
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Architecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12cArchitecting Your Own DBaaS in a Private Cloud with EM12c
Architecting Your Own DBaaS in a Private Cloud with EM12c
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
From Data to Services at the Speed of Business
From Data to Services at the Speed of BusinessFrom Data to Services at the Speed of Business
From Data to Services at the Speed of Business
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
 

More from Grid Dynamics

Are you keeping up with your customer
Are you keeping up with your customer Are you keeping up with your customer
Are you keeping up with your customer
Grid Dynamics
 
"Implementing data quality automation with open source stack" - Max Martynov,...
"Implementing data quality automation with open source stack" - Max Martynov,..."Implementing data quality automation with open source stack" - Max Martynov,...
"Implementing data quality automation with open source stack" - Max Martynov,...
Grid Dynamics
 
"How to build cool & useful voice commerce applications (such as devices like...
"How to build cool & useful voice commerce applications (such as devices like..."How to build cool & useful voice commerce applications (such as devices like...
"How to build cool & useful voice commerce applications (such as devices like...
Grid Dynamics
 
"Challenges for AI in Healthcare" - Peter Graven Ph.D
"Challenges for AI in Healthcare" - Peter Graven Ph.D"Challenges for AI in Healthcare" - Peter Graven Ph.D
"Challenges for AI in Healthcare" - Peter Graven Ph.D
Grid Dynamics
 
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Grid Dynamics
 
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Grid Dynamics
 
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Grid Dynamics
 
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Grid Dynamics
 
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul..."Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
Grid Dynamics
 
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
Grid Dynamics
 
Dynamic Talks: "Implementing data quality automation with open source stack" ...
Dynamic Talks: "Implementing data quality automation with open source stack" ...Dynamic Talks: "Implementing data quality automation with open source stack" ...
Dynamic Talks: "Implementing data quality automation with open source stack" ...
Grid Dynamics
 
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav..."Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
Grid Dynamics
 
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Grid Dynamics
 
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Grid Dynamics
 
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud..."ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
Grid Dynamics
 
Realtime Contextual Product Recommendations…that scale and generate revenue -...
Realtime Contextual Product Recommendations…that scale and generate revenue -...Realtime Contextual Product Recommendations…that scale and generate revenue -...
Realtime Contextual Product Recommendations…that scale and generate revenue -...
Grid Dynamics
 
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Grid Dynamics
 
Best practices for enterprise-grade microservices implementations with Google...
Best practices for enterprise-grade microservices implementations with Google...Best practices for enterprise-grade microservices implementations with Google...
Best practices for enterprise-grade microservices implementations with Google...
Grid Dynamics
 
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Grid Dynamics
 
Building an algorithmic price management system using ML: Dynamic talks Seatt...
Building an algorithmic price management system using ML: Dynamic talks Seatt...Building an algorithmic price management system using ML: Dynamic talks Seatt...
Building an algorithmic price management system using ML: Dynamic talks Seatt...
Grid Dynamics
 

More from Grid Dynamics (20)

Are you keeping up with your customer
Are you keeping up with your customer Are you keeping up with your customer
Are you keeping up with your customer
 
"Implementing data quality automation with open source stack" - Max Martynov,...
"Implementing data quality automation with open source stack" - Max Martynov,..."Implementing data quality automation with open source stack" - Max Martynov,...
"Implementing data quality automation with open source stack" - Max Martynov,...
 
"How to build cool & useful voice commerce applications (such as devices like...
"How to build cool & useful voice commerce applications (such as devices like..."How to build cool & useful voice commerce applications (such as devices like...
"How to build cool & useful voice commerce applications (such as devices like...
 
"Challenges for AI in Healthcare" - Peter Graven Ph.D
"Challenges for AI in Healthcare" - Peter Graven Ph.D"Challenges for AI in Healthcare" - Peter Graven Ph.D
"Challenges for AI in Healthcare" - Peter Graven Ph.D
 
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
 
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
 
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
 
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
 
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul..."Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
 
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
 
Dynamic Talks: "Implementing data quality automation with open source stack" ...
Dynamic Talks: "Implementing data quality automation with open source stack" ...Dynamic Talks: "Implementing data quality automation with open source stack" ...
Dynamic Talks: "Implementing data quality automation with open source stack" ...
 
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav..."Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
 
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
 
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
 
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud..."ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
 
Realtime Contextual Product Recommendations…that scale and generate revenue -...
Realtime Contextual Product Recommendations…that scale and generate revenue -...Realtime Contextual Product Recommendations…that scale and generate revenue -...
Realtime Contextual Product Recommendations…that scale and generate revenue -...
 
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
 
Best practices for enterprise-grade microservices implementations with Google...
Best practices for enterprise-grade microservices implementations with Google...Best practices for enterprise-grade microservices implementations with Google...
Best practices for enterprise-grade microservices implementations with Google...
 
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
 
Building an algorithmic price management system using ML: Dynamic talks Seatt...
Building an algorithmic price management system using ML: Dynamic talks Seatt...Building an algorithmic price management system using ML: Dynamic talks Seatt...
Building an algorithmic price management system using ML: Dynamic talks Seatt...
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

In-Stream Processing Service Blueprint, Reference architecture for real-time Big Data applications

  • 1. 1 Private and Confidential1 Public In-Stream Processing Service Blueprint Reference architecture for real-time Big Data applications Anton Ovchinnikov, Data Scientist, Grid Dynamics
  • 2. 2 What we’ll talk about today • What’s In-Stream Processing? • How it’s used to process huge data streams in real time • How to build in-stream processing with open source • All about scale and reliability considerations • Example of large-scale customer implementation • Where can you learn more (hint: blog.griddynamics.com)
  • 3. 3 In a complex landscape of Big Data systems…
  • 4. 4 …In-Stream Processing Service is an approach to build real-time Big Data applications Today’s focus
  • 5. 5 • Fraud detection • Sentiment analytics • Preventive maintenance • Facilities optimization • Network monitoring • Intelligence and surveillance • Risk management • E-commerce • Clickstream analytics • Dynamic pricing • Supply chain optimization • Predictive medicine • Transaction cost analysis • Market data management • Algorithmic trading • Data warehouse augmentation Multiple industries and use cases
  • 6. 6 Open source world is diverse and confusing
  • 7. 7 What credentials do we have to talk about this? Big Data history @Grid Dynamics
  • 8. 8 Blueprint goals Pre-integrated Enterprise-gradeCloud-ready Proven mission- critical use Built 100% from leading open source projects Production-ready Portable across clouds Extendable
  • 9. 9 Target performance & reliability SLAs Throughput Up to 100,000 events per second Latency 1-60 seconds Retention Raw data and results archived for 30 days Reliability Built-in data loss mitigation mechanism in case of faults Availability 99.999 on commodity cloud infrastructure
  • 11. 11 Selected stack • REST API • Message Queue • HDFS
  • 12. 12 Every component is scalable in its own way • No single point of failure • Automatic failover • Data replication
  • 13. 13 Multi-node Kafka cluster • Undisputed modern choice for real time MOM • Retention and replay • Scalable via partitioning • Persistent • Super-fast
  • 14. 14 Single Zookeeper cluster for all components • Distributed coordination service, facilitates HA of other clustered services • Guaranteed consistent storage • Client monitoring • Leader election
  • 15. 15 Spark streaming: Central component of the platform Why Spark streaming? • Leading In-Stream Processing middleware • Active community • Rate of adoption • Vendor support • Excellent integration with Hadoop eco- system Key architectural considerations • Runs on top of Hadoop • Out-of-the-box integration with Kafka • Support for machine learning pipelines
  • 16. 16 Cassandra as operational store • Massively scalable, highly available NoSQL database • Ideal choice as large operational store (100s of GB) for streaming applications • Needed when event processing is stateful, and the state is quite large Example: stores user profiles as they are being updated in real time from clickstreams
  • 17. 17 REDIS as a lookup database • Simple, cheap and super- performant lookup store • Needed when event processing requires frequent access to GBs of reference data • Can be updated from outside • Master/slave architecture Example: IP geo-mapping, dictionaries, training sets
  • 18. 18 Putting all the pieces together: end-to-end platform configuration • No single points of failure • No bottlenecks • Scaling or recovering any component cluster mitigates availability issues • Caveat: pathologies do happen, even in this design – for example, dynamic repartitioning is not supported
  • 19. 19 Out of scope • Dashboards for visualization, monitoring, and reporting of In-Stream Processing results • Tools and modeling environments for data scientists • Logging and monitoring • Data encryption, in motion or at rest • Dynamic re-partitioning is not supported and will require down-time • Disaster recovery • Infrastructure provisioning and deployment automation
  • 20. 20 Case study: Large media agency Business opportunity • Real-time popularity trends for the content across all client’s properties drives audience coverage with the most interesting, trending articles Work done • Implementation used Amazon Kinesis & Amazon EMR/Spark Streaming stack • Data egress: S3 and Amazon Redshift
  • 22. 22 Summary • In-Stream Processing is a hot new technology • It can process mind-boggling volumes of events in real-time and discover insights • You can build a whole platform with 100% open source components • We give you a complete blueprint on how to put it together • It will run on any public cloud
  • 23. 23 What we didn’t get to talk about today • Docker, Docker, Docker: how to make auto-deployment and auto-scaling work • Data scientist’s kitchen: what they are doing when no one is watching • Cloud sandbox for our In-Stream Processing Blueprint: how to take it for a spin on AWS • Demo app: see how social analytics is done using the blueprint • All this, and more: coming up soon in our blog (blog.griddynamics.com) • Please, subscribe!
  • 24. 24 Thank you! Anton Ovchinnikov: aovchinnikov@griddynamics.com Grid Dynamics blog: blog.griddynamics.com Follow up on twitter: @griddynamics We are hiring! https://www.griddynamics.com/careers