Bootstrapping microservices with Kafka, Akka and Spark
http://linkedin.com/in/alexvsilva
@thealexsilva
ALEX SILVA
Who am I?
- Data Platform Architect at Pluralsight
- Rackspace
- WDW
TECHNOLOGY LEARNING PLATFORM
What should I learn?
Where should I start?
Who can help me?
What did I learn?
• Online technology
learning platform
• Subscription model
• Data-driven
PLURALSIGHT
What are we covering today?
• Microservices
• Relational architecture
• Commit logs
• Data ingestion
• Stream processing
• Putting it all together
MICROSERVICES
[Diagram: a monolithic app, with the order, auth, returns, inventory, shopping cart and fulfillment services inside a single deployment boundary, next to the same services split into independently deployed microservices]
MICROSERVICES: INDEPENDENCE
[Diagram: Customer, Printer, Invoices, Job and Returns services; most services overlap on Customer, Invoices and Jobs data]
What data do we share?
HOW do we do it?
[Diagram: the Invoices, Customer, Job, Returns and Printer services and the data they share]
Encapsulation and loose coupling
“Sliceable”, domain-specific datasets
The service data mismatch
DATA WILL DIVERGE OVER TIME
[Diagram: the Invoices, Customer, Job, Returns and Printer services each holding their own copy of shared data]
Is there a better approach?
DATABASES
TRANSACTIONS
ACID is old school
What consistency do you really need, and when?
ACID 2.0
Associative
Commutative
Idempotent
Distributed
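One way to read ACID 2.0 is as a property of how replicas merge updates. The sketch below is an illustrative assumption, not taken from the talk: each replica holds a hypothetical map from key to a (timestamp, value) pair, and merging uses a last-writer-wins rule. Because the rule picks a maximum per key, it is associative, commutative and idempotent, so replicas that receive the same updates in any order, any number of times, converge.

```python
from functools import reduce
from itertools import permutations
from typing import Dict, Tuple

# Hypothetical replica state: key -> (timestamp, value).
State = Dict[str, Tuple[int, int]]

def merge(a: State, b: State) -> State:
    """Last-writer-wins merge: per key, keep the larger (timestamp, value) pair.
    Comparing whole tuples breaks timestamp ties deterministically."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out

r1: State = {"wishlist:121:123": (1, 1)}
r2: State = {"wishlist:121:123": (3, 3), "wishlist:121:456": (2, 1)}
r3: State = {"wishlist:121:456": (5, 0)}

assert merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))  # associative
assert merge(r1, r2) == merge(r2, r1)                        # commutative
assert merge(r2, r2) == r2                                   # idempotent

# Distributed: every delivery order converges on the same state.
outcomes = [reduce(merge, order) for order in permutations([r1, r2, r3])]
assert all(o == outcomes[0] for o in outcomes)
print(outcomes[0])  # {'wishlist:121:123': (3, 3), 'wishlist:121:456': (5, 0)}
```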
INDEXING
“Indexes are awesome and we need it, they make the lookups fast! Why do you want to scan all the data if you know what you want? That’s dumb.”
- Dustin Vannoy
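The quote's point, as a minimal sketch: a hypothetical in-memory table of 100,000 rows, looked up once by full scan and once through a hash index. Both return the same row; the index simply avoids touching every row. A real database index must also be maintained on every write, which this sketch leaves out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: 100,000 (user_id, email) rows.
rows: List[Tuple[int, str]] = [(i, f"user{i}@example.com") for i in range(100_000)]

def scan_lookup(user_id: int) -> Optional[str]:
    """Full scan: may touch every row before finding a match (O(n))."""
    for uid, email in rows:
        if uid == user_id:
            return email
    return None

# Hash index from user_id to row position. Built once here; a real index
# must also be updated on every write.
index: Dict[int, int] = {uid: pos for pos, (uid, _) in enumerate(rows)}

def index_lookup(user_id: int) -> Optional[str]:
    """Indexed lookup: one hash probe, then a direct jump to the row (O(1) on average)."""
    pos = index.get(user_id)
    return rows[pos][1] if pos is not None else None

assert scan_lookup(99_999) == index_lookup(99_999) == "user99999@example.com"
assert index_lookup(-1) is None
```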
REPLICATION
[Diagram: the leader replicates to a follower]
Failover + resiliency
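A toy model of leader/follower replication with failover, under simplifying assumptions: every write is copied synchronously to each live follower, and the most up-to-date live follower is promoted when the leader fails. Real systems add elections, epochs and handling of partial failures; the class and broker names here are hypothetical.

```python
from typing import List

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []   # committed writes, in leader order
        self.alive = True

class ReplicatedLog:
    """Toy leader/follower replication: the leader takes every write and copies it
    to each live follower before returning. No elections, terms or partial failures."""

    def __init__(self, names: List[str]) -> None:
        self.replicas = [Replica(n) for n in names]
        self.leader = self.replicas[0]

    def write(self, record: str) -> None:
        self.leader.log.append(record)
        for follower in self.replicas:
            if follower is not self.leader and follower.alive:
                # The follower fetches whatever it is missing from the leader's log.
                follower.log.extend(self.leader.log[len(follower.log):])

    def fail_leader(self) -> None:
        """Failover: mark the leader dead, promote the most up-to-date live follower."""
        self.leader.alive = False
        live = [r for r in self.replicas if r.alive]
        self.leader = max(live, key=lambda r: len(r.log))

    def read(self) -> List[str]:
        return list(self.leader.log)

cluster = ReplicatedLog(["broker-1", "broker-2", "broker-3"])
cluster.write("a")
cluster.write("b")
cluster.fail_leader()              # broker-1 goes down
cluster.write("c")                 # writes continue on the new leader
print(cluster.leader.name, cluster.read())   # broker-2 ['a', 'b', 'c']
```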
MUTATION vs. FACTS
State mutation:
UPDATE wishlist SET qty = 3 WHERE user_id = 121 AND product_id = 123;
Fact:
At 2:39pm, user 121 updated his wish list, changing the quantity of product 123 from 1 to 3.
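Recording facts instead of mutations turns current state into a fold over history. This sketch models the slide's wish-list example; the earlier 14:10 fact (adding product 123 with quantity 1) is an invented precondition so that the 2:39pm change has something to start from.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class QuantityChanged:
    """An immutable fact: what happened, to whom, and when."""
    at: str
    user_id: int
    product_id: int
    old_qty: int
    new_qty: int

facts: List[QuantityChanged] = [
    QuantityChanged("14:10", 121, 123, 0, 1),   # hypothetical earlier fact
    QuantityChanged("14:39", 121, 123, 1, 3),   # the fact from the slide
]

def wishlist_state(history: List[QuantityChanged]) -> Dict[Tuple[int, int], int]:
    """Replay facts in order to derive the state the UPDATE would have left behind."""
    state: Dict[Tuple[int, int], int] = {}
    for fact in history:
        state[(fact.user_id, fact.product_id)] = fact.new_qty
    return state

print(wishlist_state(facts))       # {(121, 123): 3}
print(wishlist_state(facts[:1]))   # {(121, 123): 1}, the state as of 14:10
```

Unlike the UPDATE, the history is still there, so any past state can be recomputed.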
VIEWS
Materialized vs. virtual
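The difference between the two kinds of view, sketched over a hypothetical stream of (clip_id, seconds_viewed) facts: the virtual view recomputes totals from all base data on each query, while the materialized view stores its result and updates it incrementally as each fact arrives. Both give the same answer; they differ in when the work happens.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]   # hypothetical fact: (clip_id, seconds_viewed)

def virtual_view(events: List[Event]) -> Dict[str, int]:
    """Virtual view: recomputed from all base data every time it is queried."""
    totals: Dict[str, int] = defaultdict(int)
    for clip, secs in events:
        totals[clip] += secs
    return dict(totals)

class MaterializedView:
    """Materialized view: the stored result is updated as each fact arrives,
    so it stays current without the application invalidating anything."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def apply(self, event: Event) -> None:
        clip, secs = event
        self.totals[clip] += secs

events: List[Event] = [("intro", 30), ("kafka-101", 120), ("intro", 45)]
view = MaterializedView()
for event in events:
    view.apply(event)

assert dict(view.totals) == virtual_view(events) == {"intro": 75, "kafka-101": 120}
```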
Is there a better way?
[Diagram: the user service writes to a user commit log, which replicates to the shopping cart, catalog, fulfillment and returns services]
WHAT IF…
SEPARATE READS FROM WRITES
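Separating reads from writes, in a minimal in-memory sketch of the commit-log design: the user service is the only writer and appends facts to a commit log, while each downstream service replicates the log at its own pace into a read view holding only the fields it needs. The record shape, field names and service names are hypothetical.

```python
from typing import Dict, List, Tuple

# The shared write path: an append-only commit log of (user_id, changed fields).
commit_log: List[Tuple[str, Dict[str, str]]] = []

def write_user(user_id: str, **fields: str) -> None:
    """The user service is the only writer: it appends facts, never updates in place."""
    commit_log.append((user_id, fields))

class ReadService:
    """A downstream service that replicates the log into its own read view,
    keeping only the fields it needs."""

    def __init__(self, wanted: List[str]) -> None:
        self.wanted = wanted
        self.offset = 0                              # next log position to apply
        self.view: Dict[str, Dict[str, str]] = {}

    def catch_up(self) -> None:
        for user_id, fields in commit_log[self.offset:]:
            row = self.view.setdefault(user_id, {})
            row.update({k: v for k, v in fields.items() if k in self.wanted})
            self.offset += 1

write_user("121", name="Ana", email="ana@example.com", address="1 Main St")
write_user("121", address="2 Oak Ave")

fulfillment = ReadService(["name", "address"])
returns = ReadService(["email"])
fulfillment.catch_up()
returns.catch_up()
print(fulfillment.view)   # {'121': {'name': 'Ana', 'address': '2 Oak Ave'}}
print(returns.view)       # {'121': {'email': 'ana@example.com'}}
```

Because every view is derived from the same log, views can lag but cannot permanently disagree once they have caught up.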
KAFKA
A messaging system based on distributed log semantics
Scalable
Fault tolerant
Stateful
Strong ordering
High concurrency
[Diagram: topic User, partition 0 (User, 0), replicated across three brokers; reads and writes go to the partition leader only]
REPLICATION PROTOCOL
Replication is about RESILIENCY
[Diagram: producer applications write to a cluster of brokers, and a consumer application reads from it]
Looks like a globally ordered queue (strictly, Kafka guarantees ordering only within each partition)
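Why the ordered-queue view holds per partition: producers map each message key to a partition, so all messages for one key land in the same ordered partition, but nothing orders messages across partitions. In this sketch CRC32 stands in for the murmur2 hash that Kafka's default Java partitioner applies to keys; the topic layout and keys are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    """Same key -> same partition. CRC32 stands in for Kafka's murmur2 key hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# Each partition is its own ordered list of (key, value) messages.
partitions: Dict[int, List[Tuple[str, str]]] = {p: [] for p in range(NUM_PARTITIONS)}

def produce(key: str, value: str) -> None:
    partitions[partition_for(key)].append((key, value))

for i in range(3):
    for user in ("u1", "u2", "u3", "u4"):
        produce(user, f"{user}-event-{i}")

def events_for(key: str) -> List[str]:
    """Messages for one key, in the order they sit in that key's partition."""
    return [v for k, v in partitions[partition_for(key)] if k == key]

# Per-key (per-partition) order matches production order...
assert events_for("u1") == ["u1-event-0", "u1-event-1", "u1-event-2"]
# ...but a consumer reading several partitions may interleave keys differently
# from the order in which they were produced: there is no topic-wide order.
```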
THE LOG is a linear structure
[Diagram: the log runs from old to new; messages are added at the new end]
Consumers have a position (offset)
Only sequential access: seek to an offset and scan
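The log's access pattern in a few lines: records are only appended at the new end and identified by offset, reads start at an offset and scan forward, and a consumer is nothing more than a position. An in-memory list stands in for Kafka's on-disk log segments; the class names are illustrative.

```python
from typing import List, Tuple

class Log:
    """Append-only log: records are only added at the new end, and each one is
    identified by its offset (here, its index in the list)."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1          # offset of the appended record

    def read_from(self, offset: int, max_records: int) -> List[Tuple[int, str]]:
        """Sequential access only: go to an offset, then scan forward."""
        end = min(offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(offset, end)]

class Consumer:
    """A consumer is just a position; reading never changes the log."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.position = 0

    def poll(self, max_records: int = 10) -> List[str]:
        batch = self.log.read_from(self.position, max_records)
        if batch:
            self.position = batch[-1][0] + 1   # next offset to read
        return [record for _, record in batch]

log = Log()
for record in ("a", "b", "c", "d"):
    log.append(record)

consumer = Consumer(log)
print(consumer.poll(2), consumer.position)   # ['a', 'b'] 2
print(consumer.poll(), consumer.position)    # ['c', 'd'] 4
```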
[Diagram: Consumer 1 and Consumer 2 reading the same log from different positions]
MESSAGES CAN BE REPLAYED FOR AS LONG AS THEY EXIST IN THE LOG
A DISTRIBUTED REPLICATION PROTOCOL
Rewind and Replay
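Rewind and replay follow directly from consumers being positions: a consumer can seek back to any offset that is still retained and reprocess from there, without affecting other consumers. The consumer names and events below are hypothetical; in the Kafka Java client the corresponding operation is `KafkaConsumer.seek`.

```python
from typing import List

log: List[str] = ["signup", "view", "view", "purchase"]   # hypothetical retained events

class Consumer:
    """Each consumer tracks its own offset into the shared log."""

    def __init__(self) -> None:
        self.position = 0

    def poll(self) -> List[str]:
        batch = log[self.position:]
        self.position = len(log)
        return batch

    def seek(self, offset: int) -> None:
        """Rewind (or skip ahead) to any offset still present in the log."""
        if not 0 <= offset <= len(log):
            raise ValueError(f"offset {offset} is not in the log")
        self.position = offset

analytics = Consumer()
billing = Consumer()
analytics.poll()            # analytics has processed everything
billing.seek(3)             # billing starts from the newest record
assert billing.poll() == ["purchase"]

# Replay: e.g. after fixing a bug, analytics reprocesses history from the start.
analytics.seek(0)
assert analytics.poll() == ["signup", "view", "view", "purchase"]
assert billing.position == 4   # the other consumer is unaffected
```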
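LOG CLEANUP POLICY: delete
[Diagram: messages 1 to 12, from old to new, being scanned by a consumer]
After log.retention.ms or log.retention.bytes (topic-level: retention.ms, retention.bytes) is exceeded, the oldest messages are dropped from the log. Kafka removes them a whole log segment at a time.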
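Time-based retention, sketched under two simplifying assumptions: records arrive in timestamp order, and expired records are dropped one at a time rather than as whole segments, as Kafka does. Note that offsets are never reused; the log's start offset just moves forward. Class and parameter names are illustrative.

```python
from typing import List, Tuple

class RetainedLog:
    """Toy time-based retention (cleanup.policy=delete). Assumes records arrive in
    timestamp order. Kafka deletes whole segments; this drops single records."""

    def __init__(self, retention_ms: int) -> None:
        self.retention_ms = retention_ms
        self.start_offset = 0                       # offset of the oldest retained record
        self.records: List[Tuple[int, str]] = []    # (timestamp_ms, value)

    def append(self, ts_ms: int, value: str) -> int:
        self.records.append((ts_ms, value))
        return self.start_offset + len(self.records) - 1   # offset of the new record

    def cleanup(self, now_ms: int) -> None:
        cutoff = now_ms - self.retention_ms
        expired = 0
        while expired < len(self.records) and self.records[expired][0] < cutoff:
            expired += 1
        del self.records[:expired]
        self.start_offset += expired                # offsets are never reused

log = RetainedLog(retention_ms=1_000)
for i in range(12):
    log.append(ts_ms=i * 200, value=f"m{i + 1}")   # m1..m12, 200 ms apart
log.cleanup(now_ms=2_400)                           # anything older than 1,400 ms goes
print(log.start_offset, [v for _, v in log.records])
# 7 ['m8', 'm9', 'm10', 'm11', 'm12']
```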
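LOG CLEANUP POLICY: compact
[Diagram: a compacted log from old to new; the cleaner point separates the already-compacted log tail from the log head, and delete.retention.ms determines the delete retention point, i.e. how long delete markers (tombstones) are kept]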
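A toy version of compaction semantics: keep only the latest record per key, keep the original offsets, and keep tombstones (null values) until delete.retention.ms has elapsed, modeled here as a boolean flag. Unlike this sketch, Kafka compacts only the log tail behind the cleaner point and leaves the head untouched; the keys and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, Optional[str]]   # (offset, key, value); value None = tombstone

def compact(log: List[Record], tombstones_expired: bool) -> List[Record]:
    """Keep only the latest record per key, with original offsets and order.
    Tombstones survive until delete.retention.ms has passed (the flag here)."""
    latest: Dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset                  # a later offset for the same key wins
    kept = [r for r in log if latest[r[1]] == r[0]]
    if tombstones_expired:
        kept = [r for r in kept if r[2] is not None]
    return kept

log: List[Record] = [
    (0, "k1", "v1"), (1, "k2", "v1"), (2, "k1", "v2"),
    (3, "k3", "v1"), (4, "k2", None), (5, "k1", "v3"),
]
print(compact(log, tombstones_expired=False))
# [(3, 'k3', 'v1'), (4, 'k2', None), (5, 'k1', 'v3')]
print(compact(log, tombstones_expired=True))
# [(3, 'k3', 'v1'), (5, 'k1', 'v3')]
```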
STREAM PROCESSING
Continuously updating datasets
Max(viewed_time) from clip_views where location = 'CA' over a 1-day window
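The slide's continuous query, sketched as a filter plus a running per-window maximum over 1-day tumbling windows keyed by event time. The event shape, the epoch-millisecond timestamps and the seconds unit for viewed_time are assumptions; a stream processor would hold `result` as state and update it per event, as the loop does.

```python
from typing import Dict, List, NamedTuple

DAY_MS = 24 * 60 * 60 * 1000

class ClipView(NamedTuple):
    ts_ms: int         # event time, epoch milliseconds (assumed)
    location: str
    viewed_time: int   # seconds watched (assumed unit)

def max_viewed_per_day(events: List[ClipView], location: str) -> Dict[int, int]:
    """Max(viewed_time) from clip_views where location = <location>, over 1-day
    tumbling windows. Keys of the result are window start times."""
    result: Dict[int, int] = {}
    for e in events:                                   # one event at a time, as in a stream
        if e.location != location:                     # FILTER
            continue
        window = e.ts_ms - e.ts_ms % DAY_MS            # assign to its 1-day window
        if window not in result or e.viewed_time > result[window]:
            result[window] = e.viewed_time             # AGGREGATE (running max)
    return result

events = [
    ClipView(1_000, "CA", 120),
    ClipView(5_000, "CA", 300),
    ClipView(6_000, "NY", 900),
    ClipView(DAY_MS + 10, "CA", 60),
]
print(max_viewed_per_day(events, "CA"))   # {0: 300, 86400000: 60}
```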
Similar features to a database: JOIN, AGGREGATE, FILTER, VIEW
Streaming platforms
Why Spark?
Support for many different data formats
Structured streaming
Failover and lifecycle management
Medium latency
Unified API
[Diagram: a service writes to the event stream/log, which replicates to materialized views/caches, Hadoop, and ETL/transformation jobs]
• Reproducible
• Stays in sync
How do we do it?
Separate data capture from replication
REAL-TIME DATA REPLICATION PLATFORM
Hydra ingest
Data capture at scale
[Diagram: Hydra request → ingestors → transports]
Ingestion replication protocol
What about metadata?
Always capture metadata at ingestion time
Automate data replication
Automate data pipelines
Automate data discovery
[Diagram: data + metadata]
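Capturing metadata at ingestion time can be as simple as wrapping every payload in an envelope before it is published. The envelope fields (source, schema, ingested_at_ms) and the injectable clock are hypothetical choices for this sketch, not Hydra's actual message format.

```python
import json
import time
from typing import Any, Callable, Dict

def ingest(payload: Dict[str, Any], source: str, schema: str,
           clock: Callable[[], float] = time.time) -> Dict[str, Any]:
    """Wrap a payload in an envelope that records where it came from, which schema
    describes it and when it was ingested. Field names are hypothetical."""
    return {
        "metadata": {
            "source": source,
            "schema": schema,
            "ingested_at_ms": int(clock() * 1000),
        },
        "data": payload,
    }

record = ingest({"user_id": 121, "product_id": 123, "qty": 3},
                source="wishlist-service", schema="WishlistUpdated.v1",
                clock=lambda: 1_500_000_000.0)   # fixed clock for a repeatable example
print(json.dumps(record["metadata"], sort_keys=True))
# {"ingested_at_ms": 1500000000000, "schema": "WishlistUpdated.v1", "source": "wishlist-service"}
```

Because the metadata travels with every record, downstream replication, pipelines and discovery can key off it without asking the producing service.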
Make more kinds of datasets:
1. readily available
2. easier to use for the entire organization.
Message format: AVRO
Why Avro?
Schema evolution
Smaller data footprint
JSON-friendly
Strong community support
Existing tools
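Schema evolution in practice means reading old data with a newer schema. The sketch below implements only a simplified subset of Avro's schema-resolution rules (fields only in the writer schema are ignored; fields only in the reader schema take their default, or fail without one) over dict-based records. Real Avro also handles type promotion, aliases and unions, so production code should rely on an Avro library; the ClipView schemas are hypothetical.

```python
from typing import Any, Dict

# Avro-style record schemas (JSON form). Only names and defaults matter below.
writer_schema = {"type": "record", "name": "ClipView",
                 "fields": [{"name": "clip_id", "type": "string"},
                            {"name": "viewed_time", "type": "int"},
                            {"name": "legacy_flag", "type": "boolean"}]}
reader_schema = {"type": "record", "name": "ClipView",
                 "fields": [{"name": "clip_id", "type": "string"},
                            {"name": "viewed_time", "type": "int"},
                            {"name": "location", "type": "string", "default": "unknown"}]}

def resolve(record: Dict[str, Any], writer: Dict[str, Any],
            reader: Dict[str, Any]) -> Dict[str, Any]:
    """Read a record written with `writer` as if it had `reader`'s shape:
    - fields only in the writer schema are ignored,
    - fields only in the reader schema take their default (error if none)."""
    written = {f["name"] for f in writer["fields"]}
    out: Dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"reader field {name!r} has no default and was never written")
    return out

old = {"clip_id": "kafka-101", "viewed_time": 300, "legacy_flag": True}
print(resolve(old, writer_schema, reader_schema))
# {'clip_id': 'kafka-101', 'viewed_time': 300, 'location': 'unknown'}
```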
Hydra Spark
What is it?
Abstraction layer on top of datasets
Models data flows
Sources and operations
Based on a custom DSL
API-driven
The “DSL” abstraction
What is a Hydra DSL?
Examples:
Kafka Source
JSON File Source
SaveAsAvro Operation
DatabaseUpsert Operation
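The slides do not show the DSL's syntax, so the following is only a hypothetical Python model of the idea that a data flow is one source plus an ordered list of operations. `JsonLinesSource` and `Upsert` loosely echo the JSON File Source and DatabaseUpsert examples but are not Hydra Spark's API, and an in-memory dict stands in for the database.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Protocol

Row = Dict[str, Any]

class Operation(Protocol):
    def run(self, rows: List[Row]) -> None: ...

@dataclass
class JsonLinesSource:
    """Hypothetical stand-in for a JSON file source: one JSON object per line."""
    lines: List[str]

    def read(self) -> Iterator[Row]:
        for line in self.lines:
            if line.strip():
                yield json.loads(line)

@dataclass
class Upsert:
    """Hypothetical stand-in for a database upsert: insert or replace rows by id."""
    table: Dict[Any, Row]
    id_field: str

    def run(self, rows: List[Row]) -> None:
        for row in rows:
            self.table[row[self.id_field]] = row

@dataclass
class Dsl:
    """A data flow = one source + an ordered list of operations."""
    source: JsonLinesSource
    operations: List[Operation] = field(default_factory=list)

    def run(self) -> None:
        rows = list(self.source.read())   # read once so every operation sees the same rows
        for op in self.operations:
            op.run(rows)

db: Dict[Any, Row] = {}
flow = Dsl(JsonLinesSource(['{"id": 1, "qty": 1}', '{"id": 1, "qty": 3}', '{"id": 2, "qty": 5}']),
           [Upsert(db, "id")])
flow.run()
print(db)   # {1: {'id': 1, 'qty': 3}, 2: {'id': 2, 'qty': 5}}
```

Keeping the flow as plain data (a source plus operations) is what makes an API-driven design possible: a description like this can be submitted, stored and re-run.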
Putting it all together…
[Diagram: Hydra ingestion and the broker, with Hydra stream dispatch driven by JSON DSLs at /dsls, connecting the Customer, Invoices and Returns services]
WE ARE ON GITHUB!
github.com/pluralsight/hydra-spark
github.com/pluralsight/hydra
Thank You!

Editor's Notes

  1. Generate a lot of data and leverage it to make the product better
  2. Single process, single codebase and development.
  3. Split the monolith into many processes, with different codebases and different deployment pipelines.
  4. Independently built: the build process for creating a service should be completely separate from building another service. Independently testable: our microservice should be testable independently of the test lifecycle of other services and components. Independently deployable: our microservice must be independently deployable; this is a fundamental aspect of enabling rapid change. Independent teams: small independent teams own the full lifecycle of a service, from inception through to its final death. Independent data: one of the hardest aspects for the microservice purist to achieve is data independence.
  5. When it comes to independence, data is usually the sticking point. Services still need to share data somehow, which raises questions around deployment, contract schemas, deprecation, interconnectivity, etc. Only rarely will you find a service with a context so tightly bounded that data sharing is secondary. AuthN services, maybe, but even then.
  6. Most services fall into this area: they slice and dice the same core business facts and data, they just slice them differently.
  7. These applications/services must work together. Services force us to think about what we need to expose and share with the outside world, yet this is mostly an afterthought.
  8. Future services will become even more interconnected and intertwined.
  9. Because of this, you end up with multiple copies of data across different services that will get out of sync. The more mutable copies of data there are, the more divergent the data will become.
  10. What do you do? Keep changing service contracts to add more attributes? Turn your services into DAOs?
  11. A transaction is a sequence of one or more SQL operations that are treated as a unit. Specifically, each transaction appears to run in isolation, and furthermore, if the system fails, each transaction is either executed in its entirety or not at all. The concept of transactions is actually motivated by two completely independent concerns. One has to do with concurrent access to the database by multiple clients, and the other has to do with having a system that is resilient to system failures.
  12. ACID is overkill or, as some would say, old school.
  13. Databases do this really well!
  14. The idea is that you have a copy of the same data on multiple machines (nodes), so that you can serve reads in parallel, and so that the system keeps running if you lose a machine.
  15. This distinction between an imperative modification and an immutable fact is something you may have seen in the context of event sourcing. That’s a method of database design that says you should structure all of your data as immutable facts, and it’s an interesting idea.
  16. However, there’s something really compelling about this idea of materialized views. I see a materialized view almost as a kind of cache that magically keeps itself up-to-date. Instead of putting all of the complexity of cache invalidation in the application (risking race conditions and all the discussed problems), materialized views say that cache maintenance should be the responsibility of the data infrastructure.
  17. A stream of immutable facts is used to segregate reads from writes. Shared state lives only in the cache, so data cannot diverge.
  18. Let’s talk about Kafka as a commit log / source for a replication stream
  19. Kafka messages have a key and value.
  20. The benefit is clear if you have ever used a regular message queue.
  21. Data becomes an immutable stream of facts
  22. Keeps the latest record per key: history is truncated, but at least the latest version of every key remains in the log.
  23. What differentiates Kafka from a traditional messaging system
  24. Medium latency, high-volume data flows, SQL processing en masse, massive scaling (10,000s of nodes); not meant for small volumes; rich options for SQL, etc. Lower latency limit: about 0.5 seconds (we are OK with that). Failover and lifecycle management come from the cluster itself (restartability).
  25. Why we chose Akka and Scala: distributed systems, the functional paradigm, and datasets. Akka is really the backbone of this platform.