Highly Available Spark Streaming Processing with DataStax Enterprise and Confluent Platform
Ryan Svihla & Wei Deng
1 PySpark sadness
2 Streaming Points of Failure & Their Resolutions
3 Anti-Patterns
© DataStax, All Rights Reserved.
PySpark Sadness
Our original talk
Kafka Direct API Issues
• Python is a second-class citizen: Direct API support is a simple Java wrapper (added in Spark 1.4)
• Python lacked important features, such as arbitrary offset support, until Spark 1.5
Driver HA: Cluster Mode
• Cluster mode is not available for Python apps in standalone mode, so no supervise
• Provides driver resilience, which we cover later, but with varying degrees of PySpark support
• Only available in Mesos SPARK-9669 (Spark 1.6.0)
• And YARN SPARK-5162 (Spark 1.4.0)
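The cluster-plus-supervise combination above is chosen at submit time. A hedged sketch (the master URL, class, and jar names are hypothetical):

```shell
# --deploy-mode cluster runs the driver on a worker instead of the client
# machine; --supervise tells the master to restart the driver if it dies.
# Not available for Python applications on a standalone master.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.StreamingJob \
  streaming-job.jar
```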
Kafka 0.10 Python support - SPARK-16534
So Scala
Supported & Agile
• Like Python, has a REPL for iterative development
• Lots of higher-order functions
• Unlike Python, a first-class citizen of Spark
• Gets all the features
Spark Streaming Points of Failure
Long running jobs and keeping them up
Spark Refresher
• Driver: where the application is launched and where results are fed back
• Master: coordinates the cluster
• Worker: launches executors
• Executors: spawn tasks to do work
• Tasks: unit of execution over a partition
Points of failure and their resolutions
• Driver failure → cluster mode and supervise
• Master failure → DSE leader election
• Worker failure → the Spark Master handles this
• Task failure → executors will retry
• Cassandra failure → replicas > 1
• Kafka failure → replicas > 1
• ZooKeeper failure → replicas > 1
• Message loss → WAL or Direct API
Streaming considerations
• Has to restart and pick up where it left off if the master dies.
• The default driver mode isn't fault tolerant.
• Bad GC configuration (covered later in gotchas).
• Ever-growing state problems.
• Message broker crashes and message loss.
Cluster mode - Supervise
Summary
• Restarts dead jobs
• The driver runs inside an executor on a worker
• The master is in charge of restarting the driver on another worker
• Gives us driver fault tolerance
Checkpointing state
Summary
• Ongoing calculations are preserved
• Has to use a DFS (HDFS, DSEFS, S3)
• Code has to be written to take advantage of it.
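A minimal Scala sketch of the "code has to be written for it" pattern (the checkpoint path, app name, and batch interval are hypothetical; a DFS such as HDFS or DSEFS must back the path):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical checkpoint location on a DFS
val checkpointDir = "dsefs:///checkpoints/my-streaming-app"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpointed-stream")
  val ssc = new StreamingContext(conf, Seconds(5))
  ssc.checkpoint(checkpointDir) // preserves metadata and ongoing state
  // ... define the streaming computation here ...
  ssc
}

// On restart, recover from the checkpoint instead of starting fresh;
// only computations defined inside createContext are recoverable.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```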
Master Leader Election
Summary
• Gives us Spark Master fault tolerance
• DSE 4.7+
• Needs a quorum of Spark nodes in the data center to elect a new master
• Until DSE 4.7.9, 4.8.8, and 5.0.0, can't restart a driver that is also running on the master
Context Cleaner
Summary
• Eliminates a potential outage.
• Removes shuffle data that is no longer used.
• Removes persisted RDDs that are no longer referenced.
• Executes on GC of the driver, so don't run too large a driver heap.
• Replaces the TTL cleaner, which is deprecated in SPARK-10534.
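If the driver heap is large enough that GC rarely runs, one hedged mitigation is to force a periodic driver GC so the cleaner still fires (verify that your Spark version has this property before relying on it):

```shell
# spark-defaults.conf — triggers a driver GC on an interval so the
# ContextCleaner runs even when the heap never fills (value illustrative)
spark.cleaner.periodicGC.interval  30min
```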
Kafka Replication
Summary
• As of 0.8, ISR (in-sync replica) replication
• Leader/follower architecture
• Relies on ZooKeeper for election
• Can block till N replicas ack; async otherwise
• ISR has some interesting tradeoffs with leader election and durability. Know the boundaries.
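A hedged sketch of the knobs involved (host names and the topic are hypothetical; the tooling shown is the 0.8/0.9-era CLI):

```shell
# Create a topic with 3 replicas so one broker failure loses no data
kafka-topics.sh --create --zookeeper zk1:2181 \
  --topic events --partitions 2 --replication-factor 3

# Producer durability: block until all in-sync replicas acknowledge
#   acks=all                 (producer config; -1 in older clients)
# and require a minimum ISR size so "all" means more than one copy:
#   min.insync.replicas=2    (topic or broker config)
```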
Messaging Failures: WAL
Write Ahead Log
• Used with receivers.
• Allows the message log to pick back up where it left off.
• Slows down throughput a lot.
• Still does nothing for handling double-sent messages from an application perspective.
• Requires a DFS like HDFS, DSEFS, or S3.
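Enabling the WAL is a one-line setting (Spark 1.2+), but it only helps when checkpointing to a DFS is also configured, since the log lives in the checkpoint directory:

```shell
# spark-defaults.conf
spark.streaming.receiver.writeAheadLog.enable  true
# ...and in the application, point the checkpoint at a DFS, e.g.
#   ssc.checkpoint("dsefs:///checkpoints/my-streaming-app")
```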
Messaging Failures: Direct API
Direct API
• Exactly-once semantics once a message enters the pipeline.
• Doesn't require a DFS if you store offsets in ZK (option A).
• If you want to minimize message loss, checkpoint to a DFS (option B).
• Much faster throughput than the WAL, but experimental.
• Much simpler tuning: 2 Kafka partitions for 2 parallel consumers.
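A hedged Scala sketch against the Spark 1.x `spark-streaming-kafka` artifact (broker addresses and the topic name are hypothetical):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("direct-stream")
val ssc = new StreamingContext(conf, Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("events")

// One RDD partition per Kafka partition — this is where the
// "2 Kafka partitions for 2 parallel consumers" tuning comes from.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  // process the batch, then optionally persist offsets (option A or B)
}
ssc.start()
ssc.awaitTermination()
```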
Additional considerations for all jobs
• Worker nodes crashing
• Transient network failures
• Cassandra nodes crashing
• Cache regeneration
Spark Fault Tolerance: Workers (not really)
• Master will stop sending work to Workers that haven't responded within the timeout (default 60 seconds)
• spark.worker.timeout 60
• Tasks from crashed worker will be retried on another working node
Spark Fault Tolerance: Executors
• Master will tell Worker to launch a replacement Executor if an Executor dies early
• Any subtasks are rescheduled
Spark Fault Tolerance: Tasks
• Individual tasks will be retried by the executor, 4 times by default.
• spark.task.maxFailures 4
Spark Fault Tolerance: Cache Replication
• When caching, you can specify how many replicas you'd like.
• Caching takes a lot more time because the cache has to be copied to another worker.
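The replica count is picked via the storage level; a short Scala sketch (`sc` stands in for an existing SparkContext, the data is illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

val sc: SparkContext = ??? // an existing context from the surrounding job
val rdd = sc.parallelize(1 to 1000)

// The *_2 storage levels keep a second copy of each cached partition on
// another worker, so one worker crash doesn't force full recomputation.
rdd.persist(StorageLevel.MEMORY_AND_DISK_2)
```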
Gotchas
Beyond the textbook case
OOM: Out Of Memory
Summary
• Small defaults for heap and GC in Spark
• Some work doesn't split easily or has a large split
• Can also be too much work in the driver
• Asking executors to do too much
OOM: Small Defaults
• The driver has only a 1 GB heap by default (as of 1.5.0). This should be enough for a lot of use cases.
• Executors are also only 1 GB by default (as of 1.5.0). This is only usable for a small number of tasks. Set spark.executor.memory higher for more flexibility with task sizes and better throughput.
• Using G1GC plus a larger heap (32 GB, for example) should give you nice big executors that can do lots of work at once and handle a variety of workloads.
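The settings above map onto a handful of properties; a hedged example (sizes are illustrative, tune for your workload):

```shell
# spark-defaults.conf
spark.driver.memory              4g
spark.executor.memory            32g
spark.executor.extraJavaOptions  -XX:+UseG1GC
```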
OOM: Large Spark Partitions
• When the work cannot fit inside of a given task
• Partitions that are larger than the available Executor heap
• Sometimes I see people hit this trying to optimize throughput
OOM: Driver, AKA don't call collect on 18 TB
• A common early mistake with Spark.
• It's often easier for new users to debug work in the driver.
• Or to write a result file locally instead of to a shared DFS.
• Easy to avoid: don't call collect on large amounts of data; write large files out to a DFS.
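In Scala terms (`hugeRdd` and the output path are hypothetical):

```scala
// Anti-pattern: materializes the entire dataset in the driver heap
//   val everything = hugeRdd.collect()

// Instead, have the executors write results in parallel to a DFS:
hugeRdd.saveAsTextFile("dsefs:///results/output")

// If you need a peek in the driver, bound the amount you pull back:
val sample = hugeRdd.take(100)
```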
Cassandra Anti-Patterns
Failures
• Really Large Cassandra Partitions
• Requiring all nodes to be up
• Data center name mismatch
Cassandra Anti-Patterns: Very Large Partitions
• The Spark Cassandra Connector can split the work up (covered later).
• IndexInfo can mean large heap pressure on Cassandra underneath (CASSANDRA-9754) with large partitions; think multiple gigabytes.
• In pre-3.0 versions of Cassandra, column names can make a large difference in partition size (compression helps).
• If you run a large enough ParNew in CMS with the correct tuning, or a large enough heap in G1 (26 GB+), this is fine. Otherwise large partitions can keep a job from ever completing.
Cassandra Anti-Patterns: Bad RF and CL Combination
• LOCAL_QUORUM/QUORUM with less than RF 3, or using ALL
• Means all nodes have to be up to complete your work.
• There is then no fault tolerance for node outages.
• One node drops and your job fails.
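A hedged CQL example of the safe combination (the keyspace and data center names are hypothetical):

```sql
-- RF 3 per data center lets LOCAL_QUORUM (2 of 3) tolerate one node down:
CREATE KEYSPACE streaming_app
  WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics': 3};
```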
Out of Disk aka "Drive Space isn't Infinite!?"
Failures
• Bad data skew on shuffle.
• Ever growing state.
• Job History Server without Cleanup
Out of Disk: Bad data skew
• Shuffle flushes intermediary data to local disk (this is why fast SSDs are a good idea).
• Grouping by a limited range of keys, like "active" and "inactive", funnels each group onto a single node.
• One of those groups will grow indefinitely and you'll need an impossibly large node to handle it.
Out of Disk: Driver Heap Is Huge
• The context cleaner requires a driver GC to actually fire a cleanup job.
• If you set the heap large enough, GC may not happen soon enough relative to disk-space growth.
Out of Disk: Job History Server w/o cleanup
• Pretty easy to set
• spark.history.fs.cleaner.enabled true # needs to be enabled
• spark.history.fs.cleaner.interval 1d # check once a day
• spark.history.fs.cleaner.maxAge 7d # clean files over 7 days old
Reading Too Aggressively
Summary
• Watch out for large rows (covered earlier)
• spark.cassandra.input.fetch.size_in_rows (default 1000) may need to drop for large rows
• spark.cassandra.input.split.size_in_mb if Spark doesn't have enough heap
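Both knobs can be set at submit time; a hedged sketch (values are illustrative, not recommendations):

```shell
spark-submit ... \
  --conf spark.cassandra.input.fetch.size_in_rows=250 \
  --conf spark.cassandra.input.split.size_in_mb=64
```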
Writing Too Aggressively
Summary
• This can be about just too much write volume in the first place.
• Also look at the Spark Cassandra Connector settings.
• A modern safe baseline is output.batch.size.bytes 1024 and spark.cassandra.output.concurrent.writes 5. If these aren't set, or if the batch row count is set instead, it can be bad.
• Old versions set output.batch.size.rows; in new versions this is automatic.
Infinite Retry
Summary
• SPARK-5945: it was possible in Spark for stages to get stuck retrying forever.
• Fixed in DSE 5.0.
Summary
• Try not to shoot yourself in the foot
• Use a durable broker.
• Use a durable database.
• Use a distributed filesystem.
• Use the Direct API or WAL, or have some other method of message replay.
• Don't use Python for HA Spark Streaming :(
Thank You
Go forth and be stable

DataStax | Highly Available Spark Stream Processing with Confluent Platform and DataStax Enterprise (Ryan Svihla & Wei Deng) | Cassandra Summit 2016
