6. Popularity
Celery
• ~8800 Stars
• ~2500 Forks
Kombu
• ~1200 Stars
• ~500 Forks
Py-amqp
• ~150 Stars
• ~120 Forks
Billiard
• ~200 Stars
• ~110 Forks
All Rights Reserved © Omer Katz 2018
15. Typical Bug Report
Mar 6 10:13:53 h3class01 kernel: [8883126.929572] swap_free: Bad swap file entry 007e9058
Mar 6 10:13:53 h3class01 kernel: [8883126.929580] BUG: Bad page map: 31 messages suppressed
Mar 6 10:13:53 h3class01 kernel: [8883126.929583] BUG: Bad page map in process celery pte:fd20b000 pmd:16d7b08067
Mar 6 10:13:53 h3class01 kernel: [8883126.929588] addr:00007fb3611a5000 vm_flags:08100073 anon_vma:ffff8801b4b13910 mapping: (null) index:7fb3611a5
Mar 6 10:13:53 h3class01 kernel: [8883126.929593] file: (null) fault: (null) mmap: (null) readpage: (null)
Mar 6 10:13:53 h3class01 kernel: [8883126.929599] CPU: 38 PID: 488 Comm: celery Tainted: G B OE 4.1.0-generic #1
Mar 6 10:13:53 h3class01 kernel: [8883126.929601] Hardware name: New H3C Technologies Co., Ltd. UIS R390X G2/RS32M2C9S, BIOS 1.00.23P01 07/07/2017
Mar 6 10:13:53 h3class01 kernel: [8883126.929602] 0000000000000000 ffff881895aa7a50 ffffffff817ee563 ffff88109d7f0170
Mar 6 10:13:53 h3class01 kernel: [8883126.929605] 00007fb3611a5000 ffff881895aa7aa0 ffffffff811b73f5 00000020beb9a067
Mar 6 10:13:53 h3class01 kernel: [8883126.929607] 00000000fd20b000 ffff881895aa7aa0 00007fb3611a6000 ffff8816d7b08d28
Mar 6 10:13:53 h3class01 kernel: [8883126.929609] Call Trace:
Mar 6 10:13:53 h3class01 kernel: [8883126.929621] [] dump_stack+0x63/0x81
Mar 6 10:13:53 h3class01 kernel: [8883126.929628] [] print_bad_pte+0x1e5/0x280
Mar 6 10:13:53 h3class01 kernel: [8883126.929630] [] unmap_single_vma+0x7dd/0x830
Mar 6 10:13:53 h3class01 kernel: [8883126.929635] [] ? release_pages+0x1ec/0x270
Mar 6 10:13:53 h3class01 kernel: [8883126.929638] [] unmap_vmas+0x54/0xa0
Mar 6 10:13:53 h3class01 kernel: [8883126.929640] [] exit_mmap+0x9b/0x160
Mar 6 10:13:53 h3class01 kernel: [8883126.929645] [] mmput+0x64/0x130
Mar 6 10:13:53 h3class01 kernel: [8883126.929649] [] flush_old_exec+0x4e8/0xa10
Mar 6 10:13:53 h3class01 kernel: [8883126.929655] [] load_elf_binary+0x35d/0x1830
Mar 6 10:13:53 h3class01 kernel: [8883126.929657] [] ? get_user_pages+0x52/0x60
Mar 6 10:13:53 h3class01 kernel: [8883126.929659] [] ? get_arg_page+0xa9/0xe0
Mar 6 10:13:53 h3class01 kernel: [8883126.929660] [] search_binary_handler+0x9f/0x1e0
Mar 6 10:13:53 h3class01 kernel: [8883126.929662] [] do_execveat_common.isra.28+0x58e/0x740
Mar 6 10:13:53 h3class01 kernel: [8883126.929663] [] do_execve+0x2c/0x30
Mar 6 10:13:53 h3class01 kernel: [8883126.929666] [] ? getname+0x12/0x20
Mar 6 10:13:53 h3class01 kernel: [8883126.929667] [] SyS_execve+0x2e/0x40
Mar 6 10:13:53 h3class01 kernel: [8883126.929672] [] stub_execve+0x5/0x5
Mar 6 10:13:53 h3class01 kernel: [8883126.929674] [] ? system_call_fastpath+0x16/0x75
celery
execve
16. Source of Most Bugs
Too Many Dependencies
Hard To Reproduce Issues
Not Enough End to End Test Coverage
19. Example
• User reported that Celery Beat refuses to start
• Only reproduces on OSX when Shelve picks DBM as its storage
• Upstream bug reports: PyPy#2755, BPO32922
21. Not All Tasks Are Born Equal
22. Please Stop Running Machine Learning Pipelines in Celery.
Dask is a Much Better Choice.
23. Building Blocks
Published in 2003
• Examples Are Outdated
• Patterns Are Still Relevant
• Stood the Test of Time
65 Patterns
• Messaging Systems
• Messaging Channels
• Message Construction
• Message Routing
• Message Transformation
• Messaging Endpoints
• System Management
24. Integration Styles
File Transfer
• Share Data
• Examples: Inotify, Cron Job
Shared Database
• Share Data
• Not Event Driven
Remote Procedure Invocation
• Share Data & Processing
• Decouple Internal State
• Synchronous or Asynchronous
Messaging
• Share Data & Processing
• Does Not Require All Applications to be Running at the Same Time
• Synchronous or Asynchronous
25. Modern Examples of Integration Patterns
Publish-Subscribe Channel
• Google Cloud Pub/Sub
• NATS
• Kafka
Dead Letter Channel
• SQS Dead Letter Queue
• RabbitMQ Dead Letter Exchange
Return Address
• Golang
• Erlang
• AMQP Reply-To Header
Content-based Router
• Apache Camel
Message Filter
• AMQP 0.9 Bindings
Event Driven Consumer
• RabbitMQ
• ActiveMQ
Competing Consumers
• Apache Kafka
• NATS Streaming
Process Manager
• Celery Canvas
• Airflow
• Luigi
26. Message Types
Command Message
• Like Post
• Send Email
Document Message
• Email Contents
• New User Information
Event Message
• A User Registered
• An Item Was Purchased
28. Document Messages
Are Immutable
Do Not Instruct the Message Endpoint to Process the Data in a Specific Way
Trigger New Commands or Events Based on Data
Consumers May Enrich or Filter the Data and Produce New Document Messages
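A minimal sketch of the immutability and enrichment points, assuming a document message is modelled as a frozen dataclass. The names `UserDocument` and `enrich_with_country` are hypothetical and are not part of Celery.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class UserDocument:
    """A document message: data only, no instructions for the receiver."""
    user_id: int
    email: str
    country: Optional[str] = None  # filled in later by an enriching consumer


def enrich_with_country(doc: UserDocument, country: str) -> UserDocument:
    """Return a NEW document; the incoming one is left untouched."""
    return replace(doc, country=country)


original = UserDocument(user_id=42, email="new.user@example.com")
enriched = enrich_with_country(original, "IL")

# Attempting to mutate a frozen dataclass instance raises an error.
try:
    original.country = "US"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The consumer never edits the message it received. It publishes `enriched` as a separate document, so other consumers of `original` are unaffected.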
30. Optimal Message Passing Flow
A User Was Created
• Produces an Event Message – “user_created”
• Contains User Id
“user_created” Event Handler
• Fetches Relevant Data about the User
• Produces Document Messages
User Email Data Handler
• Validates Email
• Produces the “send_verification_email” Command Message
“send_verification_email” Command Handler
• Sends a Verification Email to the Newly Registered User
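The flow above can be sketched with an in-memory queue standing in for the broker. This is only an illustration under stated assumptions: `Event`, `Document`, `Command`, `USERS`, `SENT`, and the handler names are hypothetical, and none of it is Celery's API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    user_id: int


@dataclass(frozen=True)
class Document:
    kind: str
    data: dict


@dataclass(frozen=True)
class Command:
    name: str
    email: str


# Hypothetical stand-ins for a user store and a mail gateway.
USERS = {42: {"email": "new.user@example.com"}}
SENT = []


def on_user_created(event):
    # Event handler: fetch relevant data and publish it as a document.
    user = USERS[event.user_id]
    return [Document(kind="user_email", data={"email": user["email"]})]


def on_user_email(doc):
    # Document handler: validate, then decide which command to issue.
    email = doc.data["email"]
    if "@" not in email:
        return []
    return [Command(name="send_verification_email", email=email)]


def on_send_verification_email(cmd):
    # Command handler: perform the side effect.
    SENT.append(cmd.email)
    return []


def route(message):
    """Pick a handler by message type and name; return follow-up messages."""
    if isinstance(message, Event) and message.name == "user_created":
        return on_user_created(message)
    if isinstance(message, Document) and message.kind == "user_email":
        return on_user_email(message)
    if isinstance(message, Command) and message.name == "send_verification_email":
        return on_send_verification_email(message)
    return []


def run(first_message):
    """Drain the queue, recording the type of each message processed."""
    queue = deque([first_message])
    trace = []
    while queue:
        message = queue.popleft()
        trace.append(type(message).__name__)
        queue.extend(route(message))
    return trace


trace = run(Event(name="user_created", user_id=42))
```

Each handler only knows about the message it receives and the messages it emits, so a new consumer of the "user_email" document can be added without touching the code that created the user.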
31. Actual Message Passing Flow
A User Was Created
• “send_verification_email.delay(user)”
• “increase_user_registration_metric.delay()”
• Etc.
Celery Executes Some Code
33. Everyone Has Their Own Messaging Protocol
• AMQP 0.9 vs. 1.0 – Totally Different Protocols
• Redis has its Own Protocol
• So Does Kafka
• And NATS
• …
34. Why the F*** Can’t We Agree On One Protocol?
35. What If I Told You Everything Should Speak AMQP 1.0?
36. Celery is a Distributed System
Message
Broker
Celery
Worker
Data
Pipeline
Web
Application
37. Do Not Do Distributed Computing
38. A Node (a.k.a. an Actor, an Agent, a Process)
41. High Cohesion – Lots of Work Locally
Low Adhesion – Selective Work Remotely
42. Node Properties
Fail as a Unit
Errors are Observable
State is Coherent
State Transitions Occur in an Orderly Fashion
43. Fallacies of Distributed Computing
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Network topology doesn't change
There is one administrator
Transport cost is zero
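A minimal sketch of code that does not assume the first fallacy, that is, it expects network calls to fail and bounds how long it keeps trying. The helper name `call_with_retries` and its parameters are hypothetical, and the `sleep` argument is injectable only so the example can run without waiting.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run `operation`, retrying on network-style errors with exponential backoff.

    Gives up and re-raises the last error after `attempts` tries, so the
    caller never waits forever on a dead network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# A fake operation that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("the network is not reliable")
    return "ok"


delays = []
result = call_with_retries(flaky, sleep=delays.append)
```

The retry budget is deliberately finite. An unbounded loop would trade one fallacy for another by assuming the network eventually comes back.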
44. A Network Partition Is
• Caused by the Network Fairy
• A Decomposition of a Network into Relatively Independent Subnets
• Each Side of the Partition Observes All Nodes on the Other Side as Unavailable
45. But Has Never Been Tested for Fault Tolerance in the Face of Network Partitions
46. A Recipe for Disaster
47. Pause the Minority Nodes
Pause all the Selected Nodes
Automatically Pick which Side of the Partition “wins”
Do Nothing – The Default
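The "pause the minority" option can be illustrated with a small majority check. This is a sketch of the rule only, assuming "minority" means half of the cluster or fewer; `should_pause` is a hypothetical helper, not a broker API.

```python
def should_pause(cluster_size: int, reachable_nodes: int) -> bool:
    """Return True when this node's side is not a strict majority.

    `reachable_nodes` counts this node itself plus every peer it can still see.
    """
    return reachable_nodes * 2 <= cluster_size


# Five-node cluster split 3/2: the side of three keeps running, the side of two pauses.
five_node_split = (should_pause(5, 3), should_pause(5, 2))

# Four-node cluster split 2/2: neither side is a strict majority, so both pause.
four_node_split = (should_pause(4, 2), should_pause(4, 2))
```

The even split is the case worth noticing: under this rule an even-sized cluster can pause entirely, which is one reason odd cluster sizes are usually preferred.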
49. The DNS Goblin Strikes Yet Again
Since the Network is Slow, We’d Rather Avoid it at All Costs if Possible
51. Celery Beat relies on Time
• Time Zones Make Everything Way More Complex
• Leap Seconds Make Things Worse
• NTP is Not As Good As You Think It Is
• POSIX Clocks are Not Monotonic By Definition
• We Use Monotonic Clocks When Possible
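A minimal sketch of the last bullet, assuming a fixed-interval job. `IntervalTimer` is a hypothetical class for illustration and is not Celery Beat's scheduler; the `clock` parameter exists only so the behaviour can be exercised with a fake clock.

```python
import time


class IntervalTimer:
    """Tracks when a periodic job is next due, using a monotonic clock."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock
        self._next_due = clock() + interval_seconds

    def is_due(self):
        # time.monotonic() never goes backwards, so an NTP step or a manual
        # change to the system clock cannot make the job fire early or stall.
        return self._clock() >= self._next_due

    def mark_run(self):
        self._next_due = self._clock() + self._interval


timer = IntervalTimer(interval_seconds=5)
due_immediately = timer.is_due()
```

A monotonic clock only measures elapsed time. Schedules expressed in wall-clock terms, such as "every day at 09:00 in some time zone", still need the system clock, which is where the time zone and leap second bullets apply.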
52. Serialization & Compression
• Use MsgPack to Serialize & Deserialize Messages
• Compress & Decompress Using zstd
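A hedged configuration sketch in the style of a `celeryconfig.py` module. The setting names are Celery's lowercase settings. Whether "zstd" is accepted depends on the installed Celery and Kombu versions and on the zstandard and msgpack packages being present, so treat the values as assumptions to verify against your environment.

```python
# celeryconfig.py (sketch): plain module-level settings, loaded by the app.

# Serialize task messages and results with MsgPack instead of the JSON default.
task_serializer = "msgpack"
result_serializer = "msgpack"

# Only accept the content types this deployment actually produces.
accept_content = ["msgpack"]

# Compress message bodies; assumes zstd is registered with Kombu.
task_compression = "zstd"
result_compression = "zstd"
```

A module like this is typically loaded with `app.config_from_object("celeryconfig")`.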
53. Task Pools
Gevent Pool
• Use for I/O Bound Tasks
Prefork Pool
• Use for CPU Bound Tasks
Solo Pool
• When All Else Fails
54. In 9 Years of Development We Still Haven’t Squashed All Bugs
55. Mixing Threads & Processes Will Be Possible
56. Celery 5 Will Use Python’s AsyncIO
58. Summary
• Distributed Systems are Terrible
• We Have To Use Them Anyway
• Celery 5 will be More Reliable and Better Tested
• AsyncIO on the Consumer Side
• AsyncIO or Blocking on the Producer Side
• All Messages From All Brokers Will Be Translated To/From AMQP 1.0
• Different Abstractions for Different Types of Messages
• Celery will be a Better Integration Framework
59. Celery 5 Status
• Almost No Code Has Been Written
• We Need Funding to Complete it
• Please Donate at http://www.celeryproject.org/ if You Want To See Celery 5 Happening
Editor's Notes
A total of 38,779 LOC
High cohesion, low adhesion
Software applications are written with little error-handling on networking errors. During a network outage, such applications may stall or infinitely wait for an answer packet, permanently consuming memory or other resources. When the failed network becomes available, those applications may also fail to retry any stalled operations or require a (manual) restart.
Ignorance of network latency, and of the packet loss it can cause, induces application- and transport-layer developers to allow unbounded traffic, greatly increasing dropped packets and wasting bandwidth.
Ignorance of bandwidth limits on the part of traffic senders can result in bottlenecks over frequency-multiplexed media.
Complacency regarding network security results in being blindsided by malicious users and programs that continually adapt to security measures.
Changes in network topology can have effects on both bandwidth and latency issues, and therefore similar problems.
Multiple administrators, as with subnets for rival companies, may institute conflicting policies of which senders of network traffic must be aware in order to complete their desired paths.
The "hidden" costs of building and maintaining a network or subnet are non-negligible and must consequently be noted in budgets to avoid vast shortfalls.
If a system assumes a homogeneous network, then it can lead to the same problems that result from the first three fallacies.
“While a network partition is in place, the two (or more!) sides of the cluster can evolve independently, with both sides thinking the other has crashed. Queues, bindings, exchanges can be created or deleted separately. Mirrored queues which are split across the partition will end up with one master on each side of the partition, again with both sides acting independently. Other undefined and weird behaviour may occur.
It is important to understand that when network connectivity is restored, this state of affairs persists. The cluster will continue to act in this way until you take action to fix it.”