Scaling with Python: SF Python Meetup, September 2017

This presentation will take you through the requirements, problems, design decisions, implementation details and lessons learned while building a planetary scale network telemetry system at Yahoo. You’ll see all the joys and wonders of using Python for building a scalable, distributed system and all the mistakes (and their solutions too!) we made along the way.

  1. Scaling with Python. Varun Varma, Principal Engineer, vvarun@oath.com
  2. Before we begin…
  3. Throughput = Speed x Parallelism
  4. Throughput = Speed x Parallelism x Productivity
  5. “Optimize for your most expensive resource” - Nick Humrich, “Yes, Python is Slow, and I Don’t Care”
  6. Problem Statement
  7. Collect, store, analyze and visualize network telemetry
  8. Scale: Orders of Magnitude
     • 100K Servers
     • 10K Network Devices
     • 100 Network Sites
     • 1M Time Series
     • 60 Seconds
     • 10 Systems Replaced
  9. Architecture
  10. System Requirements
      • Multiple methods to collect data: SNMP, APIs, Streaming
      • Horizontal Scalability
      • No Single Point Of Failure
      • Survive Network Partitions
  11. Framework Requirements
      • Configuration Parsing
      • Logging Management
      • Plugin Management
      • Work Queue Management
      • Message Bus
      • Distributed Locking and Leader Election
      • Persistence
      • Caching
      • Federation
  12. Tech Stack (Framework Requirement: Choice)
      • Language: Python 2.7
      • Configuration Parsing: ConfigObj
      • Logging: Logging Facility + rsyslog + Splunk
      • Plugin Management: yapsy
      • Work Queue Management: Celery
      • Message Bus: kafka-python + Kafka
      • Distributed Locking, Leader Election: Kazoo + Zookeeper
      • Persistence: Django + MySQL, OpenTSDB*
      • Caching: redis-py + Redis
      • Federation: Django + MySQL
  13. Platform (diagram labels): Celery, Redis, Zookeeper, Kafka, OpenTSDB, Plugin Framework, Discovery Plugins, Polling Plugins, Device Specific Plugins (SNMP, API), CMDB, Chef
  14. “Distributed Software is tough. It’s tougher when you’re stupid” - Me
  15. Scaling Vertically: aka Speed
  16. Profile it! Our single slowest operation? JSON Schema Validation :/
  17. Basics
      • https://wiki.python.org/moin/PythonSpeed
      • https://wiki.python.org/moin/PythonSpeed/PerformanceTips
      • List comprehensions
      • Built-ins
      • Local vs. global
      • …etc.
  18. Tools
      • cProfile
        • Built in since Python 2.5
        • pstats lets you do slicing/dicing/reporting
        • Use with a signal handler to profile daemon processes
      • objgraph
        • Lets you hunt down memory leaks
        • Draw graphs of object counts and relations
  19. cProfile

          import cProfile
          import re

          # Prints the report below. Pass a filename as a second argument
          # (e.g. 'restats') to save the stats for pstats instead of printing.
          cProfile.run('re.compile("foo|bar")')

                197 function calls (192 primitive calls) in 0.002 seconds

          Ordered by: standard name

          ncalls  tottime  percall  cumtime  percall filename:lineno(function)
               1    0.000    0.000    0.001    0.001 <string>:1(<module>)
               1    0.000    0.000    0.001    0.001 re.py:212(compile)
               1    0.000    0.000    0.001    0.001 re.py:268(_compile)
               1    0.000    0.000    0.000    0.000 sre_compile.py:172(_compile_charset)
               1    0.000    0.000    0.000    0.000 sre_compile.py:201(_optimize_charset)
               4    0.000    0.000    0.000    0.000 sre_compile.py:25(_identityfunction)
             3/1    0.000    0.000    0.000    0.000 sre_compile.py:33(_compile)
  20. objgraph
  21. Use C Extension Modules
      cDecimal vs. Decimal (in Python < 3.3): Pi, 64-bit, 10,000 iterations, 3.16GHz Core 2 Duo
      Source: http://www.bytereef.org/mpdecimal/benchmarks.html

      Digits   floats   decimal   cdecimal   cdecimal-nt   gmpy
      9        0.12s    17.61s    0.27s      0.24s         0.52s
      19       -        42.75s    0.58s      0.55s         0.52s
      38       -        -         1.32s      1.21s         1.07s
      100      -        -         4.52s      4.08s         3.57s
  22. Cache Properties: https://github.com/pydanny/cached-property
  23. Scaling Horizontally: aka Parallelism
  24. Celery! Scale across processes, CPUs and hosts. How Celery fixed Python's GIL problem. http://www.celeryproject.org/
  25. Choose and test dependent systems that scale horizontally. This is hard to go back and fix.
  26. Compare system performance with all features. MySQL > Redis, if you want clustering, AAA, TLS, indexing.
  27. Miscellanea
  28. Horizontal > Vertical. More is better than fast.
  29. Use an IDE. Autocomplete is very useful. Also Type Hints.
  30. Logging is important – and slow. Bake it in day 1. Keep it consistent. Rate limit.
  31. There is something called ‘catastrophic backtracking’: re.search('(x+x+)+y', 'x' * 40) https://blog.codinghorror.com/regex-performance/
  32. TBD: Python 3.6, Cython, Async I/O, More C modules
  33. Questions?
  34. Workflow (diagram labels): Collect Data, Place on Message Bus, Post Process, Place on Message Bus, OpenTSDB, MySQL, API, UI, CLI, Graphing, Alerting, Grid, Analytics/Reporting
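
The Tools slide suggests using cProfile with a signal handler to profile daemon processes. The deck does not show how that was wired up, so the following is only a minimal sketch of one way to do it, written for Python 3 on a Unix host. The choice of SIGUSR1, the `state` dictionary, and the `toggle_profiling` and `poll_once` names are all assumptions made for illustration.

```python
import cProfile
import io
import pstats
import signal

profiler = cProfile.Profile()
# Hypothetical bookkeeping: whether profiling is on, and the last report text.
state = {"active": False, "last_report": ""}


def toggle_profiling(signum, frame):
    """First signal starts profiling; the next one stops it and builds a report."""
    if not state["active"]:
        state["active"] = True
        profiler.enable()
        return
    profiler.disable()
    state["active"] = False
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # pstats does the slicing: sort by cumulative time, keep the top 10 rows.
    stats.sort_stats("cumulative").print_stats(10)
    state["last_report"] = buf.getvalue()
    profiler.clear()  # start from a clean slate on the next toggle


# SIGUSR1 is Unix-only; signal handlers must be registered from the main thread.
signal.signal(signal.SIGUSR1, toggle_profiling)


def poll_once():
    """Hypothetical stand-in for one unit of daemon work."""
    return sum(i * i for i in range(20000))
```

With this in place, sending `kill -USR1 <pid>` to the running daemon once starts collection and sending it again stops it. A real daemon would more likely write the report to its log or to a stats file than keep it in memory. Only the main thread is profiled in this sketch.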

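The catastrophic backtracking slide can be reproduced safely by keeping the input much shorter than the 40 characters on the slide, since the work roughly doubles with each extra character. The sketch below times the slide's pattern against a flattened pattern that accepts the same strings (two or more `x` followed by `y`). The `NESTED`, `FLAT` and `time_search` names and the input sizes are assumptions chosen for illustration.

```python
import re
import time

NESTED = re.compile(r'(x+x+)+y')  # pattern from the slide: nested quantifiers
FLAT = re.compile(r'xx+y')        # same language, no nested quantifier


def time_search(pattern, text):
    """Return (match, elapsed_seconds) for a single search."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


if __name__ == '__main__':
    # Small sizes only: the nested pattern gets exponentially slower as n grows.
    for n in (12, 14, 16, 18):
        text = 'x' * n  # no trailing 'y', so every attempt must fail
        _, slow = time_search(NESTED, text)
        _, fast = time_search(FLAT, text)
        print('n=%2d  nested=%.4fs  flat=%.6fs' % (n, slow, fast))
```

The failing case is the expensive one: without a `y` to match, the engine tries every way of splitting the run of `x` characters between the two `x+` terms before giving up.
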
Editor's Notes

  • - Talk about Oath
    - Scaling
  • - Lingua franca of the network automation world
  • - Use built-ins – people much smarter than you and me have spent a lot of time optimizing these
  • Not only memory, but also performance
    Need to understand Python object details
  • @cached_property decorator
    @threaded_cached_property
  • - Clustering, AAA, TLS, indexing
  • - Build it so that you *can* throw more hardware at the problem!
    - It might be a long time before it becomes financially unviable and you have to optimize – but that’s generally a good problem to have
  • - PEP8 hints – consistency across teams
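
The `@cached_property` decorator mentioned in the notes comes from the cached-property package linked on the Cache Properties slide. To show the idea without that dependency, here is a simplified stand-in rather than the library's own code. The `Device` class, its `interfaces` property and the `lookups` counter are hypothetical. This version is not thread-safe, which is the case `@threaded_cached_property` is meant to cover.

```python
class cached_property(object):
    """Minimal stand-in: compute the value once, then store it on the instance."""

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Stored under the same name, the instance attribute shadows this
        # descriptor, so later reads never reach __get__ again.
        obj.__dict__[self.func.__name__] = value
        return value


class Device(object):
    """Hypothetical network device whose interface list is slow to fetch."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.lookups = 0

    @cached_property
    def interfaces(self):
        self.lookups += 1  # stands in for a slow SNMP or API call
        return ['eth0', 'eth1']
```

Deleting the attribute (`del device.interfaces`) drops the cached value, so the next access recomputes it. On Python 3.8 and later, `functools.cached_property` in the standard library offers similar behavior.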