
Pycon2017 instagram keynote

PyCon 2017 Keynote Instagram, Lisa Guo, Hui Ding

Published in: Engineering

  1. 1. PYTHON@INSTAGRAM Hui Ding & Lisa Guo May 20, 2017
  2. 2. @TechCrunch
  3. 3. STORIES DIRECT LIVE EXPLORE
  4. 4. “INSTAGRAM, WHAT THE HECK ARE YOU DOING AT PYCON?” CIRCA PYCON 2015 (MONTREAL)
  5. 5. “HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?”
  6. 6. “HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?” "WHY HAVEN’T YOU RE-WRITTEN EVERYTHING IN NODE.JS YET?"
  7. 7. WHY DID INSTAGRAM CHOOSE PYTHON?
  8. 8. AND HERE’S WHAT THEY FOUND OUT: “Friends don’t let friends use RoR”
  9. 9. THINGS INSTAGRAM LOVES ABOUT PYTHON Easy to become productive. Practicality. Easy to grow engineering team. Popular language.
  10. 10. THINGS INSTAGRAM LOVES ABOUT DJANGO Maturity of the language and Django framework. Use Django user model for supporting 3B+ registered users.
  11. 11. WE HAVE TAKEN OUR PYTHON+DJANGO STACK QUITE FAR 1 Sharded database support 2 Run our stack across multiple geographically distributed data centers 3 Disable garbage-collection to improve memory utilization
  12. 12. WE HAVE TAKEN OUR PYTHON+DJANGO STACK QUITE FAR
  13. 13. WE HAVE TAKEN OUR PYTHON+DJANGO STACK QUITE FAR
  14. 14. PYTHON IS SIMPLE AND CLEAN, AND FAVORS PRAGMATISM. 1 Scope the problem, AKA, do simple things first 2 Use proven technology 3 User first: focus on adding value to user facing features
  15. 15. AT INSTAGRAM, OUR BOTTLENECK IS DEVELOPMENT VELOCITY, NOT PURE CODE EXECUTION. BUT PYTHON IS STILL SLOW, RIGHT?
  16. 16. SCALING PYTHON TO SUPPORT USER AND FEATURE GROWTH [Chart: server growth vs. user growth]
  17. 17. PYTHON EFFICIENCY STRATEGY 1 Build extensive tools to profile and understand perf bottlenecks 2 Move stable, critical components to C/C++, e.g., memcached access 3 Async? New Python runtime? 4 Cythonization
  18. 18. ROAD TO PYTHON 3
  19. 19. 1 Motivation 2 Strategy 3 Challenges 4 Resolution
  20. 20. MOTIVATION
  21. 21. MOTIVATION: DEV VELOCITY def compose_from_max_id(max_id): ''' @param str max_id '''
  22. 22. MOTIVATION: PERFORMANCE uWSGI/web async tier/celery media storage user/media metadata search/ranking Python
  23. 23. MOTIVATION: PERFORMANCE N processes M CPU cores N >> M Request
  24. 24. MOTIVATION: PERFORMANCE N processes M CPU cores N == M Request
  25. 25. MOTIVATION: COMMUNITY
  26. 26. STRATEGIES
  27. 27. SERVICE DOWNTIME
  28. 28. SERVICE DOWNTIME PRODUCT SLOWDOWN
  29. 29. MASTER LIVE DEVELOP/TEST/DOGFOOD
  30. 30. MIGRATION OPTIONS Create separate branch? • Branch sync overhead, error prone • Merging back will be a risk • Lose the opportunity to educate
  31. 31. MIGRATION OPTIONS Create separate branch? One endpoint at a time? • Common modules across endpoints • Context switch for developers • Overhead of managing separate pools
  32. 32. MIGRATION OPTIONS Create separate branch? One endpoint at a time? Micro services? • Massive code restructuring • Higher latency • Deployment complexity
  33. 33. MIGRATION OPTIONS Create separate branch? One endpoint at a time? Micro services?
  34. 34. MAKE MASTER COMPATIBLE
  35. 35. MASTER PYTHON3 PYTHON2
  36. 36. Third-party packages: 3-4 months. Codemod: 2-3 months. Unit tests: 2 months. Production rollout: 4 months.
  37. 37. THIRD-PARTY PACKAGES Rule: No Python3, no new package
  38. 38. THIRD-PARTY PACKAGES Rule: No Python3, no new package. Delete unused, incompatible packages: twisted, django-paging, django-sentry, django-templatetag-sugar, dnspython, enum34, hiredis, httplib2, ipaddr, jsonfig, pyapns, phpserialize, python-memcached, thrift
  39. 39. THIRD-PARTY PACKAGES Rule: No Python3, no new package. Delete unused, incompatible packages. Upgraded packages.
  40. 40. CODEMOD
      modernize -f libmodernize.fixes.fix_filter <dir> -w -n
      raise, metaclass, methodattr, next, funcattr, library renaming, range, maxint/maxsize, filter->list comprehension, long integer, itertools, tuple unpacking in method, cython, urllib parse, StringIO, context mgr/decorator, ipaddr, cmp, except, nested, dict iter fixes, mock
  41. 41. UNIT TESTS Passed / Failed. include_list: passed tests
  42. 42. UNIT TESTS Passed / Failed
  43. 43. UNIT TESTS Passed / Failed. exclude_list: failed tests
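Slides 41 to 43 suggest gating the Python 3 test run first with an include list of tests known to pass, then later with an exclude list of tests known to fail. A minimal sketch of that selection logic follows; the function name `select_tests` and the test names are hypothetical and not taken from the talk.

```python
def select_tests(all_tests, include_list=None, exclude_list=None):
    """Pick which tests to run under Python 3.

    Early in a migration: pass include_list (tests known to pass).
    Later: pass exclude_list (the shrinking set of tests known to fail).
    """
    if include_list is not None:
        allowed = set(include_list)
        return [name for name in all_tests if name in allowed]
    blocked = set(exclude_list or ())
    return [name for name in all_tests if name not in blocked]


# Hypothetical test names, for illustration only.
ALL_TESTS = ["test_feed", "test_login", "test_upload", "test_search"]

early_run = select_tests(ALL_TESTS, include_list=["test_login"])
late_run = select_tests(ALL_TESTS, exclude_list=["test_upload"])
```

Switching from the include list to the exclude list means newly written tests run under Python 3 by default, which is presumably the point of the change.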
  44. 44. UNIT TESTS: LIMITS • Not 100% code coverage • Many external services have mocks • Data compatibility issues typically do not show up in unit tests
  45. 45. ROLLOUT Developers, employees, 0.1%, 20%, 100%
  46. 46. ROLLOUT Developers, employees, 0.1%, 20%, 100%
  47. 47. CHALLENGES
  48. 48. CHALLENGES 1 Unicode 2 Data format incompatible 3 Iterator 4 Dictionary ordering
  49. 49. CHALLENGE: UNICODE/STR/BYTES
      from __future__ import absolute_import
      from __future__ import print_function
      from __future__ import division
      from __future__ import unicode_literals
  50. 50. CHALLENGE: UNICODE/STR/BYTES
      from __future__ import absolute_import
      from __future__ import print_function
      from __future__ import division
      from __future__ import unicode_literals
  51. 51. CHALLENGE: UNICODE/STR/BYTES
      Error:
      mymac = hmac.new('abc')
      TypeError: key: expected bytes or bytearray, but got 'str'
      Fix:
      value = 'abc'
      if isinstance(value, six.text_type):
          value = value.encode(encoding='utf-8')
      mymac = hmac.new(value)
  52. 52. CHALLENGE: UNICODE/STR/BYTES
      Helper functions: ensure_binary(), ensure_str(), ensure_text()
      Fix:
      mymac = hmac.new(ensure_binary('abc'))
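The slides name the helpers `ensure_binary()` and `ensure_text()` but do not show their bodies. The sketch below is one plausible Python 3 only reading, not the authors' implementation. The UTF-8 default and the explicit `hashlib.sha256` digest are assumptions (recent Python versions require a digest argument for `hmac.new`).

```python
import hashlib
import hmac


def ensure_binary(value, encoding="utf-8"):
    """Return value as bytes, encoding text if necessary."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes if necessary."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# The key may arrive as text or bytes; normalize it at the boundary.
# The message and digest below are illustrative assumptions.
mymac = hmac.new(ensure_binary("abc"), b"message", hashlib.sha256)
```

Normalizing at the call site keeps the isinstance checks out of business logic, which is presumably why the slide moves from the inline fix to helper functions.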
  53. 53. CHALLENGE: PICKLE
      Memcache shared by me (Python3) and others (Python2).
      Write: memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
      Read: data = pickle.loads(memcache_data)
  54. 54. CHALLENGE: PICKLE
      Memcache shared by me (Python3) and others (Python2).
      Write: memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL) (protocol 4 on Python3)
      Read: data = pickle.loads(memcache_data)
      On Python2, whose highest protocol is 2: ValueError: unsupported pickle protocol: 4
  55. 55. CHALLENGE: PICKLE
      Write: memcache_data = pickle.dumps(data, 2)
      Python2 writes: pickle.dumps({'a': '爱'}, 2)
      Python3 reads: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
  56. 56. CHALLENGE: PICKLE
      Write: memcache_data = pickle.dumps(data, 2)
      Python3 writes: pickle.dumps({'a': '爱'}, 2)
      Python2 reads: {u'a': u'\u7231'} != {'a': '爱'}
  57. 57. CHALLENGE: PICKLE Python3: own Memcache, pickle protocol 4. Python2: own Memcache, pickle protocol 2.
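Slide 57 appears to resolve the pickle problem by keeping Python 2 and Python 3 cache data apart. One way to sketch that idea is to prefix every cache key with the interpreter's major version, so neither side ever unpickles the other's bytes. The dict standing in for memcache, the key prefix format, and the function names are all assumptions made for illustration.

```python
import pickle
import sys

# Stand-in for a memcache client; a real deployment would use a network cache.
cache = {}


def versioned_key(key):
    """Prefix the key with the Python major version, e.g. 'py3:user:1'."""
    return "py%d:%s" % (sys.version_info[0], key)


def cache_set(key, data):
    # Safe to use the highest protocol: only the same Python version reads it back.
    cache[versioned_key(key)] = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)


def cache_get(key):
    raw = cache.get(versioned_key(key))
    return None if raw is None else pickle.loads(raw)


cache_set("user:1", {"a": "爱"})
roundtrip = cache_get("user:1")
```

The trade-off is a lower cache hit rate while both versions serve traffic, since each side warms its own namespace.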
  58. 58. map() filter() dict.items()
  59. 59. CHALLENGE: ITERATOR
      CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
      builds = map(BuildProcess, CYTHON_SOURCES)
      while any(not build.done() for build in builds):
          pending = [build for build in builds if not build.started()]
          # <do some work>
  60. 60. CHALLENGE: ITERATOR
      CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
      builds = map(BuildProcess, CYTHON_SOURCES)
      while any(not build.done() for build in builds):
          pending = [build for build in builds if not build.started()]
          # <do some work>
  61. 61. CHALLENGE: ITERATOR
      CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
      builds = map(BuildProcess, CYTHON_SOURCES)
      while any(not build.done() for build in builds):
          pending = [build for build in builds if not build.started()]
          # <do some work>
  62. 62. CHALLENGE: ITERATOR
      CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
      builds = list(map(BuildProcess, CYTHON_SOURCES))
      while any(not build.done() for build in builds):
          pending = [build for build in builds if not build.started()]
          # <do some work>
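The iterator slides turn on the fact that `map()` returns a one-shot iterator in Python 3, so the `any(...)` check consumes items that the next line expects to see. The sketch below reproduces that behavior with a hypothetical `BuildProcess` stub (the real class is not shown in the slides) and then shows the `list(...)` fix from slide 62.

```python
class BuildProcess:
    """Hypothetical stand-in for the build class referenced on the slides."""

    def __init__(self, source):
        self.source = source
        self._started = False

    def started(self):
        return self._started

    def done(self):
        return self._started


CYTHON_SOURCES = ["a.pyx", "b.pyx", "c.pyx"]

# Python 3: map() is lazy. any() stops at the first unfinished build,
# consuming 'a.pyx' from the iterator, so the comprehension never sees it.
lazy_builds = map(BuildProcess, CYTHON_SOURCES)
has_unfinished = any(not build.done() for build in lazy_builds)
pending_lazy = [build.source for build in lazy_builds if not build.started()]

# Fix: materialize the builds once so they can be traversed repeatedly.
builds = list(map(BuildProcess, CYTHON_SOURCES))
has_unfinished_fixed = any(not build.done() for build in builds)
pending_fixed = [build.source for build in builds if not build.started()]
```

With the lazy version the first source silently drops out of `pending`, which is the kind of bug that no exception will point to.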
  63. 63. CHALLENGE: DICTIONARY ORDER
      >>> testdict = {'a': 1, 'b': 2, 'c': 3}
      >>> json.dumps(testdict)
      Python2: '{"a": 1, "c": 3, "b": 2}'
      Python3.5.1: '{"c": 3, "b": 2, "a": 1}' or '{"c": 3, "a": 1, "b": 2}'
      Python3.6: '{"a": 1, "b": 2, "c": 3}'
      Cross version:
      >>> json.dumps(testdict, sort_keys=True)
      '{"a": 1, "b": 2, "c": 3}'
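The dictionary-order slide relies on `sort_keys=True` to make serialized JSON independent of dict ordering. A small sketch of why that matters: two dicts with the same content but different insertion order serialize identically only when keys are sorted. The variable names are illustrative.

```python
import json

# Same content, different insertion order.
first = {"a": 1, "b": 2, "c": 3}
second = {"c": 3, "a": 1, "b": 2}

# With sort_keys=True the output no longer depends on dict ordering,
# so it is safe to compare, hash, or use as a cache key across versions.
stable_first = json.dumps(first, sort_keys=True)
stable_second = json.dumps(second, sort_keys=True)
```

Any code that compares or hashes serialized dicts benefits from this, regardless of which interpreter version produced them.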
  64. 64. ALMOST THERE...
  65. 65. CPU instructions per request: -12%. Max requests per second: 0% (12% expected). Memory configuration difference?
  66. 66. if uwsgi.opt.get('optimize_mem', None) == b'True': optimize_mem()
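Slide 66 hints at a bytes-versus-text comparison: in Python 3, `b'True' == 'True'` is simply `False`, so a flag check written for Python 2 can quietly stop firing. The sketch below assumes the option value arrives as bytes under Python 3, as the `b` on the slide suggests. The `opt` dict is a hypothetical stand-in for `uwsgi.opt`, and `option_enabled` is an illustrative helper, not part of uWSGI.

```python
# Hypothetical stand-in for uwsgi.opt; assumed to hold bytes values on Python 3.
opt = {"optimize_mem": b"True"}


def option_enabled(options, name):
    """Treat both b'True' and 'True' as enabled."""
    raw = options.get(name)
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return raw == "True"


# The Python 2 era check: never true on Python 3 when the value is bytes.
old_check = opt.get("optimize_mem", None) == "True"

# A check that tolerates either type.
new_check = option_enabled(opt, "optimize_mem")
```

Because the failed comparison raises nothing, this class of bug tends to surface only as a performance or configuration difference.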
  67. 67. RESOLUTION
  68. 68. FEB 2017 PYTHON3 PYTHON2
  69. 69. INSTAGRAM ON PYTHON3 CPU: saving of 12% (on uwsgi/django). MEMORY: saving of 30% (on celery).
  70. 70. [Chart: user growth during the Python3 migration. 400M (Sept 2015), 500M (June 2016), 600M (Dec 2016); timeline runs to Apr 2017]
  71. 71. Video View Notification, Save Draft, Comment Filtering, Story Viewer Ranking, First Story Notification, Self-harm Prevention, Live
  72. 72. MOTIVATION: TYPE HINTS
      Type hints: 2% done
      def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
      MyPy and typeshed contribution
      Tooling: collect data and suggest type hints
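Slide 72 shows only the annotated signature of `compose_from_max_id`. The sketch below completes it with a placeholder body so the annotations can be seen in a runnable form. The body is purely hypothetical; only the signature comes from the slide.

```python
from typing import Optional


def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    """Signature from the slide; the body is an illustrative placeholder."""
    if max_id is None:
        return None
    return "max_id=" + max_id


# Annotations are available at runtime and to static checkers such as mypy.
hints = compose_from_max_id.__annotations__
```

Compared with the `@param str max_id` docstring on slide 21, the annotation also records that `None` is an accepted value, which a type checker can then enforce at call sites.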
  73. 73. MOTIVATION: ASYNC IO Asynchronous web framework. Parallel network access within a request.
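"Parallel network access within a request" can be illustrated with `asyncio.gather`, which runs several awaitables concurrently and returns their results in call order. In this sketch `asyncio.sleep` stands in for real network calls, and the backend names are hypothetical.

```python
import asyncio


async def fetch(backend, delay):
    """Simulated network call; asyncio.sleep stands in for real I/O."""
    await asyncio.sleep(delay)
    return backend + ":ok"


async def handle_request():
    # The three calls overlap, so the request takes roughly as long as
    # the slowest call rather than the sum of all three.
    return await asyncio.gather(
        fetch("feed", 0.05),
        fetch("stories", 0.05),
        fetch("suggestions", 0.05),
    )


results = asyncio.run(handle_request())
```

The same pattern applies to any request handler that fans out to several independent services before composing a response.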
  74. 74. MOTIVATION: COMMUNITY Benchmark web workload. Run time, memory profiling, etc.
