Python Through the Back Door

CodeMash 2014

Roy Rapoport
@royrapoport rsr@netflix.com
www.linkedin.com/in/royrapoport
A Word About Me
• About 20 years in technology
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1655 days (4y:6m:11d)
• Previously at Netflix: Service Delivery in IT/Ops, troubleshooter, Builder of Python Things[tm]
• Current role: Insight Engineering
  • Real-Time Operational Insight

Problem
Python
People

Stories We Tell
• Technical Problems
• Howler Monkey
• Alerting
• Python As a First-Class Language
• Culture and People

People

“Netflix Company Profile
Now via self service*
Go to your favorite Python REPL and type the following:
import re, requests
content = requests.get("http://ir.netflix.com").content
content = content.replace("&nbsp;", " ")
p = re.compile(r".*over (\d+) .*in (\d+)", re.S)
m = re.match(p, content)
print("Netflix is the world's leading Internet "
      "subscription service for enjoying TV and movies, "
      "with more than {} million subscribers in {} "
      "countries.".format(m.group(1), m.group(2)))

*No whining. Remember that you’ll never again need to wait for me to update this slide like you had to wait for database access when you started your last job.”
- Jay Zarfoss, http://www.slideshare.net/zarfide
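
The quoted snippet is Python 2 era code. A minimal Python 3 sketch of the same self-service idea follows, with the parsing split from the fetching so it can be checked without a network call. The function names and the sample sentence are illustrative assumptions, the "&nbsp;" replacement is a guess at what the slide's replace call did, and the live page's wording may no longer match the pattern.

```python
import re
import urllib.request
from typing import Optional, Tuple

# Same regular expression as the quoted slide.
PROFILE_PATTERN = re.compile(r".*over (\d+) .*in (\d+)", re.S)


def extract_profile(text: str) -> Optional[Tuple[str, str]]:
    """Return (subscribers_in_millions, countries), or None if nothing matches."""
    # Assumed cleanup step: turn HTML non-breaking-space entities into spaces.
    text = text.replace("&nbsp;", " ")
    match = PROFILE_PATTERN.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def describe(url: str = "http://ir.netflix.com") -> str:
    """Fetch the page and build the company profile sentence (needs network)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        text = response.read().decode("utf-8", errors="replace")
    profile = extract_profile(text)
    if profile is None:
        return "Could not find subscriber and country counts on the page."
    return (
        "Netflix is the world's leading Internet subscription service for "
        "enjoying TV and movies, with more than {} million subscribers in {} "
        "countries.".format(*profile)
    )
```

Calling extract_profile on a saved copy of the page is enough to try the pattern; describe() is only needed for the live lookup.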
People

Design Your Culture for Desired Outcomes
1. Speed of innovation
2. Availability
3. Cost

People

Design For What’s Important
Freedom and Responsibility
Hire Smart Experienced People
Set them Loose
Watch Magic Happen

People

Policies
Raise your hand if you love them
People

Policies
(How They Usually Work)

People

Policies
(How They Usually Work)

11/27/2006

“Sorry, but the standard monitor...is the HP 17 flat panel. I actually told a director last week that they couldn't have a 19 for a new office so I am not picking on just you.”


People

Policies
(How They Usually Work)

6/18/2007

“There is a request for quantity 2 17” flat panels. We have received direction from the CIO that no one will have more than 1 flat panel monitor. I just wanted to let you know that there will only be one monitor ordered ... The 17” is our only standard except for Legal.”

People

Policies
(How They Usually Work)

• Prescriptive
• Inflexible
• Determined by others
• Slow to change
People

Policies
@nflx

People

Policies
@nflx
01/30/2013, 15:22 PST

I'd like to request a 15” MBP w/ Retina Display. I don't know how much you guys care about CPU specs -- it looks like the bump from 2.3GHz to 2.6GHz is reasonably priced at only about $100, so if it works for you that'd be nice. 16GB RAM and at least 512GB drive.

01/31/2013

12:00 PST: “Forwarded to IT Purchasing to provide a quote to Roy for the requested configuration.”
13:33 PST: “Requesting quote from vendor”
15:32 PST: “Attached is the quote, please approve and I’ll place order”
15:46 PST: “Thanks for the rapid response. Please order.”
15:52 PST: “Ordered. PO #...”

People

Policies
@nflx

• Descriptive
• Evolve quickly
• As flexible as we are
• Describe what we choose to do/get


Problem

The Before Time
• Dozens of SSL Certificates
  • Decentralized
  • Kept Expiring
  • Hilarity would ensue
• Amazon Resources
  • “No Preset Limit”
  • You know when you hit it
  • Hilarity would ensue
Python

The Before Time
• Well-developed Developer Ecosystem
  • DB Client
  • Credentials Management
  • Memory Object Cache
  • Server Infrastructure
  • Discovery
  • Telemetry

You wanted that for Java, right?
Problem
Python

The Before Time
• Just moved from IT/Ops
• Formally tasked with SSL cert issue as quarterly goal
• Limits issue “tacked” on
• Happily hackily Pythonic
• Didn’t know Java

Presenter Selfie

Problem

Architecture
7/10/2011 Ready for beta
[Architecture diagram. Labels: Cassandra, ELB, EC2, CherryPy, Filesystem, Certificate, IP Range, Nagger, DNS Domain]
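
The slides do not show how certificates were checked, so the following is only a minimal sketch of what an expiry check behind a "Nagger" might look like, using Python's standard ssl module. The function names and the 30-day warning threshold are assumptions for illustration, not details from the talk.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the notAfter field of the certificate served by host (needs network).

    With the default context an already-expired certificate fails the
    handshake, so this only sees certificates that are still valid.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` (timezone-aware) until the notAfter timestamp.

    `not_after` uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def should_nag(not_after: str, now: datetime, warn_days: float = 30) -> bool:
    """True when the certificate expires within warn_days (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days
```

Keeping the date arithmetic separate from the network call means the nagging rule can be tested against stored notAfter values.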

Python

Persistence
• Started with SimpleDB
• Then Cassandra
• Drove creation of …
  • import Discovery
  • import Cassandra
• And a design error
Python

Abstraction
• “The process of separating ideas from specific instances of those ideas at work.”
• Some abstraction: Good
• Too much abstraction burns your tongue*
• Known bug
* Mixed metaphor is mixed
Problem

Architecture

Problem

Architecture

Problem

Alerting
• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system

Copyright USAID Microlinks. CC Attribution 2.0

Problem

Alerting
• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system

Copyright: http://www.flickr.com/photos/s_w_ellis
CC Attribution 2.0 License

Problem

Alerting
Monitoring

Alerting

Notification

• Already had a good telemetry system
• Outsourced notification to PagerDuty
• No alert routing (and deduplication)
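
The missing piece named above is routing and deduplication between the telemetry system and the notifier. A minimal sketch of that idea follows. The class name, the per-team routing table, and the five-minute suppression window are all illustrative assumptions, not the gateway described in the talk.

```python
import time
from typing import Callable, Dict, Tuple


class AlertGateway:
    """Route alerts to a per-team notifier and suppress repeats within a window."""

    def __init__(
        self,
        routes: Dict[str, Callable[[str], None]],
        dedup_window: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._routes = routes            # team name -> notifier callable
        self._window = dedup_window      # seconds during which repeats are dropped
        self._clock = clock              # injectable so tests can control time
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def submit(self, team: str, key: str, message: str) -> bool:
        """Deliver the alert unless the same (team, key) was sent recently.

        Returns True if the notifier was called, False if deduplicated.
        """
        if team not in self._routes:
            raise KeyError("no route configured for team {!r}".format(team))
        now = self._clock()
        last = self._last_sent.get((team, key))
        if last is not None and now - last < self._window:
            return False
        self._last_sent[(team, key)] = now
        self._routes[team](message)
        return True
```

A notifier here is any callable that takes the message, so the same gateway could front a pager service, email, or a test list.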
People

Alerting
• Space crunch
• New cube mate: @jedberg
• One Month Deadline

Problem

Alerting
[Diagram labels: Central Alert Gateway, Atlas, PagerDuty, alerting, Amazon SES, api]

Let’s Wake Someone Up
(Livecoding for Fun and Profit)
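
The live demo itself is not captured in the slides, and it presumably went through Netflix's internal gateway, whose API is not shown. As a stand-in, here is a hedged sketch of waking someone up directly through PagerDuty's public Events API v2, which postdates the talk. The routing key is a placeholder that would come from a PagerDuty service integration, and the helper names are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(
    routing_key: str,
    summary: str,
    source: str,
    severity: str = "critical",
    dedup_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for a 'trigger' event."""
    if severity not in SEVERITIES:
        raise ValueError("unsupported severity: {!r}".format(severity))
    event: Dict[str, Any] = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key is not None:
        # Events sharing a dedup_key are grouped into one incident.
        event["dedup_key"] = dedup_key
    return event


def send_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """POST the event to PagerDuty (needs network and a valid routing key)."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the event separately from sending it lets the payload be inspected or logged before anyone is paged.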
Python

But Now We Need…
• import Discovery.publish
• import EVCache
• import EpicMetrics
• import Archaius
• import Asgard.Registry
• import AKMS
Python

AKMS?
In [1]: import AKMS
In [2]: ak = AKMS.AKMS(RoyWasHere)
In [3]: ak.keys()
Out[3]: ['MLQBAYLLDIGXPBQB', 'eMr+Mdhv+E4xD+paPCxXF+']
In [4]: a, s = ak.keys()
In [5]: s3_object = boto.connect_s3(a, s)
In [6]: ak = AKMS.AKMS(RoyWasHere, version=2)
In [7]: ak.keys()
Out[7]: ['yn[…]G', 'rV[…]bKfSUHDSA', 'reallyLongStringElided']
In [8]: ak.expiration
Out[8]: 1389165118
In [9]: a, s, s2 = ak.keys()
In [10]: s3_object = boto.connect_s3(a, s, security_token=s2)
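
The version 2 keys in the session above come with a security token and an expiration time, which means callers have to fetch fresh keys before the old ones lapse. AKMS is internal to Netflix and its API beyond this session is not shown, so the sketch below uses a hypothetical fetch callable in its place. The class name and the 60-second safety margin are assumptions as well.

```python
import time
from typing import Callable, Optional, Tuple

# (access key, secret key, security token, expiration as epoch seconds)
Credentials = Tuple[str, str, str, float]


class RefreshingCredentials:
    """Cache temporary credentials and refetch them shortly before they expire."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch      # hypothetical stand-in for the key service call
        self._margin = margin    # refresh this many seconds before expiration
        self._clock = clock      # injectable so tests can control time
        self._cached: Optional[Credentials] = None

    def get(self) -> Tuple[str, str, str]:
        """Return (access key, secret key, token), refreshing when needed."""
        if self._cached is None or self._clock() >= self._cached[3] - self._margin:
            self._cached = self._fetch()
        return self._cached[:3]
```

A caller would ask for get() right before opening each connection rather than holding on to the three strings.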

People

So AKMS
• Server more paranoid than most
• Making Python library was a pain
• Remember Jay?
• High lateral trust
• Prioritization autonomy
• Never ask for permission
People

Lateral Trust
• Humans are good game players
• What are the rules?
• Zero-sum games: I want you to lose
• Stack ranking
• Fixed bonus / raise pools
People

Lateral Trust @nflx
• No fixed pools for anything
• No ranking (at all)
• Reviews != raises
• Smart people generally make good decisions
• Global optimization
People

Subordinate Trust @nflx
• Focus on results
• Unleash employees
• Encourage disagreement
• Accept dissent
• Job #1: Attract and retain world-class talent
People

Manager Trust @nflx
• Question, question, question
• Drive for context, not decisions
• Nobody is above questioning

Python

Field of Dreams
• Turned out I wasn’t the only one
• Striking the right balance between MVP and future growth (maybe)
• And if it hadn’t … it’d still have been the right choice

A Virtuous Cycle
• Requirement for high impact
• No process for permission
• Unorthodox language choice
• Lateral support for development
• Increased adoption
• …
• Profit!*
* (or at least a new standard)

Python
People
Problem
Tell me what you think.
You know you want to.
http://bit.ly/netflixcmpython
Attributions
http://www.flickr.com/photos/watchsmart/	

http://www.flickr.com/photos/yaketyyakyak/	

Pem Dorjee Sherpa

