Managing Python at scale
without breaking the bank
Michael (Misha) Tselman
PyData NY 2017

By Misha Tselman
PyData New York City 2017

Athena is the largest Python-based ecosystem in J.P. Morgan Chase and among the largest in the world. Maintaining consistency and stability while enabling creativity, research, and speed of development is critical for supporting our clients and staying ahead. Can we fit everything into the same platform?

  1. Managing Python at scale without breaking the bank
     Michael (Misha) Tselman
     PyData NY 2017
  2. Agenda
     • J.P. Morgan and Athena
     • Objectives
     • Continuous delivery
     • Under the hood
     • Challenges
     • Conclusions
     • Q&A
  3. J.P. Morgan
     • One of the world’s biggest banks
     • $2.5 trillion assets
     • $95 billion revenue
     • Processing $5 trillion payments every day
     • 230,000+ employees globally
     • One of the world’s biggest tech companies
     • 44,000+ employees in Technology
     • $9.5 billion annual investment in technology and innovation
  4. Athena
     • Python-based Pricing, Trading, Risk Management, and Analytics platform with tools for Data Science and Machine Learning
     • Thousands of users across multiple business lines
     • 1500+ Python developers use and contribute to the platform
     • 150,000 Python modules, 35 million lines of Python code
     • 500+ open source Python packages
     • Rapid development and deployment model that puts developers and quants at the heart of the business
  5. Athena Foundation
     • Hydra (globally replicated object database)
     • Reactive Athena (C++/Python reactive dataflow framework)
     • Pixie Graph (directed acyclic dependency graph)
     • Athena Application framework based on Qt
     • Athena Web (Tornado, HTML5, WebSockets, JavaScript, WebAssembly)
     • Job scheduling (~270,000 jobs daily, which kick off ~1M processes)
     • Integration with Compute Grid (tens of thousands of cores + GPUs)
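The slide describes Pixie Graph only as a directed acyclic dependency graph. As a rough illustration of that idea (not Athena's API), the sketch below evaluates a handful of nodes in dependency order with the standard library's `graphlib` (Python 3.9+). The node names and formulas are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nodes: each name maps to the set of nodes it depends on.
DEPENDS_ON = {
    "spot": set(),
    "quantity": set(),
    "market_value": {"spot", "quantity"},
    "exposure": {"market_value"},
}

# Hypothetical node functions; each reads values that were computed earlier.
COMPUTE = {
    "spot": lambda v: 100,
    "quantity": lambda v: 3,
    "market_value": lambda v: v["spot"] * v["quantity"],
    "exposure": lambda v: v["market_value"] * 2,
}


def evaluate(depends_on, compute):
    """Compute every node once, visiting dependencies before dependents."""
    values = {}
    for node in TopologicalSorter(depends_on).static_order():
        values[node] = compute[node](values)
    return values


values = evaluate(DEPENDS_ON, COMPUTE)
print(values["exposure"])
```

A reactive framework would additionally track which nodes are stale and recompute only those; this sketch recomputes everything on each call.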
  6. Objectives
     • Keep end-users and clients happy
     • Ensure robustness and stability of our production systems
     • Keep developers productive and efficient
     • Provide quants and data scientists with the best research tools
     • Encourage sharing and global consistency across business lines
  7. Approach
     • Conceptually:
     • Continuous delivery:
     • 10,000 – 15,000 production changes every week.
     • Full visibility of the entire code base. Anyone can contribute.
     • Instant global deployment
     • Under the hood:
     • Globally replicated object databases for code (and data)
     • Monorepo – Monolithic code base
     • Extensively automated testing
  8. Continuous delivery
     Write code & tests → Test → Commit → Ask for a bless → Push → Run
     PROS
     • Time to market
     • User satisfaction
     • Developer productivity
     CONS
     • Fear of change / stability
     • High reliance on automation
     • Tricky in distributed systems
     10,000 - 15,000 modules pushed to production every week
  9. Layering of changes / Effective runtime
     • Developer’s layer: B3, C2
     • Shared staging / UAT: A2, B2
     • Production: A1, B1, C1, D1, E1
     • Effective runtime: A2, B3, C2, D1, E1
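One way to picture the layering is as a lookup that checks the developer's layer first, then shared staging, then production. The sketch below uses `collections.ChainMap` for that lookup order. It assumes the slide's diagram reads as developer layer B3 and C2, staging A2 and B2, and production A1 through E1. The dictionary representation is an illustration, not how Athena stores layers.

```python
from collections import ChainMap

# Module versions per layer, as read from the slide's diagram (an assumption).
developer = {"B": "B3", "C": "C2"}
staging = {"A": "A2", "B": "B2"}
production = {"A": "A1", "B": "B1", "C": "C1", "D": "D1", "E": "E1"}

# ChainMap searches the maps left to right, so earlier layers shadow later ones.
effective = ChainMap(developer, staging, production)

resolved = {name: effective[name] for name in sorted(effective)}
print(resolved)
```

Under this reading, the developer sees B3 rather than the staged B2, picks up A2 from staging, and falls through to production for D1 and E1.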
  10. Alternatives to filesystem based source
      DB-LDN, DB-NYC, DB-TKO
      “lib.foo” → def hello(): print ‘world’
      “lib.bar” → def hello(): print ‘pydata’
      “lib.bar @ 2017-10-01 12:33” → def hello(): print ‘jpmorgan’
      “lib.bar @ 2017-09-21 10:16” → “...”
      • Use globally replicated database
      • Customize the importer
      • SourceMarkers - Take advantage of transactions & timestamps
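The slide names two ingredients, a customized importer and timestamped source. A minimal sketch of how they could fit together with Python's standard import hooks follows. The in-memory `SOURCE_DB` dictionary stands in for the replicated database, and `DatabaseImporter`, `import_from_db`, the module name `dbdemo_bar`, and the returned strings are all hypothetical. This is not Athena's importer or its SourceMarkers mechanism.

```python
import importlib.abc
import importlib.util
import sys
from datetime import datetime

# Hypothetical stand-in for a replicated source database:
# module name -> list of (commit timestamp, source text).
SOURCE_DB = {
    "dbdemo_bar": [
        (datetime(2017, 9, 21, 10, 16), "def hello():\n    return 'older version'\n"),
        (datetime(2017, 10, 1, 12, 33), "def hello():\n    return 'newer version'\n"),
    ],
}


class DatabaseImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules from a source dict, optionally as of a timestamp."""

    def __init__(self, db, as_of=None):
        self.db = db
        self.as_of = as_of  # None means "latest version"

    def _source_for(self, name):
        versions = [(ts, src) for ts, src in self.db[name]
                    if self.as_of is None or ts <= self.as_of]
        if not versions:
            raise ImportError(f"no version of {name} at {self.as_of}")
        return max(versions, key=lambda item: item[0])[1]

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.db:
            return None  # fall through to the normal import machinery
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._source_for(module.__name__)
        code = compile(source, f"<db:{module.__name__}>", "exec")
        exec(code, module.__dict__)


def import_from_db(name, as_of=None):
    """Import `name` fresh from the database view at `as_of`."""
    importer = DatabaseImporter(SOURCE_DB, as_of)
    sys.meta_path.insert(0, importer)
    try:
        sys.modules.pop(name, None)
        return importlib.import_module(name)
    finally:
        sys.meta_path.remove(importer)
        sys.modules.pop(name, None)


latest = import_from_db("dbdemo_bar")
earlier = import_from_db("dbdemo_bar", as_of=datetime(2017, 9, 30))
print(latest.hello(), "/", earlier.hello())
```

Pinning `as_of` to a timestamp selects the newest version committed at or before that moment, which is one way to read the slide's "lib.bar @ 2017-10-01 12:33" keys.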
  11. Python and Binary Runtime
      prod old, prod, prod new
      Python Source
      C++ & 3rd party
  12. Some Challenges
      • Open source package upgrades
      • API changes
      • Change of pickled/stored representation
      • Numerical changes
      • Runtime/binary dependencies
      • Limited branching
      • Streamlines production
      • Does not fit some research/experimental workflows
      • Full reproducibility requires “freezing” all code including the binary train
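The "change of pickled/stored representation" challenge can be made concrete with a small example. One common mitigation, shown here as a sketch and not as the approach the talk describes, is to store a version number with the state and upgrade old layouts in `__setstate__`. The `Position` class, its fields, and the assumed version 1 layout are hypothetical.

```python
import pickle


class Position:
    """Hypothetical stored object whose layout changed between releases."""

    STATE_VERSION = 2

    def __init__(self, notional, currency="USD"):
        self.notional = notional
        self.currency = currency

    def __getstate__(self):
        # Always write the current layout, tagged with its version.
        return {
            "version": self.STATE_VERSION,
            "notional": self.notional,
            "currency": self.currency,
        }

    def __setstate__(self, state):
        if state.get("version", 1) == 1:
            # Assumed version 1 layout: an 'amount' field and no currency.
            state = {"notional": state["amount"], "currency": "USD"}
        self.notional = state["notional"]
        self.currency = state["currency"]


# Round trip through pickle with the current layout.
restored = pickle.loads(pickle.dumps(Position(1_000_000, "EUR")))

# Simulate state that was stored by an older release.
legacy = Position.__new__(Position)
legacy.__setstate__({"amount": 250})
print(restored.notional, restored.currency, legacy.notional, legacy.currency)
```

Objects already stored keep loading after the class changes, at the cost of carrying the upgrade path in code for as long as old data exists.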
  13. Conclusions
      • Python’s flexibility makes things easier
      • Good integration tests ensure compatibility and consistency
      • Modules don’t have to be loaded from a filesystem
      • Production stability does not imply slow delivery and deployment
      • Open source does not imply free
      • Shared platform does not imply shared knowledge
  14. References
      • J.P. Morgan
        http://www.jpmorgan.com/techcareers
      • The motivation for a monolithic codebase
        http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
  15. Q&A