PyBay 2018: Production-Ready Python Applications

In 2016, Susan Fowler released the 'Production Ready Microservices' book. This book sets an industry benchmark on explaining how microservices should be conceived, all the way through to documentation. So how does this translate for Python applications? This session will explore how to expertly deploy your Python micro-service to production.

  1. 1. Production-Ready Python applications Michael Kehoe Staff Site Reliability Engineer
  2. 2. Today’s agenda 1 Introduction 2 Tenets of Readiness 3 Building Production-Ready Python Applications 4 Recap
  3. 3. Introduction
  4. 4. Michael Kehoe $ /USR/BIN/WHOAMI • Staff Site Reliability Engineer @ LinkedIn • Production-SRE Team • Funny accent = Australian + 4 years American • Worked on: • Networks • Micro-services • Traffic Engineering • Databases
  5. 5. Production-SRE Team @ LinkedIn $ /USR/BIN/WHOAMI • Disaster Recovery - Planning & Automation • Incident Response – Process & Automation • Visibility Engineering – Making use of operational data • Reliability Principles – Defining best practice & automating it
  6. 6. Production-Ready Python Applications • This talk is a high-level overview of what it takes to build a production-ready service • Each of the topics could be its own separate talk • Focus on standard open-source options • Number of options not mentioned
  7. 7. What makes an application Production-Ready?
  8. 8. O’Reilly 2017 Susan J. Fowler Production-Ready Microservices
  9. 9. “A production-ready application or service is one that can be trusted to serve production traffic…” Susan J. Fowler
  10. 10. “… We trust it to behave reasonably, we trust it to perform reliably, we trust it to get the job done and to do its job well with very little downtime.” Susan J. Fowler
  11. 11. Tenets of readiness
  12. 12. Tenets of Readiness 1 Stability 2 Reliability 3 Scalability 4 Performance 5 Fault Tolerance 6 Disaster Recovery 7 Monitoring 8 Documentation
  13. 13. Building Production-Ready Python applications
  14. 14. Stability
  15. 15. Tenets of Readiness STABILITY • Stable development cycle • Code Linting • Code Review • Central Repository • Build system • See Fabio Fleitas’s talk from Saturday
  16. 16. Tenets of Readiness STABILITY • Stable deployment cycle • Canary/ Staging environment • Reliable Deployment via: • Docker/ Kubernetes • Heroku • CD tools
  17. 17. Reliability
  18. 18. Tenets of Readiness RELIABILITY • Dependency Management • Onboarding + Deprecation procedures • See Documentation section • Routing + Discovery • Etcd • PyDiscover • Consul-service-discovery
  19. 19. Scalability
  20. 20. Tenets of Readiness SCALABILITY • Understanding growth-scales • Qualitative vs quantitative growth scale • Resource awareness • Dependency scaling • What services/ databases need to scale
  21. 21. Performance
  22. 22. Tenets of Readiness PERFORMANCE • Constant performance evaluation • Understand how to benchmark application • Traffic management • Understand traffic pattern performance • Capacity Planning • Have the right metrics
  23. 23. Fault Tolerance
  24. 24. Tenets of Readiness FAULT TOLERANCE • Avoiding Single Points of Failure (SPOF) • Use multiple instances behind a load balancer • Catch exceptions (meaningfully)
  25. 25. Tenets of Readiness FAULT TOLERANCE • Resiliency Engineering • Add testing to verify that non-standard behavior is handled correctly • Run chaos experiments: • DNS/ Network failures • Consume disk space/ IO • Consume CPU
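One possible reading of "catch exceptions (meaningfully)" from the fault-tolerance slides: catch only the errors you expect, log them with context, and retry a bounded number of times before giving up. The sketch below is an illustration and not code from the talk; `TransientError` and `call_with_retries` are hypothetical names.

```python
import logging
import time

logger = logging.getLogger(__name__)


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(func, attempts=3, delay=0.0):
    """Call func(), retrying on TransientError up to `attempts` times.

    Only the expected, retryable error is caught. Anything else
    propagates immediately so that real bugs are not hidden.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientError as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Out of retries: re-raise so the caller can react.
                raise
            time.sleep(delay)
```

A bare `except:` that swallows every error would also keep the process alive, but it hides the failure from both the caller and the logs, which is the opposite of handling it meaningfully.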
  26. 26. Disaster Recovery
  27. 27. Tenets of Readiness DISASTER RECOVERY • Understand common failures • Have an application-specific disaster-recovery plan in place • Have a general incident management plan
  28. 28. Tenets of Readiness DISASTER RECOVERY: DAEMONS
  29. 29. Tenets of Readiness DISASTER RECOVERY: FLASK APP
  30. 30. Monitoring
  31. 31. Tenets of Readiness MONITORING • Logging • Tracing • Metrics → Dashboards/ Alerts
  32. 32. Tenets of Readiness MONITORING: LOGGING https://docs.python.org/2/library/logging.handlers.html#sysloghandler
  33. 33. Tenets of Readiness MONITORING: TRACING • Multiple (free/ open-source) Options • Opentracing • Jaeger • Zipkin (various community libraries)
  34. 34. Tenets of Readiness MONITORING: METRICS • Multiple (free/ open-source) Options • Statsd • Jaeger • prometheus
  35. 35. Monitoring: Metrics
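The logging slide links to the standard library's `SysLogHandler`. A minimal sketch of wiring it up, assuming a syslog daemon listening on UDP at localhost:514 (the handler's default address). The `build_logger` helper and the format string are illustrative choices, not taken from the talk.

```python
import logging
import logging.handlers


def build_logger(name, address=("localhost", 514)):
    """Return a logger that forwards INFO and above to syslog over UDP.

    `address` is an assumption; point it at your own syslog endpoint.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid stacking duplicate handlers if this is called more than once.
    already_attached = any(
        isinstance(h, logging.handlers.SysLogHandler) for h in logger.handlers
    )
    if not already_attached:
        handler = logging.handlers.SysLogHandler(address=address)
        handler.setFormatter(
            logging.Formatter("%(name)s: %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Typical use would be `log = build_logger("myservice")` at startup, followed by calls such as `log.info("request served")` from request handlers.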
  36. 36. Documentation
  37. 37. In-Code Documentation What you should be covering • Function/ API & Class docstrings • Code usage documentation • Project documentation • Onboarding • Contribution • Testing
  38. 38. Tenets of Readiness DOCUMENTATION How to do it? • Sphinx • JavaDoc • Doxygen • Etc….
  39. 39. Documentation RESTRUCTURED TEXT EXAMPLE https://github.com/requests/requests/blob/master/requests/api.py
  40. 40. Documentation RESTRUCTURED TEXT EXAMPLE http://docs.python-requests.org/en/master/api/
  41. 41. Documentation RESTRUCTURED TEXT EXAMPLE https://github.com/requests/requests/blob/master/requests/api.py
  42. 42. Documentation RESTRUCTURED TEXT EXAMPLE http://docs.python-requests.org/en/master/api/
  43. 43. Documentation https://oncall.tools/docs/
  44. 44. Documentation • Relevant PEPs: • PEP 257: Docstring Conventions • PEP 287: reStructuredText Docstring Format (Official Python Documentation Standard) • Further info: • https://pythonhosted.org/an_example_pypi_project/sphinx.html • https://realpython.com/documenting-python-code
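To make the docstring guidance concrete, here is a small sketch of a function documented with reStructuredText field lists, the style Sphinx renders. The `error_rate` function is a hypothetical example invented for illustration; it is not from the talk or from the requests library shown on the slides.

```python
def error_rate(errors, total):
    """Return the fraction of requests that failed.

    :param int errors: Number of failed requests.
    :param int total: Total number of requests served.
    :returns: ``errors / total``, or ``0.0`` when ``total`` is zero.
    :rtype: float
    :raises ValueError: If a count is negative or ``errors > total``.
    """
    if errors < 0 or total < 0 or errors > total:
        raise ValueError("counts must satisfy 0 <= errors <= total")
    if total == 0:
        # No traffic yet: report a zero rate instead of dividing by zero.
        return 0.0
    return errors / total
```

The summary line, the blank line after it, and the closing quotes on their own line follow PEP 257; the `:param:`, `:returns:` and `:raises:` fields are what Sphinx turns into the formatted API page.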
  45. 45. Recap
  46. 46. Tenets of Readiness 1 Stability 2 Reliability 3 Scalability 4 Performance 5 Fault Tolerance 6 Disaster Recovery 7 Monitoring 8 Documentation
  47. 47. Further Resources • Find me at: • michael-kehoe.io • @matrixtek • linkedin.com/in/michaelkkehoe • Slides will be available in multiple locations shortly

Editor's Notes

  • Michael
    So we’re a part of a team at LinkedIn called Production-SRE
    The key tenets of Production-SRE at LinkedIn are:
    Assist in restoring stability during site-critical issues
    Develop applications to reduce MTTD and MTTR
    Provide direction and guidelines for site-troubleshooting
    Build tools for efficient site-issue troubleshooting, issue detection and correlation

    As this presentation goes on, you’ll notice how an Event Correlation system fits into these
  • In this context, stability is about having a consistent pre-production experience
    Development
    Continuous Integration with central code repo and code review
    Reproducible builds
    Unit/ Integration testing
    Deployment
    Simple repeatable deploys
    Canary/ Dark Canary/ Staging
    Canary testing
  • Dependency management
    Unreliability in microservices usually comes from either changes in inbound traffic or changes in behavior from downstream traffic
    Knowing all of these and understanding impact matters
    Onboarding/ Deprecation
    Documented way to start using APIs
    Access Control
    Best Practices
    Deprecation
    ACL/ Firewalls
    Code Cleanup
    Routing + Discovery
    Is there a standard way to discover how to get to your service
    Does your application have reliable health-checks
    Does your load-balancer respect these
    Circuit breakers/ Degraders
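
The note above mentions circuit breakers without showing one. The following is a minimal sketch of the general pattern, not the speaker's implementation: after a number of consecutive failures the breaker "opens" and skips calls to the downstream for a cool-down period, then lets one trial call through. `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker for one downstream dependency."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream marked unhealthy")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure will re-open the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        return result
```

Callers would wrap each downstream request, for example `breaker.call(lambda: fetch_profile(user_id))` where `fetch_profile` stands in for any real client call, and serve a degraded response when `CircuitOpenError` is raised.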
  • Growth Scales:
    How your service scales with the business goals/ metrics
    How your application scales as it gets more traffic (how do you make it serve more traffic)
    What resource “bounds” the application throughput
    Resource Awareness:
    What is your resource usage
    What are your bottlenecks
    Horizontal vs Vertical scaling (don’t do this)
    Dependency Scaling:
    Going back to reliability…how do your downstreams scale with your service’s growth
  • Performance
    This is essential
    Ideally should be done every time a change is made to the service (deployment/ AB flag)
    Measure and report performance
    Traffic Management
    QoS
    Scaling for bursts/ failover
    Capacity:
    This really ties everything together
    Do you have the right numbers to know how many resources you’ll need in the future
  • SPOF:
    In 2018, this really shouldn’t be a problem.
    Hardware failures
    Rack failures
    Do the right thing early to avoid problems down the road
    Resiliency Engineering
    May be known as chaos engineering
    Deliberately break your service to find weak points and look to make things fail more gracefully
    Outages happen, be prepared for them
  • Failure Scenarios:
    Understand the ways your service can break – resiliency engineering can help here if you’re unsure
    Have plans to respond to these
    Disaster Recovery:
    For larger scale outages, what’s your plan
    IM:
    What’s the process to manage & respond to the outage
    Gamedays are a great way to test this
  • Dashboards:
    Dashboards are for high-level system health
    Not for regression validation
    Monitor service/ resources/ infrastructure
    All alerts should be actionable and have pre-planned responses
    Anything other than this creates alert fatigue
    Logging is an underrated aspect of software development. During resiliency testing, see what the logs say and if they’re helpful
  • In-code documentation has two main forms:
    Writing documentation (or docstrings) for your functions, classes, and modules
    Writing code examples

    So how can we do this nice and easily
    Sphinx
    JavaDoc
    Doxygen
    And there are many others
