
Nginx conference 2015

Bart Warmerdam, Advisory IT Specialist at ING, talks about NGINX.


  1. Move Over IBM WebSeal and F5 BigIP, Here Comes NGINX (09/23/2015)
  2. Bart Warmerdam, Advisory IT Specialist at ING Bank N.V. (#nginx #nginxconf)
  3. Who is ING globally
  4. Who is ING in the Netherlands
  5. History up to 2.5 years ago within ING: • Bank with diverse software and hardware landscape • Cost-driven IT • Traditional software development: design, build, test, implement • Software strategy: buy before build • Middleware strategy: buy • Hardware strategy: appliance
  6. From 2.5 years ago up to now: • Bank with diverse software and hardware landscape • IT and time-to-market are important • 60 scrum teams internally working on software • Software strategy: build before buy (a lot of the time) • Middleware strategy: buy, but… • Hardware strategy: standard scalable stacks
  7. Complex IT landscape. Task: simplify IT and add missing functionality
  8. Infrastructure to replace, for all internet-facing domains of domestic banking in the Netherlands: • Internet-facing reverse proxies (IBM TAM WebSeal)  Authenticating proxy  Content caching and compression  Cookie jar functionality • Multiple layers of load balancers (F5 BigIP)  Over data centers  Over nodes in different network zones
  9. The plan to simplify: • Investigate open source software: NGINX or Apache vs IBM WebSeal / F5 • Perform a proof of concept with NGINX for authentication and event publishing • Write a report for the deciding architects, which concluded after the proof of concept:  Replace IBM TAM WebSeal with NGINX using custom modules  Integrate the layers of F5 BigIPs with NGINX • The result: “GO!” Now we are more in control than ever.
  10. Starting with: [architecture diagram] F5 load balancers in front of IBM WebSeal (Tier 1, DMZ) with Policy Manager and LDAP, further F5 load-balancer layers toward the applications and the External Authentication Interface in Tier 2, and an Inter Connectivity Cloud between the data centers
  11. Working towards: [architecture diagram] a single F5 load-balancer layer in front of NGINX (Tier 1, DMZ), with NGINX load balancing directly to the applications and the External Authentication Interface in Tier 2, and an Inter Connectivity Cloud between the data centers
  12–20. Control in functionality, time-to-market, operational monitoring and control (one slide per added bullet): • Integrate the Authentication and Event Publishing module from the PoC • Add missing cookie jar functionality • Add load balancing persistency over data centers • Add dynamic service discovery so teams can self-service end points • Integrate the existing (Java) Continuous Delivery Pipeline • Monitor system resource usage and errors to Graphite • Add Grafana dashboards and mobile alerts for team dashboards • Monitor and report upstream errors to Tivoli Omnibus (MCR) • Make performance data and reports available to all scrum teams
  21. Roll-out planning: • First step: integrate into the Continuous Delivery Pipeline, from Git to production • Second step: add additional functionality to NGINX • Future roadmap of the NGINX authenticating proxy environment
  22. First step: integrate into the continuous delivery pipeline • Using standard open source tools like Git, Jenkins, Maven, Nexus, Docker, Valgrind and Python • And closed source tools like Nolio (deployments) and Fortify (static source code analysis)
  23. Git repository
  24. Commits on “develop” trigger a build in Jenkins, using an Apache Maven build profile
  25. Which builds the project modules
  26. By packaging all our own modules, adding the nginx.org source from our Nexus repository and 3rd-party source modules from our Nexus repository, as a tar.gz file
  27. And adding the Red Hat .spec file
  28. To start a Docker build in a CentOS image, which results in an RPM
  29. If all Python tests succeed on the binary
  30. If all integration test scripts and all product acceptance scripts ran successfully
  31. And all module tests succeed as well
  32. Using a Python test framework to easily create test cases for the binary and modules
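The slides do not show the framework itself; as a rough illustration only, a helper in such a Python framework might issue a request to the binary under test and assert on the response. The helper name and the stub server below are invented for this sketch (the stub stands in for the NGINX binary so the example is runnable on its own).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def http_status(host, port, path):
    """GET a path on the server under test and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()

# Stand-in for the NGINX binary under test, so the sketch runs on its own.
class _Stub(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200 if self.path == "/health" else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep test runs quiet

server = HTTPServer(("127.0.0.1", 0), _Stub)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
```

A real test case would point `http_status` at the packaged binary started from the RPM, not at a stub.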
  33. The RPMs and test results are uploaded to a Nexus repository, together with Nolio deployment scripts, after which Jenkins triggers an automatic Nolio deployment in LCM
  34. Each commit on “develop” also starts a Jenkins job that triggers the Valgrind tests on all modules and emails the results on failures
  35. A nightly Jenkins job starts a Fortify scan for static source code analysis on all our own modules, the NGINX code and all 3rd-party modules used
  36. Releases on “master” trigger a build in Jenkins, using the Apache Maven release profile, where versioned artifacts are uploaded to Nexus
  37. Configuration releases on “master” trigger a build in Jenkins, where the correct nginx.conf and site information are created
  38. And SQL is used to create a list of URL endpoints and their module directives
  39. Using a Maven plugin to create the correct configuration files
  40. Using Docker to build an RPM and test all generated configurations
  41. So it can be automatically deployed in Nolio in LCM by Jenkins
  42. The result… • LCM DEV + TST environment for internal team tests • DEV + TST for integration tests for all other teams • ACC for pre-production tests  Daily load tests using LoadRunner and performance reports using Python, LaTeX and gnuplot  Weekly resilience tests  Unplanned Simian Army tests  “perf” runs for NGINX profiling (if a change requires it)  Penetration and security tests • Multiple PRD environments in different data centers  Replaced all IBM WebSeal reverse proxies with NGINX  Starting to replace all F5 BigIP internal load balancers with the NGINX load balancing module
  43. Optimizing the result • Using “perf” we analyzed the binary under load, ~500 URIs/sec  Numbers 1, 3, 8 and 11 are GZIP compression  Number 2 is memset => hard to pinpoint since generic use  Number 4 is the network driver => cannot change  Number 5 is cookie header parsing, triggered by our code  Number 6 is the OS  Number 7 is Kafka CRC32 code  Number 9 is memcpy => hard to pinpoint since generic use  Number 10 is caused by the audit system => cannot change  Number 20 is the first of our own methods listed
  44. Include optimized libraries • GZIP is expensive on the CPU, use optimized libraries when possible • Use static linking when replacing the patched library cannot be done on the target machine • Two patches available, from Intel and Cloudflare (comparison at compression level 5) • Source: https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html
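The CPU-versus-size trade-off behind those zlib patches can be seen even with the stock library. A quick sketch using plain `zlib` in Python (not the Intel or Cloudflare builds the slide refers to):

```python
import zlib

# Repetitive payload, as typical web content often is.
data = b"<li>example list item</li>\n" * 1024

# Compressed sizes at a low level, the benchmarked level 5, and the maximum.
sizes = {level: len(zlib.compress(data, level)) for level in (1, 5, 9)}

# Higher levels spend more CPU for (often modest) size gains, which is
# why CPU-optimized zlib builds pay off at proxy-scale traffic volumes.
for level, size in sorted(sizes.items()):
    print(f"level {level}: {size} bytes from {len(data)}")
```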
  45. Patching libraries for performance • Some libraries are not available on the target machine (Kafka, MaxMind, Protobuf) • Some libraries are too old on the target machine (PCRE3, for JIT) • CPU-optimized versions are added in the Docker image and statically linked
  46. Second step: add additional functionality to NGINX • Our five most important home-made modules:  Cookie jar module – store Set-Cookie operations in the reverse proxy  WebSeal module – authentication module based on the External Authentication Interface (EAI)  Kafka module – send event messages from the proxy layer to other systems  Load balancing – rule-based upstream use, allowing dynamic service discovery  Monitoring module – monitor application use and system resource usage
  47. Cookie jar module • Uses two levels of RB trees to store state • Highly configurable • Uses timers for automatic expiration and cleanup • Uses shared memory to share state between workers
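As a sketch of the idea only: the real module keeps this state in NGINX shared memory using two levels of red-black trees with timer-driven cleanup, while the Python below models the same behaviour with plain dicts (all names are illustrative).

```python
import time

class CookieJar:
    """Two-level store: session id -> cookie name -> (value, expiry)."""

    def __init__(self, ttl=900.0):
        self.ttl = ttl
        self.sessions = {}

    def store(self, session_id, name, value):
        """Record a Set-Cookie so it never has to reach the client."""
        jar = self.sessions.setdefault(session_id, {})
        jar[name] = (value, time.monotonic() + self.ttl)

    def fetch(self, session_id, name):
        """Return a stored cookie value, or None if absent or expired."""
        entry = self.sessions.get(session_id, {}).get(name)
        if entry is None or entry[1] < time.monotonic():
            return None
        return entry[0]

    def sweep(self):
        """What the module's timers do: drop expired entries."""
        now = time.monotonic()
        for sid in list(self.sessions):
            live = {n: e for n, e in self.sessions[sid].items() if e[1] >= now}
            if live:
                self.sessions[sid] = live
            else:
                del self.sessions[sid]
```

The RB trees and shared memory in the real module buy O(log n) lookups and cross-worker visibility, which plain per-process dicts cannot provide.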
  48. WebSeal module • Uses an RB tree to store session state • Allows access based on different policies (fine- or coarse-grained) • Uses timers for automatic expiration and cleanup • Uses shared memory to share state between workers • Implements the EAI interface to allow gradual migration
  49. Event Publishing (Kafka) module • Publishes events for monitoring and error analysis • Highly configurable using a separate JSON config file • Fast and asynchronous to avoid processing overhead
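The "fast and asynchronous" point is the key design choice: the request path only enqueues, and a background worker does the potentially slow send. The real module does this in C speaking the Kafka protocol; the Python below is a sketch of the pattern only, with invented names.

```python
import json
import queue
import threading

class EventPublisher:
    """Fire-and-forget publishing: publish() never blocks the request
    path; a background thread drains the queue and performs the send."""

    def __init__(self, send, maxsize=10000):
        self.queue = queue.Queue(maxsize=maxsize)
        self.send = send  # e.g. a Kafka producer's send function
        threading.Thread(target=self._drain, daemon=True).start()

    def publish(self, event):
        try:
            self.queue.put_nowait(json.dumps(event))  # never block
            return True
        except queue.Full:
            return False  # drop the event rather than stall the proxy

    def _drain(self):
        while True:
            self.send(self.queue.get())
            self.queue.task_done()
```

Dropping events when the queue is full is the deliberate trade-off: monitoring data is sacrificed before request latency.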
  50. Load balancing module • Uses specific upstream servers based on rules (e.g. confidence test) • Allows static load balancing over data centers for stateful applications • Allows TCP connection re-use, using pools • Integrates with the monitoring module to allow monitoring via MCR
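TCP connection re-use with pools exists in stock NGINX as well, independent of ING's custom rule-based module. A minimal upstream sketch using only standard directives (hostnames are placeholders):

```nginx
upstream app_dc1 {
    server app1.dc1.example:8080;
    server app2.dc1.example:8080;
    keepalive 32;                        # pool of idle upstream connections
}

server {
    listen 80;
    location / {
        proxy_pass http://app_dc1;
        proxy_http_version 1.1;          # keepalive needs HTTP/1.1 upstream
        proxy_set_header Connection "";  # clear "close" coming from clients
    }
}
```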
  51. Monitoring module • Reads variables from other modules to monitor • Creates and exposes variables with system resources to monitor • Uses UDP or TCP to transfer monitoring data to Graphite • Integrates with Tivoli Omnibus to allow monitoring via MCR
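Graphite's plaintext protocol is simply one `path value timestamp` line per metric, which is why a UDP sender is cheap enough to run inside a proxy. A minimal Python sketch (host and metric names are illustrative, not ING's):

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"

def send_metrics(lines, host="graphite.example", port=2003):
    """UDP is fire-and-forget: monitoring must not slow the proxy down."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto("".join(lines).encode("ascii"), (host, port))
    finally:
        sock.close()
```

Choosing UDP over TCP here trades delivery guarantees for the certainty that a slow or dead Graphite box can never back-pressure the request path.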
  52. Monitoring example [screenshot]
  53. Future roadmap of the NGINX authenticating proxy environment • Add WAF modules • Fully implement dynamic service discovery to dynamically add/remove URIs and upstream servers • Implement cross-datacenter persistency for the cookie jar
  54. Lessons learned so far… • Remove manual work in development and testing ASAP • NGINX has a lot of configuration optimization possibilities  TCP socket/TCP options, caching, connection re-use, JIT, threads, upstream zone, buffer settings, timeouts • In our own modules  Use shared memory for session state (if needed), RB trees, thread pools, timers and the event queue  Use atomic reference counters over shared mutex locks if possible  Use variables to pass data between modules • In NGINX modules  Compression on content is CPU-expensive!  Cookie lookups in modules are potentially CPU-expensive  CRC32 is potentially CPU-expensive  If using symmetric crypto, use types supported by the CPU (AES-NI), like AES GCM/CTR
  55. Lessons learned so far… • Older stacks require more work to fully use all configurations  Recompiled a new GCC C compiler for strong stack protector and CPU optimization options  Recompiled libz and statically linked the latest version, adding the Intel performance patches  Recompiled libpcre and statically linked the latest version for JIT, using CPU optimization flags  Recompiled other libs which are not present in RHEL, using CPU optimization flags • Make monitoring highly configurable per site and fine-tune over time • Use good monitoring dashboards  The combination of Graphite and Grafana works very well  Test which log data in error.log is required for good root-cause analysis if an error occurs • Take enough time to test  Performance tests under stress load with tools like “perf” give a lot of insight  Invest enough time in resilience tests and in what key data is needed to monitor your system  All code which involves shared memory, locks, timers and configuration reloads takes more time to get right
  56. Lessons learned so far… And… NGINX is very fast, very efficiently coded and extremely fun to program for!
  57. And… Questions? E-mail: bart.warmerdam@ing.nl
  58. Disclaimer: The opinions expressed in this publication are based on information gathered by ING and on sources that ING deems reliable. This data has been processed with care in our analyses. Neither ING nor employees of the bank can be held liable for any inaccuracies in this publication. No rights can be derived from the information given. ING accepts no liability whatsoever for the content of the publication or for information offered on or via the sites. Author rights and data protection rights apply to this publication. Nothing in this publication may be reproduced, distributed or published without explicit mention of ING as the source of this information. The user of this information is obliged to abide by ING's instructions relating to the use of this information. Dutch law applies. www.ing.com
