2. Roadmap
• What is Eikon Reverse Proxy
• The Rationale
• Internals
• Challenges
• Tooling for agility and resilience
• Q & A
3. What is Eikon Reverse Proxy?
Eikon Reverse Proxy is the entry gateway for Eikon, Thomson
Reuters’ flagship financial desktop platform
• Processes 110 million+ requests/day globally
• 2.5k+ requests/sec peak traffic
• Constitutes 90%+ of web traffic (content & services) hitting platform
• 90+ services onboarded in under 3 years (since inception)
4. What is Eikon Reverse Proxy?
Varnish Cache is at the heart of Eikon Reverse Proxy
• Centralized Authentication enforcement
• Currently powered by Varnish 3.0.2 (move to Varnish 3.0.7 underway)
• Simplification through abstraction of routing logic away from network appliances
5. The Rationale
Access
● Simplified external network connectivity for TR and for customers
● Better time to market and simpler architecture through central
authentication
● Increased visibility into incoming traffic - logging and searching
● Improved security: smaller attack surface and central control
Caching and performance
● Improved performance for end users through caching
● Reduced load on application servers
● Faster payload delivery due to persistent connections (esp. for
connections to distant sources)
6. The Rationale
Routing
● Better service through transparent and automatic failover
● Better service and time to market through increased location
independence for applications
● Application specific routing logic resides in VCL
● Integration of external or internal products, via federated authentication
Platform stabilization, improved resilience
● Failover at a per service level, low down time for users
● Easier hardware and infrastructure refreshes and moves, since everything
behind the proxy is non-edge
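To make per-service failover concrete, here is a minimal sketch of the selection logic. It is not the deck's implementation: the service names, backend names and the `pick_backend` helper are hypothetical, and in the real system this logic would live in VCL and Varnish directors.

```python
# Hypothetical table: each service lists its backends in order of preference.
SERVICE_BACKENDS = {
    "quotes": ["dc1-quotes", "dc2-quotes"],
    "news": ["dc2-news", "dc1-news"],
}


def pick_backend(service, healthy):
    """Return the first healthy backend for a service, or None if all are down."""
    for backend in SERVICE_BACKENDS[service]:
        if backend in healthy:
            return backend
    return None


if __name__ == "__main__":
    # Only "quotes" fails over to dc2; "news" is unaffected by the dc1 outage.
    healthy_now = {"dc2-quotes", "dc2-news"}
    print(pick_backend("quotes", healthy_now))
    print(pick_backend("news", healthy_now))
```

Because the decision is made per service, losing one backend moves only that service's traffic.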
7. Internals
Infrastructure
● Regular off-the-shelf HP blades running Linux, 48 GB RAM, 600 GB
SATA
● Simplified network infrastructure: single 1 GB bonded NICs, fronted by a
load balancer (SSL offload)
● Other proxies (Nginx) used behind Varnish to fill functional gaps and make
RP a well-rounded proposition
● Deployments in all strategic data centers
Customizations to Varnish
● Small customizations to core Varnish to better support failover needs
(directors)
● VModules to enforce Eikon authentication checks & implement other
custom functionality
8. Internals
RP API
● Python based API layer abstracts RP configuration, services & VCL
generation
● Templatized for easy extensibility and to abstract away Varnish
internals and VCL semantics
● API is the glue for platform failover simplification efforts
● API is the vehicle to enable “RP as a Service” (self-service on-boarding)
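As an illustration of templated VCL generation, here is a minimal sketch that renders Varnish 3 style backend declarations and routing rules from service definitions. The service names, hosts, URL prefixes and the `render_vcl` helper are assumptions made for this example. The actual RP API and its templates are not shown in the deck.

```python
from string import Template

# Hypothetical service definitions; names, hosts and prefixes are made up.
SERVICES = [
    {"name": "quotes", "host": "quotes.internal.example", "port": 8080, "prefix": "/quotes/"},
    {"name": "news", "host": "news.internal.example", "port": 8080, "prefix": "/news/"},
]

# Varnish 3 style backend declaration.
BACKEND_TMPL = Template(
    'backend $name {\n'
    '    .host = "$host";\n'
    '    .port = "$port";\n'
    '}\n'
)

# One routing rule per service, keyed on the URL prefix.
ROUTE_TMPL = Template(
    '    if (req.url ~ "^$prefix") {\n'
    '        set req.backend = $name;\n'
    '    }\n'
)


def render_vcl(services):
    """Render backend declarations followed by a vcl_recv routing block."""
    backends = "".join(BACKEND_TMPL.substitute(s) for s in services)
    routes = "".join(ROUTE_TMPL.substitute(s) for s in services)
    return backends + "sub vcl_recv {\n" + routes + "}\n"


if __name__ == "__main__":
    print(render_vcl(SERVICES))
```

With this approach, onboarding a service means adding one entry to the definitions rather than hand-editing VCL, which is the kind of abstraction the bullets above describe.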
10. Challenges I
Application teams operating in silos
● Differing business priorities, geographically dispersed
● Convincing teams why they need to go through a proxy
● Proving RP does not add overhead
● Educating teams on how to build proxy-friendly applications
● Identifying special proxying needs
11. Challenges II
Now being in the middle of everything
● Slowness? Is RP down or broken?
● Proving “It’s not the proxy, the problem lies elsewhere”
● Training support staff on common “gotchas”
● Moulding mindsets to adopt more modern troubleshooting
tools/techniques
Supportability
● Ensuring adequate functional coverage for major changes
● Continuously evolving apps, with changing usage patterns and load
12. Tooling for agility & resilience
Replay test framework
● Home-grown test framework that mimics Prod-like traffic in a staged setup
using logs from Prod
● Builds on varnishlog: takes raw logs from Prod and parses the log data to
prepare:
○ Replay test scenario that matches the staged setup
○ Stub data to feed dummy services that mimic real backends during
replay scenario
○ User anonymization and user session concordance
● Extended replay capability that can construct a load test suite using
Fiddler capture(s) as input
○ Parses captured requests to build input for the load test tool (httperf)
○ Uses captured responses to feed the stub service
○ Customized AutoBench to extend httperf’s reporting and simulate
concurrent user access
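The log-parsing and anonymization steps above can be sketched as follows. This is a minimal sketch, not the framework itself. The sample lines follow the Varnish 3 varnishlog layout (fd, tag, side, data) but are simplified and invented, and the `X-User-Id` header, the HMAC key and the helper names are hypothetical.

```python
import hashlib
import hmac
import re

# Simplified, invented sample in the Varnish 3 varnishlog layout.
SAMPLE_LOG = """\
   11 ReqStart     c 10.0.0.5 58912 1001
   11 RxRequest    c GET
   11 RxURL        c /quotes/latest
   11 RxHeader     c X-User-Id: alice
   11 ReqEnd       c 1001
   12 ReqStart     c 10.0.0.6 40100 1002
   12 RxRequest    c POST
   12 RxURL        c /news/search
   12 RxHeader     c X-User-Id: alice
   12 ReqEnd       c 1002
"""

LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+([cb-])\s*(.*)$")


def pseudonym(user, key=b"replay-salt"):
    """Map a user to a stable pseudonym so that sessions still line up."""
    digest = hmac.new(key, user.encode(), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def parse_requests(log_text):
    """Group client-side records by fd and return one dict per request."""
    open_reqs = {}
    requests = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(3) != "c":
            continue  # skip backend-side and unparseable lines
        fd, tag, data = m.group(1), m.group(2), m.group(4)
        if tag == "ReqStart":
            open_reqs[fd] = {"method": None, "url": None, "user": None}
        elif fd not in open_reqs:
            continue
        elif tag == "RxRequest":
            open_reqs[fd]["method"] = data
        elif tag == "RxURL":
            open_reqs[fd]["url"] = data
        elif tag == "RxHeader" and data.lower().startswith("x-user-id:"):
            open_reqs[fd]["user"] = pseudonym(data.split(":", 1)[1].strip())
        elif tag == "ReqEnd":
            requests.append(open_reqs.pop(fd))
    return requests


if __name__ == "__main__":
    for req in parse_requests(SAMPLE_LOG):
        print(req["method"], req["url"], req["user"])
```

Using a keyed hash instead of a random replacement keeps the mapping consistent, so two requests from the same user get the same pseudonym and session behaviour survives anonymization.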
13. Tooling for agility & resilience
Wiring into the rest of the infrastructure
● Full integration into monitoring/alerting infrastructure for event/alert
ingestion
● Complete view of all requests entering the platform - feedback loop into
replay test framework
● Metrics for platform performance measurement and capacity management
● Sophisticated dashboard to expose run-time view of all services & their
state to Ops/Support