SpringOne 2021
Session Title: Preparing the Gap Inc. Ecommerce Platform for Traffic Surge During the Holiday Season
Speakers: Anand Rao, Advisory Platform Architect at VMware; Ram Kesavan, Senior Director, SRE/PaaS/API Platform at Gap, Inc
1. Preparing Gap Inc. E-commerce for the Holiday Season
Ram Kesavan, Senior Director, IT (Gap, Inc)
Anand Rao, Senior Staff Specialist Solutions Engineer (VMware, Inc)
2. What does Peak mean for Gap Inc.?
• For retailers, Peak Season begins after October and runs through December 25th.
• Black Friday and Cyber Monday are the highest-traffic days.
• All systems run at their highest utilization during those days.
4. Peak Preparation
A team follows up with each Product/App team on Peak readiness:
• Peak target numbers
• Capacity engineering
• High availability requirements and failover testing
• Coordinated integrated load testing
• Vendor dependencies (questionnaire)
• All observability data captured and analyzed
• All required dashboards ready
• Scores and a red/green Peak readiness status for each team
Peak is an all-year effort at Gap Inc., run as a Peak Program.
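To make the red/green scoring concrete, here is a minimal sketch of a per-team readiness score. The checklist keys, the `TeamReadiness` class, and the rule that only a fully completed checklist counts as green are hypothetical assumptions. The deck does not say how Gap Inc. computes its scores.

```python
from dataclasses import dataclass, field

# Hypothetical checklist keys that loosely mirror the preparation areas on this slide.
CHECKLIST = (
    "target_numbers_defined",
    "capacity_engineered",
    "failover_tested",
    "integrated_load_test_passed",
    "vendor_questionnaire_returned",
    "observability_captured",
    "dashboards_ready",
)


@dataclass
class TeamReadiness:
    team: str
    completed: set = field(default_factory=set)

    def score(self) -> float:
        """Fraction of checklist items this team has completed (0.0 to 1.0)."""
        done = sum(1 for item in CHECKLIST if item in self.completed)
        return done / len(CHECKLIST)

    def status(self, green_threshold: float = 1.0) -> str:
        """Return 'green' when the score reaches the threshold, otherwise 'red'."""
        return "green" if self.score() >= green_threshold else "red"


if __name__ == "__main__":
    # Hypothetical teams, for illustration only.
    teams = [
        TeamReadiness("checkout", set(CHECKLIST)),
        TeamReadiness("search", {"target_numbers_defined", "capacity_engineered"}),
    ]
    for t in teams:
        print(f"{t.team}: score={t.score():.2f} status={t.status()}")
```

Items outside the checklist are ignored, so a team cannot inflate its score with ad hoc entries. Lowering `green_threshold` would allow partial credit if a program wanted amber-style leniency.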
5. Peak Readiness
• Peak target transaction numbers are defined early (usually 200–250% of the last Peak).
• Each product team analyzes the prior Peak numbers and sets the targets for individual applications.
• Product teams then run individual load tests on their applications throughout the year, usually after releases.
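The target-setting rule is simple arithmetic: multiply each prior-Peak number by 2.0 to 2.5. A minimal sketch follows. The `peak_targets` helper and the sample metrics are hypothetical and are not Gap Inc. figures.

```python
def peak_targets(last_peak: dict, low: float = 2.0, high: float = 2.5) -> dict:
    """Map each metric from the last Peak to a (low, high) target range.

    low/high default to 2.0 and 2.5, i.e. the "200-250% of last Peak" rule of thumb.
    """
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    return {name: (value * low, value * high) for name, value in last_peak.items()}


if __name__ == "__main__":
    # Hypothetical prior-Peak numbers; not Gap Inc. data.
    last_peak = {"orders_per_min": 1000, "checkout_tps": 400}
    for metric, (lo, hi) in peak_targets(last_peak).items():
        print(f"{metric}: {lo:.0f} to {hi:.0f}")
```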
6. Capacity Engineering
• SRE teams help individual application teams with the capacity engineering practice
• Help convert the business forecast into transactions per second (TPS)
• Consult with product teams when an application faces performance bottlenecks
• Ensure each product team has its application ready for full Peak capacity before the full integrated load tests
• Define service dependencies and the scaling ratios required for those dependencies
• Define and document the capacity of a single application unit (container/VM) and calculate capacity requirements accordingly
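The forecast-to-capacity steps above can be sketched as three small functions. This is an illustration under stated assumptions, not Gap Inc.'s model. The function names, the calls-per-order factor, and the 70% target utilization per unit are hypothetical. Per-unit capacity (`tps_per_unit`) is assumed to come from an individual load test of one container or VM.

```python
import math


def forecast_to_tps(orders_per_hour: float, calls_per_order: float) -> float:
    """Convert a business forecast (orders per peak hour) into requests per second for one app."""
    return orders_per_hour * calls_per_order / 3600.0


def units_required(tps: float, tps_per_unit: float, target_utilization: float = 0.7) -> int:
    """Containers/VMs needed so each unit runs at or below target_utilization of its measured capacity."""
    if tps_per_unit <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("tps_per_unit must be > 0 and target_utilization in (0, 1]")
    return math.ceil(tps / (tps_per_unit * target_utilization))


def dependency_tps(upstream_tps: float, ratios: dict) -> dict:
    """Apply service dependency scaling ratios (downstream calls per upstream request)."""
    return {service: upstream_tps * ratio for service, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical inputs: 36,000 orders/hour, 5 calls per order, 10 TPS per container.
    tps = forecast_to_tps(36000, 5)
    print(tps, units_required(tps, 10.0), dependency_tps(tps, {"inventory": 2.0, "tax": 1.0}))
```

Planning each unit at 70% of its measured capacity leaves headroom for uneven load balancing and for the loss of a unit during failover. The dependency ratios show how a single front-end target propagates into targets for every downstream service.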
7. Integrated Load Testing
• Conduct full integrated load testing using a third-party cloud tool to simulate production traffic
• Calculate the expected CDN leaks and generate the required transactions per second against Gap Inc. infrastructure
• The integrated load test exercises all components running at full Peak load
• A third-party cloud-based product was used to generate load (JMeter)
• Page ratios mirror production ratios
• Tests go all the way down to synthetic order placement
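The sketch below shows how a CDN leak estimate and production page ratios could turn an edge traffic forecast into per-page load at origin. It assumes "CDN leak" means the share of requests the CDN does not serve from cache and forwards to origin. The hit ratio, page names, and weights are hypothetical inputs, not figures from the deck.

```python
def origin_tps(edge_tps: float, cdn_hit_ratio: float) -> float:
    """Requests per second that leak past the CDN to origin (cache misses and pass-through)."""
    if not 0.0 <= cdn_hit_ratio <= 1.0:
        raise ValueError("cdn_hit_ratio must be between 0 and 1")
    return edge_tps * (1.0 - cdn_hit_ratio)


def page_mix(total_tps: float, page_ratios: dict) -> dict:
    """Split a TPS target across page types using production-like weights (normalized to sum to 1)."""
    total_weight = sum(page_ratios.values())
    if total_weight <= 0:
        raise ValueError("page ratios must sum to a positive number")
    return {page: total_tps * weight / total_weight for page, weight in page_ratios.items()}


if __name__ == "__main__":
    # Hypothetical: 10,000 TPS at the edge, 75% served from CDN cache.
    leaked = origin_tps(10000, 0.75)
    print(page_mix(leaked, {"home": 2, "category": 3, "product": 4, "checkout": 1}))
```

The per-page numbers are what a load tool would then be configured to drive at origin, so the test matches production ratios rather than hammering a single page.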
8. Load Testing Data Analysis
Product/App-level load testing:
1) Analyze load test data using observability metrics
2) Use APM and other infrastructure metrics to find app bottlenecks
3) Identify any backend issues surfaced by the app load test
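As one example of the first step, the sketch below summarizes a load-test run by nearest-rank p95/p99 latency and error rate, then checks both against budgets. The thresholds and the `summarize` helper are hypothetical. APM tools may use a different percentile definition, so their numbers will not always match exactly.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs stay exact (e.g. 95 * 100 / 100 == 95.0).
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms, error_count, p95_slo_ms, max_error_rate):
    """Summarize one run; error_count is the number of failed requests among the samples."""
    p95 = percentile(latencies_ms, 95)
    p99 = percentile(latencies_ms, 99)
    error_rate = error_count / len(latencies_ms)
    return {
        "p95_ms": p95,
        "p99_ms": p99,
        "error_rate": error_rate,
        "pass": p95 <= p95_slo_ms and error_rate <= max_error_rate,
    }
```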
9. High Availability & Failover Testing
• Product teams (infrastructure/application) conduct regular failover testing
• Critical components and applications are failed over
• Tested for seamless failover: no order drops, applications reconnect, and systems recover from connection storms
• Critical components such as DBs, distributed DB nodes, load balancers, and firewalls are failed over during load tests
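One common way to let clients recover from a failover without causing a connection storm is to reconnect with exponential backoff and full jitter. The following is a minimal sketch. `connect_with_backoff` is a hypothetical helper, and treating Python's built-in `ConnectionError` as the retryable failure is an assumption, since real database and HTTP clients raise their own exception types.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call connect() until it succeeds, sleeping a jittered, exponentially growing delay between tries.

    Full jitter (a uniform random delay up to the current cap) spreads reconnects from
    many clients over time instead of having them all retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Passing `sleep` and `rng` in as parameters makes the retry schedule testable without real waits.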
10. Peak Execution
• Teams from India and the US work through the long weekend for a successful Peak
• Peak execution plan (sample execution plan)
• Constant checkpoints
• Support online
• Multiple all-day Zoom calls per product team, plus a main call
• Direct connectivity to all support teams
• Product teams provide an all-day coverage schedule for four days
• SRE teams run calls for four days, monitoring all systems online
• Coordinated, orchestrated, and choreographed by Technical Program Managers and SRE
11. Pipeline Dashboard
[Figure: Pipeline dashboard for Order Management, split into North America and International. Panels cover customer channels by brand (GP, ON, BR, AT; GPF, ONF, BRF) and region (US, CA, EU, UK, JP), services, backends, storage, databases, firewall, tools, tax service, inventory, fulfillment, invoice, mobile, user DB, and Locations 1–3, with order counts in orders per minute, per hour, and per day.]
12. Where do we go from here
• Using Automation for recovery of issues
• Using ML for analyzing Observability Data
• Product teams are Peak ready 365 days a year
• Being able to do individual product load tests as part of CI/CD
• Automation of scale up and scale down
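For the last item, an automated scaling rule could follow the one used by the Kubernetes Horizontal Pod Autoscaler. That rule scales replicas in proportion to how far a metric is from its target and uses a tolerance band to avoid flapping. This is a sketch of that well-known formula, not Gap Inc.'s planned automation. The replica bounds are hypothetical defaults.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 2,
    max_replicas: int = 50,
    tolerance: float = 0.1,
) -> int:
    """HPA-style rule: ceil(current_replicas * current_metric / target_metric), clamped to bounds.

    Ratios within `tolerance` of 1.0 leave the replica count unchanged to avoid flapping.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need current_replicas >= 1 and target_metric > 0")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

The same function covers both scale up and scale down. The min bound keeps a floor for availability, and the max bound caps spend and protects downstream dependencies.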