2. Xoom.com
• Digital remittance
• Founded in 2001
• Acquired bluekite.com in 2014
• Acquired by PayPal in 2016
A little history
6. Xoom.com
• Digital remittance → Highly regulated environment
• Founded in 2001 → 16 years of code and data
• Acquired bluekite.com in 2014 → Polyglot code and persistence
• Acquired by PayPal in 2016 → New rules
A little history (translated)
7. Throwing down the gauntlet
• Decouple teams
• Reduce time to build and deploy
• Understand our resource needs
• Scale appropriately
Break up the monolith(s)
8. Microservices to the rescue
• Programming paradigms and idioms
• Service discovery
• Monitoring
• Performance
• Infrastructure as code
• Build and deployment pipeline
• Data ownership
Challenges and risks
10. Service discovery
• Custom, local, layer seven load balancers
• ZooKeeper back-end
• Apache Curator
• Registration, health checks, and routing
• Service Portal
• Integrating with linkerd.io
The service-proxy solution
[Diagram: ZooKeeper; Host containing Service-proxy, App A, and App B]
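The registration, health check, and routing roles listed on this slide can be sketched roughly as follows. This is a minimal illustration, not Xoom's service-proxy: an in-memory dictionary stands in for the ZooKeeper back-end, and the names `Registry`, `ServiceProxy`, and `Instance` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    host: str
    port: int
    healthy: bool = True  # flipped by a health check in a real system


@dataclass
class Registry:
    """In-memory stand-in for the ZooKeeper back-end (assumption)."""
    services: dict = field(default_factory=dict)

    def register(self, name, host, port):
        # Registration: record an instance under its service name.
        instance = Instance(host, port)
        self.services.setdefault(name, []).append(instance)
        return instance

    def healthy_instances(self, name):
        # Health checks: only instances currently marked healthy are routable.
        return [i for i in self.services.get(name, []) if i.healthy]


class ServiceProxy:
    """Local balancer: callers ask for a service by name, not by address."""

    def __init__(self, registry):
        self.registry = registry
        self._calls = {}  # per-service call count, used for round-robin

    def route(self, name):
        candidates = self.registry.healthy_instances(name)
        if not candidates:
            raise LookupError(f"no healthy instance of {name!r}")
        n = self._calls.get(name, 0)
        self._calls[name] = n + 1
        return candidates[n % len(candidates)]
```

Because the proxy runs on the same host as the apps, an app only needs to know the service name; which instance answers is decided at call time from the registry's current view.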
11. Monitoring
• Define required measurements
    • persistence operations
    • remote calls
    • service endpoints
    • 3rd party service endpoints
• Define metric types
    • gauges
    • counters
    • histograms
• Standard naming scheme
• Self-service dashboards
• Time series explosion
Grafana and InfluxDB
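A rough sketch of the three metric types, a naming scheme, and why the number of time series can explode. The naming layout `<service>.<category>.<operation>.<measurement>` and all function and class names here are assumptions for illustration; the slides do not specify the actual scheme or client library.

```python
import bisect


def metric_name(service, category, operation, measurement):
    """Hypothetical scheme: <service>.<category>.<operation>.<measurement>."""
    parts = (service, category, operation, measurement)
    return ".".join(p.strip().lower().replace(" ", "_") for p in parts)


class Counter:
    """Only goes up, e.g. number of remote calls made."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only increase")
        self.value += amount


class Gauge:
    """Point-in-time value, e.g. open database connections."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; bucket i holds values <= bounds[i]."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is overflow

    def observe(self, value):
        self.counts[bisect.bisect_left(self.bounds, value)] += 1


def series_count(tag_values):
    """Distinct time series = product of distinct values per tag."""
    total = 1
    for values in tag_values.values():
        total *= len(set(values))
    return total
```

The last function shows the explosion: each tag added to a measurement multiplies the number of series the time series database has to store and index.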
12. Performance
• Additional network latency has been offset by:
    • Reduced contention on datastores
    • Limiting the scope of database transactions
    • Optimization through observability
• Throughput has improved dramatically
• Latency distribution is wider
• Latency sensitive APIs are deployed nearby
Throughput and response latency
13. Infrastructure as code
• TDD isn’t just for applications
• Terraform and Packer for host provisioning on AWS and vSphere
• Puppet and Ansible acceptance testing using Beaker
• Network gear
• Standardize app packaging
• Docker
• Contracts for deployment
• Application control plane
14. Build and deploy pipeline
• Git-flow
    • Branch per feature
• Docker-flow
    • Container per branch
• Seed jobs
    • Build job per branch
• Automated and self service deployments
    • Dev and QA teams can choose branches to deploy and test
• Fidelity of environments
    • Environment fidelity ∝ automation success
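One way to picture "container per branch" and "build job per branch" is a mapping from branch names to image tags, which a seed job could expand into one job definition per branch. This is a sketch under assumptions: the functions `image_tag_for_branch` and `build_jobs` and the job dictionary layout are hypothetical, not the pipeline described in the talk. The tag rules used (letters, digits, underscores, periods, and dashes; no leading period or dash; at most 128 characters) are Docker's.

```python
import re

MAX_TAG_LENGTH = 128  # Docker limits a tag to 128 characters


def image_tag_for_branch(branch):
    """Map a Git branch name to a valid Docker image tag."""
    # Replace anything Docker does not allow in a tag (e.g. "/") with "-".
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # A tag may not start with a period or a dash.
    tag = tag.lstrip(".-")
    if not tag:
        raise ValueError(f"branch {branch!r} has no usable tag characters")
    return tag[:MAX_TAG_LENGTH]


def build_jobs(repo, branches):
    """What a seed job might generate: one build job per branch."""
    jobs = []
    for branch in branches:
        tag = image_tag_for_branch(branch)
        jobs.append({
            "job": f"{repo}-{tag}",
            "branch": branch,
            "image": f"{repo}:{tag}",
        })
    return jobs
```

Note that this mapping is lossy: `feature/a` and `feature-a` produce the same tag, so a real pipeline would need a rule for such collisions.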
16. Data ownership
• Hard problem
• Start eliminating cross-domain joins now
• Two years on, we are just now migrating the last auth-server client from tables to APIs
• Analytics becomes more complicated
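Eliminating a cross-domain join means replacing a query that reaches into another team's table with a call to the service that owns it. The sketch below uses an in-memory SQLite database; the `users` and `transfers` tables, the `UserApi` class, and the function names are all hypothetical and only illustrate the shape of the change.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE transfers (id INTEGER PRIMARY KEY, user_id INTEGER,
                            amount_cents INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com');
    INSERT INTO transfers VALUES (10, 1, 2500);
""")


def transfer_with_email_joined(transfer_id):
    """Before: transfer code joins directly against the user domain's table."""
    return db.execute(
        "SELECT t.id, t.amount_cents, u.email FROM transfers t "
        "JOIN users u ON u.id = t.user_id WHERE t.id = ?",
        (transfer_id,),
    ).fetchone()


class UserApi:
    """Stand-in for the owning service; the only code that reads `users`."""

    def get_email(self, user_id):
        row = db.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def transfer_with_email_via_api(transfer_id, user_api):
    """After: read only the transfer table, then ask the owning service."""
    row = db.execute(
        "SELECT id, user_id, amount_cents FROM transfers WHERE id = ?",
        (transfer_id,),
    ).fetchone()
    if row is None:
        return None
    tid, user_id, amount_cents = row
    return (tid, amount_cents, user_api.get_email(user_id))
```

Both functions return the same tuple here, but only the second lets the `users` table move or change shape without breaking transfer code. It also trades one query for a query plus a remote call, which is part of why analytics gets harder.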
17. Current status
• ~100 distinct microservices across 3 production data centers
• Most new features are developed as microservices
• Monoliths still exist, but are being chipped away
18. Lessons learned
• Measure everything, and be prepared to scale your monitoring system
• Application packaging contracts and delivery pipelines are mandatory
• Staff a tooling team for build, test, and deployment automation
• Enroll your network operations team
• The infrastructure and culture we built in order to move to microservices have paid off
• Elimination of the monoliths isn’t that important
Currently US outbound to 56 countries
Several iterations of the technology stack
Bluekite was a digital bill payment service
The usual PCI controls
Developers can never touch production
Regulatory obligations to all 50 states
Regulatory obligations to all 56 countries
With that much history, there are complexities in the codebase that no single developer can easily understand in full
Silos
Difficult to tease apart the tables
Java, Node.js, Ruby, and Go
MySQL vs PostgreSQL, RabbitMQ vs Redis, Logstash vs Splunk
Hybrid cloud. Bluekite was built for AWS, Xoom was built for the physical datacenter.
Aligning with new infosec policies and standards
Quite seamless
Totally different technology stack and approach (Mesos vs k8s)
Mostly hands off
Learning opportunity for us. PayPal has done a lot of work with microservices and all that implies, and has the scars to prove it.
Because we’re polyglot and we’re hybrid cloud, contracts have been vital for us. We gave up the luxury of making a change in one plane and effecting change across the organization.
This has actually cost us in organizational agility in some respects.
You would be amazed how much some clever DBAs and analysts can get done with a single massive relational database.
You want to move a table after 10 years, be prepared to scour the organization for users who depend on it.
It’s been three years
Most of our pay partner integrations are still deployed as a single giant Java app.
Large swaths of our customer facing website are still deployed with our payment processor