Capital One
3/8/2017
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Noriaki (Nori) Tatsumi, Bingchen (Ben) Hu, Anne Cather
Security breaches dominate the news
CYBER TECH
DATA LAKE
Build vs. buy
• Industry tools only meet ~80% of our requirements
• Vendors’ priorities don’t align with ours
• Elasticsearch is an open source solution
• Open source technology is extensible
30+ data sources, 3B events and 6TB data/day
How we got here
Our initial requirements
Scale
• More data
• More processing
• Longer data retention
• More consumers
New features
• Alerts console
• Cyber threat intelligence repository
• And more!
NFRs
• Uptime and DR
• Security
• Compliance
• Data management
The prototype we had
[Diagram: Elasticsearch Data Nodes; Elasticsearch Master Nodes; Elasticsearch Client Node; Kibana Fork w/ SSO Integration; AD SSO]
MORE REQUIREMENTS,
DELIVERY DATES,
BIGGER TEAMS
=
HIGHER COMPLEXITY
Monolith
• Work in parallel
• Do one scope of things well
• Easy to understand and maintain
• Technology stack choice for features and teams
• Quicker, smaller, & independent deploys
• Fault isolation
What we wanted
MICROSERVICES
No SSO Integration!
Embracing microservices
[Diagram: Elasticsearch Data Nodes; Elasticsearch Master Nodes; Elasticsearch Client Node; Kibana Fork w/ SSO Integration; AD SSO; Alerts-API; Alerts-UI; CTI Repo]
• A well known entry point to the system
• Security
• Dynamic routing
• Resiliency
• Latency and fault tolerance
• Monitoring and stats collection
Edge gateway
Align same qualities to downstream services
• Spring Boot for developer productivity
• JVM-based for production supportability
• Netflix OSS that’s proven microservices technology
Spring Cloud
Foundation for our web microservices
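import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

@SpringBootApplication
@EnableAutoConfiguration
@EnableZuulProxy // turns this application into a Zuul reverse proxy
public class EdgeGateway {
    public static void main(String[] args) throws Exception {
        SpringApplication.run(EdgeGateway.class, args);
    }
}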
Getting started with Netflix Zuul is easy
zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.url=https://172.20.10.15:5601
Routing with Zuul
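To make the routing rule concrete, here is a minimal sketch of how a path pattern such as /kibana/** could map an incoming request to a downstream URL. It is a simplified stand-in written for illustration, not Zuul's implementation: the RouteTable class and its methods are hypothetical, only '/prefix/**' patterns are handled, and stripping the matched prefix before forwarding is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified prefix router; not Zuul's actual code. */
final class RouteTable {
    // Route prefix (the pattern without its trailing "/**") mapped to the target base URL.
    private final Map<String, String> routes = new LinkedHashMap<>();

    void add(String pathPattern, String targetUrl) {
        if (!pathPattern.endsWith("/**")) {
            throw new IllegalArgumentException("this sketch only supports '/prefix/**' patterns");
        }
        routes.put(pathPattern.substring(0, pathPattern.length() - 3), targetUrl);
    }

    /** Returns the downstream URL for a request path, or empty if no route matches. */
    Optional<String> resolve(String requestPath) {
        for (Map.Entry<String, String> route : routes.entrySet()) {
            String prefix = route.getKey();
            if (requestPath.equals(prefix) || requestPath.startsWith(prefix + "/")) {
                // Assumption: the matched prefix is dropped before forwarding.
                return Optional.of(route.getValue() + requestPath.substring(prefix.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RouteTable table = new RouteTable();
        table.add("/kibana/**", "https://172.20.10.15:5601");
        System.out.println(table.resolve("/kibana/app/kibana"));
        System.out.println(table.resolve("/alerts"));
    }
}
```

Routes are checked in insertion order, so more specific prefixes would need to be added first in this simplified version.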
[Diagram: two Elasticsearch Client Node + Kibana pairs]
Zuul: the edge gateway
[Diagram: Elasticsearch Data Nodes; Elasticsearch Master Nodes; Edge Gateway; Elasticsearch Client Node; Kibana; AD SSO; Alerts API; Alerts UI; Reports UI; CyberTech Reports Repo; Auth]
Asking engineers to maintain IP addresses
• Use cases
• Service connection information lookup
• Automated configuration of load balancing and failover
• Alternatives to Eureka with Spring Cloud
• HashiCorp Consul
• Apache Zookeeper
Discover service
Automate orchestration with Netflix Eureka
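The two use cases above (connection lookup, plus load balancing and failover) can be illustrated with a small in-memory registry. This is a sketch under stated assumptions, not Eureka's or Ribbon's implementation: the ServiceRegistry class, its Instance type, and the round-robin policy are all hypothetical names and choices made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory registry: lookup by service id with round-robin failover. */
final class ServiceRegistry {
    static final class Instance {
        final String hostPort;
        volatile boolean up = true; // flipped to false when the instance is considered DOWN
        Instance(String hostPort) { this.hostPort = hostPort; }
    }

    private final Map<String, List<Instance>> instancesByService = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    /** Registers an instance under a service id and returns its handle. */
    Instance register(String serviceId, String hostPort) {
        Instance instance = new Instance(hostPort);
        instancesByService.computeIfAbsent(serviceId, id -> new CopyOnWriteArrayList<>()).add(instance);
        return instance;
    }

    /** Picks the next UP instance in round-robin order, or empty if none is UP. */
    Optional<String> next(String serviceId) {
        List<Instance> instances = instancesByService.getOrDefault(serviceId, List.of());
        AtomicInteger cursor = cursors.computeIfAbsent(serviceId, id -> new AtomicInteger());
        for (int attempt = 0; attempt < instances.size(); attempt++) {
            Instance candidate = instances.get(Math.floorMod(cursor.getAndIncrement(), instances.size()));
            if (candidate.up) {
                return Optional.of(candidate.hostPort);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("kibana", "172.20.10.11:5601");
        Instance second = registry.register("kibana", "172.20.10.12:5601");
        System.out.println(registry.next("kibana"));
        second.up = false; // simulate a failed instance
        System.out.println(registry.next("kibana"));
        System.out.println(registry.next("kibana"));
    }
}
```

Callers only ever ask for a service id, so no engineer has to maintain the address list by hand.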
<application>
  <name>...</name>
  <instance>
    <instanceId>...</instanceId>
    <hostName>...</hostName>
    <app>...</app>
    <ipAddr>...</ipAddr>
    <status>UP</status>
    <overriddenstatus>UNKNOWN</overriddenstatus>
    <port enabled="false">...</port>
    <securePort enabled="true">...</securePort>
    <countryId>1</countryId>
    <dataCenterInfo class="com.netflix.appinfo.AmazonInfo">
      <name>Amazon</name>
      <metadata>
        <accountId>...</accountId>
        <local-hostname>...</local-hostname>
        <instance-id>...</instance-id>
        <local-ipv4>...</local-ipv4>
        <instance-type>...</instance-type>
        <vpc-id>...</vpc-id>
        <ami-id>...</ami-id>
        <mac>...</mac>
        <availability-zone>...</availability-zone>
      </metadata>
    </dataCenterInfo>
    <leaseInfo>
      <renewalIntervalInSecs>...</renewalIntervalInSecs>
      <durationInSecs>...</durationInSecs>
      …..
zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.serviceId=kibana
kibana.ribbon.listOfServers=172.20.10.11:5601,172.20.10.12:5601,172.20.10.13:5601,172.20.10.14:5601
ribbon.eureka.enabled=false
Routing with Zuul without Eureka
zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.serviceId=kibana
Routing with Zuul with Eureka
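import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@SpringBootApplication
@EnableDiscoveryClient // registers this application with the discovery service
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}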
Making Spring Boot app discoverable with Eureka
• Eureka Client (Java)
• Eureka-js-client (JavaScript)
• Eureka REST API (Polyglot)
• *Sidecar/App gateway (Polyglot)
Making any app discoverable with Eureka
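For the polyglot REST option, a non-Java service (or a sidecar acting for it) keeps its registration alive by sending periodic heartbeats over HTTP. The sketch below only builds the requests and does not send them. The EurekaRest class is a hypothetical helper; the base URL, application id, and instance id in the example are placeholders; and the path layout (apps/{appId}/{instanceId} under the server's base path) follows the Eureka REST operations as commonly documented, so check it against the server version in use.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) Eureka REST requests. Requires Java 11+. */
final class EurekaRest {
    /** Heartbeat: PUT {baseUrl}/apps/{appId}/{instanceId}. */
    static HttpRequest heartbeat(String baseUrl, String appId, String instanceId) {
        return HttpRequest.newBuilder(instanceUri(baseUrl, appId, instanceId))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** De-registration: DELETE {baseUrl}/apps/{appId}/{instanceId}. */
    static HttpRequest deregister(String baseUrl, String appId, String instanceId) {
        return HttpRequest.newBuilder(instanceUri(baseUrl, appId, instanceId))
                .DELETE()
                .build();
    }

    private static URI instanceUri(String baseUrl, String appId, String instanceId) {
        return URI.create(baseUrl + "/apps/" + appId + "/" + instanceId);
    }

    public static void main(String[] args) {
        // Placeholder values; a real client would read these from its own configuration.
        HttpRequest request = heartbeat("http://localhost:8761/eureka", "KIBANA", "i-123");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A real client would send the heartbeat on a timer, at an interval shorter than the lease duration advertised in its registration.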
Solving the configuration nightmare
[Diagram: Edge Gateway with AD SSO; route /kibana; three stacks of Kibana Gateway, Elasticsearch Client Node, and Kibana; Eureka Discovery Service; Alerts-UI; CyberTech Reports UI; Alerts-API; CyberTech Reports API; Elasticsearch Data Nodes; Elasticsearch Master Nodes]
Multi-config Kibanas
[Diagram: Edge Gateway with AD SSO; routes /kibana and /kibana-admin; Authorization Service; four stacks of Kibana Gateway, Elasticsearch Client Node, and Kibana, one with Kibana (Console Off) and one with Kibana (Console On); Elasticsearch Data Nodes; Elasticsearch Master Nodes]
Protected Elasticsearch gate
[Diagram: Edge Gateway with AD SSO; routes /kibana-admin, /kibana, and /esclient; Authorization Service; stacks of Kibana, Elasticsearch Client Node, Elasticsearch Gateway, and Kibana Gateway, including Kibana (Console OFF) behind a Kibana Gateway and Kibana (Console ON) behind a Kibana-Admin Gateway; Elasticsearch Data Nodes; Elasticsearch Master Nodes]
Spring Boot Admin for Spring Cloud microservices
https://github.com/codecentric/spring-boot-admin
Distributed tracing with Spring Cloud Sleuth
https://cloud.spring.io/spring-cloud-sleuth/
• Successes
• Short circuited
• Thread timeouts
• Thread-pool rejections
• Failures/exceptions
• Error percentage
(Rolling 10 second counters)
Circuit breaker monitoring
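The error percentage on such a dashboard comes from counters kept over a rolling window. The following is a generic sketch of a 10-second rolling counter with one bucket per second. It is not the implementation of any particular circuit breaker library; the RollingErrorRate class is hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Arrays;

/** Hypothetical rolling-window counter: one bucket per second, stale buckets are reused. */
final class RollingErrorRate {
    private final int buckets;
    private final long[] successes;
    private final long[] failures;
    private final long[] bucketSecond; // epoch second each slot currently represents

    RollingErrorRate(int windowSeconds) {
        this.buckets = windowSeconds;
        this.successes = new long[windowSeconds];
        this.failures = new long[windowSeconds];
        this.bucketSecond = new long[windowSeconds];
        Arrays.fill(bucketSecond, Long.MIN_VALUE); // marks every slot as unused
    }

    /** Records one call outcome at the given time. */
    void record(long epochSecond, boolean success) {
        int slot = (int) Math.floorMod(epochSecond, (long) buckets);
        if (bucketSecond[slot] != epochSecond) {
            // The slot holds data from an older second: reset it before reuse.
            bucketSecond[slot] = epochSecond;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    /** Error percentage (0-100) over the window ending at nowEpochSecond; 0 if there were no calls. */
    int errorPercentage(long nowEpochSecond) {
        long ok = 0;
        long bad = 0;
        for (int i = 0; i < buckets; i++) {
            boolean inWindow = bucketSecond[i] > nowEpochSecond - buckets && bucketSecond[i] <= nowEpochSecond;
            if (inWindow) {
                ok += successes[i];
                bad += failures[i];
            }
        }
        long total = ok + bad;
        return total == 0 ? 0 : (int) (100 * bad / total);
    }

    public static void main(String[] args) {
        RollingErrorRate rate = new RollingErrorRate(10);
        rate.record(100, true);
        rate.record(100, true);
        rate.record(100, true);
        rate.record(101, false);
        System.out.println(rate.errorPercentage(101) + "% errors");
    }
}
```

A circuit breaker would typically open once this percentage crosses a configured threshold, which is why the dashboard shows it next to the short-circuited count.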
Crushed it!
Elasticsearch
Kibana
Product delivered and released on time
MICROSERVICES
=
PROFIT!
ELASTICSEARCH OPERATIONS
Cluster on fire!
• Stability issues from end user queries
• Data ingestion latency problems
• Insufficient monitoring
Finding the causes
• Inconsistent OS, JVM, and Elasticsearch configurations across cluster
• No circuit breakers
• Elasticsearch index templates were missing
• Shards improperly sized
• Incorrect field mappings
• Improper cluster sizing
DEV + OPS
CONTINUOUS DELIVERY
=
REQUIREMENT
Configuration management
+
Automation
Hello
Hardware Playbook
• Spin up AWS infrastructure
• Tag for purpose
• Configure subnet, security group, VPC, etc.
Software Playbook
• Install common dependencies
• AWS tags determine software
• Deploy latest artifacts per environment
Ansible deployment breakdown
Hardware playbook example
roles:
  - role: servers
    instances:
      - name: Elasticsearch_Master
        instance_type: m4.2xlarge
        number_of_instances: 3
      - name: Elasticsearch_Data
        instance_type: m4.4xlarge
        number_of_instances: 100
        additional_volume_sizes: [1000, 1000, 1000]
- hosts: tag_{{ ansible_ec2_tag }}_Elasticsearch_Data
  become: true
  roles:
    - role: elasticsearch
      es_heap_size: '{{ [(ansible_memtotal_mb / 1024) / 2, 16] | min | int }}g'
      es_plugins:
        - '{{ es_plugin_license }}'
        - '{{ es_plugin_marvel_agent }}'
        - '{{ es_plugin_cloud_aws }}'
      es_config:
        cluster.name: '{{ elasticsearch_cluster_name }}'
        node.name: '{{ ansible_default_ipv4.address }}'
        node.master: false
        node.data: true
        indices.fielddata.cache.size: 10%
        indices.breaker.fielddata.limit: 15%
        indices.breaker.request.limit: 15%
        indices.breaker.total.limit: 30%
        network.breaker.inflight_requests.limit: 75%
Software playbook example
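The es_heap_size expression in the playbook takes half of the host's memory in GB, caps it at 16, and truncates the result to a whole number before appending "g". The same arithmetic, written out as a small Java sketch (the HeapSize class and method name are hypothetical, and the sample memory values are illustrative):

```java
/** Hypothetical helper mirroring the playbook's es_heap_size arithmetic. */
final class HeapSize {
    /** Half of total memory in GB, capped at 16 and truncated, with a "g" suffix. */
    static String esHeapSize(long memTotalMb) {
        double halfRamGb = (memTotalMb / 1024.0) / 2.0;
        int heapGb = (int) Math.min(halfRamGb, 16.0); // the cast truncates, like the int filter
        return heapGb + "g";
    }

    public static void main(String[] args) {
        System.out.println(esHeapSize(65536)); // 64 GB host: half is 32, capped to 16g
        System.out.println(esHeapSize(8192));  // 8 GB host: 4g
        System.out.println(esHeapSize(7500));  // about 7.3 GB host: 3.66 truncates to 3g
    }
}
```

Because the value is derived from each host's own memory, the same playbook can size the heap correctly on different instance types.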
./hardware-playbook.yml --extra-vars @dev-vars.yml
./software-playbook.yml --extra-vars @dev-vars.yml
How to use
Monitor everything!
Don’t run a black box
• Cloud metrics
• Server metrics
• JVM metrics (even built our own JVM agent)
• Application metrics
• …
What we should monitor
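As an illustration of the JVM metrics item, the standard java.lang.management API already exposes heap, thread, uptime, and garbage collection figures. The sketch below takes one snapshot of them. It is not the JVM agent mentioned above; the JvmMetrics class and the metric names are hypothetical choices for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical collector that snapshots a few JVM metrics via the platform MXBeans. */
final class JvmMetrics {
    static Map<String, Long> snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("jvm.heap.used.bytes", heap.getUsed());
        metrics.put("jvm.heap.committed.bytes", heap.getCommitted());
        metrics.put("jvm.threads.live", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("jvm.uptime.ms", ManagementFactory.getRuntimeMXBean().getUptime());

        long gcCollections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined for a collector.
            gcCollections += Math.max(gc.getCollectionCount(), 0);
        }
        metrics.put("jvm.gc.collections", gcCollections);
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Shipping such a snapshot on a fixed interval to a time-series store is one way to get these figures onto a dashboard.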
Time-series dashboards with Grafana
ANOTHER SERVICE?
Metrics cluster integration
[Diagram: Edge Gateway with AD SSO; routes /metrics, /kibana, and /esclient; stacks of Kibana, Elasticsearch Client Node, Elasticsearch Gateway, and Kibana Gateway, plus one stack behind a Kibana-Metrics Gateway; Elasticsearch CyberLake Nodes; Elasticsearch Metrics Cluster; Eureka Discovery Service; data flows labelled "ES query data" and "Service Availability Data"]
PLATFORM STABILITY
TAKEAWAYS
• Microservices architecture works for us
• Increase velocity and reduce maintenance effort
• Elastic stack can integrate easily
• Continuous Delivery must be a requirement
• Monitor everything!
Takeaways
MICROSERVICES
+
CONTINUOUS DELIVERY
=
PROFIT!
More Questions?
Visit us at the AMA
