3. Etsy is the global marketplace for unique and creative goods. It’s home to a universe of special, extraordinary items, from unique handcrafted pieces to vintage treasures.
7. Why migrate Etsy’s logging system? A few reasons...
● Etsy was migrating entirely to Google Cloud
● Elasticsearch is a complex system that requires specialized knowledge (especially in a logging use case)
● Elasticsearch 2.4 was old and unmaintained (EOL date was 02/2018)
8. Why migrate Etsy’s logging system? A few reasons...
● Alert fatigue for the whole team
● Maintaining Elasticsearch infra is NOT observability
● Data center shutdown
9. Key considerations
● Business as usual: migration must not impact developers’ day-to-day work
● Time: migration must be time efficient (data center shutdown)
● Reduce toil: migration must reduce the infrastructure management burden on the team
10. Process Options
1. Move all logs to Elasticsearch Service on Elastic Cloud
2. Move only critical logs to Elasticsearch Service on Elastic Cloud
3. Move to our Google Cloud infrastructure using ECE (Elastic Cloud Enterprise)
4. Move to our Google Cloud infrastructure manually
11. Alternatives
● Splunk
● Stackdriver
● <name logging solution>
Considerations:
● These solutions would be too intrusive for developers
● We didn’t want to throw away the Elasticsearch knowledge we built over the years
● Not enough time to prototype and roll out a change that big
12. Challenges
● Move stack from 2.4 to 7.x
○ Logstash 2.x can’t talk to Elasticsearch > 6.x
○ Identify and replace deprecated settings in Elasticsearch
○ Learn new features
○ Deploy changes safely
● Keep two systems running in parallel for some time
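As a rough illustration of the "identify and replace deprecated settings" step, the sketch below scans a flat `elasticsearch.yml`-style file for a handful of setting names that changed between 2.x and 7.x. The lookup table is a small, incomplete sample chosen for illustration, and `find_deprecated` is a hypothetical helper, not part of any Elastic tooling; the official breaking-changes notes remain the authority.

```python
# Illustrative and incomplete: a few 2.x-era settings that changed by 7.x.
# Verify every entry against the official breaking-changes documentation.
CHANGED_SETTINGS = {
    "bootstrap.mlockall": "renamed to bootstrap.memory_lock",
    "discovery.zen.ping.unicast.hosts": "replaced by discovery.seed_hosts",
    "discovery.zen.minimum_master_nodes": "ignored in 7.x",
}


def find_deprecated(config_text):
    """Return (line_number, setting, note) for each flagged setting.

    Only handles flat 'dotted.key: value' lines; nested YAML is not parsed.
    """
    findings = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blanks, comments, and lines that are not key/value pairs.
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        key = stripped.split(":", 1)[0].strip()
        if key in CHANGED_SETTINGS:
            findings.append((lineno, key, CHANGED_SETTINGS[key]))
    return findings


if __name__ == "__main__":
    sample = (
        "cluster.name: logs\n"
        "# legacy settings carried over from 2.4\n"
        "bootstrap.mlockall: true\n"
        "discovery.zen.minimum_master_nodes: 2\n"
    )
    for lineno, key, note in find_deprecated(sample):
        print(f"line {lineno}: {key} ({note})")
```

A check like this is only a first pass; running the target version in a test environment and reading its deprecation logs is the more reliable way to catch what a static list misses.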
13. Migration Timeline
● 03/2018: Gathered cluster size and wrote first options draft
● 12/2018: Finalized options
● 02/2019: Prepared migration plan
● 03/2019: Contract signed!
● 06/2019: Dev data migrated
● 10/2019: Prod data migrated; Beta testing started!
● 01/2020: Users fully migrated to the new setup
14. Migration Successes
● Met our deadline
● Elastic support and consultants are helpful
● Happy developers
● Returning teams
15. Migration Successes
● Better observability into the stack
● Easier and safer management of indices and Logstash pipelines
● Creating, growing, and shrinking clusters is much easier
● Better isolation of the stream of data
16. What we wish we had known
● Sizing an ES cluster is an art
○ One needs to consider volume AND throughput
● Noisy neighbors
● Support SLAs are not ideal when developing
○ Initial response on SEV3 is 1 business day
● Elastic Cloud is not just an endpoint
○ We are still responsible for indices management
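To make the "volume AND throughput" point concrete, here is a minimal back-of-the-envelope sketch. Every number and name in it (`Workload`, the per-node disk size, the per-node indexing rate, the 75% disk fill limit) is a hypothetical assumption for illustration, not a figure from Etsy's cluster or a recommendation from Elastic. It sizes the cluster twice, once for storage and once for indexing rate, and takes the larger answer.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    daily_ingest_gb: float      # log volume indexed per day (hypothetical)
    retention_days: int         # how long indices are kept
    replicas: int               # replica copies per primary shard
    peak_events_per_sec: float  # peak indexing rate to absorb


def nodes_for_volume(w, disk_per_node_gb, max_disk_fill=0.75):
    """Data nodes needed to hold all retained data, primaries plus replicas."""
    total_gb = w.daily_ingest_gb * w.retention_days * (1 + w.replicas)
    usable_gb = disk_per_node_gb * max_disk_fill
    return math.ceil(total_gb / usable_gb)


def nodes_for_throughput(w, events_per_sec_per_node):
    """Data nodes needed to keep up with peak indexing.

    Replicas are indexed too, so they multiply the write load.
    """
    total_rate = w.peak_events_per_sec * (1 + w.replicas)
    return math.ceil(total_rate / events_per_sec_per_node)


def nodes_needed(w, disk_per_node_gb, events_per_sec_per_node):
    """The cluster must satisfy both constraints, so take the larger."""
    return max(
        nodes_for_volume(w, disk_per_node_gb),
        nodes_for_throughput(w, events_per_sec_per_node),
    )


if __name__ == "__main__":
    w = Workload(daily_ingest_gb=500, retention_days=14,
                 replicas=1, peak_events_per_sec=60_000)
    print(nodes_for_volume(w, disk_per_node_gb=2000))
    print(nodes_for_throughput(w, events_per_sec_per_node=10_000))
    print(nodes_needed(w, 2000, 10_000))
```

With these made-up inputs, storage alone asks for fewer nodes than the peak indexing rate does, which is the trap the slide describes: a cluster sized only on volume would fall behind at peak. Real sizing also has to account for shard counts, query load, and headroom, which this sketch ignores.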
17. What’s next?
● Improvements to the logging pipeline
● Analyze use cases and recommend best practices at Etsy