Apache Cassandra at Target - Cassandra Summit 2014
1. Apache Cassandra at Target:
Pioneering NoSQL
in a Big Enterprise
Dan Cundiff (@pmotch)
Target
2. Context
● Target’s API platform
● mostly REST APIs
● e.g. products, locations, inventory, etc.
● consumers inside and outside of Target
● wide variety of providing systems (legacy, built
in-house, SaaS, packages, etc.)
3. Problems we needed to solve
● slow providing systems
● cost-prohibitive to call directly
● unable to scale with increased demand
● need a place to aggregate data from multiple
systems
● some data wasn’t even in a database to
begin with!
4. Barriers with existing tools, part 1
● cost too much
● process for traditional DBs wasn’t a fit
● too few tools/vendors
5. Barriers with existing tools, part 2
● RDBMS isn’t:
○ distributed (multi-tenant)
○ close to Guests (geographic distribution)
○ distributed across our data centers
○ distributed to the cloud!
6. Barriers with existing tools, part 3
● lack of performance control
○ process, not owning it all, flexibility on
changes like indexing, etc.
● availability
○ systems before had outages, downtime,
etc.
● not automate-able
8. Taking the idea back
● I just went and talked to Pete and we
decided to do it!
● tried other things in the past
● show results by trying; succeed or fail fast
9. Reasons trying was attractive, part 1
● fit 80% of our need
● years in development
● rich C* dev ecosystem
10. Reasons trying was attractive, part 2
● Google-able
● strong community
● a company who would support it
11. Reasons trying was attractive, part 3
● Chef-able
● aligned well with existing investments
● simple pricing model
12. Barriers to adoption
● enterprise IT; the nature of it
● selling it
● NoSQL for the first time
● automation (was happening at the time;
scary to do)
● political
13. Challenges integrating
● bulk loading data
● keeping Cassandra in sync
● many systems not event driven
● packaged software
● limited ways to integrate with providing
systems
14. Challenges of standing it up, part 1
● early distributed system (new to teams)
● needed local disk (always used SAN before)
● needed SSDs (always used spinning things)
● existing config conflicts (backups,
monitoring, RAID, swap, etc.)
● use right-sized servers (don’t settle for what
your infra friends give you by default)
15. Challenges of standing it up, part 2
● full stack ownership
● it’s new, don’t hand it off
● support response is quick because we own it
● you’re closest to the problem; you’re best
suited to solve it
● tuned to meet the needs of our APIs
● data is modeled for API performance gains
16. Challenges of standing it up, part 3
● skills supply is low (but getting better)
● train your people
● be wary of promises from consultants
○ grill them on what they claim to know
17. Challenges of development, part 1
● skills ramp-up (data modeling, DataStax
driver, etc.)
● developers need to care
○ encourage tweaking, research, make
things better
○ clients are equally important for getting the
most out of C*
18. Challenges of development, part 2
● mind shift from RDBMS
● started with Astyanax; switched to DataStax
driver
○ DataStax supported
○ newer features
19. Ops challenges, part 1
● lots of machines; don’t config by hand
● wrote Chef cookbooks
● support people saw these odd servers and
turned on things we disabled (like swap)
● can’t use “legacy” testing; Cassandra works
differently; chaos stuff (turn off gossip, Thrift,
etc.)
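The chaos-style testing mentioned above (turning off gossip or Thrift on a node) can be sketched as a small plan generator. This is only an illustration, not the tooling Target used: the `FAULTS` table, the `chaos_plan` function, and the hold time are hypothetical names and values. The `nodetool` subcommands shown exist in Cassandra releases of that era (`disablethrift` and `enablethrift` were removed in later versions). The sketch only builds the command strings; running them is left to the operator.

```python
import shlex

# Hypothetical fault catalogue: each entry pairs the nodetool subcommand that
# injects a fault with the subcommand that undoes it.
FAULTS = {
    "gossip": ("disablegossip", "enablegossip"),
    "thrift": ("disablethrift", "enablethrift"),
}


def chaos_plan(host, fault, hold_seconds=60):
    """Return the shell commands for one inject / wait / restore cycle."""
    inject, restore = FAULTS[fault]
    target = shlex.quote(host)  # guard against odd characters in host names
    return [
        f"nodetool -h {target} {inject}",
        f"sleep {int(hold_seconds)}",
        f"nodetool -h {target} {restore}",
    ]


if __name__ == "__main__":
    for command in chaos_plan("10.0.0.5", "gossip", hold_seconds=30):
        print(command)
```

Keeping the restore command in the same plan as the inject command makes it harder to leave a node with gossip disabled after a test run.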
20. Ops challenges, part 2
● made logging awesome; we can see
anything
● used the C* JMX interface to send data in real time
to Splunk
● can correlate these events with the app tier
(because app logs are in Splunk too!)
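One way to make JMX samples easy to correlate in Splunk is to emit them as timestamped key=value log lines, which Splunk can extract into fields at search time. The sketch below assumes the metric value has already been read from JMX by some collector; the function name `to_splunk_line`, the field names, and the host name are hypothetical, and the MBean name is only an example of the Cassandra metrics naming pattern.

```python
from datetime import datetime, timezone


def to_splunk_line(host, mbean, attribute, value, ts=None):
    """Format one JMX sample as a key=value log line.

    The MBean name is quoted because it contains '=' and ',' characters.
    """
    ts = ts or datetime.now(timezone.utc)
    stamp = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'{stamp} host={host} mbean="{mbean}" attribute={attribute} value={value}'


if __name__ == "__main__":
    line = to_splunk_line(
        host="cass01",  # hypothetical node name
        mbean="org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate",
        attribute="Value",
        value=0.93,
    )
    print(line)
```

Using the same timestamp format and `host` field as the application logs is what makes it practical to line up a Cassandra event with an app-tier event in a single search.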
22. Ops challenges, part 4
● more useful MBeans:
○ SSTables per read
○ tombstones
○ cache hits and ratios
○ misbehaving queries (range slice)
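As a rough illustration of how the metrics listed above might be turned into alerts, the sketch below checks one sample against a set of limits. Everything here is an assumption for illustration: the sample keys, the `find_warnings` function, and the threshold values in `LIMITS` are hypothetical and not taken from the talk. Sensible limits depend on the workload and data model.

```python
def hit_ratio(hits, requests):
    """Fraction of cache requests served from cache; 0.0 when there were none."""
    return hits / requests if requests else 0.0


def find_warnings(sample, limits):
    """Return the names of the metrics in `sample` that breach `limits`."""
    flagged = []
    if sample["sstables_per_read"] > limits["max_sstables_per_read"]:
        flagged.append("sstables_per_read")
    if sample["tombstones_per_read"] > limits["max_tombstones_per_read"]:
        flagged.append("tombstones_per_read")
    ratio = hit_ratio(sample["cache_hits"], sample["cache_requests"])
    if ratio < limits["min_cache_hit_ratio"]:
        flagged.append("cache_hit_ratio")
    return flagged


# Hypothetical limits, chosen only to make the example concrete.
LIMITS = {
    "max_sstables_per_read": 4,
    "max_tombstones_per_read": 1000,
    "min_cache_hit_ratio": 0.8,
}

if __name__ == "__main__":
    sample = {
        "sstables_per_read": 6,
        "tombstones_per_read": 10,
        "cache_hits": 90,
        "cache_requests": 100,
    }
    print(find_warnings(sample, LIMITS))
```

Note that an idle cache (zero requests) reports a ratio of 0.0 in this sketch and would be flagged; a real check would probably skip samples with no traffic.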
23. Open source cookbook!
● https://github.com/target/dse-cookbook
● by Danny Parker
● pull requests encouraged
24. Blog post on tuning
● http://target.github.io/infrastructure/tuning-cassandra/
● by Danny Parker (@dcparker88)
25. Results, part 1
● from n00bs to production ready = 2 months!
○ infra, operation testing, app dev, and
deployed!
○ just in time before peak season
● today our highest volume APIs depend on it
26. Results, part 2
● growth (↑ functions + ↑ volume) = ~2000%
● increased adoption of our APIs
● C* unlocking things we couldn't do before
● quick changes possible
○ makes Agile possible
○ gets us close to continuous delivery
27. Results, part 3
● other teams are using it; more coming
● sharing our cookbooks, lessons, etc.
● opened the door to other distributed systems
28. Future, part 1
● use across more of our APIs
● remove remaining spinning disks
29. Future, part 2
● move to cloud
● automate full stack down to infra
○ scale, quick geo-distribute, flexibility to
tweak new infra settings, etc.
30. Future, part 3
● get better at data modeling designs
● less bulk loading
○ remove compaction process overhead
● weave in Spark, Kafka
○ more event-based updates