Apache Cassandra at Target: 
Pioneering NoSQL 
in a Big Enterprise 
Dan Cundiff (@pmotch) 
Target
Context 
● Target’s API platform 
● mostly REST APIs 
● e.g. products, locations, inventory, etc. 
● consumers inside and outside of Target 
● wide variety of providing systems (legacy, in-house 
built, SaaS, packaged software, etc.)
Problems we needed to solve 
● slow providing systems 
● cost prohibitive to call directly 
● unable to scale with increased demand 
● need a place to aggregate data from multiple 
systems 
● some data wasn’t even in a database to 
begin with!
Barriers with existing tools, part 1 
● cost too much 
● process for traditional DBs wasn’t a fit 
● too few tools/vendors
Barriers with existing tools, part 2 
● RDBMS isn’t: 
○ distributed (multi-tenant) 
○ close to Guests (geographic distribution) 
○ distributed across our data centers 
○ distributed to the cloud!
Barriers with existing tools, part 3 
● lack of performance control 
○ process overhead, not owning the full stack, 
no flexibility on changes like indexing, etc. 
● availability 
○ systems before had outages, downtime, 
etc. 
● not automate-able
Discovering the solution
Taking the idea back 
● I just went and talked to Pete and we 
decided to do it! 
● tried other things in the past 
● show results by trying; succeed or fail fast
Reasons trying was attractive, part 1 
● fit 80% of our need 
● years in development 
● rich C* dev ecosystem
Reasons trying was attractive, part 2 
● google-able 
● strong community 
● a company that would support it
Reasons trying was attractive, part 3 
● chef-able 
● aligned well with existing investments 
● simple pricing model
Barriers to adoption 
● enterprise IT; the nature of it 
● selling it 
● NoSQL for the first time 
● automation (was happening at the time; 
scary to do) 
● politics
Challenges integrating 
● bulk loading data 
● keeping Cassandra in sync 
● many systems not event driven 
● packaged software 
● limited ways to integrate with providing 
systems
Challenges of standing it up, part 1 
● early distributed system (new to teams) 
● needed local disk (always used SAN before) 
● needed SSDs (always used spinning things) 
● existing config conflicts (backups, 
monitoring, raid, swap, etc) 
● use right-sized servers (don’t settle for what 
your infra friends give you by default)
Challenges of standing it up, part 2 
● full stack ownership 
● it’s new, don’t hand it off 
● support response is quick because we own it 
● you’re closest to the problem; you’re best 
suited to solve it 
● tuned to meet the needs of our APIs 
● data is modeled for API performance gains
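To make that last point concrete, here is a minimal sketch of query-driven modeling with the DataStax Java driver (2.x-era API). The keyspace, table, and columns are hypothetical, not Target's actual schema: the table is denormalized around one API call so the hot read is a single-partition lookup.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hypothetical sketch of query-driven modeling with the DataStax Java
// driver (2.x-era API): one table per API access pattern, keyed so the
// hot read is a single-partition lookup. All names are illustrative.
public class StoreInventoryModel {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")  // placeholder contact point
                .build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS api WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        // Denormalized around the API call "inventory for product X at
        // store Y": partition key = product_id, clustering column =
        // store_id, so the read never fans out across partitions.
        session.execute("CREATE TABLE IF NOT EXISTS api.store_inventory ("
                + "  product_id text,"
                + "  store_id   text,"
                + "  quantity   int,"
                + "  PRIMARY KEY (product_id, store_id))");

        cluster.close();
    }
}
```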
Challenges of standing it up, part 3 
● skills supply is low (but getting better) 
● train your people 
● be wary of promises from consultants 
○ grill them on what they claim to know
Challenges of development, part 1 
● skills ramp up (data modeling, DataStax 
driver, etc.) 
● developers need to care 
○ encourage tweaking, research, make 
things better 
○ clients are equally important for getting the 
most out of C*
Challenges of development, part 2 
● mind shift from RDBMS 
● started with Astyanax; switched to the 
DataStax driver 
○ DataStax-supported 
○ newer features
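For contrast with Astyanax's Thrift-based column-family API, here is a minimal read with the DataStax Java driver (2.x-era API): CQL over the native protocol with a prepared statement. The contact point, keyspace, and table reuse the hypothetical schema sketched earlier.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Minimal read with the DataStax Java driver (2.x-era API): CQL with a
// prepared statement over the native protocol. Names are illustrative.
public class ProductLookup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")  // placeholder contact point
                .build();
        Session session = cluster.connect("api");  // hypothetical keyspace

        PreparedStatement ps = session.prepare(
                "SELECT quantity FROM store_inventory "
                + "WHERE product_id = ? AND store_id = ?");
        Row row = session.execute(ps.bind("p123", "s456")).one();
        if (row != null) {
            System.out.println("quantity = " + row.getInt("quantity"));
        }
        cluster.close();
    }
}
```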
Ops challenges, part 1 
● lots of machines; don’t config by hand 
● wrote Chef cookbooks 
● support people saw these odd servers and 
turned on things we disabled (like swap) 
● can’t use “legacy” testing; Cassandra works 
differently; chaos stuff (turn off gossip, thrift, 
etc.)
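The chaos-style tests here usually mean `nodetool disablegossip` / `nodetool disablethrift`; under the hood those are JMX operations, so a test harness can drive them directly. A hedged sketch, assuming Cassandra's default JMX port (7199) with no authentication and the C* 1.x/2.x MBean name:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Chaos-style fault injection sketch: disable gossip on one node via JMX,
// the same operation `nodetool disablegossip` invokes. Assumes Cassandra's
// default JMX port (7199) with no auth; MBean name is the C* 1.x/2.x one.
public class DisableGossip {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName(
                    "org.apache.cassandra.db:type=StorageService");
            // Take this node out of the ring without killing the process.
            mbs.invoke(ss, "stopGossiping", null, null);
        } finally {
            jmxc.close();
        }
    }
}
```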
Ops challenges, part 2 
● made logging awesome; we can see 
anything 
● used the C* JMX interface to send data in 
real time to Splunk 
● can correlate these events with the app tier 
(because app logs are in Splunk too!)
Ops challenges, part 3 
● useful MBeans: 
○ heap usage 
○ specific read/write latencies 
○ dropped reads/writes 
○ bloom filter false-positive ratios 
○ column count, size
Ops challenges, part 4 
● more useful MBeans: 
○ SSTables per read 
○ tombstones 
○ cache hits and ratios 
○ misbehaving queries (range slice)
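A hedged sketch of polling a couple of the MBeans above; in practice each value would be written out as a log line for Splunk to index. The metric MBean names follow the Cassandra 1.2+ org.apache.cassandra.metrics naming and can differ between versions:

```java
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of polling a couple of the MBeans listed above; in practice the
// values would be emitted as log lines for Splunk. Metric names follow
// the Cassandra 1.2+ org.apache.cassandra.metrics naming and may vary.
public class MetricsPoller {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

            // Heap usage comes from the standard JVM memory MBean.
            CompositeData heap = (CompositeData) mbs.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            System.out.println("heap_used=" + MemoryUsage.from(heap).getUsed());

            // Mean read latency from the ClientRequest metrics.
            ObjectName readLatency = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,"
                    + "scope=Read,name=Latency");
            System.out.println("read_latency_mean="
                    + mbs.getAttribute(readLatency, "Mean"));
        } finally {
            jmxc.close();
        }
    }
}
```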
Open source cookbook! 
● https://github.com/target/dse-cookbook 
● by Danny Parker 
● pull requests encouraged
Blog post on tuning 
● http://target.github.io/infrastructure/tuning-cassandra/ 
● by Danny Parker (@dcparker88)
Results, part 1 
● from n00bs to production ready = 2 months! 
○ infra, operation testing, app dev, and 
deployed! 
○ just in time before peak season 
● today our highest volume APIs depend on it
Results, part 2 
● growth (↑ functions + ↑ volume) = ~2000% 
● increased adoption of our APIs 
● C* unlocking things we couldn't do before 
● quick changes possible 
○ makes Agile possible 
○ gets us close to continuous delivery
Results, part 3 
● other teams are using it; more coming 
● sharing our cookbooks, lessons, etc. 
● opened the door to other distributed systems
Future, part 1 
● Use across more of our APIs 
● Remove remaining spinning disks
Future, part 2 
● move to cloud 
● automate full stack down to infra 
○ scale, quick geo-distribute, flexibility to 
tweak new infra settings, etc.
Future, part 3 
● get better at data modeling designs 
● less bulk loading 
○ remove compaction process overhead 
● weave in Spark, Kafka 
○ more event-based updates
Future, part crazy 
● Docker + Cassandra?
We’re hiring! 
Come talk to us
#CassandraSummit 
Dan Cundiff (@pmotch) 
Danny Parker (@dcparker88) 
Pete Guidarelli (@pguidarelli) 
Heather Mickman (@hmmickman)