Apache Cassandra at Target: 
Pioneering NoSQL 
in a Big Enterprise 
Dan Cundiff (@pmotch) 
Target
Context 
● Target’s API platform 
● mostly REST APIs 
● e.g. products, locations, inventory, etc. 
● consumers inside and outside of Target 
● wide variety of providing systems (legacy, in-house 
built, SaaS, packaged software, etc.)
Problems we needed to solve 
● slow providing systems 
● cost prohibitive to call directly 
● unable to scale with increased demand 
● need a place to aggregate data from multiple 
systems 
● some data wasn’t even in a database to 
begin with!
Barriers with existing tools, part 1 
● cost too much 
● process for traditional DBs wasn’t a fit 
● too few tools/vendors
Barriers with existing tools, part 2 
● RDBMS isn’t: 
○ distributed (multi-tenant) 
○ close to Guests (geographic distribution) 
○ distributed across our data centers 
○ distributed to the cloud!
Barriers with existing tools, part 3 
● lack of performance control 
○ process overhead, not owning the full stack, 
no flexibility on changes like indexing, etc. 
● availability 
○ systems before had outages, downtime, 
etc. 
● not automate-able
Discovering the solution
Taking the idea back 
● I just went and talked to Pete and we 
decided to do it! 
● tried other things in the past 
● show results by trying; succeed or fail fast
Reasons trying was attractive, part 1 
● fit 80% of our need 
● years in development 
● rich C* dev ecosystem
Reasons trying was attractive, part 2 
● google-able 
● strong community 
● a company that would support it
Reasons trying was attractive, part 3 
● chef-able 
● aligned well with existing investments 
● simple pricing model
Barriers to adoption 
● enterprise IT; the nature of it 
● selling it 
● NoSQL for the first time 
● automation (was happening at the time; 
scary to do) 
● politics
Challenges integrating 
● bulk loading data 
● keeping Cassandra in sync 
● many systems not event driven 
● packaged software 
● limited ways to integrate with providing 
systems
Challenges of standing it up, part 1 
● early distributed system (new to teams) 
● needed local disk (always used SAN before) 
● needed SSDs (always used spinning things) 
● existing config conflicts (backups, 
monitoring, raid, swap, etc) 
● use right-sized servers (don’t settle for what 
your infra friends give you by default)
Challenges of standing it up, part 2 
● full stack ownership 
● it’s new, don’t hand it off 
● support response is quick because we own it 
● you’re closest to the problem; you’re best 
suited to solve it 
● tuned to meet the needs of our APIs 
● data is modeled for API performance gains
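To make that last point concrete, here is a minimal sketch of query-driven modeling with the DataStax Java driver (2.x-era API). The keyspace, table, and columns are hypothetical, not Target's actual schema: the table is denormalized around one API call so the hot read is a single-partition lookup.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hypothetical sketch of query-driven modeling with the DataStax Java
// driver (2.x-era API): one table per API access pattern, keyed so the
// hot read is a single-partition lookup. All names are illustrative.
public class StoreInventoryModel {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")  // placeholder contact point
                .build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS api WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        // Denormalized around the API call "inventory for product X at
        // store Y": partition key = product_id, clustering column =
        // store_id, so the read never fans out across partitions.
        session.execute("CREATE TABLE IF NOT EXISTS api.store_inventory ("
                + "  product_id text,"
                + "  store_id   text,"
                + "  quantity   int,"
                + "  PRIMARY KEY (product_id, store_id))");

        cluster.close();
    }
}
```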
Challenges of standing it up, part 3 
● skills supply is low (but getting better) 
● train your people 
● be wary of promises from consultants 
○ grill them on what they claim to know
Challenges of development, part 1 
● skills ramp up (data modeling, DataStax 
driver, etc.) 
● developers need to care 
○ encourage tweaking, research, make 
things better 
○ clients are equally important for getting the 
most out of C*
Challenges of development, part 2 
● mind shift from RDBMS 
● started with Astyanax; switched to the 
DataStax driver 
○ DataStax-supported 
○ newer features
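For contrast with Astyanax's Thrift-based column-family API, here is a minimal read with the DataStax Java driver (2.x-era API): CQL over the native protocol with a prepared statement. The contact point, keyspace, and table reuse the hypothetical schema sketched earlier.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Minimal read with the DataStax Java driver (2.x-era API): CQL with a
// prepared statement over the native protocol. Names are illustrative.
public class ProductLookup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")  // placeholder contact point
                .build();
        Session session = cluster.connect("api");  // hypothetical keyspace

        PreparedStatement ps = session.prepare(
                "SELECT quantity FROM store_inventory "
                + "WHERE product_id = ? AND store_id = ?");
        Row row = session.execute(ps.bind("p123", "s456")).one();
        if (row != null) {
            System.out.println("quantity = " + row.getInt("quantity"));
        }
        cluster.close();
    }
}
```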
Ops challenges, part 1 
● lots of machines; don’t config by hand 
● wrote Chef cookbooks 
● support people saw these odd servers and 
turned on things we disabled (like swap) 
● can’t use “legacy” testing; Cassandra works 
differently; chaos stuff (turn off gossip, thrift, 
etc.)
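The chaos-style tests here usually mean `nodetool disablegossip` / `nodetool disablethrift`; under the hood those are JMX operations, so a test harness can drive them directly. A hedged sketch, assuming Cassandra's default JMX port (7199) with no authentication and the C* 1.x/2.x MBean name:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Chaos-style fault injection sketch: disable gossip on one node via JMX,
// the same operation `nodetool disablegossip` invokes. Assumes Cassandra's
// default JMX port (7199) with no auth; MBean name is the C* 1.x/2.x one.
public class DisableGossip {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName(
                    "org.apache.cassandra.db:type=StorageService");
            // Take this node out of the ring without killing the process.
            mbs.invoke(ss, "stopGossiping", null, null);
        } finally {
            jmxc.close();
        }
    }
}
```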
Ops challenges, part 2 
● made logging awesome; we can see 
anything 
● used the C* JMX interface to send data in 
real time to Splunk 
● can correlate these events with the app tier 
(because app logs are in Splunk too!)
Ops challenges, part 3 
● useful MBeans: 
○ heap usage 
○ specific read/write latencies 
○ dropped reads/writes 
○ bloom filter false-positive ratios 
○ column count, size
Ops challenges, part 4 
● more useful MBeans: 
○ SSTables per read 
○ tombstones 
○ cache hits and ratios 
○ misbehaving queries (range slice)
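A hedged sketch of polling a couple of the MBeans above; in practice each value would be written out as a log line for Splunk to index. The metric MBean names follow the Cassandra 1.2+ org.apache.cassandra.metrics naming and can differ between versions:

```java
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of polling a couple of the MBeans listed above; in practice the
// values would be emitted as log lines for Splunk. Metric names follow
// the Cassandra 1.2+ org.apache.cassandra.metrics naming and may vary.
public class MetricsPoller {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

            // Heap usage comes from the standard JVM memory MBean.
            CompositeData heap = (CompositeData) mbs.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            System.out.println("heap_used=" + MemoryUsage.from(heap).getUsed());

            // Mean read latency from the ClientRequest metrics.
            ObjectName readLatency = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,"
                    + "scope=Read,name=Latency");
            System.out.println("read_latency_mean="
                    + mbs.getAttribute(readLatency, "Mean"));
        } finally {
            jmxc.close();
        }
    }
}
```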
Open source cookbook! 
● https://github.com/target/dse-cookbook 
● by Danny Parker 
● pull requests encouraged
Blog post on tuning 
● http://target.github.io/infrastructure/tuning-cassandra/ 
● by Danny Parker (@dcparker88)
Results, part 1 
● from n00bs to production ready = 2 months! 
○ infra, operation testing, app dev, and 
deployed! 
○ just in time before peak season 
● today our highest volume APIs depend on it
Results, part 2 
● growth (↑ functions + ↑ volume) = ~2000% 
● increased adoption of our APIs 
● C* unlocking things we couldn't do before 
● quick changes possible 
○ makes Agile possible 
○ gets us close to continuous delivery
Results, part 3 
● other teams are using it; more coming 
● sharing our cookbooks, lessons, etc. 
● opened the door to other distributed systems
Future, part 1 
● Use across more of our APIs 
● Remove remaining spinning disks
Future, part 2 
● move to cloud 
● automate full stack down to infra 
○ scale, quick geo-distribute, flexibility to 
tweak new infra settings, etc.
Future, part 3 
● get better at data modeling designs 
● less bulk loading 
○ remove compaction process overhead 
● weave in Spark, Kafka 
○ more event-based updates
Future, part crazy 
● Docker + Cassandra?
We’re hiring! 
Come talk to us
#CassandraSummit 
Dan Cundiff (@pmotch) 
Danny Parker (@dcparker88) 
Pete Guidarelli (@pguidarelli) 
Heather Mickman (@hmmickman)