Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experience

Macy’s: Why Your Database Decision Directly Impacts Customer Experience
October 5, 2016
Peter Connolly, Senior Architect @ Macys.com

Agenda
• Background/Problem Statement
• Options
• Data Models
• Migration
• Performance
• Business Outcomes
• Retrospectives

Background & Problem Statement

About Us – Macys.com
• ~1,000 full-time employees
• ~500 technology employees
• Most server side apps are Java-based
• Heavy reliance on Spring Framework
• 6th largest eCommerce site
• >$4 billion in sales last year
• 5-25% growth year-over-year, depending on product category
• More investment in dot-com properties

About Me – Peter Connolly
• At Macys.com for 5 years
• Senior Application Architect
• Was lead architect on Catalog Migration
• Currently one of two lead architects on multi-data center expansion

About the Macys.com Catalog
• Transformed customer experience by serving product, inventory and store data to website, mobile, store, and partner applications
• Customer experience needed to keep up with growth in business:
• 10x growth in data
• Move from a read-mostly model to one which could handle near-real-time updates
• Move into multiple data centers
• Real-time RESTful data services application

Some History
• Macy’s online catalog was based on a 3NF RDBMS
• Write throughput was adequate for online catalog
• Read performance was way too slow – customer experience impact
• Had to add an in-memory data grid
• That further increased refresh times – customer experience impact
• Both were very expensive, proprietary packages
• And Catalog loaded before Search Index
• Nightly refresh times were ~6hrs and growing

Technical Problems
• Scalability Problems – Customer Experience Impact
• Heavily normalized relational database
• Caching – Customer Experience Impact
• Local caches still put too much load on DB
• Large latency gap between front cache hit vs. miss
• Large updates led to GC problems
• Operational Problems
• Limited time window to pre-warm cache
• Needed second cache cluster for failover
• Complicated release process
• Expensive to scale out distributed cache

Business Problems
• Limits on the size of the Macy’s catalog
• Could not add large amounts of store-only data
• Long nightly refresh → Catalog not up-to-date
• In-memory DataGrid failures
• Large grid clusters had stability problems
• Large hit in response times if grid went down
• Created a back-up grid to deal with failures, but…
• This led to even longer refresh times

Options Evaluated - 2013
• Denormalized Relational DB: DB2, Oracle
• Document DB: MongoDB
• Columnar DB: DataStax Enterprise (Apache Cassandra™), Couchbase, ActiveSpaces (TIBCO)
• Graph: Neo4J
• Object: Versant
• Any selection had to have commercial support

Options Short List
• MongoDB
• Feature-rich, JSON document DB
• DataStax Enterprise (Apache Cassandra™)
• Scalable, true peer-to-peer
• ActiveSpaces
• TIBCO Proprietary. In-memory key/value datagrid
• Existing relationship with TIBCO
• Vendors assisted with setting up and running benchmarks
• DataStax, 10Gen & TIBCO

POC Benchmark Environment
• Amazon EC2
• 5 servers
• Same servers used for each test
• Had the 3 DB vendors assist with setup and execution of benchmarks
• Modeled benchmarks after our own retail inventory use cases
• c3.2xlarge instances
• 8 vCPU, 15GB RAM, 2x80GB SSD
• Baseline of ~150MM inventory records

POC Results - Summary
DataStax Enterprise (Apache Cassandra™)

Data Model Requirements
• Model hierarchical domain objects
• Aggregate data to minimize reads
• No reads before writes
• Readable by 3rd party tools
• (e.g., cqlsh, Dev Center)
• DataStax pivotal in modeling process

Data Model based on Maps
 id | name       | upcColor                 | upcAttrName                                   | upcAttrValue                | upcAttrRefId
----+------------+--------------------------+-----------------------------------------------+-----------------------------+--------------------------
 11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
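
To make the map-based row easier to read, here is a minimal Python sketch that folds the parallel maps back into one structure per UPC. It assumes, based on the sample values only, that every attribute map is keyed by attribute id and that upcAttrRefId maps an attribute id to the UPC it belongs to. The function name upcs_from_maps is hypothetical and not part of the original deck.

```python
# Sample row from the slide, with CQL maps represented as Python dicts.
ROW = {
    "id": 11,
    "name": "Nike Pants",
    "upcColor": {22: "White", 33: "Red"},
    "upcAttrName": {44: "ACTIVE", 55: "PROMOTION", 66: "ACTIVE"},
    "upcAttrValue": {44: "Y", 55: "Y", 66: "N"},
    "upcAttrRefId": {44: 22, 55: 22, 66: 33},  # assumed: attribute id -> UPC id
}


def upcs_from_maps(row):
    """Join the parallel maps into {upc_id: {"color": ..., "attr": {...}}}."""
    upcs = {
        upc_id: {"color": color, "attr": {}}
        for upc_id, color in row["upcColor"].items()
    }
    for attr_id, upc_id in row["upcAttrRefId"].items():
        name = row["upcAttrName"][attr_id]
        upcs[upc_id]["attr"][name] = row["upcAttrValue"][attr_id]
    return upcs


UPCS = upcs_from_maps(ROW)
```

Every lookup has to go through the attribute id, which is part of why this shape is awkward for hierarchical objects.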

Data Model based on Compound Key & JSON
CREATE TABLE product (
id int, -- product id
upcId int, -- 0 means product row,
-- otherwise upc row
object text, -- JSON object
review text, -- JSON object
PRIMARY KEY(id, upcId)
);
id | upcId | object | review
----+--------+------------------------------------------------------------------------------------------------------------+--------------------------------------
11 | 0 | {"id" : "11", "name" : "Nike Pants"} | {"avgRating" : "5", "count" : "567"}
11 | 22 | {"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}, ...]} | null
11 | 33 | {"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]} | null
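
With the compound key, one partition read returns the product row and all of its UPC rows together. The following Python sketch shows how an application might fold those rows into a single object. It is an illustration under assumptions: the rows are hard-coded from the sample above (the attributes elided with "..." are left out), and assemble_product is a hypothetical helper, not the Macys.com implementation.

```python
import json

# Rows as a driver might return them for a query on id = 11
# (id, upcId, object, review), copied from the sample above.
ROWS = [
    (11, 0, '{"id": "11", "name": "Nike Pants"}',
     '{"avgRating": "5", "count": "567"}'),
    (11, 22, '{"color": "White", "attr": '
             '[{"id": "44", "name": "ACTIVE", "value": "Y"}]}', None),
    (11, 33, '{"color": "Red", "attr": '
             '[{"id": "66", "name": "ACTIVE", "value": "N"}]}', None),
]


def assemble_product(rows):
    """Fold one partition (product row plus UPC rows) into a nested dict."""
    product = None
    upcs = {}
    for _product_id, upc_id, obj, review in rows:
        doc = json.loads(obj)
        if upc_id == 0:  # upcId 0 marks the product-level row
            product = doc
            product["review"] = json.loads(review) if review else None
        else:
            upcs[upc_id] = doc
    if product is None:
        raise ValueError("partition has no product row (upcId = 0)")
    product["upcs"] = upcs
    return product


PRODUCT = assemble_product(ROWS)
```

Because id is the partition key and upcId the clustering column, the whole hierarchy comes back from a single partition, which matches the "aggregate data to minimize reads" requirement.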

Data Load Performance Comparison
July 2013

JSON vs. primitive column types
• JSON significantly reduces storage overhead because of a better ratio of payload to storage metadata
• Improves throughput and latency
• Supports complex hierarchical structures
• But it loses out on partial reads / updates
• Complicates schema versioning
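
The last two trade-offs can be illustrated with a short Python sketch. Changing one attribute inside a JSON column means parsing and rewriting the whole object, where a primitive column could be updated on its own. The "schemaVersion" field and both function names are assumptions made for illustration; the deck does not describe how versioning was or will be implemented.

```python
import json

STORED = ('{"color": "Red", "attr": '
          '[{"id": "66", "name": "ACTIVE", "value": "N"}]}')


def set_upc_attr(object_json, attr_id, value):
    """Partial update of a JSON column: parse, modify, re-serialize it all."""
    doc = json.loads(object_json)
    for attr in doc.get("attr", []):
        if attr["id"] == attr_id:
            attr["value"] = value
    return json.dumps(doc)


def read_versioned(object_json, supported=(1,)):
    """Reject documents whose (hypothetical) schemaVersion is not understood."""
    doc = json.loads(object_json)
    version = doc.get("schemaVersion", 1)  # assumed default for old documents
    if version not in supported:
        raise ValueError(f"unsupported schemaVersion: {version}")
    return doc


UPDATED = set_upc_attr(STORED, "66", "Y")
```

A writer that already holds the full object (for example a feed that always sends complete products) avoids the read step, which keeps the "no reads before writes" requirement intact.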

Phase 0 – Starting Point
All services used the Hessian binary protocol over HTTP

Phase 1 – Adding New to Old
• All new services are RESTful JSON
• Same RESTful JSON services built on old platform
DataStax Enterprise

Phase 2 – All Feeds Going to New
DataStax Enterprise

Phase 3 – Target: Retire Old Platform
DataStax Enterprise

Changing Engines
• Hybrid with Legacy
• Complete Switchover

Customer Experience Improvements
• Much faster Catalog refreshes
• RDBMS + DataGrid refresh times → ~3 hours
• Cassandra refresh times → ~½ hour
• Moving to Incremental Refreshes
• Refresh times will be even faster → less than ½ hour
• Capacity for Catalog growth
• Ability to scale number of products/UPCs
• Ability to scale numbers of requests per second linearly
• Currently migrating Store Catalog online

DataStax Integrations
• DSE Java Client
• CQL support
• Intelligent request routing
• DSE and Apache Spark™ Integration
• Lets Apache Spark™ use Cassandra as datastore
• Added 4 nodes to each cluster for Spark analytics
• Gives us real-time analytics on Catalog & Inventory
• Great for problem resolution

What Worked Well
• Partnership with DataStax: Modeling/Performance
• CQL3 is easy to pick up coming from a SQL background
• Stable: Haven’t lost a node in production
• Able to reduce our reliance on caching in app tier
• Documentation is good
• Query tracing is helpful
• Cassandra Cluster Manager (https://github.com/pcmanus/ccm)

Problems encountered with Cassandra
• Certain CQL queries brought down cluster
• DataStax fixed this (phew)
• Deleting and creating a keyspace …
• Lengthy compaction
• Need to understand underlying storage
• OpsCenter Performance Charts

Our own problems
• Small cluster ⇒ all servers must perform well
• Lack of versioning of JSON schema
• How to handle exports
• 5% CPU would spike to 30% during exports
• Under-allocated test environments (> 1 vCPU)
• Non-primary key access

Future Plans
• Upgrade DSE 4.8.4 → 5.0.x (2017)
• Apache Cassandra 2.1.12 → 3.0.7+
• (Requires RHEL 7.2)
• Currently trialing DSE 5.0.1 (C* 3.0.7)
• For Store & Online Catalog & Inventory POC
• Multi-Datacenter
• Using Mutagen for schema management
• JSON schema versioning

Stuff we’d like to see in DSE
• DSE Kibana
• Augment Spark’s query capability with an interactive GUI
• Replication listeners
• Detect when replication is complete

Contacts & Resources
Macy’s
• Peter Connolly: Senior Architect @ Macys.com
• Contact: https://www.linkedin.com/in/peter-connolly-0a544a
DataStax
• Allene Jue: Product Marketing
• Email: allene.jue@datastax.com
Resources
• www.macys.com
• www.datastax.com
© 2015 DataStax, All Rights Reserved.

Before we go…a few reminders
• Download DSE 5.0, available today!
• Check out upcoming webinars here:
http://www.datastax.com/resources/webinars
• Become a DataStax Professional Community Member:
http://academy.datastax.com/community

POC Results - Initial Load
Compared: TIBCO ActiveSpaces 1.0, Apache Cassandra 1.2.10, 10Gen MongoDB 2.0

Initial Load (~148MM records, same datacenter / availability zone)
• TIBCO ActiveSpaces 1.0: 72,000 TPS (32 nodes), 34 min; 98,000 TPS (40 nodes), 25 min. CPU: 62.4%, Memory: 83.0%, I/O: 36% (disk 1), 35% (disk 2)
• Apache Cassandra 1.2.10: 65,000 TPS (5 nodes), 38 min. CPU: 31%, Memory: 59%, I/O: 42% (disk 1), 14% (disk 2)
• 10Gen MongoDB 2.0: 20,000 TPS (5 nodes), ?? min (did not complete; processed ~23MM records)

Upsert (Writes), ActiveSpaces sync writes vs. Cassandra async
• TIBCO ActiveSpaces 1.0: 4,000 TPS, 3.57 ms avg latency. CPU: 0.6%, Memory: 71%, I/O: 18% (disk 1), 17% (disk 2)
• Apache Cassandra 1.2.10: 4,000 TPS, 3.2 ms avg latency. CPU: 3.7%, Memory: 77%, I/O: 0.3%, 2.2%
• 10Gen MongoDB 2.0: did not complete (tests failed)

Read
• TIBCO ActiveSpaces 1.0: 400 TPS, 2.54 ms avg latency. CPU: 0.06%, Memory: 62.4%, I/O: 0%
• Apache Cassandra 1.2.10: 400 TPS, 3.23 ms avg latency. CPU: 0.02%, Memory: 47%, I/O: 3.7%
• 10Gen MongoDB 2.0: did not complete (tests failed)
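
As a sanity check on the initial-load figures, the reported durations follow from dividing the record count by the sustained throughput. The short Python sketch below does that arithmetic; it assumes the ~148MM figure is exactly 148,000,000 records and that the TPS values were sustained for the whole load. The function name load_minutes is hypothetical.

```python
RECORDS = 148_000_000  # "~148MM records", taken as exact for the arithmetic


def load_minutes(records, tps):
    """Minutes needed to load `records` at a sustained `tps` writes per second."""
    return records / tps / 60


ACTIVESPACES_32 = load_minutes(RECORDS, 72_000)  # reported: 34 min
ACTIVESPACES_40 = load_minutes(RECORDS, 98_000)  # reported: 25 min
CASSANDRA_5 = load_minutes(RECORDS, 65_000)      # reported: 38 min
```

Rounded to whole minutes, these agree with the 34, 25 and 38 minute figures reported for the completed loads.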

POC Results - Summary
• DataStax Enterprise (Apache Cassandra™) & ActiveSpaces: very close
• MongoDB: failed tests
• YMMV! Your mileage may (will probably) vary
