According to a recent Harvard Business Review study, there’s only a 43% chance that customers who have a poor experience will stick with you for the next 12 months. Contrast that with the 74% who will remain your customers if they have a great experience. Learn how Macy’s, a leading American department store chain founded in 1858 with over 750 stores in North America, is transforming its customer experience with DataStax Enterprise.
Webinar recording: https://youtu.be/CiUVxh6Ov_E
View current and past DataStax webinars: http://www.datastax.com/resources/webinars
5. About Us – Macys.com
• ~1,000 full-time employees
• ~500 technology employees
• Most server side apps are Java-based
• Heavy reliance on Spring Framework
• 6th largest eCommerce site
• >$4 billion in sales last year
• 5-25% growth year-over-year, depending on product category
• More investment in dot-com properties
6. About Me – Peter Connolly
• At Macys.com for 5 years
• Senior Application Architect
• Was lead architect on Catalog Migration
• Currently one of two lead architects on multi-data center
expansion
7. About the Macys.com Catalog
• Transformed customer experience by serving product,
inventory and store data to website, mobile, store, and
partner applications
• Customer experience needed to keep up with growth in
business:
• 10x growth in data
• Move from a read-mostly model to one which could
handle near-real-time updates
• Move into multiple data centers
• Real-time RESTful data services application
8. Some History
• Macy’s online catalog was based in 3NF RDBMS
• Write throughput was adequate for online catalog
• Read performance was way too slow – customer
experience impact
• Had to add an in-memory data grid
• That further increased refresh times – customer
experience impact
• Both were very expensive, proprietary packages
• And Catalog loaded before Search Index
• Nightly refresh times were ~6hrs and growing
9. Technical Problems
• Scalability Problems – Customer Experience Impact
• Heavily normalized relational database
• Caching – Customer Experience Impact
• Local caches still put too much load on DB
• Large latency gap between front cache hit vs. miss
• Large updates led to GC problems
• Operational Problems
• Limited time window to pre-warm cache
• Needed second cache cluster for failover
• Complicated release process
• Expensive to scale out distributed cache
10. Business Problems
• Limits on the size of the Macy’s catalog
• Could not add large amounts of store-only data
• Long nightly refresh ⇒ Catalog not up-to-date
• In-memory DataGrid failures
• Large grid clusters had stability problems
• Large hit in response times if grid went down
• Created a back-up grid to deal with failures, but…
• This led to even longer refresh times
12. Options Evaluated - 2013
• Denormalized Relational DB: DB2, Oracle
• Document DB: MongoDB
• Columnar DB: DataStax Enterprise (Apache Cassandra™), Couchbase,
ActiveSpaces (TIBCO)
• Graph: Neo4J
• Object: Versant
• Any selection had to have commercial support
13. Options Short List
• MongoDB
• Feature-rich, JSON document DB
• DataStax Enterprise (Apache Cassandra™)
• Scalable, true peer-to-peer
• ActiveSpaces
• TIBCO Proprietary. In-memory key/value datagrid
• Existing relationship with TIBCO
• Vendors assisted setup and running benchmarks
• DataStax, 10Gen & TIBCO
14. POC Benchmark Environment
• Amazon EC2
• 5 servers
• Same servers used for each test
• Had the 3 DB vendors assist with setup and
execution of benchmarks
• Modeled benchmarks after our own retail
inventory use cases
• C3.2xlarge instances
• 8 vCPU, 15GB RAM, 2x80GB SSD
• Baseline of ~150MM inventory records
17. Data Model Requirements
• Model hierarchical domain objects
• Aggregate data to minimize reads
• No reads before writes
• Readable by 3rd party tools
• (e.g., cqlsh, Dev Center)
• DataStax pivotal in modeling process
18. Data Model based on Lists
id | name       | upcId    | upcColor         | upcAttrId    | upcAttrName                       | upcAttrValue    | upcAttrRefId
---+------------+----------+------------------+--------------+-----------------------------------+-----------------+--------------
11 | Nike Pants | [22, 33] | ['White', 'Red'] | [44, 55, 66] | ['ACTIVE', 'PROMOTION', 'ACTIVE'] | ['Y', 'Y', 'N'] | [22, 22, 33]
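In the list-based layout the child data is positional: upcId[i] pairs with upcColor[i], and upcAttrRefId[j] names the UPC that attribute j belongs to. The following is a minimal Python sketch of how a reader might reassemble the product → UPC → attribute hierarchy from the sample row above. The function name `assemble_from_lists` and the shape of the returned dictionary are illustrative assumptions, not part of the original slides.

```python
def assemble_from_lists(row):
    """Rebuild product -> UPC -> attribute nesting from parallel lists."""
    # Pair each UPC id with the color stored at the same list position.
    upcs = {
        upc_id: {"color": color, "attributes": {}}
        for upc_id, color in zip(row["upcId"], row["upcColor"])
    }
    # Each attribute carries a reference to the UPC that owns it.
    for attr_id, name, value, ref in zip(
        row["upcAttrId"], row["upcAttrName"],
        row["upcAttrValue"], row["upcAttrRefId"],
    ):
        upcs[ref]["attributes"][attr_id] = {"name": name, "value": value}
    return {"id": row["id"], "name": row["name"], "upcs": upcs}


# Sample row copied from the slide.
list_row = {
    "id": 11,
    "name": "Nike Pants",
    "upcId": [22, 33],
    "upcColor": ["White", "Red"],
    "upcAttrId": [44, 55, 66],
    "upcAttrName": ["ACTIVE", "PROMOTION", "ACTIVE"],
    "upcAttrValue": ["Y", "Y", "N"],
    "upcAttrRefId": [22, 22, 33],
}
product = assemble_from_lists(list_row)
```

Because the lists are only related by position, every writer has to keep all of them aligned; a list that is one element short silently mis-assigns data.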
19. Data Model based on Maps
id | name       | upcColor                 | upcAttrName                                   | upcAttrValue                | upcAttrRefId
---+------------+--------------------------+-----------------------------------------------+-----------------------------+--------------------------
11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
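With maps, each child value is addressed by its own id instead of by list position. A minimal Python sketch of reassembling the same hierarchy from the map-based row, again with a hypothetical function name (`assemble_from_maps`) and output shape chosen only for illustration:

```python
def assemble_from_maps(row):
    """Rebuild product -> UPC -> attribute nesting from id-keyed maps."""
    upcs = {
        upc_id: {"color": color, "attributes": {}}
        for upc_id, color in row["upcColor"].items()
    }
    # upcAttrRefId maps attribute id -> owning UPC id.
    for attr_id, upc_id in row["upcAttrRefId"].items():
        upcs[upc_id]["attributes"][attr_id] = {
            "name": row["upcAttrName"][attr_id],
            "value": row["upcAttrValue"][attr_id],
        }
    return {"id": row["id"], "name": row["name"], "upcs": upcs}


# Sample row copied from the slide.
map_row = {
    "id": 11,
    "name": "Nike Pants",
    "upcColor": {22: "White", 33: "Red"},
    "upcAttrName": {44: "ACTIVE", 55: "PROMOTION", 66: "ACTIVE"},
    "upcAttrValue": {44: "Y", 55: "Y", 66: "N"},
    "upcAttrRefId": {44: 22, 55: 22, 66: 33},
}
product_from_maps = assemble_from_maps(map_row)
```

One practical consequence of keying by id: in CQL a single map entry can be set by key without first reading the collection, which fits the "no reads before writes" requirement.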
22. JSON vs. primitive column types
• Significantly reduces storage overhead because of
better ratio of payload / storage metadata
• Improves throughput and latency
• Supports complex hierarchical structures
• But it loses in partial reads / updates
• Complicates schema versioning
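The partial read / update trade-off can be illustrated with a small Python sketch. It assumes the whole product is serialized into a single JSON text column; the in-memory `store` dictionary stands in for that table, and all function and field names are hypothetical.

```python
import json

# Stand-in for a table with one JSON text column per product id.
store = {}


def write_product(product):
    """Serialize the whole product into one compact JSON value."""
    store[product["id"]] = json.dumps(product, separators=(",", ":"))


def read_product(product_id):
    """Reading any field means fetching and parsing the whole document."""
    return json.loads(store[product_id])


def update_upc_color(product_id, upc_id, color):
    """A partial update becomes read, modify, write of the full document."""
    product = read_product(product_id)
    product["upcs"][upc_id]["color"] = color
    write_product(product)


write_product({
    "id": 11,
    "name": "Nike Pants",
    "upcs": {"22": {"color": "White"}, "33": {"color": "Red"}},
})
update_upc_color(11, "22", "Blue")
```

The single value keeps per-column metadata low and nests freely, but changing one color forces the read-before-write that the primitive-column models avoid.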
30. Customer Experience Improvements
• Much faster Catalog refreshes
• RDBMS + DataGrid refresh times ~3 hours
• Cassandra refresh times ~½ hour
• Moving to Incremental Refreshes
• Refresh times will be even faster (less than ½ hour)
• Capacity for Catalog growth
• Ability to scale number of products/UPCs
• Ability to scale numbers of requests per second linearly
• Currently migrating Store Catalog online
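The slides do not say how the incremental refresh is computed. One common approach, sketched below in Python purely as an assumption, is to compare the previous and current snapshots and write only the records that changed. The function name `plan_incremental_refresh` and the record layout are hypothetical.

```python
def plan_incremental_refresh(previous, current):
    """Return (upserts, deletes) needed to move from previous to current.

    Both arguments map a record key to its record.
    """
    upserts = {
        key: record
        for key, record in current.items()
        if key not in previous or previous[key] != record
    }
    deletes = sorted(key for key in previous if key not in current)
    return upserts, deletes


yesterday = {1: {"price": 10}, 2: {"price": 20}, 3: {"price": 30}}
today = {1: {"price": 10}, 2: {"price": 25}, 4: {"price": 40}}
upserts, deletes = plan_incremental_refresh(yesterday, today)
```

Only changed and new records are written, so the work scales with the size of the change set instead of the size of the catalog.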
31. DataStax Integrations
• DSE Java Client
• CQL support
• Intelligent request routing
• DSE and Apache Spark™ Integration
• Lets Apache Spark™ use Cassandra as datastore
• Added 4 nodes to each cluster for Spark analytics
• Gives us real-time analytics on Catalog & Inventory
• Great for problem resolution
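"Intelligent request routing" generally means the client sends each request to a node that holds a replica of the requested partition. The Python sketch below shows the idea on a simplified token ring. It is not the DSE driver's implementation: it assumes one token per node, uses MD5 in place of Cassandra's partitioner, and the `Ring` class is hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Simplified token ring with one token per node (an assumption)."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; keep entries sorted by token.
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    @staticmethod
    def token_for(key):
        # Illustrative hash only; a real cluster uses its configured partitioner.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, key, replication_factor=3):
        """First node whose token >= the key's token, then its successors."""
        size = len(self._entries)
        start = bisect.bisect_left(self._tokens, self.token_for(key)) % size
        count = min(replication_factor, size)
        return [self._entries[(start + i) % size][1] for i in range(count)]


# Five nodes with evenly spaced tokens over a 64-bit token space.
ring = Ring({f"node{i}": i * (2**64 // 5) for i in range(5)})
targets = ring.replicas("product:11")
```

A client that routes to `targets[0]` avoids the extra hop through a coordinator that does not own the data.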
33. What Worked Well
• Partnership with DataStax: Modeling/Performance
• CQL3 is easy to pick up coming from a SQL background
• Stable: Haven’t lost a node in production
• Able to reduce our reliance on caching in app tier
• Documentation is good
• Query tracing is helpful
• Cassandra Cluster Manager
(https://github.com/pcmanus/ccm)
34. Problems encountered with Cassandra
• Certain CQL queries brought down cluster
• DataStax fixed this (phew)
• Deleting and creating a keyspace …
• Lengthy compaction
• Need to understand underlying storage
• OpsCenter Performance Charts
35. Our own problems
• Small cluster ⇒ all servers must perform well
• Lack of versioning of JSON schema
• How to handle exports
• CPU usage would spike from 5% to 30% during exports
• Under-allocated test environments (> 1 vCPU)
• Non-primary key access
37. Future Plans
• Upgrade DSE 4.8.4 → 5.0.x (2017)
• Apache Cassandra 2.1.12 → 3.0.7+
• (Requires RHEL 7.2)
• Currently trialing DSE 5.0.1 (C* 3.0.7)
• For Store & Online Catalog & Inventory POC
• Multi-Datacenter
• Using Mutagen for schema management
• JSON schema versioning
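The slides do not describe the planned versioning scheme. One simple option, sketched below in Python as an assumption, is to embed a version number in each JSON document and upgrade older documents on read. The `schemaVersion` field, the example v1 → v2 change, and all function names are hypothetical.

```python
import json

CURRENT_VERSION = 2


def _v1_to_v2(doc):
    # Hypothetical change: v1 stored a single "color", v2 stores a "colors" list.
    upgraded = dict(doc)
    upgraded["colors"] = [upgraded.pop("color")]
    upgraded["schemaVersion"] = 2
    return upgraded


# Maps a version to the function that upgrades it to the next version.
UPGRADES = {1: _v1_to_v2}


def load_product(raw):
    """Parse a stored document and upgrade it to the current schema."""
    doc = json.loads(raw)
    # Documents written before versioning existed are treated as version 1.
    version = doc.get("schemaVersion", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schemaVersion"]
    return doc


legacy = load_product('{"id": 11, "color": "White"}')
```

Upgrading on read lets old and new documents coexist in the same table, so a schema change does not require rewriting every row at once.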
38. Stuff we’d like to see in DSE
• DSE Kibana
• Augment Spark’s query capability with an
interactive GUI
• Replication listeners
• Detect when replication is complete