SlideShare a Scribd company logo
1 of 38
Download to read offline
Changing Engines in Mid-Flight: Cassandra @ Macys
Andrew Santoro Development Manager
Peter Connolly Application Architect
Ivan Ukolov Architect/Implementation Lead
© 2015. All Rights Reserved.
1 Problem Statement
2 Options
3 Migration
4 Data Models
5 Retrospective
2
© 2015. All Rights Reserved.
3
Background & Problem Statement
© 2015. All Rights Reserved.
About Us
• Real-time RESTful data services application
• Serve product, inventory and store data to website, mobile, store,
other service applications, and partners
• Not “Big Data” or Analytics
• Challenge:
• 10x growth in data
• Move from a batch update, read-mostly model > near-real-time updates
• Move into multiple data centers
4
© 2015. All Rights Reserved.
Problems
• Scalability Problems
– Heavily normalized relational database
• 51 tables/4 unions to load a complete product
• DB response time could be in seconds
– Solved via caching?
• Local caches put too much load on DB
• Latency difference between front cache hit vs miss
• Large batch updates led to GC issues
• Operational Problems
– Limited window to pre-warm cache
– Needed 2nd cache cluster for failover
– Two clusters complicated releases
– Expensive to scale out
5
© 2015. All Rights Reserved.
6
Options Explored
© 2015. All Rights Reserved.
Options Evaluated
Storage Type Products
Denormalized Relational
DB2
Oracle
Document Mongo
Columnar
Cassandra
Couchbase
ActiveSpaces
Graph Neo4J
Object Versant
7
© 2015. All Rights Reserved.
Options: Short List
8
• MongoDB
– Feature Rich, JSON document DB
• Cassandra
– Scalable, true peer-to-peer
• Active Spaces
– Tibco proprietary
– Existing relationship with Tibco
– In-memory key/value datagrid
© 2015. All Rights Reserved.
POC Benchmark Environment
9
• Amazon EC2
– 5 servers
• Same servers used for each test
– Had DB vendors assist with setup and execution of benchmarks
– Modeled benchmarks after retail inventory use cases
– C3.2xlarge instances
• 8 vCPU, 15GB RAM, 2x80GB SSD
– Baseline of 148MM inventory records
© 2015. All Rights Reserved.
POC Results – Initial Load
Operation TIBCO ActiveSpaces Apache Cassandra 1.2.10 10Gen MongoDB 2.0
Initial Load
~148MM Records
Same datacenter
(availability zone)
72,000 TPS (32 nodes) 34 min
98,000 TPS (40 nodes) 25 min
CPU: 62.4% I/O, Disk 1: 36%
Memory: 83.0% I/O, Disk 2: 35%
65,000 TPS (5 nodes) 38 min
CPU: 31% I/O, Disk 1: 42%
Memory: 59% I/O, Disk 2: 14%
20,000 TPS (5 nodes) ??
min
(Did not complete)
processed ~23MM records
Upsert
(Writes)
ActiveSpaces sync writes
vs.Cassandra async
4,000 TPS
3.57 ms Avg Latency
CPU: 0.6% I/O: 18% (disk 1)
Memory: 71% I/O: 17% (disk 2)
4,000 TPS
3.2 ms Avg Latency
CPU: 3.7% I/O: 0.3%
Memory: 77% I/O: 2.2%
(Did not complete)
tests failed
Read
400 TPS
2.54 ms Avg Latency
CPU: 0.06% I/O: 0%
Memory: 62.4%
400 TPS
3.23 ms Avg Latency
CPU: 0.02% I/O: 3.7%
Memory: 47%
(Did not complete)
tests failed
10
© 2015. All Rights Reserved.
POC Results – Summary
11
© 2015. All Rights Reserved.
12
Migration Path
© 2015. All Rights Reserved.
Phase 0 – Starting Point
13
© 2015. All Rights Reserved.
Phase 1 – Adding New to Old
14
© 2015. All Rights Reserved.
Phase 2 – All Feeds Going to New
15
© 2015. All Rights Reserved.
Phase 3 – Target: Retire Old Platform
16
© 2015. All Rights Reserved.
Changing Engines
17
© 2015. All Rights Reserved.
18
Data Models
© 2015. All Rights Reserved.
Data Model Requirements
19
• Model hierarchical domain objects
• Aggregate data to minimize reads
• No reads before writes
• Readable by 3rd party tools
• cqlsh
• Dev Center
© 2015. All Rights Reserved.
Data Model Based on Lists
20
id | name | upcId | upcColor | upcAttrId | upcAttrName | upcAttrValue | upcAttrRefId
----+----------------+-----------+---------------------+-----------------+---------------------------------------------------+---------------------+--------------
11 | Nike Pants | [22, 33] | ['White', 'Red'] | [44, 55, 66] | ['ACTIVE', 'PROMOTION', 'ACTIVE'] | ['Y', 'Y', 'N'] | [22, 22, 33]
© 2015. All Rights Reserved.
Data Model Based on Maps
21
id | name | upcColor | upcAttrName | upcAttrValue | upcAttrRefId
----+----------------+--------------------------------+-------------------------------------------------------------------+--------------------------------+--------------------------
11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
© 2015. All Rights Reserved.
Collection Size Limits
22
• Native protocol v1/v2
– collection max size is 64K
– collection element max size is 64K
• Native protocol v3
– the collection size and the length of each element is 4 bytes long now
– still 64K limit for set element max size and map key max size
– official doc doesn't recommend going above 64K for any collection type
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec
© 2015. All Rights Reserved.
Collections Internal Storage
23
CREATE TABLE collections (
my_key int PRIMARY KEY,
my_set set<int>,
my_map map<int, int>,
my_list list<int>,
);
INSERT INTO collections
(my_key, my_set, my_map, my_list)
VALUES ( 1, {1, 2}, {1:2, 3:4}, [1, 2]);
SELECT * FROM collections ;
my_key | my_list | my_map | my_set
-------------+----------+---------------+--------
1 | [1, 2] | {1: 2, 3: 4} | {1, 2}
© 2015. All Rights Reserved.
Collections Internal Storage
24
RowKey: 1
=> (name=, value=, timestamp=1429162693574381)
=> (name=my_list:c646b7d0e3fa11e48d582f4261da0d90, value=00000001,
timestamp=1429162693574381)
=> (name=my_list:c646b7d1e3fa11e48d582f4261da0d90, value=00000002,
timestamp=1429162693574381)
=> (name=my_map:00000001, value=00000002, timestamp=1429162693574381)
=> (name=my_map:00000003, value=00000004, timestamp=1429162693574381)
=> (name=my_set:00000001, value=, timestamp=1429162693574381)
=> (name=my_set:00000002, value=, timestamp=1429162693574381)
© 2015. All Rights Reserved.
Data Model based on Compound Key & JSON
25
CREATE TABLE product (
id int, -- product id

upcId int, -- 0 means product row,
-- otherwise upc id indicating it is upc row
object text, -- JSON object
review text, -- JSON object
PRIMARY KEY(id, upcId)
);
id | upcId | object | review
----+--------+------------------------------------------------------------------------------------------------------------+--------------------------------------
11 | 0 | {"id" : "11", "name" : "Nike Pants"} | {"avgRating" : "5", "count" : "567"}
11 | 22 | {"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}, ...]} | null
11 | 33 | {"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]} | null
© 2015. All Rights Reserved.
Data Load Performance Comparison
26
© 2015. All Rights Reserved.
Collection Size Limits
27
• regular column size = column name + column value + 15 bytes
• counter / expiring = +8 additional bytes
name : 2 bytes (length as short int) + byte[]
flags : 1 byte
timestamp : 8 bytes (long)
value : 4 bytes (length as int) + byte[]
http://btoddb-cass-storage.blogspot.ru/2011/07/column-overhead-and-sizing-every-column.html
https://issues.apache.org/jira/browse/CASSANDRA-4175
© 2015. All Rights Reserved.
JSON vs Primitive Column Types
28
• Significantly reduces storage overhead because of better ratio of
payload / storage metadata
• Improves throughput and latency
• Supports complex hierarchical structures
• But it loses in partial reads / updates
• Complicates schema versioning
• UDT might be worth to consider
© 2015. All Rights Reserved.
29
Retrospectives
© 2015. All Rights Reserved.
What Worked Well
30
• Documentation is good
• Good support from Datastax
• CQL3 easy to learn and migrate to from SQL background
• Query tracing is helpful
• Stable
• Able to reduce our reliance on caching in app tier
• Cassandra Cluster Manager (https://github.com/pcmanus/ccm)
© 2015. All Rights Reserved.
Problems we’ve had with Cassandra
31
• CQL queries brought down cluster
• Delete and creating a keyspace w/o truncating
• File I/O issues
• Lengthy compaction
• Tombstones, shouldn’t have been clustering
• Need to understand underlying storage
• Driver not knowing nodes left cluster
• OpsCenter Performance Charts
© 2015. All Rights Reserved.
Our Own Problems
32
• Small cluster: all servers must perform well
• Lack of versioning of JSON schema
• How to handle performant exports
• 5% CPU -> 30%
• Non-primary key access
• Under-allocated test environments
© 2015. All Rights Reserved.
33
Future Plans
© 2015. All Rights Reserved.
Future Plans
34
• Upgrade DSE 4.0.3 → 4.6.5
• Conditional updates based on column timestamps
• DSE Spark Integration for reporting
• Multi-Datacenter
• Using Mutagen for schema management
• JSON schema versioning
• RxJava for asynchronous operations
• Spring Data Cassandra
© 2015. All Rights Reserved.
Stuff we’d like to see in Cassandra/DSE
35
• DSE Kibana
– Augment Spark’s query capability with an interactive GUI
– Experiment with Banana (https://github.com/LucidWorks/banana)
• Replication listeners
– Detect when replication is complete
• Geospatial Query Support
We’re hiring ...
ecommerce.macysjobs.com
Thank you
© 2015. All Rights Reserved.
One Concern …
38
Patrick McFadin Buddy Pine/”Syndrome”
Secret Identity???

More Related Content

What's hot

Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
 
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsBattery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsDataStax Academy
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesScyllaDB
 
PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!DataStax Academy
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...DataStax
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%ScyllaDB
 
DataStax: 0 to App faster with Ruby and NodeJS
DataStax: 0 to App faster with Ruby and NodeJSDataStax: 0 to App faster with Ruby and NodeJS
DataStax: 0 to App faster with Ruby and NodeJSDataStax Academy
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityHiromitsu Komatsu
 
Target: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at TargetTarget: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at TargetDataStax Academy
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japanHiromitsu Komatsu
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with CassandraDataStax Academy
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCarlos Alonso Pérez
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Deep dive into event store using Apache Cassandra
Deep dive into event store using Apache CassandraDeep dive into event store using Apache Cassandra
Deep dive into event store using Apache CassandraAhmedabadJavaMeetup
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteScyllaDB
 
Testing Cassandra Guarantees under Diverse Failure Modes with Jepsen
Testing Cassandra Guarantees under Diverse Failure Modes with JepsenTesting Cassandra Guarantees under Diverse Failure Modes with Jepsen
Testing Cassandra Guarantees under Diverse Failure Modes with Jepsenjkni
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSDataStax Academy
 
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax Academy
 

What's hot (20)

Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
 
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsBattery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B Files
 
PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!PagerDuty: Span the WAN? Yes you can!
PagerDuty: Span the WAN? Yes you can!
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%
 
DataStax: 0 to App faster with Ruby and NodeJS
DataStax: 0 to App faster with Ruby and NodeJSDataStax: 0 to App faster with Ruby and NodeJS
DataStax: 0 to App faster with Ruby and NodeJS
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Target: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at TargetTarget: Performance Tuning Cassandra at Target
Target: Performance Tuning Cassandra at Target
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developer
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Deep dive into event store using Apache Cassandra
Deep dive into event store using Apache CassandraDeep dive into event store using Apache Cassandra
Deep dive into event store using Apache Cassandra
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 Keynote
 
Testing Cassandra Guarantees under Diverse Failure Modes with Jepsen
Testing Cassandra Guarantees under Diverse Failure Modes with JepsenTesting Cassandra Guarantees under Diverse Failure Modes with Jepsen
Testing Cassandra Guarantees under Diverse Failure Modes with Jepsen
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWS
 
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
 

Similar to Macy's: Changing Engines in Mid-Flight

Meeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaMeeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaScyllaDB
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionTanel Poder
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfcookie1969
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLBjoern Rost
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseVictoriaMetrics
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...DataStax
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
 
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning BasicsOracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basicsnitin anjankar
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performanceGuy Harrison
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Jim Czuprynski
 
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...Nelson Calero
 

Similar to Macy's: Changing Engines in Mid-Flight (20)

Meeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaMeeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with Scylla
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Perf Tuning Short
Perf Tuning ShortPerf Tuning Short
Perf Tuning Short
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Oracle Database Performance Tuning Basics
Oracle Database Performance Tuning BasicsOracle Database Performance Tuning Basics
Oracle Database Performance Tuning Basics
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
Autonomous Transaction Processing (ATP): In Heavy Traffic, Why Drive Stick?
 
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Macy's: Changing Engines in Mid-Flight

  • 1. Changing Engines in Mid-Flight: Cassandra @ Macys Andrew Santoro Development Manager Peter Connolly Application Architect Ivan Ukolov Architect/Implementation Lead
  • 2. © 2015. All Rights Reserved. 1 Problem Statement 2 Options 3 Migration 4 Data Models 5 Retrospective 2
  • 3. © 2015. All Rights Reserved. 3 Background & Problem Statement
  • 4. © 2015. All Rights Reserved. About Us • Real-time RESTful data services application • Serve product, inventory and store data to website, mobile, store, other service applications, and partners • Not “Big Data” or Analytics • Challenge: • 10x growth in data • Move from a batch update, read-mostly model > near-real-time updates • Move into multiple data centers 4
  • 5. © 2015. All Rights Reserved. Problems • Scalability Problems – Heavily normalized relational database • 51 tables/4 unions to load a complete product • DB response time could be in seconds – Solved via caching? • Local caches put too much load on DB • Latency difference between front cache hit vs miss • Large batch updates led to GC issues • Operational Problems – Limited window to pre-warm cache – Needed 2nd cache cluster for failover – Two clusters complicated releases – Expensive to scale out 5
  • 6. © 2015. All Rights Reserved. 6 Options Explored
  • 7. © 2015. All Rights Reserved. Options Evaluated Storage Type Products Denormalized Relational DB2 Oracle Document Mongo Columnar Cassandra Couchbase ActiveSpaces Graph Neo4J Object Versant 7
  • 8. © 2015. All Rights Reserved. Options: Short List 8 • MongoDB – Feature Rich, JSON document DB • Cassandra – Scalable, true peer-to-peer • Active Spaces – Tibco proprietary – Existing relationship with Tibco – In-memory key/value datagrid
  • 9. © 2015. All Rights Reserved. POC Benchmark Environment 9 • Amazon EC2 – 5 servers • Same servers used for each test – Had DB vendors assist with setup and execution of benchmarks – Modeled benchmarks after retail inventory use cases – C3.2xlarge instances • 8 vCPU, 15GB RAM, 2x80GB SSD – Baseline of 148MM inventory records
  • 10. © 2015. All Rights Reserved. POC Results – Initial Load Operation TIBCO ActiveSpaces Apache Cassandra 1.2.10 10Gen MongoDB 2.0 Initial Load ~148MM Records Same datacenter (availability zone) 72,000 TPS (32 nodes) 34 min 98,000 TPS (40 nodes) 25 min CPU: 62.4% I/O, Disk 1: 36% Memory: 83.0% I/O, Disk 2: 35% 65,000 TPS (5 nodes) 38 min CPU: 31% I/O, Disk 1: 42% Memory: 59% I/O, Disk 2: 14% 20,000 TPS (5 nodes) ?? min (Did not complete) processed ~23MM records Upsert (Writes) ActiveSpaces sync writes vs.Cassandra async 4,000 TPS 3.57 ms Avg Latency CPU: 0.6% I/O: 18% (disk 1) Memory: 71% I/O: 17% (disk 2) 4,000 TPS 3.2 ms Avg Latency CPU: 3.7% I/O: 0.3% Memory: 77% I/O: 2.2% (Did not complete) tests failed Read 400 TPS 2.54 ms Avg Latency CPU: 0.06% I/O: 0% Memory: 62.4% 400 TPS 3.23 ms Avg Latency CPU: 0.02% I/O: 3.7% Memory: 47% (Did not complete) tests failed 10
  • 11. © 2015. All Rights Reserved. POC Results – Summary 11
  • 12. © 2015. All Rights Reserved. 12 Migration Path
  • 13. © 2015. All Rights Reserved. Phase 0 – Starting Point 13
  • 14. © 2015. All Rights Reserved. Phase 1 – Adding New to Old 14
  • 15. © 2015. All Rights Reserved. Phase 2 – All Feeds Going to New 15
  • 16. © 2015. All Rights Reserved. Phase 3 – Target: Retire Old Platform 16
  • 17. © 2015. All Rights Reserved. Changing Engines 17
  • 18. © 2015. All Rights Reserved. 18 Data Models
  • 19. © 2015. All Rights Reserved. Data Model Requirements 19 • Model hierarchical domain objects • Aggregate data to minimize reads • No reads before writes • Readable by 3rd party tools • cqlsh • Dev Center
  • 20. © 2015. All Rights Reserved. Data Model Based on Lists 20 id | name | upcId | upcColor | upcAttrId | upcAttrName | upcAttrValue | upcAttrRefId ----+----------------+-----------+---------------------+-----------------+---------------------------------------------------+---------------------+-------------- 11 | Nike Pants | [22, 33] | ['White', 'Red'] | [44, 55, 66] | ['ACTIVE', 'PROMOTION', 'ACTIVE'] | ['Y', 'Y', 'N'] | [22, 22, 33]
  • 21. © 2015. All Rights Reserved. Data Model Based on Maps 21 id | name | upcColor | upcAttrName | upcAttrValue | upcAttrRefId ----+----------------+--------------------------------+-------------------------------------------------------------------+--------------------------------+-------------------------- 11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
  • 22. © 2015. All Rights Reserved. Collection Size Limits 22 • Native protocol v1/v2 – collection max size is 64K – collection element max size is 64K • Native protocol v3 – the collection size and the length of each element is 4 bytes long now – still 64K limit for set element max size and map key max size – official doc doesn't recommend going above 64K for any collection type https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec
  • 23. © 2015. All Rights Reserved. Collections Internal Storage 23 CREATE TABLE collections ( my_key int PRIMARY KEY, my_set set<int>, my_map map<int, int>, my_list list<int>, ); INSERT INTO collections (my_key, my_set, my_map, my_list) VALUES ( 1, {1, 2}, {1:2, 3:4}, [1, 2]); SELECT * FROM collections ; my_key | my_list | my_map | my_set -------------+----------+---------------+-------- 1 | [1, 2] | {1: 2, 3: 4} | {1, 2}
  • 24. © 2015. All Rights Reserved. Collections Internal Storage 24 RowKey: 1 => (name=, value=, timestamp=1429162693574381) => (name=my_list:c646b7d0e3fa11e48d582f4261da0d90, value=00000001, timestamp=1429162693574381) => (name=my_list:c646b7d1e3fa11e48d582f4261da0d90, value=00000002, timestamp=1429162693574381) => (name=my_map:00000001, value=00000002, timestamp=1429162693574381) => (name=my_map:00000003, value=00000004, timestamp=1429162693574381) => (name=my_set:00000001, value=, timestamp=1429162693574381) => (name=my_set:00000002, value=, timestamp=1429162693574381)
  • 25. © 2015. All Rights Reserved. Data Model based on Compound Key & JSON 25 CREATE TABLE product ( id int, -- product id
 upcId int, -- 0 means product row, -- otherwise upc id indicating it is upc row object text, -- JSON object review text, -- JSON object PRIMARY KEY(id, upcId) ); id | upcId | object | review ----+--------+------------------------------------------------------------------------------------------------------------+-------------------------------------- 11 | 0 | {"id" : "11", "name" : "Nike Pants"} | {"avgRating" : "5", "count" : "567"} 11 | 22 | {"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}, ...]} | null 11 | 33 | {"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]} | null
  • 26. © 2015. All Rights Reserved. Data Load Performance Comparison 26
  • 27. © 2015. All Rights Reserved. Collection Size Limits 27 • regular column size = column name + column value + 15 bytes • counter / expiring = +8 additional bytes name : 2 bytes (length as short int) + byte[] flags : 1 byte timestamp : 8 bytes (long) value : 4 bytes (length as int) + byte[] http://btoddb-cass-storage.blogspot.ru/2011/07/column-overhead-and-sizing-every-column.html https://issues.apache.org/jira/browse/CASSANDRA-4175
  • 28. © 2015. All Rights Reserved. JSON vs Primitive Column Types 28 • Significantly reduces storage overhead because of better ratio of payload / storage metadata • Improves throughput and latency • Supports complex hierarchical structures • But it loses in partial reads / updates • Complicates schema versioning • UDT might be worth to consider
  • 29. © 2015. All Rights Reserved. 29 Retrospectives
  • 30. © 2015. All Rights Reserved. What Worked Well 30 • Documentation is good • Good support from Datastax • CQL3 easy to learn and migrate to from SQL background • Query tracing is helpful • Stable • Able to reduce our reliance on caching in app tier • Cassandra Cluster Manager (https://github.com/pcmanus/ccm)
  • 31. © 2015. All Rights Reserved. Problems we’ve had with Cassandra 31 • CQL queries brought down cluster • Delete and creating a keyspace w/o truncating • File I/O issues • Lengthy compaction • Tombstones, shouldn’t have been clustering • Need to understand underlying storage • Driver not knowing nodes left cluster • OpsCenter Performance Charts
  • 32. © 2015. All Rights Reserved. Our Own Problems 32 • Small cluster: all servers must perform well • Lack of versioning of JSON schema • How to handle performant exports • 5% CPU -> 30% • Non-primary key access • Under-allocated test environments
  • 33. © 2015. All Rights Reserved. 33 Future Plans
  • 34. © 2015. All Rights Reserved. Future Plans 34 • Upgrade DSE 4.0.3 → 4.6.5 • Conditional updates based on column timestamps • DSE Spark Integration for reporting • Multi-Datacenter • Using Mutagen for schema management • JSON schema versioning • RxJava for asynchronous operations • Spring Data Cassandra
  • 35. © 2015. All Rights Reserved. Stuff we’d like to see in Cassandra/DSE 35 • DSE Kibana – Augment Spark’s query capability with an interactive GUI – Experiment with Banana (https://github.com/LucidWorks/banana) • Replication listeners – Detect when replication is complete • Geospatial Query Support
  • 38. © 2015. All Rights Reserved. One Concern … 38 Patrick McFadin Buddy Pine/”Syndrome” Secret Identity???