DataStax Enterprise in the Field
Daniel Cohen
Solutions Engineer @ DataStax
© DataStax, All Rights Reserved.
But Enough About Me…
• Solutions Engineer at DataStax
• LA ➜ SF ➜ NYC ➜ SF ➜ London
• Previously at JP Morgan in London
• Finance & digital media
1 Introductions
2 Top Customer Questions
3 Field Lessons: Big Irish Bank
4 Field Lessons: Big British Bank
Top Customer Questions
• What are all the other [banks] doing?
• How many nodes do I need?
• What do you mean SSDs?
• How do I load data from [Oracle]?
• We already have [MongoDB] for NoSQL. What’s the difference?
What are all the other [banks] doing?
“Tell me secrets about my competitors.”
Transform Legacy Infrastructure
[Diagram: global users, a user interface / application services layer, a DataStax Enterprise (DSE) cluster, and legacy systems for USA Equities, UK FX, UK Bonds, and USA FX.]
Transition Legacy to Microservices
[Diagram: users, microservices (A, B, C, D, Z), and messages spanning data centers DC NY1 and DC LDN1; DSE clusters, USA Customers and UK Accounts data, and legacy systems.]
How many nodes do I need?
“How long is a piece of string?”
The Node Count Dance
• “How many nodes do I need?” is a natural question.
– Large organizations buy hardware months in advance.
• Desires ➔ Storage, Throughput, Latency, SLAs
• Realities
– Cost
– Data center capacity (space)
– Operational capacity (people)
– Your hardware
– Your use cases
• Lesson 1 ➔ Computer science is about trade-offs.
• Lesson 2 ➔ Test, iterate, test.
• Lesson 3 ➔ Good news! DSE scales linearly.
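The trade-offs above can be turned into a first back-of-envelope estimate to start the test-iterate-test loop. A minimal Python sketch, assuming a replication factor of 3, that only half of each node's disk is filled (leaving headroom for compaction), and a per-node throughput figure you have measured yourself on your own hardware; the function name, parameters, and defaults are hypothetical illustrations, not a DataStax formula. It takes the larger of a storage-driven and a throughput-driven node count.

```python
import math


def estimate_nodes(
    raw_data_tb: float,
    peak_ops_per_s: float,
    per_node_ops_per_s: float,
    disk_tb_per_node: float,
    replication_factor: int = 3,
    disk_fill_target: float = 0.5,
) -> int:
    """Rough node count for one data center (hypothetical sizing model)."""
    # Every replica is a full copy, so on-disk data is raw data x RF.
    stored_tb = raw_data_tb * replication_factor
    # Fill each disk only to disk_fill_target, leaving compaction headroom.
    storage_nodes = math.ceil(stored_tb / (disk_tb_per_node * disk_fill_target))
    # Assumes throughput grows linearly with node count.
    throughput_nodes = math.ceil(peak_ops_per_s / per_node_ops_per_s)
    # A data center needs at least RF nodes to hold RF distinct replicas.
    return max(storage_nodes, throughput_nodes, replication_factor)


# Hypothetical inputs: 4 TB raw data, 40,000 ops/s at peak,
# 10,000 ops/s measured per node, 2 TB of local SSD per node.
print(estimate_nodes(4, 40_000, 10_000, 2))  # 12: storage-bound, not throughput-bound
```

Here storage, not throughput, drives the answer (12 nodes versus 4), which is exactly the kind of trade-off worth surfacing early; rerun it as real test results replace the assumptions.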
What do you mean SSDs?
“We have an amazing SAN.”
Storage Matters
SSD (consumer grade)
• 10K – 1M IOPS
• 400 MB/s – 3 GB/s bandwidth
• < 200 µs latency
15K RPM HDD (spinning rust)
• ~200 IOPS
• ~160 MB/s bandwidth
• > 5 ms latency
✴ Acknowledgements to my colleague Kathryn Erickson.
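To see why these figures matter, consider how long a burst of random reads takes at each drive's IOPS ceiling. A minimal Python sketch using the slide's round numbers (about 200 IOPS for a 15K RPM HDD, and the bottom of the SSD range, 10K IOPS); the 1,000,000-read burst is a hypothetical workload, and real drives under mixed load will behave differently.

```python
def seconds_for_random_reads(num_reads: int, iops: int) -> float:
    """Time to serve num_reads random reads at a sustained IOPS rate."""
    return num_reads / iops


HDD_IOPS = 200         # ~200 IOPS, 15K RPM HDD (slide figure)
SSD_IOPS_LOW = 10_000  # low end of the consumer SSD range (slide figure)

reads = 1_000_000      # hypothetical burst of random reads
hdd_s = seconds_for_random_reads(reads, HDD_IOPS)
ssd_s = seconds_for_random_reads(reads, SSD_IOPS_LOW)
print(f"HDD: {hdd_s:,.0f} s, SSD: {ssd_s:,.0f} s, speed-up: {hdd_s / ssd_s:.0f}x")
```

Even at the bottom of the SSD range the same burst finishes 50 times sooner (100 s versus roughly 83 minutes), and the per-request latency gap (< 200 µs versus > 5 ms) compounds this for every individual read.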
Storage Interfaces Matter
Interface Transfer Rate
SATA III 6 Gb/s
SAS II 6 Gb/s
SAS III 12 Gb/s
PCIe Gen 2 x8 32 Gb/s
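Transfer rates are quoted in gigabits per second of line rate, so converting them to usable megabytes per second shows why a fast SSD can be throttled by its interface. A minimal Python sketch, assuming 8b/10b encoding (8 payload bits per 10 line bits, as used by SATA III, SAS II, SAS III, and PCIe Gen 2) and ignoring protocol overhead beyond encoding. The PCIe row above (32 Gb/s) is already the post-encoding figure: 8 lanes × 5 GT/s = 40 Gb/s raw, × 0.8 = 32 Gb/s.

```python
# Raw line rate per interface in Gb/s (PCIe Gen 2: 5 GT/s per lane x 8 lanes).
RAW_LINE_RATE_GBPS = {
    "SATA III": 6,
    "SAS II": 6,
    "SAS III": 12,
    "PCIe Gen 2 x8": 5 * 8,
}


def usable_mb_per_s(raw_gbps: int) -> int:
    """Payload bandwidth in MB/s, assuming 8b/10b encoding (80% efficiency)."""
    payload_mbps = raw_gbps * 1000 * 8 // 10  # megabits/s left after encoding
    return payload_mbps // 8                  # 8 bits per byte


for name, raw in RAW_LINE_RATE_GBPS.items():
    print(f"{name:<14} {usable_mb_per_s(raw):>5} MB/s")
```

This gives roughly 600 MB/s for SATA III and SAS II, 1,200 MB/s for SAS III, and 4,000 MB/s for PCIe Gen 2 x8, so an SSD capable of 3 GB/s only has room to breathe on the PCIe attachment.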
A Nondeterministic Path to Failure
• What about my incredible SAN?
– Do not use network-attached storage with DSE.
• But our SAN is awesome! We paid a lot of money for it.
– No! Do not use network-attached storage with DSE.
• Fine. What about EBS?
– Let’s discuss!
Starting Points
• DSE (read-heavy): 8-24 cores, 32-128 GB RAM ✴, local SSD (0.5-2 TB)
• DSE (write-heavy): 12-32 cores, 32-128 GB RAM, local SSD (1-3 TB)
• DSE + Search: 16-32 cores, 128 GB RAM, local SSD (1-3 TB)
• DSE + Analytics: 16-32 cores, 128+ GB RAM, local SSD (1-3 TB)
✴ Got extra RAM? Cache is king.
✴✴ 1 Gb Ethernet is fine; 10 Gb is future-proof.
We already have [MongoDB] for NoSQL.
What’s the difference?
“Behold the one true NoSQL database.”
NoSQL
Fantasy
Proof of Technology @ Big Irish Bank
Initial Goals
• Deploy on AWS
• Ingest ten years of (fake) customer data efficiently
• Fast retrieval & search
Synopsis
• Payment Services Directive (PSD II) and Open Banking
• Customer access to current and historical data via APIs
• Competitive PoT versus other database vendors
Hardware
PoT Recommendation
• 6 x i2.xlarge (AWS)
• 4 vCPU, 30.5 GB RAM
• 1 x 800 GB local SSD
PoT Mark 1
• c4.8xlarge (AWS)
• 36 vCPU, 60 GB RAM
• EBS only
PoT Final
• 6 x i2.xlarge (AWS)
• 4 vCPU, 30.5 GB RAM
• 1 x 800 GB local SSD
Production
• 8 nodes across 2 data centers (4:4)
• HP DL380 Gen9 ➔ 32 cores, 256 GB RAM, 3.2 TB SSDs on SAS III
• 10 Gb Ethernet, fiber between DCs
Lessons
1) The Node Count Dance is iterative.
• Initial node count estimates were low.
• Early refusal to modify AWS setup.
• Avoid rigidity. Test, iterate, test.
2) Quis custodiet ipsos custodes?
• Hit performance plateau at 5,000 ops/s.
• Added a second JMeter instance; performance doubled to 10,000 ops/s.
• JMeter was the bottleneck!
• Who will test the testers?
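One common reason a single load generator plateaus is that closed-loop tools such as JMeter only issue as many requests as their threads allow: by Little's law, throughput ≈ concurrent threads ÷ average response time. A minimal Python sketch of that ceiling; the 50 threads and 10 ms response time are hypothetical values chosen to show how one client could stall at 5,000 ops/s, not figures measured in this PoT.

```python
def closed_loop_ceiling(threads: int, avg_response_ms: float) -> float:
    """Max ops/s a closed-loop load generator can drive (Little's law: X = N / R).

    Each thread waits for a response before sending its next request, so
    throughput cannot exceed concurrency divided by response time.
    """
    return threads * 1000 / avg_response_ms


# Hypothetical single JMeter instance: 50 threads, 10 ms average response.
one_client = closed_loop_ceiling(threads=50, avg_response_ms=10)
# A second identical instance doubles concurrency; if the cluster still
# answers in ~10 ms, the ceiling doubles as well.
two_clients = closed_loop_ceiling(threads=100, avg_response_ms=10)
print(one_client, two_clients)
```

If measured throughput sits at this ceiling while the cluster's latency and CPU look healthy, suspect the client: add load generators (or threads) until throughput stops rising, and only then read the result as the database's limit.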
3) EBS is still network-attached.
• 99th percentile read latency (ms)
▫ 3.311 ➔ local SSD
▫ 35.425 ➔ EBS Provisioned IOPS SSD
• Competing vendor falsified numbers.
• Lies, damned lies, and statistics.
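The 99th-percentile figure is the latency that 99% of requests meet or beat, which is why a shared network path shows up so starkly in it even when averages look fine. A minimal Python sketch of the nearest-rank percentile method (one of several common definitions; benchmark tools may interpolate instead), followed by the ratio of the two p99 figures above. The sample latencies are made up for illustration.

```python
import math
import statistics


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up read latencies (ms): 98 fast reads and 2 slow ones.
latencies_ms = [2.0] * 98 + [40.0] * 2
print(statistics.mean(latencies_ms))              # the mean barely moves
print(percentile_nearest_rank(latencies_ms, 99))  # p99 lands on a slow read

# Ratio of the slide's p99 figures: EBS Provisioned IOPS SSD vs local SSD.
print(round(35.425 / 3.311, 1))
```

A 2% slow tail is nearly invisible in the mean (2.76 ms here) but defines p99 (40 ms); on the slide, the EBS p99 is roughly 10.7 times the local SSD p99.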
4) Not all data needs to be hot.
• PoT Mark 1 ➔ 10 years of hot data
▫ ~ 20 billion transactions
▫ ~ 30 nodes to reach latency targets
• PoT Final ➔ 2 years of hot data
• Do not architect by convenience.
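Shrinking the hot window can be estimated, and enforced, with simple arithmetic. A minimal Python sketch, assuming transactions are spread evenly across the ten years and that node count scales proportionally with hot data (both simplifications; real growth is rarely uniform). It also computes a TTL for a two-year window, which could be applied through the CQL table option `default_time_to_live` (in seconds; it applies to rows written after it is set). The table name `transactions_by_customer` is hypothetical, and older data would need a cold home elsewhere before it expires.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def scale_to_window(value: int, hot_years: int, total_years: int) -> float:
    """Scale a figure for total_years down to hot_years (assumes an even spread)."""
    return value * hot_years / total_years


def ttl_seconds(years: int, days_per_year: int = 365) -> int:
    """TTL covering `years` years, ignoring leap days."""
    return years * days_per_year * SECONDS_PER_DAY


hot_txns = scale_to_window(20_000_000_000, hot_years=2, total_years=10)
naive_nodes = scale_to_window(30, hot_years=2, total_years=10)
print(f"hot transactions: ~{hot_txns:,.0f}")
print(f"naive node estimate: ~{naive_nodes:.0f}")

# Hypothetical table name; default_time_to_live is a CQL table option in seconds.
print(f"ALTER TABLE transactions_by_customer WITH default_time_to_live = {ttl_seconds(2)};")
```

The naive proportional estimate lands at about 4 billion hot transactions and 6 nodes, the same node count as PoT Final, but treat that as a back-of-envelope cross-check rather than how the final sizing was derived.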
Production Pilot @ Big British Bank
Initial Goals
• Transition from mothballed trials of OrientDB and Titan
• Ingest enormous quantities of data from legacy DB
• Prove graph at scale
Synopsis
• Customer 360° use case across banking group
• DSE Graph
• Dissatisfied with other graph databases
Hardware
Pilot Mark 1
• “Private Cloud”
• N x Hosted VM
• 8 vCPU, 112 GB RAM
• SAN only (for now)
Pilot Mark 2
• “Hadoop Leftovers”
• 4 x HP DL380s
• 24 cores, 512 GB RAM
• 1 x 2.1 TB SSD
• 14 x 2 TB HDDs
Pilot Final
• 3 x Dell C6220
• 12 cores, 128 GB RAM
• 6 x 1 TB SATA HDDs
▫ 2 x OS
▫ 1 x commit log
▫ 3 x data, caches
Production Target
• 16 nodes across 2 data centers (8:8)
• HP DL380 Gen9 ➔ 24 cores, 528 GB RAM, 3.4 TB SSDs
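The Pilot Final layout (dedicated spindles for the OS, the commit log, and data plus caches) maps onto three `cassandra.yaml` settings: `commitlog_directory`, `data_file_directories`, and `saved_caches_directory`. Keeping the commit log on its own HDD lets its sequential appends avoid competing with random data reads for the same disk heads. A minimal Python sketch that renders those settings from a disk plan; the `/mnt/diskN` mount points are hypothetical, and the configuration files and defaults for the DSE version in use should be checked before applying anything like this.

```python
# Hypothetical mount points for the Pilot Final disk plan (6 x 1 TB SATA HDD):
# 2 disks for the OS (not referenced by Cassandra), 1 for the commit log,
# 3 for data files and saved caches.
DISK_PLAN = {
    "commitlog": ["/mnt/disk3"],
    "data": ["/mnt/disk4", "/mnt/disk5", "/mnt/disk6"],
}


def render_cassandra_yaml(plan: dict[str, list[str]]) -> str:
    """Render the storage-path settings of cassandra.yaml for a disk plan."""
    lines = [f"commitlog_directory: {plan['commitlog'][0]}/commitlog"]
    lines.append("data_file_directories:")
    lines.extend(f"    - {mount}/data" for mount in plan["data"])
    # Saved caches are small; keep them on the first data disk.
    lines.append(f"saved_caches_directory: {plan['data'][0]}/saved_caches")
    return "\n".join(lines)


print(render_cassandra_yaml(DISK_PLAN))
```

Spreading data across three directories lets compaction and reads use all three data spindles, which matters far more on spinning rust than on SSD.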
Lessons
1) DSE essentials are critical.
• Great team but zero DSE experience.
• Ad hoc education introduces risk.
• Walk before you run.
2) The Node Count Dance applies to Graph, too.
• Data size unknown due to privacy.
• Load 5% of data, extrapolate.
• Test, iterate, test.
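Extrapolating from a 5% load is simple arithmetic, but it rests on an assumption worth stating: that every measured quantity grows linearly with the amount of data loaded. A minimal Python sketch; the metric names and measured values are hypothetical. For a graph, where edge counts and index sizes may grow faster than the sample suggests, linear extrapolation could underestimate, so re-test at larger samples as the data allows.

```python
def extrapolate(measured: dict[str, float], sample_percent: int) -> dict[str, float]:
    """Scale metrics measured on a sample_percent% load up to 100% of the data.

    Assumes every metric grows linearly with data volume, which a graph
    (edges, indexes) may not do; re-test at larger samples before buying.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in (0, 100]")
    return {name: value * 100 / sample_percent for name, value in measured.items()}


# Hypothetical measurements after loading 5% of the data.
sample = {"disk_gb_per_node": 40.0, "load_hours": 3.0}
print(extrapolate(sample, 5))
```

With these made-up inputs the full load would need about 800 GB per node and 60 hours of loading; repeating the measurement at a second sample size is a cheap way to check whether the growth really is linear.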
3) Hardware matters, of course.
• Leftover Hadoop boxes, spinning rust.
• Get creative with configuration & tuning.
• “Under no circumstances should you do load tests on these boxes.”
4) Avoid surprises before deadlines.
• Upgraded from RHEL 6.7 to 7.1.
• CPU spikes made nodes unusably slow.
• Revert!
• Nobody move, nobody gets hurt.
Thank you!
Daniel Cohen
Solutions Engineer @ DataStax
daniel.cohen@datastax.com
@CodaAzzurra

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

DataStax Enterprise in the Field – 20160920

  • 1. DataStax Enterprise in the Field. Daniel Cohen, Solutions Engineer @ DataStax
  • 2. But Enough About Me…
      • Solutions Engineer at DataStax
      • LA ➜ SF ➜ NYC ➜ SF ➜ London
      • Previously at JP Morgan in London
      • Finance & digital media
      © DataStax, All Rights Reserved.
  • 5. 1 Introductions
      2 Top Customer Questions
      3 Field Lessons: Big Irish Bank
      4 Field Lessons: Big British Bank
  • 6. Top Customer Questions
      • What are all the other [banks] doing?
      • How many nodes do I need?
      • What do you mean SSDs?
      • How do I load data from [Oracle]?
      • We already have [MongoDB] for NoSQL. What’s the difference?
  • 7. What are all the other [banks] doing? “Tell me secrets about my competitors.”
  • 8. Transform Legacy Infrastructure
      [Diagram: Global Users; User Interface / Application Services; DataStax Enterprise (DSE) Cluster; Legacy Systems: USA Equities, USA FX, UK FX, UK Bonds, …]
  • 9. Transition Legacy to Microservices
      [Diagram: Users; µServices A, B, C, D (DC NY1) and A, B, Z (DC LDN1); Messages; DSE in DC NY1 and DC LDN1; Data: USA Customers, UK Accounts; Legacy]
  • 10. How many nodes do I need? “How long is a piece of string?”
  • 19. The Node Count Dance
      • “How many nodes do I need?” is a natural question.
          – Large organizations buy hardware months in advance.
      • Desires ➔ Storage, Throughput, Latency, SLAs
      • Realities
          – Cost
          – Data center capacity (space)
          – Operational capacity (people)
          – Your hardware
          – Your use cases
      • Lesson 1 ➔ Computer science is about trade-offs.
      • Lesson 2 ➔ Test, iterate, test.
      • Lesson 3 ➔ Good news! DSE scales linearly.
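The Node Count Dance starts as back-of-the-envelope arithmetic that testing then corrects. A minimal sketch in Python, assuming throughput scales linearly with node count (Lesson 3) and that the per-node storage and throughput figures come from your own benchmarks; `estimate_nodes` and all of its parameters are hypothetical, not part of any DataStax sizing tool.

```python
import math


def estimate_nodes(raw_data_gb: float, replication_factor: int,
                   usable_gb_per_node: float, target_ops_per_sec: float,
                   ops_per_sec_per_node: float) -> int:
    """Hypothetical first-pass node count for a single data center.

    usable_gb_per_node is the disk you are willing to fill on each node
    (assumption: leave headroom for compaction), and ops_per_sec_per_node
    is a measured figure from your own hardware and workload.
    """
    # Every replica occupies disk, so storage demand is multiplied by RF.
    storage_nodes = math.ceil(raw_data_gb * replication_factor / usable_gb_per_node)
    # Assuming linear scaling, throughput demand divides evenly across nodes.
    throughput_nodes = math.ceil(target_ops_per_sec / ops_per_sec_per_node)
    # The larger requirement wins; RF replicas need at least RF nodes.
    return max(storage_nodes, throughput_nodes, replication_factor)


# Illustrative inputs only: 2 TB raw data, RF 3, 1 TB usable per node,
# 50,000 ops/s target, 10,000 ops/s measured per node.
print(estimate_nodes(2000, 3, 1000, 50_000, 10_000))  # 6
```

Treat the result as the opening bid in the dance: load-test that many nodes, measure, and adjust the inputs.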
  • 20. What do you mean SSDs? “We have an amazing SAN.”
  • 21. Storage Matters
      SSD (consumer grade)
          • 10K – 1M IOPS
          • 400 MB/s – 3 GB/s bandwidth
          • < 200 µs latency
      15K RPM HDD (spinning rust)
          • ~200 IOPS
          • ~160 MB/s bandwidth
          • > 5 ms latency
      ✴ Acknowledgements to my colleague Kathryn Erickson.
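The IOPS gap on the Storage Matters slide is easiest to feel as wall-clock time. A minimal sketch in Python using the slide's figures, assuming an idealized single device with no caching or parallelism; the helper name is hypothetical.

```python
def seconds_for_random_reads(num_reads: int, iops: float) -> float:
    """Idealized time to serve num_reads random reads at a sustained IOPS
    rate, ignoring caches, queue depth, and parallelism across disks."""
    return num_reads / iops


HDD_IOPS = 200      # 15K RPM HDD, ~200 IOPS (from the slide)
SSD_IOPS = 10_000   # consumer SSD, low end of the 10K-1M IOPS range

reads = 1_000_000
hdd = seconds_for_random_reads(reads, HDD_IOPS)
ssd = seconds_for_random_reads(reads, SSD_IOPS)
print(f"HDD: {hdd:.0f} s, SSD: {ssd:.0f} s, ratio: {hdd / ssd:.0f}x")
# HDD: 5000 s, SSD: 100 s, ratio: 50x
```

Even at the bottom of the SSD range, a million random reads take well over an hour on one spinning disk and under two minutes on flash.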
  • 22. Storage Interfaces Matter
      Interface        Transfer Rate
      SATA III         6 Gb/s
      SAS II           6 Gb/s
      SAS III          12 Gb/s
      PCIe Gen 2 x8    32 Gb/s
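The interface rates in gigabits per second are not directly comparable with SSD bandwidth in megabytes per second. A rough conversion in Python, assuming the SATA and SAS figures are raw line rates with 8b/10b encoding (10 wire bits per data byte) while the PCIe Gen 2 x8 figure is already net of encoding (8 lanes x 5 GT/s x 8/10); the helper and the comparison threshold are illustrative, and protocol overhead beyond line encoding is ignored.

```python
# Interface -> (rate in Gb/s, True if the rate is a raw 8b/10b line rate).
INTERFACES = {
    "SATA III": (6, True),
    "SAS II": (6, True),
    "SAS III": (12, True),
    "PCIe Gen 2 x8": (32, False),  # 32 Gb/s is already after 8b/10b
}


def payload_mb_per_sec(gbps: float, raw_8b10b: bool) -> float:
    """Approximate MB/s, ignoring protocol overhead beyond line encoding."""
    bits_per_byte = 10 if raw_8b10b else 8
    return gbps * 1000 / bits_per_byte


SSD_PEAK_MB_PER_SEC = 3000  # upper end of the consumer SSD range (3 GB/s)

for name, (gbps, raw) in INTERFACES.items():
    mb = payload_mb_per_sec(gbps, raw)
    verdict = "fits" if mb >= SSD_PEAK_MB_PER_SEC else "bottleneck"
    print(f"{name:14} ~{mb:6.0f} MB/s  {verdict}")
```

Under these assumptions SATA III tops out around 600 MB/s and SAS III around 1,200 MB/s, so only the PCIe link leaves room for a fast SSD's full bandwidth.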
  • 30. A Nondeterministic Path to Failure
      • What about my incredible SAN?
          – Do not use network-attached storage with DSE.
      • But our SAN is awesome! We paid a lot of money for it.
          – No! Do not use network-attached storage with DSE.
      • Fine. What about EBS?
          – Let’s discuss!
  • 31. Starting Points
      Workload             CPU           RAM           Storage
      DSE (Read Heavy)     8-24 cores    32-128 GB ✴   Local SSD (0.5-2 TB)
      DSE (Write Heavy)    12-32 cores   32-128 GB     Local SSD (1-3 TB)
      DSE + Search         16-32 cores   128 GB        Local SSD (1-3 TB)
      DSE + Analytics      16-32 cores   128+ GB       Local SSD (1-3 TB)
      ✴ Got extra RAM? Cache is king.
      ✴✴ 1 Gb Ethernet is fine. 10 Gb is future-proof.
  • 32. We already have [MongoDB] for NoSQL. What’s the difference? “Behold the one true NoSQL database.”
  • 33. NoSQL
  • 37. NoSQL Fantasy
  • 40. Proof of Technology @ Big Irish Bank
      Initial Goals
          • Deploy on AWS
          • Ingest ten years of (fake) customer data efficiently
          • Fast retrieval & search
      Synopsis
          • Payment Services Directive (PSD II) and Open Banking
          • Customer access to current and historical data via APIs
          • Competitive PoT versus other database vendors
  • 47. Hardware
      PoT Mark 1
          • c4.8xlarge (AWS)
          • 36 vCPU, 60 GB RAM
          • EBS only
      PoT Recommendation
          • 6 x i2.xlarge (AWS)
          • 4 vCPU, 30.5 GB RAM
          • 1 x 800 GB local SSD
      PoT Final
          • 6 x i2.xlarge (AWS)
          • 4 vCPU, 30.5 GB RAM
          • 1 x 800 GB local SSD
      Production
          • 8 nodes across 2 data centers (4:4)
          • HP DL380 Gen9 ➔ 32 cores, 256 GB RAM, 3.2 TB SSDs on SAS III
          • 10 Gb Ethernet, fiber between DCs
  • 53. Lessons
      1) The Node Count Dance is iterative.
          • Initial node count estimates were low.
          • Early refusal to modify the AWS setup.
          • Avoid rigidity. Test, iterate, test.
      2) Quis custodiet ipsos custodes? (Who watches the watchmen?)
          • Hit a performance plateau at 5,000 ops/s.
          • Added a second JMeter instance; performance doubled to 10,000 ops/s.
          • JMeter was the bottleneck!
          • Who will test the testers?
      3) EBS is still network-attached.
          • 99th percentile read latency (milliseconds)
              ▫ 3.311 ➔ local SSD
              ▫ 35.425 ➔ EBS Provisioned SSD
          • Competing vendor falsified numbers.
          • Lies, damned lies, and statistics.
      4) Not all data needs to be hot.
          • PoT Mark 1 ➔ 10 years of hot data
              ▫ ~20 billion transactions
              ▫ ~30 nodes to reach latency targets
          • PoT Final ➔ 2 years of hot data
          • Do not architect by convenience.
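Lesson 4 ("Not all data needs to be hot") is largely arithmetic. A minimal sketch in Python, assuming transactions arrive at a steady rate over the ten years and that node count scales linearly with the volume of hot data; `hot_window_estimate` is hypothetical. Under those assumptions the Lessons slide's own figures land on about six nodes, the same count as the PoT Final cluster, although the slide does not say the final size was derived this way.

```python
import math


def hot_window_estimate(total_records: float, total_years: int,
                        hot_years: int, nodes_for_total: int):
    """Scale a record count and node count down to a shorter hot window.

    Assumptions: records arrive at a steady rate over time, and node count
    scales linearly with the volume of hot data.
    """
    hot_records = total_records * hot_years / total_years
    hot_nodes = math.ceil(nodes_for_total * hot_years / total_years)
    return hot_records, hot_nodes


# Figures from the Lessons slide: ~20 billion transactions and ~30 nodes
# for 10 years of hot data; PoT Final kept 2 years hot.
records, nodes = hot_window_estimate(20e9, 10, 2, 30)
print(f"~{records / 1e9:.0f} billion hot transactions, ~{nodes} nodes")
# ~4 billion hot transactions, ~6 nodes
```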
  • 55. Production Pilot @ Big British Bank
      Initial Goals
          • Transition from mothballed trials of OrientDB and Titan
          • Ingest enormous quantities of data from a legacy DB
          • Prove graph at scale
      Synopsis
          • Customer 360° use case across the banking group
          • DSE Graph
          • Dissatisfied with other graph databases
  • 63. Hardware
      Pilot Mark 1 (“Private Cloud”)
          • N x Hosted VM
          • 8 vCPU, 112 GB RAM
          • SAN only (for now)
      Pilot Mark 2 (“Hadoop Leftovers”)
          • 4 x HP DL380s
          • 24 cores, 512 GB RAM
          • 1 x 2.1 TB SSD
          • 14 x 2 TB HDDs
      Pilot Final
          • 3 x Dell C6220
          • 12 cores, 128 GB RAM
          • 6 x 1 TB SATA HDDs
              ▫ 2 x OS
              ▫ 1 x commit log
              ▫ 3 x data, caches
      Production Target
          • 16 nodes across 2 data centers (8:8)
          • HP DL380 Gen9 ➔ 24 cores, 528 GB RAM, 3.4 TB SSDs
  • 69. Lessons
      1) DSE essentials are critical.
          • Great team but zero DSE experience.
          • Ad hoc education introduces risk.
          • Walk before you run.
      2) The Node Count Dance applies to Graph.
          • Data size unknown due to privacy.
          • Load 5% of data, extrapolate.
          • Test, iterate, test.
      3) Hardware matters, of course.
          • Leftover Hadoop boxes, spinning rust.
          • Get creative with configuration & tuning.
          • “Under no circumstances should you do load tests on these boxes.”
      4) Avoid surprises before deadlines.
          • Upgraded from RHEL 6.7 to 7.1.
          • CPU spikes made nodes unusably slow.
          • Revert!
          • Nobody move, nobody gets hurt.
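"Load 5% of data, extrapolate" can be written down as a small projection. A minimal sketch in Python, assuming the sample is representative, on-disk size grows linearly with data volume, and each node is filled only to a chosen fraction of its disk; `extrapolate_from_sample`, `max_fill`, and the example numbers are hypothetical, not DSE settings or measurements from this pilot.

```python
import math


def extrapolate_from_sample(sample_gb: float, sample_percent: float,
                            disk_gb_per_node: float, max_fill: float = 0.5):
    """Project full on-disk size and node count from a partial load.

    sample_gb: total on-disk size across the cluster after loading the sample
    (so replication is already included). max_fill is an assumed headroom
    policy, not a DSE setting.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in (0, 100]")
    projected_gb = sample_gb * 100 / sample_percent
    nodes = math.ceil(projected_gb / (disk_gb_per_node * max_fill))
    return projected_gb, nodes


# Hypothetical numbers: a 5% load occupies 60 GB across the cluster.
projected, nodes = extrapolate_from_sample(60, 5, 1000)
print(f"projected {projected:.0f} GB, ~{nodes} nodes")  # projected 1200 GB, ~3 nodes
```

Repeating the load at 10% and 20% tests the linearity assumption: if size per percent changes between runs (graph indexes, for example, may not grow linearly), the projection needs another iteration.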
  • 70. Thank you! Daniel Cohen, Solutions Engineer @ DataStax, daniel.cohen@datastax.com, @CodaAzzurra