Extreme Data Velocity
Continuous Availability
Operational Simplicity
Michael Shaler
Senior Director, Business Development

©2013 DataStax Confidential. Do not distribute without consent.
What is Big Data’s payoff?
DataStax: CRN’s “10 Coolest Big Data Startups”
Cassandra: InfoWorld’s Technology of the Year

1,000+ production deployments and 300 customers
$84M in funding from industry-leading investors
We are the first viable alternative to
Oracle for modern online
applications.

We seek to be the first and best
choice in databases.
No, Seriously…
Real-world Use Cases
Internet of Things Database Requirements
• “UTC subject predicate”: Time series data and metadata are the lingua franca of
sensors/device data communications
• FAST AND ALWAYS ON: High-velocity ingest rates from geographically dispersed inputs
with variable schemas/data models are the norm, and unless you tell them to do so, sensors
never, ever sleep…
• HOT AND COLD: Real-time data and analytics vs. data reservoir/data factory needs vary.

• DHTs: Wide-row column-oriented distributed hash tables are the optimal home for IoT
operational datastores
• AND: Other key functionality needed includes indexed search, along with both batch and real-time analytics, with data-in-flight and data-at-rest security an emerging need
• SPOILER ALERT: DataStax Enterprise supports all of the above
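
The wide-row layout named in the DHTs bullet can be sketched in a few lines. This is a minimal in-memory illustration, not Cassandra or DataStax Enterprise code. It assumes a hypothetical partition key of (sensor id, UTC day), so each partition holds one day of readings for one sensor, kept sorted by timestamp. All names here (`WideRowStore`, `partition_key`, `meter-42`) are invented for the example.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id, ts):
    """Bucket readings by sensor and UTC day (an assumed bucketing scheme)."""
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


class WideRowStore:
    """Toy stand-in for a wide-row table: one sorted row per partition key."""

    def __init__(self):
        self._rows = defaultdict(list)

    def write(self, sensor_id, ts, predicate, value):
        # Each cell is (timestamp, predicate, value), echoing "UTC subject predicate".
        insort(self._rows[partition_key(sensor_id, ts)], (ts, predicate, value))

    def read_range(self, sensor_id, day, start, end):
        """Return cells for one sensor and day with start <= timestamp < end."""
        row = self._rows.get((sensor_id, day), [])
        return [cell for cell in row if start <= cell[0] < end]


store = WideRowStore()
utc = timezone.utc
# Writes may arrive out of order; the row stays sorted by timestamp.
store.write("meter-42", datetime(2013, 6, 12, 0, 30, tzinfo=utc), "kwh", 0.31)
store.write("meter-42", datetime(2013, 6, 12, 0, 15, tzinfo=utc), "kwh", 0.27)
store.write("meter-42", datetime(2013, 6, 13, 0, 0, tzinfo=utc), "kwh", 0.40)

cells = store.read_range(
    "meter-42", "2013-06-12",
    datetime(2013, 6, 12, 0, 0, tzinfo=utc),
    datetime(2013, 6, 12, 1, 0, tzinfo=utc),
)
```

Bucketing by day keeps any single row bounded in size while still letting a time-range query touch only one partition; the right bucket width in practice depends on the ingest rate.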

Time Series Analytics: 70B readings
Smart Grid Proof of Concept: Analyze 2 years of Smart Meter data for 1M households
Improvements in demand forecasting could yield EBITDA > $100M per GW saved
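
The 70B figure in the slide title is consistent with simple arithmetic under one assumption the slide does not state: that each meter reports every 15 minutes (96 readings per day). The sketch below, with hypothetical variable names, just multiplies it out.

```python
HOUSEHOLDS = 1_000_000        # from the slide: 1M households
DAYS = 2 * 365                # from the slide: 2 years of data
READINGS_PER_DAY = 24 * 4     # assumption: one reading every 15 minutes

total_readings = HOUSEHOLDS * DAYS * READINGS_PER_DAY
print(f"{total_readings / 1e9:.1f}B readings")
```

A different reporting interval would scale the total proportionally, so the interval is the number to confirm before sizing a cluster from this headline.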

• $5M CAPEX
• 10 man/months delivery (Deploy, DevOps, Tuning)
• Ongoing OPEX of > $1M

• $450K OPEX
• 2 DevOps running 15 AWS nodes
• Faster performance in 2 weeks
• …All in the cloud
Major Changes: The Evolving Data Center

[Diagram: three LOB apps; Oracle; MySQL; SQL Server; Data Warehouse (Teradata/Exadata)]
• “What’s Happening?” Hyper Velocity, Transactional: NoSQL
• “What Happened?” Massive Volume, Bit Bucket: Hadoop
The Application World *HAS* Changed
Common Use Cases

• Web product searches
• Internal document search (law firms, etc.)
• Real estate/property searches
• Social media match ups
• Web & application log management / analysis
• Big data OLTP and write intensive systems
• Time series data management
• High velocity device data consumption and analysis
• Healthcare systems input and analysis
• Media streaming (music, movies, etc.)
• Online Web retail (shopping carts, user transactions, etc.)
• Online gaming (real-time messaging, etc.)
• Real time data analytics
• Web click-stream analysis
• Buyer event and behavior analytics
• Fraud detection and analysis
• Risk analysis and management
• Social media input and analysis
• Supply chain analytics
Continuous Availability Commentary
Cassandra: Architecture as Foundation
[Diagram: Virginia, Santa Clara, London, Sydney]
The New DR: Simian Army “Dystopia as a Service”
[Diagram: Virginia, London, Santa Clara, Sydney]
Heterogeneous Workloads: Active Everywhere
[Diagram: sites Virginia, London, Santa Clara, Sydney; workload labels Read, Write, Search, Analyze]
Our Product Solution

• DataStax Enterprise
powers the big data apps
that transform business.
• Extreme Data Velocity
• Continuous Availability
• Operational Simplicity
Operational Simplicity

33M streaming customers
2T API calls/year
~1,200 Servers
55 AWS clusters
12 developers
4 operators
0 New data centers
©2012 DataStax

“Our primary operational data store
is now Cassandra, not Oracle.”
Performance: NoSQL Leadership

Cassandra vs. HBase:

• 10x more read throughput
• 100x faster read latency
• 8x more write throughput
• 8x faster scan latency
• 4x more scan throughput

Source: Solving Big Data Challenges for Enterprise Application Performance Management
Tilmann Rabl, University of Toronto, et al., VLDB 2012 (August 2012, Istanbul)
Performance: NoSQL Leadership
[Charts: YCSB Load Process; YCSB Read-mostly; YCSB Read-write mix; YCSB Write-mostly]
From STB to the Scalable Cloud Message Bus

Use Case: X1 Sports App

[Chart: API/sec (0 to 18,000) vs. ring size (4 to 24)]

Even in preproduction environment prior to tuning, achieved near-linear scalability

Enabling a richer active consumer experience across multiple devices, multiple platforms
Instagram Scales Engaged Networks
• Transitioned from Redis (in-memory cache) to
Cassandra in Amazon Web Services EC2
• Doubled cluster—and then doubled again—to support
150MM users on new infrastructure
• Continue to scale in spite of Justin Bieber storms, video
formats, new features, new markets
CASSANDRA AT INSTAGRAM
Rick Branson, Infrastructure Engineer
@rbranson
commit acb02daea57dca889c2aa45963754a271fa51566
Author: Rick Branson
Date: Sun Feb 10 20:36:34 2013 -0800
Doubled C* cluster

2013 Cassandra Summit
#cassandra13
June 12, 2013
San Francisco, CA
Our Vision

DataStax is driving
Cassandra to be the first
viable alternative to the
Oracle database for
companies who are
transforming the way they
interact with customers.

Getting ahead of exploding growth
• Sign big, new contracts all the time (ESPN)
• 200M unique users per month
• 40TB of data

Flexible architecture
• “Couldn’t shoehorn RDBMS technology”

Very small operations team
• 3 people
• 20 clusters
• 100’s of nodes
Why We Exist

Today’s applications must be
always available and lightning
fast as they scale to previously
unimaginable levels.
Cassandra delivers both with a
beautifully simple and elegant
architecture.

“We need a real-time, massively
scalable architecture, where no
one node is a single point of
failure, that can easily span
multiple data centers and cloud
availability zones, and that’s
Cassandra.”
What We Do Best

Cassandra was designed to do
things that are impossible in
other databases when it comes
to availability and
performance. Forget about
losing a machine here or there: Cassandra delivers a world
where you can lose an entire
datacenter and still perform as
your customers expect.

“We have to be ready for disaster
recovery all the time. It’s really
great that Cassandra allows for
active-active multiple data centers
where we can read and write
anywhere”
Jay Patel
Technical Architect at eBay
(Describing why they switched from legacy
relational architecture)
The Modern “Application”
The Modern “Application”
Fraud Detection and Prevention
What It Means In Real Life
What It Means In Real Life
Cassandra Summit SF 2013
Real Growth In Production
We are the first viable
alternative to Oracle for
modern online applications.
Thank You

We power the big data apps
that transform business.

DataStax OpsCenter 4.0

Security in Cassandra

FEATURE: Internal Authentication
Manages login IDs and passwords inside the database
BENEFITS:
+ Ensures only authorized users can access a database system using internal validation
+ Simple to implement and easy to understand
+ No learning curve from the relational world

FEATURE: Object Permission Management
Controls who has access to what and who can do what in the database
BENEFITS:
+ Provides granular control over who can add/change/delete/read data
+ Uses familiar GRANT/REVOKE from relational systems
+ No learning curve

FEATURE: Client to Node Encryption
Protects data in flight to and from a database cluster
BENEFITS:
+ Ensures data cannot be captured/stolen en route to a server
+ Data is safe both in flight from/to a database and on the database; complete coverage is ensured
Advanced Security in DataStax Enterprise

FEATURE: External Authentication
Uses external security software packages to control security
BENEFITS:
+ Only authorized users have access to a database system using external validation
+ Uses most trusted external security packages (Kerberos, LDAP), mainstays in government and finance
+ Single sign on to all data domains

FEATURE: Transparent Data Encryption
Encrypts data at rest
BENEFITS:
+ Protects sensitive data at rest from theft and from being read at the file system level
+ No changes needed at application level
+ Can encrypt both Cassandra and Hadoop data

FEATURE: Data Auditing
Provides a trail of who did and looked at what, and when
BENEFITS:
+ Supplies admins with an audit trail of all accesses and changes
+ Granular control to audit only what’s needed
+ Uses log4j interface to ensure performance and efficient audit operations
