This presentation was used in a session by Hari Mankude, CTO at Talena Inc. He discussed common misconceptions and the technical challenges associated with key Cassandra data management processes around scalable backup, recovery and test data management.
3. Confidential and Proprietary3
Why Bother With Backup and Test Data Mgmt?
The average cost of a data loss incident is $900,000
90% of enterprises delay applications because of a lack of test data
• Source: EMC, Talena
4. Myth #1: Data Replicas Prevent Data Loss
[Diagram: cluster nodes N1, N2, N3, N4]
• Human errors: dropping a column of a table
• Application corruption: incorrect updates to a column
6. Myth #3: Cassandra snapshots are an effective backup strategy
• PROBLEM: Snapshots result in storage amplification due to compaction
• PROBLEM: Need scheduler to take timely snapshots & delete older restore points
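To make the scheduling problem concrete, here is a minimal sketch of the bookkeeping a snapshot scheduler needs on each node: tag a new snapshot and work out which older restore points have expired. The tag format, the 7-day retention window, and the function names are assumptions for illustration. The sketch only builds `nodetool snapshot` and `nodetool clearsnapshot` command lines; it does not run them, and it does not address storage amplification.

```python
from datetime import datetime, timedelta

TAG_FORMAT = "backup-%Y%m%dT%H%M"  # hypothetical tag naming scheme


def snapshot_tag(taken_at):
    """Build a sortable snapshot tag from a timestamp."""
    return taken_at.strftime(TAG_FORMAT)


def expired_tags(tags, now, keep_days=7):
    """Return the tags whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    return [t for t in tags if datetime.strptime(t, TAG_FORMAT) < cutoff]


def plan_commands(keyspace, existing_tags, now, keep_days=7):
    """Return the nodetool invocations (as argument lists) for one node."""
    # Take a fresh snapshot first, then clear restore points past retention.
    commands = [["nodetool", "snapshot", "-t", snapshot_tag(now), keyspace]]
    for tag in expired_tags(existing_tags, now, keep_days):
        commands.append(["nodetool", "clearsnapshot", "-t", tag, keyspace])
    return commands
```

A real scheduler would also have to run this on every node at roughly the same time and handle failures part-way through, which is where most of the operational effort goes.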
7. Myth #4: Restoring from snapshots is trivial
• PROBLEM: When your cluster size changes due to addition or deletion of nodes
• PROBLEM: If you have config (e.g., compaction policy) or name changes
• PROBLEM: Scaling your restore to hundreds of nodes
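The cluster-size problem can be illustrated with a toy token ring. The sketch below is a deliberately simplified model, not Cassandra's actual partitioner: it assumes one evenly spaced token per node (no virtual nodes), an MD5-based key hash, and that tokens are re-spaced when the node count changes. It measures how many keys end up owned by a different node index after the topology changes, which is why snapshot files taken per node cannot simply be copied back node-for-node.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy ring; an assumption for this sketch


def node_tokens(node_count):
    """Evenly spaced tokens, one per node; the last token is the ring maximum."""
    return [(i + 1) * RING_SIZE // node_count - 1 for i in range(node_count)]


def key_token(key):
    """Hash a key onto the ring using the first 4 bytes of its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(key, tokens):
    """Index of the first node whose token is >= the key's token."""
    idx = bisect.bisect_left(tokens, key_token(key))
    return idx % len(tokens)  # guard; the last token already covers the ring end


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose owning node index differs between topologies."""
    old_tokens = node_tokens(old_count)
    new_tokens = node_tokens(new_count)
    moved = sum(1 for k in keys if owner(k, old_tokens) != owner(k, new_tokens))
    return moved / len(keys)
```

Calling `moved_fraction(keys, 4, 5)` on a sample of keys shows that a large share of the data no longer belongs where the snapshot left it, so a restore has to re-route data rather than replace files in place.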
9. Myth #6: Test Data Management Is A Simple Process
• Change Request - 1 week
• Provision Production Data - 1 week
• Create Test DB and Mask Data - 1 week
• Create Samples of Production Data - 2 days
• Push Production Data To Test - Hours
• Repeat Process - 3-4 weeks
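The masking and sampling steps in this process can be sketched as follows. This is an illustrative example only, not Talena's method: the row layout (dicts with an `id` field), the keyed-hash pseudonyms, and the hash-bucket sampling rule are all assumptions chosen to keep the example self-contained.

```python
import hashlib
import hmac


def mask_value(value, secret):
    """Replace a PII value with a stable pseudonym (keyed HMAC-SHA256)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked-" + digest[:12]


def in_sample(row_key, percent):
    """Deterministically keep roughly `percent` of rows, chosen by key hash."""
    bucket = int(hashlib.sha256(row_key.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent


def build_test_rows(rows, pii_columns, secret, percent=10):
    """Sample production rows and mask the named PII columns in the copies."""
    test_rows = []
    for row in rows:
        if not in_sample(row["id"], percent):
            continue
        masked = dict(row)  # copy, so the production row is left untouched
        for column in pii_columns:
            if column in masked:
                masked[column] = mask_value(str(masked[column]), secret)
        test_rows.append(masked)
    return test_rows
```

Because both the mask and the sample are derived from hashes, repeating the process yields the same pseudonyms and the same subset of rows, which keeps test runs comparable.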
11. Talena in Production
[Diagram components: Talena GUI; Talena Smart Storage Cluster; Hadoop/Spark, Cassandra, Vertica, and Couchbase clusters; Test and Research clusters]
12. The Talena Architecture
• Deep de-duplication and compression with app-aware architecture
• Incremental-forever backup architecture
• High availability via erasure coding in distributed cluster architecture
[Diagram layer: Smart Storage Optimizer]
13. The Talena Architecture
• Native querying and analytics via active compute layer
• Unbounded scale with a Hadoop-native architecture
[Diagram layers: Smart Storage Optimizer; Active Compute Services; Distributed File System]
14. The Talena Architecture
• Google-like catalog shortens data recovery time
• Automatic schema generation for mirroring and backups
• Granular recovery at an object level
• Recovery to multiple topologies
• Native integration with LDAP and Kerberos for authentication
• Role-based access control defines specific privileges
• Transparent data encryption
• Masking for PII data
[Diagram layers: Smart Storage Optimizer; Active Compute Services; Distributed File System; Metadata Catalog; Data Orchestration Services; Security Services]
15. The Talena Architecture
• ‘Single pane of glass’ for multiple use cases and data platforms
• Agentless architecture minimizes management overhead
• GUI, CLI, REST-based Talena API options
[Diagram layers: GUI, CLI, API; Smart Storage Optimizer; Active Compute Services; Distributed File System; Metadata Catalog; Data Orchestration Services; Security Services]
16. Q&A
• We’ll send you a link to our eBook “The Cassandra Backup Guide”
• Additional resources: talena-inc.com/resources and talena-inc.com/blog
• Ping us with any additional questions: info@talena-inc.com
----- Meeting Notes (9/1/16 13:59) -----
Change the slide
----- Meeting Notes (9/1/16 16:03) -----
Add sampling bullet point
Then push sampled data to test
Add repeat bucket
Starting over 20 years ago, the traditional database market became the foundation of enterprise applications. A whole ecosystem of data management products emerged to provide capabilities like backup/recovery (Veritas), storage pooling (Data Domain), test/dev management (Delphix), and archiving (Iron Mountain). But companies had to purchase separate products to assemble a full data management solution for their enterprise.
Over the past few years, modern data platforms have started to become the new hubs of enterprise applications, and that shift will continue for the foreseeable future. These platforms need data management capabilities too, just as traditional databases did.
(Click for build) Our vision is to help companies with their critical data management needs in a single software product, one that is optimized specifically for these modern Big Data environments.
The next few slides will introduce the unique Talena architecture and highlight how this architecture delivers on these core business benefits.
One of the most significant components of our architecture is our Smart Storage Optimizer.
By integrating compute and storage management into our storage optimizer, we’re able to deliver significant cost savings. Our application-aware architecture enables us to do deep de-duplication and compression. Our backup process is incremental-forever, saving on storage costs, and by incorporating erasure coding we also ensure high availability no matter how large a Talena cluster you choose to deploy.
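As a rough illustration of why de-duplication and incremental-forever backups save storage, here is a minimal sketch of a content-addressed chunk store. It is a generic technique under stated assumptions, not Talena's implementation: chunks are fixed-size, identified by SHA-256, and held in memory, and there is no compression or erasure coding. The class and function names are hypothetical.

```python
import hashlib


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        """Store a chunk if unseen; return (digest, whether it was new)."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.chunks
        if is_new:
            self.chunks[digest] = data
        return digest, is_new


def backup(store, files, chunk_size=4):
    """Back up {name: bytes}; return (manifest, bytes newly written)."""
    manifest = {}
    new_bytes = 0
    for name, content in files.items():
        digests = []
        for start in range(0, len(content), chunk_size):
            piece = content[start:start + chunk_size]
            digest, is_new = store.put(piece)
            if is_new:
                new_bytes += len(piece)
            digests.append(digest)
        manifest[name] = digests
    return manifest, new_bytes


def restore(store, manifest):
    """Rebuild every file in a manifest from the stored chunks."""
    return {
        name: b"".join(store.chunks[d] for d in digests)
        for name, digests in manifest.items()
    }
```

Each backup run produces a full manifest, so any restore point can be rebuilt on its own, yet only chunks not already in the store consume new space. That is the sense in which every backup after the first behaves as an incremental.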
Talena supports transparent data encryption as part of its security services.
Our agentless architecture makes Talena an ideal solution for big data architectures and minimizes your operational overhead. Furthermore, a single Talena deployment can support multiple data platforms, versions, and use cases, thereby providing a “single pane of glass” for all your big data management needs.
While most of our clients work within our user interface, we also provide a REST-based API to accomplish the same tasks.