These slides are from a webinar that featured a discussion on ways companies can address shifts in big data infrastructure and design the appropriate data management architecture for optimal performance and scale. It covers lessons gleaned from real customer challenges and implementations, offering attendees practical advice on the kinds of design decisions that can optimize the protection and management of modern data platforms, from Hadoop to NoSQL databases.
2. Confidential and Proprietary
Webinar Goals
• Outline key architecture principles for backup/recovery & test data management in a modern data world
• Illustrate the importance of these principles using real customer examples
• Identify specific tradeoffs that can be made when deploying your data management infrastructure
3. Attributes of Modern Data Platforms
• Scale out to petabyte workloads
• Analytics-driven intelligence
• Storage optimization across diverse data & environments
• Minimize copies
• Create storage and compute pools on commodity H/W
4. Why Incremental Forever: Backups

Approach             Day 1   Days 2-7      Day 8         Days 9-14     Day 15
Traditional          Full    Incremental   Full          Incremental   Full
Big data platform    Full    Incremental   Incremental   Incremental   Incremental
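The savings from dropping periodic fulls can be shown with simple arithmetic. A minimal sketch, using illustrative figures (a 1 TB full backup and 50 GB of daily change):

```python
# Total data moved over 15 days under the two schedules above.
# Sizes are illustrative assumptions, not measured values.
FULL_GB = 1000          # size of one full backup
INCR_GB = 50            # size of one daily incremental

def traditional(days, full_every=7):
    # Full backup on day 1 and every 7th day after; incrementals otherwise.
    return sum(FULL_GB if d % full_every == 1 else INCR_GB
               for d in range(1, days + 1))

def incremental_forever(days):
    # One full on day 1, then incrementals from that point on.
    return FULL_GB + INCR_GB * (days - 1)

print(traditional(15))          # fulls land on days 1, 8, 15
print(incremental_forever(15))
```

With these assumptions the traditional schedule moves 3.6 TB in 15 days versus 1.7 TB for incremental-forever, and the gap widens as the retention window grows.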
5. Why Incremental Forever: Restores

Day                             1       2         3         ...   80       81
Data size                       1 TB    1.02 TB   1.03 TB   ...   1.2 TB   Developer error
Changed data from last backup   -       50 GB     50 GB     ...   50 GB
Backup type                     Full    Incr      Incr      ...   Incr

Data recovered by traditional approach: 1 TB + 79 x 50 GB = 4.95 TB
Data recovered by big data approach: 1.2 TB

Key concept: the notion of a "virtualized full" image
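The "virtualized full" idea can be sketched as overlaying each incremental's changed blocks onto the base image, so a restore reads only the current 1.2 TB rather than the base full plus every incremental. The block-level dicts below are an illustrative stand-in for real backup images:

```python
# Sketch: synthesize a restore point ("virtualized full") from a base
# full backup plus a chain of incrementals. Illustrative, not Talena's
# actual on-disk format.
def virtualized_full(base, incrementals):
    """Overlay each incremental's changed blocks onto the base image."""
    image = dict(base)
    for incr in incrementals:
        image.update(incr)      # a later backup wins for a changed block
    return image

base = {"blk0": "a", "blk1": "b", "blk2": "c"}    # day-1 full
day2 = {"blk1": "b2"}                             # changed blocks only
day3 = {"blk2": "c2", "blk3": "d"}                # growth + change

restore = virtualized_full(base, [day2, day3])
print(restore)   # {'blk0': 'a', 'blk1': 'b2', 'blk2': 'c2', 'blk3': 'd'}
```

The restored image is exactly the current dataset size, which is why the big data approach above recovers 1.2 TB instead of 4.95 TB.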
6. The Importance of Parallelism

Test                                                         Platform Utility   Talena         Differential
Full backup                                                  8 hr 17 min        2 hr 20 min    3.5x
Incremental backup                                           4 hr 55 min        26 min 7 sec   11.3x
Full restore to different cluster                            40 hr 28 min       14 hr 55 min   2.7x
Full restore to same cluster                                 6 hr 21 min        1 hr 58 min    3.2x
Full restore using incremental restore point, same cluster   21 hr 28 min       2 hr 5 min     10.3x

• Eliminate choke points
• Tradeoff between backup/restore performance and production cluster load
• Bandwidth efficiency
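The parallelism behind these speedups can be sketched as splitting a backup into independent chunks and moving them concurrently, so no single stream becomes a choke point. The chunk "copy" below is a stand-in for a real network transfer, and all names are illustrative:

```python
# Sketch: chunked, parallel data movement. In practice the worker count
# would be tuned against the load it places on the production cluster.
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(chunk):
    return len(chunk)           # stand-in for transferring one chunk

def parallel_copy(data, chunk_size=4, workers=4):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(copy_chunk, chunks))   # total bytes moved

print(parallel_copy(b"0123456789abcdef"))   # 16
```

This also illustrates the tradeoff called out above: more workers shorten the backup window but draw more I/O and bandwidth from the production cluster.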
7. Elastic Scaling: What Are The Issues

Year 1: 50-node Cassandra cluster, 125 TB, in Data Center #1
Year 2: 100-node multi-DC Cassandra cluster, 320 TB, spanning Data Center #1 and Data Center #2, plus an archive tier

Topic                           Key consideration
Scaling backup infrastructure   Just adding nodes, or a forklift upgrade
Agents/listeners                Manageability
Multi-DC awareness              Minimize WAN bandwidth overhead
8. The Cloud Effect

Production cluster (NoSQL/Hadoop/EDW) with local storage → object storage → cold storage

• Storage tiering
• Transparent access
• Bandwidth impact
10. The Talena Architecture
• Deep de-duplication and compression with an app-aware architecture
• Incremental-forever backup architecture
• High availability via erasure coding in a distributed cluster architecture

Smart Storage Optimizer
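The availability property of erasure coding can be illustrated with XOR parity, the simplest erasure code: one lost data block can be rebuilt from the survivors plus the parity block. Production systems typically use Reed-Solomon-style codes that tolerate multiple failures; this is a minimal sketch of the idea only:

```python
# Sketch: XOR parity as a one-failure-tolerant erasure code.
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_blocks(data)

# Lose data[1]; rebuild it from the remaining blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])   # True
```

Compared with full replication, parity-style coding keeps data available after a node loss while storing far less redundant data.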
11. The Talena Architecture
• Native querying and analytics via active compute layer
• Unbounded scale with a Hadoop-native architecture

Smart Storage Optimizer | Active Compute Services | Distributed File System
12. The Talena Architecture
• Google-like catalog shortens data recovery time
• Automatic schema generation for mirroring and backups
• Granular recovery at an object level
• Recovery to multiple topologies
• Native integration with LDAP and Kerberos for authentication
• Role-based access control defines specific privileges
• Transparent data encryption
• Masking for PII data

Smart Storage Optimizer | Active Compute Services | Distributed File System | Metadata Catalog | Data Orchestration Services | Security Services
13. The Talena Architecture
• 'Single pane of glass' for multiple use cases and data platforms
• Agentless architecture minimizes management overhead
• GUI, CLI, and REST-based Talena API options

GUI | CLI | API
Smart Storage Optimizer | Active Compute Services | Distributed File System | Metadata Catalog | Data Orchestration Services | Security Services
14. Q&A
We'll send you a link to our architecture white paper.
Additional resources: talena-inc.com/resources and talena-inc.com/blog
Ping us with any additional questions: info@talena-inc.com
Starting over 20 years ago, the traditional database market became the foundation of enterprise applications. A whole ecosystem of data management products emerged to provide capabilities like backup/recovery (Veritas), storage pooling (Data Domain), test/dev management (Delphix), and archiving (Iron Mountain). But companies had to purchase separate products to assemble a full data management solution for their enterprise.
Over the past few years and into the foreseeable future, modern data platforms will become the new hubs of enterprise applications. These modern data platforms need the same data management capabilities, echoing what happened with traditional databases.
(Click for build) Our vision is to help companies with their critical data management needs in a single software product, one that is optimized specifically for these modern Big Data environments.
The next few slides will introduce the unique Talena architecture and highlight how this architecture delivers on these core business benefits.
One of the most significant components of our architecture is our Smart Storage Optimizer.
By integrating compute and storage management into our storage optimizer, we’re able to deliver significant cost savings. Our application-aware architecture enables us to do deep de-duplication and compression. Our backup process is incremental-forever, saving on storage costs, and by incorporating erasure coding we also ensure high availability no matter how large a Talena cluster you choose to deploy.
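Content-addressed chunk storage is one common way de-duplication is implemented: each chunk is stored once, keyed by its content hash, and files are kept as recipes of chunk references. A minimal sketch (fixed-size chunking here is a simplification of app-aware dedup, which would pick boundaries from the data format):

```python
# Sketch: de-duplicating chunk store keyed by SHA-256 content hash.
import hashlib

def dedup_store(stream, chunk_size=4):
    store, recipe = {}, []
    for i in range(0, len(stream), chunk_size):
        chunk = stream[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)   # identical chunks are stored once
        recipe.append(key)             # recipe rebuilds the original stream
    return store, recipe

store, recipe = dedup_store(b"aaaaaaaabbbbaaaa")
print(len(recipe), len(store))   # 4 chunks referenced, only 2 stored
```

Replaying the recipe against the store reconstructs the original byte stream, so repeated data costs storage only once.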
The security services layer also supports transparent data encryption.
Our agentless architecture makes Talena an ideal solution for big data architectures and minimizes your operational overhead. Furthermore, Talena can support multiple data platforms, versions, and use cases in a single deployment, thereby providing a "single pane of glass" for all your big data management needs.
While most of our clients work within our user interface, we also provide a REST-based API to accomplish the same tasks.