The Data Lake Hidden in Your Backups
Jeff Reichard
Field Technical Evangelist, Product Strategy
Veeam Software
@JeffReichard
Jeff.Reichard@Veeam.com
Agenda
• State of the Data Lake Market
• Introduction to Veeam®
• Veeam® DataLabs™
• Veeam® Data Integration API
Why Might Vendors Love #BigData?
• Data Lakes
  • Total market $12BN by 2024
  • CAGR 27.8%
• Data Warehouses
  • Total market $34BN by 2025
  • CAGR 8.2%
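To make the two growth rates easier to compare, the compound-growth arithmetic behind a CAGR figure can be sketched as below. This is only an illustration: the slide gives the target-year totals and the rates but not the base year, so the five-year horizon used in the example call is an assumption, and the function names are hypothetical.

```python
import math


def project_back(value_at_target: float, cagr: float, years: int) -> float:
    """Implied market size `years` before the target year, assuming a constant CAGR."""
    return value_at_target / (1.0 + cagr) ** years


def doubling_time(cagr: float) -> float:
    """Years needed for a market to double at a constant CAGR."""
    return math.log(2.0) / math.log(1.0 + cagr)


# Rates taken from the slide; the 5-year horizon is an assumption for illustration.
data_lake_start = project_back(12.0, 0.278, 5)   # $BN, five years before 2024
data_lake_doubling = doubling_time(0.278)        # years
warehouse_doubling = doubling_time(0.082)        # years
```

At these rates the data lake market doubles in a little under three years, while the data warehouse market takes closer to nine, which is the contrast the slide is pointing at.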
Major IT Trends Driving Data Lakes
Cloud, AI/ML, Edge/IoT
What are the Best Outcomes?
• Shell – reduced equipment failure rates, reduced inventory
analysis time from 48 hours to 45 minutes, saving
$millions/year
• Merck – MANTIS 50% reduction in inventory carrying costs
• Monsanto – planting optimization = 4% land use
reduction and $millions saved
• And many more…
Why?
• The provisional, schema-on-load nature is inherently agile
• Opens up new data types for IoT and other unstructured data
• Can be responsive, granular, and flexible because there is no ETL
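The "no ETL" point can be illustrated with a minimal sketch in which raw records are stored untouched and a schema is chosen only when the data is read. The sample records, field names, and the `read_with_schema` helper are all hypothetical and are not taken from any Veeam product or from the talk.

```python
import json

# Raw events kept exactly as they arrived; no upfront transformation (hypothetical data).
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "vibration_mm_s": 3.2, "site": "north"}',
    '{"device": "pump-1", "temp_c": 73.0, "operator": "jv"}',
]


def read_with_schema(raw_lines, fields):
    """Parse raw JSON lines and project each record onto fields chosen at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Fields a record does not carry come back as None instead of failing the load.
        rows.append({name: record.get(name) for name in fields})
    return rows


# Two different questions, two different schemas, same untouched raw data.
temps = read_with_schema(RAW_EVENTS, ["device", "temp_c"])
sites = read_with_schema(RAW_EVENTS, ["device", "site"])
```

Because the projection happens at read time, a new question only needs a new field list; nothing has to be reloaded or reshaped first.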
So what goes Wrong?
Expertise
Security
Requirements
Governance
Data flows
Intro to Veeam
Backup & recovery
Ensure that data is never lost and
that applications stay available.
Recover as fast as possible if a server
crashes or if a power outage
or natural disaster occurs.
Insurance policy?
Veeam Availability Platform
Backup & Replication
Backup & Replication
Monitoring & Analytics
DataLabs
Orchestration
Universal Storage API
Cloud, SaaS, Physical, Virtual
Object Storage
Veeam Platform Differentiators
• 355K+ customers worldwide, adding 4,000+ per month
• +75 Net Promoter Score
• $963M in 2018 bookings
• 18M+ servers protected
Veeam DataLabs
What is Veeam DataLabs?
Delivering added business value without disrupting business!
Powerful capabilities that empower IT and developers to test workloads to validate updates, patches, security vulnerabilities, compliance and general recoverability of workloads.
Features (continuous evolution of innovation):
• On-Demand Sandbox™: Virtual sandbox for testing (including On-Demand Sandbox for Storage Snapshots).
• SureBackup® and SureReplica: Validate backups and replicas.
• NEW Staged Restore: Reduce time to review sensitive data and remove personal information, streamlining compliance (including GDPR).
• NEW Secure Restore: Increase security and reduce interruptions by scanning backups with an anti-virus software interface to prevent introducing viruses and malware into production.
The Virtual Lab is an isolated virtual environment
fully fenced off from production
Veeam Backup & Replication™ starts VMs from
the application group
Used for:
• SureBackup
• U-AIR®
• On-Demand Sandbox
• Staged Restore
VMs running in the Virtual Lab consume CPU
and memory resources of the hypervisor host
where the Virtual Lab is deployed
Virtual Lab
[Diagram labels: Virtual Lab, proxy appliance]
Use cases for Veeam DataLabs
Leveraging existing data proactively
• Improve IT services and operations; DevOps
• Security and forensics
• Improve backup-related operational efficiencies
• Analyze and ensure compliance
Use cases for Veeam DataLabs
Leveraging existing data proactively
Data Lake / Analytics
Data Integration API
Backed up Disks
iSCSI
Veeam Repository
• A PowerShell cmdlet publishes the VM disks to a server
• The disks can also be presented via iSCSI to any server
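Once backed-up disks are visible on a server, a first step for a data lake project is usually to find out what is on them. The sketch below assumes the published disk content appears as an ordinary readable directory tree; the publishing step itself is not shown, and the function name and any mount path passed to it are hypothetical.

```python
import os
from collections import Counter


def inventory_by_extension(mount_root):
    """Walk a mounted directory tree and count files per extension, plus total bytes."""
    counts = Counter()
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Normalise the extension; files without one are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # Unreadable entries are still counted, but contribute no size.
                pass
    return counts, total_bytes
```

Calling `inventory_by_extension` on the mount point returns a per-extension file count and the total size, which is enough to decide which data sets are worth feeding into an analytics pipeline. Since the function only reads, it suits a backup copy that should not be modified.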
Advantages for Data Lake Projects
Expertise
Security
Requirements
Governance
Data flows
A virtual gold mine of data is sitting idle in backups
Thank You!
Questions?
Please visit us at stand 31!

The Data Lake Hidden in Your Backups - Big Data Expo 2019
