Smarter Management for Your Data Growth

Matt Aslett (The451Group) and Deirdre Mahon (RainStor) examine the evolving data management landscape and how RainStor's Online Data Retention (OLDR) repository fits into the equation.

  • De-dupe & Reduction; Any Storage / Platform; Cloud Enabled; Limitless Data Volumes; Fast Load – Ingestion Rates; SQL Query – High Performance; Immutable Compliant Store
  • So if we take a look at Matt’s earlier high-level architecture diagram, I think it’s worth pointing out the key areas where RainStor technology can be applied. At the top, we have a RainStor repository, which can be deployed alongside the RDBMS so that applications can be archived or retired, with savings from compressing the data to a much smaller footprint. Our Informatica (INFA) partnership focuses predominantly on this area and retires a large number of applications such as Oracle E-Business Suite. On the lower part of the screen, RainStor can be deployed as the leading repository to store long-term historical data for EDWs, and additionally the same data sets can be stored in the cloud…
  • Security industry: The combination of the increase in cybercrime, changing regulations, and public exposures is increasing the attention and resources dedicated to data security. Over the next three years, data security issues (and the related application security) are expected to account for over 60% of new enterprise security spending. This includes spending on new technologies and excludes maintenance of existing technologies such as firewalls and antivirus, which account for most current security costs. Data and business application security will drive most of the new growth of the security market over the next 3-5 years.
    Business network traffic for 2010: > 3,800 PB/month (> 2,500 PB internet traffic, > 1,200 PB WAN traffic, > 58 PB mobile traffic). Cisco forecasts 20% CAGR.
    Data breaches are common: 95% of records stolen externally; 90% involved malware; 70% were uncovered by outsiders; 50% went unnoticed for months.
    CSPs: Global mobile data traffic will increase 26-fold between 2010 and 2015. Mobile data traffic will grow at a compound annual growth rate (CAGR) of 92 percent from 2010 to 2015, reaching 6.3 exabytes per month by 2015. Last year’s mobile data traffic was three times the size of the entire global Internet in 2000.
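The two mobile-traffic figures in the note above (26-fold growth over five years and a 92 percent CAGR) can be cross-checked against each other. The following is a minimal sketch of that arithmetic; the function name `cagr` is illustrative and not taken from the deck.

```python
def cagr(growth_multiple: float, years: int) -> float:
    """Compound annual growth rate implied by an overall growth multiple."""
    if growth_multiple <= 0 or years <= 0:
        raise ValueError("growth_multiple and years must be positive")
    return growth_multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # 26-fold growth between 2010 and 2015 is five years of compounding.
    rate = cagr(26, 5)
    print(f"Implied CAGR: {rate:.1%}")  # roughly 92 percent
```

The result agrees with the quoted 92 percent, so the two statements are consistent with each other.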

Smarter Management for Your Data Growth: Presentation Transcript

  • Smarter Management for Your Data Growth
    Retain Critical Data Online At A Fraction of The Cost
    April 2011
  • Agenda
    Introductions
    Changing Data Management Landscape & Trends
    From Operational to Analytical
    Cloud and Hadoop
    Where do They Fit?
    RainStor and How it Works
    Analytics Data Retention Use-case
    Economics
    Q&A
    Matt Aslett, The 451 Group
    Deirdre Mahon, VP Marketing – RainStor
    Ramon Chen, VP Product Management – RainStor
  • Total Data
    The changing data management landscape
    Matthew Aslett, The 451 Group
    matthew.aslett@the451group.com
    © 2011 by The 451 Group. All rights reserved
  • The 451 Group
    451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
    Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
    The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
    TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
    ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
  • Overview
    The changing data management landscape
    One overarching trend:
    Total Data
    Impacting four technology areas:
    Operational database
    Analytic database
    Data archiving
    Machine-generated data
    The trends driving data management
  • Trends driving data management
    The volume, variety and velocity of data have never been greater and are still growing
    The value of data has never been better understood
    The capabilities for processing data have never been better
    Higher processor performance and density are enabling advanced processing on commodity hardware
    Software enhancements designed to make best use of processing performance and scalable architecture
    Advanced and in-database analytics bring processing to the data, reducing latency and improving efficiency
    The data deluge problem is also a big data opportunity
  • Introducing Total Data
    A concept defined by The 451 Group to describe new approaches to data management – beyond restrictive silos
    Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques
    Processing any data that might be applicable to analytics
    in the operational database, data warehouse, or Hadoop, or archive
    Structured, semi-structured or unstructured
    Relational or non-relational, on-premise or in the cloud
    Inspired by ‘Total Football’
  • Total Football meets Total Data
    “You make space, you come into space. And if the ball doesn’t come, you leave this space and another player will come into it.”
    Bernardus Hulshoff, Ajax 1966-77
    Abandonment of restrictive (self-imposed) rules about individual roles and responsibility
    Enabled and relied on fluidity and flexibility to respond to changing requirements
    Reliant on, and exploited, improved performance levels
  • Data management – in theory
    • The application is the primary source of data
    • The relational database is sacrosanct
    • The enterprise data warehouse is the single source of the truth (or is supposed to be)
    • Offline data archiving
    • Infrastructure primarily exists to support the data/application layer
    [Diagram: Enterprise app; Reporting/BI; Operational database; Data cleansing/sampling/MDM; EDW; Data archive; Infrastructure]
  • Data management – in practice
    • The relational database is sacrosanct
    • Distributed data layer to meet the scalability and performance demands
    • New opportunities for real-time BI
    • Polyglot persistence – use the most appropriate data storage for the application
    [Diagram: Enterprise app; Reporting/BI; Distributed data; Data cleansing/sampling/MDM; Operational databases; EDW; Data archive; Infrastructure]
  • Data management – in practice
    • The enterprise data warehouse is the single source of the truth
    • Data is copied into departmental or regional data marts
    • Data warehouse administrators are fighting a losing battle for control
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Operational databases; Analytic databases; EDW; Data archive; Infrastructure]
  • Data management – in practice
    • Higher processor performance and density are enabling advanced processing on commodity hardware
    • Advanced in-database analytics bring processing to the data, reducing latency and improving efficiency
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Operational databases; Analytic databases; EDW; Data archive; Infrastructure]
  • Data management – in practice
    • Hadoop and associated analysis tools (Hive, Pig) for large-scale batch processing of large, complex data sets
    • Taking further advantage of hardware economics
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data archive; Infrastructure]
  • Data management – in practice
    • Integrating Hadoop with the data warehouse for ETL and also two-step data analysis
    • Greater acceptance that the EDW is part of a broader data analytics architecture
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data archive; Infrastructure]
  • Data location, data location, data location
    Not the end of the EDW, but the EDW is one of many sources of BI, rather than the only source of BI
    The issue of data location becomes paramount
    Choose the right storage technology – software and hardware
    EDW, Hadoop or archive
    On-premise or on the cloud
    Memory, disk or SSD
    Understand the requirements:
    Value and temperature of the data
    Ensure data can be queried using existing tools/skills
    Cost
  • EDW requirements/characteristics
    High performance query/analysis response
    Ability to support multiple users concurrently
    Capacity for multi-terabyte storage and scale
    Fast data load and staging for data transformation
    Ability to operate with BI/analytics tools
    Security and governance
    Cost - $20k-$50k per TB
    Alternatives
    Do nothing and suffer the consequences
    Deploy appliances and/or Hadoop for specific use-cases
    Offload to an online repository
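To make the "Offload to an online repository" alternative concrete, here is a rough cost sketch. The $20k-$50k per TB range comes from the slide above. The `cost_divisor` parameter is a hypothetical knob: its default of 10 simply borrows the "10x less cost" claim made later in the deck and is an assumption, not a measured price. Function names are illustrative.

```python
# Per-terabyte EDW cost range quoted on the slide, in US dollars.
EDW_COST_PER_TB = (20_000, 50_000)


def edw_cost_range(terabytes):
    """Return the (low, high) cost of keeping `terabytes` in the EDW."""
    low, high = EDW_COST_PER_TB
    return terabytes * low, terabytes * high


def offload_cost_range(total_tb, offloaded_tb, cost_divisor=10):
    """Return the (low, high) cost when part of the data moves to a repository.

    cost_divisor is an assumption: the repository is taken to cost
    1/cost_divisor of the EDW price per terabyte.
    """
    if not 0 <= offloaded_tb <= total_tb:
        raise ValueError("offloaded_tb must be between 0 and total_tb")
    kept_tb = total_tb - offloaded_tb
    low, high = EDW_COST_PER_TB
    return (
        kept_tb * low + offloaded_tb * low / cost_divisor,
        kept_tb * high + offloaded_tb * high / cost_divisor,
    )


if __name__ == "__main__":
    print("All 100 TB in the EDW:", edw_cost_range(100))
    print("80 of 100 TB offloaded:", offload_cost_range(100, 80))
```

Under these assumptions, keeping 100 TB entirely in the EDW costs $2 million to $5 million, while offloading 80 TB brings the range down to $560,000 to $1.4 million. The real figure depends on the actual repository price, which the slide does not state.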
  • Data management – in practice
    • Offline data archiving
    • Traditionally, data archived for legal requirements
    • Previously little need for querying/analytics
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data archive; Infrastructure]
  • Data management – in practice
    • Regulations have increased the need to query archived data
    • Focus shifts to how to enable querying easily and cost-effectively
    • The archive becomes an online repository for historical data
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Infrastructure]
  • Data management – in practice
    • Infrastructure primarily exists to support the data/application layer
    • “Machine generated data” an untapped source of data
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Infrastructure]
  • Data management – in practice
    • Infrastructure as a source of data for analysis and integration with application data: ‘datastructure’
    • Likely to transform into data-generating and data-processing infrastructure as analytics capabilities are applied directly to the data source
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Datastructure]
  • Data management – in practice
    • Cloud as both a source of data and data storage and processing layer
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Datastructure; Cloud Infrastructure; Hadoop/DW; Data archive; Analytic DB]
  • Total Data
    • More flexible approach to data management
    • Greater opportunities for business intelligence
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Datastructure; Cloud Infrastructure; Hadoop/DW; Data archive; Analytic DB]
  • Data location, data location, data location
    Avoid data movement and duplication – retain governance
    Virtual data marts and data clouds
    Data virtualization to provide access to multiple data sources
  • Data virtualization
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Datastructure; Cloud Infrastructure; Hadoop/DW; Data archive; Analytic DB]
  • Data virtualization
    [Diagram: Enterprise app; Reporting/BI; Reporting; Distributed data; Data virtualization; Virtual data marts; Data cleansing/sampling/MDM; Hadoop; Operational databases; EDW; Data repository; Datastructure; Cloud Infrastructure; Hadoop/DW; Data archive; Analytic DB]
  • Who is RainStor?
    Specialized database for cost-effective reduction, retention & on-demand retrieval of historical structured data
    At 10x Less Cost
    OEM Partner Model
    Cloud or On-premise
  • Partner Case Studies
    HP
    • Sector: Telco
    • Solution: CDR/IPDR retention and lawful intercept (HP Dragon)
    • Retaining billions of CDRs per day in immutable form and enabling cost-effective query for regulatory authorities
    • Sector: Telco
    • Solution: Message (SMS/MMS) and traffic log management
    • Retaining 1000s of messages a second while keeping them accessible for regulatory purposes
    • Sector: Horizontal
    • Solution: Teradata Data Retention Machine
    • Retaining BI and analytical data long term in the RainStor-powered Data Retention Machine at a low cost per TB stored, eliminating tape
    • Sector: Various/Horizontal
    • Solution: Information Lifecycle Management
    • Retaining historical data from highly complex packaged applications while keeping it accessible for business and regulatory purposes
  • Data Retention Solution Requirements
    Database Archiving
    Application Retirement
    Data Warehouse Archiving
    Data Warehouse Appliance
    Online Data Retention (OLDR)
    Analytical (OLAP)
    Transactional (OLTP)
    Compliance
    Query
    Static Machine-Generated Data (MGD)
  • Where RainStor Fits
    [Diagram: Enterprise app; Application Archive / Retired; Reporting/BI; Reporting; Distributed data; Data cleansing/sampling/MDM; Hadoop; Operational databases; Analytic databases; EDW; Data repository; Datastructure; Cloud Infrastructure; Hadoop/DW; Data archive; Analytic DB]
  • RainStor’s Focus
    SmartGrid to generate 1 exabyte of data in the US alone in the next 2 years
    Data security will account for over 60% of new enterprise security spending in the next 3 years
    Global mobile data traffic will grow 26-fold between 2010 and 2015 (to 6.3 exabytes per month)
    Utilities
    • SmartGrid
    • e Meter
    Security
    Network Forensics
    Cyber-security
    Communications
    • OSS
    • BSS
    • ISS
    Big Data Volumes
    - Need to be online & queryable
    Found the needle – where’s the haystack?
    Volumes are rising -
    Regulated -
    Infrastructure needs -
    Reaching Telco-scale
    Multi-billions of records
    Strict Compliance
    RDBMSs Break
    Analytics Required
    10s of Petabytes Retained
  • How Does RainStor Do It?
    Reduce
    SIZE: Massive de-dupe ~97% savings in storage
    HARDWARE: On commodity server/disk infrastructure
    RESOURCES: Without specialist DBA support
    Retain
    PRESERVED: Massive record volumes in original form
    IMMUTABLE: Tamper proofed with audit trail
    CONFIGURABLE: With retention & expiry policies
    Retrieve
    STANDARDS: SQL & BI tools via ODBC/JDBC
    PERFORMANT: Fast queries for large complex data sets
    FLEXIBLE: With schema evolution & point-in-time access
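As a quick illustration of what the "~97% savings in storage" figure on this slide would mean in practice, the sketch below converts a savings percentage into a stored footprint and an equivalent reduction ratio. The 0.97 default is the slide's approximate figure; actual savings would depend on the data, and the function names are illustrative.

```python
def reduced_footprint_tb(raw_tb, savings=0.97):
    """Stored size in TB after reducing raw_tb by the given savings fraction."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in the range [0, 1)")
    return raw_tb * (1 - savings)


def reduction_ratio(savings=0.97):
    """Equivalent N:1 reduction ratio for a savings fraction."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in the range [0, 1)")
    return 1 / (1 - savings)


if __name__ == "__main__":
    print(f"100 TB raw -> about {reduced_footprint_tb(100):.1f} TB stored")
    print(f"Equivalent ratio: about {reduction_ratio():.0f}:1")
```

At 97 percent savings, 100 TB of raw data occupies roughly 3 TB, which corresponds to a reduction ratio of about 33:1.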
  • RainStor’s Disruptive Technology
    • Patented – 4 layers of compression
    • Data Reduction through value and pattern de-duplication
    • Further Algorithmic-level and byte-level compression
    • Fast Queries in stored format without re-inflation.
    [Figure: sample records built from the values Smith, Brown, Peter, Paul, John, Pharma, Finance, $40,000 and $35,000, illustrating de-duplication of repeated values]
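The idea of value de-duplication can be sketched with a simple dictionary encoding. This is an illustration of the general technique only, not RainStor's patented method, whose details the deck does not give. The three records are assumed for the example, assembled from the values that appear on the slide. Each distinct value is stored once, records become tuples of small integer IDs, and a lookup can compare IDs directly without rebuilding the original rows.

```python
def dedupe_values(records):
    """Store each distinct field value once; encode records as tuples of value IDs."""
    value_ids = {}  # value -> integer ID
    values = []     # integer ID -> value
    encoded = []
    for record in records:
        row = []
        for field in record:
            if field not in value_ids:
                value_ids[field] = len(values)
                values.append(field)
            row.append(value_ids[field])
        encoded.append(tuple(row))
    return values, encoded


def restore(values, encoded):
    """Rebuild the original records from the value list and encoded rows."""
    return [tuple(values[i] for i in row) for row in encoded]


def rows_with_value(values, encoded, column, wanted):
    """Find matching row numbers by comparing IDs, without restoring any row."""
    try:
        wanted_id = values.index(wanted)
    except ValueError:
        return []
    return [n for n, row in enumerate(encoded) if row[column] == wanted_id]


# Assumed sample records (first name, surname, department, salary).
records = [
    ("Peter", "Smith", "Pharma", "$40,000"),
    ("Paul", "Smith", "Finance", "$35,000"),
    ("John", "Brown", "Pharma", "$40,000"),
]
values, encoded = dedupe_values(records)

if __name__ == "__main__":
    print(len(values), "distinct values for", 4 * len(records), "fields")
    print(rows_with_value(values, encoded, 2, "Pharma"))
```

Here twelve fields collapse to nine stored values; on real data with many repeated values the saving would be far larger. Pattern-level de-duplication and the further compression layers mentioned on the slide are not modelled.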
  • Offload Warehouse Data to Online Archive: High Performance & Lower Cost
    • Augment existing warehouse & analytics systems by providing access to years of history
    • Run query on RainStor and import results to data warehouse
    • Re-instate data from data retention repository back to warehouse for deep analytics
    Benefits:
    • Lower TCO (Admin, Storage, CPU)
    • Compliant data retention
    • Unlimited scalability
    • Add more data sources for broader analysis
    [Diagram: Source DB (e.g. Oracle); Analytics/DW; 5 Quarters; 50 Quarters]
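One way to picture the split between warehouse and retention repository is a small routing rule keyed on data age. This sketch assumes a reading of the slide's figure in which the warehouse keeps the most recent 5 quarters and the repository serves older data, with 50 quarters retained overall. That reading, the constants, and the function name are all assumptions made for illustration.

```python
WAREHOUSE_QUARTERS = 5    # assumed: most recent quarters kept in the warehouse
RETAINED_QUARTERS = 50    # assumed: total quarters of history retained


def stores_for_query(oldest_age, newest_age=0):
    """Return which stores a query must touch.

    Ages are measured in quarters before the current quarter (0 = current).
    """
    if newest_age < 0 or oldest_age < newest_age:
        raise ValueError("expected 0 <= newest_age <= oldest_age")
    if oldest_age >= RETAINED_QUARTERS:
        raise ValueError("query reaches beyond the retained history")
    stores = []
    if newest_age < WAREHOUSE_QUARTERS:
        stores.append("warehouse")
    if oldest_age >= WAREHOUSE_QUARTERS:
        stores.append("repository")
    return stores


if __name__ == "__main__":
    print(stores_for_query(3))       # recent data only
    print(stores_for_query(20, 10))  # historical data only
    print(stores_for_query(12))      # spans both stores
```

A query that spans both stores corresponds to the slide's option of running the historical part on the repository and importing the results into the warehouse.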
  • RainStor Cloud
    1. Compressed de-duplicated data sent to the cloud resulting in quicker and cheaper uploads.
    2. Encrypted data stored in private containers ensuring security and easy management.
    3. Data accessed on demand using standard SQL tools leveraging elasticity of the cloud.
    [Diagram: VM Software Appliance; Amazon S3; Amazon EC2; Send; Store; Search; ODBC/JDBC]
  • How Do the Economics Stack Up?
  • Quick summary
    The growing volume, variety and velocity of data is a problem, but it is also an opportunity
    Requires a broader approach to data management
    Deploy appliances and Hadoop for specific use-cases, and online repository for historical data
    ‘Datastructure’ will become increasingly valuable, not only as a source of data but also as a source of intelligence
    Data location, and the role of data virtualization will come into greater focus
  • Q&A
  • FULL TIME
    Thank you