
Smarter Management for Your Data Growth



Matt Aslett (The 451 Group) and Deirdre Mahon (RainStor) examine the evolving data management landscape and how RainStor's Online Data Retention (OLDR) repository fits into the equation.



  1. Smarter Management for Your Data Growth
     Retain Critical Data Online at a Fraction of the Cost
     April 2011
  2. Agenda
     • Introductions
     • Changing Data Management Landscape & Trends
     • From Operational to Analytical
     • Cloud and Hadoop: Where Do They Fit?
     • RainStor and How It Works
     • Analytics Data Retention Use Case
     • Economics
     • Q&A
     Speakers: Matt Aslett, The 451 Group; Deirdre Mahon, VP Marketing, RainStor; Ramon Chen, VP Product Management, RainStor
  3. Total Data: The changing data management landscape
     Matthew Aslett, The 451 Group
     © 2011 by The 451 Group. All rights reserved.
  4. The 451 Group
     • 451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
     • Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
     • The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
     • TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
     • ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
  5. Overview: the changing data management landscape
     One overarching trend: Total Data
     Impacting four technology areas:
     • Operational database
     • Analytic database
     • Data archiving
     • Machine-generated data
     The trends driving data management
  6. Trends driving data management
     • The volume, variety and velocity of data have never been greater, and they are still growing
     • The value of data has never been better understood
     • The capabilities for processing data have never been better:
       • Higher processor performance and density are enabling advanced processing on commodity hardware
       • Software enhancements are designed to make the best use of processing performance and scalable architecture
       • Advanced and in-database analytics bring processing to the data, reducing latency and improving efficiency
     • The data deluge problem is also a big data opportunity
  7. Introducing Total Data
     • A concept defined by The 451 Group to describe new approaches to data management that go beyond restrictive silos
     • Reflects the changing data management landscape as pragmatic choices are made about data storage and analysis techniques
     • Processing any data that might be applicable to analytics:
       • in the operational database, data warehouse, Hadoop or archive
       • structured, semi-structured or unstructured
       • relational or non-relational, on-premise or in the cloud
     • Inspired by ‘Total Football’
  8. Total Football meets Total Data
     “You make space, you come into space. And if the ball doesn’t come, you leave this space and another player will come into it.”
     Bernardus Hulshoff, Ajax 1966-77
     • Abandonment of restrictive (self-imposed) rules about individual roles and responsibilities
     • Enabled, and relied on, fluidity and flexibility to respond to changing requirements
     • Relied on, and exploited, improved performance levels
  9. Data management – in theory
     • The application is the primary source of data
     • The relational database is sacrosanct
     • The enterprise data warehouse is the single source of the truth (or is supposed to be)
     • Offline data archiving
     • Infrastructure primarily exists to support the data/application layer
     [Diagram: enterprise app, operational database, data cleansing/sampling/MDM, EDW, data archive, reporting/BI, infrastructure]
  10. Data management – in practice
      • The relational database is sacrosanct
      • Distributed data layer to meet the scalability and performance demands
      • New opportunities for real-time BI
      • Polyglot persistence: use the most appropriate data storage for the application
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, EDW, data archive, reporting/BI, infrastructure]
  11. Data management – in practice
      • The enterprise data warehouse is the single source of the truth
      • Data is copied into departmental or regional data marts
      • Data warehouse administrators are fighting a losing battle for control
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, EDW, three analytic databases, data archive, reporting/BI, infrastructure]
  12. Data management – in practice
      • Higher processor performance and density are enabling advanced processing on commodity hardware
      • Advanced in-database analytics bring processing to the data, reducing latency and improving efficiency
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, EDW, three analytic databases, data archive, reporting/BI, infrastructure]
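The in-database analytics point can be illustrated with a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for an analytic database. The `sales` table and its figures are hypothetical. The client-side function ships every row to the application before aggregating; the in-database function pushes the `GROUP BY` into the database, so only one row per region travels back.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales table, used only to illustrate the two approaches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0), ("AMER", 200.0), ("APAC", 25.0)],
)

def totals_client_side(conn):
    """Ship every row to the application, then aggregate in application code."""
    totals = defaultdict(float)
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)

def totals_in_database(conn):
    """Push the aggregation into the database; one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)

client_totals, client_rows = totals_client_side(conn)
db_totals, db_rows = totals_in_database(conn)
print("rows transferred:", client_rows, "vs", db_rows)
```

With five rows the difference is trivial, but the rows transferred by the client-side approach grow with the table, while the in-database approach returns one row per group.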
  13. Data management – in practice
      • Hadoop and associated analysis tools (Hive, Pig) for large-scale batch processing of large, complex data sets
      • Taking further advantage of hardware economics
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, three analytic databases, data archive, reporting/BI, infrastructure]
  14. Data management – in practice
      • Integrating Hadoop with the data warehouse for ETL and for two-step data analysis
      • Greater acceptance that the EDW is part of a broader data analytics architecture
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, three analytic databases, data archive, reporting/BI, infrastructure]
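Two-step analysis with Hadoop and a warehouse typically means a batch job reduces raw, bulky data to a compact summary, which is then loaded into the warehouse for SQL and BI tools. The sketch below mimics that flow in plain Python: `map_phase` and `reduce_phase` are hypothetical stand-ins for a Hadoop mapper and reducer (this is not the Hadoop API), the log format is invented, and an in-memory `sqlite3` database plays the warehouse.

```python
import sqlite3
from collections import Counter
from itertools import chain

# Hypothetical raw web-log lines: "<date> <user> <url>"
RAW_LOGS = [
    "2011-04-01 alice /home",
    "2011-04-01 bob /pricing",
    "2011-04-02 alice /pricing",
    "2011-04-02 carol /home",
    "2011-04-02 bob /pricing",
]

def map_phase(line):
    """Emit (key, 1) pairs, as a mapper would; the key here is (date, url)."""
    date, _user, url = line.split()
    yield (date, url), 1

def reduce_phase(pairs):
    """Sum the values per key, as a reducer would."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

# Step 1: batch processing of the raw data (stand-in for a Hadoop/Hive/Pig job).
summary = reduce_phase(chain.from_iterable(map_phase(line) for line in RAW_LOGS))

# Step 2: load the much smaller summary into the warehouse for SQL/BI queries.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE page_views (day TEXT, url TEXT, views INTEGER)")
dw.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [(day, url, n) for (day, url), n in summary.items()],
)
pricing_views = dw.execute(
    "SELECT SUM(views) FROM page_views WHERE url = '/pricing'"
).fetchone()[0]
print("pricing page views:", pricing_views)
```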
  15. Data location, data location, data location
      • Not the end of the EDW, but the EDW becomes one of many sources of BI rather than the only source
      • The issue of data location becomes paramount
      • Choose the right storage technology, software and hardware:
        • EDW, Hadoop or archive
        • On-premise or in the cloud
        • Memory, disk or SSD
      • Understand the requirements:
        • Value and temperature of the data
        • Ensure data can be queried using existing tools/skills
        • Cost
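One way to act on “value and temperature of the data” is an explicit placement rule. The sketch below is purely illustrative: the `DataSet` fields, the 90-day “hot” threshold and the rule itself are assumptions; only the three targets (EDW, Hadoop, archive) come from the slide.

```python
from dataclasses import dataclass

HOT_DAYS = 90  # hypothetical threshold for "hot" data

@dataclass
class DataSet:
    name: str
    days_since_last_query: int  # used here as a proxy for data temperature
    high_value: bool            # hypothetical business-value flag
    structured: bool

def choose_location(ds: DataSet) -> str:
    """Illustrative placement rule; thresholds and targets are assumptions."""
    if not ds.structured:
        return "Hadoop"          # large, complex or non-relational data sets
    if ds.high_value and ds.days_since_last_query <= HOT_DAYS:
        return "EDW"             # hot, high-value data justifies the EDW cost per TB
    return "online archive"      # cold or lower-value history that must stay queryable

for ds in [
    DataSet("clickstream logs", 1, False, False),
    DataSet("current-quarter sales", 2, True, True),
    DataSet("2005 sales", 2000, True, True),
]:
    print(ds.name, "->", choose_location(ds))
```

In practice the rule would also weigh query tooling and cost, the other two requirements on the slide.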
  16. EDW requirements/characteristics
      • High-performance query/analysis response
      • Ability to support multiple users concurrently
      • Capacity for multi-terabyte storage and scale
      • Fast data load and staging for data transformation
      • Ability to operate with BI/analytics tools
      • Security and governance
      • Cost: $20k-$50k per TB
      Alternatives:
      • Do nothing and suffer the consequences
      • Deploy appliances and/or Hadoop for specific use cases
      • Offload to an online repository
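The $20k-$50k per TB range makes the cost of doing nothing easy to estimate, and the same arithmetic shows what offloading cold data could change. In this sketch the 100 TB total, the 80 TB of cold data and the `cost_divisor` of 10 (which only mirrors the “10x less cost” figure RainStor quotes elsewhere in the deck) are all assumptions for illustration.

```python
EDW_COST_PER_TB_USD = (20_000, 50_000)  # low and high ends of the range on the slide

def offload_scenario(total_tb: int, cold_tb: int, cost_divisor: int = 10):
    """Compare keeping everything in the EDW with offloading `cold_tb` terabytes
    to a repository assumed to cost 1/cost_divisor of the EDW price per TB.

    The default divisor is an assumption for illustration, not a measured figure.
    """
    scenarios = []
    for per_tb in EDW_COST_PER_TB_USD:
        all_in_edw = total_tb * per_tb
        with_offload = (total_tb - cold_tb) * per_tb + cold_tb * per_tb // cost_divisor
        scenarios.append(
            {"per_tb": per_tb, "all_in_edw": all_in_edw, "with_offload": with_offload}
        )
    return scenarios

for scenario in offload_scenario(total_tb=100, cold_tb=80):
    print(scenario)
```

Under these assumptions, keeping 100 TB in the EDW costs $2M-$5M, against $560k-$1.4M with 80 TB offloaded.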
  17. Data management – in practice
      • Offline data archiving
      • Traditionally, data was archived to meet legal requirements
      • Previously there was little need for querying/analytics
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, three analytic databases, data archive, reporting/BI, infrastructure]
  18. Data management – in practice
      • Regulations have increased the need to query archived data
      • The focus shifts to how to enable querying easily and cost-effectively
      • The archive becomes an online repository for historical data
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, three analytic databases, data repository (replacing the data archive), reporting/BI, infrastructure]
  19. Data management – in practice
      • Infrastructure primarily exists to support the data/application layer
      • “Machine-generated data” is an untapped source of data
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, three analytic databases, data repository, reporting/BI, infrastructure]
  20. Data management – in practice
      • Infrastructure as a source of data for analysis and integration with application data: ‘datastructure’
      • Likely to transform into data-generating and data-processing infrastructure as analytics capabilities are applied directly to the data source
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, three analytic databases, data repository, reporting/BI, with the infrastructure layer relabelled ‘datastructure’]
  21. Data management – in practice
      • Cloud as both a source of data and a data storage and processing layer
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, six analytic databases, data repository, datastructure, cloud infrastructure (Hadoop/DW, data archive, analytic DB), reporting/BI]
  22. Total Data
      • More flexible approach to data management
      • Greater opportunities for business intelligence
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, six analytic databases, data repository, datastructure, cloud infrastructure (Hadoop/DW, data archive, analytic DB), reporting/BI]
  23. Data location, data location, data location
      • Avoid data movement and duplication, and retain governance
      • Virtual data marts and data clouds
      • Data virtualization to provide access to multiple data sources
  24. Data virtualization
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, six analytic databases, data repository, datastructure, cloud infrastructure (Hadoop/DW, data archive, analytic DB), reporting/BI]
  25. Data virtualization
      [Diagram: enterprise app, four operational databases, distributed data layer, data virtualization layer, data cleansing/sampling/MDM, Hadoop, EDW, six virtual data marts, data repository, datastructure, cloud infrastructure (Hadoop/DW, data archive, analytic DB), reporting/BI]
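A small-scale analogue of a virtual data mart can be built with SQLite's `ATTACH DATABASE`: one connection sees both a “warehouse” and an “archive” database, and a view unions them without copying rows. This is only a sketch under that analogy; real data virtualization products federate heterogeneous systems across a network, and the table names and rows here are hypothetical. SQLite only lets a `TEMP` view reference tables in other attached databases, which is why the view is declared `TEMP`.

```python
import sqlite3

# One connection acts as the virtualization layer over two separate databases.
conn = sqlite3.connect(":memory:")                 # "main" = current warehouse data
conn.execute("ATTACH DATABASE ':memory:' AS archive")

conn.execute("CREATE TABLE main.orders (order_id INTEGER, year INTEGER, amount REAL)")
conn.execute("CREATE TABLE archive.orders (order_id INTEGER, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)",
                 [(3, 2011, 70.0), (4, 2011, 30.0)])
conn.executemany("INSERT INTO archive.orders VALUES (?, ?, ?)",
                 [(1, 2004, 10.0), (2, 2007, 20.0)])

# The TEMP view presents both sources as one "virtual data mart";
# no rows are copied between the databases.
conn.execute("""
    CREATE TEMP VIEW all_orders AS
    SELECT order_id, year, amount, 'warehouse' AS source FROM main.orders
    UNION ALL
    SELECT order_id, year, amount, 'archive' AS source FROM archive.orders
""")

rows = conn.execute(
    "SELECT source, COUNT(*), SUM(amount) FROM all_orders GROUP BY source ORDER BY source"
).fetchall()
print(rows)
```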
  26. Who is RainStor?
      • Specialized database for cost-effective reduction, retention and on-demand retrieval of historical structured data, at 10x less cost
      • OEM partner model
      • Cloud or on-premise
  27. Partner Case Studies
      • HP. Sector: Telco. Solution: CDR/IPDR retention and lawful intercept (HP Dragon). Retaining billions of CDRs per day in immutable form and enabling cost-effective queries for regulatory authorities.
      • Sector: Telco. Solution: message (SMS/MMS) and traffic log management. Retaining thousands of messages a second while keeping them accessible for regulatory purposes.
      • Sector: Horizontal. Solution: Teradata Data Retention Machine. Retaining BI and analytical data long term in the RainStor-powered Data Retention Machine at a low cost per TB stored, eliminating tape.
      • Sector: Various/Horizontal. Solution: Information Lifecycle Management. Retaining historical data from highly complex packaged applications while keeping it accessible for business and regulatory purposes.
  28. Data Retention Solution Requirements
      [Diagram labels: database archiving, application retirement, data warehouse archiving, data warehouse appliance, Online Data Retention (OLDR); analytical (OLAP), transactional (OLTP), compliance, query, static machine-generated data (MGD)]
  29. Where RainStor Fits
      [Diagram: enterprise app, four operational databases, distributed data layer, data cleansing/sampling/MDM, Hadoop, EDW, six analytic databases, data repository, datastructure, cloud infrastructure (Hadoop/DW, data archive, analytic DB), reporting/BI, with an added application archive / retired applications component]
  30. RainStor’s Focus
      • SmartGrid to generate 1 exabyte of data in the US alone in the next 2 years
      • Data security will account for over 60% of new enterprise security spending in the next 3 years
      • Global mobile data traffic will grow 26-fold between 2010 and 2015 (6.3 exabytes per month)
      • Utilities: SmartGrid, e-Meter
      • Security: network forensics, cyber-security
      • Communications: OSS, BSS, ISS
      Big data volumes need to be online and queryable: “Found the needle – where’s the haystack?”
      • Volumes are rising: reaching telco scale, multi-billions of records
      • Regulated: strict compliance
      • Infrastructure needs: RDBMSs break, analytics required, tens of petabytes retained
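The 26-fold growth figure implies a compound annual growth rate that is easy to underestimate. Assuming the 6.3 exabytes per month is the 2015 end point of the forecast, the sketch below derives the yearly rate and the implied 2010 starting level.

```python
GROWTH_FACTOR = 26            # 2010 -> 2015, as quoted on the slide
YEARS = 2015 - 2010
MONTHLY_2015_EXABYTES = 6.3   # assumed to be the 2015 end point of the forecast

# Compound annual growth rate: the constant yearly multiplier that yields 26x in 5 years.
cagr = GROWTH_FACTOR ** (1 / YEARS) - 1

# Working backwards from the end point gives the implied 2010 level.
implied_monthly_2010_exabytes = MONTHLY_2015_EXABYTES / GROWTH_FACTOR

print(f"CAGR: {cagr:.1%}")  # CAGR: 91.9%
print(f"Implied 2010 traffic: {implied_monthly_2010_exabytes:.2f} EB/month")  # 0.24
```

In other words, mobile data traffic nearly doubles every year over the forecast period.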
  31. How Does RainStor Do It?
      Reduce
      • SIZE: massive de-dupe, ~97% savings in storage
      • HARDWARE: on commodity server/disk infrastructure
      • RESOURCES: without specialist DBA support
      Retain
      • PRESERVED: massive record volumes in original form
      • IMMUTABLE: tamper-proofed with audit trail
      • CONFIGURABLE: with retention and expiry policies
      Retrieve
      • STANDARDS: SQL and BI tools via ODBC/JDBC
      • PERFORMANT: fast queries for large, complex data sets
      • FLEXIBLE: with schema evolution and point-in-time access
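“Immutable” and “configurable retention and expiry” can be made concrete with a generic pattern: a hash chain, where each entry's digest covers the previous digest so that altering any stored record is detectable, plus an age-based expiry check. This is a hypothetical sketch (`RetentionStore` and its methods are invented names), not a description of how RainStor implements tamper-proofing.

```python
import hashlib
import json
from datetime import date, timedelta

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry

def _digest(record: dict, stored_on: date, prev: str) -> str:
    payload = json.dumps(record, sort_keys=True) + stored_on.isoformat() + prev
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class RetentionStore:
    """Hypothetical append-only store with a hash chain and a retention period."""

    def __init__(self, retention_days: int):
        self.retention = timedelta(days=retention_days)
        self.entries = []  # list of (record, stored_on, digest)

    def append(self, record: dict, stored_on: date) -> str:
        prev = self.entries[-1][2] if self.entries else GENESIS
        digest = _digest(record, stored_on, prev)
        self.entries.append((dict(record), stored_on, digest))  # store a copy
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited record changes its digest and fails."""
        prev = GENESIS
        for record, stored_on, digest in self.entries:
            if _digest(record, stored_on, prev) != digest:
                return False
            prev = digest
        return True

    def expired(self, today: date) -> list:
        """Records older than the retention period (reported, not deleted)."""
        return [r for r, stored_on, _ in self.entries if today - stored_on > self.retention]

store = RetentionStore(retention_days=365)
store.append({"cdr_id": 1, "duration": 42}, date(2010, 1, 1))
store.append({"cdr_id": 2, "duration": 7}, date(2011, 3, 1))
print(store.verify(), store.expired(date(2011, 4, 1)))
```

Expired records are only reported here: physically purging them would break verification from the start of the chain unless the chain is re-anchored, a detail this sketch leaves out.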
  32. RainStor’s Disruptive Technology
      • Patented: 4 layers of compression
      • Data reduction through value and pattern de-duplication
      • Further algorithmic-level and byte-level compression
      • Fast queries on data in its stored format, without re-inflation
      [Diagram: sample records built from repeating values (Smith, Brown; Peter, Paul, John; Pharma, Finance; $40,000, $35,000) illustrating value and pattern de-duplication]
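Value and pattern de-duplication can be sketched with dictionary encoding: each distinct value is stored once per column and replaced by a small id, and each distinct combination of ids (a row pattern) is stored once with a repeat count. A count can then be answered directly on the encoded form. The sample rows reuse the names and figures from the slide's diagram, but the records themselves, the function names and the encoding scheme are assumptions; the deck does not describe RainStor's patented method in this detail.

```python
from collections import Counter

COLUMNS = ("surname", "first_name", "dept", "salary")

# Hypothetical rows built from the values shown in the slide's diagram.
ROWS = [
    ("Smith", "Peter", "Pharma", 40_000),
    ("Smith", "Peter", "Pharma", 40_000),
    ("Brown", "Paul", "Finance", 35_000),
    ("Smith", "John", "Pharma", 40_000),
    ("Brown", "Paul", "Finance", 35_000),
]

def encode(rows):
    """Value de-dup: each distinct value is stored once per column, replaced by an id.
    Pattern de-dup: each distinct combination of ids is stored once, with a count."""
    dictionaries = [dict() for _ in COLUMNS]  # per column: value -> id
    patterns = Counter()                      # id tuple -> number of rows
    for row in rows:
        ids = tuple(d.setdefault(value, len(d)) for d, value in zip(dictionaries, row))
        patterns[ids] += 1
    return dictionaries, patterns

def count_where(dictionaries, patterns, column, value):
    """COUNT(*) WHERE column = value, evaluated on ids without rebuilding rows."""
    col = COLUMNS.index(column)
    value_id = dictionaries[col].get(value)
    if value_id is None:
        return 0
    return sum(n for ids, n in patterns.items() if ids[col] == value_id)

dictionaries, patterns = encode(ROWS)
print(len(patterns), "patterns,", sum(len(d) for d in dictionaries), "distinct values")
print("Pharma rows:", count_where(dictionaries, patterns, "dept", "Pharma"))
```

Five rows shrink to three stored patterns over nine distinct values, and `count_where` answers the filter by comparing ids rather than re-inflating the original rows.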
  33. Offload Warehouse Data to an Online Archive: High Performance and Lower Cost
      • Augment existing warehouse and analytics systems by providing access to years of history
      • Run queries on RainStor and import the results into the data warehouse
      • Reinstate data from the data retention repository back into the warehouse for deep analytics
      Benefits:
      • Lower TCO (admin, storage, CPU)
      • Compliant data retention
      • Unlimited scalability
      • Add more data sources for broader analysis
      [Diagram: source DB (e.g. Oracle), analytics/DW holding 5 quarters of data, online archive holding 50 quarters]
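The offload pattern on this slide can be sketched with two in-memory SQLite databases, one for the warehouse and one for the online archive. The 5-quarter warehouse window and 50 quarters of history come from the slide's diagram; the `sales` schema, the data and the function names are hypothetical. `offload` moves old quarters out, `import_archive_summary` runs a query on the archive and imports only its result, and `reinstate` copies a quarter back for deep analytics.

```python
import sqlite3

DW_QUARTERS = 5  # the warehouse keeps only the most recent 5 quarters

def setup():
    dw = sqlite3.connect(":memory:")
    archive = sqlite3.connect(":memory:")
    for db in (dw, archive):
        db.execute("CREATE TABLE sales (quarter INTEGER, region TEXT, amount REAL)")
    # Hypothetical history: quarters 1..50, one row each, initially all in the warehouse.
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                   [(q, "EMEA" if q % 2 else "APAC", 100.0) for q in range(1, 51)])
    return dw, archive

def offload(dw, archive, current_quarter):
    """Move rows older than the warehouse window into the online archive."""
    cutoff = current_quarter - DW_QUARTERS + 1
    old = dw.execute("SELECT quarter, region, amount FROM sales WHERE quarter < ?",
                     (cutoff,)).fetchall()
    archive.executemany("INSERT INTO sales VALUES (?, ?, ?)", old)
    dw.execute("DELETE FROM sales WHERE quarter < ?", (cutoff,))
    return len(old)

def import_archive_summary(dw, archive):
    """Query the archive and import only the (small) result into the warehouse."""
    summary = archive.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    dw.execute("CREATE TABLE IF NOT EXISTS historic_sales_by_region (region TEXT, total REAL)")
    dw.executemany("INSERT INTO historic_sales_by_region VALUES (?, ?)", summary)
    return summary

def reinstate(dw, archive, quarter):
    """Copy one archived quarter back into the warehouse for deep analytics."""
    rows = archive.execute("SELECT quarter, region, amount FROM sales WHERE quarter = ?",
                           (quarter,)).fetchall()
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return len(rows)

dw, archive = setup()
print("offloaded rows:", offload(dw, archive, current_quarter=50))
print("archive summary:", import_archive_summary(dw, archive))
```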
  34. RainStor Cloud
      1. Compressed, de-duplicated data is sent to the cloud, resulting in quicker and cheaper uploads.
      2. Encrypted data is stored in private containers, ensuring security and easy management.
      3. Data is accessed on demand using standard SQL tools, leveraging the elasticity of the cloud.
      [Diagram: VM software appliance; Amazon S3 (store) and EC2 (search); send, store and search flow; ODBC/JDBC access]
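The order of the first two steps matters: compression only pays off before encryption, because good ciphertext looks like random bytes and does not compress. The sketch below shows that effect with the standard-library `zlib` module; since no encryption library is assumed, `os.urandom` output stands in for ciphertext, and the CDR-like records are invented.

```python
import os
import zlib

# Hypothetical, highly repetitive CDR-like records (42 bytes x 10,000 lines).
records = b"".join(
    b"2011-04-01,+441234567890,+441098765432,42\n" for _ in range(10_000)
)

# Compress first: repetitive structured data shrinks dramatically.
compressed = zlib.compress(records, 9)

# Stand-in for ciphertext: strong encryption output is indistinguishable from
# random bytes, so random data of the same length is used instead of a real cipher.
ciphertext_like = os.urandom(len(records))
compressed_after_encryption = zlib.compress(ciphertext_like, 9)

print("raw:", len(records))
print("compressed before encryption:", len(compressed))
print("compressed after encryption:", len(compressed_after_encryption))
```

Compressing after encryption leaves the payload essentially as large as the original, so the upload savings described in step 1 depend on compressing before encrypting.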
  35. How Do the Economics Stack Up?
  36. Quick summary
      • The growing volume, variety and velocity of data is a problem, but it is also an opportunity
      • This requires a broader approach to data management
      • Deploy appliances and Hadoop for specific use cases, and an online repository for historical data
      • ‘Datastructure’ will become increasingly valuable, not only as a source of data but also as a source of intelligence
      • Data location, and the role of data virtualization, will come into greater focus
  37. Q&A
  38. FULL TIME
      Thank you