Scalar Unstructured Data, April 28, 2010


Join our guest, Vale Inco, a leading worldwide producer of nickel, and Scalar for an informative session providing you with insight on how to:
• Automate data management tasks to free up IT resources and eliminate downtime
• Get better utilization out of your storage resources
• Utilize storage policies to better manage and optimize use of storage devices
• Easily add and manage storage policies for all devices from a single management console
• Reduce overall storage costs by 50 to 80%
• Cut migration times by up to 90% with zero impact to users during migration
• Reduce backup times and costs by up to 90%

Published in: Technology
  • Hassle-free access to the technologies you need:
    • 21 vendors’ products on display with remote access
    • Product demonstrations and hands-on
    • Customer Proof-of-Concepts in person or via remote connections
    • Interoperability testing between servers, networks and storage
    • Access to direct vendor assistance as needed
    • Convenient downtown Toronto location near Yonge and King
    • Events, tours and special requests
    EMC Avamar and Data Domain: in our lab, with site-to-site replication between here and the Vancouver office. Available for product demonstrations, evaluations and POCs.
    Scalar Labs also hosts bi-weekly training sessions for our customers on Fridays over lunch: no charge to participate; technical topics only, no sales / marketing material. View the schedule and register on
  • Scalar unstructured data april 28, 2010

    1. 1. Unstructured Data<br />Managing Growth of Unstructured Data<br />Michael Traves<br />Chief Architect, Data<br />
    2. 2. Session Agenda:<br /><ul><li>Overview of Scalar Decisions
    3. 3. Unstructured Data
    4. 4. Challenges
    5. 5. Approaches
    6. 6. Solutions
    7. 7. Case Study – Vale Inco
    8. 8. Tom Morrier
    9. 9. Next Steps
    10. 10. Unstructured Data Assessment
    11. 11. Activity: Demonstration @ ScalarLabs TGIF Session
    12. 12. Questions & Answers
    13. 13. Draw</li></li></ul><li>Scalar Decisions – Who we are:<br />Toronto · Vancouver · Calgary · Ottawa · London · Kitchener · Guelph<br /><ul><li>Product and Solution delivery experts focussing on the most current technologies and complex business challenges
    14. 14. Technically led organization specializing in the design, deployment and management of complete IT Infrastructures
    15. 15. Key industry partnerships with leading technology solution vendors such as EMC and VMware</li></li></ul><li>What we do:<br />
    16. 16. Scalar Professional Services:<br />Architecture and Solution Design<br /><ul><li>Real World Experience
    17. 17. With our customers and at our own data centres using proven architectures and solutions
    18. 18. End-to-end Consulting
    19. 19. From up-front assessments to long-term architecture considerations
    20. 20. Holistic Vision
    21. 21. Scalar designs, deploys and manages the entire IT stack including eco considerations</li></ul>System Implementation<br />Capacity Planning<br />Health Checks<br />Storage and System Consolidation<br />Converged Network Infrastructure<br />
    22. 22. Scalar Leadership in Managed Services<br /><ul><li>Highly flexible, scalable and affordable managed services for customer IT environments
    23. 23. Multiple data centre hosting facilities, plus full remote management offerings at customer sites
    24. 24. Virtualized offerings include:
    25. 25. Cloud computing for primary or dev/test environment
    26. 26. Remote VMs / hosted DR at multiple sites
    27. 27. Remote monitoring of ESX and hardware platform</li></li></ul><li>Unify your test environment @20+ vendor products available to platform test <br />the systems<br />the network<br />the storage<br />
    28. 28. The Data Management Challenge<br />
    29. 29. The Traditional Infrastructure Problem<br />
    30. 30. The Challenges with Unstructured Data<br /><ul><li>Storage growth rates that average 40-120% CAGR.
    31. 31. Storage environments becoming increasingly complex and difficult to manage
    32. 32. Inconsistent utilization of storage resources
    33. 33. Skyrocketing storage and backup costs
    34. 34. Lengthy data migrations and consolidations
    35. 35. Backup times that exceed backup windows
    36. 36. Costly downtime caused by disruptive data and capacity management</li></li></ul><li>The Challenge: Data Growth<br /><ul><li>Growth increases complexity and administrative burden
    37. 37. Most companies are still managing growth reactively. Where do you put new data when your filesystems fill up?
    38. 38. If you aren’t able to dynamically increase the size of a file system (pooling, thin provisioning, etc), how do you move data between filesystems/servers without impacting users?
    39. 39. When you need to increase capacity, how long does it usually take to acquire, deploy and provision it? Do you play the data “shell” game until it’s ready?
    40. 40. What if the new storage isn’t the same type/brand/release as the current? How does this affect integration and manageability?</li></li></ul><li>The Challenge: File Count Growth<br /><ul><li>More files means more metadata. What’s the impact?
    41. 41. In high file count environments, you have a metadata problem, not a data problem.
    42. 42. Lots of small files complicate management strategies
    43. 43. Archiving, while one strategy to address data growth, actually increases file counts (stubs), making the problem worse
    44. 44. Backup and recovery of high file count filesystems are complex – “walking a filesystem” is usually an order of magnitude more time consuming than actually moving the data.
    45. 45. Using more, smaller filesystems to constrain file counts increases complexity and doesn’t really address the source of the problem</li></li></ul><li>The Challenge: Backup Windows<br /><ul><li>Large data volumes are resource intensive
    46. 46. File system backups are sequential (one job per filesystem) and take time. Multiple filesystems create management headaches.
    47. 47. Full backups of large amounts of data take time and chew up resources (either D2D, Tape, or Dedupe).
    48. 48. Most data doesn’t change week to week (80%+ is aged, static)
    49. 49. Large file counts create disk I/O constraints
    50. 50. A 72hr backup job can typically be 95% metadata processing and 5% data movement.
    51. 51. Solving the data problem with archiving can create the high file count problem</li></li></ul><li>The Challenge: Disruptive Migrations<br /><ul><li>Transitioning between new/old or different vendors
    52. 52. Storage is typically on a three year life cycle – which generally means four, if you account for migration in and migration out
    53. 53. How do you migrate large volumes of data between old and new storage platforms without impacting users?
    54. 54. How do you migrate between different types of technologies? For example, NetApp to EMC, EMC to BlueArc, Windows/UNIX to NAS?
    55. 55. When migrating between different NAS vendors, how do you leverage their proprietary vendor specific tools?</li></li></ul><li>The Challenge: Disparate Storage Platforms<br /><ul><li>Multi-vendor, multi-protocol environments
    56. 56. Managing multiple solutions is typical with unstructured data – UNIX (NFS) and Windows (CIFS) typically coexist. NAS appliances or gateways come into play when UNIX/Windows can’t scale
    57. 57. Having multiple protocols across multiple fileshares, on multiple servers/NAS solutions creates management complexity. Ensuring that each platform can grow/scale to meet demand is difficult to predict, and requires different strategies for managing growth
    58. 58. Different generations/brands of technology support different features and protocols. How do you integrate NFS3 and NFS4 across two different storage solutions? And what happens when you have to move a share from one device to the other due to space constraints?</li></li></ul><li>The Challenge: Scalability<br /><ul><li>Horizontal Scale-out, Vertical Scale-up, and Mobility
    59. 59. Scale-up strategies leverage the same server/NAS platform by adding capacity. This minimizes management overhead, assuming that filesystems can dynamically be scaled online.
    60. 60. This assumes that the existing system can sustain performance growth too
    61. 61. Scale-out strategies couple storage capacity with performance, ideally using the same building block for consistency. This is predictable, but creates allocation issues
    62. 62. Can a single fileshare span multiple devices? How are data and performance distributed?
    63. 63. How is data balanced across devices? Is this automated? Can data migrate between devices without impacting users?</li></li></ul><li>The Challenge: Inefficient Resource Utilization<br /><ul><li>Having multiple server/NAS devices presenting unstructured data creates administrative challenges
    64. 64. How do you manage capacity, when data on different devices grows at different rates?
    65. 65. How do you manage performance, when access patterns are unpredictable?
    66. 66. Is it possible to redistribute content between filesystems and devices to “optimize” utilization? How does this impact users?
    67. 67. When you do move a directory or share from one device to another (out of space issues anyone?), how does that impact backups? Generally, it’s included in your incremental backups.</li></li></ul><li>Approaches to Solving these Challenges<br />
    68. 68. Approaches to Managing Unstructured Data<br /><ul><li>Quota Management
    69. 69. Archiving
    70. 70. Bigger is Better
    71. 71. Tiering
    72. 72. Deduplication
    73. 73. Replicate the Problem</li></li></ul><li>Approach: Quota Management<br /><ul><li>Establish quotas to prevent users from storing “too much” data on home, project, etc folders
    74. 74. Pros
    75. 75. Limits the amount of data people can store in public folders
    76. 76. Cons
    77. 77. People always find places to store their data (desktop/laptops, external drives, etc) – usually outside the control and protection of IT
    78. 78. Drives helpdesk complaints, and constant “exceptions”
    79. 79. Does not address project/departmental folders
    80. 80. Does not move static data out of day-to-day management processes (i.e., backup/recovery)</li></li></ul><li>Approach: Archiving<br /><ul><li>Moves inactive, static content from primary storage to lower cost storage, reducing backup data volumes
    81. 81. Pros
    82. 82. Reduces primary storage usage, and associated costs
    83. 83. Reduces backup volumes, reducing backup tape/disk usage
    84. 84. Cons
    85. 85. Requires stubs (for no user impact), which does not reduce file counts
    86. 86. Increasing file counts while decreasing data does not solve the backup problem – millions of files/stubs still take hours/days to process
    87. 87. The longer this strategy is employed, the more metadata/stubs you maintain, the worse the problem becomes</li></li></ul><li>Approach: Bigger is Better. More is Better<br /><ul><li>The philosophy of “buy more” to address growing storage requirements may address growth, but how does it address manageability?
    88. 88. Cons
    89. 89. More devices mean more to manage. How do you organize them?
    90. 90. When 80%+ of your data is static, how do you separate it from current/new data without impacting users?
    91. 91. More primary storage creates more costs, and more backup/recovery pain
    92. 92. Just because a new, larger NAS head is “faster” doesn’t mean you’ll be able to back it up or restore it “faster”.</li></li></ul><li>Approach: Tiering<br /><ul><li>By creating different tiers of storage (e.g., FC and SATA) in your environment, perhaps on different devices, you can put data with lower access/priority on lower cost/performing storage
    93. 93. Pros
    94. 94. Helps manage cost by prioritizing data placement
    95. 95. Cons
    96. 96. How do you decide what should go where?
    97. 97. What if priority or access patterns change?
    98. 98. At what level of granularity is this possible? Filesystem (LUN)? Directory? File? Block?</li></li></ul><li>Approach: Deduplication<br /><ul><li>Deduplication, combined with compression, can reduce your storage footprint across all your unstructured data
    99. 99. Pros
    100. 100. Deduplication can dramatically reduce the storage footprint for many types of data, promising lower storage costs long-term
    101. 101. Cons
    102. 102. Not all data deduplication is created equal. Is it block level, file level, or variable block level? What is the performance impact?
    103. 103. More efficient storage of static data is good, but if it’s still in the backup/recovery cycle, have you really addressed the problem?
    104. 104. Most solutions today still rehydrate the data during backup. So are you really saving anything for backups? What performance impact does this imply during backup/recovery operations?</li></li></ul><li>Approach: Replicate the Problem<br /><ul><li>When backup/recovery activities kill performance on your primary storage device(s), replicate the data (and delta changes) instead.
    105. 105. Pros
    106. 106. Allows you to back up the replication target instead of the source
    107. 107. Gets you a DR solution while moving the backup issue offsite
    108. 108. Cons
    109. 109. Active and Static data is still mixed, with the same policies and retentions being applied to each
    110. 110. Your storage costs have now doubled, and backup is still a (now remote site) problem. Snapshot history helps, but not forever. </li></li></ul><li>Solving the Unstructured Data Challenge<br />
    111. 111. Solving the Challenges<br /><ul><li>Automate data management tasks to free up IT resources and eliminate downtime
    112. 112. Get better utilization out of your storage resources
    113. 113. Utilize storage policies to better manage and optimize use of storage devices
    114. 114. Easily add and manage storage policies for all devices from a single management console
    115. 115. Reduce overall storage costs by 50 to 80%
    116. 116. Cut migration times by up to 90% with zero impact to users during migration
    117. 117. Reduce backup times and costs by up to 90%</li></li></ul><li>Solution: File Virtualization<br /><ul><li>Capacity Balancing
    118. 118. Balance data and I/O across multiple storage devices, making the most efficient use of your storage resources
    119. 119. Data Migration
    120. 120. Automatically migrate data between heterogeneous devices, without impacting user access – no downtime
    121. 121. Storage Tiering
    122. 122. Intelligently put data on the right type of storage based on metadata policies and aging criteria</li></li></ul><li>The Global Namespace (Wikipedia)<br /><ul><li>A Global Namespace is a heterogeneous, enterprise-wide abstraction of all file information, open to dynamic customization based on user-defined parameters. This becomes of particular importance as multiple network based file systems proliferate within an organization—the challenge becomes one of effective file management.
    123. 123. A Global NameSpace (GNS) has the unique ability to aggregate disparate and remote network based file systems, providing a consolidated view that can greatly reduce complexities of localized file management and administration. For example, prior to file system namespace consolidation, two servers exist and each represent their own independent namespaces; e.g. \\server1\share1 & \\server2\share2. Various files exist within each share respectively, however users have to access each namespace independently. This becomes an obvious challenge as the number of namespaces grows within an organization.
    124. 124. With a GNS, an organization can access a virtualized file system namespace; e.g. files now exist under a unified structure, such as \\company.com\share1, share2, where the files exist in multiple physical server\share locations but appear to be part of a single namespace</li></li></ul><li>Implementation of a Global Namespace<br />
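
As a rough illustration of the global namespace idea, the sketch below maps virtual shares to physical server\share locations. It is a minimal sketch, not how any particular product implements it: the `NAMESPACE` table, the `resolve` and `migrate` helpers, and all server and share names are hypothetical.

```python
# Hypothetical global-namespace table: virtual share -> physical location.
# All server and share names here are illustrative assumptions.
NAMESPACE = {
    r"\\company.com\share1": r"\\server1\share1",
    r"\\company.com\share2": r"\\server2\share2",
}


def resolve(virtual_path: str) -> str:
    """Translate a path in the unified namespace to its physical location."""
    for virtual_root, physical_root in NAMESPACE.items():
        # Match the share itself or anything beneath it.
        if virtual_path == virtual_root or virtual_path.startswith(virtual_root + "\\"):
            return physical_root + virtual_path[len(virtual_root):]
    raise KeyError(f"no namespace entry covers {virtual_path}")


def migrate(virtual_root: str, new_physical_root: str) -> None:
    """Repoint a virtual share at new storage; the client-facing path is unchanged."""
    NAMESPACE[virtual_root] = new_physical_root
```

Because clients only ever see the virtual path, repointing an entry with `migrate` changes where the data lives without changing how users reach it. Copying the data itself is outside the scope of this sketch.
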
    125. 125. Capacity Balancing<br /><ul><li>Automatically balance capacity across multiple file servers and NAS appliances
    126. 126. Make the best use of your current and future storage capacity
    127. 127. Eliminate the need to manually rebalance data – use automated, policy driven tools instead
    128. 128. Reduce storage costs, management complexities, and eliminate downtime due to maintenance</li></li></ul><li>Capacity Balancing<br />
    129. 129. Data Migration<br /><ul><li>Move data between storage devices on your schedule, not your users’ – seamless access to data during migration means no scheduled downtime
    130. 130. Transitioning from one generation of technology to another is now a scheduled task, not a 6 month project
    131. 131. Keep your vendors competitive – without the pain of data migration projects, your choice of solution comes down to features and costs. Why pay more by being locked in?
    132. 132. Automatically, online, and without disrupting your business, migrate your file infrastructure with zero downtime and no complex administrative burden.</li></li></ul><li>Data Migration<br />
    133. 133. Storage Tiering<br /><ul><li>With multiple tiers of storage in your environment, you now have the power to cost effectively store data based on policies you establish – age, access, type, etc
    134. 134. Keep current data on faster, regularly backed up storage, while segregating static, older content that isn’t changing to lower tiers
    135. 135. Eliminate backup of over 80% of your data by cycling it out of the regular backup scheme
    136. 136. Shrink your backups and related costs, improve recovery windows, and store data on the right tier - creating efficiencies and capital cost savings at multiple levels</li></li></ul><li>Storage Tiering<br />
    137. 137. Storage Tiering – Granular Value-based Policy<br />
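
An age-based tiering policy of the kind described in the Storage Tiering slides might be sketched as follows. This is only an assumption-laden illustration: the 90-day threshold, the tier names, and the `choose_tier` and `plan_moves` helpers are hypothetical and do not come from the slides or from any product.

```python
from datetime import datetime, timedelta
from typing import Dict

# Assumed policy: files not modified within AGE_LIMIT belong on Tier 2.
AGE_LIMIT = timedelta(days=90)  # hypothetical threshold, not from the slides


def choose_tier(last_modified: datetime, now: datetime) -> str:
    """Return the tier a file belongs on, based only on its age."""
    return "tier2" if now - last_modified > AGE_LIMIT else "tier1"


def plan_moves(files: Dict[str, datetime], now: datetime) -> Dict[str, str]:
    """Map each file path to its target tier under the age policy."""
    return {path: choose_tier(mtime, now) for path, mtime in files.items()}
```

A real policy would likely also consider access time and file type, as the slides suggest; age alone is used here to keep the sketch short.
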
    138. 138. The Benefits of File Virtualization<br /><ul><li>Capacity Balancing
    139. 139. Utilize your existing storage assets better
    140. 140. Optimize access performance and eliminate issues that impact user productivity (scheduled and unscheduled)
    141. 141. Pool the resources of servers and NAS appliances you already own, achieving better asset utilization and realized cost savings
    142. 142. Eliminate downtime and reconfiguration activities. Enable non-disruptive data management
    143. 143. Create process efficiencies in your organization through the elimination of administrator “shell-game” tasks.</li></li></ul><li>The Benefits of File Virtualization<br /><ul><li>Data Migration
    144. 144. No client reconfiguration – with a virtualized, global name space, the location of data is policy and administrator controlled. Moving data around does not impact access to it.
    145. 145. Move entire file systems or individual files around without interrupting access to them.
    146. 146. Reduce the overhead of migration projects with a streamlined, consistent, automated solution.
    147. 147. No stubs. Ever. Leaving stubs or pointers around in the filesystem does not solve the backup problem, and long-term this can become a management headache!</li></li></ul><li>The Benefits of File Virtualization<br /><ul><li>Storage Tiering
    148. 148. Reduce your storage costs by putting data on the right (cost) tier of storage – automated, policy driven.
    149. 149. Reduce your backup volumes dramatically by moving aged data out of the daily/weekly/monthly backup cycle. Back static data up once a quarter or less, with proper retention practices.
    150. 150. Tiering without Administrative overhead. Automate the challenge of what goes where, and save yourself the trouble.
    151. 151. Improve your storage utilization across all tiers and devices, automatically, as granular as the file level – without stubs!</li></li></ul><li>Storage Tiering – Optimizing Backups<br />
    152. 152. Case Study: Vale Inco<br /><ul><li>Challenges
    153. 153. Backup Windows
    154. 154. Impact to production during backups
    155. 155. Too much data, high growth, archiving partially implemented
    156. 156. How we helped
    157. 157. Information Life Cycle Management Assessment
    158. 158. Reviewing all aspects of data in their environment
    159. 159. Current State Analysis
    160. 160. Future State Recommendations
    161. 161. Technology and Design Recommendations
    162. 162. and now….. Tom Morrier!</li></li></ul><li>Introduction<br />Tom Morrier<br />Vale Inco Limited<br />Once Storage Administrator<br />Now Project Manager<br />Still Secretly the Storage Administrator<br />
    163. 163. Killing 5 Birds with one Appliance<br />
    164. 164. Our Problem(s)<br />Extremely large volumes of data growing out of control<br />Millions of files, many of them under 1k in size<br />Aging End Of Life Data Archiving solution<br />5 day backup times<br />Backups were running during business hours<br />Small change windows to take outages in<br />24 hour operation that does not like down time.<br />
    165. 165. The Solution<br />Two pairs of ARX 4000s <br /> 1 pair in our Primary DC<br /> 1 pair in our Largest Site<br />
    166. 166. How We Used the ARX<br />Tier 1<br />Tier 2<br />5 TB<br />3 TB<br />2.5 TB<br />4 TB<br />
    167. 167. The Results<br />Backup Times<br />98 hours went to 28 hours<br />5 streams have been turned into 14 streams, 4 of which only happen once a month<br />In primary DC backup times went from 110 hours for 1 full backup to 21 hours over 5 streams for the same full<br />Archiving has been undone in one site and is under way in the other<br />Re-archive based on change through tiering<br />All data moves were done during business hours without impacting user data access<br />
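
The backup times quoted in the results can be turned into percentage reductions with a short calculation. The helper name `reduction_pct` is hypothetical; the hour figures are the ones stated on the slide.

```python
def reduction_pct(before_hours: float, after_hours: float) -> float:
    """Percentage reduction in backup time, rounded to one decimal place."""
    return round(100 * (before_hours - after_hours) / before_hours, 1)


# Figures from the slide: 98 h -> 28 h, and 110 h -> 21 h in the primary DC.
print(reduction_pct(98, 28))   # 71.4
print(reduction_pct(110, 21))  # 80.9
```

So the reported improvements are roughly a 71% and an 81% reduction in backup time.
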
    168. 168. Some Bonus Results<br />Tape usage has gone down thanks to tiering<br />Data types can be isolated<br />Old systems still accessing network storage surface <br />Strange connections get identified<br />MP3 library gets a boost !<br />
    169. 169. Questions<br />?<br />
    170. 170. How we can help you – two approaches<br /><ul><li>Information Life Cycle Assessment
    171. 171. We’ll look at all aspects of your data environment (online, nearline, offline), processes, and applications, and provide guidance on how to get from “current state” to your desired “future state” given the challenges specific to your business.
    172. 172. Unstructured Data Targeted – “Quick Assessment”
    173. 173. We’ll target your file servers and NAS appliances with tools specifically designed to capture and analyze your unstructured data environment, provide recommendations on design and TCO/ROI, and give a business justification showing how, where, and with what impact a File Virtualization solution would work for you.</li></li></ul><li>Unstructured Data Assessment – what’s involved?<br /><ul><li>Discovery
    174. 174. Through a ½ day workshop, we will gather information about your processes, policies, infrastructure and challenges.
    175. 175. A data collection tool will be installed (non-invasive) to capture the metadata for the target filesystem shares (server/NAS)
    176. 176. Analysis
    177. 177. The captured data will be analyzed to determine what efficiencies would be realized, and the best design case
    178. 178. Presentation of Results
    179. 179. We will present the results of the analysis, along with recommendations on how to realize the benefits of file virtualization
    180. 180. A mapping of benefits to your specific challenges will help build an ROI/TCO for business justification
    181. 181. Specific design recommendations and costs will be presented.</li></li></ul><li>The Tool – F5 Data Manager<br />
    182. 182. Unstructured Data Assessment<br /><ul><li>Results
    183. 183. Justification through soft and hard dollar cost savings will be presented to help establish a business case for deployment in your environment
    184. 184. Costs</li></ul>Free to Session Attendees<br />Because we believe this solution can be proven out as a cost effective, highly impactful way of managing unstructured data growth, we are presenting this 2 ½ day engagement free of charge.<br />
    185. 185. Next Steps<br /><ul><li>Unstructured Data Assessment
    186. 186. Learn how file virtualization can benefit your environment
    187. 187. Free of charge for attendees who complete the survey
    188. 188. Inquire for additional information (see handout)
    189. 189. Complete your Survey
    190. 190. Be sure to complete the survey for your chance to win a Netbook!
    191. 191. Beer Tasting
    192. 192. Join us for a sampling of Duggin’s Beer</li>