Scalar unstructured data april 28, 2010

Join our guest, Vale Inco, worldwide leading producer of nickel, and Scalar for an informative session providing you insight on how to:
•Automate data management tasks to free up IT resources and eliminate downtime
•Get better utilization out of your storage resources
•Utilize storage policies to better manage and optimize use of storage devices
•Easily add and manage storage policies for all devices from a single management console
•Reduce overall storage costs by 50 to 80%
•Cut migration times by up to 90% with zero impact to users during migration
•Reduce backup times and costs by up to 90%



  • Hassle-free access to the technologies you need
    • 21 vendors’ products on display with remote access
    • Product demonstrations and hands-on
    • Customer Proof-of-Concepts in person or via remote connections
    • Interoperability Testing between servers, networks and storage
    • Access to direct vendor assistance as needed
    • Convenient downtown Toronto location near Yonge and King
    • Events, tours and special requests
    • EMC Avamar and Data Domain – in our lab, with site-to-site replication between here and Vancouver office. Available for product demonstrations, evaluations and POCs.
    • Scalar Labs also hosts bi-weekly training sessions for our customers on Fridays over lunch: no charge to participate, technical topics only, no sales / marketing material. View the schedule and register on

Scalar unstructured data april 28, 2010 Presentation Transcript

  • Unstructured Data
    Managing Growth of Unstructured Data
    Michael Traves, Chief Architect, Data
  • Session Agenda:
    • Overview of Scalar Decisions
    • Unstructured Data
    • Challenges
    • Approaches
    • Solutions
    • Case Study – Vale Inco
    • Tom Morrier
    • Next Steps
    • Unstructured Data Assessment
    • Activity: Demonstration @ ScalarLabs TGIF Session
    • Questions & Answers
    • Draw
  • Scalar Decisions – Who we are: Toronto · Vancouver · Calgary · Ottawa · London · Kitchener · Guelph
    • Product and Solution delivery experts focussing on the most current technologies and complex business challenges
    • Technically led organization specializing in the design, deployment and management of complete IT Infrastructures
    • Key industry partnerships with leading technology solution vendors such as EMC and VMware
  • What we do:
  • Scalar Professional Services:
    Architecture and Solution Design
    • Real World Experience
    • With our customers and at our own data centres using proven architectures and solutions
    • End-to-end Consulting
    • From up-front assessments to long-term architecture considerations
    • Holistic Vision
    • Scalar designs, deploys and manages the entire IT stack including eco considerations
    System Implementation
    Capacity Planning
    Health Checks
    Storage and System Consolidation
    Converged Network Infrastructure
  • Scalar Leadership in Managed Services
    • Highly flexible, scalable and affordable managed services for customer IT environments
    • Multiple data centre hosting facilities, plus full remote management offerings at customer sites
    • Virtualized offerings include:
    • Cloud computing for primary or dev/test environment
    • Remote VMs / hosted DR at multiple sites
    • Remote monitoring of ESX and hardware platform
  • Unify your test environment @20+ vendor products available to platform test
    the systems
    the network
    the storage
  • The Data Management Challenge
  • The Traditional Infrastructure Problem
  • The Challenges with Unstructured Data
    • Storage growth rates that average 40-120% CAGR.
    • Storage environments becoming increasingly complex and difficult to manage
    • Inconsistent utilization of storage resources
    • Skyrocketing storage and backup costs
    • Lengthy data migrations and consolidations
    • Backup times that exceed backup windows
    • Costly downtime caused by disruptive data and capacity management
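To make the 40-120% CAGR range concrete, here is a minimal sketch of compound growth. The 10 TB starting point and the three-year horizon are illustrative assumptions, not figures from the session.

```python
def project_capacity(start_tb: float, cagr: float, years: int) -> list[float]:
    """Return projected capacity (TB) for year 0..years at a fixed compound annual growth rate."""
    return [start_tb * (1 + cagr) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical environment: 10 TB today, projected over a 3-year storage life cycle.
    for rate in (0.40, 1.20):
        projection = project_capacity(10.0, rate, 3)
        print(f"{rate:.0%} CAGR:", [round(tb, 1) for tb in projection])
```

Under these assumptions, 10 TB becomes roughly 27 TB after three years at 40% growth and roughly 106 TB at 120% growth, which is why capacity bought for "today plus a bit" fills up well before the hardware is retired.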
  • The Challenge: Data Growth
    • Growth increases complexity and administrative burden
    • Most companies are still managing growth reactively. Where do you put new data when your filesystems fill up?
    • If you aren’t able to dynamically increase the size of a file system (pooling, thin provisioning, etc), how do you move data between filesystems/servers without impacting users?
    • When you need to increase capacity, how long does it usually take to acquire, deploy and provision it? Do you play the data “shell” game until it’s ready?
    • What if the new storage isn’t the same type/brand/release as the current? How does this affect integration and manageability?
  • The Challenge: File Count Growth
    • More files means more metadata. What’s the impact?
    • In high file count environments, you have a metadata problem, not a data problem.
    • Lots of small files complicate management strategies
    • Archiving, while one strategy to address data growth, actually increases file counts (stubs), creating more of a problem
    • Backup and recovery of high file count filesystems are complex – “walking a filesystem” is usually an order of magnitude more time consuming than actually moving the data.
    • Creating more, smaller filesystems to constrain file counts increases complexity and doesn’t really address the source of the problem
  • The Challenge: Backup Windows
    • Large data volumes are resource intensive
    • File system backups are sequential (one job per filesystem) and take time. Multiple filesystems create management headaches.
    • Full backups of large amounts of data take time and chew up resources (either D2D, Tape, or Dedupe).
    • Most data doesn’t change week to week (80%+ is aged, static)
    • Large file counts create disk I/O constraints
    • A 72hr backup job can typically be 95% metadata processing and 5% data movement.
    • Solving the data problem with archiving can create the high file count problem
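The metadata-versus-data split can be sketched with a simple two-term model: time spent walking files plus time spent moving bytes. The file count, per-file overhead, and throughput below are hypothetical values chosen only to land near the 72-hour, 95%/5% example; they are not measurements.

```python
def backup_hours(file_count: int, per_file_seconds: float,
                 data_tb: float, throughput_tb_per_hour: float) -> tuple[float, float]:
    """Split an estimated backup job into (metadata_hours, data_hours)."""
    metadata_hours = file_count * per_file_seconds / 3600  # filesystem walk
    data_hours = data_tb / throughput_tb_per_hour          # actual data movement
    return metadata_hours, data_hours


if __name__ == "__main__":
    # Assumed: 50 million files, 5 ms of metadata work per file, 1 TB/hour throughput.
    for label, data_tb in (("before archiving", 3.6), ("after archiving half the data", 1.8)):
        meta, data = backup_hours(50_000_000, 0.005, data_tb, 1.0)
        total = meta + data
        print(f"{label}: {total:.1f} h total, {meta / total:.0%} metadata")
```

In this model, archiving half the data while leaving a stub for every file shortens the job by less than two hours, because the file count (and so the walk time) is unchanged.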
  • The Challenge: Disruptive Migrations
    • Transitioning between new/old or different vendors
    • Storage is typically on a three year life cycle – which generally means four, if you account for migration in and migration out
    • How do you migrate large volumes of data between old and new storage platforms without impacting users?
    • How do you migrate between different types of technologies? I.e., NetApp to EMC, EMC to BlueArc, Windows/UNIX to NAS?
    • When migrating between different NAS vendors, how do you leverage their proprietary vendor specific tools?
  • The Challenge: Disparate Storage Platforms
    • Multi-vendor, multi-protocol environments
    • Managing multiple solutions is typical with unstructured data – UNIX (NFS) and Windows (CIFS) typically coexist. NAS appliances or gateways come into play when UNIX/Windows can’t scale
    • Having multiple protocols across multiple fileshares, on multiple servers/NAS solutions creates management complexity. Ensuring that each platform can grow/scale to meet demand is difficult to predict, and requires different strategies for managing growth
    • Different generations/brands of technology support different features and protocols. How do you integrate NFS3 and NFS4 across two different storage solutions? And what happens when you have to move a share from one device to the other due to space constraints?
  • The Challenge: Scalability
    • Horizontal Scale-out, Vertical Scale-up, and Mobility
    • Scale-up strategies leverage the same server/NAS platform by adding capacity. This minimizes management overhead, assuming that filesystems can dynamically be scaled online.
    • This assumes that the existing system can sustain performance growth too
    • Scale-out strategies couple storage capacity with performance, ideally using the same building block for consistency. This is predictable, but creates allocation issues
    • Can a single fileshare span multiple devices? How are data and performance distributed?
    • How is data balanced across devices? Is this automated? Can data migrate between devices without impacting users?
  • The Challenge: Inefficient Resource Utilization
    • Having multiple server/NAS devices presenting unstructured data creates administrative challenges
    • How do you manage capacity, when data on different devices grows at different rates?
    • How do you manage performance, when access patterns are unpredictable?
    • Is it possible to redistribute content between filesystems and devices to “optimize” utilization? How does this impact users?
    • When you do move a directory or share from one device to another (out of space issues anyone?), how does that impact backups? Generally, it’s included in your incremental backups.
  • Approaches to Solving these Challenges
  • Approaches to Managing Unstructured Data
    • Quota Management
    • Archiving
    • Bigger is Better
    • Tiering
    • Deduplication
    • Replicate the Problem
  • Approach: Quota Management
    • Establish quotas to prevent users from storing “too much” data on home, project, etc folders
    • Pro’s
    • Limits the amount of data people can store in public folders
    • Con’s
    • People always find places to store their data (desktop/laptops, external drives, etc) – usually outside the control and protection of IT
    • Drives helpdesk complaints, and constant “exceptions”
    • Does not address project/departmental folders
    • Does not move static data out of day-to-day management processes (i.e., backup/recovery)
  • Approach: Archiving
    • Moves inactive, static content from primary storage to lower cost storage, reducing backup data volumes
    • Pro’s
    • Reduces primary storage usage, and associated costs
    • Reduces backup volumes, reducing backup tape/disk usage
    • Con’s
    • Requires stubs (for no user impact), which does not reduce file counts
    • Increasing file counts while decreasing data does not solve the backup problem – millions of files/stubs still take hours/days to process
    • The longer this strategy is employed, the more metadata/stubs you maintain, the worse the problem becomes
  • Approach: Bigger is Better. More is Better
    • The philosophy of “buy more” to address growing storage requirements may address growth, but how does it address manageability?
    • Con’s
    • More devices mean more to manage. How do you organize them?
    • When 80%+ of your data is static, how do you separate it from current/new data without impacting users?
    • More primary storage creates more costs, and more backup/recovery pain
    • Just because a new, larger NAS head is “faster” doesn’t mean you’ll be able to back it up or restore it “faster”.
  • Approach: Tiering
    • By creating different tiers of storage (i.e., FC and SATA) in your environment, perhaps on different devices, you can put data with lower access/priority on lower cost/performing storage
    • Pro’s
    • Helps manage cost by prioritizing data placement
    • Con’s
    • How do you decide what should go where?
    • What if priority or access patterns change?
    • At what level of granularity is this possible? Filesystem (LUN)? Directory? File? Block?
  • Approach: Deduplication
    • Deduplication, combined with compression, can reduce your storage footprint across all your unstructured data
    • Pro’s
    • Deduplication can dramatically reduce the storage footprint for many types of data, promising lower storage costs long-term
    • Con’s
    • Not all data deduplication is created equal. Is it block level, file level, or variable block level? What is the performance impact?
    • More efficient storage of static data is good, but if it’s still in the backup/recovery cycle, have you really addressed the problem?
    • Most solutions today still rehydrate the data during backup. So are you really saving anything for backups? What performance impact does this imply during backup/recovery operations?
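As a rough illustration of the block-level option mentioned above, the sketch below counts unique fixed-size blocks by hash. It is a toy model, assuming a 4 KB block size and SHA-256 fingerprints; it does not represent how any particular vendor's product works, and it ignores variable-block and file-level schemes.

```python
import hashlib


def dedup_ratio(blobs: list[bytes], block_size: int = 4096) -> float:
    """Return logical blocks divided by unique blocks for fixed-size block deduplication."""
    seen: set[bytes] = set()
    total_blocks = 0
    for blob in blobs:
        for offset in range(0, len(blob), block_size):
            block = blob[offset:offset + block_size]
            total_blocks += 1
            seen.add(hashlib.sha256(block).digest())  # fingerprint identifies duplicate blocks
    return total_blocks / len(seen) if seen else 1.0


if __name__ == "__main__":
    file_a = b"A" * 8192                # two identical 4 KB blocks
    file_b = b"A" * 4096 + b"B" * 4096  # one block shared with file_a, one new
    print(f"dedup ratio: {dedup_ratio([file_a, file_b]):.1f}:1")
```

The ratio only describes how the data is stored. A backup that rehydrates the data still reads and moves every logical block, which is the concern raised in the last bullet.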
  • Approach: Replicate the Problem
    • When backup/recovery activities kill performance on your primary storage device(s), replicate the data (and delta changes) instead.
    • Pro’s
    • Allows you to back up the replication target, instead of the source
    • Gets you a DR solution while moving the backup issue offsite
    • Con’s
    • Active and Static data is still mixed, with the same policies and retentions being applied to each
    • Your storage costs have now doubled, and backup is still a (now remote site) problem. Snapshot history helps, but not forever.
  • Solving the Unstructured Data Challenge
  • Solving the Challenges
    • Automate data management tasks to free up IT resources and eliminate downtime
    • Get better utilization out of your storage resources
    • Utilize storage policies to better manage and optimize use of storage devices
    • Easily add and manage storage policies for all devices from a single management console
    • Reduce overall storage costs by 50 to 80%
    • Cut migration times by up to 90% with zero impact to users during migration
    • Reduce backup times and costs by up to 90%
  • Solution: File Virtualization
    • Capacity Balancing
    • Balance data and I/O across multiple storage devices, making the most efficient use of your storage resources
    • Data Migration
    • Automatically migrate data between heterogeneous devices, without impacting user access – no downtime
    • Storage Tiering
    • Intelligently put data on the right type of storage based on metadata policies and aging criteria
  • The Global Namespace (Wikipedia)
    • A Global Namespace is a heterogeneous, enterprise-wide abstraction of all file information, open to dynamic customization based on user-defined parameters. This becomes of particular importance as multiple network based file systems proliferate within an organization—the challenge becomes one of effective file management.
    • A Global NameSpace (GNS) has the unique ability to aggregate disparate and remote network based file systems, providing a consolidated view that can greatly reduce complexities of localized file management and administration. For example, prior to file system namespace consolidation, two servers exist and each represent their own independent namespaces; e.g. \\server1\share1 & \\server2\share2. Various files exist within each share respectively, however users have to access each namespace independently. This becomes an obvious challenge as the number of namespaces grows within an organization.
    • With a GNS, an organization can access a virtualized file system namespace; e.g. files now exist under a unified structure, such as \\company.com\share1, \share2 – where the files exist in multiple physical server\share locations but appear to be part of a single namespace
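The idea of a global namespace can be sketched as a lookup table from virtual shares to physical server\share locations. The table contents, the `resolve` helper, and the UNC-style paths below are illustrative assumptions built on the example names above, not the behaviour of any specific product.

```python
# Hypothetical mapping: virtual share name -> physical location.
NAMESPACE = {
    "share1": r"\\server1\share1",
    "share2": r"\\server2\share2",
}


def resolve(virtual_path: str, namespace: dict[str, str] = NAMESPACE,
            root: str = r"\\company.com") -> str:
    """Translate a path under the unified root into its current physical location."""
    prefix = root + "\\"
    if not virtual_path.startswith(prefix):
        raise ValueError(f"{virtual_path!r} is not under {root!r}")
    share, _, rest = virtual_path[len(prefix):].partition("\\")
    physical = namespace[share]  # KeyError if the share is not published
    return physical + ("\\" + rest if rest else "")


if __name__ == "__main__":
    print(resolve(r"\\company.com\share1\reports\q1.xls"))
```

Because clients only ever use the virtual path, moving share1 to a different server means updating one entry in the table rather than reconfiguring every client.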
  • Implementation of a Global Namespace
  • Capacity Balancing
    • Automatically balance capacity across multiple file servers and NAS appliances
    • Make the best use of your current and future storage capacity
    • Eliminate the need to manually rebalance data – use automated, policy driven tools instead
    • Reduce storage costs, management complexities, and eliminate downtime due to maintenance
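One simple way to picture policy-driven capacity balancing is a greedy rule: place new data on the device with the lowest utilization that still has room. The `Device` class, device names, and capacities below are hypothetical, and real products may balance on other criteria such as I/O load.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def utilization(self) -> float:
        return self.used_tb / self.capacity_tb


def place(devices: list[Device], size_tb: float) -> str:
    """Put new data on the least-utilized device that can hold it; return that device's name."""
    candidates = [d for d in devices if d.capacity_tb - d.used_tb >= size_tb]
    if not candidates:
        raise RuntimeError("no device has enough free capacity")
    target = min(candidates, key=lambda d: d.utilization)
    target.used_tb += size_tb
    return target.name


if __name__ == "__main__":
    pool = [Device("nas-a", 10.0, 8.0), Device("nas-b", 10.0, 3.0)]
    for _ in range(3):
        print(place(pool, 1.0), [f"{d.name}: {d.utilization:.0%}" for d in pool])
```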
  • Capacity Balancing
  • Data Migration
    • Move data between storage devices on your schedule, not your users’ – seamless access to data during migration means no scheduled downtime
    • Transitioning from one generation of technology to another is now a scheduled task, not a 6 month project
    • Keep your vendors competitive – without the pain of data migration projects, your choice of solution comes down to features and costs. Why pay more by being locked in?
    • Migrate your file infrastructure automatically, online, and without disrupting your business: zero downtime and no complex administrative burden.
  • Data Migration
  • Storage Tiering
    • With multiple tiers of storage in your environment, you now have the power to cost effectively store data based on policies you establish – age, access, type, etc
    • Keep current data on faster, regularly backed up storage, while segregating static, older content that isn’t changing to lower tiers
    • Eliminate backup of over 80% of your data by cycling it out of the regular backup scheme
    • Shrink your backups and related costs, improve recovery windows, and store data on the right tier - creating efficiencies and capital cost savings at multiple levels
  • Storage Tiering
  • Storage Tiering – Granular Value-based Policy
  • The Benefits of File Virtualization
    • Capacity Balancing
    • Utilize your existing storage assets better
    • Optimize access performance and eliminate issues that impact user productivity (scheduled and unscheduled)
    • Pool the resources of servers and NAS appliances you already own, achieving better asset utilization and realized cost savings
    • Eliminate downtime and reconfiguration activities. Enable non-disruptive data management
    • Create process efficiencies in your organization through the elimination of administrator “shell-game” tasks.
  • The Benefits of File Virtualization
    • Data Migration
    • No client reconfiguration – with a virtualized, global name space, the location of data is policy and administrator controlled. Moving data around does not impact access to it.
    • Move entire file systems or individual files around without interrupting access to them.
    • Reduce the overhead of migration projects with a streamlined, consistent, automated solution.
    • No stubs. Ever. Leaving stubs or pointers around in the filesystem does not solve the backup problem, and long-term this can become a management headache!
  • The Benefits of File Virtualization
    • Storage Tiering
    • Reduce your storage costs by putting data on the right (cost) tier of storage – automated, policy driven.
    • Reduce your backup volumes dramatically by moving aged data out of the daily/weekly/monthly backup cycle. Back static data up once a quarter or less, with proper retention practices.
    • Tiering without Administrative overhead. Automate the challenge of what goes where, and save yourself the trouble.
    • Improve your storage utilization across all tiers and devices, automatically, as granular as the file level – without stubs!
  • Storage Tiering – Optimizing Backups
  • Case Study: Vale Inco
    • Challenges
    • Backup Windows
    • Impact to production during backups
    • Too much data, high growth, archiving partially implemented
    • How we helped
    • Information Life Cycle Management Assessment
    • Reviewing all aspects of data in their environment
    • Current State Analysis
    • Future State Recommendations
    • Technology and Design Recommendations
    • and now….. Tom Morrier!
  • Introduction
    Tom Morrier
    Vale Inco Limited
    Once Storage Administrator
    Now Project Manager
    Still Secretly the Storage Administrator
  • Killing 5 Birds with one Appliance
  • Our Problem(s)
    Extremely large volumes of data growing out of control
    Millions of files, many of them under 1k in size
    Aging End Of Life Data Archiving solution
    5 day backup times
    Backups were running during business hours
    Small change windows to take outages in
    24 hour operation that does not like down time.
  • The Solution
    Two pairs of ARX 4000s
    1 pair in our Primary DC
    1 pair in our Largest Site
  • How We Used the ARX
    [Diagram: Tier 1 and Tier 2 storage; capacity labels 5 TB, 3 TB, 2.5 TB, 4 TB]
  • The Results
    Backup Times
    98 hours went to 28 hours
    5 streams have been turned into 14 streams, 4 of which only happen once a month
    In primary DC backup times went from 110 hours for 1 full backup to 21 hours over 5 streams for the same full
    Archiving has been undone in one site and is under way in the other
    Re-archive based on change through tiering
    All data moves were done during business hours without impacting user data access
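For reference, the backup-time figures above work out to the following percentage reductions. The helper name is ours; the hour values are the ones quoted on the slide.

```python
def reduction(before_hours: float, after_hours: float) -> float:
    """Fractional reduction in elapsed time going from before_hours to after_hours."""
    return (before_hours - after_hours) / before_hours


if __name__ == "__main__":
    print(f"98 h -> 28 h:  {reduction(98, 28):.0%} shorter")
    print(f"110 h -> 21 h: {reduction(110, 21):.0%} shorter")
```

That is a reduction of about 71% in the first case and about 81% for the primary DC full backup.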
  • Some Bonus Results
    Tape usage has gone down thanks to tiering
    Data types can be isolated
    Old systems still accessing network storage surface
    Strange connections get identified
    MP3 library gets a boost !
  • Questions
  • How we can help you – two approaches
    • Information Life Cycle Assessment
    • We’ll look at all aspects of your data environment (online, nearline, offline), processes, and applications, and provide guidance on how to get from “current state” to your desired “future state” given the challenges specific to your business.
    • Unstructured Data Targeted – “Quick Assessment”
    • We’ll target your file servers and NAS appliances with tools specifically designed to capture and analyze your unstructured data environment, then provide recommendations on design and TCO/ROI, along with a business justification showing how, where, and with what impact a File Virtualization solution would work for you.
  • Unstructured Data Assessment – what’s involved?
    • Discovery
    • Through a ½ day workshop, we will gather information about your processes, policies, infrastructure and challenges.
    • A data collection tool will be installed (non-invasive) to capture the metadata for the target filesystem shares (server/NAS)
    • Analysis
    • The captured data will be analyzed to determine what efficiencies would be realized, and the best design case
    • Presentation of Results
    • We will present the results of the analysis, along with recommendations on how to realize the benefits of file virtualization
    • A mapping of benefits to your specific challenges will help build an ROI/TCO for business justification
    • Specific design recommendations and costs will be presented.
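To give a feel for what a metadata-only collection pass gathers, here is a minimal sketch that reads file sizes and modification times without opening any file contents. It is not the data collection tool used in the assessment; the function names, the 1 KB "small file" cutoff, and the 365-day "static" threshold are assumptions for illustration.

```python
import os
import time
from typing import Iterable, Iterator


def scan(root: str) -> Iterator[tuple[int, float]]:
    """Yield (size_bytes, mtime) for every file under root, reading metadata only."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield info.st_size, info.st_mtime


def summarize(records: Iterable[tuple[int, float]], now: float,
              static_after_days: int = 365) -> dict[str, int]:
    """Aggregate file count, total bytes, files under 1 KB, and bytes untouched for static_after_days."""
    summary = {"files": 0, "bytes": 0, "small_files": 0, "static_bytes": 0}
    for size, mtime in records:
        summary["files"] += 1
        summary["bytes"] += size
        if size < 1024:
            summary["small_files"] += 1
        if (now - mtime) / 86400 >= static_after_days:
            summary["static_bytes"] += size
    return summary


if __name__ == "__main__":
    print(summarize(scan("."), now=time.time()))
```

Numbers like the share of static bytes and the count of very small files are the kind of inputs that feed a tiering design and an ROI/TCO estimate.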
  • The Tool – F5 Data Manager
  • Unstructured Data Assessment
    • Results
    • Justification through soft and hard dollar cost savings will be presented to help establish a business case for deployment in your environment
    • Costs
    Free to Session Attendees
    Because we believe this solution can be proven out as a cost effective, highly impactful way of managing unstructured data growth, we are presenting this 2 ½ day engagement free of charge.
  • Next Steps
    • Unstructured Data Assessment
    • Learn how file virtualization can benefit your environment
    • Free of charge for attendees who complete the survey
    • Inquire for additional information (see handout)
    • Complete your Survey
    • Be sure to complete the survey for your chance to win a Netbook!
    • Beer Tasting
    • Join us for a sampling of Duggin’s Beer