Content Lifecycle Management - Best Practices for Governance, Archiving, Compliance & Mining Unstructured Data - Caringo and Alfresco Software

    Content Lifecycle Management - Best Practices for Governance, Archiving, Compliance & Mining Unstructured Data - Caringo and Alfresco Software - Presentation Transcript

    1. Caringo & Alfresco Complete Content Lifecycle Solution: Access, Store, Distribute (© 2009 Caringo, Inc.)
    2. Managing File-Based Data as Content
      • Storing file-based data is as much an information management problem as it is an issue with the storage technology
      • The point where business and IT needs converge
      • Business Need: Protecting and preserving intellectual property and business critical records for future benefit
      • IT Need: Implementing a cost-effective infrastructure that ensures the availability and integrity of file-based data
    3. Realities of File-Based Data
      • Unstructured data
        • Over 95% is “unstructured” 1
      • Massive file growth
        • Up to 120% per year 2
      • Low reuse of files 3
        • 90% never accessed after creation
        • 65% of the files that are accessed are accessed only once 3
      • Aging files occupying expensive storage
        • Software needed to migrate files to secondary storage
        • Added cost and complexity
      • Must meet compliance mandates
        • Secondary storage tier required
      [Chart: 90% of files never accessed again; 10% accessed, 65% of those accessed only once]
      1 IDC, The Expanding Digital Universe
      2 The Economic Impact of File Virtualization, IDC
      3 Measurement and Analysis of Large-Scale Network File System Workloads, UC Santa Cruz
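The figures on this slide can be combined with a little arithmetic. The sketch below assumes that the 65% figure applies only to the 10% of files that are accessed at all, and that "up to 120% per year" means the file count grows by 120% each year; both readings are assumptions, and the function names are illustrative only.

```python
# Figures quoted on the slide; how they combine is an assumption of this sketch.
NEVER_ACCESSED = 0.90       # share of files never accessed after creation
ONCE_AMONG_ACCESSED = 0.65  # share of accessed files that are accessed only once
ANNUAL_GROWTH = 1.20        # "up to 120% per year", read here as +120% per year


def reuse_breakdown(never=NEVER_ACCESSED, once=ONCE_AMONG_ACCESSED):
    """Return shares of all files: (never accessed, accessed once, accessed more than once)."""
    accessed = 1.0 - never
    return never, accessed * once, accessed * (1.0 - once)


def projected_count(initial, years, growth=ANNUAL_GROWTH):
    """Compound the file count at a constant annual growth rate."""
    return initial * (1.0 + growth) ** years


if __name__ == "__main__":
    never, once, more = reuse_breakdown()
    print(f"never: {never:.1%}, once: {once:.1%}, more than once: {more:.1%}")
    print(f"1,000,000 files after 3 years: {projected_count(1_000_000, 3):,.0f}")
```

Under these assumptions only a few percent of all files are ever read more than once, which is the case the slide makes for moving aging files off expensive primary storage.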
    4. File Storage Challenges
      • Today’s storage requirements are different
        • Millions and billions of files on thousands of large disk drives
      • File systems simply cannot stretch any farther
        • The weight of layers of complexity and virtualization makes them brittle
        • They hit maximums on file size and number of files and servers
        • They encounter folder and drive letter problems
      • Newer file systems are high-maintenance
        • Even with layers of virtualization, underlying file systems must still be managed, migrated, backed up and maintained
        • Requires highly skilled administrators
      • Volume of file data is a major information management problem
        • Folder/Sub-folder/file name becomes cryptic at scale (millions/billions)
        • File systems provide no informational context for files
      Next Gen: Internet Scale and Object-Based
      • Majority of capacity in commercial sector born as file-based, rich digital content
      • 5 key infrastructure requirements (Enterprise Strategy Group)
        • Infinite scale – in real-time, dynamically, no human intervention
        • No boundaries – expand beyond walls of IT department
        • Operationally efficient – leverage commodity components, policy-based automation
        • Self-Management – auto re-balance and optimize, no human intervention
        • Self-Healing – withstand failures, automatically adjust/heal itself
      • Object-based storage (IDC)
        • 4 tests/criteria for technology
          • Self-referencing – Unique address for each file/object
          • Described by metadata – Beyond standard file system
          • Location independence
          • Dynamic presentation – Not fixed to a traditional tree format
          • Intelligent replication/distribution
      Convergence: ECM & Content Storage
      • Effectively manage content from creation through storage and expiration
        • Alfresco2Caringo interface available at Alfresco Forge
        • Developed by XeniT
          • Alfresco and Caringo Partner
      • Alfresco ECM manages business process & workflow
      • Caringo stores and protects business-critical content
      • Ensure content integrity and preservation
        • Preserve context of content for the long-term
        • Accessible and available well into the future
    5. Covering the Complete Storage Workflow
      • Comprehensive solution to access, store and distribute
        • HTTP access for cloud storage and Web 2.0
        • Complete business solutions
        • Continuous data availability
        • Long-term data protection
        • Intelligent data replication for content distribution and disaster recovery
      [Diagram labels: Content File Server (CFS); Integrated Solutions; Native CAStor]
    6. CAStor Software Key Features
      • Runs on affordable and standard x86 server hardware
        • Delivers flexibility and choice
      • Massively scalable storage cluster
        • Start small and scale to billions of files or objects
        • As you grow from TBs to PBs, access bandwidth also grows
      • Increase capacity seamlessly
        • No disruption in operations or data availability. No migration!
      • Manages and repairs itself automatically and faster than RAID
      • Local and Wide Area Replication for DR and backup
      • Data protection for regulatory compliance and internal governance
        • WORM, integrity checking, authenticity, object-level retention, LifePoints
      • Rich metadata support
        • Attach and persist descriptive metadata with objects
        • Content in Context
      [Diagram: CAStor Cluster; Nodes 1, 2, 3, 4 … n; GigE; 900]
    7. Early File Management Challenge
      • 8dot3 in DOS days
        • Eight characters + extension
          • Example: C:\Directory\document.doc
        • Organizational challenge for even hundreds of files
          • Significant position coding schemes
          • Include fully qualified path and name on documents
          • Law firms still do this today
        • System metadata only, basic
          • Non-descriptive, not useful in organizing files
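To make the "eight characters + extension" limit concrete, here is a minimal sketch of a check for 8.3-style names. The pattern is deliberately simplified (letters, digits, underscore, tilde and hyphen only) and does not cover the full DOS rules; `is_8dot3` is an illustrative name.

```python
import re

# Simplified 8.3 pattern: 1-8 base characters plus an optional 1-3 character
# extension. Real DOS rules allow a few more symbols and reserve some names.
_EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_~-]{1,8}(\.[A-Za-z0-9_~-]{1,3})?")


def is_8dot3(name: str) -> bool:
    """Return True if `name` fits the simplified 8.3 pattern above."""
    return _EIGHT_DOT_THREE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("REPORT.DOC", "This is my document.doc", "BUDGET09.XLS"):
        print(candidate, is_8dot3(candidate))
```

With so few characters available, any descriptive information had to be encoded by position, which is why the coding schemes mentioned above became necessary.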
    8. Incremental Advancement
      • Long file names introduced in Windows in the mid-90s
        • Promise of better identifying files for organization and finding
      • 8dot3 turned into this:
        • My Documents\This is my document.doc
        • and…
        • My Documents\This is my document v2.doc
      • Folder/sub-folder hierarchy is cumbersome
      • File counts now in the millions and beyond
        • Remains difficult to manage especially over time
        • Millions will turn into billions
      • File names still lack informational value
    9. CAStor Content Storage Software Object-Based
      • CAStor ideally suited for file-based data storage
      • Supports rich metadata tags
        • System generated metadata
        • Custom metadata
        • Descriptive information lives with the file
      [Diagram: a CAStor object consists of Metadata and File Data (101000101010100111010101100010110100…110010), addressed by a UUID Content Address]
      HTTP/1.1 200 OK
      Date: Thu, 26 Jun 2008 21:26:34 GMT
      Server: CAStor Cluster/2.2
      CAStor-Application-Name: FinalCutPro
      CAStor-Create-Date: 2008-06-26 21:26:14.687000
      Castor-System-Cluster: Internet Demo Cluster
      Castor-System-Created: Thu, 26 Jun 2008 21:26:20 GMT
      Content-Disposition: inline; filename=Sports %Segment%206-26-08.mxf
      Content-Length: 8619354
      Content-type: application/mxf
      lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
      lifepoint: [] delete
      Replica-Count: 2
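Because the metadata on this slide is carried as ordinary HTTP response headers, it can be read with standard tooling. The sketch below parses a subset of the header block shown above (without the status line) using Python's standard library and splits each `lifepoint` value into its bracketed end date and the constraints that follow. The function names are illustrative, and the split reflects only the layout visible on the slide, not a documented CAStor grammar.

```python
from email.parser import Parser

# A subset of the headers shown on the slide (status line omitted).
RAW_HEADERS = """\
Date: Thu, 26 Jun 2008 21:26:34 GMT
Server: CAStor Cluster/2.2
CAStor-Application-Name: FinalCutPro
Content-Length: 8619354
Content-type: application/mxf
lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
lifepoint: [] delete
Replica-Count: 2
"""


def parse_metadata(raw):
    """Map lower-cased header names to lists of values (headers may repeat)."""
    message = Parser().parsestr(raw, headersonly=True)
    metadata = {}
    for name, value in message.items():
        metadata.setdefault(name.lower(), []).append(value)
    return metadata


def parse_lifepoint(value):
    """Split '[end date] constraints' into (end_date_or_None, constraints)."""
    end, _, rest = value.partition("]")
    end = end.lstrip("[").strip()
    return (end or None), rest.strip()


if __name__ == "__main__":
    meta = parse_metadata(RAW_HEADERS)
    print(meta["content-type"])
    for lifepoint in meta["lifepoint"]:
        print(parse_lifepoint(lifepoint))
```

Keeping repeated headers as lists matters here because the object carries two `lifepoint` entries, one with an end date and one open-ended.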
    10. CAStor Content Objects
      • Supports all types of digital content
      • Metadata values stored are specific to each individual type
      • Vast 128-bit address space
        • Never run out of UUIDs
        • Billions of objects
      • Define metadata values to drive replication and distribution
      [Diagram: four content objects, each with its own Metadata, data and UUID Content Address: UUID1 Video, UUID2 Image, UUID3 Audio, UUID4 Doc]
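The size of a 128-bit address space is easy to illustrate. The sketch below uses Python's `uuid` module to generate a random 128-bit identifier; how CAStor itself assigns UUIDs is not stated on the slide, so `new_content_address` is purely a hypothetical stand-in.

```python
import uuid

ADDRESS_BITS = 128
ADDRESS_SPACE = 2 ** ADDRESS_BITS  # number of distinct 128-bit values


def new_content_address() -> str:
    """Hypothetical helper: a random 128-bit identifier as 32 hex digits."""
    # Note: uuid4 fixes 6 bits for version/variant, leaving 122 random bits.
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(f"{ADDRESS_SPACE:,} possible addresses")
    print(new_content_address())
```

Even at billions of objects, the share of the address space in use stays vanishingly small, which is what the "never run out of UUIDs" bullet is pointing at.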
    11. Metadata Enables Intelligence
      Filter and Rules Engine
      HTTP/1.1 200 OK
      Date: Thu, 26 Jun 2008 21:26:34 GMT
      Server: CAStor Cluster/2.2
      CAStor-Application-Name: SimpleCASg
      CAStor-Create-Date: 2008-06-26 21:26:14.687000
      Castor-System-Cluster: Internet Demo Cluster
      Castor-System-Created: Thu, 26 Jun 2008 21:26:20 GMT
      Content-Disposition: inline; filename=Car%20Chase%206-26-08.mxf
      Content-Length: 86193452
      Content-type: application/mxf
      lifepoint: [Thu, 03 Jul 2008 21:26:14 GMT] reps=2, deletable=True
      lifepoint: [] delete
      Replica-Count: 2
      Metadata filtered for specific value(s)
      Rule(s) fire when condition met
      If Content-type = MXF then Replicate to DR Facility
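The filter-and-rule idea on this slide can be sketched in a few lines. This is not the CAStor rules engine or its configuration syntax; the `Rule` class, the action string and the matching logic are assumptions made only to illustrate "rule fires when condition met".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, str]


@dataclass
class Rule:
    """Hypothetical rule: a predicate over metadata plus an action label."""
    name: str
    condition: Callable[[Metadata], bool]
    action: str


def matching_actions(metadata: Metadata, rules: List[Rule]) -> List[str]:
    """Return the actions of every rule whose condition holds for this object."""
    # Header names are compared case-insensitively.
    normalized = {key.lower(): value for key, value in metadata.items()}
    return [rule.action for rule in rules if rule.condition(normalized)]


# The slide's example: if Content-type is MXF, replicate to the DR facility.
RULES = [
    Rule(
        name="mxf-to-dr",
        condition=lambda m: m.get("content-type", "").lower() == "application/mxf",
        action="replicate:dr-facility",
    ),
]

if __name__ == "__main__":
    print(matching_actions({"Content-type": "application/mxf"}, RULES))
    print(matching_actions({"Content-type": "image/jpeg"}, RULES))
```

Expressing conditions as plain predicates keeps the filter step (which metadata values matter) separate from the action step (what happens when they match).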
    12. Intelligent Content Replication and Distribution
      • CAStor Content Router (CR)
      • Policy-based replication to geographically separate sites
      • Policies driven by administrator-defined rules for specific metadata
      • Multiple replication and distribution topologies supported
        • 1:1, 1:M, M:1, M:M
        • Customize to meet specific needs
      • Replicate some or all files
      • Fully automated to reduce management effort for file replication
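The replication topologies listed on this slide (1:1, 1:M, M:1, M:M) can be pictured as a table of source-to-target policies, each optionally restricted to objects with particular metadata. The sketch below is an illustration only: the site names, the policy tuple layout and `targets_for` are hypothetical and do not represent Content Router configuration.

```python
from typing import Dict, List, Tuple

# (source site, metadata that must match, target site); {} means "all files".
Policy = Tuple[str, Dict[str, str], str]

POLICIES: List[Policy] = [
    ("hq", {}, "dr"),                                      # replicate everything
    ("hq", {"content-type": "application/mxf"}, "branch"), # replicate some files
    ("branch", {}, "dr"),                                  # second source, same target
]


def targets_for(policies: List[Policy], source: str, metadata: Dict[str, str]) -> List[str]:
    """Return the sites an object stored at `source` should be replicated to."""
    targets: List[str] = []
    for policy_source, required, target in policies:
        if policy_source != source:
            continue
        if all(metadata.get(key) == value for key, value in required.items()):
            if target not in targets:
                targets.append(target)
    return targets


if __name__ == "__main__":
    print(targets_for(POLICIES, "hq", {"content-type": "application/mxf"}))
    print(targets_for(POLICIES, "hq", {"content-type": "image/jpeg"}))
    print(targets_for(POLICIES, "branch", {"content-type": "image/jpeg"}))
```

In this table "hq" fans out to two targets (1:M) while "hq" and "branch" both feed "dr" (M:1); the empty filter covers the "replicate all files" case and the metadata filter covers "replicate some".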
    13. Content Relationships in Storage
      • Relate elements of a specific project in an anchor stream
        • Simple list of UUIDs
        • Video, key image, audio, script
      • Add elements through business process/workflow
      • Persist relationships over the long term
      [Diagram: a mutable Anchor Stream with its own Metadata and UUID lists UUID1, UUID2, UUID3, UUID4 … UUID n; each listed UUID is the Content Address of an object with its own Metadata: UUID1 Video, UUID2 Image, UUID3 Audio, UUID4 Doc]
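An anchor stream as described here is essentially a mutable, ordered list of UUIDs that points at otherwise independent objects. The sketch below models that with an in-memory dictionary standing in for the storage cluster; `AnchorStream`, `resolve` and the sample identifiers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnchorStream:
    """Hypothetical model: a named, mutable, ordered list of member UUIDs."""
    project: str
    members: List[str] = field(default_factory=list)

    def add(self, object_uuid: str) -> None:
        """Append a UUID once; re-adding an existing member is a no-op."""
        if object_uuid not in self.members:
            self.members.append(object_uuid)


def resolve(anchor: AnchorStream, store: Dict[str, str]) -> List[str]:
    """Look up each member in `store`, skipping UUIDs that are not present."""
    return [store[u] for u in anchor.members if u in store]


if __name__ == "__main__":
    # In-memory stand-in for the cluster: UUID -> description of the object.
    store = {"uuid1": "video", "uuid2": "key image", "uuid3": "audio", "uuid4": "script"}
    anchor = AnchorStream(project="sports-segment")
    for object_uuid in ("uuid1", "uuid2", "uuid3"):
        anchor.add(object_uuid)
    anchor.add("uuid4")  # element added later through a workflow step
    print(resolve(anchor, store))
```

Only the anchor changes as a project grows; the member objects keep their own addresses and metadata, which is how the relationship can persist over the long term.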
    14. Caringo: Unified Content Infrastructure
      • Investment Protection
        • Runs on standard x86 server hardware
        • Add new generation server hardware at any time without disruption
      • Cost-effective Scaling
        • Add capacity without interruption or need to provision storage
        • Scale from Terabytes to Petabytes in a single cluster
      • Operational Efficiency
        • Self-managing and self-healing cluster minimizes administrative intervention
      • High Performance Object Storage
        • Easily address performance needs for small and/or large file workloads
      • Data Protection & Preservation
        • Archive unstructured data for the long-term and address regulatory compliance
      [Diagram labels: Content File Server (CFS); Integrated Solutions; Native CAStor]
      • A Winning Combination for Managing the Content Lifecycle
      • Experience the Solution Today
      • Get 4TB of CAStor Software Free
      • Go to http://www.caringo.com/downloadCAStor.html
    15. Thank You! Access, Store, Distribute

    Alfresco Software, 4 months ago


    See the full webinar here: http://www.alfresco.com/
