Data continues to grow exponentially – especially with the advent of social content. Approximately 70% of data is unstructured. This impacts storage costs and management, and data protection SLAs.
New deployment options such as cloud provide alternatives, but how do you know what you should move to the cloud?
2. PAGE2
www.DocuLynx.com
Challenges
• Data continues to grow
exponentially – especially
with the advent of social
content
• Approximately 70% of data is
unstructured
• Impact on
- Storage costs and
management
- Data Protection SLAs
• New deployment options
such as cloud provide
alternatives
Corporate knowledge has
grown from 75 exabytes in
2007 to 580 exabytes in 2011,
a CAGR of 67%
– Forrester
3. PAGE3
• What files?
• How?
• Approx. 65% of the total volume
is rarely used
• Challenge:
– Identify the rarely accessed data
– Transport to target
– Keep available to users /applications
• Results:
– Optimized storage utilization
– Reduced operational and
administrative costs
ON-PREMISES STORAGE
Typical Scenario
[Diagram: Windows Server, NAS Filer, NetApp]
4. PAGE4
What if…
• you could classify information based on external
attributes: analyze, monitor, report?
• you could select and move files based on centrally defined
policies to second-tier storage in the cloud?
• you could make it completely transparent?
• you could address compliance and access controls
through classification based on content analysis?
Challenges
5. PAGE5
• You can migrate files – transparently for the user – into
multiple storage locations
• Users and applications will not notice any difference
• With cloud storage, you can integrate the public
cloud with on-premises storage to:
• reduce datacenter infrastructure complexity
• maximize data protection
• reduce overall storage total cost of ownership (TCO) by
60-80%
• provision storage more rapidly to reclaim IT time cycles
• With the optional DocuClassify feature, files can be
selected based on their content and
business value
Enabling the fast introduction of integrated cloud storage
without disturbing users and IT operations
Intelligent Content Management
More Intelligent Content Management
[Diagram: Windows Server, NAS Filer, NetApp, Cloud Storage]
6. PAGE6
• Analysis & Classification
Analysis and classification of the file inventory
with customer-defined rules.
• Migration
Creation of a copy on secondary storage in
accordance with rules, based on criteria such
as metadata or content-based classification.
• Release
Exchange of the original file with a small
reference file which looks and acts just like the
original.
• Recall
Restoration of the original file from secondary
storage upon request.
After the initial set-up, these processes run independently and transparently in the background.
[Diagram: cycle of Analysis & Classification → Migration → Release → Recall – identifying the right files and moving them transparently to secondary storage]
Storage Optimization –
Functional principle in four processes
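The four processes above can be sketched as a minimal, hypothetical pipeline. The function names, the stub format, and the age-based rule are illustrative assumptions, not the DocuFile implementation:

```python
import os
import shutil

STUB_MARKER = b"DOCU_STUB:"  # hypothetical marker identifying a reference file

def classify(path, max_age_days, now):
    """Analysis & Classification: flag files not modified within the cutoff."""
    age_days = (now - os.path.getmtime(path)) / 86400
    return age_days > max_age_days

def migrate(path, secondary_dir):
    """Migration: create a copy of the file on secondary storage."""
    target = os.path.join(secondary_dir, os.path.basename(path))
    shutil.copy2(path, target)
    return target

def release(path, target):
    """Release: replace the original with a small reference (stub) file."""
    with open(path, "wb") as f:
        f.write(STUB_MARKER + target.encode())

def recall(path):
    """Recall: restore the original content from secondary storage."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(STUB_MARKER):
        shutil.copy2(data[len(STUB_MARKER):].decode(), path)
```

In the product the reference file looks and acts like the original; here a plain marker file stands in for it, and the scheduled, event-driven orchestration is omitted.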
7. PAGE7
Purpose of the assessment:
• Analyze the current file structure
• Illustrate results in multiple ways for examination
• Make recommendations regarding future HSM or archiving options
• Make recommendations about optimal usage of file classification
Quantitative Analysis
Number and volume of:
• Servers
• Disks
• Shares
• Directories
• Capacity Growth
Qualitative Analysis
• Distribution by Age
• Distribution by Size
• HSM/Archiving Simulation
• File types
• Users
• Groups
• Duplicates
File Assessment –
Purpose and scope
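A minimal sketch of the kind of scan such an assessment performs – distribution by age, volume by file type, and duplicate detection by content hash. The function name and bucket choices are illustrative:

```python
import hashlib
import os
import time
from collections import Counter, defaultdict

def assess(root, now=None):
    """Walk a directory tree and collect simple assessment metrics."""
    now = now or time.time()
    by_age = Counter()          # file count per age bucket (years since last modification)
    by_ext = Counter()          # volume in bytes per file extension
    hashes = defaultdict(list)  # content hash -> paths, for duplicate detection
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            by_age[int((now - st.st_mtime) / (365 * 86400))] += 1
            by_ext[os.path.splitext(name)[1].lower()] += st.st_size
            with open(path, "rb") as f:
                hashes[hashlib.sha256(f.read()).hexdigest()].append(path)
    duplicates = {h: paths for h, paths in hashes.items() if len(paths) > 1}
    return by_age, by_ext, duplicates
```

A production assessment would stream large files into the hash and also break results down by share, user, and group, as the slide lists.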
8. PAGE8
Analysis –
Data Analysis & Assessment
Quantitative
• Server
• Disks
• Shares
• Directories
• Capacity Growth
• Forecast
Qualitative Analysis
• Distribution by Age
• Distribution by Size
• HSM Simulation
• Analysis of Data Types
• User Distribution
• Group Distribution
• Duplicates
12. PAGE12
• Reference files are transparent to end-users and applications
• No change in file name, extension, size or date
• Reference files act like files, meaning they can be copied, moved, renamed…
• Applications can work with the archived files without any customisation requirements
The only visible changes:
Attribute “O” for offline files and a clock symbol on the icon (a native NTFS feature)
Reference Files
A Small But Important Detail
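The offline state mentioned above is a standard NTFS file attribute. Python's stat module exposes the bit value on all platforms, so a check against the attribute mask (reported on Windows as `os.stat(path).st_file_attributes`) might look like this sketch:

```python
import stat

def is_offline(file_attributes: int) -> bool:
    """True if the NTFS FILE_ATTRIBUTE_OFFLINE bit is set in an attribute mask.
    On Windows, the mask for a file is os.stat(path).st_file_attributes."""
    return bool(file_attributes & stat.FILE_ATTRIBUTE_OFFLINE)
```

This only inspects the flag; how the HSM product sets it on released files is internal to the product.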
13. PAGE13
Migration Adapter for NetApp
Providing HSM functionality for NetApp filers
CC Node
Responsible for all DocuFile Migration Adapter
operations and communication with the
DocuSuite Control Center
FPolicy Node
Responsible for file recalls via the FPolicy
Server
FPolicy Server
Detects access to files residing on the ONTAP
file system of the NetApp file server. If a
reference file is accessed, the DocuFile
Migration Adapter initiates a recall from
secondary storage.
Typical Deployment and Main Components
14. PAGE14
• Metadata-based classification enables high
performance on high volumes of data
• Pattern matching is fast and exact when
classification criteria can be expressed
as regular expressions
• Machine learning with linguistic-statistical
analysis is universally applicable and delivers
strong results with fuzzy classification criteria
• Partnerships with
– KPMG
– Deloitte
– Fontis International
Content Based Classification
Enhances migration precision and effectiveness
[Diagram: metadata-based → Pattern Matching → Machine Learning]
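As one illustration of the pattern-matching approach, classification criteria expressible as regular expressions can be evaluated directly against file content. The rule set below is hypothetical, not part of DocuClassify:

```python
import re

# Hypothetical classification rules: document type -> pattern it must match
RULES = {
    "Invoice": re.compile(r"\bInvoice\s+No\.?\s*\d{4,}\b", re.IGNORECASE),
    "Bank Detail": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),  # IBAN-like
}

def classify_text(text):
    """Return the document types whose pattern matches the given text."""
    return [doc_type for doc_type, pattern in RULES.items() if pattern.search(text)]
```

Patterns like these are exact but brittle; the slide's point is that fuzzy criteria are better served by the machine-learning classifier.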
15. PAGE15
• Critical information that should
not be moved to the cloud.
• Sensitive information that needs to
be encrypted before moving it to
the cloud.
• Important information that needs to
be archived.
• Passive information that is little
used and could be moved to IRM
immediately.
DocuClassify
Identifying the Right Files
Only Classification Enables Automation
16. PAGE16
Keep it simple
The Classification Cube® is a concept for a
universal classification scheme for companies of
all kinds which…
– is easy enough to be implemented
quickly,
– is flexible enough to be adjusted and
expanded in a simple way, and
– meets the most significant and most
urgent requirements.
The cube, together with the document
classes, is the key guideline for completing
the project successfully.
A cube as the solution
Easy Implementation with the Classification Cube®
A Universal Technical Approach
17. PAGE17
(1) Any information object carries metadata
(e.g. location, name, creator) with it. The
actual set can vary with the type of
information.
(2) A classification rule assigns a document
type (DT) to the information object. Each DT
comes with a set of properties which
are used by the classification rule. The
property values enrich the metadata of
the information object.
(3) Based on the classification expressed in
the property values of a specific
information object, any application can
trigger an action (e.g. archiving) for this
object. This could be another DocuSuite
module or a 3rd-party application.
[Diagram: an Information Object (File, Mail, SP, ...) carries Metadata (1); Classification Rules assign a Document Class with Classification Properties (2); the resulting Property Values trigger Actions, including 3rd-party actions (3)]
Document Class | Metadata | Properties | Actions
Invoice | Location | Retention | If (retention > 0) then archive
Engineering Plan (CAD) | Location, User, Project ID | Restricted Access | If (Restricted Access) then block access on mobile devices
Document Classes and the Classification Cube
The Document Class Model
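The example rows in the table can be read as a small rule set. A sketch that evaluates classification property values and returns the triggered actions (the dict field names are illustrative):

```python
def trigger_actions(doc):
    """Evaluate classification property values on a classified document
    (a dict of class name and property values) and return triggered actions."""
    actions = []
    # Mirrors the Invoice row: If (retention > 0) then archive
    if doc.get("class") == "Invoice" and doc.get("retention", 0) > 0:
        actions.append("archive")
    # Mirrors the CAD row: If (Restricted Access) then block access on mobile devices
    if doc.get("class") == "Engineering Plan (CAD)" and doc.get("restricted_access"):
        actions.append("block access on mobile devices")
    return actions
```

In the model described above, such actions would be carried out by other DocuSuite modules or 3rd-party applications reacting to the property values.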
18. PAGE18
[Diagram: tagging example – Security Setting: Confidential; Compliance: 5-year retention]
Summary Project Document
Process of Tagging the Information Object through the ADS or Security Descriptor
20. PAGE20
Introduce Cloud Storage using
intelligent content
management as an enabler
• Identify files to offload
based on metadata such as
age, location, type, ...
• Immediately start to offload
your primary storage
• Harvest your benefits e.g.
reduced storage TCO, better
backup and DR
Add automatic classification for
even more benefits
• Introduce file classification
classes based on business
demand
• Use self-learning, trainable
classifier to automatically
classify all files
• Automatically attach business
value to every file and use it
as a trigger to offload files
Leverage classification for
security and governance tasks
• Introduce Dynamic Access
Control to ease access rights
management
• Increase security by
automatic encryption of
confidential files
• Boost effectiveness of any
DLP solution
Starting small, you can expand the capabilities of this system
from storage optimization to classification-based information
governance.
Progression:
Introduction and Expansion
21. PAGE21
Intelligent Content Management
• Significant cost reduction in storage
infrastructure
• Easy implementation with minimal admin
effort
• High degree of automation
• Permanent availability of all relevant data
• Highly scalable
• Fulfillment of legal requirements
• Higher transparency of unstructured data
• Optimized use of resources and investments
• Implementation of rule-based Information
Lifecycle Management (ILM)
Classification
• Automated classification fulfills regulatory
compliance and information security
requirements relative to encryption and access
control
• Mitigate risk through knowledge of the value of
your information
• Boosts in productivity due to higher
transparency levels
• Movement toward value-based information
management
Summary Advantages
22. PAGE22
• Large medium-sized company based in the
automation technology sector
• Head of Information Management Server
• 80 TB on Windows Server 2008 R2 and 2012
• Few relevant datasets should be moved to an
external service provider
• Motivation: optimization of costs and change
in awareness within the business
departments
• No transparency in terms of data relevance
Challenge
• Initial on-site visit: More precise analysis of
the demand, especially in terms of granularity
• Decision: classic HSM (without classification)
vs. intelligent HSM (with classification)
• Presentation of the file assessment as a
starting point for analysis
• Solution:
– migration for transparent moving of data
– classification (external attribute analysis,
reporting) and optional content classification
Customer / Approach
Concrete Customer Request With Regard to Classification
Storage Optimization through moving of “irrelevant” data
23. PAGE23
– Railway Engines
– Trams
– Train Engines
– High Speed Trains
– Transport Solutions
Rail Systems
– Trucks
– Buses
– Engine Brakes
– Special Products
Commercial Brake Systems
The Knorr-Bremse Group is the
world’s leading manufacturer of
braking systems for rail and
commercial vehicles, vehicle air
conditioning systems and
torsional vibration dampers
Customer Reference
24. PAGE24
– 120 TB total storage
– 16,000 users, 120 locations
– 8,000 SAP users, 1,000 CAD users
Key metrics:
– 6 NetApp V-Filers, 1 Windows file server
– Growth rate 30% per year
– 24 TB (26.4 million files) managed by
DocuFile
– Tier 1 storage growth -> backup/recovery
cost increase
– Only a fraction of the files is in active use
[Chart: active data growth – 2012: 24 TB, 2013: 32 TB, 2014: 43 TB, 2015: 55 TB]
Typical Challenge
25. PAGE25
Secondary Storage Tier
Primary Storage
– 39% of the managed storage
volume has been migrated
– Individual quota between
Server/Filer varies between 21%
and 62%
– Successfully increased migration
quota by fine tuning the migration
policies
– Backup and snapshot performance
vastly improved
21 – 62% migrated
Results
26. PAGE26
DocuLynx Portfolio
Addressing Information Governance and Lifecycle Challenges
Enterprise Information Infrastructure
Solutions
• DocuLynx 360 – a Content Services Platform for
Smart Applications
• Active Information Archiving Platform
o DocuSuite
o DocHarbor/Haven
• DocuClassify
• DocuSearch
Enterprise Information Application
Solutions
• AP Automation
• E-Signature
Information Conversion Services
(e.g. scanning)
Cover today
Often seen at our customers
What files
How to transport
Using content-based classification – later – much higher precision – compliance
Metadata-based classification – fast
Content-based classification – higher precision
Handle files based on their business value – not based on storage location and file naming, which may be wrong anyway
DocuFile – external metadata
Optional DocuClassify – content (regex / keywords / document classes)
In principle, all processes run on a schedule.
For migration there are two methods:
Migration after archiving
Event-driven migration (when fill levels are reached)
Part of a policy: creating efficient filters for migration
Levels
File sets (types of files), customized
Typically all HSM metadata; group membership, user name
Most effective + precise -> classification properties and values
DocuClassify is an add-on to DocuFile. DocuFile can operate stand-alone.
Filters can be based not only on external metadata
but also on classification properties / standards
FCI is a standard introduced by Microsoft back in 2008
Further enhanced with Server 2012
There are different methods to classify files
Even manual classification – which has some significant downsides
DocuClassify allows flexible configuration of different methods
In our experience: metadata-based for high performance – providing fast results
Content-based using training sets for high precision, automatic and in the background
After classification comes the follow-on process
An example in our demo is archiving based on classification properties/values
We typically implement in phases
Phase 1 – assessment and use of external metadata – fast results
Phase 2 – use automatic classification for higher precision – handle files according to business value, not only based on name and location