Data grows 40-60% each year, and as capacity expands, redundant, obsolete and trivial (ROT) user data clogs corporate networks, adding unnecessary risk and expense.
Depending on the industry, 40-70% of this data has no business value. Curbing ROT growth not only controls expenses by reducing or eliminating storage upgrades, but also minimizes risk.
Index Engines data profiling software supports ROT analysis and data disposition across terabytes to petabytes of enterprise content. It provides search, reporting, disposition and defensible deletion of data.
http://www.indexengines.com/storage-management/solutions-for/rot-analysis
3. Unstructured Data Challenges
▪ No knowledge of what exists
▪ What has business value
▪ What has regulatory preservation requirements
▪ What has no value (ROT)
▪ Cannot support data policies
▪ Tier content to appropriate repository (cloud)
▪ Archive what needs to be preserved
▪ Find data to support legal & compliance
4. Cost of Storage
▪ 100TB costs $955,500 annually
▪ Buying storage capacity is cheap
▪ Maintaining it is expensive
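As a rough worked example of the figure above, the sketch below derives a per-terabyte cost from the quoted $955,500 per 100TB and estimates what reclaiming ROT might save. It assumes cost scales linearly with capacity, which the slide does not state, and uses the 40% ROT share quoted elsewhere in this deck. The function names are illustrative only.

```python
# USD per year for 100TB, the figure quoted on the slide.
ANNUAL_COST_PER_100TB = 955_500


def annual_cost(capacity_tb: float) -> float:
    """Annual storage cost in USD, ASSUMING cost scales linearly with capacity."""
    return capacity_tb * ANNUAL_COST_PER_100TB / 100


def annual_savings(capacity_tb: float, rot_fraction: float) -> float:
    """Estimated annual cost attributable to ROT that could be reclaimed."""
    return annual_cost(capacity_tb) * rot_fraction


if __name__ == "__main__":
    print(f"Cost per TB per year: ${annual_cost(1):,.0f}")
    print(f"Savings at 40% ROT on 100TB: ${annual_savings(100, 0.40):,.0f}")
```

Under these assumptions, each terabyte costs about $9,555 per year, and removing 40% ROT from a 100TB environment would free roughly $382,200 annually.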
5. Index Engines Support for Data Clean Up
Know IT
Manage IT
Govern IT
▪ Data Classification & Profiling
▪ Enterprise class indexing software
▪ Metadata, full text, pattern/regex/PII, security ACLs, activity logs
▪ Reporting & classification on user files and email
▪ Defensible Disposition
▪ Delete, copy or migrate
▪ Integrated archiving & preservation
▪ Defensible audit trails and logs
▪ Automation and Monitoring
▪ Ongoing monitoring
▪ Automated management based on policy
▪ Instant access to personal data
6. Classify Data for Simplified Access and Management
Classify data by Active Directory group membership
Example: Client Services, HR
Use metadata to filter on data that typically contains personal information
Example: Documents, email, etc.
Tag this content for easy access and future queries
Example: PII/RegEx or person's name
▪ Create an automated data map based on a range of criteria
▪ Classify content to focus on areas that are highly suspect for personal information
▪ Allows for more targeted and simplified search and audits
7. Classification for ROT Analysis and Clean Up
▪ Clean Redundant, Obsolete & Trivial content
▪ ROT can comprise up to 40% of network data
▪ Classify data by:
▪ Duplicate content: redundant data
▪ Aged data: not accessed in more than X years
▪ Abandoned data: owned by ex-employees and not accessed
▪ Non-Business Multimedia files: photos, videos, audio (iTunes)
▪ Trivial files: log files, iTunes music, personal vacation photos, etc.
▪ Defensible disposition:
▪ Migrate non-active data that should be preserved to cloud
▪ Delete content with no business value while maintaining a full audit trail
▪ Archive non-active data with personal information for further investigation
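The classification criteria above can be sketched in a few lines of Python. This is not how Index Engines implements it; it is a minimal illustration assuming each file is described by a hypothetical `FileRecord` holding a path, owner, content hash and last-accessed time. The extension list, the three-year age cutoff and the label names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata record for one file."""
    path: str
    owner: str
    content_hash: str
    last_accessed: datetime


# Illustrative list only; a real policy would define its own extensions.
TRIVIAL_EXTENSIONS = {".log", ".mp3", ".m4a", ".jpg", ".mov"}


def classify_rot(files, active_users, now, max_age_years=3):
    """Return {path: set of ROT labels} for each file."""
    cutoff = now - timedelta(days=365 * max_age_years)
    seen_hashes = set()
    result = {}
    for f in files:
        labels = set()
        # Redundant: same content hash as a file seen earlier.
        if f.content_hash in seen_hashes:
            labels.add("redundant")
        seen_hashes.add(f.content_hash)
        # Aged: not accessed within the cutoff window.
        stale = f.last_accessed < cutoff
        if stale:
            labels.add("aged")
        # Abandoned: owner is no longer an active user and the file is stale.
        if stale and f.owner not in active_users:
            labels.add("abandoned")
        # Trivial: file type that rarely carries business value.
        if any(f.path.lower().endswith(ext) for ext in TRIVIAL_EXTENSIONS):
            labels.add("trivial")
        result[f.path] = labels
    return result
```

A file can carry several labels at once, which matters for disposition: an aged file owned by an ex-employee may warrant a different action than an aged file with an active owner.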
8. Capacity by Department (Total Capacity 5,000TB)
Department      Percentage  Capacity (TB)  # Files (B)
Legal           17%         850            42.5
Marketing       18%         900            45
Finance         12%         600            30
HR              28%         1,400          70
Operations      8%          400            20
Manufacturing   17%         850            42.5
[Chart: Active Data, Last Accessed in Last Year (1 Year vs. > 1 Year), by department]
[Chart: Abandoned Data, Ex-Employee based on Active Directory (TBs), Accessed in Past Year vs. Not Accessed in > 1 Year, by department]
ROT Analysis: Classification of Redundant, Obsolete & Trivial Content
[Chart: Redundant Content (TBs): Legal 248, Mktg 270, Fin 240, HR 560, Oper 123, Mfg 170]
[Chart: Obsolete Content by Last Accessed (TBs): < 1 Year, 2 - 3 Years, 3 - 4 Years, 4+ Years]
[Chart: Trivial Files (TBs): Logs, Video, Photos, Music, Other]
[Chart: Email Audit, # of PSTs on Shared Network by Department]
Note: Charts generated using 3rd party software based on Index Engines data
9. Index Engines Reporting and Classification Features
▪ Supports petabyte-class data center environments, 1% index footprint for metadata
▪ Federated search, reporting and archiving for large scale, distributed data
▪ High speed indexing, reaching up to 1TB/hour/node
▪ Active Directory integration to group data by departments
▪ Tagging to classify data based on any criteria
▪ Flexible queries and reporting on:
▪ Metadata
▪ Full text and keyword
▪ Boolean search including proximity
▪ Pattern/PII including credit cards, bank routing, social security, etc.
▪ Regular expression, POSIX basic and extended
▪ Security ACLs, read/write/browse permissions
▪ Activity logs reporting on user access to specific files
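To illustrate what pattern-based PII detection of the kind listed above involves, here is a minimal Python sketch. It is not the Index Engines query syntax. The two regular expressions (a US Social Security number layout and a 13 to 19 digit card number) and the Luhn checksum filter are assumptions chosen for the example, and real detection rules would be considerably stricter.

```python
import re

# Assumed pattern: US SSN written as 3-2-4 digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumed pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_pii(text: str) -> dict:
    """Return SSN-like strings and Luhn-valid card-like strings found in text."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "card": [m for m in CARD_PATTERN.findall(text) if luhn_valid(m)],
    }
```

The checksum step is there to cut false positives: many 16-digit strings (order numbers, tracking IDs) match the card pattern but fail the Luhn test.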
10. Analyze and Classify
Copyright Index Engines Inc. 2005 All rights reserved. 10
Data of Value
• Active and dark data (old reports and research data)
• Ensure it is available and accessible by those who need it
Aged/Redundant Data
• Data has outlived its business value
• Migrate to cheaper storage environment
Sensitive Data
• Email and files containing PII, PSTs, contracts, etc.
• Migrate to archive for long term preservation
11. Analyze and Classify
20 to 40% redundant data
10 to 20% aged data
18 to 22% abandoned data (owner no longer exists)
4 to 10% contains sensitive content
5 to 8% contains personal multimedia content
20 to 40% data of value/active
13. Departmental Drill Down
[Charts: one set of panels per department (Legal, Oper, HR, Fin, Mktg, Mfg), each showing:]
▪ Last Accessed (TBs): < 1 Year, 2 - 3 Years, 3 - 4 Years, 4+ Years
▪ Location of Data (TBs): Server X, Server Y, Server Z, Other
▪ Data Types (TBs): Document, Spreadsheet, Presentation, Other
▪ Documents Containing PII: 17, 22, 857, 1,232, 5, 217 (one count per department)
▪ Location of PII (Files): Server X, Server Y, Server Z, Other
14. Defensible Disposition
▪ Based on the data profile, disposition includes the following:
▪ Cloud storage – data with little or no value
▪ Deletion – redundant content owned by ex-employees, not accessed in 3+ yrs
▪ Archive – aged sensitive content
▪ Encryption – active sensitive content
▪ Personal Multimedia – notification of management
▪ All migrations are logged for a defensible audit trail of disposition
▪ As data is migrated, all metadata remains intact
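The disposition rules above can be read as a small decision table. The sketch below expresses them in Python with an audit log entry per decision. It is an illustration only: the `Profile` fields, the action names and, in particular, the order in which the rules are checked are assumptions, since the slide does not say which rule wins when several apply. Nothing is actually moved or deleted here; the function only records the planned action.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Profile:
    """Hypothetical data profile for one file."""
    path: str
    sensitive: bool = False
    active: bool = False
    redundant: bool = False
    owner_is_ex_employee: bool = False
    years_since_access: float = 0.0
    personal_multimedia: bool = False
    has_business_value: bool = True


def choose_action(p: Profile) -> str:
    """Map a profile to a disposition action (rule precedence is an assumption)."""
    if p.personal_multimedia:
        return "notify_management"
    if p.sensitive:
        # Active sensitive content is encrypted; aged sensitive content is archived.
        return "encrypt" if p.active else "archive"
    if p.redundant and p.owner_is_ex_employee and p.years_since_access >= 3:
        return "delete"
    if not p.has_business_value:
        return "migrate_to_cloud"
    return "retain"


def plan_disposition(profiles, now=None):
    """Return one audit entry per file recording the planned action."""
    now = now or datetime.now(timezone.utc)
    return [
        {"timestamp": now.isoformat(), "path": p.path, "action": choose_action(p)}
        for p in profiles
    ]
```

Recording the decision separately from executing it keeps the audit trail complete even if a later migration or deletion step fails.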
15. Ongoing Monitoring and Automation
▪ The GDPR requires “monitoring compliance with the GDPR and other Union or Member State data protection laws, including managing internal data protection activities, training data processing staff, and conducting internal audits.”
▪ Organizations will need to show compliance with the GDPR through the use of technology and sound policies
▪ Examples of Index Engines capabilities:
▪ Store queries and policies
▪ Automated policy search and identification (email notifications)
▪ Automated reporting with csv/text file for use in 3rd party reporting tools
▪ Automated indexing and preservation/archiving
▪ Activity logs/audit trails
16. Index Engines Key Advantages Data Management
Know IT: Enterprise data insight
• The only enterprise class indexing platform on the market today
• Supports all classes of data from primary storage to backup content
Manage IT: Streamlined disposition
• Classify and report on content across the data center
• Flexible access and disposition options to manage effectively
Govern IT: Take control of data
• Integrated archiving and preservation
• Support for legal and security policies
17. Next Steps…
▪ Contact Index Engines for our Data Classification eBook
▪ Index Engines
www.indexengines.com
info@indexengines.com