Cleaning up Redundant, Obsolete and Trivial Data to Reclaim Capacity and Manage Risk
Data Classification and Disposition
Index Engines Introduction
▪ Enterprise Class Information Management Platform
  ▪ Purpose-built, high-speed indexing for global data centers
  ▪ Scalable platform that supports petabytes of unstructured data and email
  ▪ Only solution to support both network and backup data sources
  ▪ Find, manage and govern data based on policies
▪ Corporate Profile
  ▪ Private company headquartered in Holmdel, NJ
  ▪ Founded in 2004
  ▪ Partnered with Dell EMC, Amazon/AWS, EY, FTI
  ▪ Patented technology
  ▪ Clients include: JPMC, Citi, Barclays, TIAA-CREF, State of CA, DOJ, Catholic Health, Cincinnati Children’s, Qualcomm, Merck
© Index Engines Inc. All Rights Reserved. 2016
Unstructured Data Challenges
▪ No knowledge of what exists
  ▪ What has business value
  ▪ What has regulatory preservation requirements
  ▪ What has no value (ROT)
▪ Cannot support data policies
  ▪ Tier content to appropriate repository (cloud)
  ▪ Archive what needs to be preserved
  ▪ Find data to support legal & compliance
Cost of Storage
▪ 100TB costs $955,500 annually
▪ Buying storage capacity is cheap
▪ Maintaining it is expensive
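The per-terabyte cost implied by the figure above can be worked out directly. The following is a minimal sketch, assuming cost scales linearly with capacity (the slide does not say so); the function name is hypothetical, and the 5,000 TB input is the example environment charted later in this deck.

```python
# Figure quoted on the slide: 100 TB costs $955,500 annually.
QUOTED_CAPACITY_TB = 100
QUOTED_ANNUAL_COST_USD = 955_500


def annual_cost_usd(capacity_tb: float) -> float:
    """Scale the quoted figure to another capacity, assuming linear cost."""
    cost_per_tb = QUOTED_ANNUAL_COST_USD / QUOTED_CAPACITY_TB
    return capacity_tb * cost_per_tb


if __name__ == "__main__":
    print(f"Cost per TB per year: ${annual_cost_usd(1):,.0f}")
    print(f"5,000 TB per year:    ${annual_cost_usd(5000):,.0f}")
```

Under the linear assumption, every terabyte reclaimed is worth roughly the per-TB figure each year it would otherwise have been maintained.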
Index Engines Support for Data Clean Up
Know IT
▪ Data Classification & Profiling
  ▪ Enterprise class indexing software
  ▪ Metadata, full text, pattern/regex/PII, security ACLs, activity logs
  ▪ Reporting & classification on user files and email
Manage IT
▪ Defensible Disposition
  ▪ Delete, copy or migrate
  ▪ Integrated archiving & preservation
  ▪ Defensible audit trails and logs
Govern IT
▪ Automation and Monitoring
  ▪ Ongoing monitoring
  ▪ Automated management based on policy
  ▪ Instant access to personal data
Classify Data for Simplified Access and Management
Classify data by Active Directory group membership
Example: Client Services, HR
Use metadata to filter on data that typically contains personal information
Example: Documents, email, etc.
Tag this content for easy access and future queries
Example: PII/RegEx or person’s name
▪ Create an automated data map based on a range of criteria
▪ Classify content to focus on areas that are highly suspect for personal information
▪ Allows for more targeted and simplified search and audits
Classification for ROT Analysis and Clean Up
▪ Clean Redundant, Obsolete & Trivial content
  ▪ ROT can comprise up to 40% of network data
▪ Classify data by:
  ▪ Duplicate content: redundant data
  ▪ Aged data: not accessed in more than X years
  ▪ Abandoned data: owned by ex-employees and not accessed
  ▪ Non-business multimedia files: photos, videos, audio (iTunes)
  ▪ Trivial files: log files, iTunes music, personal vacation photos, etc.
▪ Defensible disposition:
  ▪ Migrate non-active data that should be preserved to cloud
  ▪ Delete content with no business value, maintaining a full audit trail
  ▪ Archive non-active data with personal information for further investigation
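The classification criteria above can be sketched as simple rules over file metadata. This is an illustration only, not Index Engines' implementation: the `FileRecord` fields, the extension list, the three-year default for "X years", and the use of SHA-256 content hashes to detect duplicates are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:  # hypothetical record shape
    path: str
    owner: str
    last_accessed: datetime
    content: bytes


# Assumed examples of trivial / non-business file types.
TRIVIAL_EXTENSIONS = {".log", ".mp3", ".m4a", ".jpg", ".mov"}


def classify_rot(files, active_users, now, aged_after_years=3):
    """Return {path: set of ROT labels}; an empty set means no rule matched."""
    cutoff = now - timedelta(days=365 * aged_after_years)
    seen_hashes = set()
    labels = {}
    for f in files:
        tags = set()
        digest = hashlib.sha256(f.content).hexdigest()
        if digest in seen_hashes:
            tags.add("redundant")  # same content as an earlier file
        seen_hashes.add(digest)
        aged = f.last_accessed < cutoff
        if aged:
            tags.add("obsolete")  # not accessed within the window
        if aged and f.owner not in active_users:
            tags.add("abandoned")  # owner no longer in the directory
        if any(f.path.lower().endswith(ext) for ext in TRIVIAL_EXTENSIONS):
            tags.add("trivial")
        labels[f.path] = tags
    return labels
```

A file can carry more than one label, which matters for disposition: an abandoned duplicate is a deletion candidate, while an aged file with personal information is an archive candidate.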
Department       Legal   Mktg   Fin    HR      Oper   Mfg
Percentage       17%     18%    12%    28%     8%     17%
Capacity (TB)    850     900    600    1,400   400    850
# Files (B)      42.5    45     30     70      20     42.5
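The rows of this table are consistent with one another, which a few lines of arithmetic can confirm. The sketch below assumes the 5,000 TB total capacity shown on the capacity chart; the ratio of 20 TB per billion files is not stated in the deck but is implied by every column (for example, 850 / 42.5 = 20).

```python
TOTAL_CAPACITY_TB = 5000  # "Total Capacity 5,000TB" from the capacity chart
SHARE_PERCENT = {"Legal": 17, "Mktg": 18, "Fin": 12, "HR": 28, "Oper": 8, "Mfg": 17}
TB_PER_BILLION_FILES = 20  # assumption: ratio implied by the table rows


def capacity_tb(dept: str) -> float:
    """Capacity held by a department, from its percentage share."""
    return TOTAL_CAPACITY_TB * SHARE_PERCENT[dept] / 100


def files_billions(dept: str) -> float:
    """File count in billions, from capacity and the implied ratio."""
    return capacity_tb(dept) / TB_PER_BILLION_FILES


if __name__ == "__main__":
    for dept in SHARE_PERCENT:
        print(f"{dept:6} {capacity_tb(dept):7,.0f} TB {files_billions(dept):5.1f} B files")
```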
[Chart: Active Data, Last Accessed in Last Year, by department (Legal, Mktg, Fin, HR, Oper, Mfg). Series: 1 Year, > 1 Year. Data labels: 8, 2, 7, 12, 22, 5 and 92, 98, 93, 88, 78, 95.]
[Chart: Capacity by Department. Total Capacity 5,000TB. Legal 17%, Marketing 18%, Finance 12%, HR 28%, Operations 8%, Manufacturing 17%.]
[Chart: Abandoned Data, Ex-Employee based on Active Directory (TBs), by department (Legal, Mktg, Fin, HR, Oper, Mfg). Series: Accessed in Past Year, Not Accessed in > 1 Year.]
ROT Analysis
Classification of Redundant, Obsolete & Trivial Content
[Chart: Redundant Content (TBs). Legal 248, Mktg 270, Fin 240, HR 560, Oper 123, Mfg 170.]
[Chart: Obsolete Content by Last Accessed (TBs). Categories: < 1 Year, 2 - 3 Years, 3 - 4 Years, 4+ Years.]
[Chart: Trivial Files (TBs). Categories: Logs, Video, Photos, Music, Other.]
Email Audit
[Chart: # of PSTs on Shared Network by Department (Legal, Mktg, Fin, HR, Oper, Mfg). Data labels: 1,256; 35; 989; 768; 49; 88.]
Note: Charts generated using 3rd party software based on Index Engines data
Index Engines Reporting and Classification Features
▪ Supports petabyte-class data center environments, 1% index footprint for metadata
▪ Federated search, reporting and archiving for large scale, distributed data
▪ High speed indexing, reaching up to 1TB/hour/node
▪ Active Directory integration to group data by departments
▪ Tagging to classify data based on any criteria
▪ Flexible queries and reporting on:
  ▪ Metadata
  ▪ Full text and keyword
  ▪ Boolean search including proximity
  ▪ Pattern/PII including credit cards, bank routing, social security, etc.
  ▪ Regular expression, POSIX basic and extended
  ▪ Security ACLs, read/write/browse permissions
  ▪ Activity logs reporting on user access to specific files
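Pattern-based PII detection of the kind listed above can be illustrated with two regular expressions. This is a minimal sketch, not the product's matcher: it uses Python's `re` syntax rather than POSIX basic or extended syntax, the patterns cover only US-style social security numbers and 13 to 19 digit card numbers, and the Luhn checksum filter is an added assumption to reduce false positives.

```python
import re

# Assumed patterns; real deployments need broader and stricter rules.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of a candidate card number."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict:
    """Return candidate SSNs and Luhn-valid card numbers found in text."""
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"ssn": SSN_RE.findall(text), "credit_card": cards}
```

The checksum step is why a 16-digit order number is usually rejected while a genuine card number is kept.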
Analyze and Classify
Data of Value
• Active and dark data (old reports and research data)
• Ensure it is available and accessible by those who need it
Aged/Redundant Data
• Data has outlived its business value
• Migrate to cheaper storage environment
Sensitive Data
• Email and files containing PII, PSTs, contracts, etc.
• Migrate to archive for long term preservation
Analyze and Classify
20 to 40% Redundant
10 to 20% Aged Data
18 to 22% abandoned data (owner no longer exists)
4 to 10% contains sensitive content
5 to 8% contains personal multimedia content
20 to 40% data of value/active
Search and Reporting Interface
Departmental Drill Down
[Charts: one panel per department (Legal, Oper, HR, Fin, Mktg, Mfg), each showing:
 Last Accessed (TBs) by age band (< 1 Year, 2 - 3 Years, 3 - 4 Years, 4+ Years);
 Location of Data (TBs) by server (Server X, Server Y, Server Z, Other);
 Data Types (TBs) by type (Document, Spreadsheet, Presentation, Other);
 Documents Containing PII (data labels: 17, 22, 857, 1,232, 5, 217);
 Location of PII (Files) by server (Server X, Server Y, Server Z, Other).]
Defensible Disposition
▪ Based on the data profile, disposition includes the following:
  ▪ Cloud storage – data with little or no value
  ▪ Deletion – redundant content owned by ex-employees, not accessed in 3+ yrs
  ▪ Archive – aged sensitive content
  ▪ Encryption – active sensitive content
  ▪ Personal Multimedia – notification of management
▪ All migrations are logged for a defensible audit trail of disposition
▪ As data is migrated, all metadata remains intact
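The mapping from data profile to disposition can be sketched as an ordered rule set with an audit log. Everything here is illustrative: the `Profile` fields, the action names, the rule order, and the reuse of the three-year threshold for "aged" sensitive content are assumptions, not the product's policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Profile:  # hypothetical data profile
    redundant: bool = False
    owner_is_ex_employee: bool = False
    years_since_access: float = 0.0
    sensitive: bool = False
    personal_multimedia: bool = False
    low_value: bool = False


def choose_disposition(p: Profile) -> str:
    """Pick one action per file; earlier rules take precedence (assumed order)."""
    if p.personal_multimedia:
        return "notify_management"
    if p.sensitive:
        # Aged sensitive content is archived; active sensitive content is encrypted.
        return "archive" if p.years_since_access >= 3 else "encrypt"
    if p.redundant and p.owner_is_ex_employee and p.years_since_access >= 3:
        return "delete"
    if p.low_value:
        return "cloud_storage"
    return "retain"


def dispose(path: str, profile: Profile, audit_log: list) -> str:
    """Choose an action and append an audit entry recording it."""
    action = choose_disposition(profile)
    audit_log.append({
        "path": path,
        "action": action,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })
    return action
```

Checking the sensitive rule before the deletion rule is a deliberately cautious choice: content that is both sensitive and a deletion candidate is preserved rather than removed.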
Ongoing Monitoring and Automation
▪ The GDPR requires “monitoring compliance with the GDPR and other Union or Member State data protection laws, including managing internal data protection activities, training data processing staff, and conducting internal audits.”
▪ Organizations will need to show compliance with the GDPR through the use of technology and sound policies
▪ Examples of Index Engines capabilities:
  ▪ Store queries and policies
  ▪ Automated policy search and identification (email notifications)
  ▪ Automated reporting with csv/text file for use in 3rd party reporting tools
  ▪ Automated indexing and preservation/archiving
  ▪ Activity logs/audit trails
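A stored policy that is re-run on a schedule and exported as CSV can be sketched in a few lines. This is a hedged illustration using Python's standard `csv` module: the policy format, the record fields, and both function names are hypothetical and do not describe the Index Engines query language.

```python
import csv
import io


def run_policy(records, policy):
    """Return the records matching every field/value pair in the stored policy."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in policy["match"].items())
    ]


def report_csv(matches, fields):
    """Render matching records as CSV text for a third-party reporting tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(matches)
    return buf.getvalue()


if __name__ == "__main__":
    records = [
        {"path": "/hr/mail.pst", "department": "HR", "contains_pii": True},
        {"path": "/fin/q1.xls", "department": "Fin", "contains_pii": False},
    ]
    policy = {"name": "pii-in-hr", "match": {"department": "HR", "contains_pii": True}}
    print(report_csv(run_policy(records, policy), ["path", "department"]))
```

Keeping the policy as data rather than code means the same stored definition can drive the scheduled search, the notification, and the report.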
Index Engines Key Advantages Data Management
Know IT: Enterprise data insight
• The only enterprise class indexing platform on the market today
• Supports all classes of data from primary storage to backup content
Manage IT: Streamlined disposition
• Classify and report on content across the data center
• Flexible access and disposition options to manage effectively
Govern IT: Take control of data
• Integrated archiving and preservation
• Support for legal and security policies
Next Steps…
▪ Contact Index Engines for our Data Classification eBook
▪ Index Engines
  www.indexengines.com
  info@indexengines.com

08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Cleaning up Redundant, Obsolete and Trivial Data to Reclaim Capacity and Manage Risk

  • 1. Cleaning up Redundant, Obsolete and Trivial Data to Reclaim Capacity and Manage Risk Data Classification and Disposition
  • 2. Index Engines Introduction ▪ Enterprise Class Information Management Platform ▪ Purpose-built, high-speed indexing for global data centers ▪ Scalable platform that supports petabytes of unstructured data and email ▪ Only solution to support both network and backup data sources ▪ Find, manage and govern data based on policies ▪ Corporate Profile ▪ Private company headquartered in Holmdel, NJ ▪ Founded in 2004 ▪ Partnered with Dell EMC, Amazon/AWS, EY, FTI ▪ Patented technology ▪ Clients include: JPMC, Citi, Barclays, TIAA-CREF, State of CA, DOJ, Catholic Health, Cincinnati Children’s, Qualcomm, Merck © Index Engines Inc. All Rights Reserved. 2016
  • 3. Unstructured Data Challenges ▪ No knowledge of what exists ▪ What has business value ▪ What has regulatory preservation requirements ▪ What has no value (ROT) ▪ Cannot support data policies ▪ Tier content to appropriate repository (cloud) ▪ Archive what needs to be preserved ▪ Find data to support legal & compliance
  • 4. Cost of Storage ▪ 100TB costs $955,500 annually ▪ Buying storage capacity is cheap ▪ Maintaining it is expensive
  • 5. Index Engines Support for Data Clean Up Know IT Manage IT Govern IT ▪ Data Classification & Profiling ▪ Enterprise class indexing software ▪ Metadata, full text, pattern/regex/PII, security ACLs, activity logs ▪ Reporting & classification on user files and email ▪ Defensible Disposition ▪ Delete, copy or migrate ▪ Integrated archiving & preservation ▪ Defensible audit trails and logs ▪ Automation and Monitoring ▪ Ongoing monitoring ▪ Automated management based on policy ▪ Instant access to personal data
  • 6. Classify Data for Simplified Access and Management Classify data by Active Directory group membership Example: Client Services, HR Use metadata to filter on data that typically contains personal information Example: Documents, email, etc. Tag this content for easy access and future queries Example: PII/RegEx or person's name ▪ Create an automated data map based on a range of criteria ▪ Classify content to focus on areas that are highly suspect for personal information ▪ Allows for more targeted and simplified search and audits
  • 7. Classification for ROT Analysis and Clean Up ▪ Clean Redundant, Obsolete & Trivial content ▪ ROT can comprise up to 40% of network data ▪ Classify data by: ▪ Duplicate content: redundant data ▪ Aged data: not accessed in more than X years ▪ Abandoned data: owned by ex-employees and not accessed ▪ Non-Business Multimedia files: photos, videos, audio (iTunes) ▪ Trivial files: log files, iTunes music, personal vacation photos, etc. ▪ Defensible disposition: ▪ Migrate non-active data that should be preserved to cloud ▪ Delete content with no business value maintaining full audit trail ▪ Archive non-active data with personal information for further investigation
  • 8. ROT Analysis: Classification of Redundant, Obsolete & Trivial Content
      Capacity by Department (Total Capacity 5,000TB)
                        Legal   Mktg   Fin    HR      Oper   Mfg
        Percentage      17%     18%    12%    28%     8%     17%
        Capacity (TB)   850     900    600    1,400   400    850
        # Files (B)     42.5    45     30     70      20     42.5
      [Chart: Active Data, Last Accessed in Last Year, by department; data labels as extracted: 8, 2, 7, 12, 22, 5 and 92, 98, 93, 88, 78, 95]
      [Chart: Abandoned Data, Ex-Employee based on Active Directory (TBs), by department: Accessed in Past Year vs. Not Accessed in > 1 Year]
      Redundant Content (TBs): Legal 248, Mktg 270, Fin 240, HR 560, Oper 123, Mfg 170
      [Chart: Obsolete Content by Last Accessed (TBs): < 1 Year, 2 - 3 Years, 3 - 4 Years, 4+ Years]
      [Chart: Trivial Files (TBs): Logs, Video, Photos, Music, Other]
      Email Audit, # of PSTs on Shared Network by Department (values as extracted; departments listed as Legal, Mktg, Fin, HR, Oper, Mfg): 1,256, 35, 989, 768, 49, 88
      Note: Charts generated using 3rd party software based on Index Engines data
  • 9. Index Engines Reporting and Classification Features ▪ Supports petabyte-class data center environments, 1% index footprint for metadata ▪ Federated search, reporting and archiving for large scale, distributed data ▪ High speed indexing, reaching up to 1TB/hour/node ▪ Active Directory integration to group data by departments ▪ Tagging to classify data based on any criteria ▪ Flexible queries and reporting on: ▪ Metadata ▪ Full text and keyword ▪ Boolean search including proximity ▪ Pattern/PII including credit cards, bank routing, social security, etc. ▪ Regular expression, POSIX basic and extended ▪ Security ACLs, read/write/browse permissions ▪ Activity logs reporting on user access to specific files
  • 10. Analyze and Classify ▪ Data of Value: Active and dark data (old reports and research data); ensure it is available and accessible by those who need it ▪ Aged/Redundant Data: Data has outlived its business value; migrate to cheaper storage environment ▪ Sensitive Data: Email and files containing PII, PSTs, contracts, etc.; migrate to archive for long term preservation Copyright Index Engines Inc. 2005 All rights reserved.
  • 11. Analyze and Classify ▪ 20 to 40% Redundant ▪ 10 to 20% Aged Data ▪ 18 to 22% abandoned data (owner no longer exists) ▪ 4 to 10% contains sensitive content ▪ 5 to 8% contains personal multimedia content ▪ 20 - 40% Data of value/Active
  • 12. Search and Reporting Interface
  • 13. Departmental Drill Down (Legal, Oper, HR, Fin, Mktg, Mfg) [Charts, one set per department: Last Accessed (TBs): < 1 Year, 2 - 3 Years, 3 - 4 Years, 4+ Years; Location of Data (TBs): Server X, Server Y, Server Z, Other; Data Types (TBs): Document, Spreadsheet, Presentation, Other; Location of PII (Files): Server X, Server Y, Server Z, Other] Documents Containing PII, per department panel in extraction order: 17, 22, 857, 1,232, 5, 217
  • 14. Defensible Disposition ▪ Based on the data profile, disposition includes the following: ▪ Cloud storage – data with little or no value ▪ Deletion – redundant content owned by ex-employees, not accessed in 3+ yrs ▪ Archive – aged sensitive content ▪ Encryption – active sensitive content ▪ Personal Multimedia – notification of management ▪ All migrations are logged for defensible audit trail of disposition ▪ As data is migrated all metadata remains intact
  • 15. Ongoing Monitoring and Automation ▪ The GDPR requires “monitoring compliance with the GDPR and other Union or Member State data protection laws, including managing internal data protection activities, training data processing staff, and conducting internal audits.” ▪ Organizations will need to show compliance with the GDPR through the use of technology and sound policies ▪ Examples of Index Engines capabilities: ▪ Store queries and policies ▪ Automated policy search and identification (email notifications) ▪ Automated reporting with csv/text file for use in 3rd party reporting tools ▪ Automated indexing and preservation/archiving ▪ Activity logs/audit trails
  • 16. Index Engines Key Advantages: Data Management ▪ Know IT: Enterprise data insight • The only enterprise class indexing platform on the market today • Supports all classes of data from primary storage to backup content ▪ Manage IT: Streamlined disposition • Classify and report on content across the data center • Flexible access and disposition options to manage effectively ▪ Govern IT: Take control of data • Integrated archiving and preservation • Support for legal and security policies
  • 17. Next Steps… ▪ Contact Index Engines for our Data Classification eBook ▪ Index Engines www.indexengines.com info@indexengines.com
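
The ROT classification criteria listed on slide 7 (duplicate content, aged data, abandoned data, trivial files) can be illustrated with a small sketch. This is not Index Engines code or its API; the `FileRecord` fields, the `classify_rot` function, the extension list, and the 3-year threshold (standing in for the slide's "X years") are all assumptions made for illustration, and duplicates are detected only by comparing a precomputed content hash.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical list of "trivial" file extensions; the slides name the
# categories (logs, music, photos, video) but not specific extensions.
TRIVIAL_EXTENSIONS = {".log", ".mp3", ".m4a", ".jpg", ".mov"}


@dataclass
class FileRecord:
    """Illustrative metadata record; field names are assumptions."""
    path: str
    content_hash: str        # e.g. a digest computed at indexing time
    last_accessed: datetime
    owner_active: bool       # False if the owner is an ex-employee


def classify_rot(records, now, max_age_years=3):
    """Return {path: [tags]} using the slide-7 categories."""
    cutoff = now - timedelta(days=365 * max_age_years)
    seen_hashes = set()
    labels = {}
    for rec in records:
        tags = []
        # Redundant: same content hash as a file seen earlier.
        if rec.content_hash in seen_hashes:
            tags.append("redundant")
        seen_hashes.add(rec.content_hash)
        # Aged: not accessed since the cutoff date.
        if rec.last_accessed < cutoff:
            tags.append("aged")
            # Abandoned: aged and owned by an ex-employee.
            if not rec.owner_active:
                tags.append("abandoned")
        # Trivial: extension is in the assumed trivial set.
        if os.path.splitext(rec.path)[1].lower() in TRIVIAL_EXTENSIONS:
            tags.append("trivial")
        labels[rec.path] = tags
    return labels


now = datetime(2016, 1, 1)
records = [
    FileRecord("/share/hr/report.docx", "a1", datetime(2015, 6, 1), True),
    FileRecord("/share/hr/report_copy.docx", "a1", datetime(2015, 6, 1), True),
    FileRecord("/share/ops/old_plan.xlsx", "b2", datetime(2010, 1, 1), False),
    FileRecord("/share/ops/server.log", "c3", datetime(2015, 12, 1), True),
]
labels = classify_rot(records, now)
for path, tags in labels.items():
    print(path, tags)
```

In this sketch the first copy of a file is treated as the original and only later copies are tagged redundant; a real policy would need its own rule for which copy to keep, and tags like these would feed the disposition choices on slide 14 rather than trigger deletion directly.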