www.exquando.com Exquando 2016. All rights Reserved 1 of 5
A case for content classification automation
October 2016
Abstract
This article makes a case for introducing automation to content classification. After pointing out
generally accepted challenges and organizational needs, it assesses market adoption and gives a
brief overview of approaches and algorithms for automation. It concludes with practical advice on
how to introduce auto-classification within your organization.
Introduction
Organizations face an ever-increasing amount of content and find it difficult to understand what
content they have and how to manage their overall content assets.
Large volumes of content are not under any form of governance, retention management or
e-discovery. Many organizations have found that traditional (manual) approaches to content
classification or categorization yield poor or uneven results and put an unacceptable burden on
employees. In addition, individuals will not consistently categorize information in exactly the
same way.
A research report by IDC in 2008 stated that 22% to 33%
of the digital universe is made up of content that is
governed by security, compliance, and preservation
obligations. IDC forecasts that high-value information will
make up close to 50% of the digital universe by the end of
2020.
IDC Research
Market needs
Understanding all the content that exists in an
organization – governing it, enabling users to
access it, find it, reuse it and repurpose it – can
significantly enhance the
performance of an
organization.
Even if policies are adopted
to classify and tag this
content, the volumes
involved, and the change in
staff attitudes needed,
represent a huge hurdle to
jump. Tagging and
classifying the existing content to add value and
remove ROT (redundant, obsolete and trivial
content) would be quite impossible without
automated algorithms that match the defined
governance policies.
Automation solutions to content classification
need to fit in with existing IT infrastructure and
embrace current investments in content
management systems so that organizations can
take control of their unstructured content.
Analysts’ reports and customer surveys point to a clear need for a new way of managing
information. Information is everywhere, and organizations are still struggling to extract
quantifiable value from it that can add to their bottom line. Moreover, organizations often
don’t even realize how their legacy information may expose them to risk and to the dangers of
non-compliance.
Growth in the number and extent of laws and
regulations forces companies to be more vigilant
or face expensive penalties.
Information is everywhere, from on-premises content and file systems to cloud-based drives
and repositories. It is simply unrealistic to assume that information can be extracted from
all existing and future systems where it is created and consumed, and migrated to a central
repository where, out of context, it can still be accessed and correctly interpreted. This
means that solutions need to enforce information policies in place, while the organization’s
information remains where it is created, stored, maintained, used and reused.
The adoption of cloud-based storage is growing rapidly in organizational departments. Cloud
repositories simply cannot be ignored or managed individually. Global and local governance
policies need to be enforced in the cloud even more strictly than on premises. Hybrid
information governance is the only way to bring consistency and apply the same information
policies across the organization, irrespective of the information type and format.
How can automation help?
Before any content can be governed, consumed, and data-mined, it needs to be described, so
that it can be classified, made searchable, properly secured and disposed of in a timely
manner. Metadata is data that describes various aspects of content in order to enhance
usability throughout its lifecycle. It enables content to be contextualized, making it more
useful to individuals and to the business. In summary, metadata enables content to be
turned into a valuable information asset.
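As a rough illustration of how metadata supports search and timely disposal, the sketch below models a metadata record in Python. The field names, categories and retention periods are assumptions made for the example; they are not taken from any particular content management system.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ContentMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    category: str        # position in the taxonomy, e.g. "finance/invoice"
    owner: str
    created: date
    retention_days: int  # how long policy says the item must be kept
    tags: list[str] = field(default_factory=list)

    def disposal_date(self) -> date:
        return self.created + timedelta(days=self.retention_days)

    def is_due_for_disposal(self, today: date) -> bool:
        return today >= self.disposal_date()


def find_by_category(items: list[ContentMetadata], category: str) -> list[ContentMetadata]:
    """Classification metadata makes content searchable by category."""
    return [item for item in items if item.category == category]


# Made-up sample records.
records = [
    ContentMetadata("Invoice 2014-001", "finance/invoice", "accounts", date(2014, 3, 1), 365 * 7),
    ContentMetadata("Lunch menu", "general/notice", "facilities", date(2016, 1, 4), 30),
]

# Titles whose retention period has expired on the given date.
due = [r.title for r in records if r.is_due_for_disposal(date(2016, 10, 1))]
```

Once a category and retention period are attached to each item, questions such as "what can be disposed of today?" become simple queries rather than manual reviews.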
Completing metadata and classifying large volumes of existing content to add value and
remove ROT, in line with the governance policies of the organization, is quite impossible
without automation.
Where there are high volumes of inbound content, staff struggle to keep up and quickly
conclude that any attempt at manual “back-filing” to tidy up existing repositories would be
a tedious and thankless, if not impossible, task.
Automating processes is likely the only way to provide lasting solutions.
Market adoption of auto-classification
A 2015 market survey¹ by the Association for Information and Image Management (AIIM)
concludes with the following key findings:
• 47% of respondents are implementing auto-classification. Another 13% are keen to get
started.
• 18% are using automated classification at the point of ingestion to ECM (enterprise content
management), RM (records management) or email systems, and 15% within a workflow or
process. 8% are trawling legacy content for metadata improvement and 13% are processing
migrated content.
• The biggest benefits given for auto-classification are improved searchability (63%),
followed by improved productivity (43%). Defensible deletion and compliance are cited
by 37%.
• When comparing automated classification to manual, 34% of respondents feel automation is
more consistent than humans, including 20% who feel it is more accurate too.
• 48% would prefer a combined machine prompt with human review.
• Accuracy is more of an issue when it comes to defensible deletion and compliance. In many
ways, having a demonstrable and consistent mechanism which follows the declared governance
policies is more likely to produce a favorable judgment in court than manual procedures
which are not followed or enforced.
• Adding value to otherwise “dead content” is a primary benefit for 30%, and reducing
storage volumes for 24%.
¹ © AIIM 2015, www.aiim.org, Industry Watch - Information Governance: too important to be
left to humans
59% of respondents are moving towards auto-classification.
AIIM 2015 market survey
Commonly used automation approaches
There are several technology approaches to automating the classification process, and
technology providers offer solutions based on a variety of techniques.
• Rules-based
The rules-based technique requires experts with domain knowledge to create and maintain a
set of rules that determine whether a document is included in any given category of a
taxonomy. This is likely the most straightforward and user-controllable approach, and it
focuses more on the classification process than on the construction and definition of a
taxonomy. Debugging is more straightforward than with other techniques. Making improvements
relies on refining the rules to a higher level of sophistication.
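To make the rules-based technique more concrete, here is a minimal sketch in Python. The taxonomy, the regular expressions and the `min_matches` parameter are all hypothetical; commercial products typically offer far richer rule languages.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    category: str
    pattern: str          # regular expression maintained by a domain expert
    min_matches: int = 1  # how many hits are needed before the rule fires


# Hypothetical rules for a hypothetical taxonomy.
RULES = [
    Rule("finance/invoice", r"\binvoice\b|\bamount due\b", min_matches=2),
    Rule("hr/contract", r"\bemployment\b|\bsalary\b", min_matches=2),
    Rule("legal/nda", r"\bnon-disclosure\b|\bconfidential\b"),
]


def classify_by_rules(text: str, rules: list[Rule] = RULES) -> list[str]:
    """Return every category whose rule fires on the text."""
    matched = []
    for rule in rules:
        hits = re.findall(rule.pattern, text, flags=re.IGNORECASE)
        if len(hits) >= rule.min_matches:
            matched.append(rule.category)
    return matched


doc = "Invoice 4711: the amount due is payable within 30 days."
categories = classify_by_rules(doc)  # only the finance/invoice rule fires
```

Because every decision can be traced back to a specific rule and its matches, a wrong classification is usually fixed by editing that one rule, which is what makes this approach comparatively easy to debug.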
• Statistical
Statistical approaches typically require a training set containing content instances whose
category membership in a taxonomy is known. Training sets match words and concepts to
categories and need to be chosen carefully. Often 50 or more documents per category are
required to train the system. Poor selection of documents in the training set can introduce
pollution and negatively impact classification results. When issues arise, diagnosing root
causes and making corrections can be less obvious.
Bayes' theorem from probability theory is frequently used to measure “degrees of belief”,
and a document is assigned to a category only when a sufficient degree of belief is
attained.
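The idea of assigning a category only when the degree of belief is high enough can be sketched with a small naive Bayes classifier. This is an illustration under simplifying assumptions (whitespace tokenization, add-one smoothing, a tiny made-up training set and an arbitrary threshold of 0.8), not a description of any vendor's implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class NaiveBayesClassifier:
    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of training documents
        self.vocabulary = set()

    def train(self, text: str, category: str) -> None:
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def posteriors(self, text: str) -> dict[str, float]:
        """P(category | text) via Bayes' theorem with add-one smoothing."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            log_scores[category] = score
        # Normalize in log space so that the posteriors sum to 1.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}

    def classify(self, text: str, threshold: float = 0.8) -> Optional[str]:
        """Assign a category only if the degree of belief reaches the threshold."""
        post = self.posteriors(text)
        best = max(post, key=post.get)
        return best if post[best] >= threshold else None


# Made-up training set; real systems need far more documents per category.
clf = NaiveBayesClassifier()
clf.train("invoice amount due payment", "finance")
clf.train("payment received invoice number", "finance")
clf.train("employment contract salary", "hr")
clf.train("salary review employment terms", "hr")

confident = clf.classify("invoice payment due")  # belief in "finance" is high
undecided = clf.classify("meeting tomorrow")     # no category is convincing
```

Returning no category for low-confidence input, rather than forcing a guess, leaves room for such documents to be handled by another method or by a person.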
• Machine learning
Machine learning is a subfield of computer science that gives computer applications the
ability to learn without following explicitly programmed instructions.
Machine learning tasks are typically classified into:
Supervised learning - Some external mechanism, typically human feedback, provides
information on the correct classification. The learning goal is to match the content input
stream to the correct category.
Unsupervised learning - No external information or supervision is provided, leaving it up
to the learning machine to find structure in the inbound stream on its own. Clustering
techniques are used to group content.
Semi-supervised learning - Often applied in the form of constrained clustering, it aims to
obtain better-defined clusters than those obtained from unlabeled content alone.
Semi-supervised learning extends unsupervised and supervised learning by including
additional information typical of the other learning paradigm.
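As a small illustration of the unsupervised case, the sketch below groups documents by word overlap without any labels. The greedy single-pass strategy, the Jaccard similarity measure and the threshold of 0.3 are choices made for brevity here; production systems generally use more robust clustering algorithms.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(documents: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Greedy single-pass clustering: a document joins the first cluster whose
    seed document is similar enough, otherwise it starts a new cluster."""
    word_sets = [set(doc.lower().split()) for doc in documents]
    clusters: list[list[int]] = []
    for index, words in enumerate(word_sets):
        for members in clusters:
            seed = word_sets[members[0]]
            if jaccard(words, seed) >= threshold:
                members.append(index)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([index])
    return clusters


# Made-up sample documents.
docs = [
    "invoice amount due payment",
    "employment contract salary terms",
    "invoice payment received amount",
    "salary review employment contract",
]
groups = cluster(docs)  # lists of document indexes, one list per cluster
```

The clusters carry no names. A person, or a supervised step, still has to decide which taxonomy category each group corresponds to, which is where semi-supervised techniques come in.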
No single approach should be considered superior to another for every possible
application. The trend among technology providers is to combine multiple methods to
increase the accuracy of content classification.
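One way to picture a combination of methods is a pipeline that tries each technique in turn and routes uncertain results to a person. The sketch below is hypothetical: `keyword_rules` and `word_overlap_model` are simple stand-ins for a rules engine and a trained statistical model, and the 0.5 threshold is arbitrary.

```python
from typing import Callable, Optional

# Each classifier returns (category, confidence), or None when it has no opinion.
Classifier = Callable[[str], Optional[tuple[str, float]]]


def keyword_rules(text: str) -> Optional[tuple[str, float]]:
    """Stand-in for a rules engine: a rule either fires or it does not."""
    if "invoice" in text.lower():
        return ("finance", 1.0)
    return None


def word_overlap_model(text: str) -> Optional[tuple[str, float]]:
    """Stand-in for a trained statistical model (hypothetical vocabulary)."""
    profiles = {
        "finance": {"payment", "amount", "due", "budget"},
        "hr": {"salary", "employment", "contract", "leave"},
    }
    words = set(text.lower().split())
    if not words:
        return None
    scores = {cat: len(words & vocab) / len(words) for cat, vocab in profiles.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else None


def classify(text: str, classifiers: list[Classifier], threshold: float = 0.5) -> tuple[str, str]:
    """Try each method in order; route low-confidence results to a person."""
    for classifier in classifiers:
        result = classifier(text)
        if result is None:
            continue
        category, confidence = result
        if confidence >= threshold:
            return (category, "auto")
        return (category, "needs human review")
    return ("unclassified", "needs human review")


pipeline = [keyword_rules, word_overlap_model]
a = classify("Invoice 4711 attached", pipeline)
b = classify("salary and employment contract", pipeline)
c = classify("please check the budget for the offsite next week", pipeline)
d = classify("see you tomorrow", pipeline)
```

In this arrangement the machine proposes a category and a person confirms only the uncertain cases, which keeps the manual workload limited to documents the automation cannot settle on its own.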
74% of organizations continue to
depend on individuals to manually
comply with legal, regulatory, and
record management requirements.
Given the projected growth and the
inability of employees to manually
manage information, organizations
need to start automating the tasks
associated with classifying, managing,
and disposing of information assets.
Council for Information Auto-
Classification (CIAC)
Summary
Organizations are finding it increasingly difficult to know what content they have,
what content is business critical, and how best to manage their overall content assets.
Manual approaches to content classification deliver poor or inconsistent results and put an
unwarranted load on content creators and handlers to tag documents.
The explosion of data sources and the complexity of information make manual classification
and analysis infeasible and financially unjustifiable.
Technology advances in machine learning algorithms are making solutions more comprehensive,
autonomous and efficient.
Organizations today are proactively moving towards auto-classification to address these
challenges.
Recommendations
A good practice is to take an iterative approach to adopting auto-classification. Start small
and expand in depth or scale with each subsequent iteration. This approach delivers
incremental business value with each iteration while enabling the implementation team to
learn and gain expertise.
• Take manageable steps, each building on the previous one. Take care not to end up
attempting to “boil the ocean”.
• Engage business users in developing the approach, as they know the content and business
process details better than anybody else.
• Be careful not to add incremental burden on users and content managers in the operational
phase. Solutions need to focus on becoming more intuitive and on reducing repetitive,
tedious manual work for users.
• Where to start? A well-defined process that depends on a significant amount of manual
interaction, and that can benefit from automation, is a good candidate. Improving a
well-defined process has the advantage of making it easier to define key performance
indicators (KPIs) and to demonstrate incremental value. The business process improvements
can then be used to secure incremental funding for subsequent iterations from key
stakeholders or executives.
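As one possible example of a KPI for an iteration, the sketch below measures how often human reviewers agree with the category proposed by the system. The metric names and the sample data are assumptions chosen for illustration; each organization will define its own indicators.

```python
def review_kpis(decisions: list[tuple[str, str]]) -> dict[str, float]:
    """Each decision is (category proposed by the system, category confirmed by the reviewer)."""
    total = len(decisions)
    if total == 0:
        return {"agreement_rate": 0.0, "correction_rate": 0.0}
    agreed = sum(1 for proposed, confirmed in decisions if proposed == confirmed)
    return {
        "agreement_rate": agreed / total,
        "correction_rate": (total - agreed) / total,
    }


# Made-up review log for one iteration.
sample = [
    ("finance", "finance"),
    ("hr", "hr"),
    ("finance", "legal"),
    ("hr", "hr"),
]
kpis = review_kpis(sample)
```

Tracking such a figure from one iteration to the next gives a simple, quantifiable way to show whether the classification is improving.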
Exquando - A case for content classification automation

  • 1. www.exquando.com Exquando 2016. All rights Reserved 1 of 5

A case for content classification automation
October 2016

Abstract
This article makes a case for introducing automation to content classification. After pointing out generally accepted challenges and organizational needs, it assesses market adoption and gives a brief overview of approaches and algorithms for automation. We conclude with practical advice on how to introduce auto-classification within your organization.

Introduction
Organizations face an ever-increasing amount of content and find it difficult to understand what content they have and how to manage their overall content assets.

Large volumes of content are not under any form of governance, retention management or e-discovery. Many organizations have found that traditional (manual) approaches to classifying or categorizing content yield poor or uneven results and put an unacceptable burden on employees. In addition, individuals will not consistently categorize information in exactly the same way.

A research report by IDC in 2008 stated that 22% to 33% of the digital universe is made up of content that is governed by security, compliance, and preservation obligations. IDC forecasts that high-value information will make up close to 50% of the digital universe by the end of 2020.
IDC Research

  • 2.

Market needs
Understanding all the content that exists in an organization (governing it, and enabling users to access, find, reuse and repurpose it) can significantly enhance the performance of that organization.

Even if policies are adopted to classify and tag this content, the volumes involved and the change in staff attitudes needed represent a huge hurdle. Tagging and classifying the existing content to add value and remove ROT (redundant, obsolete and trivial content) would be practically impossible without automated algorithms that match the defined governance policies.

Automation solutions for content classification need to fit in with existing IT infrastructure and build on current investments in content management systems, so that organizations can take control of their unstructured content.

Analysts’ reports and customer surveys show a clear need for a new way of managing information. Information is everywhere, and organizations are still struggling to extract quantifiable value from it that adds to their bottom line. Moreover, organizations often do not realize how their legacy information may expose them to risk and to the dangers of non-compliance. Growth in the number and extent of laws and regulations forces companies to be more vigilant or face expensive penalties.

Information is everywhere, from on-premises content and file systems to cloud-based drives and repositories. It is simply unrealistic to assume that information can be extracted from all existing and future systems where it is created and consumed, and migrated to a central repository where, out of context, it can still be accessed and correctly interpreted.
This means that solutions need to enforce information policies in place, while the organization’s information remains where it is created, stored, maintained, used and reused.

The adoption of cloud-based storage is growing rapidly in organizational departments. Cloud repositories simply cannot be ignored or managed individually. Global and local governance policies need to be enforced in the cloud even more strictly than on premises. Hybrid information governance is the only way to bring consistency and apply the same information policies across the organization, irrespective of the information type and format.

How can automation help?
Before any content can be governed, consumed, and data-mined, it must be described, so that it can be classified, made searchable, properly secured and disposed of in a timely manner.

Metadata is data that describes various aspects of content in order to enhance its usability throughout its lifecycle. It enables content to be contextualized, making it more useful to individuals and to the business. In summary, metadata enables content to be turned into a valuable information asset.
  • 3.

Completing metadata and classifying large volumes of existing content, to add value and remove ROT in line with the organization’s governance policies, is practically impossible without automation.

Where there are high volumes of inbound content, staff struggle to keep up and quickly conclude that any attempt at manual “back-filing” to tidy up existing repositories would be a tedious and thankless, if not impossible, task. Automating processes is likely the only way to provide lasting solutions.

Market adoption of auto-classification
A 2015 market survey (1) by the Association for Information and Image Management (AIIM) reports the following key findings:

▪ 47% of respondents are implementing auto-classification. Another 13% are keen to get started.
▪ 18% are using automated classification at the point of ingestion to ECM, RM or email systems, and 15% within a workflow or process. 8% are trawling legacy content for metadata improvement and 13% are processing migrated content.
▪ The biggest benefits given for auto-classification are improved searchability (63%), then improved productivity (43%). Defensible deletion and compliance are cited by 37%.
▪ When comparing automated classification to manual classification, 34% of respondents feel automation is more consistent than humans, including 20% who feel it is more accurate too.
▪ 48% would prefer a combined machine prompt with human review.
▪ Accuracy is more of an issue when it comes to defensible deletion and compliance. In many ways, a demonstrable and consistent mechanism that follows the declared governance policies is more likely to produce a favorable judgment in court than manual procedures that are not followed or enforced.
▪ Adding value to otherwise “dead content” is a primary benefit for 30%, and reducing storage volumes for 24%.

(1) © AIIM 2015, www.aiim.org, Industry Watch - Information Governance: too important to be left to humans

59% of respondents are moving towards auto-classification.
AIIM 2015 market survey

Commonly used automation approaches
There are several technology approaches to automating the classification process. Technology providers offer solutions based on a variety of techniques.

▪ Rules-based
The rules-based technique requires experts with domain knowledge to create and maintain a set of rules that determine whether a document belongs in a given category of a taxonomy. This is likely the most straightforward and user-controllable approach, and it focuses more on the classification process than on the construction and definition of a taxonomy. Debugging is more straightforward than with other techniques. Making improvements relies on refining the rules to a higher level of sophistication.

  • 4.

▪ Statistical
Statistical approaches typically require a training set containing content instances whose category membership in a taxonomy is known. Training sets match words and concepts to categories and need to be chosen carefully. Often 50 or more documents per category are required to train the system. Poor selection of documents in the training set can introduce pollution and negatively impact classification results. When issues arise, diagnosing root causes and making corrections can be less obvious. Bayes' theorem from probability theory is frequently used to measure “degrees of belief”: a document is assigned to a category only when a sufficient degree of belief is attained.

▪ Machine learning
Machine learning is a subfield of computer science that gives computer applications the ability to learn without following explicitly programmed instructions. Machine learning tasks are typically classified into:

Supervised learning - Some external mechanism, typically human feedback, provides information on the correct classification. The goal is to learn to match the content input stream to the correct category.

Unsupervised learning - No external information or supervision is provided, leaving it up to the learning machine to find structure in the inbound stream on its own. Clustering techniques are used to group content.

Semi-supervised learning - Also known as constrained clustering, this aims to obtain better-defined clusters than those obtained from unlabeled content. Semi-supervised learning extends unsupervised and supervised learning by including additional information typical of the other learning paradigm.
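To make the statistical approach above more concrete, the following is a minimal sketch of a naive Bayes classifier that assigns a category only when its "degree of belief" reaches a threshold. It is an illustration under stated assumptions, not a description of any vendor's product: the function names, the toy training set, the whitespace tokenizer and the 0.8 threshold are all hypothetical, and the training set is far smaller than the 50 or more documents per category mentioned above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set of (text, category) pairs.
TRAINING_SET = [
    ("invoice payment due amount", "finance"),
    ("invoice total tax amount", "finance"),
    ("contract clause liability party", "legal"),
    ("contract signature party agreement", "legal"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(labelled_docs):
    """Count documents and words per category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labelled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def posteriors(text, model):
    """Return P(category | text) for each category using Bayes' theorem."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # prior
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Likelihood with add-one (Laplace) smoothing.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        log_scores[category] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


def classify(text, model, threshold=0.8):
    """Return the best category, or None if belief is below the threshold."""
    beliefs = posteriors(text, model)
    best = max(beliefs, key=beliefs.get)
    return best if beliefs[best] >= threshold else None


model = train(TRAINING_SET)
```

With this toy model, `classify("invoice amount due", model)` returns `"finance"`, while `classify("party", model)` returns `None` because the belief in the best category stays below the threshold. Returning `None` rather than a weak guess leaves room for a human to review uncertain documents.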
No single approach should be considered superior to the others for every possible application. The trend among technology providers is to combine multiple methods to increase the accuracy of content classification.

74% of organizations continue to depend on individuals to manually comply with legal, regulatory, and record management requirements. Given the projected growth and the inability of employees to manually manage information, organizations need to start automating the tasks associated with classifying, managing, and disposing of information assets.
Council for Information Auto-Classification (CIAC)
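One way to picture the combination of methods described above is a small pipeline in which expert-written rules decide the clear-cut cases, an optional second classifier handles the rest, and anything still undecided is routed to a person. The sketch below is only an assumption about how such a pipeline could look: the rule patterns, category names and the `fallback` parameter are hypothetical and not taken from the survey or from any product.

```python
import re

# Hypothetical expert-maintained rules: regular expressions per category.
RULES = {
    "invoice": [r"\binvoice\b", r"\bamount due\b"],
    "contract": [r"\bcontract\b", r"\bhereinafter\b"],
}


def rule_matches(text):
    """Return the sorted list of categories whose rules match the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, patterns in RULES.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    )


def classify(text, fallback=None):
    """Return (category, decided_by).

    Rules win when exactly one category matches. Otherwise an optional
    second classifier (any callable returning a category or None) is
    consulted, and undecided documents are sent to human review.
    """
    matched = rule_matches(text)
    if len(matched) == 1:
        return matched[0], "rule"
    if fallback is not None:
        guess = fallback(text)
        if guess is not None:
            return guess, "fallback"
    return None, "human review"
```

A document that matches the rules of exactly one category is classified by rule; one that matches several categories, or none, falls through to the second classifier or to a reviewer. Recording which stage made the decision keeps the process demonstrable, which matters for defensible deletion and compliance.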
  • 5.

Summary
Organizations are finding it increasingly difficult to know what content they have, what content is business critical, and how best to manage their overall content assets.

Manual approaches to content classification deliver poor or inconsistent results and put an unwarranted load on content creators and handlers to tag documents. The explosion of data sources and the complexity of information make manual classification and analysis infeasible and financially unjustifiable.

Advances in machine learning algorithms are making solutions more comprehensive, autonomous and efficient. Organizations today are proactively moving towards auto-classification to address these challenges.

Recommendations
A good practice is to take an iterative approach to adopting auto-classification. Start small and expand in depth or scale with each subsequent iteration. This approach delivers incremental business value with each iteration while enabling the implementation team to learn and gain expertise.

▪ Take manageable steps, each building on the previous one. Take care not to end up attempting to “boil the ocean”.
▪ Engage business users in developing the approach, as they know the content and business process details better than anybody else.
▪ Be cautious not to add incremental burden on users and content managers in the operational phase. Solutions need to focus on becoming more intuitive and on reducing repetitive, tedious manual work for users.
▪ Where to start? A well-defined process that depends on a significant amount of manual interaction and can benefit from automation is a good candidate. Improving a well-defined process has the advantage of making it easier to define key performance indicators (KPIs) and to demonstrate incremental value. The business process improvements can then be used to secure incremental funding for subsequent iterations from key stakeholders or executives.