Concept-based auto categorization, which automatically categorizes documents based on their actual content, not keywords or terms, is a fast, easy, and repeatable way to pinpoint only the most important documents and e-mails among libraries spanning millions of files and messages. It is an established standard in legal e-discovery and U.S. intelligence, having proved itself defensible and highly scalable.
By using sample documents containing the concepts being sought, concept-based auto categorization “looks” across an organization’s entire electronic content and finds others like them. Because it doesn’t depend on finding keywords or terms, it is faster, easier, and far more accurate than lexicon-based taxonomy alternatives.
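The categorize-by-example workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample texts, and the 0.2 cutoff are all hypothetical, and the similarity measure is plain term-count cosine similarity, which is word-based and serves purely as a stand-in for the concept-level comparison a real concept-based engine would perform.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(document, exemplars, threshold=0.2):
    """Return the category whose sample documents best match `document`.

    `exemplars` maps a category name to a list of sample texts. The
    threshold is an arbitrary illustrative cutoff; a document that
    matches no category well enough is returned as None (uncategorized).
    """
    doc_vec = term_vector(document)
    best_category, best_score = None, 0.0
    for category, samples in exemplars.items():
        # Score a category by its single closest sample document.
        score = max((cosine(doc_vec, term_vector(s)) for s in samples), default=0.0)
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= threshold else None


# Hypothetical sample documents, one per category.
exemplars = {
    "newsletter": ["Monthly newsletter: product news, upcoming events, and special offers"],
    "contract": ["This agreement is entered into by the parties and governs the terms of service"],
}

print(categorize("Our newsletter this month covers product news and events", exemplars))
print(categorize("The parties agree to the terms of this service agreement", exemplars))
print(categorize("Lunch at noon?", exemplars))
```

The point of the sketch is the shape of the process: categories are defined by a handful of example documents rather than by hand-built rules, and anything that resembles no example is left uncategorized for review.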
With concept-based auto categorization, then, it’s no longer necessary to manually create – and constantly maintain – word-based taxonomies and complex rules in order to precisely and accurately classify large volumes of unstructured big data and improve the “findability” of information.
Benefits of Accurate Categorization
In enterprise content management, increased categorization accuracy enables better content lifecycle management, improved sharing among internal and external audiences, more effective document and records management for disposal, retention, and compliance, and reduced exposure to the cost of future legal matters.
Concept-based auto categorization is now in its early stages of adoption to help tame big data, reducing its burden while simultaneously capitalizing on its hidden value. It does this by helping organizations:
1. Dispose of redundant, outdated, and trivial (ROT) documents and e-mails. Sample documents of such things as spam, old e-mail newsletters, and outdated marketing documents can be used as examples to find similar documents that can be considered for disposal, dramatically reducing the clutter without having to manually inspect each document and e-mail.

2. Maintain archiving regulatory compliance. Once the junk has been pared down, concept-based categorization can be used to enable greater precision in determining exactly which documents and messages are required to be archived – and for how long – according to your company’s retention policy and regulatory requirements.

3. Improve cross-functional, divisional, and external content sharing and collaboration. Concept-based auto categorization makes documents much easier to find, dramatically improving collaboration, sharing, and syndication of your valuable content. For internal research assets and intellectual property that can be leveraged elsewhere in the enterprise, or content generated for external consumption, auto categorization dramatically improves the ability of users to consume and properly apply these information assets.

4. Improve content lifecycle management amidst evolving terms and categories. With new terms and categories constantly being introduced, concept-based auto categorization keeps document libraries current and can even apply the right categorization decisions to documents that contain the newer terms, without having to define or update dictionaries, thesauri, keywords, or meta tags.

5. Integrate – and dis-integrate – content through mergers, acquisitions, and divestitures. Concept-based auto categorization groups similar content together, applying uniform categories to content across all divisional boundaries and the disparate taxonomies inherent in mergers, and carving out conceptually related documents for a divestiture.

6. Improve security, privacy, and risk mitigation. Content that either no longer has value for the organization or is not marked for retention through compliance could be an unnecessary liability and increase the cost of culling through it in any future litigation. Sensitive customer data, such as medical records, Social Security numbers, and credit card numbers, or – worse yet – illicit materials are a virtual time bomb. Concept-based auto categorization can reduce risks by enabling you to identify these materials, dispose of them in a highly defensible way, and demonstrate that your company’s information governance policies are enforceable and consistent.

7. Auto categorize in any language. Breaking down language barriers with language-agnostic document classification means that all of the benefits of taming big data, as well as the mitigation of big data’s negative impact, can be realized in global organizations without requiring native language speakers for every language in which the enterprise generates content.

Despite the hype around big data, few will disagree that it poses challenges, as well as benefits if managed properly, and fewer still expect it to go away anytime soon. Throwing more bodies at the problem is simply not practical, as the volume, velocity, and variety of content comprising big data are accelerating.

Concept-based auto categorization has proven itself to be a highly effective, extremely fast, and incredibly precise approach; the possibilities are endless for applying it to big data to address its major obstacles and to harvest its broad benefits.

©2013 ARMA International, www.arma.org
About Content Analyst Company LLC
Content Analyst Company LLC’s software provides advanced, conceptual-based search, classification, and document analysis. For more information on the capabilities and value of advanced analytics, visit www.ContentAnalyst.com.
Concept-Based Auto Categorization:
Seven Ways it Tames Big Data
