Seven Ways Concept-Based Auto Categorization Tames Big Data
Concept-based auto categorization, which automatically categorizes documents based on their actual content, not keywords or terms, is a fast, easy, and repeatable way to pinpoint only the most important documents and e-mails among libraries spanning millions of files and messages. It is an established standard in legal e-discovery and U.S. intelligence, having proved itself defensible and highly scalable.
By using sample documents containing the concepts being sought, concept-based auto categorization “looks” across an organization’s entire electronic content and finds others like them. Because it doesn’t depend on finding keywords or terms, it is faster, easier, and far more accurate than lexicon-based taxonomy alternatives.
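The "find others like these samples" workflow described above can be sketched in a few lines of Python. This is only an illustration of categorizing by example, not the method of any particular product: the function names, the `EXEMPLARS` sample texts, and the 0.2 threshold are all assumptions made for the sketch. It also compares raw word counts, which is a deliberate simplification; a real concept-based engine would compare documents in a learned concept space rather than by shared words.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Count lower-cased word tokens (a crude stand-in for a concept vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(samples):
    """Sum the vectors of all sample documents for one category."""
    total = Counter()
    for text in samples:
        total.update(vectorize(text))
    return total


def categorize(document, exemplars, threshold=0.2):
    """Return (category, score); category is None if nothing is similar enough."""
    doc_vec = vectorize(document)
    scores = {name: cosine(doc_vec, centroid(samples))
              for name, samples in exemplars.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Hypothetical sample documents, one small set per category.
EXEMPLARS = {
    "newsletter": [
        "Monthly newsletter: product news, upcoming events, and how to unsubscribe.",
        "Our weekly newsletter covers product updates and events. Unsubscribe any time.",
    ],
    "contract": [
        "This agreement is entered into by the parties and governs the terms of the license.",
        "The parties agree to the terms and conditions of this services agreement.",
    ],
}

if __name__ == "__main__":
    print(categorize("The license agreement between the parties sets out payment terms.", EXEMPLARS))
    print(categorize("Quarterly budget figures attached.", EXEMPLARS))
```

The point of the sketch is the shape of the workflow: no taxonomy or rule set is written by hand, and adding a category means supplying a few more sample documents. Documents that resemble none of the samples fall below the threshold and stay uncategorized for review.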
With concept-based auto categorization, then, it’s no longer necessary to manually create – and constantly maintain – word-based taxonomies and complex rules in order to precisely and accurately classify large volumes of unstructured big data and improve the “findability” of information.
Benefits of Accurate Categorization
In enterprise content management, increased categorization accuracy enables better content lifecycle management, improved sharing among internal and external audiences, more effective document and records management for disposal, retention, and compliance, and reduced exposure to the cost of future legal matters.
Concept-based auto categorization is now in its early stages of adoption to help tame big data, reducing the burden while simultaneously capitalizing on its hidden value. It does this by helping organizations:
1. Dispose of redundant, outdated, and trivial (ROT) documents and e-mails. Sample documents of such things as spam, old e-mail newsletters, and outdated marketing documents can be used as examples to find similar documents that can be considered for disposal, dramatically reducing the clutter without having to manually inspect each document and e-mail.
2. Maintain archiving regulatory compliance. Once the junk has been pared down, concept-based categorization can be used to enable greater precision in determining exactly which documents and messages are required to be archived – and for how long – according to your company’s retention policy and regulatory requirements.
3. Improve cross-functional, divisional, and external content sharing and collaboration. Concept-based auto categorization makes documents much easier to find, dramatically improving collaboration, sharing, and syndication of your valuable content. With internal research assets and intellectual property that can be leveraged elsewhere in the enterprise, or content generated for external consumption, auto categorization dramatically improves the ability of users to consume and properly apply these information assets.
4. Improve content lifecycle management amidst evolving terms and categories. With new terms and categories constantly being introduced, concept-based auto categorization keeps document libraries current and can even apply the right categorization decisions to documents that contain the newer terms, without having to define or update dictionaries, thesauri, keywords, or meta tags.
5. Integrate – and dis-integrate – content through mergers, acquisitions, and divestitures. Concept-based auto categorization groups similar content together, applying uniform categories to content across all divisional boundaries and disparate taxonomies inherent with mergers and carving out conceptually related documents for a divestiture.
6. Improve security, privacy, and risk mitigation. Content that either no longer has value for the organization or is not marked for retention through compliance could be an unnecessary liability and increase the cost burden to cull through in any future litigation. Sensitive customer data, such as medical records, Social Security numbers, credit card numbers, or – worse yet – illicit materials, are a virtual time bomb. Concept-based auto categorization can reduce risks by enabling you to identify these materials, dispose of them in a highly defensible way, and demonstrate that your company’s information governance policies are enforceable and consistent.
7. Auto categorize in any language. Breaking down language barriers with language-agnostic document classification means that all of the benefits of taming big data, as well as the mitigation of big data’s negative impact, can be applied in global organizations without requiring native language speakers for every language in which the enterprise generates content.
Despite the hype around big data, few will disagree that it poses challenges, as well as benefits if managed properly, and fewer still will argue that it’s going away anytime soon. Throwing more bodies at the problem is simply not practical, as the volume, velocity, and variety of content comprising big data are accelerating.
Concept-based auto categorization has proven itself as a highly effective, extremely fast, and incredibly precise approach; the possibilities are endless for applying it to big data to address its major obstacles and to harvest its broad benefits.
©2013 ARMA International, www.arma.org
About Content Analyst Company LLC
Content Analyst Company LLC’s software provides advanced, conceptual-based search, classification, and document analysis. For more information on the capabilities and value of advanced analytics, visit www.ContentAnalyst.com.