Concept-based auto categorization,
which automatically categorizes
documents based on their actual
content, not keywords or terms, is
a fast, easy, and repeatable way to
pinpoint only the most important documents
and e-mails among libraries spanning
millions of files and messages. It is an
established standard in legal e-discovery and
U.S. intelligence, having proved itself
defensible and highly scalable.
By using sample documents containing
the concepts being sought, concept-based
auto categorization “looks” across an
organization's entire electronic content and
finds others like them. Because it doesn’t
depend on finding keywords or terms, it is
faster, easier, and far more accurate than
lexicon-based taxonomy alternatives.
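
The example-driven workflow described above can be sketched in a few lines. This is only an illustration under loud assumptions: the article does not say how documents are compared, so the sketch substitutes plain term-frequency cosine similarity, a word-level measure that is far cruder than the concept model the article has in mind. The function names, the `EXAMPLES` data, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(document, examples, threshold=0.2):
    """Return the category whose sample documents are most similar.

    `examples` maps a category name to a list of sample texts.
    Returns None when no category scores at or above `threshold`
    (an arbitrary cutoff chosen for this illustration).
    """
    doc_vec = term_vector(document)
    best_category, best_score = None, threshold
    for category, samples in examples.items():
        score = max((cosine(doc_vec, term_vector(s)) for s in samples),
                    default=0.0)
        if score >= best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical sample documents, one per category.
EXAMPLES = {
    "newsletter": [
        "Our monthly newsletter brings you product news and upcoming events",
    ],
    "contract": [
        "This agreement is entered into by the parties and governed by the following terms",
    ],
}

print(categorize("Read the monthly newsletter for product news and events", EXAMPLES))
```

A production system would replace `term_vector` and `cosine` with a representation that captures meaning rather than shared words; the surrounding loop (score each document against the samples, assign the best match above a cutoff) is the part this sketch is meant to show.
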
With concept-based auto categorization,
then, it’s no longer necessary to manually
create – and constantly maintain – word-
based taxonomies and complex rules in
order to precisely and accurately classify
large volumes of unstructured big data and
improve the "findability" of information.
Benefits of Accurate Categorization
In enterprise content management,
increased categorization accuracy enables
better content lifecycle management,
improved sharing among internal and
external audiences, more effective document
and records management for disposal,
retention, and compliance, and reduced
exposure to the cost of future legal matters.
Concept-based auto categorization is
now in its early stages of adoption to help
tame big data, reducing the burden while
simultaneously capitalizing on its hidden
value. It does this by helping organizations:
1. Dispose of redundant, outdated, and
trivial (ROT) documents and e-mails.
Sample documents of such things as
spam, old e-mail newsletters, and outdated
marketing documents can be used as
examples to find similar documents that
can be considered for disposal, dramatically
reducing the clutter without having
to manually inspect each document and
e-mail.
2. Maintain archiving regulatory compliance.
Once the junk has been pared down,
concept-based categorization can be used
to enable greater precision in determining
exactly which documents and messages
are required to be archived – and for how
long – according to your company’s
retention policy and regulatory requirements.
3. Improve cross-functional, divisional,
and external content sharing and
collaboration. Concept-based auto
categorization makes documents much easier
to find, dramatically improving collaboration,
sharing, and syndication of your
valuable content. With internal research
assets and intellectual property that can be
leveraged elsewhere in the enterprise, or
content generated for external consumption,
auto categorization dramatically
improves the ability of users to consume
and properly apply these information
assets.
4. Improve content lifecycle management
amidst evolving terms and categories.
With new terms and categories constantly
being introduced, concept-based auto
categorization keeps document libraries
current and can even apply the right
categorization decisions to documents that
contain the newer terms, without having to
define or update dictionaries, thesauri,
keywords, or meta tags.
5. Integrate – and dis-integrate – content
through mergers, acquisitions, and
divestitures. Concept-based auto
categorization groups similar content together,
applying uniform categories to content
across all divisional boundaries and
disparate taxonomies inherent in mergers,
and carving out conceptually related
documents for a divestiture.
6. Improve security, privacy, and risk
mitigation. Content that either no longer has
value for the organization or is not marked
for retention through compliance could be
an unnecessary liability and increase the
cost of culling through it in any future
litigation. Sensitive customer data, such as
medical records, Social Security numbers,
credit card numbers, or – worse yet –
illicit materials, are a virtual time bomb.
Concept-based auto categorization can
reduce risks by enabling you to identify
these materials, dispose of them in a
highly defensible way, and demonstrate that
your company’s information governance
policies are enforceable and consistent.
7. Auto categorize in any language. Breaking
down language barriers with
language-agnostic document classification means
that all of the benefits of taming big data,
as well as the mitigation of big data’s
negative impact, can be applied in global
organizations without requiring native
language speakers for every language in
which the enterprise generates content.
Despite the hype around big data, few
will disagree that it poses challenges, as well
as benefits if managed properly, and fewer
still will argue that it’s going away anytime
soon. Throwing more bodies at the problem
is simply not practical, as the volume,
velocity, and variety of content comprising big
data are accelerating.
Concept-based auto categorization has
proven itself as a highly effective, extremely
fast, and incredibly precise approach; the
possibilities are endless for applying it to big
data to address its major obstacles and to
harvest its broad benefits.
©2013 ARMA International, www.arma.org
About Content Analyst Company LLC
Content Analyst Company LLC’s
software provides advanced, conceptual-based
search, classification, and document
analysis. For more information on
the capabilities and value of advanced
analytics, visit www.ContentAnalyst.com.
Concept-Based Auto Categorization:
Seven Ways it Tames Big Data
