M12S20 - Analytics: The New Way to Manage e-Records


Speakers: Doug Magnuson, Thomas E. Reding, Brian Tuemmler, & Marcia Zweerink, Ph.D.

Data is growing at an astounding pace. The volume has long since outstripped the capacity of manual intervention in e-records management.

As organizations struggle to improve their RM compliance and better their information governance, they are recognizing the critical need for a much better (and probably radically different) way to manage their e-records and legacy data.

An increasingly promising answer is content analytics, which, through more advanced tools, reduces business risks and provides new competitive advantages. Many now believe that the time has come for content analytics to play an important role in the management of e-records and legacy data.

Read more: http://www.rimeducation.com/videos/rimondemand.php

Published in: Education

  1. Cohasset Associates, Inc.
     2012 Managing Electronic Records Conference
     Session 20: Analytics: The New Way to Manage e-Records
     Tuesday, May 08, 2012, 3:15 – 4:30 pm

     Slide 2: What is Analytics?
     Analytics is the application of computer technology, operational research, and statistics to solve problems in business and industry. (Wikipedia)

     Slide 3: Why Analytics?
       • Identification of Official Records
       • Cleaning up ‘stuff’
       • Finding relevant files for e-discovery
       • Business insight
  2. Slide 4: Why Analytics?
       • Ginormous volume: 10 TB at 1 minute / doc = 311 years
       • Those unreliable users
       • Evolving locations

     Slide 5: "In theory there is no difference between theory and practice. In practice there is." (Yogi Berra)

     Slide 6: "If you come to a fork in the road, take it." (Yogi Berra)
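The "311 years" figure on slide 4 can be unpacked with a little arithmetic. The sketch below is only an illustration: the slide does not state its assumptions, so it assumes one reviewer working around the clock with no breaks and takes 10 TB as 10^13 bytes. Under those assumptions the slide's numbers imply roughly 163 million documents averaging about 61 KB each.

```python
# Assumption: review runs 24 hours a day, 365 days a year, with a single reviewer.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def review_years(doc_count, minutes_per_doc=1.0):
    """Years needed to look at every document once."""
    return doc_count * minutes_per_doc / MINUTES_PER_YEAR


def implied_doc_count(years, minutes_per_doc=1.0):
    """Number of documents that would take `years` to review."""
    return years * MINUTES_PER_YEAR / minutes_per_doc


docs = implied_doc_count(311)            # documents implied by "311 years"
avg_bytes = 10 * 10**12 / docs           # assumption: 10 TB = 10^13 bytes
print(f"{docs:,.0f} documents, about {avg_bytes / 1000:.0f} KB each")
```

Changing `minutes_per_doc` or dividing the work across several reviewers scales the result linearly, which is the slide's point: no realistic amount of manual effort closes the gap.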
  3. Enterprise Content Management
     Doug Magnuson, IBM
     © 2012 IBM Corporation

     Slide 8: Traditional approaches are converging
     Diagram labels: Enterprise Search, Business Intelligence, Text Analytics, Content Analytics
       • More than keyword search is needed: “Making unstructured data searchable is now a presumed primary interface for applications of all kinds, as well as for intranets and content repositories.” – Whit Andrews, Rita Knox, Gartner
       • Analyzing unstructured content no longer optional: “For many business process professionals, access to structured data, even when supported by BI or predictive analytics, lacks sufficient context for customer service, finance, and other areas where communications with customers involves many channels” – Craig Le Clair, Forrester
       • Increasing in business importance: “Early adopters of [text analytics] are already gaining a competitive advantage. Organizations that fail to do so will be at risk.” – Sue Feldman, IDC
       • Converging toward content analytics: “Every enterprise should understand how content analytics can produce answers to its critical questions; understanding this now will make it possible to exploit these tools as their availability proliferates.” – Rita Knox, Gartner

     Slide 9: Content Analytics Explained
       • Source Information: Internal (ECM, Files, DBMS, etc.) and External (Social, News, etc.)
       • Example sentence: John sprained his ankle on the step ...
       • Parts of speech: Noun (John), Verb (sprained), Noun Phrase (his ankle), Prep Phrase (on the step)
       • Extracted Concept: Person, Injury, Body Part, Location
       • Analyzed Content (and Data): Claimant: Soft Tissue Injury
     What is Natural Language Processing? NLP describes a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information extracted for business integration.
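To make the concept-extraction step on slide 9 concrete, here is a deliberately tiny sketch that tags the example sentence with the four concepts named on the slide. It is not how IBM's product works: the word lists (`LEXICON`) are hypothetical and stand in for the linguistic, statistical, and machine learning techniques the slide refers to.

```python
import re

# Hypothetical mini-lexicons, invented for this example only.
# A real NLP pipeline would use trained models rather than fixed word lists.
LEXICON = {
    "Person": {"john"},
    "Injury": {"sprained", "fractured", "bruised"},
    "Body Part": {"ankle", "wrist", "knee"},
    "Location": {"step", "stairs", "floor"},
}


def extract_concepts(sentence):
    """Return {concept: [matching words]} for words found in LEXICON."""
    found = {}
    for token in re.findall(r"[A-Za-z']+", sentence):
        for concept, words in LEXICON.items():
            if token.lower() in words:
                found.setdefault(concept, []).append(token)
    return found


concepts = extract_concepts("John sprained his ankle on the step")
print(concepts)
```

A dictionary lookup like this breaks down quickly on real text, which is the motivation for the "real language is real hard" discussion on the next page.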
  4. Slide 10: Real language is real hard
     Chess
       • A finite, mathematically well-defined search space
       • Limited number of moves and states
       • Grounded in explicit, unambiguous mathematical rules
     Human Language
       • Ambiguous, contextual and implicit
       • Contains slang, riddles, idioms, abbreviations, acronyms and more
       • Grounded only in human cognition
       • Seemingly infinite number of ways to express the same concepts and meaning

     Slide 11: The key is: understanding natural language with confidence and accuracy
     (column labels: Structured, Unstructured)
       • Where was Einstein born? "One day, from among his city views of Ulm, Otto chose a watercolor to send to Albert Einstein as a remembrance of Einstein’s birthplace."
       • Welch ran this? "If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE."

     Slide 12: Things we learned from The Jeopardy! Challenge
     5 key dimensions to drive the technology:
       • Broad/open domain
       • Complex language
       • High precision
       • Accurate confidence
       • High speed
     Sample clues:
       • $200: If you're standing, it's the direction you should look to check out the wainscoting
       • $800: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus
       • $1000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north
  5. Slide 13: Decision Plans Layer Multiple Methods for Records Classification
     [Chart: vertical axis Consistency (Consistent Participation & Enforcement, Accuracy), Low to High; horizontal axis Cost Savings and Productivity, Low to High. Methods plotted: Manual Classification, Rules Based Classification, Context Based Classification, Multiple Methods. Chart labels: Ask, Inspect, Imply.]
     Decision Plans combine approaches to classification. Context-based classification delivers high accuracy; rules-based classification addresses hard-and-fast requirements. Combining methods delivers the best results.

     Slide 14: Rule Systems - the Effect of Real-Time Learning
     [Same chart: Manual Classification, Rules Based Classification, Context Based Classification, Multiple Methods.]
     Use rule systems to act on existing meta data available in the process, content system or document properties.

     Slide 15: Context Based Classification
     [Same chart: Manual Classification, Rules Based Classification, Context Based Classification, Multiple Methods.]
     Use context based classification to inspect the document when there is not enough meta data already available. Simple rules or keyword based analysis can be too coarse to make fine distinctions between long-form texts with very different intent.
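The layering described on slides 13 to 15 can be sketched as a simple fall-through: try rules on existing metadata first, inspect the content when metadata is not enough, and route to a person when neither method is confident. Everything named below (`RULES`, `PROFILES`, the `min_hits` threshold, the class labels) is a hypothetical stand-in; in particular, the keyword-overlap scorer only takes the place of a real context-based classifier, which the slides note must be finer than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical rule table: an existing metadata value maps straight to a class.
RULES = {"invoice": "Finance", "contract": "Legal"}

# Hypothetical vocabularies standing in for a trained context-based classifier.
PROFILES = {
    "Legal": {"agreement", "party", "liability"},
    "HR": {"employee", "salary", "benefits"},
}


def classify_by_rules(doc):
    """Rules-based step: act on metadata that is already available."""
    return RULES.get(doc.metadata.get("doc_type"))


def classify_by_context(doc, min_hits=2):
    """Context-based step: inspect the text; abstain if the evidence is thin."""
    words = set(doc.text.lower().split())
    best, hits = None, 0
    for label, vocab in PROFILES.items():
        overlap = len(words & vocab)
        if overlap > hits:
            best, hits = label, overlap
    return best if hits >= min_hits else None


def decision_plan(doc):
    """Return (label, method); label is None when a person must decide."""
    label = classify_by_rules(doc)
    if label is not None:
        return label, "rules"
    label = classify_by_context(doc)
    if label is not None:
        return label, "context"
    return None, "manual"
```

For example, `decision_plan(Document("q3 totals", {"doc_type": "invoice"}))` is settled by the rule table, while a document with no usable metadata falls through to the content check and, failing that, to the manual queue.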
  6. Slide 16: Critical dimensions of classification: Magnified by exploding volumes
     [Same chart: Manual Classification, Rules Based Classification, Context Based Classification, Multiple Methods.]
     Use manual classification for high value documents or when other methods do not provide enough information.
     Manual vs. Automated:
       • Accuracy (values as shown on the slide): 92%, 60 – 90%, 46%
       • Cost (per doc): Manual $0.17; Automated < $0.01
       • Consistency: Manual <50%; Automated 100%
     Increasing volume and variety of information magnifies the challenges of consistency and cost burdens.

     Slide 17: Quickly Understand Timeline & Essence of Custodian & Business Information
       • Quickly get a view of the people, sender and recipient domains, and companies involved.
       • Combine facets and filters to quickly include and eliminate custodians and data, such as people from certain locations or other combination.
       • Automatically extracted phrases in the content show the essence of the information.
       • Organize a topographical view by key category. The “peaks” show frequency and phrases to quickly identify relevant information.

     Tom Reding, CRM
     Principal, Information Governance Practice
     tom.reding@emc.com
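The per-document costs on slide 16 are easier to appreciate at volume. The sketch below applies them to a hypothetical collection of one million documents (the volume is an assumption, not a figure from the slides). Because the slide gives the automated cost only as "< $0.01", the automated figure is an upper bound.

```python
MANUAL_CENTS_PER_DOC = 17        # $0.17 per document, from slide 16
AUTOMATED_CENTS_CEILING = 1      # slide says "< $0.01", so treat as an upper bound


def classification_cost_dollars(doc_count, cents_per_doc):
    """Total cost in dollars; cents are used to avoid rounding surprises."""
    return doc_count * cents_per_doc / 100


docs = 1_000_000  # hypothetical volume chosen for illustration
manual = classification_cost_dollars(docs, MANUAL_CENTS_PER_DOC)
automated_ceiling = classification_cost_dollars(docs, AUTOMATED_CENTS_CEILING)
print(f"manual: ${manual:,.0f}; automated: under ${automated_ceiling:,.0f}")
```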
  7. Slide 19: CIS - Automated Analytics
     Content Analytics: CONTENT ADDED, CONTENT ANALYZED, TEXT EXTRACTED, ENTITIES STORED, RELATIONSHIPS EASILY FOUND
     © Copyright 2011 EMC Corporation. All rights reserved.

     Slide 20: Discover and Act on Legacy Information
     File Intelligence – Understanding what you have
       • Sources: File System, Email Server, SharePoint, Personal Email Archives, Notebook and Desktop
       • File Intelligence: Content + Repositories, Retention Policy
       • Intelligently Identify Records
       • Migrate & Secure Records (Secure Repository)

     Slide 21: How File Intelligence Works
     Catalog
       • Crawl data sources
       • Build index
           – Metadata basic
           – Metadata with document type
           – Metadata with hash
           – Deep crawl full text
           – Deep crawl with classification
     Analyze (Classify, Search, Report)
       • Classify files based on metadata, keyword, content, and pattern matching
           – Age, owner, location, file type, etc.
           – Business value, security risk, intellectual property, PII, PCI
       • Analyze data with search and report tools
           – Semantic search with Boolean, proximity, stemming, phrase support
           – More than 30 pre-built reports out of the box
           – Custom reports as needed
     Act
       • Robust action set
           – Move, copy, delete, retain, export, tag
       • Policy-based actions
           – One-time
           – Scheduled
           – Recurring
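The Catalog step on slide 21 (crawl, then index metadata with a hash) can be approximated with the Python standard library. This is a rough sketch of the idea, not the vendor's crawler: the record fields are chosen for illustration, SHA-256 is an assumed hash choice, and each file is read fully into memory, which would need chunking for large files.

```python
import hashlib
import os
import time


def catalog(root):
    """Walk `root` and return one metadata record per file found."""
    records = []
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Assumption: files are small enough to hash in one read.
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            records.append({
                "path": path,
                "file_type": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "age_days": (now - info.st_mtime) / 86400,
                "sha256": digest,
            })
    return records
```

Calling `catalog("/some/share")` on a directory of your choosing yields records that later steps can classify, search, or act on; identical hashes across records point at duplicate content.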
  8. Slide 22: Solution Overview
     Secure, Retain, Discover
     Diagram components: 1 File Intelligence, 2 Content Capture / Archiving, 3 Enterprise Retention, 4 Electronic Discovery
     Diagram labels: File Intelligence, RPS
       • Crawl, index, analyze, search, report information repositories in-place
       • Take action upon the discovered information assets
       • Examples: Decommission non-required information in-place, capture & classify records

     Slide 23: Experience with File Intelligence
       • ~24% of unstructured data is actively used, known, relevant
       • ~48% is stale: not touched in 6 months
       • ~18% are duplicates
       • ~6% is unknown or orphaned
       • ~4% is not business related (pictures)
     [Chart: Active 24%, Stale 48%, Duplicates 18%, Unknown 6%, Non-business related 4%]
     Cost to the Customer:
           – It consumes expensive storage capacity
           – It gets managed, backed up, replicated, ...
           – It poses serious legal & compliance risks
           – It gets recovered equally in a DR scenario
     * Results from 37 Kazeon customer assessments. Stale is defined as files not accessed or modified for 6 months.

     Slide 24: Analytics for eDiscovery
     Full Case Management Workflow
           – New case creation
           – Assignment of lead attorney & reviewers
           – Case specific collection & culling
           – Legal case processing
           – Document review & analysis
     Case Tracking
           – Collection Status
           – Document Review Status
           – Reviewer Workload
           – Reviewer Progress Tracking
     Preservation Notification
           – E-mail notification to custodians
           – Customize e-mail messages per matter
           – Full tracking during hold notice lifecycle
           – Automatic reminders
           – Custodian or Proxy acknowledgment
     Capabilities:
       • Early case assessment throughout the eDiscovery process
       • Identify relevant, important data
       • Analytics on relevant legal case data
       • E-mail communication threads
       • Prepare for FRCP meetings
     Screenshot labels: Legal Hold Notices, Keyword Hit Report, E-mail Threading, Custodian & Concept Analysis
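Slide 23's definition of stale ("not accessed or modified for 6 months") and its duplicate category translate directly into two small checks. The sketch below is illustrative only: it approximates six months as 183 days, and it assumes each file record carries `path` and `sha256` keys, which are names invented for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: "6 months" is approximated as 183 days.
STALE_AFTER = timedelta(days=183)


def is_stale(last_accessed, last_modified, now):
    """Stale means neither accessed nor modified within the window."""
    return now - max(last_accessed, last_modified) > STALE_AFTER


def duplicate_groups(records):
    """Group file paths that share a content hash; keep groups of 2 or more."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["sha256"]].append(record["path"])
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Taking the later of the two timestamps matters: a file that was modified long ago but opened last week still counts as active under the slide's definition.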
  9. Slide 25: Network Drive Transformation Analytics
     Brian Tuemmler
     Information Management for Everyone

     Slide 26: [Diagram of information locations]
       • Enterprise: Knowledge Mgt, Case Data, SAP, GIS, Tech Supp, Wikis & Blogs, Dev, SharePoint, ECRM, Other
       • Shared Drives: M:
       • Hard Copy: Central, Off site, Desks

     Slide 27: Analytics Perspective
       • Taxonomy Development
       • Records Retention
       • ICM Tools
       • ECM Repository Architecture
       • Image Conversion Services
       • ERM & Compliance Policies
       • Cleanup and Preservation
       • Transform
  10. Slide 28: Program Approach
       • Strategy
       • Record Definition
       • Governance and Policy
       • IT Infrastructure
       • Query Definition
       • Group cleanup (repeat by share)
       • Individual cleanup (repeat by workgroup)
       • Auto cleanup
       • Content Migration (Mapping)
       • Improvement

     Slide 29: Cleanup Categories
       • Capture: Recent, Important
       • Non-capturable: Voluminous, Database, Application
       • For review
       • Delete opportunities: Large duplicates, Photos & media
       • Garbage: Manually identified garbage ("To be deleted"), Opt in Delete
       • Scheduled Delete: Expire (past retention date)
       • Auto-Delete: Temporary, Backup, Zero content
       • Policy deletes
       • User Intent

     Slide 30: Example
     [Chart: Category Volume by Year, 1997 to 2009; vertical axis Storage, 0 to 80000; categories: Document, Image, Content, Photo, Database, Archive, Media, Application]
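A few of the cleanup categories on slide 29 lend themselves to automatic assignment from file metadata alone. The sketch below covers only three of them (zero content, temporary, and expired retention) and sends everything else to review. The extension list, the argument names, and the order in which the checks run are all assumptions made for illustration, not details from the presentation.

```python
from datetime import date

# Hypothetical list of extensions treated as temporary files.
TEMP_EXTENSIONS = {".tmp", ".temp"}


def cleanup_category(size_bytes, extension, retention_end, today):
    """Assign one cleanup category; `retention_end` may be None if unknown."""
    if size_bytes == 0:
        return "Zero content"
    if extension.lower() in TEMP_EXTENSIONS:
        return "Temporary"
    if retention_end is not None and retention_end < today:
        return "Expire: past retention date"
    return "For review"
```

Files that no rule can place land in "For review", which keeps the automatic deletes limited to cases where the metadata is unambiguous.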
  11. Slide 31: Discussion

     Slide 32: Contact Information
       • Marcy Zweerink: Marcy.Zweerink@Cohasset.com
       • Doug Magnuson: dmagnuson@us.ibm.com
       • Tom Reding: Tom.Reding@emc.com
       • Brian Tuemmler: Brian.Tuemmler@gimmal.com