
ARMA Winnipeg | AI Auto-classification


Presentation by Amitabh Srivastav at the ARMA Winnipeg event in January 2020.

Published in: Technology

ARMA Winnipeg | AI Auto-classification

  1. 1. Artificial Intelligence and Auto-Classification: Are They a Silver Bullet for Records Management and Compliance? Amitabh Srivastav, VP, Operations & Governance. ARMA Winnipeg, January 29, 2020
  2. 2. Agenda – Part 1: 1. Introduction; 2. Terms and Definitions; 3. Digital Transformation Journey; 4. Unstructured Information
  3. 3. Agenda – Part 2: 5. Information Chaos; 6. What is Artificial Intelligence?; 7. THEMIS CS for Auto-classification; 8. Key Takeaways
  4. 4. “Difficulties are just things to overcome, after all” – Sir Ernest Henry Shackleton
  5. 5. Part 1, Section 1: Introduction
  6. 6. Profile: Amitabh Srivastav, IGP, CIP, PMP. Has worked with ECM technologies since 2001 and implemented enterprise-wide programs and projects worth several million dollars for clients in the public and private sectors. Extensive IG / IM experience with a strong portfolio of qualifications in strategy, transformation, and risk management. Combines strategic IG / IM thinking, risk management techniques, and practical implementation experience. Provides CxO / VP-level consulting advice on current technology solutions, industry trends, and best practices, with a focus on digital transformation, change management, records threat management, and compliance.
  7. 7. HELUX Highlights: • A Microsoft Preferred Partner in Content Services specializing in SharePoint, O365, and Cloud technologies • Using AI and machine learning, our THEMIS products re-imagine the way we do information management and digital transformation
  8. 8. Sample HELUX Clients
  9. 9. Part 1, Section 2: Terms and Definitions
  10. 10. Key concepts: Information Governance; Compliance; Security; Risk Management; Data Management; Digital Rights Management
  11. 11. Key concepts (cont.): Information Governance; Digital Asset Management; Content Services; Information Management; Data Analytics; ?? …
  12. 12. Key concepts (cont.): Information Management; Taxonomy; Metadata; Information Architecture; Knowledge Management; ECM / EDRMS; User Experience; User Interface; File Plan; Retention / Disposition; Security Model; Archiving
  13. 13. Content Services (CS) is ECM+: Document Management; Records Management; XaaS; ECM+; Content Services; CaaS, MCaaS, DaaS, BaaS?
  14. 14. Content Services and Microsoft’s Modern Approach to ECM+ …: Content Services; Records Management; Document Management; Information Architecture; Artificial Intelligence; Auto-Classification; User Experience; User Interface; File Plan; Retention / Disposition; Security Model; Archiving
  15. 15. … Content Services include …: Content Services; Knowledge Management; Search; e-Discovery; Digital Asset Management; Digital Rights Management
  16. 16. Part 1, Section 3: Digital Transformation Journey
  17. 17. The current state: 85% will never be retrieved; 50% are duplicates. “… digital technologies, tools, and social media platforms now allow individuals to create information at a torrid pace and instantaneously share it globally …” (IGBoK, 1st Ed., p. 111) “Information chaos and confusion are preventing organizations from achieving their digital transformation objectives. Many organizations believe they must modernize their information management strategy in order to meet this challenge and survive.” (John Brown, CEO, HELUX)
  18. 18. Digital Transformation (DT) “pain points”: Cyber Attacks; Data Breaches; BYOD; Remote Workforce; Change Management; Cloud Services; Content Repositories; Information Chaos; Content Monetization
  19. 19. Digital Transformation enablers: 1. Cloud Enablement: “… deploying capabilities or services … that exists outside the firewall.” 2. Intelligent Capture: “… workflows to convert physical information into digital formats using multiple channels …” 3. Repository Neutral Content: “… repositories that are independent of … different systems and underlying technology platforms …” 4. Integrated Collaboration: “… technology platform … that allows teams to save, search, and share information assets …” 5. Information Governance: “… specification of decision rights and an accountability framework to encourage desirable behavior …” (Gartner) 6. Content Services: “… delivers content and / or services on demand, regardless of its source, to any device, and anywhere …” 7. Auto-Classification: “… using rules … to automate how content is captured, analyzed, and governed over its lifecycle.” (AIIM) 8. Customer Experience: “… the end-users' “felt experiences” … with an organization’s on-line services and digital products …” Source: Intelligent Information Management Maturity (I2M2) Model
  20. 20. What is Information Architecture (IA)? ARMA: “The structure and interrelationship of information, especially with an eye towards using business rules, observed use behaviors, and effective interface design to facilitate access to information.” (Glossary of Records Management and Information Governance Terms, 5th Edition, ARMA International TR 22-2016) Treasury Board Secretariat: “Information Architecture is the structure of the information components of an enterprise, their interrelationships, and the principles and guidelines governing their design and evolution over time. Information architecture enables the sharing, reuse, horizontal aggregation, and analysis of information.” (The TBS Information Management Policy, Govt. of Canada)
  21. 21. Information Architecture applies structure to content sources: Social Media; Blogs; Videos / Pictures; Audio; Emails and Documents; Direct Messages
  22. 22. Information Architecture auto-classifies content: Information Architecture and Content Services
  23. 23. Part 1, Section 4: Unstructured Information
  24. 24. Unstructured content growth (figure; values shown: 30M, 15GB, 12.5EB, 25TB, 30,000,000,000,000)
  25. 25. The Cost of Search: • 49% said they have trouble locating documents • 43% have trouble with document approval requests and document sharing • 33% struggle with document versioning. The average knowledge worker spends 2.5 hours per day, or 15% to 30% of the workday, searching for information (IDC). The inability to find and retrieve documents costs an organization that employs 1,000 workers $25 million per year.
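The figures on slide 25 can be cross-checked with a little arithmetic. The sketch below assumes an 8-hour workday, which the slide does not state. Under that assumption, 2.5 hours is roughly 31% of the day, just above the quoted 15% to 30% range, and the $25 million figure works out to $25,000 per worker per year.

```python
# Figures quoted on the slide
hours_searching_per_day = 2.5
annual_cost_total = 25_000_000  # dollars per year for the whole organization
workers = 1_000

# Assumption (not from the slide): an 8-hour workday
assumed_workday_hours = 8

share_of_workday = hours_searching_per_day / assumed_workday_hours
cost_per_worker = annual_cost_total / workers

print(f"Share of workday spent searching: {share_of_workday:.1%}")
print(f"Annual cost per worker: ${cost_per_worker:,.0f}")
```

Changing `assumed_workday_hours` shows how sensitive the percentage is to the length of the workday.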
  26. 26. The High Cost of Documents: For every $1 spent on document creation, $10 is spent on document management. 30 billion documents are created every year (McKinsey Global Institute). 85% will never be retrieved; 50% are duplicates; 60% are obsolete.
  27. 27. Unstructured information comes from … Source: https://www.statista.com/chart/17518/internet-use-one-minute/
  28. 28. … result in information chaos: Risk of losing control of information; Content sprawl; Risk of data breaches; Unmanaged growth; Risk of non-compliance; Poor governance; Risk of litigation; Information leaks
  29. 29. Agenda – Part 1 recap: 1. Introduction; 2. Terms and Definitions; 3. Digital Transformation Journey; 4. Unstructured Information
  30. 30. Agenda – Part 2 recap: 5. Information Chaos; 6. What is Artificial Intelligence?; 7. THEMIS CS for Auto-classification; 8. Key Takeaways
  31. 31. Part 2, Section 5: Information Chaos
  32. 32. Beware of the “Document Chaos Monster”: SILOED CONTENT; UNSTRUCTURED CONTENT; ROT CONTENT (Redundant, Obsolete, or Trivial); MISCLASSIFIED CONTENT. Chaos Monster Victims: Data Security; Storage Costs; User Productivity; Compliance; Office 365 Adoption
  33. 33. The Challenge of Taming the “Document Chaos Monster”: SILOED CONTENT; UNSTRUCTURED CONTENT; ROT CONTENT (Redundant, Obsolete, or Trivial); MISCLASSIFIED CONTENT. • Rely on Users to Classify Documents: inconsistent, incomplete, lack of knowledge • Automated Classification Processes: not smart enough, incomplete processes • Classification Workflows: incomplete, inconsistent, reliance on legacy date classification codes • AI Auto-Classification: intelligent, complete, up to date, scalable to large data sets, consistent, ongoing
  34. 34. “Document Chaos Monster” pain points. Internal Drivers: • e-Discovery • Records management • Analytics for decision-making • Metrics for predictive analytics • Process inefficiencies • Uncontrolled storage costs • ROT • Business continuity and resiliency • Disaster recovery. External Drivers: • Privacy regulations • Regulatory fines • Consumer trust • Competitive environment • Political and legal environment • Reputational damage • Monetize content • Digital rights
  35. 35. Part 2, Section 6: What is Artificial Intelligence?
  36. 36. “At last … an AI solution!”: “Artificial intelligence (AI) has crossed the chasm; more companies and more executives than ever before have come to realize the value that augmented intelligence offers their firms. These companies have actively moved to implement the technology in their organizations.” (AI's Disruption Of Data Management: Is A Different Approach Needed?, https://www.forbes.com/sites/forbestechcouncil/2019/09/06/ais-disruption-of-data-management-is-a-different-approach-needed/#1061832a24df)
  37. 37. In the movies you see AI, but …
  38. 38. … when is AI the same as natural intelligence?
  39. 39. AI Concepts: Artificial General Intelligence; Natural Intelligence; Artificial Intelligence; Neural Networks; Machine Learning; Expert Systems; Deep Learning
  40. 40. Definitions: Natural Intelligence (NI): “… is the opposite of artificial intelligence: it is all the systems of control present in biology.” (http://www.cs.bath.ac.uk/~jjb/web/uni.html)
  41. 41. Definitions: Artificial General Intelligence (AGI): “… is the intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can.” (www.wikipedia.org)
  42. 42. Definitions (modified from Wikipedia): Artificial Intelligence (AI): “… is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans, … that perceives its environment, and learns, makes decisions, and takes actions that maximize its chances of successfully achieving its goals without human input.” (Amitabh Srivastav, HELUX) Red text is my modification.
  43. 43. Definitions: Neural Networks: “A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes.” (www.wikipedia.org) Image: https://commons.wikimedia.org/w/index.php?curid=5084582
  44. 44. Definitions: Machine Learning (ML): “… is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead.” (www.wikipedia.org)
  45. 45. Definitions: Deep Learning (DL): “Deep learning is part of a broader family of machine learning methods based on artificial neural networks.” (www.wikipedia.org)
  46. 46. AI deep learning models for auto-classification: Supervised Learning: Trains the classifier using many labeled examples, such as images of a child playing with a dog, so that it learns to differentiate between the two. Another example is recognizing handwriting. (“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
  47. 47. AI deep learning models for auto-classification: Unsupervised Learning: The classifier learns without labeled examples organized into a dataset, and there is no feedback to the classifier. (“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
  48. 48. AI deep learning models for auto-classification: Semi-supervised Learning: Uses a small amount of labeled data to bolster a larger set of unlabeled data. (“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
  49. 49. AI deep learning models for auto-classification: Transfer Learning: An approach in which the classifier is trained on data that is augmented by some other already trained model. (“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
  50. 50. AI deep learning models for auto-classification: Reinforcement Learning: Useful in use cases where the feedback to the learning system only arrives after some end state is reached, or after a significant delay. (“Unsupervised Deep Learning Recommender System for Personal Computer Users”, INTELLI 2017: The Sixth International Conference on Intelligent Systems and Applications (includes InManEnt))
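As a rough illustration of the supervised learning idea on slide 46, the sketch below trains a tiny multinomial naive Bayes text classifier on labeled examples and then classifies a new document. Naive Bayes is swapped in because it fits in a few lines; it is not a deep learning model and is not presented as how THEMIS CS works. The training texts and the `finance` / `hr` labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero the score
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples
training = [
    ("invoice payment due amount", "finance"),
    ("purchase order invoice total", "finance"),
    ("employee leave request vacation", "hr"),
    ("employee performance review salary", "hr"),
]
model = train(training)
print(classify("invoice amount overdue", *model))
```

The labeled pairs play the role of the "many labeled examples" on the slide; with no labels at all, this approach would not apply and an unsupervised method would be needed instead.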
  51. 51. Use AI to auto-classify and enable: Compliance; e-Discovery; Records Mgmt.; ATIP Responses; Open Govt.; Archival; Unstructured Analytics
  52. 52. Use AI to search repositories to classify content: Laptops; Desktops; Cell phones; Tablets; On-premise; Cloud storage; Cloud services; Hybrid; Offsite storage. Unstructured data accounts for 80% of content on devices and in repositories. Very large amount of “dark data” stored in repositories.
  53. 53. Use AI to search repositories to classify content: Laptops; Desktops; Cell phones; Tablets; On-premise; Cloud storage; Cloud services; Hybrid; Offsite storage. Unstructured data accounts for 80% of content on devices and in repositories. Very large amount of “dark data” stored in repositories.
  54. 54. Are these products AI “in action?” (Product: Description) • Alexa, Siri, Cortana, and Google Assistant: These are virtual (voice / digital) assistants that respond to voice queries and use NLP to answer questions, make recommendations, and perform actions • Watson: An AI product from IBM that uses NLP to answer questions and is used in healthcare, education, weather forecasting, etc. • Debater: An AI project from IBM, designed to participate in full live debates with expert human debaters
  55. 55. What about AI for content? (Term: Description) • Classifiers: Classifiers can greatly increase the number of content items that are labeled by learning from the input data given to them and then using this knowledge to classify new observations • Entity Extractors: Extract and process information to identify and classify key elements from text into pre-defined categories to help transform unstructured data to structured data • Image Recognizers: Give a machine the ability to interpret the input received through computer vision and categorize what it “sees” • Independent Component Analysis: Looks for patterns in data that are not obvious to humans • Machine to Machine Learning: How will AI treat content, especially ethical considerations?
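The entity extractor row on slide 55 can be sketched with regular expressions standing in for the trained models a real extractor would likely use. The patterns, the category names (`email`, `date`, `amount`) and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical pre-defined categories and the patterns that identify them
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
}


def extract_entities(text):
    """Turn free text into a structured mapping of category -> matches."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Contact records@example.com by 2020-01-29 about the $1,250.00 invoice."
entities = extract_entities(sample)
print(entities)
```

The returned dictionary is the "structured data" side of the transformation: each key could be stored as a metadata field on the document.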
  56. 56. What is Microsoft’s Project Cortex? “Project Cortex uses AI to create a knowledge network that reasons over your organization’s data and automatically organizes it into shared topics like projects and customers. It also delivers relevant knowledge to people across your organization through topic cards and topic pages in the apps they use every day.” (www.microsoft.com)
  57. 57. Using the metadata as a foundation: Coherent across Microsoft 365; Discover enterprise content based on terms; Consistent tagging experience with contextual term suggestions and Auto Tagging; Improved enterprise content type syndication, discovery, and enforcement for consistent metadata schemas across tenant
  58. 58. Part 2, Section 7: THEMIS CS for Auto-classification
  59. 59. Information Architecture and AI? “There is no AI without IA!” - John Brown, CEO, HELUX
  60. 60. How does THEMIS CS “Slay the Monster?”: Unstructured Content; Information Architecture Design; Artificial Intelligence Rules; Structured Content with metadata; Internal Drivers
  61. 61. THEMIS CS can search repositories to classify content: Laptops; Desktops; Cell phones; Tablets; On-premise; Cloud storage; Cloud services; Hybrid; Offsite storage. Unstructured content on devices and in repositories. Very large amount of “dark data” stored in repositories.
  62. 62. THEMIS CS bookends the Office 365 content lifecycle: Create & Capture; Classify; Document Management; Collaborate; Search; Share & Distribute. Content Preparation (turning document chaos into order, control, and structure): Automated Information Architecture Deployment; Auto-Classification of Unstructured Content. Compliance & Governance; Collaboration & Document Management; Migration & Organization. Ongoing Governance (ensuring ongoing document control, governance, findability, & organization): Information Architecture; Auto-Classification of Unstructured Content; Records Management
  63. 63. IA … then and now … using THEMIS CS. OLD WAY (create IA manually offline; use Excel or Word; specialized team: IA Expert, IT, IM, Developer): Requirements Gathering; IA Assembly in Excel; Send to Development; Back to Users; User Acceptance (Maybe!); Rinse & Repeat. Slow, tedious, and costly process. MODERN WAY (create IA intuitively online; any team: Your Team): Import & Analysis; Design & Visualization; User Acceptance; Rapid Deployment. Quick, painless process.
  64. 64. IA made easy using THEMIS CS: THEMIS CS turns the complex, time-intensive task of building and maintaining a robust IA into an automated, intelligent process using AI. Cut your deployment times by 50% or more; Guaranteed 100% error-free deployments; Effective Team Collaboration
  65. 65. IA made easy using the THEMIS CS process: IA Visualization; Publish and GO LIVE!; THEMIS IA Designer; IA Analysis Wizard; Iteration and Acceptance; Deploy to Sandbox. THEMIS CS’s step-by-step wizard will ensure you deploy an error-free IA built on top of industry best practices
  66. 66. THEMIS CS features for auto-classification: • Information Architecture Assistant using a Chatbot: Chatbot assists in designing the information architecture and configuring GCdocs or SharePoint • Information Architecture Assistant using Machine Learning: ML recommends appropriate taxonomy, folder structure (GCdocs), site structure (SharePoint) based on user-provided information and using best practices • THEMIS Blueprint: HELUX-hosted service that stores information architecture blueprint snippets and then builds a blueprint based on best practices
  67. 67. Eight ways THEMIS CS enables Digital Transformation: 1. Digitize Paper; 2. Analyze ROT; 3. Import Information Architecture; 4. Apply AI Rules; 5. Accurate Auto-Classification; 6. Rapid Deployment; 7. Improve e-Discovery; 8. Increase ROI on Content
  68. 68. THEMIS CS uses AI for ROT analysis. Inventory the target content sources: • Shared drives • Laptops / Desktops • Off-line repositories • Cloud storage • Mobile devices. Rules to identify the content types: • Personal information • Health information • Employee information • Confidential data • Public information. Schema to classify content: • File plan • Taxonomy • Metadata • Retention and disposition schedule. Consider additional rules: • Relevant regulations • Industry standards • Best practices
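A minimal sketch of the redundant and obsolete parts of a ROT pass over an inventory like the one on slide 68, assuming exact duplicates are found by content hash and "obsolete" means older than a fixed age. The function name `find_rot`, the in-memory inventory, and the 7-year threshold are hypothetical; in practice the threshold would come from the retention and disposition schedule, and nothing here reflects how THEMIS CS is implemented.

```python
import hashlib
from collections import defaultdict
from datetime import date


def find_rot(files, today, max_age_days):
    """Flag redundant and obsolete items.

    files: list of (path, content_bytes, last_modified_date) tuples.
    """
    by_hash = defaultdict(list)
    for path, content, _ in files:
        by_hash[hashlib.sha256(content).hexdigest()].append(path)
    # Every copy after the first with the same hash counts as redundant
    redundant = sorted(p for paths in by_hash.values() for p in paths[1:])
    obsolete = sorted(
        path for path, _, modified in files
        if (today - modified).days > max_age_days
    )
    return {"redundant": redundant, "obsolete": obsolete}


# Hypothetical inventory drawn from two content sources
inventory = [
    ("share/report.docx", b"Q4 report", date(2019, 12, 1)),
    ("laptop/report-copy.docx", b"Q4 report", date(2019, 12, 5)),
    ("share/memo-2010.txt", b"old memo", date(2010, 3, 1)),
]
result = find_rot(inventory, today=date(2020, 1, 29), max_age_days=7 * 365)
print(result)
```

Hashing only catches byte-identical copies; "trivial" content and near duplicates need rules or similarity measures on top of this.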
  69. 69. THEMIS CS uses AI for email auto-categorization. Extract email headers: • From • To • Subject • Date • Copied to • Attachments • Etc. Rules to identify email content: • Personal information • Health information • Employee information • Confidential data • Public information. Identify duplicate emails: • Duplicate threshold • “Near duplicates”. Put unknown emails into “quarantine”: • Does not match any rules • Matches rules for further analysis. Schema to classify email and content: • File plan • Taxonomy • Metadata • Retention and disposition schedule
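The steps on slide 69 can be sketched with Python's standard library: `email.message_from_string` for header extraction, keyword rules for content categories, a "quarantine" fallback, and `difflib.SequenceMatcher` for near duplicates. The keyword lists, the 0.9 threshold, and the sample message are assumptions for illustration, and simple keyword matching stands in for whatever AI rules the product applies.

```python
from difflib import SequenceMatcher
from email import message_from_string

# Hypothetical keyword rules; real rules would follow the file plan
RULES = {
    "health information": ["diagnosis", "prescription"],
    "employee information": ["payroll", "performance review"],
}


def extract_headers(raw):
    """Parse a raw message and return (selected headers, body text)."""
    msg = message_from_string(raw)
    names = ("From", "To", "Cc", "Subject", "Date")
    headers = {name: msg.get(name, "") for name in names}
    return headers, msg.get_payload()


def categorize(raw):
    """Return matching rule labels, or quarantine when nothing matches."""
    headers, body = extract_headers(raw)
    text = f"{headers['Subject']} {body}".lower()
    matches = [
        label for label, words in RULES.items()
        if any(word in text for word in words)
    ]
    return matches or ["quarantine"]


def is_near_duplicate(body_a, body_b, threshold=0.9):
    """Treat two bodies as near duplicates above a similarity threshold."""
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold


raw = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: Payroll update\n"
    "Date: Wed, 29 Jan 2020 09:00:00 -0500\n"
    "\n"
    "The payroll run is complete.\n"
)
print(categorize(raw))
```

Lowering `threshold` widens what counts as a near duplicate, which is the "duplicate threshold" knob named on the slide.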
  70. 70. THEMIS CS Use Case for AI and Auto-Classification. Problem Description: • Several terabytes of pictures and videos on shared drives • Many years' worth of physical pictures • Difficult to work with physical pictures • Storage costs are increasing • Volume of content is increasing. Business Challenge: • Identify duplicates and “near duplicates” • Identify content to dispose • Tag content with appropriate metadata • Retain content for on-going operations and possible litigation • Reduce storage costs • Accurately and consistently classify the digital and physical content. Solution: • Use THEMIS IA to rapidly build the architecture, rules, and metadata to tag content • Use THEMIS AI to search the shared drives and apply the rules and auto-classify the content • Use THEMIS RM to apply the retention and disposition schedules to the auto-classified content. Benefits: • Correctly and consistently auto-classify content • Identify and dispose of ROT • Improve accuracy of search • Improve e-Discovery • Reduce storage costs • Teach THEMIS AI additional rules to auto-classify more content • THEMIS AI is a “resource” available 24/7 to handle increasing volumes
  71. 71. Part 2, Section 8: Key Takeaways
  72. 72. THEMIS CS controls the “Document Chaos Monster”: SILOED CONTENT; UNSTRUCTURED CONTENT; ROT CONTENT (Redundant, Obsolete, or Trivial); MISCLASSIFIED CONTENT. Chaos Monster Victims: Data Security; Storage Costs; User Productivity; Compliance; Office 365 Adoption
  73. 73. The THEMIS CS Advantage: comparison of THEMIS CS, manually building IA with spreadsheets, and hiring a consultant on Time to Deployment; Cost Effective; Accuracy; Error Free; User Satisfaction; Ongoing Monitoring & Improvements
  74. 74. THEMIS CS benefits. IA integrates: File plan; Taxonomy; Retention and disposition schedules; Security groups, permissions, and user accounts; Metadata; Content types, document types, categories and attributes. IA enables: Improved UX and UI via better navigation; Improved search experience via better auto-classification; Improved collaboration and knowledge sharing via a more intuitive design; Improved change management via more user awareness and easier user adoption
  75. 75. Thank you. Amitabh Srivastav, IGP, CIP, PMP, VP, Operations & Governance; MSc (Proj Mgmt), MSc (Comp Sc), BCSc (Hons); amitabh@heluxsystems.com; www.linkedin.com/in/amitabhsrivastav; @am_srivastav. Please contact or follow HELUX at (613) 291-2683; https://www.heluxsystems.com; info@heluxsystems.com; @HeluxSystems; https://www.linkedin.com/company/helux-systems/; https://www.facebook.com/HELUXSystems/