Nick Patience, Director Product Marketing & Strategy at Recommind - Big Data: The role of The CIO in dealing with large amounts of unstructured information

Nick Patience, Director Product Marketing & Strategy at Recommind spoke at the CIO Event, March 2013

  • Unmanaged file servers can pose legal and compliance risks and may cause vast disruption if a company is asked to produce documents to a court or for an internal enquiry, and this is an essential starting point in that process. Unmanaged information also represents a risk because it makes it hard to find information, particularly when much of that information is unstructured. When litigation occurs, if information cannot be found, organisations may ultimately face court sanctions or settle on punitive terms. Compliance risks: sensitive client information may reside on servers that are not managed and may be misused, lost or even destroyed.
  • Who is answering this - ?
  • How do we preserve email in accordance with the Department's records management policies and regulatory requirements? Up to 10k users with thousands of emails a day whose emails are not being classified. Data everywhere. Over 50,000 emails received a day. Inconsistent classification when done manually. Lack of ownership. Requirements: easy for employees to use; complies with the Department's RM policies; automatically categorizes records; operates within the Department's information architecture; is easily modifiable to meet future needs.
  • Transcript

    • 1. BIG DATA: THE ROLE OF THE CIO IN DEALING WITH LARGE AMOUNTS OF UNSTRUCTURED INFORMATION Nick Patience, Director, product marketing & strategy March 19 2013 RECOMMIND PROPRIETARY & CONFIDENTIAL | 1
    • 2. ABOUT RECOMMIND Founded in 2001; 450+ employees; recognized leader by top analyst firms (Gartner Magic Quadrant Leader, IDC MarketScape Leader); offices in San Francisco, Boston, NYC, London, Bonn & Sydney
    • 3. WHAT WE DO… Software solutions & infrastructure for large-volume unstructured information management and analysis
    • 4. PRODUCTS AND SOLUTIONS [Diagram, layered stack: VERTICAL MARKETS; ENTERPRISE APPLICATIONS (Solutions, 3rd Party); CORE PLATFORM (NoSQL Database, Enriched Index, Analytics); ENTERPRISE DATA (Databases, Machine Data, Office Documents, System Logs, Social Media, ESI, Email, Web, XML)]
    • 5. SAMPLE CUSTOMERS
    • 6. AGENDA Big Data and the importance of analysing both structured and unstructured information; role of the CIO in helping to alleviate risk & compliance issues within the enterprise; how to categorise, find, manage and analyse information from disparate repositories into one overarching platform
    • 7. BIG DATA RISKS & OPPORTUNITIES AND WHAT YOU CAN DO TO AVOID ONE AND EMBRACE THE OTHER
    • 8. BIG DATA DEFINITION “Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.” Source: Edd Dumbill, Forbes
    • 9. VOLUME, VARIETY AND VELOCITY FURTHER DEFINE BIG DATA. VOLUME (petabytes): [Map; values shown: >3500, >2000, >250, >400, >50, >200, >50; regions labelled: North America, Europe, China, Japan, Middle East, India, Latin America]. VARIETY: people to people (email, social networks, blogs); people to machine (medical devices, ecommerce, bank transactions); machine to machine (sensors, GPS, bar code scanners). VELOCITY: 2.9 million emails sent every second; 20 hours of video uploaded every minute; 50 million tweets per day. Source: McKinsey, comScore, Radicati
    • 10. TRADITIONAL VS. BIG DATA. TRADITIONAL DATA: gigabytes to terabytes; centralised; structured; known relationships. BIG DATA: petabytes to exabytes; distributed; unstructured; complex, undefined interrelationships
    • 11. MASSIVE GROWTH IN UNSTRUCTURED CONTENT Worldwide Corporate Data Growth: 80% of data growth is unstructured. [Chart: structured data vs. unstructured data in exabytes, 2010 to 2020; vertical axis 0 to 45,000] Source: IDC, The Digital Universe, Dec 2012
    • 12. WHAT BIG DATA IS NOT… Doing what you did before, just scaled up (panels labelled NOW and THEN)
    • 13. BUT… You can do things now you could not do a few years ago because of: • New analytics techniques • Faster, more powerful computers • New architectures, such as the cloud
    • 14. AGENDA Big Data and the importance of analysing both structured and unstructured information; role of the CIO in helping to alleviate risk & compliance issues within the enterprise; how to categorise, find, manage and analyse information from disparate repositories into one overarching platform
    • 15. [1] Source: Horison Information Strategies
    • 16. INFORMATION VALUE DECLINES OVER TIME, COST AND RISK DO NOT [Chart annotations: risk-value delta; cost-value delta]
    • 17. SPECIFIC BIG DATA RISKS • Unmanaged file servers can pose legal and compliance risks • Unmanaged information represents a risk because it makes information hard to find • When litigation occurs, if information cannot be found, organisations may ultimately face court sanctions • Compliance risks: sensitive client information may reside on servers that are not managed and may be misused, lost or even destroyed
    • 18. BIG DATA OPPORTUNITIES
    • 19. BIG DATA MARKET OPPORTUNITIES BY INDUSTRY Source: Gartner
    • 20. BIG DATA INVESTMENTS BY INDUSTRY [Chart by industry; series: This Year, Next Year, Within 2 Years] Source: Gartner
    • 21. BIG DATA ANALYTICS - OPPORTUNITIES. Recommendation Engines: match and recommend users to one another or to products and services by understanding profiles and buyer behavior. Sentiment Analysis: determine how consumers feel about particular companies, brands or products from Tweets and Facebook posts. Marketing Campaign Analysis: improve accuracy of forecasting and prediction of buyer behavior by reviewing increasingly granular data, click streams, call details. Fraud Analytics: identify fraudulent activity and stolen credit cards through active monitoring of customer behavior, historical and transaction data.
    • 22. FRAUD, WASTE AND ABUSE (FWA) IN HEALTHCARE According to a 2010 white paper by the US National Health Care Anti-Fraud Association (NHCAA): the US Federal Bureau of Investigation (FBI) estimates that 3-10% of $2.34 trillion spent on healthcare in 2008 was lost to fraud; this represents $70-$234 billion annually; $234 billion is roughly equivalent to the gross domestic product (GDP) of Finland. Source: http://www.nhcaa.org/media/5994/whitepaper_oct10.pdf
    • 23. BIG DATA ANALYTICS - OPPORTUNITIES. Customer Churn: evaluate customer behavior to identify patterns that indicate which customers are most likely to leave for a competing vendor. Network Management: ingest data from servers, storage devices and other hardware to monitor network activity, diagnose bottlenecks. Contract Analysis: mine large volumes of transactional data and documentation to determine risk and exposure of financial assets. Patent Analysis: comb through enormous volumes of text-based information and prior art to assist in the development of new products, guide portfolio strategies.
    • 24. MITIGATING BIG DATA RISKS THROUGH DEFENSIBLE DELETION
    • 25. THE LIFECYCLE OF DATA & DEFENSIBLE DELETION
    • 26. DEFENSIBLE DELETION: PIPE DREAM OR REALITY? • Survey by Enterprise Strategy Group in Q4 2012 • 253 business and IT professionals familiar with their organisation’s data disposition policies (all organisations currently dispose of data): 36% IT professionals, 64% business professionals • Midmarket (100 to 999 employees) and enterprise-class (1,000+ employees) organisations: 32% midmarket, 68% enterprise-class • Multiple verticals
    • 27. RESPONDENTS BY NUMBER OF EMPLOYEES How many total employees does your organisation have worldwide? (N=253) 100 to 249, 13%; 250 to 499, 10%; 500 to 999, 9%; 1,000 to 2,499, 13%; 2,500 to 4,999, 12%; 5,000 to 9,999, 8%; 10,000 to 19,999, 13%; 20,000 or more, 21%. © 2012 Enterprise Strategy Group
    • 28. RESPONDENTS BY INDUSTRY What is your organisation’s primary industry? (Percent of respondents, N=253) Government (Federal/National, State/Province/Local), 22%; Manufacturing, 15%; Financial (banking, securities, insurance), 12%; Business Services (accounting, consulting, legal, etc.), 10%; Communications & Media, 6%; Health Care, 5%; Retail/Wholesale, 1%; Other, 28%. © 2012 Enterprise Strategy Group
    • 29. HOW ORGANISATIONS DISPOSE OF DATA Which of the following best describes the manner in which your organisation disposes of data? (N=253) We have a formal data disposition policy in place, 80%; data is disposed of on an ad hoc basis, 20%. © 2012 Enterprise Strategy Group
    • 30. DRIVERS BEHIND DATA DISPOSITION What are the biggest drivers behind your organisation’s data disposition? (N=253, multiple responses accepted) Improving overall data management for better future retrieval, 66%; mitigating the risk of exposing sensitive/confidential data past its retention mandate to potential future security breaches, 58%; reducing exposure to risk from future e-discovery/regulatory productions, 56%; reducing costs of storing legacy data or records with third parties, 56%; improving systems performance, 50%; reducing costs of future e-discovery/regulatory productions, 46%; reducing maintenance costs (i.e., OPEX) associated with storing data volumes, 46%; reducing systems costs (i.e., CAPEX) associated with storing data volumes, 46%. © 2012 Enterprise Strategy Group
    • 31. APPLICATION OF DATA RETENTION POLICIES Which of the following best describes your organisation’s application – or expected application – of data retention policies? (N=245) We have – or will have – retention policies or records management in place for paper…, 85%; we retain – or will retain – regulated data according to its mandated retention schedule, 80%; we preserve – or will preserve – data under legal hold, 71%; we have – or will have – retention policies or records management in place for…, 71%. © 2012 Enterprise Strategy Group
    • 32. GROUPS RESPONSIBLE FOR CREATION AND SETTING DATA DISPOSITION POLICIES Which of the following groups assist/are expected to assist in the creation of data disposition policies? Which group is/will be responsible for setting data disposition policies? (N=235, multiple responses accepted) Percentages are given as: group that sets – or will set – data disposition policies / all groups with input into data disposition policies. Records management group, 38% / 78%; legal department/general counsel, 21% / 77%; IT, 11% / 77%; executive team, 9% / 54%; business users, 3% / 51%; compliance group, 6% / 49%; accounting or auditing firm, 2% / 31%; outside counsel, 2% / 23%; service provider, 2% / 12%. © 2012 Enterprise Strategy Group
    • 33. GROUP RESPONSIBLE FOR EXECUTION OF DATA DISPOSITION POLICIES Which group is – or will likely be – responsible for the execution of data disposition policies (i.e., actually removing data from systems)? (N=235) IT, 49%; records management group, 38%; business users, 4%; compliance group, 3%; legal department/general counsel, 2%; other, 2%; service provider, 1%; accounting or auditing firm, 1%; don’t know, 1%. © 2012 Enterprise Strategy Group
    • 34. AMOUNT OF DATA DISPOSED ANNUALLY On average, approximately how much data would you estimate your organisation disposed on an annual basis? (N=137) [Bar chart; categories: Less than 1 TB, 1 TB to 5 TB, 6 TB to 10 TB, 11 TB to 25 TB, More than 25 TB; bar values in descending order of height: 41%, 27%, 15%, 12%, 4%] © 2012 Enterprise Strategy Group
    • 35. PERCENT OF TOTAL AMOUNT OF DATA DISPOSED OF ANNUALLY On average, what percentage of your organisation’s total amount of data do you estimate is disposed on an annual basis? (N=171) [Bar chart; categories: Less than 5%, 5% to 10%, 11% to 15%, 16% to 20%, 21% to 25%, More than 25%; bar values in descending order of height: 37%, 25%, 16%, 11%, 9%, 2%]
    • 36. IMPACT OF FORMAL DATA DISPOSITION POLICIES How significant has the impact of formal data disposition policies been on cost savings and/or risk avoidance for your organisation? (N=203) Very significant, 9%; significant, 39%; neither significant nor insignificant, 23%; insignificant, 2%; too soon to tell, 13%; don’t know, 13%. © 2012 Enterprise Strategy Group
    • 37. AGENDA Big Data and the importance of analysing both structured and unstructured information; role of the CIO in helping to alleviate risk & compliance issues within the enterprise; how to categorise, find, manage and analyse information from disparate repositories into one overarching platform
    • 38. CUSTOMER CASE STUDY – US DEPT. OF ENERGY
    • 39. CASE STUDY – US DEPARTMENT OF ENERGY The Challenge We have thousands of users generating many records a day. How do we manage this information like an asset so that it can be useful and we comply with the government’s records management mandate?
    • 40. EMAIL & RECORDS MANAGEMENT AT US DEPARTMENT OF ENERGY Need to preserve history; importance of vital records for continuity of operations if emergencies arise; need to provide copies of records for legal actions or FOIA legal requests; lack of motivation to categorize content
    • 41. AUTOMATIC CATEGORISATION APPROACH [Diagram labels: Journaling; Uncategorized Content; Drop-Off Library; Auto Categorization; Categorized Content; Content Organizer; Move; Site 1; Site 2; Site 3; Site 4]
    • 42. IMPACT Requires one system administrator/engineer & two people to manage the electronic records center – for 1,000 users; end users more productive due to no longer having to categorize content
    • 43. EMAIL CATEGORISATION ACCURACY [Bar chart, 0% to 100%; series: Rule-Based; annotation: Accuracy Average 86%; category labels as captured: Administrative Notices Budget Records Customer Service IT Management Improvement Procurement Records Travel Records]
    • 44. SIMILAR USE CASES Email compliance in financial services: ˗ Email archiving captures emails from target employees ˗ Random sampling & manual review of emails ˗ Automatic sampling, initial review & assignment to senior reviewers is more cost and time efficient, accurate & defensible. Predictive coding in e-Discovery: ˗ Predictive Sampling to estimate the percentage of responsive documents ˗ Predictive Analytics (Concepts, Phrases, Smart Filters) to find potentially relevant documents ˗ Complete iterative cycle until zero documents are computer-suggested or responsive ˗ Use Predictive Sampling to QC the non-reviewed documents. Predictive modelling in healthcare: ˗ Find at-risk patients using guided data mining against a pre-built, validated predictive model for a specific issue such as hospital acquired conditions ˗ Predict the patients who should be isolated upon arrival, and the most reliable approach to screening
    • 45. DIFFERENT USE CASES, DIFFERENT ROIs. Optimize Value: • Predictive Analytics • Operational Efficiencies • Business Intelligence. Lower Costs: • Storage Management • Personnel optimization • Operational efficiencies. Minimize Risk: • Security Breaches • eDiscovery Costs • Data Leakage • Regulatory Inquiries
    • 46. CUSTOMER CASE STUDY – SWISS RE INSURANCE
    • 47. SWISS RE - ACCESS, COMPLIANCE & EDISCOVERY • 100s of TB data • Index once, use many • True 360 degree view of enterprise data • Based on CORE platform [Diagram: Custom-built apps on top of NoSQL Database, Enriched Index, Analytics]
    • 48. PRODUCTS AND SOLUTIONS [Diagram, layered stack: VERTICAL MARKETS; ENTERPRISE APPLICATIONS (Solutions, 3rd Party); CORE PLATFORM (NoSQL Database, Enriched Index, Analytics); ENTERPRISE DATA (Databases, Machine Data, Office Documents, System Logs, Social Media, ESI, Email, Web, XML)]
    • 49. WHAT MAKES CORE UNIQUE? Powerful and scalable indexing and retrieval FIND Keyword and language-agnostic machine learning Unstructured information “joins” CONNECT Unstructured data extraction & analytics ANALYSE Delivers ability to confidently act on data ACT
    • 50. SUMMARY Big Data and the importance of analysing both structured and unstructured information ˗ What it is ˗ What it is not ˗ Risks & opportunities Role of the CIO in helping to alleviate risk & compliance issues within the enterprise ˗ Defensible deletion ˗ Categorization – US Dept of Energy How to categorise, find, manage and analyse information from disparate repositories into one overarching platform ˗ CORE platform ˗ Swiss Re
    • 51. THANK YOU – QUESTIONS? @nickpatience nick.patience@recommind.com
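
As a quick check of the fraud range quoted on slide 22, the following Python sketch recomputes 3-10% of the $2.34 trillion spending figure. Only the three input numbers come from the slide; the constant and function names are illustrative assumptions.

```python
# Figures quoted on slide 22 (NHCAA white paper, citing an FBI estimate).
TOTAL_SPEND_USD = 2.34e12          # US healthcare spending in 2008
LOW_RATE, HIGH_RATE = 0.03, 0.10   # estimated share lost to fraud


def fraud_loss_range(total, low_rate, high_rate):
    """Return the (low, high) estimated loss in the same units as total."""
    return total * low_rate, total * high_rate


low, high = fraud_loss_range(TOTAL_SPEND_USD, LOW_RATE, HIGH_RATE)
print(f"${low / 1e9:.1f} billion to ${high / 1e9:.1f} billion")
```

The lower bound works out to about $70.2 billion, which the slide rounds to $70 billion; the upper bound matches the slide's $234 billion.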

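Slide 44 mentions using Predictive Sampling to estimate the percentage of responsive documents. The presentation does not describe how Recommind implements this, so the sketch below substitutes a generic technique: simple random sampling with a normal-approximation confidence interval. The function name, its parameters and the `is_responsive` callback (standing in for a human reviewer's decision) are all hypothetical.

```python
import math
import random


def estimate_responsive_rate(documents, is_responsive, sample_size, seed=0, z=1.96):
    """Estimate the share of responsive documents from a random sample.

    documents     -- list of documents (any objects)
    is_responsive -- hypothetical callback; stands in for a reviewer's judgement
    sample_size   -- number of documents to review
    z             -- normal quantile; 1.96 gives roughly a 95% interval

    Returns (estimate, lower_bound, upper_bound), each between 0 and 1.
    """
    rng = random.Random(seed)                      # seeded for repeatability
    sample = rng.sample(documents, sample_size)    # sampling without replacement
    hits = sum(1 for doc in sample if is_responsive(doc))
    p = hits / sample_size
    # Normal-approximation margin of error for a proportion.
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Usage with a synthetic corpus in which 20% of documents are responsive.
corpus = [{"id": i, "responsive": i % 5 == 0} for i in range(10_000)]
rate, lower, upper = estimate_responsive_rate(
    corpus, lambda doc: doc["responsive"], sample_size=1_000
)
print(f"estimated responsive rate: {rate:.1%} (interval {lower:.1%} to {upper:.1%})")
```

Reviewing 1,000 sampled documents instead of all 10,000 trades a small, quantified uncertainty for a large cut in review effort. The normal approximation is a simplification and becomes unreliable when the responsive rate is very close to 0% or 100%.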