Employees, Business Partners and Bad Guys: What Web Data Reveals About Persons of Interest


This presentation will discuss how to collect Web data with precision, transform it and then apply next-generation text analytics to reveal insights about the past activities of persons of interest and/or predict future outcomes. Featured guest speaker Claire Schmidt will discuss results of a project which proved the potential of using automated Web data collection and advanced analytics to identify potential child victims of exploitation.

Published in: Technology

  1. Employees, Business Partners and Bad Guys: What Web data reveals about persons of interest
     Presenters:
     • Gina Cerami, VP of Marketing, Connotate
     • Dave Danielson, VP of Marketing, Digital Reasoning
     • Claire Schmidt, Director of Programs, Thorn: Digital Defenders of Children (formerly DNA Foundation)
     Date: November 28, 2012
  2. Today’s Discussion
     • What Web Data Reveals: The Fundamentals
       The business case
       Employee background check – business partner screening – persons of interest
     • Collecting Good Data: Not That Easy
       Where to start? Best practices
       Differences in data sources – the automation process
     • Analyzing Data: A Difficult Problem
       Why advanced text analytics matters
       Making sense of big data
     • Automation and Advanced Analytics: A Powerful Combination
       Background check accuracy enhanced with Entity Resolution
     • Thorn: Working to End Child Sexual Exploitation
       Combined solution applied to detecting child sex trafficking online
     • Q&A
  3. What Web Data Reveals: The Fundamentals
  4. The Business Case
     news – blogs – social media
     trillions of URLs
     court records – registries – sanctions lists
  5. What Web Data Reveals About Persons of Interest
     Prospective Employees
     • 3-minute screening using public records in 1,500 jurisdictions
     • Eliminate human error
     • Save time, money; hire right the first time
     Business Partners
     • Check sanctions lists
     • Identify politically exposed persons
     • Reduce business risk
     • Avoid fines – comply with AML/KYC rules
     Bad Guys
     • Extract precise data from 10,000+ records on URLs linked to illegal activities
     • Use advanced analytics to narrow the scope of investigations
     Automated, precise data collection is key to success
  6. The Cost of Not Knowing Your Employees
     The cost of fraud in the workplace:
     • $400 to 600 billion/year in the U.S. (Harvard)
     • 5% revenue on average (Association of Fraud Examiners)
     • $3.2B in Canada in 2011 (Certified General Accountants Assoc. of Canada)
     The cost of re-hiring (not getting it right the first time)
     • From $3.5K (U.S. average cost-per-hire) to $$millions for CEOs
     How does this happen?
     • 50% of resumes have factual errors
     • 1 in 5 job applications have a major lie or discrepancy (UK 2009 survey)
     • Many background checks are manual (error prone) or incomplete
  7. Solution: Comprehensive Search
     Regular monitoring of all levels of government sites
     • National, state, county and local
     • If you outsource – make sure your screening service continually monitors these sites for updates
     • If you already do it yourself – consider automating the Web data collection process to ensure accuracy and timeliness
     Connotate’s software powers over 250,000 background checks per month; 3 million to date
  8. The Cost of Not Knowing Business Partners
     Recent Bank Secrecy Act (BSA) Penalties
     • $1.2 B – Citibank, April 2012
     • $7 M – Pacific National Bank, March 2011
     BSA / Anti-Money Laundering (AML) Penalties
     • $10.9 M – Ocean Bank (FL), August 2011
     Reputation Risk – substantial
  9. Solution: Comprehensive Search
     Comprehensive searches by third-party services are available for specific vertical industries
     If you wish to conduct customized searches on a regular basis, consider automated data collection
     • Sanctions lists (Treasury.gov, ICE, EU Terrorism List, FBI Most Wanted, OCC Shell Bank, etc.)
     • PACER, national and state lists
     • Social media may reveal that the person of interest is associating with others on sanction lists
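
A recurring sanctions-list check of the kind this slide recommends could be sketched roughly as below. This is only an illustration under stated assumptions: the `normalize` and `screen` helpers and the sample names are hypothetical, the list is made up rather than taken from any real source, and a production check would also need fuzzy matching and official list formats.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower()
    )
    # Sorting tokens makes "DOE, John" and "John Doe" compare equal.
    return " ".join(sorted(cleaned.split()))


def screen(candidates, sanctions_list):
    """Return the candidates whose normalized name appears on the list."""
    listed = {normalize(entry) for entry in sanctions_list}
    return [name for name in candidates if normalize(name) in listed]


# Hypothetical, made-up entries for illustration only.
SAMPLE_LIST = ["DOE, John", "Éxample, María"]
hits = screen(["John Doe", "Maria Example", "Jane Roe"], SAMPLE_LIST)
print(hits)
```

Normalizing before comparing is a deliberate choice here: published lists often write names as "SURNAME, Given" with accents, while application forms usually do not.
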
  10. Collecting Good Data: Not That Easy
  11. Where to Start? Best Practices
      • Narrow your search
      • Scope the project
      • Think about the long term
      • Sources
  12. Differences in Web Sources
  13. Polling Question: Web Data Collection
      Are you currently collecting background data from the Web?
      Yes – we are doing this using an automated process
      Yes – however, we are collecting Web data using a manual process
      No – we outsource background check to a third-party service
  14. An Overview of the Automation Process
      Collect Data
      Internal Sources
      • Databases
      • Interviews
      • Resumes
      External Sources
      • Social Media
      • Surface Web
      • Hidden Web
      • Secured Sites
      Transform
      • Structure
      • Classify
      • Prep for Analysis
      Deliver
      • Reports
      • Dashboards
      • Workflow
      • BI Plug-ins
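
The collect, transform and deliver stages on this slide can be pictured as three small functions chained together. The sketch below is an assumption-laden toy, not how any vendor's product works: the `Record` type, the function names and the sample sources are all hypothetical, and "collecting" here just wraps strings that are already in memory.

```python
from dataclasses import dataclass

# Hypothetical grouping that mirrors the slide's internal sources.
INTERNAL_SOURCES = {"database", "interview", "resume"}


@dataclass
class Record:
    source: str
    text: str
    category: str = "unclassified"


def collect(sources):
    """Collect stage: wrap raw text from each named source in a Record."""
    return [Record(source=name, text=raw) for name, raw in sources.items()]


def transform(records):
    """Transform stage: tidy the text and classify each record."""
    for rec in records:
        rec.text = " ".join(rec.text.split())  # collapse runs of whitespace
        rec.category = "internal" if rec.source in INTERNAL_SOURCES else "external"
    return records


def deliver(records):
    """Deliver stage: count records per category, as a simple report."""
    report = {}
    for rec in records:
        report[rec.category] = report.get(rec.category, 0) + 1
    return report


sources = {
    "resume": "  Jane  Roe,\n analyst ",
    "social_media": "Post by jroe",
    "database": "HR row 17",
}
records = transform(collect(sources))
report = deliver(records)
print(report)
```

Keeping each stage as its own function means a source or an output format can be swapped without touching the other two stages.
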
  15. Analyzing Data: A Difficult Problem
  16. New Content Sources Require Advanced Analytics
      Collect Data
      • Unstructured and Structured Data
      • Variety of Sources
      • Scalable
      • Automated
      Transform
      • Remove Formatting
      • Text Only
      Advanced Analytics
      • Resolving the Unique Individual
      • Associating Time and Geographic data
      • Fact/Assertion Extraction
      • Relationship Identification and Extraction
      Outputs
      • Reports
      • Dashboards
      • Workflow
      • BI Plug-ins
  17. Synthesys Overview: A software platform for making sense of big data
      Disparate Data: News, Web, Email, Research, Instant Messages
      Synthesys Platform
      • READ: Deep processing of unstructured data
      • RESOLVE: Assemble, organize, and relate
      • REASON: Uncover relationships, compare & correlate
      Application-Ready Analytics: App Integration, Events/Alarms, Network Analysis
      Analytic Primitives
      • Natural Language Processing
      • Extraction
      • Geocoding
      • Time normalization
      • Entity Resolution
      • Synonym Generation
      • Similarity Algorithms
      • Connectivity
      Machine Learning
      Distributed Processing (Hadoop MapReduce)
      Distributed Storage (HBase, Cassandra, Cloudbase)
      Synthesys reads, resolves and reasons about entities and relationships in space and time.
  18. Other solutions are flawed and don’t make automated understanding possible
      Historically, the market has built tools to help find reading material
      • Search: Google, Fast, Autonomy, Recommind, Lucene
      • Entity Extraction: Basis, Janya, Aerotext, Attensity, SAP/Inxight, Lexalytics, SRA NetOwl
      • Comprehensive Ontologies or Data Models: Clarabridge, Endeca, Expert Systems, IBM Entity Analytics, Informatica
      Other text analytics solutions still require the human to read to understand
  19. Synthesys turns data into “knowledge objects”
      Sample text: President Masayoshi Son wants to repeat the success he had while building Softbank into Japan’s third-largest wireless carrier. Son wants to take market share from entrenched giants and deliver more data to smartphones, tablets, cars and even bicycles.
      [Figure: the sample text annotated with part-of-speech tags and phrase chunks]
      Extracted objects shown in the figure:
      • PERSON (proper noun): President Masayoshi Son
      • LOCATION: Japan
      • ORGANIZATION: Softbank
      • PREDICATE: Built, To Take, To Deliver
      • ENTITY (noun): Market Share, More Data, Smartphones, Tablets
      NLP Extraction stages: EOS, TOK, POS, CHUNK, NER, SREX
  20. Resolution makes “Concept” or “Semantic” understanding possible
      Concept: California-based Apple
      References/Mentions:
      • Apple
      • Apple, inc.
      • California-based Apple
      • Secretive Apple
      • iPhone inventor
      • Steve Jobs’ Company
      • AAPL
      • Technology Innovator Apple
      Synthesys resolves multiple, varied mentions across the entire data set as being part of the same concept based on their usage in context.
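
To make the idea of grouping varied mentions under one concept concrete, here is a deliberately simple sketch. It is not how Synthesys works: the slide describes resolution based on usage in context, whereas this toy substitutes a hand-written alias table. The `ALIASES` table, `canonical_key` and `resolve` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical alias table; a context-based resolver would not need
# these mappings to be written out by hand.
ALIASES = {
    "apple": "Apple Inc.",
    "apple inc": "Apple Inc.",
    "california-based apple": "Apple Inc.",
    "iphone inventor": "Apple Inc.",
    "aapl": "Apple Inc.",
}


def canonical_key(mention: str) -> str:
    """Lowercase a mention and drop commas, periods and extra spaces."""
    stripped = mention.lower().replace(",", " ").replace(".", " ")
    return " ".join(stripped.split())


def resolve(mentions):
    """Group mentions by concept; unknown mentions go under UNRESOLVED."""
    groups = defaultdict(list)
    for mention in mentions:
        concept = ALIASES.get(canonical_key(mention), "UNRESOLVED")
        groups[concept].append(mention)
    return dict(groups)


resolved = resolve(["Apple", "Apple, inc.", "AAPL", "iPhone inventor", "Apple pie"])
print(resolved)
```

The "Apple pie" mention shows the limit of a lookup table: without context, the best a table can do is leave unfamiliar strings unresolved.
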
  21. Synthesys is “Software that Learns” new languages, patterns, categories, etc.
      Supervised machine learning techniques and patent-pending workflows allow content experts to train models and achieve quality improvement without any programming.
      1. User uploads example of new document domain/language
      2. Synthesys predicts annotation
      3. Operator corrects annotation and adds categories
      4. Completed annotation is submitted to server
      5. Completed model training is submitted to Synthesys
  22. Synthesys Powers Tools: Providing a Common Global View
      [Diagram layers, top to bottom]
      Leading Visualization Platforms
      Platform (Data Organized & Application-Ready)
      Relational Database Management System (RDBMS)
      Data Sources: Structured Data, Unstructured Data
  23. Polling Question: Data Analysis
      Are you looking to use analytics on Web data to resolve entities or understand relationships that might help in background investigations?
      Yes – we are analyzing Web data manually today
      Yes – we are analyzing with text extractors or other text mining tools
      Yes – we have a near-term project to analyze Web data
      No – but we may have a need to analyze Web data in the future
      No – we have no plans to analyze Web data
  24. Web Data and Advanced Analytics: A Powerful Combination
  25. Employee Screening: A Delicate Balance
      The cost of mistaken identity (incomplete screening)
      • Class action suits have been filed over erroneous sex offender reporting
      • Digital Reasoning’s Solution: Entity Resolution with Synthesys®
      On one side: Positions of trust, Safe workplace, Right hire the first time
      On the other side: Employee privacy, Libel: Impact = job loss, EEOC / FCRA
  26. Business Partner Screening: Avoiding Legal and Reputation Risk
      • Do we have the right person? (Nicknames, misspellings, etc.)
      • Do we know who is connected with this company?
      • What about Foreign Language Sources?
      Anti Money Laundering, Political Corruption, Foreign Corrupt Practices Act, Terrorist Financing, Suspicious Activity Report
      Increasingly, the sources of the information you need are in unstructured web content
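
The "right person" question above, with its nicknames and misspellings, is often approached with approximate string matching. The sketch below uses Python's standard `difflib.SequenceMatcher`; the `NICKNAMES` table, the helper names and the 0.85 threshold are assumptions chosen for illustration, and a match here only flags a candidate for human review.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real tables are far larger.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def expand(name: str) -> str:
    """Lowercase a name and replace known nicknames with formal forms."""
    return " ".join(NICKNAMES.get(token, token) for token in name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 after nickname expansion."""
    return SequenceMatcher(None, expand(a), expand(b)).ratio()


def same_person_candidate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two names as a possible match worth reviewing."""
    return similarity(a, b) >= threshold


print(same_person_candidate("Bill Smith", "William Smith"))
print(same_person_candidate("William Smyth", "William Smith"))
print(same_person_candidate("William Smith", "Robert Jones"))
```

A lower threshold catches more misspellings but also more unrelated people, which is the mistaken-identity risk raised on the previous slide.
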
  27. Thorn: Working to End Child Sexual Exploitation
  28. Thorn Overview
      Thorn’s focus: The role technology plays in crimes involving the sexual exploitation of children.
      Thorn’s goal: To disrupt and deflate predatory behavior in the fight to end child sexual exploitation.
      Thorn creates tools, policies and programs to bring an end to illicit activities that could harm children.
      Technology Task Force consists of over 25 top tech companies that collaborate on technology initiatives to fight child sexual exploitation.
      Works closely with law enforcement, NGOs, private sector and its Technology Task Force
      Part of the White House’s Office of Science and Technology’s commitment to end trafficking
      Claire Schmidt is the Director of Programs for Thorn
  29. The Challenge
      • The explosive growth of online media has made it more difficult to monitor and identify illicit activities, including child sex trafficking
      • Traditional analytics tools do a poor job of monitoring these forms of online media
      • Data is “messy” and unstructured
      • Data is often false
      • Real age is difficult to determine from online data content
      • Law enforcement has few, if any, tools to combat this problem
  30. Combating Sex Trafficking: Project Overview
      • Thorn desired to determine the feasibility of using advanced text analytics to detect child sex trafficking in online media.
      • Connotate built a process to automatically download data from selected websites.
      • Digital Reasoning developed analytics to detect potential child sex trafficking activity within the collected data.
      Widely Varied Data Sources → Data Aggregation and Cleansing → Analytics, resolution, and pattern matching → Analytics results, reports, charts
  31. Project Methodology
      Interview Law Enforcement Experts
      • Interviewed Law Enforcement officials and determined three major focal points for automated understanding
      Isolate and Map Semantic Features
      • Interview results were mapped into semantic features (“signatures”)
      Develop models for use in Synthesys
      • Analytic models were created by training Synthesys on the semantic signatures
      Identify sources of Internet data
      • Then configured into Connotate for automated collection, cleansing and transformation of data
  32. Key Innovative Developments
      • Accurate telephone number extractor
      • Unique profiles for people posting ads
      • Analytic assessment models for text
      • Achieved High Level of Accuracy
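
The slides do not describe how the project's telephone number extractor works. Purely as an illustration of the problem, the sketch below pulls 10-digit North American numbers out of free text, including numbers partly spelled out in words. The pattern, the `DIGIT_WORDS` table and the function name are assumptions, the sample numbers are fictional, and the word substitution can produce false positives on ordinary prose.

```python
import re

# Hypothetical mapping for digits written out as words.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

WORD_RE = re.compile(r"\b(" + "|".join(DIGIT_WORDS) + r")\b", re.IGNORECASE)

# Optional leading country code 1, then ten digits that may be separated
# by spaces, dots, hyphens or parentheses.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?1[\s.\-]*)?\(?\d(?:[\s.\-()]*\d){9}(?!\d)")


def extract_phone_numbers(text: str):
    """Return the 10-digit numbers found in text, digits only."""
    # Step 1: turn spelled-out digits into numerals.
    digits_text = WORD_RE.sub(lambda m: DIGIT_WORDS[m.group(1).lower()], text)
    # Step 2: find digit runs, then strip separators and any country code.
    numbers = []
    for match in PHONE_RE.finditer(digits_text):
        digits = re.sub(r"\D", "", match.group())
        numbers.append(digits[-10:])
    return numbers


sample = "Call 732-555-0147 or (732) 555 0199, text seven three two 555 oh one two three"
found = extract_phone_numbers(sample)
print(found)
```

Reducing every match to bare digits gives a stable key, which is one plausible way to link separate postings that share a number.
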
  33. Web Data Collection and Advanced Analytics
      Collect Data
      • Unstructured and Structured Data
      • Variety of Sources
      • Scalable
      • Automated
      Transform
      • Remove Formatting
      • Text Only
      Advanced Analytics
      Outputs
      • Reports
      • Dashboards
      • Workflow
      • BI Plug-ins
      Connotate provides precise quality data, formatted for delivery to your analysis tools
      Digital Reasoning applies advanced analytics to resolve identities, enrich data and develop unique profiles of individuals targeted for investigation
  34. Web Data Can Reveal Insights of Tremendous Value
      • Valid insights require precise, quality data
      • Avoid mistaken identity with entity resolution
      • Automation is the key to extracting precise, quality data
      • Obtain a deeper understanding of partner operations and key relationships
  35. Q & A
      Connotate will email a link to this presentation as well as a copy of the slides to you within 2 business days.
      If you would like to use an advanced Web data collection solution to support in-house background checks of employees or business partners, please call (+1) 732-296-8844 or visit www.connotate.com or www.connotate.co.uk
      For more information about law enforcement applications and advanced analytics, please visit www.digitalreasoning.com.
  36. Thank You
      If you have an immediate need and would like us to contact you about a forthcoming project, please check the appropriate box in the last polling question or call (+1) 732-296-8844.
      For more information, visit www.connotate.com or www.connotate.co.uk and www.digitalreasoning.com