Using Big Data to Find Anomalies and Fight Crime

Presented at FITC Toronto 2017
More info at http://fitc.ca/event/to17/

Richard Brath, Uncharted Software

Overview

Many crimes are difficult to track down: human trafficking, fake parts, counterfeit medication. But traces exist in ads, posts, shopping sites and elsewhere. Richard approached the problem as one related to advertising: in order to sell, someone has to be visible somewhere. He uses search indexes, big data, machine learning, language processing, and graph theory to facilitate search through this data, then adds an easy visualization-based interface that lets experts connect the dots. Investigators have used the approach to achieve successes in combating human trafficking. Richard will discuss high-level considerations of the end-to-end approach of such a system and the need for a simple interface that lets non-computer experts navigate the complexity.

Objective

An iterative approach to stitching together many emerging technologies to solve tough problems.

Target Audience

Anyone working on tough problems: end users, application architects, senior developers.

Assumed Audience Knowledge

Specific technologies will be avoided, although some advanced computational concepts will be discussed.

Four Things Audience Members Will Learn

Iterative design and technology approach to solving tough problems
Why standard approaches don’t work for some types of problems
Search, facets and graph analysis
Evolving a user interface and visualization for analysis

Published in: Data & Analytics

  1. Using Big Data to Find Anomalies and Fight Crime
      Richard Brath, April 24, 2017
      © 2017 Uncharted Software Inc.
  3. © Copyright 2000, The Nasdaq Stock Market, Inc. Reprinted with the permission of The Nasdaq Stock Market, Inc. Photo credit: Peter Aaron/Esto
  4. IS DAVID FREESE A CLUTCH HITTER?
      DAVID FREESE: 2011 WORLD SERIES MVP
      A LOOK AT ALL FREESE’S SUCCESSFUL HITS IN 2011 ON A 3-2 COUNT
  6. Using Big Data to Find Anomalies and Fight Crime
      1. Define problem
      2. Find data
      3. Index and organize useful data
      4. Visual interfaces to solve problem
  7. Fatal Fakery: Fake Pharmaceuticals
      o Intent to deceive regarding origin, authenticity or effectiveness.
      https://nakedsecurity.sophos.com/2013/01/23/hard-viagra-spam/
  8. Fatal Fakery: Fake Pharma world-wide
      o 2012: 100 patients die from poor quality tuberculosis drug in Pakistan
      o 2013: 8,000 patients die because antibiotic has no active ingredient in India
      o 64% of antimalarial drugs in Nigeria are fake
      o Estimates of 100,000 to 1,000,000 deaths annually from fake drugs
      http://www.newsweek.com/2015/09/25/fake-drug-industry-exploding-and-we-cant-do-anything-about-it-373088.html
  9. Fatal Fakery: Fake Pharma in North America
      o Tainted steroids cause 11 deaths in Boston
      o Contaminated blood thinner linked to 149 deaths in USA
      o Vials of cancer drug found with no active ingredients
      o 11 people indicted for $42m conspiracy to sell counterfeit Lipitor
      http://www.economist.com/node/21564546
  10. Fatal Fakery: Aircraft Parts
      o Partnair Flight 394 crash due to fake parts
      o Old parts refurbished and sold as new
      o 100+ convictions for fake parts
      https://www.defensetech.org/2012/05/22/fake-parts-are-everywhere/
      https://www.theguardian.com/business/2002/jan/29/theairlineindustry.internationalnews
      https://www.wired.com/2008/04/fake-parts-in-a/
  11. Fatal Fakery: Aircraft Parts
      o Even in Canada
      http://www.cbc.ca/news/politics/fake-parts-in-hercules-aircraft-called-a-genuine-risk-1.1345862
  12. Fatal Fakery: Fake Airbags
      o Fail to deploy, only partially deploy or cause fires!
      o One person sells 7000 fake airbags
      o 10,000-250,000 cars in US estimated to have fake airbags
      http://www.dailymail.co.uk/news/article-2221976/Fake-airbags-Man-27-1-4-million-selling-counterfeit-airbags-failed-deploy.html
      http://www.cnn.com/2012/10/10/us/counterfeit-airbags/index.html
  13. Using Big Data to Find Anomalies and Fight Crime
      1. Define problem
      2. Find data
      3. Index and organize useful data
      4. Visual interfaces to solve problem
  14. How to Track Fake Goods?
      o A single medication might legitimately pass through a dozen countries during the manufacturing process
      o E.g. chemical from China with filler from India, packaged in Mexico, sold in Canada to a patient in USA
      o E.g. online pharmacy in Brazil involves a bank in Azerbaijan and a commerce server in Russia to buy drugs made in India
      o Difficult to track each and every step in the supply chain: it crosses many national boundaries and involves many different kinds of data.
  15. How to Track Fake Goods?
      o Fake meds and fake parts need a way to sell
      o Many sell on the Internet:
          - Classified ad sites
          - Social media
          - Paid ads
          - Text, photos, multimedia
          - Websites
          - Etc.
      o Many kinds of illegitimate selling activity on the Internet:
          - Illegal weapons
          - Human trafficking
          - Stock promotions
          - Endangered species items
          - Fake iPhones
          - Credit cards
          - Identities
  16. Using Big Data to Find Anomalies and Fight Crime
      1. Define problem
      2. Find data
      3. Index and organize useful data
      4. Visual interfaces to solve problem
  17. Ads as data? Poor quality data
      • Freeform data: Maybe an email, maybe an address
      • Embedded data: Data might be in an image
      • Obscure data: 1-800-Spelled or 1-eight8eight-mixed-11-extra
      • One organization might make many different ads
      • One person might move around: same person in different cities
      • One phone number might be more than one business
      • Btw, most ads are legit!
      • And there are millions of ads!
      Sample ads for illustrative purposes only. No structure to data, different spelling/cases. Also, images may contain metadata or may be clipart.
  18. Ads as Data
      1. Index metadata
          - First name, Phone number, Item, Stock screenshot, Seller
          - Name, Website, Address, Item List with prices, Seller
          - Phone number, Item, Buyer
          - Name, Phone, Address, Item List, Seller
      2. Make Links: Same address, Same phone
      3. Aggregate into Clusters: Same Company? Same Person?
      4. User Interface: Search? Visualization? Summaries? Filtering?
      Sample ads for illustrative purposes only.
  19. Using Big Data to Find Anomalies and Fight Crime
      1. Define problem
      2. Find data
      3. Index and organize useful data
      4. Visual interfaces to solve problem
  20. Search? Traditional search has problems
      o Top 10 results, then lots of scrolling
      o Legitimate things more likely at the top
      o It’s just all the individual items
  21. Graph Analytics
      o Instead of top 10 items
      o Link items that are connected
      o A group of connected items is a unique group
  22. Graph Visualization
      o This is 10 years of my email.
      o Circles are people to whom I sent > 10 emails
      o Lines are cc’s
      o Red circles: recent
      o Size: # of emails
      o Groups: communities of related people
      o All together…
      Graph labels: Chief dev, Current project, School admin, Convener, Current team, Old team
  23. Graph Visualization
      Challenging to scale up to 10,000, 100,000, 1,000,000, 1m+…
      Example graphs: 500 romantic relationships, 1500 hardware store items, 20,000 movie actors
  24. Graph Analysis and Visualization
      Wiley book by Uncharted authors Richard Brath and David Jonker (Uncharted Partners). Many examples of different types of social graph analysis, e.g. local importance, community, topics, evolution:
      INFLUENCER ANALYSIS
          1. Marketing email: track forwards.
          2. See who forwards the most, where tenuous connections are, long chains.
              - Size: people who share widely (degree)
              - Centrality: text size indicates key people required to reach across the network (betweenness)
              - Color: average distance to other nodes (closeness)
      CONVERSATION ANALYSIS
          1. Toronto Raptors fan data via Facebook.
          2. Topic extraction to identify references to player name.
          3. Sentiment extraction to identify positive/negative comments.
          4. Visually group clusters of conversations regarding players (by jersey number), and color (to indicate sentiment).
          5. Click to select, e.g. the largest negative conversations and associated side conversations.
      EVOLUTION ANALYSIS
          Grow, fade, evolve, new hubs.
      COMMUNITY ANALYSIS
          1. Email traffic, use cc’s and fwd’s to establish connections.
          2. Weight by number of emails, color by recency.
          3. Select subgraphs, replot and label.
          4. Large dots are frequent; those in between are key influencers that bridge across communities.
  25. Graph Visualization: Scale via Interaction!
  26. Scale via big visualization (Uncharted SALT)
  27. Example: 100m Ads
      o Built a front-end for searching through 100m ads.
      o The starting point is a dashboard; the most typical route is to start with a search.
  28. First Graph Visualization
  29. Link Types
      o Many, many link types
      o Gets cluttered fast
      o Becomes difficult to decipher
      o Hard to navigate
  30. Dense Graph Problem
      When you start indexing lots of metadata (name, address, city, website, date, text similarity), eventually you’ll have everything connected:
      1. There can be thousands or hundreds of thousands of adjacent entities.
      2. Which links do you want to connect on: Phone number x? Email address y?
      3. For clusters, can I easily tell what type of link connects them? What the value is of the link that connects them?
  31. Ads as Data
      1. Index metadata
          - Name, Website, Address, Item List with prices, Seller
          - First name, Item, Stock screenshot, Seller
          - Phone number, Item, Buyer
          - Name, Phone, Address, Item List, Seller
      2. Make Links
          - Both have Any Street, NYC
          - Both Mr. Biv
      3. Aggregate into Clusters
          - Same Company? Same address
          - Same Person? Same name
      4. User Interface: Search, Metadata, Links, Clusters
          - Focus on the metadata: what are the differences
          - Focus on the links: interact by link types
          - Focus on the aggregate: what makes a cluster
      Sample values: 1234 Any Street, NYC; 1234 Any Street, NYC; 1245 Any Street, NYC; 215 Wisteria Lane, Hoboken; John Doe; Roy G. Biv; Ernest Biv; Ernest Biv
  32. Next Version
      o Focus on the metadata: what are the differences
      o Focus on the links: interact by link types
      o Focus on the aggregate: what makes a cluster
  33. Current production version
      Clusters, Individual Ads, Metadata
  34. So what?
      o Some people design visualizations just using “ready-made” components
      o In this case, a standard graph layout didn’t work.
      o Nor did a few other early prototypes
      • Design is required
      • Iterative approach
      • Driven by the data and the tasks
      Sample ready-made visualizations in D3.js. A good starting point, but not necessarily the right visual for the task.
  35. Thanks!
      Richard Brath, Uncharted Software
      rbrath<at>uncharted.software
      @rkbrath
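
Slide 17 points out that phone numbers in ads are often obscured, for example by mixing digits with spelled-out digits. The following is a minimal Python sketch of one way such fragments might be normalized before indexing. It is an illustration only, not the method used in the talk: the function name `normalize_phone` and the sample number are assumptions, and vanity numbers spelled with letters (the "1-800-Spelled" case) are not handled.

```python
import re

# Spelled-out digits that an ad might mix into a phone number (assumed list).
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
_DIGIT_WORD_RE = re.compile("|".join(DIGIT_WORDS), re.IGNORECASE)


def normalize_phone(fragment: str) -> str:
    """Return only the digits of a phone-like fragment, decoding digit words.

    Hypothetical helper: it should be applied only to text already believed
    to be a phone number, because ordinary words can contain digit words
    (for example "phone" contains "one").
    """
    decoded = _DIGIT_WORD_RE.sub(
        lambda match: DIGIT_WORDS[match.group(0).lower()], fragment
    )
    # Drop separators, spaces and any leftover letters.
    return re.sub(r"\D", "", decoded)


# Fictitious number in the style of the slide's "1-eight8eight-..." example.
print(normalize_phone("1-eight8eight-five5five-0one23"))  # 18885550123
```

Normalizing to a plain digit string means two ads that write the same number differently can later be linked on an exact match.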
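
Slides 18, 21 and 30 describe indexing ad metadata, linking ads that share a value, and treating each group of connected ads as a cluster, with the open question of which link types to connect on. The steps can be sketched in Python as connected components over a shared-value index. This is a sketch under stated assumptions, not the production system: the function `cluster_ads`, the field names, and the sample ads (names and addresses borrowed from slide 31, phone numbers fictitious) are all illustrative.

```python
from collections import defaultdict


def cluster_ads(ads, link_fields):
    """Group ads that are connected through any shared value in link_fields."""
    # 1. Index metadata: (field, value) -> ids of the ads carrying that value.
    index = defaultdict(list)
    for ad_id, ad in ads.items():
        for field in link_fields:
            value = ad.get(field)
            if value:
                index[(field, value)].append(ad_id)

    # 2. Make links: union-find merges ads that share an indexed value.
    parent = {ad_id: ad_id for ad_id in ads}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ad_ids in index.values():
        for other in ad_ids[1:]:
            parent[find(other)] = find(ad_ids[0])

    # 3. Aggregate into clusters: one cluster per connected component.
    clusters = defaultdict(list)
    for ad_id in ads:
        clusters[find(ad_id)].append(ad_id)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative sample ads only.
ADS = {
    "ad1": {"name": "John Doe", "phone": "5550100", "address": "1234 Any Street, NYC"},
    "ad2": {"name": "Roy G. Biv", "address": "1234 Any Street, NYC"},
    "ad3": {"name": "Ernest Biv", "phone": "5550199"},
    "ad4": {"name": "Ernest Biv", "address": "215 Wisteria Lane, Hoboken"},
    "ad5": {"name": "Jane Roe", "phone": "5550142"},
}

print(cluster_ads(ADS, ("phone", "address")))
print(cluster_ads(ADS, ("name", "address")))
```

Passing `link_fields` explicitly reflects the dense graph problem on slide 30: the more link types are switched on, the more ads merge into one component, so the choice of link types is left to the caller rather than fixed in the code.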
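
Slide 24 names three graph measures used in the influencer analysis: degree, betweenness, and average distance to other nodes (closeness). The Python sketch below shows how they might be computed for a small unweighted, undirected graph. It is an assumption-laden illustration rather than the code behind the slides: the graph is a plain dict of neighbour sets, the betweenness routine follows Brandes' algorithm, and the three-person forwarding chain is made up.

```python
from collections import deque


def degree(graph):
    """Number of direct connections per node."""
    return {v: len(neighbours) for v, neighbours in graph.items()}


def average_distance(graph):
    """Mean shortest-path length from each node to the nodes it can reach."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        others = [d for node, d in dist.items() if node != source]
        result[source] = sum(others) / len(others) if others else float("inf")
    return result


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):        # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: value / 2 for v, value in score.items()}


# Made-up forwarding chain: ann forwards to bob, bob forwards to cat.
CHAIN = {"ann": {"bob"}, "bob": {"ann", "cat"}, "cat": {"bob"}}
print(degree(CHAIN))
print(average_distance(CHAIN))
print(betweenness(CHAIN))
```

In the chain, the middle person has the highest betweenness because every path between the two ends passes through them, which matches the slide's description of "key people required to reach across the network".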
