Neo4j GraphTalk Oslo - Graph Analytics at DNB

Neo4j GraphTalk Oslo 2019
Aiko Yamashita, DNB

Published in: Software


  1. Graph Analytics at DNB: Exploring the frontiers of Advanced Analytics with the help of Network Analysis and Neo4j. Aiko Yamashita, PhD, Data Scientist, Center of Excellence in Advanced Analytics
  2. 2018: PROTECT, 2019: OPERATIONALIZE, 2020: GROW
  3. DNB IS ROLLING OUT A WIDE SET OF TECHNOLOGIES... FROM DIGITAL BEHAVIOR COLLECTION…
  4. TO OUR CLOUD-BASED (AWS) INSIGHT PLATFORM TO…
  5. HARNESSING NEO4J TO UNDERSTAND OUR CUSTOMERS AND THEIR CONNECTIONS
  6. Networks are visual and intuitive. Tables need to be traversed.
     Sender      Account      Receiver    Account 2  Amount
     Sindre      38387        Karl Aksel  123456     50 000
     DNB         9918181818   Karl Aksel  123456     70 000
     Karl Aksel  123456       Åste        34567      10 000
     Schibsted   929237333    Åste        34567      85 000
     Åste        34567        Elsa        4567       100
     Elsa        4567         Gustav      6789       20
     Karl Aksel  123456       Gustav      6789       100
     Sindre      38387        Gustav      6789       200
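To illustrate the contrast between scanning a table and following connections, here is a minimal Python sketch that loads the transfers from the slide into an adjacency list and follows the money from one customer to another. The function names (`build_graph`, `shortest_path`) are hypothetical and not taken from the talk; the data is the slide's own table with account numbers omitted.

```python
from collections import defaultdict, deque

# Transfers copied from the slide's table: (sender, receiver, amount).
TRANSFERS = [
    ("Sindre", "Karl Aksel", 50_000),
    ("DNB", "Karl Aksel", 70_000),
    ("Karl Aksel", "Åste", 10_000),
    ("Schibsted", "Åste", 85_000),
    ("Åste", "Elsa", 100),
    ("Elsa", "Gustav", 20),
    ("Karl Aksel", "Gustav", 100),
    ("Sindre", "Gustav", 200),
]


def build_graph(transfers):
    """Turn table rows into a directed adjacency list: sender -> [receivers]."""
    graph = defaultdict(list)
    for sender, receiver, _amount in transfers:
        graph[sender].append(receiver)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the shortest chain of transfers or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


GRAPH = build_graph(TRANSFERS)
print(shortest_path(GRAPH, "Sindre", "Elsa"))
```

Answering the same question against the flat table would need one self-join per hop, which is the point the slide makes about tables needing to be traversed.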
  7. Fraud, Personalisation
  8. Default Prediction, Churn Prediction
  9. Example 1: Fraud detection. Synthetic identities
  11. NORMAL QUERIES IN SQL ARE UNSUSTAINABLE IF THE COMPLEXITY IN RELATION EXPLODES! A ring of N people (N ≥ 2) sharing M elements of data (such as name, date of birth, phone number, address, account number, company id, tax id, etc.) can create up to $N^M$ synthetic identities, where each synthetic identity is linked to $M \times (N-1)$ other nodes, for a total of $(N^M \times M \times (N-1)) / 2$ relationships.
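The counting argument on this slide can be checked with a few lines of Python. This is a small sketch that takes the identity count as N raised to the power M; the function name `synthetic_ring_size` and the example values of N and M are illustrative assumptions, not figures from the talk.

```python
def synthetic_ring_size(n_people, m_elements):
    """Return (identities, links_per_identity, relationships) for a fraud ring.

    Assumes the slide's formula: N**M identities, each linked to M * (N - 1)
    other nodes, with every relationship counted once (hence the division by 2).
    """
    if n_people < 2 or m_elements < 1:
        raise ValueError("need N >= 2 people and M >= 1 shared data elements")
    identities = n_people ** m_elements
    links_per_identity = m_elements * (n_people - 1)
    # N**M * (N - 1) is always even, so integer division is exact here.
    relationships = identities * links_per_identity // 2
    return identities, links_per_identity, relationships


# Two people sharing three data elements (illustrative values).
print(synthetic_ring_size(2, 3))  # (8, 3, 12)
# Three people sharing three data elements (illustrative values).
print(synthetic_ring_size(3, 3))  # (27, 6, 81)
```

Because the identity count grows exponentially in M, adding a single shared data element multiplies the number of identities by N.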
  12. NORMAL QUERIES IN SQL ARE UNSUSTAINABLE IF THE COMPLEXITY IN RELATION EXPLODES! 2187 nodes, 19 683 relations
  13. SELECT DISTINCT co_owner.name
      FROM person AS capo
      JOIN prop AS prop1 ON prop1.person_id = capo.id
      JOIN prop AS prop2 ON prop2.company_id = prop1.company_id
      JOIN … JOIN … JOIN … JOIN … JOIN … JOIN … JOIN … JOIN …
      JOIN person AS co_owner ON prop2.person_id = co_owner.id AND co_owner.id <> capo.id
      WHERE capo.name = 'Pablo Escobar';
  14. MATCH (n:Person)-[*]-(o)
      WITH n, count(DISTINCT o) AS size
      WHERE size > 2
      RETURN n
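For readers without a graph database at hand, the idea behind this query (flag every person connected, over any number of hops, to more than two other nodes) can be approximated in plain Python with a union-find structure. This is only a sketch under stated assumptions: the edge list, the `person:` prefix convention, and the function names are hypothetical, and it is not how Neo4j evaluates the pattern.

```python
from collections import Counter

# Hypothetical person-to-identifier links; none of this data is from the talk.
EDGES = [
    ("person:A", "phone:555-0100"),
    ("person:B", "phone:555-0100"),
    ("person:B", "address:1 Example Street"),
    ("person:C", "address:1 Example Street"),
    ("person:D", "phone:555-0199"),
]


def find(parent, node):
    """Return the representative of node's component, compressing the path."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def flag_connected_persons(edges, threshold=2):
    """List persons whose component holds more than `threshold` other nodes."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
    sizes = Counter(find(parent, node) for node in parent)
    return sorted(
        node
        for node in parent
        if node.startswith("person:")
        and sizes[find(parent, node)] - 1 > threshold
    )


print(flag_connected_persons(EDGES))
```

Persons A, B and C end up in one component through the shared phone number and address, while D, connected only to its own phone number, is not flagged.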
  15. EXAMPLE 2: DEFAULT PREDICTION
  17. We predict the default probability of a node based on its neighbors. We calculate an edge weight via tanh or the mutual information rate. We use a normalization factor. We complement this score with accounting-related variables.
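The slide does not give the actual formula, so the following Python sketch is only one plausible reading of it: each edge weight is the tanh of a scaled transaction amount, and a node's score is the weighted average of its neighbors' default probabilities, with the sum of weights as the normalization factor. The amount-based weighting, the `scale` parameter, and the function names are all assumptions made for illustration; the accounting-related variables mentioned on the slide are not modeled.

```python
import math


def edge_weight(amount, scale=10_000.0):
    """Squash a transaction amount into [0, 1) with tanh (assumed weighting)."""
    return math.tanh(amount / scale)


def neighbor_default_score(neighbors, scale=10_000.0):
    """Weighted average of neighbors' default probabilities.

    `neighbors` is a list of (amount, neighbor_default_probability) pairs.
    The sum of edge weights serves as the normalization factor.
    """
    weights = [edge_weight(amount, scale) for amount, _prob in neighbors]
    normalization = sum(weights)
    if normalization == 0:
        return 0.0  # no informative neighbors
    weighted = sum(w * prob for w, (_amount, prob) in zip(weights, neighbors))
    return weighted / normalization


# Hypothetical neighborhood: a large exposure to a risky counterparty
# and a small exposure to a safe one.
score = neighbor_default_score([(50_000, 0.30), (200, 0.01)])
print(round(score, 3))
```

With tanh, very large amounts saturate near a weight of 1, so a single huge transaction cannot dominate the score without bound.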
  18. EXAMPLE 3: UNDERSTANDING OUR CUSTOMERS
  19. WHO ARE THE PEOPLE WHO SIT ON THE MOST STEERING BOARDS?
      MATCH (p:Company)
      WITH p, SIZE((p)-[:BOARD]->()) AS cnt
      ORDER BY cnt DESC LIMIT 5
      MATCH (p)-[:BOARD]->(c)
      RETURN p, c, cnt;
  20. WHO IS A TOP INFLUENCER IN THE NETWORK? Closeness centrality, PageRank, centroids
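Of the measures named on this slide, PageRank is the easiest to sketch in a few lines. The Python below is a textbook power-iteration version on a tiny hypothetical graph; the graph, the damping factor of 0.85, and the iteration count are illustrative assumptions, and the talk does not say how DNB computes the measure.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict of node -> list of target nodes."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            targets = graph.get(node, [])
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical network: A, B and D all point to C, and C points back to A.
NETWORK = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
RANKS = pagerank(NETWORK)
print(max(RANKS, key=RANKS.get))
```

The node with the highest rank is the one that many others point to, which is the intuition behind using PageRank to find influencers.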
