A view of graph data usage by Cerved

A brief introduction to Cerved's data, the role of the data scientist at Cerved, and how a data scientist can take advantage of a graph database.

Bio:

Stefano Gatti: Born in 1970, he has been involved for more than 15 years in several big data and technology-driven projects at leading business information companies such as Lince and Cerved. He is very fond of agile methodologies and tries to apply them at all organizational levels. In recent years he has been strongly engaged in spreading innovation within Cerved and in taking advantage of new big and smart data technologies, especially from a business usage perspective. Datatelling, open innovation, and partnership with smart actors of the worldwide data-driven innovation ecosystem are his current mantra.

Nunzio Pellegrino: Data Scientist at Cerved, part of the Innovation team, focused on extracting value from data and solving problems with the latest technologies available. I have a degree in Statistics with a background in Machine Learning. I have worked primarily on Data Integration and Business Intelligence projects for 3 years. At the moment, I'm product owner of a web application based on GraphDB and involved in Italian Open Data projects. I'm an R enthusiast, a Python practitioner, and fascinated by the graph ecosystem.

Published in: Technology

  1. 1. A view of graph data usage by Cerved, with a data science perspective. 25 September 2017. Stefano Gatti – Head of Innovation and Data Sources; Nunzio Pellegrino – Senior Data Scientist – Innovation team
  2. 2. Cerved and its graphs in a nutshell
  3. 3. Cerved, in a nutshell. The Italian data-driven company. CREDIT INFORMATION: protection against credit risk. MARKETING SOLUTIONS: new business opportunities. CREDIT MANAGEMENT: manage and collect performing and non-performing loans. Documents: over 1,000 a minute. Lines of code: over 40 million. Customers: over 30,000. Data sources: over 50 different. API calls: over 10 million a day. People: over 1,900. Revenue: 377 million EUR (2016).
  4. 4. Our big data (axis: Complexity): Web data; Open data; Proprietary data; Official data; Chamber of Commerce official data
  5. 5. Cerved, in a tech view: Data, Algorithms, Solutions. Towards algorithmic economy …
  6. 6. Cerved Graph Story, 2011-12: we started from an IT problem, the reengineering of the beneficial owner algorithm
  7. 7. Cerved Graph Story, 2014-15: we went through a more algorithmic problem, the corporate linkages algorithm
  8. 8. Cerved Graph Story, 2015-16: we go with a “full stack” solution
  9. 9. Cerved Graph thoughts. We strongly believe in: the power of linking data; the power of analyzing data with network analysis; the power of visualizing data in a different way. To understand a little better the increasing complexity of the modern world … also from an economic point of view
  10. 10. Why a Graph Database?
  11. 11. What is a Graph?
  12. 12. Key Concepts. Graph database: a NoSQL database; manages highly connected data and complex queries; flexible data model
  13. 13. Key Concepts. Graph database: declarative or imperative language; horizontal scaling; graph-native storage and processing
  14. 14. Where can a graph database be useful? “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron
  15. 15. Maybe in the future… “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron
  16. 16. Frame the Problem. Data Model: (1) Simple, (2) Expressive, (3) Additive
  17. 17. Data Model: RDBMS vs Graph
  18. 18. Store & Get Data. Store Data: native graph storage; fast write performance; easy data integration (CSV, JDBC, REST API)
  19. 19. Store & Get Data. Store Data: native graph storage; fast write performance; easy data integration (CSV, JDBC, REST API). Get Data: native graph processing → index-free adjacency; Cypher, a declarative language; drivers: Python py2neo (unofficial), R (unofficial), Java; APOC
  20. 20. Explore Data. Transform implicit to explicit. Cypher (access points, pattern)
  21. 21. Explore Data. Transform implicit to explicit. Cypher (access points, pattern)
  22. 22. Explore Data. Transform implicit to explicit. Cypher (access points, pattern)
  23. 23. Prepare Data. Feature creation with parallel graph algorithms. Centralities: PageRank, Betweenness Centrality, Closeness Centrality. Graph Partitioning: Label Propagation, Connected Components, Strongly Connected Components. Path Finding: Minimum Weight Spanning Tree, All Pairs and Single Source Shortest Path
  24. 24. Prepare Data. Feature creation with parallel graph algorithms. Centralities: PageRank, Betweenness Centrality, Closeness Centrality. Graph Partitioning: Label Propagation, Connected Components, Strongly Connected Components. Path Finding: Minimum Weight Spanning Tree, All Pairs and Single Source Shortest Path. Performance:

       | Graph | Size (GB) | Nodes (M) | Rels (M) | PageRank (s) | ConCom (s) | LabelPropag (s) | StrongConCom (s) |
       |---|---|---|---|---|---|---|---|
       | Pokec | 7.3 | 2 | 31 | 10 | 24 | 12 | 12 |
       | DBPedia | 15 | 11 | 117 | 46 | 91 | 51 | 65 |
       | Graphs500-23 | 7.9 | 5 | 129 | 19 | 29 | 18 | 25 |
       | Twitter-2010 | 49 | 42 | 1468 | 349 | 353 | 405 | 339 |
       | soc-LifeJournal1 | 6.3 | 5 | 69 | 30 | 34 | 25 | 23 |
       | Friendster | 62 | 66 | 1806 | 611 | 619 | 296 | 483 |

  25. 25. Present & Launch your solution: Real-time Recommendation, Fraud Detection, Social Network Analysis, Search & Link Analysis, Knowledge Graph, Natural Language Processing
  26. 26. Nunzio Pellegrino, Senior Data Scientist – Innovation Team, nunzio.pellegrino@cerved.com. Stefano Gatti, Head of Innovation & Data Sources, stefano.gatti@cerved.com
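
Slides 17 and 19 contrast the RDBMS data model with the graph model and mention index-free adjacency. The following is a minimal Python sketch of that contrast, not a description of how any particular database is implemented: the company names and the `OWNS_ROWS` table are hypothetical. It answers the same question ("which companies sit below this owner?") once by rescanning a table of rows at every hop, as a chain of self-joins would, and once by following neighbour lists stored directly on each node.

```python
from collections import defaultdict, deque

# Hypothetical (owner, owned) rows, as they might sit in a relational table.
OWNS_ROWS = [
    ("HoldCo", "AlphaSpa"),
    ("HoldCo", "BetaSrl"),
    ("AlphaSpa", "GammaSrl"),
    ("GammaSrl", "DeltaSrl"),
    ("OtherCo", "EpsilonSrl"),
]


def reachable_by_scans(rows, start):
    """Relational style: every hop rescans the whole table (a self-join)."""
    found, frontier = set(), {start}
    while frontier:
        owned_next = {owned for owner, owned in rows if owner in frontier}
        frontier = owned_next - found  # keep only newly discovered companies
        found |= frontier
    return found


def build_adjacency(rows):
    """Graph style: each node keeps direct references to its neighbours."""
    adjacency = defaultdict(list)
    for owner, owned in rows:
        adjacency[owner].append(owned)
    return adjacency


def reachable_by_traversal(adjacency, start):
    """Breadth-first traversal that only touches the neighbours of visited nodes."""
    found, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in found:
                found.add(neighbour)
                queue.append(neighbour)
    return found


if __name__ == "__main__":
    print(sorted(reachable_by_scans(OWNS_ROWS, "HoldCo")))
    print(sorted(reachable_by_traversal(build_adjacency(OWNS_ROWS), "HoldCo")))
```

Both functions return the same set; the difference is the work per hop. The scan version reads every row at each level, while the traversal version reads only the neighbour lists of the nodes it visits, which is the intuition behind index-free adjacency.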
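
Slides 20 to 22 summarize data exploration as "transform implicit to explicit". One plausible reading, sketched here in plain Python under stated assumptions, is materializing a relationship that is only implied by the data: two companies that share a director become directly linked. The people, companies, and the `DIRECTOR_OF` / `SHARES_DIRECTOR` names are hypothetical and do not come from the slides; the Cypher in the comment is an illustrative equivalent pattern, not a query from the talk.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, company) directorships.
DIRECTOR_OF = [
    ("Rossi", "AlphaSpa"),
    ("Rossi", "BetaSrl"),
    ("Bianchi", "BetaSrl"),
    ("Bianchi", "GammaSrl"),
    ("Verdi", "DeltaSrl"),
]

# An assumed Cypher equivalent of the pattern below (labels are hypothetical):
#   MATCH (a:Company)<-[:DIRECTOR_OF]-(p:Person)-[:DIRECTOR_OF]->(b:Company)
#   WHERE id(a) < id(b)
#   MERGE (a)-[:SHARES_DIRECTOR]->(b)


def shared_director_edges(directorships):
    """Turn the implicit company-person-company pattern into explicit edges.

    Returns a dict mapping an ordered company pair to the set of people
    who sit on both boards.
    """
    companies_by_person = defaultdict(set)
    for person, company in directorships:
        companies_by_person[person].add(company)

    edges = defaultdict(set)
    for person, companies in companies_by_person.items():
        # Every pair of companies with a common director becomes one edge.
        for first, second in combinations(sorted(companies), 2):
            edges[(first, second)].add(person)
    return dict(edges)


if __name__ == "__main__":
    for pair, people in sorted(shared_director_edges(DIRECTOR_OF).items()):
        print(pair, sorted(people))
```

Once the link is explicit, later steps such as centrality or partitioning can run on company-to-company edges without re-deriving them from the person nodes each time.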
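
Slide 23 lists graph algorithms as a source of features for data preparation. As a rough illustration of two of them, the sketch below computes PageRank (power iteration) and weakly connected components (union-find) in pure Python on a toy edge list, then joins them into per-node features. It is a single-threaded teaching example, not the parallel implementation benchmarked on slide 24; the edge list, the damping factor of 0.85, and the fixed 50 iterations are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank of dangling nodes is spread uniformly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def connected_components(edges):
    """Weakly connected components via union-find with path halving."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


def node_features(edges):
    """Combine both results into one feature row per node."""
    rank = pagerank(edges)
    features = {}
    for index, component in enumerate(connected_components(edges)):
        for node in component:
            features[node] = {
                "pagerank": rank[node],
                "component": index,
                "component_size": len(component),
            }
    return features


# Hypothetical toy graph: directed edges between anonymous nodes.
TOY_EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "C"), ("E", "F")]

if __name__ == "__main__":
    for node, row in sorted(node_features(TOY_EDGES).items()):
        print(node, row)
```

The resulting rows can be joined to an ordinary feature table, so a downstream model sees graph structure as plain numeric columns.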
