(Big) Data Science

slides from my talk at WebExpo Prague 2013

Published in: Technology

  1. (Big) Data Science and a bit of Graph Theory, by Michal Bachman (GraphAware TM)
  2. Data Science: “the sexiest job in the 21st century” (Harvard Business Review)
  3. Data Science: by 2018 the United States could be short up to 190,000 people with the analytical skills ... to make wise use of virtual mountain ranges of data for critical decisions in business, energy, intelligence, health care, finance, and other fields. (McKinsey Global Institute, 2011)
  4. (no text on this slide)
  5. Data Scientist: “hybrid computer scientist/software engineer/statistician” (The Times)
  6. Big Data: a collection of data sets that are large and complex.
  7. Data Complexity is a function of size, connectedness, and uniformity.
  8. Network: a pattern of interconnections among a set of things.
  9. Networks: social ties; information we consume; technological and economic systems; ...
  10. Network: a pattern of interconnections among a set of things.
  11. Structure: who is linked to whom. Behaviour: implicit consequences of one’s actions for the outcomes of everyone in the system.
  12. Graph Theory is the study of network structure.
  13. (chart; only the axis ticks survive: 0 to 100.0 in steps of 25.0, and the years 2007 to 2010)
  14. Leonhard Euler
  15. Seven Bridges of Königsberg
  16. Graph Theory (diagram: nodes A, B, C, D)
  17. Graph Theory (diagram: nodes A, B, C, D)
  18. Graph Theory (diagram: nodes A, B, C, D)
  19. Connected Graph (diagram: nodes A, B, C, D)
  20. Connected Components (diagram: nodes A, B, C, D, E, F)
  21. Question: is the social network of the entire world connected?
  22. No. (probably :-))
  23. Giant Components
  24. Question: how many giant components are there in a large, complex network?
  25. Answer: 1. Why?
  26. Six Degrees of Separation: “I read somewhere that everybody on this planet is separated only by six other people. Six degrees of separation. Between us and everyone else on this planet.” (John Guare, Six Degrees of Separation: A Play)
  27. 2.9: average Bacon number for all performers in the IMDb.
  28. Graphs Are Everywhere: collaboration networks; who-talks-to-whom graphs; information linkage graphs; technological networks; natural world networks; transport networks; ...
  29. Motivations for Study: domain interest; proxy for a related network; look for domain-agnostic properties.
  30. Granovetter’s Experiment: people learned about new jobs through acquaintances rather than close friends.
  31. Triadic Closure (diagram: nodes A, B, C)
  32. Triadic Closure (two diagrams, each with nodes A, B, C)
  33. Triadic Closure: if two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
  34. Bridge (diagram: nodes A, B, C, D, E)
  35. Local Bridge (two diagrams: one with nodes A to E; one with nodes A to H, J, K)
  36. Strong Triadic Closure (two diagrams, each with nodes A, B, C)
  37. Local Bridge = Weak Tie (three diagrams: one with nodes A to E; two with nodes A to H, J, K)
  38. Structural Balance (diagram: nodes A, B, C)
  39. Structural Balance (three diagrams, each with nodes A, B, C)
  40. Structural Balance (two diagrams, each with nodes A, B, C)
  41. Structural Balance (four diagrams, each with nodes A, B, C)
  42. Structural Balance (two diagrams, each with nodes A, B, C)
  43. Structural Balance (two diagrams, each with nodes A, B, C, D)
  44. The Balance Theorem: if a labelled complete graph is balanced, then either all pairs of nodes are friends, or else the nodes can be divided into two groups, X and Y, such that each pair of people in X likes each other, each pair of people in Y likes each other, and everyone in X is the enemy of everyone in Y.
  45. The Balance Theorem (two diagrams, each with nodes A, B, C, D)
  46. Graph Partitioning
  47. Neo4j is an open-source, fully transactional graph database. It manipulates data in the form of a directed property graph with labelled vertices and edges.
  48. Neo4j (example property graph). Nodes: {name: "Drama", type: "genre"}; {name: "Thriller", type: "genre"}; {name: "Pulp Fiction", year: 1994, type: "movie"}; {name: "Quentin Tarantino", type: "person"}; {name: "Samuel L. Jackson", type: "person"}; {name: "Director", type: "occupation"}; {name: "Actor", type: "occupation"}. Relationships: DIRECTED; IS_OF_GENRE (twice); IS_A (three times); ACTED_IN (twice, with properties role: "Jules Winnfield" and role: "Jimmie Dimmick").
  49. Cypher Query Language: MATCH (a)-[:ACTED_IN]->(m)
  50. Cypher Query Language: MATCH (a)-[:ACTED_IN]->(m)
  51. Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m)
  52. Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m)
  53. Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m) ORDER BY count(m) DESC
  54. Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m) ORDER BY count(m) DESC LIMIT 5;
  55. Cypher Query Language (query result):
      ==> +-----------------------------+
      ==> | a.name           | count(m) |
      ==> +-----------------------------+
      ==> | "Tom Hanks"      | 12       |
      ==> | "Keanu Reeves"   | 7        |
      ==> | "Hugo Weaving"   | 5        |
      ==> | "Meg Ryan"       | 5        |
      ==> | "Jack Nicholson" | 5        |
      ==> +-----------------------------+
      ==> 5 rows
      ==>
      ==> 47 ms
  56. Thank You. www.graphaware.com, @graph_aware
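
The connectivity slides (19 to 25) can be illustrated with a short Python sketch. It is not taken from the talk: the function name `connected_components` and the six-node example graph are assumptions made here, loosely modelled on the A to F labels of the Connected Components slide. It finds the components of an undirected graph with a breadth-first search and then picks the largest one, which in a large real-world network would be the candidate giant component.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph as a list of sets.

    `adjacency` maps each node to the set of its neighbours.
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical example graph (the edges are invented for illustration):
# A, B, C, D hang together, while E and F form a separate pair.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
}

components = connected_components(graph)
largest = max(components, key=len)
print(len(components))  # 2
print(sorted(largest))  # ['A', 'B', 'C', 'D']
```

A graph is connected exactly when this function returns a single component, which is the property slide 21 asks about for the world's social network.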

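The Balance Theorem on slide 44 can also be checked mechanically. The sketch below is an illustration under stated assumptions, not the speaker's code: the names `is_balanced` and `two_factions`, the `+1`/`-1` edge encoding, and the four-person example are all invented here. It uses the standard definition that a labelled complete graph is balanced when every triangle has an even number of enemy edges (none or two), then builds the two groups X and Y by taking one person, their friends, and their enemies.

```python
from itertools import combinations

FRIEND, ENEMY = 1, -1  # assumed encoding of edge labels


def label(signs, u, v):
    """Look up the label of the undirected edge between u and v."""
    return signs[frozenset((u, v))]


def is_balanced(nodes, signs):
    """True if every triangle has zero or two enemy edges (positive sign product)."""
    return all(
        label(signs, a, b) * label(signs, b, c) * label(signs, a, c) > 0
        for a, b, c in combinations(nodes, 3)
    )


def two_factions(nodes, signs):
    """Split nodes into X (the first node plus its friends) and Y (its enemies).

    The split only has the property stated by the theorem when the graph is balanced.
    """
    first = nodes[0]
    x = {first} | {n for n in nodes[1:] if label(signs, first, n) == FRIEND}
    y = set(nodes) - x
    return x, y


# Hypothetical example: A and B are friends, C and D are friends,
# and every pair across the two groups are enemies.
nodes = ["A", "B", "C", "D"]
signs = {frozenset(pair): ENEMY for pair in combinations(nodes, 2)}
signs[frozenset(("A", "B"))] = FRIEND
signs[frozenset(("C", "D"))] = FRIEND

print(is_balanced(nodes, signs))  # True
print(two_factions(nodes, signs))

# Turning C and D into enemies creates the all-enemy triangle A, C, D.
unbalanced = dict(signs)
unbalanced[frozenset(("C", "D"))] = ENEMY
print(is_balanced(nodes, unbalanced))  # False
```

Checking every triangle takes time cubic in the number of nodes, which is fine for a slide-sized example but would need a smarter approach on a large network.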