(Big) Data Science
 


Slides from my talk at WebExpo Prague 2013.

Usage Rights

© All Rights Reserved

    (Big) Data Science: Presentation Transcript

    • (Big) Data Science and a bit of Graph Theory, by Michal Bachman (GraphAware TM)
    • Data Science: “the sexiest job in the 21st century” (Harvard Business Review)
    • Data Science: by 2018 the United States could be short up to 190,000 people with the analytical skills ... to make wise use of virtual mountain ranges of data for critical decisions in business, energy, intelligence, health care, finance, and other fields. (McKinsey Global Institute, 2011)
    • Data Scientist: “hybrid computer scientist/software engineer/statistician” (The Times)
    • Big Data: a collection of data sets that are large and complex.
    • Data Complexity: a function of size, connectedness, and uniformity.
    • Network: a pattern of interconnections among a set of things.
    • Networks: social ties, information we consume, technological and economic systems, ...
    • Structure: who is linked to whom. Behaviour: implicit consequences of one’s actions for the outcomes of everyone in the system.
    • Graph Theory: the study of network structure.
    • [Chart: vertical axis 0 to 100, horizontal axis 2007 to 2010; no title or series labels survive in the transcript]
    • Leonhard Euler
    • Seven Bridges of Königsberg
    • Graph Theory [diagram: nodes A, B, C, D]
    • Connected Graph [diagram: nodes A, B, C, D]
    • Connected Components [diagram: nodes A, B, C, D, E, F]
    • Question: is the social network of the entire world connected?
    • No (probably :-)).
    • Giant Components
    • Question: how many giant components are there in a large, complex network?
    • 1. Why?
    • Six Degrees of Separation: “I read somewhere that everybody on this planet is separated only by six other people. Six degrees of separation. Between us and everyone else on this planet.” (John Guare, Six Degrees of Separation: A Play)
    • 2.9: the average Bacon number for all performers in the IMDb.
    • Graphs Are Everywhere: collaboration networks, who-talks-to-whom graphs, information linkage graphs, technological networks, natural world networks, transport networks, ...
    • Motivations for Study: domain interest; proxy for a related network; look for domain-agnostic properties.
    • Granovetter’s Experiment: people learned about new jobs through acquaintances rather than close friends.
    • Triadic Closure [diagrams: nodes A, B, C]
    • Triadic Closure: if two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.
    • Bridge [diagram: nodes A, B, C, D, E]
    • Local Bridge [diagrams: nodes A, B, C, D, E; then nodes A, B, C, D, E, F, G, H, J, K]
    • Strong Triadic Closure [diagrams: nodes A, B, C]
    • Local Bridge = Weak Tie [diagrams: nodes A, B, C, D, E, F, G, H, J, K]
    • Structural Balance [diagrams: nodes A, B, C]
    • Structural Balance [diagrams: nodes A, B, C, D]
    • The Balance Theorem: if a labelled complete graph is balanced, then either all pairs of nodes are friends, or else the nodes can be divided into two groups, X and Y, such that each pair of people in X likes each other, each pair of people in Y likes each other, and everyone in X is the enemy of everyone in Y.
    • The Balance Theorem [diagrams: nodes A, B, C, D]
    • Graph Partitioning
    • Neo4j: an open-source, fully transactional graph database. It manipulates data in the form of a directed property graph with labelled vertices and edges.
    • Neo4j [property graph diagram]. Nodes: {name: "Pulp Fiction", year: 1994, type: "movie"}; {name: "Drama", type: "genre"}; {name: "Thriller", type: "genre"}; {name: "Quentin Tarantino", type: "person"}; {name: "Samuel L. Jackson", type: "person"}; {name: "Director", type: "occupation"}; {name: "Actor", type: "occupation"}. Relationships: DIRECTED; IS_OF_GENRE (2); IS_A (3); ACTED_IN (2), carrying the properties role: "Jules Winnfield" and role: "Jimmie Dimmick".
    • Cypher Query Language: MATCH (a)-[:ACTED_IN]->(m)
    • Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m)
    • Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m)
    • Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m) ORDER BY count(m) DESC
    • Cypher Query Language: START a=node(*) MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, count(m) ORDER BY count(m) DESC LIMIT 5;
    • Cypher Query Language (query result):
      ==> +-----------------------------+
      ==> | a.name           | count(m) |
      ==> +-----------------------------+
      ==> | "Tom Hanks"      | 12       |
      ==> | "Keanu Reeves"   | 7        |
      ==> | "Hugo Weaving"   | 5        |
      ==> | "Meg Ryan"       | 5        |
      ==> | "Jack Nicholson" | 5        |
      ==> +-----------------------------+
      ==> 5 rows
      ==>
      ==> 47 ms
    • Thank You. www.graphaware.com, @graph_aware
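
The Connected Graph, Connected Components and Giant Components slides can be illustrated with a small sketch. The transcript does not preserve the edges drawn on the slides, so the edge list below is hypothetical, as are the function names `build_adjacency`, `connected_components` and `is_connected`. The sketch assumes an undirected graph and uses a plain breadth-first search; it is one possible way to compute components, not necessarily what the talk demonstrated.

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components as a list of node sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        components.append(component)
    return components


def is_connected(adjacency):
    """A graph is connected when it has at most one component."""
    return len(connected_components(adjacency)) <= 1


# Hypothetical example: six nodes as on the slide, edges invented for illustration.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

graph = build_adjacency(nodes, edges)
components = connected_components(graph)
largest = max(components, key=len)

print(is_connected(graph))
print(sorted(sorted(c) for c in components))
print(sorted(largest))
```

In a large real-world network, the largest component found this way would be the candidate "giant component"; on this toy graph it is simply the bigger of the two groups.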
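
The Triadic Closure and Local Bridge slides can be sketched in a similar way. The slides do not spell out a definition of a local bridge, so the code assumes the usual textbook one: an edge whose two endpoints have no neighbour in common. The graph and the names `make_graph`, `open_triads` and `local_bridges` are hypothetical and only serve the illustration.

```python
from itertools import combinations


def make_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def open_triads(adjacency):
    """Pairs of non-adjacent nodes that share at least one neighbour.

    Triadic closure suggests these pairs are the likeliest future edges.
    """
    candidates = set()
    for neighbours in adjacency.values():
        for a, b in combinations(sorted(neighbours), 2):
            if b not in adjacency[a]:
                candidates.add((a, b))
    return candidates


def local_bridges(adjacency):
    """Edges whose endpoints have no common neighbour (assumed definition)."""
    bridges = set()
    for a in adjacency:
        for b in adjacency[a]:
            # a < b visits each undirected edge exactly once.
            if a < b and not (adjacency[a] & adjacency[b]):
                bridges.add((a, b))
    return bridges


# Hypothetical graph: a closed triangle A-B-C with a tail C-D-E.
graph = make_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

print(sorted(open_triads(graph)))
print(sorted(local_bridges(graph)))
```

Edges inside the triangle are not local bridges because their endpoints share a third friend, while the tail edges are, which matches the intuition behind "Local Bridge = Weak Tie".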
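
The Balance Theorem can also be checked mechanically on a small labelled complete graph. The sketch below assumes the standard definition of structural balance, which the transcript does not state explicitly: a triangle is balanced when the product of its edge signs is positive (three friendships, or one friendship and two enmities). The encoding of signs as +1 and -1, the sample data and the names `is_balanced` and `balance_partition` are hypothetical.

```python
from itertools import combinations


def sign(signs, a, b):
    """Look up the sign (+1 friend, -1 enemy) of the undirected edge a-b."""
    return signs[frozenset((a, b))]


def is_balanced(nodes, signs):
    """True when every triangle has a positive product of edge signs."""
    return all(
        sign(signs, a, b) * sign(signs, b, c) * sign(signs, a, c) > 0
        for a, b, c in combinations(nodes, 3)
    )


def balance_partition(nodes, signs):
    """Split a balanced graph into X (first node and its friends) and Y (the rest).

    Only meaningful when is_balanced(nodes, signs) is True.
    """
    first = nodes[0]
    x = {first} | {n for n in nodes[1:] if sign(signs, first, n) > 0}
    y = set(nodes) - x
    return x, y


# Hypothetical data: A and B are friends, C and D are friends, all others enemies.
nodes = ["A", "B", "C", "D"]
signs = {frozenset(pair): -1 for pair in combinations(nodes, 2)}
signs[frozenset(("A", "B"))] = +1
signs[frozenset(("C", "D"))] = +1

balanced = is_balanced(nodes, signs)
group_x, group_y = balance_partition(nodes, signs)
print(balanced, sorted(group_x), sorted(group_y))

# Turning C and D into enemies leaves triangle A-C-D with three enmities.
unbalanced_signs = dict(signs)
unbalanced_signs[frozenset(("C", "D"))] = -1
print(is_balanced(nodes, unbalanced_signs))
```

When the graph is balanced and not everyone is friends, the two sets returned are the groups X and Y from the theorem; if all pairs are friends, Y comes back empty.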