An Introduction to
    Neo4j
   Michal Bachman
    @bachmanm
Roadmap
•   Intro to NOSQL
•   Intro to Graph Databases
•   Intro to Neo4j
•   A bit of hacking
•   Current research
•   Q&A



Not Only SQL

Why NOSQL now?

   Driving trends




Trend 1: Data Size




Trend 2: Connectedness
[Figure: "Information connectivity" axis, with stages in ascending order: Text documents, Hypertext, Feeds, Blogs, UGC, Wikis, Tagging, Folksonomies, RDFa, Ontologies, GGG]
Trend 3: Semi-structured Data




Trend 4: Application Architecture (80’s)



                           Application




                               DB




Trend 4: Application Architecture (90’s)



                        App   App    App




                               DB




Application   Application   Application




    DB            DB            DB


Side note: RDBMS performance
 Salary List




Four NOSQL Categories




Key-Value Stores
• “Dynamo: Amazon’s Highly Available Key-
  Value Store” (2007)
• Data model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)
• Examples:
  – Riak, Redis, Voldemort
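
The "big scalable HashMap" picture can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `TinyKeyValueStore`, the fixed partition count, and the hash-based partitioning are hypothetical and are not how Dynamo, Riak, Redis or Voldemort are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a key-value store as one logical map split across partitions.
class TinyKeyValueStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    TinyKeyValueStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Pick the partition that owns a key; floorMod keeps the index non-negative.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // Returns null when the key is unknown, like Map.get.
    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    public static void main(String[] args) {
        TinyKeyValueStore store = new TinyKeyValueStore(4);
        store.put("user:42", "Michal");
        System.out.println(store.get("user:42"));
        System.out.println(store.get("user:43"));
    }
}
```

The only query is a lookup by key, which is what makes the model easy to spread across machines and also what makes it "simplistic" for complex data.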

Pros and Cons
• Strengths
  – Simple data model
  – Great at scaling out horizontally
     • Scalable
     • Available
• Weaknesses:
  – Simplistic data model
  – Poor for complex data


Column Family (BigTable)
• Google’s “Bigtable: A Distributed Storage
  System for Structured Data” (2006)
• Data model:
  – A big table, with column families
  – Map-reduce for querying/processing
• Examples:
  – HBase, HyperTable, Cassandra



Pros and Cons
• Strengths
  – Data model supports semi-structured data
  – Naturally indexed (columns)
  – Good at scaling out horizontally
• Weaknesses:
  – Unsuited for interconnected data




Document Databases
• Data model
  – Collections of documents
  – A document is a key-value collection
  – Index-centric, lots of map-reduce
• Examples
  – CouchDB, MongoDB




Pros and Cons
• Strengths
  – Simple, powerful data model (just like SVN!)
  – Good scaling (especially if sharding supported)
• Weaknesses:
  – Unsuited for interconnected data
  – Query model limited to keys (and indexes)
     • Map reduce for larger queries




Graph Databases
• Data model:
  – Nodes with properties
  – Named relationships with properties
  – Hypergraph, sometimes
• Examples:
  – Neo4j (of course), Sones GraphDB, OrientDB,
    InfiniteGraph, AllegroGraph



Pros and Cons
• Strengths
  – Powerful data model
  – Fast
     • For connected data, can be many orders of magnitude
       faster than RDBMS
• Weaknesses:
  – Sharding
     • Though they can scale reasonably well
     • And for some domains you can shard too!

Social Network “path exists” Performance
• Experiment:
  • ~1k persons
  • Average 50 friends per person
  • pathExists(a,b) limited to depth 4
  • Caches warm to eliminate disk IO

  Database              # persons   Query time
  Relational database   1000        2000 ms
  Neo4j                 1000        2 ms
  Neo4j                 1000000     2 ms
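
A depth-limited `pathExists(a,b)` check can be sketched as a breadth-first search. The code below is a plain-Java stand-in, not the Neo4j API and not the code used in the experiment; the class `PathExists` and its in-memory adjacency map are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory friendship graph; not the Neo4j API.
class PathExists {
    private final Map<String, Set<String>> friends = new HashMap<>();

    // Friendship is stored in both directions.
    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search that gives up after maxDepth hops.
    boolean pathExists(String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(from);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Collections.emptySet())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend); // first time seen: expand on the next level
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        PathExists graph = new PathExists();
        // A chain a-b-c-d-e-f: a to e is 4 hops, a to f is 5 hops.
        graph.addFriendship("a", "b");
        graph.addFriendship("b", "c");
        graph.addFriendship("c", "d");
        graph.addFriendship("d", "e");
        graph.addFriendship("e", "f");
        System.out.println(graph.pathExists("a", "e", 4));
        System.out.println(graph.pathExists("a", "f", 4));
    }
}
```

The search only ever touches nodes within four hops of the start, so its cost depends on the neighbourhood size rather than on the total number of persons, which is consistent with the flat Neo4j timings in the table.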


Four NOSQL Categories




What are graphs good for?
•   Recommendations
•   Business intelligence
•   Social computing
•   Geospatial
•   MDM
•   Systems management
•   Web of things
•   Genealogy
•   Time series data
•   Product catalogue
•   Web analytics
•   Scientific computing (especially bioinformatics)
•   Indexing your slow RDBMS
•   And much more!


Neo4j is a Graph Database

So we need to detour through a little
           graph theory



Meet Leonhard Euler
    • Swiss mathematician
    • Inventor of Graph
      Theory (1736)




http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg
http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
Property Graph Model
                                  name: Michal Bachman




• nodes / vertices
• relationships / edges
                                  title: Intro to Neo4j
• properties                      duration: 45




                    name: Neo4j           name: NOSQL




Graphs are very whiteboard-friendly




Neo4j




32 billion nodes
32 billion relationships
64 billion properties
http://opfm.jpl.nasa.gov/




http://news.xinhuanet.com




Community


  Advanced



    Enterprise


How do I use it?




Getting started is easy
• Single package download, includes server stuff
  – http://neo4j.org/download/
• For developer convenience, Ivy (or whatever):
  –   <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>




Run it!
• Server is easy to start and stop
  – cd <install directory>
  – bin/neo4j start
  – bin/neo4j stop
• Provides a REST API in addition to the other
  APIs we’ve seen
• Provides some ops support
  – JMX, data browser, graph visualisation

Embed it!
• If you want to host the database in your
  process just load the jars

• And point the config at the right place on disk

• Embedded databases can be HA too
  – You don’t have to run as server



name: Phil Johnson



title: Cognitive Psychology
duration: 30                                               name: Michal Bachman




                                           name: UX



                                                           title: Intro to Neo4j
                                                           duration: 45

    name: Martin Macke




      name: Jeremy White      INTERESTED   name: Neo4j   name: NOSQL




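import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Open (or create) an embedded database in the given directory
GraphDatabaseService neo = new EmbeddedGraphDatabase("/data/webexpo");

// All writes happen inside a transaction
Transaction tx = neo.beginTx();
try {
    Node speaker = neo.createNode();
    speaker.setProperty("name", "Michal Bachman");

    Node talk = neo.createNode();
    talk.setProperty("title", "Intro to Neo4j");

    Relationship delivers
         = speaker.createRelationshipTo(talk,
              DynamicRelationshipType.withName("DELIVERS"));
    delivers.setProperty("day", "Saturday");

    // Index the speaker by name so the node can be looked up later
    neo.index().forNodes("people")
           .add(speaker, "name", "Michal Bachman");

    tx.success(); // mark for commit; without this, finish() rolls back
} finally {
    tx.finish();
}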


      name: Michal Bachman                 DELIVERS     title: Intro to Neo4j
                                        day: Saturday

Core API
• Nodes
  – Properties (optional K-V pairs)
• Relationships
  – Start node (required)
  – End node (required)
  – Properties (optional K-V pairs)
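
The rules on this slide (properties are optional, start and end nodes are required) can be expressed in a few lines of plain Java. This is a hypothetical model for illustration only; the classes `Node` and `Relationship` below are not Neo4j's interfaces of the same name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical plain-Java model of the property graph; not Neo4j's classes.
class PropertyGraphSketch {

    static class Node {
        // Optional key-value pairs
        final Map<String, Object> properties = new HashMap<>();
    }

    static class Relationship {
        final Node start;   // required
        final Node end;     // required
        final String type;  // relationship name, e.g. DELIVERS
        // Optional key-value pairs
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = Objects.requireNonNull(start, "start node is required");
            this.type = Objects.requireNonNull(type, "type is required");
            this.end = Objects.requireNonNull(end, "end node is required");
        }
    }

    public static void main(String[] args) {
        Node speaker = new Node();
        speaker.properties.put("name", "Michal Bachman");
        Node talk = new Node();
        talk.properties.put("title", "Intro to Neo4j");

        Relationship delivers = new Relationship(speaker, "DELIVERS", talk);
        delivers.properties.put("day", "Saturday");

        System.out.println(delivers.start.properties.get("name")
                + " -[" + delivers.type + "]-> "
                + delivers.end.properties.get("title"));
    }
}
```

A relationship with a missing end node is rejected at construction time, which mirrors the "required" markers above.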




All Conference Topics




    // Assumes static imports of Direction.INCOMING / OUTGOING and
    // RelationshipType constants TALKS_AT, DELIVERS, ABOUT (plus the NAME key) defined elsewhere
    Node webExpo = neo.getReferenceNode();
    for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
        Node speaker = talksAt.getStartNode();
        for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
            Node talk = delivers.getEndNode();
            for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
                String topicName = (String) about.getEndNode().getProperty(NAME);
                //add to result...
            }
        }
    }




-------------------
Printing all topics
All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software,
responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design,
marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j,
rest, css, design, publishing, nosql. Took: 2 ms
Which talks should I attend?




   TraversalDescription talksTraversal = Traversal.description()
        .uniqueness(Uniqueness.NONE)
        .breadthFirst()
        .relationships(INTERESTED, OUTGOING)
        .relationships(ABOUT, INCOMING)
        .evaluator(Evaluators.atDepth(2));

   Node attendee =
        neo.index().forNodes("people").get("name", "Jeremy White").getSingle();

   Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();

   //iterate over talks and print




------------------------------------------
Suggesting talks for 100 random attendees.
...
Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
Suggested talks for 100 random attendees in 449 ms
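
The shape of that traversal (follow INTERESTED out to a topic, then follow ABOUT back to a talk) can be mimicked with two in-memory maps. This is a sketch under assumptions, not the Neo4j traversal framework: the class `TalkSuggestions` and its methods are hypothetical, and the sample relationships in `main` are made up around names from the slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the two-hop traversal; not the Neo4j API.
class TalkSuggestions {
    // attendee -> topics (INTERESTED, followed in its own direction)
    private final Map<String, List<String>> interestedIn = new HashMap<>();
    // topic -> talks (ABOUT, followed against its direction)
    private final Map<String, List<String>> talksAbout = new HashMap<>();

    void interested(String attendee, String topic) {
        interestedIn.computeIfAbsent(attendee, k -> new ArrayList<>()).add(topic);
    }

    void about(String talk, String topic) {
        talksAbout.computeIfAbsent(topic, k -> new ArrayList<>()).add(talk);
    }

    // Hop 1: attendee to topics. Hop 2: topics back to talks.
    // No de-duplication here: a talk reachable via two topics is listed twice.
    List<String> suggest(String attendee) {
        List<String> talks = new ArrayList<>();
        for (String topic : interestedIn.getOrDefault(attendee, Collections.emptyList())) {
            talks.addAll(talksAbout.getOrDefault(topic, Collections.emptyList()));
        }
        return talks;
    }

    public static void main(String[] args) {
        TalkSuggestions graph = new TalkSuggestions();
        graph.interested("Jeremy White", "Neo4j");
        graph.interested("Jeremy White", "NOSQL");
        graph.about("Intro to Neo4j", "Neo4j");
        graph.about("Intro to Neo4j", "NOSQL");
        graph.about("Cognitive Psychology", "UX");
        System.out.println(graph.suggest("Jeremy White"));
    }
}
```

Collecting the result into a set instead of a list would give each suggested talk once.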
What do we have in common?




      //retrieve attendeeOne and attendeeTwo from index

      int maxDepth = 2;
      Iterable<Path> paths = GraphAlgoFactory
            .allPaths(Traversal.expanderForAllTypes(), maxDepth)
            .findAllPaths(attendeeOne, attendeeTwo);

      for (Path path : paths) {
            //print it
      }



------------------------------------------------------------
Finding things in common for 100 random couples of attendees
...
Karel Kunc and Phil Smith:

(Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
(Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
Took: 0 ms.
...

Found things in common for 100 random couples of attendees in 142 ms.
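
The paths in that output all have the form (a)-->(x)<--(b), which can be found by comparing the outgoing edges of both attendees. The sketch below covers only that one pattern; the `allPaths` call on the slide is more general. The class `CommonGround` and its methods are hypothetical, and this is plain Java rather than `GraphAlgoFactory`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of "things in common"; not the Neo4j graph algorithms.
class CommonGround {
    // One outgoing edge: relationship type plus the name of the target node.
    static class Edge {
        final String type;
        final String target;

        Edge(String type, String target) {
            this.type = type;
            this.target = target;
        }
    }

    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void relate(String from, String type, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(type, to));
    }

    // Every pair of edges, one from each attendee, that ends at the same node.
    List<String> inCommon(String a, String b) {
        List<String> paths = new ArrayList<>();
        for (Edge fromA : outgoing.getOrDefault(a, Collections.emptyList())) {
            for (Edge fromB : outgoing.getOrDefault(b, Collections.emptyList())) {
                if (fromA.target.equals(fromB.target)) {
                    paths.add("(" + a + ")--[" + fromA.type + "]-->(" + fromA.target
                            + ")<--[" + fromB.type + "]--(" + b + ")");
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        CommonGround graph = new CommonGround();
        graph.relate("Karel Kunc", "INTERESTED", "ux");
        graph.relate("Phil Smith", "INTERESTED", "ux");
        graph.relate("Karel Kunc", "DISLIKED", "Beyond the polar bear");
        graph.relate("Phil Smith", "LIKED", "Beyond the polar bear");
        for (String path : graph.inCommon("Karel Kunc", "Phil Smith")) {
            System.out.println(path);
        }
    }
}
```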
Youngsters, Y U No Like Java?




Who is my beer mate?

myself                     beerMate:?




                talk:?



Who is my beer mate?

(myself)                     (beerMate)




                  (talk)



Who is my beer mate?
start myself=node:people(name = "Emil Votruba")

match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;
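
For comparison with the Java examples earlier, here is roughly what the same question costs when written by hand. It is a plain-Java sketch over an in-memory map, not a translation produced by Neo4j; the class `BeerMates`, the sample names, and the tie-break by name are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory version of the "beer mate" query; not the Cypher engine.
class BeerMates {
    // person -> talks that person LIKED
    private final Map<String, Set<String>> liked = new HashMap<>();

    void like(String person, String talk) {
        liked.computeIfAbsent(person, k -> new HashSet<>()).add(talk);
    }

    // For every other person, count the talks both liked; best matches first.
    List<Map.Entry<String, Integer>> beerMates(String myself, int limit) {
        Set<String> myTalks = liked.getOrDefault(myself, new HashSet<>());
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : liked.entrySet()) {
            if (other.getKey().equals(myself)) {
                continue;
            }
            int shared = 0;
            for (String talk : other.getValue()) {
                if (myTalks.contains(talk)) {
                    shared++;
                }
            }
            if (shared > 0) {
                counts.put(other.getKey(), shared);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Order by count descending; ties broken by name (an assumption, for stable output).
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }

    public static void main(String[] args) {
        BeerMates graph = new BeerMates();
        graph.like("Emil", "Talk A");
        graph.like("Emil", "Talk B");
        graph.like("Anna", "Talk A");
        graph.like("Anna", "Talk B");
        graph.like("Petr", "Talk A");
        graph.like("Jana", "Talk C");
        System.out.println(graph.beerMates("Emil", 5));
    }
}
```

The pattern, the aggregation, the ordering and the limit each take one line in Cypher and a loop or comparator here.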




Cypher Query
start myself=node:people(name = "Alex Smart")

match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;




Cypher Query
start myself=node:people(name = "Emil Votruba")

match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;




Who is my beer mate?




Current Research
•   Graph partitioning
•   Graph analytics (“OLAP” and predictive)
•   Performance improvements
•   Query languages
•   MVCC and single-threaded write models
•   ACID (tradeoffs for weakening C and I)
•   Yield and Harvest in distributed systems
•   Application-level
    – Recommendations
    – Protein interactions
    –…

Questions?
Neo4j: http://neo4j.org
Neo Technology: http://neotechnology.com
Twitter: @bachmanm
Code: git://github.com/bachmanm/neo4j-imperial.git

More Related Content

Viewers also liked

Easy AJAX with Java and DWR
Easy AJAX with Java and DWREasy AJAX with Java and DWR
Easy AJAX with Java and DWR
Mikalai Alimenkou
 
Finance Tips for New Parents
Finance Tips for New ParentsFinance Tips for New Parents
Finance Tips for New Parents
Miguel Aliaga
 
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop PresentationMobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation
David Eads
 
Jamaica Personal Income Tax Guide 2016 Edition (1)
Jamaica Personal Income Tax Guide  2016 Edition (1)Jamaica Personal Income Tax Guide  2016 Edition (1)
Jamaica Personal Income Tax Guide 2016 Edition (1)
Dawgen Global
 
TDD для интеграции с БД легко и просто!
TDD для интеграции с БД легко и просто!TDD для интеграции с БД легко и просто!
TDD для интеграции с БД легко и просто!
Mikalai Alimenkou
 
Pomodoro technique
Pomodoro techniquePomodoro technique
Pomodoro technique
Tricode (part of Dept)
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Tachyon Nexus, Inc.
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
Spark Summit
 
Spark Summit EU 2015: Reynold Xin Keynote
Spark Summit EU 2015: Reynold Xin KeynoteSpark Summit EU 2015: Reynold Xin Keynote
Spark Summit EU 2015: Reynold Xin Keynote
Databricks
 
Great functional testing with WebDriver and Thucydides
Great functional testing with WebDriver and ThucydidesGreat functional testing with WebDriver and Thucydides
Great functional testing with WebDriver and Thucydides
Mikalai Alimenkou
 
Ceph at Work in Bloomberg: Object Store, RBD and OpenStack
Ceph at Work in Bloomberg: Object Store, RBD and OpenStackCeph at Work in Bloomberg: Object Store, RBD and OpenStack
Ceph at Work in Bloomberg: Object Store, RBD and OpenStack
Red_Hat_Storage
 
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
DataWorks Summit
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio, Inc.
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
Databricks
 
CES 2016 Trends and Implications - Havas
CES 2016 Trends and Implications - Havas CES 2016 Trends and Implications - Havas
CES 2016 Trends and Implications - Havas
Tom Goodwin
 
Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016
Jiří Šimša
 
What is Architecture?
What is Architecture?What is Architecture?
What is Architecture?
Marsha Benson
 
CV espanol
CV espanolCV espanol
CV espanol
Rebeca Eriksen
 

Viewers also liked (18)

Easy AJAX with Java and DWR
Easy AJAX with Java and DWREasy AJAX with Java and DWR
Easy AJAX with Java and DWR
 
Finance Tips for New Parents
Finance Tips for New ParentsFinance Tips for New Parents
Finance Tips for New Parents
 
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop PresentationMobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation
Mobile Strategy Partners 2010 Mobile Banking Summit Workshop Presentation
 
Jamaica Personal Income Tax Guide 2016 Edition (1)
Jamaica Personal Income Tax Guide  2016 Edition (1)Jamaica Personal Income Tax Guide  2016 Edition (1)
Jamaica Personal Income Tax Guide 2016 Edition (1)
 
TDD для интеграции с БД легко и просто!
TDD для интеграции с БД легко и просто!TDD для интеграции с БД легко и просто!
TDD для интеграции с БД легко и просто!
 
Pomodoro technique
Pomodoro techniquePomodoro technique
Pomodoro technique
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
 
Spark Summit EU 2015: Reynold Xin Keynote
Spark Summit EU 2015: Reynold Xin KeynoteSpark Summit EU 2015: Reynold Xin Keynote
Spark Summit EU 2015: Reynold Xin Keynote
 
Great functional testing with WebDriver and Thucydides
Great functional testing with WebDriver and ThucydidesGreat functional testing with WebDriver and Thucydides
Great functional testing with WebDriver and Thucydides
 
Ceph at Work in Bloomberg: Object Store, RBD and OpenStack
Ceph at Work in Bloomberg: Object Store, RBD and OpenStackCeph at Work in Bloomberg: Object Store, RBD and OpenStack
Ceph at Work in Bloomberg: Object Store, RBD and OpenStack
 
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
Epiphany: Connecting Millions of Events to Thirty Billion Data Points in Real...
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
CES 2016 Trends and Implications - Havas
CES 2016 Trends and Implications - Havas CES 2016 Trends and Implications - Havas
CES 2016 Trends and Implications - Havas
 
Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016
 
What is Architecture?
What is Architecture?What is Architecture?
What is Architecture?
 
CV espanol
CV espanolCV espanol
CV espanol
 

Similar to Neo4j Introduction at Imperial College London

An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
Debanjan Mahata
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
Bethmi Gunasekara
 
NoSQL-Overview
NoSQL-OverviewNoSQL-Overview
NoSQL-Overview
Ranjeet Jha - OCM-JEA
 
Lviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQLLviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQL
zenyk
 
No Sql Movement
No Sql MovementNo Sql Movement
No Sql Movement
Ajit Koti
 
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow ZurichHow to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
Patrick Baumgartner
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
Zohar Elkayam
 
Introduction to h base
Introduction to h baseIntroduction to h base
Introduction to h base
TrendProgContest13
 
Grails goes Graph
Grails goes GraphGrails goes Graph
Grails goes Graph
darthvader42
 
Scaling Databases On The Cloud
Scaling Databases On The CloudScaling Databases On The Cloud
Scaling Databases On The Cloud
Imaginea
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloud
Imaginea
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
George Stathis
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
hybrid cloud
 
How to Get Started with Your MongoDB Pilot Project
How to Get Started with Your MongoDB Pilot ProjectHow to Get Started with Your MongoDB Pilot Project
How to Get Started with Your MongoDB Pilot Project
DATAVERSITY
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
Jayant Mukherjee
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
Life Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataLife Science Database Cross Search and Metadata
Life Science Database Cross Search and Metadata
Maori Ito
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
Rahul Borate
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big data
Peter O'Kelly
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
nehabsairam
 

Similar to Neo4j Introduction at Imperial College London (20)

An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
NoSQL-Overview
NoSQL-OverviewNoSQL-Overview
NoSQL-Overview
 
Lviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQLLviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQL
 
No Sql Movement
No Sql MovementNo Sql Movement
No Sql Movement
 
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow ZurichHow to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
How to use NoSQL in Enterprise Java Applications - NoSQL Roadshow Zurich
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Introduction to h base
Introduction to h baseIntroduction to h base
Introduction to h base
 
Grails goes Graph
Grails goes GraphGrails goes Graph
Grails goes Graph
 
Scaling Databases On The Cloud
Scaling Databases On The CloudScaling Databases On The Cloud
Scaling Databases On The Cloud
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloud
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
How to Get Started with Your MongoDB Pilot Project
How to Get Started with Your MongoDB Pilot ProjectHow to Get Started with Your MongoDB Pilot Project
How to Get Started with Your MongoDB Pilot Project
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Life Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataLife Science Database Cross Search and Metadata
Life Science Database Cross Search and Metadata
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
Gilbane Boston 2011 big data
Gilbane Boston 2011 big dataGilbane Boston 2011 big data
Gilbane Boston 2011 big data
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 

More from Michal Bachman

Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)
Michal Bachman
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware Framework
Michal Bachman
 
GraphAware Framework Intro
GraphAware Framework IntroGraphAware Framework Intro
GraphAware Framework Intro
Michal Bachman
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)
Michal Bachman
 
Intro to Neo4j (CZ)
Intro to Neo4j (CZ)Intro to Neo4j (CZ)
Intro to Neo4j (CZ)
Michal Bachman
 
Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)Modelling Data in Neo4j (plus a few tips)
Modelling Data in Neo4j (plus a few tips)
Michal Bachman
 
(Big) Data Science
(Big) Data Science(Big) Data Science
(Big) Data Science
Michal Bachman
 
Neo4j - Tales from the Trenches
Neo4j - Tales from the TrenchesNeo4j - Tales from the Trenches
Neo4j - Tales from the Trenches
Michal Bachman
 
WebExpo Prague 2012 - Introduction to Neo4j (Czech)
WebExpo Prague 2012 - Introduction to Neo4j (Czech)WebExpo Prague 2012 - Introduction to Neo4j (Czech)
WebExpo Prague 2012 - Introduction to Neo4j (Czech)
Michal Bachman
 

More from Michal Bachman (9)

Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)Recommendations with Neo4j (FOSDEM 2015)
Recommendations with Neo4j (FOSDEM 2015)
 
Advanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware FrameworkAdvanced Neo4j Use Cases with the GraphAware Framework
Advanced Neo4j Use Cases with the GraphAware Framework
 
GraphAware Framework Intro
GraphAware Framework IntroGraphAware Framework Intro
GraphAware Framework Intro
 
Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)Modelling Data as Graphs (Neo4j)
Modelling Data as Graphs (Neo4j)
 
Intro to Neo4j (CZ)
Intro to Neo4j (CZ)Intro to Neo4j (CZ)
Intro to Neo4j (CZ)
 
Modelling Data in Neo4j (plus a few tips)
Neo4j Introduction at Imperial College London

  • 1. An Introduction to Neo4j Michal Bachman @bachmanm
  • 2. Roadmap • Intro to NOSQL • Intro to Graph Databases • Intro to Neo4j • A bit of hacking • Current research • Q&A
  • 3. Not Only SQL
  • 4. Why NOSQL now? Driving trends
  • 5. Trend 1: Data Size
  • 6. Trend 2: Connectedness [chart, axis: Information connectivity; points in ascending order: Text Documents, Hypertext, Feeds, Blogs, UGC, Wikis, Tagging, Folksonomies, RDFa, Ontologies, GGG]
  • 7. Trend 3: Semi-structured Data
  • 8. Trend 4: Application Architecture (80’s) Application DB
  • 9. Trend 4: Application Architecture (90’s) App App App DB
  • 10. Application Application Application DB DB DB
  • 11. Side note: RDBMS performance Salary List
  • 13. Key-Value Stores • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007) • Data model: – Global key-value mapping – Big scalable HashMap – Highly fault tolerant (typically) • Examples: – Riak, Redis, Voldemort
  • 14. Pros and Cons • Strengths – Simple data model – Great at scaling out horizontally • Scalable • Available • Weaknesses: – Simplistic data model – Poor for complex data
  • 15. Column Family (BigTable) • Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006) • Data model: – A big table, with column families – Map-reduce for querying/processing • Examples: – HBase, HyperTable, Cassandra
  • 16. Pros and Cons • Strengths – Data model supports semi-structured data – Naturally indexed (columns) – Good at scaling out horizontally • Weaknesses: – Unsuited for interconnected data
  • 17. Document Databases • Data model – Collections of documents – A document is a key-value collection – Index-centric, lots of map-reduce • Examples – CouchDB, MongoDB
  • 18. Pros and Cons • Strengths – Simple, powerful data model (just like SVN!) – Good scaling (especially if sharding supported) • Weaknesses: – Unsuited for interconnected data – Query model limited to keys (and indexes) • Map reduce for larger queries
  • 19. Graph Databases • Data model: – Nodes with properties – Named relationships with properties – Hypergraph, sometimes • Examples: – Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
  • 20. Pros and Cons • Strengths – Powerful data model – Fast • For connected data, can be many orders of magnitude faster than RDBMS • Weaknesses: – Sharding • Though they can scale reasonably well • And for some domains you can shard too!
  • 21. Social Network “path exists” Performance • Experiment: – ~1k persons – Average 50 friends per person – pathExists(a,b) limited to depth 4 – Caches warm to eliminate disk IO • Results (database, # persons, query time): – Relational database, 1000, 2000 ms – Neo4j, 1000, 2 ms – Neo4j, 1000000, 2 ms
  • 23. What are graphs good for? • Recommendations • Business intelligence • Social computing • Geospatial • MDM • Systems management • Web of things • Genealogy • Time series data • Product catalogue • Web analytics • Scientific computing (especially bioinformatics) • Indexing your slow RDBMS • And much more!
  • 24. Neo4j is a Graph Database. So we need to detour through a little graph theory
  • 26. Meet Leonhard Euler • Swiss mathematician • Inventor of Graph Theory (1736) [image: http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg]
  • 28. Property Graph Model • nodes / vertices • relationships / edges • properties [diagram labels: name: Michal Bachman; title: Intro to Neo4j, duration: 45; name: Neo4j; name: NOSQL]
  • 29. Graphs are very whiteboard-friendly
  • 31. Neo4j
  • 32. 32 billion nodes, 32 billion relationships, 64 billion properties
  • 38. Community, Advanced, Enterprise
  • 39. How do I use it?
  • 40. Getting started is easy • Single package download, includes server stuff – http://neo4j.org/download/ • For developer convenience, Ivy (or whatever): – <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>
  • 41. Run it! • Server is easy to start and stop – cd <install directory> – bin/neo4j start – bin/neo4j stop • Provides a REST API in addition to the other APIs we’ve seen • Provides some ops support – JMX, data browser, graph visualisation
  • 42. Embed it! • If you want to host the database in your process just load the jars • And point the config at the right place on disk • Embedded databases can be HA too – You don’t have to run as server
  • 43. [Graph diagram labels: name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; INTERESTED; name: Neo4j; name: NOSQL]
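  • 44.
      import org.neo4j.graphdb.*;
      import org.neo4j.kernel.EmbeddedGraphDatabase;

      GraphDatabaseService neo = new EmbeddedGraphDatabase("/data/webexpo");
      Transaction tx = neo.beginTx();
      try {
          // Create the speaker and talk nodes with their properties
          Node speaker = neo.createNode();
          speaker.setProperty("name", "Michal Bachman");
          Node talk = neo.createNode();
          talk.setProperty("title", "Intro to Neo4j");
          // Connect them with a DELIVERS relationship that carries its own property
          Relationship delivers = speaker.createRelationshipTo(talk,
                  DynamicRelationshipType.withName("DELIVERS"));
          delivers.setProperty("day", "Saturday");
          // Index the speaker by name so it can be looked up later
          neo.index().forNodes("people").add(speaker, "name", "Michal Bachman");
          tx.success(); // without this, finish() rolls the transaction back
      } finally {
          tx.finish();
      }
      [Resulting graph: (name: Michal Bachman) --[DELIVERS, day: Saturday]--> (title: Intro to Neo4j)]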
  • 45.
  • 47. Core API • Nodes – Properties (optional K-V pairs) • Relationships – Start node (required) – End node (required) – Properties (optional K-V pairs)
  • 49. [Graph diagram labels: name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; INTERESTED; name: Neo4j; name: NOSQL]
  • 50. All Conference Topics
      // INCOMING/OUTGOING are Direction constants; TALKS_AT, DELIVERS, ABOUT and NAME are defined elsewhere
      Node webExpo = neo.getReferenceNode();
      for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
          Node speaker = talksAt.getStartNode();
          for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
              Node talk = delivers.getEndNode();
              for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
                  String topicName = (String) about.getEndNode().getProperty(NAME);
                  //add to result...
              }
          }
      }
      -------------------
      Printing all topics
      All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software, responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design, marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j, rest, css, design, publishing, nosql.
      Took: 2 ms
  • 51. Which talks should I attend?
  • 52. [Graph diagram labels: name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; INTERESTED; name: Neo4j; name: NOSQL]
  • 53. Which talks should I attend?
      TraversalDescription talksTraversal = Traversal.description()
              .uniqueness(Uniqueness.NONE)
              .breadthFirst()
              .relationships(INTERESTED, OUTGOING)
              .relationships(ABOUT, INCOMING)
              .evaluator(Evaluators.atDepth(2));
      Node attendee = neo.index().forNodes("people").get("name", "Jeremy White").getSingle();
      Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();
      //iterate over talks and print
      ------------------------------------------
      Suggesting talks for 100 random attendees.
      ...
      Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
      Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
      Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
      Suggested talks for 100 random attendees in 449 ms
  • 54. What do we have in common?
  • 55. [Graph diagram labels: name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; INTERESTED; name: Neo4j; name: NOSQL]
  • 56. What do we have in common?
      //retrieve attendeeOne and attendeeTwo from index
      int maxDepth = 2;
      Iterable<Path> paths = GraphAlgoFactory
              .allPaths(Traversal.expanderForAllTypes(), maxDepth)
              .findAllPaths(attendeeOne, attendeeTwo);
      for (Path path : paths) {
          //print it
      }
      ------------------------------------------------------------
      Finding things in common for 100 random couples of attendees
      ...
      Karel Kunc and Phil Smith:
      (Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
      (Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
      (Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
      (Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
      Took: 0 ms.
      ...
      Found things in common for 100 random couples of attendees in 142 ms.
  • 57. Youngsters, Y U No Like Java?
  • 58. Who is my beer mate? [diagram: myself, beerMate:?, talk:?]
  • 59. Who is my beer mate? (myself) (beerMate) (talk)
  • 60. Who is my beer mate?
      start myself=node:people(name = "Emil Votruba")
      match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
      return distinct beerMate.name, count(beerMate)
      order by count(beerMate) desc
      limit 5;
  • 61. Cypher Query
      start myself=node:people(name = "Alex Smart")
      match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
      return distinct beerMate.name, count(beerMate)
      order by count(beerMate) desc
      limit 5;
  • 62. Cypher Query
      start myself=node:people(name = "Emil Votruba")
      match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)
      return distinct beerMate.name, count(beerMate)
      order by count(beerMate) desc
      limit 5;
  • 63. Who is my beer mate?
  • 64. Current Research • Graph partitioning • Graph analytics (“OLAP” and predictive) • Performance improvements • Query languages • MVCC and single-threaded write models • ACID (tradeoffs for weakening C and I) • Yield and Harvest in distributed systems • Application-level – Recommendations – Protein interactions – …
  • 65. Questions? Neo4j: http://neo4j.org Neo Technology: http://neotechnology.com Twitter: @bachmanm Code: git://github.com/bachmanm/neo4j-imperial.git
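
For readers who prefer Java to Cypher, the "beer mate" query on slides 60 to 62 can be sketched without any database. This is only an illustration under stated assumptions, not the Neo4j API: it assumes the LIKED relationships fit in an in-memory map from attendee name to the set of liked talk titles. The class `BeerMates`, the method `topBeerMates` and the sample likes in `main` are hypothetical. It counts, for every other attendee, how many talks they liked in common with `myself`, then sorts by that count and truncates, mirroring `order by count(beerMate) desc limit 5`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the LIKED relationships; not the Neo4j API.
class BeerMates {

    // Returns up to `limit` attendees who liked the most talks in common with `myself`.
    static List<String> topBeerMates(Map<String, Set<String>> likedTalks, String myself, int limit) {
        Set<String> mine = likedTalks.getOrDefault(myself, Collections.<String>emptySet());

        // Count shared talks per attendee: (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : likedTalks.entrySet()) {
            if (entry.getKey().equals(myself)) {
                continue; // the pattern never matches myself as my own beer mate
            }
            int count = 0;
            for (String talk : entry.getValue()) {
                if (mine.contains(talk)) {
                    count++;
                }
            }
            if (count > 0) {
                shared.put(entry.getKey(), count);
            }
        }

        // Order by count descending (ties broken by name for determinism), then apply the limit.
        List<String> names = new ArrayList<>(shared.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(shared.get(b), shared.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(names.subList(0, Math.min(limit, names.size())));
    }

    public static void main(String[] args) {
        // Invented sample likes; the names are borrowed from the slides for flavour only.
        Map<String, Set<String>> liked = new HashMap<>();
        liked.put("Emil Votruba", new HashSet<>(Arrays.asList("Intro to Neo4j", "To the USA")));
        liked.put("Alex Smart", new HashSet<>(Arrays.asList("Intro to Neo4j", "To the USA")));
        liked.put("Karel Kunc", new HashSet<>(Arrays.asList("To the USA")));
        System.out.println(topBeerMates(liked, "Emil Votruba", 5));
    }
}
```

The nested loop is the work a graph database does by following relationships from the start node. Here every attendee is scanned, whereas a traversal only touches people who are actually connected to the liked talks.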

Editor's Notes

  1. Welcome. Introduce myself, NeoTech. Motivations: presented this at a conference; conversations with friends; talked to Serena, no affiliation; BigData and NOSQL are popular terms; graphs are getting more and more popular (Facebook); not much attention at Imperial. Ask about the audience: heard about graph databases? Graphs? Databases? Outcomes: learn about a new technology; see an application of graph theory in practice; tailored to students (not industry). Agenda: intro to NOSQL; intro to graph databases; intro to Neo4j; practical part (how to work with one); real experiences; current research; Q&A.
  2. Why now? We did not wake up one day thinking relational DBs are not cool any more. Trends.
  3. Generate, process, store and work with
  4. UGC = User Generated Content. GGG = Giant Global Graph (what the web will become): every little piece, every unit of interesting data is semantically connected to every other interesting unit of data (Tim Berners-Lee). Data is becoming more connected (linearly). RDFa (Resource Description Framework in attributes) is a technology for carrying structured information inside web pages. RDFa is one of the serialisations of the Resource Description Framework (RDF) data format. In computer science, an ontology is an explicit, formalised description of a particular domain. It is a formal, declarative representation that contains a glossary (definitions of terms) and a thesaurus (definitions of the relationships between terms). An ontology is a dictionary used to store and pass on knowledge about a particular domain.
  5. Data is losing predictable structure. Individualisation of data: can’t box each individual, want data about me. Shape of data: less predictable structure. Decentralisation of data creation accelerates this trend.
  6. Apps can choose what makes sense to store the data
  7. This is strictly about connected data; joins kill performance there. No bashing of RDBMS performance for tabular transaction processing.
  8. The beauty of the NOSQL world: nobody dictates your database, so you can choose one that matches the type or characteristics of the data you work with.
     Key-value databases: one key, one value; hash maps; Redis, Riak (Amazon Dynamo). Mostly highly fault tolerant, simple data model, excellent horizontal scalability, availability.
     BigTable databases: a key-value store with implicit indexes; Cassandra (modelled on Google's BigTable). Support for semi-structured data, automatic index (columns), good horizontal scalability, again unsuited for connected data.
     Document databases: Subversion is a well-known example, MongoDB, CouchDB, … Collections of documents; a document is a collection of key-value pairs; the index is important; lots of map-reduce. Scalability is fairly good (not as good as key-value, because of the more complex data model). A simple and powerful data model, like Subversion.
     The drawback of all three is that they are not well suited to densely connected data. A data model that is too simple (a HashMap: fast, but…) means that any immediate deeper insight into the stored data has to be the responsibility of the application layer (that is, we have to program it somehow). These databases are therefore very often paired with frameworks such as Map-Reduce, for which we have to write jobs that give us that insight. Map-reduce is a batch operation (in contrast with an on-line / in-the-click-stream synchronous operation) for getting a view of your connected data.
     All three work with aggregated data, meaning they require structure up front: data that logically belongs together (such as an order and its line items) is stored together in the database and is also accessed as a whole in queries. In key-value stores that whole is the value, in column-family stores the column family, and in document databases the document. That is fine when data access needs exactly this structure. But if we want to look at the data differently, for example to analyse total sales of individual products from the orders, we have to fight the structure a little, and that is why map-reduce is talked about so much in connection with these databases.
     The advantage of storing data in non-aggregated form is that it can be analysed and presented from different angles depending on the use case. And then of course there are graph databases, which are why we are here today and which we will learn more about in a minute.
  9. History: Amazon decided that they always wanted the shopping basket to be available, but couldn’t take a chance on an RDBMS, so they built their own. Big risk, but a simple data model and well-known computing science underpinning it (e.g. consistent hashing, Bloom filters for sensible replication). + Massive read/write scale. - Simplistic data model moves heavy lifting into the app tier (e.g. map reduce).
  10. Mongo DB has a reputation for taking liberties with durability to get speedCouch DB has good multimaster replication from Lotus Notes
  11. People talk about Codd’s relational model being mature because it was proposed in 1969 – 42 years old.Euler’s graph theory was proposed in 1736 – 275 years old.
  12. Can’t easily shard graphs like documents or KV stores. This means that high performance graph databases are limited to data set sizes that can be handled by a single machine. Can use replicas to speed things up (and improve availability), but the data set size is still limited to a single machine’s disk/memory. Some domains can shard easily (e.g. geo, most web apps) using a consistent routing approach and cache sharding; we’ll cover that later.
  13. Graph theory studies the properties of structures called graphs. These consist of vertices connected to one another by edges. A graph is usually drawn as a set of points joined by lines. Formally, a graph is an ordered pair of a set of vertices V and a set of edges E.
  14. The Seven Bridges of Königsberg (today Kaliningrad). Who works for a large company, by which I mean several layers of management and a software architect on a different floor from the developers? This is for you: in such companies it tends to be hard to push through "new" technologies. But the relational model that E. F. Codd came up with in 1969 is only 43 years old. The graph model is 276 years old. So next time your boss or a clever architect responds to adopting NOSQL with something like "we only use mature, proven technologies here", you know where to send them… by which I mean, say, this talk on the web or the relevant Wikipedia pages. So how do we store data in a graph…
  15. So how do we store data in a graph? We store data as nodes, and nodes are really documents that can have arbitrary keys with values assigned to them, just like a document in MongoDB. Where a graph differs from MongoDB is that it has relationships between nodes. That is the trade-off: MongoDB scales better because it does not do this; Neo4j is better for connected data because it does. It stores relationships between individual nodes, but it does not scale as well. That has to be considered when solving your problems: do you want massive scalability, or immediate insight into how your data is connected? DESCRIBE THE GRAPH. Relationships have semantic meaning! Speakers and talks in an RDBMS. It is a fairly intuitive way of storing data! The job of a graph database is to take this intuitive data, which we can easily sketch on a whiteboard or a piece of paper, and traverse it quickly in your programs.
  16. And that is one nice property of graphs: they are ideal for whiteboards, backs of envelopes, beer mats and cigarette packets… the things on which the best designs (especially in startups) usually come to life. I chose WebExpo as the example; originally I wanted to map the corruption scandals of Czech politicians, but this is somewhat more harmless. Relationships between speakers, talks, topics, attendees and so on can be drawn on a beer mat! WebExpo is a domain with plenty of relationships: speakers give talks, … You can simply draw that on a whiteboard, which is incidentally what we do as programmers when we sit down with people who need a piece of software and try to understand their business problem, their domain. We sit at the whiteboard and draw customers, orders, invoices, products and so on, and the relationships between them! And what do we do next? We take our nice design and normalise it. We sweat over how to cram it all into tables. And we are happy and smiling until we launch it live, into production… and it runs like a tortoise. What do we do? We denormalise our model! All the energy we invested, the blood, sweat and tears, all for nothing. With a graph database, what is on the paper is exactly what you put into the database.
  17. That does not mean you are excused from the design phase. You still have to think hard about which entities (or objects) make up your domain and what the relationships between them are. You still need design. You cannot simply take the data from your existing tables and force it into your brand-new graph database; you have to start thinking in nodes and relationships. Designing the data model for WebExpo involves many design decisions: how do we distinguish speakers from attendees, and is that even necessary? Should Friday and Saturday be nodes, or just a property on individual talks? You still have to do design, but the point is that designing a data model for a graph database can be a pleasant and natural experience.
  18. It looks after nodes, the relationships between them, and indexes for you. Neo4j is stable and has been running since 2003. It is under active development. Primarily for Java, but usable with plenty of other technologies. Ideal for a scale of tens of servers in a cluster, not hundreds. For densely connected data; it is not a KV store.
  19. 32 billion nodes, 32 billion relationships, 64 billion properties
  20. Fully and militantly ACID. Who does not know what that means? Explain quickly: atomicity, consistency, isolation, durability. Some other NOSQL databases give up some of these guarantees in favour of performance; in Neo4j this cannot be switched off. Data is always written to disk.
  21. Look up the starting point in the index (Lucene). Explore the neighbourhood.
  22. Look up the starting point in the index (Lucene). Explore the neighbourhood.
  23. Neo has a whole built-in library of graph algorithms, such as shortest path, all paths, etc.
  24. 1M hops per second on an ordinary laptop, no difference when the amount of data is multiplied. High performance graph operations. Traverses 1,000,000+ relationships / second on commodity hardware.
  25. In general, if you use MySQL and do not pay for it, you will not pay for Neo either.
  26. Let us look at embedded mode on a concrete example. I created a graph from WebExpo: the speakers and talks are real, and the 1000 attendees have randomly generated names. Describe the graph and the scenario. Who does not read Java? The code will be on GitHub.
  27. Relationship types can be either strings or an Enum, which gives you the benefit of static typing in the IDE; to Neo4j it makes no difference. We repeat the procedure until we have the whole graph.
  28. This is a screenshot of the web console, where we can browse the graph visually. It runs on the laptop; I will give you the URL at the end so you can play with it. So, we have a graph, but how do we now get data out of it?
  29. There are several ways to write queries in Neo4j; they differ in readability, complexity, performance and level of abstraction. I will show you some of them, starting from the bottom, that is, from the native, fastest API.
  30. The Core API works directly with the units we stored in the database: nodes, relationships and their properties.
  31. Let us look at the whole graph once more. A new graph always has one node with ID 0; we turned that one into WebExpo.
  32. This is an imperative API: the programmer does all the work, and it is the best-performing.
  33. Let us move one level up in abstraction to the so-called traversal API, which lets us write queries declaratively, that is, describe how we want to traverse the graph. Neo4j does the actual traversal for us.
  34. We can write our own evaluators.
  35. Another nice feature is the library of algorithms for finding paths between two nodes.
  36. Also shortest path, Dijkstra and others.
  37. Hard for non-programmers; let us look at something simpler.
  38. At the highest level of abstraction, Neo4j provides its own query language, partly inspired by SQL. The language is called Cypher and it understands human-readable statements such as the one you see here.
  39. We have to start somewhere, so we use the index named people to find Mr Emil Votruba by name. Next we have to specify what data we actually want to get, in this case the person's name and a score of how many things we have in common. Finally, we probably do not want to go for a beer with absolutely everyone, but only with, say, the 5 people we have most in common with. You can probably see the SQL influence. ----- Meeting Notes (09/09/2012 20:18) ----- animation
  40. We have to start somewhere, so we use the index named people to find Mr Emil Votruba by name. Next we have to specify what data we actually want to get, in this case the person's name and a score of how many things we have in common. Finally, we probably do not want to go for a beer with absolutely everyone, but only with, say, the 5 people we have most in common with. You can probably see the SQL influence. ----- Meeting Notes (09/09/2012 20:18) ----- animation
  41. We have to start somewhere, so we use the index named people to find Mr Emil Votruba by name. Next we have to specify what data we actually want to get, in this case the person's name and a score of how many things we have in common. Finally, we probably do not want to go for a beer with absolutely everyone, but only with, say, the 5 people we have most in common with. You can probably see the SQL influence. ----- Meeting Notes (09/09/2012 20:18) ----- animation
  42. And the result for Mr Votruba.
  43. “Tales from the Trenches” for further tips
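
The `pathExists(a,b)` experiment limited to depth 4, and the notes on exploring a node's neighbourhood, come down to a bounded breadth-first search. Below is a minimal plain-Java sketch, assuming the friendship graph is held in an adjacency map. The class `PathExists`, its method and the sample names are hypothetical, and this is not a description of how Neo4j implements the operation internally.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of a depth-limited "path exists" check; not the Neo4j API.
class PathExists {

    // True if `to` can be reached from `from` in at most `maxDepth` hops.
    static boolean pathExists(Map<String, Set<String>> friends, String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(from);

        // Expand one level of friends per iteration, stopping at the depth limit.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Collections.<String>emptySet())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) { // add() is false if already seen
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        // Invented chain of friends: a -> b -> c
        Map<String, Set<String>> friends = new HashMap<>();
        friends.put("a", new HashSet<>(Arrays.asList("b")));
        friends.put("b", new HashSet<>(Arrays.asList("c")));
        System.out.println(pathExists(friends, "a", "c", 4));
    }
}
```

The cost of this search depends on how many people are reachable within the depth limit, not on the total number of people stored, which is consistent with the slide reporting the same query time for 1000 and 1000000 persons.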