Windy City DB - Recommendation Engine with Neo4j
Comments

  • Bad example for Cypher; it is not optimized. It should look like this:
    START me=node:people('name:MAX')
    ,sim_profiles = node(1,2,3)
    MATCH sim_profiles-[:rel_name]->i1, me-[:rel_name]->i2
    WHERE NOT(i1 = i2)
    RETURN i1, COUNT(*);

    Because the condition in the `where` clause runs for all interests, which is terrible.
  • Excellent article about building a recommendation engine with a graph database.
  • Atomic = all or nothing; consistent = the data stays consistent from one transaction to the next; isolation = no transaction will interfere with another; durability = once a transaction is committed, it stays committed.

Presentation Transcript

  • Adding a Recommendation Engine • WindyCityDB • Max De Marzi
  • About Me • Built the Neography gem (Ruby wrapper for the Neo4j REST API) • Playing with Neo4j since 10/2009 • My blog: http://maxdemarzi.com • Find me on Twitter: @maxdemarzi • Email me: maxdemarzi@gmail.com • GitHub: http://github.com/maxdemarzi
  • Agenda • What is Neo4j? • What is Neography? • Approaches • Gremlin • Gremlin Recommends • Cypher • Cypher Recommends
  • What is Neo4j? • A Graph Database + Lucene Index • Property Graph • Full ACID (atomicity, consistency, isolation, durability) • High Availability (with Enterprise Edition) • 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties • Embedded Server (Java or JVM languages) • REST API (for everyone else)
  • Obligatory CAP Theorem Slide • Neo4j sits where Consistency and Availability meet, just like your RDBMS
  • Good For • Highly connected data (social networks) • Recommendations (e-commerce) • Path finding (how do I know you?) • A* (least-cost path) • Data-first schema (bottom-up, but you still need to design)
  • Property Graph
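In a property graph, both nodes and relationships carry key/value properties, and every relationship has a type and a direction. The toy in-memory model below is only an illustration of that data model, not Neo4j's storage format or API; the class and method names (`PropertyGraph`, `relate`, `out_e`, `in_e`) are hypothetical, with the last two named loosely after Gremlin's `outE`/`inE` steps.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    props: dict = field(default_factory=dict)      # key/value properties on the node


@dataclass
class Relationship:
    start: int                                      # id of the start node
    end: int                                        # id of the end node
    rel_type: str                                   # e.g. "rated" or "hasGenre"
    props: dict = field(default_factory=dict)       # relationships carry properties too


class PropertyGraph:
    """Toy in-memory property graph: nodes plus typed, directed relationships."""

    def __init__(self):
        self.nodes = {}
        self.rels = []

    def add_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, props)
        return self.nodes[node_id]

    def relate(self, start, end, rel_type, **props):
        rel = Relationship(start, end, rel_type, props)
        self.rels.append(rel)
        return rel

    def out_e(self, node_id, rel_type):
        """Outgoing relationships of one type (compare Gremlin's outE)."""
        return [r for r in self.rels if r.start == node_id and r.rel_type == rel_type]

    def in_e(self, node_id, rel_type):
        """Incoming relationships of one type (compare Gremlin's inE)."""
        return [r for r in self.rels if r.end == node_id and r.rel_type == rel_type]


g = PropertyGraph()
g.add_node(1, name="Max")
g.add_node(2, title="Toy Story")
g.relate(1, 2, "rated", stars=4)
print([(r.rel_type, r.props) for r in g.out_e(1, "rated")])  # [('rated', {'stars': 4})]
```

Note that the rating (`stars`) lives on the relationship rather than on either node; the recommendation queries later in the talk filter on exactly that kind of relationship property.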
  • If you’ve ever • Joined more than 7 tables together • Modeled a graph in a table • Written a recursive CTE • Tried to write some crazy stored procedure with multiple recursive self and inner joins • You should use Neo4j
  • What is Neography?
  • A very thin wrapper to the Neo4j REST API • Two layers: one that follows the REST API, and Ruby sugar on top • Made for graph database programmer happiness >8-] • Read the GitHub README and specs • Can I haz pull requests? • Want Active Record? Use the activerecord-neo4j-adapter gem, which sits on top of Neography
  • How do I install Neo4j? • Create a project directory and do this now if you haven’t yet. • Make sure wget is installed.
  • Approaches
  • Collaborative Filtering • Step 1: Collect user behavior • Step 2: Find similar users • Step 3: Recommend behavior taken by similar users
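The three collaborative filtering steps can be sketched in plain Python. This is a minimal illustration, not code from the talk: it assumes a hypothetical `likes` mapping of user to liked items and treats any user who shares at least one liked item as "similar"; real systems usually use weighted similarity measures.

```python
from collections import Counter

# Step 1: collected user behavior (hypothetical sample data): user -> set of liked items
likes = {
    "max":   {"alien", "aliens", "terminator"},
    "ana":   {"alien", "aliens", "predator"},
    "bob":   {"terminator", "robocop"},
    "carol": {"toy story", "up"},
}


def similar_users(user, likes, min_overlap=1):
    """Step 2: users who share at least `min_overlap` liked items with `user`."""
    mine = likes[user]
    return [other for other, theirs in likes.items()
            if other != user and len(mine & theirs) >= min_overlap]


def recommend(user, likes, top_n=5):
    """Step 3: items liked by similar users that `user` has not liked yet,
    ranked by how many similar users liked them."""
    mine = likes[user]
    counts = Counter()
    for other in similar_users(user, likes):
        counts.update(likes[other] - mine)
    return counts.most_common(top_n)


print(recommend("max", likes))  # [('predator', 1), ('robocop', 1)]
```

Raising `min_overlap` trades recall for precision: fewer users count as similar, but their tastes are closer.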
  • Content-Based Filtering • Step 1: Collect item characteristics • Step 2: Find similar items • Step 3: Recommend similar items • Marko likes romantic zombie comedies; what other romantic zombie comedies are there? Tweet him @twarko
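Content-based filtering can be sketched the same way, comparing items instead of users. The sketch below assumes hypothetical genre data and uses Jaccard similarity of genre sets as the "similar items" measure; that choice is an illustration, not something the talk prescribes.

```python
# Step 1: item characteristics (hypothetical sample data): movie -> set of genres
genres = {
    "Shaun of the Dead": {"comedy", "horror", "romance"},
    "Warm Bodies":       {"comedy", "horror", "romance"},
    "Zombieland":        {"comedy", "horror"},
    "Notting Hill":      {"comedy", "romance"},
    "Alien":             {"horror", "sci-fi"},
}


def jaccard(a, b):
    """Similarity of two genre sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(item, genres, top_n=3):
    """Steps 2 and 3: rank the other items by genre similarity to `item`."""
    target = genres[item]
    scored = [(other, jaccard(target, other_genres))
              for other, other_genres in genres.items() if other != item]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


for title, score in similar_items("Shaun of the Dead", genres):
    print(f"{title}: {score:.2f}")
# Warm Bodies: 1.00
# Zombieland: 0.67
# Notting Hill: 0.67
```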
  • Hybrid • Combine the two for better results. • Example: Netflix
  • What is Gremlin?
  • Gremlin is • A Graph Traversal Language • A domain-specific language for traversing property graphs • Implemented by most Graph Database Vendors • Primarily seen with the Groovy Language • With JVM connectivity in Java, Scala, and other languages
  • Created by: Marko Rodriguez • http://markorodriguez.com
  • A Graph DSL • A Dynamic Language for the JVM • A Data Flow Framework • “JDBC” for Graph DBs
  • Gremlin Recommends
  • Hybrid movie recommendations
  • Our Graph (from MovieLens)
  • Recommendation Algorithm
    m = [:];
    x = [] as Set;
    v = g.v(node_id);

    v.
    out('hasGenre').aggregate(x).
    back(2).
    inE('rated').filter{it.stars > 3}.
    outV.
    outE('rated').filter{it.stars > 3}.
    inV.
    filter{it != v}.
    filter{it.out('hasGenre').toSet().equals(x)}.
    groupCount(m){"${it.id}:${it.title}"}.iterate();

    m.sort{a,b -> b.value <=> a.value}[0..24]
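To make the data flow of this traversal concrete, here is a rough in-memory Python analogue that mirrors the Gremlin steps one by one. It is a sketch under stated assumptions: the ratings, genres, and the "Movie C"/"Movie D" titles are made-up sample data, plain lists and dicts stand in for the graph, each qualifying (fan, movie) path is counted once, and this is not how Neo4j or Gremlin actually executes the query.

```python
from collections import Counter

# Hypothetical sample data standing in for the MovieLens graph.
titles = {1: "Toy Story", 2: "Terminator", 3: "Movie C", 4: "Movie D"}
movie_genres = {
    1: {"Animation", "Children's", "Comedy"},
    2: {"Action", "Sci-Fi", "Thriller"},
    3: {"Animation", "Children's", "Comedy"},
    4: {"Animation", "Children's", "Comedy"},
}
# (user, movie, stars) triples play the role of "rated" relationships.
ratings = [
    ("u1", 1, 5), ("u1", 2, 4), ("u1", 3, 5),
    ("u2", 1, 4), ("u2", 3, 4), ("u2", 4, 5),
    ("u3", 1, 2), ("u3", 4, 5),
]


def recommend(v, top_n=25):
    x = movie_genres[v]                    # v.out('hasGenre').aggregate(x)
    m = Counter()                          # m = [:], filled by groupCount
    # back(2).inE('rated').filter{it.stars > 3}.outV  -> users who liked v
    fans = [user for user, movie, stars in ratings if movie == v and stars > 3]
    for fan in fans:
        # outE('rated').filter{it.stars > 3}.inV  -> other movies the fan liked
        for user, movie, stars in ratings:
            if user != fan or stars <= 3:
                continue
            if movie == v:                             # filter{it != v}
                continue
            if movie_genres[movie] != x:               # filter{...toSet().equals(x)}
                continue
            m[f"{movie}:{titles[movie]}"] += 1         # groupCount(m){"${it.id}:${it.title}"}
    # m.sort{a,b -> b.value <=> a.value}[0..24]
    return sorted(m.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend(1))  # [('3:Movie C', 2), ('4:Movie D', 1)]
```

User `u3` gave Toy Story only 2 stars, so their 5-star rating of Movie D does not count; Terminator is dropped because its genre set differs from Toy Story's.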
  • Explanation • m = [:]; x = [] as Set; v = g.v(node_id); • In Groovy, [:] is a map; this is what we will return. The set “x” will hold the collection of genres we want our recommended movies to have. v is our starting point.
  • Explanation • v.out('hasGenre'). (we are now at a genre node) aggregate(x). • We fill the empty set “x” with the genres of our movie. These are the properties we want to make sure our recommendations have.
  • Explanation • back(2). (we are back at our starting point) inE('rated').filter{it.stars > 3}. (we are now at the relationships between our movie and its users) • We go back two steps to our starting movie, follow its incoming “rated” relationships, and filter them so we only keep those with more than 3 stars.
  • Explanation • outV. (we are now at a user node) outE('rated').filter{it.stars > 3}. (we are now at the relationship between a user and a movie) • We follow our relationships to the users who made them, and then to those users’ own “rated” relationships, again keeping only ratings of more than 3 stars.
  • Explanation • inV. (we are now at a movie node) filter{it != v}. • We follow those relationships to the movies that received the ratings, but filter out “v”, which is our starting movie. We do not want the system to recommend the same movie we just watched.
  • Explanation • filter{it.out('hasGenre').toSet().equals(x)}. • We also want to keep only the movies that have the same genres as our starting movie. People may give both Toy Story and Terminator 4 stars, but you wouldn’t want to recommend one to fans of the other.
  • Explanation • groupCount(m){"${it.id}:${it.title}"}.iterate(); • groupCount does what it sounds like and stores the counts in the map “m” we created earlier, keyed by the id and title of each movie so we retain both. • iterate() is needed when running through the Neo4j REST API; the Gremlin shell does it automatically for you. You will forget this one day and kill 30 minutes of your life trying to figure out why you get nothing.
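Why does forgetting iterate() give you nothing? The traversal is lazy, and groupCount only fills “m” as elements flow through it. A Python generator pipeline behaves analogously, which the sketch below uses as an analogy only: `group_count` and `iterate` are hypothetical helpers, not Gremlin's implementation.

```python
from collections import Counter


def group_count(pipeline, counts, key):
    """Lazy side-effect step, loosely analogous to groupCount(m){...}:
    `counts` is updated only as elements are pulled through."""
    for item in pipeline:
        counts[key(item)] += 1
        yield item


def iterate(pipeline):
    """Drain a lazy pipeline purely for its side effects (the role of .iterate())."""
    for _ in pipeline:
        pass


movies = [{"id": 3, "title": "Movie C"}, {"id": 3, "title": "Movie C"},
          {"id": 4, "title": "Movie D"}]

m = Counter()
pipe = group_count(iter(movies), m, key=lambda it: f"{it['id']}:{it['title']}")
print(m)        # Counter()  -- nothing has run yet
iterate(pipe)
print(m)        # Counter({'3:Movie C': 2, '4:Movie D': 1})
```

Building the pipeline does no work; only draining it runs the side effect, which is why the map stays empty until something iterates.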
  • Explanation • m.sort{a,b -> b.value <=> a.value}[0..24] • Finally, we sort our map by value in descending order and grab the top 25 items… and you’re done. • See http://maxdemarzi.com/2012/01/16/neo4j-on-heroku-part-two/ for the full walk-through, including data loading.
  • What is Cypher?
  • The blue pill after the red pill • Graph query language for Neo4j • Based on pattern matching • Makes querying easy again
  • Cypher: Neo4j Query Language • ASCII art FTW: a--b, a-->b, a<--c
  • Cypher Recommends
  • Similar Users • Which users have rated movies that I have rated, within one star of my rating?
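The slide's Cypher query is not reproduced in this transcript, so here is a hedged Python sketch of the same question over hypothetical in-memory ratings (the users, movies, and star values are made up). A user counts as similar if they rated at least one movie I rated, within `tolerance` stars of my rating.

```python
# Hypothetical ratings: user -> {movie: stars}
ratings = {
    "me":  {"Toy Story": 5, "Terminator": 2, "Alien": 4},
    "ann": {"Toy Story": 4, "Terminator": 5, "Up": 5, "Cars": 4},
    "bob": {"Alien": 4, "Up": 2},
    "cat": {"Toy Story": 1, "Terminator": 5, "Cars": 5, "Up": 1},
}


def similar_users(me, ratings, tolerance=1):
    """Map each other user to the movies we both rated within `tolerance` stars."""
    result = {}
    for other, theirs in ratings.items():
        if other == me:
            continue
        agreed = [movie for movie, stars in ratings[me].items()
                  if movie in theirs and abs(stars - theirs[movie]) <= tolerance]
        if agreed:
            result[other] = agreed
    return result


print(similar_users("me", ratings))  # {'ann': ['Toy Story'], 'bob': ['Alien']}
```

Ann agrees with me on Toy Story (5 vs. 4) and Bob on Alien (4 vs. 4); Cat disagrees by at least 3 stars on everything we share.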
  • Average Rating • What is the average rating (by similar users) of a movie not rated by me?
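A hedged Python sketch of this question, again over hypothetical in-memory ratings rather than the slide's (missing) Cypher: find similar users with the within-one-star rule, then average only their ratings of movies I have not rated.

```python
from statistics import mean

# Hypothetical ratings: user -> {movie: stars}
ratings = {
    "me":  {"Toy Story": 5, "Terminator": 2, "Alien": 4},
    "ann": {"Toy Story": 4, "Terminator": 5, "Up": 5, "Cars": 4},
    "bob": {"Alien": 4, "Up": 2},
    "cat": {"Toy Story": 1, "Terminator": 5, "Cars": 5, "Up": 1},
}


def similar_users(me, ratings, tolerance=1):
    """Users who rated some movie I rated, within `tolerance` stars of my rating."""
    mine = ratings[me]
    return [other for other, theirs in ratings.items()
            if other != me and any(movie in theirs and abs(stars - theirs[movie]) <= tolerance
                                   for movie, stars in mine.items())]


def average_ratings(me, ratings, tolerance=1):
    """Average rating, by similar users only, of each movie I have not rated."""
    collected = {}
    for user in similar_users(me, ratings, tolerance):
        for movie, stars in ratings[user].items():
            if movie not in ratings[me]:
                collected.setdefault(movie, []).append(stars)
    return {movie: mean(stars) for movie, stars in collected.items()}


print(average_ratings("me", ratings))  # {'Up': 3.5, 'Cars': 4}
```

Cat's 5 stars for Cars are ignored because Cat is not a similar user, which is the point of restricting the average to people who rate like me.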
  • Movies I Should See • Which movies (that I haven’t seen) have been rated 4 stars or higher by users similar to me?
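The same question as a hedged Python sketch over hypothetical in-memory ratings (not the slide's Cypher, which is missing from the transcript): keep movies I have not rated that similar users rated `min_stars` or higher, ranked by how many similar users did so.

```python
from collections import Counter

# Hypothetical ratings: user -> {movie: stars}
ratings = {
    "me":  {"Toy Story": 5, "Terminator": 2, "Alien": 4},
    "ann": {"Toy Story": 4, "Terminator": 5, "Up": 5, "Cars": 4},
    "bob": {"Alien": 4, "Up": 2},
    "cat": {"Toy Story": 1, "Terminator": 5, "Cars": 5, "Up": 1},
}


def similar_users(me, ratings, tolerance=1):
    """Users who rated some movie I rated, within `tolerance` stars of my rating."""
    mine = ratings[me]
    return [other for other, theirs in ratings.items()
            if other != me and any(movie in theirs and abs(stars - theirs[movie]) <= tolerance
                                   for movie, stars in mine.items())]


def movies_to_see(me, ratings, min_stars=4, tolerance=1):
    """Movies I have not rated that similar users rated `min_stars` or higher,
    ranked by how many similar users did so."""
    counts = Counter()
    for user in similar_users(me, ratings, tolerance):
        for movie, stars in ratings[user].items():
            if movie not in ratings[me] and stars >= min_stars:
                counts[movie] += 1
    return counts.most_common()


print(movies_to_see("me", ratings))  # [('Up', 1), ('Cars', 1)]
```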
  • Questions?
  • Thank you! http://maxdemarzi.com