When Relational Isn't Enough: Neo4j at Squidoo
A look at how we used Neo4j to power Squidoo's new Postcards product

Published in: Technology
    1. Neo4j at Seth Godin’s Squidoo, with Chief Engineer Gil Hildebrand
    2. What’s Squidoo? Passionate people sharing the ideas they care about. Social publishing platform with over 3 million users. 100MM+ pageviews per month; Quantcast ranked #35 in the US.
    3. Introducing Postcards: A brand new product from Squidoo. Currently in private beta (not public just yet). Single-page, beautifully designed personal recommendations of books, movies, music albums, quotes, and other products and media types.
    4. Semantic Web: A group of methods and technologies to allow machines to understand the meaning, or "semantics", of information.
    5. Postcards get better with the Semantic Web: We parse web pages and external APIs to extract meaning. Web pages: Meta and Open Graph tags (Title, Description, Photo, and Video). External APIs: Amazon, IMDB, Freebase, Google, YouTube, Bing, and more.
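To make the web page half of slide 5 concrete, here is a minimal sketch of pulling the title, meta tags, and Open Graph tags out of a page. It uses only Python's standard `html.parser`; the class name `MetaTagParser`, the function `extract_metadata`, and the sample HTML are assumptions made for illustration, not Squidoo's crawler.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text plus named and Open Graph <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use property="og:..."; classic tags use name="...".
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                self.meta[key.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.meta.setdefault("title", data.strip())


def extract_metadata(html):
    """Return a dict of metadata found in the given HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical sample page, used only to exercise the parser.
SAMPLE_HTML = """<html><head>
<title>Hotel California</title>
<meta property="og:title" content="Hotel California">
<meta property="og:type" content="music.album">
<meta name="description" content="1976 album by the Eagles">
</head><body></body></html>"""

metadata = extract_metadata(SAMPLE_HTML)
```

In this sample the `og:type` value is what tells the crawler the page is about music, which is the kind of signal the next slide says is needed.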
    6. Problem is normalization: The meta tag “Hotel California” on a web page is not particularly useful unless I know the tag is music related. Then I can search for music albums containing Hotel California. This is not easy, but the web as a whole is becoming more structured.
    7. Connecting the Dots: Crawl a web page or API to extract metadata. Store subjects, nouns, adjectives, and possessives into Neo. Query Neo to organize subjects into Stacks based on nouns, adjectives, and possessives.
    8. Stacking Up: Postcards are organized into Stacks. Stacks are a taxonomy based on media type and other common factors. Ex: Books Stack, Crime Novel Books Stack, Tom Clancy Books Stack. Stacks are created automatically based on metadata associated with each Postcard. A minimum of three Postcards is required for a Stack to exist.
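The Stack rule on slide 8 can be sketched in a few lines. This is an in-memory illustration only: the dictionary layout, the function `build_stacks`, and the sample Postcards are assumptions, and the real product derives Stacks from graph queries rather than from a Python loop.

```python
from collections import defaultdict

# From slide 8: a Stack needs at least three Postcards to exist.
MIN_POSTCARDS_PER_STACK = 3


def build_stacks(postcards):
    """Group subjects into candidate Stacks and keep those with enough Postcards.

    A Stack key is either (noun,) for a media-type Stack such as Books, or
    (label, noun) for a narrower Stack such as Tom Clancy Books.
    """
    candidates = defaultdict(list)
    for card in postcards:
        for noun in card["nouns"]:
            candidates[(noun,)].append(card["subject"])
            for label in card["adjectives"] + card["possessives"]:
                candidates[(label, noun)].append(card["subject"])
    return {
        key: subjects
        for key, subjects in candidates.items()
        if len(subjects) >= MIN_POSTCARDS_PER_STACK
    }


# Hypothetical sample data.
postcards = [
    {"subject": "The Hunt for Red October", "nouns": ["Book"],
     "adjectives": ["Thriller"], "possessives": ["Tom Clancy"]},
    {"subject": "Patriot Games", "nouns": ["Book"],
     "adjectives": ["Thriller"], "possessives": ["Tom Clancy"]},
    {"subject": "Clear and Present Danger", "nouns": ["Book"],
     "adjectives": ["Thriller"], "possessives": ["Tom Clancy"]},
    {"subject": "The Maltese Falcon", "nouns": ["Book"],
     "adjectives": ["Crime Novel"], "possessives": ["Dashiell Hammett"]},
]

stacks = build_stacks(postcards)
```

With this data the Books, Thriller Books, and Tom Clancy Books Stacks exist, while Crime Novel Books does not yet, because only one Postcard carries that adjective.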
    9. Modeling Taxonomy: Found that the “Parts of Speech” are a great way to model Postcards taxonomy. All Postcards have: name of the item (subject), domains or media types (nouns), descriptors (adjectives), and owners or creators (possessives).
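One way to picture the parts-of-speech model is as a record that expands into graph edges. The sketch below is an assumption-laden illustration: the `Postcard` dataclass and `to_edges` method are hypothetical, the `IS` and `BY` relationship names are borrowed from the Cypher queries later in the deck, and `DESCRIBED_AS` is invented here because the deck does not name the subject-to-adjective relationship.

```python
from dataclasses import dataclass, field


@dataclass
class Postcard:
    """A Postcard described by the four parts of speech from slide 9."""
    subject: str                                      # name of the item
    nouns: list = field(default_factory=list)         # domains or media types
    adjectives: list = field(default_factory=list)    # descriptors
    possessives: list = field(default_factory=list)   # owners or creators

    def to_edges(self):
        """Return (start, relationship, end) triples for this Postcard."""
        edges = [(self.subject, "IS", noun) for noun in self.nouns]
        # DESCRIBED_AS is a hypothetical relationship name.
        edges += [(self.subject, "DESCRIBED_AS", adj) for adj in self.adjectives]
        edges += [(self.subject, "BY", owner) for owner in self.possessives]
        return edges


card = Postcard(
    subject="Goodfellas",
    nouns=["Movie"],
    adjectives=["Crime"],
    possessives=["Martin Scorsese"],
)
edges = card.to_edges()
```

Each triple maps naturally onto a node, a typed relationship, and another node, which is why the model fits a graph database so directly.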
    10. Parts of Speech
    11. Modeling with our existing DB platforms
    12. Very familiar with MySQL. Extremely reliable. Relational model makes normalization possible, but scaling is a concern as joins get larger and larger.
    13. Schema and Queries
        Schema:
           CREATE TABLE post_meta (
             post_id BIGINT,
             user_id VARCHAR,
             date_created SMALLINT,
             subject VARCHAR,
             noun VARCHAR,
             KEY (user_id),
             KEY (date_created),
             KEY (subject),
             KEY (noun)
           );
           CREATE TABLE adjectives (
             post_id BIGINT,
             user_id VARCHAR,
             adjective VARCHAR,
             PRIMARY KEY (user_id, adjective),
             KEY (adjective)
           );
           CREATE TABLE possessives (
             post_id BIGINT,
             user_id VARCHAR,
             possessive VARCHAR,
             PRIMARY KEY (user_id, possessive),
             KEY (possessive)
           );
        Query, Seth Godin’s Business Books:
           SELECT m.post_id FROM post_meta m
           JOIN possessives USING(user_id)
           JOIN adjectives USING(user_id)
           WHERE possessive='Seth Godin'
             AND adjective='Business'
             AND noun='Book';
        Query, 90s Rock Music Albums:
           SELECT m.post_id FROM post_meta m
           JOIN adjectives USING(user_id)
           WHERE adjective='Rock'
             AND noun='Music'
             AND date_created BETWEEN 1990 AND 1999;
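To see why every extra part of speech costs another join in the relational model, here is a small runnable sketch using Python's built-in `sqlite3`. It is a simplification rather than the schema on the slide: the tables are trimmed, the joins are on `post_id` (an assumption made for this example), and the sample rows and the helper `find_posts` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_meta (
    post_id INTEGER PRIMARY KEY,
    user_id TEXT,
    date_created INTEGER,
    subject TEXT,
    noun TEXT
);
CREATE TABLE adjectives (post_id INTEGER, adjective TEXT);
CREATE TABLE possessives (post_id INTEGER, possessive TEXT);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO post_meta VALUES (?, ?, ?, ?, ?)",
    [
        (1, "u1", 2008, "Tribes", "Book"),
        (2, "u1", 1991, "Nevermind", "Music"),
        (3, "u2", 2003, "Purple Cow", "Book"),
    ],
)
conn.executemany(
    "INSERT INTO adjectives VALUES (?, ?)",
    [(1, "Business"), (2, "Rock"), (3, "Business")],
)
conn.executemany(
    "INSERT INTO possessives VALUES (?, ?)",
    [(1, "Seth Godin"), (2, "Nirvana"), (3, "Seth Godin")],
)


def find_posts(noun, adjective, possessive):
    """Return post IDs matching all three parts of speech (one join per part)."""
    rows = conn.execute(
        """
        SELECT m.post_id
        FROM post_meta m
        JOIN adjectives a ON a.post_id = m.post_id
        JOIN possessives p ON p.post_id = m.post_id
        WHERE m.noun = ? AND a.adjective = ? AND p.possessive = ?
        ORDER BY m.post_id
        """,
        (noun, adjective, possessive),
    ).fetchall()
    return [row[0] for row in rows]


seth_godin_business_books = find_posts("Book", "Business", "Seth Godin")
```

Adding a fourth dimension to a Stack means a fourth table and a third join, which is the scaling concern raised on slide 12.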
    14. At Squidoo, used primarily for analytics. Massively scalable, but no relational model or aggregation features. Heavy denormalization required. Many operations have to be performed asynchronously using queues or batch processes.
    15. Truly Relational: Our data model is very much a graph problem. Recommendation systems are one query away (easy!). Meets all our tech requirements.
    16. Week One with Neo
    17. Evaluating Tech Requirements: High availability. Great administrative tools. Great PHP wrapper (https://github.com/jadell/neo4jphp). Commercial support.
    18. Learning to think in graphs was HARD, but now feels NATURAL: Should it be a node or a property? Which direction should the relationship point? More so than any other type of database I’ve encountered, graph DBs require you to know in advance exactly what queries you’ll need to perform.
    19. Reviewing Sample Graphs (It Helps): Official Examples: http://bit.ly/RzCDY9 5 Common Graphs: http://slidesha.re/cnomwz Movies: http://bitly.com/QZbGw0
    20. Designing with paper or flow chart
    21. Learning PHP wrapper
    22. First Prototype: Basic HTML. REST API only. Easy to get started, but the real power comes from Cypher.
    23. Extending the Prototype with Cypher: Implement Cypher for recommendations and other traversals. Cypher looks intimidating at first, and the “it’s like SQL” analogy was not particularly helpful for me. However, Cypher is essential for using Neo’s most powerful features, and is worth learning. Once you get past the strange (but necessary) arrow syntax, it does start to feel like SQL.
    24. 3 Graph Design Tips
    25. Tip #1: Use reference nodes
           START ref=node:Meta(title = "Actor")
           MATCH ref<-[:IS]-actor
           RETURN actor;
    26. Tip #2: Use reference properties
           foreach ($posts as $post) {
             if ($post->getProperty('type') == 'Actor') {
               // do something special for actors
             }
           }
    27. Tip #3: Schema Changes. At first, there were a lot of schema changes during development. No equivalent to MySQL’s ALTER TABLE or TRUNCATE TABLE. Two options: shut down Neo, rm -rf data/graph.db/*, and restart; or use this plugin: http://bitly.com/rHFSu6 (with the plugin, node IDs do not restart from zero).
    28. Tip #3.1: Schema Changes. Wiped your DB and need to start over? Use an initialization script to set things up.
           function initialize() {
             // Node 0 becomes the master reference node
             $master = $this->client->getNode(0);
             $master->setProperty('title', 'Master')->setProperty('parent', '')->save();

             // should be node 1
             $user_master = $this->client->makeNode();
             $user_master->save();

             // Indexes for users and posts
             $user_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'users');
             $user_index->save();
             $post_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'post');
             $post_index->save();

             // One reference node and one index per noun, each linked to the master node
             $index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'master');
             $nouns = array('Movie', 'Music', 'TV', 'Book', 'Video', 'Article', 'Photo', 'Product', 'Game', 'Squidoo');
             foreach ($nouns as $noun) {
               $node = $this->client->makeNode();
               $node->setProperty('title', $noun)->setProperty('type', 'master')->save();
               $index->add($node, 'noun', $noun);
               $index->save();
               $node->relateTo($master, 'IS')->save();
               $noun_index = new Everyman\Neo4j\Index\NodeIndex($this->client, $noun);
               $noun_index->save();
             }
           }
    29. Postcards Demo
    30. Homepage
    31. A Single Postcard
    32. Nouns: “Noun” is our word for the domain or media type associated with a Postcard.
    33. Movie Noun: Just one example. We have books, music albums, products, and many others!
    34. Single User’s Stack about Director Martin Scorsese
    35. Single User’s Stack about Director Martin Scorsese
           START user=node({user_id})
           MATCH user-[:POSTED]->post-[:POST]->subject-[:`BY`]->possessive
           WHERE possessive.title={meta} AND subject.type={noun}
           RETURN DISTINCT post, COLLECT(subject) as subject;
        Parameters: {user_id} = 123, {meta} = Martin Scorsese, {noun} = Movie
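The `{user_id}`, `{meta}`, and `{noun}` placeholders on slide 35 are query parameters, sent separately from the query text. As a rough sketch, the JSON body for a parameterized Cypher call could be assembled as below. It assumes the legacy REST Cypher endpoint of that Neo4j generation (a POST to `/db/data/cypher` with `query` and `params` keys); the helper `build_cypher_request` is hypothetical, and nothing is actually sent over the network here.

```python
import json

# Query text from slide 35, with placeholders left intact.
STACK_QUERY = (
    "START user=node({user_id}) "
    "MATCH user-[:POSTED]->post-[:POST]->subject-[:`BY`]->possessive "
    "WHERE possessive.title={meta} AND subject.type={noun} "
    "RETURN DISTINCT post, COLLECT(subject) AS subject"
)


def build_cypher_request(query, **params):
    """Return the JSON body for a parameterized Cypher request.

    Values travel in "params" rather than being spliced into the query
    string, so quoting and injection are not a concern.
    """
    return json.dumps({"query": query, "params": params})


body = build_cypher_request(
    STACK_QUERY, user_id=123, meta="Martin Scorsese", noun="Movie"
)
```

Keeping the query text constant and varying only the parameters also lets the server reuse its parsed query plan across requests.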
    36. Finding Stacks for a Postcard
           START post=node:post(post_id={post_id})
           MATCH post-[:POST]->subject-->adjective-[:IS]->parent
           RETURN subject, adjective, parent;
    37. Finding a user’s “Liked” Postcards
           START user=node({user_id})
           MATCH user-[:LIKED]->post-[:POST]->subject
           RETURN DISTINCT post, COLLECT(subject) as subject;
    38. Popularity Sorting: Popularity is based on Likes, Comments, and other social signals, using a time decay factor to favor newer Postcards. It was difficult to find an algorithm that allowed us to support time decay without having to constantly re-score all Postcards. Long story short, we use Cypher’s ORDER BY for sorting. We perform a calculation based on pop_score and pop_date properties that exist in each Postcard node. An individual Postcard’s pop_score and pop_date are updated in real time when someone interacts with it.
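The deck does not give the actual formula, so the following is only one plausible way to get time decay without re-scoring: rank by the logarithm of the score plus a scaled timestamp, so that stored values never need to change as time passes. The constant `DECAY_SECONDS`, the functions `rank` and `record_interaction`, and the sample data are all assumptions for illustration; only the property names `pop_score` and `pop_date` come from the slide.

```python
import math

# Hypothetical tuning constant: 45000 seconds of recency is worth a 10x score.
DECAY_SECONDS = 45000.0


def rank(pop_score, pop_date):
    """Sort key computed only from properties stored on the Postcard.

    Newer pop_date values add a larger bonus, so older Postcards sink
    relative to newer ones without any stored value being rewritten.
    """
    return math.log10(max(pop_score, 1)) + pop_date / DECAY_SECONDS


def record_interaction(card, weight, now):
    """Update one Postcard in place when someone likes or comments on it."""
    card["pop_score"] += weight
    card["pop_date"] = now


# Hypothetical Postcards; pop_date is in seconds since an arbitrary epoch.
cards = [
    {"name": "old_hit", "pop_score": 100, "pop_date": 0},
    {"name": "new_modest", "pop_score": 10, "pop_date": 90000},
    {"name": "new_quiet", "pop_score": 1, "pop_date": 45000},
]

ranked = sorted(
    cards, key=lambda c: rank(c["pop_score"], c["pop_date"]), reverse=True
)
ranked_names = [c["name"] for c in ranked]
```

Because the sort key depends only on each node's own properties, the same calculation can sit in an ORDER BY clause, and a single interaction touches a single node.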
    39. Next Steps: Follow Users and Stacks (Activity Stream). Load Balancing. Disambiguation.
    40. The End. Gil Hildebrand, gil@squidoo.com