When Relational Isn't Enough: Neo4j at Squidoo

A look at how we used Neo4j to power Squidoo's new Postcards product


Presentation Transcript

  • Neo4j at Seth Godin’s Squidoo, with Chief Engineer Gil Hildebrand
  • What’s Squidoo?
    Passionate people sharing the ideas they care about.
    Social publishing platform with over 3 million users.
    100 million+ pageviews per month; Quantcast ranked #35 in the US.
  • Introducing Postcards
    A brand new product from Squidoo.
    Currently in private beta (not public just yet).
    Single-page, beautifully designed personal recommendations of books, movies, music albums, quotes, and other products and media types.
  • Semantic Web
    A group of methods and technologies that allow machines to understand the meaning, or "semantics", of information.
  • Postcards get better with the Semantic Web
    We parse web pages and external APIs to extract meaning.
    Web pages: Meta and Open Graph tags (Title, Description, Photo, and Video).
    External APIs: Amazon, IMDB, Freebase, Google, YouTube, Bing, and more.
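A minimal sketch of the web-page half of that extraction, using only Python's standard library. The class name, function name, and sample page are illustrative assumptions; the slides do not show Squidoo's actual crawler.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text plus <meta name=...> and <meta property=...> values."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                self.meta[key.lower()] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.meta.setdefault("title", data.strip())


def extract_metadata(html):
    """Return title, description, photo, video, and type for one page."""
    parser = MetaTagParser()
    parser.feed(html)
    m = parser.meta
    # Prefer Open Graph values and fall back to plain HTML tags.
    return {
        "title": m.get("og:title") or m.get("title"),
        "description": m.get("og:description") or m.get("description"),
        "photo": m.get("og:image"),
        "video": m.get("og:video"),
        "type": m.get("og:type"),
    }


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
  <title>Hotel California - Eagles</title>
  <meta name="description" content="1976 album by the Eagles">
  <meta property="og:title" content="Hotel California">
  <meta property="og:type" content="music.album">
  <meta property="og:image" content="http://example.com/cover.jpg">
</head><body></body></html>
"""

info = extract_metadata(SAMPLE_PAGE)
print(info["title"], "|", info["type"])
```

When a page declares `og:type`, the domain of the item comes for free; when it does not, only the bare title is left and the domain has to be inferred some other way.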
  • Problem is normalization
    The meta tag “Hotel California” on a web page is not particularly useful unless I know the tag is music related. Then I can search for music albums containing Hotel California.
    This is not easy, but the web as a whole is becoming more structured.
  • Connecting the Dots
    Crawl a web page or API to extract metadata.
    Store subjects, nouns, adjectives, and possessives into Neo.
    Query Neo to organize subjects into Stacks based on nouns, adjectives, and possessives.
  • Stacking Up
    Postcards are organized into Stacks. Stacks are a taxonomy based on media type and other common factors.
    Ex: Books Stack, Crime Novel Books Stack, Tom Clancy Books Stack.
    Stacks are created automatically based on metadata associated with each Postcard.
    A minimum of three Postcards is required for a Stack to exist.
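The Stack rule can be sketched in a few lines of Python. The three-Postcard minimum comes from the slide; the dictionary layout, the naive "add an s" pluralization, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

MIN_POSTCARDS_PER_STACK = 3  # from the slide: a Stack needs at least three Postcards


def candidate_stacks(postcard):
    """Yield every Stack name a Postcard could belong to."""
    noun = postcard["noun"]
    yield f"{noun}s"  # naive pluralization, an assumption
    for adjective in postcard.get("adjectives", []):
        yield f"{adjective} {noun}s"
    for possessive in postcard.get("possessives", []):
        yield f"{possessive} {noun}s"


def build_stacks(postcards):
    """Group subjects by Stack name and drop Stacks below the minimum size."""
    stacks = defaultdict(list)
    for postcard in postcards:
        for name in candidate_stacks(postcard):
            stacks[name].append(postcard["subject"])
    return {
        name: subjects
        for name, subjects in stacks.items()
        if len(subjects) >= MIN_POSTCARDS_PER_STACK
    }


# Hypothetical Postcards used only for illustration.
postcards = [
    {"subject": "The Hunt for Red October", "noun": "Book",
     "adjectives": ["Thriller"], "possessives": ["Tom Clancy"]},
    {"subject": "Patriot Games", "noun": "Book",
     "adjectives": ["Thriller"], "possessives": ["Tom Clancy"]},
    {"subject": "Clear and Present Danger", "noun": "Book",
     "adjectives": ["Thriller"], "possessives": ["Tom Clancy"]},
    {"subject": "The Big Sleep", "noun": "Book",
     "adjectives": ["Crime Novel"], "possessives": ["Raymond Chandler"]},
]

stacks = build_stacks(postcards)
print(sorted(stacks))
```

With this data the "Crime Novel Books" and "Raymond Chandler Books" Stacks hold one Postcard each, so they do not exist yet; they would appear once two more matching Postcards are posted.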
  • Modeling Taxonomy
    We found that the “Parts of Speech” are a great way to model Postcards taxonomy.
    All Postcards have:
      Name of the item (subject)
      Domains or media types (nouns)
      Descriptors (adjectives)
      Owners or creators (possessives)
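One way to picture the parts-of-speech model is as a record that flattens into graph relationships. The sketch below is an assumption-heavy illustration: the `POSTED`, `POST`, `BY`, and `IS` relationship types appear in the Cypher queries later in the deck, but `HAS_ADJECTIVE`, the node naming scheme, and the choice to point each adjective at its noun are made up here.

```python
from dataclasses import dataclass, field


@dataclass
class Postcard:
    post_id: int
    user_id: int
    subject: str                                      # name of the item
    noun: str                                         # domain or media type
    adjectives: list = field(default_factory=list)    # descriptors
    possessives: list = field(default_factory=list)   # owners or creators


def to_relationships(card):
    """Flatten a Postcard into (start, TYPE, end) triples."""
    user = f"user:{card.user_id}"
    post = f"post:{card.post_id}"
    subject = f"subject:{card.subject}"
    triples = [(user, "POSTED", post), (post, "POST", subject)]
    for adjective in card.adjectives:
        # HAS_ADJECTIVE is a hypothetical name; the slides leave this type unnamed.
        triples.append((subject, "HAS_ADJECTIVE", f"adjective:{adjective}"))
        # Assumption: an adjective's parent is the noun it qualifies.
        triples.append((f"adjective:{adjective}", "IS", f"noun:{card.noun}"))
    for possessive in card.possessives:
        triples.append((subject, "BY", f"possessive:{possessive}"))
    return triples


card = Postcard(1, 123, "Taxi Driver", "Movie", ["Drama"], ["Martin Scorsese"])
triples = to_relationships(card)
for triple in triples:
    print(triple)
```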
  • Parts of Speech
  • Modeling with our existing DB platforms
  • Very familiar with MySQL. Extremely reliable. Relational model makes normalization possible, but scaling is a concern as joins get larger and larger.
  • Schema
      CREATE TABLE post_meta (
        post_id BIGINT,
        user_id VARCHAR,
        date_created SMALLINT,
        subject VARCHAR,
        noun VARCHAR,
        KEY (user_id),
        KEY (date_created),
        KEY (subject),
        KEY (noun)
      );
      CREATE TABLE adjectives (
        post_id BIGINT,
        user_id VARCHAR,
        adjective VARCHAR,
        PRIMARY KEY (user_id, adjective),
        KEY (adjective)
      );
      CREATE TABLE possessives (
        post_id BIGINT,
        user_id VARCHAR,
        possessive VARCHAR,
        PRIMARY KEY (user_id, possessive),
        KEY (possessive)
      );
    Queries
      Seth Godin’s Business Books:
      SELECT m.post_id FROM post_meta m
      JOIN possessives USING(user_id)
      JOIN adjectives USING(user_id)
      WHERE possessive='Seth Godin'
        AND adjective='Business'
        AND noun='Book';
      90s Rock Music Albums:
      SELECT m.post_id FROM post_meta m
      JOIN adjectives USING(user_id)
      WHERE adjective='Rock'
        AND noun='Music'
        AND date_created BETWEEN 1990 AND 1999;
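To see the join pattern run, here is a small sketch using Python's built-in `sqlite3` in place of MySQL. It is not the schema from the slide: the tables are trimmed to the columns the query needs, the joins are on `post_id` (an assumption about how rows relate to posts), and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_meta (post_id INTEGER PRIMARY KEY, user_id TEXT,
                        date_created INTEGER, subject TEXT, noun TEXT);
CREATE TABLE adjectives (post_id INTEGER, adjective TEXT);
CREATE TABLE possessives (post_id INTEGER, possessive TEXT);
""")
conn.executemany("INSERT INTO post_meta VALUES (?, ?, ?, ?, ?)", [
    (1, "gil", 2008, "Tribes", "Book"),
    (2, "gil", 1999, "Permission Marketing", "Book"),
    (3, "gil", 1991, "Nevermind", "Music"),
])
conn.executemany("INSERT INTO adjectives VALUES (?, ?)", [
    (1, "Business"), (2, "Business"), (3, "Rock"),
])
conn.executemany("INSERT INTO possessives VALUES (?, ?)", [
    (1, "Seth Godin"), (2, "Seth Godin"), (3, "Nirvana"),
])


def stack_posts(possessive, adjective, noun):
    """Return post ids matching one possessive, one adjective, and one noun."""
    rows = conn.execute(
        """
        SELECT m.post_id
        FROM post_meta m
        JOIN possessives p ON p.post_id = m.post_id
        JOIN adjectives a ON a.post_id = m.post_id
        WHERE p.possessive = ? AND a.adjective = ? AND m.noun = ?
        ORDER BY m.post_id
        """,
        (possessive, adjective, noun),
    )
    return [row[0] for row in rows]


print(stack_posts("Seth Godin", "Business", "Book"))
```

Each additional facet in a Stack name means one more join against a table that grows with every Postcard, which is the scaling concern raised for the relational model.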
  • At Squidoo, used primarily for analytics.
    Massively scalable, but no relational model or aggregation features. Heavy denormalization required.
    Many operations have to be performed asynchronously using queues or batch processes.
  • Truly Relational
    Our data model is very much a graph problem.
    Recommendation systems are one query away (easy!).
    Meets all our tech requirements.
  • Week One with Neo
  • Evaluating Tech Requirements
    High availability
    Great administrative tools
    Great PHP wrapper: https://github.com/jadell/neo4jphp
    Commercial support
  • Learning to think in graphs was HARD, but now feels NATURAL
    Should it be a node or a property?
    Which direction should the relationship point?
    More so than any other type of database I’ve encountered, graph DBs require you to know in advance exactly what queries you’ll need to perform.
  • Reviewing Sample Graphs (It Helps)
    Official Examples: http://bit.ly/RzCDY95
    Common Graphs: http://slidesha.re/cnomwz
    Movies: http://bitly.com/QZbGw0
  • Designing with paper or flow chart
  • Learning PHP wrapper
  • First Prototype
    Basic HTML
    REST API only
    Easy to get started, but the real power comes from Cypher
  • Extending the Prototype with Cypher
    Implement Cypher for recommendations and other traversals.
    Cypher looks intimidating at first, and the “it’s like SQL” analogy was not particularly helpful for me. However, Cypher is essential for using Neo’s most powerful features, and is worth learning. Once you get past the strange (but necessary) arrow syntax, it does start to feel like SQL.
  • 3 Graph Design Tips
  • Tip #1: Use reference nodes
      START ref=node:Meta(title = "Actor")
      MATCH ref<-[:IS]-actor
      RETURN actor;
  • Tip #2: Use reference properties
      foreach ($posts as $post) {
          if ($post->getProperty('type') == 'Actor') {
              // do something special for actors
          }
      }
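The two tips answer the same question ("which nodes are actors?") in different ways, which a toy in-memory graph can make concrete. This is not a Neo4j client; `TinyGraph` and the sample nodes are hypothetical, and the sketch assumes each actor node carries both an `IS` relationship to a reference node and a `type` property.

```python
from collections import defaultdict


class TinyGraph:
    """A toy in-memory property graph, for illustration only."""

    def __init__(self):
        self.props = {}                     # node id -> properties
        self.incoming = defaultdict(list)   # (target id, rel type) -> source ids

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        return node_id

    def relate(self, source, rel_type, target):
        self.incoming[(target, rel_type)].append(source)

    def sources(self, target, rel_type):
        """Nodes that point at `target` through `rel_type`."""
        return list(self.incoming.get((target, rel_type), []))


g = TinyGraph()
actor_ref = g.add_node("ref:Actor", title="Actor")
director_ref = g.add_node("ref:Director", title="Director")
for name, kind, ref in [
    ("Robert De Niro", "Actor", actor_ref),
    ("Leonardo DiCaprio", "Actor", actor_ref),
    ("Martin Scorsese", "Director", director_ref),
]:
    g.add_node(name, type=kind)
    g.relate(name, "IS", ref)

# Tip #1: start at the reference node and follow incoming IS relationships.
via_reference_node = g.sources(actor_ref, "IS")

# Tip #2: check the type property on nodes you already have in hand.
via_property = [n for n, p in g.props.items() if p.get("type") == "Actor"]

print(via_reference_node)
print(via_property)
```

The reference node gives a query a cheap starting point, while the property lets application code branch on a node's kind without another round trip.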
  • Tip #3: Schema Changes
    At first, there were a lot of schema changes during development.
    There is no equivalent to MySQL’s ALTER TABLE or TRUNCATE TABLE.
    Two options:
      Shut down Neo, rm -rf data/graph.db/*, and restart
      Or use this plugin: http://bitly.com/rHFSu6
    With the plugin, node IDs do not restart from zero.
  • Tip #3.1: Schema Changes
    Wiped your DB and need to start over? Use an initialization script to set things up.
      function initialize() {
          // node 0 is the master reference node
          $master = $this->client->getNode(0);
          $master->setProperty('title', 'Master')->setProperty('parent', '')->save();
          // should be node 1
          $user_master = $this->client->makeNode();
          $user_master->save();
          $user_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'users');
          $user_index->save();
          $post_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'post');
          $post_index->save();
          $index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'master');
          $nouns = array('Movie', 'Music', 'TV', 'Book', 'Video', 'Article', 'Photo', 'Product', 'Game', 'Squidoo');
          // one reference node and one index per noun
          foreach ($nouns as $noun) {
              $node = $this->client->makeNode();
              $node->setProperty('title', $noun)->setProperty('type', 'master')->save();
              $index->add($node, 'noun', $noun);
              $index->save();
              $node->relateTo($master, 'IS')->save();
              $noun_index = new Everyman\Neo4j\Index\NodeIndex($this->client, $noun);
              $noun_index->save();
          }
      }
  • Postcards Demo
  • Homepage
  • A Single Postcard
  • Nouns
    “Noun” is our word for the domain or media type associated with a Postcard.
  • Movie Noun
    Just one example. We have books, music albums, products, and many others!
  • Single User’s Stack about Director Martin Scorsese
  • Single User’s Stack about Director Martin Scorsese
      START user=node({user_id})
      MATCH user-[:POSTED]->post-[:POST]->subject-[:`BY`]->possessive
      WHERE possessive.title={meta} AND subject.type={noun}
      RETURN DISTINCT post, COLLECT(subject) as subject;
    Parameters:
      {user_id} = 123
      {meta} = Martin Scorsese
      {noun} = Movie
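For readers new to Cypher, it may help to read that pattern as three nested hops with two filters. The Python below walks the same path over a hand-built adjacency map; the node ids, the `kind` property, and the sample movies are assumptions, and nothing here talks to Neo4j.

```python
from collections import defaultdict

# Hypothetical nodes: one user, two posts, two subjects, two possessives.
nodes = {
    123: {"kind": "user"},
    1: {"kind": "post"},
    2: {"kind": "post"},
    10: {"title": "Taxi Driver", "type": "Movie"},
    11: {"title": "Jaws", "type": "Movie"},
    20: {"title": "Martin Scorsese"},
    21: {"title": "Steven Spielberg"},
}
edges = [
    (123, "POSTED", 1), (123, "POSTED", 2),
    (1, "POST", 10), (2, "POST", 11),
    (10, "BY", 20), (11, "BY", 21),
]

# Outgoing adjacency: (source id, relationship type) -> target ids.
out = defaultdict(list)
for source, rel, target in edges:
    out[(source, rel)].append(target)


def user_stack(user_id, meta, noun):
    """Group matching subject titles by post, like RETURN DISTINCT post, COLLECT(subject)."""
    result = defaultdict(list)
    for post in out.get((user_id, "POSTED"), []):
        for subject in out.get((post, "POST"), []):
            if nodes[subject].get("type") != noun:
                continue
            for possessive in out.get((subject, "BY"), []):
                if nodes[possessive].get("title") == meta:
                    result[post].append(nodes[subject]["title"])
    return dict(result)


stack = user_stack(123, "Martin Scorsese", "Movie")
print(stack)
```

Each `-[:TYPE]->` arrow in the `MATCH` clause corresponds to one loop, and the `WHERE` clause corresponds to the two `if` checks.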
  • Finding Stacks for a Postcard
      START post=node:post(post_id={post_id})
      MATCH post-[:POST]->subject-->adjective-[:IS]->parent
      RETURN subject, adjective, parent;
  • Finding a user’s “Liked” Postcards
      START user=node({user_id})
      MATCH user-[:LIKED]->post-[:POST]->subject
      RETURN DISTINCT post, COLLECT(subject) as subject;
  • Popularity Sorting
    Popularity is based on Likes, Comments, and other social signals, using a time decay factor to favor newer Postcards.
    It was difficult to find an algorithm that allowed us to support time decay without having to constantly re-score all Postcards.
    Long story short, we use Cypher’s ORDER BY for sorting. We perform a calculation based on pop_score and pop_date properties that exist in each Postcard node.
    An individual Postcard’s pop_score and pop_date are updated in real time when someone interacts with it.
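The slides do not give the actual formula, so the following is only one way such a scheme could work, assuming exponential decay. The half-life, the signal weights, and the function names are all hypothetical; only the `pop_score` and `pop_date` property names come from the deck.

```python
import math

HALF_LIFE_DAYS = 7.0                       # hypothetical: a score halves every week
DECAY = math.log(2) / HALF_LIFE_DAYS
WEIGHTS = {"like": 1.0, "comment": 2.0}    # hypothetical signal weights


def record_interaction(postcard, kind, now_days):
    """Decay the stored score up to now, then add the new signal."""
    elapsed = now_days - postcard["pop_date"]
    decayed = postcard["pop_score"] * math.exp(-DECAY * elapsed)
    postcard["pop_score"] = decayed + WEIGHTS[kind]
    postcard["pop_date"] = now_days


def sort_key(postcard):
    """Rank that does not depend on the current time: ln(pop_score) + DECAY * pop_date."""
    if postcard["pop_score"] <= 0:
        return float("-inf")
    return math.log(postcard["pop_score"]) + DECAY * postcard["pop_date"]


# An older Postcard with a high stored score, and a new one with a single comment.
old = {"title": "old", "pop_score": 8.0, "pop_date": 0.0}
new = {"title": "new", "pop_score": 0.0, "pop_date": 0.0}
record_interaction(new, "comment", now_days=21.0)

ranked = sorted([old, new], key=sort_key, reverse=True)
print([card["title"] for card in ranked])
```

Because every score decays at the same rate, the current time cancels out of the comparison: only the Postcard that was touched needs updating, and the sort expression can be computed from the two stored properties alone. After three half-lives the old Postcard's 8.0 is worth 1.0, so the new Postcard with 2.0 ranks first.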
  • Next Steps
    Follow Users and Stacks (Activity Stream)
    Load Balancing
    Disambiguation
  • The End
    Gil Hildebrand
    gil@squidoo.com