
Bootstrapping Recommendations OSCON 2015



Building Recommendation Engines with Neo4j



  1. Bootstrapping Recommendations with Neo4j, OSCON
  2. About Me
     • Max De Marzi - Neo4j Field Engineer
     • My Blog:
     • Find me on Twitter: @maxdemarzi
     • Email me:
     • GitHub:
  3. Big Data - What is it good for?
     • Absolutely nothing!
     • Benchmarks: Is this performing better than that? Yes. Why? Uh...
     • Recommendations: You should buy this right now.
     • Predictions: You will probably buy this.
  4. Top 10 Recommendations
     • Popularity
     • The naive approach
     • One size fits most
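A popularity-based "Top 10" can be sketched in a few lines. This is a minimal illustration, assuming ratings arrive as hypothetical `(user, item, score)` tuples; popularity here simply means "number of ratings", which is one of several possible definitions.

```python
from collections import Counter

def top_n_popular(ratings, n=10):
    """Return the n items with the most ratings.

    `ratings` is a list of (user, item, score) tuples; the score is ignored
    because popularity here just means "how many people rated it".
    """
    counts = Counter(item for _user, item, _score in ratings)
    # Ties keep first-seen order, since Counter preserves insertion order.
    return [item for item, _count in counts.most_common(n)]

ratings = [
    ("ann", "Titanic", 8), ("bob", "Titanic", 6), ("cat", "Titanic", 9),
    ("ann", "Toy Story", 9), ("bob", "Toy Story", 7),
    ("cat", "Terminator", 5),
]
print(top_n_popular(ratings, n=2))  # ['Titanic', 'Toy Story']
```

Every user gets the same list regardless of who they are, which is exactly the "one size fits most" weakness.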
  5. Naive Approach
     I’m getting little Timmy some “Cards Against Humanity”.
  6. Content-Based Recommendations
     • Step 1: Collect item characteristics
     • Step 2: Find similar items
     • Step 3: Recommend similar items
     • Example: Similar movie genres
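The three content-based steps can be sketched with genre sets as the item characteristics. The similarity measure here is Jaccard overlap, an assumption chosen for simplicity (the deck does not say which measure it uses), and the `genres` data is hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(item, genres, n=3):
    """Rank other items by genre overlap with `item` (content-based filtering)."""
    target = genres[item]  # Step 1: the item's characteristics
    scored = [(jaccard(target, g), other) for other, g in genres.items() if other != item]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # Step 2: most similar first, then by title
    return [other for score, other in scored[:n] if score > 0]  # Step 3: recommend them

genres = {
    "Toy Story": {"Animation", "Comedy", "Family"},
    "Shrek": {"Animation", "Comedy", "Fantasy"},
    "Finding Nemo": {"Animation", "Family"},
    "Terminator": {"Action", "Sci-Fi"},
}
print(similar_items("Toy Story", genres))  # ['Finding Nemo', 'Shrek']
```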
  7. There is more to life than Romantic Zombie-coms
  8. Collaborative Filtering Recommendations
     • Step 1: Collect user behavior
     • Step 2: Find similar users
     • Step 3: Recommend behavior taken by similar users
     • Example: People with similar musical tastes
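A minimal user-based collaborative filtering sketch of the three steps, assuming behavior is recorded as hypothetical "liked" sets. Similarity is taken to be the count of shared likes, a deliberately simple stand-in for the cosine similarity used later in the deck.

```python
from collections import Counter

def recommend_from_similar_users(user, likes, n=5):
    """User-based collaborative filtering over simple "liked" sets.

    Similarity between two users is the number of items both liked; each
    candidate item is scored by summing the similarity of every user who liked it.
    """
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # Step 2: how similar is this user?
        if overlap == 0:
            continue
        for item in theirs - mine:  # Step 3: behavior I haven't taken yet
            scores[item] += overlap
    return [item for item, _ in scores.most_common(n)]

likes = {  # Step 1: collected user behavior
    "max": {"Metallica", "Slayer"},
    "ann": {"Metallica", "Slayer", "Megadeth"},
    "bob": {"Metallica", "Anthrax"},
    "cat": {"Taylor Swift"},
}
print(recommend_from_similar_users("max", likes))  # ['Megadeth', 'Anthrax']
```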
  9. You are so original!
  10. Using Relationships for Recommendations
     • Content-based filtering: Recommend items based on what users have liked in the past
     • Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others
     [Diagram: a Person RATED a Movie (rating: 7); two Persons connected by a SIMILARITY relationship (value: .92)]
  11. Hybrid Recommendations
     • Combine the two for better results
     • Like peanut butter and jelly
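One common way to combine the two approaches is a weighted blend of their scores. This sketch assumes each approach already yields a score dictionary on a comparable 0 to 1 scale; the `weight` parameter and the example scores are hypothetical, and the deck does not specify how its hybrid combines them.

```python
def hybrid_scores(content, collaborative, weight=0.5):
    """Blend two score dictionaries: weight * content + (1 - weight) * collaborative.

    Items missing from one source get 0.0 from that source.
    """
    items = content.keys() | collaborative.keys()
    return {
        item: weight * content.get(item, 0.0) + (1 - weight) * collaborative.get(item, 0.0)
        for item in items
    }

content = {"Shrek": 0.5, "Finding Nemo": 0.8}
collaborative = {"Shrek": 1.0, "Up": 0.6}
blended = hybrid_scores(content, collaborative, weight=0.5)
print(sorted(blended.items(), key=lambda kv: -kv[1]))
```

Items that score well in both sources (here "Shrek") rise to the top, while items known to only one source are damped.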
  12. Benefits of Real-Time Recommendations
     Online Retail
     • Suggest related products and services
     • Increase revenue and engagement
     Media and Broadcasting
     • Create an engaging experience
     • Produce personalized content and offers
     Logistics
     • Recommend optimal routes
     • Increase network efficiency
  13. Challenges for Real-Time Recommendations
     Make effective real-time recommendations
     • Timing is everything in point-of-touch applications
     • Base recommendations on current data, not last night’s batch load
     Process large amounts of data and relationships for context
     • Relevance is king: make the right connections
     • Drive traffic: get users to do more with your application
     Accommodate new data and relationships continuously
     • Systems get richer with new data and relationships
     • Recommendations become more relevant
  14. Relational vs. Graph Models
     [Diagram: the relational model uses Person and MovieRatings tables; the graph model has a MAX node with RATED relationships to Terminator, Toy Story and Titanic]
  15. Cypher Query Language
      MATCH (:Person { name:"Dan" }) -[:KNOWS]-> (:Person { name:"Ann" })
     [Diagram: the pattern annotated as Node (Label, Property) -KNOWS-> Node (Label, Property), connecting Dan to Ann]
  16. Express Complex Queries Easily with Cypher
     Find all direct reports and how many people they manage, up to 3 levels down.
      MATCH (boss)-[:MANAGES*0..3]->(sub),
            (sub)-[:MANAGES*1..3]->(report)
      WHERE boss.name = "John Doe"
      RETURN sub.name AS Subordinate, count(report) AS Total
     [Diagram: the Cypher query shown side by side with the equivalent SQL query]
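To make the variable-length pattern concrete, here is a Python sketch of what the `MANAGES*0..3` / `MANAGES*1..3` query returns. It assumes a hypothetical org chart stored as a dict from manager to direct reports, and that the chart is a tree, so counting matched paths (as `count(report)` does) equals counting distinct people.

```python
from collections import deque

def within_depth(manages, start, min_depth, max_depth):
    """People reachable from `start` via min..max MANAGES hops (breadth-first).

    Assumes the org chart is a tree, so each person is reached by exactly one path.
    """
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= min_depth:
            found.append(person)
        if depth < max_depth:
            for report in manages.get(person, []):
                queue.append((report, depth + 1))
    return found

def report_counts(manages, boss):
    """Each sub 0..3 levels below `boss`, mapped to the number of people 1..3
    levels below that sub. Subs with no reports are skipped, because the
    MATCH would not return a row for them."""
    result = {}
    for sub in within_depth(manages, boss, 0, 3):
        total = len(within_depth(manages, sub, 1, 3))
        if total:
            result[sub] = total
    return result

manages = {"John Doe": ["Ann", "Bob"], "Ann": ["Carl", "Dina"], "Carl": ["Eve"]}
print(report_counts(manages, "John Doe"))  # {'John Doe': 5, 'Ann': 3, 'Carl': 1}
```

Because the first hop range starts at 0, John Doe himself appears as a "Subordinate" row, just as he would in the Cypher result.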
  17. Hello World Recommendation
  18. Hello World Recommendation
  19. Movie Data Model
  20. Cypher Query: Movie Recommendation
     What are the top 25 movies
     • that I haven't seen
     • with the same genres as Toy Story
     • given high ratings
     • by people who liked Toy Story?
      MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
      WHERE r1.rating > 7 AND r2.rating > 7
        AND watched.genres = unseen.genres
        AND NOT( (:Person {username:"maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) )
      RETURN unseen.title, COUNT(*)
      ORDER BY COUNT(*) DESC
      LIMIT 25
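The same "people who liked Toy Story also liked" logic can be mirrored in plain Python to show what each clause of the query contributes. This is a sketch under simplifying assumptions: ratings are hypothetical `(person, movie, rating)` tuples, each person rates a movie at most once, and only RATED is checked (the query's WATCHED relationship is left out).

```python
from collections import Counter

def liked_by_fans(ratings, genres, watched, me, threshold=7, limit=25):
    """Movies rated above `threshold` by people who also rated `watched` above
    `threshold`, restricted to the same genres and to movies `me` has not rated."""
    by_person = {}
    for person, movie, score in ratings:
        by_person.setdefault(person, {})[movie] = score
    seen_by_me = set(by_person.get(me, {}))
    counts = Counter()
    for person, rated in by_person.items():
        if rated.get(watched, 0) <= threshold:  # r1.rating > 7
            continue
        for movie, score in rated.items():
            if (movie != watched and score > threshold            # r2.rating > 7
                    and genres.get(movie) == genres[watched]      # same genres
                    and movie not in seen_by_me):                 # NOT (me)-[:RATED]->(unseen)
                counts[movie] += 1                                # COUNT(*) per unseen movie
    return counts.most_common(limit)                              # ORDER BY ... DESC LIMIT

ratings = [
    ("ann", "Toy Story", 9), ("ann", "Toy Story 2", 8), ("ann", "Shrek", 9), ("ann", "Cars", 8),
    ("bob", "Toy Story", 8), ("bob", "Toy Story 2", 9), ("bob", "Up", 8),
    ("cat", "Toy Story", 5), ("cat", "Toy Story 2", 10),
    ("maxdemarzi", "Toy Story", 10), ("maxdemarzi", "Up", 7),
]
same = {"Animation", "Comedy"}
genres = {"Toy Story": same, "Toy Story 2": same, "Cars": same, "Up": same,
          "Shrek": {"Animation", "Comedy", "Fantasy"}}
print(liked_by_fans(ratings, genres, "Toy Story", "maxdemarzi"))  # [('Toy Story 2', 2), ('Cars', 1)]
```

"Shrek" drops out on the exact-genre test, "Up" because maxdemarzi already rated it, and cat's ratings because cat did not like Toy Story.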
  21. Let’s try k-nearest neighbors (k-NN)
     • Cosine similarity
  22. Cypher Query: Ratings of Two Users
     What are the movies these two users have both rated?
      MATCH (p1:Person {name:'Michael Sherman'}) -[r1:RATED]-> (m:Movie),
            (p2:Person {name:'Michael Hunger'}) -[r2:RATED]-> (m:Movie)
      RETURN m.name AS Movie,
             r1.rating AS `M. Sherman's Rating`,
             r2.rating AS `M. Hunger's Rating`
  23. Cypher Query: Ratings of Two Users
     Calculating cosine similarity
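The worked calculation on this slide did not survive extraction. Written out, the cosine similarity computed by the slide 24 query for two people $p_1$ and $p_2$ is the following, where $M$ is the set of movies both have rated, $x_m$ is $p_1$'s rating of movie $m$ and $y_m$ is $p_2$'s. Note that both vector lengths are taken over the shared movies only, because `x` and `y` come from the same MATCH pattern.

```latex
\operatorname{sim}(p_1, p_2)
  = \frac{\sum_{m \in M} x_m \, y_m}
         {\sqrt{\sum_{m \in M} x_m^{2}} \; \sqrt{\sum_{m \in M} y_m^{2}}}
```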
  24. Cypher Query: Cosine Similarity
     Calculate it for all Person nodes with at least one Movie between them:
      MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person)
      WITH SUM(x.rating * y.rating) AS xyDotProduct,
           SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
           SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
           p1, p2
      MERGE (p1)-[s:SIMILARITY]-(p2)
      SET s.similarity = xyDotProduct / (xLength * yLength)
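The query above can be mirrored in Python to check the arithmetic. This sketch assumes ratings are held as a hypothetical dict of `person -> {movie: rating}`; like the query, it only considers movies both people rated, and it produces one score per unordered pair (the query's undirected MERGE has the same effect).

```python
import math
from itertools import combinations

def cosine_similarity(x_ratings, y_ratings):
    """Cosine similarity over the movies both people rated; the vector
    lengths are also computed from co-rated movies only, as in the query."""
    common = x_ratings.keys() & y_ratings.keys()
    if not common:
        return None  # no shared movie: the MATCH would create no SIMILARITY edge
    dot = sum(x_ratings[m] * y_ratings[m] for m in common)
    x_len = math.sqrt(sum(x_ratings[m] ** 2 for m in common))
    y_len = math.sqrt(sum(y_ratings[m] ** 2 for m in common))
    return dot / (x_len * y_len)

def all_similarities(ratings):
    """One similarity per unordered pair of people with at least one movie in common."""
    sims = {}
    for p1, p2 in combinations(sorted(ratings), 2):
        s = cosine_similarity(ratings[p1], ratings[p2])
        if s is not None:
            sims[(p1, p2)] = s
    return sims

ratings = {
    "Grace": {"Up": 8, "Heat": 4},
    "Zoltan": {"Up": 4, "Heat": 8, "Alien": 9},
    "Ann": {"Alien": 7},
}
sims = all_similarities(ratings)
```

One caveat this makes visible: two people who share a single movie always get a similarity of 1.0 (Ann and Zoltan here), however different their tastes are elsewhere, so a minimum overlap is often worth requiring.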
  25. Movie Data Model
  26. Cypher Query: Your Nearest Neighbors
     Who are the
     • top 5 Persons and their similarity scores
     • ordered by similarity in descending order
     • for Grace Andrews?
      MATCH (p1:Person {name:'Grace Andrews'}) -[s:SIMILARITY]- (p2:Person)
      WITH p2, s.similarity AS sim
      ORDER BY sim DESC
      LIMIT 5
      RETURN p2.name AS Neighbor, sim AS Similarity
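Finding the nearest neighbors is a top-k selection over one person's similarity edges. This sketch assumes similarities are stored in a hypothetical dict keyed by unordered `(a, b)` pairs, mimicking the undirected `-[s:SIMILARITY]-` match.

```python
import heapq

def nearest_neighbors(person, similarities, k=5):
    """Top-k neighbors of `person` by similarity, highest first.

    `similarities` maps a pair (a, b) to a score; the pair is treated as
    undirected, like the SIMILARITY relationship in the query.
    """
    candidates = []
    for (a, b), score in similarities.items():
        if person == a:
            candidates.append((score, b))
        elif person == b:
            candidates.append((score, a))
    return [(name, score) for score, name in heapq.nlargest(k, candidates)]

similarities = {
    ("Grace Andrews", "Ann"): 0.91,
    ("Bob", "Grace Andrews"): 0.75,
    ("Grace Andrews", "Carl"): 0.98,
    ("Ann", "Bob"): 0.99,
}
print(nearest_neighbors("Grace Andrews", similarities, k=2))  # [('Carl', 0.98), ('Ann', 0.91)]
```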
  27. Your Nearest Neighbors
  28. Cypher Query: k-NN Recommendation
     What are the top 25 movies
     • that Zoltan Varju has not seen
     • using the average rating
     • by his top 3 neighbors (the three most similar people who rated each movie)?
      MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
      WHERE NOT( (p) -[:RATED]-> (m) )
      WITH m, s.similarity AS similarity, r.rating AS rating
      ORDER BY m.name, similarity DESC
      WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
      WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
      RETURN movie, recommendation
      ORDER BY recommendation DESC
      LIMIT 25
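The k-NN step can be mirrored in Python to show exactly what `COLLECT(rating)[0..3]` averages: for each unseen movie, the ratings of the three most similar people *who rated that movie*, not a single global top 3. The data layout (`person -> {movie: rating}` and pair-keyed similarities) is a hypothetical stand-in for the graph.

```python
def knn_recommendations(person, ratings, similarities, k=3, limit=25):
    """For every movie `person` has not rated, average the ratings given by the
    k most similar people who did rate it."""
    seen = ratings.get(person, {})
    by_movie = {}
    for (a, b), sim in similarities.items():
        if person not in (a, b):
            continue
        neighbor = b if person == a else a
        for movie, rating in ratings.get(neighbor, {}).items():
            if movie not in seen:  # WHERE NOT (p)-[:RATED]->(m)
                by_movie.setdefault(movie, []).append((sim, rating))
    scores = {}
    for movie, pairs in by_movie.items():
        pairs.sort(key=lambda p: p[0], reverse=True)  # most similar first
        top = [rating for _sim, rating in pairs[:k]]  # COLLECT(rating)[0..k]
        scores[movie] = sum(top) / len(top)           # average of those ratings
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

ratings = {
    "Zoltan Varju": {"Up": 8},
    "Ann": {"Up": 9, "Alien": 9, "Heat": 4},
    "Bob": {"Alien": 7, "Heat": 6},
    "Carl": {"Alien": 3},
    "Dina": {"Alien": 10},
}
similarities = {("Zoltan Varju", "Ann"): 0.9, ("Bob", "Zoltan Varju"): 0.8,
                ("Zoltan Varju", "Carl"): 0.7, ("Zoltan Varju", "Dina"): 0.2}
recs = knn_recommendations("Zoltan Varju", ratings, similarities)
```

Dina's 10 for "Alien" is ignored because she is only the fourth most similar rater, so "Alien" scores (9 + 7 + 3) / 3 rather than including her rating.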
  29. Recommendations over Searching/Browsing
  30. Recommend Jobs to Job Seekers
     What connects them?
     • location
     • skills
     • education
     • experience
  31. Cypher Query: Job Recommendation
     What are the top 10 jobs for me
     • that are in the same location I’m in
     • for which I have the necessary qualifications?
  32. Job Recommendation Results
     Perfect candidate for 100% matches
     • Missing qualifications can be added quickly
     • Might encourage exaggerated resumes
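The job query itself was shown as an image that did not survive. The sketch below assumes a hypothetical model using only two of the connections from slide 30 (location and skills); it ranks same-location jobs by the share of required skills the seeker has and lists what is missing, so a ratio of 1.0 corresponds to the "perfect candidate" case.

```python
def job_matches(seeker, jobs, limit=10):
    """Rank jobs in the seeker's location by the share of required skills they have.

    Returns (title, match_ratio, missing_skills) tuples, best matches first.
    """
    results = []
    for job in jobs:
        if job["location"] != seeker["location"]:
            continue
        required = job["skills"]
        have = required & seeker["skills"]
        ratio = len(have) / len(required) if required else 1.0
        results.append((job["title"], ratio, sorted(required - have)))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:limit]

seeker = {"location": "San Francisco", "skills": {"Java", "Cypher", "Python"}}
jobs = [
    {"title": "Graph Engineer", "location": "San Francisco", "skills": {"Java", "Cypher"}},
    {"title": "Data Scientist", "location": "San Francisco", "skills": {"Python", "R"}},
    {"title": "Backend Developer", "location": "Chicago", "skills": {"Java"}},
]
print(job_matches(seeker, jobs))
```

Returning the missing skills alongside partial matches is what makes "missing qualifications can be added quickly" actionable, and also what makes exaggerated resumes tempting.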
  33. Just one tiny itsy bitsy problem
     Job boards get paid by
     • Number of applicants to a job
     • Wholesale resume sales
     • Selling your data
  34. Recommend Love
     Find your soulmate in the graph
     • Are they energetic?
     • Do they like dogs?
     • Do they have a good sense of humor?
     • Neat and tidy, but not crazy about it?
     What are the top 10 potential mates for me
     • that are in the same location,
     • are sexually compatible,
     • have traits I want,
     • and want traits I have?
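The love query was also an image. A minimal sketch of the four criteria, assuming a hypothetical profile layout (`gender`, `seeks`, `traits`, `wants`): location and mutual compatibility act as hard filters, while the two trait criteria are turned into a score to rank by.

```python
def potential_mates(me, people, limit=10):
    """Rank candidates by mutual fit.

    Hard filters: same location and mutual gender preference. Score: traits
    they have that I want, plus traits I have that they want.
    """
    results = []
    for other in people:
        if other["name"] == me["name"] or other["location"] != me["location"]:
            continue
        if other["gender"] not in me["seeks"] or me["gender"] not in other["seeks"]:
            continue
        they_have_what_i_want = len(other["traits"] & me["wants"])
        i_have_what_they_want = len(me["traits"] & other["wants"])
        results.append((other["name"], they_have_what_i_want + i_have_what_they_want))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:limit]

me = {"name": "Max", "location": "Chicago", "gender": "M", "seeks": {"F"},
      "traits": {"energetic", "likes dogs"}, "wants": {"funny", "likes dogs", "tidy"}}
people = [
    {"name": "Ann", "location": "Chicago", "gender": "F", "seeks": {"M"},
     "traits": {"funny", "likes dogs"}, "wants": {"energetic"}},
    {"name": "Beth", "location": "Chicago", "gender": "F", "seeks": {"M"},
     "traits": {"tidy"}, "wants": {"quiet"}},
    {"name": "Cara", "location": "Chicago", "gender": "F", "seeks": {"F"},
     "traits": {"funny"}, "wants": {"energetic"}},
    {"name": "Dana", "location": "Boston", "gender": "F", "seeks": {"M"},
     "traits": {"funny"}, "wants": {"energetic"}},
]
print(potential_mates(me, people))  # [('Ann', 3), ('Beth', 1)]
```

Ranking rather than filtering on traits keeps partial matches like Beth visible; requiring both trait counts to be non-zero would be the stricter reading of the slide.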
  35. Cypher Query: Love Recommendation
  36. Love Recommendation Results
  37. Linked Data
     Connect to the Semantic Web
  38. Bootstrapping your Recommendation Engine
     • Data
     • Data
     • Data
  39. The Concept of Sushi
  40. What else is Delicious?
  41. Getting some Data
  42. graphipedia
  43. neo4j-dbpedia-importer
  44. Named Entity Recognition
     Automatically find
     • names of people
     • places and locations
     • products
     • and organizations
  45. Hacker News for Example
     • What are the kids in Silicon Valley talking about?
  46. Let’s find out
     • They have an API!
     • Get some data:
  47. Data Model
  48. Hacker News Recommendations
     • Which stories should I read?
     • Which users should I follow?
     • What else should I be interested in?
     • Who seems to know a lot about X?
     • Etc.
  49. GraphAware Recommendation Framework
     • Ability to trade off recommendation quality for speed
     • Ability to pre-compute recommendations
     • Built-in algorithms and functions
     • Ability to measure recommendation quality
     • Ability to easily run in A/B test environments
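Measuring recommendation quality and comparing A/B variants can be illustrated with precision@k, one common offline metric. This is a generic sketch, not the GraphAware framework's API; the per-variant session data is hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations the user actually engaged with
    (divides by the number shown if fewer than k were recommended)."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)

def compare_variants(results, k):
    """Average precision@k per A/B variant.

    `results` maps a variant name to (recommended_list, relevant_set) pairs,
    one per user who saw that variant.
    """
    return {
        variant: sum(precision_at_k(rec, rel, k) for rec, rel in sessions) / len(sessions)
        for variant, sessions in results.items()
    }

results = {
    "A": [(["s1", "s2", "s3"], {"s1"}), (["s4", "s5", "s6"], {"s5", "s6"})],
    "B": [(["s1", "s2", "s3"], {"s1", "s2", "s3"}), (["s4", "s5", "s6"], {"s4"})],
}
scores = compare_variants(results, k=3)
```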
  50. Real-Time Recommendations with Neo4j
     • Social recommendations
     • Products and services
     • Content
     • Routing
  51. Walmart: Business Case
     • World’s largest company by revenue
     • World’s largest retailer and private employer
     • SF-based global e-commerce division manages several websites
     • Founded in 1969, Bentonville, Arkansas
     • Needed online customer recommendations to keep pace with the competition
     • Data connections provided predictive context, but were not in a usable format
     • Solution had to serve many millions of customers and products while maintaining superior scalability and performance
  52. Walmart: Solution
     • Brings customers, preferences, purchases, products and locations into a graph model
     • Uses connections to make product recommendations
     • Solution deployed across Walmart divisions and websites
  53. Global Courier: Business Case
     • World’s largest courier
     • 480,000 employees
     • €55 billion in revenue
     • Needed a new B2C and B2B parcel routing system for its logistics practice
     • Legacy system supported neither the full network nor the shift to online demands
     Needed to replace an aging B2B and B2C parcel routing system whose requirements include:
     • 24x7 availability
     • Peak loads of 5M parcels per day, 3K per second
     • Support for a complex and diverse software stack
     • Predictable performance with linear scalability
     • Daily changes to logistics networks
     • Route from any point to any point
     • Single point of truth for the entire network
  54. Global Courier: Solution
     Neo4j provides the ideal domain fit, since a logistics network is a graph
     • High availability and performance via Neo4j clustering
     • Greatly simplified Cypher queries for routing versus relational SQL queries
     • Flexible data model that reflects the real logistics world far better than relational
     • Easy-to-grasp, whiteboard-friendly model
  55. eBay: Business Case
     • C2C and B2C retail network
     • Full e-commerce functionality for individuals and businesses
     • Integrated with logistics vendors for product deliveries
     • Needed an offering to compete with Amazon Prime
     • Enable customer-selected delivery inside 90 minutes
     • Calculate the best route option in real time
     • Scale to enable a variety of services
     • Offer more predictable delivery times
  56. eBay Now: Solution
     • Acquired UK-based Shutl, a leader in same-day delivery
     • Used Neo4j to create eBay Now
     • 1000 times faster than the prior MySQL-based solution
     • Faster time-to-market
     • Improved code quality with 10 to 100 times less query code
  57. Classmates: Business Case
     • Online yearbook connecting friends from school, work and the military in the US and Canada
     • Founded as Memory Lane in Seattle
     Develop new social networking capabilities to monetize yearbook-related offerings:
     • Show all the people I know in a yearbook
     • Show yearbooks my friends appear in most often
     • Show sections of a yearbook that my friends appear in most
     • Show me other schools my friends attended
  58. Classmates: Solution
     Neo4j provides a robust and scalable graph database solution
     • 3-instance cluster with cache sharding and disaster recovery
     • 18ms response time for top 4 queries
     • 100M nodes and 600M relationships in the initial graph, including people, images, schools, yearbooks and pages
     • Projected to grow to 1B nodes and 6B relationships
  59. National Geographic: Business Case
     • Non-profit scientific and educational institution founded in 1888
     • Covers geography, archaeology, natural science, environment and historical conservation
     • Journals, online media, radio, TV, documentaries, live events and consumer content and goods
     • Improve poor performance of PostgreSQL app
     • Increase user engagement by linking to 100+ years of multimedia content
     • Improve targeting by understanding subscribers’ interests better
     • Recommend content and services to users based on their interests
  60. National Geographic: Solution
     • Enabled complex real-time analytics across eight million users and a century of content
     • Delivered robust performance by eliminating triple-nested SQL joins
     • Cross-refers users among content, live events, travel, goods and causes
     • Neo4j solution much less cumbersome and easier to maintain than the previous SQL system
  61. Curaspan: Business Case
     • Leader in patient management for discharges and referrals
     • Manages patient referrals across 4600+ health care facilities
     • Connects providers and payers via a web-based patient management platform
     • Founded in 1999 in Newton, Massachusetts
     • Improve poor performance of Oracle solution
     • Support more complexity, including granular, role-based access control
     • Satisfy complex Graph Search queries by discharge nurses and intake coordinators, e.g. "Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services"
  62. Curaspan: Solution
     • Met fast, real-time performance demands
     • Supported queries spanning multiple hierarchies, including provider and employee-permissions graphs
     • Improved data model to handle additional dimensions such as insurance networks, service areas and care organizations
     • Greatly simplified queries, reducing multi-page SQL statements to a single Neo4j function
  63. FiftyThree: Business Case
     • Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
     • Based in New York City
     • Add social capabilities to digital-paper app
     • Support social collaboration across millions of users in new Mix app
     • Enable seamless interaction between social and content-asset networks
     • Ensure new apps are robust, scalable and fast
  64. FiftyThree: Solution
     • Neo4j data model ideal for social network, content management and access control
     • Users create, publish and share designs simply
     • Easy to develop and evolve Neo4j-based app
     • Integrates well with FiftyThree’s EC2 architecture
     See the Neo4j solution in action: "Betting the Company (Literally) on a Graph Database"
     Awards: App Store Editor’s Choice; 2012 iPad App of the Year; Apple Best Apps of 2014
  65. Questions
     • How does Neo4j fit into my existing infrastructure?
       As a Service.
     • Will Neo4j scale?