Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Presented at the Downtown SF Lucene/Solr Meetup by Ai Sasho, Sony Interactive Entertainment

  1. 1. Developing Scalable User Search for PlayStation 4. Ai Sasho, Sr. Software Engineer, Sony Interactive Entertainment
  2. 2. About My Team
     §  Developing social features for PS4 to improve social gaming experiences.
     §  Worked on the User Search and Players You May Know recommendation features.
     §  Server side: Isaias, Marlon, Pavan, Chris, Janhavi, Xifan, Venkat
     §  Client side: Tomas, Nythya, Max, Yukio, Katsuya, Eric, Tong
     Sony Interactive Entertainment = Sony Network Entertainment Int'l + Sony Computer Entertainment = Greatness Awaits!
     ©2016 Sony Interactive Entertainment
  4. 4. Outline
     §  User Search Feature Overview
     §  SolrCloud Setup
     §  Personalized Search: Lucene + SolrCloud
     §  Challenges
     §  Solr 4.8 to 5.4 Upgrade
  5. 5. User Search
  6. 6. User Search
  7. 7. User Search: Requirements
     §  Fast
        •  Queries should return in under 100 ms.
     §  Reliable / Fault Tolerant
     §  Scalable
        •  The SolrCloud cluster needs to handle:
           o  Up to 1000 RPS of query requests
           o  Up to 250 RPS of indexing requests
        •  Approx. 300 million documents
     §  Ranking search results by friendship
        •  Up to n degrees of separation.
        •  Friends, 2nd degree friends (friends of friends), etc.
  8. 8. SolrCloud: System Architecture
     [Diagram: Application Servers, Database, ELB, ZooKeeper, and a SolrCloud cluster of four leaders, each paired with a replica]
  9. 9. SolrCloud: Configurations
     §  SolrCloud 5.4
     §  Documents
        •  User data (~1.5 KB per user)
        •  ID, Online ID, Name (First, Middle, Last), Privacy, User Type, etc.
        •  ~300 million documents
     §  Shards
        •  4 shards + many replicas.
        •  Number of shards determined experimentally.
        •  Most of the docs on each shard fit in memory.
     §  Cache
        •  Query Result Cache, Document Cache, Filter Cache, etc.
     §  Commit
        •  Soft auto commit: 5 secs
        •  Auto commit: 15 mins (openSearcher=false)
  10. 10. SolrCloud: Configurations
      §  Tokenizers
         •  Whitespace Tokenizer
      §  Filters
         •  ASCII Folding Filter
            o  Names are stored and queried as their equivalent ASCII (English-alphabet) characters.
            o  Joan Miró -> Joan Miro
         •  N-Gram Filter
            o  abc -> a, b, c, ab, bc, abc
            o  Takes up more space, but is faster at query time than a wildcard (*) query.
         •  Lower Case Filter
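The analysis chain on this slide can be approximated in plain Python to see which terms end up in the index. This is only a sketch: it does not call Solr or Lucene, the function names are made up for illustration, ASCII folding is approximated by stripping Unicode combining marks, and the n-gram sizes (1 to 3) are assumed from the slide's `abc` example rather than taken from the real schema.

```python
import unicodedata


def whitespace_tokenize(text):
    """Split on whitespace only, like a whitespace tokenizer."""
    return text.split()


def ascii_fold(token):
    """Approximate ASCII folding: decompose, then drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ngrams(token, min_size=1, max_size=3):
    """Emit all substrings of length min_size..max_size, shortest first."""
    grams = []
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def analyze(text, min_size=1, max_size=3):
    """Apply the filters in the order listed on the slide."""
    terms = []
    for token in whitespace_tokenize(text):
        folded = ascii_fold(token)
        for gram in ngrams(folded, min_size, max_size):
            terms.append(gram.lower())
    return terms


print(ngrams("abc"))                      # ['a', 'b', 'c', 'ab', 'bc', 'abc']
print(analyze("Joan Miró", max_size=4))   # includes 'joan' and 'miro'
```

Because every prefix and infix is already an indexed term, a partial name such as `mir` becomes an exact term lookup, which is the space-for-speed trade the slide describes.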
  11. 11. Personalized Search: Overview
      §  People search for users they know, or kind of know.
      §  Search results should be ranked by the friendship between the searcher and the searched users.
      User A <- Friend (1st degree of separation)
      User B <- Friend (1st degree of separation)
      User C <- Friend of Friend (2nd degree of separation)
      ...
      User Y <- Not associated.
      User Z <- Not associated.
  12. 12. Personalized Search: Ideas
      Possible Solution 1: Query SolrCloud with the list of friend IDs, giving a higher boost to users who are closer to the caller.
          q=ps4king&
          bf=friends:(ID1 or ID2 or ID3 or …)^500&
          bf=friends2nd:(ID4 or ID5 or ID6 or …)^50&
          bf=friends3rd:(ID7 or ID8 or ID9 or …)^5&
          …
      Problems
      •  The list of friends can be very long (potentially thousands).
      •  Increases the query latency.
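A rough sketch of how such a request could be assembled shows why the friend list becomes a problem. The field names and boost values are taken from the slide; `build_boosted_query` is a hypothetical helper, not code from the talk. One deliberate difference: the slide writes the boosts as `bf`, but in Solr's dismax/edismax parsers `bf` takes function queries while boost queries of the `field:(...)^boost` form go in `bq`, so the sketch assumes `defType=edismax` with `bq` and uppercase `OR`.

```python
from urllib.parse import urlencode


def build_boosted_query(term, friends_by_degree, boosts=(500, 50, 5)):
    """Build a query string that boosts users by degree of separation.

    friends_by_degree is a list of ID lists: [1st degree, 2nd degree, 3rd degree].
    """
    fields = ("friends", "friends2nd", "friends3rd")  # names as on the slide
    params = [("defType", "edismax"), ("q", term)]
    for field, ids, boost in zip(fields, friends_by_degree, boosts):
        if ids:
            clause = " OR ".join(ids)
            params.append(("bq", f"{field}:({clause})^{boost}"))
    return urlencode(params)


small = build_boosted_query("ps4king", [["ID1", "ID2"], ["ID4"], []])
large = build_boosted_query("ps4king", [[f"ID{i}" for i in range(2000)], [], []])
print(small)
print(len(large), "characters for 2000 first-degree friends")
```

The request grows linearly with the number of friends, and every ID is another clause for Solr to evaluate, which matches the two problems listed on the slide.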
  13. 13. Personalized Search: Ideas
      Possible Solution 2: Index the friendship in SolrCloud. Add "friends" fields; if the caller appears in one of a document's "friends" fields, boost that document.
      Problems:
      o  Too many requests to Solr.
      o  Maintaining friendship in Solr in addition to our database might be overkill.
      o  Requires a large amount of disk space.
  14. 14. Personalized Search: Our Solution
      Personalized Index: stores people close to the caller (friends, friends of friends, up to n degrees of separation).
      §  Also used in the friend recommendation system.
      §  Another team already uses a Lucene index for user-owned games.
      +
      Global Index: includes all the users.
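One way to work out who belongs in the personalized index is a breadth-first walk over the friendship graph, stopping at n degrees. The talk does not say how the degrees are computed, so the following is a minimal sketch under that assumption; the adjacency-dict input and the function name are illustrative.

```python
from collections import deque


def degrees_of_separation(friends, caller, max_degree):
    """Return {user: degree} for everyone within max_degree hops of caller.

    friends maps each user to an iterable of that user's direct friends.
    """
    degree = {caller: 0}
    queue = deque([caller])
    while queue:
        user = queue.popleft()
        if degree[user] == max_degree:
            continue  # do not expand beyond the requested depth
        for friend in friends.get(user, ()):
            if friend not in degree:  # first visit is the shortest path
                degree[friend] = degree[user] + 1
                queue.append(friend)
    del degree[caller]  # the caller is not a search result for themselves
    return degree


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
print(degrees_of_separation(graph, "A", max_degree=2))
```

With `max_degree=2`, user `E` (three hops away) is left out, so the personalized index stays small even when the full graph is large.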
  15. 15. Personalized Search: Lucene + SolrCloud
      Lucene Index (simplified)
      Online ID    First Name   …   Degree of Separation
      ps4Queen     Marge        …   1
      ps4King      Homer        …   1
      ps4awesome   Bart         …   2
      …            …            …   …
      §  Lucene index created on demand for the caller from the friendship data, on the application server.
      §  Cached temporarily.
      +  SolrCloud (global index)
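The slides do not describe how hits from the two indexes are combined. One plausible approach, sketched here purely as an assumption, is to list personalized hits first (closest degree of separation first) and then append global hits that were not already returned. `merge_results` and its input shapes are hypothetical.

```python
def merge_results(personal_hits, global_hits, limit=10):
    """Combine hits from the per-caller index and the global index.

    personal_hits: list of (online_id, degree_of_separation) tuples.
    global_hits:   list of online_ids in the global index's own ranking order.
    """
    # Stable sort: ties keep the order the personalized index returned them in.
    ranked = sorted(personal_hits, key=lambda hit: hit[1])
    merged = [online_id for online_id, _ in ranked]
    seen = set(merged)
    for online_id in global_hits:
        if online_id not in seen:  # skip users already ranked by friendship
            seen.add(online_id)
            merged.append(online_id)
    return merged[:limit]


personal = [("ps4awesome", 2), ("ps4King", 1)]
global_ranked = ["ps4kingdom", "ps4King", "ps4knight"]
print(merge_results(personal, global_ranked))
```

Keeping the friendship ranking outside SolrCloud means the global index never needs to know about the friendship graph.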
  16. 16. Challenges
      §  Hard to improve performance when using two index systems (Lucene + SolrCloud).
         •  Tuned SolrCloud a lot (cache size, query optimization, soft/auto commit settings, GC settings, etc.)
      §  Not a problem anymore, but SolrCloud was unstable for a while.
         •  The entire cluster would go down a couple of times a month.
  17. 17. Challenges: Instability Solutions
      §  Increased the number of replicas.
         •  When a leader goes into recovery, there need to be enough replicas to handle all the requests.
      §  Reconfigured GC settings to use CMS (concurrent mark sweep).
      §  Decreased the size of the document query cache.
         o  Cache warm-up time was longer than the soft auto commit interval, so the cache was always warming.
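The warm-up problem comes down to simple arithmetic. The sketch below assumes every soft commit opens a new searcher that starts warming immediately and that warm-up time is constant; the 12-second and 3-second warm-up figures are made up for illustration, while the 5-second interval is the soft auto commit setting from the configuration slide. Solr's own limits on concurrent warming searchers are ignored here.

```python
import math


def peak_warming_searchers(warmup_seconds, soft_commit_seconds):
    """Peak number of searchers warming at once under the stated assumptions."""
    return math.ceil(warmup_seconds / soft_commit_seconds)


def always_warming(warmup_seconds, soft_commit_seconds):
    """True if a new warm-up starts before the previous one has finished."""
    return warmup_seconds >= soft_commit_seconds


SOFT_COMMIT_SECONDS = 5  # from the configuration slide

for warmup in (12, 3):  # hypothetical warm-up times, in seconds
    print(
        f"warm-up {warmup}s:",
        "always warming" if always_warming(warmup, SOFT_COMMIT_SECONDS) else "has idle time",
        f"(peak {peak_warming_searchers(warmup, SOFT_COMMIT_SECONDS)} warming)",
    )
```

Shrinking the cache shortens warm-up, which is the lever the slide describes for getting back under the commit interval.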
  18. 18. SolrCloud Upgrade
      §  Motivations
         •  Solr 4.8 was used originally, but because of the instability issues we upgraded to Solr 5.4.
      §  Challenges
         •  Tried to stream data from a Solr 4.8 node to Solr 5.4 by joining a node, but it did not work.
         •  Some data types have been deprecated.
            o  IntegerType, LongType -> TrieInteger, TrieLong
            o  schema.xml needed to be updated with the new data types.
         •  Decided to do a full index of the 300 million documents in the Solr 5.4 cluster.
  19. 19. SolrCloud Upgrade: Full Indexing
      §  First query out the 300M docs, then do the full indexing.
      §  Deep paging (specifying a start index and limit) is too slow.
         •  Solr needs to cache documents up to the starting index.
      §  The logical cursor, cursorMark, is the solution to the deep paging problem. Each response returns the next cursor to use.
      §  cursorMark is not perfect. Sometimes the cursor stops before the end of the documents. A filter query can be used instead to fetch a certain range of documents by ID.
      ...&rows=10&sort=id+asc&cursorMark=AoEjR0JQ
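The cursor loop itself is short. In Solr, the first request passes `cursorMark=*`, each response carries a `nextCursorMark`, and the walk is finished when the returned cursor equals the one that was sent. The sketch below follows that protocol but takes the page-fetching function as a parameter so it runs without a cluster; `make_fake_solr` is a stand-in written for this example, and its cursor is simply the last ID returned, whereas real cursor marks are opaque strings.

```python
def fetch_all(fetch_page, rows=10):
    """Yield every document by following cursorMark until it stops changing."""
    cursor = "*"  # Solr's starting cursor value
    while True:
        response = fetch_page(
            {"q": "*:*", "rows": rows, "sort": "id asc", "cursorMark": cursor}
        )
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # same cursor back means no more results
            break
        cursor = next_cursor


def make_fake_solr(doc_ids):
    """In-memory stand-in for a Solr /select call, for illustration only."""
    ordered = sorted(doc_ids)

    def fetch_page(params):
        cursor = params["cursorMark"]
        remaining = ordered if cursor == "*" else [d for d in ordered if d > cursor]
        page = remaining[: params["rows"]]
        next_cursor = page[-1] if page else cursor
        return {
            "response": {"docs": [{"id": d} for d in page]},
            "nextCursorMark": next_cursor,
        }

    return fetch_page


ids = [f"user{i:03d}" for i in range(25)]
docs = list(fetch_all(make_fake_solr(ids), rows=10))
print(len(docs), docs[0]["id"], docs[-1]["id"])
```

Against a real cluster, `fetch_page` would issue the HTTP request and parse the JSON response. The sort must include the unique key field (`id` here) for cursors to work, which is why the slide's example sorts by `id asc`.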
  20. 20. Q & A: Any Questions?