Using AWS to Build a Graph-Based Product Recommendation System (BDT303) | AWS re:Invent 2013

Magazine Luiza, one of the largest retail chains in Brazil, developed an in-house product recommendation system built on top of a large knowledge graph. AWS services such as Amazon EC2, Amazon SQS, and Amazon ElastiCache let them scale from a very small dataset to a large Cassandra cluster. By improving the big data processing algorithms in this in-house solution, they improved revenue conversion rates by more than 25 percent compared with the market solutions they had used before.

Published in: Technology, Business

  1. Using AWS to Build a Graph-based Product Recommendation System. Andre Fatala & Renato Pedigoni, November 14, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Friday, November 15, 13
  2. About Magazine Luiza: Magazine Luiza is one of the largest household appliance retail chains in Brazil, focused on providing durable goods for Brazil's middle and lower-to-middle income classes. • 731 stores • 8 distribution centers • more than 23,000 workers • 22.8 million customers • multi-channel strategy
  4-5. Recommendation systems
  6. Graphs
  7-13. Graph Stack. Distributed graph database: • Used for OLTP queries • Native integration with TinkerPop. Distributed database management system: • Continuously available with no single point of failure • Elastic scalability • Caching layer • Built-in replication
  14-15. Storing user data (diagram): Elastic Load Balancing, Auto Scaling API instances (EC2 instances), Cassandra cluster (six m2.xlarge nodes)
  16-27. In graph words… (diagram, built up step by step). Vertices: person, session, channel, item. Edge labels: created, visited, viewed, add_to_cart, bought. The annotations +1, +13 and +21 appear as the viewed, add_to_cart and bought edges are introduced, in that order.
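The slides do not say what the +1, +13 and +21 annotations mean. One plausible reading, which is an assumption here and not something the deck confirms, is that each interaction type adds a weight to the relationship between a session and an item. A minimal Python sketch under that assumption (the event tuples, `WEIGHTS` and `score_items` are hypothetical names for illustration):

```python
from collections import defaultdict

# ASSUMPTION: +1 / +13 / +21 are read as per-interaction weights.
# The deck shows the numbers but does not define them.
WEIGHTS = {"viewed": 1, "add_to_cart": 13, "bought": 21}


def score_items(events):
    """Sum interaction weights per (session, item) pair.

    events: iterable of (session_id, edge_label, item_id) tuples.
    """
    scores = defaultdict(int)
    for session_id, edge_label, item_id in events:
        scores[(session_id, item_id)] += WEIGHTS[edge_label]
    return dict(scores)


# Hypothetical session: one item goes from view to purchase, another is only viewed.
events = [
    ("s1", "viewed", "tv"),
    ("s1", "add_to_cart", "tv"),
    ("s1", "bought", "tv"),
    ("s1", "viewed", "radio"),
]
scores = score_items(events)
```

Under this reading a purchase counts far more than a page view, so an item that was bought ranks well above one that was only browsed.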
  30-34. Base recommendations: • Who viewed this item also viewed • Who bought this item also bought • Bought after viewing this item • Upselling
  35-36. How to query the graph for recs?
  37-42. Gremlin Graph Language • Groovy DSL for graph traversals • Easy to learn • Great community • Part of the TinkerPop stack • Works with any Blueprints-enabled graph database
  43-47. People who viewed a product (diagram: the people Renato and Fatala are connected by "viewed" edges to the products LED TV 40", LED TV 42", LCD TV 42" and LED 50"). Query: g.v(4).in('viewed')
  48-52. Who viewed this product also viewed (same diagram). Query: g.v(4).in('viewed').out('viewed')
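The two-hop traversal above (walk back along "viewed" edges to the viewers, then forward to everything those viewers saw) can be sketched in plain Python over an in-memory edge list. This is an illustration of the idea only, not the authors' code: the exact edges in the slide diagram are not recoverable, so the `VIEWED` list is hypothetical. Unlike the raw Gremlin query, the sketch also drops the starting product and counts co-views so results can be ranked.

```python
from collections import Counter

# Hypothetical "viewed" edges (person -> product); the names come from the
# slide, but which person viewed which product is an assumption.
VIEWED = [
    ("Renato", 'LED TV 40"'),
    ("Renato", 'LED TV 42"'),
    ("Renato", 'LCD TV 42"'),
    ("Fatala", 'LED TV 42"'),
    ("Fatala", 'LCD TV 42"'),
    ("Fatala", 'LED 50"'),
]


def viewers_of(product):
    """Equivalent of .in('viewed'): people with a viewed edge into the product."""
    return [person for person, viewed in VIEWED if viewed == product]


def also_viewed(product):
    """Equivalent of .in('viewed').out('viewed'), minus the start product,
    returned as (product, co_view_count) pairs, most co-viewed first."""
    counts = Counter()
    for person in viewers_of(product):
        for viewer, other in VIEWED:
            if viewer == person and other != product:
                counts[other] += 1
    return counts.most_common()


recs = also_viewed('LED TV 42"')
```

With these sample edges, the LCD TV 42" ranks first because both viewers of the LED TV 42" also viewed it.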
  53-56. Processing data with Spot Instances (diagram): Bob dispatches a task containing the product id to Amazon Simple Queue Service (Amazon SQS); Spot Instances (EC2 m1.large) consume the Amazon SQS tasks and process the W*A* recommendations; logs are synced to Amazon Simple Storage Service (Amazon S3)
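The dispatch-and-consume flow on these slides can be sketched with Python's standard `queue.Queue` standing in for Amazon SQS. This is a local illustration under stated assumptions, not the authors' implementation: `dispatch`, `worker` and the `compute` callback are hypothetical names, and no AWS API is called.

```python
import queue


def dispatch(tasks, product_ids):
    """Producer side: enqueue one task per product id."""
    for product_id in product_ids:
        tasks.put({"product_id": product_id})


def worker(tasks, compute, results):
    """Consumer side: drain the queue, computing recommendations per task.

    compute is any callable mapping a product id to its recommendations.
    """
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left to process
        results[task["product_id"]] = compute(task["product_id"])
        tasks.task_done()  # acknowledge only after the work succeeded


tasks = queue.Queue()
results = {}
dispatch(tasks, [4, 7, 9])
# Hypothetical compute step: a real worker would run the graph traversal here.
worker(tasks, lambda product_id: ["rec-for-%d" % product_id], results)
```

A queue suits Spot Instances because a worker can disappear at any time. With Amazon SQS, a message that was received but never deleted becomes visible again after its visibility timeout, so another worker can pick it up; the in-memory stand-in above does not reproduce that behavior.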
  57. Personalized e-mails (examples shown: abandoned cart, price dropped)
  58-61. Personalized e-mails. Users receive e-mails when: • A product has a price drop • They abandon a product in the cart • They visit many similar products
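The three triggers listed above could be expressed as simple rules over a user's recent activity. The sketch below is hypothetical throughout: the `UserActivity` fields, the trigger names and the `SIMILAR_VIEW_THRESHOLD` value are assumptions for illustration, since the deck does not say how "many similar products" is measured.

```python
from collections import Counter
from dataclasses import dataclass, field

# ASSUMPTION: "many similar products" is modeled as this many views
# within one category; the real threshold is not given in the slides.
SIMILAR_VIEW_THRESHOLD = 3


@dataclass
class UserActivity:
    cart: set = field(default_factory=set)                # products currently in the cart
    purchased: set = field(default_factory=set)           # products already bought
    viewed_categories: list = field(default_factory=list) # category of each product view
    price_dropped: set = field(default_factory=set)       # products of interest whose price fell


def email_triggers(activity):
    """Return the names of the e-mail triggers that fire for this user."""
    triggers = []
    if activity.price_dropped:
        triggers.append("price_drop")
    if activity.cart - activity.purchased:
        triggers.append("abandoned_cart")
    category_counts = Counter(activity.viewed_categories)
    if any(n >= SIMILAR_VIEW_THRESHOLD for n in category_counts.values()):
        triggers.append("similar_products")
    return triggers


activity = UserActivity(cart={"tv-42"}, viewed_categories=["tv", "tv", "tv", "radio"])
fired = email_triggers(activity)
```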
  62-66. Personalized e-mails (diagram, labeled "Bobby Mailer"): the Bob API notifies the Mailer Manager (m1.large) of a user interaction; the Mailer Manager dispatches a task containing the customer id to Amazon Simple Queue Service (Amazon SQS); Spot Instances (EC2 m1.large) consume the Amazon SQS tasks and find the best recommendation for that user; the e-mail is sent through Amazon Simple Email Service (Amazon SES); logs are synced to Amazon Simple Storage Service (Amazon S3)
  67-72. Analytics with Faunus (on Amazon EMR). Graph analytics engine: • Provides graph input/output formats and a traversal language for graphs. Distributed computing: • Distributed processing of large data sets across clusters • Designed to scale • Detects and handles failures at the application layer
  73-76. Analytics in Graphs with AWS (on Amazon EMR): > g.V.has('element_type', 'person').age.mean() returns 34.683232
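A mean like the one above parallelizes well because each partition of the graph can report a partial (sum, count) pair that is then combined. The sketch below shows that general map-and-reduce pattern in plain Python; it is not a description of how Faunus implements the query, and the vertex dictionaries and function names are hypothetical.

```python
from functools import reduce


def partial_sum_count(vertices):
    """Map step: (sum of ages, number of ages) for person vertices in one partition."""
    ages = [
        v["age"]
        for v in vertices
        if v.get("element_type") == "person" and "age" in v
    ]
    return (sum(ages), len(ages))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions):
    """Mean age over all partitions, or None if no person vertex has an age."""
    total, count = reduce(combine, map(partial_sum_count, partitions), (0, 0))
    return total / count if count else None


# Hypothetical data split across two partitions.
partitions = [
    [{"element_type": "person", "age": 30}, {"element_type": "item"}],
    [{"element_type": "person", "age": 40}, {"element_type": "person", "age": 35}],
]
mean_age = distributed_mean(partitions)
```

Combining sums and counts, instead of averaging per-partition means, keeps the result correct when partitions hold different numbers of people.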
  77-79. Backup process (diagram): nodetool script, backups stored in Amazon S3
  80. Infrastructure (diagram): Internet Gateway, Amazon Route 53, Elastic Load Balancing, Auto Scaling API instances (EC2), Amazon ElastiCache (cache), Amazon SQS (queues), Spot Instances, Cassandra cluster (m2.xlarge EC2 instances), Amazon EMR, Simple Email Service (Amazon SES), Amazon S3 (logs and backups)
  81-87. Metrics • 4.3 million identified Magazine Luiza customers • 50,000 product nodes • 90 million total nodes • 350 million total edges • 700 GB of data • Peaks of 20,000 reads/sec on the Cassandra cluster
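A few ratios follow directly from these figures. The short calculation below derives them; it assumes 1 GB = 10^9 bytes and treats the 700 GB as covering all nodes and edges, and the deck does not say whether that figure includes replication, so the per-element size is a rough upper-level estimate only.

```python
# Figures quoted on the Metrics slides.
TOTAL_NODES = 90_000_000
TOTAL_EDGES = 350_000_000
DATA_BYTES = 700 * 10**9  # ASSUMPTION: 1 GB = 10**9 bytes

# Average number of edges per node.
edges_per_node = TOTAL_EDGES / TOTAL_NODES

# Average stored bytes per graph element (node or edge).
# ASSUMPTION: the 700 GB covers exactly these elements.
bytes_per_element = DATA_BYTES / (TOTAL_NODES + TOTAL_EDGES)

print(round(edges_per_node, 2))   # 3.89
print(round(bytes_per_element))   # 1591
```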
  88. Results matter… 10x faster; 60%
  89-95. Results matter… (chart, timeline from January 2013 to September 2013, annotated: Solution A alone; First Bob tests; Bob out for 2 weeks; Bob alone; 190%)
  96-99. Next steps • Use Faunus to pre-process all W*A* recommendations • Algorithms to identify communities in the graph • Cassandra replication between regions
  100. Thank you. Please give us your feedback on this presentation (BDT303). As a thank you, we will select prize winners daily for completed surveys!
