Building a Recommendation Engine with Spring and Hadoop

Speaker: Michael Minella
Big Data Track
The Amazons and Googles of the world have had Ph.D.s locked up in back rooms for years creating algorithms to get you to click on things and subsequently buy stuff. One of the big things those smart people have been working on is the recommendation engine. Today, a recommendation engine isn’t something that only the Amazons of the world can have. With an hour and a handful of open source tools, we’ll build a recommendation engine based on data from the website we probably spend the most time on: Stack Overflow. We’ll use Spring XD and Spring Batch to orchestrate the full lifecycle of Hadoop processing (ingest, process, export) and use Apache Mahout to provide the recommendation processing. A basic understanding of Hadoop concepts (what MapReduce is) and Spring (basic dependency injection configuration) is expected for this talk.

Published in: Software

  1. 1. BUILDING ENGINES WITH SPRING
  2. 2. MICHAEL MINELLA TWITTER: @MICHAELMINELLA HOME PAGE: SPRING.IO/TEAM/MMINELLA
  3. 3. WHAT I’M NOT
  4. 4. https://github.com/SpringOne2GX-2014/
  5. 5. THANK YOU: SEBASTIAN SCHELTER, PAT FERREL
  6. 6. 13
  7. 7. RECOMMENDATION ALGORITHMS
  8. 8. LET’S SET SOME EXPECTATIONS
  9. 9. SCALE OF THE PROBLEM
  10. 10. MILLIONS OF USERS
  11. 11. 100,000’s OF ITEMS
  12. 12. TOOLS AND TECHNOLOGIES
  13. 13. 1 SPRING BOOT
  14. 14. 2 MYSQL
  15. 15. 3 HADOOP
  16. 16. 4 SPRING XD
  17. 17. 5 MAHOUT
  18. 18. SPRING XD EXTREME DATA
  19. 19. APPLICATION COMPLEXITY
  20. 20. LOTS OF BOILERPLATE
  21. 21. MANY DOMAINS TO BRIDGE
  22. 22. INCONSISTENT APIS
  23. 23. SOURCE, CHANNEL, SINK (DATA FLOW MODEL) = ADAPTER, CHANNEL, FILTER, TRANSFORMER, ETC. (EIP PATTERNS)
  24. 24. JOB, CONNECTOR (IMPORT/EXPORT) = JOB, ITEMREADER/ITEMWRITER (BATCH PROCESSING)
  25. 25. WORKFLOW, ACTION (WORKFLOW ORCHESTRATION) = JOB, STEP (BATCH PROCESSING)
  26. 26. SPRING XD EXTREME DATA
  27. 27. Ingestion Orchestration SPRING Extraction Real-time Analytics
  28. 28. DISTRIBUTED RUNTIME
  29. 29. STREAMING & BATCH
  30. 30. http --port=8181 | filter --expression="payload?.price > 3.00" | hdfs --directory=/xd/dir1
  31. 31. BATCH PROCESSING FOR HEAVY LIFTING
  32. 32. JOB
  33. 33. STEP
  34. 34. TASKLET
  35. 35. CHUNK
  36. 36. SPRING FOR APACHE HADOOP
  37. 37. TOTAL LINES OF CUSTOM CODE: 47 Lines of Java, 29 Lines of XML, 6 Spring XD Shell Commands
  38. 38. RECOMMENDATION ALGORITHMS
  39. 39. PREDICTING THE FUTURE
  40. 40. COLLABORATIVE FILTERING
  41. 41. TWO OPTIONS
  42. 42. USER BASED
  43. 43. USER | ITEM 1 | ITEM 2 | ITEM 3 | ITEM 4 | ITEM 5 (rows: DEREK, MICHAEL, PHIL, DARREL; one cell marked ?)
  44. 44. USER BASED
  45. 45. USER BASED
  46. 46. ITEM BASED
  47. 47. ITEM | DEREK | MICHAEL | PHIL | DARREL (rows: ITEM 1, ITEM 2, ITEM 3, ITEM 4, ITEM 5; one cell marked ?)
  48. 48. ITEM BASED
  49. 49. ITEM BASED
  50. 50. PEOPLE ARE FUNNY
  51. 51. USER_ID, TAG_ID, VOTES → TAG_ID, TAG_ID, SCORE
  52. 52. LOOKING INTO THE FUTURE
  53. 53. SNAPSHOTS AHEAD!
  54. 54. MAP REDUCE
  55. 55. MAPREDUCE PROBLEMS
  56. 56. API IS VERY LOW LEVEL
  57. 57. HIGH LATENCY
  58. 58. NOT ALWAYS A GOOD FIT
  59. 59. POTENTIALLY FASTER
  60. 60. HIGHER LEVEL APIS
  61. 61. scala> textFile.count() res0: Long = 126
  62. 62. USER_ID, TAG_ID, VOTES → TAGID, TAGID:RANK…
  63. 63. 1 USE A SEARCH ENGINE
  64. 64. 2 DATA NORMALIZATION
  65. 65. Learn More. Stay Connected. Spring Batch: Project: spring.io/spring-batch, GitHub: github.com/spring-projects/spring-batch, Jira: jira.spring.io/browse/BATCH; Spring Boot: Project: spring.io/spring-boot, GitHub: github.com/spring-projects/spring-boot; Spring XD: Project: spring.io/spring-xd, GitHub: github.com/spring-projects/spring-xd, Jira: jira.spring.io/browse/XD; Twitter: twitter.com/springcentral; YouTube: spring.io/video; LinkedIn: spring.io/linkedin; Google Plus: spring.io/gplus
  66. 66. Question by Jessica Lock from The Noun Project; Servers by Jaime Carrion from The Noun Project; Check Box by Hrag Chanchanian from The Noun Project; Crane by Kenneth Von Alt from The Noun Project; Nut by Naomi Atkinson from The Noun Project; Funnel by Volodin Anton from The Noun Project; Circuit by Piotrek Chuchla from The Noun Project; Puzzle by Matthew Hall from The Noun Project; Database by Anton Outkine from The Noun Project; Network by Mister Pixel from The Noun Project; Puzzle by Eric M. Ellis from The Noun Project; People by Wilson Joseph from The Noun Project; Maze by Gilbert Bages from The Noun Project; Fork by Dmitry Baranovskiy from The Noun Project; Algebra by Ilsur Aptukov from The Noun Project; Users by Vittorio Maria Vecchi from The Noun Project; Scale by Edward Boatman from The Noun Project; Thumbs Up by Jørgen Bovolden from The Noun Project; Flow Chart by Michael Wohlwend from The Noun Project; Running by Dimiter Petrov from The Noun Project; Running by Dimiter Petrov from The Noun Project; Move by Dmitry Baranovskiy from The Noun Project; Abacus by Alice Mortaro from The Noun Project; Stopwatch by Scott Lewis from The Noun Project; Lego by jon trillana from The Noun Project; Lego by jon trillana from The Noun Project; Lego by jon trillana from The Noun Project; Lego by Jake Dunham from The Noun Project
  67. 67. THE END
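
Slides 46 to 51 describe item-based collaborative filtering: USER_ID, TAG_ID, VOTES rows go in and TAG_ID, TAG_ID, SCORE pairs come out. The idea can be sketched in plain Java as below. This is only an illustration under stated assumptions: the talk hands this step to Apache Mahout running on Hadoop, and the slides do not show which similarity measure it uses, so a simple co-occurrence count (how many users have votes on both tags) stands in for the score here. The class name TagRecommender, its methods, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: item-based recommendations from USER_ID, TAG_ID, VOTES rows. */
class TagRecommender {

    /** One input row in the USER_ID, TAG_ID, VOTES layout. */
    static final class Preference {
        final long userId;
        final long tagId;
        final int votes;

        Preference(long userId, long tagId, int votes) {
            this.userId = userId;
            this.tagId = tagId;
            this.votes = votes;
        }
    }

    /** Builds TAG_ID, TAG_ID, SCORE; the score (an assumption) counts users with votes on both tags. */
    static Map<Long, Map<Long, Integer>> cooccurrence(List<Preference> prefs) {
        // Collect the set of tags each user has positive votes on.
        Map<Long, Set<Long>> tagsByUser = new TreeMap<>();
        for (Preference p : prefs) {
            if (p.votes > 0) {
                tagsByUser.computeIfAbsent(p.userId, k -> new TreeSet<>()).add(p.tagId);
            }
        }
        // Every ordered pair of distinct tags seen for the same user adds 1 to that pair's score.
        Map<Long, Map<Long, Integer>> scores = new TreeMap<>();
        for (Set<Long> tags : tagsByUser.values()) {
            for (long a : tags) {
                for (long b : tags) {
                    if (a != b) {
                        scores.computeIfAbsent(a, k -> new TreeMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return scores;
    }

    /** Ranks tags the user has no votes on by the sum of (own votes * pair score), highest first. */
    static List<Long> recommend(long userId, List<Preference> prefs,
                                Map<Long, Map<Long, Integer>> scores, int limit) {
        Map<Long, Integer> own = new TreeMap<>();
        for (Preference p : prefs) {
            if (p.userId == userId && p.votes > 0) {
                own.merge(p.tagId, p.votes, Integer::sum);
            }
        }
        Map<Long, Long> candidates = new TreeMap<>();
        for (Map.Entry<Long, Integer> mine : own.entrySet()) {
            Map<Long, Integer> related = scores.getOrDefault(mine.getKey(), Collections.emptyMap());
            for (Map.Entry<Long, Integer> other : related.entrySet()) {
                if (!own.containsKey(other.getKey())) {
                    long weight = mine.getValue().longValue() * other.getValue();
                    candidates.merge(other.getKey(), weight, Long::sum);
                }
            }
        }
        // Sort by descending weight, breaking ties by ascending tag id for a stable order.
        List<Map.Entry<Long, Long>> ranked = new ArrayList<>(candidates.entrySet());
        ranked.sort((x, y) -> {
            int byScore = Long.compare(y.getValue(), x.getValue());
            return byScore != 0 ? byScore : Long.compare(x.getKey(), y.getKey());
        });
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Long> e : ranked.subList(0, Math.min(limit, ranked.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: (userId, tagId, votes).
        List<Preference> prefs = Arrays.asList(
            new Preference(1, 10, 3), new Preference(1, 20, 1),
            new Preference(2, 10, 2), new Preference(2, 20, 4), new Preference(2, 30, 1),
            new Preference(3, 20, 1), new Preference(3, 30, 5),
            new Preference(4, 10, 2));
        Map<Long, Map<Long, Integer>> scores = cooccurrence(prefs);
        System.out.println(scores);                         // {10={20=2, 30=1}, 20={10=2, 30=2}, 30={10=1, 20=2}}
        System.out.println(recommend(4, prefs, scores, 2)); // [20, 30]
    }
}
```

The pair scores depend only on the tags, not on the user asking, so they can be computed in a batch step and stored, while the per-user ranking stays a cheap lookup. At the scale the slides mention (millions of users, 100,000’s of items) the nested loops above would not be practical on a single machine, which is where a distributed implementation comes in.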
