Building a relevance platform with Couchbase and Elasticsearch

These slides are from my Goto Amsterdam presentation, in which I went into detail about how we're building a high-performance relevance platform at Hippo with Couchbase and Elasticsearch. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I showed how we integrated both products and use them end to end from within our Hippo CMS product.

Published in: Technology

Building a relevance platform with Couchbase and Elasticsearch

  1. OneHippo @ Goto, follow the Hippo trail
     Building a relevance platform with Couchbase and Elasticsearch
     @jreijn | Hippo
     #gotoams, June 18
  2. About me
     • Architect @ Hippo
     • DevOps guy
     • Blogger @ http://blog.jeroenreijn.com
  3. About Hippo
  4. Relevance?
  5. “The capability of a search engine or function to retrieve data appropriate to a user's needs.”
     http://www.thefreedictionary.com/relevance
  6. (no text)
  7. How we deliver relevant content @Hippo
  8. Registration
     • Visitor - entity making HTTP requests
     • Collector - records data about a visitor or his behavior
       Example: location collector (GeoIPCollector)
     • Targeting Data - all data about a specific visitor
       Example: IP address is located in Amsterdam
  9. Matching
     • Characteristic - a type of fact about visitors
       Example: "comes from a city", "experiences a type of weather"
     • Target Group - the specification of a Characteristic
       Example: "comes from a European city", "comes from Amsterdam"
     • Persona - one or more target groups that describe a certain type of visitor
       Example: "Jim, the European urban consumer", "Alice, the Pet owner"
  10. What do we store?
      • Request log
      • Targeting data
      • Statistics
      • Averages, e.g. how many visitors became which persona
  11. Real-time analysis
  12. Architecture
  13. RDBMS, Hippo Delivery Tier, Hippo Repository, App server, XML, JSON, (X)HTML
  14. Delivery Tier: URL Matching, Fetch content, Compose output, Request, Response
  15. Delivery Tier: URL Matching, Targeting Data Collection, Compose output, Request, Response, Fetch content, Scoring
  16. Scaling
  17. Scaling out: RDBMS, Hippo Delivery Tier, Hippo Repository, App server, Hippo Delivery Tier, Hippo Repository, App server
  18. Scaling out: RDBMS, Delivery Tier, Repository, App server, Delivery Tier, Repository, App server, Targeting Datastore
  19. What kind of ‘storage’?
  20. Distributed Cache?
  21. We have a winner!
  22. Requirements change!
  23. NoSQL to the rescue
  24. Suitable types
      • Key-value store
      • Document database
  25. Assessment Criteria
      • Maturity
      • Data model
      • Consistency model
      • Performance
      • Replication
      • Caching model
      • Query model
      • Monitoring
      • Scalability
      • Reliability
      • Support
  26. Selection Criteria
      • Performance!
      • Scalability
      • Schema flexibility
      • Simplicity
      • Monitoring
      • Support
  27. Performance!!
  28. Scalability
  29. Schema flexibility
  30. Request log document
      {
        "visitorId": "7a1c7e75-8539-40",
        "pageUrl": "http://localhost:8080/site/news",
        "pathInfo": "/news",
        "remoteAddr": "127.0.0.1",
        "referer": "http://localhost:8080/site/",
        "timestamp": 1371419505909,
        "collectorData": {
          "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
          "returningvisitor": false,
          "channel": "English Website"
        },
        "personaIdScores": [],
        "globalPersonaIdScores": []
      }
  31. Visitor document
      {
        "geo": {
          "collectorId": "geo",
          "city": "",
          "country": "",
          "latitude": 0,
          "longitude": 0
        },
        "channel": {
          "collectorId": "channel",
          "channels": ["English Website"],
          "lastVisitedChannel": "English Website"
        }
      }
  32. Simplicity
  33. Monitoring
  34. Support
  35. Couchbase
  36. Why Couchbase?
      • Drop-in replacement for memcached
      • Read/Write-through cache
      • High throughput
      • Easy scalability
      • Schema flexibility
      • Low latency
  37. Couchbase
      • Open Source
      • Document-oriented
      • Easily scalable
      • Consistent High Performance
  38. Performance
      • Object managed cache
      • Write Queue to disk
      • Avoids Cold Cache
  39. Easily scalable
      • Auto sharding
      • Cross cluster replication (XDCR)
      • Master - Master replication
  40. Flexible data model
      • Native JSON support
      • Incremental Map Reduce
      • Gives power to the developer
  41. How we run Couchbase @Hippo
  42. Load Balancer, Database cluster, Hippo Delivery Tier, Couchbase cluster
      • Request log data
      • Targeting data
      • Statistics data
  43. Query capabilities
      • Querying via views
      • Secondary indexes via views
      • Views based on Map - Reduce
      • Lacks some advanced query capabilities
  44. Elasticsearch
      • Apache Lucene
      • Designed to be distributed
      • Schema free
      • Apache 2 licensed
      • RESTful API
  45. Added value of ES
      • Full text search
      • Faceted search
      • Geo spatial search
      • All in (near) real-time
  46. Replicating to ES: Couchbase Server Cluster, Elasticsearch Server Cluster, Hippo Delivery Tier, Java API, Write, Read, XDCR, Couchbase ES Transport plugin
  47. Demo time!
  48. What’s Next?
  49. Advanced analytics
  50. Thank you! Questions?
      j.reijn@onehippo.com
      @jreijn
      ps. We’re hiring!
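
The Registration and Matching slides define a small vocabulary: collectors produce targeting data, target groups specify a characteristic, and personas combine target groups. As a rough illustration only, that vocabulary could be modelled as below. This is not Hippo's implementation: the class names, the scoring rule (fraction of matching target groups), the threshold, and the "English-site reader" persona are all assumptions made for the sketch. The sample targeting data loosely follows the visitor document shown on the slides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Targeting data: what the collectors recorded about one visitor, keyed by
# collector id. The shape loosely follows the visitor document in the slides.
TargetingData = Dict[str, dict]


@dataclass
class TargetGroup:
    """A concrete specification of a characteristic, e.g. 'comes from Amsterdam'."""
    name: str
    characteristic: str
    matches: Callable[[TargetingData], bool]


@dataclass
class Persona:
    """One or more target groups that together describe a type of visitor."""
    name: str
    target_groups: List[TargetGroup]

    def score(self, data: TargetingData) -> float:
        # Assumed scoring rule: fraction of target groups the visitor matches.
        if not self.target_groups:
            return 0.0
        hits = sum(1 for group in self.target_groups if group.matches(data))
        return hits / len(self.target_groups)


def best_persona(personas: List[Persona], data: TargetingData,
                 threshold: float = 0.5) -> Optional[str]:
    """Return the name of the highest-scoring persona at or above the threshold.

    The first persona wins a tie; None means no persona matched well enough.
    """
    best_name, best_score = None, 0.0
    for persona in personas:
        score = persona.score(data)
        if score >= threshold and score > best_score:
            best_name, best_score = persona.name, score
    return best_name


from_amsterdam = TargetGroup(
    name="comes from Amsterdam",
    characteristic="comes from a city",
    matches=lambda d: d.get("geo", {}).get("city") == "Amsterdam",
)
reads_english_site = TargetGroup(  # hypothetical target group
    name="last visited the English Website",
    characteristic="visits a channel",
    matches=lambda d: d.get("channel", {}).get("lastVisitedChannel") == "English Website",
)

personas = [
    Persona("Jim, the European urban consumer", [from_amsterdam]),
    Persona("English-site reader", [reads_english_site]),  # hypothetical persona
]

# Made-up targeting data for three visitors.
visitors = [
    {"geo": {"city": "Amsterdam"}, "channel": {"lastVisitedChannel": "Dutch Website"}},
    {"geo": {"city": ""}, "channel": {"lastVisitedChannel": "English Website"}},
    {"geo": {"city": "Paris"}, "channel": {"lastVisitedChannel": "Dutch Website"}},
]

# A statistic in the spirit of "how many visitors became which persona".
persona_counts = Counter(best_persona(personas, v) for v in visitors)
```

In a setup like the one the slides describe, a count such as `persona_counts` would more likely come from the datastore (for example a Couchbase view or an Elasticsearch facet) than from an in-memory loop; the loop here only makes the idea concrete.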
