Hippo GetTogether: The architecture behind Hippo's relevance platform

These slides are from my Hippo GetTogether 2013 presentation, in which I went into detail about the architecture behind our high-performance relevance platform. I covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics, and I shared how we integrated both products into our Hippo CMS product and leverage them from within it.

Transcript of "Hippo GetTogether: The architecture behind Hippo's relevance platform"

  1. Building a relevance platform with Couchbase and Elasticsearch • Hippo GetTogether, 21 June 2013 • Jeroen Reijn | @jreijn | #hgt2013 • Hippo GetTogether 2013, follow the Hippo trail
  2. About me • Architect @ Hippo • DevOps guy • Blogger @ http://blog.jeroenreijn.com
  3. Relevance?
  4. “The capability of a search engine or function to retrieve data appropriate to a user's needs.” (http://www.thefreedictionary.com/relevance)
  5. (no text on this slide)
  6. How we deliver relevant content @Hippo
  7. Registration • Visitor: entity making HTTP requests • Collector: records data about a visitor or their behavior. Example: location collector (GeoIPCollector) • Targeting Data: all data about a specific visitor. Example: IP address is located in Amsterdam
  8. Matching • Characteristic: a type of fact about visitors. Example: "comes from a city", "experiences a type of weather" • Target Group: the specification of a Characteristic. Example: "comes from a European city", "comes from Amsterdam" • Persona: one or more target groups that describe a certain type of visitor. Example: "Jim, the European urban consumer", "Alice, the pet owner"
  9. What do we store? • Request log • Targeting data • Statistics: averages, e.g. how many visitors became which persona
  10. BIG DATA!!
  11. Real-time analysis
  12. Architecture
  13. Architecture diagram: RDBMS, Hippo Repository, Hippo Delivery Tier, app server; output as XML, JSON and (X)HTML
  14. Delivery Tier: Request → URL matching → Fetch content → Compose output → Response
  15. Delivery Tier with targeting (diagram labels): Request, URL matching, Targeting data collection, Fetch content, Scoring, Compose output, Response
  16. Scaling
  17. Scaling out (diagram): one RDBMS; two nodes, each running Hippo Delivery Tier, Hippo Repository and an app server
  18. Scaling out with a targeting datastore (diagram): RDBMS; two nodes, each running Delivery Tier, Repository and an app server; Targeting Datastore
  19. What kind of ‘storage’?
  20. Question?
  21. Distributed cache?
  22. We have a winner!
  23. Requirements change!
  24. NoSQL to the rescue
  25. Suitable types • Key-value store • Document database
  26. Assessment criteria • Maturity • Data model • Consistency model • Performance • Replication • Caching model • Query model • Monitoring • Scalability • Reliability • Support
  27. Selection criteria • Performance • Scalability • Schema flexibility • Simplicity • Monitoring • Support
  28. Performance!! Performance!!!!
  29. Scalability
  30. Schema flexibility
  31. Request log document:
      {
        "visitorId": "7a1c7e75-8539-40",
        "pageUrl": "http://localhost:8080/site/news",
        "pathInfo": "/news",
        "remoteAddr": "127.0.0.1",
        "referer": "http://localhost:8080/site/",
        "timestamp": 1371419505909,
        "collectorData": {
          "geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
          "returningvisitor": false,
          "channel": "English Website"
        },
        "personaIdScores": [],
        "globalPersonaIdScores": []
      }
  32. Visitor document:
      {
        "geo": {"collectorId": "geo", "city": "", "country": "", "latitude": 0, "longitude": 0},
        "channel": {
          "collectorId": "channel",
          "channels": ["English Website"],
          "lastVisitedChannel": "English Website"
        }
      }
  33. Simplicity
  34. Monitoring
  35. Support
  36. Couchbase
  37. Why Couchbase? • Drop-in replacement for memcached • Read/write-through cache • High throughput • Easy scalability • Schema flexibility • Low latency
  38. Couchbase • Open source • Document-oriented • Easily scalable • Consistent high performance • Apache license
  39. Performance • Object-managed cache • Write queue to disk • Avoids cold cache
  40. Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase (Copyright © Altoros Systems, Inc.)
  41. Easy scalability • Auto-sharding • Cross datacenter replication (XDCR) • Master-master replication
  42. Flexible data model • Native JSON support • Incremental map-reduce • Gives power to the developer
  43. How we run Couchbase @Hippo
  44. Deployment diagram: load balancer, Hippo Delivery Tier, database cluster, Couchbase cluster (request log data, targeting data, statistics data)
  45. Query capabilities • Querying via views • Secondary indexes via views • Views based on map-reduce • Lacks some advanced query capabilities
  46. Elasticsearch • Built on Apache Lucene • Designed to be distributed • Schema-free • Apache license • RESTful API
  47. Added value of ES • Full-text search • Faceted search • Geospatial search • All in (near) real time
  48. Replicating to ES (diagram): the Hippo Delivery Tier uses the Java API to write to the Couchbase Server cluster and read from the Elasticsearch Server cluster; XDCR replicates from Couchbase to Elasticsearch through the Couchbase ES Transport plugin
  49. What's next?
  50. What's next?
  51. Advanced analytics
  52. Demo time!
  53. Thank you! Questions? j.reijn@onehippo.com | @jreijn • PS: We're hiring!
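Slide 7 defines visitors, collectors and targeting data. The Java sketch below shows one way that registration step could look. It is an illustration, not Hippo's code: every name in it (`VisitorRequest`, `Collector`, `PrefixGeoCollector`, `Registration`) is hypothetical, and the IP-prefix table only stands in for a real GeoIP lookup such as the slide's GeoIPCollector, whose API the talk does not show.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request abstraction holding only the fields this sketch needs.
record VisitorRequest(String visitorId, String remoteAddr, String pathInfo) {}

// A collector records one kind of data about a visitor or their behavior.
interface Collector {
    String collectorId();

    Object collect(VisitorRequest request);
}

// Hypothetical location collector: a fixed IP-prefix table instead of a GeoIP database.
final class PrefixGeoCollector implements Collector {
    private final Map<String, String> cityByIpPrefix;

    PrefixGeoCollector(Map<String, String> cityByIpPrefix) {
        this.cityByIpPrefix = cityByIpPrefix;
    }

    @Override
    public String collectorId() {
        return "geo";
    }

    @Override
    public Object collect(VisitorRequest request) {
        for (Map.Entry<String, String> e : cityByIpPrefix.entrySet()) {
            if (request.remoteAddr().startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return ""; // unknown location, like the empty "city" in the sample documents
    }
}

final class Registration {
    // Runs every collector; the results, keyed by collector id, form the visitor's targeting data.
    static Map<String, Object> register(VisitorRequest request, List<Collector> collectors) {
        Map<String, Object> targetingData = new LinkedHashMap<>();
        for (Collector collector : collectors) {
            targetingData.put(collector.collectorId(), collector.collect(request));
        }
        return targetingData;
    }
}
```

Keying the result by collector id mirrors the `collectorId` fields in the sample documents on slides 31 and 32. It also means a new collector adds a key rather than changing a schema.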
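Slide 8 layers characteristics, target groups and personas on top of the targeting data. The talk does not say how personas are scored. This sketch therefore assumes a simple, hypothetical rule: a persona's score is the fraction of its target groups that the visitor matches. `TargetGroup`, `Persona` and the predicate-based specification are illustrative names only.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// A target group specifies a characteristic: here, a predicate on one collector's value.
record TargetGroup(String name, String collectorId, Predicate<Object> spec) {
    boolean matches(Map<String, Object> targetingData) {
        Object value = targetingData.get(collectorId);
        return value != null && spec.test(value);
    }
}

// A persona is described by one or more target groups.
record Persona(String id, List<TargetGroup> targetGroups) {
    // Hypothetical scoring rule: the fraction of this persona's target groups the visitor matches.
    double score(Map<String, Object> targetingData) {
        if (targetGroups.isEmpty()) {
            return 0.0;
        }
        long matched = targetGroups.stream().filter(g -> g.matches(targetingData)).count();
        return (double) matched / targetGroups.size();
    }
}
```

For example, a "Jim" persona built from "comes from a European city" and a returning-visitor group scores 0.5 for a first-time visitor from Amsterdam. Per-persona scores like these are the kind of values the `personaIdScores` field of the request log document could hold.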
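Slide 9 lists statistics such as "how many visitors became which persona". A minimal aggregation sketch follows. It assumes a hypothetical input shape, one assigned persona id per visitor, which the talk does not specify.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PersonaStatistics {
    // Input: for each visitor, the id of the persona they were assigned (hypothetical shape).
    // Output: persona id -> share of visitors, sorted by persona id.
    static Map<String, Double> share(List<String> personaPerVisitor) {
        Map<String, Double> result = new TreeMap<>();
        if (personaPerVisitor.isEmpty()) {
            return result;
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String persona : personaPerVisitor) {
            counts.merge(persona, 1, Integer::sum);
        }
        counts.forEach((persona, n) -> result.put(persona, (double) n / personaPerVisitor.size()));
        return result;
    }
}
```

For four visitors assigned `jim`, `jim`, `alice`, `jim`, this yields `{alice=0.25, jim=0.75}`.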
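The visitor document on slide 32 is a map from collector id to that collector's data, which is what "schema flexibility" buys here. The sketch below models that shape in plain Java collections and updates the `channel` entry in the way its fields suggest: each distinct channel is kept, along with the last visited one. The class name, the update rule and the "Dutch Website" channel used in testing are assumptions, not taken from Hippo's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Visitor document modeled as collectorId -> collector data (hypothetical class).
// A new collector only adds a key; no schema migration is involved.
final class VisitorDocument {
    private final Map<String, Map<String, Object>> entries = new LinkedHashMap<>();

    void putCollectorData(String collectorId, Map<String, Object> data) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("collectorId", collectorId);
        entry.putAll(data);
        entries.put(collectorId, entry);
    }

    // Updates the "channel" entry: remembers each distinct channel and the most recent one.
    @SuppressWarnings("unchecked")
    void recordChannel(String channel) {
        Map<String, Object> entry = entries.computeIfAbsent("channel", id -> {
            Map<String, Object> fresh = new LinkedHashMap<>();
            fresh.put("collectorId", id);
            fresh.put("channels", new ArrayList<String>());
            return fresh;
        });
        List<String> channels = (List<String>) entry.get("channels");
        if (!channels.contains(channel)) {
            channels.add(channel);
        }
        entry.put("lastVisitedChannel", channel);
    }

    Map<String, Object> get(String collectorId) {
        return entries.get(collectorId);
    }
}
```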
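Slides 37 and 39 describe a managed object cache with a write queue to disk. The toy model below illustrates that idea under stated assumptions and is not Couchbase's implementation. Reads and writes go to memory. Dirty keys are queued and persisted later in a batch, which is a write-behind scheme. A read miss falls through to disk. `Disk` and `ManagedCache` are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical persistence interface standing in for the on-disk store.
interface Disk {
    void write(String key, String value);

    String read(String key);
}

// Toy model: memory-first reads and writes, with dirty keys queued and persisted later.
final class ManagedCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Set<String> writeQueue = new LinkedHashSet<>(); // dirty keys, in first-write order
    private final Disk disk;

    ManagedCache(Disk disk) {
        this.disk = disk;
    }

    // A write completes once it is in memory; the key is queued for persistence.
    void set(String key, String value) {
        memory.put(key, value);
        writeQueue.add(key);
    }

    // Served from memory; on a miss, read through from disk and keep the value in memory.
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = disk.read(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Persists each queued key once, with its latest value; returns how many keys were written.
    int flush() {
        int written = 0;
        for (String key : writeQueue) {
            disk.write(key, memory.get(key));
            written++;
        }
        writeQueue.clear();
        return written;
    }
}
```

Two writes to the same key before a flush produce a single disk write. A freshly constructed cache over the same `Disk` still finds persisted data, though this toy warms up lazily on first access, which simplifies how a real server avoids a cold cache.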
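Slide 41 lists auto-sharding. In Couchbase, each key hashes to one of a fixed number of vBuckets (typically 1024), and a separate vBucket-to-node map says which node owns each vBucket. The sketch below shows that two-level routing with `java.util.zip.CRC32`. The plain modulo is a simplification: real Couchbase clients apply their own bit arithmetic to the CRC32 value. The class and node names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Simplified key -> vBucket -> node routing (hypothetical class; modulo is illustrative only).
final class VBucketRouter {
    private final List<String> nodeByVBucket; // index = vBucket id

    VBucketRouter(List<String> nodeByVBucket) {
        if (nodeByVBucket.isEmpty()) {
            throw new IllegalArgumentException("need at least one vBucket");
        }
        this.nodeByVBucket = List.copyOf(nodeByVBucket);
    }

    int vBucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % nodeByVBucket.size());
    }

    String nodeFor(String key) {
        return nodeByVBucket.get(vBucketFor(key));
    }

    // Assigns vBuckets to nodes round-robin. Rebalancing changes this map, never the key hashing.
    static List<String> roundRobin(int vBuckets, List<String> nodes) {
        List<String> map = new ArrayList<>(vBuckets);
        for (int i = 0; i < vBuckets; i++) {
            map.add(nodes.get(i % nodes.size()));
        }
        return map;
    }
}
```

Because keys only ever map to vBuckets, adding a node means moving some vBuckets and updating the map. Clients keep hashing keys exactly as before.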
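Slides 42 and 45 describe views built with incremental map-reduce. Real Couchbase views use JavaScript map functions executed by the server, with built-in reducers such as `_count`. The Java toy below only models the incremental bookkeeping, assuming a count reduce. When a document changes, only that document is re-mapped, and its previously emitted rows are retracted first. `CountView` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy incrementally maintained view: a map step emitting keys per document, reduced by count.
final class CountView {
    private final Function<Map<String, Object>, List<String>> mapFn; // document -> emitted keys
    private final Map<String, List<String>> emittedByDoc = new HashMap<>();
    private final Map<String, Integer> countByKey = new HashMap<>();

    CountView(Function<Map<String, Object>, List<String>> mapFn) {
        this.mapFn = mapFn;
    }

    // Incremental update: re-map only the changed document after retracting its old rows.
    void upsert(String docId, Map<String, Object> doc) {
        remove(docId);
        List<String> keys = mapFn.apply(doc);
        emittedByDoc.put(docId, keys);
        for (String key : keys) {
            countByKey.merge(key, 1, Integer::sum);
        }
    }

    void remove(String docId) {
        List<String> old = emittedByDoc.remove(docId);
        if (old == null) {
            return;
        }
        for (String key : old) {
            countByKey.computeIfPresent(key, (k, n) -> n == 1 ? null : n - 1);
        }
    }

    int count(String key) {
        return countByKey.getOrDefault(key, 0);
    }
}
```

A view mapping each request log document to its `pathInfo` then answers "requests per path" without rescanning the log. This also reflects the limitation on slide 45: each question needs its own predefined view.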
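Slide 47 lists full-text and faceted search as the added value of Elasticsearch. Elasticsearch of 2013 exposed faceting through "facets". Later versions replaced them with aggregations, which this sketch uses. The code builds, but does not send, a search request that combines a full-text `match` query with a `terms` aggregation over the visitor's country. The index name `requestlog`, the field paths and the `.keyword` sub-field are assumptions; the sub-field exists under default dynamic mapping in recent versions. The string escaping is deliberately minimal.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class FacetQuery {
    // Full-text match on one field plus a terms aggregation acting as the facet.
    // Field names are hypothetical; only backslashes and quotes are escaped.
    static String body(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"size\":0,"
                + "\"query\":{\"match\":{\"pathInfo\":\"" + escaped + "\"}},"
                + "\"aggs\":{\"by_country\":{\"terms\":{\"field\":\"collectorData.geo.country.keyword\"}}}}";
    }

    // POST /<index>/_search; sending it (e.g. with java.net.http.HttpClient) needs a running cluster.
    static HttpRequest request(String baseUrl, String index, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(text)))
                .build();
    }
}
```

Setting `"size":0` skips the matching hits and returns only the facet buckets, which suits dashboard-style analytics over the replicated request log.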