Real-time visitor analysis with Couchbase and Elasticsearch

These slides are from my NoSQL Matters Barcelona 2013 presentation. In the talk I went into detail about the architecture behind our high-performance real-time visitor analysis platform. I also covered why we chose Couchbase for storage and how Elasticsearch can be used for advanced search and analytics, and I shared how we integrated and leveraged both products full-circle from within our Hippo CMS product.

Transcript

  • 1. Real-time visitor analysis with Couchbase and Elasticsearch Jeroen Reijn | @jreijn | #nosql13 follow the Hippo trail
  • 2. NoSQL Matters 2013 About me: Jeroen Reijn, Software engineer, Hippo, @jreijn, http://blog.jeroenreijn.com
  • 3. About Hippo
  • 4. Visitor Analysis OneHippo @ Goto
  • 7. Journey based Targeting
  • 8. How we analyse visitors @ Hippo
  • 9. Registration. Visitor: entity making HTTP requests. Collector: records data about a visitor or his behaviour (example: location collector, GeoIPCollector). Targeting Data: all data about a specific visitor (example: IP address is located in Amsterdam).
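
The registration concepts on slide 9 could be sketched roughly as follows. This is a minimal illustration, not Hippo's API: the `Visitor` and `GeoIPCollector` classes, the `collect` and `register` functions, and the hard-coded IP-to-city table are all hypothetical (192.0.2.10 is an address from a documentation range).

```python
from dataclasses import dataclass, field


@dataclass
class Visitor:
    """An entity making HTTP requests, plus all targeting data known about it."""
    visitor_id: str
    targeting_data: dict = field(default_factory=dict)


class GeoIPCollector:
    """Hypothetical location collector: maps a request IP address to a city."""
    name = "geo"

    def __init__(self, ip_to_city):
        # Assumption: a real collector would query a GeoIP database instead.
        self.ip_to_city = ip_to_city

    def collect(self, request):
        return {"city": self.ip_to_city.get(request["ip"], "unknown")}


def register(visitor, request, collectors):
    """Run every collector on the request and store the result per collector."""
    for collector in collectors:
        visitor.targeting_data[collector.name] = collector.collect(request)
    return visitor


visitor = register(
    Visitor("v-1"),
    {"ip": "192.0.2.10", "path": "/products"},
    [GeoIPCollector({"192.0.2.10": "Amsterdam"})],
)
print(visitor.targeting_data)
```

Keeping one entry per collector in the targeting data means a new collector can be added without changing the shape of what is already stored.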
  • 10. Matching. Characteristic: a type of fact about visitors (example: "comes from a city", "experiences a type of weather"). Target Group: the specification of a Characteristic (example: "comes from a European city", "comes from Amsterdam"). Persona: one or more target groups that describe a certain type of visitor (example: "Jim, the European urban consumer", "Alice, the Pet owner").
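
One possible way to model the matching terms from slide 10 is shown below. The slides do not say how personas are scored, so the rule used here (the fraction of a persona's target groups that match) is an assumption, as are the `TargetGroup` and `Persona` classes, the `pages` characteristic, and the list of cities.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TargetGroup:
    """A specification of a characteristic, e.g. 'comes from a European city'."""
    characteristic: str              # e.g. "city"
    matches: Callable[[dict], bool]  # predicate over the visitor's targeting data


@dataclass(frozen=True)
class Persona:
    """One or more target groups that together describe a type of visitor."""
    name: str
    target_groups: tuple

    def score(self, targeting_data):
        # Assumption: score = fraction of target groups the visitor matches.
        hits = sum(1 for g in self.target_groups if g.matches(targeting_data))
        return hits / len(self.target_groups)


EUROPEAN_CITIES = {"Amsterdam", "Barcelona", "Berlin"}  # illustrative subset

european = TargetGroup("city", lambda d: d.get("city") in EUROPEAN_CITIES)
pet_pages = TargetGroup("pages", lambda d: "/pets" in d.get("pages", []))

jim = Persona("Jim, the European urban consumer", (european,))
alice = Persona("Alice, the Pet owner", (pet_pages,))

data = {"city": "Amsterdam", "pages": ["/home"]}
best = max([jim, alice], key=lambda p: p.score(data))
print(best.name)
```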
  • 11. What do we store? Request log; Targeting data; Statistics (averages, e.g. how many visitors became which persona)
  • 12. Real-time analysis
  • 13. How about YOU? • Do you analyse your visitors? • Do you do it ‘realtime’?
  • 14. Architecture
  • 15. JSON, XML, (X)HTML; App server: Hippo Delivery Tier, Hippo Repository; RDBMS
  • 16. Request → Delivery Tier (URL Matching, Fetch content, Compose output) → Response
  • 17. Request → Delivery Tier (URL Matching, Collect data, Scoring, Fetch content, Compose output) → Response
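
The request flow on slide 17 can be read as a chain of steps that each enrich a shared context. The sketch below only illustrates that ordering; the step bodies, the routing table, and the `shopper`/`anonymous` personas are placeholders invented for the example and say nothing about how the Hippo delivery tier is implemented.

```python
def url_matching(ctx):
    # Hypothetical routing table mapping URL paths to content identifiers.
    routes = {"/": "home", "/products": "product-list"}
    ctx["content_id"] = routes.get(ctx["request"]["path"], "not-found")
    return ctx


def collect_data(ctx):
    # Record facts about the visitor; here only the requested path.
    ctx["targeting_data"] = {"pages": [ctx["request"]["path"]]}
    return ctx


def scoring(ctx):
    # Assumption: a trivial rule stands in for persona scoring.
    pages = ctx["targeting_data"]["pages"]
    ctx["persona"] = "shopper" if "/products" in pages else "anonymous"
    return ctx


def fetch_content(ctx):
    # Pick a content variant for the matched persona.
    ctx["content"] = f"{ctx['content_id']} for {ctx['persona']}"
    return ctx


def compose_output(ctx):
    ctx["response"] = f"<html><body>{ctx['content']}</body></html>"
    return ctx


# Same order as on the slide: collecting and scoring happen before content is fetched.
PIPELINE = [url_matching, collect_data, scoring, fetch_content, compose_output]


def handle(request):
    ctx = {"request": request}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx["response"]


print(handle({"path": "/products"}))
```

Because collecting and scoring sit inside the request path, any datastore they touch adds directly to response time, which is why latency matters for the storage choice discussed later in the talk.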
  • 18. Scaling
  • 19. Scaling out: two App servers, each with Hippo Delivery Tier and Hippo Repository; RDBMS
  • 20. Scaling out: two App servers, each with Delivery Tier and Repository; Targeting Datastore; RDBMS
  • 21. What kind of storage?
  • 22. Typical Data Access Pattern: Several reads, Single write (Writer, Datastore)
  • 23. Analytics Data Access Pattern: Several writes, Single read (Writers, Datastore, CMS user)
  • 24. Targeting Data Access Pattern: Several writes, Single read, Several reads (Visitors, Datastore, CMS user)
  • 25. Distributed Cache
  • 26. Requirements change!
  • 27. NoSQL ?
  • 28. Suitable types • Key-value store • Document database • Column oriented store
  • 29. Assessment Criteria: Maturity, Data model, Scalability, Replication, Performance, Reliability, Caching model, Query model, Consistency model, Support, Monitoring
  • 30. Selection Criteria • Performance • Scalability • Schema flexibility • Simplicity
  • 31. Couchbase
  • 32. Why Couchbase? • Drop-in replacement for memcached • Read/Write-through cache • High throughput • Easily scalable • Schema flexibility • Low latency
  • 33. Couchbase • Open Source • Document-oriented • Easily scalable • Consistent High Performance • Apache licensed
  • 34. Performance • Object managed cache • Write Queue to disk
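
The two bullets on slide 34 describe a memory-first design: reads and writes are served from an in-memory cache, and writes reach disk later through a queue. The toy class below illustrates that general write-behind idea under simplifying assumptions (a single process, a synchronous `flush`, a dict standing in for disk). It is not a description of Couchbase internals, and `WriteBehindCache` is a made-up name.

```python
from collections import deque


class WriteBehindCache:
    """Serve reads and writes from memory; persist writes later from a queue."""

    def __init__(self):
        self.memory = {}            # in-memory object cache
        self.disk = {}              # stand-in for persistent storage
        self.write_queue = deque()  # keys waiting to be persisted

    def set(self, key, value):
        # The write is acknowledged once it is in memory and queued.
        self.memory[key] = value
        self.write_queue.append(key)

    def get(self, key):
        # Reads hit memory first and fall back to disk on a miss.
        if key in self.memory:
            return self.memory[key]
        value = self.disk[key]
        self.memory[key] = value
        return value

    def flush(self):
        """Drain the queue; a real server would do this asynchronously."""
        while self.write_queue:
            key = self.write_queue.popleft()
            self.disk[key] = self.memory[key]


cache = WriteBehindCache()
cache.set("visitor::v-1", {"city": "Amsterdam"})
print(cache.get("visitor::v-1"), len(cache.disk))
cache.flush()
print(len(cache.disk))
```

The trade-off is visible in the gap between `set` and `flush`: callers get low latency, while durability lags behind until the queue is drained.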
  • 35. Easily scalable • Auto sharding • Cross cluster replication (XDCR) • Master - Master replication
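
Auto sharding, as listed on slide 35, generally works by hashing each key into a fixed number of partitions and keeping a separate map from partitions to nodes. Couchbase calls its partitions vBuckets. The sketch below is a simplification: CRC32 modulo 1024 and the round-robin assignment are assumptions made for the example, not the server's exact algorithm, and the node names are invented.

```python
import zlib

NUM_PARTITIONS = 1024  # assumption for the sketch


def partition_for(key):
    """Hash a document key to a partition; depends only on the key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    """Assign partitions to nodes round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key, partition_map):
    return partition_map[partition_for(key)]


two = build_partition_map(["node-a", "node-b"])
three = build_partition_map(["node-a", "node-b", "node-c"])

key = "visitor::v-1"
print(partition_for(key), node_for(key, two), node_for(key, three))
```

Adding a node changes only the partition map, never the key-to-partition hash, so clients can keep routing requests directly to the right node once they have the new map.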
  • 36. Flexible data model • Native JSON support • Incremental Map Reduce • Gives power to the developer
  • 37. How we run Couchbase @ Hippo
  • 38. Load Balancer; Hippo Delivery Tier; Database cluster; Couchbase cluster • Request log data • Targeting data • Statistics data
  • 39. Analysis capabilities • Querying via views • Secondary indexes via views • Views based on Map - Reduce • Limited ad-hoc query capabilities
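
To make the view-based querying on slide 39 concrete, here is a small in-memory imitation of a map/reduce view that counts visitors per persona (the kind of statistic mentioned on slide 11). Couchbase views are defined as JavaScript map and reduce functions on the server; the Python below only mimics the idea, does not model incremental index updates, and uses invented document fields (`type`, `visitor`, `persona`).

```python
from collections import defaultdict

docs = [
    {"type": "visit", "visitor": "v-1", "persona": "Jim"},
    {"type": "visit", "visitor": "v-2", "persona": "Alice"},
    {"type": "visit", "visitor": "v-3", "persona": "Jim"},
    {"type": "request", "path": "/home"},
]


def map_fn(doc):
    """Emit (persona, 1) for visit documents; other documents emit nothing."""
    if doc.get("type") == "visit":
        yield doc["persona"], 1


def reduce_fn(values):
    """Combine all values emitted for one key into a count."""
    return sum(values)


def query_view(documents):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


print(query_view(docs))
```

The limitation named on the slide follows from this shape: the question has to be known when the view is defined, so a new ad-hoc question means a new view.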
  • 40. Elasticsearch • Apache Lucene • Designed to be distributed • Schema free • Apache license • RESTful API
  • 41. Added value • Unstructured search • Structured search • Faceted search • Geo spatial search • Combine all • All in (near) real-time
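
Combining the search types from slide 41 in a single request might look like the query body built below. It is only a sketch: the field names (`page_title`, `persona`, `location`, `city`) are hypothetical and assume a mapping where `location` is a geo_point, and it uses the aggregations syntax of later Elasticsearch releases rather than the facets API that was current in 2013. The function only builds the JSON body and sends nothing to a server.

```python
import json


def build_query(text, persona, lat, lon, distance="50km"):
    """Build a search body combining full-text, structured, geo and faceted search."""
    return {
        "query": {
            "bool": {
                # Unstructured (full-text) search.
                "must": [{"match": {"page_title": text}}],
                "filter": [
                    # Structured search on an exact value.
                    {"term": {"persona": persona}},
                    # Geo spatial search around a point.
                    {
                        "geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                ],
            }
        },
        # Faceted search: bucket the matching documents per city.
        "aggs": {"visits_per_city": {"terms": {"field": "city"}}},
    }


body = build_query("hiking boots", "Jim", 52.37, 4.89)
print(json.dumps(body, indent=2))
```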
  • 42. Replication: Hippo Delivery Tier, Couchbase Server Cluster, Elasticsearch Server Cluster (diagram labels: Read, Write, Read/Query, Java API, XDCR, Couchbase Transport plugin)
  • 43. What’s Next?
  • 44. Advanced analytics
  • 45. { Demo }
  • 46. Thanks! j.reijn@onehippo.com @jreijn www.onehippo.com