Using Elasticsearch and Couchbase Together to Build Large Scale Applications

Couchbase Server 2.0 allows for full-text search integration. In this webinar we examine how you can integrate your Couchbase Server 2.0 cluster with an Elasticsearch Cluster to provide enhanced querying capabilities and build large scale applications.

Published in: Technology, Business

  1. 1. Using Elasticsearch and Couchbase Together to Build Large Scale Apps. Uri Boness, Founder, Elasticsearch; Dipti Borkar, Director, Products, Couchbase
  2. 2. Introduction to Elasticsearch
  3. 3. What is Elasticsearch? Open source (Apache 2 license). A multi-tenant, real-time and distributed search & analytics engine. Backed by Elasticsearch (the company). Proven technology in production. Over 2 million downloads.
  4. 4. What can Elasticsearch do? Unstructured search: find all companies in the “search” market. Structured search: find all companies founded since 2000. Analytics: find the average annual revenue of all companies. Combine all: find the average annual revenue of all companies founded since 2000 within the “search” market.
  5. 5. (near) real-time!
  6. 6. Distributed & multi-tenant. A node is a single Elasticsearch instance. Multiple nodes can form a cluster. A cluster can manage multiple indices. A cluster is agile & self-managing: it continuously ensures that the distributed characteristics of all indices are maintained and that all nodes in the cluster are efficiently & effectively utilized.
  7. 7. The Index
  8. 8. What’s in an index? An identified collection of documents. Built & designed for small & large scales. Data volumes: data can be split and distributed between shards. Loads & HA: each shard can have zero or more replicas.
  9. 9. starting a node (node_1)
  10. 10. creating our first index (node_1): curl -XPUT localhost:9200/companies -d '{"settings": {"index": {"number_of_shards": 2, "number_of_replicas": 1}}}'
  11. 11. the two shards are allocated (node_1 holds shards 0 and 1)
  12. 12. starting a second node (node_1 and node_2; shards 0 and 1)
  13. 13. shards are relocating (node_1 and node_2; shards 0 and 1)
  14. 14. replicas are allocated (node_1 and node_2; shards 0 and 1, replicas 1 and 0)
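The index-creation slides above can be sketched in Python. This is a minimal illustration, not part of the original deck: the helper names `index_settings` and `total_shard_copies` are hypothetical, and the only facts taken from the slides are the settings keys and the values 2 and 1.

```python
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Build the request body used on the slide to create the index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(body: dict) -> int:
    """Each primary shard has number_of_replicas extra copies."""
    index = body["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


body = index_settings(2, 1)
print(json.dumps(body))
print(total_shard_copies(body))
```

With 2 shards and 1 replica each, the cluster has four shard copies to place, which is why the replicas can only be allocated once a second node has joined.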
  15. 15. Indexing Data
  16. 16. the data: documents are typically JSON formatted. curl -XPUT localhost:9200/companies/company/1 -d '{"id": "elasticsearch", "name": "elasticsearch", "website": "http://www.elasticsearch.com", "category": "software", "overview": "distributed search & analytics engine", "founded_year": 2012, "location": {"city": "Amsterdam", "country_code": "NL", "geo": {"lat": 52.370176, "lon": 4.895008}}}'
  17. 17. sending req. to one of the nodes (diagram: client, node_1, node_2 and node_3 with their shard copies)
  18. 18. sending req. to one of the nodes: resolve the target shard
  19. 19. resolve shard & index to primary
  20. 20. replicate to replicas
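The "resolve the target shard" step above can be sketched as hashing the document ID modulo the number of primary shards. This is only an illustration of the idea: `resolve_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is in the standard library, not the hash Elasticsearch itself uses.

```python
import zlib


def resolve_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a primary shard number.

    Assumption: a stable hash of the ID taken modulo the shard count.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# Any node that receives the request can compute the same shard number,
# forward the document to that shard's primary, and let the primary
# replicate it to the replicas.
for doc_id in ["1", "2", "3"]:
    print(doc_id, resolve_shard(doc_id, 2))
```

Because the mapping depends on the shard count, the number of primary shards is fixed when the index is created, while the number of replicas can change later.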
  21. 21. Searching
  22. 22. unstructured search: using an extensive & powerful Query DSL. curl -XGET localhost:9200/companies/_search -d '{"query": {"match": {"overview": "search"}}}'
  23. 23. unstructured search (same request, annotated): search for the term “search” in the “overview” field
  24. 24. structured search: narrows the “searchable” document space. curl -XGET localhost:9200/companies/company/_search -d '{"query": {"filtered": {"query": {"match": {"overview": "search"}}, "filter": {"range": {"founded_year": {"gte": 2000}}}}}}'
  25. 25. structured search (same request, annotated): only search companies that were founded since year 2000
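The structured search above can be built programmatically. The sketch below is an illustration with hypothetical helper names. `filtered_search` reproduces the body from the slide; `bool_search` shows an equivalent shape using a `bool` query with a `filter` clause, which is the form later Elasticsearch releases use because the `filtered` query was deprecated in 2.0 and removed in 5.0.

```python
import json


def filtered_search(field: str, text: str, year_field: str, since: int) -> dict:
    """Request body as shown on the slide (filtered query)."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"range": {year_field: {"gte": since}}},
            }
        }
    }


def bool_search(field: str, text: str, year_field: str, since: int) -> dict:
    """Equivalent body for newer Elasticsearch versions (bool + filter)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},
                "filter": {"range": {year_field: {"gte": since}}},
            }
        }
    }


print(json.dumps(filtered_search("overview", "search", "founded_year", 2000)))
print(json.dumps(bool_search("overview", "search", "founded_year", 2000)))
```

In both forms the range filter only narrows the set of candidate documents; the match query alone determines the relevance score.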
  26. 26. returned hits: {... "hits": [{"_index": "companies", "_type": "company", "_id": "1", "_score": 0.13424811, "_source": {"id": "elasticsearch", "name": "elasticsearch", "website": "http://www.elasticsearch.com", "category": "software", "founded_year": 2012, "overview": "distributed search & analytics engine", "location": {"city": "Amsterdam", "country_code": "NL", "geo": {"lat": 52.370176, "lon": 4.895008}}}}]}}
  30. 30. Query DSL. Queries (unstructured): term queries, boolean queries, phrase (proximity) queries, fuzzy/prefix/regexp/wildcards, more... Filters (structured): term (exact match), range, boolean, geo_* (e.g. geo_distance).
  31. 31. Analytics (a.k.a. facets)
  32. 32. Analytics (facets). Slice & dice your data. Compute aggregations over field values. Across any index field/s. All in (near) real time.
  33. 33. used as navigation aid
  34. 34. or analytics dashboards
  35. 35. Elasticsearch is often used purely for analytics (without incorporating free text search)
  36. 36. Example: find the average revenue of all companies since 2000. curl -XGET localhost:9200/companies/revenues/_search -d '{"query": {"match_all": {}}, "facets": {"revenue_stats": {"date_histogram": {"key_field": "year", "value_field": "value", "interval": "month"}}}}'
  37. 37. Example (same request, annotated): return a yearly breakdown of stats over companies' revenues
  38. 38. response: "facets": {"revenue_stats": {"_type": "date_histogram", "entries": [{"time": 956448895664, "mean": 23.0}, {"time": 987984922557, "mean": 267.1034482758621}, {"time": 1019520942098, "mean": 195.51724137931035} ...]}}
  39. 39. response (same response, annotated): the first entry's "time" falls in the year 2000 and its "mean" is the average revenue
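The "time" values in the facet response above are epoch timestamps in milliseconds. A small sketch, using only the three entries shown on the slide, converts them to calendar years so the yearly breakdown becomes readable. The function name `yearly_means` is hypothetical.

```python
from datetime import datetime, timezone

# Entries copied from the response slide.
entries = [
    {"time": 956448895664, "mean": 23.0},
    {"time": 987984922557, "mean": 267.1034482758621},
    {"time": 1019520942098, "mean": 195.51724137931035},
]


def yearly_means(facet_entries: list) -> dict:
    """Map each entry's epoch-millisecond timestamp to its UTC year."""
    result = {}
    for entry in facet_entries:
        moment = datetime.fromtimestamp(entry["time"] / 1000, tz=timezone.utc)
        result[moment.year] = entry["mean"]
    return result


for year, mean in yearly_means(entries).items():
    print(year, round(mean, 2))
```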
  40. 40. Types of analytics. terms: unique value counts. range: statistics of a specific field for a set of range groups of another field. statistical: stats over a specific field. terms_stats: stats over a specific field for every unique field value. date_/histogram: a breakdown of statistics of a specific field over a ...
  41. 41. There’s much more. Fine control of how documents are treated: indexed, stored, text analysis, relations. Additional features: highlighting; suggest API (type ahead, auto-completion); percolator (reverse search); support of document relations (parent/child); extensive geo-location search & analytics; more...
  42. 42. Introduction to Couchbase
  43. 43. Couchbase Server: NoSQL Document Database
  44. 44. Couchbase Open Source Project. Leading NoSQL database project focused on distributed database technology and surrounding ecosystem. Supports both key-value and document-oriented use cases. All components are available under the Apache 2.0 Public License. Obtained as packaged software in both enterprise and community editions.
  45. 45. Couchbase Server. Easy Scalability: grow cluster without application changes, without downtime, with a single click. Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput. Always On 24x365: no downtime for software upgrades, hardware maintenance, etc. Flexible Data Model: JSON document model with no fixed schema.
  46. 46. Features in Couchbase Server 2.0: JSON support; indexing and querying; cross data center replication; incremental Map Reduce.
  47. 47. Additional Features: built-in clustering (all nodes equal); data replication with auto-failover; zero-downtime maintenance; built-in managed cache; append-only storage layer; online compaction; monitoring and admin API & UI; SDK for a variety of languages.
  48. 48. Couchbase Server 2.0 Architecture. Cluster Manager (Erlang/OTP): heartbeat, process monitor, global singleton supervisor and configuration manager on each node; rebalance orchestrator and node health monitor, one per cluster; vBucket state and replication manager; HTTP REST management API/Web UI (port 8091); Erlang port mapper (4369); distributed Erlang (21100 - 21199). Data Manager: storage interface; Couchbase EP Engine (11210, Memcapable 2.0); Moxi (11211, Memcapable 1.0); Memcached; New Persistence Layer. Query Engine: Query API (8092).
  49. 49. Cross data center replication: data flow (diagram: an App Server writes Doc 1 to a Couchbase Server Node, where it passes through the Managed Cache, the Disk Queue to Disk, the Replication Queue to other node, and the XDCR Queue to other cluster)
  50. 50. Cross Datacenter Replication (XDCR)
  51. 51. Couchbase plug-in for Elasticsearch
  52. 52. How does it work? Elasticsearch: Unidirectional Cross Data Center Replication
  53. 53. Elasticsearch Integration (via XDCR) (diagram: a Couchbase Cluster in the West Coast Data Center with Server 1, Server 2 and Server 3, each holding documents in RAM cache and on disk; an Elasticsearch Cluster with ES Server 1, ES Server 2 and ES Server 3, each running the Couchbase Transport Plugin)
  54. 54. Install the Couchbase Plug-in. Prerequisite: existing Couchbase and Elasticsearch clusters. Install the Elasticsearch Couchbase Transport Plug-in: bin/plugin -install couchbaselabs/elasticsearch-transport-couchbase/1.0.0-dp. Configure the plug-in: set a password; install the Couchbase index template. Restart Elasticsearch. Create an Elasticsearch index for your documents.
  55. 55. Configure Couchbase XDCR (step 1)
  56. 56. Configure Couchbase XDCR (step 2)
  57. 57. Documents are now indexed in Elasticsearch: document count increasing
  58. 58. Reference Architecture
  59. 59. Recommended Usage Pattern: 1. Elasticsearch query; 2. Elasticsearch result; 3. Couchbase multi-GET; 4. Couchbase result
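The four-step usage pattern above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' code: the search response is a hand-written sample in the shape Elasticsearch returns (`hits.hits[]._id`), and a plain dictionary stands in for the Couchbase bucket, so `multi_get` is a hypothetical placeholder for a real client's multi-get call.

```python
def ids_from_hits(search_response: dict) -> list:
    """Steps 1-2: take only the document IDs from the search result."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def multi_get(bucket: dict, keys: list) -> dict:
    """Steps 3-4: fetch the full documents by key (dict stands in for Couchbase)."""
    return {key: bucket[key] for key in keys if key in bucket}


# Hypothetical sample data for illustration only.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "companies", "_id": "1", "_score": 0.13},
            {"_index": "companies", "_id": "7", "_score": 0.09},
        ],
    }
}
sample_bucket = {
    "1": {"name": "elasticsearch", "category": "software"},
    "7": {"name": "couchbase", "category": "software"},
}

keys = ids_from_hits(sample_response)
documents = multi_get(sample_bucket, keys)
print(keys)
print(documents)
```

Using the search engine only for IDs and ranking keeps Couchbase as the source of truth for document content, so results served to users reflect the latest stored version.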
  60. 60. Common Couchbase Use Cases. Social Gaming: Couchbase stores player and game data; example customers include Zynga, Tapjoy, Ubisoft, Tencent. Mobile Apps: Couchbase stores user info and app content; example customers include Kobo, Playtika. Ad Targeting: Couchbase stores user information for fast access; example customers include AOL, Mediamind, Convertro. Session store: Couchbase Server as a key-value store; example customers include Concur, Sabre. User Profile Store: Couchbase Server as a key-value store; example customers include Tunewiki. High availability cache: Couchbase Server used as a cache tier replacement; example customers include Orbitz. Content & Metadata Store: Couchbase document store with Elasticsearch; example customers include McGraw Hill. 3rd party data aggregation: Couchbase stores social media and data feeds; example customers include Sambacloud.
  62. 62. Real-world example: Couchbase + Elasticsearch
  63. 63. Use Case: Content and Metadata Store. Types of Data: content metadata; content (articles, text); landing pages for website; digital content (eBooks, magazine, research material). Application Requirements: flexibility to store any kind of content; fast access to content metadata (most accessed objects) and content; full-text search across data set; scales horizontally as more content gets added to the system. Why NoSQL and Couchbase: fast access to metadata and content via object-managed cache; JSON provides schema flexibility to store all types of content and metadata; indexing and querying provides real-time analytics capabilities across dataset; integration with Elasticsearch for full-text search; ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows.
  64. 64. McGraw Hill Education Labs: Learning portal
  65. 65. Use Case: Content and metadata store. Building a self-adapting, interactive learning portal with Couchbase and Elasticsearch
  66. 66. The Problem. As learning moves online in great numbers, there is a growing need to build interactive learning environments that: scale to millions of learners; serve MHE as well as third-party content, including open content; support learning apps; self-adapt via usage data.
  67. 67. The Challenge. The backend is an Interactive Content Delivery Cloud that must: allow for elastic scaling under spike periods; catalog & deliver content from many sources; give consistent low latency for metadata and stats access; provide full-text search support for content discovery; offer tunable content ranking & recommendation functions. Experimented with a combination of: XML databases, SQL/MR engines, in-memory data grids, enterprise search servers.
  68. 68. Techniques Used. Document modeling. Metadata & content storage. View querying to support content browsing. Elasticsearch integration: content updated in near real time; search content summaries; relevancy boosted based on user preferences. Real-time content updates. Event logging for offline analysis.
  69. 69. Couchbase 2.0 + Elasticsearch. Store full-text articles as well as document metadata for image, video and text content in Couchbase. Combine user preference statistics with custom relevancy scoring to provide personalized search results. Log user behavior to calculate user preference statistics (e.g. video > text). Continuously accept updates from Couchbase with new content & stats.
  70. 70. Data Model. Content Metadata Bucket: stores content metadata for media objects and content for articles; includes tags, contributors, type information; includes pointer to the media. User Profiles Bucket: stores user view details per type; updated every time a user views a doc, with running count; to be used for customizing ES search results per user preference. Content Stats Bucket: stores content view details; updated every time a document is viewed; to be used for boosting ES search results based on popularity.
  71. 71. Calculating statistics via Couchbase Views: Top Contributors & Tags, driven by Incremental MapReduce Views
  72. 72. Tuning content ranking via Elasticsearch. Elasticsearch-driven, based on settings below: content popularity boost; user preference boost.
  73. 73. Analytics and Event Logging. {"_id": "4ae5be2df3122f06ba45b70753001841", "_rev": "1-0013b349ffc3afc700000000068000000", "$flags": 0, "$expiration": 0, "type": "access", "user": "chris@gmail.com", "resource": 379823, "timestamp": "2012-09-02T22:46:07Z"} {"_id": "4ae5be2df3122f06ba45b70753001842", "_rev": "1-0013b349ffc3afc700000000068000000", "$flags": 0, "$expiration": 0, "type": "create", "user": "chris.tse@gmail.com", "resource": 948177, "timestamp": "2012-09-02T22:48:32Z"} (annotations: What? Who? Which? When?) Store full event log for offline analysis. Stored on a separate analytics cluster: limits impact on OLTP; tuned differently. Keep an upper bound on data size via TTL (24 hrs).
  74. 74. User Preference Boost: use Elasticsearch filter boosting. {"filter": {"term": {"type": "video"}}, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER}
  75. 75. Document Popularity Boost: use Elasticsearch custom script to score documents. "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)"
  76. 76. Combined Algorithm in a Query. "filters": [ { "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER }, … image and text filters omitted … ], "score_mode": "total" } }, "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)" }
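The arithmetic behind the combined ranking above can be sketched in plain Python. The popularity formula is taken from the slide's script. How Elasticsearch chains the filter boost into `_score` before the script runs is an assumption here (a simple multiplication), and the function names and sample numbers are hypothetical.

```python
def preference_boost(doc_type: str, type_preferences: dict, preference_slider: float) -> float:
    """Filter boost: USER_<TYPE>_PREFERENCE * PREFERENCE_SLIDER for the matching type.

    Assumption: a document with no matching filter keeps a neutral boost of 1.0.
    """
    preference = type_preferences.get(doc_type)
    if preference is None:
        return 1.0
    return preference * preference_slider


def popularity_factor(popularity: float, avg_popularity: float, popularity_slider: float) -> float:
    """The slide's script term: ((popularity + 1) / AVG_POPULARITY) * POPULARITY_SLIDER."""
    return ((popularity + 1) / avg_popularity) * popularity_slider


def combined_score(base_score, doc_type, type_preferences, preference_slider,
                   popularity, avg_popularity, popularity_slider) -> float:
    """Assumed combination: text relevance * preference boost * popularity factor."""
    boosted = base_score * preference_boost(doc_type, type_preferences, preference_slider)
    return boosted * popularity_factor(popularity, avg_popularity, popularity_slider)


# Hypothetical user who prefers video (1.5) over text (0.8).
preferences = {"video": 1.5, "text": 0.8}
print(combined_score(2.0, "video", preferences, 1.0, 9, 5.0, 1.0))
print(combined_score(2.0, "text", preferences, 1.0, 9, 5.0, 1.0))
```

The `+ 1` in the popularity term keeps documents with zero views from scoring zero, and the two sliders let the application tune how strongly preference and popularity influence the final ranking.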
  77. 77. The Learning Portal. Designed and built as a collaboration between MHE Labs and Couchbase. Serves as proof-of-concept and testing harness for Couchbase + Elasticsearch integration. Available for download and further development as open source code: https://github.com/couchbaselabs/learningportal
  78. 78. Q & A
  79. 79. Thank you. dipti@couchbase.com, uri.boness@elasticsearch.com