Using Elasticsearch and Couchbase Together to Build Large Scale Applications
 

Couchbase Server 2.0 allows for full-text search integration. In this webinar we examine how you can integrate your Couchbase Server 2.0 cluster with an Elasticsearch Cluster to provide enhanced querying capabilities and build large scale applications.

    Presentation Transcript

    • Using Elasticsearch and Couchbase Together to Build Large Scale Apps
        Uri Boness, Founder, Elasticsearch
        Dipti Borkar, Director, Products, Couchbase
    • Introduction to Elasticsearch
    • What is Elasticsearch?
        • Open source: Apache 2 license
        • Multi-tenant, realtime and distributed search & analytics engine
        • Backed by Elasticsearch (the company)
        • Proven technology in production: over 2 million downloads
    • What can Elasticsearch do?
        • Unstructured search: find all companies in the “search” market
        • Structured search: find all companies founded since 2000
        • Analytics: find the average annual revenue of all companies
        • Combine all: find the average annual revenue of all companies founded since 2000 within the “search” market
    • (near) real-time!
    • Distributed & multi-tenant
        • A node is a single Elasticsearch instance
        • Multiple nodes can form a cluster
        • A cluster can manage multiple indices
        • A cluster is agile & self-managing, continuously ensuring that the distributed characteristics of all indices are maintained and that all nodes in the cluster are efficiently & effectively utilized
    • The Index
    • What’s in an index?
        • An identified collection of documents
        • Built & designed for small & large scales
            • data volumes: data can be split and distributed between shards
            • loads & HA: each shard can have zero or more replicas
    • starting a node [diagram: node_1]
    • creating our first index [diagram: node_1]
        curl -XPUT localhost:9200/companies -d '{
          "settings" : {
            "index" : {
              "number_of_shards" : 2,
              "number_of_replicas" : 1
            }
          }
        }'
    • the two shards are allocated [diagram: node_1 with shards 0 and 1]
    • starting a second node [diagram: node_1 and node_2; shards 0 and 1]
    • shards are relocating [diagram: node_1 and node_2; shards 0 and 1]
    • replicas are allocated [diagram: node_1 and node_2; shards 0 and 1, replicas 1 and 0]
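
The index-creation slides can be sketched in Python as follows. This is a minimal illustration, not part of the original deck: the helper names `index_settings` and `total_shard_copies` are hypothetical, and the only logic assumed is that every primary shard gets `number_of_replicas` extra copies.

```python
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Build the request body used when creating an index (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(settings: dict) -> int:
    """Count primaries plus replicas that the cluster has to place on nodes."""
    index = settings["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


body = index_settings(number_of_shards=2, number_of_replicas=1)
print(json.dumps(body))          # the JSON passed to curl with -d
print(total_shard_copies(body))  # 2 primaries + 2 replicas
```

With two shards and one replica there are four shard copies in total, which matches the final diagram where each of the two nodes holds two of them.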
    • Indexing Data
    • the data
        • Documents are typically JSON formatted
        curl -XPUT localhost:9200/companies/company/1 -d '{
          "id" : "elasticsearch",
          "name" : "elasticsearch",
          "website" : "http://www.elasticsearch.com",
          "category" : "software",
          "overview" : "distributed search & analytics engine",
          "founded_year" : 2012,
          "location" : {
            "city" : "Amsterdam",
            "country_code" : "NL",
            "geo" : {
              "lat" : 52.370176,
              "lon" : 4.895008
            }
          }
        }'
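
The same indexing call can be prepared from Python using only the standard library. This is a hedged sketch: `build_index_request` is a hypothetical helper, the host and path are the ones on the slide, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_index_request(host, index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes one JSON document."""
    url = f"http://{host}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


company = {
    "id": "elasticsearch",
    "name": "elasticsearch",
    "category": "software",
    "overview": "distributed search & analytics engine",
    "founded_year": 2012,
    "location": {"city": "Amsterdam", "country_code": "NL"},
}

request = build_index_request("localhost:9200", "companies", "company", "1", company)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request), against a running node.
```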
    • sending req. to one of the nodes, which resolves the target shard [diagram: client; node_1, node_2 and node_3 holding shards 0 and 1 and their replicas]
    • resolve shard & index to primary
    • replicate to replicas
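
The "resolve the target shard" step can be illustrated with a hash of the document id taken modulo the number of primary shards. This is a sketch under stated assumptions: `zlib.crc32` is only a stand-in for whatever hash function Elasticsearch uses internally, and `resolve_shard` is a hypothetical name.

```python
import zlib


def resolve_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Map a document id to a primary shard number.

    crc32 is a stand-in hash chosen for this sketch; the point is only that
    the same id always lands on the same shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", resolve_shard(doc_id, 2))
```

Because the mapping depends on the shard count, the number of primary shards is fixed when the index is created, while the number of replicas can vary.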
    • Searching
    • unstructured search
        • Using an extensive & powerful Query DSL
        curl -XGET localhost:9200/companies/_search -d '{
          "query" : {
            "match" : { "overview" : "search" }
          }
        }'
        (search for the term “search” in the “overview” field)
    • structured search
        • narrows the “searchable” document space
        curl -XGET localhost:9200/companies/company/_search -d '{
          "query" : {
            "filtered" : {
              "query" : {
                "match" : { "overview" : "search" }
              },
              "filter" : {
                "range" : { "founded_year" : { "gte" : 2000 } }
              }
            }
          }
        }'
        (only search companies that were founded since year 2000)
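
The structured-search body can be built programmatically, as in the sketch below. The helper names are hypothetical. `filtered_query` reproduces the form shown on the slide; `bool_query` is included because Elasticsearch 5.0 and later removed the `filtered` query in favor of a `bool` query with a `filter` clause.

```python
import json


def filtered_query(field, text, range_field, gte):
    """Query body in the form used on the slide (the 'filtered' query)."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"range": {range_field: {"gte": gte}}},
            }
        }
    }


def bool_query(field, text, range_field, gte):
    """Equivalent body for later releases: 'bool' with a 'filter' clause."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},
                "filter": {"range": {range_field: {"gte": gte}}},
            }
        }
    }


print(json.dumps(filtered_query("overview", "search", "founded_year", 2000), indent=2))
```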
    • returned hits
        {
          ...
          "hits": [
            {
              "_index": "companies",
              "_type": "company",
              "_id": "1",
              "_score": 0.13424811,
              "_source": {
                "id": "elasticsearch",
                "name": "elasticsearch",
                "website": "http://www.elasticsearch.com",
                "category": "software",
                "founded_year": 2012,
                "overview": "distributed search & analytics engine",
                "location": {
                  "city": "Amsterdam",
                  "country_code": "NL",
                  "geo": {
                    "lat": 52.370176,
                    "lon": 4.895008
                  }
                }
              }
            }
          ]
        }}
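
A client usually reduces each returned hit to the few fields it needs. The sketch below assumes the hit layout shown on the slide (`_id`, `_score`, `_source`); `summarize_hits` is a hypothetical helper and the sample data is abbreviated from the slide.

```python
def summarize_hits(hits):
    """Return (id, score, name) for each hit in a list of search hits."""
    return [(hit["_id"], hit["_score"], hit["_source"]["name"]) for hit in hits]


sample_hits = [
    {
        "_index": "companies",
        "_type": "company",
        "_id": "1",
        "_score": 0.13424811,
        "_source": {
            "id": "elasticsearch",
            "name": "elasticsearch",
            "founded_year": 2012,
            "overview": "distributed search & analytics engine",
        },
    }
]

for doc_id, score, name in summarize_hits(sample_hits):
    print(doc_id, score, name)
```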
    • Query DSL
        • Queries (unstructured)
            - term queries
            - boolean queries
            - phrase (proximity) queries
            - fuzzy/prefix/regexp/wildcards
            - more...
        • Filters (structured)
            - term (exact match)
            - range
            - boolean
            - geo_* (e.g. geo_distance)
    • Analytics (a.k.a. facets)
    • Analytics (facets)
        • Slice & dice your data
        • Compute aggregations over field values
        • Across any index field/s
        • All in (near) realtime
    • used as navigation aid
    • or analytics dashboards
    • Elasticsearch is often used purely for analytics (without incorporating free text search)
    • Example
        • Find the average revenue of all companies since 2000
        curl -XGET localhost:9200/companies/revenues/_search -d '{
          "query" : { "match_all" : {} },
          "facets" : {
            "revenue_stats" : {
              "date_histogram" : {
                "key_field" : "year",
                "value_field" : "value",
                "interval" : "year"
              }
            }
          }
        }'
        (return a yearly breakdown of stats over companies' revenues)
    • response
        "facets": {
          "revenue_stats": {
            "_type": "date_histogram",
            "entries": [
              { "time": 956448895664, "mean": 23.0 },
              { "time": 987984922557, "mean": 267.1034482758621 },
              { "time": 1019520942098, "mean": 195.51724137931035 }
              ...
            ]
          }
        }
        (annotations: the first entry is year 2000; "mean" is the avg revenue)
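
The `time` values in the facet response are epoch timestamps in milliseconds. A small sketch, assuming the entry layout shown on the slide, converts them into (year, mean) pairs; `yearly_means` is a hypothetical helper.

```python
from datetime import datetime, timezone


def yearly_means(entries):
    """Convert date_histogram facet entries into (year, mean) pairs."""
    result = []
    for entry in entries:
        # "time" is milliseconds since the Unix epoch, interpreted here as UTC.
        moment = datetime.fromtimestamp(entry["time"] / 1000, tz=timezone.utc)
        result.append((moment.year, entry["mean"]))
    return result


entries = [
    {"time": 956448895664, "mean": 23.0},
    {"time": 987984922557, "mean": 267.1034482758621},
    {"time": 1019520942098, "mean": 195.51724137931035},
]

for year, mean in yearly_means(entries):
    print(year, round(mean, 2))
```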
    • Types of analytics
        • terms: unique value counts
        • range: statistics of a specific field for a set of range groups of another field
        • statistical: stats over a specific field
        • terms_stats: stats over a specific field for every unique field value
        • date_/histogram: a breakdown of statistics of a specific field over a ...
    • There’s much more
        • Fine control of how documents are treated: indexed, stored, text analysis, relations
        • Additional features
            - highlighting
            - suggest API (type ahead, auto-completion)
            - percolator (reverse search)
            - support of document relations (parent/child)
            - extensive geo-location search & analytics
            - more....
    • Introduction to Couchbase
    • Couchbase Server: NoSQL Document Database
    • Couchbase Open Source Project
        • Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
        • Supports both key-value and document-oriented use cases
        • All components are available under the Apache 2.0 Public License
        • Obtained as packaged software in both enterprise and community editions
    • Couchbase Server
        • Easy Scalability: grow cluster without application changes, without downtime, with a single click
        • Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
        • Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
        • Flexible Data Model: JSON document model with no fixed schema
    • Features in Couchbase Server 2.0
        • JSON support
        • Indexing and Querying
        • Cross data center replication
        • Incremental Map Reduce
    • Additional Features
        • Built-in clustering: all nodes equal
        • Data replication with auto-failover
        • Zero-downtime maintenance
        • Built-in managed cache
        • Append-only storage layer
        • Online compaction
        • Monitoring and admin API & UI
        • SDK for a variety of languages
    • Couchbase Server 2.0 Architecture
        • Cluster Manager (Erlang/OTP)
            - on each node: heartbeat, process monitor, global singleton supervisor, configuration manager
            - one per cluster: rebalance orchestrator, node health monitor, vBucket state and replication manager
            - HTTP REST management API/Web UI: HTTP 8091
            - Erlang port mapper: 4369
            - Distributed Erlang: 21100 - 21199
        • Data Manager
            - Query API: 8092 (Query Engine)
            - Memcapable 2.0: 11210 (Couchbase EP Engine)
            - Memcapable 1.0: 11211 (Moxi)
            - Memcached, storage interface, New Persistence Layer
    • Cross data center replication: Data flow [diagram: App Server; Couchbase Server Node with Managed Cache, Disk Queue, Disk, Replication Queue (to other node) and XDCR Queue (to other cluster); Doc 1 shown at each stage]
    • Cross Datacenter Replication (XDCR)
    • Couchbase plug-in for Elasticsearch
    • How does it work?
        • ElasticSearch: Unidirectional Cross Data Center Replication
    • Elasticsearch Integration (via XDCR) [diagram: Couchbase Cluster in the West Coast Data Center (Servers 1 to 3, each with a RAM cache and disk holding documents) feeding an Elastic Search Cluster (ES Servers 1 to 3, each running the Couchbase Transport Plugin)]
    • Install the Couchbase Plug-In
        • Pre-requisite
            - Existing Couchbase and ElasticSearch Clusters
        • Install the ElasticSearch Couchbase Transport Plug-in
            - bin/plugin -install couchbaselabs/elasticsearch-transport-couchbase/1.0.0-dp
        • Configure the Plug-in
            - Set a password
            - Install the Couchbase Index Template
        • Restart ElasticSearch
        • Create an ElasticSearch index for your documents
    • Configure Couchbase XDCR (step 1)
    • Configure Couchbase XDCR (step 2)
    • Documents are now indexed in Elasticsearch [screenshot: Document Count Increasing]
    • Reference Architecture
    • Recommended Usage Pattern
        1. ElasticSearch Query
        2. ElasticSearch Result
        3. Couchbase Multi-GET
        4. Couchbase Result
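
The four steps of the recommended usage pattern can be sketched as below. Everything here is a stand-in chosen for illustration: a plain dict plays the role of the Couchbase bucket, the search response is a hand-written sample, and the sketch assumes that the `_id` of each Elasticsearch hit is the Couchbase document key. No real client library is used.

```python
def ids_from_search(es_response):
    """Steps 1-2: take the document ids out of an Elasticsearch result."""
    return [hit["_id"] for hit in es_response["hits"]["hits"]]


def multi_get(bucket, keys):
    """Steps 3-4: fetch the full documents by key (dict stand-in for a bucket)."""
    return {key: bucket[key] for key in keys if key in bucket}


def search_then_fetch(es_response, bucket):
    """Search in Elasticsearch, then read the documents from Couchbase."""
    return multi_get(bucket, ids_from_search(es_response))


# Hypothetical sample data for the sketch.
bucket = {
    "company::1": {"name": "elasticsearch", "founded_year": 2012},
    "company::2": {"name": "couchbase", "category": "software"},
}
es_response = {"hits": {"hits": [{"_id": "company::1", "_score": 0.13}]}}

print(search_then_fetch(es_response, bucket))
```

Reading the documents back from Couchbase rather than from the search index means the application works with the database's current copy of each document, while Elasticsearch is used only to find the keys.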
    • Common Couchbase Use Cases
        • Social Gaming
            - Couchbase stores player and game data
            - Example customers include: Zynga, Tapjoy, Ubisoft, Tencent
        • Mobile Apps
            - Couchbase stores user info and app content
            - Example customers include: Kobo, Playtika
        • Ad Targeting
            - Couchbase stores user information for fast access
            - Example customers include: AOL, Mediamind, Convertro
        • Session store
            - Couchbase Server as a key-value store
            - Example customers include: Concur, Sabre
        • User Profile Store
            - Couchbase Server as a key-value store
            - Example customers include: Tunewiki
        • High availability cache
            - Couchbase Server used as a cache tier replacement
            - Example customers include: Orbitz
        • Content & Metadata Store
            - Couchbase document store with Elastic Search
            - Example customers include: McGraw Hill
        • 3rd party data aggregation
            - Couchbase stores social media and data feeds
            - Example customers include: Sambacloud
    • Real-world example: Couchbase + Elasticsearch
    • Use Case: Content and Metadata Store
        • Types of Data
            - Content metadata
            - Content: articles, text
            - Landing pages for website
            - Digital content: eBooks, magazine, research material
        • Application Requirements
            - Flexibility to store any kind of content
            - Fast access to content metadata (most accessed objects) and content
            - Full-text search across data set
            - Scales horizontally as more content gets added to the system
        • Why NoSQL and Couchbase
            - Fast access to metadata and content via object-managed cache
            - JSON provides schema flexibility to store all types of content and metadata
            - Indexing and querying provides real-time analytics capabilities across dataset
            - Integration with ElasticSearch for full-text search
            - Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
    • McGraw Hill Education Labs: Learning portal
    • Use Case: Content and metadata store
        • Building a self-adapting, interactive learning portal with Couchbase and Elasticsearch
    • The Problem
        • As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
            - Scale to millions of learners
            - Serve MHE as well as third-party content, including open content
            - Support learning apps
            - Self-adapt via usage data
    • The Challenge
        • Backend is an Interactive Content Delivery Cloud that must:
            - Allow for elastic scaling under spike periods
            - Be able to catalog & deliver content from many sources
            - Provide consistent low latency for metadata and stats access
            - Support full-text search for content discovery
            - Offer tunable content ranking & recommendation functions
        • Experimented with a combination of: XML Databases, SQL/MR Engines, In-memory Data Grids, Enterprise Search Servers
    • Techniques Used
        • Document Modeling
        • Metadata & Content Storage
        • View Querying to support Content Browsing
        • Elasticsearch Integration
            - Content Updated in near Real-Time
            - Search Content Summaries
            - Relevancy boosted based on User Preferences
        • Real-Time Content Updates
        • Event Logging for offline analysis
    • Couchbase 2.0 + Elasticsearch [diagram with steps numbered 1 to 4]
        • Store full-text articles as well as document metadata for image, video and text content in Couchbase
        • Combine user preferences statistics with custom relevancy scoring to provide personalized search results
        • Logs user behavior to calculate user preference statistics (e.g. video > text)
        • Continuously accept updates from Couchbase with new content & stats
    • Data Model
        • Content Metadata Bucket
            - Stores content metadata for media objects and content for articles
            - Includes tags, contributors, type information
            - Includes pointer to the media
        • User Profiles Bucket
            - Stores user view details per type
            - Updated every time a user views a doc with running count
            - To be used for customizing ES search results per user preference
        • Content Stats Bucket
            - Stores content view details
            - Updated every time a document is viewed
            - To be used for boosting ES search results based on popularity
    • Calculating statistics via Couchbase
        • Couchbase Views
        • Top Contributors & Tags driven by Incremental MapReduce Views!
    • Tuning content ranking via ElasticSearch
        • ElasticSearch-driven, based on settings below!
        • Content popularity boost!
        • User preference boost!
    • Analytics and Event Logging
        • Store full event log for offline analysis
        • Stored on a separate analytics cluster
            - Limit impact on OLTP
            - Tuned differently
        • Keep an upper-bound on data size via TTL (24 hrs)
        {
          "_id": "4ae5be2df3122f06ba45b70753001841",
          "_rev": "1-0013b349ffc3afc700000000068000000",
          "$flags": 0,
          "$expiration": 0,
          "type": "access",
          "user": "chris@gmail.com",
          "resource": "379823",
          "timestamp": "2012-09-02T22:46:07Z"
        }
        {
          "_id": "4ae5be2df3122f06ba45b70753001842",
          "_rev": "1-0013b349ffc3afc700000000068000000",
          "$flags": 0,
          "$expiration": 0,
          "type": "create",
          "user": "chris.tse@gmail.com",
          "resource": "948177",
          "timestamp": "2012-09-02T22:48:32Z"
        }
        (annotations: "type" = What?, "user" = Who?, "resource" = Which?, "timestamp" = When?)
    • User Preference Boost
        • Use Elasticsearch filter boosting
        {
          "filter": { "term": { "type": "video" } },
          "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER
        }
    • Document Popularity Boost
        • Use Elasticsearch custom script to score documents
        "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)"
    • Combined Algorithm in a Query
        "filters": [
          { "filter": { "term": { "type": "video" } },
            "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER },
          … image and texts filters omitted …
        ],
        "score_mode": "total" } },
        "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)" }
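
The arithmetic behind the combined ranking can be worked through in plain Python. This sketch is an interpretation of the slide, not the portal's actual code: it assumes the preference boost of the matching type filter multiplies the text-relevance score, that a document matching no filter keeps a boost of 1.0, and that the popularity script is applied to the boosted score. `combined_score` and its parameter names are hypothetical.

```python
def combined_score(base_score, doc_type, popularity,
                   type_boosts, avg_popularity, popularity_slider):
    """Apply the user-preference boost, then the popularity script.

    type_boosts maps a content type to USER_<TYPE>_PREFERENCE * PREFERENCE_SLIDER.
    """
    # A document has a single type, so at most one filter matches and the
    # "total" of matching boosts is that one boost (assumed 1.0 if none match).
    preference_boost = type_boosts.get(doc_type, 1.0)
    boosted = base_score * preference_boost
    # Mirrors: _score * (((popularity + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)
    return boosted * (((popularity + 1) / avg_popularity) * popularity_slider)


boosts = {"video": 1.5, "image": 1.0, "text": 0.5}
print(combined_score(2.0, "video", popularity=9, type_boosts=boosts,
                     avg_popularity=5.0, popularity_slider=1.0))
```

In this example a video with base score 2.0 is boosted to 3.0 by the user's preference, then doubled because its popularity (9 + 1) is twice the average of 5.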
    • The Learning Portal
        • Designed and built as a collaboration between MHE Labs and Couchbase
        • Serves as proof-of-concept and testing harness for Couchbase + Elasticsearch integration
        • Available for download and further development as open source code
        https://github.com/couchbaselabs/learningportal
    • Q & A
    • Thank you
        dipti@couchbase.com
        uri.boness@elasticsearch.com