To infinity and beyond

Elastic::Model is a new framework to store your Moose objects, which uses ElasticSearch as a NoSQL document store and flexible search engine.

It is designed to make small beginnings simple, but to scale easily to Big Data requirements without needing to rearchitect your application. No job too big or small!

This talk will introduce Elastic::Model, demonstrate how to develop a simple application, introduce some more advanced techniques, and discuss how it uses ElasticSearch to scale.

https://github.com/clintongormley/Elastic-Model

Transcript

  • 1. To infinity and beyond! A practical guide for Mooseherds (and other carers of livestock) @clintongormley #elasticsearch YAPC::EU 2012
  • 2. I have an idea for a killer app!
  • 3. Quick! Let's...
  • 4. Design our objects
  • 5. Flatten them into tables
  • 6. Normalize data
  • 7. Add indexes
  • 8. Add tables for many-to-one
  • 9. More indexes
  • 10. Need full text search?
  • 11. Copy data to search engine
  • 12. Keep the two in sync
  • 13. Get search results, pull objects from DB
  • 14. Success!
  • 15. Need to scale
  • 16. Buy a bigger box
  • 17. Tune indexes
  • 18. Add caching
  • 19. Fix caching bugs
  • 20. Master - Slave replication
  • 21. Buy SSDs
  • 22. Denormalize data
  • 23. Buy bigger boxes
  • 24. Shard your data (ie rewrite your application)
  • 25. Do you really need a relational DB?
  • 26. Do you really need a relational DB? faster horse?
  • 27. NoSQL advantages
  • 28. Document oriented
  • 29. ...just store your object
  • 30. Fast reads and writes
  • 31. Scale horizontally
  • 32. Recover from failure
  • 33. But...
  • 34. Different from RDBMS
  • 35. No transactions
  • 36. No joins
  • 37. Denormalized data
  • 38. Still need to add: indexes
  • 39. Still need to add: full text search
  • 40. elasticsearch
  • 41. Real time document store
  • 42. Powerful full text search (Near real time: < 1 second)
  • 43. Filters, geolocation...
  • 44. Distributed by design
  • 45. Fault tolerant
  • 46. Easy sharding
  • 47. Start small. Scale massively
  • 48. Why keep two datastores in sync?
  • 49. Just use elasticsearch
  • 50. with Elastic::Model
  • 51. Store and query Moose objects
  • 52. Exposes full power of elasticsearch
  • 53. and takes care of the housekeeping
  • 54. How?
  • 55. package MyApp::Post;
        use Moose;

        has title   => ( is => 'rw', isa => 'Str' );
        has content => ( is => 'rw', isa => 'Str' );
        has created => (
            is      => 'rw',
            isa     => 'DateTime',
            default => sub { DateTime->now },
        );
  • 56. package MyApp::Post;
        use Moose;

        has title   => ( is => 'rw', isa => 'Str' );
        has content => ( is => 'rw', isa => 'Str' );
        has created => (
            is      => 'rw',
            isa     => 'DateTime',
            default => sub { DateTime->now },
        );

        package MyApp::User;
        use Moose;

        has name  => ( is => 'rw', isa => 'Str' );
        has email => ( is => 'rw', isa => 'Str', required => 1 );
  • 57. package MyApp::Post;
        use Moose;

        has title   => ( is => 'rw', isa => 'Str' );
        has content => ( is => 'rw', isa => 'Str' );
        has created => (
            is      => 'rw',
            isa     => 'DateTime',
            default => sub { DateTime->now },
        );
        has user    => ( is => 'ro', isa => 'MyApp::User' );

        package MyApp::User;
        use Moose;

        has name  => ( is => 'rw', isa => 'Str' );
        has email => ( is => 'rw', isa => 'Str', required => 1 );
  • 60. package MyApp::Post;
        use Elastic::Doc;

        has title   => ( is => 'rw', isa => 'Str' );
        has content => ( is => 'rw', isa => 'Str' );
        has created => (
            is      => 'rw',
            isa     => 'DateTime',
            default => sub { DateTime->now },
        );
        has user    => ( is => 'ro', isa => 'MyApp::User' );

        package MyApp::User;
        use Elastic::Doc;

        has name  => ( is => 'rw', isa => 'Str' );
        has email => ( is => 'rw', isa => 'Str', required => 1 );
  • 61. Some definitions...
        elasticsearch
          * index: like a database
          * type: like a table
          * doc: like a row in a table
          * alias: like a symbolic link, points to one or more indices
        Elastic::Model
          * domain: an index or an alias, used for CRUD
          * namespace: maps type <=> class for all associated domains
          * model: connects your app to elasticsearch
  • 62. We need a Model
  • 63. package MyApp;
        use Elastic::Model;
  • 64. package MyApp;
        use Elastic::Model;

        has_namespace 'myapp' => {};
  • 65. package MyApp;
        use Elastic::Model;

        has_namespace 'myapp' => {
            user => 'MyApp::User',
            post => 'MyApp::Post',
        };
  • 66. package MyApp;
        use Elastic::Model;

        has_namespace 'myapp' => {
            user => 'MyApp::User',
            post => 'MyApp::Post',
        };
        # like table <=> class
  • 67. Using our Model
  • 68. use MyApp;
  • 69. use MyApp;
        my $model = MyApp->new;
  • 70. use MyApp;
        my $model = MyApp->new;

        To do anything useful, we need:

        my $namespace = $model->namespace('myapp');    # For index and alias management
        my $domain    = $model->domain('myapp');       # For document CRUD
        my $view      = $model->view;                  # For searching
  • 71. Namespace: Create an index
        my $namespace = $model->namespace('myapp');
        $namespace->index->create;

        * create index myapp
        * namespace:myapp => index:myapp
  • 72. Namespace: Delete an index
        my $namespace = $model->namespace('myapp');
        $namespace->index->delete;
  • 73. Namespace: Create an alias
        my $namespace = $model->namespace('myapp');
        $namespace->index('myapp_v1')->create;
        $namespace->alias->to('myapp_v1');

        * alias:myapp => index:myapp_v1
        * namespace:myapp => alias:myapp => index:myapp_v1
  • 74. Domain: Create a user
        my $domain = $model->domain('myapp');
        my $user   = $domain->new_doc(
            user => {
                name  => 'Clinton',
                email => 'clint@foo.com',
            }
        );
        $user->save;
  • 75. Domain: Create a user
        my $domain = $model->domain('myapp');
        my $user   = $domain->create(
            user => {
                name  => 'Clinton',
                email => 'clint@foo.com',
            }
        );
        $user->save;
  • 76. Domain: Create a user
        my $domain = $model->domain('myapp');
        my $user   = $domain->create(
            user => {
                name  => 'Clinton',
                email => 'clint@foo.com',
                id    => 1,
            }
        );
        say $user->id;      # 1
        say $user->type;    # user
  • 77. Domain: Create a post
        my $domain = $model->domain('myapp');
        my $post   = $domain->create(
            post => {
                id      => 2,
                title   => 'To infinity and beyond',
                content => 'Elastic::Model persists Moose '
                         . 'objects in elasticsearch',
                user    => $user,
            }
        );
  • 78. Domain: Retrieve a doc
        my $domain = $model->domain('myapp');
        my $post   = $domain->get( post => 2 );
        my $user   = $post->user;    # stub object
        say $user->id;               # still stub
        # 1
        say $user->name;             # full object
        # Clinton
  • 79. Domain: Update a doc
        my $domain = $model->domain('myapp');
        $post->title('Awesome blog post');
        say $post->has_changed;             # 1
        say $post->has_changed('title');    # 1
        say $post->old_value('title');      # To infinity and beyond
        $post->save;
  • 80. optimistic version control
  • 81. $version++ on every change
  • 82. 1: $post = $domain->get( post => 2 );
        2: $post = $domain->get( post => 2 );
        1: $post->title('Awesome blog post');
        2: $post->title('Brilliant blog post');
        1: $post->save;
        2: $post->save;    *** CONFLICT ERROR ***
  • 83. Dealing with conflicts
  • 84. Ignore them: $post->overwrite;
  • 85. on_conflict handler
  • 86. $post->save(
            on_conflict => sub {
                my ( $old, $new ) = @_;
                # do something
                # to resolve conflict
            }
        );
  • 87. $post->save(
            on_conflict => sub {
                my ( $old, $new ) = @_;
                my %changed = $old->old_values;
                $new->$_( $changed{$_} ) for keys %changed;
                $new->save;
                $post = $new;
            }
        );
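
The conflict on slides 80 to 87 can be imitated without a cluster. The following is a minimal sketch in plain Perl, with a hash standing in for the document store. `%store`, `get_doc` and `save_doc` are hypothetical names made up for this illustration; they are not part of Elastic::Model or elasticsearch.

```perl
use strict;
use warnings;

# Hypothetical in-memory store: id => { version => N, title => ... }
my %store = ( 2 => { version => 1, title => 'To infinity and beyond' } );

# Return a private copy of the stored doc, including the version we read.
sub get_doc {
    my ($id) = @_;
    return { id => $id, %{ $store{$id} } };
}

# Save only if nobody else has saved since we read, then bump the version.
sub save_doc {
    my ($doc) = @_;
    my $current = $store{ $doc->{id} };
    die "CONFLICT: have v$doc->{version}, store has v$current->{version}\n"
        if $doc->{version} != $current->{version};
    $doc->{version}++;
    $store{ $doc->{id} } = { version => $doc->{version}, title => $doc->{title} };
    return $doc;
}

# Two processes read the same version of the post...
my $first  = get_doc(2);
my $second = get_doc(2);
$first->{title}  = 'Awesome blog post';
$second->{title} = 'Brilliant blog post';

# ...the first save wins, the second one is rejected.
save_doc($first);
my $ok    = eval { save_doc($second); 1 };
my $error = $ok ? '' : $@;
print "second save failed: $error" unless $ok;
```

The second writer still holds version 1 while the store has moved on to version 2, which is the situation an `on_conflict` handler or `overwrite` has to deal with.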
  • 88. Query docs: View
        $results = $model->view->search;
  • 89. Views are reusable
        $posts    = $model->view( type => 'post' );
        $featured = $posts->filterb( featured => 1 );
  • 90. Single domain
        $view = $domain->view;
  • 91. Multi domain
        $view = $model->view;
  • 92. Multi domain
        $view = $model->view;
        $view = $model->view->domain( 'foo', 'bar' );
  • 93. Multi type
        $view = $model->view;
        $view = $model->view->type( 'user', 'post' );
  • 94. my $view = $domain
            ->view
            ->type('post')
            ->filterb(
                created => { gte => '2012-08-01' },
                user    => $user,
            )
            ->queryb( title => 'awesome' )
            ->sort('timestamp')
            ->size(20)
            ->highlight('content')
            ->explain(1);

        See "Terms of Endearment" on speakerdeck.com
  • 95. First result
        $results = $view->first
  • 96. $size results
        $results = $view->search;
  • 97. Unbounded results
        $results = $view->scroll
        $results = $view->scan
  • 98. Results are iterators
        $result = $results->next
        $result = $results->prev
        $result = $results->first
        $result = $results->last
        $result = $results->shift
  • 99. Result is: metadata + object
        say $result->object->title
  • 100. my $results = $view->search;
         say "Total hits: " . $results->total;
         say "Took: " . $results->took . "ms";
         while ( my $result = $results->next ) {
             say "Title:" . $result->object->title;
             say "Snippets:" . join "\n", $result->highlight('content');
             say "Score:" . $result->score;
             say "Debug:" . $result->explain;
         }
  • 101. Just the object
         $object = $results->next_object
  • 102. Just objects
         $results->as_objects;
         $object = $results->next;
  • 103. Enough dull API!
  • 104. Not just a doc store
  • 105. *** POWERFUL *** search engine
  • 106. BUT...
  • 107. You can only get out what you put in
  • 108. Prepare your data
  • 109. Tell elasticsearch:
         * what fields you have
         * what data they contain
         * how to index them
  • 110. "Mapping" (like a database schema)
  • 111. Moose gives us introspection (takes the pain away)
  • 112. Examples: analyzed full text
         has name => (
             is  => 'rw',
             isa => 'Str',
         );
         Mapping:
         name: { type: "string" }
  • 113. Examples: analyze and stem text
         has name => (
             is       => 'rw',
             isa      => 'Str',
             analyzer => 'english',
         );
         Mapping:
         name: { type: "string", analyzer: "english" }
  • 114. Examples: analyze and stem text
         has name => (
             is       => 'rw',
             isa      => 'Str',
             analyzer => 'norwegian',
         );
         Mapping:
         name: { type: "string", analyzer: "norwegian" }
  • 115. Examples: store the exact value
         has tag => (
             is    => 'rw',
             isa   => 'Str',
             index => 'not_analyzed',
         );
         Mapping:
         tag: { type: "string", index: "not_analyzed" }
  • 116. Examples: complex data
         use MooseX::Types::Moose qw(Str);
         use MooseX::Types::Structured qw(Dict Optional);
         has name => (
             is  => 'rw',
             isa => Dict[
                 first  => Str,
                 last   => Str,
                 middle => Optional[Str],
             ],
         );
         Mapping:
         name: {
           type: "object",
           properties: {
             first:  { type: "string" },
             last:   { type: "string" },
             middle: { type: "string" }
           }
         }
  • 117. Examples: Elastic::Doc classes
         has user => (
             is  => 'rw',
             isa => 'MyApp::User',
         );
         Mapping:
         user: {
           type: "object",
           properties: {
             name:  { type: "string" },
             email: { type: "string" },
             uid:   { type: "object",
                      properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
         }
  • 118. Examples: Elastic::Doc classes (Denormalised data!)
         has user => (
             is  => 'rw',
             isa => 'MyApp::User',
         );
         Mapping:
         user: {
           type: "object",
           properties: {
             name:  { type: "string" },
             email: { type: "string" },
             uid:   { type: "object",
                      properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
         }
  • 119. Examples: Elastic::Doc classes
         has user => (
             is            => 'rw',
             isa           => 'MyApp::User',
             exclude_attrs => ['email'],
         );
         Mapping:
         user: {
           type: "object",
           properties: {
             name:  { type: "string" },
             email: { type: "string" },
             uid:   { type: "object",
                      properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
         }
  • 120. Examples: Elastic::Doc classes
         has user => (
             is            => 'rw',
             isa           => 'MyApp::User',
             include_attrs => ['email'],
         );
         Mapping:
         user: {
           type: "object",
           properties: {
             name:  { type: "string" },
             email: { type: "string" },
             uid:   { type: "object",
                      properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
         }
  • 121. Examples: Elastic::Doc classes
         has user => (
             is            => 'rw',
             isa           => 'MyApp::User',
             include_attrs => [],
         );
         Mapping:
         user: {
           type: "object",
           properties: {
             name:  { type: "string" },
             email: { type: "string" },
             uid:   { type: "object",
                      properties: { index: {...}, type: {...}, id: {...}, routing: {...} } }
           }
         }
  • 122. Same data. Different purpose
         has title => (
             is  => 'rw',
             isa => 'Str',
         );
         Mapping:
         title: { type: "string" }

         title => 'An AMAZING talk!'
         is indexed as
         title: ['amazing', 'talk']

         What do you sort on? 'amazing' or 'talk'?
  • 123. Multi-fields: index the same data in different ways
  • 124. Same data. Different purpose
         has title => (
             is  => 'rw',
             isa => 'Str',
         );
  • 125. Same data. Different purpose
         has title => (
             is    => 'rw',
             isa   => 'Str',
             multi => {
                 untouched => { index => 'not_analyzed' }
             }
         );
  • 126. Same data. Different purpose
         has title => (
             is    => 'rw',
             isa   => 'Str',
             multi => {
                 untouched => { index => 'not_analyzed' }
             }
         );

         title => 'An AMAZING talk!'
         is indexed as
         title: {
           title:     ['amazing', 'talk'],
           untouched: "An AMAZING talk!"
         }
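
To make the multi-field idea on slides 122 to 126 concrete, here is a small sketch in plain Perl that indexes each title twice: once as analyzed tokens for searching and once untouched for sorting. The three-word stopword list and the `analyzed` helper are assumptions for illustration only; the real analysis happens inside elasticsearch.

```perl
use strict;
use warnings;

my %stopword = map { $_ => 1 } qw(a an the);    # tiny assumed stopword list

# Main field: lowercased words minus stopwords, for full text search.
sub analyzed {
    my ($text) = @_;
    return grep { length($_) && !$stopword{$_} } split /\W+/, lc $text;
}

# Index each title twice: analyzed tokens plus the untouched original.
my @docs;
for my $title ( 'Zen and the moose', 'An AMAZING talk!' ) {
    push @docs, { title => [ analyzed($title) ], untouched => $title };
}

# Sorting on the untouched value is unambiguous: one value per doc.
my @sorted = sort { $a->{untouched} cmp $b->{untouched} } @docs;
print "$_->{untouched} => [@{ $_->{title} }]\n" for @sorted;
```

The analyzed field answers "which docs mention amazing?", while the untouched field gives a single exact value per doc to sort on.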
  • 127. Let's TWEAK stuff!
  • 128. How about AUTO-COMPLETE?
  • 129. Don't use wildcards: slow & inefficient
  • 130. Prepare your data: "Analysis"
  • 131. With edge-ngrams
  • 132. Analysis process
         "Édith Piaf"
           -> standard tokenizer ->       ["Édith", "Piaf"]
           -> lowercase token filter ->   ["édith", "piaf"]
           -> ascii-folding token filter -> ["edith", "piaf"]
           -> edge-ngrams token filter -> ["e", "ed", "edi", "edit", "edith",
                                           "p", "pi", "pia", "piaf"]
         Perfect for partial matching!
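
The analysis chain on slide 132 can be approximated in a few lines of plain Perl. This is only a sketch of the idea: the regex split is a rough stand-in for the standard tokenizer, accent stripping via `Unicode::Normalize` stands in for ascii-folding, and the function names `analyze_words` and `edge_ngrams` are made up here.

```perl
use strict;
use warnings;
use utf8;                          # the source below contains a literal "É"
use Unicode::Normalize qw(NFD);    # core module

# Tokenize on non-alphanumerics, lowercase, then strip accents.
sub analyze_words {
    my ($text) = @_;
    my @tokens = grep { length($_) } split /[^\p{L}\p{N}]+/, $text;
    @tokens = map { lc $_ } @tokens;
    for (@tokens) {
        $_ = NFD($_);    # decompose "é" into "e" plus a combining accent
        s/\p{Mn}//g;     # drop the combining marks
    }
    return @tokens;
}

# Emit every prefix of each token, from $min up to $max characters.
sub edge_ngrams {
    my ( $min, $max, @tokens ) = @_;
    my @grams;
    for my $token (@tokens) {
        my $longest = length($token) < $max ? length($token) : $max;
        push @grams, map { substr $token, 0, $_ } $min .. $longest;
    }
    return @grams;
}

my @grams = edge_ngrams( 1, 15, analyze_words('Édith Piaf') );
print join( ', ', @grams ), "\n";
# e, ed, edi, edit, edith, p, pi, pia, piaf
```

Every prefix becomes an indexed term, so a partial input such as "edi" is an exact term match rather than a wildcard scan.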
  • 133. Add a custom analyzer to our Model
         package MyApp;
         use Elastic::Model;

         has_namespace 'myapp' => {
             user => 'MyApp::User',
             post => 'MyApp::Post',
         };
  • 135. Add a custom analyzer to our Model
         package MyApp;
         use Elastic::Model;

         has_namespace 'myapp' => {
             user => 'MyApp::User',
             post => 'MyApp::Post',
         };

         has_filter 'my_edge_ngrams' => {
             type     => 'edge_ngram',
             min_gram => 1,
             max_gram => 15
         };
  • 136. Add a custom analyzer to our Model
         package MyApp;
         use Elastic::Model;

         has_namespace 'myapp' => {
             user => 'MyApp::User',
             post => 'MyApp::Post',
         };

         has_filter 'my_edge_ngrams' => {
             type     => 'edge_ngram',
             min_gram => 1,
             max_gram => 15
         };

         has_analyzer 'autocomplete' => {
             tokenizer => 'standard',
             filter    => [ 'lowercase', 'asciifolding', 'my_edge_ngrams' ]
         };
  • 137. Add analyzer to our Doc class
         has title => (
             is    => 'rw',
             isa   => 'Str',
             multi => {
                 untouched => { index => 'not_analyzed' }
             }
         );
  • 138. Add analyzer to our Doc class
         has title => (
             is    => 'rw',
             isa   => 'Str',
             multi => {
                 untouched    => { index    => 'not_analyzed' },
                 autocomplete => { analyzer => 'autocomplete' }
             }
         );
  • 139. Add analyzer to our Doc class
         has title => (
             is    => 'rw',
             isa   => 'Str',
             multi => {
                 untouched    => { index    => 'not_analyzed' },
                 autocomplete => { analyzer => 'autocomplete' }
             }
         );

         title => 'An AMAZING talk!'
         is indexed as
         title: {
           title:     ['amazing', 'talk'],
           untouched: "An AMAZING talk!"
         }
  • 140. Add analyzer to our Doc class
         has title => (
             is    => 'rw',
             isa   => 'Str',
             multi => {
                 untouched    => { index    => 'not_analyzed' },
                 autocomplete => { analyzer => 'autocomplete' }
             }
         );

         title => 'An AMAZING talk!'
         is indexed as
         title: {
           title:        ['amazing', 'talk'],
           untouched:    "An AMAZING talk!",
           autocomplete: [
             'a', 'am', 'ama', 'amaz', 'amazi', 'amazin', 'amazing',
             't', 'ta', 'tal', 'talk'
           ]
         }
  • 141. Apply your changes
  • 142. Update the mapping AND the data
  • 143. Reindex
  • 144. $new = $namespace->index('myapp_v2');
         $new->reindex('myapp');
         $namespace->alias->to('myapp_v2');
         $namespace->index('myapp_v1')->delete;
  • 145. Autocomplete query
  • 146. $view = $domain->view->queryb();
  • 147. $view = $domain->view->queryb(
             "title.autocomplete" => "amazing ta",
         );
  • 148. $view = $domain->view->queryb(
             "title.autocomplete" => "amazing ta",
         );
         Matches anything starting with "a" or "t". BOOH!
  • 149. $view = $domain->view->queryb(
             "title.autocomplete" => {
                 -text => {
                     query => "amazing ta",
                 }
             }
         );
  • 150. $view = $domain->view->queryb(
             "title.autocomplete" => {
                 -text => {
                     query    => "amazing ta",
                     operator => "or"
                 }
             }
         );
         "a OR am OR ama OR amaz OR ... OR t OR ta"
  • 151. $view = $domain->view->queryb(
             "title.autocomplete" => {
                 -text => {
                     query    => "amazing ta",
                     operator => "and"
                 }
             }
         );
  • 152. $view = $domain->view->queryb(
             "title.autocomplete" => {
                 -text => {
                     query    => "amazing ta",
                     operator => "and"
                 }
             }
         );
         Complete words should be more relevant
  • 153. $view = $domain->view->queryb(
             "title.autocomplete" => {
                 -text => {
                     query    => "amazing ta",
                     operator => "and"
                 }
             },
             "title" => "amazing ta",
         );
  • 154. $view = $domain->view->queryb([
             "title.autocomplete" => {
                 -text => {
                     query    => "amazing ta",
                     operator => "and"
                 }
             },
             "title" => "amazing ta",
         ]);
  • 155. Done!
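
Why the default "or" operator is too loose, and why "and" fixes it, can be seen in a toy model of the autocomplete field. This is a sketch in plain Perl, not the elasticsearch query engine: `prefixes` and `search` are hypothetical helpers, the three titles are made-up sample data, and scoring is ignored entirely.

```perl
use strict;
use warnings;

# Lowercase words, then every prefix of each word (min 1, max 15 chars).
sub prefixes {
    my ($text) = @_;
    my @grams;
    for my $word ( grep { length($_) } split /\W+/, lc $text ) {
        my $longest = length($word) < 15 ? length($word) : 15;
        push @grams, map { substr $word, 0, $_ } 1 .. $longest;
    }
    return @grams;
}

# Sample titles; each gets a lookup table of its indexed prefixes.
my @titles = ( 'An AMAZING talk!', 'Tuning indexes', 'A brief history' );
my %indexed;
for my $title (@titles) {
    $indexed{$title} = { map { $_ => 1 } prefixes($title) };
}

# Return the titles matching the query under the given operator.
sub search {
    my ( $query, $operator ) = @_;
    my @terms = prefixes($query);    # the query is analyzed the same way
    my @hits;
    for my $title (@titles) {
        my $matched = grep { $indexed{$title}{$_} } @terms;
        push @hits, $title
            if $operator eq 'and' ? $matched == @terms : $matched > 0;
    }
    return @hits;
}

my @or_hits  = search( 'amazing ta', 'or' );
my @and_hits = search( 'amazing ta', 'and' );
print "or:  @or_hits\n";
print "and: @and_hits\n";
```

With "or", a single shared prefix such as "a" or "t" is enough, so all three titles match; with "and", only the title containing every prefix of "amazing" and "ta" survives.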
  • 156. Scaling
  • 157. To infinity and beyond!
  • 158. Basic unit of scale: the shard
  • 159. An index has 1-or-more primary shards
  • 160. Each primary has 0-or-more replica shards
  • 161. Primaries scale total data
  • 162. Replicas are for failover and to scale queries
  • 163. Default: 5 primary shards with 1 replica each
  • 164. 5 * (1 + 1) = 10 shards
  • 165. 10 shards = 1 .. 10 servers
  • 166. Can change number of replicas
  • 167. CANNOT change number of primaries
  • 168. So how do we scale?
  • 169. Kagillion shards!
  • 170. Umm, No.
  • 171. Be a grower not a shower
  • 172. At query time:
  • 173. 1 index x 10 shards == 10 indices x 1 shard
  • 174. Two patterns:
  • 175. Time based indices Index-per-user
  • 176. Time based indices Index-per-user
  • 177. * one index per month
         * write to alias: logs_current
         * query alias: logs
  • 178. $ns = $model->namespace('logs');
         $ns->index('logs_2012_08')->create;
         $ns->alias('logs_current')->to('logs_2012_08');
         $ns->alias->to('logs_2012_08');
         $model->domain('logs_current')->create( log => \%data );
         $model->domain('logs')->view->search;
  • 179. New month, new index
         $ns->index('logs_2012_09')->create;
         $ns->alias('logs_current')->to('logs_2012_09');
         $ns->alias->add('logs_2012_09');
  • 180. Add alias for 2012
         $ns->alias('logs_2012')->to( 'logs_2012_08', 'logs_2012_09', ... );
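
The bookkeeping behind the time-based pattern on slides 177 to 180 is just a table from alias names to index names. The sketch below models it with a plain Perl hash; `%alias`, `alias_to` and `alias_add` are hypothetical stand-ins that mirror the intent of the `to` and `add` calls on the slides, not their implementation.

```perl
use strict;
use warnings;

# Hypothetical alias table: alias name => list of index names.
my %alias;

# Repoint an alias at exactly the given indices.
sub alias_to { my ( $name, @indices ) = @_; $alias{$name} = [@indices] }

# Extend an alias with further indices.
sub alias_add { my ( $name, @indices ) = @_; push @{ $alias{$name} }, @indices }

# August: writes and searches both go to the first monthly index.
alias_to( logs_current => 'logs_2012_08' );
alias_to( logs         => 'logs_2012_08' );

# September: move the write alias, widen the search alias.
alias_to( logs_current => 'logs_2012_09' );
alias_add( logs        => 'logs_2012_09' );

print "write to: @{ $alias{logs_current} }\n";
print "search:   @{ $alias{logs} }\n";
```

The application keeps writing to `logs_current` and searching `logs`; only the alias table changes each month.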
  • 181. Time based indices Index-per-user
  • 182. Users have their own data
  • 183. Most searches are per-user
  • 184. Ideal: Index-per-user
  • 185. Expensive
  • 186. Most users have little data
  • 187. Some have LOTS!
  • 188. Start with one index for all users
  • 189. Use aliases to pretend
  • 190. ...aliases with... filters and routing
  • 191. $ns->alias('bloggs_plumbers')->to(
             myapp_v1 => {
                 filterb => { client_id => 'bloggs_plumbers' },
                 routing => 'bloggs_plumbers'
             }
         );
  • 192. Routing determines: which shard stores your data
  • 193. Routing == 'bloggs_plumbers': all user's data on same shard
  • 194. CRUD -> hit one shard. Queries -> hit one shard
  • 195. SUPER efficient!
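
The effect of routing can be sketched as "hash the routing value, take it modulo the number of primary shards". The Perl below is an illustration under that assumption: `toy_hash` is a made-up string hash (elasticsearch uses its own hash function), and `shard_for` is a hypothetical helper.

```perl
use strict;
use warnings;

my $primaries = 5;    # fixed when the index is created

# Toy string hash (an assumption; elasticsearch uses its own hash function).
sub toy_hash {
    my ($string) = @_;
    my $hash = 0;
    $hash = ( $hash * 31 + ord($_) ) % 2**31 for split //, $string;
    return $hash;
}

# The shard is derived from the routing value and the primary count.
sub shard_for {
    my ( $routing, $count ) = @_;
    return toy_hash($routing) % $count;
}

# Without custom routing the doc ID acts as routing value: docs spread out.
my %by_id = map { $_ => shard_for( $_, $primaries ) } 1 .. 20;

# With routing 'bloggs_plumbers' every doc of that client lands together.
my %by_client = map { $_ => shard_for( 'bloggs_plumbers', $primaries ) } 1 .. 20;

my %id_shards     = map { $_ => 1 } values %by_id;
my %client_shards = map { $_ => 1 } values %by_client;
printf "routed by ID:     %d shard(s) used\n", scalar keys %id_shards;
printf "routed by client: %d shard(s) used\n", scalar keys %client_shards;
```

Because the shard depends on the primary count, changing that count would send existing routing values to different shards, which is one way to see why the number of primaries cannot be changed after the index is created.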
  • 196. New client joins...
  • 197. ...called "Twitter"
  • 198. 6 months later...
  • 199. $new = $ns->index('twitter_v1');
         $new->reindex('twitter');
         $ns->alias('twitter')->to('twitter_v1');
         $ns->alias->add('twitter_v1');
  • 200. What more do you need?
  • 201. Go forth and HERD!