How ElasticSearch lives in my DevOps life
  • Using LogStash::Outputs::STDOUT with `debug => true` for debugging.
  • Schema free, but define a schema via /_mapping or template.json for better performance (see the sketch below).
  • Demos: http://demo.kibana.org and http://demo.logstash.net
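A minimal sketch of defining a mapping up front with the /_mapping API, reusing the twitter/tweet index from the curl examples later in the deck (the not_analyzed choice is an assumption for illustration):

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '{
        "tweet" : {
            "properties" : {
                "user"    : { "type" : "string", "index" : "not_analyzed" },
                "message" : { "type" : "string" }
            }
        }
    }'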

How ElasticSearch lives in my DevOps life: Presentation Transcript

  • ElasticSearch for DevOps
  • What’s ElasticSearch? • “flexible and powerful open source, distributed real-time search and analytics engine for the cloud” • http://www.elasticsearch.org/
  • What’s ElasticSearch? • “flexible and powerful open source, distributed real-time search and analytics engine for the cloud” • JSON-oriented; • RESTful API; • Schema free.

        MySQL               ElasticSearch
        database            index
        table               type
        column              field
        defined data type   auto-detected
  • What’s ElasticSearch? • “flexible and powerful open source, distributed real-time search and analytics engine for the cloud” • Master nodes & data nodes; • Automatic management of replicas and shards; • Asynchronous transport between nodes.
  • What’s ElasticSearch? • “flexible and powerful open source, distributed real-time search and analytics engine for the cloud” • Refreshes every 1 second by default, so new documents become searchable near-real-time.
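If near-real-time is not needed (e.g. while bulk-loading logs), the refresh interval can be relaxed per index; a minimal sketch using the update-settings API on the twitter index from the examples below:

    $ curl -XPUT 'http://localhost:9200/twitter/_settings' -d '{
        "index" : { "refresh_interval" : "5s" }
    }'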
  • What’s ElasticSearch? • “flexible and powerful open source, distributed real-time search and analytics engine for the cloud” • Built on Apache Lucene. • Also has facets, just like Solr.
  • What’s ElasticSearch? • “flexible and powerful open source, distributed real-time search and analytics engine for the cloud” • Give all nodes one cluster name and they auto-discover each other via unicast/multicast ping or EC2 keys. • No ZooKeeper needed.
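A quick way to verify that the nodes found each other is the cluster health API:

    $ curl -XGET 'http://localhost:9200/_cluster/health?pretty=1'
    # reports status (green/yellow/red), number_of_nodes, and shard counts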
  • Howto Curl • Index

        $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
            "user"      : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message"   : "trying out Elastic Search"
        }'
        {"ok":true,"_index":"twitter","_type":"tweet","_id":"1","_version":1}
  • Howto Curl • Get

        $ curl -XGET 'http://localhost:9200/twitter/tweet/1'
        {
            "_index" : "twitter",
            "_type" : "tweet",
            "_id" : "1",
            "_source" : {
                "user"      : "kimchy",
                "post_date" : "2009-11-15T14:12:12",
                "message"   : "trying out Elastic Search"
            }
        }
  • Howto Curl • Query

        $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty=1&size=1' -d '{
            "query"  : { "term" : { "user" : "kimchy" } },
            "fields" : ["message"]
        }'
  • Howto Curl • Query
    • term => matches one exact term as stored in the index (the input is not analyzed)
    • match => analyzes the input, then matches (full text)
    • prefix => matches a field prefix (not analyzed)
    • range => { from, to }
    • regexp => { .* }
    • query_string => { this AND that OR thus }
    • must/must_not => { query }
    • should => [ { query }, { } ]
    • bool => { must, must_not, should, … }
    A combined bool example follows below.
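A minimal sketch combining several of these in one bool query, reusing the twitter/tweet data from the slides above:

    $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty=1' -d '{
        "query" : {
            "bool" : {
                "must"   : { "term" : { "user" : "kimchy" } },
                "should" : [ { "prefix" : { "message" : "try" } } ]
            }
        }
    }'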
  • Howto Curl • Filter

        $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty=1&size=1' -d '{
            "query"  : { "match_all" : {} },
            "filter" : { "term" : { "user" : "kimchy" } }
        }'

    Much faster, because filters are cacheable and do not calculate _score.
  • Howto Curl • Filter
    • and => [ { filter }, { filter }, … ]
    • not => { filter }
    • or => [ { filter }, { filter }, … ]
    • script => { "script" : "doc['field'].value > 10" }
    • the others mirror the query DSL
    A combined and/not example follows below.
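A minimal sketch composing and/not filters around a query; the second user value ("bob") is made up for illustration:

    $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty=1' -d '{
        "query"  : { "match_all" : {} },
        "filter" : {
            "and" : [
                { "prefix" : { "user" : "k" } },
                { "not" : { "term" : { "user" : "bob" } } }
            ]
        }
    }'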
  • Howto Curl • Facets

        $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty=1&size=0' -d '{
            "query"  : { "match_all" : {} },
            "filter" : { "prefix" : { "user" : "k" } },
            "facets" : {
                "usergroup" : { "terms" : { "field" : "user" } }
            }
        }'
  • Howto Curl • Facets
    • terms => returns [ { "term" : "kimchy", "count" : 20 }, … ]
    • range <= [ { "from" : 10, "to" : 20 }, … ]
    • histogram <= { "field" : "user", "interval" : 10 }
    • statistical <= { "field" : "reqtime" } => { "min" : …, "max" : …, "avg" : …, "count" : … }
    A statistical facet example follows below.
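A minimal sketch of a statistical facet; the numeric field name reqtime is taken from the list above and assumed to exist in the mapping:

    $ curl -XPOST 'http://localhost:9200/twitter/tweet/_search?pretty=1&size=0' -d '{
        "query"  : { "match_all" : {} },
        "facets" : {
            "reqtime_stats" : { "statistical" : { "field" : "reqtime" } }
        }
    }'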
  • Howto Perl – ElasticSearch.pm

        use ElasticSearch;
        my $es = ElasticSearch->new(
            servers      => 'search.foo.com:9200',  # default '127.0.0.1:9200'
            transport    => 'http'       # default
                          | 'httplite'   # 30% faster, future default
                          | 'httptiny'   # ~1% faster still
                          | 'curl'
                          | 'aehttp'
                          | 'aecurl'
                          | 'thrift',    # generated code too slow
            max_requests => 10_000,      # default 10_000
            trace_calls  => 'log_file',
            no_refresh   => 0 | 1,
        );
  • Howto Perl – ElasticSearch.pm

        use ElasticSearch;
        my $es = ElasticSearch->new(
            servers      => 'search.foo.com:9200',
            transport    => 'httptiny',
            max_requests => 10_000,
            trace_calls  => 'log_file',
            no_refresh   => 0 | 1,
        );

    • Fetches the node list from the /_cluster API, starting from $servers;
    • Randomly switches to another node after $max_requests requests.
  • Howto Perl – ElasticSearch.pm

        $es->index(
            index => 'twitter',
            type  => 'tweet',
            id    => 1,
            data  => {
                user      => 'kimchy',
                post_date => '2009-11-15T14:12:12',
                message   => 'trying out Elastic Search',
            },
        );
  • Howto Perl – ElasticSearch.pm

        $es->search(
            facets => {
                wow_facet => {
                    query        => { text => { content => 'wow' } },
                    facet_filter => { term => { status => 'active' } },
                },
            },
        );
  • Howto Perl – ElasticSearch.pm

        $es->search(
            facets => {
                wow_facet => {
                    queryb        => { content => 'wow' },
                    facet_filterb => { status  => 'active' },
                },
            },
        );

    ElasticSearch::SearchBuilder: more Perlish, SQL::Abstract-like. But I don’t like ==!
  • Howto Perl – Elastic::Model • Tie a Moose object to ElasticSearch

        package MyApp;
        use Elastic::Model;
        has_namespace 'myapp' => { user => 'MyApp::User' };
        no Elastic::Model;
        1;
  • Howto Perl – Elastic::Model

        package MyApp::User;
        use Elastic::Doc;
        use DateTime;

        has 'name'    => ( is => 'rw', isa => 'Str' );
        has 'email'   => ( is => 'rw', isa => 'Str' );
        has 'created' => ( is => 'ro', isa => 'DateTime',
                           default => sub { DateTime->now } );

        no Elastic::Doc;
        1;
  • Howto Perl – Elastic::Model (the same class written as plain Moose, for comparison)

        package MyApp::User;
        use Moose;
        use DateTime;

        has 'name'    => ( is => 'rw', isa => 'Str' );
        has 'email'   => ( is => 'rw', isa => 'Str' );
        has 'created' => ( is => 'ro', isa => 'DateTime',
                           default => sub { DateTime->now } );

        no Moose;
        1;
  • Howto Perl – Elastic::Model
    • Connect to the cluster:

        my $es    = ElasticSearch->new( servers => 'localhost:9200' );
        my $model = MyApp->new( es => $es );

    • Create the database and table:

        $model->namespace('myapp')->index->create();

    • CRUD:

        my $domain = $model->domain('myapp');
        $domain->new_doc(…) / $domain->get(…);

    • Search:

        my $search  = $domain->view->type('user')->query(…)->filterb(…);
        my $results = $search->search;
        say "Total results found: " . $results->total;
        while ( my $doc = $results->next_doc ) {
            say $doc->name;
        }
  • ES for Dev – GitHub
    • 20 TB of data;
    • 1,300,000,000 files;
    • 130,000,000,000 lines of code.
    • 26 Elasticsearch storage nodes (each with a 2 TB SSD), managed by Puppet.
    • 1 replica + 20 shards.
    • https://github.com/blog/1381-a-whole-new-code-search
    • https://github.com/blog/1397-recent-code-search-outages
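Shard and replica counts like these are set when an index is created; a minimal sketch (the index name codesearch is hypothetical):

    $ curl -XPUT 'http://localhost:9200/codesearch' -d '{
        "settings" : {
            "number_of_shards"   : 20,
            "number_of_replicas" : 1
        }
    }'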
  • ES for Dev – Git::Search
    • Thank you, Mateu Hunter!
    • https://github.com/mateu/Git-Search

        cpanm --installdeps .
        cp git-search.conf git-search-local.conf
        # edit git-search-local.conf
        perl -Ilib bin/insert_docs.pl
        plackup -Ilib
        curl http://localhost:5000/text_you_want
  • ES for Perler – Metacpan
    • search.cpan.org => metacpan.org
    • Uses ElasticSearch as the API backend;
    • Uses Catalyst to build the website frontend.
    • Learn the API: https://github.com/CPAN-API/cpan-api/wiki/API-docs
    • Have a try: http://explorer.metacpan.org/
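Since the API backend is ElasticSearch, it can be queried with the same search DSL as above; a minimal sketch against the public v0 endpoint (the exact host, path, and field names follow the API docs linked above, so treat them as an assumption):

    $ curl -XPOST 'http://api.metacpan.org/v0/release/_search' -d '{
        "query" : { "term" : { "distribution" : "Moose" } },
        "size"  : 1
    }'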
  • ES for Perler – index-weekly
    • A Perl script (55 lines) that indexes devopsweekly into ElasticSearch.
    • https://github.com/alcy/index-weekly
    • We can do the same thing for perlweekly, right?
  • ES for logging - Logstash • “logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.” • http://logstash.net/
  • ES for logging - Logstash • “logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.” • A log is a stream, not a file! • An event is not necessarily a single line!
  • ES for logging - Logstash • “logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.” • file/*mq/stdin/tcp/udp/websocket…(34 input plugins now)
  • ES for logging - Logstash • “logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.” • date/geoip/grok/multiline/mutate…(29 filter plugins now)
  • ES for logging - Logstash • “logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.”
    • transfer: stdout/*mq/tcp/udp/file/websocket…
    • alert: ganglia/nagios/opentsdb/graphite/irc/xmpp/email…
    • store: elasticsearch/mongodb/riak
    • (47 output plugins now)
  • ES for logging - Logstash
  • ES for logging - Logstash

        input {
            redis {
                host      => "127.0.0.1"
                type      => "redis-input"
                data_type => "list"
                key       => "logstash"
            }
        }
        filter {
            grok {
                type    => "redis-input"
                pattern => "%{COMBINEDAPACHELOG}"
            }
        }
        output {
            elasticsearch {
                host => "127.0.0.1"
            }
        }
  • ES for logging - Logstash • Grok (regexp capture): %{IP:client:string} %{NUMBER:bytes:int} • More default patterns in the source: https://github.com/logstash/logstash/tree/master/patterns
  • ES for logging - Logstash For example: 10.2.21.130 - - [08/Apr/2013:11:13:40 +0800] "GET /mediawiki/load.php HTTP/1.1" 304 - "http://som.d.xiaonei.com/mediawiki/index.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10"
  • ES for logging - Logstash

        {
            "@source" : "file://chenryn-Lenovo/home/chenryn/test.txt",
            "@tags"   : [],
            "@fields" : {
                "clientip"    : ["10.2.21.130"],
                "ident"       : ["-"],
                "auth"        : ["-"],
                "timestamp"   : ["08/Apr/2013:11:13:40 +0800"],
                "verb"        : ["GET"],
                "request"     : ["/mediawiki/load.php"],
                "httpversion" : ["1.1"],
                "response"    : ["304"],
                "referrer"    : ["\"http://som.d.xiaonei.com/mediawiki/index.php\""],
                "agent"       : ["\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10\""]
            },
            "@timestamp"   : "2013-04-08T03:34:37.959Z",
            "@source_host" : "chenryn-Lenovo",
            "@source_path" : "/home/chenryn/test.txt",
            "@message"     : "10.2.21.130 - - [08/Apr/2013:11:13:40 +0800] \"GET /mediawiki/load.php HTTP/1.1\" 304 - \"http://som.d.xiaonei.com/mediawiki/index.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10\"",
            "@type"        : "apache"
        }
  • ES for logging - Logstash "properties" : { "@fields" : { "dynamic" : "true", "properties" : { "client" : { "type" : "string", "index" : "not_analyzed“ }, "size" : { "type" : "long", "index" : "not_analyzed“ }, "status" : { "type" : "string", "index" : "not_analyzed“ }, "upstreamtime" : { "type" : "double“ }, } },
  • ES for logging - Kibana
  • ES for logging – Message::Passing • A port of Logstash to Perl 5 • 17 CPAN modules
  • ES for logging – Message::Passing

        use Message::Passing::DSL;
        run_message_server message_chain {
            output elasticsearch => (
                class                 => 'ElasticSearch',
                elasticsearch_servers => ['127.0.0.1:9200'],
            );
            filter regexp => (
                class     => 'Regexp',
                format    => ':nginxaccesslog',
                capture   => [qw( ts status remotehost url oh responsetime upstreamtime bytes )],
                output_to => 'elasticsearch',
            );
            filter tologstash => (
                class     => 'ToLogstash',
                output_to => 'regexp',
            );
            input file => (
                class     => 'FileTail',
                output_to => 'tologstash',
            );
        };
  • Message::Passing vs Logstash (100_000 lines of nginx access log)

        logstash::output::elasticsearch_http (default)                                           4m30.013s
        logstash::output::elasticsearch_http (flush_size => 1000)                                3m41.657s
        message::passing::filter::regexp (v0.01, calls $self->_regex->regexp() every line)       1m22.519s
        message::passing::filter::regexp (v0.04, caches $self->_regex->regexp() in $self->_re)   0m44.606s
  • D::P::Elasticsearch & D::P::Ajax (Dancer::Plugin::ElasticSearch & Dancer::Plugin::Ajax)
  • Build a website using Perl Dancer

        get '/' => require_role SOM => sub {
            my $indices = elsearch->cluster_state->{routing_table}->{indices};
            template 'psa/map', {
                providers   => [ sort keys %$default_provider ],
                datasources => [ grep { /^$index_prefix/ && s/$index_prefix// } keys %$indices ],
                inputfrom   => strftime( "%FT%T", localtime( time() - 864000 ) ),
                inputto     => strftime( "%FT%T", localtime() ),
            };
        };

        ajax '/api/area' => sub {
            my $param = from_json( request->body );
            my $index = $index_prefix . $param->{'datasource'};
            my $limit = $param->{'limit'} || 50;
            my $from  = $param->{'from'}  || 'now-10d';
            my $to    = $param->{'to'}    || 'now';
            my $res   = pct_terms( $index, $limit, $from, $to );
            return to_json($res);
        };
  • The same handler, built up slide by slide with the modules it needs:

        use Dancer ':syntax';
        use Dancer::Plugin::Auth::Extensible;   # provides require_role
        use Dancer::Plugin::Ajax;               # provides the ajax keyword
        use Dancer::Plugin::ElasticSearch;      # provides the elsearch keyword
  • use Dancer::Plugin::ElasticSearch;

        sub area_terms {
            my ( $index, $level, $limit, $from, $to ) = @_;
            my $data = elsearch->search(
                index  => $index,
                type   => $type,
                facets => {
                    area => {
                        facet_filter => {
                            and => [
                                { range         => { date     => { from => $from, to => $to } } },
                                { numeric_range => { timeCost => { gte  => $level } } },
                            ],
                        },
                        terms => {
                            field => "fromArea",
                            size  => $limit,
                        },
                    },
                },
            );
            return $data->{facets}->{area}->{terms};
        }
  • ES for monitor – oculus (Etsy Kale)
    • Kale detects anomalous metrics (Skyline) and finds other metrics that look similar (Oculus).
    • http://codeascraft.com/2013/06/11/introducing-kale/
    • https://github.com/etsy/skyline
    • https://github.com/etsy/oculus
  • ES for monitor – oculus (Etsy Kale)
    • Imports monitoring data from Redis/Ganglia into ElasticSearch.
    • Uses native scripts, registered in elasticsearch.yml, to calculate time-series distance:

        script.native:
            oculus_euclidian.type: com.etsy.oculus.tsscorers.EuclidianScriptFactory
            oculus_dtw.type: com.etsy.oculus.tsscorers.DTWScriptFactory
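Once registered, a native script can be invoked from the search API; a minimal sketch (the metrics index name and the params are made up for illustration; Oculus' real queries live in its repo):

    $ curl -XPOST 'http://localhost:9200/metrics/_search?pretty=1' -d '{
        "query" : {
            "custom_score" : {
                "query"  : { "match_all" : {} },
                "lang"   : "native",
                "script" : "oculus_euclidian",
                "params" : { "target" : [1, 2, 3] }
            }
        }
    }'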
  • ES for monitor – oculus (Etsy Kale) • https://speakerdeck.com/astanway/bring-the-noise-continuously-deploying-under-a-hailstorm-of-metrics
  • VBox example

        apt-get install -y git cpanminus virtualbox
        cpanm Rex
        git clone https://github.com/chenryn/esdevops
        cd esdevops
        rex init --name esdevops