Perl & Elasticsearch:
Jumping on the
bandwagon.
Me Dean Hamstead
dean@bytefoundry.com.au
Primary Usage:
Pretty graphs
generated live,
from logs
In most cases, you will be
asked to feed logs into an
Elasticsearch database.
Then make dashboards
with charts and graphs.
At the heart of Elasticsearch is
Apache Lucene
Elasticsearch uses Lucene as its
text indexer.
What it adds is an ability to scale
horizontally with relative ease.
It also adds a comprehensive
RESTful JSON interface.
Should I use Elasticsearch?
De-normalized data?
Don’t need transactions?
Willing to fight with Java Runtime
Environment?
Maybe.
Need lots of data types?
Join queries?
Referential integrity?
100’s GB data only?
Access control?
Probably not.
Terminology
Roughly equivalent terms...
MySQL                    Elasticsearch
Database                 Index
Table                    Type
Row                      Document
Column                   Field
Schema                   Mapping/Templates
Index                    Everything is indexed
SQL                      Query DSL
SELECT * FROM table …    GET http://…
UPDATE table SET …       PUT http://…
The ELK Stack
A “Stack” with a memorable acronym? Management will love it!
Elasticsearch
The actual database
software. It’s written in
Java, which explains many
of its quirks.
Logstash
A log tailer in Java. Its
performance is appalling.
Don’t waste any time on it.
Kibana
This is a Web frontend to
Elasticsearch, from searches to
graphs and dashboards.
It’s node.js and js heavy.
Use Rsyslog instead of
Logstash - IMO it’s pointless to
write logs to file then slurp
them back in.
Amazingly performant and
flexible. Ostensibly much
better than Syslog-ng.
Stay sane by using RainerScript
for config, eliminating all
legacy style syslog config.
Old versions OK on local
machines, but “Syslog servers”
should run the latest 8.x
If you’re looking for more of an “all
in one” solution, you might find
graylog to be a good fit.
It can use elasticsearch under the
hood to power its searches.
Give it a go, let me know how things
work out.
Elasticsearch
Basic Cluster
Data nodes store your data
(Eligible) Master nodes
maintain a map of where
data is.
Types of Elasticsearch nodes
Role             node.master   node.data
Eligible master  true          false
Data             false         true
Query            false         false
Dev-only         true          true
Also, Tribe nodes are a thing.
Comprehensive Elasticsearch Cluster
Interesting properties of Elasticsearch
A wildcard can be used in the index part of a query
This feature is a key part of using Elasticsearch effectively
Aliases are used to reference one or more indexes
Multiple changes to aliases can (and should) be grouped into one REST command -
which Elasticsearch executes in an Atomic fashion
A template explicitly defines the mapping (schema) of data for yet-to-be-created indexes.
A wildcard pattern in the template is matched against the index name when an insert
references an index which does not exist. The index is then created from the template
Templates also include other index properties
Such as aliases that a new index should automatically be made a part of
An Index can be closed without deleting it
It becomes unusable until it is opened again. However it is out of memory and sitting on
disk ready to go
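As a rough sketch of the atomic alias change mentioned above, the request body is just a list of actions. The alias and index names here are made up for illustration, and the script only builds and prints the structure rather than talking to a cluster.

```perl
use strict;
use warnings;
use JSON::PP;

# Build the body for one atomic alias update: detach $alias from
# $old_index and attach it to $new_index in the same request.
# (All names passed in below are hypothetical.)
sub alias_swap_body {
    my ($alias, $old_index, $new_index) = @_;
    return {
        actions => [
            { remove => { index => $old_index, alias => $alias } },
            { add    => { index => $new_index, alias => $alias } },
        ],
    };
}

my $alias_body = alias_swap_body('logs-current', 'logs-2016.01.01', 'logs-2016.01.02');

# Show the JSON that would go over the wire.
print JSON::PP->new->canonical->pretty->encode($alias_body);
```

With a live cluster, the same structure should be usable as the body of the client's indices->update_aliases call, so both actions are applied together or not at all.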
Schemaless,
NoSQL?
Elasticsearch queries are
made with JSON in RESTful
http/s. So it’s not SQL.
If no index exists, it will be
created on data insertion. If
no template is defined,
Elasticsearch will guess at the
mapping.
Turn this off, always define a
template for every index.
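A minimal sketch of what "always define a template" might look like as a Perl data structure. The index prefix, type name, alias name and field list are assumptions for illustration, and the layout (a "template" pattern key, a "text" field type) assumes a 5.x era cluster. Setting dynamic to strict is one way to stop Elasticsearch guessing at fields you did not declare.

```perl
use strict;
use warnings;
use JSON::PP;

# Build an index template body for date-stamped indexes such as
# "logs-2016.01.02". Everything named here is hypothetical.
sub log_template_body {
    my ($prefix) = @_;
    return {
        template => "$prefix-*",               # pattern matched against new index names
        settings => {
            number_of_shards   => 1,
            number_of_replicas => 1,
        },
        aliases  => { "$prefix-all" => {} },   # new indexes join this alias automatically
        mappings => {
            event => {
                dynamic    => 'strict',        # reject fields that are not declared
                properties => {
                    '@timestamp' => { type => 'date' },
                    ipv4         => { type => 'ip' },
                    srcport      => { type => 'integer' },
                    message      => { type => 'text' },
                },
            },
        },
    };
}

my $template = log_template_body('logs');
print JSON::PP->new->canonical->pretty->encode($template);
```

With a live cluster, this body would be registered once through the client's indices->put_template call, before any data is inserted.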
Tips for server hardware selection & OS configuration
● 30GB of RAM for each Elasticsearch instance (beyond this the JVM slows down)
● +25% RAM for OS. 48GB total is a good number
● Use RAID0 (striping) or no RAID on disks. Elasticsearch will ensure data is preserved via
replication
● Spinning disks have yet to be a bottleneck for me. Scale out rather than up. YMMV
● Turn off Transparent Huge Pages - generally a good idea on any and all servers
● Configure Elasticsearch’s JVM to use Hugepages directly
● By default, Linux IO is tuned to run as poorly as possible (even set these on your laptop/desktop)
○ echo 1024 > /sys/block/sda/queue/nr_requests (maybe more, benchmark to taste)
○ blockdev --setra 16384 /dev/sda
○ Use XFS with mount options like: rw,nobarrier,logbufs=8,inode64,logbsize=256k (XFS rocks)
○ Don’t use partitions, just format the disk as is (mkfs -t xfs /dev/sdb). XFS will automatically
pick the perfect block alignment
○ echo 0 > /sys/block/sda/queue/add_random (exclude the disk as a source of entropy)
● In iptables, it’s generally a good idea to disable connection tracking on the service ports (assuming
you have no outbound rules). This saves on CPU time and avoids filling the connection state table
● Use the same JVM on all nodes. Either Oracle Java or OpenJDK are fine, pick one and don’t mix
Tips for Tuning Elasticsearch
● Elasticsearch default settings are for a read heavy load
● There are lots and lots of settings, & lots and lots of blogs talking about how people have tuned their
clusters.
● Blogs can be very helpful to find which combination of settings will be right for you
● Be careful with anything referencing Elasticsearch before 2.0, ignore anything before 1.0. Things
have changed too much
● Note above every setting in your config file a small blurb about what it does and why you have set
that setting. This will help you remember “why on earth did I think that was a good setting??”
● The Elasticsearch official documentation is very very good. Take the time to read what each setting
does before you attempt to change it (and check that the setting still exists in the version you are running)
● Increase settings by small amounts and observe if performance improves
● Having a setting too high or too low can both reduce performance - you’re trying to find the sweet
spot
● More replicas can help read heavy loads if you have more nodes for them to run on, more shards
can too. However, shards cannot be changed after an index is created, replicas you can change at
any time
● More indexes plus more nodes can help write heavy loads
● Don’t run queries against data nodes
Elasticsearch lets you scale
horizontally, so you have to actually
scale your work load horizontally…
but without overwhelming your
cluster.
Achieving peak performance in
Elasticsearch is a balancing game
of server settings, indexing strategy
and well conceived queries.
Different workloads will require
retuning your cluster.
Degrading and Deleting Data
Elasticsearch is not intended to be a data warehouse.
Design a policy which degrades then eventually deletes your data
Degrades? Reduce the number of replicas, move data to nodes with slower
disks, eventually close the index
Delete data? If you’re using date stamped index names, just drop the index.
Records can also be created with a TTL
Degrading and Deleting Data (continued)
Your policy is implemented via cron tasks, only TTL expiry of records is inbuilt
Curator is the stock tool for this. es-daily-index-maintenance.pl from
App::ElasticSearch::Utilities is better IMO
Put them all in a single file like /etc/cron.d/elasticsearch so you can keep track
of them. Or maybe several cron.d files.
Aliases are also very helpful, as Elasticsearch will add indexes to them when
created, if the template defines it. You can then use the cron job to remove
older indexes etc.
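To make the degrade-then-delete policy concrete, here is a small sketch of the decision a cron job could make. The "prefix-YYYY.MM.DD" naming scheme, the 30 and 90 day thresholds and the index names are all assumptions. It only sorts names into buckets and does not touch a cluster.

```perl
use strict;
use warnings;
use Time::Piece;

# Sort date-stamped index names into keep / close / delete buckets
# by age in days. Names without a date stamp are left alone.
sub plan_retention {
    my ($today, $close_after, $delete_after, @indexes) = @_;
    my $now  = Time::Piece->strptime($today, '%Y.%m.%d');
    my %plan = (keep => [], close => [], delete => []);
    for my $name (@indexes) {
        my ($stamp) = $name =~ /-(\d{4}\.\d{2}\.\d{2})$/ or next;
        my $then     = Time::Piece->strptime($stamp, '%Y.%m.%d');
        my $age_days = int( ($now->epoch - $then->epoch) / 86_400 );
        my $bucket   = $age_days >= $delete_after ? 'delete'
                     : $age_days >= $close_after  ? 'close'
                     :                              'keep';
        push @{ $plan{$bucket} }, $name;
    }
    return \%plan;
}

my $plan = plan_retention('2016.03.31', 30, 90,
    'logs-2016.03.30', 'logs-2016.02.15', 'logs-2015.12.01', 'kibana-config');

for my $bucket (qw(keep close delete)) {
    print "$bucket: @{ $plan->{$bucket} }\n";
}
```

In a real job the names in the close and delete buckets would then be handed to the client's indices->close and indices->delete calls.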
Single Node
Development
Environment
A single node is a perfectly valid Elasticsearch
cluster. Although, it’s not really suitable for
production it’s perfectly fine for development use.
The node is configured to be a master node and a
data node, with the number of expected masters
also set to 1
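For all indexes, shards = 1, replicas = 0 (a replica is never allocated on the
same node as its primary, so on a single node it would stay unassigned)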
Use up to 30GB of RAM - you will probably be using
less. Don’t worry too much about tuning, dedicated
disks etc.
Elasticsearch is packaged for deb, rpm etc. And
only a few settings need changing to get running. Or
choose one of the many Vagrant or similar install
methods available online.
Now about Perl
Just use Search::Elasticsearch;
Don’t be tempted to craft JSON and GET/POST yourself
JSON queries translate nicely into Perl data structures, but are much much less
annoying (trailing commas don’t matter)
Search::Elasticsearch takes care of connection pooling, proper
serialization/deserialization, scrolling, and makes bulk requests very easy.
Search::Elasticsearch 2.03 includes
support for 0.9, 1.0 and 2.0 series
clusters.
Search::Elasticsearch 5.01
dropped support for pre-5.0
Elasticsearch from the main
tarball.
The older clients are still available by
installing their ::Client modules directly:
Search::Elasticsearch::Client::0_90,
Search::Elasticsearch::Client::1_0 or
Search::Elasticsearch::Client::2_0
Connecting to Elasticsearch
Explicitly connect to a single server
Provide a number of servers, which the client will RR between (i.e. query
nodes)
Provide a single hostname, and have the client Sniff out the rest of the
cluster. Which it will RR between.
Connecting to Elasticsearch (straight from the Pod)
use Search::Elasticsearch;
# Connect to localhost:9200:
my $e = Search::Elasticsearch->new();
# Round-robin between two nodes:
my $e = Search::Elasticsearch->new(
nodes => [
'search1:9200',
'search2:9200'
]
);
# Connect to cluster at search1:9200, sniff all nodes and round-robin between them:
my $e = Search::Elasticsearch->new(
nodes => 'search1:9200',
cxn_pool => 'Sniff'
);
Insert something, retrieve it again
Really basic stuff...
Some basics
# Index a document:
$e->index(
index => 'my_app',
type => 'blog_post',
id => 1,
body => {
title => 'Elasticsearch clients',
content => 'Interesting content...',
date => '2013-09-24'
}
);
# Get the document:
my $doc = $e->get(
index => 'my_app',
type => 'blog_post',
id => 1
);
Searching
Just a simple example to get started...
Searching
# Search:
my $results = $e->search(
index => 'my_app',
body => {
query => {
match => { title => 'elasticsearch' }
}
}
);
Cluster Status, Other stuff
Administrative type functions are also all available...
Cluster Status, Other stuff
# Cluster status requests:
my $info = $e->info;
my $health = $e->cluster->health;
my $node_stats = $e->nodes->stats;
# Index admin. requests:
$e->indices->create(index=>'my_index');
$e->indices->delete(index=>'my_index');
Scrolled Search Results
Elasticsearch has a limit to how many results it will return (which is a setting
you can change, but has side effects)
Like the cursor function in an SQL database, Scrolled Search has the client work
with the server to return results in small chunks.
Search::Elasticsearch takes care of all the details and makes it almost
transparent.
Scrolled Search (like a cursor in SQL)
my $es = Search::Elasticsearch->new;
my $scroll = $es->scroll_helper(
index => 'my_index',
body => {
query => {...},
size => 1000, # chunk size
sort => '_doc'
}
);
say "Total hits: ". $scroll->total;
while (my $doc = $scroll->next) {
# do something
}
Bulk Functions
RESTful HTTP/s has a lot of overheads and adds a lot of latency. Inserting one
record per HTTP request will almost certainly never keep up with your logs.
Bulk requests allow more than one action at a time for each HTTP request.
Search::Elasticsearch makes this very easy. You push actions into the
$bulk object, and it will flush them based on your parameters or when explicitly
asked. Callback hooks are also provided
(Elasticsearch used to have a UDP data insert feature. It’s gone now)
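To show why a bulk request saves so many round trips, here is a sketch of roughly what goes over the wire: one HTTP body containing an action line and a source line per document, newline delimited. It is purely illustrative (the index, type and documents are made up). In real code the bulk helper builds this for you, so there is still no reason to craft it by hand.

```perl
use strict;
use warnings;
use JSON::PP;

# Turn a list of documents into one newline-delimited bulk body:
# an "index" action line followed by the document itself, repeated.
sub bulk_body {
    my ($index, $type, @docs) = @_;
    my $json = JSON::PP->new->canonical;
    my $body = '';
    for my $doc (@docs) {
        $body .= $json->encode({ index => { _index => $index, _type => $type } }) . "\n";
        $body .= $json->encode($doc) . "\n";
    }
    return $body;
}

# Three documents, one request body.
my $payload = bulk_body('my_index', 'my_type',
    { foo => 'bar' }, { foo => 'baz' }, { foo => 'qux' });
print $payload;
```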
Bulk Functions
my $es = Search::Elasticsearch->new;
my $bulk = $es->bulk_helper(
index => 'my_index',
type => 'my_type'
);
# Index docs:
$bulk->index({ id => 1, source => { foo => 'bar' }});
$bulk->add_action( index => { id => 1, source => { foo=> 'bar' }});
# Create docs:
$bulk->create({ id => 1, source => { foo => 'bar' }});
$bulk->add_action( create => { id => 1, source => { foo=> 'bar' }});
$bulk->create_docs({ foo => 'bar' })
Bulk Functions (continued)
# on_success callback, called for every action that succeeds
my $bulk = $es->bulk_helper(
on_success => sub {
my ($action,$response,$i) = @_;
# do something
},
);
# on_conflict callback, called for every conflict
my $bulk = $es->bulk_helper(
on_conflict => sub {
my ($action,$response,$i,$version) = @_;
# do something
},
);
# on_error callback, called for every error
my $bulk = $es->bulk_helper(
on_error => sub {
my ($action,$response,$i) = @_;
# do something
Search::Elasticsearch takes care of
connection pooling - so no load
balancer is required.
It makes Scrolled Searches easy
and almost transparent.
It makes Bulk functions amazingly
easy.
It makes use of several HTTP
clients, picking the “best” one
available on the fly.
It’s awesome! Don’t bother with DIY
More Awesomes...
App::ElasticSearch::Utilities - very useful CLI/cron tools for managing
Elasticsearch
Dancer2::Plugin::ElasticSearch - Dancer 2 plugin
Dancer::Plugin::ElasticSearch - Dancer plugin (uses older perl ElasticSearch
library)
Catalyst::Model::Search::ElasticSearch - Catalyst Model
Note: CPAN has lots of ElasticSearch, but Elasticsearch is the correct capitalization
More on queries...
Non-search Query
Parameters
All the things you might expect…
...plus many many more!
my $res = $e->search(
    index => 'mydata-*', # wildcards allowed
    body => {
        query => { .. }, # search query
        from => 0, # first result to return
        size => 10_000, # no. of results to return
        sort => [ # sort results by
            # single quotes, so Perl does not try to interpolate @timestamp
            { '@timestamp' => { order => 'asc' } },
            'srcport',
            { ipv4 => 'desc' },
        ],
        # we don't want e/s to send us the raw original data
        _source => \0, # serialised as JSON false
        # which fields we want returned
        fields => [ 'ipv4', 'srcport', '@timestamp' ],
    },
);
More on Queries
Wildcard queries
What you would expect
Regexp queries
Also, what you would expect
query => {
wildcard => { user => "ki*y" }
}
query => {
regexp => {
"name.first" => "s.*y"
}
}
More on Queries
Range query
Used with numeric and date field
types
query => {
    range => { # range query
        age => { # field
            gte => 10, # greater than or equal to
            lte => 20, # less than or equal to
        }
    }
}
query => {
    range => {
        date => { # ranges for dates can be date math
            gte => "now-1d/d", # /d rounds to the day
            lt => "now/d",
            time_zone => "+01:00", # optional
        }
    }
}
More on Queries
Exists query
Exists has literally the same meaning
as in Perl
Bool query
There’s a lot to this, I will just
touch on it
query => {
exists => { field => "user" }
}
query => {
bool => {
must => [ # basically AND
{ exists => { field => 'ipv4' } },
{ exists => { field => 'srcport' } },
{ missing => { field => 'natv4' } }, # opposite of exists
]
}
}
Effective queries rely on good mappings
A mapping is the schema
You can create an empty index with the mapping you define
Or, an index can be automatically created on insert, with a mapping based
upon a matching template
The more you can break your data up into fields with a native datatype, the
better Elasticsearch can serve results and the more you can make use of
datatype specific functionality (date math for example)
Core Datatypes
The basics
String
● text and keyword
Numeric datatypes
● long, integer, short, byte, double, float
Date datatype
● date
Boolean datatype
● boolean
Binary datatype
● binary
Complex
Datatypes
Objects and things
Array datatype
● (Array support does not require a dedicated type)
Object datatype
● object for single JSON objects
Nested datatype
● nested for arrays of JSON objects
Geo Datatypes
Fun with maps etc
Geo-point datatype
● geo_point for lat/lon points
Geo-Shape datatype
● geo_shape for complex shapes like polygons
Specialised
Datatypes
You’ll need to read up on a lot of
these.
IP datatype
● ip for IPv4 and IPv6 addresses
Completion datatype
● completion to provide auto-complete suggestions
Token count datatype
● token_count to count the number of tokens in a string
mapper-murmur3
● murmur3 to compute hashes of values at index-time and
store them in the index
Attachment datatype
● See the mapper-attachments plugin which supports indexing
attachments like Microsoft Office formats, Open Document
formats, ePub, HTML, etc. into an attachment datatype.
Percolator type
● Accepts queries from the query-dsl
Summary
● Select sensible hardware (or VM) and tune your OS
● Know your workload and tune Elasticsearch to match
● Rsyslog is amazing, it can talk natively to Elasticsearch and is unbelievably scalable
● Search::Elasticsearch is always the way to go (except perhaps, for trivial shell scripts)
● Break your data up into as many fields as you can
● Use native datatypes and get maximum value using Elasticsearch’s query functions
● More shards and/or more replicas with more servers will increase query performance
● More indexes will increase write performance if you write across them
● Use Index names with date stamps and Aliases to manage data elegantly and efficiently
● Plan how you will degrade then drop data
Thank You!

Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Perl and Elasticsearch

  • 1. Perl & Elasticsearch: Jumping on the bandwagon.
  • 3. Primary Usage: Pretty graphs generated live, from logs. In most cases, you will be asked to feed logs into an Elasticsearch database, then make dashboards with charts and graphs.
  • 4. At the heart of Elasticsearch is Apache Lucene. Elasticsearch uses Lucene as its text indexer. What it adds is the ability to scale horizontally with relative ease. It also adds a comprehensive RESTful JSON interface.
  • 5. Should I use Elasticsearch? De-normalized data? Don’t need transactions? Willing to fight with the Java Runtime Environment? Maybe. Need lots of data types? Join queries? Referential integrity? Only hundreds of GB of data? Access control? Probably not.
  • 6. Terminology: roughly equivalent terms...
      MySQL                    Elasticsearch
      Database                 Index
      Table                    Type
      Row                      Document
      Column                   Field
      Schema                   Mapping/Templates
      Index                    Everything is indexed
      SQL                      Query DSL
      SELECT * FROM table …    GET http://…
      UPDATE table SET …       PUT http://…
  • 7. The ELK Stack. A “Stack” with a memorable acronym? Management will love it!
      ● Elasticsearch: The actual database software. It’s written in Java, which explains many of its quirks.
      ● Logstash: A log tailer in Java. Its performance is appalling. Don’t waste any time on it.
      ● Kibana: A web frontend to Elasticsearch, from searches to graphs and dashboards. It’s node.js and JS heavy.
  • 8. Use Rsyslog instead of Logstash. IMO it’s pointless to write logs to file then slurp them back in. Rsyslog is amazingly performant and flexible, and ostensibly much better than Syslog-ng. Stay sane by using RainerScript for config, eliminating all legacy-style syslog config. Old versions are OK on local machines, but “Syslog servers” should run the latest 8.x.
  • 9. If you’re looking for more of an “all in one” solution, you might find Graylog to be a good fit. It can use Elasticsearch under the hood to power its searches. Give it a go and let me know how things work out.
  • 10. Elasticsearch Basic Cluster: Data nodes store your data. (Eligible) Master nodes maintain a map of where data is.
  • 11. Types of Elasticsearch nodes
      Role               node.master    node.data
      Eligible master    true           false
      Data               false          true
      Query              false          false
      Dev-only           true           true
      Also, Tribe nodes are a thing.
  • 13. Interesting properties of Elasticsearch
      ● A wildcard can be used in the index part of a query. This feature is a key part of using Elasticsearch effectively.
      ● Aliases are used to reference one or more indexes. Multiple changes to aliases can (and should) be grouped into one REST command, which Elasticsearch executes atomically.
      ● A template explicitly defines the mapping (schema) of data for yet-to-be-created indexes. The template’s index-name pattern (a simple wildcard such as logs-*) is matched when a data insertion references an index name which does not exist; the index is then created from the template.
      ● Templates also include other index properties, such as aliases that a new index should automatically be made a part of.
      ● An index can be closed without deleting it. It becomes unusable until it is opened again, but it is out of memory and sitting on disk ready to go.
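
The grouped alias change described above can be sketched in Perl. This is a minimal sketch rather than code from the talk: the helper name `alias_swap_body` and the alias and index names are hypothetical, and the code only builds and prints the request body. With a live cluster the body would be handed to the `update_aliases` call in the `indices` namespace of Search::Elasticsearch, shown in the closing comment.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: build one request body that moves an alias from an
# old index to a new one. Both actions travel in a single request, so
# Elasticsearch applies them together.
sub alias_swap_body {
    my ( $alias, $old_index, $new_index ) = @_;
    return {
        actions => [
            { remove => { index => $old_index, alias => $alias } },
            { add    => { index => $new_index, alias => $alias } },
        ],
    };
}

# Example names are assumptions for illustration only.
my $body = alias_swap_body( 'logs-current', 'logs-2016.01.01', 'logs-2016.01.02' );
print JSON::PP->new->canonical->pretty->encode($body);

# With a connected client this would be sent as:
#   $e->indices->update_aliases( body => $body );
```

Keeping the remove and the add in one body means readers of the alias should never see it pointing at no index, or at both.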
  • 14. Schemaless, NoSQL? Elasticsearch queries are made with JSON over RESTful HTTP(S), so it’s not SQL. If no index exists, it will be created on data insertion. If no template is defined, Elasticsearch will guess at the mapping. Turn this off; always define a template for every index.
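
As an illustration of "always define a template", here is a hedged sketch of a template body written as a Perl data structure. Everything named here is an assumption: the `logs-*` pattern, the `logs-all` alias, the `event` type and the three fields are made up for the example, and the layout follows the Elasticsearch 2.x/5.x template format, which later versions changed. Setting `dynamic` to `strict` is one way to stop Elasticsearch guessing at undeclared fields.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical template body for indexes named "logs-*".
my $template = {
    template => 'logs-*',                # wildcard matched against new index names
    settings => { number_of_shards => 1, number_of_replicas => 1 },
    aliases  => { 'logs-all' => {} },    # new indexes join this alias automatically
    mappings => {
        event => {                       # document type (name is an assumption)
            dynamic    => 'strict',      # reject fields that are not declared below
            properties => {
                '@timestamp' => { type => 'date' },
                ipv4         => { type => 'ip' },
                srcport      => { type => 'integer' },
            },
        },
    },
};

print JSON::PP->new->canonical->pretty->encode($template);

# With a connected client this would be stored with:
#   $e->indices->put_template( name => 'logs', body => $template );
```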
  • 15. Tips for server hardware selection & OS configuration
      ● 30GB of RAM for each Elasticsearch instance (beyond this the JVM slows down)
      ● +25% RAM for OS. 48GB total is a good number
      ● Use RAID0 (striping) or no RAID on disks. Elasticsearch will ensure data is preserved via replication
      ● Spinning disks have yet to be a bottleneck for me. Scale out rather than up. YMMV
      ● Turn off Transparent Huge Pages - generally a good idea on any and all servers
      ● Configure Elasticsearch’s JVM to use Hugepages directly
      ● By default, Linux IO is tuned to run as poorly as possible (even set these on your laptop/desktop)
          ○ echo 1024 > /sys/block/sda/queue/nr_requests (maybe more, benchmark to taste)
          ○ blockdev --setra 16384 /dev/sda
          ○ Use XFS with mount options like: rw,nobarrier,logbufs=8,inode64,logbsize=256k (XFS rocks)
          ○ Don’t use partitions, just format the disk as is (mkfs -t xfs /dev/sdb). XFS will automatically pick the perfect block alignment
          ○ echo 0 > /sys/block/sda/queue/add_random (exclude the disk as a source of entropy)
      ● In iptables, it’s generally a good idea to disable connection tracking on the service ports (assuming you have no outbound rules). This saves on CPU time and avoids filling the connection state table
      ● Use the same JVM on all nodes. Either Oracle Java or OpenJDK is fine; pick one and don’t mix
  • 16. Tips for Tuning Elasticsearch
      ● Elasticsearch default settings are for a read-heavy load
      ● There are lots and lots of settings, and lots and lots of blogs talking about how people have tuned their clusters
      ● Blogs can be very helpful to find which combination of settings will be right for you
      ● Be careful with anything referencing Elasticsearch before 2.0, ignore anything before 1.0. Things have changed too much
      ● Above every setting in your config file, note a small blurb about what it does and why you have set it. This will help you remember “why on earth did I think that was a good setting??”
      ● The Elasticsearch official documentation is very, very good. Take the time to read what each setting does before you attempt to change it (and check that the setting still exists in the version you are running)
      ● Increase settings by small amounts and observe if performance improves
      ● Having a setting too high or too low can both reduce performance - you’re trying to find the sweet spot
      ● More replicas can help read-heavy loads if you have more nodes for them to run on; more shards can too. However, shards cannot be changed after an index is created, whereas replicas can be changed at any time
      ● More indexes plus more nodes can help write-heavy loads
      ● Don’t run queries against data nodes
  • 17. Elasticsearch lets you scale horizontally, so you have to actually scale your workload horizontally… but without overwhelming your cluster. Achieving peak performance in Elasticsearch is a balancing game of server settings, indexing strategy and well-conceived queries. Different workloads will require retuning your cluster.
  • 18. Degrading and Deleting Data. Elasticsearch is not intended to be a data warehouse. Design a policy which degrades then eventually deletes your data. Degrades? Reduce the number of replicas, move data to nodes with slower disks, eventually close the index. Delete data? If you’re using date-stamped index names, just drop the index. Records can also be created with a TTL.
  • 19. Degrading and Deleting Data (continued). Your policy is implemented via cron tasks; only TTL expiry of records is built in. Curator is the stock tool for this. es-daily-index-maintenance.pl from App::ElasticSearch::Utilities is better IMO. Put them all in a single file like /etc/cron.d/elasticsearch so you can keep track of them, or maybe several cron.d files. Aliases are also very helpful, as Elasticsearch will add indexes to them when created, if the template defines it. You can then use the cron job to remove older indexes etc.
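
The date-stamped naming scheme makes the "drop old data" step easy to script. The following is a minimal sketch under stated assumptions, not one of the tools named above: the helper `expired_indexes` is hypothetical, index names are assumed to look like `<prefix>-YYYY.MM.DD` with valid dates, and the list of names is hard-coded where a real job would fetch it from the cluster.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: given index names of the form "<prefix>-YYYY.MM.DD",
# return those whose date stamp is more than $keep_days days before $now.
# Names that do not match the pattern are left alone.
sub expired_indexes {
    my ( $prefix, $keep_days, $now, @names ) = @_;
    my $cutoff = $now - $keep_days * 86_400;
    my @expired;
    for my $name (@names) {
        next unless $name =~ /\A\Q$prefix\E-(\d{4})\.(\d{2})\.(\d{2})\z/;
        my $stamp = timegm( 0, 0, 0, $3, $2 - 1, $1 );
        push @expired, $name if $stamp < $cutoff;
    }
    return @expired;
}

# Fixed "now" (2016-01-10 00:00:00 UTC) so the example is repeatable.
my $now     = timegm( 0, 0, 0, 10, 0, 2016 );
my @indexes = qw(logs-2016.01.01 logs-2016.01.05 logs-2016.01.09 other-2016.01.01);
my @expired = expired_indexes( 'logs', 7, $now, @indexes );
print "$_\n" for @expired;

# With a connected client, each expired index could then be dropped with:
#   $e->indices->delete( index => $_ ) for @expired;
```

A script like this would sit in the cron.d file mentioned above, with a second pass using a shorter age to close indexes or reduce replicas before they are deleted.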
  • 20. Single Node Development Environment. A single node is a perfectly valid Elasticsearch cluster. Although it’s not really suitable for production, it’s perfectly fine for development use. The node is configured to be a master node and a data node, with the number of expected masters also set to 1. For all indexes, shards = 1, replicas = 1. Use up to 30GB of RAM - you will probably be using less. Don’t worry too much about tuning, dedicated disks etc. Elasticsearch is packaged for deb, rpm etc., and only a few settings need changing to get running. Or choose one of the many Vagrant or similar install methods available online.
  • 21. Now about Perl. Just use Search::Elasticsearch. Don’t be tempted to craft JSON and GET/POST yourself. JSON queries translate nicely into Perl data structures, which are much, much less annoying (trailing commas don’t matter). Search::Elasticsearch takes care of connection pooling, proper serialization/deserialization, scrolling, and makes bulk requests very easy.
  • 22. Search::Elasticsearch 2.03 includes support for 0.90, 1.0 and 2.0 series clusters. Search::Elasticsearch 5.01 dropped support for pre-5.0 Elasticsearch from the main tarball. The older clients are still available by installing their ::Client modules directly: Search::Elasticsearch::Client::0_90, Search::Elasticsearch::Client::1_0 or Search::Elasticsearch::Client::2_0.
  • 23. Connecting to Elasticsearch
      ● Explicitly connect to a single server.
      ● Provide a number of servers, which the client will round-robin (RR) between (i.e. query nodes).
      ● Provide a single hostname, and have the client sniff out the rest of the cluster, which it will RR between.
  • 24. Connecting to Elasticsearch (straight from the Pod)
      use Search::Elasticsearch;

      # Connect to localhost:9200:
      my $e = Search::Elasticsearch->new();

      # Round-robin between two nodes:
      my $e = Search::Elasticsearch->new(
          nodes => [ 'search1:9200', 'search2:9200' ]
      );

      # Connect to cluster at search1:9200, sniff all nodes and round-robin between them:
      my $e = Search::Elasticsearch->new(
          nodes    => 'search1:9200',
          cxn_pool => 'Sniff'
      );
  • 25. Insert something, retrieve it again Really basic stuff...
  • 26. Some basics
      # Index a document:
      $e->index(
          index => 'my_app',
          type  => 'blog_post',
          id    => 1,
          body  => {
              title   => 'Elasticsearch clients',
              content => 'Interesting content...',
              date    => '2013-09-24'
          }
      );

      # Get the document:
      my $doc = $e->get(
          index => 'my_app',
          type  => 'blog_post',
          id    => 1
      );
  • 27. Searching Just a simple example to get started...
  • 28. Searching
      # Search:
      my $results = $e->search(
          index => 'my_app',
          body  => {
              query => {
                  match => { title => 'elasticsearch' }
              }
          }
      );
  • 29. Cluster Status, Other stuff Administrative type functions are also all available...
  • 30. Cluster Status, Other stuff
      # Cluster status requests:
      $info       = $e->cluster->info;
      $health     = $e->cluster->health;
      $node_stats = $e->cluster->node_stats;

      # Index admin. requests:
      $e->indices->create(index=>'my_index');
      $e->indices->delete(index=>'my_index');
  • 31. Scrolled Search Results. Elasticsearch has a limit to how many results it will return (which is a setting you can change, but has side effects). Like the cursor function in an SQL database, Scrolled Search has the client work with the server to return results in small chunks. Search::Elasticsearch takes care of all the details and makes it almost transparent.
  • 32. Scrolled Search (like a cursor in SQL)
      my $es = Search::Elasticsearch->new;

      my $scroll = $es->scroll_helper(
          index => 'my_index',
          body  => {
              query => {...},
              size  => 1000,     # chunk size
              sort  => '_doc'
          }
      );

      say "Total hits: ". $scroll->total;

      while (my $doc = $scroll->next) {
          # do something
      }
  • 33. Bulk Functions. RESTful HTTP/s has a lot of overhead and adds a lot of latency. Inserting one record per HTTP request will almost certainly never keep up with your logs. Bulk requests allow more than one action at a time for each HTTP request. Search::Elasticsearch makes this very, very easy. You push actions into the $bulk object, and it will flush them based on your parameters or when explicitly asked. Callback hooks are also provided. (Elasticsearch used to have a UDP data insert feature. It’s gone now.)
  • 34. Bulk Functions
      my $es = Search::Elasticsearch->new;

      my $bulk = $es->bulk_helper(
          index => 'my_index',
          type  => 'my_type'
      );

      # Index docs:
      $bulk->index({ id => 1, source => { foo => 'bar' }});
      $bulk->add_action( index => { id => 1, source => { foo=> 'bar' }});

      # Create docs:
      $bulk->create({ id => 1, source => { foo => 'bar' }});
      $bulk->add_action( create => { id => 1, source => { foo=> 'bar' }});
      $bulk->create_docs({ foo => 'bar' });
  • 35. Bulk Functions (continued)
      # on_success callback, called for every action that succeeds
      my $bulk = $es->bulk_helper(
          on_success => sub {
              my ($action,$response,$i) = @_;
              # do something
          },
      );

      # on_conflict callback, called for every conflict
      my $bulk = $es->bulk_helper(
          on_conflict => sub {
              my ($action,$response,$i,$version) = @_;
              # do something
          },
      );

      # on_error callback, called for every error
      my $bulk = $es->bulk_helper(
          on_error => sub {
              my ($action,$response,$i) = @_;
              # do something
          },
      );
  • 36. Search::Elasticsearch takes care of connection pooling - so no load balancer is required. It makes Scrolled Searches easy and almost transparent. It makes Bulk functions amazingly easy. It makes use of several HTTP clients, picking the “best” one available on the fly. It’s awesome! Don’t bother with DIY
  • 37. More Awesomes...
      ● App::ElasticSearch::Utilities - very useful CLI/cron tools for managing Elasticsearch
      ● Dancer2::Plugin::ElasticSearch - Dancer 2 plugin
      ● Dancer::Plugin::ElasticSearch - Dancer plugin (uses older perl ElasticSearch library)
      ● Catalyst::Model::Search::ElasticSearch - Catalyst Model
      Note: CPAN has lots of ElasticSearch, but Elasticsearch is the correct capitalization.
  • 39. Non-search Query Parameters. All the things you might expect… plus many, many more!
      my $res = $e->search(
          index => 'mydata-*',    # wildcards allowed
          body  => {
              query => { .. },    # search query
          },
          from => 0,              # first result to return
          size => 10_000,         # no. of results to return
          sort => [               # sort results by
              # single quotes, so Perl does not interpolate @timestamp
              { '@timestamp' => { "order" => "asc" } },
              "srcport",
              { "ipv4" => "desc" },
          ],
          # we don’t want e/s to send us the raw original data
          _source => 0,
          # which fields we want returned
          fields => [ 'ipv4', 'srcport', '@timestamp' ]
      );
  • 40. More on Queries
      Wildcard queries: what you would expect.
      Regexp queries: also what you would expect.

      query => { wildcard => { user => "ki*y" } }

      query => { regexp => { "name.first" => "s.*y" } }
  • 41. More on Queries
      Range query: used with numeric and date field types.

      query => {
          range => {                 # range query
              age => {               # field
                  gte => 10,         # greater than or equal to
                  lte => 20,         # less than or equal to
              }
          }
      }

      query => {
          range => {
              date => {
                  # ranges for dates can be date math
                  gte         => "now-1d/d",   # /d rounds to the day
                  lt          => "now/d",
                  "time_zone" => "+01:00"      # optional
              }
          }
      }
  • 42. More on Queries
      Exists query: “exists” has literally the same meaning as in Perl.
      Bool query: there’s a lot to this, I will just touch on it.

      query => { exists => { field => "user" } }

      query => {
          bool => {
              must => [   # basically AND
                  { exists  => { field => 'ipv4' } },
                  { exists  => { field => 'srcport' } },
                  { missing => { field => 'natv4' } },   # opposite of exists (removed in Elasticsearch 5.0)
              ]
          }
      }
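
The `missing` query used above was removed in Elasticsearch 5.0; on 5.x clusters the same intent is expressed with `exists` inside a `must_not` clause. Here is a minimal sketch of that form, assuming a hypothetical helper named `fields_filter` and reusing the field names from the slide. It only builds the query structure; sending it needs a live cluster.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: build a bool query requiring some fields to be
# present and others to be absent.
sub fields_filter {
    my ( $present, $absent ) = @_;
    return {
        bool => {
            # the leading + tells Perl the braces are a hash, not a block
            must     => [ map { +{ exists => { field => $_ } } } @$present ],
            must_not => [ map { +{ exists => { field => $_ } } } @$absent ],
        },
    };
}

my $query = fields_filter( [ 'ipv4', 'srcport' ], ['natv4'] );
print JSON::PP->new->canonical->pretty->encode($query);

# With a connected client:
#   my $res = $e->search( index => 'mydata-*', body => { query => $query } );
```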
  • 43. Effective queries rely on good mappings. A mapping is the schema. You can create an empty index with the mapping you define. Or, an index can be automatically created on insert, with a mapping based upon a matching template. The more you can break your data up into fields with a native datatype, the better Elasticsearch can serve results and the more you can make use of datatype-specific functionality (date math, for example).
  • 44. Core Datatypes: the basics
      String
          ● text and keyword
      Numeric datatypes
          ● long, integer, short, byte, double, float
      Date datatype
          ● date
      Boolean datatype
          ● boolean
      Binary datatype
          ● binary
  • 45. Complex Datatypes: objects and things
      Array datatype
          ● (Array support does not require a dedicated type)
      Object datatype
          ● object for single JSON objects
      Nested datatype
          ● nested for arrays of JSON objects
  • 46. Geo Datatypes: fun with maps etc.
      Geo-point datatype
          ● geo_point for lat/lon points
      Geo-Shape datatype
          ● geo_shape for complex shapes like polygons
  • 47. Specialised Datatypes: you’ll need to read up on a lot of these.
      IP datatype
          ● ip for IPv4 and IPv6 addresses
      Completion datatype
          ● completion to provide auto-complete suggestions
      Token count datatype
          ● token_count to count the number of tokens in a string
      mapper-murmur3
          ● murmur3 to compute hashes of values at index-time and store them in the index
      Attachment datatype
          ● See the mapper-attachments plugin which supports indexing attachments like Microsoft Office formats, Open Document formats, ePub, HTML, etc. into an attachment datatype.
      Percolator type
          ● Accepts queries from the query-dsl
  • 48. Summary
      ● Select sensible hardware (or VM) and tune your OS
      ● Know your workload and tune Elasticsearch to match
      ● Rsyslog is amazing, it can talk natively to Elasticsearch and is unbelievably scalable
      ● Search::Elasticsearch is always the way to go (except perhaps for trivial shell scripts)
      ● Break your data up into as many fields as you can
      ● Use native datatypes and get maximum value using Elasticsearch’s query functions
      ● More shards and/or more replicas with more servers will increase query performance
      ● More indexes will increase write performance if you write across them
      ● Use index names with date stamps and aliases to manage data elegantly and efficiently
      ● Plan how you will degrade then drop data