Data exploration with Elasticsearch

Aleksander M. Stensby
Monokkel A/S
  
• Aleksander M. Stensby
• CEO in Monokkel AS
• Previously COO in Integrasco AS
• Working with search and data analysis since 2004

www.monokkel.io
  
• Persistence, processing and presentation of data

Persistence – Processing – Presentation
  
Agenda
• Search fundamentals primer
• Intro to elasticsearch
• Search, filter and aggregate!
  
… and some bonus visualisation!
  
What we will not cover today…
• All the different searches, filters and aggregations available in elasticsearch :)
• Details on tokenization, analyzers…
• Elasticsearch in production and performance tuning…
• Data integration
  
Search fundamentals 101
  
Document: Fields (Key/Value), e.g. Title, Content, Signature

“We know what we are, but know not what we may be.”

Term Vector

Term    Frequency
we      3
know    2
what    2
are     1
but     1
not     1
may     1
be      1
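The term vector for the quote can be reproduced in a few lines of Python. This is a minimal sketch, assuming the simplest possible analysis (lowercasing and splitting on anything that is not a letter); the function name `term_vector` is made up for illustration and is not an Elasticsearch API.

```python
import re
from collections import Counter


def term_vector(text):
    """Count how often each term occurs in a single document."""
    # Assumption: terms are lowercased runs of letters; real analyzers do more.
    terms = re.findall(r"[a-z]+", text.lower())
    return Counter(terms)


quote = "We know what we are, but know not what we may be."
vector = term_vector(quote)
for term, frequency in vector.most_common():
    print(term, frequency)
```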

Index

1. “We were born to run”
2. “No one told you when to run”
3. “Some were born to sing the blues”

The Inverted Index

Dictionary            Postings
Term    Frequency     Documents
blues   1             3
born    2             1,3
no      1             2
one     1             2
run     2             1,2
sing    1             3
some    1             3
the     1             3
to      3             1,2,3
told    1             2
we      1             1
were    2             1,3
when    1             2
you     1             2
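Building the dictionary and postings for the three example documents could look roughly like this. A minimal sketch: `build_inverted_index` is a hypothetical helper, and the tokenization (lowercase, letters only) is an assumption, not what Lucene does internally.

```python
import re
from collections import defaultdict

documents = {
    1: "We were born to run",
    2: "No one told you when to run",
    3: "Some were born to sing the blues",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(documents)
for term, doc_ids in index.items():
    # For these documents the frequency column equals the number of postings.
    print(f"{term:<6} {len(doc_ids)}  {doc_ids}")
```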

Searching

born

1. “We were born to run”
2. “No one told you when to run”
3. “Some were born to sing the blues”

The Boolean Model

Queries evaluated against the dictionary and postings of the inverted index:

• born
• born blues
• born OR blues
• born AND blues
• born NOT blues
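In the Boolean model, the operators map onto set operations over the postings lists. A small sketch using the postings for "born" and "blues" from the inverted index; how a query without an operator such as `born blues` is combined depends on the default operator, so it is left out here.

```python
# Postings taken from the inverted index (term -> document ids).
postings = {
    "born": {1, 3},
    "blues": {3},
}

born = postings["born"]
blues = postings["blues"]

print("born OR blues: ", sorted(born | blues))   # union
print("born AND blues:", sorted(born & blues))   # intersection
print("born NOT blues:", sorted(born - blues))   # difference
```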
  
Relevancy and Ranking
• Term frequency
• Inverse document frequency
• Field-length norm
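The three factors can be sketched as small functions. This assumes roughly the formulas usually quoted for Lucene's classic TF/IDF similarity (square root of the term count, a logarithmic IDF, and one over the square root of the field length); the real scoring function has further factors such as query normalisation, coordination and boosts, which are deliberately left out. All function names are illustrative.

```python
import math


def tf(term_count):
    """Term frequency: more occurrences in the field means more relevant."""
    return math.sqrt(term_count)


def idf(num_docs, doc_freq):
    """Inverse document frequency: rarer terms across the index weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Field-length norm: a match in a short field weighs more."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(term_count, num_docs, doc_freq, num_terms):
    # Simplified: query normalisation, coordination and boosts are left out.
    return tf(term_count) * idf(num_docs, doc_freq) * field_length_norm(num_terms)


# "blues" occurs once in document 3 (7 terms) and in 1 of 3 documents.
blues_weight = term_weight(term_count=1, num_docs=3, doc_freq=1, num_terms=7)
# "born" also occurs once in document 3, but in 2 of 3 documents.
born_weight = term_weight(term_count=1, num_docs=3, doc_freq=2, num_terms=7)
print(blues_weight > born_weight)
```

The rarer term "blues" ends up with the higher weight, which matches the intuition behind inverse document frequency.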
  
Similarity

1. “We were born to run”
2. “No one told you when to run”
3. “Some were born to sing the blues”

[Figure: query and documents plotted as vectors on the axes “born” and “blues”]

query:  [2,5]
doc 3:  [2,5]
doc 2:  [0,0]
doc 1:  [2,0]
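One common way to compare such vectors is the cosine of the angle between them. A minimal sketch using the query and document vectors listed above; returning 0 for an all-zero vector is a convention chosen here, not something stated on the slide.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: a document with no matching terms scores 0.
        return 0.0
    return dot / (norm_a * norm_b)


query = [2, 5]
docs = {"doc 1": [2, 0], "doc 2": [0, 0], "doc 3": [2, 5]}

ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)
```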
  
Search fundamentals 101!
• Tokenization
• Normalization (case, stop words etc.)
• Stemming, synonyms
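The steps above can be sketched as a tiny analysis chain. This is only an illustration: the stop word list is a made-up sample, `naive_stem` strips a few suffixes and is not a real stemming algorithm, and synonyms are not handled at all.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "to", "of"}  # tiny illustrative list


def tokenize(text):
    """Split text into tokens on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(tokens):
    """Lowercase tokens and drop stop words."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]


def naive_stem(token):
    """Strip a few common English suffixes; NOT a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    return [naive_stem(token) for token in normalize(tokenize(text))]


print(analyze("Some were born to sing the blues"))
```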
  
Brief history of elasticsearch

Shay Banon
-> Abstraction layer on top of Lucene
-> Compass
-> Rewritten: high performance, real-time, distributed
-> Elasticsearch
-> February 2010
  
elasticsearch
• Open source search engine, written in Java
• Built on top of Lucene
• Simple, coherent, RESTful API
• Distributed, scalable search engine with real-time analytics

“more useable and concise API, scalability, and operational tools on top of Lucene’s search implementation”
  
Elasticsearch nodes and cluster
[Figure: a cluster containing three nodes]

Elasticsearch shards, nodes
[Figure: index = shard, placed on a node]

Lucene index and segments
[Figure: a Lucene index made up of segments]
Much more than just search!
• Real-time analytics
• Log analysis
• Prediction modelling
• Recommendations
  
 
	
  
	
  
	
  
	
  
DEMO in 5 minutes

DEMO
• Install Elasticsearch
• Load in some data
• Run a very basic search

DEMO in 15 minutes
  
	
  
Easy peasy…
• http://www.elasticsearch.org/download
• bin/elasticsearch
  or bin/elasticsearch.bat on Windows
• http://localhost:9200/
  or curl -X GET http://localhost:9200/
  
Easy peasy lemon squeezy!

http://localhost:9200/<index>/<type>/[<id>]
  
	
  
Indexing data

curl -XPUT 'http://localhost:9200/monokkel/user/aleks' -d '{ "name" : "Aleksander Stensby" }'
  
	
  
	
  
Indexing data
• shakespeare.json
  – http://www.elasticsearch.org/guide/en/kibana/current/snippets/shakespeare.json
• curl -XPUT localhost:9200/_bulk --data-binary @shakespeare.json
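The file sent to `_bulk` is newline-delimited JSON: an action line followed by the document itself, repeated per document. A sketch of how such a body could be generated; `bulk_payload` is a hypothetical helper, the type name `line` is an assumption, and the sample documents are invented rather than taken from shakespeare.json.

```python
import json


def bulk_payload(index, doc_type, docs):
    """Build a newline-delimited body for the _bulk endpoint."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch where the next document goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents, not taken from shakespeare.json.
sample_docs = [
    (1, {"speaker": "ROMEO", "text_entry": "first sample line"}),
    (2, {"speaker": "JULIET", "text_entry": "second sample line"}),
]
payload = bulk_payload("shakespeare", "line", sample_docs)
print(payload, end="")
```

The resulting text could be saved to a file and sent with the curl command above.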
  
http://localhost:9200/<index>/<type>/_search
http://localhost:9200/<index>/_search
http://localhost:9200/_search
  
Mapping
• Is it a number? String? Date?
• Combining multiple fields?
• Default values?
• Stored?
• Analyzed?
• How should we tokenize/analyse/normalize the field?
  
Mapping

curl -XPUT http://localhost:9200/shakespeare -d '
{
    "mappings" : {
        "_default_" : {
            "properties" : {
                "speaker" : {"type": "string", "index" : "not_analyzed" },
                "play_name" : {"type": "string", "index" : "not_analyzed" },
                "line_id" : { "type" : "integer" },
                "speech_number" : { "type" : "integer" }
            }
        }
    }
}
';
  
The Query DSL

{
    "query": {YOUR_QUERY_HERE}
}
  
Match Query

{
    "query": {
        "match": {"text_entry" : "romeo"}
    }
}
  
Multi Match Query

{
    "query": {
        "multi_match": {
            "query":  "romeo",
            "fields": [ "text_entry", "speaker" ]
        }
    }
}
  
Bool Query

{
    "query": {
        "bool": {
            "must":     [ ],
            "must_not": [ ],
            "should":   [ ]
        }
    }
}
  
Bool Query

{
    "query": {
        "bool": {
            "must":     { "match": {"text_entry": "romeo" }},
            "must_not": { "match": {"speaker": "ROMEO" }},
            "should": [
                { "match": {"speaker": "JULIET" }},
                { "match": {"speaker": "FRIAR LAURENCE" }}
            ]
        }
    }
}
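To get a feel for how the three clause types interact, the bool query above can be imitated in memory. This is a rough sketch, not how Elasticsearch evaluates queries: `matches` is a crude word lookup standing in for a match clause, the score is a simple count of matching clauses, and the documents are invented.

```python
def matches(doc, clause):
    """Very rough stand-in for a match clause: case-insensitive word lookup."""
    field, text = next(iter(clause["match"].items()))
    doc_words = str(doc.get(field, "")).lower().split()
    return any(word in doc_words for word in text.lower().split())


def bool_score(doc, must=(), must_not=(), should=()):
    """Return None if the document is excluded, else a simple match count."""
    if not all(matches(doc, clause) for clause in must):
        return None
    if any(matches(doc, clause) for clause in must_not):
        return None
    # With a must clause present, should clauses are optional and only add score.
    return len(must) + sum(matches(doc, clause) for clause in should)


# Hypothetical documents, shaped like the shakespeare data.
docs = [
    {"speaker": "JULIET", "text_entry": "o romeo romeo"},
    {"speaker": "ROMEO", "text_entry": "i am no romeo"},
    {"speaker": "NURSE", "text_entry": "romeo is gone"},
]

query = {
    "must": [{"match": {"text_entry": "romeo"}}],
    "must_not": [{"match": {"speaker": "ROMEO"}}],
    "should": [
        {"match": {"speaker": "JULIET"}},
        {"match": {"speaker": "FRIAR LAURENCE"}},
    ],
}

scores = [bool_score(doc, **query) for doc in docs]
print(scores)
```

Lines spoken by ROMEO are excluded, and lines spoken by JULIET rank above other lines that merely mention "romeo".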
  
And lots more…

filtered query, prefix query, simple query string query, range query, regexp query, term query, terms query, wildcard query, dis max query, geoshape query, nested query

more like this query, more like this field query, boosting query, common terms query, constant score query, fuzzy like this query, fuzzy like this field query, function score query, fuzzy query, has child query, has parent query

ids query, indices query, span first query, span multi term query, span near query, span not query, span or query, span term query, top children query, minimum should match, multi term query rewrite, template query

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html
  
Filtering
• Filters do not score, so they are faster to execute than queries
• Filters can be cached in memory - significantly faster than queries

If relevance is not important, use filters; otherwise, use queries!

The Filtered Query:

{
    "query": {
        "filtered": {
            "query":  {YOUR_QUERY_HERE},
            "filter": {YOUR_FILTER_HERE}
        }
    }
}
  
The Filtered Query:

{
    "query": {
        "filtered": {
            "query":  { "match": {"content": "monokkel" }},
            "filter": { "term": { "tag": "awesome" }}
        }
    }
}
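Conceptually, the filter gives a yes/no answer per document and the query then scores what is left. A toy in-memory sketch of that split; the documents, `term_filter` and `match_score` are all invented for illustration, and the order in which Elasticsearch actually executes filter and query may differ.

```python
def filtered_search(docs, query_score, keep):
    """Apply a yes/no filter first, then score and rank what is left."""
    candidates = [doc for doc in docs if keep(doc)]  # no scoring here
    scored = [(query_score(doc), doc) for doc in candidates]
    hits = [(score, doc) for score, doc in scored if score > 0]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical documents using the field names from the slide.
docs = [
    {"id": 1, "content": "monokkel monokkel search", "tag": "awesome"},
    {"id": 2, "content": "monokkel blog", "tag": "boring"},
    {"id": 3, "content": "monokkel", "tag": "awesome"},
    {"id": 4, "content": "something else", "tag": "awesome"},
]


def term_filter(doc):
    # Exact, unscored comparison, like {"term": {"tag": "awesome"}}.
    return doc["tag"] == "awesome"


def match_score(doc):
    # Toy relevance: how often the word occurs in the content field.
    return doc["content"].split().count("monokkel")


hits = filtered_search(docs, match_score, term_filter)
print([doc["id"] for _, doc in hits])
```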
  
Term Filter

{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "speaker": "ROMEO"
                }
            }
        }
    }
}
  
Terms Filter

{
    "query": {
        "filtered": {
            "filter": {
                "terms": {
                    "speaker": ["ROMEO", "JULIET"]
                }
            }
        }
    }
}
  
Bool Filter

{
    "query": {
        "filtered": {
            "filter": {
                "bool" : {
                    "must" :     [],
                    "should" :   [],
                    "must_not" : []
                }
            }
        }
    }
}
  
Range Filter

{
    "query": {
        "filtered": {
            "filter": {
                "range" : {
                    "price" : {
                        "gt" : 20,
                        "lt" : 40
                    }
                }
            }
        }
    }
}
  
And lots more…

match all filter, and filter, not filter, or filter, prefix filter, query filter, regexp filter, type filter

geo bounding box filter, geo distance filter, geo distance range filter, geo polygon filter, geoshape filter, geohash cell filter, has child filter, has parent filter, ids filter, indices filter, limit filter, nested filter, script filter

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
  
Kibana
• http://www.elasticsearch.org/overview/kibana/installation/
• bin/kibana
  or bin/kibana.bat on Windows
• http://localhost:5601/
  
	
  
Aggregations
• Buckets and metrics: partitioning documents based on a criterion

SELECT COUNT(color)      <- metric
FROM table
GROUP BY color           <- bucket

An aggregation is a combination of buckets and metrics
Aggregations

{
    "aggs": {
        "speakers": {
            "terms": {
                "field": "speaker"
            }
        }
    }
}

"speakers": your aggregation name
"terms": bucket type

Aggregations

{
    "aggs": {
        "beertypes": {
            "terms": {
                "field": "beertype"
            }
        }
    }
}
  
Aggregations

{
    "aggs": {
        "beertypes": {
            "terms": {
                "field": "beertype"
            },
            "aggs": {
                "avg_ibu": {
                    "avg": {
                        "field": "ibu"
                    }
                }
            }
        }
    }
}

"avg_ibu": your aggregation name
"avg": metric type
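What the nested aggregation computes can be imitated in plain Python: bucket the documents by one field, then compute a metric inside each bucket. A sketch with invented beer documents; `terms_with_avg` is a hypothetical helper, and the result layout only loosely resembles what Elasticsearch returns.

```python
from collections import defaultdict

# Hypothetical beer documents using the field names from the slide.
beers = [
    {"beertype": "IPA", "ibu": 60},
    {"beertype": "IPA", "ibu": 70},
    {"beertype": "Stout", "ibu": 40},
    {"beertype": "Pilsner", "ibu": 30},
    {"beertype": "Stout", "ibu": 50},
]


def terms_with_avg(docs, bucket_field, metric_field):
    """Bucket documents by a field, then compute a count and average per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])  # bucket
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}  # metric
        for key, values in buckets.items()
    }


result = terms_with_avg(beers, "beertype", "ibu")
for beertype, stats in sorted(result.items()):
    print(beertype, stats["doc_count"], stats["avg"])
```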
Aggregations

min, max, sum, avg, stats, extended stats, value count, percentiles, percentile ranks, cardinality, top hits, scripted metric, global, filter, filters, missing, nested, reverse nested, children, terms, significant terms, range, date range, ipv4 range, histogram, date histogram, geo bounds, geo distance, geohash grid

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
  
And a whole lot more!
• Geosearch, distance and bounds
• “More Like This”
• Suggesters / Autocomplete
• Percolation
• Language drivers
• Scripting
  
Further reading and some great resources!
• http://www.elasticsearch.org/guide/
• http://blog.monokkel.io/
• https://found.no/foundation/
  
Shameful self-promotion
/ Tarjei Romtveit

More Related Content

What's hot

Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
Bertrand Delacretaz
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
Ben van Mol
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Lucidworks
 
Elastic search Walkthrough
Elastic search WalkthroughElastic search Walkthrough
Elastic search Walkthrough
Suhel Meman
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with Fusion
Lucidworks
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Alexandre Rafalovitch
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
Satish Mohan
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Karel Minarik
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)
Karel Minarik
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
Karel Minarik
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Christos Manios
 
it's just search
it's just searchit's just search
it's just search
Erik Hatcher
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Beautiful REST+JSON APIs with Ion
Beautiful REST+JSON APIs with IonBeautiful REST+JSON APIs with Ion
Beautiful REST+JSON APIs with Ion
Stormpath
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
David Smiley
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
Atlogys Technical Consulting
 
Search Evolution - Von Lucene zu Solr und ElasticSearch
Search Evolution - Von Lucene zu Solr und ElasticSearchSearch Evolution - Von Lucene zu Solr und ElasticSearch
Search Evolution - Von Lucene zu Solr und ElasticSearch
Florian Hopf
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
pittaya
 

What's hot (20)

Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
 
Elastic search Walkthrough
Elastic search WalkthroughElastic search Walkthrough
Elastic search Walkthrough
 
Webinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with FusionWebinar: Modern Techniques for Better Search Relevance with Fusion
Webinar: Modern Techniques for Better Search Relevance with Fusion
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
 
Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]Elasticsearch And Ruby [RuPy2012]
Elasticsearch And Ruby [RuPy2012]
 
Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)Elasticsearch (Rubyshift 2013)
Elasticsearch (Rubyshift 2013)
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
it's just search
it's just searchit's just search
it's just search
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Beautiful REST+JSON APIs with Ion
Beautiful REST+JSON APIs with IonBeautiful REST+JSON APIs with Ion
Beautiful REST+JSON APIs with Ion
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
 
Search Evolution - Von Lucene zu Solr und ElasticSearch
Search Evolution - Von Lucene zu Solr und ElasticSearchSearch Evolution - Von Lucene zu Solr und ElasticSearch
Search Evolution - Von Lucene zu Solr und ElasticSearch
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 

Viewers also liked

Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
Samantha Quiñones
 
Utah Code Camp 2014 - Learning from Data by Thomas Holloway
Utah Code Camp 2014 - Learning from Data by Thomas HollowayUtah Code Camp 2014 - Learning from Data by Thomas Holloway
Utah Code Camp 2014 - Learning from Data by Thomas Holloway
Thomas Holloway
 
Quality Reach I SMX West Slides
Quality Reach I SMX West SlidesQuality Reach I SMX West Slides
Quality Reach I SMX West Slides
97th Floor
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
Codemotion
 
Cadenas marquistas 2009
Cadenas marquistas 2009Cadenas marquistas 2009
Cadenas marquistas 2009
Leopoldo Barcena Roji
 
ElasticSearch: la tenés atroden Google
ElasticSearch: la tenés atroden GoogleElasticSearch: la tenés atroden Google
ElasticSearch: la tenés atroden GoogleMariano Iglesias
 
10 pasos para desarrollar un plan de negocios en internet.
10 pasos para desarrollar un plan de negocios en internet. 10 pasos para desarrollar un plan de negocios en internet.
10 pasos para desarrollar un plan de negocios en internet.
Interlat
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
Roopendra Vishwakarma
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
Sematext Group, Inc.
 
Social Media Report - Marketing Week Live & Insight Show 2017
Social Media Report - Marketing Week Live & Insight Show 2017Social Media Report - Marketing Week Live & Insight Show 2017
Social Media Report - Marketing Week Live & Insight Show 2017
Linkfluence
 

Viewers also liked (10)

Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
 
Utah Code Camp 2014 - Learning from Data by Thomas Holloway
Utah Code Camp 2014 - Learning from Data by Thomas HollowayUtah Code Camp 2014 - Learning from Data by Thomas Holloway
Utah Code Camp 2014 - Learning from Data by Thomas Holloway
 
Quality Reach I SMX West Slides
Quality Reach I SMX West SlidesQuality Reach I SMX West Slides
Quality Reach I SMX West Slides
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Cadenas marquistas 2009
Cadenas marquistas 2009Cadenas marquistas 2009
Cadenas marquistas 2009
 
ElasticSearch: la tenés atroden Google
ElasticSearch: la tenés atroden GoogleElasticSearch: la tenés atroden Google
ElasticSearch: la tenés atroden Google
 
10 pasos para desarrollar un plan de negocios en internet.
10 pasos para desarrollar un plan de negocios en internet. 10 pasos para desarrollar un plan de negocios en internet.
10 pasos para desarrollar un plan de negocios en internet.
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Social Media Report - Marketing Week Live & Insight Show 2017
Social Media Report - Marketing Week Live & Insight Show 2017Social Media Report - Marketing Week Live & Insight Show 2017
Social Media Report - Marketing Week Live & Insight Show 2017
 

Similar to Data Exploration with Elasticsearch

Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
Itamar
 
Elasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyElasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easy
Itamar
 
Elasticsearch at EyeEm
Elasticsearch at EyeEmElasticsearch at EyeEm
Elasticsearch at EyeEm
Lars Fronius
 
Optimizing Multilingual Search: Presented by David Troiano, Basis Technology
Optimizing Multilingual Search: Presented by David Troiano, Basis TechnologyOptimizing Multilingual Search: Presented by David Troiano, Basis Technology
Optimizing Multilingual Search: Presented by David Troiano, Basis Technology
Lucidworks
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
rtelmore
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBlehresman
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
SEMINARGROOT
 
Architecture of a search engine
Architecture of a search engineArchitecture of a search engine
Architecture of a search engine
Sylvain Utard
 
Stopwords in Search
Stopwords in SearchStopwords in Search
Stopwords in Search
Tomasz Sobczak
 
Elasto Mania
Elasto ManiaElasto Mania
Elasto Mania
andrefsantos
 
Creating an Open Source Genealogical Search Engine with Apache Solr
Creating an Open Source Genealogical Search Engine with Apache SolrCreating an Open Source Genealogical Search Engine with Apache Solr
Creating an Open Source Genealogical Search Engine with Apache Solr
Brooke Ganz
 
NEW LAUNCH! Natural Language Processing for Data Analytics - MCL343 - re:Inve...
NEW LAUNCH! Natural Language Processing for Data Analytics - MCL343 - re:Inve...NEW LAUNCH! Natural Language Processing for Data Analytics - MCL343 - re:Inve...
NEW LAUNCH! Natural Language Processing for Data Analytics - MCL343 - re:Inve...
Amazon Web Services
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Luiz Messias
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
You're not using ElasticSearch (outdated)
You're not using ElasticSearch (outdated)You're not using ElasticSearch (outdated)
You're not using ElasticSearch (outdated)
Timon Vonk
 
bm25 demystified
bm25 demystifiedbm25 demystified
bm25 demystified
Fan Robbin
 
Python.pptx
Python.pptxPython.pptx
Python.pptx
AshaS74
 
<Little Big Data #1> 한국어 채팅 데이터로 머신러닝 하기 (한국어 보이게 수정)
<Little Big Data #1> 한국어 채팅 데이터로  머신러닝 하기 (한국어 보이게 수정)<Little Big Data #1> 한국어 채팅 데이터로  머신러닝 하기 (한국어 보이게 수정)
<Little Big Data #1> 한국어 채팅 데이터로 머신러닝 하기 (한국어 보이게 수정)
Han-seok Jo
 
Text Mining
Text MiningText Mining
Text Mining
sathish sak
 

Similar to Data Exploration with Elasticsearch (20)

Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
 
Elasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easyElasticsearch Distributed search & analytics on BigData made easy
Elasticsearch Distributed search & analytics on BigData made easy
 
Data Exploration with Elasticsearch

  • 1. Data exploration with Elasticsearch. Aleksander M. Stensby, Monokkel A/S
  • 5. Aleksander M. Stensby
      • CEO of Monokkel AS
      • Previously COO of Integrasco AS
      • Working with search and data analysis since 2004
      • www.monokkel.io
  • 6.
      • CEO of Monokkel AS
      • Previously COO of Integrasco AS
      • Persistence, processing and presentation of data
      Persistence – Processing – Presentation
  • 12. Agenda
      • Search fundamentals primer
      • Intro to elasticsearch
      • Search, filter and aggregate!
  • 13. Agenda
      • Search fundamentals primer
      • Intro to elasticsearch
      • Search, filter and aggregate!
      … and some bonus visualisation!
  • 14. What we will not cover today…
      • All the different searches, filters and aggregations available in elasticsearch :)
      • Details on tokenization, analyzers…
      • Elasticsearch in production and performance tuning…
      • Data integration
  • 18. “We know what we are, but know not what we may be.”
  • 19. Term Vector for “We know what we are, but know not what we may be.”
      Term    Frequency
      we      3
      know    2
      what    2
      are     1
      but     1
      not     1
      may     1
      be      1
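The term vector on slide 19 can be reproduced with a few lines of Python. This is a minimal sketch, assuming a very simple analysis step (lowercasing and splitting on letters); the function name `term_vector` is made up for illustration and is not part of any Elasticsearch API.

```python
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens and count each term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


quote = "We know what we are, but know not what we may be."
vector = term_vector(quote)

# Print terms from most to least frequent, as in the slide's table.
for term, frequency in vector.most_common():
    print(term, frequency)
```

A real analyzer would do more than this (see the tokenization and normalization steps on slide 35), but the counting idea is the same.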
  • 21. Index
  • 22. “We were born to run”
      “No one told you when to run”
      “Some were born to sing the blues”
  • 26. The Inverted Index
      Term    Frequency   Documents
      blues   1           3
      born    2           1,3
      no      1           2
      one     1           2
      run     2           1,2
      sing    1           3
      some    1           3
      the     1           3
      to      3           1,2,3
      told    1           2
      we      1           1
      were    2           1,3
      when    1           2
      you     1           2
      (Term and Frequency form the dictionary; Documents are the postings.)
      1. “We were born to run”
      2. “No one told you when to run”
      3. “Some were born to sing the blues”
  • 27. Searching: born
      1. “We were born to run”
      2. “No one told you when to run”
      3. “Some were born to sing the blues”
  • 28. The Boolean Model (same dictionary and postings as on slide 26). Query: born
  • 29. Query: born blues
  • 30. Query: born OR blues
  • 31. Query: born AND blues
  • 32. Query: born NOT blues
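The inverted index and the Boolean queries from slides 26 to 32 can be sketched with plain Python sets. This is an illustration under simple assumptions (lowercasing, splitting on letters, postings stored as sets of document ids); the names `DOCS` and `build_inverted_index` are hypothetical, and Lucene's actual data structures are considerably more elaborate.

```python
import re
from collections import defaultdict

# The three example documents from the slides, keyed by document id.
DOCS = {
    1: "We were born to run",
    2: "No one told you when to run",
    3: "Some were born to sing the blues",
}


def build_inverted_index(docs):
    """Map each term (dictionary) to the set of document ids containing it (postings)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index


index = build_inverted_index(DOCS)
born, blues = index["born"], index["blues"]

# Boolean operators become set operations on the postings lists.
print("born OR blues: ", sorted(born | blues))
print("born AND blues:", sorted(born & blues))
print("born NOT blues:", sorted(born - blues))
```

With postings as sets, OR is a union, AND is an intersection and NOT is a difference, which is why Boolean retrieval on an inverted index is cheap.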
  • 33. Relevancy and Ranking
      • Term frequency
      • Inverse document frequency
      • Field-length norm
  • 34. Similarity
      1. “We were born to run”
      2. “No one told you when to run”
      3. “Some were born to sing the blues”
      [Plot: vectors on the axes “born” and “blues”]
      query:  [2,5]
      doc 3:  [2,5]
      doc 2:  [0,0]
      doc 1:  [2,0]
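One common way to compare the vectors on slide 34 is cosine similarity. The slide does not say which measure is used, so treating it as cosine similarity is an assumption here; the vector values are taken from the slide, and the helper name `cosine_similarity` is only illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors from the slide: first component is "born", second is "blues".
query = [2, 5]
doc_vectors = {1: [2, 0], 2: [0, 0], 3: [2, 5]}

# Rank documents from most to least similar to the query.
ranking = sorted(
    doc_vectors,
    key=lambda doc_id: cosine_similarity(query, doc_vectors[doc_id]),
    reverse=True,
)
print(ranking)
```

Document 3 points in the same direction as the query, document 1 shares only the "born" component, and document 2 shares nothing, so this sketch ranks them in that order.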
  • 35. Search fundamentals 101!
      • Tokenization
      • Normalization (case, stop words etc)
      • Stemming, synonyms
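The first two steps on slide 35 can be sketched as a tiny analysis pipeline. This is a simplified illustration, not how Elasticsearch analyzers are implemented: the stop word list is an arbitrary assumption, the function name `analyze` is hypothetical, and stemming and synonyms are left out.

```python
import re

# Illustrative stop word list; real analyzers ship with language-specific lists.
STOP_WORDS = {"the", "to", "a", "an", "and", "of"}


def analyze(text, stop_words=STOP_WORDS):
    """Tokenize, lowercase and drop stop words."""
    tokens = re.findall(r"\w+", text)                     # tokenization
    lowered = [token.lower() for token in tokens]         # case normalization
    return [t for t in lowered if t not in stop_words]    # stop word removal


print(analyze("Some were born to sing the blues"))
```

The same function has to be applied to documents at index time and to queries at search time, otherwise the terms will not line up in the index.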
  • 36. Brief history of elasticsearch
      Shay Banon
      -> Abstraction layer on top of Lucene
      -> Compass
      -> Rewritten: high performance, real-time, distributed
      -> Elasticsearch
      -> February 2010
  • 37. elasticsearch
      • Open source search engine, written in Java
      • Built on top of Lucene
      • Simple, coherent, RESTful API
      • Distributed, scalable search engine with real-time analytics
  • 38. “more useable and concise API, scalability, and operational tools on top of Lucene’s search implementation”
  • 39. Elasticsearch nodes and cluster
      [Diagram labels: node, node, node, cluster]
  • 40. Elasticsearch shards, nodes
      [Diagram labels: index, shard, node]
  • 41. Lucene index and segments
      [Diagram labels: segments, Lucene index]
  • 42. Much more than just search!
      • Real-time analytics
      • Log analysis
      • Prediction modelling
      • Recommendations
  • 43. DEMO: in 5 minutes
  • 44. DEMO
      • Install Elasticsearch
      • Load in some data
      • Run a very basic search
  • 45. DEMO: in 15 minutes
  • 46. Easy peasy…
      • http://www.elasticsearch.org/download
      • bin/elasticsearch or bin/elasticsearch.bat on Windows
      • http://localhost:9200/ or curl -X GET http://localhost:9200/
  • 47. Easy peasy lemon squeezy!
  • 49. Indexing data
      curl -XPUT 'http://localhost:9200/monokkel/user/aleks' -d '{ "name" : "Aleksander Stensby" }'
  • 50. Indexing data
      • shakespeare.json
          – http://www.elasticsearch.org/guide/en/kibana/current/snippets/shakespeare.json
      • curl -XPUT localhost:9200/_bulk --data-binary @shakespeare.json
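The file sent to `_bulk` on slide 50 is newline-delimited JSON: an action line followed by the document source, one pair per document. A minimal sketch of how such a body could be built is shown below; the helper `bulk_index_body` is hypothetical, and the `_type` field matches the index/type/id layout used on slide 49, which belongs to the Elasticsearch versions of that era.

```python
import json


def bulk_index_body(index, doc_type, docs):
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body(
    "monokkel", "user", {"aleks": {"name": "Aleksander Stensby"}}
)
print(body, end="")
```

The resulting string could be written to a file and posted with the `curl ... --data-binary @file` command from the slide; this sketch does not contact a cluster itself.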
  • 54. Mapping
      • Is it a number? String? Date?
      • Combining multiple fields?
      • Default values?
      • Stored?
      • Analyzed?
      • How should we tokenize/analyse/normalize the field?
  • 56. Mapping
      curl -XPUT http://localhost:9200/shakespeare -d '
      {
        "mappings" : {
          "_default_" : {
            "properties" : {
              "speaker" : { "type" : "string", "index" : "not_analyzed" },
              "play_name" : { "type" : "string", "index" : "not_analyzed" },
              "line_id" : { "type" : "integer" },
              "speech_number" : { "type" : "integer" }
            }
          }
        }
      }
      ';
  • 59. The Query DSL
      {
        "query": {YOUR_QUERY_HERE}
      }
  • 60. Match Query
      {
        "query": {
          "match": { "text_entry" : "romeo" }
        }
      }
  • 61. Multi Match Query
      {
        "query": {
          "multi_match": {
            "query": "romeo",
            "fields": [ "text_entry", "speaker" ]
          }
        }
      }
  • 62. Bool Query
      {
        "query": {
          "bool": {
            "must": [ ],
            "must_not": [ ],
            "should": [ ]
          }
        }
      }
  • 63. Bool Query
      {
        "query": {
          "bool": {
            "must": { "match": { "text_entry": "romeo" }},
            "must_not": { "match": { "speaker": "ROMEO" }},
            "should": [
              { "match": { "speaker": "JULIET" }},
              { "match": { "speaker": "FRIAR LAURENCE" }}
            ]
          }
        }
      }
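Since the Query DSL is just JSON, request bodies like the bool query on slide 63 can be assembled from plain dictionaries. The sketch below is one way to do that; the helpers `match` and `bool_query` are made up for illustration and are not functions of any Elasticsearch client library.

```python
import json


def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=None, must_not=None, should=None):
    """Build a bool query, leaving out clause types that were not supplied."""
    clauses = {"must": must, "must_not": must_not, "should": should}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# Lines mentioning "romeo" that are not spoken by ROMEO,
# preferring lines spoken by JULIET or FRIAR LAURENCE.
request_body = bool_query(
    must=[match("text_entry", "romeo")],
    must_not=[match("speaker", "ROMEO")],
    should=[match("speaker", "JULIET"), match("speaker", "FRIAR LAURENCE")],
)
print(json.dumps(request_body, indent=2))
```

The printed JSON has the same shape as the slide's example and could be sent as the body of a search request.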
  • 64. And lots more…
      filtered query, prefix query, simple query string query, range query, regexp query, term query, terms query, wildcard query, dis max query, geoshape query, nested query, more like this query, more like this field query, boosting query, common terms query, constant score query, fuzzy like this query, fuzzy like this field query, function score query, fuzzy query, has child query, has parent query, ids query, indices query, span first query, span multi term query, span near query, span not query, span or query, span term query, top children query, minimum should match, multi term query rewrite, template query
      http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html
  • 65. Filtering
      • Filters do not score, so they are faster to execute than queries
      • Filters can be cached in memory, which makes them significantly faster than queries
      If relevance is not important, use filters; otherwise, use queries!
  • 66. The Filtered Query:
      {
        "query": {
          "filtered": {
            "query": {YOUR_QUERY_HERE},
            "filter": {YOUR_FILTER_HERE}
          }
        }
      }
  • 67. The Filtered Query:
      {
        "query": {
          "filtered": {
            "query": { "match": { "content": "monokkel" }},
            "filter": { "term": { "tag": "awesome" }}
          }
        }
      }
  • 68. Term Filter
      {
        "query": {
          "filtered": {
            "filter": {
              "term": {
                "speaker": "ROMEO"
              }
            }
          }
        }
      }
  • 69. Terms Filter
      {
        "query": {
          "filtered": {
            "filter": {
              "terms": {
                "speaker": ["ROMEO", "JULIET"]
              }
            }
          }
        }
      }
  • 70. Bool Filter
      {
        "query": {
          "filtered": {
            "filter": {
              "bool" : {
                "must" : [],
                "should" : [],
                "must_not" : []
              }
            }
          }
        }
      }
  • 71. Range Filter
      {
        "query": {
          "filtered": {
            "filter": {
              "range" : {
                "price" : {
                  "gt" : 20,
                  "lt" : 40
                }
              }
            }
          }
        }
      }
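The filtered query shape from slides 66 to 71 can be built the same way as any other Query DSL body. The sketch below constructs the slide's form and, for comparison, the `bool` query with a `filter` clause that replaced the `filtered` query in later Elasticsearch releases (the `filtered` query was deprecated in 2.0 and removed in 5.0). Both helper names are hypothetical.

```python
import json


def filtered_query(query=None, filter_clause=None):
    """Build the 'filtered' form shown on the slides."""
    inner = {}
    if query is not None:
        inner["query"] = query
    if filter_clause is not None:
        inner["filter"] = filter_clause
    return {"query": {"filtered": inner}}


def bool_filter_query(query=None, filter_clause=None):
    """Build the equivalent bool query: scored part in 'must', unscored part in 'filter'."""
    inner = {}
    if query is not None:
        inner["must"] = query
    if filter_clause is not None:
        inner["filter"] = filter_clause
    return {"query": {"bool": inner}}


query = {"match": {"content": "monokkel"}}
tag_filter = {"term": {"tag": "awesome"}}

old_style = filtered_query(query, tag_filter)
new_style = bool_filter_query(query, tag_filter)
print(json.dumps(old_style, indent=2))
print(json.dumps(new_style, indent=2))
```

In both forms the filter part only decides which documents are eligible, while the query part determines the relevance score.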
  • 72. And lots more…
      match all filter, and filter, not filter, or filter, prefix filter, query filter, regexp filter, type filter, geo bounding box filter, geo distance filter, geo distance range filter, geo polygon filter, geoshape filter, geohash cell filter, has child filter, has parent filter, ids filter, indices filter, limit filter, nested filter, script filter
      http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
  • 74. Kibana
      • http://www.elasticsearch.org/overview/kibana/installation/
      • bin/kibana or bin/kibana.bat on Windows
      • http://localhost:5601/
  • 77. Aggregations
      • Buckets and metrics: partitioning documents based on criteria
      SELECT COUNT(color)
      FROM table
      GROUP BY color
      (COUNT(color) is the metric; GROUP BY color is the bucket.)
      An aggregation is a combination of buckets and metrics
  • 78. Aggregations
      {
        "aggs": {
          "speakers": {
            "terms": {
              "field": "speaker"
            }
          }
        }
      }
      ("speakers" is your aggregation name; "terms" is the bucket type.)
  • 80. Aggregations
      {
        "aggs": {
          "beertypes": {
            "terms": {
              "field": "beertype"
            }
          }
        }
      }
  • 81. Aggregations
      {
        "aggs": {
          "beertypes": {
            "terms": {
              "field": "beertype"
            },
            "aggs": {
              "avg_ibu": {
                "avg": {
                  "field": "ibu"
                }
              }
            }
          }
        }
      }
      ("avg_ibu" is your aggregation name; "avg" is the metric type.)
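What the nested aggregation on slide 81 computes can be mimicked in memory: a terms bucket groups documents by `beertype`, and an avg metric is calculated per bucket. The sample documents below are invented purely for illustration, and `terms_with_avg` is a hypothetical helper, not an Elasticsearch API; the real response format contains more fields than shown here.

```python
from collections import defaultdict

# Hypothetical sample documents; the slides do not show the actual data set.
BEERS = [
    {"beertype": "ipa", "ibu": 60},
    {"beertype": "ipa", "ibu": 70},
    {"beertype": "stout", "ibu": 40},
]


def terms_with_avg(docs, bucket_field, metric_field):
    """Bucket documents by one field, then average another field within each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


result = terms_with_avg(BEERS, "beertype", "ibu")
for beertype, stats in result.items():
    print(beertype, stats)
```

This mirrors the SQL analogy from slide 77: the bucket plays the role of GROUP BY, and the metric plays the role of the aggregate function.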
  • 82. Aggregations
      min, max, sum, avg, stats, extended stats, value count, percentiles, percentile ranks, cardinality, top hits, scripted metric, global, filter, filters, missing, nested, reverse nested, children, terms, significant terms, range, date range, ipv4 range, histogram, date histogram, geo bounds, geo distance, geohash grid
      http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
  • 83. And a whole lot more!
      • Geosearch, distance and bounds
      • “More Like This”
      • Suggesters / Autocomplete
      • Percolation
      • Language drivers
      • Scripting
  • 84. Further reading and some great resources!
      • http://www.elasticsearch.org/guide/
      • http://blog.monokkel.io/
      • https://found.no/foundation/
  • 85. Shameful self-promotion / Tarjei Romtveit