Meetup Elastic Lisboa
April 7, 2016
ElasticSearch - Know Your Data
April 7, 2016
Nuno Ochoa
Elastic Evangelist@Polarising
Elastic Stack
Polarising 2014
• Kibana: Data Visualization
• ElasticSearch: Store, Index, Search, Analyse
• Logstash, Beats: Data Ingestion
• Plugins
ElasticSearch – Store, Index, Search and Analyze
Distributed and Scalable
﹥ Designed to scale out: improved resiliency
﹥ High Availability
﹥ Multi tenant
Search and Analytics
﹥ Near real-time
﹥ Full text search
﹥ Multi language support
﹥ Aggregations
﹥ Geospatial info support
Key Features for Developers
﹥ Schemaless (structured and unstructured data)
﹥ Document oriented (JSON)
﹥ RESTful API
﹥ Client libraries (Java, .NET, PHP, etc.)
﹥ Built on top of Apache Lucene
ElasticSearch – Scale-Out and High-Availability
Elastic Cluster (one node)
Node 1: Shard 1, Shard 2, Shard 3, Shard 4
Add one node
Elastic Cluster (two nodes)
Node 1: Shard 1, Shard 2, Shard 3, Shard 4
Node 2: Shard 1, Shard 2, Shard 3, Shard 4
• 1 index
• Number of shards: 4
• Number of replicas: 1
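The scale-out diagram can be mimicked with a toy allocator. This is a minimal sketch, not Elasticsearch's real allocation logic: the function `allocate` and the node names are made up for illustration. It relies on one rule that Elasticsearch does enforce: two copies of the same shard never share a node, so with a single node the replicas stay unassigned until a second node joins.

```python
def allocate(num_shards, num_replicas, nodes):
    """Toy allocator (hypothetical helper, not an Elasticsearch API).

    Places each shard copy on the least-loaded node that does not
    already hold a copy of the same shard.
    """
    placement = {node: [] for node in nodes}  # node -> [(shard, role), ...]
    unassigned = []
    for copy_index in range(num_replicas + 1):
        role = "primary" if copy_index == 0 else "replica"
        for shard in range(1, num_shards + 1):
            # Only nodes without a copy of this shard are eligible.
            candidates = [
                node for node in nodes
                if all(held != shard for held, _ in placement[node])
            ]
            if not candidates:
                unassigned.append((shard, role))
                continue
            target = min(candidates, key=lambda node: len(placement[node]))
            placement[target].append((shard, role))
    return placement, unassigned


# 1 index, 4 shards, 1 replica, as on the slide.
one_node, pending = allocate(4, 1, ["node-1"])
two_nodes, still_pending = allocate(4, 1, ["node-1", "node-2"])
```

With one node all four replicas end up in `pending`; with two nodes every node holds one copy of each of the four shards, which is the picture after "Add one node".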
Logstash – Data Ingest
Collect, Enrich and Transport Data
﹥ Centralize data processing
﹥ Normalize/Format distinct data types
﹥ Easily extensible
﹥ 200+ plugins
Data Pipeline: Input => Filter(s) => Output
• Input: elasticsearch, file, syslog, rabbitmq
• Filter(s): csv, geoip, elasticsearch, mutate
• Output: file, mongodb, elasticsearch, rabbitmq
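The input => filter(s) => output flow can be sketched with plain Python generators. This is only an illustration of the pipeline idea, not Logstash code or configuration: the function names, the column names and the sample data are assumptions, loosely named after the file input and the csv and mutate filters.

```python
import csv
import io


def file_input(text):
    """Input stage: emit one event per non-empty line."""
    for line in text.splitlines():
        if line.strip():
            yield {"message": line}


def csv_filter(events, columns):
    """Filter stage: split the raw message into named fields."""
    for event in events:
        values = next(csv.reader(io.StringIO(event["message"])))
        event.update(zip(columns, values))
        yield event


def mutate_filter(events, convert):
    """Filter stage: convert selected fields to another type."""
    for event in events:
        for field, cast in convert.items():
            if field in event:
                event[field] = cast(event[field])
        yield event


def list_output(events, sink):
    """Output stage: ship every event to a destination (here, a list)."""
    for event in events:
        sink.append(event)


def run_pipeline(text):
    sink = []
    events = file_input(text)
    events = csv_filter(events, ["product", "store", "price"])
    events = mutate_filter(events, {"price": float})
    list_output(events, sink)
    return sink


events = run_pipeline("tv,Lisboa,499.90\nphone,Porto,299\n")
```

Each stage only sees the events handed over by the previous one, which is what makes it easy to add or swap stages.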
Beats – Data Ingest
Collect, Parse and Ship Data
﹥ Lightweight data shippers
﹥ Forward host-based operational metrics
﹥ Single purpose
﹥ Libbeat, framework to build new Beats
Operational Data Examples
﹥ Wire Data => Packetbeat (multiple decoders available, like HTTP, MySQL)
﹥ System Stats => Topbeat
﹥ Logs => Filebeat, Winlogbeat
Kibana – Data Visualization
Search and Analytics
﹥ Web-based interface for visualizing data stored in ES
﹥ Explore and analyze patterns in data
﹥ Leverage the power of Elasticsearch analytics capabilities
Visualization
﹥ Supports multiple visualization types: charts, maps, histograms
﹥ Share dashboards and embed them into operational dashboards
﹥ Supports custom visualizations and applications
﹥ Plugins for cluster management and administration
Elastic – Use Cases
Search
﹥ Full text search (fast)
﹥ Fuzzy text search
﹥ Geospatial
Analytics
﹥ Explore your data
﹥ Ask complex questions about your data
﹥ Powered by the ES aggregations feature
﹥ Kibana!
Logging
﹥ Logstash or Beats => ElasticSearch => Kibana
﹥ Centralize log storage, analytics and visualization
Elastic – Aggregations
﹥ Summarize our data vs looking for particular documents
Sample Questions
﹥ Most popular items?
﹥ Average shopping value for each day?
﹥ Which stores sell the most?
﹥ Sky is the limit …
﹥ Quick and near real-time, just like search
﹥ Powerful for reports and dashboards => No need for long-running jobs
﹥ Can be combined with search/filter queries:
﹥ Most popular items? => Most popular Electronic items?
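Combining an aggregation with a query comes down to adding a `query` section to the same search request. The sketch below only builds the request body as a Python dict and sends nothing. The helper name, the aggregation name `popular_items` and the `category` field are assumptions for illustration; `product` is the field used in the deck's own example.

```python
import json


def popular_items_request(category=None, size=10):
    """Build a search body for "most popular items" (hypothetical helper).

    Without a category the aggregation runs over all documents; with one,
    a term query narrows the documents before they are bucketed.
    """
    body = {
        "size": 0,  # no hits wanted, only the aggregation result
        "aggs": {
            "popular_items": {"terms": {"field": "product", "size": size}}
        },
    }
    if category is not None:
        # "category" is an assumed field name, not taken from the slides.
        body["query"] = {"term": {"category": category}}
    return body


# Most popular items? => Most popular Electronic items?
print(json.dumps(popular_items_request("electronics"), indent=2))
```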
Elastic – Aggregations
Concepts
﹥ Buckets: Collections of documents that meet a criterion
﹥ Metrics: Statistics calculated on the documents in a bucket
﹥ An aggregation is a combination of one or more buckets and zero or more metrics
SQL
SELECT COUNT(product)
FROM SALE
GROUP BY product
Elastic
GET sales/_search
{
  "size": 0,
  "aggs": {
    "products": {
      "terms": {
        "field": "product",
        "size": 100
      }
    }
  }
}
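To make the bucket idea concrete, here is a small in-memory imitation of a terms aggregation. It is a sketch of the concept only: `terms_aggregation` and the sample documents are made up, and the real aggregation runs distributed across shards. The output mimics the `key` / `doc_count` shape of the buckets Elasticsearch returns.

```python
from collections import Counter


def terms_aggregation(docs, field, size=100):
    """Group documents by the value of `field`: one bucket per value."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Largest buckets first, limited to `size` buckets.
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


# Hypothetical sample data standing in for the "sales" index.
sales = [
    {"product": "tv", "price": 500},
    {"product": "phone", "price": 300},
    {"product": "tv", "price": 450},
]
buckets = terms_aggregation(sales, "product")
```

Here `doc_count` is the only number computed per bucket; a metric such as an average price would be calculated on the documents inside each bucket.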
Elastic – Aggregations
Types
﹥ Bucketing: builds buckets according to a criterion
﹥ Histogram, Date Histogram, Geo Distance, Terms
﹥ Metrics: compute metrics over a set of documents
﹥ Avg, Stats, Sum, Top Hits
﹥ Pipeline: aggregate the output of other aggregations and their associated metrics
Sub-Aggregation
﹥ Bucketing aggregations can have sub-aggregations
﹥ No hard limit on how deeply aggregations can be nested
﹥ Sub-aggregations will execute within the context of parent buckets
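A sub-aggregation running "within the context of parent buckets" can be pictured as two steps: first bucket the documents, then compute the metric separately inside each bucket. The sketch below answers "average shopping value for each day?" in memory. The function name and the `date` and `price` fields are assumptions; in Elasticsearch this would be a date histogram with an avg sub-aggregation.

```python
from collections import defaultdict


def average_per_day(docs):
    """Parent bucketing by day, then an average metric inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["date"]].append(doc["price"])  # bucketing step
    return {
        day: sum(prices) / len(prices)  # metric computed per bucket
        for day, prices in sorted(buckets.items())
    }


# Hypothetical sample documents.
sales = [
    {"date": "2016-04-06", "price": 10.0},
    {"date": "2016-04-06", "price": 30.0},
    {"date": "2016-04-07", "price": 50.0},
]
daily_average = average_per_day(sales)
```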
João Duarte
@jsvd
What's happening at Elastic?
Special Elastic{ON} 2016 Edition
• February 2016
• San Francisco
• 2000+ attendees
• 42 sessions
Introducing the Elastic Stack
Reindex API (2.3)
POST /_reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
Update by Query (2.3)
POST /twitter/_update_by_query
{
  "script": {
    "inline": "ctx._source.likes++"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
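What the update-by-query request does can be imitated in memory: select the documents matching the query, then run the script on each of them. This is a toy model and not the Elasticsearch API: both function names and the sample documents are hypothetical, and treating a missing `likes` field as 0 is an assumption of this sketch.

```python
def update_by_query(docs, term, script):
    """Run `script` on every document whose field equals the given value.

    `term` is a (field, value) pair, standing in for a term query.
    Returns the number of updated documents.
    """
    field, value = term
    updated = 0
    for doc in docs:
        if doc.get(field) == value:
            script(doc)
            updated += 1
    return updated


def increment_likes(source):
    """Stand-in for the inline script ctx._source.likes++."""
    source["likes"] = source.get("likes", 0) + 1


# Hypothetical contents of the "twitter" index.
tweets = [
    {"user": "kimchy", "likes": 1},
    {"user": "someone_else", "likes": 5},
]
updated = update_by_query(tweets, ("user", "kimchy"), increment_likes)
```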
Ingest Node! (5.0)
• Grok
• Geo
Pipeline: The Next Generation
[Diagrams: the I / F / O pipeline stages as of 2.1, 2.2 and 5.x]
Automatic configuration reload (2.3)
Metrics API (5.x)
GET /_node/stats/events
GET /_stats/jvm
GET /_node/hot_threads
GET /_plugins
Management API (5.x)
GET /_logstash
GET /_logstash/${role}/config/current
POST /_logstash/${role}/config/_rollback
GET /_logstash/${role}/config/${version_number}
GET /_logstash/${role}/nodes/${node_name}
What else in Logstash
• Community Maintainers <3
• Dead letter queues
• Centralized configuration
• Management UIs
What to talk about: Beats
• Redis/Kafka output
• Filtering
• Metricbeat
• Tons of new community beats!!
1 Ability to customize colors, text, numbers, labels, layouts, skins, and visualizations.
2 All-new visualization tools for Graph and Time Series data.
3 Strong integration with Security, Monitoring, and the rest of the Elastic Stack
What to talk about: Kibana
• space efficiency up 20% => more real estate
• export all the things!
• plugin all the things!
• kibana app generator
• status page
We love extensions
Packs
Security, Alerting, Monitoring
NO OPEN SOURCE
ENTERPRISE EDITION
Elasticsearch + Kibana as a Service
Latest release of the Elastic Stack and X-Pack
Deploy Download
What else in Elasticsearch
• Query Profiler
• Painless scripting engine
• New relevancy scoring algorithm (BM25)
• Security Manager on by default
• Move from string field to keyword/text
• Java HTTP Client
• So many geo improvements!
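For readers new to BM25, the scoring algorithm named in the list above, the idea fits in a few lines. The sketch below uses a common formulation of BM25 with the usual defaults k1 = 1.2 and b = 0.75 and is not a copy of the Lucene implementation. The function name and the tiny corpus are made up, and documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms weigh more than common ones.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term frequency saturates, and long documents are penalized.
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Hypothetical corpus of three tokenized documents.
corpus = [
    ["elastic", "search", "rocks"],
    ["elastic", "elastic", "stack"],
    ["kibana", "dashboards"],
]
scores = bm25_scores(["elastic"], corpus)
```

The second document mentions the term twice and scores highest, but less than double the first, because the term-frequency contribution saturates.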