Introduction to ELK

20160316
Yuhsuan_Chen

2
ELK
• What is ELK?
– Elasticsearch
It provides a distributed, multitenant-capable full-text search engine with an HTTP web
interface and schema-free JSON documents
– Logstash
It is a tool to collect, process, and forward events and log messages
– Kibana
It provides visualization capabilities on top of the content indexed on an Elasticsearch
cluster.
[Architecture diagram: Servers (each running Filebeat) -> Logstash -> Elasticsearch -> Kibana -> Nginx -> User]

3
ELK
• Who uses it?
– SOUNDCLOUD
• https://www.elastic.co/assets/blt7955f90878661eec/case-study-soundcloud.pdf
– TANGO
• https://www.elastic.co/assets/blt0dc7d9c62f60d38f/case-study-tango.pdf
– GOGOLOOK
• http://www.slideshare.net/tw_dsconf/elasticsearch-kibana
– KKBOX
• http://www.slideshare.net/tw_dsconf/kkbox-51962632

4
Beats
• The Beats are open source data shippers that you install as agents on your
servers to send different types of operational data to Elasticsearch.
• Beats can send data directly to Elasticsearch or send it to Elasticsearch via
Logstash, which you can use to enrich or archive the data

5
Logstash
• The ingestion workhorse for Elasticsearch and more
– Horizontally scalable data processing pipeline with strong Elasticsearch and
Kibana synergy
• Pluggable pipeline architecture
– Mix, match, and orchestrate different inputs, filters, and outputs to play in
pipeline harmony
• Community-extensible and developer-friendly plugin ecosystem
– Over 200 plugins available, plus the flexibility of creating and contributing your
own

6
Logstash
• Logstash Download Link
– Linux
https://download.elastic.co/logstash/logstash/packages/debian/logstash_2.2.2-1_all.deb
– Windows
https://download.elastic.co/logstash/logstash/logstash-2.2.2.zip
• How to set up Logstash
– http://howtorapeurjob.tumblr.com/post/140724250861/

7
Logstash
• Start up
– Linux:
service logstash restart
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/ -v
– Windows:
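{file_path}\bin\logstash.bat
• Parameters
-h or --help: show help information
-t or --configtest: check whether the config is valid
-f: specify the config file
-v or --debug: output debug information
-w: number of threads used to process data (default 4)
-l: specify the log output path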

8
Logstash - Configure
• Configure
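– The main structure has three sections, input / filter / output, and each supports multiple sources, destinations, and plugins
– The configuration can live in one file or be split across several files; Logstash automatically finds the corresponding sections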
• Configure Structure
input {
….
}
filter {
….
}
output {
….
}

9
Logstash - INPUT
• stdin
input {
  # Read events typed on the console
  stdin {}
}
• beats
input {
  # Events come from Beats on port 5044; the event type is set to HTCLog
  beats {
    port => 5044
    type => "HTCLog"
  }
}

10
Logstash - INPUT
• generator
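input {
  generator {
    # The source data is given line by line, such as the two lines below
    lines => [
      "02-23 10:19:58.588 1696 1750 D 1",
      "02-23 10:19:58.695 904 904 D 2"
    ]
    # Number of times each line is emitted; here each line is sent once
    count => 1
    type => "device_log"
  }
}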

11
Logstash - INPUT
• file
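input {
  file {
    # Input file paths; use an array to list several paths
    path => [
      "/home/elasticadmin/Desktop/TestLog/*",
      "/home/elasticadmin/Desktop/TestLog/**/*.txt"
    ]
    # How often (in seconds) to check the given paths for changes; the default is 15 seconds
    discover_interval => 1
    # Where to start reading a file; the default is the end of the file.
    # Set this to beginning to read from the start of the file
    start_position => "beginning"
    # Records which files have already been read. Deleting this file makes Logstash
    # read the earlier data again. On Linux, set it to /dev/null to re-read on every start
    sincedb_path => "/home/elasticadmin/Desktop/.sincedb_TestLog"
  }
}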

12
Logstash - FILTER
• grok
• Grok test site
http://grokdebug.herokuapp.com/
filter {
  # If the incoming event's type is HTCLog, parse it with the grok plugin
  if [type] == "HTCLog" {
    grok {
      match => {"message" => "(?<log_time>(0[0-9]+|1[0-2])-[0-6][0-9] %{TIME})\s+(?<pid>([0-9]+))\s+(?<ppid>([0-9]+))\s+(?<log_level>([Ww]))\s+(?<log>(.*))"}
      # Add a field named file_type with the value kernel_log to the output
      add_field => {
        "file_type" => "kernel_log"
      }
    }
  }
}
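To get a feel for what the grok filter above extracts, the following Python sketch applies a comparable regular expression to a single log line. It is only an approximation under stated assumptions: TIME is a simplified stand-in for grok's %{TIME} pattern, the sample line in the usage note is made up, and parse_line is a hypothetical helper, not part of Logstash.

```python
import re

# Assumption: simplified stand-in for grok's %{TIME} (HH:MM:SS with optional fraction).
TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d+)?"

# Same field layout as the grok pattern: date/time, pid, ppid, log level, rest of line.
LOG_PATTERN = re.compile(
    r"(?P<log_time>(?:0[0-9]+|1[0-2])-[0-6][0-9] " + TIME + r")\s+"
    r"(?P<pid>[0-9]+)\s+"
    r"(?P<ppid>[0-9]+)\s+"
    r"(?P<log_level>[Ww])\s+"
    r"(?P<log>.*)"
)


def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Mirrors add_field in the filter above.
    fields["file_type"] = "kernel_log"
    return fields
```

For example, parse_line("02-23 10:19:58.588 1696 1750 W low memory") yields the five named fields plus file_type, while a line whose level is D returns None because the pattern only accepts W or w as the log level.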

13
Logstash - FILTER
• mutate
filter {
  mutate {
    # Add a field
    add_field => {"add word for host" => "Hello world, from %{host}"}
    # Add a tag (add_tag takes an array)
    add_tag => ["Hello world, from %{host}"]
    # Remove the host field
    remove_field => ["host"]
    # Remove the tag named host
    remove_tag => ["host"]
    # Rename the host field
    rename => {"host" => "hostname"}
    # Replace the content of message
    replace => {"message" => "The message was removed"}
  }
}

14
Logstash - OUTPUT
• stdout
output {
  # Print the results to the console
  stdout {}
}
• file
output {
  file {
    # Output file path
    path => "C:\Users\Yuhsuan_chen\Desktop\Elastic\logstash\result.gzip"
    # Compress the output with gzip
    gzip => true
  }
}

15
Logstash - OUTPUT
• elasticsearch
output {
  elasticsearch {
    # Elasticsearch host address
    hosts => ["10.8.49.16:9200"]
    # Name of the index to write the data into
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    # Name of the document type to write the data into
    document_type => "%{[@metadata][type]}"
  }
}
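The index setting above produces one index per Beat per day. The Python sketch below shows the resulting naming scheme, assuming the date part is taken from the event timestamp in UTC; daily_index_name is a hypothetical helper for illustration, not a Logstash API.

```python
from datetime import datetime, timezone


def daily_index_name(beat, timestamp):
    """Mimic "%{[@metadata][beat]}-%{+YYYY.MM.dd}" for a timezone-aware timestamp."""
    # Assumption: the date is formatted from the event time expressed in UTC.
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{beat}-{day}"


if __name__ == "__main__":
    event_time = datetime(2016, 3, 16, 12, 0, tzinfo=timezone.utc)
    print(daily_index_name("filebeat", event_time))
```

With this scheme an event shipped by Filebeat on 16 March 2016 lands in filebeat-2016.03.16, which is why the Kibana index patterns later in the deck end in a wildcard.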

16
Elasticsearch
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to
store, search, and analyze big volumes of data quickly and in near real time.

17
Elasticsearch
• Elasticsearch on Ubuntu
• Elasticsearch cluster on Windows
• Elasticsearch Cluster Design
• CRUD in Elasticsearch
• Query DSL in Elasticsearch

18
Elasticsearch on Ubuntu
• Close Firewall & Edit Configuration File
# Disable the firewall
sudo ufw disable
# Edit the configuration file
sudo gedit /etc/elasticsearch/elasticsearch.yml
• Configuration File
# elasticsearch.yml
network.host: host_name
cluster.name: "scd500"
node.name: "s1"
discovery.zen.ping.unicast.hosts: ["host_name"]
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 1
node.master: true
node.data: true
• Restart Service
# Restart the service
sudo service elasticsearch restart

19
Elasticsearch on Ubuntu
• Install a REST client browser plugin for working with Elasticsearch
– Firefox: https://goo.gl/tzwE76 (RESTClient)
– Chrome: https://goo.gl/GCc75v (Postman)
• Check cluster health
http://hostname:9200/_cluster/health?pretty
• Check service status
http://hostname:9200

20
Elasticsearch cluster on Windows
• Set Java Path on Windows
• Set Configuration
File Path: elasticsearch-2.2.0\config\elasticsearch.yml
# Node S1: elasticsearch-2.2.0\config\elasticsearch.yml
network.host: ES1        # ES2 / ES3 on the other nodes
cluster.name: "scd500"
node.name: "s1"          # "S2" / "S3" on the other nodes
discovery.zen.ping.unicast.hosts: ["ES1"]
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 1
node.master: true
node.data: true

21
Elasticsearch cluster on Windows
• Setting in elasticsearch.yml
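– network.host: must be a hostname that the other nodes can ping; remember to disable the firewall
– cluster.name: must be identical on every node; otherwise nodes are not treated as the same cluster even on the same network segment
– node.name: the node name; it can be omitted, but then a new name is generated on every restart
– discovery.zen.ping.unicast.hosts: used to discover hosts; in testing inside the HTC network, nodes joined the cluster automatically only when this was set to the master host, whereas nodes on a single machine are detected without it
– discovery.zen.ping.multicast.enabled: set to false to avoid discovering unintended clusters or nodes
– node.master: whether the node is a master node; if not specified, it is set automatically when the node joins
– node.data: whether the node stores data
– path.data: where Elasticsearch stores its data
– path.logs: where Elasticsearch stores its logs
– http.port: the default port is 9200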

22
Elasticsearch Cluster Design
• Check Cluster Status
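– Status: red -> some primary shards (and their replicas) are not active
– Status: yellow -> all primary shards are active, but some replicas are not
– Status: green -> all primary shards and replicas are active
GET localhost:9200/_cluster/health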
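As a small illustration of reading the health response, the Python sketch below maps the status field to the meanings listed above. The sample JSON in the usage note is a made-up, trimmed response (a real response carries more fields), and describe_health is a hypothetical helper.

```python
import json

# Meanings of the three cluster states, as listed above.
STATUS_MEANING = {
    "green": "all primary and replica shards are active",
    "yellow": "all primary shards are active, but some replicas are not",
    "red": "at least one primary shard is not active",
}


def describe_health(response_text):
    """Summarize the body returned by GET /_cluster/health."""
    health = json.loads(response_text)
    status = health["status"]
    return f"{health['cluster_name']}: {status} ({STATUS_MEANING[status]})"
```

Calling describe_health('{"cluster_name": "scd500", "status": "yellow"}') returns a one-line summary, which is typically what a single-node setup with replicas enabled reports, since replicas cannot be placed on the same node as their primary.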

23
• Index / Shard / replicas
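– The shard is the basic storage unit in Elasticsearch
– When an index has more than one shard, an algorithm decides which shard each document is stored in
– replicas is the number of replica copies; plan the shard and replica layout when you create the index, because the number of primary shards is fixed at creation time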
24
• Single Shard
– One machine holds a single shard, with no replicas
POST localhost:9200/{index_name}
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

25
• 2 Shards
– When another node is started, the system automatically moves the second shard to Node 2
POST localhost:9200/{index_name}
{
"settings": {
}
}

26
• 3 Shards , 1 replica
– When another node is started, the system automatically moves the second shard to Node 2
POST localhost:9200/{index_name}
{
"settings": {
}
}
[Figures: 3 shards, 0 replicas, 1 node; 3 shards, 1 replica, 3 nodes; 3 shards, 1 replica, 2 nodes]

27
CRUD in Elasticsearch
• REST API
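In Elasticsearch, the REST API has the following structure:
http://localhost:9200/<index>/<type>/[<id>]
When calling it, index and type are required; id is optional.
When inserting a document without an id, send it with POST, not PUT.
We use designing Facebook as the example for the CRUD operations.
Each message contains the following basic data:
{
  "message" : "",
  "feel" : "",
  "location" : "",
  "datetime" : ""
}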
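The URL rule above can be written down as a small Python helper. This is a sketch only: document_request is a hypothetical function, and the default host simply reuses the localhost address from the slides.

```python
def document_request(index, doc_type, doc_id=None, host="http://localhost:9200"):
    """Return (method, url) for storing one document, following the rule above."""
    if not index or not doc_type:
        raise ValueError("index and type are required")
    if doc_id is None:
        # No id: Elasticsearch generates one, and the request must be a POST.
        return "POST", f"{host}/{index}/{doc_type}"
    # Explicit id: the document is stored at that id.
    return "PUT", f"{host}/{index}/{doc_type}/{doc_id}"


if __name__ == "__main__":
    print(document_request("facebook", "yuhsuan_chen"))
    print(document_request("facebook", "yuhsuan_chen", "1"))
```

The helper only builds the method and URL; sending the request is left to whichever HTTP client you use.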
28
• Create Index
By default there are no indices, so first create an index named facebook.
– Linux
curl -XPOST 'http://localhost:9200/facebook'
– REST Client
POST http://localhost:9200/facebook
– Result
{
  "acknowledged": true
}
• Query Indices
GET http://localhost:9200/_cat/indices?v
– Result
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   facebook   5   1          0            0       795b           795b

29
• Create Document
– Linux
curl -XPOST 'http://localhost:9200/facebook/yuhsuan_chen' -d '{"message": "Hello!!!","feel": "Good","location": "Taoyuan City","datetime": "2016-03-01T12:00:00"}'
– REST Client
POST http://localhost:9200/facebook/yuhsuan_chen
{
  "message": "Hello!!!",
  "feel": "Good",
  "location": "Taoyuan City",
  "datetime": "2016-03-01T12:00:00"
}
– Result
{
  "_index":"facebook",
  "_type":"yuhsuan_chen",
  "_id":"AVM7YFHl16LIHfUR_IEh",   -> auto-generated, or you can specify your own
  "_version":1,                   -> revision number of the document
  "_shards":{
    "total":2,                    -> number of shard copies (primary plus replicas) the write targets
    "successful":1,
    "failed":0
  },
  "created":true
}

30
• Query Data
# Search without any condition
GET http://localhost:9200/_search?
# Search by keyword
GET http://localhost:9200/_search?q='good'
# Search within a specific index
GET http://localhost:9200/facebook/_search?q='good'
# Search within a specific type
GET http://localhost:9200/facebook/yuhsuan_chen/_search?q='good'
# Find documents whose message field contains hello
GET http://localhost:9200/facebook/_search?q=message:hello
# Find documents that have the datetime field and whose other data contains hello
GET http://localhost:9200/facebook/_search?_field=datetime&q=hello

31
• Delete Data
# Delete a document by id
DELETE http://localhost:9200/facebook/yuhsuan_chen/1
# Delete an index
DELETE http://localhost:9200/facebook
# Delete a type
As of version 2.2, deleting a type is no longer supported; only the whole index can be deleted
https://goo.gl/znbOuB

32
• Update Data
# Replace the whole document
POST http://localhost:9200/facebook/yuhsuan_chen/1
{
  "message": "Hello!!!",
  "feel": "Good",
  "location": "Taoyuan City",
  "datetime": "2016-03-01T12:00:00"
}
# Update a single field inside the document
POST http://localhost:9200/facebook/yuhsuan_chen/1/_update
{
  "doc": {
    "message": "haha"
  }
}

33
• Bulk
– If the POST URL already contains the index and type, you do not need to specify them in the body
– If you want to specify them yourself, define _index and _type in the action line
– The id can be omitted, but to delete a document you still need to know its id
Important: the request body must end with a newline character, and each action and each document must be on a single line.
# Create documents
POST http://localhost:9200/facebook/yuhsuan_chen/_bulk
{"create":{"_id":"1"}}
{"message":"Hello!!!","feel":"Good","location": "Taoyuan City", "datetime": "2016-03-01T12:00:00"}
{"create":{"_index":"facebook","_type":"Other","_id":"2"}}
{"message": "The weather is so good.","feel": "Comfortable","location": "Taoyuan City", "datetime": "2016-03-01T12:01:00"}
{"create":{"_id":"3"}}
{"message": "I like my job.","feel": "bad","location": "Taoyuan City", "datetime": "2016-03-01T12:02:00"}
{"create":{}}
{"message": "Time for launch.","feel": "hungry","location": "Taoyuan City", "datetime": "2016-03-01T12:03:00"}
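Since the bulk format is easy to get wrong by hand, here is a minimal Python sketch that assembles a bulk body from (action, document) pairs and always appends the required trailing newline. build_bulk_body is a hypothetical helper, not part of any Elasticsearch client library.

```python
import json


def build_bulk_body(operations):
    """Build a newline-delimited bulk body from (action, source) pairs.

    source is None for actions that carry no document line, such as delete.
    """
    lines = []
    for action, source in operations:
        # Each action and each document must occupy exactly one line.
        lines.append(json.dumps(action, separators=(",", ":")))
        if source is not None:
            lines.append(json.dumps(source, separators=(",", ":")))
    # The body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        ({"create": {"_id": "1"}}, {"message": "Hello!!!", "feel": "Good"}),
        ({"delete": {"_id": "2"}}, None),
    ])
    print(body, end="")
```

The resulting string can be sent as the request body of a POST to a _bulk endpoint; json.dumps never emits raw line breaks, so every entry stays on one line.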

34
• Bulk
Important: the request body must end with a newline character.
# Create three documents, change the location of the first to Taipei City, and delete the second
POST http://localhost:9200/facebook/_bulk
{"create":{"_type":"test","_id":"1"}}
{"message":"test1","feel":"Good","location": "Taoyuan City", "datetime": "2016-03-01T12:00:00"}
… 01T12:00:00"}
… 01T12:00:00"}
{"update":{"_type":"test","_id":"1"}}
{"doc": { "location": "Taipei City"}}
{"delete":{"_type":"test","_id":"2"}}
Show the final result of the operations above:
GET http://localhost:9200/facebook/_search?q=_type:test

35
• Import Data
– Syntax
curl -XPOST localhost:9200/_bulk --data-binary @file.json
– Example
curl -XPOST "http://yuhsuan_chen_w7p:9200/_bulk" --data-binary @C:\Users\Yuhsuan_chen\Desktop\shakespeare.json

36
Query DSL in Elasticsearch
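• Notes
With curl on Linux, you can send the query body directly with GET.
curl on Windows has trouble parsing strings, especially single and double quotes, so the Windows version of curl is not recommended. On Windows, use Postman for queries instead.
If you use an HTTP library, POST is recommended, because most HTTP libraries do not support sending a body with GET.
The IETF specification does not define whether GET may carry a body. The Elasticsearch developers consider GET the more appropriate method, but in practice using POST for queries avoids problems.
http://tools.ietf.org/html/rfc7231#section-4.3.1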
# Create test data
POST http://localhost:9200/facebook/yuhsuan_chen/_bulk
{"create":{"_id":"1"}}
{"message":"Hello!!!","feel":"Good","location": "Taoyuan City", "datetime": "2016-03-01T12:00:00","temp":25.6}
{"create":{"_index":"facebook","_type":"Other","_id":"2"}}
{"message": "The weather is so good.","feel": "Comfortable","location": "Taipei City", "datetime": "2016-03-01T12:01:00","temp":27.3}
{"create":{"_id":"3"}}
{"message": "I like my job.","feel": "bad","location": "Taoyuan City", "datetime": "2016-03-01T12:02:00"}
{"create":{}}
{"message": "Time for launch.","feel": "hungry","location": "Taoyuan City", "datetime": "2016-03-01T12:03:00","temp":30.3}

37
• Query All Data
# Use match_all to query all data
POST http://localhost:9200/facebook/_search
{
  "query" : {
    "match_all" : {
    }
  }
}
• Query Specific Field Data
# Use match to query "city" in the location field
{
  "query" : {
    "match" : {
      "location" : "city"
    }
  }
}

38
• Query Multi Data
# Find documents whose location or message field contains hello or good
{
  "query" : {
    "multi_match" : {
      "query": "hello good",
      "fields": ["location","message"]
    }
  }
}
• Query Value
# Find temp >= 27; gt -> greater than, gte -> greater than or equal, lt -> less than, lte -> less than or equal
POST http://localhost:9200/facebook/yuhsuan_chen/_search
{
  "query" : {
    "range" : {
      "temp":{
        "gte":27
      }
    }
  }
}
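The four range operators can be illustrated locally in Python. This sketch only mimics the comparison logic on the temp values from the test data; it ignores index and type (the request above is limited to the yuhsuan_chen type), and matches_range, ids_in_range, and the "auto" id are hypothetical names used for illustration.

```python
import operator

# Range operators as described above: gt (>), gte (>=), lt (<), lte (<=).
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches_range(doc, field, bounds):
    """Return True if doc[field] satisfies every bound, e.g. {"gte": 27}."""
    value = doc.get(field)
    if value is None:
        # A document without the field cannot satisfy a range condition.
        return False
    return all(RANGE_OPS[name](value, bound) for name, bound in bounds.items())


# temp values from the test data; "auto" stands in for the auto-generated id.
DOCS = [
    {"_id": "1", "temp": 25.6},
    {"_id": "2", "temp": 27.3},
    {"_id": "3"},
    {"_id": "auto", "temp": 30.3},
]


def ids_in_range(docs, field, bounds):
    """Return the ids of all documents that satisfy the range bounds."""
    return [doc["_id"] for doc in docs if matches_range(doc, field, bounds)]
```

Several bounds can be combined in one condition, for example {"gt": 27.3, "lte": 30.3}, in which case a document must satisfy all of them.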

39
• Query Data By Regular Expression
# Find messages that match "fo.*" or "he.*"
{
  "query" : {
    "regexp" : {
      "message":"fo.*|he.*"
    }
  }
}
• Order Data
# Query all data, order _type by desc, order _id by asc
{
  "query" : {
    "match_all" : {}
  },
  "sort":[
    {"_type" : {"order":"desc"}},
    {"_id" : {"order":"asc"}}
  ]
}

40
• Query Condition
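– must: the condition has to match
– must_not: the condition must not match
– should: raises the relevance score of documents that match
– At least one should clause is required, unless must / must_not is present, in which case should is optional
# message must match ".*i.*" and must not contain "good"; documents whose feel contains bad are ranked higher
{
  "query":{
    "bool":{
      "must": {"regexp":{"message":".*i.*"}},
      "must_not": {"regexp":{"message":"good"}},
      "should":{"match":{"feel":"bad"}}
    }
  }
}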
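When queries are built in code rather than typed by hand, the bool body can be assembled from its three optional parts. The Python sketch below does that and rejects an empty bool query; bool_query is a hypothetical helper, not a function of any Elasticsearch client.

```python
import json


def bool_query(must=None, must_not=None, should=None):
    """Assemble a bool query body from the optional must / must_not / should parts."""
    clauses = {}
    for name, clause in (("must", must), ("must_not", must_not), ("should", should)):
        if clause is not None:
            clauses[name] = clause
    if not clauses:
        raise ValueError("a bool query needs at least one clause")
    return {"query": {"bool": clauses}}


if __name__ == "__main__":
    body = bool_query(
        must={"regexp": {"message": ".*i.*"}},
        must_not={"regexp": {"message": "good"}},
        should={"match": {"feel": "bad"}},
    )
    print(json.dumps(body, indent=2))
```

Leaving out the should argument produces the same filter without the ranking boost, which makes it easy to compare the two result orders.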

41
• Query by “bool”
With the should clause, _id:3 ranks first; without it, _id:3 ranks lower.

42
Kibana
Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.
You use Kibana to search, view, and interact with data stored in Elasticsearch indices
• Flexible analytics and visualization platform
• Real-time summary and charting of streaming data
• Intuitive interface for a variety of users
• Instant sharing and embedding of dashboards
• Configure Kibana & Elasticsearch

43
Kibana
• Set Configuration
• YML
sudo gedit /opt/kibana/config/kibana.yml
# config/kibana.yml
server.host: "yuhsuan_chen_w7p.htctaoyuan.htc.com.tw"
elasticsearch.url: "http://yuhsuan_chen_w7p.htctaoyuan.htc.com.tw:9200"

44
Kibana
• Download Sample Files
https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json
https://github.com/bly2k/files/blob/master/accounts.zip?raw=true
https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
• Create Index
# Index: shakespeare
curl -XPUT http://localhost:9200/shakespeare -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "speaker" : {"type": "string", "index" : "not_analyzed" },
        "play_name" : {"type": "string", "index" : "not_analyzed" },
        "line_id" : { "type" : "integer" },
        "speech_number" : { "type" : "integer" }
      }
    }
  }
}'

45
Kibana
• Create Index
# Index: logstash-{datetime}; repeat for logstash-2015.05.19 and logstash-2015.05.20
curl -XPUT http://localhost:9200/logstash-2015.05.18 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}'

46
Kibana
• Import Data
# Import Data
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl
• Check All Indices
# Check indices
curl 'localhost:9200/_cat/indices?v'
• Connect to Kibana
http://localhost:5601
If Nginx is configured to redirect automatically, you do not need to enter the port.
The first time after setup, Kibana needs a little time to initialize.
If you are not taken to the settings page, click Settings yourself.

47
Kibana
Create three index patterns here so that Kibana can search them quickly:
logstash-*
ba*
shakes*

48
Kibana
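This means that when searching in Kibana, we can first choose which Elasticsearch index to pull data from.
We can see the attributes of all related fields and adjust them here.
Back in Discover, select the ba* index pattern and search for accounts with an account number below 100 and a balance above 47500; there are five results in total:
account_number:<100 AND balance:>47500
To display other fields, click Available Fields -> add on the left to see the selected fields for the filtered data.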

49
Kibana
• Create a pie chart of the balance and age distribution
Select Visualize -> Pie chart -> From a new search -> Select an index pattern -> ba*
Select buckets -> Split Slices -> Aggregation -> Range -> Field -> balance -> enter the range values

50
Kibana
Next, add another Split Slices -> Aggregation -> Terms -> Field -> age -> Order By -> metric: Count -> Order -> Descending -> Size -> 5
Finally, click the green triangle at the top to generate the chart.
To save the chart, use the Save button at the top right.

51
Kibana
• Create a Dashboard
Select saved charts to display them on the Dashboard; you can freely adjust their size and position.

52
Nginx
• Nginx port forwarding

53
ELK parse system log on Ubuntu
• How to set ELK on Ubuntu
