9. First, what can logs do?
• Identify problems
• Data-driven development / testing / operation
• Audit
• The laws of Marcus J. Ranum
• Monitor
• "Monitoring is the aggregation of health and performance data, events,
and relationships delivered via an interface that provides a holistic view
of a system's state to better understand and address failure scenarios."
— @etsy
10. difficulties of log analysis (1)
• timestamp + data = log
• OK, so what happened between 23:12 and 23:29 yesterday?
40. compare
logstash
• Design : multithreads + SizedQueue
• Lang : JRuby
• Syntax : DSL
• ENV : JRE 1.7
• Queue : relies on an external system
• regexp : ruby
• output : Java client to ES
• plugin : 182
• monitor : NO!
rsyslog
• Design : multithreads + mainQ
• Lang : C
• Syntax : RainerScript
• ENV : shipped with RHEL 6
• Queue : async queue
• regexp : ERE
• output : HTTP to ES
• plugin : 57
• monitor : impstats (pstats)
41. problems of Logstash
• poor performance of input/syslog; use input/tcp + filter/grok instead
• poor performance of filter/geoip; we developed filter/geoip2
• high CPU cost of filter/grok; use filter/ruby with a hand-written split
• OOM in input/tcp (prior to 1.4.2)
• OOM in output/elasticsearch (prior to 1.5.0)
• retry in output/elasticsearch overlaps with the SizedQueue retry in
stud (as of now)
42. problem of LogStash (1)
• LogStash::Inputs::Syslog
• logstash pipeline:
• input thread
-> filterworker threads * Num
-> output thread
• But what's inside Inputs::Syslog:
• TCPServer/accept
-> client thread -> filter/grok -> filter/date
-> filterworker threads
• So grok and date run in only one thread!
• A pure TCPServer can process 50k qps, but only 6k after filter/grok, and then 700 after
filter/date!
43. problem of LogStash(1)
• LogStash::Inputs::Syslog
• Solution:
input {
tcp { port => 514 }
}
filter {
grok { match => ["message", "%{SYSLOGLINE}"] }
syslog_pri { }
date { match => ["timestamp", "ISO8601"] }
}
• 30k eps in `logstash -w 20` testing.
45. problem of LogStash(2)
• LogStash::Filters::Grok
• solution:
• avoid grok, if you can define a separator for your log format:
filter {
ruby {
init => "@kname = ['datetime','uid','limittype','limitkey','client','clientip','request_time','url']"
code => "event.append(Hash[@kname.zip(event['message'].split('|'))])"
}
mutate {
convert => ["request_time", "float"]
}
}
• Result: CPU usage drops by about 20%
46. problem of LogStash(3)
• LogStash::Filters::GeoIP
• only 7k eps, even with `logstash -w 30`
• The new MaxMind DB format brings a great
performance improvement, but LogStash can't
distribute it for license reasons.
47. problem of LogStash(3)
• LogStash::Filters::GeoIP
• solution:
• use MaxMind::DB::Writer to convert the internal
ip.db into ip.mmdb: 300MB -> 50MB
• JRuby can java_import maxminddb-java.
• 28k eps with LogStash::Filters::MaxMindDB
48. problem of LogStash(4)
• LogStash::Outputs::Elasticsearch
• 3 bugs so far:
1. OOM in logstash 1.4.2 (ftw-0.0.39)
2. retry via Manticore (logstash 1.5.0beta1) overlapped with stud's retry in the
pipeline, which could cause an infinite resend loop
3. logstash 1.5.0rc1 can't record the 429 code; who knows what the "got
response of . source:" message means?
• Bugs 1 and 3 were fixed in the latest logstash 1.5.0rc3.
49. problem of LogStash(5)
• LogStash::Pipeline
• no supervisor for filterworkers. If all filter workers hit exceptions, logstash
blocks but stays alive!
• If you use filter/ruby to reference `event['field']` as introduced before,
check that the field exists first!
if [url] {
ruby { code => "event['urlpath']=event['url'].split('?')[0]" }
}
50. problem of LogStash(6)
• LogStash::Pipeline
• a new event produced by `yield` skips the remaining filters
and goes straight to the output thread (prior to
logstash 1.5.0).
• `yield` is used in filter-split and filter-clone
51. Rsyslog tuning
• action with a linkedlist queue
• imfile with an appropriate persistStateInterval (avoids too much duplication after
restart)
• omfwd with a small rebindInterval (when the target sits behind LVS)
• an appropriate global maxMessageSize
• an appropriate queue.size and queue.highwatermark
• recommended: CEE log format, used together with mmjsonparse
• separator-delimited log formats can be processed with mmfields
• make the best use of RainerScript
• concatenate JSON strings with the property replacer
• we developed rsyslog-mmdblookup for IP lookup
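A few of the bullets above, sketched as rsyslog v8 configuration. The values, file names, and target host are illustrative assumptions, not recommended settings:

```
# illustrative values only; tune against your own load
global(maxMessageSize="64k")

# imfile: persist read state often enough to limit re-reads after a restart
input(type="imfile" file="/var/log/app/app.log" tag="app:"
      stateFile="app-log-state" persistStateInterval="1000")

# omfwd behind LVS: rebind periodically so connections re-balance across
# real servers; buffer with an in-memory linkedlist queue
action(type="omfwd" target="lvs.example.com" port="514" protocol="tcp"
       rebindInterval="10000"
       queue.type="linkedlist" queue.size="100000" queue.highwatermark="80000")
```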
52. problem of rsyslog(1)
• I found an experimental `foreach` in rsyslog 8.7. Great! But when I
processed my JSON array logs from apps, I hit 3 bugs:
1. foreach doesn't check the type of its parameters;
2. action() doesn't copy the msg, only a reference. If you omfwd each item in a
foreach, it crashes... The test suite only used omfile, which is synchronous;
3. omelasticsearch has an uninitialized variable when the
errorfile option is enabled.
A new copymsg option for action() will arrive in rsyslog 8.10,
expected to be published on May 20.
53. problem of rsyslog(2)
• Not many message-modification plugins.
• mmexternal could fork too many subprocesses in
v8 (but not in v7), and its processing speed is only 2k
eps!
• We finished a new rsyslog-mmdblookup
plugin; it goes into production on May 15.
54. input( type="imtcp" port="514" )
template( name="clientlog" type="list" ) {
    constant(value="{\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"host\":\"")     property(name="hostname")
    constant(value="\",\"mmdb\":")       property(name="!iplocation")
    constant(value=",")                  property(name="$.line" position.from="2")
}
ruleset( name="clientlog" ) {
    action(type="mmjsonparse")
    if ($parsesuccess == "OK") then {
        foreach ($.line in $!msgarray) do {
            if ($.line!rtt == "-") then {
                set $.line!rtt = 0;
            }
            set $.line!urlpath = field($.line!url, 63, 1);
            set $.line!urlargs = field($.line!url, 63, 2);
            set $.line!from = "";
            if ($.line!urlargs != "***FIELD NOT FOUND***") then {
                reset $.line!from = re_extract($.line!urlargs, "from=([0-9]+)", 0, 1, "");
            } else {
                unset $.line!urlargs;
            }
            action(type="mmdb" key=".line!clientip" fields=["city","isp","country"] mmdbfile="./ip.mmdb")
            action(type="omelasticsearch" server="1.1.1.1" bulkmode="on"
                   template="clientlog" queue.size="10000" queue.dequeuebatchsize="2000")
        }
    }
}
if ($programname startswith "mweibo_client") then {
    call clientlog
    stop
}
55. ES tuning
• DO NOT believe the articles online!!
• DO test with your own dataset; start from one node, one index, one shard,
zero replicas.
• use unicast discovery with a bigger fd.ping_timeout
• doc_values, doc_values, doc_values!!!
• tune the gateway, recovery, and allocation settings upward
• increase refresh_interval and flush_threshold_size
• increase store.throttle.max_bytes_per_sec
• upgrade to at least 1.5.1
• scale: use total_shards_per_node
• use bulk! no multithreaded client, no async
• use curator for _optimize
• no _all for fixed-format logs
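For reference, the per-index knobs named above are dynamic index settings in ES 1.x; a sketch of a settings body (values are illustrative, not recommendations), applied with `PUT <index>/_settings`:

```json
{
  "index": {
    "refresh_interval": "30s",
    "translog.flush_threshold_size": "1gb",
    "store.throttle.max_bytes_per_sec": "200mb"
  }
}
```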
56. problem of ES(1)
• OOM:
• Kibana3 uses facet_filter, which means lots of hits in the
QUERY phase.
• There is a circuit breaker in newer versions, so you may see
errors like:
Data too large, data for field [@timestamp] would be larger than
limit of [639015321/609.4mb]
57. problem of ES(1)
• OOM:
• solution:
• doc_values, doc_values, doc_values!
• No more heap needed; 31GB is enough.
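In ES 1.x, doc_values is a per-field mapping option on not_analyzed or numeric fields; a minimal mapping sketch (the field name is just an example):

```json
{
  "properties": {
    "clientip": {
      "type": "string",
      "index": "not_analyzed",
      "doc_values": true
    }
  }
}
```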
58. ES stability problem (2)
• very long downtime during relocation and recovery.
• default strategy:
• recovery starts immediately after restart
• only one shard relocation at a time
• throttled to 20MB/s
• a replica needs to copy all files from the primary shard!
59. ES stability problem (2)
• very long downtime during relocation and recovery.
• solution:
• gateway.*: recover only after the cluster has enough nodes
• cluster.routing.allocation.*: larger concurrency
• indices.recovery.*: larger limit
• red to yellow: 20 min for a full restart.
• Note: there is a bug that may cause the recovery process to block in the translog
phase (prior to 1.5.1)
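The three setting groups correspond to elasticsearch.yml entries like these (ES 1.x names; the numbers are illustrative for a ten-node cluster):

```yaml
# gateway.*: wait for enough nodes before starting recovery
gateway.recover_after_nodes: 8
gateway.expected_nodes: 10
gateway.recover_after_time: 5m

# cluster.routing.allocation.*: more concurrent shard movement
cluster.routing.allocation.node_concurrent_recoveries: 4
cluster.routing.allocation.cluster_concurrent_rebalance: 4

# indices.recovery.*: raise the default 20MB/s throttle
indices.recovery.max_bytes_per_sec: 100mb
indices.recovery.concurrent_streams: 5
```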
60. problem of ES(3)
• new nodes die.
• default strategy of shard allocation:
• try to balance the total number of shards per node.
• no new shard if disk usage is over 90%.
• The day after scaling out, all new shards get
allocated to the new node! That means it carries all the
indexing load.
61. ES stability problem (3)
• new nodes die.
• solution:
1. finish relocation before the next new index is created.
2. set index.routing.allocation.total_shards_per_node
• note 1: please set a slightly larger value, to leave room for
recovery after a fault...
• note 2: DO NOT set this on old indices; your new node is
busy enough already.
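Step 2 as a settings body for the new index only, applied via the index settings API (the value 3 is an example; derive it from shards per index divided by node count, plus headroom for recovery):

```json
{
  "index.routing.allocation.total_shards_per_node": 3
}
```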
62. problem of ES(4)
• async replication
• CPU util% rises violently if one segment has some deviation; async does NOT
validate the indexed data.
• ES will remove the async replication parameter.
63. ES performance(1)
• 429, 429, 429...
• the length of one "client_net_fatal_error" log line may
exceed 1MB.
• the max HTTP body size in ES is 100MB. Be careful
with bulk_size.
64. ES performance(2)
• index size is several times larger than the raw message size.
• _source: the raw JSON
• _all: terms from every field, for full-text search
• multi-field: a .raw sub-field for every field in the logstash template
• So:
• no _all for nginx access logs.
• no _source for metrics/tsdb logs.
• no analyzed fields for most fields; only analyze the raw message.
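Those choices can be sketched in an ES 1.x index template; this is an illustrative fragment, not the stock logstash template:

```json
{
  "template": "logstash-nginx-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "dynamic_templates": [ {
        "strings_not_analyzed": {
          "match_mapping_type": "string",
          "mapping": { "type": "string", "index": "not_analyzed" }
        }
      } ],
      "properties": {
        "message": { "type": "string", "index": "analyzed" }
      }
    }
  }
}
```

For a metrics/tsdb index, the template would additionally set `"_source": { "enabled": false }`.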
65. ES performance(3)
• constant CPU util% from segment merges (hot threads forever).
• max segment: 5GB
• min segment: 2MB
• increase: refresh interval (1s) / flush threshold (200MB).
69. problem of ES(2)
• some data can't be found!
• ES requires the same mapping type for a given field name within
the same _type of the same index.
• My "client_net_fatal_error" log data changed after one
release:
• {"reqhdr":{"Host":"api.weibo.cn"}}
• {"reqhdr":"{\"Host\":\"api.weibo.cn\"}"}
• Set the mapping of the "reqhdr" object to {"enabled": false}. The
string can then only be viewed in the _source JSON, but not searched.
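The workaround as a mapping fragment (field name from the slide; everything under reqhdr is stored in _source but neither parsed nor indexed):

```json
{
  "properties": {
    "reqhdr": {
      "type": "object",
      "enabled": false
    }
  }
}
```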
70. problem of ES(3)
• some data can't be found! Again!
• There is a default `ignore_above: 256` setting in the logstash template.
curl '10.19.0.100:9200/logstash-mweibo-2015.05.18/mweibo_client_crash/_search?q=_id:AU1ltyTCQC8tD04iYBIe&pretty'
-d '{
    "fielddata_fields" : ["jsoncontent.content", "jsoncontent.platform"],
    "fields" : ["jsoncontent.content","jsoncontent.platform"]
}'
...
"fields" : {
  "jsoncontent.content" : [ "dalvik.system.NativeStart.main(Native Method)\nCaused by: java.lang.ClassNotFoundException: Didn't find class \"com.sina.weibo.hc.tracking.manager.TrackingService\" on path: DexPathList[[zip file \"/data/app/com.sina.weibo-1.apk\", zip file \"/data/data/com.sina.weibo/code_cache/secondary-dexes/com.sina.weibo-1.apk.classes2.zip\", zip file \"/data/data/com.sina.weibo/app_dex/dbcf1705b9ffbc30ec98d1a76ada120909.jar\"],nativeLibraryDirectories=[/data/app-lib/com.sina.weibo-1, /vendor/lib, /system/lib]]" ],
  "jsoncontent.platform" : [ "Android_4.4.4_MX4 Pro_Weibo_5.3.0 Beta_WIFI", "Android_4.4.4_MX4 Pro_Weibo_5.3.0 Beta_WIFI" ]
}
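If such long values must stay searchable, the limit can be raised (or removed) in the template; an illustrative fragment, with 4096 as an assumed cap:

```json
{
  "properties": {
    "jsoncontent": {
      "properties": {
        "content": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 4096
        }
      }
    }
  }
}
```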
71. kibana custom develop
• upgraded the elastic.js version in K3 to support the ES 1.2 API, so
we can use the aggs API to implement new panels (percentile panel,
range panel, and cardinality histogram panel).
• "export as csv" for table panel.
• map provider setting for bettermap.
• term_stats for map.
• china map.
• query helper.
• script field for terms panel.
• OR filtering.
• more in <https://github.com/chenryn/kibana>
72. see also
• 《 Elasticsearch Server (2nd edition) 》
• 《 Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding
Logging and Log Management 》
• 《 Data Analysis with Open Source Tools 》
• 《 Web Operations: Keeping the Data on Time 》
• 《 The Art of Capacity Planning 》
• 《 Large-Scale Web Service Development Technology 》
• https://codeascraft.com/
• http://calendar.perfplanet.com
• http://kibana.logstash.es