TensorFlow and Python: Fault Detection System - PyCon Taiwan 2017
1. PYCON TW - 2017
PYCON TAIWAN - 2017
ERIC (BYUNGWOOK) AHN
TensorFlow & Python:
Fault Detection System
2. Who am I
Experience
CDN
Media streaming
Device drivers (Windows, Linux)
2015 PyCon KR
2015 PyCon HK
2016 Swift KR
2016 Tensorflow KR
2017 PyCon Taiwan
?
3. FAULT DETECTION
LOG LOG LOG
LOG 2 ML
TODAY …
4.
Currently, my company runs over
200 services.
They use different systems and are
located in different IDCs.
5.
What is fault detection?
6.
Before we look at fault detection,
we need to know what a fault is.
7.
Fault
Wikipedia : A fault is defined as an abnormal
condition or defect at the component, equipment,
or sub-system level which may lead to a failure.
8.
There are many fault
detection systems.
9.
In general, they monitor:
CPU, memory, and disk I/O usage
Network bandwidth
System logs, application logs
JVM monitoring
URI checks
…
10.
PRTG
Cacti
Zabbix
L4/L7 (commercial product)
Log and process monitoring (commercial product)
DB monitoring (open source, commercial product)
Application Performance Monitoring (APM) for JVM
Elasticsearch
Hadoop
Grafana
Kibana
…
11.
Many views, charts, and alarm systems in an IDC
12.
I would like to detect faults
with ML.
14.
Log format
Apache
Squid
custom log format
..
15.
Type of log       Log filename               Daemon      Description
kernel log        /dev/console               kernel      Console log
system log        /var/log/messages          syslogd
security log      /var/log/secure            xinetd
mail log          /var/log/maillog           sendmail    Sendmail
cron log          /var/log/cron              crond
booting log       /var/log/boot.log          kernel
kernel boot log   /var/log/dmesg             kernel
kernel log        /var/log/wtmp              kernel      Records all system logins
kernel log        /var/log/utmp              kernel      Records current logins, IP, ..
ftp log           /var/log/xferlog           ftpd
http log          /var/log/httpd/access.log  httpd
http error log    /var/log/httpd/error.log   httpd
named log         /var/log/named.log         named
application log   depends on application     app daemon
16.
Example: system log (/var/log/messages)
17.
May 14 03:43:01 web01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1673" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 14 04:40:01 web01 ntpdate[6617]: the NTP socket is in use, exiting
May 14 05:40:01 web01 ntpdate[8437]: the NTP socket is in use, exiting
May 14 06:40:01 web01 ntpdate[10212]: the NTP socket is in use, exiting
May 14 07:40:01 web01 ntpdate[12315]: the NTP socket is in use, exiting
May 14 08:40:01 web01 ntpdate[14090]: the NTP socket is in use, exiting
May 14 15:40:01 web01 ntpdate[27169]: the NTP socket is in use, exiting
May 14 16:40:01 web01 ntpdate[28940]: the NTP socket is in use, exiting
May 14 17:40:01 web01 ntpdate[30706]: the NTP socket is in use, exiting
May 14 18:40:01 web01 ntpdate[32818]: the NTP socket is in use, exiting
May 14 19:40:01 web01 ntpdate[34583]: the NTP socket is in use, exiting
Mar 15 10:45:58 web01 oddjobd: oddjobd shutdown succeeded
Mar 15 10:45:59 web01 sshd[1153]: Received signal 15; terminating.
Mar 15 10:45:59 web01 snmpd[1139]: Received TERM or STOP signal... shutting down...
Mar 15 10:45:59 web01 snmpd[1139]: snmpd: send_trap: Failure in sendto (Network is unreachable)
Mar 15 10:45:59 web01 xinetd[1164]: Exiting...
Mar 15 10:45:59 web01 ntpd[1175]: ntpd exiting on signal 15
Mar 15 10:45:59 web01 init: Disconnected from system bus
Mar 15 10:45:59 web01 console-kit-daemon[37526]: WARNING: no sender#012
Mar 15 10:45:59 web01 nslcd[1073]: caught signal SIGTERM (15), shutting down
Mar 15 10:45:59 web01 nslcd[1073]: version 0.7.5 bailing out
Mar 15 10:45:59 web01 kernel: Kernel logging (proc) stopped.
May 16 03:40:01 web01 ntpdate[49591]: the NTP socket is in use, exiting
May 16 04:40:01 web01 ntpdate[51553]: the NTP socket is in use, exiting
May 16 05:40:01 web01 ntpdate[53365]: the NTP socket is in use, exiting
May 16 11:40:01 web01 ntpdate[64664]: the NTP socket is in use, exiting
May 16 12:40:01 web01 ntpdate[1211]: the NTP socket is in use, exiting
May 16 13:29:40 web01 sshd[2748]: Did not receive identification string from 10.40.133.188
May 16 13:29:45 web01 sshd[2749]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.40.133.188 user=pycon
May 16 13:29:45 web01 sshd[2749]: Accepted password for pycon from 10.40.133.188 port 57463 ssh2
May 16 13:29:45 web01 sshd[2749]: pam_unix(sshd:session): session opened for user pycon by (uid=0)
May 16 13:29:49 web01 su: pam_unix(su-l:session): session opened for user root by pycon(uid=9000709)
May 16 13:31:24 web01 sshd[2749]: pam_unix(sshd:session): session closed for user pycon
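Before lines like these can be fed to a model, each one has to be split into its fields. A minimal sketch, assuming the classic syslog layout seen in the sample (timestamp, host, process with an optional PID, message). The `parse_syslog_line` helper and its field names are hypothetical and not part of the talk.

```python
import re

# Classic syslog layout: "May 14 04:40:01 web01 ntpdate[6617]: message".
# The PID in square brackets is optional (e.g. "kernel:" or "su:").
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\]:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = SYSLOG_RE.match(line.strip())
    return match.groupdict() if match else None


line = "May 14 04:40:01 web01 ntpdate[6617]: the NTP socket is in use, exiting"
fields = parse_syslog_line(line)
print(fields["process"], fields["pid"], fields["message"])
```

Lines that do not follow this layout return `None`, so they can be counted or logged separately instead of silently entering the training set.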
19.
Log data is also natural language.
The order of words and
expressions matters:
it is sequential data.
20.
Machine Learning
Supervised learning vs unsupervised learning
Binary classification vs multi-label classification
Sentence clustering
Topic modeling
word2vec, doc2vec, paragraph2vec
Sentiment analysis
CNN
RNN
21.
As you know, a CNN is an
architecture designed
for image classification.
22.
ref(2) : “cs231n : Convolutional Neural Networks for Visual Recognition”, Stanford
A regular 3-layer Neural Network.
ConvNet Architecture
23.
What is a conv layer?
It computes the output of neurons that are connected to local
regions in the input, each computing a dot product between
their weights and the small region they are connected to in the
input volume.
24.
Original Image
0 0 1 1 0
0 1 1 1 0
0 0 1 1 0
0 0 1 1 0
0 1 1 1 1
3x3 filter
x1 x0 x1
x0 x1 x0
x1 x0 x1
Convolved Feature
3 3 3
2 4 3
2 4 4
25.-33. The 3x3 filter slides over the original image one position at a time; each slide adds one value to the convolved feature.
25. rows 1-3, cols 1-3: 0x1 + 0x0 + 1x1 + 0x0 + 1x1 + 1x0 + 0x1 + 0x0 + 1x1 = 3
26. rows 1-3, cols 2-4: 0x1 + 1x0 + 1x1 + 1x0 + 1x1 + 1x0 + 0x1 + 1x0 + 1x1 = 3
27. rows 1-3, cols 3-5: 1x1 + 1x0 + 0x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 0x1 = 3
28. rows 2-4, cols 1-3: 0x1 + 1x0 + 1x1 + 0x0 + 0x1 + 1x0 + 0x1 + 0x0 + 1x1 = 2
29. rows 2-4, cols 2-4: 1x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 0x1 + 1x0 + 1x1 = 4
30. rows 2-4, cols 3-5: 1x1 + 1x0 + 0x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 0x1 = 3
31. rows 3-5, cols 1-3: 0x1 + 0x0 + 1x1 + 0x0 + 0x1 + 1x0 + 0x1 + 1x0 + 1x1 = 2
32. rows 3-5, cols 2-4: 0x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 1x1 + 1x0 + 1x1 = 4
33. rows 3-5, cols 3-5: 1x1 + 1x0 + 0x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 1x1 = 4
Convolved Feature
3 3 3
2 4 3
2 4 4
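The sliding-window steps above amount to a "valid" 2-D convolution with stride 1 (strictly speaking a cross-correlation, which is what CNN layers compute). A minimal pure-Python sketch using the image and filter from the slides; the `convolve2d_valid` name is made up for this illustration.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the sum of element-wise products at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


image = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature = convolve2d_valid(image, kernel)
for row in feature:
    print(row)
# [3, 3, 3]
# [2, 4, 3]
# [2, 4, 4]
```

A 5x5 input and a 3x3 filter give a 3x3 output because the filter fits in 5 - 3 + 1 = 3 positions along each axis.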
34.
ref(2) : “cs231n : Convolutional Neural Networks for Visual Recognition”, Stanford
A regular 3-layer Neural Network.
ConvNet Architecture
35.
Sentence, Paragraph,
Document?
36.
Convolutional Neural Networks for Sentence Classification
ref(1) : “Convolutional Neural Networks for Sentence Classification”, 2014
Using a text CNN filter
- Preserves the local information of the text, its sequential order, and its
context information
38.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
Messages log :
snmpd[1139]: Received TERM or STOP signal shutting down, fault
caught SIGTERM shutting down, fault
SIGHUP received Attempting to restart, fault
sshd[2749]: Accepted password for pycon from 10.40.133.188 port 57463 ssh2, normal
ntpdate[27169]: the NTP socket is in use exiting, normal
category : fault, normal
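Training data in this shape can be split into text and label before it reaches the input layer. A minimal sketch, assuming each line ends with `, fault` or `, normal` as in the examples above. The `load_labeled_lines` helper and the one-hot class order are hypothetical choices for illustration.

```python
LABELS = ["fault", "normal"]  # assumed class order: index 0 = fault, 1 = normal


def load_labeled_lines(lines):
    """Split 'message, label' lines into texts and one-hot label vectors."""
    texts, one_hot = [], []
    for line in lines:
        # The message itself may contain commas, so split on the last one only.
        text, _, label = line.rpartition(",")
        label = label.strip()
        if label not in LABELS:
            continue  # skip lines without a known label
        texts.append(text.strip())
        one_hot.append([1 if label == name else 0 for name in LABELS])
    return texts, one_hot


samples = [
    "caught SIGTERM shutting down, fault",
    "ntpdate[27169]: the NTP socket is in use exiting, normal",
]
texts, labels = load_labeled_lines(samples)
print(texts)
print(labels)
```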
39.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
log example
# Build a word index with TensorFlow's VocabularyProcessor (tf.contrib.learn, TensorFlow 1.x)
import numpy as np
import tensorflow as tf
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(max_document_length)
x_train = np.array(list(vocab_processor.fit_transform(x_train)))
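What the vocabulary step produces can be illustrated without TensorFlow. This is a rough pure-Python sketch of the idea (map each word to an integer id, then pad or truncate to `max_document_length`), not the library's implementation. The whitespace tokenizer and the use of id 0 for padding and unknown words are assumptions made here.

```python
def fit_vocabulary(documents):
    """Assign each distinct word an integer id, starting at 1 (0 = padding)."""
    vocab = {}
    for doc in documents:
        for word in doc.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def transform(documents, vocab, max_document_length):
    """Turn each document into a fixed-length list of word ids."""
    rows = []
    for doc in documents:
        # Unknown words also map to 0 in this sketch.
        ids = [vocab.get(word, 0) for word in doc.split()]
        ids = ids[:max_document_length]
        ids += [0] * (max_document_length - len(ids))
        rows.append(ids)
    return rows


docs = ["caught SIGTERM shutting down", "SIGHUP received Attempting to restart"]
vocab = fit_vocabulary(docs)
x_train = transform(docs, vocab, max_document_length=6)
print(x_train)
# [[1, 2, 3, 4, 0, 0], [5, 6, 7, 8, 9, 0]]
```

Every row has the same length, which is what lets the batch be stacked into one integer matrix for the embedding layer.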
40.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
embedding size
vocab size : number of words
# Embedding matrix: one trainable row of length embedding_size per word id
W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W")
# Look up the embedding row for every word id in input_x
embedded_chars = tf.nn.embedding_lookup(W, input_x)
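An embedding lookup is row selection: each word id picks one row of the `vocab_size` x `embedding_size` matrix. A minimal pure-Python sketch of that idea, with a small random matrix standing in for the trainable variable `W`. The sizes, the seed, and the `embedding_lookup` helper are arbitrary illustration choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

vocab_size, embedding_size = 10, 4

# One row per word id, initialised uniformly between -1 and 1.
W = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_size)]
    for _ in range(vocab_size)
]


def embedding_lookup(matrix, word_ids):
    """Return the matrix rows selected by word_ids, in order."""
    return [matrix[i] for i in word_ids]


input_x = [1, 2, 3, 4, 0, 0]  # one padded document of word ids
embedded_chars = embedding_lookup(W, input_x)
print(len(embedded_chars), len(embedded_chars[0]))  # 6 4
```

The result has one row per word position, so a document of 6 ids becomes a 6 x 4 matrix that the convolution can slide over.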
41.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
# The filter spans filter_size words and the full embedding width; 1 input channel
filter_shape = [filter_size, embedding_size, 1, num_filters]
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
conv = tf.nn.conv2d(
    self.embedded_chars_expanded,
    W,
    strides=[1, 1, 1, 1],
    padding="VALID",
    name="conv")
# Add the bias and apply the ReLU non-linearity
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
42.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
word          V1 V2 V3 … Vp-2 Vp-1 Vp
snmpd[1139]:
Received
TERM
or
…
…
signal
shutting
down
filter
43.-48. sliding
(Animation frames: same word / V1 … Vp table as slide 42; the filter slides down the word rows, one step per slide.)
Finally, we get the convolved feature.
49.
(Same word / V1 … Vp table as slide 42.)
filter size : 4
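With a filter that spans `filter_size` words and the full embedding width, the convolution slides in one direction only, down the sentence, and yields `sequence_length - filter_size + 1` values per filter. A minimal pure-Python sketch of one such filter followed by ReLU. The tiny embedding values, weights, and bias are made-up illustration numbers, and `text_conv` is a hypothetical helper name.

```python
def text_conv(embedded, filt, bias):
    """Slide a (filter_size x embedding_size) filter down the word rows
    and apply ReLU to each window's weighted sum plus bias."""
    filter_size = len(filt)
    steps = len(embedded) - filter_size + 1  # 'VALID' padding
    features = []
    for start in range(steps):
        total = bias
        for i in range(filter_size):
            for j, weight in enumerate(filt[i]):
                total += embedded[start + i][j] * weight
        features.append(max(0.0, total))  # ReLU
    return features


# 6 words, embedding_size = 2 (made-up values)
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# filter size 4, as on the slide; the weights are made up
filt = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]
bias = 0.0

feature_map = text_conv(embedded, filt, bias)
print(feature_map)  # [3.0, 1.0, 1.0]
```

Six words and a filter of size 4 leave 6 - 4 + 1 = 3 window positions, which is why the feature map has three entries.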
50.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
# Max-pool over the whole feature map: one value per filter
pooled = tf.nn.max_pool(
    h,
    ksize=[1, sequence_length - filter_size + 1, 1, 1],
    strides=[1, 1, 1, 1],
    padding="VALID",
    name="pool")
# Combine all the pooled features (pooled_outputs collects `pooled` for each filter size)
num_filters_total = num_filters * len(filter_sizes)
self.h_pool = tf.concat(pooled_outputs, 3)  # TensorFlow >= 1.0 argument order: (values, axis)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
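The pooling window covers the whole feature map, so each filter contributes a single number: its maximum. A minimal pure-Python sketch of this max-over-time pooling and the concatenation that follows. The feature maps are made-up values, keyed by a hypothetical filter size.

```python
def max_over_time(feature_map):
    """Reduce one filter's feature map to its single largest value."""
    return max(feature_map)


# Made-up feature maps: two filters for each of two filter sizes.
# Longer filters produce shorter maps (sequence_length - filter_size + 1).
feature_maps = {
    3: [[0.2, 0.9, 0.1, 0.4], [0.0, 0.3, 0.7, 0.5]],
    4: [[0.6, 0.1, 0.8], [0.4, 0.4, 0.2]],
}

# Pool every map, then join the results into one flat feature vector,
# which is the role h_pool_flat plays in the snippet above.
h_pool_flat = [
    max_over_time(fmap)
    for size in sorted(feature_maps)
    for fmap in feature_maps[size]
]
print(h_pool_flat)  # [0.9, 0.7, 0.8, 0.4]
```

The flat vector has `num_filters * len(filter_sizes)` entries (here 2 * 2 = 4) regardless of sentence length, which is what allows a fixed-size fully-connected layer to follow.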
51.
Input Layer -> Conv. Layer -> Pooling Layer -> Fully-Connected Layer
with tf.name_scope("output"):
    # Output weights with Xavier initialisation
    W = tf.get_variable(
        "W",
        shape=[num_filters_total, num_classes],
        initializer=tf.contrib.layers.xavier_initializer())
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
    # Accumulate the L2 regularisation terms
    l2_loss += tf.nn.l2_loss(W)
    l2_loss += tf.nn.l2_loss(b)
    # Class scores and the index of the predicted class
    self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    self.predictions = tf.argmax(self.scores, 1, name="predictions")
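The output layer is a single matrix multiplication plus bias, followed by an argmax over the class scores. A minimal pure-Python sketch of that step. The weights, bias, and pooled features are made-up numbers, the helper names are hypothetical, and the class order (0 = fault, 1 = normal) is an assumption for illustration.

```python
def xw_plus_b(x, W, b):
    """Compute x @ W + b for one feature vector x."""
    num_classes = len(b)
    return [
        sum(x[i] * W[i][k] for i in range(len(x))) + b[k]
        for k in range(num_classes)
    ]


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


h_drop = [1.0, 0.0, 2.0, 1.0]  # pooled features (made-up)
W = [                          # shape: [num_filters_total, num_classes]
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.0],
    [0.0, 1.0],
]
b = [0.0, 0.5]

scores = xw_plus_b(h_drop, W, b)
prediction = argmax(scores)
print(scores, prediction)  # [2.0, 1.5] 0
```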
52.
Convolutional Neural Networks for Sentence Classification
53.
$ tensorboard --logdir ./runs/1496236844/summaries
55.
End
Reference
(1) “Convolutional Neural Networks for Sentence Classification”, Yoon Kim, 2014
(2) “cs231n : Convolutional Neural Networks for Visual Recognition”, Stanford