4. Copyright 2016 FUJITSU LIMITED
Demo System Configuration
Master server
•OS, JDK, Hadoop, Spark, RabbitMQ
•fluentd (collection/store definition)
•Parallel distributed processing platform: process definition, stream process (SparkStreaming/SparkSQL), batch process, task controller, data converter
Visualization server
•OS, JDK, Elasticsearch, Apache (httpd), Kibana
•fluentd (collection/store definition)
Slave servers #1 to #3
•OS, JDK, Hadoop, Spark
Data collection targets: Target servers #1, #2, ... #n
•OS, fluentd (collection definition)
5. Copyright 2016 FUJITSU LIMITED
Parallel distributed processing platform
Components
•Apache Spark (Core) with SparkSQL (SQL query) and SparkStreaming (event stream processing)
•Job Definition (XML)
•RabbitMQ (message broker)
•Fluentd (data collector)
•HDFS (distributed file system)
•Elasticsearch (real-time search engine)
•Kibana (data visualization)
Ex. “stream data analysis” in the anomaly detection process
•Stream data reception → Data process with SQL → Create time-series data → Analysis process
Can execute both stream processes and batch processes
Fast data conversion based on an XML-based Job Definition
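As a rough illustration of how an XML-based Job Definition could drive the platform, the sketch below parses a small job file and dispatches each task to a handler. The element and attribute names (`job`, `task`, `action`, `source`, `target`) and the handler table are assumptions made for this example; the slides do not show the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition: element and attribute names are assumptions,
# not the platform's actual schema.
JOB_XML = """
<job name="web-access-analysis" mode="batch">
  <task id="1" action="read" source="master_data"/>
  <task id="2" action="read" source="web_access_log"/>
  <task id="3" action="query_and_save" target="result"/>
</job>
"""


def load_job(xml_text):
    """Parse a job definition into (job name, mode, tasks ordered by id)."""
    root = ET.fromstring(xml_text)
    tasks = sorted(
        (dict(t.attrib) for t in root.findall("task")),
        key=lambda t: int(t["id"]),
    )
    return root.get("name"), root.get("mode"), tasks


def run_job(xml_text, handlers):
    """Run each task through the handler registered for its action."""
    _name, _mode, tasks = load_job(xml_text)
    return [handlers[task["action"]](task) for task in tasks]


# Placeholder handlers; a real platform would submit Spark work here.
HANDLERS = {
    "read": lambda t: f"read {t['source']}",
    "query_and_save": lambda t: f"saved {t['target']}",
}

job_log = run_job(JOB_XML, HANDLERS)
```

Keeping the task list in a definition file means a conversion can be changed by editing XML rather than application code, which is one plausible reading of the "fast data conversion" point.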
6. Copyright 2016 FUJITSU LIMITED
Ex. Batch process
Parallel distributed processing platform
•Job definition (XML) → SparkBatch Application
•TASK:1 Read “master data”
•TASK:2 Read “Web access log”
•TASK:3 Query and Save
•HDFS (Web access log) → Spark Cluster → HDFS (Analysis)
Analyze a large volume of Web access logs on the file system
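The three batch tasks can be sketched on a single machine as follows. This is only a stand-in: an in-memory SQLite database takes the place of SparkSQL and HDFS, and the table names, columns, and sample rows are invented for illustration.

```python
import sqlite3

# Illustrative rows only; the real inputs would be files on HDFS.
MASTER_DATA = [("/index.html", "Top page"), ("/cart", "Shopping cart")]
ACCESS_LOG = [
    ("10.0.0.1", "/index.html", 200),
    ("10.0.0.2", "/index.html", 200),
    ("10.0.0.1", "/cart", 500),
]


def run_batch(master_rows, log_rows):
    """Join the access log with master data and count hits and errors."""
    conn = sqlite3.connect(":memory:")
    # TASK:1 Read "master data"
    conn.execute("CREATE TABLE master (path TEXT, title TEXT)")
    conn.executemany("INSERT INTO master VALUES (?, ?)", master_rows)
    # TASK:2 Read "Web access log"
    conn.execute(
        "CREATE TABLE access_log (client TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", log_rows)
    # TASK:3 Query and save (the result table stands in for output on HDFS)
    conn.execute(
        """
        CREATE TABLE result AS
        SELECT m.title AS title,
               COUNT(*) AS hits,
               SUM(a.status >= 500) AS errors
        FROM access_log AS a
        JOIN master AS m ON a.path = m.path
        GROUP BY m.title
        """
    )
    rows = conn.execute(
        "SELECT title, hits, errors FROM result ORDER BY title"
    ).fetchall()
    conn.close()
    return rows


batch_result = run_batch(MASTER_DATA, ACCESS_LOG)
```

With Spark, the same query text would run against DataFrames registered as tables, and the work would be spread across the slave servers.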
7. Copyright 2016 FUJITSU LIMITED
Ex. Stream process
Parallel distributed processing platform
•Job definition (XML) → Spark Streaming Application
•Target server → RabbitMQ → RabbitMQ Receiver
•TASK:1 Process and store the CPU information
•TASK:2 Process and store the MEM information
•HDFS → Analysis
Analyze statistics information (CPU/MEM) in real time
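A minimal sketch of the stream side, assuming each message is a JSON record with `host`, `time`, `metric`, and `value` fields (the slides do not show the real message format). It groups records into fixed windows, in the spirit of Spark Streaming micro-batches, and produces one averaged time series per metric.

```python
import json
from collections import defaultdict

BATCH_SECONDS = 10  # assumed micro-batch interval

# Hypothetical messages as an agent might publish them to RabbitMQ.
MESSAGES = [
    '{"host": "target1", "time": 0, "metric": "cpu", "value": 20.0}',
    '{"host": "target1", "time": 4, "metric": "cpu", "value": 40.0}',
    '{"host": "target1", "time": 5, "metric": "mem", "value": 512.0}',
    '{"host": "target1", "time": 12, "metric": "cpu", "value": 90.0}',
]


def to_time_series(messages, batch_seconds=BATCH_SECONDS):
    """Group messages into fixed windows and average each metric per host."""
    buckets = defaultdict(list)
    for raw in messages:
        rec = json.loads(raw)
        window_start = (rec["time"] // batch_seconds) * batch_seconds
        key = (rec["metric"], rec["host"], window_start)
        buckets[key].append(rec["value"])
    series = defaultdict(list)
    for (metric, host, window_start), values in sorted(buckets.items()):
        series[metric].append((host, window_start, sum(values) / len(values)))
    return dict(series)


series = to_time_series(MESSAGES)
cpu_series = series["cpu"]  # corresponds to TASK:1
mem_series = series["mem"]  # corresponds to TASK:2
```

In the demo system the windows would be filled by the RabbitMQ receiver and the resulting series stored on HDFS for the analysis step.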
8. Copyright 2016 FUJITSU LIMITED
Findings/Problems from POC
Needs manpower for data collection on target servers
•We have to discuss with each customer to define which data to collect and then configure the fluentd agents (so the number of POCs is limited)
Difficult to accumulate experience in IT analytics
•Data and its format differ for each customer, so the suitable anomaly detection libraries also differ
Difficult to keep up with anomaly detection libraries
•Rapid tech evolution in machine learning, such as MLlib, TensorFlow, CNTK and so on
9. Copyright 2016 FUJITSU LIMITED
Why I’m interested in Monasca…
It seems to solve two of the problems from the POC
Needs manpower for data collection on target servers
•Monasca provides agents for OpenStack environments, so we can just use them.
Difficult to accumulate experience in IT analytics
•Data comes from Monasca agents and its format is stable, so we can use it as stable input and look for the libraries that suit an environment monitored by Monasca.
Add a catching function to Monasca
Boosts Monasca sales
•A lot of our customers are interested in IT analytics
•Fujitsu sells a Monasca-based product
10. Copyright 2016 FUJITSU LIMITED
Current Concerns & Approach
Current Concerns
•Performance for real-time anomaly detection (Storm vs. ApacheStreaming)
•Rapid tech evolution in machine learning (needs a plugin architecture for the libraries)
Approach (a base for discussion)
How to move Anomaly & Prediction Engine (APE) development ahead?
Idea
•First, rebase the current prototype on Monasca master (if possible, I would like to do this with Roland’s help)
•Then use it to find out problems
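One way to picture the plugin architecture mentioned under the concerns is a small registry that lets detection libraries be swapped without touching the caller. This is a sketch for discussion only: the registry, the detector names, and the thresholds are hypothetical and are not part of Monasca or the APE prototype.

```python
from statistics import mean, pstdev

# Hypothetical plugin registry: maps a detector name to a callable.
DETECTORS = {}


def register(name):
    """Decorator that adds a detector function to the registry."""
    def wrap(func):
        DETECTORS[name] = func
        return func
    return wrap


@register("zscore")
def zscore_detector(history, value, threshold=3.0):
    """Flag value if it lies more than `threshold` std devs from the mean.

    `history` must contain at least one sample.
    """
    sigma = pstdev(history)
    if sigma == 0:
        return value != history[0]
    return abs(value - mean(history)) / sigma > threshold


@register("fixed_limit")
def fixed_limit_detector(history, value, limit=95.0):
    """Flag value if it exceeds a fixed limit; history is ignored."""
    return value > limit


def detect(name, history, value, **params):
    """Look up a detector by name and apply it to the newest value."""
    return DETECTORS[name](history, value, **params)


cpu_history = [20.0, 22.0, 19.0, 21.0, 20.0]
spike_flagged = detect("zscore", cpu_history, 90.0)
normal_flagged = detect("zscore", cpu_history, 21.0)
```

A wrapper around MLlib, TensorFlow, or another library could be registered the same way, so replacing a library would only mean registering a new function under a new name.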