KDDI - OpenStack Summit 2016/Red Hat NFV Mini Summit
1. KDDI Research Inc. Proprietary and Confidential
Troubles prediction and detection based on Distributed Monitoring & Analytics framework
Yuki Kasuya <yu-kasuya@kddi-research.jp>
KDDI Research
2. KDDI Research Inc. Proprietary and Confidential
Agenda
1. Motivation / Problem
2. Solution / Architecture
3. Use case
4. Conclusion
4. KDDI Research Inc. Proprietary and Confidential
Current operation style
• Reliability is required
• Many operators are needed
• Operation cost is high
  - change to low-cost operation
[Figure labels: Remote Operation Center; Development Division; Data Center; 24 hours / 365 days; HW replacement; daytime; software bug]
5. KDDI Research Inc. Proprietary and Confidential
Driver: Changing operation style
[Figure: Reactive Operation, Periodic Operation, Proactive Operation; labels: Cost Reduction, Agility]
• Reactive: 24/7 maintenance
• Proactive: 9am - 5pm maintenance, automation
• Prevent failures before something breaks
6. KDDI Research Inc. Proprietary and Confidential
What is the key point?
• Auto-healing process
  1. Fault detection by the monitoring system
  2. Recovery plan by the OSS/Orchestrator
  3. Auto healing by the Orchestrator
For fast recovery, real-time fault detection / prediction is the key point.
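The three auto-healing steps can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the `Fault` record, the `RECOVERY_PLANS` table, the threshold, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold and plan table, for illustration only.
CPU_ALARM_THRESHOLD = 90.0  # percent

RECOVERY_PLANS = {
    "cpu_overload": "scale_out",
    "vm_down": "restart_vm",
}


@dataclass
class Fault:
    node: str
    kind: str


def detect_fault(node: str, metrics: dict) -> Optional[Fault]:
    """Step 1: fault detection by the monitoring system."""
    if not metrics.get("vm_alive", True):
        return Fault(node, "vm_down")
    if metrics.get("cpu_percent", 0.0) > CPU_ALARM_THRESHOLD:
        return Fault(node, "cpu_overload")
    return None


def plan_recovery(fault: Fault) -> str:
    """Step 2: the OSS/orchestrator chooses a recovery plan."""
    return RECOVERY_PLANS.get(fault.kind, "notify_operator")


def auto_heal(node: str, metrics: dict) -> Optional[str]:
    """Step 3: the orchestrator acts on the plan (here it only returns the action name)."""
    fault = detect_fault(node, metrics)
    if fault is None:
        return None
    return f"{plan_recovery(fault)}:{fault.node}"
```

The loop can only react as quickly as `detect_fault` receives fresh metrics, which is why detection latency dominates recovery time.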
7. KDDI Research Inc. Proprietary and Confidential
Problems for proactive operation
[Figure: centralized monitoring. Pollers on each NFVI node (hosting VM/VNF) send verbose data to a central collector, evaluator, notifier, and DB, which introduces delay]
• Centralized monitoring architecture
  - Real-time (fine-grained) monitoring is difficult
  - Collecting a large amount of data imposes a high load
• In general, delay in collecting data affects several areas.
Now the time has come to consider enhancing the architecture.
8. KDDI Research Inc. Proprietary and Confidential
2. Solution / Architecture
9. KDDI Research Inc. Proprietary and Confidential
Distributed Monitoring and Analytics (DMA)
• Distribute each function to the compute nodes
  - The monitoring process is completed within each compute node
  - Real-time (fine-grained) monitoring
  - Scales with the number of compute nodes
[Figure: each NFVI node (hosting VM/VNF) runs its own poller, collector, evaluator, analyzer, and DB; only concise data and analytics results are sent to the central collector, evaluator, notifier, and DB]
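To illustrate how a compute node might keep raw samples local and send only concise data upstream, here is a minimal sketch. The function names, the record layout, and the 90% threshold are assumptions made for this example, not part of DMA as presented.

```python
from statistics import mean


def summarize(samples):
    """Reduce raw per-node samples to a concise record."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
    }


def evaluate(summary, threshold):
    """Local evaluator: raise an alarm when the window maximum exceeds the threshold."""
    return summary["max"] > threshold


def node_report(node, samples, threshold=90.0):
    """Build what one compute node would send upstream for one window of samples."""
    summary = summarize(samples)
    return {"node": node, "summary": summary, "alarm": evaluate(summary, threshold)}


# 1000 fine-grained CPU samples stay on the node; only one small record leaves it.
raw = [20.0] * 990 + [95.0] * 10
report = node_report("compute-1", raw)
```

Because the evaluation runs where the data is produced, the short spike in `raw` still raises an alarm even though the window mean stays low.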
10. KDDI Research Inc. Proprietary and Confidential
Architecture detail
[Figure: architecture of a DMA node on the NFVI. Labels recovered from the diagram:]
• Data sources: libvirt API, SNMP Get, SNMP Trap, OpenStack API, ...
• Components: Poller/Notification, Collector, Meter Translator, Database, Evaluator, Analytics Engine, Transmitter
• Functions: Fault Detection (Prediction), Statistical Analysis, Alarm Correlation
• Inputs: guest and host performance data (CPU/memory, ...) and fault data (alarms)
• Outputs: Fault Alarm, Statistical Data
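The slide does not spell out what the Meter Translator does. Assuming it normalizes readings from different sources into one meter format before they are stored and evaluated, a minimal sketch might look like this. The raw field names below are hypothetical and do not reflect actual libvirt or SNMP output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meter:
    source: str   # "guest" or "host"
    name: str     # normalized meter name, e.g. "cpu.util"
    value: float  # normalized value
    unit: str


def translate_libvirt_like(raw: dict) -> Meter:
    """Hypothetical guest sample: CPU time in ns consumed over a sampling interval in ns."""
    util = 100.0 * raw["cpu_time_ns"] / (raw["interval_ns"] * raw["vcpus"])
    return Meter("guest", "cpu.util", util, "%")


def translate_snmp_like(raw: dict) -> Meter:
    """Hypothetical host sample: an idle-percentage style reading."""
    return Meter("host", "cpu.util", 100.0 - raw["cpu_idle_percent"], "%")


TRANSLATORS = {
    "libvirt": translate_libvirt_like,
    "snmp": translate_snmp_like,
}


def translate(kind: str, raw: dict) -> Meter:
    """Dispatch a raw sample to the translator registered for its source type."""
    return TRANSLATORS[kind](raw)


guest = translate("libvirt", {"cpu_time_ns": 500_000_000,
                              "interval_ns": 1_000_000_000,
                              "vcpus": 2})
host = translate("snmp", {"cpu_idle_percent": 35.0})
```

With both samples expressed as `cpu.util` in percent, a single evaluator rule can be applied to guest and host data alike.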
11. KDDI Research Inc. Proprietary and Confidential
Target area
                Centralized    Distributed
Fault           ✔              ✔ (predict, silent)
Account         ✔
Performance                    ✔ (micro burst)
• Centralized
  - cloud platform
• Distributed
  - NFV
  - carrier-grade network
13. KDDI Research Inc. Proprietary and Confidential
Use Case 1: Prediction using machine learning
• Distributed machine learning
• Demo @ MWC 2016
T. Niwa et al., “Universal Fault Detection for NFV using SOM-based Clustering,” APNOMS 2015.
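The slide names SOM-based clustering but gives no algorithmic detail. The following is a generic sketch of anomaly scoring with a self-organizing map, not the method of the cited paper: a small map is trained on samples of normal behavior, and a new sample is scored by its distance to the best-matching unit. The map size, learning schedule, and synthetic data are all assumptions.

```python
import math
import random


def train_som(data, n_units=4, epochs=20, seed=0):
    """Train a tiny 1-D self-organizing map (SOM) on samples of normal behavior."""
    rng = random.Random(seed)
    # Initialize each unit from a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)                         # decaying learning rate
            sigma = max(0.5 * n_units * (1.0 - frac), 0.5)  # shrinking neighborhood
            bmu = min(range(n_units), key=lambda i: math.dist(weights[i], x))
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(x)):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
            step += 1
    return weights


def anomaly_score(weights, x):
    """Quantization error: distance from x to its best-matching unit."""
    return min(math.dist(w, x) for w in weights)


# Synthetic "normal" samples (an assumption): normalized (cpu, memory) near (0.3, 0.4).
rng = random.Random(1)
normal = [(0.3 + rng.uniform(-0.05, 0.05), 0.4 + rng.uniform(-0.05, 0.05))
          for _ in range(200)]
som = train_som(normal)

normal_score = anomaly_score(som, (0.3, 0.4))
outlier_score = anomaly_score(som, (0.95, 0.9))
```

A sample far from every learned unit gets a large score, so a threshold on `anomaly_score` can flag behavior that was never seen during training without a rule for each fault type.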
14. KDDI Research Inc. Proprietary and Confidential
Use Case 2: Detect micro burst traffic
Finer performance data processing
M. Miyazawa et al., “In-network real-time performance monitoring with distributed event processing,” NOMS 2014.
• A centralized approach cannot detect the bursts.
• DMA can achieve 1000x finer-grained monitoring.
[Figure: bandwidth utilization (%) versus time (sec), comparing the DMA approach with the centralized approach; annotations: 70, 18 Mbps]
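Why coarse sampling misses micro bursts can be shown with a small synthetic example. The traffic pattern, the sampling intervals, and the 80% threshold below are assumptions chosen for illustration; only the 1000:1 ratio between coarse and fine windows mirrors the figure quoted on the slide.

```python
def window_averages(samples, window):
    """Average utilization over consecutive windows of `window` samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def detect_bursts(samples, window, threshold):
    """Return the indices of windows whose average utilization exceeds the threshold."""
    return [i for i, avg in enumerate(window_averages(samples, window))
            if avg > threshold]


# Synthetic data (an assumption): 1 s of 1 ms samples at 10% load,
# with a 20 ms burst at 100% starting at 400 ms.
utilization = [10.0] * 1000
utilization[400:420] = [100.0] * 20

coarse = detect_bursts(utilization, window=1000, threshold=80.0)  # one 1 s average
fine = detect_bursts(utilization, window=1, threshold=80.0)       # every 1 ms sample
```

The one-second average is only 11.8%, so `coarse` is empty, while `fine` contains the 20 millisecond-level samples that make up the burst.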
16. KDDI Research Inc. Proprietary and Confidential
Conclusion
• Change to proactive / low-cost operation
• Fault detection / prediction is the key point
• Distributed Monitoring and Analytics is suitable for proactive operation
17. KDDI Research Inc. Proprietary and Confidential
Join us
• Start discussing how to integrate DMA into OpenStack