1) Hatohol is a server that collects and merges data from Zabbix and Nagios servers. It has a web-based client for visualizing this data.
2) The Hatohol server architecture pulls data from Zabbix and Nagios using APIs and stores it in a unified database. The server also has a REST API for the client.
3) Future plans for Hatohol include adding an action framework so that it can take actions based on triggers, improving high availability, adding graphing capabilities, and building a more sophisticated web client.
4. Hatohol Server
[Figure: Hatohol Server collects data from multiple ZABBIX servers and Nagios instances and provides the merged data.]
5. Input and Output Interface of Hatohol Server
[Figure: inputs and outputs of Hatohol Server]
● Input from ZABBIX Server: ZABBIX API (JSON-RPC)
● Input from Nagios: MySQL DB written by NDOUtils, read over the MySQL protocol
● Output: merged data via REST + JSON (HTTP)
○ Other clients can be connected, e.g. Android, iOS, and native apps.
6. Internal data flow (rough sketch)
[Figure: rough sketch of the data flow inside Hatohol Server. Components shown: ArmZabbixAPI, ArmNagiosNDOUtils, FaceRest, UnifiedDataStore, DBClientZabbix, and two caches.]
7. In terms of Database
● Zabbix Server and Nagios NDOUtils keep their own databases (MySQL, PostgreSQL, ...)
● Hatohol Server: cache (SQLite3)
○ Easy to use, light, and fast when used from a single app.
○ No need to make backups.
● Hatohol Server: config. and log (MySQL)
○ Important (must not be lost)
○ An HA configuration is possible
8. In terms of Threads
[Figure: threads in Hatohol Server. Components shown: ArmZabbixAPI, ArmNagiosNDOUtils, FaceRest, UnifiedDataStore, and DBClientZabbix, plus main() with the GLIB event loop.]
10. Data from ZABBIX
● Call the ZABBIX API at periodic intervals (polling)
○ Triggers
Current status: over/under threshold?
○ Events
History of trigger status changes
○ Items (all items of all hosts)
■ An on-demand fetching version has been developed (turned off by default)
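The polling step above can be sketched as building one JSON-RPC request per data type. This is only an illustration, not Hatohol's code (the server is written in C++). `trigger.get`, `event.get`, and `item.get` are ZABBIX API method names; `build_request`, `build_polling_batch`, the token value, the `"output": "extend"` parameter, and carrying the token in an `auth` field of the request body are assumptions made for the example.

```python
import itertools
import json

# ZABBIX API methods for the three data types listed above.
POLLED_METHODS = {
    "triggers": "trigger.get",
    "events": "event.get",
    "items": "item.get",
}

_request_ids = itertools.count(1)


def build_request(method, auth_token, params=None):
    """Return a JSON-RPC 2.0 request body (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        # "output": "extend" is an assumed default for this sketch.
        "params": params if params is not None else {"output": "extend"},
        # Assumption: the session token travels in the request body.
        "auth": auth_token,
        "id": next(_request_ids),
    }


def build_polling_batch(auth_token, items_on_demand=False):
    """Build the requests for one polling cycle.

    When items_on_demand is True, items are left out of the periodic
    poll, mirroring the on-demand mode that is off by default.
    """
    kinds = ["triggers", "events"]
    if not items_on_demand:
        kinds.append("items")
    return [build_request(POLLED_METHODS[kind], auth_token) for kind in kinds]


if __name__ == "__main__":
    for request in build_polling_batch("hypothetical-token"):
        print(json.dumps(request))
```

A real poller would send each body over HTTP at a fixed interval and store the results; that part is omitted here.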
11. Data from Nagios (NDOUtils)
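[Figure: data path from Nagios to Hatohol Server. nagios (with ndomod, a broker module) -> ndo2db (another process) -> MySQL -> mysqlclient -> ArmNDOUtils in Hatohol Server. Annotation in the figure: "doesn't have save feature".]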
12. Data from Nagios (NDOUtils)
● Issue SQL statements at periodic intervals (polling)
○ Triggers (servicestatus)
○ Events (statehistory)
○ Items (servicestatus)
■ An on-demand fetching version has not been developed
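A minimal sketch of SQL polling, under stated assumptions: an in-memory SQLite database stands in for the MySQL database filled by NDOUtils, and the table layouts are simplified for the example (the column names and the `last_seen_id` bookkeeping are hypothetical, not the NDOUtils schema or Hatohol's implementation).

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the NDOUtils database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE servicestatus (
            service_id INTEGER, current_state INTEGER, output TEXT);
        CREATE TABLE statehistory (
            id INTEGER PRIMARY KEY, service_id INTEGER,
            state_time INTEGER, state INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO servicestatus VALUES (?, ?, ?)",
        [(1, 0, "PING OK"), (2, 2, "DISK CRITICAL")],
    )
    conn.executemany(
        "INSERT INTO statehistory (service_id, state_time, state) VALUES (?, ?, ?)",
        [(2, 1368496195, 2), (2, 1368496761, 0), (2, 1368497000, 2)],
    )
    return conn


def poll_triggers(conn):
    """Triggers: the current state of every service."""
    return conn.execute(
        "SELECT service_id, current_state FROM servicestatus ORDER BY service_id"
    ).fetchall()


def poll_new_events(conn, last_seen_id):
    """Events: state changes recorded after the last row already polled."""
    return conn.execute(
        "SELECT id, service_id, state_time, state FROM statehistory "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(poll_triggers(db))
    print(poll_new_events(db, last_seen_id=0))
```

Remembering the highest row id seen so far keeps each polling cycle from re-reading the whole history table.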
14. Development language
● C++
○ I like it, and I like low-level programming.
○ Fast
○ Flexible
■ Can manage memory, threads, etc.
■ Fine control
● use of system calls
● libraries (GLIB, libsoup…)
● system-specific features
22. DATA STREAM around BROWSER
[Figure: Client <-> Django R.Proxy (impl w/ Django) <-> Hatohol Server]
● ① HTTP
・html, css
・js
・image
● ② REST API on HTTP
・json
● In the browser
・browse page: ①
・exec js usercode
・$.getJSON(): ②
・draw data
23. DATA STREAM around BROWSER / minimize
[Figure: Client <-> apache R.Proxy <-> Hatohol Server]
● ① HTTP: static, pre-compiled content
・html, css
・js
・image
● ② REST API on HTTP
・json
● In the browser
・browse page: ①
・exec js usercode
・$.getJSON(): ②
・draw data
24. PRIMARY ELEMENTS in USERCODE
html objects:
- table
variables:
- raw data (JSON, just returned by Server)
- parsed data
functions:
- parseData()
- drawData()
Flow: raw data -> parseData() -> parsed data -> drawData() -> <table> ... </table>
The table is redrawn with:
$("#table").empty()
$("#table").append(...);
25. SAMPLE of RAW DATA (events_ajax.html)
{
apiVersion : 1,
result : true,
numberOfEvents : 2229,
events :
[
{ serverId: 1, hostId: 10084, triggerId: 13520, time: 1368496195,
type: 0, status: 0, severity: 2, brief: "Free disk space is less than 20%" },
{ serverId: 1, hostId: 10084, triggerId: 13498, time: 1368496761,
type: 1, status: 0, severity: 2, brief: "Disk I/O is overloaded" },
{ serverId: 2, hostId: 10061, triggerId: 13681, time: 1359087529,
type: 2, status: 0, severity: 3, brief: "FTP server is down" },
{ serverId: 2, hostId: 10061, triggerId: 13682, time: 1359087530,
type: 2, status: 1, severity: 3, brief: "WEB (HTTP) server is down" },
{ serverId: 2, hostId: 10061, triggerId: 13683, time: 1359087531,
type: 2, status: 1, severity: 3, brief: "IMAP server is down" },
{ serverId: 2, hostId: 10061, triggerId: 13684, time: 1359087532,
type: 2, status: 1, severity: 3, brief: "News (NNTP) server is down" },
{ serverId: 2, hostId: 10061, triggerId: 13685, time: 1359087533,
type: 2, status: 1, severity: 3, brief: "POP3 server is down" },
{ serverId: 2, hostId: 10061, triggerId: 13686, time: 1359087534,
type: 2, status: 0, severity: 3, brief: "Email (SMTP) server is down" },
{ serverId: 2, hostId: 10061, triggerId: 13700, time: 1359087530,
type: 2, status: 0, severity: 4, brief: "Lack of free swap space" },
],
}
27. SAMPLE of PARSED DATA (events_ajax.html)
{
  diffs:
  {
    1: # serverId
    {
      13001: # triggerId
      {
        1359087525: 5, # time and delta of time
        1359087530: 10,
        1359087540: 99999,
      },
      13002:
      {
        1359080912: 99999,
      },
    },
    2:
    {
      13001:
      {
        1359087600: 100,
        1359087700: 100,
        1359087800: 99999,
      },
    }
  },
}
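One way to read the parsed sample is that each event time maps to the number of seconds until the next event of the same trigger, with 99999 marking the last one. The sketch below implements that reading in Python. It is an inference from the samples, not the actual `parseData()` (which is JavaScript and may differ); `parse_data` and `LAST_EVENT_DELTA` are names chosen for the example.

```python
from collections import defaultdict

# Value used in the sample for an event with no later event.
LAST_EVENT_DELTA = 99999


def parse_data(raw):
    """Group event times by serverId and triggerId, then map each time
    to the number of seconds until the next event of the same trigger."""
    times = defaultdict(lambda: defaultdict(list))
    for event in raw["events"]:
        times[event["serverId"]][event["triggerId"]].append(event["time"])

    diffs = {}
    for server_id, triggers in times.items():
        diffs[server_id] = {}
        for trigger_id, stamps in triggers.items():
            stamps = sorted(set(stamps))
            deltas = {}
            for current, following in zip(stamps, stamps[1:]):
                deltas[current] = following - current
            deltas[stamps[-1]] = LAST_EVENT_DELTA
            diffs[server_id][trigger_id] = deltas
    return {"diffs": diffs}


if __name__ == "__main__":
    raw = {
        "events": [
            {"serverId": 1, "triggerId": 13001, "time": 1359087525},
            {"serverId": 1, "triggerId": 13001, "time": 1359087530},
            {"serverId": 1, "triggerId": 13001, "time": 1359087540},
        ]
    }
    print(parse_data(raw))
```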
28. TEST POINT in USERCODE
- input, output of parseData()
- input, output of drawData()
Flow: raw data -> parseData() -> parsed data -> drawData() -> <table> ... </table>
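Testing at these points means feeding a known input to one function and comparing its output, without a browser. A hedged Python sketch of that idea for the drawing step follows. `draw_data` is a hypothetical stand-in that returns table rows as strings instead of touching the DOM, and the input shape follows the parsed-data sample.

```python
import html


def draw_data(parsed):
    """Render parsed data as HTML table rows (stand-in for filling <table>)."""
    rows = []
    diffs = parsed["diffs"]
    for server_id in sorted(diffs):
        for trigger_id in sorted(diffs[server_id]):
            for time, delta in sorted(diffs[server_id][trigger_id].items()):
                cells = (server_id, trigger_id, time, delta)
                row = "".join(f"<td>{html.escape(str(c))}</td>" for c in cells)
                rows.append(f"<tr>{row}</tr>")
    return rows


def test_one_row_per_time_entry():
    parsed = {"diffs": {1: {13002: {1359080912: 99999}}}}
    expected = ["<tr><td>1</td><td>13002</td><td>1359080912</td><td>99999</td></tr>"]
    assert draw_data(parsed) == expected


def test_empty_input_gives_no_rows():
    assert draw_data({"diffs": {}}) == []


if __name__ == "__main__":
    test_one_row_per_time_entry()
    test_empty_input_gives_no_rows()
    print("all tests passed")
```

Keeping the drawing step a pure function of the parsed data is what makes this kind of test possible on the server side as well.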
29. TEST at SERVER SIDE
EXPERIMENTAL : ON THE commonjs BRANCH
- runnable on both the server side and the client side
(in the browser)
- run with teajs (formerly called v8cgi)
- CommonJS compliant
- Modules/1.x
- Unit Testing/1.0
- no loader is used
- alt. Django view template directive ‘include’
http://code.google.com/p/teajs/
http://www.commonjs.org/
32. Action
When ZABBIX/Nagios detects a problem (condition),
Hatohol executes an action.
The action to run is selected by a combination of:
● Server ID (ID of the Zabbix or Nagios server)
● Host ID
● Host group ID
● Trigger ID
● Trigger status (OK or Problem)
● Trigger severity
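The combination of conditions listed above can be sketched as a record in which every field is optional. This is an illustration only: the class and field names are hypothetical, and treating severity as a minimum threshold is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    server_id: int
    host_id: int
    host_group_id: int
    trigger_id: int
    status: str      # "OK" or "PROBLEM"
    severity: int


@dataclass(frozen=True)
class ActionCondition:
    """A field left as None means "do not care"."""
    server_id: Optional[int] = None
    host_id: Optional[int] = None
    host_group_id: Optional[int] = None
    trigger_id: Optional[int] = None
    status: Optional[str] = None
    min_severity: Optional[int] = None  # assumed: this severity or higher

    def matches(self, event: Event) -> bool:
        exact = ("server_id", "host_id", "host_group_id", "trigger_id", "status")
        for name in exact:
            wanted = getattr(self, name)
            if wanted is not None and wanted != getattr(event, name):
                return False
        if self.min_severity is not None and event.severity < self.min_severity:
            return False
        return True


if __name__ == "__main__":
    event = Event(1, 10084, 2, 13520, "PROBLEM", 2)
    print(ActionCondition(server_id=1, status="PROBLEM").matches(event))
```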
33. Hatohol’s Action
● Provides two kinds of actions
○ Command (executed by an event)
■ binary, shell script, or lightweight-language (LL) script
○ Resident (stays as a process)
■ executes a function in the module when an event occurs
■ module language
● C/C++ (shared library)
● Python
34. Motivation of resident
● To handle complicated conditions
● For example,
○ Send an e-mail when a problem is detected
○ Disable sending e-mail for 10 min.
■ The times of the events should be recorded
○ After 10 min., send the number of events
○ The next event should be e-mailed
○ Send the number of events by e-mail every 60 min.
This is better handled by a program,
but the state also needs to be saved.
=> That is difficult for a command
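The example above needs state that survives between events, which is the point of a resident action. A minimal Python sketch of such a handler follows. `ThrottledMailer`, its method names, and the message texts are hypothetical, the clock is passed in explicitly so the logic can be tested, and the 60-minute periodic summary is left out for brevity.

```python
class ThrottledMailer:
    """Mail the first event, then stay quiet for a fixed period while counting."""

    def __init__(self, send_mail, quiet_seconds=600):
        self._send_mail = send_mail          # callable taking a message string
        self._quiet_seconds = quiet_seconds
        self._quiet_until = None             # None: not in a quiet period
        self._suppressed_times = []          # times of events that were not mailed

    def on_event(self, now, brief):
        """Handle one event that occurred at time `now` (in seconds)."""
        self.on_timer(now)  # close an expired quiet period first
        if self._quiet_until is None:
            self._send_mail(f"event: {brief}")
            self._quiet_until = now + self._quiet_seconds
        else:
            self._suppressed_times.append(now)

    def on_timer(self, now):
        """Call periodically; reports the suppressed count once the period ends."""
        if self._quiet_until is not None and now >= self._quiet_until:
            if self._suppressed_times:
                self._send_mail(f"{len(self._suppressed_times)} events suppressed")
            self._suppressed_times = []
            self._quiet_until = None


if __name__ == "__main__":
    mailer = ThrottledMailer(print)
    mailer.on_event(0, "Disk I/O is overloaded")
    mailer.on_event(100, "Disk I/O is overloaded")
    mailer.on_timer(600)
```

A one-shot command would have to persist `_quiet_until` and the recorded times somewhere between runs; a resident process simply keeps them in memory.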
35. Picture of the resident action mechanism
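[Figure: Hatohol Server connected with a pipe to hatohol-resident-yard, which loads usermodule.so]
usermodule.so:
int num_events;
event_handler()
{
…
}
● Hatohol Server and hatohol-resident-yard are different processes, connected with a pipe
○ No damage if the user module crashes
○ Relaunchable
● The user module saves its state as variables
● The user-written handler is called by an event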
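The idea of a separate, long-lived process that keeps state in variables and is driven over a pipe can be sketched as follows. This is a toy model, not the hatohol-resident-yard protocol: the line-based message format, `ResidentProcess`, and `CHILD_CODE` are all assumptions made for the example.

```python
import subprocess
import sys

# Code run by the child process. It keeps num_events across events, like
# the variables of a user module loaded by a resident host process.
CHILD_CODE = r"""
import sys
num_events = 0
for line in sys.stdin:
    num_events += 1
    sys.stdout.write(f"{num_events} {line.strip()}\n")
    sys.stdout.flush()
"""


class ResidentProcess:
    """Parent-side handle for a child process connected through pipes."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def notify(self, event_name):
        """Send one event over the pipe and wait for the handler's reply."""
        self._proc.stdin.write(event_name + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().strip()

    def close(self):
        """Close the pipe so the child's loop ends, then reap the child."""
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()


if __name__ == "__main__":
    resident = ResidentProcess()
    print(resident.notify("disk"))
    print(resident.notify("cpu"))
    resident.close()
```

Because the handler lives in its own process, a crash there closes the pipe but leaves the parent running, and the parent can start a new child.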
36. HA configuration (Under study…)
[Figure: two Zabbix Servers and Nagios (NDOUtils) feed two Hatohol Servers in an HA configuration; config. and log are kept on a MySQL Server.]
Note: Just one idea...
Traffic x2, but not so heavy
37. Graph (Now no concrete ideas…)
● Basically similar to ZABBIX’s graph ?
○ users can specify duration
○ Shows maximum, minimum, average
● Are there other useful features ?
38. Hatohol is open source software.
● Please tell us about any requirements
● Of course, bug reports and bug fixes are very welcome
● Project site (GitHub)
○ https://github.com/project-hatohol/hatohol