A list and explanation of AOS services as of AOS 5.16. Some of the services are PC-only, but
most are on PE. Some of this information can be found in The Nutanix Bible, while other
bits and pieces can be found in TOIs and other documentation. Service leader discovery is in KB
Acropolis
KEY ROLE: VM Management service for AHV.
DESCRIPTION: An Acropolis Slave runs on every CVM with an elected Acropolis Master
which is responsible for task scheduling, execution, IPAM, etc. Similar to other components
which have a Master, if the Acropolis Master fails, a new one will be elected. The role
breakdown for each can be seen below:
PORTS USED: 2030
DEPENDENCIES: Arithmos (for stats), Ergon (task execution), Pithos and Insights (object
metadata), Zookeeper (leadership election)
DISRUPTIVE RESTART?: Yes. User VM management in AHV clusters will not be possible
while this service is stopped.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in AOS 4.1
Acropolis Master
Task scheduling & execution
Stat collection / publishing
Network Controller (for hypervisor)
VNC proxy (for hypervisor)
VM placement and scheduling (for hypervisor)
Acropolis Slave
Stat collection / publishing
VNC proxy (for hypervisor)
Aplos / Aplos Engine
KEY ROLE: New Prism gateway service in AOS 5.0 and newer.
DESCRIPTION: An intentful orchestration engine and intentful API proxy. It is built as a
distributed layer capable of handling node failures and redistributing intents in the event a
node fails, supports fine-grained Role-Based Access Control (RBAC) on all APIs, and has a
built-in intentful CLI so you don't need to install new CLI jar files every time AOS is updated.
PORTS USED: 9447
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 5.0
Atlas
KEY ROLE: Curator-like service for the object store
DESCRIPTION: Atlas is responsible for object lifecycle policy enforcement and for performing
garbage collection in Nutanix Buckets/Objects.
PORTS USED: 7103, 7104
DEPENDENCIES:
DISRUPTIVE RESTART?: No
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Arithmos
KEY ROLE: Maintain historical stats and config information.
DESCRIPTION: Arithmos serves as the centralized stats datastore for the Prism GUI and
Nutanix REST APIs. Different components like Stargate, Hyperint and Cerebro publish stats to
Arithmos. Therefore, stats clients are able to get the stats through Arithmos using Arithmos
APIs, instead of talking to individual components using different APIs. Different components
do not need to handle and process the stats for Prism. They only need to publish stats to
Arithmos. Original design information on Arithmos can be found here (This page has
arithmos_cli usage examples). Arithmos stores the last 24 hours of stats in memory.
PORTS USED: 9999, 2025
DEPENDENCIES: Pithos, Zookeeper, Cassandra
DISRUPTIVE RESTART?: Stats data in memory is lost and current stats are not available.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Added in 3.5
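Arithmos keeping only the last 24 hours of stats in memory is a rolling-window pattern. The sketch below is a toy illustration of that pattern only, not Arithmos code; the class and method names are invented:

```python
import time
from collections import deque

class InMemoryStatStore:
    """Toy rolling-window stat store (illustrative, NOT Arithmos code):
    keep only samples from the last `window_secs` seconds in memory."""
    def __init__(self, window_secs=24 * 3600):
        self.window_secs = window_secs
        self.samples = deque()  # (timestamp, name, value), oldest first

    def publish(self, name, value, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, name, value))
        self._expire(now)

    def _expire(self, now):
        # Drop samples older than the window; restart loses all of this.
        while self.samples and self.samples[0][0] < now - self.window_secs:
            self.samples.popleft()

    def query(self, name):
        return [(t, v) for t, n, v in self.samples if n == name]

store = InMemoryStatStore()
store.publish("iops", 100, now=0)           # sample at t=0
store.publish("iops", 200, now=23 * 3600)   # still inside the 24h window
store.publish("iops", 300, now=25 * 3600)   # pushes the t=0 sample out
print(store.query("iops"))  # prints: [(82800, 200), (90000, 300)]
```

This also illustrates why a restart is lossy for current stats: the window lives only in process memory.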
Athena
KEY ROLE: Handles Authentication and Identity Management
DESCRIPTION: Prior to 5.10, each gateway (Java and Aplos) had its own authentication, identity
management, and authorization modules. As a result, any new improvement or bug fix
on authentication/authorization had to be performed on both gateways, leading to
two separate workstreams and two separate experiences, causing complexity. Athena, introduced
in 5.10, is the first step in the process of reducing these two gateways to one.
PORTS USED: 2303
DEPENDENCIES: Aplos, IDF, Zookeeper
DISRUPTIVE RESTART?: No
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 5.10
Cassandra
KEY ROLE: Distributed metadata store
DESCRIPTION: Cassandra stores and manages all of the cluster metadata in a distributed
ring-like manner based upon a heavily modified Apache Cassandra. The Paxos algorithm is
utilized to enforce strict consistency. This service runs on every node in the cluster.
Cassandra is accessed via an interface called Medusa.
CASSANDRA DAEMON (Java process - commonly just Cassandra):
PORTS USED: 9161 (Protobuf comm), 9160 (Thrift comm), 7000 (inter-Cassandra comm),
8080 (JMX port, largely unused), 8081 (MX4J port, state from Cassandra)
DEPENDENCIES: CassandraMonitor
DISRUPTIVE RESTART?: Yes. Must be restarted only in a rolling-fashion allowing at least
several minutes for the service to stabilize.
AOS VERSION WHEN INTRODUCED OR REMOVED: In AOS since day 1
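The "distributed ring-like manner" above can be illustrated with a toy consistent-hash sketch. This is illustrative only, not Nutanix's Cassandra implementation; the node names and the md5-token scheme are assumptions made for the example:

```python
import hashlib
from bisect import bisect_right

def token(s):
    # Hash a string to a position on the ring (toy scheme, md5 for demo only).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes):
    # Each node gets a token; the sorted tokens form the ring.
    return sorted((token(n), n) for n in nodes)

def owner(ring, key):
    # A key is owned by the first node whose token is >= the key's hash,
    # wrapping around to the start of the ring if needed.
    toks = [t for t, _ in ring]
    i = bisect_right(toks, token(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["cvm-a", "cvm-b", "cvm-c", "cvm-d"])
# The same key always maps to the same node, without a central lookup table.
assert owner(ring, "vdisk:1234") == owner(ring, "vdisk:1234")
print(owner(ring, "vdisk:1234"))
```

The point of the ring layout is that node additions and removals (handled by Dynamic Ring Changer, below) only move the keys adjacent to the changed token, rather than reshuffling all metadata.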
Cerebro
KEY ROLE: Replication/DR manager
DESCRIPTION: Cerebro is responsible for the replication and DR capabilities of DSF. This
includes the scheduling of snapshots, the replication to remote sites, and the site
migration/failover. Cerebro runs on every node in the Nutanix cluster and all nodes
participate in replication to remote clusters/sites.
PORTS USED: 2020
DEPENDENCIES:
DISRUPTIVE RESTART?: Yes. If the service is restarted on the Cerebro Master CVM, any
replications or snapshots being taken at the time of this restart may fail and have to be run
again.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Chronos
KEY ROLE: Job and task scheduler
DESCRIPTION: Chronos is responsible for taking the jobs and tasks resulting from a Curator
scan and scheduling/throttling tasks among nodes. Chronos runs on every node and is
controlled by an elected Chronos Master that is responsible for the task and job delegation
and runs on the same node as the Curator Master.
PORTS USED: 2011 (Master), 2012 (Node)
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Cluster Health
KEY ROLE: Health stats collection and execution of health checks.
DESCRIPTION: Cluster health handles the collection of health statistics and execution of
health checks. The original architecture doc can be found here.
PORTS USED: 2700
DEPENDENCIES:
DISRUPTIVE RESTART?: No
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Curator
KEY ROLE: MapReduce cluster management and cleanup
DESCRIPTION: Curator is responsible for managing and distributing tasks throughout the
cluster, including disk balancing, proactive scrubbing, and many more items. Curator runs on
every node and is controlled by an elected Curator Master who is responsible for the task
and job delegation. There are two scan types for Curator: a full scan, which occurs around
every 6 hours, and a partial scan, which occurs every hour.
PORTS USED: 2010
DEPENDENCIES: Zookeeper (Zeus), Pithos, Cassandra (Medusa)
DISRUPTIVE RESTART?: Yes. If the service is restarted on the Curator Master, any scans
currently taking place will be aborted and will have to be run again.
AOS VERSION WHEN INTRODUCED OR REMOVED: In AOS since day 1
Dynamic Ring Changer
KEY ROLE: Responsible for handling node changes in the ring.
DESCRIPTION: Dynamic Ring Changer is responsible for handling node changes in the
Cassandra metadata ring, such as node additions and removals.
PORTS USED: 2041
DEPENDENCIES: Zookeeper, Cassandra Monitor, Cassandra Daemon
DISRUPTIVE RESTART?: Yes. This service should not be restarted if any Metadata Disk or
Node Removals or Additions are taking place as they will be disrupted.
AOS VERSION INTRODUCED OR REMOVED: Introduced ~ version 3.0
Epsilon
KEY ROLE: Performs entity-level operations like snapshot creation/replication, VM
migration, VM clone from snapshot, and VM power-on by using the public v3 API.
DESCRIPTION: Responsible for performing the entity-level operations required for recovery
plan execution; the Epsilon service is also used by Nucalm for app deployment.
PORTS USED: 4120
DEPENDENCIES: Zookeeper, Ergon, IDF
DISRUPTIVE RESTART?: Epsilon is a stateless service; Magneto will resubmit the task. Epsilon
itself is a collection of services and handles the crash restart of its internal services in a
stateful manner.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 5.10.
Ergon
KEY ROLE: Task service
DESCRIPTION: Ergon was introduced in 4.6 as the unified task management framework for
all AOS services. This document contains relevant information on Ergon.
PORTS USED: 2090
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 4.6.
Foundation
KEY ROLE: Automated re-imaging of nodes and creating or expanding clusters.
DESCRIPTION: Foundation originally was only a standalone service that ran within a VM that
Nutanix SEs would deploy on their laptops, travel to the customer's datacenter, and use to
perform the listed tasks. Foundation has since been integrated as a service within the CVM as
well, to put most of that functionality in the hands of the customer (the standalone VM also
still exists). The behavior of the two versions is quite similar but diverges in some ways, so
this description will specifically discuss the CVM-integrated service.
Foundation provides a simplified and automated method of setting initial IP addresses
(of CVMs, Hosts, and IPMI) and optionally re-imaging nodes before creating a cluster out
of them using information provided by the user.
Foundation simplifies the cluster expansion process by re-imaging discovered nodes to
match the software versions of the existing cluster the node is being added to. When
imaging is complete, it interacts with Genesis to tell the cluster that the new node is now
compatible and ready to actually be added into the cluster.
The Prism "Convert Cluster" feature in Nutanix clusters (internally called "DIAL") allows a
user to convert an ESXi cluster into an AHV cluster, or an AHV cluster to an ESXi cluster,
utilizing Foundation services to perform the necessary re-imaging.
The Foundation service also provides Genesis with utilities to perform firmware upgrades
or other tasks that require direct host access by rebooting the node to a Phoenix ISO
that will inherit and automatically apply the node's current networking configuration.
The Foundation service interacts with Genesis through RPCs to tell Genesis to perform tasks
such as changing IP addresses or creating the cluster once Foundation has completed imaging
on all nodes. Genesis, in turn, interacts with Foundation via HTTP REST calls to start imaging
sessions for cluster expansion or DIAL, as well as to use the Phoenix utilities. The Foundation
service should always be running on an "unconfigured" node (a node that is not part of a
cluster) and is selectively turned on by Genesis on configured nodes only when the user
attempts to perform actions that would require the Foundation service.
PORTS USED: 8000, 9441
DEPENDENCIES: Partial dependencies with Genesis
DISRUPTIVE RESTART?: Not to cluster I/O. If restarted during an imaging session you may
have to restart that imaging session.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: AOS 4.5 (came with
Foundation 3.0). The Foundation service is backward-compatible with AOS versions all the way
back to its introduction in 4.5, which means you can upgrade Foundation on a cluster
regardless of AOS version so long as AOS is at least 4.5.
Genesis
KEY ROLE: Cluster component & service manager
DESCRIPTION: Genesis is a process which runs on each node and is responsible for any
service interactions (start/stop/etc.) as well as for the initial configuration. Genesis is a
process which runs independently of the cluster and does not require the cluster to be
configured/running. The only requirement for Genesis to be running is that Zookeeper is up
and running. The cluster_init and cluster_status pages are displayed by the Genesis process.
PORTS USED: 2100
DEPENDENCIES:
DISRUPTIVE RESTART?: No, however any upgrades that were not completed or cleanly
aborted will be restarted.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Hades
KEY ROLE: Hades is a disk monitoring service that handles all base hard drive operations,
such as disk adding, formatting, and removal.
DESCRIPTION: Hades' purpose is to simplify the break-fix procedures for disks and to
automate several tasks that previously required manual user action.
PORTS USED: 2099
DEPENDENCIES:
DISRUPTIVE RESTART?: No
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 4.0.2.1 and 4.1
Hyperint
KEY ROLE: Provides abstraction over the details of the underlying hypervisor and provides a
uniform interface for its clients
DESCRIPTION: Hyperint is responsible for sending commands to the hypervisor and fetching
stats and configuration information. Each node will run the Hyperint service. Stats and
configuration information are sent to Arithmos every 20 seconds. Prism has the ability to
query Hyperint directly. The Hyperint architecture can be found here.
NOTE: As of 4.1 (?), Hyperint doesn’t run as an independent service. The logical functionality
was subsumed into Acropolis.
PORTS USED: 2030 (Hyperint OR Acropolis), 2031 (Hyperint Monitor), 2032 (Hyperint JMX),
2033 (Acropolis Hyperint slave)
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Functionality moved to
Acropolis in version 4.1
Insights Server
KEY ROLE: Service for storing (and retrieving) configuration, stats (time-series data).
DESCRIPTION: The insights server (or insights data fabric, aka IDF) provides a data fabric for
storing and retrieving data. Currently (as of Asterix, aka 5.0), configuration information is
stored in IDF on the PE and PC; and metrics are stored in IDF on the PC. IDF caches
configuration and recently used metrics for fast access. The underlying persistence is
handled by Cassandra. There are ongoing efforts to store alert data in IDF (future, written
Jan 25, 2017).
PORTS USED: 2027
DEPENDENCIES:
DISRUPTIVE RESTART?: Disruptive to stop on FSVMs in AFS Clusters. This will drop all client
connections to the FSVM.
AOS VERSION WHEN INTRODUCED OR REMOVED: Introduced in 4.6
Kanon
KEY ROLE: Protection of entities based on protection policy
DESCRIPTION: Responsible for protecting entities after a protection policy is applied. The
Cerebro service takes and replicates the snapshot to the desired Availability Zone.
PORTS USED: 2077
DEPENDENCIES: IDF, Cerebro (PE), Aplos (for transporting the request from PC to PE Cerebro)
DISRUPTIVE RESTART?: No
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 5.10
Lazan
KEY ROLE: DRS-like advanced scheduling in AHV clusters in AOS 5.0 and newer.
DESCRIPTION: Generates plans for VM scheduling/placement. Considers affinity/anti-affinity
rules. Detects anomalies/hot spots for remediation.
PORTS USED: 2038
DEPENDENCIES: Acropolis
DISRUPTIVE RESTART?: No
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: Introduced in 5.0
Minerva CVM
KEY ROLE: Minerva is a Scale Out File Server for Nutanix.
DESCRIPTION: Minerva is intended to run initially on a Nutanix cluster and provides SMB
file services to Windows clients, often running as VDIs on Nutanix clusters. Minerva consists
of a federation or cluster of user VMs, called NVMs, running on nodes in a Nutanix cluster.
The design doc can be found here. Minerva consists of the following local services:
PORTS USED: 7501 (Minerva CVM RPC service port), 7502 (Minerva NVM RPC service port),
7503 (Minerva Store RPC service port)
DEPENDENCIES: Minerva needs Ergon, InsightsDB, Uhura, Acropolis, Stargate, and Cerebro to
be running. Not all are needed at all times; some of the services are used as needed. For
example, Uhura and Cerebro are used more during file server deployment or update.
DISRUPTIVE RESTART?:
AOS VERSION WHEN INTRODUCED OR REMOVED: Introduced in 4.6
Nutanix Guest Tools
KEY ROLE: Nutanix Guest Tools (NGT) is a software-based in-guest agent framework which
enables advanced VM management.
DESCRIPTION: The solution is composed of the NGT installer which is installed on the VMs
and the Guest Tools Framework which is used for coordination between the agent and
Nutanix platform.
PORTS USED: 2073 (internal, inter-CVM), 2074 (SSL, for guest VMs) on CVM IP
DEPENDENCIES: Acropolis, Hyperint, Arithmos, Cerebro
DISRUPTIVE RESTART?:
VERSION WHEN INTRODUCED OR REMOVED: Added in 4.6
The NGT installer contains the following components:
Guest Agent Service
Self-service Restore (SSR) aka File-level Restore (FLR) CLI
VM Mobility Drivers (VirtIO drivers for AHV)
VSS Agent and Hardware Provider for Windows VMs
App-consistent snapshot support for Linux VMs (via scripts to quiesce)
This framework is composed of a few high-level components:
Guest Tools Service - Gateway between the Acropolis and Nutanix services and the Guest
Agent. Distributed across CVMs within the cluster with an elected NGT Master which runs on
the current Prism Leader (the CVM hosting the cluster vIP).
Guest Agent - Agent and associated services deployed in the VM's OS as part of the NGT
installation process. Handles any local functions (e.g. VSS, Self-service Restore (SSR), etc.)
and interacts with the Guest Tools Service.
Pithos
KEY ROLE: vDisk configuration manager
DESCRIPTION: Pithos is responsible for vDisk (DSF file) configuration data. Pithos runs on
every node and is built on top of Cassandra.
PORTS USED: 2016
DEPENDENCIES: Zookeeper, Cassandra
DISRUPTIVE RESTART?: Yes. Must be restarted only in a rolling-fashion allowing at least
several minutes for the service to stabilize.
AOS VERSION WHEN INTRODUCED OR REMOVED: Present since January 2012
Prism
KEY ROLE: UI and API
DESCRIPTION: Prism is the management gateway for components and administrators to
configure and monitor the Nutanix cluster. This includes nCLI, the HTML5 UI, and the REST API.
Prism runs on every node in the cluster and uses an elected leader like all components in
the cluster.
PORTS USED: Prism listens on ports 80 and 9440; if HTTP traffic comes in on port 80, it is
redirected to HTTPS on port 9440.
DEPENDENCIES: For web console access, cluster must be running
DISRUPTIVE RESTART?: Disruptive to accessing the interface.
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED: In product since day 1
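The port 80 to 9440 redirect described above is ordinary HTTP 301 behavior. As a self-contained illustration of that pattern (this stands up a throwaway local server rather than talking to a real CVM; "cvm.example" is a placeholder hostname, not a real endpoint):

```python
import http.client
import http.server
import threading

class RedirectToHTTPS(http.server.BaseHTTPRequestHandler):
    """Toy handler mimicking the redirect pattern: plain HTTP requests
    get a 301 pointing at the HTTPS port (9440)."""
    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "https://cvm.example:9440" + self.path)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

# Serve on an ephemeral local port in a background thread.
srv = http.server.HTTPServer(("127.0.0.1", 0), RedirectToHTTPS)
threading.Thread(target=srv.serve_forever, daemon=True).start()

# A plain-HTTP client sees the redirect instead of the real content.
conn = http.client.HTTPConnection("127.0.0.1", srv.server_address[1])
conn.request("GET", "/console")
resp = conn.getresponse()
print(resp.status, resp.getheader("Location"))
# prints: 301 https://cvm.example:9440/console
srv.shutdown()
```

On a live cluster the equivalent check would be a HEAD request against port 80 of the cluster VIP and inspecting the Location header.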
SSL Terminator
KEY ROLE: Manages SSL certificates between all the CVMs to make sure all Prism instances
are synced from a security standpoint. Load-balances requests for Prism services.
DESCRIPTION: User access to VM consoles depends on the SSL Terminator.
PORTS USED: 9443
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Stargate
KEY ROLE: Data I/O manager
DESCRIPTION: Stargate is responsible for all data management and I/O operations and is
the main interface from the hypervisor (via NFS, iSCSI, or SMB). This service runs on every
node in the cluster in order to serve localized I/O.
PORTS USED: 2009 (inter-node and inter-component comm), 2049 (NFS), 3260 (iSCSI IET
port), 3261 (iSCSI discovery), 3205 (iSCSI redirection), 445 (SMB adapter)
DEPENDENCIES: Zookeeper (configuration), Genesis (initial disk setup), Cassandra
(metadata), Pithos (vdisk config), Arithmos (stats)
DISRUPTIVE RESTART?: Yes. Must be restarted only in a rolling-fashion allowing at least
several minutes for the service to stabilize.
AOS VERSION WHEN INTRODUCED OR REMOVED: Present since day 1
Sys Stat Collector
KEY ROLE: Script used to collect a number of system stats on CVMs
DESCRIPTION: Sys stat collector is a service/script used to collect statistics on each CVM.
Stats are collected and placed in the /home/nutanix/data/logs/sysstat directory.
ENG-2693 describes when and why this service was added.
PORTS USED:
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Uhura
KEY ROLE: Single VM management back-end service.
DESCRIPTION: Manages VM customization such as Sysprep for Windows VMs and Cloud-init
for Linux VMs.
PORTS USED: 2037
DEPENDENCIES:
DISRUPTIVE RESTART?:
AOS VERSION WHEN IT WAS INTRODUCED OR REMOVED:
Vulcan
KEY ROLE: Watches for triggers and executes Playbook actions
DESCRIPTION: Back end service to watch for triggers and execute actions associated with
Playbooks
PORTS USED: 2055
DEPENDENCIES: IDF, Zookeeper, Ergon
DISRUPTIVE RESTART?: Yes. Could terminate some running Playbooks.
AOS VERSION INTRODUCED/REMOVED: introduced in 5.11
X-trim
KEY ROLE: To improve the performance of lower-DWPD drives
DESCRIPTION: A background service which continuously issues fstrim requests in small
chunks.
PORTS USED:
DEPENDENCIES: NGT installed in the UVM if trim requests need to be triggered from the UVM.
DISRUPTIVE RESTART?: No
AOS VERSION INTRODUCED/REMOVED: Introduced in 5.11 (may be moved into Hades in
the future)
Zookeeper
KEY ROLE: Cluster configuration manager
DESCRIPTION: Zookeeper stores all of the cluster configuration including hosts, IPs, state,
etc. and is based upon Apache Zookeeper. This service runs on three nodes in the cluster,
one of which is elected as a leader. The leader receives all requests and forwards them to its
peers. If the leader fails to respond, a new leader is automatically elected. Zookeeper is
accessed via an interface called Zeus.
PORTS USED: 9876, 9877, 2888, 3888
DEPENDENCIES: Genesis
DISRUPTIVE RESTART?: Yes. The utmost care should be taken when restarting this service,
and this should only be done following procedures documented in a valid KB or ISB.
AOS VERSION INTRODUCED/REMOVED: Has been part of the architecture since day 1.
To find leaders for several of the services
Traditional leaderships
( ztop=/appliance/logical/leaders; for z in $(zkls $ztop | egrep -v 'vdisk|shard'); do
[[ "${#z}" -gt 40 ]] && continue; leader=$(zkls $ztop/$z | grep -m1 ^n) || continue;
echo "$z" $(zkcat $ztop/$z/$leader | strings); done | column -t; )
Zeus leader
$ cluster status | grep Leader
Prism leader
$ curl localhost:2019/prism/leader && echo    (run on any CVM)
Genesis leader (from kb 2870)
$ zkls /appliance/logical/pyleaders/genesis_cluster_manager
n_0000000003
n_0000000004
n_0000000011
Insert the first value in the following command:
nutanix@NTNX-13AM3K090049-1-CVM:10.4.68.243:~$ zkcat /appliance/logical/pyleaders/genesis_cluster_manager/n_0000000003
10.4.68.244
or
$ zkls /appliance/logical/pyleaders/genesis_cluster_manager | head -1|xargs -I% zkcat
/appliance/logical/pyleaders/genesis_cluster_manager/%
Run zkls /appliance/logical/pyleaders to see which leaders can be found using this command.
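The n_… children shown above are ZooKeeper sequential znodes; the contender with the lowest sequence number holds the leadership, which is why taking the first entry of the (sorted) zkls output works. A minimal Python sketch of that selection logic, operating on the sample names above (illustrative only, no ZooKeeper connection involved):

```python
def leader_node(children):
    """Given ZooKeeper sequential-node names like 'n_0000000003',
    return the leader: the name with the lowest sequence number."""
    return min(children, key=lambda name: int(name.split("_")[1]))

# Sample children as listed by zkls for genesis_cluster_manager:
children = ["n_0000000004", "n_0000000011", "n_0000000003"]
print(leader_node(children))  # prints: n_0000000003
```

Because zkls prints the children sorted by name and the sequence numbers are zero-padded to a fixed width, `head -1` in the shell one-liner selects the same node.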
Curator master (from kb 4257)
$ curator_cli get_master_location
Cerebro master (from kb 4257)
$ cerebro_cli get_master_location
Alerts manager, arithmos, cassandra_monitor, cerebro, insights, curator, nfs_namespace,
nutanix_guest_tools, pithos, prism_monitor, zookeeper (kb 4355)
$ export ZKLDRPATH="/appliance/logical/leaders"; for i in `zkls $ZKLDRPATH | grep -v vdisk`; do
len=`echo $i | wc -c`; if [ $len -gt 40 ]; then continue; fi; lnode=`zkls "$ZKLDRPATH/$i" | head -1`;
ldr=`zkcat $ZKLDRPATH/$i/$lnode`; echo $i $ldr; done
lcm leader
$ zkls /appliance/logical/pyleaders/lcm_leader
Acropolis master
$ links -dump http://0:2030/sched | grep Master
Minerva Leader (from KB 3094) from CVM
$ minerva get_leader
Python leaderships
( ztop=/appliance/logical/pyleaders; for z in $(zkls $ztop); do leader=$(zkls $ztop/$z
| grep -m1 ^n) || continue; echo "$z" $(zkcat $ztop/$z/$leader | strings); done |
column -t)