Achieving very high (>99.99%) operational reliability in any major network infrastructure (e.g. data center, enterprise, or campus) requires an always-on active system that performs end-to-end functional testing of every network-connected infrastructure component. Designing and deploying such a system is a major challenge: it must monitor the infrastructure and the external services it depends on with high accuracy and granularity (down to the packet level), while consuming as few computational and network resources as possible.
For packet loss detection, the metrics reported by the original manufacturers cannot be relied upon: their tools may be buggy or, in most cases, provide no APIs for extracting measurements. We therefore needed to create our own tool, and this is the gap that Arachne fills.
In this talk, we present Arachne, a system for detecting packet loss and underperforming paths. It provides fast and easy active end-to-end functional testing of all the components in Data Center (DC) and Cloud infrastructures. Arachne can detect intra-DC, inter-DC, DC-to-Cloud, and DC-to-External-Services issues while generating only minimal traffic.
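To make the idea of flagging lossy paths from active probes concrete, here is a minimal sketch. It is not Arachne's implementation: the `PathStats` structure, the `underperforming_paths` helper, the path names, and the 1% loss threshold are all assumptions chosen for illustration. It only shows how per-path probe counters could be reduced to a loss rate and ranked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStats:
    """Probe counters for one source-to-target path (hypothetical structure)."""
    source: str
    target: str
    sent: int       # probes sent on this path
    received: int   # replies received for those probes

    @property
    def loss_rate(self) -> float:
        # Fraction of probes that got no reply; 0.0 when nothing was sent.
        if self.sent == 0:
            return 0.0
        return (self.sent - self.received) / self.sent


def underperforming_paths(stats, loss_threshold=0.01):
    """Return paths whose loss rate exceeds the assumed threshold, worst first."""
    flagged = [s for s in stats if s.loss_rate > loss_threshold]
    return sorted(flagged, key=lambda s: s.loss_rate, reverse=True)


if __name__ == "__main__":
    # Made-up counters covering intra-DC, inter-DC, and DC-to-Cloud paths.
    sample = [
        PathStats("dc1-a", "dc1-b", sent=100, received=100),
        PathStats("dc1-a", "dc2-a", sent=100, received=95),
        PathStats("dc1-a", "cloud-x", sent=200, received=150),
    ]
    for path in underperforming_paths(sample):
        print(f"{path.source} -> {path.target}: {path.loss_rate:.1%} loss")
```

In practice the threshold and the number of probes per path would be tuned against the traffic budget, since fewer probes mean less overhead but a coarser loss estimate.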