[White paper] Detecting problems in industrial networks through continuous monitoring

Automation networks carry a range of real-time applications and data, which makes continuous monitoring of quality of service necessary. QoS (Quality of Service) parameters address traffic priorities, bandwidth allocation and network latency control. Several QoS parameters characterize a computer network and can be used for monitoring purposes.

Each SCADA network, in a healthy state, presents a specific QoS profile that rarely changes, given the repetitive nature of IACS operations. Continuous monitoring of the QoS parameters of an automation network may anticipate problems such as malware contamination and failures of equipment such as switches and routers. It is very important to be aware of these changes in behavior in order to receive alerts and handle them promptly, avoiding incidents that could compromise the operation of the network and be financially or environmentally costly.

In addition to monitoring network traffic, it is also necessary to monitor the resource consumption of critical servers, such as processing (CPU), memory and storage capacity, as well as hard disk failures, among others.

This work aims to establish a method by which SCADA security professionals can differentiate and qualify any problems that may be occurring, through continuous monitoring of the performance parameters of the automation network, giving a more behavioral approach than current signature-based ones.

We present a series of tests conducted in our laboratories to measure the performance parameters of a simulated automation network using a small SCADA network sandbox. First we measured the normal operating parameters of the network and captured its main graphs with the proper tools. In a second step we carried out several attacks against the simulated automation network. During all attacks we collected the operating parameters of the network and its main graphs.

At the conclusion of the work we compared the graphs of the network in a healthy state with the graphs of the network during the security incidents described above. We detailed how the network parameters were affected by each kind of incident and built a table showing how the main parameters of an automation network were affected by the attacks.

Published in: Technology

DETECTING PROBLEMS IN INDUSTRIAL NETWORKS THROUGH CONTINUOUS MONITORING

Jan Seidl (1), Marcelo Ayres Branquinho (2)

Keywords: Monitoring, SCADA, Security, Malware, Attacks.

(1) CTO at TI Safe Segurança da Informação Ltda, Brazil (http://br.linkedin.com/in/janseidl)
(2) CEO at TI Safe Segurança da Informação Ltda, Brazil (http://br.linkedin.com/in/marcelobranquinho)
1 ABOUT AUTOMATION NETWORK MONITORING

Automation network monitoring is the term used to describe a system that continuously monitors an automation network and notifies the network administrator when a device fails or an outage occurs. This notification is normally delivered through messaging systems (usually e-mail and SNMP traps). Network monitoring is normally done with dedicated software applications and/or commercial tools. The ping command, for example, is a simple network monitoring tool.

1.1 ASSETS TO BE MONITORED

To monitor a control system we first need to know exactly which devices exist on the network and how they communicate with each other. Almost every piece of networked hardware in an industrial plant can be monitored. From SCADA servers and supervisory/control stations to PLCs, countless items can be monitored to help us prevent and quickly respond to incidents.

2 PREPARING THE MONITORING ENVIRONMENT

Monitoring data can be very intensive and frequent. For that reason, it is strongly recommended to create an entirely separate network segment exclusively for monitoring. This prevents monitoring data from interfering with legitimate control/supervisory traffic at the network level and can help isolate that traffic from sniffing and other attacks.

An appropriate number of servers must be set up according to the number of assets and locations to be monitored. Most solutions can operate in high-availability and high-performance clustered modes.

It is important to determine each asset's processing and network capacity in order to decide whether an agent-based approach can be used or whether passive monitoring (ping or SNMP monitoring) must be used.

Keep in mind that monitoring solutions usually pair with a database back end and that, for performance reasons, databases should never share their data disk with another application.

Writing up an industrial traffic matrix is also recommended, in order to see at a glance which assets need to communicate with which other assets and with which function codes, so that the monitoring triggers can be tuned.

Below is an example of an industrial traffic matrix:

Source          Destination     Function Codes
192.168.1.15    192.168.1.1     3, 16
192.168.1.18    192.168.1.1     3

Table 1: Sample industrial traffic flow matrix
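A traffic matrix like the one above can double as an allow-list for monitoring triggers. The following is a minimal sketch in Python, not the authors' implementation: the name is_allowed and the dictionary layout are assumptions made for illustration, and the entries simply mirror the two sample rows of the matrix.

```python
# Hypothetical allow-list built from the two sample rows of the traffic matrix.
# Key: (source IP, destination IP); value: set of permitted Modbus function codes.
TRAFFIC_MATRIX = {
    ("192.168.1.15", "192.168.1.1"): {3, 16},
    ("192.168.1.18", "192.168.1.1"): {3},
}


def is_allowed(src_ip, dst_ip, func_code, matrix=TRAFFIC_MATRIX):
    """Return True only if the matrix lists this flow with this function code."""
    return func_code in matrix.get((src_ip, dst_ip), set())


if __name__ == "__main__":
    # A write (function code 16) from a host that is only expected to read.
    print(is_allowed("192.168.1.18", "192.168.1.1", 16))
```

Any flow for which is_allowed returns False is a candidate for an alert, whether the source is unknown or the function code is not expected from that source.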
2.1 MONITORING THE MACHINE HEALTH

Machine health monitoring can help prevent issues that could interrupt running programs and disrupt supervisory or control operations. With active performance monitoring, issues can be predicted and solved before they cause an outage.

Commonly monitored items include free/used disk space (applications may crash if they cannot write temporary files); disk I/O (may indicate low memory [paging] or data extraction); logged-on users; number of failed login attempts (may indicate system compromise); number of incoming connections; number of outgoing connections; incoming/outgoing packet rate (may indicate data extraction, illegal connections or even malware); and CPU and memory usage (may indicate worms/rootkits).

2.2 MONITORING OPERATING SYSTEM ERRORS

Error monitoring can be very useful for anticipating hardware failures. Since industrial plants must schedule a stop in order to perform hardware maintenance, it is better to do the work in that window than in a hurry after the hardware fails in production.

Indicators of hardware problems include memory commit/allocation errors and disk read/write errors, as well as CPU temperature and fan speeds, disk temperatures, memory temperature and the like.

2.2 MONITORING PROCESSES

Key processes can also be monitored to check that they have not crashed, so that the crew can be alerted right after a crash or the process can be restarted automatically (this must be used with extreme caution because it may cause inconsistencies).

Known suspicious process names and ports can also be monitored, such as RDP, HTTP/HTTPS, TeamViewer, "cmd.exe" and Windows PowerShell processes, triggering an alert if present, since they could indicate unauthorized remote access.

2.3 MONITORING HIGH AVAILABILITY

Communication link states can be checked to see whether the plant's network has entered a contingency state. The monitoring agent can perform automated tasks if needed.

2.4 MONITORING MODBUS TRAFFIC

You can set up a host to act as a network sniffer by mirroring all Modbus traffic to its switch port. A simple Modbus sniffer can be built using pure Python and Scapy in order to dump function codes, sources and destinations.

With Modbus monitoring you can create alerts on disallowed function codes, tag values and sources, and can also get a graphical representation of the frequency of commands sent and received.
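To illustrate how a sniffer can pull the function code out of a captured packet, the sketch below parses the Modbus/TCP header from a raw TCP payload using only the Python standard library. It is an assumption-laden illustration rather than the sniffer described here: the capture step (done with Scapy in the setup above) is left out, and the name parse_modbus_tcp and the returned dictionary keys are hypothetical.

```python
import struct

# Modbus/TCP frame start: transaction id (2 bytes), protocol id (2), length (2),
# unit id (1), followed by the function code (1). All fields are big-endian.
HEADER_FORMAT = ">HHHBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes


def parse_modbus_tcp(payload):
    """Parse a raw TCP payload; return None if it does not look like Modbus/TCP."""
    if len(payload) < HEADER_SIZE:
        return None
    tid, proto, length, unit_id, func = struct.unpack(
        HEADER_FORMAT, payload[:HEADER_SIZE]
    )
    if proto != 0:  # the protocol identifier is always 0 for Modbus
        return None
    return {
        "transaction_id": tid,
        "unit_id": unit_id,
        "func_code": func & 0x7F,          # low 7 bits carry the function code
        "is_exception": bool(func & 0x80),  # high bit marks an exception response
        "data": payload[HEADER_SIZE:],
    }


if __name__ == "__main__":
    # Example request: read 0x0064 registers starting at address 0x3000.
    frame = bytes.fromhex("000100000006010330000064")
    print(parse_modbus_tcp(frame))
```

The func_code field is what an alert on disallowed function codes would inspect, together with the source and destination addresses taken from the IP header.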
2.5 PLC SNMP TRAPS MONITORING

Some PLCs offer SNMP (Simple Network Management Protocol) monitoring. Available items include network I/O, discarded packets, unknown protocols received, network errors, allocation tables (useful against ARP poisoning) and fragmentation. These indicators (especially the error ones) can promptly tell whether something is happening.

2.6 PLC ICMP (PING) MONITORING

For PLCs that do not support SNMP monitoring, simple ping monitoring can be used to detect device connectivity as well as response times that could indicate a device overload due to a DoS attack.

2.7 DISTRIBUTED MONITORING

Zabbix (open source software) can be configured for distributed monitoring. This means that every automation plant can have its own monitoring station reporting data up to a central station. This can be very useful, as you can have self-regulated distributed monitoring stations reporting to the monitoring station at the company's main office.

2.8 CHECKING FREQUENCY

Depending on the load on the machine, items can be configured to be polled at a defined interval. Servers with a lighter load can be checked more often (every 15 or 30 seconds) and servers with a higher load less often (every 1 minute or more) to preserve the machine's computational power and bandwidth.

2.9 ALERTING

Besides monitoring and plotting data, e-mail, SMS or Jabber alerts can be configured to notify the response team (Jabber is the original name of the Extensible Messaging and Presence Protocol, XMPP, the open technology for instant messaging and presence). With a little effort, triggers can also drive sound alarms or any other type of alerting method.

2.10 ESCALATIONS

Alerts can also be configured to escalate to other people in case the primary response team takes too long to resolve the issue. If a trigger remains active after a configured time, e-mail alerts can be sent automatically to the main office's response team or even to the manufacturer's response team, or gradually climb the hierarchy until the problem is solved.

2.11 ITEM GROUP ALERTING

Items that share the same role can be grouped together, making alerting more targeted. You can have all database-related items triggering alerts for the database team, SCADA-related items for the automation team, and so on. Escalation can be applied here too, so that higher support levels can be contacted if the first responsible team cannot solve the issue in time.
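The escalation idea can be sketched as a list of steps, each with a delay after which an additional team is notified. This is only an illustration of the logic in Python; the delays, team names and the function recipients_for are assumptions, and a monitoring tool such as Zabbix would express the same thing in its own action configuration.

```python
# Hypothetical escalation steps: (seconds the trigger has been active, who to notify).
ESCALATION_STEPS = [
    (0, "plant response team"),
    (900, "main office response team"),
    (3600, "manufacturer response team"),
]


def recipients_for(active_seconds, steps=ESCALATION_STEPS):
    """Return every team that should have been notified by now."""
    return [team for delay, team in steps if active_seconds >= delay]


if __name__ == "__main__":
    # A trigger that has been active for 20 minutes has reached the second step.
    print(recipients_for(20 * 60))
```

Grouping items by role amounts to keeping one such list per group, for example one for database items and another for SCADA items.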
3. THE TEST BED

"Test bed" is the name given to a structured test platform for running experiments in a safe and controlled manner. The structure used for this work is composed of elements that emulate the behavior of a real automation network and represent a replica of the real world of industrial processes. Due to factors specific to SCADA environments, such as the criticality of real-time systems and the need for uninterrupted availability, test beds are ideal platforms for observing the behavior of systems and analyzing control system components.

3.1 THE TEST ENVIRONMENT

The test structure in the TI Safe Laboratory includes the field equipment, consisting of a Wago 741-800 PLC and hardware simulating an industrial natural gas plant (Tofino SCADA Security Simulator); a physical Windows 7 station acting as the supervisory station; a monitoring server (Debian Linux 6); and a Modbus traffic sniffer server (Debian Linux 6 with a Python script and Scapy). Both servers are virtual machines.

Picture 1: The SCADA Security Simulator used for the tests

3.1.1 THE TEST NETWORK

The configured network has no segmentation, whether by subnetting, routing or VLANs. All connected equipment is on the same "flat" network within the same IP address range (192.168.1.0/24).

Diagram 1: The test network
3.1.2 THE ATTACKER MACHINE

The attacker machine is an HP laptop connected directly to the switch (not shown in the diagram above) and running Kali Linux 1.0 from a Live CD. Below is the list of software used in the tests:

Software / Tool   Description                Attack                                  Author
Hping3            ICMP flood tool            Network Layer 3 denial of service       http://www.hping.org/
T50               Flood tool                 Network Layer 3 denial of service       https://github.com/merces/t50
Meterpreter       Remote access shell        Remote compromise, malware infection    http://www.metasploit.com/
Arpspoof          ARP poison/spoofing tool   ARP poison                              http://arpspoof.sourceforge.net/
Pymodbus          Modbus Python library      Unauthorized Modbus traffic             https://github.com/bashwork/pymodbus

Table 2: List of software used in the tests

3.1.3 THE MONITORING SERVER

The monitoring server is built on top of a Debian Linux machine running the open source monitoring solution Zabbix 2.0.6, with MySQL 5.1 as the data back end. Ideally, in a production environment, the back ends of the monitoring solution would be split across servers for performance and isolation reasons.

Diagram 2: Monitoring data flow. The arrow indicates whether data is remotely collected or sent by an agent on the asset.

Data is collected either actively by Zabbix agents or passively by ICMP and SNMP queries. The collected data is then fed directly to Zabbix, where it is normalized, graphed and stored.
SNMP MIBs (Management Information Bases) on the PLC were enumerated with the snmpwalk tool and converted to a Zabbix template with the zload_snmpwalk Perl script (https://www.zabbix.com/wiki/howto/monitor/snmp/zload_snmpwalk).

3.1.4 THE NETWORK SNIFFER

The network sniffer consists of a simple Linux installation connected to a specific switch port that is configured to receive a mirror of every other port that carries Modbus traffic. It runs an in-house sniffer written in Python on top of Scapy (http://www.secdev.org/projects/scapy/), a packet manipulation program.

The sniffer is able to decode Modbus traffic and output information in the following format:

{request: 0330000064, unit_id: 1, src_ip: 192.168.1.15, dst_ip: 192.168.1.1, response: 03c80006005000160012019000120050005000160012001201901130cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd00cd008801004008110000012010480000800000024000000200000000200000000000200410000450200000000200000020004002000000002000408000000040800200000020000000000800cd0010800000cd0000440000, func_code: 3}

The sniffer consists of three modules, launched as forked instances in a multiprocess design: the Sniffer, the Workers and the Publisher. They were built this way to let the operations run in isolation without blocking each other. The main process is the sniffer built on top of Scapy. This module feeds an IPC queue that is later consumed by the workers (100 instances), which add the transactions to the pool for summarization. The publisher gets the summary, reports it to Zabbix for monitoring and graphing and then flushes it so that the cycle begins again. This way alerts can be set up if invalid function codes are detected.
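The decoding and summarization steps can be sketched in a single process as follows. This is a simplified illustration, not the sniffer's actual code: the IPC queue, the worker pool and the reporting to Zabbix are left out, and the names decode_read_request, summarize and invalid_codes are hypothetical. The sample request 0330000064 decodes to function code 3, start address 0x3000 and a quantity of 100 registers, which matches the response shown above starting with 03 c8 (0xc8 = 200 data bytes, two per register).

```python
from collections import Counter


def decode_read_request(request_hex):
    """Decode a read request such as '0330000064' into (function, start, quantity)."""
    pdu = bytes.fromhex(request_hex)
    func_code = pdu[0]
    start_address = int.from_bytes(pdu[1:3], "big")
    quantity = int.from_bytes(pdu[3:5], "big")
    return func_code, start_address, quantity


def summarize(transactions):
    """Count transactions per (source, destination, function code)."""
    summary = Counter()
    for t in transactions:
        summary[(t["src_ip"], t["dst_ip"], t["func_code"])] += 1
    return summary


def invalid_codes(summary, allowed_codes):
    """Return the function codes seen in the summary that are not allowed."""
    return sorted({fc for (_, _, fc) in summary if fc not in allowed_codes})


if __name__ == "__main__":
    seen = [
        {"src_ip": "192.168.1.15", "dst_ip": "192.168.1.1", "func_code": 3},
        {"src_ip": "192.168.1.15", "dst_ip": "192.168.1.1", "func_code": 3},
        {"src_ip": "192.168.1.99", "dst_ip": "192.168.1.1", "func_code": 5},
    ]
    print(decode_read_request("0330000064"))
    print(invalid_codes(summarize(seen), allowed_codes={3, 16}))
```

In a publisher-style loop, the summary would be reported and then reset after each interval, so that each reported value reflects only the most recent cycle.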
The sniffer could also be reprogrammed to output more data, such as register-write values.

3.2 ATTACKS PERFORMED

The following attacks were performed against the simulated industrial network assets:

Attack                                   Attack Vector                                                                          Affected Assets
Communications interception              ARP poison                                                                             PLC, Supervisory Stations
PLC Denial of Service                    Layer 3 network flood, 0day                                                            PLC
Supervisory station malware infection    Modbus malware, Meterpreter shell backdoor                                             Supervisory Stations, Network
Supervisory station compromise           Meterpreter shell backdoor                                                             Supervisory Station
Unauthorized remote logon                Enabling remote desktop on machine, accessing machine from other machine on network   Supervisory Station
Unauthorized Modbus traffic              Sending commands from attacker machine                                                 PLC

Table 3: Attacks performed.
3.3 TEST RESULTS

As testing usually involves some information gathering, we started the tests by running a simple scan with nmap.

$ nmap -sV 192.168.1.1

Without any rate limiting, three triggers went off: Abnormal incoming traffic, Abnormal outgoing traffic and TCP Connection number change.

Picture 2: Triggers fired by nmap scan
Picture 3: Peaks generated by the nmap scan on TX/RX, along with some TCP RSTs

3.3.1 COMMUNICATIONS INTERCEPTION

Our ARP table checking script did its job and reported the changed MAC address for 192.168.1.1 (PLC) when we poisoned the Supervisory Station with arpspoof.
Picture 4: Trigger for ARP changes

The following UserParameter entry was added to the Supervisory Station's Zabbix agent configuration file for ARP testing:

UserParameter=arpcheck[*],ping -n 1 -w 1 $1 > NUL & for /f "tokens=2" %i in (arp -a | findstr /r "$1>") do @echo %i

Items such as "arpcheck[192.168.1.1]" can then be created to get the MAC address that the ARP table holds for this IP. A trigger is created to fire on changes to this value.

3.3.2 PLC DENIAL OF SERVICE

The ICMP flood attack was very noisy, as expected. Several triggers went off during the attack. Graphs clearly showed abnormal peaks.

Picture 5: Green peak shows 30 Mbps peak from flood
Picture 6: Errors caused by the network overflow

SNMP data is not available while the PLC is under a DoS attack because the SNMP client cannot connect to the device to collect data, so those triggers will clear after the attack has ceased and are considered collateral triggers.

Picture 7: Triggers fired after DoS attack

To be alerted as soon as the device comes under a denial-of-service attack, we periodically ping the PLC. If the PLC does not respond within a certain timeout, it is considered offline and a trigger is issued.
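The decision applied to each ping result can be sketched as below. This is only an illustration of the rule in Python: sending the ping is left to the monitoring tool, and the function name classify_ping and both thresholds are assumed values, not the ones used in the tests.

```python
def classify_ping(rtt_ms, timeout_ms=1000.0, slow_ms=50.0):
    """Classify one ping result.

    rtt_ms is the round-trip time in milliseconds, or None if no reply arrived.
    timeout_ms and slow_ms are hypothetical thresholds to be tuned per device.
    """
    if rtt_ms is None or rtt_ms > timeout_ms:
        return "offline"  # no answer within the timeout: fire the trigger
    if rtt_ms > slow_ms:
        return "slow"     # answering, but possibly overloaded
    return "ok"


if __name__ == "__main__":
    for sample in (2.5, 180.0, None):
        print(sample, classify_ping(sample))
```

The intermediate "slow" state reflects the earlier point that rising response times can indicate a device overload before connectivity is lost entirely.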
Picture 8: Trigger from PLC ping fail

3.3.3 SUPERVISORY STATION MALWARE INFECTION

As ICS network traffic is mostly homogeneous, we can set fairly tight thresholds on network input and output variance. Most remote access tools (RATs) do not bother to limit their speed and will most likely try to communicate as fast as they can.

Picture 9: Network traffic jumps during meterpreter session

Picture 10: Triggers from abnormal network traffic

The meterpreter session creates a little noise while downloading the stage, but when we start issuing commands (like ls, ps, migrate and some scripts) the graph departs sharply from the nearly flat line, showing that someone is doing something there.

Network input and output is one of the best places to detect malware outbreaks early, since worms are usually noisy as they try to phone home or spread across the network.
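A tight threshold around a steady baseline can be sketched as follows. This is an illustrative calculation in Python, not the trigger configuration used in the tests: the sample values, the multiplier k and the minimum band are assumptions, and in practice the same condition would be written as a trigger in the monitoring tool.

```python
from statistics import mean, pstdev


def build_baseline(samples):
    """Return (average, standard deviation) of traffic measured in a healthy state."""
    return mean(samples), pstdev(samples)


def is_anomalous(value, baseline, k=3.0, min_band=1.0):
    """Flag a value that deviates from the average by more than k standard deviations.

    min_band keeps the tolerance from collapsing to zero on a perfectly flat baseline.
    """
    average, deviation = baseline
    band = max(k * deviation, min_band)
    return abs(value - average) > band


if __name__ == "__main__":
    # Hypothetical healthy-state throughput samples, in kbps.
    healthy = [100, 102, 98, 101, 99]
    baseline = build_baseline(healthy)
    print(is_anomalous(103, baseline), is_anomalous(30000, baseline))
```

Because the healthy traffic varies so little, the tolerance band stays narrow, which is what allows even a modest remote session to stand out.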
3.3.4 SUPERVISORY STATION COMPROMISE

We created a monitoring item for the number of running "cmd.exe" processes, polled every 30 seconds, in order to be notified if a cmd.exe is open on any supervisory station. Normally no shells should be open unless the system is under some kind of maintenance.

Picture 11: Trigger notifies about new shell open

Windows PowerShell is also monitored, as it is not commonly used on a regular basis on supervisory stations.

Picture 12: Trigger notifies about new Windows PowerShell open

When a system is marked as "under maintenance", triggers are suppressed, so you may open as many shells as you want.

If you need to run scheduled batch jobs, you can add time ranges in which one or more "cmd.exe" processes are allowed to run.

3.3.5 UNAUTHORIZED REMOTE LOGON

By monitoring the Windows Event Log we can determine whether a new session has been created. Our trigger caught it right away.

Picture 13: Trigger for new sessions created on Windows station

3.3.6 UNAUTHORIZED MODBUS TRAFFIC

The unauthorized traffic caused subtle but noticeable changes in the TX/RX graph. If the ICS network keeps a steady pace, variation thresholds can be tuned to detect anomalous traffic.
The peak within the area marked in blue is where the unauthorized commands were issued. Note the increase in TCP connection count and traffic during this period.

Picture 14: Peaks generated by Modbus traffic from attacker machine

Picture 15: Triggers fired by abnormal traffic generated by issuing unauthorized Modbus traffic and Modbus data extraction

The network sniffer also gives us a good visualization of Modbus function codes. Take function code 3 (Read Holding Registers), for example. The Supervisory Station sends it every N seconds to update the supervisory software. The graph is fairly constant, as shown below.
Picture 16: Regular function code 3 Modbus traffic to PLC

As soon as the attacker starts sending Modbus function code 3 requests to the PLC in order to enumerate tags, the graph shows spikes (highlighted below) that give the enumeration away.

Picture 17: Peaks after Modbus probing

The first peak is due to manual probing of individual tags with a command-line Modbus client. The second (larger) one is due to the "enum.sh" script, which tries to read tags from a supplied range. As the normal communication is steady, this subtle change also fires a trigger.

Picture 18: Trigger fired by unauthorized Modbus traffic
4. CONCLUSION

The homogeneity of the cyclical behavior of industrial networks and servers allows us to establish the parameters of a healthy network with little effort. This characteristic is rarely found in IT networks, due to the way they are used, which makes monitoring with the same level of accuracy unfeasible there without a massive number of false positives.

Network and server analysis and monitoring applications are critical for detecting unusual network traffic, managing networks and control systems, and assisting in the response to security incidents. This type of software addresses a general need for security in control systems rather than specific vulnerabilities.

Behavior monitoring can achieve more tangible results than monitoring for known keywords, called signatures.

When applied to an automation network, monitoring software can be used to establish a baseline of normal network traffic, a task that facilitates incident response and risk assessment. Establishing traffic baselines by analyzing packets in the control system network is required in order to detect anomalous traffic by analyzing the differences.

Once irregular network traffic has been captured and analyzed by the monitoring software, the security team uses the data dumps to assess what is really happening on the network. The anomalous traffic is compared with baseline traffic to provide important information about which servers (or equipment) are generating the anomalous traffic, which ports and services may be involved, and which network protocols are being used. Packet dumps can be used to determine whether the traffic is due to network errors, system configuration or a compromised system.

After the normal activity of the network has been demarcated, triggers can be configured for parameters outside these ranges, which can mean a compromise of the assets in question. Based on these triggers, alarms (including sound) can be configured.

For industrial automation environments, with their unusual protocols, there are few commercial tools available for purchase, and customizing an open source tool to fit the monitoring needs should be considered.
