Machine Learning for Computer Security

Good and bad times with machine learning and security research

Visualization of Payload-based Anomaly Detection.

Two paradigms for detecting attacks against computer systems have been widely studied in security research: first, misuse detection, which aims at identifying known patterns of misuse, and second, anomaly detection, which tries to detect deviations from normal usage. While both concepts have their individual pros and cons, only one of the two paradigms, namely misuse detection, has made its way into regular security products. For example, almost all anti-virus scanners and intrusion detection systems employ a database of known misuse patterns for spotting security problems. Although successful on the market, misuse detection inherently fails to protect against novel threats, such as zero-day exploits. Anomaly detection methods, in turn, provide means to identify unknown threats with high precision and low run-time overhead (see for instance work by Cretu, Ingham, Perdisci or myself). So why is anomaly detection still not accepted in practice?

One key problem with most anomaly detection approaches is their inability to explain their decisions. The detection process resembles a black box, and security operators must thoroughly analyze context information to assess the actual cause of a reported anomaly. We have addressed this problem in a recent paper published at the European Conference on Computer and Network Defense (EC2ND). In particular, we present visualization techniques suitable for explaining the decisions of payload-based anomaly detection systems. Instead of digging into the technical details, I present here an example that explains the detection of a real network attack. An in-depth discussion is provided in our paper.




The figure above shows so-called feature differences for a command injection attack (awstats configdir exploit), where peaks indicate string features that strongly deviate from a model of normal network traffic. The attack exploits insecure handling of input parameters to pass shell commands to an HTTP server. The transferred commands are encoded with the standard URI scheme, which replaces reserved characters by the symbol “%” followed by a hexadecimal value. For example, “%20” denotes a space, “%3b” a semicolon, “%26” an ampersand and “%27” an apostrophe. The semicolon and ampersand in particular are characteristic of shell commands, as they separate or combine commands in shell syntax. In the presented visualization, exactly these patterns appear as high peaks. While similar string patterns can also be observed in legitimate traffic, the large differences in the figure clearly indicate anomalous activity involving escaped shell commands and explain why the anomaly was reported.
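To make the notion of feature differences more concrete, here is a minimal sketch under stated assumptions: payloads are embedded as normalized frequencies of byte 3-grams, and the model of normal traffic is simply the mean (centroid) of these vectors. The n-gram length, the centroid model, the helper names and the example requests are all hypothetical illustrations; the features and detector used in the paper may differ.

```python
from collections import Counter

N = 3  # n-gram length: an illustrative choice, not a value from the paper

def ngram_freqs(payload: bytes, n: int = N) -> dict[bytes, float]:
    """Embed a payload as the normalized frequencies of its byte n-grams."""
    grams = Counter(payload[i:i + n] for i in range(len(payload) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}

def centroid(payloads: list[bytes]) -> dict[bytes, float]:
    """Assumed model of normal traffic: the mean n-gram frequency vector."""
    model: dict[bytes, float] = {}
    for p in payloads:
        for g, f in ngram_freqs(p).items():
            model[g] = model.get(g, 0.0) + f / len(payloads)
    return model

def feature_differences(payload: bytes, model: dict[bytes, float]) -> dict[bytes, float]:
    """Deviation of each n-gram from the model; large positive values are the peaks."""
    x = ngram_freqs(payload)
    return {g: x.get(g, 0.0) - model.get(g, 0.0) for g in x.keys() | model.keys()}

def top_peaks(diffs: dict[bytes, float], k: int = 5) -> list[bytes]:
    """Return the k n-grams with the largest positive deviation."""
    return sorted(diffs, key=diffs.get, reverse=True)[:k]

# Hypothetical normal requests
normal = [
    b"GET /index.html HTTP/1.1",
    b"GET /images/logo.png HTTP/1.1",
    b"GET /awstats.pl?config=www.example.org HTTP/1.1",
]
# Hypothetical request in the style of the awstats configdir exploit (not the captured attack)
attack = b"GET /awstats.pl?configdir=%7cecho%3bid%26%26uname%27%7c HTTP/1.1"

diffs = feature_differences(attack, centroid(normal))
print(top_peaks(diffs, k=5))
```

On this toy data the two highest peaks are `b"%26"` and `b"%7c"`, the encoded ampersand and pipe, since they are the only 3-grams that occur twice in the request and never in the normal samples. N-grams such as `b".ht"`, which occur only in normal traffic, receive negative differences.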




Another visualization technique, feature shading, is presented in the second figure, where the payload of a reported anomaly is overlaid with colors reflecting the deviation of each byte from a model of normal network traffic. The URI of the command injection attack is flagged as anomalous by dark shading, indicating the presence of abnormal strings. The part following the URI, however, is marked as a normal region, as it mainly contains frequent HTTP patterns, such as “Mozilla” and “Googlebot”. This example demonstrates the ability of feature shading to emphasize anomalous content in network payloads, while also indicating benign regions and patterns.
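The idea of per-byte shading can be sketched in text form as follows. This is a minimal illustration, assuming a deliberately simple model: byte 3-grams are counted over normal traffic, each 3-gram of the payload receives a rarity score of 1/(1 + count), and each byte's score is the mean rarity of the 3-grams covering it. The scoring rule, the thresholds for dark (`#`) and medium (`+`) shading, the helper names and the example requests are hypothetical and not taken from the paper.

```python
from collections import Counter

N = 3  # illustrative n-gram length, not a value from the paper

def train(normal: list[bytes], n: int = N) -> Counter:
    """Count byte n-grams over normal traffic (a deliberately simple model)."""
    counts: Counter = Counter()
    for p in normal:
        counts.update(p[i:i + n] for i in range(len(p) - n + 1))
    return counts

def byte_scores(payload: bytes, counts: Counter, n: int = N) -> list[float]:
    """Per-byte deviation: mean rarity of all n-grams covering the byte."""
    total = [0.0] * len(payload)
    hits = [0] * len(payload)
    for i in range(len(payload) - n + 1):
        # rarity is 1.0 for n-grams never seen in normal traffic, smaller for frequent ones
        rarity = 1.0 / (1 + counts[payload[i:i + n]])
        for j in range(i, i + n):
            total[j] += rarity
            hits[j] += 1
    return [t / h if h else 0.0 for t, h in zip(total, hits)]

def shade(payload: bytes, scores: list[float]) -> str:
    """Render the payload above a shading line: '#' dark, '+' medium, ' ' normal."""
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in payload)
    marks = "".join("#" if s > 0.6 else "+" if s > 0.3 else " " for s in scores)
    return text + "\n" + marks

# Hypothetical normal requests sharing common HTTP header patterns
normal = [
    b"GET /%s HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (compatible; Googlebot/2.1)\r\n\r\n" % page
    for page in (b"", b"index.html", b"news.html", b"about.html")
]
# Hypothetical attack request: anomalous URI followed by an ordinary header
attack = (b"GET /awstats.pl?configdir=%7cecho%3bid%7c HTTP/1.1\r\n"
          b"User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1)\r\n\r\n")

scores = byte_scores(attack, train(normal))
print(shade(attack, scores))
```

In the printed output, the escaped shell characters in the URI receive dark shading, while the “Mozilla” and “Googlebot” header region stays unshaded because every 3-gram in it occurs in each normal request. Averaging over covering n-grams is one simple way to turn n-gram deviations into per-byte values.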

By visualizing a “colorful” network payload, a security operator is able to quickly identify relevant and malicious content, eventually enabling effective countermeasures. Consequently, the decisions made by a payload-based detection system, so far opaque to a security operator, can now be visually explained, such that one benefits from both early detection of novel attacks and an explainable detection process.

Visualization and Explanation of Payload-Based Anomaly Detection. Konrad Rieck and Pavel Laskov. Proceedings of the 5th European Conference on Computer and Network Defense (EC2ND).
