Machine Learning for Computer Security

Good and bad times with machine learning and security research

TokDoc: The Token Doctor.

Detecting and preventing network intrusions is a basic task in computer security. It is therefore no surprise that several security products exist that can block network attacks, though with moderate success and only for attacks in a database of known patterns. For a security researcher, two issues with these systems are unsatisfactory. First, current intrusion prevention systems rely on an up-to-date database of attack signatures, so novel and unknown attacks will likely go undetected. Second, network traffic is either passed or completely blocked, often with fatal consequences such as cutting off benign communication.

In recent years, security research has focused on the first issue by extending intrusion detection systems with techniques from statistics and machine learning. The resulting systems prove effective in many scenarios. Still, they rest on a binary pass-or-block decision for every analyzed event. What if we question this rigid decision making? Together with Tammo Krueger, Christian Gehl and Pavel Laskov, we have dared to go one step further. In a recent paper, we propose a web-application firewall that is not only capable of detecting network attacks but can also "heal" abnormal content to some degree. We call it the Token Doctor.

The Token Doctor, or TokDoc for short, acts as a reverse proxy and inspects incoming HTTP requests. Each request is parsed into its individual tokens, such as the URL, parameters, headers and so on. Each token is then analyzed individually using anomaly detection techniques. If an anomaly is spotted in a token, a fine-grained treatment is launched. Some tokens, such as usernames and cookie values, cannot be corrected and are therefore simply dropped from the request, as in usual prevention systems. Other tokens, however, are "healed" by replacing them with benign content. This replacement is selected automatically by finding similar tokens in a pool of benign HTTP requests.
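To make the per-token idea concrete, here is a toy sketch in Python. It is not TokDoc's actual implementation, and everything in it is an assumption for illustration: only query-string parameters are treated as tokens (a real system would also examine the URL path, headers and cookies); the anomaly detector is a hypothetical score, the fraction of character bigrams in a value never seen for that parameter in benign training data; the threshold 0.2 and the drop-only parameter names `user` and `sid` are made up; and "similar" is taken to mean smallest Levenshtein edit distance. The class `TokenDoctor` and all helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit


def tokenize(url):
    """Split the query string of a request URL into (name, value) tokens."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def bigrams(value):
    """Set of character 2-grams of a token value."""
    return {value[i:i + 2] for i in range(len(value) - 1)}


def levenshtein(a, b):
    """Edit distance, used here as the (assumed) similarity for healing."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class TokenDoctor:
    """Toy per-token detect-and-heal logic (hypothetical, not TokDoc's code)."""

    def __init__(self, threshold=0.2, drop_only=("user", "sid")):
        self.threshold = threshold        # hypothetical anomaly threshold
        self.drop_only = set(drop_only)   # tokens that cannot be healed
        self.seen = defaultdict(set)      # name -> bigrams of benign values
        self.pool = defaultdict(set)      # name -> benign values seen in training

    def train(self, benign_urls):
        for url in benign_urls:
            for name, value in tokenize(url):
                self.seen[name] |= bigrams(value)
                self.pool[name].add(value)

    def score(self, name, value):
        """Fraction of the value's bigrams never seen for this token name."""
        grams = bigrams(value)
        if not grams:
            return 0.0
        return len(grams - self.seen[name]) / len(grams)

    def heal(self, url):
        """Return the request URL with anomalous tokens dropped or replaced."""
        parts = urlsplit(url)
        healed = []
        for name, value in tokenize(url):
            if self.score(name, value) > self.threshold:
                if name in self.drop_only or not self.pool[name]:
                    continue  # cannot be corrected: drop the token
                # Replace with the most similar benign value (ties broken alphabetically).
                value = min(sorted(self.pool[name]), key=lambda v: levenshtein(v, value))
            healed.append((name, value))
        return parts._replace(query=urlencode(healed)).geturl()


doc = TokenDoctor()
doc.train([
    "/shop?item=book&lang=en&user=alice",
    "/shop?item=lamp&lang=de&user=bob",
    "/shop?item=chair&lang=en&user=carol",
])
print(doc.heal("/shop?item=book&lang=en&user=alice"))  # -> /shop?item=book&lang=en&user=alice
print(doc.heal("/shop?item=bo0k&lang=en%27--&user=x%27+OR+1%3D1"))  # -> /shop?item=book&lang=en
```

In the second request, the slightly garbled `bo0k` is healed to the nearest benign value `book`, the injected `en'--` is healed to `en`, and the SQL-injection attempt in `user` is dropped because usernames are treated as uncorrectable. The benign request passes through unchanged.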

The "healing" implemented in TokDoc is not guaranteed to succeed. Frankly speaking, automatically amending network data is a controversial idea, and one can imagine a lot of things going wrong. Nevertheless, our experiments show the opposite: TokDoc provides excellent detection accuracy while significantly reducing false positives compared to state-of-the-art methods. For example, several benign requests that regular systems would have dropped due to minor irregularities are slightly amended without any effect on functionality. Overall, there is room for discussion: either stick to a rigid decision with painful blocking, or choose fuzzy amending with lots of promise but no guarantees. I don't know.

A corresponding paper has been published at the 25th ACM Symposium on Applied Computing (SAC) this year. Its abstract reads:
The growing amount of web-based attacks poses a severe threat to the security of web applications. Signature-based detection techniques increasingly fail to cope with the variety and complexity of novel attack instances. As a remedy, we introduce a protocol-aware reverse HTTP proxy TokDoc (the token doctor), which intercepts requests and decides on a per-token basis whether a token requires automatic "healing". In particular, we propose an intelligent mangling technique, which, based on the decision of previously trained anomaly detectors, replaces suspicious parts in requests by benign data the system has seen in the past. Evaluation of our system in terms of accuracy is performed on two real-world data sets and a large variety of recent attacks. In comparison to state-of-the-art anomaly detectors, TokDoc is not only capable of detecting most attacks, but also significantly outperforms the other methods in terms of false positives. Runtime measurements show that our implementation can be deployed as an inline intrusion prevention system.
TokDoc: A Self-Healing Web Application Firewall. Tammo Krueger, Christian Gehl, Konrad Rieck and Pavel Laskov. Proc. of 25th ACM Symposium on Applied Computing (SAC), 1846-1853, March 2010.