Machine Learning for Computer Security

Good and bad times with machine learning and security research

Machine Learning for Application-Layer Intrusion Detection.

Banzai! Yesterday, I handed over the last printouts of my Ph.D. thesis to the library of the Berlin Institute of Technology. My doctoral defense was a thrilling yet enjoyable event, especially due to the challenging questions raised by John McHugh, whom I cordially thank for coming to Berlin. All in all, it was a great success. I am now heading off for a long holiday to recover and to prepare for more exciting research.

Following is the official summary of the thesis. A PDF version is available here.
Misuse detection as employed in current network security products relies on the timely generation and distribution of so-called attack signatures. While appropriate signatures are available for the majority of known attacks, misuse detection fails to protect against novel and unknown threats, such as zero-day exploits and worm outbreaks. The increasing diversity and polymorphism of network attacks further obstruct modeling signatures, such that there is a high demand for alternative detection techniques.

We address this problem by presenting a machine learning framework for automatic detection of unknown attacks in the application layer of network communication. The framework rests on three contributions to learning-based intrusion detection: First, we propose a generic technique for embedding of network payloads in vector spaces such that numerical, sequential and syntactical features extracted from the payloads are accessible to statistical and geometric analysis. Second, we apply the concept of kernel functions to network payload data, which enables efficient learning in high-dimensional vector spaces of structured features, such as tokens, q-grams and parse trees. Third, we devise learning methods for geometric anomaly detection using kernel functions where normality of data is modeled using geometric concepts such as hyperspheres and neighborhoods. As a realization of the framework, we implement a standalone prototype called Sandy applicable to live network traffic.

The framework is empirically evaluated using real HTTP and FTP network traffic and over 100 attacks unknown to the applied learning methods. Our prototype Sandy significantly outperforms the misuse detection system Snort and several state-of-the-art anomaly detection methods by identifying 80-97% of unknown attacks with less than 0.002% false positives, a quality that, to the best of our knowledge, has not been attained in previous work on network intrusion detection. Experiments with evasion attacks and unclean training data demonstrate the robustness of our approach. Moreover, run-time experiments show the advantages of kernel functions. Although operating in a vector space with millions of dimensions, our prototype provides throughput rates between 26 and 60 Mbit/s on real network traffic. This performance renders our approach readily applicable for protection of medium-scale network services, such as enterprise Web services and applications.

While the proposed framework does not generally eliminate the threat of network attacks, it considerably raises the bar for adversaries to get their attacks through network defenses. In combination with existing techniques such as signature-based systems, it strongly hardens today's network protection against future threats.

Machine Learning for Application-Layer Intrusion Detection.
Konrad Rieck. Ph.D. thesis, Berlin Institute of Technology (TU Berlin), 2009.
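
To make the three building blocks named in the summary a bit more concrete, here is a minimal Python sketch. It is not the code of Sandy; the function names (`qgrams`, `kernel`, `norm_kernel`, `anomaly_score`), the choice of q = 3, the kernel normalization, and the toy payloads are all assumptions made for illustration. Payloads are embedded as sparse vectors of q-gram counts, compared with a linear kernel, and scored by their squared distance from the center of mass of the training data, which is probably the simplest hypersphere-style model one can build from kernel values alone.

```python
from collections import Counter
from math import sqrt


def qgrams(payload: bytes, q: int = 3) -> Counter:
    """Embed a payload as a sparse vector of q-gram counts."""
    return Counter(payload[i:i + q] for i in range(len(payload) - q + 1))


def kernel(x: Counter, y: Counter) -> float:
    """Linear kernel: inner product of two sparse q-gram vectors."""
    if len(x) > len(y):  # iterate over the smaller vector
        x, y = y, x
    return float(sum(count * y[gram] for gram, count in x.items()))


def norm_kernel(x: Counter, y: Counter) -> float:
    """Kernel normalized to [0, 1] so that payload length matters less."""
    denom = sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom else 0.0


def anomaly_score(z: Counter, train: list) -> float:
    """Squared distance of z from the center of mass of the training
    vectors, computed from kernel values only."""
    n = len(train)
    cross = sum(norm_kernel(z, x) for x in train) / n
    center = sum(norm_kernel(a, b) for a in train for b in train) / (n * n)
    return norm_kernel(z, z) - 2.0 * cross + center


# Hypothetical toy data: three benign requests as "normal" traffic.
train = [qgrams(p) for p in (
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"GET /news.html HTTP/1.1",
)]
normal = qgrams(b"GET /contact.html HTTP/1.1")
attack = qgrams(b"GET /" + b"\x90" * 40 + b"/bin/sh HTTP/1.1")

print(anomaly_score(normal, train), anomaly_score(attack, train))
```

The unseen benign request shares most of its 3-grams with the training requests and therefore lies close to the center, whereas the request stuffed with a NOP sled lies far from it. The explicit q-gram vectors are never materialized in full; only the q-grams that actually occur are stored, which is what keeps a space with millions of dimensions tractable.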

Securing IMS against Novel Threats.

Recently, we published an article in the Bell Labs Technical Journal on detecting unknown threats in IMS and VoIP networks. The article is the outcome of a fruitful cooperation between Fraunhofer FIRST and Alcatel-Lucent in Germany. Its abstract follows:
Fixed mobile convergence (FMC) based on the 3GPP IP Multimedia Subsystem (IMS) is considered one of the most important communication technologies of this decade. Yet this all-IP-based network technology brings about the growing danger of security vulnerabilities in communication and data services. Protecting IMS infrastructure servers against malicious exploits poses a major challenge due to the huge number of systems that may be affected. We approach this problem by proposing an architecture for an autonomous and self-sufficient monitoring and protection system for devices and infrastructure inspired by network intrusion detection techniques. The crucial feature of our system is a signature-less detection of abnormal events and zero-day attacks. These attacks may be hidden in a single message or spread across a sequence of messages. Anomalies identified at any of the network domain's ingresses can be further analyzed for discriminative patterns that can be immediately distributed to all edge nodes in the network domain.

Securing IMS against Novel Threats. Stefan Wahl, Konrad Rieck, Pavel Laskov, Peter Domschitz and Klaus-Robert Müller. Bell Labs Technical Journal (BLTJ), 14(1), 243-257, John Wiley & Sons, May 2009.