Machine Learning for Computer Security

Good and bad times with machine learning and security research

Automatic Feature Selection for Anomaly Detection.

Network intrusion detection depends on discriminative features extracted from network traffic. In misuse detection, such features allow signatures and rules to be formulated precisely; in anomaly detection, they make it possible to apply learning methods. Devising a good set of features is not trivial, however, because attacks are reflected in many different patterns and numerical measures. A good example is the work of Kruegel et al. (see 1, 2, 3), in which various heterogeneous features are proposed and manually combined into an effective anomaly detection system.

Recently, colleagues at our lab came up with a method for automatic selection and weighting of such features for intrusion detection. Instead of weighting the different features manually, the method automatically determines the mixture that optimizes anomaly detection performance. This optimization is realized by incorporating feature selection directly into the anomaly detection method, a rather involved mathematical procedure. The paper will be presented at the AISec Workshop co-located with CCS. Its abstract follows:
A frequent problem in anomaly detection is to decide among different feature sets to be used. For example, various features are known in network intrusion detection based on packet headers, content byte streams or application level protocol parsing. A method for automatic feature selection in anomaly detection is proposed which determines optimal mixture coefficients for various sets of features. The method generalizes the support vector data description (SVDD) and can be expressed as a semi-infinite linear program that can be solved with standard techniques. The case of a single feature set can be handled as a particular case of the proposed method. The experimental evaluation of the new method on unsanitized HTTP data demonstrates that detectors using automatically selected features attain competitive performance, while sparing practitioners from a priori decisions on feature sets to be used.

Automatic Feature Selection for Anomaly Detection. M. Kloft, U. Brefeld, P. Düssel, C. Gehl, and P. Laskov. Proceedings of the First ACM Workshop on AISec, October 2008.
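
To give a feeling for what "mixture coefficients for various sets of features" means, here is a minimal sketch in Python. It is not the method from the paper: the paper learns the coefficients by solving a semi-infinite linear program on top of SVDD, whereas this sketch takes the coefficients `beta` as given and scores points by their squared distance to the centroid of the training data in the combined feature space. The linear kernels, the function names, and the toy "header" and "byte" feature sets are all assumptions made for illustration.

```python
import numpy as np


def linear_kernel(A, B):
    """Linear kernel between the rows of A and the rows of B."""
    return A @ B.T


def combined_kernel(blocks_a, blocks_b, beta):
    """Weighted sum of one kernel per feature set (weights = mixture coefficients)."""
    return sum(b * linear_kernel(Xa, Xb)
               for b, Xa, Xb in zip(beta, blocks_a, blocks_b))


def centroid_scores(train_blocks, test_blocks, beta):
    """Squared distance of each test point to the training centroid.

    The distance is measured in the feature space induced by the combined
    kernel. This is a simplified stand-in for SVDD (hypothetical helper):
    the center is the plain mean of the training data, not a learned one.
    """
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must be non-negative and sum to 1")

    K_tt = combined_kernel(train_blocks, train_blocks, beta)  # n x n
    K_zt = combined_kernel(test_blocks, train_blocks, beta)   # m x n
    # k(z, z) for every test point z, without building the full m x m matrix
    k_zz = sum(b * np.einsum("ij,ij->i", Z, Z)
               for b, Z in zip(beta, test_blocks))

    # ||phi(z) - mean||^2 = k(z,z) - 2 * mean_i k(z,x_i) + mean_ij k(x_i,x_j)
    return k_zz - 2.0 * K_zt.mean(axis=1) + K_tt.mean()


# Toy data: two feature sets describing the same four training requests.
train_headers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
train_bytes = np.array([[1.0], [1.0], [3.0], [3.0]])

# Two test requests: the first sits at the centroid, the second is far away
# in the "header" features only.
test_headers = np.array([[1.0, 1.0], [5.0, 5.0]])
test_bytes = np.array([[2.0], [2.0]])

beta = [0.5, 0.5]  # hand-picked mixture; the paper optimizes this instead
scores = centroid_scores([train_headers, train_bytes],
                         [test_headers, test_bytes], beta)
print(scores)
```

Shifting weight in `beta` towards one feature set makes deviations in that set count more in the anomaly score. Choosing this weighting automatically, rather than by hand as done here, is the part the paper addresses.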