Besides these two paradigms, however, learning theory also provides the hybrid concept of semi-supervised learning. Although technically more involved, semi-supervised methods combine the best of the classic learning paradigms: the majority of training data can be unlabeled, and only a few instances need to carry label information for learning. Semi-supervised methods are typically more accurate than unsupervised methods while avoiding the need to label large amounts of data. Unfortunately, the security community has largely ignored semi-supervised learning and, consequently, often argues for one of the two classic learning paradigms.
In a first study, we have applied the concept of semi-supervised learning to network security and devised a learning method for network intrusion detection. Like most previous learning approaches to intrusion detection, our method initially operates on unlabeled data. However, it then specifically requests label information for certain network events to improve the learning model. For example, our method requests labels for points that lie close to the decision boundary, since these are the points most likely to sharpen the detection accuracy. With only a minimal labeling effort, a security operator can tune our method to particular network data and eliminate false-positive alarms that come with the majority of regular detection approaches. A paper on this work was recently accepted at AISEC 2009. Here is its abstract:
Anomaly detection for network intrusion detection is usually considered an unsupervised task. Prominent techniques, such as one-class support vector machines, learn a hypersphere enclosing network data, mapped to a vector space, such that points outside of the ball are considered anomalous. However, this setup ignores relevant information such as expert and background knowledge. In this paper, we rephrase anomaly detection as an active learning task. We propose an effective active learning strategy to query low-confidence observations and to expand the data basis with minimal labeling effort. Our empirical evaluation on network intrusion detection shows that our approach consistently outperforms existing methods in relevant scenarios.
Active Learning for Network Intrusion Detection. Nico Görnitz, Marius Kloft, Konrad Rieck and Ulf Brefeld. Proceedings of CCS Workshop on Security and Artificial Intelligence (AISEC), October 2009.
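
To make the idea of querying points near the decision boundary more concrete, here is a minimal sketch in Python. It is not the method from the paper: instead of a one-class support vector machine it uses a much simpler stand-in, a ball around the data centroid, and all function names (`fit_ball`, `select_queries`, `refit_radius`) as well as the radius update rule are assumptions made up for illustration.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equally sized vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_ball(points, quantile=0.95):
    """Unsupervised step: a center plus a radius covering roughly
    `quantile` of the unlabeled data (stand-in for a one-class SVM)."""
    center = centroid(points)
    dists = sorted(distance(p, center) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius


def select_queries(points, center, radius, budget):
    """Active learning step: indices of the `budget` points that lie
    closest to the decision boundary (the surface of the ball)."""
    margin = [abs(distance(p, center) - radius) for p in points]
    return sorted(range(len(points)), key=lambda i: margin[i])[:budget]


def refit_radius(center, radius, labeled):
    """Hypothetical update rule: shrink the ball below a labeled attack
    that fell inside, grow it to cover a labeled normal point outside.
    `labeled` is a list of (point, is_attack) pairs."""
    for point, is_attack in labeled:
        d = distance(point, center)
        if is_attack and d <= radius:
            radius = d * 0.99
        elif not is_attack and d > radius:
            radius = d
    return radius


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
    center, radius = fit_ball(data)
    to_label = select_queries(data, center, radius, budget=5)
    # In practice a security operator would supply these labels;
    # here every queried point is simply assumed to be benign.
    answers = [(data[i], False) for i in to_label]
    radius = refit_radius(center, radius, answers)
    print("queried indices:", to_label, "new radius:", round(radius, 3))
```

The point of the sketch is the selection step: the operator is only asked about the handful of events the model is least sure about, so a small labeling budget goes where it changes the boundary the most.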

