As a result, we have often abstained from using tree data in our research. However, we recently found the time to address this problem and devised approximate tree kernels. Instead of fiddling with algorithmic issues and implementations, we propose to approximate the computation of kernel functions for trees. To this end, we narrow the kernel computation to a sparse subset of subtrees rooted at relevant symbols and thereby avoid considering all possible subtrees as in regular tree kernels. This sparse subset is automatically selected with respect to a given learning task, such that the expressiveness of the approximate kernel is preserved while its run-time is reduced. Though simple in design, this approximation enables speed-ups up to three orders of magnitude and, for the first time, allows to compare HTML documents for detection of Web spam efficiently.
The concept of approximate tree kernels as well as several applications are described in a recent article published in the Journal of Machine Learning Research. The abstract for the article is given below:
Convolution kernels for trees provide simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space.Approximate Tree Kernels. Konrad Rieck, Tammo Krueger, Ulf Brefeld and Klaus-Robert Müller. Journal of Machine Learning Research (JMLR), 11(Feb):555−580, Microtome, 2010.


0 comments:
Post a Comment