Different concepts have been proposed to mitigate drive-by downloads and to make the Web a safer place. Two approaches are most notable: first, methods for detecting heap-spraying attacks (see Nozzle and Egele et al.), which protect against a prevalent attack technique used in drive-by downloads, and second, methods for offline analysis of web pages (see Provos et al. and Wepawet), which identify malicious web pages by monitoring their JavaScript code in a carefully crafted sandbox environment. Unfortunately, neither of the two approaches is efficient and effective at the same time. While methods for detecting heap spraying are reasonably fast, they fail to protect against other attack techniques. Offline analysis systems, on the other hand, detect all sorts of drive-by threats but are unable to provide timely protection for the end user.
We have addressed this problem in our recent work and developed a new system for protecting against drive-by downloads called Cujo (Classification of Unknown JavaScript Code). Cujo is essentially a web proxy that inspects the JavaScript code transferred to the end user while surfing. As there is only a very limited time frame for this inspection, Cujo performs a lightweight static and dynamic code analysis. Both the static structure of the JavaScript code and its dynamic behavior are analysed and monitored for attack patterns using efficient techniques from machine learning. In an empirical evaluation with 600 real drive-by-download attacks, Cujo identified about 94% of the attacks with 2 false alarms in 100,000 web site visits, an accuracy that so far could only be attained by offline analysis. During the evaluation, Cujo induced a median delay of 380ms in comparison to a regular proxy. Hence, the inspection of JavaScript code is hardly perceived by the user, even though it blocks about 94% of the drive-by-download attacks.
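To make the idea of matching code structure against learned attack patterns more concrete, here is a minimal Python sketch of the static side only. The post does not specify the features or the learning method, so everything in it is an assumption made for illustration: the simplified tokenizer, the use of token n-grams as features, the linear scoring function, and the hand-picked weights are hypothetical and are not Cujo's actual implementation.

```python
import re
from collections import Counter

# Hypothetical, heavily simplified JavaScript lexer (no escapes, no comments).
TOKEN_RE = re.compile(r"""
    (?P<STR>"[^"]*"|'[^']*')
  | (?P<NUM>\d+(?:\.\d+)?)
  | (?P<ID>[A-Za-z_$][\w$]*)
  | (?P<PUNCT>[^\sA-Za-z0-9_$])
""", re.VERBOSE)

# Small illustrative keyword set; these words are kept verbatim.
KEYWORDS = {"var", "function", "for", "while", "if", "return", "new"}


def tokenize(js_source):
    """Turn JavaScript source into a list of abstract tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(js_source):
        kind, text = match.lastgroup, match.group()
        if kind == "PUNCT" or (kind == "ID" and text in KEYWORDS):
            tokens.append(text)   # keep punctuation and keywords as they are
        else:
            tokens.append(kind)   # abstract names and literals to their class
    return tokens


def ngrams(tokens, n=3):
    """Count all windows of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def score(features, weights, bias=0.0):
    """Linear decision function; a positive score means 'suspicious'."""
    return bias + sum(weights.get(gram, 0.0) * count
                      for gram, count in features.items())


if __name__ == "__main__":
    code = 'var x = unescape("%u9090");'
    features = ngrams(tokenize(code))
    # Hypothetical weights; a real system would learn them from labelled data.
    weights = {("ID", "(", "STR"): 2.0}
    print(score(features, weights, bias=-1.0))
```

Abstracting identifiers and literals to their token class is one plausible way to stay robust against renamed variables, and a linear score over sparse counts is cheap enough to evaluate inside a proxy. The dynamic behavior mentioned above could be represented in the same way, for example as n-grams over a trace of monitored events, but that part is not shown here.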
A paper on this work and Cujo has been presented at the 26th Annual Computer Security Applications Conference (ACSAC) in Austin, Texas. The abstract of our paper follows:
The JavaScript language is a core component of active and dynamic web content in the Internet today. Besides its great success in enhancing web applications, however, JavaScript provides the basis for drive-by downloads – attacks exploiting vulnerabilities in web browsers and their extensions for unnoticeably downloading malicious software. Due to the diversity and frequent use of obfuscation in these JavaScript attacks, static code inspection proves ineffective in practice. While dynamic analysis and honeypots provide means to identify drive-by-download attacks, current approaches induce a significant overhead which renders immediate prevention of attacks intractable. In this paper, we present Cujo, a system for automatic detection and prevention of drive-by-download attacks. Embedded in a web proxy, Cujo transparently inspects web pages and blocks delivery of malicious JavaScript code. Static and dynamic code features are extracted on-the-fly and analysed for malicious patterns using efficient techniques of machine learning. We demonstrate the efficacy of Cujo in different experiments, where it detects 95% of the drive-by downloads with few false alarms and a median run-time of 500ms per web page – a quality that, to the best of our knowledge, has not been attained in previous work on detection of drive-by-download attacks.
Cujo: Efficient Detection and Prevention of Drive-by-Download Attacks. Konrad Rieck, Tammo Krueger and Andreas Dewald. Proc. of 26th Annual Computer Security Applications Conference (ACSAC), December, 2010.

