Machine Learning for Computer Security

Good and bad times with machine learning and security research

Malheur is out!

After almost a year of work, I am proud to announce the first public release of Malheur—a tool for automatic analysis of program behavior recorded from malicious software (www.mlsec.org/malheur).

Malicious software (malware) is one of the major threats on the Internet today: millions of hosts are currently infected with malware programs such as computer worms, backdoors, and trojan horses. The sheer amount of malware renders manual analysis of malicious files impossible and, even worse, automatic inspection of file content is severely hampered by obfuscation techniques. An alternative for efficiently crafting new defenses against malware is behavior-based analysis: malware programs are collected in the wild and executed in a sandbox environment, where their behavior is monitored. The execution of each binary results in a report of monitored behavior, which can be used to characterize and ultimately defend against malicious software.

Malheur is the first publicly available tool that analyzes the recorded behavior of malicious software and enables automatic discovery and discrimination of novel variants. This ability is rooted in well-known concepts of machine learning: discovery of novel malware classes resembles a clustering problem, and discrimination between known classes matches a classification task. Both techniques are implemented in Malheur with effectiveness and efficiency in mind. Instead of fiddling with esoteric learning concepts, Malheur resorts to basic methods, such as linkage clustering and nearest-prototype classification, which are efficiently implemented by means of parallel programming. Expressive access to recorded behavior is realized by embedding reports in a vector space spanned by short behavioral patterns, similar in spirit to bag-of-words models.
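
To make the embedding and the nearest-prototype idea more concrete, here is a minimal Python sketch. It is not Malheur's actual code or report format: the event names, the functions embed, distance and classify, and the max_dist rejection threshold are all hypothetical and chosen only for illustration. A report is assumed to be a plain list of event strings, the "short behavioral patterns" are taken to be pairs of consecutive events, and a report that is too far from every prototype is treated as a candidate for a novel class.

```python
from collections import Counter
from math import sqrt


def embed(events, n=2):
    """Map a list of behavior events to a unit-length sparse vector of n-gram counts."""
    grams = Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))
    norm = sqrt(sum(c * c for c in grams.values()))
    if norm == 0:
        return {}
    return {g: c / norm for g, c in grams.items()}


def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def classify(vector, prototypes, max_dist=1.0):
    """Return the label of the nearest prototype, or None if none is within max_dist."""
    best_label, best_dist = None, float("inf")
    for label, proto in prototypes:
        d = distance(vector, proto)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_dist else None


# Hypothetical behavior reports (assumed format: one event name per step).
worm = ["open_file", "write_file", "connect", "send", "connect", "send"]
backdoor = ["create_mutex", "listen", "accept", "recv", "exec"]
prototypes = [("worm", embed(worm)), ("backdoor", embed(backdoor))]

unknown = ["open_file", "write_file", "connect", "send"]
label = classify(embed(unknown), prototypes)

unrelated = ["read_registry", "sleep"]
novel = classify(embed(unrelated), prototypes)
```

In this sketch, the unknown report shares most of its event pairs with the worm prototype and is assigned to it, while the unrelated report shares none with either prototype and is rejected. Rejected reports are the ones a clustering step would then group into new classes.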

Malheur is a joint effort of the Berlin Institute of Technology and the University of Mannheim. The merits of Malheur, along with an empirical evaluation using real malware, are detailed in a technical report:

Automatic Analysis of Malware Behavior using Machine Learning. Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz. Technical Report 18-2009, Berlin Institute of Technology, 2009.