EPSRC logo

Details of Grant 

EPSRC Reference: EP/K032585/1
Title: Se-Ma-Match: Semantic Malware Matching
Principal Investigator: King, Professor A
Other Investigators:
Researcher Co-Investigators:
Project Partners:
McAfee Labs
Department: Sch of Computing
Organisation: University of Kent
Scheme: Standard Research
Starts: 14 October 2013 Ends: 13 October 2016 Value (£): 245,525
EPSRC Research Topic Classifications:
Fundamentals of Computing
EPSRC Industrial Sector Classifications:
Information Technologies
Related Grants:
EP/K032623/1
Panel History:
Panel DatePanel NameOutcome
23 Jan 2013 EPSRC Research Institute APA & V Announced
Summary on Grant Application Form
The flood of malware samples is predicted to grow into a deluge in

2012, making the problem of maintaining a database of malware

signatures ever more difficult. For each new sample, it is important

to determine the threat that it poses.

In response to this, dynamic malware analysis

tools have been designed that execute the sample in a sandbox,

monitoring the actions of a sample. If these actions are similar

to those of malware that has been already indexed in the database,

then one might draw conclusions regarding provenance and severity

of the threat posed. If the sample does not match against known

malware, then it can be subject to manual scrutiny, using a dissembler

such as IDA Pro.

This Linnaean approach to malware analysis is both natural and

convenient: it is natural to group malware into families that share

common attributes; and it is provides a convenient way of assessing

threat. Yet the whole methodology is predicated on the accuracy

with which samples are characterised by their signatures. If a

sample is assigned a signature that does not express its behaviour,

then samples that are behaviourally distinct can be erroneously

grouped together. Conversely, samples which behave the same, but

appear different, can be accidentally placed in different groups.

The main problem with dynamic malware analysis tools is that they

execute the binary for a limited time, typically considering just

one path through the binary. This limits the actions that can be

observed, rendering the signature inaccurate for programs that

reveal their true behaviour later. In addition, the dynamic approach

can miss infrequent actions or logic bombs. The dynamic approach is

also susceptible to timing attacks that detect a tracer to turn off

some action. Above all, the signatures are based solely and only

on those actions that are encountered during the trace.

More static approaches have been applied too, at one extreme using

the call graph of the binary itself for classification, and at the

other deploying model checking techniques to search the paths through

call graph for signature behaviours that characterise known malware

families. Yet graph matching techniques are sensitive to control-flow

obfuscation and model checking requires the signature behaviours

to be known up-front and distilled into a temporal formula or an

automata.

A middle ground is offered by abstract interpretation since it

provides a way to systematically consider all paths, while monitoring

a program for actions that inform the construction of the signature.

Abstract interpretation provides a way to break the dichotomy between

the purely dynamic and the purely static approach to malware analysis

into a graduated continuum. Formally, purely static approach

(a.k.a. a static analysis) can be derived from the purely dynamic

approach (a.k.a. a tracer) by compositing a sequence of abstractions:

if all n abstractions are applied the result is the static analysis;

but if the first m < n abstractions are applied the result is a

hybrid. The challenge is to find the hybrid that provides sufficient

path coverage to undercover logic bombs yet is sufficiently robust

to be used by practitioners in the security sector. The proposed

project will discover this sweet point by following two complementary

lines of inquiry. Concrete traces will be abstracted to cover more

paths and more actions (at UCL). Static analyses, which covers all

paths, will be refined to avoid paths and actions that do not

actually occur (at Kent). Thus UCL will add missing information

to signatures (converging on the ideal signature from below) whilst

Kent will remove excess information from signatures (converging on

the ideal signature from above). By reflecting on the relative

merits of these approaches, we will draw conclusions on how to

construct robust signatures for malware classification and thereby

advance the whole field.
Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.kent.ac.uk