Fingerprinting Malware — The better way

Fuzzy Hashing, Section Hashing and Import Hashing

CyberNotes
3 min read · Feb 24, 2025

As discussed in the previous post, cryptographic hashing algorithms (MD5, SHA-1, SHA-256) are good for detecting identical malware samples. However, when a malware author changes even the slightest aspect of a sample, the hash value changes completely, even though the functionality of the malware stays the same. As a result, these algorithms are of little use during triage when you want to recognize a modified variant of a sample you already know.
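To see this in action, here is a small sketch using Python's built-in hashlib. The byte string is a made-up stand-in for a sample, not real malware. We flip a single bit and compare the two SHA-256 digests.

```python
import hashlib

# Made-up stand-in for a sample: an "MZ" marker, padding, and a fake payload.
original = b"MZ" + b"\x00" * 62 + b"payload-v1"

# Flip one bit in the last byte to simulate a trivial change by the author.
patched = bytearray(original)
patched[-1] ^= 0x01

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(bytes(patched)).hexdigest()

# Count how many hex digits differ between the two digests.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} hex digits differ")
```

The two inputs differ by one bit, yet the digests have nothing visibly in common, so an exact-match lookup on the first hash will never find the second file.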

Malware researchers have come up with better ways to hash malware in order to identify what sort of malware you are dealing with.

Here are some of the more reliable methods of hashing malware:

1. Fuzzy Hashing

Fuzzy hashing can be very useful in identifying malware samples that are similar, but not necessarily identical. It is a great technique for identifying samples that are from the same actor group or the same malware family.

To do fuzzy hashing, we use the fuzzy hashing program ssdeep.

To use ssdeep with Python, you can utilize the python-ssdeep library, a wrapper for the ssdeep fuzzy hashing library.

Here is an example of how we use it:
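A minimal sketch of hashing and comparing with python-ssdeep, assuming the package is installed (pip install ssdeep). The helper names fuzzy_hash_file and similarity are made up for illustration, and so are the file paths in the comments. The library calls doing the work are ssdeep.hash_from_file and ssdeep.compare.

```python
try:
    import ssdeep  # third-party wrapper around the ssdeep library
except ImportError:
    ssdeep = None  # lets the sketch load even where the library is missing


def _require_ssdeep():
    """Return the ssdeep module, or fail with a clear message."""
    if ssdeep is None:
        raise RuntimeError("python-ssdeep is not installed")
    return ssdeep


def fuzzy_hash_file(path):
    """Return the ssdeep fuzzy hash of the file at `path` as a string."""
    return _require_ssdeep().hash_from_file(path)


def similarity(hash_a, hash_b):
    """Return ssdeep's match score (0 to 100) for two fuzzy hashes."""
    return _require_ssdeep().compare(hash_a, hash_b)


# Usage with hypothetical sample paths:
#   known = fuzzy_hash_file("samples/known_variant.bin")
#   fresh = fuzzy_hash_file("samples/new_sample.bin")
#   print(similarity(known, fresh))
```

A fuzzy hash is a short string made up of a block size and two signature parts separated by colons, so it can be stored and shared just like an MD5 value.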


Fuzzy hashing also lets you compare malware files. This is really handy if you keep a directory of malware samples: you can compare a new sample against that repository to see if you have seen a variant before. The comparison gives you a similarity score from 0 to 100 for each pair of samples.
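The repository check can be sketched as a small ranking function. The name rank_matches, the min_score parameter, and the file names in the comment are made up for illustration. The comparison function is passed in, so with python-ssdeep installed you would hand it ssdeep.compare.

```python
from typing import Callable, Dict, List, Tuple


def rank_matches(
    new_hash: str,
    known_hashes: Dict[str, str],
    compare: Callable[[str, str], int],
    min_score: int = 1,
) -> List[Tuple[str, int]]:
    """Score a new fuzzy hash against a repository of known fuzzy hashes.

    `known_hashes` maps a sample name to its stored fuzzy hash.
    `compare` returns a 0-100 score for two hashes (e.g. ssdeep.compare).
    Returns (name, score) pairs at or above `min_score`, best match first.
    """
    scored = ((name, compare(new_hash, h)) for name, h in known_hashes.items())
    hits = [(name, score) for name, score in scored if score >= min_score]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Usage, assuming python-ssdeep and a repository you hashed earlier:
#   import ssdeep
#   repo = {"family_a_2023.bin": "...", "family_b_2024.bin": "..."}
#   print(rank_matches(ssdeep.hash_from_file("new_sample.bin"), repo, ssdeep.compare))
```

Storing the fuzzy hashes rather than the files themselves keeps the lookup fast, since each new sample only needs to be hashed once and then compared string against string.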

2. Import Hash

Import hashing is another hashing technique that can be used to identify similar malware samples. The hash is calculated from the names of the imported libraries and functions and their order in the executable file.

If two samples were compiled from the same source and in the same manner, their import hashes will usually be the same.

If two different malware samples have the same import hash, meaning the same imported functions in the same order, then they are very likely related.
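To make the "names and their order" part concrete, here is a simplified sketch of how an import hash is built, using only hashlib. The function name import_hash and the import list are made up for illustration, and the sketch ignores imports by ordinal. In practice you would not type the imports by hand: the pefile library reads them from the executable and exposes the result through get_imphash().

```python
import hashlib


def import_hash(imports):
    """Compute an import hash from (library, function) pairs.

    `imports` must be in the order the entries appear in the import table.
    Each pair becomes "library.function" in lowercase, with a .dll, .ocx or
    .sys extension stripped from the library name. The entries are joined
    with commas and the MD5 of that string is the import hash.
    """
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        stem, dot, ext = lib.rpartition(".")
        if dot and ext in ("dll", "ocx", "sys"):
            lib = stem
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()


# Made-up import list for illustration.
sample_imports = [
    ("KERNEL32.dll", "CreateFileA"),
    ("KERNEL32.dll", "WriteFile"),
    ("WS2_32.dll", "connect"),
]

print(import_hash(sample_imports))
# Same functions, different order: the hash changes.
print(import_hash(list(reversed(sample_imports))))
```

Because the order matters, the hash reflects how the binary was built and linked, not just which functions it happens to use.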

3. Section Hash

The section hash is also used to identify related malware samples. During analysis, the sample is often loaded into pestudio, which calculates the MD5 hash of every section of the executable (.text, .data, .rdata, .rsrc, etc.).

This gives you a hash value for each section of the sample. If a section hash matches between two samples, that section is identical in both, which suggests the samples are related even when their whole-file hashes differ.
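If you want the same per-section hashes outside a GUI tool, the idea can be sketched with the standard library alone. This is a simplified reader that assumes a well-formed PE file and hashes each section's raw bytes exactly as stored on disk. The function name section_md5s and the path in the comment are made up for illustration. A full parser such as pefile handles the edge cases and offers get_hash_md5() on each section.

```python
import hashlib
import struct


def section_md5s(pe_bytes):
    """Return {section_name: md5_hex} for the raw data of each PE section."""
    # Offset 0x3C in the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")

    # The COFF header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", pe_bytes, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", pe_bytes, pe_offset + 20)

    # The section table starts right after the optional header.
    table = pe_offset + 24 + opt_header_size

    hashes = {}
    for i in range(num_sections):
        entry = table + i * 40  # each section header is 40 bytes
        name = pe_bytes[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20.
        raw_size, raw_ptr = struct.unpack_from("<II", pe_bytes, entry + 16)
        raw = pe_bytes[raw_ptr:raw_ptr + raw_size]
        hashes[name] = hashlib.md5(raw).hexdigest()
    return hashes


# Usage with a hypothetical path:
#   with open("samples/new_sample.exe", "rb") as fh:
#       for name, digest in section_md5s(fh.read()).items():
#           print(name, digest)
```

Comparing these dictionaries across two samples shows you exactly which sections were left untouched between variants.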


Written by CyberNotes

Data Science/Cyber - Student at Michigan State University.
