PE File Format

Malware Basics

What is PE?

Most Windows malware comes in the PE format. PE stands for Portable Executable, the file format that Windows can load and execute directly. It is used by files such as exe, dll, sys, fon, and drv.

PE Structure

A PE file consists of 2 parts:

The first part is the Headers: the DOS Header, PE Header, Optional Header, Data Directories, and Section Table.

The second part is the PE Sections, which hold the code, imports, and data of the file.

1. Headers

DOS MZ Header

It is located at the beginning of every PE file and is 64 bytes long.

The first 2 bytes are MZ (hex 4D 5A), and you can see them if you open any PE file in a hex editor.
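As a minimal sketch of the check described above, the following Python function tests whether a buffer could begin with a DOS MZ header. The function name has_mz_signature is made up for this illustration, and passing this check alone does not prove that the file is a valid PE.

```python
DOS_HEADER_SIZE = 64  # length of the DOS MZ header in bytes


def has_mz_signature(data: bytes) -> bool:
    """Return True if the buffer is at least 64 bytes long and starts with 'MZ'."""
    return len(data) >= DOS_HEADER_SIZE and data[:2] == b"MZ"
```

To try it on a real file, read the file in binary mode (for example open(path, "rb").read()) and pass the bytes to the function.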

DOS Stub

It appears after the DOS header. There is little to say about it except that it is the small program that prints "This program cannot be run in DOS mode." when the file is started under DOS.

PE Header

It starts with a 4-byte signature: the first 2 bytes are the characters PE (hex 50 45), followed by 2 null bytes (00 00).
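The sketch below shows one way to locate this signature in Python. It relies on a detail not spelled out in the text: the DOS header stores the file offset of the PE header in a 4-byte little-endian field (e_lfanew) at offset 0x3C. The function name find_pe_signature is an assumption made for this example.

```python
import struct


def find_pe_signature(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 4-byte little-endian offset stored at 0x3C in the DOS header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header must start with the bytes 50 45 00 00
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return pe_offset
```

The returned offset is the starting point for reading the headers that follow the signature.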

File Header

It is the 20 bytes that come right after the 4-byte signature. It contains information such as the Number of Sections in the file, the TimeDateStamp, the Machine type that tells whether the file targets a 32-bit or 64-bit architecture, and the Characteristics flags that define, among other things, whether the file is an exe or a dll.
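To make the layout of these 20 bytes concrete, here is a hedged Python sketch that unpacks them. The field order follows the published PE/COFF layout (Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics). The function name, the dictionary keys, and the short MACHINE_NAMES table are choices made for this example, and pe_offset is assumed to be the offset of the PE signature.

```python
import struct
from datetime import datetime, timezone

IMAGE_FILE_DLL = 0x2000  # Characteristics flag set for DLL files
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}  # partial list


def parse_file_header(data: bytes, pe_offset: int) -> dict:
    """Unpack the 20-byte File Header that follows the 4-byte PE signature."""
    fields = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    machine, n_sections, timestamp, _symtab, _nsyms, opt_size, characteristics = fields
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        # TimeDateStamp is stored as seconds since 1970-01-01 UTC
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "size_of_optional_header": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

Keep in mind that the timestamp is written by the linker and can be forged, so it is a hint rather than proof of when the file was built.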

Optional Header

It comes right after the File Header. Despite its name, it is not optional for executable files, and it is the most important header because it contains fields such as the Address of Entry Point (where execution of the code starts), the Image Base (the preferred load address, which helps us in dynamic analysis), and the Size of Image (the size of the malware when it is loaded in memory).

Note: We do not need to count bytes and extract this information by hand, because automated tools such as CFF Explorer and PEstudio can parse the PE headers for us and show information such as the Entry Point, Image Base, Raw Size & Virtual Size, Section Characteristics, Import Table, ASLR, etc.
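For readers who want to see where these three fields live, the following Python sketch reads them directly. It assumes the standard PE/COFF offsets: the header begins with a 2-byte Magic value (0x10B for PE32, 0x20B for PE32+), the entry point is at offset 16, the Image Base is a 4-byte field at offset 28 in PE32 or an 8-byte field at offset 24 in PE32+, and the Size of Image is at offset 56. The function name and dictionary keys are illustrative only.

```python
import struct

FILE_HEADER_SIZE = 20  # bytes between the PE signature and the Optional Header


def parse_optional_header(data: bytes, pe_offset: int) -> dict:
    """Read Entry Point, Image Base and Size of Image from the Optional Header."""
    opt = pe_offset + 4 + FILE_HEADER_SIZE
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unexpected Optional Header magic: {magic:#x}")
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "is_64bit": magic == 0x20B,
        "address_of_entry_point": entry_point,  # an RVA, relative to Image Base
        "image_base": image_base,
        "size_of_image": size_of_image,
    }
```

The entry point is stored as a relative virtual address, so the address seen in a debugger is the Image Base plus this value when the file is loaded at its preferred base.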

Data Directory

This is a table at the end of the Optional Header that points to important structures such as the Import and Export Tables, which list the libraries and functions that the malware uses to encrypt or delete files, connect to external websites, download or upload files, etc.

Note: Since these libraries are Windows libraries such as dll files, the code that the malware calls lives inside the dll files and not inside the malware itself, so the file on its own can look as if it is not malicious!

DLL stands for Dynamic Link Library, a library with a well-known name that exposes a known set of functions, for example: User32.dll - WSock32.dll - Wininet.dll - Advapi32.dll - Kernel32.dll
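As a rough illustration of how the Import and Export Table entries are found, the Python sketch below reads the first three Data Directory entries. It assumes the standard layout: each entry is 8 bytes (a 4-byte RVA and a 4-byte size), the array starts at offset 96 of a PE32 Optional Header or offset 112 of a PE32+ one, the entry count sits in the 4 bytes just before it, and the Export, Import, and Resource directories are entries 0, 1, and 2. The function name and the DIRECTORY_NAMES list are made up for this example.

```python
import struct

DIRECTORY_NAMES = ["export", "import", "resource"]  # entries 0, 1 and 2


def read_data_directories(data: bytes, pe_offset: int) -> dict:
    """Return {name: (rva, size)} for the first few Data Directory entries."""
    opt = pe_offset + 24  # 4-byte signature + 20-byte File Header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32
        dirs = opt + 96
    elif magic == 0x20B:  # PE32+
        dirs = opt + 112
    else:
        raise ValueError(f"unexpected Optional Header magic: {magic:#x}")
    # NumberOfRvaAndSizes is the 4-byte field right before the array
    (count,) = struct.unpack_from("<I", data, dirs - 4)
    result = {}
    for index, name in enumerate(DIRECTORY_NAMES[:count]):
        rva, size = struct.unpack_from("<II", data, dirs + index * 8)
        result[name] = (rva, size)
    return result
```

An entry whose RVA and size are both zero means the file has no such table, which is typical for the Export entry of an ordinary exe.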

Each of them contains a set of common functions that can be used by malware.

Kernel32.dll contains functions such as: CreateDirectoryW - CreateFileW - ReadFile - WriteFile

Advapi32.dll contains: RegCloseKey - RegDeleteValueW - RegOpenKeyEx

User32.dll contains: LoadIconW - LoadMenuW - MessageBoxW - SetClipboardData

And so on.

Exported functions are the opposite of imported functions. When we look at a DLL file, we will mainly find exported functions, because other programs use them.

On the other hand, when we look at an executable file, we will find imported functions, because the program imports them from other dlls.

Note: This does not prevent a dll from containing both imports and exports. Its code may need functions from elsewhere, so it must import them, and it offers functions that other programs use, so it must export them.

Why are PE Headers important?

They help us in the static analysis stage, because they tell us the names of the functions that the malware imports from Windows. From those functions we can infer whether the malware encrypts or deletes files, or communicates with websites on the Internet. Thus, we learn the capabilities of the malware and can determine its type as well.
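The idea of inferring capabilities from imported function names can be sketched as a simple lookup in Python. The CAPABILITY_HINTS table below is a tiny hypothetical example built from the functions mentioned earlier, and the capability labels are invented for this illustration. It is not a detection rule, since benign programs import the same functions.

```python
# Hypothetical hint table: imported function name -> rough capability label
CAPABILITY_HINTS = {
    "CreateDirectoryW": "file access",
    "CreateFileW": "file access",
    "ReadFile": "file access",
    "WriteFile": "file access",
    "RegDeleteValueW": "registry access",
    "RegCloseKey": "registry access",
    "SetClipboardData": "clipboard access",
}


def summarize_capabilities(imported_functions):
    """Group known imported function names under a capability label."""
    found = {}
    for name in imported_functions:
        hint = CAPABILITY_HINTS.get(name)
        if hint is not None:
            found.setdefault(hint, []).append(name)
    return found
```

The list of imported names would come from a tool such as PEstudio or from parsing the Import Table. Names that are not in the table are simply ignored.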

2. Sections

Sections contain the code that will be executed and the data of the program.

Besides the code and data sections (commonly named .text and .data), there are other sections such as .idata, .edata, .pdata, and .reloc
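To show how section names and sizes can be listed, here is a hedged Python sketch that walks the Section Table. It assumes the standard layout: the table starts right after the Optional Header, each entry is 40 bytes, and the executable flag is 0x20000000 in the section Characteristics. The function name and dictionary keys are choices made for this example, and pe_offset is assumed to be the offset of the PE signature.

```python
import struct

SECTION_HEADER_SIZE = 40
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section contents can be executed


def parse_sections(data: bytes, pe_offset: int) -> list:
    """Return one dict per entry in the Section Table."""
    (n_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # Table starts after: signature (4) + File Header (20) + Optional Header
    table = pe_offset + 24 + opt_size
    sections = []
    for i in range(n_sections):
        entry = struct.unpack_from("<8sIIIIIIHHI", data, table + i * SECTION_HEADER_SIZE)
        name, virtual_size, virtual_address, raw_size, raw_pointer = entry[:5]
        flags = entry[9]
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "raw_size": raw_size,
            "raw_pointer": raw_pointer,
            "executable": bool(flags & IMAGE_SCN_MEM_EXECUTE),
        })
    return sections
```

Comparing raw_size with virtual_size is a common first check, because a section that is much larger in memory than on disk is often unpacked at run time.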

Why are Sections important?

They help us in the dynamic analysis stage, when we run the malware and inspect the data inside it. They also help in static analysis, where we can extract hidden resources or hidden data from inside the file.