Calling Conventions

Calling conventions are the standardised rules that govern function calls. They define how parameters are passed, how return values are handled, and how the stack is managed.

Windows primarily uses the __stdcall calling convention when 32-bit code calls Win32 APIs. 64-bit code differs: it uses the x64 Application Binary Interface (ABI), which is similar to, but not to be confused with, the __fastcall calling convention.

Parameters

When invoking a Win32 API from 64-bit code, the first four arguments are passed in registers: rcx, rdx, r8, and r9 for integer and pointer arguments, or xmm0 to xmm3 for floating-point arguments. Any arguments beyond the first four are passed on the stack. Consider the following example:

MessageBoxA Syntax
int MessageBoxA(
  [in, optional] HWND   hWnd,
  [in, optional] LPCSTR lpText,
  [in, optional] LPCSTR lpCaption,
  [in]           UINT   uType
);

When calling the MessageBoxA function:

  1. The hWnd parameter is passed in the rcx register.

  2. The lpText parameter is passed in the rdx register.

  3. The lpCaption parameter is passed in the r8 register.

  4. The uType parameter is passed in the r9 register.

Note: any parameter that does not fit into a 64-bit register must be passed by reference. In the example given, lpText and lpCaption are passed as pointers to the strings.
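The register assignment can be modelled in a few lines of Python. This is only an illustrative sketch and not part of the shellcode itself: the helper name register_for is hypothetical, and it assumes the documented x64 rule that an argument's position (not its type) selects the slot, with floating-point arguments using xmm0 to xmm3 in place of the integer registers.

```python
# Hypothetical helper: models which register carries each of the first four
# arguments under the Windows x64 calling convention.
INT_REGS = ["rcx", "rdx", "r8", "r9"]
FLOAT_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]


def register_for(position, is_float=False):
    """Return the register for the argument at a 0-based position.

    Positions 4 and above are passed on the stack, so None is returned.
    """
    if position >= len(INT_REGS):
        return None
    return FLOAT_REGS[position] if is_float else INT_REGS[position]


# MessageBoxA(hWnd, lpText, lpCaption, uType): four integer/pointer arguments.
params = ["hWnd", "lpText", "lpCaption", "uType"]
for i, name in enumerate(params):
    print(f"{name:10} -> {register_for(i)}")
```

Because the slot is chosen by position, a floating-point value in the second position would travel in xmm1 and leave rdx unused for that call.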

The assembly given below shows how this call would be made:

Call MessageBox Snippet
; this code assumes that the address of the MessageBox function is in r10
call_messageboxa:
    xor  rax, rax                     ; RAX = 0
    mov  rcx, rax                     ; RCX = hWnd = NULL
    mov  r9,  rax                     ; R9 = uType = 0 (MB_OK)
    mov  rax, 0x646c72                ; "rld\0" (little-endian, zero-padded)
    push rax                          ; push the tail of the string first
    mov  rax, 0x6f57206f6c6c6548      ; "Hello Wo" (little-endian)
    push rax                          ; stack now holds "Hello World\0"
    mov  rdx, rsp                     ; RDX = lpText = pointer to "Hello World\0"
    mov  r8,  rdx                     ; R8 = lpCaption = same pointer
    call r10                          ; CALL R10 (MessageBoxA)

Note: we have not yet discussed how to find the address of functions. It is also worth noting that the parameters do not have to be loaded into the registers in any particular order. Here rax is zeroed by the first instruction, so it makes sense to copy it into rcx (hWnd) and r9 (uType) straight away, before rax is reused to build the string.
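The two immediates pushed in the snippet are simply the string's bytes read as little-endian 64-bit integers. The following Python sketch shows one way to derive them; the function name string_to_push_qwords is hypothetical and the approach assumes an ASCII string that is NUL-terminated and zero-padded to a multiple of 8 bytes.

```python
def string_to_push_qwords(text):
    """Return the 64-bit immediates to push, in push order, for a C string."""
    data = text.encode("ascii") + b"\x00"      # NUL-terminate the string
    data += b"\x00" * (-len(data) % 8)         # zero-pad to a multiple of 8
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    # The stack grows downwards, so the last chunk is pushed first.
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


for value in string_to_push_qwords("Hello World"):
    print(f"mov  rax, {value:#x}")
    print("push rax")
```

For "Hello World" this yields 0x646c72 followed by 0x6f57206f6c6c6548, matching the values used in the assembly above.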

The assembly instructions given above do not take into account another x64 ABI requirement relating to stack space and stack alignment, which will be discussed shortly.

Return Values

Return values that fit into 64 bits are returned in the rax register. Floating-point values and vector types such as __m128, __m128i, and __m128d are returned in the xmm0 register. These will not be discussed now.

Depending on the type being returned, rax holds either the value itself or a pointer to it. An int is stored directly in rax, whereas a pointer return type leaves the memory address of the 'returned' data in rax.

In the MessageBox example the return value is an int that identifies the user's action. If the Cancel button is pressed, the function returns IDCANCEL, which has the value 2. It is always good practice to check the Microsoft Win32 API documentation for the return values of the function being called.

Shadow Space and Stack Alignment

When making a function call in the x64 calling convention, the caller must allocate 32 bytes (0x20) on the stack for the called function to utilise, even if there are no arguments beyond the four passed in registers. This area is referred to as shadow space; the callee may use it to save the four register arguments.

Additionally, rsp must be aligned to a 16-byte boundary at the point of the call instruction. Allocating 32 bytes keeps an already aligned stack aligned, but it does not correct a misaligned one. If you push values on to the stack, such as strings, it may be necessary to re-align the stack before the function call.

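In the MessageBox example above the two pushes move rsp by 16 bytes. Assuming rsp was already 8 bytes off a 16-byte boundary when the snippet started (as it is immediately after a call has pushed a return address), that leaves 24 bytes below the last aligned address, so the stack is misaligned by 8 bytes. Before we make the call to MessageBox we therefore need to allocate 40 bytes instead of 32 bytes: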

Call MessageBox Snippet 2
sub  rsp, 0x28                    ; Allocate 40 bytes: 32 bytes of shadow space
                                  ; plus 8 bytes for stack alignment
call r10                          ; CALL R10 (MessageBox)
add  rsp, 0x28                    ; clean up the 40 bytes allocated above

By adhering to proper stack alignment and allocating the necessary space, you can ensure consistent behavior and compatibility when making function calls in the x64 calling convention.

In the x64 ABI it is also the caller's responsibility to clean up the stack when the function returns. This is the purpose of the add rsp instruction that follows the call.
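The alignment arithmetic can be checked with a short Python model. This is a sketch under stated assumptions rather than a definitive rule: the function names are hypothetical, and the entry value of rsp is an invented address chosen to be 8 bytes off a 16-byte boundary.

```python
SHADOW_SPACE = 0x20  # 32 bytes reserved for the callee


def rsp_at_call(entry_rsp, pushes, reserve):
    """Value of rsp when the call instruction executes."""
    return entry_rsp - 8 * pushes - reserve


def reserve_for_call(entry_rsp, pushes):
    """Shadow space plus whatever padding makes rsp a multiple of 16."""
    padding = (entry_rsp - 8 * pushes - SHADOW_SPACE) % 16
    return SHADOW_SPACE + padding


entry = 0x14FF08  # hypothetical rsp, 8 bytes off a 16-byte boundary
print(hex(reserve_for_call(entry, pushes=2)))   # 0x28
print(rsp_at_call(entry, 2, 0x20) % 16)         # 8: misaligned
print(rsp_at_call(entry, 2, 0x28) % 16)         # 0: aligned
```

With an entry value that is already 16-byte aligned, the same model reports that the plain 0x20 bytes of shadow space are sufficient.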

The full assembly listing is shown below:

Call MessageBox Snippet 3
; this code assumes that the address of the MessageBox function is in r10
call_messagebox:
    xor  rax, rax                     ; RAX = 0
    mov  rcx, rax                     ; RCX = hWnd = NULL
    mov  r9,  rax                     ; R9 = uType = 0 (MB_OK)
    mov  rax, 0x646c72                ; "rld\0" (little-endian, zero-padded)
    push rax                          ; push the tail of the string first
    mov  rax, 0x6f57206f6c6c6548      ; "Hello Wo" (little-endian)
    push rax                          ; stack now holds "Hello World\0"
    mov  rdx, rsp                     ; RDX = lpText = pointer to "Hello World\0"
    mov  r8,  rdx                     ; R8 = lpCaption = same pointer
    sub  rsp, 0x28                    ; Allocate 40 bytes: 32 bytes of shadow space
                                      ; plus 8 bytes for stack alignment
    call r10                          ; CALL R10 (MessageBox)
    add  rsp, 0x28                    ; clean up the 40 bytes allocated above

In some instances we can simply align the stack using the and rsp, byte -0x10 instruction, which clears the low four bits of rsp and so rounds it down to the nearest 16-byte boundary.

Checking Stack Alignment

We can check stack alignment by placing an int3 instruction just before the call instruction. When the breakpoint is hit, we can check that the current value in rsp is correctly aligned by issuing the ?rsp%10 command (this is WinDbg expression syntax, where 10 is interpreted as hexadecimal by default, i.e. 16). If the result is 0 then the stack is correctly aligned.
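The same check, and the effect of masking rsp with -0x10, can be reproduced outside the debugger. The Python below is a minimal sketch; the helper names and the sample address are hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a real register


def is_aligned(rsp):
    """Equivalent of the debugger check: rsp % 0x10 == 0."""
    return rsp % 0x10 == 0


def align_down(rsp):
    """Model of `and rsp, -0x10`: clear the low four bits of rsp."""
    return rsp & (-0x10 & MASK64)


rsp = 0x14FED8  # hypothetical misaligned value
print(is_aligned(rsp))                 # False
print(hex(align_down(rsp)))            # 0x14fed0
print(is_aligned(align_down(rsp)))     # True
```

Note that masking discards the original value of rsp, so code that needs to restore the stack later has to save rsp somewhere first.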

Advanced Win32 Calls and the Stack

When making calls to functions that require more than four arguments, the remaining arguments must be placed on the stack. This can be confusing at first but is straightforward if we follow some basic rules.

Let's take the WinHttpSendRequest call as an example. The C code is shown below:

BOOL result = WinHttpSendRequest(hRequest, WINHTTP_NO_ADDITIONAL_HEADERS, 0, WINHTTP_NO_REQUEST_DATA, 0, 0, 0);

And here is the syntax:

WINHTTPAPI BOOL WinHttpSendRequest(
  [in]           HINTERNET hRequest,
  [in, optional] LPCWSTR   lpszHeaders,
  [in]           DWORD     dwHeadersLength,
  [in, optional] LPVOID    lpOptional,
  [in]           DWORD     dwOptionalLength,
  [in]           DWORD     dwTotalLength,
  [in]           DWORD_PTR dwContext
);

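This call is used as an example because it is a simple call; we can pass NULL or 0 for every argument except hRequest. Before the call is made, the registers and the stack should look like this: rcx holds hRequest; rdx, r8, and r9 are zero; the 32 bytes at [rsp] are shadow space; and dwOptionalLength, dwTotalLength, and dwContext sit at [rsp+0x20], [rsp+0x28], and [rsp+0x30] respectively.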

Here is an example of the assembly that would be used to make the call:

call_WinHttpSendRequest:
    mov rcx, [rbp-0x40]             ; RCX = hRequest (returned from 
                                    ; WinHttpOpenRequest)
    xor rdx, rdx                    ; RDX = lpszHeaders = NULL
    xor r8, r8                      ; R8 = dwHeadersLength = 0
    xor r9, r9                      ; R9 = lpOptional = NULL
    sub rsp, 0x10                   ; Align stack BEFORE the pushes (this value
                                    ; may differ for others depending on stack
                                    ; alignment)       
    xor r11, r11                    ; R11 = 0
    push r11                        ; dwContext= 0
    push r11                        ; dwTotalLength = 0
    push r11                        ; dwOptionalLength = 0
    sub rsp, 0x20                   ; Allocate stack space for the function call
                                    ; AFTER the pushes
    call r10                        ; make the CALL here, assuming the address of
                                    ; WinHttpSendRequest is in r10

Remember, when a value is pushed on to the stack, rsp is adjusted by 8 bytes to point at a lower memory address. So, we push the 7th, 6th, and 5th arguments on to the stack, in that order, so that the 5th argument ends up at the lowest address, and then create the shadow space for the callee.
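To make the resulting layout concrete, the Python sketch below lists where each WinHttpSendRequest argument should be found at the moment the call instruction executes. The function name argument_locations is hypothetical, and the model assumes every argument is an integer or pointer that occupies one 8-byte slot.

```python
SHADOW_SPACE = 0x20
REGISTERS = ["rcx", "rdx", "r8", "r9"]


def argument_locations(params):
    """Map each parameter name to its register or [rsp+offset] slot.

    Offsets are relative to rsp at the moment the call instruction executes.
    """
    locations = {}
    for index, name in enumerate(params):
        if index < len(REGISTERS):
            locations[name] = REGISTERS[index]
        else:
            # Stack arguments start directly above the shadow space.
            offset = SHADOW_SPACE + 8 * (index - len(REGISTERS))
            locations[name] = f"[rsp+{offset:#x}]"
    return locations


params = ["hRequest", "lpszHeaders", "dwHeadersLength", "lpOptional",
          "dwOptionalLength", "dwTotalLength", "dwContext"]
for name, where in argument_locations(params).items():
    print(f"{name:18} {where}")
```

The three stack arguments land at [rsp+0x20], [rsp+0x28], and [rsp+0x30], which is why they are pushed in reverse order before the shadow space is reserved.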

We will discuss writing, running and debugging shellcode in the next section.
