Win X64 Asm 1

2020/12/16

Categories: Journal Dev Journal Tags: assembly x64 Windows

Intro

I’m working through the book “Windows 64-bit Assembly Language Programming” and taking some notes.

My first notes are about environment setup, and working on Chapter 3 “Hello World”.

A few tutorials I’ve seen use assemblers like FASM or NASM, but the book uses ML64 so that is what I used as well.

Setup

To get set up with ML64, the book recommends searching for ml64.exe and adding it to the path. I didn’t quite want to do it this way, so I went searching for more info. What I found was that, similarly to compiling C++ from the command line, all I needed to do was run vcvarsall.bat x64 (vcvarsall.bat sets up the environment variables for the Visual C++ toolset).

I can never remember the path to vcvarsall.bat, so I decided to make a .bat file to help.

I placed the following in a “dev_env.bat” file and put it in the bin directory of my Cmder install. The same thing could be achieved by putting the .bat file anywhere really. Placing it in the Cmder/bin directory just makes calling the script a bit easier.

@echo off
set plat=%1
:: defaults to x64, but can pass in x86 instead
if "%plat%" == "" (
    SET plat=x64
)
"<Your Drive>:\<your>\<path>\<to>\vcvarsall.bat" %plat%

And because curiosity killed the cat, I had to find out what the difference between .bat and .cmd is.

Compiling

After setting up that script, I wrote up the console input echo program from chapter 3, “Hello World”.

The command line the book gives for compiling is,

ml64 <yourfilename>.asm /link /SUBSYSTEM:CONSOLE /ENTRY:main

I kinda knew what the commands do but decided to go and read the docs.

/link passes “the remainder of the command line to LINK”. That is, anything after /link gets passed as options to the linker.

/SUBSYSTEM:<arg> specifies the subsystem…of which there are several. According to the documentation, if main is defined for native code (asm and C++ are native as opposed to “managed” C#), then the default subsystem is CONSOLE, so this could probably have been omitted from the command line. To put that to the test I tried it and…it worked, so that is cool.

Another subsystem that could be passed is WINDOWS, although that means you are planning to provide your own window (just rephrasing the docs here).

/ENTRY:<function> sets the starting address for the .exe (or .dll). So /ENTRY:main sets the starting point of the .exe to the address of the function main, which was defined in the sample code.

So the final command line that would work for compiling is (provided you followed the example and defined the function named main).

ml64 <yourfilename>.asm /link /ENTRY:main

I tried omitting /link just for kicks, but obviously that didn’t work. /ENTRY is a linker option, and without the /link part, nothing in the command line is passed to the linker.

One thing I noticed later: this command line will not create a .pdb, so debugging will not work. Just adding the -Zi flag to the ml64 options (before /link) solves this.

Coding Highlights

The major highlight while coding up the hello world sample of Chapter 3 was learning about the Windows 64-bit calling convention. The convention is (as defined in the book)

  • First 4 arguments are passed in RCX, RDX, R8, R9. Additional arguments are pushed onto the stack
  • Calling program assumes RAX, RCX, RDX, R8-R11 are volatile (original contents are not preserved) while RBX, RSI, RDI, RBP, RSP, and R12-R15 are non-volatile (original contents are preserved)
  • The called procedure assumes that the stack has room to store four 64-bit registers and that the address contained in RSP (stack pointer) is 16 byte aligned. A CALL puts an 8 byte return address on the stack, so to maintain stack pointer alignment, 40 bytes of “shadow space” is reserved on the stack.

NOTE: the way the book talks about “shadow space” is different from how it is described in other sources. Those describe it as 32 bytes (four 8-byte slots) where the called procedure can store RCX, RDX, R8 and R9 if needed; the book’s 40 bytes is those 32 bytes plus 8 more to get RSP back to 16 byte alignment after the CALL pushed the return address. Also, the part about the first 4 arguments is not strictly accurate. It omits how the XMM0-XMM3 registers are used instead for floating point values. But nonetheless, any arguments in addition to the first 4 are going to be on the stack.
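
The arithmetic behind that 40 can be sketched in C. This is only an illustration and not something from the book: the helper name frame_bytes and its parameters are made up here, and it assumes the prologue does nothing to RSP except one subtraction (no pushes of non-volatile registers).

```c
#include <stddef.h>

/* Hypothetical helper (not from the book): bytes to subtract from RSP in a
 * function's prologue so that RSP is 16-byte aligned again before the
 * function makes its own calls.
 *   max_call_args: largest argument count of any function it calls
 *   local_bytes:   extra space wanted for local variables */
size_t frame_bytes(size_t max_call_args, size_t local_bytes)
{
    /* Always reserve at least four 8-byte slots (the shadow space),
     * plus one slot for each argument beyond the fourth. */
    size_t slots = max_call_args < 4 ? 4 : max_call_args;
    size_t needed = slots * 8 + local_bytes;

    /* On entry RSP is 8 bytes off a 16-byte boundary because CALL pushed
     * the return address, so (needed + 8) must be a multiple of 16. */
    size_t total = (needed + 8 + 15) / 16 * 16;
    return total - 8;
}
```

Under those assumptions, frame_bytes(4, 0) comes out as 40: 32 bytes of argument slots plus 8 bytes of padding. A call with five arguments also fits in 40, because the fifth argument takes the place of the padding.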

Byte Alignment

I was curious about the 16 byte alignment requirement and went looking for an explanation. Docs from Microsoft say it is to “aid performance” and “xmm registers are commonly 16 byte aligned”. This doc by Agner Fog has additional calling convention info as well.

The xmm registers are 128 bits (16 bytes), so it makes some sense that the stack would be aligned to the size of the larger registers.

Alignment is an interesting topic that gets some coverage in Writing Great Code: Vol 1. The super duper paraphrased high level is that there are some rules about how the CPU reads from memory, and said rules result in a “natural alignment” for data types that allows the CPU to read the data in the least number of cycles (ideally, one cycle). Normally as you’re coding you don’t even notice these things because the rules are handled at a much lower level. My first encounter with it in C++ was learning about padding, where based on how you arrange a struct, it may be larger than you think because padding bytes are slipped in by the compiler.
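
A small C sketch of the padding effect, for illustration. The struct names and the two helper functions are invented for this example, and the exact sizes depend on the compiler and target, so treat any numbers as typical rather than guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Members ordered small, large, small: the compiler usually has to insert
 * padding before `value` and again at the end of the struct. */
struct Loose {
    uint8_t  tag;
    uint64_t value;
    uint8_t  flag;
};

/* Same members, largest first: the two small members share the tail. */
struct Tight {
    uint64_t value;
    uint8_t  tag;
    uint8_t  flag;
};

/* The members themselves occupy 1 + 8 + 1 = 10 bytes; anything beyond
 * that in sizeof is padding added by the compiler. */
size_t loose_padding(void) { return sizeof(struct Loose) - 10; }
size_t tight_padding(void) { return sizeof(struct Tight) - 10; }
```

On a typical x64 compiler I would expect sizeof(struct Loose) to be 24 and sizeof(struct Tight) to be 16, so just reordering the members drops 8 bytes of padding.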

Coding

The coding part was pretty simple, but there were several things I was curious about, so I went looking for additional information.

The first thing was this line,

Console equ -11

In the docs, it says EQU has the form _name_ EQU _expression_. It assigns the numeric value of expression to name. But what is -11? That was my real question. I found the answer in the documentation for GetStdHandle: -11 specifies STD_OUTPUT_HANDLE. What is “standard” depends on the subsystem specified when compiling. Since we are using the console subsystem, this means standard out is our console window. Not that exciting, but I was satisfied to have found documentation that explained in more detail what -11 was.
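
The GetStdHandle documentation lists three of these values, each defined as a negative number cast to a DWORD. A portable sketch of what that cast does is below. The MY_ prefixed names are local stand-ins invented for this example so it compiles without the Windows headers, and as_dword is a made-up helper.

```c
#include <stdint.h>

/* Local stand-ins for the Windows constants so this compiles anywhere;
 * the real definitions live in the Windows headers. The values are the
 * ones listed in the GetStdHandle documentation. */
#define MY_STD_INPUT_HANDLE   ((uint32_t)-10)
#define MY_STD_OUTPUT_HANDLE  ((uint32_t)-11)
#define MY_STD_ERROR_HANDLE   ((uint32_t)-12)

/* Reinterpret a small negative number as the unsigned 32-bit value
 * that actually ends up in the register. */
uint32_t as_dword(int32_t value)
{
    return (uint32_t)value;
}
```

Here as_dword(-11) is 0xFFFFFFF5, which is the same fffffff5H that shows up as a comment next to mov ecx, -11 in the compiler listing further down.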

The second curious thing was using the instruction LEA which is Load Effective Address. For example, the Hello World code has

lea RDX, pmsg

The docs explain LEA like this:

Computes the effective address of the second operand (source) and stores it in the first operand (destination). The source operand is a memory address (offset part) specified with one of the processors addressing modes;….

Again from Writing Great Code I remember that there are several addressing modes: direct, indirect, and indexed to name a few. Just from the code in the demo I was not sure what addressing mode is being used, and still am not 100%. But when I looked at the disassembly, what I found was

lea rdx, [pmsg (07FF754B84000h)]

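To me, the brackets indicate that the operand is a memory reference. Since pmsg is a label rather than a register, I think this is direct addressing rather than indirect. I also tried coding it up with brackets directly, and the end result as well as the disassembly were identical to the version without brackets.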

lea rdx, [pmsg]

But the main thing with the LEA instruction is that it puts an address into the destination, not a value. So it is a pointer.
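
The same distinction in C terms, as a rough analogy only. It assumes pmsg in the asm is the label on the string bytes themselves, and the two function names are made up for the example.

```c
static const char pmsg[] = "Please Enter a message: ";

/* Roughly what `lea rdx, pmsg` does: produce the address of the data. */
const char *address_of_message(void)
{
    return &pmsg[0];
}

/* Roughly what a `mov` from pmsg would do: read the data stored there. */
char first_byte_of_message(void)
{
    return pmsg[0];
}
```

The first function hands back a pointer that something like WriteConsole can read from; the second hands back the character 'P' and no longer knows where it came from.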

Compare with disassembled C

I wanted to see what the assembly would look like if I coded up the same functionality in C.

Here is my translation of the asm demo into C.

#include <windows.h>

char *pmsg = "Please Enter a message: ";
char inputMsg[20];
DWORD numRead;
HANDLE output;
HANDLE input;

int main()
{
    output = GetStdHandle(-11);
    input = GetStdHandle(-10);

    WriteConsole(output, pmsg, 25, NULL, NULL);
    ReadConsole(input, &inputMsg, sizeof(inputMsg) - 1, &numRead, NULL);
    WriteConsole(output, inputMsg, sizeof(inputMsg), NULL, NULL);
    return 0;
}

In Visual Studio you can place a breakpoint, run, then select Debug -> Windows -> Disassembly to see the assembly code, but I wanted to know how to do it from the command line.

At first I found dumpbin with an option of /disasm, but that generated a tremendous amount of output and didn’t seem to be what I was looking for.

Finally, I found the MSVC compiler flag /FA. Be careful with this one though. I used the following command line

cl -Zi -FA hello_world.c

and it generated hello_world.asm…overwriting my already existing hello_world.asm that I had written while following along the book!

The /FA option “…generates an assembler listing file for each translation unit in the compilation…”. Adding the extra s (so /FAs) puts the source alongside the assembly (in the output file, you see each source line followed by its assembly). So my final command line was,

cl -Zi -FAs hello_world.c

Which writes out the following in hello_world.asm,

PUBLIC pmsg
_DATA SEGMENT
COMM inputMsg:BYTE:014H
COMM numRead:DWORD
COMM output:QWORD
COMM input:QWORD
_DATA ENDS
_DATA SEGMENT
pmsg DQ FLAT:$SG95356
$SG95356 DB 'Please Enter a message: ', 00H
_DATA ENDS
PUBLIC main
EXTRN __imp_GetStdHandle:PROC
EXTRN __imp_ReadConsoleA:PROC
EXTRN __imp_WriteConsoleA:PROC
pdata SEGMENT
$pdata$main DD imagerel $LN3
    DD imagerel $LN3+165
    DD imagerel $unwind$main
pdata ENDS
xdata SEGMENT
$unwind$main DD 010401H
    DD 06204H
xdata ENDS
; Function compile flags: /Odtp
; File C:\Users\matt\Documents\SynologyDrive\asm\hello_world.c
_TEXT SEGMENT
main PROC

; 10   : {
$LN3:
    sub rsp, 56     ; 00000038H
; 11   :     output = GetStdHandle(-11);
    mov ecx, -11    ; fffffff5H
    call QWORD PTR __imp_GetStdHandle
    mov QWORD PTR output, rax
; 12   :     input = GetStdHandle(-10);
    mov ecx, -10    ; fffffff6H
    call QWORD PTR __imp_GetStdHandle
    mov QWORD PTR input, rax
; 13   :
; 14   :     WriteConsole(output, pmsg, 25, NULL, NULL);
    mov QWORD PTR [rsp+32], 0
    xor r9d, r9d
    mov r8d, 25
    mov rdx, QWORD PTR pmsg
    mov rcx, QWORD PTR output
    call QWORD PTR __imp_WriteConsoleA
; 15   :     ReadConsole(input, &inputMsg, sizeof(inputMsg) - 1, &numRead, NULL);
    mov QWORD PTR [rsp+32], 0
    lea r9, OFFSET FLAT:numRead
    mov r8d, 19
    lea rdx, OFFSET FLAT:inputMsg
    mov rcx, QWORD PTR input
    call QWORD PTR __imp_ReadConsoleA
; 16   :     WriteConsole(output, inputMsg, sizeof(inputMsg), NULL, NULL);
    mov QWORD PTR [rsp+32], 0
    xor r9d, r9d
    mov r8d, 20
    lea rdx, OFFSET FLAT:inputMsg
    mov rcx, QWORD PTR output
    call QWORD PTR __imp_WriteConsoleA
; 17   :
; 18   :     return 0;
    xor eax, eax
; 19   : }
    add rsp, 56     ; 00000038H
    ret 0
main ENDP
_TEXT ENDS
END

There is quite a bit of stuff in there that I don’t understand…but, it did help me to have a better grasp on the calling convention.

One part of the calling convention is that the first four arguments to a function are passed in RCX, RDX, R8, R9 (well, not entirely accurate as already mentioned). When I was typing out the assembly the first time around, I didn’t quite “get it” that a lot of what was going on is just setting up these registers for a function call.

But for example, this part

; 12   :     input = GetStdHandle(-10);
    mov ecx, -10    ; fffffff6H
    call QWORD PTR __imp_GetStdHandle
    mov QWORD PTR input, rax

is just putting -10 into the lower bits of the RCX register (RCX is the full 64 bit register, ECX is the lower 32 bits of the same register). So it is just prepping the function call. Then, lastly, the result comes back in RAX, which is where integer results are returned from functions, and is stored into input.
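
The relationship between the 32 bit and 64 bit names can be modeled in C with plain integers. This is a sketch with made-up function names, treating a register as a uint64_t. The one x64 rule it relies on is that writing a 32 bit register clears the upper 32 bits of the full register.

```c
#include <stdint.h>

/* Read the low 32 bits of a 64-bit register value (ECX as a view of RCX). */
uint32_t low32(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Model of writing a 32-bit register on x64: the new value lands in the
 * low half and the upper 32 bits of the full register are cleared. */
uint64_t write32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;              /* nothing of the old contents survives */
    return (uint64_t)value;
}
```

So after mov ecx, -10 the full RCX holds 00000000FFFFFFF6h rather than a 64 bit -10, which is fine here because GetStdHandle takes a 32 bit DWORD.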

Similarly,

; 14   :     WriteConsole(output, pmsg, 25, NULL, NULL);
    mov QWORD PTR [rsp+32], 0
    xor r9d, r9d
    mov r8d, 25
    mov rdx, QWORD PTR pmsg
    mov rcx, QWORD PTR output
    call QWORD PTR __imp_WriteConsoleA

The call to WriteConsole takes 5 parameters. The first four go in registers, so the 5th argument, NULL, has to get passed on the stack, just above the 32 bytes reserved for the register arguments. That is what the following line sets up.

mov QWORD PTR [rsp+32], 0
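
Where the +32 comes from can be written as a one line formula. The helper below is hypothetical (the name is invented for this sketch) and describes the caller’s view at the moment of the CALL, with every argument taking one 8 byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset from RSP, at the point of the CALL, where the
 * caller stores argument number `n` (1-based). Only meaningful for n >= 5;
 * arguments 1 to 4 travel in registers and their slots are the shadow space. */
size_t stack_arg_offset(size_t n)
{
    assert(n >= 5);
    return (n - 1) * 8;
}
```

stack_arg_offset(5) is 32, matching [rsp+32] above, and a sixth argument would land at [rsp+40].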

For the next two lines, I had to look up what the d on the end of the register name meant. Turns out it is the same thing as the difference between RCX and ECX. R9 is the full 64 bit register, R9D is the lower 32 bits. XORing the register with itself zeros it out (and writing the 32 bit half also clears the upper half of R9), which is the value for NULL. The value 25 put in R8D is just a value I counted out as the length of the message I wanted to write. This could be calculated or set as a variable instead.

xor r9d, r9d
mov r8d, 25
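
A minimal sketch of calculating the length instead of counting it by hand. Note the assumption: the message is declared as an array here (the name prompt is made up), because sizeof on a char * like the pmsg in my C translation would only give the size of the pointer.

```c
#include <stddef.h>
#include <string.h>

static const char prompt[] = "Please Enter a message: ";

/* Characters in the prompt, not counting the terminating NUL.
 * sizeof works here because prompt is an array, not a pointer. */
size_t prompt_length(void)
{
    return sizeof(prompt) - 1;
}
```

This gives 24, the same as strlen(prompt), so the hand counted 25 also writes the terminating NUL byte to the console.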

Conclusion

When I was reading the book and typing up the demo, the connection between the calling convention and what I was typing did not click in my mind. But doing a deeper dive on the convention along with looking at the assembly generated from C, that really helped my understanding of what is going on. There is still a lot to look into though! Also, discovering the -FA flag took me a while but was super satisfying to find, and I plan on using it quite a bit as I continue working through the rest of the chapters.

References

  1. https://gpfault.net/posts/asm-tut-0.txt.html
  2. https://docs.microsoft.com/en-us/cpp/assembler/masm/masm-for-x64-ml64-exe?view=msvc-160
  3. https://docs.microsoft.com/en-us/cpp/assembler/masm/ml-and-ml64-command-line-reference?view=msvc-160
  4. https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-160
  5. https://stackoverflow.com/questions/148968/windows-batch-files-bat-vs-cmd
  6. https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160
  7. https://www.agner.org/optimize/calling_conventions.pdf
  8. https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture?redirectedfrom=MSDN
  9. https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/annotated-x64-disassembly
  10. https://docs.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-160
  11. https://docs.microsoft.com/en-us/windows/console/getstdhandle
  12. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
  13. https://docs.microsoft.com/en-us/cpp/build/reference/fa-fa-listing-file?view=msvc-160
  14. http://www.songho.ca/misc/alignment/dataalign.html
  15. https://software.intel.com/content/www/us/en/develop/articles/introduction-to-x64-assembly.html#:~:text=Since%20the%2064%2Dbit%20registers,stored%20in%20lower%20memory%20addresses
  16. https://www.gamasutra.com/view/news/171088/x64_ABI_Intro_to_the_Windows_x64_calling_convention.php
  17. https://expobrain.net/2013/06/16/disassembly-c-code-for-fun-part-1/