Debugging C Programs with GDB – Part 3

In my previous GDB post, I went over many common debugger commands while exploring the stack build up and initialization of stack variables for the main function in a simple C program. In this post I’ll use more GDB commands to further inspect the body of this small program.

For Loop Assembly

To dig deeper into the exact operation of my main function, I’ll first look at the block of instructions the compiler built when interpreting and optimizing my C code.

0x0000000000400575 <+15>:	mov    DWORD PTR [rbp-0x4],0x0
0x000000000040057c <+22>:	jmp    0x400596 <main+48>
0x000000000040057e <+24>:	mov    eax,DWORD PTR [rbp-0x4]
0x0000000000400581 <+27>:	mov    esi,eax
0x0000000000400583 <+29>:	mov    edi,0x40064a
0x0000000000400588 <+34>:	mov    eax,0x0
0x000000000040058d <+39>:	call   0x400440 <printf@plt>
0x0000000000400592 <+44>:	add    DWORD PTR [rbp-0x4],0x1
0x0000000000400596 <+48>:	cmp    DWORD PTR [rbp-0x4],0x9
0x000000000040059a <+52>:	jle    0x40057e <main+24>
0x000000000040059c <+54>:	mov    rax,QWORD PTR [rip+0x200a9d]
0x00000000004005a3 <+61>:	mov    rdi,rax
0x00000000004005a6 <+64>:	call   0x400430 <puts@plt>

As seen at the end of the previous post, the instruction at <+15> (15 bytes into main) is setting DWORD PTR [rbp-0x4] to 0, which for this program is the i iterator variable used in the for loop of main. This is the implementation of the initializer i = 0; in my for loop.

The program then unconditionally jumps to <+48>, which compares (cmp) i to 9. This compare instruction changes some CPU flags that are used by conditional jumps such as the jle (jump if less than or equal) instruction that follows. This cmp/jle combo is the implementation of the condition of the for loop: i < 10. This shows us that the compiler decided this was better implemented as i <= 9.

Now I’ll watch the loop a bit more closely. I’ll start off by adding the breakpoint at the start of main with break main, start the program with run, and disas main to verify where the breakpoint was set.

Now I’ll use watch i to ask GDB to notify me of changes to i. When I continue the program, it’ll run until the value of i is modified, which happens during the add instruction at <main+44>. This instruction, add DWORD PTR [rbp-0x4],0x1, is the i++ step of my for loop.

With i incremented, the program checks to see if another iteration of the loop should be run by comparing i to 9. When this comparison happens, various CPU flags are modified in the process. If we check the flags before and after the cmp, we can see what’s getting changed.

The cmp operation subtracts 9 from i to see what the difference between them is, discarding the result but keeping the flags. In this case the operation was 1 - 9, so -8. The flags that jle looks at are the zero flag ZF, which indicates the result is 0 (the numbers compared are equal), and the sign flag SF, which indicates the result was negative. Strictly speaking, jle jumps when ZF is set or when SF differs from the overflow flag OF, but OF stays clear for values this small. With how this program is written, jle will always jump the program to <main+24> until i is 10, at which point cmp leaves both the zero and sign flags clear, so no jump is taken.

Calling Functions

Inside of this loop, not too much is going on. The body of the loop is setting up some parameters to be passed to the printf() function, then calling it.

0x000000000040057e <+24>: mov eax,DWORD PTR [rbp-0x4] 
0x0000000000400581 <+27>: mov esi,eax 
0x0000000000400583 <+29>: mov edi,0x40064a 
0x0000000000400588 <+34>: mov eax,0x0 
0x000000000040058d <+39>: call 0x400440 <printf@plt>

So I’ll set a new breakpoint at *0x40057e to watch more closely what’s going on at that point. I’ll use nexti and i r <registers> to monitor the register changes during these operations.

So generally it looks like eax was used to grab the value of i, then esi was set to it. I’m not sure why the value takes a detour through eax, though extra moves like this are common in code built without optimization. The last mov sets eax to 0, so the value of i only lives in eax long enough to be copied into esi. In the System V x86_64 calling convention, al tells a variadic function such as printf how many vector registers carry arguments, which is none here. There’s also a mov here to set edi to 0x40064a.

I referenced the calling convention for System V x86_64 and see that rdi and rsi are the first and second registers used as parameters in a function call. Matching my C call to printf, edi is my string and esi is the i variable’s value. I can take a look at this string with x similar to how I inspected numbers before, but I’ll use s as the display type indicator.

The last instruction of the loop body is simply the call to the address of the printf function.

After the Loop

After our for loop finally completes, there are just 6 more instructions to run before main is complete.

0x000000000040059c <+54>:	mov    rax,QWORD PTR [rip+0x200a9d] # 0x601040
0x00000000004005a3 <+61>:	mov    rdi,rax
0x00000000004005a6 <+64>:	call   0x400430 <puts@plt>
0x00000000004005ab <+69>:	mov    eax,0x0
0x00000000004005b0 <+74>:	leave  
0x00000000004005b1 <+75>:	ret

I hope you’re used to what mov does by now, though in this case it’s doing another offset with an important register I haven’t yet touched on, rip. This is the instruction pointer, the register that holds the address of the next instruction the CPU should execute.

Here I restarted my debugger and set a breakpoint at the <main+54> instruction. I can confirm that’s the next instruction to run because rip is set to that same address. It’s interesting that the compiler decided to use an offset from a register whose value is always known at this point. GDB even noticed this and tells me the final address will be 0x601040.

Just as before, this is pulling up a string (character pointer) that’ll be used in the puts function call. This pointer has a much different address because of how I defined it in my program. This one was the global variable done, as opposed to my Iteration %s\n string which was a variable in the function scope of main.

The last mov of the program sets eax to 0. This register is used as the return value of a function call, and I wrote main to return 0. The leave begins restoring the stack to how it was before main was called and ret completes the exit of my main function. Internally libc will hold onto the value main returned, do some cleanup, then make a system call to exit() with that return value to notify the OS that the program has completed.

With all that I hope this provides a helpful introduction to GDB and some x86 assembly. GDB has a ton of features and this only scratches the surface but I wanted to share these basics as I’ll be using them in future posts when I dive more into advanced C topics. If you have any questions or feedback I’d love to hear from you in the comments!

Debugging C Programs with GDB – Part 2

In my previous post I covered a few basics around building a C program for debugging, looking at the code listing and assembly of the program, setting breakpoints and peeking into the registers.

For Part 2, I’ll be using more gdb commands to explore the assembly code that’s involved in building up and initializing the stack frame at the start of a function.

Stacking Up

I want to dig down into every bit of what this program is doing, so I’ll begin by pulling up the disassembly of the program once more.

If I use break main to set a breakpoint at the start of my main function, GDB will be practical and set this breakpoint at the address 0x400575, 15 bytes into the assembled version of main. It’ll do this because the first 5 instructions of the function are setting up the stack for the function, and normally you can trust that the compiler has done a good job handling that for you.

I want to really start at the beginning, so I’m going to instead set my breakpoint with a pointer to the first byte of the function with the command break

The first assembly instruction seen here is push rbp. This is pushing the value stored in the rbp (base pointer) register onto the stack, but where is the stack? The rsp (stack pointer) register tells us where the top of the stack currently is.

These first 3 instructions are building a new call stack frame. Since this program uses libc the program starts by running code to initialize a few things then calls the main function that was created in the source code. The compiler adds these instructions to build out the stack frame for the given function.

The stack when printf is being called from main

As mentioned before, the rsp register points to the top of the stack. By running nexti (next instruction), I can execute one machine operation at a time. After letting push rbp run, the current value of rbp is added to the top of the stack. While this happens, the stack grows and rsp will be updated to the new top of the stack.

The value of rbp hasn’t changed, but the value of rsp is 8 bytes smaller. In most CPU architectures the stack grows “down” like this, where the lowest call frame represents the current frame.

Peeking Into Memory

We should now be able to see the rbp value at the top of the frame. By using the x command to print data in memory, we can take a look at the frame. x has a few useful options, while you’re learning GDB they can be tricky to remember so I recommend keeping a cheat sheet handy until you’ve got it down.

To look at the value at the top of the stack, I’ll use x/1xg 0x7fffffffda10. This is asking to see 1 unit of data, in hexadecimal format, considering 64-bit “giant” words. You can choose instead to dereference the rsp register directly in the command by using $rsp for the address.

The value of rbp is indeed at the top of the stack

With the commands covered so far we can more easily inspect what each instruction is doing. The next instruction is mov rbp,rsp, which moves the value of rsp into rbp. This sets the base pointer to the current top of the stack, where the previous rbp was just saved, and that is what the end of the function uses to restore the previous stack state.

Now rbp and rsp are set to the same value

The next instruction, sub rsp,0x20, lowers the value of rsp by 32 (4 64-bit words). This is the size of the frame being built for main. Using x again, I’ll look at the 4 words in the stack, plus the next word after (the rbp value that was pushed to the stack).

There’s already data in this stack! This is data left over from previous execution and is considered uninitialized, as this function doesn’t know what data has been left behind here. When gcc warns that 'i' is used uninitialized in this function [-Wuninitialized], it is because the program is using a variable that’s not in a known state.

Pointers IRL

The next 3 instructions are going to be initializing the new stack frame region using pointers. If you’ve been confused about pointers in C, pointers in assembly might help demystify that concept for you.  Let’s consider the next 3 instructions:

=> 0x000000000040056e <+8>:	mov    DWORD PTR [rbp-0x14],edi
   0x0000000000400571 <+11>:	mov    QWORD PTR [rbp-0x20],rsi
   0x0000000000400575 <+15>:	mov    DWORD PTR [rbp-0x4],0x0

These are assignment operators that refer to addresses relative to a memory address that is stored in a register. In this case they are modifying the data in the new stack frame (addresses below rbp). Sometime before main was called, the edi and rsi registers were set to some values that the compiler wants kept around.

The first instruction here stores a 32-bit value (a double word) at the memory address 20 bytes below rbp. I’ll need to give x a w parameter so that it’ll know I’m now looking for a 32-bit word.

Basically the same thing is going on in the next instruction, but it sets the memory address 32 bytes below the base pointer to the 64-bit value currently in rsi.

These registers (*di, *si) are index registers that can be used for various string operations. They are also part of the x86_64 calling convention to be used when calling functions. In this case, these registers contain the parameters for main, my argc and argv variables.

We can use the print command to look at the variables as they were named in the source code. The & operator can be used similar to how it is used in C to verify that these variables are stored at those locations in the stack.

The next instruction, mov DWORD PTR [rbp-0x4],0x0, writes to the i variable used within the for loop inside of main, initializing it to the value we specified, 0.

At this point the stack for main is initialized and main is ready to do its thing. In the next post I’ll continue the GDB exploration to investigate the rest of what this function is up to!

Running Nomad on ppc64le

Recently I’ve been doing some experimentation with Nomad, a tool that helps manage applications running on a cluster of machines. I first ran through their getting started guide, but wanted to continue my education by deploying it on some of my Barreleye systems to see if it could be used for benchmarking and other lab workloads.

I quickly found that there’s no official support for ppc architectures, but since this isn’t a production environment I wasn’t going to let that stop me.

Note on PPC64 vs PPC64LE

Before POWER8, the POWER architecture was exclusively big-endian. This became a barrier for users as many projects were not designed to consider the endianness of the processor and some applications and libraries would not run properly. There’s an assumption the system is little-endian (since x86 dominates the server and desktop market) and this can be fairly time consuming to address.

IBM responded to customer feedback about this, and introduced bi-endian support in POWER8 to allow the user to decide if they want their OS and applications to use little or big endianness. Since the release of POWER8, the Linux community has mostly moved over to ppc64le, as bugs related to processor endianness are no longer a factor.

For the most part, building code for ppc64le is typically identical to building code for x86_64!

But First, Go!

As Nomad does not officially support ppc, I’ll need to build it from source. The README clearly states that we’ll need Go version 1.9 or newer.

The system I’m working on is kicked with Ubuntu 16.04.4 LTS, and if you bring in the golang-go package it will install Go 1.6. Alternatively, there is a golang-1.9 package that’ll slap 1.9.2 on your system.

In my case though I’m going to manually install the latest stable version, 1.10, since I know there’s been good work going on to improve the Go assembler and performance for PowerPC. Besides, manually installing Go is easy as pie.

To install Go I’ll download the latest prebuilt archive for ppc64le, extract it to /usr/local/ and extend the PATH environment variable for all users by editing my /etc/profile file. Then I’ll create a go directory in my home directory and update my own .bashrc file to set my GOPATH variable to its path, while also appending $GOPATH/bin to my user’s PATH variable.

Off the Beaten Path

Next I’ll follow the Installing Nomad documentation on installing via source to see how far I get.

So the only hangup here is there’s no build target in the Makefile for ppc64le. The amd64 target is a pretty close guess to it, so I’ll duplicate that target and make some minor modifications to it.

There it is! I’m still a bit of a noob with Nomad and I’ve only run very basic workloads with it, but everything is working as expected so far. I look forward to doing a bit more tinkering with it in the near future.