Debugging C Programs with GDB – Part 1

When you write C code, you’re playing with power! You’re bound to let this power go to your head and shoot yourself in the foot here and there. At some point(s) your program is going to do something that just doesn’t quite make sense.

The bad news is that your program doesn’t make any sense because you’ve written flaws into it. That’s fine, you’ve either written janky C programs, or not written any C. The good news is that GDB is here to help us learn from our mistakes!

Through the next few posts I’ll share some tips on basic GDB usage, explore a bit of history and dig more into how the C programs on my machine are actually working.

Building for Debugging

To kick things off, I’m going to just slap together a quick C program and a Makefile to assist in building it and running my debugger.

// test.c
#include <stdio.h>

char *done = "Done!";

int main(int argc, char *argv[]) {
  int i;

  for (i = 0; i < 10; i++) {
    printf("Iteration %d\n", i);
  }
  printf("%s\n", done);

  return 0;
}

This program has a simple for loop and a few print statements and I’ll use GDB to inspect what it’s doing a bit more. To provide more information to the debugger about this program I’ll use the -g flag when building it.

# Makefile
CC=gcc -g -o $@ -Wall $<

all: test

test: test.c
  $(CC)

debug: test
  gdb -q ./test

For maximum laziness, I added a debug target to my Makefile here so that I can use make debug to jump right in. I gave gdb the -q option to quiet it down, since it normally has a lot to say on startup.

That’s about all I need to get my program ready for debugging!

Basic Commands

Now we get to the hard part. GDB has a bajillion features so getting started can be daunting. Probably one of the best commands to learn first is the run command. So far GDB has loaded the program and had a look at it, but it isn’t actually running yet.

You can also provide arguments to the program by providing arguments to run. This program doesn’t care about arguments, but don’t let that stop you from giving it some anyway!

The excitement of just running a program in GDB is very short lived, I want to be able to stop the program somewhere and poke around a bit. The list command can spit out a listing of the program.

Initially gdb will show the first 10 lines of the source. You could run list again to see the next 10 lines but GDB has a friendly feature where hitting enter will automatically rerun your last command, so I used that to continue reading the full source.

Looking at this listing, I think a good place to pause and look around would be at the printf() call within my for loop. To have GDB stop here I’ll use the break command and I’ll give it the argument 10 to indicate I’d like to set a breakpoint at line 10.

Now when I give it a run, it’ll stop the program when it hits that line.

To resume the program, until the next breakpoint is hit, you can use the continue command. Another little time-saver trick with gdb is that many commands have shortcuts, such as c for continue.

Peeking Into The Code

The ability to set breakpoints and resume execution is a good start, but even better is getting a look around at this point in time to glean more about what the program is doing. It’s time to start looking beyond the C code and see what the program is actually doing in assembly, the state of the CPU in the context of our program and what’s going on in memory.

First let’s look at the assembly version of the main function. I’ll use the disassemble command for that, and I’ll tell it that main is what I’m interested in disassembling.

Oh noes! Assembly!

Assembly code gets a bad rap, but it’s not as bad as people think it is. You might not want to write a large application in assembly, and that’s reasonable, but if you want to be a strong C programmer you need to know enough assembly to figure out what your program is up to.

x86_64 assembly has two different syntaxes to choose from, AT&T syntax and Intel syntax. They both work just fine but GDB defaults to AT&T syntax and I prefer the Intel syntax so I’ll use the command set disassembly-flavor intel to get it to my liking.

That looks better! Now let’s briefly look at a few things. Looks like my main function is 21 instructions long, alright… a smidge more than half of the operations are mov (move) instructions and I see a few branching operations, jmp (jump), call (call a subroutine), jle (jump if less than or equal to) and ret (return from subroutine).

One thing I find interesting is the instruction at offset <+64>, call 0x400430 <puts@plt>. I did not use the puts() function in my code! The compiler caught on that my last printf() call, with its "%s\n" format string, does nothing more than print a string followed by a newline. That is exactly what puts() does, so it swapped in the cheaper call.

Let’s get back to inspecting what this program is up to, I’m currently still in the middle of my paused program, and I’m at the very start of one of my loop iterations. In this disassembly output I can see I’m at offset <+24>, as indicated by the little => arrow, this is the next instruction the program will run.

The mov instruction moves a value from one place to another, similar to the assignment operator  = in most programming languages. In this case the full instruction is mov eax,DWORD PTR [rbp-0x4] which is basically eax = DWORD PTR [rbp - 0x4]. Ignoring the right side of that for now, we’re assigning a value to something called eax. This eax thing is a CPU register, which is basically a variable in the hardware of the CPU. We can look at all the registers with the info command by saying info registers.

Okay so there are a bunch of registers, and eax is not one of them… GREAT! This is because the x86 architecture has been through a lot. Way back in the day (early 70s) Intel released their 8008 CPU that had some 8-bit registers with names like A (for Accumulator).

When Intel got to the 8086 in the late 70s they made the A register twice the size (16 bits) and started calling it the AX register. To help with software compatibility with older systems, the AX register could also be used as two 8-bit registers, with AH representing the higher 8 bits and AL the lower 8 bits.

Then the mid-80s showed up and Intel was like MOAR BITS and released their 80386 that had 32-bit registers, now they refer to the A register as EAX (there’s our guy!), again preserving backward compatibility by allowing the 16 and 8 bit registers to remain the same. Now-a-days our 64-bit processors are king, so we have the 64-bit register RAX, but can still use EAX, AX, AH, and AL.

All that history lesson to give full context on why mov eax, <stuff>  is going to modify our rax register!

Now, to run just that one instruction, I’ll use the nexti command. I’ll then check the registers again with the shorthand version of info registers and just look at the eax register: i r eax

If I continue my program, I’ll notice that this number correlates with something in my program.

The eax register is getting set to the i value I’m setting during my for loop!

In the next post I’ll continue digging into this program and discover more about the disassembled version of my C program and show off some more GDB commands along the way!

C Strings and Standard Input

Many C tutorials out there will show you some bad ways to do things. I’ll pick on this input and output tutorial as an example.

It has what may appear to be a pretty reasonable way to read input with the gets() function, which was deprecated and then removed from the language entirely in C11.

#include <stdio.h>
int main( ) {

   char str[100];

   printf( "Enter a value :");
   gets( str );

   printf( "\nYou entered: ");
   puts( str );

   return 0;
}

Or via scanf() like this

#include <stdio.h>
int main( ) {

   char str[100];
   int i;

   printf( "Enter a value :");
   scanf("%s %d", str, &i);

   printf( "\nYou entered: %s %d ", str, i);

   return 0;
}

In either case, the program wants you to enter a string. The buffer can only fit 100 ASCII characters, though the input should really be at most 99 so that the string can end with a 0 byte and be properly NUL-terminated. Neither gets() nor a bare %s checks the size of the buffer, so when I give it 120 a’s, my system is reasonably displeased as I clobber other parts of my stack.

$ ./badhabits 
Enter a value :aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1

*** stack smashing detected ***: ./badhabits terminated
You entered: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 Aborted (core dumped)

One option would be to use a better format string. Referring to something like the GNU C Library Manual, we can see that the scanf() function has a few other tricks up its sleeve that can help us.

If we really wanted a 99 character limit on this string, we could change the format string to "%99s %d".

$ ./badhabits 
Enter a value :aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

You entered: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 0

In this case scanf() will truncate the string so that it fits into the size of the buffer. If the library is POSIX compliant, the m modifier can also be used to ask scanf() to dynamically allocate your string with malloc() and give you a pointer to that newly allocated memory space that now holds the input string nicely.

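#include <stdio.h>
#include <stdlib.h>

int main() {
  char *name = NULL;

  printf("Enter your name: ");
  /* %ms makes scanf() malloc() a buffer big enough for the input */
  if (scanf("%ms", &name) != 1) {
    return 1;
  }

  printf("Hello %s!\n", name);
  free(name); /* the caller owns the buffer */

  return 0;
}

$ ./betterscanf 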
Enter your name: test
Hello test!

Beyond Scanf

Personally, I’m not a big fan of scanf() in general. When hunting for other options, I first will peruse the GNU C Library’s manual. In section 12.9 I find an approach that fits a common need I have, reading one line at a time with getline().

The getline() function offers a few things that I like. It expects to work with dynamically allocated buffers and will allocate or reallocate them to size for you.

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

Using it is pretty simple. It takes a pointer to a char pointer (lineptr) along with a pointer to a size_t number (n). It’ll read a line from the stream and return a ssize_t (signed size) value: the number of bytes read, or -1 on failure or end of file.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main() {
  char *string = NULL;
  size_t buffer_size = 0;
  ssize_t read_size;

  printf("Enter some stuff!\n");
  read_size = getline(&string, &buffer_size, stdin);

  /* ssize_t prints with %zd, size_t with %zu */
  printf("Read %zd bytes, buffer is %zu bytes\n", read_size, buffer_size);
  if (read_size != -1) {
    printf("Line read:\n%s", string);
  }

  free(string); /* getline() allocated this for us */
  return 0;
}

You need to make sure to set the string to NULL if it’s not already dynamically allocated, or you’ll be passing getline() whatever just happened to be lying around on the stack.

Also worth noting, for the environment I’m building this within I needed to place the preprocessor directive #define _GNU_SOURCE prior to including stdio.h to properly pull in the getline() functionality without angering the compiler.

$ ./getline 
Enter some stuff!
weeeeeeeeeeee
Read 14 bytes, buffer is 120 bytes
Line read:
weeeeeeeeeeee

In my run here, the buffer_size is getting set to a larger size than the string, whose length I got back from the getline() call. There is some consideration here on the part of the library that it is more efficient to give a longer buffer since it is likely to be added to later on and resizing buffers can be slow.

Fancier Getline

I like the idea of making a function that abstracts this a bit so it’s a bit friendlier to use. I can reuse portions of the FancyString type I built in a previous post to build some functions that will let me dynamically read a line in a single step.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

typedef struct {
  ssize_t length;
  char *string;
  size_t buffer_size;
} FancyString;


void FancyString_free(FancyString *target) {
  if (target->string) {
    free(target->string);
  }
  free(target);
}


FancyString* fancy_getline(FILE *stream) {
  FancyString *new = malloc(sizeof(*new));
  if (new == NULL) {
    return NULL;
  }
  new->string = NULL;
  new->buffer_size = 0;

  new->length = getline(&(new->string), &(new->buffer_size), stream);
  if (new->length == -1) {
    /* getline() may allocate a buffer even when it fails */
    FancyString_free(new);
    return NULL;
  } else {
    return new;
  }
}


int main() {
  FancyString *line = fancy_getline(stdin);
  if (line == NULL) {
    return 1;
  }

  printf("Read %zd bytes, buffer is %zu bytes\n",
         line->length,
         line->buffer_size);
  printf("Line read:\n%s", line->string);

  FancyString_free(line);

  return 0;
}

$ ./fancy_getline 
this is a test of the fanciness
Read 32 bytes, buffer is 120 bytes
Line read:
this is a test of the fanciness

I find this a bit more convenient for managing lines of input. This could even be extended to include a function that operates like readlines() in Python. To do that I’ll modify my FancyString to support usage as a linked list. I’ll share more about linked lists and other data structure patterns in a future post.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

typedef struct _fancystring {
  ssize_t length;
  char *string;
  size_t buffer_size;
  struct _fancystring *next;
} FancyString;


void FancyString_free(FancyString *target) {
  if (target->string) {
    free(target->string);
  }
  free(target);
}


FancyString* fancy_getline(FILE *stream) {
  FancyString *new = malloc(sizeof(*new));
  if (new == NULL) {
    return NULL;
  }
  new->string = NULL;
  new->buffer_size = 0;
  new->next = NULL;

  new->length = getline(&(new->string), &(new->buffer_size), stream);
  if (new->length == -1) {
    /* getline() may allocate a buffer even when it fails */
    FancyString_free(new);
    return NULL;
  } else {
    return new;
  }
}

FancyString* fancy_readlines(FILE *stream) {
  FancyString *first = NULL;
  FancyString *last = NULL;
  FancyString *i = NULL;

  while ((i = fancy_getline(stream)) != NULL) {
    if (first == NULL) {
      first = i;
      last = i;
    } else {
      last->next = i;
      last = i;
    }
  }

  return first;
}

int main() {
  printf("Enter many lines, end with CTRL+D\n");

  FancyString *line = fancy_readlines(stdin);
  FancyString *previous_line;

  int i = 1;
  while (line != NULL) {
    printf("Line %d: %s", i, line->string);
    i++;
    previous_line = line;
    line = line->next;
    FancyString_free(previous_line);
  }

  return 0;
}

$ ./readlines
Enter many lines, end with CTRL+D
this is a line
and this!
and moar
and moooooaoOOAOOAOARRRRR
Line 1: this is a line
Line 2: and this!
Line 3: and moar
Line 4: and moooooaoOOAOOAOARRRRR

Command Line Arguments in C

Today I’m going to share some tips and tricks for using command line arguments with C programs. First I’ll explore the plain ol’ argc/argv style, followed by a getopt approach.

Let’s jump right to it!

Classic Approach

Most C programmers will be quite familiar with this approach, so I’ll keep it brief. The most common function signature for main is int main(int argc, char *argv[]). In this setup argc will tell you the number of arguments passed to the program at launch, with argv getting a list of string pointers to those arguments.

With a small test program, we can inspect these variables pretty easily. I’ll even look at where the argv pointer is and where the string pointers within it are pointing to, because why not!

My classic.c source:

#include <stdio.h>

int main (int argc, char *argv[]) {
  int i;
  printf("Argument Count: %d\n", argc);
  printf("argv is at %p\n", argv);

  for (i = 0; i <= argc; i++) {
    printf("%d at %p: %s\n", i, argv[i], argv[i]);
  }

  return 0;
}

Testing it:

$ ./classic well hello there
Argument Count: 4
argv is at 0x7ffc923396b8
0 at 0x7ffc92339f4d: ./classic
1 at 0x7ffc92339f57: well
2 at 0x7ffc92339f5c: hello
3 at 0x7ffc92339f62: there
4 at (nil): (null)

The first argument is the name of the program as it was executed, and the remaining argc - 1 are the command line arguments to that program. They are most often accessed using an index operator [i] as was done here, though i < argc; is usually used in the for loop condition to skip the null element that terminates the array.

Since this is a null terminated array, you could also use pointer arithmetic to process through the list and not use argc at all.

#include <stdio.h>

int main (int argc, char *argv[]) {

  char **arg = argv;
  while (*arg) {
    printf("%s\n", *arg);
    arg++;
  }

  return 0;
}

$ ./classic do it live
./classic
do
it
live

How you interpret the command line arguments from there is up to you, though some patterns are very common and there are libraries to assist in implementing them.

Getopt Basics

The getopt() function is part of any Standard C Library implementation that follows a POSIX standard. Many libraries will also follow the GNU standard for getopt_long().

The general idea is that options use a format starting with - followed by a single letter to indicate something about what the user wants the program to do. As an example, many programs on Linux will have a -v option you can use to see more verbose console output, or a -h option to get help on using the program.

The getopt implementation has a few variables that represent the current internal state of its parsing system.

extern int optind, opterr, optopt;

The optind variable is the index value of the next argument that should be handled by the getopt() function. opterr will let you control if the getopt() function should print errors to the console. If the getopt() call returns ? because it did not recognize the option being given, optopt will be set to the character it did not recognize.

#include <stdio.h>
#include <unistd.h>


void print_getopt_state(void) { 
  printf("optind: %d\t" "opterr: %d\t" "optopt: %c (%d)\n" ,
    optind, opterr, optopt, optopt
  );
}


int main (int argc, char *argv[]) {
  print_getopt_state();
  return 0;
}

$ ./getopt 
optind: 1 opterr: 1 optopt: ? (63)

To use the getopt() function you provide the argc and argv variables along with an optstring variable that contains the list of options it should look for.

#include <stdio.h>
#include <unistd.h>


void print_getopt_state(void) {
  printf("optind: %d\t" "opterr: %d\t" "optopt: %c (%d)\n" ,
    optind, opterr, optopt, optopt
  );
}


int main (int argc, char *argv[]) {
  int character;
  char *options = "v";

  print_getopt_state();

  character = getopt(argc, argv, options);

  printf("getopt returned: '%c' (%d)\n", character, character);
  print_getopt_state();

  return 0;
}

$ ./getopt 
optind: 1 opterr: 1 optopt: ? (63)
getopt returned: '�' (-1)
optind: 1 opterr: 1 optopt: (0)

$ ./getopt -v
optind: 1 opterr: 1 optopt: ? (63)
getopt returned: 'v' (118)
optind: 2 opterr: 1 optopt: (0)

$ ./getopt -h
optind: 1 opterr: 1 optopt: ? (63)
./getopt: invalid option -- 'h'
getopt returned: '?' (63)
optind: 2 opterr: 1 optopt: h (104)

On each run of getopt(), until it reaches the end of the argument list and returns -1, it will check the next argument and return the option found or ? if an unrecognized option was given.

#include <stdio.h>
#include <unistd.h>


void print_getopt_state(void) {
  printf("optind: %d\t" "opterr: %d\t" "optopt: %c (%d)\n" ,
    optind, opterr, optopt, optopt
  );
}


int main (int argc, char *argv[]) {
  int character;
  char *options = "abcd";

  print_getopt_state();

  character = getopt(argc, argv, options);

  while(character != -1) {
    printf("getopt returned: '%c' (%d)\n", character, character);
    print_getopt_state();
  
    character = getopt(argc, argv, options);
  }

  printf("getopt returned: '%c' (%d)\n", character, character);
  print_getopt_state();

  return 0;
}

$ ./getopt -d -a -b
optind: 1 opterr: 1 optopt: ? (63)
getopt returned: 'd' (100)
optind: 2 opterr: 1 optopt: (0)
getopt returned: 'a' (97)
optind: 3 opterr: 1 optopt: (0)
getopt returned: 'b' (98)
optind: 4 opterr: 1 optopt: (0)
getopt returned: '�' (-1)
optind: 4 opterr: 1 optopt: (0)

You can also include multiple options in a single argument by just not separating them. Multiple instances of the same option will be iterated on multiple times.

$ ./getopt -baddd
optind: 1 opterr: 1 optopt: ? (63)
getopt returned: 'b' (98)
optind: 1 opterr: 1 optopt: (0)
getopt returned: 'a' (97)
optind: 1 opterr: 1 optopt: (0)
getopt returned: 'd' (100)
optind: 1 opterr: 1 optopt: (0)
getopt returned: 'd' (100)
optind: 1 opterr: 1 optopt: (0)
getopt returned: 'd' (100)
optind: 2 opterr: 1 optopt: (0)
getopt returned: '�' (-1)
optind: 2 opterr: 1 optopt: (0)

Optional and Positional Arguments

A colon after an option in the optstring indicates that the option requires an argument, while two colons (a GNU extension) indicate that it accepts an argument but doesn’t require one. An optional argument has to be attached directly to the option, as in -dwith. In either case, if an argument is given to an option that supports it, getopt() will point optarg at that argument.

#include <stdio.h>
#include <unistd.h>


void print_getopt_state(void) {
  printf("optind: %d\t" "opterr: %d\t" "optopt: %c (%d)\t" "optarg: %s\n" ,
    optind, opterr, optopt, optopt, optarg
  );
}


int main (int argc, char *argv[]) {
  int character;
  char *options = "abc:d::";

  print_getopt_state();

  character = getopt(argc, argv, options);

  while(character != -1) {
    printf("getopt returned: '%c' (%d)\n", character, character);
    print_getopt_state();
  
    character = getopt(argc, argv, options);
  }

  printf("getopt returned: '%c' (%d)\n", character, character);
  print_getopt_state();

  return 0;
}

$ ./getopt -dwith -dwithout -c
optind: 1 opterr: 1 optopt: ? (63) optarg: (null)
getopt returned: 'd' (100)
optind: 2 opterr: 1 optopt: (0) optarg: with
getopt returned: 'd' (100)
optind: 3 opterr: 1 optopt: (0) optarg: without
./getopt: option requires an argument -- 'c'
getopt returned: '?' (63)
optind: 4 opterr: 1 optopt: c (99) optarg: (null)
getopt returned: '�' (-1)
optind: 4 opterr: 1 optopt: c (99) optarg: (null)

If optstring begins with -, non-option positional arguments can also be handled. In these cases getopt() will return the value 1 to indicate it has found a positional argument and set the optarg pointer to it.

#include <stdio.h>
#include <unistd.h>


void print_getopt_state(void) {
  printf("optind: %d\t" "opterr: %d\t" "optopt: %c (%d)\t" "optarg: %s\n" ,
    optind, opterr, optopt, optopt, optarg
  );
}


int main (int argc, char *argv[]) {
  int character;
  char *options = "-abc:d::";

  print_getopt_state();

  character = getopt(argc, argv, options);

  while(character != -1) {
    printf("getopt returned: '%c' (%d)\n", character, character);
    print_getopt_state();
  
    character = getopt(argc, argv, options);
  }

  printf("getopt returned: '%c' (%d)\n", character, character);
  print_getopt_state();
  return 0;
}

$ ./getopt well -a -b now
optind: 1 opterr: 1 optopt: ? (63) optarg: (null)
getopt returned: '' (1)
optind: 2 opterr: 1 optopt: (0) optarg: well
getopt returned: 'a' (97)
optind: 3 opterr: 1 optopt: (0) optarg: (null)
getopt returned: 'b' (98)
optind: 4 opterr: 1 optopt: (0) optarg: (null)
getopt returned: '' (1)
optind: 5 opterr: 1 optopt: (0) optarg: now
getopt returned: '�' (-1)
optind: 5 opterr: 1 optopt: (0) optarg: (null)

If optstring begins with a +, getopt() will stop option parsing and return the value -1 at the first non-option argument. The - and + prefixes are only provided by C libraries that implement the GNU extensions.

Long Options

The GNU extensions provide a fancier version of getopt, getopt_long(), that supports longer, more verbose options that begin with --. It has the same first 3 parameters as getopt(). The 4th parameter is an array, longopts, of struct option structures that describe the longer options. The last parameter is an integer pointer, longindex, that on a match will be set to the index of the matched option in the longopts array. longindex may be set to NULL if you don’t plan to use it.

The struct option has this format:

struct option {
  const char *name;
  int         has_arg;
  int        *flag;
  int         val;
};

The name is the option name that is supported. The has_arg field can be set to no_argument, required_argument or optional_argument, which correlate to the values 0, 1, and 2.

If the flag pointer is not NULL, the integer it points to will be set to val on a match and getopt_long() will return 0. If flag is NULL, val will be what getopt_long() returns when the long option is matched. Often a program will have a long option that returns the short option. This is common for an option like --help, where val is set to h so that the code that handles -h can handle either.

Here’s an example that uses a few different ways of handling the long options.

#include <stdio.h>
#include <unistd.h>
#include <getopt.h>


int main (int argc, char *argv[]) {
  int character;
  char *options = "h";
  int longindex;
  int moartest_flag = 0;

  struct option longopts[] = {
    {"help", no_argument, NULL, 'h'},
    {"echo", required_argument, NULL, 0},
    {"longtest", optional_argument, &moartest_flag, 12},
    {NULL, 0, NULL, 0}
  };

  while((character = getopt_long(argc, argv, options, longopts, &longindex)) != -1) {
    printf("getopt_long returned: '%c' (%d)\n", character, character);
    switch (character) {
      case 'h':
        printf("help!\n");
        break;
      case 0:
        printf("longindex: %d\n", longindex);
        printf("longopts[longindex].name = %s\n", longopts[longindex].name);
        printf("optarg: %s\n", optarg);
        printf("moartest_flag: %d\n", moartest_flag);
    }
  }

  return 0;
}

$ ./getopt --help --longtest --echo wee
getopt_long returned: 'h' (104)
help!
getopt_long returned: '' (0)
longindex: 2
longopts[longindex].name = longtest
optarg: (null)
moartest_flag: 12
getopt_long returned: '' (0)
longindex: 1
longopts[longindex].name = echo
optarg: wee
moartest_flag: 12

In this code, -h and --help will match the h case and print my help message. Both --longtest and --echo are handled by the 0 case, and I can tell them apart by the value that longindex gets set to. When --longtest is given, the moartest_flag integer will get set to 12 since I provided the pointer to that integer in the flag field for that option.

That covers what I wanted to share on parsing command line arguments! There are other strategies out there for handling it, but these are the most common and I hope this guide proves helpful. If you have any questions or feedback please drop a note in the comments!