Such Programming

Tinkerings and Ramblings

C Strings and Standard Input

Many C tutorials out there will show you some bad ways to do things. In this case I mean objectively bad, as security issues can arise from naive approaches. A hacker wouldn't exactly mind this, though, so maybe that is subjective. I’ll pick on this input and output tutorial as an example.

It shows what may look like a pretty reasonable way to read input, using the deprecated gets() function (removed from the language entirely in C11).

#include <stdio.h>
int main( ) {

   char str[100];

   printf( "Enter a value :");
   gets( str );

   printf( "\nYou entered: ");
   puts( str );

   return 0;
}

Or via scanf(), like this:

#include <stdio.h>
int main( ) {

   char str[100];
   int i;

   printf( "Enter a value :");
   scanf("%s %d", str, &i);

   printf( "\nYou entered: %s %d ", str, i);

   return 0;
}

In either case, the program wants you to enter a string. The buffer holds 100 bytes, so the string can be at most 99 ASCII characters long, leaving room for the 0 byte that NUL-terminates it. When I give it 120 a's, my system is reasonably displeased as I clobber other parts of my stack.

$ ./badhabits 
Enter a value :aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1

*** stack smashing detected ***: ./badhabits terminated
You entered: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 Aborted (core dumped)
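For comparison, here is a minimal sketch of a bounded read using the standard fgets() function, which this post does not otherwise cover. The helper name read_bounded_line() is made up for illustration, and the sketch assumes the buffer size fits in an int.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the tutorial): store at most size - 1
 * characters of one line in dst and always NUL-terminate it.
 * Returns the stored length, or -1 on end of file or read error.
 * Assumes size fits in an int, since that is what fgets() takes. */
long read_bounded_line(char *dst, size_t size, FILE *stream) {
  assert(dst != NULL && size >= 2 && stream != NULL);

  if (fgets(dst, (int)size, stream) == NULL) {
    return -1;
  }

  size_t len = strlen(dst);
  if (len > 0 && dst[len - 1] == '\n') {
    dst[--len] = '\0';  /* drop the newline that fgets() keeps */
  }
  return (long)len;
}
```

With a 100 byte buffer, a 120 character line comes back in two pieces instead of overflowing: the first call stores 99 characters and the rest stays in the stream for the next call.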

One option would be to use a better format string. Referring to something like the GNU C Library Manual, we can see that the scanf() function has a few other tricks up its sleeve that can help us.

If we really wanted a 99 character limit on this string, we could change the format string to "%99s %d".

$ ./badhabits 
Enter a value :aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

You entered: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 0
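A width limit can leave part of the input behind for the next conversion, so it helps to check how many conversions scanf() reports. Here is a rough sketch using fscanf() so it works on any stream; the helper name read_word_and_number() and the WORD_SIZE macro are my own inventions for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_SIZE 100  /* same buffer size as the examples above */

/* Hypothetical helper: read one whitespace-delimited word (at most 99
 * characters) followed by an integer.
 * Returns 1 only if both conversions succeeded, otherwise 0. */
int read_word_and_number(FILE *stream, char word[WORD_SIZE], int *number) {
  assert(stream != NULL && word != NULL && number != NULL);

  /* The width has to be spelled out in the format string: 99 is one
   * less than WORD_SIZE, leaving room for the terminating NUL byte. */
  int converted = fscanf(stream, "%99s %d", word, number);
  return converted == 2;
}
```

Called as read_word_and_number(stdin, str, &i), a return of 0 means at least one of the two values was not filled in and should not be printed.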

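In this case scanf() stops reading the string after 99 characters so that it fits into the buffer. The leftover a's stay in the input stream, which is why %d fails to match and i is never assigned; the 0 printed above is just whatever happened to be in that variable. If the library supports POSIX.1-2008, the m modifier can also be used to ask scanf() to dynamically allocate your string with malloc() and give you a pointer to the newly allocated memory that holds the input string. You are then responsible for calling free() on it.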

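#include <stdio.h>
#include <stdlib.h>

int main() {
  char *name = NULL;

  printf("Enter your name: ");
  /* %ms makes scanf() malloc() a buffer big enough for the word */
  if (scanf("%ms", &name) != 1) {
    return 1;  /* nothing was read, so nothing was allocated */
  }

  printf("Hello %s!\n", name);

  free(name);  /* the caller owns the buffer */
  return 0;
}

$ ./betterscanf 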
Enter your name: test
Hello test!

Beyond Scanf

Personally, I’m not a big fan of scanf() in general. When hunting for other options, I start by perusing the GNU C Library’s manual. In section 12.9 I find an approach that fits a common need of mine: reading one line at a time with getline().

The getline() function offers a few things that I like. It expects to work with dynamically allocated buffers and will allocate or reallocate them to size for you.

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

Using it is pretty simple: it takes a pointer to a char pointer (lineptr) along with a pointer to a size_t (n) that holds the buffer's size. It’ll read a line from the stream (a FILE pointer, not a file descriptor) and return a ssize_t (signed size) value: the number of bytes read, including the newline but not the terminating NUL byte, or -1 on failure or end of file.

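#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main() {
  char *string = NULL;
  size_t buffer_size = 0;
  ssize_t read_size;

  printf("Enter some stuff!\n");
  read_size = getline(&string, &buffer_size, stdin);
  if (read_size == -1) {
    free(string);  /* getline() may allocate even when it fails */
    return 1;
  }

  /* %zd is for the signed ssize_t, %zu for the unsigned size_t */
  printf("Read %zd bytes, buffer is %zu bytes\n", read_size, buffer_size);
  printf("Line read:\n%s", string);

  free(string);
  return 0;
}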

You need to make sure to set the string to NULL (and the size to 0) if it’s not already dynamically allocated, or you’ll be passing getline() whatever just happened to be lying around on the stack.

Also worth noting: in the environment I’m building this in, I needed to place the preprocessor directive #define _GNU_SOURCE before including stdio.h to properly pull in the getline() functionality without angering the compiler.

$ ./getline 
Enter some stuff!
weeeeeeeeeeee
Read 14 bytes, buffer is 120 bytes
Line read:
weeeeeeeeeeee

In my run here, buffer_size is set to a larger value than the length of the string that I got back from the getline() call. The library is presumably betting that a longer buffer is more efficient, since it is likely to be reused for later lines and resizing buffers can be slow.
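That reuse is easy to see in a loop. The sketch below hands the same buffer back to getline() for every line and frees it once at the end; the helper name count_lines() is made up for illustration, and it assumes an environment where #define _GNU_SOURCE exposes getline(), as in the program above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: count the lines in a stream and report the length
 * of the longest one (excluding its newline), reusing a single buffer. */
size_t count_lines(FILE *stream, size_t *longest) {
  assert(stream != NULL && longest != NULL);

  char *line = NULL;    /* NULL plus capacity 0 asks getline() to allocate */
  size_t capacity = 0;
  ssize_t length;
  size_t count = 0;

  *longest = 0;
  while ((length = getline(&line, &capacity, stream)) != -1) {
    /* the buffer always has room for the line and its NUL terminator */
    assert(capacity > (size_t)length);

    if (length > 0 && line[length - 1] == '\n') {
      line[--length] = '\0';  /* strip the newline that getline() keeps */
    }
    if ((size_t)length > *longest) {
      *longest = (size_t)length;
    }
    count++;
  }

  free(line);  /* one free() covers every reallocation getline() made */
  return count;
}
```

Because getline() only grows the buffer when a line does not fit, most iterations of a loop like this should not need to allocate at all.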

Fancier Getline

I like the idea of a function that abstracts this so it’s a bit friendlier to use. I can reuse portions of the FancyString type I built in a previous post to build some functions that will let me dynamically read a line in a single step.

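#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

typedef struct {
  ssize_t length;
  char *string;
  size_t buffer_size;
} FancyString;


/* Free a FancyString and the line buffer it owns. */
void FancyString_free(FancyString *target) {
  if (target == NULL) {
    return;
  }
  free(target->string);  /* free(NULL) is a no-op */
  free(target);
}


/* Read one line from stream. Returns NULL on end of file, read error,
   or allocation failure. */
FancyString* fancy_getline(FILE *stream) {
  FancyString *new = malloc(sizeof(*new));
  if (new == NULL) {
    return NULL;
  }
  new->string = NULL;
  new->buffer_size = 0;

  new->length = getline(&(new->string), &(new->buffer_size), stream);
  if (new->length == -1) {
    /* getline() may have allocated a buffer even though it failed */
    FancyString_free(new);
    return NULL;
  }
  return new;
}


int main() {
  FancyString *line = fancy_getline(stdin);
  if (line == NULL) {
    return 1;  /* no input to report on */
  }

  /* %zd is for the signed ssize_t, %zu for the unsigned size_t */
  printf("Read %zd bytes, buffer is %zu bytes\n",
         line->length,
         line->buffer_size);
  printf("Line read:\n%s", line->string);

  FancyString_free(line);

  return 0;
}

$ ./fancy_getline 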
this is a test of the fanciness
Read 32 bytes, buffer is 120 bytes
Line read:
this is a test of the fanciness

I find this a bit more convenient for managing lines of input. It could even be extended with a function that operates like readlines() in Python. I’ll modify my FancyString to support use as a linked list; I’ll share more about linked lists and other data structure patterns in a future post.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

typedef struct _fancystring {
  ssize_t length;
  char *string;
  size_t buffer_size;
  struct _fancystring *next;
} FancyString;


/* Free a single node and the line buffer it owns. */
void FancyString_free(FancyString *target) {
  if (target == NULL) {
    return;
  }
  free(target->string);  /* free(NULL) is a no-op */
  free(target);
}


/* Read one line from stream. Returns NULL on end of file, read error,
   or allocation failure. */
FancyString* fancy_getline(FILE *stream) {
  FancyString *new = malloc(sizeof(*new));
  if (new == NULL) {
    return NULL;
  }
  new->string = NULL;
  new->buffer_size = 0;
  new->next = NULL;

  new->length = getline(&(new->string), &(new->buffer_size), stream);
  if (new->length == -1) {
    /* getline() may have allocated a buffer even though it failed */
    FancyString_free(new);
    return NULL;
  }
  return new;
}

/* Read every remaining line into a linked list. Returns the first node,
   or NULL if no lines were read. */
FancyString* fancy_readlines(FILE *stream) {
  FancyString *first = NULL;
  FancyString *last = NULL;
  FancyString *i = NULL;

  while ((i = fancy_getline(stream)) != NULL) {
    if (first == NULL) {
      first = i;
      last = i;
    } else {
      last->next = i;
      last = i;
    }
  }

  return first;
}

int main() {
  printf("Enter many lines, end with CTRL+D\n");

  FancyString *line = fancy_readlines(stdin);
  FancyString *previous_line;

  /* Print each line, freeing its node once we have moved past it. */
  int i = 1;
  while (line != NULL) {
    printf("Line %d: %s", i, line->string);
    i++;
    previous_line = line;
    line = line->next;
    FancyString_free(previous_line);
  }

  return 0;
}

$ ./readlines
Enter many lines, end with CTRL+D
this is a line
and this!
and moar
and moooooaoOOAOOAOARRRRR
Line 1: this is a line
Line 2: and this!
Line 3: and moar
Line 4: and moooooaoOOAOOAOARRRRR
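If the list gets used in more than one place, the linking and cleanup logic can be pulled out into helpers. This is only a sketch: FancyString_append() and FancyString_free_all() are hypothetical names, not functions from the earlier FancyString post, and the struct is repeated here so the snippet stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t */

/* Same layout as the linked-list FancyString above. */
typedef struct _fancystring {
  ssize_t length;
  char *string;
  size_t buffer_size;
  struct _fancystring *next;
} FancyString;

/* Hypothetical helper: link node onto the end of the list, updating the
 * first pointer (when the list is empty) and the last pointer. */
void FancyString_append(FancyString **first, FancyString **last,
                        FancyString *node) {
  assert(first != NULL && last != NULL && node != NULL);

  node->next = NULL;
  if (*first == NULL) {
    *first = node;
  } else {
    (*last)->next = node;
  }
  *last = node;
}

/* Hypothetical helper: free every node and its string.
 * Returns the number of nodes freed. */
size_t FancyString_free_all(FancyString *head) {
  size_t freed = 0;

  while (head != NULL) {
    FancyString *next = head->next;  /* save the link before freeing */
    free(head->string);              /* free(NULL) is a no-op */
    free(head);
    head = next;
    freed++;
  }
  return freed;
}
```

With these, the body of the loop in fancy_readlines() could shrink to a single FancyString_append(&first, &last, i) call, and a caller that only wants to discard the list can hand the first node to FancyString_free_all().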

That'll be it for this post. I hope you learned something or found some of this useful!
