How to Quickly Pick Up Python

This is a quick post covering the basic features of Python, with some pointers to help experienced developers pick up the language quickly.

Basic Project/Code Management

In my opinion Python works best on Linux, though it also works just fine on Windows or OS X. The best practice is to use Python 3 whenever possible, while knowing some things about 2.7 just in case. The biggest difference you’d see right away between 2.7 and 3 is that print() is a function in 3, but a statement in 2.7.

Pip is your best friend. It’s the package manager for Python and is one of the best things about the language. You can install just about anything with pip install <package>. This is often used in conjunction with virtualenv to isolate package environments. A common practice is to create an environment with virtualenv, then install some packages with pip and get started on your project.

Here’s how I start many projects

virtualenv venv creates a virtual environment in a directory named venv (this is what many python devs use as a convention). The environment is entered by sourcing the script in <environment>/bin/activate. When you want to exit it you can use the alias deactivate from within the virtual environment. Within this environment, all packages are installed in this isolated area.

Another convention for a project is to use a requirements.txt file to track libraries. You can get a list of the current environment’s packages with pip freeze. You can redirect the output of this command to a requirements.txt file so that another machine or user can pull down dependencies with pip install -r requirements.txt.

For managing Python code in a git repo, you’ll want to add *.pyc and venv/ to your .gitignore to ignore cached Python bytecode files and your virtual environment.

Learning the Syntax and Dabbling

As you probably already know, Python is interpreted and whitespace is part of the syntax. Rather than using curly brackets to delimit blocks, Python uses levels of indentation. Tabs are permitted but most Python programmers use 4 spaces per indentation level. Python has a culture of appreciation for tidy code, and the pep8 tool can do some basic format nit-picking for you.

IPython is a handy tool for playing with basic syntax and exploring code and libraries.

The basic data types you’ll mess with in Python are int, float, bool, str, object, list and dict. In CPython, nearly everything is represented by a PyObject, and Python has a rich set of OO features that are mostly provided by that underlying object.

Dabble through the official Python Tutorial to practice. It’s a relatively quick but thorough way to learn the basic syntax, control flow, and how to define functions and classes, and it explores the included standard library.

Getting Things Done

Here are some resources for specific topics from the Python Tutorial and Manual:

Beyond the included libraries, there are many other incredibly useful libraries. This is nowhere near a comprehensive list, but these libraries are tried and true:

Hopefully this can serve as a quick reference for those interested in adding Python to their repertoire. If you have any feedback or comments please drop a comment!

Spawning New Linux Processes in C

There are many good reasons to spawn other programs in Linux to do your bidding, but most programming languages don’t give you nearly as much control of the process as in C. In this post I will cover some of the most common ways to create new processes and manage them in C on a Linux system.

Grab a Fork

The classic fork() function is the most popular way to create a new process. It works by duplicating the process that calls it. It may sound complicated but it’s a fairly simple system.

#include <unistd.h>

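pid_t fork(void);

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  pid_t fork_pid = fork();

  if (fork_pid == -1) {
    /* fork() failed, so no child was created */
    perror("fork");
    return 1;
  } else if (fork_pid == 0) {
    /* fork() returns 0 in the child */
    printf("Hello from the child!\n");
  } else {
    /* fork() returns the child's PID in the parent */
    printf("Hello from the parent!\n");
  }

  return 0;
}

$ ./forktest 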
Hello from the parent!
Hello from the child!

When a process calls fork(), Linux will duplicate that current process. The value returned by fork() will be 0 in the child process. In the parent process fork() will return the PID (Process ID) of the new child process or -1 if some error occurs.

Replacing a Running Process

After creating a new process, it’s common to replace that child process with an entirely different program. The exec() family of functions can handle this for us, and execl() is the simplest of them.

#include <unistd.h>
int execl(const char *filename, const char *arg, ...);

All you need to provide is the location of the file to load, and the arguments you’d like to provide to it, ending the list with a NULL pointer. Just like for any normal process, the first argument is the process name.

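#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  /* The second argument becomes argv[0] ($0) of the new program. */
  execl("/bin/bash", "/yes/its/bash", "-c", "echo $0 && uptime", (char *) NULL);

  /* execl() only returns if it failed. */
  perror("execl");
  return 1;
}

$ ./execltest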
/yes/its/bash
10:11:12 up 1:07, 2 users, load average: 0.07, 0.10, 0.18

The other functions in the exec() family have various options to control arguments and environment variables for the new process that takes over.

Playing with the Kids

After you have wee ‘lil child processes, you’ll probably want to make sure they are doing as they are told. After you’ve sent the child process off to do its chores, you can use the wait() function to see what it returns with after it’s done.

#include <sys/wait.h>

pid_t wait(int *status);

When you call wait(), it will block the parent process until any of its child processes changes state. The status pointer can be used if you’re interested in knowing what kind of state change has occurred, for example to determine if the program exited normally or if it was terminated by a signal. The return value of wait() is the PID of the child that changed state.

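#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  pid_t child = fork();

  if (child == -1) {
    perror("fork");
    return 1;
  } else if (child > 0) {
    /* parent: block until the child changes state */
    wait(NULL);
    printf("child process terminated\n");
  } else {
    /* child: replace this process with bash, naming it /THECHILD */
    execl("/bin/bash", "/THECHILD", (char *) NULL);
    perror("execl");  /* only reached if execl() failed */
    return 1;
  }

  return 0;
}

$ ./waitkids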
$ echo $0
/THECHILD
$ exit
exit
child process terminated

Check the man page for wait for more details on the various options available when waiting.

Attack of the Clones

The fork(), exec() and wait() families of functions are portable across POSIX-compliant systems. If more fine-grained control over the process creation is desired, we’ll need to use the Linux-specific clone() function.

I won’t cover clone() in this post, as that’s probably better suited to a dedicated post. Regardless, I suggest perusing the man page for it to get an idea of what capabilities it offers.

That covers the basics of process creation using C on Linux. In the next post I plan on covering some of the methods available to communicate between running processes. If you found this helpful or informative, or have any feedback, please leave a note in the comments!

Debugging C Programs with GDB – Part 3

In my previous GDB post, I went over many common debugger commands while exploring the stack build up and initialization of stack variables for the main function in a simple C program. In this post I’ll use more GDB commands to further inspect the body of this small program.

Quick references:

For Loop Assembly

To dig deeper into the exact operation of my main function, I’ll first look at the block of instructions the compiler built when interpreting and optimizing my C code.

0x0000000000400575 <+15>:	mov    DWORD PTR [rbp-0x4],0x0
0x000000000040057c <+22>:	jmp    0x400596 <main+48>
0x000000000040057e <+24>:	mov    eax,DWORD PTR [rbp-0x4]
0x0000000000400581 <+27>:	mov    esi,eax
0x0000000000400583 <+29>:	mov    edi,0x40064a
0x0000000000400588 <+34>:	mov    eax,0x0
0x000000000040058d <+39>:	call   0x400440 <printf@plt>
0x0000000000400592 <+44>:	add    DWORD PTR [rbp-0x4],0x1
0x0000000000400596 <+48>:	cmp    DWORD PTR [rbp-0x4],0x9
0x000000000040059a <+52>:	jle    0x40057e <main+24>
0x000000000040059c <+54>:	mov    rax,QWORD PTR [rip+0x200a9d]
0x00000000004005a3 <+61>:	mov    rdi,rax
0x00000000004005a6 <+64>:	call   0x400430 <puts@plt>

As seen at the end of the previous post, the instruction at <+15> (15 bytes into main) is setting DWORD PTR [rbp-0x4] to 0, which for this program is the i iterator variable used in the for loop of main. This is the implementation of the initializer i = 0; in my for loop.

The program then unconditionally jumps to <+48>, which compares (cmp) i to 9. This compare instruction will change some CPU flags that are used with conditional jumps such as the jle (jump if less than or equal) instruction that follows. This cmp/jle combo is the implementation of the condition of the for loop: i < 10. This shows us that the compiler decided this was better implemented as i <= 9.

Now I’ll watch the loop a bit more closely. I’ll start off by adding the breakpoint at the start of main with break main, start the program with run, and disas main to verify where the breakpoint was set.

Now I’ll use watch i to ask GDB to notify me of changes to i. When I continue the program, it’ll run until the value of i is modified, which happens during the add instruction at <main+44>. This instruction, add DWORD PTR [rbp-0x4],0x1, is the i++ step of my for loop.

With i incremented, the program checks to see if another iteration of the loop should be run by comparing i to 9. When this comparison happens various CPU flags are modified in the process. If we check the flags before and after the cmp, we can see what’s getting changed.

The cmp operation will subtract 9 from i to see what the difference between them is, discarding the result but keeping the flags. In this case the operation was 1 - 9, so -8. The flags that jle looks at are the zero flag ZF, which indicates the result is 0 (and the numbers compared are equal), and the sign flag SF, which indicates the result was negative. Strictly, jle jumps if ZF is set or if SF differs from the overflow flag OF, but no overflow happens with values this small, so OF stays clear. With how this program is written, jle will always jump the program to <main+24> until i is 10, as cmp will have left both the zero and sign flags clear at that point so no jump will be taken.

Calling Functions

Inside of this loop, not too much is going on. The body of the loop is setting up some parameters to be passed to the printf() function, then calling it.

0x000000000040057e <+24>: mov eax,DWORD PTR [rbp-0x4] 
0x0000000000400581 <+27>: mov esi,eax 
0x0000000000400583 <+29>: mov edi,0x40064a 
0x0000000000400588 <+34>: mov eax,0x0 
0x000000000040058d <+39>: call 0x400440 <printf@plt>

So I’ll set a new breakpoint at *0x40057e to watch more closely what’s going on at that point. I’ll use nexti and i r <registers> to monitor the register changes during these operations.

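So generally it looks like eax was used to grab the value of i, then esi was set to it. I’m not sure why this was done rather than loading esi directly, but the compiler usually knows what it’s doing so I trust it’s for a good reason. The last mov sets eax to 0, so the value of i only lives in eax long enough to be copied in the next instruction. For a variadic function like printf, the System V x86_64 convention uses al to pass the number of vector registers holding arguments, which is zero here. There’s also a mov here to set edi to 0x40064a.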

I referenced the calling convention for System V x86_64 and see that rdi and rsi are the first and second registers used as parameters in a function call. Matching my C call to printf, edi is my string and esi is the value of the i variable. I can take a look at this string with x similar to how I inspected numbers before, but I’ll use s as the display type indicator.

The last instruction of the loop body is simply the call to the address of the printf function.

After the Loop

After our for loop finally completes, there are just 6 more instructions to run before main is complete.

0x000000000040059c <+54>:	mov    rax,QWORD PTR [rip+0x200a9d] # 0x601040
0x00000000004005a3 <+61>:	mov    rdi,rax
0x00000000004005a6 <+64>:	call   0x400430 <puts@plt>
0x00000000004005ab <+69>:	mov    eax,0x0
0x00000000004005b0 <+74>:	leave  
0x00000000004005b1 <+75>:	ret

I hope you’re used to what mov does by now, though in this case it’s using an offset from an important register I haven’t yet touched on, rip. This is the Instruction Pointer, the register that holds the address of the next instruction the CPU should execute.

Here I restarted my debugger and set a breakpoint at the <main+54> instruction. I can confirm that’s the next instruction to run because rip is set to that same address. It’s interesting that the compiler decided to use an offset from a register that’s always going to be the same value at this point. GDB even noticed this and tells me the final address will be 0x601040.

Just as before, this is pulling up a string (character pointer) that’ll be used in the puts function call. This pointer has a much different address because of how I defined it in my program. This one was the global variable done, as opposed to my Iteration %d\n string which was a string literal used directly inside main.

The last mov of the program is setting eax to 0. This register is used as the return value of a function call and I wrote main to return 0. The leave begins restoring the stack to how it was before main was called and ret completes the exit of my main function. Internally libc will hold onto the value main returned, do some cleanup, then make the exit system call with that return value to notify the OS that the program has completed.

With all that I hope this provides a helpful introduction to GDB and some x86 assembly. GDB has a ton of features and this only scratches the surface but I wanted to share these basics as I’ll be using them in future posts when I dive more into advanced C topics. If you have any questions or feedback I’d love to hear from you in the comments!