Tuesday, May 21, 2013

A C++ Thread Pool Implementation Using POSIX Threads

Threads are very useful but potentially tricky constructs in computer programming, and they are generally hard to get right. Building deadlock-free, correct multi-threaded software requires great care.

Threads are also expensive. The processor needs to do a context switch to jump between threads; each context switch means saving the state of the currently executing thread and then loading the thread selected for execution. Creating a thread for every I/O operation or lengthy computation can grind the machine to a halt if the number of requests for I/O and/or computation is too high.

The middle ground between creating a thread for every request and doing everything in one thread is to create a pool of threads and reuse a thread as soon as it is done servicing a request. C++ does not have a built-in thread pool library (it didn't even have threading support prior to the C++11 standard, which came out last year). Java has the Executor interface for this. With C++, some people use the Boost threading library or the Boost Asio library to achieve performance gains in their applications.

In this article we will design a very simple thread pool library using C++ and POSIX threads (also known as pthreads). We will not use any other external libraries like Boost.

The core idea of a thread pool is to have a number of worker threads always ready to accept work from the main process. The process receives requests for work and schedules those requests as tasks for the threads. This pattern resembles the classic producer-consumer problem in computer science. A queue is populated with tasks as they arrive in the process: the request-processing part of the program is the producer and the threads are the consumers. The request processor inserts work items as they arrive, and the threads pick up one item at a time from the queue in First-In-First-Out fashion.

So a thread pool basically consists of three primary actors:

1. A number of threads either waiting for or executing tasks.
2. A number of service requests to the server from clients. Each request is considered a task to the threads.
3. A queue holding the incoming service requests.

*image source: http://www.javamex.com/tutorials/threads/thread_pools.shtml

My C++ implementation of a thread pool:

In an attempt to dig deeper into the world of multi-threaded and asynchronous computation, I developed a very simple thread pool using C++ and pthreads.

The source code for the thread pool with example usage can be found on my github page: https://github.com/bilash/threadpool

Each request is represented by an object of the class Task. The Task class essentially wraps a function pointer which points to the actual function that constitutes the task. Since it's a class, we also provide a functor-based interface to invoke the function pointer stored in the Task class.
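As a rough sketch of the idea (the names here are illustrative and may differ from the actual code in the repository), such a Task class might look like this:

```cpp
#include <cassert>

// A hypothetical Task: wraps a function pointer plus an opaque argument.
class Task {
public:
    Task(void (*fn)(void*), void* arg) : fn_(fn), arg_(arg) {}

    // Functor interface: invoking the object runs the wrapped function.
    void operator()() { fn_(arg_); }

private:
    void (*fn_)(void*);  // the actual function that constitutes the task
    void* arg_;          // argument passed to that function
};
```

A worker thread that pops a Task `t` off the queue can then simply call `t()` to execute it.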

The actual thread pool is managed by the ThreadPool class. It stores a vector of threads, a queue of tasks (instances of the class Task), a method to enqueue incoming tasks, and a method to execute the tasks in a thread. There are also helper methods to initialize and destroy (and clean up) the pool.

Thread synchronization:

When developing a multi-threaded service or application, we almost always need to use locks to prevent data corruption and data races in our program. This essentially means that code accessing data shared by more than one thread - known as a critical section - needs to be protected by some kind of locking mechanism. The most popular of these locking mechanisms is the mutex (for mutual exclusion). A mutex achieves what its name suggests - it allows threads to execute a critical section in a mutually exclusive way!
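As a quick standalone illustration (not code from the repository), here is the canonical mutex example: two threads incrementing a shared counter, with the read-modify-write protected by a pthread mutex so the increments cannot interleave:

```cpp
#include <pthread.h>

// Shared data and the mutex that protects it.
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void* bump(void*) {
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&counter_lock);    // enter the critical section
        ++counter;                            // protected shared data
        pthread_mutex_unlock(&counter_lock);  // leave the critical section
    }
    return 0;
}
```

Without the lock/unlock pair, the final count would almost certainly come out lower than expected, because `++counter` is not atomic.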

In our thread pool program we use a mutex to protect our shared resource - the queue holding the tasks. The task queue is populated by the request-processing part of the program and is read by the threads waiting to pick up tasks. The threads also remove the tasks they pick up, since a task is no longer needed after it is executed. Since the queue is accessed (and modified) by multiple actors in the program, it is protected by a mutex lock.

The other synchronization mechanism used in the program is the condition variable. Condition variables are used to signal threads waiting for a condition to become true. In our program we use one to signal the threads that the queue has been populated with new tasks. The threads wait (put to sleep by the OS) while the task queue is empty, and we wake them up using the condition variable when work arrives.
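Putting the mutex and the condition variable together, a minimal pool along these lines might look like the sketch below. The names and structure are my assumptions for illustration, not the repository's actual code; note in particular the `while` loop around `pthread_cond_wait()`, which guards against spurious wakeups:

```cpp
#include <deque>
#include <vector>
#include <pthread.h>

// A task here is just a function pointer plus its argument.
struct PoolTask {
    void (*fn)(void*);
    void* arg;
};

class SimplePool {
public:
    explicit SimplePool(size_t n) : threads_(n), stop_(false) {
        pthread_mutex_init(&mtx_, 0);
        pthread_cond_init(&cond_, 0);
        for (size_t i = 0; i < n; ++i)
            pthread_create(&threads_[i], 0, &SimplePool::worker, this);
    }

    ~SimplePool() {
        pthread_mutex_lock(&mtx_);
        stop_ = true;
        pthread_cond_broadcast(&cond_);  // wake every sleeping worker
        pthread_mutex_unlock(&mtx_);
        for (size_t i = 0; i < threads_.size(); ++i)
            pthread_join(threads_[i], 0);
        pthread_mutex_destroy(&mtx_);
        pthread_cond_destroy(&cond_);
    }

    // Producer side: push a task under the lock, then signal one waiter.
    void enqueue(void (*fn)(void*), void* arg) {
        PoolTask t = { fn, arg };
        pthread_mutex_lock(&mtx_);
        queue_.push_back(t);
        pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mtx_);
    }

private:
    // Consumer side: each worker sleeps while the queue is empty and
    // picks up tasks in FIFO order once signaled.
    static void* worker(void* self) {
        SimplePool* pool = static_cast<SimplePool*>(self);
        for (;;) {
            pthread_mutex_lock(&pool->mtx_);
            while (pool->queue_.empty() && !pool->stop_)
                pthread_cond_wait(&pool->cond_, &pool->mtx_);
            if (pool->stop_ && pool->queue_.empty()) {
                pthread_mutex_unlock(&pool->mtx_);
                return 0;  // drain remaining work before exiting
            }
            PoolTask t = pool->queue_.front();
            pool->queue_.pop_front();
            pthread_mutex_unlock(&pool->mtx_);
            t.fn(t.arg);  // run the task outside the lock
        }
    }

    std::vector<pthread_t> threads_;
    std::deque<PoolTask> queue_;
    pthread_mutex_t mtx_;
    pthread_cond_t cond_;
    bool stop_;
};
```

Running the task outside the lock is the important design choice: holding the mutex during task execution would serialize the workers and defeat the purpose of the pool.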

Feel free to browse through the code and let me know if you find a bug or other issues. Again, it's very minimalist code meant to demonstrate the concept of thread pooling, so don't expect it to be very robust or flexible!

Thanks for reading!

Friday, April 5, 2013

Waiting for a child process with a timeout on Linux

Recently at work we were developing a backend server for a Web app. The server process creates a child process for each request that arrives. The server then waits for the child process to terminate, but since we couldn't wait indefinitely we needed a wait() function with a timeout. Since Linux does not have such a function in its wait() family of system calls, we created a wrapper around the existing system call waitpid() that takes an additional boolean parameter, set to true or false depending on whether the wrapper function returned because of a timeout.

It looks something like this:

pid_t waitpid_with_timeout(pid_t pid, int *status, int options, int timeout_period, boolean* timed_out);

The body of the function essentially does this:

1. Set a signal handler for SIGALRM which doesn't do anything (we just need to know that the alarm went off) and mask all other signals.
2. Install the handler with the sigaction() system call.
3. Set the alarm clock by calling the alarm() system call.
4. Call the Linux system call waitpid().
5. If waitpid() returned -1 with errno set to EINTR, our alarm went off and we set timed_out to true. Otherwise, if waitpid() succeeded, we did not time out and the child process terminated within the period specified in the timeout_period parameter.

After waitpid_with_timeout() returns, we check the timed_out parameter. If timed_out is set to true, we kill the child process explicitly:

kill(pid, SIGKILL); /* signal 9 */

Now, everything was all good and dandy with this implementation - until, during testing, we found out that even though waitpid() was called in waitpid_with_timeout(), we did not collect the exit status of the child in the case of a timeout (when we explicitly killed the child with kill()). This was the backend of a Web application, so uncollected children were piling up with each request from the browser and they were all becoming zombie processes!

We realized that the solution to this problem was simply another call to waitpid() after the child was explicitly killed with kill(). So when waitpid_with_timeout() returned with timed_out == true, we simply added another call to waitpid() after the call to kill():

kill(pid, SIGKILL);
waitpid(pid, &status, 0);

This solved our zombie process problem!

There is some interesting discussion of this topic on Stack Overflow if you are interested: http://stackoverflow.com/questions/282176/waitpid-equivalent-with-timeout