Chapter 11. Threads - Shichao's Notes

Chapter 11. Threads¶

Introduction¶

Processes are discussed in earlier chapters. A limited amount of sharing can occur between related processes.

This chapter looks inside a process further to see how to use multiple threads of control (or simply threads) to perform multiple tasks within the environment of a single process. All threads within a single process have access to the same process components, such as file descriptors and memory.

This chapter is concluded with synchronization mechanisms available to prevent multiple threads from viewing inconsistencies in their shared resources.

Thread Concepts¶

With multiple threads of control, the programs can more than one thing at a time within a single process, with each thread handling a separate task. This approach can have several benefits:

We can simplify code that deals with asynchronous events by assigning a separate thread to handle each event type, while each thread can then handle its event using a synchronous programming model. A synchronous programming model is much simpler than an asynchronous one.
Multiple processes have to use complex mechanisms provided by the operating system to share memory and file descriptors. Threads, in contrast, automatically have access to the same memory address space and file descriptors
The processing of independent tasks can be interleaved by assigning a separate thread per task (only if they don’t depend on the processing performed by each other), so that overall program throughput can be improved. (A single-threaded process has to implicitly serializes those tasks.)
Interactive programs can be improved in response time by using multiple threads to separate the portions of the program that deal with user input and output from the other parts of the program.

The benefits of a multithreaded programming model can be realized on multiprocessor or multicore systems, and even on uniprocessor systems. A program can be simplified using threads regardless of the number of processors, since that doesn’t affect the program structure. As long as your program has to block when serializing tasks, there are improvements in response time and throughput when running on a uniprocessor, because some threads might be able to run while others are blocked.

A thread consists of the information necessary to represent an execution context within a process:

Thread ID: identifies the thread within a process
Set of register values
Stack
Scheduling priority and policy,
Signal mask
An errno variable (Section 1.7)
Thread-specific data (Section 12.6).

Everything within a process is sharable among the threads in a process:

Text of the executable program
The program’s global and heap memory
Stacks
File descriptors.

The threads interfaces of this chapter are from POSIX.1-2001, known as pthreads for "POSIX threads". The feature test macro for POSIX threads is _POSIX_THREADS. Applications can either use this in an #ifdef test to determine at compile time whether threads are supported or call sysconf with the _SC_THREADS constant to determine this at runtime. [p384]

Thread Identification¶

Unlike the process ID, which is unique in the system, the thread ID has significance only within the context of the process to which it belongs.

A thread ID is represented by the pthread_t data type. Implementations are allowed to use a structure to represent the pthread_t data type, so portable implementations can’t treat them as integers (process ID's pid_t data type is a non-negative integer). The pthread_equal function (below) must be used to compare two thread IDs. A consequence of allowing the pthread_t data type to be a structure is that there is no portable way to print its value. Linux 3.2.0 uses an unsigned long integer for the pthread_t data type. FreeBSD 8.0 and Mac OS X 10.6.8 use a pointer to the pthread structure for the pthread_t data type.

#include <pthread.h>

int pthread_equal(pthread_t tid1, pthread_t tid2);

/* Returns: nonzero if equal, 0 otherwise */

A thread can obtain its own thread ID by calling the pthread_self function.

#include <pthread.h>

pthread_t pthread_self(void);

Returns: the thread ID of the calling thread

This function can be used with pthread_equal when a thread needs to identify data structures that are tagged with its thread ID. For example, a single master thread places new jobs on a work queue. A pool of three worker threads removes jobs from the queue. Instead of allowing each thread to process whichever job is at the head of the queue, the master thread controls job assignment by placing the ID of the thread that should process the job in each job structure. Each worker thread then removes only jobs that are tagged with its own thread ID. This situation is illustrated below:

Thread Creation¶

The traditional UNIX process model (one thread of control per process) is conceptually the same as a threads-based model whereby each process is made up of only one thread. As the program runs, its behavior should be indistinguishable from the traditional process, until it creates more threads of control. [p385] Additional threads can be created by calling the pthread_create function:

#include <pthread.h>

int pthread_create(pthread_t *restrict tidp,
                   const pthread_attr_t *restrict attr,
                   void *(*start_rtn)(void *), void *restrict arg);

/* Returns: 0 if OK, error number on failure */

The memory location pointed to by tidp is set to the thread ID of the newly created thread when pthread_create returns successfully.
The attr argument is used to customize various thread attributes (detailed in Section 12.3). This chapter sets this to NULL to create a thread with the default attributes.
The newly created thread starts running at the address of the start_rtn function.
The arg is a pointer to the single argument passed to the start_rtn. If you need to pass more than one argument to the start_rtn function, then you need to store them in a structure and pass the address of the structure in arg.

When a thread is created, there is no guarantee whether the newly created thread or the calling thread. The newly created thread has access to the process address space and inherits the calling thread’s floating-point environment (fenv.h) and signal mask; however, the set of pending signals for the thread is cleared.

The following example creates one thread and prints the process and thread IDs of the new thread and the initial thread:

threads/threadid.c

#include "apue.h"
#include <pthread.h>

pthread_t ntid;

void
printids(const char *s)
{
    pid_t       pid;
    pthread_t   tid;

    pid = getpid();
    tid = pthread_self();
    printf("%s pid %lu tid %lu (0x%lx)\n", s, (unsigned long)pid,
      (unsigned long)tid, (unsigned long)tid);
}

void *
thr_fn(void *arg)
{
    printids("new thread: ");
    return((void *)0);
}

int
main(void)
{
    int     err;

    err = pthread_create(&ntid, NULL, thr_fn, NULL);
    if (err != 0)
        err_exit(err, "can't create thread");
    printids("main thread:");
    sleep(1);
    exit(0);
}

This example handles the races between the main thread and the new thread as follows:

First is the need to sleep in the main thread. Without sleep, the main thread might exit, thereby terminating the entire process before the new thread gets a chance to run. This behavior is dependent on the operating system’s threads implementation and scheduling algorithms
Second, the new thread obtains its thread ID by calling pthread_self instead of reading it out of shared memory or receiving it as an argument to its thread-start routine. If the new thread runs before the main thread returns from calling pthread_create, then the new thread will see the uninitialized contents of ntid instead of the thread ID. [p387-388]

Thread Termination¶

If any thread within a process calls exit, _Exit, or _exit, then the entire process terminates. Similarly, when the default action is to terminate the process, a signal sent to a thread will terminate the entire process.

A single thread can exit in three ways, without terminating the entire process:

The thread can simply return from the start routine. The return value is the thread’s exit code.
The thread can be canceled by another thread in the same process.
The thread can call pthread_exit.

The `pthread_exit` and `pthread_join` functions¶

#include <pthread.h>

void pthread_exit(void *rval_ptr);

The rval_ptr argument is a typeless pointer is available to other threads in the process by calling the pthread_join function.

#include <pthread.h>

int pthread_join(pthread_t thread, void **rval_ptr);

/* Returns: 0 if OK, error number on failure */

The thread that calls pthread_join will block until the specified thread calls pthread_exit, returns from its start routine, or is canceled. If the thread simply returned from its start routine, rval_ptr will contain the return code. If the thread was canceled, the memory location specified by rval_ptr is set to PTHREAD_CANCELED.

By calling pthread_join, we automatically place the thread with which we’re joining in the detached state so that its resources can be recovered. If the thread was already in the detached state, pthread_join can fail, returning EINVAL.

If we’re not interested in a thread’s return value, we can set rval_ptr to NULL.

The following example shows how to fetch the exit code from a thread that has terminated:

threads/exitstatus.c

[p389-390]

The typeless pointer passed to pthread_create and pthread_exit can be used to pass the address of a structure containing more complex information.

If the structure was allocated on the caller’s stack, the memory contents might have changed by the time the structure is used. If a thread allocates a structure on its stack and passes a pointer to this structure to pthread_exit, then the stack might be destroyed and its memory reused for something else by the time the caller of pthread_join tries to use it.

The following example shows the problem with using an automatic variable (allocated on the stack) as the argument to pthread_exit:

threads/badexit2.c

#include "apue.h"
#include <pthread.h>

struct foo {
    int a, b, c, d;
};

void
printfoo(const char *s, const struct foo *fp)
{
    printf("%s", s);
    printf("  structure at 0x%lx\n", (unsigned long)fp);
    printf("  foo.a = %d\n", fp->a);
    printf("  foo.b = %d\n", fp->b);
    printf("  foo.c = %d\n", fp->c);
    printf("  foo.d = %d\n", fp->d);
}

void *
thr_fn1(void *arg)
{
    struct foo  foo = {1, 2, 3, 4};

    printfoo("thread 1:\n", &foo);
    pthread_exit((void *)&foo);
}

void *
thr_fn2(void *arg)
{
    printf("thread 2: ID is %lu\n", (unsigned long)pthread_self());
    pthread_exit((void *)0);
}

int
main(void)
{
    int         err;
    pthread_t   tid1, tid2;
    struct foo  *fp;

    err = pthread_create(&tid1, NULL, thr_fn1, NULL);
    if (err != 0)
        err_exit(err, "can't create thread 1");
    err = pthread_join(tid1, (void *)&fp);
    if (err != 0)
        err_exit(err, "can't join with thread 1");
    sleep(1);
    printf("parent starting second thread\n");
    err = pthread_create(&tid2, NULL, thr_fn2, NULL);
    if (err != 0)
        err_exit(err, "can't create thread 2");
    sleep(1);
    printfoo("parent:\n", fp);
    exit(0);
}

When we run this program on Linux, we get:

$ ./a.out
thread 1:
structure at 0x7f2c83682ed0
foo.a = 1
foo.b = 2
foo.c = 3
foo.d = 4
parent starting second thread
thread 2: ID is 139829159933696
parent:
structure at 0x7f2c83682ed0
foo.a = -2090321472
foo.b = 32556
foo.c = 1
foo.d = 0

The contents of the structure (allocated on the stack of thread tid1) have changed by the time the main thread can access the structure. Note how the stack of the second thread (tid2) has overwritten the first thread’s stack. To solve this problem, we can either use a global structure or allocate the structure using malloc.

The `pthread_cancel` function¶

#include <pthread.h>

int pthread_cancel(pthread_t tid);

/* Returns: 0 if OK, error number on failure */

By default, pthread_cancel will cause the thread specified by tid to behave as if it had called pthread_exit with an argument of PTHREAD_CANCELED, though a thread can ignore or otherwise control how it is canceled.
pthread_cancel doesn’t wait for the thread to terminate; it merely makes the request.

The `pthread_cleanup_push` and `pthread_cleanup_pop` functions¶

A thread can arrange for functions to be called when it exits, similar to the way that the atexit function (Section 7.3). The functions are known as thread cleanup handlers. More than one cleanup handler can be established for a thread. The handlers are recorded in a stack, which means that they are executed in the reverse order from that with which they were registered.

#include <pthread.h>

void pthread_cleanup_push(void (*rtn)(void *), void *arg);
void pthread_cleanup_pop(int execute);

The pthread_cleanup_push function schedules the cleanup function, rtn, to be called with the single argument, arg, when the thread performs one of the following actions:

Makes a call to pthread_exit
Responds to a cancellation request
Makes a call to pthread_cleanup_pop with a nonzero execute argument

If the execute argument is set to zero, the cleanup function is not called.

pthread_cleanup_pop removes the cleanup handler established by the last call to pthread_cleanup_push.

Because they can be implemented as macros, they must be used in matched pairs within the same scope in a thread. The macro definition of pthread_cleanup_push can include a { character, in which case the matching } character is in the pthread_cleanup_pop definition.

The following example shows how to use thread cleanup handlers. We need to match calls to pthread_cleanup_pop with the calls to pthread_cleanup_push; otherwise, the program might not compile. [p394]

threads/cleanup.c

#include "apue.h"
#include <pthread.h>

void
cleanup(void *arg)
{
    printf("cleanup: %s\n", (char *)arg);
}

void *
thr_fn1(void *arg)
{
    printf("thread 1 start\n");
    pthread_cleanup_push(cleanup, "thread 1 first handler");
    pthread_cleanup_push(cleanup, "thread 1 second handler");
    printf("thread 1 push complete\n");
    if (arg)
        return((void *)1);
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    return((void *)1);
}

void *
thr_fn2(void *arg)
{
    printf("thread 2 start\n");
    pthread_cleanup_push(cleanup, "thread 2 first handler");
    pthread_cleanup_push(cleanup, "thread 2 second handler");
    printf("thread 2 push complete\n");
    if (arg)
        pthread_exit((void *)2);
    pthread_cleanup_pop(0);
    pthread_cleanup_pop(0);
    pthread_exit((void *)2);
}

int
main(void)
{
    int         err;
    pthread_t   tid1, tid2;
    void        *tret;

    err = pthread_create(&tid1, NULL, thr_fn1, (void *)1);
    if (err != 0)
        err_exit(err, "can't create thread 1");
    err = pthread_create(&tid2, NULL, thr_fn2, (void *)1);
    if (err != 0)
        err_exit(err, "can't create thread 2");
    err = pthread_join(tid1, &tret);
    if (err != 0)
        err_exit(err, "can't join with thread 1");
    printf("thread 1 exit code %ld\n", (long)tret);
    err = pthread_join(tid2, &tret);
    if (err != 0)
        err_exit(err, "can't join with thread 2");
    printf("thread 2 exit code %ld\n", (long)tret);
    exit(0);
}

Running the program on Linux gives us:

$ ./a.out
thread 1 start
thread 1 push complete
thread 2 start
thread 2 push complete
cleanup: thread 2 second handler
cleanup: thread 2 first handler
thread 1 exit code 1
thread 2 exit code 2

Note that if the thread terminates by returning from its start routine, its cleanup handlers are not called. [p396]

The table below summarize similarities between the thread functions and the process functions.

Process primitive	Thread primitive	Description
`fork`	`pthread_create`	create a new flow of control
`exit`	`pthread_exit`	exit from an existing flow of control
`waitpid`	`pthread_join`	get exit status from flow of control
`atexit`	`pthread_cleanup_push`	register function to be called at exit from flow of control
`getpid`	`pthread_self`	get ID for flow of control
`abort`	`pthread_cancel`	request abnormal termination of flow of control

The `pthread_detach` function¶

By default, a thread’s termination status is retained until we call pthread_join for that thread. A thread’s underlying storage can be reclaimed immediately on termination if the thread has been detached. After a thread is detached, we can’t use the pthread_join function to wait for its termination status, because calling pthread_join for a detached thread results in undefined behavior. We can detach a thread by calling pthread_detach.

#include <pthread.h>

int pthread_detach(pthread_t tid);

/* Returns: 0 if OK, error number on failure */

We can create a thread that is already in the detached state by modifying the thread attributes we pass to pthread_create. This is detailed in the next chapter.

Thread Synchronization¶

When multiple threads of control share the same memory, one thread can modify a variable that other threads can read or modify, thus we need to synchronize the threads to ensure that they don’t use an invalid value when accessing the variable’s memory contents.

When one thread modifies a variable, other threads can potentially see inconsistencies when reading the value of that variable. On processor architectures in which the modification takes more than one memory cycle, this can happen when the memory read is interleaved between the memory write cycles.

In the following figure, thread A reads the variable and then writes a new value to it, but the write operation takes two memory cycles. If thread B reads the same variable between the two write cycles, it will see an inconsistent value:

To solve this problem, the threads have to use a lock that will allow only one thread to access the variable at a time, as show in the following figure:

If thread B wants to read the variable, it acquires a lock.
When thread A updates the variable, it acquires the same lock. Thus thread B will be unable to read the variable until thread A releases the lock.

We also need to synchronize two or more threads that might try to modify the same variable at the same time.

For example (as in the following figure), the increment operation is usually broken down into three steps.

Read the memory location into a register.
Increment the value in the register.
Write the new value back to the memory location.

If two threads try to increment the same variable at almost the same time without synchronizing with each other, the results can be inconsistent. [p398]

There is no race if one of the following (assumed) condition occurs:

The modification is atomic.
(In the previous example) The increment takes only one memory cycle.
Data always appears to be sequentially consistent.

Our operations are sequentially consistent when multiple threads can’t observe inconsistencies in our data. In modern computer systems, memory accesses take multiple bus cycles, and multiprocessors generally interleave bus cycles among multiple processors, so we aren’t guaranteed that our data is sequentially consistent.

[p399]

Besides the computer architecture, races can arise from the ways in which our programs use variables, creating places where it is possible to view inconsistencies. For example, we might increment a variable and then make a decision based on its value. The combination of the increment step and the decision-making step isn’t atomic, which opens a window where inconsistencies can arise.

Mutexes¶

We can protect our data and ensure access by only one thread at a time by using the pthreads mutual-exclusion interfaces. A mutex is basically a lock that we set (lock) before accessing a shared resource and release (unlock) when we’re done.

While it is set, any other thread that tries to set it will block until we release it.
If more than one thread is blocked when we unlock the mutex, then all threads blocked on the lock will be made runnable, and the first one to run will be able to set the lock. The others will see that the mutex is still locked and go back to waiting for it to become available again.

In this way, only one thread will proceed at a time.

[p400]

A mutex variable is represented by the pthread_mutex_t data type. Before we can use a mutex variable, we must first initialize it by either setting it to the constant PTHREAD_MUTEX_INITIALIZER (for statically allocated mutexes only) or calling pthread_mutex_init. If we allocate the mutex dynamically (by calling malloc, for example), then we need to call pthread_mutex_destroy before freeing the memory.

#include <pthread.h>

int pthread_mutex_init(pthread_mutex_t *restrict mutex,
                       const pthread_mutexattr_t *restrict attr);

int pthread_mutex_destroy(pthread_mutex_t *mutex);

/* Both return: 0 if OK, error number on failure */

To initialize a mutex with the default attributes, we set attr to NULL (mutex attributes is discussed Section 12.4).

To lock a mutex, we call pthread_mutex_lock. If the mutex is already locked, the calling thread will block until the mutex is unlocked. To unlock a mutex, we call pthread_mutex_unlock.

#include <pthread.h>

int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

/* All return: 0 if OK, error number on failure */

If a thread can’t afford to block, it can use pthread_mutex_trylock to lock the mutex conditionally. If the mutex is unlocked at the time pthread_mutex_trylock is called, then pthread_mutex_trylock will lock the mutex without blocking and return 0. Otherwise, pthread_mutex_trylock will fail, returning EBUSY without locking the mutex.

The following example illustrates a mutex used to protect a data structure. When more than one thread needs to access a dynamically allocated object, we can embed a reference count in the object to ensure that we don’t free its memory before all threads are done using it.

threads/mutex1.c

#include <stdlib.h>
#include <pthread.h>

struct foo {
    int             f_count;
    pthread_mutex_t f_lock;
    int             f_id;
    /* ... more stuff here ... */
};

struct foo *
foo_alloc(int id) /* allocate the object */
{
    struct foo *fp;

    if ((fp = malloc(sizeof(struct foo))) != NULL) {
        fp->f_count = 1;
        fp->f_id = id;
        if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
            free(fp);
            return(NULL);
        }
        /* ... continue initialization ... */
    }
    return(fp);
}

void
foo_hold(struct foo *fp) /* add a reference to the object */
{
    pthread_mutex_lock(&fp->f_lock);
    fp->f_count++;
    pthread_mutex_unlock(&fp->f_lock);
}

void
foo_rele(struct foo *fp) /* release a reference to the object */
{
    pthread_mutex_lock(&fp->f_lock);
    if (--fp->f_count == 0) { /* last reference */
        pthread_mutex_unlock(&fp->f_lock);
        pthread_mutex_destroy(&fp->f_lock);
        free(fp);
    } else {
        pthread_mutex_unlock(&fp->f_lock);
    }
}

We lock the mutex before incrementing the reference count, decrementing the reference count, and checking whether the reference count reaches zero.
No locking is necessary when we initialize the reference count to 1 in the foo_alloc function, because the allocating thread is the only reference to it so far. If we were to place the structure on a list at this point, it could be found by other threads, so we would need to lock it first.

Before using the object, threads are expected to add a reference to it by calling foo_hold. When they are done, they must call foo_rele to release the reference. When the last reference is released, the object’s memory is freed.

In this example, we have ignored how threads find an object before calling foo_hold. Even though the reference count is zero, it would be a mistake for foo_rele to free the object’s memory if another thread is blocked on the mutex in a call to foo_hold. We can avoid this problem by ensuring that the object can’t be found before freeing its memory. We’ll see how to do this in the examples that follow.

Deadlock Avoidance¶

A thread will deadlock itself if it tries to lock the same mutex twice. There are less obvious ways to create deadlocks with mutexes. For example, when we use more than one mutex in our programs, a deadlock can occur if we allow one thread to hold a mutex and block while trying to lock a second mutex at the same time that another thread holding the second mutex tries to lock the first mutex. Neither thread can proceed, because each needs a resource that is held by the other.

Deadlocks can be avoided by carefully controlling the order in which mutexes are locked. For example, assume that you have two mutexes, A and B, that you need to lock at the same time. If all threads always lock mutex A before mutex B (vice versa), no deadlock can occur from the use of the two mutexes (but you can still deadlock on other resources). You’ll have the potential for a deadlock only when one thread attempts to lock the mutexes in the opposite order from another thread.

Another approach when many locks and data structures are involved (it is difficult to apply the previous approach) is that you might be able to release your locks and try again at a later time. You can use the pthread_mutex_trylock interface to avoid deadlocking in this case. If you are already holding locks and pthread_mutex_trylock is successful, then you can proceed. If it can’t acquire the lock, however, you can release the locks you already hold, clean up, and try again later.

Example of two mutexes¶

In the following example which shows the use of two mutexes, we avoid deadlocks by ensuring that when we need to acquire two mutexes at the same time, we always lock them in the same order. The second mutex protects a hash list that we use to keep track of the foo data structures. Thus the hashlock mutex protects both the fh hash table and the f_next hash link field in the foo structure. The f_lock mutex in the foo structure protects access to the remainder of the foo structure’s fields.

threads/mutex2.c

#include <stdlib.h>
#include <pthread.h>

#define NHASH 29
#define HASH(id) (((unsigned long)id)%NHASH)

struct foo *fh[NHASH];

pthread_mutex_t hashlock = PTHREAD_MUTEX_INITIALIZER;

struct foo {
    int             f_count;
    pthread_mutex_t f_lock;
    int             f_id;
    struct foo     *f_next; /* protected by hashlock */
    /* ... more stuff here ... */
};

struct foo *
foo_alloc(int id) /* allocate the object */
{
    struct foo  *fp;
    int         idx;

    if ((fp = malloc(sizeof(struct foo))) != NULL) {
        fp->f_count = 1;
        fp->f_id = id;
        if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
            free(fp);
            return(NULL);
        }
        idx = HASH(id);
        pthread_mutex_lock(&hashlock);
        fp->f_next = fh[idx];
        fh[idx] = fp;
        pthread_mutex_lock(&fp->f_lock);
        pthread_mutex_unlock(&hashlock);
        /* ... continue initialization ... */
        pthread_mutex_unlock(&fp->f_lock);
    }
    return(fp);
}

void
foo_hold(struct foo *fp) /* add a reference to the object */
{
    pthread_mutex_lock(&fp->f_lock);
    fp->f_count++;
    pthread_mutex_unlock(&fp->f_lock);
}

struct foo *
foo_find(int id) /* find an existing object */
{
    struct foo  *fp;

    pthread_mutex_lock(&hashlock);
    for (fp = fh[HASH(id)]; fp != NULL; fp = fp->f_next) {
        if (fp->f_id == id) {
            foo_hold(fp);
            break;
        }
    }
    pthread_mutex_unlock(&hashlock);
    return(fp);
}

void
foo_rele(struct foo *fp) /* release a reference to the object */
{
    struct foo  *tfp;
    int         idx;

    pthread_mutex_lock(&fp->f_lock);
    if (fp->f_count == 1) { /* last reference */
        pthread_mutex_unlock(&fp->f_lock);
        pthread_mutex_lock(&hashlock);
        pthread_mutex_lock(&fp->f_lock);
        /* need to recheck the condition */
        if (fp->f_count != 1) {
            fp->f_count--;
            pthread_mutex_unlock(&fp->f_lock);
            pthread_mutex_unlock(&hashlock);
            return;
        }
        /* remove from list */
        idx = HASH(fp->f_id);
        tfp = fh[idx];
        if (tfp == fp) {
            fh[idx] = fp->f_next;
        } else {
            while (tfp->f_next != fp)
                tfp = tfp->f_next;
            tfp->f_next = fp->f_next;
        }
        pthread_mutex_unlock(&hashlock);
        pthread_mutex_unlock(&fp->f_lock);
        pthread_mutex_destroy(&fp->f_lock);
        free(fp);
    } else {
        fp->f_count--;
        pthread_mutex_unlock(&fp->f_lock);
    }
}

Comparing threads/mutex2.c with threads/mutex1.c, we can see;

The allocation function foo_alloc now locks the hash list lock, adds the new structure to a hash bucket, and before unlocking the hash list lock, locks the mutex in the new structure. Since the new structure is placed on a global list, other threads can find it, so we need to block them if they try to access the new structure, until we are done initializing it.
The foo_find function locks the hash list lock and searches for the requested structure. If it is found, we increase the reference count and return a pointer to the structure. This follows the lock ordering by locking the hash list lock in foo_find before foo_hold locks the foo structure’s f_lock mutex.
With two locks, the foo_rele function is more complicated. If this is the last reference, we need to unlock the structure mutex so that we can acquire the hash list lock, since we’ll need to remove the structure from the hash list. Then we reacquire the structure mutex. Because we could have blocked since the last time we held the structure mutex, we need to recheck the condition to see whether we still need to free the structure. If another thread found the structure and added a reference to it while we blocked to honor the lock ordering, we simply need to decrement the reference count, unlock everything, and return.

Example of two mutexes (simplified)¶

We can simplify the previous example considerably by using the hash list lock to protect the structure reference count. The structure mutex can be used to protect everything else in the foo structure.

threads/mutex3.c

#include <stdlib.h>
#include <pthread.h>

#define NHASH 29
#define HASH(id) (((unsigned long)id)%NHASH)

struct foo *fh[NHASH];
pthread_mutex_t hashlock = PTHREAD_MUTEX_INITIALIZER;

struct foo {
    int             f_count; /* protected by hashlock */
    pthread_mutex_t f_lock;
    int             f_id;
    struct foo     *f_next; /* protected by hashlock */
    /* ... more stuff here ... */
};

struct foo *
foo_alloc(int id) /* allocate the object */
{
    struct foo  *fp;
    int         idx;

    if ((fp = malloc(sizeof(struct foo))) != NULL) {
        fp->f_count = 1;
        fp->f_id = id;
        if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
            free(fp);
            return(NULL);
        }
        idx = HASH(id);
        pthread_mutex_lock(&hashlock);
        fp->f_next = fh[idx];
        fh[idx] = fp;
        pthread_mutex_lock(&fp->f_lock);
        pthread_mutex_unlock(&hashlock);
        /* ... continue initialization ... */
        pthread_mutex_unlock(&fp->f_lock);
    }
    return(fp);
}

void
foo_hold(struct foo *fp) /* add a reference to the object */
{
    pthread_mutex_lock(&hashlock);
    fp->f_count++;
    pthread_mutex_unlock(&hashlock);
}

struct foo *
foo_find(int id) /* find an existing object */
{
    struct foo  *fp;

    pthread_mutex_lock(&hashlock);
    for (fp = fh[HASH(id)]; fp != NULL; fp = fp->f_next) {
        if (fp->f_id == id) {
            fp->f_count++;
            break;
        }
    }
    pthread_mutex_unlock(&hashlock);
    return(fp);
}

void
foo_rele(struct foo *fp) /* release a reference to the object */
{
    struct foo  *tfp;
    int         idx;

    pthread_mutex_lock(&hashlock);
    if (--fp->f_count == 0) { /* last reference, remove from list */
        idx = HASH(fp->f_id);
        tfp = fh[idx];
        if (tfp == fp) {
            fh[idx] = fp->f_next;
        } else {
            while (tfp->f_next != fp)
                tfp = tfp->f_next;
            tfp->f_next = fp->f_next;
        }
        pthread_mutex_unlock(&hashlock);
        pthread_mutex_destroy(&fp->f_lock);
        free(fp);
    } else {
        pthread_mutex_unlock(&hashlock);
    }
}

In this example, we solved the lock-ordering issues surrounding the hash list and the reference count when we use the same lock for both purposes. Multithreaded software design involves these types of trade-offs:

If the locking granularity is too coarse, you end up with too many threads blocking behind the same locks, with little improvement possible from concurrency.
If the locking granularity is too fine, then you suffer bad performance from excess locking overhead, and you end up with complex code.

As a programmer, you need to find the correct balance between code complexity and performance, while still satisfying your locking requirements.

`pthread_mutex_timedlock` Function¶

The pthread_mutex_timedlock function allows us to bound the time that a thread blocks when a mutex it is trying to acquire is already locked is equivalent to pthread_mutex_lock, but if the timeout value is reached, pthread_mutex_timedlock will return the error code ETIMEDOUT without locking the mutex:

#include <pthread.h>
#include <time.h>

int pthread_mutex_timedlock(pthread_mutex_t *restrict mutex,
                            const struct timespec *restrict tsptr);

/* Returns: 0 if OK, error number on failure */

The timeout is represented by the timespec structure (seconds and nanoseconds) and is absolute time.

The following example uses pthread_mutex_timedlock to avoid blocking indefinitely:

threads/timedlock.c

#include "apue.h"
#include <pthread.h>

int
main(void)
{
    int err;
    struct timespec tout;
    struct tm *tmp;
    char buf[64];
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    pthread_mutex_lock(&lock);
    printf("mutex is locked\n");
    clock_gettime(CLOCK_REALTIME, &tout);
    tmp = localtime(&tout.tv_sec);
    strftime(buf, sizeof(buf), "%r", tmp);
    printf("current time is %s\n", buf);
    tout.tv_sec += 10;  /* 10 seconds from now */
    /* caution: this could lead to deadlock */
    err = pthread_mutex_timedlock(&lock, &tout);
    clock_gettime(CLOCK_REALTIME, &tout);
    tmp = localtime(&tout.tv_sec);
    strftime(buf, sizeof(buf), "%r", tmp);
    printf("the time is now %s\n", buf);
    if (err == 0)
        printf("mutex locked again!\n");
    else
        printf("can't lock mutex again: %s\n", strerror(err));
    exit(0);
}

The following is the output:

$ ./a.out
mutex is locked
current time is 11:41:58 AM
the time is now 11:42:08 AM
can’t lock mutex again: Connection timed out

This strategy is not recommended in practice, because it can lead to deadlock.

Mac OS X 10.6.8 doesn’t support pthread_mutex_timedlock yet, but FreeBSD 8.0, Linux 3.2.0, and Solaris 10 do support it.

Reader–Writer Locks¶

The state of a mutex is either locked or unlocked, and only one thread can lock it at a time. A reader–writer lock (also called shared–exclusive lock) has three possible states:

Locked in read mode (also called locked in shared mode)
Locked in write mode (also called locked in exclusive mode)
Unlocked Only one thread at a time can hold a reader–writer lock in write mode, but multiple threads can hold a reader–writer lock in read mode at the same time.

#include <pthread.h>

int pthread_rwlock_init(pthread_rwlock_t *restrict rwlock,
                        const pthread_rwlockattr_t *restrict attr);

int pthread_rwlock_destroy(pthread_rwlock_t *rwlock);

/* Both return: 0 if OK, error number on failure */

A reader–writer lock is initialized by pthread_rwlock_init. A NULL value of attr indicates default attributes.

Before freeing the memory backing a reader–writer lock, we need to call pthread_rwlock_destroy to clean it up. If pthread_rwlock_init allocated any resources for the reader–writer lock, pthread_rwlock_destroy frees those resources. If we free the memory backing a reader–writer lock without first calling pthread_rwlock_destroy, any resources assigned to the lock will be lost (see Doubts and Solutions).

A reader–writer lock can be read locked, write locked and unlocked with the following functions:

#include <pthread.h>

int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);

/* All return: 0 if OK, error number on failure */

Implementations might limit the number of times a reader–writer lock can be locked in shared mode, so we need to check the return value of pthread_rwlock_rdlock.
pthread_rwlock_wrlock and pthread_rwlock_unlock have error returns and technically we should always check for errors when we call functions that can potentially fail. However, if we design our locking properly, we don’t need to check for errors. [p410]

The SUS also defines conditional versions of the reader–writer locking primitives.

#include <pthread.h>

int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock);

/* Both return: 0 if OK, error number on failure */

When the lock can be acquired, these functions return 0. Otherwise, they return the error EBUSY. These functions can be used to avoid deadlocks in situations where conforming to a lock hierarchy is difficult.

The program below illustrates the use of reader–writer locks. A queue of job requests is protected by a single reader–writer lock. This example shows a possible implementation of Figure 11.1, whereby multiple worker threads obtain jobs assigned to them by a single master thread.

threads/rwlock.c

#include <stdlib.h>
#include <pthread.h>

struct job {
    struct job *j_next;
    struct job *j_prev;
    pthread_t   j_id;   /* tells which thread handles this job */
    /* ... more stuff here ... */
};

struct queue {
    struct job      *q_head;
    struct job      *q_tail;
    pthread_rwlock_t q_lock;
};

/*
 * Initialize a queue.
 */
int
queue_init(struct queue *qp)
{
    int err;

    qp->q_head = NULL;
    qp->q_tail = NULL;
    err = pthread_rwlock_init(&qp->q_lock, NULL);
    if (err != 0)
        return(err);
    /* ... continue initialization ... */
    return(0);
}

/*
 * Insert a job at the head of the queue.
 */
void
job_insert(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    jp->j_next = qp->q_head;
    jp->j_prev = NULL;
    if (qp->q_head != NULL)
        qp->q_head->j_prev = jp;
    else
        qp->q_tail = jp;    /* list was empty */
    qp->q_head = jp;
    pthread_rwlock_unlock(&qp->q_lock);
}

/*
 * Append a job on the tail of the queue.
 */
void
job_append(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    jp->j_next = NULL;
    jp->j_prev = qp->q_tail;
    if (qp->q_tail != NULL)
        qp->q_tail->j_next = jp;
    else
        qp->q_head = jp;    /* list was empty */
    qp->q_tail = jp;
    pthread_rwlock_unlock(&qp->q_lock);
}

/*
 * Remove the given job from a queue.
 */
void
job_remove(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    if (jp == qp->q_head) {
        qp->q_head = jp->j_next;
        if (qp->q_tail == jp)
            qp->q_tail = NULL;
        else
            jp->j_next->j_prev = jp->j_prev;
    } else if (jp == qp->q_tail) {
        qp->q_tail = jp->j_prev;
        jp->j_prev->j_next = jp->j_next;
    } else {
        jp->j_prev->j_next = jp->j_next;
        jp->j_next->j_prev = jp->j_prev;
    }
    pthread_rwlock_unlock(&qp->q_lock);
}

/*
 * Find a job for the given thread ID.
 */
struct job *
job_find(struct queue *qp, pthread_t id)
{
    struct job *jp;

    if (pthread_rwlock_rdlock(&qp->q_lock) != 0)
        return(NULL);

    for (jp = qp->q_head; jp != NULL; jp = jp->j_next)
        if (pthread_equal(jp->j_id, id))
            break;

    pthread_rwlock_unlock(&qp->q_lock);
    return(jp);
}

In this example, we lock the queue’s reader–writer lock in write mode whenever we need to add a job to the queue or remove a job from the queue. Whenever we search the queue, we grab the lock in read mode, allowing all the worker threads to search the queue concurrently. Using a reader–writer lock will improve performance in this case only if threads search the queue much more frequently than they add or remove jobs. The worker threads take only those jobs that match their thread ID off the queue. Since the job structures are used only by one thread at a time, they don’t need any extra locking.

Reader–Writer Locking with Timeouts¶

As with mutexes, the SUS provides functions to lock reader–writer locks with a timeout to give applications a way to avoid blocking indefinitely while trying to acquire a reader–writer lock.

#include <pthread.h>
#include <time.h>

int pthread_rwlock_timedrdlock(pthread_rwlock_t *restrict rwlock,
                               const struct timespec *restrict tsptr);
int pthread_rwlock_timedwrlock(pthread_rwlock_t *restrict rwlock,
                               const struct timespec *restrict tsptr);
/* Both return: 0 if OK, error number on failure */

The tsptr argument points to a timespec structure specifying the time at which the thread should stop blocking. If they can’t acquire the lock, these functions return the ETIMEDOUT error when the timeout expires. Like the pthread_mutex_timedlock function, the timeout specifies an absolute time, not a relative one.

Condition Variables¶

Condition variables are another synchronization mechanism available to threads. These synchronization objects provide a place for threads to rendezvous. When used with mutexes, condition variables allow threads to wait in a race-free way for arbitrary conditions to occur.

The condition itself is protected by a mutex. A thread must first lock the mutex to change the condition state. Other threads will not notice the change until they acquire the mutex, because the mutex must be locked to be able to evaluate the condition.

A condition variable, represented by the pthread_cond_t data type, must be initialized and can be initialized in two ways:

Assign the constant PTHREAD_COND_INITIALIZER to a statically allocated condition
Use the pthread_cond_init function to initialize a dynamically allocated condition

pthread_cond_destroy deinitializes a condition variable before freeing its underlying memory.

#include <pthread.h>

int pthread_cond_init(pthread_cond_t *restrict cond,
                      const pthread_condattr_t *restrict attr);
int pthread_cond_destroy(pthread_cond_t *cond);

/* Both return: 0 if OK, error number on failure */

Unless using nondefault attributes, the attr argument to pthread_cond_init can be set to NULL.

We use pthread_cond_wait to wait for a condition to be true. A variant pthread_cond_timedwait is provided to return an error code if the condition hasn’t been satisfied in the specified amount of time.

#include <pthread.h>

int pthread_cond_wait(pthread_cond_t *restrict cond,
                      pthread_mutex_t *restrict mutex);

int pthread_cond_timedwait(pthread_cond_t *restrict cond,
                           pthread_mutex_t *restrict mutex,
                           const struct timespec *restrict tsptr);

/* Both return: 0 if OK, error number on failure */

The mutex passed to pthread_cond_wait protects the condition. The caller passes the locked mutex to the function, which then atomically places the calling thread on the list of threads waiting for the condition and unlocks the mutex. This closes the window between the time that the condition is checked and the time that the thread goes to sleep waiting for the condition to change, so that the thread doesn’t miss a change in the condition. When pthread_cond_wait returns, the mutex is again locked.

The pthread_cond_timedwait function provides the same functionality as the pthread_cond_wait function with the addition of the timeout (tsptr). The timeout value specifies how long we are willing to wait expressed as a timespec structure, as an absolute time instead of a relative time, as in the example of pthread_mutex_timedlock.

Alternatively to the clock_gettime function, we can use the gettimeofday function to get the current time expressed as a timeval structure and translate it into a timespec structure. The following function can be used to obtain the absolute time for the timeout value

threads/maketimeout.c

#include <sys/time.h>
#include <stdlib.h>

void
maketimeout(struct timespec *tsp, long minutes)
{
    struct timeval now;

    /* get the current time */
    gettimeofday(&now, NULL);
    tsp->tv_sec = now.tv_sec;
    tsp->tv_nsec = now.tv_usec * 1000; /* usec to nsec */
    /* add the offset to get timeout value */
    tsp->tv_sec += minutes * 60;
}

If the timeout expires without the condition occurring, pthread_cond_timedwait will reacquire the mutex and return the error ETIMEDOUT. When it returns from a successful call to pthread_cond_wait or pthread_cond_timedwait, a thread needs to reevaluate the condition, since another thread might have run and already changed the condition.

There are two functions to notify threads that a condition has been satisfied. The pthread_cond_signal function will wake up at least one thread waiting on a condition (The POSIX specification allows for implementations of pthread_cond_signal to wake up more than one thread, to make the implementation simpler.), whereas the pthread_cond_broadcast function will wake up all threads waiting on a condition.

#include <pthread.h>

int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);

/* Both return: 0 if OK, error number on failure */

When we call pthread_cond_signal or pthread_cond_broadcast, we are said to be signaling the thread or condition. We have to be careful to signal the threads only after changing the state of the condition.

The following example shows how to use a condition variable and a mutex together to synchronize threads.

threads/condvar.c

#include <pthread.h>

struct msg {
    struct msg *m_next;
    /* ... more stuff here ... */
};

struct msg *workq;

pthread_cond_t qready = PTHREAD_COND_INITIALIZER;

pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

void
process_msg(void)
{
    struct msg *mp;

    for (;;) {
        pthread_mutex_lock(&qlock);
        while (workq == NULL)
            pthread_cond_wait(&qready, &qlock);
        mp = workq;
        workq = mp->m_next;
        pthread_mutex_unlock(&qlock);
        /* now process the message mp */
    }
}

void
enqueue_msg(struct msg *mp)
{
    pthread_mutex_lock(&qlock);
    mp->m_next = workq;
    workq = mp;
    pthread_mutex_unlock(&qlock);
    pthread_cond_signal(&qready);
}

The condition is the state of the work queue.
We protect the condition with a mutex and evaluate the condition in a while loop.
When we put a message on the work queue, we need to hold the mutex, but we don’t need to hold the mutex when we signal the waiting threads.
As long as it is okay for a thread to pull the message off the queue before we call pthread_cond_signal, we can do this after releasing the mutex. Since we check the condition in a while loop, this doesn’t present a problem; a thread will wake up, find that the queue is still empty, and go back to waiting again. If the code couldn’t tolerate this race, we would need to hold the mutex when we signal the threads.

Spin Locks¶

A spin lock is like a mutex, except that instead of blocking a process by sleeping, the process is blocked by busy-waiting (spinning) until the lock can be acquired. A spin lock could be used in situations where locks are held for short periods of times and threads don’t want to incur the cost of being descheduled.

Spin locks are often used as low-level primitives to implement other types of locks. They can be implemented efficiently using test-and-set instructions. Although efficient, they can lead to wasting CPU resources: while a thread is spinning and waiting for a lock to become available, the CPU can’t do anything else. This is why spin locks should be held only for short periods of time.

Kernel-space. Spin locks are useful when used in a nonpreemptive kernel. Besides providing a mutual exclusion mechanism, they block interrupts so an interrupt handler can’t deadlock the system by trying to acquire a spin lock that is already locked (think of interrupts as another type of preemption). In these types of kernels, interrupt handlers can’t sleep, so the only synchronization primitives they can use are spin locks.
User-space. Spin locks are not as useful unless you are running in a realtime scheduling class that doesn’t allow preemption. User-level threads running in a time-sharing scheduling class can be descheduled when their time quantum expires or when a thread with a higher scheduling priority becomes runnable. In these cases, if a thread is holding a spin lock, it will be put to sleep and other threads blocked on the lock will continue spinning longer than intended.

Many mutex implementations are efficient: using mutex locks is equivalent to using spin locks in terms of an application's performance. Some mutex implementations will spin for a limited amount of time trying to acquire the mutex, and only sleep when the spin count threshold is reached. These factors, combined with faster context switch in modern processors, make spin locks useful only in limited circumstances. [p417]

Similar to mutex (we can replace one with the other), we can initialize a spin lock with the pthread_spin_init function. To deinitialize a spin lock, we can call the pthread_spin_destroy function.

#include <pthread.h>

int pthread_spin_init(pthread_spinlock_t *lock, int pshared);
int pthread_spin_destroy(pthread_spinlock_t *lock);

/* Both return: 0 if OK, error number on failure */

Only one attribute is specified for spin locks, which matters only if the platform supports the Thread Process-Shared Synchronization option.
The pshared argument represents the process-shared attribute, which indicates how the spin lock will be acquired.
- If it is set to PTHREAD_PROCESS_SHARED, then the spin lock can be acquired by threads that have access to the lock’s underlying memory, even if those threads are from different processes.
- Otherwise, the pshared argument is set to PTHREAD_PROCESS_PRIVATE and the spin lock can be accessed only from threads within the process that initialized it.

To lock the spin lock, we can call either of the following: pthread_spin_lock, which will spin until the lock is acquired pthread_spin_trylock, which will return the EBUSY error if the lock can’t be acquired immediately. Note that pthread_spin_trylock doesn’t spin.

Regardless of how it was locked, a spin lock can be unlocked by calling pthread_spin_unlock.

#include <pthread.h>

int pthread_spin_lock(pthread_spinlock_t *lock);
int pthread_spin_trylock(pthread_spinlock_t *lock);
int pthread_spin_unlock(pthread_spinlock_t *lock);

/* All return: 0 if OK, error number on failure */

If a spin lock is currently unlocked, then the pthread_spin_lock function can lock it without spinning.
If the thread already has it locked, the results are undefined. The call to pthread_spin_lock could fail with the EDEADLK error (or some other error), or the call could spin indefinitely. The behavior depends on the implementation.
If we try to unlock a spin lock that is not locked, the results are also undefined.

If either pthread_spin_lock or pthread_spin_trylock returns 0, then the spin lock is locked. We need to be careful not to call any functions that might sleep while holding the spin lock. Otherwise, other threads trying to acquire it will spin, wasting CPU resources.

Barriers¶

Barriers are a synchronization mechanism to coordinate multiple threads working in parallel. A barrier allows each thread to wait until all cooperating threads have reached the same point, and then continue executing from there. The pthread_join function acts as a barrier to allow one thread to wait until another thread exits.

Barrier objects are more general. They allow an arbitrary number of threads to wait until all of the threads have completed processing, but the threads don’t have to exit. They can continue working after all threads have reached the barrier.

The pthread_barrier_init function initializes a barrier, and pthread_barrier_destroy function deinitializes a barrier.

#include <pthread.h>

int pthread_barrier_init(pthread_barrier_t *restrict barrier,
                         const pthread_barrierattr_t *restrict attr,
                         unsigned int count);
int pthread_barrier_destroy(pthread_barrier_t *barrier);

/* Both return: 0 if OK, error number on failure */

The count argument for pthread_barrier_init specifies the number of threads that must reach the barrier before all of the threads will be allowed to continue.
The attr argument specifies the attributes of the barrier object (NULL for default attributes).

If the pthread_barrier_init function allocated any resources for the barrier, the resources will be freed when we deinitialize the barrier by calling the pthread_barrier_destroy function.

The pthread_barrier_wait function indicates that a thread is done with its work and is ready to wait for all the other threads to catch up.

#include <pthread.h>

int pthread_barrier_wait(pthread_barrier_t *barrier);

/* Returns: 0 or PTHREAD_BARRIER_SERIAL_THREAD if OK, error number on failure */

The thread calling pthread_barrier_wait is put to sleep if the barrier count (set in the call to pthread_barrier_init) is not yet satisfied. If the thread is the last one to call pthread_barrier_wait, thereby satisfying the barrier count, all of the threads are awakened.

To one arbitrary thread, it will appear as if the pthread_barrier_wait function returned a value of PTHREAD_BARRIER_SERIAL_THREAD. The remaining threads see a return value of 0. This allows one thread to continue as the master to act on the results of the work done by all of the other threads.

Once the barrier count is reached and the threads are unblocked, the barrier can be used again.
The barrier count can’t be changed unless we call the pthread_barrier_destroy function followed by the pthread_barrier_init function with a different count.

The following example shows how a barrier can be used to synchronize threads cooperating on a single task.

threads/barrier.c

#include "apue.h"
#include <pthread.h>
#include <limits.h>
#include <sys/time.h>

#define NTHR   8                /* number of threads */
#define NUMNUM 8000000L         /* number of numbers to sort */
#define TNUM   (NUMNUM/NTHR)    /* number to sort per thread */

long nums[NUMNUM];
long snums[NUMNUM];

pthread_barrier_t b;

#ifdef SOLARIS
#define heapsort qsort
#else
extern int heapsort(void *, size_t, size_t,
                    int (*)(const void *, const void *));
#endif

/*
 * Compare two long integers (helper function for heapsort)
 */
int
complong(const void *arg1, const void *arg2)
{
    long l1 = *(long *)arg1;
    long l2 = *(long *)arg2;

    if (l1 == l2)
        return 0;
    else if (l1 < l2)
        return -1;
    else
        return 1;
}

/*
 * Worker thread to sort a portion of the set of numbers.
 */
void *
thr_fn(void *arg)
{
    long    idx = (long)arg;

    heapsort(&nums[idx], TNUM, sizeof(long), complong);
    pthread_barrier_wait(&b);

    /*
     * Go off and perform more work ...
     */
    return((void *)0);
}

/*
 * Merge the results of the individual sorted ranges.
 */
void
merge()
{
    long    idx[NTHR];
    long    i, minidx, sidx, num;

    for (i = 0; i < NTHR; i++)
        idx[i] = i * TNUM;
    for (sidx = 0; sidx < NUMNUM; sidx++) {
        num = LONG_MAX;
        for (i = 0; i < NTHR; i++) {
            if ((idx[i] < (i+1)*TNUM) && (nums[idx[i]] < num)) {
                num = nums[idx[i]];
                minidx = i;
            }
        }
        snums[sidx] = nums[idx[minidx]];
        idx[minidx]++;
    }
}

int
main()
{
    unsigned long   i;
    struct timeval  start, end;
    long long       startusec, endusec;
    double          elapsed;
    int             err;
    pthread_t       tid;

    /*
     * Create the initial set of numbers to sort.
     */
    srandom(1);
    for (i = 0; i < NUMNUM; i++)
        nums[i] = random();

    /*
     * Create 8 threads to sort the numbers.
     */
    gettimeofday(&start, NULL);
    pthread_barrier_init(&b, NULL, NTHR+1);
    for (i = 0; i < NTHR; i++) {
        err = pthread_create(&tid, NULL, thr_fn, (void *)(i * TNUM));
        if (err != 0)
            err_exit(err, "can't create thread");
    }
    pthread_barrier_wait(&b);
    merge();
    gettimeofday(&end, NULL);

    /*
     * Print the sorted list.
     */
    startusec = start.tv_sec * 1000000 + start.tv_usec;
    endusec = end.tv_sec * 1000000 + end.tv_usec;
    elapsed = (double)(endusec - startusec) / 1000000.0;
    printf("sort took %.4f seconds\n", elapsed);
    for (i = 0; i < NUMNUM; i++)
        printf("%ld\n", snums[i]);
    exit(0);
}

In this example: We use eight threads to divide the job of sorting 8 million numbers. Each thread sorts 1 million numbers using the heapsort algorithm. Then the main thread calls a function to merge the results. We don’t need to use the PTHREAD_BARRIER_SERIAL_THREAD return value from pthread_barrier_wait to decide which thread merges the results, because we use the main thread for this task. That is why we specify the barrier count as one more than the number of worker threads; the main thread counts as one waiter.

This example shows the use of a barrier in a simplified situation where the threads perform only one task. In more realistic situations, the worker threads will continue with other activities after the call to pthread_barrier_wait returns.

[p422]

Summary¶

This chapter introduces the concept of threads and discussed the POSIX.1 primitives available to create and destroy them. We also introduced the problem of thread synchronization. We discussed five fundamental synchronization mechanisms: mutexes, reader–writer locks, condition variables, spin locks, and barriers; and we saw how to use them to protect shared resources.

Doubts and Solutions¶

Verbatim¶

p409-410 on Reader–Writer Locks¶

If we free the memory backing a reader–writer lock without first calling pthread_rwlock_destroy, any resources assigned to the lock will be lost.

Question: "any resources assigned to the lock will be lost" probably means a form of resource leak.

p409 on `PTHREAD_BARRIER_SERIAL_THREAD.`¶

To one arbitrary thread, it will appear as if the pthread_barrier_wait function returned a value of PTHREAD_BARRIER_SERIAL_THREAD. The remaining threads see a return value of 0.

Question: "one arbitrary thread" means "one unspecified thread".

As in pthread_barrier_wait: Upon successful completion, the pthread_barrier_wait() function shall return PTHREAD_BARRIER_SERIAL_THREAD for a single (arbitrary) thread synchronized at the barrier and zero for each of the other threads.