Chapter 7. Process Environment

Introduction

main Function

A C program starts execution with a function called main:

int main(int argc, char *argv[]);

When a C program is executed by the kernel (by one of the exec functions), a special start-up routine is called before the main function is called. The executable program file specifies this routine as the starting address for the program; this is set up by the link editor (linker) when it is invoked by the C compiler. This start-up routine takes values from the kernel (the command-line arguments and the environment) and sets things up so that the main function is called as shown earlier.

Process Termination

There are eight ways for a process to terminate.

Exit Functions

Three functions terminate a program normally:

apue_exit.h

#include <stdlib.h>

void exit(int status);
void _Exit(int status);

#include <unistd.h>

void _exit(int status);

All three exit functions expect a single integer argument (exit status).

The exit status of the process is undefined, if any of the following occurs:

If the return type of main is an integer and main "falls off the end" (an implicit return), the exit status of the process is 0.

Returning an integer value from the main function is equivalent to calling exit with the same value:

exit(0); is same as return(0); from the main function.

atexit Function

With ISO C, a process can register at least 32 functions that are automatically called by exit. These are called exit handlers and are registered by calling the atexit function.

apue_atexit.h

#include <stdlib.h>

int atexit(void (*func)(void));

/* Returns: 0 if OK, nonzero on error */

With ISO C and POSIX.1, exit first calls the exit handlers and then closes (via fclose) all open streams. POSIX.1 extends the ISO C standard by specifying that any exit handlers installed will be cleared if the program calls any of the exec family of functions.

The only way a program can be executed by the kernel is if one of the exec functions is called. The only way a process can voluntarily terminate is if _exit or _Exit is called, either explicitly or implicitly (by calling exit). A process can also be involuntarily terminated by a signal.

Command-Line Arguments

When a program is executed, the process that does the exec can pass command-line arguments to the new program. This is part of the normal operation of the UNIX system shells.

Example:

#include "apue.h"

int
main(int argc, char *argv[])
{
    int i;
    for (i = 0; i < argc; i++) /* echo all command-line args */
        printf("argv[%d]: %s\n", i, argv[i]);
    exit(0);
}

We are guaranteed by both ISO C and POSIX.1 that argv[argc] is a null pointer. This lets us alternatively code the argument-processing loop as:

for (i = 0; argv[i] != NULL; i++)

Environment List

Each program is also passed an environment list, which is an array of character pointers, with each pointer containing the address of a null-terminated C string. It is contained in the global variable environ:

extern char **environ;

Figure 7.5 Environment consisting of five C character strings

Memory Layout of a C Program

Historically, a C program has been composed of the following pieces:

Figure 7.5 Environment consisting of five C character strings

With Linux on a 32-bit Intel x86 processor, the text segment starts at location 0x08048000, and the bottom of the stack starts just below 0xC0000000. The stack grows from higher-numbered addresses to lower-numbered addresses on this particular architecture. The unused virtual address space between the top of the heap and the top of the stack is large

The size(1) command reports the sizes (in bytes) of the text, data, and bss segments:

$ size /usr/bin/cc /bin/sh
text data bss dec hex filename
346919 3576 6680 357175 57337 /usr/bin/cc
102134 1776 11272 115182 1c1ee /bin/sh

Shared Libraries

Shared libraries remove the common library routines from the executable file and maintains a single copy of the library routine somewhere in memory that all processes reference:

Memory Allocation

apue_malloc.h

#include <stdlib.h>

void *malloc(size_t size);
void *calloc(size_t nobj, size_t size);
void *realloc(void *ptr, size_t newsize);

/* All three return: non-null pointer if OK, NULL on error */

void free(void *ptr);

The pointer returned by the three allocation functions is guaranteed to be suitably aligned so that it can be used for any data object.

Because the three alloc functions return a generic void * pointer, if we #include <stdlib.h> (to obtain the function prototypes), we do not explicitly have to cast the pointer returned by these functions when we assign it to a pointer of a different type. The default return value for undeclared functions is int, so using a cast without the proper function declaration could hide an error on systems where the size of type int differs from the size of a function’s return value (a pointer in this case).

The allocation routines are usually implemented with the sbrk(2) system call. This system call expands (or contracts) the heap of the process. Although sbrk can expand or contract the memory of a process, most versions of malloc and free never decrease their memory size. The space that we free is available for a later allocation, but the freed space is not usually returned to the kernel; instead, that space is kept in the malloc pool.

Alternate Memory Allocators

[p209]

Environment Variables

The environment strings are usually of the form:

name=value

The UNIX kernel never looks at these strings; their interpretation is up to the various applications.

apue_getenv.h

#include <stdlib.h>

char *getenv(const char *name);
/* Returns: pointer to value associated with name, NULL if not found */

int putenv(char *str);
/* Returns: 0 if OK, nonzero on error */

int setenv(const char *name, const char *value, int rewrite);
int unsetenv(const char *name);

/* Both return: 0 if OK, −1 on error */

Figure 7.7 Environment variables defined in the Single UNIX Specification

Note the difference between putenv and setenv. Whereas setenv must allocate memory to create the name=value string from its arguments, putenv is free to place the string passed to it directly into the environment. Indeed, many implementations do exactly this, so it would be an error to pass putenv a string allocated on the stack, since the memory would be reused after we return from the current function.

setjmp and longjmp Functions

In C, we can't goto a label that’s in another function. Instead, we must use the setjmp and longjmp functions to perform this type of branching. These two functions are useful for handling error conditions that occur in a deeply nested function call.

apue_setjmp.h

#include <setjmp.h>

int setjmp(jmp_buf env);
/* Returns: 0 if called directly, nonzero if returning from a call to longjmp */

void longjmp(jmp_buf env, int val);

Examples:

Automatic, Register, and Volatile Variables

When we return to main as a result of the longjmp, implementations do not try to roll back these automatic variables and register variables (in main), though standards say only that their values are indeterminate.

Example:

Compile the above program, with and without compiler optimizations, the results are different:

$ gcc testjmp.c compile without any optimization
$ ./a.out
in f1():
globval = 95, autoval = 96, regival = 97, volaval = 98, statval = 99
after longjmp:
globval = 95, autoval = 96, regival = 97, volaval = 98, statval = 99
$ gcc -O testjmp.c compile with full optimization
$ ./a.out
in f1():
globval = 95, autoval = 96, regival = 97, volaval = 98, statval = 99
after longjmp:
globval = 95, autoval = 2, regival = 3, volaval = 98, statval = 99

The optimizations don’t affect the global, static, and volatile variables. The setjmp(3) manual page on one system states that variables stored in memory will have values as of the time of the longjmp, whereas variables in the CPU and floating-point registers are restored to their values when setjmp was called. Without optimization, all five variables are stored in memory. When we enable optimization, both autoval and regival go into registers, even though the former wasn't declared register, and the volatile variable stays in memory.

Potential Problem with Automatic Variables

An automatic variable can never be referenced after the function that declared it returns.

Incorrect usage of an automatic variable:

#include <stdio.h>
FILE *
open_data(void)
{
    FILE *fp;
    char databuf[BUFSIZ]; /* setvbuf makes this the stdio buffer */
    if ((fp = fopen("datafile", "r")) == NULL)
        return(NULL);
    if (setvbuf(fp, databuf, _IOLBF, BUFSIZ) != 0)
        return(NULL);
    return(fp); /* error */
}

The problem is that when open_data returns, the space it used on the stack will be used by the stack frame for the next function that is called. But the standard I/O library will still be using that portion of memory for its stream buffer. Chaos is sure to result. To correct this problem, the array databuf needs to be allocated from global memory, either statically (static or extern) or dynamically (one of the alloc functions).

getrlimit and setrlimit Functions

Every process has a set of resource limits, some of which can be queried and changed by the getrlimit and setrlimit functions.

apue_getrlimit.h

#include <sys/resource.h>

int getrlimit(int resource, struct rlimit *rlptr);
int setrlimit(int resource, const struct rlimit *rlptr);

/* Both return: 0 if OK, −1 on error */

These two functions are defined in the XSI option in the Single UNIX Specification. The resource limits for a process are normally established by process 0 when the system is initialized and then inherited by each successive process. Each implementation has its own way of tuning the various limits.

struct rlimit {
    rlim_t rlim_cur; /* soft limit: current limit */
    rlim_t rlim_max; /* hard limit: maximum value for rlim_cur */
};

Rules of changing resource limits:

  1. A process can change its soft limit to a value less than or equal to its hard limit.
  2. A process can lower its hard limit to a value greater than or equal to its soft limit. This lowering of the hard limit is irreversible for normal users.
  3. Only a superuser process can raise a hard limit.

The resource limits affect the calling process and are inherited by any of its children. This means that the setting of resource limits needs to be built into the shells to affect all our future processes. Indeed, the Bourne shell, the GNU Bourne-again shell, and the Korn shell have the built-in ulimit command, and the C shell has the built-in limit command. (The umask and chdir functions also have to be handled as shell built-ins.)

Example:

Summary

Understanding the environment of a C program within a UNIX system’s environment is a prerequisite to understanding the process control features of the UNIX System. This chapter discusses process start and termination, and how a process is passed an argument list and an environment. Although both the argument list and the environment are uninterpreted by the kernel, it is the kernel that passes both from the caller of exec to the new process. This chapter also examines the typical memory layout of a C program and how a process can dynamically allocate and free memory.