Chapter 5. Standard I/O Library

The standard I/O library handles such details as buffer allocation and performing I/O in optimal-sized chunks.

Streams and FILE Objects

Standard I/O file streams can be used with both single-byte and multibyte ("wide") character sets. A stream’s orientation determines whether the characters that are read and written are single byte or multibyte.

Standard Input, Standard Output, and Standard Error

Three streams are predefined and automatically available to a process. They refer to file descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO (defined in <unistd.h>) [p9]. These three standard I/O streams are referenced through the predefined file pointers stdin, stdout, and stderr (defined in <stdio.h>).
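
The relationship between the predefined streams and their descriptors can be checked with fileno (described under Implementation Details). This is a minimal sketch; the function name std_streams_match_fds is hypothetical and not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if stdin, stdout, and stderr map to the conventional
 * descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO;
 * returns 0 otherwise. */
int std_streams_match_fds(void)
{
    return fileno(stdin)  == STDIN_FILENO  &&
           fileno(stdout) == STDOUT_FILENO &&
           fileno(stderr) == STDERR_FILENO;
}
```

On a typical UNIX system this should return 1 unless the program has reassigned one of the streams.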

Buffering

The goal of the buffering provided by the standard I/O library is to use the minimum number of read and write calls. This library also tries to do its buffering automatically for each I/O stream, obviating the need for the application to worry about it.

Three types of buffering are provided:

  1. Fully buffered. Actual I/O takes place when the standard I/O buffer is filled.
    • Files residing on disk are normally fully buffered by the standard I/O library.
    • The buffer used is usually obtained by one of the standard I/O functions calling malloc (Section 7.8) the first time I/O is performed on a stream.
    • The term flush describes the writing of a standard I/O buffer. A buffer can be flushed automatically by the standard I/O routines, such as when a buffer fills, or we can call the function fflush to flush a stream. Unfortunately, in the UNIX environment, flush means two different things:
      1. In terms of the standard I/O library, it means writing out the contents of a buffer, which may be partially filled.
      2. In terms of the terminal driver, such as the tcflush function in Chapter 18, it means to discard the data that's already stored in a buffer.
  2. Line buffered. The standard I/O library performs I/O when a newline character is encountered on input or output.
    • This allows us to output a single character at a time (with the standard I/O fputc function), knowing that actual I/O will take place only when we finish writing each line.
    • Line buffering is typically used on a stream when it refers to a terminal, such as standard input and standard output.
    • Line buffering has two caveats:
      1. The size of the buffer that the standard I/O library uses to collect each line is fixed, so I/O might take place if this buffer is filled before writing a newline.
      2. Whenever input is requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel), all line-buffered output streams are flushed. The reason for the qualifier on (b) is that the requested data may already be in the buffer, which doesn't require data to be read from the kernel. Obviously, any input from an unbuffered stream, item (a), requires data to be obtained from the kernel.
  3. Unbuffered. The standard I/O library does not buffer the characters. For example:
    • If we write 15 characters with the standard I/O fputs function, we expect these 15 characters to be output as soon as possible, probably with the write function from Section 3.8.
    • The standard error stream is normally unbuffered so that any error messages are displayed as quickly as possible, regardless of whether they contain a newline.

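ISO C requires the following buffering characteristics:

  • Standard input and standard output are fully buffered if and only if they do not refer to an interactive device.
  • Standard error is never fully buffered.

However, this doesn't tell us either of the following:

  • Whether standard input and standard output are unbuffered or line buffered when they refer to an interactive device.
  • Whether standard error should be unbuffered or line buffered.

Most implementations default to the following types of buffering:

  • Standard error is always unbuffered.
  • All other streams are line buffered if they refer to a terminal device; otherwise, they are fully buffered.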

apue_setbuf.h

#include <stdio.h>

void setbuf(FILE *restrict fp, char *restrict buf);
int setvbuf(FILE *restrict fp, char *restrict buf, int mode, size_t size);

/* Returns: 0 if OK, nonzero on error */

The GNU C library uses the value from the st_blksize member of the stat structure to determine the optimal standard I/O buffer size.

The fflush function causes any unwritten data for the stream to be passed to the kernel. If fp is NULL, fflush causes all output streams to be flushed.
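
As a rough illustration of setvbuf and fflush, the sketch below switches a stream to line buffering and forces a message out immediately. The helper names make_line_buffered and log_now are hypothetical. Note that setvbuf should be called after the stream is opened but before any other operation is performed on it.

```c
#include <stdio.h>

/* Switch an open stream to line buffering. Passing NULL for buf lets
 * the library allocate a buffer of the requested size itself.
 * Returns 0 on success, nonzero on error. */
int make_line_buffered(FILE *fp)
{
    return setvbuf(fp, NULL, _IOLBF, BUFSIZ);
}

/* Write msg and pass it to the kernel right away, regardless of the
 * stream's buffering mode. Returns 0 on success, -1 on error. */
int log_now(FILE *fp, const char *msg)
{
    if (fputs(msg, fp) == EOF)
        return -1;
    return fflush(fp) == EOF ? -1 : 0;
}
```

A caller might use make_line_buffered on a log file so that each complete line reaches the kernel without an explicit fflush.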

Opening a Stream

apue_fopen.h

#include <stdio.h>

FILE *fopen(const char *restrict pathname, const char *restrict type);
FILE *freopen(const char *restrict pathname,
              const char *restrict type,
              FILE *restrict fp);
FILE *fdopen(int fd, const char *type);

/* All three return: file pointer if OK, NULL on error */

The type argument has 15 possible values, as specified by ISO C:

type          Description                                                     open(2) Flags
r, rb         open for reading                                                O_RDONLY
w, wb         truncate to 0 length or create for writing                      O_WRONLY|O_CREAT|O_TRUNC
a, ab         append; open for writing at end of file, or create for writing  O_WRONLY|O_CREAT|O_APPEND
r+, r+b, rb+  open for reading and writing                                    O_RDWR
w+, w+b, wb+  truncate to 0 length or create for reading and writing          O_RDWR|O_CREAT|O_TRUNC
a+, a+b, ab+  open or create for reading and writing at end of file           O_RDWR|O_CREAT|O_APPEND

The character b allows the standard I/O system to differentiate between a text file and a binary file. Because the UNIX kernel doesn’t differentiate between these types of files, specifying b has no effect.

apue_fclose.h

#include <stdio.h>

int fclose(FILE *fp);

/* Returns: 0 if OK, EOF on error */

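An open stream is closed by calling fclose. Any buffered output data is flushed before the file is closed, and any buffered input data is discarded.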

When a process terminates normally, either by calling the exit function directly or by returning from the main function, all standard I/O streams with unwritten buffered data are flushed and all open standard I/O streams are closed.
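
A minimal sketch of the open/write/close cycle, assuming a hypothetical helper named append_line. It uses type "a", so every write lands at the end of the file, and it checks the return value of fclose because a write error may only surface when the buffer is flushed.

```c
#include <stdio.h>

/* Append one line to the file at path, creating the file if needed.
 * Returns 0 on success, -1 on error. */
int append_line(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");   /* O_WRONLY|O_CREAT|O_APPEND */
    if (fp == NULL)
        return -1;

    int failed = (fputs(line, fp) == EOF) || (fputc('\n', fp) == EOF);

    /* fclose flushes buffered output, so it can report a write error. */
    if (fclose(fp) == EOF)
        failed = 1;
    return failed ? -1 : 0;
}
```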

Reading and Writing a Stream

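Unformatted I/O comes in three forms: character-at-a-time I/O (getc, putc), line-at-a-time I/O (fgets, fputs), and direct (binary) I/O (fread, fwrite).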

Input Functions

apue_getc.h

#include <stdio.h>

int getc(FILE *fp);
int fgetc(FILE *fp);
int getchar(void);

/* All three return: next character if OK, EOF on end of file or error */

These functions return the same value whether an error occurs or the end of file is reached. To distinguish between the two, we must call either ferror or feof:

apue_ferror.h

#include <stdio.h>

int ferror(FILE *fp);
int feof(FILE *fp);

/* Both return: nonzero (true) if condition is true, 0 (false) otherwise */

void clearerr(FILE *fp);

In most implementations, two flags are maintained for each stream in the FILE object: an error flag and an end-of-file flag.

Both flags are cleared by calling clearerr.
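
The following sketch shows one way to tell an error from end of file after a character-at-a-time loop. The function name copy_chars is hypothetical. The variable c is declared as int rather than char so that EOF stays distinguishable from every valid character value.

```c
#include <stdio.h>

/* Copy in to out one character at a time.
 * Returns the number of characters copied, or -1 on a read or write error. */
long copy_chars(FILE *in, FILE *out)
{
    long n = 0;
    int c;   /* int, not char, so EOF can be told apart from data */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        n++;
    }
    if (ferror(in))   /* getc returned EOF because of an error */
        return -1;
    /* Otherwise feof(in) is set: we reached the end of the file. */
    return n;
}
```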

Pushback

After reading from a stream, we can push back characters by calling ungetc.

apue_ungetc.h

#include <stdio.h>

int ungetc(int c, FILE *fp);

/* Returns: c if OK, EOF on error */
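
A common use of pushback is peeking at the next character without consuming it. This is a small sketch with a hypothetical helper name, peek_char; a single character of pushback is all it relies on.

```c
#include <stdio.h>

/* Return the next character on fp without consuming it,
 * or EOF on end of file or error. */
int peek_char(FILE *fp)
{
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);   /* the next read returns c again */
    return c;
}
```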

Output Functions

apue_putc.h

#include <stdio.h>

int putc(int c, FILE *fp);
int fputc(int c, FILE *fp);
int putchar(int c);

/* All three return: c if OK, EOF on error */

Line-at-a-Time I/O

apue_fgets.h

#include <stdio.h>

char *fgets(char *restrict buf, int n, FILE *restrict fp);
char *gets(char *buf);

/* Both return: buf if OK, NULL on end of file or error */

apue_fputs.h

#include <stdio.h>

int fputs(const char *restrict str, FILE *restrict fp);
int puts(const char *str);

/* Both return: non-negative value if OK, EOF on error */
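
A line-at-a-time copy loop might look like the sketch below. The names copy_lines and MAXLINE are assumptions made for this example. It uses fgets rather than gets, because gets cannot limit how much it stores in buf and was removed from the C standard in C11.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 4096   /* hypothetical line-buffer size for this sketch */

/* Copy in to out line by line. Returns the number of
 * newline-terminated lines copied, or -1 on error. */
long copy_lines(FILE *in, FILE *out)
{
    char buf[MAXLINE];
    long lines = 0;

    /* fgets stores at most sizeof buf - 1 characters plus a null byte,
     * and keeps the newline if one was read. */
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (fputs(buf, out) == EOF)   /* fputs does not add a newline */
            return -1;
        if (strchr(buf, '\n') != NULL)
            lines++;
    }
    return ferror(in) ? -1 : lines;
}
```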

Standard I/O Efficiency

Function                          User CPU (seconds)  System CPU (seconds)  Clock time (seconds)  Bytes of program text
best time from Figure 3.6         0.05                0.29                  3.18
fgets, fputs                      2.27                0.30                  3.49                  143
getc, putc                        8.45                0.29                  10.33                 114
fgetc, fputc                      8.16                0.40                  10.18                 114
single byte time from Figure 3.6  134.61              249.94                394.95

Binary I/O

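If doing binary I/O, we often want to read or write an entire structure at a time. The previous functions are a poor fit for this: getc and putc would have to loop through the structure one byte at a time, fputs stops writing when it hits a null byte, and fgets cannot handle embedded null bytes or newlines correctly.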

apue_fread.h

#include <stdio.h>

size_t fread(void *restrict ptr, size_t size, size_t nobj,
             FILE *restrict fp);
size_t fwrite(const void *restrict ptr, size_t size, size_t nobj,
              FILE *restrict fp);

/* Both return: number of objects read or written */

These functions have two common uses:

Read or write a binary array (e.g., write elements 2 through 5 of a floating-point array):

float   data[10];

if (fwrite(&data[2], sizeof(float), 4, fp) != 4)
    err_sys("fwrite error");

Read or write a structure:

struct {
    short  count;
    long   total;
    char   name[NAMESIZE];
} item;

if (fwrite(&item, sizeof(item), 1, fp) != 1)
    err_sys("fwrite error");

Data written with fwrite can be read back reliably with fread only on the same system. Across different systems (and sometimes even on the same system), this can fail for two reasons:

  1. The offset of a member within a structure can differ between compilers and systems because of different alignment requirements. Even on a single system, the binary layout of a structure can differ, depending on compiler options. [p157]
  2. The binary formats used to store multibyte integers and floating-point values differ among machine architectures.
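
The structure case can be sketched as a write followed by a read on the same system, which is the situation where fread and fwrite are safe to pair. The struct tag item, the value of NAMESIZE, and the function name roundtrip_item are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

#define NAMESIZE 16   /* hypothetical value; the notes leave NAMESIZE undefined */

struct item {
    short count;
    long  total;
    char  name[NAMESIZE];
};

/* Write one item to fp, rewind, and read it back into out.
 * fp must be open for both reading and writing.
 * Returns 0 on success, -1 on error. */
int roundtrip_item(FILE *fp, const struct item *in, struct item *out)
{
    if (fwrite(in, sizeof *in, 1, fp) != 1)
        return -1;
    rewind(fp);
    /* A short count means error or end of file; ferror or feof
     * tells the caller which one occurred. */
    if (fread(out, sizeof *out, 1, fp) != 1)
        return -1;
    return 0;
}
```

Because the bytes written include any padding and use the host byte order, a file produced this way should not be treated as a portable exchange format.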

Positioning a Stream

apue_ftell.h

#include <stdio.h>

long ftell(FILE *fp);

/* Returns: current file position indicator if OK, -1L on error */

int fseek(FILE *fp, long offset, int whence);

/* Returns: 0 if OK, -1 on error */

void rewind(FILE *fp);

apue_ftello.h

#include <stdio.h>

off_t ftello(FILE *fp);

/* Returns: current file position indicator if OK, (off_t)-1 on error */

int fseeko(FILE *fp, off_t offset, int whence);

/* Returns: 0 if OK, -1 on error */

apue_fgetpos.h

#include <stdio.h>

int fgetpos(FILE *restrict fp, fpos_t *restrict pos);
int fsetpos(FILE *fp, const fpos_t *pos);

/* Both return: 0 if OK, nonzero on error */
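
One typical use of ftell and fseek is finding the size of a seekable stream while leaving the current position untouched. The sketch below assumes a UNIX system, where seeking to SEEK_END is meaningful for regular files; the function name stream_size is hypothetical. For files whose size may not fit in a long, ftello and fseeko would be the better choice.

```c
#include <stdio.h>

/* Return the size in bytes of an open, seekable stream and restore
 * the original position. Returns -1L on error. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);            /* remember where we are */
    if (saved == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)  /* jump to the end */
        return -1L;
    long size = ftell(fp);             /* offset of the end = size */
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1L;
    return size;
}
```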

Formatted I/O

Formatted Output

apue_printf.h

#include <stdio.h>

int printf(const char *restrict format, ...);
int fprintf(FILE *restrict fp, const char *restrict format, ...);
int dprintf(int fd, const char *restrict format, ...);

/* All three return: number of characters output if OK, negative value if output error */

int sprintf(char *restrict buf, const char *restrict format, ...);
/* Returns: number of characters stored in array if OK, negative value if encoding error */

int snprintf(char *restrict buf, size_t n, const char *restrict format, ...);
/* Returns: number of characters that would have been stored in array if buffer was
   large enough, negative value if encoding error */

Conversion specification:

%[flags][fldwidth][precision][lenmodifier]convtype

Conversion type  Description
d,i              signed decimal
o                unsigned octal
u                unsigned decimal
x,X              unsigned hexadecimal
f,F              double floating-point number
e,E              double floating-point number in exponential format
g,G              interpreted as f, F, e, or E, depending on value converted
a,A              double floating-point number in hexadecimal exponential format
c                character (with l length modifier, wide character)
s                string (with l length modifier, wide character string)
p                pointer to a void
n                pointer to a signed integer into which is written the number of characters written so far
%                a % character
C                wide character (XSI option, equivalent to lc)
S                wide character string (XSI option, equivalent to ls)

With the normal conversion specification, conversions are applied to the arguments in the order they appear after the format argument. An alternative conversion specification syntax allows the arguments to be named explicitly with the sequence %n$ representing the nth argument.
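
Because snprintf returns the number of characters that would have been stored had the buffer been large enough, truncation can be detected by comparing that value with the buffer size. A minimal sketch, with a hypothetical function name format_pair:

```c
#include <stdio.h>

/* Format "name=value" into buf, which holds n bytes.
 * Returns 0 if the result fit, 1 if it was truncated,
 * and -1 on an encoding error. */
int format_pair(char *buf, size_t n, const char *name, int value)
{
    int needed = snprintf(buf, n, "%s=%d", name, value);
    if (needed < 0)
        return -1;
    /* needed excludes the terminating null byte, so the output
     * fit only if needed < n. */
    return (size_t)needed >= n ? 1 : 0;
}
```

When n is nonzero, the buffer is null terminated even when the output is truncated, which is what makes snprintf safer than sprintf.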

The following five variants of the printf family are similar to the previous five, but the variable argument list (...) is replaced with arg.

apue_vprintf.h

#include <stdarg.h>
#include <stdio.h>

int vprintf(const char *restrict format, va_list arg);
int vfprintf(FILE *restrict fp, const char *restrict format,
             va_list arg);
int vdprintf(int fd, const char *restrict format, va_list arg);

/* All three return: number of characters output if OK, negative value if output error */

int vsprintf(char *restrict buf, const char *restrict format, va_list arg);

/* Returns: number of characters stored in array if OK, negative value if encoding error */

int vsnprintf(char *restrict buf, size_t n,
              const char *restrict format, va_list arg);

/* Returns: number of characters that would have been stored in array if buffer was
   large enough, negative value if encoding error */
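
The v variants exist so that our own variadic functions can forward their arguments to the printf machinery. The sketch below wraps vsnprintf; the function name log_format is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* printf-style helper that formats into buf (n bytes).
 * Returns the same value vsnprintf would return. */
int log_format(char *buf, size_t n, const char *format, ...)
{
    va_list ap;
    int len;

    va_start(ap, format);               /* start after the last named parameter */
    len = vsnprintf(buf, n, format, ap);
    va_end(ap);                         /* every va_start needs a matching va_end */
    return len;
}
```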

Formatted Input

apue_scanf.h

#include <stdio.h>

int scanf(const char *restrict format, ...);
int fscanf(FILE *restrict fp, const char *restrict format, ...);
int sscanf(const char *restrict buf, const char *restrict format, ...);

/* All three return: number of input items assigned, EOF if input error
   or end of file before any conversion */

Except for the conversion specifications and white space, other characters in the format have to match the input. If a character doesn’t match, processing stops, leaving the remainder of the input unread.

Conversion specification:

%[*][fldwidth][m][lenmodifier]convtype

Conversion type  Description
d                signed decimal, base 10
i                signed integer, base determined by format of input
o                unsigned octal (input optionally signed)
u                unsigned decimal, base 10 (input optionally signed)
x,X              unsigned hexadecimal (input optionally signed)
a,A,e,E,f,F,g,G  floating-point number
c                character (with l length modifier, wide character)
s                string (with l length modifier, wide character string)
[                matches a sequence of listed characters, ending with ]
[^               matches all characters except the ones listed, ending with ]
p                pointer to a void
n                pointer to a signed integer into which is written the number of characters read so far
%                a % character
C                wide character (XSI option, equivalent to lc)
S                wide character string (XSI option, equivalent to ls)
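
As a small illustration of literal matching, field widths, and the [^ conversion, the sketch below parses a "key=value" string with sscanf. The function name parse_pair and the 31-character key limit are assumptions made for this example. Checking that the return value equals the number of expected assignments is what catches partial matches.

```c
#include <stdio.h>

/* Parse "key=value", where key is 1 to 31 characters other than '='
 * and value is a decimal integer.
 * Returns 0 on success, -1 if the input does not match. */
int parse_pair(const char *s, char key[32], int *value)
{
    /* %31[^=] stores at most 31 non-'=' characters plus a null byte;
     * the literal '=' in the format must then match the input. */
    return sscanf(s, "%31[^=]=%d", key, value) == 2 ? 0 : -1;
}
```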

Implementation Details

apue_fileno.h

#include <stdio.h>

int fileno(FILE *fp);

/* Returns: the file descriptor associated with the stream */

Each standard I/O stream has an associated file descriptor, and we can obtain the descriptor for a stream by calling fileno.

Result on OS X 10.10:

$ ./buf
enter any character

one line to standard error
stream = stdin, line buffered, buffer size = 4096
stream = stdout, line buffered, buffer size = 4096
stream = stderr, unbuffered, buffer size = 1
stream = /etc/passwd, fully buffered, buffer size = 4096

$ ./buf < /etc/group > std.out 2> std.err
$ cat std.out
enter any character
stream = stdin, fully buffered, buffer size = 4096
stream = stdout, fully buffered, buffer size = 4096
stream = stderr, unbuffered, buffer size = 1
stream = /etc/passwd, fully buffered, buffer size = 4096
$ cat std.err
one line to standard error

Temporary Files

apue_tmpnam.h

#include <stdio.h>

char *tmpnam(char *ptr);

/* Returns: pointer to unique pathname */

FILE *tmpfile(void);

/* Returns: file pointer if OK, NULL on error */

apue_mkdtemp.h

#include <stdlib.h>

char *mkdtemp(char *template);

/* Returns: pointer to directory name if OK, NULL on error */

int mkstemp(char *template);

/* Returns: file descriptor if OK, -1 on error */

Unlike tmpfile, the temporary file created by mkstemp is not removed automatically for us.

The tmpfile and mkstemp functions should be used instead of tmpnam. [p169]

Example:
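
A minimal sketch of mkstemp, assuming a hypothetical helper named open_temp_stream. The template must be a modifiable character array ending in XXXXXX (a string literal would not work, because mkstemp rewrites the template in place). Since mkstemp does not remove the file for us, the sketch unlinks the name right away so the file disappears once the stream is closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file from template and return a read/write
 * stream for it. Returns NULL on error. */
FILE *open_temp_stream(char *template)
{
    int fd = mkstemp(template);    /* fills in the XXXXXX suffix */
    if (fd < 0)
        return NULL;

    /* Remove the name now; the file itself persists until closed. */
    unlink(template);

    FILE *fp = fdopen(fd, "w+");   /* wrap the descriptor in a stream */
    if (fp == NULL)
        close(fd);
    return fp;
}
```

A caller would declare something like char name[] = "/tmp/demoXXXXXX"; and pass name to the helper; the directory and prefix here are only illustrative.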

Memory Streams

Memory streams are standard I/O streams for which there are no underlying files, although they are still accessed with FILE pointers. All I/O is done by transferring bytes to and from buffers in main memory.

apue_fmemopen.h

#include <stdio.h>

FILE *fmemopen(void *restrict buf, size_t size,
               const char *restrict type);

/* Returns: stream pointer if OK, NULL on error */
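
The sketch below writes formatted text into a caller-supplied buffer through a memory stream. The function name write_to_memory and the text it writes are assumptions made for this example. It relies on the stream adding a null byte when it is flushed or closed, which requires the buffer to have room for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Write "answer 42" into buf (size bytes) through a memory stream.
 * Returns 0 on success, -1 on error. */
int write_to_memory(char *buf, size_t size)
{
    FILE *fp = fmemopen(buf, size, "w");
    if (fp == NULL)
        return -1;

    int rc = fprintf(fp, "%s %d", "answer", 42) < 0 ? -1 : 0;

    /* Closing the stream flushes it; with room to spare, the data
     * in buf ends up null terminated. */
    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}
```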

Note:

Alternatives to Standard I/O

When we use the line-at-a-time functions, fgets and fputs, the data is usually copied twice: once between the kernel and the standard I/O buffer (when the corresponding read or write is issued) and again between the standard I/O buffer and our line buffer.

Doubts and Solutions

Verbatim

Section 5.4 on line buffering [p145]

Second, whenever input is requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel), all line-buffered output streams are flushed. The reason for the qualifier on (b) is that the requested data may already be in the buffer, which doesn’t require data to be read from the kernel. Obviously, any input from an unbuffered stream, item (a), requires data to be obtained from the kernel.

Section 5.8 Standard I/O Efficiency [p155]

The version using line-at-a-time I/O is almost twice as fast as the version using character-at-a-time I/O. If the fgets and fputs functions are implemented using getc and putc, then we would expect the timing to be similar to the getc version. Actually, we might expect the line-at-a-time version to take longer, since we would be adding the overhead of 200 million extra function calls to the existing 6 million ones.

Section 5.14 on Memory Stream [p172]

Third, a null byte is written at the current position in the stream whenever we increase the amount of data in the stream’s buffer and call fclose, fflush, fseek, fseeko, or fsetpos.