Chapter 3. Sockets Introduction

Introduction

This chapter begins the description of the sockets API.

Socket Address Structures

The name of socket address structures begin with sockaddr_ and end with a unique suffix for each protocol suite.

IPv4 Socket Address Structure

An IPv4 socket address structure, commonly called an "Internet socket address structure", is named sockaddr_in and is defined by including the <netinet/in.h> header.

struct in_addr {
  in_addr_t   s_addr;           /* 32-bit IPv4 address */
                                /* network byte ordered */
};

struct sockaddr_in {
  uint8_t         sin_len;      /* length of structure (16) */
  sa_family_t     sin_family;   /* AF_INET */
  in_port_t       sin_port;     /* 16-bit TCP or UDP port number */
                                /* network byte ordered */
  struct in_addr  sin_addr;     /* 32-bit IPv4 address */
                                /* network byte ordered */
  char            sin_zero[8];  /* unused */
};

Generic Socket Address Structure

A socket address structures is always passed by reference when passed as an argument to any socket functions. But any socket function that takes one of these pointers as an argument must deal with socket address structures from any of the supported protocol families.

A generic socket address structure in the <sys/socket.h> header:

struct sockaddr {
  uint8_t      sa_len;
  sa_family_t  sa_family;    /* address family: AF_xxx value */
  char         sa_data[14];  /* protocol-specific address */
};

The socket functions are then defined as taking a pointer to the generic socket address structure, as shown here in the ANSI C function prototype for the bind function:

int bind(int, struct sockaddr *, socklen_t);

This requires that any calls to these functions must cast the pointer to the protocol-specific socket address structure to be a pointer to a generic socket address structure.

For example:

struct sockaddr_in  serv;      /* IPv4 socket address structure */

/* fill in serv{} */

bind(sockfd, (struct sockaddr *) &serv, sizeof(serv));

In Chapter 1 in our unp.h header, we define SA to be the string struct sockaddr, just to shorten the code that we must write to cast these pointers.

IPv6 Socket Address Structure

The IPv6 socket address is defined by including the <netinet/in.h> header:

struct in6_addr {
  uint8_t  s6_addr[16];          /* 128-bit IPv6 address */
                                 /* network byte ordered */
};

#define SIN6_LEN      /* required for compile-time tests */

struct sockaddr_in6 {
  uint8_t         sin6_len;      /* length of this struct (28) */
  sa_family_t     sin6_family;   /* AF_INET6 */
  in_port_t       sin6_port;     /* transport layer port# */
                                 /* network byte ordered */
  uint32_t        sin6_flowinfo; /* flow information, undefined */
  struct in6_addr sin6_addr;     /* IPv6 address */
                                 /* network byte ordered */
  uint32_t        sin6_scope_id; /* set of interfaces for a scope */
};

New Generic Socket Address Structure

A new generic socket address structure was defined as part of the IPv6 sockets API, to overcome some of the shortcomings of the existing struct sockaddr. Unlike the struct sockaddr, the new struct sockaddr_storage is large enough to hold any socket address type supported by the system. The sockaddr_storage structure is defined by including the <netinet/in.h> header:

struct sockaddr_storage {
  uint8_t      ss_len;       /* length of this struct (implementation dependent) */
  sa_family_t  ss_family;    /* address family: AF_xxx value */
  /* implementation-dependent elements to provide:
   * a) alignment sufficient to fulfill the alignment requirements of
   *    all socket address types that the system supports.
   * b) enough storage to hold any type of socket address that the
   *    system supports.
   */
};

The sockaddr_storage type provides a generic socket address structure that is different from struct sockaddr in two ways:

  1. If any socket address structures that the system supports have alignment requirements, the sockaddr_storage provides the strictest alignment requirement.
  2. The sockaddr_storage is large enough to contain any socket address structure that the system supports.

The fields of the sockaddr_storage structure are opaque to the user, except for ss_family and ss_len (if present). The sockaddr_storage must be cast or copied to the appropriate socket address structure for the address given in ss_family to access any other fields.

Comparison of Socket Address Structures

In this figure, we assume that:

Figure 3.6 Comparison of various socket address structures.

To handle variable-length structures, whenever we pass a pointer to a socket address structure as an argument to one of the socket functions, we pass its length as another argument.

Value-Result Arguments

When a socket address structure is passed to any socket function, it is always passed by reference (a pointer to the structure is passed). The length of the structure is also passed as an argument.

The way in which the length is passed depends on which direction the structure is being passed:

  1. From the process to the kernel
  2. From the kernel to the process

From process to kernel

bind, connect, and sendto functions pass a socket address structure from the process to the kernel.

Arumgents to these functions:

struct sockaddr_in serv;

/* fill in serv{} */
connect (sockfd, (SA *) &serv, sizeof(serv));

Figure 3.7 Socket address structure passed from process to kernel.

The datatype for the size of a socket address structure is actually socklen_t and not int, but the POSIX specification recommends that socklen_t be defined as uint32_t.

From kernel to process

accept, recvfrom, getsockname, and getpeername functions pass a socket address structure from the kernel to the process.

Arguments to these functions:

struct sockaddr_un  cli;   /* Unix domain */
socklen_t  len;

len = sizeof(cli);         /* len is a value */
getpeername(unixfd, (SA *) &cli, &len);
/* len may have changed */

Figure 3.8 Socket address structure passed from kernel to process.

Value-result argument (Figure 3.8): the size changes from an integer to be a pointer to an integer because the size is both a value when the function is called and a result when the function returns.

For two other functions that pass socket address structures, recvmsg and sendmsg, the length field is not a function argument but a structure member.

If the socket address structure is fixed-length, the value returned by the kernel will always be that fixed size: 16 for an IPv4 sockaddr_in and 28 for an IPv6 sockaddr_in6. But with a variable-length socket address structure (e.g., a Unix domain sockaddr_un), the value returned can be less than the maximum size of the structure.

Though the most common example of a value-result argument is the length of a returned socket address structure, we will encounter other value-result arguments in this text:

Byte Ordering Functions

For a 16-bit integer that is made up of 2 bytes, there are two ways to store the two bytes in memory:

Figure 3.9 Little-endian byte order and big-endian byte order for a 16-bit integer.

The figure shows the most significant bit (MSB) as the leftmost bit of the 16-bit value and the least significant bit (LSB) as the rightmost bit.

The terms "little-endian" and "big-endian" indicate which end of the multibyte value, the little end or the big end, is stored at the starting address of the value.

Host byte order refer to the byte ordering used by a given system. The program below prints the host byte order:

byteorder.c

We store the two-byte value 0x0102 in the short integer and then look at the two consecutive bytes, c[0] (the address A) and c[1] (the address A+1) to determine the byte order.

The string CPU_VENDOR_OS is determined by the GNU autoconf program.

freebsd4 % byteorder
i386-unknown-freebsd4.8: little-endian

macosx % byteorder
powerpc-apple-darwin6.6: big-endian

freebsd5 % byteorder
sparc64-unknown-freebsd5.1: big-endian

aix % byteorder
powerpc-ibm-aix5.1.0.0: big-endian

hpux % byteorder
hppa1.1-hp-hpux11.11: big-endian

linux % byteorder
i586-pc-linux-gnu: little-endian

solaris % byteorder
sparc-sun-solaris2.9: big-endian

Networking protocols must specify a network byte order. The sending protocol stack and the receiving protocol stack must agree on the order in which the bytes of these multibyte fields will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte integers.

But, both history and the POSIX specification say that certain fields in the socket address structures must be maintained in network byte order. We use the following four functions to convert between these two byte orders:

unp_htons.h

#include <netinet/in.h>

uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);

/* Both return: value in network byte order */

uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);

/* Both return: value in host byte order */

When using these functions, we do not care about the actual values (big-endian or little-endian) for the host byte order and the network byte order. What we must do is call the appropriate function to convert a given value between the host and network byte order. On those systems that have the same byte ordering as the Internet protocols (big-endian), these four functions are usually defined as null macros.

We use the term "byte" to mean an 8-bit quantity since almost all current computer systems use 8-bit bytes. Most Internet standards use the term octet instead of byte to mean an 8-bit quantity.

Bit ordering is an important convention in Internet standards, such as the the first 32 bits of the IPv4 header from RFC 791:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL |Type of Service|           Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This represents four bytes in the order in which they appear on the wire; the leftmost bit is the most significant. However, the numbering starts with zero assigned to the most significant bit.

Byte Manipulation Functions

Two types functions differ in whether they deal with null-terminated C strings:

unp_bzero.h

#include <strings.h>

void bzero(void *dest, size_t nbytes);
void bcopy(const void *src, void *dest, size_t nbytes);
int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);

/* Returns: 0 if equal, nonzero if unequal */

The memory pointed to by the const pointer is read but not modified by the function.

unp_memset.h

#include <string.h>

void *memset(void *dest, int c, size_t len);
void *memcpy(void *dest, const void *src, size_t nbytes);
int memcmp(const void *ptr1, const void *ptr2, size_t nbytes);

/* Returns: 0 if equal, <0 or >0 if unequal (see text) */

Note:

inet_aton, inet_addr, and inet_ntoa Functions

These functions convert Internet addresses between ASCII strings (what humans prefer to use) and network byte ordered binary values (values that are stored in socket address structures).

unp_inet_aton.h

#include <arpa/inet.h>

int inet_aton(const char *strptr, struct in_addr *addrptr);
/* Returns: 1 if string was valid, 0 on error */

in_addr_t inet_addr(const char *strptr);
/* Returns: 32-bit binary network byte ordered IPv4 address; INADDR_NONE if error */

char *inet_ntoa(struct in_addr inaddr);
/* Returns: pointer to dotted-decimal string */

inet_pton and inet_ntop Functions

These two functions are new with IPv6 and work with both IPv4 and IPv6 addresses. We use these two functions throughout the text. The letters "p" and "n" stand for presentation and numeric. The presentation format for an address is often an ASCII string and the numeric format is the binary value that goes into a socket address structure.

unp_inet_pton.h

#include <arpa/inet.h>

int inet_pton(int family, const char *strptr, void *addrptr);
/* Returns: 1 if OK, 0 if input not a valid presentation format, -1 on error */

const char *inet_ntop(int family, const void *addrptr, char *strptr, size_t len);
/* Returns: pointer to result if OK, NULL on error */

Arguments:

Functions:

Size definitions in <netinet/in.h> header for the len argument:

#define INET_ADDRSTRLEN       16       /* for IPv4 dotted-decimal */
#define INET6_ADDRSTRLEN      46       /* for IPv6 hex string */

The following figure summarizes the five functions on address conversion functions:

Figure 3.11 Summary of address conversion functions.

Even if your system does not yet include support for IPv6, you can start using these newer functions by replacing calls of the form.

Replacing inet_addr to inet_pton

Replace:

foo.sin_addr.s_addr = inet_addr(cp);

with

inet_pton(AF_INET, cp, &foo.sin_addr);

Replacing inet_ntoa to inet_ntop

Replace:

ptr = inet_ntoa(foo.sin_addr);

with

char str[INET_ADDRSTRLEN];
ptr = inet_ntop(AF_INET, &foo.sin_addr, str, sizeof(str));

Simple definitions of inet_pton and inet_ntop that support IPv4

libfree/inet_pton_ipv4.c

int
inet_pton(int family, const char *strptr, void *addrptr)
{
    if (family == AF_INET) {
        struct in_addr  in_val;

        if (inet_aton(strptr, &in_val)) {
            memcpy(addrptr, &in_val, sizeof(struct in_addr));
            return (1);
        }
        return(0);
    }
    errno = EAFNOSUPPORT;
    return (-1);
}

inet_ntop_ipv4.c

const char *
inet_ntop(int family, const void *addrptr, char *strptr, size_t len)
{
    const u_char *p = (const u_char *) addrptr;

    if (family == AF_INET) {
        char    temp[INET_ADDRSTRLEN];

        snprintf(temp, sizeof(temp), "%d.%d.%d.%d",
                 p[0], p[1], p[2], p[3]);
        if (strlen(temp) >= len) {
            errno = ENOSPC;
            return (NULL);
        }
        strcpy(strptr, temp);
        return (strptr);
    }
    errno = EAFNOSUPPORT;
    return (NULL);
}

A basic problem with inet_ntop is that it requires the caller to pass a pointer to a binary address. This address is normally contained in a socket address structure, requiring the caller to know the format of the structure and the address family.

For IPv4:

struct sockaddr_in   addr;
inet_ntop(AF_INET, &addr.sin_addr, str, sizeof(str));

For IPv6:

struct sockaddr_in6   addr6;
inet_ntop(AF_INET6, &addr6.sin6_addr, str, sizeof(str));

This (above) makes our code protocol-dependent.

To solve this, we will write our own function named sock_ntop that takes a pointer to a socket address structure, looks inside the structure, and calls the appropriate function to return the presentation format of the address.

unp_sock_ntop.h

#include "unp.h"

char *sock_ntop(const struct sockaddr *sockaddr, socklen_t addrlen);

/* Returns: non-null pointer if OK, NULL on error */

sockaddr points to a socket address structure whose length is addrlen. The function uses its own static buffer to hold the result and a pointer to this buffer is the return value. Notice that using static storage for the result prevents the function from being re-entrant or thread-safe.

Presentation format of sock_ntop

sock_ntop definition

The source code for only the AF_INET case:

char *
sock_ntop(const struct sockaddr *sa, socklen_t salen)
{
    char        portstr[8];
    static char str[128];       /* Unix domain is largest */

    switch (sa->sa_family) {
    case AF_INET: {
        struct sockaddr_in  *sin = (struct sockaddr_in *) sa;

        if (inet_ntop(AF_INET, &sin->sin_addr, str, sizeof(str)) == NULL)
            return(NULL);
        if (ntohs(sin->sin_port) != 0) {
            snprintf(portstr, sizeof(portstr), ":%d", ntohs(sin->sin_port));
            strcat(str, portstr);
        }
        return(str);
    }
  /* ... */

There are a few other functions that we define to operate on socket address structures, and these will simplify the portability of our code between IPv4 and IPv6.

apue_sock_bind_wild.h

#include "unp.h"

int sock_bind_wild(int sockfd, int family);
/* Returns: 0 if OK, -1 on error */

int sock_cmp_addr(const struct sockaddr *sockaddr1,
                  const struct sockaddr *sockaddr2, socklen_t addrlen);
/* Returns: 0 if addresses are of the same family and ports are equal,
   else nonzero
*/

int sock_cmp_port(const struct sockaddr *sockaddr1,
                  const struct sockaddr *sockaddr2, socklen_t addrlen);
/* Returns: 0 if addresses are of the same family and ports are equal,
   else nonzero
*/

int sock_get_port(const struct sockaddr *sockaddr, socklen_t addrlen);
/* Returns: non-negative port number for IPv4 or IPv6 address, else -1 */

char *sock_ntop_host(const struct sockaddr *sockaddr, socklen_t addrlen);
/* Returns: non-null pointer if OK, NULL on error */

void sock_set_addr(const struct sockaddr *sockaddr, socklen_t addrlen,
                   void *ptr);
void sock_set_port(const struct sockaddr *sockaddr, socklen_t addrlen,
                   int port);
void sock_set_wild(struct sockaddr *sockaddr, socklen_t addrlen);

readn, writen, and readline Functions

Stream sockets (e.g., TCP sockets) exhibit a behavior with the read and write functions that differs from normal file I/O. A read or write on a stream socket might input or output fewer bytes than requested, but this is not an error condition. The reason is that buffer limits might be reached for the socket in the kernel. All that is required to input or output the remaining bytes is for the caller to invoke the read or write function again. This scenario is always a possibility on a stream socket with read, but is normally seen with write only if the socket is nonblocking.

unp_readn.h

#include "unp.h"

ssize_t readn(int filedes, void *buff, size_t nbytes);
ssize_t writen(int filedes, const void *buff, size_t nbytes);
ssize_t readline(int filedes, void *buff, size_t maxlen);

/* All return: number of bytes read or written, –1 on error */
#include    "unp.h"

ssize_t                     /* Read "n" bytes from a descriptor. */
readn(int fd, void *vptr, size_t n)
{
    size_t  nleft;
    ssize_t nread;
    char    *ptr;

    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)
                nread = 0;      /* and call read() again */
            else
                return(-1);
        } else if (nread == 0)
            break;              /* EOF */

        nleft -= nread;
        ptr   += nread;
    }
    return(n - nleft);      /* return >= 0 */
}

Our three functions look for the error EINTR (the system call was interrupted by a caught signal) and continue reading or writing if the error occurs. We handle the error here, instead of forcing the caller to call readn or writen again, since the purpose of these three functions is to prevent the caller from having to handle a short count.

In Section 14.3, we will mention that the MSG_WAITALL flag can be used with the recv function to replace the need for a separate readn function.

In test/readline1.c, our readline function calls the system’s read function once for every byte of data. This is very inefficient, and why we’ve commented the code to state it is "PAINFULLY SLOW".

Our advice is to think in terms of buffers and not lines. Write your code to read buffers of data, and if a line is expected, check the buffer to see if it contains that line.

lib/readline.c shows a faster version of the readline function, which uses its own buffering rather than stdio buffering. Most importantly, the state of readline’s internal buffer is exposed, so callers have visibility into exactly what has been received.

In lib/readline.c, the internal function my_read reads up to MAXLINE characters at a time and then returns them, one at a time. The only change to the readline function itself is to call my_read instead of read. A new function, readlinebuf, exposes the internal buffer state so that callers can check and see if more data was received beyond a single line.

Unfortunately, by using static variables in readline.c to maintain the state information across successive calls, the functions are not re-entrant or thread-safe.