Chapter 12. TCP: The Transmission Control Protocol (Preliminaries)

Introduction

[p579]

The protocols discussed so far do not include mechanisms for delivering data reliably; they may detect that erroneous data has been received, using a checksum or CRC, but they do not try very hard to repair errors:

Information theory and coding theory

ARQ and Retransmission

For a multihop communications channel, there are other problems besides packet bit errors:

An error-correcting protocol designed for use over a multihop communications channel (such as IP) must cope with all of these problems.

Packet drops and bit errors

A straightforward method of dealing with packet drops (and bit errors) is to resend the packet until it is received properly. This requires a way to determine:

  1. Whether the receiver has received the packet.
  2. Whether the packet it received was the same one the sender sent.

This is solved by using acknowledgment (ACK): the sender sends a packet and awaits an ACK. When the receiver receives the packet, it sends the ACK. When the sender receives the ACK, it sends another packet, and the process continues. Interesting questions to ask here are:

  1. How long should the sender (expect to) wait for an ACK?
  2. What if the ACK is lost?
    • If an ACK is dropped, the sender cannot distinguish this case from the case in which the original packet is dropped, so it simply sends the packet again. The receiver may receive two or more copies in that case, so it must be prepared to handle that situation.
  3. What if the packet was received but had errors in it?
    • Detecting errors is easier than correcting them, and can be done with a form of checksum. When a receiver receives a packet containing an error, it refrains from sending an ACK. Eventually, the sender resends the packet, which ideally arrives undamaged.

Packet duplication

The receiver might receive duplicate copies of the packet. This problem is addressed using a sequence number. Every unique packet gets a new sequence number when it is sent at the source, and this sequence number is carried along in the packet itself. The receiver can use this number to determine whether it has already seen the packet and if so, discard it.
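The ACK, retransmission, and sequence-number ideas described so far can be tied together in a small simulation. This is only a sketch under stated assumptions: the channel model (independent random drops of data packets and ACKs), the function name stop_and_wait, and the loss_rate, seed, and max_tries parameters are all hypothetical, and a sender timeout is modeled simply as "try again".

```python
import random

def stop_and_wait(messages, loss_rate=0.3, seed=1, max_tries=50):
    """Deliver messages in order over a simulated lossy channel."""
    rng = random.Random(seed)   # hypothetical channel: independent random drops
    delivered = []              # what the receiver passes up to its application
    expected = 0                # receiver: next sequence number it has not yet seen
    transmissions = 0

    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_rate:
                continue        # data packet dropped: sender times out and resends
            # Receiver: accept a new packet, discard a duplicate, ACK either way.
            if seq == expected:
                delivered.append(payload)
                expected += 1
            if rng.random() < loss_rate:
                continue        # ACK dropped: sender cannot tell, so it resends
            break               # ACK arrived: sender moves on to the next packet
        else:
            raise RuntimeError("gave up after max_tries transmissions")
    return delivered, transmissions
```

When an ACK is lost, the retransmitted packet carries a sequence number the receiver has already passed, so it is acknowledged again but not delivered twice.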

Efficiency

The protocol described so far is reliable but not very efficient. The sender injects a single packet into the communications path but then must stop until it hears the ACK. This protocol is therefore called "stop and wait". Its throughput performance (data sent on the network per unit time) is proportional to M/R where M is the packet size and R is the round-trip time (RTT), assuming no packets are lost or irreparably damaged in transit. For a fixed-size packet, as R goes up, the throughput goes down. If packets are lost or damaged, the situation is even worse: the "goodput" (useful amount of data transferred per unit time) can be considerably less than the throughput.
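To get a feel for the M/R relationship, the following sketch computes the best-case stop-and-wait throughput for a fixed packet size at several RTTs. The 1500-byte packet size and the RTT values are illustrative assumptions, not figures from the text.

```python
def stop_and_wait_throughput(packet_bytes, rtt_seconds):
    """Best case in bits/s: one packet of M bytes per round trip of R seconds."""
    return packet_bytes * 8 / rtt_seconds

# Illustrative values only: a 1500-byte packet at three different RTTs.
for rtt_ms in (1, 10, 100):
    bps = stop_and_wait_throughput(1500, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> {bps / 1e6:.2f} Mbit/s")
```

Each tenfold increase in R cuts the throughput by a factor of ten, regardless of how fast the underlying links are.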

For a network that doesn’t damage or drop many packets, the cause for low throughput is usually that the network is not being kept busy. The situation is similar to using an assembly line where new work cannot enter the line until a complete product emerges: most of the line sits idle. The line would be used better if more than one work unit could be in it at a time. The same is true for networks: if we could have more than one packet in the network, we would keep it "more busy", leading to higher throughput.

Allowing more than one packet to be in the network at a time:

There are other issues:

Windows of Packets and Sliding Windows

Assume each unique packet has a sequence number. We define a window of packets as the collection of packets (or their sequence numbers) that have been injected by the sender but not yet completely acknowledged (the sender has not received an ACK for them). We refer to the number of packets in the window as the window size.

The sender’s window, showing which packets are eligible to be sent (or have already been sent), which are not yet eligible, and which have already been sent and acknowledged. In this example, the window size is fixed at three packets.

In the figure:

This movement of the window gives rise to another name for this type of protocol, a sliding window protocol.
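A minimal sketch of the sender's side of such a window follows, with the window size fixed at three packets as in the example. The class and method names are hypothetical, sequence numbers start at 0 for convenience, and ACKs are assumed to be cumulative (an ACK for packet n covers every packet up to n).

```python
class WindowSender:
    """Tracks which packets a fixed-size sliding window allows the sender to send."""

    def __init__(self, total_packets, window_size=3):
        self.total = total_packets
        self.size = window_size
        self.left = 0        # lowest sequence number not yet acknowledged
        self.next_seq = 0    # lowest sequence number never sent

    def sendable(self):
        """Return (and mark as sent) every unsent packet that fits in the window."""
        right = min(self.left + self.size, self.total)
        to_send = list(range(self.next_seq, right))
        self.next_seq = max(self.next_seq, right)
        return to_send

    def on_ack(self, acked_through):
        """Slide the left edge past everything acknowledged so far."""
        self.left = max(self.left, acked_through + 1)
```

With seven packets, the first call to sendable() releases packets 0 to 2 and a second call releases nothing. After on_ack(0) the window slides by one and packet 3 becomes eligible.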

Typically, this window structure is kept at both the sender and the receiver.

Although the window structure is convenient for keeping track of data as it flows between sender and receiver, it does not provide guidance as to how large the window should be, or what happens if the receiver or network cannot handle the sender’s data rate.

Variable Windows: Flow Control and Congestion Control

Flow control handles the problem that arises when a receiver is too slow relative to a sender, by forcing the sender to slow down when the receiver cannot keep up. It is usually handled in one of two ways:

If we consider the effect of changing the window size at the sender, it becomes clear how this achieves flow control. The sender is allowed to inject W packets into the network before it hears an ACK for any of them. If the sender and receiver are sufficiently fast, and the network loses no packets and has an infinite capacity, this means that the transfer rate is proportional to (SW/R) bits/s, where W is the window size, S is the packet size in bits, and R is the RTT. When the window advertisement from the receiver clamps the value of W at the sender, the sender’s overall rate can be limited so as to not overwhelm the receiver.
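The clamping effect can be made concrete with a short calculation. This is a sketch under the same idealized assumptions as the text (fast endpoints, no loss, unlimited network capacity). The function names and the numbers in the usage note are hypothetical.

```python
def window_limited_rate(window_packets, packet_bits, rtt_seconds):
    """Idealized transfer rate in bits/s: S * W / R."""
    return packet_bits * window_packets / rtt_seconds

def clamped_window(sender_window, advertised_window):
    """The receiver's window advertisement caps the W the sender may use."""
    return min(sender_window, advertised_window)
```

For example, with S = 12000 bits and R = 0.1 s, a sender that would like W = 10 could reach about 1.2 Mbit/s. If the receiver advertises a window of 4 packets, the sender is held to about 480 kbit/s.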

This approach works fine for protecting the receiver, but what about the network in between? We may have routers with limited memory between the sender and the receiver that have to contend with slow network links. When this happens, it is possible for the sender’s rate to exceed a router’s ability to keep up, leading to packet loss. This is addressed with a special form of flow control called congestion control.

Congestion control involves the sender slowing down so as to not overwhelm the network between itself and the receiver.

The problem of congestion control in datagram-style networks, and more generally queuing theory to which it is closely related, has remained a major research topic for years, and it is unlikely to ever be solved completely for all circumstances. It is also not practical to discuss all the options and methods of performing flow control here. In Chapter 16 we will explore the particular congestion control technique used with TCP in more detail, along with a number of variants that have arisen over the years.

Setting the Retransmission Timeout

One of the most important performance issues is how long to wait before concluding that a packet has been lost and should be resent. That is, what should the retransmission timeout be? Intuitively, the amount of time the sender should wait before resending a packet is about the sum of the following times:

  1. The time to send the packet,
  2. The time for the receiver to process it and send an ACK,
  3. The time for the ACK to travel back to the sender,
  4. The time for the sender to process the ACK.

In practice, none of these times are known with certainty and any or all of them vary over time as additional load is added to or removed from the end hosts or routers.

Because it is not practical for the user to estimate all the times, a better strategy is to have the protocol implementation try to estimate them. This is called round-trip-time estimation and is a statistical process. The true RTT is likely to be close to the sample mean of a collection of samples of RTTs. This average naturally changes over time (it is not stationary), as the paths taken through the network may change.

[p584]

It would not be sensible to set the retransmission timer to be exactly equal to the mean estimator, as it is likely that many actual RTTs will be larger, thereby inducing unwanted retransmissions.
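One way to picture the idea is to take the sample mean and add a safety margin that grows with how much the samples vary. The sketch below does this with a margin of a few standard deviations. That margin, the function name, and the sample values are assumptions made for illustration. This is not presented as TCP's actual method, which Chapter 14 covers.

```python
import statistics

def retransmission_timeout(rtt_samples, margin_stddevs=4.0):
    """Sample mean of the RTTs plus a margin proportional to their spread."""
    mean = statistics.mean(rtt_samples)
    spread = statistics.pstdev(rtt_samples)   # population standard deviation
    return mean + margin_stddevs * spread

# Hypothetical RTT samples in seconds.
samples = [0.100, 0.120, 0.080, 0.110, 0.090]
rto = retransmission_timeout(samples)
```

Two of these five samples exceed the mean of 0.1 s, so a timer set exactly to the mean would have fired too early for them. The padded value sits above every sample.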

This is further explored in Chapter 14.

Introduction to TCP

Our description of TCP starts in this chapter and continues in the next five chapters:

The TCP Service Model

Even though TCP and UDP use the same network layer (IPv4 or IPv6), TCP provides a totally different service to the application layer from what UDP does. TCP provides a connection-oriented, reliable, byte stream service. The term connection-oriented means that the two applications using TCP must establish a TCP connection by contacting each other before they can exchange data. There are exactly two endpoints communicating with each other on a TCP connection; concepts such as broadcasting and multicasting (Chapter 9) are not applicable to TCP.

TCP provides a byte stream abstraction to applications that use it. One consequence is that no record markers or message boundaries are automatically inserted by TCP (Chapter 1). A record marker corresponds to an indication of an application’s write extent. If the application on one end writes 10 bytes, followed by a write of 20 bytes, followed by a write of 50 bytes, the application at the other end of the connection cannot tell what size the individual writes were. For example, the other end may read the 80 bytes in four reads of 20 bytes at a time or in some other way. One end puts a stream of bytes into TCP, and the identical stream of bytes appears at the other end. Each endpoint individually chooses its read and write sizes.
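The 10/20/50-byte example can be tried over a loopback TCP connection using the standard socket module. This is a sketch only: the function name is hypothetical, both ends run in one process for simplicity, and the sizes of the individual reads are not guaranteed (recv may return fewer bytes than requested).

```python
import socket

def demo_byte_stream():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(1)

    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()

    writes = [b"a" * 10, b"b" * 20, b"c" * 50]
    for chunk in writes:
        writer.sendall(chunk)             # three separate application writes
    writer.close()                        # reader will see end of stream

    reads = []
    while True:
        data = reader.recv(20)            # reader picks its own read size
        if not data:
            break                         # empty result: the peer has closed
        reads.append(data)
    reader.close()
    return writes, reads
```

Joining the reads gives exactly the same 80 bytes as joining the writes, but nothing in what the reader receives marks where one write ended and the next began.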

TCP does not interpret the contents of the bytes in the byte stream at all. It has no idea if the data bytes being exchanged are binary data, ASCII characters, EBCDIC characters, or something else. The interpretation of this byte stream is up to the applications on each end of the connection. TCP supports the urgent mechanism mentioned before, although it is no longer recommended for use.

Reliability in TCP

TCP provides reliability using specific variations on the techniques just described.

The application data is broken into what TCP considers the best-size chunks to send, typically fitting each segment into a single IP-layer datagram that will not be fragmented. This is different from UDP, where each write by the application usually generates a UDP datagram of that size (plus headers). The chunk passed by TCP to IP is called a segment. Chapter 15 discusses how TCP decides what size a segment should be.

TCP Header and Encapsulation

TCP is encapsulated in IP datagrams as shown in the figure below:

Figure 12-2 The TCP header appears immediately following the IP header or last IPv6 extension header and is often 20 bytes long (with no TCP options). With options, the TCP header can be as large as 60 bytes. Common options include Maximum Segment Size, Timestamps, Window Scaling, and Selective ACKs.

The TCP header (shown in the figure below) is considerably more complicated than the UDP header (Chapter 10). This is not very surprising, as TCP is a significantly more complicated protocol that must keep each end of the connection informed (synchronized) about the current state.

Figure 12-3 The TCP header. Its normal size is 20 bytes, unless options are present. The Header Length field gives the size of the header in 32-bit words (minimum value is 5). The shaded fields (Acknowledgment Number, Window Size, plus ECE and ACK bits) refer to the data flowing in the opposite direction relative to the sender of this segment.

Each TCP header contains the source and destination port number. The source and destination port number, along with the source and destination IP addresses in the IP header, uniquely identify each connection.

The combination of an IP address and a port number is sometimes called an endpoint or socket in the TCP literature. The term "socket" appeared in [RFC0793] and was ultimately adopted as the name of the Berkeley-derived programming interface for network communications (called "Berkeley sockets"). It is a pair of sockets or endpoints (the 4-tuple consisting of the client IP address, client port number, server IP address, and server port number) that uniquely identifies each TCP connection. This fact will become important when we look at how a TCP server can communicate with multiple clients (Chapter 13).
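The role of the 4-tuple can be sketched as a lookup key for per-connection state. The names below are hypothetical and the addresses come from ranges reserved for documentation. A real TCP implementation keeps far more state per connection than a label.

```python
from typing import NamedTuple

class ConnectionKey(NamedTuple):
    client_ip: str
    client_port: int
    server_ip: str
    server_port: int

def demo_connection_table():
    """Three connections to the same server endpoint, each with its own key."""
    table = {}
    table[ConnectionKey("192.0.2.10", 50000, "198.51.100.5", 80)] = "conn A"
    table[ConnectionKey("192.0.2.10", 50001, "198.51.100.5", 80)] = "conn B"
    table[ConnectionKey("192.0.2.99", 50000, "198.51.100.5", 80)] = "conn C"
    return table
```

All three entries share the server endpoint 198.51.100.5:80, yet each is a distinct connection because the client endpoint differs in address or port.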

Sequence Number and Acknowledgment Number fields *

When a new connection is being established, the SYN bit field is turned on in the first segment sent from client to server. Such segments are called SYN segments, or simply SYNs. The Sequence Number field contains the first sequence number to be used on that direction of the connection for subsequent sequence numbers and in returning ACK numbers. This number is not 0 or 1 but instead is another number, often randomly chosen, called the initial sequence number (ISN). The reason for the ISN not being 0 or 1 is a security measure and will be discussed in Chapter 13. The sequence number of the first byte of data sent on this direction of the connection is the ISN plus 1 because the SYN bit field consumes one sequence number. Consuming a sequence number also implies reliable delivery using retransmission (discussed later). Thus, SYNs and application bytes (and FINs) are reliably delivered. ACKs, which do not consume sequence numbers, are not.
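The bookkeeping described here can be sketched as a small function that advances a sequence number by whatever a segment consumes: one for SYN, one for FIN, and one per byte of data. The function names and the ISN used in the usage note are hypothetical. The modulus reflects the 32-bit width of the Sequence Number field.

```python
SEQ_MODULUS = 2 ** 32   # the Sequence Number field is 32 bits wide

def seq_space_consumed(payload_len, syn=False, fin=False):
    """SYN and FIN each consume one sequence number; a bare ACK consumes none."""
    return payload_len + int(syn) + int(fin)

def next_seq(seq, payload_len, syn=False, fin=False):
    """Sequence number that follows a segment starting at seq."""
    return (seq + seq_space_consumed(payload_len, syn, fin)) % SEQ_MODULUS
```

With an ISN of 1000, next_seq(1000, 0, syn=True) gives 1001 for the first data byte. A pure ACK leaves the number at 1001, 100 bytes of data move it to 1101, and a FIN then moves it to 1102.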

TCP can be described as "a sliding window protocol with cumulative positive acknowledgments". The ACK Number field is constructed to indicate the largest byte received in order at the receiver (plus 1). For example, if bytes 1–1024 are received OK, and the next segment contains bytes 2049–3072, the receiver cannot use the regular ACK Number field to signal the sender that it received this new segment. Modern TCPs, however, have a selective acknowledgment (SACK) option that allows the receiver to indicate to the sender out-of-order data it has received correctly. When paired with a TCP sender capable of selective repeat, a significant performance benefit may be realized [FF96]. Chapter 14 discusses how TCP uses duplicate acknowledgments (multiple segments with the same ACK field) to help with its congestion control and error control procedures.
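The cumulative ACK rule in the example above can be sketched as follows. The function name and the representation of received data as inclusive (first byte, last byte) ranges are assumptions for illustration. The second return value holds the data that arrived beyond a gap, which is the kind of information the SACK option exists to report.

```python
def cumulative_ack(first_expected, received_ranges):
    """Return (ack_number, out_of_order_ranges) for inclusive byte ranges."""
    ack = first_expected
    held = []
    for start, end in sorted(received_ranges):
        if start <= ack:
            ack = max(ack, end + 1)     # contiguous with in-order data: ACK advances
        else:
            held.append((start, end))   # a gap precedes this range: ACK cannot cover it
    return ack, held
```

With bytes 1–1024 and 2049–3072 received, the ACK number stays at 1025 and (2049, 3072) is held out of order. Once bytes 1025–2048 arrive, the ACK number jumps to 3073.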

Other fields in the TCP header *

Currently eight bit fields are defined for the TCP header, although some older implementations understand only the last six of them ([RFC3540], an experimental RFC, also defines the least significant of the Resv bits as a nonce sum (NS)). One or more of them can be turned on at the same time. Their use is briefly mentioned here and detailed in later chapters:

  1. CWR. Congestion Window Reduced (the sender reduced its sending rate); see Chapter 16.
  2. ECE. ECN Echo (the sender received an earlier congestion notification); see Chapter 16.
  3. URG. Urgent (the Urgent Pointer field is valid; rarely used); see Chapter 15.
  4. ACK. Acknowledgment (the Acknowledgment Number field is valid; always on after a connection is established); see Chapters 13 and 15.
  5. PSH. Push (the receiver should pass this data to the application as soon as possible; not reliably implemented or used); see Chapter 15.
  6. RST. Reset the connection (connection abort, usually because of an error); see Chapter 13.
  7. SYN. Synchronize sequence numbers to initiate a connection; see Chapter 13.
  8. FIN. The sender of the segment is finished sending data to its peer; see Chapter 13.
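The fixed 20-byte header and the eight bit fields listed above can be decoded with the standard struct module. This is a sketch that assumes the standard field order (ports, Sequence Number, Acknowledgment Number, Header Length with reserved bits, flags, Window Size, Checksum, Urgent Pointer). The function and dictionary key names are hypothetical, and options are not parsed.

```python
import struct

# Flag names from the most significant bit (0x80) to the least (0x01).
FLAG_NAMES = ["CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"]

def parse_tcp_header(segment: bytes) -> dict:
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_byte >> 4) * 4,   # field counts 32-bit words
        "flags": [n for i, n in enumerate(FLAG_NAMES) if flag_byte & (0x80 >> i)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

A header packed with a Header Length of 5 and a flag byte of 0x12 parses to a 20-byte header with the ACK and SYN bits set.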

Remaining fields:

TCP segments without data *

In Figure 12-2, the data portion of the TCP segment is optional. The TCP segment without data may be one of the following cases:

Summary

The problem of providing reliable communications over lossy communication channels has been studied for years. The two primary methods for dealing with errors are error-correcting codes and data retransmission. The protocols using retransmissions must also handle data loss, usually by setting a timer, and must also arrange some way for the receiver to signal the sender what it has received. Deciding how long to wait for an ACK can be tricky, as the appropriate time may change as network routing or load on the end systems varies. Modern protocols estimate the round-trip time and set the retransmission timer based on some function of these measurements.

Except for setting the retransmission timer, retransmission protocols are simple when only one packet may be in the network at one time, but they perform poorly for networks where the delay is high. To be more efficient, multiple packets must be injected into the network before an ACK is received. This approach is more efficient but also more complex. A typical approach to managing the complexity is to use sliding windows, whereby packets are marked with sequence numbers, and the window size bounds the number of such packets. When the window size varies based on either feedback from the receiver or other signals (such as dropped packets), both flow control and congestion control can be achieved.

TCP provides a reliable, connection-oriented, byte stream, transport-layer service built using many of these techniques. We looked briefly at all of the fields in the TCP header, noting that most of them are directly related to these abstract concepts in reliable delivery. We will examine them in detail in the chapters that follow. TCP packetizes the application data into segments, sets a timeout anytime it sends data, acknowledges data received by the other end, reorders out-of-order data, discards duplicate data, provides end-to-end flow control, and calculates and verifies a mandatory end-to-end checksum. It is the most widely used protocol on the Internet. It is used by most of the popular applications, such as HTTP, SSH/TLS, NetBIOS (NBT—NetBIOS over TCP), Telnet, FTP, and electronic mail (SMTP). Many distributed file-sharing applications (e.g., BitTorrent, Shareaza) also use TCP.

Doubts and Solutions

Verbatim

p589 on SYN segments:

Consuming a sequence number also implies reliable delivery using retransmission (discussed later). Thus, SYNs and application bytes (and FINs) are reliably delivered. ACKs, which do not consume sequence numbers, are not.

Why?