Chapter 5. Building and Testing

Introduction

The infrastructure used to support the development and deployment process should support a number of requirements (only some of which are covered here).

A deployment pipeline (as shown in the figure below) consists of the steps that are taken between a developer committing code and the code actually being promoted into normal production, while ensuring high quality.

Figure 5.1 Deployment pipeline [Notation: BPMN]

The deployment pipeline has the following steps (a minimal sketch of the gating between them appears after the list):

  1. Pre-commit tests. The developer runs a series of pre-commit tests in their local environment.
  2. Commit. The developer commits the code to the shared version control system.
  3. Integration tests. A commit then triggers an integration build of the service being developed. This build is tested by integration tests.
  4. Staging tests. If these tests are successful, the build is promoted to a quasi-production environment, the staging environment, where it is tested once more.
  5. Production. If the staging tests are successful, the build is promoted to production, where it runs under close supervision for a period of time.
  6. Normal production. Once the period of close supervision has passed, the build is promoted to normal production.
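
To make the gating between these steps concrete, the following is a minimal sketch in Python of a pipeline driver in which each step runs only if the previous one succeeded. Every step function here is a hypothetical stand-in for real build, test, and deployment tooling, not part of any particular product.

    # Minimal sketch of deployment-pipeline gating. Each step runs only if
    # the previous one succeeded. The step functions are hypothetical
    # stand-ins for real build, test, and deployment tooling.

    def run_integration_tests(commit_id: str) -> bool:
        return True   # stand-in: trigger the CI build and integration tests

    def run_staging_tests(commit_id: str) -> bool:
        return True   # stand-in: deploy to staging and run the staging tests

    def deploy_to_production(commit_id: str) -> bool:
        return True   # stand-in: deploy to production under close supervision

    def promote_to_normal_production(commit_id: str) -> bool:
        return True   # stand-in: lift the extra supervision

    def run_pipeline(commit_id: str) -> bool:
        steps = [
            ("integration tests", run_integration_tests),
            ("staging tests", run_staging_tests),
            ("supervised production", deploy_to_production),
            ("normal production", promote_to_normal_production),
        ]
        for name, step in steps:
            if not step(commit_id):
                print(f"pipeline stopped at: {name}")
                return False
        return True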

The specific tasks may vary a bit for different organizations. For example, a small company may not have a staging environment or special supervision for a recently deployed version. A larger company may have several different production environments for different purposes. Some of these different production environments are described in Chapter 6.

Some definitions:

Once a service is deployed into production, it is closely monitored for a period and then promoted into normal production. At this final stage, monitoring and testing still exist, but the service is treated no differently from other services in this regard. In this chapter, we are concerned with the building and testing aspects of this pipeline.

Chapter 6 describes deployment practices, and Chapter 7 discusses monitoring methods.

Moving a System Through the Deployment Pipeline

Committed code moves through the steps shown in Figure 5.1. It is moved by tools whose actions are controlled either by scripts or by developer/operator commands. Two aspects of this movement are of interest in this section:

  1. Traceability
  2. The environment associated with each step of the pipeline

Traceability

Traceability means that, for any system in production, it is possible to determine exactly how it came to be in production. This means keeping track not only of source code but also of all the commands to all the tools that acted on the elements of the system.
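
One way to picture this is a build manifest that records, for each deployed artifact, the source revision, the time of the build, and the exact commands that produced it. The sketch below is illustrative only; the manifest fields and file name are assumptions, not a standard format.

    import json
    import subprocess
    from datetime import datetime, timezone

    def build_manifest(commands_run):
        """Record what went into a build so it can be traced later.
        The fields chosen here are illustrative, not a standard."""
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        return {
            "source_commit": commit,
            "built_at": datetime.now(timezone.utc).isoformat(),
            "commands": commands_run,   # every tool invocation, in order
        }

    manifest = build_manifest(["mvn -B package"])
    with open("build-manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)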

A movement called Infrastructure as Code is based on the rationale that the provisioning and configuration of infrastructure should itself be captured in code.

Treating infrastructure as code means that this code should be subject to the same quality control as application source code.

A complication to the requirement to keep everything in version control is the treatment of third-party software such as Java libraries. Software project management tools like Apache Maven can go a long way to managing the complexities of library usage. [p82]

The Environment

An executing system can be viewed as a collection of executing code, an environment, configuration, systems outside of the environment with which the primary system interacts, and data.

Figure 5.2 A sample environment [Notation: Architecture]

As the system moves through the deployment pipeline, these items work together to generate the desired behavior or information:

The configuration for each of these environments will be different, for example:

While some differences in configuration between environments are unavoidable, it is important to keep them to a minimum so that they do not alter the behavior of the system. Testing with a configuration that is vastly different from the production configuration will not be helpful.

Wikipedia has a longer list of environments:

Crosscutting Aspects

This section discusses various crosscutting aspects of a deployment pipeline:

Development and Pre-commit Testing

Version Control and Branching

Core features of version control are the ability to identify distinct versions of the source code, to share code revisions among developers, to record who made each change from one version to the next, and to record the scope of each change.

Almost all version control systems support the creation of new branches. A branch is essentially a copy of a repository (or a portion of one) and allows independent evolution of two or more streams of work.

For example, if part of the development team is working on a set of new features while a previous version is in production and a critical error is discovered in the production system, the version currently in production must be fixed. This can be done by creating a branch for the fix based on the version of the code that was released into production. After the error has been fixed and the fixed version has been released into production, the branch with the fix is typically merged back into the main branch (also called the trunk, mainline, or master branch).
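
A sketch of that workflow, expressed as version control commands driven from Python, might look like the following; the release tag, branch name, and the use of "master" as the trunk are hypothetical.

    import subprocess

    def git(*args):
        subprocess.run(["git", *args], check=True)

    # Create a fix branch from the exact revision that is in production
    # (traceability tells us which revision that is). The tag and branch
    # names below are hypothetical.
    git("checkout", "-b", "hotfix/critical-error", "v2.3.1")

    # ... edit, test, and commit the fix on the hotfix branch,
    #     then release the fixed version into production ...

    # Afterwards, merge the fix back into the trunk.
    git("checkout", "master")
    git("merge", "hotfix/critical-error")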

This example is useful in highlighting the need for traceability that we discussed previously. In order to fix the error, the code that was executing needs to be determined (traceability of the code). The error may be due to a problem with the configuration (traceability of the configuration) or with the tool suite used to promote it into production (traceability of the infrastructure).

Although the branch structure is useful and important, two problems arise when using branches.

  1. You may have too many branches and lose track of which branch you should be working on for a particular task. For this reason, short-lived tasks should not create a new branch.
  2. Merging two branches can be difficult. Different branches evolve concurrently, and often developers touch many different parts of the code.

An alternative to branching is to have all developers work directly on the trunk. Instead of reintegrating a large branch, a developer deals with integration issues at each commit; each integration is simpler, but integrations must happen more frequently than with branches.

The problem with doing all of the development on one trunk is that a developer may be working on several different tasks within the same module simultaneously. When one task is finished, the module cannot be committed until the other tasks are completed. To do so would introduce incomplete and untested code for the new feature into the deployment pipeline. Solving this problem is the rationale for feature toggles.

Feature Toggles

A feature toggle (also called a feature flag or a feature switch) is an "if" statement around immature code. A new feature that is not ready for testing or production is disabled in the source code itself, for example, by setting a global Boolean variable.
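
As a minimal sketch, a feature toggle can be nothing more than a Boolean guarding the immature code path; the toggle and the feature shown here are invented for illustration.

    # Minimal feature-toggle sketch: immature code is guarded by an
    # "if" statement on a Boolean. The toggle and feature are invented.

    def legacy_recommendations(user):
        return ["default item"]            # current, tested behavior

    def new_recommendation_engine(user):
        raise NotImplementedError          # immature code, not ready for use

    NEW_RECOMMENDATIONS_ENABLED = False    # the feature toggle: flip when ready

    def recommendations_for(user):
        if NEW_RECOMMENDATIONS_ENABLED:    # the "if" statement around immature code
            return new_recommendation_engine(user)
        return legacy_recommendations(user)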

However, there are certain dangers in feature toggles.

When there are many feature toggles, managing them becomes complicated. It would be useful to have a specialized tool or library that knows about all of the feature toggles in the system, is aware of their current state, can change their state, and can eventually remove the feature toggle from your code base.
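
A minimal sketch of such a registry appears below. It is a homegrown illustration, not an existing library; real feature-flag tools offer considerably more, such as persistence and auditing support.

    # Sketch of a toggle registry: one place that knows every toggle and
    # its state. A homegrown illustration, not an existing library.
    class ToggleRegistry:
        def __init__(self):
            self._toggles = {}

        def register(self, name, enabled=False):
            self._toggles[name] = enabled

        def is_enabled(self, name):
            return self._toggles.get(name, False)   # unknown toggles default to off

        def set_state(self, name, enabled):
            if name not in self._toggles:
                raise KeyError(f"unknown toggle: {name}")
            self._toggles[name] = enabled

        def all_toggles(self):
            return dict(self._toggles)               # supports auditing and cleanup

    toggles = ToggleRegistry()
    toggles.register("new_recommendations", enabled=False)
    if toggles.is_enabled("new_recommendations"):
        pass  # guarded immature code path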

Configuration Parameters

A configuration parameter is an externally settable variable that changes the behavior of a system. A configuration setting may be the language exposed to the user, the location of a data file, the size of the thread pool, the background color of the screen, or the settings of the feature toggles.

In this book, we are interested in configuration settings that either control the relation of the system to its environment or control behavior related to the stage in the deployment pipeline in which the system is currently run.

[p90]

One decision to make about configuration parameters is whether the values should be the same in the different steps of the deployment pipeline. If the production system’s values are different, you must also decide whether they must be kept confidential. These decisions yield three categories.

  1. Values are the same in multiple environments. Feature toggles and performance-related values (e.g., database connection pool size) should be the same in performance testing/UAT/staging and production, but may be different on local developer machines.
  2. Values are different depending on the environment. The number of virtual machines (VMs) running in production is likely bigger than that number for the testing environments.
  3. Values must be kept confidential. The credentials for accessing the production database or changing the production infrastructure must be kept confidential and only shared with those who need access to them: no sizeable organization can take the risk that a development intern walks away with the customer data.

Keeping the values of configuration parameters confidential introduces some complications into the deployment pipeline. The overall goal is to ensure that the correct, current values are used in production while keeping them confidential.
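
A hedged sketch of how these three categories might be handled: values that are the same everywhere and values that differ per environment live in version-controlled files, while confidential values are injected at deployment time, here through environment variables. The file names and variable names are assumptions made for illustration.

    import json
    import os

    def load_config(environment):
        """Layer configuration: shared values, then per-environment
        overrides, then confidential values injected at deployment time.
        File and variable names here are illustrative assumptions."""
        with open("config/common.json") as f:
            config = json.load(f)                 # same in every environment
        with open(f"config/{environment}.json") as f:
            config.update(json.load(f))           # e.g., number of VMs per environment
        # Confidential values never live in version control; they are
        # supplied by the deployment tooling, e.g., as environment variables.
        config["db_password"] = os.environ["DB_PASSWORD"]
        return config

    config = load_config("staging")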

Testing During Development and Pre-commit Tests

[p91]

While these tests can be run by the developer at any point, a modern practice is to enforce pre-commit tests. These tests are run automatically before a commit is executed. Typically they include a relevant set of unit tests, as well as a few smoke tests. Smoke tests are specific tests that check, quickly but incompletely, that the overall functionality of the service can still be performed. The goal is that any bugs that pass the unit tests but break the overall system are found long before integration testing. Once the pre-commit tests succeed, the commit is executed.
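
A minimal sketch of an enforced pre-commit check, written as a script that a version control hook could invoke: it runs a relevant set of unit tests plus a few smoke tests and blocks the commit on failure. The test paths are illustrative assumptions.

    import subprocess
    import sys

    def run(cmd):
        return subprocess.run(cmd).returncode == 0

    # Invoked by a version-control pre-commit hook: a fast, relevant subset
    # of unit tests plus a few smoke tests. The paths are illustrative.
    ok = run(["pytest", "tests/unit", "-q"]) and \
         run(["pytest", "tests/smoke", "-q"])

    # A nonzero exit status causes the hook to abort the commit.
    sys.exit(0 if ok else 1)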

Build and Integration Testing

Build is the process of creating an executable artifact from input such as source code and configuration. It primarily consists of compiling the source code and packaging all files that are required for execution. Once the build is complete, a set of automated tests is executed to check whether integration with other parts of the system uncovers any errors. The unit tests can be repeated here to generate a history that is available more broadly than to a single developer.

Build Scripts

[p91-92]

The build and integration tests are performed by a continuous integration (CI) server. The input to this server should be scripts that can be invoked by a single command. This practice ensures that the build is repeatable and traceable.
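
A hedged sketch of such a single entry point follows: one script that packages the service and then runs the integration tests, so that the CI server and a developer invoke exactly the same command. The Maven invocations are placeholders for whatever build tooling is actually in use.

    import subprocess
    import sys

    def step(name, cmd):
        print(f"==> {name}")
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"build failed during: {name}")

    # Single entry point invoked by both developers and the CI server,
    # e.g. "python build.py". The commands below are placeholders.
    step("compile and package", ["mvn", "-B", "package"])
    step("integration tests", ["mvn", "-B", "verify"])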

Packaging

The goal of building is to create something suitable for deployment. There are several standard methods of packaging the elements of a system for deployment. The appropriate method of packaging will depend on the production environment. Some packaging options are:

There are two dominant strategies for applying changes in an application when using VM images or lightweight containers: heavily baked versus lightly baked images. Heavily baked images cannot be changed at runtime. This concept is also termed immutable servers: Once a VM has been started, no changes (other than configuration values) are applied to it.
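
The contrast can be sketched as follows; every function in this illustration is a hypothetical placeholder for real image-building and deployment tooling.

    # Sketch contrasting the two strategies; every function here is a
    # hypothetical placeholder for real image-building/deployment tooling.
    def bake_image(change):
        return f"image-with-{change}"      # placeholder: build a new VM/container image

    def start_instances(image):
        print(f"starting instances from {image}")

    def retire_old_instances():
        print("retiring old instances")

    def apply_change_heavily_baked(change):
        # Immutable servers: never modify a running instance; replace it.
        image = bake_image(change)
        start_instances(image)
        retire_old_instances()

    def apply_change_lightly_baked(change, running_instances):
        # Lightly baked: apply the change to instances already running.
        for instance in running_instances:
            print(f"applying {change} to {instance}")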

[p93]

Continuous Integration and Build Status

[p93-94]

Integration Testing

Integration testing is the step in which the built executable artifact is tested. The environment includes connections to external services, such as a surrogate database. Including other services requires mechanisms to distinguish between production and test requests, so that running a test does not trigger any actual transactions, such as production, shipment, or payment.
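
One way to realize this, sketched below, is to tag test requests and have any code that would trigger a real transaction check the tag first; the request shape and the gateway classes are invented for illustration.

    # Sketch: test requests are tagged, and code that would trigger a real
    # transaction checks the tag first. All names are invented.
    class RealPaymentGateway:
        def charge(self, amount):
            raise RuntimeError("would charge a real customer")   # placeholder

    class StubPaymentGateway:
        def charge(self, amount):
            return {"status": "ok", "amount": amount, "test": True}

    def handle_payment(request):
        gateway = StubPaymentGateway() if request.get("test") else RealPaymentGateway()
        return gateway.charge(request["amount"])

    print(handle_payment({"test": True, "amount": 10}))   # safe during integration tests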

[p94-95]

UAT/Staging/Performance Testing

[p95]

Staging is the last step of the deployment pipeline prior to deploying the system into production. The staging environment mirrors, as much as possible, the production environment. The types of tests that occur at this step are the following:

Production

Early Release Testing

This subsection focuses on the testing method. Chapter 6 discusses how to release the application to achieve early release testing.

Error Detection

Even systems that have passed all of their tests may still have errors. These errors can be either functional or nonfunctional. Techniques used to determine nonfunctional errors include monitoring of the system for indications of poor behavior. This can consist of monitoring the timing of the response to user requests, the queue lengths, and so forth.
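
A minimal sketch of this kind of nonfunctional check: keep a rolling window of response times and raise an alert when the average exceeds a threshold. The window size and threshold below are arbitrary illustrative values.

    from collections import deque

    class LatencyMonitor:
        """Raise an alert when the average response time over a rolling
        window exceeds a threshold. Window and threshold are illustrative."""
        def __init__(self, window=100, threshold_ms=500):
            self.samples = deque(maxlen=window)
            self.threshold_ms = threshold_ms

        def record(self, response_time_ms):
            self.samples.append(response_time_ms)
            average = sum(self.samples) / len(self.samples)
            if average > self.threshold_ms:
                self.alert(average)

        def alert(self, average):
            print(f"ALERT: average response time {average:.0f} ms "
                  f"exceeds {self.threshold_ms} ms")

    monitor = LatencyMonitor()
    for t in [120, 150, 900, 1200, 1100]:
        monitor.record(t)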

Once an alert has been raised, tracking down its source can be quite difficult. Logs produced by the system are important in enabling this tracking (Chapter 7). It is important that the provenance of the software causing the alert and the user requests that triggered the alert can all be easily obtained. Enabling the diagnosis of errors is one of the reasons for the emphasis on using automated tools that maintain histories of their activities.

In any case, once the error is diagnosed and repaired, the cause of the error can be made one of the regression tests for future releases.

Live Testing

Monitoring is a passive form of testing: the systems run in their normal fashion and data is gathered about their behavior and performance. Another form of testing after the system has been placed in production is to actually perturb the running system. This form is called live testing. Netflix has a set of test tools called the Simian Army. The elements of the Simian Army are both passive and active. The passive elements examine running instances to detect unused resources and expired certificates, to perform health checks on instances, and to check adherence to best practices.
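
The active elements deliberately perturb the running system; Netflix's Chaos Monkey, for example, terminates instances at random to verify that the overall service tolerates the loss. The sketch below is a hedged illustration of that idea, not Netflix's actual tooling; the instance names and the terminate call are invented.

    import random

    # Hedged sketch of an active live test: terminate one randomly chosen
    # instance and rely on monitoring to confirm the service still behaves.
    def terminate(instance_id):
        print(f"terminating {instance_id}")      # placeholder for a real cloud API call

    def chaos_round(instance_ids, dry_run=True):
        victim = random.choice(instance_ids)
        if dry_run:
            print(f"would terminate {victim}")   # rehearse before doing it for real
        else:
            terminate(victim)

    chaos_round(["web-1", "web-2", "web-3"])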

Incidents

No matter how well you test or organize a deployment, errors will exist once a system gets into production. Understanding potential causes of post-deployment errors helps to more quickly diagnose problems. Here are several anecdotes we have heard from IT professionals:

Summary

Having an appropriate deployment pipeline is essential for rapidly creating and deploying systems. The pipeline has at least five major steps: pre-commit, build and integration testing, UAT/staging/performance tests, production, and promotion to normal production.

Each step operates within a different environment and with a set of different configuration parameter values—although this set should be limited in size as much as possible. As the system moves through the pipeline, you can have progressively more confidence in its correctness. Even systems promoted to normal production, however, can have errors and can be improved from the perspective of performance or reliability. Live testing is a mechanism to continue to test even after placing a system in production or promoting it to normal production.

Feature toggles are used to make immature code inaccessible, even in production. They allow incomplete code to be contained in a committed module. They should be removed when no longer necessary because otherwise they clutter the code base; also, repurposed feature toggles can cause errors.

Tests should be automated, run by a test harness, and report results back to the development team and other interested parties. Many incidents after placing a system in production are caused by either developer or configuration errors.

An architect involved in a DevOps project should ensure the following: