Chapter 5. Functions¶
Functions are a critical part of any programming language. [p119]
Function Declarations¶
A function declaration has a name, a list of parameters, an optional list of results, and a body:
func name(parameter-list) (result-list) { body }
- The parameter list specifies the names and types of the function's parameters, which are local variables whose values (arguments) are supplied by the caller.
- The result list specifies the types of the values that the function returns.
- If the function returns one unnamed result or no results, parentheses are optional and usually omitted.
- Leaving off the result list entirely declares a function that does not return any value.
For example, in the hypot
function below:
func hypot(x, y float64) float64 { return math.Sqrt(x*x + y*y) } fmt.Println(hypot(3, 4)) // "5"
x
andy
are parameters in the declaration.3
and4
are arguments of the call.- The function returns a
float64
value.
Results may be named, in which case each name declares a local variable initialized to the zero value for its type.
A function that has a result list must end with a return
statement unless execution clearly cannot reach the end of the function, the possible cases being:
- The function ends with a call to
panic
- Infinite
for
loop with nobreak
A sequence of parameters or results of the same type can be factored so that the type itself is written only once. These two declarations are equivalent:
func f(i, j, k int, s, t string) { /* ... */ } func f(i int, j int, k int, s string, t string) { /* ... */ }
Below are four ways to declare a function with two parameters and one result, all of type int
. The blank identifier can be used to emphasize that a parameter is unused.
func add(x int, y int) int { return x + y } func sub(x, y int) (z int) { z = x - y; return } func first(x int, _ int) int { return x } func zero(int, int) int { return 0 } fmt.Printf("%T\n", add) // "func(int, int) int" fmt.Printf("%T\n", sub) // "func(int, int) int" fmt.Printf("%T\n", first) // "func(int, int) int" fmt.Printf("%T\n", zero) // "func(int, int) int"
The type of a function is sometimes called its signature. Two functions have the same type or signature if they have the same sequence of parameter types and the same sequence of result types. The following don't affect the type:
- The names of parameters and results
- Whether or not they were declared using the factored form
Every function call must provide an argument for each parameter, in the order in which the parameters were declared.
Go does not have the concept of the following:
- Default parameter values
- Any way to specify arguments by name
The names of parameters and results don't matter to the caller except as documentation.
Parameters are local variables within the body of the function, with their initial values set to the arguments supplied by the caller. Function parameters and named results are variables in the same lexical block as the functions' outermost local variables.
Arguments are passed by value: the function receives a copy of each argument; modifications to the copy do not affect the caller. However, if the argument contains some kind of reference, like a pointer, slice, map, function, or channel, then the caller may be affected by any modifications the function makes to variables indirectly referred to by the argument.
You may occasionally encounter a function declaration without a body, indicating that the function is implemented in a language other than Go. Such a declaration defines the function signature.
package math func Sin(x float64) float64 // implemented in assembly language
Recursion¶
Functions may be recursive, which means they may call themselves, either directly or indirectly.
The example program below uses a non-standard package, golang.org/x/net/html
, which provides an HTML parser. The golang.org/x/...
repositories hold packages designed and maintained by the Go team for applications such as networking, internationalized text processing, mobile platforms, image manipulation, cryptography, and developer tools. These packages are not in the standard library because they're still under development or rarely needed by the majority of Go programmers.
The parts of the golang.org/x/net/html
API are shown below.
- The function
html.Parse
reads a sequence of bytes, parses them, and returns the root of the HTML document tree, which is anhtml.Node
. - HTML has several kinds of nodes. We are concerned only with element nodes of the form
<name key='value'>
.
package html type Node struct { Type NodeType Data string Attr []Attribute FirstChild, NextSibling *Node } type NodeType int32 const ( ErrorNode NodeType = iota TextNode DocumentNode ElementNode CommentNode DoctypeNode ) type Attribute struct { Key, Val string } func Parse(r io.Reader) (*Node, error)
The main
function parses the standard input as HTML, extracts the links using a recursive visit
function, and prints each discovered link:
gopl.io/ch5/findlinks1/main.go
// Findlinks1 prints the links in an HTML document read from standard input. package main import ( "fmt" "os" "golang.org/x/net/html" ) func main() { doc, err := html.Parse(os.Stdin) if err != nil { fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err) os.Exit(1) } for _, link := range visit(nil, doc) { fmt.Println(link) } }
The visit
function traverses an HTML node tree, extracts the link from the href
attribute of each anchor element <a href='...'>
, appends the links to a slice of strings, and returns the resulting slice:
// visit appends to links each link found in n and returns the result. func visit(links []string, n *html.Node) []string { if n.Type == html.ElementNode && n.Data == "a" { for _, a := range n.Attr { if a.Key == "href" { links = append(links, a.Val) } } } for c := n.FirstChild; c != nil; c = c.NextSibling { links = visit(links, c) } return links }
[p123]
The next program uses recursion over the HTML node tree to print the structure of the tree in outline. As it encounters each element, it pushes the element's tag onto a stack, then prints the stack.
func main() { doc, err := html.Parse(os.Stdin) if err != nil { fmt.Fprintf(os.Stderr, "outline: %v\n", err) os.Exit(1) } outline(nil, doc) } func outline(stack []string, n *html.Node) { if n.Type == html.ElementNode { stack = append(stack, n.Data) // push tag fmt.Println(stack) } for c := n.FirstChild; c != nil; c = c.NextSibling { outline(stack, c) } }
Note that outline
"pushes" an element on stack, but there is no corresponding pop. When outline
calls itself recursively, the callee receives a copy of stack. Although the callee may append elements to this slice, modifying its underlying array and perhaps even allocating a new array, it doesn't modify the initial elements that are visible to the caller, so when the function returns, the caller's stack is as it was before the call.
The following is the outline of https://golang.org
:
$ go build gopl.io/ch5/outline $ ./fetch https://golang.org | ./outline [html] [html head] [html head meta] [html head title] [html head link] [html body] [html body div] [html body div] [html body div div] [html body div div form] [html body div div form div] [html body div div form div a] ...
Variable-size stacks in Go *¶
Many programming language implementations use a fixed-size function call stack; sizes from 64KB to 2MB are typical. Fixed-size stacks impose a limit on the depth of recursion, so one must be careful to avoid a stack overflow when traversing large data structures recursively; fixed-size stacks may even pose a security risk.
In contrast, typical Go implementations use variable-size stacks that start small and grow as needed up to a limit on the order of a gigabyte, which lets us use recursion safely and without worrying about overflow.
Multiple Return Values¶
A function can return more than one result. Many examples of functions from standard packages return two values, the desired computational result and an error value or boolean that indicates whether the computation worked.
The program below is a variation of findlinks
that makes the HTTP request itself. Because the HTTP and parsing operations can fail, findLinks
declares two results: the list of discovered links and an error. The HTML parser can usually recover from bad input and construct a document containing error nodes, so Parse
rarely fails; when it does, it’s typically due to underlying I/O errors.
gopl.io/ch5/findlinks2/main.go
func main() { for _, url := range os.Args[1:] { links, err := findLinks(url) if err != nil { fmt.Fprintf(os.Stderr, "findlinks2: %v\n", err) continue } for _, link := range links { fmt.Println(link) } } } // findLinks performs an HTTP GET request for url, parses the // response as HTML, and extracts and returns the links. func findLinks(url string) ([]string, error) { resp, err := http.Get(url) if err != nil { return nil, err } if resp.StatusCode != http.StatusOK { resp.Body.Close() return nil, fmt.Errorf("getting %s: %s", url, resp.Status) } doc, err := html.Parse(resp.Body) resp.Body.Close() if err != nil { return nil, fmt.Errorf("parsing %s as HTML: %v", url, err) } return visit(nil, doc), nil }
There are four return statements in findLinks
, each of which returns a pair of values. The first three returns cause the function to pass the underlying errors from the http
and html
packages on to the caller.
- In the first case, the error is returned unchanged.
- In the second and third, it is augmented with additional context information by
fmt.Errorf
(Section 7.8). - If
findLinks
is successful, the final return statement returns the slice of links, with no error.
We must ensure that resp.Body
is closed so that network resources are properly released even in case of error. Go's garbage collector recycles unused memory, but you cannot assume it will release unused operating system resources like open files and network connections. They should be closed explicitly.
The result of calling a multi-valued function is a tuple of values. The caller of such a function must explicitly assign the values to variables if any of them are to be used:
links, err := findLinks(url)
To ignore one of the values, assign it to the blank identifier:
links, _ := findLinks(url) // errors ignored
The result of a multi-valued call may itself be returned from a (multi-valued) calling function. For example, the following function behaves like findLinks
but logs its argument:
func findLinksLog(url string) ([]string, error) { log.Printf("findLinks %s", url) return findLinks(url) }
A multi-valued call may appear as the sole argument when calling a function of multiple parameters. Although rarely used in production code, this feature is sometimes convenient during debugging since it prints all the results of a call using a single statement. The two print statements below have the same effect.
log.Println(findLinks(url)) links, err := findLinks(url) log.Println(links, err)
Well-chosen names can document the significance of a function's results. Names are particularly valuable when a function returns multiple results of the same type. For example:
func Size(rect image.Rectangle) (width, height int) func Split(path string) (dir, file string) func HourMinSec(t time.Time) (hour, minute, second int)
However, it's not always necessary to name multiple results solely for documentation. For instance, convention dictates that a final bool
result indicates success; an error
result often needs no explanation.
Bare return *¶
In a function with named results, the operands of a return statement may be omitted. This is called a bare return.
// CountWordsAndImages does an HTTP GET request for the HTML // document url and returns the number of words and images in it. func CountWordsAndImages(url string) (words, images int, err error) { resp, err := http.Get(url) if err != nil { return } doc, err := html.Parse(resp.Body) resp.Body.Close() if err != nil { err = fmt.Errorf("parsing HTML: %s", err) return } words, images = countWordsAndImages(doc) return } func countWordsAndImages(n *html.Node) (words, images int) { /* ... */ }
A bare return is a shorthand way to return each of the named result variables in order: in the function above, each return statement is equivalent to.
return words, images, err
In such functions with many return statements and several results, bare returns can reduce code duplication, but they rarely make code easier to understand. For instance, it's not obvious that the two early returns are equivalent to return 0, 0, err
(because the result variables words and images are initialized to their zero values) and that the final return is equivalent to return words, images, nil
. For this reason, bare returns should be sparingly used.
Errors¶
Some functions always succeed. For example, strings.Contains
and strconv.FormatBool
have well-defined results for all possible argument values and cannot fail, except catastrophic and unpredictable scenarios like running out of memory. [p127]
Other functions always succeed so long as their preconditions are met. For example, the time.Date
function always constructs a time.Time
from its components (year, month, etc.), unless the last argument (the time zone) is nil
, in which case it panics. This panic is a sure sign of a bug in the calling code and should never happen in a well-written program.
For many other functions, even in a well-written program, success is not assured because it depends on factors beyond the programmers' control. Any function that does I/O, for example, must confront the possibility of error, and only a naive programmer believes a simple read or write cannot fail. We most need to know why when the most reliable operations fail unexpectedly
Errors are thus an important part of a package's API or an applications' user interface, and failure is just one of several expected behaviors. This is the approach Go takes to error handling.
A function for which failure is an expected behavior returns an additional result, conventionally the last one. If the failure has only one possible cause, the result is a boolean, usually called ok
, as in the following example of a cache lookup that always succeeds unless there was no entry for that key:
value, ok := cache.Lookup(key) if !ok { // ...cache[key] does not exist... }
More often, and especially for I/O, the failure may have a variety of causes for which the caller will need an explanation. In such cases, the type of the additional result is error
.
The built-in type error
is an interface type (detailed in Chapter 7. An error may be nil
or non-nil
; nil
implies success and non-nil
implies failure, and non-nil
error has an error message string which can be obtained by calling its Error
method or print by calling fmt.Println(err)
or fmt.Printf("%v", err)
.
Usually when a function returns a non-nil error, its other results are undefined and should be ignored. However, a few functions may return partial results in error cases. For example, if an error occurs while reading from a file, a call to Read
returns the number of bytes it was able to read and an error
value describing the problem. For correct behavior, some callers may need to process the incomplete data before handling the error, so it is important that such functions clearly document their results.
Why Go uses control-flow mechanisms for error handling *¶
Go's approach differs from many other languages in which failures are reported using exceptions, not ordinary values. Although Go does have an exception mechanism of sorts, which is discussed in Section 5.9, it is used only for reporting truly unexpected errors that indicate a bug, not the routine errors that a robust program should be built to expect.
The problem is that exceptions tend to entangle the description of an error with the control flow required to handle it, often leading to an undesirable outcome: routine errors are reported to the end user in the form of an incomprehensible stack trace, full of information about the structure of the program but lacking intelligible context about what went wrong.
By contrast, Go programs use ordinary control-flow mechanisms like if
and return
to respond to errors. This style undeniably demands that more attention be paid to error-handling logic, but that is precisely the point.
Error-Handling Strategies¶
When a function call returns an error, it's the caller's responsibility to check it and take appropriate action. Depending on the situation, there may be a number of possibilities. This section discusses five of them.
Propagating the error *¶
The first and most common strategy is to propagate the error, so that a failure in a subroutine becomes a failure of the calling routine. In the findLinks
function of Section 5.3, if the call to http.Get
fails, findLinks
returns the HTTP error to the caller immediately:
resp, err := http.Get(url) if err != nil { return nil, err }
In contrast, if the call to html.Parse
fails, findLinks
does not return the HTML parser's error directly because it lacks two crucial pieces of information:
- The error occurred in the parser
- The URL of the document that was being parsed.
In this case, findLinks
constructs a new error message that includes both pieces of information as well as the underlying parse error:
doc, err := html.Parse(resp.Body) resp.Body.Close() if err != nil { return nil, fmt.Errorf("parsing %s as HTML: %v", url, err) }
The fmt.Errorf
function formats an error message using fmt.Sprintf
and returns a new error
value. It is used to build descriptive errors by successively prefixing additional context information to the original error message. When the error is ultimately handled by the program's main function, it should provide a clear causal chain from the root problem to the overall failure, reminiscent of a NASA accident investigation:
genesis: crashed: no parachute: G-switch failed: bad relay orientation
Because error messages are frequently chained together, message strings should not be capitalized and newlines should be avoided. The resulting errors may be long, but they will be self-contained when found by tools like grep
.
When designing error messages, remember the following:
- Be deliberate, so that each one is a meaningful description of the problem with sufficient and relevant detail.
- Be consistent, so that errors returned by the same function or by a group of functions in the same package are similar in form and can be dealt with in the same way.
For example, the os
package guarantees that every error returned by a file operation, such as os.Open
or the Read
, Write
, or Close
methods of an open file, describes not just the nature of the failure (permission denied, no such directory, and so on) but also the name of the file, so the caller needn't include this information in the error message it constructs.
In general, the call f(x)
is responsible for reporting the attempted operation f
and the argument value x
as they relate to the context of the error. The caller is responsible for adding further information that it has but the call f(x)
does not, such as the URL in the call to html.Parse
above.
Retrying the failed operation *¶
The second strategy is for errors that represent transient or unpredictable problems; in such cases, it may make sense to retry the failed operation, possibly with a delay between tries and with a limit on the number of attempts or the time spent trying before giving up entirely.
// WaitForServer attempts to contact the server of a URL. // It tries for one minute using exponential back-off. // It reports an error if all attempts fail. func WaitForServer(url string) error { const timeout = 1 * time.Minute deadline := time.Now().Add(timeout) for tries := 0; time.Now().Before(deadline); tries++ { _, err := http.Head(url) if err == nil { return nil // success } log.Printf("server not responding (%s); retrying...", err) time.Sleep(time.Second << uint(tries)) // exponential back-off } return fmt.Errorf("server %s failed to respond after %s", url, timeout) }
Printing the error and stopping the program *¶
Third, if progress is impossible, the caller can print the error and stop the program gracefully, but this action should generally be reserved for the main package of a program.
Library functions should usually propagate errors to the caller, unless the error is a sign of an internal inconsistency, that is, a bug.
// (In function main.) if err := WaitForServer(url); err != nil { fmt.Fprintf(os.Stderr, "Site is down: %v\n", err) os.Exit(1) }
A more convenient, equivalent way is to call log.Fatalf
. As with all the log
functions, by default it prefixes the time and date to the error message.
if err := WaitForServer(url); err != nil { log.Fatalf("Site is down: %v\n", err) }
The default format is helpful in a long-running server, but not useful for an interactive tool:
2006/01/02 15:04:05 Site is down: no such domain: bad.gopl.io
We can set the prefix used by the log
package to the name of the command, and suppress the display of the date and time:
log.SetPrefix("wait: ") log.SetFlags(0)
Logging the error and continuing *¶
Fourth, in some cases, it's sufficient just to log the error and then continue, perhaps with
reduced functionality. Using the log
package adds the usual prefix:
if err := Ping(); err != nil { log.Printf("ping failed: %v; networking disabled", err) }
To print directly to the standard error stream:
if err := Ping(); err != nil { fmt.Fprintf(os.Stderr, "ping failed: %v; networking disabled\n", err) }
Note that all log
functions append a newline if one is not already present.
Ignoring the error *¶
Fifth and finally, in rare cases we can safely ignore an error entirely:
dir, err := ioutil.TempDir("", "scratch") if err != nil { return fmt.Errorf("failed to create temp dir: %v", err) } // ...use temp dir... os.RemoveAll(dir) // ignore errors; $TMPDIR is cleaned periodically
The call to os.RemoveAll
may fail, but the program ignores it because the operating system periodically cleans out the temporary directory. In this case, discarding the error was intentional, but the program logic would be the same had we forgotten to deal with it. When you deliberately ignore one, document your intention clearly.
Summary of error handling *¶
Error handling in Go has a particular rhythm.
- After checking an error, failure is usually dealt with before success.
- If failure causes the function to return, the logic for success is not indented within an else block but follows at the outer level.
- Functions tend to exhibit a common structure, with a series of initial checks to reject errors, followed by the substance of the function at the end, minimally indented.
End of File (EOF)¶
Occasionally, program must take different actions depending on the kind of error that has occurred. Consider an attempt to read n bytes of data from a file:
- If n is chosen to be the length of the file, any error represents a failure.
- If the caller repeatedly tries to read fixed-size chunks until the file is exhausted, the caller must respond differently to an end-of-file condition than it does to all other errors.
For this reason, the io
package guarantees that any read failure caused by an end-of-file condition is always reported by a distinguished error, io.EOF
, which is defined as follows:
package io import "errors" // EOF is the error returned by Read when no more input is available. var EOF = errors.New("EOF")
The caller can detect this condition using a simple comparison, as in the loop below, which reads runes from the standard input. (The charcount
program in Section 4.3 provides a more complete example.)
in := bufio.NewReader(os.Stdin) for { r, _, err := in.ReadRune() if err == io.EOF { break // finished reading } if err != nil { return fmt.Errorf("read failed: %v", err) } // ...use r... }
Since in an end-of-file condition there is no information to report besides the fact of it, io.EOF
has a fixed error message, "EOF"
. For other errors, we may need to report both the quality and quantity of the error, so a fixed error value will not do. Section 7.11 will present a more systematic way to distinguish certain error values from others.
Function Values¶
Functions are first-class values in Go: like other values, function values have types, and they may be assigned to variables or passed to or returned from functions. A function value may be called like any other function. For example:
func square(n int) int { return n * n } func negative(n int) int { return -n } func product(m, n int) int { return m * n } f := square fmt.Println(f(3)) // "9" f = negative fmt.Println(f(3)) // "-3" fmt.Printf("%T\n", f) // "func(int) int" f = product // compile error: can't assign f(int, int) int to f(int) int
The zero value of a function type is nil
. Calling a nil
function value causes a panic:
var f func(int) int f(3) // panic: call of nil function
Function values may be compared with nil
:
var f func(int) int if f != nil { f(3) }
But they are not comparable, so they may not be compared against each other or used as keys in a map.
With function values, we can parameterize our functions over not only data, but also behavior. The standard libraries contain many examples. For instance, strings.Map
applies a function to each character of a string, joining the results to make another string.
func add1(r rune) rune { return r + 1 } fmt.Println(strings.Map(add1, "HAL-9000")) // "IBM.:111" fmt.Println(strings.Map(add1, "VMS")) // "WNT" fmt.Println(strings.Map(add1, "Admix")) // "Benjy"
The findLinks
function from Section 5.2 uses a helper function, visit
, to visit all the nodes in an HTML document and apply an action to each one. Using a function value, we can separate the logic for tree traversal from the logic for the action to be applied to each node, so that we can reuse the traversal with different actions.
gopl.io/ch5/outline2/outline.go
// forEachNode calls the functions pre(x) and post(x) for each node // x in the tree rooted at n. Both functions are optional. // pre is called before the children are visited (preorder) and // post is called after (postorder). func forEachNode(n *html.Node, pre, post func(n *html.Node)) { if pre != nil { pre(n) } for c := n.FirstChild; c != nil; c = c.NextSibling { forEachNode(c, pre, post) } if post != nil { post(n) } }
The forEachNode
function accepts two function arguments, one to call before a node's children are visited and one to call after. This arrangement gives the caller a great deal of flexibility. For example, the functions startElement
and endElement
print the start and end tags of an HTML element like <b>...</b>
:
var depth int func startElement(n *html.Node) { if n.Type == html.ElementNode { fmt.Printf("%*s<%s>\n", depth*2, "", n.Data) depth++ } } func endElement(n *html.Node) { if n.Type == html.ElementNode { depth-- fmt.Printf("%*s</%s>\n", depth*2, "", n.Data) } }
The functions also indent the output using another fmt.Printf
trick. The *
adverb in %*s
prints a string padded with a variable number of spaces. The width and the string are provided by the arguments depth*2
and ""
.
If we call forEachNode
on an HTML document, like this:
forEachNode(doc, startElement, endElement)
This will output:
$ go build gopl.io/ch5/outline2 $ ./outline2 http://gopl.io <html> <head> <meta> </meta> <title> </title> <style> </style> </head> <body> <table> <tbody> <tr> <td> <a> <img> </img> ...
Anonymous Functions¶
Named functions can be declared only at the package level, but we can use a function literal to denote a function value within any expression. A function literal is written like a function declaration, but without a name following the func
keyword. It is an expression, and its value is called an anonymous function.
Function literals let us define a function at its point of use. As an example, the earlier call to strings.Map
in Section 5.5 can be rewritten as:
strings.Map(func(r rune) rune { return r + 1 }, "HAL-9000")
Functions defined in this way have access to the entire lexical environment, so the inner function can refer to variables from the enclosing function. For example:
// squares returns a function that returns // the next square number each time it is called. func squares() func() int { var x int return func() int { x++ return x * x } } func main() { f := squares() fmt.Println(f()) // "1" fmt.Println(f()) // "4" fmt.Println(f()) // "9" fmt.Println(f()) // "16" }
The function squares
returns another function of type func() int
. A call to squares
creates a local variable x
and returns an anonymous function that, each time it is called, increments x
and returns its square. A second call to squares would create a second variable x
and return a new anonymous function.
This example demonstrates:
- Function values are not just code but can have state.
- The anonymous inner function can access and update the local variables of the enclosing function. These hidden variable references are why we classify functions as reference types and why function values are not comparable.
- Function values like these are implemented using a technique called closures, and Go programmers often use this term for function values.
- The lifetime of a variable is not determined by its scope
- In the above example, the variable
x
exists aftersquares
has returned withinmain
, even thoughx
is hidden insidef
.
- In the above example, the variable
The following example is a problem of computing a sequence of computer science courses that satisfies the prerequisite requirements of each one. The prerequisites are given in the prereqs
table below, which is a mapping from each course to the list of courses that must be completed before it.
// prereqs maps computer science courses to their prerequisites. var prereqs = map[string][]string{ "algorithms": {"data structures"}, "calculus": {"linear algebra"}, "compilers": { "data structures", "formal languages", "computer organization", }, "data structures": {"discrete math"}, "databases": {"data structures"}, "discrete math": {"intro to programming"}, "formal languages": {"discrete math"}, "networks": {"operating systems"}, "operating systems": {"data structures", "computer organization"}, "programming languages": {"data structures", "computer organization"}, }
This kind of problem is known as topological sorting. The prerequisite information forms a directed graph with a node for each course and edges from each course to the courses that it depends on. The graph is acyclic: there is no path from a course that leads back to itself. We can compute a valid sequence using depth-first search through the graph with the code below:
func main() { for i, course := range topoSort(prereqs) { fmt.Printf("%d:\t%s\n", i+1, course) } } func topoSort(m map[string][]string) []string { var order []string seen := make(map[string]bool) var visitAll func(items []string) visitAll = func(items []string) { for _, item := range items { if !seen[item] { seen[item] = true visitAll(m[item]) order = append(order, item) } } } var keys []string for key := range m { keys = append(keys, key) } sort.Strings(keys) visitAll(keys) return order }
When an anonymous function requires recursion, we must first declare a variable, and then assign the anonymous function to that variable.
If two steps are combined in the declaration, the function literal would not be within the scope of the variable visitAll
so it would have no way to call itself recursively:
visitAll := func(items []string) { // ... visitAll(m[item]) // compile error: undefined: visitAll // ... }
The output of the toposort
program is shown below. It is deterministic, an often-desirable property that doesn't always come for free. The values of the prereqs
map are slices, not more maps, so their iteration order is deterministic, and we sorted the keys of prereqs
before making the initial calls to visitAll
.
1: intro to programming 2: discrete math 3: data structures 4: algorithms 5: linear algebra 6: calculus 7: formal languages 8: computer organization 9: compilers 10: databases 11: operating systems 12: networks 13: programming languages
Returning to the findLinks
example, link-extraction function links.Extract
is moved to its own package, since it'll be used again in Chapter 8. We replaced the visit
function with an anonymous function that appends to the links
slice directly, and used forEachNode
to handle the traversal. Since Extract
needs only the pre
function, it passes nil
for the post
argument.
// Package links provides a link-extraction function. package links import ( "fmt" "net/http" "golang.org/x/net/html" ) // Extract makes an HTTP GET request to the specified URL, parses // the response as HTML, and returns the links in the HTML document. func Extract(url string) ([]string, error) { resp, err := http.Get(url) if err != nil { return nil, err } if resp.StatusCode != http.StatusOK { resp.Body.Close() return nil, fmt.Errorf("getting %s: %s", url, resp.Status) } doc, err := html.Parse(resp.Body) resp.Body.Close() if err != nil { return nil, fmt.Errorf("parsing %s as HTML: %v", url, err) } var links []string visitNode := func(n *html.Node) { if n.Type == html.ElementNode && n.Data == "a" { for _, a := range n.Attr { if a.Key != "href" { continue } link, err := resp.Request.URL.Parse(a.Val) if err != nil { continue // ignore bad URLs } links = append(links, link.String()) } } } forEachNode(doc, visitNode, nil) return links, nil }
Instead of appending the raw href
attribute value to the links
slice, this version parses it as a URL relative to the base URL of the document, resp.Request.URL
. The resulting link
is in absolute form, suitable for use in a call to http.Get
.
Crawling the web is a problem of graph traversal. The topoSort
example showed a depth-first traversal. The following web crawler uses breadth-first traversal. In Chapter 8, we'll explore concurrent traversal.
The function below encapsulates the essence of a breadth-first traversal:
- The caller provides an initial list
worklist
of items to visit and a function valuef
to call for each item, which is identified by a string. - The function
f
returns a list of new items to append to the worklist. - The
breadthFirst
function returns when all items have been visited. It maintains a set of strings to ensure that no item is visited twice.
gopl.io/ch5/findlinks3/findlinks.go
// breadthFirst calls f for each item in the worklist. // Any items returned by f are added to the worklist. // f is called at most once for each item. func breadthFirst(f func(item string) []string, worklist []string) { seen := make(map[string]bool) for len(worklist) > 0 { items := worklist worklist = nil for _, item := range items { if !seen[item] { seen[item] = true worklist = append(worklist, f(item)...) } } } }
The argument "f(item)...
" causes all the items in the list returned by f
to be appended to the worklist.
In this crawler, items are URLs. The crawl
function, which is passed to breadthFirst
, prints the URL, extracts its links, and returns them so that they are also visited.
func crawl(url string) []string { fmt.Println(url) list, err := links.Extract(url) if err != nil { log.Print(err) } return list }
To start off the crawler, we use the command-line arguments as the initial URLs.
func main() { // Crawl the web breadth-first, // starting from the command-line arguments. breadthFirst(crawl, os.Args[1:]) }
Crawl the web starting from https://golang.org
:
$ go build gopl.io/ch5/findlinks3 $ ./findlinks3 https://golang.org https://golang.org/ https://golang.org/doc/ https://golang.org/pkg/ https://golang.org/project/ https://code.google.com/p/go-tour/ https://golang.org/doc/code.html https://www.youtube.com/watch?v=XCsL89YtqCs http://research.swtch.com/gotour https://vimeo.com/53221560 ...
The process ends when all reachable web pages have been crawled or the memory of the computer is exhausted.
Caveat: Capturing Iteration Variables¶
This section discusses a pitfall of Go's lexical scope rules that can cause surprising results.
Consider a program that must create a set of directories and later remove them. We can use a slice of function values to hold the clean-up operations. (All error handling in this example for brevity.)
var rmdirs []func() for _, d := range tempDirs() { dir := d // NOTE: necessary! os.MkdirAll(dir, 0755) // creates parent directories too rmdirs = append(rmdirs, func() { os.RemoveAll(dir) }) } // ...do some work... for _, rmdir := range rmdirs { rmdir() // clean up }
You may be wondering why we assigned the loop variable d
to a new local variable dir
within the loop body, instead of just using the loop variable dir
as in this subtly incorrect variant:
var rmdirs []func() for _, dir := range tempDirs() { os.MkdirAll(dir, 0755) rmdirs = append(rmdirs, func() { os.RemoveAll(dir) // NOTE: incorrect! }) }
The reason is a consequence of the scope rules for loop variables. In the program immediately above:
- The
for
loop introduces a new lexical block in which the variabledir
is declared. All function values created by this loop "capture" and share the same variable, which is an addressable storage location, not its value at that particular moment. - The value of
dir
is updated in successive iterations, so by the time the cleanup functions are called, thedir
variable has been updated several times by the now-completedfor
loop. - Thus
dir
holds the value from the final iteration, and consequently all calls toos.RemoveAll
will attempt to remove the same directory.
Frequently, the inner variable introduced to work around this problem (dir
in the example above) is given the exact same name as the outer variable of which it is a copy. This leads to odd-looking but crucial variable declarations like this:
for _, dir := range tempDirs() { dir := dir // declares inner dir, initialized to outer dir // ... }
The risk is not unique to range
-based for
loops. The loop in the example below suffers from the same problem due to unintended capture of the index variable i
.
var rmdirs []func() dirs := tempDirs() for i := 0; i < len(dirs); i++ { os.MkdirAll(dirs[i], 0755) // OK rmdirs = append(rmdirs, func() { os.RemoveAll(dirs[i]) // NOTE: incorrect! }) }
The problem of iteration variable capture is most often encountered when using the go
statement (Chapter 8) or with defer
(to be discussed later) since both may delay the execution of a function value until after the loop has finished. But the problem is not inherent to go
or defer
. See also Common Mistakes.
Variadic Functions¶
A variadic function can be called with varying numbers of arguments. The most familiar examples are fmt.Printf
and its variants. Printf
requires one fixed argument at the beginning, then accepts any number of subsequent arguments.
To declare a variadic function, the type of the final parameter is preceded by an ellipsis, "...
", which indicates that the function may be called with any number of arguments of this type.
func sum(vals ...int) int { total := 0 for _, val := range vals { total += val } return total }
- The
sum
function above returns the sum of zero or moreint
arguments. - Within the body of the function, the type of
vals
is an[]int
slice. - When
sum
is called, any number of values may be provided for itsvals
parameter.
fmt.Println(sum()) // "0" fmt.Println(sum(3)) // "3" fmt.Println(sum(1, 2, 3, 4)) // "10"
Implicitly, the caller allocates an array, copies the arguments into it, and passes a slice of the entire array to the function.
To invoke a variadic function when the arguments are already in a slice, place an ellipsis after the final argument. The call below behaves the same as the last call above.
values := []int{1, 2, 3, 4} fmt.Println(sum(values...)) // "10"
Although the ...int
parameter behaves like a slice within the function body, the type of a variadic function is distinct from the type of a function with an ordinary slice parameter.
func f(...int) {} func g([]int) {} fmt.Printf("%T\n", f) // "func(...int)" fmt.Printf("%T\n", g) // "func([]int)"
Variadic functions are often used for string formatting. The errorf
function below constructs a formatted error message with a line number at the beginning. The suffix f
is a widely followed naming convention for variadic functions that accept a Printf
-style format string.
func errorf(linenum int, format string, args ...interface{}) { fmt.Fprintf(os.Stderr, "Line %d: ", linenum) fmt.Fprintf(os.Stderr, format, args...) fmt.Fprintln(os.Stderr) } linenum, name := 12, "count" errorf(linenum, "undefined: %s", name) // "Line 12: undefined: count"
The interface{}
type means that this function can accept any values for its final arguments, which is discussed in Chapter 7.
Deferred Function Calls¶
The findLinks
examples used the output of http.Get
as the input to html.Parse
. However, many pages contain images, plain text, and other file formats instead of HTML. Feeding such files into an HTML parser could have undesirable effects.
The program below fetches an HTML document and prints its title. The title
function inspects the Content-Type
header of the server's response and returns an error if the document is not HTML.
func title(url string) error { resp, err := http.Get(url) if err != nil { return err } // Check Content-Type is HTML (e.g., "text/html; charset=utf-8"). ct := resp.Header.Get("Content-Type") if ct != "text/html" && !strings.HasPrefix(ct, "text/html;") { resp.Body.Close() return fmt.Errorf("%s has type %s, not text/html", url, ct) } doc, err := html.Parse(resp.Body) resp.Body.Close() if err != nil { return fmt.Errorf("parsing %s as HTML: %v", url, err) } visitNode := func(n *html.Node) { if n.Type == html.ElementNode && n.Data == "title" && n.FirstChild != nil { fmt.Println(n.FirstChild.Data) } } forEachNode(doc, visitNode, nil) return nil }
$ go build gopl.io/ch5/title1 $ ./title1 http://gopl.io The Go Programming Language $ ./title1 https://golang.org/doc/effective_go.html Effective Go - The Go Programming Language $ ./title1 https://golang.org/doc/gopher/frontpage.png title: https://golang.org/doc/gopher/frontpage.png has type image/png, not text/html
The resp.Body.Close()
call, which is duplicated, ensures that title closes the network connection on all execution paths, including failures. As functions grow more complex and have to handle more errors, such duplication of clean-up logic may become a maintenance problem. Go' defer
mechanism makes things simpler.
Syntactically, a defer
statement is an ordinary function or method call prefixed by the keyword defer
.
- The function and argument expressions are evaluated when the statement is executed, but the actual call is deferred until the (caller) function that contains the
defer
statement has finished. The "finished" here means either normally or abnormally:- Normally: the caller function executing a return statement or falling off the end.
- Abnormally: the caller function panicking.
- Any number of calls may be deferred; they are executed in the reverse of the order in which they were deferred.
A defer
statement is often used with paired operations to ensure that resources are released in all cases no matter how complex the control flow. Some examples for such paired operations are:
- Open and close
- Connect and disconnect
- Lock and unlock
The right place for a defer
statement that releases a resource is immediately after the resource has been successfully acquired. In the title
function below, a single deferred call replaces both previous calls to resp.Body.Close()
:
func title(url string) error { resp, err := http.Get(url) if err != nil { return err } defer resp.Body.Close() ct := resp.Header.Get("Content-Type") if ct != "text/html" && !strings.HasPrefix(ct, "text/html;") { return fmt.Errorf("%s has type %s, not text/html", url, ct) } doc, err := html.Parse(resp.Body) if err != nil { return fmt.Errorf("parsing %s as HTML: %v", url, err) } // ...print doc's title element... //!- visitNode := func(n *html.Node) { if n.Type == html.ElementNode && n.Data == "title" && n.FirstChild != nil { fmt.Println(n.FirstChild.Data) } } forEachNode(doc, visitNode, nil) //!+ return nil }
The same pattern can be used for other resources. For example, close an open file:
package ioutil func ReadFile(filename string) ([]byte, error) { f, err := os.Open(filename) if err != nil { return nil, err } defer f.Close() return ReadAll(f) }
Unlock a mutex (Section 9.2):
var mu sync.Mutex var m = make(map[string]int) func lookup(key string) int { mu.Lock() defer mu.Unlock() return m[key] }
On-entry and on-exit actions *¶
The defer
statement can also be used to pair "on entry" and "on exit" actions when debugging a complex function.
The bigSlowOperation
function below does two things:
- It calls trace immediately, which does the "on entry" action.
- Then, it returns a function value that, when called, does the corresponding "on exit" action.
func bigSlowOperation() { defer trace("bigSlowOperation")() // don't forget the extra parentheses // ...lots of work... time.Sleep(10 * time.Second) // simulate slow operation by sleeping } func trace(msg string) func() { start := time.Now() log.Printf("enter %s", msg) return func() { log.Printf("exit %s (%s)", msg, time.Since(start)) } }
By deferring a call to the returned function in this way, we can instrument the entry point and all exit points of a function in a single statement and even pass values, like the start time, between the two actions. Do not forget the final parentheses in the defer statement, or the "on entry" action will happen on exit and the on-exit action won't happen at all.
Each time bigSlowOperation
is called, it logs its entry and exit and the elapsed time between them.
$ go build gopl.io/ch5/trace $ ./trace 2015/11/18 09:53:26 enter bigSlowOperation 2015/11/18 09:53:36 exit bigSlowOperation (10.000589217s)
Accessing result variables *¶
Deferred functions run after return statements have updated the function's result variables. Because an anonymous function can access its enclosing function's variables, including named results, a deferred anonymous function can observe the function's results.
For the following function:
func double(x int) int { return x + x }
By naming its result variable and adding a defer
statement, we can make the function print its arguments and results each time it is called.
func double(x int) (result int) { defer func() { fmt.Printf("double(%d) = %d\n", x, result) }() return x + x } _=double(4) // Output: // "double(4) = 8"
This is useful in functions with many return statements.
A deferred anonymous function can even change the values that the enclosing function returns to its caller:
func triple(x int) (result int) { defer func() { result += x }() return double(x) } fmt.Println(triple(4)) // "12"
Caveats of deferred functions *¶
Because deferred functions aren't executed until the end of a function's execution, a defer
statement in a loop deserves extra scrutiny. The code below could run out of file descriptors since no file will be closed until all files have been processed:
for _, filename := range filenames { f, err := os.Open(filename) if err != nil { return err } defer f.Close() // NOTE: risky; could run out of file descriptors // ...process f... }
One solution is to move the loop body, including the defer
statement, into another function that is called on each iteration:
for _, filename := range filenames { if err := doFile(filename); err != nil { return err } } func doFile(filename string) error { f, err := os.Open(filename) if err != nil { return err } defer f.Close() // ...process f... }
The example below is an improved fetch
program (Section 1.5) that writes the HTTP response to a local file instead of to the standard output. It derives the file name from the last component of the URL path, which it obtains using the path.Base
function.
// Fetch downloads the URL and returns the // name and length of the local file. func fetch(url string) (filename string, n int64, err error) { resp, err := http.Get(url) if err != nil { return "", 0, err } defer resp.Body.Close() local := path.Base(resp.Request.URL.Path) if local == "/" { local = "index.html" } f, err := os.Create(local) if err != nil { return "", 0, err } n, err = io.Copy(f, resp.Body) // Close file, but prefer error from Copy, if any. if closeErr := f.Close(); err == nil { err = closeErr } return local, n, err }
- The call to
resp.Body.Close
is deferred. - It's tempting to use a second deferred call, to
f.Close
, to close the local file, but this would be subtly wrong becauseos.Create
opens a file for writing, creating it as needed. On many file systems, notably NFS, write errors are not reported immediately but may be postponed until the file is closed. Failure to check the result of the close operation could cause serious data loss to go unnoticed. - If both
io.Copy
andf.Close
fail, we should prefer to report the error fromio.Copy
since it occurred first and is more likely to tell us the root cause.
Panic¶
Go's type system catches many mistakes at compile time; other mistakes, such as an out-of-bounds array access or nil pointer dereference, require checks at run time. When the Go runtime detects these mistakes, it panics.
During a typical panic, normal execution stops, all deferred function calls in that goroutine are executed, and the program crashes with a log message:
- This log message includes the panic value, which is usually an error message and a stack trace (for each goroutine) showing the stack of function calls that were active at the time of the panic.
- This log message often has enough information to diagnose the root cause of the problem without running the program again, so it should always be included in a bug report about a panicking program.
Besides the panics that come from the runtime, the built-in panic
function may be called directly. It accepts any value as an argument. A panic is often the best thing to do when some "impossible" situation happens, for instance, execution reaches a case that logically can't happen:
switch s := suit(drawCard()); s { case "Spades": // ... case "Hearts": // ... case "Diamonds": // ... case "Clubs": // ... default: panic(fmt.Sprintf("invalid suit %q", s)) // Joker? }
It's good practice to assert that the preconditions of a function hold only if you can provide a more informative error message or detect an error sooner. Otherwise, there is no point asserting a condition that the runtime will check for you.
func Reset(x *Buffer) { if x == nil { panic("x is nil") // unnecessary! } x.elements = nil }
Panic vs. error
values *¶
Although Go's panic mechanism resembles exceptions in other languages, its usages are quite different. Since a panic causes the program to crash, it is generally used for grave errors, such as a logical inconsistency in the program; diligent programmers consider any crash to be proof of a bug in their code.
In a robust program, "expected" errors that arise from incorrect input, misconfiguration, or failing I/O, should be handled gracefully; they are best dealt with using error values.
Consider the function regexp.Compile
, which compiles a regular expression into an efficient form for matching. It returns an error if called with an ill-formed pattern, but checking this error is unnecessary and burdensome if the caller knows that a particular call cannot fail. In such cases, it's reasonable for the caller to handle an error by panicking, since it is believed to be impossible.
Since most regular expressions are literals in the program source code, the regexp
package provides a wrapper function regexp.MustCompile
that does this check:
package regexp func Compile(expr string) (*Regexp, error) { /* ... */ } func MustCompile(expr string) *Regexp { re, err := Compile(expr) if err != nil { panic(err) } return re }
This wrapper function makes it convenient for clients to initialize a package-level variable with a compiled regular expression, like this:
var httpSchemeRE = regexp.MustCompile(`^https?:`) // "http:" or "https:"
MustCompile
should not be called with untrusted input values. The Must
prefix is a common naming convention for functions of this kind, like template.Must
in Section 4.6.
Panic and deferred functions *¶
When a panic occurs, all deferred functions are run in reverse order, starting with those of the topmost function on the stack and proceeding up to main
. For example:
func main() { f(3) } func f(x int) { fmt.Printf("f(%d)\n", x+0/x) // panics if x == 0 defer fmt.Printf("defer %d\n", x) f(x - 1) }
When the program is run, it prints the following to the standard output:
f(3) f(2) f(1) defer 1 defer 2 defer 3
And the following to the standard error (simplified for clarity):
panic: runtime error: integer divide by zero main.f(0) src/gopl.io/ch5/defer1/defer.go:14 main.f(1) src/gopl.io/ch5/defer1/defer.go:16 main.f(2) src/gopl.io/ch5/defer1/defer.go:16 main.f(3) src/gopl.io/ch5/defer1/defer.go:16 main.main() src/gopl.io/ch5/defer1/defer.go:10
- A panic occurs during the call to
f(0)
, causing the three deferred calls tofmt.Printf
to run. - Then the runtime terminates the program, printing the panic message and a stack dump to the standard error stream.
For diagnostic purposes, the runtime
package lets the programmer dump the stack.
For example, the program below defers a call to printStack
in main
:
func main() { defer printStack() f(3) } func printStack() { var buf [4096]byte n := runtime.Stack(buf[:], false) os.Stdout.Write(buf[:n]) }
The following additional text (simplified for clarity) is printed to the standard output:
goroutine 1 [running]: main.printStack() src/gopl.io/ch5/defer2/defer.go:20 main.f(0) src/gopl.io/ch5/defer2/defer.go:27 main.f(1) src/gopl.io/ch5/defer2/defer.go:29 main.f(2) src/gopl.io/ch5/defer2/defer.go:29 main.f(3) src/gopl.io/ch5/defer2/defer.go:29 main.main() src/gopl.io/ch5/defer2/defer.go:15
Those familiar with exceptions in other languages may found that runtime.Stack
can print information about functions that seem to have already been "unwound". Go's panic mechanism runs the deferred functions before it unwinds the stack.
Recover¶
Giving up is not always the right response to a panic. It might be possible to recover in some way, or at least do the clean-up before quitting. For example, a web server that encounters an unexpected problem could close the connection rather than leave the client hanging.
If the built-in recover
function is called within a deferred function and the function containing the defer
statement is panicking, recover
ends the current state of panic and returns the panic value. The function that was panicking does not continue where it left off but returns normally. If recover
is called at any other time, it has no effect and returns nil
.
In the following parser example, instead of crashing, it turns these panics into ordinary parse errors.
func Parse(input string) (s *Syntax, err error) { defer func() { if p := recover(); p != nil { err = fmt.Errorf("internal error: %v", p) } }() // ...parser... }
The deferred function in Parse
recovers from a panic, using the panic value to construct an error message; a fancier version might include the entire call stack using runtime.Stack
. The deferred function then assigns to the err
result, which is returned to the caller.
Recovering indiscriminately from panics is a dubious practice because:
- The state of a package's variables after a panic is rarely well defined or documented. For example:
- A critical update to a data structure was incomplete.
- A file or network connection was opened but not closed.
- A lock was acquired but not released.
- Indiscriminate recovery may cause bugs to go unnoticed. For example, a crash is replaced with a line in a log file.
Recovering from a panic within the same package simplifies the handling of complex or unexpected errors. However, there is a general rule of exceptions:
- You should not attempt to recover from another package's panic. Public APIs should report failures as errors.
- You should not recover from a panic that may pass through a function you do not maintain, such as a caller-provided callback, since you cannot reason about its safety.
For example, the net/http
package provides a web server that dispatches incoming requests to user-provided handler functions. Rather than allowing a panic in one of these handlers kill the process, the server calls recover
, prints a stack trace, and continues serving. This is convenient in practice, but it does risk leaking resources or leaving the failed handler in an unspecified state that could lead to other problems.
For all the above reasons, it's safest to recover only from panics that were intended to be recovered from, which should be rare. This can be achieved by using a distinct, unexported type for the panic value and testing whether the value returned by recover has that type. If so, we report the panic as an ordinary error; if not, we call panic with the same value to resume the state of panic.
// soleTitle returns the text of the first non-empty title element // in doc, and an error if there was not exactly one. func soleTitle(doc *html.Node) (title string, err error) { type bailout struct{} defer func() { switch p := recover(); p { case nil: // no panic case bailout{}: // "expected" panic err = fmt.Errorf("multiple title elements") default: panic(p) // unexpected panic; carry on panicking } }() // Bail out of recursion if we find more than one non-empty title. forEachNode(doc, func(n *html.Node) { if n.Type == html.ElementNode && n.Data == "title" && n.FirstChild != nil { if title != "" { panic(bailout{}) // multiple title elements } title = n.FirstChild.Data } }, nil) if title == "" { return "", fmt.Errorf("no title element") } return title, nil }
The deferred handler function calls recover
, checks the panic value, and reports an ordinary error if the value was bailout{}
. All other non-nil values indicate an unexpected panic, in which case the handler calls panic with that value, undoing the effect of recover and resuming the original state of panic. [p153]
From some conditions there is no recovery. For example, running out of memory causes the Go runtime to terminate the program with a fatal error.
Doubts and Solution¶
Verbatim¶
p131 on error handling¶
After checking an error, failure is usually dealt with before success.
Question: What does it mean exactly?
p148 on panic¶
During a typical panic, normal execution stops, all deferred function calls in that goroutine are executed, and the program crashes with a log message.
Question: From this text, does it mean that all functions in Go are "goroutines"?