Pointers in Go

Published on: February 27, 2022

When I first started writing Go, two concepts confused me the most: slices and pointers. Until then, I'd spent most of my time working with dynamic languages like Python and JavaScript, which have neither Go-style slices nor explicit pointers.

"When should I use a pointer?" That's the key question I've had as I've learned about pointers. In some cases, it's clear that a pointer is the way to go: pointer receivers let methods modify their receivers, and a nil pointer signifies that a value is "missing". But in some other scenarios, I still have to rethink why some variable should be a pointer.

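As a quick refresher on the two "clear" cases, here is a minimal sketch. The counter type and the find function are hypothetical names made up for this illustration; they are not part of the interpreter.

```go
package main

import "fmt"

// counter is a hypothetical type used only for this illustration.
type counter struct {
	n int
}

// incrementValue has a value receiver: it modifies a copy of the counter.
func (c counter) incrementValue() { c.n++ }

// incrementPointer has a pointer receiver: it modifies the caller's counter.
func (c *counter) incrementPointer() { c.n++ }

// find returns a pointer to the first counter whose n equals target,
// or nil to signal that no such counter exists.
func find(counters []counter, target int) *counter {
	for i := range counters {
		if counters[i].n == target {
			return &counters[i]
		}
	}
	return nil
}

func main() {
	c := counter{}
	c.incrementValue()
	fmt.Println(c.n) // 0: only the copy changed
	c.incrementPointer()
	fmt.Println(c.n) // 1: the original changed

	counters := []counter{{n: 1}, {n: 2}}
	fmt.Println(find(counters, 2) != nil) // true: found
	fmt.Println(find(counters, 5) == nil) // true: "missing"
}
```
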
Global variables in Lox

I've recently been building an interpreter for a programming language called Lox (see Notes on Crafting Interpreters: Go, Ambiguous Grammars, and The Temporal Dead Zone in JavaScript). The struct that implements the core interpreter receives a list of statements representing the program and interprets (or executes) them in turn:

type Interpreter struct {
	// the current execution environment
	env environment
}

func (in *Interpreter) Interpret(statements []ast.Stmt) {
	for _, statement := range statements {
		in.execute(statement)
	}
}

(Note that the env field is of type environment, a struct value rather than a pointer. We'll come back to this later in the post.)

Program execution happens within a context (or an environment). The environment stores all the variables defined by and accessible to the program at each point in time. So, the environment struct defines methods to set and retrieve variable values.

type environment struct {
	values map[string]interface{}
}

func (e *environment) define(name string, value interface{}) {
	e.values[name] = value
}

func (e *environment) get(name ast.Token) (interface{}, error) {
	if val, ok := e.values[name.Lexeme]; ok {
		return val, nil
	}
	return nil, runtimeError{name, fmt.Sprintf("Undefined variable '%s'", name.Lexeme)}
}

The implementation of the interpreter worked. Declaring a variable with a var statement defined the variable name in the environment, while using the variable identifier within a statement or expression retrieved its value from the environment.

// in.env == {values:map[]}

var a = 30;

// in.env == {values:map[a:30]}

var b = 45;

// in.env == {values:map[a:30 b:45]}

print a * b / 2; // prints "675"
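
The define and get methods can also be exercised outside the interpreter. The sketch below makes two assumptions that are not in the original code: get takes a plain string instead of an ast.Token, so that the example has no dependencies, and a hypothetical newEnvironment constructor initializes the map (writing to a nil map panics in Go).

```go
package main

import "fmt"

// environment mirrors the struct in this post, but get takes a plain
// string instead of an ast.Token so the sketch has no dependencies.
type environment struct {
	values map[string]interface{}
}

// newEnvironment is a hypothetical constructor. It initializes the map,
// because writing to a nil map panics in Go.
func newEnvironment() *environment {
	return &environment{values: make(map[string]interface{})}
}

func (e *environment) define(name string, value interface{}) {
	e.values[name] = value
}

func (e *environment) get(name string) (interface{}, error) {
	if val, ok := e.values[name]; ok {
		return val, nil
	}
	return nil, fmt.Errorf("Undefined variable '%s'", name)
}

func main() {
	env := newEnvironment()
	env.define("a", 30)
	env.define("b", 45)

	a, _ := env.get("a")
	b, _ := env.get("b")
	fmt.Println(a.(int) * b.(int) / 2) // 675

	_, err := env.get("c")
	fmt.Println(err) // Undefined variable 'c'
}
```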

Local variables in Lox

Lox also supports local (or block-scoped) variables. Like global variables, local variables are defined with a var statement. But their scope is limited to the block in which they are defined.

var a = 1;
var b = 2;
{
	var b = 3;
	var c = 4;
	print a; // prints 1 from global
	print b; // prints 3 from re-declared local
	print c; // prints 4 from local
}
print a; // prints 1 from global
print b; // prints 2 from global
print c; // Error: Undefined variable c

I implemented block scoping using a linked list of environment structs. In addition to its own values, an environment now holds a pointer to the environment of its parent (or enclosing) scope.

type environment struct {
	// environment of the parent scope
	enclosing *environment
	// values set in this environment
	values map[string]interface{}
}

In its get method, the environment first checks its own values to see if the variable is set in the current scope. If it isn't, it recurses into the enclosing environment to look for the variable.

func (e *environment) get(name ast.Token) (interface{}, error) {
	if val, ok := e.values[name.Lexeme]; ok {
		return val, nil
	}
	if e.enclosing != nil {
		return e.enclosing.get(name)
	}
	return nil, runtimeError{name, fmt.Sprintf("Undefined variable '%s'", name.Lexeme)}
}
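
To see the lookup order in isolation, here is a minimal sketch of the same chain with two scopes. As assumptions for this example only, get takes a plain string rather than an ast.Token, and newEnvironment is a hypothetical helper that initializes the values map.

```go
package main

import "fmt"

// environment matches the struct above. For this sketch, get takes a
// plain string rather than an ast.Token.
type environment struct {
	enclosing *environment
	values    map[string]interface{}
}

// newEnvironment is a hypothetical helper that initializes the values map.
func newEnvironment(enclosing *environment) *environment {
	return &environment{enclosing: enclosing, values: make(map[string]interface{})}
}

func (e *environment) get(name string) (interface{}, error) {
	if val, ok := e.values[name]; ok {
		return val, nil
	}
	if e.enclosing != nil {
		return e.enclosing.get(name)
	}
	return nil, fmt.Errorf("Undefined variable '%s'", name)
}

func main() {
	global := newEnvironment(nil)
	global.values["a"] = 1
	global.values["b"] = 2

	block := newEnvironment(global)
	block.values["b"] = 3

	a, _ := block.get("a") // not in block, found in global
	b, _ := block.get("b") // shadowed by the block's own value
	fmt.Println(a, b)      // 1 3

	gb, _ := global.get("b")
	fmt.Println(gb) // 2

	_, err := block.get("c")
	fmt.Println(err) // Undefined variable 'c'
}
```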

To execute a block statement (a sequence of statements inside a block), the interpreter creates a new environment, setting the current one as its "parent"; executes the body of the block within this environment; and then restores the initial environment at the end of the block.

func (in *Interpreter) VisitBlockStmt(stmt ast.BlockStmt) interface{} {
	// Create a new environment and set the current environment as enclosing
	blockEnv := environment{enclosing: &in.env}

	// Restore the current environment after executing this block
	previous := in.env
	defer func() { in.env = previous }()

	// Set the blockEnv as the new execution environment
	in.env = blockEnv

	// Then execute all the statements in the block
	// (assuming ast.BlockStmt exposes them as a Statements field)
	for _, statement := range stmt.Statements {
		in.execute(statement)
	}
	return nil
}

Testing the block scope

This implementation seemed to make sense. The interpreter could look up variables defined in the current scope:

var a = 1;
{
  var a = 2;
  print a; // 2
}
print a; // 1

But when a program tried to access a variable defined in an enclosing scope, the interpreter crashed with a stack overflow error:

var a = 1;
{
  print a;
}

/*
runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc0201603b8 stack=[0xc020160000, 0xc040160000]
fatal error: stack overflow

runtime stack:
runtime.throw({0x1124316, 0x11f9ae0})
	/usr/local/go/src/runtime/panic.go:1198 +0x71
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1088 +0x5ac
runtime.morestack()
	/usr/local/go/src/runtime/asm_amd64.s:461 +0x8b
*/

The offending line of code came from the get method of the environment struct, where we looked up the value of a variable from the enclosing scope.

func (e *environment) get(name ast.Token) (interface{}, error) {
	if val, ok := e.values[name.Lexeme]; ok {
		return val, nil
	}
	if e.enclosing != nil {
		return e.enclosing.get(name) // <<<<<<<<<
	}
	return nil, runtimeError{name, fmt.Sprintf("Undefined variable '%s'", name.Lexeme)}
}

For some reason, the pointer to the enclosing environment referred to the environment itself. The linked list of environment structs formed a cycle, so a lookup that missed in the current scope kept recursing through the same environment until the stack overflowed.

I instinctively suspected the issue might have been related to defining the env field in the Interpreter as a struct. (It was.) And so I changed it to a pointer to a struct, without thinking much more about it.

type Interpreter struct {
	// before: "env environment"
	env *environment
}

func (in *Interpreter) VisitBlockStmt(stmt ast.BlockStmt) interface{} {
	// before: "blockEnv := environment{enclosing: &in.env}"
	blockEnv := environment{enclosing: in.env}

	previous := in.env
	defer func() { in.env = previous }()

	// before: "in.env = blockEnv"
	in.env = &blockEnv

	// ...execute the block...
}

Changing those three lines worked, and the interpreter began to handle block scopes as expected. If you are familiar with how pointers work, you might have already caught why this happened. Here's a summary of the change, followed by a more detailed review.

Before	After
`Interpreter` is a struct with an `environment` field.	`Interpreter` is a struct with an `*environment` field.
The block environment is `environment{enclosing: &in.env}`, which is enclosed by whatever the current environment is.	The block environment is `environment{enclosing: in.env}`, which is enclosed by what the current environment is pointing to.
After setting the block environment to be the new environment, its `enclosing` field now points to itself.	After setting the block environment to be the new environment, its `enclosing` field still points to what the previous environment was pointing to.
(Not what we want)	(Exactly what we want)

A review of the first case

Let's take a closer look at the first implementation.

type Interpreter struct {
	env environment
}

[Figure: Interpreter with environment struct]

To interpret a block statement, we created a new environment with its enclosing field pointing to in.env.

blockEnv := environment{enclosing: &in.env}

Here's what that actually looks like:

[Figure: Block environment pointing to interpreter environment]

When we say "pointing to in.env", we mean that the value of blockEnv.enclosing is set to the memory address of the in.env field:

// Create a new interpreter with an environment
in := Interpreter{env: environment{}}

// Create an environment for the block
blockEnv := environment{enclosing: &in.env}

// Print the address of in.env
fmt.Printf("%p\n", &in.env)            // 0xc00004a510

// Print the value of blockEnv.enclosing
fmt.Printf("%p\n", blockEnv.enclosing) // 0xc00004a510

Even if the value of in.env changes, the address &in.env doesn't change and blockEnv.enclosing still points to it.

in.env = environment{}

fmt.Printf("%p\n", &in.env)            // 0xc00004a510
fmt.Printf("%p\n", blockEnv.enclosing) // 0xc00004a510

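It may now be clearer what went wrong when we assigned in.env = blockEnv. The assignment copied blockEnv, including its enclosing pointer, into in.env. Since that pointer held the address of in.env itself, the interpreter's environment ended up pointing to its own location.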

[Figure: Interpreter environment pointing to itself]
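
The self-reference can be reproduced in a few lines. This is a minimal sketch rather than the interpreter's actual code: enterBlock and chainEnds are hypothetical helpers, and enterBlock only repeats the two assignments from the first VisitBlockStmt without executing any statements.

```go
package main

import "fmt"

type environment struct {
	enclosing *environment
	values    map[string]interface{}
}

type Interpreter struct {
	env environment // a struct value, as in the first implementation
}

// enterBlock is a hypothetical helper that repeats the two assignments
// from the first VisitBlockStmt.
func (in *Interpreter) enterBlock() {
	blockEnv := environment{enclosing: &in.env}
	in.env = blockEnv // copies blockEnv, including enclosing, into in.env
}

// chainEnds reports whether following the enclosing pointers from e
// reaches nil within limit steps.
func chainEnds(e *environment, limit int) bool {
	for i := 0; i < limit; i++ {
		if e == nil {
			return true
		}
		e = e.enclosing
	}
	return false
}

func main() {
	in := Interpreter{}
	fmt.Println(chainEnds(&in.env, 1000)) // true: the global scope has no parent

	in.enterBlock()
	fmt.Println(in.env.enclosing == &in.env) // true: the environment encloses itself
	fmt.Println(chainEnds(&in.env, 1000))    // false: the chain is a cycle
}
```

The bounded loop in chainEnds stands in for the recursive get: without a step limit, walking this chain never reaches nil, which is what exhausted the stack.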

A review of the second case

In the working version of the interpreter, we defined the env field as a pointer to an environment:

type Interpreter struct {
  env *environment
}

[Figure: Interpreter with environment pointer]

To execute a block, we create a new environment:

blockEnv := environment{enclosing: in.env}

In this version, we create a new environment struct. Its enclosing field takes (a copy of) the value of in.env, which is a pointer to the current environment.

[Figure: Block environment pointing to same as interpreter environment]

The value of blockEnv.enclosing is the memory address of the environment in.env points to, not the memory address of in.env itself.

in := Interpreter{env: &environment{}}

blockEnv := environment{enclosing: in.env}

fmt.Printf("%p\n", &in.env)            // 0xc00011a680
fmt.Printf("%p\n", blockEnv.enclosing) // 0xc00010a500
fmt.Printf("%p\n", in.env)             // 0xc00010a500

If we reassign in.env to a new environment, the address &in.env and the value of blockEnv.enclosing stay the same, while the pointer value of in.env changes:

in.env = &environment{}
fmt.Printf("%p\n", &in.env)            // 0xc00011a680
fmt.Printf("%p\n", blockEnv.enclosing) // 0xc00010a500
fmt.Printf("%p\n", in.env)             // 0xc00010a510

So when we set the interpreter's environment to the new environment we created, it links to the enclosing scope correctly.

[Figure: Interpreter environment pointing to correct block environment]
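
Putting the pieces together, here is a self-contained sketch of the pointer-based version. It rests on a few assumptions: executeBlock is a hypothetical stand-in for VisitBlockStmt that takes a Go function as the block body, get takes a plain string instead of an ast.Token, and the values maps are initialized inline.

```go
package main

import "fmt"

type environment struct {
	enclosing *environment
	values    map[string]interface{}
}

func (e *environment) define(name string, value interface{}) {
	e.values[name] = value
}

func (e *environment) get(name string) (interface{}, error) {
	if val, ok := e.values[name]; ok {
		return val, nil
	}
	if e.enclosing != nil {
		return e.enclosing.get(name)
	}
	return nil, fmt.Errorf("Undefined variable '%s'", name)
}

type Interpreter struct {
	env *environment
}

// executeBlock is a hypothetical stand-in for VisitBlockStmt: it runs
// body inside a fresh scope and restores the previous scope afterwards.
func (in *Interpreter) executeBlock(body func()) {
	blockEnv := &environment{enclosing: in.env, values: map[string]interface{}{}}

	previous := in.env
	defer func() { in.env = previous }()

	in.env = blockEnv
	body()
}

func main() {
	in := &Interpreter{env: &environment{values: map[string]interface{}{}}}
	in.env.define("a", 1)
	in.env.define("b", 2)

	in.executeBlock(func() {
		in.env.define("b", 3)
		a, _ := in.env.get("a")
		b, _ := in.env.get("b")
		fmt.Println(a, b) // 1 3
	})

	b, _ := in.env.get("b")
	fmt.Println(b) // 2
}
```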

Coda

Two questions have helped me understand and use pointers better:

Do I want to point to X itself or share an underlying value with X? (The former means creating a pointer to X, while the latter implies changing X to be a pointer itself and using its pointer value.)
What do I expect to happen when the underlying value of the pointer changes?
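
The difference between the two options in the first question can be sketched with a small example. The settings and holder types and the observe function are hypothetical names used only here.

```go
package main

import "fmt"

type settings struct{ name string }

// holder has one field stored by value and one stored as a pointer.
type holder struct {
	value settings
	ptr   *settings
}

// observe takes a pointer to the value field itself and a copy of the
// pointer field, then reassigns both fields and reports what each
// pointer sees afterwards.
func observe() (viaField, viaShared string) {
	h := holder{value: settings{name: "old"}, ptr: &settings{name: "old"}}

	toField := &h.value // points to the field itself
	shared := h.ptr     // shares the value that h.ptr points to

	h.value = settings{name: "new"}
	h.ptr = &settings{name: "new"}

	return toField.name, shared.name
}

func main() {
	viaField, viaShared := observe()
	fmt.Println(viaField)  // new: the field's address is the same, its contents changed
	fmt.Println(viaShared) // old: still points to the original settings
}
```

The first pointer follows whatever is stored in the field, which is what made the block environment enclose itself. The second keeps referring to the original value, which is what the enclosing scope needs.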

While working on the problem in this post, I also learned about the memory layout of structs and how structure padding affects the sizes of structs, both of which are relevant to understanding how structs and pointers work in Go.
