Building an Expression Evaluator (continued)

March 17, 2021

In my previous post, we explored how expression evaluation works and built a program, evaluate, that returns the result of an arithmetic expression.

Here, we'll extend the evaluator with a few more features. We'll add support for predefined functions, like MAX and SQRT. Then, we'll add relational operations: three new operators (<, >, =) and an IF function. And finally, we'll create an environment: a place to store and retrieve the results of an expression.

Let's start with a recap of the evaluation process. In the first stage of evaluation, we extract the tokens in the arithmetic expression string. Tokens are the meaningful units of an expression. Numbers and symbols are tokens, but whitespace characters aren't.

tokenize('2 + 3 * (5 + 4)'); // [2, '+', 3, '*', '(', 5, '+', 4, ')']

In the second stage, we convert the tokens from infix notation to Reverse Polish notation (RPN). In infix notation, which you are probably more familiar with, we place operators between their operands. For example, 1 + 2. In RPN, also called postfix notation, we place operators after their operands. For example, 1 2 +.

Unlike infix notation, RPN doesn't need parentheses to indicate the order of evaluation of an expression. And so, it's easier to evaluate than infix notation.

toRPN([2, '+', 3, '*', '(', 5, '+', 4, ')']); // [2, 3, 5, 4, '+', '*', '+']

In the final stage, we evaluate the RPN expression.

evalRPN([2, 3, 5, 4, '+', '*', '+']); // 29

Functions

We'll predefine two functions in the evaluator: MAX and SQRT. MAX(a, b) will return the larger of a and b, while SQRT will return the square root of a.

evaluate('45 + SQRT(9)'); // 48
evaluate('3 * MAX(4, 19)'); // 57

Tokenizing function names

In the tokenization stage, we need to parse two new types of tokens: function names (like SQRT) and the comma symbol that separates function arguments.

A valid function name may contain one or more uppercase letters. In tokenize, if we find an uppercase letter in the expression, we'll check for the other characters that form the name and push them as a single string to the list of tokens.

if (/[A-Z]/.test(char)) {
  let name = '';

  while (scanner < input.length && /[A-Z]/.test(input[scanner])) {
    name += input[scanner++];
  }

  tokens.push(name);
  continue;
}

If the current token is a comma, we'll push it to the tokens array, as we do the other symbols.

if (/[+\-/*(),^]/.test(char)) {
  tokens.push(char);
  scanner++;
  continue;
}

Evaluating RPNs with function names

After tokenization, we need to convert the tokens to RPN and then evaluate the RPN expression. But we don’t yet know how RPN expressions that contain function names look. Let's try to figure that out first.

We'll take MAX(4, 19) * 3 as an example. To evaluate this expression, we would:

Get the maximum value of 4 and 19, which is 19
Multiply 19 by 3 to get 57

Looking closely at these two steps, you'll notice that we use MAX and * similarly. We "max 4 and 19" like we "multiply 19 and 3". Functions work just like operators: they return the result of operating on arguments (or operands).

So, we can represent MAX(4, 19) * 3 in RPN as 4 19 MAX 3 *.

Remember that, to evaluate an RPN expression, we perform operations from left to right till we get a final value. When we find a number in the expression, we push it to a stack. When we find an operator, we pop the topmost two numbers in the stack, operate on them, and then push the result back onto the stack. At the end of the expression, we return the last result left in the stack.

function evalRPN(rpn) {
  const stack = [];

  for (let i = 0; i < rpn.length; i++) {
    const token = rpn[i];

    if (/[+\-/*^]/.test(token)) {
      stack.push(operate(token, stack));
      continue;
    }

    if (typeof token === 'number') {
      stack.push(token);
      continue;
    }
  }

  return stack.pop();
}

When the current token in the expression is a function name, we'll call a function called apply. And then, we'll push its result back to the stack (just like we do with operators).

if (/^[A-Z]/.test(token)) {
  stack.push(apply(token, stack));
  continue;
}

apply works very much like operate. We'll pop the function's arguments from the stack, and then return the result of applying the function to the arguments.

function apply(func, stack) {
  if (func === 'MAX') {
    const b = stack.pop();
    const a = stack.pop();
    return Math.max(a, b);
  }

  if (func === 'SQRT') {
    const a = stack.pop();
    return Math.sqrt(a);
  }

  throw new Error(`Undefined function: ${func}`);
}

Converting infix expressions with function names to RPN

Finally, we'll see how to convert infix expressions containing function names to RPN.

toRPN(['MAX', '(', 4, ',', 19, ')', '*', 4]); // [4, 19, 'MAX', 4, '*']

Let's start by reviewing how we convert infix expressions to RPN.

function toRPN(tokens) {
  // First, we set up a stack to hold operators that should
  // not yet be in the final RPN expression
  const operators = [];

  // ... and an array to hold the final expression
  const out = [];

  // For each token in the infix expression...
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];

    // If the token is a number, we'll push it onto `out`
    if (typeof token === 'number') {
      out.push(token);
      continue;
    }

    // If the token is an operator...
    if (/[+\-/*^]/.test(token)) {
      // While there are operators in `operators` with higher precedence than
      // the current token, we'll unwind them onto `out`
      while (shouldUnwindOperatorStack(operators, token)) {
        out.push(operators.pop());
      }

      // Then we'll push the token to `operators`
      operators.push(token);
      continue;
    }

    // If the token is a left parenthesis symbol, we'll push it to `operators`
    if (token === '(') {
      operators.push(token);
      continue;
    }

    // If the token is a right parenthesis symbol...
    if (token === ')') {
      // While there are operators in `operators` and we haven't yet reached the
      // last left parenthesis, we'll unwind them onto `out`
      while (operators.length > 0 && operators[operators.length - 1] !== '(') {
        out.push(operators.pop());
      }

      // Then we'll remove the last left parenthesis
      operators.pop();
      continue;
    }
  }

  // Finally we'll unwind all the remaining operators onto `out`
  for (let i = operators.length - 1; i >= 0; i--) {
    out.push(operators[i]);
  }

  // And then return `out`
  return out;
}

Now, we need this function to handle function names and commas. As we've already seen, functions in the RPN expression act like operators. When the current token in the infix expression is a function name, we'll push it to the operators stack.

if (/[A-Z]/.test(token)) {
  operators.push(token);
  continue;
}

Remember that in shouldUnwindOperatorStack(operators, token), we check whether the topmost operator in the operators stack has higher precedence than token. If it does, we pop it out of the operators stack and push it onto out.

We need to decide what the precedence of a function is. In an expression like 5 * MAX(4, 2), we evaluate the function before the multiplication. Functions have higher precedence than their neighboring mathematical operators. We'll unwind the operators stack if the topmost operator is a function, regardless of whatever nextToken is.

const precedence = { '^': 3, '*': 2, '/': 2, '+': 1, '-': 1 };

function shouldUnwindOperatorStack(operators, nextToken) {
  if (operators.length === 0) {
    return false;
  }

  const lastOperator = operators[operators.length - 1];
  return (
    /[A-Z]/.test(lastOperator) || // unwind, if lastOperator is a function
    precedence[lastOperator] >= precedence[nextToken]
  );
}

Next, we'll take a look at commas. To evaluate the infix expression, 3 * MAX(1 + 4, 2 - 8), we would:

Add 1 and 4 together to get 5
Subtract 8 from 2 to get -6
Get the maximum value of 5 (from step 1) and -6 (from step 2), which is 5
Multiply 3 by 5 to get 8

We can represent this evaluation with the RPN: 3 1 4 + 2 8 - MAX *. Looking closely, we see that the comma in the infix expression behaves like a closing parenthesis. The sub-expression between the comma and the opening parenthesis, 1 + 4, has higher precedence than the operations before it. (In other words: we evaluate sub-expressions that are function arguments before evaluating the function itself.)

Just like the closing parenthesis, when the current token in toRPN is a comma, we would unwind the operators in the stack until we get to the most recent opening parenthesis. However, we won't pop out the opening parenthesis from the stack. We'll leave that for the closing parenthesis to do.

if (token === ',') {
  while (operators.length > 0 && operators[operators.length - 1] !== '(') {
    out.push(operators.pop());
  }
  continue;
}

Relational operations

In this next section, we'll implement three relational operations: less than (<), greater than (>), and equal to (=); and a function, IF(a, b, c), which returns b if a is true, or c if it isn't.

We'll start by extending the list of valid operators in the evaluator. /[+\-/*(),^]/ becomes /[+\-/*(),^<>=]/. Then, in the RPN evaluation stage, we'll add the implementations of the relational operators.

function operate(operator, stack) {
  const b = stack.pop();
  const a = stack.pop();

  switch (operator) {
    // ...

    case '<':
      return a < b;
    case '>':
      return a < b;
    case '=':
      return a === b;
  }
}

And next, we'll implement IF(a, b, c) in the apply function we wrote in the first section.

function apply(func, stack) {
  // ...

  if (func === 'IF') {
    const ifFalse = stack.pop();
    const ifTrue = stack.pop();
    const predicate = stack.pop();
    return predicate ? ifTrue : ifFalse;
  }
}

Variables

In this final section, we'll implement an environment in the evaluator. An environment is a collection of variables and their corresponding values.

Let's initialize the environment with a few handy math constants.

const environment = {
  PI: Math.PI,
  E: Math.E,
};

Accessing variables

To access a variable from the environment, we'll use a dollar sign followed by the name of the variable. For example, to calculate the area of a circle with a radius of 5:

evaluate('$PI * 5 * 5'); // 78.5398...

Tokenizing variable names

In tokenize, we'll extend the regex that matches function names to also match variable names. /[A-Z]/ becomes /[A-Z$]/.

tokenize('$PI + 34'); // ['$PI', '+', 34]

Converting infix expressions with variable names to RPN

In the next stage of the evaluation, we convert the infix expression into RPN. Variable names represent values (like numbers), not operations (like operators or functions). So, we'll handle them like numbers.

While converting from infix notation to RPN, when the current token is a variable name, we'll push it to the out array.

if (typeof token === 'number' || /^\$/.test(token)) {
  out.push(token);
  continue;
}

For the infix expression, ['$PI', '+', 34], toRPN will return ['$PI', 34, '+'].

Evaluating RPN expressions with variable names

The final stage of the evaluation is where we apply operators and functions to values in the expression. And so it is here that we replace the variable name with its value.

While evaluating the RPN expression, if the current token is a variable name, we'll attempt to get its value from the environment. If the value doesn't exist, we'll throw an error. But if it does, we'll push the value onto the stack.

if (/^\$/.test(token)) {
  const value = environment[token.slice(1)]; // .slice(1) removes the '$' character

  if (value === undefined) {
    throw new Error(`${token} is undefined`);
  }

  stack.push(value);
  continue;
}

After updating all three sections, our program is now able to evaluate expressions that contain variables from the environment.

evaluate('$PI * 78 + $E'); // 247.7625088084629

Setting variables

Next, we'll look at a way to set and change the values of variables. We'll add a function, SET(#X, Y), which sets the value of variable X to Y.

Why do we use a different syntax, #X, to reference variable X when we already have $X?

The distinction between $X and #X is that $X represents the value of the variable. If we evaluate the expression, SET($X, 50), the evaluator will attempt to get the value of X from the environment and then try to set the value of that value to 50. If $X was previously 20, SET($X, 50) becomes SET(20, 50). That's not what we want.

What we want is a way to point to the place of the variable. SET(#X, Y) means: set the value of the place called X to Y. We'll refer to #X as a variable pointer to X.

(We can use both a variable and its pointer in an expression. For example, SET(#X, $X + 1) increments the value of X by 1.)

Tokenizing variable pointer names and SET

We'll parse the variable pointer name like we parse function names and variable names. We'll add the hash character, #, to the regex that matches names in tokenize() and toRPN().

tokenize('SET(#X, 50)'); // ['SET', '(', '#X', ',', 50, ')']
toRPN(['SET', '(', '#X', ',', 50, ')']); // ['#X', 50, 'SET']

Evaluating RPN expressions with variable pointer names and SET

Remember that in evalRPN, when the current token is a number, we push the number to the stack for the operators and functions to pick up. We'll do the same for the variable pointer names so that SET picks them up.

if (typeof token === 'number' || /^\#/.test(token)) {
  stack.push(token);
  continue;
}

In the implementation of SET, we'll pop the last two items from the stack, which will be the variable name and its new value. We'll update the environment and then return the value we set.

if (func === 'SET') {
  const value = stack.pop();
  const key = stack.pop();
  environment[key.slice(1)] = value; // .slice(1) removes the '#' character
  return value;
}

Conclusion

We added a handful of features to the evaluator in this post. Our nifty, little program can now evaluate predefined functions, relational operations, and variables.

There are many more places to go in the world of expression evaluators: string primitives, other predefined functions, lambdas, and more. But I'll have to leave them up to you to explore. Thanks for reading.

Try out the evaluator online or check out the complete code on Github.