HTT Documentation

Implementing types in Go

This example implements some of the AST nodes from the book, Writing an Interpreter in Go, specifically as implemented in this git repository.

You will see nodes generally come in two types, Statement and Expression, implementing the empty method of either to mark the type of node. Beyond this, nodes implement String and TokenLiteral, and sometimes these are implemented in the same way.

Getting Started

We start by creating a template file, ast.htt and defining the package, implementing the base types and so on. All of this is regular Go code:

% <main>
package ast;

import (
	"bytes"
	"strings"

	"github.com/zanshin/interpreter/token"
)

type Node interface {
	TokenLiteral() string
	String() string
}

type Statement interface {
	Node
	statementNode()
}

type Expression interface {
	Node
	expressionNode()
}
% </main>
Rendering this essentially produces the same output:
package ast;
            
            import (
            	"bytes"
            	"strings"
            
            	"github.com/zanshin/interpreter/token"
            )
            
            type Node interface {
            	TokenLiteral() string
            	String() string
            }
            
            type Statement interface {
            	Node
            	statementNode()
            }
            
            type Expression interface {
            	Node
            	expressionNode()
            }

Rendering Nodes

Now let's start rendering nodes.

Defining the model

To start, let's define some data describing the nodes we want. When speaking of code generation, the data we are generating code from is often called the model. For larger projects, I would store the model in a separate file. That way, template files could refer to it.

For small, self-contained projects, we can embed the model with the template. When compiling a template file (ast.htt), HTT will look for a corresponding Lua file (ast.htt.lua) matching the name of the template file, with the prefix .lua.

Hence, here is the ast.htt.lua file to accompany our ast.htt template file:

function nodes()
    return {
        {
			is_expr = true,
			short = "i",
			name = "Identifier",
			fields = {
				{ name = "Value", type = "string" }
			}
		},
        {
			is_expr = true,
			short = "il",
			name = "IntegerLiteral",
			fields = {
				{ name = "Value", type = "int64" }
			}
		},
    }
end

The code in ast.htt.lua is, as the name implies, regular Lua code. The function nodes returns a list (table) of (associative) tables, each describing a type of AST node. I like to wrap the data in a function to ensure I cannot modify global state and impact other rendering code.

Defining how to render a node

To use this data, let's define a component, node, and just render out the name of the component for now:

% <node>
{{ctx.name}}
% </node>

Defining how to render all nodes

To render all nodes, we define a component render_nodes, which loops over each entry in the model (the nodes function in ast.htt.lua) and renders it:
% <render_nodes>
% for _, elem in ipairs(nodes()) do
{{@ node elem }}
% end
% </render_nodes>

You could add these lines directly to main. I normally would. But this way I can show you changes to this component in isolation.

Putting it all together

Finally, we call render_nodes from our main component. Since we have no data to pass to render_nodes, we pass an empty Lua table ({}). Our main component, thus becomes:

% <main>
package ast;

import (
	"bytes"
	"strings"

	"github.com/zanshin/interpreter/token"
)

type Node interface {
	TokenLiteral() string
	String() string
}

type Statement interface {
	Node
	statementNode()
}

type Expression interface {
	Node
	expressionNode()
}

{{@ render_nodes {} }}
% </main>

Output

package ast;
            
            import (
            	"bytes"
            	"strings"
            
            	"github.com/zanshin/interpreter/token"
            )
            
            type Node interface {
            	TokenLiteral() string
            	String() string
            }
            
            type Statement interface {
            	Node
            	statementNode()
            }
            
            type Expression interface {
            	Node
            	expressionNode()
            }
            
            Identifier
            IntegerLiteral
Success! We see the names of the nodes we defined in our model below the Go code we started out with. Now we can start defining in earnest how the AST nodes are to be rendered.

Rendering the Node struct

Let's start by rendering the struct for east AST node:
% <node>
type {{ctx.name}} struct {
    Token token.Token
    % for _, field in ipairs(ctx.fields) do
    {{field.name}} {{field.type}}
    % end
}
% </node>
The nodes now render as:
type Identifier struct {
                Token token.Token
                Value string
            }
            type IntegerLiteral struct {
                Token token.Token
                Value int64
            }
Before moving on, let's add some whitespace between components. You can do this in multiple ways, but inserting an empty line ahead of rendering each component in loop body of render_nodes works well:
% <render_nodes>
% for _, elem in ipairs(nodes()) do

{{@ node elem }}
% end
% </render_nodes>
Now we get:

            type Identifier struct {
                Token token.Token
                Value string
            }
            
            type IntegerLiteral struct {
                Token token.Token
                Value int64
            }

Implementing the node interface

For now, we only handle nodes whose String and TokenLiteral implementations both return the value of .Token.Literal. Beyond that, we must implement either the empty function expressionNode() or statementNode(), depending on the type of node, as identified by the is_expr field for the node in our model:
% <node>
type {{ctx.name}} struct {
    Token token.Token
    % for _, field in ipairs(ctx.fields) do
    {{field.name}} {{field.type}}
    % end
}

% if ctx.is_expr == true then
func ({{ctx.short}} *{{ctx.name}}) expressionNode() {}
% else
func ({{ctx.short}} *{{ctx.name}}) statementNode() {}
% end

func ({{ctx.short}} *{{ctx.name}}) TokenLiteral() string {
  return {{ctx.short}}.Token.Literal
}

func ({{ctx.short}} *{{ctx.name}}) String() string {
  return {{ctx.short}}.Token.Literal
}
% </node>
Now, the code for the nodes themselves becomes:

            type Identifier struct {
                Token token.Token
                Value string
            }
            
            func (i *Identifier) expressionNode() {}
            
            func (i *Identifier) TokenLiteral() string {
              return i.Token.Literal
            }
            
            func (i *Identifier) String() string {
              return i.Token.Literal
            }
            
            type IntegerLiteral struct {
                Token token.Token
                Value int64
            }
            
            func (il *IntegerLiteral) expressionNode() {}
            
            func (il *IntegerLiteral) TokenLiteral() string {
              return il.Token.Literal
            }
            
            func (il *IntegerLiteral) String() string {
              return il.Token.Literal
            }

Overriding the String method

Some nodes implement a custom String() method to stringify themselves and their contents. To support this, let us extend the model such that each node can define a string_fn attribute, pointing to a component which implements the String() method.

Referencing components from the model.

Recall that we defined our components in the template ast.htt and put our accompanying model in ast.htt.lua, and that it is these two files combined which will

To support this, we need a way to reference the components to use for implementing the String() method. One way to solve this is to send along a reference to the module generated from the template file itself.

All templates can use the variable M to refer to their own modules. So we can just send this along to the nodes() function which returns the model:

% <render_nodes>
% for _, elem in ipairs(nodes(M)) do

{{@ node elem }}
% end
% </render_nodes>

From the model function, we can now refer to components in the template by their name. So m.node would refer to the node component defined in ast.htt

function nodes(m)
    -- `m` is a reference to the template module, now we can reference
    -- components like `node` as `m.node`.
end

With this change, we update the model, expanding the number of nodes we implement. Some, like the ReturnStatement node, will now use a custom component (here: return_statement_string) to implement the body of their String() method.

function nodes(m)
	return {
		{
			is_expr = true,
			short = "i",
			name = "Identifier",
			fields = {
				{ name = "Value", type = "string" }
			}
		},
		{
			is_expr = false,
			short = "rs",
			name = "ReturnStatement",
			fields = {
				{ name = "ReturnValue", type = "Expression" }
			},
			string_fn = m.return_statement_string,
		},
		{
			is_expr = false,
			short = "es",
			name = "ExpressionStatement",
			fields = {
				{ name = "Expression", type = "Expression" }
			},
			string_fn = m.expr_statement_string,
		},
		{
			is_expr = true,
			short = "il",
			name = "IntegerLiteral",
			fields = {
				{ name = "Value", type = "int64" }
			}
		},
		{
			is_expr = true,
			short = "pe",
			name = "PrefixExpression",
			fields = {
				{ name = "Operator", type = "string" },
				{ name = "Right",    type = "Expression" },
			},
			string_fn = m.prefix_expr_string,
		}
	}
end

Composing components

The key change in the template is how we implement node Notice now that we check if ctx.string_fn is defined, and if so, renders that component:

% <node>
type {{ctx.name}} struct {
    Token token.Token
    % for _, field in ipairs(ctx.fields) do
    {{field.name}} {{field.type}}
    % end
}

% if ctx.is_expr == true then
func ({{ctx.short}} *{{ctx.name}}) expressionNode() {}
% else
func ({{ctx.short}} *{{ctx.name}}) statementNode() {}
% end

func ({{ctx.short}} *{{ctx.name}}) TokenLiteral() string {
  return {{ctx.short}}.Token.Literal
}

func ({{ctx.short}} *{{ctx.name}}) String() string {
  % if ctx.string_fn then
  {{@ ctx.string_fn ctx }}
  % else
  return {{ctx.short}}.Token.Literal
  % end
}
% </node>

This demonstrates how HTT components can be passed as arguments to other components which can render them. Also notice that since all compontents take exactly one table argument, we can pass all arguments along by passing ctx. These two facts combine to make components very composable.

Implementing the custom String() methods

The final change to the ast.htt template file is implementing the components for the custom String() methods.

There is nothing to these components aside from noting that since we passed the entire ctx from the node component along to these, we still have access to attributes like short from the model describing the node.

% <return_statement_string>
var out bytes.Buffer

out.WriteString(rs.Token.Literal + " ")
if rs.ReturnValue != nil {
	out.WriteString(rs.ReturnValue.String())
}

out.WriteString(";")

return out.String()
% </return_statement_string>
% <expr_statement_string>
if {{ctx.short}}.Expression != nil {
	return {{ctx.short}}.Expression.String()
}
return ""
% </expr_statement_string>
% <prefix_expr_string>
var out bytes.Buffer

out.WriteString("(")
out.WriteString({{ctx.short}}.Operator)
out.WriteString({{ctx.short}}.Right.String())
out.WriteString(")")

return out.String()
% </prefix_expr_string>

Output


            type Identifier struct {
                Token token.Token
                Value string
            }
            
            func (i *Identifier) expressionNode() {}
            
            func (i *Identifier) TokenLiteral() string {
              return i.Token.Literal
            }
            
            func (i *Identifier) String() string {
              return i.Token.Literal
            }
            
            type ReturnStatement struct {
                Token token.Token
                ReturnValue Expression
            }
            
            func (rs *ReturnStatement) statementNode() {}
            
            func (rs *ReturnStatement) TokenLiteral() string {
              return rs.Token.Literal
            }
            
            func (rs *ReturnStatement) String() string {
              var out bytes.Buffer
              
              out.WriteString(rs.Token.Literal + " ")
              if rs.ReturnValue != nil {
              	out.WriteString(rs.ReturnValue.String())
              }
              
              out.WriteString(";")
              
              return out.String()
            }
            
            type ExpressionStatement struct {
                Token token.Token
                Expression Expression
            }
            
            func (es *ExpressionStatement) statementNode() {}
            
            func (es *ExpressionStatement) TokenLiteral() string {
              return es.Token.Literal
            }
            
            func (es *ExpressionStatement) String() string {
              if es.Expression != nil {
              	return es.Expression.String()
              }
              return ""
            }
            
            type IntegerLiteral struct {
                Token token.Token
                Value int64
            }
            
            func (il *IntegerLiteral) expressionNode() {}
            
            func (il *IntegerLiteral) TokenLiteral() string {
              return il.Token.Literal
            }
            
            func (il *IntegerLiteral) String() string {
              return il.Token.Literal
            }
            
            type PrefixExpression struct {
                Token token.Token
                Operator string
                Right Expression
            }
            
            func (pe *PrefixExpression) expressionNode() {}
            
            func (pe *PrefixExpression) TokenLiteral() string {
              return pe.Token.Literal
            }
            
            func (pe *PrefixExpression) String() string {
              var out bytes.Buffer
              
              out.WriteString("(")
              out.WriteString(pe.Operator)
              out.WriteString(pe.Right.String())
              out.WriteString(")")
              
              return out.String()
            }

Homework: refactoring

I wanted to implement this piece-meal, with you looking on, to get a sense of the process rather than just seeing the finished thing. As a next step, I would refactor the model out into its own file, ast_model.lua. If we did this, we would not have to pass the model (self) reference (M) from the template to the nodes() model function in ast.htt.lua.

To achieve this, we would:

  1. move the model code from ast.htt.lua to ast_model.lua
  2. in ast.htt, import the module by adding % local model = require 'ast_model' to the top of the file
  3. in ast.htt, change the call to nodes(M) to model.nodes()
  4. in ast_model.lua, import the template module by adding local tpl = require '//ast.htt' to the top of the file.
  5. in ast_model.lua, change all references to the model m.component to tpl.component.