This example implements some of the AST nodes from the book, Writing an Interpreter in Go, specifically as implemented in this git repository.
You will see nodes generally come in two types, Statement
and Expression
, implementing the empty method of either to mark the type of node.
Beyond this, nodes implement String
and TokenLiteral
, and sometimes these are implemented in the same way.
We start by creating a template file, ast.htt
and defining the package, implementing the base types and so on. All of this is regular Go code:
% <main>
package ast;
import (
"bytes"
"strings"
"github.com/zanshin/interpreter/token"
)
type Node interface {
TokenLiteral() string
String() string
}
type Statement interface {
Node
statementNode()
}
type Expression interface {
Node
expressionNode()
}
% </main>
package ast;
import (
"bytes"
"strings"
"github.com/zanshin/interpreter/token"
)
type Node interface {
TokenLiteral() string
String() string
}
type Statement interface {
Node
statementNode()
}
type Expression interface {
Node
expressionNode()
}
Now let's start rendering nodes.
To start, let's define some data describing the nodes we want. When speaking of code generation, the data we are generating code from is often called the model. For larger projects, I would store the model in a separate file. That way, template files could refer to it.
For small, self-contained projects, we can embed the model with the template. When compiling a template file (ast.htt
), HTT will look for a corresponding Lua file (ast.htt.lua
) matching the name of the template file, with the prefix .lua
.
Hence, here is the ast.htt.lua
file to accompany our ast.htt
template file:
function nodes()
return {
{
is_expr = true,
short = "i",
name = "Identifier",
fields = {
{ name = "Value", type = "string" }
}
},
{
is_expr = true,
short = "il",
name = "IntegerLiteral",
fields = {
{ name = "Value", type = "int64" }
}
},
}
end
The code in ast.htt.lua
is, as the name implies, regular Lua code. The function nodes
returns a list (table) of (associative) tables, each describing a type of AST node. I like to wrap the data in a function to ensure I cannot modify global state and impact other rendering code.
To use this data, let's define a component, node
, and just render out the name of the component for now:
% <node>
{{ctx.name}}
% </node>
render_nodes
, which loops over each entry in the model (the nodes
function in ast.htt.lua
) and renders it:
% <render_nodes>
% for _, elem in ipairs(nodes()) do
{{@ node elem }}
% end
% </render_nodes>
You could add these lines directly to main
. I normally would. But this way I can show you changes to this component in isolation.
Finally, we call render_nodes
from our main
component. Since we have no data to pass to render_nodes
, we pass an empty Lua table ({}
).
Our main component, thus becomes:
% <main>
package ast;
import (
"bytes"
"strings"
"github.com/zanshin/interpreter/token"
)
type Node interface {
TokenLiteral() string
String() string
}
type Statement interface {
Node
statementNode()
}
type Expression interface {
Node
expressionNode()
}
{{@ render_nodes {} }}
% </main>
package ast;
import (
"bytes"
"strings"
"github.com/zanshin/interpreter/token"
)
type Node interface {
TokenLiteral() string
String() string
}
type Statement interface {
Node
statementNode()
}
type Expression interface {
Node
expressionNode()
}
Identifier
IntegerLiteral
% <node>
type {{ctx.name}} struct {
Token token.Token
% for _, field in ipairs(ctx.fields) do
{{field.name}} {{field.type}}
% end
}
% </node>
type Identifier struct {
Token token.Token
Value string
}
type IntegerLiteral struct {
Token token.Token
Value int64
}
render_nodes
works well:
% <render_nodes>
% for _, elem in ipairs(nodes()) do
{{@ node elem }}
% end
% </render_nodes>
type Identifier struct {
Token token.Token
Value string
}
type IntegerLiteral struct {
Token token.Token
Value int64
}
String
and TokenLiteral
implementations both return the value of .Token.Literal
.
Beyond that, we must implement either the empty function expressionNode()
or statementNode()
, depending on the type of node, as identified by the is_expr
field for the node in our model:
% <node>
type {{ctx.name}} struct {
Token token.Token
% for _, field in ipairs(ctx.fields) do
{{field.name}} {{field.type}}
% end
}
% if ctx.is_expr == true then
func ({{ctx.short}} *{{ctx.name}}) expressionNode() {}
% else
func ({{ctx.short}} *{{ctx.name}}) statementNode() {}
% end
func ({{ctx.short}} *{{ctx.name}}) TokenLiteral() string {
return {{ctx.short}}.Token.Literal
}
func ({{ctx.short}} *{{ctx.name}}) String() string {
return {{ctx.short}}.Token.Literal
}
% </node>
type Identifier struct {
Token token.Token
Value string
}
func (i *Identifier) expressionNode() {}
func (i *Identifier) TokenLiteral() string {
return i.Token.Literal
}
func (i *Identifier) String() string {
return i.Token.Literal
}
type IntegerLiteral struct {
Token token.Token
Value int64
}
func (il *IntegerLiteral) expressionNode() {}
func (il *IntegerLiteral) TokenLiteral() string {
return il.Token.Literal
}
func (il *IntegerLiteral) String() string {
return il.Token.Literal
}
Some nodes implement a custom String()
method to stringify themselves and their contents.
To support this, let us extend the model such that each node can define a string_fn
attribute, pointing to a component which implements the String()
method.
Recall that we defined our components in the template ast.htt
and put our accompanying model in ast.htt.lua
, and that it is these two files combined which will
To support this, we need a way to reference the components to use for implementing the String()
method. One way to solve this is to send along a reference to the module generated from the template file itself.
All templates can use the variable M
to refer to their own modules. So we can just send this along to the nodes()
function which returns the model:
% <render_nodes>
% for _, elem in ipairs(nodes(M)) do
{{@ node elem }}
% end
% </render_nodes>
From the model function, we can now refer to components in the template by their name. So m.node
would refer to the node
component defined in ast.htt
function nodes(m)
-- `m` is a reference to the template module, now we can reference
-- components like `node` as `m.node`.
end
With this change, we update the model, expanding the number of nodes we implement. Some, like the ReturnStatement
node, will now use a custom component (here: return_statement_string
) to implement the body of their String()
method.
function nodes(m)
return {
{
is_expr = true,
short = "i",
name = "Identifier",
fields = {
{ name = "Value", type = "string" }
}
},
{
is_expr = false,
short = "rs",
name = "ReturnStatement",
fields = {
{ name = "ReturnValue", type = "Expression" }
},
string_fn = m.return_statement_string,
},
{
is_expr = false,
short = "es",
name = "ExpressionStatement",
fields = {
{ name = "Expression", type = "Expression" }
},
string_fn = m.expr_statement_string,
},
{
is_expr = true,
short = "il",
name = "IntegerLiteral",
fields = {
{ name = "Value", type = "int64" }
}
},
{
is_expr = true,
short = "pe",
name = "PrefixExpression",
fields = {
{ name = "Operator", type = "string" },
{ name = "Right", type = "Expression" },
},
string_fn = m.prefix_expr_string,
}
}
end
The key change in the template is how we implement node
Notice now that we check if ctx.string_fn
is defined, and if so, renders that component:
% <node>
type {{ctx.name}} struct {
Token token.Token
% for _, field in ipairs(ctx.fields) do
{{field.name}} {{field.type}}
% end
}
% if ctx.is_expr == true then
func ({{ctx.short}} *{{ctx.name}}) expressionNode() {}
% else
func ({{ctx.short}} *{{ctx.name}}) statementNode() {}
% end
func ({{ctx.short}} *{{ctx.name}}) TokenLiteral() string {
return {{ctx.short}}.Token.Literal
}
func ({{ctx.short}} *{{ctx.name}}) String() string {
% if ctx.string_fn then
{{@ ctx.string_fn ctx }}
% else
return {{ctx.short}}.Token.Literal
% end
}
% </node>
This demonstrates how HTT components can be passed as arguments to other components which can render them. Also notice that since all compontents take exactly one table argument, we can pass all arguments along by passing ctx
.
These two facts combine to make components very composable.
The final change to the ast.htt
template file is implementing the components for the custom String()
methods.
There is nothing to these components aside from noting that since we passed the entire ctx
from the node
component along to these, we still have access to attributes like short
from the model describing the node.
% <return_statement_string>
var out bytes.Buffer
out.WriteString(rs.Token.Literal + " ")
if rs.ReturnValue != nil {
out.WriteString(rs.ReturnValue.String())
}
out.WriteString(";")
return out.String()
% </return_statement_string>
% <expr_statement_string>
if {{ctx.short}}.Expression != nil {
return {{ctx.short}}.Expression.String()
}
return ""
% </expr_statement_string>
% <prefix_expr_string>
var out bytes.Buffer
out.WriteString("(")
out.WriteString({{ctx.short}}.Operator)
out.WriteString({{ctx.short}}.Right.String())
out.WriteString(")")
return out.String()
% </prefix_expr_string>
type Identifier struct {
Token token.Token
Value string
}
func (i *Identifier) expressionNode() {}
func (i *Identifier) TokenLiteral() string {
return i.Token.Literal
}
func (i *Identifier) String() string {
return i.Token.Literal
}
type ReturnStatement struct {
Token token.Token
ReturnValue Expression
}
func (rs *ReturnStatement) statementNode() {}
func (rs *ReturnStatement) TokenLiteral() string {
return rs.Token.Literal
}
func (rs *ReturnStatement) String() string {
var out bytes.Buffer
out.WriteString(rs.Token.Literal + " ")
if rs.ReturnValue != nil {
out.WriteString(rs.ReturnValue.String())
}
out.WriteString(";")
return out.String()
}
type ExpressionStatement struct {
Token token.Token
Expression Expression
}
func (es *ExpressionStatement) statementNode() {}
func (es *ExpressionStatement) TokenLiteral() string {
return es.Token.Literal
}
func (es *ExpressionStatement) String() string {
if es.Expression != nil {
return es.Expression.String()
}
return ""
}
type IntegerLiteral struct {
Token token.Token
Value int64
}
func (il *IntegerLiteral) expressionNode() {}
func (il *IntegerLiteral) TokenLiteral() string {
return il.Token.Literal
}
func (il *IntegerLiteral) String() string {
return il.Token.Literal
}
type PrefixExpression struct {
Token token.Token
Operator string
Right Expression
}
func (pe *PrefixExpression) expressionNode() {}
func (pe *PrefixExpression) TokenLiteral() string {
return pe.Token.Literal
}
func (pe *PrefixExpression) String() string {
var out bytes.Buffer
out.WriteString("(")
out.WriteString(pe.Operator)
out.WriteString(pe.Right.String())
out.WriteString(")")
return out.String()
}
I wanted to implement this piece-meal, with you looking on, to get a sense of the process rather than just seeing the finished thing.
As a next step, I would refactor the model out into its own file, ast_model.lua
. If we did this, we would not have to pass the model (self) reference (M
) from the template to the nodes()
model function in ast.htt.lua
.
To achieve this, we would:
ast.htt.lua
to ast_model.lua
ast.htt
, import the module by adding % local model = require 'ast_model'
to the top of the fileast.htt
, change the call to nodes(M)
to model.nodes()
ast_model.lua
, import the template module by adding local tpl = require '//ast.htt'
to the top of the file.ast_model.lua
, change all references to the model m.component
to tpl.component
.