This is a POSIX-compliant* shell written in Go.
*: (not really)
The following features are checked if they are implemented at all; a check does not mean the feature is complete or bug-free.
- Simple commands
- Environment variables
- Parameter expansion
- Strings
- Redirection
- Piping
- Variable assignment
- Control flow
To build psh, use the Makefile:

```shell
$ make psh
$ ./psh -t '/bin/echo hello world'
```
To run the tests, you will need to:

- Install dependencies managed by Glide. First, install Glide.
- Run `glide install`.
- Run `make test`.
There are three forms of for loops. The first iterates over an explicit list of words:

```shell
for x in `seq 1 10`; do
    echo $x
done
```
This is interpreted as an empty sequence, so no loop iterations are done (e.g. if your sequence comes from an empty shell variable, the loop does nothing):

```shell
for x in; do
    echo $x
done
```
In this case, the loop iterates over `$@`, the command line arguments:

```shell
for x; do
    echo $x
done
```
There is no order of operations for And/Or operators.
> The operators "&&" and "||" shall have equal precedence and shall be evaluated with left associativity. For example, both of the following commands write solely bar to standard output:

```shell
false && echo foo || echo bar
true || echo foo && echo bar
```
> Tilde expansions, parameter expansions, command substitutions, arithmetic expansions, and quote removals that occur within a single word expand to a single field. It is only field splitting or pathname expansion that can create multiple fields from a single word. The single exception to this rule is the expansion of the special parameter '@' within double-quotes, as described in Special Parameters.
We can execute a single command stored in an environment variable:

```shell
bash> FOO='echo wumbo'; $FOO
wumbo
```
We cannot execute multiple commands stored in a variable:

```shell
bash> FOO='echo wumbo; echo mini'; $FOO
wumbo; echo mini
```
The executed command does not expand variables:

```shell
bash> export Y=thisisy
bash> FOO='echo $Y'; $FOO
$Y
```
We can use `eval` to make this work though:

```shell
bash> export Y=thisisy
bash> FOO='echo $Y'; eval $FOO
thisisy
```
There are four pieces:

- The `Lexer`, which converts an input string to a stream of tokens
- The `Parser`, which converts a stream of tokens to a parse tree representing the shell program
- The `Interpreter`, which "executes" the parse tree
- The CLI, which gets input from a user and orchestrates the lexing, parsing, and interpretation of that input
This is modeled after the Golang template lexer, which Rob Pike describes in his talk "Lexical Scanning in Go".
The `Lexer` type runs a loop that advances through "state functions". In general, a state function will:

- Read the next character(s) in the stream
- Emit token(s)
- Return the next state (which is picked up by the lexer)
This makes writing the lexer fairly straightforward and readable (see `lex/states.go`).
Tokens are emitted onto a golang channel. I actually don't like using the channel for a couple of reasons:
- The token channel introduces concurrency where we probably don't need it. (There may be some tiny performance benefit, but I doubt it is noticeable.)
- It makes debug logging from the lexer and other stages appear out of order.
But channels are idiomatic for golang and it wasn't problematic.
In comparison to the Golang template lexer, the psh lexer makes a modification to the state function signature to include a next state:
```go
// the golang template lexer's version of state functions
type stateFn func(*Lexer) stateFn

// the version of state functions in psh
type stateFn func(*Lexer, stateFn) stateFn
```
Why do I want a state function that accepts the next state? Well, it is easier to compose state functions into chains, or to "resume" in a previous state.
For example, the psh lexer must be able to tokenize a parameter expansion both inside and outside a string.
```
${X}                  -- lexDollarExpansion
Dollar     "$"        -- These are the tokens for a "bare" parameter expansion
LeftBrace  "{"
Name       "X"
RightBrace "}"

"abc${X}def"
Quote      '"'        -- lexDoubleQuotedString
Word       "abc"
                      -- lexDollarExpansion
Dollar     "$"        -- These tokens inside a string are identical to those above
LeftBrace  "{"
Name       "X"
RightBrace "}"
Word       "def"
Quote      '"'
```
And there is recursion, like `"abc${X:-"${Y:-"$Z"}"}def"`, where you have strings in parameter expansions in strings in parameter expansions in strings in ..., which I needed to support.
In order to lex these, I wrote two state functions:

- `lexDollarExpansion`, which tokenizes a parameter expansion
- `lexDoubleQuotedString`, which tokenizes a string, and calls out to `lexDollarExpansion` where it needs to
Simple!!
Well, not really. There were some issues trying to use unmodified state functions for this use case:
- I cannot test state functions for equality. If I execute one state function, it returns the next state. But which one? I have no way to know where I'm at, which precludes making intelligent decisions. I could maybe use hacks, like reflection, or I could wrap state functions in a struct, but this would muddy the code.
- A state function can return any state, which could return other states and so on. If I want state A to delegate to state B to consume a specific portion of the text (and then have state A pick up where B left off), I first need to know how many states to advance through after invoking B - which is possible, but that information (e.g. "how many states until I've tokenized from left brace to right brace?") is implicit in the code. Relying on that sort of implicit information makes the code brittle (easy to break when updated), which is undesirable.
- State functions are not "customizable". For example, they do not accept a convenient boolean parameter that says "yes, you are inside a string" in order for one state to sometimes return different states depending on context. Now, the golang template lexer uses additional variables in "state functions", but this changes the type signature of the function (it is no longer of type `stateFn`), and it doesn't scale well since I have to forward each parameter along to possibly many other state functions. Another solution is to store a flag on the `Lexer` type, but this defeats one of the nice things about state functions: the code is the state! Putting state variables into the `Lexer` meant less comprehensible code.
For the solution, I wanted something that was consistent and understandable. I wanted all the state functions to be of type `stateFn`. I wanted to avoid state variables, either as parameters or as flags on `Lexer`. I wanted to avoid testing for states; that is, I should never have to ask "which state am I at?". And I should be able to reuse state functions easily.
The solution was to change the `stateFn` type to receive a `nextState` parameter:

```go
// the golang template lexer's version of state functions
type stateFn func(*Lexer) stateFn

// the version of state functions in psh
type stateFn func(*Lexer, stateFn) stateFn
```
This lets me compose states into chains that are easy to read:

```go
// lex the end of a brace expansion, and then go to `nextState`
func lexBraceExpansionEnd(lx *Lexer, nextState stateFn) stateFn {
    ...
    // "I will get to nextState, but first we need to lex an operator, then
    // some text, then another operator, and then we can go to nextState"
    return composeStates(lx, lexOperator, lexText, lexOperator, nextState)
    ...
}
```
For my string tokenizing problem, I could now say:
```go
// lex the contents of a string, then go to `nextState`
func lexDoubleQuotedStringContents(lx *Lexer, nextState stateFn) stateFn {
    ...
    if c == '$' {
        // lex the dollar expansion, lex more string contents, then go to nextState
        return composeStates(lx, lexDollarExpansion, lexDoubleQuotedStringContents, nextState)
    }
    ...
}
```
The `Parser` is in `ast/parser.go`. It accepts a `Lexer`, consumes all tokens from the token channel, and returns an `ast.Node`.
Rather than having a single enormous Parser class responsible for everything, parsing different bits of the syntax tree is delegated to the different node types. If a node implements `ast.Parselet`, then it can be used to parse a portion of the syntax tree.
The incoming tokens retain their line/position information, which we can use for error messages.
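The actual `ast.Parselet` interface isn't reproduced here, but the delegation pattern might look roughly like this; all of the types and the `Assignment` example are hypothetical stand-ins, not psh's real code:

```go
package main

import "fmt"

// Hypothetical stand-ins; the real types are in ast/parser.go and may differ.
type Token struct {
	Typ, Val string
	Line     int // tokens retain position info for error messages
}

type Node interface {
	String() string
}

type Parser struct {
	tokens []Token
	pos    int
}

func (p *Parser) next() Token { t := p.tokens[p.pos]; p.pos++; return t }

// Parselet: any node type that knows how to parse its own portion of
// the syntax tree.
type Parselet interface {
	Parse(p *Parser) (Node, error)
}

// Assignment parses `Name = Word` into an assignment node.
type Assignment struct{ Name, Value string }

var _ Parselet = (*Assignment)(nil) // Assignment satisfies Parselet

func (a *Assignment) String() string { return a.Name + "=" + a.Value }

func (a *Assignment) Parse(p *Parser) (Node, error) {
	name := p.next()
	if eq := p.next(); eq.Typ != "Assign" {
		// position info from the token feeds the error message
		return nil, fmt.Errorf("line %d: expected '=', got %q", eq.Line, eq.Val)
	}
	val := p.next()
	return &Assignment{Name: name.Val, Value: val.Val}, nil
}

func main() {
	p := &Parser{tokens: []Token{
		{"Name", "FOO", 1}, {"Assign", "=", 1}, {"Word", "wumbo", 1},
	}}
	node, err := (&Assignment{}).Parse(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(node) // FOO=wumbo
}
```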
Strings felt a bit weird. For example, the string `"abc${X}def"` has three pieces:

- the plain string `"abc"`
- the parameter expansion `${X}`
- the plain string `"def"`
But, a string like this could represent a single command we need to execute, so we want to be able to treat it as a single entity.
I ended up with an interface called `ast.StrPiece`. The `ast.Str` type stores a list of `ast.StrPiece`. This lets me store both plain string and parameter expansion node types in an `ast.Str` node:
```go
&ast.Str{
    Pieces: []StrPiece{
        &ast.RawStr{"abc"},
        &ast.ParameterExpansion{VarName: "X"},
        &ast.RawStr{"def"},
    },
}
```
This gets passed along to the interpreter which knows how to evaluate and concatenate the different string pieces into a single string, which we can then insert into an environment variable or wherever.