How rsh Code Gets Run
As you probably noticed, rsh behaves quite differently from other shells and dynamic languages. In Thinking in Rsh, we advise you to think of rsh as a compiled language but we do not give much insight into why. This section hopefully fills the gap.
First, let's give a few examples which you might intuitively try but which do not work in rsh.
- Sourcing a dynamic path
source $"($my_path)/common.rsh"
- Write to a file and source it in a single script
"def abc [] { 1 + 2 }" | save output.rsh
source "output.rsh"
- Change a directory and source a path within (even though the file exists)
if ('spam/foo.rsh' | path exists) {
    cd spam
    source-env foo.rsh
}
The underlying reason why none of the above examples work is a strict separation of the parsing and evaluation steps, enforced by disallowing an eval function. In the rest of this section, we'll explain in detail what that means, why we're doing it, and what the implications are. The explanation aims to be as simple as possible, but it might help if you've written a program in some language before.
Parsing and Evaluation
Interpreted Languages
Let's start with a simple "hello world" rsh program:
# hello.rsh
print "Hello world!"
When you run rsh hello.rsh, rsh's interpreter directly runs the program and prints the result to the screen. This is similar (on the highest level) to other languages that are typically interpreted, such as Python or Bash. If you write a similar "hello world" program in any of these languages and call python hello.py or bash hello.bash, the result will be printed to the screen. We can say that interpreters take the program in some representation (e.g., source code), run it, and give you the result:
source code --> interpreting --> result
Under the hood, rsh's interpreter is split into two parts, like this:
1. source code --> parsing --> Intermediate Representation (IR)
2. IR --> evaluating --> result
First, the source code is analyzed by the parser and converted into an intermediate representation (IR), which in rsh's case is just a set of data structures. Then, these data structures are passed to the engine, which evaluates them and produces the result. This is nothing unusual. For example, Python's source code is typically converted into bytecode before evaluation.
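The same two-step split can be observed in Python, which this page already uses for comparison. The following is a minimal sketch, not a description of rsh's internals: the parse and evaluate helper names are made up for illustration, and Python's AST stands in for the IR.

```python
import ast
import contextlib
import io

SOURCE = 'print("Hello world!")'


def parse(source):
    """Step 1: source code --> parsing --> IR (here, a Python AST)."""
    return ast.parse(source, filename="<hello>")


def evaluate(tree):
    """Step 2: IR --> evaluating --> result (here, the captured output)."""
    code = compile(tree, filename="<hello>", mode="exec")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


tree = parse(SOURCE)      # nothing has been printed yet
result = evaluate(tree)   # only now does the program actually run
```

Keeping the two helpers separate makes it visible that the IR exists as a plain data structure between the two steps and can be inspected before anything runs.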
Compiled Languages
On the other side are languages that are typically "compiled", such as C, C++, or Rust. Assuming a simple "hello world" in Rust
// main.rs
fn main() {
    println!("Hello, world!");
}
you first need to compile the program into machine code instructions and store the binary file on disk (rustc main.rs). Then, to produce a result, you need to run the binary (./main), which passes the instructions to the CPU:
1. source code --> compiler --> machine code
2. machine code --> CPU --> result
You can see the compile-run sequence is not that much different from the parse-evaluate sequence of an interpreter. You begin with source code, parse (or compile) it into some IR (or machine code), then evaluate (or run) the IR to get a result. You could think of machine code as just another type of IR and the CPU as its interpreter.
One big difference, however, between interpreted and compiled languages is that interpreted languages typically implement an eval function while compiled languages do not. What does that mean?
Eval Function
Most languages considered "dynamic" or "interpreted" have an eval function, for example Python (it has two, eval and exec) or Bash. It takes source code and interprets it within a running interpreter. This can get a bit confusing, so let's give a Python example:
# hello_eval.py
print("Hello world!")
eval("print('Hello eval!')")
When you run the file (python hello_eval.py), you'll see two messages: "Hello world!" and "Hello eval!". Here is what happened:
1. Parse the whole source code
2. Evaluate print("Hello world!")
3. To evaluate eval("print('Hello eval!')"):
   3.1. Parse print('Hello eval!')
   3.2. Evaluate print('Hello eval!')
Of course, you can have more fun and try eval("eval(\"print('Hello eval!')\")") and so on...
You can see the eval function adds a new "meta" layer into the code execution. Instead of parsing the whole source code, then evaluating it, there is an extra parse-eval step during the evaluation. This means that the IR produced by the parser (whatever it is) can be further modified during the evaluation.
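One way to observe the extra parse step is to hide a syntax error inside the string passed to eval. The sketch below is only an illustration in Python (the log list and the OUTER_SOURCE name are made up for this example): the outer program parses fine, its first statement runs, and the inner error only surfaces when eval parses its argument during evaluation.

```python
# The outer program is parsed once, up front. The string passed to eval()
# is just data at that point, so its syntax error goes unnoticed.
OUTER_SOURCE = '''
log.append("first statement ran")
eval("1 +")
'''

log = []
outer_code = compile(OUTER_SOURCE, "<outer>", "exec")  # parse step succeeds

try:
    exec(outer_code, {"log": log})  # evaluation step
except SyntaxError:
    # Raised by the nested parse that eval() performs at runtime.
    log.append("nested parse failed during evaluation")
```

After running this, log contains both messages in order, which shows that a parse error was reported after evaluation had already started.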
We've seen that without eval, the difference between compiled and interpreted languages is actually not that big. This is exactly what we mean by thinking of rsh as a compiled language: Despite rsh being an interpreted language, its lack of eval gives it characteristics and limitations typical for traditional compiled languages like C or Rust. We'll dig deeper into what it means in the next section.
Implications
Consider this Python example:
exec("def hello(): print('Hello eval!')")
hello()
Note: We're using exec instead of eval because it can execute all valid Python code, not just expressions. The principle is similar, though.
What happens:
1. Parse the whole source code
2. To evaluate exec("def hello(): print('Hello eval!')"):
   2.1. Parse def hello(): print('Hello eval!')
   2.2. Evaluate def hello(): print('Hello eval!')
3. Evaluate hello()
Note that until step 2.2, the interpreter has no idea a function hello exists! This makes static analysis of dynamic languages challenging. In the example, the existence of the hello function cannot be checked just by parsing (compiling) the source code. You actually need to go and evaluate (run) the code to find out. While in a compiled language a missing function is a guaranteed compile error, in a dynamic interpreted language it is a runtime error (which can slip by unnoticed if the line calling hello() is, for example, behind an if condition and does not get executed).
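The "behind an if condition" case is easy to reproduce in Python. In this small sketch (the run function and its flag parameter are invented for illustration), the call to the undefined hello function is accepted by the parser and only fails when the branch is actually taken.

```python
def run(flag):
    if flag:
        return hello()  # hello is not defined anywhere in this program
    return "ok"


# The faulty branch is skipped, so the missing function goes unnoticed.
untaken = run(False)

# Only when the branch is evaluated does the error appear, at runtime.
try:
    run(True)
    taken = "no error"
except NameError as error:
    taken = f"runtime error: {error}"
```

A test suite that never exercises the flag=True path would not catch this bug.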
In rsh, there are exactly two steps:
- Parse the whole source code
- Evaluate the whole source code
This is the complete parse-eval sequence.
Not having eval-like functionality prevents eval-related bugs from happening. Calling a non-existent function is a guaranteed parse-time error in rsh. Furthermore, after the parse step, we have deep insight into the program and we can be sure it is not going to change during evaluation. This allows for powerful and reliable static analysis and IDE integration, which is challenging to achieve with more dynamic languages. In general, you have more peace of mind when scaling rsh programs to bigger applications.
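To make the static-analysis point concrete, here is a toy checker written in Python. It is a sketch under loud assumptions: undefined_calls is a hypothetical helper, it only looks at calls to plain names, and it has nothing to do with how rsh's parser is implemented. It reports calls to functions that are never defined, using the parsed program alone.

```python
import ast
import builtins


def undefined_calls(source):
    """Return names that are called but never defined, judged by parsing only."""
    tree = ast.parse(source)
    defined = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    }
    known = defined | set(dir(builtins))
    missing = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in known:
                missing.add(node.func.id)
    return sorted(missing)


STATIC_PROGRAM = "def abc():\n    return 1 + 2\n\nabc()\nxyz()\n"
DYNAMIC_PROGRAM = 'exec("def hello(): pass")\nhello()\n'

static_report = undefined_calls(STATIC_PROGRAM)    # xyz really is missing
dynamic_report = undefined_calls(DYNAMIC_PROGRAM)  # hello is flagged, wrongly
```

For the first program the report is reliable. For the second, the checker flags hello even though exec would define it at runtime, which is the kind of uncertainty that a language without eval avoids.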
Before going into examples, one note about the "dynamic" and "static" terminology. Stuff that happens at runtime (during evaluation, after parsing) is considered "dynamic". Stuff that happens before running (during parsing / compilation) is called "static". Languages that have more stuff (such as eval, type checking, etc.) happening at runtime are sometimes called "dynamic". Languages that analyze most of the information (type checking, data ownership, etc.) before evaluating the program are sometimes called "static". The whole debate can get quite confusing, but for the purpose of this text, the main difference between a "static" and a "dynamic" language is whether or not it has an eval function.
Common Mistakes
By insisting on strict parse-evaluation separation, we lose much of the flexibility users expect from dynamic interpreted languages, especially other shells, such as bash, fish, zsh and others. This leads to the examples at the beginning of this page not working. Let's break them down one by one.
Note: The following examples use source, but similar conclusions apply to other commands that parse rsh source code, such as use, overlay use, hide, register or source-env.
1. Sourcing a dynamic path
source $"($my_path)/common.rsh"
Let's break down what would need to happen for this to work (assuming $my_path is set somewhere):
1. Parse source $"($my_path)/common.rsh"
2. To evaluate source $"($my_path)/common.rsh":
   2.1. Parse $"($my_path)/common.rsh"
   2.2. Evaluate $"($my_path)/common.rsh" to get the file name
   2.3. Parse the contents of the file
   2.4. Evaluate the contents of the file
You can see the process is similar to the eval functionality we talked about earlier. Nesting parse-evaluation cycles into the evaluation is not allowed in rsh.
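The distinction between a path known from parsing alone and a path that needs evaluation can be illustrated with Python's own parser. This is only an analogy: is_static_string is a hypothetical helper, and a Python f-string plays the role of the interpolated $"..." string.

```python
import ast


def is_static_string(expression):
    """Return True if the expression is a plain string literal,
    meaning its value is fully known without evaluating anything."""
    body = ast.parse(expression, mode="eval").body
    return isinstance(body, ast.Constant) and isinstance(body.value, str)


# A literal path: the parser already knows the exact file name.
literal = is_static_string("'foo/common.rsh'")

# An interpolated path: the value depends on my_path, known only at runtime.
interpolated = is_static_string('f"{my_path}/common.rsh"')
```

Here literal is True and interpolated is False: the parser can see that the second expression contains a variable, but it cannot know the resulting file name without evaluating it.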
To give another perspective, here is why it is helpful to think of rsh as a compiled language. Instead of
let my_path = 'foo'
source $"($my_path)/common.rsh"
imagine it being written in some typical compiled language, such as C++
#include <string>
std::string my_path("foo");
#include <my_path + "/common.h">
or Rust
let my_path = "foo";
use format!("{}::common", my_path);
If you've ever written a simple program in any of these languages, you can see these examples do not make a whole lot of sense. You need to have all the source code files ready and available to the compiler beforehand.
2. Write to a file and source it in a single script
"def abc [] { 1 + 2 }" | save output.rsh
source "output.rsh"
Here, the sourced path is static (= known at parse-time) so everything should be fine, right? Well... no. Let's break down the sequence again:
1. Parse the whole source code
   1.1. Parse "def abc [] { 1 + 2 }" | save output.rsh
   1.2. Parse source "output.rsh"
      1.2.1. Open output.rsh and parse its contents
2. Evaluate the whole source code
   2.1. Evaluate "def abc [] { 1 + 2 }" | save output.rsh to generate output.rsh
   2.2. ...wait what???
We're asking rsh to read output.rsh before it even exists. All the source code needs to be available to rsh at parse-time, but output.rsh is only generated during evaluation. Again, it helps here to think of rsh as a compiled language.
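The ordering problem can be simulated with a toy two-phase runner in Python. Everything here is an assumption made for illustration: the tuple-based program format and the parse and evaluate helpers are invented and do not reflect rsh's real data structures. The only point is that parse opens every sourced file before evaluate has had a chance to create it.

```python
import os
import tempfile


def parse(program, root):
    """Parse phase: read every sourced file now, before any evaluation."""
    ir = []
    for command, args in program:
        if command == "source":
            # The file must already exist at this point.
            with open(os.path.join(root, args[0])) as handle:
                ir.append(("sourced", handle.read()))
        else:
            ir.append((command, args))
    return ir


def evaluate(ir, root):
    """Evaluation phase: only here does `save` write anything to disk."""
    for command, args in ir:
        if command == "save":
            name, text = args
            with open(os.path.join(root, name), "w") as handle:
                handle.write(text)


program = [
    ("save", ("output.rsh", "def abc [] { 1 + 2 }")),
    ("source", ("output.rsh",)),
]

with tempfile.TemporaryDirectory() as root:
    try:
        evaluate(parse(program, root), root)
        outcome = "ran"
    except FileNotFoundError:
        outcome = "parse error: output.rsh does not exist yet"
    file_was_written = os.path.exists(os.path.join(root, "output.rsh"))
```

The run stops in the parse phase, so the save step is never evaluated and the file is never written.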
3. Change a directory and source a path within
(We assume the spam/foo.rsh file exists.)
if ('spam/foo.rsh' | path exists) {
    cd spam
    source-env foo.rsh
}
This one is similar to the previous example. cd spam changes the directory during evaluation but source-env attempts to open and read foo.rsh during parsing.
REPL
REPL is what happens when you run rsh without any file. You launch an interactive prompt. By
> some code...
we denote a REPL entry followed by pressing Enter. For example
> print "Hello world!"
Hello world!
> ls
# prints files and directories...
means the following:
1. Launch rsh
2. Type print "Hello world!", press Enter
3. Type ls, press Enter
Hopefully, that's clear. Now, when you press Enter, these things happen:
- Parse the line input
- Evaluate the line input
- Merge the environment (such as the current working directory) to the internal rsh state
- Wait for another input
In other words, each REPL invocation is its own separate parse-evaluation sequence. By merging the environment back into rsh's state, we maintain continuity between the REPL invocations.
To give an example, we showed that
cd spam
source-env foo.rsh
does not work because the directory will be changed after source-env attempts to read the file. Running these commands as separate REPL entries, however, works:
> cd spam
> source-env foo.rsh
# yay, works!
To see why, let's break down what happens in the example:
1. Launch rsh
2. Parse cd spam
3. Evaluate cd spam
4. Merge environment (including the current directory) into the rsh state
5. Parse source-env foo.rsh
6. Evaluate source-env foo.rsh
7. Merge environment (including the current directory) into the rsh state
When source-env tries to open foo.rsh during the parsing in step 5, it can do so because the directory change from step 3 was merged into the rsh state in step 4 and therefore is visible in the following parse-evaluation cycles.
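The difference between a single script and separate REPL entries can be modeled in a few lines of Python. This is a toy sketch with invented names (FILES, parse, evaluate, run_entry) and a set of strings standing in for the file system. It is not how rsh is implemented, but it follows the same parse, evaluate, merge sequence described above.

```python
import posixpath

FILES = {"spam/foo.rsh"}  # hypothetical virtual file system


def parse(lines, cwd):
    """Parse phase: source-env paths are resolved against the cwd known now."""
    ir = []
    for line in lines:
        command, arg = line.split(maxsplit=1)
        if command == "source-env":
            path = posixpath.normpath(posixpath.join(cwd, arg))
            if path not in FILES:
                raise FileNotFoundError(f"parse error: {path} not found")
        ir.append((command, arg))
    return ir


def evaluate(ir, cwd):
    """Evaluation phase: cd takes effect only here. Returns the new cwd."""
    for command, arg in ir:
        if command == "cd":
            cwd = posixpath.normpath(posixpath.join(cwd, arg))
    return cwd


def run_entry(lines, state):
    """One parse-evaluate cycle, then merge the environment into the state."""
    ir = parse(lines, state["cwd"])
    state["cwd"] = evaluate(ir, state["cwd"])


# As a single script: parsing fails because cd has not been evaluated yet.
script_state = {"cwd": "."}
try:
    run_entry(["cd spam", "source-env foo.rsh"], script_state)
    script_ok = True
except FileNotFoundError:
    script_ok = False

# As two REPL entries: the directory change is merged before the second parse.
repl_state = {"cwd": "."}
run_entry(["cd spam"], repl_state)
run_entry(["source-env foo.rsh"], repl_state)
```

The single-script run fails during parsing, while the two-entry run succeeds because the second parse already sees spam as the current directory.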
Parse-time Evaluation
While it is impossible to add parsing into the evaluation, we can add a little bit of evaluation into parsing. This feature has been added only recently and we're going to expand it as needed.
One pattern that this unlocks is being able to source/use/etc. a path from a "variable". We've seen that
let some_path = 'foo/common.rsh'
source $some_path
does not work, but we can do the following:
const some_path = 'foo/common.rsh'
source $some_path
We can break down what is happening again:
1. Parse the whole source code
   1.1. Parse const some_path = 'foo/common.rsh'
      1.1.1. Evaluate* 'foo/common.rsh' and store it as a some_path constant
   1.2. Parse source $some_path
      1.2.1. Evaluate* $some_path, see that it is a constant, fetch it
      1.2.2. Parse the foo/common.rsh file
2. Evaluate the whole source code
   2.1. Evaluate const some_path = 'foo/common.rsh' (i.e., add the foo/common.rsh string to the runtime stack as some_path variable)
   2.2. Evaluate source $some_path (i.e., evaluate the contents of foo/common.rsh)
This still does not violate our rule of not having an eval function, because an eval function adds additional parsing to the evaluation step. With parse-time evaluation we're doing the opposite.
Also, note the * in steps 1.1.1. and 1.2.1. The evaluation happening during parsing is very restricted and limited to only a small subset of what is normally allowed during a regular evaluation. For example, the following is not allowed:
const foo_contents = (open foo.rsh)
By allowing everything during parse-time evaluation, we could set ourselves up for a lot of trouble (think of generating an infinite stream in a subexpression...). Generally, only simple expressions without side effects are allowed, such as string literals or integers, or composite types of these literals (records, lists, tables).
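Python has a loosely comparable facility in ast.literal_eval, which evaluates only literals and containers of literals and refuses everything else. The sketch below uses it as an analogy for restricted parse-time evaluation. The parse_time_eval wrapper is a hypothetical name, and rsh's actual rules for what counts as a constant may differ.

```python
import ast


def parse_time_eval(expression):
    """Evaluate only literals and containers of literals; reject anything else."""
    try:
        return ast.literal_eval(expression)
    except (ValueError, SyntaxError) as error:
        raise ValueError(f"not a parse-time constant: {expression}") from error


# Plain literals and composites of literals are accepted.
path = parse_time_eval("'foo/common.rsh'")
record = parse_time_eval("{'name': 'common', 'depth': [1, 2]}")

# Anything with side effects, such as opening a file, is rejected.
try:
    parse_time_eval("open('foo.rsh')")
    rejected = False
except ValueError:
    rejected = True
```

The call to open is rejected without ever touching the file system, which mirrors why const foo_contents = (open foo.rsh) is not allowed.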
Compiled ("static") languages also tend to have a way to convey some logic at compile time, be it C's preprocessor, Rust's macros, or Zig's comptime. One reason is performance (if you can do it during compilation, you save the time during runtime), which is not as important for rsh because we always do both parsing and evaluation; we do not store the parsed result anywhere (yet?). The second reason is similar to rsh's: Dealing with limitations caused by the absence of the eval function.
Conclusion
rsh operates in a scripting language space typically dominated by "dynamic" "interpreted" languages, such as Python, bash, zsh, fish, etc. While rsh is also "interpreted" in the sense that it runs the code immediately instead of storing the intermediate representation (IR) on disk, one feature sets it apart from the pack: It does not have an eval function. In other words, rsh cannot parse code and manipulate its IR during evaluation. This gives rsh one characteristic typical for "static" "compiled" languages, such as C or Rust: All the source code must be visible to the parser beforehand, just like all the source code must be available to a C or Rust compiler. For example, you cannot source or use a path computed "dynamically" (during evaluation). This is surprising for users of more traditional scripting languages, but it helps to think of rsh as a compiled language.