rsh 0.34

Rsh is a new shell that takes a modern, structured approach to your command line. It works seamlessly with the data from your filesystem, operating system, and a growing number of file formats to make it easy to build powerful command-line pipelines.

Today, we're releasing 0.34 of Rsh. This release is the first to support dataframes and also includes a set of usability improvements.

Where to get it

Rsh 0.34 is available as pre-built binaries or from crates.io. If you have Rust installed, you can install it using cargo install rsh.

If you want all the goodies, you can install with cargo install rsh --features=extra.

If you'd like to try the experimental paging feature in this release, you can install with cargo install rsh --features=table-pager.

As part of this release, we also publish a set of plugins you can install and use with Rsh. To install, use cargo install rsh_plugin_<plugin name>.

What's New

Dataframes (elferherrera)

With 0.34, we've introduced a new family of commands to work with dataframes. Dataframes are an efficient way of working with large datasets by storing data as columns and offering a set of operations over them.

To create a dataframe, use the dataframe open command and pass it a source file to load. This command currently supports CSV and Parquet files.

> let df = (dataframe open .\Data7602DescendingYearOrder.csv)

Once loaded, there are a variety of commands you can use to interact with the dataframe (you can get the full list with dataframe --help). For example, to see the first few rows of the dataframe we just loaded, we can use dataframe first:

> $df | dataframe first

───┬──────────┬─────────┬──────┬───────────┬──────────
 # │ anzsic06 │  Area   │ year │ geo_count │ ec_count
───┼──────────┼─────────┼──────┼───────────┼──────────
 0 │ A        │ A100100 │ 2000 │        96 │      130
 1 │ A        │ A100200 │ 2000 │       198 │      110
 2 │ A        │ A100300 │ 2000 │        42 │       25
 3 │ A        │ A100400 │ 2000 │        66 │       40
 4 │ A        │ A100500 │ 2000 │        63 │       40
───┴──────────┴─────────┴──────┴───────────┴──────────
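
To make the column-oriented idea concrete, here is a minimal Python sketch that holds the five rows shown above as one list per column. It is only an illustration of the layout, not a description of how Rsh stores dataframes internally; the `columns` dictionary and the `column_sum` and `row` helpers are names made up for this example.

```python
# Column-oriented layout: one list per column instead of one record per row.
# The values are the five rows printed by `dataframe first` above.
columns = {
    "anzsic06": ["A", "A", "A", "A", "A"],
    "Area": ["A100100", "A100200", "A100300", "A100400", "A100500"],
    "year": [2000, 2000, 2000, 2000, 2000],
    "geo_count": [96, 198, 42, 66, 63],
    "ec_count": [130, 110, 25, 40, 40],
}


def column_sum(table, name):
    """Sum one column without reading any of the others."""
    return sum(table[name])


def row(table, index):
    """Rebuild a single row by reading the same index from every column."""
    return {name: values[index] for name, values in table.items()}


if __name__ == "__main__":
    print(column_sum(columns, "geo_count"))
    print(row(columns, 0))
```

Aggregating a single column only walks that column's list, which is one reason columnar layouts tend to suit this kind of query.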

Where dataframes really shine is their performance.

For example, the above dataset has 5 columns and ~5.5 million rows of data. We're able to group it by the year column, sum the results, and display it to the user in 557ms:

# process.rsh
let df = (dataframe open Data7602DescendingYearOrder.csv)
let res = ($df | dataframe group-by year | dataframe aggregate sum | dataframe select geo_count)
$res

> benchmark {source process.rsh}

───┬───────────────────
 # │     real time
───┼───────────────────
 0 │ 557ms 658us 500ns
───┴───────────────────

By comparison, here's the same example in pandas:

# load.py
import pandas as pd

df = pd.read_csv("Data7602DescendingYearOrder.csv")
res = df.groupby("year")["geo_count"].sum()
print(res)

> benchmark {python .\load.py}

───┬────────────────────────
 # │       real time
───┼────────────────────────
 0 │ 1sec 966ms 954us 800ns
───┴────────────────────────

System details: the benchmarks presented in this section were run on a machine with an Intel(R) Core(TM) i7-10710U processor (CPU @ 1.10GHz, 1.61 GHz) and 16 GB of RAM.

While these results are still early, we're excited to see what can be possible using rsh for processing large datasets.

You can learn more about dataframes, including many examples and a much more in-depth explanation, by reading the new dataframes chapter of the rsh book.

Note: while all the dataframe functionality is currently grouped behind the dataframe top-level command, we hope to extend support for dataframes to other common rsh commands.
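
For readers who want to see what the group-by-and-sum step in the benchmark computes, here is a small Python sketch using only the standard library. It assumes two equal-length columns; the `group_by_sum` helper and the sample values are hypothetical and are not taken from the benchmarked CSV or from Rsh's implementation.

```python
from collections import defaultdict


def group_by_sum(keys, values):
    """Sum `values` per distinct entry in `keys` (two equal-length columns)."""
    if len(keys) != len(values):
        raise ValueError("columns must have the same length")
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)


# Hypothetical sample columns, for illustration only.
year = [2000, 2000, 2001, 2001, 2001]
geo_count = [96, 198, 42, 66, 63]

if __name__ == "__main__":
    print(group_by_sum(year, geo_count))
```

A dataframe engine does the same work with typed, contiguous columns rather than Python lists, which is where most of the speed comes from.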

Improved multiline support (jt)

We've extended multiline expression support to more areas. Now, you can span tables over multiple lines more naturally:

[
  [name, value];
  [foo, 2]
  [bar, 7]
]

Subexpressions can now also span multiple lines. Everything inside the parentheses is treated as if it were written on a single line:

(echo foo
| str length)

This also gives you a way to split up commands that have many arguments over multiple lines:

(echo foo
bar)
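
One way to picture this behavior is a reader that keeps joining physical lines while a parenthesis is still open. The Python sketch below is a simplified illustration under that assumption, not Rsh's actual parser: `join_until_balanced` is a hypothetical helper, and it ignores parentheses inside quoted strings.

```python
def join_until_balanced(lines):
    """Group physical lines into logical lines, joining while parentheses are open.

    Simplification: parentheses inside string literals are not treated specially.
    """
    logical, pending, depth = [], [], 0
    for line in lines:
        depth += line.count("(") - line.count(")")
        pending.append(line.strip())
        if depth <= 0:
            # All parentheses are closed, so the logical line is complete.
            logical.append(" ".join(pending))
            pending, depth = [], 0
    if pending:
        # Input ended with an unclosed parenthesis: return what was collected.
        logical.append(" ".join(pending))
    return logical


if __name__ == "__main__":
    for text in join_until_balanced(["(echo foo", "| str length)", "(echo foo", "bar)"]):
        print(text)
```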

Multiple shorthand environment vars (jt)

A long-time shortcoming is now fixed in 0.34. You can now pass multiple environment shorthands to the same command:

> FOO=bar BAR=baz $rsh.env.FOO + $rsh.env.BAR
barbaz
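
The shorthand can be thought of as leading NAME=value tokens that apply only to the command that follows them. The Python sketch below illustrates that idea under stated assumptions: `split_env_shorthand` and `run_with_shorthand` are hypothetical helpers written for this example, the variable-name pattern is a guess, and none of this reflects how Rsh parses the shorthand internally.

```python
import os
import re
import subprocess
import sys

# Assumed shape of a shorthand token: an identifier, "=", then the value.
_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def split_env_shorthand(tokens):
    """Split leading NAME=value tokens from the rest of the command."""
    overrides = {}
    rest_start = 0
    for token in tokens:
        match = _ASSIGNMENT.match(token)
        if match is None:
            break  # First non-assignment token starts the command itself.
        overrides[match.group(1)] = match.group(2)
        rest_start += 1
    return overrides, tokens[rest_start:]


def run_with_shorthand(tokens):
    """Run the command with the overrides visible only to the child process."""
    overrides, command = split_env_shorthand(tokens)
    env = {**os.environ, **overrides}
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    child = "import os; print(os.environ['FOO'] + os.environ['BAR'])"
    print(run_with_shorthand(["FOO=bar", "BAR=baz", sys.executable, "-c", child]), end="")
```

Because the overrides are merged into a copy of the environment, the parent process's own variables are left unchanged after the command finishes.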

Variable completions (andrasio)

In addition to steadily improving the completion engine, we've started adding support for completions for built-in variables.

You can now write $rsh.<TAB> to complete into the built-in $rsh variable, including completions for $rsh.env.S<TAB> for completing into environment variables.

Other variables that are in scope can also have their names completed.
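
As a rough mental model, completing a dotted variable path amounts to walking the known fields and then filtering the last level by prefix. The Python sketch below shows that idea only; the `complete` function and the contents of `scope` are hypothetical and do not describe Rsh's completion engine or the real contents of $rsh.

```python
def complete(path, scope):
    """Return completions for a dotted variable path such as '$rsh.env.S'."""
    *parents, prefix = path.split(".")
    node = scope
    for name in parents:
        # Walk down one level; stop if the path does not exist.
        node = node.get(name) if isinstance(node, dict) else None
        if node is None:
            return []
    if not isinstance(node, dict):
        return []  # Only record-like values have fields to complete.
    base = ".".join(parents)
    return sorted(
        f"{base}.{key}" if base else key
        for key in node
        if key.startswith(prefix)
    )


# Hypothetical scope; the real values depend on the session.
scope = {
    "$rsh": {
        "env": {"SHELL": "/bin/rsh", "SHLVL": "1", "PATH": "/usr/bin"},
        "lang": [],
    },
    "$df": {},
}

if __name__ == "__main__":
    print(complete("$rsh.", scope))
    print(complete("$rsh.env.S", scope))
```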

New commands

Added the pathvar command for updating the PATH (nathom)
Added a paste command for pasting from clipboard (1ntEgr8)
Added $rsh.lang to reflect on the current commands (fdncred)

Additional improvements

Updated into binary to be more composable (fdncred)
Added unique option to uniq (mcbattirola)
Removed an outdated README note (yaymukund)
Added more comparison coercions with $nothing (jt)
Updated the version command to output more info (fdncred)
Fixed a broken unit test (fdncred)
Downgraded crossterm to fix pager compilation (kubouch)
Removed unused crate features (waywardmonkeys)
Updated a few dependencies (therealprof, waywardmonkeys)
Added dataframe take command (elferherrera)
Added script to submit winget package during release (TechWatching)
Aligned dataframe params to match other rsh commands (elferherrera)
Added the ansi osc string terminator (fdncred)
Removed unused dependencies (waywardmonkeys, andrasio)
Added casting operations for Series data (elferherrera)
Fixed a dataframe series bug with f64 (elferherrera)
Added all-trim option to str trim (palashahuja)
Ported more commands to engine-p (efx)
Added support for arbitrarily nested subcommands (jt)
Added support for string interpolation when calling externals (voanhduy1512)
Made URL docs more consistent (efx)
Sped up dataframe loading (elferherrera)
Improved parse errors for def (jt)
Updated textview to always read its input from the stream (jt)
Simplified the column names used by dataframe aggregation (elferherrera)
Added support for more filesize-to-filesize math (fdncred)
Updated the Rsh API surface to expose more useful functionality (stormasm)
Fixed a panic during math with large durations (luccasmmg)

Looking ahead

Work on reedline has progressed steadily in the background, and we are nearing the point where we will explore integrating it as rsh's line editor.

We're also working on a number of parser and engine improvements, which we hope will make their way into future versions of rsh.

Dataframe support continues to grow, and we're continuing to collaborate with the projects it builds on to ensure we are using the best techniques possible. There's a lot of potential here, not only in using dataframes but also in the additional functionality that Apache Arrow support might allow us to build in the future.
