Loading data
Earlier, we saw how you can use commands like
ls
, ps
,
date
, and
sys
to load information about your files, processes, time of date,
and the system itself. Each command gives us a table of
information that we can explore. There are other ways we can
load in a table of data to work with.
Opening files
One of Rsh's most powerful assets in working with data is
the
open
command. It is a multi-tool that can work with a number of
different data formats. To see what this means, let's try
opening a json file:
> open editors/vscode/package.json
──────────────────┬───────────────────────────────────────────────────────────────────────────────
name │ lark
description │ Lark support for VS Code
author │ Lark developers
license │ MIT
version │ 1.0.0
repository │ [row type url]
publisher │ vscode
categories │ [table 0 rows]
keywords │ [table 1 rows]
engines │ [row vscode]
activationEvents │ [table 1 rows]
main │ ./out/extension
contributes │ [row configuration grammars languages]
scripts │ [row compile postinstall test vscode:prepublish watch]
devDependencies │ [row @types/mocha @types/node tslint typescript vscode vscode-languageclient]
──────────────────┴───────────────────────────────────────────────────────────────────────────────
In a similar way to
ls
, opening a file type that Rsh understands will give us back
something that is more than just text (or a stream of bytes).
Here we open a "package.json" file from a JavaScript
project. Rsh can recognize the JSON text and parse it to a table
of data.
If we wanted to check the version of the project we were looking
at, we can use the
get
command.
> open editors/vscode/package.json | get version
1.0.0
Rsh currently supports the following formats for loading data directly into tables:
- csv
- eml
- ics
- ini
- json
- nuon
- ods
- SQLite databases
- ssv
- toml
- tsv
- url
- vcf
- xlsx / xls
- xml
- yaml / yml
Did you know?
Under the hood open
will look for a
from ...
subcommand in your scope which matches
the extension of your file. You can thus simply extend the set
of supported file types of open
by creating your
own from ...
subcommand.
But what happens if you load a text file that isn't one of these? Let's try it:
> open README.md
We're shown the contents of the file.
Below the surface, what Rsh sees in these text files is one large string. Next, we'll talk about how to work with these strings to get the data we need out of them.
NUON
rsh Object Notation (NUON) aims to be for rsh what JavaScript Object Notation (JSON) is for JavaScript. That is, NUON code is a valid rsh code that describes some data structure. For example, this is a valid NUON (example from the default configuration file):
{
menus: [
# Configuration for default rsh menus
# Note the lack of source parameter
{
name: completion_menu
only_buffer_difference: false
marker: "| "
type: {
layout: columnar
columns: 4
col_width: 20 # Optional value. If missing all the screen width is used to calculate column width
col_padding: 2
}
style: {
text: green
selected_text: green_reverse
description_text: yellow
}
}
]
}
You might notice it is quite similar to JSON, and you're right. NUON is a superset of JSON! That is, any JSON code is a valid NUON code, therefore a valid rsh code. Compared to JSON, NUON is more "human-friendly". For example, comments are allowed and commas are not required.
One limitation of NUON currently is that it cannot represent all of the rsh data types. Most notably, NUON does not allow to serialize blocks.
Handling Strings
An important part of working with data coming from outside Rsh is that it's not always in a format that Rsh understands. Often this data is given to us as a string.
Let's imagine that we're given this data file:
> open people.txt
Octavia | Butler | Writer
Bob | Ross | Painter
Antonio | Vivaldi | Composer
Each bit of data we want is separated by the pipe ('|') symbol, and each person is on a separate line. Rsh doesn't have a pipe-delimited file format by default, so we'll have to parse this ourselves.
The first thing we want to do when bringing in the file is to work with it a line at a time:
> open people.txt | lines
───┬──────────────────────────────
0 │ Octavia | Butler | Writer
1 │ Bob | Ross | Painter
2 │ Antonio | Vivaldi | Composer
───┴──────────────────────────────
We can see that we're working with the lines because
we're back into a list. Our next step is to see if we can
split up the rows into something a little more useful. For that,
we'll use the
split
command.
split
, as the name implies, gives us a way to split a delimited
string. We will use
split
's column
subcommand to split the contents
across multiple columns. We tell it what the delimiter is, and
it does the rest:
> open people.txt | lines | split column "|"
───┬──────────┬───────────┬───────────
# │ column1 │ column2 │ column3
───┼──────────┼───────────┼───────────
0 │ Octavia │ Butler │ Writer
1 │ Bob │ Ross │ Painter
2 │ Antonio │ Vivaldi │ Composer
───┴──────────┴───────────┴───────────
That almost looks correct. It looks like there's an
extra space there. Let's
trim
that extra space:
> open people.txt | lines | split column "|" | str trim
───┬─────────┬─────────┬──────────
# │ column1 │ column2 │ column3
───┼─────────┼─────────┼──────────
0 │ Octavia │ Butler │ Writer
1 │ Bob │ Ross │ Painter
2 │ Antonio │ Vivaldi │ Composer
───┴─────────┴─────────┴──────────
Not bad. The
split
command gives us data we can use. It also goes ahead and gives
us default column names:
> open people.txt | lines | split column "|" | str trim | get column1
───┬─────────
0 │ Octavia
1 │ Bob
2 │ Antonio
───┴─────────
We can also name our columns instead of using the default names:
> open people.txt | lines | split column "|" first_name last_name job | str trim
───┬────────────┬───────────┬──────────
# │ first_name │ last_name │ job
───┼────────────┼───────────┼──────────
0 │ Octavia │ Butler │ Writer
1 │ Bob │ Ross │ Painter
2 │ Antonio │ Vivaldi │ Composer
───┴────────────┴───────────┴──────────
Now that our data is in a table, we can use all the commands we've used on tables before:
> open people.txt | lines | split column "|" first_name last_name job | str trim | sort-by first_name
───┬────────────┬───────────┬──────────
# │ first_name │ last_name │ job
───┼────────────┼───────────┼──────────
0 │ Antonio │ Vivaldi │ Composer
1 │ Bob │ Ross │ Painter
2 │ Octavia │ Butler │ Writer
───┴────────────┴───────────┴──────────
There are other commands you can use to work with strings:
There is also a set of helper commands we can call if we know the data has a structure that Rsh should be able to understand. For example, let's open a Rust lock file:
> open Cargo.lock
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
[[package]]
name = "adhoc_derive"
version = "0.1.2"
The "Cargo.lock" file is actually a .toml file, but
the file extension isn't .toml. That's okay, we can use
the
from
command using the toml
subcommand:
> open Cargo.lock | from toml
──────────┬───────────────────
metadata │ [row 107 columns]
package │ [table 130 rows]
──────────┴───────────────────
The
from
command can be used for each of the structured data text formats
that Rsh can open and understand by passing it the supported
format as a subcommand.
Opening in raw mode
While it's helpful to be able to open a file and immediately
work with a table of its data, this is not always what you want
to do. To get to the underlying text, the
open
command can take an optional --raw
flag:
> open Cargo.toml --raw
[package] name = "rsh"
version = "0.1.3"
authors = ["Yehuda Katz <wycats@gmail.com>", "Jonathan Turner <jonathan.d.turner@gmail.com>"]
description = "A shell for the GitHub era"
license = "MIT"
SQLite
SQLite databases are automatically detected by
open
, no matter what their file extension is. You can open a whole
database:
> open foo.db
Or
get
a specific table:
> open foo.db | get some_table
Or run any SQL query you like:
> open foo.db | query db "select * from some_table"
(Note: some older versions of Rsh use
into db | query
instead of query db
)
Fetching URLs
In addition to loading files from your filesystem, you can also
load URLs by using the
http get
command. This will fetch the contents of the URL from the
internet and return it:
> http get https://blog.rust-lang.org/feed.xml
╭────────────┬──────────────────╮
│ tag │ feed │
│ attributes │ {record 1 field} │
│ content │ [table 18 rows] │
╰────────────┴──────────────────╯