techWebsite/content/posts/nushell.md

+++
title = "Nushell first impressions"
date = 2024-03-01T11:34:04-06:00
+++

I've been trying out a bunch of new shell utilities lately,
switching up my shell, terminal multiplexer, and even experimenting with my editor.
Today, I'd like to focus on my experiments with my shell.

## My old setup

Before this, I had been using a minimal zsh setup for a long time,
with only built-in features and a handmade prompt.
Zsh is a good shell, probably one of the best POSIX shells out there,
and I still use it when a POSIX shell is needed.

However, I got tired of the endless footguns of POSIX shell scripting:
easy-to-make errors around quoting, word splitting, and escaping,
the sort of thing that makes [shellcheck](https://www.shellcheck.net/) necessary.

I played around with fish for a few days,
but it had many of the same fundamental design choices
that made POSIX shells such a displeasure to work with, mainly being 'stringly typed'.

## A Nu shell

While googling around for alternative shells, I stumbled across [nushell](https://www.nushell.sh/),
a shell that claimed to be built around structured data instead of just strings.
This was *exactly* what I was looking for, and I installed it immediately.
I decided to work with it for around a month,
to give myself enough time to really use it:
not only to see how it felt in ordinary usage,
but also to have the time and opportunity to construct a few pipelines and scripts in it.

All that said, the month is up, and I've been collecting examples,
thoughts, and some criticisms along the way.

## Piping structured data

One of the core features of nushell is that commands return structured data,
instead of plain strings.
Pipelines can pass lists, records, or tables.
Individual entries can be one of several built-in datatypes,
including rich datatypes like datetimes, durations, and filesizes.
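
As a small sketch of what those rich datatypes allow (the sizes and durations below are arbitrary values chosen for illustration), literals like `10kb` and `3day` are real values that compare directly against the columns `ls` returns:

{{<highlight sh>}}
# filesize literals compare directly against the size column
ls | where size > 10kb
# subtracting a duration from a datetime gives another datetime
ls | where modified > ((date now) - 3day)
# describe reports the type of whatever is piped into it
(2hr + 30min) | describe
{{</highlight>}}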

Nushell can also open many filetypes and turn them into nushell native datastructures to work with,
including CSV, JSON, TOML, YAML, XML, SQLite files, and even Excel and LibreOffice Calc spreadsheets.
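
A rough sketch of what that looks like in practice. The file names here (`Cargo.toml`, `config.json`) are placeholder names chosen for illustration, not files from any particular project:

{{<highlight sh>}}
# open picks a parser based on the file extension
open Cargo.toml | get package.version
# --raw skips the parser and returns the plain text
open --raw Cargo.toml
# once parsed, the data can be written back out in another format
open config.json | to yaml
{{</highlight>}}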

Once you have your data in nushell datastructures,
you can do all sorts of manipulations on it.
It feels like an interesting mix of functional programming and SQL,
and it works really, really well.
You can sort, filter, and aggregate the data,
use a SQL-style join between two tables,
and use functional programming patterns to manipulate tables.

Some examples of what this structured data passing
through pipelines enables:

{{<highlight sh>}}
# show all files recursively that were modified in the last week
ls **/* | where modified > (
    # create timestamp from relative human readable string.
    '1 week ago' | into datetime
)
{{</highlight>}}

{{<highlight sh>}}
# show all executables in the current directory that are currently running.
ps |
# convert the name of the called process into a path
update name {|process| (which $process.name).path.0?} |
# join with the list of all files in the current directory, recursing down subdirectories.
join (ls -f **/*) name
{{</highlight>}}

{{<highlight sh>}}
# show all values in one csv but not in another
open all_tasks.csv |
# filter out tasks that cause the closure to return false
filter { |task|
    not (
        # check if the task number is in the other csv.
        $task.number in (
            open tasks_done.csv | 
            get 'task_number'
        )
    )
}
{{</highlight>}}

All of these can be one liners, but have been broken up in order to insert
explanatory comments.

## Parsing non-nu tools

But what if our tool or text file isn't in a format nushell understands?
Thankfully, for most formats, parsing is relatively straightforward.
Let's take this NGINX server log, for example (not a log of real traffic, just a
sample log I found):

{{<highlight text "linenostart=9184">}}
135.125.217.54 - - [27/Mar/2023:12:57:44 +0000] "GET /.env HTTP/1.1" 404 197 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
135.125.217.54 - - [27/Mar/2023:12:57:44 +0000] "POST / HTTP/1.1" 405 568 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
43.154.141.71 - - [27/Mar/2023:12:58:04 +0000] "HEAD /Core/Skin/Login.aspx HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
193.32.162.159 - - [27/Mar/2023:13:01:07 +0000] "GET / HTTP/1.1" 200 13 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36 Edg/90.0.818.46"
193.32.162.159 - - [27/Mar/2023:13:01:18 +0000] "GET /dispatch.asp HTTP/1.1" 404 197 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36 Edg/90.0.818.46"

{{</highlight>}}

We can parse it into a nu table like so (each line has a comment explaining what
it does, for those unfamiliar with the nushell language):

{{<highlight sh>}}
open access.log |
# turn into a list of lines
lines |
# parse into a table
parse '{ip} - {user} [{time}] "{request_type} {request}" {status} {bytes_sent} "{referrer}" "{user_agent}"' |
# parse time into proper datetime
update time {into datetime -f '%d/%b/%Y:%T %z'} |
# parse into proper integer
update bytes_sent {into int}
{{</highlight>}}

Now that we have it in a nushell table, we can bring all of nushell's tools to
bear on the data. We could, for example, plot a histogram of the most common
ips, just by piping the whole thing into `histogram ip`. We could easily
calculate the average bytes sent per request. We could group the records by the
day or hour they happened, and analyze each of those groups independently. And
we can do all of that after arbitrarily filtering, sorting, or otherwise
transforming the table.

While it would be a pretty long one liner if we decided to put it in a single
line, it's still quite easy and straightforward to write.
Most log formats and command outputs are similarly straightforward.
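
As a sketch of the follow-up analyses mentioned above, assuming the parsed table has been saved to a variable (`$log` is a name chosen here for illustration), each one is a short pipeline of its own:

{{<highlight sh>}}
# hypothetical variable holding the table produced by the parsing pipeline
let log = (
    open access.log |
    lines |
    parse '{ip} - {user} [{time}] "{request_type} {request}" {status} {bytes_sent} "{referrer}" "{user_agent}"' |
    update time {into datetime -f '%d/%b/%Y:%T %z'} |
    update bytes_sent {into int}
)

# histogram of the most common ips
$log | histogram ip
# average bytes sent per request
$log | get bytes_sent | math avg
# number of requests per hour
$log |
group-by {|row| $row.time | format date '%Y-%m-%d %H'} |
transpose hour requests |
update requests {|group| $group.requests | length}
{{</highlight>}}

The last pipeline groups rows by a formatted timestamp, turns the resulting record into a table with `transpose`, and then replaces each group with its row count.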

## Defining custom commands, with built in arg parsing

Nushell has a feature called custom commands, which fills the same purpose as
functions in other shells and programming languages, but is a bit more
full-featured than traditional POSIX shell functions.

First of all, nushell custom commands specify the number of positional arguments
they take.

{{<highlight sh>}}
def recently-modified [cutoff] {
    # show all files recursively that were modified after a specified cutoff
    ls **/* | where modified > (
        # create timestamp from input
        $cutoff | into datetime
    )
}
{{</highlight>}}

You can optionally give the arguments a type:

{{<highlight sh>}}
def recently-modified [cutoff: string] {
    # show all files recursively that were modified after a specified cutoff
    ls **/* | where modified > (
        # create timestamp from input
        $cutoff | into datetime
    )
}
{{</highlight>}}

You can give an argument a default value, making it optional (this can be combined
with a type specification):

{{<highlight sh>}}
def recently-modified [cutoff = '1 week ago'] {
    # show all files recursively that were modified after a specified cutoff
    ls **/* | where modified > (
        # create timestamp from input
        $cutoff | into datetime
    )
}
{{</highlight>}}

Flag parsing, complete with short flags, is included as well. (A
flag without a type is taken as a boolean flag, set by its presence or
absence.)

{{<highlight sh>}}
def recently-modified [cutoff: string = '1 week ago' --older-than (-o)] {
    if $older_than {
        # show all files recursively that were modified before a specified cutoff
        ls **/* | where modified < (
            # create timestamp from input
            $cutoff | into datetime
        )
    } else {
        # show all files recursively that were modified after a specified cutoff
        ls **/* | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    }
}
{{</highlight>}}

And finally, you can add a rest parameter at the end, allowing the command to take a variable number of
arguments.

{{<highlight sh>}}
def recently-modified [--cutoff = '1 week ago' ...paths] {
    # `each` returns the result of every iteration (a `for` loop would discard it)
    $paths | each {|path|
        # show all files under this path that were modified after a specified cutoff
        ls $path | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    } | flatten
}
{{</highlight>}}

All of the specified parameters are automatically added to a generated `--help`
page, along with any documentation comments, so that the following code block:

{{<highlight sh>}}
# display recently modified files
def recently-modified [
    --cutoff = '1 week ago' # cutoff to be considered 'recently modified'
    ...paths # paths to consider
] {
    # `each` returns the result of every iteration (a `for` loop would discard it)
    $paths | each {|path|
        # show all files under this path that were modified after a specified cutoff
        ls $path | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    } | flatten
}
{{</highlight>}}

results in a help page that looks like this:

```
> recently-modified --help
display recently modified files

Usage:
  > recently-modified {flags} ...(paths) 

Flags:
  --cutoff <String> - cutoff to be considered 'recently modified' (default: '1 week ago')
  -h, --help - Display the help message for this command

Parameters:
  ...paths <any>: paths to consider

Input/output types:
  ╭───┬───────┬────────╮
  │ # │ input │ output │
  ├───┼───────┼────────┤
  │ 0 │ any   │ any    │
  ╰───┴───────┴────────╯
```

(The input/output table at the bottom has to do with how the command is used in
a pipeline, and is covered in more detail in the
[book](https://www.nushell.sh/book/command_signature.html).)

This addition of easy argument parsing makes it incredibly convenient to add
command line arguments to your scripts and functions, something that is anything
but easy in POSIX shells.

## Error messages

Nushell brings with it great error messages that explain where the error
occurred. Consider a loop like this in bash:
{{<highlight sh "linenos=false">}}
$ for i in $(ls -l | tr -s " " | cut --fields=5 --delimiter=" "); do
echo "$i / 1000" | bc
done
{{</highlight>}}

This prints the size of each file in kilobytes. But what if we typo something?

{{<highlight sh "linenos=false">}}
$ for i in $(ls -l | tr -s " " | cut --fields=6 --delimiter=" "); do
echo "$i / 1000" | bc
done

(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
(standard_in) 1: syntax error
{{</highlight>}}

This error tells you nothing about what went wrong, and your only option is to
start print debugging.

The equivalent in nushell would be:

{{<highlight sh "linenos=false">}}
> ls | get size | each {|item| $item / 1000}
{{</highlight>}}

If we typo the size column, we get a nice error telling us exactly what we got
wrong, and where in the pipeline the error and value originated. Much better.

{{<highlight sh "linenos=false">}}
> ls | get szie | each {|item| $item / 1000}
Error: nu::shell::column_not_found

  × Cannot find column
   ╭─[entry #1:1:1]
 1 │ ls | get szie | each {|item| $item / 1000}
   · ─┬       ──┬─
   ·  │         ╰── cannot find column 'szie'
   ·  ╰── value originates here
   ╰────
{{</highlight>}}

## What's not there yet

Now, nushell is not finished yet.
As I write, I am running version 0.91 of nu.
Similar to fish, it is not a POSIX shell, which means that you still need to drop
into bash or zsh to source env files in order to,
for example, use a cross-compiling C/C++ SDK.
(Thankfully, Python virtualenvs already come with a nu script for you to source,
so doing Python dev will not require you to launch a POSIX shell.)

Additionally, while you can write nu script files,
invoking them from within nu treats them as external commands,
meaning they take in and pass out plain text,
rather than the structured data that you would get with a proper custom command
or nu builtin.
The best workaround I've found so far is, instead of making scripts that you run
directly, to define a custom command in the script file, `source` that file
(a plain `use` only imports commands marked with `export`), and
then run the custom command, like this:

{{<highlight sh>}}
# recently-modified.nu
# display recently modified files
def recently-modified [
    --cutoff = '1 week ago' # cutoff to be considered 'recently modified'
    ...paths # paths to consider
] {
    # `each` returns the result of every iteration (a `for` loop would discard it)
    $paths | each {|path|
        # show all files under this path that were modified after a specified cutoff
        ls $path | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    } | flatten
}
{{</highlight>}}

{{<highlight sh "linenos=false">}}
> source recently-modified.nu
> recently-modified --cutoff '2 weeks ago' ./
{{</highlight>}}

It's certainly not the most ergonomic, but seems to be the best way at the moment
to make 'scripts' that are integrated with the rest of nushell.

## So, overall, is it worth it?

Nushell is certainly an interesting project, and I will almost certainly be
continuing to use it as my daily shell. It can't do everything, but dropping into
zsh for a task or two every once in a while isn't that big a deal for me, and
having access to such a powerful shell by default has made other tasks much
easier for me.