+++
title = "Nushell first impressions"
date = 2024-03-01T11:34:04-06:00
draft = true
+++

I've been trying out a bunch of new shell utilities lately, switching up my shell, terminal multiplexer, and even experimenting with my editor. Today, I'd like to focus on my experiments with my shell.

My old setup

Before this, I had been using a minimal zsh setup for a long time, with only built-in features and a handmade prompt. Zsh is a good shell, probably one of the best POSIX shells out there, and I still use it when a POSIX shell is needed.

However, I got tired of the endless footguns that POSIX shell scripting imposes: easy-to-make errors around quoting, word splitting, and escaping, the sort of thing that makes shellcheck necessary.

I played around with fish for a few days, but it shared many of the same fundamental design choices, mainly being 'stringly typed', that make POSIX shells such a displeasure to work with.

A Nu shell

While googling around for alternative shells, I stumbled across nushell, a shell that claimed to be built around structured data instead of just strings. This was exactly what I was looking for, and I installed it immediately. I decided to work with it for around a month, to give myself enough time to really use it: not only to see how it felt in ordinary usage, but also to have the time and opportunity to construct a few pipelines and scripts in it.

All that said, the month is up, and I've been collecting examples, thoughts, and some criticisms along the way.

Piping structured data

One of the core features of nushell is that commands return structured data instead of plain strings. Pipelines can pass lists, records, or tables. Individual entries can be one of several built-in datatypes, including rich datatypes like datetimes, durations, and filesizes.
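A quick way to see these types at work is to ask nushell to `describe` a few literals and do some arithmetic on them. This is only a small sketch, and the values are made up for illustration.

```nu
# literals carry their own type; no string parsing is involved
let duration_type = (3hr | describe)
let filesize_type = (64mb | describe)

# typed values support arithmetic and comparison directly
let same_duration = ((1hr + 30min) == 90min)
let bigger = (1gb > 500mb)

# subtracting two dates yields a duration
let one_week = ((2024-03-08 - 2024-03-01) == 1wk)

print $duration_type $filesize_type $same_duration $bigger $one_week
```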

Nushell can also open many filetypes and turn them into nushell-native datastructures to work with, including CSV, JSON, TOML, YAML, XML, SQLite files, and even Excel and LibreOffice Calc spreadsheets.
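As a minimal sketch of what that looks like, the snippet below writes a tiny JSON file and reads it back. The file name `demo.json` and its contents are made up so that the example is self-contained.

```nu
# write a small JSON file to work with (hypothetical name and contents)
'{"name": "demo", "version": "0.1.0", "tags": ["a", "b"]}' | save --force demo.json

# `open` picks a parser from the file extension and returns a record
let config = (open demo.json)
let version = $config.version
let tag_count = ($config.tags | length)

# --raw skips the parsing and hands back the plain text instead
let raw = (open --raw demo.json)

print $version $tag_count
```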

Once you have your data in nushell datastructures, you can do all sorts of manipulations on it. It feels like an interesting mix of functional programming and SQL, but it actually works really, really well. You can sort, filter, and aggregate the data, use a SQL-style join between two tables, and use functional programming patterns to manipulate tables.

Some examples of what nushell enables with structured data passing through pipelines:

{{}}
# show all files recursively that were modified in the last week
ls **/* | where modified > (
    # create timestamp from relative human readable string.
    '1 week ago' | into datetime
)
{{}}

{{}}
# show all executables in the current directory that are currently running.
ps |
# convert the name of the called process into a path
update name {|process| (which $process.name).path.0?} |
# join with the list of all files in the current directory, recursing down subdirectories.
join (ls -f **/*) name
{{}}

{{}}
# show all values in 1 csv but not in another
open all_tasks.csv |
# filter out tasks that cause the closure to return false
filter { |task|
    not (
        # check if the task number is in the other csv.
        $task.number in (
            open tasks_done.csv | get 'task_number'
        )
    )
}
{{}}

All of these can be one-liners, but have been broken up in order to insert explanatory comments.
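The examples above lean on `where`, `update`, and `join` against real files and processes. The same sort, filter, aggregate, and join operations can be tried on small hand-built tables, as in this sketch. The table contents and column names are made up for illustration.

```nu
# two small hand-built tables (hypothetical data)
let tasks = [
    [number owner hours];
    [1 'alice' 3]
    [2 'bob' 5]
    [3 'alice' 2]
]
let owners = [
    [owner team];
    ['alice' 'infra']
    ['bob' 'web']
]

# filter, then sort: tasks over 2 hours, longest first
let long_tasks = ($tasks | where hours > 2 | sort-by hours --reverse)

# aggregate: total hours across all tasks
let total_hours = ($tasks | get hours | math sum)

# SQL-style inner join on the shared `owner` column
let joined = ($tasks | join $owners owner)

print $long_tasks $total_hours $joined
```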

Parsing non-nu tools

But what if our tool or text file isn't in a format nushell understands? Thankfully, for most formats, parsing is relatively straightforward. Let's take this NGINX server log, for example (not a log of real traffic, just a sample log I found):

{{<highlight text "linenostart=9184">}}
135.125.217.54 - - [27/Mar/2023:12:57:44 +0000] "GET /.env HTTP/1.1" 404 197 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
135.125.217.54 - - [27/Mar/2023:12:57:44 +0000] "POST / HTTP/1.1" 405 568 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
43.154.141.71 - - [27/Mar/2023:12:58:04 +0000] "HEAD /Core/Skin/Login.aspx HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
193.32.162.159 - - [27/Mar/2023:13:01:07 +0000] "GET / HTTP/1.1" 200 13 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36 Edg/90.0.818.46"
193.32.162.159 - - [27/Mar/2023:13:01:18 +0000] "GET /dispatch.asp HTTP/1.1" 404 197 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36 Edg/90.0.818.46"

{{}}

We can parse it into a nu table like so (each line has a comment explaining what it does, for those unfamiliar with the nushell language):

{{}}
open access.log |
# turn into a list of lines
lines |
# parse into a table
parse '{ip} - {user} [{time}] "{request_type} {request}" {status} {bytes_sent} "{referrer}" "{user_agent}"' |
# parse time into proper datetime
update time {into datetime -f '%d/%b/%Y:%T %z'} |
# parse into proper integer
update bytes_sent {into int}
{{}}

Now that we have it in nushell tables, we can bring all of nushell's tools to bear on the data. We could, for example, plot a histogram of the most common IPs, just by piping the whole thing into `histogram ip`. We could easily calculate the average bytes sent per request. We could group the records by the day or hour they happened, and analyze each of those groups independently. And we can do all of that after arbitrarily filtering, sorting, or otherwise transforming the table.

While the parsing pipeline would be a pretty long one-liner if we decided to put it on a single line, it's still quite easy and straightforward to write. Most log formats and command outputs are similarly straightforward.
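A couple of the follow-up analyses mentioned above could look roughly like this. To keep the sketch self-contained it uses a small hand-built table in place of the parsed log, with only the `ip`, `status`, and `bytes_sent` columns, copied from the sample lines.

```nu
# stand-in for the parsed log table (values taken from the sample log)
let log = [
    [ip status bytes_sent];
    ['135.125.217.54' 404 197]
    ['135.125.217.54' 405 568]
    ['43.154.141.71' 404 0]
    ['193.32.162.159' 200 13]
    ['193.32.162.159' 404 197]
]

# average bytes sent per request
let avg_bytes = ($log | get bytes_sent | math avg)

# group the rows by client address; the result is a record keyed by ip
let by_ip = ($log | group-by ip)

# how often each ip appears, with a small text bar chart
let ip_histogram = ($log | histogram ip)

print $avg_bytes ($by_ip | columns) $ip_histogram
```

Grouping by day or hour works the same way, except that `group-by` would need a closure that derives the day or hour from the `time` column.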

Defining custom commands, with built in arg parsing

Nushell has a feature called Custom Commands, which serve the same purpose as functions in other shells and programming languages, but are a bit more featureful than traditional POSIX shell functions.

First of all, nushell custom commands specify the number of positional arguments they take.

{{}}
def recently-modified [cutoff] {
    # show all files recursively that were modified after a specified cutoff
    ls **/* | where modified > (
        # create timestamp from input
        $cutoff | into datetime
    )
}
{{}}

You can optionally give the arguments a type:

{{}}
def recently-modified [cutoff: string] {
    # show all files recursively that were modified after a specified cutoff
    ls **/* | where modified > (
        # create timestamp from input
        $cutoff | into datetime
    )
}
{{}}

You can give an argument a default value, which makes it optional (this can be combined with a type specification):

{{}}
def recently-modified [cutoff = '1 week ago'] {
    # show all files recursively that were modified after a specified cutoff
    ls **/* | where modified > (
        # create timestamp from input
        $cutoff | into datetime
    )
}
{{}}

Flag parsing, complete with short flags, is included as well. (A flag without a type is taken as a boolean flag, set by its presence or absence.)

{{}}
def recently-modified [
    cutoff: string = '1 week ago'
    --older-than (-o)
] {
    if $older_than {
        # show all files recursively that were modified before a specified cutoff
        ls **/* | where modified < (
            # create timestamp from input
            $cutoff | into datetime
        )
    } else {
        # show all files recursively that were modified after a specified cutoff
        ls **/* | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    }
}
{{}}

And finally, you can add a rest parameter at the end, allowing you to take a variable number of arguments.

{{}}
def recently-modified [
    --cutoff = '1 week ago'
    ...paths
] {
    for $path in $paths {
        # show all files recursively that were modified after a specified cutoff
        ls $path | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    }
}
{{}}

All of the specified parameters are automatically added to a generated --help page, along with any documentation comments, so that the following code block:

{{}}
# display recently modified files
def recently-modified [
    --cutoff = '1 week ago' # cutoff to be considered 'recently modified'
    ...paths # paths to consider
] {
    for $path in $paths {
        # show all files recursively that were modified after a specified cutoff
        ls $path | where modified > (
            # create timestamp from input
            $cutoff | into datetime
        )
    }
}
{{}}

results in a help page that looks like this:

> recently-modified --help
display recently modified files

Usage:
  > recently-modified {flags} ...(paths) 

Flags:
  --cutoff <String> - cutoff to be considered 'recently modified' (default: '1 week ago')
  -h, --help - Display the help message for this command

Parameters:
  ...paths <any>: paths to consider

Input/output types:
  ╭───┬───────┬────────╮
  │ # │ input │ output │
  ├───┼───────┼────────┤
  │ 0 │ any   │ any    │
  ╰───┴───────┴────────╯

(The input/output table at the bottom has to do with how the command is used in a pipeline, and is covered in more detail in the book.)

This addition of easy argument parsing makes it incredibly convenient to add command line arguments to your scripts and functions, something that is anything but easy in POSIX shells.
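None of the definitions above show what calling such a command looks like, so here is a minimal sketch. `join-words` is a hypothetical command made up for illustration; it combines a required positional, a boolean flag with a short form, a typed flag with a default, and a rest parameter.

```nu
# join words into one string (hypothetical example command)
def join-words [
    first: string        # required positional argument
    --upper (-u)         # boolean flag, false unless passed
    --sep: string = ' '  # typed flag with a default value
    ...rest: string      # zero or more extra positionals
] {
    let joined = ([$first] | append $rest | str join $sep)
    if $upper { $joined | str upcase } else { $joined }
}

let plain = (join-words hello world)
let loud = (join-words -u hello world)
let dashed = (join-words --sep '-' a b c)

print $plain $loud $dashed
```

Passing something of the wrong type, or leaving out `first`, is rejected by nushell before the body of the command ever runs.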

Error messages

Nushell brings with it great, self-explanatory error messages. For example, if we do this:

What's not there yet

Now, nushell is not finished yet. As I write this, I am running version 0.91 of nu. As with fish, its not being a POSIX shell means that you still need to drop into bash or zsh to source env files, for example to use a cross-compiling C/C++ SDK. (Thankfully, Python virtualenvs already come with a nu script for you to source, so doing Python dev will not require you to launch a POSIX shell.)

Additionally, while you can write nu script files, invoking them from within nu treats them as external commands, meaning they take in and pass out plain text, rather than the structured data that you would get with a proper custom command or nu builtin. //explain the best workaround.
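One possible workaround (a sketch, and not necessarily the best one) is to have the script serialize its output with `to nuon` and have the caller parse it back with `from nuon`, so that types survive the trip through plain text. The example below imitates that round trip inside a single session; the table contents are made up.

```nu
# stand-in for what a script would write to stdout: a table turned into NUON text
let text = ([[name size]; ['a.txt' 1] ['b.txt' 2]] | to nuon)

# the caller parses the text back into a real table
let table = ($text | from nuon)

# structured operations work again on the result
let total = ($table | get size | math sum)

print ($text | describe) ($table | describe) $total
```

With a real script, assuming a hypothetical `my-script.nu` that ends its pipeline in `to nuon`, the call would look like `nu my-script.nu | from nuon`.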