mirror of
https://github.com/nushell/nushell.git
synced 2025-05-30 19:32:46 +00:00
<!-- if this PR closes one or more issues, you can automatically link the PR with them by using one of the [*linking keywords*](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword), e.g. - this PR should close #xxxx - fixes #xxxx you can also mention related issues, PRs or discussions! --> # Description <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> This PR adds type checking of all command input types at run-time. Generally, these errors should be caught by the parser, but sometimes we can't know the type of a value at parse-time. The simplest example is using the `echo` command, which has an output type of `any`, so prefixing a literal with `echo` will bypass parse-time type checking. Before this PR, each command has to individually check its input types. This can result in scenarios where the input/output types don't match the actual command behavior. This can cause valid usage with an non-`any` type to become a parse-time error if a command is missing that type in its pipeline input/output (`drop nth` and `history import` do this before this PR). Alternatively, a command may not list a type in its input/output types, but doesn't actually reject that type in its code, which can have unintended side effects (`get` does this on an empty pipeline input, and `sort` used to before #13154). After this PR, the type of the pipeline input is checked to ensure it matches one of the input types listed in the proceeding command's input/output types. While each of the issues in the "before this PR" section could be addressed with each command individually, this PR solves this issue for _all_ commands. **This will likely cause some breakage**, as some commands have incorrect input/output types, and should be adjusted. Also, some scripts may have erroneous usage of commands. In writing this PR, I discovered that `toolkit.nu` was passing `null` values to `str join`, which doesn't accept nothing types (if folks think it should, we can adjust it in this PR or in a different PR). I found some issues in the standard library and its tests. I also found that carapace's vendor script had an incorrect chaining of `get -i`: ```nushell let expanded_alias = (scope aliases | where name == $spans.0 | get -i 0 | get -i expansion) ``` Before this PR, if the `get -i 0` ever actually did evaluate to `null`, the second `get` invocation would error since `get` doesn't operate on `null` values. After this PR, this is immediately a run-time error, alerting the user to the problematic code. As a side note, we'll need to PR this fix (`get -i 0 | get -i expansion` -> `get -i 0.expansion`) to carapace. A notable exception to the type checking is commands with input type of `nothing -> <type>`. In this case, any input type is allowed. This allows piping values into the command without an error being thrown. For example, `123 | echo $in` would be an error without this exception. Additionally, custom types bypass type checking (I believe this also happens during parsing, but not certain) I added a `is_subtype` method to `Value` and `PipelineData`. It functions slightly differently than `get_type().is_subtype()`, as noted in the doccomments. Notably, it respects structural typing of lists and tables. For example, the type of a value `[{a: 123} {a: 456, b: 789}]` is a subtype of `table<a: int>`, whereas the type returned by `Value::get_type` is a `list<any>`. Similarly, `PipelineData` has some special handling for `ListStream`s and `ByteStream`s. The latter was needed for this PR to work properly with external commands. Here's some examples. Before: ```nu 1..2 | drop nth 1 Error: nu::parser::input_type_mismatch × Command does not support range input. ╭─[entry #9:1:8] 1 │ 1..2 | drop nth 1 · ────┬─── · ╰── command doesn't support range input ╰──── echo 1..2 | drop nth 1 # => ╭───┬───╮ # => │ 0 │ 1 │ # => ╰───┴───╯ ``` After this PR, I've adjusted `drop nth`'s input/output types to accept range input. Before this PR, zip accepted any value despite not being listed in its input/output types. This caused different behavior depending on if you triggered a parse error or not: ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => ╭───┬───────────╮ # => │ 0 │ ╭───┬───╮ │ # => │ │ │ 0 │ 1 │ │ # => │ │ │ 1 │ 2 │ │ # => │ │ ╰───┴───╯ │ # => ╰───┴───────────╯ ``` After this PR, it works the same in both cases. For cases like this, if we do decide we want `zip` or other commands to accept any input value, then we should explicitly add that to the input types. ```nushell 1 | zip [2] # => Error: nu::parser::input_type_mismatch # => # => × Command does not support int input. # => ╭─[entry #3:1:5] # => 1 │ 1 | zip [2] # => · ─┬─ # => · ╰── command doesn't support int input # => ╰──── echo 1 | zip [2] # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #14:2:6] # => 2 │ echo 1 | zip [2] # => · ┬ ─┬─ # => · │ ╰── only list<any> and range input data is supported # => · ╰── input type: int # => ╰──── ``` # User-Facing Changes <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> **Breaking change**: The type of a command's input is now checked against the input/output types of that command at run-time. While these errors should mostly be caught at parse-time, in cases where they can't be detected at parse-time they will be caught at run-time instead. This applies to both internal commands and custom commands. Example function and corresponding parse-time error (same before and after PR): ```nushell def foo []: int -> nothing { print $"my cool int is ($in)" } 1 | foo # => my cool int is 1 "evil string" | foo # => Error: nu::parser::input_type_mismatch # => # => × Command does not support string input. # => ╭─[entry #16:1:17] # => 1 │ "evil string" | foo # => · ─┬─ # => · ╰── command doesn't support string input # => ╰──── # => ``` Before: ```nu echo "evil string" | foo # => my cool int is evil string ``` After: ```nu echo "evil string" | foo # => Error: nu:🐚:only_supports_this_input_type # => # => × Input type not supported. # => ╭─[entry #17:1:6] # => 1 │ echo "evil string" | foo # => · ──────┬────── ─┬─ # => · │ ╰── only int input data is supported # => · ╰── input type: string # => ╰──── ``` Known affected internal commands which erroneously accepted any type: * `str join` * `zip` * `reduce` # Tests + Formatting <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> - 🟢 `toolkit fmt` - 🟢 `toolkit clippy` - 🟢 `toolkit test` - 🟢 `toolkit test stdlib` # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> * Play whack-a-mole with the commands and scripts this will inevitably break
263 lines
8.7 KiB
Rust
263 lines
8.7 KiB
Rust
use serde::{Deserialize, Serialize};
|
|
#[cfg(test)]
|
|
use strum_macros::EnumIter;
|
|
|
|
use std::fmt::Display;
|
|
|
|
use crate::SyntaxShape;
|
|
|
|
#[derive(Clone, Debug, Default, PartialEq, Eq, Serialize, Deserialize, Hash)]
|
|
#[cfg_attr(test, derive(EnumIter))]
|
|
pub enum Type {
|
|
Any,
|
|
Binary,
|
|
Block,
|
|
Bool,
|
|
CellPath,
|
|
Closure,
|
|
Custom(Box<str>),
|
|
Date,
|
|
Duration,
|
|
Error,
|
|
Filesize,
|
|
Float,
|
|
Int,
|
|
List(Box<Type>),
|
|
#[default]
|
|
Nothing,
|
|
Number,
|
|
Range,
|
|
Record(Box<[(String, Type)]>),
|
|
Signature,
|
|
String,
|
|
Glob,
|
|
Table(Box<[(String, Type)]>),
|
|
}
|
|
|
|
impl Type {
|
|
pub fn list(inner: Type) -> Self {
|
|
Self::List(Box::new(inner))
|
|
}
|
|
|
|
pub fn record() -> Self {
|
|
Self::Record([].into())
|
|
}
|
|
|
|
pub fn table() -> Self {
|
|
Self::Table([].into())
|
|
}
|
|
|
|
pub fn custom(name: impl Into<Box<str>>) -> Self {
|
|
Self::Custom(name.into())
|
|
}
|
|
|
|
/// Determine of the [`Type`] is a [subtype](https://en.wikipedia.org/wiki/Subtyping) of `other`.
|
|
///
|
|
/// This should only be used at parse-time. If you have a concrete [`Value`](crate::Value) or [`PipelineData`](crate::PipelineData),
|
|
/// you should use their respective [`is_subtype_of`] methods instead.
|
|
pub fn is_subtype_of(&self, other: &Type) -> bool {
|
|
// Structural subtyping
|
|
let is_subtype_collection = |this: &[(String, Type)], that: &[(String, Type)]| {
|
|
if this.is_empty() || that.is_empty() {
|
|
true
|
|
} else if this.len() < that.len() {
|
|
false
|
|
} else {
|
|
that.iter().all(|(col_y, ty_y)| {
|
|
if let Some((_, ty_x)) = this.iter().find(|(col_x, _)| col_x == col_y) {
|
|
ty_x.is_subtype_of(ty_y)
|
|
} else {
|
|
false
|
|
}
|
|
})
|
|
}
|
|
};
|
|
|
|
match (self, other) {
|
|
(t, u) if t == u => true,
|
|
(Type::Float, Type::Number) => true,
|
|
(Type::Int, Type::Number) => true,
|
|
(_, Type::Any) => true,
|
|
(Type::List(t), Type::List(u)) if t.is_subtype_of(u) => true, // List is covariant
|
|
(Type::Record(this), Type::Record(that)) | (Type::Table(this), Type::Table(that)) => {
|
|
is_subtype_collection(this, that)
|
|
}
|
|
(Type::Table(_), Type::List(_)) => true,
|
|
_ => false,
|
|
}
|
|
}
|
|
|
|
pub fn is_numeric(&self) -> bool {
|
|
matches!(self, Type::Int | Type::Float | Type::Number)
|
|
}
|
|
|
|
pub fn is_list(&self) -> bool {
|
|
matches!(self, Type::List(_))
|
|
}
|
|
|
|
/// Does this type represent a data structure containing values that can be addressed using 'cell paths'?
|
|
pub fn accepts_cell_paths(&self) -> bool {
|
|
matches!(self, Type::List(_) | Type::Record(_) | Type::Table(_))
|
|
}
|
|
|
|
pub fn to_shape(&self) -> SyntaxShape {
|
|
let mk_shape = |tys: &[(String, Type)]| {
|
|
tys.iter()
|
|
.map(|(key, val)| (key.clone(), val.to_shape()))
|
|
.collect()
|
|
};
|
|
|
|
match self {
|
|
Type::Int => SyntaxShape::Int,
|
|
Type::Float => SyntaxShape::Float,
|
|
Type::Range => SyntaxShape::Range,
|
|
Type::Bool => SyntaxShape::Boolean,
|
|
Type::String => SyntaxShape::String,
|
|
Type::Block => SyntaxShape::Block, // FIXME needs more accuracy
|
|
Type::Closure => SyntaxShape::Closure(None), // FIXME needs more accuracy
|
|
Type::CellPath => SyntaxShape::CellPath,
|
|
Type::Duration => SyntaxShape::Duration,
|
|
Type::Date => SyntaxShape::DateTime,
|
|
Type::Filesize => SyntaxShape::Filesize,
|
|
Type::List(x) => SyntaxShape::List(Box::new(x.to_shape())),
|
|
Type::Number => SyntaxShape::Number,
|
|
Type::Nothing => SyntaxShape::Nothing,
|
|
Type::Record(entries) => SyntaxShape::Record(mk_shape(entries)),
|
|
Type::Table(columns) => SyntaxShape::Table(mk_shape(columns)),
|
|
Type::Any => SyntaxShape::Any,
|
|
Type::Error => SyntaxShape::Any,
|
|
Type::Binary => SyntaxShape::Binary,
|
|
Type::Custom(_) => SyntaxShape::Any,
|
|
Type::Signature => SyntaxShape::Signature,
|
|
Type::Glob => SyntaxShape::GlobPattern,
|
|
}
|
|
}
|
|
|
|
/// Get a string representation, without inner type specification of lists,
|
|
/// tables and records (get `list` instead of `list<any>`
|
|
pub fn get_non_specified_string(&self) -> String {
|
|
match self {
|
|
Type::Block => String::from("block"),
|
|
Type::Closure => String::from("closure"),
|
|
Type::Bool => String::from("bool"),
|
|
Type::CellPath => String::from("cell-path"),
|
|
Type::Date => String::from("date"),
|
|
Type::Duration => String::from("duration"),
|
|
Type::Filesize => String::from("filesize"),
|
|
Type::Float => String::from("float"),
|
|
Type::Int => String::from("int"),
|
|
Type::Range => String::from("range"),
|
|
Type::Record(_) => String::from("record"),
|
|
Type::Table(_) => String::from("table"),
|
|
Type::List(_) => String::from("list"),
|
|
Type::Nothing => String::from("nothing"),
|
|
Type::Number => String::from("number"),
|
|
Type::String => String::from("string"),
|
|
Type::Any => String::from("any"),
|
|
Type::Error => String::from("error"),
|
|
Type::Binary => String::from("binary"),
|
|
Type::Custom(_) => String::from("custom"),
|
|
Type::Signature => String::from("signature"),
|
|
Type::Glob => String::from("glob"),
|
|
}
|
|
}
|
|
}
|
|
|
|
impl Display for Type {
|
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
|
match self {
|
|
Type::Block => write!(f, "block"),
|
|
Type::Closure => write!(f, "closure"),
|
|
Type::Bool => write!(f, "bool"),
|
|
Type::CellPath => write!(f, "cell-path"),
|
|
Type::Date => write!(f, "date"),
|
|
Type::Duration => write!(f, "duration"),
|
|
Type::Filesize => write!(f, "filesize"),
|
|
Type::Float => write!(f, "float"),
|
|
Type::Int => write!(f, "int"),
|
|
Type::Range => write!(f, "range"),
|
|
Type::Record(fields) => {
|
|
if fields.is_empty() {
|
|
write!(f, "record")
|
|
} else {
|
|
write!(
|
|
f,
|
|
"record<{}>",
|
|
fields
|
|
.iter()
|
|
.map(|(x, y)| format!("{x}: {y}"))
|
|
.collect::<Vec<String>>()
|
|
.join(", "),
|
|
)
|
|
}
|
|
}
|
|
Type::Table(columns) => {
|
|
if columns.is_empty() {
|
|
write!(f, "table")
|
|
} else {
|
|
write!(
|
|
f,
|
|
"table<{}>",
|
|
columns
|
|
.iter()
|
|
.map(|(x, y)| format!("{x}: {y}"))
|
|
.collect::<Vec<String>>()
|
|
.join(", ")
|
|
)
|
|
}
|
|
}
|
|
Type::List(l) => write!(f, "list<{l}>"),
|
|
Type::Nothing => write!(f, "nothing"),
|
|
Type::Number => write!(f, "number"),
|
|
Type::String => write!(f, "string"),
|
|
Type::Any => write!(f, "any"),
|
|
Type::Error => write!(f, "error"),
|
|
Type::Binary => write!(f, "binary"),
|
|
Type::Custom(custom) => write!(f, "{custom}"),
|
|
Type::Signature => write!(f, "signature"),
|
|
Type::Glob => write!(f, "glob"),
|
|
}
|
|
}
|
|
}
|
|
|
|
#[cfg(test)]
|
|
mod tests {
|
|
use super::Type;
|
|
use strum::IntoEnumIterator;
|
|
|
|
mod subtype_relation {
|
|
use super::*;
|
|
|
|
#[test]
|
|
fn test_reflexivity() {
|
|
for ty in Type::iter() {
|
|
assert!(ty.is_subtype_of(&ty));
|
|
}
|
|
}
|
|
|
|
#[test]
|
|
fn test_any_is_top_type() {
|
|
for ty in Type::iter() {
|
|
assert!(ty.is_subtype_of(&Type::Any));
|
|
}
|
|
}
|
|
|
|
#[test]
|
|
fn test_number_supertype() {
|
|
assert!(Type::Int.is_subtype_of(&Type::Number));
|
|
assert!(Type::Float.is_subtype_of(&Type::Number));
|
|
}
|
|
|
|
#[test]
|
|
fn test_list_covariance() {
|
|
for ty1 in Type::iter() {
|
|
for ty2 in Type::iter() {
|
|
let list_ty1 = Type::List(Box::new(ty1.clone()));
|
|
let list_ty2 = Type::List(Box::new(ty2.clone()));
|
|
assert_eq!(list_ty1.is_subtype_of(&list_ty2), ty1.is_subtype_of(&ty2));
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|