Ian Manske 6fd854ed9f
Replace ExternalStream with new ByteStream type (#12774)
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
2024-05-16 07:11:18 -07:00

133 lines
4.4 KiB
Rust

use chrono_humanize::HumanTime;
use nu_engine::command_prelude::*;
use nu_protocol::{format_duration, format_filesize_from_conf, ByteStream, Config};
const LINE_ENDING: &str = if cfg!(target_os = "windows") {
"\r\n"
} else {
"\n"
};
#[derive(Clone)]
pub struct ToText;
impl Command for ToText {
fn name(&self) -> &str {
"to text"
}
fn signature(&self) -> Signature {
Signature::build("to text")
.input_output_types(vec![(Type::Any, Type::String)])
.category(Category::Formats)
}
fn usage(&self) -> &str {
"Converts data into simple text."
}
fn run(
&self,
engine_state: &EngineState,
_stack: &mut Stack,
call: &Call,
input: PipelineData,
) -> Result<PipelineData, ShellError> {
let span = call.head;
let input = input.try_expand_range()?;
match input {
PipelineData::Empty => Ok(Value::string(String::new(), span).into_pipeline_data()),
PipelineData::Value(value, ..) => {
let str = local_into_string(value, LINE_ENDING, engine_state.get_config());
Ok(Value::string(str, span).into_pipeline_data())
}
PipelineData::ListStream(stream, meta) => {
let span = stream.span();
let config = engine_state.get_config().clone();
let iter = stream.into_inner().map(move |value| {
let mut str = local_into_string(value, LINE_ENDING, &config);
str.push_str(LINE_ENDING);
str
});
Ok(PipelineData::ByteStream(
ByteStream::from_iter(iter, span, engine_state.ctrlc.clone()),
meta,
))
}
PipelineData::ByteStream(stream, meta) => Ok(PipelineData::ByteStream(stream, meta)),
}
}
fn examples(&self) -> Vec<Example> {
vec![
Example {
description: "Outputs data as simple text",
example: "1 | to text",
result: Some(Value::test_string("1")),
},
Example {
description: "Outputs external data as simple text",
example: "git help -a | lines | find -r '^ ' | to text",
result: None,
},
Example {
description: "Outputs records as simple text",
example: "ls | to text",
result: None,
},
]
}
}
fn local_into_string(value: Value, separator: &str, config: &Config) -> String {
let span = value.span();
match value {
Value::Bool { val, .. } => val.to_string(),
Value::Int { val, .. } => val.to_string(),
Value::Float { val, .. } => val.to_string(),
Value::Filesize { val, .. } => format_filesize_from_conf(val, config),
Value::Duration { val, .. } => format_duration(val),
Value::Date { val, .. } => {
format!("{} ({})", val.to_rfc2822(), HumanTime::from(val))
}
Value::Range { val, .. } => val.to_string(),
Value::String { val, .. } => val,
Value::Glob { val, .. } => val,
Value::List { vals: val, .. } => val
.into_iter()
.map(|x| local_into_string(x, ", ", config))
.collect::<Vec<_>>()
.join(separator),
Value::Record { val, .. } => val
.into_owned()
.into_iter()
.map(|(x, y)| format!("{}: {}", x, local_into_string(y, ", ", config)))
.collect::<Vec<_>>()
.join(separator),
Value::Closure { val, .. } => format!("<Closure {}>", val.block_id),
Value::Nothing { .. } => String::new(),
Value::Error { error, .. } => format!("{error:?}"),
Value::Binary { val, .. } => format!("{val:?}"),
Value::CellPath { val, .. } => val.to_string(),
// If we fail to collapse the custom value, just print <{type_name}> - failure is not
// that critical here
Value::Custom { val, .. } => val
.to_base_value(span)
.map(|val| local_into_string(val, separator, config))
.unwrap_or_else(|_| format!("<{}>", val.type_name())),
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn test_examples() {
use crate::test_examples;
test_examples(ToText {})
}
}