This reverts commit 68adc4657f9c57bb7090df3984a5d2931f8e7358.
# Description
Reverts the lazyframe refactor (#12669) for the next release, since
there are still a few lingering issues. This temporarily solves #12863
and #12828. After the release, the lazyframes can be added back and
cleaned up.
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
Read(Box<dyn Read + Send + 'static>),
File(File),
Child(ChildProcess),
}
```
This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.
Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
Empty,
Value(Value, Option<PipelineMetadata>),
ListStream(ListStream, Option<PipelineMetadata>),
ByteStream(ByteStream, Option<PipelineMetadata>),
}
```
The PR is relatively large, but a decent amount of it is just repetitive
changes.
This PR fixes#7017, fixes#10763, and fixes#12369.
This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |
# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.
# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.
---------
Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
This moves to predominantly supporting only lazy dataframes for most
operations. It removes a lot of the type conversion between lazy and
eager dataframes based on what was inputted into the command.
For the most part the changes will mean:
* You will need to run `polars collect` after performing operations
* The into-lazy command has been removed as it is redundant.
* When opening files a lazy frame will be outputted by default if the
reader supports lazy frames
A list of individual command changes can be found
[here](https://hackmd.io/@nucore/Bk-3V-hW0)
---------
Co-authored-by: Ian Manske <ian.manske@pm.me>
# Description
I added some more tests to our mighty `polars` ~~, yet I don't know how
to add expected results in some of them. I would like to ask for help.~~
~~My experiments are in the last commit: [polars:
experiments](f7e5e72019).
Without those experiments `cargo test` goes well.~~
UPD. I moved out my unsuccessful test experiments into a separate
[branch](https://github.com/maxim-uvarov/nushell/blob/polars-tests-broken2/).
So, this branch seems ready for a merge.
@ayax79, maybe you'll find time for me please? It's not urgent for sure.
P.S. I'm very new to git. Please feel free to give me any suggestions on
how I should use it better
I had previously changed NuLazyFrame::collect to set the NuDataFrame's
from_lazy field to false to prevent conversion back to a lazy frame. It
appears there are cases where this should happen. Instead, I am only
setting from_lazy=false inside the `polars collect` command.
[Related discord
message](https://discord.com/channels/601130461678272522/1227612017171501136/1230600465159421993)
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
This is just some cleanup. I moved to_pipeline_data and to_cache_value
to the CustomValueSupport trait, where I should've put them to begin
with.
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
@maxim-uvarov discovered the following error:
```
> [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
Error: × Error using as series
╭─[entry #1:1:68]
1 │ [[a b]; [6 2] [1 4] [4 1]] | polars into-lazy | polars sort-by a | polars unique --subset [a]
· ──────┬──────
· ╰── dataframe has more than one column
╰────
```
During investigation, I discovered the root cause was that the lazy frame was incorrectly converted back to a eager dataframe. In order to keep this from happening, I explicitly set that the dataframe did not come from an eager frame. This causes the conversion logic to not attempt to convert the dataframe later in the pipeline.
---------
Co-authored-by: Jack Wright <jack.wright@disqo.com>
# Description
`polars ls` is already different that `dfr ls`. Currently it just shows
the cache key, columns, rows, and type. I have added:
- creation time
- size
- span contents
- span start and end
<img width="1471" alt="Screenshot 2024-04-10 at 17 27 06"
src="https://github.com/nushell/nushell/assets/56345/545918b7-7c96-4c25-bc01-b9e2b659a408">
# Tests + Formatting
Done
Co-authored-by: Jack Wright <jack.wright@disqo.com>