125 Commits

Author SHA1 Message Date
Martin von Zweigbergk
1970ddef15 tree: propagate errors from sub_tree()/path_value() 2024-05-22 06:46:38 -07:00
Martin von Zweigbergk
428e209304 cleanup: consistently use BackendResult
We have the type alias so we should use it consistently.
2024-05-07 19:35:03 -07:00
Martin von Zweigbergk
0d1ff8a150 merged_tree: propagate errors from TreeEntriesIterator
We shouldn't panic if we fail to read a tree from the backend.
2024-05-01 06:10:08 -07:00
Yuya Nishihara
73b60903ce tree: flatten TreeMergeError into BackendError 2024-03-30 22:40:05 +09:00
Yuya Nishihara
35f718f212 merged_tree: remove canceling terms prior to resolving file-level conflict
I think this is a variant of the problem fixed by 7fda80fc2221 "tree: simplify
conflict before resolving at hunk level." We need to simplify() the conflict
before and after extracting file ids because the source conflict values may
contain trees to be cancelled out, and the file values may differ only in exec
bits. Since the legacy tree passes a simplified conflict in to this function,
I made the merged tree do the same.

Fixes #2654
2023-12-03 07:44:58 +09:00
Yuya Nishihara
4ffbf40c82 merged_tree: do not propagate conflicting empty tree value to parent
Otherwise an empty subtree would be added to the parent tree.

If the stored tree contained an empty subtree, simplify() wouldn't work
against new "absent" subtree representation. I don't know if there's a
such code path, but I believe it's very rare to encounter the problem.

#2654
2023-12-03 07:44:58 +09:00
Yuya Nishihara
076b49b610 merged_tree: use merged_tree_entry_diff() in stream version 2023-12-01 00:05:06 +09:00
Yuya Nishihara
97a260b1bf merged_tree: reimplement TreeEntryDiffIterator by using iterator adapter
We don't need a named type anymore.
2023-12-01 00:05:06 +09:00
Yuya Nishihara
fd1c03d037 merged_tree: use sync get_tree() in TreeDiffIterator
This basically backs out the change 1b9a3e27e07b "merged_tree: read before/after
trees concurrently." As we decided to add a separate impl for async access, it
doesn't make sense to read before/after pair in parallel.

The async single_tree() is moved to TreeDiffStreamImpl. It will help remove
the sync version when the performance problem is solved.
2023-12-01 00:05:06 +09:00
Yuya Nishihara
6ce7bd5338 repo_path: replace .contains() with .starts_with(), flipping the arguments
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
2023-11-29 08:41:23 +09:00
Yuya Nishihara
28ab9593c3 repo_path: split RepoPath into owned and borrowed types
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
2023-11-28 07:33:28 +09:00
Yuya Nishihara
0a1bc2ba42 repo_path: add stub RepoPathBuf type, update callers
Most RepoPath::from_internal_string() callers will be migrated to the function
that returns &RepoPath, and cloning &RepoPath won't work.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
974a6870b3 repo_path: make RepoPath::components() return iterator
This allows us to change the backing type from Vec<String> to String.
2023-11-27 08:42:09 +09:00
Yuya Nishihara
f2096da2d6 repo_path: add stub type to introduce borrowed RepoPathComponent type
The current RepoPathComponent will be renamed to RepoPathComponentBuf, and
new str wrapper will be added as RepoPathComponent.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
6344cd56b3 repo_path: remove RepoPathJoin trait, just implement join() on the type
I don't think we'll add join() that takes different types.
2023-11-26 07:14:47 +09:00
Yuya Nishihara
16620e0e4c merged_tree: drop legacy tree handling from ConflictsDirItem constructor
No callers pass in a legacy tree.
2023-11-21 07:45:30 +09:00
Yuya Nishihara
4ad3db2e84 merged_tree: extract value() function of non-legacy trees 2023-11-21 07:45:30 +09:00
Yuya Nishihara
ca3f549c9e merged_tree: remove redundant clone() from ConflictIterator construction 2023-11-21 07:45:30 +09:00
Martin von Zweigbergk
acc35a89a8 merged_tree: inline non-recursive entry iterator 2023-11-19 20:29:40 -10:00
Martin von Zweigbergk
426f6d0cdd merged_tree: inline non-recursive conflict iterator
The abstraction is no longer useful since we made the types not
self-referential.
2023-11-19 20:29:40 -10:00
Martin von Zweigbergk
c0295c5dbc merged_tree: make ConflictsDirItem not self-referential
This removes the last use of `ouroboros` in `merged_tree.rs`. The set
of conflicts to iterate is usually so small that I didn't bother
checking the performance impact.
2023-11-17 03:50:34 -08:00
Martin von Zweigbergk
e1a02c5c5b merged_tree: make TreeDiffDirItem not self-referential
This removes another dependency on `ouroboros`, for a small
performance hit:


```
❯ hyperfine --warmup 3 --runs 30 \
      '/tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0' \
      '/tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0
  Time (mean ± σ):     689.7 ms ±  23.9 ms    [User: 400.0 ms, System: 289.8 ms]
  Range (min … max):   666.9 ms … 759.2 ms    30 runs

Benchmark 2: /tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0
  Time (mean ± σ):     710.9 ms ±  19.2 ms    [User: 420.4 ms, System: 290.6 ms]
  Range (min … max):   688.5 ms … 752.0 ms    30 runs

Summary
  '/tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0' ran
    1.03 ± 0.05 times faster than '/tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0'
```
2023-11-17 03:50:34 -08:00
Martin von Zweigbergk
61d87fe296 merged_tree: make TreeEntriesIterator not self-referential
While importing the `ouroboros` crate and the `aliasable` crate it
depends on, the "unsafe Rust reviewer" expressed some concern that
they contain a lot of unsafe code that's hard to review. We can avoid
the unsafe code altogether by making `TreeEntriesIterator` not
self-refential. Instead, we can collect the matching entries in an
individual tree up front. It does have some performance cost:


```
❯ hyperfine --warmup 3 --runs 30 \
      '/tmp/jj-before --ignore-working-copy files -r v6.0' \
      '/tmp/jj-after --ignore-working-copy files -r v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy files -r v6.0
  Time (mean ± σ):     461.4 ms ±  14.3 ms    [User: 232.1 ms, System: 229.4 ms]
  Range (min … max):   443.4 ms … 496.3 ms    30 runs

Benchmark 2: /tmp/jj-after --ignore-working-copy files -r v6.0
  Time (mean ± σ):     482.0 ms ±  14.3 ms    [User: 257.2 ms, System: 224.9 ms]
  Range (min … max):   461.8 ms … 513.3 ms    30 runs

Summary
  '/tmp/jj-before --ignore-working-copy files -r v6.0' ran
    1.04 ± 0.04 times faster than '/tmp/jj-after --ignore-working-copy files -r v6.0'
```

I think that's acceptable.
2023-11-17 03:50:34 -08:00
Waleed Khan
a60733f632 tree: remove unsafe with ouroboros for self-referential iterators 2023-11-09 21:50:29 -08:00
Yuya Nishihara
2c128f1b61 merged_tree: convert from legacy conflicts through interleaved list
This is basically the same change as the previous commit.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
a734f46130 merged_tree: build unresolved Merge<Tree> from interleaved list
We no longer need to iterate removes and adds separately.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
dd26b7be40 merge: add Merge constructor that accepts interleaved values
Also migrated some callers of 3-way merge, where [left, base, right] order
looks okay.
2023-11-07 17:10:12 +09:00
Martin von Zweigbergk
1140295829 merged_tree: extract polling of tree futures into a function 2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
c77417d4e4 merged_tree: drop outer loop in TreeDiffStreamImpl::poll_next()
As suggested by Yuya. I also added a comment and an assertion in the
case where return `Poll::Pending`.
2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
f40adb84fc merged_tree: add a Stream for concurrent diff off trees
When diffing two trees, we currently start at the root and diff those
trees. Then we diff each subtree, one at a time, recursively. When
using a commit backend that uses remote storage, like our backend at
Google does, diffing the subtrees one at a time gets very slow. We
should be able to diff subtrees concurrently. That way, the number of
roundtrips to a server becomes determined by the depth of the deepest
difference instead of by the number of differing trees (times 2,
even). This patch implements such an algorithm behind a `Stream`
interface. It's not hooked in to `MergedTree::diff_stream()` yet; that
will happen in the next commit.

I timed the new implementation by updating `jj diff -s` to use the new
diff stream and then ran it on the Linux repo with `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by
~20%, from ~750 ms to ~900 ms. Maybe we can get some of that
performance back but I think it'll be hard to match
`MergedTree::diff()`. We can decide later if we're okay with the
difference (after hopefully reducing the gap a bit) or if we want to
keep both implementations.

I also timed the new implementation on our cloud-based repo at
Google. As expected, it made some diffs much faster (I'm not sure if
I'm allowed to share figures).
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
c9ce80a82a merged_tree: extract function for merged iterator of basenames in diff
I'm going to reuse this for stream/async diffing.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
b72f04ba61 merged_tree: rename all_tree_conflict_names() since it's not about conflicts 2023-11-06 23:12:02 -08:00
Yuya Nishihara
d9fbf21794 merge: have Merge::adds()/removes() return iterator
The Merge type will be changed to store interleaved values internally.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
1c6913d618 merge: use Merge::iter() instead of adds()/removes() where order doesn't matter
Merge::iter() will be a slice::Iter, and be more efficient than chaining adds
and removes.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
f6d85c51cd merge: add non-optional Merge accessor to the zeroth value
We have a few callers which just need to obtain an object common among all
the merge values. Let's add a non-failing accessor for that purpose.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
b12c688ea0 merge: add method for indexed adds/removes access
The current adds()/removes() will be changed to return iterators.
2023-11-05 16:43:06 +09:00
Martin von Zweigbergk
72245cfac5 merged_tree: add Stream-based version of diff(), delegating for now
I'm going to implement a `Stream`-based version optimized for
high-latency (RPC-based) commit backends. So far, that implementation
is about 20% slower in the Linux repo when running `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. I think that's almost
only because the algorithm is different, not because it's async per
se.

This commit adds a `Stream`-based version of `MergedTree::diff()` that
just wraps the regular iterator in stream. I updated `jj diff` to use
it. I couldn't measure any difference on the command above in the
Linux repo. I think that means we can safely use the same
`Stream`-based interface regardless of backend, even if we end up
needing two different implementations of the `Stream`. We would then
be using the wrapped iterator from this commit for local backends, and
the new implementation for remote backends. But ideally we can make
the remote-friendly implementation fast enough that we don't need two
implementations.
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
24b706641f async: switch to pollster's block_on()
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
a1ef9dc845 merged_tree: propagate backend errors in diff iterator
I want to fix error propagation before I start using async in this
code. This makes the diff iterator propagate errors from reading tree
objects.

Errors include the path and don't stop the iteration. The idea is that
we should be able to show the user an error inline in diff output if
we failed to read a tree. That's going to be especially useful for
backends that can return `BackendError::AccessDenied`. That error
variant doesn't yet exist, but I plan to add it, and use it in
Google's internal backend.
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
309f1200d6 merge: introduce a type alias for Merge<Option<TreeValue>>
Reasons to introduce this alias:

* Reduces complexity of a type, to silence Clippy warnings in the
  future if we use this type as a type parameter

* The type is used quite frequently, so it makes sense to have a name
  for it

* It's easier to visually scan for the end of the type when you don't
  have to match opening and closing angle brackets
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
6ad71e658d merged_tree: rename MergedTreeValue to MergedTreeVal
I'm going to add `MergedTreeValue` as an alias for
`Merge<Option<TreeValue>>`, but we already have a type by that name in
`merged_tree`. This patch renames it away, to make room for the new
alias. I used `MergedTreeVal` for this borrowing version to be a bit
like how `str` is a borrowed version of `String`.
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
f541f9f3a6 cleanup: import futures::exectutor::block_on() instead of qualifying
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
2023-10-20 07:38:34 -07:00
Martin von Zweigbergk
1b9a3e27e0 merged_tree: read before/after trees concurrently
I'm going to rewrite `TreeDiffIterator` to fetch one level (depth) of
the tree at a time and concurrently. One step towards that is to
convert the iterator to a `Stream`. I'd like to do that by making the
current `Iterator` implementation call the new `Stream`
implementation. However, we can't call `futures::executor::block_on()`
on a future that itself calls `futures::executor::block_on()` (as
`Store::read_tree()` does), so the first step is to bubble up the
async-ness a bit. This patch does that by fetching both sides of the
diff concurrently. That should give close to a 2x speedup on
high-latency backends. (It doesn't help with our backend at Google,
however, because we have a daemon process that does some speculative
prefetching that usually downloads the child trees anyway.)
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
7fda80fc22 tree: simplify conflict before resolving at hunk level
I ran into a bug the other day where `jj status` said there was a
conflict in a file but there were no conflict markers in the working
copy. The commit was created when I squashed a conflict resolution
into the commit's parent. The rebased child commit then ended up in
this state. I.e., it looked something like this before squashing:

```
C (no conflict)
|
| B conflict
|/
A conflict
```

The conflict in B was different from the conflict in A. When I
squashed in C, jj would try to resolve the conflicts by first creating
a 7-way conflict (3 from A, 3 from B, 1 from C). Because of the exact
content-level changes, the 7-way conflict couldn't be automatically
resolved by `files::merge()` (the way it currently works
anyway). However, after simplifying the conflict, it could be
resolved. Because `MergedTree::merge()` does another round of conflict
simplification of the result at the end of the function, it was the
simplifed version that actually got stored in the commit. So when
inspecting the conflict later (e.g. in the working copy, as I did), it
could be automatically resolved.

I think there are at least two ways to solve this. One is to call
`merge_trees()` again after calling `tree.simplify()` in
`MergedTree::merge()`. However, I think it would only matter in the
case of content-level conflicts. Therefore, it seems better to make
the content-level resolution solve this case to start with. I've done
that by simplifying the conflict before passing it into
`files::merge()`. We could even do the fix in `files::merge()`, but
doing it before calling it has the advantage that we can avoid reading
some unchanged content from the backend.
2023-09-27 22:14:39 -07:00
Martin von Zweigbergk
f39b0d24c8 tests: use test backend in working copy tests, fix MergedTree bug
Only tests dealing with Git submodules care about the backend type.

Switching the tests to use the test backend also uncovered another bug
in `MergedTree`, so I fixed that too. The bug only happens with legacy
trees (path-level conflicts) and backends that care about the conflict
path, so it wouldn't happen with Git backends, and it wouldn't happen
at Google either (because we use tree-level conflicts).
2023-09-19 20:49:41 -07:00
Martin von Zweigbergk
7ecd64fde1 merged_tree: use child path when merging child
This fixes a bug where we used the parent directory's path when trying
read trees and files for a child entry. Many tests in
`test_merged_tree` fail after switching to the test backend there
without this fix/
2023-09-18 07:53:19 -07:00
Martin von Zweigbergk
5ef0be73c1 merged_tree: allow building trees with variable-arity overrides
When restoring (`jj restore`) a 3-sided conflict from one tree into a
2-sided tree (or a resolved tree), we'll need to extend the size arity
of the target tree to that of the source tree. I had not considered
this case before. This patch relaxes the constraint in
`MergedTreeBuilder` to allow such cases. The additional trees are
based on empty trees with only the larger overrides in them.
2023-09-01 06:09:37 -07:00
Martin von Zweigbergk
8e47d2d66f merged_tree: add config option to write trees using new format
We're finally ready to start writing trees using the new format where
we represent conflicts by having multiple trees in the commit instead
of having a single tree with multiple entries at a path. This patch
adds a config option for that. It's not ready to be used yet, so I
haven't updated the release notes or other documentation.

I added only a simple CLI test for testing what happens when the
config is enabled in an existing repo. 108 tests currently fail if we
flip the default.
2023-08-30 06:17:21 -07:00
Martin von Zweigbergk
2d50d8a077 merged_tree: propagate errors from from_legacy_tree() 2023-08-29 08:32:04 -07:00