I'll add a public function that resolves file conflicts. This function will
take owned MergedTreeValue, and that's why the extracted function returns
None instead of cloning the passed value.
MergedTreeVal was roughly equivalent to Merge<Option<Cow<_>>. As we've dropped
support for the legacy trees, it can be simplified to Merge<Option<&_>>.
try_resolve_file_conflict() is also updated. It could be a generic function,
but there are only two callers, and the legacy tree one is used only in tests.
For the same reason as 2cb7e91d "merged_tree: do not re-look up non-conflicting
tree values by name." This appears to bring a similar performance improvement.
I assume this change is/will be covered by test_merged_tree.rs. I considered
adding a few unit tests, but constructing Tree object isn't trivial, and the
iterator implementation is relatively straightforward.
- force each diff command to explicitly enable copy tracking
- enable copy tracking in diff_summary
- post-process for diff iterator
- post-process for diff stream
- update changelog
This allows us to diff trees without fully resolving conflicts:
let from_tree = merge_no_resolve(..);
for (path, (from, to)) in from_tree.diff(to_tree, matcher) {
let from = resolve_conflicts(from);
if from == to {
continue; // resolved file may be identical
...
I originally considered adding a matcher argument to merge() functions, but the
resulting API looked misleading. If merge() took a matcher, callers might expect
unmatched trees and files were omitted, not left unresolved. It's also slower
than diffing unresolved trees because merge(.., matcher) would have to write
partially resolved trees to the store.
Since "ancestor_tree" isn't resolved by itself, this patch has subtle behavior
change. For example, "jj diff -r9eaef582" in the "git" repository is no longer
empty. I think the new behavior is also technically correct, but I'm not pretty
sure.
While measuring file(path) query, I noticed BTreeMap lookup appears in perf.
It actually has a measurable cost if the history is linear and parent trees
don't have to be merged dynamically. For merge-heavy history, the cost of
tree merges is more significant. I'll address that separately.
```
% hyperfine --sort command --warmup 3 --runs 50 -L bin jj-1,jj-2 \
'target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy \
log -r "::trunk() & ~merges() & file(root:builtin)" --no-graph -n100'
Benchmark 1: target/release-with-debug/jj-1 ..
Time (mean ± σ): 239.7 ms ± 7.1 ms [User: 192.1 ms, System: 46.5 ms]
Range (min … max): 222.2 ms … 249.7 ms 50 runs
Benchmark 2: target/release-with-debug/jj-2 ..
Time (mean ± σ): 201.7 ms ± 6.9 ms [User: 153.7 ms, System: 46.6 ms]
Range (min … max): 184.2 ms … 211.1 ms 50 runs
Relative speed comparison
1.19 ± 0.05 target/release-with-debug/jj-1 ..
1.00 target/release-with-debug/jj-2 ..
```
Suppose we add copy information to MergedTree, a MergedTree can be considered
a root tree representation plus global metadata. I think Merge<Tree> is a better
type for sub trees.
I considered making `MergedTree` just a newtype (1-tuple) but I went
with a struct instead because we may want to add copy information in a
separate field in the future.
As we discovered in the `jj fix` tests,
`MergedTreeBuilder::write_tree()` doesn't try to resolve conflicts,
not even trivial ones. This patch fixes that.
This changes the documented behavior of `resolve()` since it was
previously documented to not change the arity unless all conflicts
were resolved.
I plan to use `resolve()` from `MergedTreeBuilder::write_tree()`.
The current implementation of `resolve()` on legacy trees just
pretended any conflicts were regular files. It's better to error
out. The function is only used in tests so far.
Before this patch, `MergedTreeBuilder::write_tree()` would create a
new legacy tree if the base tree was a legacy tree. This patch makes
it always write multiple root trees instead (if there are conflicts).
We still support reading legacy conflicts if the
`format.tree-level-conflicts` config is set.
It's been about six months since we started using tree-level conflicts
by default. I can't imagine we would switch back. So let's continue
the migration by always using tree-level conflicts when merging trees,
even if all inputs were legacy trees.
I think this is a variant of the problem fixed by 7fda80fc2221 "tree: simplify
conflict before resolving at hunk level." We need to simplify() the conflict
before and after extracting file ids because the source conflict values may
contain trees to be cancelled out, and the file values may differ only in exec
bits. Since the legacy tree passes a simplified conflict in to this function,
I made the merged tree do the same.
Fixes#2654
Otherwise an empty subtree would be added to the parent tree.
If the stored tree contained an empty subtree, simplify() wouldn't work
against new "absent" subtree representation. I don't know if there's a
such code path, but I believe it's very rare to encounter the problem.
#2654
This basically backs out the change 1b9a3e27e07b "merged_tree: read before/after
trees concurrently." As we decided to add a separate impl for async access, it
doesn't make sense to read before/after pair in parallel.
The async single_tree() is moved to TreeDiffStreamImpl. It will help remove
the sync version when the performance problem is solved.
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().