111 Commits

Author SHA1 Message Date
Yuya Nishihara
a6566832c2 merged_tree: extract file-conflict resolution from merge_tree_values()
I'll add a public function that resolves file conflicts. This function will
take owned MergedTreeValue, and that's why the extracted function returns
None instead of cloning the passed value.
2024-08-13 15:02:24 +09:00
Yuya Nishihara
f7377fbbcd merged_tree: replace MergedTreeVal<'_> by Merge<Option<&TreeValue>>
MergedTreeVal was roughly equivalent to Merge<Option<Cow<_>>. As we've dropped
support for the legacy trees, it can be simplified to Merge<Option<&_>>.
2024-08-12 23:01:46 +09:00
Yuya Nishihara
accd1e337a merge: add .cloned() method that maps inner Option<&T> to Option<T>
MergedTreeVal::to_merge() will be replaced with this.
2024-08-12 23:01:46 +09:00
Yuya Nishihara
fd52efa0ba merged_tree: leverage Merge<Tree> entries iterator in all_tree_entries() 2024-08-12 10:20:34 +09:00
Yuya Nishihara
88018e84fc merged_tree: micro-optimize Merge<Tree> entries iterator to return &TreeValue
try_resolve_file_conflict() is also updated. It could be a generic function,
but there are only two callers, and the legacy tree one is used only in tests.
2024-08-12 10:20:34 +09:00
Yuya Nishihara
6d6f5990de merged_tree: add merge-join iterator over Merge<Tree> entries
For the same reason as 2cb7e91d "merged_tree: do not re-look up non-conflicting
tree values by name." This appears to bring a similar performance improvement.

I assume this change is/will be covered by test_merged_tree.rs. I considered
adding a few unit tests, but constructing Tree object isn't trivial, and the
iterator implementation is relatively straightforward.
2024-08-12 10:20:34 +09:00
Matt Kulukundis
5911e5c9b2 copy-tracking: Add copy tracking as a post iteration step
- force each diff command to explicitly enable copy tracking
- enable copy tracking in diff_summary
- post-process for diff iterator
- post-process for diff stream
- update changelog
2024-08-11 17:01:45 -04:00
Matt Kulukundis
0349d9ead3 copy-tracking: extract next_impl from next in diff iter/stream 2024-08-11 17:01:45 -04:00
Matt Kulukundis
34b0f87584 copy-tracking: plumb CopyRecordMap through diff method 2024-08-11 17:01:45 -04:00
Matt Kulukundis
e123eb21b9 copy-tracking: add source field to TreeDiffEntry
- add the field and make it compile, but don't use it yet
2024-08-11 17:01:45 -04:00
Matt Kulukundis
8e84c60157 copy-tracking: create an explicit TreeDiffEntry struct 2024-08-11 17:01:45 -04:00
Yuya Nishihara
c9e147c425 merged_tree: allow to postpone resolution of intermediate trees
This allows us to diff trees without fully resolving conflicts:

    let from_tree = merge_no_resolve(..);
    for (path, (from, to)) in from_tree.diff(to_tree, matcher) {
        let from = resolve_conflicts(from);
        if from == to {
            continue; // resolved file may be identical
        ...

I originally considered adding a matcher argument to merge() functions, but the
resulting API looked misleading. If merge() took a matcher, callers might expect
unmatched trees and files were omitted, not left unresolved. It's also slower
than diffing unresolved trees because merge(.., matcher) would have to write
partially resolved trees to the store.

Since "ancestor_tree" isn't resolved by itself, this patch has subtle behavior
change. For example, "jj diff -r9eaef582" in the "git" repository is no longer
empty. I think the new behavior is also technically correct, but I'm not pretty
sure.
2024-08-11 18:23:21 +09:00
Yuya Nishihara
ed1c07e73e tree: fill in valid id to null tree, rename function to empty()
If a null tree were written to the store, GitBackend would crash because of
invalid hash length.
2024-08-11 18:23:21 +09:00
Yuya Nishihara
2cb7e91dc7 merged_tree: do not re-look up non-conflicting tree values by name
While measuring file(path) query, I noticed BTreeMap lookup appears in perf.
It actually has a measurable cost if the history is linear and parent trees
don't have to be merged dynamically. For merge-heavy history, the cost of
tree merges is more significant. I'll address that separately.

```
% hyperfine --sort command --warmup 3 --runs 50 -L bin jj-1,jj-2 \
  'target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy \
   log -r "::trunk() & ~merges() & file(root:builtin)" --no-graph -n100'
Benchmark 1: target/release-with-debug/jj-1 ..
  Time (mean ± σ):     239.7 ms ±   7.1 ms    [User: 192.1 ms, System: 46.5 ms]
  Range (min … max):   222.2 ms … 249.7 ms    50 runs

Benchmark 2: target/release-with-debug/jj-2 ..
  Time (mean ± σ):     201.7 ms ±   6.9 ms    [User: 153.7 ms, System: 46.6 ms]
  Range (min … max):   184.2 ms … 211.1 ms    50 runs

Relative speed comparison
        1.19 ±  0.05  target/release-with-debug/jj-1 ..
        1.00          target/release-with-debug/jj-2 ..
```
2024-08-09 00:17:37 +09:00
Yuya Nishihara
19b62d29ba merged_tree: leverage .to_tree_merge() in TreeDiffIterator 2024-08-08 23:05:37 +09:00
Yuya Nishihara
6fc7cec4a5 merged_tree: make TreeDiffIterator accept trees as &Merge<Tree>
For the same reason as the patch for TreeEntriesIterator. It's probably
better to assume that MergedTree represents the root tree.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
9378adedb7 merged_tree: hold store globally by TreeDiffIterator
Since TreeDiffDirItem is now calculated eagerly, it doesn't make sense to
keep MergedTree in it.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
8b72dad095 merged_tree: replace explicit .is_tree() call in TreeEntriesIterator
The value here shouldn't be absent, so .is_tree() is equivalent to
.to_tree_merge().is_some().
2024-08-08 23:05:37 +09:00
Yuya Nishihara
12434b49b8 merged_tree: make TreeEntriesIterator accept trees as &Merge<Tree>
Suppose we add copy information to MergedTree, a MergedTree can be considered
a root tree representation plus global metadata. I think Merge<Tree> is a better
type for sub trees.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
8a3e4ad966 merged_tree: hold store globally by TreeEntriesIterator
Since TreeEntriesDirItem is now calculated eagerly, it doesn't make sense to
keep MergedTree in it.
2024-08-08 23:05:37 +09:00
Martin von Zweigbergk
ec7725064b merged_tree: make MergedTree a struct
I considered making `MergedTree` just a newtype (1-tuple) but I went
with a struct instead because we may want to add copy information in a
separate field in the future.
2024-08-08 05:32:16 -07:00
Martin von Zweigbergk
7596935285 merged_tree: make ConflictIterator a struct 2024-08-08 05:32:16 -07:00
Martin von Zweigbergk
109391f9c7 merged_tree: delete MergedTree::Legacy 2024-08-08 05:32:16 -07:00
Yuya Nishihara
202fb533f4 merged_tree: remove .diff() method in favor of .diff_stream()
It's unlikely we'll need the iterator version of .diff() except for testing
the stream implementation.
2024-08-08 10:45:59 +09:00
Yuya Nishihara
d061c3782f merged_tree: remove .diff_summary()
There are no non-test callers since 452fecb7c4f6 "cli: colorize diff summary
and sort by path."
2024-08-06 10:15:44 +09:00
Martin von Zweigbergk
8d67b1412e cleanup: leverage BoxStream/BoxFuture type aliases
Thanks to @fowles for bringing these to my attention.
2024-06-26 11:34:52 +09:00
Martin von Zweigbergk
65a988e3d2 merged_tree: make tree builder attempt to resolve conflicts
As we discovered in the `jj fix` tests,
`MergedTreeBuilder::write_tree()` doesn't try to resolve conflicts,
not even trivial ones. This patch fixes that.
2024-06-08 20:29:30 +09:00
Martin von Zweigbergk
2e6a0f9e96 merged_tree: make resolve() also simplify the result
This changes the documented behavior of `resolve()` since it was
previously documented to not change the arity unless all conflicts
were resolved.

I plan to use `resolve()` from `MergedTreeBuilder::write_tree()`.
2024-06-08 20:29:30 +09:00
Martin von Zweigbergk
776b2d981f merged_tree: make resolve() return a MergedTree
It seems like a method on `MergedTree` should return another
`MergedTree` when reasonable. I'm not sure why I made it return a
`Merge<Tree>` instead.
2024-06-08 20:29:30 +09:00
Martin von Zweigbergk
6ab9d7bdc7 merged_tree: make resolve() on legacy tree an error
The current implementation of `resolve()` on legacy trees just
pretended any conflicts were regular files. It's better to error
out. The function is only used in tests so far.
2024-06-08 20:29:30 +09:00
Martin von Zweigbergk
43bf8667d8 conflicts: always write tree-level conflicts from MergedTreeBuilder
Before this patch, `MergedTreeBuilder::write_tree()` would create a
new legacy tree if the base tree was a legacy tree. This patch makes
it always write multiple root trees instead (if there are conflicts).

We still support reading legacy conflicts if the
`format.tree-level-conflicts` config is set.
2024-05-27 06:25:27 -07:00
Martin von Zweigbergk
8e6e04b929 conflicts: always use tree-level format for merged trees
It's been about six months since we started using tree-level conflicts
by default. I can't imagine we would switch back. So let's continue
the migration by always using tree-level conflicts when merging trees,
even if all inputs were legacy trees.
2024-05-27 06:25:27 -07:00
Yuya Nishihara
f65ba88109 tree: take sub_tree_recursive() argument as &RepoPath
Since RepoPath is now a slice type, it can be constructed without cloning the
backing buffer. Let's simply use it instead of the iterator type.
2024-05-23 10:14:48 +09:00
Martin von Zweigbergk
cd5e82d0d3 tree: make sub_tree_recursive() public
These functions (in `Tree` and `MergedTree`) are safe to use. We have
a duplicate of these functions at Google, which would be nice to
avoid.
2024-05-22 11:21:18 -07:00
Martin von Zweigbergk
aecee1d6cc tree: make MergedTreeVal::to_merge() public
I don't think there's any harm in this function being public. We have
a duplicate of it at Google.
2024-05-22 11:20:43 -07:00
Martin von Zweigbergk
07bb1d81b7 tree_builder: propagate errors from write_tree() 2024-05-22 06:46:38 -07:00
Martin von Zweigbergk
1970ddef15 tree: propagate errors from sub_tree()/path_value() 2024-05-22 06:46:38 -07:00
Martin von Zweigbergk
428e209304 cleanup: consistently use BackendResult
We have the type alias so we should use it consistently.
2024-05-07 19:35:03 -07:00
Martin von Zweigbergk
0d1ff8a150 merged_tree: propagate errors from TreeEntriesIterator
We shouldn't panic if we fail to read a tree from the backend.
2024-05-01 06:10:08 -07:00
Yuya Nishihara
73b60903ce tree: flatten TreeMergeError into BackendError 2024-03-30 22:40:05 +09:00
Yuya Nishihara
35f718f212 merged_tree: remove canceling terms prior to resolving file-level conflict
I think this is a variant of the problem fixed by 7fda80fc2221 "tree: simplify
conflict before resolving at hunk level." We need to simplify() the conflict
before and after extracting file ids because the source conflict values may
contain trees to be cancelled out, and the file values may differ only in exec
bits. Since the legacy tree passes a simplified conflict in to this function,
I made the merged tree do the same.

Fixes #2654
2023-12-03 07:44:58 +09:00
Yuya Nishihara
4ffbf40c82 merged_tree: do not propagate conflicting empty tree value to parent
Otherwise an empty subtree would be added to the parent tree.

If the stored tree contained an empty subtree, simplify() wouldn't work
against new "absent" subtree representation. I don't know if there's a
such code path, but I believe it's very rare to encounter the problem.

#2654
2023-12-03 07:44:58 +09:00
Yuya Nishihara
076b49b610 merged_tree: use merged_tree_entry_diff() in stream version 2023-12-01 00:05:06 +09:00
Yuya Nishihara
97a260b1bf merged_tree: reimplement TreeEntryDiffIterator by using iterator adapter
We don't need a named type anymore.
2023-12-01 00:05:06 +09:00
Yuya Nishihara
fd1c03d037 merged_tree: use sync get_tree() in TreeDiffIterator
This basically backs out the change 1b9a3e27e07b "merged_tree: read before/after
trees concurrently." As we decided to add a separate impl for async access, it
doesn't make sense to read before/after pair in parallel.

The async single_tree() is moved to TreeDiffStreamImpl. It will help remove
the sync version when the performance problem is solved.
2023-12-01 00:05:06 +09:00
Yuya Nishihara
6ce7bd5338 repo_path: replace .contains() with .starts_with(), flipping the arguments
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
2023-11-29 08:41:23 +09:00
Yuya Nishihara
28ab9593c3 repo_path: split RepoPath into owned and borrowed types
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
2023-11-28 07:33:28 +09:00
Yuya Nishihara
0a1bc2ba42 repo_path: add stub RepoPathBuf type, update callers
Most RepoPath::from_internal_string() callers will be migrated to the function
that returns &RepoPath, and cloning &RepoPath won't work.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
974a6870b3 repo_path: make RepoPath::components() return iterator
This allows us to change the backing type from Vec<String> to String.
2023-11-27 08:42:09 +09:00
Yuya Nishihara
f2096da2d6 repo_path: add stub type to introduce borrowed RepoPathComponent type
The current RepoPathComponent will be renamed to RepoPathComponentBuf, and
new str wrapper will be added as RepoPathComponent.
2023-11-26 18:21:40 +09:00