Even though we don't know the details yet, we know that we want to
make the index pluggable like the commit and opstore
backends. Defining a trait for it should be a good step. We can refine
the trait later.
Our internal backend at Google uses a 32-byte change id, so I'd like
to make the backend able to decide the length. To start with, let's
make the backend able to decide what the root change id should
be. That's consistent with how we already let the backend decide what
the root commit id should be.
Git's HEAD ref is similar to other refs and can logically have
conflicts just like the other refs in `git_refs`. As with the other
refs, it can happen if you run concurrent commands importing two
different updates from Git. So let's treat `git_head` the same as
`git_refs` by making it an `Option<RefTarget>`.
I would expect `Commit::is_empty()` to check if the commit is empty in
our usual sense, i.e. that there are no changes compared to the
auto-merged parents. However, it would return `false` for any merge
commit (and for the root commit). Since we only use it in one place,
let's inline it there. The use there does seem reasonable, because
it's about abandoning an "uninteresting" working-copy commit.
revset::resolve_change_id() for ReadonlyRepo will be replaced with this
implementation. This doesn't mean revset query will speed up. A trivial
query will become slower due to the initialization cost of the change id
index. "jj log -r hex" will get faster since we have to pay the cost anyway.
Benchmark numbers (against my "linux" repo):
Command:
hyperfine --warmup 3 --runs 20 \
"jj log -r $hex -T '' --no-commit-working-copy --no-graph"
Linear search (e874570947):
Time (mean ± σ): 223.9 ms ± 16.2 ms [User: 181.2 ms, System: 42.7 ms]
Range (min … max): 207.7 ms … 247.6 ms 50 runs
Building IdIndex:
Time (mean ± σ): 855.0 ms ± 21.7 ms [User: 788.4 ms, System: 66.6 ms]
Range (min … max): 822.6 ms … 927.5 ms 50 runs
Building IdIndex, but hacked to store SmallVec<[u8; 20]>:
Time (mean ± σ): 406.1 ms ± 15.9 ms [User: 354.1 ms, System: 52.0 ms]
Range (min … max): 382.2 ms … 428.6 ms 50 runs
For my "jj" work repo, changes are < ~1ms.
I've preferred "working-copy commit" over "checkout" for a while
because I think it's clearer, but there were lots of places still
using "checkout". I've left "checkout" in places where it refers to
the action of updating the working copy or the working-copy commit.
Since this function depends on both index and view, it can't be moved to
one of the storage objects. If we go forward with this approach, some
revset::resolve_*() functions will also be migrated to RepoRef.
This patch slightly changes the function name since a "prefix" might have
various meanings.
This should fix the panic in the case reported in #1107. It's a bit
hard to reproduce because we normally notice the missing commit when
we snapshot the working copy, but it's possible to reproduce it using
`--no-commit-working-copy`.
I suspect the added test is too brittle because it checks the exact
error message. On the other hand, it might be useful to have one test
case like this so we catch accidental changes in the format.
Since IdIndex is immutable, we don't need fast insertion provided by BTreeMap.
Let's simply use Vec for some speed up. More importantly, this allows us to
store multiple (ChangeId, CommitId) pairs for the same change id, and will
unblock the use of IdIndex in revset::resolve_symbol().
Some benchmark numbers (against my "linux" repo) follow.
Command:
hyperfine --warmup 3 "jj log -r master \
-T 'commit_id.short_prefix_and_brackets()' \
--no-commit-working-copy --no-graph"
Original:
Time (mean ± σ): 1.892 s ± 0.031 s [User: 1.800 s, System: 0.092 s]
Range (min … max): 1.833 s … 1.935 s 10 runs
This commit:
Time (mean ± σ): 867.5 ms ± 2.7 ms [User: 809.9 ms, System: 57.7 ms]
Range (min … max): 862.3 ms … 871.0 ms 10 runs
With my "jj" work repo, this saves ~4ms to show the log with default revset.
Command:
JJ_CONFIG=/dev/null hyperfine --warmup 3 --runs 100 \
"jj log -T 'commit_id.short_prefix_and_brackets() \
change_id.short_prefix_and_brackets()' \
--no-commit-working-copy"
Baseline (a7541e1ba4):
Time (mean ± σ): 54.1 ms ± 16.4 ms [User: 46.4 ms, System: 7.8 ms]
Range (min … max): 36.5 ms … 78.1 ms 100 runs
This commit:
Time (mean ± σ): 49.5 ms ± 16.4 ms [User: 42.4 ms, System: 7.2 ms]
Range (min … max): 31.4 ms … 70.9 ms 100 runs
This is ugly, but we need a special case because root_change_id and
root_commit_id aren't equal but share the same prefix bytes. In practice,
no one would care for the shortest root id prefix, but we'll need to deal
with a similar problem when migrating prefix id resolution to repo layer.
This helps us to migrate commit_id index to ReadonlyIndex. For large
repositories, this also reduces initialization cost, but that's not the main
intent of this change.
https://github.com/martinvonz/jj/pull/1041#issuecomment-1399225876
common_hex_len() and iter_half_bytes() are added to backend.rs since more
call sites will be added to index.rs, and I feel index.rs isn't a good place
to host this kind of utility functions.
Make op resolution a closed operation, powered by a callback provided by the
caller which runs under an internal lock scope. This allows for greatly
simplifying the internal lifetime structuring.
This is fast enough to be used on medium-sized repositories such as git/git.
It is a bit slow, but bearable, on huge repositories such as torvalds/linux.
There is 0 performance penalty if the display of unique prefixes is disabled
A trie-based implementation will be submitted for consideration in a
follow-up PR. It is faster, but more complicated.
**Update:** I also just discovered https://sapling-scm.com/docs/internals/indexedlog/
There are three important aspects of performance that seemed relevant:
1. Speed of computing the shortest unique prefix per id. It is worlds faster
than the naive implementation before this commit. It can be optimized
furher by using a trie or maybe the `fst` crate.
2. Speed of inital loading of the index that happens before the first commit is
shown. This is the part that's noticeable but bearable on torvalds/linux.
This could be optimized by storing a sorted list of commit and change ids on
disk. This would likely involve reworking the `Index`.
Failing that, the speed of inital loading doesn't change if a trie is used
and would likely be worse with the `fst` crate
3. Memory use is unremarkable here. I don't have good tools to measure it
precisely, but it does not balloon to gigabytes even on the linux repo.
This creates a templater function `short_underscore_prefix` for commit and
change ids. It is similar to `short` function, but shows one fewer hexadecimal
digit and inserts an underscore after the shortest unique prefix.
Highlighting with an underline and perhaps color/bold will be in a follow-up
PR.
The implementation is quadratic, a simple comparison of each id with every
other id. It is replaced in a subsequent commit. The problem with it is that,
while it works fine for a `jj`-sized repo, it becomes is painfully slow with a
repo the size of git/git.
Still, this naive implemenation is included here since it's simple, and could
be used as a reference implementation.
The `shortest_unique_prefix_length` function goes into `repo.rs` since that's
convenient for follow-up commits in this PR to have nicer diffs.
Fixes https://github.com/martinvonz/jj/issues/1050
Thanks to Martin for suggesting the exact fix.
The tests go into the new tests/test_duplicate_command.rs, which will be
expanded shortly with other tests depending on this bugfix.
We forgot to actually call `StoreFactories::load_op_heads_store()` to
load the right type of `OpHeadsStore` depending on the contents of
`.jj/repo/op_heads/type`. That shouldn't have any effect yet since we
only have one type so far, and there are no out-of-tree types yet
either (clearly, since they would not work).
When you're done with the `CommitBuilder`, you're going to have to
call `write_to_repo()`, passing it a mutable `MutableRepo`
reference. It's a bit simpler to pass that reference when we create
the `CommitBuilder` instead, so that's what this patch does.
A drawback of passing in the mutable reference when we create the
builder is that we can't have multiple unfinished `CommitBuilder`
instance live at the same time. We don't have any such use cases yet,
and it's not hard to work around them, so I think this change is worth
it.
The implementation has some hoops to jump through because Rust does not allow
`self: &Arc<Self>` on trait methods, and two of the OpHeadsStore functions need
to return cloned selves. This is worked around by making the implementation type
itself a wrapper around Arc<>.
This is not particularly note worthy for the current implementation type where
the only data copied is a PathBuf, but for extensions it is likely to be more
critical that the lifetime management of the OpHeadsStore is properly
maintained.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
We currently get the hostname and username from the `whoami` crate. We
do that in lib crate, without giving the caller a way to override
them. That seems wrong since it might be used in a server and
performing operations on behalf of some other user. This commit makes
the hostname and username configurable, so the calling crate can pass
them in. If they have not been passed in, we still default to the
values from the `whoami` crate.