331 Commits

Author SHA1 Message Date
Martin von Zweigbergk
37151e0ff9 index: load store based on type recorded in .jj/repo/index/type
This is another step towards allowing a custom `jj` binary to have its
own index type. We're going to have a server-backed index
implementation at Google, for example.
2023-03-11 22:22:46 -08:00
Martin von Zweigbergk
3e4e0dc916 index_store: extract trait
This is a step towards making the index storage pluggable. The
interface will probably change a bit soon, but let's start with
functions that match the current implementation.

I called the current implementation the `DefaultIndexStore`. Calling
it `SimpleIndexStore` (like `SimpleOpStore` and `SimpleOpHeadsStore`)
didn't seem accurate.
2023-03-11 22:22:46 -08:00
Samuel Tardieu
616058c2fa lib: add Commit::is_discardable() 2023-03-05 23:50:20 +01:00
Martin von Zweigbergk
94bdbb7ff7 index: make IndexStore factory functions take &Path
This is just for consistency with the other backends.
2023-03-02 12:33:11 -08:00
Martin von Zweigbergk
fd6c7f3bb3 op_heads_store: let caller create initial operation
It makes the APIs much simpler if we don't have to pass in information
about the initial operation when we create the `OpHeadsStore`. It also
makes the alternative `OpHeadsStore` implementations simpler since we
move some logic into a shared location (`ReadonlyRepo::init()`).

This effectively undoes ec071041268d. Maybe some further refactoring
made it possible to move it back as I'm doing in this commit?
2023-02-28 08:08:31 -08:00
Martin von Zweigbergk
346e3c849b repo: propagate error when failing to look up backend type 2023-02-27 09:44:28 -08:00
Martin von Zweigbergk
491ecc6b2e repo: replace load_at_head() by helper in tests
I'm about to make `RepoLoader::init()` return a `Result`, and I don't
want to have to wrap that in a new error in
`ReadonlyRepo::load_at_head()` since that's only used in tests.
2023-02-27 09:44:28 -08:00
Martin von Zweigbergk
d8997999f2 repo: replace RepoRef by Repo trait 2023-02-15 19:15:17 -08:00
Martin von Zweigbergk
f6a4cb57da repo: extract a Repo trait for Arc<ReadonlyRepo> and MutableRepo
This will soon replace the `RepoRef` enum, just like how the `Index`
trait replaced the `IndexRef` enum.
2023-02-15 19:15:17 -08:00
Martin von Zweigbergk
b7dad291df repo: make all base_repo() functions return &Arc<ReadonlyRepo>
This is to prepare for a `Repo` trait implemented for
`Arc<ReadonlyRepo>` (not for `ReadonlyRepo` itself).
2023-02-15 19:15:17 -08:00
Martin von Zweigbergk
8a067282c8 repo: make ReadonlyRepo::index() return a &dyn Index
This is just a little preparation for extracting a `Repo` trait that's
implemented by both `ReadonlyRepo` and `MutableRepo`. The `index()`
function in that trait will of course have to return the same type in
both implementations, and that type will be `&dyn Index`.
2023-02-15 19:15:17 -08:00
Martin von Zweigbergk
2d8aa2d90e index: delete IndexRef, use Index trait
I don't know why I didn't create a trait to begin with. Maybe I had
trouble with lifetimes or object-safety.
2023-02-14 06:51:49 -08:00
Martin von Zweigbergk
b955e3de03 index: extract a trait for the index
Even though we don't know the details yet, we know that we want to
make the index pluggable like the commit and opstore
backends. Defining a trait for it should be a good step. We can refine
the trait later.
2023-02-14 06:51:49 -08:00
Martin von Zweigbergk
7a985ed122 index: remove lifetime parameter to IndexRef::heads()/topo_order()
I want to replace `IndexRef` by a trait, and I want that trait to be
object-safe.
2023-02-14 06:51:49 -08:00
Martin von Zweigbergk
81af5f820b repo: calculate shortest unique prefix separately for commit/change
We now resolve the two kinds of ids in separate spaces, so the
shortest prefixes should also be calculated in separate spaces.
2023-02-13 22:49:21 -08:00
Martin von Zweigbergk
222709196a repo: remove code for conflict between root commit/change id
The two ids no longer share a prefix, so we don't need to worry about
one being a prefix of the other.
2023-02-13 22:49:21 -08:00
Martin von Zweigbergk
d6909002f0 repo: elide lifetime on resolve_change_id_prefix() 2023-02-13 22:49:21 -08:00
Martin von Zweigbergk
e6693d0f68 backend: let backend choose root change id
Our internal backend at Google uses a 32-byte change id, so I'd like
to make the backend able to decide the length. To start with, let's
make the backend able to decide what the root change id should
be. That's consistent with how we already let the backend decide what
the root commit id should be.
2023-02-07 22:31:34 -08:00
Martin von Zweigbergk
fafa9b70fc view: also merge git_heads when merging views
I don't know if I had just forgotten to merge `git_heads` when I added
it to the view object, but it seems like it should be merged just like
refs.
2023-01-30 09:05:03 -08:00
Martin von Zweigbergk
4e8fbaa210 git: allow conflicts in "HEAD@git"
Git's HEAD ref is similar to other refs and can logically have
conflicts just like the other refs in `git_refs`. As with the other
refs, it can happen if you run concurrent commands importing two
different updates from Git. So let's treat `git_head` the same as
`git_refs` by making it an `Option<RefTarget>`.
2023-01-30 09:05:03 -08:00
Martin von Zweigbergk
aaf75b4793 repo: inline single-caller, and surprising, Commit::is_empty()
I would expect `Commit::is_empty()` to check if the commit is empty in
our usual sense, i.e. that there are no changes compared to the
auto-merged parents. However, it would return `false` for any merge
commit (and for the root commit). Since we only use it in one place,
let's inline it there. The use there does seem reasonable, because
it's about abandoning an "uninteresting" working-copy commit.
2023-01-28 15:54:03 -08:00
Yuya Nishihara
d771c12637 index: make HexPrefix accessor simply return "min" prefix as bytes slice
This is low-level function, so I think using &[u8] should be good here.
2023-01-27 03:37:44 +09:00
Yuya Nishihara
956a2d5f83 index: remove redundant prefix tests from resolve_prefix functions
The "min" prefix guarantees that the first entry matches the hex prefix
if any. Spotted by @ilyagr.
2023-01-27 03:37:44 +09:00
Yuya Nishihara
b9fc6d4203 templater: rewrite divergent property by leveraging IdIndex 2023-01-26 14:10:26 +09:00
Yuya Nishihara
824f2106fd repo: migrate revset::resolve_change_id() to use IdIndex for ReadonlyRepo
The MutableRepo implementation is the same as before.
2023-01-26 14:10:26 +09:00
Yuya Nishihara
4f15d1f779 repo: implement method to look up change_id prefix by using IdIndex
revset::resolve_change_id() for ReadonlyRepo will be replaced with this
implementation. This doesn't mean revset query will speed up. A trivial
query will become slower due to the initialization cost of the change id
index. "jj log -r hex" will get faster since we have to pay the cost anyway.

Benchmark numbers (against my "linux" repo):

Command:
    hyperfine --warmup 3 --runs 20 \
      "jj log -r $hex -T '' --no-commit-working-copy --no-graph"

Linear search (e874570947):
    Time (mean ± σ):     223.9 ms ±  16.2 ms    [User: 181.2 ms, System: 42.7 ms]
    Range (min … max):   207.7 ms … 247.6 ms    50 runs

Building IdIndex:
    Time (mean ± σ):     855.0 ms ±  21.7 ms    [User: 788.4 ms, System: 66.6 ms]
    Range (min … max):   822.6 ms … 927.5 ms    50 runs

Building IdIndex, but hacked to store SmallVec<[u8; 20]>:
    Time (mean ± σ):     406.1 ms ±  15.9 ms    [User: 354.1 ms, System: 52.0 ms]
    Range (min … max):   382.2 ms … 428.6 ms    50 runs

For my "jj" work repo, changes are < ~1ms.
2023-01-26 14:10:26 +09:00
Yuya Nishihara
38a9180bb7 repo: generalize IdIndex over key and value types
Though we'll only need IdIndex<ChangeId, IndexPosition>, this allows us to
write unit tests without setting up MutableIndex.
2023-01-26 14:10:26 +09:00
Martin von Zweigbergk
10725c095f cleanup: update more "checkout" to "working-copy commit" and similar
I've preferred "working-copy commit" over "checkout" for a while
because I think it's clearer, but there were lots of places still
using "checkout". I've left "checkout" in places where it refers to
the action of updating the working copy or the working-copy commit.
2023-01-25 11:02:59 -08:00
Martin von Zweigbergk
0d1ec835c1 repo: rename .jj/repo/store/backend to .jj/repo/store/type
We decided to call the files identifying the backend type `type`. We
already use that name for `OpStore` and `OpHeadsStore`.
2023-01-25 09:22:38 -08:00
Yuya Nishihara
c018ef229b repo: proxy shortest unique prefix function through RepoRef
Since this function depends on both index and view, it can't be moved to
one of the storage objects. If we go forward with this approach, some
revset::resolve_*() functions will also be migrated to RepoRef.

This patch slightly changes the function name since a "prefix" might have
various meanings.
2023-01-25 10:47:39 +09:00
Yuya Nishihara
c0c5e8f041 repo: rewrite "all()" query to clarify data dependency 2023-01-25 10:47:39 +09:00
Martin von Zweigbergk
ce094c618b repo: propagate error when current working-copy commit is not found
This should fix the panic in the case reported in #1107. It's a bit
hard to reproduce because we normally notice the missing commit when
we snapshot the working copy, but it's possible to reproduce it using
`--no-commit-working-copy`.

I suspect the added test is too brittle because it checks the exact
error message. On the other hand, it might be useful to have one test
case like this so we catch accidental changes in the format.
2023-01-24 12:20:28 -08:00
Martin von Zweigbergk
63aa484046 repo: add a specific error type for MutableRepo::check_out() 2023-01-24 12:20:28 -08:00
Martin von Zweigbergk
eb7de6dd3c repo: inline leave_commit() into single caller 2023-01-24 12:20:28 -08:00
Martin von Zweigbergk
4777508df0 repo: make check_out() call edit()
This reduces duplication a little, and it makes logical sense.
2023-01-24 12:20:28 -08:00
Martin von Zweigbergk
dd3472924b repo: add a specific error type for MutableRepo::edit()
The new type is just an enum version of `RewriteRootCommit`.  I'll add
another variant soon.
2023-01-24 12:20:28 -08:00
Yuya Nishihara
c82a62cf99 repo: turn IdIndex into sorted Vec, use binary search
Since IdIndex is immutable, we don't need fast insertion provided by BTreeMap.
Let's simply use Vec for some speed up. More importantly, this allows us to
store multiple (ChangeId, CommitId) pairs for the same change id, and will
unblock the use of IdIndex in revset::resolve_symbol().

Some benchmark numbers (against my "linux" repo) follow.

Command:
    hyperfine --warmup 3 "jj log -r master \
      -T 'commit_id.short_prefix_and_brackets()' \
      --no-commit-working-copy --no-graph"

Original:
    Time (mean ± σ):      1.892 s ±  0.031 s    [User: 1.800 s, System: 0.092 s]
    Range (min … max):    1.833 s …  1.935 s    10 runs

This commit:
    Time (mean ± σ):     867.5 ms ±   2.7 ms    [User: 809.9 ms, System: 57.7 ms]
    Range (min … max):   862.3 ms … 871.0 ms    10 runs
2023-01-23 07:38:04 +09:00
Yuya Nishihara
879f585b21 repo: leverage stored index to calculate shortest prefix in commit id space
With my "jj" work repo, this saves ~4ms to show the log with default revset.

Command:
    JJ_CONFIG=/dev/null hyperfine --warmup 3 --runs 100 \
      "jj log -T 'commit_id.short_prefix_and_brackets() \
                  change_id.short_prefix_and_brackets()' \
              --no-commit-working-copy"

Baseline (a7541e1ba4):
    Time (mean ± σ):      54.1 ms ±  16.4 ms    [User: 46.4 ms, System: 7.8 ms]
    Range (min … max):    36.5 ms …  78.1 ms    100 runs

This commit:
    Time (mean ± σ):      49.5 ms ±  16.4 ms    [User: 42.4 ms, System: 7.2 ms]
    Range (min … max):    31.4 ms …  70.9 ms    100 runs
2023-01-22 17:24:03 +09:00
Yuya Nishihara
a7541e1ba4 repo: add workaround for shortest prefix calculation of root ids
This is ugly, but we need a special case because root_change_id and
root_commit_id aren't equal but share the same prefix bytes. In practice,
no one would care for the shortest root id prefix, but we'll need to deal
with a similar problem when migrating prefix id resolution to repo layer.
2023-01-22 12:03:08 +09:00
Yuya Nishihara
1a4b5c5ee6 index: make IdIndex store raw bytes, not hex bytes
This helps us to migrate commit_id index to ReadonlyIndex. For large
repositories, this also reduces initialization cost, but that's not the main
intent of this change.

https://github.com/martinvonz/jj/pull/1041#issuecomment-1399225876

common_hex_len() and iter_half_bytes() are added to backend.rs since more
call sites will be added to index.rs, and I feel index.rs isn't a good place
to host this kind of utility functions.
2023-01-22 12:03:08 +09:00
Yuya Nishihara
65a659347e tests: pad odd-length hex bytes passed in to repo::IdIndex
This allows us to migrate IdIndex to raw bytes. In practice, these ids are
full hashes which should never be odd length.
2023-01-22 12:03:08 +09:00
Yuya Nishihara
1d2642de1e repo: split commit_id and change_id indices
The goal is to replace the commit_id index with ReadonlyIndex to save the
initialization cost, but this also helps to fix root id handling.
2023-01-22 12:03:08 +09:00
Daniel Ploch
bd43580437 op_heads_store: remove LockedOpHeads
Make op resolution a closed operation, powered by a callback provided by the
caller which runs under an internal lock scope. This allows for greatly
simplifying the internal lifetime structuring.
2023-01-20 15:18:08 -08:00
Martin von Zweigbergk
0f8622dd5c repo: move test_id_index() into a tests module
This is the usual convention (to save on compilation time when not
running tests).
2023-01-18 16:59:16 -08:00
Ilya Grigoriev
606eefa8c4 A BTree-based index of commit & change ids to optimize unique_prefix
This is fast enough to be used on medium-sized repositories such as git/git.
It is a bit slow, but bearable, on huge repositories such as torvalds/linux.

There is 0 performance penalty if the display of unique prefixes is disabled

A trie-based implementation will be submitted for consideration in a
follow-up PR. It is faster, but more complicated.

**Update:** I also just discovered https://sapling-scm.com/docs/internals/indexedlog/

There are three important aspects of performance that seemed relevant:

1. Speed of computing the shortest unique prefix per id. It is worlds faster
  than the naive implementation before this commit. It can be optimized
  furher by using a trie or maybe the `fst` crate.

2. Speed of inital loading of the index that happens before the first commit is
  shown. This is the part that's noticeable but bearable on torvalds/linux. 
  
  This could be optimized by storing a sorted list of commit and change ids on
  disk.  This would likely involve reworking the `Index`.

  Failing that, the speed of inital loading doesn't change if a trie is used
  and would likely be worse with the `fst` crate

3. Memory use is unremarkable here. I don't have good tools to measure it
  precisely, but it does not balloon to gigabytes even on the linux repo.
2023-01-17 22:01:09 -08:00
Ilya Grigoriev
19d341d32a Templater: naive implementation of shortest prefix highlight for ids
This creates a templater function `short_underscore_prefix` for commit and
change ids. It is similar to `short` function, but shows one fewer hexadecimal
digit and inserts an underscore after the shortest unique prefix.

Highlighting with an underline and perhaps color/bold will be in a follow-up
PR.

The implementation is quadratic, a simple comparison of each id with every
other id. It is replaced in a subsequent commit. The problem with it is that,
while it works fine for a `jj`-sized repo, it becomes is painfully slow with a
repo the size of git/git. 

Still, this naive implemenation is included here since it's simple, and could
be used as a reference implementation. 

The `shortest_unique_prefix_length` function goes into `repo.rs` since that's
convenient for follow-up commits in this PR to have nicer diffs.
2023-01-17 22:01:09 -08:00
Ilya Grigoriev
a9e7c9bffc Make jj undo work after jj duplicate
Fixes https://github.com/martinvonz/jj/issues/1050

Thanks to Martin for suggesting the exact fix.

The tests go into the new tests/test_duplicate_command.rs, which will be
expanded shortly with other tests depending on this bugfix.
2023-01-17 21:17:27 -08:00
Martin von Zweigbergk
d6fcf4c7b2 repo: load correct OpHeadsStore depending on repo's type
We forgot to actually call `StoreFactories::load_op_heads_store()` to
load the right type of `OpHeadsStore` depending on the contents of
`.jj/repo/op_heads/type`. That shouldn't have any effect yet since we
only have one type so far, and there are no out-of-tree types yet
either (clearly, since they would not work).
2022-12-31 01:22:29 -08:00
Martin von Zweigbergk
d86ba708a3 repo: add MutableRepo::rewrite_commit() returning CommitBuilder
Same reasoning as the previous commit.
2022-12-26 23:30:52 -08:00
Martin von Zweigbergk
812ef97adb repo: add MutableRepo::new_commit() returning CommitBuilder
Since `CommitBuilder` now has a reference to `MutableRepo`, it's
convenient to create instances of it by calling a method on
`MutableRepo`.
2022-12-26 23:30:52 -08:00