79 Commits

Author SHA1 Message Date
Martin von Zweigbergk
e50f6acab1 templater: fast-path empty and conflict to not read trees
When there's a single parent, we can determine if a commit is empty by
just comparing the tree ids. Also, when using tree-level conflicts, we
don't need to read the trees to determine if there's a conflict. This
patch adds both of those fast paths, speeding up `jj log -r ::main`
from 317 ms to 227 ms (-28.4%). It has much larger impact with our
cloud-based backend at Google (~5x faster).

I made the same fix in the revset engine and the Git push code (thanks
to Yuya for the suggestion).
2023-09-26 18:18:52 -07:00
Martin von Zweigbergk
145b0b24d8 commit: drop merged_ prefix from tree() and tree_id()
The old `tree()` and `tree_id()` functions are now gone, so we can use
those names for the new functions.
2023-08-29 08:32:04 -07:00
Martin von Zweigbergk
dc06bbc7d1 commit: migrate remaining uses of Commit::tree_id() and delete it 2023-08-28 15:58:34 -07:00
Martin von Zweigbergk
7bfd439bd1 rewrite: return MergedTree from merge_commit_trees_without_repo() 2023-08-28 15:58:34 -07:00
Yuya Nishihara
5b3c73dfc4 revset: insert StringPattern enum to add support for other kind of matching 2023-08-17 07:42:12 +09:00
Yuya Nishihara
925d54614d revset: remove round-trip conversion from heads() evaluation
This wouldn't matter much in practice, but I think it's better to stick to
low-level index primitives during revset evaluation.
2023-08-12 02:16:29 +09:00
Martin von Zweigbergk
48580ed8b1 revsets: allow :: as synonym for :
The `--allow-large-revsets` flag we have on `jj rebase` and `jj new`
allows the user to do e.g. `jj rebase --allow-large-revsets -b
main.. -d main` to rebase all commits that are not in main onto
main. The reason we don't allow these revsets to resolve to multiple
commits by default is that we think users might specify multiple
commits by mistake. That's probably not much of a problem with `jj
rebase -b` (maybe we should always allow that to resolve to multiple
commits), but the user might want to know if `jj rebase -d @-`
resolves to multiple commits.

One problem with having a flag to allow multiple commits is that it
needs to be added to every command where we want to allow multiple
commits but default to one. Also, it should probably apply to each
revset argument those commands take. For example, even if the user
meant `-b main..` to resolve to multiple commits, they might not have
meant `-d main` to resolve to multiple commits (which it will in case
of a conflicted branch), so we might want separate
`--allow-large-revsets-in-destination` and
`--allow-large-revsets-in-source`, which gets quite cumbersome. It
seems better to have some syntax in the individual revsets for saying
that multiple commits are allowed.

One proposal I had was to use a `multiple()` revset function which
would have no effect in general but would be used as a marker if used
at the top level (e.g. `jj rebase -d 'multiple(@-)'`). After some
discussion on the PR adding that function (#1911), it seems that the
consensus is to instead use a prefix like `many:` or `all:`. That
avoids the problem with having a function that has no effect unless
it's used at the top level (`jj rebase -d 'multiple(x)|y'` would have
no effect).

Since we already have the `:` operator for DAG ranges, we need to
change it to make room for `many:`/`all:` syntax. This commit starts
that by allowing both `:` and `::`.

I have tried to update the documentation in this commit to either
mention both forms, or just the new and preferred `::` form. However,
it's useless to search for `:` in Rust code, so I'm sure I've missed
many instances. We'll have to address those as we notice them. I'll
let most tests use `:` until we deprecate it or delete it.
2023-07-28 22:30:40 -07:00
Yuya Nishihara
e2f9ed439e revset: extract graph-related types to separate module
I'm going to add a topo-grouping iterator adapter, and the revset module is
already big enough to split.
2023-07-25 01:45:37 +09:00
Yuya Nishihara
817713c921 graphlog: use IndexPosition until transitive edges get eliminated
This partially reverts 4c8f484278de "graphlog: key by commit id (not index
position)." As Martin pointed out, it made "log -r 'tags()' -T.." in git
repo super slow. Apparently, both clone() and hash map insertion/lookup costs
increased by that change. Since we don't need CommitId inside the graph
iterator, we can simply replace it with IndexPosition, and resolve it to
CommitId later.
2023-07-24 05:07:07 +09:00
Martin von Zweigbergk
1c3fe9a651 cli: use MergedTree for finding conflicts
`MergedTree` is now ready to be used when checking if a commit has
conflicts, and when listing conflicts. We don't yet a way for the user
to say they want to use tree-level conflicts even for these
cases. However, since the backend can decide, we should be able to
have our backend return tree-level conflicts. All writes will still
use path-level conflicts, so the experimentation we can do at Google
is limited.

Beacause `MergedTree` doesn't yet have a way of walking conflicts
while restricting it by a matcher, this will make `jj resolve` a
little slower. I suspect no one will notice.
2023-07-19 22:04:16 -07:00
Waleed Khan
54dba51a08 docs: warn about missing docs for jj-lib crate 2023-07-10 18:28:59 +03:00
Martin von Zweigbergk
b297c0c0d8 rewrite: propagate errors from merge_trees() 2023-06-30 14:12:36 +02:00
Yuya Nishihara
ca6b9828d1 id_prefix: only store first few bytes of keys in IdIndex
This eliminates indirect access through Vec<u8> and improves cache locality
while sorting the index entries. We can achieve a similar result by using
SmallVec<[u8; 24]> in place of Commit/ChangeId(Vec<u8>), but we would have
to determine a reasonable id length across backends. Indexing [u8; 4] performs
better, at the cost of the API and implementation complexity.

For temporary Commit/ChangeId allocation in general, I think a borrowed type
like Path/PathBuf will help.

Testing with my "linux" repo, this saves ~670ms needed to initialize both
change id index and disambiguation indexes.
2023-06-25 12:54:18 +09:00
Yuya Nishihara
580d8bd92e id_prefix: introduce builder interface to IdIndex
It allows us to build multiple IdIndex instances within a single loop. As the
final sorting is heavy operation, I don't want to implement Default + Extend
for IdIndex to be compatible with Iterator::unzip().
2023-06-25 12:54:18 +09:00
Yuya Nishihara
a67d8b5a65 index: turn CompositeIndex::walk_revs() into position-based API
This gets rid of round-trip conversion from queries like "(main..)-". I have
such expression in my default log/disambiguation revset, and the query could
take ~150ms to convert head positions back and forth if the repository had
tons of unmerged commits.
2023-06-19 13:41:43 +09:00
Yuya Nishihara
b7b9b8c88e index: pass only CompositeIndex to default_revset_engine::evaluate() 2023-05-29 08:15:40 +09:00
Yuya Nishihara
92c1b7091b index: make CompositeIndex copyable to clarify it is a cheap reference type
Well, I might change it to an owned wrapper later, but if I made such change,
the current CompositeIndex<'_> would be replaced with &CompositeIndex.
2023-05-29 08:15:40 +09:00
Yuya Nishihara
0b2f0eca05 revset: add Revset::count() API
The default-engine implementation is pretty much the same as iter().count(),
but custom engine may have an optimal path.
2023-05-24 01:02:37 +09:00
Yuya Nishihara
5b568cabcc revset: add iterator of (CommitId, ChangeId) pairs, use it in id_index
There are a few more places where we need these pairs.
2023-05-24 01:02:37 +09:00
Yuya Nishihara
44927be7c9 id_prefix: add IdIndex method that looks up unambiguous key
resolve_prefix_with() is changed to return both key and values.
2023-05-24 01:02:37 +09:00
Martin von Zweigbergk
f657bcb6ae prefixes: move IdIndex to id_prefix module
I'll reuse it there next.
2023-05-11 23:41:24 -07:00
Yuya Nishihara
d948acd5bf revset: do not scan ancestors more than once to evaluate nested children set 2023-04-28 08:36:58 +09:00
Yuya Nishihara
e6740d9c3b index: migrate walk_ancestors_until_roots() from revset engine
I'm going to add a RevWalk method to walk descendants with generation filter,
which will use this helper method. RevWalk::take_until_roots() uses .min()
instead of .last() since RevWalk shouldn't know the order of the input set.
2023-04-28 08:36:58 +09:00
Yuya Nishihara
837e8aa81a revset: add substitution rule for nested descendants/children
The substitution rule and tests are copied from ancestors/parents. The backend
logic will be reimplemented later. For now, it naively repeats children().
2023-04-24 20:45:13 +09:00
Yuya Nishihara
32253fed5e revset: replace children node with descendants of generation 1..2 2023-04-24 20:45:13 +09:00
Yuya Nishihara
a99b82c634 revset: add generation parameter to descendants node
This is a minimal change to replace Children with Descendants. A generation
parameter could be added to RevsetExpression::DagRange, but it's not needed
as of now.
2023-04-24 20:45:13 +09:00
Yuya Nishihara
eadf8faded revset: extract children() evaluation to function
I'm going to add generation parameter to Children/DagRange nodes, and
'Children { .. }' will be substituted to 'DagRange { .., gen: 1 }'. This
commit helps future code move.

Lifetime bounds of the arguments are unnecessarily restricted. It appears
walk_ancestors_until_roots() captures arguments lifetime on rustc 1.64.0.
I think the problem will go away if walk_*() functions are extracted to
RevWalk methods where input arguments will become less generic.
2023-04-24 20:45:13 +09:00
Yuya Nishihara
d9d2b405e1 revset: remove redundant boxing from evaluated children node
Just spotted while moving codes around. This wouldn't matter in practice.
2023-04-24 20:45:13 +09:00
Yuya Nishihara
36e7afe0db revset: exclude unreachable roots from collect_dag_range() result
It doesn't matter, but can simplify the function interface. I'll probably
extract this function to RevWalk so the descendants with/without generation
filter can be tested without using revset API.
2023-04-24 20:45:13 +09:00
Martin von Zweigbergk
c60f14899a index: remove entry_by_id() from trait
It no longer needs to be on the `Index` trait, thereby removing the
last direct use of `IndexEntry` in the trait (it's still used
indirectly in `walk_revs()`).
2023-04-18 18:32:23 -07:00
Martin von Zweigbergk
e492548772 revset: bump generation numbers in API to 64 bits
A chain of 4 billion commits is a lot, but it's not out of the
question, so let's support it. The current default index will not be
able to handle that many commits, so I let that still use 32-bit
integers.
2023-04-12 21:18:49 -07:00
Yuya Nishihara
5351371d51 revset: resolve visible heads prior to evaluation 2023-04-10 00:39:58 +09:00
Yuya Nishihara
7e1e9efa38 revset: resolve "all()" prior to evaluation 2023-04-10 00:39:58 +09:00
Yuya Nishihara
f43f0d24b8 revset: resolve candidates of children set prior to evaluation 2023-04-10 00:39:58 +09:00
Yuya Nishihara
7974269bab revset: remove None variant from resolved enum, use Commits([]) instead
We'll remove All, so it makes sense to not have None either.
2023-04-10 00:39:58 +09:00
Yuya Nishihara
0fcc13a6f4 revset: make resolve() return different type describing evaluation plan
New ResolvedExpression enum ensures that the evaluation engine doesn't have
to know the symbol resolution details. In this commit, I've moved Filter
and NotIn resolution to resolve_visibility(). Implicit All/VisibleHeads
resolution will be migrated later.

It's tempting to combine resolve_symbols() and resolve_visibility() to get
rid of panic!()s, but the resolution might have to be two passes to first
resolve&collect explicit commit ids, and then substitute "all()" with
"(:visible_heads())|commit_id|..". It's also possible to apply some tree
transformation after symbol resolution.
2023-04-10 00:39:58 +09:00
Yuya Nishihara
6d9b836d10 revset: extract unresolved commit references to separate enum
This makes it clear what should be resolved at resolve_symbols(). Symbol
is a bit special while parsing function arguments, but it's no different
than the other unresolved references at expression level.
2023-04-10 00:39:58 +09:00
Yuya Nishihara
adfd52445b revset: reimplement children to not scan visible ancestors twice
It's slightly faster, and removes the use of RevsetExpression::descendants()
API.
2023-04-08 12:13:30 +09:00
Yuya Nishihara
5dd99db250 revset: make evaluation helper not create trait object eagerly
We wouldn't care for the cost of virtual dispatch at this level, but I
think a concrete struct type is easier to deal with than trait object.
2023-04-08 12:13:30 +09:00
Yuya Nishihara
85fb1f74c3 revset: for roots:heads, terminate ancestor lookup at min(roots) 2023-04-08 12:13:30 +09:00
Yuya Nishihara
ddff089286 revset: do not evaluate roots() candidates three times 2023-04-08 12:13:30 +09:00
Yuya Nishihara
eef6a77aa4 revset: reuse reachable dag-range set to calculate roots
This also removes the use of RevsetExpression::connected() API from the
evaluation engine.
2023-04-08 12:13:30 +09:00
Yuya Nishihara
20aa31336e revset: extract dag-range calculation to function
The returned reachable set can be reused to calculate roots() expression.
2023-04-08 12:13:30 +09:00
Yuya Nishihara
7dc35b82b0 revset: evaluate ancestors without using RevsetExpression builder API
I'm thinking of transforming RevsetExpression to a enum dedicated for
the evaluation stage. To help the migration, I want to remove the use of
the RevsetExpression builder API from the evaluation engine.

Fewer virtual dispatch is also better.
2023-04-08 12:13:30 +09:00
Martin von Zweigbergk
24a512683b revset: add a revset function for finding commits with conflicts
This adds `conflict()` revset that selects commits with conflicts. We
may want to extend it later to consider only conflicts at certain
paths.
2023-04-06 16:46:21 -07:00
Yuya Nishihara
308a5b9eae revset: make empty()/file(".") not load root tree for liner history
TreeDiffIterator wouldn't load identical subtrees, but it's up to caller to
optimize out the root tree loading.
2023-04-05 21:53:24 +09:00
Martin von Zweigbergk
e1c57338a1 revset: split out no-args head() to visible_heads()
The `heads()` revset function with one argument is the counterpart to
`roots()`. Without arguments, it returns the visible heads in the
repo, i.e. `heads(all())`. The two use cases are quite different, and
I think it would be good to clarify that the no-arg form returns the
visible heads, so let's split that out to a new `visible_heads()`
function.
2023-04-03 23:46:34 -07:00
Yuya Nishihara
982062bd75 revset: do not always evaluate filter node to InternalRevset
This basically removes hidden 'all() &' from union/negation of filters. To
achieve that, I have two options: 1. add separate evaluation path (like the
one this commit introduced), or 2. wrap "all()" revset to override predicate
as Box::new(|_| true) function. I took the former since it's less ad-hoc.

We can add an explicit RevsetExpression node to branch between evaluate()
and evaluate_predicate(), but I don't think it would simplify the
implementation at this point. We might need such node if we want to resolve
"all()" at resolve_symbols(). It might be even better to extract a subset of
RevsetExpression enum, which only contains evaluatable nodes.

The cost of 'all() &' isn't significant for most filters. '~merges()' is
the exception. For jj repo,

    revsets/:v0.3.0 & (author(martinvonz) | committer(martinvonz))
    --------------------------------------------------------------
    base     1.06      11.2±0.04m
    new      1.00      10.5±0.05m

    revsets/~merges()
    -----------------
    base     1.69     750.0±8.47µ
    new      1.00     444.1±3.50µ
2023-04-04 15:21:21 +09:00
Yuya Nishihara
69794f2585 revset: add method to upcast InternalRevset to ToPredicateFn 2023-04-04 15:21:21 +09:00
Yuya Nishihara
426f3e4e0a revset: simplify evaluation of "all()"
I think this is more readable, and apparently it produces slightly better code
maybe because the compiler can determine that there are no unwanted markers.
2023-04-04 15:21:21 +09:00