While working on ancestor generation, I noticed Mercurial has this
substitution rule. Since it's easier to deal with Ancestors() than Range {},
'roots..heads' is first decomposed to ':heads & ~:roots'.
I failed to solve type puzzle for to_predicate_fn<'a>(&'a self) where
'repo: 'a, so struct RevWalkRevset<'repo, T> is bounded by T to consume
the lifetime parameter.
We're more likely to filter out empty commits, so this should be slightly
faster in practice.
The extra Option<> isn't needed, but it should clarify that "prefix([])"
is not "everything".
This basically transforms 's1 & (f() | s2)' to
's1.iter().filter(all && f || s2)'. Still the predicate part includes "all",
the filter function doesn't need to load commit data for every entry since
's1.iter().filter(all)' is tested first. To optimize "all" predicate out,
maybe we can add a wrapper that returns '|_: &IndexEntry| true'.
Instead of inserting AsFilter(_) node, I could add a recursive is_filter()
function. That would also work so long as the height of RevsetExpression tree
is limited. I chose node insertion just for ease of snapshot testing.
In order to optimize a query like '(author(_) | @) & main..', we'll probably
need a predicate form of an iterable set so that the query can be evaluated
to '(main..).iter().filter(author(_) | @)'. And if a predicate function can
terminate the source iterator early (by returning true/false/false_forever),
complexity of a filtered revset is basically the same as an intersection of
iterator pair. This means we can eventually merge IntersectionRevset with
FilterRevset.
With that in mind, this patch removes the redundant 'candidates' field from
the filter node, which would otherwise appear in the predicate function as
'candidates.contains(entry)'. A filter node with candidates was somewhat
useful while rewriting the tree, but that can be dealt with a view function
like as_filter_intersection() in this patch.
This also simplify the subsequent filter transformation as we no longer need
to test if candidates == All.
Previously we only have a test for the left recursion. The added test
contains right recursion path, which should have caught the error I made
while working on the next "unary filter node" patch.
The doc comment summarizes what I'm going to implement. I'm not sure if
we'll add all of them because revset evaluation isn't the key performance
bottleneck at the moment. Anyway, I don't think any of these ideas would
logically conflict with segmented changelog adaptation unless we decide to
replace the whole revset stack with Eden/Sapling's.
Writing double negates is silly, but it might be hidden by revset alias
if we added such feature.
I made fold_redundant_expression() a separate step from fold_difference()
since I'll probably want to apply the cleanup step before rewriting filter
expressions.
Because a unary negation node '~y' is more primitive than the corresponding
difference node 'x~y', '~y' is easier to deal with while rewriting the tree.
That's the main reason to add RevsetExpression::NotIn node.
As we have a NotIn node, it makes sense to add an operator for that. This
patch reuses '~' token, which I feel intuitive since the other set operators
looks like bitwise ops. Another option is '!'.
The unary '~' operator has the highest precedence among the set operators,
but they are lower than the ranges. This might be counter intuitive, but
useful because a prefix range ':x' can be negated without parens.
Maybe we can remove the redundant infix operator 'x ~ y', but it isn't
decided yet.
Function parameters are processed as local symbols while substituting
alias expression. This isn't as efficient as Mercurial which caches
a tree of fully-expanded function template, but that wouldn't matter in
practice.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
Follows up c5ed3e14776b. Now change/commit ids are resolved at the same
precedence, which means there are at least three types of ambiguity.
I don't think we would need to discriminate these.
Because the use of the change id is recommended, any operation should abort
if a valid change id happens to match a commit id. We still try the commit
id lookup first as the change id lookup is more costly.
Ambiguous change/commit id is reported as AmbiguousCommitIdPrefix for now.
Maybe we can merge AmbiguousCommit/ChangeIdPrefix errors into one?
Closes#799
Since syntactic information like symbol or function name is lost after
parse(), alias substitution is inserted to the middle of the post-parsing
stage, not after the whole RevsetExpression tree is built. This is the main
difference from Mercurial. Mercurial also caches parsed aliases, but I don't
think that would have a measurable impact.
The CLI will load aliases from config, insert them one by one, and warn if
declaration part is invalid. That's why RevsetAliasesMap is a public struct
and needs to be instantiated by the caller.
I'll add aliases map, substitution stack (to detect recursion), and locals
(for function aliases) there. Fortunately, we can avoid shared mutables
so a copyable struct should be good.
parse_function_argument_to_string() doesn't need a workspace_ctx, but there
should be no reason to explicitly nullify it either.
I'm thinking of adding alias expansion at this stage, and it would be a bit
tedious to pass around mutable context by function parameter. So let's reduce
the number of the intermediate functions.
This also produces a better error message.
The expression 'x ~ empty()' is identical to 'x & file(".")', but more
intuitive.
Note that 'x ~ empty()' is slower than 'x & file(".")' since the negative
intersection isn't optimized right now. I think that can be handled as
follows: 'x ~ filter(f)' -> 'x & filter(!f)' -> 'filter(!f, x)'