There are several existing commands that would benefit from an API
that makes it easier to rewrite a whole graph of commits while
transforming them in some way.
`jj squash` is one example. When squashing into an ancestor, that
command currently rewrites the ancestor, then rebases descendants, and
then rewrites the rewritten source commit. It would be better to
rewrite the source commit (and any descendants) only once.
Another example is the future `jj fix`. That command will want to
rewrite a graph while updating the trees. There's currently no good
API for that; you have to manually iterate over descendants and
rewrite them.
This patch adds a new `MutableRepo::transform_descendants()` method
that takes a callback which gets a `CommitRewriter` passed to it. The
callback can then decide to change the parents, the tree, etc. The
callback is also free to leave the commit in place or to abandon it.
I updated the regular `rebase_descendants()` to use the new function
in order to exercise it. I hope we can replace all of the
`rebase_descendant_*()` flavors later.
I added a `replace_parent()` method that was a bit useful for the test
case. It could easily be hard-coded in the test case instead, but I
think the method will be useful for `jj git sync` and similar in the
future.
It's cheap to look up commits again from the cache in `Store` but it
can be expensive to look up commits we didn't end up needing. This
will make it easier to refactor further and be able to cheaply set
preliminary parents for a rewritten commits and then let the caller
update them.
The functions now depend only on `MutableRepo`, so I think they belong
on that type. This gets us closer to being able to make
`parent_mapping` private again.
If we ever implement some sort of ABI for dynamic extension loading, we'll need these underlying APIs to support multiple extensions, so we might as well do that first.
I don't think we have any transactions that mark commit as abandoned
and then later mark it as rewritten or divergent. But if we ever do, I
think it should be considered just rewritten/divergent. So let's
enforce that invariant by removing the old value from the set of
abandoned commits.
When `rebase_commit_with_options()` decides to abandons a commit, it
records the new parents in the `MutableRepo`, but it's currently the
caller's responsibility to remember to mark it as abandoned. Let's
move that logic into the function to reduce the risk of future bugs.
By adding the abandoned commit's parents to `parent_mapping`, we can
remove a bit more of the special handling of abandoned commitsin
`DescendantRebaser`.
A subset of the state in `DescendantRebaser` now matches exactly what
`MutableRepo` already stores, so we can avoid copying that state and
have `DescendantRebaser` use it directly instead. Having a single
source of truth for the state will enable further simplifications and
improvements.
This is just to match `DescendantRebaser`, to make the next commit a
bit simpler. I think `MutableRepo` still has few enough fields that
just `abandoned` is clear enough. Maybe we'll move the three
rewrite-related fields into a new struct at some point.
With this patch, `MutableRepo` has the same tracking of rewritten
commits as `DescendantRebaser`, so we can simply pass that state into
`DescendantRebaser` when we create it. The next step is to remove the
state from `DescendantRebaser`.
I don't think we have any callers left that call
`record_rewritten_commit()` multiple times within a transaction and
expect it to result in divergence. I think we should consider it a bug
to do that.
MutableRepo and CommitBuilder both define public (now crate-public) functions
which should only be called by each other. This commit adds documentation and
restricts visibility of these functions to the jj_lib crate. It might be even
better to move CommitBuilder to the same module as MutableRepo so that these
codependent functions can be private to the module to avoid misuse.
Since the operation log has a root operation, we don't need to create
the repo-initialization operation in order to create a valid
`ReadonlyRepo` instance. I think it's conceptually simpler to create
the instance at the root operation id and then add the initial
operation using the usual `Transaction` API. That's what this patch
does.
Doing that also brought two issues to light:
1. The empty view object doesn't have the root commit as head.
2. The initialized `OpHeadsStore` doesn't have the root operation as
head.
Both of those seem somewhat reasonable, but maybe we should change
them. For now, I just made the initial repo (before the initial
operation) have a single op head (to compensate for (2)). It might be
worth addressing both issues so the repo is in a better state before
we create the initial operation. Until we do, we probably shouldn't
drop the initial operation.
Because the default index cuts off the traversal at min(generations), including
the root id means all ancestors will be visited. This could be worked around at
the index side, but I think it's the repo/view's responsibility. That being
said, it's not uncommon to pad a revset with "root()", so it might make sense
for the index to special case the root id.
I also removed the redundant .clone().
It seems obvious in hindsight to have a virtual root operation just
like we have a virtual root commit. It removes the same kind of
problems by making sure there's always a common ancestor (or multiple)
between any two commits.
I think the reason I didn't add a root operation from the beginning
was that there used to be a mandatory working-copy commit in the view
(this was before support for multiple workspaces).
Perhaps we should remove the "initialize repo" operation now. The only
difference between their view objects is that the "initialize repo"
operation adds the root commit as a head. We could add that to the
root operation, but then the root operation's value depends on the
commit backend.
We've had the public_heads for as long as we've had the View object,
IIRC (I didn't check), but we still don't use it for anything. I don't
have any concrete plans for using it either. Maybe our config for
immutable commits is good enough, or maybe we'll want something more
generic (like Mercurial's phases). For now, I think we should simplify
by removing it the storage for public heads.
change_id_index() is only used by Readonly/MutableRepo, so we don't need an
abstraction at Index. evaluate_revset() is somewhat similar, but the callers
rely on &dyn Repo.
We current have `Revset::change_id_index()` for creating a
`ChangeIdIndex` for a given revset. I think it will be hard to make it
performant for general revsets, especially in very large repos and
with custom index implementations, like the one we have at Google. If
we instead restrict it to including all ancestors of a set of heads, I
think it will be much easier to implement. We only use
`Revset::change_id_index()` with revsets including all visible commits
today, so we won't lose any current functionality by making it more
restricted.
I'd like to move `change_id_index()` from `Revset` to `Index` (and
make it take the set of visible heads as argument). We currently use
`evaluate_revset_static()` only to get a `ChangeIdIndex`, so a good
place to start is to convert that into `change_id_index_static()`.
The `ChangeIdIndex` type is currently in defined in the `revset`
module because that's the only placed it's used. However, I'd like to
start using it directly from `index`. The idea is to make it possible
to create a `ChangeIdIndex` given a set of heads, without first
creating a `Revset`.
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
Finally, there are no test uses of these APIs. `DescendantRebaser` is made
`pub(crate)`, since it is used by `MutRepo`. Other functions are made private.
This completes the process of removing DescendantRebaser-related APIs from
tests. It requires creating some new test utils and a new
`rebase_descendants_with_option_return_map`.
This commit is a little out of place in this sequence, but
it seems to make more sense for MutRepo to own these maps.
@yuja [pointed out] that any tests written using `create_descendant_rebaser` now
need to do this cleanup, but there are no longer any such tests after the
previous commits and a follow-up commit removes `create_descendant_rebaser`
entirely.
[pointed out]: https://github.com/martinvonz/jj/pull/2737#discussion_r1435754370
The resolver callback usually returns wider error type, which I don't think
is a variant of OpHeadResolutionError.
To help type inference, resolver's error type is E, not E1 where E: From<E1>.
The error types are shared with the commit store backend. We could add per-store
error types, but it's unlikely that the caller needs to discriminate them.
Consider how one would implment the current `OpHeadsStore` interface
for a cloud-based backend. After `OpHeadsStore::add_op_head()` is
called, the set of op heads temporarily contains two heads (typically)
until `OpHeadsStore::remove_op_head()` is called. That's not invalid,
but it's annoying to have to deal with that state more than
necessary. Also, it's unnecessarily inefficient to send the addition
and removal of op heads as separate RPCs. This patch therefore adds a
`update_op_heads()` method that takes a list of old heads to remove
and a single new head to add. Coming patches will start migrating to
that method.
This requires creating a new public API as a substitute. I took the opportunity
to also add some comments to the
`MutRepo::record_rewritten_commit`/`record_abandoned_commit` functions.
I imade the simplest possible addition to the API; it is not a very elegant
one. Eventually, the entire `record_rewritten_commit` API should probably be
refactored again.
I also added some comments explaining what these functions do.