96 Commits

Author SHA1 Message Date
Ilya Grigoriev
3404085ec4 cleanup: remove Conflict default impl
The "default" conflict is not a valid conflict, as it
does not have one more add than removes.
2025-01-18 07:49:43 +00:00
Yuya Nishihara
ad4b940daa object_id: implement Display on ObjectId types
It's convenient if id can be inlined in error messages.
2024-10-16 09:12:16 +09:00
Yuya Nishihara
5658f070fc hex_util: simplify conversion from bytes to reverse hex string 2024-10-16 09:12:16 +09:00
Yuya Nishihara
59c635bfd0 object_id: add ChangeId::reverse_hex() for convenience
Borrowed from #4470.
2024-10-16 09:12:16 +09:00
Mateusz Mikuła
8dd3003bec refactor: mark Timestamp struct as Copy 2024-09-22 16:23:53 +02:00
Martin von Zweigbergk
8eb3d85b1c backend: make write methods async
This doesn't provide any benefit yet bit I think we've known for a
while that we want to make the backend write methods async. It's just
not been important to Google because we have the local daemon process
that makes our writes pretty fast. Regardless, this first commit just
changes the API and all callers immediately block for now, so it won't
help even on slow backends.
2024-09-04 18:34:11 -07:00
Matt Kulukundis
8ead72e99f formatting only: switch to Item level import ganularity 2024-08-22 14:52:54 -04:00
Martin von Zweigbergk
fd9a236be5 copies: move CopyRecords to new copies module
Copy/rename handling is complicated. It seems worth having a module
for it. I'm going to add more content to it next.
2024-08-18 22:16:41 -07:00
Matt Kulukundis
5911e5c9b2 copy-tracking: Add copy tracking as a post iteration step
- force each diff command to explicitly enable copy tracking
- enable copy tracking in diff_summary
- post-process for diff iterator
- post-process for diff stream
- update changelog
2024-08-11 17:01:45 -04:00
Matt Kulukundis
ee6b922144 copy-tracking: create CopyRecordMap and add it to diff summaries 2024-08-11 17:01:45 -04:00
Matt Kulukundis
e667a2b403 copy-tracking: adjust backend signature
- use a single commit instead of an array of them.  This simplifies the
  implementation.  A higher level api can wrap this when an array of
  commits is desired and those semantics are figured out.
- since this API is directly 1-1 on parents, there are no conflicts
- if we introduce a higher level API that handles lists of commits, we
  may need to restore the conflict/resolved distinction, but for now
  simplify
2024-08-11 17:01:45 -04:00
Martin von Zweigbergk
d740f1801b conflicts: use non-legacy MergedTreeId for root commit
This is part of migrating away from legacy trees (with path-level
conflicts). I can't think of any practical impact (we already compare
the tree ids equal).
2024-07-24 14:33:05 +02:00
Matt Kulukundis
dab8a29683 copy-tracking: stub get_copy_records
- add the method and types for all backends
2024-07-03 20:26:30 -04:00
Martin von Zweigbergk
f8a5ad0c7a conflicts: propagate error from conflict materialization 2024-06-17 14:33:29 +09:00
Martin von Zweigbergk
404f31cbc1 backend: add error variant for access denied, handle when diffing
Some backends, like the one we have at Google, can restrict access to
certain files. For such files, if they return a regular
`BackendError::ReadObject`, then that will terminate iteration in many
cases (e.g. when diffing or listing files). This patch adds a new
error variant for them to return instead, plus handling of such errors
in diff output and in the working copy.

In order to test the feature, I added a new commit backend that
returns the new `ReadAccessDenied` error when the caller tries to read
certain objects.
2024-05-30 18:27:38 -07:00
Yuya Nishihara
f20004fffe git_backend: classify "merge with root" as user error
Perhaps, there will be more error types that hold BackendError internally, but
this change is good enough to handle a merge error.
2024-03-30 11:14:25 +09:00
Ilya Grigoriev
96bf190234 Nightly clippy fixes
There are a few additional warnings because of
https://github.com/rust-lang/rust-clippy/issues/12377, which is a
nightly-only bug that will hopefully be fixed.
2024-03-02 18:19:14 -08:00
Evan Mesterhazy
a335321c45 Add documentation comments for several types
These comments are intended to make it easier for new developers to get up to
speed with the project. This is just a starting point... there are other types
and functions that could benefit from documentation.
2024-03-02 15:01:55 -05:00
Evan Mesterhazy
a28beb5b8f Allow id_type! to capture doc comments
This allows us to define documentation comments for types implemented using the
id_type! macro. Comments defined above the type inside the macro will be
captured and visible in generated docs.

Example:

```
id_type!(
    /// Stable identifier for a [`Commit`]. Unlike the `CommitId`, the `ChangeId`
    /// follows the commit and is not updated when the commit is rewritten.
    pub ChangeId
);
```

This commit also adds documentation for the `CommitId` and `ChangeId` types
defined using the `id_type!` macro.
2024-02-27 10:37:05 -05:00
Yuya Nishihara
e588a9babc backend: allow cheap copy of MillisSinceEpoch(i64)
It's unlikely this type will become uncopyable.
2024-02-25 09:00:56 +09:00
Evan Mesterhazy
e8f324ffde Replace uses of content_hash! with #[derive(ContentHash)]
This is a pure refactor with no behavior changes.

#3054
2024-02-20 14:18:13 -05:00
Evan Mesterhazy
965d6ce4e4 Implement a procedural macro to derive the ContentHash trait for structs
This is a no-op in terms of function, but provides a nicer way to derive the
ContentHash trait for structs using the `#[derive(ContentHash)]` syntax used
for other traits such as `Debug`.

This commit only adds the macro. A subsequent commit will replace uses of
`content_hash!{}` with `#[derive(ContentHash)]`.

The new macro generates nice error messages, just like the old macro:

```
error[E0277]: the trait bound `NotImplemented: content_hash::ContentHash` is not satisfied
   --> lib/src/content_hash.rs:265:16
    |
265 |             z: NotImplemented,
    |                ^^^^^^^^^^^^^^ the trait `content_hash::ContentHash` is not implemented for `NotImplemented`
    |
    = help: the following other types implement trait `content_hash::ContentHash`:
              bool
              i32
              i64
              u8
              u32
              u64
              std::collections::HashMap<K, V>
              BTreeMap<K, V>
            and 38 others
```

This commit does two things to make proc macros re-exported by jj_lib useable
by deps:

1. jj_lib needs to be able refer to itself as `jj_lib` which it does
   by adding an `extern crate self as jj_lib` declaration.

2. jj_lib::content_hash needs to re-export the `digest::Update` type so that
   users of jj_lib can use the `#[derive(ContentHash)]` proc macro without
   directly depending on the digest crate. This is done by re-exporting it
   as `DigestUpdate`.


#3054
2024-02-20 11:29:05 -05:00
Evan Mesterhazy
e1fd402d39 Fix the ContentHash implementations for std::Option, MergedTreeId, and RemoteRefState
The `ContentHash` documentation specifies that implementations for enums should
hash the ordinal number of the variant contained in the enum as a 32-bit
little-endian number and then hash the contents of the variant, if any.

The current implementations for `std::Option`, `MergedTreeId`, and
`RemoteRefState` are non-conformant since they hash the ordinal number as a u8
with platform specific endianness.


Fixes #3051
2024-02-16 09:27:32 -05:00
Yuya Nishihara
77ceadbfd0 cleanup: remove remaining ": {source}" from error message templates 2024-02-04 09:13:21 +09:00
Yuya Nishihara
351487b9f5 backend: pass Index and keep_newer timestamp parameters to gc()
GitBackend::gc() will need to check if a commit is reachable from any
historical operations. This could be calculated from the view and commit
objects, but the Index will do a better job.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
4e54021930 backend: have gc() return BackendError instead of opaque error type
The gc() implementation is likely to call other backend functions, which
return BackendError.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
84949dd551 backend: mark BackendError::Other as transparent
The inner error should be the source, and I don't think the "Error:" prefix
gives additional context.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
fa5e40719c object_id: extract ObjectId trait and macros to separate module
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
2024-01-05 10:20:57 +09:00
Yuya Nishihara
dbaee198e6 hex_util: move common_hex_len() from backend module
This function predates the hex_util module. If there were hex_util, I would
add it there.
2024-01-05 10:20:57 +09:00
Martin von Zweigbergk
1cc271441f gc: implement basic GC for Git backend
This adds an initial `jj util gc` command, which simply calls `git gc`
when using the Git backend. That should already be useful in
non-colocated repos because it's not obvious how to GC (repack) such
repos. In my own jj repo, it shrunk `.jj/repo/store/` from 2.4 GiB to
780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to
86 ms.

I haven't added any tests because the functionality depends on having
`git` binary on the PATH, which we don't yet depend on anywhere
else. I think we'll still be able to test much of the future parts of
garbage collection without a `git` binary because the interesting
parts are about manipulating the Git repo before calling `git gc` on
it.
2023-12-03 07:40:12 -08:00
Yuya Nishihara
d747879aee signing: pass SigningFn by reference
write_commit() doesn't need ownership of the signing function.
2023-12-01 22:55:04 +09:00
Anton Bulakh
d7229a3f90 sign: Define signing backend API and integrate it
Finished everything except actual signing backend implementation(s) and
the UI.
2023-11-30 23:36:56 +02:00
Yuya Nishihara
59ef3f0023 repo_path: split RepoPathComponent into owned and borrowed types
This is a step towards introducing a borrowed RepoPath type. The current
RepoPath type is inefficient as each component String is usually short. We
could apply short-string optimization, but still each inlined component would
consume 24 bytes just for e.g. "src", and increase the chance of random memory
access. If the owned RepoPath type is backed by String, we can implement cheap
cast from &str to borrowed &RepoPath type.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
f2096da2d6 repo_path: add stub type to introduce borrowed RepoPathComponent type
The current RepoPathComponent will be renamed to RepoPathComponentBuf, and
new str wrapper will be added as RepoPathComponent.
2023-11-26 18:21:40 +09:00
Anton Bulakh
5c3c0e9f6e sign: Implement generic commit signing on the backend 2023-11-23 22:52:20 +02:00
Anton Bulakh
e3a1e5b80e sign: Implement storage for digital commit signatures
Recognize signature metadata from git commit objects, implement
a basic version of that for the native backend.
Extract the signed data (a commit binary repr without the signature) to
be verified later.
2023-11-12 03:37:13 +02:00
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
Yuya Nishihara
06c254e742 git_backend: use non-owned str::from_utf8() to decode symlink target
Just for consistency with the other changes. str::Utf8Error is 2 words long,
so I removed the boxing.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
d1c71c05c9 git_backend: remove redundant error handling for invalid hash length
The only error that could be returned by libgit2 is invalid hash length, and
we check that explicitly. If we switch the backends to gitoxide, there will be
panicking constructor.

https://docs.rs/git2/latest/git2/struct.Oid.html#method.from_bytes
2023-10-31 06:51:27 +09:00
Martin von Zweigbergk
35a23172ec backend: delete unused Phase enum
The idea was to support phases like in hg, but that hasn't happened
yet. We can add back this simple enum if we do add support for phases.
2023-10-29 12:02:40 -07:00
Martin von Zweigbergk
cfcdd71865 backend: make read_conflict synchronous again
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
2023-10-28 16:45:40 -07:00
Martin von Zweigbergk
5174489959 backend: make read functions async
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:

 * Make the backend methods `async`
 * Use many threads for reading from the backend
 * Add backend methods for batch reading

I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.

I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).

I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
d575aaeca8 backend: move constant functions first
`root_commit_id()`, `root_change_id()`, and `empty_tree_id()` were
strangely ordered between `write_symlink()` and `read_tree().
2023-09-19 05:24:51 -07:00
Martin von Zweigbergk
61501db8ec merged_trees: consider conflict-format-change-only commits empty
When we start writing tree-level conflicts in an existing repo, we
don't want commits that change the format to be non-empty if they
don't change any content. This patch updates `MergeTreeId::eq()` to
consider two resolved trees equal even if only their `MergedTreeId`
variant is different (one is path-level and one is tree-level).

I think I've gone through all places we compare tree ids and checked
that it's safe to compare them this way. One consequence is that
rebasing a commit without changing the parents (typically
auto-rebasing after `jj describe`) will not lead to the tree id
getting upgraded, due to an optimization we have for that case. I
don't think that's serious enough to handle specially; we'll have to
support the old format for existing repos for a while regardless of a
few commits not getting upgraded right away.

The number of failing tests with the config option enabled drop from
108 to 11 with this patch.
2023-08-30 06:17:21 -07:00
Martin von Zweigbergk
7e6930b56f backend: remove last few instances of MergedTreeId::as_legacy_tree_id() 2023-08-30 06:17:21 -07:00
Martin von Zweigbergk
962da1947e tests: make dump_tree() work with merged trees
My goal is to minimize impact on tests when we start using the new
format.
2023-08-30 06:17:21 -07:00
Waleed Khan
56c61fd047 merge_tools: create builtin diff editor 2023-08-30 05:38:10 -04:00
Martin von Zweigbergk
bd6098e09e cli: merge trees via MergedTree in jj move 2023-08-28 15:58:34 -07:00
Martin von Zweigbergk
36674e8f7e merged_tree: make id() return a MergedTreeId
We will rarely want to use the tree id without knowing whether it can
contain `TreeValue::Conflict` values, so let's make the callers check.
2023-08-27 06:49:45 -07:00
Martin von Zweigbergk
fd4146d485 backend: use new enum for Commit::root_tree
We currently represent the root tree id in a commit by `Merge<TreeId>`
plus a boolean `uses_tree_conflict_format`. It's better to use an enum
for that. That makes it harder to forget to check which type of tree
it is, and it makes it impossible to store a legacy tree with multiple
ids (as we could with `uses_tree_conflict_format=false`,
`root_tree=Merge::new(...)`).

Maybe more importantly, we're also going to want to pass around this
information in most places where we currently pass a single `TreeId`,
and passing two separate values would be annoying.
2023-08-26 07:02:04 -07:00