102 Commits

Author SHA1 Message Date
Yuya Nishihara
9bd84c55e0 git_backend: use file mode extensively in read_tree()
Both filemode() and kind() are calculated from the same underlying data,
and kind() is libgit2-specific API.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
b3c9cab12d git_backend: handle read_tree() lookup/encoding errors gracefully 2023-10-31 06:51:27 +09:00
Yuya Nishihara
847adc832f git_backend: use lossy conversion to decode non-UTF-8 commit message
If message() returned None, it doesn't mean the commit message is empty. I
originally mapped it to an error, but that made import of linux repo fail.

https://docs.rs/git2/latest/git2/struct.Commit.html#method.message
2023-10-31 06:51:27 +09:00
Yuya Nishihara
06c254e742 git_backend: use non-owned str::from_utf8() to decode symlink target
Just for consistency with the other changes. str::Utf8Error is 2 words long,
so I removed the boxing.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
d1c71c05c9 git_backend: remove redundant error handling for invalid hash length
The only error that could be returned by libgit2 is invalid hash length, and
we check that explicitly. If we switch the backends to gitoxide, there will be
panicking constructor.

https://docs.rs/git2/latest/git2/struct.Oid.html#method.from_bytes
2023-10-31 06:51:27 +09:00
Martin von Zweigbergk
cfcdd71865 backend: make read_conflict synchronous again
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
2023-10-28 16:45:40 -07:00
Martin von Zweigbergk
578d61ec2e git: use forward slashes in relative path to backing git repo
Closes #2403.
2023-10-20 22:51:14 -05:00
Martin von Zweigbergk
f541f9f3a6 cleanup: import futures::exectutor::block_on() instead of qualifying
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
2023-10-20 07:38:34 -07:00
Martin von Zweigbergk
f8be0b2030 backends: deduplicate definition of backend names
I copied the example set by `DefaultSubmoduleStore`.
2023-10-14 06:38:35 -07:00
Martin von Zweigbergk
5174489959 backend: make read functions async
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:

 * Make the backend methods `async`
 * Use many threads for reading from the backend
 * Add backend methods for batch reading

I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.

I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).

I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
bd5eef9c5e git_backend: rename some store variables backend in tests
This is to avoid confusion with instances of the `Store` type.
2023-10-08 23:36:49 -07:00
Yuya Nishihara
f0ad1f53ea git_backend: on read_commit(), fall back to importing extras as needed
One problematic scenario is that we have commits imported by old jj, and all
of their descendant commits are created by jj. Therefore import_head_commits()
wouldn't reach the old ancestor commits.

This change might bury a real bug, but I don't have a better alternative. Maybe
we can remove this hack after a couple of jj releases, and add a debug command
that imports all reachable Git commits from all historical heads.

Closes #2343
2023-10-07 02:08:36 +09:00
Yuya Nishihara
7c96cead34 git_backend: rename git_repo_clone() as it isn't just cloning, propagate error
Since git2::Repository::open() will access to the filesystem, it can technically
fail.
2023-10-04 00:04:24 +09:00
Yuya Nishihara
837dc4f47c git_backend: rewrite remaining git_repo() callers, make it private
While debugging git issues, I often ended up creating a deadlock by adding
debug prints. It's also not obvious that git::export_refs() works even if the
git_repo() has already been locked, whereas git::import_refs() wouldn't. Let's
consolidate lock handling to the backend implementation.
2023-10-04 00:04:24 +09:00
Yuya Nishihara
0d63223dad git_backend: proxy git2::Repository methods to manage lock scope internally
Since git_repo() acquires Mutex, it's super easy to create a deadlock.
2023-10-04 00:04:24 +09:00
Martin von Zweigbergk
d575aaeca8 backend: move constant functions first
`root_commit_id()`, `root_change_id()`, and `empty_tree_id()` were
strangely ordered between `write_symlink()` and `read_tree().
2023-09-19 05:24:51 -07:00
Yuya Nishihara
d05aaf30b4 git: promote imported commits to tree-conflict format if configured 2023-09-08 21:54:43 +09:00
Yuya Nishihara
f744e5920b git: create no-gc ref by GitBackend
Since we now have an explicit method to import heads, it makes more sense to
manage no-gc refs by the backend.
2023-09-08 21:54:43 +09:00
Yuya Nishihara
17f502c83a git: do not import unknown commits by read_commit(), use explicit method call
The main goal of this change is to enable tree-level conflict format, but it
also allows us to bulk-import commits on clone/init. I think a separate method
will help if we want to provide progress information, enable check for
.jjconflict entries under certain condition, etc.

Since git::import_refs() now depends on GitBackend type, it might be better to
remove git_repo from the function arguments.
2023-09-08 21:54:43 +09:00
Martin von Zweigbergk
7e6930b56f backend: remove last few instances of MergedTreeId::as_legacy_tree_id() 2023-08-30 06:17:21 -07:00
Martin von Zweigbergk
e4ba6a42fc backends: store tree id conflicts as list with alternating signs
Now that we have `Merge::iter()` and friends, it's simpler to store
the tree ids in a single list.
2023-08-26 07:02:04 -07:00
Martin von Zweigbergk
fd4146d485 backend: use new enum for Commit::root_tree
We currently represent the root tree id in a commit by `Merge<TreeId>`
plus a boolean `uses_tree_conflict_format`. It's better to use an enum
for that. That makes it harder to forget to check which type of tree
it is, and it makes it impossible to store a legacy tree with multiple
ids (as we could with `uses_tree_conflict_format=false`,
`root_tree=Merge::new(...)`).

Maybe more importantly, we're also going to want to pass around this
information in most places where we currently pass a single `TreeId`,
and passing two separate values would be annoying.
2023-08-26 07:02:04 -07:00
Martin von Zweigbergk
589e0db3c5 git_backend: remove unused proto field for resolved tree id
We store resolved tree ids in the regular Git commit, so we we never
ended up using the `resolved` variant in the `root_tree`.
2023-08-26 07:02:04 -07:00
Waleed Khan
134d85e635 backend: reduce BackendError size somewhat
One of the error types that I later created embedded `BackendError`, but `clippy` complained that the size of the type was too large. This helps address that.
2023-08-23 21:11:15 -07:00
Emily Fox
2c88da02b4 git: teach backend to handle empty name and email strings 2023-08-18 17:22:59 -05:00
Martin von Zweigbergk
ef5f97f8d7 conflicts: move Merge<T> to merge module
The `merge` module now seems like the obvious place for this type.
2023-08-06 22:08:09 +00:00
Martin von Zweigbergk
ecc030848d conflicts: rename Conflict<T> to Merge<T>
Since `Conflict<T>` can also represent a non-conflict state (a single
term), `Merge<T>` seems like better name.

Thanks to @ilyagr for the suggestion in
https://github.com/martinvonz/jj/pull/1774#discussion_r1257547709

Sorry about the churn. It would have been better if I thought of this
name before I introduced `Conflict<T>`.
2023-08-06 22:08:09 +00:00
Martin von Zweigbergk
006c764694 backend: learn to store tree-level conflicts
Tree-level conflicts (#1624) will be stored as multiple trees
associated with a single commit. This patch adds support for that in
`backend::Commit` and in the backends.

When the Git backend writes a tree conflict, it creates a special root
tree for the commit. That tree has only the individual trees from the
conflict as subtrees. That way we prevent the trees from getting
GC'd. We also write the tree ids to the extra metadata table
(i.e. outside of the Git repo) so we don't need to load the tree
object to determine if there are conflicts.

I also added new flag to `backend::Commit` indicating whether the
commit is a new-style commit (with support for tree-level
conflicts). That will help with the migration. We will remove it once
we no longer care about old repos. When the flag is set, we know that
a commit with a single tree cannot have conflicts. When the flag is
not set, it's an old-style commit where we have to walk the whole tree
to find conflicts.
2023-07-19 22:04:16 -07:00
Waleed Khan
54dba51a08 docs: warn about missing docs for jj-lib crate 2023-07-10 18:28:59 +03:00
Yuya Nishihara
5346bd734f git_backend: translate io::Error of read_conflict() to ReadObject error
This is the last place in Git backend where io::Error is magically converted
to BackendError::Other.
2023-07-06 20:48:46 +09:00
Yuya Nishihara
4e4ca46998 git_backend: wrap TableStoreError to preserve source error object 2023-07-06 20:48:46 +09:00
Yuya Nishihara
cf8a0466c4 backend: introduce error types specific to init/load phases
Errors that may occur while loading backend would vary per backends, and
it's unlikely that these errors could be mapped to BackendError variants
other than BackendError::Other. So let's extract Other(_) of that kind as
a separate type to clarify there would be no other error variants.

Perhaps, Backend/Error will be renamed to CommitBackend/Error or
CommitStore/Error?, whereas I think BackendInit/LoadError can be shared
among store factories.
2023-07-06 20:48:46 +09:00
Yuya Nishihara
e1e75daa8e backend: make BackendError::Other preserve source error object 2023-07-06 20:48:46 +09:00
Yuya Nishihara
5b78fe75b1 git_backend: propagate load() error to caller
#1794
2023-07-06 12:43:49 +09:00
Yuya Nishihara
84060d750b git_backend: propagate init_internal() error to caller 2023-07-06 12:43:49 +09:00
Yuya Nishihara
2db4c906ad git_backend: attach file path to initialization error 2023-07-06 12:43:49 +09:00
Yuya Nishihara
31bb68486e git_backend: insert error type specific to backend initialization
This helps to map initialization error to BackendError without too general
From impl. I don't think io::Error (or our PathError) should be automatically
translated to BackendError::Other because BackendError has more specific
variants depending on context. If the error is specific to initialization,
it makes sense to translate it to Other variant.
2023-07-06 12:43:49 +09:00
Yuya Nishihara
a09a406817 git_backend: leverage std::fs::read/write() helpers 2023-07-06 12:43:49 +09:00
Kevin Liao
eac90fd113 Update init_external to return an error instead of unwrapping 2023-06-29 10:03:13 -07:00
Martin von Zweigbergk
da5db27bb0 backend: split up store.proto in git and local versions
It was convenient that what the git backend stored in its "extras"
table is exactly a subset of the fields that local backend stores, but
it's bit ugly and limiting. For example, it makes it possible to
populate the `author` field in the git extras, but that would have no
effect. It's better that it's not possible to do that (we store the
author field in the git commit, of course).

What made me notice this now was that I'm working on tree-level
conflicts (#1624) and I'm thinking of adding a field to the git extras
saying "this commit has single tree, but it's still a new-style
commit", so we can know not to walking such trees to find path-level
conflicts. That's only needed for the git backend because we don't
care about compatibility for the local backend.
2023-06-22 13:49:46 +02:00
Kevin Liao
86b6a11e63 Fix jj init --git-repo fails and leaves broken .jj folder
This commit fixes #1305

Before this commit, running `jj init --git-repo=./` in a folder that
does not have a .git would cause jj to panick and leave an unfinished corrupted jj repo.

This commit fixes that by changing the call chain to return an error
instead of calling .unwrap() and panicking. This commit also adds logic to delete the unfinished jj
repository when the git backend initialization failed.

Before this commit, running the above command would result in the following
```
Running `jj/target/debug/jj init --git-repo=./`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { code: -3, klass: 2, message: "failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory" }', lib/src/git_backend.rs:83:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

After this commit, the result is the following and the jj repo is deleted:
```
Running `jj/target/debug/jj init --git-repo=./`
Error: Failed to access the repository: Error: Failed to open git repository: failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory; class=Os (2); code=NotFound (-3)
```
2023-06-20 11:02:06 -07:00
Yuya Nishihara
38a7e7fd62 git_backend: on read_commit(), bulk-update extra metadata table of ancestors
Otherwise, "jj init --git-repo ." would create extra table files per commit,
and merge them.

I considered adding an explicit GitBackend method to be called from
git::import_refs(), but the call order matters. The method should be invoked
before calling store.get_commit(..) or mut_repo.add_head(..). Since commits
are likely to be loaded from the head, we can instead make read_commit()
import ancestor metadata at all.

Alternatively, we could make a Git commit hidden until it's inserted into
the extra table. It's rather big change, and I wouldn't like to do that
without thinking more thoroughly.
2023-05-21 08:29:00 +09:00
Yuya Nishihara
fe97dccd02 git_backend: move add_entry() of extra metadata table to caller
I'm going to add a caller which will insert multiple entries at once.
2023-05-21 08:29:00 +09:00
Yuya Nishihara
e6addf7905 git_backend: extract helper that converts git2::Commit to backend::Commit
The root parent id is filled by caller because empty parents list is more
convenient while walking ancestors.
2023-05-21 08:29:00 +09:00
Yuya Nishihara
0149e7b311 git_backend: generate change id from git2::Commit object
I'm going to extract a helper function that converts git2::Commit to
backend::Commit struct, and the commit id can also be obtained from the
git2::Commit object.
2023-05-21 08:29:00 +09:00
Yuya Nishihara
5dba0502cb git_backend: cache head of saved extra metadata table
Just because we know the latest table head.
2023-05-21 08:29:00 +09:00
Yuya Nishihara
a9422460cb git_backend: ensure change id generated from git commit id never reassigned
Fixes #924
2023-05-20 15:53:23 +09:00
Yuya Nishihara
9aa72f6f1d git_backend: add lock to prevent racy change id assignments
My first attempt was to fix up corrupted index when merging, but it turned
out to be not easy because the self side may contain corrupted data. It's
also possible that two concurrent commit operations have exactly the same
view state (because change id isn't hashed into commit id), and only the
table heads diverge.

#924
2023-05-20 15:53:23 +09:00
Yuya Nishihara
e224044dea git_backend: consistently use CommitId type to look up extra metadata table 2023-05-20 15:53:23 +09:00
Yuya Nishihara
78c8dbc8fe git_backend: extract helper to add extra metadata entry and save table 2023-05-20 15:53:23 +09:00