cybercyst/jj - jj - Gitea: Git with a cup of tea

mirror of https://github.com/martinvonz/jj.git synced 2025-05-13 11:14:25 +00:00

Author	SHA1	Message	Date
Yuya Nishihara	06c254e742	git_backend: use non-owned str::from_utf8() to decode symlink target Just for consistency with the other changes. str::Utf8Error is 2 words long, so I removed the boxing.	2023-10-31 06:51:27 +09:00
Yuya Nishihara	d1c71c05c9	git_backend: remove redundant error handling for invalid hash length The only error that could be returned by libgit2 is invalid hash length, and we check that explicitly. If we switch the backends to gitoxide, there will be a panicking constructor. https://docs.rs/git2/latest/git2/struct.Oid.html#method.from_bytes	2023-10-31 06:51:27 +09:00
Martin von Zweigbergk	35a23172ec	backend: delete unused `Phase` enum The idea was to support phases like in hg, but that hasn't happened yet. We can add back this simple enum if we do add support for phases.	2023-10-29 12:02:40 -07:00
Martin von Zweigbergk	cfcdd71865	backend: make `read_conflict` synchronous again This avoids https://github.com/rust-lang/futures-rs/issues/2090. I don't think we need to worry about reading legacy conflicts asynchronously - async is really only useful for Google's backend right now, and we don't use the legacy format at Google. In particular, I don't want `MergedTree::value()` to have to be async.	2023-10-28 16:45:40 -07:00
Martin von Zweigbergk	5174489959	backend: make read functions async The commit backend at Google is cloud-based (and so are the other backends); it reads and writes commits from/to a server, which stores them in a database. That makes latency much higher than for disk-based backends. To reduce the latency, we have a local daemon process that caches and prefetches objects. There are still many cases where latency is high, such as when diffing two uncached commits. We can improve that by changing some of our (jj's) algorithms to read many objects concurrently from the backend. In the case of tree-diffing, we can fetch one level (depth) of the tree at a time. There are several ways of doing that: * Make the backend methods `async` * Use many threads for reading from the backend * Add backend methods for batch reading I don't think we typically need CPU parallelism, so it's wasteful to have hundreds of threads running in order to fetch hundreds of objects in parallel (especially when using a synchronous backend like the Git backend). Batching would work well for the tree-diffing case, but it's not as composable as `async`. For example, if we wanted to fetch some commits at the same time as we were doing a diff, it's hard to see how to do that with batching. Using async seems like our best bet. I didn't make the backend interface's write functions async because writes are already async with the daemon we have at Google. That daemon will hash the object and immediately return, and then send the object to the server in the background. I think any cloud-based solution will need a similar daemon process. However, we may need to reconsider this if/when jj gets used on a server with a custom backend that writes directly to a database (i.e. no async daemon in between). I've tried to measure the performance impact. 
The largest difference I've been able to measure was on `jj diff --ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo, which increases from 749 ms to 773 ms (3.3%). In most cases I've tested, there's no measurable difference. I've tried diffing from the root commit, as well as `jj --ignore-working-copy log --no-graph -r '::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a commit-heavy load).	2023-10-08 23:36:49 -07:00
Martin von Zweigbergk	d575aaeca8	backend: move constant functions first `root_commit_id()`, `root_change_id()`, and `empty_tree_id()` were strangely ordered between `write_symlink()` and `read_tree()`.	2023-09-19 05:24:51 -07:00
Martin von Zweigbergk	61501db8ec	merged_trees: consider conflict-format-change-only commits empty When we start writing tree-level conflicts in an existing repo, we don't want commits that change the format to be non-empty if they don't change any content. This patch updates `MergedTreeId::eq()` to consider two resolved trees equal even if only their `MergedTreeId` variant is different (one is path-level and one is tree-level). I think I've gone through all places we compare tree ids and checked that it's safe to compare them this way. One consequence is that rebasing a commit without changing the parents (typically auto-rebasing after `jj describe`) will not lead to the tree id getting upgraded, due to an optimization we have for that case. I don't think that's serious enough to handle specially; we'll have to support the old format for existing repos for a while regardless of a few commits not getting upgraded right away. The number of failing tests with the config option enabled drops from 108 to 11 with this patch.	2023-08-30 06:17:21 -07:00
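
The comparison described in 61501db8ec can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `TreeId` is a plain byte-vector newtype, the tree-level variant holds a `Vec<TreeId>` instead of jj's `Merge<TreeId>`, and the helper `terms()` is a hypothetical name.

```rust
/// Hypothetical stand-in for a tree id (assumption: plain bytes).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TreeId(pub Vec<u8>);

/// Simplified model of the two root-tree formats.
#[derive(Clone, Debug)]
pub enum MergedTreeId {
    /// Old format: a single tree id (path-level conflicts live inside the tree).
    Legacy(TreeId),
    /// New format: the terms of a tree-level merge; one term means "resolved".
    Merge(Vec<TreeId>),
}

impl MergedTreeId {
    /// Views either variant as a list of terms (hypothetical helper).
    fn terms(&self) -> &[TreeId] {
        match self {
            MergedTreeId::Legacy(id) => std::slice::from_ref(id),
            MergedTreeId::Merge(terms) => terms,
        }
    }
}

impl PartialEq for MergedTreeId {
    /// Two ids are equal when their terms are equal, so a resolved tree
    /// compares equal regardless of which variant stores it.
    fn eq(&self, other: &Self) -> bool {
        self.terms() == other.terms()
    }
}
```

With this definition, a commit that only switches the storage format of a resolved tree compares equal to its parent's tree, which is what makes such a commit count as empty.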
Martin von Zweigbergk	7e6930b56f	backend: remove last few instances of `MergedTreeId::as_legacy_tree_id()`	2023-08-30 06:17:21 -07:00
Martin von Zweigbergk	962da1947e	tests: make `dump_tree()` work with merged trees My goal is to minimize impact on tests when we start using the new format.	2023-08-30 06:17:21 -07:00
Waleed Khan	56c61fd047	merge_tools: create builtin diff editor	2023-08-30 05:38:10 -04:00
Martin von Zweigbergk	bd6098e09e	cli: merge trees via `MergedTree` in `jj move`	2023-08-28 15:58:34 -07:00
Martin von Zweigbergk	36674e8f7e	merged_tree: make `id()` return a `MergedTreeId` We will rarely want to use the tree id without knowing whether it can contain `TreeValue::Conflict` values, so let's make the callers check.	2023-08-27 06:49:45 -07:00
Martin von Zweigbergk	fd4146d485	backend: use new enum for `Commit::root_tree` We currently represent the root tree id in a commit by `Merge<TreeId>` plus a boolean `uses_tree_conflict_format`. It's better to use an enum for that. That makes it harder to forget to check which type of tree it is, and it makes it impossible to store a legacy tree with multiple ids (as we could with `uses_tree_conflict_format=false`, `root_tree=Merge::new(...)`). Maybe more importantly, we're also going to want to pass around this information in most places where we currently pass a single `TreeId`, and passing two separate values would be annoying.	2023-08-26 07:02:04 -07:00
Waleed Khan	134d85e635	backend: reduce `BackendError` size somewhat One of the error types that I later created embedded `BackendError`, but `clippy` complained that the size of the type was too large. This helps address that.	2023-08-23 21:11:15 -07:00
Martin von Zweigbergk	ef5f97f8d7	conflicts: move `Merge<T>` to `merge` module The `merge` module now seems like the obvious place for this type.	2023-08-06 22:08:09 +00:00
Martin von Zweigbergk	ecc030848d	conflicts: rename `Conflict<T>` to `Merge<T>` Since `Conflict<T>` can also represent a non-conflict state (a single term), `Merge<T>` seems like a better name. Thanks to @ilyagr for the suggestion in https://github.com/martinvonz/jj/pull/1774#discussion_r1257547709 Sorry about the churn. It would have been better if I had thought of this name before I introduced `Conflict<T>`.	2023-08-06 22:08:09 +00:00
Martin von Zweigbergk	84a60d15bc	op_store: make `ViewId` and `OperationId` implement `ObjectId`	2023-07-26 14:17:21 -07:00
Martin von Zweigbergk	006c764694	backend: learn to store tree-level conflicts Tree-level conflicts (#1624) will be stored as multiple trees associated with a single commit. This patch adds support for that in `backend::Commit` and in the backends. When the Git backend writes a tree conflict, it creates a special root tree for the commit. That tree has only the individual trees from the conflict as subtrees. That way we prevent the trees from getting GC'd. We also write the tree ids to the extra metadata table (i.e. outside of the Git repo) so we don't need to load the tree object to determine if there are conflicts. I also added a new flag to `backend::Commit` indicating whether the commit is a new-style commit (with support for tree-level conflicts). That will help with the migration. We will remove it once we no longer care about old repos. When the flag is set, we know that a commit with a single tree cannot have conflicts. When the flag is not set, it's an old-style commit where we have to walk the whole tree to find conflicts.	2023-07-19 22:04:16 -07:00
Waleed Khan	54dba51a08	docs: warn about missing docs for `jj-lib` crate	2023-07-10 18:28:59 +03:00
Yuya Nishihara	cf8a0466c4	backend: introduce error types specific to init/load phases Errors that may occur while loading a backend would vary per backend, and it's unlikely that these errors could be mapped to BackendError variants other than BackendError::Other. So let's extract Other(_) of that kind as a separate type to clarify there would be no other error variants. Perhaps Backend/Error will be renamed to CommitBackend/Error or CommitStore/Error, whereas I think BackendInit/LoadError can be shared among store factories.	2023-07-06 20:48:46 +09:00
Yuya Nishihara	e1e75daa8e	backend: make BackendError::Other preserve source error object	2023-07-06 20:48:46 +09:00
Martin von Zweigbergk	99226bb96d	tree: simplify diff iterator by leveraging `Tree::value()` This is much simpler and I was slightly surprised that it doesn't have much impact on performance. I tried `jj --ignore-working-copy diff -s --from root --to v5.15` in the Linux kernel repo, and there was perhaps a 1.5% slowdown (508 ms -> 515 ms). In more normal cases (like diffing a single commit against its parent), I couldn't measure any difference at all.	2023-07-06 11:21:21 +02:00
Martin von Zweigbergk	651a3cbe15	rewrite: delete TODOs about labels for each term in a conflict I don't think we'll want to record a label for each term, because such labels would get stale, and it seems hard to make them make sense after transferring a remote to another repo. I think we'll probably want to infer labels on demand instead (#1176).	2023-07-05 16:50:27 +02:00
Martin von Zweigbergk	6bd13382f4	backend: add a function for setting or removing a tree entry	2023-06-30 14:43:58 +02:00
Martin von Zweigbergk	a95188ddbc	backend: take commit to write by value and return new value The internal backend at Google doesn't let you write any value you want in the committer field. The `Store` type still caches the value it attempted to write, which gets a little weird when the written value is not what we tried to write. We should use the value the backend actually wrote. However, we don't know if the backend changed anything without reading the value back, which is often wasteful. This commit changes the API to return the written value. I only changed the signature of `write_commit()` for now. Maybe we should make a similar change to `write_tree()`.	2023-05-12 15:20:44 -07:00
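
The API shape described in a95188ddbc might look something like the sketch below. Everything here is an illustrative assumption: the `Commit` struct is reduced to two fields, ids are plain `u64` values, and `EnforcingBackend` is a hypothetical backend that overwrites the committer. The point is only that `Store` caches what the backend returned, not what the caller passed in.

```rust
use std::collections::HashMap;

/// Simplified commit (assumption: only two fields).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Commit {
    pub description: String,
    pub committer: String,
}

pub trait Backend {
    /// Takes the commit by value and returns its id together with the
    /// commit as it was actually written.
    fn write_commit(&mut self, commit: Commit) -> (u64, Commit);
}

/// Hypothetical backend that does not accept arbitrary committers.
pub struct EnforcingBackend {
    next_id: u64,
    enforced_committer: String,
}

impl EnforcingBackend {
    pub fn new(enforced_committer: &str) -> Self {
        EnforcingBackend { next_id: 0, enforced_committer: enforced_committer.to_string() }
    }
}

impl Backend for EnforcingBackend {
    fn write_commit(&mut self, mut commit: Commit) -> (u64, Commit) {
        // The backend replaces the committer before storing the commit.
        commit.committer = self.enforced_committer.clone();
        let id = self.next_id;
        self.next_id += 1;
        (id, commit)
    }
}

/// Caches commits by id, using the value returned by the backend.
pub struct Store<B: Backend> {
    backend: B,
    cache: HashMap<u64, Commit>,
}

impl<B: Backend> Store<B> {
    pub fn new(backend: B) -> Self {
        Store { backend, cache: HashMap::new() }
    }

    pub fn write_commit(&mut self, commit: Commit) -> u64 {
        let (id, written) = self.backend.write_commit(commit);
        self.cache.insert(id, written);
        id
    }

    pub fn get(&self, id: u64) -> Option<&Commit> {
        self.cache.get(&id)
    }
}
```

Returning the written value avoids a read-back round trip while keeping the cache consistent with the backend.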
Martin von Zweigbergk	e7419e76a1	backend: replace `git_repo()` by `as_any()` This has several advantages: * Makes it possible to downcast to non-Git custom backends (might be useful at Google, but we haven't needed it yet) * Lets us access more specific functionality on the `GitBackend`, making it possible to access the `git2::Repository` without creating a copy of it. * Removes the dependency on Git from the backend	2023-05-12 08:05:09 -07:00
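
The `as_any()` pattern from e7419e76a1 can be illustrated with a small sketch built on `std::any::Any`. The trait below is a reduced stand-in for the backend trait, and `git_dir`, `LocalBackend` as a unit struct, and `git_dir_of()` are hypothetical names chosen for the example.

```rust
use std::any::Any;

/// Reduced stand-in for the backend trait (assumption: two methods only).
pub trait Backend: Any {
    fn name(&self) -> &str;
    /// Exposes the concrete type so callers can downcast.
    fn as_any(&self) -> &dyn Any;
}

pub struct GitBackend {
    /// Hypothetical field standing in for Git-specific state.
    pub git_dir: String,
}

pub struct LocalBackend;

impl Backend for GitBackend {
    fn name(&self) -> &str {
        "git"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Backend for LocalBackend {
    fn name(&self) -> &str {
        "local"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns Git-specific data only when the backend really is a `GitBackend`.
pub fn git_dir_of(backend: &dyn Backend) -> Option<&str> {
    backend
        .as_any()
        .downcast_ref::<GitBackend>()
        .map(|git| git.git_dir.as_str())
}
```

Because the trait itself no longer mentions any Git type, the same mechanism works for downcasting to custom, non-Git backends.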
Martin von Zweigbergk	a87125d08b	backend: rename `ConflictPart` to `ConflictTerm` It took a while before I realized that conflicts could be modeled as simple algebraic expressions with positive and negative terms (they were modeled as recursive 3-way conflicts initially). We've been thinking of them that way for a while now, so let's make the `ConflictPart` name match that model.	2023-02-17 23:28:50 -08:00
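
The "algebraic expression" view mentioned in a87125d08b can be sketched as a list of positive and negative terms, where equal terms of opposite sign cancel. This is an illustration under assumptions, not jj's data model: the field names `removes` and `adds` and the methods `simplify()` and `as_resolved()` are chosen for the example.

```rust
/// A conflict as an expression: the sum of `adds` minus the sum of `removes`.
/// A plain 3-way merge is `ours + theirs - base`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Conflict<T> {
    /// Negative terms.
    pub removes: Vec<T>,
    /// Positive terms.
    pub adds: Vec<T>,
}

impl<T: PartialEq> Conflict<T> {
    /// Cancels each negative term against one equal positive term, if any.
    pub fn simplify(mut self) -> Self {
        let mut i = 0;
        while i < self.removes.len() {
            if let Some(j) = self.adds.iter().position(|add| *add == self.removes[i]) {
                self.adds.remove(j);
                self.removes.remove(i);
            } else {
                i += 1;
            }
        }
        self
    }

    /// Returns the value if the expression is a single positive term.
    pub fn as_resolved(&self) -> Option<&T> {
        if self.removes.is_empty() && self.adds.len() == 1 {
            self.adds.first()
        } else {
            None
        }
    }
}
```

For example, when one side of a 3-way merge is unchanged from the base, the base term cancels against it and the expression resolves to the other side.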
Martin von Zweigbergk	d1dc22d957	backend: let backend decide length of change id As mentioned in the previous commit, our internal backend at Google uses a 32-byte long change id. This commit will make us able to use that.	2023-02-07 22:31:34 -08:00
Martin von Zweigbergk	e6693d0f68	backend: let backend choose root change id Our internal backend at Google uses a 32-byte change id, so I'd like to make the backend able to decide the length. To start with, let's make the backend able to decide what the root change id should be. That's consistent with how we already let the backend decide what the root commit id should be.	2023-02-07 22:31:34 -08:00
Martin von Zweigbergk	98259346df	backend: make `hash_length()` specifically about commit IDs The function is currently only about the length of commit IDs, so let's clarify that. I'm going to add another function for the length of change IDs next. I don't know if we're going to care about lengths of other hashes in the future. We might even be able to remove the current restriction that all commit IDs and all change IDs have the same length.	2023-02-07 22:31:34 -08:00
Yuya Nishihara	1a4b5c5ee6	index: make IdIndex store raw bytes, not hex bytes This helps us to migrate commit_id index to ReadonlyIndex. For large repositories, this also reduces initialization cost, but that's not the main intent of this change. https://github.com/martinvonz/jj/pull/1041#issuecomment-1399225876 common_hex_len() and iter_half_bytes() are added to backend.rs since more call sites will be added to index.rs, and I feel index.rs isn't a good place to host this kind of utility function.	2023-01-22 12:03:08 +09:00
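
The two helpers named in 1a4b5c5ee6 could plausibly be written as below. The function names come from the commit message, but the signatures and bodies are assumptions made for illustration; the idea is to compare ids nibble by nibble so that raw bytes can be stored while prefix lengths are still counted in hex digits.

```rust
/// Iterates over the 4-bit halves (hex digits) of a byte slice, high half first.
/// Sketch only; the signature is an assumption.
pub fn iter_half_bytes(bytes: &[u8]) -> impl Iterator<Item = u8> + '_ {
    (0..bytes.len() * 2).map(move |i| {
        let byte = bytes[i / 2];
        if i % 2 == 0 {
            byte >> 4
        } else {
            byte & 0x0f
        }
    })
}

/// Number of leading hex digits shared by two raw ids.
pub fn common_hex_len(bytes_a: &[u8], bytes_b: &[u8]) -> usize {
    iter_half_bytes(bytes_a)
        .zip(iter_half_bytes(bytes_b))
        .take_while(|(a, b)| a == b)
        .count()
}
```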
Yuya Nishihara	8c0f7d7707	backend: define root change id statically I made it a free function. Alternatively, the root id could be instantiated by and obtained through backend, but I don't think we'll need such level of abstraction. I'm going to add a workaround for shortest prefix calculation of the root ids, where this function will be used.	2023-01-22 12:03:08 +09:00
Yuya Nishihara	ef33bd76df	backend: declare CHANGE_ID_HASH_LENGTH as constant	2023-01-22 12:03:08 +09:00
Martin von Zweigbergk	8a1b21ff73	backend: implement equality for commits and trees It can be useful in tests to be able to compare two commits or trees. Most other structs already implement equality.	2023-01-20 23:26:20 -08:00
Samuel Tardieu	bdaebf33c4	style: do not dereference self to perform pattern-matching Dereferencing `self` as `*self` in order to perform pattern-matching using `ref` is unnecessary and will be done automatically by the compiler (match ergonomics, introduced in Rust 1.26).	2023-01-14 19:28:24 +01:00
Waleed Khan	af55d17a25	git_backend: propagate various errors I needed this in the course of debugging an error. Before this commit, the error looked like this: ``` Error: Unexpected error from backend: Object not found ``` After this commit, it looks like this: ``` Error: Unexpected error from backend: Object with CommitId 8f59646bc9bb6bb44b5624f1248f4a708f37003c not found: object not found - no match for id (8f59646bc9bb6bb44b5624f1248f4a708f37003c); class=Odb (9); code=NotFound (-3) ```	2023-01-02 12:28:51 -06:00
Waleed Khan	e299963fae	backend: remove `PartialEq`/`Eq` implementations As soon as we start tracking the `#[source]` for error variants, we won't be able to rely on the presence of `Eq` implementations.	2023-01-02 12:28:51 -06:00
Waleed Khan	456be4cc73	backend: create `BackendError::InvalidHashLength` Strictly speaking, we could rely on e.g. `git2::Oid::from_str` to produce an error, but I figure that having an explicit error for a mismatching hash length might demystify some error condition in the future, since commit IDs and change IDs and potentially other backends' IDs may have different lengths, so this could flag a mismatch earlier/more obviously.	2023-01-02 12:28:51 -06:00
Waleed Khan	7f8a196ab2	backend: create `ObjectId` trait This lets us operate over various kinds of objects polymorphically (e.g. call `.hex()` on any kind of object hash).	2023-01-02 12:28:51 -06:00
Yuya Nishihara	587e42d65d	backend: deduplicate id type declarations by using declarative macro	2022-12-23 23:52:03 +09:00
Yuya Nishihara	b07c0db56b	backend: deduplicate id type impls by using declarative macro It's unlikely we'll need to customize these impls per type, so let's ensure that these newtypes have identical implementations. This commit also adds from_hex() to FileId, SymlinkId, and ConflictId.	2022-12-23 23:52:03 +09:00
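
A declarative macro that stamps out identical id newtypes, as described in 587e42d65d and b07c0db56b, together with a shared trait in the spirit of 7f8a196ab2, might look like this sketch. The macro name `id_type!`, the helper `decode_hex()`, and the choice to return `Option` from `from_hex()` are assumptions for the example, not the crate's actual API.

```rust
/// Shared behaviour for all id types (sketch of an `ObjectId`-style trait).
pub trait ObjectId {
    fn as_bytes(&self) -> &[u8];

    /// Lower-case hex encoding, available on every id type.
    fn hex(&self) -> String {
        self.as_bytes().iter().map(|b| format!("{:02x}", b)).collect()
    }
}

/// Decodes a hex string; returns `None` on odd length or non-hex characters.
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Declares a byte-vector newtype with the same impls for every id kind.
macro_rules! id_type {
    ($name:ident) => {
        #[derive(Clone, Debug, PartialEq, Eq, Hash)]
        pub struct $name(Vec<u8>);

        impl $name {
            pub fn new(bytes: Vec<u8>) -> Self {
                Self(bytes)
            }

            pub fn from_hex(hex: &str) -> Option<Self> {
                decode_hex(hex).map(Self)
            }
        }

        impl ObjectId for $name {
            fn as_bytes(&self) -> &[u8] {
                &self.0
            }
        }
    };
}

id_type!(CommitId);
id_type!(FileId);
```

Generating the impls from one macro keeps the newtypes from drifting apart, while the distinct types still prevent passing a `FileId` where a `CommitId` is expected.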
Martin von Zweigbergk	d8feed9be4	copyright: change from "Google LLC" to "The Jujutsu Authors" Let's acknowledge everyone's contributions by replacing "Google LLC" in the copyright header by "The Jujutsu Authors". If I understand correctly, it won't have any legal effect, but maybe it still helps reduce concerns from contributors (though I haven't heard any concerns). Google employees can read about Google's policy at go/releasing/contributions#copyright.	2022-11-28 06:05:45 -10:00
Martin von Zweigbergk	780d7fb59c	backend: rename `NormalFile` to just `File` There are no "non-normal" files, so "normal" is not needed. We have symlinks and conflicts, but they are not files, so I think just "file" is unambiguous. I left `testutils::write_normal_file()` because there it's used to mean "not executable file" (there's also a `write_executable_file()`). I left `working_copy::FileType::Normal` since renaming `Normal` there to `File` would also suggest we should rename `FileType`, and I don't know what would be a better name for that type.	2022-11-14 23:36:43 -08:00
Benjamin Saunders	c3bfe72754	local_backend: use ContentHash rather than hashing protos Insulates identifiers from the unstable serialized form.	2022-11-12 21:40:36 -08:00
Benjamin Saunders	2447dfeed8	simple_op_store: hash view/operation data directly Decouples view/operation IDs from serialized forms, which are not necessarily stable. Not breaking as these IDs are persistent, never recomputed or used for integrity checking.	2022-11-12 21:40:36 -08:00
Martin von Zweigbergk	6703810c6e	backend: remove `Commit::is_open` field from data model	2022-11-05 06:14:37 -07:00
Martin von Zweigbergk	3b3f6129e6	backend: allow negative timestamps in commits and operations I was reading a draft of "Git Rev News: Edition 91" [1] where Peff mentions some unfinished patches to allow negative timestamps in Git. So I figured I should add support for that before I forget. I haven't checked if libgit2 supports it, so it might be that our Git backend still doesn't support it after this patch. [1] https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-91.md	2022-09-30 00:50:17 -07:00
Martin von Zweigbergk	de7b5cf8b0	repo: write format ("git" or "local") to disk on init We currently determine if the repo uses the Git backend or the local backend by checking for presence of a `.jj/repo/store/git_target` file. To make it easier to add out-of-tree backends, let's instead add a file that indicates which backend to use.	2022-09-25 09:40:42 -07:00
Martin von Zweigbergk	fb8d087882	backend: make backend aware of root commit I had made the backends unaware of the virtual root commit because they don't need to know about it, and we could avoid some duplicated code by putting that in `Store` instead. However, as we saw in b21a123bc894, the root commit being virtual has some user-visible effects (they can't create a merge with the root and some other commit). So I'm thinking that we may want to make the root commit an actual commit, depending on which backend is used. Specifically, when using the Git backend, we cannot record the root commit as an actual parent since Git would fail when trying to look it up. Backends that don't need compatibility can make the root commit an actual commit, however. This commit therefore makes the backends aware of the root commit. It makes it remain a virtual commit in the Git backend, and makes it an actual commit in the `LocalBackend`. This commit breaks any existing repos using the `LocalBackend`, but there shouldn't be any such repos other than for testing.	2022-09-20 21:20:57 -07:00
Martin von Zweigbergk	1d9f1720c5	backend: add a `Tree::from_hex()` helper	2022-09-20 21:20:57 -07:00

59 Commits