We currently remove the file state for deleted files after walking the
working copy and noticing that the file is not there. However, in the
case of files that have been replaced by special files like Unix
sockets, we delete the file state inside the loop. Let's simplify a
tiny bit by not doing that.
If we don't have a recorded state for a file, we assume that it's new,
so we add it to the tree as the type it appears on disk. That means we
won't check if it exists as a conflict in the current tree. As another
step towards making the file state just a cache, let's instead treat
this case as a dirty file, so we look up the current value from the
tree. That means that adding files will be a tiny bit slower, but I
doubt it will be noticeable (we need to read the file from disk and
write it to the backend anyway).
I want to replace the `DirEntry` argument to
`get_updated_file_state()` by a `PathBuf` and a `Metadata`. To avoid
always reading the metadata, we need to check for ignored files
outside of `get_updated_file_state()`. I also think that gives the
call site a nice symmetry in how we use the `git_ignore` for
directories (`.matches_all_files_in()`)) and files
(`.matches_file()`).
This removes another little bit (literally) of dependency on the
cached file state by reading the old executable bit from the current
tree instead. That helps make it possible to discard the file states
without affecting the resulting snapshot, as we may want to do with
Watchman.
With this change, `write_path_to_store()` contains all the logic for
reading a file from disk and writing it to a `TreeBuilder`, making the
code for added and modified files more similar.
It's faster to add only files matched by the Watchman matcher to the
set of deleted files than to add all files and then removed files not
matched. This speeds up `jj diff` with Watchman in the Linux repo from
~530 ms to ~460 ms.
The intention in this and some future commits is to have `update_file_state` accept `&self` instead of `&mut self` to make clear what data is updated by `update_file_state` and to ensure transactional safety of the `TreeState` contents.
There's a comment saying that mutating the file's current state
simplifies later logic, but I don't think that's true. It might have
been true in the past, when we had `FileType::Conflict`.
When snapshotting a conflict, we read the contents and parse conflict
markers. If we determine that it's no longer a conflict, then we write
a normal file entry to the tree instead. When we do that, we re-read
the file from the working copy. Let's instead write the file contents
we already read.
When a file's mtime has changed on disk, we update our record of that
mtime, but we did so in a separate place for conflicts compared to
non-conflicts. Let's reuse it.
The working copy's current tree tracks whether a file is a
conflict. We also track that in the `TreeState` object. That allows us
to not read the trees object to decide if we should try to parse a
file as a conflict.
One disadvantage is that it's redundant information that needs to be
kept in sync with the tree object. Also, for Watchman, we would like
to completely ignore the persisted `FileState`.
This commit removes the `FileType::Conflict` variant and instead
checks in the tree object whether a given path was a conflict. This is
the change I mentioned in dc8a20773760. We still skip the check
completely if the file's mtime etc. is unchanged, so it shouldn't have
much effect in the common case of a mostly unchanged working copy. I
measured a slowdown on `jj diff` by ~3% in the Linux repo with a clean
working copy with all mtimes bumped. I think the simpler code and
reduced risk of subtle bugs is worth the performance hit.
Before we had `conflicts::Conflict`, most of these functions took a
`backend::Conflict`. I think I didn't want to pollute the `backend`
module with this kind of logic, trying to keep it focused on
storage. Now that we have the type in `conflicts`, however, I think it
makes sense to move these functions onto it.
For tree-level conflicts (#1624), I plan to remove `ConflictId`
completely. This commit removes `ConflictId` from
`update_conflict_from_content()` by instead making it take a
`Conflict<Option<TreeValue>>` and return a possibly different such
value.
I made the call site in `working_copy` avoid writing the conflict to
the store if it's unchanged, but I didn't make the same optimization
in `merge_tools` becuase it's much more likely to have changed there.
This prepares for allowing the base tree to be a conflict at the
root-tree level (#1624).
We could remove the `Conflict` variant completely. I tried doing that
and it slowed down `jj diff` by ~3% in the Linux repo with a clean
working copy with only mtime bumped on all files.
I don't think there's a good reason not to write the
`.jj/working_copy/tree_state` file on init. Being able to assume that
the file exists means that we won't need the store object to to lazily
load the `TreeState` object. Well, except that `TreeState` keeps an
`Arc<Store>`, but I'm trying to change that.
I've preferred "working-copy commit" over "checkout" for a while
because I think it's clearer, but there were lots of places still
using "checkout". I've left "checkout" in places where it refers to
the action of updating the working copy or the working-copy commit.
I needed this in the course of debugging an error. Before this commit, the error looked like this:
```
Error: Unexpected error from backend: Object not found
```
After this commit, it looks like this:
```
Error: Unexpected error from backend: Object with CommitId 8f59646bc9bb6bb44b5624f1248f4a708f37003c not found: object not found - no match for id (8f59646bc9bb6bb44b5624f1248f4a708f37003c); class=Odb (9); code=NotFound (-3)
```
I ran an upgraded Clippy on the codebase. All the changes seem to be
about using variables directly in format strings instead of passing
them as separate arguments.
A new FileType, GitSubmodule is added which is ignored. Files or
directories having this type are not added to the work queue and
are ignored in snapshot. Submodules are not created by jujutsu
when resetting or checking out a tree, they should be currently
managed using git.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.