This doesn't work yet since write_file() overwrites the existing file, which
will be fixed by the next patch.
I've added a callback parameter to update() just because that's the easiest
option. If we want to report the number of the conflicting files (through
CheckoutStats), the callback interface wouldn't work nicely and the error
handling would have to be moved to the update() body. If we want to make
both check_out() and set_sparse_patterns() ignore EEXIST error, we can
eliminate the calback parameter at all.
If a file gets replaced by a directory right after list files in a
directory but before we stat the file, we currently crash. Let's
instead treat it as a missing file, using the mechanism introduced for
#258.
This patch makes us treat special files (e.g. Unix sockets) as absent
when snapshotting the working copy. We can consider later reporting
such files back to the caller (possibly via callback) so it can inform
the user about them.
Closes#258
This patch is essentially f6a516ff6d78 taken further, to also apply to
when we write a symlink or a conflict. As with regular files, these
races seem very unlikely to happen, but I found these cases while
working on #258, so let's fix. Fixing it also means that we don't need
to handle these transition cases in the next patch (when
`file_states()` can indicate that the file is e.g. a socket).
The regular `Display` format is (not surprisingly) more user-friendly,
as pointed out by @yuja.
I also switched to using format strings for these cases, and some
nearby strings for consistency.
We can easily make the `DirEntry` available here, so we can call
`.metadata()` on that instead of on the `Path`. I think that avoids
walking the path. I'm sure this has no significant impact on
performance, but it's also almost as readable.
When we have just written a file to disk on checkout, let's record the
size we expected instead of what we got from `fstat()`. This should
address a race where the file was modified between the time we wrote
it and the time we requested its stat. I just happened to notice this
while going through the code; it seems very unlikely to be noticed in
practice.
When we have just written a file or conflict, we can get metadata for
it via the open file descriptor instead of using the path. That
removes the risk of a race where the file got removed or replaced by
another file type (at least on Unix).
These assertions were there to catch bugs, but when the bugs happen,
the assertions can obsure the underlying error (as @tp-woven found out
on #258). Let's just print errors instead.
The biggest difference in the API is that fields are now public. The
exception from that is `oneof` fields, which still require setters and
getters.
I couldn't measure any difference in performance. I didn't expect any
difference either, but it's good that it didn't seem to regress. I
timed `jj debug operation <some hash prefix>`, which will read the
whole operation log (to check that the prefix is unambiguous).
I think I copied the name `write_tree()` from Git, but I find it quite
confusing, since it's not clear if it write a tree to the working copy
or reads the working copy and writes a tree to the store (it's the
former).
When committing the working copy, we try to not visit ignored
directories (as e.g. `target/` often is), but we need to visit it if
there are already tracked files in it. I initially missed that in
c1060610bda2 and then fixed it in a028f33e3b21. The fix works by
checking if the next path after the ignored path is inside the ignore
path (viewed as a directory). However, I forgot to handle the case
where there are no paths at all after the ignored path. So, for
example, if the `target/` directory should be ignored and it there
were no tracked paths after `target/` in alphabetical order, we would
still visit the directory. That's why the bug reproduced in the
`git-branchless` repo but not in the `jj` repo (because there are
files under `testing/` and `tests/` here).
Closes#247.
This patch makes room for sparse patterns in the `TreeState` proto
message. We also start setting that value to a list of just the
pattern `.` when we create new working copies. Old working copies
without the sparse patterns are also interpreted as having that single
pattern. Note that this absence of sparse patterns is different from a
present list of no patterns. The latter is a valid state and means
that no paths are included in the sparse checkout.
We do it for all the other kinds of objects already. It's useful to
have the path for backends that store objects by path (we don't have
any such backends yet). I think the reason I didn't do it from the
beginning was because we had separate `RepoPath` types for files and
directories back then.
The library crate shouldn't look up the user's `$HOME` directory
(maybe the library is used by a server process), so let's have the
caller pass it into the library crate instead.
We no longer need the commit ID, so we shouldn't make the callers pass
it. This lets us simplify several tests, because they no longer to
create commits just to check out a tree in the working copy.
We used to use the value to detect races, but we use the tree ID and
the operation ID these days, so we don't need the commit ID.
By changing this, we can avoid creating some commit IDs in tests,
which is why I tackled this issue now.
There are only two callers of `LockedWorkingCopy::check_out()`. One is
in `commands.rs`. That caller already checks after taking the lock
that the old commit ID is as expected. The other caller is
`WorkingCopy::check_out()`. We can simply move the check to that level
since it's the only caller that cares now.
`LockedWorkingCopy::discard()` shouldn't result in changes to the
on-disk state, but `LockedWorkingCopy::check_out()` may have already
written a state file, which is surprising. The changes also remain in
memory, which is also surprising. Let's fix both of those issues.
When there are concurrent operations that want to update the working
copy, it's useful to know which operation was the last to successfully
update the working copy. That can help use decide how to resolve a
mismatch between the repo view's record and the working copy's
record. If we detect such a difference, we can look at the working
copy's operation ID to see if it was updated by an operation before or
after we loaded the repo.
If the working copy's record says that it was updated at operation A
and we have loaded the repo at operation B (after A), we know that the
working copy is stale, so we can automatically update it (or tell the
user to run some command to update it if we think that's more
user-friendly).
Conversely, if we have loaded the repo at operation A and the working
copy's record says that it was updated at operation B, we know that
there was some concurrent operation that updated it. We can then
decide to print a warning telling the user that we skipped updating
because of the conflict. We already have logic for not updating the
working copy if the repo is loaded at an earlier operation, but maybe
we can drop that if we record the operation in the working copy (as
this patch does).
Having the checkout functionality in `LockedWorkingCopy` makes it a
little more flexible (one could imagine using it for udating working
copy files and then discarding the state changes, for example). It
also lets us reuse a few lines of code for locking. I left
`WorkingCopy::check_out()` for convenience because that's what all
current users want.
`WorkingCopy::check_out()` currently fails if the commit recorded on
disk has changed since it was last read. It fails with a "concurrent
checkout" error. That usually works well in practice, but one can
imagine cases where it's not correct. For an example where the current
behavior is wrong, consider this sequence of events:
1. Process A loads the repo and working copy.
2. Process B loads the repo at operation A. It has not loaded the
working copy yet.
3. Process A writes an operation and updates the working copy.
4. Process B loads the working copy and sees that it is checked out
to the commit process B set it to. We don't currently have any
checks that the working copy commit matches the view's checkout
(though I plan to add that).
5. Process B finishes its operation (which is now divergent with the
operation written by process A). It updates the working copy to
the checkout set in the repo view by process B. There's no data
loss here, but the behavior is surprising because we would usually
tell the user that we detected a concurrent update to the working
copy.
We should instead check that the working copy's commit on disk matches
what the previous repo view said, i.e. the view at the start of the
operation we just committed. This patch does that by having the caller
pass in the expected old commit ID.