252 Commits

Author SHA1 Message Date
Martin von Zweigbergk
23f54b8151 working_copy: propagate errors when reading conflicted file 2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
33a93b6d2d working_copy: reduce scope of a content variable
This also avoids reading non-file conflict from disk.
2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
585c212617 working_copy: reduce scope of an executable variable 2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
2102de94b0 working_copy: inline write_conflict_to_store()
For tree-level conflicts, we're eventually not going to have
`ConflictId`. We'd want to make `write_conflict_to_store()` take a
`Merge<Option<TreeValue>>` and return an updated such value. That
would leave very little logic in the function, so let's just inline it
instead.
2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
4c46398b1c conflicts: make update_from_content() write resolved content to store
`update_from_content()` already writes file content for each term of
an unresolved merge, so it seems consistent for it to also write the
file content for resolved merges. I think this should simplify further
refactoring for tree-level conflicts and for preserving the executable
bit.
2023-08-11 23:59:44 +00:00
Martin von Zweigbergk
0b85f06e3d conflicts: make update_from_content() work with only FileIds
Since `update_from_contents()` only works with file contents and not
the executable or other kinds of paths, I think it makes more sense
for it to deal with `FileId`s instead of `TreeValue`s.
2023-08-11 23:59:44 +00:00
Martin von Zweigbergk
a995c66635 merge: move some methods back to conflicts as free functions
I think I moved way too many functions onto `Merge<Option<TreeValue>>`
in 82883e648da4. This effectively reverts almost all of that
commit. The `Merge<T>` type is simple container and it seems like it
should be at fairly low level in the dependency graph. By moving
functions off of it, we can get rid of the back-depdencies from the
`merge` module to the `conflict` module that I introduced when I moved
`Merge` to the `merge` module. I'm thinking the `conflict` module can
focus on materialized conflicts.
2023-08-11 21:11:25 +00:00
Martin von Zweigbergk
abc7312dbc working_copy: avoid an unused variable on Windows 2023-08-11 01:14:52 +00:00
Martin von Zweigbergk
14ddd17673 working_copy: add debug assertion that tree and file states match
Perhaps the most important invariant in `.jj/working_copy/tree_state`
is that its set of files in it matches the files in its tree. In
particular, if a file that exists in the tree doesn't exist in the
file state and doesn't exist on disk either, we won't notice that it's
gone, and we will therefore not delete it from the tree on future
rounds of snapshotting either.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
6cce5e758b working_copy: reduce scope of some variables
With the recent refactorings, we don't need the `tree_builder` and
`deleted_files` until a bit later.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
16d00581f6 working_copy: add trace scope to tree-writing call
Writing the tree can probably take a bit of time when the working copy
has changed.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
d06f51a88c working_copy: split up tracing scope a bit
Now that we process the outputs from the file system traversal by
reading from channels, we can separate the processing from the file
system traversal. When the working copy is unchanged, processing tree
entries and deleted files takes practically no time, but processing
file states and present files takes significant time.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
b27b686b4e working_copy: rename deleted_files_tx to present_files_tx
We use the chanell to report the files that exist, so
`deleted_files_tx` seems confusing.
2023-08-06 22:17:18 +00:00
Waleed Khan
e1c194ce67 working_copy: rename WorkItem -> DirectoryToVisit 2023-08-03 19:09:59 +00:00
Waleed Khan
84f807d222 working_copy: traverse filesystem in parallel
This improves `jj status` time by a factor of ~2x on my machine (M1 Macbook Pro 2021 16-inch, uses an SSD):

```sh
$ hyperfine --parameter-list hash before,after --parameter-list repo nixpkgs,gecko-dev --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../{repo} st'
Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before)
  Time (mean ± σ):      1.640 s ±  0.019 s    [User: 0.580 s, System: 1.044 s]
  Range (min … max):    1.621 s …  1.673 s    10 runs

Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
  Time (mean ± σ):     760.0 ms ±   5.4 ms    [User: 812.9 ms, System: 2214.6 ms]
  Range (min … max):   751.4 ms … 768.7 ms    10 runs

Benchmark 3: ./target/release-with-debug/jj -R ../gecko-dev st (hash = before)
  Time (mean ± σ):     11.403 s ±  0.648 s    [User: 4.546 s, System: 5.932 s]
  Range (min … max):   10.553 s … 12.718 s    10 runs

Benchmark 4: ./target/release-with-debug/jj -R ../gecko-dev st (hash = after)
  Time (mean ± σ):      5.974 s ±  0.028 s    [User: 5.387 s, System: 11.959 s]
  Range (min … max):    5.937 s …  6.024 s    10 runs

$ hyperfine --parameter-list repo nixpkgs,gecko-dev --warmup 3 'git -C ../{repo} status'
Benchmark 1: git -C ../nixpkgs status
  Time (mean ± σ):     865.4 ms ±   8.4 ms    [User: 119.4 ms, System: 1401.2 ms]
  Range (min … max):   852.8 ms … 879.1 ms    10 runs

Benchmark 2: git -C ../gecko-dev status
  Time (mean ± σ):      2.892 s ±  0.029 s    [User: 0.458 s, System: 14.244 s]
  Range (min … max):    2.837 s …  2.934 s    10 runs
```

Conclusions:

- ~2x improvement from previous `jj status` time.
- Slightly faster than Git on nixpkgs.
- Still 2x slower than Git on gecko-dev, not sure why.

For reference, Git's default number of threads is defined in the `online_cpus` function: ee48e70a82/thread-utils.c (L21-L66). We are using whatever the Rayon default is.
2023-08-03 18:20:49 +00:00
Waleed Khan
326be7c91e working_copy: send updates via channel
In preparation of traversing the filesystem in parallel, send updates via `channel`.

An alternative is to modify shared mutable state, e.g. put `self.file_states` behind a mutex or use a concurrent hash-map. This risks leaving the `TreeState` in an invalid state if an error occurs, and makes invariants harder to reason about.

Using a channel introduces a small performance regression. (I didn't try out the concurrent hash-map approach.)

```sh
$ hyperfine --parameter-list hash before,after --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../nixpkgs st'
Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before)
  Time (mean ± σ):      1.533 s ±  0.013 s    [User: 0.587 s, System: 0.926 s]
  Range (min … max):    1.510 s …  1.559 s    10 runs

Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
  Time (mean ± σ):      1.563 s ±  0.021 s    [User: 0.607 s, System: 0.936 s]
  Range (min … max):    1.518 s …  1.595 s    10 runs

Summary
  ./target/release-with-debug/jj -R ../nixpkgs st (hash = before) ran
    1.02 ± 0.02 times faster than ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
```
2023-08-03 17:56:05 +00:00
Waleed Khan
174704d752 working_copy: extract visit_directory function for snapshotting 2023-08-03 17:40:18 +00:00
Waleed Khan
515fb02049 working_copy: extract WorkItem to top-level struct 2023-08-03 09:49:22 -07:00
Yuya Nishihara
d17ef14956 merge_tools: extract 2-way diff checkout helper
The directory prefix is renamed to "jj-diff-" as I'm going to use it for
"jj diff --tool <external-diff-generator>".
2023-08-03 13:53:37 +09:00
Martin von Zweigbergk
48b1a1c533 working_copy: in ignored directories, visit only already tracked paths
`.gitignores` in ignored directories should be ignored. Before this
commit, we would visit ignored directories like any others if there
were any ignored paths in them.

I've done a lot of preparation for this commit, but There's still a
bit of duplication between the new code and the existing code. I don't
mind improving it if anyone has suggestions. Otherwise I might end up
doing that when I get back to working on snapshotting tree-level
conflicts soon.

This fixes #1785.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
bcba1c6682 working_copy: rename sub_path to path
The `sub_path` is created by joining `dir` to a basename. I think
calling it just `path` is clear, especially since its the main path
involved in each iteration of the loop.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
0dc5d967ae working_copy: move a duplicate statement out of match block 2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
b48b3780c8 working_copy: replace FileStateUpdate by Option
The `FileStateUpdate` enum now looks very similar to `Option`, so
let's just use that. I also renamed `get_updated_file_state()` to
`get_updated_tree_value()` since it returns a `TreeValue`.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
035d4bbbae working_copy: remove file state for deleted files in only one place
We currently remove the file state for deleted files after walking the
working copy and noticing that the file is not there. However, in the
case of files that have been replaced by special files like Unix
sockets, we delete the file state inside the loop. Let's simplify a
tiny bit by not doing that.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
4fa2a27f38 working_copy: treat a missing file state as dirty
If we don't have a recorded state for a file, we assume that it's new,
so we add it to the tree as the type it appears on disk. That means we
won't check if it exists as a conflict in the current tree. As another
step towards making the file state just a cache, let's instead treat
this case as a dirty file, so we look up the current value from the
tree. That means that adding files will be a tiny bit slower, but I
doubt it will be noticeable (we need to read the file from disk and
write it to the backend anyway).
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
cb8ff84cc8 working_copy: don't pass FileState through get_updated_file_state()
Since the caller now has the `FileState`, there's no need to pass it
in by value only to get it back in the return value.
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
01feb40fbb working_copy: handle deleted files outside get_updated_file_state()
This is simpler, and it will enable further simplfications.
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
5cc2c91453 working_copy: pass in PathBuf and Metadata to get_updated_file_state()
This will let us call the function even if we don't have a `DirEntry`.
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
37d9aae894 working_copy: handle ignored files outside of get_updated_file_state()
I want to replace the `DirEntry` argument to
`get_updated_file_state()` by a `PathBuf` and a `Metadata`. To avoid
always reading the metadata, we need to check for ignored files
outside of `get_updated_file_state()`. I also think that gives the
call site a nice symmetry in how we use the `git_ignore` for
directories (`.matches_all_files_in()`)) and files
(`.matches_file()`).
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
be8d471e76 working_copy: preserve executable-ness from tree on Windows
This removes another little bit (literally) of dependency on the
cached file state by reading the old executable bit from the current
tree instead. That helps make it possible to discard the file states
without affecting the resulting snapshot, as we may want to do with
Watchman.
2023-07-31 05:48:32 +00:00
Martin von Zweigbergk
37a770e8b4 working_copy: make write_conflict_to_store() also handle conflicts
With this change, `write_path_to_store()` contains all the logic for
reading a file from disk and writing it to a `TreeBuilder`, making the
code for added and modified files more similar.
2023-07-31 05:48:32 +00:00
Martin von Zweigbergk
beb997e85a watchman: don't even add non-watchman files to set of deleted files
It's faster to add only files matched by the Watchman matcher to the
set of deleted files than to add all files and then removed files not
matched. This speeds up `jj diff` with Watchman in the Linux repo from
~530 ms to ~460 ms.
2023-07-28 12:12:09 -07:00
Waleed Khan
4b635e9713 working_copy: use tree rather than file states to detect if directory is tracked
This moves us closer towards treating the `file_states` map as purely a cache rather than authoritative state.
2023-07-28 10:15:33 -07:00
Waleed Khan
9d8702b537 refactor(working_copy): return new file state from update_file_state 2023-07-28 09:52:37 -07:00
Waleed Khan
c409f4ed53 refactor(working_copy): hoist current and new file states to top of update_file_state
Reduces some indentation and simplifies some control flow.
2023-07-28 09:52:37 -07:00
Waleed Khan
3264135489 refactor(working_copy): return updated new tree value from update_file_state
The intention in this and some future commits is to have `update_file_state` accept `&self` instead of `&mut self` to make clear what data is updated by `update_file_state` and to ensure transactional safety of the `TreeState` contents.
2023-07-28 09:52:37 -07:00
Waleed Khan
1e28b312c6 perf: instrument some steps in snapshot 2023-07-28 09:28:01 -07:00
Waleed Khan
018bb88ec6 perf: add several #[instrument]s 2023-07-28 09:28:01 -07:00
Waleed Khan
7875656354 perf: add #[instrument] to all cmd_* functions 2023-07-28 09:28:01 -07:00
Martin von Zweigbergk
7cc916e4f1 working_copy: in mtime race case, don't mutate current state
There's a comment saying that mutating the file's current state
simplifies later logic, but I don't think that's true. It might have
been true in the past, when we had `FileType::Conflict`.
2023-07-24 16:41:44 -07:00
Martin von Zweigbergk
861788ae09 working_copy: propagate errors from failing to read conflict file 2023-07-24 15:14:42 -07:00
Martin von Zweigbergk
99fec7ed06 working_copy: don't re-read resolved conflict from disk on snapshot
When snapshotting a conflict, we read the contents and parse conflict
markers. If we determine that it's no longer a conflict, then we write
a normal file entry to the tree instead. When we do that, we re-read
the file from the working copy. Let's instead write the file contents
we already read.
2023-07-24 15:14:42 -07:00
Martin von Zweigbergk
72d389cefb working_copy: extract a function for writing a conflict 2023-07-24 15:14:42 -07:00
Martin von Zweigbergk
c7e736f73c working_copy: update file state for conflict and non-conflict the same
When a file's mtime has changed on disk, we update our record of that
mtime, but we did so in a separate place for conflicts compared to
non-conflicts. Let's reuse it.
2023-07-24 15:14:42 -07:00
Martin von Zweigbergk
10a2a15993 working_copy: don't track conflict-ness in state file, use tree object
The working copy's current tree tracks whether a file is a
conflict. We also track that in the `TreeState` object. That allows us
to not read the trees object to decide if we should try to parse a
file as a conflict.

One disadvantage is that it's redundant information that needs to be
kept in sync with the tree object. Also, for Watchman, we would like
to completely ignore the persisted `FileState`.

This commit removes the `FileType::Conflict` variant and instead
checks in the tree object whether a given path was a conflict. This is
the change I mentioned in dc8a20773760. We still skip the check
completely if the file's mtime etc. is unchanged, so it shouldn't have
much effect in the common case of a mostly unchanged working copy. I
measured a slowdown on `jj diff` by ~3% in the Linux repo with a clean
working copy with all mtimes bumped. I think the simpler code and
reduced risk of subtle bugs is worth the performance hit.
2023-07-24 15:02:33 -07:00
Yuya Nishihara
465b8d9cf0 working_copy: remove unused SnapshotError::FileOpenError variant
Perhaps, this would have been generalized as IoError.
2023-07-23 14:58:17 +09:00
Martin von Zweigbergk
066f31a15d working_copy: remove an unneccessary Box 2023-07-21 12:16:02 -07:00
Waleed Khan
6d7998f8c5 working_copy: return Result from WorkingCopy::tree_state/WorkingCopy::tree_state_mut 2023-07-14 13:45:40 -07:00
Waleed Khan
60f1d7e307 working_copy: create and propagate TreeStateError 2023-07-14 13:03:57 -07:00
Waleed Khan
54dba51a08 docs: warn about missing docs for jj-lib crate 2023-07-10 18:28:59 +03:00