70 Commits

Author SHA1 Message Date
Austin Seipp
4b45dde8c6 clippy: disable bogus lints for nightly clippy
The nightly compiler has several clippy fix-its that, if applied, break the
build. There are various bugs about this, but there isn't enough space in the
margins to detail it all.

Just ignore these on a per-function basis; about 70% of them are just multiple
instances happening inside a single function.

This makes `cargo clippy --workspace --all-targets` run clean, even with the
nightly compiler.

Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ic26a025d3c62b12fbf096171308b56e38f7d1bb9
2024-04-05 11:39:29 -05:00
Martin von Zweigbergk
c55e08023e workspace: don't lose sparsed-away paths when recovering workspace
When an operation is missing and we recover the workspace, we create a
new working-copy commit on top of the desired working-copy commit (per
the available head operation). We then reset the working copy to an
empty tree because it shouldn't really matter much which commit we
reset to. However, when the workspace is sparse, it does matter, as
the test case from the previous patch shows. This patch fixes it by
replacing the `reset_to_empty()` method by a new `recover(&Commit)`,
which effectively resets to the empty tree and then resets to the
commit. That way, any subsequent snapshotting will result keep the
paths from that tree for paths outside the sparse patterns.
2024-03-16 07:30:36 -07:00
Yuya Nishihara
a224d0f172 repo_path: show more detailed error if filesystem path failed to parse
This should address both use cases:
 1. If from_relative_path() is directly called, the error says ".." shouldn't
    be included in the (normalized) relative path.
 2. If parse_fs_path() is used, the error message contains paths relative to
    cwd. #3216
2024-03-09 11:01:43 +09:00
Thomas Castiglione
d661f59f9d working_copy: implement symlinks on windows with a helper function
enables symlink tests on windows, ignoring failures due to disabled developer mode,
and updates windows.md
2024-03-05 15:16:38 +08:00
Austin Seipp
6c31bab0d3 fsmonitor: allow core.fsmonitor = "none" to disable
When doing things like testing snapshot performance differences,
this allows you to turn off the monitor, no matter what the enabled
user or repository configuration has, e.g.

    jj st --config-toml='core.fsmonitor="none"'

Signed-off-by: Austin Seipp <aseipp@pobox.com>
2024-02-20 20:19:47 -06:00
Daehyeok Mun
a9f489ccdf Switch to ignore crate for gitignore handling.
Co-authored-by: Waleed Khan <me@waleedkhan.name>
2024-02-20 09:12:46 -08:00
Martin von Zweigbergk
6c1aeff7a9 working copy: materialize symlinks on Windows as regular files
I was a bit surprised to learn (or be reminded?) that checking out
symlinks on Windows leads to a panic. This patch fixes the crash by
materializing symlinks from the repo as regular files. It also updates
the snapshotting code so we preserve the symlink-ness of a path. The
user can update the symlink in the repo by updating the regular file
in the working copy. This seems to match Git's behavior on Windows
when symlinks are disabled.
2024-02-09 09:20:24 -08:00
Martin von Zweigbergk
5a898b16a8 working_copy: handle symlink outside write_path_to_store()
The `write_path_to_store()` has almost no overlapping code between the
handling of symlinks and regular files, which suggests that we should
move out the handling of symlinks to the caller (there's only one).
2024-02-09 09:20:24 -08:00
Jonathan Tan
33f3a420a1 workspace: recover from missing operation
If the operation corresponding to a workspace is missing for some reason
(the specific situation in the test in this commit is that an operation
was abandoned and garbage-collected from another workspace), currently,
jj fails with a 255 error code. Teach jj a way to recover from this
situation.

When jj detects such a situation, it prints a message and stops
operation, similar to when a workspace is stale. The message tells the
user what command to run.

When that command is run, jj loads the repo at the @ operation (instead
of the operation of the workspace), creates a new commit on the @
commit with an empty tree, and then proceeds as usual - in particular,
including the auto-snapshotting of the working tree, which creates
another commit that obsoletes the newly created commit.

There are several design points I considered.

1) Whether the recovery should be automatic, or (as in this commit)
manual in that the user should be prompted to run a command. The user
might prefer to recover in another way (e.g. by simply deleting the
workspace) and this situation is (hopefully) rare enough that I think
it's better to prompt the user.

2) Which command the user should be prompted to run (and thus, which
command should be taught to perform the recovery). I chose "workspace
update-stale" because the circumstances are very similar to it: it's
symptom is that the regular jj operation is blocked somewhere at the
beginning, and "workspace update-stale" already does some special work
before the blockage (this commit adds more of such special work). But it
might be better for something more explicitly named, or even a sequence
of commands (e.g. "create a new operation that becomes @ that no
workspace points to", "low-level command that makes a workspace point to
the operation @") but I can see how this can be unnecessarily confusing
for the user.

3) How we recover. I can think of several ways:
a) Always create a commit, and allow the automatic snapshotting to
create another commit that obsoletes this commit.
b) Create a commit but somehow teach the automatic snapshotting to
replace the created commit in-place (so it has no predecessor, as viewed
in "obslog").
c) Do either a) or b), with the added improvement that if there is no
diff between the newly created commit and the former @, to behave as if
no new commit was created (@ remains as the former @).
I chose a) since it was the simplest and most easily reasoned about,
which I think is the best way to go when recovering from a rare
situation.
2024-02-09 00:38:47 -08:00
Martin von Zweigbergk
b343289238 working_copy: make reset() take a commit instead of a tree
Our virtual file system at Google (CitC) would like to know the commit
so it can scan backwards and find the closest mainline tree based on
it. Since we always record an operation id (which resolves to a
working-copy commit) when we write the working-copy state, it doesn't
seem like a restriction to require a commit.
2024-02-06 12:41:09 -08:00
Yuya Nishihara
77ceadbfd0 cleanup: remove remaining ": {source}" from error message templates 2024-02-04 09:13:21 +09:00
Daniel Ploch
cb889f0b45 workspace: combine working copy functions into a trait 2024-01-25 11:46:07 -08:00
Yuya Nishihara
5a7d8ac596 working_copy: don't follow symlinks when visiting files in gitignored directory
Fixes #2878
2024-01-24 16:38:48 +09:00
Martin von Zweigbergk
a66e2a0a6d working_copy: mark commit_id field in proto reserved
By marking it reserved, we prevent accidental use. We can still read
working copy protos that have the field.
2024-01-12 17:38:23 -08:00
Yuya Nishihara
fa5e40719c object_id: extract ObjectId trait and macros to separate module
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
2024-01-05 10:20:57 +09:00
Martin von Zweigbergk
90744fb770 working copy: read files ahead when updating
If the commit backend has high latency, it can make a big difference
to read files concurrently. This patch updates the working copy code
to do that in the update code (when reading files from the backend to
write to the working copy). Because our backend at Google reads files
from a local daemon process that already does a lot of prefetching,
this patch doesn't actually help us. I think it's still the right
thing to do for backends that don't do the same kind of
prefetching. It speeds up `jj sparse set --add` by >10x when I disable
the prefetching in our daemon (our `Backend::concurrency()` is 100).
2023-12-29 13:37:13 -08:00
Yuya Nishihara
ac99145a28 working_copy: drop open file instance from PersistError
For the same reason as the file_util change.
2023-12-17 08:20:07 +09:00
Yuya Nishihara
601be0d480 working_copy: narrow file_states recursively while visiting directories
This saves another ~10ms.

Without watchman:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-1,jj-2 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files ~/mirrors/linux/no-match"
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files ~/mirrors/linux/no-match
  Time (mean ± σ):     327.7 ms ±  24.9 ms    [User: 1059.1 ms, System: 654.3 ms]
  Range (min … max):   296.0 ms … 385.4 ms    20 runs

Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux files ~/mirrors/linux/no-match
  Time (mean ± σ):     311.0 ms ±  24.8 ms    [User: 960.0 ms, System: 643.1 ms]
  Range (min … max):   274.9 ms … 358.5 ms    20 runs
```
2023-11-30 12:09:31 +09:00
Yuya Nishihara
a935a4f70c working_copy: use proto file states without rebuilding BTreeMap
In snapshot(), changed_file_states are received in arbitrary order. For the
other callers, entries are in diff_stream order, so we don't have to sort
them.

With watchman enabled, we can see the cost of sorting the sorted proto entries.
I don't think this is significant, but we can mitigate it by adding
is_file_states_sorted flag to the proto message if needed:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files ~/mirrors/linux/no-match"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files ~/mirrors/linux/no-match
  Time (mean ± σ):     164.8 ms ±  16.6 ms    [User: 50.2 ms, System: 111.7 ms]
  Range (min … max):   148.1 ms … 195.0 ms    20 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files ~/mirrors/linux/no-match
  Time (mean ± σ):     171.8 ms ±  13.6 ms    [User: 61.7 ms, System: 109.0 ms]
  Range (min … max):   159.5 ms … 192.1 ms    20 runs
```

Without watchman:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files ~/mirrors/linux/no-match"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files ~/mirrors/linux/no-match
  Time (mean ± σ):     367.3 ms ±  30.3 ms    [User: 1415.2 ms, System: 633.8 ms]
  Range (min … max):   325.4 ms … 421.7 ms    20 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files ~/mirrors/linux/no-match
  Time (mean ± σ):     327.7 ms ±  24.9 ms    [User: 1059.1 ms, System: 654.3 ms]
  Range (min … max):   296.0 ms … 385.4 ms    20 runs
```

I haven't measured snapshotting against dirty working copy, but I don't think
it would be slower than the original implementation.
2023-11-30 12:09:31 +09:00
Yuya Nishihara
fca3690dda working_copy: add file states wrapper that provides map-like API
I'll replace the current lazy loading mechanism with this. Read-only methods
are implemented on the borrowed type so that we can narrow lookup scope
recursively.
2023-11-30 12:09:31 +09:00
Yuya Nishihara
9292af5e52 working_copy: update file states in bulk
This helps migrate BTreeMap<RepoPath, _> to sorted Vec.
2023-11-30 12:09:31 +09:00
Yuya Nishihara
c9150d02fc working_copy: don't look up file state twice while visiting directories 2023-11-30 12:09:31 +09:00
Yuya Nishihara
6ce7bd5338 repo_path: replace .contains() with .starts_with(), flipping the arguments
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
2023-11-29 08:41:23 +09:00
Yuya Nishihara
bc9725c73c working_copy: use RepoPath::parent() which no longer allocates temporary object 2023-11-29 08:41:23 +09:00
Yuya Nishihara
28ab9593c3 repo_path: split RepoPath into owned and borrowed types
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
2023-11-28 07:33:28 +09:00
Yuya Nishihara
0a1bc2ba42 repo_path: add stub RepoPathBuf type, update callers
Most RepoPath::from_internal_string() callers will be migrated to the function
that returns &RepoPath, and cloning &RepoPath won't work.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
d322df0c8d matchers: make Files/PrefixMatcher constructors accept slice of borrowed paths
RepoPath will become slice type (like str), and it doesn't make sense to
require &[RepoPathBuf] here.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
55f75278bc repo_path: make to_internal_file_string() return &str, rename accordingly 2023-11-27 08:42:09 +09:00
Yuya Nishihara
974a6870b3 repo_path: make RepoPath::components() return iterator
This allows us to change the backing type from Vec<String> to String.
2023-11-27 08:42:09 +09:00
Yuya Nishihara
59ef3f0023 repo_path: split RepoPathComponent into owned and borrowed types
This is a step towards introducing a borrowed RepoPath type. The current
RepoPath type is inefficient as each component String is usually short. We
could apply short-string optimization, but still each inlined component would
consume 24 bytes just for e.g. "src", and increase the chance of random memory
access. If the owned RepoPath type is backed by String, we can implement cheap
cast from &str to borrowed &RepoPath type.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
6344cd56b3 repo_path: remove RepoPathJoin trait, just implement join() on the type
I don't think we'll add join() that takes different types.
2023-11-26 07:14:47 +09:00
Yuya Nishihara
042d26049c working_copy: lazily construct file_states BTreeMap
While it got faster to build a large BTreeMap<RepoPath, _>, there's still
a measurable cost. Let's eliminate it if watchman is enabled and the working
copy is clean. Perhaps, we should introduce new serialization format that
supports instant loading and lookup, but this hack works for the moment.
I'm not sure if the new tree_state format should be flat (RepoPath, _) list,
or tree like the backend storage btw.

In my "linux" repo (watchman enabled):
    % hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
      "target/release-with-debug/{bin} -R ~/mirrors/linux status"
    Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
      Time (mean ± σ):     768.9 ms ±  14.2 ms    [User: 630.7 ms, System: 131.2 ms]
      Range (min … max):   742.3 ms … 783.1 ms    10 runs

    Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
      Time (mean ± σ):     713.0 ms ±  16.8 ms    [User: 587.9 ms, System: 116.2 ms]
      Range (min … max):   681.5 ms … 731.1 ms    10 runs

    Relative speed comparison
            1.08 ±  0.03  target/release-with-debug/jj-0 -R ~/mirrors/linux status
            1.00          target/release-with-debug/jj-1 -R ~/mirrors/linux status
2023-11-23 18:48:14 +09:00
Yuya Nishihara
12cd657837 working_copy: extract file_states_to_proto() helper
Just minimizing the changes in the next commit. As we already have
file_states_from_proto(), it makes sense to extract the "to" function.
2023-11-23 18:48:14 +09:00
Yuya Nishihara
1ddcaa43b3 fsmonitor: don't apply prefix matching to paths obtained from watchman
If I understand it, watchman returns changed files and directories, and a
directory change doesn't mean we need to scan all files under the directory.
2023-11-23 10:06:00 +09:00
Yuya Nishihara
767e94f5af fsmonitor: drop unneeded mut from make_fsmonitor_matcher()
We only need &self.working_copy_path here.
2023-11-23 10:06:00 +09:00
Yuya Nishihara
c16c89bc27 fsmonitor: keep paths relative to the workspace root
Since the caller wants repo-relative paths, it doesn't make sense to convert
them back and forth.
2023-11-23 10:06:00 +09:00
Yuya Nishihara
5186066cf5 working_copy: simply collect() proto file states into BTreeMap
Suppose the input list is presorted, sorting a sorted vec would be cheaper
than .insert()-ing sorted items one by one.

In my "linux" repo (watchman eanbled):
 - jj-0: baseline
 - jj-1: previous (don't randomize by HashMap)
 - jj-2: this

    % hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
        "target/release-with-debug/{bin} -R ~/mirrors/linux status"
    Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
      Time (mean ± σ):      1.034 s ±  0.020 s    [User: 0.881 s, System: 0.212 s]
      Range (min … max):    1.011 s …  1.068 s    10 runs

    Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
      Time (mean ± σ):     849.3 ms ±  13.8 ms    [User: 710.7 ms, System: 199.3 ms]
      Range (min … max):   821.7 ms … 870.2 ms    10 runs

    Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux status
      Time (mean ± σ):     786.2 ms ±  16.7 ms    [User: 650.7 ms, System: 204.1 ms]
      Range (min … max):   760.8 ms … 805.2 ms    10 runs

    Relative speed comparison
            1.32 ±  0.04  target/release-with-debug/jj-0 -R ~/mirrors/linux status
            1.08 ±  0.03  target/release-with-debug/jj-1 -R ~/mirrors/linux status
            1.00          target/release-with-debug/jj-2 -R ~/mirrors/linux status
2023-11-20 08:29:33 +09:00
Yuya Nishihara
ee6a1e2c0a working_copy: don't build intermediate HashMap from proto file states
According to the doc, this is compatible with the map syntax.
https://protobuf.dev/programming-guides/proto3/#maps

This change means that the serialized file states are sorted by RepoPath,
so BTreeMap<RepoPath, _> can be reconstructed with fewer cache misses.

In my "linux" repo (watchman enabled):
 - jj-0: baseline
 - jj-1: this

    % hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
      "target/release-with-debug/{bin} -R ~/mirrors/linux status"
    Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
      Time (mean ± σ):      1.034 s ±  0.020 s    [User: 0.881 s, System: 0.212 s]
      Range (min … max):    1.011 s …  1.068 s    10 runs

    Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
      Time (mean ± σ):     849.3 ms ±  13.8 ms    [User: 710.7 ms, System: 199.3 ms]
      Range (min … max):   821.7 ms … 870.2 ms    10 runs

    Relative speed comparison
            1.32 ±  0.04  target/release-with-debug/jj-0 -R ~/mirrors/linux status
            1.08 ±  0.03  target/release-with-debug/jj-1 -R ~/mirrors/linux status

Cache-misses got reduced:

    % perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
      -- ./target/release-with-debug/jj-0 -R ~/mirrors/linux --no-pager status

              1,091.68 msec task-clock                       #    1.032 CPUs utilized
         4,179,596,978      cycles                           #    3.829 GHz
         6,166,231,489      instructions                     #    1.48  insn per cycle
           134,032,047      cache-references                 #  122.776 M/sec
            29,322,707      cache-misses                     #   21.88% of all cache refs

           1.057474164 seconds time elapsed

           0.897042000 seconds user
           0.194819000 seconds sys

    % perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
      -- ./target/release-with-debug/jj-1 -R ~/mirrors/linux --no-pager status

                927.05 msec task-clock                       #    1.083 CPUs utilized
         3,451,299,198      cycles                           #    3.723 GHz
         6,222,418,272      instructions                     #    1.80  insn per cycle
            98,499,363      cache-references                 #  106.251 M/sec
            11,998,523      cache-misses                     #   12.18% of all cache refs

           0.855938336 seconds time elapsed

           0.720568000 seconds user
           0.207924000 seconds sys
2023-11-20 08:29:33 +09:00
Yuya Nishihara
56047cb7ec working_copy: don't pass all proto data to from_proto() functions
Just a code cleanup. This allows us to consume proto fields if needed.
I also removed redundant .clone() and .as_str().
2023-11-20 08:29:33 +09:00
Martin von Zweigbergk
9b24d24612 conflicts: add another helper for materializing a tree value
We have a few places where we have a `MergedTreeValue` and need to
read the data associated with it so we can write to the working copy
or include it in a diff. Let's extract some of that shared logic to a
function so we can reuse it. I plan to use it for reading file
contents in advance while streaming a diff in `local_working_copy`
soon (and probably in `jj diff` thereafter), but I think it seems like
an improvement on its own.
2023-11-08 21:21:38 -08:00
Martin von Zweigbergk
65bd5cacba working copy: on checkout, move read from store out of write_*() functions
I'd like to read N files ahead from the backend, to avoid serializing
too many server calls on backends that are backed by a server. Moving
the reads a little earlier is a little step towards that.

The `TreeState::write_*()` functions can now be made into free/static
functions if we prefer.
2023-11-08 21:21:38 -08:00
Martin von Zweigbergk
904c37d36d working copy: use MergedTree::diff_stream()
This will make it a little faster to update the working copy at Google
once we've made `MergedTree::diff_stream()` fetch trees
concurrently. (It only makes it a little faster because we still fetch
files serially.)
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
24b706641f async: switch to pollster's block_on()
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
a1ef9dc845 merged_tree: propagate backend errors in diff iterator
I want to fix error propagation before I start using async in this
code. This makes the diff iterator propagate errors from reading tree
objects.

Errors include the path and don't stop the iteration. The idea is that
we should be able to show the user an error inline in diff output if
we failed to read a tree. That's going to be especially useful for
backends that can return `BackendError::AccessDenied`. That error
variant doesn't yet exist, but I plan to add it, and use it in
Google's internal backend.
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
309f1200d6 merge: introduce a type alias for Merge<Option<TreeValue>>
Reasons to introduce this alias:

* Reduces complexity of a type, to silence Clippy warnings in the
  future if we use this type as a type parameter

* The type is used quite frequently, so it makes sense to have a name
  for it

* It's easier to visually scan for the end of the type when you don't
  have to match opening and closing angle brackets
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
8764ad9826 conflicts: make materialization async
We need to let async-ness propagate up from the backend because
`block_on()` doesn't like to be called recursively. The conflict
materialization code is a good place to make async because it doesn't
depends on anything that isn't already async-ready.
2023-10-20 07:38:34 -07:00
Martin von Zweigbergk
6bfd618275 workspace: load working copy implementation dynamically
This makes `Workspace::load()` look a new `.jj/working_copy/type` file
in order to load the right working copy implementation, just like
`Repo::load()` picks the right backends based on `.jj/store/type`,
`.jj/op_store/type`, etc. We don't write the file yet, and we don't
have a way of adding alternative working copy implementations, so it
will always be `LocalWorkingCopy` for now.
2023-10-16 22:33:44 -07:00
Martin von Zweigbergk
e1f00d9426 working copy: pass commit instead of tree into check_out()
Our internal working copy implementations at Google will need the
commit so they can walk history backwards until they get to a "public"
commit. They'll then use that to tell build tools and virtual file
systems to present that as a base.

I'm not sure if we'll need to update `reset()` too. It's currently
only used by `jj untrack`, which doesn't change the commit's parent,
so it wouldn't affect any history walks.
2023-10-16 22:33:44 -07:00
Martin von Zweigbergk
0582893144 working copy: return Box<dyn LockedWorkingCopy> from start_mutation() 2023-10-15 16:13:19 -07:00
Martin von Zweigbergk
580586d008 working copy: return Box<dyn WorkingCopy> from finish() 2023-10-15 16:13:19 -07:00