cybercyst/jj

mirror of https://github.com/martinvonz/jj.git synced 2025-05-16 20:54:27 +00:00

Author	SHA1	Message	Date
Austin Seipp	017ce851b4	refactor(jj-lib): remove nightly_shims gunk Summary: Now that we have Rust 1.71.0 at our fingertips, the `map_first_last` feature has been stabilized. That means we can get rid of the `jj-lib` build script and also the `nightly_shims` module. Signed-off-by: Austin Seipp <aseipp@pobox.com> Change-Id: Ibb5ce3258818a2de670763fbbaf3c2e7	2023-07-17 18:38:26 -05:00
Waleed Khan	54dba51a08	docs: warn about missing docs for `jj-lib` crate	2023-07-10 18:28:59 +03:00
Martin von Zweigbergk	a19a91bfbc	cargo: upgrade regex 1.7.3 to 1.8.1	2023-04-24 11:28:12 -07:00
B Wilson	01a9ce0c71	diff: Treat multi-byte UTF-8 runes as word characters Inline diffs on multi-byte UTF-8 characters would match individual bytes, causing garbled diffs in some cases. For example, replacing `⊢` with `⊣`, which differ in the final byte only, caused the diff to display a diff of the bytes instead of the character. This commit uses a workaround present in Mercurial by treating all bytes 0x80 and above as word characters, causing any multi-byte character to be treated as a word and not segmented. https://www.mercurial-scm.org/repo/hg/file/6.3.3/mercurial/patch.py#l51	2023-03-30 00:06:56 +09:00
Martin von Zweigbergk	d8feed9be4	copyright: change from "Google LLC" to "The Jujutsu Authors" Let's acknowledge everyone's contributions by replacing "Google LLC" in the copyright header by "The Jujutsu Authors". If I understand correctly, it won't have any legal effect, but maybe it still helps reduce concerns from contributors (though I haven't heard any concerns). Google employees can read about Google's policy at go/releasing/contributions#copyright.	2022-11-28 06:05:45 -10:00
Martin von Zweigbergk	6fabf89529	diff: add helpers for finding index for non-base side We have some repeated code doing `wrapping_add()`, so let's add helpers to reduce repetition.	2022-03-10 22:00:45 -08:00
Martin von Zweigbergk	fc1731a180	diff: switch from BTreeMap to sorted Vec for unchanged ranges We don't really need a BTreeMap for keeping the unchanged ranges. The only place it helps a bit is when refining a diff because we may then insert some more unchanged ranges in the list. I think there would have to be very many unchanged ranges for that to matter, however. This patch therefore replaces the BTreeMap with a sorted Vec. `cargo bench` says that a few tests got ~20% faster. I'm looking into this code now because I'm thinking of copying some of it for the "partial conflict resolution" tool I'm working on for Mercurial.	2022-03-10 00:33:17 -08:00
Martin von Zweigbergk	934564bf8d	diff: also sort base ranges by end point I wanted to replace the BTreeMap by a Vec and noticed that we actually sometimes end up having a `0..n` range followed by a `0..0` after refinement. We currently compare those two as equal because I had not thought that we could end up attempting to add two ranges with the same start point. When trying to insert the second range (`0..0`), the BTreeMap will keep the existing key (`0..n`) and replace the value. That probably works, but it's clearly not what I intended. Let's fix by sorting by the end point if the start point is equal. This actually improves some benchmarks by a few percent (maybe because the subsequent compaction can then remove the `0..0` range).	2022-03-10 00:33:17 -08:00
Waleed Khan	9202aae8b1	build: conditionally use `map_first_last` feature if available	2022-02-20 22:21:14 -08:00
Waleed Khan	261cd1a1c4	build: add shims for nightly feature `map_first_last`	2022-02-20 22:16:07 -08:00
Martin von Zweigbergk	e52c902d3a	diff: compact adjacent unchanged regions also when using `for_tokenizer()` I noticed while working on support for unified diffs (#33) that `Diff::for_tokenizer(..., &find_line_ranges)` would return a `DiffHunk::Matching` for each matching line instead of a single `DiffHunk::Matching` for all the matching lines. That's different from what you get from `Diff::default_refinement()` and seems less convenient to work with.	2021-10-10 00:00:06 -07:00
Martin von Zweigbergk	81b51de300	diff: rewrite diff() using new multi-way diff	2021-06-26 23:49:58 -07:00
Martin von Zweigbergk	987aecc749	diff: add a type for diffing arbitrary number of inputs I have been trying to figure out how to generalize diffs and merges for an arbitrary number of inputs. For example, I want to have an internal representation of an octopus merge adding 5 inputs (file states/contents) and removing 4 inputs. I also want to be able to represent a diff from a regular 3-way-conflict state to a resolved state. Such a diff would be from a state adding two inputs and removing one, to a state adding just one input. I finally realized last week that the problem is simple if you don't care about adds vs removes. Instead, you line up the matching and differing parts of all the inputs. It's then up to the caller to use it in an appropriate way for its use case. For example, a regular diff would pass in two inputs and would get back a list of matching and differing hunks. It might then present the first element of differing hunks in red and the second element in green. Similarly, a 3-way merge would pass in three inputs with the base first. It would then compare the sides and decide on a resolution (or leave it unresolved if all three sides are different). This change adds a type representing this kind of multi-way diff. Coming changes will update existing code to use it. In addition to making the existing code simpler and more consistent, having this in place should also: * Make it much easier to present merge conflicts involving more than 3 parts. * Experiment with different ways of displaying diffs from/to conflict states. * Experiment with sub-line-level merging.	2021-06-26 23:49:58 -07:00
Martin von Zweigbergk	4c416dd864	cleanup: let Clippy fix a bunch of warnings	2021-06-14 00:27:31 -07:00
Martin von Zweigbergk	b50ef1410d	styler: rename Styler to more standard Formatter	2021-06-05 08:38:28 -07:00
Martin von Zweigbergk	5b18e89a4d	diff: fix LCS when a line/word/byte has been moved later	2021-04-28 23:33:18 -07:00
Martin von Zweigbergk	102f7a0416	diff: also recurse into final region after unchanged regions See test case for details. Before: test bench_diff_10k_lines_reversed ... bench: 36,249,659 ns/iter (+/- 174,455) test bench_diff_10k_modified_lines ... bench: 37,258,890 ns/iter (+/- 803,963) test bench_diff_10k_unchanged_lines ... bench: 4,252 ns/iter (+/- 69) test bench_diff_1k_lines_reversed ... bench: 982,834 ns/iter (+/- 6,467) test bench_diff_1k_modified_lines ... bench: 3,343,469 ns/iter (+/- 23,243) test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 2) test bench_diff_git_git_read_tree_c ... bench: 95,559 ns/iter (+/- 816) After: test bench_diff_10k_lines_reversed ... bench: 36,186,715 ns/iter (+/- 196,903) test bench_diff_10k_modified_lines ... bench: 37,511,000 ns/iter (+/- 1,370,476) test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 8) test bench_diff_1k_lines_reversed ... bench: 986,010 ns/iter (+/- 11,565) test bench_diff_1k_modified_lines ... bench: 3,370,938 ns/iter (+/- 17,041) test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 2) test bench_diff_git_git_read_tree_c ... bench: 102,189 ns/iter (+/- 1,052) So this patch makes diffing even slower (but still easily fast enough for all cases I've run into in real life). There's probably a lot that can be done to make things faster, but the first priority is that the diffs are correct and easy to read.	2021-04-08 23:54:54 -07:00
Martin von Zweigbergk	5c10c93e64	diff: fix tests broken by the previous commit Sorry, I forgot to run the automated tests again :(	2021-04-07 11:00:04 -07:00
Martin von Zweigbergk	0dd000d236	diff: do final refinement at byte-level for non-word bytes This results in significantly more readable diffs on commits like 659393bec219 in this repo. Before: test bench_diff_10k_lines_reversed ... bench: 38,122,998 ns/iter (+/- 557,688) test bench_diff_10k_modified_lines ... bench: 32,556,563 ns/iter (+/- 548,114) test bench_diff_10k_unchanged_lines ... bench: 4,231 ns/iter (+/- 15) test bench_diff_1k_lines_reversed ... bench: 958,296 ns/iter (+/- 46,963) test bench_diff_1k_modified_lines ... bench: 3,014,723 ns/iter (+/- 15,830) test bench_diff_1k_unchanged_lines ... bench: 249 ns/iter (+/- 2) test bench_diff_git_git_read_tree_c ... bench: 78,599 ns/iter (+/- 1,079) After: test bench_diff_10k_lines_reversed ... bench: 38,289,493 ns/iter (+/- 413,712) test bench_diff_10k_modified_lines ... bench: 37,352,516 ns/iter (+/- 1,293,950) test bench_diff_10k_unchanged_lines ... bench: 4,238 ns/iter (+/- 13) test bench_diff_1k_lines_reversed ... bench: 967,253 ns/iter (+/- 8,506) test bench_diff_1k_modified_lines ... bench: 3,358,028 ns/iter (+/- 37,154) test bench_diff_1k_unchanged_lines ... bench: 233 ns/iter (+/- 1) test bench_diff_git_git_read_tree_c ... bench: 95,787 ns/iter (+/- 740) So the biggest slowdown is when there are modified lines.	2021-04-07 10:27:17 -07:00
Martin von Zweigbergk	d7395cc34a	diff: add copyright header	2021-04-06 21:26:37 -07:00
Martin von Zweigbergk	7e4e43f358	diff: first diff lines, then refine to words, producing better diffs The new diff algorithm produces pretty bad diffs in some cases, such as cc4b1e923091 in this repo (the parent of this commit). I think the problem there is that many words are repeated over and over. Diffing first at the line level and then refining the diff of the changed ranges at the word level gives much better results. That's what this patch does. After this patch, `jj diff -r cc4b1e923091` looks pretty similar to the diff in GitHub's UI. I hope to get around to doing the same for the merge code soon. Impact on benchmarks: Before: test bench_diff_10k_lines_reversed ... bench: 42,647,532 ns/iter (+/- 765,347) test bench_diff_10k_modified_lines ... bench: 21,407,980 ns/iter (+/- 126,366) test bench_diff_10k_unchanged_lines ... bench: 4,235 ns/iter (+/- 16) test bench_diff_1k_lines_reversed ... bench: 1,190,483 ns/iter (+/- 7,192) test bench_diff_1k_modified_lines ... bench: 1,919,766 ns/iter (+/- 9,665) test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 1) test bench_diff_git_git_read_tree_c ... bench: 174,702 ns/iter (+/- 1,199) After: test bench_diff_10k_lines_reversed ... bench: 38,289,509 ns/iter (+/- 129,004) test bench_diff_10k_modified_lines ... bench: 33,140,659 ns/iter (+/- 3,989,339) test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 14) test bench_diff_1k_lines_reversed ... bench: 973,551 ns/iter (+/- 94,895) test bench_diff_1k_modified_lines ... bench: 3,033,818 ns/iter (+/- 29,513) test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 1) test bench_diff_git_git_read_tree_c ... bench: 79,100 ns/iter (+/- 963) So most of them get slower, as expected. The last one, taken from a real diff in the git.git repo, gets faster, however (which is also what I would have expected).	2021-04-04 21:50:31 -07:00
Martin von Zweigbergk	3c35dbace6	merge: use new diff algorithm for finding sync regions With the histogram diff code from the previous patch, we can now start using that for finding the "sync regions" in 3-way merge. That helps a lot with the slow merging we had before this patch. `jj diff -r 9d540e9726` in the git.git repo drops from 22 s to 0.15 s with this patch. (That commit is a rather arbitrary merge commit from around 5 years ago.) With the new diff algorithm, the output of `jj diff -r 9d540e9726` in git.git looks better if we find unchanged sync regions based on lines than on words, so that's what I'm using in this patch. That's a change compared to the LCS-based diff we used before this patch. I suspect the reason that finding sync regions based on words works worse now is not because of the change from LCS to histogram but because of the change in how we define a word. My goal right now is mostly to make it faster; I'll get back to refining the diff result later.	2021-03-31 22:16:19 -07:00
Martin von Zweigbergk	1e657c5331	diff: add a histogram(-like?) diff algorithm The current diff algorithm does a full LCS on the words of the texts, which is really slow. Diffing the working copy when e.g. `src/commands.py` has changes far apart takes seconds. This patch adds an implementation inspired by JGit's Histogram diff. I say "inspired" because I just didn't quite understand it :P In particular, I didn't understand what it does when it finds non-unique elements. I decided to line up the leading common elements on both sides of the merge. I don't know if that usually gives good enough results in practice. I'm sure this can still be optimized a lot, but this seems good enough as a start. There are also many things to improve about the quality of the diffs.	2021-03-31 22:15:36 -07:00
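
The word-character rule described in commit 01a9ce0c71 can be illustrated with a minimal sketch. This is not the code from the repository: the names `is_word_byte` and `find_word_ranges` and the exact set of ASCII word characters are assumptions made for illustration. The only detail taken from the commit message is that every byte `0x80` and above counts as a word character, so a multi-byte UTF-8 character is never split.

```rust
use std::ops::Range;

/// Hypothetical helper (not from the repository): a byte is a "word byte" if
/// it is ASCII alphanumeric, an underscore, or any byte >= 0x80, i.e. part of
/// a multi-byte UTF-8 sequence.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b >= 0x80
}

/// Returns the byte ranges of maximal runs of word bytes in `text`.
fn find_word_ranges(text: &[u8]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = None;
    for (i, &b) in text.iter().enumerate() {
        match (is_word_byte(b), start) {
            // A word starts here.
            (true, None) => start = Some(i),
            // The current word ends just before this byte.
            (false, Some(s)) => {
                ranges.push(s..i);
                start = None;
            }
            _ => {}
        }
    }
    // Close a word that runs to the end of the input.
    if let Some(s) = start {
        ranges.push(s..text.len());
    }
    ranges
}
```

With this rule, the three bytes of `⊢` form a single word range, so replacing it with `⊣` would show up as one changed word rather than one changed byte.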

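Commits fc1731a180 and 934564bf8d describe keeping unchanged ranges in a sorted Vec and breaking ties on the end point when two ranges share a start point. A minimal sketch of that ordering follows. The names `compare_ranges` and `insert_sorted` are hypothetical, and plain `Range<usize>` values stand in for whatever range type the repository actually uses.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Hypothetical comparator: order ranges by start point, then by end point,
/// so that `0..0` and `0..n` are distinct and `0..0` sorts first.
fn compare_ranges(a: &Range<usize>, b: &Range<usize>) -> Ordering {
    a.start.cmp(&b.start).then_with(|| a.end.cmp(&b.end))
}

/// Inserts `range` into the already-sorted `sorted`, keeping it ordered.
/// An exact duplicate is skipped instead of being inserted twice.
fn insert_sorted(sorted: &mut Vec<Range<usize>>, range: Range<usize>) {
    if let Err(pos) = sorted.binary_search_by(|probe| compare_ranges(probe, &range)) {
        sorted.insert(pos, range);
    }
}
```

Comparing on the start point alone would treat `0..n` and `0..0` as equal, which is the situation the second commit message describes.
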
23 Commits