Since fileset and revset languages are syntactically close, we can reparse
revset expression as a fileset. This might sound a bit scary, but helps
eliminate nested quoting like file("~glob:'*.rs'"). One oddity exists in alias
substitution, though. Another possible problem is that we'll need to add fake
operator parsing rules if we introduce incompatibility in fileset, or want to
embed revset expressions in a fileset.
Since "file(x, y)" is equivalent to "file(x|y)", the former will be deprecated.
I'll probably add a mechanism to collect warnings during parsing.
While I like strict parsing, it's not uncommon that we have to deal with file
names containing spaces, and doubly-quoted strings such as '"Foo Bar"' look
ugly. So, this patch adds an exception that accepts top-level bare strings.
This parsing rule is specific to command arguments, and won't be enabled when
loading fileset aliases.
Mercurial appears to resolve cwd-relative path first, so "glob:*.c" could be
parsed as "**/*.c" if cwd was literally "**". It wouldn't practically matter,
but isn't correct. Instead, jj's parser first splits glob into literal part
and pattern. That's mainly because we want to parse the user input texts into
type-safe objects, and (RepoPathBuf, glob::Pattern) pairs are the simplest
ones. The current parser can't handle patterns like "foo/*/.." (= "foo" ?),
and errors out. I believe this restriction is acceptable.
Unlike literal paths, the 'glob:' pattern anchors to the whole file path. I
don't think "prefix"-matching glob is useful, and making it the default would
be rather confusing.
This prepares for adding glob matcher, which will be backed by
RepoPathTree<Vec<glob::Pattern>>.
FilesNodeKind/PrefixNodeKind are basically boolean types, but implemented as
enums for better code readability.
If this doesn't work out, maybe we can try one of these:
a. fall back to bare file name if expression doesn't contain any operator-like
characters (e.g. "f(x" is an error, but "f x" can be parsed as bare string)
b. introduce command-line flag to opt in (e.g. -e FILESET)
c. introduce pattern prefix to opt in (e.g. set:FILESET)
Closes#3239, #2915, #2286
The primary use case is to warn unmatched paths. I originally thought paths in
negated expressions shouldn't be checked, but doing that seems rather
inconsistent than useful. For example, "~x" in "jj split '~x'" should match at
least one file to split to non-empty revisions.
Since fileset is primarily used in CLI, it's better to avoid inner quoting if
possible. For example, ".." would have to be quoted in the original grammar
derived from the revset.
This patch also adds a stricter version of an identifier rule. If we add a
symbol alias, it will follow the "strict_identifier" rule.
The fileset grammar is basically a stripped-down version of the revset grammar,
with a few adjustments:
* extract function call to "function" rule (like templater)
* inline "symbol" rule (because "identifier" and "string" should be treated
differently at the early parsing stage.)
The parser will have a separate name resolution stage. This will help to do
alias substitution properly. I'll probably rewrite the revset parser in the
same way. It will also help if we want to embed fileset expression in file()
revset.
FilesetExpression is similar to RevsetExpression, but there are two major
differences:
- Union is represented as N-ary operator,
- Expression node isn't Rc-ed.
The former is because of the nature of the runtime Matcher objects. It's easier
to construct a Matcher from flattened union expressions than from a binary tree.
The latter choice comes from UnionAll(Vec<FilesetExpression>), which doesn't
have to be Vec<Rc<FilesetExpression>>, and Rc<[FilesetExpression]> can't be
constructed from [Rc<_>, ..]. Anyway, the internal representation may change as
needed.
Another design decision I made is Vec<Pattern(RepoPathBuf)> vs
Pattern(Vec<RepoPathBuf>). I chose the former because it will be more closer
to the parsed tree of the fileset language.