148 Commits

Author SHA1 Message Date
Olivier Mengué
0611816216 regexp/syntax: cleanup code generation in perl_groups.go
Cleanup code generation of perl_groups.go:
* Fix the generated code header to follow the standard https://go.dev/s/generatedcode
* Apply gofmt as last step of code generation
* Add //go:generate lines in parse.go to trigger code generation
* Adapt make_perl_groups.pl to handle writing directly to the output
  file (as we can't use shell redirection in go:generate lines)
* use strict; use warnings;

Change-Id: I675241da03dd3f6facc9eb9de00999bd9203ad70
Reviewed-on: https://go-review.googlesource.com/c/go/+/555995
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-02 13:59:01 +00:00
Daniel Martí
754f870381 crypto/tls,regexp: remove always-nil error results
These were harmless, but added unnecessary verbosity to the code.
This can happen as a result of refactors: for example,
the method sessionState used to return errors in some cases.

Change-Id: I4e6dacc01ae6a49b528c672979f95cbb86795a85
Reviewed-on: https://go-review.googlesource.com/c/go/+/528995
Reviewed-by: Leo Isla <islaleo93@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Olivier Mengué <olivier.mengue@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
2024-03-29 22:22:45 +00:00
Daniel Martí
27c7a3dcc3 regexp/syntax: use the Regexp.Equal static method directly
A follow-up for the recent https://go.dev/cl/573978.

Change-Id: I0e75ca0b37d9ef063bbdfb88d4d2e34647e0ee50
Reviewed-on: https://go-review.googlesource.com/c/go/+/574677
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-03-29 16:41:51 +00:00
apocelipes
51a96f86f7 regexp/syntax: simplify the code
Use the slices package and the built-in max to simplify the code.
There's no noticeable performance change in this modification.

Change-Id: I96e46ba8ab1323f1ba0b8c9b827836e217772cf2
GitHub-Last-Rev: f0111ac7e220f7dac03290125a3a83831012f235
GitHub-Pull-Request: golang/go#66511
Reviewed-on: https://go-review.googlesource.com/c/go/+/573978
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2024-03-26 19:50:03 +00:00
apocelipes
8ed0d35fef regexp: use slices to simplify the code
Replace some "reflect.DeepEqual" calls in the tests with
"slices.Equal" which is much faster for slice comparisons.

Remove unnecessary "runeSlice" and redundant helper functions.

Change-Id: Ib5dc41848d7a3c5149f41701d60471a487cff476
GitHub-Last-Rev: 87b5ed043d2935b971aa676cc52b9b2c5b45736b
GitHub-Pull-Request: golang/go#66509
Reviewed-on: https://go-review.googlesource.com/c/go/+/573977
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-03-25 19:36:03 +00:00
Oleksandr Redko
3e82b5ee0a regexp/syntax: use standard generated code header
Updates doc by running these commands:
  - mksyntaxgo from the google/re2 repo, which changes comment according
    to https://golang.org/s/generatedcode
  - gofmt -w regexp/syntax/doc.go

Change-Id: I66a9dd9fa841cbce899ab3aa32d7face798d2920
Reviewed-on: https://go-review.googlesource.com/c/go/+/572275
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-03-19 11:21:02 +00:00
cui fliter
65172c2036 regexp: add available godoc link
Signed-off-by: cui fliter <imcusg@gmail.com>
Change-Id: I85339293d4cfb691125f991ec7162e9be186efdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/539599
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-02-26 20:50:01 +00:00
Mauri de Souza Meneguzzo
c4d55ab912 regexp/syntax: regenerate docs with mksyntaxgo
This makes the docs up-to-date by running doc/mksyntaxgo from the google/re2 repo.

Change-Id: I80358eed071e7566c85edaeb1cc5514a6d8c37a7
GitHub-Last-Rev: 0f8c8df4f213ce89fbea89e81f0ea3babd59d38f
GitHub-Pull-Request: golang/go#65249
Reviewed-on: https://go-review.googlesource.com/c/go/+/558136
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-02-20 14:57:37 +00:00
Olivier Mengué
eff7aef4eb regexp: add godoc links
Change-Id: I087162f866e781258f9fbb96215c1ff6a5c315a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/507776
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-11 20:23:37 +00:00
qiulaidongfeng
3bd30298dd regexp/syntax: use min func
Change-Id: I679c906057577d4a795c07a2f572b969c3ee14d5
GitHub-Last-Rev: fba371d2d6bfc7fbf3a93a53bb22039acf65d7cf
GitHub-Pull-Request: golang/go#63350
Reviewed-on: https://go-review.googlesource.com/c/go/+/532218
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-03 19:49:07 +00:00
Jes Cok
0256198129 regexp: use built-in clear to clear b.visited in *bitState.reset
Change-Id: I1a723124f7050aeb971377db8c3cd04ebf9f7a16
GitHub-Last-Rev: 465da88feb20b4a3ebea3c3e36560f6c82f7fa2e
GitHub-Pull-Request: golang/go#62611
Reviewed-on: https://go-review.googlesource.com/c/go/+/527975
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-09-14 17:25:39 +00:00
Jes Cok
267323ef2d all: calculate the median uniformly
Like sort.Search, use "h := int(uint(i+j) >> 1)" style code to calculate
the median.

Change-Id: Ifb1a19dde1c6ed6c1654bc642fc9565a8b6c5fc4
GitHub-Last-Rev: e2213b738832f1674948d6507f40e2c0b98cb972
GitHub-Pull-Request: golang/go#62503
Reviewed-on: https://go-review.googlesource.com/c/go/+/526496
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-09-09 01:46:03 +00:00
Russ Cox
98c9f271d6 regexp/syntax: use more compact Regexp.String output
Compact the Regexp.String output. It was only ever intended for debugging,
but there are at least some uses in the wild where regexps are built up
using regexp/syntax and then formatted using the String method.
Compact the output to help that use case. Specifically:

 - Compact 2-element character class ranges: [a-b] -> [ab].
 - Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E).

Fixes #57950.

Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/507015
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
2023-08-16 16:02:30 +00:00
Eduard Bondarenko
6ab8dfbe6b regexp: improve Regexp.ReplaceAll documentation and tests related to Expand part
For #40329

Change-Id: Ie0cb337545ce39cd169129227c45f7d2eaebc898
GitHub-Last-Rev: c017d4c7c1bc1f8cd39e6c70b60885cef1231dcd
GitHub-Pull-Request: golang/go#56507
Reviewed-on: https://go-review.googlesource.com/c/go/+/446836
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-07 00:22:53 +00:00
Mauri de Souza Meneguzzo
ee61186b33 regexp/syntax: accept (?<name>...) syntax as valid capture
Currently the only named capture supported by regexp is (?P<name>a).

The syntax (?<name>a) is also widely used and there is currently an effort from
 the Rust regex and RE2 teams to also accept this syntax.

Fixes #58458

Change-Id: If22d44d3a5c4e8133ec68238ab130c151ca7c5c5
GitHub-Last-Rev: 31b50e6ab40cfb0f36df6f570525657d4680017f
GitHub-Pull-Request: golang/go#61624
Reviewed-on: https://go-review.googlesource.com/c/go/+/513838
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-07-31 19:52:57 +00:00
Jes Cok
5d481abc87 all: fix typos
Change-Id: I510b0a4bf3472d937393800dd57472c30beef329
GitHub-Last-Rev: 8d289b73a37bd86080936423d981d21e152aaa33
GitHub-Pull-Request: golang/go#60960
Reviewed-on: https://go-review.googlesource.com/c/go/+/505398
Auto-Submit: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Findley <rfindley@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-07-18 19:55:29 +00:00
Daniel Martí
d4bcfe4e83 regexp: fix copy-paste typo on Regexp.UnmarshalText doc
I noticed that https://go.dev/cl/479401 called both methods MarshalText
in the godoc, so fix that.

While here, add more godoc links for better usability.

Change-Id: I8f10bafeca6a1ca1c1ed9be7a7dd9fdecfe991a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/484335
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-04-14 16:28:44 +00:00
Jonathan Hall
313ce55a86 regexp: add Regexp.TextMarshaler/TextUnmarshaler
Fixes #46159

Change-Id: I51dc4e9e8915ab5a73f053690fb2395edbeb1151
Reviewed-on: https://go-review.googlesource.com/c/go/+/479401
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
2023-04-12 20:03:09 +00:00
Ian Lance Taylor
9a0c506a4e all: re-run stringer
Re-run all go:generate stringer commands. This mostly adds checks
that the constant values did not change, but does add new strings
for the debug/dwarf and internal/pkgbits packages.

Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/483717
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-04-11 20:24:07 +00:00
Ludi Rehak
82bf12902f regexp/syntax: test for lowercase letters first in IsWordChar
Lowercase letters occur more frequently than uppercase letters
in English text.  In IsWordChar, evaluate the most common case
(lowercase letters) first to minimize the expected value of its
execution time. Code clarity does not suffer by rearranging the
order of the checks.

Add a benchmark on a sentence demonstrating the performance
improvement.

name           old time/op  new time/op  delta
IsWordChar-10   122ns ± 0%   114ns ± 1%  -6.68%  (p=0.000 n=8+10)

Change-Id: Ieee8126a4bd8ee8703905b4f75724623029f6fa2
Reviewed-on: https://go-review.googlesource.com/c/go/+/404100
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
2023-03-14 04:39:42 +00:00
cui fliter
b2faff18ce all: add missing periods in comments
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/449757
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joedian Reid <joedian@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-11-18 17:59:44 +00:00
cuiweixie
581a822a9e regexp: add ErrLarge error
For #56041

Change-Id: I6c98458b5c0d3b3636a53ee04fc97221f3fd8bbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/444817
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: xie cui <523516579@qq.com>
2022-11-02 18:15:21 +00:00
Russ Cox
c3c4aea55b regexp: limit size of parsed regexps
Set a 128 MB limit on the amount of space used by []syntax.Inst
in the compiled form corresponding to a given regexp.

Also set a 128 MB limit on the rune storage in the *syntax.Regexp
tree itself.

Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting this issue.

Fixes CVE-2022-41715.
Fixes #55949.

Change-Id: Ia656baed81564436368cf950e1c5409752f28e1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/439356
Auto-Submit: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
2022-10-05 20:39:49 +00:00
cui fliter
1888875182 regexp: fix a few function names on comments
Change-Id: I192dd34c677e52e16f0ef78e1dae58a78f6d1aac
GitHub-Last-Rev: 1638a7468951df72f13fea34855b6a4fcbb08226
GitHub-Pull-Request: golang/go#55967
Reviewed-on: https://go-review.googlesource.com/c/go/+/436885
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-10-02 02:31:23 +00:00
Bryan Boreham
0293c51bc5 regexp: avoid copying each instruction executed
Inst is a 40-byte struct, so avoiding the copy gives a decent speedup:

name                            old time/op    new time/op     delta
Find-8                             160ns ± 4%      109ns ± 4%  -32.22%  (p=0.008 n=5+5)
FindAllNoMatches-8                70.4ns ± 4%     53.8ns ± 0%  -23.58%  (p=0.016 n=5+4)
FindString-8                       154ns ± 6%      107ns ± 0%  -30.37%  (p=0.016 n=5+4)
FindSubmatch-8                     194ns ± 1%      135ns ± 1%  -30.56%  (p=0.008 n=5+5)
FindStringSubmatch-8               193ns ± 8%      131ns ± 0%  -31.82%  (p=0.008 n=5+5)
Literal-8                         42.8ns ± 2%     34.8ns ± 0%  -18.67%  (p=0.008 n=5+5)
NotLiteral-8                       917ns ± 2%      636ns ± 0%  -30.68%  (p=0.008 n=5+5)
MatchClass-8                      1.18µs ± 3%     0.91µs ± 1%  -22.27%  (p=0.016 n=5+4)
MatchClass_InRange-8              1.11µs ± 1%     0.87µs ± 2%  -21.38%  (p=0.008 n=5+5)
ReplaceAll-8                       659ns ± 6%      596ns ± 3%   -9.60%  (p=0.008 n=5+5)
AnchoredLiteralShortNonMatch-8    34.2ns ± 0%     30.4ns ± 1%  -11.20%  (p=0.016 n=4+5)
AnchoredLiteralLongNonMatch-8     38.7ns ± 0%     38.7ns ± 0%     ~     (p=0.579 n=5+5)
AnchoredShortMatch-8              67.0ns ± 1%     52.7ns ± 0%  -21.31%  (p=0.016 n=5+4)
AnchoredLongMatch-8                121ns ± 0%      124ns ±10%     ~     (p=0.730 n=5+5)
OnePassShortA-8                    392ns ± 0%      231ns ± 3%  -41.10%  (p=0.008 n=5+5)
NotOnePassShortA-8                 370ns ± 0%      282ns ± 1%  -23.81%  (p=0.008 n=5+5)
OnePassShortB-8                    280ns ± 0%      179ns ± 1%  -36.05%  (p=0.008 n=5+5)
NotOnePassShortB-8                 226ns ± 0%      185ns ± 3%  -18.26%  (p=0.008 n=5+5)
OnePassLongPrefix-8               51.7ns ± 0%     39.1ns ± 1%  -24.28%  (p=0.016 n=4+5)
OnePassLongNotPrefix-8             213ns ± 2%      132ns ± 1%  -37.86%  (p=0.008 n=5+5)
MatchParallelShared-8             25.3ns ± 3%     23.4ns ± 7%   -7.50%  (p=0.016 n=5+5)
MatchParallelCopied-8             26.5ns ± 7%     22.3ns ± 7%  -16.06%  (p=0.008 n=5+5)
QuoteMetaAll-8                    45.8ns ± 1%     45.8ns ± 1%     ~     (p=1.000 n=5+5)
QuoteMetaNone-8                   24.3ns ± 0%     24.3ns ± 0%     ~     (p=0.325 n=5+5)
Compile/Onepass-8                 1.98µs ± 0%     1.97µs ± 0%   -0.22%  (p=0.016 n=5+4)
Compile/Medium-8                  4.56µs ± 0%     4.55µs ± 1%     ~     (p=0.595 n=5+5)
Compile/Hard-8                    35.7µs ± 0%     35.3µs ± 3%     ~     (p=0.151 n=5+5)
Match/Easy0/16-8                  2.18ns ± 2%     2.19ns ± 5%     ~     (p=0.690 n=5+5)
Match/Easy0/32-8                  27.4ns ± 2%     27.6ns ± 4%     ~     (p=1.000 n=5+5)
Match/Easy0/1K-8                   246ns ± 0%      252ns ± 7%     ~     (p=0.238 n=5+5)
Match/Easy0/32K-8                 4.58µs ± 7%     4.64µs ± 5%     ~     (p=1.000 n=5+5)
Match/Easy0/1M-8                   235µs ± 0%      235µs ± 0%     ~     (p=0.886 n=4+4)
Match/Easy0/32M-8                 7.86ms ± 0%     7.86ms ± 1%     ~     (p=0.730 n=4+5)
Match/Easy0i/16-8                 2.15ns ± 0%     2.15ns ± 0%     ~     (p=0.246 n=5+5)
Match/Easy0i/32-8                  507ns ± 2%      466ns ± 4%   -8.03%  (p=0.008 n=5+5)
Match/Easy0i/1K-8                 14.7µs ± 0%     13.6µs ± 2%   -7.63%  (p=0.008 n=5+5)
Match/Easy0i/32K-8                 571µs ± 1%      570µs ± 1%     ~     (p=0.556 n=4+5)
Match/Easy0i/1M-8                 18.2ms ± 0%     18.8ms ±11%     ~     (p=0.548 n=5+5)
Match/Easy0i/32M-8                 581ms ± 0%      590ms ± 1%   +1.52%  (p=0.016 n=4+5)
Match/Easy1/16-8                  2.17ns ± 0%     2.15ns ± 0%   -0.90%  (p=0.000 n=5+4)
Match/Easy1/32-8                  25.1ns ± 0%     25.4ns ± 4%     ~     (p=0.651 n=5+5)
Match/Easy1/1K-8                   462ns ± 1%      431ns ± 4%   -6.56%  (p=0.008 n=5+5)
Match/Easy1/32K-8                 18.8µs ± 0%     18.8µs ± 1%     ~     (p=1.000 n=5+5)
Match/Easy1/1M-8                   658µs ± 0%      658µs ± 1%     ~     (p=0.841 n=5+5)
Match/Easy1/32M-8                 21.0ms ± 1%     21.0ms ± 2%     ~     (p=0.841 n=5+5)
Match/Medium/16-8                 2.15ns ± 0%     2.16ns ± 0%     ~     (p=0.714 n=4+5)
Match/Medium/32-8                  561ns ± 1%      512ns ± 5%   -8.69%  (p=0.008 n=5+5)
Match/Medium/1K-8                 16.9µs ± 0%     15.2µs ± 1%  -10.40%  (p=0.008 n=5+5)
Match/Medium/32K-8                 632µs ± 0%      631µs ± 1%     ~     (p=0.421 n=5+5)
Match/Medium/1M-8                 20.3ms ± 1%     20.1ms ± 0%     ~     (p=0.190 n=5+4)
Match/Medium/32M-8                 650ms ± 1%      646ms ± 0%   -0.58%  (p=0.032 n=5+4)
Match/Hard/16-8                   2.15ns ± 0%     2.15ns ± 1%     ~     (p=0.111 n=5+5)
Match/Hard/32-8                    870ns ± 2%      667ns ± 1%  -23.28%  (p=0.008 n=5+5)
Match/Hard/1K-8                   26.9µs ± 0%     21.0µs ± 2%  -21.83%  (p=0.008 n=5+5)
Match/Hard/32K-8                   833µs ± 0%      833µs ± 1%     ~     (p=0.548 n=5+5)
Match/Hard/1M-8                   26.6ms ± 0%     26.8ms ± 1%     ~     (p=0.905 n=4+5)
Match/Hard/32M-8                   856ms ± 0%      851ms ± 0%   -0.65%  (p=0.016 n=5+4)
Match/Hard1/16-8                  2.96µs ±12%     1.81µs ± 3%  -38.68%  (p=0.008 n=5+5)
Match/Hard1/32-8                  5.62µs ± 3%     3.48µs ± 0%  -38.07%  (p=0.016 n=5+4)
Match/Hard1/1K-8                   175µs ± 5%      108µs ± 0%  -37.85%  (p=0.016 n=5+4)
Match/Hard1/32K-8                 4.09ms ± 2%     4.05ms ± 0%   -0.85%  (p=0.016 n=5+4)
Match/Hard1/1M-8                   131ms ± 0%      131ms ± 3%     ~     (p=0.151 n=5+5)
Match/Hard1/32M-8                  4.19s ± 0%      4.20s ± 1%     ~     (p=1.000 n=5+5)
Match_onepass_regex/16-8           262ns ± 2%      170ns ± 2%  -35.13%  (p=0.008 n=5+5)
Match_onepass_regex/32-8           463ns ± 0%      306ns ± 0%  -33.90%  (p=0.008 n=5+5)
Match_onepass_regex/1K-8          13.3µs ± 2%      8.8µs ± 0%  -33.84%  (p=0.008 n=5+5)
Match_onepass_regex/32K-8          424µs ± 3%      280µs ± 1%  -33.93%  (p=0.008 n=5+5)
Match_onepass_regex/1M-8          13.4ms ± 0%      9.0ms ± 1%  -32.80%  (p=0.016 n=4+5)
Match_onepass_regex/32M-8          427ms ± 0%      288ms ± 1%  -32.60%  (p=0.008 n=5+5)

Change-Id: I02c54176ed5c9f5b5fc99524a2d5eb1c490f0ebf
Reviewed-on: https://go-review.googlesource.com/c/go/+/355789
Reviewed-by: Peter Weinberger <pjw@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
2022-06-04 20:10:54 +00:00
Ludi Rehak
e7b0559448 regexp/syntax: fix typo in comment
Fix typo in comment describing IsWordChar.

Change-Id: Ia283813cf5662e218ee6d0411fb0c1b1ad1021f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/393435
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-29 01:00:55 +00:00
Ian Lance Taylor
0bf545e51f regexp/syntax: rename ErrInvalidDepth to ErrNestingDepth
The proposal accepted the name ErrNestingDepth.

For #51684

Change-Id: I702365f19e5e1889dbcc3c971eecff68e0b08727
Reviewed-on: https://go-review.googlesource.com/c/go/+/401854
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22 22:35:03 +00:00
Ian Lance Taylor
575fd8817a regexp: change ErrInvalidDepth message to match proposal
Also update the file in $GOROOT/api/next to use proposal number.

For #51684

Change-Id: I28bfa6bc1cee98a17b13da196d41cda34d968bb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/401076
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22 00:10:17 +00:00
Russ Cox
19309779ac all: gofmt main repo
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]

Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.

For #51082.

Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-11 16:34:30 +00:00
Russ Cox
81431c7aa7 all: replace `` and '' with “ (U+201C) and ” (U+201D) in doc comments
go/doc in all its forms applies this replacement when rendering
the comments. We are considering formatting doc comments,
including doing this replacement as part of the formatting.
Apply it to our source files ahead of time.

For #51082.

Change-Id: Ifcc1f5861abb57c5d14e7d8c2102dfb31b7a3a19
Reviewed-on: https://go-review.googlesource.com/c/go/+/384262
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-05 17:52:29 +00:00
Russ Cox
1af60b2f49 regexp/syntax: add and use ErrInvalidDepth
The fix for #51112 introduced a depth check but used
ErrInternalError to avoid introduce new API in a CL that
would be backported to earlier releases.

New API accepted in proposal #51684.

This CL adds a distinct error for this case.

For #51112.
Fixes #51684.

Change-Id: I068fc70aafe4218386a06103d9b7c847fb7ffa65
Reviewed-on: https://go-review.googlesource.com/c/go/+/384617
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-04 10:59:27 +00:00
Russ Cox
690ac4071f all: remove trailing blank doc comment lines
A future change to gofmt will rewrite

	// Doc comment.
	//
	func f()

to

	// Doc comment.
	func f()

Apply that change preemptively to all doc comments.

For #51082.

Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:07 +00:00
Andy Pan
737837c9d4 regexp: use input.step() to advance one rune in Regexp.allMatches()
Change-Id: I32944f4ed519419e168e62f9ed6df63961839259
Reviewed-on: https://go-review.googlesource.com/c/go/+/359197
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-27 20:22:06 +00:00
Jinwook Jeong
40e24a942b regexp: fix typo in the overview
Correct the slice expression in the description of Index functions.

Change-Id: I97a1b670c4c7e600d858f6550b647f677ef90b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/360058
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
2022-03-02 11:49:04 +00:00
Russ Cox
452f24ae94 regexp/syntax: reject very deeply nested regexps in Parse
The regexp code assumes it can recurse over the structure of
a regexp safely. Go's growable stacks make that reasonable
for all plausible regexps, but implausible ones can reach the
“infinite recursion?” stack limit.

This CL limits the depth of any parsed regexp to 1000.
That is, the depth of the parse tree is required to be ≤ 1000.
Regexps that require deeper parse trees will return ErrInternalError.
A future CL will change the error to ErrInvalidDepth,
but using ErrInternalError for now avoids introducing new API
in point releases when this is backported.

Fixes #51112.

Change-Id: I97d2cd82195946eb43a4ea8561f5b95f91fb14c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/384616
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-02-10 15:23:05 +00:00
luochuanhang
e4ab8b0fe6 regexp: add the missing is
Change-Id: I23264972329aa3414067cd0e0986b69bb39bbeb5
GitHub-Last-Rev: d1d668a3cbe852d9a06f03369e7e635232d85139
GitHub-Pull-Request: golang/go#50650
Reviewed-on: https://go-review.googlesource.com/c/go/+/378935
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
2022-01-19 22:56:09 +00:00
Russ Cox
f229e7031a all: go fix -fix=buildtag std cmd (except for bootstrap deps, vendor)
When these packages are released as part of Go 1.18,
Go 1.16 will no longer be supported, so we can remove
the +build tags in these files.

Ran go fix -fix=buildtag std cmd and then reverted the bootstrapDirs
as defined in src/cmd/dist/buildtool.go, which need to continue
to build with Go 1.4 for now.

Also reverted src/vendor and src/cmd/vendor, which will need
to be updated in their own repos first.

Manual changes in runtime/pprof/mprof_test.go to adjust line numbers.

For #41184.

Change-Id: Ic0f93f7091295b6abc76ed5cd6e6746e1280861e
Reviewed-on: https://go-review.googlesource.com/c/go/+/344955
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2021-10-28 18:17:57 +00:00
Russ Cox
702e337174 regexp: document and implement that invalid UTF-8 bytes are the same as U+FFFD
What should it mean to run a regexp match on invalid UTF-8 bytes?
The coherent behavior options are:

1. Invalid UTF-8 does not match any character classes,
   nor a U+FFFD literal (nor \x{fffd}).
2. Each byte of invalid UTF-8 is treated identically to a U+FFFD in the input,
   as a utf8.DecodeRune loop might.

RE2 uses Rule 1.
Because it works byte at a time, it can also provide \C to match any
single byte of input, which matches invalid UTF-8 as well.
This provides the nice property that a match for a regexp without \C
is guaranteed to be valid UTF-8.

Unfortunately, today Go has an incoherent mix of these two, although
mostly Rule 2. This is a deviation from RE2, and it gives up the nice
property, but we probably can't correct that at this point.
In particular .* already matches entire inputs today, valid UTF-8 or
not, and I doubt we can break that.

This CL adopts Rule 2 officially, fixing the few places that deviate from it.

Fixes #48749.

Change-Id: I96402527c5dfb1146212f568ffa09dde91d71244
Reviewed-on: https://go-review.googlesource.com/c/go/+/354569
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2021-10-11 15:28:50 +00:00
Russ Cox
4d8db00641 all: use bytes.Cut, strings.Cut
Many uses of Index/IndexByte/IndexRune/Split/SplitN
can be written more clearly using the new Cut functions.
Do that. Also rewrite to other functions if that's clearer.

For #46336.

Change-Id: I68d024716ace41a57a8bf74455c62279bde0f448
Reviewed-on: https://go-review.googlesource.com/c/go/+/351711
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-10-06 15:53:04 +00:00
Russ Cox
2a61b3c590 regexp: fix repeat of preferred empty match
In Perl mode, (|a)* should match an empty string at the start of the
input. Instead it matches as many a's as possible.
Because (|a)+ is handled correctly, matching only an empty string,
this leads to the paradox that e* can match more text than e+
(for e = (|a)) and that e+ is sometimes different from ee*.

This is a very old bug that ultimately derives from the picture I drew
for e* in https://swtch.com/~rsc/regexp/regexp1.html. The picture is
correct for longest-match (POSIX) regexps but subtly wrong for
preferred-match (Perl) regexps in the case where e has a preferred
empty match. Pointed out by Andrew Gallant in private mail.

The current code treats e* and e+ as the same structure, with
different entry points. In the case of e* the preference list ends up
not quite in the right order, in part because the “before e” and
“after e” states are the same state. Splitting them apart fixes the
preference list, and that can be done by compiling e* as if it were
(e+)?.

Like with any bug fix, there is a very low chance of breaking a
program that accidentally depends on the buggy behavior.

RE2, Go, and Rust all have this bug, and we've all agreed to fix it,
to keep the implementations in sync.

Fixes #46123.

Change-Id: I70e742e71e0a23b626593b16ddef3c1e73b413b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/318750
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-05-13 14:52:20 +00:00
Russ Cox
d4b2638234 all: go fmt std cmd (but revert vendor)
Make all our package sources use Go 1.17 gofmt format
(adding //go:build lines).

Part of //go:build change (#41184).
See https://golang.org/design/draft-gobuild

Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/294430
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-02-20 03:54:50 +00:00
Tom Payne
1d3baf20dc regexp/syntax: add note about Unicode character classes
As proposed on golang-nuts:
https://groups.google.com/g/golang-nuts/c/M3lmSUptExQ/m/hRySV9GsCAAJ

Includes the latest updates from re2's mksyntaxgo:
https://code.googlesource.com/re2/+/refs/heads/master/doc/mksyntaxgo

Change-Id: Ib7b79aa6531f473feabd0a7f1d263cd65c4388e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/264678
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2020-11-25 15:10:09 +00:00
Andy Balholm
0ce1ffc49d regexp/syntax: append patchLists in constant time
By keeping a tail pointer, we can append to a patchList in constant
time, rather than in time proportional to the length of the list. This
gets rid of the quadratic compile times we were seeing for long series
of alternations.

This is basically the same change as
e9d517989f.

Fixes #39542.

Change-Id: Ib4ca0ca9c55abd1594df1984653c7d311ccf7572
Reviewed-on: https://go-review.googlesource.com/c/go/+/238079
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-06-18 00:40:11 +00:00
Russ Cox
4d9ecde30a regexp/syntax: fix comment on p.literal and simplify
p.literal's doc comment said it returned a value but it doesn't.
While we're here, p.newLiteral is only called from p.literal,
so simplify the code by merging the two.

Change-Id: Ia357937a99f4e7473f0f1ec837113a39eaeb83d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/222659
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-04-17 22:12:02 +00:00
Sylvain Zimmer
782fcb44b9 regexp: add (*Regexp).SubexpIndex
SubexpIndex returns the index of the first subexpression with the given name,
or -1 if there is no subexpression with that name.

Fixes #32420

Change-Id: Ie1f9d22d50fb84e18added80a9d9a9f6dca8ffc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/187919
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-10 09:38:07 +00:00
Keith Randall
9b80791584 regexp: skip long-running benchmarks if -short is specified
This CL helps race.bash finish in a reasonable amount of
time. Otherwise the Match/Hard1/32M benchmark takes over 1200 seconds
to finish on arm64, triggering a timeout.  With this change the regexp
benchmarks as a whole take only about a minute.

Change-Id: Ie2260ef9f5709e32a74bd76f135bc384b2d9853f
Reviewed-on: https://go-review.googlesource.com/c/go/+/201742
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-17 19:53:52 +00:00
Pantelis Sampaziotis
9748e64fe5 regexp: add examples for FindSubmatchIndex and Longest
updates #21450

Change-Id: Ibffe0dadc1e1523c55cd5f5b8a69bc1c399a255d
GitHub-Last-Rev: 507f55508121a525de4d210e7ada1396ccaaf367
GitHub-Pull-Request: golang/go#33497
Reviewed-on: https://go-review.googlesource.com/c/go/+/189177
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-28 00:37:35 +00:00
Liz Rice
5253a6dc72 regexp: add more examples for Regexp methods
Since I first started on this CL, most of the methods have had examples
added by other folks, so this is now one new example, and additions to
two existing examples for extra clarity.

The issue has a comment about not necessarily having examples for all
methods, but I recall finding this package pretty confusing when I first
used it, and having concrete examples would have really helped me
navigate all the different options. There are more
String methods with examples now, but I think seeing how the byte-slice
methods work could also be helpful to explain the differences.

Updates #21450

Change-Id: I27b4eeb634fb8ab59f791c0961cce79a67889826
Reviewed-on: https://go-review.googlesource.com/c/go/+/120145
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-18 20:26:36 +00:00
Pantelis Sampaziotis
68a6536848 regexp: add example for NumSubexp
Updates #21450

Change-Id: Idf276e97f816933cc0f752cdcd5e713b5c975833
GitHub-Last-Rev: 198e585f92db6e7ac126b49cd751b333e9a44b93
GitHub-Pull-Request: golang/go#33490
Reviewed-on: https://go-review.googlesource.com/c/go/+/189138
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-10 20:33:25 +00:00
psampaz
2b6b474f64 regexp: add example for ReplaceAll
Updates #21450

Change-Id: Ia31c20b52bae5daeb33d918234c2f0944a8aeb07
GitHub-Last-Rev: cc8554477024277c3c1b4122344e9d14427680b3
GitHub-Pull-Request: golang/go#33489
Reviewed-on: https://go-review.googlesource.com/c/go/+/189137
Run-TryBot: Sylvain Zimmer <sylvinus@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-05 23:52:39 +00:00