unicode: add CategoryAliases, Cn, LC

CategoryAliases is for regexp to use, for things like \p{Letter} as an alias for \p{L}.
Cn and LC are special-case categories that were never implemented
but should have been.

These changes were generated by the updated generator in CL 641395.

Fixes #70780.

Change-Id: Ibba20ff76191c8ae9631ac5ba19965790fe0cc81
Reviewed-on: https://go-review.googlesource.com/c/go/+/641376
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Russ Cox 2025-01-08 11:21:30 -05:00
parent 252c939445
commit 28fd9fa8a6
4 changed files with 1420 additions and 19 deletions

api/next/70780.txt Normal file

@@ -0,0 +1,3 @@
pkg unicode, var CategoryAliases map[string]string #70780
pkg unicode, var Cn *RangeTable #70780
pkg unicode, var LC *RangeTable #70780

@@ -0,0 +1,4 @@
The new [CategoryAliases] map provides access to category alias names, such as “Letter” for “L”.
The new categories [Cn] and [LC] define unassigned codepoints and cased letters, respectively.
These have always been defined by Unicode but were inadvertently omitted in earlier versions of Go.
The [C] category now includes [Cn], meaning it has added all unassigned code points.

@@ -52,6 +52,10 @@ var inCategoryTest = []T{
 	{0x00bb, "P"},
 	{0x00a2, "S"},
 	{0x00a0, "Z"},
+	{0x0065, "LC"},
+	// Unassigned
+	{0x0378, "Cn"},
+	{0x0378, "C"},
 }
 
 var inPropTest = []T{

File diff suppressed because it is too large.