An optional interface FileInfoNames has been added.
If the parameter fi of FileInfoHeader implements the interface
the Gname/Uname of the return value Header
are provided by the method of the interface.
Also added testing.
Fixes#50102
Change-Id: I47976e238eb20ed43113b060e4f83a14ae49493e
GitHub-Last-Rev: a213613c79e150d52a2f5c84dca7a49fe123fa40
GitHub-Pull-Request: golang/go#65273
Reviewed-on: https://go-review.googlesource.com/c/go/+/558355
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This reverts CL 514235. Also reverts CL 518056 which is a followup
fix.
Reason for revert: Proposal #50102 defined an interface that is
too specific to UNIX-y systems and also didn't make much sense.
The proposal is un-accepted, and we'll revisit in Go 1.23.
Fixes (via backport) #65245.
Updates #50102.
Change-Id: I41ba0ee286c1d893e6564a337e5d76418d19435d
Reviewed-on: https://go-review.googlesource.com/c/go/+/558295
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Change-Id: I813aa09f8a65936796469fa637d0f23004d26098
Reviewed-on: https://go-review.googlesource.com/c/go/+/534757
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
For #50102
Change-Id: I28b5579611b07952b6379bc4603daf29a86a3be0
Reviewed-on: https://go-review.googlesource.com/c/go/+/518056
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Tianon Gravi (Andrew) <admwiggin@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
An optional interface FileInfoNames has been added.
If the parameter fi of FileInfoHeader implements the interface
the Gname and Uname of the return value Header are
provided by the method of the interface.
Also added testing.
Fixes#50102
Change-Id: I6fd06c7c9aaf29b22b7384542fe57affed33009a
Change-Id: I6fd06c7c9aaf29b22b7384542fe57affed33009a
GitHub-Last-Rev: 5e82257948759e13880d8af12743b9524ae3df5a
GitHub-Pull-Request: golang/go#61662
Reviewed-on: https://go-review.googlesource.com/c/go/+/514235
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Change-Id: I23e0005071fcbafeaecaa05f51712dd1de6eed01
Change-Id: I23e0005071fcbafeaecaa05f51712dd1de6eed01
GitHub-Last-Rev: 364d7c74fef1668930b730b05a7539f7ac43e60a
GitHub-Pull-Request: golang/go#61661
Reviewed-on: https://go-review.googlesource.com/c/go/+/514215
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
The new String methods use the new FormatFileInfo and
FormatDirEntry functions.
Fixes#54451
Change-Id: I414cdfc212ec3c316fb2734756d2117842a23631
Reviewed-on: https://go-review.googlesource.com/c/go/+/491175
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
End-of-line comments are not doc comments,
so Deprecated notes in them are not recognized
as deprecation notices. Rewrite the comments.
Change-Id: Idb19603d7fc2ec8e3a2f74bacb74fbbec5583d20
Reviewed-on: https://go-review.googlesource.com/c/go/+/453615
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Return a distinguishable error when reading an archive file
with a path that is:
- absolute
- escapes the current directory (../a)
- on Windows, a reserved name such as NUL
Users may ignore this error and proceed if they do not need name
sanitization or intend to perform it themselves.
Fixes#25849Fixes#55356
Change-Id: Ieefa163f00384bc285ab329ea21a6561d39d8096
Reviewed-on: https://go-review.googlesource.com/c/go/+/449937
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Damien Neil <dneil@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]
Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.
For #51082.
Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.
Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.
A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.
Fixes#49884.
Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Many of the methods inside the archive/tar package don't need to be
exported. Doing so sets a bad precedent that it's OK to export methods
to indicate an internal public API. That's not a good idea in general,
because exported methods increase cognitive load when reading code:
the reader needs to consider whether the exported method might be used
via some external interface or reflection.
This CL should have no externally visible behaviour changes at all.
Change-Id: I94a63de5e6a28e9ac8a283325217349ebce4f308
Reviewed-on: https://go-review.googlesource.com/c/go/+/341410
Reviewed-by: Joe Tsai <joetsai@digital-static.net>
Trust: Joe Tsai <joetsai@digital-static.net>
Trust: Michael Knyszek <mknyszek@google.com>
The old os references are still valid, but update our code
to reflect best practices and get used to the new locations.
Code compiled with the bootstrap toolchain
(cmd/asm, cmd/dist, cmd/compile, debug/elf)
must remain Go 1.4-compatible and is excluded.
For #41190.
Change-Id: I8f9526977867c10a221e2f392f78d7dec073f1bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/243907
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Change Reader to promote TypeRegA to TypeReg in headers, unless their
name have a trailing slash which is already promoted to TypeDir. This
will allow client code to handle just TypeReg instead both TypeReg and
TypeRegA.
Change Writer to promote TypeRegA to TypeReg or TypeDir in the headers
depending on whether the name has a trailing slash. This normalization
is motivated by the specification (in pax(1)):
0 represents a regular file. For backwards-compatibility, a
typeflag value of binary zero ( '\0' ) should be recognized as
meaning a regular file when extracting files from the
archive. Archives written with this version of the archive file
format create regular files with a typeflag value of the
ISO/IEC 646:1991 standard IRV '0'.
Fixes#22768.
Change-Id: I149ec55824580d446cdde5a0d7a0457ad7b03466
Reviewed-on: https://go-review.googlesource.com/85656
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Several usages of tar (reasonably) just use the Header.FileInfo
to determine the type of the header. However, the os.FileMode type
is not expressive enough to represent "files" that are not files
at all, but some form of metadata.
Thus, Header{Typeflag: TypeXGlobalHeader}.FileInfo().Mode().IsRegular()
reports true, even though the expected result may have been false.
To reduce (not eliminate) the possibility of failure for such usages,
use the placeholder filename from the global PAX headers.
Thus, in the event the user did not handle special "meta" headers
specifically, they will just be written to disk as a regular file.
As an example use case, the "git archive --format=tgz" command produces
an archive where the first "file" is a global PAX header with the
name "global_pax_header". For users that do not explicitly check
the Header.Typeflag field to ignore such headers, they may end up
extracting a file named "global_pax_header". While it is a bogus file,
it at least does not stop the extraction process.
Updates #22748
Change-Id: I28448b528dcfacb4e92311824c33c71b482f49c9
Reviewed-on: https://go-review.googlesource.com/78355
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL removes the following APIs:
type SparseEntry struct{ ... }
type Header struct{ SparseHoles []SparseEntry; ... }
func (*Header) DetectSparseHoles(f *os.File) error
func (*Header) PunchSparseHoles(f *os.File) error
func (*Reader) WriteTo(io.Writer) (int, error)
func (*Writer) ReadFrom(io.Reader) (int, error)
This API was added during the Go1.10 dev cycle, and are safe to remove.
The rationale for reverting is because Header.DetectSparseHoles and
Header.PunchSparseHoles are functionality that probably better belongs in
the os package itself.
The other API like Header.SparseHoles, Reader.WriteTo, and Writer.ReadFrom
perform no OS specific logic and only perform the actual business logic of
reading and writing sparse archives. Since we do know know what the API added to
package os may look like, we preemptively revert these non-OS specific changes
as well by simply commenting them out.
Updates #13548
Updates #22735
Change-Id: I77842acd39a43de63e5c754bfa1c26cc24687b70
Reviewed-on: https://go-review.googlesource.com/78030
Reviewed-by: Russ Cox <rsc@golang.org>
Change error message prefix from "tar:" to "archive/tar:" to maintain
backwards compatibility with Go1.9 and earlier in the unfortunate event
that someone is relying on string parsing of errors.
Fixes#22740
Change-Id: I59039c59818a0599e9d3b06bb5a531aa22a389b8
Reviewed-on: https://go-review.googlesource.com/77933
Reviewed-by: roger peppe <rogpeppe@gmail.com>
CL 59230 changed Writer.WriteHeader to ignore the ChangeTime and AccessTime
fields when considering using the USTAR format when the format is unspecified.
This policy is confusing and leads to unexpected behavior where some files
have ModTime only, while others have ModTime+AccessTime+ChangeTime if the
format became PAX for some unrelated reason (e.g., long pathname).
Change the policy to simply always ignore ChangeTime, AccessTime, and
sub-second time resolutions unless the user explicitly specifies a format.
This is a safe policy change since WriteHeader had no support for the
above features in any Go release.
Support for ChangeTime and AccessTime was added in CL 55570.
Support for sub-second times was added in CL 55552.
Both CLs landed after the latest Go release (i.e., Go1.9), which was
cut from the master branch around August 6th, 2017.
Change-Id: Ib82baa1bf9dd4573ed4f674b7d55d15f733a4843
Reviewed-on: https://go-review.googlesource.com/69296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The USTAR format says:
<<<
Implementors should be aware that the previous file format did not include
a mechanism to archive directory type files.
For this reason, the convention of using a filename ending with
<slash> was adopted to specify a directory on the archive.
>>>
In light of this suggestion, make the following changes:
* Writer.WriteHeader refuses to encode a header where a file that
is obviously a file-type has a trailing slash in the name.
* formatter.formatString avoids encoding a trailing slash in the event
that the string is truncated (the full string will be encoded elsewhere,
so stripping the slash is safe).
* Reader.Next treats a TypeRegA (which is the zero value of Typeflag)
as a TypeDir if the name has a trailing slash.
Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6
Reviewed-on: https://go-review.googlesource.com/69293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
To support the detection and creation of sparse files,
add two new methods:
func Header.DetectSparseHoles(*os.File) error
func Header.PunchSparseHoles(*os.File) error
DetectSparseHoles is intended to be used after FileInfoHeader
prior to serializing the Header with WriteHeader.
For each OS, it uses specialized logic to detect
the location of sparse holes. On most Unix systems, it uses
SEEK_HOLE and SEEK_DATA to query for the holes.
On Windows, it uses a specialized the FSCTL_QUERY_ALLOCATED_RANGES
syscall to query for all the holes.
PunchSparseHoles is intended to be used after Reader.Next
prior to populating the file with Reader.WriteTo.
On Windows, this uses the FSCTL_SET_ZERO_DATA syscall.
On other operating systems it simply truncates the file
to the end-offset of SparseHoles.
DetectSparseHoles and PunchSparseHoles are added as methods on
Header because they are heavily tied to the operating system,
for which there is already an existing precedence for
(since FileInfoHeader makes uses of OS-specific details).
Fixes#13548
Change-Id: I98a321dd1ce0165f3d143d4edadfda5e7db67746
Reviewed-on: https://go-review.googlesource.com/60871
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
To support the efficient packing and extracting of sparse files,
add two new methods:
func Reader.WriteTo(io.Writer) (int64, error)
func Writer.ReadFrom(io.Reader) (int64, error)
If the current archive entry is sparse and the provided io.{Reader,Writer}
is also an io.Seeker, then use Seek to skip past the holes.
If the last region in a file entry is a hole, then we seek to 1 byte
before the EOF:
* for Reader.WriteTo to write a single byte
to ensure that the resulting filesize is correct.
* for Writer.ReadFrom to read a single byte
to verify that the input filesize is correct.
The downside of this approach is when the last region in the sparse file
is a hole. In the case of Reader.WriteTo, the 1-byte write will cause
the last fragment to have a single chunk allocated.
However, the goal of ReadFrom/WriteTo is *not* the ability to
exactly reproduce sparse files (in terms of the location of sparse holes),
but rather to provide an efficient way to create them.
File systems already impose their own restrictions on how the sparse file
will be created. Some filesystems (e.g., HFS+) don't support sparseness and
seeking forward simply causes the FS to write zeros. Other filesystems
have different chunk sizes, which will cause chunk allocations at boundaries
different from what was in the original sparse file. In either case,
it should not be a normal expectation of users that the location of holes
in sparse files exactly matches the source.
For users that really desire to have exact reproduction of sparse holes,
they can wrap os.File with their own io.WriteSeeker that discards the
final 1-byte write and uses File.Truncate to resize the file to the
correct size.
Other reasons we choose this approach over special-casing *os.File because:
* The Reader already has special-case logic for io.Seeker
* As much as possible, we want to decouple OS-specific logic from
Reader and Writer.
* This allows other abstractions over *os.File to also benefit from
the "skip past holes" logic.
* It is easier to test, since it is harder to mock an *os.File.
Updates #13548
Change-Id: I0a4f293bd53d13d154a946bc4a2ade28a6646f6a
Reviewed-on: https://go-review.googlesource.com/60872
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Use "file" consistently instead of "entry".
Change-Id: Ia81c9665d0d956adb78f7fa49de40cdb87fba000
Reviewed-on: https://go-review.googlesource.com/60150
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Nearly every Header obtained from FileInfoHeader via the FS has
timestamps with sub-second resolution and the AccessTime
and ChangeTime fields populated. This forces the PAX format
to almost always be used, which has the following problems:
* PAX is still not as widely supported compared to USTAR
* The PAX headers will occupy at minimum 1KiB for every entry
The old behavior of tar Writer had no support for sub-second resolution
nor any support for AccessTime or ChangeTime, so had neither problem.
Instead the Writer would just truncate sub-second information and
ignore the AccessTime and ChangeTime fields.
In this CL, we preserve the behavior such that the *default* behavior
would output a USTAR header for most cases by truncating sub-second
time measurements and ignoring AccessTime and ChangeTime.
To use either of the features, users will need to explicitly specify
that the format is PAX or GNU.
The exact policy chosen is this:
* USTAR and GNU may still be chosen even if sub-second measurements
are present; they simply truncate the timestamp to the nearest second.
As before, PAX uses sub-second resolutions.
* If the Format is unspecified, then WriteHeader ignores AccessTime
and ChangeTime when using the USTAR format.
This ensures that USTAR may still be chosen for a vast majority of
file entries obtained through FileInfoHeader.
Updates #11171
Updates #17876
Change-Id: Icc5274d4245922924498fd79b8d3ae94d5717271
Reviewed-on: https://go-review.googlesource.com/59230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Many aspects of the package is woefully undocumented.
With the recent flurry of improvements, the package is now at feature
parity with the GNU and TAR tools. Thoroughly all of the public API
and perform some minor stylistic cleanup in some code segments.
Change-Id: Ic892fd72c587f30dfe91d1b25b88c9c8048cc389
Reviewed-on: https://go-review.googlesource.com/59210
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The PAX specification says the following:
<<<
'g' represents global extended header records for the following files in the archive.
The format of these extended header records shall be as described in pax Extended Header.
Each value shall affect all subsequent files that do not override that value
in their own extended header record and until another global extended header record
is reached that provides another value for the same field.
>>>
This CL adds support for parsing and composing global PAX records,
but intentionally does not provide support for automatically
persisting the global state across files.
Changes made:
* When Reader encounters a TypeXGlobalRecord header, it parses the
PAX records and returns them to the user ad-verbatim. Reader does not
store them in its state, ensuring it has no effect on future Next calls.
* When Writer receives a TypeXGlobalRecord header, it writes the
PAX records to the archive ad-verbatim. It does not store them in
its state, ensuring it has no effect on future WriteHeader calls.
* The restriction regarding empty record values is lifted since this
value is used to represent deletion in global headers.
Why provide raw support only:
* Some archives in the wild have a global header section (often empty)
and it is the user's responsibility to manually read and discard it's body.
The logic added here allows users to more easily skip over these sections.
* For users that do care about global headers, having access to the raw
records allows them to implement the functionality of global headers themselves
and manually persist the global state across files.
* We can still upgrade to a full implementation in the future.
Why we don't provide full support:
* Even though the PAX specification describes their operation in detail,
both the GNU and BSD tar tools (which are the most common implementations)
do not have a consistent interpretation of many details.
* Global headers were a controversial feature in PAX, by admission of the
specification itself:
<<<
The concept of a global extended header (typeflag g) was controversial.
The typeflag g global headers should not be used with interchange media that
could suffer partial data loss in transporting the archive.
>>>
* Having state persist from entry-to-entry complicates the implementation
for a feature that is not widely used and not well supported.
Change-Id: I1d904cacc2623ddcaa91525a5470b7dbe226c7e8
Reviewed-on: https://go-review.googlesource.com/59190
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
This CL adds the following new publicly visible API:
type Header struct { ...; PAXRecords map[string]string }
The new Header.PAXRecords field is a map of all PAX extended header records.
We suggest (but do not enforce) that users use VENDOR-prefixed keys
according to the following in the PAX specification:
<<<
The standard developers have reserved keyword name space for vendor extensions.
It is suggested that the format to be used is:
VENDOR.keyword
where VENDOR is the name of the vendor or organization in all uppercase letters.
>>>
When reading, the Header.PAXRecords is populated with all PAX records
encountered so far, including basic ones (e.g., "path", "mtime", etc).
When writing, the fields of Header will be merged into PAXRecords,
overwriting any records that may conflict.
Since PAXRecords is a more expressive feature than Xattrs and
is entirely a superset of Xattrs, we mark Xattrs as deprecated,
and steer users towards the new PAXRecords API.
The issue has a discussion about adding a Header.SetPAXRecord method
to help validate records and keep the Header fields in sync.
However, we do not include that in this CL since that helper
method can always be added in the future.
There is no support for global records.
Fixes#14472
Change-Id: If285a52749acc733476cf75a2c7ad15bc1542071
Reviewed-on: https://go-review.googlesource.com/58390
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
WriteHeader may fail to encode a header for any number of reasons,
which can be frustrating for the user when trying to create a tar archive.
As we validate the Header, we generate an informative error message
intended for human consumption and return that if and only if no
format can be selected.
This allows WriteHeader to return informative errors like:
tar: cannot encode header: invalid PAX record: "linkpath = \x00hello"
tar: cannot encode header: invalid PAX record: "SCHILY.xattr.foo=bar = baz"
tar: cannot encode header: Format specifies GNU; and only PAX supports Xattrs
tar: cannot encode header: Format specifies GNU; and GNU cannot encode ModTime=1969-12-31 15:59:59.0000005 -0800 PST
tar: cannot encode header: Format specifies GNU; and GNU supports sparse files only with TypeGNUSparse
tar: cannot encode header: Format specifies USTAR; and USTAR cannot encode ModTime=292277026596-12-04 07:30:07 -0800 PST
tar: cannot encode header: Format specifies USTAR; and USTAR does not support sparse files
tar: cannot encode header: Format specifies PAX; and only GNU supports TypeGNUSparse
Updates #18710
Change-Id: I82a498d6f29d02c4e73bce47b768eb578da8499c
Reviewed-on: https://go-review.googlesource.com/58310
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The Reader and Writer are now at feature parity,
meaning that everything that can be parsed by the Reader,
can also be composed by the Writer.
This position enables us to support selection of the format
in a backwards compatible way, since it ensures that everything
that can be read can also be round-trip written.
As such, we add the following new API:
type Format int
const FormatUnknown Format = 0 ...
type Header struct { ...; Format Format }
The new Header.Format field is populated by the Reader on the
best guess on what the format is. Note that the Reader is very liberal
in what it permits, so a hybrid TAR file using aspects of multiple
formats can still be decoded, but will be reported as FormatUnknown.
Even though Reader has full support for V7 and basic support for STAR,
it will still report those formats as unknown (and the constants for
those formats are not even exported). The reasons for this is because
the Writer has no support for V7 or STAR. Leaving it as unknown allows
the Writer to choose a format usually USTAR or GNU that can encode
the equivalent Header.
When writing, the Header.allowedFormats will take the Format field
into consideration if it is a known format.
Fixes#18710
Change-Id: I00980c475d067c6969d3414e1ff0224fdd89cd49
Reviewed-on: https://go-review.googlesource.com/58230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This CL is the second step (of two; part1 is CL/56771) for adding
sparse file support to the Writer.
There are no new identifiers exported in this CL, but this does make
use of Header.SparseHoles added in part1. If the Typeflag is set to
TypeGNUSparse or len(SparseHoles) > 0, then the Writer will emit an
sparse file, where the holes must be written by the user as zeros.
If TypeGNUSparse is set, then the output file must use the GNU format.
Otherwise, it must use the PAX format (with GNU-defined PAX keys).
A future CL may export Reader.Discard and Writer.FillZeros,
but those methods are currently unexported, and only used by the
tests for efficiency reasons.
Calling Discard or FillZeros on a hole 10GiB in size does take
time, even if it is essentially a memcopy.
Updates #13548
Change-Id: Id586d9178c227c0577f796f731ae2cbb72355601
Reviewed-on: https://go-review.googlesource.com/57212
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This CL is the first step (of two) for adding sparse file support
to the Writer. This CL only refactors the logic of sparse-file handling
in the Reader so that common logic can be easily shared by the Writer.
As a result of this CL, there are some new publicly visible API changes:
type SparseEntry struct { Offset, Length int64 }
type Header struct { ...; SparseHoles []SparseEntry }
A new type is defined to represent a sparse fragment and a new field
Header.SparseHoles is added to represent the sparse holes in a file.
The API intentionally represent sparse files using hole fragments,
rather than data fragments so that the zero value of SparseHoles
naturally represents a normal file (i.e., a file without any holes).
The Reader now populates SparseHoles for sparse files.
It is necessary to export the sparse hole information, otherwise it would
be impossible for the Writer to specify that it is trying to encode
a sparse file, and what it looks like.
Some unexported helper functions were added to common.go:
func validateSparseEntries(sp []SparseEntry, size int64) bool
func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry
func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry
The validation logic that used to be in newSparseFileReader is now moved
to validateSparseEntries so that the Writer can use it in the future.
alignSparseEntries is currently unused by the Reader, but will be used
by the Writer in the future. Since TAR represents sparse files by
only recording the data fragments, we add the invertSparseEntries
function to convert a list of data fragments to a normalized list
of hole fragments (and vice-versa).
Some other high-level changes:
* skipUnread is deleted, where most of it's logic is moved to the
Discard methods on regFileReader and sparseFileReader.
* readGNUSparsePAXHeaders was rewritten to be simpler.
* regFileReader and sparseFileReader were completely rewritten
in simpler and easier to understand logic.
* A bug was fixed in sparseFileReader.Read where it failed to
report an error if the logical size of the file ends before
consuming all of the underlying data.
* The tests for sparse-file support was completely rewritten.
Updates #13548
Change-Id: Ic1233ae5daf3b3f4278fe1115d34a90c4aeaf0c2
Reviewed-on: https://go-review.googlesource.com/56771
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The GNU tar format defines the following type flags:
TypeGNULongName = 'L' // Next file has a long name
TypeGNULongLink = 'K' // Next file symlinks to a file w/ a long name
Anytime a string exceeds the field dedicated to store it, the GNU format
permits a fake "file" to be prepended where that file entry has a Typeflag
of 'L' or 'K' and the contents of the file is a NUL-terminated string.
Contrary to previous TODO comments,
the GNU format supports arbitrary strings (without NUL) rather UTF-8 strings.
The manual says the following:
<<<
The name, linkname, magic, uname, and gname are
null-terminated character strings
>>>
<<<
All characters in header blocks are represented
by using 8-bit characters in the local variant of ASCII.
>>>
From this description, we gather the following:
* We must forbid NULs in any GNU strings
* Any 8-bit value (other than NUL) is permitted
Since the modern world has moved to UTF-8, it is really difficult to
determine what a "local variant of ASCII" means. For this reason,
we treat strings as just an arbitrary binary string (without NUL)
and leave it to the user to determine the encoding of this string.
(Practically, it seems that UTF-8 is the typical encoding used
in GNU archives seen in the wild).
The implementation of GNU tar seems to confirm this interpretation
of the manual where it permits any arbitrary binary string to exist
within these fields so long as they do not contain the NUL character.
$ touch `echo -e "not\x80\x81\x82\x83utf8"`
$ gnutar -H gnu --tar -cvf gnu-not-utf8.tar $(echo -e "not\x80\x81\x82\x83utf8")
The fact that we permit arbitrary binary in GNU strings goes
hand-in-hand with the fact that GNU also permits a "base-256" encoding
of numeric fields, which is effectively two-complement binary.
Change-Id: Ic037ec6bed306d07d1312f0058594bd9b64d9880
Reviewed-on: https://go-review.googlesource.com/55573
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The logic for USTAR was disabled because a previous implementation of
Writer had a wrong understanding of the differences between USTAR and GNU,
causing the prefix field is incorrectly be populated in GNU files.
Now that this issue has been fixed, we can re-enable the logic for USTAR
path splitting, which allows Writer to use the USTAR for a wider range
of possible inputs.
Updates #9683
Updates #12594
Updates #17630
Change-Id: I9fe34e5df63f99c6dd56fee3a7e7e4d6ec3995c9
Reviewed-on: https://go-review.googlesource.com/55574
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Move all sentinel errors to common.go since some of them are
returned by both the reader and writer and remove errInvalidHeader
since it not used.
Also, consistently use the "tar: " prefix for errors.
Change-Id: I0afb185bbf3db80dfd9595321603924454a4c2f9
Reviewed-on: https://go-review.googlesource.com/55650
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Both the GNU and PAX formats support atime and ctime fields.
The implementation is trivial now that we have:
* support for formatting PAX records for timestamps
* dedicated methods that only handle one format (e.g., GNU)
Fixes#17876
Change-Id: I0c604fce14a47d722098afc966399cca2037395d
Reviewed-on: https://go-review.googlesource.com/55570
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
We forbid empty keys or keys with '=' because it leads to ambiguous parsing.
Relevent PAX specification:
<<<
A keyword shall not include an <equals-sign>.
>>>
Also, we forbid the writer from encoding records with an empty value.
While, this is a valid record syntactically, the semantics of an empty
value is that previous records with that key should be deleted.
Since we have no support (and probably never will) for global PAX records,
deletion is a non-sensible operation.
<<<
If the <value> field is zero length,
it shall delete any header block field,
previously entered extended header value,
or global extended header value of the same name.
>>>
Fixes#20698Fixes#15567
Change-Id: Ia29c5c6ef2e36cd9e6d7f6cff10e92b96a62f0d1
Reviewed-on: https://go-review.googlesource.com/55571
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add support for PAX subsecond resolution times. Since the parser
supports negative timestamps, the formatter also handles negative
timestamps.
The relevant PAX specification is:
<<<
Portable file timestamps cannot be negative. If pax encounters a
file with a negative timestamp in copy or write mode, it can reject
the file, substitute a non-negative timestamp, or generate a
non-portable timestamp with a leading '-'.
>>>
<<<
All of these time records shall be formatted as a decimal
representation of the time in seconds since the Epoch.
If a <period> ( '.' ) decimal point character is present,
the digits to the right of the point shall represent the units of
a subsecond timing granularity, where the first digit is tenths of
a second and each subsequent digit is a tenth of the previous digit.
>>>
Fixes#11171
Change-Id: Ied108f3d2654390bc1b0ddd66a4081c2b83e490b
Reviewed-on: https://go-review.googlesource.com/55552
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The current logic in writeHeader attempts to encode the Header in one
format and if it discovered that it could not it would attempt to
switch to a different format mid-way through. This makes it very
hard to reason about what format will be used in the end and whether
it will even be a valid format.
Instead, we should verify from the start what formats are allowed
to encode the given input Header. If no formats are possible,
then we can return immediately, rejecting the Header.
For now, we continue on to the hairy logic in writeHeader, but
a future CL can split that logic up and specialize them for each
format now that we know what is possible.
Update #9683
Update #12594
Change-Id: I8406ea855dfcb8b478a03a7058ddf8b2b09d46dc
Reviewed-on: https://go-review.googlesource.com/54433
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
When writing tar files by using the FileInfoHeader
the type bits was set in the mode field of the header
This is not correct according to the standard (GNU/Posix) and
other implementations.
Fixed#20150
Change-Id: I3be7d946a1923ad5827cf45c696546a5e287ebba
Reviewed-on: https://go-review.googlesource.com/42093
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Most calls to strconv.ParseInt(x, 10, 0) should really be
calls to strconv.ParseInt(x, 10, 64) in order to ensure that they
do not overflow on 32b architectures.
Furthermore, we should document a bug where Uid and Gid may
overflow on 32b machines since the type is declared as int.
Change-Id: I99c0670b3c2922e4a9806822d9ad37e1a364b2b8
Reviewed-on: https://go-review.googlesource.com/28472
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Move all parse/format related functionality into strconv.go
and thoroughly test them. This also reduces the amount of noise
inside reader.go and writer.go.
There was zero functionality change other than moving code around.
Change-Id: I3bc288d10c20ebb3814b30b75d8acd7be62b85d7
Reviewed-on: https://go-review.googlesource.com/28470
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The Reader and Writer have hard-coded constants regarding the
offsets and lengths of certain fields in the tar format sprinkled
all over. This makes it harder to verify that the offsets are
correct since a reviewer would need to search for them throughout
the code. Instead, all information about the layout of header
fields should be centralized in one single file. This has the
advantage of being both centralized, and also acting as a form
of documentation about the header struct format.
This method was chosen over using "encoding/binary" since that
method would cause an allocation of a header struct every time
binary.Read was called. This method causes zero allocations and
its logic is no longer than if structs were declared.
Updates #12594
Change-Id: Ic7a0565d2a2cd95d955547ace3b6dea2b57fab34
Reviewed-on: https://go-review.googlesource.com/14669
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Certain special type-flags, specifically 1, 2, 3, 4, 5, 6,
do not have a data section. Thus, regardless of what the size field
says, we should not attempt to read any data for these special types.
The relevant PAX and USTAR specification says:
<<<
If the typeflag field is set to specify a file to be of type 1 (a link)
or 2 (a symbolic link), the size field shall be specified as zero.
If the typeflag field is set to specify a file of type 5 (directory),
the size field shall be interpreted as described under the definition
of that record type. No data logical records are stored for types 1, 2, or 5.
If the typeflag field is set to 3 (character special file),
4 (block special file), or 6 (FIFO), the meaning of the size field is
unspecified by this volume of POSIX.1-2008, and no data logical records shall
be stored on the medium.
Additionally, for type 6, the size field shall be ignored when reading.
If the typeflag field is set to any other value, the number of logical
records written following the header shall be (size+511)/512, ignoring
any fraction in the result of the division.
>>>
Contrary to the specification, we do not assert that the size field
is zero for type 1 and 2 since we liberally accept non-conforming formats.
Change-Id: I666b601597cb9d7a50caa081813d90ca9cfc52ed
Reviewed-on: https://go-review.googlesource.com/16614
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The issue was identified while
working with round trip FileInfo of the headers of hardlinks. Also,
additional test cases for hard link handling.
(review carried over from http://golang.org/cl/165860043)
Fixes#9027
Change-Id: I9e3a724c8de72eb1b0fbe0751a7b488894911b76
Reviewed-on: https://go-review.googlesource.com/6790
Reviewed-by: Russ Cox <rsc@golang.org>