External changes:

- We now print the actual reason phrase sent by the server instead of guessing it from the status code. That is, if servers reply with "200 Wonderful" instead of "200 OK" then we show that. This is especially useful for status codes that xh doesn't recognize.
- Header values are now decoded as latin1, with the UTF-8 decoding also shown if applicable.
- A new FAQ file with an entry that explains header value encoding. Header output now hyperlinks to this entry when relevant and if supported by the terminal.

Under the hood we now color headers manually. It's still hooked up to the `.tmTheme` files but not to the `.sublime-syntax` file. This lets us highlight the latin1 header values differently. In the future we could use the same approach to optimize JSON highlighting.

I'm unsure about the position of the hyperlink. Currently it's the text "UTF-8" in `<latin1 value> (UTF-8: <utf-8 value>)`. But that means it's only shown if the value can be decoded as UTF-8. An alternative is to turn the latin1 value itself into a hyperlink, but that's confusing if the value itself is already a URL (which is a common case for the `Location` header).

I also don't feel that our text is quite distinct enough from the header value in the default `ansi` theme. Though the hyperlink does help to set it apart.
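For reviewers weighing the hyperlink placement, here is a rough Python sketch of the current layout. It is not the code in this PR. It assumes the link is emitted with the common OSC 8 terminal escape sequence, and `osc8_link`, `format_header_value`, and the FAQ URL are hypothetical names made up for illustration.

```python
from typing import Optional


def osc8_link(text: str, url: str) -> str:
    """Wrap `text` in an OSC 8 hyperlink escape sequence (assumed mechanism)."""
    return f"\x1b]8;;{url}\x1b\\{text}\x1b]8;;\x1b\\"


def format_header_value(
    latin1: str,
    utf8: Optional[str],
    faq_url: str,
    hyperlinks: bool,
) -> str:
    """Render `<latin1 value> (UTF-8: <utf-8 value>)`, linking only the "UTF-8" label."""
    if utf8 is None:
        # No UTF-8 decoding available, so there is no label to attach the link to.
        return latin1
    label = osc8_link("UTF-8", faq_url) if hyperlinks else "UTF-8"
    return f"{latin1} ({label}: {utf8})"


if __name__ == "__main__":
    # Hypothetical FAQ URL, used only for this demonstration.
    faq = "https://example.org/FAQ.md"
    print(format_header_value("â\ufffdº", "☺", faq, hyperlinks=True))
```

The `utf8 is None` branch shows the drawback described above: when the value is not valid UTF-8, nothing carries the link.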
Why do some HTTP headers show up mangled?
HTTP header values are officially only supposed to contain ASCII. Other bytes are "opaque data":
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.
(RFC 7230)
In practice some headers are for some purposes treated like UTF-8, which supports all languages and characters in Unicode. But if you try to access header values through a browser's `fetch()` API or view them in the developer tools then they tend to be decoded as ISO-8859-1, which only supports a very limited number of characters and may not be the actual intended encoding.
xh as of version 0.23.0 shows the ISO-8859-1 decoding by default to avoid a confusing difference with web browsers. If the value looks like valid UTF-8 then it additionally shows the UTF-8 decoding.
That is, the following request:
xh -v https://example.org Smile:☺
displays the `Smile` header like this:
Smile: â<>º (UTF-8: ☺)
The server will probably see â<EFBFBD>º
instead of the smiley. Or it might see ☺
after all. It depends!