xh/FAQ.md at http-over-unix-socket

mirror of https://github.com/ducaale/xh.git synced 2025-05-05 15:32:50 +00:00

Jan Verbeek 00bc6f2238 Decode headers as latin1/UTF-8, show real reason phrase

External changes:

- We now print the actual reason phrase sent by the server instead
  of guessing it from the status code. That is, if servers reply with
  "200 Wonderful" instead of "200 OK" then we show that. This is
  especially useful for status codes that xh doesn't recognize.

- Header values are now decoded as latin1, with the UTF-8 decoding
  also shown if applicable.

- A new FAQ file with an entry that explains header value encoding.
  Header output now hyperlinks to this entry when relevant and if
  supported by the terminal.

Under the hood we now color headers manually. It's still hooked up to
the `.tmTheme` files but not to the `.sublime-syntax` file. This lets
us highlight the latin1 header values differently. In the future we
could use the same approach to optimize JSON highlighting.

I'm unsure about the position of the hyperlink. Currently it's the
text "UTF-8" in `<latin1 value> (UTF-8: <utf-8 value>)`. But that
means it's only shown if the value can be decoded as UTF-8. An
alternative is to turn the latin1 value itself into a hyperlink, but
that's confusing if the value itself is already a URL (which is a
common case for the `Location` header).

I also don't feel that our text is quite distinct enough from the
header value in the default `ansi` theme. Though the hyperlink does
help to set it apart.

2024-07-04 21:34:52 +02:00

1.7 KiB

Why do some HTTP headers show up mangled?

HTTP header values are officially only supposed to contain ASCII. Other bytes are "opaque data":

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

(RFC 7230)

In practice, some headers are treated as UTF-8 for some purposes. UTF-8 supports every language and character in Unicode. But if you access header values through a browser's fetch() API or view them in the developer tools, they tend to be decoded as ISO-8859-1, which supports only a very limited set of characters and may not be the intended encoding.

xh as of version 0.23.0 shows the ISO-8859-1 decoding by default to avoid a confusing difference with web browsers. If the value looks like valid UTF-8 then it additionally shows the UTF-8 decoding.

That is, the following request:

xh -v https://example.org Smile:☺

Displays the Smile header like this:

Smile: â<>º (UTF-8: ☺)

The server will probably see â<EFBFBD>º instead of the smiley. Or it might see ☺ after all. It depends!
