From 012917afba1dfe62b37acf8f5087b98c11f64f25 Mon Sep 17 00:00:00 2001
From: Rob Pike
Date: Wed, 8 Jul 2015 15:53:47 +1000
Subject: [PATCH] doc: document the machine-independent changes to the assembler

The architecture-specific details will be updated and expanded in
a subsequent CL (or series thereof).

Update #10096

Change-Id: I59c6be1fcc123fe8626ce2130e6ffe71152c87af
Reviewed-on: https://go-review.googlesource.com/11954
Reviewed-by: Russ Cox
---
 doc/asm.html | 161 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 137 insertions(+), 24 deletions(-)

diff --git a/doc/asm.html b/doc/asm.html
index 3f116ea607..b283efde61 100644
--- a/doc/asm.html
+++ b/doc/asm.html
@@ -6,16 +6,16 @@

A Quick Guide to Go's Assembler

-This document is a quick outline of the unusual form of assembly language used by the gc -Go compiler. +This document is a quick outline of the unusual form of assembly language used by the gc Go compiler. The document is not comprehensive.

-The assembler is based on the input to the Plan 9 assemblers, which is documented in detail -on the Plan 9 site. +The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail +elsewhere. If you plan to write assembly language, you should read that document although much of it is Plan 9-specific. -This document provides a summary of the syntax and +The current document provides a summary of the syntax and the differences with +what is explained in that document, and describes the peculiarities that apply when writing assembly code to interact with Go.

@@ -25,10 +25,12 @@ Some of the details map precisely to the machine, but some do not. This is because the compiler suite (see this description) needs no assembler pass in the usual pipeline. -Instead, the compiler emits a kind of incompletely defined instruction set, in binary form, which the linker -then completes. -In particular, the linker does instruction selection, so when you see an instruction like MOV -what the linker actually generates for that operation might not be a move instruction at all, perhaps a clear or load. +Instead, the compiler operates on a kind of semi-abstract instruction set, +and instruction selection occurs partly after code generation. +The assembler works on the semi-abstract form, so +when you see an instruction like MOV +what the tool chain actually generates for that operation might +not be a move instruction at all, perhaps a clear or load. Or it might correspond exactly to the machine instruction with that name. In general, machine-specific operations tend to appear as themselves, while more general concepts like memory move and subroutine call and return are more abstract. @@ -36,13 +38,15 @@ The details vary with architecture, and we apologize for the imprecision; the si

-The assembler program is a way to generate that intermediate, incompletely defined instruction sequence -as input for the linker. +The assembler program is a way to parse a description of that +semi-abstract instruction set and turn it into instructions to be +input to the linker. If you want to see what the instructions look like in assembly for a given architecture, say amd64, there are many examples in the sources of the standard library, in packages such as runtime and math/big. -You can also examine what the compiler emits as assembly code: +You can also examine what the compiler emits as assembly code +(the actual output may differ from what you see here):

@@ -52,7 +56,7 @@ package main
 func main() {
 	println(3)
 }
-$ go tool compile -S x.go        # or: go build -gcflags -S x.go
+$ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go
 
 --- prog list "main" ---
 0000 (x.go:3) TEXT    main+0(SB),$8-0
@@ -106,20 +110,73 @@ codeblk [0x2000,0x1d059) at offset 0x1000
 
 -->
 
+

Constants

+ +

+Although the assembler takes its guidance from the Plan 9 assemblers, +it is a distinct program, so there are some differences. +One is in constant evaluation. +Constant expressions in the assembler are parsed using Go's operator +precedence, not the C-like precedence of the original. +Thus 3&1<<2 is 4, not 0—it parses as (3&1)<<2 +not 3&(1<<2). +Also, constants are always evaluated as 64-bit unsigned integers. +Thus -2 is not the integer value minus two, +but the unsigned 64-bit integer with the same bit pattern. +The distinction rarely matters but +to avoid ambiguity, division or right shift where the right operand's +high bit is set is rejected. +
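The two rules in the paragraph above can be checked from ordinary Go, since the assembler is described as using Go's operator precedence. The following is a minimal sketch rather than assembler input; the function names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// goPrecedence evaluates 3&1<<2 as Go parses it: & and << share a
// precedence level and associate left to right, giving (3&1)<<2.
func goPrecedence() uint64 { return 3 & 1 << 2 }

// cPrecedence writes out the grouping that C-like precedence would
// choose, where << binds tighter than &.
func cPrecedence() uint64 { return 3 & (1 << 2) }

// asUnsigned reinterprets a signed 64-bit value as the unsigned 64-bit
// integer with the same bit pattern.
func asUnsigned(v int64) uint64 { return uint64(v) }

func main() {
	fmt.Println(goPrecedence())         // 4
	fmt.Println(cPrecedence())          // 0
	fmt.Printf("%#x\n", asUnsigned(-2)) // 0xfffffffffffffffe
}
```

The last line shows why -2 has its high bit set when viewed as an unsigned 64-bit value, which is the situation the text says is rejected for a division or right shift with such a right operand.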

+

Symbols

-Some symbols, such as PC, R0 and SP, are predeclared and refer to registers. -There are two other predeclared symbols, SB (static base) and FP (frame pointer). -All user-defined symbols other than jump labels are written as offsets to these pseudo-registers. +Some symbols, such as R1 or LR, +are predefined and refer to registers. +The exact set depends on the architecture. +

+ +

+There are four predeclared symbols that refer to pseudo-registers. +These are not real registers, but rather virtual registers maintained by +the tool chain, such as a frame pointer. +The set of pseudo-registers is the same for all architectures: +

+ +
    + +
  • +FP: Frame pointer: arguments and locals.
  • +PC: Program counter: jumps and branches.
  • +SB: Static base pointer: global symbols.
  • +SP: Stack pointer: top of stack.
+ +

+All user-defined symbols are written as offsets to the pseudo-registers +FP (arguments and locals) and SB (globals).

The SB pseudo-register can be thought of as the origin of memory, so the symbol foo(SB) is the name foo as an address in memory. This form is used to name global functions and data. -Adding <> to the name, as in foo<>(SB), makes the name +Adding <> to the name, as in foo<>(SB), makes the name visible only in the current source file, like a top-level static declaration in a C file. +Adding an offset to the name refers to that offset from the symbol's address, so +foo+4(SB) is four bytes past the start of foo.

@@ -128,9 +185,19 @@ used to refer to function arguments. The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. Thus 0(FP) is the first argument to the function, 8(FP) is the second (on a 64-bit machine), and so on. -When referring to a function argument this way, it is conventional to place the name +However, when referring to a function argument this way, it is necessary to place a name at the beginning, as in first_arg+0(FP) and second_arg+8(FP). -Some of the assemblers enforce this convention, rejecting plain 0(FP) and 8(FP). +(The meaning of the offset, an offset from the frame pointer, is distinct +from its use with SB, where it is an offset from the symbol.) +The assembler enforces this convention, rejecting plain 0(FP) and 8(FP). +The actual name is semantically irrelevant but should be used to document +the argument's name. +It is worth stressing that FP is always a +pseudo-register, not a hardware +register, even on architectures with a hardware frame pointer. +
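To make the FP offsets concrete, here is a small Go sketch that derives the operand spellings for a function taking two 64-bit arguments. It assumes the arguments sit in consecutive 8-byte slots, as in the 64-bit example in the text; the addFrame struct and the operand and argOperands helpers are hypothetical and are not part of any tool.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame is a hypothetical stand-in for the argument area of a
// function with two int64 parameters, assumed to be laid out in order.
type addFrame struct {
	firstArg  int64
	secondArg int64
}

// operand formats a named argument reference in the name+offset(FP) form.
func operand(name string, off uintptr) string {
	return fmt.Sprintf("%s+%d(FP)", name, off)
}

// argOperands returns the FP references for both fields of addFrame.
func argOperands() []string {
	var f addFrame
	return []string{
		operand("first_arg", unsafe.Offsetof(f.firstArg)),
		operand("second_arg", unsafe.Offsetof(f.secondArg)),
	}
}

func main() {
	for _, op := range argOperands() {
		fmt.Println(op)
	}
}
```

Run as written, this prints first_arg+0(FP) and second_arg+8(FP), matching the named forms given above. The name carries no meaning for the assembler; only the offset selects the slot.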

+ +

For assembly functions with Go prototypes, go vet will check that the argument names and offsets match. On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding @@ -145,13 +212,53 @@ prepared for function calls. It points to the top of the local stack frame, so references should use negative offsets in the range [−framesize, 0): x-8(SP), y-4(SP), and so on. -On architectures with a real register named SP, the name prefix distinguishes -references to the virtual stack pointer from references to the architectural SP register. -That is, x-8(SP) and -8(SP) are different memory locations: -the first refers to the virtual stack pointer pseudo-register, while the second refers to the +

+ +

+On architectures with a hardware register named SP, +the name prefix distinguishes +references to the virtual stack pointer from references to the architectural +SP register. +That is, x-8(SP) and -8(SP) +are different memory locations: +the first refers to the virtual stack pointer pseudo-register, +while the second refers to the hardware's SP register.

+

+On machines where SP and PC are +traditionally aliases for a physical, numbered register, +in the Go assembler the names SP and PC +are still treated specially; +for instance, references to SP require a symbol, +much like FP. +To access the actual hardware register use the true R name. +For example, on the ARM architecture the hardware +SP and PC are accessible as +R13 and R15. +

+ +

+Branches and direct jumps are always written as offsets to the PC, or as +jumps to labels: +

+ +
+label:
+	MOVW $0, R1
+	JMP label
+
+ +

+Each label is visible only within the function in which it is defined. +It is therefore permitted for multiple functions in a file to define +and use the same label names. +Direct jumps and call instructions can target text symbols, +such as name(SB), but not offsets from symbols, +such as name+4(SB). +

+

Instructions, registers, and assembler directives are always in UPPER CASE to remind you that assembly programming is a fraught endeavor. @@ -312,11 +419,17 @@ This data contains no pointers and therefore does not need to be scanned by the garbage collector.

  • -WRAPPER = 32 +WRAPPER = 32
    (For TEXT items.) This is a wrapper function and should not count as disabling recover.
  • +
  • +NEEDCTXT = 64 +
    +(For TEXT items.) +This function is a closure so it uses its incoming context register. +

Runtime Coordination