diff --git a/doc/asm.html b/doc/asm.html
index 3a05d46aeb..c954079b66 100644
--- a/doc/asm.html
+++ b/doc/asm.html
@@ -738,6 +738,13 @@ The other codes are -> (arithmetic right shift),
 The ARM64 port is in an experimental state.

+

+R18 is the "platform register", reserved on the Apple platform.
+R27 and R28 are reserved by the compiler and linker.
+R29 is the frame pointer.
+R30 is the link register.
+

+

 Instruction modifiers are appended to the instruction following a period.
 The only modifiers are P (postincrement) and W
@@ -752,11 +759,61 @@ Addressing modes:

+

+Reference: Go ARM64 Assembly Instructions Reference Manual
+

+

64-bit PowerPC, a.k.a. ppc64
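
Reviewer note: the reserved-register paragraph added to doc/asm.html above can be restated as a small lookup. This is only an illustrative sketch; the function name `reservedRole` is hypothetical and is not part of the Go toolchain, and the role strings simply paraphrase the added text.

```go
package main

import "fmt"

// reservedRole returns the role that the added asm.html text assigns to
// ARM64 register Rn, or "" when the text does not list Rn as reserved.
// Hypothetical helper for illustration only.
func reservedRole(n int) string {
	switch n {
	case 18:
		return "platform register (reserved on the Apple platform)"
	case 27, 28:
		return "reserved by the compiler and linker"
	case 29:
		return "frame pointer"
	case 30:
		return "link register"
	}
	return ""
}

func main() {
	for _, n := range []int{0, 18, 27, 28, 29, 30} {
		role := reservedRole(n)
		if role == "" {
			role = "not listed as reserved"
		}
		fmt.Printf("R%d: %s\n", n, role)
	}
}
```

Hand-written assembly that wants to stay portable would avoid every register for which this lookup returns a non-empty role.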

diff --git a/src/cmd/internal/obj/arm64/doc.go b/src/cmd/internal/obj/arm64/doc.go
index d06025d21c..d98b1b6f9e 100644
--- a/src/cmd/internal/obj/arm64/doc.go
+++ b/src/cmd/internal/obj/arm64/doc.go
@@ -1,334 +1,201 @@
-// Copyright 2017 The Go Authors. All rights reserved.
+// Copyright 2018 The Go Authors. All rights reserved.
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
-package arm64
-
 /*
+Package arm64 implements an ARM64 assembler. Go assembly syntax is different from GNU ARM64
+syntax, but we can still follow the general rules to map between them.
-Go Assembly for ARM64 Reference Manual
-
-1. Alphabetical list of basic instructions
- // TODO
-
- LDARB: Load-Acquire Register Byte
- LDARB (),
- Loads a byte from memory, zero-extends it and writes it to Rd.
-
- LDARH: Load-Acquire Register Halfword
- LDARH (),
- Loads a halfword from memory, zero-extends it and writes it to Rd.
-
- LDAXP: Load-Acquire Exclusive Pair of Registers
- LDAXP (), (, )
- Loads two 64-bit doublewords from memory, and writes them to Rt1 and Rt2.
-
- LDAXPW: Load-Acquire Exclusive Pair of Registers
- LDAXPW (), (, )
- Loads two 32-bit words from memory, and writes them to Rt1 and Rt2.
-
- LDXP: 64-bit Load Exclusive Pair of Registers
- LDXP (), (, )
- Loads two 64-bit doublewords from memory, and writes them to Rt1 and Rt2.
-
- LDXPW: 32-bit Load Exclusive Pair of Registers
- LDXPW (), (, )
- Loads two 32-bit words from memory, and writes them to Rt1 and Rt2.
-
- MOVD|MOVW|MOVH|MOVHU|MOVB|MOVBU: Load Register (register offset)
- MOVD (Rn)(Rm.UXTW<<3), Rt
- MOVD (Rn)(Rm.SXTX), Rt
- MOVD (Rn)(Rm<<3), Rt
- MOVD (Rn)(Rm), Rt
- MOVB|MOVBU (Rn)(Rm.UXTW), Rt
-
- MOVD|MOVW|MOVH|MOVB: Stote Register (register offset)
- MOVD Rt, (Rn)(Rm.UXTW<<3)
- MOVD Rt, (Rn)(Rm.SXTX)
- MOVD Rt, (Rn)(Rm)
-
- PRFM: Prefetch Memory (immediate)
- PRFM imm(Rn),
- prfop is the prefetch operation and can have the following values:
- PLDL1KEEP, PLDL1STRM, PLDL2KEEP, PLDL2STRM, PLDL3KEEP, PLDL3STRM,
- PLIL1KEEP, PLIL1STRM, PLIL2KEEP, PLIL2STRM, PLIL3KEEP, PLIL3STRM,
- PSTL1KEEP, PSTL1STRM, PSTL2KEEP, PSTL2STRM, PSTL3KEEP, PSTL3STRM.
- PRFM imm(Rn), $imm
- $imm prefetch operation is encoded as an immediate.
-
- STLRB: Store-Release Register Byte
- STLRB , ()
- Stores a byte from Rd to a memory location from Rn.
-
- STLRH: Store-Release Register Halfword
- STLRH , ()
- Stores a halfword from Rd to a memory location from Rn.
-
- STLXP: 64-bit Store-Release Exclusive Pair of registers
- STLXP (, ), (),
- Stores two 64-bit doublewords from Rt1 and Rt2 to a memory location from Rn,
- and returns in Rs a status value of 0 if the store was successful, or of 1 if
- no store was performed.
-
- STLXPW: 32-bit Store-Release Exclusive Pair of registers
- STLXPW (, ), (),
- Stores two 32-bit words from Rt1 and Rt2 to a memory location from Rn, and
- returns in Rs a status value of 0 if the store was successful, or of 1 if no
- store was performed.
-
- STXP: 64-bit Store Exclusive Pair of registers
- STXP (, ), (),
- Stores two 64-bit doublewords from Rt1 and Rt2 to a memory location from Rn,
- and returns in Rs a status value of 0 if the store was successful, or of 1 if
- no store was performed.
-
- STXPW: 32-bit Store Exclusive Pair of registers
- STXPW (, ), (),
- Stores two 32-bit words from Rt1 and Rt2 to a memory location from Rn, and returns in
- a Rs a status value of 0 if the store was successful, or of 1 if no store was performed.
-
-2. Alphabetical list of float-point instructions
- // TODO
-
- FMADDD: 64-bit floating-point fused Multiply-Add
- FMADDD , , ,
- Multiplies the values of and ,
- adds the product to , and writes the result to .
-
- FMADDS: 32-bit floating-point fused Multiply-Add
- FMADDS , , ,
- Multiplies the values of and ,
- adds the product to , and writes the result to .
-
- FMSUBD: 64-bit floating-point fused Multiply-Subtract
- FMSUBD , , ,
- Multiplies the values of and , negates the product,
- adds the product to , and writes the result to .
-
- FMSUBS: 32-bit floating-point fused Multiply-Subtract
- FMSUBS , , ,
- Multiplies the values of and , negates the product,
- adds the product to , and writes the result to .
-
- FNMADDD: 64-bit floating-point negated fused Multiply-Add
- FNMADDD , , ,
- Multiplies the values of and , negates the product,
- subtracts the value of , and writes the result to .
-
- FNMADDS: 32-bit floating-point negated fused Multiply-Add
- FNMADDS , , ,
- Multiplies the values of and , negates the product,
- subtracts the value of , and writes the result to .
-
- FNMSUBD: 64-bit floating-point negated fused Multiply-Subtract
- FNMSUBD , , ,
- Multiplies the values of and ,
- subtracts the value of , and writes the result to .
-
- FNMSUBS: 32-bit floating-point negated fused Multiply-Subtract
- FNMSUBS , , ,
- Multiplies the values of and ,
- subtracts the value of , and writes the result to .
-
-3. Alphabetical list of SIMD instructions
- VADD: Add (scalar)
- VADD , ,
- Add corresponding low 64-bit elements in and ,
- place the result into low 64-bit element of .
-
- VADD: Add (vector).
- VADD .T, ., .
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D2
-
- VADDP: Add Pairwise (vector)
- VADDP ., ., .
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D2
-
- VADDV: Add across Vector.
- VADDV ., Vd
- Is an arrangement specifier and can have the following values:
- 8B, 16B, H4, H8, S4
-
- VAND: Bitwise AND (vector)
- VAND ., ., .
- Is an arrangement specifier and can have the following values:
- B8, B16
-
- VCMEQ: Compare bitwise Equal (vector)
- VCMEQ ., ., .
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D2
-
- VDUP: Duplicate vector element to vector or scalar.
- VDUP .[index], .
- Is an arrangement specifier and can have the following values:
- 8B, 16B, H4, H8, S2, S4, D2
- Is an element size specifier and can have the following values:
- B, H, S, D
-
- VEOR: Bitwise exclusive OR (vector, register)
- VEOR ., ., .
- Is an arrangement specifier and can have the following values:
- B8, B16
-
- VFMLA: Floating-point fused Multiply-Add to accumulator (vector)
- VFMLA ., ., .
- Is an arrangement specifier and can have the following values:
- S2, S4, D2
-
- VFMLS: Floating-point fused Multiply-Subtract from accumulator (vector)
- VFMLS ., ., .
- Is an arrangement specifier and can have the following values:
- S2, S4, D2
-
- VEXT: Extracts vector elements from src SIMD registers to dst SIMD register
- VEXT $index, ., ., .
- is an arrangment specifier and can be B8, B16
- $index is the lowest numbered byte element to be exracted.
-
- VLD1: Load multiple single-element structures
- VLD1 (Rn), [., . ...] // no offset
- VLD1.P imm(Rn), [., . ...] // immediate offset variant
- VLD1.P (Rn)(Rm), [., . ...] // register offset variant
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D1, D2
-
- VLD1: Load one single-element structure
- VLD1 (Rn), .[index] // no offset
- VLD1.P imm(Rn), .[index] // immediate offset variant
- VLD1.P (Rn)(Rm), .[index] // register offset variant
- is an arrangement specifier and can have the following values:
- B, H, S D
-
- VMOV: move
- VMOV .[index], Rd // Move vector element to general-purpose register.
- Is a source width specifier and can have the following values:
- B, H, S (Wd)
- D (Xd)
-
- VMOV Rn, . // Duplicate general-purpose register to vector.
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4 (Wn)
- D2 (Xn)
-
- VMOV ., . // Move vector.
- Is an arrangement specifier and can have the following values:
- B8, B16
-
- VMOV Rn, .[index] // Move general-purpose register to a vector element.
- Is a source width specifier and can have the following values:
- B, H, S (Wd)
- D (Xd)
-
- VMOV .[index], Vn // Move vector element to scalar.
- Is an element size specifier and can have the following values:
- B, H, S, D
-
- VMOV .[index], .[index] // Move vector element to another vector element.
- Is an element size specifier and can have the following values:
- B, H, S, D
-
- VMOVI: Move Immediate (vector).
- VMOVI $imm8, .
- is an arrangement specifier and can have the following values:
- 8B, 16B
-
- VMOVS: Load SIMD&FP Register (immediate offset). ARMv8: LDR (immediate, SIMD&FP)
- Store SIMD&FP register (immediate offset). ARMv8: STR (immediate, SIMD&FP)
- VMOVS (Rn), Vn
- VMOVS.W imm(Rn), Vn
- VMOVS.P imm(Rn), Vn
- VMOVS Vn, (Rn)
- VMOVS.W Vn, imm(Rn)
- VMOVS.P Vn, imm(Rn)
-
- VORR: Bitwise inclusive OR (vector, register)
- VORR ., ., .
- Is an arrangement specifier and can have the following values:
- B8, B16
-
- VRBIT: Reverse bit order (vector)
- VRBIT ., .
- is an arrangment specifier and can be B8, B16
-
- VREV32: Reverse elements in 32-bit words (vector).
- REV32 ., .
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8
-
- VREV64: Reverse elements in 64-bit words (vector).
- REV64 ., .
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4
-
- VSHL: Shift Left(immediate)
- VSHL $shift, ., .
- is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D1, D2
- $shift Is the left shift amount
-
- VST1: Store multiple single-element structures
- VST1 [., . ...], (Rn) // no offset
- VST1.P [., . ...], imm(Rn) // immediate offset variant
- VST1.P [., . ...], (Rn)(Rm) // register offset variant
- Is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D1, D2
-
- VSUB: Sub (scalar)
- VSUB , ,
- Subtract low 64-bit element in from the corresponding element in ,
- place the result into low 64-bit element of .
-
- VUADDLV: Unsigned sum Long across Vector.
- VUADDLV ., Vd
- Is an arrangement specifier and can have the following values:
- 8B, 16B, H4, H8, S4
-
- VST1: Store one single-element structure
- VST1 .., (Rn) // no offset
- VST1.P .., imm(Rn) // immediate offset variant
- VST1.P .., (Rn)(Rm) // register offset variant
- Is an arrangement specifier and can have the following values:
- B, H, S, D
-
- VUSHR: Unsigned shift right(immediate)
- VUSHR $shift, ., .
- is an arrangement specifier and can have the following values:
- B8, B16, H4, H8, S2, S4, D1, D2
- $shift is the right shift amount
-
-
-4. Alphabetical list of cryptographic extension instructions
-
- VPMULL{2}: Polynomial multiply long.
- VPMULL{2} ., ., .
- VPMULL multiplies corresponding elements in the lower half of the
- vectors of two source SIMD registers and VPMULL{2} operates in the upper half.
- is an arrangement specifier, it can be H8, Q1
- is an arrangement specifier, it can be B8, B16, D1, D2
-
- SHA1C, SHA1M, SHA1P: SHA1 hash update.
- SHA1C .S4, Vn, Vd
- SHA1M .S4, Vn, Vd
- SHA1P .S4, Vn, Vd
-
- SHA1H: SHA1 fixed rotate.
- SHA1H Vn, Vd
-
- SHA1SU0: SHA1 schedule update 0.
- SHA256SU1: SHA256 schedule update 1.
- SHA1SU0 .S4, .S4, .S4
- SHA256SU1 .S4, .S4, .S4
-
- SHA1SU1: SHA1 schedule update 1.
- SHA256SU0: SHA256 schedule update 0.
- SHA1SU1 .S4, .S4
- SHA256SU0 .S4, .S4
-
- SHA256H, SHA256H2: SHA256 hash update.
- SHA256H .S4, Vn, Vd
- SHA256H2 .S4, Vn, Vd
+Instruction mnemonics mapping rules
+1. Most instructions use width suffixes of instruction names to indicate operand width rather than
+using different register names.
+
+  Examples:
+    ADC R24, R14, R12          <=>     adc x12, x14, x24
+    ADDW R26->24, R21, R15     <=>     add w15, w21, w26, asr #24
+    FCMPS F2, F3               <=>     fcmp s3, s2
+    FCMPD F2, F3               <=>     fcmp d3, d2
+    FCVTDH F2, F3              <=>     fcvt h3, d2
+
+2. Go uses .P and .W suffixes to indicate post-increment and pre-increment.
+
+  Examples:
+    MOVD.P -8(R10), R8         <=>     ldr x8, [x10],#-8
+    MOVB.W 16(R16), R10        <=>     ldrsb x10, [x16,#16]!
+
+3. Go uses a family of MOV instructions for loads and stores.
+
+64-bit variant ldr, str, stur => MOVD;
+32-bit variant str, stur, ldrsw => MOVW;
+32-bit variant ldr => MOVWU;
+ldrb => MOVBU; ldrh => MOVHU;
+ldrsb, sturb, strb => MOVB;
+ldrsh, sturh, strh => MOVH.
+
+4. Go moves condition codes into the opcode suffix, as in BLT.
+
+5. Go adds a V prefix for most floating-point and SIMD instructions except cryptographic extension
+instructions and floating-point (scalar) instructions.
+
+  Examples:
+    VADD V5.H8, V18.H8, V9.H8         <=>     add v9.8h, v18.8h, v5.8h
+    VLD1.P (R6)(R11), [V31.D1]        <=>     ld1 {v31.1d}, [x6], x11
+    VFMLA V29.S2, V20.S2, V14.S2      <=>     fmla v14.2s, v20.2s, v29.2s
+    AESD V22.B16, V19.B16             <=>     aesd v19.16b, v22.16b
+    SCVTFWS R3, F16                   <=>     scvtf s16, w3
+
+Special Cases.
+
+(1) umov is written as VMOV.
+
+(2) br is renamed JMP, blr is renamed CALL.
+
+(3) No need to add "W" suffix: LDARB, LDARH, LDAXRB, LDAXRH, LDTRH, LDXRB, LDXRH.
+
+  Examples:
+    VMOV V13.B[1], R20      <=>      umov w20, v13.b[1]
+    VMOV V13.H[1], R20      <=>      umov w20, v13.h[1]
+    JMP (R3)                <=>      br x3
+    CALL (R17)              <=>      blr x17
+    LDAXRB (R19), R16       <=>      ldaxrb w16, [x19]
+
+
+Register mapping rules
+
+1. All basic register names are written as Rn.
+
+2. Go uses ZR as the zero register and RSP as the stack pointer.
+
+3. Bn, Hn, Dn, Sn and Qn registers are written as Fn in floating-point instructions and as Vn
+in SIMD instructions.
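
Reviewer note: the load/store table in rule 3 of the mnemonic mapping rules can be restated as a lookup. The following is a minimal sketch, not code from this patch; `goMove` and its `bits` parameter are hypothetical names chosen for illustration, and only the mnemonics listed in rule 3 are covered.

```go
package main

import "fmt"

// goMove maps a GNU ARM64 load/store mnemonic and its operand width in
// bits to the Go MOV mnemonic listed in rule 3. It reports false for any
// combination the rule does not list. Hypothetical helper, for
// illustration only.
func goMove(gnu string, bits int) (string, bool) {
	switch gnu {
	case "ldr":
		if bits == 64 {
			return "MOVD", true
		}
		if bits == 32 {
			return "MOVWU", true
		}
	case "str", "stur":
		if bits == 64 {
			return "MOVD", true
		}
		if bits == 32 {
			return "MOVW", true
		}
	case "ldrsw":
		return "MOVW", true
	case "ldrb":
		return "MOVBU", true
	case "ldrh":
		return "MOVHU", true
	case "ldrsb", "sturb", "strb":
		return "MOVB", true
	case "ldrsh", "sturh", "strh":
		return "MOVH", true
	}
	return "", false
}

func main() {
	for _, q := range []struct {
		gnu  string
		bits int
	}{{"ldr", 64}, {"ldr", 32}, {"str", 32}, {"ldrsb", 8}, {"ldrh", 16}} {
		mov, ok := goMove(q.gnu, q.bits)
		fmt.Println(q.gnu, q.bits, mov, ok)
	}
}
```

Note the asymmetry the table encodes: a 32-bit load zero-extends and so becomes MOVWU, while a 32-bit store and the sign-extending ldrsw both become MOVW.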
+
+
+Argument mapping rules
+
+1. The operands appear in left-to-right assignment order.
+
+Go reverses the arguments of most instructions.
+
+  Examples:
+    ADD R11.SXTB<<1, RSP, R25      <=>      add x25, sp, w11, sxtb #1
+    VADD V16, V19, V14             <=>      add d14, d19, d16
+
+Special Cases.
+
+(1) Argument order is the same as in the GNU ARM64 syntax: cbz, cbnz and some store instructions,
+such as str, stur, strb, sturb, strh, sturh, stlr, stlrb, stlrh, st1.
+
+  Examples:
+    MOVD R29, 384(R19)      <=>      str x29, [x19,#384]
+    MOVB.P R30, 30(R4)      <=>      strb w30, [x4],#30
+    STLRH R21, (R18)        <=>      stlrh w21, [x18]
+
+(2) MADD, MADDW, MSUB, MSUBW, SMADDL, SMSUBL, UMADDL, UMSUBL <Rm>, <Ra>, <Rn>, <Rd>
+
+  Examples:
+    MADD R2, R30, R22, R6         <=>      madd x6, x22, x2, x30
+    SMSUBL R10, R3, R17, R27      <=>      smsubl x27, w17, w10, x3
+
+(3) FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS, FNMSUBD, FNMSUBS <Fm>, <Fa>, <Fn>, <Fd>
+
+  Examples:
+    FMADDD F30, F20, F3, F29      <=>      fmadd d29, d3, d30, d20
+    FNMSUBS F7, F25, F7, F22      <=>      fnmsub s22, s7, s7, s25
+
+(4) BFI, BFXIL, SBFIZ, SBFX, UBFIZ, UBFX $<lsb>, <Rn>, $<width>, <Rd>
+
+  Examples:
+    BFIW $16, R20, $6, R0         <=>      bfi w0, w20, #16, #6
+    UBFIZ $34, R26, $5, R20       <=>      ubfiz x20, x26, #34, #5
+
+(5) FCCMPD, FCCMPS, FCCMPED, FCCMPES <cond>, Fm, Fn, $<nzcv>
+
+  Examples:
+    FCCMPD AL, F8, F26, $0        <=>      fccmp d26, d8, #0x0, al
+    FCCMPS VS, F29, F4, $4        <=>      fccmp s4, s29, #0x4, vs
+    FCCMPED LE, F20, F5, $13      <=>      fccmpe d5, d20, #0xd, le
+    FCCMPES NE, F26, F10, $0      <=>      fccmpe s10, s26, #0x0, ne
+
+(6) CCMN, CCMNW, CCMP, CCMPW <cond>, <Rn>, $<imm>, $<nzcv>
+
+  Examples:
+    CCMP MI, R22, $12, $13        <=>      ccmp x22, #0xc, #0xd, mi
+    CCMNW AL, R1, $11, $8         <=>      ccmn w1, #0xb, #0x8, al
+
+(7) CCMN, CCMNW, CCMP, CCMPW <cond>, <Rn>, <Rm>, $<nzcv>
+
+  Examples:
+    CCMN VS, R13, R22, $10        <=>      ccmn x13, x22, #0xa, vs
+    CCMPW HS, R18, R14, $11       <=>      ccmp w18, w14, #0xb, cs
+
+(9) CSEL, CSELW, CSNEG, CSNEGW, CSINC, CSINCW <cond>, <Rn>, <Rm>, <Rd>;
+FCSELD, FCSELS <cond>, <Fn>, <Fm>, <Fd>
+
+  Examples:
+    CSEL GT, R0, R19, R1          <=>      csel x1, x0, x19, gt
+    CSNEGW GT, R7, R17, R8        <=>      csneg w8, w7, w17, gt
+    FCSELD EQ, F15, F18, F16      <=>      fcsel d16, d15, d18, eq
+
+(10) TBNZ, TBZ $, ,