Do you have an idea on how to display the contents of the xmm registers in a better way?
The meaning of the contents of an XMM register is not fixed. To quote the "IA-32 Intel Architecture Software Developer's Manual; Volume 1, Basic Architecture", section 11.6.9:
SSE and SSE2 extensions define typed operations on packed and scalar floating-point data types
and on 128-bit SIMD integer data types, but IA-32 processors do not enforce this typing at the
architectural level. They only enforce it at the microarchitectural level. Therefore, when a
Pentium 4 or Intel Xeon processor loads a packed or scalar floating-point operand or a 128-bit
packed integer operand from memory into an XMM register, it does not check that the actual
data being loaded matches the data type specified in the instruction. Likewise, when the
processor performs an arithmetic operation on the data in an XMM register, it does not check
that the data being operated on matches the data type specified in the instruction.
As a general rule, because data typing of SIMD floating-point and integer data types is not
enforced at the architectural level, it is the responsibility of the programmer, assembler, or
compiler to insure that code enforces data typing.
(italics mine)
Moreover, the manual explicitly allows mixing different interpretations of the data in an XMM register in some cases:
The ability to operate on an operand that contains a data type that is inconsistent with the typing
of the instruction being executed, permits some valid operations to be performed. For example,
the following instructions load a packed double-precision floating-point operand from memory
to register XMM0, and a mask to register XMM1; then they use XORPD to toggle the sign bits
of the two packed values in register XMM0.
movapd xmm0, [eax]   ; EAX register contains pointer to packed
                     ; double-precision floating-point operand
movaps xmm1, [ebx]   ; EBX register contains pointer to packed
                     ; double-precision floating-point mask
xorpd  xmm0, xmm1    ; XOR operation toggles sign bits using
                     ; the mask in xmm1
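To see why this works regardless of the "type" of the register, here is a small Python sketch that mimics the bitwise effect of the sequence above on raw 16-byte values. It is only an illustration: the helper name `xorpd` and the sample values are my own, and nothing here touches real registers.

```python
import struct

SIGN_MASK = 0x8000000000000000  # bit 63: sign bit of an IEEE 754 double

def xorpd(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two 16-byte values (hypothetical helper, models XORPD)."""
    return bytes(x ^ y for x, y in zip(a, b))

# "xmm0" holds two packed doubles; "xmm1" holds the mask, built as two integers.
xmm0 = struct.pack("<2d", 1.5, -2.0)
xmm1 = struct.pack("<2Q", SIGN_MASK, SIGN_MASK)

result = struct.unpack("<2d", xorpd(xmm0, xmm1))
print(result)  # (-1.5, 2.0)
```

The mask is built as integers and the operand as doubles, yet the XOR is well defined because only the bit pattern matters.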
As a result, IMHO, it is better to let the user choose how to interpret the contents of the XMM registers on a per-register basis. (For an example, see how viewing XMM(x) is implemented in the Insight debugger.)
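A minimal sketch of what "per-register interpretation" could look like, assuming the debugger already has the raw 16 bytes of each register. The view names, the `XMM_VIEWS` table and the `choice` mapping are hypothetical, not taken from gdb or Insight.

```python
import struct

# Hypothetical view table: view name -> struct format for 16 little-endian bytes.
XMM_VIEWS = {
    "packed_single": "<4f",
    "packed_double": "<2d",
    "packed_byte": "<16B",
    "packed_word": "<8H",
    "packed_dword": "<4I",
    "packed_qword": "<2Q",
}

def view_xmm(raw: bytes, view: str) -> tuple:
    """Reinterpret the same 16 raw bytes according to the chosen view."""
    if len(raw) != 16:
        raise ValueError("an XMM register holds exactly 16 bytes")
    return struct.unpack(XMM_VIEWS[view], raw)

# The user picks a view per register; the raw bytes never change.
choice = {"xmm0": "packed_double", "xmm1": "packed_dword"}
raw = struct.pack("<2d", 1.0, -2.0)

print(view_xmm(raw, choice["xmm0"]))  # (1.0, -2.0)
print([hex(v) for v in view_xmm(raw, choice["xmm1"])])
```

Switching a view is just a different unpack of the same bytes, so the display code needs no knowledge of which instruction last wrote the register.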
Note about data types:
SSE introduces one data type: 128-bit packed single-precision floating-point data type.
SSE2 introduces 5 new types:
- Packed double-precision floating-point
- 128-bit packed byte integers
- 128-bit packed word integers
- 128-bit packed doubleword integers
- 128-bit packed quadword integers
SSE3 does not introduce new data types.
Displaying the MXCSR control/status register would be useful too.
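As a rough idea of what a readable MXCSR display could contain, here is a sketch that decodes the 32-bit value into flag names and a rounding mode. The bit positions follow the MXCSR layout in the Intel manual; the function name and the output shape are my own assumptions.

```python
# Bit position -> mnemonic. Bits 0-5 are exception flags, bit 6 is DAZ,
# bits 7-12 are exception masks, bit 15 is flush-to-zero.
MXCSR_BITS = [
    (0, "IE"), (1, "DE"), (2, "ZE"), (3, "OE"), (4, "UE"), (5, "PE"),
    (6, "DAZ"),
    (7, "IM"), (8, "DM"), (9, "ZM"), (10, "OM"), (11, "UM"), (12, "PM"),
    (15, "FZ"),
]

# Bits 13-14: rounding control.
ROUNDING = {0: "nearest", 1: "down", 2: "up", 3: "toward zero"}

def decode_mxcsr(value: int):
    """Return (list of set flag names, rounding mode) for an MXCSR value."""
    flags = [name for bit, name in MXCSR_BITS if value & (1 << bit)]
    rounding = ROUNDING[(value >> 13) & 0x3]
    return flags, rounding

# 0x1F80 is the power-on default: all exceptions masked, round to nearest.
print(decode_mxcsr(0x1F80))
```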
To make matters worse, the register set and what can be in which register also depends on your CPU (MMX/SSE/SSE2/SSE3).
Does gdb show the contents of registers that are not present on the current CPU?
(It does show the "Packed double-precision floating-point" and similar interpretations on a processor without SSE2.)
<edit>Forgot to mention another problem: Since the MMX registers and the floating point registers are indeed the same physical set of registers, you have yet another thing which you have to guess to display it right.</edit>
For the FPU/MMX pair the situation is a little different. It looks like some heuristic analysis can be done here. According to the manual:
When an MMX instruction (other than the EMMS instruction) is executed, the processor
changes the x87 FPU state as follows:
- The TOS (top of stack) value of the x87 FPU status word is set to 0.
- The entire x87 FPU tag word is set to the valid state (00B in all tag fields).
- When an MMX instruction writes to an MMX register, it writes ones (11B) to the exponent part of the corresponding floating-point register (bits 64 through 79).
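A heuristic built on these three side effects might look roughly like the sketch below. It assumes the debugger can read the full 16-bit tag word (two bits per register, as stored by FSAVE/FNSTENV), the TOS field, and bits 64 through 79 of each of the eight registers. The function name and its inputs are hypothetical.

```python
def looks_like_mmx(top_words, tag_word: int, top_of_stack: int) -> bool:
    """Guess whether the x87 register file currently holds MMX data.

    top_words    -- bits 64..79 of each of the 8 registers, as integers
    tag_word     -- full 16-bit x87 tag word (00B per field means "valid")
    top_of_stack -- TOS field of the x87 status word
    """
    all_valid = tag_word == 0x0000   # MMX instructions mark every tag valid
    tos_zero = top_of_stack == 0     # MMX instructions reset TOS to 0
    # An MMX write sets bits 64..79 of the written register to all ones.
    any_written = any(word == 0xFFFF for word in top_words)
    return all_valid and tos_zero and any_written

print(looks_like_mmx([0xFFFF] + [0x0000] * 7, 0x0000, 0))  # True
print(looks_like_mmx([0x3FFF] * 8, 0x0000, 0))             # False
```

This is only a guess, not proof: x87 code that happens to have TOS at 0, all tags valid and a negative NaN or infinity in some register would produce the same pattern.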
But that can get complicated, and it is not really necessary. IMHO, the approach mentioned above for the XMM registers is the best one here too. Generally speaking, almost every program starts out assuming that the FPU contains no data, so treating the FPU registers as FPU registers (not MMX registers) is a reasonable default.
Of course, there could be two global user-selectable display modes, FPU and MMX, with the MMX mode having "submodes" chosen individually for each register ("packed byte", "packed word", "packed doubleword").
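To make the model concrete, here is a small sketch, assuming each register is available as 10 raw little-endian bytes. The class name, the mode strings and the default submode ("packed_word") are all my own placeholders. The x87 decoding is approximate: it converts to a Python float, so values outside double range become infinity or zero.

```python
import math
import struct

# Hypothetical MMX submodes: name -> struct format for the low 8 bytes.
MMX_SUBMODES = {"packed_byte": "<8B", "packed_word": "<4H", "packed_dword": "<2I"}

def decode_x87(raw: bytes) -> float:
    """Approximate an 80-bit extended-precision value as a Python float."""
    mantissa, sign_exp = struct.unpack("<QH", raw)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:  # infinity or NaN
        fraction = mantissa & 0x7FFFFFFFFFFFFFFF
        return sign * math.inf if fraction == 0 else math.nan
    try:
        # The 64-bit mantissa has an explicit integer bit, hence the extra -63.
        return sign * math.ldexp(mantissa, exponent - 16383 - 63)
    except OverflowError:
        return sign * math.inf

class RegisterDisplay:
    def __init__(self):
        self.mode = "fpu"  # global mode; FPU is the default
        self.submode = {i: "packed_word" for i in range(8)}  # per-register

    def render(self, index: int, raw: bytes):
        if self.mode == "fpu":
            return decode_x87(raw)
        return struct.unpack(MMX_SUBMODES[self.submode[index]], raw[:8])

display = RegisterDisplay()
one = struct.pack("<QH", 0x8000000000000000, 0x3FFF)
print(display.render(0, one))  # 1.0

display.mode = "mmx"
mm0 = struct.pack("<QH", 0x0004000300020001, 0xFFFF)
print(display.render(0, mm0))  # (1, 2, 3, 4)
```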
What do you think about such a model?