Currently Renderscript sample code RsBalls crashed on x86 when SSE2
enabled. The root cause is that the stack was not 16-byte aligned
from the beginning when the processes/threads were created, so the
RsBalls crashed when SSE2 instructions tried to access the variables
on the stack.
- For the thread created by fork():
Its stack alignment is determined by crtbegin_{dynamic, static}.S
- For the thread created by pthread_create():
Its stack alignment is determined by clone.S. __thread_entry( ) is
a standard C function. In order to have its stack be aligned with
16 byte properly, __thread_entry() needs the stack with following
layout when it is called:
layout #1 (correct)
--------------
| |
-------------- <--ESP (ECX - 20)
| ret EIP |
-------------- <--ECX - 16
| arg0 |
-------------- <--ECX - 12
| arg1 |
-------------- <--ECX - 8
| arg2 |
-------------- <--ECX - 4
| unused |
-------------- <--ECX (16-byte boundary)
But it has following layout for now:
layout #2: (incorrect)
--------------
| |
-------------- <--ESP (ECX - 16)
| unused |
-------------- <--ECX - 12
| arg0 |
-------------- <--ECX - 8
| arg1 |
-------------- <--ECX - 4
| arg2 |
-------------- <--ECX (16-byte boundary)
Fixed in this patch.
Change-Id: Ibe01f64db14be14033c505d854c73033556ddaa8
Signed-off-by: Michael Liao <michael.liao@intel.com>
Signed-off-by: H.J. Lu <hongjiu.lu@intel.com>
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
(1) in pthread_create:
If the one signal is received before esp is subtracted by 16 and
__thread_entry( ) is called, the stack will be cleared by kernel
when it tries to contruct the signal stack frame. That will cause
that __thread_entry will get a wrong tls pointer from the stack
which leads to the segment fault when trying to access tls content.
(2) in pthread_exit
After pthread_exit called system call unmap(), its stack will be
freed. If one signal is received at that time, there is no stack
available for it.
Fixed by subtracting the child's esp by 16 before the clone system
call and by blocking signal handling before pthread_exit is started.
Author: Jack Ren <jack.ren@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
Only provide an implementation for ARM at the moment, since
it requires specific assembly fragments (the standard syscall
stubs cannot be used because the child returns in a different
stack).