Create one version of strcat/strcpy/strlen for cortex-a15/krait and another
version for cortex-a9.
Tested with the libc_test strcat/strcpy/strlen tests.
Including new tests that verify that the src for strcat/strcpy do not
overread across page boundaries.
NOTE: The handling of unaligned strcpy (same code in strcat) could probably
be optimized further such that the src is read 64 bits at a time instead of
the partial reads occurring now.
strlen improves slightly since it was recently optimized.
Performance improvements for strcpy and strcat (using an empty dest string):
cortex-a9
- Small copies vary from about 5% to 20% as the size gets above 10 bytes.
- Copies >= 1024, about a 60% improvement.
- Unaligned copies, from about 40% improvement.
cortex-a15
- Most small copies exhibit a 100% improvement, a few copies only
improve by 20%.
- Copies >= 1024, about 150% improvement.
- Unaligned copies, about 100% improvement.
krait
- Most small copies vary widely, but on average 20% improvement, then
the performance gets better, hitting about a 100% improvement when
copies 64 bytes of data.
- Copies >= 1024, about 100% improvement.
- When coping MBs of data, about 50% improvement.
- Unaligned copies, about 90% improvement.
As strcat destination strings get larger in size:
cortex-a9
- about 40% improvement for small dst strings (>= 32).
- about 250% improvement for dst strings >= 1024.
cortex-a15
- about 200% improvement for small dst strings (>=32).
- about 250% improvement for dst strings >= 1024.
krait
- about 25% improvement for small dst strings (>=32).
- about 100% improvement for dst strings >=1024.
Merge from internal master.
(cherry-picked from d119b7b6f4)
Change-Id: I296463b251ef9fab004ee4dded2793feca5b547a
Add a macro to annotate function end and start using both ENTRY and END
for each function. This allows valgrind (and presumably other debugging
tools) to use the debug symbols to trace the functions.
Change-Id: I5f09cef8e22fb356eb6f5cee952b031e567599b6