Commit graph

3 commits

Author SHA1 Message Date
Christopher Ferris
4e24dcc8d8 Optimize strcat/strcpy, small tweaks to strlen. DO NOT MERGE
Create one version of strcat/strcpy/strlen for cortex-a15/krait and another
version for cortex-a9.

Tested with the libc_test strcat/strcpy/strlen tests.
Including new tests that verify that the src for strcat/strcpy do not
overread across page boundaries.

NOTE: The handling of unaligned strcpy (same code in strcat) could probably
be optimized further such that the src is read 64 bits at a time instead of
the partial reads occurring now.

strlen improves slightly since it was recently optimized.

Performance improvements for strcpy and strcat (using an empty dest string):

cortex-a9
- Small copies vary from about 5% to 20% as the size gets above 10 bytes.
- Copies >= 1024, about a 60% improvement.
- Unaligned copies, from about 40% improvement.

cortex-a15
- Most small copies exhibit a 100% improvement, a few copies only
  improve by 20%.
- Copies >= 1024, about 150% improvement.
- Unaligned copies, about 100% improvement.

krait
- Most small copies vary widely, but on average 20% improvement, then
  the performance gets better, hitting about a 100% improvement when
  copies 64 bytes of data.
- Copies >= 1024, about 100% improvement.
- When coping MBs of data, about 50% improvement.
- Unaligned copies, about 90% improvement.

As strcat destination strings get larger in size:

cortex-a9
- about 40% improvement for small dst strings (>= 32).
- about 250% improvement for dst strings >= 1024.

cortex-a15
- about 200% improvement for small dst strings (>=32).
- about 250% improvement for dst strings >= 1024.

krait
- about 25% improvement for small dst strings (>=32).
- about 100% improvement for dst strings >=1024.

Merge from internal master.

(cherry-picked from d119b7b6f4)

Change-Id: I296463b251ef9fab004ee4dded2793feca5b547a
2013-08-08 11:13:46 -07:00
Kenny Root
420878c690 Add function marks and size indications
Add a macro to annotate function end and start using both ENTRY and END
for each function. This allows valgrind (and presumably other debugging
tools) to use the debug symbols to trace the functions.

Change-Id: I5f09cef8e22fb356eb6f5cee952b031e567599b6
2011-02-17 09:07:25 -08:00
Jim Huang
73c04b3269 bionic: Add ARM optimized strcpy()
Reference results of the experiments on Qualcomm MSM7x25 (524MHz):

[original C code]
             prc thr   usecs/call      samples   errors cnt/samp
size
strcpy_1k      1   1     14.56159           99        0     1000
1024

[ARM optimized code]
             prc thr   usecs/call      samples   errors cnt/samp
size
strcpy_1k      1   1      3.46653           99        0     1000
1024

The work was derived from ARM Ltd.

Change-Id: I906ac53bb7a7285e14693c77d3ce8d4ed6f98bfd
2010-11-22 13:01:32 -08:00