tequilaOS/platform_bionic

Author	SHA1	Message	Date
Christopher Ferris	59a13c122e	Optimize __memset_chk, __memcpy_chk. DO NOT MERGE. This change creates assembler versions of __memcpy_chk/__memset_chk that is implemented in the memcpy/memset assembler code. This change avoids an extra call to memcpy/memset, instead allowing a simple fall through to occur from the chk code into the body of the real implementation. Testing: - Ran the libc_test on __memcpy_chk/__memset_chk on all nexus devices. - Wrote a small test executable that has three calls to __memcpy_chk and three calls to __memset_chk. First call dest_len is length + 1. Second call dest_len is length. Third call dest_len is length - 1. Verified that the first two calls pass, and the third fails. Examined the logcat output on all nexus devices to verify that the fortify error message was sent properly. - I benchmarked the new __memcpy_chk and __memset_chk on all systems. For __memcpy_chk and large copies, the savings is relatively small (about 1%). For small copies, the savings is large on cortex-a15/krait devices (between 5% to 30%). For cortex-a9 and small copies, the speed up is present, but relatively small (about 3% to 5%). For __memset_chk and large copies, the savings is also small (about 1%). However, all processors show larger speed-ups on small copies (about 30% to 100%). Bug: 9293744 Merge from internal master. (cherry-picked from `7c860db074`) Change-Id: I916ad305e4001269460ca6ebd38aaa0be8ac7f52	2013-08-14 18:14:43 -07:00
Christopher Ferris	4e24dcc8d8	Optimize strcat/strcpy, small tweaks to strlen. DO NOT MERGE Create one version of strcat/strcpy/strlen for cortex-a15/krait and another version for cortex-a9. Tested with the libc_test strcat/strcpy/strlen tests. Including new tests that verify that the src for strcat/strcpy do not overread across page boundaries. NOTE: The handling of unaligned strcpy (same code in strcat) could probably be optimized further such that the src is read 64 bits at a time instead of the partial reads occurring now. strlen improves slightly since it was recently optimized. Performance improvements for strcpy and strcat (using an empty dest string): cortex-a9 - Small copies vary from about 5% to 20% as the size gets above 10 bytes. - Copies >= 1024, about a 60% improvement. - Unaligned copies, from about 40% improvement. cortex-a15 - Most small copies exhibit a 100% improvement, a few copies only improve by 20%. - Copies >= 1024, about 150% improvement. - Unaligned copies, about 100% improvement. krait - Most small copies vary widely, but on average 20% improvement, then the performance gets better, hitting about a 100% improvement when copies 64 bytes of data. - Copies >= 1024, about 100% improvement. - When coping MBs of data, about 50% improvement. - Unaligned copies, about 90% improvement. As strcat destination strings get larger in size: cortex-a9 - about 40% improvement for small dst strings (>= 32). - about 250% improvement for dst strings >= 1024. cortex-a15 - about 200% improvement for small dst strings (>=32). - about 250% improvement for dst strings >= 1024. krait - about 25% improvement for small dst strings (>=32). - about 100% improvement for dst strings >=1024. Merge from internal master. (cherry-picked from `d119b7b6f4`) Change-Id: I296463b251ef9fab004ee4dded2793feca5b547a	2013-08-08 11:13:46 -07:00
Christopher Ferris	0aa9b52efa	Add new optimized strlen for arm. This optimized version is primarily targeted at cortex-a15. Tested on all nexus devices using the system/extras/libc_test strlen test. Tested alignments from 1 to 32 that are powers of 2. Tested that strlen does not cross page boundaries at all alignments. Speed improvements listed below: cortex-a15 - Sizes >= 32 bytes, ~75% improvement. - Sizes >= 1024 bytes, ~250% improvement. cortex-a9 - Sizes >= 32 bytes, ~75% improvement. - Sizes >= 1024 bytes, ~85% improvement. krait - Sizes >= 32 bytes, ~95% improvement. - Sizes >= 1024 bytes, ~160% improvement. Merge from internal master. (cherry-picked from `2fc0717977`) Change-Id: I1ceceb4e745fd68e9d946f96d1d42e0cdaff6ccf	2013-07-16 16:47:37 -07:00
Christopher Ferris	796cbe249b	Rewrite memset for cortexa15 to use strd. Merge from internal master. (cherry-picked from commit `7ffad9c120`) Change-Id: Ia67f2a545399f4fa37b63d5634a3565e4f5482f9	2013-04-12 10:58:25 -07:00
Christopher Ferris	bf0d1ad72b	Add missing branch in memcpy.S dst aligned case. Merge from internal master. (cherry-picked from commit `6ffaa931c3`) Change-Id: Ifdcf01fd122866cf0d4c5b5f7a997803561d7889	2013-04-10 17:21:29 -07:00
Christopher Ferris	185ce72d00	Update to latest cortexa15 memcpy code. This uses the new code original submitted as memcpy.a15.S as the base. However, the old code handled unaligned src/dst better so that was spliced in. I optimized the original unaligned code by removing a few unnecessary instructions. I optimized the a15 code by rewriting the pre and post code. I also modified the main loop to add a pld so that larger copies would not stall waiting for memory. Test cases for the new memcpy: - Copy all sized values from 0 to 1024 bytes, using whatever alignment is returned by malloc. For each alignment case described below, the test copied from 0 to 128 bytes. - Src and dst pointers are both aligned to the same value, starting at one going through every power of two up to and including 128. - Src aligned to double word boundary, dst aligned to word boundary. - Src aligned to word boundary, dst aligned to double word boundary. - Src aligned to 16 bit boundary, dst aligned to word boundary. - Src aligned to word boundary, dst aligned to 16 byte boundary. - Src aligned to word boundary, dst aligned to 1 byte from a word boundary. - Src aligned to word boundary, dst aligned to 2 bytes from a word boundary. - Src aligned to word boundary, dst aligned to 3 bytes from a word boundary. - Src aligned to 1 byte from a word boundary, dst aligned to a word boundary. - Src aligned to 2 bytes from a word boundary, dst aligned to a word boundary. - Src aligned to 3 bytes from a word boundary, dst aligned to a word boundary. Cases to verify the unaligned source code properly aligns to a 16 bit boundary. - Src aligned to 1 byte from a 128 bit boundary, dst aligned to 4 + 128 bit boundary. - Src aligned to 1 byte from a 128 bit boundary, dst aligned to 8 + 128 bit boundary. - Src aligned to 1 byte from a 128 bit boundary, dst aligned to 12 + 128 bit boundary. - Src aligned to 1 byte from a 128 bit boundary, dst aligned to 16 + 128 bit boundary. In all cases, a two byte fencepost was placed at the end of the destination to verify that only the requested number of bytes were copied. Bug: 8005082 Merge from internal master. (cherry-picked from commit `21ede92d79`) Change-Id: Ief70c9e6dc8c6473ae245b6570b2c266fed9618c	2013-04-08 18:13:35 -07:00
Christopher Ferris	31dea25b8b	Create arch specific versions of strcmp. This uses the new strcmp.a15.S code as the basis for new versions of strcmp.S. The cortex-a15 code is the performance optimized version of strcmp.a15.S taken with only the addition of a few pld instructions. The cortex-a9 code is the same as the cortex-a15 code except that the unaligned strcmp code was taken from the original strcmp.S. The krait code is the same as the cortex-a15 code except that one path in the unaligned strcmp code was taken from the original strcmp.S code (the 2 byte overlap case). The generic code is the original unmodified strmp.S from the bionic subdirectory. All three new versions underwent these test cases: Strings the same, all same size: - Both pointers double word aligned. - One pointer double word aligned, one pointer word aligned. - Both pointers word aligned. - One pointer double word aligned, one pointer 1 off a word alignment. - One pointer double word aligned, one pointer 2 off a word alignment. - One pointer double word aligned, one pointer 3 off a word alignment. - One pointer word aligned, one pointer 1 off a word alignment. - One pointer word aligned, one pointer 2 off a word alignment. - One pointer word aligned, one pointer 3 off a word alignment. For all cases where it made sense, the two pointers were also tested swapped. Different strings, all same size: - Single difference at double word boundary. - Single difference at word boudary. - Single difference at 1 off a word alignment. - Single difference at 2 off a word alignment. - Single difference at 3 off a word alignment. Different sized strings, strings the same until the end: - Shorter string ends on a double word boundary. - Shorter string ends on word boundary. - Shorter string ends at 1 off a word boundary. - Shorter string ends at 2 off a word boundary. - Shorter string ends at 3 off a word boundary. For all different cases, run them through the same pointer alignment cases when the strings are the same size. For all cases the two pointers were also tested swapped. Bug: 8005082 Merge from internal master. (cherry-picked from commit `a9a5870d16`) Change-Id: I4c2b98f8a50804fb98ab67f75e9d660f1315a144	2013-03-20 14:33:54 -07:00
Christopher Ferris	04954a43b3	Break bionic implementations into arch versions. Move arch specific code for arm, mips, x86 into separate makefiles. In addition, add different arm cpu versions of memcpy/memset. Bug: 8005082 Merge from internal master (`acdde8c1cf`). Change-Id: I04f3d0715104fab618e1abf7cf8f7eec9bec79df	2013-03-12 14:06:08 -07:00

8 commits