Commit graph

9 commits

George Burgess IV
6cb0687932 Split our FORTIFY implementation into libc_fortify
As requested in the bug. This also rips __memcpy_chk out of memcpy.S,
which lets us cut down on copypasta (all of the implementations look
identical).

Bug: 12231437
Test: mma on aosp_{arm,arm64,mips,x86,x86_64} internal master;
checkbuild on bullhead internal master; CtsBionicTestCases on bullhead.
No new failures.
Change-Id: I88c39ca166bacde0b692aa3063e743bb046a5d2f
2017-07-24 14:20:16 -07:00
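
For reference, the shared replacement for those identical assembly stubs
reduces to a bounds check in front of memcpy. A minimal C sketch, with a
hypothetical stand-in for bionic's actual abort path:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in for bionic's fatal-error path (an assumption, not the real API). */
    static void fortify_fatal(const char* msg) {
      fprintf(stderr, "FORTIFY: %s\n", msg);
      abort();
    }

    /* What __builtin___memcpy_chk lowers to: dst_len is the compiler's idea
       of the destination buffer's size (__builtin_object_size). */
    void* memcpy_chk_sketch(void* dst, const void* src, size_t count,
                            size_t dst_len) {
      if (count > dst_len) {
        fortify_fatal("memcpy: prevented write past end of buffer");
      }
      return memcpy(dst, src, count);
    }
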
Christopher Ferris
866e7b6906 Fix assembler warnings.
There are a few instructions deprecated on armv8 that result in lots
of warnings. Add an arch directive so that these warnings go away.

This doesn't cause any problems because the instructions still
execute properly.

Bug: 38319728

Test: Built all of these assembler files and verified the warnings are gone.
Change-Id: If063defdd16f290c01975233c8d257d1b2005e76
2017-05-24 16:06:11 -07:00
Chitti Babu Theegala
cbfdc7f905 Fix streaming (memcpy) performance on Cortex-A7
Stream-mode detection for L1 on the Cortex-A7 core fails for addresses
that are not cache-line-size (64-byte) aligned, which leads to destination
data getting cached unnecessarily. ARM has confirmed this A7 issue.

This issue is solved by aligning the destination address to a 64-byte
boundary before entering the main loop of the memcpy routine.
Although the micro_bench memcpy score is lower when the L1 cache is
bypassed, that is desirable: it avoids unnecessary eviction of other
processes' data from L1, which is better for overall system performance.

The higher micro_bench memcpy numbers for alignments below 64 bytes look
good, but they come at the cost of L1 cache pollution: during memcpy/memset,
unnecessary data is pulled into L1, evicting other processes' data.
For example, during memset(0) the L1 cache fills with zeroes, which should
be avoided.
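
As a rough C sketch of the destination-alignment idea (illustrative only;
the real fix is in the ARM assembly memcpy):

    #include <stdint.h>
    #include <string.h>

    /* Copy a short head so the destination reaches a 64-byte cache-line
       boundary, then run the bulk loop from an aligned destination. */
    void* memcpy_aligned_dst(void* dst, const void* src, size_t n) {
      unsigned char* d = dst;
      const unsigned char* s = src;
      size_t head = (64 - ((uintptr_t)d & 63)) & 63; /* bytes to the boundary */
      if (head > n) head = n;
      n -= head;
      while (head--) *d++ = *s++;
      while (n >= 64) {        /* main loop: 64-byte blocks, dst aligned */
        memcpy(d, s, 64);      /* stands in for the unrolled block move */
        d += 64; s += 64; n -= 64;
      }
      while (n--) *d++ = *s++; /* tail */
      return dst;
    }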

Additionally, there is a second Cortex-A7 issue that impacts performance
for all alignments and all Android Wear versions: the A7's store buffer is
32 bytes, which limits back-to-back 32-byte stores. In the current
implementation, back-to-back 32-byte writes cause CPU stalls. This is
solved by interleaving loads and stores (see the sketch after this
commit), which avoids the stalls during memcpy by using the A7's internal
load and store buffers efficiently.

Change-Id: Ie5f12f2bb5d86f627686730416279057e4f5f6d0
2016-12-19 15:11:43 -08:00
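
The load/store interleaving described in the commit above only truly
exists at the instruction level in the .S file (a C compiler is free to
reschedule), but a C sketch conveys the shape of the change:

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative only: load/store pairs alternate (load, store, load,
       store) instead of a run of back-to-back stores (load, load, store,
       store), giving the 32-byte store buffer time to drain while the
       next load is in flight. */
    void copy_interleaved(uint64_t* dst, const uint64_t* src, size_t nwords) {
      size_t i;
      for (i = 0; i + 1 < nwords; i += 2) {
        uint64_t a = src[i];     /* load  */
        dst[i] = a;              /* store */
        uint64_t b = src[i + 1]; /* load  */
        dst[i + 1] = b;          /* store */
      }
      if (i < nwords) dst[i] = src[i]; /* odd tail word */
    }
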
Christopher Ferris
ecebb49ac6 Add cortex-a7 specific routines.
Test: Changed the angler target to use cortex-a7 and compiled.
Test: Booted this version on angler and ran bionic-unit-tests.

Change-Id: Ice7f6ea38a2569582161a8e659d7877918c1a45a
2016-11-28 12:49:36 -08:00
Elliott Hughes
382bd666e2 Stop including <machine/cpu-features.h>.
We're not looking at __ARM_ARCH__, because we don't support ARMv6.

Bug: http://b/18556103
Change-Id: I91fe096af697dc842a57e97515312e3530743678
2016-05-16 17:52:40 -07:00
Elliott Hughes
01d5b946ac Remove optimized code for bzero, which was removed from POSIX in 2008.
I'll come back for the last bcopy remnant...

Bug: http://b/26407170
Change-Id: Iabfeb95fc8a4b4b3992e3cc209ec5221040e7c26
2016-03-02 17:21:07 -08:00
Elliott Hughes
62e59646f8 Improve diagnostics from the assembler __memset_chk routines.
Change-Id: Ic165043ab8cd5e16866b3e11cfba960514cbdc57
2016-03-01 12:46:47 -08:00
Elliott Hughes
b83d6747fa Improve FORTIFY failure diagnostics.
Our FORTIFY _chk functions' implementations were very repetitive and verbose
but not very helpful. We'd also screwed up and put the SSIZE_MAX checks where
they would never fire unless you actually had a buffer as large as half your
address space, which probably doesn't happen very often.

Factor out the duplication and take the opportunity to actually show details
like how big the overrun buffer was, or by how much it was overrun.

Also remove the obsolete FORTIFY event logging.

Also remove the unused __libc_fatal_no_abort.

This change doesn't improve the diagnostics from the optimized assembler
implementations.

Change-Id: I176a90701395404d50975b547a00bd2c654e1252
2016-02-26 22:06:17 -08:00
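
The factoring described above reduces to a single shared check. A minimal
sketch with hypothetical names (bionic's real helper and message format
may differ):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    /* One shared helper replaces the per-function boilerplate and reports
       the sizes involved instead of a generic "buffer overflow detected". */
    void check_buffer_access(const char* fn, const char* action,
                             size_t claimed, size_t actual) {
      if (claimed > actual) {
        fprintf(stderr, "FORTIFY: %s: prevented %zu-byte %s into %zu-byte buffer\n",
                fn, claimed, action, actual);
        abort();
      }
    }

    /* e.g. from a _chk entry point:
       check_buffer_access("memset", "write", n, dst_len); */
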
Christopher Ferris
5930772286 Add optimized cortex-a7/cortex-a53 memset/memcpy.
Add an optimized memset that is ~20% faster for cortex-a7 and
cortex-a53.

Add a 32 bit optimized cortex-a53 memcpy that is about ~20% faster
on cached data.

Fix the cortex-a15 __str{cat,cpy}_chk.S, memcpy_base.S to remove
the phony functions, since they aren't needed any more. Then add
a direct include of these for cortex-a53.

Verified the new functions by stepping through all of the major
paths and verifying the backtrace is still correct.

Bug: 22696180
Change-Id: Iec92a3f82d51243cca76c9aff9f35d920ff865ae
2015-08-17 13:02:03 -07:00