233 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Elliott Hughes
|
f978a85cc3 |
Simplify Oryon ifunc resolvers.
Mainly just factoring out the code, but there are two functional changes here too:
1. The inline assembler was missing `volatile`, making the hwcap check ineffective (because the compiler would sometimes move the MIDR_EL1 read above the hwcap check).
2. The previous code accepted variants 0x0 to 0x5 while the comment said 0x1 to 0x5. The comment was correct.
I resisted the temptation to actually have a table to search on the assumption that it'll be a while before we need such a thing.
Bug: https://issuetracker.google.com/330105715
Change-Id: I9fdc1e70e49b26ef32794b55ca5e5fd37f1163f9 |
||
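The variant range fix described in f978a85cc3 can be illustrated with a small, portable sketch. It is not the bionic resolver: the real code reads MIDR_EL1 with (now `volatile`) inline assembler after the hwcap check, whereas this version takes the register value as a parameter so it can run anywhere. The function name and `EXAMPLE_PART` are hypothetical; the field positions follow the architectural MIDR_EL1 layout, and 0x51 ('Q') is Qualcomm's implementer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// MIDR_EL1 field layout: implementer in bits [31:24], variant in bits [23:20],
// part number in bits [15:4].
static inline unsigned midr_implementer(uint64_t midr) { return (midr >> 24) & 0xff; }
static inline unsigned midr_variant(uint64_t midr) { return (midr >> 20) & 0xf; }
static inline unsigned midr_part(uint64_t midr) { return (midr >> 4) & 0xfff; }

// Assumptions for illustration only: EXAMPLE_PART is a hypothetical part
// number and is not taken from the commit.
#define EXAMPLE_IMPLEMENTER 0x51u
#define EXAMPLE_PART 0x001u

static_assert(EXAMPLE_IMPLEMENTER == 'Q', "Qualcomm's implementer code is ASCII 'Q'");

// Accepts variants 0x1..0x5 inclusive. Variant 0x0 is rejected, which is the
// behaviour the commit message says the original comment described.
bool example_is_target_cpu(uint64_t midr) {
  unsigned variant = midr_variant(midr);
  return midr_implementer(midr) == EXAMPLE_IMPLEMENTER &&
         midr_part(midr) == EXAMPLE_PART &&
         variant >= 0x1 && variant <= 0x5;
}
```

Keeping the check as a pair of comparisons rather than a lookup table mirrors the choice stated in the commit message.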
Vaisakh K V
|
54a612187d |
Custom memset implementation for Qualcomm Oryon CPU
Submitted on behalf of a third-party: Linaro Limited
License rights, if any, to the submission are granted solely by the
copyright owner of such submission under its applicable intellectual
property.
Copyright (c) 2012, Linaro Limited
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the Linaro nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Origin Project URL: https://android.googlesource.com/platform/bionic/
Commit ID:
|
||
Vaisakh K V
|
83e55841ea |
Custom memcpy implementation for Qualcomm Oryon CPU
Submitted on behalf of a third-party: Arm Limited License rights, if any, to the submission are granted solely by the copyright owner of such submission under its applicable intellectual property. Copyright (c) 2012-2022, Arm Limited. SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception Origin Project URL: https://github.com/ARM-software/optimized-routines Tag: v24.01 Third Party code includes additions/modifications from Qualcomm Innovation Center, Inc. Test: All Change-Id: I0c97398a435e3f8ddf8ad38bc6bd71cc0d78aea5 |
||
Elliott Hughes
|
cb47a4f671 |
Use ifuncs for memset and memrchr.
Not useful right now, but Qualcomm has an Oryon memset they'd like to use, and there's no reason to treat memrchr as a weird special case. Bug: https://issuetracker.google.com/330105715 Test: treehugger Change-Id: Id879479bf4f45433debcb3fe08cfa96bb1eb3b93 |
||
Florian Mayer
|
73750dc38e |
Move memtag_stack out of libc_globals
We cannot use a WriteProtected because we are accessing it in a multithreaded context. Test: atest memtag_stack_dlopen_test w/ MTE Test: atest bionic-unit-tests w/ MTE Test: atest bionic-unit-tests on _fullmte Bug: 328256432 Change-Id: I39faa75f97fd5b3fb755a46e88346c17c0e9a8e2 |
||
Florian Mayer
|
0e1412e08e |
Make memtag_handle_longjmp precise
We would get the SP inside of memtag_handle_longjmp, which could prevent us from detecting the case where a longjmp is going into a function that had already returned. This change makes the behaviour more predictable. Change-Id: I75bf931c8f4129a2f38001156b7bbe0b54a726ee |
||
Elliott Hughes
|
d7831208b2 |
Fix assembler warnings.
clang complains if you define a symbol and _then_ make it weak, rather than the other way round: /tmp/setjmp-c3c977.s:90:1: warning: sigsetjmp changed binding to STB_WEAK .weak sigsetjmp; ^ Test: treehugger Change-Id: Iee6b0ea456bb2e92aea810ce45f171caabaa89d2 |
||
Elliott Hughes
|
20f9d67327 |
Fix the *return* types in the arm64 dynamic function dispatch.
No actual effect on the code, but misleading and wrong. (The previous change only fixed the argument types; I didn't notice that some of the return types were wrong too.) Test: treehugger Change-Id: I1ee5c48e2652fd8cbf8178d5659e57f79e61898e |
||
Elliott Hughes
|
a1974064ae |
Fix the types in the arm64 dynamic function dispatch.
No actual effect on the code, but misleading and wrong. Test: treehugger Change-Id: I55405ac224b4dcc2ae515954aed179c1cde3c73c |
||
Peter Collingbourne
|
b6a592b25b |
Make fork equivalent to vfork when HWASan or MTE stack tagging is enabled.
Bug: 274056091 Change-Id: Iac029ca6b0e26f57f20c0a54822b75e3cae67344 |
||
Elliott Hughes
|
9a7155dbbd |
riscv64 SCS support.
Bug: https://github.com/google/android-riscv64/issues/55 Test: treehugger Change-Id: I05d48a07a302305126942d38529ffa280640c7b7 |
||
Elliott Hughes
|
3d8e98f8bd |
Add (no-op) ifuncs for SVE optimized routines.
This patch doesn't *enable* the SVE optimized routines, but it does let us see if switching them to ifuncs will cause any app compat issues, so that we can more easily use the optimized routines in future. Test: treehugger Change-Id: Ic5fe570bd21687da397b48127bf688f7ec68dd0c |
||
Elliott Hughes
|
5ec0bfae50 |
Track upstream arm-optimized-routines changes.
The MTE-compatible routines are now faster than the incompatible ones, so they merged them upstream. I've left the ifunc boilerplate on the assumption that I'll be back later to enable the new SVE variants. Test: treehugger Change-Id: Ic894bfb350b9aa70e307bca1c4978624b3e5f4fd |
||
Elliott Hughes
|
023e4e7840 |
Move to arm-optimized-routines memset().
This one's a bit simpler, because there is only one upstream memset() implementation. Test: treehugger Change-Id: I2536d0eb72adaacfa6a0e40d2bd29fc833988c16 |
||
Elliott Hughes
|
7daf4596b7 |
Switch to the arm-optimized-routines memcpy() and memmove().
Outsource this to them, and choose the best of the two options available based on the hardware we're running on. Test: treehugger Change-Id: I2fa7555c971b64a6decca132210e901ffa248efa |
||
Treehugger Robot
|
d26d3c0b5c | Merge "Implement __memset_chk as a copy & paste of __memcpy_chk." | ||
Treehugger Robot
|
6c599e3a67 | Merge "Move memcpy_base.S into memcpy.S." | ||
Elliott Hughes
|
3cc366d3a2 |
Implement __memset_chk as a copy & paste of __memcpy_chk.
These two will stay behind when we move memcpy()/memmove()/memset() over to arm-optimized-routines (which leaves fortify to us). Test: treehugger Change-Id: Ie683f71a5a141263ce3f4e8811df9eaf667584f4 |
||
Elliott Hughes
|
d5ac40cc9f |
Move memcpy_base.S into memcpy.S.
Just to make it clear that there's nothing interesting going on here --- there's just one user, and the only symbol here is __memcpy(). Test: treehugger Change-Id: I62d72c43c4c6d30442f05c1e08a0cb1a1ec42a8a |
||
Elliott Hughes
|
0d4d276253 |
Remove assembler wmemmove().
The compiler turns our C wmemmove() into one shift instruction and a branch, which is plenty for a function no-one uses anyway. Why don't I just leave this alone, since we already have it? Because I'm looking at finishing the project of "switch to arm-optimized-routines" and getting rid of our assembler here, and Arm agrees that this isn't worth having optimized assembler for in their optimized assembler project, judging by its absence. Test: treehugger Change-Id: I985801241a8cbd7dbda51a447946affb1402effb |
||
Elliott Hughes
|
faac8e658c |
arm64: remove unnecessary duplication of constants in vfork.S.
Test: treehugger Change-Id: I41fd22bad0581269c88f5b3bb499735ab6ecafd2 |
||
Evgenii Stepanov
|
3031a7e45e |
memtag_stack: vfork and longjmp support.
With memtag_stack, each function is responsible for cleaning up allocation tags for its stack frame. Allocation tags for anything below SP must match the address tag in SP.
Both vfork and longjmp implement non-local control transfer which abandons part of the stack without proper cleanup. Update allocation tags:
* For longjmp, we know both source and destination values of SP.
* For vfork, save the value of SP before exit() or exec*() - the only valid ways of ending the child process according to POSIX - and reset tags from there to SP-in-parent.
This is not 100% solid and can be confused by a number of hopefully uncommon conditions:
* Segmented stacks.
* Longjmp from sigaltstack into the main stack.
* Some kind of userspace thread implementation using longjmp (that's UB, longjmp can only return to the caller on the current stack).
* and other strange things.
This change adds a sanity limit on the size of the tag cleanup. Also, this logic is only activated in the binaries that carry the NT_MEMTAG_STACK note (set by -fsanitize=memtag-stack) which is meant as a debugging configuration, is not compatible with pre-armv9 CPUs, and should not be set on production code.
Bug: b/174878242
Test: fvp_mini with ToT LLVM (more test in a separate change)
Change-Id: Ibef8b2fc5a6ce85c8e562dead1019964d9f6b80b |
||
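The longjmp case in 3031a7e45e amounts to computing which stack span was abandoned and refusing to touch it if it looks implausibly large. The sketch below shows only that arithmetic, under stated assumptions: the stack grows downward, so the abandoned frames lie between the source SP (lower) and the destination SP (higher); the function name, the `limit` parameter, and the "skip when over the limit" policy are hypothetical and not taken from bionic. No tags are actually written here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE 16u  // MTE assigns one allocation tag per 16-byte granule.

static_assert((MTE_GRANULE & (MTE_GRANULE - 1)) == 0, "granule must be a power of two");

// Returns the number of bytes in [sp_src, sp_dst) whose tags would be reset,
// or 0 if the jump does not unwind toward higher addresses or the span
// exceeds `limit` (the hypothetical sanity limit).
size_t example_cleanup_span(uintptr_t sp_src, uintptr_t sp_dst, size_t limit) {
  if (sp_dst <= sp_src) return 0;  // not an unwind; nothing was abandoned
  // Round the start down to a granule boundary so whole granules are covered.
  uintptr_t begin = sp_src & ~(uintptr_t)(MTE_GRANULE - 1);
  size_t span = (size_t)(sp_dst - begin);
  if (span > limit) return 0;  // implausibly large: leave the tags alone
  return span;
}
```

A limit of this kind is what keeps the confusing cases listed in the message (segmented stacks, jumps off a sigaltstack) from turning into a retag of unrelated memory.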
Mitch Phillips
|
93400371f7 |
[NFCI] Change Android's NT_TYPE to NT_ANDROID_TYPE.
Normally, platform-specific note types in the toolchain are prefixed with the platform name. Because we're exposing the NT_TYPE_MEMTAG and synthesizing the note in the toolchain in an upcoming patch (https://reviews.llvm.org/D118948), it's been requested that we change the name to include the platform prefix. While NT_TYPE_IDENT and NT_TYPE_KUSER aren't known about or synthesized by the toolchain, update those references as well for consistency. Bug: N/A Test: Build Android Change-Id: I7742e4917ae275d59d7984991664ea48028053a1 |
||
Elliott Hughes
|
c0d41db92e |
setjmp/longjmp: avoid invalid values in the stack pointer.
arm64 was already being careful, but x86/x86-64 and 32-bit ARM could be caught by a signal in a state where the stack pointer was mangled. For 32-bit ARM I've taken care with the link register too, to avoid potential issues with unwinding. Bug: http://b/152210274 Test: treehugger Change-Id: I1ce285b017a633c732dbe04743368f4cae27af85 |
||
Elliott Hughes
|
3e1d5563b6 |
PAC/BTI: no need to keep using hint .
The toolchain is new enough that we should be able to use the actual instructions now... Test: treehugger Change-Id: I30aafcdc5386268344c40dc6cc9a22caf591915a |
||
Peter Collingbourne
|
7e20117a36 |
Remove ANDROID_EXPERIMENTAL_MTE.
Now that the feature guarded by this flag has landed in Linux 5.10 we no longer need the flag, so we can remove it. Bug: 135772972 Change-Id: I02fa50848cbd0486c23c8a229bb8f1ab5dd5a56f |
||
Evgenii Stepanov
|
8564b8d9e6 |
Use ELF notes to set the desired memory tagging level.
Use a note in executables to specify (none|sync|async) heap tagging level. To be extended with (heap x stack x globals) in the future. A missing note disables all tagging. Bug: b/135772972 Test: bionic-unit-tests (in a future change) Change-Id: Iab145a922c7abe24cdce17323f9e0c1063cc1321 |
||
Tamas Petz
|
f5bdee7fdf |
libc: Add Armv8.3-A PAuth and Armv8.5-A BTI compatibility to *.S
The most notable change is in sigsetjmp/siglongjmp. The former stores LR signed with the current SP into jmp_buf. Calling siglongjmp reads a signed LR and the corresponding SP from jmp_buf. This way not only the checksum provides some means of integrity protection but Pointer Authentication too. Test: Tested on FVP with BTI enabled. Change-Id: I9d720239775f8d2829a677901f546c4b14b5cbe5 |
||
Peter Collingbourne
|
2361d4ef80 |
Adopt remaining MTE string routines.
ARM has released the remaining MTE string routines, so let's start using them. The strnlen implementation is now compatible with MTE, so it no longer needs to be an ifunc. Bug: 135772972 Change-Id: I9de7fb44447aa1b878f4ad3f62cb0129857b43ad |
||
Josh Gao
|
2303283740 |
Track whether a thread is currently vforked.
Our various fd debugging facilities get extremely confused by a vforked process closing file descriptors in preparation to exec: fdsan can abort, and fdtrack will delete backtraces for any file descriptors that get closed. Keep track of whether we're in a vforked child in order to be able to detect this. Bug: http://b/153926671 Test: 32/64-bit bionic-unit-tests on blueline, x86_64 emulator Change-Id: I8a082fd06bfdfef0e2a88dbce350b6f667f7df9f |
||
Peter Collingbourne
|
337a5b3f9a |
Switch to the arm-optimized-routines string routines on aarch64 where possible.
This includes optimized strrchr and strchrnul routines, and an MTE-compatible strlen routine. Bug: 135772972 Change-Id: I48499f757cdc6d3e77e5649123d45b17dfa3c6b0 |
||
Peter Collingbourne
|
900d07d6a1 |
Add arm64 string.h function implementations for use with hardware supporting MTE.
As it turns out, our "generic" arm64 implementations of certain string.h functions are not actually generic, since they will eagerly read memory possibly outside of the bounds of an MTE granule, which may lead to a segfault on MTE-enabled hardware. Therefore, move the implementations into a "default" directory and use ifuncs to select between them and a new set of "mte" implementations, conditional on whether the hardware and kernel support MTE. The MTE implementations are currently naive implementations written in C but will later be replaced with a set of optimized assembly implementations. Bug: 135772972 Change-Id: Ife37c4e0e6fd60ff20a34594cc09c541af4d1dd7 |
||
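As a rough illustration of the "naive implementations written in C" that 900d07d6a1 mentions, a byte-at-a-time strlen might look like the sketch below. The name is hypothetical and this is not the code from the commit. The relevant property is that it reads exactly one byte per step and stops at the terminator, so it never loads bytes beyond the string into a neighbouring MTE granule.

```c
#include <assert.h>
#include <stddef.h>

// Reads one byte at a time and never looks past the terminating NUL, unlike
// word-at-a-time routines that load ahead of the bytes they have checked.
size_t example_naive_strlen(const char* s) {
  assert(s != NULL);
  size_t n = 0;
  while (s[n] != '\0') {
    ++n;
  }
  return n;
}
```

The cost is speed, which is why the message describes these as placeholders until optimized assembly versions arrive.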
Christopher Ferris
|
b8a95e2186 |
Update to kernel headers v5.3.2.
Test: Builds and run unit tests on taimen/cuttlefish. Change-Id: I6ebd8f179d159ac974555e8edca588083e8081b3 |
||
Christopher Ferris
|
c5d3a4348a |
Make tls related header files platform accessible.
There are places in frameworks and art code that directly included
private bionic header files. Move these files to the new platform
include files.
This change also moves the __get_tls.h header file to tls.h and includes
the tls defines header so that there is a single header that platform
code can use to get __get_tls and the defines.
Also, simplify the visibility rules for platform includes.
Bug: 141560639
Test: Builds and bionic unit tests pass.
Change-Id: I9e5e9c33fe8a85260f69823468bc9d340ab7a1f9
Merged-In: I9e5e9c33fe8a85260f69823468bc9d340ab7a1f9
(cherry picked from commit
|
||
Elliott Hughes
|
782c485880 |
Generate assembler system call stubs via genrule.
There's no need to check in generated code. Test: builds & boots Change-Id: Ife368bca4349d4adeb0666db590356196b4fbd63 |
||
Elliott Hughes
|
d67b03734d |
libc: generate syscall stubs in one big file...
...all the better to switch to a genrule rather than checking in generated source. This also removes all the code in the script to deal with git, rather than fix it. We won't need that where we're going. Test: boots Change-Id: I468ce019d4232a7ef27e5cb5cfd89f4c2fe4ecbd |
||
Evgenii Stepanov
|
505168e530 |
Annotate vfork for hwasan.
Call a hwasan hook in the parent return path for vfork() to let hwasan update its shadow. See https://github.com/google/sanitizers/issues/925 for more details. Bug: 112438058 Test: bionic-unit-tests Change-Id: I9a06800962913e822bd66e072012d0a2c5be453d |
||
Ryan Prichard
|
82aea78136 |
Use TLS_SLOT_THREAD_ID macro in vfork.S
No functional change intended. Bug: none Test: bionic unit tests Change-Id: I7ee0a2b3f0e3807abe88bfa34ef3cd56c150a8f6 |
||
Haibo Huang
|
3927db1d5b |
Remove denver64 from libc
Test: compile Change-Id: Ifcbe15c1682b4e1e18835e38915b2421196882f7 |
||
Peter Collingbourne
|
734beec3d4 |
Allocate a small guard region around the shadow call stack.
This lets us do two things:
1) Make setjmp and longjmp compatible with shadow call stack. To avoid leaking the shadow call stack address into memory, only the lower log2(SCS_SIZE) bits of x18 are stored to jmp_buf. This requires allocating an additional guard page so that we're guaranteed to be able to allocate a sufficiently aligned SCS.
2) SCS overflow detection. Overflows now result in a SIGSEGV instead of corrupting the allocation that comes after it.
Change-Id: I04d6634f96162bf625684672a87fba8b402b7fd1
Test: bionic-unit-tests |
||
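The "lower log2(SCS_SIZE) bits" scheme from 734beec3d4 can be sketched in plain C. The helper names and the 8 KiB size are assumptions for illustration, and the real code operates on register x18 in assembler. The idea relies on the shadow call stack being aligned to its own size, so every address inside it shares the same high bits; longjmp can then rebuild the saved pointer from the current x18 plus the stored low bits, without the full address ever being written to jmp_buf.

```c
#include <assert.h>
#include <stdint.h>

// Assumption for illustration: an 8 KiB shadow call stack, aligned to its size.
#define EXAMPLE_SCS_SIZE ((uint64_t)8 * 1024)
#define EXAMPLE_SCS_MASK (EXAMPLE_SCS_SIZE - 1)

static_assert((EXAMPLE_SCS_SIZE & EXAMPLE_SCS_MASK) == 0, "SCS size must be a power of two");

// setjmp side: keep only the offset within the shadow call stack.
uint64_t example_scs_saved_bits(uint64_t x18) {
  return x18 & EXAMPLE_SCS_MASK;
}

// longjmp side: take the high bits from the live x18 and the low bits from
// the value stored in jmp_buf.
uint64_t example_scs_restore(uint64_t current_x18, uint64_t saved_bits) {
  return (current_x18 & ~EXAMPLE_SCS_MASK) | saved_bits;
}
```

This is also why the message mentions the extra guard page: the allocation needs enough slack to place the stack at a suitably aligned address.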
Treehugger Robot
|
a2a114ba26 | Merge "Annotate siglongjmp for HWASan." | ||
Evgenii Stepanov
|
b16e9ce7b8 |
Annotate siglongjmp for HWASan.
HWASan needs to re-tag the newly unallocated stack space to match SP. Bug: 112438058 Test: SANITIZE_TARGET=hwaddress Change-Id: I4dddef542d802d63bdea59e32a03425a2c4f870b |
||
Evgenii Stepanov
|
00d087c629 |
(arm64) Extend branch range in __memcpy_chk.
Conditional branch has limited range (1MB) and cannot be extended by the linker. The current distance (in walleye build) is 500KB, about half of the maximum. HWASan pushes it over the limit. Replace conditional branch with regular branch, which has longer range (26 vs 19 bits offset) and can be extended in the linker if needed. Bug: 112437884 Bug: 12231437 Test: SANITIZE_TARGET=hwaddress Change-Id: Idc083fb557ab3a859541beb009809992406a6703 |
||
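The 1MB figure and the "26 vs 19 bits" comparison in 00d087c629 follow from how A64 encodes branch targets: the immediate is signed and counts 4-byte instructions. A small worked calculation, with a hypothetical helper name, makes the numbers concrete.

```c
#include <assert.h>
#include <stdint.h>

// A64 branch immediates are signed and scaled by the 4-byte instruction size,
// so an N-bit field reaches 2^(N-1) instructions backward (one instruction
// less forward). This returns the backward reach in bytes.
int64_t example_branch_reach_bytes(unsigned offset_bits) {
  assert(offset_bits >= 1 && offset_bits <= 32);
  return ((int64_t)1 << (offset_bits - 1)) * 4;
}
```

With 19 bits (conditional branch) this gives 1 MiB in each direction; with 26 bits (unconditional branch) it gives 128 MiB, which is the extra headroom the commit relies on.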
Adhemerval Zanella
|
65a6211f4d |
[AArch64] Improve strncmp for mutually misaligned inputs
This patch was originally written by Siddhesh Poyarekar and pushed on cortex-strings [1]. The mutually misaligned inputs on aarch64 are compared with a simple byte copy, which is not very efficient. This patch enhances the comparison similar to strcmp by loading a double-word at a time. Comparison on the default bionic and proposed optimized routines shows the following performance improvements on A54 (using the new proposed memcmp input data from test_strncmp.xml): - No noticeable change on aligned inputs or with same alignment. - Large improvements on unaligned inputs from sizes larger than 16 bytes. Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------- BM_string_strncmp/1/0/0 -0.0954 -0.0954 19 17 19 17 BM_string_strncmp/2/0/0 -0.0344 -0.0344 19 18 19 18 BM_string_strncmp/3/0/0 +0.1768 +0.1768 15 18 15 18 BM_string_strncmp/4/0/0 -0.0344 -0.0344 19 18 19 18 BM_string_strncmp/5/0/0 -0.0344 -0.0344 19 18 19 18 BM_string_strncmp/6/0/0 +0.1589 +0.1589 15 18 15 18 BM_string_strncmp/7/0/0 -0.0344 -0.0344 19 18 19 18 BM_string_strncmp/8/0/0 -0.0998 -0.0998 19 17 19 17 BM_string_strncmp/9/0/0 -0.0277 -0.0277 23 22 23 22 BM_string_strncmp/10/0/0 -0.0270 -0.0270 23 22 23 22 BM_string_strncmp/11/0/0 -0.0331 -0.0331 23 22 23 22 BM_string_strncmp/12/0/0 -0.0270 -0.0270 23 22 23 22 BM_string_strncmp/13/0/0 -0.0284 -0.0284 23 22 23 22 BM_string_strncmp/14/0/0 +0.1042 +0.1042 20 22 20 22 BM_string_strncmp/15/0/0 -0.0277 -0.0277 23 22 23 22 BM_string_strncmp/16/0/0 +0.0214 +0.0215 22 22 22 22 BM_string_strncmp/24/0/0 -0.1291 -0.1291 24 21 24 21 BM_string_strncmp/32/0/0 -0.0470 -0.0470 27 26 27 26 BM_string_strncmp/40/0/0 -0.0433 -0.0433 29 28 29 28 BM_string_strncmp/48/0/0 -0.0301 -0.0301 31 30 31 30 BM_string_strncmp/56/0/0 -0.0800 -0.0800 33 31 33 31 BM_string_strncmp/64/0/0 +0.0188 +0.0188 34 34 34 34 BM_string_strncmp/72/0/0 -0.0334 -0.0334 38 37 38 37 
BM_string_strncmp/80/0/0 -0.0000 -0.0000 40 40 40 40 BM_string_strncmp/88/0/0 +0.0413 +0.0413 61 64 61 64 BM_string_strncmp/96/0/0 -0.0215 -0.0216 69 67 69 67 BM_string_strncmp/104/0/0 -0.0208 -0.0208 72 70 72 70 BM_string_strncmp/112/0/0 -0.0173 -0.0173 75 74 75 74 BM_string_strncmp/120/0/0 -0.0166 -0.0166 78 77 78 77 BM_string_strncmp/128/0/0 -0.0158 -0.0158 81 80 81 80 BM_string_strncmp/136/0/0 -0.0149 -0.0149 84 83 84 83 BM_string_strncmp/144/0/0 -0.0201 -0.0201 88 86 88 86 BM_string_strncmp/160/0/0 -0.0136 -0.0136 94 93 94 93 BM_string_strncmp/176/0/0 +0.0224 +0.0224 96 98 96 98 BM_string_strncmp/192/0/0 +0.0289 +0.0289 102 105 102 105 BM_string_strncmp/208/0/0 +0.0101 +0.0101 111 112 111 112 BM_string_strncmp/224/0/0 -0.0107 -0.0107 119 118 119 118 BM_string_strncmp/240/0/0 -0.0088 -0.0088 126 125 126 125 BM_string_strncmp/256/0/0 -0.0101 -0.0101 132 131 132 131 BM_string_strncmp/512/0/0 -0.0056 -0.0056 235 233 235 233 BM_string_strncmp/1024/0/0 -0.0030 -0.0030 439 437 439 437 BM_string_strncmp/8192/0/0 -0.0431 -0.0431 3799 3635 3799 3635 BM_string_strncmp/16384/0/0 -0.0069 -0.0069 6778 6732 6779 6732 BM_string_strncmp/32768/0/0 -0.0001 -0.0002 13405 13403 13405 13403 BM_string_strncmp/65536/0/0 +0.0005 +0.0005 26968 26981 26968 26981 BM_string_strncmp/131072/0/0 -0.0057 -0.0057 53959 53650 53958 53650 BM_string_strncmp/1/4/0 -0.1352 -0.1352 12 10 12 10 BM_string_strncmp/2/4/0 +0.0020 +0.0020 15 15 15 15 BM_string_strncmp/3/4/0 -0.1560 -0.1560 20 17 20 17 BM_string_strncmp/4/4/0 +0.0296 +0.0296 22 22 22 22 BM_string_strncmp/5/4/0 +0.0573 +0.0573 22 23 22 23 BM_string_strncmp/6/4/0 -0.0340 -0.0340 25 24 25 24 BM_string_strncmp/7/4/0 +0.0185 +0.0185 26 26 26 26 BM_string_strncmp/8/4/0 -0.0050 -0.0050 27 27 27 27 BM_string_strncmp/9/4/0 -0.1294 -0.1294 28 24 28 24 BM_string_strncmp/10/4/0 +0.0109 +0.0109 29 29 29 29 BM_string_strncmp/11/4/0 -0.0000 -0.0001 30 30 30 30 BM_string_strncmp/12/4/0 +0.0055 +0.0055 50 50 50 50 BM_string_strncmp/13/4/0 -0.0249 -0.0249 
51 50 51 50 BM_string_strncmp/14/4/0 -0.0289 -0.0289 53 52 53 52 BM_string_strncmp/15/4/0 -0.0205 -0.0205 55 54 55 54 BM_string_strncmp/16/4/0 -0.4616 -0.4616 57 31 57 31 BM_string_strncmp/24/4/0 -0.4871 -0.4871 72 37 72 37 BM_string_strncmp/32/4/0 -0.5549 -0.5549 87 39 87 39 BM_string_strncmp/40/4/0 -0.5964 -0.5964 103 42 103 42 BM_string_strncmp/48/4/0 -0.6647 -0.6647 118 40 118 40 BM_string_strncmp/56/4/0 -0.6551 -0.6551 134 46 134 46 BM_string_strncmp/64/4/0 -0.6609 -0.6609 145 49 145 49 BM_string_strncmp/72/4/0 -0.5709 -0.5710 164 70 164 70 BM_string_strncmp/80/4/0 -0.5929 -0.5929 180 73 180 73 BM_string_strncmp/88/4/0 -0.6051 -0.6051 195 77 195 77 BM_string_strncmp/96/4/0 -0.6160 -0.6160 210 81 210 81 BM_string_strncmp/104/4/0 -0.6199 -0.6199 223 85 223 85 BM_string_strncmp/112/4/0 -0.6293 -0.6293 240 89 240 89 BM_string_strncmp/120/4/0 -0.6439 -0.6439 255 91 255 91 BM_string_strncmp/128/4/0 -0.6493 -0.6493 271 95 271 95 BM_string_strncmp/136/4/0 -0.6704 -0.6704 287 95 287 95 BM_string_strncmp/144/4/0 -0.6744 -0.6744 302 98 302 98 BM_string_strncmp/160/4/0 -0.6700 -0.6700 333 110 333 110 BM_string_strncmp/176/4/0 -0.6821 -0.6821 364 116 364 116 BM_string_strncmp/192/4/0 -0.6887 -0.6887 394 123 394 123 BM_string_strncmp/208/4/0 -0.6949 -0.6949 425 130 425 130 BM_string_strncmp/224/4/0 -0.7069 -0.7069 456 134 456 134 BM_string_strncmp/240/4/0 -0.7042 -0.7042 486 144 486 144 BM_string_strncmp/256/4/0 -0.7043 -0.7043 514 152 514 152 BM_string_strncmp/1/0/4 +0.0227 +0.0227 14 14 14 14 BM_string_strncmp/2/0/4 +0.0442 +0.0442 15 16 15 16 BM_string_strncmp/3/0/4 +0.5829 +0.5829 17 27 17 27 BM_string_strncmp/4/0/4 -0.1593 -0.1593 22 19 22 19 BM_string_strncmp/5/0/4 -0.0516 -0.0516 23 22 23 22 BM_string_strncmp/6/0/4 -0.1684 -0.1684 25 20 25 20 BM_string_strncmp/7/0/4 +0.0170 +0.0170 26 26 26 26 BM_string_strncmp/8/0/4 +0.0006 +0.0006 27 27 27 27 BM_string_strncmp/9/0/4 +0.1272 +0.1272 25 28 25 28 BM_string_strncmp/10/0/4 +0.0108 +0.0108 29 29 29 29 
BM_string_strncmp/11/0/4 -0.0001 -0.0001 30 30 30 30 BM_string_strncmp/12/0/4 -0.3557 -0.3557 50 32 50 32 BM_string_strncmp/13/0/4 -0.3370 -0.3370 51 34 51 34 BM_string_strncmp/14/0/4 -0.3444 -0.3444 53 35 53 35 BM_string_strncmp/15/0/4 +0.0946 +0.0946 51 56 51 56 BM_string_strncmp/16/0/4 -0.5203 -0.5203 53 25 53 25 BM_string_strncmp/24/0/4 -0.6109 -0.6109 72 28 72 28 BM_string_strncmp/32/0/4 -0.6934 -0.6934 88 27 88 27 BM_string_strncmp/40/0/4 -0.6833 -0.6833 103 33 103 33 BM_string_strncmp/48/0/4 -0.6973 -0.6973 118 36 118 36 BM_string_strncmp/56/0/4 -0.7116 -0.7116 134 39 134 39 BM_string_strncmp/64/0/4 -0.6017 -0.6018 149 59 149 59 BM_string_strncmp/72/0/4 -0.6268 -0.6268 164 61 164 61 BM_string_strncmp/80/0/4 -0.6409 -0.6409 179 64 179 64 BM_string_strncmp/88/0/4 -0.6465 -0.6465 195 69 195 69 BM_string_strncmp/96/0/4 -0.6551 -0.6551 210 72 210 72 BM_string_strncmp/104/0/4 -0.6662 -0.6662 227 76 227 76 BM_string_strncmp/112/0/4 -0.6700 -0.6700 240 79 240 79 BM_string_strncmp/120/0/4 -0.6740 -0.6740 256 83 256 83 BM_string_strncmp/128/0/4 -0.6862 -0.6862 271 85 271 85 BM_string_strncmp/136/0/4 -0.6883 -0.6883 287 89 287 89 BM_string_strncmp/144/0/4 -0.7031 -0.7031 297 88 297 88 BM_string_strncmp/160/0/4 -0.6985 -0.6985 333 100 333 100 BM_string_strncmp/176/0/4 -0.7082 -0.7082 364 106 364 106 BM_string_strncmp/192/0/4 -0.7223 -0.7223 396 110 396 110 BM_string_strncmp/208/0/4 -0.7135 -0.7135 421 121 421 121 BM_string_strncmp/224/0/4 -0.7194 -0.7194 455 128 455 128 BM_string_strncmp/240/0/4 -0.7233 -0.7233 487 135 487 135 BM_string_strncmp/256/0/4 -0.7239 -0.7239 516 143 516 143 BM_string_strncmp/1/4/4 +0.0224 +0.0225 21 22 21 22 BM_string_strncmp/2/4/4 -0.0001 -0.0001 22 22 22 22 BM_string_strncmp/3/4/4 -0.0001 -0.0001 22 22 22 22 BM_string_strncmp/4/4/4 -0.0435 -0.0435 22 21 22 21 BM_string_strncmp/5/4/4 -0.0118 -0.0118 27 27 27 27 BM_string_strncmp/6/4/4 -0.0118 -0.0118 27 27 27 27 BM_string_strncmp/7/4/4 -0.0117 -0.0117 27 27 27 27 BM_string_strncmp/8/4/4 
-0.0118 -0.0118 27 27 27 27 BM_string_strncmp/9/4/4 -0.0117 -0.0117 27 27 27 27 BM_string_strncmp/10/4/4 +0.1447 +0.1447 23 27 23 27 BM_string_strncmp/11/4/4 -0.0062 -0.0062 27 27 27 27 BM_string_strncmp/12/4/4 -0.0454 -0.0454 28 27 28 27 BM_string_strncmp/13/4/4 -0.1507 -0.1507 29 24 29 24 BM_string_strncmp/14/4/4 -0.0003 -0.0003 29 29 29 29 BM_string_strncmp/15/4/4 -0.0002 -0.0003 29 29 29 29 BM_string_strncmp/16/4/4 +0.0047 +0.0047 29 29 29 29 BM_string_strncmp/24/4/4 -0.0104 -0.0104 31 30 31 30 BM_string_strncmp/32/4/4 -0.0290 -0.0290 33 32 33 32 BM_string_strncmp/40/4/4 -0.0189 -0.0189 34 33 34 33 BM_string_strncmp/48/4/4 -0.0059 -0.0059 36 36 36 36 BM_string_strncmp/56/4/4 +0.0000 +0.0000 39 39 39 39 BM_string_strncmp/64/4/4 +0.0000 +0.0000 42 42 42 42 BM_string_strncmp/72/4/4 +0.0000 +0.0000 45 45 45 45 BM_string_strncmp/80/4/4 +0.0391 +0.0392 65 68 65 68 BM_string_strncmp/88/4/4 -0.0090 -0.0090 71 70 71 70 BM_string_strncmp/96/4/4 -0.0034 -0.0034 74 74 74 74 BM_string_strncmp/104/4/4 -0.0482 -0.0482 77 73 77 73 BM_string_strncmp/112/4/4 +0.0387 +0.0387 77 80 77 80 BM_string_strncmp/120/4/4 -0.0072 -0.0073 84 83 84 83 BM_string_strncmp/128/4/4 -0.0071 -0.0071 87 86 87 86 BM_string_strncmp/136/4/4 +0.0366 +0.0366 86 89 86 89 BM_string_strncmp/144/4/4 -0.0068 -0.0068 93 93 93 93 BM_string_strncmp/160/4/4 -0.0064 -0.0064 100 99 100 99 BM_string_strncmp/176/4/4 -0.0063 -0.0063 106 105 106 105 BM_string_strncmp/192/4/4 -0.0012 -0.0012 112 112 112 112 BM_string_strncmp/208/4/4 -0.0098 -0.0098 119 118 119 118 BM_string_strncmp/224/4/4 -0.0050 -0.0050 125 125 125 125 BM_string_strncmp/240/4/4 -0.0060 -0.0060 132 131 132 131 BM_string_strncmp/256/4/4 -0.0046 -0.0046 138 137 138 137 [1] Commit id: 26cc4faec37a55529e5d0a39949f7b6ec81008f9 Test: bionic tests and benchmarks on aarch64. Change-Id: Ied579d2044b4092fc95fad486af6541d1eb71dc3 |
||
Adhemerval Zanella
|
b42ff1b5c3 |
[AArch64] Improve strcmp performance for misaligned strings
This patch was originally written by Siddhesh Poyarekar and pushed on cortex-strings [1]. Replace the simple byte-wise compare in the misaligned case with a dword compare with page boundary checks in place. For simplicity its uses a 4K page boundary so that it does not have to query the actual page size on the system. Comparison on the default bionic and proposed optimized routines shows the following performance improvements on A64 (using the new proposed memcmp input data from test_strcmp.xml): - Small improvement for aligned arguments with sizes up to 56 bytes (from 10% to 20%). - Large improvements for unaligned arguments for small sizes (from 3 to 256 bytes). Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------- BM_string_strcmp/1/0/0 +0.0034 +0.0034 11 11 11 11 BM_string_strcmp/2/0/0 +0.0000 +0.0000 11 11 11 11 BM_string_strcmp/3/0/0 -0.1726 -0.1726 11 9 11 9 BM_string_strcmp/4/0/0 -0.1726 -0.1726 11 9 11 9 BM_string_strcmp/5/0/0 -0.1726 -0.1726 11 9 11 9 BM_string_strcmp/6/0/0 -0.1719 -0.1719 11 9 11 9 BM_string_strcmp/7/0/0 -0.1724 -0.1724 11 9 11 9 BM_string_strcmp/8/0/0 -0.1718 -0.1718 11 9 11 9 BM_string_strcmp/9/0/0 -0.2008 -0.2008 16 13 16 13 BM_string_strcmp/10/0/0 -0.2008 -0.2008 16 13 16 13 BM_string_strcmp/11/0/0 -0.2040 -0.2040 16 13 16 13 BM_string_strcmp/12/0/0 -0.1991 -0.1991 16 13 16 13 BM_string_strcmp/13/0/0 -0.1997 -0.1997 16 13 16 13 BM_string_strcmp/14/0/0 -0.1988 -0.1989 16 13 16 13 BM_string_strcmp/15/0/0 -0.2006 -0.2006 16 13 16 13 BM_string_strcmp/16/0/0 -0.2043 -0.2043 16 13 16 13 BM_string_strcmp/24/0/0 -0.1927 -0.1927 18 15 18 15 BM_string_strcmp/32/0/0 -0.1743 -0.1743 20 17 20 17 BM_string_strcmp/40/0/0 -0.1427 -0.1427 22 19 22 19 BM_string_strcmp/48/0/0 -0.1053 -0.1053 24 22 24 22 BM_string_strcmp/56/0/0 -0.0805 -0.0805 26 24 26 24 BM_string_strcmp/64/0/0 -0.0454 -0.0454 28 27 28 27 BM_string_strcmp/72/0/0 -0.0303 
-0.0303 30 29 30 29 BM_string_strcmp/80/0/0 -0.0111 -0.0111 32 32 32 32 BM_string_strcmp/88/0/0 -0.0004 -0.0004 34 34 34 34 BM_string_strcmp/96/0/0 -0.0058 -0.0058 37 37 37 37 BM_string_strcmp/104/0/0 +0.0000 +0.0000 40 40 40 40 BM_string_strcmp/112/0/0 -0.0457 -0.0457 61 58 61 58 BM_string_strcmp/120/0/0 -0.0486 -0.0487 61 58 61 58 BM_string_strcmp/128/0/0 -0.0499 -0.0499 64 61 64 61 BM_string_strcmp/136/0/0 -0.0529 -0.0529 66 63 66 63 BM_string_strcmp/144/0/0 -0.0492 -0.0492 69 66 69 66 BM_string_strcmp/160/0/0 -0.0459 -0.0459 74 71 74 71 BM_string_strcmp/176/0/0 -0.0400 -0.0401 79 76 79 76 BM_string_strcmp/192/0/0 -0.0378 -0.0378 85 81 85 81 BM_string_strcmp/208/0/0 -0.0009 -0.0009 89 89 89 89 BM_string_strcmp/224/0/0 -0.0003 -0.0003 95 95 95 95 BM_string_strcmp/240/0/0 -0.0320 -0.0320 100 96 100 96 BM_string_strcmp/256/0/0 -0.0303 -0.0304 105 102 105 102 BM_string_strcmp/512/0/0 -0.0171 -0.0171 187 183 187 183 BM_string_strcmp/1024/0/0 -0.0091 -0.0091 350 347 350 347 BM_string_strcmp/8192/0/0 -0.0030 -0.0031 2668 2660 2668 2660 BM_string_strcmp/16384/0/0 +0.0007 +0.0007 5449 5452 5448 5452 BM_string_strcmp/32768/0/0 +0.0635 +0.0635 10868 11558 10867 11557 BM_string_strcmp/65536/0/0 -0.0017 -0.0017 21824 21786 21822 21784 BM_string_strcmp/131072/0/0 +0.0012 +0.0012 43485 43536 43480 43532 BM_string_strcmp/1/4/0 +0.7630 +0.7630 7 12 7 12 BM_string_strcmp/2/4/0 +0.9265 +0.9265 12 23 12 23 BM_string_strcmp/3/4/0 -0.0000 -0.0000 14 14 14 14 BM_string_strcmp/4/4/0 +0.0372 +0.0372 19 19 19 19 BM_string_strcmp/6/4/0 -0.0921 -0.0921 20 19 20 19 BM_string_strcmp/7/4/0 -0.0291 -0.0291 19 19 19 19 BM_string_strcmp/8/4/0 +0.0648 +0.0648 20 22 20 22 BM_string_strcmp/9/4/0 +0.0001 -0.0055 22 22 22 22 BM_string_strcmp/10/4/0 -0.1924 -0.1924 23 19 23 19 BM_string_strcmp/11/4/0 -0.2347 -0.2347 24 19 24 19 BM_string_strcmp/12/4/0 -0.2738 -0.2739 26 19 26 19 BM_string_strcmp/13/4/0 -0.3804 -0.3804 42 26 42 26 BM_string_strcmp/14/4/0 -0.3581 -0.3582 41 26 41 26 
BM_string_strcmp/15/4/0 -0.3905 -0.3905 43 26 43 26 BM_string_strcmp/16/4/0 -0.4068 -0.4068 44 26 44 26 BM_string_strcmp/24/4/0 -0.4917 -0.4917 57 29 57 29 BM_string_strcmp/32/4/0 -0.5607 -0.5607 70 31 70 31 BM_string_strcmp/40/4/0 -0.5940 -0.5940 82 33 82 33 BM_string_strcmp/48/4/0 -0.5303 -0.5302 95 45 95 45 BM_string_strcmp/56/4/0 -0.4975 -0.4975 108 54 108 54 BM_string_strcmp/64/4/0 -0.5167 -0.5167 121 58 121 58 BM_string_strcmp/72/4/0 -0.5325 -0.5325 133 62 133 62 BM_string_strcmp/80/4/0 -0.5523 -0.5523 146 65 146 65 BM_string_strcmp/88/4/0 -0.5686 -0.5686 159 69 159 69 BM_string_strcmp/96/4/0 -0.5815 -0.5815 172 72 172 72 BM_string_strcmp/104/4/0 -0.5931 -0.5931 185 75 185 75 BM_string_strcmp/112/4/0 -0.6046 -0.6046 197 78 197 78 BM_string_strcmp/120/4/0 -0.6113 -0.6113 210 82 210 82 BM_string_strcmp/128/4/0 -0.6186 -0.6186 223 85 223 85 BM_string_strcmp/136/4/0 -0.6278 -0.6278 237 88 237 88 BM_string_strcmp/144/4/0 -0.6410 -0.6410 253 91 253 91 BM_string_strcmp/160/4/0 -0.6506 -0.6506 280 98 280 98 BM_string_strcmp/176/4/0 -0.6593 -0.6593 304 104 304 104 BM_string_strcmp/192/4/0 -0.6647 -0.6647 330 111 330 111 BM_string_strcmp/208/4/0 -0.6741 -0.6741 357 116 357 116 BM_string_strcmp/224/4/0 -0.6761 -0.6761 381 123 381 123 BM_string_strcmp/240/4/0 -0.6824 -0.6824 406 129 406 129 BM_string_strcmp/256/4/0 -0.6846 -0.6846 432 136 432 136 BM_string_strcmp/1/0/4 +1.0024 +1.0024 7 14 7 14 BM_string_strcmp/2/0/4 +0.1591 +0.1591 12 14 12 14 BM_string_strcmp/3/0/4 -0.0015 -0.0015 14 14 14 14 BM_string_strcmp/4/0/4 -0.0809 -0.0809 15 14 15 14 BM_string_strcmp/5/0/4 -0.1535 -0.1536 17 14 17 14 BM_string_strcmp/6/0/4 -0.2111 -0.2111 18 14 18 14 BM_string_strcmp/7/0/4 -0.2650 -0.2650 19 14 19 14 BM_string_strcmp/8/0/4 -0.3118 -0.3118 20 14 20 14 BM_string_strcmp/9/0/4 -0.1741 -0.1740 22 18 22 18 BM_string_strcmp/10/0/4 -0.2201 -0.2201 23 18 23 18 BM_string_strcmp/11/0/4 -0.2610 -0.2610 24 18 24 18 BM_string_strcmp/12/0/4 -0.2987 -0.2987 26 18 26 18 BM_string_strcmp/13/0/4 
-0.5748 -0.5748 42 18 42 18 BM_string_strcmp/14/0/4 -0.5796 -0.5796 43 18 43 18 BM_string_strcmp/15/0/4 -0.6167 -0.6167 47 18 47 18 BM_string_strcmp/16/0/4 -0.6303 -0.6303 49 18 49 18 BM_string_strcmp/24/0/4 -0.6557 -0.6557 61 21 61 21 BM_string_strcmp/32/0/4 -0.6612 -0.6612 70 24 70 24 BM_string_strcmp/40/0/4 -0.6812 -0.6813 82 26 82 26 BM_string_strcmp/48/0/4 -0.6974 -0.6974 95 29 95 29 BM_string_strcmp/56/0/4 -0.7151 -0.7151 108 31 108 31 BM_string_strcmp/64/0/4 -0.5717 -0.5717 121 52 121 52 BM_string_strcmp/72/0/4 -0.5927 -0.5927 134 54 134 54 BM_string_strcmp/80/0/4 -0.6004 -0.6004 146 58 146 58 BM_string_strcmp/88/0/4 -0.6145 -0.6145 159 61 159 61 BM_string_strcmp/96/0/4 -0.6287 -0.6287 172 64 172 64 BM_string_strcmp/104/0/4 -0.6351 -0.6351 185 67 185 67 BM_string_strcmp/112/0/4 -0.6423 -0.6423 197 71 197 71 BM_string_strcmp/120/0/4 -0.6489 -0.6489 210 74 210 74 BM_string_strcmp/128/0/4 -0.6578 -0.6578 223 76 223 76 BM_string_strcmp/136/0/4 -0.6597 -0.6597 236 80 236 80 BM_string_strcmp/144/0/4 -0.6674 -0.6674 250 83 250 83 BM_string_strcmp/160/0/4 -0.6751 -0.6751 274 89 274 89 BM_string_strcmp/176/0/4 -0.6798 -0.6798 300 96 300 96 BM_string_strcmp/192/0/4 -0.6873 -0.6855 327 102 325 102 BM_string_strcmp/208/0/4 -0.6903 -0.6903 351 109 351 109 BM_string_strcmp/224/0/4 -0.6907 -0.6907 376 116 376 116 BM_string_strcmp/240/0/4 -0.6897 -0.6897 402 125 402 125 BM_string_strcmp/256/0/4 -0.6937 -0.6937 427 131 427 131 BM_string_strcmp/1/4/4 +0.0009 +0.0009 14 14 14 14 BM_string_strcmp/2/4/4 -0.2229 -0.2229 14 11 14 11 BM_string_strcmp/3/4/4 -0.2256 -0.2256 14 11 14 11 BM_string_strcmp/4/4/4 -0.2241 -0.2240 14 11 14 11 BM_string_strcmp/5/4/4 -0.2220 -0.2220 20 15 20 15 BM_string_strcmp/6/4/4 -0.2267 -0.2267 20 15 20 15 BM_string_strcmp/7/4/4 -0.2228 -0.2227 20 15 20 15 BM_string_strcmp/8/4/4 -0.2219 -0.2219 20 15 20 15 BM_string_strcmp/9/4/4 -0.2220 -0.2220 20 15 20 15 BM_string_strcmp/10/4/4 -0.2227 -0.2227 20 15 20 15 BM_string_strcmp/11/4/4 -0.2210 -0.2210 20 15 
20 15 BM_string_strcmp/12/4/4 -0.2224 -0.2224 20 15 20 15 BM_string_strcmp/13/4/4 -0.1778 -0.1778 21 17 21 17 BM_string_strcmp/14/4/4 -0.1863 -0.1863 21 17 21 17 BM_string_strcmp/15/4/4 -0.1780 -0.1780 21 17 21 17 BM_string_strcmp/16/4/4 +0.0031 +0.0031 21 21 21 21 BM_string_strcmp/24/4/4 +0.0041 +0.0041 24 24 24 24 BM_string_strcmp/32/4/4 -0.0001 -0.0000 25 25 25 25 BM_string_strcmp/40/4/4 +0.0016 +0.0016 26 26 26 26 BM_string_strcmp/48/4/4 +0.0001 +0.0001 28 28 28 28 BM_string_strcmp/56/4/4 -0.0001 -0.0001 30 30 30 30 BM_string_strcmp/64/4/4 -0.0342 -0.0342 32 31 32 31 BM_string_strcmp/72/4/4 -0.0186 -0.0186 34 34 34 34 BM_string_strcmp/80/4/4 +0.0004 +0.0004 36 36 36 36 BM_string_strcmp/88/4/4 -0.0000 -0.0000 39 39 39 39 BM_string_strcmp/96/4/4 -0.0510 -0.0510 62 59 62 59 BM_string_strcmp/104/4/4 -0.0502 -0.0502 63 60 63 60 BM_string_strcmp/112/4/4 -0.0490 -0.0490 65 62 65 62 BM_string_strcmp/120/4/4 -0.0387 -0.0387 67 65 67 65 BM_string_strcmp/128/4/4 -0.0426 -0.0426 70 67 70 67 BM_string_strcmp/136/4/4 -0.0408 -0.0408 73 70 73 70 BM_string_strcmp/144/4/4 -0.0194 -0.0194 75 74 75 74 BM_string_strcmp/160/4/4 -0.0035 -0.0035 81 81 81 81 BM_string_strcmp/176/4/4 -0.0001 -0.0001 86 86 86 86 BM_string_strcmp/192/4/4 -0.0002 -0.0002 91 91 91 91 BM_string_strcmp/208/4/4 -0.0335 -0.0335 96 93 96 93 BM_string_strcmp/224/4/4 -0.0314 -0.0314 101 98 101 98 BM_string_strcmp/240/4/4 -0.0303 -0.0303 106 103 106 103 BM_string_strcmp/256/4/4 -0.0288 -0.0288 111 108 111 108 [1] Commit id: f98f2a6780d686ca3d44f8011c7823d42d9b083a Test: bionic tests and benchmarks on aarch64. Change-Id: I75f8948782b8bd459d21f15e75e1d420905f5e5a |
||
Adhemerval Zanella
|
4ab56af82d |
[AArch64] Optimize memcmp for medium to large sizes
This patch was originally written by Siddhesh Poyarekar and pushed on cortex-strings [1]. This improved memcmp provides a fast path for compares up to 16 bytes and then compares 16 bytes at a time, thus optimizing loads from both sources. Comparison on the default bionic and proposed optimized routines shows the following performance improvements on A72 (using the new proposed memcmp input data from test_memcmp.xml): Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------- BM_string_memcmp/1/0/0 -0.2074 -0.2074 15 12 15 12 BM_string_memcmp/2/0/0 -0.5193 -0.5193 31 15 31 15 BM_string_memcmp/3/0/0 -0.1291 -0.1291 19 17 19 17 BM_string_memcmp/4/0/0 -0.2889 -0.2889 17 12 17 12 BM_string_memcmp/5/0/0 -0.2606 -0.2606 15 11 15 11 BM_string_memcmp/6/0/0 -0.1656 -0.1655 17 14 17 14 BM_string_memcmp/7/0/0 -0.1721 -0.1721 19 15 19 15 BM_string_memcmp/8/0/0 -0.3048 -0.3048 15 10 15 10 BM_string_memcmp/9/0/0 -0.3041 -0.3041 15 10 15 10 BM_string_memcmp/10/0/0 -0.3040 -0.3040 15 10 15 10 BM_string_memcmp/11/0/0 -0.3048 -0.3048 15 10 15 10 BM_string_memcmp/12/0/0 -0.3041 -0.3041 15 10 15 10 BM_string_memcmp/13/0/0 -0.3040 -0.3040 15 10 15 10 BM_string_memcmp/14/0/0 -0.3048 -0.3048 15 10 15 10 BM_string_memcmp/15/0/0 -0.3040 -0.3040 15 10 15 10 BM_string_memcmp/16/0/0 -0.3041 -0.3041 15 10 15 10 BM_string_memcmp/24/0/0 -0.1209 -0.1209 15 13 15 13 BM_string_memcmp/32/0/0 -0.3228 -0.3228 20 13 20 13 BM_string_memcmp/40/0/0 -0.2937 -0.2937 22 15 22 15 BM_string_memcmp/48/0/0 -0.3299 -0.3299 23 15 23 15 BM_string_memcmp/56/0/0 -0.1845 -0.1845 24 20 24 20 BM_string_memcmp/64/0/0 -0.2247 -0.2247 26 20 26 20 BM_string_memcmp/72/0/0 -0.1947 -0.1947 27 22 27 22 BM_string_memcmp/80/0/0 -0.2275 -0.2275 28 22 28 22 BM_string_memcmp/88/0/0 -0.2360 -0.2360 29 22 29 22 BM_string_memcmp/96/0/0 -0.2675 -0.2675 31 22 31 22 BM_string_memcmp/104/0/0 -0.2559 -0.2559 32 24 32 24 
BM_string_memcmp/112/0/0 -0.2787 -0.2786 33 24 33 24 BM_string_memcmp/120/0/0 -0.2599 -0.2599 34 25 34 25 BM_string_memcmp/128/0/0 -0.2860 -0.2860 35 25 35 25 BM_string_memcmp/136/0/0 -0.4708 -0.4708 53 28 53 28 BM_string_memcmp/144/0/0 -0.4719 -0.4719 53 28 53 28 BM_string_memcmp/160/0/0 -0.4680 -0.4680 56 30 56 30 BM_string_memcmp/176/0/0 -0.4645 -0.4645 60 32 60 32 BM_string_memcmp/192/0/0 -0.4641 -0.4641 63 34 63 34 BM_string_memcmp/208/0/0 -0.4555 -0.4555 66 36 66 36 BM_string_memcmp/224/0/0 -0.4558 -0.4557 69 38 69 38 BM_string_memcmp/240/0/0 -0.4534 -0.4534 72 40 72 40 BM_string_memcmp/256/0/0 -0.4463 -0.4463 75 42 75 42 BM_string_memcmp/512/0/0 -0.3077 -0.3077 126 88 126 88 BM_string_memcmp/1024/0/0 -0.3493 -0.3493 229 149 229 149 BM_string_memcmp/8192/0/0 -0.4173 -0.4173 1729 1007 1729 1007 BM_string_memcmp/16384/0/0 -0.3855 -0.3855 3377 2076 3377 2075 BM_string_memcmp/32768/0/0 -0.2968 -0.2968 6847 4815 6847 4814 BM_string_memcmp/65536/0/0 -0.2496 -0.2496 13715 10292 13714 10291 BM_string_memcmp/131072/0/0 -0.2676 -0.2676 27354 20033 27351 20031 BM_string_memcmp/262144/0/0 -0.2319 -0.2319 54604 41943 54598 41939 BM_string_memcmp/524288/0/0 -0.2359 -0.2359 109225 83460 109212 83449 BM_string_memcmp/1048576/0/0 -0.0439 -0.0439 423367 404791 423251 404686 BM_string_memcmp/2097152/0/0 -0.0023 -0.0024 762470 760701 761956 760122 BM_string_memcmp/512/4/4 -0.2853 -0.2853 125 89 125 89 BM_string_memcmp/1024/4/4 -0.3377 -0.3377 228 151 227 151 BM_string_memcmp/8192/4/4 -0.4083 -0.4083 1706 1009 1706 1009 BM_string_memcmp/16384/4/4 -0.3853 -0.3853 3376 2075 3376 2075 BM_string_memcmp/32768/4/4 -0.2974 -0.2974 6846 4810 6845 4810 BM_string_memcmp/65536/4/4 -0.2485 -0.2485 13619 10235 13618 10234 BM_string_memcmp/131072/4/4 -0.2387 -0.2387 27056 20597 27054 20595 BM_string_memcmp/512/4/0 -0.2898 -0.2898 123 88 123 88 BM_string_memcmp/1024/4/0 -0.3401 -0.3401 225 149 225 149 BM_string_memcmp/8192/4/0 -0.4167 -0.4167 1727 1007 1727 1007 BM_string_memcmp/16384/4/0 
-0.3820 -0.3820 3384 2092 3384 2091 BM_string_memcmp/32768/4/0 -0.2535 -0.2535 6886 5141 6886 5140 BM_string_memcmp/65536/4/0 -0.1897 -0.1897 13850 11223 13849 11223 BM_string_memcmp/131072/4/0 -0.1972 -0.1972 27536 22106 27533 22104 BM_string_memcmp/512/0/4 -0.2854 -0.2854 125 89 125 89 BM_string_memcmp/1024/0/4 -0.3332 -0.3333 226 151 226 151 BM_string_memcmp/8192/0/4 -0.4199 -0.4199 1740 1009 1740 1009 BM_string_memcmp/16384/0/4 -0.3811 -0.3811 3383 2094 3383 2094 BM_string_memcmp/32768/0/4 -0.2409 -0.2409 6900 5238 6899 5237 BM_string_memcmp/65536/0/4 -0.1920 -0.1920 13922 11250 13921 11248 BM_string_memcmp/131072/0/4 -0.2029 -0.2029 27699 22079 27697 22077 I see similar improvements on A54 as well: Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------- BM_string_memcmp/1/0/0 -0.2074 -0.2074 15 12 15 12 BM_string_memcmp/2/0/0 -0.5193 -0.5193 31 15 31 15 BM_string_memcmp/3/0/0 -0.1291 -0.1291 19 17 19 17 BM_string_memcmp/4/0/0 -0.2889 -0.2889 17 12 17 12 BM_string_memcmp/5/0/0 -0.2606 -0.2606 15 11 15 11 BM_string_memcmp/6/0/0 -0.1656 -0.1655 17 14 17 14 BM_string_memcmp/7/0/0 -0.1721 -0.1721 19 15 19 15 BM_string_memcmp/8/0/0 -0.3048 -0.3048 15 10 15 10 BM_string_memcmp/9/0/0 -0.3041 -0.3041 15 10 15 10 BM_string_memcmp/10/0/0 -0.3040 -0.3040 15 10 15 10 BM_string_memcmp/11/0/0 -0.3048 -0.3048 15 10 15 10 BM_string_memcmp/12/0/0 -0.3041 -0.3041 15 10 15 10 BM_string_memcmp/13/0/0 -0.3040 -0.3040 15 10 15 10 BM_string_memcmp/14/0/0 -0.3048 -0.3048 15 10 15 10 BM_string_memcmp/15/0/0 -0.3040 -0.3040 15 10 15 10 BM_string_memcmp/16/0/0 -0.3041 -0.3041 15 10 15 10 BM_string_memcmp/24/0/0 -0.1209 -0.1209 15 13 15 13 BM_string_memcmp/32/0/0 -0.3228 -0.3228 20 13 20 13 BM_string_memcmp/40/0/0 -0.2937 -0.2937 22 15 22 15 BM_string_memcmp/48/0/0 -0.3299 -0.3299 23 15 23 15 BM_string_memcmp/56/0/0 -0.1845 -0.1845 24 20 24 20 BM_string_memcmp/64/0/0 -0.2247 
-0.2247 26 20 26 20 BM_string_memcmp/72/0/0 -0.1947 -0.1947 27 22 27 22 BM_string_memcmp/80/0/0 -0.2275 -0.2275 28 22 28 22 BM_string_memcmp/88/0/0 -0.2360 -0.2360 29 22 29 22 BM_string_memcmp/96/0/0 -0.2675 -0.2675 31 22 31 22 BM_string_memcmp/104/0/0 -0.2559 -0.2559 32 24 32 24 BM_string_memcmp/112/0/0 -0.2787 -0.2786 33 24 33 24 BM_string_memcmp/120/0/0 -0.2599 -0.2599 34 25 34 25 BM_string_memcmp/128/0/0 -0.2860 -0.2860 35 25 35 25 BM_string_memcmp/136/0/0 -0.4708 -0.4708 53 28 53 28 BM_string_memcmp/144/0/0 -0.4719 -0.4719 53 28 53 28 BM_string_memcmp/160/0/0 -0.4680 -0.4680 56 30 56 30 BM_string_memcmp/176/0/0 -0.4645 -0.4645 60 32 60 32 BM_string_memcmp/192/0/0 -0.4641 -0.4641 63 34 63 34 BM_string_memcmp/208/0/0 -0.4555 -0.4555 66 36 66 36 BM_string_memcmp/224/0/0 -0.4558 -0.4557 69 38 69 38 BM_string_memcmp/240/0/0 -0.4534 -0.4534 72 40 72 40 BM_string_memcmp/256/0/0 -0.4463 -0.4463 75 42 75 42 BM_string_memcmp/512/0/0 -0.3077 -0.3077 126 88 126 88 BM_string_memcmp/1024/0/0 -0.3493 -0.3493 229 149 229 149 BM_string_memcmp/8192/0/0 -0.4173 -0.4173 1729 1007 1729 1007 BM_string_memcmp/16384/0/0 -0.3855 -0.3855 3377 2076 3377 2075 BM_string_memcmp/32768/0/0 -0.2968 -0.2968 6847 4815 6847 4814 BM_string_memcmp/65536/0/0 -0.2496 -0.2496 13715 10292 13714 10291 BM_string_memcmp/131072/0/0 -0.2676 -0.2676 27354 20033 27351 20031 BM_string_memcmp/262144/0/0 -0.2319 -0.2319 54604 41943 54598 41939 BM_string_memcmp/524288/0/0 -0.2359 -0.2359 109225 83460 109212 83449 BM_string_memcmp/1048576/0/0 -0.0439 -0.0439 423367 404791 423251 404686 BM_string_memcmp/2097152/0/0 -0.0023 -0.0024 762470 760701 761956 760122 BM_string_memcmp/512/4/4 -0.2853 -0.2853 125 89 125 89 BM_string_memcmp/1024/4/4 -0.3377 -0.3377 228 151 227 151 BM_string_memcmp/8192/4/4 -0.4083 -0.4083 1706 1009 1706 1009 BM_string_memcmp/16384/4/4 -0.3853 -0.3853 3376 2075 3376 2075 BM_string_memcmp/32768/4/4 -0.2974 -0.2974 6846 4810 6845 4810 BM_string_memcmp/65536/4/4 -0.2485 -0.2485 13619 10235 13618 
10234 BM_string_memcmp/131072/4/4 -0.2387 -0.2387 27056 20597 27054 20595 BM_string_memcmp/512/4/0 -0.2898 -0.2898 123 88 123 88 BM_string_memcmp/1024/4/0 -0.3401 -0.3401 225 149 225 149 BM_string_memcmp/8192/4/0 -0.4167 -0.4167 1727 1007 1727 1007 BM_string_memcmp/16384/4/0 -0.3820 -0.3820 3384 2092 3384 2091 BM_string_memcmp/32768/4/0 -0.2535 -0.2535 6886 5141 6886 5140 BM_string_memcmp/65536/4/0 -0.1897 -0.1897 13850 11223 13849 11223 BM_string_memcmp/131072/4/0 -0.1972 -0.1972 27536 22106 27533 22104 BM_string_memcmp/512/0/4 -0.2854 -0.2854 125 89 125 89 BM_string_memcmp/1024/0/4 -0.3332 -0.3333 226 151 226 151 BM_string_memcmp/8192/0/4 -0.4199 -0.4199 1740 1009 1740 1009 BM_string_memcmp/16384/0/4 -0.3811 -0.3811 3383 2094 3383 2094 BM_string_memcmp/32768/0/4 -0.2409 -0.2409 6900 5238 6899 5237 BM_string_memcmp/65536/0/4 -0.1920 -0.1920 13922 11250 13921 11248 BM_string_memcmp/131072/0/4 -0.2029 -0.2029 27699 22079 27697 22077 [1] Commit id: f77e4c932b4fd65177b57dd5e220bd17fb4037d6 Test: bionic tests and benchmarks on aarch64. Change-Id: I2791e2b20d1c0ad429e8e5a41d3e47b1ac02c921 |
||
Haibo Huang
|
8a0f0ed5e7 |
Make memcpy memmove
Bug: http://b/63992911 Test: Change BoardConfig.mk and compile for each variant Change-Id: Ia0cc68d8e90e3316ddb2e9ff1555a009b6a0c5be |
||
Haibo Huang
|
ece43e14c9 |
Use cortex-a53/bionic/memmove.S by default for arm64
cortex-a53/bionic/memmove.S looks like the more optimized version and should be used in most cases. It delegates small (<= 96 bytes) moves to memcpy.

The only exception is denver64, which uses its own memcpy that does not allow overlap for copies of < 96 bytes. Only this variant needs generic/bionic/memmove.S.

Benchmark results look pretty close, though (on marlin).

Before: using generic/bionic/memmove.S
-------------------------------------------------------------------
Benchmark                        Time      CPU   Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0           6 ns      6 ns   108872005   1.15787GB/s
BM_string_memcpy/64/0/0          7 ns      7 ns   107387438   9.14365GB/s
BM_string_memcpy/512/0/0        21 ns     20 ns    34165353   23.2734GB/s
BM_string_memcpy/1024/0/0       40 ns     39 ns    17766657   24.2346GB/s
BM_string_memcpy/8192/0/0      311 ns    310 ns     2259904   24.6339GB/s
BM_string_memcpy/16384/0/0     616 ns    613 ns     1143027   24.8852GB/s
BM_string_memcpy/32768/0/0    1322 ns   1316 ns      530799   23.1835GB/s
BM_string_memcpy/65536/0/0    2672 ns   2661 ns      229638   22.937GB/s
BM_string_memcpy/131072/0/0   5379 ns   5357 ns      128316   22.788GB/s

After: using cortex-a53/bionic/memmove.S
-------------------------------------------------------------------
Benchmark                        Time      CPU   Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0           6 ns      6 ns   116610749   1.24646GB/s
BM_string_memcpy/64/0/0          6 ns      6 ns   115634093   9.84708GB/s
BM_string_memcpy/512/0/0        21 ns     21 ns    34167322   22.8938GB/s
BM_string_memcpy/1024/0/0       39 ns     39 ns    17859445   24.3312GB/s
BM_string_memcpy/8192/0/0      311 ns    310 ns     2260192   24.6325GB/s
BM_string_memcpy/16384/0/0     610 ns    608 ns     1151889   25.0987GB/s
BM_string_memcpy/32768/0/0    1488 ns   1482 ns      532508   20.5988GB/s
BM_string_memcpy/65536/0/0    2421 ns   2411 ns      290502   25.3146GB/s
BM_string_memcpy/131072/0/0   5278 ns   5256 ns      132710   23.2234GB/s

Test: Build and benchmark on marlin
Bug: http://b/63992911
Change-Id: Id85961aca18ba841bcbcfe0d8b162843eab30584 |
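The message above says that memmove can delegate small moves to memcpy on most variants, but not on denver64, whose memcpy does not tolerate overlap for small copies. One way a small copy can be overlap-safe is to load every source byte into registers before storing anything. The C sketch below illustrates that idea for sizes up to 16 bytes only; `copy_upto16` is a hypothetical helper written for this note, not code from bionic, and it is an assumption that the cortex-a53 routine relies on the same property.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n <= 16 bytes from src to dst.
 * Overlap-safe because every load from src completes before the
 * first store to dst. The inner memcpy calls only move data between
 * memory and local variables, so they never see overlapping buffers. */
static void copy_upto16(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n <= 16);
    if (n >= 8) {
        /* Head and tail words may overlap each other; that is fine. */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n > 0) {
        /* 1..3 bytes: first, middle and last byte cover every position. */
        unsigned char first = s[0], mid = s[n / 2], last = s[n - 1];
        d[0] = first;
        d[n / 2] = mid;
        d[n - 1] = last;
    }
}
```

A memcpy that instead stores part of the destination before it has finished reading the source cannot be used this way, which is consistent with keeping a separate memmove for that variant.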
||
Mark Salyzyn
|
79249b0897 |
bionic: add vdso clock_getres
clock_getres() should not be a hot call; nevertheless, it is ~6-7 times faster for supported clock ids when it uses __vdso_clock_getres, if available. For unsupported clock ids, going through __vdso_clock_getres carries a 3% performance penalty relative to a direct syscall.

[TL;DR]

w/vdso32 kernel patches, locked cores to MAX, little cores only.

BEFORE:

hikey960 vdso (aarch64):
----------------------------------------------------------------------
Benchmark                            Time      CPU   Iterations
----------------------------------------------------------------------
BM_time_clock_getres               126 ns    126 ns     5577874
BM_time_clock_getres_syscall       127 ns    127 ns     5505016
BM_time_clock_getres_REALTIME      126 ns    126 ns     5574682
BM_time_clock_getres_BOOTTIME      126 ns    126 ns     5575237
BM_time_clock_getres_TAI           126 ns    126 ns     5576810
BM_time_clock_getres_unsupported   128 ns    128 ns     5480189

hikey960 vdso32 (aarch32):
----------------------------------------------------------------------
Benchmark                            Time      CPU   Iterations
----------------------------------------------------------------------
BM_time_clock_getres               199 ns    199 ns     3508708
BM_time_clock_getres_syscall       220 ns    220 ns     3184676
BM_time_clock_getres_REALTIME      199 ns    199 ns     3509697
BM_time_clock_getres_BOOTTIME      199 ns    199 ns     3513551
BM_time_clock_getres_TAI           200 ns    199 ns     3512412
BM_time_clock_getres_unsupported   196 ns    196 ns     3575609

x86_64 (glibc):
---------------------------------------------------------------------
Benchmark                            Time      CPU   Iterations
---------------------------------------------------------------------
BM_time_clock_getres               252 ns    252 ns     2370263
BM_time_clock_getres_syscall       215 ns    215 ns     3287497
BM_time_clock_getres_REALTIME      214 ns    214 ns     3294228
BM_time_clock_getres_BOOTTIME      213 ns    213 ns     3277519
BM_time_clock_getres_TAI           213 ns    213 ns     3294991
BM_time_clock_getres_unsupported   206 ns    206 ns     3450654

imx7d_pico IOT nyc (w/arm,cpu-registers-not-fw-configured) (armv7a):
(Virtual Timers)
Benchmark                        Time(ns)  CPU(ns)   Iterations
------------------------------------------------------------------
BM_time_clock_getres                 16      345        2000000
BM_time_clock_getres_syscall         16      339        2121212
BM_time_clock_getres_REALTIME        17      350        2058824
BM_time_clock_getres_BOOTTIME        17      345        2000000
BM_time_clock_getres_TAI             16      350        2000000
BM_time_clock_getres_unsupported     13      284        2500000

AFTER:

hikey960 vdso (aarch64):
---------------------------------------------------------------------
Benchmark                            Time      CPU   Iterations
---------------------------------------------------------------------
BM_time_clock_getres                18 ns     18 ns    37880389
BM_time_clock_getres_syscall       127 ns    127 ns     5520029
BM_time_clock_getres_REALTIME       18 ns     18 ns    37879962
BM_time_clock_getres_BOOTTIME       19 ns     18 ns    37878361
BM_time_clock_getres_TAI           131 ns    131 ns     5368484
BM_time_clock_getres_unsupported    97 ns     97 ns     7182864

hikey960 vdso32 (aarch32):
---------------------------------------------------------------------
Benchmark                            Time      CPU   Iterations
---------------------------------------------------------------------
BM_time_clock_getres                36 ns     36 ns    19205240
BM_time_clock_getres_syscall       212 ns    212 ns     3297100
BM_time_clock_getres_REALTIME       36 ns     36 ns    19219109
BM_time_clock_getres_BOOTTIME       36 ns     36 ns    19222490
BM_time_clock_getres_TAI           206 ns    206 ns     3402868
BM_time_clock_getres_unsupported   159 ns    159 ns     4409492

imx7d_pico IOT nyc (wo/arm,cpu-registers-not-fw-configured) (armv7a):
(Physical Timers)
Benchmark                        Time(ns)  CPU(ns)   Iterations
------------------------------------------------------------------
BM_time_clock_getres                  2       48       14000000
BM_time_clock_getres_syscall         14      335        2058824
BM_time_clock_getres_REALTIME         2       49       14583333
BM_time_clock_getres_BOOTTIME         2       48       14000000
BM_time_clock_getres_TAI             14      350        2058824
BM_time_clock_getres_unsupported      8      203        3500000

Test: taskset F \
      /data/benchmarktest{64}/bionic-benchmarks/bionic-benchmarks \
      --bionic_xml=vdso.xml --benchmark_filter=BM_time_clock_getres*
Bug: 63737556
Change-Id: I80c0a5106625d76720287f715fcf145d2aad1705 |
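The two paths compared in the benchmarks above (the libc `clock_getres()` wrapper, which may be served by the vDSO, and a direct system call) can be exercised from user code. The sketch below is only an illustration under stated assumptions: it targets 64-bit Linux, where `SYS_clock_getres` and `struct timespec` agree on layout, and the helper names `getres_libc`, `getres_syscall` and `ns_per_call` are made up for this note. It is not the bionic benchmark itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*getres_fn)(clockid_t, struct timespec *);

/* Path 1: the libc wrapper. Whether this reaches the vDSO depends on
 * the libc and kernel in use. */
static int getres_libc(clockid_t id, struct timespec *res) {
    return clock_getres(id, res);
}

/* Path 2: always enter the kernel through the raw system call. */
static int getres_syscall(clockid_t id, struct timespec *res) {
    return (int)syscall(SYS_clock_getres, id, res);
}

/* Average wall-clock cost of one call, in nanoseconds. A rough timing
 * loop, far less careful than a real benchmark harness. */
static double ns_per_call(getres_fn fn, clockid_t id, long iters) {
    struct timespec t0, t1, res;
    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        fn(id, &res);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling `ns_per_call(getres_libc, CLOCK_REALTIME, 1000000)` and `ns_per_call(getres_syscall, CLOCK_REALTIME, 1000000)` on the same core gives a rough picture of the gap; the numbers will vary with CPU frequency and kernel configuration.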
||
Sebastian Pop
|
ed9bfc4616 |
[AArch64] Optimized memcmp
Patch written by Wilco Dijkstra, submitted for review to newlib: https://sourceware.org/ml/newlib/2017/msg00524.html

This is an optimized memcmp for AArch64. It is a complete rewrite using a different algorithm. The previous version split the work into three cases: both inputs aligned, inputs mutually aligned, and unaligned inputs, the last handled with a byte loop. The new version combines all these cases, while small inputs of less than 8 bytes are handled separately. This allows the main code to be sped up using unaligned loads, since there are now at least 8 bytes to be compared. After the first 8 bytes, align the first input. This ensures each iteration does at most one unaligned access and mutually aligned inputs behave as aligned. After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and unaligned by >500% (yes >6 times faster) on large inputs.

2017-06-28 Wilco Dijkstra <wdijkstr@arm.com>
* bionic/libc/arch-arm64/generic/bionic/memcmp.S (memcmp): Rewrite of optimized memcmp.

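The steps described above (small inputs handled separately, an unaligned compare of the first 8 bytes, a main loop with the first input aligned, and an unaligned compare of the last 8 bytes) can be sketched in portable C. This is an illustration only, not a translation of memcmp.S: `memcmp_sketch`, `load64` and `diff8` are hypothetical names, and locating the differing byte with a byte loop is a simplification chosen for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load expressed through memcpy. */
static uint64_t load64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, 8);
    return v;
}

/* Sign of the first differing byte in an 8-byte block, 0 if equal. */
static int diff8(const unsigned char *a, const unsigned char *b) {
    for (int i = 0; i < 8; i++) {
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

static int memcmp_sketch(const void *pa, const void *pb, size_t n) {
    const unsigned char *a = pa, *b = pb;

    /* Small inputs (< 8 bytes) are handled separately. */
    if (n < 8) {
        for (size_t i = 0; i < n; i++) {
            if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
        }
        return 0;
    }

    /* First 8 bytes with unaligned loads. */
    if (load64(a) != load64(b)) return diff8(a, b);

    /* Advance 1..8 bytes so that a + i is 8-byte aligned. Bytes that
     * get compared twice are already known to be equal. */
    size_t i = 8 - ((uintptr_t)a & 7);
    assert(((uintptr_t)(a + i) & 7) == 0);

    /* Main loop: a is aligned, so at most b is unaligned. */
    for (; i + 8 <= n; i += 8) {
        if (load64(a + i) != load64(b + i)) return diff8(a + i, b + i);
    }

    /* Last 8 bytes with unaligned loads; this overlaps data that has
     * already compared equal and covers the remaining tail. */
    if (load64(a + n - 8) != load64(b + n - 8)) {
        return diff8(a + n - 8, b + n - 8);
    }
    return 0;
}
```

Because every earlier byte has already compared equal whenever a block mismatches, the first differing byte inside that block is also the first differing byte overall, so the sign of the result matches memcmp.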
GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53: Length 1, alignment 1/ 1: 153% Length 1, alignment 1/ 1: 119% Length 1, alignment 1/ 1: 154% Length 2, alignment 2/ 2: 121% Length 2, alignment 2/ 2: 140% Length 2, alignment 2/ 2: 121% Length 3, alignment 3/ 3: 105% Length 3, alignment 3/ 3: 105% Length 3, alignment 3/ 3: 105% Length 4, alignment 4/ 4: 155% Length 4, alignment 4/ 4: 154% Length 4, alignment 4/ 4: 161% Length 5, alignment 5/ 5: 173% Length 5, alignment 5/ 5: 173% Length 5, alignment 5/ 5: 173% Length 6, alignment 6/ 6: 145% Length 6, alignment 6/ 6: 145% Length 6, alignment 6/ 6: 145% Length 7, alignment 7/ 7: 125% Length 7, alignment 7/ 7: 125% Length 7, alignment 7/ 7: 125% Length 8, alignment 8/ 8: 111% Length 8, alignment 8/ 8: 130% Length 8, alignment 8/ 8: 124% Length 9, alignment 9/ 9: 160% Length 9, alignment 9/ 9: 160% Length 9, alignment 9/ 9: 150% Length 10, alignment 10/10: 170% Length 10, alignment 10/10: 137% Length 10, alignment 10/10: 150% Length 11, alignment 11/11: 160% Length 11, alignment 11/11: 160% Length 11, alignment 11/11: 160% Length 12, alignment 12/12: 146% Length 12, alignment 12/12: 168% Length 12, alignment 12/12: 156% Length 13, alignment 13/13: 167% Length 13, alignment 13/13: 167% Length 13, alignment 13/13: 173% Length 14, alignment 14/14: 167% Length 14, alignment 14/14: 168% Length 14, alignment 14/14: 168% Length 15, alignment 15/15: 168% Length 15, alignment 15/15: 173% Length 15, alignment 15/15: 173% Length 1, alignment 0/ 0: 134% Length 1, alignment 0/ 0: 127% Length 1, alignment 0/ 0: 119% Length 2, alignment 0/ 0: 94% Length 2, alignment 0/ 0: 94% Length 2, alignment 0/ 0: 106% Length 3, alignment 0/ 0: 82% Length 3, alignment 0/ 0: 87% Length 3, alignment 0/ 0: 82% Length 4, alignment 0/ 0: 115% Length 4, alignment 0/ 0: 115% Length 4, alignment 0/ 0: 122% Length 5, alignment 0/ 0: 127% Length 5, alignment 0/ 0: 119% Length 5, alignment 0/ 0: 127% Length 6, alignment 0/ 0: 103% Length 
6, alignment 0/ 0: 100% Length 6, alignment 0/ 0: 100% Length 7, alignment 0/ 0: 82% Length 7, alignment 0/ 0: 91% Length 7, alignment 0/ 0: 87% Length 8, alignment 0/ 0: 111% Length 8, alignment 0/ 0: 124% Length 8, alignment 0/ 0: 124% Length 9, alignment 0/ 0: 136% Length 9, alignment 0/ 0: 136% Length 9, alignment 0/ 0: 136% Length 10, alignment 0/ 0: 136% Length 10, alignment 0/ 0: 135% Length 10, alignment 0/ 0: 136% Length 11, alignment 0/ 0: 136% Length 11, alignment 0/ 0: 136% Length 11, alignment 0/ 0: 135% Length 12, alignment 0/ 0: 136% Length 12, alignment 0/ 0: 136% Length 12, alignment 0/ 0: 136% Length 13, alignment 0/ 0: 135% Length 13, alignment 0/ 0: 136% Length 13, alignment 0/ 0: 136% Length 14, alignment 0/ 0: 136% Length 14, alignment 0/ 0: 136% Length 14, alignment 0/ 0: 136% Length 15, alignment 0/ 0: 136% Length 15, alignment 0/ 0: 136% Length 15, alignment 0/ 0: 136% Length 4, alignment 0/ 0: 115% Length 4, alignment 0/ 0: 115% Length 4, alignment 0/ 0: 115% Length 32, alignment 0/ 0: 127% Length 32, alignment 7/ 2: 395% Length 32, alignment 0/ 0: 127% Length 32, alignment 0/ 0: 127% Length 8, alignment 0/ 0: 111% Length 8, alignment 0/ 0: 124% Length 8, alignment 0/ 0: 124% Length 64, alignment 0/ 0: 128% Length 64, alignment 6/ 4: 475% Length 64, alignment 0/ 0: 131% Length 64, alignment 0/ 0: 134% Length 16, alignment 0/ 0: 128% Length 16, alignment 0/ 0: 119% Length 16, alignment 0/ 0: 128% Length 128, alignment 0/ 0: 129% Length 128, alignment 5/ 6: 475% Length 128, alignment 0/ 0: 130% Length 128, alignment 0/ 0: 129% Length 32, alignment 0/ 0: 126% Length 32, alignment 0/ 0: 126% Length 32, alignment 0/ 0: 126% Length 256, alignment 0/ 0: 127% Length 256, alignment 4/ 8: 545% Length 256, alignment 0/ 0: 126% Length 256, alignment 0/ 0: 128% Length 64, alignment 0/ 0: 171% Length 64, alignment 0/ 0: 171% Length 64, alignment 0/ 0: 174% Length 512, alignment 0/ 0: 126% Length 512, alignment 3/10: 585% Length 512, alignment 0/ 0: 126% 
Length 512, alignment 0/ 0: 127% Length 128, alignment 0/ 0: 129% Length 128, alignment 0/ 0: 128% Length 128, alignment 0/ 0: 129% Length 1024, alignment 0/ 0: 125% Length 1024, alignment 2/12: 611% Length 1024, alignment 0/ 0: 126% Length 1024, alignment 0/ 0: 126% Length 256, alignment 0/ 0: 128% Length 256, alignment 0/ 0: 127% Length 256, alignment 0/ 0: 128% Length 2048, alignment 0/ 0: 125% Length 2048, alignment 1/14: 625% Length 2048, alignment 0/ 0: 125% Length 2048, alignment 0/ 0: 125% Length 512, alignment 0/ 0: 126% Length 512, alignment 0/ 0: 127% Length 512, alignment 0/ 0: 127% Length 4096, alignment 0/ 0: 125% Length 4096, alignment 0/16: 125% Length 4096, alignment 0/ 0: 125% Length 4096, alignment 0/ 0: 125% Length 1024, alignment 0/ 0: 126% Length 1024, alignment 0/ 0: 126% Length 1024, alignment 0/ 0: 126% Length 8192, alignment 0/ 0: 125% Length 8192, alignment 63/18: 636% Length 8192, alignment 0/ 0: 125% Length 8192, alignment 0/ 0: 125% Length 16, alignment 1/ 2: 317% Length 16, alignment 1/ 2: 317% Length 16, alignment 1/ 2: 317% Length 32, alignment 2/ 4: 395% Length 32, alignment 2/ 4: 395% Length 32, alignment 2/ 4: 398% Length 64, alignment 3/ 6: 475% Length 64, alignment 3/ 6: 475% Length 64, alignment 3/ 6: 477% Length 128, alignment 4/ 8: 479% Length 128, alignment 4/ 8: 479% Length 128, alignment 4/ 8: 479% Length 256, alignment 5/10: 543% Length 256, alignment 5/10: 539% Length 256, alignment 5/10: 543% Length 512, alignment 6/12: 585% Length 512, alignment 6/12: 585% Length 512, alignment 6/12: 585% Length 1024, alignment 7/14: 611% Length 1024, alignment 7/14: 611% Length 1024, alignment 7/14: 611% The performance measured on the bionic-benchmarks on a hikey board with a new benchmark for unaligned memcmp submitted for review at https://android-review.googlesource.com/414860 The base is with the libc from /system/lib64. The bionic libc with this patch is in /data. 
hikey:/data # export LD_LIBRARY_PATH=/system/lib64
hikey:/data # ./bionic-benchmarks --benchmark_filter=BM_string_memcmp
Run on (8 X 2.4 MHz CPU s)
Benchmark                              Time          CPU   Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                    30 ns        30 ns     22955680   251.07MB/s
BM_string_memcmp/64                   57 ns        57 ns     12349184   1076.99MB/s
BM_string_memcmp/512                 305 ns       305 ns      2297163   1.56496GB/s
BM_string_memcmp/1024                571 ns       571 ns      1225211   1.66912GB/s
BM_string_memcmp/8k                 4307 ns      4306 ns       162562   1.77177GB/s
BM_string_memcmp/16k                8676 ns      8675 ns        80676   1.75887GB/s
BM_string_memcmp/32k               19233 ns     19230 ns        36394   1.58695GB/s
BM_string_memcmp/64k               36986 ns     36984 ns        18952   1.65029GB/s
BM_string_memcmp_aligned/8           199 ns       199 ns      3519166   38.3336MB/s
BM_string_memcmp_aligned/64          386 ns       386 ns      1810734   158.073MB/s
BM_string_memcmp_aligned/512        1735 ns      1734 ns       403981   281.525MB/s
BM_string_memcmp_aligned/1024       3200 ns      3200 ns       218838   305.151MB/s
BM_string_memcmp_aligned/8k        25084 ns     25080 ns        28180   311.507MB/s
BM_string_memcmp_aligned/16k       51730 ns     51729 ns        13521   302.057MB/s
BM_string_memcmp_aligned/32k      103228 ns    103228 ns         6782   302.727MB/s
BM_string_memcmp_aligned/64k      207117 ns    207087 ns         3450   301.806MB/s
BM_string_memcmp_unaligned/8         339 ns       339 ns      2070998   22.5302MB/s
BM_string_memcmp_unaligned/64       1392 ns      1392 ns       502796   43.8454MB/s
BM_string_memcmp_unaligned/512      9194 ns      9194 ns        76133   53.1104MB/s
BM_string_memcmp_unaligned/1024    18325 ns     18323 ns        38206   53.2963MB/s
BM_string_memcmp_unaligned/8k     148579 ns    148574 ns         4713   52.5831MB/s
BM_string_memcmp_unaligned/16k    298169 ns    298120 ns         2344   52.4118MB/s
BM_string_memcmp_unaligned/32k    598813 ns    598797 ns         1085   52.188MB/s
BM_string_memcmp_unaligned/64k   1196079 ns   1196083 ns          540   52.2539MB/s

hikey:/data # export LD_LIBRARY_PATH=/data
hikey:/data # ./bionic-benchmarks --benchmark_filter=BM_string_memcmp
Benchmark                              Time          CPU   Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                    27 ns        27 ns     26198166   286.069MB/s
BM_string_memcmp/64                   45 ns        45 ns     15553753   1.32443GB/s
BM_string_memcmp/512                 242 ns       242 ns      2892423   1.97049GB/s
BM_string_memcmp/1024                455 ns       455 ns      1537290   2.09436GB/s
BM_string_memcmp/8k                 3446 ns      3446 ns       203295   2.21392GB/s
BM_string_memcmp/16k                7567 ns      7567 ns        92582   2.01657GB/s
BM_string_memcmp/32k               16081 ns     16081 ns        43524   1.8977GB/s
BM_string_memcmp/64k               31029 ns     31028 ns        22565   1.96712GB/s
BM_string_memcmp_aligned/8           184 ns       184 ns      3800912   41.3654MB/s
BM_string_memcmp_aligned/64          287 ns       287 ns      2438835   212.65MB/s
BM_string_memcmp_aligned/512        1370 ns      1370 ns       511014   356.498MB/s
BM_string_memcmp_aligned/1024       2543 ns      2543 ns       275253   384.006MB/s
BM_string_memcmp_aligned/8k        20413 ns     20411 ns        34306   382.764MB/s
BM_string_memcmp_aligned/16k       42908 ns     42907 ns        16132   364.158MB/s
BM_string_memcmp_aligned/32k       88902 ns     88886 ns         8087   351.574MB/s
BM_string_memcmp_aligned/64k      173016 ns    173007 ns         4122   361.258MB/s
BM_string_memcmp_unaligned/8         212 ns       212 ns      3304163   36.0243MB/s
BM_string_memcmp_unaligned/64        361 ns       361 ns      1941597   169.279MB/s
BM_string_memcmp_unaligned/512      1754 ns      1753 ns       399210   278.492MB/s
BM_string_memcmp_unaligned/1024     3308 ns      3308 ns       211622   295.243MB/s
BM_string_memcmp_unaligned/8k      27227 ns     27225 ns        25637   286.964MB/s
BM_string_memcmp_unaligned/16k     55877 ns     55874 ns        12455   279.645MB/s
BM_string_memcmp_unaligned/32k    112397 ns    112366 ns         6200   278.11MB/s
BM_string_memcmp_unaligned/64k    223493 ns    223482 ns         3127   279.665MB/s

Test: bionic-benchmarks --benchmark_filter='BM_string_memcmp*'
Change-Id: Ia16a8cf69c68b8c0533f025f03b925c9883bb708 |