Commit graph

212 commits

Author SHA1 Message Date
Evgenii Stepanov
3031a7e45e memtag_stack: vfork and longjmp support.
With memtag_stack, each function is responsible for cleaning up
allocation tags for its stack frame. Allocation tags for anything below
SP must match the address tag in SP.

Both vfork and longjmp implement non-local control transfer which
abandons part of the stack without proper cleanup. Update allocation
tags:
* For longjmp, we know both source and destination values of SP.
* For vfork, save the value of SP before exit() or exec*() - the only
  valid ways of ending the child process according to POSIX - and reset
  tags from there to SP-in-parent.

This is not 100% solid and can be confused by a number of hopefully
uncommon conditions:
* Segmented stacks.
* Longjmp from sigaltstack into the main stack.
* Some kind of userspace thread implementation using longjmp (that's UB,
  longjmp can only return to the caller on the current stack).
* and other strange things.

This change adds a sanity limit on the size of the tag cleanup. Also,
this logic is only activated in the binaries that carry the
NT_MEMTAG_STACK note (set by -fsanitize=memtag-stack) which is meant as
a debugging configuration, is not compatible with pre-armv9 CPUs, and
should not be set on production code.

Bug: b/174878242
Test: fvp_mini with ToT LLVM (more test in a separate change)

Change-Id: Ibef8b2fc5a6ce85c8e562dead1019964d9f6b80b
2022-05-27 13:19:34 -07:00
Mitch Phillips
93400371f7 [NFCI] Change Android's NT_TYPE to NT_ANDROID_TYPE.
Normally, platform-specific note types in the toolchain are prefixed
with the platform name. Because we're exposing the NT_TYPE_MEMTAG and
synthesizing the note in the toolchain in an upcoming patch
(https://reviews.llvm.org/D118948), it's been requested that we change
the name to include the platform prefix.

While NT_TYPE_IDENT and NT_TYPE_KUSER aren't known about or synthesized
by the toolchain, update those references as well for consistency.

Bug: N/A
Test: Build Android
Change-Id: I7742e4917ae275d59d7984991664ea48028053a1
2022-02-07 13:49:20 -08:00
Elliott Hughes
c0d41db92e setjmp/longjmp: avoid invalid values in the stack pointer.
arm64 was already being careful, but x86/x86-64 and 32-bit ARM could be
caught by a signal in a state where the stack pointer was mangled.

For 32-bit ARM I've taken care with the link register too, to avoid
potential issues with unwinding.

Bug: http://b/152210274
Test: treehugger
Change-Id: I1ce285b017a633c732dbe04743368f4cae27af85
2021-04-05 17:43:36 -07:00
Elliott Hughes
3e1d5563b6 PAC/BTI: no need to keep using hint.
The toolchain is new enough that should be able to use the actual
instructions now...

Test: treehugger
Change-Id: I30aafcdc5386268344c40dc6cc9a22caf591915a
2021-01-25 08:49:01 -08:00
Peter Collingbourne
7e20117a36 Remove ANDROID_EXPERIMENTAL_MTE.
Now that the feature guarded by this flag has landed in Linux 5.10
we no longer need the flag, so we can remove it.

Bug: 135772972
Change-Id: I02fa50848cbd0486c23c8a229bb8f1ab5dd5a56f
2021-01-11 10:55:51 -08:00
Evgenii Stepanov
8564b8d9e6 Use ELF notes to set the desired memory tagging level.
Use a note in executables to specify
(none|sync|async) heap tagging level. To be extended with (heap x stack x
globals) in the future. A missing note disables all tagging.

Bug: b/135772972
Test: bionic-unit-tests (in a future change)

Change-Id: Iab145a922c7abe24cdce17323f9e0c1063cc1321
2021-01-06 16:08:18 -08:00
Tamas Petz
f5bdee7fdf libc: Add Armv8.3-A PAuth and Armv8.5-A BTI compatibility to *.S
The most notable change is in sigsetjmp/siglongjmp. The former
stores LR signed with the current SP into jmp_buf. Calling siglongjmp
reads a signed LR and the corresponding SP from jmp_buf. This way not
only the checksum provides some means of integrity protection but
Pointer Authentication too.

Test: Tested on FVP with BTI enabled.

Change-Id: I9d720239775f8d2829a677901f546c4b14b5cbe5
2020-09-04 11:29:12 +02:00
Peter Collingbourne
2361d4ef80 Adopt remaining MTE string routines.
ARM has released the remaining MTE string routines, so let's start
using them. The strnlen implementation is now compatible with MTE,
so it no longer needs to be an ifunc.

Bug: 135772972
Change-Id: I9de7fb44447aa1b878f4ad3f62cb0129857b43ad
2020-06-11 08:52:26 -07:00
Josh Gao
2303283740 Track whether a thread is currently vforked.
Our various fd debugging facilities get extremely confused by a vforked
process closing file descriptors in preparation to exec: fdsan can
abort, and fdtrack will delete backtraces for any file descriptors that
get closed. Keep track of whether we're in a vforked child in order to
be able to detect this.

Bug: http://b/153926671
Test: 32/64-bit bionic-unit-tests on blueline, x86_64 emulator
Change-Id: I8a082fd06bfdfef0e2a88dbce350b6f667f7df9f
2020-05-07 19:44:27 -07:00
Peter Collingbourne
337a5b3f9a Switch to the arm-optimized-routines string routines on aarch64 where possible.
This includes optimized strrchr and strchrnul routines, and an MTE-compatible
strlen routine.

Bug: 135772972
Change-Id: I48499f757cdc6d3e77e5649123d45b17dfa3c6b0
2020-02-25 13:11:55 -08:00
Peter Collingbourne
900d07d6a1 Add arm64 string.h function implementations for use with hardware supporting MTE.
As it turns out, our "generic" arm64 implementations of certain string.h
functions are not actually generic, since they will eagerly read memory
possibly outside of the bounds of an MTE granule, which may lead to a segfault
on MTE-enabled hardware. Therefore, move the implementations into a "default"
directory and use ifuncs to select between them and a new set of "mte"
implementations, conditional on whether the hardware and kernel support MTE.

The MTE implementations are currently naive implementations written in C
but will later be replaced with a set of optimized assembly implementations.

Bug: 135772972
Change-Id: Ife37c4e0e6fd60ff20a34594cc09c541af4d1dd7
2019-10-29 16:18:31 -07:00
Christopher Ferris
b8a95e2186 Update to kernel headers v5.3.2.
Test: Builds and run unit tests on taimen/cuttlefish.
Change-Id: I6ebd8f179d159ac974555e8edca588083e8081b3
2019-10-03 10:59:32 -07:00
Christopher Ferris
c5d3a4348a Make tls related header files platform accessible.
There are places in frameworks and art code that directly included
private bionic header files. Move these files to the new platform
include files.

This change also moves the __get_tls.h header file to tls.h and includes
the tls defines header so that there is a single header that platform
code can use to get __get_tls and the defines.

Also, simplify the visibility rules for platform includes.

Bug: 141560639

Test: Builds and bionic unit tests pass.
Change-Id: I9e5e9c33fe8a85260f69823468bc9d340ab7a1f9
Merged-In: I9e5e9c33fe8a85260f69823468bc9d340ab7a1f9
(cherry picked from commit 44631c919a)
2019-09-27 12:14:24 -07:00
Elliott Hughes
782c485880 Generate assembler system call stubs via genrule.
There's no need to check in generated code.

Test: builds & boots
Change-Id: Ife368bca4349d4adeb0666db590356196b4fbd63
2019-04-16 12:31:00 -07:00
Elliott Hughes
d67b03734d libc: generate syscall stubs in one big file...
...all the better to switch to a genrule rather than checking in
generated source.

This also removes all the code in the script to deal with git,
rather than fix it. We won't need that where we're going.

Test: boots
Change-Id: I468ce019d4232a7ef27e5cb5cfd89f4c2fe4ecbd
2019-04-16 00:54:11 +00:00
Evgenii Stepanov
505168e530 Annotate vfork for hwasan.
Call a hwasan hook in the parent return path for vfork() to let hwasan
update its shadow. See https://github.com/google/sanitizers/issues/925
for more details.

Bug: 112438058
Test: bionic-unit-tests
Change-Id: I9a06800962913e822bd66e072012d0a2c5be453d
2019-03-19 23:36:44 +00:00
Ryan Prichard
82aea78136 Use TLS_SLOT_THREAD_ID macro in vfork.S
No functional change intended.

Bug: none
Test: bionic unit tests
Change-Id: I7ee0a2b3f0e3807abe88bfa34ef3cd56c150a8f6
2019-01-16 01:11:26 -08:00
Haibo Huang
3927db1d5b Remove denver64 from libc
Test: compile
Change-Id: Ifcbe15c1682b4e1e18835e38915b2421196882f7
2018-11-30 22:28:39 +00:00
Peter Collingbourne
734beec3d4 Allocate a small guard region around the shadow call stack.
This lets us do two things:

1) Make setjmp and longjmp compatible with shadow call stack.
   To avoid leaking the shadow call stack address into memory, only the
   lower log2(SCS_SIZE) bits of x18 are stored to jmp_buf. This requires
   allocating an additional guard page so that we're guaranteed to be
   able to allocate a sufficiently aligned SCS.

2) SCS overflow detection. Overflows now result in a SIGSEGV instead
   of corrupting the allocation that comes after it.

Change-Id: I04d6634f96162bf625684672a87fba8b402b7fd1
Test: bionic-unit-tests
2018-11-16 14:37:08 -08:00
Treehugger Robot
a2a114ba26 Merge "Annotate siglongjmp for HWASan." 2018-09-06 21:35:09 +00:00
Evgenii Stepanov
b16e9ce7b8 Annotate siglongjmp for HWASan.
HWASan needs to re-tag the newly unallocated stack space to match SP.

Bug: 112438058
Test: SANITIZE_TARGET=hwaddress

Change-Id: I4dddef542d802d63bdea59e32a03425a2c4f870b
2018-09-05 13:37:14 -07:00
Evgenii Stepanov
00d087c629 (arm64) Extend branch range in __memcpy_chk.
Conditional branch has limited range (1MB) and can not be extended by
the linker. The current distance (in walleye build) is 500KB, about
half of the maximum. HWASan pushes it over the limit.

Replace conditional branch with regular branch, which has longer
range (26 vs 19 bits offset) and can be extended in the linker if
needed.

Bug: 112437884
Bug: 12231437
Test: SANITIZE_TARGET=hwaddress

Change-Id: Idc083fb557ab3a859541beb009809992406a6703
2018-08-31 15:02:12 -07:00
Adhemerval Zanella
65a6211f4d [AArch64] Improve strncmp for mutually misaligned inputs
This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. The mutually misaligned inputs on aarch64 are
compared with a simple byte copy, which is not very efficient.
This patch enhances the comparison similar to strcmp by loading a
double-word at a time.

Comparison on the default bionic and proposed optimized routines
shows the following performance improvements on A54 (using the
new proposed memcmp input data from test_strncmp.xml):

  - No noticeable change on aligned inputs or with same alignment.

  - Large improvements on unaligned inputs from sizes larger than
    16 bytes.

Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
BM_string_strncmp/1/0/0              -0.0954         -0.0954            19            17            19            17
BM_string_strncmp/2/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/3/0/0              +0.1768         +0.1768            15            18            15            18
BM_string_strncmp/4/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/5/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/6/0/0              +0.1589         +0.1589            15            18            15            18
BM_string_strncmp/7/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/8/0/0              -0.0998         -0.0998            19            17            19            17
BM_string_strncmp/9/0/0              -0.0277         -0.0277            23            22            23            22
BM_string_strncmp/10/0/0             -0.0270         -0.0270            23            22            23            22
BM_string_strncmp/11/0/0             -0.0331         -0.0331            23            22            23            22
BM_string_strncmp/12/0/0             -0.0270         -0.0270            23            22            23            22
BM_string_strncmp/13/0/0             -0.0284         -0.0284            23            22            23            22
BM_string_strncmp/14/0/0             +0.1042         +0.1042            20            22            20            22
BM_string_strncmp/15/0/0             -0.0277         -0.0277            23            22            23            22
BM_string_strncmp/16/0/0             +0.0214         +0.0215            22            22            22            22
BM_string_strncmp/24/0/0             -0.1291         -0.1291            24            21            24            21
BM_string_strncmp/32/0/0             -0.0470         -0.0470            27            26            27            26
BM_string_strncmp/40/0/0             -0.0433         -0.0433            29            28            29            28
BM_string_strncmp/48/0/0             -0.0301         -0.0301            31            30            31            30
BM_string_strncmp/56/0/0             -0.0800         -0.0800            33            31            33            31
BM_string_strncmp/64/0/0             +0.0188         +0.0188            34            34            34            34
BM_string_strncmp/72/0/0             -0.0334         -0.0334            38            37            38            37
BM_string_strncmp/80/0/0             -0.0000         -0.0000            40            40            40            40
BM_string_strncmp/88/0/0             +0.0413         +0.0413            61            64            61            64
BM_string_strncmp/96/0/0             -0.0215         -0.0216            69            67            69            67
BM_string_strncmp/104/0/0            -0.0208         -0.0208            72            70            72            70
BM_string_strncmp/112/0/0            -0.0173         -0.0173            75            74            75            74
BM_string_strncmp/120/0/0            -0.0166         -0.0166            78            77            78            77
BM_string_strncmp/128/0/0            -0.0158         -0.0158            81            80            81            80
BM_string_strncmp/136/0/0            -0.0149         -0.0149            84            83            84            83
BM_string_strncmp/144/0/0            -0.0201         -0.0201            88            86            88            86
BM_string_strncmp/160/0/0            -0.0136         -0.0136            94            93            94            93
BM_string_strncmp/176/0/0            +0.0224         +0.0224            96            98            96            98
BM_string_strncmp/192/0/0            +0.0289         +0.0289           102           105           102           105
BM_string_strncmp/208/0/0            +0.0101         +0.0101           111           112           111           112
BM_string_strncmp/224/0/0            -0.0107         -0.0107           119           118           119           118
BM_string_strncmp/240/0/0            -0.0088         -0.0088           126           125           126           125
BM_string_strncmp/256/0/0            -0.0101         -0.0101           132           131           132           131
BM_string_strncmp/512/0/0            -0.0056         -0.0056           235           233           235           233
BM_string_strncmp/1024/0/0           -0.0030         -0.0030           439           437           439           437
BM_string_strncmp/8192/0/0           -0.0431         -0.0431          3799          3635          3799          3635
BM_string_strncmp/16384/0/0          -0.0069         -0.0069          6778          6732          6779          6732
BM_string_strncmp/32768/0/0          -0.0001         -0.0002         13405         13403         13405         13403
BM_string_strncmp/65536/0/0          +0.0005         +0.0005         26968         26981         26968         26981
BM_string_strncmp/131072/0/0         -0.0057         -0.0057         53959         53650         53958         53650
BM_string_strncmp/1/4/0              -0.1352         -0.1352            12            10            12            10
BM_string_strncmp/2/4/0              +0.0020         +0.0020            15            15            15            15
BM_string_strncmp/3/4/0              -0.1560         -0.1560            20            17            20            17
BM_string_strncmp/4/4/0              +0.0296         +0.0296            22            22            22            22
BM_string_strncmp/5/4/0              +0.0573         +0.0573            22            23            22            23
BM_string_strncmp/6/4/0              -0.0340         -0.0340            25            24            25            24
BM_string_strncmp/7/4/0              +0.0185         +0.0185            26            26            26            26
BM_string_strncmp/8/4/0              -0.0050         -0.0050            27            27            27            27
BM_string_strncmp/9/4/0              -0.1294         -0.1294            28            24            28            24
BM_string_strncmp/10/4/0             +0.0109         +0.0109            29            29            29            29
BM_string_strncmp/11/4/0             -0.0000         -0.0001            30            30            30            30
BM_string_strncmp/12/4/0             +0.0055         +0.0055            50            50            50            50
BM_string_strncmp/13/4/0             -0.0249         -0.0249            51            50            51            50
BM_string_strncmp/14/4/0             -0.0289         -0.0289            53            52            53            52
BM_string_strncmp/15/4/0             -0.0205         -0.0205            55            54            55            54
BM_string_strncmp/16/4/0             -0.4616         -0.4616            57            31            57            31
BM_string_strncmp/24/4/0             -0.4871         -0.4871            72            37            72            37
BM_string_strncmp/32/4/0             -0.5549         -0.5549            87            39            87            39
BM_string_strncmp/40/4/0             -0.5964         -0.5964           103            42           103            42
BM_string_strncmp/48/4/0             -0.6647         -0.6647           118            40           118            40
BM_string_strncmp/56/4/0             -0.6551         -0.6551           134            46           134            46
BM_string_strncmp/64/4/0             -0.6609         -0.6609           145            49           145            49
BM_string_strncmp/72/4/0             -0.5709         -0.5710           164            70           164            70
BM_string_strncmp/80/4/0             -0.5929         -0.5929           180            73           180            73
BM_string_strncmp/88/4/0             -0.6051         -0.6051           195            77           195            77
BM_string_strncmp/96/4/0             -0.6160         -0.6160           210            81           210            81
BM_string_strncmp/104/4/0            -0.6199         -0.6199           223            85           223            85
BM_string_strncmp/112/4/0            -0.6293         -0.6293           240            89           240            89
BM_string_strncmp/120/4/0            -0.6439         -0.6439           255            91           255            91
BM_string_strncmp/128/4/0            -0.6493         -0.6493           271            95           271            95
BM_string_strncmp/136/4/0            -0.6704         -0.6704           287            95           287            95
BM_string_strncmp/144/4/0            -0.6744         -0.6744           302            98           302            98
BM_string_strncmp/160/4/0            -0.6700         -0.6700           333           110           333           110
BM_string_strncmp/176/4/0            -0.6821         -0.6821           364           116           364           116
BM_string_strncmp/192/4/0            -0.6887         -0.6887           394           123           394           123
BM_string_strncmp/208/4/0            -0.6949         -0.6949           425           130           425           130
BM_string_strncmp/224/4/0            -0.7069         -0.7069           456           134           456           134
BM_string_strncmp/240/4/0            -0.7042         -0.7042           486           144           486           144
BM_string_strncmp/256/4/0            -0.7043         -0.7043           514           152           514           152
BM_string_strncmp/1/0/4              +0.0227         +0.0227            14            14            14            14
BM_string_strncmp/2/0/4              +0.0442         +0.0442            15            16            15            16
BM_string_strncmp/3/0/4              +0.5829         +0.5829            17            27            17            27
BM_string_strncmp/4/0/4              -0.1593         -0.1593            22            19            22            19
BM_string_strncmp/5/0/4              -0.0516         -0.0516            23            22            23            22
BM_string_strncmp/6/0/4              -0.1684         -0.1684            25            20            25            20
BM_string_strncmp/7/0/4              +0.0170         +0.0170            26            26            26            26
BM_string_strncmp/8/0/4              +0.0006         +0.0006            27            27            27            27
BM_string_strncmp/9/0/4              +0.1272         +0.1272            25            28            25            28
BM_string_strncmp/10/0/4             +0.0108         +0.0108            29            29            29            29
BM_string_strncmp/11/0/4             -0.0001         -0.0001            30            30            30            30
BM_string_strncmp/12/0/4             -0.3557         -0.3557            50            32            50            32
BM_string_strncmp/13/0/4             -0.3370         -0.3370            51            34            51            34
BM_string_strncmp/14/0/4             -0.3444         -0.3444            53            35            53            35
BM_string_strncmp/15/0/4             +0.0946         +0.0946            51            56            51            56
BM_string_strncmp/16/0/4             -0.5203         -0.5203            53            25            53            25
BM_string_strncmp/24/0/4             -0.6109         -0.6109            72            28            72            28
BM_string_strncmp/32/0/4             -0.6934         -0.6934            88            27            88            27
BM_string_strncmp/40/0/4             -0.6833         -0.6833           103            33           103            33
BM_string_strncmp/48/0/4             -0.6973         -0.6973           118            36           118            36
BM_string_strncmp/56/0/4             -0.7116         -0.7116           134            39           134            39
BM_string_strncmp/64/0/4             -0.6017         -0.6018           149            59           149            59
BM_string_strncmp/72/0/4             -0.6268         -0.6268           164            61           164            61
BM_string_strncmp/80/0/4             -0.6409         -0.6409           179            64           179            64
BM_string_strncmp/88/0/4             -0.6465         -0.6465           195            69           195            69
BM_string_strncmp/96/0/4             -0.6551         -0.6551           210            72           210            72
BM_string_strncmp/104/0/4            -0.6662         -0.6662           227            76           227            76
BM_string_strncmp/112/0/4            -0.6700         -0.6700           240            79           240            79
BM_string_strncmp/120/0/4            -0.6740         -0.6740           256            83           256            83
BM_string_strncmp/128/0/4            -0.6862         -0.6862           271            85           271            85
BM_string_strncmp/136/0/4            -0.6883         -0.6883           287            89           287            89
BM_string_strncmp/144/0/4            -0.7031         -0.7031           297            88           297            88
BM_string_strncmp/160/0/4            -0.6985         -0.6985           333           100           333           100
BM_string_strncmp/176/0/4            -0.7082         -0.7082           364           106           364           106
BM_string_strncmp/192/0/4            -0.7223         -0.7223           396           110           396           110
BM_string_strncmp/208/0/4            -0.7135         -0.7135           421           121           421           121
BM_string_strncmp/224/0/4            -0.7194         -0.7194           455           128           455           128
BM_string_strncmp/240/0/4            -0.7233         -0.7233           487           135           487           135
BM_string_strncmp/256/0/4            -0.7239         -0.7239           516           143           516           143
BM_string_strncmp/1/4/4              +0.0224         +0.0225            21            22            21            22
BM_string_strncmp/2/4/4              -0.0001         -0.0001            22            22            22            22
BM_string_strncmp/3/4/4              -0.0001         -0.0001            22            22            22            22
BM_string_strncmp/4/4/4              -0.0435         -0.0435            22            21            22            21
BM_string_strncmp/5/4/4              -0.0118         -0.0118            27            27            27            27
BM_string_strncmp/6/4/4              -0.0118         -0.0118            27            27            27            27
BM_string_strncmp/7/4/4              -0.0117         -0.0117            27            27            27            27
BM_string_strncmp/8/4/4              -0.0118         -0.0118            27            27            27            27
BM_string_strncmp/9/4/4              -0.0117         -0.0117            27            27            27            27
BM_string_strncmp/10/4/4             +0.1447         +0.1447            23            27            23            27
BM_string_strncmp/11/4/4             -0.0062         -0.0062            27            27            27            27
BM_string_strncmp/12/4/4             -0.0454         -0.0454            28            27            28            27
BM_string_strncmp/13/4/4             -0.1507         -0.1507            29            24            29            24
BM_string_strncmp/14/4/4             -0.0003         -0.0003            29            29            29            29
BM_string_strncmp/15/4/4             -0.0002         -0.0003            29            29            29            29
BM_string_strncmp/16/4/4             +0.0047         +0.0047            29            29            29            29
BM_string_strncmp/24/4/4             -0.0104         -0.0104            31            30            31            30
BM_string_strncmp/32/4/4             -0.0290         -0.0290            33            32            33            32
BM_string_strncmp/40/4/4             -0.0189         -0.0189            34            33            34            33
BM_string_strncmp/48/4/4             -0.0059         -0.0059            36            36            36            36
BM_string_strncmp/56/4/4             +0.0000         +0.0000            39            39            39            39
BM_string_strncmp/64/4/4             +0.0000         +0.0000            42            42            42            42
BM_string_strncmp/72/4/4             +0.0000         +0.0000            45            45            45            45
BM_string_strncmp/80/4/4             +0.0391         +0.0392            65            68            65            68
BM_string_strncmp/88/4/4             -0.0090         -0.0090            71            70            71            70
BM_string_strncmp/96/4/4             -0.0034         -0.0034            74            74            74            74
BM_string_strncmp/104/4/4            -0.0482         -0.0482            77            73            77            73
BM_string_strncmp/112/4/4            +0.0387         +0.0387            77            80            77            80
BM_string_strncmp/120/4/4            -0.0072         -0.0073            84            83            84            83
BM_string_strncmp/128/4/4            -0.0071         -0.0071            87            86            87            86
BM_string_strncmp/136/4/4            +0.0366         +0.0366            86            89            86            89
BM_string_strncmp/144/4/4            -0.0068         -0.0068            93            93            93            93
BM_string_strncmp/160/4/4            -0.0064         -0.0064           100            99           100            99
BM_string_strncmp/176/4/4            -0.0063         -0.0063           106           105           106           105
BM_string_strncmp/192/4/4            -0.0012         -0.0012           112           112           112           112
BM_string_strncmp/208/4/4            -0.0098         -0.0098           119           118           119           118
BM_string_strncmp/224/4/4            -0.0050         -0.0050           125           125           125           125
BM_string_strncmp/240/4/4            -0.0060         -0.0060           132           131           132           131
BM_string_strncmp/256/4/4            -0.0046         -0.0046           138           137           138           137

[1] Commit id: 26cc4faec37a55529e5d0a39949f7b6ec81008f9

Test: bionic tests and benchmarks on aarch64.
Change-Id: Ied579d2044b4092fc95fad486af6541d1eb71dc3
2018-08-21 19:50:09 +00:00
Adhemerval Zanella
b42ff1b5c3 [AArch64] Improve strcmp performance for misaligned strings
This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. Replace the simple byte-wise compare in the
misaligned case with a dword compare with page boundary checks in
place. For simplicity its uses a 4K page boundary so that it does not
have to query the actual page size on the system.

Comparison on the default bionic and proposed optimized routines
shows the following performance improvements on A64 (using the
new proposed memcmp input data from test_strcmp.xml):

  - Small improvement for aligned arguments with sizes up to 56 bytes
    (from 10% to 20%).

  - Large improvements for unaligned arguments for small sizes (from
    3 to 256 bytes).

Benchmark                              Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------
BM_string_strcmp/1/0/0              +0.0034         +0.0034            11            11            11            11
BM_string_strcmp/2/0/0              +0.0000         +0.0000            11            11            11            11
BM_string_strcmp/3/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/4/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/5/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/6/0/0              -0.1719         -0.1719            11             9            11             9
BM_string_strcmp/7/0/0              -0.1724         -0.1724            11             9            11             9
BM_string_strcmp/8/0/0              -0.1718         -0.1718            11             9            11             9
BM_string_strcmp/9/0/0              -0.2008         -0.2008            16            13            16            13
BM_string_strcmp/10/0/0             -0.2008         -0.2008            16            13            16            13
BM_string_strcmp/11/0/0             -0.2040         -0.2040            16            13            16            13
BM_string_strcmp/12/0/0             -0.1991         -0.1991            16            13            16            13
BM_string_strcmp/13/0/0             -0.1997         -0.1997            16            13            16            13
BM_string_strcmp/14/0/0             -0.1988         -0.1989            16            13            16            13
BM_string_strcmp/15/0/0             -0.2006         -0.2006            16            13            16            13
BM_string_strcmp/16/0/0             -0.2043         -0.2043            16            13            16            13
BM_string_strcmp/24/0/0             -0.1927         -0.1927            18            15            18            15
BM_string_strcmp/32/0/0             -0.1743         -0.1743            20            17            20            17
BM_string_strcmp/40/0/0             -0.1427         -0.1427            22            19            22            19
BM_string_strcmp/48/0/0             -0.1053         -0.1053            24            22            24            22
BM_string_strcmp/56/0/0             -0.0805         -0.0805            26            24            26            24
BM_string_strcmp/64/0/0             -0.0454         -0.0454            28            27            28            27
BM_string_strcmp/72/0/0             -0.0303         -0.0303            30            29            30            29
BM_string_strcmp/80/0/0             -0.0111         -0.0111            32            32            32            32
BM_string_strcmp/88/0/0             -0.0004         -0.0004            34            34            34            34
BM_string_strcmp/96/0/0             -0.0058         -0.0058            37            37            37            37
BM_string_strcmp/104/0/0            +0.0000         +0.0000            40            40            40            40
BM_string_strcmp/112/0/0            -0.0457         -0.0457            61            58            61            58
BM_string_strcmp/120/0/0            -0.0486         -0.0487            61            58            61            58
BM_string_strcmp/128/0/0            -0.0499         -0.0499            64            61            64            61
BM_string_strcmp/136/0/0            -0.0529         -0.0529            66            63            66            63
BM_string_strcmp/144/0/0            -0.0492         -0.0492            69            66            69            66
BM_string_strcmp/160/0/0            -0.0459         -0.0459            74            71            74            71
BM_string_strcmp/176/0/0            -0.0400         -0.0401            79            76            79            76
BM_string_strcmp/192/0/0            -0.0378         -0.0378            85            81            85            81
BM_string_strcmp/208/0/0            -0.0009         -0.0009            89            89            89            89
BM_string_strcmp/224/0/0            -0.0003         -0.0003            95            95            95            95
BM_string_strcmp/240/0/0            -0.0320         -0.0320           100            96           100            96
BM_string_strcmp/256/0/0            -0.0303         -0.0304           105           102           105           102
BM_string_strcmp/512/0/0            -0.0171         -0.0171           187           183           187           183
BM_string_strcmp/1024/0/0           -0.0091         -0.0091           350           347           350           347
BM_string_strcmp/8192/0/0           -0.0030         -0.0031          2668          2660          2668          2660
BM_string_strcmp/16384/0/0          +0.0007         +0.0007          5449          5452          5448          5452
BM_string_strcmp/32768/0/0          +0.0635         +0.0635         10868         11558         10867         11557
BM_string_strcmp/65536/0/0          -0.0017         -0.0017         21824         21786         21822         21784
BM_string_strcmp/131072/0/0         +0.0012         +0.0012         43485         43536         43480         43532
BM_string_strcmp/1/4/0              +0.7630         +0.7630             7            12             7            12
BM_string_strcmp/2/4/0              +0.9265         +0.9265            12            23            12            23
BM_string_strcmp/3/4/0              -0.0000         -0.0000            14            14            14            14
BM_string_strcmp/4/4/0              +0.0372         +0.0372            19            19            19            19
BM_string_strcmp/6/4/0              -0.0921         -0.0921            20            19            20            19
BM_string_strcmp/7/4/0              -0.0291         -0.0291            19            19            19            19
BM_string_strcmp/8/4/0              +0.0648         +0.0648            20            22            20            22
BM_string_strcmp/9/4/0              +0.0001         -0.0055            22            22            22            22
BM_string_strcmp/10/4/0             -0.1924         -0.1924            23            19            23            19
BM_string_strcmp/11/4/0             -0.2347         -0.2347            24            19            24            19
BM_string_strcmp/12/4/0             -0.2738         -0.2739            26            19            26            19
BM_string_strcmp/13/4/0             -0.3804         -0.3804            42            26            42            26
BM_string_strcmp/14/4/0             -0.3581         -0.3582            41            26            41            26
BM_string_strcmp/15/4/0             -0.3905         -0.3905            43            26            43            26
BM_string_strcmp/16/4/0             -0.4068         -0.4068            44            26            44            26
BM_string_strcmp/24/4/0             -0.4917         -0.4917            57            29            57            29
BM_string_strcmp/32/4/0             -0.5607         -0.5607            70            31            70            31
BM_string_strcmp/40/4/0             -0.5940         -0.5940            82            33            82            33
BM_string_strcmp/48/4/0             -0.5303         -0.5302            95            45            95            45
BM_string_strcmp/56/4/0             -0.4975         -0.4975           108            54           108            54
BM_string_strcmp/64/4/0             -0.5167         -0.5167           121            58           121            58
BM_string_strcmp/72/4/0             -0.5325         -0.5325           133            62           133            62
BM_string_strcmp/80/4/0             -0.5523         -0.5523           146            65           146            65
BM_string_strcmp/88/4/0             -0.5686         -0.5686           159            69           159            69
BM_string_strcmp/96/4/0             -0.5815         -0.5815           172            72           172            72
BM_string_strcmp/104/4/0            -0.5931         -0.5931           185            75           185            75
BM_string_strcmp/112/4/0            -0.6046         -0.6046           197            78           197            78
BM_string_strcmp/120/4/0            -0.6113         -0.6113           210            82           210            82
BM_string_strcmp/128/4/0            -0.6186         -0.6186           223            85           223            85
BM_string_strcmp/136/4/0            -0.6278         -0.6278           237            88           237            88
BM_string_strcmp/144/4/0            -0.6410         -0.6410           253            91           253            91
BM_string_strcmp/160/4/0            -0.6506         -0.6506           280            98           280            98
BM_string_strcmp/176/4/0            -0.6593         -0.6593           304           104           304           104
BM_string_strcmp/192/4/0            -0.6647         -0.6647           330           111           330           111
BM_string_strcmp/208/4/0            -0.6741         -0.6741           357           116           357           116
BM_string_strcmp/224/4/0            -0.6761         -0.6761           381           123           381           123
BM_string_strcmp/240/4/0            -0.6824         -0.6824           406           129           406           129
BM_string_strcmp/256/4/0            -0.6846         -0.6846           432           136           432           136
BM_string_strcmp/1/0/4              +1.0024         +1.0024             7            14             7            14
BM_string_strcmp/2/0/4              +0.1591         +0.1591            12            14            12            14
BM_string_strcmp/3/0/4              -0.0015         -0.0015            14            14            14            14
BM_string_strcmp/4/0/4              -0.0809         -0.0809            15            14            15            14
BM_string_strcmp/5/0/4              -0.1535         -0.1536            17            14            17            14
BM_string_strcmp/6/0/4              -0.2111         -0.2111            18            14            18            14
BM_string_strcmp/7/0/4              -0.2650         -0.2650            19            14            19            14
BM_string_strcmp/8/0/4              -0.3118         -0.3118            20            14            20            14
BM_string_strcmp/9/0/4              -0.1741         -0.1740            22            18            22            18
BM_string_strcmp/10/0/4             -0.2201         -0.2201            23            18            23            18
BM_string_strcmp/11/0/4             -0.2610         -0.2610            24            18            24            18
BM_string_strcmp/12/0/4             -0.2987         -0.2987            26            18            26            18
BM_string_strcmp/13/0/4             -0.5748         -0.5748            42            18            42            18
BM_string_strcmp/14/0/4             -0.5796         -0.5796            43            18            43            18
BM_string_strcmp/15/0/4             -0.6167         -0.6167            47            18            47            18
BM_string_strcmp/16/0/4             -0.6303         -0.6303            49            18            49            18
BM_string_strcmp/24/0/4             -0.6557         -0.6557            61            21            61            21
BM_string_strcmp/32/0/4             -0.6612         -0.6612            70            24            70            24
BM_string_strcmp/40/0/4             -0.6812         -0.6813            82            26            82            26
BM_string_strcmp/48/0/4             -0.6974         -0.6974            95            29            95            29
BM_string_strcmp/56/0/4             -0.7151         -0.7151           108            31           108            31
BM_string_strcmp/64/0/4             -0.5717         -0.5717           121            52           121            52
BM_string_strcmp/72/0/4             -0.5927         -0.5927           134            54           134            54
BM_string_strcmp/80/0/4             -0.6004         -0.6004           146            58           146            58
BM_string_strcmp/88/0/4             -0.6145         -0.6145           159            61           159            61
BM_string_strcmp/96/0/4             -0.6287         -0.6287           172            64           172            64
BM_string_strcmp/104/0/4            -0.6351         -0.6351           185            67           185            67
BM_string_strcmp/112/0/4            -0.6423         -0.6423           197            71           197            71
BM_string_strcmp/120/0/4            -0.6489         -0.6489           210            74           210            74
BM_string_strcmp/128/0/4            -0.6578         -0.6578           223            76           223            76
BM_string_strcmp/136/0/4            -0.6597         -0.6597           236            80           236            80
BM_string_strcmp/144/0/4            -0.6674         -0.6674           250            83           250            83
BM_string_strcmp/160/0/4            -0.6751         -0.6751           274            89           274            89
BM_string_strcmp/176/0/4            -0.6798         -0.6798           300            96           300            96
BM_string_strcmp/192/0/4            -0.6873         -0.6855           327           102           325           102
BM_string_strcmp/208/0/4            -0.6903         -0.6903           351           109           351           109
BM_string_strcmp/224/0/4            -0.6907         -0.6907           376           116           376           116
BM_string_strcmp/240/0/4            -0.6897         -0.6897           402           125           402           125
BM_string_strcmp/256/0/4            -0.6937         -0.6937           427           131           427           131
BM_string_strcmp/1/4/4              +0.0009         +0.0009            14            14            14            14
BM_string_strcmp/2/4/4              -0.2229         -0.2229            14            11            14            11
BM_string_strcmp/3/4/4              -0.2256         -0.2256            14            11            14            11
BM_string_strcmp/4/4/4              -0.2241         -0.2240            14            11            14            11
BM_string_strcmp/5/4/4              -0.2220         -0.2220            20            15            20            15
BM_string_strcmp/6/4/4              -0.2267         -0.2267            20            15            20            15
BM_string_strcmp/7/4/4              -0.2228         -0.2227            20            15            20            15
BM_string_strcmp/8/4/4              -0.2219         -0.2219            20            15            20            15
BM_string_strcmp/9/4/4              -0.2220         -0.2220            20            15            20            15
BM_string_strcmp/10/4/4             -0.2227         -0.2227            20            15            20            15
BM_string_strcmp/11/4/4             -0.2210         -0.2210            20            15            20            15
BM_string_strcmp/12/4/4             -0.2224         -0.2224            20            15            20            15
BM_string_strcmp/13/4/4             -0.1778         -0.1778            21            17            21            17
BM_string_strcmp/14/4/4             -0.1863         -0.1863            21            17            21            17
BM_string_strcmp/15/4/4             -0.1780         -0.1780            21            17            21            17
BM_string_strcmp/16/4/4             +0.0031         +0.0031            21            21            21            21
BM_string_strcmp/24/4/4             +0.0041         +0.0041            24            24            24            24
BM_string_strcmp/32/4/4             -0.0001         -0.0000            25            25            25            25
BM_string_strcmp/40/4/4             +0.0016         +0.0016            26            26            26            26
BM_string_strcmp/48/4/4             +0.0001         +0.0001            28            28            28            28
BM_string_strcmp/56/4/4             -0.0001         -0.0001            30            30            30            30
BM_string_strcmp/64/4/4             -0.0342         -0.0342            32            31            32            31
BM_string_strcmp/72/4/4             -0.0186         -0.0186            34            34            34            34
BM_string_strcmp/80/4/4             +0.0004         +0.0004            36            36            36            36
BM_string_strcmp/88/4/4             -0.0000         -0.0000            39            39            39            39
BM_string_strcmp/96/4/4             -0.0510         -0.0510            62            59            62            59
BM_string_strcmp/104/4/4            -0.0502         -0.0502            63            60            63            60
BM_string_strcmp/112/4/4            -0.0490         -0.0490            65            62            65            62
BM_string_strcmp/120/4/4            -0.0387         -0.0387            67            65            67            65
BM_string_strcmp/128/4/4            -0.0426         -0.0426            70            67            70            67
BM_string_strcmp/136/4/4            -0.0408         -0.0408            73            70            73            70
BM_string_strcmp/144/4/4            -0.0194         -0.0194            75            74            75            74
BM_string_strcmp/160/4/4            -0.0035         -0.0035            81            81            81            81
BM_string_strcmp/176/4/4            -0.0001         -0.0001            86            86            86            86
BM_string_strcmp/192/4/4            -0.0002         -0.0002            91            91            91            91
BM_string_strcmp/208/4/4            -0.0335         -0.0335            96            93            96            93
BM_string_strcmp/224/4/4            -0.0314         -0.0314           101            98           101            98
BM_string_strcmp/240/4/4            -0.0303         -0.0303           106           103           106           103
BM_string_strcmp/256/4/4            -0.0288         -0.0288           111           108           111           108

[1] Commit id: f98f2a6780d686ca3d44f8011c7823d42d9b083a

Test: bionic tests and benchmarks on aarch64.
Change-Id: I75f8948782b8bd459d21f15e75e1d420905f5e5a
2018-08-21 17:48:52 +00:00
Adhemerval Zanella
4ab56af82d [AArch64] Optimize memcmp for medium to large sizes
This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. This improved memcmp provides a fast path for
compares up to 16 bytes and then compares 16 bytes at a time, thus
optimizing loads from both sources.

Comparison on the default bionic and proposed optimized routines
shows the following performance improvements on A72 (using the
new proposed memcmp input data from test_memcmp.xml):

Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
BM_string_memcmp/1/0/0               -0.2074         -0.2074            15            12            15            12
BM_string_memcmp/2/0/0               -0.5193         -0.5193            31            15            31            15
BM_string_memcmp/3/0/0               -0.1291         -0.1291            19            17            19            17
BM_string_memcmp/4/0/0               -0.2889         -0.2889            17            12            17            12
BM_string_memcmp/5/0/0               -0.2606         -0.2606            15            11            15            11
BM_string_memcmp/6/0/0               -0.1656         -0.1655            17            14            17            14
BM_string_memcmp/7/0/0               -0.1721         -0.1721            19            15            19            15
BM_string_memcmp/8/0/0               -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/9/0/0               -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/10/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/11/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/12/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/13/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/14/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/15/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/16/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/24/0/0              -0.1209         -0.1209            15            13            15            13
BM_string_memcmp/32/0/0              -0.3228         -0.3228            20            13            20            13
BM_string_memcmp/40/0/0              -0.2937         -0.2937            22            15            22            15
BM_string_memcmp/48/0/0              -0.3299         -0.3299            23            15            23            15
BM_string_memcmp/56/0/0              -0.1845         -0.1845            24            20            24            20
BM_string_memcmp/64/0/0              -0.2247         -0.2247            26            20            26            20
BM_string_memcmp/72/0/0              -0.1947         -0.1947            27            22            27            22
BM_string_memcmp/80/0/0              -0.2275         -0.2275            28            22            28            22
BM_string_memcmp/88/0/0              -0.2360         -0.2360            29            22            29            22
BM_string_memcmp/96/0/0              -0.2675         -0.2675            31            22            31            22
BM_string_memcmp/104/0/0             -0.2559         -0.2559            32            24            32            24
BM_string_memcmp/112/0/0             -0.2787         -0.2786            33            24            33            24
BM_string_memcmp/120/0/0             -0.2599         -0.2599            34            25            34            25
BM_string_memcmp/128/0/0             -0.2860         -0.2860            35            25            35            25
BM_string_memcmp/136/0/0             -0.4708         -0.4708            53            28            53            28
BM_string_memcmp/144/0/0             -0.4719         -0.4719            53            28            53            28
BM_string_memcmp/160/0/0             -0.4680         -0.4680            56            30            56            30
BM_string_memcmp/176/0/0             -0.4645         -0.4645            60            32            60            32
BM_string_memcmp/192/0/0             -0.4641         -0.4641            63            34            63            34
BM_string_memcmp/208/0/0             -0.4555         -0.4555            66            36            66            36
BM_string_memcmp/224/0/0             -0.4558         -0.4557            69            38            69            38
BM_string_memcmp/240/0/0             -0.4534         -0.4534            72            40            72            40
BM_string_memcmp/256/0/0             -0.4463         -0.4463            75            42            75            42
BM_string_memcmp/512/0/0             -0.3077         -0.3077           126            88           126            88
BM_string_memcmp/1024/0/0            -0.3493         -0.3493           229           149           229           149
BM_string_memcmp/8192/0/0            -0.4173         -0.4173          1729          1007          1729          1007
BM_string_memcmp/16384/0/0           -0.3855         -0.3855          3377          2076          3377          2075
BM_string_memcmp/32768/0/0           -0.2968         -0.2968          6847          4815          6847          4814
BM_string_memcmp/65536/0/0           -0.2496         -0.2496         13715         10292         13714         10291
BM_string_memcmp/131072/0/0          -0.2676         -0.2676         27354         20033         27351         20031
BM_string_memcmp/262144/0/0          -0.2319         -0.2319         54604         41943         54598         41939
BM_string_memcmp/524288/0/0          -0.2359         -0.2359        109225         83460        109212         83449
BM_string_memcmp/1048576/0/0         -0.0439         -0.0439        423367        404791        423251        404686
BM_string_memcmp/2097152/0/0         -0.0023         -0.0024        762470        760701        761956        760122
BM_string_memcmp/512/4/4             -0.2853         -0.2853           125            89           125            89
BM_string_memcmp/1024/4/4            -0.3377         -0.3377           228           151           227           151
BM_string_memcmp/8192/4/4            -0.4083         -0.4083          1706          1009          1706          1009
BM_string_memcmp/16384/4/4           -0.3853         -0.3853          3376          2075          3376          2075
BM_string_memcmp/32768/4/4           -0.2974         -0.2974          6846          4810          6845          4810
BM_string_memcmp/65536/4/4           -0.2485         -0.2485         13619         10235         13618         10234
BM_string_memcmp/131072/4/4          -0.2387         -0.2387         27056         20597         27054         20595
BM_string_memcmp/512/4/0             -0.2898         -0.2898           123            88           123            88
BM_string_memcmp/1024/4/0            -0.3401         -0.3401           225           149           225           149
BM_string_memcmp/8192/4/0            -0.4167         -0.4167          1727          1007          1727          1007
BM_string_memcmp/16384/4/0           -0.3820         -0.3820          3384          2092          3384          2091
BM_string_memcmp/32768/4/0           -0.2535         -0.2535          6886          5141          6886          5140
BM_string_memcmp/65536/4/0           -0.1897         -0.1897         13850         11223         13849         11223
BM_string_memcmp/131072/4/0          -0.1972         -0.1972         27536         22106         27533         22104
BM_string_memcmp/512/0/4             -0.2854         -0.2854           125            89           125            89
BM_string_memcmp/1024/0/4            -0.3332         -0.3333           226           151           226           151
BM_string_memcmp/8192/0/4            -0.4199         -0.4199          1740          1009          1740          1009
BM_string_memcmp/16384/0/4           -0.3811         -0.3811          3383          2094          3383          2094
BM_string_memcmp/32768/0/4           -0.2409         -0.2409          6900          5238          6899          5237
BM_string_memcmp/65536/0/4           -0.1920         -0.1920         13922         11250         13921         11248
BM_string_memcmp/131072/0/4          -0.2029         -0.2029         27699         22079         27697         22077

I see similar improvements on A54 as well:

Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
BM_string_memcmp/1/0/0               -0.2074         -0.2074            15            12            15            12
BM_string_memcmp/2/0/0               -0.5193         -0.5193            31            15            31            15
BM_string_memcmp/3/0/0               -0.1291         -0.1291            19            17            19            17
BM_string_memcmp/4/0/0               -0.2889         -0.2889            17            12            17            12
BM_string_memcmp/5/0/0               -0.2606         -0.2606            15            11            15            11
BM_string_memcmp/6/0/0               -0.1656         -0.1655            17            14            17            14
BM_string_memcmp/7/0/0               -0.1721         -0.1721            19            15            19            15
BM_string_memcmp/8/0/0               -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/9/0/0               -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/10/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/11/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/12/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/13/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/14/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/15/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/16/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/24/0/0              -0.1209         -0.1209            15            13            15            13
BM_string_memcmp/32/0/0              -0.3228         -0.3228            20            13            20            13
BM_string_memcmp/40/0/0              -0.2937         -0.2937            22            15            22            15
BM_string_memcmp/48/0/0              -0.3299         -0.3299            23            15            23            15
BM_string_memcmp/56/0/0              -0.1845         -0.1845            24            20            24            20
BM_string_memcmp/64/0/0              -0.2247         -0.2247            26            20            26            20
BM_string_memcmp/72/0/0              -0.1947         -0.1947            27            22            27            22
BM_string_memcmp/80/0/0              -0.2275         -0.2275            28            22            28            22
BM_string_memcmp/88/0/0              -0.2360         -0.2360            29            22            29            22
BM_string_memcmp/96/0/0              -0.2675         -0.2675            31            22            31            22
BM_string_memcmp/104/0/0             -0.2559         -0.2559            32            24            32            24
BM_string_memcmp/112/0/0             -0.2787         -0.2786            33            24            33            24
BM_string_memcmp/120/0/0             -0.2599         -0.2599            34            25            34            25
BM_string_memcmp/128/0/0             -0.2860         -0.2860            35            25            35            25
BM_string_memcmp/136/0/0             -0.4708         -0.4708            53            28            53            28
BM_string_memcmp/144/0/0             -0.4719         -0.4719            53            28            53            28
BM_string_memcmp/160/0/0             -0.4680         -0.4680            56            30            56            30
BM_string_memcmp/176/0/0             -0.4645         -0.4645            60            32            60            32
BM_string_memcmp/192/0/0             -0.4641         -0.4641            63            34            63            34
BM_string_memcmp/208/0/0             -0.4555         -0.4555            66            36            66            36
BM_string_memcmp/224/0/0             -0.4558         -0.4557            69            38            69            38
BM_string_memcmp/240/0/0             -0.4534         -0.4534            72            40            72            40
BM_string_memcmp/256/0/0             -0.4463         -0.4463            75            42            75            42
BM_string_memcmp/512/0/0             -0.3077         -0.3077           126            88           126            88
BM_string_memcmp/1024/0/0            -0.3493         -0.3493           229           149           229           149
BM_string_memcmp/8192/0/0            -0.4173         -0.4173          1729          1007          1729          1007
BM_string_memcmp/16384/0/0           -0.3855         -0.3855          3377          2076          3377          2075
BM_string_memcmp/32768/0/0           -0.2968         -0.2968          6847          4815          6847          4814
BM_string_memcmp/65536/0/0           -0.2496         -0.2496         13715         10292         13714         10291
BM_string_memcmp/131072/0/0          -0.2676         -0.2676         27354         20033         27351         20031
BM_string_memcmp/262144/0/0          -0.2319         -0.2319         54604         41943         54598         41939
BM_string_memcmp/524288/0/0          -0.2359         -0.2359        109225         83460        109212         83449
BM_string_memcmp/1048576/0/0         -0.0439         -0.0439        423367        404791        423251        404686
BM_string_memcmp/2097152/0/0         -0.0023         -0.0024        762470        760701        761956        760122
BM_string_memcmp/512/4/4             -0.2853         -0.2853           125            89           125            89
BM_string_memcmp/1024/4/4            -0.3377         -0.3377           228           151           227           151
BM_string_memcmp/8192/4/4            -0.4083         -0.4083          1706          1009          1706          1009
BM_string_memcmp/16384/4/4           -0.3853         -0.3853          3376          2075          3376          2075
BM_string_memcmp/32768/4/4           -0.2974         -0.2974          6846          4810          6845          4810
BM_string_memcmp/65536/4/4           -0.2485         -0.2485         13619         10235         13618         10234
BM_string_memcmp/131072/4/4          -0.2387         -0.2387         27056         20597         27054         20595
BM_string_memcmp/512/4/0             -0.2898         -0.2898           123            88           123            88
BM_string_memcmp/1024/4/0            -0.3401         -0.3401           225           149           225           149
BM_string_memcmp/8192/4/0            -0.4167         -0.4167          1727          1007          1727          1007
BM_string_memcmp/16384/4/0           -0.3820         -0.3820          3384          2092          3384          2091
BM_string_memcmp/32768/4/0           -0.2535         -0.2535          6886          5141          6886          5140
BM_string_memcmp/65536/4/0           -0.1897         -0.1897         13850         11223         13849         11223
BM_string_memcmp/131072/4/0          -0.1972         -0.1972         27536         22106         27533         22104
BM_string_memcmp/512/0/4             -0.2854         -0.2854           125            89           125            89
BM_string_memcmp/1024/0/4            -0.3332         -0.3333           226           151           226           151
BM_string_memcmp/8192/0/4            -0.4199         -0.4199          1740          1009          1740          1009
BM_string_memcmp/16384/0/4           -0.3811         -0.3811          3383          2094          3383          2094
BM_string_memcmp/32768/0/4           -0.2409         -0.2409          6900          5238          6899          5237
BM_string_memcmp/65536/0/4           -0.1920         -0.1920         13922         11250         13921         11248
BM_string_memcmp/131072/0/4          -0.2029         -0.2029         27699         22079         27697         22077

[1] Commit id: f77e4c932b4fd65177b57dd5e220bd17fb4037d6

Test: bionic tests and benchmarks on aarch64.
Change-Id: I2791e2b20d1c0ad429e8e5a41d3e47b1ac02c921
2018-08-21 17:48:12 +00:00
Haibo Huang
8a0f0ed5e7 Make memcpy memmove
Bug: http://b/63992911
Test: Change BoardConfig.mk and compile for each variant
Change-Id: Ia0cc68d8e90e3316ddb2e9ff1555a009b6a0c5be
2018-06-11 18:12:45 +00:00
Haibo Huang
ece43e14c9 Use cortex-a53/bionic/memmove.S by default for arm64
cortex-a53/bionic/memmove.S looks like a more optimized version. It
should be used in most cases. It delegates small (<= 96 bytes) moves
to memcpy.

The only exception is denver64. It is using its own memcpy, which
doesn't allow overlap for < 96 bytes copies. Only for this variant we
need generic/bionic/memmove.S.

Benchmark result looks pretty close through (on marlin)

Before: using generic/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  108872005   1.15787GB/s
BM_string_memcpy/64/0/0              7 ns          7 ns  107387438   9.14365GB/s
BM_string_memcpy/512/0/0            21 ns         20 ns   34165353   23.2734GB/s
BM_string_memcpy/1024/0/0           40 ns         39 ns   17766657   24.2346GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2259904   24.6339GB/s
BM_string_memcpy/16384/0/0         616 ns        613 ns    1143027   24.8852GB/s
BM_string_memcpy/32768/0/0        1322 ns       1316 ns     530799   23.1835GB/s
BM_string_memcpy/65536/0/0        2672 ns       2661 ns     229638    22.937GB/s
BM_string_memcpy/131072/0/0       5379 ns       5357 ns     128316    22.788GB/s

After: using cortex-a53/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  116610749   1.24646GB/s
BM_string_memcpy/64/0/0              6 ns          6 ns  115634093   9.84708GB/s
BM_string_memcpy/512/0/0            21 ns         21 ns   34167322   22.8938GB/s
BM_string_memcpy/1024/0/0           39 ns         39 ns   17859445   24.3312GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2260192   24.6325GB/s
BM_string_memcpy/16384/0/0         610 ns        608 ns    1151889   25.0987GB/s
BM_string_memcpy/32768/0/0        1488 ns       1482 ns     532508   20.5988GB/s
BM_string_memcpy/65536/0/0        2421 ns       2411 ns     290502   25.3146GB/s
BM_string_memcpy/131072/0/0       5278 ns       5256 ns     132710   23.2234GB/s

Test: Build and benchmark on marlin
Bug: http://b/63992911
Change-Id: Id85961aca18ba841bcbcfe0d8b162843eab30584
2018-05-30 11:09:19 -07:00
Mark Salyzyn
79249b0897 bionic: add vdso clock_getres
clock_getres() should not be a hot call, nevertheless it is
~6-7 times faster for supported clock ids if it uses
__vdso_clock_getres if available.  There is a 3% performance
penalty for unsupported clock ids via __vdso_clock_getres with
respect to a direct syscall.

[TL;DR]

w/vdso32 kernel patches, locked cores to MAX, little cores only.

BEFORE:

hikey960 vdso (aarch64):

----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
BM_time_clock_getres                  126 ns        126 ns    5577874
BM_time_clock_getres_syscall          127 ns        127 ns    5505016
BM_time_clock_getres_REALTIME         126 ns        126 ns    5574682
BM_time_clock_getres_BOOTTIME         126 ns        126 ns    5575237
BM_time_clock_getres_TAI              126 ns        126 ns    5576810
BM_time_clock_getres_unsupported      128 ns        128 ns    5480189

hikey960 vdso32 (aarch32):

----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
BM_time_clock_getres                  199 ns        199 ns    3508708
BM_time_clock_getres_syscall          220 ns        220 ns    3184676
BM_time_clock_getres_REALTIME         199 ns        199 ns    3509697
BM_time_clock_getres_BOOTTIME         199 ns        199 ns    3513551
BM_time_clock_getres_TAI              200 ns        199 ns    3512412
BM_time_clock_getres_unsupported      196 ns        196 ns    3575609

x86_64 (glibc):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_time_clock_getres                 252 ns        252 ns    2370263
BM_time_clock_getres_syscall         215 ns        215 ns    3287497
BM_time_clock_getres_REALTIME        214 ns        214 ns    3294228
BM_time_clock_getres_BOOTTIME        213 ns        213 ns    3277519
BM_time_clock_getres_TAI             213 ns        213 ns    3294991
BM_time_clock_getres_unsupported     206 ns        206 ns    3450654

imx7d_pico IOT nyc (w/arm,cpu-registers-not-fw-configured) (armv7a):
(Virtual Timers)

Benchmark                           Time(ns)    CPU(ns) Iterations
------------------------------------------------------------------
BM_time_clock_getres                      16        345    2000000
BM_time_clock_getres_syscall              16        339    2121212
BM_time_clock_getres_REALTIME             17        350    2058824
BM_time_clock_getres_BOOTTIME             17        345    2000000
BM_time_clock_getres_TAI                  16        350    2000000
BM_time_clock_getres_unsupported          13        284    2500000

AFTER:

hikey960 vdso (aarch64):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_time_clock_getres                  18 ns         18 ns   37880389
BM_time_clock_getres_syscall         127 ns        127 ns    5520029
BM_time_clock_getres_REALTIME         18 ns         18 ns   37879962
BM_time_clock_getres_BOOTTIME         19 ns         18 ns   37878361
BM_time_clock_getres_TAI             131 ns        131 ns    5368484
BM_time_clock_getres_unsupported      97 ns         97 ns    7182864

hikey960 vdso32 (aarch32):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_time_clock_getres                  36 ns         36 ns   19205240
BM_time_clock_getres_syscall         212 ns        212 ns    3297100
BM_time_clock_getres_REALTIME         36 ns         36 ns   19219109
BM_time_clock_getres_BOOTTIME         36 ns         36 ns   19222490
BM_time_clock_getres_TAI             206 ns        206 ns    3402868
BM_time_clock_getres_unsupported     159 ns        159 ns    4409492

imx7d_pico IOT nyc (wo/arm,cpu-registers-not-fw-configured) (armv7a):
(Physical Timers)

Benchmark                           Time(ns)    CPU(ns) Iterations
------------------------------------------------------------------
BM_time_clock_getres                       2         48   14000000
BM_time_clock_getres_syscall              14        335    2058824
BM_time_clock_getres_REALTIME              2         49   14583333
BM_time_clock_getres_BOOTTIME              2         48   14000000
BM_time_clock_getres_TAI                  14        350    2058824
BM_time_clock_getres_unsupported           8        203    3500000

Test: taskset F \
        /data/benchmarktest{64}/bionic-benchmarks/bionic-benchmarks \
        --bionic_xml=vdso.xml --benchmark_filter=BM_time_clock_getres*
Bug: 63737556
Change-Id: I80c0a5106625d76720287f715fcf145d2aad1705
2017-12-07 09:41:48 -08:00
Sebastian Pop
ed9bfc4616 [AArch64] Optimized memcmp
Patch written by Wilco Dijkstra submitted for review to newlib:
https://sourceware.org/ml/newlib/2017/msg00524.html

This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * bionic/libc/arch-arm64/generic/bionic/memcmp.S (memcmp):
                Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:        153%
Length    1, alignment  1/ 1:        119%
Length    1, alignment  1/ 1:        154%
Length    2, alignment  2/ 2:        121%
Length    2, alignment  2/ 2:        140%
Length    2, alignment  2/ 2:        121%
Length    3, alignment  3/ 3:        105%
Length    3, alignment  3/ 3:        105%
Length    3, alignment  3/ 3:        105%
Length    4, alignment  4/ 4:        155%
Length    4, alignment  4/ 4:        154%
Length    4, alignment  4/ 4:        161%
Length    5, alignment  5/ 5:        173%
Length    5, alignment  5/ 5:        173%
Length    5, alignment  5/ 5:        173%
Length    6, alignment  6/ 6:        145%
Length    6, alignment  6/ 6:        145%
Length    6, alignment  6/ 6:        145%
Length    7, alignment  7/ 7:        125%
Length    7, alignment  7/ 7:        125%
Length    7, alignment  7/ 7:        125%
Length    8, alignment  8/ 8:        111%
Length    8, alignment  8/ 8:        130%
Length    8, alignment  8/ 8:        124%
Length    9, alignment  9/ 9:        160%
Length    9, alignment  9/ 9:        160%
Length    9, alignment  9/ 9:        150%
Length   10, alignment 10/10:        170%
Length   10, alignment 10/10:        137%
Length   10, alignment 10/10:        150%
Length   11, alignment 11/11:        160%
Length   11, alignment 11/11:        160%
Length   11, alignment 11/11:        160%
Length   12, alignment 12/12:        146%
Length   12, alignment 12/12:        168%
Length   12, alignment 12/12:        156%
Length   13, alignment 13/13:        167%
Length   13, alignment 13/13:        167%
Length   13, alignment 13/13:        173%
Length   14, alignment 14/14:        167%
Length   14, alignment 14/14:        168%
Length   14, alignment 14/14:        168%
Length   15, alignment 15/15:        168%
Length   15, alignment 15/15:        173%
Length   15, alignment 15/15:        173%
Length    1, alignment  0/ 0:        134%
Length    1, alignment  0/ 0:        127%
Length    1, alignment  0/ 0:        119%
Length    2, alignment  0/ 0:        94%
Length    2, alignment  0/ 0:        94%
Length    2, alignment  0/ 0:        106%
Length    3, alignment  0/ 0:        82%
Length    3, alignment  0/ 0:        87%
Length    3, alignment  0/ 0:        82%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        122%
Length    5, alignment  0/ 0:        127%
Length    5, alignment  0/ 0:        119%
Length    5, alignment  0/ 0:        127%
Length    6, alignment  0/ 0:        103%
Length    6, alignment  0/ 0:        100%
Length    6, alignment  0/ 0:        100%
Length    7, alignment  0/ 0:        82%
Length    7, alignment  0/ 0:        91%
Length    7, alignment  0/ 0:        87%
Length    8, alignment  0/ 0:        111%
Length    8, alignment  0/ 0:        124%
Length    8, alignment  0/ 0:        124%
Length    9, alignment  0/ 0:        136%
Length    9, alignment  0/ 0:        136%
Length    9, alignment  0/ 0:        136%
Length   10, alignment  0/ 0:        136%
Length   10, alignment  0/ 0:        135%
Length   10, alignment  0/ 0:        136%
Length   11, alignment  0/ 0:        136%
Length   11, alignment  0/ 0:        136%
Length   11, alignment  0/ 0:        135%
Length   12, alignment  0/ 0:        136%
Length   12, alignment  0/ 0:        136%
Length   12, alignment  0/ 0:        136%
Length   13, alignment  0/ 0:        135%
Length   13, alignment  0/ 0:        136%
Length   13, alignment  0/ 0:        136%
Length   14, alignment  0/ 0:        136%
Length   14, alignment  0/ 0:        136%
Length   14, alignment  0/ 0:        136%
Length   15, alignment  0/ 0:        136%
Length   15, alignment  0/ 0:        136%
Length   15, alignment  0/ 0:        136%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        115%
Length   32, alignment  0/ 0:        127%
Length   32, alignment  7/ 2:        395%
Length   32, alignment  0/ 0:        127%
Length   32, alignment  0/ 0:        127%
Length    8, alignment  0/ 0:        111%
Length    8, alignment  0/ 0:        124%
Length    8, alignment  0/ 0:        124%
Length   64, alignment  0/ 0:        128%
Length   64, alignment  6/ 4:        475%
Length   64, alignment  0/ 0:        131%
Length   64, alignment  0/ 0:        134%
Length   16, alignment  0/ 0:        128%
Length   16, alignment  0/ 0:        119%
Length   16, alignment  0/ 0:        128%
Length  128, alignment  0/ 0:        129%
Length  128, alignment  5/ 6:        475%
Length  128, alignment  0/ 0:        130%
Length  128, alignment  0/ 0:        129%
Length   32, alignment  0/ 0:        126%
Length   32, alignment  0/ 0:        126%
Length   32, alignment  0/ 0:        126%
Length  256, alignment  0/ 0:        127%
Length  256, alignment  4/ 8:        545%
Length  256, alignment  0/ 0:        126%
Length  256, alignment  0/ 0:        128%
Length   64, alignment  0/ 0:        171%
Length   64, alignment  0/ 0:        171%
Length   64, alignment  0/ 0:        174%
Length  512, alignment  0/ 0:        126%
Length  512, alignment  3/10:        585%
Length  512, alignment  0/ 0:        126%
Length  512, alignment  0/ 0:        127%
Length  128, alignment  0/ 0:        129%
Length  128, alignment  0/ 0:        128%
Length  128, alignment  0/ 0:        129%
Length 1024, alignment  0/ 0:        125%
Length 1024, alignment  2/12:        611%
Length 1024, alignment  0/ 0:        126%
Length 1024, alignment  0/ 0:        126%
Length  256, alignment  0/ 0:        128%
Length  256, alignment  0/ 0:        127%
Length  256, alignment  0/ 0:        128%
Length 2048, alignment  0/ 0:        125%
Length 2048, alignment  1/14:        625%
Length 2048, alignment  0/ 0:        125%
Length 2048, alignment  0/ 0:        125%
Length  512, alignment  0/ 0:        126%
Length  512, alignment  0/ 0:        127%
Length  512, alignment  0/ 0:        127%
Length 4096, alignment  0/ 0:        125%
Length 4096, alignment  0/16:        125%
Length 4096, alignment  0/ 0:        125%
Length 4096, alignment  0/ 0:        125%
Length 1024, alignment  0/ 0:        126%
Length 1024, alignment  0/ 0:        126%
Length 1024, alignment  0/ 0:        126%
Length 8192, alignment  0/ 0:        125%
Length 8192, alignment 63/18:        636%
Length 8192, alignment  0/ 0:        125%
Length 8192, alignment  0/ 0:        125%
Length   16, alignment  1/ 2:        317%
Length   16, alignment  1/ 2:        317%
Length   16, alignment  1/ 2:        317%
Length   32, alignment  2/ 4:        395%
Length   32, alignment  2/ 4:        395%
Length   32, alignment  2/ 4:        398%
Length   64, alignment  3/ 6:        475%
Length   64, alignment  3/ 6:        475%
Length   64, alignment  3/ 6:        477%
Length  128, alignment  4/ 8:        479%
Length  128, alignment  4/ 8:        479%
Length  128, alignment  4/ 8:        479%
Length  256, alignment  5/10:        543%
Length  256, alignment  5/10:        539%
Length  256, alignment  5/10:        543%
Length  512, alignment  6/12:        585%
Length  512, alignment  6/12:        585%
Length  512, alignment  6/12:        585%
Length 1024, alignment  7/14:        611%
Length 1024, alignment  7/14:        611%
Length 1024, alignment  7/14:        611%

The performance measured on the bionic-benchmarks on a hikey
board with a new benchmark for unaligned memcmp submitted for
review at https://android-review.googlesource.com/414860

The base is with the libc from /system/lib64. The bionic libc
with this patch is in /data.

hikey:/data # export LD_LIBRARY_PATH=/system/lib64
hikey:/data # ./bionic-benchmarks --benchmark_filter=BM_string_memcmp
Run on (8 X 2.4 MHz CPU s)
Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                      30 ns         30 ns   22955680    251.07MB/s
BM_string_memcmp/64                     57 ns         57 ns   12349184   1076.99MB/s
BM_string_memcmp/512                   305 ns        305 ns    2297163   1.56496GB/s
BM_string_memcmp/1024                  571 ns        571 ns    1225211   1.66912GB/s
BM_string_memcmp/8k                   4307 ns       4306 ns     162562   1.77177GB/s
BM_string_memcmp/16k                  8676 ns       8675 ns      80676   1.75887GB/s
BM_string_memcmp/32k                 19233 ns      19230 ns      36394   1.58695GB/s
BM_string_memcmp/64k                 36986 ns      36984 ns      18952   1.65029GB/s
BM_string_memcmp_aligned/8             199 ns        199 ns    3519166   38.3336MB/s
BM_string_memcmp_aligned/64            386 ns        386 ns    1810734   158.073MB/s
BM_string_memcmp_aligned/512          1735 ns       1734 ns     403981   281.525MB/s
BM_string_memcmp_aligned/1024         3200 ns       3200 ns     218838   305.151MB/s
BM_string_memcmp_aligned/8k          25084 ns      25080 ns      28180   311.507MB/s
BM_string_memcmp_aligned/16k         51730 ns      51729 ns      13521   302.057MB/s
BM_string_memcmp_aligned/32k        103228 ns     103228 ns       6782   302.727MB/s
BM_string_memcmp_aligned/64k        207117 ns     207087 ns       3450   301.806MB/s
BM_string_memcmp_unaligned/8           339 ns        339 ns    2070998   22.5302MB/s
BM_string_memcmp_unaligned/64         1392 ns       1392 ns     502796   43.8454MB/s
BM_string_memcmp_unaligned/512        9194 ns       9194 ns      76133   53.1104MB/s
BM_string_memcmp_unaligned/1024      18325 ns      18323 ns      38206   53.2963MB/s
BM_string_memcmp_unaligned/8k       148579 ns     148574 ns       4713   52.5831MB/s
BM_string_memcmp_unaligned/16k      298169 ns     298120 ns       2344   52.4118MB/s
BM_string_memcmp_unaligned/32k      598813 ns     598797 ns       1085    52.188MB/s
BM_string_memcmp_unaligned/64k     1196079 ns    1196083 ns        540   52.2539MB/s

hikey:/data # export LD_LIBRARY_PATH=/data
hikey:/data # ./bionic-benchmarks --benchmark_filter=BM_string_memcmp

Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                      27 ns         27 ns   26198166   286.069MB/s
BM_string_memcmp/64                     45 ns         45 ns   15553753   1.32443GB/s
BM_string_memcmp/512                   242 ns        242 ns    2892423   1.97049GB/s
BM_string_memcmp/1024                  455 ns        455 ns    1537290   2.09436GB/s
BM_string_memcmp/8k                   3446 ns       3446 ns     203295   2.21392GB/s
BM_string_memcmp/16k                  7567 ns       7567 ns      92582   2.01657GB/s
BM_string_memcmp/32k                 16081 ns      16081 ns      43524    1.8977GB/s
BM_string_memcmp/64k                 31029 ns      31028 ns      22565   1.96712GB/s
BM_string_memcmp_aligned/8             184 ns        184 ns    3800912   41.3654MB/s
BM_string_memcmp_aligned/64            287 ns        287 ns    2438835    212.65MB/s
BM_string_memcmp_aligned/512          1370 ns       1370 ns     511014   356.498MB/s
BM_string_memcmp_aligned/1024         2543 ns       2543 ns     275253   384.006MB/s
BM_string_memcmp_aligned/8k          20413 ns      20411 ns      34306   382.764MB/s
BM_string_memcmp_aligned/16k         42908 ns      42907 ns      16132   364.158MB/s
BM_string_memcmp_aligned/32k         88902 ns      88886 ns       8087   351.574MB/s
BM_string_memcmp_aligned/64k        173016 ns     173007 ns       4122   361.258MB/s
BM_string_memcmp_unaligned/8           212 ns        212 ns    3304163   36.0243MB/s
BM_string_memcmp_unaligned/64          361 ns        361 ns    1941597   169.279MB/s
BM_string_memcmp_unaligned/512        1754 ns       1753 ns     399210   278.492MB/s
BM_string_memcmp_unaligned/1024       3308 ns       3308 ns     211622   295.243MB/s
BM_string_memcmp_unaligned/8k        27227 ns      27225 ns      25637   286.964MB/s
BM_string_memcmp_unaligned/16k       55877 ns      55874 ns      12455   279.645MB/s
BM_string_memcmp_unaligned/32k      112397 ns     112366 ns       6200    278.11MB/s
BM_string_memcmp_unaligned/64k      223493 ns     223482 ns       3127   279.665MB/s

Test: bionic-benchmarks --benchmark_filter='BM_string_memcmp*'

Change-Id: Ia16a8cf69c68b8c0533f025f03b925c9883bb708
2017-11-03 13:21:07 -04:00
dimitry
fa432524a6 Mark __BIONIC_WEAK_FOR_NATIVE_BRIDGE symbols
To make it easier for Native Bridge implementations
to override these symbols.

Bug: http://b/67993967
Test: make
Change-Id: I4c53e53af494bca365dd2b3305ab0ccc2b23ba44
2017-10-27 10:01:46 +02:00
Elliott Hughes
d4ca231ae2 Unified sysroot: kill arch-specific include dirs.
<machine/asm.h> was internal use only.

<machine/fenv.h> is quite large, but can live in <bits/...>.

<machine/regdef.h> is trivially replaced by saying $x instead of x in
our assembler.

<machine/setjmp.h> is trivially inlined into <setjmp.h>.

<sgidefs.h> is unused.

Bug: N/A
Test: builds
Change-Id: Id05dbab43a2f9537486efb8f27a5ef167b055815
2017-10-12 13:19:51 -07:00
Elliott Hughes
8465e968a8 Add <sys/random.h>.
iOS 10 has <sys/random.h> with getentropy, glibc >= 2.25 has
<sys/random.h> with getentropy and getrandom. (glibc also pollutes
<unistd.h>, but that seems like a bad idea.)

Also, all supported devices now have kernels with the getrandom system
call.

We've had these available internally for a while, but it seems like the
time is ripe to expose them.

Bug: http://b/67014255
Test: ran tests
Change-Id: I76dde1e3a2d0bc82777eea437ac193f96964f138
2017-09-29 05:31:35 +00:00
Elliott Hughes
896362eb0e Add syncfs(2).
GMM calls this system call directly at the moment. That's silly.

Bug: http://b/36405699
Test: ran tests
Change-Id: I1e14c0e5ce0bc2aa888d884845ac30dc20f13cd5
2017-08-24 16:31:49 -07:00
George Burgess IV
6cb0687932 Split our FORTIFY implementation into libc_fortify
As requested in the bug. This also rips __memcpy_chk out of memcpy.S,
which lets us cut down on copypasta (all of the implementations look
identical).

Bug: 12231437
Test: mma on aosp_{arm,arm64,mips,x86,x86_64} internal master;
checkbuild on bullhead internal master; CtsBionicTestCases on bullhead.
No new failures.
Change-Id: I88c39ca166bacde0b692aa3063e743bb046a5d2f
2017-07-24 14:20:16 -07:00
Elliott Hughes
94072fbb4e Switch to inline assembler in crtbegin.
Using __builtin_frame_address was clever, but didn't work for arm64 (for
reasons which were never investigated) and the ChromeOS folks claim it
causes trouble for x86 with ARC++ (though without a reproduceable test case).

Naked functions turn out to be quite unevenly supported: some architectures
do the right thing, others don't; some architectures warn, others don't (and
the warnings don't always match the platforms that _actually_ have problems).

Inline assembler also removes the guessing games: everyone knows what the
couple of instructions _ought_ to be, and now we don't have to reason about
what the compiler will actually do (yet still keep the majority of the code
in C).

Bug: N/A
Test: builds, boots
Change-Id: I14207ef50ca46b6eca273c3cb7509c311146a3ca
2017-05-23 14:47:16 -07:00
Kevin Brodsky
f19eeb8446 libc: ARM64: fix memset for non-standard ZVA sizes
372f19e9e2 ("libc: ARM64: update memset/strlen/memcpy/memmove to
newlib/cortex-strings") introduced a bug in memset, only occurring
on the [set_long + zero + non-standard ZVA size] path, more
specifically when DCZID_EL0 reports a size different to 64 or 128.

On platforms with such sizes reported by DCZID_EL0, various string*
unit tests fail due to memset zeroing memory before and/or after the
area it is supposed to set.

Test: bionic-unit-tests --gtest_filter=string*
Change-Id: Idb80c0269226e40e343645a58608e3f324378468
2017-05-16 11:29:49 +01:00
Jake Weinstein
28285f5338 libc: clean up ARM64 copyright notices
Test: None needed

Change-Id: I3626a92329e954f67bada6ed73f3033225bbfef5
2017-05-04 12:59:53 -04:00
Elliott Hughes
5109bb463d Make all the ELF relocation constants available.
BSD thinks you should only get the relocation constants for your target
architecture, but it's often useful to have them all available at once.
Rearrange the headers to enable that.

Also update the (modified) NetBSD files to CVS HEAD.

Also remove the unused BSDism R_TYPE.

Bug: N/A
Test: builds
Change-Id: Iad5ef29192a732696e2b36af35144a9ca116aa46
2017-04-19 13:28:32 -07:00
Elliott Hughes
901601b48e Remove unused elf_machdep.h cruft.
Also add a few missing include guards.

Bug: N/A
Test: builds
Change-Id: I9557303c81a4b11d430112528def038ecb5562a9
2017-04-17 16:25:09 -07:00
Yuanyuan Zhong
9d150dd9a0 bionic: arm64: generic: strcmp: align to 64B cache line
Align strcmp to 64B. This will ensure the preformance critical
loop is within one 64B cache line.

Change-Id: I88eef2f12b2a6442cacec9cdbdffbf17293e7d32
Signed-off-by: Yuanyuan Zhong <zyy@motorola.com>
Reviewed-on: https://gerrit.mot.com/902536
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver <sltawvr@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Reviewed-by: Igor Kovalenko <igork@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
2017-03-20 17:54:29 +00:00
Jake Weinstein
372f19e9e2 libc: ARM64: update memset/strlen/memcpy/memmove to newlib/cortex-strings
* Bionic benchmarks results at the bottom

* This is a squash of the following commits:

libc: ARM64: optimize memset.

 This is an optimized memset for AArch64.  Memset is split into 4 main
 cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
 fully unrolled.  Large memsets of more than 96 bytes align the
 destination and use an unrolled loop processing 64 bytes per
 iteration.  Memsets of zero of more than 256 use the dc zva
 instruction, and there are faster versions for the common ZVA sizes 64
 or 128.  STP of Q registers is used to reduce codesize without loss of
 performance.

Change-Id: I0c5b5ec5ab8a1fd0f23eee8fbacada0be08e841f

libc: ARM64: improve performance in strlen

Change-Id: Ic20f93a0052a49bd76cd6795f51e8606ccfbf11c

libc: ARM64: Optimize memcpy.

 This is an optimized memcpy for AArch64.  Copies are split into 3 main
 cases: small copies of up to 16 bytes, medium copies of 17..96 bytes
 which are fully unrolled.  Large copies of more than 96 bytes align
 the destination and use an unrolled loop processing 64 bytes per
 iteration.  In order to share code with memmove, small and medium
 copies read all data before writing, allowing any kind of overlap.  On
 a random copy test memcpy is 40.8% faster on A57 and 28.4% on A53.

Change-Id: Ibb9483e45bbc0e8ca3d5ce98a31c55dfd8a5ac28

libc: AArch64: Tune memcpy

* Further tuning for performance.

Change-Id: Id08eaab885f9743fa7575077924a947c1b88e4ff

libc: ARM64: optimize memmove for Cortex-A53

* Sadly does not work on Denver or Kryo, so can't go to generic

 This is an optimized memmove for AArch64.  All copies of up to 96
 bytes and all backward copies are done by the new memcpy.  The only
 remaining case is large forward copies which are done in the same way
 as the memcpy loop, but copying from the end rather than the start.

Tested on the Nextbit Robin with MSM8992 (Snapdragon 808):

Before
BM_string_memcmp/8                          1000k         27    0.286 GiB/s
BM_string_memcmp/64                           50M         20    3.053 GiB/s
BM_string_memcmp/512                          20M        126    4.060 GiB/s
BM_string_memcmp/1024                         10M        234    4.372 GiB/s
BM_string_memcmp/8Ki                        1000k       1726    4.745 GiB/s
BM_string_memcmp/16Ki                        500k       3711    4.415 GiB/s
BM_string_memcmp/32Ki                        200k       8276    3.959 GiB/s
BM_string_memcmp/64Ki                        100k      16351    4.008 GiB/s
BM_string_memcpy/8                          1000k         13    0.612 GiB/s
BM_string_memcpy/64                         1000k          8    7.187 GiB/s
BM_string_memcpy/512                          50M         38   13.311 GiB/s
BM_string_memcpy/1024                         20M         86   11.858 GiB/s
BM_string_memcpy/8Ki                           5M        620   13.203 GiB/s
BM_string_memcpy/16Ki                       1000k       1265   12.950 GiB/s
BM_string_memcpy/32Ki                        500k       2977   11.004 GiB/s
BM_string_memcpy/64Ki                        500k       8003    8.188 GiB/s
BM_string_memmove/8                         1000k         11    0.684 GiB/s
BM_string_memmove/64                        1000k         16    3.855 GiB/s
BM_string_memmove/512                         50M         57    8.915 GiB/s
BM_string_memmove/1024                        20M        117    8.720 GiB/s
BM_string_memmove/8Ki                          2M        853    9.594 GiB/s
BM_string_memmove/16Ki                      1000k       1731    9.462 GiB/s
BM_string_memmove/32Ki                       500k       3566    9.189 GiB/s
BM_string_memmove/64Ki                       500k       7708    8.501 GiB/s
BM_string_memset/8                          1000k         16    0.487 GiB/s
BM_string_memset/64                         1000k         16    3.995 GiB/s
BM_string_memset/512                          50M         37   13.489 GiB/s
BM_string_memset/1024                         50M         58   17.405 GiB/s
BM_string_memset/8Ki                           5M        451   18.160 GiB/s
BM_string_memset/16Ki                          2M        883   18.554 GiB/s
BM_string_memset/32Ki                       1000k       2181   15.022 GiB/s
BM_string_memset/64Ki                        500k       4563   14.362 GiB/s
BM_string_strlen/8                          1000k          8    0.965 GiB/s
BM_string_strlen/64                         1000k         16    3.855 GiB/s
BM_string_strlen/512                          20M         92    5.540 GiB/s
BM_string_strlen/1024                         10M        167    6.111 GiB/s
BM_string_strlen/8Ki                        1000k       1237    6.620 GiB/s
BM_string_strlen/16Ki                       1000k       2765    5.923 GiB/s
BM_string_strlen/32Ki                        500k       6135    5.341 GiB/s
BM_string_strlen/64Ki                        200k      13168    4.977 GiB/s

After
BM_string_memcmp/8                          1000k         21    0.369 GiB/s
BM_string_memcmp/64                         1000k         28    2.272 GiB/s
BM_string_memcmp/512                          20M        128    3.983 GiB/s
BM_string_memcmp/1024                         10M        234    4.375 GiB/s
BM_string_memcmp/8Ki                        1000k       1732    4.728 GiB/s
BM_string_memcmp/16Ki                        500k       3485    4.701 GiB/s
BM_string_memcmp/32Ki                        500k       7031    4.660 GiB/s
BM_string_memcmp/64Ki                        200k      14296    4.584 GiB/s
BM_string_memcpy/8                          1000k          5    1.458 GiB/s
BM_string_memcpy/64                         1000k          7    8.952 GiB/s
BM_string_memcpy/512                          50M         36   13.907 GiB/s
BM_string_memcpy/1024                         20M         80   12.750 GiB/s
BM_string_memcpy/8Ki                           5M        572   14.307 GiB/s
BM_string_memcpy/16Ki                       1000k       1165   14.053 GiB/s
BM_string_memcpy/32Ki                        500k       3141   10.430 GiB/s
BM_string_memcpy/64Ki                        500k       7008    9.351 GiB/s
BM_string_memmove/8                           50M          7    1.074 GiB/s
BM_string_memmove/64                        1000k          9    6.593 GiB/s
BM_string_memmove/512                         50M         37   13.502 GiB/s
BM_string_memmove/1024                        20M         80   12.656 GiB/s
BM_string_memmove/8Ki                          5M        573   14.281 GiB/s
BM_string_memmove/16Ki                      1000k       1168   14.018 GiB/s
BM_string_memmove/32Ki                      1000k       2825   11.599 GiB/s
BM_string_memmove/64Ki                       500k       6548   10.008 GiB/s
BM_string_memset/8                          1000k          7    1.038 GiB/s
BM_string_memset/64                         1000k          8    7.151 GiB/s
BM_string_memset/512                        1000k         29   17.272 GiB/s
BM_string_memset/1024                         50M         53   18.969 GiB/s
BM_string_memset/8Ki                           5M        424   19.300 GiB/s
BM_string_memset/16Ki                          2M        846   19.350 GiB/s
BM_string_memset/32Ki                       1000k       2028   16.156 GiB/s
BM_string_memset/64Ki                        500k       4514   14.517 GiB/s
BM_string_strlen/8                          1000k          7    1.120 GiB/s
BM_string_strlen/64                         1000k         16    3.918 GiB/s
BM_string_strlen/512                          50M         64    7.894 GiB/s
BM_string_strlen/1024                         20M        104    9.815 GiB/s
BM_string_strlen/8Ki                           5M        664   12.337 GiB/s
BM_string_strlen/16Ki                       1000k       1291   12.682 GiB/s
BM_string_strlen/32Ki                       1000k       2940   11.143 GiB/s
BM_string_strlen/64Ki                        500k       6440   10.175 GiB/s

Change-Id: I635bd2798a755256f748b2af19b1a56fb85a40c6
2016-11-28 19:35:12 +00:00
Elliott Hughes
beb8796624 Use ENTRY_PRIVATE in __bionic_clone assembler.
Bug: N/A
Test: bionic tests
Change-Id: Ic651d628be009487a36d0b2e5bcf900b981b1ef9
2016-10-26 17:01:58 -07:00
Colin Cross
7510c33b61 Remove deprecated Android.mk files
These directories all have Android.bp files that are always used now,
delete the Android.mk files.

Change-Id: Ib0ba2d28bff88483b505426ba61606da314e03ab
2016-05-26 16:41:57 -07:00
Elliott Hughes
eafad49bd6 Add <sys/quota.h>.
It turns out that at least the Nexus 9 kernel is built without CONFIG_QUOTA.
If we decide we're going to mandate quota functionality, I'm happy for us to
be a part of CTS that ensures that happens, but I don't want to be first, so
there's not much to test here other than "will it compile?". The strace
output looks right though.

Bug: http://b/27948821
Bug: http://b/27952303
Change-Id: If667195eee849ed17c8fa9110f6b02907fc8fc04
2016-04-06 11:06:09 -07:00
Elliott Hughes
7f72ad4d6c Add sync_file_range to <fcntl.h>.
Bug: http://b/27952303
Change-Id: Idadfacd657ed415abc11684b9471e4e24c2fbf05
2016-04-05 12:17:22 -07:00
Elliott Hughes
afe835d540 Move math headers in with the other headers.
Keeping them separate is a pain for the NDK, and doesn't help the platform.

Change-Id: I96b8beef307d4a956e9c0a899ad9315adc502582
2016-04-02 08:36:33 -07:00
Greg Hackmann
e2faf07d65 Add {get,set}domainname(2)
{get,set}domainname aren't in POSIX but are widely-implemented
extensions.

The Linux kernel provides a setdomainname syscall but not a symmetric
getdomainname syscall, since it expects userspace to get the domain name
from uname(2).

Change-Id: I96726c242f4bb646c130b361688328b0b97269a0
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2016-03-25 14:16:58 -07:00
Josh Gao
0c3655a864 Add a checksum to jmp_buf on AArch64.
Bug: http://b/27417786
Change-Id: I17c22dc28a46dd6b678b449b506b0da978f3793e
2016-03-03 12:45:08 -08:00
Elliott Hughes
784609317d Mandate optimized __memset_chk for arm and arm64.
This involves actually implementing assembler __memset_chk for arm64,
but that's easily done.

Obviously I'd like this for all architectures (and all the string functions),
but this is low-hanging fruit...

Change-Id: I70ec48c91aafd1f0feb974a2555c51611de9ef82
2016-03-02 11:58:41 -08:00
Elliott Hughes
3c6016f04a Improve diagnostics from the assembler __memcpy_chk routines.
Change-Id: Iec16c92ed80beee505cba2121ea33e3550197b02
2016-03-01 14:45:58 -08:00