ahs
919fb7f2e0
AVX2 implementation for memset.
...
This patch adds handwritten AVX2 assembly for 64-bit memset, using
non-temporal stores for very large sizes. It also adds dynamic
dispatch for APIs that have multiple implementations.
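As an illustration of the large-size path, here is a minimal C
intrinsics sketch of the same idea; the names, threshold, and
structure are assumptions for illustration, not the patch's
hand-written assembly:

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NT_THRESHOLD (1 << 20)  /* hypothetical; the real cutoff is tuned */

    static void *memset_avx2_sketch(void *dst, int c, size_t n) {
      uint8_t *p = (uint8_t *)dst;
      __m256i v = _mm256_set1_epi8((char)c);
      if (n >= NT_THRESHOLD) {
        /* Align to 32 bytes, then use non-temporal stores so very large
           fills do not evict useful data from the cache. */
        while (((uintptr_t)p & 31) && n) { *p++ = (uint8_t)c; n--; }
        while (n >= 32) { _mm256_stream_si256((__m256i *)p, v); p += 32; n -= 32; }
        _mm_sfence();  /* order the streaming stores before returning */
      } else {
        while (n >= 32) { _mm256_storeu_si256((__m256i *)p, v); p += 32; n -= 32; }
      }
      while (n--) *p++ = (uint8_t)c;
      return dst;
    }

Non-temporal stores bypass the cache, which is why they only pay off
once the fill is much larger than what the cache could usefully hold.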
The benchmarks show convincing improvements for sizes above 512
bytes; the slight regression for small sizes is unfortunate, but
probably small enough to be acceptable.
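The dynamic dispatch mentioned in the description can be pictured as
a one-time resolver behind a function pointer. This is a generic
sketch with hypothetical names and stub implementations; bionic's
actual mechanism differs in detail:

    #include <stddef.h>
    #include <string.h>

    /* Two hypothetical implementations; the real ones are assembly. */
    static void *memset_avx2(void *d, int c, size_t n)    { return memset(d, c, n); }
    static void *memset_generic(void *d, int c, size_t n) { return memset(d, c, n); }

    static void *memset_resolve(void *dst, int c, size_t n);
    static void *(*memset_impl)(void *, int, size_t) = memset_resolve;

    /* The first call probes the CPU once; every later call jumps straight
       to the chosen implementation through the pointer. */
    static void *memset_resolve(void *dst, int c, size_t n) {
      memset_impl = __builtin_cpu_supports("avx2") ? memset_avx2
                                                   : memset_generic;
      return memset_impl(dst, c, n);
    }

The race on the pointer is benign: every thread that hits the
resolver writes the same value.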
Before:
BM_string_memset/8/0 3.06 ns 3.04 ns 222703428 bytes_per_second=2.45261G/s
BM_string_memset/16/0 3.50 ns 3.47 ns 202569932 bytes_per_second=4.29686G/s
BM_string_memset/32/0 3.50 ns 3.48 ns 200064955 bytes_per_second=8.57386G/s
BM_string_memset/64/0 3.49 ns 3.46 ns 201928186 bytes_per_second=17.2184G/s
BM_string_memset/512/0 14.8 ns 14.7 ns 47776178 bytes_per_second=32.3887G/s
BM_string_memset/1024/0 27.3 ns 27.1 ns 25884933 bytes_per_second=35.2515G/s
BM_string_memset/8192/0 203 ns 201 ns 3476903 bytes_per_second=37.9311G/s
BM_string_memset/16384/0 402 ns 399 ns 1750471 bytes_per_second=38.2725G/s
BM_string_memset/32768/0 932 ns 925 ns 755750 bytes_per_second=33.0071G/s
BM_string_memset/65536/0 2038 ns 2014 ns 347060 bytes_per_second=30.3057G/s
BM_string_memset/131072/0 4012 ns 3980 ns 175186 bytes_per_second=30.6682G/s
After:
BM_string_memset/8/0 3.32 ns 3.23 ns 208939089 bytes_per_second=2.3051G/s
BM_string_memset/16/0 4.07 ns 3.98 ns 173479615 bytes_per_second=3.74822G/s
BM_string_memset/32/0 4.07 ns 3.95 ns 177208119 bytes_per_second=7.54344G/s
BM_string_memset/64/0 4.09 ns 4.00 ns 174729144 bytes_per_second=14.8878G/s
BM_string_memset/512/0 10.7 ns 10.4 ns 65922763 bytes_per_second=45.6611G/s
BM_string_memset/1024/0 18.0 ns 17.6 ns 40489136 bytes_per_second=54.3166G/s
BM_string_memset/8192/0 109 ns 106 ns 6577711 bytes_per_second=71.7667G/s
BM_string_memset/16384/0 221 ns 210 ns 3343800 bytes_per_second=72.684G/s
BM_string_memset/32768/0 655 ns 623 ns 1153501 bytes_per_second=48.9781G/s
BM_string_memset/65536/0 1547 ns 1495 ns 461702 bytes_per_second=40.8154G/s
BM_string_memset/131072/0 2991 ns 2924 ns 240189 bytes_per_second=41.7438G/s
This patch drops the wmemset() code because we don't even have a
microbenchmark for it, and we have as many implementations checked in
as we have non-test call sites (!); at this point it seems like we've
spent more time maintaining wmemset() than running it!
Test: bionic/tests/run-on-host.sh 64
Signed-off-by: ahs <amrita.h.s@intel.com>
Change-Id: Ie5047df5300638c1e4c69f8285d33d034f79c83b
2022-07-22 21:48:50 +00:00
jaishank
2e50fa7cf8
Optimized L2 cache value for Intel(R) Core architectures.
...
Performance Gain:
AnTuTu - 4.80%
3D Mark Sling Shot - 3.47%
BaseMarkGPU - 5.51%
GeekBench - 3.19%
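For context, a cache-size tunable like this typically decides when
the string routines give up on cached stores; the constant name and
value below are assumptions for illustration, not the actual bionic
definitions:

    #include <stddef.h>

    #define ASSUMED_L2_CACHE_SIZE (1 << 20)  /* placeholder value */

    /* Above roughly half the cache, cached stores mostly cause
       evictions, so a non-temporal path wins. */
    static int use_non_temporal(size_t n) {
      return n > ASSUMED_L2_CACHE_SIZE / 2;
    }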
Test: ./tests/run-on-host.sh 64
Change-Id: I6122835a3f5fd97cc291623d1062fe25843a2d93
Signed-off-by: jaishank <jaishankar.rajendran@intel.com>
2019-11-12 15:58:34 +00:00
Shalini Salomi Bodapati
4ed2f475d8
Add AVX2 version of wmemset in bionic
...
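A minimal intrinsics sketch of the idea, assuming the 32-bit wchar_t
used on Android; this is an illustration, not the checked-in
assembly:

    #include <immintrin.h>
    #include <stddef.h>
    #include <wchar.h>

    wchar_t *wmemset_avx2_sketch(wchar_t *s, wchar_t c, size_t n) {
      wchar_t *p = s;
      __m256i v = _mm256_set1_epi32((int)c);  /* broadcast the wide char */
      while (n >= 8) {                        /* 8 x 32-bit per 256-bit store */
        _mm256_storeu_si256((__m256i *)p, v);
        p += 8; n -= 8;
      }
      while (n--) *p++ = c;
      return s;
    }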
Test: ./tests/run-on-host.sh 64
Change-Id: Id2f696cc60a10c01846ca3fe0d3a5d513020afe3
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
2019-07-16 18:06:57 +05:30
Haibo Huang
8a0f0ed5e7
Make memcpy memmove
...
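In C terms the change amounts to the following, so that overlapping
copies behave like memmove (the actual patch selects the assembly
implementation per board variant):

    #include <string.h>

    /* Sketch of the effect: memcpy simply shares memmove's
       implementation, which must handle overlapping src/dst in either
       direction. */
    void *memcpy_as_memmove(void *dst, const void *src, size_t n) {
      return memmove(dst, src, n);
    }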
Bug: http://b/63992911
Test: Change BoardConfig.mk and compile for each variant
Change-Id: Ia0cc68d8e90e3316ddb2e9ff1555a009b6a0c5be
2018-06-11 18:12:45 +00:00
Jeremy Compostella
611ad621c6
Revert "Add 64-bit slm optimized strlcpy and srlcat."
...
This reverts commit 2e7145c048.
When src is at the end of a page, the SSE2-optimized strlcpy can
issue a movdqu instruction that crosses the page boundary. If the
next page is not allocated to the process, this leads to a
segmentation fault. It is rare, but it has been caught multiple times
during robustness testing.
We isolated a way to reproduce the issue outside of an Android device
and were able to resolve this particular case. However, additional
compliance and robustness tests revealed several other similar
page-crossing issues in this implementation.
In conclusion, this optimization needs to be rewritten from scratch
because the problem lies in its design. In the meantime, it is better
to remove it.
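To illustrate the hazard and the standard remedy: a page is a
multiple of 16 bytes, so a 16-byte load from a 16-byte-aligned
address can never cross a page, whereas an unaligned movdqu near the
end of a page can touch the next, possibly unmapped, page. A sketch
of the usual aligned-load technique (not the bionic code):

    #include <emmintrin.h>
    #include <stdint.h>

    /* Load the 16-byte block containing s without risking a page
       fault: round the address down to a 16-byte boundary (still the
       same page) and report how many leading bytes the caller must
       ignore. This mirrors what page-safe string assembly does. */
    static __m128i load16_page_safe(const char *s, unsigned *lead) {
      uintptr_t addr = (uintptr_t)s;
      *lead = (unsigned)(addr & 15);
      return _mm_load_si128((const __m128i *)(addr & ~(uintptr_t)15));
    }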
Change-Id: If90450de430ba9b7cd9282a422783beabd701f3d
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
2018-04-12 14:00:43 -07:00
Elliott Hughes
a80ddc8a34
Fix x86-64 __memset_chk.
...
I can only assume I was testing the 32-bit implementation when I claimed
this worked. While improving the 32-bit code I realized that I'd used
signed comparisons instead of unsigned, and came back to find that the
64-bit code didn't work.
By way of apology, make x86-64 the first architecture where __memset_chk
falls through to memset.
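For reference, the C semantics of __memset_chk (the _FORTIFY_SOURCE
entry point) are roughly as below; the fixed assembly performs this
check and then, on x86-64, falls through into the memset code itself:

    #include <stdlib.h>
    #include <string.h>

    /* Sketch of the contract, not bionic's code.  n and dst_len are
       size_t, so the comparison must be unsigned: with a signed
       compare, a huge n (top bit set) looks negative and slips past
       the check, which is the bug this commit fixes. */
    void *__memset_chk_sketch(void *dst, int c, size_t n, size_t dst_len) {
      if (n > dst_len) abort();   /* buffer overflow detected */
      return memset(dst, c, n);   /* the assembly falls through instead */
    }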
Change-Id: I54d9eee5349b6a2abb2ce81e161fdcde09556561
2016-03-03 16:46:25 -08:00
Elliott Hughes
ff9bda7201
Merge "Mandate optimized assembler for x86-64 __memset_chk."
2016-03-03 22:18:46 +00:00
Elliott Hughes
01d5b946ac
Remove optimized code for bzero, which was removed from POSIX in 2008.
...
I'll come back for the last bcopy remnant...
Bug: http://b/26407170
Change-Id: Iabfeb95fc8a4b4b3992e3cc209ec5221040e7c26
2016-03-02 17:21:07 -08:00
Elliott Hughes
61c95fe52d
Mandate optimized assembler for x86-64 __memset_chk.
...
Change-Id: I4d6b452f3cf850d405e8f5d7da01d432603e606b
2016-03-02 16:39:29 -08:00
Jake Weinstein
2926f9a31e
libc: remove bcopy from memmove on 64-bit architectures
...
* bcopy is deprecated on LP64 by the following commit:
ce9ce28e5d
Change-Id: I6849916f0ec4a2d0db9a360999ad1dc8edda952b
2015-08-17 22:06:12 +00:00
Chih-Hung Hsieh
0a93df369c
Fix opcode to compile with both gcc and llvm.
...
BUG: 17302991
Change-Id: I31febd9ad24312388068803ce247b295bd73b607
2015-04-23 21:40:31 +00:00
Varvara Rainchik
2e7145c048
Add 64-bit slm-optimized strlcpy and strlcat.
...
Change-Id: Ic948934d91c83bbfdfd00c05ee8b14952e012549
Signed-off-by: Varvara Rainchik <varvara.rainchik@intel.com>
2014-11-12 17:32:28 +03:00
Varvara Rainchik
fce861498c
Fix for slm-tuned memmove (both 32- and 64-bit).
...
Introduce a test for memmove that catches a fault.
Fix both 32- and 64-bit versions of slm-tuned memmove.
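A minimal overlapping-copy check in the spirit of such a regression
test (the actual bionic test is more thorough and targets the
faulting case):

    #include <assert.h>
    #include <string.h>

    static void test_memmove_overlap(void) {
      char buf[] = "abcdefgh";
      memmove(buf + 2, buf, 6);              /* overlapping regions */
      assert(memcmp(buf, "ababcdef", 8) == 0);
    }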
Change-Id: Ib416def2610a0972e32c3b9b6055b54967643dc3
Signed-off-by: Varvara Rainchik <varvara.rainchik@intel.com>
2014-06-05 11:08:09 -07:00
Dan Albert
ce9ce28e5d
Removes bcopy and bzero from bionic.
...
These symbols are still defined for LP32 for binary compatibility, but
the declarations have been replaced with the POSIX recommended #defines.
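Those replacements map the legacy calls onto their POSIX
equivalents, roughly:

    #include <string.h>

    /* Note the argument order differs between bcopy and memmove. */
    #define bcopy(src, dst, n) ((void)memmove((dst), (src), (n)))
    #define bzero(dst, n)      ((void)memset((dst), 0, (n)))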
Bug: 13935372
Change-Id: Ief7e6ca012db374588ba5839f11e8f3a13a20467
2014-06-03 17:22:07 -07:00
Varvara Rainchik
a020a244ae
Add 64-bit Silvermont-optimized string/memory functions.
...
Add the following functions:
bcopy, bzero, memcpy, memmove, memset, stpcpy, stpncpy, strcat, strcpy,
strlen, strncat, strncpy, memcmp, strcmp, strncmp.
Set all these functions as the default ones.
Change-Id: Ic66b250ad8c349a43d25e2d4dea075604f6df6ac
Signed-off-by: Varvara Rainchik <varvara.rainchik@intel.com>
2014-05-12 17:37:07 -07:00
Elliott Hughes
bf425680e4
Let the compiler worry about implementing ffs(3).
...
It does at least as good a job as our old hand-written assembly anyway.
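ffs(3) returns the 1-based index of the least significant set bit,
or 0 for a zero argument; modern compilers reduce it to a short
bit-scan sequence (e.g. bsf/tzcnt on x86). A sketch of what "letting
the compiler worry" means:

    int ffs_sketch(int i) {
      /* GCC/Clang lower this builtin to the native bit-scan
         instruction, matching ffs(3) semantics. */
      return __builtin_ffs(i);
    }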
Change-Id: If7c4a1ac508bace0b71ee7b67808caa6eabf11d2
2013-10-24 16:29:40 -07:00
Elliott Hughes
8ca530e559
Add ffs and memcmp16 to x86_64.
...
Change-Id: I652c1356f1c7c52299977181c2cf154386979380
2013-10-17 17:03:22 -07:00