Commit graph

17 commits

Author SHA1 Message Date
ahs
919fb7f2e0 AVX2 implementation for memset.
This patch adds handwritten AVX2 assembly for 64-bit memset, using
non-temporal stores for very large sizes. It also adds dynamic dispatch
for APIs that have multiple implementations.

The benchmark improvements are convincing for sizes of 512 bytes and up,
and although the slight regression for small sizes is unfortunate, it's
probably small enough to be acceptable.
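
A minimal sketch of the non-temporal store idea, in C intrinsics rather
than the patch's actual handwritten assembly (the function name and the
alignment handling here are illustrative):

  #include <immintrin.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Illustrative AVX2 memset core: for very large buffers, streaming
   * (non-temporal) stores bypass the cache instead of evicting it. */
  static void memset_avx2_nt(void *dst, int c, size_t n) {
      unsigned char *p = dst;
      __m256i v = _mm256_set1_epi8((char)c);
      /* Align to 32 bytes; _mm256_stream_si256 requires it. */
      while (((uintptr_t)p & 31) && n) { *p++ = (unsigned char)c; n--; }
      while (n >= 32) {
          _mm256_stream_si256((__m256i *)p, v);  /* non-temporal store */
          p += 32;
          n -= 32;
      }
      _mm_sfence();  /* order the streaming stores before returning */
      while (n--) *p++ = (unsigned char)c;  /* byte-wise tail */
  }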

Before:

  BM_string_memset/8/0            3.06 ns         3.04 ns    222703428 bytes_per_second=2.45261G/s
  BM_string_memset/16/0           3.50 ns         3.47 ns    202569932 bytes_per_second=4.29686G/s
  BM_string_memset/32/0           3.50 ns         3.48 ns    200064955 bytes_per_second=8.57386G/s
  BM_string_memset/64/0           3.49 ns         3.46 ns    201928186 bytes_per_second=17.2184G/s
  BM_string_memset/512/0          14.8 ns         14.7 ns     47776178 bytes_per_second=32.3887G/s
  BM_string_memset/1024/0         27.3 ns         27.1 ns     25884933 bytes_per_second=35.2515G/s
  BM_string_memset/8192/0          203 ns          201 ns      3476903 bytes_per_second=37.9311G/s
  BM_string_memset/16384/0         402 ns          399 ns      1750471 bytes_per_second=38.2725G/s
  BM_string_memset/32768/0         932 ns          925 ns       755750 bytes_per_second=33.0071G/s
  BM_string_memset/65536/0        2038 ns         2014 ns       347060 bytes_per_second=30.3057G/s
  BM_string_memset/131072/0       4012 ns         3980 ns       175186 bytes_per_second=30.6682G/s

After:

  BM_string_memset/8/0            3.32 ns         3.23 ns    208939089 bytes_per_second=2.3051G/s
  BM_string_memset/16/0           4.07 ns         3.98 ns    173479615 bytes_per_second=3.74822G/s
  BM_string_memset/32/0           4.07 ns         3.95 ns    177208119 bytes_per_second=7.54344G/s
  BM_string_memset/64/0           4.09 ns         4.00 ns    174729144 bytes_per_second=14.8878G/s
  BM_string_memset/512/0          10.7 ns         10.4 ns     65922763 bytes_per_second=45.6611G/s
  BM_string_memset/1024/0         18.0 ns         17.6 ns     40489136 bytes_per_second=54.3166G/s
  BM_string_memset/8192/0          109 ns          106 ns      6577711 bytes_per_second=71.7667G/s
  BM_string_memset/16384/0         221 ns          210 ns      3343800 bytes_per_second=72.684G/s
  BM_string_memset/32768/0         655 ns          623 ns      1153501 bytes_per_second=48.9781G/s
  BM_string_memset/65536/0        1547 ns         1495 ns       461702 bytes_per_second=40.8154G/s
  BM_string_memset/131072/0       2991 ns         2924 ns       240189 bytes_per_second=41.7438G/s

This patch also drops the wmemset() code: we don't even have a
microbenchmark for it, and we have as many implementations checked in as
we have non-test call sites (!), so at this point it seems we've spent
more time maintaining wmemset() than running it.
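
The dynamic-dispatch half of the patch amounts to choosing an
implementation once, at load time. A hedged sketch using a GNU ifunc
resolver (the mechanism and all names here are assumptions, not the
patch's actual code):

  #include <stddef.h>

  /* Two hypothetical implementations behind one exported symbol. */
  void *memset_generic(void *dst, int c, size_t n);
  void *memset_avx2(void *dst, int c, size_t n);

  /* The resolver runs once when the symbol is first bound; afterwards
   * callers jump directly to the implementation it returned. */
  static void *(*resolve_memset(void))(void *, int, size_t) {
      __builtin_cpu_init();
      return __builtin_cpu_supports("avx2") ? memset_avx2 : memset_generic;
  }

  void *my_memset(void *dst, int c, size_t n)
      __attribute__((ifunc("resolve_memset")));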

Test: bionic/tests/run-on-host.sh 64
Signed-off-by: ahs <amrita.h.s@intel.com>
Change-Id: Ie5047df5300638c1e4c69f8285d33d034f79c83b
2022-07-22 21:48:50 +00:00
jaishank
2e50fa7cf8 Optimized L2 cache value for Intel(R) Core architectures.
Performance Gain:
AnTuTu             - 4.80%
3D Mark Sling Shot - 3.47%
BaseMarkGPU        - 5.51%
GeekBench          - 3.19%

Test: ./tests/run-on-host.sh 64

Change-Id: I6122835a3f5fd97cc291623d1062fe25843a2d93
Signed-off-by: jaishank <jaishankar.rajendran@intel.com>
2019-11-12 15:58:34 +00:00
Shalini Salomi Bodapati
4ed2f475d8 Add AVX2 version of wmemset to bionic
Test: ./tests/run-on-host.sh 64
Change-Id: Id2f696cc60a10c01846ca3fe0d3a5d513020afe3
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
2019-07-16 18:06:57 +05:30
Haibo Huang
8a0f0ed5e7 Make memcpy memmove
Bug: http://b/63992911
Test: Change BoardConfig.mk and compile for each variant
Change-Id: Ia0cc68d8e90e3316ddb2e9ff1555a009b6a0c5be
2018-06-11 18:12:45 +00:00
Jeremy Compostella
611ad621c6 Revert "Add 64-bit slm optimized strlcpy and strlcat."
This reverts commit 2e7145c048.

When src is at the end of a page, the SSE2-optimized strlcpy can issue
a movdqu instruction that crosses the page boundary. If the next page
is not mapped into the process, this leads to a segmentation fault.
This is rare, but it has been caught multiple times during robustness
testing.

We isolated a way to reproduce the issue outside of an Android device
and were able to resolve this particular case. However, additional
compliance and robustness tests found several other similar
page-crossing issues with this implementation.

In conclusion, this optimization needs to be rewritten from scratch
because the design itself is at fault. In the meantime, it is better to
remove it.
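
In outline, the failure mode is that a 16-byte unaligned load can touch
bytes past the ones the string actually owns. A hedged repro sketch (the
4 KiB page size is assumed and error checking is omitted):

  #include <emmintrin.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void) {
      /* Map two 4 KiB pages, then unmap the second, so the string ends
       * right at a page boundary with nothing mapped behind it. */
      char *pages = mmap(NULL, 8192, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      munmap(pages + 4096, 4096);
      char *src = pages + 4096 - 8;  /* 8 valid bytes, then a hole */
      memcpy(src, "1234567", 8);     /* 7 chars + NUL, all in-page */

      /* A byte-wise strlcpy stops at the NUL and never touches the
       * next page, but a movdqu reading 16 bytes from src crosses into
       * the unmapped page and takes SIGSEGV. */
      volatile __m128i v = _mm_loadu_si128((const __m128i *)src);
      (void)v;
      return 0;
  }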

Change-Id: If90450de430ba9b7cd9282a422783beabd701f3d
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
2018-04-12 14:00:43 -07:00
Elliott Hughes
a80ddc8a34 Fix x86-64 __memset_chk.
I can only assume I was testing the 32-bit implementation when I claimed
this worked. While improving the 32-bit code I realized that I'd used
signed comparisons instead of unsigned, and came back to find that the
64-bit code didn't work.

By way of apology, make x86-64 the first architecture where __memset_chk
falls through to memset.
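
The check itself is a single unsigned comparison. A C sketch of the
semantics (bionic's version is assembly, and the function name and
failure hook here are placeholders):

  #include <stddef.h>
  #include <stdlib.h>
  #include <string.h>

  /* dst_len is the destination size the compiler could prove, or
   * (size_t)-1 when unknown. The comparison must be unsigned: read as
   * a signed value, the "unknown" sentinel is -1, and every call would
   * falsely appear to overflow. */
  void *my_memset_chk(void *dst, int c, size_t n, size_t dst_len) {
      if (n > dst_len)   /* unsigned size_t comparison */
          abort();       /* the real code reports a fortify failure */
      return memset(dst, c, n);  /* otherwise fall through to memset */
  }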

Change-Id: I54d9eee5349b6a2abb2ce81e161fdcde09556561
2016-03-03 16:46:25 -08:00
Elliott Hughes
ff9bda7201 Merge "Mandate optimized assembler for x86-64 __memset_chk." 2016-03-03 22:18:46 +00:00
Elliott Hughes
01d5b946ac Remove optimized code for bzero, which was removed from POSIX in 2008.
I'll come back for the last bcopy remnant...

Bug: http://b/26407170
Change-Id: Iabfeb95fc8a4b4b3992e3cc209ec5221040e7c26
2016-03-02 17:21:07 -08:00
Elliott Hughes
61c95fe52d Mandate optimized assembler for x86-64 __memset_chk.
Change-Id: I4d6b452f3cf850d405e8f5d7da01d432603e606b
2016-03-02 16:39:29 -08:00
Jake Weinstein
2926f9a31e libc: remove bcopy from memmove on 64-bit architectures
* bcopy is deprecated on LP64 by the following commit:

  ce9ce28e5d

Change-Id: I6849916f0ec4a2d0db9a360999ad1dc8edda952b
2015-08-17 22:06:12 +00:00
Chih-Hung Hsieh
0a93df369c Fix opcode to compile with both gcc and llvm.
Bug: 17302991

Change-Id: I31febd9ad24312388068803ce247b295bd73b607
2015-04-23 21:40:31 +00:00
Varvara Rainchik
2e7145c048 Add 64-bit slm optimized strlcpy and strlcat.
Change-Id: Ic948934d91c83bbfdfd00c05ee8b14952e012549
Signed-off-by: Varvara Rainchik <varvara.rainchik@intel.com>
2014-11-12 17:32:28 +03:00
Varvara Rainchik
fce861498c Fix for slm-tuned memmove (both 32- and 64-bit).
Introduce a memmove test that catches the fault, and fix both the 32-
and 64-bit versions of the slm-tuned memmove.

Change-Id: Ib416def2610a0972e32c3b9b6055b54967643dc3
Signed-off-by: Varvara Rainchik <varvara.rainchik@intel.com>
2014-06-05 11:08:09 -07:00
Dan Albert
ce9ce28e5d Removes bcopy and bzero from bionic.
These symbols are still defined for LP32 for binary compatibility, but
the declarations have been replaced with the POSIX-recommended #defines.
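
Those replacements are, roughly (bionic's exact spelling may differ):

  #include <string.h>

  /* Legacy BSD names mapped onto their standard equivalents. Note that
   * bcopy takes (src, dst), the reverse of memmove's argument order. */
  #define bcopy(src, dst, len) memmove((dst), (src), (len))
  #define bzero(ptr, len)      memset((ptr), 0, (len))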

Bug: 13935372
Change-Id: Ief7e6ca012db374588ba5839f11e8f3a13a20467
2014-06-03 17:22:07 -07:00
Varvara Rainchik
a020a244ae Add 64-bit Silvermont-optimized string/memory functions.
Add the following functions:
bcopy, bzero, memcpy, memmove, memset, stpcpy, stpncpy, strcat, strcpy,
strlen, strncat, strncpy, memcmp, strcmp, strncmp.
Set all of these as the defaults.

Change-Id: Ic66b250ad8c349a43d25e2d4dea075604f6df6ac
Signed-off-by: Varvara Rainchik <varvara.rainchik@intel.com>
2014-05-12 17:37:07 -07:00
Elliott Hughes
bf425680e4 Let the compiler worry about implementing ffs(3).
It does at least as good a job as our old hand-written assembly anyway.
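
For reference, the compiler-implemented version amounts to a one-line
builtin (a sketch, not bionic's exact source):

  #include <strings.h>

  /* ffs(3): one plus the index of the least significant set bit, or 0
   * for 0. gcc and clang expand the builtin inline on x86-64, so no
   * hand-written assembly is needed. */
  int ffs(int i) {
      return __builtin_ffs(i);
  }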

Change-Id: If7c4a1ac508bace0b71ee7b67808caa6eabf11d2
2013-10-24 16:29:40 -07:00
Elliott Hughes
8ca530e559 Add ffs and memcmp16 to x86_64.
Change-Id: I652c1356f1c7c52299977181c2cf154386979380
2013-10-17 17:03:22 -07:00