platform_bionic/libc/arch-x86_64/string
Elliott Hughes 7c95495ead avx2 memset: add missing vzeroupper.
Also improve a few labels. This actually improves performance slightly,
and removes the weird slowdown I was seeing from around 512 bytes in the
"before" numbers (65.9 ns at 512 bytes before, 7.12 ns after).

Before:
```
BM_string_memset/8/0            2.12 ns         2.12 ns    332002763 bytes_per_second=3.52172G/s
BM_string_memset/16/0           2.36 ns         2.36 ns    297459840 bytes_per_second=6.31618G/s
BM_string_memset/32/0           2.36 ns         2.36 ns    296996995 bytes_per_second=12.6321G/s
BM_string_memset/64/0           2.37 ns         2.36 ns    296196644 bytes_per_second=25.2097G/s
BM_string_memset/512/0          65.9 ns         65.8 ns     10609200 bytes_per_second=7.24172G/s
BM_string_memset/1024/0         69.5 ns         69.5 ns     10079176 bytes_per_second=13.7312G/s
BM_string_memset/8192/0          123 ns          123 ns      5726682 bytes_per_second=62.2494G/s
BM_string_memset/16384/0         183 ns          183 ns      3832127 bytes_per_second=83.5219G/s
BM_string_memset/32768/0         306 ns          306 ns      2292654 bytes_per_second=99.8293G/s
BM_string_memset/65536/0         570 ns          569 ns      1224926 bytes_per_second=107.185G/s
BM_string_memset/131072/0       1067 ns         1067 ns       654098 bytes_per_second=114.395G/s
```

After:
```
BM_string_memset/8/0            2.34 ns         2.34 ns    299919615 bytes_per_second=3.18993G/s
BM_string_memset/16/0           2.58 ns         2.58 ns    271170449 bytes_per_second=5.76711G/s
BM_string_memset/32/0           2.61 ns         2.61 ns    266003840 bytes_per_second=11.4245G/s
BM_string_memset/64/0           2.62 ns         2.62 ns    269191710 bytes_per_second=22.784G/s
BM_string_memset/128/0          2.84 ns         2.84 ns    244486639 bytes_per_second=41.994G/s
BM_string_memset/256/0          4.23 ns         4.23 ns    165575532 bytes_per_second=56.4047G/s
BM_string_memset/512/0          7.12 ns         7.12 ns     99398933 bytes_per_second=67.0164G/s
BM_string_memset/1024/0         10.9 ns         10.9 ns     64108888 bytes_per_second=87.2884G/s
BM_string_memset/8192/0         63.6 ns         63.6 ns     11012138 bytes_per_second=119.989G/s
BM_string_memset/16384/0         127 ns          127 ns      5506888 bytes_per_second=120.065G/s
BM_string_memset/32768/0         252 ns          251 ns      2783524 bytes_per_second=121.346G/s
BM_string_memset/65536/0         515 ns          515 ns      1357500 bytes_per_second=118.587G/s
BM_string_memset/131072/0       1013 ns         1012 ns       691605 bytes_per_second=120.587G/s
```
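
The `bytes_per_second` column can be cross-checked against the time column. The sketch below assumes the `G` suffix means GiB (2^30 bytes), which is what the arithmetic on the rows above suggests. The helper name `gib_per_second` is hypothetical and is not part of the benchmark harness.

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert a benchmark row (bytes written per
 * iteration, nanoseconds per iteration) into GiB/s.
 * Assumption: the "G" in the benchmark output is 2^30 bytes.
 */
double gib_per_second(size_t bytes, double ns_per_iteration) {
    double seconds = ns_per_iteration * 1e-9;
    double bytes_per_second = (double)bytes / seconds;
    return bytes_per_second / (1024.0 * 1024.0 * 1024.0);
}
```

For the 512-byte rows this gives roughly 7.2 GiB/s for the "before" time of 65.8 ns and roughly 67 GiB/s for the "after" time of 7.12 ns, in line with the reported 7.24172G/s and 67.0164G/s. The small differences come from the rounded times in the printed output.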

Bug: http://b/292281479
Test: treehugger
Change-Id: I45bfffedbdf0ec55a1b1341ffbab0af6d240d3a3
2023-07-27 14:23:17 -07:00
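
As general background on the commit title: code that uses 256-bit AVX registers usually executes `vzeroupper` before returning. On many Intel cores, leaving the upper halves of the YMM registers dirty can make legacy SSE code that runs afterwards slower. The following is a minimal C sketch of that pattern using intrinsics. It is not the code in `avx2-memset-kbl.S`; `fill_avx2` and `sketch_memset` are hypothetical names, and the sketch falls back to the C library's `memset` when AVX2 is unavailable.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#define SKETCH_HAVE_AVX2 1
#include <immintrin.h>

/* Hypothetical helper: fill n bytes using unaligned 32-byte AVX2 stores. */
__attribute__((target("avx2")))
static void fill_avx2(unsigned char *dst, int c, size_t n) {
    __m256i v = _mm256_set1_epi8((char)c);
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }
    /* Clear the upper halves of the YMM registers before any code that
     * may use legacy SSE instructions runs. This is the step the commit
     * title calls "missing". */
    _mm256_zeroupper();
    /* Byte-wise tail for the last n % 32 bytes. */
    for (; i < n; i++) {
        dst[i] = (unsigned char)c;
    }
}
#endif

/* Hypothetical wrapper: use the AVX2 path only when the CPU supports it. */
void *sketch_memset(void *dst, int c, size_t n) {
#ifdef SKETCH_HAVE_AVX2
    if (__builtin_cpu_supports("avx2")) {
        fill_avx2((unsigned char *)dst, c, n);
        return dst;
    }
#endif
    return memset(dst, c, n);
}
```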

| File | Last commit | Date |
| --- | --- | --- |
| avx2-memset-kbl.S | avx2 memset: add missing vzeroupper. | 2023-07-27 14:23:17 -07:00 |
| cache.h | Optimized L2 Cache value for Intel(R) Core Architectures. | 2019-11-12 15:58:34 +00:00 |
| sse2-memmove-slm.S | Make memcpy memmove | 2018-06-11 18:12:45 +00:00 |
| sse2-memset-slm.S | avx2 implementation for memset. | 2022-07-22 21:48:50 +00:00 |
| sse2-stpcpy-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse2-stpncpy-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse2-strcat-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse2-strcpy-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse2-strlen-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse2-strncat-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse2-strncpy-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| sse4-memcmp-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |
| ssse3-strcmp-slm.S | Fix opcode to compile with both gcc and llvm. | 2015-04-23 21:40:31 +00:00 |
| ssse3-strncmp-slm.S | Add 64-bit Silvermont-optimized string/memory functions. | 2014-05-12 17:37:07 -07:00 |