tequilaOS/platform_bionic

Commit graph

Author	SHA1	Message	Date

Elliott Hughes

1c8a2a99a7

Optimize tolower(3)/toupper(3) from <ctype.h>.

The tables in the BSD tolower/toupper are slower for ASCII than just
doing the bit twiddling.
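
For illustration only, a minimal sketch of what table-free case conversion
can look like. This is not the bionic source; the name `sketch_tolower` is
hypothetical. It relies on the fact that ASCII upper- and lowercase letters
differ only in bit 0x20.

```c
/* Hypothetical table-free tolower: no lookup table, just a range check
 * and one bit operation. Values outside 'A'..'Z' (including EOF) are
 * returned unchanged. */
static int sketch_tolower(int c) {
  if (c >= 'A' && c <= 'Z') {
    return c | 0x20;  /* set bit 0x20: 'A' (0x41) becomes 'a' (0x61) */
  }
  return c;
}
```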

We can't actually remove the tables on LP32, so move them into the
"cruft" we keep around for backwards compatibility (but remove them for
LP64 where they were never exposed).

I noticed that the new bit-twiddling tolower(3) was performing better
on arm64 than toupper(3). The 0xdf constant was requiring an extra MOV,
and there isn't a BIC that takes an immediate value. Since we've already
done the comparison to check that we're in the right range (where the
bit is always set), though, we can EOR 0x20 to get the same result as
the missing BIC 0x20 in just one instruction.
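
The equivalence can be sketched in C as follows. Both function names are
hypothetical, and the instructions a compiler actually emits depend on the
compiler and its flags. The point is only that, once the range check has
passed, clearing bit 0x20 and flipping bit 0x20 give the same value.

```c
/* Variant 1: clear bit 0x20 with an AND mask (the 0xdf constant). */
static int sketch_toupper_and(int c) {
  if (c >= 'a' && c <= 'z') {
    return c & 0xdf;
  }
  return c;
}

/* Variant 2: flip bit 0x20 with XOR. Inside 'a'..'z' the bit is always
 * set, so flipping it is the same as clearing it. */
static int sketch_toupper_xor(int c) {
  if (c >= 'a' && c <= 'z') {
    return c ^ 0x20;
  }
  return c;
}
```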

I've applied that same optimization to towupper(3) too.

Before:

  BM_ctype_tolower_n                 3.30 ns         3.30 ns    212353035
  BM_ctype_tolower_y                 3.31 ns         3.30 ns    211234204
  BM_ctype_toupper_n                 3.30 ns         3.29 ns    214161246
  BM_ctype_toupper_y                 3.29 ns         3.28 ns    207643473

  BM_wctype_towupper_ascii_n         3.53 ns         3.53 ns    195944444
  BM_wctype_towupper_ascii_y         3.48 ns         3.48 ns    199233248

After:

  BM_ctype_tolower_n                 2.93 ns         2.92 ns    242373703
  BM_ctype_tolower_y                 2.88 ns         2.87 ns    245365309
  BM_ctype_toupper_n                 2.93 ns         2.93 ns    243049353
  BM_ctype_toupper_y                 2.89 ns         2.89 ns    245072521

  BM_wctype_towupper_ascii_n         3.34 ns         3.33 ns    212951912
  BM_wctype_towupper_ascii_y         3.29 ns         3.29 ns    214651254

(Why do both the "y" and "n" variants speed up with the EOR
change? Because the compiler transforms the code so that we
unconditionally do the bit twiddling and then use CSEL to decide whether
or not to actually use the result.)
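
Written out by hand, the shape of that transformation looks roughly like the
sketch below. The name `sketch_toupper_select` is hypothetical, and whether
a given compiler turns the final conditional into a CSEL is an assumption
that depends on the toolchain and optimization level.

```c
/* Compute the flipped value unconditionally, then select between it and
 * the original. Both the "y" and "n" inputs execute the XOR; only the
 * final selection differs. */
static int sketch_toupper_select(int c) {
  int flipped = c ^ 0x20;                /* always computed */
  int in_range = (c >= 'a' && c <= 'z'); /* the range comparison */
  return in_range ? flipped : c;         /* a candidate for CSEL on arm64 */
}
```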

We also save 1028 bytes of data in the LP64 libc.so.
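
One way to arrive at that figure, assuming each of the two tables is a
BSD-style array of 1 + 256 `short` entries (one slot for EOF plus one per
byte value) and that `short` is 2 bytes. The names below are hypothetical
and the layout is an assumption, not something stated in this message.

```c
#include <stddef.h>

/* Assumed layout: one EOF slot plus 256 byte values per table. */
enum { SKETCH_TABLE_ENTRIES = 1 + 256 };

/* Two tables (tolower and toupper) of 2-byte entries:
 * 2 * 257 * 2 = 1028 bytes. */
static size_t sketch_saved_bytes(void) {
  return 2 * SKETCH_TABLE_ENTRIES * sizeof(short);
}
```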

Test: ran the bionic benchmarks and tests
Change-Id: I7829339f8cb89a58efe539c2a01c51807413aa2d

2019-09-27 14:42:39 -07:00

1 commit