a4959aa6f8
Following on from the towlower()/towupper() changes, add benchmarks for most of <ctype.h>, rewrite the tests to cover the entire defined range for all of these functions, and then reimplement most of the functions.

The old table-based implementation is mostly a bad idea on modern hardware, with only ispunct() showing a significant benefit compared to any other way I could think of writing it, and isalnum() a marginal but still convincingly genuine benefit.

My new benchmarks make an effort to test an example from each relevant range of characters to avoid, say, accidentally optimizing the behavior of `isalnum('0')` at the expense of `isalnum('z')`.

Interestingly, clang is able to generate what I believe to be the optimal implementations from the most readable code, which is impressive. It certainly matched or beat all my attempts to be clever!

The BSD table-based implementations made a special case of EOF despite having a `_ctype_` table that's offset by 1 to include EOF at index 0. I'm not sure why they didn't take advantage of that, but removing the explicit check for EOF measurably improves the generated code on arm and arm64, so even the two functions that still use the table benefit from this rewrite.
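To make the two styles discussed above concrete, here is a minimal sketch in C. It is not the actual bionic source: every `sketch_*` name and the `SKETCH_PUNCT` flag bit are hypothetical, and the block assumes `EOF` is -1. The first half shows the plain range-comparison style; the second half shows a lookup into a table offset by 1, so that `EOF` indexes slot 0 and needs no explicit check.

```c
#include <assert.h> /* static_assert */
#include <stdio.h>  /* EOF */

/* Hypothetical sketch_* names; this is not the actual bionic source. */

/* Plain range comparisons: the "most readable" style. */
static inline int sketch_isdigit(int c) { return c >= '0' && c <= '9'; }
static inline int sketch_islower(int c) { return c >= 'a' && c <= 'z'; }
static inline int sketch_isupper(int c) { return c >= 'A' && c <= 'Z'; }
static inline int sketch_isalpha(int c) { return sketch_islower(c) || sketch_isupper(c); }
static inline int sketch_isalnum(int c) { return sketch_isalpha(c) || sketch_isdigit(c); }
static inline int sketch_isgraph(int c) { return c >= '!' && c <= '~'; }
static inline int sketch_isxdigit(int c) {
  return sketch_isdigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

/* Table lookup, with the table offset by 1 so that EOF lands on slot 0. */
static_assert(EOF == -1, "this sketch assumes EOF is -1");
enum { SKETCH_PUNCT = 0x10 };               /* hypothetical flag bit */
static unsigned char sketch_table[1 + 256]; /* slot 0 is EOF and stays 0 */

/* Fill in the punctuation flag for the ASCII range. */
static void sketch_table_init(void) {
  for (int c = 0; c < 128; c++) {
    if (sketch_isgraph(c) && !sketch_isalnum(c)) sketch_table[c + 1] |= SKETCH_PUNCT;
  }
}

/* Valid for c in [EOF, 255]; no separate EOF branch is needed. */
static inline int sketch_ispunct(int c) { return sketch_table[c + 1] & SKETCH_PUNCT; }
```

`sketch_table_init()` has to run once before `sketch_ispunct()` is used; a real libc would presumably ship a precomputed constant table instead.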
Here are the benchmark results:

arm64 before:

    BM_ctype_isalnum_n  3.73 ns  3.73 ns  183727137
    BM_ctype_isalnum_y1  3.82 ns  3.81 ns  186383058
    BM_ctype_isalnum_y2  3.73 ns  3.72 ns  187809830
    BM_ctype_isalnum_y3  3.78 ns  3.77 ns  181383055
    BM_ctype_isalpha_n  3.75 ns  3.75 ns  189453927
    BM_ctype_isalpha_y1  3.76 ns  3.75 ns  184854043
    BM_ctype_isalpha_y2  4.32 ns  3.78 ns  186326931
    BM_ctype_isascii_n  2.49 ns  2.48 ns  275583822
    BM_ctype_isascii_y  2.51 ns  2.51 ns  282123915
    BM_ctype_isblank_n  3.11 ns  3.10 ns  220472044
    BM_ctype_isblank_y1  3.20 ns  3.19 ns  226088868
    BM_ctype_isblank_y2  3.11 ns  3.11 ns  220809122
    BM_ctype_iscntrl_n  3.79 ns  3.78 ns  188719938
    BM_ctype_iscntrl_y1  3.72 ns  3.71 ns  186209237
    BM_ctype_iscntrl_y2  3.80 ns  3.80 ns  184315749
    BM_ctype_isdigit_n  3.76 ns  3.74 ns  188334682
    BM_ctype_isdigit_y  3.78 ns  3.77 ns  186249335
    BM_ctype_isgraph_n  3.99 ns  3.98 ns  177814143
    BM_ctype_isgraph_y1  3.98 ns  3.95 ns  175140090
    BM_ctype_isgraph_y2  4.01 ns  4.00 ns  178320453
    BM_ctype_isgraph_y3  3.96 ns  3.95 ns  175412814
    BM_ctype_isgraph_y4  4.01 ns  4.00 ns  175711174
    BM_ctype_islower_n  3.75 ns  3.74 ns  188604818
    BM_ctype_islower_y  3.79 ns  3.78 ns  154738238
    BM_ctype_isprint_n  3.96 ns  3.95 ns  177607734
    BM_ctype_isprint_y1  3.94 ns  3.93 ns  174877244
    BM_ctype_isprint_y2  4.02 ns  4.01 ns  178206135
    BM_ctype_isprint_y3  3.94 ns  3.93 ns  175959069
    BM_ctype_isprint_y4  4.03 ns  4.02 ns  176158314
    BM_ctype_isprint_y5  3.95 ns  3.94 ns  178745462
    BM_ctype_ispunct_n  3.78 ns  3.77 ns  184727184
    BM_ctype_ispunct_y  3.76 ns  3.75 ns  187947503
    BM_ctype_isspace_n  3.74 ns  3.74 ns  185300285
    BM_ctype_isspace_y1  3.77 ns  3.76 ns  187202066
    BM_ctype_isspace_y2  3.73 ns  3.73 ns  184105959
    BM_ctype_isupper_n  3.81 ns  3.80 ns  185038761
    BM_ctype_isupper_y  3.71 ns  3.71 ns  185885793
    BM_ctype_isxdigit_n  3.79 ns  3.79 ns  184965673
    BM_ctype_isxdigit_y1  3.76 ns  3.75 ns  188251672
    BM_ctype_isxdigit_y2  3.79 ns  3.78 ns  184187481
    BM_ctype_isxdigit_y3  3.77 ns  3.76 ns  187635540

arm64 after:

    BM_ctype_isalnum_n  3.37 ns  3.37 ns  205613810
    BM_ctype_isalnum_y1  3.40 ns  3.39 ns  204806361
    BM_ctype_isalnum_y2  3.43 ns  3.43 ns  205066077
    BM_ctype_isalnum_y3  3.50 ns  3.50 ns  200057128
    BM_ctype_isalpha_n  2.97 ns  2.97 ns  236084076
    BM_ctype_isalpha_y1  2.97 ns  2.97 ns  236083626
    BM_ctype_isalpha_y2  2.97 ns  2.97 ns  236084246
    BM_ctype_isascii_n  2.55 ns  2.55 ns  272879994
    BM_ctype_isascii_y  2.46 ns  2.45 ns  286522323
    BM_ctype_isblank_n  3.18 ns  3.18 ns  220431175
    BM_ctype_isblank_y1  3.18 ns  3.18 ns  220345602
    BM_ctype_isblank_y2  3.18 ns  3.18 ns  220308509
    BM_ctype_iscntrl_n  3.10 ns  3.10 ns  220344270
    BM_ctype_iscntrl_y1  3.10 ns  3.07 ns  228973615
    BM_ctype_iscntrl_y2  3.07 ns  3.07 ns  229192626
    BM_ctype_isdigit_n  3.07 ns  3.07 ns  228925676
    BM_ctype_isdigit_y  3.07 ns  3.07 ns  229182934
    BM_ctype_isgraph_n  2.66 ns  2.66 ns  264268737
    BM_ctype_isgraph_y1  2.66 ns  2.66 ns  264445277
    BM_ctype_isgraph_y2  2.66 ns  2.66 ns  264327427
    BM_ctype_isgraph_y3  2.66 ns  2.66 ns  264427480
    BM_ctype_isgraph_y4  2.66 ns  2.66 ns  264155250
    BM_ctype_islower_n  2.66 ns  2.66 ns  264421600
    BM_ctype_islower_y  2.66 ns  2.66 ns  264341148
    BM_ctype_isprint_n  2.66 ns  2.66 ns  264415198
    BM_ctype_isprint_y1  2.66 ns  2.66 ns  264268793
    BM_ctype_isprint_y2  2.66 ns  2.66 ns  264419205
    BM_ctype_isprint_y3  2.66 ns  2.66 ns  264205886
    BM_ctype_isprint_y4  2.66 ns  2.66 ns  264440797
    BM_ctype_isprint_y5  2.72 ns  2.72 ns  264333293
    BM_ctype_ispunct_n  3.52 ns  3.51 ns  198956572
    BM_ctype_ispunct_y  3.38 ns  3.38 ns  201661792
    BM_ctype_isspace_n  3.39 ns  3.39 ns  206896620
    BM_ctype_isspace_y1  3.39 ns  3.39 ns  206569020
    BM_ctype_isspace_y2  3.39 ns  3.39 ns  206564415
    BM_ctype_isupper_n  2.76 ns  2.75 ns  254227134
    BM_ctype_isupper_y  2.76 ns  2.75 ns  254235314
    BM_ctype_isxdigit_n  3.60 ns  3.60 ns  194418653
    BM_ctype_isxdigit_y1  2.97 ns  2.97 ns  236082424
    BM_ctype_isxdigit_y2  3.48 ns  3.48 ns  200390011
    BM_ctype_isxdigit_y3  3.48 ns  3.48 ns  202255815

arm32 before:

    BM_ctype_isalnum_n  4.77 ns  4.76 ns  129230464
    BM_ctype_isalnum_y1  4.88 ns  4.87 ns  147939321
    BM_ctype_isalnum_y2  4.74 ns  4.73 ns  145508054
    BM_ctype_isalnum_y3  4.81 ns  4.80 ns  144968914
    BM_ctype_isalpha_n  4.80 ns  4.79 ns  148262579
    BM_ctype_isalpha_y1  4.74 ns  4.73 ns  145061326
    BM_ctype_isalpha_y2  4.83 ns  4.82 ns  147642546
    BM_ctype_isascii_n  3.74 ns  3.72 ns  186711139
    BM_ctype_isascii_y  3.79 ns  3.78 ns  183654780
    BM_ctype_isblank_n  4.20 ns  4.19 ns  169733252
    BM_ctype_isblank_y1  4.19 ns  4.18 ns  165713363
    BM_ctype_isblank_y2  4.22 ns  4.21 ns  168776265
    BM_ctype_iscntrl_n  4.75 ns  4.74 ns  145417484
    BM_ctype_iscntrl_y1  4.82 ns  4.81 ns  146283250
    BM_ctype_iscntrl_y2  4.79 ns  4.78 ns  148662453
    BM_ctype_isdigit_n  4.77 ns  4.76 ns  145789210
    BM_ctype_isdigit_y  4.84 ns  4.84 ns  146909458
    BM_ctype_isgraph_n  4.72 ns  4.71 ns  145874663
    BM_ctype_isgraph_y1  4.86 ns  4.85 ns  142037606
    BM_ctype_isgraph_y2  4.79 ns  4.78 ns  145109612
    BM_ctype_isgraph_y3  4.75 ns  4.75 ns  144829039
    BM_ctype_isgraph_y4  4.86 ns  4.85 ns  146769899
    BM_ctype_islower_n  4.76 ns  4.75 ns  147537637
    BM_ctype_islower_y  4.79 ns  4.78 ns  145648017
    BM_ctype_isprint_n  4.82 ns  4.81 ns  147154780
    BM_ctype_isprint_y1  4.76 ns  4.76 ns  145117604
    BM_ctype_isprint_y2  4.87 ns  4.86 ns  145801406
    BM_ctype_isprint_y3  4.79 ns  4.78 ns  148043446
    BM_ctype_isprint_y4  4.77 ns  4.76 ns  145157619
    BM_ctype_isprint_y5  4.91 ns  4.90 ns  147810800
    BM_ctype_ispunct_n  4.74 ns  4.73 ns  145588611
    BM_ctype_ispunct_y  4.82 ns  4.81 ns  144065436
    BM_ctype_isspace_n  4.78 ns  4.77 ns  147153712
    BM_ctype_isspace_y1  4.73 ns  4.72 ns  145252863
    BM_ctype_isspace_y2  4.84 ns  4.83 ns  148615797
    BM_ctype_isupper_n  4.75 ns  4.74 ns  148276631
    BM_ctype_isupper_y  4.80 ns  4.79 ns  145529893
    BM_ctype_isxdigit_n  4.78 ns  4.77 ns  147271646
    BM_ctype_isxdigit_y1  4.74 ns  4.74 ns  145142209
    BM_ctype_isxdigit_y2  4.83 ns  4.82 ns  146398497
    BM_ctype_isxdigit_y3  4.78 ns  4.77 ns  147617686

arm32 after:

    BM_ctype_isalnum_n  4.35 ns  4.35 ns  161086146
    BM_ctype_isalnum_y1  4.36 ns  4.35 ns  160961111
    BM_ctype_isalnum_y2  4.36 ns  4.36 ns  160733210
    BM_ctype_isalnum_y3  4.35 ns  4.35 ns  160897524
    BM_ctype_isalpha_n  3.67 ns  3.67 ns  189377208
    BM_ctype_isalpha_y1  3.68 ns  3.67 ns  189438146
    BM_ctype_isalpha_y2  3.75 ns  3.69 ns  190971186
    BM_ctype_isascii_n  3.69 ns  3.68 ns  191029191
    BM_ctype_isascii_y  3.68 ns  3.68 ns  191011817
    BM_ctype_isblank_n  4.09 ns  4.09 ns  171887541
    BM_ctype_isblank_y1  4.09 ns  4.09 ns  171829345
    BM_ctype_isblank_y2  4.08 ns  4.07 ns  170585590
    BM_ctype_iscntrl_n  4.08 ns  4.07 ns  170614383
    BM_ctype_iscntrl_y1  4.13 ns  4.11 ns  171495899
    BM_ctype_iscntrl_y2  4.19 ns  4.18 ns  165255578
    BM_ctype_isdigit_n  4.25 ns  4.24 ns  165237008
    BM_ctype_isdigit_y  4.24 ns  4.24 ns  165256149
    BM_ctype_isgraph_n  3.82 ns  3.81 ns  183610114
    BM_ctype_isgraph_y1  3.82 ns  3.81 ns  183614131
    BM_ctype_isgraph_y2  3.82 ns  3.81 ns  183616840
    BM_ctype_isgraph_y3  3.79 ns  3.79 ns  183620182
    BM_ctype_isgraph_y4  3.82 ns  3.81 ns  185740009
    BM_ctype_islower_n  3.75 ns  3.74 ns  183619502
    BM_ctype_islower_y  3.68 ns  3.68 ns  190999901
    BM_ctype_isprint_n  3.69 ns  3.68 ns  190899544
    BM_ctype_isprint_y1  3.68 ns  3.67 ns  190192384
    BM_ctype_isprint_y2  3.67 ns  3.67 ns  189351466
    BM_ctype_isprint_y3  3.67 ns  3.67 ns  189430348
    BM_ctype_isprint_y4  3.68 ns  3.68 ns  189430161
    BM_ctype_isprint_y5  3.69 ns  3.68 ns  190962419
    BM_ctype_ispunct_n  4.14 ns  4.14 ns  171034861
    BM_ctype_ispunct_y  4.19 ns  4.19 ns  168308152
    BM_ctype_isspace_n  4.50 ns  4.50 ns  156250887
    BM_ctype_isspace_y1  4.48 ns  4.48 ns  155124476
    BM_ctype_isspace_y2  4.50 ns  4.50 ns  155077504
    BM_ctype_isupper_n  3.68 ns  3.68 ns  191020583
    BM_ctype_isupper_y  3.68 ns  3.68 ns  191015669
    BM_ctype_isxdigit_n  4.50 ns  4.50 ns  156276745
    BM_ctype_isxdigit_y1  3.28 ns  3.27 ns  214729725
    BM_ctype_isxdigit_y2  4.48 ns  4.48 ns  155265129
    BM_ctype_isxdigit_y3  4.48 ns  4.48 ns  155216846

I've also corrected a small mistake in the documentation for isxdigit().

Test: tests and benchmarks
Change-Id: I4a77859f826c3fc8f0e327e847886882f29ec4a3

- android
- lib/libc
- README.md

This directory contains upstream OpenBSD source. You should not edit these files directly. Make fixes upstream and then pull down the new version of the file.
TODO: write a script to automate this process.