The property_benchmark needs to use the `load_default_path` parameter in
when writing its test, which is necessary now. Also, the benchmark
seemed to retry infinitely when the initialization failed, so it has now
been modified to exit when the initialization fails, and send a message
as it does so.
Bug: 305838406
Test: adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_property
Change-Id: Idb235b0e7c119fa08ace5b814d7f7404dd0eaeb3
The bionic benchmarks set the decay time in various ways, but
don't necessarily restore it properly. Add a new method for
getting the current decay time and then a way to restore it.
Right now the assumption is that the decay time defaults to zero,
but in the near future that assumption might be incorrect. Therefore
using this method will future proof the code.
Bug: 302212507
Test: Unit tests pass for both static and dynamic executables.
Test: Ran bionic benchmarks that were modified.
Change-Id: Ia77ff9ffee3081c5c1c02cb4309880f33b284e82
Add benchmarks to monitor performance of the syscalls which are frequently
used in Android. These benchmarks will help Android kernel team to catch
syscall performance regressions and flagging them early. They can also
be used to compare different kernel versions.
Add mmap/munmap syscall benchmarks as a starting point. mmap and munmap
are among most frequently used syscall in Andoird. Specific parameters
were also chosen based on the frequency of their use in Android.
Bug: 283058897
Test: local build and run
Change-Id: If8e53305174532dd698706ccd20e4b800d8720d7
err() not only adds its own newline, it'll add the strerror() output
first, so this stray newline messes up the error message. (Seen running
bionic benchmarks not as root, which causes the std_map benchmarks to
fail, which is especially annoying because it ends the whole benchmark
run, not just the specific benchmark, but that's a less obvious fix, so
I'm just fixing the trivial part here!)
Test: treehugger
Change-Id: I7e87ea4385c410942fbdd14e6fdfddbf7320082a
This is used to monitor the impact of different lock granularity in a
memory allocator. It creates different memory alloc/dealloc patterns
across different threads but keep the same amount of bytes to be
processed.
Bug: 288126442
Test: run benchmark with --benchmark_filter=BM_malloc_threads_throughput*
Change-Id: I24eea617a6346480524dcb8c0bdbe9bd8e90dd72
The lseek64, lstat64, pread64 and pwrite64 symbols have been removed
from musl 1.2.4.
Bug: 286415000
Test: builds
Change-Id: Icba9e92e1ccbb5b52d6c60e554ab411c9e5c1954
Currently, if a test is created like this:
BIONIC_BENCHMARK_WITH_ARG(BM_bench, "16");
Everything works as expected, a benchmark is created of BM_bench/16.
However, it is not possible to specify a benchmark should be called with
one argument, but iterate over different values. The example:
BIONIC_BENCHMARK_WITH_ARG(BM_bench, "16 32");
Creates a single benchmark run with two arguments:
BM_bench/16/32
This change modifies the algorithm to make it possible to create multiple
instances of the benchmark iterating over each argument as a single
argument. After this change, two benchmarks are executed:
BM_bench/16
BM_bench/32
To do the previous behavior, use:
BIONIC_BENCHMARK_WITH_ARG(BM_bench, "16/32");
This will create a single benchmark with two args. This format does
not support spaces in the args, so "16 / 32" is not valid.
Modified the test_small.xml to use the new format.
Test: All unit tests pass.
Change-Id: I6f486e1d4a90580c3dace0581ea65f439911ef5a
Add benchmark file dependency in tests instead of using hard-coded
path to bionic-benchmarks.
In addition, add a TEST_MAPPING file so the tests run when benchmark files change.
Test: All unit tests pass.
Test: Ran atest bionic-benchmarks-tests.
Change-Id: I95608f5b5e75d9d74930960a2431c9896b621ce8
This is a new mallopt option that will force purge absolutely
everything no matter how long it takes to purge.
Wrote a unit test for the new mallopt, and added a test to help
verify that new mallopt parameters do not conflict with each other.
Modified some benchmarks to use this new parameter so that we can
get better RSS data.
Added a new M_PURGE_ALL benchmark.
Bug: 243851006
Test: All unit tests pass.
Test: Ran changed benchmarks.
Change-Id: I1b46a5e6253538108e052d11ee46fd513568adec
The behavior of this benchmark includes three steps:
1. Use up to 16 MB by allocating blocks with given size in each thread.
2. Release the all blocks in random order.
3. Use up to 1.6 MB by allocating blocks with given size in each thread.
This is used to see how the allocator manages the free blocks and we can
measure the impact of randomization property used by the allocator.
Test: Run malloc-rss-benchmark $NUM_THREADS $ALLOC_SIZE
Change-Id: Ib68562996905839ee4367b1b059714e2325ca03e
Use /data/local/tests/unrestricted as on-device test path so that
debuggerd can generate proper back traces.
Bug: 199904562
Bug: 184739644
Test: treehugger
Change-Id: I4ae1fb3182dde081121179f1b099374119205dd3
For convenience, builds against musl libc currently use the
linux_glibc properties because they are almost always linux-specific
and not glibc-specific. In preparation for removing this hack,
tweak the linux_glibc properties by either moving them to host_linux,
which will apply to linux_glibc, linux_musl and linux_bionic, or
by setting appropriate musl or linux_musl properties. Properties
that must not be repeated while musl uses linux_musl and also still
uses the linux_glibc properties are moved to glibc properties, which
don't apply to musl. Whether these stay as glibc properties or get
moved back to linux_glibc later once the musl hack is removed is TBD.
Bug: 223257095
Test: m checkbuild
Test: m USE_HOST_MUSL=true host-native
Change-Id: I809bf1ba783dff02f6491d87fbdc9fa7fc0975b0
Fix references to symbols that don't exist in musl in the bionic
benchmarks, and disable the header tests for musl.
Bug: 190084016
Test: m USE_HOST_MUSL=true host-native
Change-Id: I6b1964afa4a7b6e6a4812e9f2605fcfc2fae9691
A decent chunk of the logcat profile is spent formatting the timestamps
for each line, and most of that time was going to snprintf(3). We should
find all the places that could benefit from a lighter-weight "format an
integer" and share something between them, but this is easy for now.
Before:
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_time_strftime 781 ns 775 ns 893102
After:
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_time_strftime 149 ns 147 ns 4750782
Much of the remaining time is in tzset() which seems unfortunate.
Test: treehugger
Change-Id: Ie0f7ee462ff1b1abea6f87d4a9a996d768e51056
I don't think the old benchmarks made any sense; going through all the
characters might have made sense as a unit test, but I'm not sure how
actionable they were for realistic cases. In particular, "all ASCII" is
a common special case that's worth measuring separately. I'm still not
hugely convinced, but at least separating the "ASCII" and "wide" paths
is an improvement. I can't think of a case where we did optimization
work on this kind of code without considering those two paths
separately.
I've added to the single-character benchmarks by splitting out the
separate cases instead --- one benchmark each for single-byte up to
4-byte characters.
Bug: http://b/206523398
Test: treehugger
Change-Id: I58cbfedb4b497a55580857eff307a024938cf006
Auto-generate NOTICE files for all the directories, and for each one
individually rather than mixing libc and libm together.
Test: N/A
Change-Id: I7e251194a8805c4ca78fcc5675c3321bcd5abf0a
This makes it easy to benchmark changes to bionic without needing
to reflash the device or mess with LD_LIBRARY_PATH.
Change-Id: Ic7ea0f075751f8f077612617802775d2d0a799dc
A lot of these benchmarks predate DoNotOptimize and rolled their own
hacks.
Bug: http://b/148307629
Test: ran benchmarks before & after and got similar results
Change-Id: If44699d261b687f6253af709edda58f4c90fb285
The benchmark calls __cxa_atexit 100000 times, then either exits early
without calling the destructors (_Exit) or exits normally, calling them
(exit).
Test: mmma bionic/benchmarks/spawn
Change-Id: I41fe702d6ef11acb7a1dec95bf546b5dc693bd4a
One turns out not to be used at all, and the pylintrc even uses the more
intention-revealing term in the machine readable part, just not the
comment!
Test: treehugger
Change-Id: I4db7f1cf4fa1aa8ee601857e4e4c400e2119887c
This benchmark also includes the measuring of RSS.
Bug: 137795072
Test: Ran different iterations and verified the RSS_MB value is nearly
Test: identical no matter the number of iterations.
Change-Id: I465a0eae9dcff2e1fb739c35623a26291680951f
Per Wilco Dijkstra at ARM, 16 and 32 byte copies are much more common
than 8 or 64.
Change-Id: I3699d8bcd5f9dd8a8ccd8564a6cf58d2bd7089f5
Suggested-By: Wilco Dijkstra <wilco.dijkstra@arm.com>
Add a calloc benchmark to make sure that a native allocator isn't
doing anything incorrectly when zero'ing memory.
Also add a fork call benchmark to verify that the time to make a
fork call isn't increasing.
Test: Ran benchmarks on walleye and verified that the numbers are not
Test: too variable between runs.
Change-Id: I61d289d277f85ac432a315e539cf6391ea036866
Modules with a libbase dependency also need a liblog dependency now.
Fixes the linker-reloc-bench build target.
Bug: b/147779981
Test: manual
Change-Id: I41dd35717b665524a26a92a0c268e42c93a383b7
The benchmark creates a set of DSOs that mimic the work involved in
loading the current version of libandroid_servers.so. The synthetic
benchmark has roughly the same number of libraries with roughly the same
relocations.
Currently, on a local aosp_walleye build that includes recent performance
improvements (including the Neon-based CL
I3983bca1dddc9241bb70290ad3651d895f046660), using the "performance"
governor, the benchmark reports these scores:
$ adb shell taskset 10 \
/data/benchmarktest64/linker-reloc-bench/linker-reloc-bench \
--benchmark_repetitions=20 --benchmark_display_aggregates_only=true
...
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BM_linker_relocation/real_time_mean 70048 us 465 us 20
BM_linker_relocation/real_time_median 70091 us 466 us 20
BM_linker_relocation/real_time_stddev 329 us 8.29 us 20
$ adb shell taskset 10 \
/data/benchmarktest/linker-reloc-bench/linker-reloc-bench \
--benchmark_repetitions=20 --benchmark_display_aggregates_only=true
...
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BM_linker_relocation/real_time_mean 83051 us 462 us 20
BM_linker_relocation/real_time_median 83069 us 464 us 20
BM_linker_relocation/real_time_stddev 184 us 8.91 us 20
Test: manual
Bug: none
Change-Id: I6dac66978f8666f95c76387093bda6be0151bfce
Update the native allocator documentation to include running of this
benchmark.
Move the malloc_benchmark.cpp to malloc_sql_benchmark.cpp and use
malloc_benchmark.cpp for benchmarking functions from malloc.h.
Bug: 137795072
Test: Ran new benchmark.
Change-Id: I76856de833032da324ad0bc0b6bd85a4ea8c253d
Right now, when we read a system property, we first (assuming we've
already looked up the property's prop_info) read the property's serial
number; if we find that the low bit (the dirty bit) in the serial
number is set, we futex-wait for that serial number to become
non-dirty. By doing so, we spare readers from seeing partially-updated
property values if they race with the property service's non-atomic
memcpy to the property value slot. (The futex-wait here isn't
essential to the algorithm: spinning while dirty would suffice,
although it'd be somewhat less efficient.)
The problem with this approach is that readers can wait on the
property service process, potentially causing delays due to scheduling
variance. Property reads are not guaranteed to complete in finite time
right now.
This change makes property reads wait-free and ensures that they
complete in finite time in all cases. In the new approach, we prevent
value tearing by backing up each property we're about to modify and
directing readers to the backup copy if they try to read a property
with the dirty bit set.
(The wait freedom is limited to the case of readers racing against
*one* property update. A writer can still delay readers by rapidly
updating a property --- but after this change, readers can't hang due
to PID 1 scheduling delays.)
I considered adding explicit atomic access to short property values,
but between binary compatibility with the existing property database
and the need to carefully handle transitions of property values
between "short" (compatible with atomics) and "long" (incompatible
with atomics) length domains, I figured the complexity wasn't worth it
and that making property reads wait-free would be adequate.
Test: boots
Bug: 143561649
Change-Id: Ifd3108aedba5a4b157b66af6ca0a4ed084bd5982