This CL improves the performance of below functions in helping with conversion
between utf8/utf16 with libutils:
- utf8_to_utf16_length
- utf8_to_utf16
- utf16_to_utf8_length
- utf16_to_utf
The basic idea is to keep the loop as tight as possible for the most
common cases, e.g. in UTF16-->UTF8 case, the most common case is
when the character is < 0x80 (ASCII), next is when it's < 0x0800 (
most Latin), and so on.
This version of implementation reduces the number of instructions
needed for every incoming utf-8 bytes in the original implementation
where:
1) calculating how many bytes needed given a leading UTF-8 byte
in utf8_codepoint_len(), it's a very clever way but involves
multiple instructions to calculate regardless
2) and an intermediate conversion to utf32, and then to utf16
utf8_to_utf32_codepoint()
The end result is about ~1.5x throughput improvement.
Benchmark results on redfin (64bit) before the change:
utf8_to_utf16_length: bytes_per_second=307.556M/s
utf8_to_utf16: bytes_per_second=246.664M/s
utf16_to_utf8_length: bytes_per_second=482.241M/s
utf16_to_utf8: bytes_per_second=351.376M/s
After the change:
utf8_to_utf16_length: bytes_per_second=544.022M/s
utf8_to_utf16: bytes_per_second=471.135M/s
utf16_to_utf8_length: bytes_per_second=685.381M/s
utf16_to_utf8: bytes_per_second=580.004M/s
Ideas for future improvement could include alignment handling and loop
unrolling to increase throughput more.
This CL also fixes issues below:
1. utf16_to_utf8_length() should return 0 when the source string has
length of 0, the original code returns -1 as below:
ssize_t utf16_to_utf8_length(const char16_t *src, size_t src_len)
{
if (src == nullptr || src_len == 0) {
return -1;
}
...
2. utf8_to_utf16() should check whether input string is valid.
Change-Id: I546138a7a8050681a524eabce9864219fc44f48e
If this was strlcpy16 it wouldn't be such a bad idea, but strncpy16 is
just an accident waiting to happen...
Test: N/A
Change-Id: Id296fdeadfb9f1f70ddc8fb6d31b3b6b5178a12c
Add FALLTHROUGH_INTENDED for clang compiler.
Bug: 112564944
Test: build with global -Wimplicit-fallthrough.
Change-Id: I40f8bbf94e207c9dd90921e9b762ba51abab5777
In the original code when target is an empty string
strlen16() would start reading the memory until a
"terminating null" (that is, zero) character is found.
This may happen because "*target++", at line 300,
would increment the pointer beyond the actual string.
Signed-off-by: Branislav Rankov <branislav.rankov@arm.com>
Signed-off-by: Tamas Petz <tamas.petz@arm.com>
Test: libutils_tests --gtest_filter=UnicodeTest.strstr16*
Change-Id: I213ffe061057c7fa8f34b68881e106a709557dcd
Without an explicit check, the return value can wrap around and return
a value that is far too small to hold the data from the resulting
conversion.
No CTS test is provided because it would need to allocate at least
SSIZE_MAX / 2 bytes of UTF-16 data, which is unreasonable on 64-bit
devices.
Bug: 37723026
Test: run cts -p android.security
Change-Id: I56ba5e31657633b7f33685dd8839d4b3b998e586
moved Foo.h as first include of Foo.cpp, and
removed redundant includes.
Made NativeHandle non virtual.
Test: run & compile
Bug: n/a
Change-Id: I37fa746cd42c9ba23aba181f84cb6c619386406a
Point to log/log.h where necessary, define LOG_TAG where necessary.
Accept that private/android_logger.h is suitable replacement for
log/logger.h and android/log.h.
Correct liblog/README
Effectively a cleanup and controlled select revert of
'system/core: drop or replace log/logger.h' and
'system/core: Replace log/log.h with android/log.h'.
Test: compile
Bug: 30465923
Change-Id: Ic2ad157bad6f5efe2c6af293a73bb753300b17a2
Should use android/log.h instead of log/log.h as a good example
to all others. Adjust header order to comply with Android Coding
standards.
Test: Compile
Bug: 26552300
Bug: 31289077
Change-Id: I33a8fb4e754d2dc4754d335660c450e0a67190fc
Inconsistent behaviour between utf16_to_utf8 and utf16_to_utf8_length
is causing a heap overflow.
Correcting the length computation and adding bound checks to the
conversion functions.
Test: ran libutils_tests
Bug: 29250543
Change-Id: I6115e3357141ed245c63c6eb25fc0fd0a9a7a2bb
(cherry picked from commit c4966a363e)
String16(const char *utf8) now returns the empty string in case
a string ends halfway throw a utf8 character.
Bug: 29267949
Clean cherry-pick from 1dcc0c8239
Change-Id: I5223caa7d42f4582a982609a898a02043265c6d3
strcmp needs a limit, otherwise it will compare the null terminator
with the next character in the haystack, which results in the compare
failing for all searches except where the needle is found at the very
end.
Bug: 28663748
Change-Id: I1939dc4037c2f2a75d617943b063d2d38a8c5e3a
These are needed for aapt to find javadoc comments that contain
"@removed" in order to skip them when printing styleable docs.
Bug: 28663748
Change-Id: I8866d2167c41e11d6c2586da369560d5815fd13e
Bug: 15274351
Bug: 15539240
Many MP3 files have incorrect utf16 chars, but the
Utf16_to_utf8_length() routine checks for errors in
standard utf16 char. utf16_to_utf8() was not checking
for errors in standard utf16 char.
Change-Id: Iafd922ff92cabe6bba8971215fcfd1fd471c894b
(cherry picked from commit 605b139cdf56364c6c9b37e59dd12efc61c24631)
- Deal with some -Wunused issues
- Override PRI macros (windows)
- Revert use of PRI macros on off64_t (linux)
- Deal with a gnu++11 complaince issue
Change-Id: Ie66751293bd84477a5a6dfd8a57e700a16e36964