cb199b4795
This CL improves the performance of the following libutils functions, which convert between UTF-8 and UTF-16:

- utf8_to_utf16_length
- utf8_to_utf16
- utf16_to_utf8_length
- utf16_to_utf8

The basic idea is to keep the loop as tight as possible for the most common cases. For example, in the UTF-16 to UTF-8 direction, the most common case is a character < 0x80 (ASCII), the next is a character < 0x0800 (most Latin), and so on.

This implementation reduces the number of instructions needed for every incoming UTF-8 byte compared with the original implementation, which:

1) calculated how many bytes are needed, given a leading UTF-8 byte, in utf8_codepoint_len(). That approach is very clever, but it takes multiple instructions regardless of the input.
2) performed an intermediate conversion to UTF-32, and then to UTF-16, in utf8_to_utf32_codepoint().

The end result is roughly a 1.5x throughput improvement.

Benchmark results on redfin (64-bit) before the change:

  utf8_to_utf16_length: bytes_per_second=307.556M/s
  utf8_to_utf16: bytes_per_second=246.664M/s
  utf16_to_utf8_length: bytes_per_second=482.241M/s
  utf16_to_utf8: bytes_per_second=351.376M/s

After the change:

  utf8_to_utf16_length: bytes_per_second=544.022M/s
  utf8_to_utf16: bytes_per_second=471.135M/s
  utf16_to_utf8_length: bytes_per_second=685.381M/s
  utf16_to_utf8: bytes_per_second=580.004M/s

Ideas for future improvement include alignment handling and loop unrolling to increase throughput further.

This CL also fixes the following issues:

1. utf16_to_utf8_length() should return 0 when the source string has a length of 0. The original code returned -1:

    ssize_t utf16_to_utf8_length(const char16_t *src, size_t src_len)
    {
        if (src == nullptr || src_len == 0) {
            return -1;
        }
        ...

2. utf8_to_utf16() should check whether the input string is valid.

Change-Id: I546138a7a8050681a524eabce9864219fc44f48e
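The "most common case first" idea from the commit message can be illustrated with a minimal sketch of a UTF-16 to UTF-8 length calculation. This is not the code in Unicode.cpp: the function name `sketch_utf16_to_utf8_length` is hypothetical, and the handling of a null pointer and of unpaired surrogates is an assumption made only so the example is complete.

```cpp
#include <cstddef>

// Hypothetical sketch, not the libutils implementation.
// Returns the number of UTF-8 bytes needed to encode src[0, src_len).
// Assumptions: a null src returns -1, an empty input returns 0, and an
// unpaired surrogate is counted as a 3-byte sequence.
std::ptrdiff_t sketch_utf16_to_utf8_length(const char16_t* src, std::size_t src_len) {
    if (src == nullptr) {
        return -1;
    }
    std::ptrdiff_t bytes = 0;
    const char16_t* const end = src + src_len;
    while (src < end) {
        const char16_t c = *src++;
        if (c < 0x0080) {
            bytes += 1;  // Most common case: ASCII.
        } else if (c < 0x0800) {
            bytes += 2;  // Next most common: two-byte sequences (most Latin).
        } else if (c < 0xD800 || c > 0xDBFF) {
            bytes += 3;  // Rest of the BMP (a lone low surrogate also lands here).
        } else if (src < end && *src >= 0xDC00 && *src <= 0xDFFF) {
            ++src;       // High surrogate followed by a low surrogate:
            bytes += 4;  // one supplementary code point, four bytes.
        } else {
            bytes += 3;  // Unpaired high surrogate (assumed behavior).
        }
    }
    return bytes;
}
```

The branches are ordered so that ASCII input is resolved by the first comparison, and no intermediate UTF-32 value is built. Under these assumptions, a call such as `sketch_utf16_to_utf8_length(u"", 0)` yields 0 rather than -1, which matches the first fix described in the commit message.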
abi-dumps
include/utils
Android.bp
BitSet_fuzz.cpp
BitSet_test.cpp
CallStack.cpp
CallStack_fuzz.cpp
CallStack_test.cpp
CleanSpec.mk
Errors.cpp
Errors_test.cpp
FileMap.cpp
FileMap_fuzz.cpp
FileMap_test.cpp
FuzzFormatTypes.h
JenkinsHash.cpp
LightRefBase.cpp
Looper.cpp
Looper_fuzz.cpp
Looper_test.cpp
Looper_test_pipe.h
LruCache_fuzz.cpp
LruCache_test.cpp
misc.cpp
MODULE_LICENSE_APACHE2
Mutex_test.cpp
NativeHandle.cpp
NOTICE
OWNERS
Printer.cpp
Printer_fuzz.cpp
ProcessCallStack.cpp
ProcessCallStack_fuzz.cpp
RefBase.cpp
RefBase_fuzz.cpp
RefBase_test.cpp
SharedBuffer.cpp
SharedBuffer.h
SharedBuffer_test.cpp
Singleton_test.cpp
Singleton_test.h
Singleton_test1.cpp
Singleton_test2.cpp
StopWatch.cpp
String8.cpp
String8_fuzz.cpp
String8_test.cpp
String16.cpp
String16_fuzz.cpp
String16_test.cpp
StrongPointer.cpp
StrongPointer_test.cpp
SystemClock.cpp
SystemClock_test.cpp
TEST_MAPPING
Threads.cpp
Timers.cpp
Timers_test.cpp
Tokenizer.cpp
Trace.cpp
Unicode.cpp
Unicode_test.cpp
Vector_benchmark.cpp
Vector_fuzz.cpp
Vector_test.cpp
VectorImpl.cpp