platform_system_core/libutils
Eric Miao cb199b4795 libutils: Improve performance of utf8_to_utf16/utf16_to_utf8
This CL improves the performance of the following libutils functions, which
convert between UTF-8 and UTF-16:

  - utf8_to_utf16_length
  - utf8_to_utf16
  - utf16_to_utf8_length
  - utf16_to_utf8

The basic idea is to keep the loop as tight as possible for the most
common cases. For example, in the UTF-16 to UTF-8 direction, the most
common case is a character < 0x80 (ASCII), the next is a character
< 0x0800 (most Latin), and so on.
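
As a rough illustration of that ordering, here is a minimal sketch of a
UTF-16 to UTF-8 length count that tests the cheapest, most frequent ranges
first. It is not the code from this CL: the name
sketch_utf16_to_utf8_length and the use of std::u16string_view are
hypothetical, and counting an unpaired surrogate as 3 bytes is an
assumption made for the example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch, not the libutils implementation: counts the UTF-8
// bytes needed to encode a UTF-16 string, checking the most common ranges
// first so that typical text exits the branch chain early.
std::size_t sketch_utf16_to_utf8_length(std::u16string_view src) {
    std::size_t bytes = 0;
    const std::size_t n = src.size();
    for (std::size_t i = 0; i < n; ++i) {
        const char16_t c = src[i];
        if (c < 0x0080) {
            bytes += 1;  // ASCII
        } else if (c < 0x0800) {
            bytes += 2;  // most Latin and similar scripts
        } else if (c < 0xD800 || c > 0xDFFF) {
            bytes += 3;  // rest of the Basic Multilingual Plane
        } else if (c < 0xDC00 && i + 1 < n &&
                   src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            bytes += 4;  // high surrogate followed by low surrogate
            ++i;         // consume the low surrogate as well
        } else {
            bytes += 3;  // assumption: unpaired surrogate counted as 3 bytes
        }
    }
    return bytes;
}
```

With this sketch an empty string yields 0, and pure ASCII input never gets
past the first comparison.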

This implementation reduces the number of instructions needed per
incoming UTF-8 byte. The original implementation:

  1) calculated how many bytes a sequence needs from its leading UTF-8
     byte in utf8_codepoint_len(), which is very clever but costs
     multiple instructions regardless of the input

  2) converted through an intermediate utf32 code point, and only then
     to utf16, in utf8_to_utf32_codepoint()
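
To make the contrast concrete, here is a minimal sketch of decoding UTF-8
straight into UTF-16 code units by branching on the leading byte, with no
separate length helper and no call through a utf32 conversion function. It
is an illustration under stated assumptions, not the code from this CL:
the name sketch_utf8_to_utf16, the bool return value, and the
std::u16string output are all hypothetical. For brevity it rejects
truncated sequences and bad continuation bytes but does not reject
overlong encodings or surrogates encoded as 3-byte sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical sketch, not the libutils implementation. Decodes UTF-8 into
// UTF-16 in one pass; returns false on truncated or malformed input.
bool sketch_utf8_to_utf16(std::string_view src, std::u16string& out) {
    out.clear();
    const std::size_t n = src.size();
    // Reads the continuation byte at index k and stores its 6 payload bits.
    auto cont = [&](std::size_t k, std::uint32_t& bits) {
        if (k >= n) return false;
        const unsigned char b = static_cast<unsigned char>(src[k]);
        if ((b & 0xC0) != 0x80) return false;
        bits = b & 0x3Fu;
        return true;
    };
    std::size_t i = 0;
    while (i < n) {
        const std::uint32_t b0 = static_cast<unsigned char>(src[i]);
        if (b0 < 0x80) {  // ASCII: the tightest and most common path
            out.push_back(static_cast<char16_t>(b0));
            i += 1;
            continue;
        }
        std::uint32_t b1 = 0, b2 = 0, b3 = 0;
        if ((b0 & 0xE0) == 0xC0) {  // 2-byte sequence
            if (!cont(i + 1, b1)) return false;
            out.push_back(static_cast<char16_t>(((b0 & 0x1F) << 6) | b1));
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0) {  // 3-byte sequence
            if (!cont(i + 1, b1) || !cont(i + 2, b2)) return false;
            out.push_back(static_cast<char16_t>(
                ((b0 & 0x0F) << 12) | (b1 << 6) | b2));
            i += 3;
        } else if ((b0 & 0xF8) == 0xF0) {  // 4-byte sequence
            if (!cont(i + 1, b1) || !cont(i + 2, b2) || !cont(i + 3, b3)) {
                return false;
            }
            const std::uint32_t cp =
                ((b0 & 0x07) << 18) | (b1 << 12) | (b2 << 6) | b3;
            if (cp < 0x10000 || cp > 0x10FFFF) return false;
            const std::uint32_t v = cp - 0x10000;  // 20 bits, split 10/10
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
            i += 4;
        } else {
            return false;  // stray continuation byte or invalid leading byte
        }
    }
    return true;
}
```

Only the 4-byte case forms a full code point here, because it has to be
split into a surrogate pair; the 1-, 2- and 3-byte cases write the UTF-16
code unit directly.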

The end result is a throughput improvement of about 1.5x or more
(roughly 1.4x to 1.9x in the benchmarks below, depending on the function).

Benchmark results on redfin (64bit) before the change:

utf8_to_utf16_length: bytes_per_second=307.556M/s
utf8_to_utf16:        bytes_per_second=246.664M/s
utf16_to_utf8_length: bytes_per_second=482.241M/s
utf16_to_utf8:        bytes_per_second=351.376M/s

After the change:

utf8_to_utf16_length: bytes_per_second=544.022M/s
utf8_to_utf16:        bytes_per_second=471.135M/s
utf16_to_utf8_length: bytes_per_second=685.381M/s
utf16_to_utf8:        bytes_per_second=580.004M/s
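
The per-function speedup can be worked out from the two sets of figures
above. The sketch below only divides the "after" throughput by the
"before" throughput; the BenchRow type and the names kRows and speedup are
hypothetical helpers for this arithmetic, and the numbers are copied from
the results above in the units reported (M/s).

```cpp
#include <array>
#include <string_view>

// Hypothetical helper type: one benchmark row, throughput in M/s as reported.
struct BenchRow {
    std::string_view name;
    double before;
    double after;
};

// Figures copied from the benchmark results in this commit message.
constexpr std::array<BenchRow, 4> kRows{{
    {"utf8_to_utf16_length", 307.556, 544.022},
    {"utf8_to_utf16", 246.664, 471.135},
    {"utf16_to_utf8_length", 482.241, 685.381},
    {"utf16_to_utf8", 351.376, 580.004},
}};

// Speedup factor: throughput after the change divided by throughput before.
constexpr double speedup(const BenchRow& row) {
    return row.after / row.before;
}
```

Evaluating speedup() over kRows gives roughly 1.77x, 1.91x, 1.42x and
1.65x for the four functions, in the order listed.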

Ideas for future improvement include alignment handling and loop
unrolling to increase throughput further.

This CL also fixes the following issues:

  1. utf16_to_utf8_length() should return 0 when the source string has
     a length of 0; the original code returned -1, as shown below:

    ssize_t utf16_to_utf8_length(const char16_t *src, size_t src_len)
    {
        if (src == nullptr || src_len == 0) {
            return -1;
        }
        ...

  2. utf8_to_utf16() should check whether the input string is valid.

Change-Id: I546138a7a8050681a524eabce9864219fc44f48e
2023-07-12 13:23:07 -07:00
abi-dumps Add an ABI dump directory for libutils 2022-12-08 11:06:22 +08:00
include/utils Drop const assignment operator. 2023-06-28 11:21:47 -07:00
Android.bp Replace "apex_inherit" min_sdk_version 2022-12-20 16:05:54 +00:00
BitSet_fuzz.cpp
BitSet_test.cpp
CallStack.cpp Fix thread unwind in CallStack. 2022-09-12 18:37:22 -07:00
CallStack_fuzz.cpp
CallStack_test.cpp Fix the build with a newer LLVM. 2022-09-14 20:16:25 +00:00
CleanSpec.mk
Errors.cpp Add the Missing Header 2023-06-05 13:49:17 -07:00
Errors_test.cpp Fix OkOrFail<status_t> conversion ambiguities 2022-02-25 14:27:41 -05:00
FileMap.cpp
FileMap_fuzz.cpp
FileMap_test.cpp
FuzzFormatTypes.h
JenkinsHash.cpp
LightRefBase.cpp
Looper.cpp libutils: DEBUG_* modes compile forever 2022-10-08 05:13:47 +00:00
Looper_fuzz.cpp
Looper_test.cpp Looper: Use sequence numbers in epoll_event to track requests 2021-09-01 14:52:52 +00:00
Looper_test_pipe.h
LruCache_fuzz.cpp
LruCache_test.cpp Fix LruCache, allow std:string caching 2023-06-15 00:37:52 +00:00
misc.cpp
MODULE_LICENSE_APACHE2
Mutex_test.cpp
NativeHandle.cpp
NOTICE
OWNERS
Printer.cpp
Printer_fuzz.cpp
ProcessCallStack.cpp
ProcessCallStack_fuzz.cpp Fix the missing std 2023-03-01 23:30:29 +00:00
RefBase.cpp libutils: RefBase DEBUG_REF love 2022-10-10 16:58:57 +00:00
RefBase_fuzz.cpp Fix the missing std 2023-03-01 23:30:29 +00:00
RefBase_test.cpp Halve iteration count for some RefBase tests 2023-05-31 10:19:54 -07:00
SharedBuffer.cpp
SharedBuffer.h
SharedBuffer_test.cpp
Singleton_test.cpp
Singleton_test.h
Singleton_test1.cpp
Singleton_test2.cpp
StopWatch.cpp
String8.cpp libutils: Include limits for std::numeric_limits::max 2022-10-21 10:26:08 +05:30
String8_fuzz.cpp
String8_test.cpp
String16.cpp
String16_fuzz.cpp
String16_test.cpp
StrongPointer.cpp RefBase: test for stack check 2022-07-29 00:54:57 +00:00
StrongPointer_test.cpp
SystemClock.cpp
SystemClock_test.cpp
TEST_MAPPING
Threads.cpp Merge "libutils: Remove a little dead code" am: 4a39ba316f am: 0a8e5126ef am: 9637277417 am: 74402be9d1 2022-02-09 00:00:07 +00:00
Timers.cpp
Timers_test.cpp
Tokenizer.cpp libutils: DEBUG_* modes compile forever 2022-10-08 05:13:47 +00:00
Trace.cpp
Unicode.cpp libutils: Improve performance of utf8_to_utf16/utf16_to_utf8 2023-07-12 13:23:07 -07:00
Unicode_test.cpp libutils: Add more tests for Unicode 2022-12-06 15:14:27 -08:00
Vector_benchmark.cpp
Vector_fuzz.cpp libutils: rewrite Vector fuzzer 2023-07-01 00:28:48 +00:00
Vector_test.cpp libutils: clearer abort on overflow. 2022-04-28 00:25:25 +00:00
VectorImpl.cpp libutils: clearer abort on overflow. 2022-04-28 00:25:25 +00:00