|
*This document was originally written for a broad audience, and it was*
|
||
|
*determined that it'd be good to hold in Bionic's docs, too. Due to the*
|
||
|
*ever-changing nature of code, it tries to link to a stable tag of*
|
||
|
*Bionic's libc, rather than the live code in Bionic. Same for Clang.*
|
||
|
*Reader beware. :)*
|
||
|
|
||
|
# The Anatomy of Clang FORTIFY
|
||
|
|
||
|
## Objective
|
||
|
|
||
|
The intent of this document is to run through the minutiae of how Clang FORTIFY
|
||
|
actually works in Bionic at the time of writing. Other FORTIFY implementations
|
||
|
that target Clang should use very similar mechanics. This document exists in part
|
||
|
because many Clang-specific features serve multiple purposes simultaneously, so
|
||
|
getting up-to-speed on how things function can be quite difficult.
|
||
|
|
||
|
## Background
|
||
|
|
||
|
FORTIFY is a broad suite of extensions to libc aimed at catching misuses of
|
||
|
common library functions. Textually, these extensions exist purely in libc, but
|
||
|
all implementations of FORTIFY rely heavily on C language extensions in order
|
||
|
to function at all.
|
||
|
|
||
|
Broadly, FORTIFY implementations try to guard against many misuses of C
|
||
|
standard(-ish) libraries:
|
||
|
- Buffer overruns in functions where pointers+sizes are passed (e.g., `memcpy`,
|
||
|
`poll`), or where sizes exist implicitly (e.g., `strcpy`).
|
||
|
- Arguments with incorrect values passed to libc functions (e.g.,
|
||
|
out-of-bounds bits in `umask`).
|
||
|
- Missing arguments to functions (e.g., `open()` with `O_CREAT`, but no mode
|
||
|
bits).
|
||
|
|
||
|
FORTIFY is traditionally enabled by passing `-D_FORTIFY_SOURCE=N` to your
|
||
|
compiler. `N==0` disables FORTIFY, whereas `N==1`, `N==2`, and `N==3` enable
|
||
|
increasingly strict versions of it. In general, FORTIFY doesn't require user
|
||
|
code changes; that said, some code patterns
|
||
|
are [incompatible with stricter versions of FORTIFY checking]. This is largely
|
||
|
because FORTIFY has significant flexibility in what it considers to be an
|
||
|
"out-of-bounds" access.
|
||
|
|
||
|
FORTIFY implementations use a mix of compiler diagnostics and runtime checks to
|
||
|
flag and/or mitigate the impacts of the misuses mentioned above.
|
||
|
|
||
|
Further, given FORTIFY's design, the effectiveness of FORTIFY is a function of
|
||
|
-- among other things -- the optimization level you're compiling your code at.
|
||
|
Many FORTIFY implementations are implicitly disabled when building with `-O0`,
|
||
|
since FORTIFY's design for both Clang and GCC relies on optimizations in order
|
||
|
to provide useful run-time checks. For the purpose of this document, all
|
||
|
analysis of FORTIFY functions and commentary on builtins assume that code is
|
||
|
being built with some optimization level > `-O0`.
|
||
|
|
||
|
### A note on GCC
|
||
|
|
||
|
This document talks specifically about Bionic's FORTIFY implementation targeted
|
||
|
at Clang. While GCC also provides a set of language extensions necessary to
|
||
|
implement FORTIFY, these tools are different from what Clang offers. This
|
||
|
divergence is an artifact of Clang and GCC's differing architecture as
|
||
|
compilers.
|
||
|
|
||
|
Textually, quite a bit can be shared between a FORTIFY implementation for GCC
|
||
|
and one for Clang (e.g., see [ChromeOS' Glibc patch]), but this kind of sharing
|
||
|
requires things like macros that expand to unbalanced braces depending on your
|
||
|
compiler:
|
||
|
|
||
|
```c
|
||
|
/*
|
||
|
* Highly simplified; if you're interested in FORTIFY's actual implementation,
|
||
|
* please see the patch linked above.
|
||
|
*/
|
||
|
#ifdef __clang__
|
||
|
# define FORTIFY_PRECONDITIONS
|
||
|
# define FORTIFY_FUNCTION_END
|
||
|
#else
|
||
|
# define FORTIFY_PRECONDITIONS {
|
||
|
# define FORTIFY_FUNCTION_END }
|
||
|
#endif
|
||
|
|
||
|
/*
|
||
|
* FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN is not defined, due to its
|
||
|
* complexity and irrelevance. It turns into a compile-time warning if the
|
||
|
* compiler can determine `*buf` has fewer than `size` bytes available.
|
||
|
*/
|
||
|
|
||
|
char *getcwd(char *buf, size_t size)
|
||
|
FORTIFY_PRECONDITIONS
|
||
|
FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN(buf, size, "`buf` is too smol.")
|
||
|
{
|
||
|
// Actual shared function implementation goes here.
|
||
|
}
|
||
|
FORTIFY_FUNCTION_END
|
||
|
```
|
||
|
|
||
|
All talk of GCC-focused implementations and how to merge Clang and GCC
|
||
|
implementations is out-of-scope for this doc, however.
|
||
|
|
||
|
## The Life of a Clang FORTIFY Function
|
||
|
|
||
|
As referenced in the Background section, FORTIFY performs many different checks
|
||
|
for many functions. This section intends to go through real-world examples of
|
||
|
FORTIFY functions in Bionic, breaking down how each part of these functions
|
||
|
works, and how the pieces fit together to provide FORTIFY-like functionality.
|
||
|
|
||
|
While FORTIFY implementations may differ between stdlibs, they broadly follow
|
||
|
the same patterns when implementing their checks for Clang, and they try to
|
||
|
make similar promises with respect to FORTIFY compiling to be zero-overhead in
|
||
|
some cases, etc. Moreover, while this document specifically examines Bionic,
|
||
|
many stdlibs will operate _very similarly_ to Bionic in their Clang FORTIFY
|
||
|
implementations.
|
||
|
|
||
|
**In general, when reading the below, be prepared for exceptions, subtlety, and
|
||
|
corner cases. The individual function breakdowns below try to not offer
|
||
|
redundant information. Each one focuses on different aspects of FORTIFY.**
|
||
|
|
||
|
### Terminology
|
||
|
|
||
|
Because FORTIFY should be mostly transparent to developers, there are inherent
|
||
|
naming collisions here: `memcpy(x, y, z)` turns into fundamentally different
|
||
|
generated code depending on the value of `_FORTIFY_SOURCE`. Further, said
|
||
|
`memcpy` call with `_FORTIFY_SOURCE` enabled needs to be able to refer to the
|
||
|
`memcpy` that would have been called, had `_FORTIFY_SOURCE` been disabled.
|
||
|
Hence, the following convention is followed in the subsections below for all
|
||
|
prose (namely, multiline code blocks are exempted from this):
|
||
|
|
||
|
- Standard library function names preceded by `__builtin_` refer to the use of
|
||
|
the function with `_FORTIFY_SOURCE` disabled.
|
||
|
- Standard library function names without a prefix refer to the use of the
|
||
|
function with `_FORTIFY_SOURCE` enabled.
|
||
|
|
||
|
This convention also applies in `clang`. `__builtin_memcpy` will always call
|
||
|
`memcpy` as though `_FORTIFY_SOURCE` were disabled.
|
||
|
|
||
|
## Breakdown of `mempcpy`
|
||
|
|
||
|
The [FORTIFY'ed version of `mempcpy`] is a full, featureful example of a
|
||
|
FORTIFY'ed function from Bionic. From the user's perspective, it supports a few
|
||
|
things:
|
||
|
- Producing a compile-time error if the number of bytes to copy trivially
|
||
|
exceeds the number of bytes available at the destination pointer.
|
||
|
- If the `mempcpy` has the potential to write to more bytes than what is
|
||
|
available at the destination, a run-time check is inserted to crash the
|
||
|
program if more bytes are written than what is allowed.
|
||
|
- Compiling away to be zero overhead when none of the buffer sizes can be
|
||
|
determined at compile-time[^1].
|
||
|
|
||
|
The declaration in Bionic's headers for `__builtin_mempcpy` is:
|
||
|
```c
|
||
|
void* mempcpy(void* __dst, const void* __src, size_t __n) __INTRODUCED_IN(23);
|
||
|
```
|
||
|
|
||
|
Which is annotated with nothing special, except for Bionic's versioner, which
|
||
|
is Android-specific (and orthogonal to FORTIFY anyway), so it will be ignored.
|
||
|
|
||
|
The [source for `mempcpy`] in Bionic's headers is:
|
||
|
```c
|
||
|
__BIONIC_FORTIFY_INLINE
|
||
|
void* mempcpy(void* const dst __pass_object_size0, const void* src, size_t copy_amount)
|
||
|
__overloadable
|
||
|
__clang_error_if(__bos_unevaluated_lt(__bos0(dst), copy_amount),
|
||
|
"'mempcpy' called with size bigger than buffer") {
|
||
|
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
|
||
|
size_t bos_dst = __bos0(dst);
|
||
|
if (!__bos_trivially_ge(bos_dst, copy_amount)) {
|
||
|
return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
|
||
|
}
|
||
|
#endif
|
||
|
return __builtin_mempcpy(dst, src, copy_amount);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Expanding some of the important macros here, this function expands to roughly:
|
||
|
```c
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
void* mempcpy(
|
||
|
void* const dst __attribute__((pass_object_size(0))),
|
||
|
const void* src,
|
||
|
size_t copy_amount)
|
||
|
__attribute__((overloadable))
|
||
|
__attribute__((diagnose_if(
|
||
|
__builtin_object_size(dst, 0) != -1 && __builtin_object_size(dst, 0) < copy_amount,
|
||
|
"'mempcpy' called with size bigger than buffer", "error"))) {
|
||
|
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
|
||
|
size_t bos_dst = __builtin_object_size(dst, 0);
|
||
|
if (!(__bos_trivially_ge(bos_dst, copy_amount))) {
|
||
|
return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
|
||
|
}
|
||
|
#endif
|
||
|
return __builtin_mempcpy(dst, src, copy_amount);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
So let's walk through this step by step, to see how FORTIFY does what it says on
|
||
|
the tin here.
|
||
|
|
||
|
[^1]: "Zero overhead" in a way [similar to C++11's `std::unique_ptr`]: this will
|
||
|
turn into a direct call to `__builtin_mempcpy` (or an optimized form thereof) with
|
||
|
no other surrounding checks at runtime. However, the additional complexity may
|
||
|
hinder optimizations that are performed before the optimizer can prove that the
|
||
|
`if (...) { ... }` can be optimized out. Depending on how late this happens,
|
||
|
the additional complexity may skew inlining costs, hide opportunities for e.g.,
|
||
|
`memcpy` coalescing, etc.
|
||
|
|
||
|
### How does Clang select `mempcpy`?
|
||
|
|
||
|
First, it's critical to notice that `mempcpy` is marked `overloadable`. This
|
||
|
function is a `static inline __attribute__((always_inline))` overload of
|
||
|
`__builtin_mempcpy`:
|
||
|
- `__attribute__((overloadable))` allows us to perform overloading in C.
|
||
|
- `__attribute__((overloadable))` mangles all calls to functions marked with
|
||
|
`__attribute__((overloadable))`.
|
||
|
- `__attribute__((overloadable))` allows exactly one function signature with a
|
||
|
given name to not be marked with `__attribute__((overloadable))`. Calls to
|
||
|
this overload will not be mangled.
|
||
|
|
||
|
Second, one might note that this `mempcpy` implementation has the same C-level
|
||
|
signature as `__builtin_mempcpy`. `pass_object_size` is a Clang attribute that
|
||
|
is generally needed by FORTIFY, but it carries the side-effect that functions
|
||
|
may be overloaded simply on the presence (or lack of presence) of
|
||
|
`pass_object_size` attributes. Given two overloads of a function that only
|
||
|
differ on the presence of `pass_object_size` attributes, the candidate with
|
||
|
`pass_object_size` attributes is preferred.
|
||
|
|
||
|
Finally, the prior paragraph gets thrown out if one tries to take the address of
|
||
|
`mempcpy`. It is impossible to take the address of a function with one or more
|
||
|
parameters that are annotated with `pass_object_size`. Hence,
|
||
|
`&__builtin_mempcpy == &mempcpy`. Further, because this is an issue of overload
|
||
|
resolution, `(&mempcpy)(x, y, z);` is functionally identical to
|
||
|
`__builtin_mempcpy(x, y, z);`.
|
||
|
|
||
|
All of this accomplishes the following:
|
||
|
- Direct calls to `mempcpy` should call the FORTIFY-protected `mempcpy`.
|
||
|
- Indirect calls to `&mempcpy` should call `__builtin_mempcpy`.
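The selection rules above can be sketched with a hypothetical function named `probe`, which is not part of Bionic. The attribute-based overload only builds with Clang, so this sketch falls back to the plain function elsewhere. The `HAVE_CLANG_OVERLOADS` macro and the two `call_*` helpers are assumptions made purely for illustration.

```c
#include <stddef.h>

/* The single declaration without `overloadable`; calls to it are not mangled. */
int probe(const void *p) {
  (void)p;
  return 0; /* Stands in for the unfortified libc function. */
}

#if defined(__clang__)
#define HAVE_CLANG_OVERLOADS 1
/*
 * FORTIFY-style overload: same C-level signature, differing only in
 * pass_object_size. Clang prefers it for direct calls.
 */
static __inline__ __attribute__((always_inline)) __attribute__((overloadable))
int probe(const void *const p __attribute__((pass_object_size(0)))) {
  (void)p;
  return 1; /* Stands in for the FORTIFY'ed wrapper. */
}
#else
#define HAVE_CLANG_OVERLOADS 0 /* Only the plain function exists here. */
#endif

int call_direct(void) {
  char buf[4];
  return probe(buf); /* With Clang, the pass_object_size overload is chosen. */
}

int call_indirect(void) {
  char buf[4];
  return (&probe)(buf); /* The address can only refer to the plain function. */
}
```

With Clang, `call_direct()` should return 1 and `call_indirect()` should return 0, matching the two bullets above.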
|
||
|
|
||
|
### How does Clang offer compile-time diagnostics?
|
||
|
|
||
|
Once one is convinced that the FORTIFY-enabled overload of `mempcpy` will be
|
||
|
selected for direct calls, Clang's `diagnose_if` and `__builtin_object_size` do
|
||
|
all of the work from there.
|
||
|
|
||
|
Subtleties here primarily fall out of the discussion in the above section about
|
||
|
`&__builtin_mempcpy == &mempcpy`:
|
||
|
```c
|
||
|
#define _FORTIFY_SOURCE 2
|
||
|
#include <string.h>
|
||
|
void example_code() {
|
||
|
char buf[4]; // ...Assume sizeof(char) == 1.
|
||
|
const char input_buf[] = "Hello, World";
|
||
|
mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
|
||
|
|
||
|
mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
|
||
|
__builtin_mempcpy(buf, input_buf, 5); // No compile-time error.
|
||
|
(&mempcpy)(buf, input_buf, 5); // No compile-time error, since __builtin_mempcpy is selected.
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Otherwise, the rest of this subsection is dedicated to preliminary discussion
|
||
|
about `__builtin_object_size`.
|
||
|
|
||
|
Clang's frontend can do one of two things with `__builtin_object_size(p, n)`:
|
||
|
- Evaluate it as a constant.
|
||
|
- This can either mean declaring that the number of bytes at `p` is definitely
|
||
|
impossible to know, so the default value is used, or the number of bytes at
|
||
|
`p` can be known without optimizations.
|
||
|
- Declare that the expression cannot form a constant, and lower it to
|
||
|
`@llvm.objectsize`, which is discussed in depth later.
|
||
|
|
||
|
In the examples above, since `diagnose_if` is evaluated with context from the
|
||
|
caller, Clang should be able to trivially determine that `buf` refers to a
|
||
|
`char` array with 4 elements.
|
||
|
|
||
|
The primary consequence of the above is that diagnostics can only be emitted if
|
||
|
no optimizations are required to detect a broken code pattern. To be specific,
|
||
|
clang's constexpr evaluator must be able to determine the logical object that
|
||
|
any given pointer points to in order to fold `__builtin_object_size` to a
|
||
|
constant, non-default answer:
|
||
|
|
||
|
```c
|
||
|
#define _FORTIFY_SOURCE 2
|
||
|
#include <string.h>
|
||
|
void example_code() {
|
||
|
char buf[4]; // ...Assume sizeof(char) == 1.
|
||
|
const char input_buf[] = "Hello, World";
|
||
|
mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
|
||
|
mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
|
||
|
char *buf_ptr = buf;
|
||
|
mempcpy(buf_ptr, input_buf, 5); // No compile-time error; `buf_ptr`'s target can't be determined.
|
||
|
}
|
||
|
```
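The two constant outcomes described in this subsection can be seen in a small sketch. The function names are invented for this example, and `noinline` is an assumption added so that the second function never learns what its caller passed, even when optimizing.

```c
#include <stddef.h>

/* The pointee is a named local array, so its size is known without optimizing. */
size_t size_of_local_array(void) {
  char buf[4];
  return __builtin_object_size(buf, 0);
}

/*
 * Nothing is known about what `p` points to inside this function, and
 * `noinline` keeps the caller's knowledge out, so the default value of
 * (size_t)-1 is produced.
 */
__attribute__((noinline))
size_t size_of_unknown(const char *p) {
  return __builtin_object_size(p, 0);
}

size_t size_of_unknown_demo(void) {
  char buf[4] = {0};
  return size_of_unknown(buf);
}
```

This is the gap that `pass_object_size` is meant to close: it evaluates the size in the caller, where `buf` is still visible as an array.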
|
||
|
|
||
|
### How does Clang insert run-time checks?
|
||
|
|
||
|
This section expands on the following statement: FORTIFY has zero runtime cost
|
||
|
in instances where there is no chance of catching a bug at run-time. Otherwise,
|
||
|
it introduces a tiny additional run-time cost to ensure that functions aren't
|
||
|
misused.
|
||
|
|
||
|
In prior sections, the following was established:
|
||
|
- `overloadable` and `pass_object_size` prompt Clang to always select this
|
||
|
overload of `mempcpy` over `__builtin_mempcpy` for direct calls.
|
||
|
- If a call to `mempcpy` was trivially broken, Clang would produce a
|
||
|
compile-time error, rather than producing a binary.
|
||
|
|
||
|
Hence, the case we're interested in here is one where Clang's frontend selected
|
||
|
a FORTIFY'ed function's implementation for a function call, but was unable to
|
||
|
find anything seriously wrong with said function call. Since the frontend is
|
||
|
powerless to detect bugs at this point, our focus shifts to the mechanisms that
|
||
|
LLVM uses to support FORTIFY.
|
||
|
|
||
|
Going back to Bionic's `mempcpy` implementation, we have the following (ignoring
|
||
|
`diagnose_if` and assuming run-time checks are enabled):
|
||
|
```c
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
void* mempcpy(
|
||
|
void* const dst __attribute__((pass_object_size(0))),
|
||
|
const void* src,
|
||
|
size_t copy_amount)
|
||
|
__attribute__((overloadable)) {
|
||
|
size_t bos_dst = __builtin_object_size(dst, 0);
|
||
|
if (bos_dst != -1 &&
|
||
|
!(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
|
||
|
return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
|
||
|
}
|
||
|
return __builtin_mempcpy(dst, src, copy_amount);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
In other words, we have a `static`, `always_inline` function which:
|
||
|
- If `__builtin_object_size(dst, 0)` cannot be determined (in which case, it
|
||
|
returns -1), calls `__builtin_mempcpy`.
|
||
|
- Otherwise, if `copy_amount` can be folded to a constant, and if
|
||
|
`__builtin_object_size(dst, 0) >= copy_amount`, calls `__builtin_mempcpy`.
|
||
|
- Otherwise, calls `__builtin___mempcpy_chk`.
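The three cases above can be restated as an ordinary function, which may be easier to reason about than the nested condition. This is only a model: `choose_path` and its enum are hypothetical, and the `amount_is_constant` parameter stands in for `__builtin_constant_p(copy_amount)`, which cannot be passed around as a run-time value.

```c
#include <stdbool.h>
#include <stddef.h>

enum copy_path { PLAIN_MEMPCPY, CHECKED_MEMPCPY };

/* Mirrors the branch in the wrapper shown above. */
enum copy_path choose_path(size_t bos_dst, bool amount_is_constant,
                           size_t copy_amount) {
  if (bos_dst != (size_t)-1 &&
      !(amount_is_constant && bos_dst >= copy_amount)) {
    return CHECKED_MEMPCPY; /* Size known, but safety not proven. */
  }
  return PLAIN_MEMPCPY; /* Size unknown, or the copy provably fits. */
}
```

Note that a known destination size with a non-constant `copy_amount` always takes the checked path, even if the amount would happen to fit at run-time.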
|
||
|
|
||
|
|
||
|
How can this be "zero overhead"? Let's focus on the following part of the
|
||
|
function:
|
||
|
|
||
|
```c
|
||
|
size_t bos_dst = __builtin_object_size(dst, 0);
|
||
|
if (bos_dst != -1 &&
|
||
|
!(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
|
||
|
```
|
||
|
|
||
|
If Clang's frontend cannot determine a value for `__builtin_object_size`, Clang
|
||
|
lowers it to LLVM's `@llvm.objectsize` intrinsic. The `@llvm.objectsize`
|
||
|
invocation corresponding to `__builtin_object_size(p, 0)` is guaranteed to
|
||
|
always fold to a constant value by the time LLVM emits machine code.
|
||
|
|
||
|
Hence, `bos_dst` is guaranteed to be a constant; if it's -1, the above branch
|
||
|
can be eliminated entirely, since it folds to `if (false && ...)`. Further, the
|
||
|
RHS of the `&&` in this branch has us call `__builtin_mempcpy` if `copy_amount`
|
||
|
is a known value no greater than `bos_dst` (yet another constant value). Therefore,
|
||
|
the entire condition is always knowable when LLVM is done with LLVM IR-level
|
||
|
optimizations, so no condition is ever emitted to machine code in practice.
|
||
|
|
||
|
#### Why is "zero overhead" in quotes? Why is `unique_ptr` relevant?
|
||
|
|
||
|
`__builtin_object_size` and `__builtin_constant_p` are forced to be constants
|
||
|
after most optimizations take place. Until LLVM replaces both of these with
|
||
|
constants and optimizes them out, we have additional branches and function calls
|
||
|
in our IR. This can have negative effects, such as distorting inlining costs and
|
||
|
inhibiting optimizations that are conservative around branches in control-flow.
|
||
|
|
||
|
So FORTIFY is free in these cases _in isolation of any of the code around it_.
|
||
|
Due to its implementation, it may impact the optimizations that occur on code
|
||
|
around the literal call to the FORTIFY-hardened libc function.
|
||
|
|
||
|
`unique_ptr` was just the first thing that came to the author's mind for "the
|
||
|
type should be zero cost with any level of optimization enabled, but edge-cases
|
||
|
might make it only-mostly-free to use."
|
||
|
|
||
|
### How is checking actually performed?
|
||
|
|
||
|
In cases where checking can be performed (e.g., where we call
|
||
|
`__builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);`), Bionic provides [an
|
||
|
implementation for `__mempcpy_chk`]. This is:
|
||
|
|
||
|
```c
|
||
|
extern "C" void* __mempcpy_chk(void* dst, const void* src, size_t count, size_t dst_len) {
|
||
|
__check_count("mempcpy", "count", count);
|
||
|
__check_buffer_access("mempcpy", "write into", count, dst_len);
|
||
|
return mempcpy(dst, src, count);
|
||
|
}
|
||
|
```
|
||
|
This function itself boils down to a few small branches which abort the program
|
||
|
if they fail, and a direct call to `__builtin_mempcpy`.
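As a rough illustration of "a few small branches which abort the program", here is a sketch of a checked copy. The `sketch_` names are hypothetical stand-ins, not Bionic's helpers. Only the buffer-size check is modeled, the separate count check is omitted, and `mempcpy` is emulated with `memcpy` so the sketch does not depend on a GNU extension.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer-access check: abort on overflow. */
static void sketch_check_buffer_access(const char *fn, size_t count,
                                       size_t buf_size) {
  if (count > buf_size) {
    fprintf(stderr, "%s: prevented %zu-byte write into %zu-byte buffer\n",
            fn, count, buf_size);
    abort();
  }
}

/* Hypothetical stand-in for __mempcpy_chk: check first, then copy. */
void *sketch_mempcpy_chk(void *dst, const void *src, size_t count,
                         size_t dst_len) {
  sketch_check_buffer_access("mempcpy", count, dst_len);
  /* mempcpy returns a pointer to the byte just past the copied data. */
  return (char *)memcpy(dst, src, count) + count;
}
```

The cost on the success path is one comparison and one well-predicted branch before the real copy.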
|
||
|
|
||
|
### Wrapping up
|
||
|
|
||
|
In the above breakdown, it was shown how Clang and Bionic work together to:
|
||
|
- represent FORTIFY-hardened overloads of functions,
|
||
|
- report misuses of stdlib functions at compile-time, and
|
||
|
- insert run-time checks for uses of functions that might be incorrect, but only
|
||
|
if we have the potential of proving the incorrectness of these.
|
||
|
|
||
|
## Breakdown of `open`
|
||
|
|
||
|
In Bionic, the [FORTIFY'ed implementation of `open`] is quite large. Much like
|
||
|
`mempcpy`, the `__builtin_open` declaration is simple:
|
||
|
|
||
|
```c
|
||
|
int open(const char* __path, int __flags, ...);
|
||
|
```
|
||
|
|
||
|
With some macros expanded, the FORTIFY-hardened header implementation is:
|
||
|
```c
|
||
|
int __open_2(const char*, int);
|
||
|
int __open_real(const char*, int, ...) __asm__("open");
|
||
|
|
||
|
#define __open_modes_useful(flags) (((flags) & O_CREAT) || ((flags) & O_TMPFILE) == O_TMPFILE)
|
||
|
|
||
|
static
|
||
|
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
|
||
|
__attribute__((diagnose_if(1, "too many arguments", "error")));
|
||
|
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
|
||
|
__attribute__((overloadable))
|
||
|
__attribute__((diagnose_if(
|
||
|
__open_modes_useful(flags),
|
||
|
"'open' called with O_CREAT or O_TMPFILE, but missing mode",
|
||
|
"error"))) {
|
||
|
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
|
||
|
return __open_2(pathname, flags);
|
||
|
#else
|
||
|
return __open_real(pathname, flags);
|
||
|
#endif
|
||
|
}
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
|
||
|
__attribute__((overloadable))
|
||
|
__clang_warning_if(!__open_modes_useful(flags) && modes,
|
||
|
"'open' has superfluous mode bits; missing O_CREAT?") {
|
||
|
return __open_real(pathname, flags, modes);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Which may be a lot to take in.
|
||
|
|
||
|
Before diving too deeply, please note that the remainder of these subsections
|
||
|
assume that the programmer didn't make any egregious typos. Moreover, there's no
|
||
|
real way that Bionic tries to prevent calls to `open` like
|
||
|
`open("foo", 0, "how do you convert a const char[N] to mode_t?");`. The only
|
||
|
real C-compatible solution the author can think of is "stamp out many overloads
|
||
|
to catch sort-of-common instances of this very uncommon typo." This isn't great.
|
||
|
|
||
|
More directly, no effort is made below to recognize calls that, due to
|
||
|
incompatible argument types, cannot go to any `open` implementation other than
|
||
|
`__builtin_open`, since it's recognized right here. :)
|
||
|
|
||
|
### Implementation breakdown
|
||
|
|
||
|
This `open` implementation does a few things:
|
||
|
- Turns calls to `open` with too many arguments into a compile-time error.
|
||
|
- Diagnoses calls to `open` with missing modes at compile-time and run-time
|
||
|
(both cases turn into errors).
|
||
|
- Emits warnings on calls to `open` with useless mode bits, unless the mode bits
|
||
|
are all 0.
|
||
|
|
||
|
One common bit of code not explained below is the `__open_real` declaration above:
|
||
|
```c
|
||
|
int __open_real(const char*, int, ...) __asm__("open");
|
||
|
```
|
||
|
|
||
|
This exists as a way for us to call `__builtin_open` without needing clang to
|
||
|
have a pre-defined `__builtin_open` function.
|
||
|
|
||
|
#### Compile-time error on too many arguments
|
||
|
|
||
|
```c
|
||
|
static
|
||
|
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
|
||
|
__attribute__((diagnose_if(1, "too many arguments", "error")));
|
||
|
```
|
||
|
|
||
|
This overload matches most calls to `open` that supply too many arguments, since
|
||
|
`int(const char *, int, ...)` matches less strongly than
|
||
|
`int(const char *, int, mode_t, ...)` for calls where the 3rd arg can be
|
||
|
converted to `mode_t` without too much effort. Because of the `diagnose_if`
|
||
|
attribute, all of these calls turn into compile-time errors.
|
||
|
|
||
|
#### Compile-time or run-time error on missing arguments
|
||
|
The following overload handles all two-argument calls to `open`.
|
||
|
```c
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
|
||
|
__attribute__((overloadable))
|
||
|
__attribute__((diagnose_if(
|
||
|
__open_modes_useful(flags),
|
||
|
"'open' called with O_CREAT or O_TMPFILE, but missing mode",
|
||
|
"error"))) {
|
||
|
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
|
||
|
return __open_2(pathname, flags);
|
||
|
#else
|
||
|
return __open_real(pathname, flags);
|
||
|
#endif
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Like `mempcpy`, `diagnose_if` handles emitting a compile-time error if the call
|
||
|
to `open` is broken in a way that's visible to Clang's frontend. This
|
||
|
essentially boils down to "`open` is being called with a `flags` value that
|
||
|
requires mode bits to be set."
|
||
|
|
||
|
If that fails to catch a bug, we [unconditionally call `__open_2`], which
|
||
|
performs a run-time check:
|
||
|
```c
|
||
|
int __open_2(const char* pathname, int flags) {
|
||
|
if (needs_mode(flags)) __fortify_fatal("open: called with O_CREAT/O_TMPFILE but no mode");
|
||
|
return FDTRACK_CREATE_NAME("open", __openat(AT_FDCWD, pathname, force_O_LARGEFILE(flags), 0));
|
||
|
}
|
||
|
```
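The `needs_mode` helper is not shown in this document. Assuming it tests the same condition as the `__open_modes_useful` macro from the header, a sketch might look like the following. The name `sketch_needs_mode` is hypothetical, and the `#ifdef` is an assumption added so the example also builds where `O_TMPFILE` is unavailable.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the __open_modes_useful condition. */
bool sketch_needs_mode(int flags) {
#ifdef O_TMPFILE
  /*
   * On Linux, O_TMPFILE includes the O_DIRECTORY bit, so a plain bit test
   * would also fire for O_DIRECTORY alone. Compare against the whole value.
   */
  if ((flags & O_TMPFILE) == O_TMPFILE) return true;
#endif
  return (flags & O_CREAT) != 0;
}
```

The same predicate serves both the compile-time `diagnose_if` condition and the run-time check; the only difference is whether `flags` is known to Clang's frontend.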
|
||
|
|
||
|
#### Compile-time warning if modes are pointless
|
||
|
|
||
|
Finally, we have the following `open` overload:
|
||
|
```c
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
|
||
|
__attribute__((overloadable))
|
||
|
__clang_warning_if(!__open_modes_useful(flags) && modes,
|
||
|
"'open' has superfluous mode bits; missing O_CREAT?") {
|
||
|
return __open_real(pathname, flags, modes);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
This simply issues a warning if Clang's frontend can determine that `modes`
|
||
|
isn't necessary. Due to conventions in existing code, a `modes` value of `0` is
|
||
|
not diagnosed.
|
||
|
|
||
|
#### What about `&open`?
|
||
|
One yet-unaddressed aspect of the above is how `&open` works. This is thankfully
|
||
|
a short answer:
|
||
|
- It happens that `open` takes a parameter of type `const char*`.
|
||
|
- It happens that `pass_object_size` -- an attribute only applicable to
|
||
|
parameters of type `T*` -- makes it impossible to take the address of a
|
||
|
function.
|
||
|
|
||
|
Since clang doesn't support a "this function should never have its address
|
||
|
taken," attribute, Bionic uses the next best thing: `pass_object_size`. :)
|
||
|
|
||
|
## Breakdown of `poll`
|
||
|
|
||
|
(Preemptively: at the time of writing, Clang has no literal `__builtin_poll`
|
||
|
builtin. `__builtin_poll` is referenced below to remain consistent with the
|
||
|
convention established in the Terminology section.)
|
||
|
|
||
|
Bionic's `poll` implementation is closest to `mempcpy` above, though it has a
|
||
|
few interesting aspects worth examining.
|
||
|
|
||
|
The [full header implementation of `poll`] is, with some macros expanded:
|
||
|
```c
|
||
|
#define __bos_fd_count_trivially_safe(bos_val, fds, fd_count) \
|
||
|
(((bos_val) == -1) || \
|
||
|
(__builtin_constant_p(fd_count) && \
|
||
|
(bos_val) >= sizeof(*fds) * (fd_count)))
|
||
|
|
||
|
static
|
||
|
__inline__
|
||
|
__attribute__((no_stack_protector))
|
||
|
__attribute__((always_inline))
|
||
|
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
|
||
|
__attribute__((overloadable))
|
||
|
__attribute__((diagnose_if(
|
||
|
__builtin_object_size(fds, 1) != -1 && __builtin_object_size(fds, 1) < sizeof(*fds) * fd_count,
|
||
|
"in call to 'poll', fd_count is larger than the given buffer",
|
||
|
"error"))) {
|
||
|
size_t bos_fds = __builtin_object_size(fds, 1);
|
||
|
if (!__bos_fd_count_trivially_safe(bos_fds, fds, fd_count)) {
|
||
|
return __poll_chk(fds, fd_count, timeout, bos_fds);
|
||
|
}
|
||
|
return (&poll)(fds, fd_count, timeout);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
To get the commonality with `mempcpy` and `open` out of the way:
|
||
|
- This function is an overload of `__builtin_poll`.
|
||
|
- The signature is the same, modulo the presence of a `pass_object_size`
|
||
|
attribute. Hence, for direct calls, overload resolution will always prefer it
|
||
|
over `__builtin_poll`. Taking the address of `poll` is forbidden, so all
|
||
|
references to `&poll` actually reference `__builtin_poll`.
|
||
|
- When `fds` is too small to hold `fd_count` `pollfd`s, Clang will emit a
|
||
|
compile-time error if possible using `diagnose_if`.
|
||
|
- If this can't be observed until run-time, `__poll_chk` verifies this.
|
||
|
- When `fd_count` is a constant according to `__builtin_constant_p`, this always
|
||
|
compiles into `__poll_chk` for always-broken calls to `poll`, or
|
||
|
`__builtin_poll` for always-safe calls to `poll`.
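As an aside, the size comparison that both the `diagnose_if` condition and the run-time check rely on can be sketched as a standalone predicate. `sketch_fds_fit` is a hypothetical name, and dividing instead of multiplying is a choice made for this sketch to sidestep overflow; it is not necessarily what Bionic does.

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if a buffer of `bos_fds` bytes can hold `fd_count` pollfds.
 * (size_t)-1 means the size is unknown, so nothing can be proven wrong.
 */
bool sketch_fds_fit(size_t bos_fds, nfds_t fd_count) {
  if (bos_fds == (size_t)-1) return true;
  /* Divide rather than multiply so a huge fd_count cannot overflow. */
  return fd_count <= bos_fds / sizeof(struct pollfd);
}
```

Unlike `mempcpy`, where the count is already in bytes, the element count here has to be scaled by `sizeof(struct pollfd)` before it can be compared with an object size.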
|
||
|
|
||
|
The critical bits to highlight here are on this line:
|
||
|
```c
|
||
|
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
|
||
|
```
|
||
|
|
||
|
And this line:
|
||
|
```c
|
||
|
return (&poll)(fds, fd_count, timeout);
|
||
|
```
|
||
|
|
||
|
Starting with the simplest, we call `__builtin_poll` with `(&poll)(...);`. As
|
||
|
referenced above, taking the address of an overloaded function where all but one
|
||
|
overload has a `pass_object_size` attribute on one or more parameters always
|
||
|
resolves to the function without any `pass_object_size` attributes.
|
||
|
|
||
|
The other line deserves a section. The subtlety of it is almost entirely in the
|
||
|
use of `pass_object_size(1)` instead of `pass_object_size(0)` on the `fds`
|
||
|
parameter, and the corresponding use of `__builtin_object_size(fds, 1);` in the
|
||
|
body of `poll`.
|
||
|
|
||
|
### Subtleties of `__builtin_object_size(p, N)`
|
||
|
|
||
|
Earlier in this document, it was said that a full description of each
|
||
|
attribute/builtin necessary to power FORTIFY was out of scope. This is... only
|
||
|
somewhat the case when we talk about `__builtin_object_size` and
|
||
|
`pass_object_size`, especially when their second argument is `1`.
|
||
|
|
||
|
#### tl;dr
|
||
|
`__builtin_object_size(p, N)` and `pass_object_size(N)`, where `(N & 1) == 1`,
|
||
|
can only be accurately determined by Clang. LLVM's `@llvm.objectsize` intrinsic
|
||
|
ignores the value of `N & 1`, since handling `(N & 1) == 1` accurately requires
|
||
|
data that's currently entirely inaccessible to LLVM, and that is difficult to
|
||
|
preserve through LLVM's optimization passes.
|
||
|
|
||
|
`pass_object_size`'s "lifting" of the evaluation of
|
||
|
`__builtin_object_size(p, N)` to the caller is critical, since it allows Clang
|
||
|
full visibility into the expression passed to e.g., `poll(&foo->bar, baz, qux)`.
|
||
|
It's not a perfect solution, but it allows `N == 1` to be fully accurate in at
|
||
|
least some cases.

#### Background
Clang's implementation of `__builtin_object_size` aims to be compatible with
GCC's, which has [a decent bit of documentation]. Put simply,
`__builtin_object_size(p, N)` is intended to evaluate at compile-time how many
bytes can be accessed starting at `p` in a well-defined way. Straightforward
examples of this are:
```c
char buf[8];
assert(__builtin_object_size(buf, N) == 8);
assert(__builtin_object_size(buf + 1, N) == 7);
```

This should hold for all values of N that are valid to pass to
`__builtin_object_size`. The `N` value of `__builtin_object_size` is a two-bit
mask of settings, so valid values are 0 through 3.
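
For a compilable take on the snippet above, here's a rough sketch that spells
out all four modes by hand, since `N` has to be an integer constant. The
function names and the `out` array are made up for illustration, and
`&buf[1]` is used as a stand-in for `buf + 1`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: records __builtin_object_size(buf, N) for N = 0..3.
// N must be a constant, so each mode is written out explicitly.
void sizes_at_start(size_t out[4]) {
  char buf[8];
  out[0] = __builtin_object_size(buf, 0);
  out[1] = __builtin_object_size(buf, 1);
  out[2] = __builtin_object_size(buf, 2);
  out[3] = __builtin_object_size(buf, 3);
}

// Same idea, but starting one byte into the buffer.
void sizes_one_byte_in(size_t out[4]) {
  char buf[8];
  out[0] = __builtin_object_size(&buf[1], 0);
  out[1] = __builtin_object_size(&buf[1], 1);
  out[2] = __builtin_object_size(&buf[1], 2);
  out[3] = __builtin_object_size(&buf[1], 3);
}
```

For a plain array like this, every mode should agree: the whole variable and
the closest surrounding subobject are the same thing, and there is only one
possible value of the pointer.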

##### (N & 2) == ?

This is mostly for completeness' sake; in Bionic's FORTIFY implementation, N is
always either 0 or 1.

If there are multiple possible values of `p` in a call to
`__builtin_object_size(p, N)`, the second bit in `N` determines the behavior of
the compiler. If `(N & 2) == 0`, `__builtin_object_size` should return the
greatest size across all possible values of `p`. Otherwise, it should return the
least. For example:

```c
char smol_buf[7];
char buf[8];
char *p = rand() ? smol_buf : buf;
assert(__builtin_object_size(p, 0) == 8);
assert(__builtin_object_size(p, 2) == 7);
```
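
The same bit also picks the "I don't know" answer when nothing at all can be
determined about `p`: per GCC's documentation, that's `(size_t)-1` when
`(N & 2) == 0`, and `0` otherwise. A minimal sketch, with illustrative function
names; `noinline` is only there so that the optimizer can't look through the
call and learn what `p` points to:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helpers. Inside these functions, `p` is just "some pointer".
__attribute__((noinline)) size_t max_size_of_unknown(const char *p) {
  // (N & 2) == 0 asks for an upper bound; "unknown" is the largest size_t.
  return __builtin_object_size(p, 0);
}

__attribute__((noinline)) size_t min_size_of_unknown(const char *p) {
  // (N & 2) != 0 asks for a lower bound; "unknown" is zero.
  return __builtin_object_size(p, 2);
}
```

Either way, the "unknown" value is the one that can never cause a false
positive in a bounds check written against that mode.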

##### (N & 1) == 0

`__builtin_object_size(p, 0)` is more or less as simple as the example in the
Background section directly above. When Clang attempts to evaluate
`__builtin_object_size(p, 0);` and when LLVM tries to determine the result of a
corresponding `@llvm.objectsize` call, they search for the storage underlying
the pointer in question. If that can be determined, Clang or LLVM can provide an
answer; otherwise, they cannot.

##### (N & 1) == 1, and the true magic of pass_object_size

`__builtin_object_size(p, 1)` has a less uniform implementation between LLVM and
Clang. According to GCC's documentation, "If the least significant bit [of
__builtin_object_size's second argument] is clear, objects are whole variables,
if it is set, a closest surrounding subobject is considered the object a pointer
points to."

The "closest surrounding subobject" means that `(N & 1) == 1` depends on type
information in order to operate in many cases. Consider the following examples:
```c
struct Foo {
  int a;
  int b;
};

struct Foo foo;
assert(__builtin_object_size(&foo, 0) == sizeof(foo));
assert(__builtin_object_size(&foo, 1) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 0) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 1) == sizeof(int));

struct Foo foos[2];
assert(__builtin_object_size(&foos[0], 0) == 2 * sizeof(foo));
// An array element's closest surrounding subobject is the array itself.
assert(__builtin_object_size(&foos[0], 1) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0].a, 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0].a, 1) == sizeof(int));
```

...And perhaps somewhat surprisingly:
```c
void example(struct Foo *foo) {
  // (As a reminder, `-1` is "I don't know" when `(N & 2) == 0`.)
  assert(__builtin_object_size(foo, 0) == -1);
  assert(__builtin_object_size(foo, 1) == -1);
  assert(__builtin_object_size(&foo->a, 0) == -1);
  assert(__builtin_object_size(&foo->a, 1) == sizeof(int));
}
```
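
To poke at these cases in a real build, something like the sketch below should
do. The globals and function names are invented for this example; `noinline`
keeps the struct behind the pointer parameter opaque regardless of optimization
level. The last helper is the "surprising" case, and its result is the one most
likely to vary between compilers.

```c
#include <assert.h>
#include <stddef.h>

struct Foo {
  int a;
  int b;
};

// Illustrative globals; nothing here comes from Bionic.
struct Foo g_foo;
struct Foo g_foos[2];

// Whole variable: bytes from `&g_foo.a` to the end of `g_foo`.
size_t whole_object_from_a(void) { return __builtin_object_size(&g_foo.a, 0); }

// Closest surrounding subobject: just the `a` member.
size_t subobject_from_a(void) { return __builtin_object_size(&g_foo.a, 1); }

// Whole variable: bytes from the second element to the end of the array.
size_t whole_array_from_elem1(void) {
  return __builtin_object_size(&g_foos[1].a, 0);
}

// Closest surrounding subobject: the `a` member of that one element.
size_t subobject_from_elem1(void) {
  return __builtin_object_size(&g_foos[1].a, 1);
}

// The storage behind `foo` is unknown here, so the whole-object size is too.
__attribute__((noinline)) size_t whole_object_of_unknown(struct Foo *foo) {
  return __builtin_object_size(&foo->a, 0);
}

// The type of `foo->a` alone is enough to bound the subobject in Clang.
__attribute__((noinline)) size_t subobject_of_unknown(struct Foo *foo) {
  return __builtin_object_size(&foo->a, 1);
}
```

Note that `subobject_of_unknown` gets its answer purely from type information,
which is exactly the kind of data the next paragraphs are concerned with.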

In Clang, [this type-aware requirement poses problems for us]: Clang's frontend
knows everything we could possibly want about the types of variables, but
optimizations are only performed by LLVM. LLVM has no reliable source for C or
C++ data types, so calls to `__builtin_object_size(p, N)` that cannot be
resolved by Clang are lowered to the equivalent of
`__builtin_object_size(p, N & ~1)` in LLVM IR.

Moreover, Clang's frontend is the best-equipped part of the compiler to
accurately determine the answer for `__builtin_object_size(p, N)`, given we know
what `p` is. LLVM is the best-equipped part of the compiler to determine the
value of `p`. Since the frontend runs before LLVM does, this ordering is
unfortunate.

This is where `pass_object_size(N)` comes in. To summarize [the docs for
`pass_object_size`], it evaluates `__builtin_object_size(p, N)` within the
context of the caller of the function annotated with `pass_object_size`, and
passes the value of that into the callee as an invisible parameter. All calls to
`__builtin_object_size(parameter, N)` within the callee are substituted with
references to this invisible parameter.

Putting this plainly, Clang's frontend struggles to evaluate the following:
```c
int foo(void *p) {
  return __builtin_object_size(p, 1);
}

void bar() {
  struct { int i, j; } k;
  // The frontend can't figure this interprocedural objectsize out, so it gets lowered to
  // LLVM, which determines that the answer here is sizeof(k).
  int baz = foo(&k.i);
}
```

However, with the magic of `pass_object_size`, we get one level of inlining to
look through:
```c
int foo(void *const __attribute__((pass_object_size(1))) p) {
  return __builtin_object_size(p, 1);
}

void bar() {
  struct { int i, j; } k;
  // Due to pass_object_size, this is equivalent to:
  // int baz = foo(&k.i, __builtin_object_size(&k.i, 1));
  // ...and `int foo(void *)` is actually equivalent to:
  // int foo(void *const, size_t size) {
  //   return size;
  // }
  int baz = foo(&k.i);
}
```

So we can obtain an accurate result, `sizeof(int)`, in this case.
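
Here's a side-by-side sketch of the two variants that can be compiled and run.
Everything named below (`PASS_OBJECT_SIZE`, `HAVE_PASS_OBJECT_SIZE`,
`struct Pair`, the function names) is invented for this example. The helpers
are marked `noinline` so that the result doesn't depend on optimization level;
without that, an optimizing build could inline the first helper and let LLVM
answer `sizeof(struct Pair)`, as described above.

```c
#include <assert.h>
#include <stddef.h>

// pass_object_size is Clang-specific. Elsewhere, this sketch degrades to the
// "unknown" answer rather than failing to build.
#if defined(__clang__)
#define HAVE_PASS_OBJECT_SIZE 1
#define PASS_OBJECT_SIZE(n) __attribute__((pass_object_size(n)))
#else
#define HAVE_PASS_OBJECT_SIZE 0
#define PASS_OBJECT_SIZE(n)
#endif

struct Pair {
  int i, j;
};

// Evaluated in the callee: nothing is known about `p` here.
__attribute__((noinline)) size_t size_in_callee(void *p) {
  return __builtin_object_size(p, 1);
}

// Evaluated in the caller, then handed over as a hidden argument.
__attribute__((noinline)) size_t size_from_caller(
    void *const p PASS_OBJECT_SIZE(1)) {
  return __builtin_object_size(p, 1);
}

size_t without_pass_object_size(void) {
  struct Pair k = {0, 0};
  return size_in_callee(&k.i);
}

size_t with_pass_object_size(void) {
  struct Pair k = {0, 0};
  return size_from_caller(&k.i);
}
```

With Clang, `with_pass_object_size()` should report `sizeof(int)`, because the
frontend sees `&k.i` at the call site. `without_pass_object_size()` only ever
sees an opaque `void *`, so it reports "unknown."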

##### What about pass_object_size(0)?
It's sort of tangential, but if you find yourself wondering about the utility of
`pass_object_size(0)` ... it's somewhat split. `pass_object_size(0)` in Bionic's
FORTIFY exists mostly for visual consistency, simplicity, and as a useful way to
have e.g., `&mempcpy` == `&__builtin_mempcpy`.

Outside of these fringe benefits, all of the functions with
`pass_object_size(0)` on parameters are marked with `always_inline`, so
"lifting" the `__builtin_object_size` call isn't ultimately very helpful. In
theory, users can always have something like:

```c
// In some_header.h
// This function does cool and interesting things with the `__builtin_object_size` of its parameter,
// and is able to work with that as though the function were defined inline.
void out_of_line_function(void *__attribute__((pass_object_size(0))));
```

Though the author isn't aware of uses like this in practice, beyond a few folks
on LLVM's mailing list seeming interested in trying it someday.
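
For the curious, one shape such an out-of-line function could take is sketched
below. This is purely hypothetical: `checked_zero` and the two macros are
invented here, and nothing like it exists in Bionic. `noinline` stands in for
"defined in some other translation unit."

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// pass_object_size is Clang-specific; without it, the check below never fires.
#if defined(__clang__)
#define HAVE_PASS_OBJECT_SIZE 1
#define PASS_OBJECT_SIZE(n) __attribute__((pass_object_size(n)))
#else
#define HAVE_PASS_OBJECT_SIZE 0
#define PASS_OBJECT_SIZE(n)
#endif

// Zeroes `n` bytes at `dst`. Returns 0 on success, or -1 (without writing
// anything) if `n` exceeds the object size that the caller computed.
__attribute__((noinline)) int checked_zero(void *const dst PASS_OBJECT_SIZE(0),
                                           size_t n) {
  size_t dst_size = __builtin_object_size(dst, 0);
  // (size_t)-1 means "unknown", in which case there's nothing to check.
  if (dst_size != (size_t)-1 && n > dst_size) return -1;
  memset(dst, 0, n);
  return 0;
}
```

The callee never sees the caller's buffer declaration, yet with Clang it can
still reject `checked_zero(buf, sizeof(buf) + 1)`, since the size travels along
with the pointer.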

#### Wrapping up
In the (long) section above, two things were covered:
- The use of `(&poll)(...);` is a convenient shorthand for calling
  `__builtin_poll`.
- `__builtin_object_size(p, N)` with `(N & 1) == 1` is not easy for Clang to
  answer accurately, since it relies on type info only available in the
  frontend, and it sometimes relies on optimizations only available in the
  middle-end. `pass_object_size` helps mitigate this.

## Miscellaneous Notes
The above should be a roughly comprehensive view of how FORTIFY works in the
real world. The main thing it fails to mention is the use of
[the `diagnose_as_builtin` attribute] in Clang.

As time has moved on, Clang has increasingly gained support for emitting
warnings that were previously emitted by FORTIFY machinery.
`diagnose_as_builtin` allows us to remove the `diagnose_if`s from some of the
`static inline` overloads of stdlib functions above, so Clang may diagnose them
instead.

Clang's built-in diagnostics are often better than `diagnose_if` diagnostics,
since Clang can format its diagnostics to include e.g., information about the
sizes of buffers in a suspect call to a function. `diagnose_if` can only have
the compiler output constant strings.
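
As a rough illustration of the attribute's shape (not of how Bionic uses it),
here's a hypothetical `memset` wrapper. `my_memset` and `DIAGNOSE_AS_MEMSET`
are made-up names, and the guard assumes that compilers lacking the attribute
should simply skip it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// diagnose_as_builtin needs a reasonably recent Clang (the docs linked from
// this document are for 14.0.0). Expand to nothing everywhere else.
#if defined(__has_attribute)
#if __has_attribute(diagnose_as_builtin)
#define DIAGNOSE_AS_MEMSET \
  __attribute__((diagnose_as_builtin(__builtin_memset, 1, 2, 3)))
#endif
#endif
#ifndef DIAGNOSE_AS_MEMSET
#define DIAGNOSE_AS_MEMSET
#endif

// The indices say which of this function's parameters plays the role of
// memset's 1st, 2nd, and 3rd argument, respectively.
DIAGNOSE_AS_MEMSET
void *my_memset(void *s, int c, size_t n) {
  return memset(s, c, n);
}
```

With the attribute in place, a call that Clang can prove overflows its
destination should be reported with Clang's own formatted `memset` diagnostic,
without any `diagnose_if` on the wrapper.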

[ChromeOS' Glibc patch]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/90fa9b27731db10a6010c7f7c25b24028145b091/sys-libs/glibc/files/local/glibc-2.33/0007-glibc-add-clang-style-FORTIFY.patch
[FORTIFY'ed implementation of `open`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/fcntl.h#41
[FORTIFY'ed version of `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/string.h#45
[a decent bit of documentation]: https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
[an implementation for `__mempcpy_chk`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/fortify.cpp#501
[full header implementation of `poll`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/poll.h#43
[incompatible with stricter versions of FORTIFY checking]: https://godbolt.org/z/fGfEYxfnf
[similar to C++11's `std::unique_ptr`]: https://stackoverflow.com/questions/58339165/why-can-a-t-be-passed-in-register-but-a-unique-ptrt-cannot
[source for `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/string.h#55
[the `diagnose_as_builtin` attribute]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#diagnose-as-builtin
[the docs for `pass_object_size`]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#pass-object-size-pass-dynamic-object-size
[this type-aware requirement poses problems for us]: https://github.com/llvm/llvm-project/issues/55742
[unconditionally call `__open_2`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/open.cpp#70