platform_bionic/docs/clang_fortify_anatomy.md

*This document was originally written for a broad audience, and it was*
*determined that it'd be good to hold in Bionic's docs, too. Due to the*
*ever-changing nature of code, it tries to link to a stable tag of*
*Bionic's libc, rather than the live code in Bionic. Same for Clang.*
*Reader beware. :)*
# The Anatomy of Clang FORTIFY
## Objective
The intent of this document is to run through the minutiae of how Clang FORTIFY
actually works in Bionic at the time of writing. Other FORTIFY implementations
that target Clang should use very similar mechanics. This document exists in part
because many Clang-specific features serve multiple purposes simultaneously, so
getting up-to-speed on how things function can be quite difficult.
## Background
FORTIFY is a broad suite of extensions to libc aimed at catching misuses of
common library functions. Textually, these extensions exist purely in libc, but
all implementations of FORTIFY rely heavily on C language extensions in order
to function at all.
Broadly, FORTIFY implementations try to guard against many misuses of C
standard(-ish) libraries:
- Buffer overruns in functions where pointers+sizes are passed (e.g., `memcpy`,
`poll`), or where sizes exist implicitly (e.g., `strcpy`).
- Arguments with incorrect values passed to libc functions (e.g.,
out-of-bounds bits in `umask`).
- Missing arguments to functions (e.g., `open()` with `O_CREAT`, but no mode
bits).
FORTIFY is traditionally enabled by passing `-D_FORTIFY_SOURCE=N` to your
compiler. `N==0` disables FORTIFY, whereas `N==1`, `N==2`, and `N==3` enable
increasingly strict versions of it. In general, FORTIFY doesn't require user
code changes; that said, some code patterns
are [incompatible with stricter versions of FORTIFY checking]. This is largely
because FORTIFY has significant flexibility in what it considers to be an
"out-of-bounds" access.
FORTIFY implementations use a mix of compiler diagnostics and runtime checks to
flag and/or mitigate the impacts of the misuses mentioned above.
Further, given FORTIFY's design, the effectiveness of FORTIFY is a function of
-- among other things -- the optimization level you're compiling your code at.
Many FORTIFY implementations are implicitly disabled when building with `-O0`,
since FORTIFY's design for both Clang and GCC relies on optimizations in order
to provide useful run-time checks. For the purpose of this document, all
analysis of FORTIFY functions and commentary on builtins assume that code is
being built with some optimization level > `-O0`.
### A note on GCC
This document talks specifically about Bionic's FORTIFY implementation targeted
at Clang. While GCC also provides a set of language extensions necessary to
implement FORTIFY, these tools are different from what Clang offers. This
divergence is an artifact of Clang and GCC's differing architecture as
compilers.
Textually, quite a bit can be shared between a FORTIFY implementation for GCC
and one for Clang (e.g., see [ChromeOS' Glibc patch]), but this kind of sharing
requires things like macros that expand to unbalanced braces depending on your
compiler:
```c
/*
* Highly simplified; if you're interested in FORTIFY's actual implementation,
* please see the patch linked above.
*/
#ifdef __clang__
# define FORTIFY_PRECONDITIONS
# define FORTIFY_FUNCTION_END
#else
# define FORTIFY_PRECONDITIONS {
# define FORTIFY_FUNCTION_END }
#endif
/*
* FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN is not defined, due to its
* complexity and irrelevance. It turns into a compile-time warning if the
* compiler can determine `*buf` has fewer than `size` bytes available.
*/
char *getcwd(char *buf, size_t size)
FORTIFY_PRECONDITIONS
FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN(buf, size, "`buf` is too smol.")
{
// Actual shared function implementation goes here.
}
FORTIFY_FUNCTION_END
```
All talk of GCC-focused implementations and how to merge Clang and GCC
implementations is out-of-scope for this doc, however.
## The Life of a Clang FORTIFY Function
As referenced in the Background section, FORTIFY performs many different checks
for many functions. This section intends to go through real-world examples of
FORTIFY functions in Bionic, breaking down how each part of these functions
works, and how the pieces fit together to provide FORTIFY-like functionality.
While FORTIFY implementations may differ between stdlibs, they broadly follow
the same patterns when implementing their checks for Clang, and they try to
make similar promises with respect to FORTIFY compiling to be zero-overhead in
some cases, etc. Moreover, while this document specifically examines Bionic,
many stdlibs will operate _very similarly_ to Bionic in their Clang FORTIFY
implementations.
**In general, when reading the below, be prepared for exceptions, subtlety, and
corner cases. The individual function breakdowns below try to not offer
redundant information. Each one focuses on different aspects of FORTIFY.**
### Terminology
Because FORTIFY should be mostly transparent to developers, there are inherent
naming collisions here: `memcpy(x, y, z)` turns into fundamentally different
generated code depending on the value of `_FORTIFY_SOURCE`. Further, said
`memcpy` call with `_FORTIFY_SOURCE` enabled needs to be able to refer to the
`memcpy` that would have been called, had `_FORTIFY_SOURCE` been disabled.
Hence, the following convention is followed in the subsections below for all
prose (namely, multiline code blocks are exempted from this):
- Standard library function names preceded by `__builtin_` refer to the use of
the function with `_FORTIFY_SOURCE` disabled.
- Standard library function names without a prefix refer to the use of the
function with `_FORTIFY_SOURCE` enabled.
This convention also applies in `clang`. `__builtin_memcpy` will always call
`memcpy` as though `_FORTIFY_SOURCE` were disabled.
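As a small illustration of the convention, the sketch below copies the same bytes once through the builtin and once through the plain name. The helper names are made up for this example. For valid inputs the two are indistinguishable; the only difference is whether a FORTIFY wrapper from the libc headers gets a chance to sit in between.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: always the unchecked copy, whatever _FORTIFY_SOURCE is. */
static void *copy_with_builtin(void *dst, const void *src, size_t n) {
    return __builtin_memcpy(dst, src, n);
}

/* Hypothetical helper: the plain name, which FORTIFY is free to wrap. */
static void *copy_with_plain_name(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

static void demo_copy(void) {
    char a[6], b[6];
    copy_with_builtin(a, "hello", sizeof(a));
    copy_with_plain_name(b, "hello", sizeof(b));
    /* Both paths copy the same 6 bytes (5 characters plus the NUL). */
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```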
## Breakdown of `mempcpy`
The [FORTIFY'ed version of `mempcpy`] is a full, featureful example of a
FORTIFY'ed function from Bionic. From the user's perspective, it supports a few
things:
- Producing a compile-time error if the number of bytes to copy trivially
exceeds the number of bytes available at the destination pointer.
- If the `mempcpy` has the potential to write to more bytes than what is
available at the destination, a run-time check is inserted to crash the
program if more bytes are written than what is allowed.
- Compiling away to be zero overhead when none of the buffer sizes can be
determined at compile-time[^1].
The declaration in Bionic's headers for `__builtin_mempcpy` is:
```c
void* mempcpy(void* __dst, const void* __src, size_t __n) __INTRODUCED_IN(23);
```
Which is annotated with nothing special, except for Bionic's versioner, which
is Android-specific (and orthogonal to FORTIFY anyway), so it will be ignored.
The [source for `mempcpy`] in Bionic's headers is:
```c
__BIONIC_FORTIFY_INLINE
void* mempcpy(void* const dst __pass_object_size0, const void* src, size_t copy_amount)
__overloadable
__clang_error_if(__bos_unevaluated_lt(__bos0(dst), copy_amount),
"'mempcpy' called with size bigger than buffer") {
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
size_t bos_dst = __bos0(dst);
if (!__bos_trivially_ge(bos_dst, copy_amount)) {
return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
}
#endif
return __builtin_mempcpy(dst, src, copy_amount);
}
```
Expanding some of the important macros here, this function expands to roughly:
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
void* mempcpy(
void* const dst __attribute__((pass_object_size(0))),
const void* src,
size_t copy_amount)
__attribute__((overloadable))
__attribute__((diagnose_if(
__builtin_object_size(dst, 0) != -1 && __builtin_object_size(dst, 0) < copy_amount,
"'mempcpy' called with size bigger than buffer",
"error"))) {
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
size_t bos_dst = __builtin_object_size(dst, 0);
if (!(__bos_trivially_ge(bos_dst, copy_amount))) {
return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
}
#endif
return __builtin_mempcpy(dst, src, copy_amount);
}
```
So let's walk through this step by step, to see how FORTIFY does what it says on
the tin here.
[^1]: "Zero overhead" in a way [similar to C++11's `std::unique_ptr`]: this will
turn into a direct call to `__builtin_mempcpy` (or an optimized form thereof) with
no other surrounding checks at runtime. However, the additional complexity may
hinder optimizations that are performed before the optimizer can prove that the
`if (...) { ... }` can be optimized out. Depending on how late this happens,
the additional complexity may skew inlining costs, hide opportunities for e.g.,
`memcpy` coalescing, etc etc.
### How does Clang select `mempcpy`?
First, it's critical to notice that `mempcpy` is marked `overloadable`. This
function is a `static inline __attribute__((always_inline))` overload of
`__builtin_mempcpy`:
- `__attribute__((overloadable))` allows us to perform overloading in C.
- `__attribute__((overloadable))` mangles all calls to functions marked with
`__attribute__((overloadable))`.
- `__attribute__((overloadable))` allows exactly one function signature with a
given name to not be marked with `__attribute__((overloadable))`. Calls to
this overload will not be mangled.
Second, one might note that this `mempcpy` implementation has the same C-level
signature as `__builtin_mempcpy`. `pass_object_size` is a Clang attribute that
is generally needed by FORTIFY, but it carries the side-effect that functions
may be overloaded simply on the presence (or lack of presence) of
`pass_object_size` attributes. Given two overloads of a function that only
differ on the presence of `pass_object_size` attributes, the candidate with
`pass_object_size` attributes is preferred.
Finally, the prior paragraph gets thrown out if one tries to take the address of
`mempcpy`. It is impossible to take the address of a function with one or more
parameters that are annotated with `pass_object_size`. Hence,
`&__builtin_mempcpy == &mempcpy`. Further, because this is an issue of overload
resolution, `(&mempcpy)(x, y, z);` is functionally identical to
`__builtin_mempcpy(x, y, z);`.
All of this accomplishes the following:
- Direct calls to `mempcpy` should call the FORTIFY-protected `mempcpy`.
- Indirect calls to `&mempcpy` should call `__builtin_mempcpy`.
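The selection rules above can be tried out in isolation. The following is a minimal sketch, not Bionic code: `pick` is a made-up function whose two overloads return different values, so it is visible which one a call lands on. It relies on Clang-only attributes, so it is fenced off for other compilers.

```c
#include <assert.h>

#if defined(__clang__) /* `overloadable` and `pass_object_size` are Clang extensions. */

/* Stand-in for the plain libc declaration: the single unmarked overload. */
static int pick(void *p) {
    (void)p;
    return 0;
}

/* Stand-in for the FORTIFY'ed overload: same C-level signature, but with
 * pass_object_size on the pointer parameter. */
static __inline__ __attribute__((always_inline))
int pick(void *const p __attribute__((pass_object_size(0))))
        __attribute__((overloadable)) {
    (void)p;
    return 1;
}

static int direct_call(void) {
    char buf[4];
    return pick(buf);    /* Overload resolution prefers pass_object_size. */
}

static int indirect_call(void) {
    char buf[4];
    return (&pick)(buf); /* Only the plain overload can have its address taken. */
}

static void demo_selection(void) {
    assert(direct_call() == 1);
    assert(indirect_call() == 0);
}

#endif
```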
### How does Clang offer compile-time diagnostics?
Once one is convinced that the FORTIFY-enabled overload of `mempcpy` will be
selected for direct calls, Clang's `diagnose_if` and `__builtin_object_size` do
all of the work from there.
Subtleties here primarily fall out of the discussion in the above section about
`&__builtin_mempcpy == &mempcpy`:
```c
#define _FORTIFY_SOURCE 2
#include <string.h>
void example_code() {
char buf[4]; // ...Assume sizeof(char) == 1.
const char input_buf[] = "Hello, World";
mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
__builtin_mempcpy(buf, input_buf, 5); // No compile-time error.
(&mempcpy)(buf, input_buf, 5); // No compile-time error, since __builtin_mempcpy is selected.
}
```
Otherwise, the rest of this subsection is dedicated to preliminary discussion
about `__builtin_object_size`.
Clang's frontend can do one of two things with `__builtin_object_size(p, n)`:
- Evaluate it as a constant.
- This can either mean declaring that the number of bytes at `p` is definitely
impossible to know, so the default value is used, or the number of bytes at
`p` can be known without optimizations.
- Declare that the expression cannot form a constant, and lower it to
`@llvm.objectsize`, which is discussed in depth later.
In the examples above, since `diagnose_if` is evaluated with context from the
caller, Clang should be able to trivially determine that `buf` refers to a
`char` array with 4 elements.
The primary consequence of the above is that diagnostics can only be emitted if
no optimizations are required to detect a broken code pattern. To be specific,
clang's constexpr evaluator must be able to determine the logical object that
any given pointer points to in order to fold `__builtin_object_size` to a
constant, non-default answer:
```c
#define _FORTIFY_SOURCE 2
#include <string.h>
void example_code() {
char buf[4]; // ...Assume sizeof(char) == 1.
const char input_buf[] = "Hello, World";
mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
char *buf_ptr = buf;
mempcpy(buf_ptr, input_buf, 5); // No compile-time error; `buf_ptr`'s target can't be determined.
}
```
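To see the two outcomes of `__builtin_object_size` side by side without involving any libc headers, here is a small sketch with made-up function names. It uses a never-inlined function parameter rather than a local `buf_ptr`, so that the optimizer cannot recover the size either, and the result should stay the same at any optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* The array is named directly, so its size is known with no optimization. */
static size_t size_seen_directly(void) {
    char buf[4];
    return __builtin_object_size(buf, 0);
}

/* Never inlined: inside this function nothing is known about what `p`
 * points to, so mode 0 yields its "unknown" value, (size_t)-1. */
__attribute__((noinline))
static size_t size_seen_through_param(char *p) {
    return __builtin_object_size(p, 0);
}

static void demo_object_size(void) {
    char buf[4];
    assert(size_seen_directly() == 4);
    assert(size_seen_through_param(buf) == (size_t)-1);
}
```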
### How does Clang insert run-time checks?
This section expands on the following statement: FORTIFY has zero runtime cost
in instances where there is no chance of catching a bug at run-time. Otherwise,
it introduces a tiny additional run-time cost to ensure that functions aren't
misused.
In prior sections, the following was established:
- `overloadable` and `pass_object_size` prompt Clang to always select this
overload of `mempcpy` over `__builtin_mempcpy` for direct calls.
- If a call to `mempcpy` was trivially broken, Clang would produce a
compile-time error, rather than producing a binary.
Hence, the case we're interested in here is one where Clang's frontend selected
a FORTIFY'ed function's implementation for a function call, but was unable to
find anything seriously wrong with said function call. Since the frontend is
powerless to detect bugs at this point, our focus shifts to the mechanisms that
LLVM uses to support FORTIFY.
Going back to Bionic's `mempcpy` implementation, we have the following (ignoring
diagnose_if and assuming run-time checks are enabled):
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
void* mempcpy(
void* const dst __attribute__((pass_object_size(0))),
const void* src,
size_t copy_amount)
__attribute__((overloadable)) {
size_t bos_dst = __builtin_object_size(dst, 0);
if (bos_dst != -1 &&
!(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
}
return __builtin_mempcpy(dst, src, copy_amount);
}
```
In other words, we have a `static`, `always_inline` function which:
- If `__builtin_object_size(dst, 0)` cannot be determined (in which case, it
returns -1), calls `__builtin_mempcpy`.
- Otherwise, if `copy_amount` can be folded to a constant, and if
`__builtin_object_size(dst, 0) >= copy_amount`, calls `__builtin_mempcpy`.
- Otherwise, calls `__builtin___mempcpy_chk`.
How can this be "zero overhead"? Let's focus on the following part of the
function:
```c
size_t bos_dst = __builtin_object_size(dst, 0);
if (bos_dst != -1 &&
!(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
```
If Clang's frontend cannot determine a value for `__builtin_object_size`, Clang
lowers it to LLVM's `@llvm.objectsize` intrinsic. The `@llvm.objectsize`
invocation corresponding to `__builtin_object_size(p, 0)` is guaranteed to
always fold to a constant value by the time LLVM emits machine code.
Hence, `bos_dst` is guaranteed to be a constant; if it's -1, the above branch
can be eliminated entirely, since it folds to `if (false && ...)`. Further, the
RHS of the `&&` in this branch has us call `__builtin_mempcpy` if `copy_amount`
is a known value no greater than `bos_dst` (yet another constant value). Therefore,
the entire condition is always knowable when LLVM is done with LLVM IR-level
optimizations, so no condition is ever emitted to machine code in practice.
#### Why is "zero overhead" in quotes? Why is `unique_ptr` relevant?
`__builtin_object_size` and `__builtin_constant_p` are forced to be constants
after most optimizations take place. Until LLVM replaces both of these with
constants and optimizes them out, we have additional branches and function calls
in our IR. This can have negative effects, such as distorting inlining costs and
inhibiting optimizations that are conservative around branches in control-flow.
So FORTIFY is free in these cases _in isolation of any of the code around it_.
Due to its implementation, it may impact the optimizations that occur on code
around the literal call to the FORTIFY-hardened libc function.
`unique_ptr` was just the first thing that came to the author's mind for "the
type should be zero cost with any level of optimization enabled, but edge-cases
might make it only-mostly-free to use."
### How is checking actually performed?
In cases where checking can be performed (e.g., where we call
`__builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);`), Bionic provides [an
implementation for `__mempcpy_chk`]. This is:
```c
extern "C" void* __mempcpy_chk(void* dst, const void* src, size_t count, size_t dst_len) {
__check_count("mempcpy", "count", count);
__check_buffer_access("mempcpy", "write into", count, dst_len);
return mempcpy(dst, src, count);
}
```
This function itself boils down to a few small branches which abort the program
if they fail, and a direct call to `__builtin_mempcpy`.
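For a feel of what those "few small branches" amount to, here is a hedged, portable sketch. It is not Bionic's code: the function names are made up, `memcpy` plus pointer arithmetic stands in for `mempcpy`, and the exact limits Bionic's `__check_count` enforces are an assumption (the sketch simply rejects counts above `SIZE_MAX / 2`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical predicate: is a `count`-byte write into a `dst_len`-byte
 * buffer a bug? Absurdly large counts are treated as bugs as well, since
 * they usually come from a negative value converted to size_t. */
static int write_is_out_of_bounds(size_t count, size_t dst_len) {
    return count > SIZE_MAX / 2 || count > dst_len;
}

/* Hypothetical analogue of __mempcpy_chk: abort on misuse, else copy and
 * return a pointer just past the last byte written. */
static void *checked_mempcpy(void *dst, const void *src, size_t count, size_t dst_len) {
    if (write_is_out_of_bounds(count, dst_len)) abort();
    return (char *)memcpy(dst, src, count) + count;
}

static void demo_checked_copy(void) {
    char dst[8];
    void *end = checked_mempcpy(dst, "abc", 4, sizeof(dst));
    assert(end == (void *)(dst + 4));
    assert(strcmp(dst, "abc") == 0);
    assert(!write_is_out_of_bounds(8, 8)); /* Exactly filling the buffer is fine. */
    assert(write_is_out_of_bounds(9, 8));  /* One byte too many is not. */
}
```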
### Wrapping up
In the above breakdown, it was shown how Clang and Bionic work together to:
- represent FORTIFY-hardened overloads of functions,
- report misuses of stdlib functions at compile-time, and
- insert run-time checks for uses of functions that might be incorrect, but only
if we have the potential of proving the incorrectness of these.
## Breakdown of open
In Bionic, the [FORTIFY'ed implementation of `open`] is quite large. Much like
`mempcpy`, the `__builtin_open` declaration is simple:
```c
int open(const char* __path, int __flags, ...);
```
With some macros expanded, the FORTIFY-hardened header implementation is:
```c
int __open_2(const char*, int);
int __open_real(const char*, int, ...) __asm__("open");
#define __open_modes_useful(flags) (((flags) & O_CREAT) || ((flags) & O_TMPFILE) == O_TMPFILE)
static
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
__attribute__((diagnose_if(1, "too many arguments", "error")));
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
__attribute__((overloadable))
__attribute__((diagnose_if(
__open_modes_useful(flags),
"'open' called with O_CREAT or O_TMPFILE, but missing mode",
"error"))) {
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
return __open_2(pathname, flags);
#else
return __open_real(pathname, flags);
#endif
}
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
__attribute__((overloadable))
__clang_warning_if(!__open_modes_useful(flags) && modes,
"'open' has superfluous mode bits; missing O_CREAT?") {
return __open_real(pathname, flags, modes);
}
```
Which may be a lot to take in.
Before diving too deeply, please note that the remainder of these subsections
assume that the programmer didn't make any egregious typos. Moreover, there's no
real way that Bionic tries to prevent calls to `open` like
`open("foo", 0, "how do you convert a const char[N] to mode_t?");`. The only
real C-compatible solution the author can think of is "stamp out many overloads
to catch sort-of-common instances of this very uncommon typo." This isn't great.
More directly, no effort is made below to recognize calls that, due to
incompatible argument types, cannot go to any `open` implementation other than
`__builtin_open`, since it's recognized right here. :)
### Implementation breakdown
This `open` implementation does a few things:
- Turns calls to `open` with too many arguments into a compile-time error.
- Diagnoses calls to `open` with missing modes at compile-time and run-time
(both cases turn into errors).
- Emits warnings on calls to `open` with useless mode bits, unless the mode bits
are all 0.
One common bit of code not explained below is the `__open_real` declaration above:
```c
int __open_real(const char*, int, ...) __asm__("open");
```
This exists as a way for us to call `__builtin_open` without needing clang to
have a pre-defined `__builtin_open` function.
#### Compile-time error on too many arguments
```c
static
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
__attribute__((diagnose_if(1, "too many arguments", "error")));
```
Which matches most calls to open that supply too many arguments, since
`int(const char *, int, ...)` matches less strongly than
`int(const char *, int, mode_t, ...)` for calls where the 3rd arg can be
converted to `mode_t` without too much effort. Because of the `diagnose_if`
attribute, all of these calls turn into compile-time errors.
#### Compile-time or run-time error on missing arguments
The following overload handles all two-argument calls to `open`.
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
__attribute__((overloadable))
__attribute__((diagnose_if(
__open_modes_useful(flags),
"'open' called with O_CREAT or O_TMPFILE, but missing mode",
"error"))) {
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
return __open_2(pathname, flags);
#else
return __open_real(pathname, flags);
#endif
}
```
Like `mempcpy`, `diagnose_if` handles emitting a compile-time error if the call
to `open` is broken in a way that's visible to Clang's frontend. This
essentially boils down to "`open` is being called with a `flags` value that
requires mode bits to be set."
If that fails to catch a bug, we [unconditionally call `__open_2`], which
performs a run-time check:
```c
int __open_2(const char* pathname, int flags) {
if (needs_mode(flags)) __fortify_fatal("open: called with O_CREAT/O_TMPFILE but no mode");
return FDTRACK_CREATE_NAME("open", __openat(AT_FDCWD, pathname, force_O_LARGEFILE(flags), 0));
}
```
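The compile-time condition hinges on the `__open_modes_useful` flag test from the header, whose `== O_TMPFILE` comparison is easy to gloss over. The sketch below reproduces its shape with made-up `DEMO_` constants. The values mirror the layout used on x86 Linux, where `O_TMPFILE` is a private bit OR'ed with `O_DIRECTORY`; they are for illustration only, and real code should take them from `<fcntl.h>`.

```c
#include <assert.h>

/* Illustrative values only (x86 Linux layout); not for use in real code. */
#define DEMO_O_CREAT     0100
#define DEMO_O_DIRECTORY 0200000
#define DEMO_O_TMPFILE   (020000000 | DEMO_O_DIRECTORY)

/* Same shape as __open_modes_useful in the header shown earlier. */
#define demo_modes_useful(flags) \
    (((flags) & DEMO_O_CREAT) || ((flags) & DEMO_O_TMPFILE) == DEMO_O_TMPFILE)

static void demo_flag_test(void) {
    assert(demo_modes_useful(DEMO_O_CREAT));
    assert(demo_modes_useful(DEMO_O_TMPFILE));
    /* O_DIRECTORY alone shares a bit with O_TMPFILE, but needs no mode.
     * A plain `flags & O_TMPFILE` test would get this case wrong. */
    assert(!demo_modes_useful(DEMO_O_DIRECTORY));
    assert(!demo_modes_useful(0));
}
```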
#### Compile-time warning if modes are pointless
Finally, we have the following `open` call:
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
__attribute__((overloadable))
__clang_warning_if(!__open_modes_useful(flags) && modes,
"'open' has superfluous mode bits; missing O_CREAT?") {
return __open_real(pathname, flags, modes);
}
```
This simply issues a warning if Clang's frontend can determine that `modes`
isn't necessary. Due to conventions in existing code, a `modes` value of `0` is
not diagnosed.
#### What about `&open`?
One yet-unaddressed aspect of the above is how `&open` works. This is thankfully
a short answer:
- It happens that `open` takes a parameter of type `const char*`.
- It happens that `pass_object_size` -- an attribute only applicable to
parameters of type `T*` -- makes it impossible to take the address of a
function.
Since clang doesn't support a "this function should never have its address
taken," attribute, Bionic uses the next best thing: `pass_object_size`. :)
## Breakdown of poll
(Preemptively: at the time of writing, Clang has no literal `__builtin_poll`
builtin. `__builtin_poll` is referenced below to remain consistent with the
convention established in the Terminology section.)
Bionic's `poll` implementation is closest to `mempcpy` above, though it has a
few interesting aspects worth examining.
The [full header implementation of `poll`] is, with some macros expanded:
```c
#define __bos_fd_count_trivially_safe(bos_val, fds, fd_count) \
(((bos_val) == -1) || \
(__builtin_constant_p(fd_count) && \
(bos_val) >= sizeof(*fds) * (fd_count)))
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
__attribute__((overloadable))
__attribute__((diagnose_if(
__builtin_object_size(fds, 1) != -1 && __builtin_object_size(fds, 1) < sizeof(*fds) * fd_count,
"in call to 'poll', fd_count is larger than the given buffer",
"error"))) {
size_t bos_fds = __builtin_object_size(fds, 1);
if (!__bos_fd_count_trivially_safe(bos_fds, fds, fd_count)) {
return __poll_chk(fds, fd_count, timeout, bos_fds);
}
return (&poll)(fds, fd_count, timeout);
}
```
To get the commonality with `mempcpy` and `open` out of the way:
- This function is an overload with `__builtin_poll`.
- The signature is the same, modulo the presence of a `pass_object_size`
attribute. Hence, for direct calls, overload resolution will always prefer it
over `__builtin_poll`. Taking the address of `poll` is forbidden, so all
references to `&poll` actually reference `__builtin_poll`.
- When `fds` is too small to hold `fd_count` `pollfd`s, Clang will emit a
compile-time error if possible using `diagnose_if`.
- If this can't be observed until run-time, `__poll_chk` verifies this.
- When `fd_count` is a constant according to `__builtin_constant_p`, this always
compiles into `__poll_chk` for always-broken calls to `poll`, or
`__builtin_poll` for always-safe calls to `poll`.
The critical bits to highlight here are on this line:
```c
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
```
And this line:
```c
return (&poll)(fds, fd_count, timeout);
```
Starting with the simplest, we call `__builtin_poll` with `(&poll)(...);`. As
referenced above, taking the address of an overloaded function where all but one
overload has a `pass_object_size` attribute on one or more parameters always
resolves to the function without any `pass_object_size` attributes.
The other line deserves a section. The subtlety of it is almost entirely in the
use of `pass_object_size(1)` instead of `pass_object_size(0)` on the `fds`
parameter, and the corresponding use of `__builtin_object_size(fds, 1);` in the
body of `poll`.
### Subtleties of __builtin_object_size(p, N)
Earlier in this document, it was said that a full description of each
attribute/builtin necessary to power FORTIFY was out of scope. This is... only
somewhat the case when we talk about `__builtin_object_size` and
`pass_object_size`, especially when their second argument is `1`.
#### tl;dr
`__builtin_object_size(p, N)` and `pass_object_size(N)`, where `(N & 1) == 1`,
can only be accurately determined by Clang. LLVM's `@llvm.objectsize` intrinsic
ignores the value of `N & 1`, since handling `(N & 1) == 1` accurately requires
data that's currently entirely inaccessible to LLVM, and that is difficult to
preserve through LLVM's optimization passes.
`pass_object_size`'s "lifting" of the evaluation of
`__builtin_object_size(p, N)` to the caller is critical, since it allows Clang
full visibility into the expression passed to e.g., `poll(&foo->bar, baz, qux)`.
It's not a perfect solution, but it allows `N == 1` to be fully accurate in at
least some cases.
#### Background
Clang's implementation of `__builtin_object_size` aims to be compatible with
GCC's, which has [a decent bit of documentation]. Put simply,
`__builtin_object_size(p, N)` is intended to evaluate at compile-time how many
bytes can be accessed after `p` in a well-defined way. Straightforward examples
of this are:
```c
char buf[8];
assert(__builtin_object_size(buf, N) == 8);
assert(__builtin_object_size(buf + 1, N) == 7);
```
This should hold for all values of N that are valid to pass to
`__builtin_object_size`. The `N` value of `__builtin_object_size` is a mask of
settings.
##### (N & 2) == ?
This is mostly for completeness' sake; in Bionic's FORTIFY implementation, N is
always either 0 or 1.
If there are multiple possible values of `p` in a call to
`__builtin_object_size(p, N)`, the second bit in `N` determines the behavior of
the compiler. If `(N & 2) == 0`, `__builtin_object_size` should return the
greatest possible size for each possible value of `p`. Otherwise, it should
return the least possible value. For example:
```c
char smol_buf[7];
char buf[8];
char *p = rand() ? smol_buf : buf;
assert(__builtin_object_size(p, 0) == 8);
assert(__builtin_object_size(p, 2) == 7);
```
##### (N & 1) == 0
`__builtin_object_size(p, 0)` is more or less as simple as the example in the
Background section directly above. When Clang attempts to evaluate
`__builtin_object_size(p, 0);` and when LLVM tries to determine the result of a
corresponding `@llvm.objectsize` call, they search for the storage underlying
the pointer in question. If that can be determined, Clang or LLVM can provide an
answer; otherwise, they cannot.
##### (N & 1) == 1, and the true magic of pass_object_size
`__builtin_object_size(p, 1)` has a less uniform implementation between LLVM and
Clang. According to GCC's documentation, "If the least significant bit [of
__builtin_object_size's second argument] is clear, objects are whole variables,
if it is set, a closest surrounding subobject is considered the object a pointer
points to."
The "closest surrounding subobject," means that `(N & 1) == 1` depends on type
information in order to operate in many cases. Consider the following examples:
```c
struct Foo {
int a;
int b;
};
struct Foo foo;
assert(__builtin_object_size(&foo, 0) == sizeof(foo));
assert(__builtin_object_size(&foo, 1) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 0) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 1) == sizeof(int));
struct Foo foos[2];
assert(__builtin_object_size(&foos[0], 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0], 1) == sizeof(foo));
assert(__builtin_object_size(&foos[0].a, 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0].a, 1) == sizeof(int));
```
...And perhaps somewhat surprisingly:
```c
void example(struct Foo *foo) {
// (As a reminder, `-1` is "I don't know" when `(N & 2) == 0`.)
assert(__builtin_object_size(foo, 0) == -1);
assert(__builtin_object_size(foo, 1) == -1);
assert(__builtin_object_size(&foo->a, 0) == -1);
assert(__builtin_object_size(&foo->a, 1) == sizeof(int));
}
```
In Clang, [this type-aware requirement poses problems for us]: Clang's frontend
knows everything we could possibly want about the types of variables, but
optimizations are only performed by LLVM. LLVM has no reliable source for C or
C++ data types, so calls to `__builtin_object_size(p, N)` that cannot be
resolved by clang are lowered to the equivalent of
`__builtin_object_size(p, N & ~1)` in LLVM IR.
Moreover, Clang's frontend is the best-equipped part of the compiler to
accurately determine the answer for `__builtin_object_size(p, N)`, given we know
what `p` is. LLVM is the best-equipped part of the compiler to determine the
value of `p`. This ordering issue is unfortunate.
This is where `pass_object_size(N)` comes in. To summarize [the docs for
`pass_object_size`], it evaluates `__builtin_object_size(p, N)` within the
context of the caller of the function annotated with `pass_object_size`, and
passes the value of that into the callee as an invisible parameter. All calls to
`__builtin_object_size(parameter, N)` are substituted with references to this
invisible parameter.
Putting this plainly, Clang's frontend struggles to evaluate the following:
```c
int foo(void *p) {
return __builtin_object_size(p, 1);
}
void bar() {
struct { int i, j; } k;
// The frontend can't figure this interprocedural objectsize out, so it gets lowered to
// LLVM, which determines that the answer here is sizeof(k).
int baz = foo(&k.i);
}
```
However, with the magic of `pass_object_size`, we get one level of inlining to
look through:
```c
int foo(void *const __attribute__((pass_object_size(1))) p) {
return __builtin_object_size(p, 1);
}
void bar() {
struct { int i, j; } k;
// Due to pass_object_size, this is equivalent to:
// int baz = foo(&k.i, __builtin_object_size(&k.i, 1));
// ...and `int foo(void *)` is actually equivalent to:
// int foo(void *const, size_t size) {
// return size;
// }
int baz = foo(&k.i);
}
```
So we can obtain an accurate result in this case.
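The rewrite described in the comments above can also be spelled out by hand, which may help when experimenting with a compiler that lacks the attribute. This is only a sketch of the equivalence, not what Clang literally emits; `struct two_ints`, `foo_lowered`, `FOO`, and `bar_result` are hypothetical names introduced here.

```c
#include <stddef.h>

struct two_ints {
  int i, j;
};

// Hand-written stand-in for the callee: the object size arrives as an
// explicit parameter instead of an invisible one.
static size_t foo_lowered(void *const p, size_t size) {
  (void)p;
  return size;
}

// Hand-written stand-in for the call-site rewrite: the builtin is evaluated
// in the caller, where the argument expression is still visible.
// __builtin_object_size does not evaluate its argument for side effects, so
// naming `p` twice is harmless here.
#define FOO(p) foo_lowered((p), __builtin_object_size((p), 1))

size_t bar_result(void) {
  struct two_ints k;
  return FOO(&k.i);
}
```

The macro plays the role that the attribute plays in real FORTIFY headers: it moves the `__builtin_object_size` evaluation to the one place where the frontend can still see `&k.i`.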
##### What about pass_object_size(0)?
It's sort of tangential, but if you find yourself wondering about the utility of
`pass_object_size(0)`, the answer is mixed. `pass_object_size(0)` in Bionic's
FORTIFY exists mostly for visual consistency, simplicity, and as a useful way to
have e.g., `&mempcpy` == `&__builtin_mempcpy`.
Outside of these fringe benefits, all of the functions with
`pass_object_size(0)` on parameters are marked with `always_inline`, so
"lifting" the `__builtin_object_size` call isn't ultimately very helpful. In
theory, users can always have something like:
```c
// In some_header.h
// This function does cool and interesting things with the `__builtin_object_size` of its parameter,
// and is able to work with that as though the function were defined inline.
void out_of_line_function(void *__attribute__((pass_object_size(0))));
```
That said, the author isn't aware of uses like this in practice, beyond a few
folks on LLVM's mailing list seeming interested in trying it someday.
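For the curious, here is a rough sketch of what such an out-of-line function might look like. Everything in it is hypothetical and not taken from Bionic: `clamp_len`, `writable_prefix`, `writable_prefix_impl`, `demo`, and the `HAS_PASS_OBJECT_SIZE` macro are names made up for this example. Since `pass_object_size` is a Clang extension, the sketch falls back to an explicit size parameter plus a macro on compilers that don't report support for the attribute.

```c
#include <stddef.h>

#if defined(__has_attribute)
#if __has_attribute(pass_object_size)
#define HAS_PASS_OBJECT_SIZE 1
#endif
#endif

// Clamp a requested length to an object size. (size_t)-1 is the value that
// __builtin_object_size(p, 0) yields when the size is unknown.
size_t clamp_len(size_t requested, size_t object_size) {
  if (object_size == (size_t)-1) return requested;
  return requested < object_size ? requested : object_size;
}

#ifdef HAS_PASS_OBJECT_SIZE
// Never inlined, yet it still sees the caller's view of the object size,
// because that size is passed as an invisible extra argument.
__attribute__((noinline))
size_t writable_prefix(void *const p __attribute__((pass_object_size(0))),
                       size_t n) {
  return clamp_len(n, __builtin_object_size(p, 0));
}
#else
// Fallback: pass the size explicitly and hide that behind a macro.
size_t writable_prefix_impl(void *p, size_t n, size_t object_size) {
  (void)p;
  return clamp_len(n, object_size);
}
#define writable_prefix(p, n) \
  writable_prefix_impl((p), (n), __builtin_object_size((p), 0))
#endif

// Asks how many of 100 requested bytes fit into a 16-byte buffer.
size_t demo(void) {
  char buf[16];
  return writable_prefix(buf, 100);
}
```

The function only reports a length and never writes through `p`, so the sketch stays well-defined even if the compiler cannot determine the size and hands back `(size_t)-1`.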
#### Wrapping up
In the (long) section above, two things were covered:
- The use of `(&poll)(...);` is a convenient shorthand for calling
`__builtin_poll`.
- `__builtin_object_size(p, N)` with `(N & 1) == 1` is not easy for Clang to
answer accurately, since it relies on type info only available in the
frontend, and it sometimes relies on optimizations only available in the
middle-end. `pass_object_size` helps mitigate this.
## Miscellaneous Notes
The above should be a roughly comprehensive view of how FORTIFY works in the
real world. The main thing it fails to mention is the use of
[the `diagnose_as_builtin` attribute] in Clang.
As time has moved on, Clang has increasingly gained support for emitting
warnings that were previously emitted by FORTIFY machinery.
`diagnose_as_builtin` allows us to remove the `diagnose_if`s from some of the
`static inline` overloads of stdlib functions above, so Clang may diagnose them
instead.
Clang's built-in diagnostics are often better than `diagnose_if` diagnostics,
since Clang can format its diagnostics to include e.g., information about the
sizes of buffers in a suspect call to a function. `diagnose_if` can only have
the compiler output constant strings.
[ChromeOS' Glibc patch]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/90fa9b27731db10a6010c7f7c25b24028145b091/sys-libs/glibc/files/local/glibc-2.33/0007-glibc-add-clang-style-FORTIFY.patch
[FORTIFY'ed implementation of `open`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/fcntl.h#41
[FORTIFY'ed version of `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/string.h#45
[a decent bit of documentation]: https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
[an implementation for `__mempcpy_chk`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/fortify.cpp#501
[full header implementation of `poll`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/poll.h#43
[incompatible with stricter versions of FORTIFY checking]: https://godbolt.org/z/fGfEYxfnf
[similar to C++11's `std::unique_ptr`]: https://stackoverflow.com/questions/58339165/why-can-a-t-be-passed-in-register-but-a-unique-ptrt-cannot
[source for `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/string.h#55
[the `diagnose_as_builtin` attribute]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#diagnose-as-builtin
[the docs for `pass_object_size`]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#pass-object-size-pass-dynamic-object-size
[this type-aware requirement poses problems for us]: https://github.com/llvm/llvm-project/issues/55742
[unconditionally call `__open_2`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/open.cpp#70