This document was originally written for a broad audience, and it was determined that it'd be good to hold in Bionic's docs, too. Due to the ever-changing nature of code, it tries to link to a stable tag of Bionic's libc, rather than the live code in Bionic. Same for Clang. Reader beware. :)
# The Anatomy of Clang FORTIFY

## Objective
The intent of this document is to run through the minutiae of how Clang FORTIFY actually works in Bionic at the time of writing. Other FORTIFY implementations that target Clang should use very similar mechanics. This document exists in part because many Clang-specific features serve multiple purposes simultaneously, so getting up-to-speed on how things function can be quite difficult.
## Background
FORTIFY is a broad suite of extensions to libc aimed at catching misuses of common library functions. Textually, these extensions exist purely in libc, but all implementations of FORTIFY rely heavily on C language extensions in order to function at all.
Broadly, FORTIFY implementations try to guard against many misuses of C standard(-ish) libraries:
- Buffer overruns in functions where pointers+sizes are passed (e.g., `memcpy`,
  `poll`), or where sizes exist implicitly (e.g., `strcpy`).
- Arguments with incorrect values passed to libc functions (e.g., out-of-bounds
  bits in `umask`).
- Missing arguments to functions (e.g., `open()` with `O_CREAT`, but no mode
  bits).
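To make these classes of misuse concrete, here's a small, hypothetical snippet
(not taken from Bionic or its tests; the path and values are made up) that a
FORTIFY-enabled build can flag. With `-D_FORTIFY_SOURCE=2` and optimizations
enabled, each call below is either rejected at compile-time or trapped at
run-time, depending on what the compiler can prove:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

void fortify_misuse_examples(void) {
  char buf[8];
  memcpy(buf, "0123456789abcdef", 16);  // Buffer overrun; the size is passed explicitly.
  strcpy(buf, "0123456789abcdef");      // Buffer overrun; the size is implicit in the string.
  umask(0123456);                       // Incorrect value: bits outside of 0777.
  open("/tmp/example", O_CREAT);        // Missing argument: O_CREAT requires a mode.
}
```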
FORTIFY is traditionally enabled by passing `-D_FORTIFY_SOURCE=N` to your
compiler. `N==0` disables FORTIFY, whereas `N==1`, `N==2`, and `N==3` enable
increasingly strict versions of it. In general, FORTIFY doesn't require user
code changes; that said, some code patterns are incompatible with stricter
versions of FORTIFY checking. This is largely because FORTIFY has significant
flexibility in what it considers to be an "out-of-bounds" access.
FORTIFY implementations use a mix of compiler diagnostics and runtime checks to flag and/or mitigate the impacts of the misuses mentioned above.
Further, given FORTIFY's design, the effectiveness of FORTIFY is a function of
-- among other things -- the optimization level you're compiling your code at.
Many FORTIFY implementations are implicitly disabled when building with `-O0`,
since FORTIFY's design for both Clang and GCC relies on optimizations in order
to provide useful run-time checks. For the purpose of this document, all
analysis of FORTIFY functions and commentary on builtins assume that code is
being built with some optimization level > `-O0`.
### A note on GCC
This document talks specifically about Bionic's FORTIFY implementation targeted at Clang. While GCC also provides a set of language extensions necessary to implement FORTIFY, these tools are different from what Clang offers. This divergence is an artifact of Clang and GCC's differing architecture as compilers.
Textually, quite a bit can be shared between a FORTIFY implementation for GCC and one for Clang (e.g., see ChromeOS' Glibc patch), but this kind of sharing requires things like macros that expand to unbalanced braces depending on your compiler:
```c
/*
 * Highly simplified; if you're interested in FORTIFY's actual implementation,
 * please see the patch linked above.
 */
#ifdef __clang__
# define FORTIFY_PRECONDITIONS
# define FORTIFY_FUNCTION_END
#else
# define FORTIFY_PRECONDITIONS {
# define FORTIFY_FUNCTION_END }
#endif

/*
 * FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN is not defined, due to its
 * complexity and irrelevance. It turns into a compile-time warning if the
 * compiler can determine `*buf` has fewer than `size` bytes available.
 */
char *getcwd(char *buf, size_t size)
    FORTIFY_PRECONDITIONS
    FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN(buf, size, "`buf` is too smol.")
{
  // Actual shared function implementation goes here.
}
FORTIFY_FUNCTION_END
```
All talk of GCC-focused implementations and how to merge Clang and GCC implementations is out-of-scope for this doc, however.
## The Life of a Clang FORTIFY Function
As referenced in the Background section, FORTIFY performs many different checks for many functions. This section intends to go through real-world examples of FORTIFY functions in Bionic, breaking down how each part of these functions works, and how the pieces fit together to provide FORTIFY-like functionality.
While FORTIFY implementations may differ between stdlibs, they broadly follow the same patterns when implementing their checks for Clang, and they try to make similar promises with respect to FORTIFY compiling to be zero-overhead in some cases, etc. Moreover, while this document specifically examines Bionic, many stdlibs will operate very similarly to Bionic in their Clang FORTIFY implementations.
In general, when reading the below, be prepared for exceptions, subtlety, and corner cases. The individual function breakdowns below try to not offer redundant information. Each one focuses on different aspects of FORTIFY.
### Terminology
Because FORTIFY should be mostly transparent to developers, there are inherent
naming collisions here: `memcpy(x, y, z)` turns into fundamentally different
generated code depending on the value of `_FORTIFY_SOURCE`. Further, said
`memcpy` call with `_FORTIFY_SOURCE` enabled needs to be able to refer to the
`memcpy` that would have been called, had `_FORTIFY_SOURCE` been disabled.
Hence, the following convention is followed in the subsections below for all
prose (namely, multiline code blocks are exempted from this):
- Standard library function names preceded by `__builtin_` refer to the use of
  the function with `_FORTIFY_SOURCE` disabled.
- Standard library function names without a prefix refer to the use of the
  function with `_FORTIFY_SOURCE` enabled.
This convention also applies in `clang`. `__builtin_memcpy` will always call
`memcpy` as though `_FORTIFY_SOURCE` were disabled.
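As a small, contrived illustration of this convention (this snippet isn't from
Bionic; it just shows the two spellings side by side), assuming
`_FORTIFY_SOURCE` is enabled:

```c
#include <string.h>

void convention_example(char* dst, const char* src, size_t n) {
  memcpy(dst, src, n);            // "memcpy" in this document: the FORTIFY-enabled call.
  __builtin_memcpy(dst, src, n);  // "__builtin_memcpy": always behaves as if FORTIFY were disabled.
}
```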
### Breakdown of `mempcpy`
The FORTIFY'ed version of `mempcpy` is a full, featureful example of a
FORTIFY'ed function from Bionic. From the user's perspective, it supports a few
things:
- Producing a compile-time error if the number of bytes to copy trivially
  exceeds the number of bytes available at the destination pointer.
- If the `mempcpy` has the potential to write to more bytes than what is
  available at the destination, a run-time check is inserted to crash the
  program if more bytes are written than what is allowed.
- Compiling away to be zero overhead when none of the buffer sizes can be
  determined at compile-time.[^1]
The declaration in Bionic's headers for `__builtin_mempcpy` is:

```c
void* mempcpy(void* __dst, const void* __src, size_t __n) __INTRODUCED_IN(23);
```
Which is annotated with nothing special, except for Bionic's versioner, which is Android-specific (and orthogonal to FORTIFY anyway), so it will be ignored.
The source for `mempcpy` in Bionic's headers is:
```c
__BIONIC_FORTIFY_INLINE
void* mempcpy(void* const dst __pass_object_size0, const void* src, size_t copy_amount)
        __overloadable
        __clang_error_if(__bos_unevaluated_lt(__bos0(dst), copy_amount),
                         "'mempcpy' called with size bigger than buffer") {
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
  size_t bos_dst = __bos0(dst);
  if (!__bos_trivially_ge(bos_dst, copy_amount)) {
    return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
  }
#endif
  return __builtin_mempcpy(dst, src, copy_amount);
}
```
Expanding some of the important macros here, this function expands to roughly:
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
void* mempcpy(
    void* const dst __attribute__((pass_object_size(0))),
    const void* src,
    size_t copy_amount)
        __attribute__((overloadable))
        __attribute__((diagnose_if(
            __builtin_object_size(dst, 0) != -1 && __builtin_object_size(dst, 0) < copy_amount,
            "'mempcpy' called with size bigger than buffer",
            "error"))) {
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
  size_t bos_dst = __builtin_object_size(dst, 0);
  if (!(__bos_trivially_ge(bos_dst, copy_amount))) {
    return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
  }
#endif
  return __builtin_mempcpy(dst, src, copy_amount);
}
```
So let's walk through this step by step, to see how FORTIFY does what it says on the tin here.
#### How does Clang select `mempcpy`?
First, it's critical to notice that `mempcpy` is marked `overloadable`. This
function is a `static inline __attribute__((always_inline))` overload of
`__builtin_mempcpy`:
- `__attribute__((overloadable))` allows us to perform overloading in C.
- `__attribute__((overloadable))` mangles all calls to functions marked with
  `__attribute__((overloadable))`.
- `__attribute__((overloadable))` allows exactly one function signature with a
  given name to not be marked with `__attribute__((overloadable))`. Calls to
  this overload will not be mangled.
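As a minimal sketch of these rules in action (the `copy_bytes` function below
is made up for illustration and is not part of Bionic; the unmarked declaration
plays the role of `__builtin_mempcpy`):

```c
#include <stddef.h>

// The one overload *not* marked overloadable: its name isn't mangled, and this
// sketch assumes it's defined somewhere else.
void* copy_bytes(void* dst, const void* src, size_t n);

// A FORTIFY-style overload. Direct calls prefer this one because of the
// pass_object_size parameter.
static inline __attribute__((overloadable, always_inline))
void* copy_bytes(void* const dst __attribute__((pass_object_size(0))),
                 const void* src, size_t n) {
  // A real FORTIFY wrapper would consult __builtin_object_size(dst, 0) here.
  // `&copy_bytes` can't bind to this overload (pass_object_size forbids taking
  // its address), so the call below resolves to the plain declaration above.
  return (&copy_bytes)(dst, src, n);
}
```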
Second, one might note that this `mempcpy` implementation has the same C-level
signature as `__builtin_mempcpy`. `pass_object_size` is a Clang attribute that
is generally needed by FORTIFY, but it carries the side-effect that functions
may be overloaded simply on the presence (or lack of presence) of
`pass_object_size` attributes. Given two overloads of a function that only
differ on the presence of `pass_object_size` attributes, the candidate with
`pass_object_size` attributes is preferred.
Finally, the prior paragraph gets thrown out if one tries to take the address
of `mempcpy`. It is impossible to take the address of a function with one or
more parameters that are annotated with `pass_object_size`. Hence,
`&__builtin_mempcpy == &mempcpy`. Further, because this is an issue of overload
resolution, `(&mempcpy)(x, y, z);` is functionally identical to
`__builtin_mempcpy(x, y, z);`.
All of this accomplishes the following:
- Direct calls to `mempcpy` should call the FORTIFY-protected `mempcpy`.
- Indirect calls to `&mempcpy` should call `__builtin_mempcpy`.
#### How does Clang offer compile-time diagnostics?
Once one is convinced that the FORTIFY-enabled overload of `mempcpy` will be
selected for direct calls, Clang's `diagnose_if` and `__builtin_object_size` do
all of the work from there.
Subtleties here primarily fall out of the discussion in the above section about
`&__builtin_mempcpy == &mempcpy`:
```c
#define _FORTIFY_SOURCE 2
#include <string.h>

void example_code() {
  char buf[4]; // ...Assume sizeof(char) == 1.
  const char input_buf[] = "Hello, World";
  mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
  mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
  __builtin_mempcpy(buf, input_buf, 5); // No compile-time error.
  (&mempcpy)(buf, input_buf, 5); // No compile-time error, since __builtin_mempcpy is selected.
}
```
Otherwise, the rest of this subsection is dedicated to preliminary discussion
about `__builtin_object_size`.
Clang's frontend can do one of two things with `__builtin_object_size(p, n)`:
- Evaluate it as a constant.
  - This can either mean declaring that the number of bytes at `p` is
    definitely impossible to know, so the default value is used, or the number
    of bytes at `p` can be known without optimizations.
- Declare that the expression cannot form a constant, and lower it to
  `@llvm.objectsize`, which is discussed in depth later.
In the examples above, since `diagnose_if` is evaluated with context from the
caller, Clang should be able to trivially determine that `buf` refers to a
`char` array with 4 elements.
The primary consequence of the above is that diagnostics can only be emitted if
no optimizations are required to detect a broken code pattern. To be specific,
clang's constexpr evaluator must be able to determine the logical object that
any given pointer points to in order to fold `__builtin_object_size` to a
constant, non-default answer:
```c
#define _FORTIFY_SOURCE 2
#include <string.h>

void example_code() {
  char buf[4]; // ...Assume sizeof(char) == 1.
  const char input_buf[] = "Hello, World";
  mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
  mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.

  char *buf_ptr = buf;
  mempcpy(buf_ptr, input_buf, 5); // No compile-time error; `buf_ptr`'s target can't be determined.
}
```
#### How does Clang insert run-time checks?
This section expands on the following statement: FORTIFY has zero runtime cost in instances where there is no chance of catching a bug at run-time. Otherwise, it introduces a tiny additional run-time cost to ensure that functions aren't misused.
In prior sections, the following was established:
- `overloadable` and `pass_object_size` prompt Clang to always select this
  overload of `mempcpy` over `__builtin_mempcpy` for direct calls.
- If a call to `mempcpy` was trivially broken, Clang would produce a
  compile-time error, rather than producing a binary.
Hence, the case we're interested in here is one where Clang's frontend selected a FORTIFY'ed function's implementation for a function call, but was unable to find anything seriously wrong with said function call. Since the frontend is powerless to detect bugs at this point, our focus shifts to the mechanisms that LLVM uses to support FORTIFY.
Going back to Bionic's `mempcpy` implementation, we have the following
(ignoring `diagnose_if` and assuming run-time checks are enabled):
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
void* mempcpy(
    void* const dst __attribute__((pass_object_size(0))),
    const void* src,
    size_t copy_amount)
        __attribute__((overloadable)) {
  size_t bos_dst = __builtin_object_size(dst, 0);
  if (bos_dst != -1 &&
      !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
    return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
  }
  return __builtin_mempcpy(dst, src, copy_amount);
}
```
In other words, we have a `static`, `always_inline` function which:
- If `__builtin_object_size(dst, 0)` cannot be determined (in which case, it
  returns -1), calls `__builtin_mempcpy`.
- Otherwise, if `copy_amount` can be folded to a constant, and if
  `__builtin_object_size(dst, 0) >= copy_amount`, calls `__builtin_mempcpy`.
- Otherwise, calls `__builtin___mempcpy_chk`.
How can this be "zero overhead"? Let's focus on the following part of the function:
```c
size_t bos_dst = __builtin_object_size(dst, 0);
if (bos_dst != -1 &&
    !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
```
If Clang's frontend cannot determine a value for `__builtin_object_size`, Clang
lowers it to LLVM's `@llvm.objectsize` intrinsic. The `@llvm.objectsize`
invocation corresponding to `__builtin_object_size(p, 0)` is guaranteed to
always fold to a constant value by the time LLVM emits machine code.

Hence, `bos_dst` is guaranteed to be a constant; if it's -1, the above branch
can be eliminated entirely, since it folds to `if (false && ...)`. Further, the
RHS of the `&&` in this branch has us call `__builtin_mempcpy` if `copy_amount`
is a known value no greater than `bos_dst` (yet another constant value).
Therefore, the entire condition is always knowable when LLVM is done with LLVM
IR-level optimizations, so no condition is ever emitted to machine code in
practice.
Why is "zero overhead" in quotes? Why is unique_ptr
relevant?
`__builtin_object_size` and `__builtin_constant_p` are forced to be constants
after most optimizations take place. Until LLVM replaces both of these with
constants and optimizes them out, we have additional branches and function
calls in our IR. This can have negative effects, such as distorting inlining
costs and inhibiting optimizations that are conservative around branches in
control-flow.
So FORTIFY is free in these cases only when considered in isolation from the code around it. Due to its implementation, it may impact the optimizations that occur on code surrounding the literal call to the FORTIFY-hardened libc function.
`unique_ptr` was just the first thing that came to the author's mind for "the
type should be zero cost with any level of optimization enabled, but edge-cases
might make it only-mostly-free to use."
#### How is checking actually performed?
In cases where checking can be performed (e.g., where we call
`__builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);`), Bionic provides an
implementation for `__mempcpy_chk`. This is:
extern "C" void* __mempcpy_chk(void* dst, const void* src, size_t count, size_t dst_len) {
__check_count("mempcpy", "count", count);
__check_buffer_access("mempcpy", "write into", count, dst_len);
return mempcpy(dst, src, count);
}
This function itself boils down to a few small branches which abort the program
if they fail, and a direct call to `__builtin_mempcpy`.
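For reference, here's a rough sketch of what one of those branches looks like.
It's modeled on Bionic's fortify run-time helpers, but the exact signature and
message format below are illustrative rather than authoritative:

```c
#include <stddef.h>

void __fortify_fatal(const char* fmt, ...);

// Roughly what __check_buffer_access boils down to: abort the process with a
// descriptive message if the requested access is larger than the buffer.
static inline void __check_buffer_access(const char* fn, const char* action,
                                         size_t claimed_size, size_t actual_size) {
  if (claimed_size > actual_size) {
    __fortify_fatal("%s: prevented %zu-byte %s %zu-byte buffer",
                    fn, claimed_size, action, actual_size);
  }
}
```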
#### Wrapping up
In the above breakdown, it was shown how Clang and Bionic work together to:
- represent FORTIFY-hardened overloads of functions,
- report misuses of stdlib functions at compile-time, and
- insert run-time checks for uses of functions that might be incorrect, but only if we have the potential of proving the incorrectness of these.
### Breakdown of `open`
In Bionic, the FORTIFY'ed implementation of `open` is quite large. Much like
`mempcpy`, the `__builtin_open` declaration is simple:
```c
int open(const char* __path, int __flags, ...);
```
With some macros expanded, the FORTIFY-hardened header implementation is:
```c
int __open_2(const char*, int);
int __open_real(const char*, int, ...) __asm__("open");

#define __open_modes_useful(flags) (((flags) & O_CREAT) || ((flags) & O_TMPFILE) == O_TMPFILE)

static
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
        __attribute__((diagnose_if(1, "too many arguments", "error")));

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
    __attribute__((overloadable))
    __attribute__((diagnose_if(
        __open_modes_useful(flags),
        "'open' called with O_CREAT or O_TMPFILE, but missing mode",
        "error"))) {
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
  return __open_2(pathname, flags);
#else
  return __open_real(pathname, flags);
#endif
}

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
    __attribute__((overloadable))
    __clang_warning_if(!__open_modes_useful(flags) && modes,
                       "'open' has superfluous mode bits; missing O_CREAT?") {
  return __open_real(pathname, flags, modes);
}
```
Which may be a lot to take in.
Before diving too deeply, please note that the remainder of these subsections
assume that the programmer didn't make any egregious typos. Moreover, there's
no real way that Bionic tries to prevent calls to `open` like
`open("foo", 0, "how do you convert a const char[N] to mode_t?");`. The only
real C-compatible solution the author can think of is "stamp out many overloads
to catch sort-of-common instances of this very uncommon typo." This isn't
great.

More directly, no effort is made below to recognize calls that, due to
incompatible argument types, cannot go to any `open` implementation other than
`__builtin_open`, since it's recognized right here. :)
#### Implementation breakdown
This `open` implementation does a few things:
- Turns calls to `open` with too many arguments into a compile-time error.
- Diagnoses calls to `open` with missing modes at compile-time and run-time
  (both cases turn into errors).
- Emits warnings on calls to `open` with useless mode bits, unless the mode
  bits are all 0.
One common bit of code not explained below is the `__open_real` declaration
above:

```c
int __open_real(const char*, int, ...) __asm__("open");
```
This exists as a way for us to call `__builtin_open` without needing clang to
have a pre-defined `__builtin_open` function.
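The `__asm__("...")` label isn't FORTIFY-specific; it simply renames the symbol
a declaration refers to. As a standalone sketch (the `real_getpid` and
`current_pid` names are made up for illustration):

```c
#include <sys/types.h>

// Calls to `real_getpid` are emitted as calls to the symbol "getpid", even
// though the C-level name is different. __open_real works the same way for
// the "open" symbol.
pid_t real_getpid(void) __asm__("getpid");

pid_t current_pid(void) {
  return real_getpid();
}
```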
#### Compile-time error on too many arguments
```c
static
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
        __attribute__((diagnose_if(1, "too many arguments", "error")));
```
Which matches most calls to `open` that supply too many arguments, since
`int(const char *, int, ...)` matches less strongly than
`int(const char *, int, mode_t, ...)` for calls where the 3rd arg can be
converted to `mode_t` without too much effort. Because of the `diagnose_if`
attribute, all of these calls turn into compile-time errors.
#### Compile-time or run-time error on missing arguments
The following overload handles all two-argument calls to `open`.
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
    __attribute__((overloadable))
    __attribute__((diagnose_if(
        __open_modes_useful(flags),
        "'open' called with O_CREAT or O_TMPFILE, but missing mode",
        "error"))) {
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
  return __open_2(pathname, flags);
#else
  return __open_real(pathname, flags);
#endif
}
```
Like `mempcpy`, `diagnose_if` handles emitting a compile-time error if the call
to `open` is broken in a way that's visible to Clang's frontend. This
essentially boils down to "`open` is being called with a `flags` value that
requires mode bits to be set."
If that fails to catch a bug, we unconditionally call `__open_2`, which
performs a run-time check:
```cpp
int __open_2(const char* pathname, int flags) {
  if (needs_mode(flags)) __fortify_fatal("open: called with O_CREAT/O_TMPFILE but no mode");
  return FDTRACK_CREATE_NAME("open", __openat(AT_FDCWD, pathname, force_O_LARGEFILE(flags), 0));
}
```
#### Compile-time warning if modes are pointless
Finally, we have the following `open` overload:
```c
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
    __attribute__((overloadable))
    __clang_warning_if(!__open_modes_useful(flags) && modes,
                       "'open' has superfluous mode bits; missing O_CREAT?") {
  return __open_real(pathname, flags, modes);
}
```
This simply issues a warning if Clang's frontend can determine that `modes`
isn't necessary. Due to conventions in existing code, a `modes` value of `0` is
not diagnosed.
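Putting the three overloads together, here's a set of hypothetical call sites
(not taken from Bionic's tests) and the diagnostic each one gets with
`_FORTIFY_SOURCE=2` enabled:

```c
#include <fcntl.h>

void open_examples(void) {
  open("/tmp/a", O_RDONLY);                  // OK: no mode needed.
  open("/tmp/b", O_CREAT | O_WRONLY);        // Error: O_CREAT requires a mode.
  open("/tmp/c", O_CREAT | O_WRONLY, 0644);  // OK: mode supplied.
  open("/tmp/d", O_RDONLY, 0644);            // Warning: superfluous mode bits; missing O_CREAT?
  open("/tmp/e", O_RDONLY, 0644, "extra");   // Error: too many arguments.
}
```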
#### What about `&open`?
One yet-unaddressed aspect of the above is how `&open` works. This is
thankfully a short answer:
- It happens that `open` takes a parameter of type `const char*`.
- It happens that `pass_object_size` -- an attribute only applicable to
  parameters of type `T*` -- makes it impossible to take the address of a
  function.
Since clang doesn't support a "this function should never have its address
taken" attribute, Bionic uses the next best thing: `pass_object_size`. :)
### Breakdown of `poll`
(Preemptively: at the time of writing, Clang has no literal `__builtin_poll`
builtin. `__builtin_poll` is referenced below to remain consistent with the
convention established in the Terminology section.)
Bionic's `poll` implementation is closest to `mempcpy` above, though it has a
few interesting aspects worth examining.
The full header implementation of `poll` is, with some macros expanded:
```c
#define __bos_fd_count_trivially_safe(bos_val, fds, fd_count)  \
  (((bos_val) == -1) ||                                        \
   (__builtin_constant_p(fd_count) &&                          \
    (bos_val) >= sizeof(*fds) * (fd_count)))

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
    __attribute__((overloadable))
    __attribute__((diagnose_if(
        __builtin_object_size(fds, 1) != -1 && __builtin_object_size(fds, 1) < sizeof(*fds) * fd_count,
        "in call to 'poll', fd_count is larger than the given buffer",
        "error"))) {
  size_t bos_fds = __builtin_object_size(fds, 1);
  if (!__bos_fd_count_trivially_safe(bos_fds, fds, fd_count)) {
    return __poll_chk(fds, fd_count, timeout, bos_fds);
  }
  return (&poll)(fds, fd_count, timeout);
}
```
To get the commonality with `mempcpy` and `open` out of the way:
- This function is an overload with `__builtin_poll`.
- The signature is the same, modulo the presence of a `pass_object_size`
  attribute. Hence, for direct calls, overload resolution will always prefer it
  over `__builtin_poll`. Taking the address of `poll` is forbidden, so all
  references to `&poll` actually reference `__builtin_poll`.
- When `fds` is too small to hold `fd_count` `pollfd`s, Clang will emit a
  compile-time error if possible using `diagnose_if`.
- If this can't be observed until run-time, `__poll_chk` verifies this.
- When `fd_count` is a constant according to `__builtin_constant_p`, this
  always compiles into `__poll_chk` for always-broken calls to `poll`, or
  `__builtin_poll` for always-safe calls to `poll`.
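To make those properties concrete before we look at the lines that power them,
here are a few hypothetical call sites (assuming `-O2` and `_FORTIFY_SOURCE=2`;
these aren't from Bionic's tests):

```c
#include <poll.h>

int poll_examples(struct pollfd* unknown_fds, nfds_t unknown_count) {
  struct pollfd fds[2];
  poll(fds, 2, -1);                     // OK: compiles straight to __builtin_poll.
  poll(fds, 3, -1);                     // Compile-time error: `fds` can't hold 3 pollfds.
  poll(unknown_fds, 8, -1);             // bos(unknown_fds) == -1: no check is possible.
  return poll(fds, unknown_count, -1);  // fd_count isn't constant: checked by __poll_chk.
}
```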
The critical bits to highlight here are on this line:
```c
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
```
And this line:
```c
  return (&poll)(fds, fd_count, timeout);
```
Starting with the simplest, we call `__builtin_poll` with `(&poll)(...);`. As
referenced above, taking the address of an overloaded function where all but
one overload has a `pass_object_size` attribute on one or more parameters
always resolves to the function without any `pass_object_size` attributes.
The other line deserves a section. The subtlety of it is almost entirely in the
use of `pass_object_size(1)` instead of `pass_object_size(0)` on the `fds`
parameter, and the corresponding use of `__builtin_object_size(fds, 1);` in the
body of `poll`.
#### Subtleties of `__builtin_object_size(p, N)`
Earlier in this document, it was said that a full description of each
attribute/builtin necessary to power FORTIFY was out of scope. This is... only
somewhat the case when we talk about `__builtin_object_size` and
`pass_object_size`, especially when their second argument is `1`.
##### tl;dr
`__builtin_object_size(p, N)` and `pass_object_size(N)`, where `(N & 1) == 1`,
can only be accurately determined by Clang. LLVM's `@llvm.objectsize` intrinsic
ignores the value of `N & 1`, since handling `(N & 1) == 1` accurately requires
data that's currently entirely inaccessible to LLVM, and that is difficult to
preserve through LLVM's optimization passes.
`pass_object_size`'s "lifting" of the evaluation of
`__builtin_object_size(p, N)` to the caller is critical, since it allows Clang
full visibility into the expression passed to e.g., `poll(&foo->bar, baz, qux)`.
It's not a perfect solution, but it allows `N == 1` to be fully accurate in at
least some cases.
##### Background
Clang's implementation of `__builtin_object_size` aims to be compatible with
GCC's, which has a decent bit of documentation. Put simply,
`__builtin_object_size(p, N)` is intended to evaluate at compile-time how many
bytes can be accessed after `p` in a well-defined way. Straightforward examples
of this are:
```c
char buf[8];
assert(__builtin_object_size(buf, N) == 8);
assert(__builtin_object_size(buf + 1, N) == 7);
```
This should hold for all values of `N` that are valid to pass to
`__builtin_object_size`. The `N` value of `__builtin_object_size` is a mask of
settings.
##### `(N & 2) == ?`
This is mostly for completeness' sake; in Bionic's FORTIFY implementation, `N` is always either 0 or 1.
If there are multiple possible values of `p` in a call to
`__builtin_object_size(p, N)`, the second bit in `N` determines the behavior of
the compiler. If `(N & 2) == 0`, `__builtin_object_size` should return the
greatest possible size for each possible value of `p`. Otherwise, it should
return the least possible value. For example:
```c
char smol_buf[7];
char buf[8];

char *p = rand() ? smol_buf : buf;
assert(__builtin_object_size(p, 0) == 8);
assert(__builtin_object_size(p, 2) == 7);
```
##### `(N & 1) == 0`
`__builtin_object_size(p, 0)` is more or less as simple as the example in the
Background section directly above. When Clang attempts to evaluate
`__builtin_object_size(p, 0)`, and when LLVM tries to determine the result of a
corresponding `@llvm.objectsize` call, they search for the storage underlying
the pointer in question. If that can be determined, Clang or LLVM can provide
an answer; otherwise, they cannot.
##### `(N & 1) == 1`, and the true magic of `pass_object_size`
`__builtin_object_size(p, 1)` has a less uniform implementation between LLVM
and Clang. According to GCC's documentation, "If the least significant bit [of
`__builtin_object_size`'s second argument] is clear, objects are whole
variables, if it is set, a closest surrounding subobject is considered the
object a pointer points to."

The "closest surrounding subobject" means that `(N & 1) == 1` depends on type
information in order to operate in many cases. Consider the following examples:
```c
struct Foo {
  int a;
  int b;
};

struct Foo foo;
assert(__builtin_object_size(&foo, 0) == sizeof(foo));
assert(__builtin_object_size(&foo, 1) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 0) == sizeof(foo));
assert(__builtin_object_size(&foo.a, 1) == sizeof(int));

struct Foo foos[2];
assert(__builtin_object_size(&foos[0], 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0], 1) == sizeof(foo));
assert(__builtin_object_size(&foos[0].a, 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0].a, 1) == sizeof(int));
```
...And perhaps somewhat surprisingly:
```c
void example(struct Foo *foo) {
  // (As a reminder, `-1` is "I don't know" when `(N & 2) == 0`.)
  assert(__builtin_object_size(foo, 0) == -1);
  assert(__builtin_object_size(foo, 1) == -1);
  assert(__builtin_object_size(&foo->a, 0) == -1);
  assert(__builtin_object_size(&foo->a, 1) == sizeof(int));
}
```
In Clang, this type-aware requirement poses problems for us: Clang's frontend
knows everything we could possibly want about the types of variables, but
optimizations are only performed by LLVM. LLVM has no reliable source for C or
C++ data types, so calls to `__builtin_object_size(p, N)` that cannot be
resolved by clang are lowered to the equivalent of
`__builtin_object_size(p, N & ~1)` in LLVM IR.
Moreover, Clang's frontend is the best-equipped part of the compiler to
accurately determine the answer for `__builtin_object_size(p, N)`, given we
know what `p` is. LLVM is the best-equipped part of the compiler to determine
the value of `p`. This ordering issue is unfortunate.
This is where `pass_object_size(N)` comes in. To summarize the docs for
`pass_object_size`, it evaluates `__builtin_object_size(p, N)` within the
context of the caller of the function annotated with `pass_object_size`, and
passes the value of that into the callee as an invisible parameter. All calls
to `__builtin_object_size(parameter, N)` are substituted with references to
this invisible parameter.
Putting this plainly, Clang's frontend struggles to evaluate the following:
```c
int foo(void *p) {
  return __builtin_object_size(p, 1);
}

void bar() {
  struct { int i, j; } k;
  // The frontend can't figure this interprocedural objectsize out, so it gets
  // lowered to LLVM, which determines that the answer here is sizeof(k).
  int baz = foo(&k.i);
}
```
However, with the magic of `pass_object_size`, we get one level of inlining to
look through:
```c
int foo(void *const __attribute__((pass_object_size(1))) p) {
  return __builtin_object_size(p, 1);
}

void bar() {
  struct { int i, j; } k;
  // Due to pass_object_size, this is equivalent to:
  //   int baz = foo(&k.i, __builtin_object_size(&k.i, 1));
  // ...and `int foo(void *)` is actually equivalent to:
  //   int foo(void *const, size_t size) {
  //     return size;
  //   }
  int baz = foo(&k.i);
}
```
So we can obtain an accurate result in this case.
##### What about `pass_object_size(0)`?
It's sort of tangential, but if you find yourself wondering about the utility
of `pass_object_size(0)`... it's somewhat split. `pass_object_size(0)` in
Bionic's FORTIFY exists mostly for visual consistency, simplicity, and as a
useful way to have e.g., `&mempcpy == &__builtin_mempcpy`.
Outside of these fringe benefits, all of the functions with
`pass_object_size(0)` on parameters are marked with `always_inline`, so
"lifting" the `__builtin_object_size` call isn't ultimately very helpful. In
theory, users can always have something like:
```c
// In some_header.h

// This function does cool and interesting things with the `__builtin_object_size`
// of its parameter, and is able to work with that as though the function were
// defined inline.
void out_of_line_function(void *__attribute__((pass_object_size(0))));
```
Though the author isn't aware of uses like this in practice, beyond a few folks on LLVM's mailing list seeming interested in trying it someday.
#### Wrapping up
In the (long) section above, two things were covered:
- The use of `(&poll)(...);` is a convenient shorthand for calling
  `__builtin_poll`.
- `__builtin_object_size(p, N)` with `(N & 1) == 1` is not easy for Clang to
  answer accurately, since it relies on type info only available in the
  frontend, and it sometimes relies on optimizations only available in the
  middle-end. `pass_object_size` helps mitigate this.
## Miscellaneous Notes
The above should be a roughly comprehensive view of how FORTIFY works in the
real world. The main thing it fails to mention is the use of the
`diagnose_as_builtin` attribute in Clang.
As time has moved on, Clang has increasingly gained support for emitting
warnings that were previously emitted by FORTIFY machinery.
`diagnose_as_builtin` allows us to remove the `diagnose_if`s from some of the
`static inline` overloads of stdlib functions above, so Clang may diagnose them
instead.
Clang's built-in diagnostics are often better than `diagnose_if` diagnostics,
since Clang can format its diagnostics to include e.g., information about the
sizes of buffers in a suspect call to a function. `diagnose_if` can only have
the compiler output constant strings.
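As a rough sketch of what this looks like (this isn't Bionic's actual header,
and the `fortified_memcpy` name is made up; it also assumes a Clang new enough
to support the attribute), `diagnose_as_builtin` tells Clang to diagnose calls
to a function as though they were calls to the named builtin, with the trailing
indices mapping the wrapper's parameters onto the builtin's:

```c
#include <stddef.h>

// Clang diagnoses calls to fortified_memcpy as though they were calls to
// __builtin_memcpy, with parameters 1, 2, and 3 mapped straight across.
void* fortified_memcpy(void* dst, const void* src, size_t n)
    __attribute__((diagnose_as_builtin(__builtin_memcpy, 1, 2, 3)));

void diagnose_as_builtin_example(void) {
  char buf[4];
  fortified_memcpy(buf, "Hello!", 7);  // Warns: 7 bytes written into a 4-byte buffer.
}
```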
[^1]: "Zero overhead" in a way similar to C++11's `std::unique_ptr`: this will
    turn into a direct call to `__builtin_mempcpy` (or an optimized form
    thereof) with no other surrounding checks at runtime. However, the
    additional complexity may hinder optimizations that are performed before
    the optimizer can prove that the `if (...) { ... }` can be optimized out.
    Depending on how late this happens, the additional complexity may skew
    inlining costs, hide opportunities for e.g., `memcpy` coalescing, etc etc.