38be11e88c
It's a common cause of confusion, and even a brief explanation can be quite involved, so it's worth having something we can point to (and something that interested parties might just find via a web search). Bug: http://b/207248554 Test: treehugger Change-Id: I4a6d8917baf99a8f7abef05ce852a31ebe048d68
91 lines
4.1 KiB
Markdown
# EINTR

## The problem

If your code is blocked in a system call when a signal needs to be delivered,
the kernel needs to interrupt that system call. For something like a read(2)
call where some data has already been read, the call can just return with
what data it has. (This is one reason why read(2) sometimes returns less data
than you asked for, even though more data is available. It also explains why
such behavior is relatively rare, and therefore a cause of bugs: code that
assumes a full read usually works fine in testing.)

But what if read(2) hasn't read any data yet? Or what if you've made some other
system call for which there is no equivalent "partial" success, such as
poll(2)? In poll(2)'s case, there's either something to report (in which
case the system call would already have returned), or there isn't.

The kernel's solution to this problem is to return failure (-1) and set
errno to `EINTR`: "interrupted system call".
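To see the problem for yourself, here's a minimal sketch for Linux/Android
(other POSIX systems should behave the same way). It installs a do-nothing
`SIGALRM` handler with sigaction(2), deliberately without `SA_RESTART` (with
that flag, the kernel would restart the read(2) and, since nothing is ever
written to the pipe, it would never return). It then asks for the signal with
alarm(2) and blocks in read(2) on an empty pipe. `demo_interrupted_read` and
`on_alarm` are hypothetical names for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

// A do-nothing handler: its only job is to make SIGALRM "caught"
// rather than ignored or fatal.
static void on_alarm(int sig) { (void)sig; }

// Blocks in read(2) on an empty pipe until SIGALRM arrives, and returns
// the errno that read(2) failed with (or 0 if it unexpectedly succeeded).
int demo_interrupted_read(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;  // No SA_RESTART, so the kernel won't restart read(2) for us.
  if (sigaction(SIGALRM, &sa, NULL) == -1) return errno;

  int fds[2];
  if (pipe(fds) == -1) return errno;

  alarm(1);  // Ask for SIGALRM in about a second.
  char buf[16];
  // Nothing is ever written to fds[1], so this blocks until the signal.
  ssize_t n = read(fds[0], buf, sizeof(buf));
  int saved_errno = (n == -1) ? errno : 0;

  close(fds[0]);
  close(fds[1]);
  signal(SIGALRM, SIG_DFL);  // Restore the default disposition.
  return saved_errno;
}
```

Calling `demo_interrupted_read()` should return `EINTR` after about a second.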
### Can I just opt out?

Technically, yes. In practice on Android, no. Technically, if a signal's
disposition is set to ignore, the kernel doesn't even have to deliver the
signal, so your code can just stay blocked in the system call it was already
making. In practice, though, you'd have to guarantee that every signal is
either ignored or kills your process, and unless you're a small
single-threaded C program that doesn't use any libraries, you can't
realistically make that guarantee. If any code has installed a signal handler,
you need to cope with `EINTR`. And if you're an Android app, the zygote has
already installed a whole host of signal handlers before your code even
starts to run. (And, no, you can't ignore them instead, because some of them
are critical to how ART works. For example, ART optimizes Java
`NullPointerException`s by trapping `SIGSEGV` signals, so that the code
generated by the JIT doesn't have to insert explicit null pointer checks.)
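You can ask whether a given signal currently has a handler (and so could
interrupt your system calls) by calling sigaction(2) with a null new action,
which only reports the current disposition. A minimal sketch follows;
`has_signal_handler` and `count_signals_with_handlers` are hypothetical
helpers, not library APIs. Note that the answer is only a snapshot: any
library can install a handler later, which is exactly why you can't rely on
it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>

// Returns true if `sig` currently has a handler installed, i.e. if a
// system call blocked in this process could fail with EINTR because of it.
bool has_signal_handler(int sig) {
  struct sigaction current;
  // A null new action means "just tell me the current disposition".
  if (sigaction(sig, NULL, &current) == -1) {
    return false;  // Invalid signal, or one reserved by libc.
  }
  if (current.sa_flags & SA_SIGINFO) {
    return true;  // The handler lives in sa_sigaction.
  }
  return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}

// Counts the signals (1 to NSIG - 1) that currently have handlers.
int count_signals_with_handlers(void) {
  int count = 0;
  for (int sig = 1; sig < NSIG; ++sig) {
    if (has_signal_handler(sig)) ++count;
  }
  return count;
}
```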
### Why don't I see this in Java code?

You won't see this in Java because the decision was made to hide this issue
from Java programmers: libraries such as `java.io.*` and `java.net.*` deal
with `EINTR` for you. (The same should be true of `android.*` too, so it's
worth filing bugs if you find any undocumented cases where `EINTR` leaks
through!)
### Why doesn't libc do that too?

For most people, things would be easier if libc hid this implementation
detail. But there are legitimate uses for `EINTR`, and automatically retrying
would rule them out. For example, you might want to use signals and `EINTR`
to interrupt another thread (in fact, that's how interruption of threads
doing I/O works in Java behind the scenes!). As usual, C/C++ choose the more
powerful but more error-prone option.
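Interrupting another thread this way can be sketched as follows. This is an
illustrative sketch of the general technique, not ART's actual
implementation; `interrupt_blocked_reader`, `reader_thread`, `wake_handler`,
and the choice of `SIGUSR1` are assumptions for the example. One thread
blocks in read(2) on a pipe that nobody writes to; another sets a stop flag
and sends the reader `SIGUSR1` with pthread_kill(3). Because the handler is
installed without `SA_RESTART`, read(2) fails with `EINTR` and the reader's
loop sees the flag. The signal is re-sent until the reader exits, because it
could otherwise land in the window between the reader checking the flag and
blocking in read(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_int stop_requested;
static atomic_int reader_done;

// Does nothing: the point is only to make blocking system calls fail with EINTR.
static void wake_handler(int sig) { (void)sig; }

static void* reader_thread(void* arg) {
  int fd = *(const int*)arg;
  char buf[64];
  while (!atomic_load(&stop_requested)) {
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n == -1 && errno == EINTR) continue;  // Woken up: re-check the flag.
    if (n <= 0) break;  // EOF or a real error.
    // A real program would process buf[0..n) here.
  }
  atomic_store(&reader_done, 1);
  return NULL;
}

// Returns 0 once the blocked reader thread has been interrupted and joined.
int interrupt_blocked_reader(void) {
  atomic_store(&stop_requested, 0);
  atomic_store(&reader_done, 0);

  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = wake_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;  // Deliberately no SA_RESTART.
  if (sigaction(SIGUSR1, &sa, NULL) == -1) return -1;

  int fds[2];
  if (pipe(fds) == -1) return -1;  // Nothing is ever written, so reads block.

  pthread_t reader;
  if (pthread_create(&reader, NULL, reader_thread, &fds[0]) != 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  const struct timespec tick = {0, 10 * 1000 * 1000};  // 10ms
  nanosleep(&tick, NULL);  // Give the reader time to block in read(2).
  atomic_store(&stop_requested, 1);
  // Keep signaling until the reader has actually left its loop.
  while (!atomic_load(&reader_done)) {
    pthread_kill(reader, SIGUSR1);
    nanosleep(&tick, NULL);
  }
  pthread_join(reader, NULL);
  close(fds[0]);
  close(fds[1]);
  return 0;
}
```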
## The fix

### Easy cases

In most cases, the fix is simple: wrap the system call with the
`TEMP_FAILURE_RETRY` macro. This is basically a while loop that retries the
system call as long as the result is -1 and errno is `EINTR`.

So, for example:

```
n = read(fd, buf, buf_size); // BAD!
n = TEMP_FAILURE_RETRY(read(fd, buf, buf_size)); // GOOD!
```
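To make "basically a while loop" concrete, here is a sketch of an equivalent
macro. `MY_TEMP_FAILURE_RETRY` and `read_retrying` are hypothetical names;
this is not the exact definition from bionic or glibc, and real code should
use `TEMP_FAILURE_RETRY` from `<unistd.h>` (glibc only declares it when
`_GNU_SOURCE` is defined). The sketch relies on the GCC/Clang
statement-expression and `__typeof__` extensions so that, like the real
macro, it can be used as an expression.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

// Evaluates `exp` repeatedly while it returns -1 with errno == EINTR, and
// yields the final result. ({ ... }) makes the block an expression, and
// __typeof__ keeps the result's original type (ssize_t, int, pid_t, ...).
#define MY_TEMP_FAILURE_RETRY(exp)                 \
  ({                                               \
    __typeof__(exp) tfr_result_;                   \
    do {                                           \
      tfr_result_ = (exp);                         \
    } while (tfr_result_ == -1 && errno == EINTR); \
    tfr_result_;                                   \
  })

// Example use: a read(2) that is transparently retried after EINTR.
// This does nothing about short reads, which the caller must still handle.
ssize_t read_retrying(int fd, void* buf, size_t count) {
  return MY_TEMP_FAILURE_RETRY(read(fd, buf, count));
}
```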
### close(2)

TL;DR: *never* wrap close(2) calls with `TEMP_FAILURE_RETRY`.

The case of close(2) is complicated. POSIX explicitly says that close(2)
shouldn't close the file descriptor if it returns `EINTR`, but that's *not*
true on Linux (and thus on Android). See
[Returning EINTR from close()](https://lwn.net/Articles/576478/)
for more discussion.

Given that most Android code (and in particular every app) is multithreaded,
retrying close(2) is especially dangerous: the first call has already
released the file descriptor, so another thread may have been handed the
same number in the meantime. The "retry" then succeeds, but actually closes
a *different* file descriptor belonging to a *different* thread.
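A minimal sketch of the safe pattern, assuming Linux/Android semantics (the
descriptor is released even when close(2) fails with `EINTR`). `close_once`
is a hypothetical helper. Reporting `EINTR` as success is one possible policy
for callers that only care whether the descriptor is gone; it does discard
the fact that the close was interrupted.

```c
#include <errno.h>
#include <unistd.h>

// Closes fd exactly once. On Linux (and so on Android), the descriptor has
// been released even if close(2) fails with EINTR, so retrying could close
// an unrelated descriptor that another thread has just been given.
int close_once(int fd) {
  int rc = close(fd);  // Never TEMP_FAILURE_RETRY(close(fd)).
  if (rc == -1 && errno == EINTR) {
    return 0;  // Policy choice for this sketch: the fd is gone, so report success.
  }
  return rc;
}
```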
### Timeouts

System calls with timeouts are the other interesting case where "just wrap
everything with `TEMP_FAILURE_RETRY()`" doesn't work. Because some amount of
time will already have elapsed when the call is interrupted, you'll want to
recalculate the timeout before retrying. Otherwise you can end up with your
one-minute timeout being indefinite if, say, you're receiving signals at least
once per minute: each retry would start a fresh one-minute wait. In this case
you'll want to add an explicit loop around your system call, calculate the
remaining timeout _inside_ the loop, and use `continue` each time the system
call fails with `EINTR`.
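Here is a sketch of that loop for poll(2). `poll_with_deadline` and `now_ms`
are hypothetical helpers. The sketch assumes `CLOCK_MONOTONIC` (so changes to
the wall-clock time don't move the deadline) and millisecond granularity, and
it treats a negative timeout as "wait forever", as poll(2) itself does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

// Milliseconds on a clock that isn't affected by changes to the wall-clock time.
static int64_t now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

// Like poll(2), but EINTR doesn't restart the full timeout: the deadline is
// fixed once, and the remaining time is recomputed on each iteration.
// A negative timeout_ms means "wait forever", as for poll(2).
int poll_with_deadline(struct pollfd* fds, nfds_t nfds, int timeout_ms) {
  const int64_t deadline = now_ms() + timeout_ms;
  for (;;) {
    int remaining = -1;
    if (timeout_ms >= 0) {
      int64_t left = deadline - now_ms();
      remaining = left > 0 ? (int)left : 0;  // 0: just check, don't block.
    }
    int rc = poll(fds, nfds, remaining);
    if (rc == -1 && errno == EINTR) {
      continue;  // Interrupted: go round again with the recomputed timeout.
    }
    return rc;
  }
}
```

Once the deadline has passed, the loop still makes one non-blocking poll(2)
call, so a descriptor that became ready at the last moment is reported
rather than lost.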