Add some documentation about EINTR.

It's a common cause of confusion, and even a brief explanation can be quite
involved, so it's worth having something we can point to (and something that
interested parties might just find via a web search).

Bug: http://b/207248554
Test: treehugger
Change-Id: I4a6d8917baf99a8f7abef05ce852a31ebe048d68

parent ad12582726
commit 38be11e88c

1 changed file with 91 additions and 0 deletions

docs/EINTR.md (normal file, 91 additions)

@@ -0,0 +1,91 @@
# EINTR

## The problem

If your code is blocked in a system call when a signal needs to be delivered,
the kernel needs to interrupt that system call. For something like a read(2)
call where some data has already been read, the call can just return with
the data it has. (This is one reason why read(2) sometimes returns less data
than you asked for, even though more data is available. It also explains why
such short reads are relatively rare, and a cause of bugs when they do happen.)

But what if read(2) hasn't read any data yet? Or what if you've made some other
system call for which there is no equivalent "partial" success, such as
poll(2)? In poll(2)'s case, there's either something to report (in which
case the system call would already have returned), or there isn't.

The kernel's solution to this problem is to return failure (-1) and set
errno to `EINTR`: "interrupted system call".

### Can I just opt out?

Technically, yes. In practice on Android, no. If a signal's disposition is
set to ignore, the kernel doesn't even have to deliver the signal, so your
code can just stay blocked in the system call it was already making. To rely
on that, though, you'd have to guarantee that every signal is either ignored
or will kill your process. Unless you're a small single-threaded C program
that doesn't use any libraries, you can't realistically make this
guarantee. If any code has installed a signal handler, you need to cope with
`EINTR`. And if you're an Android app, the zygote has already installed a whole
host of signal handlers before your code even starts to run. (And, no, you
can't ignore them instead, because some of them are critical to how ART works.
For example: Java `NullPointerException`s are optimized by trapping `SIGSEGV`
signals so that the code generated by the JIT doesn't have to insert explicit
null pointer checks.)

### Why don't I see this in Java code?

You won't see this in Java because the decision was taken to hide this issue
from Java programmers. Libraries like `java.io.*` and `java.net.*` deal with
`EINTR` for you. (The same should be true of `android.*` too, so it's worth
filing bugs if you find any exceptions that aren't documented!)

### Why doesn't libc do that too?

For most people, things would be easier if libc hid this implementation
detail. But there are legitimate use cases, and automatically retrying
would hide those. For example, you might want to use signals and `EINTR`
to interrupt another thread (in fact, that's how interruption of threads
doing I/O works in Java behind the scenes!). As usual, C and C++ choose the
more powerful but more error-prone option.

## The fix

### Easy cases

In most cases, the fix is simple: wrap the system call with the
`TEMP_FAILURE_RETRY` macro. This is basically a while loop that retries the
system call as long as the result is -1 and errno is `EINTR`.

So, for example:
```
n = read(fd, buf, buf_size); // BAD!
n = TEMP_FAILURE_RETRY(read(fd, buf, buf_size)); // GOOD!
```
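
To make the "basically a while loop" description concrete, here is a
hand-written equivalent for a single call. This is only a sketch of the idea,
not the macro's actual definition, and `read_retrying_on_eintr` is a
hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: calls read(2) and retries for as long as the call
// fails with EINTR. Any other result (data, end of file, or a different
// error) is returned to the caller unchanged.
ssize_t read_retrying_on_eintr(int fd, void* buf, size_t count) {
  ssize_t n;
  do {
    n = read(fd, buf, count);
  } while (n == -1 && errno == EINTR);
  return n;
}
```

Note that this only hides `EINTR`. A short read is still possible, so callers
that need exactly `count` bytes still have to loop on the returned length.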

### close(2)

TL;DR: *never* wrap close(2) calls with `TEMP_FAILURE_RETRY`.

The case of close(2) is complicated. POSIX explicitly says that close(2)
shouldn't close the file descriptor if it returns `EINTR`, but that's *not*
true on Linux (and thus on Android), where the file descriptor is closed even
if the call fails with `EINTR`. See
[Returning EINTR from close()](https://lwn.net/Articles/576478/)
for more discussion.

Given that most Android code (and, in particular, every app) is multithreaded,
retrying close(2) is especially dangerous because the file descriptor might
already have been reused by another thread, so the "retry" succeeds, but
actually closes a *different* file descriptor belonging to a *different*
thread.
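
One way to follow this advice is to call close(2) exactly once and never loop.
The sketch below is an illustration rather than a recommended API:
`close_once` is a hypothetical name, and treating `EINTR` as success is an
assumption that only makes sense on Linux, where the descriptor is already
gone by the time the call returns.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: closes fd exactly once and never retries.
// Returns 0 if the descriptor was closed, including the case where close(2)
// reported EINTR, or -1 with errno set for any other failure.
int close_once(int fd) {
  int rc = close(fd);
  if (rc == -1 && errno == EINTR) {
    // Assumption (Linux behavior): the descriptor has already been released,
    // so a retry could close a descriptor that another thread just opened.
    return 0;
  }
  return rc;
}
```

Callers that care about errors such as `EIO` still see them, because only
`EINTR` is swallowed.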

### Timeouts

System calls with timeouts are the other interesting case where "just wrap
everything with `TEMP_FAILURE_RETRY()`" doesn't work. Because some amount of
time will have elapsed by the time the call is interrupted, you'll want to
recalculate the timeout before retrying. Otherwise your 1 minute timeout can
end up being indefinite if, say, you're receiving signals at least once per
minute. In this case you'll want to do something like adding an explicit loop
around your system call, calculating the timeout _inside_ the loop, and using
`continue` each time the system call fails with `EINTR`.