llkd: Use more inclusive language
Documentation is synchronized to match external, to ease updating. blacklist is replaced with ignorelist or ignore depending on context. Test: none Change-Id: I6db7ad321684759e3c5ac1f66f940b6e8a5709a0
This commit is contained in:
parent
1e412e7d4e
commit
6d9e515905
3 changed files with 293 additions and 255 deletions
362
llkd/README.md
362
llkd/README.md
|
@ -1,199 +1,237 @@
|
|||
Android Live-LocK Daemon
|
||||
========================
|
||||
<!--
|
||||
Project: /_project.yaml
|
||||
Book: /_book.yaml
|
||||
|
||||
Introduction
|
||||
------------
|
||||
{% include "_versions.html" %}
|
||||
-->
|
||||
|
||||
Android Live-LocK Daemon (llkd) is used to catch kernel deadlocks and mitigate.
|
||||
<!--
|
||||
Copyright 2020 The Android Open Source Project
|
||||
|
||||
Code is structured to allow integration into another service as either as part
|
||||
of the main loop, or spun off as a thread should that be necessary. A default
|
||||
standalone implementation is provided by llkd component.
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
The 'C' interface from libllkd component is thus:
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
#include "llkd.h"
|
||||
bool llkInit(const char* threadname) /* return true if enabled */
|
||||
unsigned llkCheckMillseconds(void) /* ms to sleep for next check */
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
If a threadname is provided, a thread will be automatically spawned, otherwise
|
||||
caller must call llkCheckMilliseconds in its main loop. Function will return
|
||||
the period of time before the next expected call to this handler.
|
||||
# Android Live-LocK Daemon (llkd)
|
||||
|
||||
Operations
|
||||
----------
|
||||
Android 10 <!-- {{ androidQVersionNumber }} --> includes the Android Live-LocK Daemon
|
||||
(`llkd`), which is designed to catch and mitigate kernel deadlocks. The `llkd`
|
||||
component provides a default standalone implementation, but you can
|
||||
alternatively integrate the `llkd` code into another service, either as part of
|
||||
the main loop or as a separate thread.
|
||||
|
||||
There are two detection scenarios. Persistent D or Z state, and persistent
|
||||
## Detection scenarios <!-- {:#detection-scenarios} -->
|
||||
|
||||
The `llkd` has two detection scenarios: Persistent D or Z state, and persistent
|
||||
stack signature.
|
||||
|
||||
If a thread is in D or Z state with no forward progress for longer than
|
||||
ro.llk.timeout_ms, or ro.llk.[D|Z].timeout_ms, kill the process or parent
|
||||
process respectively. If another scan shows the same process continues to
|
||||
exist, then have a confirmed live-lock condition and need to panic. Panic
|
||||
the kernel in a manner to provide the greatest bugreporting details as to the
|
||||
condition. Add a alarm self watchdog should llkd ever get locked up that is
|
||||
double the expected time to flow through the mainloop. Sampling is every
|
||||
ro.llk_sample_ms.
|
||||
### Persistent D or Z state <!-- {:#persistent-d-or-z-state} -->
|
||||
|
||||
For usedebug releases only, persistent stack signature checking is enabled.
|
||||
If a thread in any state but Z, has a persistent listed ro.llk.stack kernel
|
||||
symbol always being reported, even if there is forward scheduling progress, for
|
||||
longer than ro.llk.timeout_ms, or ro.llk.stack.timeout_ms, then issue a kill
|
||||
to the process. If another scan shows the same process continues to exist,
|
||||
then have a confirmed live-lock condition and need to panic. There is no
|
||||
ABA detection since forward scheduling progress is allowed, thus the condition
|
||||
for the symbols are:
|
||||
If a thread is in D (uninterruptible sleep) or Z (zombie) state with no forward
|
||||
progress for longer than `ro.llk.timeout_ms or ro.llk.[D|Z].timeout_ms`, the
|
||||
`llkd` kills the process (or parent process). If a subsequent scan shows the
|
||||
same process continues to exist, the `llkd` confirms a live-lock condition and
|
||||
panics the kernel in a manner that provides the most detailed bug report for the
|
||||
condition.
|
||||
|
||||
- Check is looking for " __symbol__+0x" or " __symbol__.cfi+0x" in
|
||||
/proc/__pid__/stack.
|
||||
- The __symbol__ should be rare and short lived enough that on a typical
|
||||
system the function is seen at most only once in a sample over the timeout
|
||||
period of ro.llk.stack.timeout_ms, samples occur every ro.llk.check_ms. This
|
||||
can be the only way to prevent a false trigger as there is no ABA protection.
|
||||
- Persistent continuously when the live lock condition exists.
|
||||
- Should be just below the function that is calling the lock that could
|
||||
contend, because if the lock is below or in the symbol function, the
|
||||
symbol will show in all affected processes, not just the one that
|
||||
caused the lockup.
|
||||
The `llkd` includes a self watchdog that alarms if `llkd` locks up; watchdog is
|
||||
double the expected time to flow through the mainloop and sampling is every
|
||||
`ro.llk_sample_ms`.
|
||||
|
||||
Default will not monitor init, or [kthreadd] and all that [kthreadd] spawns.
|
||||
This reduces the effectiveness of llkd by limiting its coverage. If there is
|
||||
value in covering [kthreadd] spawned threads, the requirement will be that
|
||||
the drivers not remain in a persistent 'D' state, or that they have mechanisms
|
||||
to recover the thread should it be killed externally (this is good driver
|
||||
coding hygiene, a common request to add such to publicly reviewed kernel.org
|
||||
maintained drivers). For instance use wait_event_interruptible() instead of
|
||||
wait_event(). The blacklists can be adjusted accordingly if these
|
||||
conditions are met to cover kernel components. For the stack symbol checking,
|
||||
there is an additional process blacklist so that we do not incide sepolicy
|
||||
violations on services that block ptrace operations.
|
||||
### Persistent stack signature <!-- {:#persistent-stack-signature} -->
|
||||
|
||||
An accompanying gTest set have been added, and will setup a persistent D or Z
|
||||
process, with and without forward progress, but not in a live-lock state
|
||||
because that would require a buggy kernel, or a module or kernel modification
|
||||
to stimulate. The test will check that llkd will mitigate first by killing
|
||||
the appropriate process. D state is setup by vfork() waiting for exec() in
|
||||
child process. Z state is setup by fork() and an un-waited for child process.
|
||||
Should be noted that both of these conditions should never happen on Android
|
||||
on purpose, and llkd effectively sweeps up processes that create these
|
||||
conditions. If the test can, it will reconfigure llkd to expedite the test
|
||||
duration by adjusting the ro.llk.* Android properties. Tests run the D state
|
||||
with some scheduling progress to ensure that ABA checking prevents false
|
||||
triggers. If 100% reliable ABA on platform, then ro.llk.killtest can be
|
||||
set to false; however this will result in some of the unit tests to panic
|
||||
kernel instead of deal with more graceful kill operation.
|
||||
For userdebug releases, the `llkd` can detect kernel live-locks using persistent
|
||||
stack signature checking. If a thread in any state except Z has a persistent
|
||||
listed `ro.llk.stack` kernel symbol that is reported for longer than
|
||||
`ro.llk.timeout_ms` or `ro.llk.stack.timeout_ms`, the `llkd` kills the process
|
||||
(even if there is forward scheduling progress). If a subsequent scan shows the
|
||||
same process continues to exist, the `llkd` confirms a live-lock condition and
|
||||
panics the kernel in a manner that provides the most detailed bug report for the
|
||||
condition.
|
||||
|
||||
Android Properties
|
||||
------------------
|
||||
Note: Because forward scheduling progress is allowed, the `llkd` does not
|
||||
perform [ABA detection](https://en.wikipedia.org/wiki/ABA_problem){:.external}.
|
||||
|
||||
The following are the Android Properties llkd respond to.
|
||||
*prop*_ms named properties are in milliseconds.
|
||||
Properties that use comma (*,*) separator for lists, use a leading separator to
|
||||
preserve default and add or subtract entries with (*optional*) plus (*+*) and
|
||||
minus (*-*) prefixes respectively.
|
||||
For these lists, the string "*false*" is synonymous with an *empty* list,
|
||||
and *blank* or *missing* resorts to the specified *default* value.
|
||||
The `lldk` check persists continuously when the live lock condition exists and
|
||||
looks for the composed strings `" symbol+0x"` or `" symbol.cfi+0x"` in the
|
||||
`/proc/pid/stack` file on Linux. The list of symbols is in `ro.llk.stack` and
|
||||
defaults to the comma-separated list of
|
||||
"`cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable`".
|
||||
|
||||
#### ro.config.low_ram
|
||||
device is configured with limited memory.
|
||||
Symbols should be rare and short-lived enough that on a typical system the
|
||||
function is seen only once in a sample over the timeout period of
|
||||
`ro.llk.stack.timeout_ms` (samples occur every `ro.llk.check_ms`). Due to lack
|
||||
of ABA protection, this is the only way to prevent a false trigger. The symbol
|
||||
function must appear below the function calling the lock that could contend. If
|
||||
the lock is below or in the symbol function, the symbol appears in all affected
|
||||
processes, not just the one that caused the lockup.
|
||||
|
||||
#### ro.debuggable
|
||||
device is configured for userdebug or eng build.
|
||||
## Coverage <!-- {:#coverage} -->
|
||||
|
||||
#### ro.llk.sysrq_t
|
||||
default not ro.config.low_ram, or ro.debuggable if property is "eng".
|
||||
if true do sysrq t (dump all threads).
|
||||
The default implementation of `llkd` does not monitor `init`, `[kthreadd]`, or
|
||||
`[kthreadd]` spawns. For the `llkd` to cover `[kthreadd]`-spawned threads:
|
||||
|
||||
#### ro.llk.enable
|
||||
default false, allow live-lock daemon to be enabled.
|
||||
* Drivers must not remain in a persistent D state,
|
||||
|
||||
#### llk.enable
|
||||
default ro.llk.enable, and evaluated for eng.
|
||||
OR
|
||||
|
||||
#### ro.khungtask.enable
|
||||
default false, allow [khungtask] daemon to be enabled.
|
||||
* Drivers must have mechanisms to recover the thread should it be killed
|
||||
externally. For example, use `wait_event_interruptible()` instead of
|
||||
`wait_event()`.
|
||||
|
||||
#### khungtask.enable
|
||||
default ro.khungtask.enable and evaluated for eng.
|
||||
If one of the above conditions is met, the `llkd` ignorelist can be adjusted to
|
||||
cover kernel components. Stack symbol checking involves an additional process
|
||||
ignore list to prevent sepolicy violations on services that block `ptrace`
|
||||
operations.
|
||||
|
||||
#### ro.llk.mlockall
|
||||
default false, enable call to mlockall().
|
||||
## Android properties <!-- {:#android-properties} -->
|
||||
|
||||
#### ro.khungtask.timeout
|
||||
default value 12 minutes, [khungtask] maximum timelimit.
|
||||
The `llkd` responds to several Android properties (listed below).
|
||||
|
||||
#### ro.llk.timeout_ms
|
||||
default 10 minutes, D or Z maximum timelimit, double this value and it sets
|
||||
the alarm watchdog for llkd.
|
||||
* Properties named `prop_ms` are in milliseconds.
|
||||
* Properties that use comma (,) separator for lists use a leading separator to
|
||||
preserve the default entry, then add or subtract entries with optional plus
|
||||
(+) and minus (-) prefixes respectively. For these lists, the string "false"
|
||||
is synonymous with an empty list, and blank or missing entries resort to the
|
||||
specified default value.
|
||||
|
||||
#### ro.llk.D.timeout_ms
|
||||
default ro.llk.timeout_ms, D maximum timelimit.
|
||||
### ro.config.low_ram <!-- {:#ro-config-low-ram} -->
|
||||
|
||||
#### ro.llk.Z.timeout_ms
|
||||
default ro.llk.timeout_ms, Z maximum timelimit.
|
||||
Device is configured with limited memory.
|
||||
|
||||
#### ro.llk.stack.timeout_ms
|
||||
default ro.llk.timeout_ms,
|
||||
checking for persistent stack symbols maximum timelimit.
|
||||
Only active on userdebug or eng builds.
|
||||
### ro.debuggable <!-- {:#ro-debuggable} -->
|
||||
|
||||
#### ro.llk.check_ms
|
||||
default 2 minutes samples of threads for D or Z.
|
||||
Device is configured for userdebug or eng build.
|
||||
|
||||
#### ro.llk.stack
|
||||
default cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable
|
||||
comma separated list of kernel symbols.
|
||||
Look for kernel stack symbols that if ever persistently present can
|
||||
indicate a subsystem is locked up.
|
||||
Beware, check does not on purpose do forward scheduling ABA except by polling
|
||||
every ro.llk_check_ms over the period ro.llk.stack.timeout_ms, so stack symbol
|
||||
should be exceptionally rare and fleeting.
|
||||
One must be convinced that it is virtually *impossible* for symbol to show up
|
||||
persistently in all samples of the stack.
|
||||
Again, looks for a match for either " **symbol**+0x" or " **symbol**.cfi+0x"
|
||||
in stack expansion.
|
||||
Only available on userdebug or eng builds, limited privileges due to security
|
||||
concerns on user builds prevents this checking.
|
||||
### ro.llk.sysrq_t <!-- {:#ro-llk-sysrq-t} -->
|
||||
|
||||
#### ro.llk.blacklist.process
|
||||
default 0,1,2 (kernel, init and [kthreadd]) plus process names
|
||||
init,[kthreadd],[khungtaskd],lmkd,llkd,watchdogd,
|
||||
[watchdogd],[watchdogd/0],...,[watchdogd/***get_nprocs**-1*].
|
||||
Do not watch these processes. A process can be comm, cmdline or pid reference.
|
||||
NB: automated default here can be larger than the current maximum property
|
||||
size of 92.
|
||||
NB: false is a very very very unlikely process to want to blacklist.
|
||||
If property is "eng", the default is not `ro.config.low_ram` or `ro.debuggable`.
|
||||
If true, dump all threads (`sysrq t`).
|
||||
|
||||
#### ro.llk.blacklist.parent
|
||||
default 0,2,adbd&[setsid] (kernel, [kthreadd] and adbd *only for zombie setsid*).
|
||||
Do not watch processes that have this parent.
|
||||
An ampersand (*&*) separator is used to specify that the parent is ignored
|
||||
only in combination with the target child process.
|
||||
Ampersand was selected because it is never part of a process name,
|
||||
however a setprop in the shell requires it to be escaped or quoted;
|
||||
init rc file where this is normally specified does not have this issue.
|
||||
A parent or target processes can be specified as comm, cmdline or pid reference.
|
||||
### ro.llk.enable <!-- {:#ro-llk-enable} -->
|
||||
|
||||
#### ro.llk.blacklist.uid
|
||||
default *empty* or false, comma separated list of uid numbers or names.
|
||||
Do not watch processes that match this uid.
|
||||
Allow live-lock daemon to be enabled. Default is false.
|
||||
|
||||
#### ro.llk.blacklist.process.stack
|
||||
default process names init,lmkd.llkd,llkd,keystore,ueventd,apexd,logd.
|
||||
This subset of processes are not monitored for live lock stack signatures.
|
||||
Also prevents the sepolicy violation associated with processes that block
|
||||
ptrace, as these can not be checked anyways.
|
||||
Only active on userdebug and eng builds.
|
||||
### llk.enable <!-- {:#llk-enable} -->
|
||||
|
||||
Architectural Concerns
|
||||
----------------------
|
||||
Evaluated for eng builds. Default is `ro.llk.enable`.
|
||||
|
||||
- built-in [khungtask] daemon is too generic and trips on driver code that
|
||||
sits around in D state too much. To switch to S instead makes the task(s)
|
||||
killable, so the drivers should be able to resurrect them if needed.
|
||||
- Properties are limited to 92 characters.
|
||||
- Create kernel module and associated gTest to actually test panic.
|
||||
- Create gTest to test out blacklist (ro.llk.blacklist.*properties* generally
|
||||
not be inputs). Could require more test-only interfaces to libllkd.
|
||||
- Speed up gTest using something else than ro.llk.*properties*, which should
|
||||
not be inputs as they should be baked into the product.
|
||||
### ro.khungtask.enable <!-- {:#ro-khungtask-enable} -->
|
||||
|
||||
Allow `[khungtask]` daemon to be enabled. Default is false.
|
||||
|
||||
### khungtask.enable <!-- {:#khungtask-enable} -->
|
||||
|
||||
Evaluated for eng builds. Default is `ro.khungtask.enable`.
|
||||
|
||||
### ro.llk.mlockall <!-- {:#ro-llk-mlockall} -->
|
||||
|
||||
Enable call to `mlockall()`. Default is false.
|
||||
|
||||
### ro.khungtask.timeout <!-- {:#ro-khungtask-timeout} -->
|
||||
|
||||
`[khungtask]` maximum time limit. Default is 12 minutes.
|
||||
|
||||
### ro.llk.timeout_ms <!-- {:#ro-llk-timeout-ms} -->
|
||||
|
||||
D or Z maximum time limit. Default is 10 minutes. Double this value to set the
|
||||
alarm watchdog for `llkd`.
|
||||
|
||||
### ro.llk.D.timeout_ms <!-- {:#ro-llk-D-timeout-ms} -->
|
||||
|
||||
D maximum time limit. Default is `ro.llk.timeout_ms`.
|
||||
|
||||
### ro.llk.Z.timeout_ms <!-- {:#ro-llk-Z-timeout-ms} -->
|
||||
|
||||
Z maximum time limit. Default is `ro.llk.timeout_ms`.
|
||||
|
||||
### ro.llk.stack.timeout_ms <!-- {:#ro-llk-stack-timeout-ms} -->
|
||||
|
||||
Checks for persistent stack symbols maximum time limit. Default is
|
||||
`ro.llk.timeout_ms`. **Active only on userdebug or eng builds**.
|
||||
|
||||
### ro.llk.check_ms <!-- {:#ro-llk-check-ms} -->
|
||||
|
||||
Samples of threads for D or Z. Default is two minutes.
|
||||
|
||||
### ro.llk.stack <!-- {:#ro-llk-stack} -->
|
||||
|
||||
Checks for kernel stack symbols that if persistently present can indicate a
|
||||
subsystem is locked up. Default is
|
||||
`cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable`
|
||||
comma-separated list of kernel symbols. The check doesn't do forward scheduling
|
||||
ABA except by polling every `ro.llk_check_ms` over the period
|
||||
`ro.llk.stack.timeout_ms`, so stack symbols should be exceptionally rare and
|
||||
fleeting (it is highly unlikely for a symbol to show up persistently in all
|
||||
samples of the stack). Checks for a match for `" symbol+0x"` or
|
||||
`" symbol.cfi+0x"` in stack expansion. **Available only on userdebug or eng
|
||||
builds**; security concerns on user builds result in limited privileges that
|
||||
prevent this check.
|
||||
|
||||
### ro.llk.ignorelist.process <!-- {:#ro-llk-ignorelist-process} -->
|
||||
|
||||
The `llkd` does not watch the specified processes. Default is `0,1,2` (`kernel`,
|
||||
`init`, and `[kthreadd]`) plus process names
|
||||
`init,[kthreadd],[khungtaskd],lmkd,llkd,watchdogd, [watchdogd],[watchdogd/0],...,[watchdogd/get_nprocs-1]`.
|
||||
A process can be a `comm`, `cmdline`, or `pid` reference. An automated default
|
||||
can be larger than the current maximum property size of 92.
|
||||
|
||||
Note: `false` is an extremely unlikely process to want to ignore.
|
||||
|
||||
### ro.llk.ignorelist.parent <!-- {:#ro-llk-ignorelist-parent} -->
|
||||
|
||||
The `llkd` does not watch processes that have the specified parent(s). Default
|
||||
is `0,2,adbd&[setsid]` (`kernel`, `[kthreadd]`, and `adbd` only for zombie
|
||||
`setsid`). An ampersand (&) separator specifies that the parent is ignored only
|
||||
in combination with the target child process. Ampersand was selected because it
|
||||
is never part of a process name; however, a `setprop` in the shell requires the
|
||||
ampersand to be escaped or quoted, although the `init rc` file where this is
|
||||
normally specified does not have this issue. A parent or target process can be a
|
||||
`comm`, `cmdline`, or `pid` reference.
|
||||
|
||||
### ro.llk.ignorelist.uid <!-- {:#ro-llk-ignorelist-uid} -->
|
||||
|
||||
The `llkd` does not watch processes that match the specified uid(s).
|
||||
Comma-separated list of uid numbers or names. Default is empty or false.
|
||||
|
||||
### ro.llk.ignorelist.process.stack <!-- {:#ro-llk-ignorelist-process-stack} -->
|
||||
|
||||
The `llkd` does not monitor the specified subset of processes for live lock stack
|
||||
signatures. Default is process names
|
||||
`init,lmkd.llkd,llkd,keystore,ueventd,apexd,logd`. Prevents the sepolicy
|
||||
violation associated with processes that block `ptrace` (as these can't be
|
||||
checked). **Active only on userdebug and eng builds**. For details on build
|
||||
types, refer to [Building Android](/setup/build/building#choose-a-target).
|
||||
|
||||
## Architectural concerns <!-- {:#architectural-concerns} -->
|
||||
|
||||
* Properties are limited to 92 characters. However, this is not limited for
|
||||
defaults defined in the `include/llkd.h` file in the sources.
|
||||
* The built-in `[khungtask]` daemon is too generic and trips on driver code that
|
||||
sits around in D state too much. Switching drivers to sleep, or S state,
|
||||
would make task(s) killable, and need to be resurrectable by drivers on an
|
||||
as-need basis.
|
||||
|
||||
## Library interface (optional) <!-- {:#library-interface-optional} -->
|
||||
|
||||
You can optionally incorporate the `llkd` into another privileged daemon using
|
||||
the following C interface from the `libllkd` component:
|
||||
|
||||
```
|
||||
#include "llkd.h"
|
||||
bool llkInit(const char* threadname) /* return true if enabled */
|
||||
unsigned llkCheckMillseconds(void) /* ms to sleep for next check */
|
||||
```
|
||||
|
||||
If a threadname is provided, a thread automatically spawns, otherwise the caller
|
||||
must call `llkCheckMilliseconds` in its main loop. The function returns the
|
||||
period of time before the next expected call to this handler.
|
||||
|
|
|
@ -30,37 +30,37 @@ bool llkInit(const char* threadname); /* threadname NULL, not spawned */
|
|||
unsigned llkCheckMilliseconds(void);
|
||||
|
||||
/* clang-format off */
|
||||
#define LLK_ENABLE_WRITEABLE_PROPERTY "llk.enable"
|
||||
#define LLK_ENABLE_PROPERTY "ro." LLK_ENABLE_WRITEABLE_PROPERTY
|
||||
#define LLK_ENABLE_DEFAULT false /* "eng" and userdebug true */
|
||||
#define KHT_ENABLE_WRITEABLE_PROPERTY "khungtask.enable"
|
||||
#define KHT_ENABLE_PROPERTY "ro." KHT_ENABLE_WRITEABLE_PROPERTY
|
||||
#define LLK_ENABLE_SYSRQ_T_PROPERTY "ro.llk.sysrq_t"
|
||||
#define LLK_ENABLE_SYSRQ_T_DEFAULT true
|
||||
#define LLK_MLOCKALL_PROPERTY "ro.llk.mlockall"
|
||||
#define LLK_MLOCKALL_DEFAULT true
|
||||
#define LLK_KILLTEST_PROPERTY "ro.llk.killtest"
|
||||
#define LLK_KILLTEST_DEFAULT true
|
||||
#define LLK_TIMEOUT_MS_PROPERTY "ro.llk.timeout_ms"
|
||||
#define KHT_TIMEOUT_PROPERTY "ro.khungtask.timeout"
|
||||
#define LLK_D_TIMEOUT_MS_PROPERTY "ro.llk.D.timeout_ms"
|
||||
#define LLK_Z_TIMEOUT_MS_PROPERTY "ro.llk.Z.timeout_ms"
|
||||
#define LLK_STACK_TIMEOUT_MS_PROPERTY "ro.llk.stack.timeout_ms"
|
||||
#define LLK_CHECK_MS_PROPERTY "ro.llk.check_ms"
|
||||
#define LLK_ENABLE_WRITEABLE_PROPERTY "llk.enable"
|
||||
#define LLK_ENABLE_PROPERTY "ro." LLK_ENABLE_WRITEABLE_PROPERTY
|
||||
#define LLK_ENABLE_DEFAULT false /* "eng" and userdebug true */
|
||||
#define KHT_ENABLE_WRITEABLE_PROPERTY "khungtask.enable"
|
||||
#define KHT_ENABLE_PROPERTY "ro." KHT_ENABLE_WRITEABLE_PROPERTY
|
||||
#define LLK_ENABLE_SYSRQ_T_PROPERTY "ro.llk.sysrq_t"
|
||||
#define LLK_ENABLE_SYSRQ_T_DEFAULT true
|
||||
#define LLK_MLOCKALL_PROPERTY "ro.llk.mlockall"
|
||||
#define LLK_MLOCKALL_DEFAULT true
|
||||
#define LLK_KILLTEST_PROPERTY "ro.llk.killtest"
|
||||
#define LLK_KILLTEST_DEFAULT true
|
||||
#define LLK_TIMEOUT_MS_PROPERTY "ro.llk.timeout_ms"
|
||||
#define KHT_TIMEOUT_PROPERTY "ro.khungtask.timeout"
|
||||
#define LLK_D_TIMEOUT_MS_PROPERTY "ro.llk.D.timeout_ms"
|
||||
#define LLK_Z_TIMEOUT_MS_PROPERTY "ro.llk.Z.timeout_ms"
|
||||
#define LLK_STACK_TIMEOUT_MS_PROPERTY "ro.llk.stack.timeout_ms"
|
||||
#define LLK_CHECK_MS_PROPERTY "ro.llk.check_ms"
|
||||
/* LLK_CHECK_MS_DEFAULT = actual timeout_ms / LLK_CHECKS_PER_TIMEOUT_DEFAULT */
|
||||
#define LLK_CHECKS_PER_TIMEOUT_DEFAULT 5
|
||||
#define LLK_CHECK_STACK_PROPERTY "ro.llk.stack"
|
||||
#define LLK_CHECK_STACK_DEFAULT \
|
||||
#define LLK_CHECKS_PER_TIMEOUT_DEFAULT 5
|
||||
#define LLK_CHECK_STACK_PROPERTY "ro.llk.stack"
|
||||
#define LLK_CHECK_STACK_DEFAULT \
|
||||
"cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable"
|
||||
#define LLK_BLACKLIST_PROCESS_PROPERTY "ro.llk.blacklist.process"
|
||||
#define LLK_BLACKLIST_PROCESS_DEFAULT \
|
||||
#define LLK_IGNORELIST_PROCESS_PROPERTY "ro.llk.ignorelist.process"
|
||||
#define LLK_IGNORELIST_PROCESS_DEFAULT \
|
||||
"0,1,2,init,[kthreadd],[khungtaskd],lmkd,llkd,watchdogd,[watchdogd],[watchdogd/0]"
|
||||
#define LLK_BLACKLIST_PARENT_PROPERTY "ro.llk.blacklist.parent"
|
||||
#define LLK_BLACKLIST_PARENT_DEFAULT "0,2,[kthreadd],adbd&[setsid]"
|
||||
#define LLK_BLACKLIST_UID_PROPERTY "ro.llk.blacklist.uid"
|
||||
#define LLK_BLACKLIST_UID_DEFAULT ""
|
||||
#define LLK_BLACKLIST_STACK_PROPERTY "ro.llk.blacklist.process.stack"
|
||||
#define LLK_BLACKLIST_STACK_DEFAULT "init,lmkd.llkd,llkd,keystore,ueventd,apexd"
|
||||
#define LLK_IGNORELIST_PARENT_PROPERTY "ro.llk.ignorelist.parent"
|
||||
#define LLK_IGNORELIST_PARENT_DEFAULT "0,2,[kthreadd],adbd&[setsid]"
|
||||
#define LLK_IGNORELIST_UID_PROPERTY "ro.llk.ignorelist.uid"
|
||||
#define LLK_IGNORELIST_UID_DEFAULT ""
|
||||
#define LLK_IGNORELIST_STACK_PROPERTY "ro.llk.ignorelist.process.stack"
|
||||
#define LLK_IGNORELIST_STACK_DEFAULT "init,lmkd.llkd,llkd,keystore,ueventd,apexd"
|
||||
/* clang-format on */
|
||||
|
||||
__END_DECLS
|
||||
|
|
130
llkd/libllkd.cpp
130
llkd/libllkd.cpp
|
@ -98,26 +98,26 @@ seconds khtTimeout = duration_cast<seconds>(llkTimeoutMs * (1 + LLK_CHECKS_PER_T
|
|||
std::unordered_set<std::string> llkCheckStackSymbols;
|
||||
#endif
|
||||
|
||||
// Blacklist variables, initialized with comma separated lists of high false
|
||||
// Ignorelist variables, initialized with comma separated lists of high false
|
||||
// positive and/or dangerous references, e.g. without self restart, for pid,
|
||||
// ppid, name and uid:
|
||||
|
||||
// list of pids, or tids or names to skip. kernel pid (0), init pid (1),
|
||||
// [kthreadd] pid (2), ourselves, "init", "[kthreadd]", "lmkd", "llkd" or
|
||||
// combinations of watchdogd in kernel and user space.
|
||||
std::unordered_set<std::string> llkBlacklistProcess;
|
||||
std::unordered_set<std::string> llkIgnorelistProcess;
|
||||
// list of parent pids, comm or cmdline names to skip. default:
|
||||
// kernel pid (0), [kthreadd] (2), or ourselves, enforced and implied
|
||||
std::unordered_set<std::string> llkBlacklistParent;
|
||||
std::unordered_set<std::string> llkIgnorelistParent;
|
||||
// list of parent and target processes to skip. default:
|
||||
// adbd *and* [setsid]
|
||||
std::unordered_map<std::string, std::unordered_set<std::string>> llkBlacklistParentAndChild;
|
||||
std::unordered_map<std::string, std::unordered_set<std::string>> llkIgnorelistParentAndChild;
|
||||
// list of uids, and uid names, to skip, default nothing
|
||||
std::unordered_set<std::string> llkBlacklistUid;
|
||||
std::unordered_set<std::string> llkIgnorelistUid;
|
||||
#ifdef __PTRACE_ENABLED__
|
||||
// list of names to skip stack checking. "init", "lmkd", "llkd", "keystore" or
|
||||
// "logd" (if not userdebug).
|
||||
std::unordered_set<std::string> llkBlacklistStack;
|
||||
std::unordered_set<std::string> llkIgnorelistStack;
|
||||
#endif
|
||||
|
||||
class dir {
|
||||
|
@ -626,9 +626,9 @@ std::string llkFormat(bool flag) {
|
|||
return flag ? "true" : "false";
|
||||
}
|
||||
|
||||
std::string llkFormat(const std::unordered_set<std::string>& blacklist) {
|
||||
std::string llkFormat(const std::unordered_set<std::string>& ignorelist) {
|
||||
std::string ret;
|
||||
for (const auto& entry : blacklist) {
|
||||
for (const auto& entry : ignorelist) {
|
||||
if (!ret.empty()) ret += ",";
|
||||
ret += entry;
|
||||
}
|
||||
|
@ -636,10 +636,10 @@ std::string llkFormat(const std::unordered_set<std::string>& blacklist) {
|
|||
}
|
||||
|
||||
std::string llkFormat(
|
||||
const std::unordered_map<std::string, std::unordered_set<std::string>>& blacklist,
|
||||
const std::unordered_map<std::string, std::unordered_set<std::string>>& ignorelist,
|
||||
bool leading_comma = false) {
|
||||
std::string ret;
|
||||
for (const auto& entry : blacklist) {
|
||||
for (const auto& entry : ignorelist) {
|
||||
for (const auto& target : entry.second) {
|
||||
if (leading_comma || !ret.empty()) ret += ",";
|
||||
ret += entry.first + "&" + target;
|
||||
|
@ -699,61 +699,61 @@ std::unordered_set<std::string> llkSplit(const std::string& prop, const std::str
|
|||
}
|
||||
|
||||
bool llkSkipName(const std::string& name,
|
||||
const std::unordered_set<std::string>& blacklist = llkBlacklistProcess) {
|
||||
if (name.empty() || blacklist.empty()) return false;
|
||||
const std::unordered_set<std::string>& ignorelist = llkIgnorelistProcess) {
|
||||
if (name.empty() || ignorelist.empty()) return false;
|
||||
|
||||
return blacklist.find(name) != blacklist.end();
|
||||
return ignorelist.find(name) != ignorelist.end();
|
||||
}
|
||||
|
||||
bool llkSkipProc(proc* procp,
|
||||
const std::unordered_set<std::string>& blacklist = llkBlacklistProcess) {
|
||||
const std::unordered_set<std::string>& ignorelist = llkIgnorelistProcess) {
|
||||
if (!procp) return false;
|
||||
if (llkSkipName(std::to_string(procp->pid), blacklist)) return true;
|
||||
if (llkSkipName(procp->getComm(), blacklist)) return true;
|
||||
if (llkSkipName(procp->getCmdline(), blacklist)) return true;
|
||||
if (llkSkipName(android::base::Basename(procp->getCmdline()), blacklist)) return true;
|
||||
if (llkSkipName(std::to_string(procp->pid), ignorelist)) return true;
|
||||
if (llkSkipName(procp->getComm(), ignorelist)) return true;
|
||||
if (llkSkipName(procp->getCmdline(), ignorelist)) return true;
|
||||
if (llkSkipName(android::base::Basename(procp->getCmdline()), ignorelist)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
const std::unordered_set<std::string>& llkSkipName(
|
||||
const std::string& name,
|
||||
const std::unordered_map<std::string, std::unordered_set<std::string>>& blacklist) {
|
||||
const std::unordered_map<std::string, std::unordered_set<std::string>>& ignorelist) {
|
||||
static const std::unordered_set<std::string> empty;
|
||||
if (name.empty() || blacklist.empty()) return empty;
|
||||
auto found = blacklist.find(name);
|
||||
if (found == blacklist.end()) return empty;
|
||||
if (name.empty() || ignorelist.empty()) return empty;
|
||||
auto found = ignorelist.find(name);
|
||||
if (found == ignorelist.end()) return empty;
|
||||
return found->second;
|
||||
}
|
||||
|
||||
bool llkSkipPproc(proc* pprocp, proc* procp,
|
||||
const std::unordered_map<std::string, std::unordered_set<std::string>>&
|
||||
blacklist = llkBlacklistParentAndChild) {
|
||||
if (!pprocp || !procp || blacklist.empty()) return false;
|
||||
if (llkSkipProc(procp, llkSkipName(std::to_string(pprocp->pid), blacklist))) return true;
|
||||
if (llkSkipProc(procp, llkSkipName(pprocp->getComm(), blacklist))) return true;
|
||||
if (llkSkipProc(procp, llkSkipName(pprocp->getCmdline(), blacklist))) return true;
|
||||
ignorelist = llkIgnorelistParentAndChild) {
|
||||
if (!pprocp || !procp || ignorelist.empty()) return false;
|
||||
if (llkSkipProc(procp, llkSkipName(std::to_string(pprocp->pid), ignorelist))) return true;
|
||||
if (llkSkipProc(procp, llkSkipName(pprocp->getComm(), ignorelist))) return true;
|
||||
if (llkSkipProc(procp, llkSkipName(pprocp->getCmdline(), ignorelist))) return true;
|
||||
return llkSkipProc(procp,
|
||||
llkSkipName(android::base::Basename(pprocp->getCmdline()), blacklist));
|
||||
llkSkipName(android::base::Basename(pprocp->getCmdline()), ignorelist));
|
||||
}
|
||||
|
||||
bool llkSkipPid(pid_t pid) {
|
||||
return llkSkipName(std::to_string(pid), llkBlacklistProcess);
|
||||
return llkSkipName(std::to_string(pid), llkIgnorelistProcess);
|
||||
}
|
||||
|
||||
bool llkSkipPpid(pid_t ppid) {
|
||||
return llkSkipName(std::to_string(ppid), llkBlacklistParent);
|
||||
return llkSkipName(std::to_string(ppid), llkIgnorelistParent);
|
||||
}
|
||||
|
||||
bool llkSkipUid(uid_t uid) {
|
||||
// Match by number?
|
||||
if (llkSkipName(std::to_string(uid), llkBlacklistUid)) {
|
||||
if (llkSkipName(std::to_string(uid), llkIgnorelistUid)) {
|
||||
return true;
|
||||
}
|
||||
|
||||
// Match by name?
|
||||
auto pwd = ::getpwuid(uid);
|
||||
return (pwd != nullptr) && __predict_true(pwd->pw_name != nullptr) &&
|
||||
__predict_true(pwd->pw_name[0] != '\0') && llkSkipName(pwd->pw_name, llkBlacklistUid);
|
||||
__predict_true(pwd->pw_name[0] != '\0') && llkSkipName(pwd->pw_name, llkIgnorelistUid);
|
||||
}
|
||||
|
||||
bool getValidTidDir(dirent* dp, std::string* piddir) {
|
||||
|
@ -811,7 +811,7 @@ bool llkCheckStack(proc* procp, const std::string& piddir) {
|
|||
}
|
||||
|
||||
// Don't check process that are known to block ptrace, save sepolicy noise.
|
||||
if (llkSkipProc(procp, llkBlacklistStack)) return false;
|
||||
if (llkSkipProc(procp, llkIgnorelistStack)) return false;
|
||||
auto kernel_stack = ReadFile(piddir + "/stack");
|
||||
if (kernel_stack.empty()) {
|
||||
LOG(VERBOSE) << piddir << "/stack empty comm=" << procp->getComm()
|
||||
|
@ -917,12 +917,12 @@ void llkLogConfig(void) {
|
|||
<< LLK_CHECK_MS_PROPERTY "=" << llkFormat(llkCheckMs) << "\n"
|
||||
#ifdef __PTRACE_ENABLED__
|
||||
<< LLK_CHECK_STACK_PROPERTY "=" << llkFormat(llkCheckStackSymbols) << "\n"
|
||||
<< LLK_BLACKLIST_STACK_PROPERTY "=" << llkFormat(llkBlacklistStack) << "\n"
|
||||
<< LLK_IGNORELIST_STACK_PROPERTY "=" << llkFormat(llkIgnorelistStack) << "\n"
|
||||
#endif
|
||||
<< LLK_BLACKLIST_PROCESS_PROPERTY "=" << llkFormat(llkBlacklistProcess) << "\n"
|
||||
<< LLK_BLACKLIST_PARENT_PROPERTY "=" << llkFormat(llkBlacklistParent)
|
||||
<< llkFormat(llkBlacklistParentAndChild, true) << "\n"
|
||||
<< LLK_BLACKLIST_UID_PROPERTY "=" << llkFormat(llkBlacklistUid);
|
||||
<< LLK_IGNORELIST_PROCESS_PROPERTY "=" << llkFormat(llkIgnorelistProcess) << "\n"
|
||||
<< LLK_IGNORELIST_PARENT_PROPERTY "=" << llkFormat(llkIgnorelistParent)
|
||||
<< llkFormat(llkIgnorelistParentAndChild, true) << "\n"
|
||||
<< LLK_IGNORELIST_UID_PROPERTY "=" << llkFormat(llkIgnorelistUid);
|
||||
}
|
||||
|
||||
void* llkThread(void* obj) {
|
||||
|
@ -932,14 +932,14 @@ void* llkThread(void* obj) {
|
|||
|
||||
std::string name = std::to_string(::gettid());
|
||||
if (!llkSkipName(name)) {
|
||||
llkBlacklistProcess.emplace(name);
|
||||
llkIgnorelistProcess.emplace(name);
|
||||
}
|
||||
name = static_cast<const char*>(obj);
|
||||
prctl(PR_SET_NAME, name.c_str());
|
||||
if (__predict_false(!llkSkipName(name))) {
|
||||
llkBlacklistProcess.insert(name);
|
||||
llkIgnorelistProcess.insert(name);
|
||||
}
|
||||
// No longer modifying llkBlacklistProcess.
|
||||
// No longer modifying llkIgnorelistProcess.
|
||||
llkRunning = true;
|
||||
llkLogConfig();
|
||||
while (llkRunning) {
|
||||
|
@ -1122,12 +1122,12 @@ milliseconds llkCheck(bool checkRunning) {
|
|||
}
|
||||
if (pprocp) {
|
||||
if (llkSkipPproc(pprocp, procp)) break;
|
||||
if (llkSkipProc(pprocp, llkBlacklistParent)) break;
|
||||
if (llkSkipProc(pprocp, llkIgnorelistParent)) break;
|
||||
} else {
|
||||
if (llkSkipName(std::to_string(ppid), llkBlacklistParent)) break;
|
||||
if (llkSkipName(std::to_string(ppid), llkIgnorelistParent)) break;
|
||||
}
|
||||
|
||||
if ((llkBlacklistUid.size() != 0) && llkSkipUid(procp->getUid())) {
|
||||
if ((llkIgnorelistUid.size() != 0) && llkSkipUid(procp->getUid())) {
|
||||
continue;
|
||||
}
|
||||
|
||||
|
@ -1320,29 +1320,29 @@ bool llkInit(const char* threadname) {
|
|||
if (debuggable) {
|
||||
llkCheckStackSymbols = llkSplit(LLK_CHECK_STACK_PROPERTY, LLK_CHECK_STACK_DEFAULT);
|
||||
}
|
||||
std::string defaultBlacklistStack(LLK_BLACKLIST_STACK_DEFAULT);
|
||||
if (!debuggable) defaultBlacklistStack += ",logd,/system/bin/logd";
|
||||
llkBlacklistStack = llkSplit(LLK_BLACKLIST_STACK_PROPERTY, defaultBlacklistStack);
|
||||
std::string defaultIgnorelistStack(LLK_IGNORELIST_STACK_DEFAULT);
|
||||
if (!debuggable) defaultIgnorelistStack += ",logd,/system/bin/logd";
|
||||
llkIgnorelistStack = llkSplit(LLK_IGNORELIST_STACK_PROPERTY, defaultIgnorelistStack);
|
||||
#endif
|
||||
std::string defaultBlacklistProcess(
|
||||
std::to_string(kernelPid) + "," + std::to_string(initPid) + "," +
|
||||
std::to_string(kthreaddPid) + "," + std::to_string(::getpid()) + "," +
|
||||
std::to_string(::gettid()) + "," LLK_BLACKLIST_PROCESS_DEFAULT);
|
||||
std::string defaultIgnorelistProcess(
|
||||
std::to_string(kernelPid) + "," + std::to_string(initPid) + "," +
|
||||
std::to_string(kthreaddPid) + "," + std::to_string(::getpid()) + "," +
|
||||
std::to_string(::gettid()) + "," LLK_IGNORELIST_PROCESS_DEFAULT);
|
||||
if (threadname) {
|
||||
defaultBlacklistProcess += ","s + threadname;
|
||||
defaultIgnorelistProcess += ","s + threadname;
|
||||
}
|
||||
for (int cpu = 1; cpu < get_nprocs_conf(); ++cpu) {
|
||||
defaultBlacklistProcess += ",[watchdog/" + std::to_string(cpu) + "]";
|
||||
defaultIgnorelistProcess += ",[watchdog/" + std::to_string(cpu) + "]";
|
||||
}
|
||||
llkBlacklistProcess = llkSplit(LLK_BLACKLIST_PROCESS_PROPERTY, defaultBlacklistProcess);
|
||||
llkIgnorelistProcess = llkSplit(LLK_IGNORELIST_PROCESS_PROPERTY, defaultIgnorelistProcess);
|
||||
if (!llkSkipName("[khungtaskd]")) { // ALWAYS ignore as special
|
||||
llkBlacklistProcess.emplace("[khungtaskd]");
|
||||
llkIgnorelistProcess.emplace("[khungtaskd]");
|
||||
}
|
||||
llkBlacklistParent = llkSplit(LLK_BLACKLIST_PARENT_PROPERTY,
|
||||
std::to_string(kernelPid) + "," + std::to_string(kthreaddPid) +
|
||||
"," LLK_BLACKLIST_PARENT_DEFAULT);
|
||||
// derive llkBlacklistParentAndChild by moving entries with '&' from above
|
||||
for (auto it = llkBlacklistParent.begin(); it != llkBlacklistParent.end();) {
|
||||
llkIgnorelistParent = llkSplit(LLK_IGNORELIST_PARENT_PROPERTY,
|
||||
std::to_string(kernelPid) + "," + std::to_string(kthreaddPid) +
|
||||
"," LLK_IGNORELIST_PARENT_DEFAULT);
|
||||
// derive llkIgnorelistParentAndChild by moving entries with '&' from above
|
||||
for (auto it = llkIgnorelistParent.begin(); it != llkIgnorelistParent.end();) {
|
||||
auto pos = it->find('&');
|
||||
if (pos == std::string::npos) {
|
||||
++it;
|
||||
|
@ -1350,18 +1350,18 @@ bool llkInit(const char* threadname) {
|
|||
}
|
||||
auto parent = it->substr(0, pos);
|
||||
auto child = it->substr(pos + 1);
|
||||
it = llkBlacklistParent.erase(it);
|
||||
it = llkIgnorelistParent.erase(it);
|
||||
|
||||
auto found = llkBlacklistParentAndChild.find(parent);
|
||||
if (found == llkBlacklistParentAndChild.end()) {
|
||||
llkBlacklistParentAndChild.emplace(std::make_pair(
|
||||
auto found = llkIgnorelistParentAndChild.find(parent);
|
||||
if (found == llkIgnorelistParentAndChild.end()) {
|
||||
llkIgnorelistParentAndChild.emplace(std::make_pair(
|
||||
std::move(parent), std::unordered_set<std::string>({std::move(child)})));
|
||||
} else {
|
||||
found->second.emplace(std::move(child));
|
||||
}
|
||||
}
|
||||
|
||||
llkBlacklistUid = llkSplit(LLK_BLACKLIST_UID_PROPERTY, LLK_BLACKLIST_UID_DEFAULT);
|
||||
llkIgnorelistUid = llkSplit(LLK_IGNORELIST_UID_PROPERTY, LLK_IGNORELIST_UID_DEFAULT);
|
||||
|
||||
// internal watchdog
|
||||
::signal(SIGALRM, llkAlarmHandler);
|
||||
|
|
Loading…
Reference in a new issue