Commit graph

88 commits

Author SHA1 Message Date
Mark Salyzyn
8fcfd85acc logd: address code fragility in last watermarks
Do not make the assumption that if worstPid is set, that the log
buffer id is not LOG_ID_EVENTS or LOG_ID_SECURITY. Add comments
to prevent future over-optimization based on this assumption.

Make sure we reset mLast[id] = begin() when we mark it unset, but
tell optimizer this is an _impossible_ path.

SideEffects: drop two branches in all erase calls, gain an unordered
             find() on an empty list for events and security buffers.
Test: gTest logd-unit-tests, liblog-unit-test & logcat-unit-tests
Bug: 32247044
Change-Id: Ic156ca2253c050c28021cedf48bedaf7bd692c09
2016-10-24 10:43:52 -07:00
Mark Salyzyn
fa07f9dc4b logd: mLastWorstPidOFSystem crash
mLastWorstPidOfSystem is filled with iterator references
that are not from AID_SYSTEM to aid the performance. But
we only clear entries from the list during erase if they
are from AID_SYSTEM. Remove the filter check in erase so
the stale references will be removed.

The conditions that caused this failure are difficult to
reproduce and are rare.

Test: gTests logd-unit-tests, liblog-unit-tests and logcat-unit-tests
Bug: 32247044
Bug: 31237377
Change-Id: Ie405dd643203b816cac15eef5c97600551cee450
2016-10-21 10:20:55 -07:00
Mark Salyzyn
f10e273790 logd: Use private interfaces for buffer size properties
Test: gTest logd-unit-tests, liblog-unit-tests and logcat-unit-tests
Bug: 31750617
Change-Id: I692577cfdf4bf8c93616f32df4b56786918aef1c
2016-10-18 09:49:51 -07:00
Mark Salyzyn
807e40ecc9 liblog: logd: Add android_lookupEventTag_len()
Allows us to mitigate the impact of MAP_PRIVATE and copy on write by
calling android_lookupEventTag_len instead of android_lookupEventTag,
and delaying the copy on write impact to the later.  We return a
string length in a supplied location along with the string pointer
with android_lookupEventTag_len(const EventTagMap* map, size_t* len,
int tag).  The string is not guaranteed to be nul terminated.  Since
android_lookupEventTag() called even once can cause the memory
impact, we will mark it as deprecated, but we currently have no
timeframe for removal since this is a very old interface.

Add an API for __android_log_is_loggable_len() that accepts the non
null terminated content and fixup callers that would gain because the
length is known prior to the call either in the compiler or at
runtime.  Tackle transition to android_lookupEventTag_len() and
fixup callers.

On any application that performs logging (eg: com.android.phone)

/proc/<pid>/smaps before:

xxxxxxxxxx-xxxxxxxxxx rw-p 00000000 fd:00 463 /system/etc/event-log-tags
Size:                 20 kB
Rss:                  20 kB
Pss:                   1 kB
Shared_Clean:          0 kB
Shared_Dirty:         20 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:            20 kB
AnonHugePages:         0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac

/proc/<pid>/smaps after:

xxxxxxxxxx-xxxxxxxxxx rw-p 00000000 fd:00 1773 /system/etc/event-log-tags
Size:                 20 kB
Rss:                  20 kB
Pss:                   1 kB
Shared_Clean:         20 kB  (was 0kB)
Shared_Dirty:          0 kB  (was 20kB)
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           20 kB  (was 0kB)
Anonymous:             0 kB  (was 20kB)
AnonHugePages:         0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac

Added liblog-unit-tests --gtest_filter=liblog.event_log_tags to
check for Shared_Clean: to not be 0 and Anonymous: to be 0 for
all processes referencing event-log-tags.  Which can include multiple
references to /system/etc/event-log-tags and future possible refs to
/data/misc/logd/event-log-tags and /dev/event-log-tags.  We want
failure messages to help point to errant code using the deprecated
interface.

This change saves 1/4MB of memory or more on a typical system.

Test: gTest liblog-unit-tests
Bug: 31456426
Change-Id: I9e08e44d9092bd96fe704b5709242e7195281d33
2016-10-03 11:08:34 -07:00
Mark Salyzyn
43a5f31e94 logd: clear all from non system user status request
pruneRows not necessarily ULONG_MAX when uid not system source,
allow for speed up for status response if pruneRows exhausted.

Change-Id: I38c76bb20215e3d96513a575e2e3bc85a5e5b41c
2016-09-02 07:41:27 -07:00
Mark Salyzyn
1eefca281e logd: crash in prune mLastWorstPidOfSystem
mLastWorstPidOfSystem is supposed to be indexed by element->getPid()

Bug: 31237377
Bug: 30797725
Bug: 30688716
Change-Id: I81a55e92f175ded1c571a0aa8836736d86b36b1d
2016-09-02 07:41:27 -07:00
Mark Salyzyn
6a06694a61 logd: Worst Tag filter enabled for events buffer
- Add drop logistics to TagTable
- replace uid references to a key reference since it
  is an UID for most buffers, but a TAG for the
  events and security buffer
- template the find worst entry mechanics into LogFindWorst class

Bug: 30118730
Change-Id: Ibea4be2c50d6ff4b39039e371365fed2453f17a2
2016-07-15 14:58:17 -07:00
Mark Salyzyn
0f85817985 logd: rename sort() for pids to sortPids()
Bug: 30118730
Change-Id: If992b53b52a7470427fbe82abb93c95c1b2dc57a
2016-07-15 14:37:04 -07:00
Chih-Hung Hsieh
1cc82ce95c Fix google-explicit-constructor warnings.
Bug: 28341362
Change-Id: I2cf746fd07addc6e6c22e98e2a5bb8a8ce47c885
2016-04-25 13:49:46 -07:00
Mark Salyzyn
8fa8896d2e logd: security buffer only AID_SYSTEM reader
- limit AID_SYSTEM uid or gid to read security buffer messages
- adjust liblog tests to reflect the reality of this adjustment

To fully test all security buffer paths and modes

$ su 0,0,0 /data/nativetest/liblog-unit-tests/liblog-unit-tests --gtest_filter=liblog.__security*
$ su 1000,1000,1000 /data/nativetest/liblog-unit-tests/liblog-unit-tests --gtest_filter=liblog.__security*
$ su 2000,2000,2000 /data/nativetest/liblog-unit-tests/liblog-unit-tests --gtest_filter=liblog.__security*

ToDo: Integrate the above individually into the gTest Q/A testing

Bug: 26029733
Change-Id: Idcf5492db78fa6934ef6fb43f3ef861052675651
2016-02-01 13:29:06 -08:00
Mark Salyzyn
7fd6c5c6f5 logd: prune maintain per-id watermark (part deux)
iterator corruption as we allow mLast to slip through the FIFO

Bug: 23685592
Bug: 26646087
Change-Id: Ifcbaecf390ee47b195c3d823d080c66c15db4530
2016-01-19 16:14:39 -08:00
Mark Salyzyn
507eb9fec2 logd: prune maintain per-id watermark
Without this change LogBuffer::prune and LogBuffer::erase
contributes 16.7% and 1.79% respectively. With this change,
they contributes 3.06 and 2.33% respectively. Pruning is
performed roughly 1 in every 255 log entries, a periodic
tamer latency spike.

Bug: 23685592
Change-Id: I6ae1cf9f3559bca4cf448efe8bcb2b96a1914c54
2016-01-11 14:46:51 -08:00
Mark Salyzyn
1a5bac2069 Merge "Revert "logd: liblog: whitelist "snet_event_log""" 2016-01-06 21:22:04 +00:00
Mark Salyzyn
0ee8de3c2d Revert "logd: liblog: whitelist "snet_event_log""
Callers will not guarantee that they can or will ratelimit, we need to
retain the ability to blacklist snet_event_log as a result.

This reverts commit 6aa21b225d.

Bug: 26178938
Change-Id: Ibf47d2e23a84c56f5f72d02312c698df7ff2b601
2016-01-06 21:19:23 +00:00
Mark Salyzyn
66607ebc0e logd: document ro.logd.size
Bad comment advise in LogBuffer.cpp results in partners failing to
considering using ro.logd.size to set the platform buffer size
default.

NB: It is not good practice to increase the log buffer size to deal
with logspam, as increases will result in logd scale issues getting
closer to hitting the background cgroup cpu cap. Once we hit that
cap, logd spirals, pruning old entries slower than the incoming log
entries. logd.writer will take 100% cpu.

Change-Id: If4a7a74f300d078eeaed0ffd3eb3fd77d1f9fe90
2016-01-05 09:09:18 -08:00
Mark Salyzyn
6aa21b225d logd: liblog: whitelist "snet_event_log"
Dangerous bridge to cross to whitelist, who is special, who is not?
Rationalized as these events are used to catch exploits on platform.
As it stands no one should be allowed to block any messages in the
security context, not even for development purposes.

Bug: 26178938
Change-Id: Ibdc76bc0fe29ba05be168b623af1c9f41d7edbd2
2015-12-30 10:07:19 -08:00
Mark Salyzyn
bec3c3def9 logd: Add worst pid of system filter
- Add a new statistic that reports per pid and log_id for AID_SYSTEM
- Add a new pruning filter ~1000/! boolean
- Use this new statistic to prune on worst pid within AID_SYSTEM

Bug: 26029733
Bug: 21615139
Bug: 22855208
Change-Id: Iab5dd28f807dcf03d276372853883f3b6afa8294
2015-12-29 09:32:35 -08:00
Mark Salyzyn
9c66a58f21 logd: Allow flags "eng" and "svelte" in boolean
- enhance property_get_bool, drop property_get_bool_svelte
- enhance base properties with ro and persist variants
- update and fortify README.property
- primarily move auditd and kernel logger into a realm where
  they can be controlled by build properties.
- Move logd.klogd to logd.kernel, and add ro.logd.kernel
  and persist.logd.kernel.
- Add ro.logd.auditd and persist.logd.auditd.
- Document persist.logd.security
- Document log.tag and persist.logd.tag properties.
- Document ro.logd.size, persist.logd.size and logd.size
  properties.

Bug: 26178938
Bug: 26029733
Bug: 17760225
Change-Id: Ibc1a497e45d69db5cf52133937f7ba6fb1d6cd21
2015-12-22 07:44:31 -08:00
Mark Salyzyn
ee3b838e13 logd: statistics per-pid filter
Primarily gives access to the Chattiest TIDs and TAGs
associated with a pid.

Has a secondary effect of allowing us to pull out the
command line, comm and in some cases the associated
PACKAGE for a specific pid while the logs are still
present even if the executable is gone.

Bug: 26029733
Bug: 21615139
Change-Id: I1ea63165a680a9318360579b70b1512078ed5682
2015-12-18 13:17:37 -08:00
Mark Salyzyn
cdda62b2c1 logd: ro.config.low_ram set buffer size to 64K
Bug: 25792367
Change-Id: Icae85a7de70b3a71f89b2bf8c80866649f727177
2015-12-14 14:37:07 -08:00
Mark Salyzyn
86052a5d4f Merge "logd: liblog: logcat: Add LOG_ID_SECURITY" 2015-12-11 17:33:33 +00:00
Mark Salyzyn
7b87365ecf logd: readlog apps get logger_entry_v4
Adds the uid field to outgoing content for readlog applications.
AID_LOG, AID_ROOT and AID_SYSTEM gain access to the information.

Bug: 25996918
Change-Id: I0254303c19d174cbf5e722c38844be5c54410c85
2015-12-09 08:12:07 -08:00
Mark Salyzyn
083b037c07 logd: liblog: logcat: Add LOG_ID_SECURITY
- Largish commit, buffer and access controls done together
- Add LOG_ID_SECURITY binary content log
- Add "default" meta buffer
- allow LOG_ID_SECURITY only from AID_SYSTEM and AID_ROOT UID & GID
- Use __android_log_security() to gate logging
- Add __android_log_security_bwrite() native access to security
  logging.
- Add liblog.__security_buffer end-to-end gTest

Bug: 26029733
Change-Id: Ibcf5b4660c17c1aa6902c0d93f8ffd29c93d9a93
2015-12-08 16:46:29 -08:00
Mark Salyzyn
b75cce0389 logd: wakeup on wrap or timeout
If a timeout is specified for the reader, then go to sleep
with the socket open. If the start time is about to get
pruned in the specified log buffers, then wakeup and dump
the logs; or wakeup on timeout, whichever comes first.

Bug: 25929746
Change-Id: I7d2421c2c5083b33747b84f74d9a560d3ba645df
2015-12-07 14:24:02 -08:00
Mark Salyzyn
ba7a9a016b logd: liblog: logcat: switch to android_log_clockid() (2)
android_log_timestamp returns the property leading letter,
it is better to return a clockid_t with android_log_clockid()

Bug: 23668800
Change-Id: I38dee773bf3844177826b03a26b03215c79a5359
2015-12-07 18:45:31 +00:00
Mark Salyzyn
9e18cdcebd Revert "logd: liblog: logcat: switch to android_log_clockid()"
This reverts commit 77b5696b1d.

Change-Id: I7711bf1a7e3f72ed29a2498d7287b725a0e624bd
2015-12-07 18:30:58 +00:00
Mark Salyzyn
77b5696b1d logd: liblog: logcat: switch to android_log_clockid()
android_log_timestamp returns the property leading letter,
it is better to return a clockid_t with android_log_clockid()

Bug: 23668800
Change-Id: I3c4e3e6b87f6676950797f1f0e203b44c542ed43
2015-12-04 14:49:19 -08:00
Mark Salyzyn
c2e7d4965f Revert "liblog: resolve deadlocks"
This reverts commit 7a2a307192.

Bug: 25693940
Change-Id: I9a7c926289e972f80c03c92e33535e1dedaa7381
2015-11-16 19:00:41 +00:00
Mark Salyzyn
7a2a307192 liblog: resolve deadlocks
Although ever present, an increased regression introduced with
commit b6bee33182 (liblog: logd:
support logd.timestamp = monotonic).

A signal handler can interrupt in locked context, if log is written
in the signal handler, we are in deadlock. Block signals while we
are locked. Separate out timestamp lock from is loggable lock to
reduce contention situations. Provide a best-guess response if
lock would fail in timestamp path.

Bug: 25563384
Change-Id: I6dccd6b99ebace1c473c03a785a35c63ed5c6a8a
2015-11-12 15:39:04 -08:00
Mark Salyzyn
8897a3fffd Merge "liblog: logd: support logd.timestamp = monotonic" 2015-11-04 00:31:53 +00:00
Mark Salyzyn
50122695d2 logd: pruning time horizon
Estimate a time horizon of ten fold for worst UID pruning

Bug: 24782000
Change-Id: I7118deb6d42531c68ed2ac2a59c57b0580b942cc
2015-11-03 15:34:15 -08:00
Mark Salyzyn
b6bee33182 liblog: logd: support logd.timestamp = monotonic
if ro.logd.timestamp or persist.logd.timestamp are set to the value
monotonic then liblog writer, liblog printing and logd all switch to
recording/printing monotonic time rather than realtime. If reinit
detects a change for presist.logd.timestamp, correct the older entry
timestamps in place.

ToDo: A corner case condition where new log entries in monotonic time
      occur before logd reinit detects persist.logd.timestamp, there
      will be a few out-of-order entries, but with accurate
      timestamps. This problem does not happen for ro.logd.timestamp
      as it is set before logd starts.

NB: This offers a nano second time accuracy on all log entries
    that may be more suitable for merging with other system
    activities, such as systrace, that also use monotonic time. This
    feature is for debugging.

Bug: 23668800
Change-Id: Iee6dab7140061b1a6627254921411f61b01aa5c2
2015-11-03 15:15:51 -08:00
Mark Salyzyn
58b8be8906 logd: correct for number of elements in prune
Chatty logs would distort the average log size by elevating the
elements, but not the size. Add statistical collection for the
number of elements that report chatty, and subtract that from
the number of elements to improve the pruning estimate. Pick
minElements as 1% rather than 10% of the total with this more
accurate number of elements, to a minumum of 4.

Bug: 24511000
Change-Id: I3f36558138aa0b2a50e4fac6440c3a8505d95276
2015-10-13 13:43:16 -07:00
Mark Salyzyn
aaad42f47c logd: use coalesce instead of merge (cleanup)
- switch to coalesce instead of merge in naming of functions
  and variables. Confusing since we also to merge-sorts and
  other activities in the logger.
- define maxPrune rather than using a number in the code path.

Bug: 24511000
2015-10-09 15:00:44 -07:00
Mark Salyzyn
2b25c66070 logd: clearAll by UID speedup
- If doing a clear, skip accounting
- Ensure for busy checking, behind a region lock for instance, only
  break out if there was something to do. Basically move the filter
  actions first, and defer checking the region lock to the ends of
  the loops.

Bug: 23711431
Change-Id: Icc984f406880633516fb17dda84188a30d092e01
2015-10-07 16:24:57 -07:00
Mark Salyzyn
c5dc970edc logd: clear return and deal with busy if readers locked
- Propagate to caller the clearing errors, busy blocked by reader.
- For clear, perform retries within logd with a one second lul each,
  telling readers to skip, but on final retry to kill all readers if
  problem still persists due to block reader (or high volume logspammer).

Bug: 23711431
Change-Id: Ie4c46bc9480a7f49b96a81fae25a95c603270c33
2015-10-07 16:23:57 -07:00
Mark Salyzyn
5bb2972dce logd: worst uid record watermark part five
A regression that resulted in increased memory consumption for some
logging patterns because we rarely did merge or leading checks, and
age-out checking. On the last prune cycle, we reset for a full scan.

Add some comments describing the pruning processes.

Bug: 23327476
Bug: 23681639
Bug: 23685592
Change-Id: I22b0f339c9269b006831fda9cefe295a263ebb92
2015-09-10 08:43:03 -07:00
Mark Salyzyn
831aa29730 logd: worst uid record watermark part four
With part deux we caused an apparent regression by not checking for
stale recorded iterators. This checking was on-purpose bypassesed
when leading prune entries were to be deleted without touching the
statistics engine due to an in-place merge.

Part deux had us leaving iterators we were not focussed on untouched
which in turn because they were left behind, had a much higher
likelihood of being deleted without touching the statistics engine.

Perform the check every delete.

Bug: 23789348
Change-Id: Idc6cc23d1f9e3b6cd9a083139a0de59479fbfe08
2015-09-03 17:13:45 -07:00
Mark Salyzyn
ccfe8446a1 logd: worst uid record watermark part three
Regression that cause records to be preserved for more than a day.

Bug: 23681639
Bug: 23685592
Change-Id: I5e4393c8e3ed935790994c77ec51dc6512a6daa6
2015-08-31 20:51:42 +00:00
Mark Salyzyn
49afe0d00f logd: worst uid record watermark part deux
Only record watermark if not known, or represents the worst UID
currently under focus. This has resulted in a halving of the average
prune time in the face of heavy spam because we get less processing
spikes.

Bug: 23327476
Change-Id: I19f297042b9fc2c98d902695c1c36df1bf5cd6f6
2015-08-24 14:04:45 -07:00
Mark Salyzyn
73160acc5c logd: switch asprintf to std::string
Bug: 23350706
Change-Id: I715cdd4563a09de3680081947a3439f0cac623be
2015-08-20 10:32:16 -07:00
Mark Salyzyn
c892ea3fa8 logd: worst uid record watermark
Hold on to last worst uid watermark and bypass a spike to O(n*n*x)
(n=samples, x=number of spammers) wrt chatty trimming.

Bug: 23327476
Change-Id: I9f21ce95e969b67e576417a760f75c4d86acf364
2015-08-20 10:25:57 -07:00
Mark Salyzyn
b39ed0c992 logd: prune 10% or 256 entries max
Bug: 22351810
Bug: 23327476
Change-Id: I902ba6b431d8b7cee2d65ee2f76e9f7c4f30b152
2015-08-20 08:35:45 -07:00
Mark Salyzyn
62ab0fd4ef logd: sizes > 1M prune in smaller batches
Switch to 1% batch sizes from 10% when individual buffer size > 1M

Bug: 22351810
Change-Id: Ifee570a54643ceb0ba767e1787e937f70cc90b72
2015-08-11 16:20:54 -07:00
Mark Salyzyn
e59c469fa8 logd: filter on __android_log_is_loggable
- Default level when not specified is ANDROID_LOG_VERBOSE
  which is inert.

Bug: 20416721
Bug: 19544788
Bug: 17760225
Change-Id: Icc098e53dc47ceaaeb24ec42eb6f61d6430ec2f6
2015-06-12 10:35:09 -07:00
Mark Salyzyn
047cc0729f logd: filters remove leading expire messages and rate
- Cleanup resulting from experience and feedback
- When filtering inside logd, drop any leading expire messages, they
  are cluttering up leading edge of tombstones (which filter by pid)
- Increase and introduce EXPIRE_RATELIMIT from 1 to 10 seconds
- Increase EXPIRE_THRESHOLD from 4 to 10 count
- Improve the expire messages from:
   logd : uid=1000(system) too chatty comm=com.google.android.phone,
                                                   expire 2800 lines
  change tag to be more descriptive, and reduce accusatory tone to:
   chatty : uid=1000(system) com.google.android.phone expire 2800
                                                               lines
- if the UID name forms a prefix for comm name, then drop UID name

Change-Id: Ied7cc04c0ab3ae02167649a0b97378e44ef7b588
2015-06-05 08:05:05 -07:00
Mark Salyzyn
511338dd57 logd: switch to unordered_map from BasicHashtable
BasicHashtable is relatively untested, move over to
a C++ template library that has more bake time.

Bug: 20419786
Bug: 21590652
Bug: 20500228
Change-Id: I926aaecdc8345eca75c08fdd561b0473504c5d95
2015-06-03 13:03:07 -07:00
Mark Salyzyn
10a124d342 Merge "logd: whitelist should not preserve expire messages" 2015-06-01 21:16:57 +00:00
Mark Salyzyn
5921276a16 logd: KISS & fix preserve a day
Code in 833a9b1e38 was borken,
simpler approach is to simply check last entry (to save a
syscall) minus EXPIRE_HOUR_THRESHOLD. This does make longer logs
less likely to call upon the spam detection than the algorithm
being replaced, but sadly we ended up with a log entry in the
future at the beginning of the logs confounding the previous
algorithm.

Bug: 21555259
Change-Id: I04fad67e95c8496521dbabb73b5f32c19d6a16c2
2015-06-01 13:06:35 -07:00
Mark Salyzyn
5392aac95d logd: deal with sloppy leading expire messages
The odds of chatty content also leading the logs is pretty high eg:

 1799 12017 I logd: uid=10007 chatty comm=Binder_B, expire 4 lines
 1799  1829 I logd: uid=10007 chatty comm=Binder_2, expire 4 lines
 1919 20637 I logd: uid=10007 chatty comm=m.sersistent, expire 1 line
 1919 20638 I logd: uid=10007 chatty comm=s.persistent, expire 1 line
 1919  2316 I logd: uid=10007 chatty comm=UlrDispatch, expire 4 lines
19379 20634 I logd: uid=10045 chatty, expire 14 lines
19379 19388 I logd: uid=10045 chatty comm=lizerDaemon, expire 4 lines
  591  4396 I logd: uid=1000 chatty comm=Thread-220, expire 5 lines
  591  1377 I logd: uid=1000 chatty comm=Thread-92, expire 4 lines
 1919  2267 I logd: uid=10007 chatty comm=WifiScanner, expire 4 lines
  591  4397 I logd: uid=1000 chatty comm=DhcpClient, expire 4 lines
  591  4398 I logd: uid=1000 chatty comm=Thread-222, expire 4 lines
  226   580 D CommandListener: Setting iface cfg

Change-Id: I5ab24bc7bf5d2690bae7e789831b07f23ff8bcc6
2015-06-01 13:04:09 -07:00