Sfink/Useful Notes: Difference between revisions

[[#A1]] This is A1
== Problem A ==


When a bad memory leak kicks in, the system can be too unusable to
get useful data out.


=== Solution ===


Make it easier to get information out when the system is suffering


[[#A1]] Periodically log memory-related information (existing bug, I think? also
telemetry)


[[#A2]] Maintain a rotating database of detailed memory-related information (cf
atop)


[[#A3]] Make about:memory capable of outputting to a file, for use with a
command-line invocation 'firefox about:memory?verbose=1&outfile=...'


=== Solution ===


Prevent the system from getting into such a bad state


[[#A4]] Make a per-compartment (or per-?) cap on memory usage


[[#A5]] When sufferingMode==true, disable GC/CC on big tabs. Probably need to
deactivate them too.


[[#A6]] Early warning when memory usage is getting too high


[[#A7]] Crash reporter-like UI for reporting memory problems (do not require an
actual crash to trigger)


== Problem B ==


Hard for regular users to generate a useful memory problem report


(all solutions from problem A are relevant here)


[[#B1]] Provide a way to dump and submit a reachability graph


[[#B2]] Documentation for how to best help with a memory problem, with various
steps to follow.


[[#B3]] Track memory to individual page/tab/compartment/principals.


[[#B4]] Tools for generating profiles with subsets of addons installed (or for
running with different subsets of addons within one profile)


[[#B5]] Tools for blaming memory usage on addons (eg detecting "safe" addons to
remove from consideration. Cross-referencing other users' addons and memory
usage similar to the crash correlation reports -- requires telemetry.)


== Problem C ==


Hard for developers or knowledgeable and motivated users to generate
a useful memory problem report


Problem B above overlaps with this one, so everything there is relevant.


[[#C1]] Rationalize and document all of our various leak-detection tools.


[[#C2]] Automation and Windows equivalents of my /proc/<pid>/maps hacks


[[#C3]] Dumpers that give full heap, full graph, pruned graph. Visualizers,
analyzers, etc. of the dumps.


[[#C4]] Collect the age of various memory objects (how many CCs or GCs each has
survived).


== Problem D ==


Garbage is not collected


=== Solution ===


Report cycles that CC misses


[[#D1]] Conservative scanner to find cycles involving things not marked as
CC-participants and report them as suspicious.


=== Solution ===

Report resources that leak over time but are still referenced (so
they are cleaned up before shutdown)


[[#D2]] Register "expected lifetime" at acquisition time. Report things that live
longer than expected, filtered by diagnostics. ("lifetime assertions"? Not
quite.)


[[#D3]] Detect subgraphs that grow (at a constant rate?) while a page is open.


[[#D4]] Detect subgraphs that are never accessed
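
A rough sketch of the D3 check, assuming some other tool already measures the size of a subgraph at regular intervals while the page is open. The function name and the tolerance value are made up for illustration; a real detector would probably need something more robust than comparing increments.

```python
def grows_steadily(sizes, tolerance=0.25):
    """Return True if sizes rise at every step with roughly equal increments.

    `sizes` is a list of subgraph sizes sampled at regular intervals.
    `tolerance` is the allowed relative deviation of each increment from
    the mean increment (an arbitrary choice for this sketch).
    """
    if len(sizes) < 3:
        return False  # too few samples to talk about a rate
    deltas = [b - a for a, b in zip(sizes, sizes[1:])]
    if min(deltas) <= 0:
        return False  # shrank or stayed flat at some point
    mean = sum(deltas) / len(deltas)
    return all(abs(d - mean) <= tolerance * mean for d in deltas)
```

Something like `grows_steadily([100, 110, 120, 131])` would flag the subgraph, while a series that shrinks or jumps erratically would not.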


== Problem E ==


High memory usage, not leaked


(aside from current work like generational gc)


[[#E1]] "Simulator" that runs over logs and estimates peak memory usage if CC/GC
ran at optimal times.


[[#E2]] Use reproducible test runs to evaluate what the performance/memory
tradeoff is for various things (eg jit code, structure sizes)
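
A minimal sketch of the E1 "simulator", assuming a log format invented here for illustration: each object is logged when it is allocated, when it becomes unreachable, and the log also records when the collector actually ran. "Optimal" is taken to mean that garbage is freed the moment it becomes unreachable, which gives a lower bound rather than an achievable schedule.

```python
def peak_usage(log):
    """Return (actual_peak, optimal_peak) in bytes for an event log.

    Events (a hypothetical format):
      ("alloc", obj_id, size)  object allocated
      ("dead", obj_id)         object became unreachable
      ("collect",)             collector ran and freed all dead objects
    """
    sizes = {}
    actual = optimal = 0
    actual_peak = optimal_peak = 0
    pending = 0  # bytes that are dead but not yet collected
    for event in log:
        kind = event[0]
        if kind == "alloc":
            _, obj_id, size = event
            sizes[obj_id] = size
            actual += size
            optimal += size
        elif kind == "dead":
            size = sizes.pop(event[1])
            optimal -= size  # an ideal collector frees it at once
            pending += size  # the real one holds it until "collect"
        elif kind == "collect":
            actual -= pending
            pending = 0
        else:
            raise ValueError(f"unknown event: {kind!r}")
        actual_peak = max(actual_peak, actual)
        optimal_peak = max(optimal_peak, optimal)
    return actual_peak, optimal_peak
```

The gap between the two numbers is an estimate of how much peak memory is attributable to collection timing rather than to live data.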


== Problem F ==


Hard to navigate through a memory dump or the current state to track
down a specific problem


[[#F1]] Dump all roots of a compartment, and trace roots back to the XPCOM/DOM/whatever thing that is holding onto that root (when available)


[[#F2]] Record addr,size,stack at every allocation


[[#F3]]


</div>
----------------------------------------------------------------------


Details:


<div id='A2'>A2. atop records a ton of statistics about memory, disk, network, CPU, and
other things at a 10-minute sampling interval. Stats are collected at both
global and per-process granularity. It monitors every process that starts and
stops, even if the process appeared and disappeared entirely between two
samples. It dumps all this into a somewhat-compressed binary log.


The visual UI has a good set of heuristics for detecting "large" values, and
coloring the output accordingly. If your disk is busy for >90% of the sampling
interval, it'll turn red. If your network traffic is a high percentage of the
expected maximum bandwidth, it'll turn red. etc.
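
The kind of heuristic described here could be sketched as below. This is not atop's actual logic; only the 90% "red" level comes from the description above, and the intermediate "yellow" level and all names are assumptions.

```python
# Fractions of the expected maximum at which a value gets highlighted.
RED_FRACTION = 0.9
YELLOW_FRACTION = 0.7  # assumption for this sketch, not taken from atop


def severity(value: float, expected_max: float) -> str:
    """Classify a sampled value relative to its expected maximum."""
    if expected_max <= 0:
        raise ValueError("expected_max must be positive")
    fraction = value / expected_max
    if fraction > RED_FRACTION:
        return "red"
    if fraction > YELLOW_FRACTION:
        return "yellow"
    return "normal"
```

For the disk case, `value` would be the busy time within a sample and `expected_max` the length of the sampling interval; for the network case, the measured traffic and the expected maximum bandwidth.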


It lets you use it in 'top-like' mode, where it displays the current state of
things, as well as in a historical mode where it reads from a log file. (It is
decidedly *not* seamless between the two, but it should be.)


It also allows dumping historical data to text files. I've used that for
generating graphs of various values.


For the browser, many of the same metrics are applicable, but I'd also like an
equivalent of the per-process info. The idea is to know "what was going on at
XXX?" So it should record user and browser actions, which tab was active, network
requests, significant events firing, etc.
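
A minimal sketch of such a rotating record, assuming samples are taken in time order and that an in-memory bounded buffer is enough (a real version would presumably write to disk, as atop does). The class and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Sample:
    timestamp: float      # seconds since some epoch
    resident_bytes: int   # a memory metric for this sample
    active_tab: str       # which tab was active
    events: list = field(default_factory=list)  # e.g. network requests


class RotatingHistory:
    """Keep only the most recent `capacity` samples, oldest dropped first."""

    def __init__(self, capacity: int):
        self._samples = deque(maxlen=capacity)

    def record(self, sample: Sample) -> None:
        # Samples are assumed to arrive in increasing timestamp order.
        self._samples.append(sample)

    def at(self, timestamp: float):
        """Return the latest sample taken at or before `timestamp`, or None."""
        best = None
        for sample in self._samples:
            if sample.timestamp <= timestamp:
                best = sample
            else:
                break
        return best
```

`history.at(t)` then answers "what was going on at XXX?" for any time still covered by the buffer, and returns `None` once the relevant samples have rotated out.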


</div>
----


<div id='A3'>A3. The idea is that rather than waiting for the screen to redraw for every
action in getting to about:memory, you just do firefox 'about:memory...' and go
have a cup of tea while it thinks about it.


</div>
----


<div id='A5'>A5. This is based on pure speculation, but I don't understand why the browser
is so incredibly unusable when memory usage is going nuts. Why is all that
memory being touched? Why isn't it just swapped out and forgotten? Under the
assumption that it's the GC scanning it over and over again, it seems like it
would be nice to suppress GC in this situation. Generational GC could eliminate
this problem in a nicer and much more principled way.


</div>
----


<div id='B2'>B2. I have the impression that we have many, many memory-related problem
reports that end up being useless. I think that's really our fault; it's too
hard for users to file useful bug reports. Even experienced Mozilla devs don't
know what to do.


</div>
----


<div id='B5'>B5. eg: collect up all API calls that an addon makes (or record them, or
whatever.) Maintain a whitelist of APIs. (If you pass in a string, assume it
may be duplicated a thousand times and stored in a sqlite DB forever, but if
you're just setting existing booleans or reading state, you're blameless.)
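
A sketch of the whitelist check, assuming the recorded calls are available as (addon, API name) pairs. The API names below are placeholders chosen for illustration, not a real whitelist.

```python
from collections import defaultdict

# Hypothetical whitelist of APIs considered harmless for memory usage.
SAFE_APIS = {"getBoolPref", "setBoolPref", "readState"}


def blameless_addons(call_log):
    """Return the set of addons whose recorded calls are all whitelisted.

    `call_log` is an iterable of (addon_id, api_name) pairs.
    """
    calls = defaultdict(set)
    for addon_id, api_name in call_log:
        calls[addon_id].add(api_name)
    return {addon for addon, apis in calls.items() if apis <= SAFE_APIS}
```

Addons in the returned set could be removed from consideration; anything that made even one non-whitelisted call stays a suspect.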


</div>
----


<div id='C2'>C2. When looking at a memory leak, I took several snapshots of
/proc/<pid>/maps, diffed them to find a memory region that appeared and did not
disappear, and then dumped out the raw memory to a file. Then I ran strings on
it.
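
The manual steps above could be automated roughly as follows. This is a sketch that works on the text of several /proc/<pid>/maps snapshots taken over time; actually reading the raw memory (for example from /proc/<pid>/mem, which needs suitable permissions) is left out, and the function names are made up.

```python
import re


def parse_maps(text):
    """Parse /proc/<pid>/maps text into a set of (start, end, perms, path)."""
    regions = set()
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.add((start, end, fields[1], path))
    return regions


def persistent_new_regions(snapshots):
    """Regions absent from the first snapshot that appear and never go away."""
    parsed = [parse_maps(s) for s in snapshots]
    if len(parsed) < 2:
        return set()
    result = set()
    for region in parsed[-1] - parsed[0]:
        first_seen = next(i for i, snap in enumerate(parsed) if region in snap)
        if all(region in snap for snap in parsed[first_seen:]):
            result.add(region)
    return result


def printable_strings(data, min_len=4):
    """Rough equivalent of `strings`: runs of printable ASCII in raw bytes."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
```

Regions that are resized between snapshots show up as different regions here, so this only catches mappings whose address range stays fixed.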


</div>
----


<div id='D2'>D2. I don't really know enough about the system to flesh this out properly, but
when a bunch of memory lingers around after it really ought to be dead, it
seems like many of the objects comprising that memory should be able to "know"
that they *probably* shouldn't live past... the current page, or for more than
a few seconds, or whatever. Assuming this is possible, it should be possible to
walk up a dominator graph and give a fairly directed answer to "why has this
outlived what it thought its lifespan would be?"


Not every memory allocation needs to be marked for this to work. You just need
one object within the "leaked" memory to be marked.


It could also walk the graph "en masse" to ignore individual objects that are
reachable longer than expected and focus on the clusters of objects that are
kept alive by the same thing. (I'm thinking that the expected lifetime is a
guess, and may be inaccurate.)
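
To make the registration half of this concrete, here is a toy sketch in Python, using weak references so that the registry itself does not keep anything alive. It only reports which marked objects have outlived their guess; the dominator-graph walk and the clustering are not attempted, and all names are hypothetical.

```python
import weakref


class LifetimeRegistry:
    """Record an expected lifetime per object and report the ones that overstay."""

    def __init__(self):
        self._entries = []  # (weak reference, label, deadline)

    def expect(self, obj, label, now, lifetime):
        """Mark `obj` as expected to be dead by `now + lifetime`.

        `obj` must support weak references (instances of ordinary classes do).
        """
        self._entries.append((weakref.ref(obj), label, now + lifetime))

    def overdue(self, now):
        """Labels of objects still alive after their expected deadline."""
        alive = [(r, l, d) for (r, l, d) in self._entries if r() is not None]
        self._entries = alive  # forget objects that were collected
        return [label for (_, label, deadline) in alive if now > deadline]
```

Since the expected lifetime is only a guess, the output of `overdue` is a list of suspects to investigate, not a list of confirmed leaks.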


</div>
----


<div id='D4'>D4. eg use mprotect on a random subset of the heap to find pages (or smaller
regions, but that's harder) that are never accessed after some point. Remove
the GC/CC from consideration.
</div>
<div id="A1">Some longer explanatory text</div>