Crash reporting improvements: Difference between revisions

Added information about the improved stack overflow detection project
(Stack overflow improvements)
(Added information about the improved stack overflow detection project)
Line 146: Line 146:
=== Overview ===
=== Overview ===


Status: not started<br>
Status: in progress<br>
Developer(s):<br>
Developer(s): gsvelto<br>
Source code:<br>
Original source code: N/A<br>
Bugs:<br>
Bugs:<br>
* {{Bug|1671082}}
* {{bug|1671082}}
* {{bug|1678152}}
* {{bug|1758673}}
* {{bug|1768794}}


=== Description ===
=== Description ===


* Stack overflows aren't caught properly on macOS
* Minidump generation when hitting stack overflows is poor on Linux (it's probably poor on macOS too)
* Linux stack overflows are not clearly labelled on Socorro
For years we've assumed that stack overflows would be captured by the Breakpad
exception handlers; this assumption was based on the presence of crash reports
involving stack overflows on Windows, the use of an alternate signal stack on
Linux, and macOS's exception handler architecture, which delegates exceptions to a
separate thread. Real-world testing and bugs proved that we were actually
missing a significant number of stack overflows:
* On Linux the alternate signal stack was only available on the main thread, so stack overflows in other threads wouldn't be caught
* When we did catch a stack overflow on Linux, the minidump writer might mistake the guard page for the stack, thus storing an empty stack in the generated minidump
* On Windows only some stack overflow crashes were caught; others would be silently forwarded to Windows Error Reporting
* On macOS the exception handler seems capable of catching the overflow, but the minidump writer produces a malformed minidump which is completely unusable
* Crash reports caused by stack overflows are obvious on Windows, which has a specific exception for them, but on macOS/Linux they're indistinguishable from other crashes
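
The guard-page problem in the Linux bullet above can be illustrated with a small sketch. This is not the Breakpad minidump writer's code; it only assumes that the writer has a list of memory mappings in the style of /proc/<pid>/maps and the crashing thread's stack pointer. The names <code>Mapping</code> and <code>find_stack_mapping</code> are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Mapping(NamedTuple):
    """One memory mapping (hypothetical type, modelled on /proc/<pid>/maps)."""
    start: int  # first address of the mapping (inclusive)
    end: int    # one past the last address (exclusive)
    perms: str  # permission string, e.g. "rw-p" or "---p"


def find_stack_mapping(mappings: Sequence[Mapping], sp: int) -> Optional[Mapping]:
    """Return the readable mapping that holds the thread's stack, or None."""
    containing = next((m for m in mappings if m.start <= sp < m.end), None)
    if containing is None:
        return None
    if "r" in containing.perms:
        return containing
    # The stack pointer sits in an unreadable mapping, most likely the guard
    # page. Assuming the stack grows downwards, the real stack is the readable
    # mapping that starts exactly where the guard page ends.
    return next(
        (m for m in mappings if m.start == containing.end and "r" in m.perms),
        None,
    )


if __name__ == "__main__":
    maps = [
        Mapping(0x7F0000000000, 0x7F0000001000, "---p"),  # guard page
        Mapping(0x7F0000001000, 0x7F0000801000, "rw-p"),  # thread stack
    ]
    # After an overflow the stack pointer points into the guard page.
    stack = find_stack_mapping(maps, sp=0x7F0000000FF0)
    print(stack)
```

A writer that simply copied the mapping containing the stack pointer would store the unreadable guard page, which is one plausible way to end up with an empty stack in the minidump.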


=== Plan ===
=== Plan ===
This project requires tackling several issues:
* On Linux we need to ensure all threads have an alternate signal stack installed when they're launched, and we need to modify the minidump writer to properly identify where the stack is
* On macOS we need to investigate the issues with minidump writing, possibly integrating the required changes in the oxidized minidump writer
* On Windows we need to ensure that the Windows Error Reporting interceptor catches stack overflows
* We need to introduce a test that specifically checks for stack overflows and ensures that they're being caught properly, then enable it one platform at a time
* Last but not least, we need to flag macOS/Linux stack overflows so that they're easy to tell apart from other types of crashes
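
One way the last item could work is a heuristic on the faulting address. The sketch below is an assumption for illustration only, not how Breakpad or Socorro classify crashes: it treats a memory-access fault as a probable stack overflow when the faulting address falls in the guard region just below the lowest address of the thread's stack. The function name, its parameters, and the 4096-byte default guard size are all hypothetical.

```python
def looks_like_stack_overflow(
    signal_name: str,
    fault_address: int,
    stack_low: int,
    guard_size: int = 4096,
) -> bool:
    """Heuristically decide whether a crash was caused by a stack overflow.

    signal_name   -- name of the signal that killed the thread, e.g. "SIGSEGV"
    fault_address -- address whose access triggered the fault
    stack_low     -- lowest valid address of the thread's stack
    guard_size    -- assumed size in bytes of the guard region below the stack
    """
    # Only memory-access faults can be guard-page hits.
    if signal_name not in ("SIGSEGV", "SIGBUS"):
        return False
    # The guard region occupies [stack_low - guard_size, stack_low).
    return stack_low - guard_size <= fault_address < stack_low


if __name__ == "__main__":
    stack_low = 0x7F0000001000
    print(looks_like_stack_overflow("SIGSEGV", stack_low - 16, stack_low))
    print(looks_like_stack_overflow("SIGSEGV", 0x10, stack_low))
```

A check like this would miss overflows that jump past the guard region in a single large allocation, so it could only be a starting point for labelling.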