SecurityEngineering/NSS Startup and Shutdown in Gecko: Difference between revisions

update with plan to not shut down NSS
(remove draft status)


The PSM component observes a number of events, including notification of profile changes, preference changes, and XPCOM shutdown. Upon receiving the "profile-before-change" notification, the PSM component releases all NSS resources held by PSM objects and calls NSS_Shutdown. Consequently, whenever a PSM object attempts to use an NSS resource or call an NSS function, it first must check if NSS has been shut down. This is coordinated by having such classes inherit from nsNSSShutDownObject. At each point NSS resources are used, these classes first acquire an nsNSSShutDownPreventionLock and check isAlreadyShutDown() (see nsNSSShutDown.h).
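
The lock-then-check pattern described above can be sketched in a few lines. This is a simplified, self-contained model, not the real declarations from nsNSSShutDown.h: <code>ShutDownPreventionLock</code>, <code>ShutDownObject</code>, <code>CertHandle</code>, <code>ShutDownAll</code>, and the single mutex behind them are hypothetical stand-ins chosen for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the state behind nsNSSShutDownPreventionLock.
// A single mutex is the simplest model: holding it blocks shutdown.
std::mutex gNSSGate;

// While an instance is alive, ShutDownAll() below cannot proceed.
class ShutDownPreventionLock {
 public:
  ShutDownPreventionLock() : mGuard(gNSSGate) {}

 private:
  std::lock_guard<std::mutex> mGuard;
};

// Hypothetical stand-in for nsNSSShutDownObject.
class ShutDownObject {
 public:
  virtual ~ShutDownObject() = default;
  bool isAlreadyShutDown() const { return mAlreadyShutDown; }

  // Releases the object's resources exactly once.
  void shutdown() {
    if (!mAlreadyShutDown) {
      releaseResources();
      mAlreadyShutDown = true;
    }
  }

 protected:
  virtual void releaseResources() = 0;

 private:
  bool mAlreadyShutDown = false;
};

// An illustrative class that holds an "NSS resource".
class CertHandle : public ShutDownObject {
 public:
  // Returns false instead of touching the resource once it is gone.
  bool Use() {
    ShutDownPreventionLock locker;   // step 1: prevent shutdown
    if (isAlreadyShutDown()) {       // step 2: check before use
      return false;
    }
    assert(holdsResource());         // invariant: not shut down => still held
    ++mUses;                         // stands in for an NSS call
    return true;
  }
  bool holdsResource() const { return mHoldsResource; }
  int uses() const { return mUses; }

 private:
  void releaseResources() override { mHoldsResource = false; }
  bool mHoldsResource = true;
  int mUses = 0;
};

// Stand-in for the work done on "profile-before-change".
void ShutDownAll(const std::vector<ShutDownObject*>& objects) {
  std::lock_guard<std::mutex> exclusive(gNSSGate);
  for (ShutDownObject* object : objects) {
    object->shutdown();
  }
}

}  // namespace sketch
```

In this model, <code>Use()</code> succeeds before <code>ShutDownAll</code> runs and returns false afterwards, which is the behavior every PSM call site is expected to implement by hand.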
On the topic of profile changes, the current implementation implies that profile switching is supported. That is, the implementation expects that Gecko may initialize NSS in one profile, shut it down, change profiles, and re-initialize NSS (all in the same process). This functionality has not been supported for a long time.


As a consequence of this design, when implementing new functionality that uses NSS, it is easy to do the wrong thing. For instance, code that merely calls NSS functions but does not hold NSS resources may forget to check and prevent NSS from shutting down during the use of that function. Another common mistake is to acquire the nsNSSShutDownPreventionLock but not actually check if NSS has been shut down. These have often led to shutdown crashes (see, for example, [https://bugzilla.mozilla.org/show_bug.cgi?id=1114741 bug 1114741], [https://bugzilla.mozilla.org/show_bug.cgi?id=1046221 bug 1046221], [https://bugzilla.mozilla.org/show_bug.cgi?id=1029173 bug 1029173], and [https://bugzilla.mozilla.org/show_bug.cgi?id=911336 bug 911336]).
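
To make the second mistake concrete, here is a minimal sketch that contrasts acquiring the lock without checking against acquiring it and then checking. Everything in it is hypothetical: <code>gShutdownGate</code>, <code>gKey</code>, and the two <code>UseKey…</code> functions are illustrative names, and the released resource is reported through a return value rather than dereferenced, so the sketch stays safe to run.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

namespace mistake_sketch {

std::mutex gShutdownGate;                // hypothetical: blocks shutdown while held
bool gShutDown = false;                  // hypothetical: set once shutdown has run
std::unique_ptr<int> gKey(new int(42));  // stands in for a long-lived NSS resource

enum class Result { Ok, NotAvailable, UsedReleasedResource };

// Stand-in for shutdown: releases the resource and records that it did so.
void ShutDown() {
  std::lock_guard<std::mutex> guard(gShutdownGate);
  gKey.reset();
  gShutDown = true;
}

// The mistake: the lock is held, but nothing checks whether shutdown
// already happened. Real code would dereference the resource here and
// crash; the sketch only reports that it would have.
Result UseKeyWithoutCheck() {
  std::lock_guard<std::mutex> guard(gShutdownGate);
  return gKey ? Result::Ok : Result::UsedReleasedResource;
}

// The intended pattern: hold the lock AND check before touching anything.
Result UseKeyWithCheck() {
  std::lock_guard<std::mutex> guard(gShutdownGate);
  if (gShutDown) {
    return Result::NotAvailable;
  }
  assert(gKey);  // invariant: not shut down => resource still present
  return Result::Ok;
}

}  // namespace mistake_sketch
```

The lock only guarantees that shutdown cannot start while it is held. It says nothing about whether shutdown has already finished, which is why the explicit check is still required.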
** Currently the only way to do this that (mostly) works is for a class to implement the nsNSSShutDownObject mechanism, acquire an nsNSSShutDownPreventionLock and check isAlreadyShutDown. It should be possible to perform the same steps without implementing nsNSSShutDownObject (indeed, this would be better, since that interface has more to do with releasing long-lived NSS resources at shutdown). Furthermore, this mechanism doesn't entirely work, because if an object that implements nsNSSShutDownObject is instantiated after NSS has been shut down, isAlreadyShutDown will actually return false.
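
The late-instantiation problem mentioned above can be reproduced in a small model. The sketch assumes that the shutdown state is recorded per object and is only set when shutdown walks the objects alive at that moment; that assumption, and the names <code>TrackedObject</code>, <code>gLiveObjects</code>, <code>gNSSHasShutDown</code>, and <code>isAlreadyShutDownOrNSSGone</code>, are illustrative and not taken from the Gecko source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace late_sketch {

class TrackedObject;

// Hypothetical registry of live objects and a process-wide flag.
std::vector<TrackedObject*> gLiveObjects;
bool gNSSHasShutDown = false;

class TrackedObject {
 public:
  TrackedObject() { gLiveObjects.push_back(this); }
  TrackedObject(const TrackedObject&) = delete;
  TrackedObject& operator=(const TrackedObject&) = delete;
  ~TrackedObject() {
    gLiveObjects.erase(
        std::remove(gLiveObjects.begin(), gLiveObjects.end(), this),
        gLiveObjects.end());
  }

  // Per-object answer: only true if this object was alive at shutdown.
  bool isAlreadyShutDown() const { return mAlreadyShutDown; }

  // One possible remedy: also consult the process-wide flag.
  bool isAlreadyShutDownOrNSSGone() const {
    return mAlreadyShutDown || gNSSHasShutDown;
  }

 private:
  friend void ShutDownAll();
  bool mAlreadyShutDown = false;
};

// Marks every object that exists right now; later objects are never visited.
void ShutDownAll() {
  for (TrackedObject* object : gLiveObjects) {
    assert(object != nullptr);
    object->mAlreadyShutDown = true;
  }
  gNSSHasShutDown = true;
}

}  // namespace late_sketch
```

An object constructed before <code>ShutDownAll</code> reports true from <code>isAlreadyShutDown</code>, while one constructed afterwards reports false even though NSS is gone. Only the check that also reads the process-wide flag gives the right answer in both cases.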


=== Pie-in-the-sky Ultimate Best Scenario ===
In the best case scenario, it shouldn't even be possible to write code that does the wrong thing. That is, if Gecko code wants to call NSS functions, the very act of calling the functions should first result in a check that NSS has already been initialized and hasn't yet been shut down (and it should not be possible for NSS to shut down on another thread while that function is running). Similarly, any NSS resources held by Gecko code should automatically release themselves when NSS shuts down. The only way I can think of to achieve this is by some sort of static analysis and/or a shim layer that ensures the correct steps are taken (whereupon directly calling NSS functions would be prohibited). This would require significant engineering work on top of the steps already described above.

==== Potentially Not Shutting Down NSS ====
Properly implementing the coordinated shutdown of NSS has, to date, proved intractable. For architectural reasons and due to the significant complexity involved, attempting to shut down NSS in the way described in this document has been an ongoing source of crashes and hangs in Firefox. For that reason, we have been exploring the possibility of not shutting down NSS at all. For this to work, we have had to address a number of potential concerns.
 
Certificate and key database corruption: In theory, if Firefox were to exit without coordinating with NSS, data stored in the certificate and key databases (backed by BerkeleyDB) could be lost. To mitigate this, we have migrated to using the SQLite-backed implementation. The databases are now journaled, and short of a bug in SQLite, we do not anticipate data loss due to database corruption.
 
PKCS#11 devices: In theory, if Firefox were to exit without coordinating with NSS and thus any attached PKCS#11 devices, data could be lost on these devices. However, it is our understanding that these devices must be robust against unexpected physical removal. Uncoordinated shutdown should present no worse a risk to user data.
 
FIPS 140-2 mode: While Mozilla does not ship a version of Firefox that supports FIPS mode out of the box, Red Hat does. It is our understanding that clearing key material is a requirement of FIPS and that not shutting down NSS may pose a problem for this requirement. [https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/security-policies/140sp3070.pdf Red Hat's FIPS 140-2 Security Policy] specifies that the application (i.e. Firefox) using the module (i.e. NSS) is responsible for zeroization of key material. More specifically, it says "All plaintext secret and private keys must be zeroized when the Module is shut down (with a FC_Finalize call), reinitialized (with a FC_InitToken call), or when the session is closed (with a FC_CloseSession or FC_CloseAllSessions call)." Thus, if Firefox never shuts down NSS, this requirement is trivially met.
 
Leak detection: By not shutting down NSS, technically we leak some allocated memory until the process exits. This could cause problems if our test infrastructure detected and reported these leaks. However, it appears not to detect them (which itself is somewhat concerning). In any case, we will have to deal with this if and when we can detect these leaks.
 
Given that these concerns all have at least a preliminary answer, we will move forward with attempting to not shut down NSS in Firefox. This may expose unexpected issues that may lead to a reassessment of the situation, so this will be on a trial basis only in Nightly.