SecurityEngineering/NSS Startup and Shutdown in Gecko: Difference between revisions

From MozillaWiki
Revision as of 21:03, 6 December 2017

This is an informational document outlining the modernization and simplification of NSS startup and shutdown in Gecko. It is organized in three parts: the current setup, the desired setup, and a roadmap for achieving the desired setup. These are followed by a discussion of potentially not shutting down NSS at all. If the current date is later than 1 November 2016, this document is likely out of date.

The Current Setup

Classes in PSM (and in other parts of Gecko) implement a number of interfaces that require NSS functionality. Instantiating any of these classes causes the PSM component to be initialized. This initialization starts a few services needed for certificate verification, and it also initializes NSS itself. That involves calling the overall NSS_Initialize function and then applying configuration, such as enabling only specific cipher suites and loading the trust anchors used for certificate verification.
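The lazy, once-only initialization described above can be sketched roughly as follows. This is a minimal C++ illustration under stated assumptions, not PSM's actual code. `PSMComponentSketch` and `NSSDependentObject` are hypothetical names. The initialization steps are only recorded as strings rather than performed, and their order is illustrative. `std::call_once` ensures the steps run exactly once, even if several NSS-dependent objects are created concurrently.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the PSM component. The recorded step names are
// placeholders for the real work; no NSS functions are called here.
class PSMComponentSketch {
 public:
  static PSMComponentSketch& Get() {
    static PSMComponentSketch instance;  // constructed on first use
    return instance;
  }

  // Runs the initialization steps exactly once, even under concurrency.
  void EnsureInitialized() {
    std::call_once(mInitOnce, [this] {
      mSteps.push_back("start certificate verification services");
      mSteps.push_back("NSS_Initialize");
      mSteps.push_back("enable only the allowed cipher suites");
      mSteps.push_back("load trust anchors");
    });
  }

  // Safe to read after EnsureInitialized() has returned on this thread.
  const std::vector<std::string>& Steps() const { return mSteps; }

 private:
  PSMComponentSketch() = default;
  std::once_flag mInitOnce;
  std::vector<std::string> mSteps;
};

// Hypothetical class implementing an NSS-dependent interface: merely
// constructing it triggers component (and thus NSS) initialization.
class NSSDependentObject {
 public:
  NSSDependentObject() { PSMComponentSketch::Get().EnsureInitialized(); }
};
```

Creating two `NSSDependentObject` instances leaves exactly four recorded steps, because the second construction finds initialization already done.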

The PSM component observes a number of events, including notification of profile changes, preference changes, and XPCOM shutdown. Upon receiving the "profile-before-change" notification, the PSM component releases all NSS resources held by PSM objects and calls NSS_Shutdown. Consequently, whenever a PSM object attempts to use an NSS resource or call an NSS function, it must first check whether NSS has been shut down. This is coordinated by having such classes inherit from nsNSSShutDownObject. Wherever they use NSS resources, these classes first acquire an nsNSSShutDownPreventionLock and then check isAlreadyShutDown() (see nsNSSShutDown.h).
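The coordination pattern described above can be modeled roughly as follows. This is a hypothetical, simplified C++ sketch and not the real contents of nsNSSShutDown.h. `ShutDownListSketch`, `ShutDownObjectSketch`, `PreventionLockSketch`, and `KeyHolderSketch` are invented names. The prevention lock is assumed to behave like a reader lock: a `std::shared_mutex` is held shared while NSS is in use and exclusively during shutdown. The "NSS resource" is simulated by a boolean.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <shared_mutex>

class ShutDownObjectSketch;

// Hypothetical global registry of objects that hold NSS resources.
class ShutDownListSketch {
 public:
  static ShutDownListSketch& Get() {
    static ShutDownListSketch instance;
    return instance;
  }
  // Held shared while NSS is in use, exclusively during shutdown.
  std::shared_mutex& PreventionMutex() { return mPrevention; }
  void Register(ShutDownObjectSketch* obj) {
    std::lock_guard<std::mutex> guard(mListLock);
    mObjects.insert(obj);
  }
  void Unregister(ShutDownObjectSketch* obj) {
    std::lock_guard<std::mutex> guard(mListLock);
    mObjects.erase(obj);
  }
  void ShutDownAll();

 private:
  std::shared_mutex mPrevention;
  std::mutex mListLock;
  std::set<ShutDownObjectSketch*> mObjects;
};

// Stand-in for nsNSSShutDownPreventionLock.
using PreventionLockSketch = std::shared_lock<std::shared_mutex>;

// Stand-in for nsNSSShutDownObject: each object keeps its own flag.
class ShutDownObjectSketch {
 public:
  ShutDownObjectSketch() { ShutDownListSketch::Get().Register(this); }
  virtual ~ShutDownObjectSketch() { ShutDownListSketch::Get().Unregister(this); }

  // Only meaningful while a PreventionLockSketch is held. The flag is set
  // solely by ShutDownAll, so objects created after shutdown report false.
  bool IsAlreadyShutDown() const { return mAlreadyShutDown; }

  // Called by ShutDownAll with the prevention mutex held exclusively.
  void ShutDownFromList() {
    ReleaseNSSResources();
    mAlreadyShutDown = true;
  }

 protected:
  virtual void ReleaseNSSResources() = 0;

 private:
  bool mAlreadyShutDown = false;
};

inline void ShutDownListSketch::ShutDownAll() {
  std::unique_lock<std::shared_mutex> exclusive(mPrevention);  // waits for users
  std::lock_guard<std::mutex> guard(mListLock);
  for (ShutDownObjectSketch* obj : mObjects) obj->ShutDownFromList();
  // The real component would call NSS_Shutdown at this point.
}

// Example subclass holding one simulated NSS resource.
class KeyHolderSketch : public ShutDownObjectSketch {
 public:
  ~KeyHolderSketch() override {
    PreventionLockSketch lock(ShutDownListSketch::Get().PreventionMutex());
    // Leave the registry before this subclass is destroyed, so a concurrent
    // ShutDownAll never calls ReleaseNSSResources on a half-destroyed object.
    ShutDownListSketch::Get().Unregister(this);
    if (!IsAlreadyShutDown()) ReleaseNSSResources();
  }

  bool UseKey() {
    PreventionLockSketch lock(ShutDownListSketch::Get().PreventionMutex());
    if (IsAlreadyShutDown()) return false;  // the check that is easy to forget
    return mHasKey;                         // stand-in for a real NSS call
  }

  bool HasKey() const { return mHasKey; }

 protected:
  void ReleaseNSSResources() override { mHasKey = false; }

 private:
  bool mHasKey = true;
};
```

Each object keeps its own flag, and that flag is set only while the registry is being shut down. As a result, an object constructed after shutdown still reports that NSS is available, which matches the flaw noted in the roadmap.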

As a consequence of this design, it is easy to do the wrong thing when implementing new functionality that uses NSS. For instance, code that merely calls NSS functions but does not hold NSS resources may neglect to check whether NSS has been shut down, and to prevent it from shutting down while those functions run. Another common mistake is to acquire the nsNSSShutDownPreventionLock but never actually check whether NSS has been shut down. Such mistakes have often led to shutdown crashes (see for example bug 1114741, bug 1046221, bug 1029173, and bug 911336).

The Desired Setup

NSS should be initialized exactly once and shut down exactly once. Code that uses NSS should only be able to run once NSS is guaranteed to be initialized. While such code is running, it should prevent NSS from being shut down out from under it. When NSS is about to be shut down, all NSS resources held by the platform should be released, and any resource leaks detected by NSS_Shutdown should be fatal in debug builds. Once NSS has been shut down (upon notification that the entire process is shutting down), all methods that would use NSS must first check for this and return an error.
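One way the desired setup might look is sketched below. The sketch assumes that NSS state can be modeled as process-wide flags guarded by a reader-writer lock. `NSSStateSketch`, `NSSUseGuard`, and `DoCryptoWork` are hypothetical names, no real NSS functions are called, and releasing long-lived resources before shutdown is left out. Code that wants to use NSS creates a guard. The guard's constructor ensures initialization has happened, the guard holds shutdown off for as long as it lives, and it reports an error once NSS has shut down. Because the state is process-wide rather than per object, no base class is required, and a guard created after shutdown still sees the correct state.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical process-wide model of NSS state; no real NSS calls are made.
class NSSStateSketch {
 public:
  static NSSStateSketch& Get() {
    static NSSStateSketch instance;
    return instance;
  }

  // Runs at most once. After the first call completes, later calls return
  // without touching mMutex, so nested guards cannot deadlock here.
  void Initialize() {
    std::call_once(mInitOnce, [this] {
      std::unique_lock<std::shared_mutex> lock(mMutex);
      if (!mShutDown) mInitialized = true;  // real code: NSS_Initialize etc.
    });
  }

  // Takes effect at most once. The exclusive lock waits until every live
  // guard is gone, so NSS is never torn down under running code.
  // Must not be called by a thread that currently holds a guard.
  void ShutDown() {
    std::unique_lock<std::shared_mutex> lock(mMutex);
    if (mShutDown) return;
    mShutDown = true;  // real code: release resources, then NSS_Shutdown
  }

 private:
  friend class NSSUseGuard;
  std::once_flag mInitOnce;
  std::shared_mutex mMutex;
  bool mInitialized = false;
  bool mShutDown = false;
};

// RAII guard: NSS cannot shut down while one exists.
class NSSUseGuard {
 public:
  NSSUseGuard() : mLock(LockAfterInit()) {}
  NSSUseGuard(const NSSUseGuard&) = delete;
  NSSUseGuard& operator=(const NSSUseGuard&) = delete;

  // [[nodiscard]] makes taking the guard but ignoring the check a warning.
  [[nodiscard]] bool CanUseNSS() const {
    const NSSStateSketch& state = NSSStateSketch::Get();
    return state.mInitialized && !state.mShutDown;
  }

 private:
  static std::shared_mutex& LockAfterInit() {
    NSSStateSketch& state = NSSStateSketch::Get();
    state.Initialize();  // ensure initialization before any NSS use
    return state.mMutex;
  }
  std::shared_lock<std::shared_mutex> mLock;
};

// Hypothetical NSS consumer: refuses to run once NSS is gone.
bool DoCryptoWork() {
  NSSUseGuard guard;
  if (!guard.CanUseNSS()) return false;
  // NSS calls would go here; ShutDown() blocks until this function returns.
  return true;
}
```

In this sketch, the first call to `DoCryptoWork` initializes the state and succeeds. After `ShutDown`, every call fails, and a later `Initialize` does not bring NSS back. A thread that holds a guard must not call `ShutDown`, because `std::shared_mutex` cannot be upgraded from shared to exclusive ownership.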

Writing new code that correctly deals with these restrictions should be easy.

How to Get There

  • Remove unused and/or unnecessary cruft from PSM/NSS initialization (bug 1215267)
  • Fix all NSS shutdown leaks (bug 1230312)
  • Make NSS shutdown leaks fatal
  • Handle the case where an object that needs to be tracked to free resources on NSS shutdown is created before NSS is initialized (this can and should be made to work; see bug 1235634)
  • Separate NSS-only initialization from PSM component initialization
  • Ensure NSS is initialized before execution reaches any code that requires it
  • Provide a better mechanism for preventing NSS from shutting down (and checking if it has already shut down)
    • Currently the only way to do this that (mostly) works is for a class to implement the nsNSSShutDownObject mechanism, acquire an nsNSSShutDownPreventionLock and check isAlreadyShutDown. It should be possible to perform the same steps without implementing nsNSSShutDownObject (indeed, this would be better, since that interface has more to do with releasing long-lived NSS resources at shutdown). Furthermore, this mechanism doesn't entirely work, because if an object that implements nsNSSShutDownObject is instantiated after NSS has been shut down, isAlreadyShutDown will actually return false.

Potentially Not Shutting Down NSS

Properly implementing the coordinated shutdown of NSS has, to date, proved intractable. For architectural reasons and because of the significant complexity involved, attempting to shut down NSS in the way described in this document has been an ongoing source of crashes and hangs in Firefox. Consequently, we have been exploring the possibility of not shutting down NSS at all. For this to work, we have had to address a number of potential concerns.

Certificate and key database corruption: In theory, if Firefox were to exit without coordinating with NSS, data stored in the legacy certificate and key databases (backed by Berkeley DB) could be lost. To mitigate this, we have migrated to the SQLite-backed implementation. These databases are journaled, so short of a bug in SQLite, we do not anticipate data loss due to database corruption.

PKCS#11 devices: In theory, if Firefox were to exit without coordinating with NSS (and thus with any attached PKCS#11 devices), data on these devices could be lost. However, it is our understanding that these devices must be robust against unexpected physical removal. An uncoordinated shutdown should therefore pose no greater risk to user data than unexpected removal does.

FIPS 140-2 mode: While Mozilla does not ship a version of Firefox that supports FIPS mode out of the box, Red Hat does. It is our understanding that clearing key material is a requirement of FIPS and that not shutting down NSS may pose a problem for this requirement. Red Hat's FIPS 140-2 Security Policy specifies that the application (i.e. Firefox) using the module (i.e. NSS) is responsible for zeroization of key material. More specifically, it says "All plaintext secret and private keys must be zeroized when the Module is shut down (with a FC_Finalize call), reinitialized (with a FC_InitToken call), or when the session is closed (with a FC_CloseSession or FC_CloseAllSessions call)." Thus, if Firefox never shuts down NSS, this requirement is trivially met.

Leak detection: By not shutting down NSS, we technically leak some allocated memory, because it is never freed before the process exits. This could cause problems if our test infrastructure detected and reported these leaks. However, it appears not to (which is itself somewhat concerning). In any case, we will have to deal with this if and when these leaks are detected.

Given that each of these concerns has at least a preliminary answer, we will move forward with not shutting down NSS in Firefox. Because this may expose unexpected issues that lead to a reassessment of the situation, the change will be made on a trial basis, in Nightly only.