Exceptions: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
Line 76: Line 76:
=== outparamdel ===
=== outparamdel ===


(TBD)
With exceptions in place, <code>nsresult</code> methods become <code>void</code>-returning methods. Thus, methods with an "outparam" (more formally, a pointer-typed argument designated for passing a result out of the called method) will be able to place their result directly in the return value. This will make life easier for programmers and may save a few processor cycles.
 
We already have a tool called [outparamdel] for doing this refactoring. It hasn't been used yet because it can't really be done before exceptions. (Many getter methods always succeed, so we could technically apply [outparamdel] without exceptions, but it's risky because it assumes (a) the method will never be changed so that it can fail, and (b) the method isn't implemented in JavaScript. Neither assumption is safe.)


== Automated Refactoring ==
== Automated Refactoring ==

Revision as of 21:13, 17 January 2008

This is a discussion and planning document for refactoring Firefox to use C++ exceptions. Briefly, this means:

  • Enabling exceptions in the compiler
  • Replacing nsresult returns and checks with throw and catch statements
  • Changing getter methods to return their results directly (see outparamdel) instead of through pointer-typed parameters

The goal is to do all this without changing the behavior of Firefox, thus not introducing bugs. A somewhat more ambitious goal is to then refactor Firefox methods to be exception-safe as well, which will help avoid future bugs and may even eliminate a few existing bugs.

Benefits of Exceptions

The main reason for using exceptions is improving developer productivity. Hopefully, error handling code will become cleaner and more concise, and modifying code will be easier. outparamdel will make it easier to call getter methods. The error-handling overhaul might even come close to entirely nuking code relating to OOMs and null outparam pointers.

We don't know what the performance effects will be, but it's expected they will be minimal either way. Exceptions can increase performance because error tests are no longer needed in the common success case, output values can be returned directly, and inline code for propagating errors is removed. But exceptions can reduce performance because throwing and catching them is relatively expensive and the catch tables take up space.

General Plan

This section attempts to list all the changes that need to be made to make Firefox use exceptions. As much as possible, we'd like to break this process down into small steps and schedule them in an efficient order. (TBD)

Enabling Exceptions in the Compiler

An essential step is to enable exceptions in the compiler. By itself, this shouldn't do too much, but it may change the behavior of operator new in OOM conditions to throw exceptions instead of returning a null pointer. We'll want to do that eventually, anyway.

"Generic" Exceptions

There are several FF-specific issues with exceptions, which are discussed below. In this section, we discuss how to make FF use C++ exceptions for "generic" error conditions.

nsresult return types will be changed to void. This affects the IDL header file generator, hand-written declarations, and implementation method signatures.

An exception type hierarchy needs to be created to replace error codes. There is an opportunity here to include extra information in the exception object.

Statements like return NS_ERROR_FOO will be replaced with throw statements.

Call sites of nsresult methods need to be updated to handle exceptions instead of testing error codes. The key is to preserve current behavior. Call sites fall into a few categories:

Ignore nsresult. Call sites that ignore the nsresult must be wrapped in try { call() } catch {}. If we can prove that the call will never throw an exception, which we can currently do for about 1/3 of this category, we could remove the wrapper. However, this is potentially dangerous for the future: the called method may someday throw or propagate exceptions. Thus, it is better to make the call site exception-safe before removing the wrapper.

Forward nsresult. This is when the method containing the call site simply propagates the error code to its caller without executing any other code (except destructors). The macro NS_ENSURE_SUCCESS does this. This category is very nice: with exceptions, there is no error handling code at all. To remove the existing error handling code, we can simply remove the assignment of the error code out of the call site and remove the return statement. This may create a dead rv variable, which can be deleted as a (conceptual) separate step.

We place call sites that log the error into this category as well: logging will be needed less with exceptions.

Compensate, then forward nsresult. This is when the method containing the call site propagates the error code, but only after running compensation code. Compensation code restores some invariant that would otherwise be lost. The prototypical example is releasing resources to avoid leaks.

The preferred (by the designers of C++) pattern for this scenario is to create a stack object whose destructor maintains the invariant. This is particularly easy to do if the nsresult compensation code executes for both the failure and success cases: we wrap the compensation code in an object destructor and define an instance just before the call site.

If the compensation code currently runs only in the error case, then a direct translation cannot be made using compensation objects. The direct translation would be try { call() } catch { compensate(); throw e; }. However, it may be that for some call sites it is possible to express the error handling code using a stack compensation object.

Compensate, then return other nsresult. This is just like the previous case, except that a different error code is returned. In this case, we must use a catch block so that we can throw a new exception--propagation won't preserve the existing behavior. But we could still use stack compensation objects for compensation, and place only the new throw in the catch block.

But we might prefer to propagate the exception--the error handling code is simpler, and the stack trace is preserved. This would require more changes in the calling code.

Compensate, then continue. This is like the previous case, except that no error code is returned--execution continues. Again, because we are not propagating the exception, we need a catch block. Again, we do have the option of using stack compensation objects and an empty catch block.

Special Case Error: OOM

In current FF, new returns a null pointer when OOM, but with exceptions enabled, it will throw std::bad_alloc.

On most platforms, this shouldn't matter too much, as true OOMs are rare anyway. But we would of course like FF not to crash in this case, which requires catching std::bad_alloc. The easy way to do this is to refactor these call sites as described for generic errors. A more ambitious solution is to remove most of the catch blocks for OOMs, make the call sites exception-safe, and catch OOMs at a high level. That way, there would be very little code devoted to OOM handling.

Code that allocates memory by means other than new (e.g., malloc) will need to be changed to throw std::bad_alloc on failure.

Special Case Error: XPCOM Infrastructure

QueryInterface may be redesigned so that it never throws an exception. It will simply return a null pointer on failure, and callers must check for it if there is a possibility of failure. (Many QueryInterface calls are guaranteed to succeed.)

GetService may be handled the same way. (TBD)

CreateInstance might also generate OOMs only, although other failure modes are possible. (TBD)


XPConnect

XPConnect needs to be rewritten so that when JavaScript calls C++, the FFI catches C++ exceptions and generates JavaScript exceptions if necessary. Also, when C++ calls JavaScript, C++ exceptions should be generated in response to JavaScript exceptions. This is potentially a lot of work, because there are separate XPConnect implementations for every platform and compiler.

outparamdel

With exceptions in place, nsresult methods become void-returning methods. Thus, methods with an "outparam" (more formally, a pointer-typed argument designated for passing a result out of the called method) will be able to place their result directly in the return value. This will make life easier for programmers and may save a few processor cycles.

We already have a tool called [outparamdel] for doing this refactoring. It hasn't been used yet because it can't really be done before exceptions. (Many getter methods always succeed, so we could technically apply [outparamdel] without exceptions, but it's risky because it assumes (a) the method will never be changed so that it can fail, and (b) the method isn't implemented in JavaScript. Neither assumption is safe.)

Automated Refactoring

(TBD -- make sure to include code text quality issues.)

Static Analysis

The rest of the document (which needs rewriting to fit in better) is about how to use static analysis to help refactor Mozilla to use C++ exceptions instead of nsresults. In particular, this is about how to modify the call sites of methods so that the code uses exception handling and is exception safe.

The main goal of the rewriting is to make the code use exceptions instead of nsresults without modifying the behavior of the code. This breaks down into:

  • Ensuring that methods that don't crash now don't crash by throwing an unhandled exception.
  • Ensuring that the error handling code at the call sites runs under the same conditions and does the same thing. (Effectively, we are assuming that the current error handling is correct, and that replicating that behavior with exceptions will result in exception-safe code.)

Call sites that ignore nsresult return values

One case where we don't necessarily want to keep current behavior is at call sites that ignore their return values:

 callSomeMethodThatCanFail();

This is probably a bug, so we want a tool that can find and flag all of these results. After the bugs are fixed, any remaining call sites that really should ignore failures can be rewritten automatically to ignore exceptions.

Note that the following pattern is not an ignored nsresult return value:

 rv = callMethod();
 return rv;

This pattern can be rewritten to use exceptions (by simply letting any exceptions propagate up to the caller) with the techniques described in the section on making call sites exception safe.

Analysis to detect ignored return values

Roughly, the analysis should flag all call sites where:

  • the method is not guaranteed nothrow (i.e., the method can fail)
  • there is no branch on NS_FAILED(rv) or NS_SUCCEEDED(rv)

The easy way is to look for if statements where the condition contains NS_FAILED(rv) or NS_SUCCEEDED(rv). But that's not perfect, because of things like:

  • Calling NS_FAILED directly on the function without a variable
  • Assigning the rv or the NS_FAILED test result to a variable then branching later
  • Overwriting the value of rv before the test (so the rv isn't actually branched on)

The current thrower tool catches many of these cases but not all. It's believed that (a) thrower only produces false positives, not false negatives, and (b) doesn't produce too many false positives on Mozilla.

Rewriting ignored return values

If it turns out most of these are bugs, we can just flag them for manual repair.

On the other hand, if almost all of these sites are correct code, then we can handle an ignore site like this:

 callMethod();

by rewriting it to this:

 nsresult rv = callMethod();
 if (NS_FAILED(rv)) {}

and letting the full rewriting algorithm rewrite this to use exceptions. Or, we could rewrite it directly as:

 try {
   callMethod();
 } except (nserror e) {}

The latter pattern can be expressed with a macro. (dmandelin: But macros are bad, right? I'd vote for no macros here.)

Making call sites exception safe

Most call sites do check for errors and respond somehow. This section is about how to rewrite those call sites.

We'll start with the assumption that call sites adhere to a simple pattern that's relatively easy to rewrite, namely:

 nsresult rv = callSomeMethod();
 // note1: no code between call & rv test -- such code can of course be
 //        accommodated with exceptions but requires more design
 // note2: also accept NS_SUCCEEDED(rv) with code in opposite then/else branches
 if (NS_FAILED(rv)) {
   maybe fixup();
   return rv OR return NS_ERROR_FOO or return 0;
 } else {
   maybe do_stuff();
 }    

This requires a static analysis that finds all call sites that don't match this pattern so they can be flagged for manual rewrite.

Given the simple pattern, the rewrite looks like this:

     try {
         callSomeMethod();
         maybe do_stuff();
     } except (nserror e) {
        maybe fixup();
        maybe throw NS_ERROR_FOO; // if original returns new error code
        maybe return;             // if original returns 0
     }

The thrower tool handles some instances of this pattern. (dmandelin: I don't know exactly what thrower can do at this point.)

But there are at least two big practical problems with this rewrite.

First, when the original code failure case ends with 'return rv', the natural thing to do with exceptions is let the exception propagate. (If we use the rewrite above, we're cutting off the stack trace, which is bad for debugging.) If there is no 'fixup' code, then this is easy: just have no try block. If there is 'fixup' code, then we should rewrite to use a compensator object:

 // class created to run fixup
 class SomeCompensator {
   ~SomeCompensator() { fixup(); }
 };
 // call site
 SomeCompensator();
 callSomeMethod();
 

This is the style recommended by C++ authorities. The problem here is that the code is no longer equivalent to the original nsresult code: the fixup is run on success as well as failure. We still might be able to produce this rewriting automatically, but we need to know more about how the fixups appear in our code and how they can be identified.

Besides fixups, the current nsresult code often logs errors or runs assertions. With exception stack traces, logging should be less important, but assertions are a real issue. We need to decide out how the assertion sites should behave with exceptions and then figure out how to implement that in C++.

The second problem is that the if condition is often a little more complicated than what our pattern allows. We also have:

  • NS_FAILED(rv) || foo
  • NS_FAILED(rv) && foo
  • NS_FAILED(rv1) || NS_FAILED(rv2)
  • NS_FAILED(rv1) && NS_SUCCEEDED(rv2)

Above, we said we'd flag these for manual fix, but some of them, like the first two, are pretty common and we might want to fix them automatically. The first two cases seem fairly easy to handle, but we need more information on how they're being used.

nothrow methods

Methods that never fail are the easiest to handle: exception safety at call sites is free. These are called nothrow methods following the Effective C++ terminology. We can remove all exception handlers protecting nothrow call sites, making the code shorter and easier to read.

We'll assume that any method implemented in JavaScript can fail and throw an exception, so the nothrow analysis is restricted to methods implemented in C++. The prime example is property retrieval methods.

GNU attribute syntax

Microsoft C++ nothrow attribute syntax

nothrow analysis

There are three ways a method can fail:

  • By returning a non-zero (non-NS_OK) value,
  • By calling another nsresult method that can fail (i.e., is not nothrow),
  • By using a C++ feature that can throw an exception. The primary example is allocating memory with new. new badness

The static analysis must be designed to detect any of these conditions. Because it's hard to know what method will be called in C++, a good starting point for the analysis would be to say that a method is nothrow if it can only return 0, does not invoke new, and does not call any other method. If this doesn't pick up enough nothrow methods, we can either do it by hand or augment the analysis with a simple call graph construction.