Exceptions: Difference between revisions

Jump to navigation Jump to search
1,721 bytes removed ,  17 January 2008
no edit summary
No edit summary
Line 37: Line 37:
'''Ignore <code>nsresult</code>.''' Call sites that ignore the <code>nsresult</code> must be wrapped in <code>try { call() } catch {}</code>. If we can prove that the call will never throw an exception, which we can currently do for about 1/3 of this category, we could remove the wrapper. However, this is potentially dangerous for the future: the called method may someday throw or propagate exceptions. Thus, it is better to make the call site exception-safe before removing the wrapper.
'''Ignore <code>nsresult</code>.''' Call sites that ignore the <code>nsresult</code> must be wrapped in <code>try { call() } catch {}</code>. If we can prove that the call will never throw an exception, which we can currently do for about 1/3 of this category, we could remove the wrapper. However, this is potentially dangerous for the future: the called method may someday throw or propagate exceptions. Thus, it is better to make the call site exception-safe before removing the wrapper.


'''Forward <code>nsresult</code>.''' This is when the method containing the call site simply propagates the error code to its caller without executing any other code (except destructors). The macro <code>NS_ENSURE_SUCCESS</code> does this. This category is very nice: with exceptions, there is no error handling code at all. To remove the existing error handling code, we can simply remove the assignment of the error code out of the call site and remove the return statement. This may create a dead <code>rv</code> variable, which can be deleted as a (conceptual) separate step.  
'''Propagate <code>nsresult</code>.''' This is when the method containing the call site simply propagates the error code to its caller without executing any other code (except destructors). The macro <code>NS_ENSURE_SUCCESS</code> does this. This category is very nice: with exceptions, there is no error handling code at all. To remove the existing error handling code, we can simply remove the assignment of the error code out of the call site and remove the return statement. This may create a dead <code>rv</code> variable, which can be deleted as a (conceptual) separate step.  


We place call sites that log the error into this category as well: logging will be needed less with exceptions.
We place call sites that log the error into this category as well: logging will be needed less with exceptions.


'''Compensate, then forward <code>nsresult</code>.''' This is when the method containing the call site propagates the error code, but only after running compensation code. Compensation code restores some invariant that would otherwise be lost. The prototypical example is releasing resources to avoid leaks.  
'''Compensate, then propagate <code>nsresult</code>.''' This is when the method containing the call site propagates the error code, but only after running compensation code. Compensation code restores some invariant that would otherwise be lost. The prototypical example is releasing resources to avoid leaks.  


The preferred (by the designers of C++) pattern for this scenario is to create a stack object whose destructor maintains the invariant. This is particularly easy to do if the <code>nsresult</code> compensation code executes for both the failure and success cases: we wrap the compensation code in an object destructor and define an instance just before the call site.
The preferred (by the designers of C++) pattern for this scenario is to create a stack object whose destructor maintains the invariant. This is particularly easy to do if the <code>nsresult</code> compensation code executes for both the failure and success cases: we wrap the compensation code in an object destructor and define an instance just before the call site.
Line 82: Line 82:
== Automated Refactoring ==
== Automated Refactoring ==


(TBD -- make sure to include code text quality issues.)
There are on the order of 50,000 call sites affected by the refactoring to use exceptions, so we want to automate as much as possible. All of the basic patterns discussed above can be detected by static analysis and rewritten automatically, although there are always corner cases that are difficult to automate. We hope the corner cases are a small fraction that can be fixed by hand in reasonable time.


== Static Analysis ==
=== Code Text Quality ===


The rest of the document (which needs rewriting to fit in better) is about how to use static analysis to help refactor Mozilla to use C++ exceptions instead of nsresults. In particular, this is about how to modify the call sites of methods so that the code uses exception handling and is exception safe.
One key issue with automatic rewriting is the quality of the resulting source code text from a human reader's point of view. We don't want to automatically produce ugly exception code--one of our top-level goals is to make the code more readable.


The main goal of the rewriting is to make the code use exceptions instead of nsresults without modifying the behavior of the code. This breaks down into:
This issue needs work. We'll probably have to discover the requirements through trial and review.


* Ensuring that methods that don't crash now don't crash by throwing an unhandled exception.
=== Static Analysis Infrastructure ===


* Ensuring that the error handling code at the call sites runs under the same conditions and does the same thing. (Effectively, we are assuming that the current error handling is correct, and that replicating that behavior with exceptions will result in exception-safe code.)
Refactoring relies on static analysis to (a) detect patterns to be refactored, and (b) verify the safety of the refactoring. Verifying the safety of the refactoring means ensuring that the refactored program will behave identically to the original. Note that the safety verification is typically not sound--sound refactorings are very hard to create. But we hope to create "almost sound" refactorings that will work in practice. Of course, the refactored code must always be reviewed and tested.


== Call sites that ignore nsresult return values ==
Our static analysis infrastructure includes [Pork]-Oink/Elsa, [Dehydra] Classic, and Dehydra GCC. We can already do a decent job on many of the exceptions refactorings using this infrastructure. The only thing we really miss at this point is an abstract interpretation framework based on intraprocedural CFGs. We also lack an alias analysis, but we don't think precise alias analysis is important for this task.


One case where we don't necessarily want to keep current behavior is at call sites that ignore their return values:
== Refactoring patterns ==


  callSomeMethodThatCanFail();
Here we discuss automatic refactoring as applied to each pattern described above.


This is probably a bug, so we want a tool that can find and flag all of these results. After the bugs are fixed, any remaining call sites that really should ignore failures can be rewritten automatically to ignore exceptions.
=== Ignore ===


Note that the following pattern is '''not''' an ignored nsresult return value:
For the '''ignore''' pattern, we need an analysis to detect ignored <code>nsresult</code> return values and then a rewriting to wrap the call in a try block with empty catch. Optionally, the rewriting can also produce an annotation (comment or macro) at call sites that always succeed (assuming no JavaScript implementations) to aid in the later process of making the code exception-safe.


  rv = callMethod();
The ignore analysis can be formulated like this: A call site return value is ignored if along all forward code paths the value never reaches the condition of a branch statement and never escapes the method by being stored in a field, global variable, outparam, or argument of a called method.
  return rv;


This pattern can be rewritten to use exceptions (by simply letting any exceptions propagate up to the caller) with the techniques described in the section on making call sites exception safe.
To be accurate, the analysis really should be flow-sensitive (i.e., trace forward paths, and pay attention to statement ordering), because the same <code>rv</code> variable is often used for the results of many calls.


=== Analysis to detect ignored return values ===
This analysis has been coded as a Dehydra script, and it works fairly well. But there is a limitation because there are too many paths to explore them all in Dehydra, which introduces unsoundness. However, the analysis could be expressed efficiently and without this unsoundness using abstract interpretation.


Roughly, the analysis should flag all call sites where:
The rewriting is simple: the call expression is wrapped in a try block with empty catch. We need to think about whether this should be done with a macro, inline on one line, or on more than one line.


* the method is not guaranteed nothrow (i.e., the method can fail)
=== Propagate ===


* there is no branch on NS_FAILED(rv) or NS_SUCCEEDED(rv)
For the '''propagate''' pattern, we need an analysis to detect call sites that on error immediately return the same error code, and a rewriting to remove the error code storage, testing, and return.


The easy way is to look for '''if''' statements where the condition contains NS_FAILED(rv) or NS_SUCCEEDED(rv). But that's not perfect, because of things like:
The propagate analysis can be formulated like this: A call site return value is propagated directly if along all forward code paths where the value is non-zero the value is returned before any side effects other than logging occur. Side effects include writing to a variable that escapes the method or calling another method.


* Calling NS_FAILED directly on the function without a variable
Dehydra is a good fit for this analysis because in practice, this pattern can be detected or rejected by exploring very short paths forward from the call site: typically either the value is returned or a side effect is produced almost immediately.


* Assigning the rv or the NS_FAILED test result to a variable then branching later
One problem with producing this analysis in Dehydra is that Dehydra doesn't represent <code>if</code> conditions precisely. Dehydra is good at detecting things like <code>if (NS_FAILED(rv))</code> (assuming that we have patched FF to make <code>NS_FAILED</code> an inline function, which we have), but doesn't detect things like <code>if (NS_FAILED(rv) || foo)</code>. The simple case is the most common, so we might be able to get away with doing the more complex ones manually, but the complex cases are common enough that they are probably worth automating as well.


* Overwriting the value of rv before the test (so the rv isn't actually branched on)
Rewriting '''propagate''' sites is potentially tricky, because the analysis property is defined over control-flow paths, but rewriting source code generally operates on ASTs. Hence, we're not prepared to say just yet how this will work in detail, only a general description: The basic idea that all code paths where the value is nonzero should be deleted (because they will never be reached when an exception is thrown) and all code paths where the value is zero should be specialized for a zero value.


The current '''thrower''' tool catches many of these cases but not all. It's believed that (a) thrower only produces false positives, not false negatives, and (b) doesn't produce too many false positives on Mozilla.
=== Compensate and Propagate ===


=== Rewriting ignored return values ===
The analysis conditions to detect the '''compensate and propagate''' pattern: (a) the call site is not an ignore or propagate, and (b) every path where <code>rv</code> is nonzero reaches ends by returning <code>rv</code>.


If it turns out most of these are bugs, we can just flag them for manual repair.
Rewriting: TBD


On the other hand, if almost all of these sites are correct code, then we can handle an ignore site like this:
=== Compensate and Rethrow ===


  callMethod();
The analysis conditions to detect the '''compensate and rethrow''' pattern: (a) the call site is not an ignore or propagate, and (b) every path where <code>rv</code> is nonzero reaches ends by returning a non-rv error code.


by rewriting it to this:
Rewriting: TBD


  nsresult rv = callMethod();
=== Compensate and Continue ===
  if (NS_FAILED(rv)) {}


and letting the full rewriting algorithm rewrite this to use exceptions. Or, we could rewrite it directly as:
The analysis conditions to detect the '''compensate and rethrow''' pattern: (a) the call site is not an ignore or propagate, and (b) every path where <code>rv</code> is nonzero eventually joins the paths where <code>rv</code> is zero.


  try {
Rewriting: TBD
    callMethod();
  } except (nserror e) {}
 
The latter pattern can be expressed with a macro. (dmandelin: But macros are bad, right? I'd vote for no macros here.)
 
== Making call sites exception safe ==
 
Most call sites do check for errors and respond somehow. This section is about how to rewrite those call sites.
 
We'll start with the assumption that call sites adhere to a simple pattern that's relatively easy to rewrite, namely:
 
  nsresult rv = callSomeMethod();
  // note1: no code between call & rv test -- such code can of course be
  //        accommodated with exceptions but requires more design
  // note2: also accept NS_SUCCEEDED(rv) with code in opposite then/else branches
  if (NS_FAILED(rv)) {
    maybe fixup();
    return rv OR return NS_ERROR_FOO or return 0;
  } else {
    maybe do_stuff();
  }   
 
This requires a static analysis that finds all call sites that don't match this pattern so they can be flagged for manual rewrite.
 
Given the simple pattern, the rewrite looks like this:
 
      try {
          callSomeMethod();
          maybe do_stuff();
      } except (nserror e) {
        maybe fixup();
        maybe throw NS_ERROR_FOO; // if original returns new error code
        maybe return;            // if original returns 0
      }
 
The '''thrower''' tool handles some instances of this pattern. (dmandelin: I don't know exactly what thrower can do at this point.)
 
But there are at least two big practical problems with this rewrite.
 
First, when the original code failure case ends with 'return rv', the natural thing to do with exceptions is let the exception propagate. (If we use the rewrite above, we're cutting off the stack trace, which is bad for debugging.) If there is no 'fixup' code, then this is easy: just have no try block. If there is 'fixup' code, then we should rewrite to use a compensator object:
 
  // class created to run fixup
  class SomeCompensator {
    ~SomeCompensator() { fixup(); }
  };
 
  // call site
  SomeCompensator();
  callSomeMethod();
 
This is the style recommended by C++ authorities. The problem here is that the code is no longer equivalent to the original nsresult code: the fixup is run on success as well as failure. We still might be able to produce this rewriting automatically, but we need to know more about how the fixups appear in our code and how they can be identified.
 
Besides fixups, the current nsresult code often logs errors or runs assertions. With exception stack traces, logging should be less important, but assertions are a real issue. We need to decide out how the assertion sites should behave with exceptions and then figure out how to implement that in C++.
 
The second problem is that the if condition is often a little more complicated than what our pattern allows. We also have:
 
* NS_FAILED(rv) || foo
* NS_FAILED(rv) && foo
* NS_FAILED(rv1) || NS_FAILED(rv2)
* NS_FAILED(rv1) && NS_SUCCEEDED(rv2)
 
Above, we said we'd flag these for manual fix, but some of them, like the first two, are pretty common and we might want to fix them automatically. The first two cases seem fairly easy to handle, but we need more information on how they're being used.
 
== nothrow methods ==
 
Methods that never fail are the easiest to handle: exception safety at call sites is free. These are called '''nothrow''' methods following the ''Effective C++'' terminology. We can remove all exception handlers protecting nothrow call sites, making the code shorter and easier to read.
 
We'll assume that any method implemented in JavaScript can fail and throw an exception, so the nothrow analysis is restricted to methods implemented in C++. The prime example is property retrieval methods.
 
[http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Function-Attributes.html#index-g_t_0040code_007bnothrow_007d-function-attribute-1735 GNU attribute syntax]
 
[http://msdn2.microsoft.com/en-us/library/49147z04(VS.80).aspx Microsoft C++ nothrow attribute syntax]
 
=== nothrow analysis ===
 
There are three ways a method can fail:
 
* By returning a non-zero (non-NS_OK) value,
 
* By calling another nsresult method that can fail (i.e., is not nothrow),
 
* By using a C++ feature that can throw an exception. The primary example is allocating memory with '''new'''. [https://bugzilla.mozilla.org/show_bug.cgi?id=353144 new badness]
 
The static analysis must be designed to detect any of these conditions. Because it's hard to know what method will be called in C++, a good starting point for the analysis would be to say that a method is nothrow if it can only return 0, does not invoke new, and does not call any other method. If this doesn't pick up enough nothrow methods, we can either do it by hand or augment the analysis with a simple call graph construction.
313

edits

Navigation menu