Revision as of 23:19, 11 May 2011

Goal

No annoying error bars for recoverable errors (e.g. network errors). This is especially a requirement for Instant Sync.
No annoying error bars for recurring errors that have already been acknowledged but can't be solved without a software update.
No annoying error bars during planned server maintenance windows.
Be more informative about problems when an error occurs ("Unknown Error" tells the user nothing and gives us nothing to work with when users report bugs).

Network errors should NEVER escalate
- Ensure that all network failures actually end up setting the right bits here, cf. bug 624436
- Instead, warn the user if they haven't synced successfully for a while (e.g. a week).
Definition: "syncing successfully" = syncing without any *new* errors (errors that we've reported in the past should not resurface)
Operations must be able to send a response that does not escalate for some time
- Ops uses 503 + Retry-After to indicate "work underway", which escalates immediately, 100% of the time, and is thus a poor solution for planned maintenance events.
- Currently the best (and only) way to indicate "work underway, try again later" is to hard-close the connection without sending a response.
- Outages typically last no more than 15-20 minutes, so the client SHOULD escalate after 5-10 consecutive errors.
Kill Unknown Error
- Mention which engine the error occurred in
- If we know, mention what part of process the error occurred in (download, upload, etc.)
- Wherever we throw, throw a meaningful value that can be turned into a localized (l10n) message
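
The points above can be sketched as a small escalation policy. This is only an illustration, not the Sync client's actual code: the names `SyncError`, `EscalationPolicy`, `record_sync` and the message ids are hypothetical, the threshold of 5 is taken from the low end of the 5-10 range mentioned above, and applying that one threshold to every non-network error is a simplifying assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set, Tuple

# Hypothetical values: the notes suggest 5-10 errors and "e.g. a week".
CONSECUTIVE_ERROR_THRESHOLD = 5
STALE_SYNC_WARNING = timedelta(days=7)


@dataclass(frozen=True)
class SyncError:
    """A meaningful error value instead of "Unknown Error"."""
    engine: str               # which engine failed, e.g. "bookmarks"
    phase: str                # part of the process, e.g. "download", "upload"
    code: str                 # stable key that could map to an l10n string
    is_network: bool = False

    def key(self) -> Tuple[str, str, str]:
        return (self.engine, self.phase, self.code)

    def message_id(self) -> str:
        return f"error.{self.engine}.{self.phase}.{self.code}"


@dataclass
class EscalationPolicy:
    last_success: datetime
    consecutive_errors: int = 0
    reported: Set[Tuple[str, str, str]] = field(default_factory=set)

    def record_sync(self, error: Optional[SyncError],
                    now: datetime) -> Optional[str]:
        """Return a message id to show the user, or None to stay quiet."""
        if error is None or error.key() in self.reported:
            # No *new* error: this counts as a successful sync.
            self.last_success = now
            self.consecutive_errors = 0
            return None
        if not error.is_network:
            # Network errors are never counted towards escalation.
            self.consecutive_errors += 1
            if self.consecutive_errors >= CONSECUTIVE_ERROR_THRESHOLD:
                self.reported.add(error.key())  # never resurface it again
                self.consecutive_errors = 0
                return error.message_id()
        if now - self.last_success >= STALE_SYNC_WARNING:
            # Nothing escalated, but no successful sync for a while.
            return "warning.sync.stale"
        return None
```

With this shape, a 15-20 minute maintenance window would stay silent as long as it produces fewer than five consecutive failed syncs, and an error that has already been shown once is treated as acknowledged from then on.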

@@ Line 11: / Line 11: @@
 ** Instead warn user if they haven't synced successfully for a while (e.g. a week).
 * Def.: "Syncing successfully" = syncing without any *new* errors (errors that we've reported in the past should not resurface)
+* Operations must be able to send a response that does not escalate for some time
+** ops uses 503+retry-after to indicate "work underway" which escalates immediately, 100% of the time, and is thus a poor solution for planned maintenance events
+** currently the best (and only) way to indicate "work underway, try again later" is to hard-close the connection without sending a response.
+** typically outages are no more than 15-20 minutes, so the client SHOULD escalate after 5-10 consecutive errors
 * Kill Unknown Error
 ** Mention which engine the error occurred in
 ** If we know, mention what part of process the error occurred in (download, upload, etc.)
 ** Wherever we throw, throw a meaningful value that can be turned into an l10n