CloudServices/Sync/FxSync/DeathToUnknownError: Difference between revisions

From MozillaWiki
(→‎Goal: no error bars for ops maintenance either.)
(→‎Proposal: more about ops)

Revision as of 23:19, 11 May 2011

Goal

  • No annoying error bars for recoverable errors (e.g. network); this is especially a requirement for Instant Sync
  • No annoying error bars for recurring errors that have already been acknowledged but can't be solved without a software update
  • No annoying error bars during planned server maintenance windows.
  • Be more informative about problems when an error occurs ("Unknown Error" is not very helpful and doesn't help when users report bugs).

Proposal

  • Network errors should NEVER escalate
    • Ensure that all network failures actually end up setting the right bits here, cf. bug 624436
    • Instead warn user if they haven't synced successfully for a while (e.g. a week).
  • Def.: "Syncing successfully" = syncing without any *new* errors (errors that we've reported in the past should not resurface)
  • Operations must be able to send a response that does not escalate for some time
    • Ops uses 503 + Retry-After to indicate "work underway", which escalates immediately, 100% of the time, and is thus a poor solution for planned maintenance events
    • Currently the best (and only) way to indicate "work underway, try again later" without escalating is to hard-close the connection without sending a response.
    • Typically outages are no more than 15-20 minutes, so the client SHOULD escalate only after 5-10 consecutive errors
  • Kill Unknown Error
    • Mention which engine the error occurred in
    • If we know, mention what part of process the error occurred in (download, upload, etc.)
    • Wherever we throw, throw a meaningful value that can be turned into a localized (l10n) message
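
The proposal above can be read as a small escalation policy. The following is a minimal sketch of that reading in Python, not the Sync client's actual code. Every name in it (SyncError, EscalationPolicy, the message keys) is hypothetical, and the thresholds are just the examples given above: one week without a successful sync, and the lower end of "5-10 consecutive errors" for maintenance responses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set

# Hypothetical thresholds taken from the examples in the proposal.
STALE_SYNC_WARNING = timedelta(days=7)  # "e.g. a week"
MAINTENANCE_ERROR_LIMIT = 5             # lower end of "5-10 consecutive errors"


@dataclass(frozen=True)
class SyncError:
    """A meaningful error value instead of an "Unknown Error" (hypothetical)."""
    kind: str                     # "network", "maintenance", or "engine"
    engine: Optional[str] = None  # which engine failed, e.g. "bookmarks"
    phase: Optional[str] = None   # which part of the process, e.g. "upload"

    def message_key(self) -> str:
        """Build a key that an l10n layer could map to a user-facing string."""
        parts = ["error.sync", self.kind]
        if self.engine:
            parts.append(self.engine)
        if self.phase:
            parts.append(self.phase)
        return ".".join(parts)


@dataclass
class EscalationPolicy:
    last_success: datetime
    consecutive_maintenance_errors: int = 0
    reported: Set[str] = field(default_factory=set)

    def record_success(self, now: datetime) -> None:
        self.last_success = now
        self.consecutive_maintenance_errors = 0

    def record_error(self, error: SyncError, now: datetime) -> Optional[str]:
        """Return a message key to show the user, or None to stay quiet."""
        if error.kind == "network":
            # Network errors never escalate; only warn about a stale sync.
            return self._stale_warning(now)
        if error.kind == "maintenance":
            # Escalate only after several consecutive maintenance errors.
            self.consecutive_maintenance_errors += 1
            if self.consecutive_maintenance_errors >= MAINTENANCE_ERROR_LIMIT:
                return error.message_key()
            return self._stale_warning(now)
        key = error.message_key()
        if key in self.reported:
            # Already reported once: it must not resurface, and the sync
            # still counts as successful under the definition above.
            self.last_success = now
            return None
        self.reported.add(key)
        return key

    def _stale_warning(self, now: datetime) -> Optional[str]:
        if now - self.last_success >= STALE_SYNC_WARNING:
            return "warning.sync.stale"
        return None
```

In this sketch the caller would report each failure through record_error and show an error bar only when a key comes back. How the client tells a maintenance response apart from other failures is left open here, as it is in the proposal.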