CloudServices/Sync/FxSync/DeathToUnknownError

From MozillaWiki
Revision as of 23:19, 11 May 2011 by Rsoderberg (talk | contribs) (→‎Proposal: more about ops)

Goal

  • No annoying error bars for recoverable errors (e.g. network errors); this is a hard requirement for Instant Sync
  • No annoying error bars for recurring errors that have already been acknowledged but can't be solved without a software update
  • No annoying error bars during planned server maintenance windows.
  • Be more informative when an error occurs ("Unknown Error" tells users nothing and is no help when they report bugs).

Proposal

  • Network errors should NEVER escalate
    • Ensure that all network failures actually end up setting the right bits here, cf. bug 624436
    • Instead, warn the user if they haven't synced successfully for a while (e.g. a week).
  • Definition: "Syncing successfully" = syncing without any *new* errors (errors that we've reported in the past should not resurface)
  • Operations must be able to send a response that does not escalate for some time
    • Ops uses 503 + Retry-After to indicate "work underway", but this escalates immediately, 100% of the time, and is thus a poor solution for planned maintenance events
    • Currently the best (and only) way to indicate "work underway, try again later" is to hard-close the connection without sending a response.
    • Outages typically last no more than 15-20 minutes, so the client SHOULD escalate after 5-10 consecutive errors
  • Kill Unknown Error
    • Mention which engine the error occurred in
    • If we know, mention what part of process the error occurred in (download, upload, etc.)
    • Wherever we throw, throw a meaningful value that can be turned into a localized (l10n) message
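
The escalation rules proposed above can be sketched as a small policy object. This is an illustrative sketch in Python, not the Sync client's actual code: `SyncError`, `EscalationPolicy`, the message keys, and both thresholds are hypothetical names and values chosen for the example. It also assumes that the "5-10 consecutive errors" rule applies to every non-network error, which the proposal does not state explicitly.

```python
from typing import Optional

# Hypothetical thresholds, taken from the ranges mentioned in the proposal.
ESCALATE_AFTER_ERRORS = 5            # low end of "5-10 consecutive errors"
STALE_AFTER_SECONDS = 7 * 24 * 3600  # "a while (e.g. a week)"


class SyncError(Exception):
    """Error that carries enough context to build a localized message."""

    def __init__(self, code: str, engine: Optional[str] = None,
                 phase: Optional[str] = None, network: bool = False):
        super().__init__(code)
        self.code = code        # stable key an l10n layer could map to text
        self.engine = engine    # which engine failed, e.g. "bookmarks"
        self.phase = phase      # "download", "upload", ... if known
        self.network = network  # True for recoverable network failures


class EscalationPolicy:
    """Decides whether a sync error should be shown to the user."""

    def __init__(self, now: float):
        self.last_success = now
        self.consecutive_errors = 0
        self.reported = set()    # errors the user has already seen
        self.warned_stale = False

    def on_success(self, now: float) -> None:
        self.last_success = now
        self.consecutive_errors = 0
        self.warned_stale = False

    def on_error(self, err: SyncError, now: float) -> Optional[str]:
        """Return a message key to display, or None to stay quiet."""
        key = (err.code, err.engine, err.phase)
        if key in self.reported:
            # Not a *new* error, so this sync counts as successful.
            self.on_success(now)
            return None
        if (now - self.last_success >= STALE_AFTER_SECONDS
                and not self.warned_stale):
            # Warn once about the long gap instead of the individual error.
            self.warned_stale = True
            return "warn.not-synced-recently"
        if err.network:
            return None  # network errors never escalate on their own
        self.consecutive_errors += 1
        if self.consecutive_errors < ESCALATE_AFTER_ERRORS:
            return None  # likely a short outage or maintenance window
        self.reported.add(key)
        return f"error.{err.code}"
```

In this sketch a caller would create one `EscalationPolicy` per account, call `on_success` after each clean sync, and pass every caught `SyncError` to `on_error`. Because each error carries a `code`, `engine`, and `phase`, the UI can name the engine and the step that failed instead of showing "Unknown Error".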