Services/F1/Server/ServicesStatusDB
Goal
When a third-party service such as Twitter goes down or becomes very slow, clients retry and send more and more requests to our server, and our infrastructure becomes overloaded and potentially unresponsive.
The goal of the Services Status DB is to give every web server in our infrastructure the status of every third-party service. A web server can then decide to back off a request when the service is down and ask the client to retry later.
Principle
1. On every request, the client adds an X-Target-Service header containing the domain of the service it wants to reach.
For example, if the client wants to share on Twitter, an "X-Target-Service: twitter.com" header is added.
2. The web server (Nginx) that receives the query asks the Services DB for the status of the service (as described later) and decides whether the query should go through.
3. If the request is rejected, the client receives a 503 response with a Retry-After header and has to wait for the time provided before it retries.
4. If the request is accepted, it is passed to the upstream server (Python), which does the job.
5. If the upstream server succeeds, it asynchronously notifies the Services DB.
6. If the upstream server fails, e.g. if the third-party service is considered down, it asynchronously notifies the Services DB.
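The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the real check runs inside Nginx, the notifications are asynchronous rather than direct calls, and the names `check_request`, `notify`, and `MIN_RATIO`, the record layout, and the rejection rule (disabled flag set, or ratio below a threshold) are hypothetical and not taken from the actual implementation.

```python
# Hypothetical threshold: reject when GR / BR drops below this value.
MIN_RATIO = 1.0

# Hypothetical record created the first time a service is reported.
NEW_RECORD = {"good": 0, "bad": 0, "disabled": False, "retry_after": 60}


def check_request(headers, db):
    """Steps 2-4: return (503, headers) to reject, or (None, {}) to pass upstream."""
    service = headers.get("X-Target-Service")
    record = db.get(service)
    if record is None:
        # Assumption: services with no recorded status are let through.
        return None, {}
    good, bad = record["good"], record["bad"]
    # With no bad responses yet, treat the ratio as infinitely good.
    ratio = good / bad if bad else float("inf")
    if record["disabled"] or ratio < MIN_RATIO:
        return 503, {"Retry-After": str(record["retry_after"])}
    return None, {}


def notify(db, service, success):
    """Steps 5-6: record the outcome reported by the upstream server."""
    record = db.setdefault(service, dict(NEW_RECORD))
    record["good" if success else "bad"] += 1
```

A plain dict stands in for the Services DB here; in the design described on this page the workers would send these updates to the replicated store instead of mutating it in process.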
Client UX on outage
Database
The DB is a key/value store that holds, for each service:
- a status ratio
- a disabled flag
- a retry-after value
The DB is replicated in several places and is eventually consistent.
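As a rough illustration of what one entry could look like, the sketch below keys each record by the service domain and serializes it as JSON. The `ServiceStatus` class, its field names, the choice to store the two counters rather than a precomputed ratio, and the JSON encoding are all assumptions made for the example; the page does not specify a storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ServiceStatus:
    """Hypothetical per-service value; field names are illustrative only."""
    good: int = 0          # GR, number of good responses
    bad: int = 0           # BR, number of bad responses
    disabled: bool = False
    retry_after: int = 60  # seconds


def encode(status):
    """Serialize a record to the string stored under the service key."""
    return json.dumps(asdict(status))


def decode(raw):
    """Rebuild a record from its stored string."""
    return ServiceStatus(**json.loads(raw))


# A dict stands in for the key/value store.
store = {"twitter.com": encode(ServiceStatus(good=120, bad=3))}
```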
Ratio
Each service has a (GR / BR) ratio stored in that DB.
- GR = Number of good responses.
- BR = Number of bad responses.
What counts as a "good" or a "bad" response is left to the workers' discretion.
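A minimal sketch of the ratio computation is shown below. The function name `status_ratio` is hypothetical, and returning infinity when no bad response has been recorded is an assumption made here to avoid dividing by zero; the page does not say how that case is handled.

```python
def status_ratio(good, bad):
    """Return GR / BR for a service.

    Assumption: with no bad responses recorded, the ratio is treated
    as infinite instead of raising ZeroDivisionError.
    """
    if bad == 0:
        return float("inf")
    return good / bad
```

For instance, 90 good and 10 bad responses give a ratio of 9.0, while a service that only returns bad responses has a ratio of 0.0.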
Disabled
The DB also stores a disabled flag for each service, which can be toggled manually to shut down a service if needed.
Retry-After
For each service, the DB stores a Retry-After value. When the DB starts, the value is loaded from a configuration file, but it can later be changed by the workers or the admin application.
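The startup behaviour could look like the sketch below. The INI layout, the `[retry-after]` section name, the sample values, and the `load_retry_after` helper are assumptions for illustration; the actual configuration file format is not described here.

```python
import configparser

# Hypothetical configuration content: one Retry-After value (seconds) per domain.
SAMPLE_CONFIG = """
[retry-after]
twitter.com = 60
facebook.com = 120
"""


def load_retry_after(text):
    """Parse the initial Retry-After values, keyed by service domain."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {domain: int(value) for domain, value in parser["retry-after"].items()}


# Values loaded at startup...
retry_after = load_retry_after(SAMPLE_CONFIG)
# ...can later be overridden, e.g. by a worker during a long outage.
retry_after["twitter.com"] = 300
```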