Socorro:ClientAPI


DRAFT
The content of this page is a work in progress intended for review.

Please help improve the draft!

Ask questions or make suggestions in the discussion or add your suggestions directly to this page.


Socorro Client API

Currently, Socorro has Python collectors, processors, and middleware that communicate with HBase via the Thrift protocol. A module named hbaseClient.py contains several pieces of the business logic, such as the creation of HBase row keys based on the ooid, and the methods for storing and retrieving data in HBase and for manipulating queues and metrics counters.
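As a point of reference, the sketch below shows the kind of row-key logic that lives in hbaseClient.py. The salting scheme shown here (prefixing the key with the ooid's first character so that writes spread across HBase regions) is illustrative only; the exact scheme is the one defined in hbaseClient.py, and the class and method names are hypothetical.

// Hypothetical illustration of ooid-based row-key construction. The salt
// prefix spreads sequential writes across HBase regions; the real scheme
// used by Socorro is the one in hbaseClient.py.
public final class RowKeys {

    private RowKeys() {}

    // Prefix the row key with the first character of the ooid so that
    // consecutive crash reports do not all land on the same region server.
    public static String ooidToRowKey(String ooid) {
        return ooid.charAt(0) + ooid;
    }
}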

One of the biggest limitations of the current architecture is that it is very sensitive to latency or outages on the HBase side. If the collectors cannot store an item in HBase, they store it on local disk, and it is not accessible to the processors or the middleware layer until a cron job picks it up and successfully stores it in HBase. The same goes for the processors: if they cannot retrieve the raw data and store the processed data, that data will not be available.

This page gives a high-level overview of a potential solution to this problem: replacing the Thrift communication layer with a higher-level API that lives outside of HBase and provides an intermediate queuing and temporary storage layer, along with a REST interface containing the business logic needed to store and retrieve crash report data.
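The exact REST surface has not been designed; the sketch below is only one possible shape for the operations described above. The paths, method names, and parameter types are assumptions for illustration, not a settled interface.

import java.util.Map;

// One possible shape for the ClientAPI storage operations; everything here
// (paths, names, types) is an assumption used for illustration only.
public interface CrashStorageApi {

    // PUT /crash/{ooid}/raw -- store a raw dump plus its metadata
    void putRawCrash(String ooid, byte[] dump, Map<String, String> metadata);

    // GET /crash/{ooid}/raw -- retrieve the raw dump, from the map if cached
    byte[] getRawCrash(String ooid);

    // PUT /crash/{ooid}/processed -- store the processor's JSON output
    void putProcessedCrash(String ooid, String processedJson);

    // GET /crash/{ooid}/processed -- retrieve the processed crash
    String getProcessedCrash(String ooid);
}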

Queue and persistence layer

Transparently to the clients, when data is to be written, it will be placed in a distributed map that performs a write-through to HBase. The map will have size and age configuration parameters so that items that have already been written to HBase can be evicted and replaced with newer items, while items that have not yet been durably stored but need to be flushed out of the map can be persisted to local storage instead.
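A minimal sketch of the write-through idea, assuming Hazelcast's MapStore hook is used to push map entries into HBase (for example through its REST interface). The HBaseCrashStore class, the "crashes" map name, and the commented-out HBase helpers are hypothetical, and package locations and configuration calls vary between Hazelcast versions.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.core.MapStore;

public class HBaseCrashStore implements MapStore<String, byte[]> {

    // Invoked by Hazelcast whenever an entry is put into the map; with a
    // write delay of 0 this is a synchronous write-through to HBase.
    public void store(String rowKey, byte[] crash) {
        // hbaseRestPut(rowKey, crash);  // hypothetical helper around HBase REST
    }

    public void storeAll(Map<String, byte[]> entries) {
        for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
            store(entry.getKey(), entry.getValue());
        }
    }

    public void delete(String rowKey) {
        // crash data is rarely deleted; left empty in this sketch
    }

    public void deleteAll(Collection<String> rowKeys) {
    }

    // The MapLoader side lets a read fall through to HBase when the entry
    // has already been evicted from the in-memory map.
    public byte[] load(String rowKey) {
        return null; // hbaseRestGet(rowKey);  // hypothetical helper
    }

    public Map<String, byte[]> loadAll(Collection<String> rowKeys) {
        Map<String, byte[]> loaded = new HashMap<String, byte[]>();
        for (String rowKey : rowKeys) {
            byte[] crash = load(rowKey);
            if (crash != null) {
                loaded.put(rowKey, crash);
            }
        }
        return loaded;
    }

    public Set<String> loadAllKeys() {
        return null; // null tells Hazelcast not to pre-load keys at startup
    }

    public static void main(String[] args) {
        MapStoreConfig storeConfig = new MapStoreConfig();
        storeConfig.setImplementation(new HBaseCrashStore());
        storeConfig.setWriteDelaySeconds(0); // 0 = write-through, >0 = write-behind

        Config config = new Config();
        config.getMapConfig("crashes").setMapStoreConfig(storeConfig);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        IMap<String, byte[]> crashes = hz.getMap("crashes");
        crashes.put("a-row-key", new byte[0]); // store() runs as part of this put
    }
}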

If an item is retrieved via the GET API, the map will be checked first. This provides both a hot cache for the most recent data and end-to-end accessibility of items even in the face of temporary unavailability of HBase. Storing crash reports with certain flags, such as unprocessed, legacy, and priority, will transparently add them to the appropriate queues so that downstream workers can pick items off those queues and act on them.
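A sketch of the flag-driven queueing described above, assuming Hazelcast's distributed queues are used to hand row keys to downstream workers. The queue names, flag parameters, and class name are illustrative assumptions.

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IQueue;

public class CrashQueues {

    private final HazelcastInstance hz;

    public CrashQueues(HazelcastInstance hz) {
        this.hz = hz;
    }

    // Route a newly stored crash onto whichever queues its flags call for.
    public void enqueue(String rowKey, boolean unprocessed, boolean priority, boolean legacy) {
        if (unprocessed) {
            IQueue<String> queue = hz.getQueue("unprocessed");
            queue.offer(rowKey);
        }
        if (priority) {
            IQueue<String> queue = hz.getQueue("priority");
            queue.offer(rowKey);
        }
        if (legacy) {
            IQueue<String> queue = hz.getQueue("legacy");
            queue.offer(rowKey);
        }
    }

    // A processor-side worker blocks on a queue and acts on each row key.
    public void drainUnprocessed() throws InterruptedException {
        IQueue<String> queue = hz.getQueue("unprocessed");
        while (true) {
            String rowKey = queue.take(); // blocks until an item is available
            // fetch the raw crash for rowKey via the GET API and process it
        }
    }
}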

The distributed nature of the queues and map means that the in-memory data is stored with a configurable number of replica copies on other ClientAPI servers in the pool. If one server dies or is unreachable, the data is still immediately available through the replica copies. Adding more ClientAPI servers will automatically redistribute the existing data and will also provide more storage. The ClientAPI server pool will expose a metrics interface that allows introspection of the amount of data contained in its various parts as well as its health.
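A minimal sketch of the replica-count idea, assuming Hazelcast's per-map backup count is the knob used; the map name and the choice of two backups are assumptions.

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class ClientApiClusterConfig {

    public static void main(String[] args) {
        Config config = new Config();
        // Keep two backup copies of every "crashes" entry on other members of
        // the cluster, so the data survives the loss of any single server.
        config.getMapConfig("crashes").setBackupCount(2);
        Hazelcast.newHazelcastInstance(config);
    }
}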

Current implementation state

A few pieces of the system described here were prototyped before the prototype was shelved. Further work to bring the prototype to a level suitable for significant acceptance and load testing would take a short period of time. It took advantage of the following libraries; a sketch of how the pieces fit together follows the list:

Jetty
A lightweight, performant Java HTTP engine that supports simple REST interfaces
HBase REST (formerly called Stargate)
A built-in REST interface for HBase
Hazelcast
A Java distributed-collections library providing peer-to-peer clustered storage of data in lists, sets, queues, and maps
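
To show how the pieces fit together, the sketch below embeds Jetty and serves a GET by consulting the Hazelcast map, which in turn falls through to HBase via its configured MapStore/MapLoader when an entry is not cached. Class names, the port, and the URL layout are assumptions; this is not the prototype's actual code.

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.AbstractHandler;

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ClientApiServer extends AbstractHandler {

    private final IMap<String, byte[]> crashes;

    public ClientApiServer(HazelcastInstance hz) {
        this.crashes = hz.getMap("crashes");
    }

    // Handle GET /crash/<ooid>: check the distributed map first; a miss falls
    // through to HBase via the map's MapLoader.
    public void handle(String target, Request baseRequest,
                       HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        String ooid = target.substring(target.lastIndexOf('/') + 1);
        byte[] crash = crashes.get(ooid);
        if (crash == null) {
            response.setStatus(HttpServletResponse.SC_NOT_FOUND);
        } else {
            response.setStatus(HttpServletResponse.SC_OK);
            response.getOutputStream().write(crash);
        }
        baseRequest.setHandled(true);
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(new Config());
        Server server = new Server(8080); // embedded Jetty on port 8080
        server.setHandler(new ClientApiServer(hz));
        server.start();
        server.join();
    }
}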