Socorro/ElasticSearch API
Revision as of 00:31, 21 May 2011
Middleware API for ElasticSearch
This is a draft of the new API for querying ElasticSearch through the middleware API of Socorro.
The Middleware API aims to separate the front-end from the back-end by providing an interface to access the data. By doing so, the front-end will not have to care about the storage system, and will retrieve data from HBase, PostgreSQL or ElasticSearch in a consistent and simple way, through our REST API.
The API is divided into several categories / entry points:
- /query
- /search
- /report
- /crash
- /stats
These categories are explained below.
This API is designed to be built on top of ElasticSearch. However, we want our users to have the choice of using ES or not. That is why we will try to make this API as modular as possible, so we can have different implementations using different storage or search engines (e.g. ElasticSearch, PostgreSQL... ). The Socorro UI should be completely independent from the storage engine used, and should use this API without caring about it.
The API
Version
Every URI is prefixed by a version number, so final URIs should look like: http://example.com/(api_version)/(request)/.
Query
Description
Low-level query: sends a JSON query directly to ES and returns the result of this query.
API Spec
HTTP request: POST
Data: JSON query to send to ElasticSearch
URI: /query/[(types)/]
- types: Types of data we are looking into. If omitted, default value is _all. Several types can be specified, separated by a + symbol.
Return value
This request returns the exact data the storage system returned.
Example
curl -XPOST 'http://example.com/110505/query/crashes/' -d '{ "query" : { "match_all" : {} } }'
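The same request can be prepared from Python. This is a minimal sketch that only builds the request and does not send it; the helper name build_query_request and its arguments are hypothetical, and the host and version number are the ones used in the curl example above.

```python
import json
import urllib.request


def build_query_request(base_url, api_version, es_query, types=None):
    """Build (but do not send) a POST request for the /query entry point."""
    # Several types are joined with "+"; when omitted, the spec says the
    # middleware falls back to _all, so the segment is simply left out.
    path = "/query/"
    if types:
        path += "+".join(types) + "/"
    url = f"{base_url.rstrip('/')}/{api_version}{path}"
    # The request body is the raw JSON query that is forwarded to ES.
    body = json.dumps(es_query).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_query_request(
    "http://example.com", "110505", {"query": {"match_all": {}}}, types=["crashes"]
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here because example.com is only a placeholder host.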
Search
Description
Searches for crashes and returns them. This search is highly configurable, but can also be really simple using default values.
API Spec
HTTP request: GET
URI: /search/(types)/(optional_parameters)
- types: Types of data we are looking into. Can be set to _all to search in all types. Several types can be specified, separated by a + symbol.
Optional parameters:
Except for the first one, every parameter can be omitted. Any omitted parameter has a default value or is not used while querying ES. You can use only some of those parameters or all of them. The order of parameters doesn't matter except for the first one (types).
The complete URI is as follows: /search/(types)/for/(terms)/product/(product)/from/(from_date)/to/(to_date)/in/(fields)/version/(version)/os/(os_name)/branches/(branches)/search_mode/(search_mode)/reason/(crash_reason)/build/(build_id)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_term/(plugin_term)
- terms: Terms we are searching for. Each term must be URL encoded. Several terms can be specified, separated by a + symbol. If not specified, nothing is searched, and the query returns the results corresponding to the other parameters.
- product: The product we are interested in. (e.g. Firefox, Fennec, Thunderbird... ) Default value is "firefox".
- from_date: Search for crashes that happened after this date. Can use the following formats: "yyyy-MM-dd", "yyyy-MM-dd HH:ii:ss" or "yyyy-MM-dd HH:ii:ss.S". Default value is a week ago.
- to_date: Search for crashes that happened before this date. Can use the following formats: "yyyy-MM-dd", "yyyy-MM-dd HH:ii:ss" or "yyyy-MM-dd HH:ii:ss.S". Default value is now.
- fields: Fields we are searching in. Several fields can be specified, separated by a + symbol. Default is to search in all fields.
- version: Version of the product. Can be set to _all to search in all versions. Default is to search in all versions.
- os_name: Name of the Operating System. (e.g. Windows, Mac, Linux... ) Default value is search in all OS.
- branches: Several branches can be specified, separated by a + symbol. Default value is search in all branches.
- search_mode: Set how to search. Can be either is_exactly, contains or start_with. Default value is contains.
- crash_reason: Restricts search to crashes caused by this reason. Default value is empty.
- build_id: Restricts search to crashes that happened on a product with this build ID. Default value is empty.
- report_process: Can be any, browser or plugin. Default value is any.
- report_type: Can be any, crash or hang. Default value is any.
- plugin_in: Search for a plugin in this field. report_process has to be set to plugin. Default value is empty.
- plugin_search_mode: How to search for this plugin. report_process has to be set to plugin. Default value is empty.
- plugin_term: Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to plugin. Default value is empty.
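The three date formats accepted by from_date and to_date could be handled as sketched below. This is an illustration rather than the middleware's actual code: it assumes that "ii" stands for minutes and ".S" for fractional seconds, and the function name parse_api_date is hypothetical.

```python
from datetime import datetime

# strptime equivalents of "yyyy-MM-dd HH:ii:ss.S", "yyyy-MM-dd HH:ii:ss"
# and "yyyy-MM-dd" (assumption: "ii" means minutes, "S" fractional seconds).
ACCEPTED_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def parse_api_date(value):
    """Return a datetime for any of the three accepted formats."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next, less precise format
    raise ValueError(f"unsupported date format: {value!r}")


print(parse_api_date("2011-05-01"))
print(parse_api_date("2011-05-01 12:30:45"))
```

Trying the most precise format first keeps the loop simple; a value that matches none of the formats is rejected instead of being silently replaced by a default.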
Return value
The full JSON documents that meet the search parameters. The schema of these JSON documents is to be determined.
Example
http://example.com/110505/search/crashes/for/libflash.so/in/signature/product/firefox/version/4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
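A URI like the one above can be assembled from the parameter list. The following is a minimal sketch under the assumption that each parameter maps to the keyword shown in the complete URI; the names SEARCH_KEYWORDS, encode_values and build_search_uri are hypothetical and not part of the middleware.

```python
from urllib.parse import quote

# Parameter name -> URI keyword, as they appear in the complete URI.
SEARCH_KEYWORDS = {
    "terms": "for", "product": "product", "from_date": "from",
    "to_date": "to", "fields": "in", "version": "version",
    "os_name": "os", "branches": "branches", "search_mode": "search_mode",
    "crash_reason": "reason", "build_id": "build",
    "report_process": "report_process", "report_type": "report_type",
    "plugin_in": "plugin_in", "plugin_search_mode": "plugin_search_mode",
    "plugin_term": "plugin_term",
}


def encode_values(value):
    """URL-encode one value or several, joining several with "+"."""
    values = [value] if isinstance(value, str) else value
    return "+".join(quote(str(v), safe="") for v in values)


def build_search_uri(api_version, types, **params):
    """Build a /search URI; omitted parameters are left to their defaults."""
    segments = [api_version, "search", encode_values(types)]
    for name, value in params.items():
        if name not in SEARCH_KEYWORDS:
            raise ValueError(f"unknown search parameter: {name}")
        if value is None:
            continue  # let the middleware apply its default value
        segments += [SEARCH_KEYWORDS[name], encode_values(value)]
    return "/" + "/".join(segments) + "/"


print(build_search_uri(
    "110505", "crashes", terms="libflash.so", fields="signature",
    product="firefox", version="4.0.1", os_name="Windows",
))
```

Because the order of optional parameters does not matter, the sketch simply emits them in the order they are passed, always after types.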
Report
Description
Get a specific report.
API Spec
HTTP request: GET
URI: /report/(report_name)/product/(product)/version/(version)/from/(from_date)/to/(to_date)/
- report_name: The wanted report. Can be:
- top_changers_by_signature
- top_crashers_by_signature
- top_crashers_by_url
- top_crashers_by_domain
- top_crashers_by_topsite
- product: The product we are interested in. (e.g. Firefox, Fennec, Thunderbird... )
- version: Version of the product.
- from_date: Only crashes that happened after this date.
- to_date: Only crashes that happened before this date.
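As an illustration, a report URI could be built and checked against the list of report names as follows. This is a sketch only: the helper name build_report_uri is hypothetical, the example values are made up, and the values are assumed to need no URL encoding.

```python
# Report names listed in the spec above.
REPORT_NAMES = frozenset({
    "top_changers_by_signature",
    "top_crashers_by_signature",
    "top_crashers_by_url",
    "top_crashers_by_domain",
    "top_crashers_by_topsite",
})


def build_report_uri(api_version, report_name, product, version,
                     from_date, to_date):
    """Build a /report URI, rejecting report names the spec does not list."""
    if report_name not in REPORT_NAMES:
        raise ValueError(f"unknown report: {report_name}")
    return (
        f"/{api_version}/report/{report_name}"
        f"/product/{product}/version/{version}"
        f"/from/{from_date}/to/{to_date}/"
    )


print(build_report_uri(
    "110505", "top_crashers_by_signature", "firefox", "4.0.1",
    "2011-05-01", "2011-05-05",
))
```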
Crash
Description
Searches for a crash by its OOID and returns it. This query is already implemented in the Middleware.
API Spec
See http://code.google.com/p/socorro/wiki/APICalls
Stats
Description
This is a proposal.
Get some statistics about the data, e.g. counts by OS, by product, by ADU, by build... Unlike report, stats only sends back numeric data, counted over the entire data set or within a certain date range.
This may not be useful for Socorro UI right now, but could be in the future. It may be a good way of extending the information we give to our users. Those stats are very likely to be cached, meaning performance should not be an issue.
Examples of ES queries
{
  "size" : 0,
  "query" : { "match_all" : {} },
  "facets" : {
    "os" : {
      "terms" : { "script_field" : "_source.os_name" }
    }
  }
}
Gives the number of crashes by OS. Example result:
{
  "took" : 50,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 229, "max_score" : 1.0, "hits" : [ ] },
  "facets" : {
    "os" : {
      "_type" : "terms",
      "missing" : 74,
      "terms" : [
        { "term" : "Windows NT", "count" : 134 },
        { "term" : "Mac OS X", "count" : 13 },
        { "term" : "Linux", "count" : 8 }
      ]
    }
  }
}
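A stats implementation would likely reduce such a facet result to plain numbers. The sketch below shows one way to do that on the example result above; the function name facet_counts is hypothetical, and the response is trimmed to the fields the function reads.

```python
def facet_counts(es_response, facet_name):
    """Return ({term: count}, missing) for one terms facet of an ES response."""
    facet = es_response["facets"][facet_name]
    counts = {entry["term"]: entry["count"] for entry in facet["terms"]}
    # "missing" is the number of documents that have no value for the field.
    return counts, facet.get("missing", 0)


# The example result above, trimmed to the parts used here.
response = {
    "hits": {"total": 229, "max_score": 1.0, "hits": []},
    "facets": {
        "os": {
            "_type": "terms",
            "missing": 74,
            "terms": [
                {"term": "Windows NT", "count": 134},
                {"term": "Mac OS X", "count": 13},
                {"term": "Linux", "count": 8},
            ],
        }
    },
}

counts, missing = facet_counts(response, "os")
print(counts, missing)
```

In this example the per-OS counts plus the missing count add up to the 229 total hits, which can serve as a quick sanity check on the facet.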
Implementation
Describe how we are going to implement this...