Socorro/ElasticSearch API

= Middleware API for ElasticSearch =
This page is now in [[Socorro/Middleware]].
 
'''This is a draft''' of the new API for querying ElasticSearch through the middleware API of Socorro.
 
The Middleware API aims to separate the front end from the back end by providing a single interface to the data. The front end then does not need to know which storage system is in use, and retrieves data from HBase, PostgreSQL or ElasticSearch in a consistent and simple way through our REST API.
 
The API is divided into several categories (entry points):
* /query
* /search
* /report
* /crash
* /stats
 
These categories are explained below.
 
This API is designed to be built on top of ElasticSearch. However, we want our users to be able to choose whether or not to use ES. That is why we will try to make this API as modular as possible, so we can have different implementations using different storage or search engines (e.g. ElasticSearch, PostgreSQL...). The Socorro UI should be completely independent of the storage engine used, and should use this API without caring about it.
 
= The API =
 
== Version ==
 
Every URI is prefixed by a version number, so final URIs should look like: http://example.com/(api_version)/(request)/.
 
== Query ==
 
=== Description ===
 
Low-level query: sends a JSON query directly to ES and returns the result of that query.
 
=== API Spec ===
 
HTTP request: '''POST'''<br>
Data: JSON query to send to ElasticSearch<br>
URI: '''/query/[(''types'')/]'''
 
* ''types'': Types of data we are looking into. If omitted, default value is _all. Several types can be specified, separated by a + symbol.
 
=== Return value ===
 
This request returns the exact data the storage system returned.
 
=== Example ===
 
<pre>curl -XPOST 'http://example.com/110505/query/crashes/' -d '{
    "query" : {
        "match_all" : {}
    }
}'</pre>
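
The same request can be prepared from Python. This is only a minimal sketch of a client, not part of the middleware: the helper name <tt>build_query_request</tt> is hypothetical, and the <tt>Content-Type</tt> header is an assumption, since the spec above does not say which headers are expected. The request is built but not sent.

```python
import json
import urllib.request


def build_query_request(base_url, api_version, es_query, types=None):
    """Build (without sending) a POST request for the /query entry point.

    Hypothetical helper, for illustration only.
    """
    path = "/%s/query/" % api_version
    if types:
        # Several types are separated by a + symbol. When types is omitted,
        # the path has no type segment and the spec says _all is used.
        path += "+".join(types) + "/"
    body = json.dumps(es_query).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        # Assumption: the spec does not state the expected headers.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "http://example.com",
    "110505",
    {"query": {"match_all": {}}},
    types=["crashes"],
)
print(request.get_method(), request.full_url)
```

Passing the returned object to <tt>urllib.request.urlopen</tt> would send it.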
 
== Search ==
 
=== Description ===
 
Searches for crashes and returns them. This search is highly configurable, but can also be really simple using default values.
 
=== API Spec ===
 
HTTP request: '''GET'''<br>
URI: '''/search/(''types'')/(''optional_parameters'')'''
 
* <tt>types</tt>: Types of data we are looking into. Can be set to <tt>_all</tt> to search all types. Several types can be specified, separated by a + symbol.
 
<u>'''Optional parameters:'''</u>
 
Except for the first one, every parameter can be omitted. Any omitted parameter has a default value or is not used while querying ES. You can use only some of those parameters or all of them. The order of parameters doesn't matter except for the first one (types).
 
The complete URI is as follows:
/search/(''types'')/'''for/(''terms'')/product/(''product'')/from/(''from_date'')/to/(''to_date'')/in/(''fields'')/version/(''version'')/os/(''os_name'')/branches/(''branches'')/search_mode/(''search_mode'')/reason/(''crash_reason'')/build/(''build_id'')/report_process/(''report_process'')/report_type/(''report_type'')/plugin_in/(''plugin_in'')/plugin_search_mode/(''plugin_search_mode'')/plugin_term/(''plugin_term'')'''
 
* <tt>terms</tt>: Terms we are searching for. Each term must be URL encoded. Several terms can be specified, separated by a + symbol. If not specified, nothing is searched, and the query returns the results corresponding to the other parameters.
* <tt>product</tt>: The product we are interested in. (e.g. Firefox, Fennec, Thunderbird... ) Default value is "firefox".
* <tt>from_date</tt>: Search for crashes that happened after this date. Can use the following formats: "<tt>yyyy-MM-dd</tt>", "<tt>yyyy-MM-dd HH:ii:ss</tt>" or "<tt>yyyy-MM-dd HH:ii:ss.S</tt>". Default value is a week ago.
* <tt>to_date</tt>: Search for crashes that happened before this date. Can use the following formats: "<tt>yyyy-MM-dd</tt>", "<tt>yyyy-MM-dd HH:ii:ss</tt>" or "<tt>yyyy-MM-dd HH:ii:ss.S</tt>". Default value is now.
* <tt>fields</tt>: Fields we are searching in. Several fields can be specified, separated by a + symbol. Default value is search in all fields. This is '''NOT''' implemented for PostgreSQL.
* <tt>version</tt>: Version of the product. Can be set to <tt>_all</tt> to search all versions. Default value is search in all versions.
* <tt>os_name</tt>: Name of the Operating System. (e.g. Windows, Mac, Linux... ) Default value is search in all OS.
* <tt>branches</tt>: Several branches can be specified, separated by a + symbol. Default value is search in all branches.
* <tt>search_mode</tt>: Set how to search. Can be either <tt>is_exactly</tt>, <tt>contains</tt> or <tt>starts_with</tt>. Default value is contains.
* <tt>crash_reason</tt>: Restricts search to crashes caused by this reason. Default value is empty.
* <tt>build_id</tt>: Restricts search to crashes that happened on a product with this build ID. Default value is empty.
* <tt>report_process</tt>: Can be <tt>any</tt>, <tt>browser</tt> or <tt>plugin</tt>. Default value is any.
* <tt>report_type</tt>: Can be <tt>any</tt>, <tt>crash</tt> or <tt>hang</tt>. Default value is any.
* <tt>plugin_in</tt>: Search for a plugin in this field. <tt>report_process</tt> has to be set to <tt>plugin</tt>. Default value is empty.
* <tt>plugin_search_mode</tt>: How to search for this plugin. <tt>report_process</tt> has to be set to <tt>plugin</tt>. Default value is empty.
* <tt>plugin_term</tt>: Terms to search for. Several terms can be specified, separated by a + symbol. <tt>report_process</tt> has to be set to <tt>plugin</tt>. Default value is empty.
* <tt>result_number</tt>: Number of results to return. Default value is 100.
* <tt>result_offset</tt>: Offset of the first result to return. Default value is 0.
 
=== Return value ===
 
The full JSON documents that meet the search parameters. ''JSON documents schema to be determined.''
 
=== Example ===
 
<pre>http://example.com/110505/search/crashes/for/libflash.so/in/signature/product/firefox/version/4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/</pre>
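
A search URI of this shape can be assembled from key/value pairs. The sketch below is an illustration rather than the middleware's own code: <tt>build_search_uri</tt> is a hypothetical helper, and it assumes that each value is URL encoded individually before multiple values are joined with a + symbol, as described for <tt>terms</tt> above.

```python
from urllib.parse import quote


def build_search_uri(base_url, api_version, types, params):
    """Assemble a /search URI from URI keywords and their values.

    Hypothetical helper. `params` maps URI keywords (for, in, product, ...)
    to a single value or to a list of values.
    """
    segments = [api_version, "search", "+".join(types)]
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        segments.append(key)
        # Encode each value on its own, then join several values with "+".
        segments.append("+".join(quote(str(v), safe="") for v in values))
    return base_url.rstrip("/") + "/" + "/".join(segments) + "/"


uri = build_search_uri(
    "http://example.com",
    "110505",
    ["crashes"],
    {
        "for": "libflash.so",
        "in": "signature",
        "product": "firefox",
        "version": "4.0.1",
        "from": "2011-05-01",
        "to": "2011-05-05",
        "os": "Windows",
    },
)
print(uri)
```

With these inputs the helper reproduces the example URI above. Because the order of optional parameters does not matter, any ordering of the dictionary would be equally valid.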
 
== Report ==
 
=== Description ===
 
Get a specific report.
 
=== API Spec ===
 
HTTP request: '''GET'''<br>
URI: '''/report/(''report_name'')/product/(''product'')/version/(''version'')/from/(from_date)/to/(to_date)/'''
 
* <tt>report_name</tt>: The wanted report. Can be:
** top_changers_by_signature
** top_crashers_by_signature
** top_crashers_by_url
** top_crashers_by_domain
** top_crashers_by_topsite
* <tt>product</tt>: The product we are interested in. (e.g. Firefox, Fennec, Thunderbird... )
* <tt>version</tt>: Version of the product.
* <tt>from_date</tt>: Only crashes that happened after this date.
* <tt>to_date</tt>: Only crashes that happened before this date.
 
Example: <tt>http://example.com/201105/report/top_crashers_by_url/product/firefox/version/5.0/from/2011-05-01/to/2011-05-05/</tt>
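
A client may want to reject unknown report names before calling the middleware. The following is a minimal sketch under that assumption. <tt>REPORT_NAMES</tt> and <tt>build_report_uri</tt> are hypothetical names, and the set of reports is copied from the list above.

```python
# Report names listed in the spec above.
REPORT_NAMES = {
    "top_changers_by_signature",
    "top_crashers_by_signature",
    "top_crashers_by_url",
    "top_crashers_by_domain",
    "top_crashers_by_topsite",
}


def build_report_uri(base_url, api_version, report_name,
                     product, version, from_date, to_date):
    """Assemble a /report URI, refusing report names the spec does not list.

    Hypothetical helper, for illustration only.
    """
    if report_name not in REPORT_NAMES:
        raise ValueError("unknown report: %r" % report_name)
    return "%s/%s/report/%s/product/%s/version/%s/from/%s/to/%s/" % (
        base_url.rstrip("/"), api_version, report_name,
        product, version, from_date, to_date,
    )


report_uri = build_report_uri(
    "http://example.com", "201105", "top_crashers_by_url",
    "firefox", "5.0", "2011-05-01", "2011-05-05",
)
print(report_uri)
```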
 
== Crash ==
 
=== Description ===
 
Searches for a crash by its OOID and returns it. This query is already implemented in the Middleware.
 
=== API Spec ===
 
See http://code.google.com/p/socorro/wiki/APICalls
 
== Stats ==
 
=== Description ===
 
'''This is a proposal.'''
 
Returns statistics about the data, e.g. counts by OS, by product, by ADU, by build... Unlike report, stats only sends back numeric data, counted across the entire data set or within a certain date range.
 
This may not be useful for Socorro UI right now, but could be in the future. It may be a good way of extending the information we give to our users. Those stats are very likely to be cached, meaning performance should not be an issue.
 
=== Examples of ES queries ===
 
<pre>{
    "size" : 0,
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "os" : {
            "terms" : { "script_field" : "_source.os_name" }
        }
    }
}</pre>
 
This query returns the number of crashes by OS. Example result:
 
<pre>{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 229,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "os" : {
      "_type" : "terms",
      "missing" : 74,
      "terms" : [ {
        "term" : "Windows NT",
        "count" : 134
      }, {
        "term" : "Mac OS X",
        "count" : 13
      }, {
        "term" : "Linux",
        "count" : 8
      } ]
    }
  }
}</pre>
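
Since stats only sends back numeric data, a client would typically reduce such a response to plain counts. The sketch below shows one way to do that for a terms facet. <tt>facet_counts</tt> is a hypothetical helper, and the response is a trimmed copy of the example result above.

```python
import json

# Trimmed copy of the example ES response shown above.
RESPONSE = json.loads("""
{
  "hits": {"total": 229, "max_score": 1.0, "hits": []},
  "facets": {
    "os": {
      "_type": "terms",
      "missing": 74,
      "terms": [
        {"term": "Windows NT", "count": 134},
        {"term": "Mac OS X", "count": 13},
        {"term": "Linux", "count": 8}
      ]
    }
  }
}
""")


def facet_counts(es_response, facet_name):
    """Return ({term: count}, missing) for one terms facet of an ES response."""
    facet = es_response["facets"][facet_name]
    counts = {entry["term"]: entry["count"] for entry in facet["terms"]}
    return counts, facet.get("missing", 0)


counts, missing = facet_counts(RESPONSE, "os")
print(counts, missing)
```

In this example the per-OS counts (155 in total) plus the 74 documents with no OS value add up to the 229 hits reported by the query.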
 
= Implementation =
 
''Describe how we are going to implement this...''