Now that we have [https://bugzilla.mozilla.org/show_bug.cgi?id=1045183 request-time rendering] in place and no longer have to drag folders full of static HTML around the FS, indexing trees in parallel becomes feasible, which we'll need if we're going to scale up to dozens of trees. Motivations:
# Don't let a broken build on one tree scuttle the indexing of the rest.
# Reduce the time it takes to refresh all the indexes so they stay reasonably up to date.
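A minimal sketch of the first motivation, assuming a hypothetical `index_tree(tree_name)` that builds and indexes one tree and raises on failure. Each tree runs in its own worker, and a failure is recorded per tree instead of aborting the whole run. Threads are used on the assumption that the heavy lifting (the build itself) happens in external processes anyway.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_tree(tree_name):
    """Hypothetical stand-in for building and indexing one tree.

    It simulates a broken build for any tree whose name starts with
    "broken"; a real version would run the build and push docs to ES.
    """
    if tree_name.startswith("broken"):
        raise RuntimeError("build failed for " + tree_name)
    return tree_name


def index_all(tree_names, max_workers=4):
    """Index every tree in parallel; return (succeeded, failed) lists.

    An exception from one tree's build is caught and recorded in
    `failed`, so the remaining trees still finish indexing.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(index_tree, name): name for name in tree_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                succeeded.append(name)
            except Exception as exc:
                failed.append((name, str(exc)))
    return sorted(succeeded), sorted(failed)
```

For example, `index_all(["mozilla-central", "broken-tree"])` still reports `mozilla-central` as indexed even though the other tree's build failed.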
The config file is always going to exist, at least to point to the ES servers, but we need only the original, user-edited one, not a second one generated as an FS artifact of indexing. The following settings, currently read from the generated file, will instead be read from the user-edited one. Changing them will require a WSGI restart.
* www_root
* es_hosts
* google_analytics_key
* default_tree
* max_thumbnail_size
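A minimal sketch of reading those settings straight from the user-edited file when the WSGI app starts. It assumes an INI-style file with a `[DXR]` section and whitespace-separated `es_hosts`; both are assumptions for illustration, not the actual config layout. The values are read only once at startup, which is why changing them needs a WSGI restart.

```python
import configparser

# The webapp settings listed above, read from the user-edited config file.
WEBAPP_SETTINGS = ("www_root", "es_hosts", "google_analytics_key",
                   "default_tree", "max_thumbnail_size")


def load_webapp_settings(ini_text, section="DXR"):
    """Return a dict of the webapp settings; missing keys map to None.

    The "[DXR]" section name is an assumption for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    values = parser[section]
    settings = {key: values.get(key) for key in WEBAPP_SETTINGS}
    # Assumed format: es_hosts is a whitespace-separated list of URLs.
    if settings["es_hosts"]:
        settings["es_hosts"] = settings["es_hosts"].split()
    if settings["max_thumbnail_size"] is not None:
        settings["max_thumbnail_size"] = int(settings["max_thumbnail_size"])
    return settings
```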
ES aliases can handle atomic transitions between versions of a tree's indices.
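As an illustration, here is a sketch of the request body for ES's `POST /_aliases` endpoint that repoints a tree's alias from its old index to a freshly built one. Every action in a single request is applied atomically, so searches against the alias hit either the old index or the new one, never neither. The index and alias names in the usage note are hypothetical.

```python
def alias_swap_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that moves `alias` to `new_index`.

    If `old_index` is None (first build of a tree), only the "add"
    action is emitted.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

The resulting dict would be sent as the JSON body of a single `POST` to `<es_host>/_aliases`, e.g. `alias_swap_actions("dxr_2_mozilla-central", "dxr_2_mozilla-central_a", "dxr_2_mozilla-central_b")`.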
But how do we know which of the trees in the config file have actually been indexed so far? In other words, what if someone adds a new tree and it takes a while to index? Or what if somebody enables a new plugin for that tree, and we don't want to start showing filters for it until it's actually been used in an indexing run? We need to freeze certain attributes of each tree as they were at index time. We'll keep these "frozen" attributes in a dedicated index that has 1 shard(?) and replicas all over the place so queries are fast. The docs would look like this:
 {name: "mozilla-central",
  es_alias: "dxr_{format}_{tree}",
  enabled_plugins: ["clang", "pygmentize"],
  maybe some plugin config  # TODO: how do we pluggably serialize this?
 }
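A sketch of how the webapp might consult those frozen docs, assuming they have already been fetched from the dedicated index as plain dicts with the fields shown above. Only trees that have completed an indexing run are shown, and a tree's filters come from the plugins frozen at index time rather than from the live config. Both helper names are hypothetical.

```python
def visible_trees(config_trees, frozen_docs):
    """Trees from the config file that have a frozen doc, i.e. have been indexed."""
    indexed = {doc["name"] for doc in frozen_docs}
    return [name for name in config_trees if name in indexed]


def filter_plugins(tree_name, frozen_docs):
    """Plugins whose filters to show: those enabled when the tree was indexed.

    A plugin newly enabled in the config file won't appear here until an
    indexing run writes a new frozen doc for the tree.
    """
    for doc in frozen_docs:
        if doc["name"] == tree_name:
            return list(doc["enabled_plugins"])
    return []
```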
Ordinarily, I'd lean toward memcached for those, but we'd be introducing another server and another lib just to hold one value. However, it turns out that ES is fast. Locally, in a one-node cluster (so we'll have to re-bench under more realistic conditions to be sure), a simple search for all documents returns in 0.458ms on average:
Deploying new DXR code would look like this:
# Grab all the es_alias entries out of the "tree" index.
# Hit each alias (substituting the format version of the code I'm thinking about deploying) to see if it's there.
# If all of them are there (that is, all trees that were built before have been built to be compatible with the new code), deploy the new webapp code.
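The deploy check above could be sketched like this, assuming the tree docs have already been pulled from the "tree" index and that the caller supplies an `alias_exists` predicate (in practice, for example, a HEAD request for the alias name against ES). The function names and the integer format version are illustrative assumptions.

```python
def aliases_for_format(tree_docs, format_version):
    """Steps 1-2: fill in each es_alias template for the candidate format."""
    return [doc["es_alias"].format(format=format_version, tree=doc["name"])
            for doc in tree_docs]


def safe_to_deploy(tree_docs, format_version, alias_exists):
    """Step 3: return (ok, missing_aliases).

    `alias_exists` is a caller-supplied predicate (hypothetical here) that
    reports whether an alias is present in ES. Deploy only when `ok` is True.
    """
    missing = [alias for alias in aliases_for_format(tree_docs, format_version)
               if not alias_exists(alias)]
    return not missing, missing
```

Returning the missing aliases, not just a boolean, tells whoever is deploying which trees still need a rebuild for the new format.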