DXR Parallel Tree Indexing: Difference between revisions

Make a few corrections.
(Update now that ES and request-time rendering are implemented.)
(Make a few corrections.)
Line 17: Line 17:


     {name: "mozilla-central",
     {name: "mozilla-central",
     es_alias: "dxr_{format}_{tree}",
     format: 11,  # By storing the number here, we can just query for.... We don't have to worry about the alias template string going stale; the deploy script re-reads the config file each time.
     enabled_plugins: ["clang", "pygmentize"],
     enabled_plugins: ["clang", "pygmentize"],
     maybe some plugin config  # TODO: how do we pluggably serialize this?
     maybe some plugin config  # TODO: how do we pluggably serialize this? We don't need any of these yet, so it can wait.
     }
     }


Ordinarily, I'd lean toward memcached for those, but we'd be introducing another server and another lib for just one value. However, it turns out that ES is fast. Locally, in a one-node cluster (so we'll have to re-benchmark more realistically to be sure), a simple search for all documents returns in 0.458 ms on average:
Ordinarily, I'd lean toward memcached in front of those, but we'd be introducing another server and another lib for just one value. However, it turns out that ES is fast. Locally, in a one-node cluster (so we'll have to re-benchmark more realistically to be sure), a simple search for all documents returns in 0.458 ms on average:


     % ab -n 100 -c 3 'http://127.0.0.1:9200/dxr_test/tree/_search'
     % ab -n 100 -c 3 'http://127.0.0.1:9200/dxr_test/tree/_search'
Line 38: Line 38:
Indexing a tree would look like this:
Indexing a tree would look like this:


# Make a new index called "dxr_hot_prod_formatversion_mozilla-central_somerandombits". Maybe we'll prepend a timestamp and/or the machine name to the random bits, use a machine-local lock, or some other fancy stuff.
# Make a new index called "dxr_hot_prod_formatversion_mozilla-central_somerandombits".
# Index the tree into it.
# Index the tree into it.
# Deploy by updating (or creating) the "dxr_hot_prod_formatversion_mozilla-central" ES alias to point to the newly built ES index. (We'd have to worm the format version into someplace webapp-accessible so it would know what to sub in for "formatversion".)
# Deploy by updating (or creating) the "dxr_hot_prod_formatversion_mozilla-central" ES alias to point to the newly built ES index. (We'd have to worm the format version into someplace webapp-accessible so it would know what to sub in for "formatversion".)
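The indexing steps above can be sketched roughly as follows. This is only an illustration, not DXR's actual code: the helper names `new_index_name` and `alias_swap_body` are hypothetical, the 8-hex-digit suffix is an assumed stand-in for "somerandombits", and the returned dict is simply the request body shape that Elasticsearch's `_aliases` endpoint accepts (a list of `add`/`remove` actions applied atomically). Nothing is sent over the network here.

```python
import secrets


def new_index_name(alias):
    """Return a fresh index name: the alias plus some random bits.

    Hypothetical helper; the suffix format is an assumption.
    """
    return "{0}_{1}".format(alias, secrets.token_hex(4))


def alias_swap_body(alias, new_index, old_indices=()):
    """Build the body for a POST to ES's /_aliases endpoint.

    Removes the alias from any previously pointed-to indices and adds it
    to the newly built one in a single request, so the alias never
    dangles in between.
    """
    actions = [{"remove": {"index": old, "alias": alias}}
               for old in old_indices]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Example: build a new index name for mozilla-central at format version 11
# and prepare the alias swap away from an (assumed) older index.
alias = "dxr_hot_prod_11_mozilla-central"
index = new_index_name(alias)
body = alias_swap_body(alias, index, old_indices=[alias + "_0ldb1ts0"])
```

Passing an empty `old_indices` covers the "or creating" case, where the alias does not exist yet and only the `add` action is needed.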
Line 45: Line 45:
Deploying new DXR code would look like this:
Deploying new DXR code would look like this:


# Grab all the es_alias entries out of the "tree" index.
# Figure out the currently-deployed version, probably from just looking at the `format` file on the FS.
# Hit each alias (substituting the format version of the code I'm thinking about deploying) to see if it's there.
# Get the "tree" docs of both that (say, 11) and the new format version we want to deploy (12).
# If all of them are there—that is, all trees that were built before have been built to be compatible with the new code—deploy the new webapp code.
# If the version-12 tree docs cover at least the intersection of the version-11 tree docs and the trees from the config file (that is, every tree that both has a version-11 doc and is still listed in the config also has a version-12 doc), deploy the new webapp code. (This ensures that we never decrease tree coverage, even if someone deletes a tree from the config file or adds a new one. It also won't hold up deploys if version 11 of a tree never got around to being built.) Also, delete all the tree docs of version 11, just to keep the index from growing forever.
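The deploy condition in the last step boils down to a set comparison. Here is a minimal sketch, assuming the tree names for each format version have already been fetched from the "tree" docs; the function name `can_deploy` and its arguments are hypothetical, not part of DXR.

```python
def can_deploy(old_trees, new_trees, config_trees):
    """Decide whether the new webapp code may be deployed.

    old_trees:    names of trees that have a doc at the current format
    new_trees:    names of trees that have a doc at the new format
    config_trees: names of trees listed in the config file

    Only trees that are both currently built and still configured must
    exist at the new format version.
    """
    required = set(old_trees) & set(config_trees)
    return required <= set(new_trees)


# A tree dropped from the config ("old-branch") does not block the deploy,
# and neither does a newly configured tree that was never built before.
ok = can_deploy(
    old_trees={"mozilla-central", "old-branch"},
    new_trees={"mozilla-central"},
    config_trees={"mozilla-central", "comm-central"},
)

# A still-configured, currently served tree that is missing at the new
# format version does block it.
blocked = can_deploy(
    old_trees={"mozilla-central", "comm-central"},
    new_trees={"mozilla-central"},
    config_trees={"mozilla-central", "comm-central"},
)
```

Using the intersection rather than the full config list is what keeps a never-built tree from holding up every deploy.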