[[File:Clock_app_clean.png|800px]]
Running the data through a k-means clustering algorithm groups the data points into clusters, as illustrated in the image below. Each color represents a distinct cluster. The clustering algorithm has a few tunable parameters for deciding whether a data point "fits" in an existing cluster or should become its own cluster. This acts like a wideband, low-amplitude filter and greatly reduces false positives caused by noise (entropy) rather than legitimate regressions.
[[File:Clock_app_clustered.png|800px]]
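The "fits in an existing cluster or becomes its own cluster" decision can be illustrated with a small sketch. This is not the tool's actual clustering code, and it is not k-means: it swaps in a simpler threshold-based pass over one-dimensional data purely to show how a single tunable parameter controls that decision. The function name <code>cluster_1d</code> and the <code>tolerance</code> parameter are hypothetical.

```python
def cluster_1d(values, tolerance):
    """Assign each value to the nearest cluster whose running mean lies
    within `tolerance`; otherwise start a new cluster.

    Illustrative only: `tolerance` is a hypothetical tunable, and this
    threshold-based pass is a stand-in for the real clustering step.
    Returns (labels, means).
    """
    means = []   # running mean of each cluster
    counts = []  # number of points in each cluster
    labels = []  # cluster index assigned to each input value

    for v in values:
        # Find the closest existing cluster, if there is one.
        best = None
        for i, m in enumerate(means):
            if best is None or abs(v - m) < abs(v - means[best]):
                best = i

        if best is not None and abs(v - means[best]) <= tolerance:
            # The point "fits": fold it into that cluster's running mean.
            counts[best] += 1
            means[best] += (v - means[best]) / counts[best]
            labels.append(best)
        else:
            # Too far from every cluster: it becomes its own cluster.
            means.append(float(v))
            counts.append(1)
            labels.append(len(means) - 1)

    return labels, means


# Made-up sample data: a baseline near 100, one outlier, then a shift to ~130.
labels, means = cluster_1d([100, 102, 101, 150, 99, 130, 131, 129], tolerance=10)
```

With a larger <code>tolerance</code>, more points are absorbed into existing clusters; with a smaller one, small fluctuations start to form clusters of their own.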
After clustering, the regression detection algorithm uses a sliding window of a fixed number of data points. The width of the window is tunable. The sliding window acts as a low-pass filter, and its width sets the cutoff frequency. In plain terms, if the sliding window is 4 samples wide, any 2 adjacent data points that fall into their own cluster and are bookended by data points in another cluster are ignored. In the picture below, the yellow cluster is only a single data point bookended by data points in the purple cluster. A sliding window of 3 or more samples would reject the yellow cluster from the step detection as outlier noise.
[[File:Clock_app_colored.png|800px]]
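One way to read the sliding window rule is sketched below. It is an interpretation of the description above, not the actual implementation: whenever the first and last samples of a window share a cluster, the samples between them are treated as belonging to that cluster too. The function name <code>suppress_short_runs</code> is hypothetical.

```python
def suppress_short_runs(labels, window):
    """Relabel short runs that are bookended by the same cluster.

    A window of `window` samples slides over the labels in time order.
    If the first and last sample in the window share a cluster, every
    sample in between is relabeled to that cluster, so runs of up to
    `window - 2` samples are absorbed as noise.
    """
    if window < 3:
        raise ValueError("window must be at least 3 to have an interior")

    smoothed = list(labels)  # work on a copy, leave the input untouched
    for start in range(len(smoothed) - window + 1):
        end = start + window - 1
        if smoothed[start] == smoothed[end]:
            for i in range(start + 1, end):
                smoothed[i] = smoothed[start]
    return smoothed


# The single yellow point from the picture, bookended by purple points.
noisy = ["purple", "purple", "yellow", "purple", "purple"]
print(suppress_short_runs(noisy, window=3))
```

Under this reading, a window of 3 absorbs the lone yellow point, while a run of 2 bookended points survives a window of 3 and is only absorbed once the window is widened to 4. A genuine step, where the labels change and do not change back, is left alone.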
After the sliding window filters out the high-frequency noise, the step detection pass walks along in time order and examines what happened each time there is a transition from one cluster to the next. If there is a "positive" transition (e.g. cluster_mean<sub>2</sub> - cluster_mean<sub>1</sub> > 0, where cluster 1 precedes cluster 2 in time), it interprets that as an increase in the test results mean and flags it as a regression, assuming lower numbers are better. It can also detect "negative" transitions, where the test results mean shifts downward and the results actually get better.
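The step detection pass can be sketched as follows. This is a minimal illustration under stated assumptions rather than the real code: the function name <code>detect_steps</code> and the <code>lower_is_better</code> flag are hypothetical, and the sign convention used here is the later cluster's mean minus the earlier cluster's mean.

```python
from statistics import mean


def detect_steps(values, labels, lower_is_better=True):
    """Report every transition between clusters, in time order.

    `values` and `labels` are parallel lists: one measurement and one
    (already filtered) cluster label per sample. Returns a list of
    (index, delta, kind) tuples, where delta is the new cluster's mean
    minus the old cluster's mean.
    """
    # Mean of all values assigned to each cluster.
    by_cluster = {}
    for value, label in zip(values, labels):
        by_cluster.setdefault(label, []).append(value)
    cluster_mean = {label: mean(vs) for label, vs in by_cluster.items()}

    steps = []
    for i in range(1, len(labels)):
        if labels[i] == labels[i - 1]:
            continue  # still in the same cluster, nothing to report
        delta = cluster_mean[labels[i]] - cluster_mean[labels[i - 1]]
        if delta == 0:
            continue  # different clusters with identical means: no step
        # A rising mean is a regression only when lower numbers are better.
        worse = (delta > 0) == lower_is_better
        steps.append((i, delta, "regression" if worse else "improvement"))
    return steps


# Made-up data: the mean jumps from about 100 to about 130, then recovers.
values = [100, 101, 99, 130, 131, 129, 100, 100]
labels = [0, 0, 0, 1, 1, 1, 0, 0]
for index, delta, kind in detect_steps(values, labels):
    print(index, delta, kind)
```

For benchmarks where higher numbers are better, passing <code>lower_is_better=False</code> flips which direction is reported as a regression.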